[jira] [Commented] (HIVE-21877) Change HCatTableInfo to not be transient in PartInfo

2019-06-14 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864458#comment-16864458
 ] 

Mithun Radhakrishnan commented on HIVE-21877:
-

No worries, mate. Cheers.

> Change HCatTableInfo to not be transient in PartInfo
> 
>
>                 Key: HIVE-21877
>                 URL: https://issues.apache.org/jira/browse/HIVE-21877
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Ankit Jhalaria
>            Assignee: Ankit Jhalaria
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since HCatTableInfo is serializable, this change removes the transient 
> modifier from it. We were running into an NPE during serialization while 
> using HCatalogIO with Beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21877) Change HCatTableInfo to not be transient in PartInfo

2019-06-14 Thread Ankit Jhalaria (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864433#comment-16864433
 ] 

Ankit Jhalaria commented on HIVE-21877:
---

Thanks, [~mithun], for your explanation. I will close my PR.



[jira] [Commented] (HIVE-21877) Change HCatTableInfo to not be transient in PartInfo

2019-06-14 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864420#comment-16864420
 ] 

Mithun Radhakrishnan commented on HIVE-21877:
-

Pasting your question from the PR here:

{quote}
While using HCatalog with Apache Beam, we ran into an issue with HCatTableInfo 
being null during serialization. I don't see a reason why it should be 
transient. However, there might be use-cases that I may not be aware of and 
might require it to be transient. Would love to hear some feedback regardless.
{quote}

This has to do with HIVE-9845. It would not be a good idea to make 
{{HCatTableInfo}} non-transient. Doing so would make Pig/HCatLoader, as well as 
{{HCatInputFormat}}, inefficient for large partition sets.

{{HCatTableInfo}} contains table information that is static for all partitions 
within a partition set for a given table. {{PartInfo}} is the variable part. 
Serializing {{HCatTableInfo}} again for every partition in a partition set 
increases the split-meta-info for a Hadoop job to unreasonable lengths.

I would advise perusing the HCat code to see how {{HCatTableInfo}} is restored 
after deserialization.
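
For anyone following along, here is a minimal sketch of the general pattern 
described above: the per-partition object keeps the shared table object in a 
transient field, and the owning container re-attaches it after 
deserialization. This is not the HCat code. The class names {{TableMeta}}, 
{{PartMeta}} and {{JobMeta}} and the sample values are hypothetical, and only 
standard {{java.io}} serialization is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class TransientRestoreSketch {

    /** Hypothetical stand-in for the static, per-table information. */
    static class TableMeta implements Serializable {
        private static final long serialVersionUID = 1L;
        final String tableName;
        final String schema;

        TableMeta(String tableName, String schema) {
            this.tableName = tableName;
            this.schema = schema;
        }
    }

    /** Hypothetical stand-in for the variable, per-partition information. */
    static class PartMeta implements Serializable {
        private static final long serialVersionUID = 1L;
        final String location;
        // transient: Java serialization skips this field, so the shared
        // table information is not written once per partition.
        transient TableMeta tableMeta;

        PartMeta(String location, TableMeta tableMeta) {
            this.location = location;
            this.tableMeta = tableMeta;
        }
    }

    /** Hypothetical container holding one TableMeta and many PartMeta. */
    static class JobMeta implements Serializable {
        private static final long serialVersionUID = 1L;
        final TableMeta tableMeta;
        final List<PartMeta> partitions = new ArrayList<>();

        JobMeta(TableMeta tableMeta) {
            this.tableMeta = tableMeta;
        }

        // Called by Java serialization on read: restore the transient
        // reference in every partition from the single serialized copy.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            for (PartMeta part : partitions) {
                part.tableMeta = tableMeta;
            }
        }
    }

    /** Serializes a value to a byte array and reads it back. */
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        TableMeta table = new TableMeta("web_logs", "ts:bigint,url:string");
        JobMeta job = new JobMeta(table);
        job.partitions.add(new PartMeta("/data/web_logs/dt=2019-06-13", table));
        job.partitions.add(new PartMeta("/data/web_logs/dt=2019-06-14", table));

        // A partition serialized on its own loses the transient reference.
        PartMeta lonePart = roundTrip(job.partitions.get(0));
        System.out.println(lonePart.tableMeta == null);

        // Serialized through its container, the reference is restored.
        JobMeta restored = roundTrip(job);
        System.out.println(restored.partitions.get(1).tableMeta.tableName);
    }
}
```

In this sketch, serializing a {{PartMeta}} by itself yields a null table 
reference on the other side, which is the kind of situation that can surface 
as an NPE when a framework serializes the partition object directly rather 
than through its container.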
