[jira] [Commented] (HIVE-22771) Partition location incorrectly formed in FileOutputCommitterContainer

2020-02-05 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030941#comment-17030941
 ] 

Mithun Radhakrishnan commented on HIVE-22771:
-

{quote}Sufficiently long format should not increase the chance of collision I 
think. Also, do you know if there's any reason for keeping this random ID 
numeric?
{quote}
This is a good point, [~dvoros]. There is no reason that I can think of to keep 
this numeric. There certainly isn't a reason for this to be in scientific 
notation. IIRC, the whole point was to keep the temp directory from clashing 
against other attempts. Perhaps a better way forward would be to use a 
sufficiently long random {{String}}.
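
For illustration, a minimal sketch of such an ID generator (a hypothetical helper, not Hive code):

{code:java}
import java.security.SecureRandom;

public class RandomIdSketch {
  private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
  private static final SecureRandom RANDOM = new SecureRandom();

  // A fixed-length alphanumeric suffix for the temp directory: no scientific
  // notation, and 36^16 possible values make collisions between attempts negligible.
  public static String randomId(int length) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(randomId(16)); // e.g. "q3x08kd72mz1a9fw"
  }
}
{code}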

> Partition location incorrectly formed in FileOutputCommitterContainer
> -
>
> Key: HIVE-22771
> URL: https://issues.apache.org/jira/browse/HIVE-22771
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Shivam
>Assignee: Shivam
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-22771.2.patch, HIVE-22771.3.patch, HIVE-22771.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Class _HCatOutputFormat_ in package _org.apache.hive.hcatalog.mapreduce_ uses 
> the function _setOutput_ to generate _idHash_ with the statement below:
> *+In file org/apache/hive/hcatalog/mapreduce/HCatOutputFormat.java+*
>  *line 116: idHash = String.valueOf(Math.random());*
> The output of _idHash_ can be a value like 7.145347157239135E-4.
>  
> The class _FileOutputCommitterContainer_, in the same package, uses the 
> statement below to compute the final partition path:
> +*In org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java*+
> *line 366: String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + 
> SCRATCH_DIR_NAME + "\\d\\.?\\d+","");*
> *line 367: partPath = new Path(finalLocn);*
>  
> The regex used here is incorrect: it only removes the digits after 
> *SCRATCH_DIR_NAME*, and hence leaves 'E-4' (for the example above) in the 
> final partition location. 
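
To make the failure concrete, a small standalone demo (not Hive code; the directory layout and the "greedier" pattern are illustrative only):

{code:java}
public class ScratchRegexDemo {
  public static void main(String[] args) {
    String jobLocation = "/warehouse/t/_SCRATCH7.145347157239135E-4/dt=1";
    // The pattern from line 366 stops after the digits, stranding "E-4":
    System.out.println(jobLocation.replaceAll("/_SCRATCH" + "\\d\\.?\\d+", ""));
    // prints: /warehouse/tE-4/dt=1
    // A pattern that also consumes an optional exponent removes the whole token:
    System.out.println(
        jobLocation.replaceAll("/_SCRATCH" + "\\d\\.?\\d+([eE]-?\\d+)?", ""));
    // prints: /warehouse/t/dt=1
  }
}
{code}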



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22771) Partition location incorrectly formed in FileOutputCommitterContainer

2020-01-27 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024768#comment-17024768
 ] 

Mithun Radhakrishnan commented on HIVE-22771:
-

FWIW, +1 from me as well. This is a good catch.
{quote}Can we add a test for it (e.g., in TestHCatStorer)?
{quote}
This bug probably got through because, while there are tests in 
{{AbstractHCatStorerTest}} for the dynamic-partitioning case and the hybrid 
(i.e. partially dynamic) case, the static case (i.e. when the partition-ids are 
fully known _a priori_) isn't covered. There would be value in adding a test 
for Pig scripts writing to static partitions.

Thank you for working on this, [~shivam-mohan]. :]

> Partition location incorrectly formed in FileOutputCommitterContainer
> -
>
> Key: HIVE-22771
> URL: https://issues.apache.org/jira/browse/HIVE-22771
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Shivam
>Assignee: Shivam
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-22771.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Class _HCatOutputFormat_ in package _org.apache.hive.hcatalog.mapreduce_ uses 
> the function _setOutput_ to generate _idHash_ with the statement below:
> *+In file org/apache/hive/hcatalog/mapreduce/HCatOutputFormat.java+*
>  *line 116: idHash = String.valueOf(Math.random());*
> The output of _idHash_ can be a value like 7.145347157239135E-4.
>  
> The class _FileOutputCommitterContainer_, in the same package, uses the 
> statement below to compute the final partition path:
> +*In org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java*+
> *line 366: String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + 
> SCRATCH_DIR_NAME + "\\d\\.?\\d+","");*
> *line 367: partPath = new Path(finalLocn);*
>  
> The regex used here is incorrect: it only removes the digits after 
> *SCRATCH_DIR_NAME*, and hence leaves 'E-4' (for the above example) in the 
> final partition location. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22771) Partition location incorrectly formed in FileOutputCommitterContainer

2020-01-24 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023137#comment-17023137
 ] 

Mithun Radhakrishnan edited comment on HIVE-22771 at 1/24/20 5:44 PM:
--

Hmm... This looks like a good catch to me. It looks like the 
static-partitioning case was missed/mishandled before.

The dynamic-partitioning case seems to be handled better, with a more greedy 
regex:

[https://github.com/apache/hive/blob/706c1d44f7734ed36900a02c7255c1d0ce0ad45a/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java#L867]

[~shivam-mohan], you might want to post the patch here, for {{master}} as well, 
and have CI run tests. The fix seems good to me, on the face of it.


was (Author: mithun):
Hmm... This looks like a good catch to me. It looks like the 
static-partitioning case was missed/mishandled before.

The dynamic-partitioning case seems to be handled better, with a more greedy 
regex:

[https://github.com/apache/hive/blob/706c1d44f7734ed36900a02c7255c1d0ce0ad45a/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java#L867]

 

[~shivam-mohan], you might want to post the patch here, for {{master}} as well, 
and have CI run tests. The fix seems good to me, on the face of it.

> Partition location incorrectly formed in FileOutputCommitterContainer
> -
>
> Key: HIVE-22771
> URL: https://issues.apache.org/jira/browse/HIVE-22771
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Shivam
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Class _HCatOutputFormat_ in package _org.apache.hive.hcatalog.mapreduce_ uses 
> the function _setOutput_ to generate _idHash_ with the statement below:
> +In file org/apache/hive/hcatalog/mapreduce/HCatOutputFormat.java+
>  *line 116: idHash = String.valueOf(Math.random());*
> The output of _idHash_ can be a value like 7.145347157239135E-4.
>  
> The class _FileOutputCommitterContainer_, in the same package, uses the 
> statement below to compute the final partition path:
> +*In org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java*+
> *line 366: String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + 
> SCRATCH_DIR_NAME + "\\d\\.?\\d+","");*
> *line 367: partPath = new Path(finalLocn);*
>  
> The regex used here is incorrect: it only removes the digits after 
> *SCRATCH_DIR_NAME*, and hence leaves 'E-4' (for the above example) in the 
> final partition location. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22771) Partition location incorrectly formed in FileOutputCommitterContainer

2020-01-24 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023137#comment-17023137
 ] 

Mithun Radhakrishnan commented on HIVE-22771:
-

Hmm... This looks like a good catch to me. It looks like the 
static-partitioning case was missed/mishandled before.

The dynamic-partitioning case seems to be handled better, with a more greedy 
regex:

[https://github.com/apache/hive/blob/706c1d44f7734ed36900a02c7255c1d0ce0ad45a/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java#L867]

 

[~shivam-mohan], you might want to post the patch here, for {{master}} as well, 
and have CI run tests. The fix seems good to me, on the face of it.

> Partition location incorrectly formed in FileOutputCommitterContainer
> -
>
> Key: HIVE-22771
> URL: https://issues.apache.org/jira/browse/HIVE-22771
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Shivam
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Class _HCatOutputFormat_ in package _org.apache.hive.hcatalog.mapreduce_ uses 
> the function _setOutput_ to generate _idHash_ with the statement below:
> +In file org/apache/hive/hcatalog/mapreduce/HCatOutputFormat.java+
>  *line 116: idHash = String.valueOf(Math.random());*
> The output of _idHash_ can be a value like 7.145347157239135E-4.
>  
> The class _FileOutputCommitterContainer_, in the same package, uses the 
> statement below to compute the final partition path:
> +*In org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java*+
> *line 366: String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + 
> SCRATCH_DIR_NAME + "\\d\\.?\\d+","");*
> *line 367: partPath = new Path(finalLocn);*
>  
> The regex used here is incorrect: it only removes the digits after 
> *SCRATCH_DIR_NAME*, and hence leaves 'E-4' (for the above example) in the 
> final partition location. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2019-12-12 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995221#comment-16995221
 ] 

Mithun Radhakrishnan commented on HIVE-9736:


[~zshao], sorry I missed your comment. You're right about this not being faster 
today, since the DFSClient currently just loops on the client-side. The 
intention was for the file-list to be pushed down to the name-node, so as to 
avoid the loop. We intended to request that {{DFSClient}} implement this. 
:/

> StorageBasedAuthProvider should batch namenode-calls where possible.
> 
>
> Key: HIVE-9736
> URL: https://issues.apache.org/jira/browse/HIVE-9736
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Security
>Affects Versions: 1.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
>  Labels: TODOC1.2
> Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
> HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch, 
> HIVE-9736.8.patch
>
>
> Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
> have many associated regions. Consider that the user does:
> {code:sql}
> ALTER TABLE my_table DROP PARTITION (dt='20150101');
> {code}
> As things stand now, {{StorageBasedAuthProvider}} will make individual 
> {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
> and authorize each one separately. It'd be faster to batch the calls, and 
> examine multiple FileStatus objects at once.
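
As a rough sketch of the batching idea, Hadoop's {{FileSystem}} already exposes a multi-path {{listStatus(Path[])}} overload (though, per the comment above, the client currently still loops per path; the paths and the {{authorize()}} hook below are hypothetical):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BatchedAuthSketch {
  // Hypothetical authorization hook, standing in for StorageBasedAuthProvider's checks.
  static void authorize(FileStatus status) { /* permission checks here */ }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // One listStatus(Path[]) call for all partition directories being dropped,
    // instead of one listStatus(Path) call per partition.
    Path[] partitionDirs = {
        new Path("/warehouse/my_table/dt=20150101/region=us"),
        new Path("/warehouse/my_table/dt=20150101/region=eu")
    };
    for (FileStatus status : fs.listStatus(partitionDirs)) {
      authorize(status);
    }
  }
}
{code}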



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22590) Revert HIVE-17218 Canonical-ize hostnames for Hive metastore, and HS2 servers as it causes issues with SSL and LB

2019-12-12 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995216#comment-16995216
 ] 

Mithun Radhakrishnan edited comment on HIVE-22590 at 12/13/19 12:01 AM:


My apologies for not responding to this sooner. This sort of failure was 
unintended. I'm supportive of reverting HIVE-17218, if required.

[~pvary], would you happen to know if HIVE-21899 sorts this out?

Edit: I'm not sure it will. With a load-balancer, it wouldn't be enough to 
check if an IP address is returned instead of the hostname. The load-balancer 
would have its own hostname and IP-address.
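
To illustrate the concern (a minimal, hypothetical sketch; {{hs2-lb.example.com}} stands in for the load-balancer's hostname):

{code:java}
import java.net.InetAddress;

public class CanonicalizeSketch {
  public static void main(String[] args) throws Exception {
    // The client is configured with the load-balancer's hostname...
    InetAddress addr = InetAddress.getByName("hs2-lb.example.com");
    // ...but canonical-izing (as HIVE-17218 does) may yield a backend node's
    // name, or a bare IP address.
    String canonical = addr.getCanonicalHostName();
    System.out.println(canonical);
    // The certificate presented on the SSL handshake matches the LB's hostname,
    // not `canonical`, so verification can fail with errors like
    // "No subject alternative names matching IP address ...".
  }
}
{code}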

 


was (Author: mithun):
My apologies for not responding to this sooner. This sort of failure was 
unintended. I'm supportive of reverting HIVE-17218, if required.

[~pvary], would you happen to know if HIVE-21899 sorts this out?

> Revert HIVE-17218 Canonical-ize hostnames for Hive metastore, and HS2 servers 
> as it causes issues with SSL and LB
> -
>
> Key: HIVE-22590
> URL: https://issues.apache.org/jira/browse/HIVE-22590
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> HIVE-17218 causes issues with Load balanced + SSL environments, like this:
> {code}
> java.net.SocketException: Socket is closed 
> at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1532) 
> at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553) 
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71) 
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
> at java.io.FilterOutputStream.close(FilterOutputStream.java:158) 
> at 
> org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
>  
> at org.apache.thrift.transport.TSocket.close(TSocket.java:235) 
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:318) 
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>  
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  
> at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204) 
> at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:169) 
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) 
> at java.sql.DriverManager.getConnection(DriverManager.java:664) 
> at java.sql.DriverManager.getConnection(DriverManager.java:208) 
> at 
> org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:146)
>  
> at 
> org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:211)
>  
> at org.apache.hive.beeline.Commands.connect(Commands.java:1496) 
> at org.apache.hive.beeline.Commands.connect(Commands.java:1391) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:498) 
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
>  
> at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1135) 
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1174) 
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010) 
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:922) 
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518) 
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:498) 
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
> Unknown HS2 problem when communicating with Thrift server. 
> Error: Could not open client transport with JDBC Uri: jdbc:hive2://<removed for privacy>: GSS initiate failed 
> Also, could not send response: 

[jira] [Commented] (HIVE-22590) Revert HIVE-17218 Canonical-ize hostnames for Hive metastore, and HS2 servers as it causes issues with SSL and LB

2019-12-12 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995216#comment-16995216
 ] 

Mithun Radhakrishnan commented on HIVE-22590:
-

My apologies for not responding to this sooner. This sort of failure was 
unintended. I'm supportive of reverting HIVE-17218, if required.

[~pvary], would you happen to know if HIVE-21899 sorts this out?

> Revert HIVE-17218 Canonical-ize hostnames for Hive metastore, and HS2 servers 
> as it causes issues with SSL and LB
> -
>
> Key: HIVE-22590
> URL: https://issues.apache.org/jira/browse/HIVE-22590
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> HIVE-17218 causes issues with Load balanced + SSL environments, like this:
> {code}
> java.net.SocketException: Socket is closed 
> at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1532) 
> at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553) 
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71) 
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
> at java.io.FilterOutputStream.close(FilterOutputStream.java:158) 
> at 
> org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
>  
> at org.apache.thrift.transport.TSocket.close(TSocket.java:235) 
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:318) 
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>  
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  
> at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204) 
> at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:169) 
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) 
> at java.sql.DriverManager.getConnection(DriverManager.java:664) 
> at java.sql.DriverManager.getConnection(DriverManager.java:208) 
> at 
> org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:146)
>  
> at 
> org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:211)
>  
> at org.apache.hive.beeline.Commands.connect(Commands.java:1496) 
> at org.apache.hive.beeline.Commands.connect(Commands.java:1391) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:498) 
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
>  
> at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1135) 
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1174) 
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010) 
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:922) 
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518) 
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:498) 
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
> Unknown HS2 problem when communicating with Thrift server. 
> Error: Could not open client transport with JDBC Uri: jdbc:hive2://<removed for privacy>: GSS initiate failed 
> Also, could not send response: 
> org.apache.thrift.transport.TTransportException: 
> javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
> No subject alternative names matching IP address 52 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-19261) Avro SerDe's InstanceCache should not be synchronized on retrieve

2019-09-16 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929610#comment-16929610
 ] 

Mithun Radhakrishnan edited comment on HIVE-19261 at 9/16/19 5:35 PM:
--

Hello, [~shengzhixia], et al.

This looks like a good fix, generally. A couple of suggestions:
 # Please consider rephrasing the ending of {{InstanceCache::retrieve()}} 
as follows, for readability:
 ## 
{code:java}
Instance newInstance = makeInstance(hv, seenSchemas);
instance = cache.putIfAbsent(hv, newInstance);
return instance == null? newInstance : instance;{code}
 # Please remove the documentation regarding 
{{ConcurrentHashMap::computeIfAbsent()}}; it won't work here. 
{{makeInstance()}} lands up making reentrant calls to 
{{InstanceCache::retrieve()}}. {{computeIfAbsent()}} requires that the 
{{cache}} not be modified within a single call to the {{putIfAbsent()}} lambda 
(i.e. {{hv -> makeInstance(hv, seenSchemas)}}). As such, this will cause 
{{retrieve()}} to hang on the first reentrant call.
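
For illustration, a minimal standalone sketch of that reentrancy hazard (simplified types, not the SerDe code; the colliding keys are chosen deliberately):

{code:java}
import java.util.concurrent.ConcurrentHashMap;

public class ReentrantComputeIfAbsent {
  static final ConcurrentHashMap<Integer, String> CACHE = new ConcurrentHashMap<>();

  // Stands in for retrieve(); the lambda plays the role of makeInstance(),
  // which re-enters retrieve() for nested schemas.
  static String retrieve(int key) {
    return CACHE.computeIfAbsent(key,
        k -> k == 0 ? "base" : retrieve(k - 16) + "+");
  }

  public static void main(String[] args) {
    // Keys 16 and 0 collide in the default 16-bin table, so the reentrant
    // call hits the bin the outer computeIfAbsent() has reserved: Java 8
    // spins forever; Java 9+ throws IllegalStateException("Recursive update").
    System.out.println(retrieve(16));
  }
}
{code}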


was (Author: mithun):
Hello, [~shengzhixia], et al.

This looks like a good fix, generally. A couple of suggestions:
 # Please consider rephrasing the ending of {{InstanceCache::retrieve()}} 
as follows, for readability:
 ## 
{code:java}
Instance newInstance = makeInstance(hv, seenSchemas);
instance = cache.putIfAbsent(hv, newInstance);
return instance == null? newInstance : instance;{code}

 # Please remove the documentation regarding 
{{ConcurrentHashMap::computeIfAbsent()}}; it won't work here. 
{{makeInstance()}} lands up making reentrant calls to 
{{InstanceCache::retrieve()}}. {{computeIfAbsent()}} requires that the 
{{cache}} not be modified within a single call to the {{putIfAbsent()}} lambda 
(i.e. {{hv -> makeInstance(hv, seenSchemas)}}). As such, this will cause 
{{retrieve()}} to hang on the first reentrant call.

> Avro SerDe's InstanceCache should not be synchronized on retrieve
> -
>
> Key: HIVE-19261
> URL: https://issues.apache.org/jira/browse/HIVE-19261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Fangshi Li
>Assignee: Fangshi Li
>Priority: Major
> Attachments: HIVE-19261.1.patch
>
>
> In HIVE-16175, upstream made a patch to fix the thread safety issue in 
> AvroSerDe's InstanceCache. This fix made the retrieve method in InstanceCache 
> synchronized. While it should make InstanceCache thread-safe, making retrieve 
> synchronized for the cache can be expensive in highly concurrent environment 
> like Spark, as multiple threads need to be synchronized on entering the 
> entire retrieve method.
> We are proposing another way to fix this thread safety issue by making the 
> underlying map of InstanceCache as ConcurrentHashMap. Ideally, we can use 
> atomic computeIfAbsent in the retrieve method to avoid synchronizing the 
> entire method.
> While computeIfAbsent is only available on java 8 and java 7 is still 
> supported in Hive,
> we use a pattern to simulate the behavior of computeIfAbsent. In the future, 
> we should move to computeIfAbsent when Hive requires java 8.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HIVE-19261) Avro SerDe's InstanceCache should not be synchronized on retrieve

2019-09-13 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929610#comment-16929610
 ] 

Mithun Radhakrishnan commented on HIVE-19261:
-

Hello, [~shengzhixia], et al.

This looks like a good fix, generally. A couple of suggestions:
 # Please consider rephrasing the ending of {{InstanceCache::retrieve()}} 
as follows:
 ## {code:java}
Instance newInstance = makeInstance(hv, seenSchemas);
instance = cache.putIfAbsent(hv, newInstance);
return instance == null? newInstance : instance;{code}
 # Please remove the documentation regarding 
{{ConcurrentHashMap::computeIfAbsent()}}; it won't work here. 
{{makeInstance()}} lands up making reentrant calls to 
{{InstanceCache::retrieve()}}. {{computeIfAbsent()}} requires that the 
{{cache}} not be modified within a single call to the {{putIfAbsent()}} lambda 
(i.e. {{hv -> makeInstance(hv, seenSchemas)}}). As such, this will cause 
{{retrieve()}} to hang on the first reentrant call.

> Avro SerDe's InstanceCache should not be synchronized on retrieve
> -
>
> Key: HIVE-19261
> URL: https://issues.apache.org/jira/browse/HIVE-19261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Fangshi Li
>Assignee: Fangshi Li
>Priority: Major
> Attachments: HIVE-19261.1.patch
>
>
> In HIVE-16175, upstream made a patch to fix the thread safety issue in 
> AvroSerDe's InstanceCache. This fix made the retrieve method in InstanceCache 
> synchronized. While it should make InstanceCache thread-safe, making retrieve 
> synchronized for the cache can be expensive in highly concurrent environment 
> like Spark, as multiple threads need to be synchronized on entering the 
> entire retrieve method.
> We are proposing another way to fix this thread safety issue by making the 
> underlying map of InstanceCache as ConcurrentHashMap. Ideally, we can use 
> atomic computeIfAbsent in the retrieve method to avoid synchronizing the 
> entire method.
> While computeIfAbsent is only available on java 8 and java 7 is still 
> supported in Hive,
> we use a pattern to simulate the behavior of computeIfAbsent. In the future, 
> we should move to computeIfAbsent when Hive requires java 8.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (HIVE-19261) Avro SerDe's InstanceCache should not be synchronized on retrieve

2019-09-13 Thread Mithun Radhakrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929610#comment-16929610
 ] 

Mithun Radhakrishnan edited comment on HIVE-19261 at 9/13/19 11:38 PM:
---

Hello, [~shengzhixia], et al.

This looks like a good fix, generally. A couple of suggestions:
 # Please consider rephrasing the ending of {{InstanceCache::retrieve()}} 
as follows, for readability:
 ## 
{code:java}
Instance newInstance = makeInstance(hv, seenSchemas);
instance = cache.putIfAbsent(hv, newInstance);
return instance == null? newInstance : instance;{code}

 # Please remove the documentation regarding 
{{ConcurrentHashMap::computeIfAbsent()}}; it won't work here. 
{{makeInstance()}} lands up making reentrant calls to 
{{InstanceCache::retrieve()}}. {{computeIfAbsent()}} requires that the 
{{cache}} not be modified within a single call to the {{putIfAbsent()}} lambda 
(i.e. {{hv -> makeInstance(hv, seenSchemas)}}). As such, this will cause 
{{retrieve()}} to hang on the first reentrant call.


was (Author: mithun):
Hello, [~shengzhixia], et al.

This looks like a good fix, generally. A couple of suggestions:
 # Please consider rephrasing the ending of {{InstanceCache::retrieve()}} 
as follows:
 ## {code:java}
Instance newInstance = makeInstance(hv, seenSchemas);
instance = cache.putIfAbsent(hv, newInstance);
return instance == null? newInstance : instance;{code}
 # Please remove the documentation regarding 
{{ConcurrentHashMap::computeIfAbsent()}}; it won't work here. 
{{makeInstance()}} lands up making reentrant calls to 
{{InstanceCache::retrieve()}}. {{computeIfAbsent()}} requires that the 
{{cache}} not be modified within a single call to the {{putIfAbsent()}} lambda 
(i.e. {{hv -> makeInstance(hv, seenSchemas)}}). As such, this will cause 
{{retrieve()}} to hang on the first reentrant call.

> Avro SerDe's InstanceCache should not be synchronized on retrieve
> -
>
> Key: HIVE-19261
> URL: https://issues.apache.org/jira/browse/HIVE-19261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Fangshi Li
>Assignee: Fangshi Li
>Priority: Major
> Attachments: HIVE-19261.1.patch
>
>
> In HIVE-16175, upstream made a patch to fix the thread safety issue in 
> AvroSerDe's InstanceCache. This fix made the retrieve method in InstanceCache 
> synchronized. While it should make InstanceCache thread-safe, making retrieve 
> synchronized for the cache can be expensive in highly concurrent environment 
> like Spark, as multiple threads need to be synchronized on entering the 
> entire retrieve method.
> We are proposing another way to fix this thread safety issue by making the 
> underlying map of InstanceCache as ConcurrentHashMap. Ideally, we can use 
> atomic computeIfAbsent in the retrieve method to avoid synchronizing the 
> entire method.
> While computeIfAbsent is only available on java 8 and java 7 is still 
> supported in Hive,
> we use a pattern to simulate the behavior of computeIfAbsent. In the future, 
> we should move to computeIfAbsent when Hive requires java 8.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HIVE-21969) Beeline doesnt support HiveQL

2019-07-08 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880693#comment-16880693
 ] 

Mithun Radhakrishnan commented on HIVE-21969:
-

{quote}I can provide repro steps if they aren't obvious
{quote}
Please do. HiveQL is rather the point of Beeline.

> Beeline doesnt support HiveQL
> -
>
> Key: HIVE-21969
> URL: https://issues.apache.org/jira/browse/HIVE-21969
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.3.5
>Reporter: Valeriy Trofimov
>Priority: Major
>
> Beeline doesn't support HiveQL. I can provide repro steps if they aren't 
> obvious - run a HiveQL query in Beeline, watch it fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-20 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868949#comment-16868949
 ] 

Mithun Radhakrishnan commented on HIVE-21897:
-

bq. Mithun Radhakrishnan just for sure I've inserted two rows, one with dt=1, 
one with dt=2, and checked the files in HDFS. They are both ORC files.

Pardon me, but this does not sound right. How exactly did you insert values 
into {{dt=1}} and {{dt=2}}? At what point did you write to each of the 
partitions? If you {{INSERT OVERWRITE}} *after* the partitions were created, 
then I can understand how you see what you see. But, consider this sequence:

{code:sql}
-- Create the table.
CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY (dt STRING) 
STORED AS TEXTFILE;
-- 1.
INSERT OVERWRITE TABLE foobar PARTITION( dt='1' ) VALUES ( "foo1", "value1" ); 
-- SerDe == LazySimpleSerDe.
-- Describe the partition, to confirm.
DESC FORMATTED foobar PARTITION (dt='1');

-- Alter format.
ALTER TABLE foobar SET FILEFORMAT ORCFILE; -- (No CASCADE)

-- 2.
INSERT OVERWRITE TABLE foobar PARTITION( dt='2' ) VALUES ( "foo2", "value2" ); 
-- SerDe == OrcSerDe.
-- Describe the partition, to confirm.
DESC FORMATTED foobar PARTITION (dt='2');
{code}

In this case, if {{dt='1'}} doesn't retain its SerDe setting, it will be 
rendered unreadable after the table-format is changed. Please correct me if I'm 
wrong.

> Setting serde / serde properties for partitions
> ---
>
> Key: HIVE-21897
> URL: https://issues.apache.org/jira/browse/HIVE-21897
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
>
> According to 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties]
>  the SerDe and the SerDe properties can be set for a partition too, so
>  
> {code:java}
> ALTER TABLE table PARTITION (partition_col='partition_value') SET SERDE 
> 'serde.class.name';{code}
> is a valid statement. In fact, it is not rejected, but it does nothing at 
> all: the execution succeeds, yet everything remains the same. The same is 
> true for setting the SerDe properties:
> {code:java}
> ALTER TABLE table PARTITION (partition_col='partition_value') SET 
> SERDEPROPERTIES ('property_name'='property_value');{code}
> is also a valid statement, and also does nothing.
> I suggest modifying the parser to reject these statements. SerDe is for a 
> table, and not for a partition.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-19 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868092#comment-16868092
 ] 

Mithun Radhakrishnan commented on HIVE-21897:
-

At the risk of muddying the waters, I'd consider {{AvroSerDe}}, which relies 
on the table/serde settings for {{"avro.schema.url"}} and 
{{"avro.schema.literal"}}.
When an Avro table's schema changes, old partitions might link to an older 
schema-literal SerDe-parameter value than newer partitions do.

I could be wrong, but we might want to reevaluate the assumption that SerDe 
settings should apply uniformly across all partitions in a table.

> Setting serde / serde properties for partitions
> ---
>
> Key: HIVE-21897
> URL: https://issues.apache.org/jira/browse/HIVE-21897
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Ashutosh Chauhan
>Priority: Major
> Fix For: 4.0.0
>
>
> According to 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties]
>  the SerDe and the SerDe properties can be set for a partition too, so
>  
> {code:java}
> ALTER TABLE table PARTITION (partition_col='partition_value') SET SERDE 
> 'serde.class.name';{code}
> is a valid statement. In fact, it is not rejected, but it does nothing at 
> all: the execution succeeds, yet everything remains the same. The same is 
> true for setting the SerDe properties:
> {code:java}
> ALTER TABLE table PARTITION (partition_col='partition_value') SET 
> SERDEPROPERTIES ('property_name'='property_value');{code}
> is also a valid statement, and also does nothing.
> I suggest modifying the parser to reject these statements. SerDe is for a 
> table, and not for a partition.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-19 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868085#comment-16868085
 ] 

Mithun Radhakrishnan commented on HIVE-21897:
-

bq. SerDe is for a table, and not for a partition.

Pardon me, but wouldn't a SerDe be exercised per partition?

{code:sql}
CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY (dt STRING) 
STORED AS TEXTFILE;
ALTER TABLE foobar ADD PARTITION ( dt='1' ); -- SerDe == LazySimpleSerDe.
ALTER TABLE foobar SET FILEFORMAT ORCFILE;
ALTER TABLE foobar ADD PARTITION ( dt='2' ); -- SerDe == OrcSerDe.
{code}

{{foobar(dt='1')}} should use {{LazySimpleSerDe}}, while {{foobar(dt='2')}} 
would use {{OrcSerDe}}, when each is read.

> Setting serde / serde properties for partitions
> ---
>
> Key: HIVE-21897
> URL: https://issues.apache.org/jira/browse/HIVE-21897
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Ashutosh Chauhan
>Priority: Major
> Fix For: 4.0.0
>
>
> According to 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties]
>  the SerDe and the SerDe properties can be set for a partition too, so
>  
> {code:java}
> ALTER TABLE table PARTITION (partition_col='partition_value') SET SERDE 
> 'serde.class.name';{code}
> is a valid statement. In fact, it is not rejected, but it does nothing at 
> all: the execution succeeds, yet everything remains the same. The same is 
> true for setting the SerDe properties:
> {code:java}
> ALTER TABLE table PARTITION (partition_col='partition_value') SET 
> SERDEPROPERTIES ('property_name'='property_value');{code}
> is also a valid statement, and also does nothing.
> I suggest modifying the parser to reject these statements. SerDe is for a 
> table, and not for a partition.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21877) Change HCatTableInfo to not be transient in PartInfo

2019-06-14 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864458#comment-16864458
 ] 

Mithun Radhakrishnan commented on HIVE-21877:
-

No worries, mate. Cheers.

> Change HCatTableInfo to not be transient in PartInfo
> 
>
> Key: HIVE-21877
> URL: https://issues.apache.org/jira/browse/HIVE-21877
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ankit Jhalaria
>Assignee: Ankit Jhalaria
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since HCatTableInfo is serializable, removing the transient annotation from 
> it. We were running into NPE during serialization while using HCatalogIO with 
> Beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21877) Change HCatTableInfo to not be transient in PartInfo

2019-06-14 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864420#comment-16864420
 ] 

Mithun Radhakrishnan commented on HIVE-21877:
-

Pasting your question from the PR here:

{quote}
While using Hcatalog with Apache Beam, we ran into an issue with HCatTableInfo 
being null during serialization. I don't see a reason why it should be 
transient. However, there might be use-cases that I may not be aware of and 
might require it to be transient. Would love to hear some feedback regardless.
{quote}

This has to do with HIVE-9845. It would not be a good idea to make 
{{HCatTableInfo}} non-transient. Doing so will make Pig/HCatLoader, as well as 
{{HCatInputFormat}}, inefficient for large partition sets.
{{HCatTableInfo}} contains table-information that is static across all 
partitions within a partition-set for a given table. {{PartInfo}} is the 
variable part. Serializing the table-info multiple times for a partition set 
increases the split-meta-info for a Hadoop job to unreasonable lengths.

I would advise perusing the HCat code to see how {{HCatTableInfo}} is restored, 
post serialization.
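
For what it's worth, a minimal sketch of the pattern being described (names simplified; this is not the exact HCatalog code):

{code:java}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.List;

class TableInfoSketch implements Serializable { /* table-wide, static metadata */ }

class PartInfoSketch implements Serializable {
  // Transient, so the (identical) table info is not serialized once per partition.
  private transient TableInfoSketch tableInfo;

  void restoreTableInfo(TableInfoSketch shared) {
    this.tableInfo = shared;
  }
}

class JobInfoSketch implements Serializable {
  TableInfoSketch tableInfo;        // serialized exactly once per job
  List<PartInfoSketch> partitions;  // each entry omits the table info

  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    // Re-link the shared table info into every partition after deserialization.
    for (PartInfoSketch p : partitions) {
      p.restoreTableInfo(tableInfo);
    }
  }
}
{code}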

> Change HCatTableInfo to not be transient in PartInfo
> 
>
> Key: HIVE-21877
> URL: https://issues.apache.org/jira/browse/HIVE-21877
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ankit Jhalaria
>Assignee: Ankit Jhalaria
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since HCatTableInfo is serializable, removing the transient annotation from 
> it. We were running into NPE during serialization while using HCatalogIO with 
> Beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2019-05-16 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841538#comment-16841538
 ] 

Mithun Radhakrishnan commented on HIVE-17794:
-

This patch will need rebasing. I'm afraid it's been ages since I posted this, 
so it *could* take some doing.

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-17794.02.patch, HIVE-17794.03.patch, 
> HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks with the following trace:
> {noformat}
> TaskAttempt 1 failed, info=
>  Error: Failure while running 
> task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: kite_composites_with_segments: Local Rearrange
>  tuple
> {chararray}(false) - scope-555-> scope-974 Operator Key: scope-555): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup: New For Each(false,false)
>  bag
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:127)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup: New For Each(false,false)
>  bag
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
> ... 17 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> 

[jira] [Commented] (HIVE-21000) Upgrade thrift to at least 0.10.0

2019-02-19 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772356#comment-16772356
 ] 

Mithun Radhakrishnan commented on HIVE-21000:
-

[~isuller], [~kgyrtkirk], [~vihangk1], thanks so much for working on this one.

There's a chance that after porting to Thrift 0.12, we might run into noisy 
logging, stemming from load-balancer health-checks. I've documented the 
failures that I ran into (Hive 1.2.x + Thrift 0.12) at THRIFT-4805:

{noformat}
org.apache.thrift.transport.TTransportException
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at 
org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
at 
org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
at 
org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
at 
org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
at 
org.apache.hive.service.cli.thrift.ThriftCLIMetricsProcessor.process(ThriftCLIMetricsProcessor.java:63)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:777)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:773)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:773)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{noformat}

With Thrift 0.9.x, the {{TSaslTransportException}} was trapped and handled 
separately.
With Thrift 0.12.x, it's being caught and logged as a {{TException}}. (It 
doesn't seem to be wrapped in a {{RuntimeException}}). You're likely to see the 
above fill up the metastore server logs.

I have also posted a tentative fix (which we're currently testing):
https://github.com/apache/thrift/pull/1746

Just a heads-up.



> Upgrade thrift to at least 0.10.0
> -
>
> Key: HIVE-21000
> URL: https://issues.apache.org/jira/browse/HIVE-21000
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Ivan Suller
>Priority: Major
> Attachments: HIVE-21000.01.patch, HIVE-21000.02.patch, 
> HIVE-21000.03.patch, HIVE-21000.04.patch, HIVE-21000.05.patch, 
> HIVE-21000.06.patch, sampler_before.png
>
>
> I was looking into some compile profiles for tables with lots of columns; and 
> it turned out that [thrift 0.9.3 is allocating a 
> List|https://github.com/apache/hive/blob/8e30b5e029570407d8a1db67d322a95db705750e/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FieldSchema.java#L348]
>  during every hashcode calculation; but luckily THRIFT-2877 is improving on 
> that - so I propose to upgrade to at least 0.10.0 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-03-12 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395833#comment-16395833
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

Ah, I see what you've done there! +1.

I applied your changes in a local setup, and sampled the failing tests. It 
looks like you've sorted them out very well.

Thank you for working on this, [~aihuaxu]!

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Aihua Xu
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch, 
> HIVE-14792.4.patch, HIVE-14792.5.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.
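
A rough sketch of the approach being proposed, under the assumption that planning-time code can reach the schema file (the property names follow {{AvroSerdeUtils}}; the surrounding plumbing is hypothetical):

{code:java}
import java.io.InputStream;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class InlineAvroSchema {
  // Read avro.schema.url once, at query-planning time, and inline the result
  // as avro.schema.literal so each mapper's SerDe never opens the remote file.
  static void inlineSchema(Configuration conf, Properties tableProps) throws Exception {
    String url = tableProps.getProperty("avro.schema.url");
    if (url == null) {
      return; // nothing to inline
    }
    Path schemaPath = new Path(url);
    try (InputStream in = schemaPath.getFileSystem(conf).open(schemaPath)) {
      Schema schema = new Schema.Parser().parse(in);
      // Set only on the in-memory table properties / job conf,
      // not persisted back to the metastore.
      tableProps.setProperty("avro.schema.literal", schema.toString());
    }
  }
}
{code}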



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-03-09 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393380#comment-16393380
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

Ok, this patch does not look ready to go in. :/ I was hoping these were 
transient failures, but it looks like [^HIVE-14792.3.patch] is still failing 
hard, based on the test-results above.

[^HIVE-14792.4.patch] happens to run, because the optimization is not enabled.

This is back to where it was when I ran out of time. :[ I'll update if I can 
take a crack at this.

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Aihua Xu
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch, 
> HIVE-14792.4.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-03-08 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392017#comment-16392017
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

bq. since you have all the context.

For what it's worth, +1. Thank you for going above and beyond here, [~aihuaxu]!

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Aihua Xu
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch, 
> HIVE-14792.4.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-03-08 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392004#comment-16392004
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

Hey, [~aihuaxu]. Thanks for verifying that the tests are not related to my 
changes, and for restoring the default.

Am I allowed to +1 this change? ;]

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Aihua Xu
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch, 
> HIVE-14792.4.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-03-07 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390579#comment-16390579
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

I've still not been able to devote time to this. Sorry.

There's no point in my delaying you on this, though. The latest patch on this 
JIRA has the fix; the only remaining work is fixing the failing tests, and 
disabling the feature by default.

I'd be obliged if someone else picked this up. I'll update here if I'm able to 
make any progress before then.

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-03-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Attachment: HIVE-18608.2-branch-2.2.patch

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch, 
> HIVE-18608.2-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.
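
As a rough sketch of the writer-side check (assuming the table-property name 
from the initial patch, {{'orc.skip.dictionary.for.columns'}}, which was later 
renamed during review; the column names are made up):

{code:java}
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class DictionarySkipList {

  // Parse the comma-separated column list out of the table-property.
  static Set<String> skipDictionaryColumns(Properties tableProps) {
    Set<String> columns = new HashSet<>();
    for (String col : tableProps
        .getProperty("orc.skip.dictionary.for.columns", "").split(",")) {
      if (!col.trim().isEmpty()) {
        columns.add(col.trim().toLowerCase());
      }
    }
    return columns;
  }

  // The writer consults this up front, instead of sampling the first
  // row-stride to decide on DICTIONARY_V2 for the column.
  static boolean useDictionaryEncoding(String column, Properties tableProps) {
    return !skipDictionaryColumns(tableProps).contains(column.toLowerCase());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("orc.skip.dictionary.for.columns", "email_body, guid");
    System.out.println(useDictionaryEncoding("guid", props));      // false
    System.out.println(useDictionaryEncoding("sent_date", props)); // true
  }
}
{code}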



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-03-07 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390030#comment-16390030
 ] 

Mithun Radhakrishnan commented on HIVE-18608:
-

Hey, [~owen.omalley]. I've renamed that property, as you've suggested.

Also, thanks for ORC-308. :]

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch, 
> HIVE-18608.2-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Status: Patch Available  (was: Open)

(Submitting for tests.)

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349660#comment-16349660
 ] 

Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM:
-

I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}.

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.


was (Author: mithun):
I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}.

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth. E.g. 
{{myInfoArray.__elem__.emailBody}}.

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349660#comment-16349660
 ] 

Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM:
-

I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}).

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.


was (Author: mithun):
I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}.

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349660#comment-16349660
 ] 

Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM:
-

I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}).

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the {{STRUCT}}, recursively).

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.


was (Author: mithun):
I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}).

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349660#comment-16349660
 ] 

Mithun Radhakrishnan commented on HIVE-18608:
-

I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}.

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth. E.g. 
{{myInfoArray.__elem__.emailBody}}.

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Attachment: (was: HIVE-18608.1-branch-2.2.patch)

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Attachment: HIVE-18608.1-branch-2.2.patch

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Attachment: HIVE-18608.1-branch-2.2.patch

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-18608:
---


> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349374#comment-16349374
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

Terribly sorry for the delay, [~aihuaxu]. Yes, of course. The default should 
remain false. I'd changed it to true to see if the tests were affected by it.

This patch seems to cause test failures on trunk. I'll sort that out and post 
an update.

Thanks for reviewing, [~aihuaxu].

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2018-01-28 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342873#comment-16342873
 ] 

Mithun Radhakrishnan commented on HIVE-8472:


bq. By the way, Fix Version/s should include 2.2.1.
Thanks for pointing that out, [~leftylev]. Corrected.

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, 
> HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.
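
For illustration only (the database name, warehouse path, and host below are 
made up), the requested statement as it might be issued over Hive JDBC:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AlterDatabaseLocation {
  public static void main(String[] args) throws Exception {
    // Assumes the Hive JDBC driver is on the classpath.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://hs2-host:10000/default");
         Statement stmt = conn.createStatement()) {
      // Changes the database's default location for tables created hereafter;
      // pre-existing tables and partitions keep their current locations.
      stmt.execute(
          "ALTER DATABASE mydb SET LOCATION 'hdfs:///warehouse/mydb_new'");
    }
  }
}
{code}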



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2018-01-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Fix Version/s: 2.2.1

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, 
> HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18376) Update committer-list

2018-01-04 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312106#comment-16312106
 ] 

Mithun Radhakrishnan commented on HIVE-18376:
-

+1. Welcome to the fold! :)

> Update committer-list
> -
>
> Key: HIVE-18376
> URL: https://issues.apache.org/jira/browse/HIVE-18376
> Project: Hive
>  Issue Type: Bug
>Reporter: Chris Drome
>Assignee: Chris Drome
>Priority: Trivial
> Attachments: HIVE-18376.1.patch
>
>
> Adding new entry to committer-list:
> {noformat}
> +
> +cdrome 
> +Chris Drome 
> +<a href="https://www.oath.com/">Oath</a> 
> +
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-18374) Update committer-list

2018-01-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan resolved HIVE-18374.
-
Resolution: Fixed

Thanks, [~thejas]! :] Committed.

> Update committer-list
> -
>
> Key: HIVE-18374
> URL: https://issues.apache.org/jira/browse/HIVE-18374
> Project: Hive
>  Issue Type: Task
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Trivial
> Attachments: HIVE-18374.1.patch
>
>
> I'm afraid I need to make a trivial change to my organization affiliation:
> {code:xml}
> 
> mithun
> Mithun Radhakrishnan
> <a href="https://oath.com/">Oath</a>
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18374) Update committer-list

2018-01-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18374:

Attachment: HIVE-18374.1.patch

[~thejas], could I bother you for this one?

> Update committer-list
> -
>
> Key: HIVE-18374
> URL: https://issues.apache.org/jira/browse/HIVE-18374
> Project: Hive
>  Issue Type: Task
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Trivial
> Attachments: HIVE-18374.1.patch
>
>
> I'm afraid I need to make a trivial change to my organization affiliation:
> {code:xml}
> 
> mithun
> Mithun Radhakrishnan
> <a href="https://oath.com/">Oath</a>
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18374) Update committer-list

2018-01-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-18374:
---


> Update committer-list
> -
>
> Key: HIVE-18374
> URL: https://issues.apache.org/jira/browse/HIVE-18374
> Project: Hive
>  Issue Type: Task
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Trivial
>
> I'm afraid I need to make a trivial change to my organization affiliation:
> {code:xml}
> 
> mithun
> Mithun Radhakrishnan
> <a href="https://oath.com/">Oath</a>
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18363) Not able to add jar from local path other than HIVE_HOME' path using beeline

2018-01-04 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311769#comment-16311769
 ] 

Mithun Radhakrishnan commented on HIVE-18363:
-

[~anuradha_gadge], I'm guessing that 
{{file:///home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar}} exists on 
the client machine (where you're running beeline), and not on the HS2 server. 
Is that right? 

Please realize that beeline won't transmit your jar from the client machine's 
local FS to the HS2 server (where your query is actually running). You're 
better off using an HDFS path for your jar.
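
For instance (the host and HDFS path below are made up), one might push the 
jar to HDFS first and then add it by its HDFS URI, whether from beeline or 
over JDBC; a sketch of the latter:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddJarByHdfsUri {
  public static void main(String[] args) throws Exception {
    // Assumes the Hive JDBC driver is on the classpath.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://hs2-host:10000/default", "anuradha", "");
         Statement stmt = conn.createStatement()) {
      // The jar must live somewhere HS2 itself can read (e.g. HDFS), since
      // beeline/JDBC will not ship a client-local file to the server.
      stmt.execute(
          "ADD JAR hdfs:///user/anuradha/udfs/hivexmlserde-1.0.5.3.jar");
    }
  }
}
{code}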



> Not able to add jar from local path other than HIVE_HOME' path using beeline
> 
>
> Key: HIVE-18363
> URL: https://issues.apache.org/jira/browse/HIVE-18363
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2, Thrift API
>Affects Versions: 1.2.1
>Reporter: Anuradha Gadge
>Priority: Critical
>
> When I tried to add a jar from a local path to Hive using beeline, it failed. 
> When I put the jar in some directory under HIVE_HOME, I could add it using 
> that local path. But for paths outside HIVE_HOME, it did not work.
> Note: With hive-cli, I could add the jar from any local path.
> Command I tried : add jar 
> /home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar;
> Stack Trace for Hiveserver2:
> 2018-01-03 18:43:03,086 DEBUG [HiveServer2-Handler-Pool: Thread-41]: 
> transport.TSaslTransport (TSaslTransport.java:readFrame(459)) - SERVER: 
> reading data length: 173
> 2018-01-03 18:43:03,087 INFO  [HiveServer2-Handler-Pool: Thread-41]: 
> session.HiveSessionImpl (HiveSessionImpl.java:acquire(304)) - We are setting 
> the hadoop caller context to ea0f9b24-89ff-4225-9948-6112bbb29189 for thread 
> HiveServer2-Handler-Pool: Thread-41
> 2018-01-03 18:43:03,087 INFO  [HiveServer2-Handler-Pool: Thread-41]: 
> operation.Operation (HiveCommandOperation.java:setupSessionIO(69)) - Putting 
> temp output to file 
> /tmp/hive/ea0f9b24-89ff-4225-9948-6112bbb291891944312414426376214.pipeout
> 2018-01-03 18:43:03,088 DEBUG [HiveServer2-Handler-Pool: Thread-41]: 
> parse.VariableSubstitution (VariableSubstitution.java:substitute(53)) - 
> Substitution is on: jar 
> /home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar
> 2018-01-03 18:43:03,089 ERROR [HiveServer2-Handler-Pool: Thread-41]: 
> SessionState (SessionState.java:printError(932)) - 
> /home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar does not exist
> java.lang.IllegalArgumentException: 
> /home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar does not exist
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.validateFiles(SessionState.java:970)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState$ResourceType.preHook(SessionState.java:1074)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState$ResourceType$1.preHook(SessionState.java:1063)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1121)
>   at 
> org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
>   at 
> org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:105)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:419)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:406)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:274)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2018-01-03 18:43:03,089 INFO  [HiveServer2-Handler-Pool: Thread-41]: 
> session.HiveSessionImpl 

[jira] [Commented] (HIVE-18363) Not able to add jar from local path other than HIVE_HOME' path using beeline

2018-01-03 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310336#comment-16310336
 ] 

Mithun Radhakrishnan commented on HIVE-18363:
-

Wouldn't this depend on what {{fs.default.name}} is set to? With an unqualified 
path literal, HS2 might just be looking on HDFS, not local FS.
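
A quick way to see that resolution in action (the namenode host is made up; 
only hadoop-common is needed):

{code:java}
import java.net.URI;

import org.apache.hadoop.fs.Path;

public class PathResolution {
  public static void main(String[] args) {
    // An unqualified path literal, as passed to ADD JAR.
    Path unqualified =
        new Path("/home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar");

    // Qualified against fs.default.name = hdfs://namenode:8020, the bare
    // literal resolves to an HDFS location, not the client's local FS.
    System.out.println(unqualified.makeQualified(
        URI.create("hdfs://namenode:8020"), new Path("/")));
    // -> hdfs://namenode:8020/home/anuradha/.../hivexmlserde-1.0.5.3.jar
  }
}
{code}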

> Not able to add jar from local path other than HIVE_HOME' path using beeline
> 
>
> Key: HIVE-18363
> URL: https://issues.apache.org/jira/browse/HIVE-18363
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2, Thrift API
>Affects Versions: 1.2.1
>Reporter: Anuradha Gadge
>Priority: Critical
>
> When I tried to add a jar from a local path to Hive using beeline, it failed. 
> When I put the jar in some directory under HIVE_HOME, I could add it using 
> that local path. But for paths outside HIVE_HOME, it did not work.
> Note: With hive-cli, I could add the jar from any local path.
> Command I tried : add jar 
> /home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar;
> Stack Trace for Hiveserver2:
> 2018-01-03 18:43:03,086 DEBUG [HiveServer2-Handler-Pool: Thread-41]: 
> transport.TSaslTransport (TSaslTransport.java:readFrame(459)) - SERVER: 
> reading data length: 173
> 2018-01-03 18:43:03,087 INFO  [HiveServer2-Handler-Pool: Thread-41]: 
> session.HiveSessionImpl (HiveSessionImpl.java:acquire(304)) - We are setting 
> the hadoop caller context to ea0f9b24-89ff-4225-9948-6112bbb29189 for thread 
> HiveServer2-Handler-Pool: Thread-41
> 2018-01-03 18:43:03,087 INFO  [HiveServer2-Handler-Pool: Thread-41]: 
> operation.Operation (HiveCommandOperation.java:setupSessionIO(69)) - Putting 
> temp output to file 
> /tmp/hive/ea0f9b24-89ff-4225-9948-6112bbb291891944312414426376214.pipeout
> 2018-01-03 18:43:03,088 DEBUG [HiveServer2-Handler-Pool: Thread-41]: 
> parse.VariableSubstitution (VariableSubstitution.java:substitute(53)) - 
> Substitution is on: jar 
> /home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar
> 2018-01-03 18:43:03,089 ERROR [HiveServer2-Handler-Pool: Thread-41]: 
> SessionState (SessionState.java:printError(932)) - 
> /home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar does not exist
> java.lang.IllegalArgumentException: 
> /home/anuradha/shareinsights/udfs/hivexmlserde-1.0.5.3.jar does not exist
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.validateFiles(SessionState.java:970)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState$ResourceType.preHook(SessionState.java:1074)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState$ResourceType$1.preHook(SessionState.java:1063)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1121)
>   at 
> org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
>   at 
> org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:105)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:419)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:406)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:274)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2018-01-03 18:43:03,089 INFO  [HiveServer2-Handler-Pool: Thread-41]: 
> session.HiveSessionImpl (HiveSessionImpl.java:release(318)) - We are 
> resetting the hadoop caller context for thread HiveServer2-Handler-Pool: 
> Thread-41
> 2018-01-03 18:43:03,089 WARN  [HiveServer2-Handler-Pool: Thread-41]: 
> thrift.ThriftCLIService (ThriftCLIService.java:ExecuteStatement(492)) - 

[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2017-12-18 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14792:

Status: Patch Available  (was: Reopened)

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2017-12-18 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reopened HIVE-14792:
-

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2017-12-18 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14792:

Attachment: HIVE-14792.3.patch

Addendum. (This adds the {{avro.schema.literal}} value to the 
{{TBLPROPERTIES}}, instead of {{SERDEPROPERTIES}}.)

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2017-12-18 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295744#comment-16295744
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

[~aihuaxu], sorry for the bother, but it looks like my fix here is not 
complete. 
On enabling {{hive.optimize.update.table.properties.from.serde}}, one sees 
errors when prefetching Avro schemas, such as the following:
{noformat}
Caused by: java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:137) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) ~[?:?]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_144]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144]
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:110) 
~[hadoop-common-3.0.0-beta1.jar:?]
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79) 
~[hadoop-common-3.0.0-beta1.jar:?]
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137) 
~[hadoop-common-3.0.0-beta1.jar:?]
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) 
~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?]
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) ~[?:?]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_144]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144]
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:110) 
~[hadoop-common-3.0.0-beta1.jar:?]
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79) 
~[hadoop-common-3.0.0-beta1.jar:?]
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137) 
~[hadoop-common-3.0.0-beta1.jar:?]
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:456) 
~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?]
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?]
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
 ~[hadoop-mapreduce-client-common-3.0.0-beta1.jar:?]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_144]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_144]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_144]
Caused by: java.lang.RuntimeException: cannot find field number from 
[0:error_error_error_error_error_error_error, 1:cannot_determine_schema, 
2:check, 3:schema, 4:url, 5:and, 6:literal]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:530)
 ~[hive-serde-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:153)
 ~[hive-serde-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1096) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1122)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:75)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:367) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:557) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:509) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:504)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:116) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}
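
The column list in the error above ({{0:error_error_error_error_error_error_error, 1:cannot_determine_schema, ...}}) is the sentinel schema that the Avro SerDe substitutes when it cannot resolve the real one. A guard along these lines (a sketch; the method name is invented) would keep the optimizer from stamping the sentinel into the plan:

{code:java}
// Sketch only: detect the Avro SerDe's "cannot determine schema" sentinel
// before copying serde-derived properties into the table properties.
import java.util.List;

public class BadSchemaGuard {
  // First column of the sentinel schema, as seen in the stack trace above.
  private static final String SENTINEL =
      "error_error_error_error_error_error_error";

  /** True if the column list is the sentinel rather than a real Avro schema. */
  public static boolean isSentinelSchema(List<String> columnNames) {
    return !columnNames.isEmpty() && SENTINEL.equals(columnNames.get(0));
  }
}
{code}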

The reason we're not seeing this failure in regular builds is that 

[jira] [Commented] (HIVE-18198) TablePropertyEnrichmentOptimizer.java is missing the Apache license header

2017-12-15 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293339#comment-16293339
 ] 

Mithun Radhakrishnan commented on HIVE-18198:
-

I have cherry-picked this into the other branches where HIVE-14792 was 
committed. Again, apologies for the negligence.

> TablePropertyEnrichmentOptimizer.java is missing the Apache license header
> --
>
> Key: HIVE-18198
> URL: https://issues.apache.org/jira/browse/HIVE-18198
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-18198.patch
>
>
> This causes warnings in the yetus check:
> {quote}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? 
> /data/hiveptest/working/yetus/ql/src/java/org/apache/hadoop/hive/ql/optimizer/TablePropertyEnrichmentOptimizer.java
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18198) TablePropertyEnrichmentOptimizer.java is missing the Apache license header

2017-12-15 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18198:

Fix Version/s: 2.2.1
   2.4.0

> TablePropertyEnrichmentOptimizer.java is missing the Apache license header
> --
>
> Key: HIVE-18198
> URL: https://issues.apache.org/jira/browse/HIVE-18198
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-18198.patch
>
>
> This causes warnings in the yetus check:
> {quote}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? 
> /data/hiveptest/working/yetus/ql/src/java/org/apache/hadoop/hive/ql/optimizer/TablePropertyEnrichmentOptimizer.java
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18198) TablePropertyEnrichmentOptimizer.java is missing the Apache license header

2017-12-15 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293324#comment-16293324
 ] 

Mithun Radhakrishnan commented on HIVE-18198:
-

Yikes! Sorry for the screw-up. Thank you for fixing this.

> TablePropertyEnrichmentOptimizer.java is missing the Apache license header
> --
>
> Key: HIVE-18198
> URL: https://issues.apache.org/jira/browse/HIVE-18198
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 3.0.0
>
> Attachments: HIVE-18198.patch
>
>
> This causes warnings in the yetus check:
> {quote}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? 
> /data/hiveptest/working/yetus/ql/src/java/org/apache/hadoop/hive/ql/optimizer/TablePropertyEnrichmentOptimizer.java
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14274) When columns are added to structs in a Hive table, HCatLoader breaks.

2017-12-15 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14274:

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

This is a dupe. Closing in favour of HIVE-17794.

> When columns are added to structs in a Hive table, HCatLoader breaks.
> -
>
> Key: HIVE-14274
> URL: https://issues.apache.org/jira/browse/HIVE-14274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14274.1.patch
>
>
> Consider this sequence of table/partition creation and schema evolution:
> {code:sql}
> -- Create table.
> CREATE EXTERNAL TABLE `simple_text` (
> foo STRING,
> bar STRUCT<goo:STRING>
> )
> PARTITIONED BY ( dt STRING )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ':'
> STORED AS TEXTFILE ;
> -- Add partition.
> ALTER TABLE simple_text ADD PARTITION ( dt='0' );
> -- Alter the struct-column to add a new sub-field.
> ALTER TABLE simple_text CHANGE COLUMN bar bar STRUCT<goo:STRING, zoo:STRING>;
> {code}
> The {{dt='0'}} partition's schema indicates 2 fields in {{bar}}. The data can 
> be read using Hive, but not through HCatLoader. The error looks as follows:
> {noformat}
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: data_raw: 
> Store(hdfs://dilithiumblue-nn1.blue.ygrid.yahoo.com:8020/tmp/temp-643668868/tmp-1639945319:org.apache.pig.impl.io.TFileStorage)
>  - scope-1 Operator Key: scope-1): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:160)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
>   ... 16 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6018: 
> Error converting read value to tuple
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
>   at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:63)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:118)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140)
>   ... 17 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
>   at 
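
Reduced to its essence, the failure above is an off-the-end read: the loader walks the table's post-ALTER field list while indexing into the struct as it was written. A toy illustration (the field counts are made up for the example):

{code:java}
// Toy reproduction of the failure mode: the reader trusts the table's evolved
// field count, but the struct on disk was written with fewer members.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EvolvedStructRead {
  public static void main(String[] args) {
    List<Object> structOnDisk =
        new ArrayList<>(Arrays.asList("a", "b")); // written before the ALTER
    int tableFieldCount = 3;                      // after a member was added
    for (int i = 0; i < tableFieldCount; i++) {
      // Throws java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at
      // i == 2, matching the tail of the trace above.
      System.out.println(structOnDisk.get(i));
    }
  }
}
{code}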

[jira] [Commented] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-14 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291470#comment-16291470
 ] 

Mithun Radhakrishnan commented on HIVE-17794:
-

[~owen.omalley], the failing tests don't involve HCatalog at all. The added 
test passes. Does this look acceptable?

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.02.patch, HIVE-17794.03.patch, 
> HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks with the following trace:
> {noformat}
> TaskAttempt 1 failed, info=[Error: Failure while running 
> task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: kite_composites_with_segments: Local Rearrange[tuple]
> {chararray}(false) - scope-555-> scope-974 Operator Key: scope-555): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup: New For Each(false,false)[bag] - scope-548 
> Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter[bag] - scope-522 Operator Key: 
> scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:127)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup: New For Each(false,false)[bag] - 
> scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter[bag] - scope-522 Operator Key: 
> scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
> ... 17 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup_filtered: Filter[bag] - scope-522 
> Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> 
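
The shape of the fix is to reconcile the two schemas at read time: when the reader's target struct is wider than the struct that was written, the trailing members should surface as null rather than as a failed index. A sketch under that assumption, not the actual HCatalog patch:

{code:java}
// Sketch: pad struct members that exist in the (evolved) table schema but were
// never written, instead of indexing past the end of the on-disk struct.
import java.util.ArrayList;
import java.util.List;

public class StructReconciler {
  /** Returns a copy of the written struct, null-padded to the reader's width. */
  public static List<Object> padToWidth(List<Object> written, int readerFieldCount) {
    List<Object> padded = new ArrayList<>(written);
    while (padded.size() < readerFieldCount) {
      padded.add(null); // members added by ALTER TABLE after the data was written
    }
    return padded;
  }
}
{code}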

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: HIVE-17794.03.patch

Added missing Apache license to {{MiniGenericCluster.java}}.

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.02.patch, HIVE-17794.03.patch, 
> HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks.

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: HIVE-17794.02.patch

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.02.patch, HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks.

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Patch Available  (was: Open)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.02.patch, HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks.

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: (was: HIVE-17794.2-branch-2.patch)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks.

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: (was: HIVE-17794.2-branch-2.2.patch)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks.

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: (was: HIVE-17794.2.patch)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks.

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Open  (was: Patch Available)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2-branch-2.2.patch, 
> HIVE-17794.2-branch-2.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks.

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Affects Version/s: 2.4.0

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2-branch-2.2.patch, 
> HIVE-17794.2-branch-2.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks.

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: HIVE-17794.2-branch-2.patch

The tests for {{branch-2}} took some doing. Submitting for tests.

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2-branch-2.patch, 
> HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks with the following trace:
> {noformat}
> TaskAttempt 1 failed, info=
>  Error: Failure while running 
> task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: kite_composites_with_segments: Local Rearrange
>  tuple
> {chararray}(false) - scope-555-> scope-974 Operator Key: scope-555): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup: New For Each(false,false)
>  bag
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:127)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup: New For Each(false,false)
>  bag
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
> ... 17 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> 
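
For context on the failure mode, a small sketch can illustrate the class of bug behind the
{{ERROR 6018: Error converting read value to tuple}} trace above: when struct members are
resolved by position, a reader schema that has gained a trailing member no longer lines up
with older data files. The sketch below is a hypothetical illustration of name-based
resolution, not the actual HIVE-17794 patch; the type strings and the class name are made
up for the example, and only the public {{org.apache.hadoop.hive.serde2.typeinfo}} API is used.

{code:java}
import java.util.List;

import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

/**
 * Hypothetical illustration (not the HIVE-17794 patch): resolve struct
 * members by name rather than by position, so a table schema with an extra
 * trailing member still lines up against data written under the old schema.
 */
public class StructEvolutionSketch {
  public static void main(String[] args) {
    // Old files were written with two members; the table now declares three.
    StructTypeInfo fileType = (StructTypeInfo)
        TypeInfoUtils.getTypeInfoFromTypeString("struct<a:int,b:string>");
    StructTypeInfo tableType = (StructTypeInfo)
        TypeInfoUtils.getTypeInfoFromTypeString("struct<a:int,b:string,c:double>");

    List<String> tableFields = tableType.getAllStructFieldNames();
    List<TypeInfo> tableFieldTypes = tableType.getAllStructFieldTypeInfos();

    for (int i = 0; i < tableFields.size(); ++i) {
      String name = tableFields.get(i);
      int filePos = fileType.getAllStructFieldNames().indexOf(name);
      if (filePos < 0) {
        // Member absent in old data: a name-based reader substitutes null
        // instead of mis-mapping positions and corrupting the tuple.
        System.out.println(name + " (" + tableFieldTypes.get(i) + "): null");
      } else {
        System.out.println(name + ": read from file position " + filePos);
      }
    }
  }
}
{code}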

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Open  (was: Patch Available)

Alas, I spoke too soon. Cancelling patch for {{branch-2}} to resolve conflicts 
with {{TestExtendedAcls}}.

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2-branch-2.patch, 
> HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: HIVE-17794.2-branch-2.2.patch

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2-branch-2.2.patch, 
> HIVE-17794.2-branch-2.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Patch Available  (was: Open)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2-branch-2.patch, 
> HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: HIVE-17794.2-branch-2.patch

Re-submitting for tests, for {{branch-2}}. I'll start with {{branch-2.2}} 
next.

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2-branch-2.patch, 
> HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: (was: HIVE-17794.2-branch-2.patch)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Patch Available  (was: Open)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Open  (was: Patch Available)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Commented] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-08 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284331#comment-16284331
 ] 

Mithun Radhakrishnan commented on HIVE-17794:
-

Hello, [~owen.omalley]. I have added a test (i.e. 
{{TestHCatLoaderStructColumnAddition}}) that fails without the proposed fix.

In order to make the test palatable, I have had to refactor some parts of 
HCatalog. I had initially included these changes in HIVE-11548, but given that 
that patch is starved for attention, I have moved some of those changes here. 
(The changes allow me to invoke a Pig script in my test.)

Once this is checked in, I'll go back and update the patch in HIVE-11548.

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: HIVE-17794.2.patch

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Open  (was: Patch Available)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-12-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Patch Available  (was: Open)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch, HIVE-17794.2.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks. [The quoted stack trace is 
> identical to the first HIVE-17794 message above and is elided here.]

[jira] [Updated] (HIVE-17600) Make OrcFile's "enforceBufferSize" user-settable.

2017-12-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17600:

   Resolution: Fixed
Fix Version/s: 2.2.1
   Status: Resolved  (was: Patch Available)

Committed to {{branch-2.2}}. Thanks for the review, [~owen.omalley]!

> Make OrcFile's "enforceBufferSize" user-settable.
> -
>
> Key: HIVE-17600
> URL: https://issues.apache.org/jira/browse/HIVE-17600
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 2.2.1
>
> Attachments: HIVE-17600.1-branch-2.2.patch
>
>
> This is a duplicate of ORC-238, but it applies to {{branch-2.2}}.
> Compression buffer-sizes in OrcFile are computed at runtime, except when 
> {{enforceBufferSize}} is set. The only snag is that this flag cannot be set 
> by the user.
> When the runtime-computed buffer-sizes turn out to be suboptimal, the user 
> has no way to work around them by setting a custom value.
> I have a patch that we use at Yahoo.
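
To make the knob in question concrete: with the stock writer API, a caller can request a
compression buffer size, but without enforcement the ORC writer may shrink it based on its
runtime estimate; the issue is about letting the user insist on the requested value. Below
is a minimal sketch of requesting a buffer size via the pre-ORC-238 Hive writer API
({{org.apache.hadoop.hive.ql.io.orc}}); the output path and row data are placeholders, and
the sketch deliberately does not name the enforcement flag, since that is what the patch adds.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class OrcBufferSizeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Request a 256 KB compression buffer. Absent enforcement, the writer
    // treats this as an upper bound and may compute a smaller size at runtime.
    Writer writer = OrcFile.createWriter(new Path("/tmp/sketch.orc"),
        OrcFile.writerOptions(conf)
            .inspector(PrimitiveObjectInspectorFactory.javaStringObjectInspector)
            .compress(CompressionKind.ZLIB)
            .bufferSize(256 * 1024));
    writer.addRow("hello, orc");
    writer.close();
  }
}
{code}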



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-12-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17853:

   Resolution: Fixed
Fix Version/s: 2.2.1
   2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch
>
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithun}}).
> We should have a fix for this shortly.
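
The essence of the fix, as a minimal sketch (not the committed patch): 
capture the caller's {{UserGroupInformation}} when the client is built, and 
run any reconnect inside that user's {{doAs()}}, so the effective user 
survives the timeout.

{code:title=ReconnectSketch.java|borderStyle=solid}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ReconnectSketch {
  // Captured once, at construction, while still inside the caller's
  // doAs() context.
  private final UserGroupInformation ugi;

  public ReconnectSketch() throws Exception {
    this.ugi = UserGroupInformation.getCurrentUser();
  }

  void reconnect() throws Exception {
    // Re-open the connection as the effective user (e.g. "mithun"),
    // not the login user (e.g. "oozie").
    ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
      // ... open the new Thrift connection here ...
      return null;
    });
  }
}
{code}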



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-12-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275046#comment-16275046
 ] 

Mithun Radhakrishnan commented on HIVE-17853:
-

Checked into {{master}}, {{branch-2}}, and {{branch-2.2}}. Thank you for the 
fix, [~cdrome]! And thank you, [~vihangk1], for taking the time to review.

> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch
>
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithun}}).
> We should have a fix for this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-12-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275017#comment-16275017
 ] 

Mithun Radhakrishnan commented on HIVE-17853:
-

I'll check this version of the fix in. I have raised HIVE-18199 to track the 
addition of the unit-test.

> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
> Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch
>
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithun}}).
> We should have a fix for this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18199) Add unit-test for lost UGI doAs() context in RetryingMetaStoreClient

2017-12-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-18199:
---


> Add unit-test for lost UGI doAs() context in RetryingMetaStoreClient
> 
>
> Key: HIVE-18199
> URL: https://issues.apache.org/jira/browse/HIVE-18199
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> This has to do with HIVE-17853. The {{RetryingMetaStoreClient}} would lose 
> the {{UGI.doAs()}} context in case of a socket timeout. Operations after the 
> reconnect might then fail, because they run as the wrong user.
> We'll need to add a unit-test to simulate this case, if possible.
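
A rough sketch of the shape such a test might take; the class name and the 
steps in the comments are hypothetical. The essential assertion: after a 
forced reconnect, calls still execute as the proxy user, not the login user.

{code:title=TestRetainedDoAsContext.java|borderStyle=solid}
import static org.junit.Assert.assertEquals;

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;
import org.junit.Test;

public class TestRetainedDoAsContext {
  @Test
  public void testEffectiveUserSurvivesReconnect() throws Exception {
    UserGroupInformation proxy = UserGroupInformation.createProxyUser(
        "mithun", UserGroupInformation.getLoginUser());
    String effectiveUser = proxy.doAs(
        (PrivilegedExceptionAction<String>) () -> {
          // 1. build a RetryingMetaStoreClient here
          // 2. simulate a socket timeout, forcing reconnect()
          // 3. return the user the next metastore call runs as
          return UserGroupInformation.getCurrentUser().getShortUserName();
        });
    assertEquals("mithun", effectiveUser);
  }
}
{code}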



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-12-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17940:

   Resolution: Fixed
Fix Version/s: 1.2.3
   1.3.0
   Status: Resolved  (was: Patch Available)

I have checked this into {{branch-1}}, and {{branch-1.2}}. Thanks for the 
contribution, [~selinazh], and [~cdrome]. And thank you for the review, 
[~sershe]!

> IllegalArgumentException when reading last row-group in an ORC stripe
> -
>
> Key: HIVE-17940
> URL: https://issues.apache.org/jira/browse/HIVE-17940
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 1.2.2
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 1.3.0, 1.2.3
>
> Attachments: HIVE-17940.1-branch-1.2.patch, 
> HIVE-17940.1-branch-1.patch
>
>
> (This is a backport of HIVE-10024 to {{branch-1.2}}, and {{branch-1}}.)
> When the last row-group in an ORC stripe contains fewer records than 
> specified in {{$\{orc.row.index.stride\}}}, and if a column value is sparse 
> (i.e. mostly nulls), then one sees the following failure when reading the ORC 
> stripe:
> {noformat}
>  java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA 
> to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
> column 82 kind DATA to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
> ... 14 more
> {noformat}
> [~sershe] had a fix for this in HIVE-10024, in {{branch-2}}. After running 
> into this in production with {{branch-1}}+, we find that the fix for 
> HIVE-10024 sorts this out in {{branch-1}} as well.
> This is a fairly rare case, but it leads to bad reads on valid ORC files. I 
> will back-port this shortly.
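
A rough repro sketch under the conditions described above (the file name and 
row counts are assumptions): a small row-index stride, a total row count that 
leaves the final row-group short, and a column that is null in almost every 
row.

{code:title=SparseLastRowGroupSketch.java|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class SparseLastRowGroupSketch {
  static class Row { Long sparse; }   // mostly-null column

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ObjectInspector oi =
        ObjectInspectorFactory.getReflectionObjectInspector(
            Row.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
    Writer writer = OrcFile.createWriter(new Path("/tmp/sparse.orc"),
        OrcFile.writerOptions(conf).inspector(oi).rowIndexStride(1000));
    Row row = new Row();
    for (int i = 0; i < 1500; i++) {          // last row-group: 500 rows
      row.sparse = (i == 1400) ? 42L : null;  // value only in that group
      writer.addRow(row);
    }
    writer.close();
    // Reading this file with a pre-HIVE-10024 reader, with the read
    // landing on the short last row-group, is what surfaced the seek
    // error above.
  }
}
{code}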



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17949) itests compile is busted on branch-1.2

2017-12-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17949:

   Resolution: Fixed
Fix Version/s: 1.2.3
   Status: Resolved  (was: Patch Available)

> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 1.2.3
>
> Attachments: HIVE-17949.01-branch-1.2.patch
>
>
> {{commit 18ddf46e0a8f092358725fc102235cbe6ba3e24d}} on {{branch-1.2}} was for 
> {{Preparing for 1.2.3 development}}. This should have also included 
> corresponding changes to all the pom-files under {{itests}}. As it stands 
> now, the build fails with the following:
> {noformat}
> [ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
>  no suitable method found for 
> updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR]   symbol:   class MiniZooKeeperCluster
> [ERROR]   location: class 
> org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-unit
> {noformat}
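
One conventional way to bump every module version, including the separate 
{{itests}} reactor, is the versions-maven-plugin; the exact command used for 
the 1.2.3 bump is an assumption here:

{noformat}
mvn versions:set -DnewVersion=1.2.3-SNAPSHOT -DgenerateBackupPoms=false
cd itests
mvn versions:set -DnewVersion=1.2.3-SNAPSHOT -DgenerateBackupPoms=false
{noformat}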



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17949) itests compile is busted on branch-1.2

2017-12-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274812#comment-16274812
 ] 

Mithun Radhakrishnan commented on HIVE-17949:
-

bq. I'll check this compile fix in now.
I'm back to this, at last. I've checked this fix into {{branch-1.2}}. Thank you 
for sparing the time to review, [~sershe]!

> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17949.01-branch-1.2.patch
>
>
> {{commit 18ddf46e0a8f092358725fc102235cbe6ba3e24d}} on {{branch-1.2}} was for 
> {{Preparing for 1.2.3 development}}. This should have also included 
> corresponding changes to all the pom-files under {{itests}}. As it stands 
> now, the build fails with the following:
> {noformat}
> [ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
>  no suitable method found for 
> updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR]   symbol:   class MiniZooKeeperCluster
> [ERROR]   location: class 
> org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-unit
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2017-11-30 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14792:

   Resolution: Fixed
Fix Version/s: 2.2.1
   2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.
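
A minimal sketch of the proposed approach (not the committed patch): resolve 
{{avro.schema.url}} once at planning time, and ship the parsed schema to the 
mappers as {{avro.schema.literal}} in the job-level table properties, without 
writing it back to the metastore.

{code:title=PlanningTimeSchemaSketch.java|borderStyle=solid}
import java.io.InputStream;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PlanningTimeSchemaSketch {
  public static void inlineSchema(Configuration conf, Properties tableProps)
      throws Exception {
    String url = tableProps.getProperty("avro.schema.url");
    if (url == null
        || tableProps.getProperty("avro.schema.literal") != null) {
      return;   // nothing to resolve, or a literal is already present
    }
    Path schemaPath = new Path(url);
    FileSystem fs = schemaPath.getFileSystem(conf);
    try (InputStream in = fs.open(schemaPath)) {
      Schema schema = new Schema.Parser().parse(in);
      // Set only on the job-side properties; never persisted to the
      // metastore, so the table-parameter size-limit is not an issue.
      tableProps.setProperty("avro.schema.literal", schema.toString());
    }
  }
}
{code}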



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2017-11-30 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273151#comment-16273151
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}. Thank you for the 
review, [~aihuaxu]!

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14792.1.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-11-29 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271567#comment-16271567
 ] 

Mithun Radhakrishnan edited comment on HIVE-17467 at 11/29/17 9:54 PM:
---

Rebased the master-patch, again. The {{branch-2}} patch seems alright.


was (Author: mithun):
Rebased, again.

> HCatClient APIs for discovering partition key-values
> 
>
> Key: HIVE-17467
> URL: https://issues.apache.org/jira/browse/HIVE-17467
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Metastore
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17467.1-branch-2.patch, HIVE-17467.1.patch, 
> HIVE-17467.2-branch-2.patch, HIVE-17467.2.patch, HIVE-17467.3.patch
>
>
> This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call 
> to retrieve unique combinations of part-key values that satisfy a specified 
> predicate.
> Attached herewith are the {{HCatClient}} APIs that will be used by Apache 
> Oozie, before launching workflows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2017-11-29 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271593#comment-16271593
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

Thanks for the review, [~aihuaxu]. The failures are unrelated. I'll have to 
rebase this patch, given how old it has gotten.

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14792.1.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-11-29 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17467:

Attachment: HIVE-17467.3.patch

Rebased, again.

> HCatClient APIs for discovering partition key-values
> 
>
> Key: HIVE-17467
> URL: https://issues.apache.org/jira/browse/HIVE-17467
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Metastore
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17467.1-branch-2.patch, HIVE-17467.1.patch, 
> HIVE-17467.2-branch-2.patch, HIVE-17467.2.patch, HIVE-17467.3.patch
>
>
> This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call 
> to retrieve unique combinations of part-key values that satisfy a specified 
> predicate.
> Attached herewith are the {{HCatClient}} APIs that will be used by Apache 
> Oozie, before launching workflows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17949) itests compile is busted on branch-1.2

2017-11-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234507#comment-16234507
 ] 

Mithun Radhakrishnan edited comment on HIVE-17949 at 11/1/17 10:02 PM:
---

On the bright side, the compile issue is fixed. On the other hand, the tests on 
{{branch-1.2}} are busted. 

I'll check this compile fix in now.


was (Author: mithun):
On the bright side, the compile issue is fixed. On the other hand, the tests on 
{{branch-1.2}} are busted. 

I'll check this in this compile fix now.

> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-17949.01-branch-1.2.patch
>
>
> {{commit 18ddf46e0a8f092358725fc102235cbe6ba3e24d}} on {{branch-1.2}} was for 
> {{Preparing for 1.2.3 development}}. This should have also included 
> corresponding changes to all the pom-files under {{itests}}. As it stands 
> now, the build fails with the following:
> {noformat}
> [ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
>  no suitable method found for 
> updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR]   symbol:   class MiniZooKeeperCluster
> [ERROR]   location: class 
> org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-unit
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17956) Retrieve "latest" partition from Hive Metastore

2017-11-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234512#comment-16234512
 ] 

Mithun Radhakrishnan commented on HIVE-17956:
-

I'm afraid the sort-order is lexicographical, in the order of the keys 
specified (so an unpadded numeric value like {{9}} sorts after {{10}}, while 
zero-padded values like {{20171101}} compare as expected).

One uncomfortable truth about Hive's metastore is that the concept of "latest" 
is alien. The partition key-values are stored internally as strings. Any notion 
of data-types for partition keys is a retrofit. Square pegs and round holes.

> Retrieve "latest" partition from Hive Metastore
> ---
>
> Key: HIVE-17956
> URL: https://issues.apache.org/jira/browse/HIVE-17956
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Micah Whitacre
>
> We are trying to utilize the Hive Metastore for our processing needs, 
> specifically focusing on consuming through the HCatalog APIs.  One use case 
> we have is that we want to consume the "latest" partition.  In researching, 
> we found a number of posts[1][2] that talk about using queries through Hive 
> Server2 to find that information.  It would be ideal if this were a 
> first-class API offered by the Hive Metastore, without requiring a query to 
> be executed.
> The other option would be to retrieve all of the partitions and sort client 
> side.  There is a concern about the efficiency and memory requirements of 
> this, especially without the "iterator" concept implemented in HIVE-7195.
> [1] - 
> https://community.hortonworks.com/questions/85330/how-to-optimize-hive-access-to-the-latest-partitio.html
> [2] - 
> https://stackoverflow.com/questions/36095790/how-to-find-the-most-recent-partition-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17949) itests compile is busted on branch-1.2

2017-11-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234507#comment-16234507
 ] 

Mithun Radhakrishnan commented on HIVE-17949:
-

On the bright side, the compile issue is fixed. On the other hand, the tests on 
{{branch-1.2}} are busted. 

I'll check this in this compile fix now.

> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-17949.01-branch-1.2.patch
>
>
> {{commit 18ddf46e0a8f092358725fc102235cbe6ba3e24d}} on {{branch-1.2}} was for 
> {{Preparing for 1.2.3 development}}. This should have also included 
> corresponding changes to all the pom-files under {{itests}}. As it stands 
> now, the build fails with the following:
> {noformat}
> [ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
>  no suitable method found for 
> updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR]   symbol:   class MiniZooKeeperCluster
> [ERROR]   location: class 
> org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-unit
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17956) Retrieve "latest" partition from Hive Metastore

2017-11-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234468#comment-16234468
 ] 

Mithun Radhakrishnan commented on HIVE-17956:
-

Hello, [~mkwhitacre].

HIVE-17466 might be of interest to you. This feature adds metastore-calls that 
return unique values for specified partition keys. In the 
{{PartitionValuesRequest}}, one can specify the required partition keys (e.g. 
{{dt}}), a filter (e.g. {{dt > "20170101" && dt < "20171231"}}), a sort order 
(ascending/descending), and a limit (e.g. top 10).

The implementation sorts and filters on the metastore (in fact, it's pushed 
down to the database). It doesn't require forming complete {{Partition}} 
objects. This is much more memory-friendly and efficient than the alternative. 
It's in {{master}}, and {{branch-2}}. I have yet to port this to {{branch-2.2}}.

HIVE-17467 wraps this raw API up in an {{HCatClient}} wrapper that's easier to 
use. This was developed for use in Oozie (for discovery of data dependencies), 
and a couple of other projects. The unit test in that patch indicates how to 
use it. This has yet to be reviewed or checked in, I'm afraid.

Would this suit your requirement?
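
For illustration, a sketch of the raw metastore call described above, 
fetching the single largest {{dt}} value without materializing {{Partition}} 
objects. Method and field names follow HIVE-17466 as committed to {{master}}; 
treat the details as indicative rather than authoritative.

{code:title=LatestPartitionSketch.java|borderStyle=solid}
import java.util.Arrays;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.PartitionValuesRequest;
import org.apache.hadoop.hive.metastore.api.PartitionValuesResponse;

public class LatestPartitionSketch {
  public static void main(String[] args) throws Exception {
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    PartitionValuesRequest request = new PartitionValuesRequest(
        "mydb", "mytable",
        Arrays.asList(new FieldSchema("dt", "string", null)));
    request.setApplyDistinct(true);
    request.setFilter("dt > \"20170101\"");  // optional predicate
    request.setAscending(false);             // descending: latest first
    request.setMaxParts(1);                  // top-1 only
    PartitionValuesResponse response = client.listPartitionValues(request);
    // Each row holds the values for the requested keys; here, just dt.
    System.out.println(response.getPartitionValues().get(0).getRow().get(0));
  }
}
{code}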



> Retrieve "latest" partition from Hive Metastore
> ---
>
> Key: HIVE-17956
> URL: https://issues.apache.org/jira/browse/HIVE-17956
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Micah Whitacre
>
> We are trying to utilize the Hive Metastore for our processing needs, 
> specifically focusing on consuming through the HCatalog APIs.  One use case 
> we have is that we want to consume the "latest" partition.  In researching, 
> we found a number of posts[1][2] that talk about using queries through Hive 
> Server2 to find that information.  It would be ideal if this were a 
> first-class API offered by the Hive Metastore, without requiring a query to 
> be executed.
> The other option would be to retrieve all of the partitions and sort client 
> side.  There is a concern about the efficiency and memory requirements of 
> this, especially without the "iterator" concept implemented in HIVE-7195.
> [1] - 
> https://community.hortonworks.com/questions/85330/how-to-optimize-hive-access-to-the-latest-partitio.html
> [2] - 
> https://stackoverflow.com/questions/36095790/how-to-find-the-most-recent-partition-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-11-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234314#comment-16234314
 ] 

Mithun Radhakrishnan commented on HIVE-17853:
-

[~vihangk1], would you be averse to our adding a unit-test in a followup JIRA?

> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
> Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch
>
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithun}}).
> We should have a fix for this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233534#comment-16233534
 ] 

Mithun Radhakrishnan commented on HIVE-17853:
-

For the record, we've tested this out manually, using an Oozie setup, with 
user-impersonation. I suppose it might be possible to set up a unit-test using 
something based on {{MiniHiveKdc}}, but it looks non-trivial. Hmm...

> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
> Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch
>
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithun}}).
> We should have a fix for this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17467:

Attachment: HIVE-17467.2-branch-2.patch

Periodic rebase. :/

> HCatClient APIs for discovering partition key-values
> 
>
> Key: HIVE-17467
> URL: https://issues.apache.org/jira/browse/HIVE-17467
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Metastore
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17467.1-branch-2.patch, HIVE-17467.1.patch, 
> HIVE-17467.2-branch-2.patch, HIVE-17467.2.patch
>
>
> This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call 
> to retrieve unique combinations of part-key values that satisfy a specified 
> predicate.
> Attached herewith are the {{HCatClient}} APIs that will be used by Apache 
> Oozie, before launching workflows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17949) itests compile is busted on branch-1.2

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227758#comment-16227758
 ] 

Mithun Radhakrishnan commented on HIVE-17949:
-

Thanks for the review, [~sershe]. I'm just waiting for the tests to kick in, 
before I check this in.

> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17949.01-branch-1.2.patch
>
>
> {{commit 18ddf46e0a8f092358725fc102235cbe6ba3e24d}} on {{branch-1.2}} was for 
> {{Preparing for 1.2.3 development}}. This should have also included 
> corresponding changes to all the pom-files under {{itests}}. As it stands 
> now, the build fails with the following:
> {noformat}
> [ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
>  no suitable method found for 
> updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR]   symbol:   class MiniZooKeeperCluster
> [ERROR]   location: class 
> org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-unit
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17467:

Affects Version/s: 2.4.0
   3.0.0
   Status: Patch Available  (was: Open)

> HCatClient APIs for discovering partition key-values
> 
>
> Key: HIVE-17467
> URL: https://issues.apache.org/jira/browse/HIVE-17467
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Metastore
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17467.1-branch-2.patch, HIVE-17467.1.patch, 
> HIVE-17467.2.patch
>
>
> This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call 
> to retrieve unique combinations of part-key values that satisfy a specified 
> predicate.
> Attached herewith are the {{HCatClient}} APIs that will be used by Apache 
> Oozie, before launching workflows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17467:

Attachment: HIVE-17467.2.patch

> HCatClient APIs for discovering partition key-values
> 
>
> Key: HIVE-17467
> URL: https://issues.apache.org/jira/browse/HIVE-17467
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17467.1-branch-2.patch, HIVE-17467.1.patch, 
> HIVE-17467.2.patch
>
>
> This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call 
> to retrieve unique combinations of part-key values that satisfy a specified 
> predicate.
> Attached herewith are the {{HCatClient}} APIs that will be used by Apache 
> Oozie, before launching workflows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17949) itests compile is busted on branch-1.2

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17949:

Status: Patch Available  (was: Open)

Submitting for tests. 

> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17949.01-branch-1.2.patch
>
>
> {{commit 18ddf46e0a8f092358725fc102235cbe6ba3e24d}} on {{branch-1.2}} was for 
> {{Preparing for 1.2.3 development}}. This should have also included 
> corresponding changes to all the pom-files under {{itests}}. As it stands 
> now, the build fails with the following:
> {noformat}
> [ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
>  no suitable method found for 
> updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR]   symbol:   class MiniZooKeeperCluster
> [ERROR]   location: class 
> org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-unit
> {noformat}
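
For anyone hitting this before the patch lands: the fix amounts to bumping the version in the poms under {{itests}} to match the branch. A minimal sketch of the kind of change involved, assuming the usual parent/version layout (illustrative only; the {{artifactId}} shown here is a guess, and the attached patch is authoritative):

{code:xml}
<!-- e.g. itests/hive-unit/pom.xml; the real patch would touch every pom
     under itests/ still carrying the pre-bump version. -->
<parent>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-it</artifactId>
  <version>1.2.3-SNAPSHOT</version>  <!-- must match the "Preparing for 1.2.3 development" bump -->
</parent>
{code}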



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17949) itests compile is busted on branch-1.2

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17949:

Attachment: HIVE-17949.01-branch-1.2.patch

Here's the fix. Could I please bother either of [~vgumashta] or [~sershe] to 
review?

> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17949.01-branch-1.2.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17949) itests compile is busted on branch-1.2

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17949:
---


> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227419#comment-16227419
 ] 

Mithun Radhakrishnan commented on HIVE-17940:
-

bq. ... {{branch-1.2}} builds on my box.
I spoke too soon. Looks like {{branch-1.2}} is busted:

{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hive-it-unit: Compilation failure: Compilation 
failure:
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[31,41]
 package org.apache.hadoop.hbase.zookeeper does not exist
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[42,11]
 cannot find symbol
[ERROR]   symbol:   class MiniZooKeeperCluster
[ERROR]   location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAdminUser.java:[41,12]
 cannot find symbol
[ERROR]   symbol:   method getPrivilege()
[ERROR]   location: class 
org.apache.hadoop.hive.metastore.api.HiveObjectPrivilege
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAdminUser.java:[42,75]
 cannot find symbol
[ERROR]   symbol:   method getRole()
[ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
 no suitable method found for 
updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
[ERROR] method 
org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
 is not applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
 is not applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
 is not applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
 is not applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
 incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
cannot be converted to boolean
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
 incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
cannot be converted to boolean
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
 cannot find symbol
[ERROR]   symbol:   class MiniZooKeeperCluster
[ERROR]   location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-it-unit
{noformat}

This is without HIVE-17940. I'll raise (yet) another JIRA to sort out the 
breakage. 
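
To make the first of those errors concrete, here is a sketch of the signature 
skew the compiler is reporting, reconstructed from the output above 
(illustrative only; the class name and variables are mine, and this is a 
diagnosis aid, not a proposed fix):

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.MetaStoreUtils;
import org.apache.hadoop.hive.metastore.Warehouse;
import org.apache.hadoop.hive.metastore.api.EnvironmentContext;
import org.apache.hadoop.hive.metastore.api.Partition;

public class UpdateStatsSignatureSketch {
  public static void main(String[] args) throws Exception {
    Partition part = new Partition();
    Warehouse warehouse = new Warehouse(new HiveConf());

    // The two-argument form that TestHiveMetaStore.java:512 still invokes;
    // it no longer resolves against the MetaStoreUtils on the compile classpath:
    // MetaStoreUtils.updatePartitionStatsFast(part, warehouse);

    // Every overload the compiler lists instead carries an EnvironmentContext:
    MetaStoreUtils.updatePartitionStatsFast(part, warehouse, new EnvironmentContext());
  }
}
{code}

The point being that the test sources and the resolved metastore artifact 
disagree on the API, which is exactly what a version skew in the {{itests}} 
poms would produce.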

> IllegalArgumentException when reading last row-group in an ORC stripe
> -
>
> Key: HIVE-17940
>   

[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227401#comment-16227401
 ] 

Mithun Radhakrishnan commented on HIVE-17853:
-

[~vihangk1],

bq. All subsequent actions coming via HS2 should also do a doAs() using the 
sessionProxy. Is this happening in case of HCatalog...
Right. HS2 doesn't come into it, since this has more to do with {{HCatClient}}. 
The HCatalog APIs use {{HiveClientCache}} to amortize the cost of 
{{HiveMetaStoreClient}} construction and metastore connections.
Systems like Oozie/Falcon that use {{HCatClient}} to make metastore calls 
within a {{doAs()}} context might end up losing their {{UGI.doAs()}} contexts 
after timeout, causing any retried actions to run as the privileged, rather 
than the impersonated, user.
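
The failure mode, as a minimal sketch (names are illustrative; this shows the 
usage pattern that trips the bug, not the fix):

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hive.hcatalog.api.HCatClient;

public class ImpersonationTimeoutSketch {
  public static void main(String[] args) throws Exception {
    // "oozie" is the privileged login user; "mithun" the impersonated user.
    UserGroupInformation oozie = UserGroupInformation.getLoginUser();
    UserGroupInformation mithun = UserGroupInformation.createProxyUser("mithun", oozie);

    // Client created inside doAs(): the initial metastore connection is
    // authenticated as "mithun".
    HCatClient client = mithun.doAs(
        (PrivilegedExceptionAction<HCatClient>) () -> HCatClient.create(new Configuration()));

    // ... the cached connection idles past the metastore client timeout ...

    // The reconnect inside RetryingMetaStoreClient happens outside any
    // doAs() scope, so the re-established connection authenticates as the
    // login user ("oozie") -- the impersonation context is lost.
    client.listDatabaseNamesByPattern("*");
  }
}
{code}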

> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
> Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch
>
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithun}}).
> We should have a fix for this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

