[jira] [Commented] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102927#comment-16102927 ]

Yibing Shi commented on HIVE-17050:
-----------------------------------

No, [~pvary], these test failures seem irrelevant. I cannot reproduce the test failures even on my local machine. Not sure why they have failed.

> Multiline queries that have comment in middle fail when executed via "beeline -e"
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-17050
>                 URL: https://issues.apache.org/jira/browse/HIVE-17050
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, HIVE-17050.3.PATCH, HIVE-17050.4.patch
>
>
> After applying HIVE-13864, multiple line queries that have comment at the end of one of the middle lines fail when executed via beeline -e
> {noformat}
> $ beeline -u "" -e "select 1, --test
> > 2"
> scan complete in 3ms
> ..
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Error: Error while compiling statement: FAILED: ParseException line 1:9 cannot recognize input near '' '' '' in selection target (state=42000,code=4)
> Closing: 0: jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
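The failing input in the description above is a two-line query whose first line ends in a `--` comment. One plausible reading, which the thread itself does not confirm, is that the lines get joined before the comment is removed, so `--test` swallows the rest of the statement. The sketch below only illustrates the alternative order (strip the comment per line, then join). It is not Beeline's code and does not reproduce any of the HIVE-17050 patches; `strip_line_comment` and `flatten_query` are hypothetical names, and quote state is assumed not to span lines.

```python
def strip_line_comment(line: str) -> str:
    """Return `line` without a trailing `--` comment.

    A `--` that appears inside a single- or double-quoted literal is kept.
    Hypothetical helper for illustration; not taken from Beeline.
    """
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the character escaped by the backslash
            elif ch == quote:
                quote = None  # closing quote found
        elif ch in ("'", '"'):
            quote = ch  # opening quote found
        elif line.startswith("--", i):
            return line[:i].rstrip()  # comment starts here; drop the rest
        i += 1
    return line


def flatten_query(query: str) -> str:
    """Strip `--` comments line by line, then join the lines with spaces."""
    stripped = (strip_line_comment(part).strip() for part in query.splitlines())
    return " ".join(part for part in stripped if part)


if __name__ == "__main__":
    # The query from the issue description, as the shell passes it to -e.
    print(flatten_query("select 1, --test\n2"))
```

Stripping before joining keeps the `2` on the second line as part of the select list, while a `--` inside a string literal is left alone.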
[jira] [Comment Edited] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102927#comment-16102927 ]

Yibing Shi edited comment on HIVE-17050 at 7/27/17 8:32 AM:
------------------------------------------------------------

No, [~pvary], these test failures seem irrelevant.

was (Author: yibing):
No, [~pvary], these test failures seem irrelevant. I cannot reproduce the test failures even on my local machine. Not sure why they have failed.
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17050:
------------------------------
    Attachment: HIVE-17050.4.patch

Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resutmit the patch.
[jira] [Comment Edited] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102389#comment-16102389 ]

Yibing Shi edited comment on HIVE-17050 at 7/26/17 10:30 PM:
-------------------------------------------------------------

Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resubmit the patch.

was (Author: yibing):
Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resutmit the patch.
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17050:
------------------------------
    Attachment: HIVE-17050.3.PATCH

Submit a new patch
[jira] [Commented] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101097#comment-16101097 ]

Yibing Shi commented on HIVE-17050:
-----------------------------------

Hi [~ychena], I will have a further look at this.
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17078:
------------------------------
    Attachment: HIVE-17078.5.PATCH

Thank you for the review, [~stakiar_impala_496e]! I have modified the patch to adopt points #1 and #3. As for #2:

bq. Where does l4j print to?

It depends. By default the local task is run in a new process, and the log4j appenders are not set up in the child process. As such, this l4j doesn't print anywhere. But if {{hive.exec.submit.local.task.via.child}} is disabled, the Hive l4j variable is used, and thus the information will be printed to the Hive log.

> Add more logs to MapredLocalTask
> --------------------------------
>
>                 Key: HIVE-17078
>                 URL: https://issues.apache.org/jira/browse/HIVE-17078
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>            Priority: Minor
>         Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, HIVE-17078.3.patch, HIVE-17078.4.PATCH, HIVE-17078.5.PATCH
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in case the local task uses too much resources that may affect Hive. Currently, the stdout and stderr information of the child process is printed in Hive's stdout/stderr log, which doesn't have a timestamp information, and is separated from Hive service logs. This makes it hard to troubleshoot problems in MapredLocalTasks.
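The issue description above asks for the child process's stdout/stderr to end up in a timestamped service log rather than only in raw stdout/stderr. As a language-neutral illustration of that idea, and not the contents of any HIVE-17078 patch, the following Python sketch pipes a child's output line by line into a logger. `run_and_log` and the `[child]` prefix are hypothetical; the logging format is an assumption chosen only to show timestamps.

```python
import logging
import subprocess
import sys
import threading


def run_and_log(cmd, logger):
    """Run `cmd` and forward each stdout/stderr line of the child to `logger`.

    stdout lines are logged at INFO and stderr lines at ERROR. Returns the
    child's exit code. Hypothetical helper for illustration only.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )

    def pump(stream, level):
        # Each line read from the pipe becomes one (timestamped) log record.
        for line in stream:
            logger.log(level, "[child] %s", line.rstrip("\n"))

    # One thread per pipe, so a full stderr buffer cannot block stdout.
    threads = [
        threading.Thread(target=pump, args=(proc.stdout, logging.INFO)),
        threading.Thread(target=pump, args=(proc.stderr, logging.ERROR)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return proc.wait()


if __name__ == "__main__":
    # The timestamp comes from the parent's logging format, not from the child.
    logging.basicConfig(
        level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s"
    )
    child = [sys.executable, "-c", "print('loading hash table')"]
    run_and_log(child, logging.getLogger("local-task"))
```

Reading the two pipes on separate threads is one simple way to avoid a deadlock when the child writes heavily to one stream while the parent is blocked on the other.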
[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085634#comment-16085634 ]

Yibing Shi commented on HIVE-15767:
-----------------------------------

Thanks for the explanation! This may be done by YARN instead of Spark.

> Hive On Spark is not working on secure clusters from Oozie
> ----------------------------------------------------------
>
>                 Key: HIVE-15767
>                 URL: https://issues.apache.org/jira/browse/HIVE-15767
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.1, 2.1.1
>            Reporter: Peter Cseh
>            Assignee: Peter Cseh
>         Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch
>
>
> When a HiveAction is launched from Oozie with Hive On Spark enabled, we're getting errors:
> {noformat}
> Caused by: java.io.IOException: Exception reading file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens
>         at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188)
>         at org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155)
> {noformat}
> This is caused by passing the {{mapreduce.job.credentials.binary}} property to the Spark configuration in RemoteHiveSparkClient.
[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085612#comment-16085612 ]

Yibing Shi commented on HIVE-15767:
-----------------------------------

bq. I think the Spark driver will get the tokens afterwards

I really doubt that the Spark driver can do this. In an Oozie environment, it is the Oozie server that obtains all the tokens *on behalf of the end user*. When the Hive action starts a Spark job, the Spark driver has no access to the end user's ticket or keytab file. I don't think it can obtain the necessary tokens.

I believe we should somehow extract all the tokens from the existing token file and pass them on to the Spark driver.
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17078:
------------------------------
    Attachment: HIVE-17078.4.PATCH
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085252#comment-16085252 ]

Yibing Shi commented on HIVE-17078:
-----------------------------------

Checked the failed tests.
# org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] fails with something irrelevant to this patch.
# org.apache.hive.hcatalog.api.TestHCatClient. The failure also has nothing to do with our patch.
# org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14/23] fails because the order of the output has changed. Nothing serious. We need to somehow update the .out files, but maybe in a separate JIRA.
# The other tests fail because we now have more logs in local tasks. Will update the .out files.
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17078:
------------------------------
    Attachment: HIVE-17078.3.patch

Add a bit more logs
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084891#comment-16084891 ]

Yibing Shi commented on HIVE-17078:
-----------------------------------

I am trying to keep the current behaviour. With Hive CLI, by default Hive logs are not printed. Some users may rely on the stdout/stderr information. I don't want to surprise them.

If you still think it is unnecessary to print child stdout/stderr to Hive stdout/stderr, I can remove the corresponding code.
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17078:
------------------------------
    Attachment: HIVE-17078.2.patch

Recreate the patch
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17078:
------------------------------
    Attachment: HIVE-17078.1.patch

Attach a quick patch. No tests are added, because this feature does not seem to be testable in a mini cluster.
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17078:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi reassigned HIVE-17078:
---------------------------------

    Assignee: Yibing Shi
[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083374#comment-16083374 ]

Yibing Shi commented on HIVE-15767:
-----------------------------------

[~peterceluch], can the tokens in the Oozie launcher application still be passed to the Spark job when the property {{mapreduce.job.credentials.binary}} is unset? For example, in an environment where HDFS transparent encryption is enabled, is the Spark job still able to connect to the KMS servers?

(The change is in {{RemoteHiveSparkClient}}. Hive on MR shouldn't be affected. Oozie actions already make sure the tokens are added to the action configuration, which should then be passed to MR jobs.)
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17050:
------------------------------
    Attachment: HIVE-17050.2.patch
[jira] [Commented] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077393#comment-16077393 ]

Yibing Shi commented on HIVE-17050:
-----------------------------------

[~asherman], your change has covered what my change does. So I just added a few more tests in this JIRA.
[jira] [Commented] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076414#comment-16076414 ]

Yibing Shi commented on HIVE-17050:
-----------------------------------

The errors seem unrelated. The only failed test that could possibly be affected by this change is TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1], and its query script doesn't contain any comments.
[jira] [Updated] (HIVE-17052) Remove logging of predicate filters
[ https://issues.apache.org/jira/browse/HIVE-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17052:
------------------------------
    Attachment: HIVE-17052.1.patch

Submit the patch

> Remove logging of predicate filters
> -----------------------------------
>
>                 Key: HIVE-17052
>                 URL: https://issues.apache.org/jira/browse/HIVE-17052
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Barna Zsombor Klara
>            Assignee: Yibing Shi
>         Attachments: HIVE-17052.1.patch
>
>
> HIVE-16869 added the filter predicate to the debug log of HS2, but since these filters may contain sensitive information they should not be logged out.
> The log statement should be changed back to the original form.
[jira] [Updated] (HIVE-17052) Remove logging of predicate filters
[ https://issues.apache.org/jira/browse/HIVE-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17052:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-17052) Remove logging of predicate filters
[ https://issues.apache.org/jira/browse/HIVE-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi reassigned HIVE-17052:
---------------------------------

    Assignee: Yibing Shi
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17050:
------------------------------
    Assignee: Yibing Shi
      Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17050: -- Attachment: HIVE-17050.1.patch Submit a patch > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi > Attachments: HIVE-17050.1.patch > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16930) HoS should verify the value of Kerberos principal and keytab file before adding them to spark-submit command parameters
[ https://issues.apache.org/jira/browse/HIVE-16930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16930: -- Assignee: Yibing Shi Status: Patch Available (was: Open) > HoS should verify the value of Kerberos principal and keytab file before > adding them to spark-submit command parameters > --- > > Key: HIVE-16930 > URL: https://issues.apache.org/jira/browse/HIVE-16930 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16930.1.patch > > > When Kerberos is enabled, Hive CLI fails to run Hive on Spark queries: > {noformat} > >hive -e "set hive.execution.engine=spark; create table if not exists test(a > >int); select count(*) from test" --hiveconf hive.root.logger=INFO,console > > >/var/tmp/hive_log.txt > /var/tmp/hive_log_2.txt > 17/06/16 16:13:13 [main]: ERROR client.SparkClientImpl: Error while waiting > for client to connect. > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel > client 'a5de85d1-6933-43e7-986f-5f8e5c001b5f'. 
Error: Child process exited > before connecting back with error log Error: Cannot load main class from JAR > file:/tmp/spark-submit.7196051517706529285.properties > Run with --help for usage help or --verbose for debug output > at > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) > at > org.apache.hive.spark.client.SparkClientImpl.(SparkClientImpl.java:107) > at > org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) > > at > org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:100) > > at > org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.(RemoteHiveSparkClient.java:96) > > at > org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:66) > > at > org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62) > > at > org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) > > at > org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:111) > > at > org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:97) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1972) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1685) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1421) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1205) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1195) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383) > at > 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.lang.RuntimeException: Cancel client > 'a5de85d1-6933-43e7-986f-5f8e5c001b5f'. Error: Child process exited before > connecting back with error log Error: Cannot load main class from JAR > file:/tmp/spark-submit.7196051517706529285.properties > Run with --help for usage help or --verbose for debug output > at > org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179) > at > org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:490) > at java.lang.Thread.run(Thread.java:745) > 17/06/16 16:13:13 [Driver]: WARN client.SparkClientImpl: Child process exited > with code 1 > {noformat} > In the log, below message shows up: >
[jira] [Updated] (HIVE-16930) HoS should verify the value of Kerberos principal and keytab file before adding them to spark-submit command parameters
[ https://issues.apache.org/jira/browse/HIVE-16930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16930: -- Attachment: HIVE-16930.1.patch Submit a patch. > HoS should verify the value of Kerberos principal and keytab file before > adding them to spark-submit command parameters > --- > > Key: HIVE-16930 > URL: https://issues.apache.org/jira/browse/HIVE-16930 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Yibing Shi > Attachments: HIVE-16930.1.patch > > > When Kerberos is enabled, Hive CLI fails to run Hive on Spark queries: > {noformat} > >hive -e "set hive.execution.engine=spark; create table if not exists test(a > >int); select count(*) from test" --hiveconf hive.root.logger=INFO,console > > >/var/tmp/hive_log.txt > /var/tmp/hive_log_2.txt > 17/06/16 16:13:13 [main]: ERROR client.SparkClientImpl: Error while waiting > for client to connect. > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel > client 'a5de85d1-6933-43e7-986f-5f8e5c001b5f'. 
Error: Child process exited > before connecting back with error log Error: Cannot load main class from JAR > file:/tmp/spark-submit.7196051517706529285.properties > Run with --help for usage help or --verbose for debug output > at > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) > at > org.apache.hive.spark.client.SparkClientImpl.(SparkClientImpl.java:107) > at > org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) > > at > org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:100) > > at > org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.(RemoteHiveSparkClient.java:96) > > at > org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:66) > > at > org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62) > > at > org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) > > at > org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:111) > > at > org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:97) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1972) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1685) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1421) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1205) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1195) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383) > at > 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.lang.RuntimeException: Cancel client > 'a5de85d1-6933-43e7-986f-5f8e5c001b5f'. Error: Child process exited before > connecting back with error log Error: Cannot load main class from JAR > file:/tmp/spark-submit.7196051517706529285.properties > Run with --help for usage help or --verbose for debug output > at > org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179) > at > org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:490) > at java.lang.Thread.run(Thread.java:745) > 17/06/16 16:13:13 [Driver]: WARN client.SparkClientImpl: Child process exited > with code 1 > {noformat} > In the log, below message shows up: > {noformat} > 17/06/16 16:13:12 [main]: INFO
[jira] [Commented] (HIVE-16869) Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
[ https://issues.apache.org/jira/browse/HIVE-16869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045279#comment-16045279 ] Yibing Shi commented on HIVE-16869: --- The idea of the patch is to change the logic of the {{OR}} predicate. Currently, if a child of an {{OR}} predicate returns a null predicate, that child is ignored. This is not correct. A null predicate means that the condition is on a column that doesn't exist in the Parquet file (a partition column, etc.). In such a scenario, the whole {{OR}} should be treated as true (that is, it should itself return null), so that the record is returned for further checking (if this {{OR}} is at the top level) or the parent predicate can be evaluated correctly (if this {{OR}} is a child of another predicate). > Hive returns wrong result when predicates on non-existing columns are pushed > down to Parquet reader > --- > > Key: HIVE-16869 > URL: https://issues.apache.org/jira/browse/HIVE-16869 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Critical > Attachments: HIVE-16869.1.patch, HIVE-16869.2.patch > > > When {{hive.optimize.ppd}} and {{hive.optimize.index.filter}} are turned, and > a select query has a condition on a column that doesn't exist in Parquet file > (such as a partition column), Hive often returns wrong result. 
> Please see below example for details: > {noformat} > hive> create table test_parq (a int, b int) partitioned by (p int) stored as > parquet; > OK > Time taken: 0.292 seconds > hive> insert overwrite table test_parq partition (p=1) values (1, 2); > OK > Time taken: 5.08 seconds > hive> select * from test_parq where a=1 and p=1; > OK > 1 2 1 > Time taken: 0.441 seconds, Fetched: 1 row(s) > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > 1 2 1 > Time taken: 0.197 seconds, Fetched: 1 row(s) > hive> set hive.optimize.index.filter=true; > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > Time taken: 0.167 seconds > hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1); > OK > Time taken: 0.563 seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
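The rule described in the HIVE-16869 comment can be sketched in a few lines of Java. This is only an illustration, not the code in the attached patches: filters are modelled as plain strings, {{null}} stands for "cannot be evaluated against the Parquet file", and the class and method names ({{OrPushdownSketch}}, {{or}}, {{and}}) are hypothetical. The {{AND}} behaviour shown (dropping unknown children) is an assumption added for contrast; the comment itself only discusses {{OR}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: a filter is a String, and null means the predicate
// cannot be pushed down (for example, it refers to a partition column).
final class OrPushdownSketch {

    // OR: if any child is unknown, the whole OR is unknown, so return null
    // and let the rows through for later checking.
    static String or(List<String> children) {
        if (children.isEmpty()) {
            return null;
        }
        for (String child : children) {
            if (child == null) {
                return null;
            }
        }
        return "or(" + String.join(", ", children) + ")";
    }

    // AND (assumed behaviour): unknown children can be dropped, because the
    // remaining conjuncts are still necessary conditions for a match.
    static String and(List<String> children) {
        List<String> known = new ArrayList<>();
        for (String child : children) {
            if (child != null) {
                known.add(child);
            }
        }
        if (known.isEmpty()) {
            return null;
        }
        return known.size() == 1 ? known.get(0) : "and(" + String.join(", ", known) + ")";
    }

    public static void main(String[] args) {
        // (a=1 or a=999) and (a=999 or p=1), where p is a partition column.
        String left = or(Arrays.asList("eq(a, 1)", "eq(a, 999)"));
        String right = or(Arrays.asList("eq(a, 999)", null));
        System.out.println(and(Arrays.asList(left, right)));
    }
}
```

With this rule the second {{OR}} in the last query of the example becomes unknown as a whole, so the pushed-down filter no longer collapses to {{a=999}} and the row with {{a=1}} is kept for Hive to check.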
[jira] [Updated] (HIVE-16869) Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
[ https://issues.apache.org/jira/browse/HIVE-16869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16869: -- Attachment: HIVE-16869.2.patch fix the typo in qtest > Hive returns wrong result when predicates on non-existing columns are pushed > down to Parquet reader > --- > > Key: HIVE-16869 > URL: https://issues.apache.org/jira/browse/HIVE-16869 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Critical > Attachments: HIVE-16869.1.patch, HIVE-16869.2.patch > > > When {{hive.optimize.ppd}} and {{hive.optimize.index.filter}} are turned, and > a select query has a condition on a column that doesn't exist in Parquet file > (such as a partition column), Hive often returns wrong result. > Please see below example for details: > {noformat} > hive> create table test_parq (a int, b int) partitioned by (p int) stored as > parquet; > OK > Time taken: 0.292 seconds > hive> insert overwrite table test_parq partition (p=1) values (1, 2); > OK > Time taken: 5.08 seconds > hive> select * from test_parq where a=1 and p=1; > OK > 1 2 1 > Time taken: 0.441 seconds, Fetched: 1 row(s) > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > 1 2 1 > Time taken: 0.197 seconds, Fetched: 1 row(s) > hive> set hive.optimize.index.filter=true; > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > Time taken: 0.167 seconds > hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1); > OK > Time taken: 0.563 seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16869) Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
[ https://issues.apache.org/jira/browse/HIVE-16869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16869: -- Status: Patch Available (was: Open) > Hive returns wrong result when predicates on non-existing columns are pushed > down to Parquet reader > --- > > Key: HIVE-16869 > URL: https://issues.apache.org/jira/browse/HIVE-16869 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Critical > Attachments: HIVE-16869.1.patch > > > When {{hive.optimize.ppd}} and {{hive.optimize.index.filter}} are turned, and > a select query has a condition on a column that doesn't exist in Parquet file > (such as a partition column), Hive often returns wrong result. > Please see below example for details: > {noformat} > hive> create table test_parq (a int, b int) partitioned by (p int) stored as > parquet; > OK > Time taken: 0.292 seconds > hive> insert overwrite table test_parq partition (p=1) values (1, 2); > OK > Time taken: 5.08 seconds > hive> select * from test_parq where a=1 and p=1; > OK > 1 2 1 > Time taken: 0.441 seconds, Fetched: 1 row(s) > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > 1 2 1 > Time taken: 0.197 seconds, Fetched: 1 row(s) > hive> set hive.optimize.index.filter=true; > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > Time taken: 0.167 seconds > hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1); > OK > Time taken: 0.563 seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16869) Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
[ https://issues.apache.org/jira/browse/HIVE-16869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16869: -- Attachment: HIVE-16869.1.patch Submit a patch > Hive returns wrong result when predicates on non-existing columns are pushed > down to Parquet reader > --- > > Key: HIVE-16869 > URL: https://issues.apache.org/jira/browse/HIVE-16869 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Critical > Attachments: HIVE-16869.1.patch > > > When {{hive.optimize.ppd}} and {{hive.optimize.index.filter}} are turned, and > a select query has a condition on a column that doesn't exist in Parquet file > (such as a partition column), Hive often returns wrong result. > Please see below example for details: > {noformat} > hive> create table test_parq (a int, b int) partitioned by (p int) stored as > parquet; > OK > Time taken: 0.292 seconds > hive> insert overwrite table test_parq partition (p=1) values (1, 2); > OK > Time taken: 5.08 seconds > hive> select * from test_parq where a=1 and p=1; > OK > 1 2 1 > Time taken: 0.441 seconds, Fetched: 1 row(s) > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > 1 2 1 > Time taken: 0.197 seconds, Fetched: 1 row(s) > hive> set hive.optimize.index.filter=true; > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > Time taken: 0.167 seconds > hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1); > OK > Time taken: 0.563 seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16869) Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
[ https://issues.apache.org/jira/browse/HIVE-16869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi reassigned HIVE-16869: - > Hive returns wrong result when predicates on non-existing columns are pushed > down to Parquet reader > --- > > Key: HIVE-16869 > URL: https://issues.apache.org/jira/browse/HIVE-16869 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Critical > > When {{hive.optimize.ppd}} and {{hive.optimize.index.filter}} are turned, and > a select query has a condition on a column that doesn't exist in Parquet file > (such as a partition column), Hive often returns wrong result. > Please see below example for details: > {noformat} > hive> create table test_parq (a int, b int) partitioned by (p int) stored as > parquet; > OK > Time taken: 0.292 seconds > hive> insert overwrite table test_parq partition (p=1) values (1, 2); > OK > Time taken: 5.08 seconds > hive> select * from test_parq where a=1 and p=1; > OK > 1 2 1 > Time taken: 0.441 seconds, Fetched: 1 row(s) > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > 1 2 1 > Time taken: 0.197 seconds, Fetched: 1 row(s) > hive> set hive.optimize.index.filter=true; > hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999); > OK > Time taken: 0.167 seconds > hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1); > OK > Time taken: 0.563 seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16660) Not able to add partition for views in hive when sentry is enabled
[ https://issues.apache.org/jira/browse/HIVE-16660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009153#comment-16009153 ] Yibing Shi commented on HIVE-16660: --- [~ychena], should we solve these 2 problems in 2 different JIRAs? They are not related. > Not able to add partition for views in hive when sentry is enabled > -- > > Key: HIVE-16660 > URL: https://issues.apache.org/jira/browse/HIVE-16660 > Project: Hive > Issue Type: Bug > Components: Parser >Reporter: Yongzhi Chen >Assignee: Yongzhi Chen > Attachments: HIVE-16660.1.patch > > > Repro: > create table tesnit (a int) partitioned by (p int); > insert into table tesnit partition (p = 1) values (1); > insert into table tesnit partition (p = 2) values (1); > create view test_view partitioned on (p) as select * from tesnit where p =1; > alter view test_view add partition (p = 2); > Error: Error while compiling statement: FAILED: SemanticException [Error > 10056]: The query does not reference any valid partition. To run this query, > set hive.mapred.mode=nonstrict (state=42000,code=10056) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16646) Alias in transform ... as clause shouldn't be case sensitive
[ https://issues.apache.org/jira/browse/HIVE-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007695#comment-16007695 ] Yibing Shi edited comment on HIVE-16646 at 5/12/17 6:37 AM: These errors seem irrelevant. Could you please have a look as well [~ychena]? was (Author: yibing): Pulled down the latest master branch, and applied the patch from [~ychena]. The failed tests listed above all succeed for me. Can we kick off the test again and see how it goes? > Alias in transform ... as clause shouldn't be case sensitive > > > Key: HIVE-16646 > URL: https://issues.apache.org/jira/browse/HIVE-16646 > Project: Hive > Issue Type: Bug > Components: hpl/sql >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16646.1.patch, HIVE-16646.2.patch > > > Create a table like below: > {code:sql} > CREATE TABLE hive_bug(col1 string); > {code} > Run below query in Hive: > {code} > from hive_bug select transform(col1) using '/bin/cat' as ( string); > {code} > The result would be: > {noformat} > 0: jdbc:hive2://localhost:1> from hive_bug select transform(col1) using > '/bin/cat' as ( string); > .. > INFO : OK > +---+--+ > | | > +---+--+ > +---+--+ > {noformat} > The output column name is ** instead of the lowercase . -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16646) Alias in transform ... as clause shouldn't be case sensitive
[ https://issues.apache.org/jira/browse/HIVE-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007695#comment-16007695 ] Yibing Shi commented on HIVE-16646: --- Pulled down the latest master branch, and applied the patch from [~ychena]. The failed tests listed above all succeed for me. Can we kick off the test again and see how it goes? > Alias in transform ... as clause shouldn't be case sensitive > > > Key: HIVE-16646 > URL: https://issues.apache.org/jira/browse/HIVE-16646 > Project: Hive > Issue Type: Bug > Components: hpl/sql >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16646.1.patch, HIVE-16646.2.patch > > > Create a table like below: > {code:sql} > CREATE TABLE hive_bug(col1 string); > {code} > Run below query in Hive: > {code} > from hive_bug select transform(col1) using '/bin/cat' as ( string); > {code} > The result would be: > {noformat} > 0: jdbc:hive2://localhost:1> from hive_bug select transform(col1) using > '/bin/cat' as ( string); > .. > INFO : OK > +---+--+ > | | > +---+--+ > +---+--+ > {noformat} > The output column name is ** instead of the lowercase . -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16646) Alias in transform ... as clause shouldn't be case sensitive
[ https://issues.apache.org/jira/browse/HIVE-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007339#comment-16007339 ] Yibing Shi commented on HIVE-16646: --- Thank you, [~ychena]! > Alias in transform ... as clause shouldn't be case sensitive > > > Key: HIVE-16646 > URL: https://issues.apache.org/jira/browse/HIVE-16646 > Project: Hive > Issue Type: Bug > Components: hpl/sql >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16646.1.patch, HIVE-16646.2.patch > > > Create a table like below: > {code:sql} > CREATE TABLE hive_bug(col1 string); > {code} > Run below query in Hive: > {code} > from hive_bug select transform(col1) using '/bin/cat' as ( string); > {code} > The result would be: > {noformat} > 0: jdbc:hive2://localhost:1> from hive_bug select transform(col1) using > '/bin/cat' as ( string); > .. > INFO : OK > +---+--+ > | | > +---+--+ > +---+--+ > {noformat} > The output column name is ** instead of the lowercase . -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16646) Alias in transform ... as clause shouldn't be case sensitive
[ https://issues.apache.org/jira/browse/HIVE-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16646: -- Assignee: Yibing Shi Status: Patch Available (was: Open) > Alias in transform ... as clause shouldn't be case sensitive > > > Key: HIVE-16646 > URL: https://issues.apache.org/jira/browse/HIVE-16646 > Project: Hive > Issue Type: Bug > Components: hpl/sql >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16646.1.patch > > > Create a table like below: > {code:sql} > CREATE TABLE hive_bug(col1 string); > {code} > Run below query in Hive: > {code} > from hive_bug select transform(col1) using '/bin/cat' as ( string); > {code} > The result would be: > {noformat} > 0: jdbc:hive2://localhost:1> from hive_bug select transform(col1) using > '/bin/cat' as ( string); > .. > INFO : OK > +---+--+ > | | > +---+--+ > +---+--+ > {noformat} > The output column name is ** instead of the lowercase . -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16646) Alias in transform ... as clause shouldn't be case sensitive
[ https://issues.apache.org/jira/browse/HIVE-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16646: -- Attachment: HIVE-16646.1.patch Attach a patch > Alias in transform ... as clause shouldn't be case sensitive > > > Key: HIVE-16646 > URL: https://issues.apache.org/jira/browse/HIVE-16646 > Project: Hive > Issue Type: Bug > Components: hpl/sql >Reporter: Yibing Shi > Attachments: HIVE-16646.1.patch > > > Create a table like below: > {code:sql} > CREATE TABLE hive_bug(col1 string); > {code} > Run below query in Hive: > {code} > from hive_bug select transform(col1) using '/bin/cat' as ( string); > {code} > The result would be: > {noformat} > 0: jdbc:hive2://localhost:1> from hive_bug select transform(col1) using > '/bin/cat' as ( string); > .. > INFO : OK > +---+--+ > | | > +---+--+ > +---+--+ > {noformat} > The output column name is ** instead of the lowercase . -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16646) Alias in transform ... as clause shouldn't be case sensitive
[ https://issues.apache.org/jira/browse/HIVE-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006394#comment-16006394 ] Yibing Shi commented on HIVE-16646: --- Another query that can show this problem more clearly is: {code:sql} select t.col from ( select transform(col) using 'cat' as (COL string) from transform3_t1 ) t; {code} It fails with below error: {noformat} FAILED: SemanticException [Error 10002]: Line 1:9 Invalid column reference 'col' {noformat} Changing {{as (COL string)}} to {{as (col string)}} makes the query run properly. > Alias in transform ... as clause shouldn't be case sensitive > > > Key: HIVE-16646 > URL: https://issues.apache.org/jira/browse/HIVE-16646 > Project: Hive > Issue Type: Bug > Components: hpl/sql >Reporter: Yibing Shi > > Create a table like below: > {code:sql} > CREATE TABLE hive_bug(col1 string); > {code} > Run below query in Hive: > {code} > from hive_bug select transform(col1) using '/bin/cat' as ( string); > {code} > The result would be: > {noformat} > 0: jdbc:hive2://localhost:1> from hive_bug select transform(col1) using > '/bin/cat' as ( string); > .. > INFO : OK > +---+--+ > | | > +---+--+ > +---+--+ > {noformat} > The output column name is ** instead of the lowercase . -- This message was sent by Atlassian JIRA (v6.3.15#6346)
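The behaviour requested in HIVE-16646 can be illustrated with a small Java sketch. It is not taken from the attached patches: the class {{AliasLookupSketch}} and its methods are hypothetical, and the approach (lowercasing the alias both when it is declared and when it is referenced) is just one assumed way to make the lookup case-insensitive.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical alias table: names are normalized to lower case on both
// declaration and lookup, so "COL" and "col" refer to the same column.
final class AliasLookupSketch {
    private final Map<String, String> columns = new LinkedHashMap<>();

    // Register an output column as declared in "as (<alias> <type>)".
    void declare(String alias, String type) {
        columns.put(alias.toLowerCase(Locale.ROOT), type);
    }

    // Resolve a column reference; returns null when the alias is unknown.
    String resolve(String reference) {
        return columns.get(reference.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        AliasLookupSketch aliases = new AliasLookupSketch();
        aliases.declare("COL", "string");             // as (COL string)
        System.out.println(aliases.resolve("col"));   // reference written as t.col
    }
}
```

Using {{Locale.ROOT}} keeps the normalization independent of the JVM's default locale.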
[jira] [Updated] (HIVE-16291) Hive fails when unions a parquet table with itself
[ https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16291: -- Attachment: HIVE-16291.2.patch > Hive fails when unions a parquet table with itself > -- > > Key: HIVE-16291 > URL: https://issues.apache.org/jira/browse/HIVE-16291 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16291.1.patch, HIVE-16291.2.patch > > > Reproduce commands: > {code:sql} > create table tst_unin (col1 int) partitioned by (p_tdate int) stored as > parquet; > insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310); > insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410); > select count(*) from (select tst_unin.p_tdate from tst_unin where > tst_unin.col1=20160302 union all select tst_unin.p_tdate from tst_unin) t1; > {code} > The table is stored in Parquet format, which is a columnar file format. Hive > tries to push the query predicates to the table scan operators so that only > the needed columns are read. This is done by adding the needed column IDs > into job configuration with property "hive.io.file.readcolumn.ids". > In above case, the query unions the result of 2 subqueries, which select data > from one same table. The first subquery doesn't need any column from Parquet > file, while the second subquery needs a column "col1". Hive has a bug here, > it finally set "hive.io.file.readcolumn.ids" to a value like "0,,0", which > method ColumnProjectionUtils.getReadColumnIDs cannot parse. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16291) Hive fails when unions a parquet table with itself
[ https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958307#comment-15958307 ] Yibing Shi commented on HIVE-16291: --- [~aihuaxu] Actually, I have just realized that we can change it to the line below: {code} String newConfStr = HiveStringUtils.joinIgnoringEmpty(new String[] {id, old}, StringUtils.COMMA); {code} > Hive fails when unions a parquet table with itself > -- > > Key: HIVE-16291 > URL: https://issues.apache.org/jira/browse/HIVE-16291 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16291.1.patch > > > Reproduce commands: > {code:sql} > create table tst_unin (col1 int) partitioned by (p_tdate int) stored as > parquet; > insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310); > insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410); > select count(*) from (select tst_unin.p_tdate from tst_unin where > tst_unin.col1=20160302 union all select tst_unin.p_tdate from tst_unin) t1; > {code} > The table is stored in Parquet format, which is a columnar file format. Hive > tries to push the query predicates to the table scan operators so that only > the needed columns are read. This is done by adding the needed column IDs > into job configuration with property "hive.io.file.readcolumn.ids". > In above case, the query unions the result of 2 subqueries, which select data > from one same table. The first subquery doesn't need any column from Parquet > file, while the second subquery needs a column "col1". Hive has a bug here, > it finally set "hive.io.file.readcolumn.ids" to a value like "0,,0", which > method ColumnProjectionUtils.getReadColumnIDs cannot parse. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
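As a rough, self-contained illustration of the "join while ignoring empty pieces" idea proposed for HIVE-16291, the sketch below builds the column ID list without ever emitting an empty element. The helper here is a stand-in written for this example, not Hive's {{HiveStringUtils}}; the class name {{ReadColumnIdsSketch}}, the {{naiveAppend}} method, and the sequence of appended values are assumptions that show one plausible way a value like "0,,0" could arise.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models appending column IDs to a comma-separated
// configuration value such as "hive.io.file.readcolumn.ids".
final class ReadColumnIdsSketch {

    // Assumed buggy behaviour: always append with a comma, even when the
    // new ID list is empty.
    static String naiveAppend(String old, String id) {
        return old + "," + id;
    }

    // Join the parts with the separator, skipping null or blank parts.
    static String joinIgnoringEmpty(String separator, String... parts) {
        List<String> kept = new ArrayList<>();
        for (String part : parts) {
            if (part != null && !part.trim().isEmpty()) {
                kept.add(part);
            }
        }
        return String.join(separator, kept);
    }

    // Same argument order as in the comment above: the new id first, then the old value.
    static String safeAppend(String old, String id) {
        return joinIgnoringEmpty(",", id, old);
    }

    public static void main(String[] args) {
        // One scan needs column 0, one needs no column, one needs column 0 again.
        System.out.println(naiveAppend(naiveAppend("0", ""), "0"));
        System.out.println(safeAppend(safeAppend("0", ""), "0"));
    }
}
```

Skipping blank parts at join time means no caller has to special-case a subquery that reads no columns.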
[jira] [Commented] (HIVE-16291) Hive fails when unions a parquet table with itself
[ https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958304#comment-15958304 ] Yibing Shi commented on HIVE-16291: --- [~aihuaxu] Sorry for the delay! I was totally stuck on other problems and didn't get a chance to check this. I submitted my patch trying to minimize the scope of my changes (touch as few lines as possible). Yes, I agree that the logic is a bit confusing. Your suggestions look great! I have a slightly modified version as below. What do you think? {code} String newConfStr = null; for (String s : Arrays.asList(id, old)) { if (org.apache.commons.lang.StringUtils.isNotBlank(s)) { newConfStr = newConfStr == null ? s : newConfStr + StringUtils.COMMA_STR + s; } } {code} > Hive fails when unions a parquet table with itself > -- > > Key: HIVE-16291 > URL: https://issues.apache.org/jira/browse/HIVE-16291 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16291.1.patch > > > Reproduce commands: > {code:sql} > create table tst_unin (col1 int) partitioned by (p_tdate int) stored as > parquet; > insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310); > insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410); > select count(*) from (select tst_unin.p_tdate from tst_unin where > tst_unin.col1=20160302 union all select tst_unin.p_tdate from tst_unin) t1; > {code} > The table is stored in Parquet format, which is a columnar file format. Hive > tries to push the query predicates to the table scan operators so that only > the needed columns are read. This is done by adding the needed column IDs > into job configuration with property "hive.io.file.readcolumn.ids". > In above case, the query unions the result of 2 subqueries, which select data > from one same table. The first subquery doesn't need any column from Parquet > file, while the second subquery needs a column "col1". 
Hive has a bug here, > it finally set "hive.io.file.readcolumn.ids" to a value like "0,,0", which > method ColumnProjectionUtils.getReadColumnIDs cannot parse. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
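The idea in the comment above (append an ID string only when it is non-blank, so no empty element such as ",," ends up in the joined value) can be sketched in plain Java. This is a minimal illustration, not the code from the attached patch: the class name `ReadColumnIdsSketch` and the helper `joinNonBlank` are hypothetical, and unlike the snippet in the comment, this version returns an empty string rather than null when every part is blank.

```java
import java.util.StringJoiner;

class ReadColumnIdsSketch {
    // Hypothetical helper (not a Hive API): joins the non-blank parts with
    // commas, skipping null, empty, and whitespace-only parts.
    static String joinNonBlank(String... parts) {
        StringJoiner joiner = new StringJoiner(",");
        for (String part : parts) {
            if (part != null && !part.trim().isEmpty()) {
                joiner.add(part);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinNonBlank("0", ""));   // prints 0
        System.out.println(joinNonBlank("", "0"));   // prints 0
        System.out.println(joinNonBlank("0", "0"));  // prints 0,0
    }
}
```

Skipping blank parts before joining means a subquery that needs no columns contributes nothing to the list, so a parser that expects comma-separated integers never sees an empty element.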
[jira] [Updated] (HIVE-16291) Hive fails when unions a parquet table with itself
[ https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16291: -- Assignee: Yibing Shi Status: Patch Available (was: Open) > Hive fails when unions a parquet table with itself > -- > > Key: HIVE-16291 > URL: https://issues.apache.org/jira/browse/HIVE-16291 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-16291.1.patch > > > Reproduce commands: > {code:sql} > create table tst_unin (col1 int) partitioned by (p_tdate int) stored as > parquet; > insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310); > insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410); > select count(*) from (select tst_unin.p_tdate from tst_unin union all select > tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302) t1; > {code} > The table is stored in Parquet format, which is a columnar file format. Hive > tries to push the query predicates to the table scan operators so that only > the needed columns are read. This is done by adding the needed column IDs > into the job configuration with the property "hive.io.file.readcolumn.ids". > In the above case, the query unions the results of 2 subqueries, which select data > from the same table. The first subquery doesn't need any column from the Parquet > file, while the second subquery needs a column "col1". Hive has a bug here: > it ends up setting "hive.io.file.readcolumn.ids" to a value like "0,,0", which > the method ColumnProjectionUtils.getReadColumnIDs cannot parse. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16291) Hive fails when unions a parquet table with itself
[ https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-16291: -- Attachment: HIVE-16291.1.patch > Hive fails when unions a parquet table with itself > -- > > Key: HIVE-16291 > URL: https://issues.apache.org/jira/browse/HIVE-16291 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi > Attachments: HIVE-16291.1.patch > > > Reproduce commands: > {code:sql} > create table tst_unin (col1 int) partitioned by (p_tdate int) stored as > parquet; > insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310); > insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410); > select count(*) from (select tst_unin.p_tdate from tst_unin union all select > tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302) t1; > {code} > The table is stored in Parquet format, which is a columnar file format. Hive > tries to push the query predicates to the table scan operators so that only > the needed columns are read. This is done by adding the needed column IDs > into the job configuration with the property "hive.io.file.readcolumn.ids". > In the above case, the query unions the results of 2 subqueries, which select data > from the same table. The first subquery doesn't need any column from the Parquet > file, while the second subquery needs a column "col1". Hive has a bug here: > it ends up setting "hive.io.file.readcolumn.ids" to a value like "0,,0", which > the method ColumnProjectionUtils.getReadColumnIDs cannot parse. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Attachment: HIVE-15530.5.patch Attach a new patch based on [~ctang.ma]'s comment > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, > HIVE-15530.3.patch, HIVE-15530.4.patch, HIVE-15530.5.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814799#comment-15814799 ] Yibing Shi commented on HIVE-15530: --- You are right that the column stats don't need to be updated if only column positions are changed. The current patch doesn't optimize this, because I didn't notice that {{areSameColumns}} also compares column positions. I will upload a new patch soon. > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, > HIVE-15530.3.patch, HIVE-15530.4.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812787#comment-15812787 ] Yibing Shi commented on HIVE-15530: --- Hi [~ctang.ma], thanks for looking into this patch! I believe that the stats should still be updated in the scenario you described, because it is the column name (not the ID) that is stored in the stats tables. When a column name is changed, the existing stats info should be updated, or at least removed. > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, > HIVE-15530.3.patch, HIVE-15530.4.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
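The rule discussed in these comments (adding a column or changing only column positions needs no stats update, while renaming an existing column does) can be sketched as follows. This is a rough illustration under stated assumptions, not the logic of any attached patch: the class `ColumnStatsCheckSketch` and its methods are hypothetical, columns are modelled as a plain name-to-type map, and treating a type change as requiring an update is an assumption of this sketch rather than something the thread states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ColumnStatsCheckSketch {
    // Hypothetical helper: builds an ordered name -> type map from
    // alternating name/type arguments; insertion order models column position.
    static Map<String, String> cols(String... nameTypePairs) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
            m.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        return m;
    }

    // Returns true when some existing column is missing from the new schema
    // (dropped or renamed) or has a different type (assumption of this sketch).
    // Added columns and reordered columns do not count as changes.
    static boolean existingColumnsChanged(Map<String, String> oldCols,
                                          Map<String, String> newCols) {
        for (Map.Entry<String, String> e : oldCols.entrySet()) {
            String newType = newCols.get(e.getKey());
            if (newType == null || !newType.equalsIgnoreCase(e.getValue())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> before = cols("a", "int", "b", "string");
        // Column added: no stats update needed.
        System.out.println(existingColumnsChanged(before, cols("a", "int", "b", "string", "c", "int")));
        // Positions swapped: no stats update needed.
        System.out.println(existingColumnsChanged(before, cols("b", "string", "a", "int")));
        // Column renamed: stats keyed by the old name must be updated or removed.
        System.out.println(existingColumnsChanged(before, cols("a", "int", "b2", "string")));
    }
}
```

Looking columns up by name rather than comparing the two lists element by element is what makes the check insensitive to position, which is the point raised about {{areSameColumns}}.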
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Attachment: HIVE-15530.4.patch Thanks [~aihuaxu] for looking into the patch. I have corrected the license declaration of the new files based on your suggestion. > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, > HIVE-15530.3.patch, HIVE-15530.4.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Attachment: HIVE-15530.3.patch Try to fix the broken patch > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, > HIVE-15530.3.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Attachment: HIVE-15530.2.patch Add unit tests > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Status: Patch Available (was: Open) > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi > Attachments: HIVE-15530.1.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi reassigned HIVE-15530: - Assignee: Yibing Shi > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Attachment: HIVE-15530.1.patch > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi > Attachments: HIVE-15530.1.patch > > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Description: Currently when a table is altered, if any of the conditions below is true, HMS would try to update column statistics for the table: # database name is changed # table name is changed # old columns and new columns are not the same As a result, when a column is added to a table, Hive also tries to update column statistics, which is not necessary. We can loosen the last condition by checking whether all existing columns are changed or not. If not, we don't have to update stats info. was: Currently when a table is altered, if any of the conditions below is false, HMS would try to update column statistics for the table: # database name is changed # table name is changed # old columns and new columns are not the same As a result, when a column is added to a table, Hive also tries to update column statistics, which is not necessary. We can loosen the last condition by checking whether all existing columns are changed or not. If not, we don't have to update stats info. > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi > > Currently when a table is altered, if any of the conditions below is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether all existing columns are changed or not. If not, we don't > have to update stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15225) QueryPlan.getJSONValue should code against empty string values
[ https://issues.apache.org/jira/browse/HIVE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15225: -- Status: Patch Available (was: Open) > QueryPlan.getJSONValue should code against empty string values > -- > > Key: HIVE-15225 > URL: https://issues.apache.org/jira/browse/HIVE-15225 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi > Attachments: HIVE-15225.1.patch > > > The current {{QueryPlan.getJSONValue}} implementation is as below: > {code} > public String getJSONValue(Object value) { > String v = "null"; > if (value != null) { > v = value.toString(); > if (v.charAt(0) != '[' && v.charAt(0) != '{') { > v = "\"" + v + "\""; > } > } > return v; > } > {code} > When {{value.toString()}} returns an empty string, a > StringIndexOutOfBoundsException is thrown when "v.charAt(0)" is > evaluated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-15225) QueryPlan.getJSONValue should code against empty string values
[ https://issues.apache.org/jira/browse/HIVE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi reassigned HIVE-15225: - Assignee: Yibing Shi > QueryPlan.getJSONValue should code against empty string values > -- > > Key: HIVE-15225 > URL: https://issues.apache.org/jira/browse/HIVE-15225 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15225.1.patch > > > The current {{QueryPlan.getJSONValue}} implementation is as below: > {code} > public String getJSONValue(Object value) { > String v = "null"; > if (value != null) { > v = value.toString(); > if (v.charAt(0) != '[' && v.charAt(0) != '{') { > v = "\"" + v + "\""; > } > } > return v; > } > {code} > When {{value.toString()}} returns an empty string, a > StringIndexOutOfBoundsException is thrown when "v.charAt(0)" is > evaluated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15225) QueryPlan.getJSONValue should code against empty string values
[ https://issues.apache.org/jira/browse/HIVE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15225: -- Attachment: HIVE-15225.1.patch Attach a quick patch > QueryPlan.getJSONValue should code against empty string values > -- > > Key: HIVE-15225 > URL: https://issues.apache.org/jira/browse/HIVE-15225 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi > Attachments: HIVE-15225.1.patch > > > The current {{QueryPlan.getJSONValue}} implementation is as below: > {code} > public String getJSONValue(Object value) { > String v = "null"; > if (value != null) { > v = value.toString(); > if (v.charAt(0) != '[' && v.charAt(0) != '{') { > v = "\"" + v + "\""; > } > } > return v; > } > {code} > When {{value.toString()}} returns an empty string, a > StringIndexOutOfBoundsException is thrown when "v.charAt(0)" is > evaluated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
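One possible guard for the empty-string case described above is to check for emptiness before calling charAt(0). The sketch below is only an illustration and is not necessarily what HIVE-15225.1.patch does: the wrapper class `JsonValueSketch` is hypothetical, and quoting an empty string as "" is an assumption of this sketch.

```java
class JsonValueSketch {
    // Same shape as the method quoted in the issue, with an isEmpty() check
    // placed first so charAt(0) is never evaluated on an empty string.
    static String getJSONValue(Object value) {
        String v = "null";
        if (value != null) {
            v = value.toString();
            // Assumption: an empty string is emitted as a quoted empty string.
            if (v.isEmpty() || (v.charAt(0) != '[' && v.charAt(0) != '{')) {
                v = "\"" + v + "\"";
            }
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(getJSONValue(""));      // prints ""
        System.out.println(getJSONValue("abc"));   // prints "abc"
        System.out.println(getJSONValue("[1,2]")); // prints [1,2]
        System.out.println(getJSONValue(null));    // prints null
    }
}
```

Because `||` short-circuits, the charAt calls only run when the string has at least one character.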
[jira] [Commented] (HIVE-14609) HS2 cannot drop a function whose associated jar file has been removed
[ https://issues.apache.org/jira/browse/HIVE-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435978#comment-15435978 ] Yibing Shi commented on HIVE-14609: --- To drop a function, Hive first gets the function definition: https://github.com/cloudera/hive/blob/cdh5-1.1.0_5.8.0/ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java#L99 {code} FunctionInfo info = FunctionRegistry.getFunctionInfo(functionName); if (info == null) { if (throwException) { throw new SemanticException(ErrorMsg.INVALID_FUNCTION.getMsg(functionName)); } else { // Fail silently return; } } else if (info.isBuiltIn()) { throw new SemanticException(ErrorMsg.DROP_NATIVE_FUNCTION.getMsg(functionName)); } {code} Unfortunately, {{FunctionRegistry.getFunctionInfo}} tries to load the function into the registry after getting its definition, which includes the step of downloading jars and causes the failure. We should be able to fix this by adding one parameter to the getFunctionInfo method to control whether to add the function to the registry. As for why Hive fails silently: "hive.exec.drop.ignorenonexistent" is set to true by default, and thus Hive doesn't throw any exception when the failure happens. > HS2 cannot drop a function whose associated jar file has been removed > - > > Key: HIVE-14609 > URL: https://issues.apache.org/jira/browse/HIVE-14609 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Chaoyu Tang > > Create a permanent function with below command: > {code:sql} > create function yshi.dummy as 'com.yshi.hive.udf.DummyUDF' using jar > 'hdfs://host-10-17-81-142.coe.cloudera.com:8020/hive/jars/yshi.jar'; > {code} > After that, delete the HDFS file > {{hdfs://host-10-17-81-142.coe.cloudera.com:8020/hive/jars/yshi.jar}}, and > *restart HS2 to remove the loaded class*. 
> Now the function cannot be dropped: > {noformat} > 0: jdbc:hive2://10.17.81.144:1/default> show functions yshi.dummy; > INFO : Compiling > command(queryId=hive_20160821213434_d0271d77-84d8-45ba-8d92-3da1c143bded): > show functions yshi.dummy > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from > deserializer)], properties:null) > INFO : Completed compiling > command(queryId=hive_20160821213434_d0271d77-84d8-45ba-8d92-3da1c143bded); > Time taken: 1.259 seconds > INFO : Executing > command(queryId=hive_20160821213434_d0271d77-84d8-45ba-8d92-3da1c143bded): > show functions yshi.dummy > INFO : Starting task [Stage-0:DDL] in serial mode > INFO : SHOW FUNCTIONS is deprecated, please use SHOW FUNCTIONS LIKE instead. > INFO : Completed executing > command(queryId=hive_20160821213434_d0271d77-84d8-45ba-8d92-3da1c143bded); > Time taken: 0.024 seconds > INFO : OK > +-+--+ > | tab_name | > +-+--+ > | yshi.dummy | > +-+--+ > 1 row selected (3.877 seconds) > 0: jdbc:hive2://10.17.81.144:1/default> drop function yshi.dummy; > INFO : Compiling > command(queryId=hive_20160821213434_47d14df5-59b3-4ebc-9a48-5e1d9c60c1fc): > drop function yshi.dummy > INFO : converting to local > hdfs://host-10-17-81-142.coe.cloudera.com:8020/hive/jars/yshi.jar > ERROR : Failed to read external resource > hdfs://host-10-17-81-142.coe.cloudera.com:8020/hive/jars/yshi.jar > java.lang.RuntimeException: Failed to read external resource > hdfs://host-10-17-81-142.coe.cloudera.com:8020/hive/jars/yshi.jar > at > org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1200) > at > org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1136) > at > org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1126) > at > org.apache.hadoop.hive.ql.exec.FunctionTask.addFunctionResources(FunctionTask.java:304) > at > 
org.apache.hadoop.hive.ql.exec.Registry.registerToSessionRegistry(Registry.java:470) > at > org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:456) > at > org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:245) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:455) > at > org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeDropFunction(FunctionSemanticAnalyzer.java:99) > at > org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeInternal(FunctionSemanticAnalyzer.java:61) > at >
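The fix proposed in the comment above (an extra parameter on the lookup that controls whether the function is also registered, so that a drop never has to fetch the jar) might look roughly like the toy model below. Everything here is hypothetical and stands in for Hive's real classes: `FunctionLookupSketch`, its fields, the `register` flag, and the example jar URI are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class FunctionLookupSketch {
    // Toy stand-ins: function name -> jar URI as recorded in the metastore,
    // and the functions already registered in the current session.
    final Map<String, String> metastore = new HashMap<>();
    final Map<String, String> sessionRegistry = new HashMap<>();
    // Jar URIs that no longer exist on the file system.
    final Set<String> missingJars = new HashSet<>();

    // Looks up a function definition. Resources are only loaded (and the
    // function registered) when 'register' is true, so a caller that merely
    // needs the definition, such as a drop, can pass false.
    String getFunctionInfo(String name, boolean register) {
        String cached = sessionRegistry.get(name);
        if (cached != null) {
            return cached;
        }
        String jar = metastore.get(name);
        if (jar == null) {
            return null;
        }
        if (register) {
            loadResource(jar);
            sessionRegistry.put(name, jar);
        }
        return jar;
    }

    // Simulates downloading the jar; fails when the file has been removed.
    void loadResource(String jar) {
        if (missingJars.contains(jar)) {
            throw new IllegalStateException("Failed to read external resource " + jar);
        }
    }
}
```

In this model a drop would call `getFunctionInfo(name, false)` and succeed even when the jar is gone, while normal query compilation would keep passing `true`.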
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395038#comment-15395038 ] Yibing Shi commented on HIVE-14205: --- Thanks [~ctang.ma]! > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, > HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch, > HIVE-14205.6.patch, HIVE-14205.7.patch > > > Reproduce steps: > {noformat} > hive> CREATE TABLE avro_union_test > > PARTITIONED BY (p int) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > > TBLPROPERTIES ('avro.schema.literal'='{ > >"type":"record", > >"name":"nullUnionTest", > >"fields":[ > > { > > "name":"value", > > "type":[ > > "null", > > "int", > > "long" > > ], > > "default":null > > } > >] > > }'); > OK > Time taken: 0.105 seconds > hive> alter table avro_union_test add partition (p=1); > OK > Time taken: 0.093 seconds > hive> select * from avro_union_test; > FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: > Failed with exception Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported > yet.java.lang.RuntimeException: Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported yet. 
> at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:140) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {noformat} > Another test case to show this problem is: > {noformat} > hive> create table avro_union_test2 (value uniontype) stored as > avro; > OK > Time taken: 0.053 seconds > hive> show create table avro_union_test2; > OK > CREATE TABLE `avro_union_test2`( > `value` uniontype COMMENT '') > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391794#comment-15391794 ]

Yibing Shi commented on HIVE-14205:
---
These errors seem irrelevant.

> Hive doesn't support union type with AVRO file format
> ---
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch, HIVE-14205.6.patch, HIVE-14205.7.patch
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
>     > PARTITIONED BY (p int)
>     > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>     > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>     > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>     > TBLPROPERTIES ('avro.schema.literal'='{
>     >    "type":"record",
>     >    "name":"nullUnionTest",
>     >    "fields":[
>     >       {
>     >          "name":"value",
>     >          "type":[
>     >             "null",
>     >             "int",
>     >             "long"
>     >          ],
>     >          "default":null
>     >       }
>     >    ]
>     > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.
> java.lang.RuntimeException: Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.
>       at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
>       at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
>       at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
>       at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
>       at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
>       at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
>       at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
>       at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
>       at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
>       at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
>       at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
>       at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
>       at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
>       at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
>       at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
>       at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
>       at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
>       at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> Another test case to show this problem is:
> {noformat}
> hive> create table avro_union_test2 (value uniontype) stored as avro;
> OK
> Time taken: 0.053 seconds
> hive> show create table avro_union_test2;
> OK
> CREATE TABLE `avro_union_test2`(
>   `value` uniontype COMMENT '')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS INPUTFORMAT
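The reproduce steps above declare the field `value` as the Avro union `["null", "int", "long"]`, and the error message suggests that the `null` branch ends up as a Hive `void` type that the object inspector code cannot handle. The following is only a minimal sketch of the idea of separating the `null` branch from the remaining branches; `split_nullable_union` is a hypothetical helper written for illustration and is not code from Hive's `AvroSerdeUtils` or from any of the attached patches.

```python
import json

# Hypothetical helper (not part of Hive): split an Avro type into a
# "nullable" flag and the list of its non-null branches.
def split_nullable_union(avro_type):
    # A non-union type has exactly one branch and is not nullable.
    if not isinstance(avro_type, list):
        return False, [avro_type]
    non_null = [branch for branch in avro_type if branch != "null"]
    # The union is nullable if dropping "null" removed at least one branch.
    return len(non_null) < len(avro_type), non_null

# Schema literal taken from the reproduce steps in the issue description.
SCHEMA_LITERAL = """
{
  "type": "record",
  "name": "nullUnionTest",
  "fields": [
    {"name": "value", "type": ["null", "int", "long"], "default": null}
  ]
}
"""

schema = json.loads(SCHEMA_LITERAL)
for field in schema["fields"]:
    nullable, branches = split_nullable_union(field["type"])
    print(field["name"], nullable, branches)
```

Under this assumption, the field would be treated as a nullable union of `int` and `long` rather than a three-way union that includes a `void` member. Whether the actual patch takes this route is not stated in the messages here.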
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-14205:
---
Attachment: HIVE-14205.7.patch

Fix the qtests

> Hive doesn't support union type with AVRO file format
> ---
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch, HIVE-14205.6.patch, HIVE-14205.7.patch
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391021#comment-15391021 ]

Yibing Shi commented on HIVE-14205:
---
It looks like a recent change in the master branch breaks my test. After applying the latest changes from the master branch, I can reproduce the test failure. I will look into it further.

> Hive doesn't support union type with AVRO file format
> ---
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch, HIVE-14205.6.patch
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-14205:
---
Attachment: HIVE-14205.6.patch

Modify the itests to use text files. See how it goes now

> Hive doesn't support union type with AVRO file format
> ---
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch, HIVE-14205.6.patch
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387114#comment-15387114 ]

Yibing Shi commented on HIVE-14205:
---
Still failed. I will work on a new patch.

> Hive doesn't support union type with AVRO file format
> ---
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-14205:
---
Attachment: HIVE-14205.5.patch

Attached a new patch that includes the latest changes in the master branch. If this still doesn't work, I will remove the binary files and use insert instead, as [~ctang.ma] said.

> Hive doesn't support union type with AVRO file format
> ---
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-14205:
---
Attachment: HIVE-14205.4.patch

I have verified this patch can be applied:
{noformat}
➜ repo git:(master) patch -p0 <~/Downloads/HIVE-14205.4.patch
File data/files/union_non_nullable.avro: git binary diffs are not supported.
File data/files/union_nullable.avro: git binary diffs are not supported.
patching file ql/src/test/queries/clientnegative/avro_non_nullable_union.q
patching file ql/src/test/queries/clientpositive/avro_nullable_union.q
patching file ql/src/test/results/clientnegative/avro_non_nullable_union.q.out
patching file ql/src/test/results/clientpositive/avro_nullable_union.q.out
patching file serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
patching file serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
patching file serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
patching file serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
{noformat}

> Hive doesn't support union type with AVRO file format
> ---
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381670#comment-15381670 ]

Yibing Shi commented on HIVE-14205:
---
[~ctang.ma], could you please help check whether you can apply the patch? I can apply it on my laptop.

> Hive doesn't support union type with AVRO file format
> ---
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Attachment: HIVE-14205.3.patch I created this patch with command: {noformat} git diff --no-prefix --binary HEAD~1 HEAD > ~/Downloads/HIVE-14205.3.patch {noformat} > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, > HIVE-14205.3.patch > > > Reproduce steps: > {noformat} > hive> CREATE TABLE avro_union_test > > PARTITIONED BY (p int) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > > TBLPROPERTIES ('avro.schema.literal'='{ > >"type":"record", > >"name":"nullUnionTest", > >"fields":[ > > { > > "name":"value", > > "type":[ > > "null", > > "int", > > "long" > > ], > > "default":null > > } > >] > > }'); > OK > Time taken: 0.105 seconds > hive> alter table avro_union_test add partition (p=1); > OK > Time taken: 0.093 seconds > hive> select * from avro_union_test; > FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: > Failed with exception Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported > yet.java.lang.RuntimeException: Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported yet. 
> at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {noformat} > Another test case to show this problem is: > {noformat} > hive> create table avro_union_test2 (value uniontype<int,bigint>) stored as > avro; > OK > Time taken: 0.053 seconds > hive> show create table avro_union_test2; > OK > CREATE TABLE `avro_union_test2`( > `value` uniontype<int,bigint> COMMENT '') > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED AS
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Attachment: (was: HIVE-14205.3.patch)
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Attachment: HIVE-14205.3.patch
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381651#comment-15381651 ] Yibing Shi commented on HIVE-14205: --- [~ctang.ma], these 2 files are binary AVRO files. Looks like they are causing trouble to git apply. Let me recreate the patch file with the command described [here|http://stackoverflow.com/questions/17152171/git-cannot-apply-binary-patch-without-full-index-line]
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Attachment: HIVE-14205.2.patch submit a new patch based on code review
[jira] [Issue Comment Deleted] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Comment: was deleted (was: Just found that the current Hive union type implementation fundamentally conflicts with the Avro implementation. Hive currently uses {{UnionObject}} as the value of union type columns. For example, if we create a table like the one below: {noformat} create table avro_union_test2 (value uniontype<int,bigint>); {noformat} we cannot simply store int or bigint data in column "value". Instead, we have to use the UDF {{create_union}} to create a {{UnionObject}} value: {noformat} insert overwrite table avro_union_test2 select 1 as value; -- this fails insert overwrite table avro_union_test2 select create_union(0,1,2L) as value; -- this succeeds {noformat} If the table uses the text file format, the data stored in the file looks like this: {noformat} 0:1 {noformat} where 0 is the tag/offset of the object and 1 is the actual value (the 2L part is used only for type checking and isn't stored in the data file at all). AvroSerDe stores data in a similar way: it stores the type offset together with the actual data. But when reading data, Avro returns the actual data instead of a {{UnionObject}}: https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L179 For the data created above by {{create_union}}, the AvroSerDe returns an Integer instead of a {{UnionObject}}. This makes Hive fail in later operations (writing to data files or formatting as a JSON string). I will check whether there is a way to fix this.) 
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378929#comment-15378929 ] Yibing Shi commented on HIVE-14205: --- Just found that the current Hive union type implementation fundamentally conflicts with the Avro implementation. Hive currently uses {{UnionObject}} as the value of union type columns. For example, if we create a table like the one below: {noformat} create table avro_union_test2 (value uniontype<int,bigint>); {noformat} we cannot simply store int or bigint data in column "value". Instead, we have to use the UDF {{create_union}} to create a {{UnionObject}} value: {noformat} insert overwrite table avro_union_test2 select 1 as value; -- this fails insert overwrite table avro_union_test2 select create_union(0,1,2L) as value; -- this succeeds {noformat} If the table uses the text file format, the data stored in the file looks like this: {noformat} 0:1 {noformat} where 0 is the tag/offset of the object and 1 is the actual value (the 2L part is used only for type checking and isn't stored in the data file at all). AvroSerDe stores data in a similar way: it stores the type offset together with the actual data. But when reading data, Avro returns the actual data instead of a {{UnionObject}}: https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L179 For the data created above by {{create_union}}, the AvroSerDe returns an Integer instead of a {{UnionObject}}. This makes Hive fail in later operations (writing to data files or formatting as a JSON string). I will check whether there is a way to fix this. 
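To make the mismatch described in the comment above concrete, here is a small standalone Python model of it. It is only an illustration, not Hive or Avro code and not the fix in the attached patches: {{TaggedUnion}}, {{to_text}}, {{retag}} and the branch checks are hypothetical names, and the column is assumed to have two branches, a 32-bit int and a 64-bit bigint.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class TaggedUnion:
    """Hypothetical stand-in for Hive's UnionObject: branch tag plus value."""
    tag: int
    value: Any


def to_text(u: TaggedUnion) -> str:
    """Render as 'tag:value', matching the '0:1' text-file example."""
    return f"{u.tag}:{u.value}"


def is_int32(v: Any) -> bool:
    """True if v fits the assumed 32-bit 'int' branch."""
    return isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31


def is_int64(v: Any) -> bool:
    """True if v fits the assumed 64-bit 'bigint' branch."""
    return isinstance(v, int) and not isinstance(v, bool) and -2**63 <= v < 2**63


# Branch checks for the assumed two-branch union, in declaration order.
INT_BIGINT: Sequence[Callable[[Any], bool]] = (is_int32, is_int64)


def retag(value: Any, branches: Sequence[Callable[[Any], bool]]) -> TaggedUnion:
    """Wrap a bare value in the first branch that accepts it."""
    for tag, accepts in enumerate(branches):
        if accepts(value):
            return TaggedUnion(tag, value)
    raise TypeError(f"{value!r} matches no branch of the union")


written = TaggedUnion(tag=0, value=1)  # what create_union(0, 1, 2L) represents
bare = written.value                   # what the reader hands back: just the value
restored = retag(bare, INT_BIGINT)     # tag recovered by matching the branches
print(to_text(written), bare, to_text(restored))
```

Note that first-match re-tagging is lossy in this model: a small value written under the bigint branch (tag 1) would come back under the int branch (tag 0), so it should be read as a picture of the problem rather than a proposed solution.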
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372699#comment-15372699 ] Yibing Shi commented on HIVE-14205: --- code review: https://reviews.apache.org/r/49952/ > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-14205.1.patch > > > Reproduce steps: > {noformat} > hive> CREATE TABLE avro_union_test > > PARTITIONED BY (p int) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > > TBLPROPERTIES ('avro.schema.literal'='{ > >"type":"record", > >"name":"nullUnionTest", > >"fields":[ > > { > > "name":"value", > > "type":[ > > "null", > > "int", > > "long" > > ], > > "default":null > > } > >] > > }'); > OK > Time taken: 0.105 seconds > hive> alter table avro_union_test add partition (p=1); > OK > Time taken: 0.093 seconds > hive> select * from avro_union_test; > FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: > Failed with exception Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported > yet.java.lang.RuntimeException: Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported yet. 
> at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {noformat} > Another test case to show this problem is: > {noformat} > hive> create table avro_union_test2 (value uniontype) stored as > avro; > OK > Time taken: 0.053 seconds > hive> show create table avro_union_test2; > OK > CREATE TABLE `avro_union_test2`( > `value` uniontype COMMENT '') > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT >
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Attachment: HIVE-14205.1.patch > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi > Attachments: HIVE-14205.1.patch
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Assignee: Yibing Shi Status: Patch Available (was: Open) > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-14205.1.patch
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Attachment: (was: HIVE-14205.1.patch) > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi > > Reproduce steps: > {noformat} > hive> CREATE TABLE avro_union_test > > PARTITIONED BY (p int) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > > TBLPROPERTIES ('avro.schema.literal'='{ > >"type":"record", > >"name":"nullUnionTest", > >"fields":[ > > { > > "name":"value", > > "type":[ > > "null", > > "int", > > "long" > > ], > > "default":null > > } > >] > > }'); > OK > Time taken: 0.105 seconds > hive> alter table avro_union_test add partition (p=1); > OK > Time taken: 0.093 seconds > hive> select * from avro_union_test; > FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: > Failed with exception Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported > yet.java.lang.RuntimeException: Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported yet. 
> at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {noformat} > Another test case to show this problem is: > {noformat} > hive> create table avro_union_test2 (value uniontype) stored as > avro; > OK > Time taken: 0.053 seconds > hive> show create table avro_union_test2; > OK > CREATE TABLE `avro_union_test2`( > `value` uniontype COMMENT '') > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > LOCATION > 'hdfs://localhost/user/hive/warehouse/avro_union_test2' > TBLPROPERTIES
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-14205: -- Attachment: HIVE-14205.1.patch > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi > Attachments: HIVE-14205.1.patch
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370611#comment-15370611 ] Yibing Shi commented on HIVE-14205: --- Will submit a patch later > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi
[jira] [Commented] (HIVE-13065) Hive throws NPE when writing map type data to a HBase backed table
[ https://issues.apache.org/jira/browse/HIVE-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149531#comment-15149531 ] Yibing Shi commented on HIVE-13065: --- How about the reading part? If we skip the null values, would it affect the reading part? And what if we have a null value in key set? This is possible in theory. > Hive throws NPE when writing map type data to a HBase backed table > -- > > Key: HIVE-13065 > URL: https://issues.apache.org/jira/browse/HIVE-13065 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 1.1.0, 2.0.0 >Reporter: Yongzhi Chen >Assignee: Yongzhi Chen > Attachments: HIVE-13065.1.patch > > > Hive throws NPE when writing data to a HBase backed table with below > conditions: > # There is a map type column > # The map type column has NULL in its values > Below are the reproduce steps: > *1) Create a HBase backed Hive table* > {code:sql} > create table hbase_test (id bigint, data map) > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > with serdeproperties ("hbase.columns.mapping" = ":key,cf:map_col") > tblproperties ("hbase.table.name" = "hive_test"); > {code} > *2) insert data into above table* > {code:sql} > insert overwrite table hbase_test select 1 as id, map('abcd', null) as data > from src limit 1; > {code} > The mapreduce job for insert query fails. 
Error messages are as below: > {noformat} > 2016-02-15 02:26:33,225 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row (tag=0) > {"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253) > ... 7 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:731) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > ... 
7 more > Caused by: org.apache.hadoop.hive.serde2.SerDeException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:286) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:666) > ... 14 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:221) > at > org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:236) > at > org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:275) > at > org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:222) > at > org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194) > at > org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118) > at > org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282) > ... 15 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
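The failure mode and the question raised in the comment above can be illustrated outside Hive. The following is a minimal sketch, assuming the proposed fix skips map entries whose key or value is null (the comment only implies this). `NullSkippingMapWriter` and all of its methods are hypothetical names for illustration and are not the `HBaseRowSerializer` code.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Hive's HBaseRowSerializer.
class NullSkippingMapWriter {

    // Mimics a writer that dereferences every value, so a map such as
    // map('abcd', null) fails with a NullPointerException.
    static Map<String, byte[]> writeStrict(Map<String, String> data) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : data.entrySet()) {
            cells.put(e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return cells;
    }

    // Assumed fix: silently skip entries whose key or value is null.
    static Map<String, byte[]> writeSkippingNulls(Map<String, String> data) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (e.getKey() == null || e.getValue() == null) {
                continue; // nothing is stored for this entry
            }
            cells.put(e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return cells;
    }

    // Reading side: rebuilds the map from whatever cells were stored.
    static Map<String, String> read(Map<String, byte[]> cells) {
        Map<String, String> data = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : cells.entrySet()) {
            data.put(e.getKey(), new String(e.getValue(), StandardCharsets.UTF_8));
        }
        return data;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("abcd", null);
        row.put("k", "v");
        // The null-valued key "abcd" is absent after the round trip.
        System.out.println(read(writeSkippingNulls(row))); // prints {k=v}
    }
}
```

Under this assumption the write no longer fails, but the round trip is lossy: a reader cannot tell a key that was written with a null value from a key that was never written, which is the concern about the reading part.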
[jira] [Commented] (HIVE-11733) UDF GenericUDFReflect cannot find classes added by "ADD JAR"
[ https://issues.apache.org/jira/browse/HIVE-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908836#comment-14908836 ]

Yibing Shi commented on HIVE-11733:
-----------------------------------

Sorry, got distracted by other stuff. Will add a test case for this.

> UDF GenericUDFReflect cannot find classes added by "ADD JAR"
> ------------------------------------------------------------
>
>                 Key: HIVE-11733
>                 URL: https://issues.apache.org/jira/browse/HIVE-11733
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 1.2.1
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-11733.1.patch
>
>
> When running the command below:
> {quote}
> hive -e "add jar /root/hive/TestReflect.jar; \
> select reflect('com.yshi.hive.TestReflect', 'testReflect', code) from sample_07 limit 3"
> {quote}
> we get the error below:
> {noformat}
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: UDFReflect evaluate
> {noformat}
> The full stack trace is:
> {noformat}
> 15/09/04 07:00:37 [main]: INFO compress.CodecPool: Got brand-new decompressor [.bz2]
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: UDFReflect evaluate
> 15/09/04 07:00:37 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: UDFReflect evaluate
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: UDFReflect evaluate
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1657)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: UDFReflect evaluate
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFReflect.evaluate(GenericUDFReflect.java:107)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:185)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:424)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:416)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
>     ... 13 more
> Caused by: java.lang.ClassNotFoundException: com.yshi.hive.TestReflect
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:190)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFReflect.evaluate(GenericUDFReflect.java:105)
>     ... 22 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
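The HIVE-11733 stack trace shows the lookup going through Class.forName(Class.java:190) and ending in sun.misc.Launcher$AppClassLoader, which only sees the JVM's startup classpath. A minimal Java sketch of that lookup path follows. It assumes (this is not stated in the issue) that jars registered at runtime are reachable through the thread context class loader. ContextLoaderSketch, RecordingLoader and loadWithContextLoader are hypothetical names for illustration; this is not the attached patch and not Hive code.

```java
import java.util.ArrayList;
import java.util.List;

final class ContextLoaderSketch {

    /**
     * Hypothetical stand-in for a session class loader. A real one would
     * resolve classes from jars registered at runtime; this one only records
     * which names it was asked for, so the lookup path can be observed.
     */
    static final class RecordingLoader extends ClassLoader {
        final List<String> requested = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            requested.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    /** Resolves a class through the thread context class loader when one is set. */
    static Class<?> loadWithContextLoader(String className) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // No context loader set: fall back to the loader of this class.
            loader = ContextLoaderSketch.class.getClassLoader();
        }
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        RecordingLoader session = new RecordingLoader(ContextLoaderSketch.class.getClassLoader());
        Thread.currentThread().setContextClassLoader(session);
        try {
            try {
                // One-argument form: uses the caller's own class loader only.
                Class.forName("com.yshi.hive.TestReflect");
            } catch (ClassNotFoundException e) {
                // Expected in this sketch: the class is not on the classpath.
            }
            System.out.println("after Class.forName(name): " + session.requested);

            try {
                loadWithContextLoader("com.yshi.hive.TestReflect");
            } catch (ClassNotFoundException e) {
                // Still missing here, but the session loader was consulted.
            }
            System.out.println("after context-loader lookup: " + session.requested);
        } finally {
            Thread.currentThread().setContextClassLoader(original);
        }
    }
}
```

In this sketch the one-argument Class.forName call never reaches the recording loader, while the three-argument form does. That is the difference that matters when a class lives only in a jar added after JVM startup.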
[jira] [Assigned] (HIVE-11733) UDF GenericUDFReflect cannot find classes added by "ADD JAR"
[ https://issues.apache.org/jira/browse/HIVE-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi reassigned HIVE-11733:
---------------------------------

    Assignee: Yibing Shi
[jira] [Updated] (HIVE-11733) UDF GenericUDFReflect cannot find classes added by "ADD JAR"
[ https://issues.apache.org/jira/browse/HIVE-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-11733:
------------------------------

    Attachment: HIVE-11733.1.patch

Upload the patch.
[jira] [Assigned] (HIVE-11216) UDF GenericUDFMapKeys throws NPE when a null map value is passed in
[ https://issues.apache.org/jira/browse/HIVE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi reassigned HIVE-11216:
---------------------------------

    Assignee: Yibing Shi

UDF GenericUDFMapKeys throws NPE when a null map value is passed in
-------------------------------------------------------------------

                Key: HIVE-11216
                URL: https://issues.apache.org/jira/browse/HIVE-11216
            Project: Hive
         Issue Type: Bug
         Components: UDF
   Affects Versions: 1.2.0
           Reporter: Yibing Shi
           Assignee: Yibing Shi

We can reproduce the problem as below:
{noformat}
hive> show create table map_txt;
OK
CREATE TABLE `map_txt`(
  `id` int,
  `content` map<int,string>)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
...
Time taken: 0.233 seconds, Fetched: 18 row(s)
hive> select * from map_txt;
OK
1    NULL
Time taken: 0.679 seconds, Fetched: 1 row(s)
hive> select id, map_keys(content) from map_txt;
Error during job, obtaining debugging information...
Examining task ID: task_1435534231122_0025_m_00 (and more) from job job_1435534231122_0025

Task with the most failures(4):
-----
Task ID:
  task_1435534231122_0025_m_00

URL:
  http://host-10-17-80-40.coe.cloudera.com:8088/taskdetails.jsp?jobid=job_1435534231122_0025tipid=task_1435534231122_0025_m_00
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:1,content:null}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:1,content:null}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:559)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating map_keys(content)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
    ... 9 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFMapKeys.evaluate(GenericUDFMapKeys.java:64)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
    ... 13 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
hive>
{noformat}
The error is as below (in mappers):
{noformat}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFMapKeys.evaluate(GenericUDFMapKeys.java:64)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.getNewKey(KeyWrapperFactory.java:113)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:778)
    ... 17 more
{noformat}
Looking at the source code:
{code}
public Object evaluate(DeferredObject[] arguments) throws HiveException {
[jira] [Updated] (HIVE-11216) UDF GenericUDFMapKeys throws NPE when a null map value is passed in
[ https://issues.apache.org/jira/browse/HIVE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-11216:
------------------------------

    Attachment: HIVE-11216.patch
[jira] [Updated] (HIVE-11216) UDF GenericUDFMapKeys throws NPE when a null map value is passed in
[ https://issues.apache.org/jira/browse/HIVE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-11216:
------------------------------

    Attachment: HIVE-11216.1.patch

Attach a new patch.
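The GenericUDFMapKeys source listing referenced in HIVE-11216 is cut off in this archive, so the following is not that code. It is only a minimal, hypothetical Java sketch of the kind of null guard the NullPointerException suggests is missing: check the map before touching its key set. NullSafeMapKeysSketch and mapKeys are illustrative names, and returning an empty list (rather than NULL) for a null map is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NullSafeMapKeysSketch {

    /** Returns the keys of the map, or an empty list when the map itself is null. */
    static <K, V> List<K> mapKeys(Map<K, V> map) {
        List<K> keys = new ArrayList<>();
        if (map == null) {
            // Guard: a NULL map column yields no keys instead of an NPE.
            return keys;
        }
        keys.addAll(map.keySet());
        return keys;
    }

    public static void main(String[] args) {
        Map<Integer, String> content = new LinkedHashMap<>();
        content.put(1, "a");
        content.put(2, "b");
        System.out.println(mapKeys(content)); // keys of a populated map
        System.out.println(mapKeys(null));    // a null map, as in the row {id:1,content:null}
    }
}
```

Whether a null map should produce an empty array or a NULL result is a design choice that affects downstream queries. The sketch picks the empty list only to keep the example short.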
[jira] [Commented] (HIVE-11150) Remove wrong warning message related to chgrp
[ https://issues.apache.org/jira/browse/HIVE-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609303#comment-14609303 ]

Yibing Shi commented on HIVE-11150:
-----------------------------------

Should we also fix {{Hadoop20Shims}} and {{Hadoop20SShims}}? Should we also protect the call to {{chmod}} in a similar way?

Remove wrong warning message related to chgrp
---------------------------------------------

                Key: HIVE-11150
                URL: https://issues.apache.org/jira/browse/HIVE-11150
            Project: Hive
         Issue Type: Bug
         Components: Shims
   Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0
           Reporter: Yongzhi Chen
           Assignee: Yongzhi Chen
           Priority: Minor
        Attachments: HIVE-11150.1.patch

When using a file system other than HDFS, users see a warning message about hdfs chgrp. The warning is annoying and confusing, so we should remove it.
An example of the warning:
{noformat}
hive> insert overwrite table s3_test select total_emp, salary, description from sample_07 limit 5;
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
{noformat}