[jira] [Created] (HIVE-25757) Use cached database type to choose metastore backend queries
Yongzhi Chen created HIVE-25757: --- Summary: Use cached database type to choose metastore backend queries Key: HIVE-25757 URL: https://issues.apache.org/jira/browse/HIVE-25757 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 4.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen In HIVE-21075, we call DatabaseProduct.determineDatabaseProduct, which can be expensive; we should cache the database type and use it to choose the backend-specific metastore queries. -- This message was sent by Atlassian Jira (v8.20.1#820001)
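A minimal sketch of the caching the summary suggests. The determineDatabaseProduct call comes from the description, but the wrapper class and field below are purely illustrative (not the committed patch), and the exact signature of determineDatabaseProduct varies across Hive versions and is simplified here:

{noformat}
// Hypothetical illustration: detect the backend database product once and
// reuse the cached value instead of re-running the detection on every query.
public final class CachedDatabaseProduct {
  private static volatile DatabaseProduct cached;

  static DatabaseProduct get(String jdbcProductName) {
    DatabaseProduct p = cached;
    if (p == null) {
      synchronized (CachedDatabaseProduct.class) {
        if (cached == null) {
          cached = DatabaseProduct.determineDatabaseProduct(jdbcProductName);
        }
        p = cached;
      }
    }
    return p;
  }
}
{noformat}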
[jira] [Created] (HIVE-25238) Make excluded SSL cipher suites configurable for Hive Web UI and HS2
Yongzhi Chen created HIVE-25238: --- Summary: Make excluded SSL cipher suites configurable for Hive Web UI and HS2 Key: HIVE-25238 URL: https://issues.apache.org/jira/browse/HIVE-25238 Project: Hive Issue Type: Improvement Components: HiveServer2, Web UI Reporter: Yongzhi Chen When starting a Jetty HTTP server, one can explicitly exclude certain (insecure) SSL cipher suites. This can be especially important when Hive needs to be compliant with security regulations. We need to add properties so that the Hive Web UI and HiveServer2 support this as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
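For reference, Jetty's SslContextFactory already exposes excluded cipher suites; a sketch of wiring a configuration property into it (the property name here is made up for illustration and is not necessarily the one this Jira adds):

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class SslFactoryBuilder {
  // Sketch: read a comma-separated list of cipher suites to exclude from the
  // configuration (hypothetical property name) and hand it to Jetty.
  static SslContextFactory.Server build(Configuration conf) {
    SslContextFactory.Server factory = new SslContextFactory.Server();
    String excluded = conf.get("hive.ssl.http.exclude.ciphersuites", "");
    if (!excluded.isEmpty()) {
      factory.setExcludeCipherSuites(excluded.split(","));
    }
    return factory;
  }
}
{noformat}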
[jira] [Created] (HIVE-25211) Create database throws NPE
Yongzhi Chen created HIVE-25211: --- Summary: Create database throws NPE Key: HIVE-25211 URL: https://issues.apache.org/jira/browse/HIVE-25211 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Yongzhi Chen <11>1 2021-06-06T17:32:48.964Z metastore-0.metastore-service.warehouse-1622998329-9klr.svc.cluster.local metastore 1 5ad83e8e-bf89-4ad3-b1fb-51c73c7133b7 [mdc@18060 class="metastore.RetryingHMSHandler" level="ERROR" thread="pool-9-thread-16"] MetaException(message:java.lang.NullPointerException) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:8115) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1629) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121) at com.sun.proxy.$Proxy31.create_database(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16795) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16779) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:120) at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:128) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:491) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:480) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:476) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1556) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1554) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1554) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1618) ... 21 more -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24552) Possible HMS connections leak or accumulation in loadDynamicPartitions
Yongzhi Chen created HIVE-24552: --- Summary: Possible HMS connections leak or accumulation in loadDynamicPartitions Key: HIVE-24552 URL: https://issues.apache.org/jira/browse/HIVE-24552 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen When loadDynamicPartitions (Hive.java) is called, it generates several threads to handle FileMove. These threads may create HiveMetaStore connections. These connections may not be closed in time, which leads to many accumulated connections. The following log was captured while running insert overwrite many times; you can see that these threads created new HMS connections and that the total number of open connections is large. The finalizer closes the connections and sometimes hits errors: {noformat} <14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-14"] Opened a connection to metastore, current connections: 44021 <14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-14"] Connected to metastore. <14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.RetryingMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-14"] RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 lifetime=0 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-5"] Opened a connection to metastore, current connections: 44022 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-5"] Connected to metastore. <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.RetryingMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-5"] RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 lifetime=0 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-6"] Opened a connection to metastore, current connections: 44023 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-6"] Connected to metastore. 
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.RetryingMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-6"] RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 lifetime=0 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-3"] Opened a connection to metastore, current connections: 44024 <14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43904 <14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43903 <14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43902 <14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43901 <14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"]
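A sketch of the kind of cleanup that avoids this accumulation: each load-dynamic-partitions worker closes its thread-local metastore client when it finishes, instead of leaving the connection to the Finalizer. Hive.closeCurrent() is an existing static cleanup hook in Hive.java; the wrapper shown below is illustrative, not the actual patch:

{noformat}
import java.util.concurrent.Callable;
import org.apache.hadoop.hive.ql.metadata.Hive;

public class PartitionWorkWrapper {
  // Illustrative only: wrap the per-partition work so that the thread-local
  // HMS connection is always released when the worker finishes.
  static Callable<Void> wrap(Runnable partitionWork) {
    return () -> {
      try {
        partitionWork.run();   // e.g. file moves / registering one partition
        return null;
      } finally {
        Hive.closeCurrent();   // closes this thread's metastore connection
      }
    };
  }
}
{noformat}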
[jira] [Created] (HIVE-24392) Send table id in get_partitions_by_names_req api
Yongzhi Chen created HIVE-24392: --- Summary: Send table id in get_partitions_by_names_req api Key: HIVE-24392 URL: https://issues.apache.org/jira/browse/HIVE-24392 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen Table id is not part of the get_partitions_by_names_req API thrift definition; add it in this Jira. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24292) hive webUI should support keystoretype by config
Yongzhi Chen created HIVE-24292: --- Summary: hive webUI should support keystoretype by config Key: HIVE-24292 URL: https://issues.apache.org/jira/browse/HIVE-24292 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Yongzhi Chen Assignee: Yongzhi Chen We need a property to pass in the keystore type for the Web UI too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24253) HMS needs to support keystore/truststores types besides JKS
Yongzhi Chen created HIVE-24253: --- Summary: HMS needs to support keystore/truststores types besides JKS Key: HIVE-24253 URL: https://issues.apache.org/jira/browse/HIVE-24253 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen When HiveMetaStoreClient connects to HMS with SSL enabled, HMS should support the default keystore type specified for the JDK and not always use JKS. As HIVE-23958 did for Hive, HMS should support setting additional keystore/truststore types used by different applications, such as FIPS crypto algorithms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
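The JDK side of this is standard JSSE; a sketch of loading a trust store whose type comes from configuration instead of being hard-coded to JKS (the surrounding helper method is illustrative, not HMS code):

{noformat}
import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.TrustManagerFactory;

public class ConfigurableTrustStore {
  // Sketch: use a configurable store type (e.g. BCFKS for FIPS) instead of
  // hard-coding "JKS"; falls back to the JVM default type when unset.
  static TrustManagerFactory load(String path, char[] password, String type)
      throws Exception {
    KeyStore ks =
        KeyStore.getInstance(type != null ? type : KeyStore.getDefaultType());
    try (FileInputStream in = new FileInputStream(path)) {
      ks.load(in, password);
    }
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(ks);
    return tmf;
  }
}
{noformat}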
[jira] [Created] (HIVE-24236) Connection leak in TxnHandler
Yongzhi Chen created HIVE-24236: --- Summary: Connection leak in TxnHandler Key: HIVE-24236 URL: https://issues.apache.org/jira/browse/HIVE-24236 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen We see failures in QE tests with cannot allocate connections errors. The exception stack like following: {noformat} 2020-09-29T18:44:26,563 INFO [Heartbeater-0]: txn.TxnHandler (TxnHandler.java:checkRetryable(3733)) - Non-retryable error in heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, general error (SQLState=null, ErrorCode=0) 2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to select from transaction database org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, general error at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118) at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605) at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598) at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452) at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) at com.sun.proxy.$Proxy63.heartbeat(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247) at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213) at com.sun.proxy.$Proxy64.heartbeat(Unknown Source) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1112) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) ... 
29 more ) at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2747) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452) at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source) {noformat} and {noformat} Caused by: java.util.NoSuchElementException: Timeout waiting for idle object at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1134) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) ... 53 more ) at org.apache.hadoop.hive.metastore.txn.TxnHandler.cleanupRecords(TxnHandler.java:3375) at org.apache.hadoop.hive.metastore.AcidEventListener.onDropTable(AcidEventListener.java:65) at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$19.notify(MetaStoreListenerNotifier.java:103) at
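The general shape of a fix for this class of leak is to guarantee that the pooled JDBC connection is returned on every code path; a generic sketch (not the actual TxnHandler change, and the query is illustrative):

{noformat}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class PooledQuerySketch {
  // Generic sketch: try-with-resources guarantees the pooled connection is
  // returned even when the thread is interrupted or the query throws.
  static long countOpenTxns(DataSource pool) throws Exception {
    try (Connection dbConn = pool.getConnection();
         PreparedStatement ps =
             dbConn.prepareStatement("SELECT COUNT(*) FROM TXNS");
         ResultSet rs = ps.executeQuery()) {
      return rs.next() ? rs.getLong(1) : 0L;
    }
  }
}
{noformat}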
[jira] [Created] (HIVE-22461) NPE Metastore Transformer
Yongzhi Chen created HIVE-22461: --- Summary: NPE Metastore Transformer Key: HIVE-22461 URL: https://issues.apache.org/jira/browse/HIVE-22461 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.1.2 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The stack looks as following: {noformat} 2019-10-08 18:09:12,198 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [pool-6-thread-328]: Starting translation for processor Hiveserver2#3.1.2000.7.0.2.0...@vc0732.halxg.cloudera.com on list 1 2019-10-08 18:09:12,198 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-328]: java.lang.NullPointerException at org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transform(MetastoreDefaultTransformer.java:99) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3391) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3352) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) at com.sun.proxy.$Proxy28.get_table_req(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16633) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16617) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-10-08 18:09:12,199 ERROR org.apache.thrift.server.TThreadPoolServer: [pool-6-thread-328]: Error occurred during processing of message. java.lang.NullPointerException: null at org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transform(MetastoreDefaultTransformer.java:99) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3391) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3352) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) ~[?:?] 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_141] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at com.sun.proxy.$Proxy28.get_table_req(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16633) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16617) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at
[jira] [Created] (HIVE-21840) Hive Metastore Translation: Bucketed table Readonly capability
Yongzhi Chen created HIVE-21840: --- Summary: Hive Metastore Translation: Bucketed table Readonly capability Key: HIVE-21840 URL: https://issues.apache.org/jira/browse/HIVE-21840 Project: Hive Issue Type: New Feature Reporter: Yongzhi Chen Assignee: Naveen Gangam Impala needs a new capability to indicate that only read is supported for bucketed tables, no matter whether the table is managed or external, ACID or not. Also, in the current implementation, when HIVEBUCKET2 is not in the capabilities list, a bucketed external table is returned as an un-bucketed one; we need a way to know it was "downgraded" from a bucketed table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21839) Hive Metastore Translation: Hive need to create a type of table if the client does not have the write capability for it
Yongzhi Chen created HIVE-21839: --- Summary: Hive Metastore Translation: Hive need to create a type of table if the client does not have the write capability for it Key: HIVE-21839 URL: https://issues.apache.org/jira/browse/HIVE-21839 Project: Hive Issue Type: New Feature Reporter: Yongzhi Chen Assignee: Naveen Gangam Hive can either return an error message or provide an API call to check the permission even without a table instance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21838) Hive Metastore Translation: Add API call to tell client why table has limited access
Yongzhi Chen created HIVE-21838: --- Summary: Hive Metastore Translation: Add API call to tell client why table has limited access Key: HIVE-21838 URL: https://issues.apache.org/jira/browse/HIVE-21838 Project: Hive Issue Type: New Feature Reporter: Yongzhi Chen Assignee: Naveen Gangam When a table access type is Read-only or None, we need a way to tell clients why. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB
Yongzhi Chen created HIVE-21075: --- Summary: Metastore: Drop partition performance downgrade with Postgres DB Key: HIVE-21075 URL: https://issues.apache.org/jira/browse/HIVE-21075 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.0.0 Reporter: Yongzhi Chen To work around the performance issue caused by Oracle not supporting the limit statement, HIVE-9447 makes every backend DB run select count(1) from SDS where SDS.CD_ID=? to check whether the specific CD_ID is referenced in the SDS table before dropping a partition. This select count(1) statement does not scale well in Postgres, and there is no index on the CD_ID column in the SDS table. For an SDS table with 1.5 million rows, select count(1) averages 700ms without an index, versus 10-20ms with an index. The statement used before HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes less than 10ms. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
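Both statements are quoted in the description; a sketch of choosing between them per backend over JDBC (the helper and its boolean flag are illustrative, not the committed fix):

{noformat}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CdReferenceCheck {
  // Sketch: on backends that support LIMIT, ask for at most one row instead of
  // counting every row that references the CD_ID (there is no index on SDS.CD_ID).
  static boolean isCdReferenced(Connection conn, long cdId, boolean supportsLimit)
      throws Exception {
    String sql = supportsLimit
        ? "SELECT \"CD_ID\" FROM \"SDS\" WHERE \"CD_ID\" = ? LIMIT 1"
        : "SELECT COUNT(1) FROM \"SDS\" WHERE \"SDS\".\"CD_ID\" = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setLong(1, cdId);
      try (ResultSet rs = ps.executeQuery()) {
        return rs.next() && (supportsLimit || rs.getLong(1) > 0);
      }
    }
  }
}
{noformat}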
[jira] [Created] (HIVE-21019) Fix autoColumnStats tests to make auto stats gather possible.
Yongzhi Chen created HIVE-21019: --- Summary: Fix autoColumnStats tests to make auto stats gather possible. Key: HIVE-21019 URL: https://issues.apache.org/jira/browse/HIVE-21019 Project: Hive Issue Type: Bug Components: Test Affects Versions: 4.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Before https://issues.apache.org/jira/browse/HIVE-20915 , the sort-dynamic-partitions optimizer was turned off for these tests, so the tests had a group-by in the query plan, which can trigger computing statistics. After that Jira, the optimizer is enabled and the query plan no longer contains a group-by, only a reduce sorting operation. In order to test the auto column stats gather feature, we should disable sort dynamic partitions for these tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20915) Make dynamic sort partition optimization available to HoS and MR
Yongzhi Chen created HIVE-20915: --- Summary: Make dynamic sort partition optimization available to HoS and MR Key: HIVE-20915 URL: https://issues.apache.org/jira/browse/HIVE-20915 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 4.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen HIVE-20703 put the dynamic sort partition optimization under a cost-based decision, but it also made the optimizer available only to Tez. hive.optimize.sort.dynamic.partition has worked with other execution engines for a long time; we should keep the optimizer available to them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20741) Disable or fix random failed tests
Yongzhi Chen created HIVE-20741: --- Summary: Disable or fix random failed tests Key: HIVE-20741 URL: https://issues.apache.org/jira/browse/HIVE-20741 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Two qfile tests for TestCliDriver fail randomly; they may both relate to numeric precision issues: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_context_ngrams] (batchId=79) Error: Client Execution succeeded but contained differences (error code = 1) after executing udaf_context_ngrams.q 43c43 < [{"ngram":["travelling"],"estfrequency":1.0}] --- > [{"ngram":["travelling"],"estfrequency":3.0}] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_corr] (batchId=84) Client Execution succeeded but contained differences (error code = 1) after executing udaf_corr.q 100c100 < 0.6633880657639324 --- > 0.6633880657639326 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
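The udaf_corr diff differs only in the last digit of a double. A small self-contained illustration of why summation order alone can change that last digit, which is why such golden-file comparisons need a tolerance:

{noformat}
public class FpOrder {
  public static void main(String[] args) {
    double a = 1e16, b = 1.0, c = 1.0;
    double leftToRight = (a + b) + c;   // 1.0E16 (each +1 is individually lost)
    double rightToLeft = a + (b + c);   // 1.0000000000000002E16
    // Same inputs, different grouping, last-digit difference -- the same kind
    // of wobble as 0.6633880657639324 vs 0.6633880657639326 in udaf_corr.q.out.
    System.out.println(leftToRight + " vs " + rightToLeft);
  }
}
{noformat}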
[jira] [Created] (HIVE-20695) HoS Query fails with hive.exec.parallel=true
Yongzhi Chen created HIVE-20695: --- Summary: HoS Query fails with hive.exec.parallel=true Key: HIVE-20695 URL: https://issues.apache.org/jira/browse/HIVE-20695 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.2.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Hive queries fail when running a Hive on Spark job: {noformat} ERROR : Failed to execute spark task, with exception 'java.lang.Exception(Failed to submit Spark work, please retry later)' java.lang.Exception: Failed to submit Spark work, please retry later at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.execute(RemoteHiveSparkClient.java:186) at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:71) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:107) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:99) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive/dbname/_spark_session_dir/e202c452-8793-4e4e-ad55-61e3d4965c69/somename.jar (inode 725730760): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-1981084042_486659, pending creates: 7] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3755) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3556) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3412) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:688) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20016) Investigate random test failure
Yongzhi Chen created HIVE-20016: --- Summary: Investigate random test failure Key: HIVE-20016 URL: https://issues.apache.org/jira/browse/HIVE-20016 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation3 failed with: java.lang.AssertionError: Concurrent Statement failed: org.apache.hive.service.cli.HiveSQLException: java.lang.AssertionError: Authorization plugins not initialized! at org.junit.Assert.fail(Assert.java:88) at org.apache.hive.jdbc.TestJdbcWithMiniHS2.finishTasks(TestJdbcWithMiniHS2.java:374) at org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation3(TestJdbcWithMiniHS2.java:304) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19897) Add more tests for parallel compilation
Yongzhi Chen created HIVE-19897: --- Summary: Add more tests for parallel compilation Key: HIVE-19897 URL: https://issues.apache.org/jira/browse/HIVE-19897 Project: Hive Issue Type: Test Components: HiveServer2 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The two parallel compilation tests in org.apache.hive.jdbc.TestJdbcWithMiniHS2 do not really cover the case of queries compiling concurrently from different connections. Not sure whether that is on purpose or by mistake. Add more tests to cover that case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
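A sketch of what "compile concurrently from different connections" looks like in a JDBC test; the URL, credentials, and table name are placeholders, not the actual test code:

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCompileSketch {
  // Sketch: N separate connections each compile/execute a query at the same
  // time, so compilation really overlaps across sessions rather than being
  // serialized within a single connection.
  static void runParallel(String jdbcUrl, int n) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(n);
    List<Future<?>> results = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      results.add(pool.submit(() -> {
        try (Connection c = DriverManager.getConnection(jdbcUrl, "user", "")) {
          c.createStatement().execute("SELECT COUNT(*) FROM some_test_table");
        }
        return null;
      }));
    }
    for (Future<?> f : results) {
      f.get();   // surfaces any exception thrown by a concurrent compilation
    }
    pool.shutdown();
  }
}
{noformat}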
[jira] [Created] (HIVE-19296) Add log to record MapredLocalTask Failure
Yongzhi Chen created HIVE-19296: --- Summary: Add log to record MapredLocalTask Failure Key: HIVE-19296 URL: https://issues.apache.org/jira/browse/HIVE-19296 Project: Hive Issue Type: Bug Components: Diagnosability Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen In some cases, when MapredLocalTask fails around child process start time, we cannot find detailed error information anywhere (not in the stderr log, and there is no MapredLocal log file). All we get is: {noformat} *** ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1 *** ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-]: Obtaining error information *** ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-]: Task failed! Task ID: Stage-48 Logs: *** ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-]: /var/log/hive/hadoop-cmf-hive1-HIVESERVER2-t.log.out *** ERROR org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask: [HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1 {noformat} This makes it really hard to debug. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
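A sketch of the kind of logging that would help: capture the child process's stderr into the HiveServer2 log when the exit status is non-zero (the method and logger wiring are illustrative, not the actual MapredLocalTask change):

{noformat}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ChildProcessLogging {
  private static final Logger LOG = LoggerFactory.getLogger(ChildProcessLogging.class);

  // Sketch: drain and log the child's stderr so that a non-zero exit status
  // from the local MapRed task leaves the actual error in the server log.
  static int runAndLog(ProcessBuilder pb) throws Exception {
    Process child = pb.start();
    StringBuilder err = new StringBuilder();
    try (BufferedReader r =
             new BufferedReader(new InputStreamReader(child.getErrorStream()))) {
      String line;
      while ((line = r.readLine()) != null) {
        err.append(line).append('\n');
      }
    }
    int exit = child.waitFor();
    if (exit != 0) {
      LOG.error("MapredLocalTask child exited with status {}. stderr:\n{}", exit, err);
    }
    return exit;
  }
}
{noformat}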
[jira] [Created] (HIVE-18671) lock not released after Hive on Spark query was cancelled
Yongzhi Chen created HIVE-18671: --- Summary: lock not released after Hive on Spark query was cancelled Key: HIVE-18671 URL: https://issues.apache.org/jira/browse/HIVE-18671 Project: Hive Issue Type: Bug Affects Versions: 2.3.2 Reporter: Yongzhi Chen Assignee: Yongzhi Chen When a query running on Spark is cancelled, the SparkJobMonitor cannot return, and therefore the locks held by the query cannot be released. With debug logging enabled, you will see many log entries like the following: {noformat} 2018-02-09 08:27:09,613 INFO org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor: [HiveServer2-Background-Pool: Thread-80]: state = CANCELLED 2018-02-09 08:27:10,613 INFO org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor: [HiveServer2-Background-Pool: Thread-80]: state = CANCELLED {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-17640) Comparison of date returns null if only time part is provided in string.
Yongzhi Chen created HIVE-17640: --- Summary: Comparison of date returns null if only time part is provided in string. Key: HIVE-17640 URL: https://issues.apache.org/jira/browse/HIVE-17640 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 2.1.0 Reproduce: select '2017-01-01 00:00:00' < current_date; INFO : OK ... 1 row selected (18.324 seconds) ... NULL -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-16875) Query against view with partitioned child on HoS fails with privilege exception.
Yongzhi Chen created HIVE-16875: --- Summary: Query against view with partitioned child on HoS fails with privilege exception. Key: HIVE-16875 URL: https://issues.apache.org/jira/browse/HIVE-16875 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Query against view with child table that has partitions fails with privilege exception even with correct privileges. Reproduce: {noformat} create table jsamp1 (a string) partitioned by (b int); insert into table jsamp1 partition (b=1) values ("hello"); create view jview as select * from jsamp1; create role viewtester; grant all on table jview to role viewtester; grant role viewtester to group testers; Use MR, the select will succeed: set hive.execution.engine=mr; select count(*) from jview; while use spark: set hive.execution.engine=spark; select count(*) from jview; it fails with: Error: Error while compiling statement: FAILED: SemanticException No valid privileges User tester does not have privileges for QUERY The required privileges: Server=server1->Db=default->Table=j1part->action=select; (state=42000,code=4) {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16660) Not able to add partition for views in hive when sentry is enabled
Yongzhi Chen created HIVE-16660: --- Summary: Not able to add partition for views in hive when sentry is enabled Key: HIVE-16660 URL: https://issues.apache.org/jira/browse/HIVE-16660 Project: Hive Issue Type: Bug Components: Parser Reporter: Yongzhi Chen Assignee: Yongzhi Chen Repro: create table tesnit (a int) partitioned by (p int); insert into table tesnit partition (p = 1) values (1); insert into table tesnit partition (p = 2) values (1); create view test_view partitioned on (p) as select * from tesnit where p =1; alter view test_view add partition (p = 2); Error: Error while compiling statement: FAILED: SemanticException [Error 10056]: The query does not reference any valid partition. To run this query, set hive.mapred.mode=nonstrict (state=42000,code=10056) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16426) Query cancel: improve the way to handle files
Yongzhi Chen created HIVE-16426: --- Summary: Query cancel: improve the way to handle files Key: HIVE-16426 URL: https://issues.apache.org/jira/browse/HIVE-16426 Project: Hive Issue Type: Improvement Reporter: Yongzhi Chen Assignee: Yongzhi Chen 1. Add data structure support to make it is easy to check query cancel status. 2. Handle query cancel more gracefully. Remove possible file leaks caused by query cancel as shown in following stack: {noformat} 2017-04-11 09:57:30,727 WARN org.apache.hadoop.hive.ql.exec.Utilities: [HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories. java.io.InterruptedIOException: Call interrupted at org.apache.hadoop.ipc.Client.call(Client.java:1496) at org.apache.hadoop.ipc.Client.call(Client.java:1439) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy20.delete(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy21.delete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059) at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675) at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671) at org.apache.hadoop.hive.ql.exec.Utilities.clearWork(Utilities.java:277) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:463) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1978) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1691) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1423) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1202) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238) at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88) at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:303) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:316) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} 3. 
Add checkpoints to related file operations to improve response time for query cancelling. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
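A generic sketch of point 3: consult a cancel flag between individual file operations, so a cancelled query stops its cleanup work promptly and cleanly instead of being interrupted mid-RPC (the shared flag and helper are hypothetical, not the Jira's patch):

{noformat}
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CancellableCleanup {
  // Sketch: a shared cancel flag is checked before each delete, acting as a
  // checkpoint so cleanup stops quickly once the query has been cancelled.
  static void deleteTmpDirs(FileSystem fs, List<Path> dirs, AtomicBoolean cancelled)
      throws Exception {
    for (Path dir : dirs) {
      if (cancelled.get()) {
        return;                 // checkpoint: give up the remaining cleanup work
      }
      fs.delete(dir, true);     // recursive delete of one scratch directory
    }
  }
}
{noformat}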
[jira] [Created] (HIVE-15997) Resource leaks when query is cancelled
Yongzhi Chen created HIVE-15997: --- Summary: Resource leaks when query is cancelled Key: HIVE-15997 URL: https://issues.apache.org/jira/browse/HIVE-15997 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen There may some resource leaks when query is cancelled. We see following stacks in the log: Possible files and folder leak: {noformat} 2017-02-02 06:23:25,410 WARN hive.ql.Context: [HiveServer2-Background-Pool: Thread-61]: Error Removing Scratch: java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "ychencdh511t-1.vpc.cloudera.com/172.26.11.50"; destination host is: "ychencdh511t-1.vpc.cloudera.com":8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) at org.apache.hadoop.ipc.Client.call(Client.java:1476) at org.apache.hadoop.ipc.Client.call(Client.java:1409) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy25.delete(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy26.delete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059) at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675) at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671) at org.apache.hadoop.hive.ql.Context.removeScratchDir(Context.java:405) at org.apache.hadoop.hive.ql.Context.clear(Context.java:541) at org.apache.hadoop.hive.ql.Driver.releaseContext(Driver.java:2109) at org.apache.hadoop.hive.ql.Driver.closeInProcess(Driver.java:2150) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1472) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237) at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88) at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796) at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedByInterruptException at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:714) at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525) at org.apache.hadoop.ipc.Client.call(Client.java:1448) ... 35 more 2017-02-02 12:26:52,706 INFO org.apache.hive.service.cli.operation.OperationManager: [HiveServer2-Background-Pool: Thread-23]: Operation is timed out,operation=OperationHandle [opType=EXECUTE_STATEMENT,
[jira] [Created] (HIVE-15735) In some cases, view objects inside a view do not have parents
Yongzhi Chen created HIVE-15735: --- Summary: In some cases, view objects inside a view do not have parents Key: HIVE-15735 URL: https://issues.apache.org/jira/browse/HIVE-15735 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen This causes Sentry to throw a "No valid privileges" error: Error: Error while compiling statement: FAILED: SemanticException No valid privileges. To reproduce: Enable Sentry: create table t1( i int); create view v1 as select * from t1; create view v2 as select * from v1 union all select * from v1; If the user does not have read permission on t1 and v1, the query select * from v2; will fail with: Error: Error while compiling statement: FAILED: SemanticException No valid privileges User foo does not have privileges for QUERY The required privileges: Server=server1->Db=database2->Table=v1->action=select; (state=42000,code=4) Sentry should not check v1's permission, because v1 has at least one parent (v2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15615) Fix unit test failures caused by HIVE-13696
Yongzhi Chen created HIVE-15615: --- Summary: Fix unit test failures caused by HIVE-13696 Key: HIVE-15615 URL: https://issues.apache.org/jira/browse/HIVE-15615 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen The following unit tests failed with the same stack: org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters {noformat} 2017-01-11T15:02:27,774 ERROR [main] ql.Driver: FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.cleanName(QueuePlacementRule.java:351) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$User.getQueueForApp(QueuePlacementRule.java:132) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) at org.apache.hadoop.hive.schshim.FairSchedulerShim.setJobQueueForUserInternal(FairSchedulerShim.java:96) at org.apache.hadoop.hive.schshim.FairSchedulerShim.validateQueueConfiguration(FairSchedulerShim.java:82) at org.apache.hadoop.hive.ql.session.YarnFairScheduling.validateYarnQueue(YarnFairScheduling.java:68) at org.apache.hadoop.hive.ql.Driver.configureScheduling(Driver.java:671) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:543) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1313) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1233) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1223) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15572) Improve the response time for query canceling when it happens during acquiring locks
Yongzhi Chen created HIVE-15572: --- Summary: Improve the response time for query canceling when it happens during acquiring locks Key: HIVE-15572 URL: https://issues.apache.org/jira/browse/HIVE-15572 Project: Hive Issue Type: Improvement Reporter: Yongzhi Chen Assignee: Yongzhi Chen When the query canceling command is sent while Hive is acquiring locks (from ZooKeeper), Hive will still finish acquiring all the locks and then release them, as shown in the following log: it took 165 s to finish acquiring the locks, then 81 s to release them. We can improve the response time by not acquiring any more locks and by releasing held locks as soon as the query canceling command is received. Background-Pool: Thread-224]: 2017-01-03 10:50:35,413 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-224]: 2017-01-03 10:51:00,671 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-218]: 2017-01-03 10:51:00,672 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-218]: 2017-01-03 10:51:00,672 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-218]: FAILED: query select count(*) from manyparttbl has been cancelled 2017-01-03 10:51:00,673 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-218]: 2017-01-03 10:51:40,755 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-215]: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
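A sketch of the improvement described above: stop asking ZooKeeper for more locks once the cancel flag is set, and release whatever was already acquired. The lock-manager interface is deliberately simplified and hypothetical, not Hive's actual lock manager API:

{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancellableLockAcquirer {
  // Hypothetical minimal lock-manager interface used by this sketch.
  interface LockService<L> {
    void acquire(L lock) throws Exception;
    void release(L lock) throws Exception;
  }

  // Simplified sketch: abort lock acquisition as soon as the query is
  // cancelled and give back the locks that were already taken.
  static <L> List<L> acquireAll(List<L> wanted, LockService<L> lm,
                                AtomicBoolean cancelled) throws Exception {
    List<L> held = new ArrayList<>();
    for (L lock : wanted) {
      if (cancelled.get()) {
        for (L h : held) {
          lm.release(h);
        }
        throw new InterruptedException("Query cancelled while acquiring locks");
      }
      lm.acquire(lock);
      held.add(lock);
    }
    return held;
  }
}
{noformat}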
[jira] [Created] (HIVE-15437) avro tables join fails when - tbl join tbl_postfix
Yongzhi Chen created HIVE-15437: --- Summary: avro tables join fails when - tbl join tbl_postfix Key: HIVE-15437 URL: https://issues.apache.org/jira/browse/HIVE-15437 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen The following queries return good results: select * from table1 where col1=key1; select * from table1_1 where col1=key1; When joining them together, we get the following error: {noformat} Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:43) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:141) ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] {noformat} Both Avro tables are defined using an Avro schema, and the first table's name is a prefix of the second table's name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15391) Location validation for table should ignore the values for view.
Yongzhi Chen created HIVE-15391: --- Summary: Location validation for table should ignore the values for view. Key: HIVE-15391 URL: https://issues.apache.org/jira/browse/HIVE-15391 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 2.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When using schematool to do location validation, we get error messages for views, for example: {noformat} n DB with Name: viewa NULL Location for TABLE with Name: viewa In DB with Name: viewa NULL Location for TABLE with Name: viewb In DB with Name: viewa {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
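The validator just needs to skip virtual views when checking locations; a sketch using the metastore's TableType enum (the surrounding helper is illustrative, not the schematool patch):

{noformat}
import org.apache.hadoop.hive.metastore.TableType;

public class LocationValidatorSketch {
  // Sketch: NULL locations are expected for views, so only flag them for
  // real (managed/external) tables.
  static boolean isLocationProblem(String tableType, String location) {
    if (TableType.VIRTUAL_VIEW.name().equals(tableType)) {
      return false;                       // views legitimately have no location
    }
    return location == null || location.isEmpty();
  }
}
{noformat}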
[jira] [Created] (HIVE-15359) skip.footer.line.count doesn't work properly for certain situations
Yongzhi Chen created HIVE-15359: --- Summary: skip.footer.line.count doesn't work properly for certain situations Key: HIVE-15359 URL: https://issues.apache.org/jira/browse/HIVE-15359 Project: Hive Issue Type: Bug Components: Reader Reporter: Yongzhi Chen Assignee: Yongzhi Chen This issue's reproduction is very similar to HIVE-12718, but the data file is larger than 128 MB. In this case, even when making sure only one mapper is used, the footer is still wrongly skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
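For context on why footer skipping is tricky: a reader can only drop the last N rows if it buffers N rows and actually sees the end of the file. A self-contained sketch of the buffering part (not Hive's actual reader code); this logic breaks when a file larger than one block is split across readers that each think they hold the tail:

{noformat}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FooterSkipSketch {
  // Sketch: hold back the most recent `footerLines` rows; a buffered row is
  // emitted only when another row follows it, so the true last N rows of the
  // input are never returned.
  static List<String> skipFooter(Iterable<String> rows, int footerLines) {
    Deque<String> buffer = new ArrayDeque<>(footerLines);
    List<String> out = new ArrayList<>();
    for (String row : rows) {
      buffer.addLast(row);
      if (buffer.size() > footerLines) {
        out.add(buffer.removeFirst());
      }
    }
    return out;   // the last `footerLines` rows remain buffered and are dropped
  }
}
{noformat}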
[jira] [Created] (HIVE-15320) Cross Realm hive query is failing with KERBEROS authentication error
Yongzhi Chen created HIVE-15320: --- Summary: Cross Realm hive query is failing with KERBEROS authentication error Key: HIVE-15320 URL: https://issues.apache.org/jira/browse/HIVE-15320 Project: Hive Issue Type: Improvement Components: Security Reporter: Yongzhi Chen Executing a cross-realm query fails: authentication against the remote NN is tried with SIMPLE, not KERBEROS. It looks like Hive does not obtain the needed ticket for the remote NN. insert overwrite directory 'hdfs://differentrealmhost:8020/hive/test' select * from currentrealmtable where ...; will fail with java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] The hdfs distcp command works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
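A common root cause in this situation is that credentials are only collected for the default NameNode. A hedged sketch of explicitly collecting delegation tokens for the remote filesystem as well, using the standard Hadoop API; this is an assumption about the fix direction, not necessarily how the Jira was resolved:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

public class RemoteTokenSketch {
  // Sketch: obtain delegation tokens for the remote (cross-realm) NameNode too,
  // so tasks can authenticate instead of falling back to SIMPLE.
  static void addRemoteTokens(Path remoteOutput, Configuration conf, Credentials creds)
      throws Exception {
    FileSystem remoteFs = remoteOutput.getFileSystem(conf);
    remoteFs.addDelegationTokens("hive", creds);   // renewer name is illustrative
  }
}
{noformat}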
[jira] [Created] (HIVE-15074) Schematool provides a way to detect invalid entries in VERSION table
Yongzhi Chen created HIVE-15074: --- Summary: Schematool provides a way to detect invalid entries in VERSION table Key: HIVE-15074 URL: https://issues.apache.org/jira/browse/HIVE-15074 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Yongzhi Chen Priority: Minor For some unknown reason, we see that a customer's HMS cannot start because there are multiple entries in their HMS VERSION table. Schematool should provide a way to validate the HMS DB and provide warning and fix options for this kind of issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
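The invariant itself is simple: the VERSION table must contain exactly one row. A sketch of the check such a schematool validator could run over JDBC (the SQL and message are illustrative, not the committed validator):

{noformat}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class VersionTableCheck {
  // Sketch: report the VERSION table as invalid unless it has exactly one row.
  static boolean isVersionTableValid(Connection conn) throws Exception {
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM VERSION")) {
      rs.next();
      long rows = rs.getLong(1);
      if (rows != 1) {
        System.out.println("Warning: expected 1 row in VERSION, found " + rows);
      }
      return rows == 1;
    }
  }
}
{noformat}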
[jira] [Created] (HIVE-15073) Schematool should detect malformed URIs
Yongzhi Chen created HIVE-15073: --- Summary: Schematool should detect malformed URIs Key: HIVE-15073 URL: https://issues.apache.org/jira/browse/HIVE-15073 Project: Hive Issue Type: Improvement Reporter: Yongzhi Chen For various (mostly unknown) causes, HMS DB tables sometimes have invalid entries, for example a URI missing its scheme in the SDS table's LOCATION column or the DBS table's DB_LOCATION_URI column. These malformed URIs lead to hard-to-analyze errors in Hive and Sentry. Schematool needs to provide a command to detect these malformed URIs, give a warning, and provide an option to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
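Detecting a URI with a missing scheme is straightforward; a sketch of the per-value check such a schematool command could apply to SDS.LOCATION / DBS.DB_LOCATION_URI values (illustrative, not the committed validator):

{noformat}
import java.net.URI;
import java.net.URISyntaxException;

public class UriValidatorSketch {
  // Sketch: a location is flagged when it does not parse or has no scheme
  // (e.g. "/warehouse/tbl" instead of "hdfs://nn:8020/warehouse/tbl").
  static boolean isMalformedLocation(String location) {
    if (location == null || location.isEmpty()) {
      return true;
    }
    try {
      return new URI(location).getScheme() == null;
    } catch (URISyntaxException e) {
      return true;
    }
  }
}
{noformat}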
[jira] [Created] (HIVE-15072) Schematool should recognize missing tables in metastore
Yongzhi Chen created HIVE-15072: --- Summary: Schematool should recognize missing tables in metastore Key: HIVE-15072 URL: https://issues.apache.org/jira/browse/HIVE-15072 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Yongzhi Chen When installing a new database fails halfway (for some other reason), not all of the metastore tables are installed. This causes the HMS server to fail to start up due to missing tables. Re-running schematool succeeds, and the stdout log says: "Database already has tables. Skipping table creation". However, restarting HMS gives the same error reporting missing tables. Schematool should detect missing tables and provide options to go ahead and recreate the missing tables in the case of a new installation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14743) ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs
Yongzhi Chen created HIVE-14743: --- Summary: ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs Key: HIVE-14743 URL: https://issues.apache.org/jira/browse/HIVE-14743 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 1.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The stack: {noformat} 2016-09-13T09:38:49,972 ERROR [186b4545-65b5-4bfc-bc8e-3e14e251bb12 main] exec.Task: Job Submission failed with exception 'java.lang.ArrayIndexOutOfBoundsException(1)' java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.createFilterScan(HiveHBaseTableInputFormat.java:224) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplitsInternal(HiveHBaseTableInputFormat.java:492) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:449) at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:466) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:356) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:546) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) {noformat} Repro: {noformat} CREATE TABLE HBASE_TABLE_TEST_1( cvalue string , pk string, ccount int ) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping'='cf:val,:key,cf2:count', 'hbase.scan.cache'='500', 'hbase.scan.cacheblocks'='false', 'serialization.format'='1') TBLPROPERTIES ( 'hbase.table.name'='hbase_table_test_1', 'serialization.null.format'='' ); CREATE VIEW VIEW_HBASE_TABLE_TEST_1 AS SELECT hbase_table_test_1.cvalue,hbase_table_test_1.pk,hbase_table_test_1.ccount FROM hbase_table_test_1 WHERE hbase_table_test_1.ccount IS NOT NULL; CREATE TABLE HBASE_TABLE_TEST_2( cvalue string , pk string , ccount int ) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping'='cf:val,:key,cf2:count', 'hbase.scan.cache'='500', 'hbase.scan.cacheblocks'='false', 'serialization.format'='1') TBLPROPERTIES ( 'hbase.table.name'='hbase_table_test_2', 'serialization.null.format'=''); CREATE VIEW VIEW_HBASE_TABLE_TEST_2 AS SELECT hbase_table_test_2.cvalue,hbase_table_test_2.pk,hbase_table_test_2.ccount FROM hbase_table_test_2 WHERE hbase_table_test_2.pk >='3-h-0' AND hbase_table_test_2.pk <= '3-h-g' AND hbase_table_test_2.ccount IS NOT NULL; set hive.auto.convert.join=false; SELECT p.cvalue cvalue FROM `VIEW_HBASE_TABLE_TEST_1` `p` LEFT OUTER JOIN `VIEW_HBASE_TABLE_TEST_2` `A1` ON `p`.cvalue = `A1`.cvalue LEFT OUTER JOIN 
`VIEW_HBASE_TABLE_TEST_1` `A2` ON `p`.cvalue = `A2`.cvalue; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14715) Hive throws NumberFormatException with query with Null value
Yongzhi Chen created HIVE-14715: --- Summary: Hive throws NumberFormatException with query with Null value Key: HIVE-14715 URL: https://issues.apache.org/jira/browse/HIVE-14715 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen A java.lang.NumberFormatException is thrown with the following repro: set hive.cbo.enable=false; CREATE TABLE `paqtest`( `c1` int, `s1` string, `s2` string, `bn1` bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; insert into paqtest values (58, '', 'ABC', 0); SELECT 'Pricing mismatch' AS category, c1, NULL AS itemtype_used, NULL AS acq_itemtype, s2, NULL AS currency_used_avg, NULL AS acq_items_avg, sum(bn1) AS cca FROM paqtest WHERE (s1 IS NULL OR length(s1) = 0) GROUP BY 'Pricing mismatch', c1, NULL, NULL, s2, NULL, NULL; The stack is like the following: java.lang.NumberFormatException: ABC GroupByOperator.process(Object, int) line: 773 ExecReducer.reduce(Object, Iterator, OutputCollector, Reporter) line: 236 ReduceTask.runOldReducer(JobConf, TaskUmbilicalProtocol, TaskReporter, RawKeyValueIterator, RawComparator, Class, Class) line: 444 ReduceTask.run(JobConf, TaskUmbilicalProtocol) line: 392 LocalJobRunner$Job$ReduceTaskRunnable.run() line: 319 Executors$RunnableAdapter.call() line: 471 It works fine when hive.cbo.enable = true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14596) Canceling hive query takes very long time
Yongzhi Chen created HIVE-14596: --- Summary: Canceling hive query takes very long time Key: HIVE-14596 URL: https://issues.apache.org/jira/browse/HIVE-14596 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen when the Hue user clicks cancel, the Hive query does not stop immediately, it can take very long time. And in the yarn job history you will see exceptions like following: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive/hive/80a5cfdb-9f98-44d2-ae53-332c8dae62a3/hive_2016-08-20_07-06-12_819_8780093905859269639-3/-mr-1/.hive-staging_hive_2016-08-20_07-06-12_819_8780093905859269639-3/_task_tmp.-ext-10001/_tmp.00_0 (inode 28224): File does not exist. Holder DFSClient_attempt_1471630445417_0034_m_00_0_-50732711_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3624) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3427) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3283) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.abortWriters(FileSinkOperator.java:246) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1007) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:206) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14538) beeline throws exceptions with parsing hive config when using !sh statement
Yongzhi Chen created HIVE-14538: --- Summary: beeline throws exceptions with parsing hive config when using !sh statement Key: HIVE-14538 URL: https://issues.apache.org/jira/browse/HIVE-14538 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen When beeline has a connection to a server, in some env it has following problem: {noformat} 0: jdbc:hive2://localhost> !verbose verbose: on 0: jdbc:hive2://localhost> !sh id java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hive.beeline.Commands.addConf(Commands.java:758) at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704) at org.apache.hive.beeline.Commands.sh(Commands.java:1002) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 0: jdbc:hive2://localhost> !sh echo hello java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hive.beeline.Commands.addConf(Commands.java:758) at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704) at org.apache.hive.beeline.Commands.sh(Commands.java:1002) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 0: jdbc:hive2://localhost> {noformat} Also it breaks if there is no connection established: {noformat} beeline> !sh id java.lang.NullPointerException at org.apache.hive.beeline.BeeLine.createStatement(BeeLine.java:1897) at org.apache.hive.beeline.Commands.getConfInternal(Commands.java:724) at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:702) at 
org.apache.hive.beeline.Commands.sh(Commands.java:1002) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14519) Multi insert query bug
Yongzhi Chen created HIVE-14519: --- Summary: Multi insert query bug Key: HIVE-14519 URL: https://issues.apache.org/jira/browse/HIVE-14519 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Yongzhi Chen Assignee: Yongzhi Chen When running multi-insert queries, if one of the inserts returns no results, another insert may not return the right result. For example, after the following query there is no value in /tmp/emp/dir3/00_0: {noformat} From (select * from src) a insert overwrite directory '/tmp/emp/dir1/' select key, value insert overwrite directory '/tmp/emp/dir2/' select 'header' where 1=2 insert overwrite directory '/tmp/emp/dir3/' select key, value where key = 100; {noformat} The where clause in the second insert should not affect the third insert. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
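A possible workaround sketch for the multi-insert issue above, not taken from the report: splitting the always-empty branch into its own statement keeps its where clause from interfering with the remaining inserts (same src table and target directories assumed).
{noformat}
-- Hypothetical workaround: run the empty-result insert separately so its
-- WHERE clause cannot affect the sibling inserts of the multi-insert.
FROM (select * from src) a
INSERT OVERWRITE DIRECTORY '/tmp/emp/dir1/' SELECT key, value
INSERT OVERWRITE DIRECTORY '/tmp/emp/dir3/' SELECT key, value WHERE key = 100;

FROM (select * from src) a
INSERT OVERWRITE DIRECTORY '/tmp/emp/dir2/' SELECT 'header' WHERE 1=2;
{noformat}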
[jira] [Created] (HIVE-14015) SMB MapJoin failed for Hive on Spark when kerberized
Yongzhi Chen created HIVE-14015: --- Summary: SMB MapJoin failed for Hive on Spark when kerberized Key: HIVE-14015 URL: https://issues.apache.org/jira/browse/HIVE-14015 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 2.0.0, 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication It could be reproduced: 1) prepare sample data: a=1 while [[ $a -lt 100 ]]; do echo $a ; let a=$a+1; done > data 2) prepare source hive table: CREATE TABLE `s`(`c` string); load data local inpath 'data' into table s; 3) prepare the bucketed table: set hive.enforce.bucketing=true; set hive.enforce.sorting=true; CREATE TABLE `t`(`c` string) CLUSTERED BY (c) SORTED BY (c) INTO 5 BUCKETS; insert into t select * from s; 4) reproduce this issue: SET hive.execution.engine=spark; SET hive.auto.convert.sortmerge.join = true; SET hive.auto.convert.sortmerge.join.bigtable.selection.policy = org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ; SET hive.auto.convert.sortmerge.join.noconditionaltask = true; SET hive.optimize.bucketmapjoin = true; SET hive.optimize.bucketmapjoin.sortedmerge = true; select * from t join t t1 on t.c=t1.c; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13991) Union All on view fail with no valid permission on underneath table
Yongzhi Chen created HIVE-13991: --- Summary: Union All on view fail with no valid permission on underneath table Key: HIVE-13991 URL: https://issues.apache.org/jira/browse/HIVE-13991 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Yongzhi Chen Assignee: Yongzhi Chen With Sentry enabled, create view V as select * from T. When the user has read permission on view V but does not have read permission on table T, select * from V union all select * from V fails with: {noformat} 0: jdbc:hive2://> select * from s07view union all select * from s07view limit 1; Error: Error while compiling statement: FAILED: SemanticException No valid privileges Required privileges for this query: Server=server1->Db=default->Table=sample_07->action=select; (state=42000,code=4) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
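A condensed sketch of the setup described above, using the view and table names from the error message and an assumed Sentry-style GRANT (the exact grant statement and role name are not in the report):
{noformat}
-- View over a table the querying user cannot read directly.
CREATE VIEW s07view AS SELECT * FROM sample_07;
-- Assumed Sentry/SQL-standard grant: read access on the view only.
GRANT SELECT ON TABLE s07view TO ROLE reporting_role;

-- Works: only the view is referenced.
SELECT * FROM s07view LIMIT 1;
-- Fails with "No valid privileges", although the same view is referenced twice.
SELECT * FROM s07view UNION ALL SELECT * FROM s07view LIMIT 1;
{noformat}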
[jira] [Created] (HIVE-13932) Hive SMB Map Join with small set of LIMIT failed with NPE
Yongzhi Chen created HIVE-13932: --- Summary: Hive SMB Map Join with small set of LIMIT failed with NPE Key: HIVE-13932 URL: https://issues.apache.org/jira/browse/HIVE-13932 Project: Hive Issue Type: Bug Affects Versions: 2.0.0, 1.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen 1) prepare sample data: a=1 while [[ $a -lt 100 ]]; do echo $a ; let a=$a+1; done > data 2) prepare source hive table: CREATE TABLE `s`(`c` string); load data local inpath 'data' into table s; 3) prepare the bucketed table: set hive.enforce.bucketing=true; set hive.enforce.sorting=true; CREATE TABLE `t`(`c` string) CLUSTERED BY (c) SORTED BY (c) INTO 5 BUCKETS; insert into t select * from s; 4) reproduce this issue: SET hive.auto.convert.sortmerge.join = true; SET hive.auto.convert.sortmerge.join.bigtable.selection.policy = org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ; SET hive.auto.convert.sortmerge.join.noconditionaltask = true; SET hive.optimize.bucketmapjoin = true; SET hive.optimize.bucketmapjoin.sortedmerge = true; select * from t join t t1 on t.c=t1.c limit 1; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13632) Hive failing on insert empty array into parquet table
Yongzhi Chen created HIVE-13632: --- Summary: Hive failing on insert empty array into parquet table Key: HIVE-13632 URL: https://issues.apache.org/jira/browse/HIVE-13632 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The insert will fail with following stack: {noformat} by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:271) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$ListDataWriter.write(DataWritableWriter.java:271) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:199) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:215) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:88) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:697) {noformat} Reproduce: {noformat} create table test_small ( key string, arrayValues array) stored as parquet; insert into table test_small select 'abcd', array() from src limit 1; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13570) Some query with Union all fails when CBO is off
Yongzhi Chen created HIVE-13570: --- Summary: Some query with Union all fails when CBO is off Key: HIVE-13570 URL: https://issues.apache.org/jira/browse/HIVE-13570 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Some queries with union all throws IndexOutOfBoundsException when: set hive.cbo.enable=false; set hive.ppd.remove.duplicatefilters=true; The stack is as: {noformat} {code} java.lang.IndexOutOfBoundsException: Index: 67, Size: 67 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.genColLists(ColumnPrunerProcCtx.java:161) at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.handleFilterUnionChildren(ColumnPrunerProcCtx.java:273) at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:108) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:172) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:135) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:198) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10327) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
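The report does not include the failing statement, so the following is only a hypothetical illustration of the query shape this code path handles: a filter sitting on top of a UNION ALL, compiled with CBO off and duplicate-filter removal on.
{noformat}
-- Hypothetical shape only; the real failing query is not in the report.
set hive.cbo.enable=false;
set hive.ppd.remove.duplicatefilters=true;

SELECT key
FROM (
  SELECT key, value FROM src
  UNION ALL
  SELECT key, value FROM src
) u
WHERE key > '100';
{noformat}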
[jira] [Created] (HIVE-13200) Aggregation functions returning empty rows on partitioned columns
Yongzhi Chen created HIVE-13200: --- Summary: Aggregation functions returning empty rows on partitioned columns Key: HIVE-13200 URL: https://issues.apache.org/jira/browse/HIVE-13200 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 2.0.0, 1.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Running aggregation functions like MAX, MIN, DISTINCT against partitioned columns will return empty rows if table has property: 'skip.header.line.count'='1' Reproduce: {noformat} DROP TABLE IF EXISTS test; CREATE TABLE test (a int) PARTITIONED BY (b int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' TBLPROPERTIES('skip.header.line.count'='1'); INSERT OVERWRITE TABLE test PARTITION (b = 1) VALUES (1), (2), (3), (4); INSERT OVERWRITE TABLE test PARTITION (b = 2) VALUES (1), (2), (3), (4); SELECT * FROM test; SELECT DISTINCT b FROM test; SELECT MAX(b) FROM test; SELECT DISTINCT a FROM test; {noformat} The output: {noformat} 0: jdbc:hive2://localhost:1/default> SELECT * FROM test; +-+-+--+ | test.a | test.b | +-+-+--+ | 2 | 1 | | 3 | 1 | | 4 | 1 | | 2 | 2 | | 3 | 2 | | 4 | 2 | +-+-+--+ 6 rows selected (0.631 seconds) 0: jdbc:hive2://localhost:1/default> SELECT DISTINCT b FROM test; ++--+ | b | ++--+ ++--+ No rows selected (47.229 seconds) 0: jdbc:hive2://localhost:1/default> SELECT MAX(b) FROM test; +---+--+ | _c0 | +---+--+ | NULL | +---+--+ 1 row selected (49.508 seconds) 0: jdbc:hive2://localhost:1/default> SELECT DISTINCT a FROM test; ++--+ | a | ++--+ | 2 | | 3 | | 4 | ++--+ 3 rows selected (46.859 seconds) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13065) Hive throws NPE when writing map type data to a HBase backed table
Yongzhi Chen created HIVE-13065: --- Summary: Hive throws NPE when writing map type data to a HBase backed table Key: HIVE-13065 URL: https://issues.apache.org/jira/browse/HIVE-13065 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 1.1.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Hive throws NPE when writing data to a HBase backed table with below conditions: # There is a map type column # The map type column has NULL in its values Below are the reproduce steps: *1) Create a HBase backed Hive table* {code:sql} create table hbase_test (id bigint, data map) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:map_col") tblproperties ("hbase.table.name" = "hive_test"); {code} *2) insert data into above table* {code:sql} insert overwrite table hbase_test select 1 as id, map('abcd', null) as data from src limit 1; {code} The mapreduce job for insert query fails. Error messages are as below: {noformat} 2016-02-15 02:26:33,225 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253) ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:731) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) ... 7 more Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:286) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:666) ... 
14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:221) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:236) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:275) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:222) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118) at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282) ... 15 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13039) BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table
Yongzhi Chen created HIVE-13039: --- Summary: BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table Key: HIVE-13039 URL: https://issues.apache.org/jira/browse/HIVE-13039 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 1.2.1, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen BETWEEN becomes exclusive in parquet table when predicate pushdown is on (as it is by default in newer Hive versions). To reproduce(in a cluster, not local setup): CREATE TABLE parquet_tbl( key int, ldate string) PARTITIONED BY ( lyear string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; insert overwrite table parquet_tbl partition (lyear='2016') select 1, '2016-02-03' from src limit 1; set hive.optimize.ppd.storage = true; set hive.optimize.ppd = true; select * from parquet_tbl where ldate between '2016-02-03' and '2016-02-03'; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
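Since the report ties the wrong results to the pushdown flags it enables, an obvious mitigation sketch (a workaround, not a fix, and not part of the report) is to turn storage-level pushdown back off for the affected query:
{noformat}
-- Workaround sketch: let Hive evaluate the BETWEEN itself instead of pushing
-- it into the Parquet reader; re-enable pushdown afterwards.
set hive.optimize.ppd.storage = false;
select * from parquet_tbl where ldate between '2016-02-03' and '2016-02-03';
set hive.optimize.ppd.storage = true;
{noformat}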
[jira] [Created] (HIVE-12795) Vectorized execution causes ClassCastException
Yongzhi Chen created HIVE-12795: --- Summary: Vectorized execution causes ClassCastException Key: HIVE-12795 URL: https://issues.apache.org/jira/browse/HIVE-12795 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen In some hive versions, when set hive.auto.convert.join=false; set hive.vectorized.execution.enabled = true; Some join queries fail with ClassCastException: The stack: {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableStringObjectInspector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:419) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1102) at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126) ... 22 more {noformat} It can not be reproduced in hive 2.0 and 1.3 because of different code path. Reproduce: {noformat} CREATE TABLE test1 ( id string) PARTITIONED BY ( cr_year bigint, cr_month bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' TBLPROPERTIES ( 'serialization.null.format'='' ); CREATE TABLE test2( id string ) PARTITIONED BY ( cr_year bigint, cr_month bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' TBLPROPERTIES ( 'serialization.null.format'='' ); set hive.auto.convert.join=false; set hive.vectorized.execution.enabled = true; SELECT cr.id1 , cr.id2 FROM (SELECT t1.id id1, t2.id id2 from (select * from test1 ) t1 left outer join test2 t2 on t1.id=t2.id) cr; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12784) Group by SemanticException: Invalid column reference
Yongzhi Chen created HIVE-12784: --- Summary: Group by SemanticException: Invalid column reference Key: HIVE-12784 URL: https://issues.apache.org/jira/browse/HIVE-12784 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Some queries work fine in older versions throws SemanticException, the stack trace: {noformat} FAILED: SemanticException [Error 10002]: Line 96:1 Invalid column reference 'key2' 15/12/21 18:56:44 [main]: ERROR ql.Driver: FAILED: SemanticException [Error 10002]: Line 96:1 Invalid column reference 'key2' org.apache.hadoop.hive.ql.parse.SemanticException: Line 96:1 Invalid column reference 'key2' at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:4228) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5670) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9007) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9884) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9777) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10250) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10261) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10141) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1158) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} Reproduce: {noformat} create table tlb (key int, key1 int, key2 int); create table src (key int, value string); select key, key1, key2 from (select a.key, 0 as key1 , 0 as key2 from tlb a inner join src b on a.key = b.key) a group by key, key1, key2; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly
Yongzhi Chen created HIVE-12646: --- Summary: beeline and HIVE CLI do not parse ; in quote properly Key: HIVE-12646 URL: https://issues.apache.org/jira/browse/HIVE-12646 Project: Hive Issue Type: Bug Components: CLI, Clients Reporter: Yongzhi Chen Assignee: Vaibhav Gumashta Beeline and the Hive CLI have to escape ';' inside quotes, while most other shells do not. For example, in Beeline: {noformat} 0: jdbc:hive2://localhost:1> select ';' from tlb1; select ';' from tlb1; 15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115 15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403 Error: Error while compiling statement: FAILED: ParseException line 1:8 cannot recognize input near '' ' {noformat} while in the mysql shell: {noformat} mysql> SELECT CONCAT(';', 'foo') FROM test limit 3; ++ | ;foo | | ;foo | | ;foo | ++ 3 rows in set (0.00 sec) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
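A hedged sketch of the usual client-side workaround: escape the semicolon so Beeline and the CLI do not treat it as a statement terminator (this relies on the clients' backslash escaping and is not stated in the report):
{noformat}
-- Escaping the semicolon keeps the client from splitting the statement early.
select '\;' from tlb1;
select concat('\;', 'foo') from tlb1;
{noformat}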
[jira] [Created] (HIVE-12378) Exception on HBaseSerDe.serialize binary field
Yongzhi Chen created HIVE-12378: --- Summary: Exception on HBaseSerDe.serialize binary field Key: HIVE-12378 URL: https://issues.apache.org/jira/browse/HIVE-12378 Project: Hive Issue Type: Bug Components: HBase Handler, Serializers/Deserializers Affects Versions: 1.1.0, 1.0.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen An issue was reproduced with the binary typed HBase columns in Hive: It works fine as below: CREATE TABLE test9 (key int, val string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,cf:val#b" ); insert into test9 values(1,"hello"); But when string type is changed to binary as: CREATE TABLE test2 (key int, val binary) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,cf:val#b" ); insert into table test2 values(1, 'hello'); The following exception is thrown: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"hello"} ... Caused by: java.lang.RuntimeException: Hive internal error. at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:322) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118) at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282) ... 16 more We should support hive binary type column for hbase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
Yongzhi Chen created HIVE-12189: --- Summary: The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large Key: HIVE-12189 URL: https://issues.apache.org/jira/browse/HIVE-12189 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.1.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Some queries are very slow to compile; for example, the following query {noformat} select * from tt1 nf join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid) join tt3 a2 on(a2.col2 = a1.col2 and a2.col3 = nf.col3 and a2.hdp_databaseid = nf.hdp_databaseid) join tt4 a3 on (a3.col4 = a2.col4 and a3.col3 = a2.col3) join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid) join tt6 a5 on (a5.col3 = a2.col3 and a5.col2 = a2.col2 and a5.hdp_databaseid = nf.hdp_databaseid) JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid = nf.hdp_databaseid) JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid = nf.hdp_databaseid) where nf.hdp_databaseid = 102 limit 10; {noformat} takes around 120 seconds to compile in hive 1.1 when hive.mapred.mode=strict, hive.optimize.ppd=true, and hive is not in test mode. All the above tables have a single partition column, and all of them are empty. If the tables are not empty, the compile is reportedly so slow that it looks like hive is hanging. In hive 2.0 the compile is much faster (explain takes 6.6 seconds), but that is still a lot of time. One of the problems that slows ppd down is that the list in pushdownPreds can grow very large, which gives extractPushdownPreds bad performance: {noformat} public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext, Operator op, List preds) {noformat} While running the query above, at this break point preds has a size of 12051, and most entries of the list are: GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), The following code in extractPushdownPreds will clone all the nodes in preds and do the walk. Hive 2.0 is faster because HIVE-11652 makes startWalking much faster, but we still clone thousands of nodes with the same expression. Should we store so many identical predicates in the list, or is just one good enough? {noformat} List startNodes = new ArrayList(); List clonedPreds = new ArrayList(); for (ExprNodeDesc node : preds) { ExprNodeDesc clone = node.clone(); clonedPreds.add(clone); exprContext.getNewToOldExprMap().put(clone, node); } startNodes.addAll(clonedPreds); egw.startWalking(startNodes, null); {noformat} Should we change the methods public void addFinalCandidate(String alias, ExprNodeDesc expr) and public void addPushDowns(String alias, List pushDowns) in java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java to only add an expr that is not already in the pushdown list for that alias? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12058) Change hive script to record errors when calling hbase fails
Yongzhi Chen created HIVE-12058: --- Summary: Change hive script to record errors when calling hbase fails Key: HIVE-12058 URL: https://issues.apache.org/jira/browse/HIVE-12058 Project: Hive Issue Type: Bug Components: Hive, HiveServer2 Affects Versions: 1.1.0, 0.14.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen By default hive will try to find out which jars need to be added to the classpath in order to run MR jobs against an HBase cluster; however, if hbase can't be found or if hbase mapredcp fails, the hive script fails silently and omits some of the jars that should be included in the classpath. That makes it very difficult to analyze the real problem. The hive script should record the error instead of simply redirecting the errors of these two hbase calls to /dev/null: HBASE_BIN=${HBASE_BIN:-"$(which hbase 2>/dev/null)"} $HBASE_BIN mapredcp 2>/dev/null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12008) Make last two tests added by HIVE-11384 pass when hive.in.test is false
Yongzhi Chen created HIVE-12008: --- Summary: Make last two tests added by HIVE-11384 pass when hive.in.test is false Key: HIVE-12008 URL: https://issues.apache.org/jira/browse/HIVE-12008 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen The last two qfile unit tests fail when hive.in.test is false. It may be related to how we handle the prune list for select: when a select includes every column in a table, the prune list for that select is empty, which may cause issues when calculating its parent's prune list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11982) Some test cases for union all with recent changes
Yongzhi Chen created HIVE-11982: --- Summary: Some test cases for union all with recent changes Key: HIVE-11982 URL: https://issues.apache.org/jira/browse/HIVE-11982 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen The tests throw java.lang.IndexOutOfBoundsException again; this was supposed to have been fixed by HIVE-11271. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11801) In HMS HA env, "show databases" fails when "current" HMS is stopped.
Yongzhi Chen created HIVE-11801: --- Summary: In HMS HA env, "show databases" fails when"current" HMS is stopped. Key: HIVE-11801 URL: https://issues.apache.org/jira/browse/HIVE-11801 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.1.0, 1.2.0, 0.14.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Reproduce steps: # Enable HMS HA on a cluster # Use beeline to connect to HS2 and execute command {{show databases}}. Don't quit beeline after command has finished # Stop the first HMS in configuration {{hive.metastore.uri}} # Execute {{show databases}} in beeline again. Will get below error: {noformat} MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketException: Broken pipe) {noformat} The error message in HS2 is as below: {noformat} 2015-09-08 12:06:53,236 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException java.net.SocketException: Broken pipe org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161) at org.apache.thrift.transport.TSaslTransport.flush(TSaslTransport.java:501) at org.apache.thrift.transport.TSaslClientTransport.flush(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.TFilterTransport.flush(TFilterTransport.java:77) at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_databases(ThriftHiveMetastore.java:692) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:684) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:964) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91) at com.sun.proxy.$Proxy6.getDatabases(Unknown Source) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1909) at com.sun.proxy.$Proxy6.getDatabases(Unknown Source) at org.apache.hive.service.cli.operation.GetSchemasOperation.runInternal(GetSchemasOperation.java:59) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257) at org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:462) at org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:296) at org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:534) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1373) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1358) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159) ... 31 more 2015-09-08 12:06:53,238 ERROR hive.log: Converting exception to MetaException 2015-09-08 12:06:53,238 WARN org.apache.hive.service.cli.thrift.ThriftCLIService: Error getting schemas: org.apache.hive.service.cli.HiveSQLException: MetaException(message:Got exception: org.apache.thrift.transport.TTransportException
[jira] [Created] (HIVE-11745) Alter table Exchange partition with multiple partition_spec is not working
Yongzhi Chen created HIVE-11745: --- Summary: Alter table Exchange partition with multiple partition_spec is not working Key: HIVE-11745 URL: https://issues.apache.org/jira/browse/HIVE-11745 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Single partition works, but multiple partitions will not work. Reproduce steps: {noformat} DROP TABLE IF EXISTS t1; DROP TABLE IF EXISTS t2; DROP TABLE IF EXISTS t3; DROP TABLE IF EXISTS t4; CREATE TABLE t1 (a int) PARTITIONED BY (d1 int); CREATE TABLE t2 (a int) PARTITIONED BY (d1 int); CREATE TABLE t3 (a int) PARTITIONED BY (d1 int, d2 int); CREATE TABLE t4 (a int) PARTITIONED BY (d1 int, d2 int); INSERT OVERWRITE TABLE t1 PARTITION (d1 = 1) SELECT salary FROM jsmall LIMIT 10; INSERT OVERWRITE TABLE t3 PARTITION (d1 = 1, d2 = 1) SELECT salary FROM jsmall LIMIT 10; SELECT * FROM t1; SELECT * FROM t3; ALTER TABLE t2 EXCHANGE PARTITION (d1 = 1) WITH TABLE t1; SELECT * FROM t1; SELECT * FROM t2; ALTER TABLE t4 EXCHANGE PARTITION (d1 = 1, d2 = 1) WITH TABLE t3; SELECT * FROM t3; SELECT * FROM t4; {noformat} The output: {noformat} 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t3; +---+++--+ | t3.a | t3.d1 | t3.d2 | +---+++--+ +---+++--+ No rows selected (0.227 seconds) 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t4; +---+++--+ | t4.a | t4.d1 | t4.d2 | +---+++--+ +---+++--+ No rows selected (0.266 seconds) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11604) HIVE return wrong results in some queries with PTF function
Yongzhi Chen created HIVE-11604: --- Summary: HIVE return wrong results in some queries with PTF function Key: HIVE-11604 URL: https://issues.apache.org/jira/browse/HIVE-11604 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.1.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Following query returns empty result which is not right: {noformat} select ddd.id, ddd.fkey, aaa.name from ( select id, fkey, row_number() over (partition by id, fkey) as rnum from tlb1 group by id, fkey ) ddd inner join tlb2 aaa on aaa.fid = ddd.fkey; {noformat} After remove row_number() over (partition by id, fkey) as rnum from query, the right result returns. Reproduce: {noformat} create table tlb1 (id int, fkey int, val string); create table tlb2 (fid int, name string); insert into table tlb1 values(100,1,'abc'); insert into table tlb1 values(200,1,'efg'); insert into table tlb2 values(1, 'key1'); select ddd.id, ddd.fkey, aaa.name from ( select id, fkey, row_number() over (partition by id, fkey) as rnum from tlb1 group by id, fkey ) ddd inner join tlb2 aaa on aaa.fid = ddd.fkey; INFO : Ended Job = job_local1070163923_0017 +-+---+---+--+ No rows selected (14.248 seconds) | ddd.id | ddd.fkey | aaa.name | +-+---+---+--+ +-+---+---+--+ 0: jdbc:hive2://localhost:1 select ddd.id, ddd.fkey, aaa.name from ( select id, fkey from tlb1 group by id, fkey ) ddd inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name 0: jdbc:hive2://localhost:1 from ( 0: jdbc:hive2://localhost:1 select id, fkey 0: jdbc:hive2://localhost:1 from tlb1 group by id, fkey 0: jdbc:hive2://localhost:1 ) ddd 0: jdbc:hive2://localhost:1 inner join tlb2 aaa on aaa.fid = ddd.fkey; INFO : Number of reduce tasks not specified. Estimated from input data size: 1 ... INFO : Ended Job = job_local672340505_0019 +-+---+---+--+ 2 rows selected (14.383 seconds) | ddd.id | ddd.fkey | aaa.name | +-+---+---+--+ | 100 | 1 | key1 | | 200 | 1 | key1 | +-+---+---+--+ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11502) Map side aggregation is extremely slow
Yongzhi Chen created HIVE-11502: --- Summary: Map side aggregation is extremely slow Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen For a query like the following: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the group-by column has many different values (for example 40), the map-side aggregation is very slow. I ran the query for more than 3 hours and then had to kill it. The same query can finish in 7 seconds if I turn off map-side aggregation with: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
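For completeness, a sketch that combines the workaround quoted above with, as an assumption, the stock hash-aggregation knobs that control when the map-side hash table is abandoned; only the first setting comes from the report:
{noformat}
set hive.map.aggr = false;                      -- workaround from the report
-- Assumed alternative levers (standard Hive settings, shown commented out):
-- set hive.map.aggr.hash.min.reduction = 0.3;
-- set hive.groupby.mapaggr.checkinterval = 10000;
create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1;
{noformat}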
[jira] [Created] (HIVE-11380) NPE when FileSinkOperator is not initialized
Yongzhi Chen created HIVE-11380: --- Summary: NPE when FileSinkOperator is not inialized Key: HIVE-11380 URL: https://issues.apache.org/jira/browse/HIVE-11380 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen When FileSinkOperator's initializeOp is not called (which may happen when an operator before FileSinkOperator initializeOp failed), FileSinkOperator will throw NPE at close time. The stacktrace: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519) ... 18 more {noformat} This Exception is misleading and often distracts users from finding real issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11384) Add test cases which cover both HIVE-11271 and HIVE-11333
Yongzhi Chen created HIVE-11384: --- Summary: Add test cases which cover both HIVE-11271 and HIVE-11333 Key: HIVE-11384 URL: https://issues.apache.org/jira/browse/HIVE-11384 Project: Hive Issue Type: Test Components: Logical Optimizer, Parser Affects Versions: 1.2.0, 1.0.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Add some test queries that need both HIVE-11271 and HIVE-11333 to be fixed in order to pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11319) CTAS with location qualifier overwrites directories
Yongzhi Chen created HIVE-11319: --- Summary: CTAS with location qualifier overwrites directories Key: HIVE-11319 URL: https://issues.apache.org/jira/browse/HIVE-11319 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 1.2.0, 1.0.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen CTAS with a location clause acts as an insert overwrite. This can cause problems when there are subdirectories within the target directory, and it has caused some users to accidentally wipe out directories holding very important data. We should not allow CTAS with a location clause to point at a non-empty directory. Reproduce: create table ctas1 location '/Users/ychen/tmp' as select * from jsmall limit 10; create table ctas2 location '/Users/ychen/tmp' as select * from jsmall limit 5; Both creates will succeed, but the data in table ctas1 will be accidentally replaced by ctas2's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
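Until CTAS with a location clause is restricted as suggested, a defensive sketch (not from the report) is to give every CTAS its own previously empty location so one table can never clobber another's directory:
{noformat}
-- Each CTAS writes to a distinct, previously empty directory.
create table ctas1 location '/Users/ychen/tmp/ctas1' as select * from jsmall limit 10;
create table ctas2 location '/Users/ychen/tmp/ctas2' as select * from jsmall limit 5;
{noformat}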
[jira] [Created] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function
Yongzhi Chen created HIVE-11271: --- Summary: java.lang.IndexOutOfBoundsException when union all with if function Key: HIVE-11271 URL: https://issues.apache.org/jira/browse/HIVE-11271 Project: Hive Issue Type: Bug Affects Versions: 1.2.0, 1.0.0, 0.14.0 Reporter: Yongzhi Chen Some queries with Union all as subquery fail in MapReduce task with stacktrace: {noformat} 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing operator UNION[104] 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor complete. 15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: job_local826862759_0005 java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 10 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140) ... 
21 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119) ... 21 more {noformat} Reproduce: {noformat} create table if not exists union_all_bug_test_1 ( f1 int, f2 int ); create table if not exists union_all_bug_test_2 ( f1 int ); SELECT f1 FROM ( SELECT f1 , if('helloworld' like '%hello%' ,f1,f2) as filter FROM union_all_bug_test_1 union all select f1 , 0 as filter from union_all_bug_test_2 ) A WHERE (filter = 1); {noformat} -- This message was
[jira] [Created] (HIVE-11208) Can not drop a default partition __HIVE_DEFAULT_PARTITION__ which is not a string type
Yongzhi Chen created HIVE-11208: --- Summary: Can not drop a default partition __HIVE_DEFAULT_PARTITION__ which is not a string type Key: HIVE-11208 URL: https://issues.apache.org/jira/browse/HIVE-11208 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 1.1.0 Reporter: Yongzhi Chen When the partition column is not a string type (for example, an int type), dropping the default partition __HIVE_DEFAULT_PARTITION__ fails with: SemanticException Unexpected unknown partitions. Reproduce:
{noformat}
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1;
DROP TABLE IF EXISTS test;
CREATE TABLE test (col1 string) PARTITIONED BY (p1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;
INSERT OVERWRITE TABLE test PARTITION (p1) SELECT code, IF(salary > 600, 100, null) as p1 FROM jsmall;
hive> SHOW PARTITIONS test;
OK
p1=100
p1=__HIVE_DEFAULT_PARTITION__
Time taken: 0.124 seconds, Fetched: 2 row(s)
hive> ALTER TABLE test DROP partition (p1 = '__HIVE_DEFAULT_PARTITION__');
FAILED: SemanticException Unexpected unknown partitions for (p1 = null)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11150) Remove wrong warning message related to chgrp
Yongzhi Chen created HIVE-11150: --- Summary: Remove wrong warning message related to chgrp Key: HIVE-11150 URL: https://issues.apache.org/jira/browse/HIVE-11150 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 1.2.0, 1.0.0, 0.14.0, 0.13.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When using a file system other than HDFS, users see a warning message about hdfs chgrp. The warning is very annoying and confusing, so we'd better remove it. Warning example:
{noformat}
hive> insert overwrite table s3_test select total_emp, salary, description from sample_07 limit 5;
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=number
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
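One way to avoid the spurious message, sketched below under the assumption that the warning comes from attempting a group change with an empty group name, is to skip the group change entirely when there is no group to set. This is an illustrative snippet, not the actual HIVE-11150 patch; the class and method names are placeholders.
{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GroupSync {
    // Only attempt a group change when a non-empty target group is known and differs
    // from the current one; otherwise the "-chgrp ''" style call is skipped entirely.
    public static void syncGroup(FileSystem fs, Path path, String group) throws Exception {
        if (group == null || group.isEmpty()) {
            return; // nothing to do on file systems that do not report a group
        }
        FileStatus status = fs.getFileStatus(path);
        if (!group.equals(status.getGroup())) {
            fs.setOwner(path, null, group); // owner untouched, group updated
        }
    }
}
{code}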
[jira] [Created] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
Yongzhi Chen created HIVE-11112: --- Summary: ISO-8859-1 text output has fragments of previous longer rows appended Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce:
1. Create a table using ISO 8859-1 encoding: CREATE TABLE person_lat1 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
3. Execute SELECT * FROM person_lat1
Result - The following output appears:
{noformat}
+--------------------+
|  person_lat1.name  |
+--------------------+
| Müller,Thomas      |
| Jørgensen,Jørgen   |
| Peña,Andrésørgen   |
| Nåm,Fækdrésørgen   |
+--------------------+
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
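The symptom is consistent with a decoder reusing a byte buffer from a previous, longer row without limiting the decode to the current row's length. The snippet below is only a minimal standalone illustration of that buffer-reuse effect; it is not the LazySimpleSerDe code itself.
{code}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BufferReuseDemo {
    public static void main(String[] args) {
        byte[] buf = new byte[32];

        // Write a long row, then a shorter row into the same reused buffer.
        byte[] longRow = "Jørgensen,Jørgen".getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(longRow, 0, buf, 0, longRow.length);
        byte[] shortRow = "Peña,Andrés".getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(shortRow, 0, buf, 0, shortRow.length);

        // Correct: decode only the bytes belonging to the current row.
        String good = new String(buf, 0, shortRow.length, StandardCharsets.ISO_8859_1);
        // Buggy pattern: decode up to the previous row's length, picking up the stale
        // tail of the longer row, exactly like the "Peña,Andrésørgen" output above.
        String bad = new String(Arrays.copyOf(buf, longRow.length), StandardCharsets.ISO_8859_1);

        System.out.println(good); // Peña,Andrés
        System.out.println(bad);  // Peña,Andrésørgen
    }
}
{code}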
[jira] [Created] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.
Yongzhi Chen created HIVE-11062: --- Summary: Remove Exception stacktrace from Log.info when ACL is not supported. Key: HIVE-11062 URL: https://issues.apache.org/jira/browse/HIVE-11062 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When logging is set to INFO, extended ACLs are enabled, and the file system does not support ACLs, there are a lot of exception stack traces in the log file. Although this is benign, it can easily frustrate users. We should log the exception at DEBUG level instead. Currently, the exception in the log looks like:
{noformat}
2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: Skipping ACL inheritance: File system for path s3a://yibing/hive does not support ACLs but dfs.namenode.acls.enabled is set to true: java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
at org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729)
at org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694)
at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
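A hedged sketch of the proposed change, assuming the shim obtains the ACL status via the standard FileSystem#getAclStatus call; the surrounding class and logger names are placeholders, not the actual Hadoop23Shims code.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclStatus;

public class AclProbe {
    private static final Log LOG = LogFactory.getLog(AclProbe.class);

    // Returns null when the file system does not support ACLs; the (expected)
    // UnsupportedOperationException is logged at DEBUG instead of INFO, so it no
    // longer floods the logs with a full stack trace during normal operation.
    public static AclStatus tryGetAclStatus(FileSystem fs, Path path) {
        try {
            return fs.getAclStatus(path);
        } catch (UnsupportedOperationException e) {
            LOG.debug("Skipping ACL inheritance: " + fs.getUri() + " does not support ACLs", e);
            return null;
        } catch (Exception e) {
            LOG.debug("Could not read ACL status for " + path, e);
            return null;
        }
    }
}
{code}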
[jira] [Created] (HIVE-11042) Need fix Utilities.replaceTaskId method
Yongzhi Chen created HIVE-11042: --- Summary: Need fix Utilities.replaceTaskId method Key: HIVE-11042 URL: https://issues.apache.org/jira/browse/HIVE-11042 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen While looking at another bug, I found that the Utilities.replaceTaskId(String, int) method is not right. For example, Utilities.replaceTaskId("(ds%3D1)01", 5) returns "5". It should return "(ds%3D1)05". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
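The expected behaviour can be illustrated with a small standalone helper. This is a hypothetical re-implementation written for clarity, not the Utilities.replaceTaskId fix itself, and it assumes the intent is to replace only the trailing task-id digits while preserving any prefix and the original zero padding.
{code}
public class TaskIdReplace {
    // Replace the trailing digit run of a file name with the new bucket/task number,
    // keeping a prefix such as "(ds%3D1)" and padding with zeros to the original width.
    public static String replaceTaskId(String name, int bucketNum) {
        String newId = String.valueOf(bucketNum);
        int i = name.length();
        while (i > 0 && Character.isDigit(name.charAt(i - 1))) {
            i--; // find where the trailing digit run starts
        }
        String prefix = name.substring(0, i);
        String oldId = name.substring(i);
        StringBuilder padding = new StringBuilder();
        for (int p = newId.length(); p < oldId.length(); p++) {
            padding.append('0'); // e.g. "01" replaced by 5 becomes "05"
        }
        return prefix + padding + newId;
    }

    public static void main(String[] args) {
        System.out.println(replaceTaskId("(ds%3D1)01", 5)); // expected: (ds%3D1)05
    }
}
{code}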
[jira] [Created] (HIVE-10881) The bucket number is not respected in insert overwrite.
Yongzhi Chen created HIVE-10881: --- Summary: The bucket number is not respected in insert overwrite. Key: HIVE-10881 URL: https://issues.apache.org/jira/browse/HIVE-10881 Project: Hive Issue Type: Bug Affects Versions: 1.2.0, 1.3.0 Reporter: Yongzhi Chen Priority: Critical When hive.enforce.bucketing is true, the bucket number defined in the table is no longer respected in current master and 1.2. This is a regression. Reproduce: {noformat} CREATE TABLE IF NOT EXISTS buckettestinput( data string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput1( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput2( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; Then I inserted the following data into the buckettestinput table firstinsert1 firstinsert2 firstinsert3 firstinsert4 firstinsert5 firstinsert6 firstinsert7 firstinsert8 secondinsert1 secondinsert2 secondinsert3 secondinsert4 secondinsert5 secondinsert6 secondinsert7 secondinsert8 set hive.enforce.bucketing = true; set hive.enforce.sorting=true; insert overwrite table buckettestoutput1 select * from buckettestinput where data like 'first%'; set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 (state=42000,code=10141) {noformat} The related debug information related to insert overwrite: {noformat} 0: jdbc:hive2://localhost:1 insert overwrite table buckettestoutput1 select * from buckettestinput where data like 'first%'insert overwrite table buckettestoutput1 0: jdbc:hive2://localhost:1 ; select * from buckettestinput where data like ' first%'; INFO : Number of reduce tasks determined at compile time: 2 INFO : In order to change the average load for a reducer (in bytes): INFO : set hive.exec.reducers.bytes.per.reducer=number INFO : In order to limit the maximum number of reducers: INFO : set hive.exec.reducers.max=number INFO : In order to set a constant number of reducers: INFO : set mapred.reduce.tasks=number INFO : Job running in-process (local Hadoop) INFO : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100% INFO : Ended Job = job_local107155352_0001 INFO : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1 INFO : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48] No rows affected (1.692 seconds) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10879) The bucket number is not respected in insert overwrite.
Yongzhi Chen created HIVE-10879: --- Summary: The bucket number is not respected in insert overwrite. Key: HIVE-10879 URL: https://issues.apache.org/jira/browse/HIVE-10879 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Yongzhi Chen Priority: Blocker When hive.enforce.bucketing is true, the bucket number defined in the table is no longer respected in current master and 1.2. This is a regression. Reproduce: {noformat} CREATE TABLE IF NOT EXISTS buckettestinput( data string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput1( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput2( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; Then I inserted the following data into the buckettestinput table firstinsert1 firstinsert2 firstinsert3 firstinsert4 firstinsert5 firstinsert6 firstinsert7 firstinsert8 secondinsert1 secondinsert2 secondinsert3 secondinsert4 secondinsert5 secondinsert6 secondinsert7 secondinsert8 set hive.enforce.bucketing = true; set hive.enforce.sorting=true; insert overwrite table buckettestoutput1 select * from buckettestinput where data like 'first%'; set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 (state=42000,code=10141) {noformat} The related debug information related to insert overwrite: {noformat} 0: jdbc:hive2://localhost:1 insert overwrite table buckettestoutput1 select * from buckettestinput where data like 'first%'insert overwrite table buckettestoutput1 0: jdbc:hive2://localhost:1 ; select * from buckettestinput where data like ' first%'; INFO : Number of reduce tasks determined at compile time: 2 INFO : In order to change the average load for a reducer (in bytes): INFO : set hive.exec.reducers.bytes.per.reducer=number INFO : In order to limit the maximum number of reducers: INFO : set hive.exec.reducers.max=number INFO : In order to set a constant number of reducers: INFO : set mapred.reduce.tasks=number INFO : Job running in-process (local Hadoop) INFO : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100% INFO : Ended Job = job_local107155352_0001 INFO : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1 INFO : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48] No rows affected (1.692 seconds) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10880) The bucket number is not respected in insert overwrite.
Yongzhi Chen created HIVE-10880: --- Summary: The bucket number is not respected in insert overwrite. Key: HIVE-10880 URL: https://issues.apache.org/jira/browse/HIVE-10880 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Yongzhi Chen Priority: Blocker When hive.enforce.bucketing is true, the bucket number defined in the table is no longer respected in current master and 1.2. This is a regression. Reproduce: {noformat} CREATE TABLE IF NOT EXISTS buckettestinput( data string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput1( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput2( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; Then I inserted the following data into the buckettestinput table firstinsert1 firstinsert2 firstinsert3 firstinsert4 firstinsert5 firstinsert6 firstinsert7 firstinsert8 secondinsert1 secondinsert2 secondinsert3 secondinsert4 secondinsert5 secondinsert6 secondinsert7 secondinsert8 set hive.enforce.bucketing = true; set hive.enforce.sorting=true; insert overwrite table buckettestoutput1 select * from buckettestinput where data like 'first%'; set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 (state=42000,code=10141) {noformat} The related debug information related to insert overwrite: {noformat} 0: jdbc:hive2://localhost:1 insert overwrite table buckettestoutput1 select * from buckettestinput where data like 'first%'insert overwrite table buckettestoutput1 0: jdbc:hive2://localhost:1 ; select * from buckettestinput where data like ' first%'; INFO : Number of reduce tasks determined at compile time: 2 INFO : In order to change the average load for a reducer (in bytes): INFO : set hive.exec.reducers.bytes.per.reducer=number INFO : In order to limit the maximum number of reducers: INFO : set hive.exec.reducers.max=number INFO : In order to set a constant number of reducers: INFO : set mapred.reduce.tasks=number INFO : Job running in-process (local Hadoop) INFO : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100% INFO : Ended Job = job_local107155352_0001 INFO : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1 INFO : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48] No rows affected (1.692 seconds) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10866) Throw error when client try to insert into bucketed table
Yongzhi Chen created HIVE-10866: --- Summary: Throw error when client try to insert into bucketed table Key: HIVE-10866 URL: https://issues.apache.org/jira/browse/HIVE-10866 Project: Hive Issue Type: Improvement Reporter: Yongzhi Chen Currently, Hive does not support appends (insert into) on bucketed tables; see the open jira HIVE-3608. When inserting into such a table, the data will be corrupted and no longer fit for bucket map join. We need to find a way to prevent clients from inserting into such tables. Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestoutput1( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput2( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
set hive.enforce.bucketing = true;
set hive.enforce.sorting=true;
insert into table buckettestoutput1 select code from sample_07 where total_emp 134354250 limit 10;
After this first insert, I did:
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
+---+---+
| data | data |
+---+---+
+---+---+
So select works fine.
Second insert:
0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select code from sample_07 where total_emp = 134354250 limit 10;
No rows affected (61.235 seconds)
Then select:
0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 (state=42000,code=10141)
0: jdbc:hive2://localhost:1
{noformat}
Inserting into an empty table or partition is fine, but once the table is non-empty (after the second insert in the reproduce), the bucket map join throws an error. We should not let the second insert succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
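A minimal sketch of the kind of guard proposed here, assuming the check runs at compile time with the destination table's bucket count already known; the class, method, and exception type are placeholders, not the actual Hive change.
{code}
public class BucketedInsertGuard {
    // Fail fast on INSERT INTO (append) against a bucketed table, since appended files
    // are not re-bucketed and would later break bucket map joins; INSERT OVERWRITE,
    // which rewrites the buckets, is still allowed.
    public static void check(String tableName, int numBuckets, boolean isInsertOverwrite) {
        if (numBuckets > 0 && !isInsertOverwrite) {
            throw new IllegalStateException("INSERT INTO is not supported on bucketed table "
                + tableName + "; use INSERT OVERWRITE instead");
        }
    }
}
{code}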
[jira] [Created] (HIVE-10771) separatorChar has no effect in CREATE TABLE AS SELECT statement
Yongzhi Chen created HIVE-10771: --- Summary: separatorChar has no effect in CREATE TABLE AS SELECT statement Key: HIVE-10771 URL: https://issues.apache.org/jira/browse/HIVE-10771 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Yongzhi Chen Assignee: Yongzhi Chen To replicate:
{noformat}
CREATE TABLE separator_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "|", "quoteChar" = "\"", "escapeChar" = "\\")
STORED AS TEXTFILE
AS SELECT * FROM sample_07;
{noformat}
Then hadoop fs -cat /user/hive/warehouse/separator_test/* shows:
{noformat}
53-3032,Truck drivers, heavy and tractor-trailer,1693590,37560
53-3033,Truck drivers, light or delivery services,922900,28820
53-3041,Taxi drivers and chauffeurs,165590,22740
{noformat}
The separator is still ',', not '|' as specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
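The observed output is what one would expect if the table's SERDEPROPERTIES simply do not survive the CTAS path, because a CSV serde typically falls back to ',' when the property is absent. The snippet below only illustrates that fallback with a plain java.util.Properties lookup; it is not the OpenCSVSerde initialization code.
{code}
import java.util.Properties;

public class SeparatorFallbackDemo {
    public static void main(String[] args) {
        Properties tblProps = new Properties();        // imagine CTAS dropped separatorChar
        // tblProps.setProperty("separatorChar", "|"); // what the DDL actually asked for

        char sep = tblProps.getProperty("separatorChar", ",").charAt(0);
        System.out.println("effective separator: " + sep); // ',' unless the property survived
    }
}
{code}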
[jira] [Created] (HIVE-10646) ColumnValue does not handle NULL_TYPE
Yongzhi Chen created HIVE-10646: --- Summary: ColumnValue does not handle NULL_TYPE Key: HIVE-10646 URL: https://issues.apache.org/jira/browse/HIVE-10646 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen This will cause NPE if the thrift client use protocol V5 or older: {noformat} 1:46:07.199 PM ERROR org.apache.thrift.server.TThreadPoolServer Error occurred during processing of message. java.lang.NullPointerException at org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388) at org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338) at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288) at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605) at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525) at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455) at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550) at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486) at org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13272) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13236) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13187) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:677) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Reproduce: Run: select NULL as col, * from jsmall limit 5; from a V5 client (for example some version of Hue). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
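For the older row-based protocol, the usual way to avoid serializing a null struct is to map a NULL_TYPE (or null) value to a string column value whose payload is simply left unset. The sketch below assumes the Thrift-generated TColumnValue/TStringValue union API from the HiveServer2 CLI service and is illustrative only, not the HIVE-10646 patch.
{code}
import org.apache.hive.service.cli.thrift.TColumnValue;
import org.apache.hive.service.cli.thrift.TStringValue;

public class NullColumnValue {
    // A TStringValue with no value set serializes as SQL NULL, so V5-and-older
    // clients never receive a null TColumnValue that would NPE during the Thrift write.
    public static TColumnValue nullValue() {
        return TColumnValue.stringVal(new TStringValue());
    }
}
{code}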
[jira] [Created] (HIVE-10453) HS2 leaking open file descriptors when using UDFs
Yongzhi Chen created HIVE-10453: --- Summary: HS2 leaking open file descriptors when using UDFs Key: HIVE-10453 URL: https://issues.apache.org/jira/browse/HIVE-10453 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen 1. create a custom function by CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar'; 2. Create a simple jdbc client, just do connect, run simple query which using the function such as: select myfunc(col1) from sometable 3. Disconnect. Check open file for HiveServer2 by: lsof -p HSProcID | grep myudf.jar You will see the leak as: {noformat} java 28718 ychen txt REG1,4741 212977666 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar java 28718 ychen 330r REG1,4741 212977666 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
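Assuming the leaked descriptors belong to a session-level URLClassLoader that is never closed, a cleanup of the following shape (placeholder class and method names, Java 7+ URLClassLoader.close()) would release the jar file handles when the session ends; this is a sketch, not the actual HiveServer2 fix.
{code}
import java.io.IOException;
import java.net.URLClassLoader;

public class SessionClassLoaderCleanup {
    // Close the session's UDF classloader on session teardown so the opened
    // *_resources/myudf.jar file descriptors are released instead of leaking.
    public static void closeQuietly(ClassLoader loader) {
        if (loader instanceof URLClassLoader) {
            try {
                ((URLClassLoader) loader).close();
            } catch (IOException e) {
                // best effort: nothing else to do at session close
            }
        }
    }
}
{code}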
[jira] [Created] (HIVE-10098) HS2 local task for map join fails in KMS encrypted cluster
Yongzhi Chen created HIVE-10098: --- Summary: HS2 local task for map join fails in KMS encrypted cluster Key: HIVE-10098 URL: https://issues.apache.org/jira/browse/HIVE-10098 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Env: KMS was enabled after cluster was kerberos secured. Problem: PROBLEM: Any Hive query via beeline that performs a MapJoin fails with a java.lang.reflect.UndeclaredThrowableException from KMSClientProvider.addDelegationTokens. {code} 2015-03-18 08:49:17,948 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1022)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 2015-03-18 08:49:19,048 WARN [main]: security.UserGroupInformation (UserGroupInformation.java:doAs(1645)) - PriviledgedActionException as:hive (auth:KERBEROS) cause:org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) 2015-03-18 08:49:19,050 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(314)) - Hive Runtime Error: Map local work failed java.io.IOException: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:634) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:363) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:337) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:303) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:735) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:826) at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86) at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:413) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:559) ... 9 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655) at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:808) ... 
18 more Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
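A common way to avoid this class of failure, sketched here with the standard Hadoop security APIs and under the assumption that the parent process still holds valid Kerberos credentials, is to obtain the HDFS/KMS delegation tokens before the local map-join task is spawned and make them available to it. This is an illustrative snippet, not the actual HIVE-10098 fix.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.security.TokenCache;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class LocalTaskTokens {
    // Fetch delegation tokens for the job's input paths while Kerberos credentials are
    // still available, and attach them to the current UGI so a child JVM without a TGT
    // can read KMS-encrypted input through the tokens instead of SPNEGO.
    public static void obtainTokens(Configuration conf, Path[] inputPaths) throws Exception {
        Credentials creds = new Credentials();
        TokenCache.obtainTokensForNamenodes(creds, inputPaths, conf);
        UserGroupInformation.getCurrentUser().addCredentials(creds);
    }
}
{code}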
[jira] [Created] (HIVE-9813) Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command
Yongzhi Chen created HIVE-9813: -- Summary: Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command Key: HIVE-9813 URL: https://issues.apache.org/jira/browse/HIVE-9813 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Execute the following JDBC client program:
{code}
import java.sql.*;

public class TestAddJar {
    private static Connection makeConnection(String connString, String classPath) throws ClassNotFoundException, SQLException {
        System.out.println("Current Connection info: " + connString);
        Class.forName(classPath);
        System.out.println("Current driver info: " + classPath);
        return DriverManager.getConnection(connString);
    }

    public static void main(String[] args) {
        if (2 != args.length) {
            System.out.println("Two arguments needed: connection string, path to jar to be added (include jar name)");
            System.out.println("Example: java -jar TestApp.jar jdbc:hive2://192.168.111.111 /tmp/json-serde-1.3-jar-with-dependencies.jar");
            return;
        }
        Connection conn;
        try {
            conn = makeConnection(args[0], "org.apache.hive.jdbc.HiveDriver");
            System.out.println("---");
            System.out.println("DONE");
            System.out.println("---");
            System.out.println("Execute query: add jar " + args[1] + ";");
            Statement stmt = conn.createStatement();
            int c = stmt.executeUpdate("add jar " + args[1]);
            System.out.println("Returned value is: [" + c + "]\n");
            System.out.println("---");
            final String createTableQry = "Create table if not exists json_test(id int, content string)"
                + " row format serde 'org.openx.data.jsonserde.JsonSerDe'";
            System.out.println("Execute query: " + createTableQry + ";");
            stmt.execute(createTableQry);
            System.out.println("---");
            System.out.println("getColumn() Call---\n");
            DatabaseMetaData md = conn.getMetaData();
            System.out.println("Test get all column in a schema:");
            ResultSet rs = md.getColumns("Hive", "default", "json_test", null);
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            conn.close();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
{code}
The call fails with an exception, and the metastore log shows:
{noformat}
7:41:30.316 PM ERROR hive.log error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1803)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:183)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields(HiveMetaStore.java:2487)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema(HiveMetaStore.java:2542)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at com.sun.proxy.$Proxy5.get_schema(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6425)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6409)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:556)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at
{noformat}
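Because DatabaseMetaData.getColumns is answered by the metastore process, which never sees jars added in the HiveServer2 session, a client-side workaround that often helps is to read the column list through a regular statement in the same session instead. This is a hedged sketch reusing the json_test table from the report above; it does not fix the metastore classloading itself.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DescribeInsteadOfMetadata {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(args[0]); // e.g. jdbc:hive2://host:10000
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("add jar " + args[1]);                 // serde jar for this session
            // DESCRIBE runs inside HS2, where the added jar is on the session classpath,
            // so the serde-backed schema can be resolved without the metastore get_schema call.
            try (ResultSet rs = stmt.executeQuery("DESCRIBE json_test")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getString(2));
                }
            }
        }
        conn.close();
    }
}
{code}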
[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme
[ https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-9716: --- Attachment: HIVE-9716.1.patch Map job fails when table's LOCATION does not have scheme Key: HIVE-9716 URL: https://issues.apache.org/jira/browse/HIVE-9716 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-9716.1.patch When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: {noformat} 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 9 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme
[ https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-9716: --- Status: Patch Available (was: Open) Need code review. Map job fails when table's LOCATION does not have scheme Key: HIVE-9716 URL: https://issues.apache.org/jira/browse/HIVE-9716 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.0, 0.12.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-9716.1.patch When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: {noformat} 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 9 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9716) Map job fails when table's LOCATION does not have scheme
Yongzhi Chen created HIVE-9716: -- Summary: Map job fails when table's LOCATION does not have scheme Key: HIVE-9716 URL: https://issues.apache.org/jira/browse/HIVE-9716 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.0, 0.12.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count (*) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
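Independent of the attached patch, the general technique for tolerating a scheme-less LOCATION is to qualify the stored path against its file system before comparing it with fully qualified input paths. A brief sketch with the standard Hadoop API, with names chosen only for illustration:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifyLocation {
    // Turn a scheme-less metastore LOCATION such as "/user/hive/warehouse/t1" into a
    // fully qualified path (e.g. "hdfs://namenode:8020/user/hive/warehouse/t1") so later
    // path comparisons do not mistake it for a local "file:" path.
    public static Path qualify(Configuration conf, String location) throws Exception {
        Path raw = new Path(location);
        FileSystem fs = raw.getFileSystem(conf);
        return fs.makeQualified(raw);
    }
}
{code}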
[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme
[ https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-9716: --- Description: When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 9 more was: When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. 
For example, when do select count (*) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 9 more Map job fails when table's LOCATION does not have scheme Key: HIVE-9716 URL: https://issues.apache.org/jira/browse/HIVE-9716 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException:
[jira] [Commented] (HIVE-9528) SemanticException: Ambiguous column reference
[ https://issues.apache.org/jira/browse/HIVE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301389#comment-14301389 ] Yongzhi Chen commented on HIVE-9528: [~navis], any idea which jira cause the change of behavior? And Yes, we can close the jira as not-problem. Thanks SemanticException: Ambiguous column reference - Key: HIVE-9528 URL: https://issues.apache.org/jira/browse/HIVE-9528 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Yongzhi Chen Assignee: Navis When running the following query: {code} SELECT if( COUNT(*) = 0, 'true', 'false' ) as RESULT FROM ( select * from sim a join sim2 b on a.simstr=b.simstr) app Error: Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference simstr in app (state=42000,code=10007) {code} This query works fine in hive 0.10 In the apache trunk, following workaround will work: {code} SELECT if(COUNT(*) = 0, 'true', 'false') as RESULT FROM (select a.* from sim a join sim2 b on a.simstr=b.simstr) app; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9528) SemanticException: Ambiguous column reference
[ https://issues.apache.org/jira/browse/HIVE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301429#comment-14301429 ] Yongzhi Chen commented on HIVE-9528: Is this jira? https://issues.apache.org/jira/browse/HIVE-2723 Thanks SemanticException: Ambiguous column reference - Key: HIVE-9528 URL: https://issues.apache.org/jira/browse/HIVE-9528 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Yongzhi Chen Assignee: Navis When running the following query: {code} SELECT if( COUNT(*) = 0, 'true', 'false' ) as RESULT FROM ( select * from sim a join sim2 b on a.simstr=b.simstr) app Error: Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference simstr in app (state=42000,code=10007) {code} This query works fine in hive 0.10 In the apache trunk, following workaround will work: {code} SELECT if(COUNT(*) = 0, 'true', 'false') as RESULT FROM (select a.* from sim a join sim2 b on a.simstr=b.simstr) app; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9528) SemanticException: Ambiguous column reference
Yongzhi Chen created HIVE-9528: -- Summary: SemanticException: Ambiguous column reference Key: HIVE-9528 URL: https://issues.apache.org/jira/browse/HIVE-9528 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Yongzhi Chen When running the following query: SELECT if( COUNT( * ) == 0, 'true', 'false' ) as RESULT FROM ( select * from sim a join sim2 b on a.simstr=b.simstr) app Error: Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference simstr in app (state=42000,code=10007) This query works fine in hive 0.10 In the apache trunk, following workaround will work: SELECT if(COUNT( * ) == 0, 'true', 'false') as RESULT FROM (select a.* from sim a join sim2 b on a.simstr=b.simstr) app; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299469#comment-14299469 ] Yongzhi Chen commented on HIVE-7733: [~navis], I just create a new jira related to the issue, do you want to look at it? HIVE-9528 Ambiguous column reference error on query - Key: HIVE-7733 URL: https://issues.apache.org/jira/browse/HIVE-7733 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Jason Dere Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7733.1.patch.txt, HIVE-7733.2.patch.txt, HIVE-7733.3.patch.txt, HIVE-7733.4.patch.txt, HIVE-7733.5.patch.txt, HIVE-7733.6.patch.txt, HIVE-7733.7.patch.txt {noformat} CREATE TABLE agg1 ( col0 INT, col1 STRING, col2 DOUBLE ); explain SELECT single_use_subq11.a1 AS a1, single_use_subq11.a2 AS a2 FROM (SELECT Sum(agg1.col2) AS a1 FROM agg1 GROUP BY agg1.col0) single_use_subq12 JOIN (SELECT alias.a2 AS a0, alias.a1 AS a1, alias.a1 AS a2 FROM (SELECT agg1.col1 AS a0, '42' AS a1, agg1.col0 AS a2 FROM agg1 UNION ALL SELECT agg1.col1 AS a0, '41' AS a1, agg1.col0 AS a2 FROM agg1) alias GROUP BY alias.a2, alias.a1) single_use_subq11 ON ( single_use_subq11.a0 = single_use_subq11.a0 ); {noformat} Gets the following error: FAILED: SemanticException [Error 10007]: Ambiguous column reference a2 Looks like this query had been working in 0.12 but starting failing with this error in 0.13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
[ https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295305#comment-14295305 ] Yongzhi Chen commented on HIVE-6308: Thank you Szehon! This fix treats creating Avro tables without col defs in hive the same as creating table with all col defs. This fix does not address this kind of avro tables created before the fix. Tested with hive command: analyze table compute statistics for column. COLUMNS_V2 Metastore table not populated for tables created without an explicit column list. Key: HIVE-6308 URL: https://issues.apache.org/jira/browse/HIVE-6308 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Reporter: Alexander Behm Assignee: Yongzhi Chen Fix For: 1.2.0 Attachments: HIVE-6308.1.patch Consider this example table: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc'); When I try to run an ANALYZE TABLE for computing column stats on any of the columns, then I get: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Column o_orderpriority for which stats gathering is requested doesn't exist.) at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't populated properly during the table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
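A sketch of the idea described in the comment above, assuming the column list can be derived from the table's deserializer (the AvroSerDe reading avro.schema.url) and then stored like explicitly declared columns. The use of MetaStoreUtils.getFieldsFromDeserializer mirrors what Hive does elsewhere, but the surrounding wiring here is illustrative only, not the HIVE-6308 patch.
{code}
import java.util.List;

import org.apache.hadoop.hive.metastore.MetaStoreUtils;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.serde2.Deserializer;

public class PopulateColumnsFromSerde {
    // Derive the schema from the serde (e.g. Avro) and set it on the table object, so
    // the metastore persists real columns into COLUMNS_V2 instead of an empty list.
    public static void populate(Table tbl, Deserializer deserializer) throws Exception {
        List<FieldSchema> cols = MetaStoreUtils.getFieldsFromDeserializer(tbl.getTableName(), deserializer);
        tbl.setFields(cols);
    }
}
{code}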
[jira] [Commented] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
[ https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291325#comment-14291325 ] Yongzhi Chen commented on HIVE-6308: The test failures are not related to the change. COLUMNS_V2 Metastore table not populated for tables created without an explicit column list. Key: HIVE-6308 URL: https://issues.apache.org/jira/browse/HIVE-6308 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Reporter: Alexander Behm Assignee: Yongzhi Chen Attachments: HIVE-6308.1.patch Consider this example table: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc'); When I try to run an ANALYZE TABLE for computing column stats on any of the columns, then I get: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Column o_orderpriority for which stats gathering is requested doesn't exist.) at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't populated properly during the table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
[ https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-6308: --- Status: Patch Available (was: Open) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list. Key: HIVE-6308 URL: https://issues.apache.org/jira/browse/HIVE-6308 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Reporter: Alexander Behm Assignee: Yongzhi Chen Attachments: HIVE-6308.1.patch Consider this example table: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc'); When I try to run an ANALYZE TABLE for computing column stats on any of the columns, then I get: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Column o_orderpriority for which stats gathering is requested doesn't exist.) at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't populated properly during the table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
[ https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-6308: --- Attachment: HIVE-6308.1.patch Need code review COLUMNS_V2 Metastore table not populated for tables created without an explicit column list. Key: HIVE-6308 URL: https://issues.apache.org/jira/browse/HIVE-6308 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Reporter: Alexander Behm Assignee: Yongzhi Chen Attachments: HIVE-6308.1.patch Consider this example table: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc'); When I try to run an ANALYZE TABLE for computing column stats on any of the columns, then I get: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Column o_orderpriority for which stats gathering is requested doesn't exist.) at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't populated properly during the table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
[ https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen reassigned HIVE-6308: -- Assignee: Yongzhi Chen COLUMNS_V2 Metastore table not populated for tables created without an explicit column list. Key: HIVE-6308 URL: https://issues.apache.org/jira/browse/HIVE-6308 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Reporter: Alexander Behm Assignee: Yongzhi Chen Consider this example table: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc'); When I try to run an ANALYZE TABLE for computing column stats on any of the columns, then I get: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Column o_orderpriority for which stats gathering is requested doesn't exist.) at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't populated properly during the table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
[ https://issues.apache.org/jira/browse/HIVE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279450#comment-14279450 ] Yongzhi Chen commented on HIVE-9393: [~brocknoland], could you review and commit the patch? Thanks. reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG --- Key: HIVE-9393 URL: https://issues.apache.org/jira/browse/HIVE-9393 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-9393.1.patch Since Hive 0.13 the log level of ColumnarSerDe.java:116 has been raised from DEBUG to INFO, which has introduced a very large amount of noise into the logs, causing the underlying filesystem to fill up. This request is to drop it back to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
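For context, and without reproducing the attached HIVE-9393.1.patch: the requested change is the usual one-line demotion of a per-instance log statement from INFO to DEBUG. A minimal sketch of that shape, with an illustrative class name and message text rather than the actual ColumnarSerDe code:

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class LogLevelSketch {
      private static final Log LOG = LogFactory.getLog(LogLevelSketch.class);

      void logInitialization(String columnInfo) {
        // Before: LOG.info(...) fired for every SerDe instance and flooded the logs.
        // After: the message is only emitted when DEBUG logging is enabled.
        if (LOG.isDebugEnabled()) {
          LOG.debug("ColumnarSerDe initialized with: " + columnInfo);
        }
      }
    }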
[jira] [Assigned] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
[ https://issues.apache.org/jira/browse/HIVE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen reassigned HIVE-9393: -- Assignee: Yongzhi Chen reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG --- Key: HIVE-9393 URL: https://issues.apache.org/jira/browse/HIVE-9393 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Since Hive 0.13 the log level of ColumnarSerDe.java:116 has been raised from DEBUG to INFO, which has introduced a very large amount of noise into the logs, causing the underlying filesystem to fill up. This request is to drop it back to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
Yongzhi Chen created HIVE-9393: -- Summary: reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG Key: HIVE-9393 URL: https://issues.apache.org/jira/browse/HIVE-9393 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Yongzhi Chen Priority: Minor Since Hive 0.13 the log level of ColumnarSerDe.java:116 has been raised from DEBUG to INFO, which has introduced a very large amount of noise into the logs, causing the underlying filesystem to fill up. This request is to drop it back to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
[ https://issues.apache.org/jira/browse/HIVE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-9393: --- Status: Patch Available (was: Open) Need code review. reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG --- Key: HIVE-9393 URL: https://issues.apache.org/jira/browse/HIVE-9393 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-9393.1.patch Since Hive 0.13 the log level of ColumnarSerDe.java:116 has been raised from DEBUG to INFO, which has introduced a very large amount of noise into the logs, causing the underlying filesystem to fill up. This request is to drop it back to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
[ https://issues.apache.org/jira/browse/HIVE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-9393: --- Attachment: HIVE-9393.1.patch reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG --- Key: HIVE-9393 URL: https://issues.apache.org/jira/browse/HIVE-9393 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-9393.1.patch Since Hive 0.13 the log level of ColumnarSerDe.java:116 has been raised from DEBUG to INFO, which has introduced a very large amount of noise into the logs, causing the underlying filesystem to fill up. This request is to drop it back to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266175#comment-14266175 ] Yongzhi Chen commented on HIVE-9201: Even if we support line terminators other than \n in the future, we still have to handle the case where a line terminator is used inside a string value. Any suggestions or corrections for my current approach? Or any better ideas? Thanks. Lazy functions do not handle newlines and carriage returns properly --- Key: HIVE-9201 URL: https://issues.apache.org/jira/browse/HIVE-9201 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-9201.1.patch Hive returns a wrong result when the returned string contains a \r or \n character. This happens when the query triggers MapReduce jobs. For example, for a table named strsim with only one row, as shown below, query 1 returns 1 row while query 2 returns 3 rows.
Query 1: select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
Query 2: select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : Job running in-process (local Hadoop)
INFO : 2014-12-23 15:00:08,958 Stage-1 map = 0%, reduce = 0%
INFO : Ended Job = job_local1178499218_0015
+------+---------+
| _c0  | narray  |
+------+---------+
| abc  | 1       |
+------+---------+
1 row selected (1.283 seconds)
select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : Job running in-process (local Hadoop)
INFO : 2014-12-23 15:04:35,441 Stage-1 map = 0%, reduce = 0%
INFO : Ended Job = job_local1816711099_0016
+------+---------+
| _c0  | narray  |
+------+---------+
| a    | NULL    |
| b    | NULL    |
| c    | 1       |
+------+---------+
3 rows selected (1.135 seconds)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265035#comment-14265035 ] Yongzhi Chen commented on HIVE-9201: Just found out that in SerDeUtils, escapeString and lightEscapeString escape \n and \r in the same way as my fix for this issue: https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java#L98 https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java#L129 Lazy functions do not handle newlines and carriage returns properly --- Key: HIVE-9201 URL: https://issues.apache.org/jira/browse/HIVE-9201 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-9201.1.patch Hive returns a wrong result when the returned string contains a \r or \n character. This happens when the query triggers MapReduce jobs. For example, for a table named strsim with only one row, as shown below, query 1 returns 1 row while query 2 returns 3 rows.
Query 1: select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
Query 2: select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : Job running in-process (local Hadoop)
INFO : 2014-12-23 15:00:08,958 Stage-1 map = 0%, reduce = 0%
INFO : Ended Job = job_local1178499218_0015
+------+---------+
| _c0  | narray  |
+------+---------+
| abc  | 1       |
+------+---------+
1 row selected (1.283 seconds)
select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : Job running in-process (local Hadoop)
INFO : 2014-12-23 15:04:35,441 Stage-1 map = 0%, reduce = 0%
INFO : Ended Job = job_local1816711099_0016
+------+---------+
| _c0  | narray  |
+------+---------+
| a    | NULL    |
| b    | NULL    |
| c    | 1       |
+------+---------+
3 rows selected (1.135 seconds)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
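To make the escaping approach concrete, here is a minimal sketch (not the attached HIVE-9201.1.patch) of replacing raw \n and \r characters with their backslash escapes before a string value is written into a line-oriented intermediate format, in the same spirit as SerDeUtils.escapeString and lightEscapeString; the class and method names are illustrative:

    public class LineTerminatorEscapeSketch {
      // Replace raw newline and carriage-return characters with their two-character
      // backslash escapes so one logical row stays on one physical line.
      static String escapeLineTerminators(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
          char c = value.charAt(i);
          if (c == '\n') {
            sb.append("\\n");
          } else if (c == '\r') {
            sb.append("\\r");
          } else {
            sb.append(c);
          }
        }
        return sb.toString();
      }

      public static void main(String[] args) {
        // Prints a\rb\nc as a single escaped line instead of splitting into three lines.
        System.out.println(escapeLineTerminators("a\rb\nc"));
      }
    }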
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264965#comment-14264965 ] Yongzhi Chen commented on HIVE-9201: [~ashutoshgupt...@gmail.com], are you suggesting that we start implementing LINES TERMINATED BY for Hive? It is treated as not fixable (see https://issues.apache.org/jira/browse/HIVE-302). In the current Hive code, it seems we just error out on any line terminator other than \n, and many places simply assume that \n is the only line terminator:
case HiveParser.TOK_TABLEROWFORMATLINES:
  String lineDelim = unescapeSQLString(rowChild.getChild(0).getText());
  tblDesc.getProperties().setProperty(serdeConstants.LINE_DELIM, lineDelim);
  if (!lineDelim.equals("\n") && !lineDelim.equals("10")) {
    throw new SemanticException(generateErrorMessage(rowChild,
        ErrorMsg.LINES_TERMINATED_BY_NON_NEWLINE.getMsg()));
  }
  break;
But with MAPREDUCE-2602 fixed, it is possible for Hive to support changing the line terminator. I just suspect it may not be an easy change. Thanks. Lazy functions do not handle newlines and carriage returns properly --- Key: HIVE-9201 URL: https://issues.apache.org/jira/browse/HIVE-9201 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-9201.1.patch Hive returns a wrong result when the returned string contains a \r or \n character. This happens when the query triggers MapReduce jobs. For example, for a table named strsim with only one row, as shown below, query 1 returns 1 row while query 2 returns 3 rows.
Query 1: select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
Query 2: select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : Job running in-process (local Hadoop)
INFO : 2014-12-23 15:00:08,958 Stage-1 map = 0%, reduce = 0%
INFO : Ended Job = job_local1178499218_0015
+------+---------+
| _c0  | narray  |
+------+---------+
| abc  | 1       |
+------+---------+
1 row selected (1.283 seconds)
select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : Job running in-process (local Hadoop)
INFO : 2014-12-23 15:04:35,441 Stage-1 map = 0%, reduce = 0%
INFO : Ended Job = job_local1816711099_0016
+------+---------+
| _c0  | narray  |
+------+---------+
| a    | NULL    |
| b    | NULL    |
| c    | 1       |
+------+---------+
3 rows selected (1.135 seconds)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)