[jira] [Created] (HIVE-25757) Use cached database type to choose metastore backend queries

2021-12-01 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-25757:
---

 Summary: Use cached database type to choose metastore backend 
queries
 Key: HIVE-25757
 URL: https://issues.apache.org/jira/browse/HIVE-25757
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


HIVE-21075 calls DatabaseProduct.determineDatabaseProduct to choose the 
backend-specific query, and that call can be expensive. Use the cached 
database type instead.
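
A minimal sketch of the caching idea, not the actual Hive patch: detect the 
database product once and reuse the result on later calls. DbType, 
CachedDbType and the detector supplier are hypothetical names that stand in 
for the expensive DatabaseProduct.determineDatabaseProduct call.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the metastore's notion of a backend database product.
enum DbType { DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, OTHER }

// Caches the result of an expensive detection call so it runs at most once.
final class CachedDbType {
    private final Supplier<DbType> detector; // assumed expensive, e.g. reads JDBC metadata
    private volatile DbType cached;          // null until the first detection completes

    CachedDbType(Supplier<DbType> detector) {
        this.detector = detector;
    }

    DbType get() {
        DbType result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached;
                if (result == null) {        // double-checked so only one thread detects
                    result = detector.get();
                    cached = result;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] detections = {0};
        CachedDbType dbType = new CachedDbType(() -> {
            detections[0]++;                 // count how often the expensive path runs
            return DbType.POSTGRES;
        });
        dbType.get();
        dbType.get();
        System.out.println(dbType.get() + " detected " + detections[0] + " time(s)");
    }
}
```

Callers that pick a backend-specific query would then ask the cache instead 
of re-running the detection on every request.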



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25238) Make excluded SSL cipher suites configurable for Hive Web UI and HS2

2021-06-10 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-25238:
---

 Summary: Make excluded SSL cipher suites configurable for Hive Web 
UI and HS2
 Key: HIVE-25238
 URL: https://issues.apache.org/jira/browse/HIVE-25238
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Web UI
Reporter: Yongzhi Chen


When starting a Jetty HTTP server, one can explicitly exclude certain (insecure)
SSL cipher suites. This can be especially important when Hive needs to comply 
with security regulations. We need to add properties so that the Hive Web UI 
and HiveServer2 support this.
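
As a rough illustration of what such a property could feed into, the sketch 
below parses a comma-separated list of excluded cipher suites and removes them 
from a list of supported suites. The property format and the class name 
CipherSuiteExcludes are assumptions made for illustration. The parsed array is 
the kind of value one would hand to Jetty's 
SslContextFactory.setExcludeCipherSuites(String...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the comma-separated property format is an assumption.
final class CipherSuiteExcludes {
    // Splits a comma-separated property value into trimmed, non-empty suite names.
    static String[] parse(String propertyValue) {
        if (propertyValue == null) {
            return new String[0];
        }
        List<String> names = new ArrayList<>();
        for (String part : propertyValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names.toArray(new String[0]);
    }

    // Returns the supported suites minus the excluded ones, preserving order.
    static String[] filter(String[] supported, String[] excluded) {
        Set<String> remaining = new LinkedHashSet<>(Arrays.asList(supported));
        remaining.removeAll(Arrays.asList(excluded));
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] excluded = parse(" SSL_RSA_WITH_RC4_128_MD5, ,TLS_RSA_WITH_AES_128_CBC_SHA ");
        String[] supported = {"TLS_AES_128_GCM_SHA256", "SSL_RSA_WITH_RC4_128_MD5"};
        System.out.println(Arrays.toString(filter(supported, excluded)));
    }
}
```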





[jira] [Created] (HIVE-25211) Create database throws NPE

2021-06-07 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-25211:
---

 Summary: Create database throws NPE
 Key: HIVE-25211
 URL: https://issues.apache.org/jira/browse/HIVE-25211
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Yongzhi Chen


<11>1 2021-06-06T17:32:48.964Z 
metastore-0.metastore-service.warehouse-1622998329-9klr.svc.cluster.local 
metastore 1 5ad83e8e-bf89-4ad3-b1fb-51c73c7133b7 [mdc@18060 
class="metastore.RetryingHMSHandler" level="ERROR" thread="pool-9-thread-16"] 
MetaException(message:java.lang.NullPointerException)

at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:8115)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1629)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121)
at com.sun.proxy.$Proxy31.create_database(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16795)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16779)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:120)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:128)
at 
org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:491)
at 
org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:480)
at 
org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:476)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1556)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1554)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1554)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1618)
... 21 more






[jira] [Created] (HIVE-24552) Possible HMS connections leak or accumulation in loadDynamicPartitions

2020-12-20 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24552:
---

 Summary: Possible HMS connections leak or accumulation in 
loadDynamicPartitions
 Key: HIVE-24552
 URL: https://issues.apache.org/jira/browse/HIVE-24552
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When loadDynamicPartitions (Hive.java) is called, it starts several threads 
to handle file moves. These threads may open HiveMetaStore connections, which 
may not be closed promptly and can therefore accumulate. The following log was 
captured while running insert overwrite many times. It shows these threads 
creating new HMS connections while the total number of open connections is 
already large. The finalizer eventually closes the connections, sometimes with 
errors:
{noformat}
<14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-14"] Opened a connection to metastore, 
current connections: 44021
<14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-14"] Connected to metastore.
<14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.RetryingMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-14"] RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 
lifetime=0
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-5"] Opened a connection to metastore, 
current connections: 44022
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-5"] Connected to metastore.
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.RetryingMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-5"] RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 
lifetime=0
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-6"] Opened a connection to metastore, 
current connections: 44023
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-6"] Connected to metastore.
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.RetryingMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-6"] RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 
lifetime=0
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-3"] Opened a connection to metastore, 
current connections: 44024


<14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a 
connection to metastore, current connections: 43904
<14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a 
connection to metastore, current connections: 43903
<14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a 
con

[jira] [Created] (HIVE-24392) Send table id in get_partitions_by_names_req api

2020-11-16 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24392:
---

 Summary: Send table id in get_partitions_by_names_req api
 Key: HIVE-24392
 URL: https://issues.apache.org/jira/browse/HIVE-24392
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The table id is not part of the Thrift definition of the 
get_partitions_by_names_req API. This Jira adds it.





[jira] [Created] (HIVE-24292) hive webUI should support keystoretype by config

2020-10-21 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24292:
---

 Summary: hive webUI should support keystoretype by config
 Key: HIVE-24292
 URL: https://issues.apache.org/jira/browse/HIVE-24292
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


We also need a property for passing in the keystore type for the web UI.





[jira] [Created] (HIVE-24253) HMS needs to support keystore/truststores types besides JKS

2020-10-09 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24253:
---

 Summary: HMS needs to support keystore/truststores types besides 
JKS
 Key: HIVE-24253
 URL: https://issues.apache.org/jira/browse/HIVE-24253
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When HiveMetaStoreClient connects to HMS with SSL enabled, HMS should support 
the default keystore type specified for the JDK instead of always using JKS. 
As HIVE-23958 did for Hive, HMS should allow setting additional 
keystore/truststore types needed by different applications, for example for 
FIPS crypto algorithms.
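
A small sketch of the fallback described above, using only the JDK's 
java.security.KeyStore API: use the configured type when one is given, 
otherwise fall back to KeyStore.getDefaultType() rather than a hard-coded 
"JKS". The class name KeyStoreTypeResolver and the idea of a single 
configured string are assumptions, not the HMS implementation.

```java
import java.security.KeyStore;
import java.security.KeyStoreException;

// Hypothetical helper; the configured value would come from an HMS property.
final class KeyStoreTypeResolver {
    // Returns the configured type, or the JDK default when none is configured.
    static String resolveType(String configuredType) {
        if (configuredType == null || configuredType.trim().isEmpty()) {
            return KeyStore.getDefaultType();
        }
        return configuredType.trim();
    }

    // Creates an unloaded KeyStore of the resolved type.
    static KeyStore newKeyStore(String configuredType) {
        String type = resolveType(configuredType);
        try {
            return KeyStore.getInstance(type);
        } catch (KeyStoreException e) {
            // No installed provider supports the requested type.
            throw new IllegalArgumentException("Unsupported keystore type: " + type, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newKeyStore(null).getType());     // whatever the JDK default is
        System.out.println(newKeyStore("PKCS12").getType());
    }
}
```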





[jira] [Created] (HIVE-24236) Connection leak in TxnHandler

2020-10-06 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24236:
---

 Summary: Connection leak in TxnHandler
 Key: HIVE-24236
 URL: https://issues.apache.org/jira/browse/HIVE-24236
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


QE tests fail with errors saying that connections cannot be allocated. The 
exception stacks look like the following:
{noformat}
2020-09-29T18:44:26,563 INFO  [Heartbeater-0]: txn.TxnHandler 
(TxnHandler.java:checkRetryable(3733)) - Non-retryable error in 
heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, 
general error (SQLState=null, ErrorCode=0)
2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to 
select from transaction database org.apache.commons.dbcp.SQLNestedException: 
Cannot get a connection, general error
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy63.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247)
at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
at com.sun.proxy.$Proxy64.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1112)
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
... 29 more
)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2747)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
{noformat}

and
{noformat}
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1134)
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
... 53 more
)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.cleanupRecords(TxnHandler.java:3375)
at 
org.apache.hadoop.hive.metastore.AcidEventListener.onDropTable(AcidEventListener.java:65)
at 
org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$19.notify(MetaStoreListenerNotifier.java:103)
at 
org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent

[jira] [Created] (HIVE-22461) NPE Metastore Transformer

2019-11-05 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-22461:
---

 Summary: NPE Metastore Transformer
 Key: HIVE-22461
 URL: https://issues.apache.org/jira/browse/HIVE-22461
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.1.2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The stack trace looks like the following:
{noformat}
2019-10-08 18:09:12,198 INFO  
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
[pool-6-thread-328]: Starting translation for processor 
Hiveserver2#3.1.2000.7.0.2.0...@vc0732.halxg.cloudera.com on list 1
2019-10-08 18:09:12,198 ERROR 
org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-328]: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transform(MetastoreDefaultTransformer.java:99)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3391)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3352)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy28.get_table_req(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16633)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16617)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

2019-10-08 18:09:12,199 ERROR org.apache.thrift.server.TThreadPoolServer: 
[pool-6-thread-328]: Error occurred during processing of message.
java.lang.NullPointerException: null
at 
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transform(MetastoreDefaultTransformer.java:99)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3391)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3352)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) ~[?:?]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_141]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141]
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at com.sun.proxy.$Proxy28.get_table_req(Unknown Source) ~[?:?]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16633)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16617)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59

[jira] [Created] (HIVE-21840) Hive Metastore Translation: Bucketed table Readonly capability

2019-06-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21840:
---

 Summary: Hive Metastore Translation: Bucketed table Readonly 
capability
 Key: HIVE-21840
 URL: https://issues.apache.org/jira/browse/HIVE-21840
 Project: Hive
  Issue Type: New Feature
Reporter: Yongzhi Chen
Assignee: Naveen Gangam


Impala needs a new capability that indicates it supports only reads for 
bucketed tables, regardless of whether the table is managed or external, ACID 
or not. Also, in the current implementation, when HIVEBUCKET2 is not in the 
capabilities list, a bucketed external table is returned as an unbucketed 
one; we need a way to know that it was "downgraded" from a bucketed table.





[jira] [Created] (HIVE-21839) Hive Metastore Translation: Hive need to create a type of table if the client does not have the write capability for it

2019-06-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21839:
---

 Summary: Hive Metastore Translation: Hive need to create a type of 
table if the client does not have the write capability for it
 Key: HIVE-21839
 URL: https://issues.apache.org/jira/browse/HIVE-21839
 Project: Hive
  Issue Type: New Feature
Reporter: Yongzhi Chen
Assignee: Naveen Gangam


Hive can either return an error message or provide an API call to check the 
permission even without a table instance.





[jira] [Created] (HIVE-21838) Hive Metastore Translation: Add API call to tell client why table has limited access

2019-06-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21838:
---

 Summary: Hive Metastore Translation: Add API call to tell client 
why table has limited access
 Key: HIVE-21838
 URL: https://issues.apache.org/jira/browse/HIVE-21838
 Project: Hive
  Issue Type: New Feature
Reporter: Yongzhi Chen
Assignee: Naveen Gangam


When a table's access type is Read-only or None, we need a way to tell clients 
why.





Re: Review Request 69672: HIVE-21045: Add total API timing stats and connection pool stats to metrics

2019-01-11 Thread Yongzhi Chen via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69672/#review211886
---


Ship it!

- Yongzhi Chen


On Jan. 5, 2019, 12:41 a.m., Karthik Manamcheri wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69672/
> ---
> 
> (Updated Jan. 5, 2019, 12:41 a.m.)
> 
> 
> Review request for hive, Adam Holley, Morio Ramdenbourg, Naveen Gangam, and 
> Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21045: Add total API timing stats and connection pool stats to metrics
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java
>  dfd7abff85 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/BoneCPDataSourceProvider.java
>  7e33c519a8 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DataSourceProvider.java
>  6dc63fb3bc 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DataSourceProviderFactory.java
>  5a92e104be 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java
>  7fe487b184 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java
>  8f6ae57e36 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java
>  3b188f83af 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/PerfLogger.java
>  a2def26fc5 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  2a6290315a 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java
>  6ae7f50471 
> 
> 
> Diff: https://reviews.apache.org/r/69672/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing to verify that the new metrics show up for hikaricp, bonecp, 
> and also the total stats. Here are samples of
> 1. [HikariCP json metrics 
> sample](https://gist.github.com/kmanamcheri/48ff2a680e85c7e925a6f95a9384dcef)
> 2. [BoneCP json metrics 
> sample](https://gist.github.com/kmanamcheri/b005f68263a1a1be06b25156a159d975)
> 
> In both the reports note that there are pool gauges (for tracking the 
> connection pool info) and also a timer for total api calls.
> 
> 
> Thanks,
> 
> Karthik Manamcheri
> 
>



Re: Review Request 69672: HIVE-21045: Add total API timing stats and connection pool stats to metrics

2019-01-09 Thread Yongzhi Chen via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69672/#review211805
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java
Line 228 (original), 227 (patched)
<https://reviews.apache.org/r/69672/#comment297398>

This is functionally a little different from the old impl: the old impl 
returns null if there are no custom properties, whereas the new impl still 
returns the provider. The old impl had a kind of sanity check. But if the 
custom properties are not required here, it should be fine.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/BoneCPDataSourceProvider.java
Lines 101 (patched)
<https://reviews.apache.org/r/69672/#comment297397>

If registry is null, should we give a warning in the log?


- Yongzhi Chen





[jira] [Created] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2018-12-28 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21075:
---

 Summary: Metastore: Drop partition performance downgrade with 
Postgres DB
 Key: HIVE-21075
 URL: https://issues.apache.org/jira/browse/HIVE-21075
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Yongzhi Chen


To work around a performance issue caused by Oracle not supporting the LIMIT 
clause, HIVE-9447 made every backend DB run select count(1) from SDS where 
SDS.CD_ID=? to check whether a specific CD_ID is still referenced in the SDS 
table before dropping a partition. This select count(1) statement does not 
scale well in Postgres, and there is no index on the CD_ID column of the SDS 
table.
For an SDS table with 1.5 million rows, select count(1) takes 700ms on average 
without an index and 10-20ms with one. The statement used before HIVE-9447 
(SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes less than 10ms.
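
One way to picture a fix, sketched under assumptions rather than taken from 
the Hive source: choose the existence check per backend, using the LIMIT form 
on Postgres and keeping count(1) elsewhere. Backend, CdIdReferenceQuery and 
the exact SQL strings are illustrative only.

```java
// Hypothetical helper that selects the "is this CD_ID still referenced?" query.
final class CdIdReferenceQuery {
    enum Backend { ORACLE, POSTGRES, OTHER }

    // Returns SQL that checks whether any SDS row still references a CD_ID.
    static String forBackend(Backend backend) {
        switch (backend) {
            case POSTGRES:
                // Can stop at the first matching row instead of counting all of them.
                return "select 1 from \"SDS\" where \"CD_ID\" = ? limit 1";
            default:
                // Assumed fallback: the count(1) form introduced by HIVE-9447.
                return "select count(1) from \"SDS\" where \"CD_ID\" = ?";
        }
    }

    public static void main(String[] args) {
        for (Backend backend : Backend.values()) {
            System.out.println(backend + ": " + forBackend(backend));
        }
    }
}
```

With the LIMIT form the caller only needs to know whether a row came back, 
while the count(1) form requires reading the count and comparing it to zero.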





[jira] [Created] (HIVE-21019) Fix autoColumnStats tests to make auto stats gather possible.

2018-12-07 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21019:
---

 Summary: Fix autoColumnStats tests to make auto stats gather 
possible.
 Key: HIVE-21019
 URL: https://issues.apache.org/jira/browse/HIVE-21019
 Project: Hive
  Issue Type: Bug
  Components: Test
Affects Versions: 4.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Before https://issues.apache.org/jira/browse/HIVE-20915 , the sort dynamic 
partitions optimizer was turned off for these tests, so their query plans 
could contain a group by, which can trigger compute statistics. After that 
Jira, the optimizer is enabled and the query plans contain a reduce sorting 
operation instead of a group by. To test the auto column stats gathering 
feature, we should disable sort dynamic partitions for these tests.





[jira] [Created] (HIVE-20915) Make dynamic sort partition optimization available to HoS and MR

2018-11-14 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-20915:
---

 Summary: Make dynamic sort partition optimization available to HoS 
and MR
 Key: HIVE-20915
 URL: https://issues.apache.org/jira/browse/HIVE-20915
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


HIVE-20703 put the dynamic sort partition optimization under a cost-based 
decision, but it also made the optimizer available only to Tez. 
hive.optimize.sort.dynamic.partition has worked with other execution engines 
for a long time, so we should keep the optimizer available to them.





[jira] [Created] (HIVE-20741) Disable or fix random failed tests

2018-10-12 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-20741:
---

 Summary: Disable or fix random failed tests
 Key: HIVE-20741
 URL: https://issues.apache.org/jira/browse/HIVE-20741
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen


Two qfile tests for TestCliDriver fail randomly; both may be related to number precision 
issues:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_context_ngrams] 
(batchId=79)

Error:
Client Execution succeeded but contained differences (error code = 1) after 
executing udaf_context_ngrams.q 
43c43
< [{"ngram":["travelling"],"estfrequency":1.0}]
---
> [{"ngram":["travelling"],"estfrequency":3.0}]

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_corr] (batchId=84)

Client Execution succeeded but contained differences (error code = 1) after 
executing udaf_corr.q 
100c100
< 0.6633880657639324
---
> 0.6633880657639326
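
A last-digit difference like the udaf_corr one above is what floating-point addition produces when the same values are combined in a different order. The sketch below only illustrates that general effect; it is not taken from the UDAF code, the class and method names are hypothetical, and the report itself only says the failures "may" relate to precision.

```java
/** Hypothetical sketch: double addition is not associative. */
final class SummationOrder {

    /** Adds left to right: (a + b) + c. */
    static double leftToRight(double a, double b, double c) {
        return (a + b) + c;
    }

    /** Adds right to left: a + (b + c). */
    static double rightToLeft(double a, double b, double c) {
        return a + (b + c);
    }

    /** Tolerance-based comparison, the usual remedy for such test diffs. */
    static boolean nearlyEqual(double x, double y, double tolerance) {
        return Math.abs(x - y) <= tolerance;
    }

    public static void main(String[] args) {
        double x = leftToRight(0.1, 0.2, 0.3);
        double y = rightToLeft(0.1, 0.2, 0.3);
        System.out.println(x == y);
        System.out.println(nearlyEqual(x, y, 1e-12));
    }
}
```

If partial aggregates are merged in a different order from run to run, an exact comparison of the printed doubles can fail even though the results agree to within rounding error.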







[jira] [Created] (HIVE-20695) HoS Query fails with hive.exec.parallel=true

2018-10-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-20695:
---

 Summary: HoS Query fails with hive.exec.parallel=true
 Key: HIVE-20695
 URL: https://issues.apache.org/jira/browse/HIVE-20695
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.2.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Hive queries which fail when running a HiveOnSpark job:
{noformat}
ERROR : Failed to execute spark task, with exception 
'java.lang.Exception(Failed to submit Spark work, please retry later)'
java.lang.Exception: Failed to submit Spark work, please retry later
at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.execute(RemoteHiveSparkClient.java:186)
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:71)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:107)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:99)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/tmp/hive/dbname/_spark_session_dir/e202c452-8793-4e4e-ad55-61e3d4965c69/somename.jar
 (inode 725730760): File does not exist. [Lease.  Holder: 
DFSClient_NONMAPREDUCE_-1981084042_486659, pending creates: 7]
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3755)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3556)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3412)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:688)
{noformat}





[jira] [Created] (HIVE-20016) Investigate random test failure

2018-06-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-20016:
---

 Summary: Investigate random test failure 
 Key: HIVE-20016
 URL: https://issues.apache.org/jira/browse/HIVE-20016
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation3 failed with:
java.lang.AssertionError: Concurrent Statement failed: 
org.apache.hive.service.cli.HiveSQLException: java.lang.AssertionError: 
Authorization plugins not initialized!
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.hive.jdbc.TestJdbcWithMiniHS2.finishTasks(TestJdbcWithMiniHS2.java:374)
at 
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation3(TestJdbcWithMiniHS2.java:304)





[jira] [Created] (HIVE-19897) Add more tests for parallel compilation

2018-06-14 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-19897:
---

 Summary: Add more tests for parallel compilation 
 Key: HIVE-19897
 URL: https://issues.apache.org/jira/browse/HIVE-19897
 Project: Hive
  Issue Type: Test
  Components: HiveServer2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The two parallel compilation tests in org.apache.hive.jdbc.TestJdbcWithMiniHS2 
do not really cover the case of queries compiling concurrently from different 
connections. Not sure whether that is on purpose or by mistake. Add more tests to cover 
the case. 
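
One way to structure such a test is sketched below, under stated assumptions: the harness class is hypothetical and is not the code in TestJdbcWithMiniHS2. Each task stands for one connection (in a real test it would open its own JDBC connection and run a statement), and a latch releases all tasks at the same moment so that they actually overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

/** Hypothetical harness: one task per simulated connection, all started together. */
final class ConcurrentCompileHarness {

    static <T> List<T> runConcurrently(int connections,
                                       IntFunction<Callable<T>> taskForConnection) {
        ExecutorService pool = Executors.newFixedThreadPool(connections);
        CountDownLatch startGate = new CountDownLatch(1);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < connections; i++) {
                Callable<T> task = taskForConnection.apply(i);
                futures.add(pool.submit(() -> {
                    startGate.await();      // hold every worker until all are queued
                    return task.call();
                }));
            }
            startGate.countDown();          // release all workers at once
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());  // a worker failure surfaces here
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Concurrent task failed", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The pool is sized to the number of connections so that no task waits behind another; with a smaller pool the tasks would run in sequence and the test would not exercise concurrent compilation.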





Re: [VOTE] Stricter commit guidelines

2018-05-16 Thread Yongzhi Chen
+1

On Tue, May 15, 2018 at 9:59 PM, Siddharth Seth  wrote:

> +1
>
> On Mon, May 14, 2018 at 10:44 PM, Jesus Camacho Rodriguez <
> jcama...@apache.org> wrote:
>
> > After work has been done to ignore most of the tests that were failing
> > consistently/intermittently [1], I wanted to start this vote to gather
> > support from the community to be stricter wrt committing patches to Hive.
> > The committers guide [2] already specifies that a +1 should be obtained
> > before committing, but there is another clause that allows committing
> > under the presence of flaky tests (clause 4). Flaky tests are as good as having
> > no tests, hence I propose to remove clause 4 and enforce the +1 from
> > testing infra before committing.
> >
> >
> >
> > As I see it, by enforcing that we always get a +1 from the testing infra
> > before committing, 1) we will have a more stable project, and 2) we will
> > have another incentive as a community to create a more robust testing
> > infra, e.g., replacing flaky tests for similar unit tests that are not
> > flaky, trying to decrease running time for tests, etc.
> >
> >
> >
> > Please, share your thoughts about this.
> >
> >
> >
> > Here is my +1.
> >
> >
> >
> > Thanks,
> >
> > Jesús
> >
> >
> >
> > [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
> >
> > [2] https://cwiki.apache.org/confluence/display/Hive/HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
> >
> >
> >
> >
>


[jira] [Created] (HIVE-19296) Add log to record MapredLocalTask Failure

2018-04-25 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-19296:
---

 Summary: Add log to record MapredLocalTask Failure
 Key: HIVE-19296
 URL: https://issues.apache.org/jira/browse/HIVE-19296
 Project: Hive
  Issue Type: Bug
  Components: Diagnosability
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


In some cases, when MapredLocalTask fails around the child process start time, we 
cannot find detailed error information anywhere (not in the stderr log, and there is no 
MapredLocal log file). All we get is:
{noformat}
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: Obtaining error information
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: 
Task failed!
Task ID:
  Stage-48

Logs:

*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: 
/var/log/hive/hadoop-cmf-hive1-HIVESERVER2-t.log.out
*** ERROR org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask: 
[HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1
{noformat}
It is really hard to debug. 





Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

2018-04-17 Thread Yongzhi Chen via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/#review201323
---




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 7730 (patched)
<https://reviews.apache.org/r/66188/#comment282498>

Should you call addQueryAfterUse and closeAllQueries? Otherwise, how do you 
release the resources held by the batch queries?


- Yongzhi Chen


On March 21, 2018, 6:57 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66188/
> ---
> 
> (Updated March 21, 2018, 6:57 p.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Yongzhi Chen.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> If the table contains a lot of columns e.g, 5k, simple table rename would 
> fail with the following stack trace. The issue is datanucleus can't handle 
> the query with lots of colName='c1' && colName='c2' && ... .
> 
> I'm breaking the query into multiple smaller queries and then we aggregate 
> the result together.
> 
> 
> Diffs
> -
> 
>   ql/src/test/queries/clientpositive/alter_rename_table.q 2061850540 
>   ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a28d8 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java
>  PRE-CREATION 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
>  6ead20aeaf 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  88d88ed4df 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
>  9f822564bd 
> 
> 
> Diff: https://reviews.apache.org/r/66188/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing has been done for tables with a large number of columns.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>
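
The approach described in the quoted request (split one large lookup into several smaller queries and combine the results) can be sketched as follows. This is a minimal illustration with hypothetical names; it is not the Batchable class that the patch adds, and the query is represented by an ordinary function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: run one lookup per slice of the input and merge the results. */
final class BatchedLookup {

    static <I, R> List<R> runInBatches(List<I> inputs,
                                       int batchSize,
                                       Function<List<I>, List<R>> queryOneBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<R> all = new ArrayList<>();
        for (int start = 0; start < inputs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, inputs.size());
            // Each call sees at most batchSize items, which keeps the generated filter small.
            all.addAll(queryOneBatch.apply(inputs.subList(start, end)));
        }
        return all;
    }
}
```

With 5,000 column names and a batch size of 1,000, the function passed in is called five times. Any per-batch query object would still need to be released after use, which is the concern raised in the review comment above.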



[jira] [Created] (HIVE-18671) lock not released after Hive on Spark query was cancelled

2018-02-09 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-18671:
---

 Summary: lock not released after Hive on Spark query was cancelled
 Key: HIVE-18671
 URL: https://issues.apache.org/jira/browse/HIVE-18671
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When a query running on Spark is cancelled, the SparkJobMonitor cannot return, 
so the locks held by the query cannot be released. With debug logging 
enabled, you will see many log entries like the following:
{noformat}

2018-02-09 08:27:09,613 INFO 
org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor: 
[HiveServer2-Background-Pool: Thread-80]: state = CANCELLED
2018-02-09 08:27:10,613 INFO 
org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor: 
[HiveServer2-Background-Pool: Thread-80]: state = CANCELLED

{noformat}
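
The repeated "state = CANCELLED" lines suggest a polling loop that does not treat CANCELLED as a final state. The sketch below shows the general shape of a loop that does. It is an assumption-laden illustration: the enum, class, and method names are hypothetical and are not the actual SparkJobMonitor code.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical sketch of a job monitor that stops polling on any terminal state. */
final class JobMonitorSketch {

    enum JobState { QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED }

    /** CANCELLED is terminal, just like SUCCEEDED and FAILED. */
    static boolean isTerminal(JobState state) {
        return state == JobState.SUCCEEDED
            || state == JobState.FAILED
            || state == JobState.CANCELLED;
    }

    /** Returns the number of polls needed to reach a terminal state, or -1 if none was seen. */
    static int pollUntilDone(Iterator<JobState> polledStates) {
        int polls = 0;
        while (polledStates.hasNext()) {
            JobState state = polledStates.next();
            polls++;
            if (isTerminal(state)) {
                return polls;   // the caller can now release locks
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Iterator<JobState> states = Arrays.asList(
            JobState.RUNNING, JobState.RUNNING, JobState.CANCELLED, JobState.CANCELLED).iterator();
        System.out.println(pollUntilDone(states));
    }
}
```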






Re: [VOTE] Apache Hive 2.3.2 Release Candidate 0

2017-11-13 Thread Yongzhi Chen
+1
I verified the release by:
. Checking the gpg signature
. Checking the md5 files
I also installed Hive 2.3.2 and tested these commands:
show tables;
create table;
select * from table;

The release works fine.

On Mon, Nov 13, 2017 at 11:43 AM, Sergio Pena 
wrote:

> +1
>
> I verified the release by doing the following:
> * checked the gpg signature
> * checked the md5 files
> * installed hive 2.3.2 in my local machine with hadoop 2.7.2 and run a few
> commands:
>   > show databases
>   > show tables
>   > insert into table values()
>   > select * from table
>   > select count(*) from table
> * checked the maven artifacts are correctly pulled by other components and
> run unit tests
> * checked that storage-api-2.4.0 is pulled
> * checked the release tag
> * checked the RELEASE_NOTES, NOTICE, LICENSE are correct
>
> The release is working correctly.
>
> Thanks Sahil for making this release.
> - Sergio
>
> On Thu, Nov 9, 2017 at 5:37 PM, Sahil Takiar 
> wrote:
>
> > Apache Hive 2.3.2 Release Candidate 0 is available here:
> > http://people.apache.org/~stakiar/hive-2.3.2/
> >
> > Maven artifacts are available here:
> > https://repository.apache.org/content/repositories/orgapachehive-1082/
> >
> > Source tag for RCN is at: https://github.com/apache/hive/tree/release-2.3.2
> >
> > Voting will conclude in 72 hours.
> >
> > Hive PMC Members: Please test and vote.
> >
> > Thanks.
> >
>


[jira] [Created] (HIVE-17640) Comparison of date return null if only time part is provided in string.

2017-09-28 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-17640:
---

 Summary: Comparison of date return null if only time part is 
provided in string.
 Key: HIVE-17640
 URL: https://issues.apache.org/jira/browse/HIVE-17640
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 2.1.0


Reproduce:
select '2017-01-01 00:00:00' < current_date;
INFO  : OK
...
1 row selected (18.324 seconds)
...
 NULL



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16875) Query against view with partitioned child on HoS fails with privilege exception.

2017-06-09 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-16875:
---

 Summary: Query against view with partitioned child on HoS fails 
with privilege exception.
 Key: HIVE-16875
 URL: https://issues.apache.org/jira/browse/HIVE-16875
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Query against view with child table that has partitions fails with privilege 
exception even with correct privileges.

Reproduce:
{noformat}
create table jsamp1 (a string) partitioned by (b int);
insert into table jsamp1 partition (b=1) values ("hello");
create view jview as select * from jsamp1;

create role viewtester;
grant all on table jview to role viewtester;
grant role viewtester to group testers;

Use MR, the select will succeed:
set hive.execution.engine=mr;
select count(*) from jview;

while use spark:
set hive.execution.engine=spark;
select count(*) from jview;

it fails with:
Error: Error while compiling statement: FAILED: SemanticException No valid 
privileges
 User tester does not have privileges for QUERY
 The required privileges: 
Server=server1->Db=default->Table=j1part->action=select; 
(state=42000,code=4)

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Welcome Rui Li to Hive PMC

2017-05-25 Thread Yongzhi Chen
Congrats Rui!

On Thu, May 25, 2017 at 1:48 PM, Vineet Garg  wrote:

> Congrats Rui!
>
> > On May 24, 2017, at 9:19 PM, Xuefu Zhang  wrote:
> >
> > Hi all,
> >
> > It's an honor to announce that Apache Hive PMC has recently voted to invite
> > Rui Li as a new Hive PMC member. Rui is a long time Hive contributor and
> > committer, and has made significant contribution in Hive especially in Hive
> > on Spark. Please join me in congratulating him and looking forward to a
> > bigger role that he will play in Apache Hive project.
> >
> > Thanks,
> > Xuefu
>
>


[jira] [Created] (HIVE-16660) Not able to add partition for views in hive when sentry is enabled

2017-05-12 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-16660:
---

 Summary: Not able to add partition for views in hive when sentry 
is enabled
 Key: HIVE-16660
 URL: https://issues.apache.org/jira/browse/HIVE-16660
 Project: Hive
  Issue Type: Bug
  Components: Parser
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Repro:
create table tesnit (a int) partitioned by (p int);
insert into table tesnit partition (p = 1) values (1);
insert into table tesnit partition (p = 2) values (1);
create view test_view partitioned on (p) as select * from tesnit where p =1;

alter view test_view add partition (p = 2);
Error: Error while compiling statement: FAILED: SemanticException [Error 
10056]: The query does not reference any valid partition. To run this query, 
set hive.mapred.mode=nonstrict (state=42000,code=10056)





Re: Review Request 58992: HIVE-16572: Rename a partition should not drop its column stats

2017-05-08 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58992/#review174169
---


Ship it!




Ship It!

- Yongzhi Chen


On May 4, 2017, 2:19 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58992/
> ---
> 
> (Updated May 4, 2017, 2:19 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16572
> https://issues.apache.org/jira/browse/HIVE-16572
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch is to fix the issue in renaming a partition.
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 
> d8af7a7 
>   ql/src/test/queries/clientpositive/alter_table_column_stats.q 39dfb0c 
>   ql/src/test/queries/clientpositive/rename_external_partition_location.q 
> be93bd4 
>   ql/src/test/results/clientpositive/alter_table_column_stats.q.out 8739bfe 
>   ql/src/test/results/clientpositive/rename_external_partition_location.q.out 
> 1670b4e 
> 
> 
> Diff: https://reviews.apache.org/r/58992/diff/1/
> 
> 
> Testing
> ---
> 
> Manual tests
> new qtests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 58992: HIVE-16572: Rename a partition should not drop its column stats

2017-05-08 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58992/#review174165
---




metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
Line 555 (original), 568 (patched)
<https://reviews.apache.org/r/58992/#comment247282>

Can the transaction be properly rolled back with the incomplete state?


- Yongzhi Chen


On May 4, 2017, 2:19 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58992/
> ---
> 
> (Updated May 4, 2017, 2:19 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16572
> https://issues.apache.org/jira/browse/HIVE-16572
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch is to fix the issue in renaming a partition.
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 
> d8af7a7 
>   ql/src/test/queries/clientpositive/alter_table_column_stats.q 39dfb0c 
>   ql/src/test/queries/clientpositive/rename_external_partition_location.q 
> be93bd4 
>   ql/src/test/results/clientpositive/alter_table_column_stats.q.out 8739bfe 
>   ql/src/test/results/clientpositive/rename_external_partition_location.q.out 
> 1670b4e 
> 
> 
> Diff: https://reviews.apache.org/r/58992/diff/1/
> 
> 
> Testing
> ---
> 
> Manual tests
> new qtests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 58456: Query cancel: improve the way to handle files

2017-04-20 Thread Yongzhi Chen


> On April 19, 2017, 5:50 p.m., Chaoyu Tang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
> > Lines 46 (patched)
> > <https://reviews.apache.org/r/58456/diff/1/?file=1692688#file1692688line46>
> >
> > To be honest, I am not very comfortable to import the Driver here. I 
> > thought the CombineHiveInputFormat in io package is at a lower architecture 
> > layer than Driver ql. 
> > Is there any other way we can detect whether the thread has been 
> > interrupted (e.g. Thread.currentThread().isInterrupted(), etc.)?
> > Also as I recall (if I am right), there might be a class which handles 
> > this interrupt signal globally, I could not find it at this moment.

The check in CombineHiveInputFormat only checks the thread-local object; it 
follows the same pattern used to check Hive's own cancel-related status. I 
think we are trying to avoid use the 
In the CombineHiveInputFormat, it can include:
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.exec.Utilities;

I do not think it is a problem to import
org.apache.hadoop.hive.ql.Driver


> On April 19, 2017, 5:50 p.m., Chaoyu Tang wrote:
> > service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java
> > Lines 399 (patched)
> > <https://reviews.apache.org/r/58456/diff/1/?file=1692689#file1692689line399>
> >
> > As I understand, basically the cleanup is called with the parameter 
> > state value CANCELED, TIMEOUT and CLOSED, and here you are trying to 
> > address the race issue in the normal CLOSE case where the thread should not 
> > be interrupted and further clean the tmp file. Is it right?
> > Another thought, could moving the code
> > {code}
> >   ss.deleteTmpOutputFile();
> >   ss.deleteTmpErrOutputFile();
> > {code}
> > from sqlOperation to driver close() or destroy() will be help to solve 
> > the problem?

The error "Failed to clean-up tmp directories." comes from 
Utilities.clearWork(job), called from execute; clearWork cleans the folders used 
for the map and reduce plan paths.
ss.deleteTmpOutputFile() and ss.deleteTmpErrOutputFile() clean the output 
data tmp folder, so the two are different. 

  /**
   * Temporary file name used to store results of non-Hive commands (e.g., set, 
dfs)
   * and HiveServer.fetch*() function will read results from this file
   */
  protected File tmpOutputFile;

  /**
   * Temporary file name used to store error output of executing non-Hive 
commands (e.g., set, dfs)
   */
  protected File tmpErrOutputFile;


- Yongzhi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58456/#review172374
---


On April 14, 2017, 1:14 p.m., Yongzhi Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58456/
> ---
> 
> (Updated April 14, 2017, 1:14 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Chaoyu Tang, and Sergio Pena.
> 
> 
> Bugs: HIVE-16426
> https://issues.apache.org/jira/browse/HIVE-16426
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Use a thread-local variable to store the cancel state so that it is accessible 
> without being passed around as a parameter. 
> 2. Add checkpoints for file operations.
> 3. Remove backgroundHandle.cancel to avoid failed file cleanup caused by the 
> interruption. From what I observed, the method does not seem very effective for 
> scheduled operations, for example, ongoing HMS API calls.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> a80004662068eb2391c0dd7062f77156b222375b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
> b0657f01d4482dc8bb8dc180e5e7deffbdb533e6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
> 7a113bf8e5c4dd8c2c486741a5ebc7b8940e746b 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 04fc0a17c93120b8f6e6d7c36e4d70631d56baca 
> 
> 
> Diff: https://reviews.apache.org/r/58456/diff/1/
> 
> 
> Testing
> ---
> 
> Manually tested.
> 
> 
> Thanks,
> 
> Yongzhi Chen
> 
>



Re: Review Request 58456: Query cancel: improve the way to handle files

2017-04-14 Thread Yongzhi Chen


> On April 14, 2017, 5:43 p.m., Aihua Xu wrote:
> > service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java
> > Lines 399 (patched)
> > <https://reviews.apache.org/r/58456/diff/1/?file=1692689#file1692689line399>
> >
> > I'm not exactly following what we are doing here. Not sure how 
> > background thread gets closed later.
> > 
> > Otherwise, the other changes look good.

The background thread will either complete the task or be closed gracefully under the 
guidance of the cancel status. Our current cancel design mostly follows the 
pattern in which the cancel command sets the cancel status and the working 
thread (background thread) checks the cancel status and decides whether to quit or 
continue. backgroundHandle.cancel(true) does not follow this pattern and 
causes some conflicts. The following warning log is caused by this:
2017-04-11 09:57:30,727 WARN  org.apache.hadoop.hive.ql.exec.Utilities: 
[HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories.
java.io.InterruptedIOException: Call interrupted
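
The pattern described above (the cancel command sets a status, the worker checks it and decides to stop) can be sketched as below. This is a simplified illustration with hypothetical names, not the SQLOperation or Driver code. Because the worker is never interrupted, the cleanup step runs to completion instead of failing with an interrupted call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of cooperative cancellation without thread interruption. */
final class CooperativeCancel {

    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    /** Called by the cancel command, typically from another thread. */
    void cancel() {
        cancelled.set(true);
    }

    boolean isCancelled() {
        return cancelled.get();
    }

    /**
     * Runs the steps in order and checks the cancel status before each one.
     * Cleanup always runs, and it runs on a thread that has not been interrupted.
     * Returns the number of steps that completed.
     */
    int runSteps(List<Runnable> steps, Runnable cleanup) {
        int done = 0;
        try {
            for (Runnable step : steps) {
                if (isCancelled()) {
                    break;
                }
                step.run();
                done++;
            }
        } finally {
            cleanup.run();
        }
        return done;
    }

    public static void main(String[] args) {
        CooperativeCancel op = new CooperativeCancel();
        List<Runnable> steps = Arrays.asList(
            () -> System.out.println("step 1"),
            op::cancel,
            () -> System.out.println("step 3 (skipped after cancel)"));
        int done = op.runSteps(steps, () -> System.out.println("cleanup ran"));
        System.out.println("steps completed: " + done);
    }
}
```

By contrast, Future.cancel(true) interrupts the worker thread, and blocking I/O in the cleanup path can then fail, which matches the InterruptedIOException in the warning above.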


- Yongzhi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58456/#review172009
---


On April 14, 2017, 1:14 p.m., Yongzhi Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58456/
> ---
> 
> (Updated April 14, 2017, 1:14 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Chaoyu Tang, and Sergio Pena.
> 
> 
> Bugs: HIVE-16426
> https://issues.apache.org/jira/browse/HIVE-16426
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Use a thread-local variable to store the cancel state so that it is accessible 
> without being passed around as a parameter. 
> 2. Add checkpoints for file operations.
> 3. Remove backgroundHandle.cancel to avoid failed file cleanup caused by the 
> interruption. From what I observed, the method does not seem very effective for 
> scheduled operations, for example, ongoing HMS API calls.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> a80004662068eb2391c0dd7062f77156b222375b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
> b0657f01d4482dc8bb8dc180e5e7deffbdb533e6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
> 7a113bf8e5c4dd8c2c486741a5ebc7b8940e746b 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 04fc0a17c93120b8f6e6d7c36e4d70631d56baca 
> 
> 
> Diff: https://reviews.apache.org/r/58456/diff/1/
> 
> 
> Testing
> ---
> 
> Manually tested.
> 
> 
> Thanks,
> 
> Yongzhi Chen
> 
>



Review Request 58456: Query cancel: improve the way to handle files

2017-04-14 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58456/
---

Review request for hive, Aihua Xu, Chaoyu Tang, and Sergio Pena.


Bugs: HIVE-16426
https://issues.apache.org/jira/browse/HIVE-16426


Repository: hive-git


Description
---

1. Use a thread-local variable to store the cancel state so that it is accessible 
without being passed around as a parameter. 
2. Add checkpoints for file operations.
3. Remove backgroundHandle.cancel to avoid failed file cleanup caused by the 
interruption. From what I observed, the method does not seem very effective for 
scheduled operations, for example, ongoing HMS API calls.
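
Items 1 and 2 can be sketched together, with the caveat that every name below is hypothetical and the real patch may be organized differently. The worker thread binds a shared cancel flag to a ThreadLocal when it starts; code deep in the call stack can then call a checkpoint before a file operation without the flag being passed down as a parameter.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: thread-local cancel state with checkpoints. */
final class CancelCheckpoint {

    private static final ThreadLocal<AtomicBoolean> STATE = new ThreadLocal<>();

    /** Called on the worker thread; the canceller keeps its own reference to the flag. */
    static void bind(AtomicBoolean sharedFlag) {
        STATE.set(sharedFlag);
    }

    /** Called on the worker thread when the operation ends, so pooled threads stay clean. */
    static void unbind() {
        STATE.remove();
    }

    /** Placed before file operations; throws if the bound operation was cancelled. */
    static void checkpoint(String nextAction) {
        AtomicBoolean flag = STATE.get();
        if (flag != null && flag.get()) {
            throw new IllegalStateException("Operation cancelled before: " + nextAction);
        }
    }
}
```

A thread with no bound flag passes every checkpoint, so code that also runs outside a cancellable operation is unaffected.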


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
a80004662068eb2391c0dd7062f77156b222375b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
b0657f01d4482dc8bb8dc180e5e7deffbdb533e6 
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
7a113bf8e5c4dd8c2c486741a5ebc7b8940e746b 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
04fc0a17c93120b8f6e6d7c36e4d70631d56baca 


Diff: https://reviews.apache.org/r/58456/diff/1/


Testing
---

Manually tested.


Thanks,

Yongzhi Chen



[jira] [Created] (HIVE-16426) Query cancel: improve the way to handle files

2017-04-12 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-16426:
---

 Summary: Query cancel: improve the way to handle files
 Key: HIVE-16426
 URL: https://issues.apache.org/jira/browse/HIVE-16426
 Project: Hive
  Issue Type: Improvement
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


1. Add data structure support to make it easy to check the query cancel status.
2. Handle query cancel more gracefully. Remove possible file leaks caused by 
query cancel, as shown in the following stack:
{noformat}
2017-04-11 09:57:30,727 WARN  org.apache.hadoop.hive.ql.exec.Utilities: 
[HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories.
java.io.InterruptedIOException: Call interrupted
at org.apache.hadoop.ipc.Client.call(Client.java:1496)
at org.apache.hadoop.ipc.Client.call(Client.java:1439)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy20.delete(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy21.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
at 
org.apache.hadoop.hive.ql.exec.Utilities.clearWork(Utilities.java:277)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:463)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1978)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1691)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1423)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1202)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
at 
org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at 
org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:316)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
3. Add checkpoints to related file operations to improve the response time of 
query cancellation. 





Re: Review Request 58203: HIVE-16345 BeeLineDriver should be able to run qtest files which are using default database tables

2017-04-05 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58203/#review171166
---




itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java
Lines 130 (patched)
<https://reviews.apache.org/r/58203/#comment244045>

How do you handle the cases where a command has a comment following ';', or a new 
command starts after ';'? Do these cases matter?
For example:
show tables; --comment

show tables; select * from
src;

The beeline.Commands class has code similar to getCommands:
handleMultiLineCmd and the logic in execute.
Could you figure out a way to use some of the code there?
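
To make the two cases concrete, here is a minimal splitter that handles them. It is a sketch under clear assumptions: the class is hypothetical, it is not the QFile or beeline.Commands logic, and it deliberately ignores quoted strings that contain ';' or '--'.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a script into commands, dropping "--" comments. */
final class ScriptSplitter {

    static List<String> splitCommands(String script) {
        // Pass 1: cut each line at the first "--" so comments never reach the splitter.
        StringBuilder withoutComments = new StringBuilder();
        for (String line : script.split("\n")) {
            int comment = line.indexOf("--");
            withoutComments.append(comment >= 0 ? line.substring(0, comment) : line).append('\n');
        }
        // Pass 2: split on ';' so several commands on one line, or one command
        // spread over several lines, both come out as single entries.
        List<String> commands = new ArrayList<>();
        for (String piece : withoutComments.toString().split(";")) {
            String command = piece.trim().replaceAll("\\s+", " ");
            if (!command.isEmpty()) {
                commands.add(command);
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        String script = "show tables; --comment\n\nshow tables; select * from\nsrc;\n";
        for (String command : splitCommands(script)) {
            System.out.println(command);
        }
    }
}
```

For the example in the comment above, this yields three commands: the two "show tables" statements and the select joined onto one line.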



itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java
Lines 160 (patched)
<https://reviews.apache.org/r/58203/#comment244048>

Is it possible that the table belongs to another database?
For example:
use foo;
select * from tableinfoo;



itests/util/src/main/java/org/apache/hive/beeline/qfile/QFileBeeLineClient.java
Line 92 (original), 90 (patched)
<https://reviews.apache.org/r/58203/#comment244047>

Why do we need to replace the tablename with default.tablename? Could you just 
add use default?


- Yongzhi Chen


On April 5, 2017, 10:35 a.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58203/
> ---
> 
> (Updated April 5, 2017, 10:35 a.m.)
> 
> 
> Review request for hive, Aihua Xu, Zoltan Haindrich, Yongzhi Chen, and Barna 
> Zsombor Klara.
> 
> 
> Bugs: HIVE-16345
> https://issues.apache.org/jira/browse/HIVE-16345
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The goal of the change is to run qtest files which contain queries on tables 
> created by the init scripts.
> It adds the possibility to rewrite the src table references to default.src
> 
> This patch contains the following changes:
> - Added new parameter to the driver, to control whether to rewrite the table 
> names or not (test.rewrite.source.tables) - default is true
> - Made QTestUtil.getSrcTables() available for QFile class
> - Run the QFile not with "!run testfile.q", but reading the file, and 
> assembling the commands - enable us to parse the queries, and provide better 
> feedback about the failing queries
> - QFile rewrites the source tables, if it is required
> - Used 9 qtest files from the CliDriver, and added them to BeeLine tests
> - Added new filters, and removed redundant ones - I was able to remove every 
> QFile specific filter, and corresponding setter methods as well
> - Moved QFile classes to org.apache.hive.beeline package, so it can use 
> package private methods from BeeLine, and Commands
> - Refactored needsContinuation method in BeeLine, so it can be called from a 
> static context as well
> 
> And one important change is:
> - In Utilities.setMapRedWork, change the INPUT_NAME value in the conf to a 
> mapreduce task specific value. This one is used by the IOContextMap to cache 
> the IOContext objects. Using the same value for every mapred task prevented 
> them from running in the same JVM. The tests were running sequentially, but 
> failed randomly in parallel
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java 11526a7 
>   itests/src/test/resources/testconfiguration.properties 7a70c9c 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreBeeLineDriver.java
>  0d63f5d 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 2abf252 
>   itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java ae5a349 
>   
> itests/util/src/main/java/org/apache/hive/beeline/qfile/QFileBeeLineClient.java
>  760fde6 
>   itests/util/src/main/java/org/apache/hive/beeline/qfile/package-info.java 
> fcd50ec 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 79955e9 
>   ql/src/test/results/clientpositive/beeline/drop_with_concurrency.q.out 
> 385f9b7 
>   ql/src/test/results/clientpositive/beeline/escape_comments.q.out abc0fee 
>   ql/src/test/results/clientpositive/beeline/smb_mapjoin_1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/beeline/smb_mapjoin_10.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/beeline/smb_mapjoin_11.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/beeline/smb_mapjoin_12.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/beeline/smb_mapjoin_13.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/beeline/smb_mapjoin_16.q.out 
> PRE-C

[jira] [Created] (HIVE-15997) Resource leaks when query is cancelled

2017-02-21 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15997:
---

 Summary: Resource leaks when query is cancelled 
 Key: HIVE-15997
 URL: https://issues.apache.org/jira/browse/HIVE-15997
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


There may be some resource leaks when a query is cancelled.
We see the following stacks in the log:
Possible file and folder leak:
{noformat}
2017-02-02 06:23:25,410 WARN  hive.ql.Context: [HiveServer2-Background-Pool: 
Thread-61]: Error Removing Scratch: java.io.IOException: Failed on local 
exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
host is: "ychencdh511t-1.vpc.cloudera.com/172.26.11.50"; destination host is: 
"ychencdh511t-1.vpc.cloudera.com":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy25.delete(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy26.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
at org.apache.hadoop.hive.ql.Context.removeScratchDir(Context.java:405)
at org.apache.hadoop.hive.ql.Context.clear(Context.java:541)
at org.apache.hadoop.hive.ql.Driver.releaseContext(Driver.java:2109)
at org.apache.hadoop.hive.ql.Driver.closeInProcess(Driver.java:2150)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1472)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
at 
org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at 
org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:714)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
at org.apache.hadoop.ipc.Client.call(Client.java:1448)
... 35 more

2017-02-02 12:26:52,706 INFO  
org.apache.hive.service.cli.operation.OperationManager: 
[HiveServer2-Background-Pool: Thread-23]: Operation is timed 
out,operation=OperationHandle [opType=EXECUTE_STATEMENT, 
getHandleIdenti

[jira] [Created] (HIVE-15735) In some cases, view objects inside a view do not have parents

2017-01-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15735:
---

 Summary: In some cases, view objects inside a view do not have 
parents
 Key: HIVE-15735
 URL: https://issues.apache.org/jira/browse/HIVE-15735
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


This causes Sentry to throw a "No valid privileges" error:
Error: Error while compiling statement: FAILED: SemanticException No valid 
privileges.
To reproduce:
Enable sentry:
create table t1( i int);
create view v1 as select * from t1;
create view v2 as select * from v1 union all select * from v1;
If the user does not have read permission on t1 and v1, the query
select * from v2;
will fail with:
Error: Error while compiling statement: FAILED: SemanticException No valid 
privileges
 User foo does not have privileges for QUERY
 The required privileges: 
Server=server1->Db=database2->Table=v1->action=select; (state=42000,code=4)
Sentry should not check v1's permission, because v1 has at least one parent (v2).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 55623: HIVE-15617: Improve the avg performance for Range based window

2017-01-18 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55623/#review162147
---


Ship it!





- Yongzhi Chen


On Jan. 17, 2017, 3:02 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55623/
> ---
> 
> (Updated Jan. 17, 2017, 3:02 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-15617: Improve the avg performance for Range based window
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
> 5ad5c0628f19dabf17191c08e0b14f8e2b1391e8 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java 
> f5f9f7bb8980636fa364001c5508c215b304b9eb 
> 
> Diff: https://reviews.apache.org/r/55623/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 55623: HIVE-15617: Improve the avg performance for Range based window

2017-01-18 Thread Yongzhi Chen


> On Jan. 17, 2017, 3:46 p.m., Yongzhi Chen wrote:
> >

Could you add a test case for avg where the range size is 0?


- Yongzhi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55623/#review161875
---


On Jan. 17, 2017, 3:02 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55623/
> ---
> 
> (Updated Jan. 17, 2017, 3:02 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-15617: Improve the avg performance for Range based window
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
> 5ad5c0628f19dabf17191c08e0b14f8e2b1391e8 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java 
> f5f9f7bb8980636fa364001c5508c215b304b9eb 
> 
> Diff: https://reviews.apache.org/r/55623/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 55623: HIVE-15617: Improve the avg performance for Range based window

2017-01-17 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55623/#review161875
---




ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java (line 132)
<https://reviews.apache.org/r/55623/#comment233154>

Is it possible that sum is not null while numRows == 0?
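
A small sketch of the guard this question points at, using a hypothetical 
WindowAverage helper (this is not the code in BasePartitionEvaluator). It 
returns null for an empty window instead of dividing by zero, which matches 
the SQL rule that AVG over no rows is NULL.

```java
// Hypothetical helper for illustration only; not the BasePartitionEvaluator code.
class WindowAverage {

    /** Returns sum / numRows, or null when the window holds no rows. */
    static Double average(Double sum, long numRows) {
        if (sum == null || numRows == 0) {
            return null; // empty window: AVG is NULL, and we avoid dividing by zero
        }
        return sum / numRows;
    }
}
```

Checking numRows as well as sum covers the case where a sum object exists 
even though no rows fell inside the range.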


- Yongzhi Chen


On Jan. 17, 2017, 3:02 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55623/
> ---
> 
> (Updated Jan. 17, 2017, 3:02 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-15617: Improve the avg performance for Range based window
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
> 5ad5c0628f19dabf17191c08e0b14f8e2b1391e8 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java 
> f5f9f7bb8980636fa364001c5508c215b304b9eb 
> 
> Diff: https://reviews.apache.org/r/55623/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 55479: Improve canceling response time for acquiring locks

2017-01-13 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55479/
---

(Updated Jan. 13, 2017, 5:14 p.m.)


Review request for hive, Aihua Xu and Chaoyu Tang.


Changes
---

New patch fixes the issues found in review


Bugs: HIVE-15572
https://issues.apache.org/jira/browse/HIVE-15572


Repository: hive-git


Description
---

1. Add data structure to pass driverstate
2. Check driver state when acquiring locks from ZooKeeper.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
fd6020b85591ea190aa33ae9f2dc925a38fc7471 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 
721974db03f1f29bdb84f41db317e37a6a78ca32 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 
45ead16560ce7514a1ab6f4ac2de6771582a8a73 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 
24fbd9af5fb7be6b238c6ed246e360477d3c47de 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 
20e114776f143715d5820e6a1acb794a9d6de02c 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockManager.java 
b2eb99775c220e9ce347fa1cb918ebf4e738eac2 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 
ce220a21de01a188da940e4511ee6876d0c15a4a 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManagerImpl.java 
ed022d9193f14436ed527f9cbd3df45d48857cf4 
  
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
 14d0ef4e27e0518c1bafcbdcde12f09e101a3321 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java 
e189d383b6d090ce151b6ab30fb240c261430239 

Diff: https://reviews.apache.org/r/55479/diff/


Testing
---

Unit test
Manual test


Thanks,

Yongzhi Chen



Re: Review Request 55479: Improve canceling response time for acquiring locks

2017-01-13 Thread Yongzhi Chen


> On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote:
> >

The DriverState enum and the lock sometimes work with outside code, so they 
cannot be totally encapsulated.


> On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 205
> > <https://reviews.apache.org/r/55479/diff/1/?file=1604041#file1604041line205>
> >
> > I wonder if it might look cleaner if we have an inner class called 
> > DriverState similar to this LockedDriverState, with all driver state related 
> > stuff, such as the enum and the lock, encapsulated in this class. It provides the 
> > methods for state transition etc.

The lock and state sometimes work with outside code, so they cannot be fully 
encapsulated.


> On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java,
> >  line 189
> > <https://reviews.apache.org/r/55479/diff/1/?file=1604049#file1604049line189>
> >
> > as I commented before, the DriverState class might provide this method 
> > for inspecting this state, which looks better.

This is a simplified condition check to minimize lock time. The strict check 
would be:
lock()
if state is not interrupted
  acquire lock from zookeeper
unlock()

Also, private boolean isInterrupted() in Driver.java is different from this one. 
In Driver.java it interrupts the current thread; here we do not.
So if we encapsulate the method, we lose that flexibility.
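
To make the trade-off concrete, here is a minimal sketch of both variants. All 
names (DriverStateCheck, LockedState, acquireStrict, acquireRelaxed) are 
hypothetical, and the ZooKeeper call is replaced by a BooleanSupplier 
stand-in; this is not the code in the patch.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical names for illustration only; not the classes in the patch.
class DriverStateCheck {

    enum State { RUNNING, INTERRUPTED }

    static final class LockedState {
        final ReentrantLock stateLock = new ReentrantLock();
        State state = State.RUNNING; // read and written only under stateLock
    }

    /** Strict: the state lock is held for the whole (possibly slow) acquisition. */
    static boolean acquireStrict(LockedState s, BooleanSupplier acquireFromZooKeeper) {
        s.stateLock.lock();
        try {
            if (s.state == State.INTERRUPTED) {
                return false;
            }
            return acquireFromZooKeeper.getAsBoolean();
        } finally {
            s.stateLock.unlock();
        }
    }

    /** Relaxed: the state lock is held only long enough to read the state. */
    static boolean acquireRelaxed(LockedState s, BooleanSupplier acquireFromZooKeeper) {
        boolean interrupted;
        s.stateLock.lock();
        try {
            interrupted = (s.state == State.INTERRUPTED);
        } finally {
            s.stateLock.unlock();
        }
        // The state may change between the check and the call below; that is
        // the small race accepted in exchange for a short lock hold time.
        return !interrupted && acquireFromZooKeeper.getAsBoolean();
    }
}
```

In the strict variant a cancel request that needs the state lock would have to 
wait until the acquisition returns, which is the delay the relaxed check is 
meant to avoid.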


> On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java, line 184
> > <https://reviews.apache.org/r/55479/diff/1/?file=1604044#file1604044line184>
> >
> > Race condition here:
> > if hiveLocks == null was caused by the interruption, but when the code 
> > executes this step, the state was just changed to be interrupted, then the 
> > exception msg will not be right.

I thought about this. In our code we tolerate a small race condition to 
improve performance. The worst case is that the error message becomes:
"Locks on the underlying objects cannot be acquired." Otherwise, the lock for 
driverstate would have to be held for the whole acquirelock method.


> On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 1134
> > <https://reviews.apache.org/r/55479/diff/1/?file=1604041#file1604041line1134>
> >
> > nit: need a space between userFromUGI,lDrvState

The new patch will fix the issue.


> On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java, line 454
> > <https://reviews.apache.org/r/55479/diff/1/?file=1604042#file1604042line454>
> >
> > should be "Query was cancelled while acquiring locks on the underlying 
> > objects."?

The new patch will fix the issue.


- Yongzhi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55479/#review161468
---


On Jan. 12, 2017, 11:21 p.m., Yongzhi Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55479/
> ---
> 
> (Updated Jan. 12, 2017, 11:21 p.m.)
> 
> 
> Review request for hive, Aihua Xu and Chaoyu Tang.
> 
> 
> Bugs: HIVE-15572
> https://issues.apache.org/jira/browse/HIVE-15572
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Add data structure to pass driverstate
> 2. Check driver state when acquiring locks from ZooKeeper.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> fd6020b85591ea190aa33ae9f2dc925a38fc7471 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 
> 721974db03f1f29bdb84f41db317e37a6a78ca32 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 
> 45ead16560ce7514a1ab6f4ac2de6771582a8a73 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 
> 24fbd9af5fb7be6b238c6ed246e360477d3c47de 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 
> 20e114776f143715d5820e6a1acb794a9d6de02c 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockManager.java 
> b2eb99775c220e9ce347fa1cb918ebf4e738eac2 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 
> ce220a21de01a188da940e4511ee6876d0c15a4a 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManagerImpl.java 
> ed022d9193f14436ed527f9cbd3df45d48857cf4 
>   
> ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
>  14d0ef4e27e0518c1bafcbdcde12f09e101a3321 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java 
> e189d383b6d090ce151b6ab30fb240c261430239 
> 
> Diff: https://reviews.apache.org/r/55479/diff/
> 
> 
> Testing
> ---
> 
> Unit test
> Manual test
> 
> 
> Thanks,
> 
> Yongzhi Chen
> 
>



[jira] [Created] (HIVE-15615) Fix unit tests failures cause by HIVE-13696

2017-01-13 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15615:
---

 Summary: Fix unit tests failures cause by HIVE-13696
 Key: HIVE-15615
 URL: https://issues.apache.org/jira/browse/HIVE-15615
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The following unit tests failed with the same stack:
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters
{noformat}
2017-01-11T15:02:27,774 ERROR [main] ql.Driver: FAILED: NullPointerException 
null
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.cleanName(QueuePlacementRule.java:351)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$User.getQueueForApp(QueuePlacementRule.java:132)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
at 
org.apache.hadoop.hive.schshim.FairSchedulerShim.setJobQueueForUserInternal(FairSchedulerShim.java:96)
at 
org.apache.hadoop.hive.schshim.FairSchedulerShim.validateQueueConfiguration(FairSchedulerShim.java:82)
at 
org.apache.hadoop.hive.ql.session.YarnFairScheduling.validateYarnQueue(YarnFairScheduling.java:68)
at org.apache.hadoop.hive.ql.Driver.configureScheduling(Driver.java:671)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:543)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1223)
{noformat}





Review Request 55479: Improve canceling response time for acquiring locks

2017-01-12 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55479/
---

Review request for hive, Aihua Xu and Chaoyu Tang.


Bugs: HIVE-15572
https://issues.apache.org/jira/browse/HIVE-15572


Repository: hive-git


Description
---

1. Add data structure to pass driverstate
2. Check driver state when acquiring locks from ZooKeeper.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
fd6020b85591ea190aa33ae9f2dc925a38fc7471 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 
721974db03f1f29bdb84f41db317e37a6a78ca32 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 
45ead16560ce7514a1ab6f4ac2de6771582a8a73 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 
24fbd9af5fb7be6b238c6ed246e360477d3c47de 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 
20e114776f143715d5820e6a1acb794a9d6de02c 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockManager.java 
b2eb99775c220e9ce347fa1cb918ebf4e738eac2 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 
ce220a21de01a188da940e4511ee6876d0c15a4a 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManagerImpl.java 
ed022d9193f14436ed527f9cbd3df45d48857cf4 
  
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
 14d0ef4e27e0518c1bafcbdcde12f09e101a3321 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java 
e189d383b6d090ce151b6ab30fb240c261430239 

Diff: https://reviews.apache.org/r/55479/diff/


Testing
---

Unit test
Manual test


Thanks,

Yongzhi Chen



[jira] [Created] (HIVE-15572) Improve the response time for query canceling when it happens during acquiring locks

2017-01-10 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15572:
---

 Summary: Improve the response time for query canceling when it 
happens during acquiring locks
 Key: HIVE-15572
 URL: https://issues.apache.org/jira/browse/HIVE-15572
 Project: Hive
  Issue Type: Improvement
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When a query canceling command is sent while Hive is acquiring locks (from 
ZooKeeper), Hive will finish acquiring all the locks and then release them. As 
shown in the following log, it took 165 s to finish acquiring the locks, then 
another 81 s to release them.
We can improve the performance by not acquiring any more locks and releasing 
held locks when the query canceling command is received. 

Background-Pool: Thread-224]: 
2017-01-03 10:50:35,413 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-224]: 
2017-01-03 10:51:00,671 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-218]: 
2017-01-03 10:51:00,672 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-218]: 
2017-01-03 10:51:00,672 ERROR org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-218]: FAILED: query select count(*) from 
manyparttbl has been cancelled
2017-01-03 10:51:00,673 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-218]: 
2017-01-03 10:51:40,755 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-215]: 
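
A minimal sketch of the proposed behaviour, with hypothetical names 
(CancellableLockAcquisition, acquireAll) and plain strings standing in for 
ZooKeeper locks; the actual change may be structured differently. The cancel 
flag is checked before each lock, and locks already held are released as soon 
as a cancel is seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical names for illustration only; strings stand in for real locks.
class CancellableLockAcquisition {

    /**
     * Acquires the locks in order, checking for cancellation before each one.
     * Returns the held locks, or an empty list if the query was cancelled.
     */
    static List<String> acquireAll(List<String> lockNames, BooleanSupplier isCancelled) {
        List<String> held = new ArrayList<>();
        for (String name : lockNames) {
            if (isCancelled.getAsBoolean()) {
                held.clear(); // stand-in for releasing every lock acquired so far
                return held;
            }
            held.add(name); // stand-in for one (possibly slow) lock acquisition
        }
        return held;
    }
}
```

With many partitions, the time between a cancel request and the query 
stopping is then bounded by one lock acquisition plus the release of the locks 
held so far, rather than by the full acquire-then-release cycle.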






Re: Invitation for Hive committers to become ORC committers

2016-12-16 Thread Yongzhi Chen
Hi Owen,
I am interested.

Thanks

Yongzhi Chen

On Thu, Dec 15, 2016 at 4:12 PM, Owen O'Malley <omal...@apache.org> wrote:

> All,
>As you are aware, we are in the last stages of removing the forked ORC
> code out of Hive. The goal of moving ORC out of Hive was to increase its
> community and we want to be very deliberately inclusive of the Hive
> development community. Towards that end, the ORC PMC wants to welcome
> anyone who is already a Hive committer to become a committer on ORC.
>
>   Please respond on this thread to let us know if you are interested.
>
> Thanks,
>Owen on behalf of the ORC PMC
>


[jira] [Created] (HIVE-15437) avro tables join fails when - tbl join tbl_postfix

2016-12-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15437:
---

 Summary: avro tables join fails when - tbl join tbl_postfix
 Key: HIVE-15437
 URL: https://issues.apache.org/jira/browse/HIVE-15437
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The following queries return good results:
select * from table1 where col1=key1; 
select * from table1_1 where col1=key1; 
When the two tables are joined, the query gets the following error:
{noformat}
Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found long, 
expecting union
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:43)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229)
 ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:141)
 ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
{noformat}

Both Avro tables are defined using an Avro schema, and the first table's 
name is a prefix of the second table's name. 





[jira] [Created] (HIVE-15391) Location validation for table should ignore the values for view.

2016-12-08 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15391:
---

 Summary: Location validation for table should ignore the values 
for view.
 Key: HIVE-15391
 URL: https://issues.apache.org/jira/browse/HIVE-15391
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 2.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor


When using schematool to do location validation, we get error messages for 
views, for example:
{noformat}
In DB with Name: viewa
NULL Location for TABLE with Name: viewa
In DB with Name: viewa
NULL Location for TABLE with Name: viewb
In DB with Name: viewa
{noformat}





[jira] [Created] (HIVE-15359) skip.footer.line.count doesnt work properly for certain situations

2016-12-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15359:
---

 Summary: skip.footer.line.count doesnt work properly for certain 
situations
 Key: HIVE-15359
 URL: https://issues.apache.org/jira/browse/HIVE-15359
 Project: Hive
  Issue Type: Bug
  Components: Reader
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The reproduction of this issue is very similar to HIVE-12718, but the data file 
is larger than 128M. In this case, even when making sure only one mapper is 
used, the footer is still wrongly skipped. 






[jira] [Created] (HIVE-15320) Cross Realm hive query is failing with KERBEROS authentication error

2016-11-30 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15320:
---

 Summary: Cross Realm hive query is failing with KERBEROS 
authentication error
 Key: HIVE-15320
 URL: https://issues.apache.org/jira/browse/HIVE-15320
 Project: Hive
  Issue Type: Improvement
  Components: Security
Reporter: Yongzhi Chen


Executing a cross-realm query fails.
Authentication against the remote NN is tried with SIMPLE, not KERBEROS.
It looks like Hive does not obtain the needed ticket for the remote NN.

insert overwrite directory 'hdfs://differentrealmhost:8020/hive/test' select * 
from currentrealmtable where ...;
It will fail with
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client 
cannot authenticate via:[TOKEN, KERBEROS]

The hdfs command distcp works fine. 





Re: Review Request 53966: HIVE-15199: INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53966/#review156659
---



The latest patch solves all the issues Illya Yalovyy pointed out; the fix looks 
good. 
+1

- Yongzhi Chen


On Nov. 22, 2016, 10:35 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53966/
> ---
> 
> (Updated Nov. 22, 2016, 10:35 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15199
> https://issues.apache.org/jira/browse/HIVE-15199
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The patch helps execute repeated INSERT INTO statements on S3 tables when the 
> scratch directory is on S3.
> 
> 
> Diffs
> -
> 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into.q 
> 919ff7d9c7cb40062d68b876d6acbc8efb8a8cf1 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into.q.out 
> c25d0c4eec6983b6869e2eba711b39ba91a4c6e0 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 61b8bd0ac40cffcd6dca0fc874940066bc0aeffe 
> 
> Diff: https://reviews.apache.org/r/53966/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>



Re: Review Request 53966: HIVE-15199: INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53966/#review156644
---


Ship it!





- Yongzhi Chen


On Nov. 21, 2016, 11:54 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53966/
> ---
> 
> (Updated Nov. 21, 2016, 11:54 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15199
> https://issues.apache.org/jira/browse/HIVE-15199
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The patch helps execute repeated INSERT INTO statements on S3 tables when the 
> scratch directory is on S3.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 
> 1d8c04160c35e48781b20f8e6e14760c19df9ca5 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into.q 
> 919ff7d9c7cb40062d68b876d6acbc8efb8a8cf1 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into.q.out 
> c25d0c4eec6983b6869e2eba711b39ba91a4c6e0 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 61b8bd0ac40cffcd6dca0fc874940066bc0aeffe 
> 
> Diff: https://reviews.apache.org/r/53966/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>



Re: Review Request 53966: HIVE-15199: INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53966/#review156639
---




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 2951)
<https://reviews.apache.org/r/53966/#comment226831>

When (isBlobStoragePath && !destFs.exists(destFilePath)) holds,
the second condition,
!destFs.rename(sourcePath, destFilePath), will still be evaluated. I assume you do 
not want that to be called, right?
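
The question above turns on which operands of a compound condition actually run. The following is a minimal sketch of Java's short-circuit rules only; the exact condition in the patch is not reproduced here, and rename() is a hypothetical stand-in for destFs.rename(sourcePath, destFilePath).

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ShortCircuitSketch {
    // Counts how often the hypothetical rename stand-in is invoked.
    static final AtomicInteger renameCalls = new AtomicInteger();

    // Hypothetical stand-in for destFs.rename(sourcePath, destFilePath).
    static boolean rename() {
        renameCalls.incrementAndGet();
        return true;
    }

    // With ||, the right operand runs only when the left operand is false.
    static boolean orGuard(boolean left) {
        return left || !rename();
    }

    // With &&, the right operand runs only when the left operand is true.
    static boolean andGuard(boolean left) {
        return left && !rename();
    }
}
```

Whether rename runs therefore depends on how the blob-storage check is combined with it, which is what the review comment asks the author to confirm.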


- Yongzhi Chen





[jira] [Created] (HIVE-15074) Schematool provides a way to detect invalid entries in VERSION table

2016-10-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15074:
---

 Summary: Schematool provides a way to detect invalid entries in 
VERSION table
 Key: HIVE-15074
 URL: https://issues.apache.org/jira/browse/HIVE-15074
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Yongzhi Chen
Priority: Minor


For some unknown reason, a customer's HMS cannot start because there are 
multiple entries in its HMS VERSION table. Schematool should provide a way to 
validate the HMS DB, warn about this kind of issue, and offer options to fix 
it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15073) Schematool should detect malformed URIs

2016-10-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15073:
---

 Summary: Schematool should detect malformed URIs
 Key: HIVE-15073
 URL: https://issues.apache.org/jira/browse/HIVE-15073
 Project: Hive
  Issue Type: Improvement
Reporter: Yongzhi Chen


For various causes (mostly unknown), HMS DB tables sometimes have invalid entries, for 
example a URI missing its scheme in the SDS table's LOCATION column or the DBS table's 
DB_LOCATION_URI column. These malformed URIs lead to hard-to-analyze errors in 
Hive and Sentry. Schematool needs to provide a command to detect these malformed 
URIs, give a warning, and provide an option to fix them.
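
As an illustration of the kind of check requested, here is a minimal sketch that flags a location string with no URI scheme. LocationUriCheck is a hypothetical helper, not Schematool code, and it assumes the column values have already been read into strings.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LocationUriCheck {
    // Returns true when the location parses as a URI and carries a scheme
    // such as hdfs or s3a; returns false for null, empty, or unparsable input.
    static boolean hasScheme(String location) {
        if (location == null || location.isEmpty()) {
            return false;
        }
        try {
            return new URI(location).getScheme() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

A validation command could run such a check over every LOCATION and DB_LOCATION_URI value and report the rows for which it returns false.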





[jira] [Created] (HIVE-15072) Schematool should recognize missing tables in metastore

2016-10-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15072:
---

 Summary: Schematool should recognize missing tables in metastore
 Key: HIVE-15072
 URL: https://issues.apache.org/jira/browse/HIVE-15072
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Yongzhi Chen


When installing a new database fails halfway (for some other reason), not all of 
the metastore tables are installed. This causes the HMS server to fail to start up 
due to missing tables. Re-running Schematool succeeds, and the 
stdout log says: "Database already has tables. Skipping table creation".
However, restarting HMS produces the same error reporting missing tables.
Schematool should detect missing tables and provide an option to go ahead and 
recreate them in the case of a new installation.
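
The detection step could be sketched as a set difference between the tables the schema expects and the tables actually present. MissingTablesCheck is a hypothetical helper; how the two name sets are obtained (schema script, JDBC metadata) is left out and would be an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

final class MissingTablesCheck {
    // Returns the expected table names that are absent from the actual set,
    // comparing names case-insensitively.
    static Set<String> missingTables(Set<String> expected, Set<String> actual) {
        Set<String> missing = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        missing.addAll(expected);
        Set<String> present = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        present.addAll(actual);
        missing.removeAll(present);
        return missing;
    }
}
```

A non-empty result would be the signal to warn instead of printing "Database already has tables".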





Re: Review Request 52835: HIVE-14926: Keep Schema in consistent state where schemaTool fails or succeeds

2016-10-14 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52835/#review152674
---




beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java (line 231)
<https://reviews.apache.org/r/52835/#comment221767>

Is it possible for a command to span more than one line?


- Yongzhi Chen


On Oct. 13, 2016, 8:43 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52835/
> ---
> 
> (Updated Oct. 13, 2016, 8:43 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-14926: Keep Schema in consistent state where schemaTool fails or succeeds
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java 181f0d2 
>   beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java cd36ddf 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestSchemaTool.java 
> 0d5f9c8 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
> 9c30ee7 
> 
> Diff: https://reviews.apache.org/r/52835/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 52559: HIVE-14799: Query operation are not thread safe during its cancellation

2016-10-14 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52559/#review152659
---



The 8th version looks good to me. +1

- Yongzhi Chen


On Oct. 13, 2016, 11:38 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52559/
> ---
> 
> (Updated Oct. 13, 2016, 11:38 p.m.)
> 
> 
> Review request for hive, Sergey Shelukhin, Thejas Nair, Vaibhav Gumashta, and 
> Yongzhi Chen.
> 
> 
> Bugs: HIVE-14799
> https://issues.apache.org/jira/browse/HIVE-14799
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch is going to fix a couple of Driver issues related to the close 
> request from a thread other than the one running the query (e.g. from 
> SQLOperation cancel via Timeout or Ctrl-C):
> 1. Driver is not thread safe and usually supports only one thread at a time 
> since it has variables like ctx and plan which are not thread protected. But 
> certain special use cases need to access the Driver objects from multiple 
> threads. For example, when a query runs in a background thread, driver.close 
> is invoked in another thread by the query timeout (see HIVE-4924). The close 
> process could nullify the shared variables like ctx, which could cause an NPE in 
> the other query thread which is using them. This runtime exception is 
> unpredictable and not well handled in the code. Some resources (e.g. locks, 
> files) are left behind and not cleaned up because there are no more available 
> references to them. In this patch, I use waiting in the close, which 
> makes sure only one thread uses these variables and the resource cleaning 
> happens after the query has finished (or been interrupted).
> 2. SQLOperation.cancel sends the interrupt signal to the background thread 
> running the query (via backgroundHandle.cancel(true)) but it could not stop 
> that process since there is no code to capture the signal in the process. In 
> other words, the current timeout code could not gracefully and promptly stop the 
> query process, though it could eventually stop the process by killing the 
> running tasks (e.g. MapRedTask) via driverContext.shutdown (see HIVE-5901). 
> So in the patch, I added a couple of checkpoints to intercept the interrupt 
> signal set either by the close method (a volatile variable) or by thread.interrupt. 
> They should help capture these signals earlier, though not 
> immediately.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java dd55434 
> 
> Diff: https://reviews.apache.org/r/52559/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>
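
The checkpoint idea in the quoted description (a volatile flag set by close, plus the thread interrupt status) can be sketched roughly as follows. CancellableWork and its method names are hypothetical; this is not the Driver code, and the waiting done by close is not shown.

```java
final class CancellableWork {
    // Set by the closing thread; volatile so the worker thread sees the update.
    private volatile boolean closeRequested = false;

    void requestClose() {
        closeRequested = true;
    }

    // Checkpoint: true when either cancellation signal has been raised.
    private boolean shouldStop() {
        return closeRequested || Thread.currentThread().isInterrupted();
    }

    // Runs up to maxSteps units of work and returns how many completed,
    // checking for cancellation before each unit.
    int run(int maxSteps) {
        int done = 0;
        for (int i = 0; i < maxSteps; i++) {
            if (shouldStop()) {
                break;
            }
            done++;
        }
        return done;
    }
}
```

Checking between units of work stops the query sooner than waiting for task shutdown, but only as often as the checkpoints are placed.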



Re: Review Request 52559: HIVE-14799: Query operation are not thread safe during its cancellation

2016-10-13 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52559/#review152556
---



This is not thread safe, because releaseDriverContext can be called from cancel while in 
compiling mode, but it seems only the null value matters. So could you use a local 
variable (driverCxt) to avoid an NPE after close() is called from cancel?
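
A minimal sketch of the local-variable suggestion, assuming a shared field that another thread may set to null. LocalSnapshotSketch and DriverContextStub are hypothetical stand-ins, not the actual Driver classes.

```java
final class LocalSnapshotSketch {
    // Hypothetical stand-in for the real driver context type.
    static final class DriverContextStub {
        private boolean shutdown;

        void shutdown() {
            shutdown = true;
        }

        boolean isShutdown() {
            return shutdown;
        }
    }

    // Shared field that close() may clear from another thread.
    private volatile DriverContextStub driverCxt = new DriverContextStub();

    // Called from the cancelling thread; clears the shared field.
    void close() {
        driverCxt = null;
    }

    // Reads the shared field once into a local, so a concurrent close()
    // cannot null it between the check and the use.
    boolean releaseDriverContext() {
        DriverContextStub local = driverCxt;
        if (local == null) {
            return false;
        }
        local.shutdown();
        return true;
    }
}
```

The local copy only protects against the NPE; it does not make the rest of the object's state thread safe.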

- Yongzhi Chen


On Oct. 12, 2016, 4:31 a.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52559/
> ---
> 
> (Updated Oct. 12, 2016, 4:31 a.m.)
> 
> 
> Review request for hive, Sergey Shelukhin, Thejas Nair, Vaibhav Gumashta, and 
> Yongzhi Chen.
> 
> 
> Bugs: HIVE-14799
> https://issues.apache.org/jira/browse/HIVE-14799
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch is going to fix a couple of Driver issues related to the close 
> request from a thread other than the one running the query (e.g. from 
> SQLOperation cancel via Timeout or Ctrl-C):
> 1. Driver is not thread safe and usually supports only one thread at a time 
> since it has variables like ctx and plan which are not thread protected. But 
> certain special use cases need to access the Driver objects from multiple 
> threads. For example, when a query runs in a background thread, driver.close 
> is invoked in another thread by the query timeout (see HIVE-4924). The close 
> process could nullify the shared variables like ctx, which could cause an NPE in 
> the other query thread which is using them. This runtime exception is 
> unpredictable and not well handled in the code. Some resources (e.g. locks, 
> files) are left behind and not cleaned up because there are no more available 
> references to them. In this patch, I use waiting in the close, which 
> makes sure only one thread uses these variables and the resource cleaning 
> happens after the query has finished (or been interrupted).
> 2. SQLOperation.cancel sends the interrupt signal to the background thread 
> running the query (via backgroundHandle.cancel(true)) but it could not stop 
> that process since there is no code to capture the signal in the process. In 
> other words, the current timeout code could not gracefully and promptly stop the 
> query process, though it could eventually stop the process by killing the 
> running tasks (e.g. MapRedTask) via driverContext.shutdown (see HIVE-5901). 
> So in the patch, I added a couple of checkpoints to intercept the interrupt 
> signal set either by the close method (a volatile variable) or by thread.interrupt. 
> They should help capture these signals earlier, though not 
> immediately.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java dd55434 
> 
> Diff: https://reviews.apache.org/r/52559/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 50525: HIVE-14341: Altered skewed location is not respected for list bucketing

2016-09-20 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50525/#review149640
---




ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java (line 234)
<https://reviews.apache.org/r/50525/#comment217314>

Any reason you changed the logic from replace (overwrite) to something 
like insert into?


- Yongzhi Chen


On Sept. 19, 2016, 9:02 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50525/
> ---
> 
> (Updated Sept. 19, 2016, 9:02 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-14341: Altered skewed location is not respected for list bucketing
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java e386717 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java da46854 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
>  ba4f6a7 
>   ql/src/test/queries/clientpositive/create_alter_list_bucketing_table1.q 
> bf89e8f 
>   ql/src/test/results/clientpositive/create_alter_list_bucketing_table1.q.out 
> 216d3be 
> 
> Diff: https://reviews.apache.org/r/50525/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 50525: HIVE-14341: Altered skewed location is not respected for list bucketing

2016-09-20 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50525/#review149635
---




ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java (line 899)
<https://reviews.apache.org/r/50525/#comment217309>

You changed the old logic here a little in the following case:
locationMap contains skewedValsCandidate, but
allSkewedVals.contains(skewedValsCandidate) == false.

Before your change, defaultKey was used to look up locationMap; after the 
change, skewedValsCandidate is used. 
Is that safe?
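
The two behaviours contrasted above can be sketched as follows. This is a reading of the comment only: SkewedLocationLookup, both method bodies, and the parameter types are assumptions, not the FileSinkOperator code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SkewedLocationLookup {
    // Old behaviour as described in the comment: fall back to the default key
    // unless the candidate is one of the declared skewed values.
    static String before(Map<List<String>, String> locationMap,
                         Set<List<String>> allSkewedVals,
                         List<String> candidate,
                         List<String> defaultKey) {
        List<String> key = allSkewedVals.contains(candidate) ? candidate : defaultKey;
        return locationMap.get(key);
    }

    // New behaviour as described in the comment: use the candidate whenever
    // the location map has an entry for it.
    static String after(Map<List<String>, String> locationMap,
                        List<String> candidate,
                        List<String> defaultKey) {
        List<String> key = locationMap.containsKey(candidate) ? candidate : defaultKey;
        return locationMap.get(key);
    }
}
```

The two methods differ exactly in the case raised: the map has an entry for the candidate, but the candidate is not a declared skewed value.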


- Yongzhi Chen





[jira] [Created] (HIVE-14743) ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs

2016-09-13 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14743:
---

 Summary: ArrayIndexOutOfBoundsException - HBASE-backed views' 
query with JOINs
 Key: HIVE-14743
 URL: https://issues.apache.org/jira/browse/HIVE-14743
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 1.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The stack:
{noformat}
2016-09-13T09:38:49,972 ERROR [186b4545-65b5-4bfc-bc8e-3e14e251bb12 main] exec.Task: Job Submission failed with exception 'java.lang.ArrayIndexOutOfBoundsException(1)'
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.createFilterScan(HiveHBaseTableInputFormat.java:224)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplitsInternal(HiveHBaseTableInputFormat.java:492)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:449)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:466)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:356)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:546)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)

{noformat}

Repro:
{noformat}
CREATE TABLE HBASE_TABLE_TEST_1(
  cvalue string ,
  pk string,
 ccount int   )
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'='cf:val,:key,cf2:count',
  'hbase.scan.cache'='500',
  'hbase.scan.cacheblocks'='false',
  'serialization.format'='1')
TBLPROPERTIES (
  'hbase.table.name'='hbase_table_test_1',
  'serialization.null.format'=''  );


  CREATE VIEW VIEW_HBASE_TABLE_TEST_1 AS SELECT 
hbase_table_test_1.cvalue,hbase_table_test_1.pk,hbase_table_test_1.ccount FROM 
hbase_table_test_1 WHERE hbase_table_test_1.ccount IS NOT NULL;

CREATE TABLE HBASE_TABLE_TEST_2(
  cvalue string ,
pk string ,
   ccount int  )
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'='cf:val,:key,cf2:count',
  'hbase.scan.cache'='500',
  'hbase.scan.cacheblocks'='false',
  'serialization.format'='1')
TBLPROPERTIES (
  'hbase.table.name'='hbase_table_test_2',
  'serialization.null.format'='');


CREATE VIEW VIEW_HBASE_TABLE_TEST_2 AS SELECT 
hbase_table_test_2.cvalue,hbase_table_test_2.pk,hbase_table_test_2.ccount FROM 
hbase_table_test_2 WHERE  hbase_table_test_2.pk >='3-h-0' AND 
hbase_table_test_2.pk <= '3-h-g' AND hbase_table_test_2.ccount IS NOT NULL;

set hive.auto.convert.join=false;

  SELECT  p.cvalue cvalue
FROM `VIEW_HBASE_TABLE_TEST_1` `p`
LEFT OUTER JOIN `VIEW_HBASE_TABLE_TEST_2` `A1`
ON `p`.cvalue = `A1`.cvalue
LEFT OUTER JOIN `VIEW_HBASE_TABLE_TEST_1` `A2`
ON `p`.cvalue = `A2`.cvalue;

{noformat}








[jira] [Created] (HIVE-14715) Hive throws NumberFormatException with query with Null value

2016-09-07 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14715:
---

 Summary: Hive throws NumberFormatException with query with Null 
value
 Key: HIVE-14715
 URL: https://issues.apache.org/jira/browse/HIVE-14715
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen


The following repro throws java.lang.NumberFormatException:
set hive.cbo.enable=false;
CREATE TABLE `paqtest`(
`c1` int,
`s1` string,
`s2` string,
`bn1` bigint)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

insert into paqtest values (58, '', 'ABC', 0);

SELECT
'Pricing mismatch' AS category,
c1,
NULL AS itemtype_used,
NULL AS acq_itemtype,
s2,
NULL AS currency_used_avg,
NULL AS acq_items_avg,
sum(bn1) AS cca
FROM paqtest
WHERE (s1 IS NULL OR length(s1) = 0)
GROUP BY 'Pricing mismatch', c1, NULL, NULL, s2, NULL, NULL;

The stack looks like the following:
java.lang.NumberFormatException: ABC
GroupByOperator.process(Object, int) line: 773
ExecReducer.reduce(Object, Iterator, OutputCollector, Reporter) line: 236
ReduceTask.runOldReducer(JobConf, TaskUmbilicalProtocol, TaskReporter, RawKeyValueIterator, RawComparator, Class, Class) line: 444
ReduceTask.run(JobConf, TaskUmbilicalProtocol) line: 392
LocalJobRunner$Job$ReduceTaskRunnable.run() line: 319
Executors$RunnableAdapter.call() line: 471

It works fine when hive.cbo.enable = true.






[jira] [Created] (HIVE-14596) Canceling hive query takes very long time

2016-08-22 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14596:
---

 Summary: Canceling hive query takes very long time
 Key: HIVE-14596
 URL: https://issues.apache.org/jira/browse/HIVE-14596
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen


When a Hue user clicks cancel, the Hive query does not stop immediately; it 
can take a very long time. In the YARN job history you will see exceptions 
like the following:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive/hive/80a5cfdb-9f98-44d2-ae53-332c8dae62a3/hive_2016-08-20_07-06-12_819_8780093905859269639-3/-mr-1/.hive-staging_hive_2016-08-20_07-06-12_819_8780093905859269639-3/_task_tmp.-ext-10001/_tmp.00_0 (inode 28224): File does not exist. Holder DFSClient_attempt_1471630445417_0034_m_00_0_-50732711_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3624)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3283)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.abortWriters(FileSinkOperator.java:246)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1007)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:206)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{noformat}





[jira] [Created] (HIVE-14538) beeline throws exceptions with parsing hive config when using !sh statement

2016-08-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14538:
---

 Summary: beeline throws exceptions with parsing hive config when 
using !sh statement
 Key: HIVE-14538
 URL: https://issues.apache.org/jira/browse/HIVE-14538
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When beeline has a connection to a server, in some environments it has the following problem:
{noformat}
0: jdbc:hive2://localhost> !verbose
verbose: on
0: jdbc:hive2://localhost> !sh id
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
0: jdbc:hive2://localhost> !sh echo hello
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
0: jdbc:hive2://localhost>
{noformat}

It also breaks if no connection has been established:
{noformat}
beeline> !sh id
java.lang.NullPointerException
at org.apache.hive.beeline.BeeLine.createStatement(BeeLine.java:1897)
at org.apache.hive.beeline.Commands.getConfInternal(Commands.java:724)
at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:702)
at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

{noformat}
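
The ArrayIndexOutOfBoundsException: 1 raised from addConf would be consistent with indexing the second element of a split that produced only one element, though the report does not confirm that cause. Under that assumption, a minimal sketch of defensive key=value parsing; ConfLineParser and its input format are hypothetical and are not beeline's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConfLineParser {
    // Parses "key=value" lines into a map, skipping any line that has no '='
    // instead of indexing past the end of the split result.
    static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String line : lines) {
            if (line == null) {
                continue;
            }
            // Limit 2 keeps any further '=' characters inside the value.
            String[] kv = line.split("=", 2);
            if (kv.length < 2) {
                continue;
            }
            conf.put(kv[0].trim(), kv[1].trim());
        }
        return conf;
    }
}
```

The no-connection case reported above is a separate failure and would need its own null check before any statement is created.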





[jira] [Created] (HIVE-14519) Multi insert query bug

2016-08-11 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14519:
---

 Summary: Multi insert query bug
 Key: HIVE-14519
 URL: https://issues.apache.org/jira/browse/HIVE-14519
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When running multi-insert queries, if one of the queries returns no 
results, the other queries do not return the right results.
For example:
After the following query, there is no value in /tmp/emp/dir3/00_0
{noformat}
From (select * from src) a
insert overwrite directory '/tmp/emp/dir1/'
select key, value
insert overwrite directory '/tmp/emp/dir2/'
select 'header'
where 1=2
insert overwrite directory '/tmp/emp/dir3/'
select key, value 
where key = 100;
{noformat}

The where clause in the second insert should not affect the third insert. 
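
The expected semantics, where each insert branch filters the shared source on its own, can be sketched as follows. MultiInsertSketch is a hypothetical model of the intended behaviour over integer keys, not the optimizer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class MultiInsertSketch {
    // Applies each branch's filter to the same source rows independently and
    // returns one output list per branch, in branch order.
    static List<List<Integer>> run(List<Integer> source,
                                   List<Predicate<Integer>> branchFilters) {
        List<List<Integer>> outputs = new ArrayList<>();
        for (Predicate<Integer> filter : branchFilters) {
            List<Integer> out = new ArrayList<>();
            for (Integer row : source) {
                if (filter.test(row)) {
                    out.add(row);
                }
            }
            outputs.add(out);
        }
        return outputs;
    }
}
```

With filters equivalent to no predicate, 1=2, and key = 100, the second branch is empty while the third still receives the rows whose key is 100.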






[jira] [Created] (HIVE-14015) SMB MapJoin failed for Hive on Spark when kerberized

2016-06-14 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14015:
---

 Summary: SMB MapJoin failed for Hive on Spark when kerberized
 Key: HIVE-14015
 URL: https://issues.apache.org/jira/browse/HIVE-14015
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 2.0.0, 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


java.io.IOException: 
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
can be issued only with kerberos or web authentication

It can be reproduced as follows:
1) prepare sample data:
a=1
while [[ $a -lt 100 ]]; do echo $a ; let a=$a+1; done > data

2) prepare source hive table:
CREATE TABLE `s`(`c` string);
load data local inpath 'data' into table s;

3) prepare the bucketed table:
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
CREATE TABLE `t`(`c` string) CLUSTERED BY (c) SORTED BY (c) INTO 5 BUCKETS;
insert into t select * from s;

4) reproduce this issue:
SET hive.execution.engine=spark;
SET hive.auto.convert.sortmerge.join = true;
SET hive.auto.convert.sortmerge.join.bigtable.selection.policy = 
org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ;
SET hive.auto.convert.sortmerge.join.noconditionaltask = true;
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
select * from t join t t1 on t.c=t1.c;





[jira] [Created] (HIVE-13991) Union All on view fail with no valid permission on underneath table

2016-06-09 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13991:
---

 Summary: Union All on view fail with no valid permission on 
underneath table
 Key: HIVE-13991
 URL: https://issues.apache.org/jira/browse/HIVE-13991
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When Sentry is enabled, create a view:
create view V as select * from T;
When the user has read permission on view V but does not have read permission 
on table T,

select * from V union all select * from V 
fails with:
{noformat}
0: jdbc:hive2://> select * from s07view union all select * from s07view 
limit 1;
Error: Error while compiling statement: FAILED: SemanticException No valid 
privileges
 Required privileges for this query: 
Server=server1->Db=default->Table=sample_07->action=select; 
(state=42000,code=4)
{noformat} 





[jira] [Created] (HIVE-13932) Hive SMB Map Join with small set of LIMIT failed with NPE

2016-06-02 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13932:
---

 Summary: Hive SMB Map Join with small set of LIMIT failed with NPE
 Key: HIVE-13932
 URL: https://issues.apache.org/jira/browse/HIVE-13932
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0, 1.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


1) prepare sample data:
a=1
while [[ $a -lt 100 ]]; do echo $a ; let a=$a+1; done > data

2) prepare source hive table:
CREATE TABLE `s`(`c` string);
load data local inpath 'data' into table s;

3) prepare the bucketed table:
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
CREATE TABLE `t`(`c` string) CLUSTERED BY (c) SORTED BY (c) INTO 5 BUCKETS;
insert into t select * from s;

4) reproduce this issue:
SET hive.auto.convert.sortmerge.join = true;
SET hive.auto.convert.sortmerge.join.bigtable.selection.policy = 
org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ;
SET hive.auto.convert.sortmerge.join.noconditionaltask = true;
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
select * from t join t t1 on t.c=t1.c limit 1;





Re: Review Request 47787: HIVE-13453: Support ORDER BY and windowing clause in partitioning clause with distinct function

2016-05-25 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47787/#review134784
---




ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 
169)
<https://reviews.apache.org/r/47787/#comment199690>

How do you handle the countDistinct non-windowing case?


- Yongzhi Chen


On May 24, 2016, 6:51 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47787/
> ---
> 
> (Updated May 24, 2016, 6:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-13453: Support ORDER BY and windowing clause in partitioning clause with 
> distinct function
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
> 2f4a94c3796d3aff986eb638246248b75306183c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java 
> 3b54b4998c9efbf34bd9c5b08de55cd7062a0843 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java 
> 5ce72004e03bc19a38bd87ae70f38a0d35c20927 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowFunctionDef.java 
> ed6c67156b93d6f9e4b76fb76dfa28c5dee6fd0c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
> 3c1ce26b26646a6075b3a661816e8d1b50ffc78e 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java 
> 2825045890de1bcc414197ad3e06e723b9d212f3 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFParameterInfo.java
>  6a62d7cc324286ae9aee95d2d71a688859f8c03f 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 
> 7b1d6e545cdf35f3b2906621c7b0208bf0433731 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/SimpleGenericUDAFParameterInfo.java
>  1a1b570256afff46761daf4ebcf1da5e8f0e4f88 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 
> 858b47ad43fa751e23482e4cb58f77bb9fb16a27 
>   ql/src/test/queries/clientpositive/windowing_distinct.q 
> bb192a7882fda592b3d2ba09a10c2f899aa5e165 
>   ql/src/test/results/clientpositive/windowing_distinct.q.out 
> 074a59498ebebc9e78553f68f59dd00bb51f4792 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
>  c58e8ed05453c78cbe2e4daf0b7afa51adbc0ce9 
> 
> Diff: https://reviews.apache.org/r/47787/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 47040: Monitor changes to FairScheduler.xml file and automatically update / validate jobs submitted to fair-scheduler

2016-05-17 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47040/#review133592
---




ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 533)
<https://reviews.apache.org/r/47040/#comment198119>

This if statement duplicates the Precondition check. If you want to throw an 
exception, use only the Precondition; otherwise, use only the if statement. Using 
both checks the same condition twice.
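
As a minimal sketch of the two alternatives (not the code under review): the class, the method names, and the queue-name scenario below are hypothetical, and the local checkArgument helper only stands in for a Precondition-style utility.

```java
final class QueueChecks {
    // Hypothetical stand-in for a Precondition-style helper: throws when the condition is false.
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Option A: the caller must supply a value, so fail fast with a single precondition.
    static String requireQueue(String queue) {
        checkArgument(queue != null && !queue.isEmpty(), "queue name must be set");
        return queue.trim();
    }

    // Option B: a missing value is acceptable, so handle it with a single if statement.
    static String queueOrDefault(String queue, String fallback) {
        if (queue == null || queue.isEmpty()) {
            return fallback;
        }
        return queue.trim();
    }
}
```

Either form tests the condition exactly once; combining them in one method would evaluate the same condition twice.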


- Yongzhi Chen


On May 14, 2016, 5:51 p.m., Reuben Kuhnert wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47040/
> ---
> 
> (Updated May 14, 2016, 5:51 p.m.)
> 
> 
> Review request for hive, Lenni Kuff, Mohit Sabharwal, and Sergio Pena.
> 
> 
> Bugs: HIVE-13696
> https://issues.apache.org/jira/browse/HIVE-13696
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Ensure that jobs sent to YARN with impersonation off are correctly routed to 
> the proper queue based on fair-scheduler.xml. Monitor this file for changes 
> and validate that jobs can only be sent to queues authorized for the user.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 3fecc5c4ca2a06a031c0c4a711fb49e757c49062 
>   ql/src/java/org/apache/hadoop/hive/ql/session/YarnFairScheduling.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> a0015ebc655931f241b28c53fbb94cfe172841b1 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/SchedulerShim.java 
> 63803b8b0752745bd2fedaccc5d100befd97093b 
>   shims/scheduler/pom.xml b36c12325c588cdb609c6200b1edef73a2f79552 
>   
> shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerQueueAllocator.java
>  PRE-CREATION 
>   
> shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java
>  372244dc3c989d2a3ae2eb2bfb8cd0a235705e18 
>   
> shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/QueueAllocator.java
>  PRE-CREATION 
>   
> shims/scheduler/src/test/java/org/apache/hadoop/hive/schshim/TestFairSchedulerQueueAllocator.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/47040/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Reuben Kuhnert
> 
>



[jira] [Created] (HIVE-13632) Hive failing on insert empty array into parquet table

2016-04-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13632:
---

 Summary: Hive failing on insert empty array into parquet table
 Key: HIVE-13632
 URL: https://issues.apache.org/jira/browse/HIVE-13632
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The insert fails with the following stack:
{noformat}
by: parquet.io.ParquetEncodingException: empty fields are illegal, the field 
should be ommited completely instead
at 
parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:271)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$ListDataWriter.write(DataWritableWriter.java:271)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:199)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:215)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:88)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at 
parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
at 
parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:697)
{noformat}
Reproduce:
{noformat}
create table test_small (
key string,
arrayValues array)
stored as parquet;
insert into table test_small select 'abcd', array() from src limit 1;
{noformat}





[jira] [Created] (HIVE-13570) Some query with Union all fails when CBO is off

2016-04-20 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13570:
---

 Summary: Some query with Union all fails when CBO is off
 Key: HIVE-13570
 URL: https://issues.apache.org/jira/browse/HIVE-13570
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Some queries with UNION ALL throw IndexOutOfBoundsException
when:
set hive.cbo.enable=false;
set hive.ppd.remove.duplicatefilters=true;
The stack is:
{noformat}
java.lang.IndexOutOfBoundsException: Index: 67, Size: 67 
at java.util.ArrayList.rangeCheck(ArrayList.java:635) 
at java.util.ArrayList.get(ArrayList.java:411) 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.genColLists(ColumnPrunerProcCtx.java:161)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.handleFilterUnionChildren(ColumnPrunerProcCtx.java:273)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:108)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:172)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:135)
 
at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:198) 
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10327)
 
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
 
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432) 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) 
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119) 
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167) 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055) 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) 
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) 
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) 
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) 
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) 
at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) 
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) 
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) 
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) 
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) 
{noformat}





[jira] [Created] (HIVE-13200) Aggregation functions returning empty rows on partitioned columns

2016-03-03 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13200:
---

 Summary: Aggregation functions returning empty rows on partitioned 
columns
 Key: HIVE-13200
 URL: https://issues.apache.org/jira/browse/HIVE-13200
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 2.0.0, 1.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Running aggregation functions such as MAX, MIN, or DISTINCT against partitioned 
columns returns empty rows if the table has the property 
'skip.header.line.count'='1'.
Reproduce:
{noformat}
DROP TABLE IF EXISTS test;

CREATE TABLE test (a int) 
PARTITIONED BY (b int) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
TBLPROPERTIES('skip.header.line.count'='1');

INSERT OVERWRITE TABLE test PARTITION (b = 1) VALUES (1), (2), (3), (4);
INSERT OVERWRITE TABLE test PARTITION (b = 2) VALUES (1), (2), (3), (4);

SELECT * FROM test;

SELECT DISTINCT b FROM test;
SELECT MAX(b) FROM test;
SELECT DISTINCT a FROM test;
{noformat}

The output:
{noformat}
0: jdbc:hive2://localhost:1/default> SELECT * FROM test;
+---------+---------+--+
| test.a  | test.b  |
+---------+---------+--+
| 2       | 1       |
| 3       | 1       |
| 4       | 1       |
| 2       | 2       |
| 3       | 2       |
| 4       | 2       |
+---------+---------+--+
6 rows selected (0.631 seconds)

0: jdbc:hive2://localhost:1/default> SELECT DISTINCT b FROM test;
+----+--+
| b  |
+----+--+
+----+--+
No rows selected (47.229 seconds)

0: jdbc:hive2://localhost:1/default> SELECT MAX(b) FROM test;
+-------+--+
|  _c0  |
+-------+--+
| NULL  |
+-------+--+
1 row selected (49.508 seconds)

0: jdbc:hive2://localhost:1/default> SELECT DISTINCT a FROM test;
+----+--+
| a  |
+----+--+
| 2  |
| 3  |
| 4  |
+----+--+
3 rows selected (46.859 seconds)
{noformat}





[jira] [Created] (HIVE-13065) Hive throws NPE when writing map type data to a HBase backed table

2016-02-16 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13065:
---

 Summary: Hive throws NPE when writing map type data to a HBase 
backed table
 Key: HIVE-13065
 URL: https://issues.apache.org/jira/browse/HIVE-13065
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 1.1.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Hive throws an NPE when writing data to an HBase-backed table under the following conditions:

# There is a map type column
# The map type column has NULL in its values

Steps to reproduce:

*1) Create an HBase-backed Hive table*
{code:sql}
create table hbase_test (id bigint, data map<string, string>)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,cf:map_col")
tblproperties ("hbase.table.name" = "hive_test");
{code}

*2) Insert data into the above table*
{code:sql}
insert overwrite table hbase_test select 1 as id, map('abcd', null) as data 
from src limit 1;
{code}

The MapReduce job for the insert query fails. The error messages are as follows:
{noformat}
2016-02-15 02:26:33,225 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row (tag=0) 
{"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:731)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:286)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:666)
... 14 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:221)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:236)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:275)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:222)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
at 
org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
... 15 more
{noformat}
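
The trace ends in a UTF-8 write of a map value that is NULL. The sketch below only illustrates the general idea of a null guard before that write; it is not the HBase handler code or the committed fix. The class name, the "key:value" layout, and the `\N` null marker are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class NullSafeMapSerializer {
    // Assumption for this sketch: NULL values are written as this marker.
    static final String NULL_MARKER = "\\N";

    // Serializes a map as comma-separated key:value pairs, guarding against NULL values.
    static byte[] serialize(Map<String, String> data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean first = true;
        for (Map.Entry<String, String> entry : data.entrySet()) {
            if (!first) {
                out.write(',');
            }
            first = false;
            writeUtf8(out, entry.getKey());
            out.write(':');
            String value = entry.getValue();
            // Without this check, value.getBytes(...) below would throw a NullPointerException.
            writeUtf8(out, value == null ? NULL_MARKER : value);
        }
        return out.toByteArray();
    }

    private static void writeUtf8(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }
}
```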





[jira] [Created] (HIVE-13039) BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table

2016-02-10 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13039:
---

 Summary: BETWEEN predicate is not functioning correctly with 
predicate pushdown on Parquet table
 Key: HIVE-13039
 URL: https://issues.apache.org/jira/browse/HIVE-13039
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 1.2.1, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


BETWEEN becomes exclusive on a Parquet table when predicate pushdown is on (as it 
is by default in newer Hive versions). To reproduce (in a cluster, not a local 
setup):
CREATE TABLE parquet_tbl(
  key int,
  ldate string)
 PARTITIONED BY (
 lyear string )
 ROW FORMAT SERDE
 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
 STORED AS INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
 OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

insert overwrite table parquet_tbl partition (lyear='2016') select
  1,
  '2016-02-03' from src limit 1;

set hive.optimize.ppd.storage = true;
set hive.optimize.ppd = true;
select * from parquet_tbl where ldate between '2016-02-03' and '2016-02-03';
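
The query above should return the inserted row, because SQL BETWEEN includes both bounds. As a minimal sketch of the difference (the class and method names are hypothetical and this is not Hive's pushdown code), the two interpretations compare as follows:

```java
final class BetweenSemantics {
    // SQL BETWEEN: lower <= value AND value <= upper, both bounds inclusive.
    static boolean betweenInclusive(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // The behaviour reported in this issue: both bounds treated as exclusive.
    static boolean betweenExclusive(String value, String lower, String upper) {
        return value.compareTo(lower) > 0 && value.compareTo(upper) < 0;
    }
}
```

With value, lower, and upper all equal to '2016-02-03', the inclusive form matches the row and the exclusive form does not.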







[jira] [Created] (HIVE-12795) Vectorized execution causes ClassCastException

2016-01-06 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12795:
---

 Summary: Vectorized execution causes ClassCastException
 Key: HIVE-12795
 URL: https://issues.apache.org/jira/browse/HIVE-12795
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


In some Hive versions, when
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled = true;

some join queries fail with a ClassCastException.
The stack:
{noformat}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector
 cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableStringObjectInspector
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:419)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1102)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:55)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
... 22 more

{noformat}
It cannot be reproduced in Hive 2.0 and 1.3 because they take a different code path. 
Reproduce:
{noformat}

CREATE TABLE test1
 (
   id string)
   PARTITIONED BY (
  cr_year bigint,
  cr_month bigint)
 ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
TBLPROPERTIES (
  'serialization.null.format'='' );
  
  CREATE TABLE test2(
id string
  )
   PARTITIONED BY (
  cr_year bigint,
  cr_month bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
TBLPROPERTIES (
  'serialization.null.format'=''
 );
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled = true;
 SELECT cr.id1 ,
cr.id2 
FROM
(SELECT t1.id id1,
 t2.id id2
 from
 (select * from test1 ) t1
 left outer join test2  t2
 on t1.id=t2.id) cr;

{noformat}







[jira] [Created] (HIVE-12784) Group by SemanticException: Invalid column reference

2016-01-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12784:
---

 Summary: Group by SemanticException: Invalid column reference
 Key: HIVE-12784
 URL: https://issues.apache.org/jira/browse/HIVE-12784
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Some queries that work fine in older versions throw a SemanticException. The stack 
trace:

{noformat}
FAILED: SemanticException [Error 10002]: Line 96:1 Invalid column reference 
'key2'
15/12/21 18:56:44 [main]: ERROR ql.Driver: FAILED: SemanticException [Error 
10002]: Line 96:1 Invalid column reference 'key2'
org.apache.hadoop.hive.ql.parse.SemanticException: Line 96:1 Invalid column 
reference 'key2'
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:4228)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5670)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9007)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9884)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9777)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10250)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10261)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10141)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
Reproduce:
{noformat}
create table tlb (key int, key1 int, key2 int);
create table src (key int, value string);
select key, key1, key2 from (select a.key, 0 as key1 , 0 as key2 from tlb a 
inner join src b on a.key = b.key) a group by key, key1, key2;
{noformat}






[jira] [Created] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly

2015-12-10 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12646:
---

 Summary: beeline and HIVE CLI do not parse ; in quote properly
 Key: HIVE-12646
 URL: https://issues.apache.org/jira/browse/HIVE-12646
 Project: Hive
  Issue Type: Bug
  Components: CLI, Clients
Reporter: Yongzhi Chen
Assignee: Vaibhav Gumashta


Beeline and the Hive CLI require ; to be escaped inside quotes, while most other 
shells do not. For example, 
in Beeline:
{noformat}
0: jdbc:hive2://localhost:1> select ';' from tlb1;
select ';' from tlb1;
15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115
15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403
Error: Error while compiling statement: FAILED: ParseException line 1:8 cannot 
recognize input near '' '
{noformat}
while in mysql shell:
{noformat}
mysql> SELECT CONCAT(';', 'foo') FROM test limit 3;
++
| ;foo   |
| ;foo   |
| ;foo   |
++
3 rows in set (0.00 sec)
{noformat}
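
A minimal sketch of quote-aware statement splitting, assuming single and double quotes delimit literals and a backslash escapes the next character inside a literal. The class is hypothetical and is not the Beeline or CLI parser.

```java
import java.util.ArrayList;
import java.util.List;

final class StatementSplitter {
    // Splits a line into statements on ';' characters that are outside quoted literals.
    static List<String> split(String line) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 means "not inside a quoted literal"
        boolean escaped = false; // true when the previous character was a backslash inside quotes
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                current.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == quote) {
                    quote = 0; // closing quote
                }
            } else if (c == '\'' || c == '"') {
                quote = c; // opening quote
                current.append(c);
            } else if (c == ';') {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

With this approach, the input `select ';' from tlb1;` yields one statement, with the quoted semicolon left intact.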





[jira] [Created] (HIVE-12378) Exception on HBaseSerDe.serialize binary field

2015-11-10 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12378:
---

 Summary: Exception on HBaseSerDe.serialize binary field
 Key: HIVE-12378
 URL: https://issues.apache.org/jira/browse/HIVE-12378
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler, Serializers/Deserializers
Affects Versions: 1.1.0, 1.0.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


An issue was reproduced with binary-typed HBase columns in Hive.

The following works fine:
CREATE TABLE test9 (key int, val string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:val#b"
);
insert into test9 values(1,"hello");

But when the string type is changed to binary:
CREATE TABLE test2 (key int, val binary)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:val#b"
);
insert into table test2 values(1, 'hello');

The following exception is thrown:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {"tmp_values_col1":"1","tmp_values_col2":"hello"}
...
Caused by: java.lang.RuntimeException: Hive internal error.
at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:322)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
... 16 more

We should support Hive binary type columns for HBase.





[jira] [Created] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12189:
---

 Summary: The list in pushdownPreds of ppd.ExprWalkerInfo should 
not be allowed to grow very large
 Key: HIVE-12189
 URL: https://issues.apache.org/jira/browse/HIVE-12189
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.1.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Some queries are very slow to compile. For example, the following query
{noformat}
select * from tt1 nf 
join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid) 
join tt3 a2 on(a2.col2 = a1.col2 and a2.col3 = nf.col3 and 
a2.hdp_databaseid = nf.hdp_databaseid) 
join tt4 a3 on  (a3.col4 = a2.col4 and a3.col3 = a2.col3) 
join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = 
a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid) 
join tt6 a5 on  (a5.col3 = a2.col3 and a5.col2 = a2.col2 and 
a5.hdp_databaseid = nf.hdp_databaseid) 
JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid = 
nf.hdp_databaseid) 
JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid = 
nf.hdp_databaseid)
where nf.hdp_databaseid = 102 limit 10;
{noformat}
takes around 120 seconds to compile in Hive 1.1 when
hive.mapred.mode=strict;
hive.optimize.ppd=true;
and Hive is not in test mode.
All of the above tables are partitioned on one column, and all of them 
are empty. If the tables are not empty, compilation is reportedly so slow 
that Hive appears to hang. 
In Hive 2.0, compilation is much faster: explain takes 6.6 seconds. But that is 
still a long time. One of the problems that slows PPD down is that the list in 
pushdownPreds can grow very large, which makes extractPushdownPreds perform 
poorly:
{noformat}
public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
Operator op, List preds)
{noformat}
While running the query above, at the following breakpoint preds has a size of 
12051, and most entries in the list are: GenericUDFOPEqual(Column[hdp_databaseid], 
Const int 102), GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
The following code in extractPushdownPreds clones all the nodes in preds and walks 
them. Hive 2.0 is faster because HIVE-11652 makes startWalking much faster, 
but we still clone thousands of nodes with the same expression. Should we store so 
many identical predicates in the list, or is one enough?  

{noformat}
List startNodes = new ArrayList();
List clonedPreds = new ArrayList();
for (ExprNodeDesc node : preds) {
  ExprNodeDesc clone = node.clone();
  clonedPreds.add(clone);
  exprContext.getNewToOldExprMap().put(clone, node);
}
startNodes.addAll(clonedPreds);

egw.startWalking(startNodes, null);

{noformat}

Should we change the following methods in 
java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java:
public void addFinalCandidate(String alias, ExprNodeDesc expr) 
and
public void addPushDowns(String alias, List pushDowns) 

to add only expressions that are not already in the pushdown list for an alias?
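
A minimal sketch of the proposal, assuming plain strings as stand-ins for ExprNodeDesc. The PushDownRegistry class is hypothetical and is not the ExprWalkerInfo implementation; real code would need a structural equality check on expressions rather than String.equals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PushDownRegistry {
    // alias -> predicates; strings stand in for Hive's ExprNodeDesc in this sketch
    private final Map<String, List<String>> pushdownPreds = new HashMap<>();

    // Adds a predicate for the alias only if an equal predicate is not already stored.
    void addFinalCandidate(String alias, String expr) {
        List<String> preds = pushdownPreds.computeIfAbsent(alias, k -> new ArrayList<>());
        if (!preds.contains(expr)) {
            preds.add(expr);
        }
    }

    // Adds each predicate through the same duplicate check.
    void addPushDowns(String alias, List<String> pushDowns) {
        for (String expr : pushDowns) {
            addFinalCandidate(alias, expr);
        }
    }

    List<String> get(String alias) {
        return pushdownPreds.getOrDefault(alias, new ArrayList<>());
    }
}
```

The linear contains() scan keeps insertion order and is cheap while the list stays small, which is the point of the change; a set keyed on the expression would be an alternative if lists can still grow large.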






[jira] [Created] (HIVE-12058) Change hive script to record errors when calling hbase fails

2015-10-07 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12058:
---

 Summary: Change hive script to record errors when calling hbase 
fails
 Key: HIVE-12058
 URL: https://issues.apache.org/jira/browse/HIVE-12058
 Project: Hive
  Issue Type: Bug
  Components: Hive, HiveServer2
Affects Versions: 1.1.0, 0.14.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


By default, hive tries to find out which jars need to be added to the 
classpath in order to run MR jobs against an HBase cluster. However, if hbase 
can't be found or if hbase mapredcp fails, the hive script fails silently 
and omits some of the jars that should be included in the classpath. That makes 
it very difficult to analyze the real problem.
The hive script should record the errors instead of simply redirecting stderr 
to /dev/null for these two hbase calls:
HBASE_BIN=${HBASE_BIN:-"$(which hbase 2>/dev/null)"}
$HBASE_BIN mapredcp 2>/dev/null





[jira] [Created] (HIVE-12008) Make last two tests added by HIVE-11384 pass when hive.in.test is false

2015-10-01 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12008:
---

 Summary: Make last two tests added by HIVE-11384 pass when 
hive.in.test is false
 Key: HIVE-12008
 URL: https://issues.apache.org/jira/browse/HIVE-12008
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The last two qfile unit tests fail when hive.in.test is false. This may be 
related to how we handle the prunelist for select. When a select includes every 
column in a table, the prunelist for the select is empty, which may cause 
problems when calculating its parent's prunelist.





Review Request 38946: Need review the fix for HIVE-11973

2015-10-01 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38946/
---

Review request for hive, Chao Sun, Chaoyu Tang, and Szehon Ho.


Repository: hive-git


Description
---

HIVE-11973: IN operator fails when the column type is DATE


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
218b2df3e6bf4d8094d01cf0c78934324a04f1b1 
  ql/src/test/queries/clientpositive/selectindate.q PRE-CREATION 
  ql/src/test/results/clientpositive/selectindate.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/38946/diff/


Testing
---

Added a new qfile test for the issue and ran the pre-commit build.


Thanks,

Yongzhi Chen



[jira] [Created] (HIVE-11982) Some test case for union all with recent changes

2015-09-28 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11982:
---

 Summary: Some test case for union all with recent changes
 Key: HIVE-11982
 URL: https://issues.apache.org/jira/browse/HIVE-11982
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The tests throw java.lang.IndexOutOfBoundsException again. 
It was supposed to be fixed by HIVE-11271





Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working

2015-09-13 Thread Yongzhi Chen


> On Sept. 12, 2015, 1:23 a.m., Szehon Ho wrote:
> > I dont know if you saw in the earlier comments, please add a test to the 
> > file 'FolderPermissionBase' to verify permission inheritance works with the 
> > feature.

Sorry, I overlooked the comments. I added a test to cover this. 
The fix respects the original design: the destination partition folder inherits 
the original partition folder's permission.
For the intermediate folders between the destination partition folder and the 
base table folder, if they do not exist, their permission is inherited from the 
base table folder (the same behavior as when adding a new partition); otherwise 
they keep their original permission.


- Yongzhi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38216/#review98723
---


On Sept. 12, 2015, 4:07 a.m., Yongzhi Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38216/
> ---
> 
> (Updated Sept. 12, 2015, 4:07 a.m.)
> 
> 
> Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-11745
> https://issues.apache.org/jira/browse/HIVE-11745
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Alter table Exchange partition with multiple partition_spec does not work in 
> cluster mode because, during the rename, the parent folder of the destination 
> path does not physically exist. Some file systems (HDFS, for instance) do not 
> support (or allow) this. Fix by creating the parent folder first.
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/FolderPermissionBase.java
>  f28edc66ea4644c5847ee6abe2e26306f9fbb43e 
>   itests/src/test/resources/testconfiguration.properties 
> bed621d3eb74f01e54110552f68538afd228018d 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> 1840e76cc567e95e1942d912b8ab0db516d63a3b 
>   ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION 
>   ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/38216/diff/
> 
> 
> Testing
> ---
> 
> Add minimr unit test.
> 
> 
> Thanks,
> 
> Yongzhi Chen
> 
>



[jira] [Created] (HIVE-11801) In HMS HA env, "show databases" fails when "current" HMS is stopped.

2015-09-11 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11801:
---

 Summary: In HMS HA env, "show databases" fails when "current" HMS 
is stopped.
 Key: HIVE-11801
 URL: https://issues.apache.org/jira/browse/HIVE-11801
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.1.0, 1.2.0, 0.14.0, 2.0.0
        Reporter: Yongzhi Chen
        Assignee: Yongzhi Chen


Steps to reproduce:
# Enable HMS HA on a cluster
# Use beeline to connect to HS2 and execute the command {{show databases}}. 
Don't quit beeline after the command has finished
# Stop the first HMS listed in the configuration {{hive.metastore.uris}}
# Execute {{show databases}} in beeline again. The following error is returned:
{noformat}
MetaException(message:Got exception: 
org.apache.thrift.transport.TTransportException java.net.SocketException: 
Broken pipe)
{noformat}

The error message in HS2 is as below:
{noformat}
2015-09-08 12:06:53,236 ERROR hive.log: Got exception: 
org.apache.thrift.transport.TTransportException java.net.SocketException: 
Broken pipe
org.apache.thrift.transport.TTransportException: java.net.SocketException: 
Broken pipe
at 
org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
at 
org.apache.thrift.transport.TSaslTransport.flush(TSaslTransport.java:501)
at 
org.apache.thrift.transport.TSaslClientTransport.flush(TSaslClientTransport.java:37)
at 
org.apache.hadoop.hive.thrift.TFilterTransport.flush(TFilterTransport.java:77)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_databases(ThriftHiveMetastore.java:692)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:684)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:964)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1909)
at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
at 
org.apache.hive.service.cli.operation.GetSchemasOperation.runInternal(GetSchemasOperation.java:59)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:462)
at 
org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:296)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:534)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1373)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1358)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at 
org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
... 31 more
2015-09-08 12:06:53,238 ERROR hive.log: Converting exception to MetaException
2015-09-08 12:06:53,238 WARN 
org.apache.hive.service.cli.thrift.ThriftCLIService: Error getting schemas:
org.apache.hive.service.cli.HiveSQLException: MetaException(message:Got 
exception: org.apache.thrift.transport

Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working

2015-09-11 Thread Yongzhi Chen


> On Sept. 10, 2015, 6:02 p.m., Szehon Ho wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, 
> > line 2552
> > <https://reviews.apache.org/r/38216/diff/1/?file=1065987#file1065987line2552>
> >
> > I think this whole method can be moved to FileUtils for organization.  
> > Also please check if there's any method there already.
> 
> Yongzhi Chen wrote:
> I think it may be better as a private method in the HiveMetaStore class 
> for it will using its private variable wh (hdfs warehouse) .
> 
> Szehon Ho wrote:
> Actually looking more into the code, this method should not be necessary. 
>  You can just call wh.mkdirs directly.  The underlying FileSystem.mkdirs has 
> the same semantics as -p, there should be no file system that violates this.  
> If there were, many other partition codes would break..

Thanks Szehon. As you pointed out, and as the name of the function suggests, 
wh.mkdirs should behave like mkdir -p on all file systems. I worried too much. 
The third patch removes the createFullPath method and uses mkdirs directly. I 
also added a new test case to cover the case where more than one intermediate 
directory is missing.
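
To illustrate the behavior discussed here (a rename fails when the parent of 
the destination does not exist, and an mkdir -p style call fixes that), below 
is a small sketch that uses the local file system through java.nio.file. It is 
only an analogy: it does not use Hadoop's FileSystem or Hive's Warehouse 
class, and RenameWithParentsSketch and its method names are made up for the 
example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RenameWithParentsSketch {

  /** Moves src to dest, creating any missing parent directories of dest first. */
  static void moveCreatingParents(Path src, Path dest) throws IOException {
    Path parent = dest.getParent();
    if (parent != null) {
      // Like mkdir -p: creates every missing level, no error if it already exists.
      Files.createDirectories(parent);
    }
    Files.move(src, dest);
  }

  /** Runs the scenario in a temp directory; true if it behaves as described. */
  static boolean demo() {
    try {
      Path root = Files.createTempDirectory("exchg");
      Path src = Files.createDirectory(root.resolve("src_part"));
      // Two intermediate levels are missing, as with a two-column partition spec.
      Path dest = root.resolve("t4").resolve("d1=1").resolve("d2=1");

      boolean plainMoveFailed;
      try {
        Files.move(src, dest);
        plainMoveFailed = false;
      } catch (IOException e) {
        plainMoveFailed = true; // parent of dest does not exist
      }

      moveCreatingParents(src, dest);
      return plainMoveFailed && Files.isDirectory(dest) && !Files.exists(src);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```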


- Yongzhi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38216/#review98435
-------


On Sept. 11, 2015, 1:02 p.m., Yongzhi Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38216/
> ---
> 
> (Updated Sept. 11, 2015, 1:02 p.m.)
> 
> 
> Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-11745
> https://issues.apache.org/jira/browse/HIVE-11745
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Alter table Exchange partition with multiple partition_spec does not work in 
> cluster mode because, during the rename, the parent folder of the destination 
> path does not physically exist. Some file systems (HDFS, for instance) do not 
> support (or allow) this. Fix by creating the parent folder first.
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 
> bed621d3eb74f01e54110552f68538afd228018d 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> 1840e76cc567e95e1942d912b8ab0db516d63a3b 
>   ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION 
>   ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/38216/diff/
> 
> 
> Testing
> ---
> 
> Add minimr unit test.
> 
> 
> Thanks,
> 
> Yongzhi Chen
> 
>



Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working

2015-09-11 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38216/
---

(Updated Sept. 12, 2015, 4:07 a.m.)


Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.


Changes
---

Added a test for permission inheritance.


Bugs: HIVE-11745
https://issues.apache.org/jira/browse/HIVE-11745


Repository: hive-git


Description
---

Alter table Exchange partition with multiple partition_spec does not work in 
cluster mode because, during the rename, the parent folder of the destination 
path does not physically exist. Some file systems (HDFS, for instance) do not 
support (or allow) this. Fix by creating the parent folder first.


Diffs (updated)
-

  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/FolderPermissionBase.java
 f28edc66ea4644c5847ee6abe2e26306f9fbb43e 
  itests/src/test/resources/testconfiguration.properties 
bed621d3eb74f01e54110552f68538afd228018d 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
1840e76cc567e95e1942d912b8ab0db516d63a3b 
  ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION 
  ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/38216/diff/


Testing
---

Add minimr unit test.


Thanks,

Yongzhi Chen



Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working

2015-09-11 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38216/
---

(Updated Sept. 11, 2015, 1:02 p.m.)


Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-11745
https://issues.apache.org/jira/browse/HIVE-11745


Repository: hive-git


Description
---

Alter table Exchange partition with multiple partition_spec does not work in 
cluster mode because, during the rename, the parent folder of the destination 
path does not physically exist. Some file systems (HDFS, for instance) do not 
support (or allow) this. Fix by creating the parent folder first.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 
bed621d3eb74f01e54110552f68538afd228018d 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
1840e76cc567e95e1942d912b8ab0db516d63a3b 
  ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION 
  ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/38216/diff/


Testing
---

Add minimr unit test.


Thanks,

Yongzhi Chen



Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working

2015-09-10 Thread Yongzhi Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38216/
---

(Updated Sept. 10, 2015, 7:36 p.m.)


Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-11745
https://issues.apache.org/jira/browse/HIVE-11745


Repository: hive-git


Description
---

Alter table Exchange partition with multiple partition_spec does not work in 
cluster mode because, during the rename, the parent folder of the destination 
path does not physically exist. Some file systems (HDFS, for instance) do not 
support (or allow) this. Fix by creating the parent folder first.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 
bed621d3eb74f01e54110552f68538afd228018d 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
1840e76cc567e95e1942d912b8ab0db516d63a3b 
  ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION 
  ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/38216/diff/


Testing
---

Add minimr unit test.


Thanks,

Yongzhi Chen



[jira] [Created] (HIVE-11745) Alter table Exchange partition with multiple partition_spec is not working

2015-09-04 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11745:
---

 Summary: Alter table Exchange partition with multiple 
partition_spec is not working
 Key: HIVE-11745
 URL: https://issues.apache.org/jira/browse/HIVE-11745
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Exchanging a partition with a single partition column works, but exchanging 
one with multiple partition columns does not.
Steps to reproduce:
{noformat}
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t3;
DROP TABLE IF EXISTS t4;

CREATE TABLE t1 (a int) PARTITIONED BY (d1 int);
CREATE TABLE t2 (a int) PARTITIONED BY (d1 int);
CREATE TABLE t3 (a int) PARTITIONED BY (d1 int, d2 int);
CREATE TABLE t4 (a int) PARTITIONED BY (d1 int, d2 int);

INSERT OVERWRITE TABLE t1 PARTITION (d1 = 1) SELECT salary FROM jsmall LIMIT 10;
INSERT OVERWRITE TABLE t3 PARTITION (d1 = 1, d2 = 1) SELECT salary FROM jsmall 
LIMIT 10;

SELECT * FROM t1;

SELECT * FROM t3;

ALTER TABLE t2 EXCHANGE PARTITION (d1 = 1) WITH TABLE t1;
SELECT * FROM t1;
SELECT * FROM t2;

ALTER TABLE t4 EXCHANGE PARTITION (d1 = 1, d2 = 1) WITH TABLE t3;
SELECT * FROM t3;
SELECT * FROM t4;
{noformat}
The output:
{noformat}
0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t3;
+-------+--------+--------+--+
| t3.a  | t3.d1  | t3.d2  |
+-------+--------+--------+--+
+-------+--------+--------+--+
No rows selected (0.227 seconds)
0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t4;
+-------+--------+--------+--+
| t4.a  | t4.d1  | t4.d2  |
+-------+--------+--------+--+
+-------+--------+--------+--+
No rows selected (0.266 seconds)
{noformat}





[jira] [Created] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-19 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11604:
---

 Summary: HIVE return wrong results in some queries with PTF 
function
 Key: HIVE-11604
 URL: https://issues.apache.org/jira/browse/HIVE-11604
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.1.0, 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The following query returns an empty result, which is incorrect:
{noformat}
select ddd.id, ddd.fkey, aaa.name
from (
select id, fkey, 
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}

After removing row_number() over (partition by id, fkey) as rnum from the 
query, the correct result is returned.

Reproduce:
{noformat}
create table tlb1 (id int, fkey int, val string);
create table tlb2 (fid int, name string);
insert into table tlb1 values(100,1,'abc');
insert into table tlb1 values(200,1,'efg');
insert into table tlb2 values(1, 'key1');

select ddd.id, ddd.fkey, aaa.name
from (
select id, fkey, 
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;

INFO  : Ended Job = job_local1070163923_0017
+---------+-----------+-----------+--+
| ddd.id  | ddd.fkey  | aaa.name  |
+---------+-----------+-----------+--+
+---------+-----------+-----------+--+
No rows selected (14.248 seconds)

0: jdbc:hive2://localhost:1 select ddd.id, ddd.fkey, aaa.name
from (
select id, fkey 
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
0: jdbc:hive2://localhost:1 from (
0: jdbc:hive2://localhost:1 select id, fkey 
0: jdbc:hive2://localhost:1 from tlb1 group by id, fkey
0: jdbc:hive2://localhost:1  ) ddd 
0: jdbc:hive2://localhost:1 
inner join tlb2 aaa on aaa.fid = ddd.fkey;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
...
INFO  : Ended Job = job_local672340505_0019
+---------+-----------+-----------+--+
| ddd.id  | ddd.fkey  | aaa.name  |
+---------+-----------+-----------+--+
| 100     | 1         | key1      |
| 200     | 1         | key1      |
+---------+-----------+-----------+--+
2 rows selected (14.383 seconds)

{noformat}






[jira] [Created] (HIVE-11502) Map side aggregation is extremely slow

2015-08-08 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11502:
---

 Summary: Map side aggregation is extremely slow
 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


For a query such as the following:
{noformat}
create table tbl2 as 
select col1, max(col2) as col2 
from tbl1 group by col1;
{noformat}
If the group-by column has many distinct values (for example 40), the 
map-side aggregation is very slow. I ran the query and, after more than 3 
hours, had to kill it.
The same query finishes in 7 seconds if I turn off map-side aggregation with:
{noformat}
set hive.map.aggr = false;
{noformat}






[jira] [Created] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11380:
---

 Summary: NPE when FileSinkOperator is not initialized
 Key: HIVE-11380
 URL: https://issues.apache.org/jira/browse/HIVE-11380
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When FileSinkOperator's initializeOp is not called (which may happen when the 
initializeOp of an operator before FileSinkOperator fails), FileSinkOperator 
throws an NPE at close time. The stack trace:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
... 18 more
{noformat}
This Exception is misleading and often distracts users from finding real 
issues. 
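
One possible way to avoid masking the original failure is to skip the 
close-time work when initialization never happened. The sketch below only 
mirrors the initialize/close lifecycle described above; SinkOperatorSketch is 
a hypothetical class, not FileSinkOperator, and it is not necessarily how the 
issue should be fixed in Hive.

```java
/** Hypothetical operator that mirrors an initialize/process/close lifecycle. */
final class SinkOperatorSketch {
  private StringBuilder out;    // stands in for state created in initializeOp
  private boolean initialized;  // set only after initializeOp completes
  private boolean closed;

  void initializeOp() {
    out = new StringBuilder();
    initialized = true;
  }

  void process(String row) {
    if (!initialized) {
      throw new IllegalStateException("process() called before initializeOp()");
    }
    out.append(row).append('\n');
  }

  /** Returns true if state was released, false if there was nothing to close. */
  boolean closeOp() {
    if (closed) {
      return false;
    }
    closed = true;
    if (!initialized) {
      // An earlier failure skipped initializeOp. Do nothing here, so that the
      // original error is not hidden behind a NullPointerException.
      return false;
    }
    out.setLength(0);
    return true;
  }
}
```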





[jira] [Created] (HIVE-11384) Add Test case which cover both HIVE-11271 and HIVE-11333

2015-07-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11384:
---

 Summary: Add Test case which cover both HIVE-11271 and HIVE-11333
 Key: HIVE-11384
 URL: https://issues.apache.org/jira/browse/HIVE-11384
 Project: Hive
  Issue Type: Test
  Components: Logical Optimizer, Parser
Affects Versions: 1.2.0, 1.0.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Add test queries that pass only when both HIVE-11271 and HIVE-11333 are 
fixed.





[jira] [Created] (HIVE-11319) CTAS with location qualifier overwrites directories

2015-07-20 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11319:
---

 Summary: CTAS with location qualifier overwrites directories
 Key: HIVE-11319
 URL: https://issues.apache.org/jira/browse/HIVE-11319
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 1.2.0, 1.0.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


CTAS with a location clause acts as an insert overwrite. This can cause 
problems when there are subdirectories within the target directory.
It has caused some users to accidentally wipe out directories with very 
important data. We should prevent CTAS with a location clause from targeting a 
non-empty directory. 

Reproduce:
create table ctas1  
location '/Users/ychen/tmp' 
as 
select * from jsmall limit 10;

create table ctas2  
location '/Users/ychen/tmp' 
as 
select * from jsmall limit 5;

Both create statements succeed, but the data in table ctas1 is silently 
replaced by that of ctas2. 
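
The kind of check this implies (refuse a target location that already contains 
entries) might look like the sketch below. It uses java.nio.file on the local 
file system purely for illustration; a real check in Hive would go through the 
Hadoop FileSystem API, and CtasLocationCheckSketch and isSafeTarget are 
hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class CtasLocationCheckSketch {

  /** Returns true if location does not exist yet or is an empty directory. */
  static boolean isSafeTarget(Path location) {
    if (!Files.exists(location)) {
      return true;
    }
    if (!Files.isDirectory(location)) {
      return false;
    }
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(location)) {
      return !entries.iterator().hasNext();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Checks an empty and then a non-empty temp directory; true if both behave. */
  static boolean demo() {
    try {
      Path dir = Files.createTempDirectory("ctas");
      boolean emptyAccepted = isSafeTarget(dir);
      Files.createFile(dir.resolve("part-00000"));
      boolean nonEmptyRejected = !isSafeTarget(dir);
      return emptyAccepted && nonEmptyRejected;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With a check like this, the second create in the reproduce steps would be 
rejected instead of overwriting the files written by the first one.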





[jira] [Created] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function

2015-07-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11271:
---

 Summary: java.lang.IndexOutOfBoundsException when union all with 
if function
 Key: HIVE-11271
 URL: https://issues.apache.org/jira/browse/HIVE-11271
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0, 1.0.0, 0.14.0
Reporter: Yongzhi Chen


Some queries with Union all as subquery fail in MapReduce task with stacktrace:
{noformat}
15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing 
operator UNION[104]
15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor 
complete.
15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: 
job_local826862759_0005
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 10 more
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140)
... 21 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
... 21 more

{noformat}

Reproduce:

{noformat}
create table if not exists union_all_bug_test_1 
( 
f1 int,
f2 int
); 

create table if not exists union_all_bug_test_2 
( 
f1 int 
); 

SELECT f1 
FROM ( 

SELECT 
f1 
, if('helloworld' like '%hello%' ,f1,f2) as filter 
FROM union_all_bug_test_1 

union all 

select 
f1 
, 0 as filter 
from union_all_bug_test_2 
) A 
WHERE (filter = 1); 

{noformat}



