[jira] [Created] (HIVE-25757) Use cached database type to choose metastore backend queries

2021-12-01 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-25757:
---

 Summary: Use cached database type to choose metastore backend 
queries
 Key: HIVE-25757
 URL: https://issues.apache.org/jira/browse/HIVE-25757
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


In HIVE-21075 we call DatabaseProduct.determineDatabaseProduct, which can be 
expensive. We should determine the database type once, cache it, and use the 
cached value to choose the metastore backend queries.
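
For illustration, a rough sketch of the idea (class and method names are 
assumptions, not the committed patch):
{noformat}
// Sketch only: determine the database product once and cache it, instead of
// re-deriving it on every backend query.
public class BackendQueryChooser {
  private final DatabaseProduct dbType;  // cached at construction time

  BackendQueryChooser(String productName) {
    // the expensive call happens exactly once
    this.dbType = DatabaseProduct.determineDatabaseProduct(productName);
  }

  String cdIdProbe() {
    // later call sites consult the cached value to pick the right SQL
    return dbType == DatabaseProduct.POSTGRES
        ? "select 1 from \"SDS\" where \"CD_ID\" = ? limit 1"
        : "select count(1) from \"SDS\" where \"CD_ID\" = ?";
  }
}
{noformat}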





[jira] [Created] (HIVE-25238) Make excluded SSL cipher suites configurable for Hive Web UI and HS2

2021-06-10 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-25238:
---

 Summary: Make excluded SSL cipher suites configurable for Hive Web 
UI and HS2
 Key: HIVE-25238
 URL: https://issues.apache.org/jira/browse/HIVE-25238
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Web UI
Reporter: Yongzhi Chen


When starting a Jetty HTTP server, one can explicitly exclude certain (insecure)
SSL cipher suites. This can be especially important when Hive needs to be 
compliant with security regulations. We need to add properties so that the 
Hive Web UI and HiveServer2 support this.
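
For illustration, a hedged sketch of how such a property could be wired into 
Jetty (the property name is an assumption):
{noformat}
import org.eclipse.jetty.util.ssl.SslContextFactory;

// Sketch: feed a comma-separated exclusion list from a Hadoop Configuration
// ("conf") into Jetty's SSL context factory.
SslContextFactory.Server ssl = new SslContextFactory.Server();
String excluded = conf.get("hive.ssl.exclude.ciphersuites", "");  // hypothetical key
if (!excluded.isEmpty()) {
  // e.g. "SSL_RSA_WITH_DES_CBC_SHA,SSL_DHE_RSA_WITH_DES_CBC_SHA"
  ssl.setExcludeCipherSuites(excluded.trim().split("\\s*,\\s*"));
}
{noformat}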





[jira] [Created] (HIVE-25211) Create database throws NPE

2021-06-07 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-25211:
---

 Summary: Create database throws NPE
 Key: HIVE-25211
 URL: https://issues.apache.org/jira/browse/HIVE-25211
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Yongzhi Chen


{noformat}
<11>1 2021-06-06T17:32:48.964Z 
metastore-0.metastore-service.warehouse-1622998329-9klr.svc.cluster.local 
metastore 1 5ad83e8e-bf89-4ad3-b1fb-51c73c7133b7 [mdc@18060 
class="metastore.RetryingHMSHandler" level="ERROR" thread="pool-9-thread-16"] 
MetaException(message:java.lang.NullPointerException)

at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:8115)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1629)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121)
at com.sun.proxy.$Proxy31.create_database(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16795)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16779)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:120)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:128)
at 
org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:491)
at 
org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:480)
at 
org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:476)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1556)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1554)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1554)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1618)
... 21 more
{noformat}






[jira] [Created] (HIVE-24552) Possible HMS connections leak or accumulation in loadDynamicPartitions

2020-12-20 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24552:
---

 Summary: Possible HMS connections leak or accumulation in 
loadDynamicPartitions
 Key: HIVE-24552
 URL: https://issues.apache.org/jira/browse/HIVE-24552
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When loadDynamicPartitions (Hive.java) is called, it spawns several threads to 
handle file moves. These threads may open HiveMetaStore connections that are 
not closed in time, causing many connections to accumulate. The following log, 
taken after running insert overwrite many times, shows these threads opening 
new HMS connections and the total number of open connections growing large, 
while the finalizer closes the connections and sometimes hits errors:
{noformat}
<14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-14"] Opened a connection to metastore, 
current connections: 44021
<14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-14"] Connected to metastore.
<14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.RetryingMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-14"] RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 
lifetime=0
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-5"] Opened a connection to metastore, 
current connections: 44022
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-5"] Connected to metastore.
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.RetryingMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-5"] RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 
lifetime=0
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-6"] Opened a connection to metastore, 
current connections: 44023
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-6"] Connected to metastore.
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.RetryingMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-6"] RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 
lifetime=0
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" 
thread="load-dynamic-partitionsToAdd-3"] Opened a connection to metastore, 
current connections: 44024


<14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a 
connection to metastore, current connections: 43904
<14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a 
connection to metastore, current connections: 43903
<14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a 
connection to metastore, current connections: 43902
<14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a 
connection to metastore, current connections: 43901
<14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 
a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 
class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] 

[jira] [Created] (HIVE-24392) Send table id in get_partitions_by_names_req api

2020-11-16 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24392:
---

 Summary: Send table id in get_partitions_by_names_req api
 Key: HIVE-24392
 URL: https://issues.apache.org/jira/browse/HIVE-24392
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Table id is not part of the get_partitions_by_names_req API Thrift definition; 
this Jira adds it.
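
For illustration only, a hedged sketch of what a caller might look like once 
the field exists; the accessor names are assumptions, not the committed Thrift 
definition:
{noformat}
// Hypothetical usage sketch: pass the table id along with the request so
// HMS can validate the call against the table being queried.
GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest(dbName, tblName);
req.setNames(partitionNames);
req.setId(tableId);  // assumed setter for the new table-id field
{noformat}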





[jira] [Created] (HIVE-24292) Hive WebUI should support keystore type by config

2020-10-21 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24292:
---

 Summary: Hive WebUI should support keystore type by config
 Key: HIVE-24292
 URL: https://issues.apache.org/jira/browse/HIVE-24292
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


We need a property to pass in the keystore type for the Web UI too.
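
A minimal sketch, assuming a new Web UI keystore-type property (the key name 
is illustrative) with the JDK default as fallback:
{noformat}
import java.security.KeyStore;
import org.eclipse.jetty.util.ssl.SslContextFactory;

// Sketch: let configuration override the keystore type for the Web UI.
SslContextFactory.Server ssl = new SslContextFactory.Server();
String type = conf.get("hive.server2.webui.keystore.type",  // hypothetical key
    KeyStore.getDefaultType());
ssl.setKeyStoreType(type);
{noformat}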





[jira] [Created] (HIVE-24253) HMS needs to support keystore/truststores types besides JKS

2020-10-09 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24253:
---

 Summary: HMS needs to support keystore/truststores types besides 
JKS
 Key: HIVE-24253
 URL: https://issues.apache.org/jira/browse/HIVE-24253
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When HiveMetaStoreClient connects to HMS with SSL enabled, HMS should support 
the default keystore type specified for the JDK and not always use JKS. As 
HIVE-23958 did for Hive, HMS should support setting additional 
keystore/truststore types used by different applications, for example FIPS 
crypto algorithms.
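
A hedged sketch of the client-side idea using plain JSSE; configuredType, 
trustStorePath and trustStorePassword stand for values read from configuration:
{noformat}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import javax.net.ssl.TrustManagerFactory;

// Sketch: honor a configured store type, defaulting to the JDK's own default
// instead of a hardcoded "JKS" (FIPS setups may need e.g. BCFKS).
String type = configuredType != null ? configuredType : KeyStore.getDefaultType();
KeyStore trustStore = KeyStore.getInstance(type);
try (InputStream in = Files.newInputStream(Paths.get(trustStorePath))) {
  trustStore.load(in, trustStorePassword);
}
TrustManagerFactory tmf =
    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
tmf.init(trustStore);
{noformat}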





[jira] [Created] (HIVE-24236) Connection leak in TxnHandler

2020-10-06 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-24236:
---

 Summary: Connection leak in TxnHandler
 Key: HIVE-24236
 URL: https://issues.apache.org/jira/browse/HIVE-24236
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


We see failures in QE tests with "cannot allocate connection" errors. The 
exception stacks look like the following:
{noformat}
2020-09-29T18:44:26,563 INFO  [Heartbeater-0]: txn.TxnHandler 
(TxnHandler.java:checkRetryable(3733)) - Non-retryable error in 
heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, 
general error (SQLState=null, ErrorCode=0)
2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to 
select from transaction database org.apache.commons.dbcp.SQLNestedException: 
Cannot get a connection, general error
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy63.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247)
at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
at com.sun.proxy.$Proxy64.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1112)
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
... 29 more
)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2747)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
{noformat}

and
{noformat}
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1134)
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
... 53 more
)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.cleanupRecords(TxnHandler.java:3375)
at 
org.apache.hadoop.hive.metastore.AcidEventListener.onDropTable(AcidEventListener.java:65)
at 
org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$19.notify(MetaStoreListenerNotifier.java:103)
at 
{noformat}
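
The generic shape of the fix, sketched below: every path that borrows a pooled 
connection must return it, including exception paths (the close helpers are 
illustrative names):
{noformat}
// Sketch only: guarantee the pooled connection is returned on all paths.
Connection dbConn = null;
Statement stmt = null;
try {
  dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
  stmt = dbConn.createStatement();
  // ... run the heartbeat / cleanup SQL ...
  dbConn.commit();
} finally {
  closeStmt(stmt);      // hypothetical helpers that tolerate nulls
  closeDbConn(dbConn);
}
{noformat}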

[jira] [Created] (HIVE-22461) NPE Metastore Transformer

2019-11-05 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-22461:
---

 Summary: NPE Metastore Transformer
 Key: HIVE-22461
 URL: https://issues.apache.org/jira/browse/HIVE-22461
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.1.2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The stack looks like the following:
{noformat}
2019-10-08 18:09:12,198 INFO  
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
[pool-6-thread-328]: Starting translation for processor 
Hiveserver2#3.1.2000.7.0.2.0...@vc0732.halxg.cloudera.com on list 1
2019-10-08 18:09:12,198 ERROR 
org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-328]: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transform(MetastoreDefaultTransformer.java:99)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3391)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3352)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy28.get_table_req(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16633)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16617)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

2019-10-08 18:09:12,199 ERROR org.apache.thrift.server.TThreadPoolServer: 
[pool-6-thread-328]: Error occurred during processing of message.
java.lang.NullPointerException: null
at 
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transform(MetastoreDefaultTransformer.java:99)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3391)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3352)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) ~[?:?]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_141]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141]
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at com.sun.proxy.$Proxy28.get_table_req(Unknown Source) ~[?:?]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16633)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16617)
 ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59]
at 
{noformat}

[jira] [Created] (HIVE-21840) Hive Metastore Translation: Bucketed table Readonly capability

2019-06-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21840:
---

 Summary: Hive Metastore Translation: Bucketed table Readonly 
capability
 Key: HIVE-21840
 URL: https://issues.apache.org/jira/browse/HIVE-21840
 Project: Hive
  Issue Type: New Feature
Reporter: Yongzhi Chen
Assignee: Naveen Gangam


Impala needs a new capability indicating that only reads are supported for 
bucketed tables, no matter whether the table is managed or external, ACID or 
not. Also, in the current implementation, when HIVEBUCKET2 is not in the 
capabilities list, a bucketed external table is returned as an un-bucketed 
one; we need a way to know it was "downgraded" from a bucketed table. 





[jira] [Created] (HIVE-21839) Hive Metastore Translation: Hive needs to create a type of table if the client does not have the write capability for it

2019-06-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21839:
---

 Summary: Hive Metastore Translation: Hive needs to create a type of 
table if the client does not have the write capability for it
 Key: HIVE-21839
 URL: https://issues.apache.org/jira/browse/HIVE-21839
 Project: Hive
  Issue Type: New Feature
Reporter: Yongzhi Chen
Assignee: Naveen Gangam


Hive can either return an error message or provide an API call to check the 
permission even without a table instance.





[jira] [Created] (HIVE-21838) Hive Metastore Translation: Add API call to tell client why table has limited access

2019-06-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21838:
---

 Summary: Hive Metastore Translation: Add API call to tell client 
why table has limited access
 Key: HIVE-21838
 URL: https://issues.apache.org/jira/browse/HIVE-21838
 Project: Hive
  Issue Type: New Feature
Reporter: Yongzhi Chen
Assignee: Naveen Gangam


When a table access type is Read-only or None, we need a way to tell clients 
why. 





[jira] [Created] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2018-12-28 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21075:
---

 Summary: Metastore: Drop partition performance downgrade with 
Postgres DB
 Key: HIVE-21075
 URL: https://issues.apache.org/jira/browse/HIVE-21075
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Yongzhi Chen


To work around a performance issue caused by Oracle not supporting the LIMIT 
statement, HIVE-9447 makes every backend DB run select count(1) from SDS where 
SDS.CD_ID=? to check whether the specific CD_ID is referenced in the SDS table 
before dropping a partition. This select count(1) statement does not scale 
well in Postgres, and there is no index on the CD_ID column of the SDS table.
For an SDS table with 1.5 million rows, select count(1) averages 700ms without 
an index versus 10-20ms with one, while the statement used before HIVE-9447 
(SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes less than 10ms.
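
A hedged sketch of a dialect-aware choice (illustrative, not the committed 
patch): backends that support LIMIT keep the cheap existence probe, while 
Oracle keeps the COUNT-based workaround.
{noformat}
// Sketch: pick the CD_ID reference check per backend database.
String sql;
if (dbType == DatabaseProduct.ORACLE) {
  // Oracle has no LIMIT clause, so keep the HIVE-9447 workaround there
  sql = "select count(1) from \"SDS\" where \"SDS\".\"CD_ID\" = ?";
} else {
  // Postgres and others can stop at the first matching row
  sql = "select 1 from \"SDS\" where \"SDS\".\"CD_ID\" = ? limit 1";
}
{noformat}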





[jira] [Created] (HIVE-21019) Fix autoColumnStats tests to make auto stats gather possible.

2018-12-07 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-21019:
---

 Summary: Fix autoColumnStats tests to make auto stats gather 
possible.
 Key: HIVE-21019
 URL: https://issues.apache.org/jira/browse/HIVE-21019
 Project: Hive
  Issue Type: Bug
  Components: Test
Affects Versions: 4.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Before https://issues.apache.org/jira/browse/HIVE-20915, the sort dynamic 
partitions optimization was turned off for these tests, so their query plans 
could contain a group-by, which can trigger compute statistics. After that 
Jira, the optimization is enabled and the query plans contain a reduce sort 
operation instead of a group-by. In order to test the auto column stats gather 
feature, we should disable sort dynamic partitions for these tests. 





[jira] [Created] (HIVE-20915) Make dynamic sort partition optimization available to HoS and MR

2018-11-14 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-20915:
---

 Summary: Make dynamic sort partition optimization available to HoS 
and MR
 Key: HIVE-20915
 URL: https://issues.apache.org/jira/browse/HIVE-20915
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


HIVE-20703 put the dynamic sort partition optimization under a cost-based 
decision, but it also made the optimization available only to Tez. 
hive.optimize.sort.dynamic.partition has worked with other execution engines 
for a long time, so we should keep the optimization available to them. 





[jira] [Created] (HIVE-20741) Disable or fix randomly failing tests

2018-10-12 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-20741:
---

 Summary: Disable or fix randomly failing tests
 Key: HIVE-20741
 URL: https://issues.apache.org/jira/browse/HIVE-20741
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen


Two qfile tests for TestCliDriver fail; they may both relate to number 
precision issues:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_context_ngrams] 
(batchId=79)

Error:
Client Execution succeeded but contained differences (error code = 1) after 
executing udaf_context_ngrams.q 
{noformat}
43c43
< [{"ngram":["travelling"],"estfrequency":1.0}]
---
> [{"ngram":["travelling"],"estfrequency":3.0}]
{noformat}

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_corr] (batchId=84)

Client Execution succeeded but contained differences (error code = 1) after 
executing udaf_corr.q 
{noformat}
100c100
< 0.6633880657639324
---
> 0.6633880657639326
{noformat}
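
The udaf_corr diff in the last decimal place is consistent with floating-point 
non-associativity; a small Java illustration:
{noformat}
// Floating-point addition is not associative, so a different evaluation
// order (row order, parallel merge order) can move the last few digits.
double a = 0.1, b = 0.2, c = 0.3;
System.out.println((a + b) + c);  // 0.6000000000000001
System.out.println(a + (b + c));  // 0.6
{noformat}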







[jira] [Created] (HIVE-20695) HoS Query fails with hive.exec.parallel=true

2018-10-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-20695:
---

 Summary: HoS Query fails with hive.exec.parallel=true
 Key: HIVE-20695
 URL: https://issues.apache.org/jira/browse/HIVE-20695
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.2.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Hive queries fail when running a Hive on Spark job:
{noformat}
ERROR : Failed to execute spark task, with exception 
'java.lang.Exception(Failed to submit Spark work, please retry later)'
java.lang.Exception: Failed to submit Spark work, please retry later
at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.execute(RemoteHiveSparkClient.java:186)
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:71)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:107)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:99)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/tmp/hive/dbname/_spark_session_dir/e202c452-8793-4e4e-ad55-61e3d4965c69/somename.jar
 (inode 725730760): File does not exist. [Lease.  Holder: 
DFSClient_NONMAPREDUCE_-1981084042_486659, pending creates: 7]
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3755)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3556)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3412)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:688)
{noformat}





[jira] [Created] (HIVE-20016) Investigate random test failure

2018-06-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-20016:
---

 Summary: Investigate random test failure 
 Key: HIVE-20016
 URL: https://issues.apache.org/jira/browse/HIVE-20016
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation3 failed with:
{noformat}
java.lang.AssertionError: Concurrent Statement failed: 
org.apache.hive.service.cli.HiveSQLException: java.lang.AssertionError: 
Authorization plugins not initialized!
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.hive.jdbc.TestJdbcWithMiniHS2.finishTasks(TestJdbcWithMiniHS2.java:374)
at 
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation3(TestJdbcWithMiniHS2.java:304)
{noformat}





[jira] [Created] (HIVE-19897) Add more tests for parallel compilation

2018-06-14 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-19897:
---

 Summary: Add more tests for parallel compilation 
 Key: HIVE-19897
 URL: https://issues.apache.org/jira/browse/HIVE-19897
 Project: Hive
  Issue Type: Test
  Components: HiveServer2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The two parallel compilation tests in org.apache.hive.jdbc.TestJdbcWithMiniHS2 
do not really cover the case of queries compiled concurrently from different 
connections. Not sure whether this is on purpose or by mistake. Add more tests 
to cover the case. 
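
A hedged sketch of the missing coverage, run inside a test method declared to 
throw Exception (the JDBC URL and table name are placeholders):
{noformat}
// Sketch: compile queries concurrently from *different* connections.
ExecutorService pool = Executors.newFixedThreadPool(4);
List<Future<?>> results = new ArrayList<>();
for (int i = 0; i < 4; i++) {
  results.add(pool.submit(() -> {
    try (Connection conn = DriverManager.getConnection(jdbcUrl);  // placeholder
         Statement stmt = conn.createStatement()) {
      stmt.execute("select count(*) from test_table");            // placeholder
    }
    return null;
  }));
}
for (Future<?> f : results) {
  f.get();  // propagate any failure from a worker thread
}
pool.shutdown();
{noformat}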





[jira] [Created] (HIVE-19296) Add log to record MapredLocalTask Failure

2018-04-25 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-19296:
---

 Summary: Add log to record MapredLocalTask Failure
 Key: HIVE-19296
 URL: https://issues.apache.org/jira/browse/HIVE-19296
 Project: Hive
  Issue Type: Bug
  Components: Diagnosability
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


In some cases, when MapredLocalTask fails around child process start time, we 
cannot find the detailed error information anywhere (not in the stderr log, 
and there is no MapredLocal log file). All we get is:
{noformat}
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: Obtaining error information
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: 
Task failed!
Task ID:
  Stage-48

Logs:

*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: 
/var/log/hive/hadoop-cmf-hive1-HIVESERVER2-t.log.out
*** ERROR org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask: 
[HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1
{noformat}
It is really hard to debug. 





[jira] [Created] (HIVE-18671) lock not released after Hive on Spark query was cancelled

2018-02-09 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-18671:
---

 Summary: lock not released after Hive on Spark query was cancelled
 Key: HIVE-18671
 URL: https://issues.apache.org/jira/browse/HIVE-18671
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.2
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When cancelling a query running on Spark, the SparkJobMonitor cannot return, 
therefore the locks held by the query cannot be released. With debug enabled 
in the log, you will see many log entries like the following:
{noformat}

2018-02-09 08:27:09,613 INFO 
org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor: 
[HiveServer2-Background-Pool: Thread-80]: state = CANCELLED
2018-02-09 08:27:10,613 INFO 
org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor: 
[HiveServer2-Background-Pool: Thread-80]: state = CANCELLED

{noformat}
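
A hedged sketch of the fix direction: the monitor loop should treat CANCELLED 
as a terminal state so the caller can return and release its locks (state and 
accessor names are illustrative):
{noformat}
// Sketch only: break out of the polling loop on any terminal state.
while (true) {
  State state = sparkJobStatus.getState();  // hypothetical accessor
  if (state == State.SUCCEEDED || state == State.FAILED
      || state == State.CANCELLED) {
    break;  // return to the caller so held locks can be released
  }
  Thread.sleep(checkIntervalMs);
}
{noformat}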






[jira] [Created] (HIVE-17640) Comparison of date returns null if only time part is provided in string.

2017-09-28 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-17640:
---

 Summary: Comparison of date returns null if only time part is 
provided in string.
 Key: HIVE-17640
 URL: https://issues.apache.org/jira/browse/HIVE-17640
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 2.1.0


Reproduce:
{noformat}
select '2017-01-01 00:00:00' < current_date;
INFO  : OK
...
1 row selected (18.324 seconds)
...
 NULL
{noformat}





[jira] [Created] (HIVE-16875) Query against view with partitioned child on HoS fails with privilege exception.

2017-06-09 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-16875:
---

 Summary: Query against view with partitioned child on HoS fails 
with privilege exception.
 Key: HIVE-16875
 URL: https://issues.apache.org/jira/browse/HIVE-16875
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Query against view with child table that has partitions fails with privilege 
exception even with correct privileges.

Reproduce:
{noformat}
create table jsamp1 (a string) partitioned by (b int);
insert into table jsamp1 partition (b=1) values ("hello");
create view jview as select * from jsamp1;

create role viewtester;
grant all on table jview to role viewtester;
grant role viewtester to group testers;

Using MR, the select succeeds:
set hive.execution.engine=mr;
select count(*) from jview;

while using Spark:
set hive.execution.engine=spark;
select count(*) from jview;

it fails with:
Error: Error while compiling statement: FAILED: SemanticException No valid 
privileges
 User tester does not have privileges for QUERY
 The required privileges: 
Server=server1->Db=default->Table=j1part->action=select; 
(state=42000,code=4)

{noformat}





[jira] [Created] (HIVE-16660) Not able to add partition for views in hive when sentry is enabled

2017-05-12 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-16660:
---

 Summary: Not able to add partition for views in hive when sentry 
is enabled
 Key: HIVE-16660
 URL: https://issues.apache.org/jira/browse/HIVE-16660
 Project: Hive
  Issue Type: Bug
  Components: Parser
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Repro:
{noformat}
create table tesnit (a int) partitioned by (p int);
insert into table tesnit partition (p = 1) values (1);
insert into table tesnit partition (p = 2) values (1);
create view test_view partitioned on (p) as select * from tesnit where p =1;

alter view test_view add partition (p = 2);
Error: Error while compiling statement: FAILED: SemanticException [Error 
10056]: The query does not reference any valid partition. To run this query, 
set hive.mapred.mode=nonstrict (state=42000,code=10056)
{noformat}





[jira] [Created] (HIVE-16426) Query cancel: improve the way to handle files

2017-04-12 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-16426:
---

 Summary: Query cancel: improve the way to handle files
 Key: HIVE-16426
 URL: https://issues.apache.org/jira/browse/HIVE-16426
 Project: Hive
  Issue Type: Improvement
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


1. Add data structure support to make it easy to check query cancel status.
2. Handle query cancel more gracefully. Remove possible file leaks caused by 
query cancel, as shown in the following stack:
{noformat}
2017-04-11 09:57:30,727 WARN  org.apache.hadoop.hive.ql.exec.Utilities: 
[HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories.
java.io.InterruptedIOException: Call interrupted
at org.apache.hadoop.ipc.Client.call(Client.java:1496)
at org.apache.hadoop.ipc.Client.call(Client.java:1439)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy20.delete(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy21.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
at 
org.apache.hadoop.hive.ql.exec.Utilities.clearWork(Utilities.java:277)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:463)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1978)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1691)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1423)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1202)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
at 
org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at 
org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:316)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
3. Add checkpoints to related file operations to improve the response time for 
query cancelling. 
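
For point 3, a hedged sketch of what such a checkpoint could look like (the 
cancel-status accessor is an assumption):
{noformat}
// Sketch: consult the cancel flag around expensive file operations so the
// operation aborts promptly instead of running to completion.
private void checkCancelled(Context ctx) throws HiveException {
  if (ctx.isCancelled()) {  // hypothetical cancel-status query
    throw new HiveException("Query cancelled during scratch-file handling");
  }
}
{noformat}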





[jira] [Created] (HIVE-15997) Resource leaks when query is cancelled

2017-02-21 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15997:
---

 Summary: Resource leaks when query is cancelled 
 Key: HIVE-15997
 URL: https://issues.apache.org/jira/browse/HIVE-15997
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


There may be some resource leaks when a query is cancelled.
We see the following stacks in the log.
Possible file and folder leak:
{noformat}
2017-02-02 06:23:25,410 WARN  hive.ql.Context: [HiveServer2-Background-Pool: 
Thread-61]: Error Removing Scratch: java.io.IOException: Failed on local 
exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
host is: "ychencdh511t-1.vpc.cloudera.com/172.26.11.50"; destination host is: 
"ychencdh511t-1.vpc.cloudera.com":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy25.delete(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy26.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
at org.apache.hadoop.hive.ql.Context.removeScratchDir(Context.java:405)
at org.apache.hadoop.hive.ql.Context.clear(Context.java:541)
at org.apache.hadoop.hive.ql.Driver.releaseContext(Driver.java:2109)
at org.apache.hadoop.hive.ql.Driver.closeInProcess(Driver.java:2150)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1472)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
at 
org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at 
org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:714)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
at org.apache.hadoop.ipc.Client.call(Client.java:1448)
... 35 more

2017-02-02 12:26:52,706 INFO  
org.apache.hive.service.cli.operation.OperationManager: 
[HiveServer2-Background-Pool: Thread-23]: Operation is timed 
out,operation=OperationHandle [opType=EXECUTE_STATEMENT, 
{noformat}

[jira] [Created] (HIVE-15735) In some cases, view objects inside a view do not have parents

2017-01-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15735:
---

 Summary: In some cases, view objects inside a view do not have 
parents
 Key: HIVE-15735
 URL: https://issues.apache.org/jira/browse/HIVE-15735
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


This causes Sentry to throw a "No valid privileges" error:
Error: Error while compiling statement: FAILED: SemanticException No valid 
privileges.
To reproduce, with Sentry enabled:
create table t1( i int);
create view v1 as select * from t1;
create view v2 as select * from v1 union all select * from v1;
If the user does not have read permission on t1 and v1, the query
select * from v2;
will fail with:
Error: Error while compiling statement: FAILED: SemanticException No valid 
privileges
 User foo does not have privileges for QUERY
 The required privileges: 
Server=server1->Db=database2->Table=v1->action=select; (state=42000,code=4)
Sentry should not check v1's permission, since v1 has at least one parent (v2).






[jira] [Created] (HIVE-15615) Fix unit test failures caused by HIVE-13696

2017-01-13 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15615:
---

 Summary: Fix unit test failures caused by HIVE-13696
 Key: HIVE-15615
 URL: https://issues.apache.org/jira/browse/HIVE-15615
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The following unit tests failed with the same stack:
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters
{noformat}
2017-01-11T15:02:27,774 ERROR [main] ql.Driver: FAILED: NullPointerException 
null
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.cleanName(QueuePlacementRule.java:351)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$User.getQueueForApp(QueuePlacementRule.java:132)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
at 
org.apache.hadoop.hive.schshim.FairSchedulerShim.setJobQueueForUserInternal(FairSchedulerShim.java:96)
at 
org.apache.hadoop.hive.schshim.FairSchedulerShim.validateQueueConfiguration(FairSchedulerShim.java:82)
at 
org.apache.hadoop.hive.ql.session.YarnFairScheduling.validateYarnQueue(YarnFairScheduling.java:68)
at org.apache.hadoop.hive.ql.Driver.configureScheduling(Driver.java:671)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:543)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1223)
{noformat}





[jira] [Created] (HIVE-15572) Improve the response time for query canceling when it happens during acquiring locks

2017-01-10 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15572:
---

 Summary: Improve the response time for query canceling when it 
happens during acquiring locks
 Key: HIVE-15572
 URL: https://issues.apache.org/jira/browse/HIVE-15572
 Project: Hive
  Issue Type: Improvement
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When the query canceling command is sent while Hive is acquiring locks (from 
ZooKeeper), Hive will finish acquiring all the locks and then release them, as 
shown in the following log: it took 165s to finish acquiring the locks, then 
81s to release them. We can improve performance by not acquiring any more 
locks, and releasing already-held locks, as soon as the query canceling 
command is received. 

{noformat}
Background-Pool: Thread-224]: 
2017-01-03 10:50:35,413 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-224]: 
2017-01-03 10:51:00,671 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-218]: 
2017-01-03 10:51:00,672 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-218]: 
2017-01-03 10:51:00,672 ERROR org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-218]: FAILED: query select count(*) from 
manyparttbl has been cancelled
2017-01-03 10:51:00,673 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-218]: 
2017-01-03 10:51:40,755 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[HiveServer2-Background-Pool: Thread-215]: 
{noformat}
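
A hedged sketch of the proposed behavior (the lock-manager calls and the 
cancel flag are illustrative):
{noformat}
// Sketch: check for cancellation between individual lock acquisitions and
// release whatever was already acquired instead of finishing the full set.
List<HiveLock> acquired = new ArrayList<>();
for (HiveLockObj lockObj : lockObjects) {
  if (cancelRequested.get()) {         // hypothetical cancel flag
    lockMgr.releaseLocks(acquired);    // give back partial progress early
    throw new LockException("Query cancelled while acquiring locks");
  }
  acquired.add(lockMgr.lock(lockObj.getObj(), lockObj.getMode(), true));
}
{noformat}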






[jira] [Created] (HIVE-15437) avro tables join fails when - tbl join tbl_postfix

2016-12-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15437:
---

 Summary: avro tables join fails when - tbl join tbl_postfix
 Key: HIVE-15437
 URL: https://issues.apache.org/jira/browse/HIVE-15437
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The following queries return good results:
select * from table1 where col1=key1; 
select * from table1_1 where col1=key1; 
When joining them together, the query gets the following error:
{noformat}
Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found long, 
expecting union
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:43)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229)
 ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:141)
 ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
{noformat}

Both Avro tables are defined using an Avro schema, and the first table's name 
is a prefix of the second table's name. 





[jira] [Created] (HIVE-15391) Location validation for table should ignore the values for view.

2016-12-08 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15391:
---

 Summary: Location validation for table should ignore the values 
for view.
 Key: HIVE-15391
 URL: https://issues.apache.org/jira/browse/HIVE-15391
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 2.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor


When using schematool to do location validation, we get error messages for 
views, for example:
{noformat}
n DB with Name: viewa
NULL Location for TABLE with Name: viewa
In DB with Name: viewa
NULL Location for TABLE with Name: viewb
In DB with Name: viewa
{noformat}





[jira] [Created] (HIVE-15359) skip.footer.line.count doesn't work properly for certain situations

2016-12-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15359:
---

 Summary: skip.footer.line.count doesn't work properly for certain 
situations
 Key: HIVE-15359
 URL: https://issues.apache.org/jira/browse/HIVE-15359
 Project: Hive
  Issue Type: Bug
  Components: Reader
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


This issue's reproduction is very similar to HIVE-12718, but the data file is 
larger than 128MB. In this case, even when making sure only one mapper is 
used, the footer is still wrongly skipped. 






[jira] [Created] (HIVE-15320) Cross Realm hive query is failing with KERBEROS authentication error

2016-11-30 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15320:
---

 Summary: Cross Realm hive query is failing with KERBEROS 
authentication error
 Key: HIVE-15320
 URL: https://issues.apache.org/jira/browse/HIVE-15320
 Project: Hive
  Issue Type: Improvement
  Components: Security
Reporter: Yongzhi Chen


Executing a cross-realm query fails.
Authentication against the remote NN is tried with SIMPLE, not KERBEROS.
It looks like Hive does not obtain the needed ticket for the remote NN.

insert overwrite directory 'hdfs://differentrealmhost:8020/hive/test' select * 
from currentrealmtable where ...;
It will fail with
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client 
cannot authenticate via:[TOKEN, KERBEROS]

The hdfs distcp command works fine. 





[jira] [Created] (HIVE-15074) Schematool should provide a way to detect invalid entries in VERSION table

2016-10-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15074:
---

 Summary: Schematool should provide a way to detect invalid entries in 
VERSION table
 Key: HIVE-15074
 URL: https://issues.apache.org/jira/browse/HIVE-15074
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Yongzhi Chen
Priority: Minor


For some unknown reason, we see a customer's HMS cannot start because there 
are multiple entries in their HMS VERSION table. Schematool should provide a 
way to validate the HMS DB, and provide warning and fix options for this kind 
of issue. 





[jira] [Created] (HIVE-15073) Schematool should detect malformed URIs

2016-10-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15073:
---

 Summary: Schematool should detect malformed URIs
 Key: HIVE-15073
 URL: https://issues.apache.org/jira/browse/HIVE-15073
 Project: Hive
  Issue Type: Improvement
Reporter: Yongzhi Chen


For various causes (mostly unknown), HMS DB tables sometimes have invalid 
entries, for example a URI missing its scheme in the SDS table's LOCATION 
column or the DBS table's DB_LOCATION_URI column. These malformed URIs lead to 
hard-to-analyze errors in HIVE and SENTRY. Schematool needs to provide a 
command to detect these malformed URIs, give a warning, and provide an option 
to fix them.
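
A minimal sketch of the check itself, using only the JDK URI parser:
{noformat}
import java.net.URI;
import java.net.URISyntaxException;

// Sketch: a LOCATION / DB_LOCATION_URI value is suspect if it cannot be
// parsed or carries no scheme (e.g. the "hdfs://" prefix is missing).
static boolean isMalformedLocation(String location) {
  try {
    return location == null || new URI(location).getScheme() == null;
  } catch (URISyntaxException e) {
    return true;
  }
}
{noformat}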





[jira] [Created] (HIVE-15072) Schematool should recognize missing tables in metastore

2016-10-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-15072:
---

 Summary: Schematool should recognize missing tables in metastore
 Key: HIVE-15072
 URL: https://issues.apache.org/jira/browse/HIVE-15072
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Yongzhi Chen


When installing a new database fails half way (for some other reason), not all 
of the metastore tables are installed. This causes the HMS server to fail to 
start up due to missing tables. Re-running schematool succeeds, and the stdout 
log says: "Database already has tables. Skipping table creation".
However, restarting HMS gives the same error reporting missing tables.
Schematool should detect missing tables and provide options to go ahead and 
recreate the missing tables in the case of a new installation.





[jira] [Created] (HIVE-14743) ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs

2016-09-13 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14743:
---

 Summary: ArrayIndexOutOfBoundsException - HBASE-backed views' 
query with JOINs
 Key: HIVE-14743
 URL: https://issues.apache.org/jira/browse/HIVE-14743
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 1.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The stack:
{noformat}
2016-09-13T09:38:49,972 ERROR [186b4545-65b5-4bfc-bc8e-3e14e251bb12 main] 
exec.Task: Job Submission failed with exception 
'java.lang.ArrayIndexOutOfBoundsException(1)'
java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.createFilterScan(HiveHBaseTableInputFormat.java:224)
at 
org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplitsInternal(HiveHBaseTableInputFormat.java:492)
at 
org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:449)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:466)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:356)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:546)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)

{noformat}

Repro:
{noformat}
CREATE TABLE HBASE_TABLE_TEST_1(
  cvalue string,
  pk string,
  ccount int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'='cf:val,:key,cf2:count',
  'hbase.scan.cache'='500',
  'hbase.scan.cacheblocks'='false',
  'serialization.format'='1')
TBLPROPERTIES (
  'hbase.table.name'='hbase_table_test_1',
  'serialization.null.format'=''  );


  CREATE VIEW VIEW_HBASE_TABLE_TEST_1 AS SELECT 
hbase_table_test_1.cvalue,hbase_table_test_1.pk,hbase_table_test_1.ccount FROM 
hbase_table_test_1 WHERE hbase_table_test_1.ccount IS NOT NULL;

CREATE TABLE HBASE_TABLE_TEST_2(
  cvalue string,
  pk string,
  ccount int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'='cf:val,:key,cf2:count',
  'hbase.scan.cache'='500',
  'hbase.scan.cacheblocks'='false',
  'serialization.format'='1')
TBLPROPERTIES (
  'hbase.table.name'='hbase_table_test_2',
  'serialization.null.format'='');


CREATE VIEW VIEW_HBASE_TABLE_TEST_2 AS SELECT 
hbase_table_test_2.cvalue,hbase_table_test_2.pk,hbase_table_test_2.ccount FROM 
hbase_table_test_2 WHERE  hbase_table_test_2.pk >='3-h-0' AND 
hbase_table_test_2.pk <= '3-h-g' AND hbase_table_test_2.ccount IS NOT NULL;

set hive.auto.convert.join=false;

  SELECT  p.cvalue cvalue
FROM `VIEW_HBASE_TABLE_TEST_1` `p`
LEFT OUTER JOIN `VIEW_HBASE_TABLE_TEST_2` `A1`
ON `p`.cvalue = `A1`.cvalue
LEFT OUTER JOIN `VIEW_HBASE_TABLE_TEST_1` `A2`
ON `p`.cvalue = `A2`.cvalue;

{noformat}






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14715) Hive throws NumberFormatException with query with Null value

2016-09-07 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14715:
---

 Summary: Hive throws NumberFormatException with query with Null 
value
 Key: HIVE-14715
 URL: https://issues.apache.org/jira/browse/HIVE-14715
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen


The following reproduction throws a java.lang.NumberFormatException:
set hive.cbo.enable=false;
CREATE TABLE `paqtest`(
`c1` int,
`s1` string,
`s2` string,
`bn1` bigint)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

insert into paqtest values (58, '', 'ABC', 0);

SELECT
'Pricing mismatch' AS category,
c1,
NULL AS itemtype_used,
NULL AS acq_itemtype,
s2,
NULL AS currency_used_avg,
NULL AS acq_items_avg,
sum(bn1) AS cca
FROM paqtest
WHERE (s1 IS NULL OR length(s1) = 0)
GROUP BY 'Pricing mismatch', c1, NULL, NULL, s2, NULL, NULL;

The stack looks like the following:
java.lang.NumberFormatException: ABC
GroupByOperator.process(Object, int) line: 773  
ExecReducer.reduce(Object, Iterator, OutputCollector, Reporter) line: 236   
ReduceTask.runOldReducer(JobConf, TaskUmbilicalProtocol, TaskReporter, 
RawKeyValueIterator, RawComparator, Class, Class) line: 
444   
ReduceTask.run(JobConf, TaskUmbilicalProtocol) line: 392
LocalJobRunner$Job$ReduceTaskRunnable.run() line: 319   
Executors$RunnableAdapter.call() line: 471   

It works fine when hive.cbo.enable = true.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14596) Canceling hive query takes very long time

2016-08-22 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14596:
---

 Summary: Canceling hive query takes very long time
 Key: HIVE-14596
 URL: https://issues.apache.org/jira/browse/HIVE-14596
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen


When the Hue user clicks cancel, the Hive query does not stop immediately; it
can take a very long time. And in the yarn job history you will see exceptions
like the following:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/tmp/hive/hive/80a5cfdb-9f98-44d2-ae53-332c8dae62a3/hive_2016-08-20_07-06-12_819_8780093905859269639-3/-mr-1/.hive-staging_hive_2016-08-20_07-06-12_819_8780093905859269639-3/_task_tmp.-ext-10001/_tmp.00_0
 (inode 28224): File does not exist. Holder 
DFSClient_attempt_1471630445417_0034_m_00_0_-50732711_1 does not have any 
open files.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3624)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3427)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3283)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.abortWriters(FileSinkOperator.java:246)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1007)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:206)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14538) beeline throws exceptions with parsing hive config when using !sh statement

2016-08-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14538:
---

 Summary: beeline throws exceptions with parsing hive config when 
using !sh statement
 Key: HIVE-14538
 URL: https://issues.apache.org/jira/browse/HIVE-14538
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When beeline has a connection to a server, in some environments it has the
following problem:
{noformat}
0: jdbc:hive2://localhost> !verbose
verbose: on
0: jdbc:hive2://localhost> !sh id
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
0: jdbc:hive2://localhost> !sh echo hello
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
0: jdbc:hive2://localhost>
{noformat}

Also it breaks if there is no connection established:
{noformat}
beeline> !sh id
java.lang.NullPointerException
at org.apache.hive.beeline.BeeLine.createStatement(BeeLine.java:1897)
at org.apache.hive.beeline.Commands.getConfInternal(Commands.java:724)
at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:702)
at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14519) Multi insert query bug

2016-08-11 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14519:
---

 Summary: Multi insert query bug
 Key: HIVE-14519
 URL: https://issues.apache.org/jira/browse/HIVE-14519
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When running multi-insert queries, if one of the inserts returns no results,
another insert may not return the right result.
For example:
After the following query, there is no value in /tmp/emp/dir3/00_0
{noformat}
From (select * from src) a
insert overwrite directory '/tmp/emp/dir1/'
select key, value
insert overwrite directory '/tmp/emp/dir2/'
select 'header'
where 1=2
insert overwrite directory '/tmp/emp/dir3/'
select key, value 
where key = 100;
{noformat}

The where clause in the second insert should not affect the third insert.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14015) SMB MapJoin failed for Hive on Spark when kerberized

2016-06-14 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14015:
---

 Summary: SMB MapJoin failed for Hive on Spark when kerberized
 Key: HIVE-14015
 URL: https://issues.apache.org/jira/browse/HIVE-14015
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 2.0.0, 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


java.io.IOException: 
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
can be issued only with kerberos or web authentication

It can be reproduced as follows:
1) prepare sample data:
a=1
while [[ $a -lt 100 ]]; do echo $a ; let a=$a+1; done > data

2) prepare source hive table:
CREATE TABLE `s`(`c` string);
load data local inpath 'data' into table s;

3) prepare the bucketed table:
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
CREATE TABLE `t`(`c` string) CLUSTERED BY (c) SORTED BY (c) INTO 5 BUCKETS;
insert into t select * from s;

4) reproduce this issue:
SET hive.execution.engine=spark;
SET hive.auto.convert.sortmerge.join = true;
SET hive.auto.convert.sortmerge.join.bigtable.selection.policy = 
org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ;
SET hive.auto.convert.sortmerge.join.noconditionaltask = true;
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
select * from t join t t1 on t.c=t1.c;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13991) Union All on view fails with no valid permission on the underlying table

2016-06-09 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13991:
---

 Summary: Union All on view fails with no valid permission on the
underlying table
 Key: HIVE-13991
 URL: https://issues.apache.org/jira/browse/HIVE-13991
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When Sentry is enabled:
create view V as select * from T;
When the user has read permission on view V, but does not have read permission 
on table T,

select * from V union all select * from V
fails with the error below (a hypothetical Sentry setup for the scenario is
sketched after it):
{noformat}
0: jdbc:hive2://> select * from s07view union all select * from s07view 
limit 1;
Error: Error while compiling statement: FAILED: SemanticException No valid 
privileges
 Required privileges for this query: 
Server=server1->Db=default->Table=sample_07->action=select; 
(state=42000,code=4)
{noformat} 
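
A hypothetical Sentry setup matching this scenario (role and group names are
made up for illustration): the role can read the view but has no grant on the
underlying table, so the query should be authorized through V alone:
{noformat}
CREATE ROLE view_reader;
GRANT SELECT ON TABLE V TO ROLE view_reader;
GRANT ROLE view_reader TO GROUP analysts;
-- intentionally no grant on table T
{noformat}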



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13932) Hive SMB Map Join with small set of LIMIT failed with NPE

2016-06-02 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13932:
---

 Summary: Hive SMB Map Join with small set of LIMIT failed with NPE
 Key: HIVE-13932
 URL: https://issues.apache.org/jira/browse/HIVE-13932
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0, 1.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


1) prepare sample data:
a=1
while [[ $a -lt 100 ]]; do echo $a ; let a=$a+1; done > data

2) prepare source hive table:
CREATE TABLE `s`(`c` string);
load data local inpath 'data' into table s;

3) prepare the bucketed table:
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
CREATE TABLE `t`(`c` string) CLUSTERED BY (c) SORTED BY (c) INTO 5 BUCKETS;
insert into t select * from s;

4) reproduce this issue:
SET hive.auto.convert.sortmerge.join = true;
SET hive.auto.convert.sortmerge.join.bigtable.selection.policy = 
org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ;
SET hive.auto.convert.sortmerge.join.noconditionaltask = true;
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
select * from t join t t1 on t.c=t1.c limit 1;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13632) Hive failing on insert empty array into parquet table

2016-04-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13632:
---

 Summary: Hive failing on insert empty array into parquet table
 Key: HIVE-13632
 URL: https://issues.apache.org/jira/browse/HIVE-13632
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The insert fails with the following stack:
{noformat}
Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field
should be ommited completely instead
at 
parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:271)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$ListDataWriter.write(DataWritableWriter.java:271)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:199)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:215)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:88)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at 
parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
at 
parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:697)
{noformat}
Reproduce:
{noformat}
create table test_small (
key string,
arrayValues array<string>)
stored as parquet;
insert into table test_small select 'abcd', array() from src limit 1;
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13570) Some query with Union all fails when CBO is off

2016-04-20 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13570:
---

 Summary: Some query with Union all fails when CBO is off
 Key: HIVE-13570
 URL: https://issues.apache.org/jira/browse/HIVE-13570
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Some queries with union all throw IndexOutOfBoundsException
when:
set hive.cbo.enable=false;
set hive.ppd.remove.duplicatefilters=true;
The stack is as follows (a hypothetical query shape is sketched after it):
{noformat}
java.lang.IndexOutOfBoundsException: Index: 67, Size: 67 
at java.util.ArrayList.rangeCheck(ArrayList.java:635) 
at java.util.ArrayList.get(ArrayList.java:411) 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.genColLists(ColumnPrunerProcCtx.java:161)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.handleFilterUnionChildren(ColumnPrunerProcCtx.java:273)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:108)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:172)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:135)
 
at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:198) 
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10327)
 
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
 
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432) 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) 
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119) 
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167) 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055) 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) 
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) 
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) 
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) 
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) 
at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) 
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) 
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) 
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) 
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) 
{noformat}
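
The original failing query is not preserved here; as a hypothetical
illustration only, the shape involved is a filter on top of a UNION ALL
subquery, compiled with the two settings above, e.g.:
{noformat}
set hive.cbo.enable=false;
set hive.ppd.remove.duplicatefilters=true;

select key
from (
  select key, value from src
  union all
  select key, value from src
) u
where key > 100;
{noformat}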



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13200) Aggregation functions returning empty rows on partitioned columns

2016-03-03 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13200:
---

 Summary: Aggregation functions returning empty rows on partitioned 
columns
 Key: HIVE-13200
 URL: https://issues.apache.org/jira/browse/HIVE-13200
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 2.0.0, 1.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Running aggregation functions like MAX, MIN, DISTINCT against partitioned
columns will return empty rows if the table has the property
'skip.header.line.count'='1'.
Reproduce:
{noformat}
DROP TABLE IF EXISTS test;

CREATE TABLE test (a int) 
PARTITIONED BY (b int) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
TBLPROPERTIES('skip.header.line.count'='1');

INSERT OVERWRITE TABLE test PARTITION (b = 1) VALUES (1), (2), (3), (4);
INSERT OVERWRITE TABLE test PARTITION (b = 2) VALUES (1), (2), (3), (4);

SELECT * FROM test;

SELECT DISTINCT b FROM test;
SELECT MAX(b) FROM test;
SELECT DISTINCT a FROM test;
{noformat}

The output:
{noformat}
0: jdbc:hive2://localhost:1/default> SELECT * FROM test;
+---------+---------+--+
| test.a  | test.b  |
+---------+---------+--+
| 2       | 1       |
| 3       | 1       |
| 4       | 1       |
| 2       | 2       |
| 3       | 2       |
| 4       | 2       |
+---------+---------+--+
6 rows selected (0.631 seconds)

0: jdbc:hive2://localhost:1/default> SELECT DISTINCT b FROM test;
+----+--+
| b  |
+----+--+
+----+--+
No rows selected (47.229 seconds)

0: jdbc:hive2://localhost:1/default> SELECT MAX(b) FROM test;
+-------+--+
|  _c0  |
+-------+--+
| NULL  |
+-------+--+
1 row selected (49.508 seconds)

0: jdbc:hive2://localhost:1/default> SELECT DISTINCT a FROM test;
+----+--+
| a  |
+----+--+
| 2  |
| 3  |
| 4  |
+----+--+
3 rows selected (46.859 seconds)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13065) Hive throws NPE when writing map type data to a HBase backed table

2016-02-16 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13065:
---

 Summary: Hive throws NPE when writing map type data to a HBase 
backed table
 Key: HIVE-13065
 URL: https://issues.apache.org/jira/browse/HIVE-13065
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 1.1.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Hive throws an NPE when writing data to an HBase backed table under the conditions below:

# There is a map type column
# The map type column has NULL in its values

Below are the reproduce steps:

*1) Create a HBase backed Hive table*
{code:sql}
create table hbase_test (id bigint, data map<string,string>)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,cf:map_col")
tblproperties ("hbase.table.name" = "hive_test");
{code}

*2) insert data into above table*
{code:sql}
insert overwrite table hbase_test select 1 as id, map('abcd', null) as data 
from src limit 1;
{code}

The mapreduce job for insert query fails. Error messages are as below:
{noformat}
2016-02-15 02:26:33,225 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row (tag=0) 
{"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:731)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:286)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:666)
... 14 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:221)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:236)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:275)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:222)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
at 
org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
... 15 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13039) BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table

2016-02-10 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-13039:
---

 Summary: BETWEEN predicate is not functioning correctly with 
predicate pushdown on Parquet table
 Key: HIVE-13039
 URL: https://issues.apache.org/jira/browse/HIVE-13039
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 1.2.1, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


BETWEEN becomes exclusive on a parquet table when predicate pushdown is on (as
it is by default in newer Hive versions). To reproduce (in a cluster, not a
local setup; a possible mitigation is sketched after the repro):
CREATE TABLE parquet_tbl(
  key int,
  ldate string)
PARTITIONED BY (
  lyear string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

insert overwrite table parquet_tbl partition (lyear='2016') select
  1,
  '2016-02-03' from src limit 1;

set hive.optimize.ppd.storage = true;
set hive.optimize.ppd = true;
select * from parquet_tbl where ldate between '2016-02-03' and '2016-02-03';
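
A possible mitigation to confirm the diagnosis, assuming the storage-level
pushdown is the trigger: disable it and re-run the query, which should then
return the row:
{noformat}
set hive.optimize.ppd.storage = false;
select * from parquet_tbl where ldate between '2016-02-03' and '2016-02-03';
{noformat}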





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12795) Vectorized execution causes ClassCastException

2016-01-06 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12795:
---

 Summary: Vectorized execution causes ClassCastException
 Key: HIVE-12795
 URL: https://issues.apache.org/jira/browse/HIVE-12795
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


In some hive versions, when
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled = true;

some join queries fail with a ClassCastException. The stack:
{noformat}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector
 cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableStringObjectInspector
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:419)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1102)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:55)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
... 22 more

{noformat}
It cannot be reproduced in hive 2.0 and 1.3 because of a different code path.
Reproduce:
{noformat}

CREATE TABLE test1 (
  id string)
PARTITIONED BY (
  cr_year bigint,
  cr_month bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
TBLPROPERTIES (
  'serialization.null.format'='');

CREATE TABLE test2 (
  id string)
PARTITIONED BY (
  cr_year bigint,
  cr_month bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
TBLPROPERTIES (
  'serialization.null.format'='');

set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled = true;

SELECT cr.id1,
       cr.id2
FROM
  (SELECT t1.id id1,
          t2.id id2
   FROM
     (select * from test1) t1
   left outer join test2 t2
     on t1.id = t2.id) cr;

{noformat}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12784) Group by SemanticException: Invalid column reference

2016-01-05 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12784:
---

 Summary: Group by SemanticException: Invalid column reference
 Key: HIVE-12784
 URL: https://issues.apache.org/jira/browse/HIVE-12784
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Some queries that work fine in older versions now throw a SemanticException;
the stack trace:

{noformat}
FAILED: SemanticException [Error 10002]: Line 96:1 Invalid column reference 
'key2'
15/12/21 18:56:44 [main]: ERROR ql.Driver: FAILED: SemanticException [Error 
10002]: Line 96:1 Invalid column reference 'key2'
org.apache.hadoop.hive.ql.parse.SemanticException: Line 96:1 Invalid column 
reference 'key2'
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:4228)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5670)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9007)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9884)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9777)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10250)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10261)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10141)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
Reproduce:
{noformat}
create table tlb (key int, key1 int, key2 int);
create table src (key int, value string);
select key, key1, key2 from (select a.key, 0 as key1 , 0 as key2 from tlb a 
inner join src b on a.key = b.key) a group by key, key1, key2;
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly

2015-12-10 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12646:
---

 Summary: beeline and HIVE CLI do not parse ; in quote properly
 Key: HIVE-12646
 URL: https://issues.apache.org/jira/browse/HIVE-12646
 Project: Hive
  Issue Type: Bug
  Components: CLI, Clients
Reporter: Yongzhi Chen
Assignee: Vaibhav Gumashta


Beeline and the Cli have to escape ; inside quotes, while most other shells do
not. For example, in Beeline:
{noformat}
0: jdbc:hive2://localhost:1> select ';' from tlb1;
select ';' from tlb1;
15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115
15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403
Error: Error while compiling statement: FAILED: ParseException line 1:8 cannot
recognize input near '<EOF>' '<EOF>' '<EOF>'
{noformat}
while in mysql shell:
{noformat}
mysql> SELECT CONCAT(';', 'foo') FROM test limit 3;
+--------------------+
| CONCAT(';', 'foo') |
+--------------------+
| ;foo               |
| ;foo               |
| ;foo               |
+--------------------+
3 rows in set (0.00 sec)
{noformat}
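
As noted above, until this is addressed the semicolon has to be escaped even
inside the quotes in Beeline; a minimal example against the same table:
{noformat}
0: jdbc:hive2://localhost:1> select '\;' from tlb1;
{noformat}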



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12378) Exception on HBaseSerDe.serialize binary field

2015-11-10 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12378:
---

 Summary: Exception on HBaseSerDe.serialize binary field
 Key: HIVE-12378
 URL: https://issues.apache.org/jira/browse/HIVE-12378
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler, Serializers/Deserializers
Affects Versions: 1.1.0, 1.0.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


An issue was reproduced with binary-typed HBase columns in Hive:

It works fine as below:
CREATE TABLE test9 (key int, val string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:val#b"
);
insert into test9 values(1,"hello");

But when the string type is changed to binary as:
CREATE TABLE test2 (key int, val binary)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:val#b"
);
insert into table test2 values(1, 'hello');

The following exception is thrown:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {"tmp_values_col1":"1","tmp_values_col2":"hello"}
...
Caused by: java.lang.RuntimeException: Hive internal error.
at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:322)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
at 
org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
... 16 more

We should support the Hive binary type column for HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12189:
---

 Summary: The list in pushdownPreds of ppd.ExprWalkerInfo should 
not be allowed to grow very large
 Key: HIVE-12189
 URL: https://issues.apache.org/jira/browse/HIVE-12189
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.1.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Some queries are very slow to compile, for example the following query
{noformat}
select * from tt1 nf 
join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid) 
join tt3 a2 on(a2.col2 = a1.col2 and a2.col3 = nf.col3 and 
a2.hdp_databaseid = nf.hdp_databaseid) 
join tt4 a3 on  (a3.col4 = a2.col4 and a3.col3 = a2.col3) 
join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = 
a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid) 
join tt6 a5 on  (a5.col3 = a2.col3 and a5.col2 = a2.col2 and 
a5.hdp_databaseid = nf.hdp_databaseid) 
JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid = 
nf.hdp_databaseid) 
JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid = 
nf.hdp_databaseid)
where nf.hdp_databaseid = 102 limit 10;
{noformat}
takes around 120 seconds to compile in hive 1.1 when
hive.mapred.mode=strict;
hive.optimize.ppd=true;
and hive is not in test mode.
All the above tables have a single partition column, but all of them are
empty. If the tables are not empty, it is reported that the compile is so slow
that it looks like hive is hanging.
In hive 2.0 the compile is much faster; explain takes 6.6 seconds. But that is
still a lot of time. One of the problems that slows ppd down is that the list
in pushdownPreds can grow very large, which gives extractPushdownPreds bad
performance:
{noformat}
public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
    Operator<? extends OperatorDesc> op, List<ExprNodeDesc> preds)
{noformat}
While running the query above, at the following break point preds has a size
of 12051, and most entries of the list are identical:
GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
GenericUDFOPEqual(Column[hdp_databaseid], Const int 102)
The following code in extractPushdownPreds clones all the nodes in preds and
does the walk. Hive 2.0 is faster because HIVE-11652 makes startWalking much
faster, but we still clone thousands of nodes carrying the same expression.
Should we store so many identical predicates in the list, or is just one good
enough?

{noformat}
List<ExprNodeDesc> startNodes = new ArrayList<ExprNodeDesc>();
List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
for (ExprNodeDesc node : preds) {
  ExprNodeDesc clone = node.clone();
  clonedPreds.add(clone);
  exprContext.getNewToOldExprMap().put(clone, node);
}
startNodes.addAll(clonedPreds);

egw.startWalking(startNodes, null);

{noformat}

Should we change java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
methods
public void addFinalCandidate(String alias, ExprNodeDesc expr)
and
public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns)

to only add an expr which is not already in the pushdown list for the alias?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12058) Change hive script to record errors when calling hbase fails

2015-10-07 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12058:
---

 Summary: Change hive script to record errors when calling hbase 
fails
 Key: HIVE-12058
 URL: https://issues.apache.org/jira/browse/HIVE-12058
 Project: Hive
  Issue Type: Bug
  Components: Hive, HiveServer2
Affects Versions: 1.1.0, 0.14.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


By default hive will try to find out which jars need to be added to the
classpath in order to run MR jobs against an HBase cluster; however, if hbase
can't be found or if hbase mapredcp fails, the hive script will fail silently
and leave some of the needed jars out of the classpath. That makes it very
difficult to analyze the real problem.
The hive script should record the error, not just silently redirect the two
hbase failures:
HBASE_BIN=${HBASE_BIN:-"$(which hbase 2>/dev/null)"}
$HBASE_BIN mapredcp 2>/dev/null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12008) Make last two tests added by HIVE-11384 pass when hive.in.test is false

2015-10-01 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-12008:
---

 Summary: Make last two tests added by HIVE-11384 pass when 
hive.in.test is false
 Key: HIVE-12008
 URL: https://issues.apache.org/jira/browse/HIVE-12008
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The last two qfile unit tests fail when hive.in.test is false. It may relate
to how we handle the prune list for select: when a select includes every
column in a table, the prune list for that select is empty, which may cause
issues when calculating its parent's prune list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11982) Some test case for union all with recent changes

2015-09-28 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11982:
---

 Summary: Some test case for union all with recent changes
 Key: HIVE-11982
 URL: https://issues.apache.org/jira/browse/HIVE-11982
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The tests throw java.lang.IndexOutOfBoundsException again.
This was supposed to be fixed by HIVE-11271.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11801) In HMS HA env, "show databases" fails when "current" HMS is stopped.

2015-09-11 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11801:
---

 Summary: In HMS HA env, "show databases" fails when "current" HMS
is stopped.
 Key: HIVE-11801
 URL: https://issues.apache.org/jira/browse/HIVE-11801
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.1.0, 1.2.0, 0.14.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Reproduce steps:
# Enable HMS HA on a cluster
# Use beeline to connect to HS2 and execute command {{show databases}}. Don't
quit beeline after the command has finished
# Stop the first HMS in configuration {{hive.metastore.uri}}
# Execute {{show databases}} in beeline again. You will get the error below:
{noformat}
MetaException(message:Got exception: 
org.apache.thrift.transport.TTransportException java.net.SocketException: 
Broken pipe)
{noformat}

The error message in HS2 is as below:
{noformat}
2015-09-08 12:06:53,236 ERROR hive.log: Got exception: 
org.apache.thrift.transport.TTransportException java.net.SocketException: 
Broken pipe
org.apache.thrift.transport.TTransportException: java.net.SocketException: 
Broken pipe
at 
org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
at 
org.apache.thrift.transport.TSaslTransport.flush(TSaslTransport.java:501)
at 
org.apache.thrift.transport.TSaslClientTransport.flush(TSaslClientTransport.java:37)
at 
org.apache.hadoop.hive.thrift.TFilterTransport.flush(TFilterTransport.java:77)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_databases(ThriftHiveMetastore.java:692)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:684)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:964)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1909)
at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
at 
org.apache.hive.service.cli.operation.GetSchemasOperation.runInternal(GetSchemasOperation.java:59)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:462)
at 
org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:296)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:534)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1373)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1358)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at 
org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
... 31 more
2015-09-08 12:06:53,238 ERROR hive.log: Converting exception to MetaException
2015-09-08 12:06:53,238 WARN 
org.apache.hive.service.cli.thrift.ThriftCLIService: Error getting schemas:
org.apache.hive.service.cli.HiveSQLException: MetaException(message:Got 
exception: org.apache.thrift.transport.TTransportException
{noformat}

[jira] [Created] (HIVE-11745) Alter table Exchange partition with multiple partition_spec is not working

2015-09-04 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11745:
---

 Summary: Alter table Exchange partition with multiple 
partition_spec is not working
 Key: HIVE-11745
 URL: https://issues.apache.org/jira/browse/HIVE-11745
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


A single partition works, but multiple partitions do not.
Reproduce steps:
{noformat}
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t3;
DROP TABLE IF EXISTS t4;

CREATE TABLE t1 (a int) PARTITIONED BY (d1 int);
CREATE TABLE t2 (a int) PARTITIONED BY (d1 int);
CREATE TABLE t3 (a int) PARTITIONED BY (d1 int, d2 int);
CREATE TABLE t4 (a int) PARTITIONED BY (d1 int, d2 int);

INSERT OVERWRITE TABLE t1 PARTITION (d1 = 1) SELECT salary FROM jsmall LIMIT 10;
INSERT OVERWRITE TABLE t3 PARTITION (d1 = 1, d2 = 1) SELECT salary FROM jsmall 
LIMIT 10;

SELECT * FROM t1;

SELECT * FROM t3;

ALTER TABLE t2 EXCHANGE PARTITION (d1 = 1) WITH TABLE t1;
SELECT * FROM t1;
SELECT * FROM t2;

ALTER TABLE t4 EXCHANGE PARTITION (d1 = 1, d2 = 1) WITH TABLE t3;
SELECT * FROM t3;
SELECT * FROM t4;
{noformat}
The output:
{noformat}
0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t3;
+-------+--------+--------+--+
| t3.a  | t3.d1  | t3.d2  |
+-------+--------+--------+--+
+-------+--------+--------+--+
No rows selected (0.227 seconds)
0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t4;
+-------+--------+--------+--+
| t4.a  | t4.d1  | t4.d2  |
+-------+--------+--------+--+
+-------+--------+--------+--+
No rows selected (0.266 seconds)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-19 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11604:
---

 Summary: HIVE return wrong results in some queries with PTF 
function
 Key: HIVE-11604
 URL: https://issues.apache.org/jira/browse/HIVE-11604
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.1.0, 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


The following query returns an empty result, which is not right:
{noformat}
select ddd.id, ddd.fkey, aaa.name
from (
select id, fkey, 
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}

After removing row_number() over (partition by id, fkey) as rnum from the
query, the right result returns.

Reproduce:
{noformat}
create table tlb1 (id int, fkey int, val string);
create table tlb2 (fid int, name string);
insert into table tlb1 values(100,1,'abc');
insert into table tlb1 values(200,1,'efg');
insert into table tlb2 values(1, 'key1');

select ddd.id, ddd.fkey, aaa.name
from (
select id, fkey, 
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;

INFO  : Ended Job = job_local1070163923_0017
+---------+-----------+-----------+--+
| ddd.id  | ddd.fkey  | aaa.name  |
+---------+-----------+-----------+--+
+---------+-----------+-----------+--+
No rows selected (14.248 seconds)

0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
0: jdbc:hive2://localhost:1> from (
0: jdbc:hive2://localhost:1> select id, fkey
0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
0: jdbc:hive2://localhost:1>  ) ddd
0: jdbc:hive2://localhost:1> inner join tlb2 aaa on aaa.fid = ddd.fkey;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
...
INFO  : Ended Job = job_local672340505_0019
+---------+-----------+-----------+--+
| ddd.id  | ddd.fkey  | aaa.name  |
+---------+-----------+-----------+--+
| 100     | 1         | key1      |
| 200     | 1         | key1      |
+---------+-----------+-----------+--+
2 rows selected (14.383 seconds)

{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11502) Map side aggregation is extremely slow

2015-08-08 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11502:
---

 Summary: Map side aggregation is extremely slow
 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


For a query like the following:
{noformat}
create table tbl2 as 
select col1, max(col2) as col2 
from tbl1 group by col1;
{noformat}
If the group-by column has many different values (for example 40), the
map-side aggregation is very slow. I ran the query for more than 3 hours,
after which I had to kill it.
The same query finishes in 7 seconds if I turn off map-side aggregation with:
{noformat}
set hive.map.aggr = false;
{noformat}
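
Short of disabling it entirely, the map-side hash aggregation can also be
tuned; a hedged sketch using standard knobs (the values are illustrative only):
{noformat}
-- flush the in-memory hash table sooner (default 0.5)
set hive.map.aggr.hash.percentmemory = 0.25;
-- give up on hash aggregation sooner when the keys barely repeat (default 0.5)
set hive.map.aggr.hash.min.reduction = 0.3;
{noformat}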




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11380:
---

 Summary: NPE when FileSinkOperator is not initialized
 Key: HIVE-11380
 URL: https://issues.apache.org/jira/browse/HIVE-11380
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When FileSinkOperator's initializeOp is not called (which may happen when an
operator before FileSinkOperator failed to initialize), FileSinkOperator
throws an NPE at close time. The stacktrace:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
... 18 more
{noformat}
This exception is misleading and often distracts users from finding the real 
issue. 
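
A defensive guard of the following shape would avoid the secondary NPE; this is 
a minimal sketch with assumed names, not the actual patch:
{code}
// Hypothetical sketch: an operator that records whether initialization
// succeeded and makes close a no-op otherwise, so a close-time NPE cannot
// mask the original initialization failure.
public class SinkLikeOperator {
    private Object[] bucketWriters;        // created only in initialize()
    private boolean initialized = false;

    public void initialize(int numBuckets) {
        bucketWriters = new Object[numBuckets];
        initialized = true;
    }

    public void close() {
        if (!initialized || bucketWriters == null) {
            return;                        // initialize() never ran; nothing to flush
        }
        for (Object w : bucketWriters) {
            // flush and close each bucket writer here
        }
    }
}
{code}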



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11384) Add Test case which cover both HIVE-11271 and HIVE-11333

2015-07-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11384:
---

 Summary: Add Test case which cover both HIVE-11271 and HIVE-11333
 Key: HIVE-11384
 URL: https://issues.apache.org/jira/browse/HIVE-11384
 Project: Hive
  Issue Type: Test
  Components: Logical Optimizer, Parser
Affects Versions: 1.2.0, 1.0.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


Add some test queries that need both HIVE-11271 and HIVE-11333 fixed in order 
to pass. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11319) CTAS with location qualifier overwrites directories

2015-07-20 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11319:
---

 Summary: CTAS with location qualifier overwrites directories
 Key: HIVE-11319
 URL: https://issues.apache.org/jira/browse/HIVE-11319
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 1.2.0, 1.0.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


CTAS with a location clause acts as an insert overwrite. This can cause problems 
when there are subdirectories within the target directory, and it has caused some 
users to accidentally wipe out directories with very important data. We should 
make CTAS with a location clause fail on a non-empty directory. 

Reproduce:
create table ctas1  
location '/Users/ychen/tmp' 
as 
select * from jsmall limit 10;

create table ctas2  
location '/Users/ychen/tmp' 
as 
select * from jsmall limit 5;

Both creates will succeed, but the data in table ctas1 will be accidentally 
replaced by that of ctas2. 
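
A pre-flight check along these lines would prevent the overwrite; a minimal 
sketch with assumed placement (the real fix may live elsewhere in the compiler):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical guard: refuse a CTAS LOCATION that already contains files
// instead of silently overwriting it.
public class CtasLocationCheck {
    static void checkTargetEmpty(FileSystem fs, Path location) throws IOException {
        if (fs.exists(location) && fs.listStatus(location).length > 0) {
            throw new IllegalArgumentException(
                "CTAS target " + location + " is not empty; refusing to overwrite");
        }
    }
}
{code}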



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function

2015-07-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11271:
---

 Summary: java.lang.IndexOutOfBoundsException when union all with 
if function
 Key: HIVE-11271
 URL: https://issues.apache.org/jira/browse/HIVE-11271
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0, 1.0.0, 0.14.0
Reporter: Yongzhi Chen


Some queries with Union all as subquery fail in MapReduce task with stacktrace:
{noformat}
15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing 
operator UNION[104]
15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor 
complete.
15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: 
job_local826862759_0005
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 10 more
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140)
... 21 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
... 21 more

{noformat}

Reproduce:

{noformat}
create table if not exists union_all_bug_test_1 
( 
f1 int,
f2 int
); 

create table if not exists union_all_bug_test_2 
( 
f1 int 
); 

SELECT f1 
FROM ( 

SELECT 
f1 
, if('helloworld' like '%hello%' ,f1,f2) as filter 
FROM union_all_bug_test_1 

union all 

select 
f1 
, 0 as filter 
from union_all_bug_test_2 
) A 
WHERE (filter = 1); 

{noformat}
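
One way to read the stack trace, as a toy model only (hypothetical and 
simplified, not Hive's actual code): UnionOperator initializes per-parent state 
from a list whose size no longer matches the declared number of parents, for 
instance after an optimization rewrote one branch of the union:
{code}
import java.util.ArrayList;
import java.util.List;

// Toy model of the failure mode: per-parent state exists for one branch,
// but the operator still iterates over two declared parents, so the second
// lookup throws IndexOutOfBoundsException: Index: 1, Size: 1.
public class UnionInitSketch {
    public static void main(String[] args) {
        List<String> parentState = new ArrayList<>();
        parentState.add("inspector-for-branch-0");   // only one entry was registered
        int declaredParents = 2;                     // the plan still declares two

        for (int p = 0; p < declaredParents; p++) {
            System.out.println(parentState.get(p));  // p = 1 fails
        }
    }
}
{code}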




--
This message was 

[jira] [Created] (HIVE-11208) Can not drop a default partition __HIVE_DEFAULT_PARTITION__ which is not a string type

2015-07-08 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11208:
---

 Summary: Can not drop a default partition 
__HIVE_DEFAULT_PARTITION__ which is not a string type
 Key: HIVE-11208
 URL: https://issues.apache.org/jira/browse/HIVE-11208
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 1.1.0
Reporter: Yongzhi Chen


When the partition column is not a string type (for example, an int type), 
dropping the default partition __HIVE_DEFAULT_PARTITION__ fails with:
SemanticException Unexpected unknown partitions
Reproduce:
{noformat}
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1;

DROP TABLE IF EXISTS test;
CREATE TABLE test (col1 string) PARTITIONED BY (p1 int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;
INSERT OVERWRITE TABLE test PARTITION (p1) SELECT code, IF(salary  600, 100, 
null) as p1 FROM jsmall;

hive> SHOW PARTITIONS test;
OK
p1=100
p1=__HIVE_DEFAULT_PARTITION__
Time taken: 0.124 seconds, Fetched: 2 row(s)

hive> ALTER TABLE test DROP partition (p1 = '__HIVE_DEFAULT_PARTITION__');
FAILED: SemanticException Unexpected unknown partitions for (p1 = null)

{noformat}
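
A toy model of why the drop fails (hypothetical and simplified): the 
drop-partition spec value is coerced to the column's type before matching, and 
the default-partition marker string cannot become an int, which is consistent 
with the "(p1 = null)" in the error:
{code}
// Simplified illustration: coercing the drop-partition spec to the int
// column type turns the marker string into null, so nothing matches.
public class DefaultPartitionDropSketch {
    static Integer coerceToInt(String spec) {
        try {
            return Integer.valueOf(spec);
        } catch (NumberFormatException e) {
            return null;   // '__HIVE_DEFAULT_PARTITION__' ends up as null
        }
    }

    public static void main(String[] args) {
        System.out.println(coerceToInt("100"));                          // matches p1=100
        System.out.println(coerceToInt("__HIVE_DEFAULT_PARTITION__"));   // null -> (p1 = null)
    }
}
{code}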




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11150) Remove wrong warning message related to chgrp

2015-06-30 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11150:
---

 Summary: Remove wrong warning message related to chgrp
 Key: HIVE-11150
 URL: https://issues.apache.org/jira/browse/HIVE-11150
 Project: Hive
  Issue Type: Bug
  Components: Shims
Affects Versions: 1.2.0, 1.0.0, 0.14.0, 0.13.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor


When using a file system other than HDFS, users see a warning message regarding 
hdfs chgrp. The warning is very annoying and confusing; we'd better remove it. 
The warning example:
{noformat}
hive> insert overwrite table s3_test select total_emp, salary, description from 
sample_07 limit 5;
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number

{noformat}
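
The fix could take roughly this shape; a sketch with assumed names, not the 
actual patch: skip the group change when there is no group to set, and use the 
FileSystem API rather than the shell so unsupported schemes fail quietly:
{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical guard: only attempt the group change when a group is known,
// avoiding the "-chgrp: '' does not match expected pattern" warning on
// file systems like S3 that report an empty group.
public class ChgrpGuardSketch {
    static void maybeSetGroup(FileSystem fs, Path path, String group) throws Exception {
        if (group == null || group.isEmpty()) {
            return;                     // nothing sensible to set
        }
        fs.setOwner(path, null, group); // null user means "leave the owner unchanged"
    }
}
{code}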



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended

2015-06-25 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11112:
---

 Summary: ISO-8859-1 text output has fragments of previous longer 
rows appended
 Key: HIVE-11112
 URL: https://issues.apache.org/jira/browse/HIVE-11112
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results 
for a string column are incorrect for any row that was preceded by a row 
containing a longer string.

Example steps to reproduce:

1. Create a table using ISO 8859-1 encoding:

CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');

2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder 
in HDFS. I'll attach an example file containing the following text: 

Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk

3. Execute SELECT * FROM person_lat1

Result - The following output appears:

+--------------------+
| person_lat1.name   |
+--------------------+
| Müller,Thomas      |
| Jørgensen,Jørgen   |
| Peña,Andrésørgen   |
| Nåm,Fækdrésørgen   |
+--------------------+
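
A toy reproduction of the symptom (hypothetical and simplified; the real code 
is in the SerDe's encoding path): decoding each row from a reused buffer but 
taking the longest length seen so far instead of the current row's length 
yields exactly this output:
{code}
import java.nio.charset.Charset;

// Simplified illustration: a shorter row copied into a reused buffer is
// converted with a stale length, so the tail of the previous, longer row
// leaks into the result ("Peña,Andrés" becomes "Peña,Andrésørgen").
public class EncodingFragmentSketch {
    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        byte[] buffer = new byte[64];                    // reused across rows
        String[] rows = {"Jørgensen,Jørgen", "Peña,Andrés"};
        int staleLen = 0;
        for (String row : rows) {
            byte[] enc = row.getBytes(latin1);
            System.arraycopy(enc, 0, buffer, 0, enc.length);
            staleLen = Math.max(staleLen, enc.length);
            // Bug: uses the longest length seen so far, not enc.length.
            System.out.println(new String(buffer, 0, staleLen, latin1));
        }
    }
}
{code}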



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.

2015-06-19 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11062:
---

 Summary: Remove Exception stacktrace from Log.info when ACL is not 
supported.
 Key: HIVE-11062
 URL: https://issues.apache.org/jira/browse/HIVE-11062
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor


When logging is set to INFO, extended ACLs are enabled, and the file system does 
not support ACLs, the log file fills with exception stack traces. Although they 
are benign, they can easily frustrate users. We should log the exception at 
DEBUG level instead. 
Currently, the exception in the log looks like:
{noformat}
2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: 
Skipping ACL inheritance: File system for path s3a://yibing/hive does not 
support ACLs but dfs.namenode.acls.enabled is set to true: 
java.lang.UnsupportedOperationException: S3AFileSystem doesn't support 
getAclStatus
java.lang.UnsupportedOperationException: S3AFileSystem doesn't support 
getAclStatus
at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
at 
org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729)
at 
org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694)
at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at 
org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at 
org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
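
A sketch of the intended change (slf4j is used here as a stand-in for whatever 
logging facade the class actually uses): keep a one-line INFO message and 
demote the stack trace to DEBUG:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical logging shape: INFO carries the one-line summary, DEBUG
// carries the full stack trace for those who actually need it.
public class AclLoggingSketch {
    private static final Logger LOG = LoggerFactory.getLogger(AclLoggingSketch.class);

    static void logSkippedAclInheritance(String path, Exception e) {
        LOG.info("Skipping ACL inheritance: file system for path {} does not support ACLs: {}",
                path, e.getMessage());
        LOG.debug("Full stack trace:", e);
    }
}
{code}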




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11042) Need fix Utilities.replaceTaskId method

2015-06-18 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-11042:
---

 Summary: Need fix Utilities.replaceTaskId method
 Key: HIVE-11042
 URL: https://issues.apache.org/jira/browse/HIVE-11042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


While looking at another bug, I found that the Utilities.replaceTaskId(String, int) 
method is not right.
For example 
Utilities.replaceTaskId("(ds%3D1)01", 5) 
returns "5".

It should return "(ds%3D1)05".
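
A sketch of the intended behavior, inferred from the example above (assumed 
semantics, not the actual patch): replace only the trailing digit run, 
preserving the prefix and the zero padding:
{code}
// Replace the trailing task-id digits of a file name with bucketNum,
// keeping the prefix and the original padding width.
public class ReplaceTaskIdSketch {
    static String replaceTaskId(String fileName, int bucketNum) {
        int start = fileName.length();
        while (start > 0 && Character.isDigit(fileName.charAt(start - 1))) {
            start--;                               // walk back over the digit run
        }
        String prefix = fileName.substring(0, start);
        int width = fileName.length() - start;     // preserve the padding width
        return prefix + String.format("%0" + width + "d", bucketNum);
    }

    public static void main(String[] args) {
        System.out.println(replaceTaskId("(ds%3D1)01", 5));   // prints (ds%3D1)05
    }
}
{code}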



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10881) The bucket number is not respected in insert overwrite.

2015-06-01 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-10881:
---

 Summary: The bucket number is not respected in insert overwrite.
 Key: HIVE-10881
 URL: https://issues.apache.org/jira/browse/HIVE-10881
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0, 1.3.0
Reporter: Yongzhi Chen
Priority: Critical


When hive.enforce.bucketing is true, the bucket number defined in the table is 
no longer respected in current master and 1.2. This is a regression.
Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestinput( 
data string 
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput1( 
data string 
)CLUSTERED BY(data) 
INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput2( 
data string 
)CLUSTERED BY(data) 
INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then I inserted the following data into the buckettestinput table
firstinsert1 
firstinsert2 
firstinsert3 
firstinsert4 
firstinsert5 
firstinsert6 
firstinsert7 
firstinsert8 
secondinsert1 
secondinsert2 
secondinsert3 
secondinsert4 
secondinsert5 
secondinsert6 
secondinsert7 
secondinsert8
set hive.enforce.bucketing = true; 
set hive.enforce.sorting=true;
insert overwrite table buckettestoutput1 
select * from buckettestinput where data like 'first%';
set hive.auto.convert.sortmerge.join=true; 
set hive.optimize.bucketmapjoin = true; 
set hive.optimize.bucketmapjoin.sortedmerge = true; 
select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);

Error: Error while compiling statement: FAILED: SemanticException [Error 
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of 
buckets for table buckettestoutput1 is 2, whereas the number of files is 1 
(state=42000,code=10141)
{noformat}

The debug information related to the insert overwrite:
{noformat}
0: jdbc:hive2://localhost:1> insert overwrite table buckettestoutput1 
select * from buckettestinput where data like 'first%';
INFO  : Number of reduce tasks determined at compile time: 2
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=number
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=number
INFO  : In order to set a constant number of reducers:
INFO  :   set mapred.reduce.tasks=number
INFO  : Job running in-process (local Hadoop)
INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
INFO  : Ended Job = job_local107155352_0001
INFO  : Loading data to table default.buckettestoutput1 from 
file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-1
INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, 
totalSize=52, rawDataSize=48]
No rows affected (1.692 seconds)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10866) Throw error when client try to insert into bucketed table

2015-05-29 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-10866:
---

 Summary: Throw error when client try to insert into bucketed table
 Key: HIVE-10866
 URL: https://issues.apache.org/jira/browse/HIVE-10866
 Project: Hive
  Issue Type: Improvement
Reporter: Yongzhi Chen


Currently, Hive does not support appends (insert into) to bucketed tables; see 
open jira HIVE-3608. When inserting into such a table, the data will be 
corrupted and no longer fit for bucketmapjoin. 
We need to find a way to prevent clients from inserting into such tables.
Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestoutput1( 
data string 
)CLUSTERED BY(data) 
INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput2( 
data string 
)CLUSTERED BY(data) 
INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

set hive.enforce.bucketing = true; 
set hive.enforce.sorting=true;
insert into table buckettestoutput1 select code from sample_07 where total_emp 
 134354250 limit 10;
After this first insert, I did:
set hive.auto.convert.sortmerge.join=true; 
set hive.optimize.bucketmapjoin = true; 
set hive.optimize.bucketmapjoin.sortedmerge = true; 
set hive.auto.convert.sortmerge.join.noconditionaltask=true;

0: jdbc:hive2://localhost:1> select * from buckettestoutput1 a join 
buckettestoutput2 b on (a.data=b.data);
+---+---+
| data  | data  |
+---+---+
+---+---+
So select works fine. 
Second insert:
0: jdbc:hive2://localhost:1> insert into table buckettestoutput1 select 
code from sample_07 where total_emp = 134354250 limit 10;
No rows affected (61.235 seconds)
Then select:
0: jdbc:hive2://localhost:1> select * from buckettestoutput1 a join 
buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error 
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of 
buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
(state=42000,code=10141)
0: jdbc:hive2://localhost:1>
{noformat}
Insert into an empty table or partition is fine, but after insert into a 
non-empty one (the second insert in the reproduce), the bucketmapjoin throws 
an error. We should not let the second insert succeed. 
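
A compile-time guard of roughly this shape would do it; a sketch only, not the 
actual patch (names assumed):
{code}
// Hypothetical semantic-analysis check: reject INSERT INTO on a non-empty
// bucketed table, since appends break the file-per-bucket layout that
// bucketmapjoin relies on.
public class BucketedInsertGuard {
    static void checkInsert(boolean targetIsBucketed, boolean isOverwrite, int existingFiles) {
        if (targetIsBucketed && !isOverwrite && existingFiles > 0) {
            throw new IllegalStateException(
                "INSERT INTO a non-empty bucketed table is not supported; use INSERT OVERWRITE");
        }
    }
}
{code}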



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10771) separatorChar has no effect in CREATE TABLE AS SELECT statement

2015-05-20 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-10771:
---

 Summary: separatorChar has no effect in CREATE TABLE AS SELECT 
statement
 Key: HIVE-10771
 URL: https://issues.apache.org/jira/browse/HIVE-10771
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


To replicate:
CREATE TABLE separator_test 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "|", "quoteChar" = "\"", "escapeChar" = "\\") 
STORED AS TEXTFILE
AS
SELECT * FROM sample_07;
Then hadoop fs -cat /user/hive/warehouse/separator_test/*
53-3032,Truck drivers, heavy and tractor-trailer,1693590,37560
53-3033,Truck drivers, light or delivery services,922900,28820
53-3041,Taxi drivers and chauffeurs,165590,22740
The separator is still ',', not '|' as specified.
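
A workaround that may be worth trying, consistent with the report (a sketch; it 
assumes sample_07's usual (code, description, total_emp, salary) layout): create 
the table with the serde first, then populate it with a separate INSERT, since 
the properties are honored outside of CTAS:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical workaround: declare the OpenCSVSerde table up front so the
// SERDEPROPERTIES apply, then load it with INSERT instead of CTAS.
public class SeparatorWorkaround {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(args[0]);   // e.g. a jdbc:hive2:// URL
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE separator_test (code STRING, description STRING, "
                    + "total_emp INT, salary INT) "
                    + "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' "
                    + "WITH SERDEPROPERTIES (\"separatorChar\" = \"|\") "
                    + "STORED AS TEXTFILE");
            stmt.execute("INSERT INTO TABLE separator_test SELECT * FROM sample_07");
        }
    }
}
{code}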



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10646) ColumnValue does not handle NULL_TYPE

2015-05-07 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-10646:
---

 Summary: ColumnValue does not handle NULL_TYPE
 Key: HIVE-10646
 URL: https://issues.apache.org/jira/browse/HIVE-10646
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


This will cause an NPE if the thrift client uses protocol V5 or older:
{noformat}
1:46:07.199 PM  ERROR   org.apache.thrift.server.TThreadPoolServer  
Error occurred during processing of message.
java.lang.NullPointerException
at 
org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388)
at 
org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338)
at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288)
at 
org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605)
at 
org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525)
at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455)
at 
org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550)
at 
org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486)
at 
org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412)
at 
org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13272)
at 
org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13236)
at 
org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13187)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:677)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Reproduce: run select NULL as col, * from jsmall limit 5; from a V5 client 
(for example, some versions of Hue).
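
A toy model of the failure and the missing case (hypothetical names; the real 
mapping lives in the thrift layer): a NULL_TYPE value must still be encoded as 
some explicit, set-but-null wire value, or serialization later NPEs:
{code}
import java.util.Arrays;
import java.util.List;

// Simplified illustration: handle NULL_TYPE explicitly instead of leaving
// the wire value unset, which is what blows up during thrift serialization.
public class NullTypeSketch {
    enum Type { INT_TYPE, STRING_TYPE, NULL_TYPE }

    static String toWireValue(Type t, Object v) {
        switch (t) {
            case INT_TYPE:    return "i32:" + v;
            case STRING_TYPE: return "str:" + v;
            case NULL_TYPE:   return "str:null";   // the missing case: an explicit null value
            default: throw new IllegalStateException("no wire value for " + t);
        }
    }

    public static void main(String[] args) {
        List<Type> row = Arrays.asList(Type.NULL_TYPE, Type.STRING_TYPE);
        row.forEach(t -> System.out.println(toWireValue(t, null)));
    }
}
{code}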



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10453) HS2 leaking open file descriptors when using UDFs

2015-04-22 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-10453:
---

 Summary: HS2 leaking open file descriptors when using UDFs
 Key: HIVE-10453
 URL: https://issues.apache.org/jira/browse/HIVE-10453
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen


1. Create a custom function:
CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar';
2. Create a simple JDBC client that just connects and runs a simple query using 
the function, such as:
select myfunc(col1) from sometable
3. Disconnect.
Check open files for HiveServer2 with:
lsof -p HSProcID | grep myudf.jar
You will see the leak as:
{noformat}
java  28718 ychen  txt  REG1,4741 212977666 
/private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
java  28718 ychen  330r REG1,4741 212977666 
/private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
{noformat}
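
The jar stays open because the classloader created for the session's added 
resources is never closed; a cleanup of roughly this shape would release the 
descriptors (a sketch with assumed names, not the actual patch):
{code}
import java.io.IOException;
import java.net.URLClassLoader;

// Hypothetical session teardown: close the URLClassLoader that was created
// for "add jar" / CREATE FUNCTION resources, releasing the open jar FDs.
public class SessionCleanupSketch {
    private final URLClassLoader sessionLoader;

    SessionCleanupSketch(URLClassLoader sessionLoader) {
        this.sessionLoader = sessionLoader;
    }

    void closeSession() {
        try {
            sessionLoader.close();   // since Java 7; closes the underlying jar files
        } catch (IOException e) {
            // log and continue; the session is going away regardless
        }
    }
}
{code}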




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10098) HS2 local task for map join fails in KMS encrypted cluster

2015-03-26 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-10098:
---

 Summary: HS2 local task for map join fails in KMS encrypted cluster
 Key: HIVE-10098
 URL: https://issues.apache.org/jira/browse/HIVE-10098
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen


Env: KMS was enabled after the cluster was Kerberos secured. 
Problem: any Hive query via beeline that performs a MapJoin fails with a 
java.lang.reflect.UndeclaredThrowableException from 
KMSClientProvider.addDelegationTokens.

{code}
2015-03-18 08:49:17,948 INFO [main]: Configuration.deprecation 
(Configuration.java:warnOnceIfDeprecated(1022)) - mapred.input.dir is 
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
2015-03-18 08:49:19,048 WARN [main]: security.UserGroupInformation 
(UserGroupInformation.java:doAs(1645)) - PriviledgedActionException as:hive 
(auth:KERBEROS) 
cause:org.apache.hadoop.security.authentication.client.AuthenticationException: 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt) 
2015-03-18 08:49:19,050 ERROR [main]: mr.MapredLocalTask 
(MapredLocalTask.java:executeFromChildJVM(314)) - Hive Runtime Error: Map local 
work failed 
java.io.IOException: java.io.IOException: 
java.lang.reflect.UndeclaredThrowableException 
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:634) 
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:363)
 
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:337)
 
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:303)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:735) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 
Caused by: java.io.IOException: java.lang.reflect.UndeclaredThrowableException 
at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:826)
 
at 
org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
 
at 
org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017)
 
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
 
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
 
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
 
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205) 
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) 
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:413)
 
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:559) 
... 9 more 
Caused by: java.lang.reflect.UndeclaredThrowableException 
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655)
 
at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:808)
 
... 18 more 
Caused by: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt) 
at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306)
 
at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196)
 
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9813) Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command

2015-02-27 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-9813:
--

 Summary: Hive JDBC - DatabaseMetaData.getColumns method cannot 
find classes added with add jar command
 Key: HIVE-9813
 URL: https://issues.apache.org/jira/browse/HIVE-9813
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Yongzhi Chen


Execute the following JDBC client program:
{code}
import java.sql.*;

public class TestAddJar {
    private static Connection makeConnection(String connString, String classPath) throws ClassNotFoundException, SQLException
    {
        System.out.println("Current Connection info: " + connString);
        Class.forName(classPath);
        System.out.println("Current driver info: " + classPath);
        return DriverManager.getConnection(connString);
    }

    public static void main(String[] args)
    {
        if (2 != args.length)
        {
            System.out.println("Two arguments needed: connection string, path to jar to be added (include jar name)");
            System.out.println("Example: java -jar TestApp.jar jdbc:hive2://192.168.111.111 /tmp/json-serde-1.3-jar-with-dependencies.jar");
            return;
        }
        Connection conn;
        try
        {
            conn = makeConnection(args[0], "org.apache.hive.jdbc.HiveDriver");

            System.out.println("---");
            System.out.println("DONE");

            System.out.println("---");
            System.out.println("Execute query: add jar " + args[1] + ";");
            Statement stmt = conn.createStatement();
            int c = stmt.executeUpdate("add jar " + args[1]);
            System.out.println("Returned value is: [" + c + "]\n");

            System.out.println("---");
            final String createTableQry = "Create table if not exists json_test(id int, content string) " +
                    "row format serde 'org.openx.data.jsonserde.JsonSerDe'";
            System.out.println("Execute query: " + createTableQry + ";");
            stmt.execute(createTableQry);

            System.out.println("---");
            System.out.println("getColumn() Call---\n");
            DatabaseMetaData md = conn.getMetaData();
            System.out.println("Test get all column in a schema:");
            ResultSet rs = md.getColumns("Hive", "default", "json_test", null);
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            conn.close();
        }
        catch (ClassNotFoundException e)
        {
            e.printStackTrace();
        }
        catch (SQLException e)
        {
            e.printStackTrace();
        }
    }
}
{code}

The call throws an exception; from the metastore log:
7:41:30.316 PM  ERROR   hive.log
error in initSerDe: java.lang.ClassNotFoundException Class 
org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not 
found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1803)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:183)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields(HiveMetaStore.java:2487)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema(HiveMetaStore.java:2542)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at com.sun.proxy.$Proxy5.get_schema(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6425)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6409)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:556)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at 

[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme

2015-02-20 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-9716:
---
Attachment: HIVE-9716.1.patch

 Map job fails when table's LOCATION does not have scheme
 

 Key: HIVE-9716
 URL: https://issues.apache.org/jira/browse/HIVE-9716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-9716.1.patch


 When a table's location (the value of column 'LOCATION' in SDS table in 
 metastore) does not have a scheme, map job returns error. For example, 
 when do select count ( * ) from t1, get following exception:
 {noformat}
 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
 job_local2120192529_0001
 java.lang.Exception: java.lang.RuntimeException: 
 java.lang.IllegalStateException: Invalid input path 
 file:/user/hive/warehouse/t1/data
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
 Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: 
 Invalid input path file:/user/hive/warehouse/t1/data
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.IllegalStateException: Invalid input path 
 file:/user/hive/warehouse/t1/data
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
   ... 9 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme

2015-02-20 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-9716:
---
Status: Patch Available  (was: Open)

Need code review. 

 Map job fails when table's LOCATION does not have scheme
 

 Key: HIVE-9716
 URL: https://issues.apache.org/jira/browse/HIVE-9716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.0, 0.12.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-9716.1.patch


 When a table's location (the value of column 'LOCATION' in SDS table in 
 metastore) does not have a scheme, map job returns error. For example, 
 when do select count ( * ) from t1, get following exception:
 {noformat}
 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
 job_local2120192529_0001
 java.lang.Exception: java.lang.RuntimeException: 
 java.lang.IllegalStateException: Invalid input path 
 file:/user/hive/warehouse/t1/data
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
 Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: 
 Invalid input path file:/user/hive/warehouse/t1/data
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.IllegalStateException: Invalid input path 
 file:/user/hive/warehouse/t1/data
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
   ... 9 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9716) Map job fails when table's LOCATION does not have scheme

2015-02-18 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-9716:
--

 Summary: Map job fails when table's LOCATION does not have scheme
 Key: HIVE-9716
 URL: https://issues.apache.org/jira/browse/HIVE-9716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.0, 0.12.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor


When a table's location (the value of column 'LOCATION' in SDS table in 
metastore) does not have a scheme, map job returns error. For example, 
when do select count (*) from t1, get following exception:

15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
input path file:/user/hive/warehouse/t1/data
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 9 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme

2015-02-18 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-9716:
---
Description: 
When a table's location (the value of column 'LOCATION' in SDS table in 
metastore) does not have a scheme, map job returns error. For example, 
when do select count ( * ) from t1, get following exception:

15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
input path file:/user/hive/warehouse/t1/data
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 9 more

  was:
When a table's location (the value of column 'LOCATION' in SDS table in 
metastore) does not have a scheme, map job returns error. For example, 
when do select count (*) from t1, get following exception:

15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
input path file:/user/hive/warehouse/t1/data
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 9 more


 Map job fails when table's LOCATION does not have scheme
 

 Key: HIVE-9716
 URL: https://issues.apache.org/jira/browse/HIVE-9716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor

 When a table's location (the value of column 'LOCATION' in SDS table in 
 metastore) does not have a scheme, map job returns error. For example, 
 when do select count ( * ) from t1, get following exception:
 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
 job_local2120192529_0001
 java.lang.Exception: java.lang.RuntimeException: 
 java.lang.IllegalStateException: 

[jira] [Commented] (HIVE-9528) SemanticException: Ambiguous column reference

2015-02-02 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301389#comment-14301389
 ] 

Yongzhi Chen commented on HIVE-9528:


[~navis], any idea which jira caused the change of behavior? And yes, we can 
close the jira as not-a-problem. Thanks

 SemanticException: Ambiguous column reference
 -

 Key: HIVE-9528
 URL: https://issues.apache.org/jira/browse/HIVE-9528
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Yongzhi Chen
Assignee: Navis

 When running the following query:
 {code}
 SELECT if( COUNT(*) = 0, 'true', 'false' ) as RESULT FROM ( select  *  from 
 sim a join sim2 b on a.simstr=b.simstr) app
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10007]: Ambiguous column reference simstr in app (state=42000,code=10007)
 {code}
 This query works fine in hive 0.10
 In the apache trunk, following workaround will work:
 {code}
 SELECT if(COUNT(*) = 0, 'true', 'false') as RESULT FROM (select a.* from sim 
 a join sim2 b on a.simstr=b.simstr) app;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9528) SemanticException: Ambiguous column reference

2015-02-02 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301429#comment-14301429
 ] 

Yongzhi Chen commented on HIVE-9528:


Is it this jira? https://issues.apache.org/jira/browse/HIVE-2723
Thanks

 SemanticException: Ambiguous column reference
 -

 Key: HIVE-9528
 URL: https://issues.apache.org/jira/browse/HIVE-9528
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Yongzhi Chen
Assignee: Navis

 When running the following query:
 {code}
 SELECT if( COUNT(*) = 0, 'true', 'false' ) as RESULT FROM ( select  *  from 
 sim a join sim2 b on a.simstr=b.simstr) app
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10007]: Ambiguous column reference simstr in app (state=42000,code=10007)
 {code}
 This query works fine in hive 0.10
 In the apache trunk, following workaround will work:
 {code}
 SELECT if(COUNT(*) = 0, 'true', 'false') as RESULT FROM (select a.* from sim 
 a join sim2 b on a.simstr=b.simstr) app;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9528) SemanticException: Ambiguous column reference

2015-01-30 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-9528:
--

 Summary: SemanticException: Ambiguous column reference
 Key: HIVE-9528
 URL: https://issues.apache.org/jira/browse/HIVE-9528
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Yongzhi Chen


When running the following query:

SELECT if( COUNT( * ) == 0, 'true', 'false' ) as RESULT FROM ( select  *  from 
sim a join sim2 b on a.simstr=b.simstr) app

Error: Error while compiling statement: FAILED: SemanticException [Error 
10007]: Ambiguous column reference simstr in app (state=42000,code=10007)

This query works fine in hive 0.10

In the apache trunk, following workaround will work:
SELECT if(COUNT( * ) == 0, 'true', 'false') as RESULT FROM (select a.* from sim 
a join sim2 b on a.simstr=b.simstr) app;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query

2015-01-30 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299469#comment-14299469
 ] 

Yongzhi Chen commented on HIVE-7733:


[~navis], I just created a new jira related to the issue; do you want to look at 
it?
HIVE-9528

 Ambiguous column reference error on query
 -

 Key: HIVE-7733
 URL: https://issues.apache.org/jira/browse/HIVE-7733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7733.1.patch.txt, HIVE-7733.2.patch.txt, 
 HIVE-7733.3.patch.txt, HIVE-7733.4.patch.txt, HIVE-7733.5.patch.txt, 
 HIVE-7733.6.patch.txt, HIVE-7733.7.patch.txt


 {noformat}
 CREATE TABLE agg1 
   ( 
  col0 INT, 
  col1 STRING, 
  col2 DOUBLE 
   ); 
 explain SELECT single_use_subq11.a1 AS a1, 
single_use_subq11.a2 AS a2 
 FROM   (SELECT Sum(agg1.col2) AS a1 
 FROM   agg1 
 GROUP  BY agg1.col0) single_use_subq12 
JOIN (SELECT alias.a2 AS a0, 
 alias.a1 AS a1, 
 alias.a1 AS a2 
  FROM   (SELECT agg1.col1 AS a0, 
 '42'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1 
  UNION ALL 
  SELECT agg1.col1 AS a0, 
 '41'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1) alias 
  GROUP  BY alias.a2, 
alias.a1) single_use_subq11 
  ON ( single_use_subq11.a0 = single_use_subq11.a0 );
 {noformat}
 Gets the following error:
 FAILED: SemanticException [Error 10007]: Ambiguous column reference a2
 Looks like this query had been working in 0.12 but started failing with this 
 error in 0.13.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.

2015-01-28 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295305#comment-14295305
 ] 

Yongzhi Chen commented on HIVE-6308:


Thank you, Szehon!

This fix treats creating Avro tables without column definitions in Hive the 
same as creating tables with all column definitions. 
It does not cover Avro tables of this kind that were created before the fix.

Tested with the Hive command: analyze table ... compute statistics for columns 
(a sketch follows below). 
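
For reference, a hedged sketch of the verification flow (the avro_test2 table 
name is hypothetical; the schema URL is copied from the description below): 
create the Avro table without an explicit column list, then run the 
column-stats command that used to fail:
{code}
CREATE TABLE avro_test2
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');

-- with the fix, the columns derived from the Avro schema are persisted in
-- COLUMNS_V2, so column statistics can now be computed:
ANALYZE TABLE avro_test2 COMPUTE STATISTICS FOR COLUMNS;
{code}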

 COLUMNS_V2 Metastore table not populated for tables created without an 
 explicit column list.
 

 Key: HIVE-6308
 URL: https://issues.apache.org/jira/browse/HIVE-6308
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.10.0
Reporter: Alexander Behm
Assignee: Yongzhi Chen
 Fix For: 1.2.0

 Attachments: HIVE-6308.1.patch


 Consider this example table:
 CREATE TABLE avro_test
 ROW FORMAT SERDE
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED as INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
 When I try to run an ANALYZE TABLE for computing column stats on any of the 
 columns, then I get:
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 NoSuchObjectException(message:Column o_orderpriority for which stats 
 gathering is requested doesn't exist.)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't 
 populated properly during the table creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.

2015-01-25 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291325#comment-14291325
 ] 

Yongzhi Chen commented on HIVE-6308:


The test failures are not related to the change.

 COLUMNS_V2 Metastore table not populated for tables created without an 
 explicit column list.
 

 Key: HIVE-6308
 URL: https://issues.apache.org/jira/browse/HIVE-6308
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.10.0
Reporter: Alexander Behm
Assignee: Yongzhi Chen
 Attachments: HIVE-6308.1.patch


 Consider this example table:
 CREATE TABLE avro_test
 ROW FORMAT SERDE
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED as INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
 When I try to run an ANALYZE TABLE for computing column stats on any of the 
 columns, then I get:
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 NoSuchObjectException(message:Column o_orderpriority for which stats 
 gathering is requested doesn't exist.)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't 
 populated properly during the table creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.

2015-01-24 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-6308:
---
Status: Patch Available  (was: Open)

 COLUMNS_V2 Metastore table not populated for tables created without an 
 explicit column list.
 

 Key: HIVE-6308
 URL: https://issues.apache.org/jira/browse/HIVE-6308
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.10.0
Reporter: Alexander Behm
Assignee: Yongzhi Chen
 Attachments: HIVE-6308.1.patch


 Consider this example table:
 CREATE TABLE avro_test
 ROW FORMAT SERDE
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED as INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
 When I try to run an ANALYZE TABLE for computing column stats on any of the 
 columns, then I get:
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 NoSuchObjectException(message:Column o_orderpriority for which stats 
 gathering is requested doesn't exist.)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't 
 populated properly during the table creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.

2015-01-24 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-6308:
---
Attachment: HIVE-6308.1.patch

Need code review

 COLUMNS_V2 Metastore table not populated for tables created without an 
 explicit column list.
 

 Key: HIVE-6308
 URL: https://issues.apache.org/jira/browse/HIVE-6308
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.10.0
Reporter: Alexander Behm
Assignee: Yongzhi Chen
 Attachments: HIVE-6308.1.patch


 Consider this example table:
 CREATE TABLE avro_test
 ROW FORMAT SERDE
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED as INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
 When I try to run an ANALYZE TABLE for computing column stats on any of the 
 columns, then I get:
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 NoSuchObjectException(message:Column o_orderpriority for which stats 
 gathering is requested doesn't exist.)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't 
 populated properly during the table creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.

2015-01-23 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen reassigned HIVE-6308:
--

Assignee: Yongzhi Chen

 COLUMNS_V2 Metastore table not populated for tables created without an 
 explicit column list.
 

 Key: HIVE-6308
 URL: https://issues.apache.org/jira/browse/HIVE-6308
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.10.0
Reporter: Alexander Behm
Assignee: Yongzhi Chen

 Consider this example table:
 CREATE TABLE avro_test
 ROW FORMAT SERDE
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED as INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
 When I try to run an ANALYZE TABLE for computing column stats on any of the 
 columns, then I get:
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 NoSuchObjectException(message:Column o_orderpriority for which stats 
 gathering is requested doesn't exist.)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331)
 at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't 
 populated properly during the table creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG

2015-01-15 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279450#comment-14279450
 ] 

Yongzhi Chen commented on HIVE-9393:


[~brocknoland], could you review and commit the patch? Thanks.

 reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
 ---

 Key: HIVE-9393
 URL: https://issues.apache.org/jira/browse/HIVE-9393
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-9393.1.patch


 From Hive 0.13 the log level of ColumnarSerDe.java:116 was upgraded from 
 DEBUG to INFO; this has introduced a very large amount of noise into the 
 logs, causing the underlying filesystem to fill up.
 This request is to drop it back to DEBUG.
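
Until the patch lands, a hedged workaround (assuming the log4j 1.x properties 
file that Hive 0.13 ships with; the logger name below is simply the class in 
question) is to raise the level for this single logger:
{code}
# suppress the noisy per-record INFO line from ColumnarSerDe without
# touching the rest of the logging configuration
log4j.logger.org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe=WARN
{code}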



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG

2015-01-15 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen reassigned HIVE-9393:
--

Assignee: Yongzhi Chen

 reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
 ---

 Key: HIVE-9393
 URL: https://issues.apache.org/jira/browse/HIVE-9393
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor

 From Hive 0.13 the log level of ColumnarSerDe.java:116 was upgraded from 
 DEBUG to INFO; this has introduced a very large amount of noise into the 
 logs, causing the underlying filesystem to fill up.
 This request is to drop it back to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG

2015-01-15 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-9393:
--

 Summary: reduce noisy log level of ColumnarSerDe.java:116 from 
INFO to DEBUG
 Key: HIVE-9393
 URL: https://issues.apache.org/jira/browse/HIVE-9393
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Yongzhi Chen
Priority: Minor


From Hive 0.13 the log level of ColumnarSerDe.java:116 was upgraded from DEBUG 
to INFO; this has introduced a very large amount of noise into the logs, 
causing the underlying filesystem to fill up.
This request is to drop it back to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG

2015-01-15 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-9393:
---
Status: Patch Available  (was: Open)

Need code review. 

 reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
 ---

 Key: HIVE-9393
 URL: https://issues.apache.org/jira/browse/HIVE-9393
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-9393.1.patch


 From Hive 0.13 the log level of ColumnarSerDe.java:116 was upgraded from 
 DEBUG to INFO; this has introduced a very large amount of noise into the 
 logs, causing the underlying filesystem to fill up.
 This request is to drop it back to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9393) reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG

2015-01-15 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-9393:
---
Attachment: HIVE-9393.1.patch

 reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
 ---

 Key: HIVE-9393
 URL: https://issues.apache.org/jira/browse/HIVE-9393
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-9393.1.patch


 From Hive 0.13 the log level of ColumnarSerDe.java:116 was upgraded from 
 DEBUG to INFO; this has introduced a very large amount of noise into the 
 logs, causing the underlying filesystem to fill up.
 This request is to drop it back to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2015-01-06 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266175#comment-14266175
 ] 

Yongzhi Chen commented on HIVE-9201:


Even if we will support a line terminator other than \n in the future, we have 
to handle the case where the line terminator is used inside a string value. 
Any suggestions or corrections for my current approach? Or any better ideas? 
Thanks.
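
As a user-side stopgap in the meantime (a hedged sketch, not the fix itself; 
it assumes the strsim table from the description below), the offending 
characters can be replaced before they reach the serialized intermediate row:
{code}
-- collapse embedded \r and \n to spaces so the line terminator never
-- appears inside the string that crosses the MapReduce boundary
select regexp_replace("a\rb\nc", '[\r\n]', ' '), narray
from strsim LATERAL VIEW explode(array(1)) C AS narray;
{code}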

 Lazy functions do not handle newlines and carriage returns properly
 ---

 Key: HIVE-9201
 URL: https://issues.apache.org/jira/browse/HIVE-9201
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-9201.1.patch


 Hive returns a wrong result when the returned string has the character \r or 
 \n in it. This happens when the query triggers MapReduce jobs. 
 For example, for a table named strsim with only one row:
 As shown below, query 1 returns 1 row while query 2 returns 3 rows.
 Query 1:
 select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
 Query 2:
 select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS 
 narray;
 select "abc", narray from strsim LATERAL VIEW 
 explode(array(1)) C AS narray;
 INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2014-12-23 15:00:08,958 Stage-1 map = 0%,  reduce = 0%
 INFO  : Ended Job = job_local1178499218_0015
 +------+---------+
 | _c0  | narray  |
 +------+---------+
 | abc  | 1       |
 +------+---------+
 1 row selected (1.283 seconds)
 select "a\rb\nc", narray from strsim LATERAL VIEW 
 explode(array(1)) C AS narray;
 INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2014-12-23 15:04:35,441 Stage-1 map = 0%,  reduce = 0%
 INFO  : Ended Job = job_local1816711099_0016
 +------+---------+
 | _c0  | narray  |
 +------+---------+
 | a    | NULL    |
 | b    | NULL    |
 | c    | 1       |
 +------+---------+
 3 rows selected (1.135 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2015-01-05 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265035#comment-14265035
 ] 

Yongzhi Chen commented on HIVE-9201:


Just found out that in SerDeUtils, escapeString and lightEscapeString escape 
\n and \r in the same way as my fix for this issue:

https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java#L98

https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java#L129



 Lazy functions do not handle newlines and carriage returns properly
 ---

 Key: HIVE-9201
 URL: https://issues.apache.org/jira/browse/HIVE-9201
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-9201.1.patch


 Hive returns a wrong result when the returned string has the character \r or 
 \n in it. This happens when the query triggers MapReduce jobs. 
 For example, for a table named strsim with only one row:
 As shown below, query 1 returns 1 row while query 2 returns 3 rows.
 Query 1:
 select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
 Query 2:
 select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS 
 narray;
 select "abc", narray from strsim LATERAL VIEW 
 explode(array(1)) C AS narray;
 INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2014-12-23 15:00:08,958 Stage-1 map = 0%,  reduce = 0%
 INFO  : Ended Job = job_local1178499218_0015
 +------+---------+
 | _c0  | narray  |
 +------+---------+
 | abc  | 1       |
 +------+---------+
 1 row selected (1.283 seconds)
 select "a\rb\nc", narray from strsim LATERAL VIEW 
 explode(array(1)) C AS narray;
 INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2014-12-23 15:04:35,441 Stage-1 map = 0%,  reduce = 0%
 INFO  : Ended Job = job_local1816711099_0016
 +------+---------+
 | _c0  | narray  |
 +------+---------+
 | a    | NULL    |
 | b    | NULL    |
 | c    | 1       |
 +------+---------+
 3 rows selected (1.135 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2015-01-05 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264965#comment-14264965
 ] 

Yongzhi Chen commented on HIVE-9201:


[~ashutoshgupt...@gmail.com],
Are you suggesting that we start implementing LINES TERMINATED BY for Hive? It 
is treated as not fixable by 
https://issues.apache.org/jira/browse/HIVE-302
In the current Hive code, it seems we just error out on any line terminator 
other than \n, and many places simply assume \n is the only line terminator:
{code}
case HiveParser.TOK_TABLEROWFORMATLINES:
  String lineDelim = unescapeSQLString(rowChild.getChild(0).getText());
  tblDesc.getProperties().setProperty(serdeConstants.LINE_DELIM, lineDelim);
  // only the newline character (or its ASCII code "10") is accepted here
  if (!lineDelim.equals("\n") && !lineDelim.equals("10")) {
    throw new SemanticException(generateErrorMessage(rowChild,
        ErrorMsg.LINES_TERMINATED_BY_NON_NEWLINE.getMsg()));
  }
  break;
{code}
But with MAPREDUCE-2602 fixed, it is possible for Hive to support changing the 
line terminator; I just wonder whether it would be an easy change (see the 
sketch below for what the check rejects today).

Thanks.
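
To make the effect of that check concrete, here is a hedged sketch (the 
crlf_test table is hypothetical) of DDL that the compiler currently rejects:
{code}
-- fails at compile time via ErrorMsg.LINES_TERMINATED_BY_NON_NEWLINE,
-- because only '\n' (or "10") passes the check quoted above
CREATE TABLE crlf_test (c STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\r\n';
{code}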

 Lazy functions do not handle newlines and carriage returns properly
 ---

 Key: HIVE-9201
 URL: https://issues.apache.org/jira/browse/HIVE-9201
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-9201.1.patch


 Hive returns a wrong result when the returned string has the character \r or 
 \n in it. This happens when the query triggers MapReduce jobs. 
 For example, for a table named strsim with only one row:
 As shown below, query 1 returns 1 row while query 2 returns 3 rows.
 Query 1:
 select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
 Query 2:
 select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS 
 narray;
 select "abc", narray from strsim LATERAL VIEW 
 explode(array(1)) C AS narray;
 INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2014-12-23 15:00:08,958 Stage-1 map = 0%,  reduce = 0%
 INFO  : Ended Job = job_local1178499218_0015
 +------+---------+
 | _c0  | narray  |
 +------+---------+
 | abc  | 1       |
 +------+---------+
 1 row selected (1.283 seconds)
 select "a\rb\nc", narray from strsim LATERAL VIEW 
 explode(array(1)) C AS narray;
 INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2014-12-23 15:04:35,441 Stage-1 map = 0%,  reduce = 0%
 INFO  : Ended Job = job_local1816711099_0016
 +------+---------+
 | _c0  | narray  |
 +------+---------+
 | a    | NULL    |
 | b    | NULL    |
 | c    | 1       |
 +------+---------+
 3 rows selected (1.135 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

