[jira] [Commented] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647491#comment-14647491
 ] 

Jesus Camacho Rodriguez commented on HIVE-11397:


[~hagleitn], this looks good to me; we are just transforming the left-deep tree 
into a right-deep tree, and that transformation is legal.

 Parse Hive OR clauses as they are written into the AST
 --

 Key: HIVE-11397
 URL: https://issues.apache.org/jira/browse/HIVE-11397
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez

 When parsing A OR B OR C, Hive converts it into 
 (C OR B) OR A
 instead of turning it into
 A OR (B OR C)
 {code}
 GenericUDFOPOr or = new GenericUDFOPOr();
 List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
 expressions.add(previous);
 expressions.add(current);
 {code}
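 For illustration only (not the actual parser change in the attached patch; the 
 makeOr(...) helper is hypothetical), a minimal sketch of folding the operand list 
 right-to-left so that A OR B OR C becomes A OR (B OR C), in the order it was written:
 {code}
 // Fold [A, B, C] right-to-left: start from the last operand and wrap each
 // earlier operand around the accumulated result, yielding A OR (B OR C).
 // makeOr(left, right) stands in for building an ExprNodeGenericFuncDesc
 // over a GenericUDFOPOr with the two children.
 ExprNodeDesc foldOr(List<ExprNodeDesc> operands) {
   ExprNodeDesc result = operands.get(operands.size() - 1);
   for (int i = operands.size() - 2; i >= 0; i--) {
     result = makeOr(operands.get(i), result);
   }
   return result;
 }
 {code}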



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647586#comment-14647586
 ] 

Jesus Camacho Rodriguez commented on HIVE-11391:


[~pxiong], can you review it? This adds the CBO tests to the test suite with 
return path enabled; it is useful for checking that we do not introduce any regressions 
while working on the return path. Thanks

 CBO (Calcite Return Path): Add CBO tests with return path on
 

 Key: HIVE-11391
 URL: https://issues.apache.org/jira/browse/HIVE-11391
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11391.patch, HIVE-11391.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11383:
---
Attachment: HIVE-11383.8.patch

 Upgrade Hive to Calcite 1.4
 ---

 Key: HIVE-11383
 URL: https://issues.apache.org/jira/browse/HIVE-11383
 Project: Hive
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, 
 HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
 HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, 
 HIVE-11383.7.patch, HIVE-11383.8.patch


 CLEAR LIBRARY CACHE
 Upgrade Hive to Calcite 1.4.0-incubating.
 There is currently a snapshot release, which is close to what will be in 1.4. 
 I have checked that Hive compiles against the new snapshot, fixing one issue. 
 The patch is attached.
 The next step is to validate that Hive runs against the new Calcite, and post any 
 issues to the Calcite list or log Calcite JIRA cases. [~jcamachorodriguez], 
 can you please do that?
 [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
 the new Calcite version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
In the progress(ExecDriverTaskHandle th) method of class HadoopJobExecHelper, the hive 
lock heartbeat runs in the same loop as job progress tracking, 
for example: 
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
    // No reason to poll until the job is initialized
    continue;
  } else {
    // By now the job is initialized so no reason to do
    // rj.getJobState() again and we do not want to do an extra RPC call
    initializing = false;
  }
When rj.getJobState() throws any exception while the job is still in 
JobStatus.PREP, a lock timeout exception is raised for the big query job.  

  was: In the progress(ExecDriverTaskHandle th) method of class HadoopJobExecHelper, 
the hive lock heartbeat runs in the same loop as job progress tracking. When job 
progress tracking gets any exception, a lock timeout exception is raised for the big 
query job. 


 Transaction lock time out when can't tracking job progress
 --

 Key: HIVE-11411
 URL: https://issues.apache.org/jira/browse/HIVE-11411
 Project: Hive
  Issue Type: Wish
Affects Versions: 1.2.0
Reporter: shiqian.huang
Priority: Minor

 In the progress(ExecDriverTaskHandle th) method of class HadoopJobExecHelper, 
 the hive lock heartbeat runs in the same loop as job progress tracking, 
 for example: 
   heartbeater.heartbeat();
   if (initializing && rj.getJobState() == JobStatus.PREP) {
     // No reason to poll until the job is initialized
     continue;
   } else {
     // By now the job is initialized so no reason to do
     // rj.getJobState() again and we do not want to do an extra RPC call
     initializing = false;
   }
 When rj.getJobState() throws any exception while the job is still in 
 JobStatus.PREP, a lock timeout exception is raised for the big query job.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
In the progress(ExecDriverTaskHandle th) method of class HadoopJobExecHelper, the hive 
lock heartbeat runs in the same loop as job progress tracking, 
for example: 
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
    // No reason to poll until the job is initialized
    continue;
  } else {
    // By now the job is initialized so no reason to do
    // rj.getJobState() again and we do not want to do an extra RPC call
    initializing = false;
  }
When rj.getJobState() throws any exception while the job is still in 
JobStatus.PREP, a lock timeout exception is eventually raised for the big query job.  

  was:
In the progress(ExecDriverTaskHandle th) method of class HadoopJobExecHelper, the hive 
lock heartbeat runs in the same loop as job progress tracking, 
for example: 
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
    // No reason to poll until the job is initialized
    continue;
  } else {
    // By now the job is initialized so no reason to do
    // rj.getJobState() again and we do not want to do an extra RPC call
    initializing = false;
  }
When rj.getJobState() throws any exception while the job is still in 
JobStatus.PREP, a lock timeout exception is raised for the big query job.  


 Transaction lock time out when can't tracking job progress
 --

 Key: HIVE-11411
 URL: https://issues.apache.org/jira/browse/HIVE-11411
 Project: Hive
  Issue Type: Wish
Affects Versions: 1.2.0
Reporter: shiqian.huang
Priority: Minor

 In the progress(ExecDriverTaskHandle th) method of class HadoopJobExecHelper, 
 the hive lock heartbeat runs in the same loop as job progress tracking, 
 for example: 
   heartbeater.heartbeat();
   if (initializing && rj.getJobState() == JobStatus.PREP) {
     // No reason to poll until the job is initialized
     continue;
   } else {
     // By now the job is initialized so no reason to do
     // rj.getJobState() again and we do not want to do an extra RPC call
     initializing = false;
   }
 When rj.getJobState() throws any exception while the job is still in 
 JobStatus.PREP, a lock timeout exception is eventually raised for the big query job.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Affects Version/s: 1.2.0
 Priority: Minor  (was: Major)
  Summary: Transaction lock time out when can't tracking job 
progress  (was: Transaction lock)

 Transaction lock time out when can't tracking job progress
 --

 Key: HIVE-11411
 URL: https://issues.apache.org/jira/browse/HIVE-11411
 Project: Hive
  Issue Type: Wish
Affects Versions: 1.2.0
Reporter: shiqian.huang
Priority: Minor

 Transaction lock times out when job progress can't be tracked. Hive 1.2. 
 When the hive client can't connect to the appmaster to track job progress and the 
 job runs for more than 5 minutes, hive can't refresh the lock heartbeat and then gets this 
 exception:
 2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
 (RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
 lock: 3645)
 at 
 org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
 at 
 org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
 at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
 at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
 at 
 org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
 at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
 at 
 org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
 at 
 org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
 at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
 at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 I find the reason is that rj.getJobState() in the check if (initializing && 
 rj.getJobState() == JobStatus.PREP) throws an exception when no hadoop slave 
 host is configured in /etc/hosts. Is it a bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647608#comment-14647608
 ] 

Rajat Khandelwal commented on HIVE-11376:
-

Taking the patch from Review Board and attaching it

 CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
 found for one of the input files
 -

 Key: HIVE-11376
 URL: https://issues.apache.org/jira/browse/HIVE-11376
 Project: Hive
  Issue Type: Bug
Reporter: Rajat Khandelwal
Assignee: Rajat Khandelwal
 Attachments: HIVE-11376_02.patch


 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
 This is the exact code snippet:
 {noformat}
 // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in
 // the tree or not, we use a configuration variable for the same
 if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
   // The following code should be removed, once
   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
   // so don't use CombineFileInputFormat for non-splittable files,
   // i.e., don't combine if the inputformat is a TextInputFormat and has
   // compression turned on
 {noformat}
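 As a hedged illustration of an alternative to the blanket fallback (this is not the 
 attached patch; the method below is a standalone sketch): the combinability check 
 could be made per input file, so that only compressed, non-splittable files opt out 
 of combining instead of the whole query falling back to HiveInputFormat.
 {code}
 // Sketch: decide per file whether it is safe to combine, using only the
 // compression codec (classes from org.apache.hadoop.fs / io.compress),
 // instead of switching the whole query away from CombineHiveInputFormat
 // as soon as one compressed file is seen.
 boolean isCombinable(Path file, Configuration conf) {
   CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
   // Uncompressed files, and codecs that support splitting (e.g. bzip2), are fine.
   return codec == null || codec instanceof SplittableCompressionCodec;
 }
 {code}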



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
in method progress(ExecDriverTaskHandle th) of class HadoopJobExecHelper, hive 
lock heartbeating progress works with job tracking progress. 
such like 
  heartbeater.heartbeat();

  if (initializing  rj.getJobState() == JobStatus.PREP) {
// No reason to poll untill the job is initialized
continue;
  } else {
// By now the job is initialized so no reason to do
// rj.getJobState() again and we do not want to do an extra RPC call
initializing = false;
  }
When job tracking progress got any exception in  rj.getJobState() == 
JobStatus.PREP, will bring  NoSuchLockException(hive client  exception 
message:No record of lock could be found, may have timed out) to big query job 
finally. 

  was:
In the progress(ExecDriverTaskHandle th) method of class HadoopJobExecHelper, the hive 
lock heartbeat runs in the same loop as job progress tracking, 
for example: 
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
    // No reason to poll until the job is initialized
    continue;
  } else {
    // By now the job is initialized so no reason to do
    // rj.getJobState() again and we do not want to do an extra RPC call
    initializing = false;
  }
When rj.getJobState() throws any exception while the job is still in 
JobStatus.PREP, a lock timeout exception is eventually raised for the big query job.  


 Transaction lock time out when can't tracking job progress
 --

 Key: HIVE-11411
 URL: https://issues.apache.org/jira/browse/HIVE-11411
 Project: Hive
  Issue Type: Wish
Affects Versions: 1.2.0
Reporter: shiqian.huang
Priority: Minor

 In the progress(ExecDriverTaskHandle th) method of class HadoopJobExecHelper, 
 the hive lock heartbeat runs in the same loop as job progress tracking, 
 for example: 
   heartbeater.heartbeat();
   if (initializing && rj.getJobState() == JobStatus.PREP) {
     // No reason to poll until the job is initialized
     continue;
   } else {
     // By now the job is initialized so no reason to do
     // rj.getJobState() again and we do not want to do an extra RPC call
     initializing = false;
   }
 When rj.getJobState() throws any exception while the job is still in 
 JobStatus.PREP, a NoSuchLockException (hive client exception 
 message: No record of lock could be found, may have timed out) is eventually raised for 
 the big query job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647574#comment-14647574
 ] 

Rajat Khandelwal commented on HIVE-11376:
-

Created https://reviews.apache.org/r/36939/

 CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
 found for one of the input files
 -

 Key: HIVE-11376
 URL: https://issues.apache.org/jira/browse/HIVE-11376
 Project: Hive
  Issue Type: Bug
Reporter: Rajat Khandelwal

 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
 This is the exact code snippet:
 {noformat}
 // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in
 // the tree or not, we use a configuration variable for the same
 if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
   // The following code should be removed, once
   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
   // so don't use CombineFileInputFormat for non-splittable files,
   // i.e., don't combine if the inputformat is a TextInputFormat and has
   // compression turned on
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11411) Transaction lock

2015-07-30 Thread shiqian.huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647575#comment-14647575
 ] 

shiqian.huang commented on HIVE-11411:
--

Transaction lock times out when job progress can't be tracked. Hive 1.2. 
When the hive client can't connect to the appmaster to track job progress and the 
job runs for more than 5 minutes, hive can't refresh the lock heartbeat and then gets this 
exception:
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
lock: 3645)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
at 
org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
I find the reason is that rj.getJobState() in the check if (initializing && 
rj.getJobState() == JobStatus.PREP) throws an exception when no hadoop slave 
host is configured in /etc/hosts. Is it a bug?
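
One possible direction (a sketch only, not a committed fix): guard the getJobState() 
call so that a failure to reach the appmaster does not break the heartbeat loop. The 
variable names come from the snippet above; LOG and the surrounding while loop are 
assumed to exist in HadoopJobExecHelper.progress().
{code}
heartbeater.heartbeat();
try {
  if (initializing && rj.getJobState() == JobStatus.PREP) {
    // Job not initialized yet; keep heartbeating and poll again later.
    continue;
  }
  initializing = false;
} catch (IOException e) {
  // Could not fetch the job state (e.g. the AM is unreachable); log and
  // retry on the next iteration instead of letting the exception
  // propagate and stop the lock heartbeat.
  LOG.warn("Could not fetch job state; will retry", e);
}
{code}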

 Transaction lock
 

 Key: HIVE-11411
 URL: https://issues.apache.org/jira/browse/HIVE-11411
 Project: Hive
  Issue Type: Wish
Reporter: shiqian.huang





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11411) Transaction lock

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
Transaction lock times out when job progress can't be tracked. Hive 1.2. 
When the hive client can't connect to the appmaster to track job progress and the 
job runs for more than 5 minutes, hive can't refresh the lock heartbeat and then gets this 
exception:
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
lock: 3645)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
I find the reason is that rj.getJobState() in the check if (initializing && 
rj.getJobState() == JobStatus.PREP) throws an exception when no hadoop slave 
host is configured in /etc/hosts. Is it a bug?

 Transaction lock
 

 Key: HIVE-11411
 URL: https://issues.apache.org/jira/browse/HIVE-11411
 Project: Hive
  Issue Type: Wish
Reporter: shiqian.huang

 Transaction lock times out when job progress can't be tracked. Hive 1.2. 
 When the hive client can't connect to the appmaster to track job progress and the 
 job runs for more than 5 minutes, hive can't refresh the lock heartbeat and then gets this 
 exception:
 2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
 (RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
 lock: 3645)
 at 
 org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
 at 
 org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
 at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
 at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
 at 
 org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
 at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
 at 
 org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
 at 
 org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
 at 

[jira] [Updated] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-11376:

Attachment: HIVE-11376_02.patch

 CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
 found for one of the input files
 -

 Key: HIVE-11376
 URL: https://issues.apache.org/jira/browse/HIVE-11376
 Project: Hive
  Issue Type: Bug
Reporter: Rajat Khandelwal
Assignee: Rajat Khandelwal
 Attachments: HIVE-11376_02.patch


 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
 This is the exact code snippet:
 {noformat}
 // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in
 // the tree or not, we use a configuration variable for the same
 if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
   // The following code should be removed, once
   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
   // so don't use CombineFileInputFormat for non-splittable files,
   // i.e., don't combine if the inputformat is a TextInputFormat and has
   // compression turned on
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal reassigned HIVE-11376:
---

Assignee: Rajat Khandelwal

 CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
 found for one of the input files
 -

 Key: HIVE-11376
 URL: https://issues.apache.org/jira/browse/HIVE-11376
 Project: Hive
  Issue Type: Bug
Reporter: Rajat Khandelwal
Assignee: Rajat Khandelwal
 Attachments: HIVE-11376_02.patch


 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
 This is the exact code snippet:
 {noformat}
 // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in
 // the tree or not, we use a configuration variable for the same
 if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
   // The following code should be removed, once
   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
   // so don't use CombineFileInputFormat for non-splittable files,
   // i.e., don't combine if the inputformat is a TextInputFormat and has
   // compression turned on
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-30 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11380:

Summary: NPE when FileSinkOperator is not initialized  (was: NPE when 
FileSinkOperator is not inialized)

 NPE when FileSinkOperator is not initialized
 

 Key: HIVE-11380
 URL: https://issues.apache.org/jira/browse/HIVE-11380
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen

 When FileSinkOperator's initializeOp is not called (which may happen when an 
 operator before FileSinkOperator fails during its own initializeOp), FileSinkOperator 
 will throw an NPE at close time. The stacktrace:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
 ... 18 more
 {noformat}
 This exception is misleading and often distracts users from finding the real 
 issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-30 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11380:

Attachment: HIVE-11380.1.patch

Added a null check to prevent the NPE caused by an uninitialized FileSinkOperator
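
A rough sketch of that kind of guard (illustrative only; the actual patch may differ, 
and "fsp" stands in for the per-path state that initializeOp() would normally create):
{code}
// If initializeOp() never ran (an upstream operator failed during its own
// initialization), there are no writers or bucket files to flush, so skip
// the close-time work instead of dereferencing null state.
protected void closeOp(boolean abort) throws HiveException {
  if (fsp == null) {
    return;
  }
  createBucketFiles(fsp);
  // ... the rest of the normal close path would follow here
}
{code}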

 NPE when FileSinkOperator is not initialized
 

 Key: HIVE-11380
 URL: https://issues.apache.org/jira/browse/HIVE-11380
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11380.1.patch


 When FileSinkOperator's initializeOp is not called (which may happen when an 
 operator before FileSinkOperator fails during its own initializeOp), FileSinkOperator 
 will throw an NPE at close time. The stacktrace:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
 ... 18 more
 {noformat}
 This exception is misleading and often distracts users from finding the real 
 issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11397:
---
Attachment: HIVE-11397.patch

 Parse Hive OR clauses as they are written into the AST
 --

 Key: HIVE-11397
 URL: https://issues.apache.org/jira/browse/HIVE-11397
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11397.patch


 When parsing A OR B OR C, Hive converts it into 
 (C OR B) OR A
 instead of turning it into
 A OR (B OR C)
 {code}
 GenericUDFOPOr or = new GenericUDFOPOr();
 List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
 expressions.add(previous);
 expressions.add(current);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647680#comment-14647680
 ] 

Hive QA commented on HIVE-11383:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747969/HIVE-11383.8.patch

{color:red}ERROR:{color} -1 due to 59 failed/errored test(s), 9276 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subq_where_serialization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_subquery1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in

[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11401:
---
Attachment: (was: HIVE-11401.1.patch)

 Predicate push down does not work with Parquet when partitions are in the 
 expression
 

 Key: HIVE-11401
 URL: https://issues.apache.org/jira/browse/HIVE-11401
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-11401.1.patch


 When filtering Parquet tables using a partition column, the query fails 
 saying the column does not exist:
 {noformat}
 hive> create table part1 (id int, content string) partitioned by (p string) 
 stored as parquet;
 hive> alter table part1 add partition (p='p1');
 hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
 hive> select id from part1 where p='p1';
 Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
 Column [p] was not found in schema!
 Time taken: 0.151 seconds
 {noformat}
 It is correct that the partition column is not part of the Parquet schema. 
 So, the fix should be to remove such expression from the Parquet PPD.
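 A hedged sketch of that idea (not the attached patch; both helper names below are 
 hypothetical): strip any predicate that references a partition column before the 
 filter is handed to the Parquet reader, since those columns are not in the file schema.
 {code}
 // Rewrite the pushed-down filter so it only mentions columns that exist in
 // the Parquet file schema; return null when nothing is left to push down.
 ExprNodeGenericFuncDesc pruneForParquetPushdown(ExprNodeGenericFuncDesc filter,
                                                 Set<String> partitionColumns) {
   // removeColumns() would walk the expression tree, drop leaf comparisons on
   // partition columns, and simplify the remaining AND/OR structure.
   return removeColumns(filter, partitionColumns);
 }
 {code}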



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11401:
---
Attachment: HIVE-11401.1.patch

 Predicate push down does not work with Parquet when partitions are in the 
 expression
 

 Key: HIVE-11401
 URL: https://issues.apache.org/jira/browse/HIVE-11401
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-11401.1.patch


 When filtering Parquet tables using a partition column, the query fails 
 saying the column does not exist:
 {noformat}
 hive> create table part1 (id int, content string) partitioned by (p string) 
 stored as parquet;
 hive> alter table part1 add partition (p='p1');
 hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
 hive> select id from part1 where p='p1';
 Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
 Column [p] was not found in schema!
 Time taken: 0.151 seconds
 {noformat}
 It is correct that the partition column is not part of the Parquet schema. 
 So, the fix should be to remove such expression from the Parquet PPD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11401:
---
Attachment: HIVE-11401.1.patch

 Predicate push down does not work with Parquet when partitions are in the 
 expression
 

 Key: HIVE-11401
 URL: https://issues.apache.org/jira/browse/HIVE-11401
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-11401.1.patch


 When filtering Parquet tables using a partition column, the query fails 
 saying the column does not exist:
 {noformat}
 hive> create table part1 (id int, content string) partitioned by (p string) 
 stored as parquet;
 hive> alter table part1 add partition (p='p1');
 hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
 hive> select id from part1 where p='p1';
 Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
 Column [p] was not found in schema!
 Time taken: 0.151 seconds
 {noformat}
 It is correct that the partition column is not part of the Parquet schema. 
 So, the fix should be to remove such expression from the Parquet PPD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Summary: Fix OOM in MapTask with many input partitions with RCFile  (was: 
Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's 
cachedLazyStruct weakly referenced)

 Fix OOM in MapTask with many input partitions with RCFile
 -

 Key: HIVE-11414
 URL: https://issues.apache.org/jira/browse/HIVE-11414
 Project: Hive
  Issue Type: Improvement
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Zheng Shao
Priority: Minor

 MapTask hit OOM in the following situation in our production environment:
 * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
 * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
 * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
 * MapTask memory Xmx: 1.5GB
 By analyzing the heap dump using jhat, we realized that the problem is:
 * One single mapper is processing many partitions (because of 
 CombineHiveInputFormat)
 * Each input path (equivalent to a partition here) will construct its own SerDe
 * Each SerDe will do its own caching of the deserialized object (and try to reuse 
 it), but will never release it (in this case, 
 serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take 
 a lot of space - pretty much the last N rows of a file where N is the number 
 of rows in a columnar block).
 * This problem may exist in other SerDes as well, but columnar file formats are 
 affected the most because they need a bigger cache for the last N rows instead 
 of 1 row.
 Proposed solution:
 * Make cachedLazyStruct a weakly referenced object (a sketch follows below).  Do 
 similar changes to other columnar serdes if any (e.g. maybe ORCFile's serde as well).
 Alternative solutions:
 * We can also free up the whole SerDe after processing a block/file.  The 
 problem with that is that the input splits may contain multiple blocks/files 
 that map to the same SerDe, and recreating a SerDe is just more work.
 * We can also move the SerDe creation/free-up to the place where the input file 
 changes.  But that requires a much bigger change to the code.
 * We can also add a cleanup() method to the SerDe interface that releases the 
 cached object, but that change is not backward compatible with many SerDes 
 that people have written.
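 A minimal sketch of the proposed direction (assumptions: the field type and the 
 createLazyStruct() factory are stand-ins, and a java.lang.ref.SoftReference is shown 
 where the proposal says weakly referenced):
 {code}
 // Hold the reusable deserialized row through a reference the GC may clear,
 // so a mapper that walks thousands of partitions does not keep the last
 // columnar block of every partition's SerDe alive at once.
 private SoftReference<ColumnarStructBase> cachedLazyStructRef;
 
 ColumnarStructBase getCachedLazyStruct() {
   ColumnarStructBase cached =
       cachedLazyStructRef == null ? null : cachedLazyStructRef.get();
   if (cached == null) {
     cached = createLazyStruct();                 // hypothetical factory
     cachedLazyStructRef = new SoftReference<ColumnarStructBase>(cached);
   }
   return cached;
 }
 {code}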



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's cachedLazyStruct weakly referenced

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Component/s: File Formats

 Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's 
 cachedLazyStruct weakly referenced
 --

 Key: HIVE-11414
 URL: https://issues.apache.org/jira/browse/HIVE-11414
 Project: Hive
  Issue Type: Improvement
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Zheng Shao
Priority: Minor

 MapTask hit OOM in the following situation in our production environment:
 * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
 * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
 * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
 * MapTask memory Xmx: 1.5GB
 By analyzing the heap dump using jhat, we realized that the problem is:
 * One single mapper is processing many partitions (because of 
 CombineHiveInputFormat)
 * Each input path (equivalent to a partition here) will construct its own SerDe
 * Each SerDe will do its own caching of the deserialized object (and try to reuse 
 it), but will never release it (in this case, 
 serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take 
 a lot of space - pretty much the last N rows of a file where N is the number 
 of rows in a columnar block).
 * This problem may exist in other SerDes as well, but columnar file formats are 
 affected the most because they need a bigger cache for the last N rows instead 
 of 1 row.
 Proposed solution:
 * Make cachedLazyStruct a weakly referenced object.  Do similar changes to 
 other columnar serdes if any (e.g. maybe ORCFile's serde as well).
 Alternative solutions:
 * We can also free up the whole SerDe after processing a block/file.  The 
 problem with that is that the input splits may contain multiple blocks/files 
 that map to the same SerDe, and recreating a SerDe is just more work.
 * We can also move the SerDe creation/free-up to the place where the input file 
 changes.  But that requires a much bigger change to the code.
 * We can also add a cleanup() method to the SerDe interface that releases the 
 cached object, but that change is not backward compatible with many SerDes 
 that people have written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648281#comment-14648281
 ] 

Sushanth Sowmyan commented on HIVE-11407:
-

The edits look good, +1.

 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM.
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11401:
---
Attachment: HIVE-11401.2.patch

 Predicate push down does not work with Parquet when partitions are in the 
 expression
 

 Key: HIVE-11401
 URL: https://issues.apache.org/jira/browse/HIVE-11401
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-11401.1.patch, HIVE-11401.2.patch


 When filtering Parquet tables using a partition column, the query fails 
 saying the column does not exist:
 {noformat}
 hive> create table part1 (id int, content string) partitioned by (p string) 
 stored as parquet;
 hive> alter table part1 add partition (p='p1');
 hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
 hive> select id from part1 where p='p1';
 Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
 Column [p] was not found in schema!
 Time taken: 0.151 seconds
 {noformat}
 It is correct that the partition column is not part of the Parquet schema. 
 So, the fix should be to remove such expression from the Parquet PPD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: HIVE-11407-branch-1.0.patch

 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407-branch-1.0.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM.
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2015-07-30 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648302#comment-14648302
 ] 

Vaibhav Gumashta commented on HIVE-11408:
-

Looks like we fixed this in 1.2 via HIVE-10329.

 HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
 used
 ---

 Key: HIVE-11408
 URL: https://issues.apache.org/jira/browse/HIVE-11408
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta

 I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
 (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
 Basically, add jar creates a new classloader for loading the classes from the 
 new jar and adds the new classloader to the SessionState object of user's 
 session, making the older one its parent. Creating a temporary function uses 
 the new classloader to load the class used for the function. On closing a 
 session, although there is code to close the classloader for the session, I'm 
 not seeing the new classloader getting GCed and from the heapdump I can see 
 it holds on to the temporary function's class that should have gone away 
 after the session close. 
 Steps to reproduce:
 1.
 {code}
 jdbc:hive2://localhost:1/ add jar hdfs:///tmp/audf.jar;
 {code}
 2. 
 Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
 added.
 3. 
 {code}
 jdbc:hive2://localhost:1/ CREATE TEMPORARY FUNCTION funcA AS 
 'org.gumashta.udf.AUDF'; 
 {code}
 4. 
 Close the jdbc session.
 5. 
 Take the memory snapshot and verify that the new URLClassLoader is indeed 
 there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
 session which we already closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Description: 
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to a partition here) will construct its own SerDe
* Each SerDe will do its own caching of the deserialized object (and try to reuse 
it), but will never release it (in this case, 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
1 row.

Proposed solution:
* Make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the place where the input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with many SerDes that 
people have written.


  was:
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of the deserialized object (and try to reuse 
it), but will never release it (in this case, the 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file, where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
1 row.

Proposed solution:
* Make cachedLazyStruct a weakly referenced object.  Do similar changes to 
other columnar serde if any (e.g. maybe ORCFile's serde as well).

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that maps to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the place when input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes that 
people have written.



 Fix OOM in MapTask with many input partitions with RCFile
 -

 Key: HIVE-11414
 URL: https://issues.apache.org/jira/browse/HIVE-11414
 Project: Hive
  Issue Type: Improvement
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Zheng Shao
Priority: Minor

 MapTask hit OOM in the following situation in our production environment:
 * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
 * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
 * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
 * MapTask memory Xmx: 1.5GB
 By analyzing the heap dump using jhat, we realized that the problem is:
 * One single mapper is processing many partitions (because of 
 CombineHiveInputFormat)
 * Each input path (equivalent to partition here) will construct its own SerDe
 * Each SerDe will do its own caching of deserialized object (and try to reuse 
 it), but will never release it (in this case, the 
 serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take 
 a lot of space - pretty much the last N rows of a file where N is the number 
 of rows in a columnar block).
 * 

[jira] [Updated] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks

2015-07-30 Thread Raajay Viswanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raajay Viswanathan updated HIVE-11413:
--
Attachment: HIVE-11413.patch

Check if _saHooks_ is empty instead of checking if it is NULL. Need code review.

 Error in detecting availability of HiveSemanticAnalyzerHooks
 

 Key: HIVE-11413
 URL: https://issues.apache.org/jira/browse/HIVE-11413
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.0.0
Reporter: Raajay Viswanathan
Assignee: Raajay Viswanathan
Priority: Trivial
  Labels: newbie
 Attachments: HIVE-11413.patch


 In {{compile(String, Boolean)}} function in {{Driver.java}}, the list of 
 available {{HiveSemanticAnalyzerHook}} (_saHooks_) are obtained using the 
 {{getHooks}} method. This method always  returns a {{List}} of hooks. 
 However, while checking for availability of hooks, the current version of the 
 code uses a comparison of _saHooks_ with NULL. This is incorrect, as the 
 segment of code designed to call pre and post Analyze functions gets executed 
 even when the list is empty. The comparison should be changed to 
 {{saHooks.size() > 0}}.
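
To make the intended fix concrete, here is a small, self-contained sketch of the difference between the two checks (Runnable stands in for HiveSemanticAnalyzerHook, and getHooks() below is a toy stand-in for the Driver method, not the real code):

{code}
import java.util.ArrayList;
import java.util.List;

public class EmptyVsNullCheck {
  // Simplified stand-in for Driver#getHooks: returns a list that is never null,
  // but may be empty when no hooks are configured.
  static List<Runnable> getHooks() {
    return new ArrayList<>();
  }

  public static void main(String[] args) {
    List<Runnable> saHooks = getHooks();

    // Buggy check: always true, because getHooks() never returns null,
    // so the pre/postAnalyze path runs even with no hooks configured.
    if (saHooks != null) {
      System.out.println("null check: hook path entered");
    }

    // Intended check: only enter the hook path when hooks are actually present.
    if (saHooks != null && saHooks.size() > 0) {
      System.out.println("size check: hook path entered");
    } else {
      System.out.println("size check: hook path skipped");
    }
  }
}
{code}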



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: HIVE-11407.1.patch

Patch for master and branch-1 .


 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM .
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648237#comment-14648237
 ] 

Thejas M Nair edited comment on HIVE-11407 at 7/30/15 8:17 PM:
---

HIVE-11407.1.patch - Patch for master and branch-1 .



was (Author: thejas):
Patch for master and branch-1 .


 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM .
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648241#comment-14648241
 ] 

Thejas M Nair commented on HIVE-11407:
--

[~sushanth] Can you please review my edits to your patch ? 


 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM .
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: (was: HIVE-11407-branch-1.0.patch)

 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM .
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: (was: HIVE-11407-branch-1.0.patch)

 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM .
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: HIVE-11407-branch-1.0.patch

 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM .
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2015-07-30 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648458#comment-14648458
 ] 

Eugene Koifman commented on HIVE-11418:
---

This feels dangerous, but if rm -Rf exists perhaps this is valid as well.

On a separate note, setting fs.trash.interval in a Hive session (or 
hive-site.xml) will lead to unexpected behavior:
the Hadoop code won't see this value. (HIVE-10986)

 Dropping a database in an encryption zone with CASCADE and trash enabled fails
 --

 Key: HIVE-11418
 URL: https://issues.apache.org/jira/browse/HIVE-11418
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 1.2.0
Reporter: Sergio Peña

 Here's the query that fails:
 {noformat}
 hive> CREATE DATABASE db;
 hive> USE db;
 hive> CREATE TABLE a(id int);
 hive> SET fs.trash.interval=1;
 hive> DROP DATABASE db CASCADE;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
 db.a because it is in an encryption zone and trash
  is enabled.  Use PURGE option to skip trash.)
 {noformat}
 DROP DATABASE does not support PURGE, so we have to remove the tables one by 
 one, and then drop the database.
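
As a rough sketch of that manual workaround over JDBC (the connection URL, credentials, and database name are placeholders, and it assumes the session user may drop every table; this is not a proposed fix):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class DropDbPurgeWorkaround {
  public static void main(String[] args) throws SQLException {
    // Placeholder connection URL/credentials; requires the Hive JDBC driver on the classpath.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
         Statement stmt = conn.createStatement()) {
      // Collect the table names first, since reusing the Statement closes the ResultSet.
      List<String> tables = new ArrayList<>();
      try (ResultSet rs = stmt.executeQuery("SHOW TABLES IN db")) {
        while (rs.next()) {
          tables.add(rs.getString(1));
        }
      }
      // Drop each table with PURGE so its data skips the trash inside the encryption zone.
      for (String table : tables) {
        stmt.execute("DROP TABLE IF EXISTS db.`" + table + "` PURGE");
      }
      // Once the database is empty, the plain DROP DATABASE succeeds.
      stmt.execute("DROP DATABASE db");
    }
  }
}
{code}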



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-30 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648523#comment-14648523
 ] 

Mostafa Mokhtar commented on HIVE-11410:


[~mmccline]

 Join with subquery containing a group by incorrectly returns no results
 ---

 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Priority: Minor
 Attachments: hive-site.xml


 Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
 row of data. For example create table *t* from an existing table which 
 contains at least 1 row of data by running:
 {code}
 create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
 {code}
 Table *t* looks like the following:
 ||c1||c2||
 |abc|0|
 Running the following query then returns zero results.
 {code}
 SELECT 
   t1.c1
 FROM 
   t t1
 JOIN
 (SELECT 
t2.c1,
MAX(t2.c2) AS c2
  FROM 
t t2 
  GROUP BY 
t2.c1
 ) t3
 ON t1.c2=t3.c2
 {code}
 However, we expected to see the following:
 ||c1||
 |abc|
 The problem seems to relate to the fact that in the subquery, we group by 
 column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default

2015-07-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648470#comment-14648470
 ] 

Sergio Peña commented on HIVE-10884:


Does this issue happen only with the attached patch? Or did it happen because I 
enabled the TestBeeLineDriver tests?
The directory is preserved on the jenkins slaves, but those slaves expire 
after a while and are then destroyed, so we don't have access to those logs 
anymore.

 Enable some beeline tests and turn on HIVE-4239 by default
 --

 Key: HIVE-10884
 URL: https://issues.apache.org/jira/browse/HIVE-10884
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, 
 HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, 
 HIVE-10884.06.patch, HIVE-10884.07.patch, HIVE-10884.07.patch, 
 HIVE-10884.patch


 See comments in HIVE-4239.
 Beeline tests with parallelism need to be enabled to turn compilation 
 parallelism on by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Attachment: (was: HIVE-10863.0-spark.patch)

 Merge trunk to Spark branch 7/29/2015 [Spark Branch]
 

 Key: HIVE-10863
 URL: https://issues.apache.org/jira/browse/HIVE-10863
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: mj.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11160) Auto-gather column stats

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648420#comment-14648420
 ] 

Hive QA commented on HIVE-11160:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748041/HIVE-11160.03.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9277 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4765/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4765/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4765/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748041 - PreCommit-HIVE-TRUNK-Build

 Auto-gather column stats
 

 Key: HIVE-11160
 URL: https://issues.apache.org/jira/browse/HIVE-11160
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11160.01.patch, HIVE-11160.02.patch, 
 HIVE-11160.03.patch


 Hive will collect table stats when hive.stats.autogather is set to true during the 
 INSERT OVERWRITE command. The users then need to collect the column stats 
 themselves using the ANALYZE command. With this patch, the column stats will also 
 be collected automatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2015-07-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648434#comment-14648434
 ] 

Sergio Peña commented on HIVE-11418:


I think we should support PURGE when dropping a database as well. 
[~ekoifman] What do you think about this?

 Dropping a database in an encryption zone with CASCADE and trash enabled fails
 --

 Key: HIVE-11418
 URL: https://issues.apache.org/jira/browse/HIVE-11418
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 1.2.0
Reporter: Sergio Peña

 Here's the query that fails:
 {noformat}
 hive> CREATE DATABASE db;
 hive> USE db;
 hive> CREATE TABLE a(id int);
 hive> SET fs.trash.interval=1;
 hive> DROP DATABASE db CASCADE;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
 db.a because it is in an encryption zone and trash
  is enabled.  Use PURGE option to skip trash.)
 {noformat}
 DROP DATABASE does not support PURGE, so we have to remove the tables one by 
 one, and then drop the database.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2015-07-30 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648468#comment-14648468
 ] 

Eugene Koifman commented on HIVE-11418:
---

I meant the Hadoop code that actually checks whether a file should be moved to trash.

 Dropping a database in an encryption zone with CASCADE and trash enabled fails
 --

 Key: HIVE-11418
 URL: https://issues.apache.org/jira/browse/HIVE-11418
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 1.2.0
Reporter: Sergio Peña

 Here's the query that fails:
 {noformat}
 hive> CREATE DATABASE db;
 hive> USE db;
 hive> CREATE TABLE a(id int);
 hive> SET fs.trash.interval=1;
 hive> DROP DATABASE db CASCADE;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
 db.a because it is in an encryption zone and trash
  is enabled.  Use PURGE option to skip trash.)
 {noformat}
 DROP DATABASE does not support PURGE, so we have to remove the tables one by 
 one, and then drop the database.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10863) Merge master to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648520#comment-14648520
 ] 

Hive QA commented on HIVE-10863:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748082/HIVE-10863.1-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7742 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/945/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/945/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-945/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748082 - PreCommit-HIVE-SPARK-Build

 Merge master to Spark branch 7/29/2015 [Spark Branch]
 -

 Key: HIVE-10863
 URL: https://issues.apache.org/jira/browse/HIVE-10863
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-10863.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8128) Improve Parquet Vectorization

2015-07-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648461#comment-14648461
 ] 

Sergio Peña commented on HIVE-8128:
---

Parquet 1.8.1 is now officially released.  Would it help if we bump up to 1.8.1?

 Improve Parquet Vectorization
 -

 Key: HIVE-8128
 URL: https://issues.apache.org/jira/browse/HIVE-8128
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Dong Chen
 Fix For: parquet-branch

 Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch, 
 HIVE-8128.6-parquet.patch, HIVE-8128.6-parquet.patch, testParquetFile


 What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde), 
 which was partially done in HIVE-5998.
 As discussed in PARQUET-131, we will work out a Hive POC based on the new 
 Parquet vectorized API, and then finish the implementation after it is finalized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Description: 
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of the deserialized object (and try to reuse 
it), but will never release it (in this case, the 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file, where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
1 row.

Proposed solution:
* Remove cachedLazyStruct in serde2.columnar.ColumnarSerDeBase.  The cost 
saving of not recreating a single object is too small compared to processing N 
rows.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that maps to the same SerDe, and recreating a SerDe is a much bigger change to 
the code.
* We can also move the SerDe creation/free-up to the place when input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes that 
people have written.
* We can make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object, but that feels like overkill.



  was:
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of the deserialized object (and try to reuse 
it), but will never release it (in this case, the 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file, where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
1 row.

Proposed solution:
* Remove cachedLazyStruct in serde2.columnar.ColumnarSerDeBase.  The cost 
saving of not recreating a single object is too small compared to processing N 
rows.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that maps to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the place when input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes that 
people have written.
* We can make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object, but that feels like overkill.




 Fix OOM in MapTask with many input partitions with RCFile
 -

 Key: HIVE-11414
 URL: https://issues.apache.org/jira/browse/HIVE-11414
 Project: Hive
  Issue Type: Improvement
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
Reporter: Zheng Shao
Priority: Minor

 MapTask hit OOM in the following situation in our production environment:
 * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
 * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
 * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
 * MapTask memory Xmx: 1.5GB
 By analyzing the heap dump using jhat, we realized that the problem is:
 * One single mapper is processing many partitions (because of 
 CombineHiveInputFormat)
 * Each input path (equivalent to 

[jira] [Commented] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648541#comment-14648541
 ] 

Hive QA commented on HIVE-11409:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748059/HIVE-11409.02.patch

{color:green}SUCCESS:{color} +1 9276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4766/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4766/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4766/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748059 - PreCommit-HIVE-TRUNK-Build

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before 
 UNION
 --

 Key: HIVE-11409
 URL: https://issues.apache.org/jira/browse/HIVE-11409
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11409.01.patch, HIVE-11409.02.patch


 Three purposes: (1) to ensure that the data type of a non-primary branch (the 1st 
 branch is the primary branch) of the union can be cast to that of the primary 
 branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL is 
 redundant, it will be removed by the IdentityProjectRemover optimizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-11423) Ship hive-storage-api along with hive-exec jar to all Tasks

2015-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648726#comment-14648726
 ] 

Xuefu Zhang edited comment on HIVE-11423 at 7/31/15 4:20 AM:
-

FYI: this issue was found in Spark branch (HIVE-10863) and fix was included  in 
patch for HIVE-10166.


was (Author: xuefuz):
FYI: this issue was found in Spark branch (HIVE-10853) and fix was included  in 
patch for HIVE-10166.

 Ship hive-storage-api along with hive-exec jar to all Tasks
 ---

 Key: HIVE-11423
 URL: https://issues.apache.org/jira/browse/HIVE-11423
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 2.0.0
Reporter: Gopal V
Priority: Blocker

 After moving critical classes into hive-storage-api, those classes are needed 
 for queries to execute successfully.
 Currently all queries fail with ClassNotFound exceptions on a large 
 cluster.
 {code}
 Caused by: java.lang.NoClassDefFoundError: 
 Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
 at java.lang.Class.getDeclaredFields0(Native Method)
 at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
 at java.lang.Class.getDeclaredFields(Class.java:1916)
 at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:150)
 at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:109)
 ... 57 more
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
 at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 ... 62 more
 {code}
 Temporary workaround added to hiverc: {{add jar 
 ./dist/hive/lib/hive-storage-api-2.0.0-SNAPSHOT.jar;}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-11423) Ship hive-storage-api along with hive-exec jar to all Tasks

2015-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648726#comment-14648726
 ] 

Xuefu Zhang edited comment on HIVE-11423 at 7/31/15 4:19 AM:
-

FYI: this issue was found in Spark branch (HIVE-10853) and fix was included  in 
patch for HIVE-10166.


was (Author: xuefuz):
FYI: this issue was found in Spark branch (HIVE-10835) and fix was included  in 
patch for HIVE-10166.

 Ship hive-storage-api along with hive-exec jar to all Tasks
 ---

 Key: HIVE-11423
 URL: https://issues.apache.org/jira/browse/HIVE-11423
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 2.0.0
Reporter: Gopal V
Priority: Blocker

 After moving critical classes into hive-storage-api, those classes are needed 
 for queries to execute successfully.
 Currently all queries fail with ClassNotFound exceptions on a large 
 cluster.
 {code}
 Caused by: java.lang.NoClassDefFoundError: 
 Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
 at java.lang.Class.getDeclaredFields0(Native Method)
 at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
 at java.lang.Class.getDeclaredFields(Class.java:1916)
 at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:150)
 at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:109)
 ... 57 more
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
 at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 ... 62 more
 {code}
 Temporary workaround added to hiverc: {{add jar 
 ./dist/hive/lib/hive-storage-api-2.0.0-SNAPSHOT.jar;}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11355:
--
Attachment: HIVE-11355.3.patch

Union tests fix.

 Hive on tez: memory manager for sort buffers (input/output) and operators
 -

 Key: HIVE-11355
 URL: https://issues.apache.org/jira/browse/HIVE-11355
 Project: Hive
  Issue Type: Improvement
  Components: Tez
Affects Versions: 2.0.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
 HIVE-11355.3.patch


 We need to better manage the sort buffer allocations to ensure better 
 performance. Also, we need to provide configurations to certain operators to 
 stay within memory limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11384) Add Test case which cover both HIVE-11271 and HIVE-11333

2015-07-30 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648619#comment-14648619
 ] 

Yongzhi Chen commented on HIVE-11384:
-

Thanks [~szehon] for reviewing it. 

 Add Test case which cover both HIVE-11271 and HIVE-11333
 

 Key: HIVE-11384
 URL: https://issues.apache.org/jira/browse/HIVE-11384
 Project: Hive
  Issue Type: Test
  Components: Logical Optimizer, Parser
Affects Versions: 0.14.0, 1.0.0, 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11384.1.patch


 Add some test queries that need both HIVE-11271 and HIVE-11333 to be fixed in 
 order to pass. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648623#comment-14648623
 ] 

Hive QA commented on HIVE-11407:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748056/HIVE-11407.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9276 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4767/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4767/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4767/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748056 - PreCommit-HIVE-TRUNK-Build

 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM .
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11329) Column prefix in key of hbase column prefix map

2015-07-30 Thread Wojciech Indyk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648781#comment-14648781
 ] 

Wojciech Indyk commented on HIVE-11329:
---

sure, here is the request: https://reviews.apache.org/r/36974/

 Column prefix in key of hbase column prefix map
 ---

 Key: HIVE-11329
 URL: https://issues.apache.org/jira/browse/HIVE-11329
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Wojciech Indyk
Assignee: Wojciech Indyk
Priority: Minor
 Attachments: HIVE-11329.1.patch


 When I create a table with an hbase column prefix 
 (https://issues.apache.org/jira/browse/HIVE-3725), the prefix ends up in the resulting 
 map in hive. 
 E.g. record in HBase
 rowkey: 123
 column: tag_one, value: 0.5
 column: tag_two, value: 0.5
 representation in Hive via column prefix mapping tag_.*:
 column: tag map<string,string>
 key: tag_one, value: 0.5
 key: tag_two, value: 0.5
 should be:
 key: one, value: 0.5
 key: two, value: 0.5
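
A small, self-contained sketch of the expected key handling (the prefix matching is simplified to a plain string prefix for illustration; the real mapping is driven by the hbase.columns.mapping pattern):

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixKeyStripping {
  // Strips the mapped prefix from an HBase qualifier before using it as the map key,
  // so "tag_one" becomes "one". Assumes a plain string prefix for illustration.
  static String toMapKey(String qualifier, String prefix) {
    return qualifier.startsWith(prefix) ? qualifier.substring(prefix.length()) : qualifier;
  }

  public static void main(String[] args) {
    Map<String, String> tag = new LinkedHashMap<>();
    for (String qualifier : new String[] {"tag_one", "tag_two"}) {
      tag.put(toMapKey(qualifier, "tag_"), "0.5");
    }
    System.out.println(tag); // {one=0.5, two=0.5}
  }
}
{code}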



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648766#comment-14648766
 ] 

Hive QA commented on HIVE-11416:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748086/HIVE-11416.01.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9274 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4769/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4769/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4769/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748086 - PreCommit-HIVE-TRUNK-Build

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby 
 Optimizer assumes the schema can match after removing RS and GBY
 --

 Key: HIVE-11416
 URL: https://issues.apache.org/jira/browse/HIVE-11416
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11416.01.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648150#comment-14648150
 ] 

Hive QA commented on HIVE-11380:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748014/HIVE-11380.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9276 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4763/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4763/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4763/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748014 - PreCommit-HIVE-TRUNK-Build

 NPE when FileSinkOperator is not initialized
 

 Key: HIVE-11380
 URL: https://issues.apache.org/jira/browse/HIVE-11380
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11380.1.patch


 When FileSinkOperator's initializeOp is not called (which may happen when an 
 operator before FileSinkOperator fails in its initializeOp), FileSinkOperator will 
 throw an NPE at close time. The stacktrace:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
 ... 18 more
 {noformat}
 This Exception is misleading and often distracts users from finding real 
 issues. 
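
A simplified, self-contained sketch of guarding close when initialization never ran (the operator below is a toy, not FileSinkOperator itself, and the field name is made up for the example):

{code}
public class GuardedCloseSketch {
  // Toy operator: writers is only allocated in initializeOp(), mirroring how the
  // real operator only builds its bucket-file state during initialization.
  static class ToySinkOperator {
    private Object[] writers;

    void initializeOp() {
      writers = new Object[1];
    }

    void closeOp(boolean abort) {
      // Null check: if initializeOp() never ran, there is nothing to flush,
      // so skip the bucket-file handling instead of throwing an NPE.
      if (writers == null) {
        return;
      }
      // ... flush and close writers here ...
    }
  }

  public static void main(String[] args) {
    ToySinkOperator op = new ToySinkOperator();
    op.closeOp(true); // safe even though initializeOp() was never called
  }
}
{code}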



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647241#comment-14647241
 ] 

Hive QA commented on HIVE-10319:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747893/HIVE-10319.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4755/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4755/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4755/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-metastore 
---
[INFO] Compiling 244 source files to 
/data/hive-ptest/working/apache-github-source-source/metastore/target/classes
[INFO] -
[WARNING] COMPILATION WARNING : 
[INFO] -
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetOpenTxnsResponse.java:
 Some input files use unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetOpenTxnsResponse.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 4 warnings 
[INFO] -
[INFO] -
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[71,44]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: package org.apache.hadoop.hive.metastore.api
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[5544,12]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: class org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[217,12]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: interface 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Iface
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java:[79,44]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: package org.apache.hadoop.hive.metastore.api
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:[40,44]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: package org.apache.hadoop.hive.metastore.api
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java:[2049,10]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: class org.apache.hadoop.hive.metastore.HiveMetaStoreClient
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:[1134,3]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: interface org.apache.hadoop.hive.metastore.IMetaStoreClient
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[7491,14]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.get_all_functions_call
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[3300,12]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
[ERROR] 

[jira] [Updated] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-30 Thread Nezih Yigitbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nezih Yigitbasi updated HIVE-10319:
---
Attachment: HIVE-10319.5.patch

 Hive CLI startup takes a long time with a large number of databases
 ---

 Key: HIVE-10319
 URL: https://issues.apache.org/jira/browse/HIVE-10319
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 1.0.0
Reporter: Nezih Yigitbasi
Assignee: Nezih Yigitbasi
 Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
 HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch


 The Hive CLI takes a long time to start when there is a large number of 
 databases in the DW. I think the root cause is the way permanent UDFs are 
 loaded from the metastore. When I looked at the logs and the source code I 
 see that at startup Hive first gets all the databases from the metastore and 
 then for each database it makes a metastore call to get the permanent 
 functions for that database [see Hive.java | 
 https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
  So the number of metastore calls made is in the order of the number of 
 databases. In production we have several hundreds of databases so Hive makes 
 several hundreds of RPC calls during startup, taking 30+ seconds.
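
To make the difference concrete, a rough sketch of the per-database loop versus a single batched call; FunctionClient and its methods are hypothetical stand-ins, not the real metastore client API:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FunctionLoadingSketch {
  // Hypothetical stand-in for the metastore client; NOT the real IMetaStoreClient API.
  interface FunctionClient {
    List<String> getDatabases();
    List<String> getFunctions(String db);   // one RPC per database today
    List<String> getAllFunctions();         // single RPC covering every database
  }

  // Current behaviour: O(number of databases) metastore round trips at CLI startup.
  static List<String> loadPerDatabase(FunctionClient client) {
    List<String> all = new ArrayList<>();
    for (String db : client.getDatabases()) {
      all.addAll(client.getFunctions(db));
    }
    return all;
  }

  // Proposed behaviour: one round trip regardless of how many databases exist.
  static List<String> loadBatched(FunctionClient client) {
    return client.getAllFunctions();
  }

  public static void main(String[] args) {
    // Toy in-memory client standing in for several hundred databases.
    FunctionClient client = new FunctionClient() {
      public List<String> getDatabases() { return Arrays.asList("db1", "db2"); }
      public List<String> getFunctions(String db) { return Arrays.asList(db + ".udf"); }
      public List<String> getAllFunctions() { return Arrays.asList("db1.udf", "db2.udf"); }
    };
    System.out.println(loadPerDatabase(client));  // two calls to getFunctions
    System.out.println(loadBatched(client));      // one call to getAllFunctions
  }
}
{code}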



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-07-30 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647266#comment-14647266
 ] 

Carl Steinbach commented on HIVE-4239:
--

It should probably go in both the hs2 and compiler sections.



 Remove lock on compilation stage
 

 Key: HIVE-4239
 URL: https://issues.apache.org/jira/browse/HIVE-4239
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Carl Steinbach
Assignee: Sergey Shelukhin
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
 HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
 HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11406) Vectorization: StringExpr::compare() == 0 is bad for performance

2015-07-30 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11406:

Attachment: HIVE-11406.01.patch

 Vectorization: StringExpr::compare() == 0 is bad for performance
 

 Key: HIVE-11406
 URL: https://issues.apache.org/jira/browse/HIVE-11406
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-11406.01.patch


 {{StringExpr::compare() == 0}} is forced to evaluate the whole memory 
 comparison loop for differing lengths of strings, though there is no 
 possibility they will ever be equal.
 Add a {{StringExpr::equals}} which can be a smaller and tighter loop.
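
For illustration, a self-contained sketch of such an equals check (the byte-array/offset/length signature mirrors the style of the existing compare(), but this is not the committed implementation):

{code}
public class StringEquals {
  // Bails out immediately on a length mismatch instead of running the full
  // lexicographic comparison that compare(...) == 0 would perform.
  static boolean equal(byte[] arg1, int start1, int len1, byte[] arg2, int start2, int len2) {
    if (len1 != len2) {
      return false;
    }
    for (int i = 0; i < len1; i++) {
      if (arg1[start1 + i] != arg2[start2 + i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] a = "abc".getBytes();
    byte[] b = "abcd".getBytes();
    System.out.println(equal(a, 0, a.length, b, 0, b.length)); // false without scanning bytes
  }
}
{code}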



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner

2015-07-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8343:
-
Description: 
In addEvent() and processVertex(), there are calls such as the following:
{code}
  queue.offer(event);
{code}

The return value should be checked. If false is returned, the event would not have 
been queued.
Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html
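
For illustration, a minimal sketch of two safer patterns (the String event and the capacity of 1 are just placeholders to force the full-queue case):

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferCheck {
  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
    queue.put("first");

    // Option 1: check the return value so a dropped event is at least detected.
    if (!queue.offer("second")) {
      System.out.println("queue full, event not enqueued");
    }

    // Option 2: block until space is available, so the event cannot be silently lost.
    // queue.put("second");
  }
}
{code}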

  was:
In addEvent() and processVertex(), there is call such as the following:
{code}
  queue.offer(event);
{code}
The return value should be checked. If false is returned, event would not have 
been queued.
Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html


 Return value from BlockingQueue.offer() is not checked in 
 DynamicPartitionPruner
 

 Key: HIVE-8343
 URL: https://issues.apache.org/jira/browse/HIVE-8343
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: JongWon Park
Priority: Minor
 Attachments: HIVE-8343.patch


 In addEvent() and processVertex(), there is call such as the following:
 {code}
   queue.offer(event);
 {code}
 The return value should be checked. If false is returned, event would not 
 have been queued.
 Take a look at line 328 in:
 http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11384) Add Test case which cover both HIVE-11271 and HIVE-11333

2015-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648058#comment-14648058
 ] 

Szehon Ho commented on HIVE-11384:
--

No problem, makes sense.  +1, always good to have more tests.

 Add Test case which cover both HIVE-11271 and HIVE-11333
 

 Key: HIVE-11384
 URL: https://issues.apache.org/jira/browse/HIVE-11384
 Project: Hive
  Issue Type: Test
  Components: Logical Optimizer, Parser
Affects Versions: 0.14.0, 1.0.0, 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11384.1.patch


 Add some test queries that need both HIVE-11271 and HIVE-11333 to be fixed in 
 order to pass. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648063#comment-14648063
 ] 

Szehon Ho commented on HIVE-11380:
--

+1, seems good to add the null check here to me

 NPE when FileSinkOperator is not initialized
 

 Key: HIVE-11380
 URL: https://issues.apache.org/jira/browse/HIVE-11380
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11380.1.patch


 When FileSinkOperator's initializeOp is not called (which may happen when an 
 operator before FileSinkOperator fails in its initializeOp), FileSinkOperator will 
 throw an NPE at close time. The stacktrace:
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
 ... 18 more
 {noformat}
 This Exception is misleading and often distracts users from finding real 
 issues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-07-30 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648081#comment-14648081
 ] 

Prasanth Jayachandran commented on HIVE-11405:
--

[~gopalv] are the column stats available for this query? If not, your patch will terminate early because the data size becomes 0 and the AND evaluation terminates early. Also, I am not sure if this assumption is correct:
{code}
final long branch2Rows = (newNumRows <= branchRows) ? 0 : (newNumRows - branchRows);
{code}

I am still evaluating this change. The idea of mirroring the tree and passing the branchRows to the sibling branch looks good so far.
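
For reference, a self-contained sketch of the short-circuit idea being discussed; the branch selectivity functions are made up for illustration and this is not the actual FilterStatsRule code:
{code}
import java.util.List;
import java.util.function.LongUnaryOperator;

// Sketch: each OR branch is evaluated against the rows the previous branches
// did NOT match, so the recursion can stop as soon as no rows remain.
public class OrStatsSketch {
  static long evaluateOr(long numRows, List<LongUnaryOperator> branches) {
    long remaining = numRows;
    long matched = 0;
    for (LongUnaryOperator branch : branches) {
      if (remaining <= 0) {
        break;                              // early termination
      }
      long branchRows = Math.min(branch.applyAsLong(remaining), remaining);
      matched += branchRows;
      remaining -= branchRows;              // sibling branches only see unmatched rows
    }
    return matched;
  }

  public static void main(String[] args) {
    // three equality branches, each estimated to match half of the rows it sees
    System.out.println(evaluateOr(1000, List.of(
        r -> r / 2, r -> r / 2, r -> r / 2)));   // prints 875
  }
}
{code}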

 Add early termination for recursion in 
 StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
 --

 Key: HIVE-11405
 URL: https://issues.apache.org/jira/browse/HIVE-11405
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Prasanth Jayachandran

 Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
 him,
 The recursion protection works well with an AND expr, but it doesn't work 
 against
 (OR a=1 (OR a=2 (OR a=3 (OR ...)
 since the rows will never be reduced during recursion due to the 
 nature of the OR.
 We need to execute a short-circuit to satisfy the OR properly - no case which 
 matches a=1 qualifies for the rest of the filters.
 Recursion should pass in the numRows - branch1Rows for the branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648100#comment-14648100
 ] 

Szehon Ho commented on HIVE-11401:
--

+1 makes sense from my end.

 Predicate push down does not work with Parquet when partitions are in the 
 expression
 

 Key: HIVE-11401
 URL: https://issues.apache.org/jira/browse/HIVE-11401
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-11401.1.patch


 When filtering Parquet tables using a partition column, the query fails 
 saying the column does not exist:
 {noformat}
 hive> create table part1 (id int, content string) partitioned by (p string) stored as parquet;
 hive> alter table part1 add partition (p='p1');
 hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
 hive> select id from part1 where p='p1';
 Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
 Column [p] was not found in schema!
 Time taken: 0.151 seconds
 {noformat}
 It is correct that the partition column is not part of the Parquet schema. 
 So, the fix should be to remove such expression from the Parquet PPD.
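
 A minimal sketch of the shape of such a fix, assuming a simplified predicate representation (this is not the actual Hive/Parquet code):
 {code}
 import java.util.List;
 import java.util.Set;
 import java.util.stream.Collectors;

 // Sketch: drop predicates that reference partition columns before building the
 // Parquet filter, since partition columns are not present in the file schema.
 public class PartitionPredicatePrunerSketch {
   record Predicate(String column, String op, String value) {}

   static List<Predicate> pruneForParquet(List<Predicate> filter, Set<String> partitionCols) {
     // Partition-column predicates are already applied by partition pruning, so
     // removing them here avoids "Column [p] was not found in schema!".
     return filter.stream()
         .filter(p -> !partitionCols.contains(p.column()))
         .collect(Collectors.toList());
   }

   public static void main(String[] args) {
     List<Predicate> filter = List.of(
         new Predicate("p", "=", "p1"),
         new Predicate("id", ">", "0"));
     System.out.println(pruneForParquet(filter, Set.of("p")));   // keeps only id > 0
   }
 }
 {code}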



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11160) Auto-gather column stats

2015-07-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11160:
---
Attachment: HIVE-11160.03.patch

rebase the patch

 Auto-gather column stats
 

 Key: HIVE-11160
 URL: https://issues.apache.org/jira/browse/HIVE-11160
 Project: Hive
  Issue Type: New Feature
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11160.01.patch, HIVE-11160.02.patch, 
 HIVE-11160.03.patch


 Hive will collect table stats when hive.stats.autogather is set to true during the INSERT OVERWRITE command. The users then need to collect the column stats themselves using the ANALYZE command. With this patch, the column stats will also be collected automatically.
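
 For context, this is roughly the manual step the patch aims to make unnecessary; the table, source, and column names (t, src, c1, c2) are made up for illustration:
 {code}
 -- table stats are gathered automatically on INSERT OVERWRITE when
 -- hive.stats.autogather=true; column stats currently require an explicit ANALYZE
 SET hive.stats.autogather=true;
 INSERT OVERWRITE TABLE t SELECT * FROM src;
 ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS c1, c2;
 {code}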



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647238#comment-14647238
 ] 

Hive QA commented on HIVE-11387:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747871/HIVE-11387.04.patch

{color:green}SUCCESS:{color} +1 9276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4754/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4754/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4754/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747871 - PreCommit-HIVE-TRUNK-Build

 CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
 reduce_deduplicate optimization
 --

 Key: HIVE-11387
 URL: https://issues.apache.org/jira/browse/HIVE-11387
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
 HIVE-11387.03.patch, HIVE-11387.04.patch


 {noformat}
 The main problem is that, due to the return path, we may now have (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map-side aggregation. However, in the non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). In other words, the return path does not take the setting into account.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()

2015-07-30 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647243#comment-14647243
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11316:
--

[~jcamachorodriguez] Can you please take a look at patch #7?

Thanks
Hari

 Use datastructure that doesnt duplicate any part of string for 
 ASTNode::toStringTree()
 --

 Key: HIVE-11316
 URL: https://issues.apache.org/jira/browse/HIVE-11316
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11316-branch-1.0.patch, 
 HIVE-11316-branch-1.2.patch, HIVE-11316.1.patch, HIVE-11316.2.patch, 
 HIVE-11316.3.patch, HIVE-11316.4.patch, HIVE-11316.5.patch, 
 HIVE-11316.6.patch, HIVE-11316.7.patch


 HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is supposed to alter the string memoization to use a different data structure that doesn't duplicate any part of the string, so that we do not run into OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647277#comment-14647277
 ] 

Lefty Leverenz commented on HIVE-10165:
---

If the fix version bit looks familiar, that's because I borrowed it from your 
comment on HIVE-9583.

 Improve hive-hcatalog-streaming extensibility and support updates and deletes.
 --

 Key: HIVE-10165
 URL: https://issues.apache.org/jira/browse/HIVE-10165
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
  Labels: TODOC2.0, streaming_api
 Fix For: 2.0.0

 Attachments: HIVE-10165.0.patch, HIVE-10165.10.patch, 
 HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, 
 HIVE-10165.7.patch, HIVE-10165.9.patch, mutate-system-overview.png


 h3. Overview
 I'd like to extend the 
 [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
  API so that it also supports the writing of record updates and deletes in 
 addition to the already supported inserts.
 h3. Motivation
 We have many Hadoop processes outside of Hive that merge changed facts into 
 existing datasets. Traditionally we achieve this by: reading in a 
 ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
 sequence and then applying a function to determine inserted, updated, and 
 deleted rows. However, in our current scheme we must rewrite all partitions 
 that may potentially contain changes. In practice the number of mutated 
 records is very small when compared with the records contained in a 
 partition. This approach results in a number of operational issues:
 * Excessive amount of write activity required for small data changes.
 * Downstream applications cannot robustly read these datasets while they are 
 being updated.
 * Due to the scale of the updates (hundreds of partitions) the scope for contention is high.
 I believe we can address this problem by instead writing only the changed 
 records to a Hive transactional table. This should drastically reduce the 
 amount of data that we need to write and also provide a means for managing 
 concurrent access to the data. Our existing merge processes can read and 
 retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
 an updated form of the hive-hcatalog-streaming API which will then have the 
 required data to perform an update or insert in a transactional manner. 
 h3. Benefits
 * Enables the creation of large-scale dataset merge processes  
 * Opens up Hive transactional functionality in an accessible manner to 
 processes that operate outside of Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647283#comment-14647283
 ] 

Lefty Leverenz commented on HIVE-4239:
--

Hmm ... a compiler section would be nice to have.  Maybe we could add one.  
Thanks Carl.

 Remove lock on compilation stage
 

 Key: HIVE-4239
 URL: https://issues.apache.org/jira/browse/HIVE-4239
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Carl Steinbach
Assignee: Sergey Shelukhin
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
 HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
 HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-07-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647930#comment-14647930
 ] 

Pengcheng Xiong commented on HIVE-11391:


[~jcamachorodriguez], can you resubmit the patch for a QA run, given the recent commit of multijoin? If it passes, +1.

 CBO (Calcite Return Path): Add CBO tests with return path on
 

 Key: HIVE-11391
 URL: https://issues.apache.org/jira/browse/HIVE-11391
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11391.patch, HIVE-11391.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-30 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-11410:
---

Assignee: Matt McCline

 Join with subquery containing a group by incorrectly returns no results
 ---

 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Assignee: Matt McCline
Priority: Minor
 Attachments: hive-site.xml


 Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
 row of data. For example create table *t* from an existing table which 
 contains at least 1 row of data by running:
 {code}
 create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
 {code}
 Table *t* looks like the following:
 ||c1||c2||
 |abc|0|
 Running the following query then returns zero results.
 {code}
 SELECT
   t1.c1
 FROM
   t t1
 JOIN
 (SELECT
    t2.c1,
    MAX(t2.c2) AS c2
  FROM
    t t2
  GROUP BY
    t2.c1
 ) t3
 ON t1.c2=t3.c2
 {code}
 However, we expected to see the following:
 ||c1||
 |abc|
 The problem seems to relate to the fact that in the subquery, we group by 
 column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8950) Add support in ParquetHiveSerde to create table schema from a parquet file

2015-07-30 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647986#comment-14647986
 ] 

Ryan Blue commented on HIVE-8950:
-

[~gauravkumar37], the schema is read from the file once and then converted to 
DDL. The file is no longer used after that, so schema evolution proceeds as it 
normally would for any table.

 Add support in ParquetHiveSerde to create table schema from a parquet file
 --

 Key: HIVE-8950
 URL: https://issues.apache.org/jira/browse/HIVE-8950
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish K Singh
Assignee: Gaurav Kumar
 Attachments: HIVE-8950.1.patch, HIVE-8950.2.patch, HIVE-8950.3.patch, 
 HIVE-8950.4.patch, HIVE-8950.5.patch, HIVE-8950.6.patch, HIVE-8950.7.patch, 
 HIVE-8950.8.patch, HIVE-8950.patch


 PARQUET-76 and PARQUET-47 ask for creating parquet backed tables without 
 having to specify the column names and types. As, parquet files store schema 
 in their footer, it is possible to generate hive schema from parquet file's 
 metadata. This will improve usability of parquet backed tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-30 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648023#comment-14648023
 ] 

Jason Dere commented on HIVE-10319:
---

+1

 Hive CLI startup takes a long time with a large number of databases
 ---

 Key: HIVE-10319
 URL: https://issues.apache.org/jira/browse/HIVE-10319
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 1.0.0
Reporter: Nezih Yigitbasi
Assignee: Nezih Yigitbasi
 Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
 HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch


 The Hive CLI takes a long time to start when there is a large number of 
 databases in the DW. I think the root cause is the way permanent UDFs are 
 loaded from the metastore. When I looked at the logs and the source code I 
 see that at startup Hive first gets all the databases from the metastore and 
 then for each database it makes a metastore call to get the permanent 
 functions for that database [see Hive.java | 
 https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
  So the number of metastore calls made is in the order of the number of 
 databases. In production we have several hundreds of databases so Hive makes 
 several hundreds of RPC calls during startup, taking 30+ seconds.
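
 A rough sketch of the access pattern described above versus a batched alternative; the client methods here are hypothetical stand-ins, not the real metastore API:
 {code}
 import java.util.List;

 // Sketch: one metastore RPC per database at startup vs. a single bulk call.
 public class FunctionLoadingSketch {
   interface MetastoreClient {
     List<String> getAllDatabases();
     List<String> getFunctions(String db);   // hypothetical per-database call
     List<String> getAllFunctions();         // hypothetical single bulk call
   }

   // Current behaviour: O(#databases) RPCs, slow with hundreds of databases.
   static void loadPerDatabase(MetastoreClient client) {
     for (String db : client.getAllDatabases()) {
       client.getFunctions(db).forEach(FunctionLoadingSketch::register);
     }
   }

   // One possible improvement: a single RPC, independent of the database count.
   static void loadInBulk(MetastoreClient client) {
     client.getAllFunctions().forEach(FunctionLoadingSketch::register);
   }

   static void register(String fn) {
     // add the permanent function to the local registry
   }

   public static void main(String[] args) {
     // wiring of a real client omitted; the point is the call pattern above
   }
 }
 {code}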



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION

2015-07-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648123#comment-14648123
 ] 

Pengcheng Xiong commented on HIVE-11409:


a good example is union_remove_10.q
{code}
Group By Operator
  aggregations: count(VALUE._col0)
  keys: KEY._col0 (type: string)
  mode: mergepartial
  outputColumnNames: $f0, $f1
  Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: $f0 (type: string), $f1 (type: bigint)
outputColumnNames: key, values
Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column 
stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat
  output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
  name: default.outputtbl1
{code}

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before 
 UNION
 --

 Key: HIVE-11409
 URL: https://issues.apache.org/jira/browse/HIVE-11409
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11409.01.patch


 Three purposes: (1) to ensure that the data type of a non-primary branch (the 1st branch is the primary branch) of the union can be cast to that of the primary branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL is redundant, it will be removed by the IdentityProjectRemover optimizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks

2015-07-30 Thread Raajay Viswanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raajay Viswanathan reassigned HIVE-11413:
-

Assignee: Raajay Viswanathan

 Error in detecting availability of HiveSemanticAnalyzerHooks
 

 Key: HIVE-11413
 URL: https://issues.apache.org/jira/browse/HIVE-11413
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.0.0
Reporter: Raajay Viswanathan
Assignee: Raajay Viswanathan
Priority: Trivial
  Labels: newbie

 In the {{compile(String, Boolean)}} function in {{Driver.java}}, the list of available {{HiveSemanticAnalyzerHook}} instances (_saHooks_) is obtained using the {{getHooks}} method. This method always returns a {{List}} of hooks. However, while checking for the availability of hooks, the current version of the code compares _saHooks_ with NULL. This is incorrect, as the segment of code designed to call the pre and post analyze functions gets executed even when the list is empty. The comparison should be changed to {{saHooks.size() > 0}}.
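
 A small illustrative sketch of the difference between the two checks (not the actual Driver.java code):
 {code}
 import java.util.Collections;
 import java.util.List;

 // Sketch: getHooks() returns a possibly-empty list, never null, so the guard
 // must test for emptiness rather than for null.
 public class HookCheckSketch {
   interface SemanticAnalyzerHook { void preAnalyze(); }

   static List<SemanticAnalyzerHook> getHooks() {
     return Collections.emptyList();          // always a List, possibly empty
   }

   public static void main(String[] args) {
     List<SemanticAnalyzerHook> saHooks = getHooks();

     // Buggy guard: always true, so the pre/post-analyze path runs with no hooks.
     if (saHooks != null) {
       System.out.println("null check passed even though there are no hooks");
     }

     // Intended guard: only run the hook path when hooks are actually configured.
     if (saHooks.size() > 0) {
       saHooks.forEach(SemanticAnalyzerHook::preAnalyze);
     }
   }
 }
 {code}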



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11406) Vectorization: StringExpr::compare() == 0 is bad for performance

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647474#comment-14647474
 ] 

Hive QA commented on HIVE-11406:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747946/HIVE-11406.01.patch

{color:green}SUCCESS:{color} +1 9276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4758/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4758/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4758/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747946 - PreCommit-HIVE-TRUNK-Build

 Vectorization: StringExpr::compare() == 0 is bad for performance
 

 Key: HIVE-11406
 URL: https://issues.apache.org/jira/browse/HIVE-11406
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-11406.01.patch


 {{StringExpr::compare() == 0}} is forced to evaluate the whole memory 
 comparison loop for differing lengths of strings, though there is no 
 possibility they will ever be equal.
 Add a {{StringExpr::equals}} which can be a smaller and tighter loop.
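
 A minimal sketch of what such an equals() could look like, with a signature modelled loosely on the existing compare(); this is illustrative, not the committed patch:
 {code}
 // Sketch: bail out immediately on a length mismatch instead of running the
 // full byte-by-byte comparison loop that compare() == 0 would require.
 public class StringExprEqualsSketch {
   static boolean equals(byte[] arg1, int start1, int len1,
                         byte[] arg2, int start2, int len2) {
     if (len1 != len2) {
       return false;                          // differing lengths can never be equal
     }
     for (int i = 0; i < len1; i++) {
       if (arg1[start1 + i] != arg2[start2 + i]) {
         return false;
       }
     }
     return true;
   }

   public static void main(String[] args) {
     byte[] a = "abc".getBytes();
     byte[] b = "abcd".getBytes();
     // returns false without touching a single byte of either buffer
     System.out.println(equals(a, 0, a.length, b, 0, b.length));
   }
 }
 {code}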



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647478#comment-14647478
 ] 

Hive QA commented on HIVE-11383:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747964/HIVE-11383.7.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4759/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4759/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4759/

Messages:
{noformat}
 This message was trimmed, see log for full details 
Downloaded: 
http://repository.apache.org/snapshots/org/apache/calcite/calcite-core/1.4.0-incubating-SNAPSHOT/calcite-core-1.4.0-incubating-20150729.211031-2.jar
 (3551 KB at 1005.9 KB/sec)
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-exec ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
Generating vector expression code
Generating vector expression test code
[INFO] Executed tasks
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec ---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java
 added.
[INFO] 
[INFO] --- antlr3-maven-plugin:3.4:antlr (default) @ hive-exec ---
[INFO] ANTLR: Processing source directory 
/data/hive-ptest/working/apache-github-source-source/ql/src/java
ANTLR Parser Generator  Version 3.4
org/apache/hadoop/hive/ql/parse/HiveLexer.g
org/apache/hadoop/hive/ql/parse/HiveParser.g
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_ALL using 
multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_MAP LPAREN using 
multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_MAP using 
multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_SELECT 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_REDUCE 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as {KW_REGEXP, 

[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11383:
---
Attachment: HIVE-11383.7.patch

 Upgrade Hive to Calcite 1.4
 ---

 Key: HIVE-11383
 URL: https://issues.apache.org/jira/browse/HIVE-11383
 Project: Hive
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, 
 HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
 HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch


 CLEAR LIBRARY CACHE
 Upgrade Hive to Calcite 1.4.0-incubating.
 There is currently a snapshot release, which is close to what will be in 1.4. 
 I have checked that Hive compiles against the new snapshot, fixing one issue. 
 The patch is attached.
 Next step is to validate that Hive runs against the new Calcite, and post any 
 issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], 
 can you please do that.
 [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
 the new Calcite version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647384#comment-14647384
 ] 

Hive QA commented on HIVE-10319:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747929/HIVE-10319.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9276 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4757/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4757/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4757/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747929 - PreCommit-HIVE-TRUNK-Build

 Hive CLI startup takes a long time with a large number of databases
 ---

 Key: HIVE-10319
 URL: https://issues.apache.org/jira/browse/HIVE-10319
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 1.0.0
Reporter: Nezih Yigitbasi
Assignee: Nezih Yigitbasi
 Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
 HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch


 The Hive CLI takes a long time to start when there is a large number of 
 databases in the DW. I think the root cause is the way permanent UDFs are 
 loaded from the metastore. When I looked at the logs and the source code I 
 see that at startup Hive first gets all the databases from the metastore and 
 then for each database it makes a metastore call to get the permanent 
 functions for that database [see Hive.java | 
 https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
  So the number of metastore calls made is in the order of the number of 
 databases. In production we have several hundreds of databases so Hive makes 
 several hundreds of RPC calls during startup, taking 30+ seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-30 Thread Nicholas Brenwald (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Brenwald updated HIVE-11410:
-
Attachment: hive-site.xml

 Join with subquery containing a group by incorrectly returns no results
 ---

 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Priority: Minor
 Attachments: hive-site.xml


 Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
 row of data. For example create table *t* from an existing table which 
 contains at least 1 row of data by running:
 {code}
 create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
 {code}
 Table *t* looks like the following:
 ||c1||c2||
 |abc|0|
 Running the following query then returns zero results.
 {code}
 SELECT
   t1.c1
 FROM
   t t1
 JOIN
 (SELECT
    t2.c1,
    MAX(t2.c2) AS c2
  FROM
    t t2
  GROUP BY
    t2.c1
 ) t3
 ON t1.c2=t3.c2
 {code}
 However, we expected to see the following:
 ||c1||
 |abc|
 The problem seems to relate to the fact that in the subquery, we group by 
 column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)