[jira] [Commented] (HIVE-11397) Parse Hive OR clauses as they are written into the AST
[ https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647491#comment-14647491 ]

Jesus Camacho Rodriguez commented on HIVE-11397:
------------------------------------------------

[~hagleitn], this looks good to me; we are just transforming the left-deep tree into a right-deep tree, and the transformation is legal.

> Parse Hive OR clauses as they are written into the AST
> ------------------------------------------------------
>
>                 Key: HIVE-11397
>                 URL: https://issues.apache.org/jira/browse/HIVE-11397
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Gopal V
>            Assignee: Jesus Camacho Rodriguez
>
> When parsing A OR B OR C, Hive converts it into (C OR B) OR A instead of turning it into A OR (B OR C):
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
> expressions.add(previous);
> expressions.add(current);
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
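The left-deep vs. right-deep distinction discussed above can be illustrated with a minimal sketch. This is not Hive's actual parser code (Hive builds GenericUDFOPOr nodes over ExprNodeDesc); it just models operands as strings to show how folding a list of OR operands from the left yields ((A OR B) OR C), while folding from the right preserves the written A OR (B OR C) shape:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: models expression nodes as parenthesized strings to
// show the two possible association orders for a chain of ORs.
public class OrAssociation {
    // Fold left-to-right: each new operand wraps the accumulated tree,
    // producing a left-deep tree.
    static String foldLeft(List<String> ops) {
        String acc = ops.get(0);
        for (int i = 1; i < ops.size(); i++) {
            acc = "(" + acc + " OR " + ops.get(i) + ")";
        }
        return acc;
    }

    // Fold right-to-left: the accumulated tree becomes the right child,
    // producing a right-deep tree that mirrors the written query.
    static String foldRight(List<String> ops) {
        String acc = ops.get(ops.size() - 1);
        for (int i = ops.size() - 2; i >= 0; i--) {
            acc = "(" + ops.get(i) + " OR " + acc + ")";
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("A", "B", "C");
        System.out.println(foldLeft(ops));   // ((A OR B) OR C) -- left-deep
        System.out.println(foldRight(ops));  // (A OR (B OR C)) -- right-deep
    }
}
```

Both trees are semantically equivalent for OR (which is why the comment above calls the transformation legal), but the right-deep form matches the AST shape the issue title asks for.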
[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on
[ https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647586#comment-14647586 ]

Jesus Camacho Rodriguez commented on HIVE-11391:
------------------------------------------------

[~pxiong], can you review it? This adds the CBO tests to the test suite with the return path enabled; it is useful to check that we do not introduce any regressions while working on the return path. Thanks

> CBO (Calcite Return Path): Add CBO tests with return path on
> ------------------------------------------------------------
>
>                 Key: HIVE-11391
>                 URL: https://issues.apache.org/jira/browse/HIVE-11391
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-11391.patch, HIVE-11391.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-11383:
-------------------------------------------
    Attachment: HIVE-11383.8.patch

> Upgrade Hive to Calcite 1.4
> ---------------------------
>
>                 Key: HIVE-11383
>                 URL: https://issues.apache.org/jira/browse/HIVE-11383
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch
>
> CLEAR LIBRARY CACHE
>
> Upgrade Hive to Calcite 1.4.0-incubating. There is currently a snapshot release, which is close to what will be in 1.4. I have checked that Hive compiles against the new snapshot, fixing one issue. The patch is attached.
>
> The next step is to validate that Hive runs against the new Calcite, and post any issues to the Calcite list or log Calcite JIRA cases. [~jcamachorodriguez], can you please do that?
>
> [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in the new Calcite version.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress
[ https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shiqian.huang updated HIVE-11411:
---------------------------------
    Description: 
In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking, like so:

{code}
heartbeater.heartbeat();
if (initializing && rj.getJobState() == JobStatus.PREP) {
  // No reason to poll until the job is initialized
  continue;
} else {
  // By now the job is initialized so no reason to do
  // rj.getJobState() again and we do not want to do an extra RPC call
  initializing = false;
}
{code}

If the rj.getJobState() == JobStatus.PREP check throws any exception during job progress tracking, a big query job will hit a lock timeout exception.

  was: In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking. If job progress tracking throws any exception, a big query job will hit a lock timeout exception.

> Transaction lock time out when can't tracking job progress
> ----------------------------------------------------------
>
>                 Key: HIVE-11411
>                 URL: https://issues.apache.org/jira/browse/HIVE-11411
>             Project: Hive
>          Issue Type: Wish
>    Affects Versions: 1.2.0
>            Reporter: shiqian.huang
>            Priority: Minor
>
> In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking, like so:
> {code}
> heartbeater.heartbeat();
> if (initializing && rj.getJobState() == JobStatus.PREP) {
>   // No reason to poll until the job is initialized
>   continue;
> } else {
>   // By now the job is initialized so no reason to do
>   // rj.getJobState() again and we do not want to do an extra RPC call
>   initializing = false;
> }
> {code}
> If the rj.getJobState() == JobStatus.PREP check throws any exception during job progress tracking, a big query job will hit a lock timeout exception.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress
[ https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shiqian.huang updated HIVE-11411:
---------------------------------
    Description: 
In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking, like so:

{code}
heartbeater.heartbeat();
if (initializing && rj.getJobState() == JobStatus.PREP) {
  // No reason to poll until the job is initialized
  continue;
} else {
  // By now the job is initialized so no reason to do
  // rj.getJobState() again and we do not want to do an extra RPC call
  initializing = false;
}
{code}

If the rj.getJobState() == JobStatus.PREP check throws any exception during job progress tracking, a big query job will eventually hit a lock timeout exception.

  was: In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking (see the code above). If the rj.getJobState() == JobStatus.PREP check throws any exception during job progress tracking, a big query job will hit a lock timeout exception.

> Transaction lock time out when can't tracking job progress
> ----------------------------------------------------------
>
>                 Key: HIVE-11411
>                 URL: https://issues.apache.org/jira/browse/HIVE-11411
>             Project: Hive
>          Issue Type: Wish
>    Affects Versions: 1.2.0
>            Reporter: shiqian.huang
>            Priority: Minor
>
> In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking, like so:
> {code}
> heartbeater.heartbeat();
> if (initializing && rj.getJobState() == JobStatus.PREP) {
>   // No reason to poll until the job is initialized
>   continue;
> } else {
>   // By now the job is initialized so no reason to do
>   // rj.getJobState() again and we do not want to do an extra RPC call
>   initializing = false;
> }
> {code}
> If the rj.getJobState() == JobStatus.PREP check throws any exception during job progress tracking, a big query job will eventually hit a lock timeout exception.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress
[ https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shiqian.huang updated HIVE-11411:
---------------------------------
    Affects Version/s: 1.2.0
             Priority: Minor  (was: Major)
              Summary: Transaction lock time out when can't tracking job progress  (was: Transaction lock)

> Transaction lock time out when can't tracking job progress
> ----------------------------------------------------------
>
>                 Key: HIVE-11411
>                 URL: https://issues.apache.org/jira/browse/HIVE-11411
>             Project: Hive
>          Issue Type: Wish
>    Affects Versions: 1.2.0
>            Reporter: shiqian.huang
>            Priority: Minor
>
> The transaction lock times out when job progress cannot be tracked (Hive 1.2). When the Hive client cannot connect to the ApplicationMaster to track job progress and the job runs for more than 5 minutes, Hive cannot refresh the lock heartbeat, and the following exception is thrown:
> {noformat}
> 2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such lock: 3645)
> 	at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
> 	at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> 	at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
> 	at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
> 	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
> 	at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
> 	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
> 	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
> 	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
> 	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> {noformat}
> I find the reason is that {{if (initializing && rj.getJobState() == JobStatus.PREP)}} throws an exception when no Hadoop slave host is configured in /etc/hosts. Is this a bug?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
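The failure mode described in HIVE-11411 can be sketched in isolation. The interface names below mirror Hive's Heartbeater and RunningJob but this is not Hive's code; it is a minimal, hypothetical model showing why catching the job-state poll failure (instead of letting it propagate out of the loop) keeps the lock heartbeat alive:

```java
// Hypothetical sketch of the HadoopJobExecHelper.progress() loop structure:
// the heartbeat and the job-state poll share one loop, so if the poll throws
// and the exception escapes, the heartbeat stops and the metastore expires
// the lock. Catching the poll failure keeps the heartbeat firing.
public class HeartbeatLoop {
    interface Heartbeater { void heartbeat(); }
    interface RunningJob { boolean isPrep() throws Exception; }

    // Returns how many heartbeats were sent over `iterations` loop passes.
    static int progress(Heartbeater hb, RunningJob rj, int iterations) {
        int beats = 0;
        boolean initializing = true;
        for (int i = 0; i < iterations; i++) {
            hb.heartbeat();              // beat on every iteration, first
            beats++;
            if (initializing) {
                try {
                    if (rj.isPrep()) {
                        continue;        // job not initialized yet; poll again
                    }
                    initializing = false;
                } catch (Exception e) {
                    // Tracking failed (e.g. cannot reach the ApplicationMaster):
                    // keep heartbeating instead of aborting and losing the lock.
                }
            }
        }
        return beats;
    }

    public static void main(String[] args) {
        RunningJob unreachable = () -> { throw new Exception("cannot reach AM"); };
        System.out.println(progress(() -> {}, unreachable, 5)); // heartbeat survives: prints 5
    }
}
```

Without the try/catch, the first poll failure would end the loop after a single heartbeat, which matches the NoSuchLockException the reporter sees once the lock times out.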
[jira] [Commented] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
[ https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647608#comment-14647608 ]

Rajat Khandelwal commented on HIVE-11376:
-----------------------------------------

Taking patch from reviewboard and attaching

> CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11376
>                 URL: https://issues.apache.org/jira/browse/HIVE-11376
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajat Khandelwal
>            Assignee: Rajat Khandelwal
>         Attachments: HIVE-11376_02.patch
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
> This is the exact code snippet:
> {noformat}
> // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
> // we use a configuration variable for the same
> if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
>   // The following code should be removed, once
>   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
>   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
>   // so don't use CombineFileInputFormat for non-splittable files
>   // i.e., don't combine if inputformat is a TextInputFormat and has compression turned on
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
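The decision the snippet above guards can be sketched as follows. This is an illustrative model, not Hive's actual CombineHiveInputFormat logic: the method names and the extension-based codec lookup are stand-ins (Hadoop would use CompressionCodecFactory and check for SplittableCompressionCodec), but it captures the intended rule the issue is about — only a genuinely non-splittable codec should block split combining, not the mere presence of any codec:

```java
import java.util.List;

// Illustrative sketch (hypothetical names, not Hive code): decide whether a
// set of input files is safe to combine into larger splits.
public class CombineDecision {
    // Extension-based stand-in for a codec lookup: bzip2 is splittable,
    // gzip is not, and uncompressed files are trivially splittable.
    static boolean isSplittable(String fileName) {
        if (fileName.endsWith(".bz2")) return true;   // splittable codec
        if (fileName.endsWith(".gz"))  return false;  // non-splittable codec
        return true;                                  // no codec
    }

    static boolean canCombine(List<String> inputFiles, boolean hadoopSupportsSplittable) {
        // If the MAPREDUCE-1597 fix is present, combining is always safe.
        if (hadoopSupportsSplittable) return true;
        // Otherwise, a single non-splittable file forces the fallback;
        // files that merely *have* a codec but are splittable should not.
        for (String f : inputFiles) {
            if (!isSplittable(f)) return false;
        }
        return true;
    }
}
```

Under this rule, mixing plain text with bzip2-compressed files still combines, while a single gzip file (on a Hadoop without the fix) triggers the fallback.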
[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress
[ https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shiqian.huang updated HIVE-11411:
---------------------------------
    Description: 
In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking, like so:

{code}
heartbeater.heartbeat();
if (initializing && rj.getJobState() == JobStatus.PREP) {
  // No reason to poll until the job is initialized
  continue;
} else {
  // By now the job is initialized so no reason to do
  // rj.getJobState() again and we do not want to do an extra RPC call
  initializing = false;
}
{code}

If the rj.getJobState() == JobStatus.PREP check throws any exception during job progress tracking, a big query job will eventually hit a NoSuchLockException (client-side message: "No record of lock could be found, may have timed out").

  was: In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking (see the code above). If the rj.getJobState() == JobStatus.PREP check throws any exception during job progress tracking, a big query job will eventually hit a lock timeout exception.

> Transaction lock time out when can't tracking job progress
> ----------------------------------------------------------
>
>                 Key: HIVE-11411
>                 URL: https://issues.apache.org/jira/browse/HIVE-11411
>             Project: Hive
>          Issue Type: Wish
>    Affects Versions: 1.2.0
>            Reporter: shiqian.huang
>            Priority: Minor
>
> In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job progress tracking, like so:
> {code}
> heartbeater.heartbeat();
> if (initializing && rj.getJobState() == JobStatus.PREP) {
>   // No reason to poll until the job is initialized
>   continue;
> } else {
>   // By now the job is initialized so no reason to do
>   // rj.getJobState() again and we do not want to do an extra RPC call
>   initializing = false;
> }
> {code}
> If the rj.getJobState() == JobStatus.PREP check throws any exception during job progress tracking, a big query job will eventually hit a NoSuchLockException (client-side message: "No record of lock could be found, may have timed out").

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
[ https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647574#comment-14647574 ]

Rajat Khandelwal commented on HIVE-11376:
-----------------------------------------

Created https://reviews.apache.org/r/36939/

> CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11376
>                 URL: https://issues.apache.org/jira/browse/HIVE-11376
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajat Khandelwal
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
> This is the exact code snippet:
> {noformat}
> // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
> // we use a configuration variable for the same
> if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
>   // The following code should be removed, once
>   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
>   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
>   // so don't use CombineFileInputFormat for non-splittable files
>   // i.e., don't combine if inputformat is a TextInputFormat and has compression turned on
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11411) Transaction lock
[ https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647575#comment-14647575 ]

shiqian.huang commented on HIVE-11411:
--------------------------------------

The transaction lock times out when job progress cannot be tracked (Hive 1.2). When the Hive client cannot connect to the ApplicationMaster to track job progress and the job runs for more than 5 minutes, Hive cannot refresh the lock heartbeat, and the following exception is thrown:

{noformat}
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such lock: 3645)
	at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
	at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
	at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
	at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
{noformat}

I find the reason is that {{if (initializing && rj.getJobState() == JobStatus.PREP)}} throws an exception when no Hadoop slave host is configured in /etc/hosts. Is this a bug?

> Transaction lock
> ----------------
>
>                 Key: HIVE-11411
>                 URL: https://issues.apache.org/jira/browse/HIVE-11411
>             Project: Hive
>          Issue Type: Wish
>            Reporter: shiqian.huang

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11411) Transaction lock
[ https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shiqian.huang updated HIVE-11411:
---------------------------------
    Description: 
The transaction lock times out when job progress cannot be tracked (Hive 1.2). When the Hive client cannot connect to the ApplicationMaster to track job progress and the job runs for more than 5 minutes, Hive cannot refresh the lock heartbeat, and the following exception is thrown:

{noformat}
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such lock: 3645)
	at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
	at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
	at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
	at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
{noformat}

I find the reason is that {{if (initializing && rj.getJobState() == JobStatus.PREP)}} throws an exception when no Hadoop slave host is configured in /etc/hosts. Is this a bug?

> Transaction lock
> ----------------
>
>                 Key: HIVE-11411
>                 URL: https://issues.apache.org/jira/browse/HIVE-11411
>             Project: Hive
>          Issue Type: Wish
>            Reporter: shiqian.huang
>
> The transaction lock times out when job progress cannot be tracked (Hive 1.2). When the Hive client cannot connect to the ApplicationMaster to track job progress and the job runs for more than 5 minutes, Hive cannot refresh the lock heartbeat, and the exception shown above is thrown.
[jira] [Updated] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
[ https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajat Khandelwal updated HIVE-11376:
------------------------------------
    Attachment: HIVE-11376_02.patch

> CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11376
>                 URL: https://issues.apache.org/jira/browse/HIVE-11376
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajat Khandelwal
>            Assignee: Rajat Khandelwal
>         Attachments: HIVE-11376_02.patch
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
> This is the exact code snippet:
> {noformat}
> // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
> // we use a configuration variable for the same
> if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
>   // The following code should be removed, once
>   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
>   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
>   // so don't use CombineFileInputFormat for non-splittable files
>   // i.e., don't combine if inputformat is a TextInputFormat and has compression turned on
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
[ https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajat Khandelwal reassigned HIVE-11376:
---------------------------------------
    Assignee: Rajat Khandelwal

> CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11376
>                 URL: https://issues.apache.org/jira/browse/HIVE-11376
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajat Khandelwal
>            Assignee: Rajat Khandelwal
>         Attachments: HIVE-11376_02.patch
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
> This is the exact code snippet:
> {noformat}
> // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
> // we use a configuration variable for the same
> if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
>   // The following code should be removed, once
>   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
>   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
>   // so don't use CombineFileInputFormat for non-splittable files
>   // i.e., don't combine if inputformat is a TextInputFormat and has compression turned on
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11380) NPE when FileSinkOperator is not initialized
[ https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongzhi Chen updated HIVE-11380:
--------------------------------
    Summary: NPE when FileSinkOperator is not initialized  (was: NPE when FileSinkOperator is not inialized)

> NPE when FileSinkOperator is not initialized
> --------------------------------------------
>
>                 Key: HIVE-11380
>                 URL: https://issues.apache.org/jira/browse/HIVE-11380
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>
> When FileSinkOperator's initializeOp is not called (which may happen when an operator before FileSinkOperator fails in initializeOp), FileSinkOperator will throw an NPE at close time. The stack trace:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
> 	... 18 more
> {noformat}
> This exception is misleading and often distracts users from finding the real issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
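The shape of the guard the attached patch describes ("null check to prevent NPE caused by non-initialized FileSinkOperator") can be sketched as below. The field and method names are hypothetical stand-ins, not the actual FileSinkOperator code: the point is that state normally allocated in initializeOp may still be null at close time, so closeOp must check for that instead of letting an NPE mask the original initialization failure:

```java
// Hypothetical sketch (not Hive's FileSinkOperator): guard close-time
// cleanup against initialization that never ran.
public class SinkSketch {
    private Object[] fsPaths;   // normally allocated in initializeOp

    void initializeOp() {
        fsPaths = new Object[1];
    }

    // Returns a status string instead of throwing, so the original failure
    // that prevented initialization stays visible in the logs.
    String closeOp(boolean abort) {
        if (fsPaths == null) {
            // initializeOp never completed: nothing to flush or rename,
            // so skip cleanup rather than dereference null state.
            return "skipped close: operator was not initialized";
        }
        return "closed " + fsPaths.length + " bucket file(s)";
    }
}
```

With the guard, closing an uninitialized sink is a no-op and the user sees the upstream operator's real error instead of a distracting NullPointerException.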
[jira] [Updated] (HIVE-11380) NPE when FileSinkOperator is not initialized
[ https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongzhi Chen updated HIVE-11380:
--------------------------------
    Attachment: HIVE-11380.1.patch

Null check to prevent NPE caused by non-initialized FileSinkOperator

> NPE when FileSinkOperator is not initialized
> --------------------------------------------
>
>                 Key: HIVE-11380
>                 URL: https://issues.apache.org/jira/browse/HIVE-11380
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-11380.1.patch
>
> When FileSinkOperator's initializeOp is not called (which may happen when an operator before FileSinkOperator fails in initializeOp), FileSinkOperator will throw an NPE at close time. The stack trace:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
> 	... 18 more
> {noformat}
> This exception is misleading and often distracts users from finding the real issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11397) Parse Hive OR clauses as they are written into the AST
[ https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-11397:
-------------------------------------------
    Attachment: HIVE-11397.patch

> Parse Hive OR clauses as they are written into the AST
> ------------------------------------------------------
>
>                 Key: HIVE-11397
>                 URL: https://issues.apache.org/jira/browse/HIVE-11397
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Gopal V
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-11397.patch
>
> When parsing A OR B OR C, Hive converts it into (C OR B) OR A instead of turning it into A OR (B OR C):
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
> expressions.add(previous);
> expressions.add(current);
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647680#comment-14647680 ] Hive QA commented on HIVE-11383: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12747969/HIVE-11383.8.patch {color:red}ERROR:{color} -1 due to 59 failed/errored test(s), 9276 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_exists org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_not_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subq_where_serialization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_subquery1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_inner_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_leftsemi_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_filters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_filters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_filters org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_semijoin org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_in org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_not_in org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_filter_join_breaktask2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_filters_overlap org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression
[ https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11401: --- Attachment: (was: HIVE-11401.1.patch) Predicate push down does not work with Parquet when partitions are in the expression Key: HIVE-11401 URL: https://issues.apache.org/jira/browse/HIVE-11401 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11401.1.patch When filtering Parquet tables using a partition column, the query fails saying the column does not exist: {noformat} hive> create table part1 (id int, content string) partitioned by (p string) stored as parquet; hive> alter table part1 add partition (p='p1'); hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b'); hive> select id from part1 where p='p1'; Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Column [p] was not found in schema! Time taken: 0.151 seconds {noformat} It is correct that the partition column is not part of the Parquet schema. So, the fix should be to remove such expressions from the Parquet PPD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression
[ https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11401: --- Attachment: HIVE-11401.1.patch Predicate push down does not work with Parquet when partitions are in the expression Key: HIVE-11401 URL: https://issues.apache.org/jira/browse/HIVE-11401 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11401.1.patch When filtering Parquet tables using a partition column, the query fails saying the column does not exist: {noformat} hive> create table part1 (id int, content string) partitioned by (p string) stored as parquet; hive> alter table part1 add partition (p='p1'); hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b'); hive> select id from part1 where p='p1'; Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Column [p] was not found in schema! Time taken: 0.151 seconds {noformat} It is correct that the partition column is not part of the Parquet schema. So, the fix should be to remove such expressions from the Parquet PPD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
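The fix direction described in the issue, keeping partition-column predicates out of the filter handed to Parquet, can be sketched as a simple column filter. All names here are hypothetical; the real patch operates on Hive's predicate representation, not on strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PartitionPpdSketch {
    // Drop every predicate column that is a partition column before building
    // the Parquet-level filter: partition columns exist only in the Hive
    // metastore and the directory layout, never in the Parquet file schema.
    static List<String> pushableColumns(List<String> predicateCols, Set<String> partitionCols) {
        return predicateCols.stream()
                .filter(c -> !partitionCols.contains(c))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> partitionCols = new HashSet<>(Arrays.asList("p"));
        // For "where p = 'p1' and id = 1", only the id predicate may be
        // pushed down to the Parquet reader.
        System.out.println(pushableColumns(Arrays.asList("p", "id"), partitionCols)); // [id]
    }
}
```

Partition pruning still handles the p = 'p1' condition, so no correctness is lost by excluding it from the file-level filter.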
[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile
[ https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-11414: -- Summary: Fix OOM in MapTask with many input partitions with RCFile (was: Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's cachedLazyStruct weakly referenced) Fix OOM in MapTask with many input partitions with RCFile - Key: HIVE-11414 URL: https://issues.apache.org/jira/browse/HIVE-11414 Project: Hive Issue Type: Improvement Components: File Formats, Serializers/Deserializers Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Zheng Shao Priority: Minor MapTask hit OOM in the following situation in our production environment: * src: 2048 partitions, each with 1 file of about 2MB using RCFile format * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN. * MapTask memory Xmx: 1.5GB By analyzing the heap dump using jhat, we realized that the problem is: * One single mapper is processing many partitions (because of CombineHiveInputFormat) * Each input path (equivalent to partition here) will construct its own SerDe * Each SerDe will do its own caching of deserialized object (and try to reuse it), but will never release it (in this case, the serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a lot of space - pretty much the last N rows of a file where N is the number of rows in a columnar block). * This problem may exist in other SerDe as well, but columnar file format are affected the most because they need bigger cache for the last N rows instead of 1 row. Proposed solution: * Make cachedLazyStruct a weakly referenced object. Do similar changes to other columnar serde if any (e.g. maybe ORCFile's serde as well). Alternative solutions: * We can also free up the whole SerDe after processing a block/file. 
The problem with that is that the input splits may contain multiple blocks/files that map to the same SerDe, and recreating a SerDe is just more work. * We can also move the SerDe creation/free-up to the place where the input file changes. But that requires a much bigger change to the code. * We can also add a cleanup() method to the SerDe interface that releases the cached object, but that change is not backward compatible with many SerDes that people have written. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
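The proposed weak-reference cache can be sketched like this. The class and method names are hypothetical stand-ins; the real change would live in serde2.columnar.ColumnarSerDeBase around its cachedLazyStruct field.

```java
import java.lang.ref.WeakReference;

public class WeakRowCacheSketch {
    // Stand-in for ColumnarSerDeBase.cachedLazyStruct: the reusable
    // deserialized row is held behind a WeakReference, so an idle SerDe no
    // longer pins the last rows of its file for the lifetime of the task.
    private WeakReference<byte[]> cachedRow;

    byte[] borrowRow(int size) {
        byte[] row = (cachedRow == null) ? null : cachedRow.get();
        if (row == null || row.length < size) {
            row = new byte[size];                 // cache miss: (re)allocate
            cachedRow = new WeakReference<>(row); // GC may reclaim it later
        }
        return row;
    }

    public static void main(String[] args) {
        WeakRowCacheSketch cache = new WeakRowCacheSketch();
        byte[] a = cache.borrowRow(8);
        byte[] b = cache.borrowRow(8);
        // While 'a' is strongly reachable the weak reference cannot be
        // cleared, so the second borrow reuses the same array.
        System.out.println(a == b); // true
    }
}
```

The trade-off is the one the alternatives section weighs: under memory pressure the GC clears the reference and the next row pays a reallocation, but a mapper combining thousands of partitions no longer accumulates one pinned row buffer per SerDe.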
[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's cachedLazyStruct weakly referenced
[ https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-11414: -- Component/s: File Formats Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's cachedLazyStruct weakly referenced -- Key: HIVE-11414 URL: https://issues.apache.org/jira/browse/HIVE-11414 Project: Hive Issue Type: Improvement Components: File Formats, Serializers/Deserializers Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Zheng Shao Priority: Minor MapTask hit OOM in the following situation in our production environment: * src: 2048 partitions, each with 1 file of about 2MB using RCFile format * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN. * MapTask memory Xmx: 1.5GB By analyzing the heap dump using jhat, we realized that the problem is: * One single mapper is processing many partitions (because of CombineHiveInputFormat) * Each input path (equivalent to partition here) will construct its own SerDe * Each SerDe will do its own caching of deserialized object (and try to reuse it), but will never release it (in this case, the serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a lot of space - pretty much the last N rows of a file where N is the number of rows in a columnar block). * This problem may exist in other SerDe as well, but columnar file format are affected the most because they need bigger cache for the last N rows instead of 1 row. Proposed solution: * Make cachedLazyStruct a weakly referenced object. Do similar changes to other columnar serde if any (e.g. maybe ORCFile's serde as well). Alternative solutions: * We can also free up the whole SerDe after processing a block/file. The problem with that is that the input splits may contain multiple blocks/files that maps to the same SerDe, and recreating a SerDe is just more work. 
* We can also move the SerDe creation/free-up to the place where the input file changes. But that requires a much bigger change to the code. * We can also add a cleanup() method to the SerDe interface that releases the cached object, but that change is not backward compatible with many SerDes that people have written. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648281#comment-14648281 ] Sushanth Sowmyan commented on HIVE-11407: - The edits look good, +1. JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression
[ https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11401: --- Attachment: HIVE-11401.2.patch Predicate push down does not work with Parquet when partitions are in the expression Key: HIVE-11401 URL: https://issues.apache.org/jira/browse/HIVE-11401 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11401.1.patch, HIVE-11401.2.patch When filtering Parquet tables using a partition column, the query fails saying the column does not exist: {noformat} hive> create table part1 (id int, content string) partitioned by (p string) stored as parquet; hive> alter table part1 add partition (p='p1'); hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b'); hive> select id from part1 where p='p1'; Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Column [p] was not found in schema! Time taken: 0.151 seconds {noformat} It is correct that the partition column is not part of the Parquet schema. So, the fix should be to remove such expressions from the Parquet PPD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-11407: - Attachment: HIVE-11407-branch-1.0.patch JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407-branch-1.0.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used
[ https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648302#comment-14648302 ] Vaibhav Gumashta commented on HIVE-11408: - Looks like we fixed this in 1.2 via HIVE-10329. HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used --- Key: HIVE-11408 URL: https://issues.apache.org/jira/browse/HIVE-11408 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). Basically, add jar creates a new classloader for loading the classes from the new jar and adds the new classloader to the SessionState object of user's session, making the older one its parent. Creating a temporary function uses the new classloader to load the class used for the function. On closing a session, although there is code to close the classloader for the session, I'm not seeing the new classloader getting GCed and from the heapdump I can see it holds on to the temporary function's class that should have gone away after the session close. Steps to reproduce: 1. {code} jdbc:hive2://localhost:1/ add jar hdfs:///tmp/audf.jar; {code} 2. Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was added. 3. {code} jdbc:hive2://localhost:1/ CREATE TEMPORARY FUNCTION funcA AS 'org.gumashta.udf.AUDF'; {code} 4. Close the jdbc session. 5. Take the memory snapshot and verify that the new URLClassLoader is indeed there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the session which we already closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile
[ https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-11414: -- Description: MapTask hit OOM in the following situation in our production environment: * src: 2048 partitions, each with 1 file of about 2MB using RCFile format * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN. * MapTask memory Xmx: 1.5GB By analyzing the heap dump using jhat, we realized that the problem is: * One single mapper is processing many partitions (because of CombineHiveInputFormat) * Each input path (equivalent to partition here) will construct its own SerDe * Each SerDe will do its own caching of deserialized object (and try to reuse it), but will never release it (in this case, the serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a lot of space - pretty much the last N rows of a file where N is the number of rows in a columnar block). * This problem may exist in other SerDe as well, but columnar file format are affected the most because they need bigger cache for the last N rows instead of 1 row. Proposed solution: * Make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly referenced object. Alternative solutions: * We can also free up the whole SerDe after processing a block/file. The problem with that is that the input splits may contain multiple blocks/files that maps to the same SerDe, and recreating a SerDe is just more work. * We can also move the SerDe creation/free-up to the place when input file changes. But that requires a much bigger change to the code. * We can also add a cleanup() method to SerDe interface that release the cached object, but that change is not backward compatible with many SerDes that people have wrote. 
was: MapTask hit OOM in the following situation in our production environment: * src: 2048 partitions, each with 1 file of about 2MB using RCFile format * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN. * MapTask memory Xmx: 1.5GB By analyzing the heap dump using jhat, we realized that the problem is: * One single mapper is processing many partitions (because of CombineHiveInputFormat) * Each input path (equivalent to partition here) will construct its own SerDe * Each SerDe will do its own caching of deserialized object (and try to reuse it), but will never release it (in this case, the serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a lot of space - pretty much the last N rows of a file where N is the number of rows in a columnar block). * This problem may exist in other SerDe as well, but columnar file format are affected the most because they need bigger cache for the last N rows instead of 1 row. Proposed solution: * Make cachedLazyStruct a weakly referenced object. Do similar changes to other columnar serde if any (e.g. maybe ORCFile's serde as well). Alternative solutions: * We can also free up the whole SerDe after processing a block/file. The problem with that is that the input splits may contain multiple blocks/files that maps to the same SerDe, and recreating a SerDe is just more work. * We can also move the SerDe creation/free-up to the place when input file changes. But that requires a much bigger change to the code. * We can also add a cleanup() method to SerDe interface that release the cached object, but that change is not backward compatible with many SerDes that people have wrote. 
Fix OOM in MapTask with many input partitions with RCFile - Key: HIVE-11414 URL: https://issues.apache.org/jira/browse/HIVE-11414 Project: Hive Issue Type: Improvement Components: File Formats, Serializers/Deserializers Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Zheng Shao Priority: Minor MapTask hit OOM in the following situation in our production environment: * src: 2048 partitions, each with 1 file of about 2MB using RCFile format * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN. * MapTask memory Xmx: 1.5GB By analyzing the heap dump using jhat, we realized that the problem is: * One single mapper is processing many partitions (because of CombineHiveInputFormat) * Each input path (equivalent to partition here) will construct its own SerDe * Each SerDe will do its own caching of deserialized object (and try to reuse it), but will never release it (in this case, the serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a lot of space - pretty much the last N rows of a file where N is the number of rows in a columnar block). *
[jira] [Updated] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks
[ https://issues.apache.org/jira/browse/HIVE-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raajay Viswanathan updated HIVE-11413: -- Attachment: HIVE-11413.patch Check if _saHooks_ is empty instead of checking if it is NULL. Needs code review. Error in detecting availability of HiveSemanticAnalyzerHooks Key: HIVE-11413 URL: https://issues.apache.org/jira/browse/HIVE-11413 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 2.0.0 Reporter: Raajay Viswanathan Assignee: Raajay Viswanathan Priority: Trivial Labels: newbie Attachments: HIVE-11413.patch In the {{compile(String, Boolean)}} function in {{Driver.java}}, the list of available {{HiveSemanticAnalyzerHook}}s (_saHooks_) is obtained using the {{getHooks}} method. This method always returns a {{List}} of hooks. However, while checking for availability of hooks, the current version of the code uses a comparison of _saHooks_ with NULL. This is incorrect, as the segment of code designed to call pre and post Analyze functions gets executed even when the list is empty. The comparison should be changed to {{saHooks.size() > 0}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
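The difference between the two checks is easy to demonstrate in isolation. This is a sketch only: getHooks here is a stub returning an empty list, mirroring the behavior the report describes, and the hook type is simplified to Runnable.

```java
import java.util.ArrayList;
import java.util.List;

public class HookCheckSketch {
    // Stub mirroring the reported behavior: the hook lookup always returns
    // a List, possibly empty, never null.
    static List<Runnable> getHooks() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<Runnable> saHooks = getHooks();
        // Buggy check: non-null even when no hooks are configured, so the
        // pre/post-analyze block always runs.
        System.out.println(saHooks != null);    // true
        // Proposed check: skips the block when the list is empty.
        System.out.println(saHooks.size() > 0); // false
    }
}
```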
[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-11407: - Attachment: HIVE-11407.1.patch Patch for master and branch-1 . JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648237#comment-14648237 ] Thejas M Nair edited comment on HIVE-11407 at 7/30/15 8:17 PM: --- HIVE-11407.1.patch - Patch for master and branch-1 . was (Author: thejas): Patch for master and branch-1 . JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648241#comment-14648241 ] Thejas M Nair commented on HIVE-11407: -- [~sushanth] Can you please review my edits to your patch ? JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-11407: - Attachment: (was: HIVE-11407-branch-1.0.patch) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-11407: - Attachment: (was: HIVE-11407-branch-1.0.patch) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-11407: - Attachment: HIVE-11407-branch-1.0.patch JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails
[ https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648458#comment-14648458 ] Eugene Koifman commented on HIVE-11418: --- This feels dangerous, but if rm -Rf exists perhaps this is valid as well. On a separate note, setting fs.trash.interval in a Hive session (or hive-site.xml) will lead to unexpected behavior. Hadoop code won't see this value. (HIVE-10986) Dropping a database in an encryption zone with CASCADE and trash enabled fails -- Key: HIVE-11418 URL: https://issues.apache.org/jira/browse/HIVE-11418 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Here's the query that fails: {noformat} hive> CREATE DATABASE db; hive> USE db; hive> CREATE TABLE a(id int); hive> SET fs.trash.interval=1; hive> DROP DATABASE db CASCADE; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop db.a because it is in an encryption zone and trash is enabled. Use PURGE option to skip trash.) {noformat} DROP DATABASE does not support PURGE, so we have to remove the tables one by one, and then drop the database. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
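Since DROP DATABASE does not accept PURGE, the per-table workaround mentioned at the end can be scripted. This sketch only generates the HiveQL statements; in a real session the table names would come from SHOW TABLES, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DropCascadeWorkaround {
    // Emit one DROP TABLE ... PURGE per table (PURGE skips the trash, which
    // is what fails inside an encryption zone), then drop the now-empty
    // database without CASCADE.
    static List<String> workaroundStatements(String db, List<String> tables) {
        List<String> stmts = new ArrayList<>();
        for (String t : tables) {
            stmts.add("DROP TABLE " + db + "." + t + " PURGE;");
        }
        stmts.add("DROP DATABASE " + db + ";");
        return stmts;
    }

    public static void main(String[] args) {
        for (String s : workaroundStatements("db", Arrays.asList("a"))) {
            System.out.println(s);
        }
        // DROP TABLE db.a PURGE;
        // DROP DATABASE db;
    }
}
```

As the comment above notes, PURGE bypasses the trash entirely, so the dropped data is not recoverable; that is the same trade-off the "rm -Rf" analogy points at.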
[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results
[ https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648523#comment-14648523 ] Mostafa Mokhtar commented on HIVE-11410: [~mmccline] Join with subquery containing a group by incorrectly returns no results --- Key: HIVE-11410 URL: https://issues.apache.org/jira/browse/HIVE-11410 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.1.0 Reporter: Nicholas Brenwald Priority: Minor Attachments: hive-site.xml Start by creating a table *t* with columns *c1* and *c2* and populate with 1 row of data. For example create table *t* from an existing table which contains at least 1 row of data by running: {code} create table t as select 'abc' as c1, 0 as c2 from Y limit 1; {code} Table *t* looks like the following: ||c1||c2|| |abc|0| Running the following query then returns zero results. {code} SELECT t1.c1 FROM t t1 JOIN (SELECT t2.c1, MAX(t2.c2) AS c2 FROM t t2 GROUP BY t2.c1 ) t3 ON t1.c2=t3.c2 {code} However, we expected to see the following: ||c1|| |abc| The problem seems to relate to the fact that in the subquery, we group by column *c1*, but this is not subsequently used in the join condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
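The expected result can be checked by mimicking the query's semantics on plain Java collections. This mimics the SQL by hand and is not Hive code; it only shows why the single row should survive the join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByJoinSketch {
    // Mimics: t3 = SELECT c1, MAX(c2) FROM t GROUP BY c1,
    // then:   SELECT t1.c1 FROM t t1 JOIN t3 ON t1.c2 = t3.c2.
    // Each row of t is {c1 (String), c2 (Integer)}.
    static List<String> expectedResult(List<Object[]> t) {
        Map<String, Integer> t3 = new HashMap<>();
        for (Object[] row : t) {
            t3.merge((String) row[0], (Integer) row[1], Math::max); // GROUP BY c1, MAX(c2)
        }
        List<String> out = new ArrayList<>();
        for (Object[] row : t) {
            for (Integer maxC2 : t3.values()) {
                if (maxC2.equals(row[1])) { // join condition t1.c2 = t3.c2
                    out.add((String) row[0]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> t = new ArrayList<>();
        t.add(new Object[]{"abc", 0}); // the single row of table t
        System.out.println(expectedResult(t)); // [abc]
    }
}
```

With the one-row table, t3 is ("abc", 0) and the join on c2 = 0 matches, so an empty result from Hive is indeed wrong regardless of the unused c1 group key.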
[jira] [Commented] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default
[ https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648470#comment-14648470 ] Sergio Peña commented on HIVE-10884: Does this issue happen only with the attached patch? Or did it happen because I enabled the TestBeeLineDriver tests? The directory is preserved on the Jenkins slaves, but those slaves expire after a while and are then destroyed, so we don't have access to those logs anymore. Enable some beeline tests and turn on HIVE-4239 by default -- Key: HIVE-10884 URL: https://issues.apache.org/jira/browse/HIVE-10884 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, HIVE-10884.06.patch, HIVE-10884.07.patch, HIVE-10884.07.patch, HIVE-10884.patch See comments in HIVE-4239. Beeline tests with parallelism need to be enabled to turn compilation parallelism on by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10863: --- Attachment: (was: HIVE-10863.0-spark.patch) Merge trunk to Spark branch 7/29/2015 [Spark Branch] Key: HIVE-10863 URL: https://issues.apache.org/jira/browse/HIVE-10863 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: mj.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11160) Auto-gather column stats
[ https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648420#comment-14648420 ] Hive QA commented on HIVE-11160: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748041/HIVE-11160.03.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9277 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4765/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4765/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4765/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12748041 - PreCommit-HIVE-TRUNK-Build Auto-gather column stats Key: HIVE-11160 URL: https://issues.apache.org/jira/browse/HIVE-11160 Project: Hive Issue Type: New Feature Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11160.01.patch, HIVE-11160.02.patch, HIVE-11160.03.patch Hive will collect table stats when set hive.stats.autogather=true during the INSERT OVERWRITE command. And then the users need to collect the column stats themselves using Analyze command. In this patch, the column stats will also be collected automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails
[ https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648434#comment-14648434 ] Sergio Peña commented on HIVE-11418: I think we should support PURGE when dropping a database as well. [~ekoifman] What do you think about this? Dropping a database in an encryption zone with CASCADE and trash enabled fails -- Key: HIVE-11418 URL: https://issues.apache.org/jira/browse/HIVE-11418 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Here's the query that fails: {noformat} hive> CREATE DATABASE db; hive> USE db; hive> CREATE TABLE a(id int); hive> SET fs.trash.interval=1; hive> DROP DATABASE db CASCADE; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop db.a because it is in an encryption zone and trash is enabled. Use PURGE option to skip trash.) {noformat} DROP DATABASE does not support PURGE, so we have to remove the tables one by one, and then drop the database. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails
[ https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648468#comment-14648468 ] Eugene Koifman commented on HIVE-11418: --- I meant the Hadoop code that actually checks whether a file should be moved to trash. Dropping a database in an encryption zone with CASCADE and trash enabled fails -- Key: HIVE-11418 URL: https://issues.apache.org/jira/browse/HIVE-11418 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Here's the query that fails: {noformat} hive> CREATE DATABASE db; hive> USE db; hive> CREATE TABLE a(id int); hive> SET fs.trash.interval=1; hive> DROP DATABASE db CASCADE; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop db.a because it is in an encryption zone and trash is enabled. Use PURGE option to skip trash.) {noformat} DROP DATABASE does not support PURGE, so we have to remove the tables one by one, and then drop the database. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10863) Merge master to Spark branch 7/29/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648520#comment-14648520 ] Hive QA commented on HIVE-10863: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748082/HIVE-10863.1-spark.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7742 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/945/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/945/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-945/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12748082 - PreCommit-HIVE-SPARK-Build Merge master to Spark branch 7/29/2015 [Spark Branch] - Key: HIVE-10863 URL: https://issues.apache.org/jira/browse/HIVE-10863 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10863.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648461#comment-14648461 ] Sergio Peña commented on HIVE-8128: --- Parquet 1.8.1 is now officially released. Would it help if we bump up to 1.8.1? Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch, HIVE-8128.6-parquet.patch, HIVE-8128.6-parquet.patch, testParquetFile What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde), which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after it is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile
[ https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-11414: -- Description: MapTask hit OOM in the following situation in our production environment: * src: 2048 partitions, each with 1 file of about 2MB using RCFile format * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN. * MapTask memory Xmx: 1.5GB By analyzing the heap dump using jhat, we realized that the problem is: * One single mapper is processing many partitions (because of CombineHiveInputFormat) * Each input path (equivalent to a partition here) will construct its own SerDe * Each SerDe does its own caching of the deserialized object (and tries to reuse it), but never releases it (in this case, serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a lot of space - pretty much the last N rows of a file, where N is the number of rows in a columnar block). * This problem may exist in other SerDes as well, but columnar file formats are affected the most because they need a bigger cache for the last N rows instead of 1 row. Proposed solution: * Remove cachedLazyStruct in serde2.columnar.ColumnarSerDeBase. The cost saving of not recreating a single object is too small compared to processing N rows. Alternative solutions: * We could free up the whole SerDe after processing a block/file. The problem with that is that an input split may contain multiple blocks/files that map to the same SerDe, and recreating a SerDe is a much bigger change to the code. * We could move the SerDe creation/free-up to the point where the input file changes. But that requires a much bigger change to the code. * We could add a cleanup() method to the SerDe interface that releases the cached object, but that change is not backward compatible with many SerDes that people have written. 
* We could make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly referenced object, but that feels like overkill. Fix OOM in MapTask with many input partitions with RCFile - Key: HIVE-11414 URL: https://issues.apache.org/jira/browse/HIVE-11414 Project: Hive Issue Type: Improvement Components: File Formats, Serializers/Deserializers Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0 Reporter: Zheng Shao Priority: Minor
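The retention pattern in the HIVE-11414 description can be sketched outside of Hive. The class below is a hypothetical simplification (CachedSerDe and its cachedLazyStruct field are illustrative stand-ins, not Hive's actual ColumnarSerDeBase): each per-path cache stays reachable for the lifetime of the mapper unless it is explicitly released, so retained memory scales with the number of partitions.

```java
import java.util.HashMap;
import java.util.Map;

class SerDeCacheDemo {
    // Illustrative stand-in for a SerDe that caches its last deserialized block.
    static class CachedSerDe {
        byte[] cachedLazyStruct;                      // analogous role to ColumnarSerDeBase.cachedLazyStruct
        void deserialize(int blockBytes) {
            cachedLazyStruct = new byte[blockBytes];  // kept "for reuse", never released
        }
        void releaseCache() {
            cachedLazyStruct = null;                  // the proposed fix: drop the cache
        }
    }

    // Bytes still retained after one mapper has processed `partitions` input paths.
    static long retainedBytes(int partitions, int blockBytes, boolean release) {
        Map<String, CachedSerDe> serdes = new HashMap<>();
        for (int p = 0; p < partitions; p++) {
            CachedSerDe sd = new CachedSerDe();       // one SerDe per input path
            sd.deserialize(blockBytes);
            if (release) sd.releaseCache();
            serdes.put("part=" + p, sd);              // every SerDe stays referenced by the task
        }
        long total = 0;
        for (CachedSerDe sd : serdes.values()) {
            total += (sd.cachedLazyStruct == null) ? 0 : sd.cachedLazyStruct.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Scaled-down block size; with 2048 partitions and ~2MB cached blocks the
        // retained total approaches 4GB, well past a 1.5GB Xmx.
        System.out.println(retainedBytes(2048, 2048, false)); // all 2048 blocks retained
        System.out.println(retainedBytes(2048, 2048, true));  // nothing retained
    }
}
```

Releasing (or never keeping) the cache bounds the mapper's footprint by one block rather than one block per partition.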
[jira] [Commented] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION
[ https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648541#comment-14648541 ] Hive QA commented on HIVE-11409: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748059/HIVE-11409.02.patch {color:green}SUCCESS:{color} +1 9276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4766/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4766/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4766/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12748059 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION -- Key: HIVE-11409 URL: https://issues.apache.org/jira/browse/HIVE-11409 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11409.01.patch, HIVE-11409.02.patch Three purposes: (1) to ensure that the data type of a non-primary branch (the 1st branch is the primary branch) of the union can be cast to that of the primary branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL is redundant, it will be removed by the IdentityProjectRemover optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11423) Ship hive-storage-api along with hive-exec jar to all Tasks
[ https://issues.apache.org/jira/browse/HIVE-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648726#comment-14648726 ] Xuefu Zhang edited comment on HIVE-11423 at 7/31/15 4:20 AM: - FYI: this issue was found in Spark branch (HIVE-10863) and fix was included in patch for HIVE-10166. was (Author: xuefuz): FYI: this issue was found in Spark branch (HIVE-10853) and fix was included in patch for HIVE-10166. Ship hive-storage-api along with hive-exec jar to all Tasks --- Key: HIVE-11423 URL: https://issues.apache.org/jira/browse/HIVE-11423 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 2.0.0 Reporter: Gopal V Priority: Blocker After moving critical classes into hive-storage-api, those classes are needed for queries to execute successfully. Currently all queries run fail with ClassNotFound exceptions on a large cluster. {code} Caused by: java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch; at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2583) at java.lang.Class.getDeclaredFields(Class.java:1916) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:150) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.init(FieldSerializer.java:109) ... 57 more Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 62 more {code} Temporary workaround added to hiverc: {{add jar ./dist/hive/lib/hive-storage-api-2.0.0-SNAPSHOT.jar;}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11423) Ship hive-storage-api along with hive-exec jar to all Tasks
[ https://issues.apache.org/jira/browse/HIVE-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648726#comment-14648726 ] Xuefu Zhang edited comment on HIVE-11423 at 7/31/15 4:19 AM: - FYI: this issue was found in Spark branch (HIVE-10853) and fix was included in patch for HIVE-10166. was (Author: xuefuz): FYI: this issue was found in Spark branch (HIVE-10835) and fix was included in patch for HIVE-10166. Ship hive-storage-api along with hive-exec jar to all Tasks --- Key: HIVE-11423 URL: https://issues.apache.org/jira/browse/HIVE-11423 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 2.0.0 Reporter: Gopal V Priority: Blocker After moving critical classes into hive-storage-api, those classes are needed for queries to execute successfully. Currently all queries run fail with ClassNotFound exceptions on a large cluster. {code} Caused by: java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch; at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2583) at java.lang.Class.getDeclaredFields(Class.java:1916) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:150) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.init(FieldSerializer.java:109) ... 57 more Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 62 more {code} Temporary workaround added to hiverc: {{add jar ./dist/hive/lib/hive-storage-api-2.0.0-SNAPSHOT.jar;}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators
[ https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-11355: -- Attachment: HIVE-11355.3.patch Union tests fix. Hive on tez: memory manager for sort buffers (input/output) and operators - Key: HIVE-11355 URL: https://issues.apache.org/jira/browse/HIVE-11355 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 2.0.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, HIVE-11355.3.patch We need to better manage the sort buffer allocations to ensure better performance. Also, we need to provide configurations to certain operators to stay within memory limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11384) Add Test case which cover both HIVE-11271 and HIVE-11333
[ https://issues.apache.org/jira/browse/HIVE-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648619#comment-14648619 ] Yongzhi Chen commented on HIVE-11384: - Thanks [~szehon] for reviewing it. Add Test case which cover both HIVE-11271 and HIVE-11333 Key: HIVE-11384 URL: https://issues.apache.org/jira/browse/HIVE-11384 Project: Hive Issue Type: Test Components: Logical Optimizer, Parser Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11384.1.patch Add test queries that need both HIVE-11271 and HIVE-11333 to be fixed in order to pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648623#comment-14648623 ] Hive QA commented on HIVE-11407: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748056/HIVE-11407.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9276 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4767/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4767/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4767/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12748056 - PreCommit-HIVE-TRUNK-Build JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11329) Column prefix in key of hbase column prefix map
[ https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648781#comment-14648781 ] Wojciech Indyk commented on HIVE-11329: --- sure, here is the request: https://reviews.apache.org/r/36974/ Column prefix in key of hbase column prefix map --- Key: HIVE-11329 URL: https://issues.apache.org/jira/browse/HIVE-11329 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.14.0 Reporter: Wojciech Indyk Assignee: Wojciech Indyk Priority: Minor Attachments: HIVE-11329.1.patch When I create a table with an hbase column prefix (https://issues.apache.org/jira/browse/HIVE-3725), I get the prefix in the result map in Hive. E.g. record in HBase rowkey: 123 column: tag_one, value: 0.5 column: tag_two, value: 0.5 representation in Hive via column prefix mapping tag_.*: column: tag map<string,string> key: tag_one, value: 0.5 key: tag_two, value: 0.5 should be: key: one, value: 0.5 key: two, value: 0.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
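A minimal sketch of the expected prefix handling described in HIVE-11329 (a standalone helper, not Hive's actual HBase handler code): when building the Hive map from HBase qualifiers matched by a prefix pattern, the fixed prefix should be stripped from each map key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ColumnPrefixDemo {
    // Build the Hive map value from HBase qualifiers, stripping the fixed
    // column prefix from each key, as the expected output above shows.
    static Map<String, String> toHiveMap(Map<String, String> hbaseColumns, String prefix) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : hbaseColumns.entrySet()) {
            String qualifier = e.getKey();
            String key = qualifier.startsWith(prefix)
                    ? qualifier.substring(prefix.length())   // "tag_one" -> "one"
                    : qualifier;
            result.put(key, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("tag_one", "0.5");
        row.put("tag_two", "0.5");
        System.out.println(toHiveMap(row, "tag_"));  // {one=0.5, two=0.5}
    }
}
```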
[jira] [Commented] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY
[ https://issues.apache.org/jira/browse/HIVE-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648766#comment-14648766 ] Hive QA commented on HIVE-11416: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748086/HIVE-11416.01.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9274 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4769/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4769/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4769/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12748086 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY -- Key: HIVE-11416 URL: https://issues.apache.org/jira/browse/HIVE-11416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11416.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11380) NPE when FileSinkOperator is not initialized
[ https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648150#comment-14648150 ] Hive QA commented on HIVE-11380: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12748014/HIVE-11380.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9276 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4763/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4763/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4763/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12748014 - PreCommit-HIVE-TRUNK-Build NPE when FileSinkOperator is not initialized Key: HIVE-11380 URL: https://issues.apache.org/jira/browse/HIVE-11380 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11380.1.patch When FileSinkOperator's initializeOp is not called (which may happen when an operator before FileSinkOperator initializeOp failed), FileSinkOperator will throw NPE at close time. 
The stacktrace: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519) ... 18 more {noformat} This Exception is misleading and often distracts users from finding real issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
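One way to make the failure mode in HIVE-11380 clearer is a guard in the close path. The sketch below is a generic illustration of that defensive pattern (the field and method names are simplified stand-ins for FileSinkOperator's actual state, not the patch itself): if initializeOp never ran, close reports that fact instead of dereferencing null state.

```java
class FileSinkCloseDemo {
    // Stand-in for the per-bucket state that initializeOp() would normally create.
    private Object[] bucketState;
    private boolean initialized;

    void initializeOp() {
        bucketState = new Object[1];
        initialized = true;
    }

    // Guarded close: if initialization never ran (e.g. an upstream operator
    // failed first), skip bucket-file creation and say so, rather than
    // throwing an uninformative NullPointerException.
    String closeOp() {
        if (!initialized || bucketState == null) {
            return "skipped close: operator was never initialized";
        }
        return "closed " + bucketState.length + " bucket file(s)";
    }
}
```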
[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases
[ https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647241#comment-14647241 ] Hive QA commented on HIVE-10319: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12747893/HIVE-10319.4.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4755/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4755/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4755/ Messages: {noformat} This message was trimmed, see log for full details [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-metastore --- [INFO] Compiling 244 source files to /data/hive-ptest/working/apache-github-source-source/metastore/target/classes [INFO] - [WARNING] COMPILATION WARNING : [INFO] - [WARNING] /data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java: Some input files use or override a deprecated API. [WARNING] /data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java: Recompile with -Xlint:deprecation for details. [WARNING] /data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetOpenTxnsResponse.java: Some input files use unchecked or unsafe operations. [WARNING] /data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetOpenTxnsResponse.java: Recompile with -Xlint:unchecked for details. 
[INFO] 4 warnings [INFO] - [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[71,44] cannot find symbol symbol: class GetAllFunctionsResponse location: package org.apache.hadoop.hive.metastore.api [ERROR] /data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[5544,12] cannot find symbol symbol: class GetAllFunctionsResponse location: class org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler [ERROR] /data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[217,12] cannot find symbol symbol: class GetAllFunctionsResponse location: interface org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Iface [ERROR] /data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java:[79,44] cannot find symbol symbol: class GetAllFunctionsResponse location: package org.apache.hadoop.hive.metastore.api [ERROR] /data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:[40,44] cannot find symbol symbol: class GetAllFunctionsResponse location: package org.apache.hadoop.hive.metastore.api [ERROR] /data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java:[2049,10] cannot find symbol symbol: class GetAllFunctionsResponse location: class org.apache.hadoop.hive.metastore.HiveMetaStoreClient [ERROR] /data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:[1134,3] cannot find symbol symbol: class GetAllFunctionsResponse location: interface org.apache.hadoop.hive.metastore.IMetaStoreClient [ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[7491,14] cannot find symbol symbol: class GetAllFunctionsResponse location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.get_all_functions_call [ERROR] /data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[3300,12] cannot find symbol symbol: class GetAllFunctionsResponse location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client [ERROR]
[jira] [Updated] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases
[ https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nezih Yigitbasi updated HIVE-10319: --- Attachment: HIVE-10319.5.patch Hive CLI startup takes a long time with a large number of databases --- Key: HIVE-10319 URL: https://issues.apache.org/jira/browse/HIVE-10319 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 1.0.0 Reporter: Nezih Yigitbasi Assignee: Nezih Yigitbasi Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch The Hive CLI takes a long time to start when there is a large number of databases in the DW. I think the root cause is the way permanent UDFs are loaded from the metastore. When I looked at the logs and the source code, I saw that at startup Hive first gets all the databases from the metastore and then, for each database, makes a metastore call to get the permanent functions for that database [see Hive.java | https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185]. So the number of metastore calls made is on the order of the number of databases. In production we have several hundred databases, so Hive makes several hundred RPC calls during startup, taking 30+ seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
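The win from such a change is easiest to see as a round-trip count. A toy model of the two loading strategies described above — a plain map stands in for the metastore, and a single bulk call (such as the get_all_functions API the later patches appear to introduce) replaces the per-database lookups; all RPC machinery is omitted:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UdfLoadSketch {
    // Old approach: one call to list databases, then one getFunctions(db)
    // call per database, i.e. O(#databases) metastore round trips.
    public static int rpcsPerDatabase(Map<String, List<String>> metastore) {
        int rpcs = 1; // one call to list all databases
        for (String db : metastore.keySet()) {
            rpcs++; // one per-database functions call
        }
        return rpcs;
    }

    // Bulk approach: a single call returns all functions for all
    // databases, independent of the database count.
    public static int rpcsBulk(Map<String, List<String>> metastore) {
        return 1;
    }
}
```

With several hundred databases the first count is several hundred RPCs at startup; the second stays at one.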
[jira] [Commented] (HIVE-4239) Remove lock on compilation stage
[ https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647266#comment-14647266 ] Carl Steinbach commented on HIVE-4239: -- It should probably go in both the hs2 and compiler sections. Remove lock on compilation stage Key: HIVE-4239 URL: https://issues.apache.org/jira/browse/HIVE-4239 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Carl Steinbach Assignee: Sergey Shelukhin Labels: TODOC2.0 Fix For: 2.0.0 Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11406) Vectorization: StringExpr::compare() == 0 is bad for performance
[ https://issues.apache.org/jira/browse/HIVE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-11406: Attachment: HIVE-11406.01.patch Vectorization: StringExpr::compare() == 0 is bad for performance Key: HIVE-11406 URL: https://issues.apache.org/jira/browse/HIVE-11406 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-11406.01.patch {{StringExpr::compare() == 0}} is forced to evaluate the whole memory comparison loop for differing lengths of strings, though there is no possibility they will ever be equal. Add a {{StringExpr::equals}} which can be a smaller and tighter loop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
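The "smaller and tighter loop" the description asks for can be sketched as follows. This is a minimal, hypothetical version — not Hive's actual StringExpr code — whose point is that equality can return false immediately when the lengths differ, whereas a three-way compare must walk the common prefix:

```java
public class StringEquals {
    // Compare two byte ranges for equality. The signature mirrors the
    // (start, length) style of vectorized string expressions but is
    // illustrative only.
    public static boolean equals(byte[] a, int aStart, int aLen,
                                 byte[] b, int bStart, int bLen) {
        if (aLen != bLen) {
            return false; // strings of differing lengths can never be equal
        }
        for (int i = 0; i < aLen; i++) {
            if (a[aStart + i] != b[bStart + i]) {
                return false;
            }
        }
        return true;
    }
}
```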
[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8343: - Description: In addEvent() and processVertex(), there is a call such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html was: In addEvent() and processVertex(), there is a call such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner Key: HIVE-8343 URL: https://issues.apache.org/jira/browse/HIVE-8343 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: JongWon Park Priority: Minor Attachments: HIVE-8343.patch In addEvent() and processVertex(), there is a call such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
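The kind of fix the report asks for can be sketched as below — a minimal, hypothetical helper, not the actual DynamicPartitionPruner code — that propagates offer()'s result instead of silently dropping it. For a bounded queue, offer() returning false means the element was NOT added:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferCheck {
    // Enqueue an event and report whether it was actually accepted.
    public static boolean enqueue(BlockingQueue<Object> queue, Object event) {
        if (queue.offer(event)) {
            return true;
        }
        // The queue was full and the event was NOT queued; a caller that
        // must not lose events could block with put() or retry instead.
        return false;
    }
}
```

A default LinkedBlockingQueue is effectively unbounded so offer() rarely fails there, but the check still matters wherever a capacity is set.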
[jira] [Commented] (HIVE-11384) Add Test case which cover both HIVE-11271 and HIVE-11333
[ https://issues.apache.org/jira/browse/HIVE-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648058#comment-14648058 ] Szehon Ho commented on HIVE-11384: -- No problem, makes sense. +1, always good to have more tests. Add Test case which cover both HIVE-11271 and HIVE-11333 Key: HIVE-11384 URL: https://issues.apache.org/jira/browse/HIVE-11384 Project: Hive Issue Type: Test Components: Logical Optimizer, Parser Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11384.1.patch Add some test queries that need both HIVE-11271 and HIVE-11333 to be fixed in order to pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11380) NPE when FileSinkOperator is not initialized
[ https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648063#comment-14648063 ] Szehon Ho commented on HIVE-11380: -- +1, seems good to add the null check here to me NPE when FileSinkOperator is not initialized Key: HIVE-11380 URL: https://issues.apache.org/jira/browse/HIVE-11380 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11380.1.patch When FileSinkOperator's initializeOp is not called (which may happen when an operator before FileSinkOperator initializeOp failed), FileSinkOperator will throw NPE at close time. The stacktrace: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519) ... 18 more {noformat} This Exception is misleading and often distracts users from finding real issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
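The shape of the guard the patch adds can be sketched as follows. This is a hedged stand-in, not FileSinkOperator's real code: the field name and return value are hypothetical, and the point is only that close-time work must be skipped when initializeOp never ran:

```java
public class FileSinkCloseSketch {
    // Per-bucket writer state; in the real operator this is populated by
    // initializeOp, so it stays null if initialization failed upstream.
    private Object[] fsPaths;

    // Returns true if there was initialized state to close (illustrative;
    // the real closeOp is void).
    public boolean closeOp(boolean abort) {
        if (fsPaths == null) {
            // initializeOp was never called (an earlier operator failed
            // first), so there are no bucket files to create or flush.
            return false;
        }
        // ... normal close path: create, flush and commit bucket files ...
        return true;
    }
}
```

Without the null check, the close path dereferences the uninitialized state and produces the misleading NPE in the stack trace above.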
[jira] [Commented] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression
[ https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648081#comment-14648081 ] Prasanth Jayachandran commented on HIVE-11405: -- [~gopalv] are the column stats available for this query? If not your patch will early terminate because of data size becoming 0 and AND evaluation terminating early. Also I am not sure if this assumption is correct {code} final long branch2Rows = (newNumRows <= branchRows) ? 0 : (newNumRows - branchRows); {code} I am still evaluating this change. The idea of mirroring the tree and passing the branchRows to the sibling branch looks good so far. Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression -- Key: HIVE-11405 URL: https://issues.apache.org/jira/browse/HIVE-11405 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Prasanth Jayachandran Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330. Quoting him, The recursion protection works well with an AND expr, but it doesn't work against (OR a=1 (OR a=2 (OR a=3 (OR ...) since the rows will never be reduced during recursion due to the nature of the OR. We need to execute a short-circuit to satisfy the OR properly - no case which matches a=1 qualifies for the rest of the filters. Recursion should pass in the numRows - branch1Rows for the branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
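One way to read the short-circuit described above, as a self-contained sketch. The arithmetic is assumed, not the actual StatsRulesProcFactory code: rows matched by the first OR branch cannot also need matching by the second, so the second branch's estimate is capped by the remaining rows:

```java
public class OrStatsSketch {
    // numRows: rows entering the OR; branch1Rows: estimated rows matched
    // by the first branch; branch2Estimate: the second branch's raw
    // estimate before the cap. Returns the combined OR estimate.
    public static long evaluateOr(long numRows, long branch1Rows,
                                  long branch2Estimate) {
        // Rows left for the sibling branch: numRows - branch1Rows,
        // clamped at zero when the first branch already matched everything.
        long remaining = (branch1Rows >= numRows) ? 0 : (numRows - branch1Rows);
        long branch2Rows = Math.min(branch2Estimate, remaining);
        // The union can never exceed the input row count.
        return Math.min(numRows, branch1Rows + branch2Rows);
    }
}
```

This makes the recursion terminate for deeply nested ORs because each level hands a strictly smaller `remaining` to its sibling.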
[jira] [Commented] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression
[ https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648100#comment-14648100 ] Szehon Ho commented on HIVE-11401: -- +1 makes sense from my end. Predicate push down does not work with Parquet when partitions are in the expression Key: HIVE-11401 URL: https://issues.apache.org/jira/browse/HIVE-11401 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11401.1.patch When filtering Parquet tables using a partition column, the query fails saying the column does not exist: {noformat} hive> create table part1 (id int, content string) partitioned by (p string) stored as parquet; hive> alter table part1 add partition (p='p1'); hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b'); hive> select id from part1 where p='p1'; Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Column [p] was not found in schema! Time taken: 0.151 seconds {noformat} It is correct that the partition column is not part of the Parquet schema. So, the fix should be to remove such expressions from the Parquet PPD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
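The proposed fix can be sketched as a filtering step — method and names here are hypothetical, not the patch's actual code: since partition columns never appear in the Parquet file schema, they are stripped from the predicate columns before the pushed-down filter is built:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ParquetPpdSketch {
    // Keep only predicate columns that exist in the Parquet file schema,
    // i.e. drop partition columns (which Hive evaluates itself during
    // partition pruning, not inside the Parquet reader).
    public static List<String> pushableColumns(List<String> predicateCols,
                                               Set<String> partitionCols) {
        List<String> pushable = new ArrayList<>();
        for (String col : predicateCols) {
            if (!partitionCols.contains(col)) {
                pushable.add(col); // a real file column; safe to push down
            }
        }
        return pushable;
    }
}
```

In the failing query above, `p` would be dropped from the pushed-down expression, leaving Parquet to evaluate only columns it actually has.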
[jira] [Updated] (HIVE-11160) Auto-gather column stats
[ https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11160: --- Attachment: HIVE-11160.03.patch rebase the patch Auto-gather column stats Key: HIVE-11160 URL: https://issues.apache.org/jira/browse/HIVE-11160 Project: Hive Issue Type: New Feature Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11160.01.patch, HIVE-11160.02.patch, HIVE-11160.03.patch Hive collects table stats during the INSERT OVERWRITE command when hive.stats.autogather=true is set. The users then need to collect the column stats themselves using the ANALYZE command. With this patch, the column stats will also be collected automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization
[ https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647238#comment-14647238 ] Hive QA commented on HIVE-11387: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12747871/HIVE-11387.04.patch {color:green}SUCCESS:{color} +1 9276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4754/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4754/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4754/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12747871 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization -- Key: HIVE-11387 URL: https://issues.apache.org/jira/browse/HIVE-11387 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, HIVE-11387.03.patch, HIVE-11387.04.patch {noformat} The main problem is that, due to the return path, we may now have (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggregation. However, in the non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The problem is that the return path does not take this setting into account. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647243#comment-14647243 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11316: -- [~jcamachorodriguez] Can you please look at patch#7 Thanks Hari Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316-branch-1.0.patch, HIVE-11316-branch-1.2.patch, HIVE-11316.1.patch, HIVE-11316.2.patch, HIVE-11316.3.patch, HIVE-11316.4.patch, HIVE-11316.5.patch, HIVE-11316.6.patch, HIVE-11316.7.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is supposed to alter the string memoization to use a different data structure that doesn't duplicate any part of the string so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647277#comment-14647277 ] Lefty Leverenz commented on HIVE-10165: --- If the fix version bit looks familiar, that's because I borrowed it from your comment on HIVE-9583. Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Labels: TODOC2.0, streaming_api Fix For: 2.0.0 Attachments: HIVE-10165.0.patch, HIVE-10165.10.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, HIVE-10165.9.patch, mutate-system-overview.png h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. 
I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4239) Remove lock on compilation stage
[ https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647283#comment-14647283 ] Lefty Leverenz commented on HIVE-4239: -- Hmm ... a compiler section would be nice to have. Maybe we could add one. Thanks Carl. Remove lock on compilation stage Key: HIVE-4239 URL: https://issues.apache.org/jira/browse/HIVE-4239 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Carl Steinbach Assignee: Sergey Shelukhin Labels: TODOC2.0 Fix For: 2.0.0 Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on
[ https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647930#comment-14647930 ] Pengcheng Xiong commented on HIVE-11391: [~jcamachorodriguez], can u resubmit the patch for a QA run due to the recent commit of multijoin? If it can pass, +1. CBO (Calcite Return Path): Add CBO tests with return path on Key: HIVE-11391 URL: https://issues.apache.org/jira/browse/HIVE-11391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11391.patch, HIVE-11391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results
[ https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline reassigned HIVE-11410: --- Assignee: Matt McCline Join with subquery containing a group by incorrectly returns no results --- Key: HIVE-11410 URL: https://issues.apache.org/jira/browse/HIVE-11410 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.1.0 Reporter: Nicholas Brenwald Assignee: Matt McCline Priority: Minor Attachments: hive-site.xml Start by creating a table *t* with columns *c1* and *c2* and populate with 1 row of data. For example create table *t* from an existing table which contains at least 1 row of data by running: {code} create table t as select 'abc' as c1, 0 as c2 from Y limit 1; {code} Table *t* looks like the following: ||c1||c2|| |abc|0| Running the following query then returns zero results. {code} SELECT t1.c1 FROM t t1 JOIN (SELECT t2.c1, MAX(t2.c2) AS c2 FROM t t2 GROUP BY t2.c1 ) t3 ON t1.c2=t3.c2 {code} However, we expected to see the following: ||c1|| |abc| The problem seems to relate to the fact that in the subquery, we group by column *c1*, but this is not subsequently used in the join condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8950) Add support in ParquetHiveSerde to create table schema from a parquet file
[ https://issues.apache.org/jira/browse/HIVE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647986#comment-14647986 ] Ryan Blue commented on HIVE-8950: - [~gauravkumar37], the schema is read from the file once and then converted to DDL. The file is no longer used after that, so schema evolution proceeds as it normally would for any table. Add support in ParquetHiveSerde to create table schema from a parquet file -- Key: HIVE-8950 URL: https://issues.apache.org/jira/browse/HIVE-8950 Project: Hive Issue Type: Improvement Reporter: Ashish K Singh Assignee: Gaurav Kumar Attachments: HIVE-8950.1.patch, HIVE-8950.2.patch, HIVE-8950.3.patch, HIVE-8950.4.patch, HIVE-8950.5.patch, HIVE-8950.6.patch, HIVE-8950.7.patch, HIVE-8950.8.patch, HIVE-8950.patch PARQUET-76 and PARQUET-47 ask for creating parquet backed tables without having to specify the column names and types. As, parquet files store schema in their footer, it is possible to generate hive schema from parquet file's metadata. This will improve usability of parquet backed tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases
[ https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648023#comment-14648023 ] Jason Dere commented on HIVE-10319: --- +1 Hive CLI startup takes a long time with a large number of databases --- Key: HIVE-10319 URL: https://issues.apache.org/jira/browse/HIVE-10319 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 1.0.0 Reporter: Nezih Yigitbasi Assignee: Nezih Yigitbasi Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch The Hive CLI takes a long time to start when there is a large number of databases in the DW. I think the root cause is the way permanent UDFs are loaded from the metastore. When I looked at the logs and the source code, I saw that at startup Hive first gets all the databases from the metastore and then, for each database, makes a metastore call to get the permanent functions for that database [see Hive.java | https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185]. So the number of metastore calls made is on the order of the number of databases. In production we have several hundred databases, so Hive makes several hundred RPC calls during startup, taking 30+ seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION
[ https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648123#comment-14648123 ] Pengcheng Xiong commented on HIVE-11409: a good example is union_remove_10.q {code} Group By Operator aggregations: count(VALUE._col0) keys: KEY._col0 (type: string) mode: mergepartial outputColumnNames: $f0, $f1 Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: $f0 (type: string), $f1 (type: bigint) outputColumnNames: key, values Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat serde: org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe name: default.outputtbl1 {code} CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION -- Key: HIVE-11409 URL: https://issues.apache.org/jira/browse/HIVE-11409 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11409.01.patch Three purposes: (1) to ensure that the data type of a non-primary branch (the 1st branch is the primary branch) of the union can be cast to that of the primary branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL is redundant, it will be removed by the IdentityProjectRemover optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks
[ https://issues.apache.org/jira/browse/HIVE-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raajay Viswanathan reassigned HIVE-11413: - Assignee: Raajay Viswanathan Error in detecting availability of HiveSemanticAnalyzerHooks Key: HIVE-11413 URL: https://issues.apache.org/jira/browse/HIVE-11413 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 2.0.0 Reporter: Raajay Viswanathan Assignee: Raajay Viswanathan Priority: Trivial Labels: newbie In the {{compile(String, Boolean)}} function in {{Driver.java}}, the list of available {{HiveSemanticAnalyzerHook}} (_saHooks_) is obtained using the {{getHooks}} method. This method always returns a {{List}} of hooks. However, while checking for availability of hooks, the current version of the code compares _saHooks_ with NULL. This is incorrect, as the segment of code designed to call pre and post Analyze functions gets executed even when the list is empty. The comparison should be changed to {{saHooks.size() > 0}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
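The corrected check can be sketched as a tiny predicate — names are illustrative, not Driver.java's actual code. Since getHooks always returns a list (possibly empty), the guard must test emptiness rather than nullness:

```java
import java.util.Collections;
import java.util.List;

public class HookAvailability {
    // True only when there is at least one hook to run; an empty list
    // (the common case) must skip the pre/post-analyze hook calls.
    public static boolean hooksAvailable(List<?> saHooks) {
        return saHooks != null && saHooks.size() > 0;
    }
}
```

A null-only check would return true for an empty list and run the hook-invocation code for no hooks, which is exactly the bug described.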
[jira] [Commented] (HIVE-11406) Vectorization: StringExpr::compare() == 0 is bad for performance
[ https://issues.apache.org/jira/browse/HIVE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647474#comment-14647474 ] Hive QA commented on HIVE-11406: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12747946/HIVE-11406.01.patch {color:green}SUCCESS:{color} +1 9276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4758/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4758/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4758/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12747946 - PreCommit-HIVE-TRUNK-Build Vectorization: StringExpr::compare() == 0 is bad for performance Key: HIVE-11406 URL: https://issues.apache.org/jira/browse/HIVE-11406 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-11406.01.patch {{StringExpr::compare() == 0}} is forced to evaluate the whole memory comparison loop for differing lengths of strings, though there is no possibility they will ever be equal. Add a {{StringExpr::equals}} which can be a smaller and tighter loop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647478#comment-14647478 ] Hive QA commented on HIVE-11383: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12747964/HIVE-11383.7.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4759/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4759/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4759/ Messages: {noformat} This message was trimmed, see log for full details 3426/3551 KB 3430/3551 KB 3434/3551 KB 3438/3551 KB 3442/3551 KB 3446/3551 KB 3450/3551 KB 3454/3551 KB 3458/3551 KB 3462/3551 KB 3466/3551 KB 3470/3551 KB 3474/3551 KB 3478/3551 KB 3482/3551 KB 3486/3551 KB 3490/3551 KB 3494/3551 KB 3498/3551 KB 3502/3551 KB 3506/3551 KB 3510/3551 KB 3514/3551 KB 3518/3551 KB 3522/3551 KB 3526/3551 KB 3530/3551 KB 3534/3551 KB 3538/3551 KB 3542/3551 KB 3546/3551 KB 3550/3551 KB 3551/3551 KB Downloaded: http://repository.apache.org/snapshots/org/apache/calcite/calcite-core/1.4.0-incubating-SNAPSHOT/calcite-core-1.4.0-incubating-20150729.211031-2.jar (3551 KB at 1005.9 KB/sec) [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-exec --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec --- [INFO] Executing tasks main: [mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen Generating vector expression code Generating vector expression test code [INFO] Executed tasks [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec --- [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java added. 
[INFO] [INFO] --- antlr3-maven-plugin:3.4:antlr (default) @ hive-exec --- [INFO] ANTLR: Processing source directory /data/hive-ptest/working/apache-github-source-source/ql/src/java ANTLR Parser Generator Version 3.4 org/apache/hadoop/hive/ql/parse/HiveLexer.g org/apache/hadoop/hive/ql/parse/HiveParser.g warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_ALL using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_MAP LPAREN using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_MAP using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_SELECT using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_REDUCE using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP,
[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11383: --- Attachment: HIVE-11383.7.patch Upgrade Hive to Calcite 1.4 --- Key: HIVE-11383 URL: https://issues.apache.org/jira/browse/HIVE-11383 Project: Hive Issue Type: Bug Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch CLEAR LIBRARY CACHE Upgrade Hive to Calcite 1.4.0-incubating. There is currently a snapshot release, which is close to what will be in 1.4. I have checked that Hive compiles against the new snapshot, fixing one issue. The patch is attached. Next step is to validate that Hive runs against the new Calcite, and post any issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], can you please do that? [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in the new Calcite version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases
[ https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647384#comment-14647384 ] Hive QA commented on HIVE-10319: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12747929/HIVE-10319.5.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9276 tests executed *Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4757/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4757/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4757/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12747929 - PreCommit-HIVE-TRUNK-Build Hive CLI startup takes a long time with a large number of databases --- Key: HIVE-10319 URL: https://issues.apache.org/jira/browse/HIVE-10319 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 1.0.0 Reporter: Nezih Yigitbasi Assignee: Nezih Yigitbasi Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch The Hive CLI takes a long time to start when there is a large number of databases in the DW. I think the root cause is the way permanent UDFs are loaded from the metastore.
When I looked at the logs and the source code, I saw that at startup Hive first gets all the databases from the metastore and then, for each database, makes a separate metastore call to get the permanent functions for that database [see Hive.java | https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185]. So the number of metastore calls made is on the order of the number of databases. In production we have several hundred databases, so Hive makes several hundred RPC calls during startup, taking 30+ seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
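The linear RPC pattern described above can be sketched as follows. This is a minimal simulation, not Hive code: the client class and its method names (`get_all_databases`, `get_functions`) are hypothetical stand-ins used only to show why round trips grow with the database count.

```python
# Minimal sketch of the startup pattern described in HIVE-10319.
# FakeMetastoreClient is a hypothetical stand-in that counts RPC
# round trips instead of talking to a real metastore.

class FakeMetastoreClient:
    def __init__(self, functions_by_db):
        self.functions_by_db = functions_by_db
        self.rpc_count = 0

    def get_all_databases(self):
        self.rpc_count += 1          # one round trip to list databases
        return list(self.functions_by_db)

    def get_functions(self, db):
        self.rpc_count += 1          # one round trip *per database*
        return self.functions_by_db[db]

def load_permanent_functions(client):
    # The pattern the issue describes: 1 call to list databases,
    # then N calls for N databases -> N+1 round trips total.
    registry = {}
    for db in client.get_all_databases():
        registry[db] = client.get_functions(db)
    return registry

client = FakeMetastoreClient({"db%d" % i: [] for i in range(500)})
load_permanent_functions(client)
print(client.rpc_count)  # 501 round trips for 500 databases
```

Collapsing the per-database lookups into a single bulk fetch would make the round-trip count constant instead of linear, which is the direction the attached patches appear to take.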
[jira] [Updated] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results
[ https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Brenwald updated HIVE-11410: - Attachment: hive-site.xml Join with subquery containing a group by incorrectly returns no results --- Key: HIVE-11410 URL: https://issues.apache.org/jira/browse/HIVE-11410 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.1.0 Reporter: Nicholas Brenwald Priority: Minor Attachments: hive-site.xml Start by creating a table *t* with columns *c1* and *c2*, and populate it with one row of data. For example, create table *t* from an existing table Y that contains at least one row by running:
{code}
create table t as select 'abc' as c1, 0 as c2 from Y limit 1;
{code}
Table *t* looks like the following:
||c1||c2||
|abc|0|
Running the following query then returns zero results:
{code}
SELECT t1.c1
FROM t t1
JOIN (SELECT t2.c1, MAX(t2.c2) AS c2
      FROM t t2
      GROUP BY t2.c1) t3
ON t1.c2=t3.c2
{code}
However, we expected to see the following:
||c1||
|abc|
The problem seems to relate to the fact that the subquery groups by column *c1*, but *c1* is not subsequently used in the join condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
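To make the expected result concrete, the semantics of the repro query can be checked outside Hive. This is a minimal simulation in plain Python (not Hive code): it evaluates the GROUP BY subquery by hand, then the join on c2, and shows the single row the query should return.

```python
# Simulate the HIVE-11410 repro query by hand.
# Table t has one row: c1='abc', c2=0.
t = [("abc", 0)]

# Subquery t3: SELECT c1, MAX(c2) FROM t GROUP BY c1
groups = {}
for c1, c2 in t:
    groups[c1] = max(c2, groups.get(c1, c2))
t3 = sorted(groups.items())          # [('abc', 0)]

# Outer query: SELECT t1.c1 FROM t t1 JOIN t3 ON t1.c2 = t3.c2
result = [r1[0] for r1 in t for r3 in t3 if r1[1] == r3[1]]
print(result)  # ['abc'] -- the row the affected Hive version fails to return
```

Since 0 = MAX(0), the join condition t1.c2 = t3.c2 matches the single row, so an empty result from Hive indicates a planner bug rather than a semantics question.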