[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM
[ https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681255#comment-14681255 ]

Hive QA commented on HIVE-11467:
--------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749692/HIVE-11467.04.patch

{color:green}SUCCESS:{color} +1 9348 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4913/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4913/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4913/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749692 - PreCommit-HIVE-TRUNK-Build

WriteBuffers rounding wbSize to next power of 2 may cause OOM
-------------------------------------------------------------
Key: HIVE-11467
URL: https://issues.apache.org/jira/browse/HIVE-11467
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.0, 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng
Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, HIVE-11467.03.patch, HIVE-11467.04.patch

If the wbSize passed to the WriteBuffers constructor is not a power of 2, it is first rounded up to the next power of 2:
{code}
public WriteBuffers(int wbSize, long maxSize) {
  this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : (Integer.highestOneBit(wbSize) << 1);
  this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
  this.offsetMask = this.wbSize - 1;
  this.maxSize = maxSize;
  writePos.bufferIndex = -1;
  nextBufferToWrite();
}
{code}
That may break the existing memory-consumption assumptions for mapjoin and potentially cause OOM.

The solution is to pass a power-of-2 number as wbSize from upstream during hashtable creation, avoiding this late expansion.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
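The rounding in the constructor above can be illustrated with a standalone sketch (this is not Hive code; the class and method names are hypothetical, only the ternary expression mirrors the snippet). It shows why a non-power-of-2 budget can nearly double the actual allocation:

```java
// Standalone illustration of the power-of-2 rounding described in HIVE-11467.
public class WbSizeRounding {
    // Mirrors the constructor logic: a wbSize that is not a power of 2
    // is rounded UP to the next power of 2.
    static int roundToPowerOfTwo(int wbSize) {
        return Integer.bitCount(wbSize) == 1
                ? wbSize
                : (Integer.highestOneBit(wbSize) << 1);
    }

    public static void main(String[] args) {
        // A caller that budgeted 100 MB per buffer actually gets 128 MB
        // buffers, which can break mapjoin memory assumptions.
        int requested = 100 * 1024 * 1024;          // 104857600 bytes
        int actual = roundToPowerOfTwo(requested);  // 134217728 bytes (128 MB)
        System.out.println(requested + " -> " + actual);
    }
}
```

Since an exact power of 2 passes through unchanged, supplying one from the hashtable-creation side (as the fix proposes) sidesteps the expansion entirely.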
[jira] [Updated] (HIVE-11442) Remove commons-configuration.jar from Hive distribution
[ https://issues.apache.org/jira/browse/HIVE-11442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated HIVE-11442:
------------------------------
Attachment: HIVE-11442.3.patch

commons-configuration.jar is needed in testing; it should only be removed from packaging.

Remove commons-configuration.jar from Hive distribution
-------------------------------------------------------
Key: HIVE-11442
URL: https://issues.apache.org/jira/browse/HIVE-11442
Project: Hive
Issue Type: Improvement
Components: Build Infrastructure
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 1.3.0, 2.0.0
Attachments: HIVE-11442.1.patch, HIVE-11442.2.patch, HIVE-11442.3.patch

Some customers have reported version conflicts with the commons-configuration.jar bundled by Hive. commons-configuration.jar is not actually needed by Hive; it is a transitive dependency of Hadoop/Accumulo, and users should be able to pick up those jars from Hadoop at runtime.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
[ https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-11376:
----------------------------------
Labels: TODOC2.0 (was: )

CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
--------------------------------------------------------------------------------------------------------------
Key: HIVE-11376
URL: https://issues.apache.org/jira/browse/HIVE-11376
Project: Hive
Issue Type: Bug
Reporter: Rajat Khandelwal
Assignee: Rajat Khandelwal
Labels: TODOC2.0
Fix For: 2.0.0
Attachments: HIVE-11376.02.patch

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379

This is the exact code snippet:
{noformat}
// Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
// we use a configuration variable for the same
if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
  // The following code should be removed, once
  // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
  // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
  // so don't use CombineFileInputFormat for non-splittable files
  // i.e., don't combine if inputformat is a TextInputFormat and has compression turned on
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
[ https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681285#comment-14681285 ]

Lefty Leverenz commented on HIVE-11376:
---------------------------------------

Doc note: This removes *hive.hadoop.supports.splittable.combineinputformat* from HiveConf.java, so the wikidoc needs a "Removed In: 2.0.0 with HIVE-11376" bullet for the parameter.

* [Configuration Properties -- hive.hadoop.supports.splittable.combineinputformat | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.hadoop.supports.splittable.combineinputformat]

CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
--------------------------------------------------------------------------------------------------------------
Key: HIVE-11376
URL: https://issues.apache.org/jira/browse/HIVE-11376
Project: Hive
Issue Type: Bug
Reporter: Rajat Khandelwal
Assignee: Rajat Khandelwal
Labels: TODOC2.0
Fix For: 2.0.0
Attachments: HIVE-11376.02.patch

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379

This is the exact code snippet:
{noformat}
// Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
// we use a configuration variable for the same
if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
  // The following code should be removed, once
  // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
  // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
  // so don't use CombineFileInputFormat for non-splittable files
  // i.e., don't combine if inputformat is a TextInputFormat and has compression turned on
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680179#comment-14680179 ]

Hive QA commented on HIVE-11103:
--------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749540/HIVE-11103.4.patch

{color:green}SUCCESS:{color} +1 9354 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4902/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4902/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4902/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749540 - PreCommit-HIVE-TRUNK-Build

Add banker's rounding BROUND UDF
--------------------------------
Key: HIVE-11103
URL: https://issues.apache.org/jira/browse/HIVE-11103
Project: Hive
Issue Type: New Feature
Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, HIVE-11103.2.patch, HIVE-11103.4.patch

Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding and, in German, "mathematische Rundung".

Example:
{code}
Unrounded    Standard rounding   Gaussian rounding
             (2 digits)          (2 digits)
  54.1754      54.18               54.18
 343.2050     343.21              343.20
+106.2038    +106.20             +106.20
=========    =======             =======
 503.5842     503.59              503.58
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
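The table above can be reproduced with plain JDK rounding modes (this is a sketch, not the BROUND UDF implementation itself): banker's rounding corresponds to `RoundingMode.HALF_EVEN`, while "standard" rounding corresponds to `HALF_UP`. The two only differ on exact ties:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Banker's (HALF_EVEN) vs. standard (HALF_UP) rounding to 2 decimal places,
// matching the rows of the example table.
public class BankersRounding {
    static String round(String value, RoundingMode mode) {
        return new BigDecimal(value).setScale(2, mode).toPlainString();
    }

    public static void main(String[] args) {
        // Exactly-half case: ties go to the nearest even last digit.
        System.out.println(round("343.2050", RoundingMode.HALF_UP));   // 343.21
        System.out.println(round("343.2050", RoundingMode.HALF_EVEN)); // 343.20
        // Non-tie case: both modes agree.
        System.out.println(round("54.1754", RoundingMode.HALF_EVEN));  // 54.18
    }
}
```

The point of HALF_EVEN, as the table's sums show, is that ties round up and down equally often, so rounding errors tend to cancel over many values instead of accumulating upward.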
[jira] [Updated] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
[ https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu updated HIVE-11504:
--------------------------------
Attachment: HIVE-11504.1.patch

Predicate pushing down doesn't work for float type for Parquet
--------------------------------------------------------------
Key: HIVE-11504
URL: https://issues.apache.org/jira/browse/HIVE-11504
Project: Hive
Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Attachments: HIVE-11504.1.patch, HIVE-11504.patch

The predicate builder should use the PrimitiveTypeName type on the Parquet side to construct the predicate leaf, instead of the type provided by PredicateLeaf.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-6892) Permission inheritance issues
[ https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680220#comment-14680220 ]

Andrés Cordero commented on HIVE-6892:
--------------------------------------

Can some changes be made to [Permission Inheritance in Hive|https://cwiki.apache.org/confluence/display/Hive/Permission+Inheritance+in+Hive]? I've seen some behavior that doesn't match what the doc claims. Namely:
* Group isn't inherited when the flag is off; "already done by HDFS for new directories" implies that it shouldn't matter.
* Extended ACLs are not inherited, they are cloned, which means that default ACLs don't propagate down as default+access (the HDFS way), but default only (which means default for directories and nothing for files). "Extended Acl's are taken from parent" in the first paragraph already implies this, but it's still rather ambiguous (especially with the text below containing the same "already done by HDFS" wording).

Permission inheritance issues
-----------------------------
Key: HIVE-6892
URL: https://issues.apache.org/jira/browse/HIVE-6892
Project: Hive
Issue Type: Bug
Components: Security
Affects Versions: 0.13.0
Reporter: Szehon Ho
Assignee: Szehon Ho

*HDFS Background*
* When a file or directory is created, its owner is the user identity of the client process, and its group is inherited from the parent (the BSD rule). Permissions are taken from the default umask. Extended ACLs are taken from the parent unless they are set explicitly.

*Goals*
To reduce the need to set fine-grained file security props after every operation, users may want the following Hive warehouse files/dirs to auto-inherit security properties from their directory parents:
* Directories created by new database/table/partition/bucket
* Files added to tables via load/insert
* Table directories exported/imported (open question of whether an exported table inheriting perms from its new parent needs another flag)

What may be inherited:
* Basic file permission
* Groups (already done by HDFS for new directories)
* Extended ACLs (already done by HDFS for new directories)

*Behavior*
* When the hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive will try to do all the above inheritances. In the future, we can add more flags for finer-grained control.
* Failure by Hive to inherit will not cause the operation to fail. The rule of thumb for when security-prop inheritance will happen is the following:
** To run chmod, a user must be the owner of the file, or else a super-user.
** To run chgrp, a user must be the owner of the files, or else a super-user.
** Hence, the user that Hive runs as (either 'hive' or the logged-in user in case of impersonation) must be a super-user or the owner of the file whose security properties are going to be changed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees
[ https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680293#comment-14680293 ]

Hive QA commented on HIVE-11398:
--------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749542/HIVE-11398.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9347 tests executed

*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer3
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4903/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4903/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4903/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749542 - PreCommit-HIVE-TRUNK-Build

Parse wide OR and wide AND trees to flat OR/AND trees
-----------------------------------------------------
Key: HIVE-11398
URL: https://issues.apache.org/jira/browse/HIVE-11398
Project: Hive
Issue Type: New Feature
Components: Logical Optimizer, UDF
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez
Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, HIVE-11398.4.patch, HIVE-11398.patch

Deep trees of AND/OR are hard to traverse, particularly when they are merely the same structure in nested form as a version of the operator that takes an arbitrary number of args.

One potential way to convert the DFS searches into a simpler BFS search is to introduce a new operator pair named ALL and ANY.

ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C), B), A)

The SemanticAnalyzer would be responsible for generating these operators, and this would mean that the depth and complexity of traversals for the simplest case of wide AND/OR trees would be trivial.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
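The ALL/ANY flattening proposed above can be sketched in a few lines (a standalone illustration with a hypothetical `Expr` type, not Hive's ExprNodeDesc): collapse every run of nested same-operator nodes into one n-ary node, so a left-deep OR chain of depth N becomes a single ANY with N children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Flattening a nested binary OR/AND tree into a single n-ary ANY/ALL node,
// as described in HIVE-11398. Expr is a hypothetical stand-in type.
public class FlattenTree {
    static class Expr {
        final String op;           // "OR", "AND", "ANY", "ALL", or a leaf name
        final List<Expr> children;
        Expr(String op, Expr... kids) {
            this.op = op;
            this.children = new ArrayList<>(Arrays.asList(kids));
        }
    }

    // Replace nested applications of `op` with one flat ANY/ALL node.
    static Expr flatten(Expr root, String op) {
        Expr flat = new Expr(op.equals("OR") ? "ANY" : "ALL");
        collect(root, op, flat.children);
        return flat;
    }

    // Recursively gather all operands that sit under nested `op` nodes.
    static void collect(Expr e, String op, List<Expr> out) {
        if (e.op.equals(op)) {
            for (Expr child : e.children) collect(child, op, out);
        } else {
            out.add(e);
        }
    }

    public static void main(String[] args) {
        // OR(OR(A, B), C)  ->  ANY(A, B, C)
        Expr nested = new Expr("OR",
                new Expr("OR", new Expr("A"), new Expr("B")),
                new Expr("C"));
        Expr flat = flatten(nested, "OR");
        System.out.println(flat.op + " with " + flat.children.size() + " children");
    }
}
```

With the flat node, a visitor touches each operand once at depth 1 instead of walking a chain whose depth grows with the number of disjuncts.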
[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
[ https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HIVE-8285:
-------------------------
Description:
{code}
if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) {
{code}
equals() should be used in the above comparison.

was:
{code}
if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) {
{code}
equals() should be used in the above comparison.

Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
---------------------------------------------------------------------------------------
Key: HIVE-8285
URL: https://issues.apache.org/jira/browse/HIVE-8285
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
Attachments: HIVE-8285.patch

{code}
if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) {
{code}
equals() should be used in the above comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
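A minimal demonstration of why `==` on Boolean values is fragile (standalone JDK code, not the PartitionPruner itself): only the interned `Boolean.TRUE`/`Boolean.FALSE` instances compare equal by reference, while a Boolean arriving from a constructor, reflection, or deserialization does not.

```java
// Reference equality vs. equals() on java.lang.Boolean.
public class BooleanEquality {
    public static void main(String[] args) {
        // new Boolean(...) is deprecated but legal, and stands in for any
        // Boolean instance not taken from the Boolean.TRUE/FALSE cache
        // (e.g. one produced by deserialization).
        Boolean fromElsewhere = new Boolean(true);

        System.out.println(fromElsewhere == Boolean.TRUE);      // false: different objects
        System.out.println(Boolean.TRUE.equals(fromElsewhere)); // true: same value
    }
}
```

This is why the fix replaces `eC.getValue() == Boolean.TRUE` with an `equals()` check: the predicate value may be a distinct Boolean instance even when it is logically true.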
[jira] [Commented] (HIVE-11506) Casting varchar/char type to string cannot be vectorized
[ https://issues.apache.org/jira/browse/HIVE-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680425#comment-14680425 ]

Hive QA commented on HIVE-11506:
--------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749551/HIVE-11506.1.patch.txt

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9347 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_char_mapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_varchar_mapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_mapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_mapjoin1
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4904/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4904/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4904/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749551 - PreCommit-HIVE-TRUNK-Build

Casting varchar/char type to string cannot be vectorized
---------------------------------------------------------
Key: HIVE-11506
URL: https://issues.apache.org/jira/browse/HIVE-11506
Project: Hive
Issue Type: Improvement
Components: Vectorization
Reporter: Navis
Assignee: Navis
Priority: Trivial
Attachments: HIVE-11506.1.patch.txt

It's not defined in the vectorization context.
{code}
explain select cast(cast(cstring1 as varchar(10)) as string) x from alltypesorc order by x;
{code}
The mapper is not vectorized because of an exception:
{noformat}
2015-08-10 17:02:08,003 INFO [main]: physical.Vectorizer (Vectorizer.java:validateExprNodeDesc(1299)) - Failed to vectorize
org.apache.hadoop.hive.ql.metadata.HiveException: Unhandled cast input type: varchar(10)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getCastToString(VectorizationContext.java:1543)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUDFBridgeVectorExpression(VectorizationContext.java:1379)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1177)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:440)
	at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1293)
	at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1284)
	at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateSelectOperator(Vectorizer.java:1116)
	at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateMapWorkOperator(Vectorizer.java:906)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees
[ https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680536#comment-14680536 ]

Gopal V commented on HIVE-11398:
--------------------------------

Patch LGTM - +1

The last test failure seems to be an expected OR rotation due to the traversal order.

groupby_multi_single_reducer3.q.out
{code}
HEAD
predicate: ((((key + key) = 400) or (((key - 100) = 500) and value is not null)) or ((((key + key) = 200) or ((key - 100) = 100)) or ((key = 300) and value is not null))) (type: boolean)
===
predicate: ((((key + key) = 200) or ((key - 100) = 100) or ((key = 300) and value is not null)) or (((key + key) = 400) or (((key - 100) = 500) and value is not null))) (type: boolean)
{code}

Parse wide OR and wide AND trees to flat OR/AND trees
-----------------------------------------------------
Key: HIVE-11398
URL: https://issues.apache.org/jira/browse/HIVE-11398
Project: Hive
Issue Type: New Feature
Components: Logical Optimizer, UDF
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez
Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, HIVE-11398.4.patch, HIVE-11398.5.patch, HIVE-11398.patch

Deep trees of AND/OR are hard to traverse, particularly when they are merely the same structure in nested form as a version of the operator that takes an arbitrary number of args.

One potential way to convert the DFS searches into a simpler BFS search is to introduce a new operator pair named ALL and ANY.

ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C), B), A)

The SemanticAnalyzer would be responsible for generating these operators, and this would mean that the depth and complexity of traversals for the simplest case of wide AND/OR trees would be trivial.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.
[ https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680562#comment-14680562 ]

Thejas M Nair commented on HIVE-11466:
--------------------------------------

[~prasanth_j] [~spena] [~xuefuz] [~jdere] [~csun] Thanks for the great team work!

HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.
---------------------------------------------------------------------------------
Key: HIVE-11466
URL: https://issues.apache.org/jira/browse/HIVE-11466
Project: Hive
Issue Type: Bug
Reporter: Sergio Peña
Assignee: Xuefu Zhang
Fix For: spark-branch, 2.0.0
Attachments: HIVE-11466.1.patch, HIVE-11466.patch

An issue with the HIVE-10166 patch is that it increases the size of hive.log, causing Jenkins to fail because it runs out of disk space.

Here's a test I ran with TestJdbcWithMiniHS2 before the patch, with the patch, and after other commits:
{noformat}
BEFORE HIVE-10166
13M  Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log

WITH HIVE-10166
2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log

CURRENT HEAD
3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
{noformat}

This is just a single test, but on Jenkins, hive.log is more than 13G in size.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization
[ https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-11387:
-----------------------------------
Attachment: HIVE-11387.07.patch

Resubmitting the patch, as all the failed tests pass on my Mac.

CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization
---------------------------------------------------------------------------------------------------
Key: HIVE-11387
URL: https://issues.apache.org/jira/browse/HIVE-11387
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, HIVE-11387.06.patch, HIVE-11387.07.patch

The main problem is that, due to the return path, we may now have {{(RS1-GBY2)\-(RS3-GBY4)}} when map.aggr=false, i.e., no map-side aggregation. However, on the non-return path, it will be treated as {{(RS1)-(GBY2-RS3-GBY4)}}. The problem is that the return path does not take this setting into account.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680395#comment-14680395 ]

Anthony Hsu commented on HIVE-4734:
-----------------------------------

Any updates on this patch? I'd love to see this committed, too! :-)

Use custom ObjectInspectors for AvroSerde
-----------------------------------------
Key: HIVE-4734
URL: https://issues.apache.org/jira/browse/HIVE-4734
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mark Wagner
Labels: Avro, AvroSerde, Performance
Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch, HIVE-4734.4.patch, HIVE-4734.5.patch

Currently, the AvroSerde recursively copies all fields of a record from the GenericRecord to a List row object and provides the standard ObjectInspectors. Performance can be improved by providing ObjectInspectors that go to the Avro record itself.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()
[ https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HIVE-8458:
-------------------------
Description:
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}
If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE.

was:
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}
If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE.

Potential null dereference in Utilities#clearWork()
---------------------------------------------------
Key: HIVE-8458
URL: https://issues.apache.org/jira/browse/HIVE-8458
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: skrho
Priority: Minor
Attachments: HIVE-8458_001.patch

{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}
If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
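The pattern of the bug, and one possible null-safe shape for it, can be shown without Hadoop dependencies (a sketch with Strings standing in for Hadoop Path objects; `pickNonNull` is a hypothetical helper, not a method from the Hive patch): since the guard only returns when both paths are null, any later dereference must pick whichever path is actually present.

```java
// Sketch of a null-safe variant of the clearWork() guard from HIVE-8458.
public class NullSafeClear {
    // Returns the path to derive the FileSystem from, or null when there is
    // nothing to clean. The original bug was unconditionally dereferencing
    // mapPath, which NPEs when only reducePath is set.
    static String pickNonNull(String mapPath, String reducePath) {
        if (mapPath == null && reducePath == null) {
            return null; // nothing to clean
        }
        return mapPath != null ? mapPath : reducePath;
    }

    public static void main(String[] args) {
        System.out.println(pickNonNull(null, "reduce.plan")); // reduce.plan
        System.out.println(pickNonNull("map.plan", null));    // map.plan
        System.out.println(pickNonNull(null, null));          // null
    }
}
```

In the real code the equivalent fix would be deriving the FileSystem from the non-null path before attempting deletion.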
[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HIVE-8343:
-------------------------
Description:
In addEvent() and processVertex(), there are calls such as the following:
{code}
queue.offer(event);
{code}
The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html

was:
In addEvent() and processVertex(), there are calls such as the following:
{code}
queue.offer(event);
{code}
The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html

Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
---------------------------------------------------------------------------------
Key: HIVE-8343
URL: https://issues.apache.org/jira/browse/HIVE-8343
Project: Hive
Issue Type: Bug
Reporter: Ted Yu
Assignee: JongWon Park
Priority: Minor
Attachments: HIVE-8343.patch

In addEvent() and processVertex(), there are calls such as the following:
{code}
queue.offer(event);
{code}
The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
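A small standalone example of why the `offer()` return value matters (plain JDK code, unrelated to the DynamicPartitionPruner internals): on a bounded queue, `offer()` returns false instead of blocking, so an unchecked call silently drops the element.

```java
import java.util.concurrent.LinkedBlockingQueue;

// BlockingQueue.offer() on a full bounded queue: returns false, never blocks.
public class OfferCheck {
    public static void main(String[] args) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(1);

        System.out.println(queue.offer("event-1")); // true: queued
        System.out.println(queue.offer("event-2")); // false: full, event silently dropped

        // Alternatives when dropping is not acceptable:
        //   queue.put("event-2");  -- blocks until space is available
        //   or check offer()'s return value and handle the failure explicitly.
    }
}
```

(An unbounded LinkedBlockingQueue never rejects an offer, but relying on that makes the code fragile if a capacity is ever introduced; checking the return value, or using `put()`, makes the intent explicit.)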
[jira] [Resolved] (HIVE-11465) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix stringToMap
[ https://issues.apache.org/jira/browse/HIVE-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong resolved HIVE-11465.
------------------------------------
Resolution: Fixed

Resolved by HIVE-11436.

CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix stringToMap
-----------------------------------------------------------------------------
Key: HIVE-11465
URL: https://issues.apache.org/jira/browse/HIVE-11465
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong

Right now str_to_map('a=1 b=2 c=3', ' ', '=') will generate a=null, b=null, 2=null, etc., rather than a=1, b=2, etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees
[ https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-11398:
-------------------------------------------
Attachment: HIVE-11398.5.patch

Parse wide OR and wide AND trees to flat OR/AND trees
-----------------------------------------------------
Key: HIVE-11398
URL: https://issues.apache.org/jira/browse/HIVE-11398
Project: Hive
Issue Type: New Feature
Components: Logical Optimizer, UDF
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez
Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, HIVE-11398.4.patch, HIVE-11398.5.patch, HIVE-11398.patch

Deep trees of AND/OR are hard to traverse, particularly when they are merely the same structure in nested form as a version of the operator that takes an arbitrary number of args.

One potential way to convert the DFS searches into a simpler BFS search is to introduce a new operator pair named ALL and ANY.

ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C), B), A)

The SemanticAnalyzer would be responsible for generating these operators, and this would mean that the depth and complexity of traversals for the simplest case of wide AND/OR trees would be trivial.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11480) CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as input to GenericUDAF
[ https://issues.apache.org/jira/browse/HIVE-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-11480:
-----------------------------------
Attachment: HIVE-11480.03.patch

Re-uploading the patch for a QA run, as all the tests passed on my laptop.

CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as input to GenericUDAF
---------------------------------------------------------------------------------------------------
Key: HIVE-11480
URL: https://issues.apache.org/jira/browse/HIVE-11480
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Attachments: HIVE-11480.01.patch, HIVE-11480.02.patch, HIVE-11480.03.patch

Some of the UDAFs cannot deal with char/varchar correctly when the return path is on, for example udaf_number_format.q.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-9073) NPE when using custom windowing UDAFs
[ https://issues.apache.org/jira/browse/HIVE-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-9073:
-----------------------------
Affects Version/s: 0.14.0
                   1.0.0

NPE when using custom windowing UDAFs
-------------------------------------
Key: HIVE-9073
URL: https://issues.apache.org/jira/browse/HIVE-9073
Project: Hive
Issue Type: Bug
Components: UDF
Affects Versions: 0.14.0, 1.0.0
Reporter: Jason Dere
Assignee: Jason Dere
Fix For: 1.2.0
Attachments: HIVE-9073.1.patch, HIVE-9073.2.patch, HIVE-9073.2.patch, HIVE-9073.3.patch

From the hive-user email group:
{noformat}
While executing a simple select query using a custom windowing UDAF I created, I am constantly running into this error.

Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
	... 14 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:647)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getWindowFunctionInfo(FunctionRegistry.java:1875)
	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.streamingPossible(WindowingTableFunction.java:150)
	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.setCanAcceptInputAsStream(WindowingTableFunction.java:221)
	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.initializeStreaming(WindowingTableFunction.java:266)
	at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.initializeStreaming(PTFOperator.java:292)
	at org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:86)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
	at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
	... 14 more

Just wanted to check if any of you have faced this earlier. Also, when I try to run the custom UDAF on another server it works fine. The only difference I can see is that the Hive version I am using on my local machine is 0.13.1, where it is working, and on the other machine it is 0.13.0, where I see the above-mentioned error. I am not sure if this was a bug which was fixed in a later release, but I just wanted to confirm the same.
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11477) CBO inserts a UDF cast for integer type promotion (only for negative numbers)
[ https://issues.apache.org/jira/browse/HIVE-11477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680472#comment-14680472 ] Pengcheng Xiong commented on HIVE-11477: May need more work. For example, in input_part6.q, we have {code}SELECT x.* FROM SRCPART x WHERE x.ds = 2008-04-08 LIMIT 10{code} and in union_remove_6_subq, we have {code} explain select avg(c) from( SELECT count(1)-200 as c from src UNION ALL SELECT count(1) as c from src )subq {code} CBO inserts a UDF cast for integer type promotion (only for negative numbers) - Key: HIVE-11477 URL: https://issues.apache.org/jira/browse/HIVE-11477 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-11477.01.patch, HIVE-11477.02.patch When CBO is enabled, filters which compares tinyint, smallint columns with constant integer types will insert a UDFToInteger cast for the columns. When CBO is disabled, there is no such UDF. This behaviour breaks ORC predicate pushdown feature as ORC ignores UDFs in the filters. 
In the following examples, column t is tinyint. {code:title=Explain for select count(*) from orc_ppd where t > -127; (CBO OFF)} Filter Operator [FIL_9] predicate:(t = 125) (type: boolean) Statistics:Num rows: 1050 Data size: 611757 Basic stats: COMPLETE Column stats: NONE TableScan [TS_0] alias:orc_ppd Statistics:Num rows: 2100 Data size: 1223514 Basic stats: COMPLETE Column stats: NONE {code} {code:title=Explain for select count(*) from orc_ppd where t > -127; (CBO ON)} Filter Operator [FIL_10] predicate:(UDFToInteger(t) > -127) (type: boolean) Statistics:Num rows: 700 Data size: 407838 Basic stats: COMPLETE Column stats: NONE TableScan [TS_0] alias:orc_ppd Statistics:Num rows: 2100 Data size: 1223514 Basic stats: COMPLETE Column stats: NONE {code} CBO does not insert such a cast for non-negative numbers {code:title=Explain for select count(*) from orc_ppd where t > 127; (CBO ON)} Filter Operator [FIL_10] predicate:(t > 127) (type: boolean) Statistics:Num rows: 700 Data size: 407838 Basic stats: COMPLETE Column stats: NONE TableScan [TS_0] alias:orc_ppd Statistics:Num rows: 2100 Data size: 1223514 Basic stats: COMPLETE Column stats: NONE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8282) Potential null dereference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
[ https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8282: - Description: In convertJoinMapJoin(): {code} for (Operator<? extends OperatorDesc> parentOp : joinOp.getParentOperators()) { if (parentOp instanceof MuxOperator) { return null; } } {code} NPE would result if convertJoinMapJoin() returns null: {code} MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition); MapJoinDesc joinDesc = mapJoinOp.getConf(); {code} was: In convertJoinMapJoin(): {code} for (Operator<? extends OperatorDesc> parentOp : joinOp.getParentOperators()) { if (parentOp instanceof MuxOperator) { return null; } } {code} NPE would result if convertJoinMapJoin() returns null: {code} MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition); MapJoinDesc joinDesc = mapJoinOp.getConf(); {code} Potential null dereference in ConvertJoinMapJoin#convertJoinBucketMapJoin() - Key: HIVE-8282 URL: https://issues.apache.org/jira/browse/HIVE-8282 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Ted Yu Priority: Minor Attachments: HIVE-8282.patch In convertJoinMapJoin(): {code} for (Operator<? extends OperatorDesc> parentOp : joinOp.getParentOperators()) { if (parentOp instanceof MuxOperator) { return null; } } {code} NPE would result if convertJoinMapJoin() returns null: {code} MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition); MapJoinDesc joinDesc = mapJoinOp.getConf(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
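The pattern the report asks for can be illustrated with a minimal, self-contained sketch — this is not Hive code; the names and the `List<String>`-based stand-in for the operator tree are hypothetical, chosen only to show the null-check guard that is missing at the call site:

```java
import java.util.List;

public class NullGuardSketch {
    // Stands in for convertJoinMapJoin(): returns null when a parent
    // operator disqualifies the conversion.
    static String convert(List<String> parents) {
        for (String p : parents) {
            if (p.equals("MuxOperator")) {
                return null; // conversion not possible
            }
        }
        return "MapJoinOperator";
    }

    public static void main(String[] args) {
        String op = convert(List.of("MuxOperator"));
        // Guard before dereferencing -- without this check,
        // op.length() below would throw a NullPointerException.
        if (op == null) {
            System.out.println("conversion skipped");
            return;
        }
        System.out.println(op.length());
    }
}
```

The attached patch presumably adds an equivalent null check after the `convertJoinMapJoin(...)` call before `mapJoinOp.getConf()` is invoked.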
[jira] [Updated] (HIVE-8342) Potential null dereference in ColumnTruncateMapper#jobClose()
[ https://issues.apache.org/jira/browse/HIVE-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8342: - Description: {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} was: {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} Potential null dereference in ColumnTruncateMapper#jobClose() - Key: HIVE-8342 URL: https://issues.apache.org/jira/browse/HIVE-8342 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-8342_001.patch, HIVE-8342_002.patch {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM
[ https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680435#comment-14680435 ] Wei Zheng commented on HIVE-11467: -- [~sershe] The test failures are due to a customized wbsize setting (not a power of 2), and MapJoinBytesTableContainer didn't have this enforcement. Since WriteBuffers has a number of consumers, such as MapJoinBytesTableContainer, HybridHashTableContainer, VectorMapJoinFastKeyStore and VectorMapJoinFastValueStore, I would say we'd better still keep the rounding logic in the WriteBuffers cstr. What do you think? Hybrid is the only exception, in that it does the rounding by itself. WriteBuffers rounding wbSize to next power of 2 may cause OOM - Key: HIVE-11467 URL: https://issues.apache.org/jira/browse/HIVE-11467 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, HIVE-11467.03.patch If the wbSize passed to the WriteBuffers cstr is not a power of 2, it will first be rounded up to the next power of 2 {code} public WriteBuffers(int wbSize, long maxSize) { this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : (Integer.highestOneBit(wbSize) << 1); this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize); this.offsetMask = this.wbSize - 1; this.maxSize = maxSize; writePos.bufferIndex = -1; nextBufferToWrite(); } {code} That may break the existing memory consumption assumption for mapjoin, and potentially cause OOM. The solution will be to pass a power-of-2 number as wbSize from upstream during hashtable creation, to avoid this late expansion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
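The rounding in the quoted constructor can be reproduced as a standalone pure-JDK sketch (the class and method names here are invented for illustration): a non-power-of-2 wbSize is rounded up to the next power of 2, so a caller-computed memory budget can overshoot by almost 2x, which is the OOM risk the issue describes.

```java
public class WbSizeRounding {
    // Same expression as the WriteBuffers constructor quoted above,
    // with the stripped shift operator restored.
    static int roundToPowerOfTwo(int wbSize) {
        return Integer.bitCount(wbSize) == 1
                ? wbSize                               // already a power of 2
                : (Integer.highestOneBit(wbSize) << 1); // round UP to the next one
    }

    public static void main(String[] args) {
        System.out.println(roundToPowerOfTwo(1 << 20));       // unchanged: 1048576
        // One byte over a power of 2 doubles the allocation:
        System.out.println(roundToPowerOfTwo((1 << 20) + 1)); // 2097152
    }
}
```

This is why the fix passes an already-rounded wbSize from the hashtable creation path: the near-doubling then happens where the memory budget is computed, not afterwards.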
[jira] [Commented] (HIVE-6892) Permission inheritance issues
[ https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680496#comment-14680496 ] Szehon Ho commented on HIVE-6892: - The second point might be a valid change, to mimic the HDFS way instead of cloning the extended ACLs. I don't have the bandwidth to make the change at the moment; someone else can feel free to take a stab (looks like HIVE-11481). It would be more complex: we would have to traverse the tree and essentially copy the HDFS logic for extended ACLs for the 'default' group. I have not investigated enough to comment on the first point. Permission inheritance issues - Key: HIVE-6892 URL: https://issues.apache.org/jira/browse/HIVE-6892 Project: Hive Issue Type: Bug Components: Security Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho *HDFS Background* * When a file or directory is created, its owner is the user identity of the client process, and its group is inherited from the parent (the BSD rule). Permissions are taken from the default umask. Extended ACLs are taken from the parent unless they are set explicitly. *Goals* To reduce the need to set fine-grain file security props after every operation, users may want the following Hive warehouse files/dirs to auto-inherit security properties from their directory parents: * Directories created by new database/table/partition/bucket * Files added to tables via load/insert * Table directories exported/imported (open question of whether an exported table inheriting perms from its new parent needs another flag) What may be inherited: * Basic file permission * Groups (already done by HDFS for new directories) * Extended ACLs (already done by HDFS for new directories) *Behavior* * When the hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive will try to do all the above inheritances. In the future, we can add more flags for finer-grained control. * Failure by Hive to inherit will not cause the operation to fail. 
The rule of thumb for when security-prop inheritance will happen is the following: ** To run chmod, a user must be the owner of the file, or else a super-user. ** To run chgrp, a user must be the owner of the files, or else a super-user. ** Hence, the user that Hive runs as (either 'hive' or the logged-in user in case of impersonation) must be a super-user or the owner of the file whose security properties are going to be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11340) Create ORC based table using like clause doesn't copy compression property
[ https://issues.apache.org/jira/browse/HIVE-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680308#comment-14680308 ] Chao Sun commented on HIVE-11340: - +1 Create ORC based table using like clause doesn't copy compression property -- Key: HIVE-11340 URL: https://issues.apache.org/jira/browse/HIVE-11340 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Gaurav Kohli Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-11340.1.patch, HIVE-11340.2.patch I found an issue in the “create table like” clause: it is not copying the table properties from an ORC file format based table. Steps to reproduce: Step 1: {code} create table orc_table ( time string) stored as ORC tblproperties ('orc.compress'='SNAPPY'); {code} Step 2: {code} create table orc_table_using_like like orc_table; {code} Step 3: {code} show create table orc_table_using_like; {code} Result: {code} createtab_stmt CREATE TABLE `orc_table_using_like`( `time` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://nameservice1/user/hive/warehouse/gkohli.db/orc_table_using_like' TBLPROPERTIES ( 'transient_lastDdlTime'='1437578939') {code} Issue: the 'orc.compress'='SNAPPY' property is missing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
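The intent of a fix like this can be sketched in plain Java — this is not the actual patch; the class name, method name, and skip-list are hypothetical. When cloning a table definition for CREATE TABLE LIKE, per-instance properties such as `transient_lastDdlTime` must be dropped while format settings such as `orc.compress` are carried over:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class CopyTblProps {
    // Properties that describe one concrete table instance, not its format.
    static final Set<String> SKIP = Set.of("transient_lastDdlTime");

    // Copy the source table's properties for a LIKE clone, minus SKIP keys.
    static Map<String, String> copyForLike(Map<String, String> src) {
        Map<String, String> dst = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : src.entrySet()) {
            if (!SKIP.contains(e.getKey())) {
                dst.put(e.getKey(), e.getValue());
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        Map<String, String> src = Map.of(
                "orc.compress", "SNAPPY",
                "transient_lastDdlTime", "1437578939");
        // 'orc.compress' survives; the DDL timestamp does not.
        System.out.println(copyForLike(src));
    }
}
```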
[jira] [Updated] (HIVE-11171) Join reordering algorithm might introduce projects between joins
[ https://issues.apache.org/jira/browse/HIVE-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11171: - Attachment: HIVE-11171.branch-1.patch Some spark qtest changes I was able to regenerate on branch-1 (it matches with the origin master patch). Rest of the tests I was not able to repro. Possibly from different jira. Join reordering algorithm might introduce projects between joins Key: HIVE-11171 URL: https://issues.apache.org/jira/browse/HIVE-11171 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.3.0, 2.0.0, 1.2.2 Attachments: HIVE-11171.01.patch, HIVE-11171.02.patch, HIVE-11171.03.patch, HIVE-11171.5.patch, HIVE-11171.branch-1.patch, HIVE-11171.branch-1.patch, HIVE-11171.patch, HIVE-11171.patch Join reordering algorithm might introduce projects between joins which causes multijoin optimization in SemanticAnalyzer to not kick in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow
[ https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680637#comment-14680637 ] Gopal V commented on HIVE-11502: [~ychena]: I've linked the issue to the known issue in HADOOP-12217. Is it possible that you're testing Hive against different versions of Hadoop between 0.13 and 1.2? Map side aggregation is extremely slow -- Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen For a query like the following: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the group-by column has many different values (for example 40) and is of type double, the map side aggregation is very slow. I ran the query, and after more than 3 hours I had to kill it. The same query can finish in 7 seconds if I turn off map side aggregation with: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow
[ https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680740#comment-14680740 ] Gopal V commented on HIVE-11502: A custom hashcode can be used internally to Hive (i.e. group-by etc.), but not externally to Hive (bucketing into HDFS, results of hash() functions), because that would break external assumptions in a non-backwards-compatible way. The reason shuffle + merge is more uniform is that it starts using [murmur hashes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L366] for UNIFORM trait RS instead of the builtin writable hash funcs (which are skewed). You will probably notice that using a vectorized input format like ORC would not have the issue you're hitting, since the vector transform inside the operator pipeline gives Hive the opportunity to use per-operator specific optimizations. Map side aggregation is extremely slow -- Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen For a query like the following: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the group-by column has many different values (for example 40) and is of type double, the map side aggregation is very slow. I ran the query, and after more than 3 hours I had to kill it. The same query can finish in 7 seconds if I turn off map side aggregation with: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
[ https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680749#comment-14680749 ] Hive QA commented on HIVE-11504: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12749615/HIVE-11504.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9348 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.io.parquet.read.TestParquetFilterPredicate.testFilterColumnsThatDoNoExistOnSchema org.apache.hadoop.hive.ql.io.parquet.read.TestParquetFilterPredicate.testFilterFloatColumns {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4908/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4908/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4908/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12749615 - PreCommit-HIVE-TRUNK-Build Predicate pushing down doesn't work for float type for Parquet -- Key: HIVE-11504 URL: https://issues.apache.org/jira/browse/HIVE-11504 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11504.1.patch, HIVE-11504.patch Predicate builder should use PrimitiveTypeName type in parquet side to construct predicate leaf instead of the type provided by PredicateLeaf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11511) Output the message of orcfiledump when ORC files are not specified
[ https://issues.apache.org/jira/browse/HIVE-11511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680606#comment-14680606 ] Hive QA commented on HIVE-11511: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12749598/HIVE-11511.1.patch {color:green}SUCCESS:{color} +1 9347 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4907/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4907/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4907/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12749598 - PreCommit-HIVE-TRUNK-Build Output the message of orcfiledump when ORC files are not specified -- Key: HIVE-11511 URL: https://issues.apache.org/jira/browse/HIVE-11511 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Labels: orcfile Attachments: HIVE-11511.1.patch When I execute the orcfiledump command without specifying an ORC file, no message is output and the return value is 0. {code} [root@hive hive]# /usr/local/hive/bin/hive --orcfiledump [root@hive hive]# echo $? 0 {code} I will modify this behavior to output an error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression
[ https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11405: - Attachment: HIVE-11405-branch-1.patch Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression -- Key: HIVE-11405 URL: https://issues.apache.org/jira/browse/HIVE-11405 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Prasanth Jayachandran Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11405-branch-1.patch, HIVE-11405.1.patch, HIVE-11405.2.patch, HIVE-11405.2.patch, HIVE-11405.2.patch, HIVE-11405.2.patch, HIVE-11405.patch Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330. Quoting him, The recursion protection works well with an AND expr, but it doesn't work against (OR a=1 (OR a=2 (OR a=3 (OR ...) since the rows will never be reduced during recursion due to the nature of the OR. We need to execute a short-circuit to satisfy the OR properly - no case which matches a=1 qualifies for the rest of the filters. Recursion should pass in numRows - branch1Rows for branch 2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
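The short-circuit described above can be sketched in pure Java (hypothetical class, numbers, and selectivities — not the actual StatsRulesProcFactory code): each later OR branch is estimated against numRows minus the rows already matched by earlier branches, so the input shrinks at every recursion step instead of staying constant.

```java
public class OrStatsShortCircuit {
    // Each branch matches a fixed fraction (selectivity) of the rows it is
    // given. With the short-circuit, the remaining-row count shrinks per
    // branch, so the total estimate can never exceed numRows.
    static long estimateOr(long numRows, double[] branchSelectivities) {
        long matched = 0;
        long remaining = numRows;
        for (double sel : branchSelectivities) {
            long branchRows = (long) (remaining * sel);
            matched += branchRows;
            remaining -= branchRows; // the short-circuit: shrink the input
        }
        return matched;
    }

    public static void main(String[] args) {
        double[] sel = {0.5, 0.5, 0.5};
        // 500 + 250 + 125 = 875: bounded by numRows, unlike the naive
        // scheme that would sum numRows * sel for every branch (1500).
        System.out.println(estimateOr(1000, sel));
    }
}
```

Without the subtraction, a deep `(OR a=1 (OR a=2 ...))` chain keeps passing the full row count downwards, which is exactly the non-terminating recursion the quote describes.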
[jira] [Commented] (HIVE-11171) Join reordering algorithm might introduce projects between joins
[ https://issues.apache.org/jira/browse/HIVE-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680643#comment-14680643 ] Prasanth Jayachandran commented on HIVE-11171: -- [~jcamachorodriguez] I reverted the patch and reapplied the new branch-1 patch, which contains some Spark test diffs. Join reordering algorithm might introduce projects between joins Key: HIVE-11171 URL: https://issues.apache.org/jira/browse/HIVE-11171 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.3.0, 2.0.0, 1.2.2 Attachments: HIVE-11171.01.patch, HIVE-11171.02.patch, HIVE-11171.03.patch, HIVE-11171.5.patch, HIVE-11171.branch-1.patch, HIVE-11171.branch-1.patch, HIVE-11171.patch, HIVE-11171.patch Join reordering algorithm might introduce projects between joins which causes multijoin optimization in SemanticAnalyzer to not kick in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11295) LLAP: clean up ORC dependencies on object pools
[ https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680771#comment-14680771 ] Prasanth Jayachandran commented on HIVE-11295: -- +1 LLAP: clean up ORC dependencies on object pools --- Key: HIVE-11295 URL: https://issues.apache.org/jira/browse/HIVE-11295 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11295.01.patch, HIVE-11295.02.patch, HIVE-11295.patch Before there's storage API module, we can clean some things up NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11438) Join an ACID table with a non-ACID table fails with MR on 1.0.0
[ https://issues.apache.org/jira/browse/HIVE-11438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-11438: -- Attachment: test.log I cannot get a precommit test run against branch-1.0. Running the tests locally, I see 52 test failures on branch-1.0 even without my patch. With the patch, I get the same result, so those failures are not related. Attaching the test log. Patch committed to the 1.0 branch. Join an ACID table with a non-ACID table fails with MR on 1.0.0 --- Key: HIVE-11438 URL: https://issues.apache.org/jira/browse/HIVE-11438 Project: Hive Issue Type: Bug Components: Query Processor, Transactions Affects Versions: 1.0.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 1.0.1 Attachments: HIVE-11438.1-branch-1.0.patch, HIVE-11438.1.patch, HIVE-11438.2-branch-1.0.patch, test.log The following script fails in MR mode: Preparation: {code} CREATE TABLE orc_update_table (k1 INT, f1 STRING, op_code STRING) CLUSTERED BY (k1) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true'); INSERT INTO TABLE orc_update_table VALUES (1, 'a', 'I'); CREATE TABLE orc_table (k1 INT, f1 STRING) CLUSTERED BY (k1) SORTED BY (k1) INTO 2 BUCKETS STORED AS ORC; INSERT OVERWRITE TABLE orc_table VALUES (1, 'x'); {code} Then run the following script: {code} SET hive.execution.engine=mr; SET hive.auto.convert.join=false; SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; SELECT t1.*, t2.* FROM orc_table t1 JOIN orc_update_table t2 ON t1.k1=t2.k1 ORDER BY t1.k1; {code} Stack: {code} java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:272) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:509) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624) at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:585) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:580) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:580) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:571) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1606) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1367) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1179) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1006) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:996) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677) at 
org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Job Submission failed with exception
[jira] [Commented] (HIVE-11505) Disabling llap cache allocate direct is not honored anymore
[ https://issues.apache.org/jira/browse/HIVE-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680777#comment-14680777 ] Prasanth Jayachandran commented on HIVE-11505: -- I ran llap locally with hive.llap.io.cache.direct set to false. Disabling llap cache allocate direct is not honored anymore --- Key: HIVE-11505 URL: https://issues.apache.org/jira/browse/HIVE-11505 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Sergey Shelukhin ORC refactorings probably broke something. I disabled cache direct allocation but still I am getting this exception {code} Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.compress.zlib.ZlibDecompressor.$$YJP$$init(I)J at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.$$YJP$$init(Native Method) at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(ZlibDecompressor.java) at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(ZlibDecompressor.java:115) at org.apache.hadoop.io.compress.zlib.ZlibDecompressor$ZlibDirectDecompressor.init(ZlibDecompressor.java:358) at org.apache.hadoop.hive.shims.ZeroCopyShims.getDirectDecompressor(ZeroCopyShims.java:114) at org.apache.hadoop.hive.shims.Hadoop23Shims.getDirectDecompressor(Hadoop23Shims.java:975) at org.apache.hadoop.hive.ql.io.orc.ZlibCodec.directDecompress(ZlibCodec.java:128) at org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:84) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.decompressChunk(EncodedReaderImpl.java:1128) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:780) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:467) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:355) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:70) at 
org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11511) Output the message of orcfiledump when ORC files are not specified
[ https://issues.apache.org/jira/browse/HIVE-11511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680840#comment-14680840 ] Alan Gates commented on HIVE-11511: --- In general this looks good. We should avoid the System.exit call and use return instead. We keep the System.exits out in case we're called by another tool. I can just change that in the patch when I commit it. Output the message of orcfiledump when ORC files are not specified -- Key: HIVE-11511 URL: https://issues.apache.org/jira/browse/HIVE-11511 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Labels: orcfile Attachments: HIVE-11511.1.patch When I execute the orcfiledump command without specifying an ORC file, no message is output and the return value is 0. {code} [root@hive hive]# /usr/local/hive/bin/hive --orcfiledump [root@hive hive]# echo $? 0 {code} I will modify this behavior to output an error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow
[ https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680783#comment-14680783 ] Yongzhi Chen commented on HIVE-11502: - [~gopalv], I have confirmed that HIVE-7041 caused the regression. The Hadoop bug has been there for a long time; after Hive switched to using Hadoop's hashcode, we inherited Hadoop's bug. Thanks for finding the root cause by pointing to the Hadoop bug. After I added the following code in serde/src/java/org/apache/hadoop/hive/serde2/io/DoubleWritable.java {noformat} @Override public int hashCode() { long v = Double.doubleToLongBits(super.get()); return (int) (v ^ (v >>> 32)); } {noformat} the group by query can finish in 15 seconds. So the next step is: how do we fix the issue now? Map side aggregation is extremely slow -- Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen For the following query: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the group by column has many different values (for example 40) and is of type double, the map side aggregation is very slow. I ran the query, which took more than 3 hours; after 3 hours, I had to kill the query. The same query can finish in 7 seconds if I turn off map side aggregation with: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
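To see why the truncating hashCode hurts so badly (assuming the pre-fix behavior simply cast the long bit pattern to int, which is what the switch in HIVE-7041 picked up), note that every whole-number double below 2^21 has all-zero low 32 bits, so a cast-only hash maps all such keys to 0:

```java
import java.util.HashSet;
import java.util.Set;

// Collision demo. badHash models a cast-only hashCode (assumed shape of the
// pre-fix behavior); goodHash is the XOR-fold from the comment above.
class DoubleHashDemo {
    static int badHash(double d) {
        return (int) Double.doubleToLongBits(d);          // keeps low 32 bits only
    }

    static int goodHash(double d) {
        long v = Double.doubleToLongBits(d);
        return (int) (v ^ (v >>> 32));                    // folds in the high bits
    }

    public static void main(String[] args) {
        Set<Integer> bad = new HashSet<>();
        Set<Integer> good = new HashSet<>();
        for (int i = 1; i <= 100_000; i++) {
            bad.add(badHash(i));    // whole numbers < 2^21: low 32 bits are zero
            good.add(goodHash(i));
        }
        // badHash collapses all 100,000 keys into one bucket (size 1);
        // goodHash keeps them all distinct.
        System.out.println("distinct bad hashes:  " + bad.size());
        System.out.println("distinct good hashes: " + good.size());
    }
}
```

A map-side hash aggregation fed through badHash degenerates into a linked-list scan per insert, which matches the hours-vs-seconds gap reported here.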
[jira] [Updated] (HIVE-11295) LLAP: clean up ORC dependencies on object pools
[ https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11295: Attachment: HIVE-11295.02.patch Fix a small bug and some bad renames (pascal-case variables due to bulk replace) LLAP: clean up ORC dependencies on object pools --- Key: HIVE-11295 URL: https://issues.apache.org/jira/browse/HIVE-11295 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11295.01.patch, HIVE-11295.02.patch, HIVE-11295.patch Before there's storage API module, we can clean some things up NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11505) Disabling llap cache allocate direct is not honored anymore
[ https://issues.apache.org/jira/browse/HIVE-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680757#comment-14680757 ] Sergey Shelukhin commented on HIVE-11505: - I cannot repro this, the test (that relies on non-direct alloc) passes. Where are you getting this error and how do you disable direct allocation? Disabling llap cache allocate direct is not honored anymore --- Key: HIVE-11505 URL: https://issues.apache.org/jira/browse/HIVE-11505 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Sergey Shelukhin ORC refactorings probably broke something. I disabled cache direct allocation but still I am getting this exception {code} Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.compress.zlib.ZlibDecompressor.$$YJP$$init(I)J at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.$$YJP$$init(Native Method) at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(ZlibDecompressor.java) at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(ZlibDecompressor.java:115) at org.apache.hadoop.io.compress.zlib.ZlibDecompressor$ZlibDirectDecompressor.init(ZlibDecompressor.java:358) at org.apache.hadoop.hive.shims.ZeroCopyShims.getDirectDecompressor(ZeroCopyShims.java:114) at org.apache.hadoop.hive.shims.Hadoop23Shims.getDirectDecompressor(Hadoop23Shims.java:975) at org.apache.hadoop.hive.ql.io.orc.ZlibCodec.directDecompress(ZlibCodec.java:128) at org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:84) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.decompressChunk(EncodedReaderImpl.java:1128) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:780) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:467) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:355) at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:70) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11457) Vectorization: Improve SIMD JIT in GenVectorCode StringExpr instrinsics
[ https://issues.apache.org/jira/browse/HIVE-11457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11457: --- Fix Version/s: 1.3.0 Vectorization: Improve SIMD JIT in GenVectorCode StringExpr instrinsics Key: HIVE-11457 URL: https://issues.apache.org/jira/browse/HIVE-11457 Project: Hive Issue Type: Improvement Components: Vectorization Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11457.1.patch, HIVE-11457.1.patch, string-intrinsic-sse.png With HIVE-11406, the Vectorization codegen generates a new and specialized fast-path for equality (and non equality), which removed the ordering and comparison constraints in the old codepath. The equality operation can be much more pipeline and cache line efficient by keeping on comparing even when an inequality has been detected. Optimize the single loop into a pair of loops, to allow the Vectorization codegen to use tighter loops that the JIT superword optimization can understand. !string-intrinsic-sse.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
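The "keep comparing past the first mismatch" idea can be sketched as follows; this is an illustrative reconstruction, not the actual GenVectorCode output:

```java
// Two shapes of a byte-range equality check. The early-exit version carries a
// data-dependent branch per iteration, which blocks the JIT's superword (SIMD)
// transformation; the branch-free version accumulates differences and tests
// once at the end, so the loop body can vectorize cleanly.
class StringEqualsSketch {
    static boolean equalEarlyExit(byte[] a, int aOff, byte[] b, int bOff, int len) {
        for (int i = 0; i < len; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;                  // branch per element
            }
        }
        return true;
    }

    static boolean equalBranchFree(byte[] a, int aOff, byte[] b, int bOff, int len) {
        int diff = 0;
        for (int i = 0; i < len; i++) {
            diff |= a[aOff + i] ^ b[bOff + i]; // keep comparing, no branch
        }
        return diff == 0;
    }
}
```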
[jira] [Commented] (HIVE-10289) Support filter on non-first partition key and non-string partition key
[ https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680867#comment-14680867 ] Sergey Shelukhin commented on HIVE-10289: - Hi. I am not sure what this patch is using (it's too big and no RB or description ;)), but HBase has a built-in serialization helper for sorted, multi-type keys called OrderedBytes: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html and HBASE-8201 Support filter on non-first partition key and non-string partition key -- Key: HIVE-10289 URL: https://issues.apache.org/jira/browse/HIVE-10289 Project: Hive Issue Type: Sub-task Components: HBase Metastore, Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-10289.1.patch Currently, partition filtering only handles the first partition key, and the type of this partition key must be string. In order to break this limitation, several improvements are required: 1. Change the serialization format for partition keys. Currently partition keys are serialized into a delimited string, which is sorted in string order, not according to the actual type of the partition key. We use BinarySortableSerDe for this purpose. 2. For filter conditions not on the initial partition keys, push them into an HBase RowFilter. The RowFilter will deserialize the partition key and evaluate the filter condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
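The sort-order problem in point 1 — and the property both BinarySortableSerDe and OrderedBytes provide — is easy to demonstrate for ints. The helper below is an illustrative sketch of the sign-flip technique, not either library's actual wire format:

```java
// Order-preserving encoding sketch: flip the sign bit so that unsigned
// byte-wise comparison of the encoding matches signed numeric order.
class OrderedKeySketch {
    static byte[] encodeInt(int v) {
        int u = v ^ Integer.MIN_VALUE;   // negatives now sort below positives
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
        };
    }

    // Unsigned lexicographic comparison, i.e., what an HBase row key scan does.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < a.length && i < b.length; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Plain string keys fail this property — "10" sorts before "9" — so a range filter over a numeric partition key serialized as a string scans the wrong rows.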
[jira] [Updated] (HIVE-9024) NullPointerException when starting webhcat server if templeton.hive.properties is not set
[ https://issues.apache.org/jira/browse/HIVE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9024: - Fix Version/s: 1.0.2 NullPointerException when starting webhcat server if templeton.hive.properties is not set - Key: HIVE-9024 URL: https://issues.apache.org/jira/browse/HIVE-9024 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang Fix For: 1.1.0, 1.0.2 Attachments: HIVE-9024.patch If templeton.hive.properties is not set, when starting webhcat server, the following NullPointerException is thrown and webhcat server could not start: {noformat} Exception in thread main java.lang.NullPointerException at org.apache.hive.hcatalog.templeton.AppConfig.hiveProps(AppConfig.java:318) at org.apache.hive.hcatalog.templeton.AppConfig.handleHiveProperties(AppConfig.java:194) at org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:175) at org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:155) at org.apache.hive.hcatalog.templeton.Main.loadConfig(Main.java:96) at org.apache.hive.hcatalog.templeton.Main.init(Main.java:80) at org.apache.hive.hcatalog.templeton.Main.init(Main.java:75) at org.apache.hive.hcatalog.templeton.Main.main(Main.java:267) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:197) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10289) Support filter on non-first partition key and non-string partition key
[ https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680867#comment-14680867 ] Sergey Shelukhin edited comment on HIVE-10289 at 8/10/15 10:18 PM: --- Hi. I am not sure what this patch is using (it's too big and description ;)), but HBase has a built-in serialization helper for sorted, multi-type keys called OrderedBytes: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html and HBASE-8201 I think we should use that was (Author: sershe): Hi. I am not sure what this patch is using (it's too big and no RB or description ;)), but HBase has a built-in serialization helper for sorted, multi-type keys called OrderedBytes: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html and HBASE-8201 I think we should use that Support filter on non-first partition key and non-string partition key -- Key: HIVE-10289 URL: https://issues.apache.org/jira/browse/HIVE-10289 Project: Hive Issue Type: Sub-task Components: HBase Metastore, Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-10289.1.patch Currently, partition filtering only handles the first partition key and the type for this partition key must be string. In order to break this limitation, several improvements are required: 1. Change serialization format for partition key. Currently partition keys are serialized into delimited string, which sorted on string order not with regard to the actual type of the partition key. We use BinarySortableSerDe for this purpose. 2. For filter condition not on the initial partition keys, push it into HBase RowFilter. RowFilter will deserialize the partition key and evaluate the filter condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9024) NullPointerException when starting webhcat server if templeton.hive.properties is not set
[ https://issues.apache.org/jira/browse/HIVE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680869#comment-14680869 ] Jason Dere commented on HIVE-9024: -- Added this fix to branch-1.0 NullPointerException when starting webhcat server if templeton.hive.properties is not set - Key: HIVE-9024 URL: https://issues.apache.org/jira/browse/HIVE-9024 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang Fix For: 1.1.0, 1.0.2 Attachments: HIVE-9024.patch If templeton.hive.properties is not set, when starting webhcat server, the following NullPointerException is thrown and webhcat server could not start: {noformat} Exception in thread main java.lang.NullPointerException at org.apache.hive.hcatalog.templeton.AppConfig.hiveProps(AppConfig.java:318) at org.apache.hive.hcatalog.templeton.AppConfig.handleHiveProperties(AppConfig.java:194) at org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:175) at org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:155) at org.apache.hive.hcatalog.templeton.Main.loadConfig(Main.java:96) at org.apache.hive.hcatalog.templeton.Main.init(Main.java:80) at org.apache.hive.hcatalog.templeton.Main.init(Main.java:75) at org.apache.hive.hcatalog.templeton.Main.main(Main.java:267) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:197) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10289) Support filter on non-first partition key and non-string partition key
[ https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680867#comment-14680867 ] Sergey Shelukhin edited comment on HIVE-10289 at 8/10/15 10:18 PM: --- Hi. I am not sure what this patch is using (it's too big and no RB or description ;)), but HBase has a built-in serialization helper for sorted, multi-type keys called OrderedBytes: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html and HBASE-8201 I think we should use that was (Author: sershe): Hi. I am not sure what this patch is using (it's too big and no RB or description ;)), but HBase has a built-in serialization helper for sorted, multi-type keys called OrderedBytes: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html and HBASE-8201 Support filter on non-first partition key and non-string partition key -- Key: HIVE-10289 URL: https://issues.apache.org/jira/browse/HIVE-10289 Project: Hive Issue Type: Sub-task Components: HBase Metastore, Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-10289.1.patch Currently, partition filtering only handles the first partition key and the type for this partition key must be string. In order to break this limitation, several improvements are required: 1. Change serialization format for partition key. Currently partition keys are serialized into delimited string, which sorted on string order not with regard to the actual type of the partition key. We use BinarySortableSerDe for this purpose. 2. For filter condition not on the initial partition keys, push it into HBase RowFilter. RowFilter will deserialize the partition key and evaluate the filter condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow
[ https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680870#comment-14680870 ] Gopal V commented on HIVE-11502: bq. So the next step is: how do we fix the issue now? Easiest would be to use vectorization, which doesn't need any Writables in the inner loop. The vector hashcode for doubles would automatically be very similar to your impl (from Arrays.hashCode(double[])) {code} for (double element : a) { long bits = Double.doubleToLongBits(element); result = 31 * result + (int) (bits ^ (bits >>> 32)); } return result; {code} Map side aggregation is extremely slow -- Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen For the following query: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the group by column has many different values (for example 40) and is of type double, the map side aggregation is very slow. I ran the query, which took more than 3 hours; after 3 hours, I had to kill the query. The same query can finish in 7 seconds if I turn off map side aggregation with: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
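The loop Gopal quotes (with the `>>> 32` shift restored) is exactly the documented algorithm of java.util.Arrays.hashCode(double[]) once result starts at 1, which makes it easy to sanity-check against the JDK:

```java
import java.util.Arrays;

// Reimplementation of the Arrays.hashCode(double[]) loop quoted above,
// verified against the JDK's own result.
class ArraysHashSketch {
    static int hashDoubles(double[] a) {
        int result = 1;
        for (double element : a) {
            long bits = Double.doubleToLongBits(element);
            result = 31 * result + (int) (bits ^ (bits >>> 32));
        }
        return result;
    }

    public static void main(String[] args) {
        double[] sample = {1.5, -2.25, 3.0, 0.0};
        System.out.println(hashDoubles(sample) == Arrays.hashCode(sample)); // prints: true
    }
}
```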
[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees
[ https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680884#comment-14680884 ] Hive QA commented on HIVE-11398: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12749637/HIVE-11398.5.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9305 tests executed *Failed tests:* {noformat} TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4909/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4909/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4909/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12749637 - PreCommit-HIVE-TRUNK-Build Parse wide OR and wide AND trees to flat OR/AND trees - Key: HIVE-11398 URL: https://issues.apache.org/jira/browse/HIVE-11398 Project: Hive Issue Type: New Feature Components: Logical Optimizer, UDF Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, HIVE-11398.4.patch, HIVE-11398.5.patch, HIVE-11398.patch Deep trees of AND/OR are hard to traverse particularly when they are merely the same structure in nested form as a version of the operator that takes an arbitrary number of args. 
One potential way to convert the DFS searches into a simpler BFS search is to introduce a new Operator pair named ALL and ANY. ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A) ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C),B),A) The SemanticAnalyser would be responsible for generating these operators and this would mean that the depth and complexity of traversals for the simplest case of wide AND/OR trees would be trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
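A minimal sketch of the proposed flattening — Node and Flatten are illustrative stand-ins, not Hive's actual ExprNodeDesc API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative expression node: op is "AND", "ALL", or a leaf label.
class Node {
    final String op;
    final List<Node> children = new ArrayList<>();
    Node(String op) { this.op = op; }
}

class Flatten {
    // Collapse a nested binary AND tree into one n-ary ALL node, so later
    // traversals visit one wide node instead of recursing O(depth) levels.
    static Node toAll(Node n) {
        if (!"AND".equals(n.op)) {
            return n;
        }
        Node all = new Node("ALL");
        collect(n, all.children);
        return all;
    }

    static void collect(Node n, List<Node> out) {
        if ("AND".equals(n.op)) {
            for (Node child : n.children) {
                collect(child, out);     // conjuncts of a conjunct are conjuncts
            }
        } else {
            out.add(n);
        }
    }
}
```

Since AND is associative and commutative, child order inside ALL does not change the predicate's meaning; the same shape with OR yields ANY.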
[jira] [Updated] (HIVE-8680) Set Max Message for Binary Thrift endpoints
[ https://issues.apache.org/jira/browse/HIVE-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-8680: - Fix Version/s: 1.0.2 Set Max Message for Binary Thrift endpoints --- Key: HIVE-8680 URL: https://issues.apache.org/jira/browse/HIVE-8680 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Labels: TODOC15 Fix For: 1.1.0, 1.0.2 Attachments: HIVE-8680.patch, HIVE-8680.patch Thrift has a configuration option to restrict incoming message size. If we configure this, we'll stop OOM'ing when someone sends us an HTTP request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7997) Potential null pointer reference in ObjectInspectorUtils#compareTypes()
[ https://issues.apache.org/jira/browse/HIVE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680919#comment-14680919 ] Jason Dere commented on HIVE-7997: -- Adding fix to branch-1.0 Potential null pointer reference in ObjectInspectorUtils#compareTypes() --- Key: HIVE-7997 URL: https://issues.apache.org/jira/browse/HIVE-7997 Project: Hive Issue Type: Bug Components: Types Reporter: Ted Yu Assignee: Navis Fix For: 1.1.0, 1.0.2 Attachments: HIVE-7997.1.patch.txt {code} if (childFieldsList1 == null && childFieldsList2 == null) { return true; } if (childFieldsList1.size() != childFieldsList2.size()) { return false; } {code} If either childFieldsList1 or childFieldsList2 is null but not both, the second if statement would produce NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
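The failure mode Ted describes, reduced to a standalone sketch (method names are illustrative, not the actual ObjectInspectorUtils code):

```java
import java.util.List;

class CompareSketch {
    // Buggy shape from the report: handles both-null, then dereferences
    // unconditionally, so exactly one null side throws NPE.
    static boolean compareBuggy(List<?> l1, List<?> l2) {
        if (l1 == null && l2 == null) {
            return true;
        }
        return l1.size() == l2.size();   // NPE when only one side is null
    }

    // Fixed: reject the mixed case before touching either list.
    static boolean compareFixed(List<?> l1, List<?> l2) {
        if (l1 == null && l2 == null) {
            return true;
        }
        if (l1 == null || l2 == null) {
            return false;
        }
        return l1.size() == l2.size();
    }
}
```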
[jira] [Commented] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization
[ https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680993#comment-14680993 ] Hive QA commented on HIVE-11387: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12749640/HIVE-11387.07.patch {color:green}SUCCESS:{color} +1 9347 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4910/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4910/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4910/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12749640 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization -- Key: HIVE-11387 URL: https://issues.apache.org/jira/browse/HIVE-11387 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, HIVE-11387.06.patch, HIVE-11387.07.patch The main problem is that, due to the return path, we may now have {{(RS1-GBY2)-(RS3-GBY4)}} when map.aggr=false, i.e., no map aggr. However, in the non-return path, it will be treated as {{(RS1)-(GBY2-RS3-GBY4)}}. The problem is that it does not take the setting into account. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10289) Support filter on non-first partition key and non-string partition key
[ https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680877#comment-14680877 ] Sergey Shelukhin commented on HIVE-10289: - although I guess BSSD might be ok too... Support filter on non-first partition key and non-string partition key -- Key: HIVE-10289 URL: https://issues.apache.org/jira/browse/HIVE-10289 Project: Hive Issue Type: Sub-task Components: HBase Metastore, Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-10289.1.patch Currently, partition filtering only handles the first partition key, and the type of this partition key must be string. In order to break this limitation, several improvements are required: 1. Change the serialization format for partition keys. Currently partition keys are serialized into a delimited string, which is sorted in string order, not according to the actual type of the partition key. We use BinarySortableSerDe for this purpose. 2. For filter conditions not on the initial partition keys, push them into an HBase RowFilter. The RowFilter will deserialize the partition key and evaluate the filter condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7271) Speed up unit tests
[ https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shannon Ladymon updated HIVE-7271: -- Labels: (was: TODOC14) Speed up unit tests --- Key: HIVE-7271 URL: https://issues.apache.org/jira/browse/HIVE-7271 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.14.0 Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch, HIVE-7271.7.patch Did some experiments to see if there's a way to speed up unit tests. TestCliDriver seemed to take a lot of time just spinning up/tearing down JVMs. I was also curious to see if running everything on a ram disk would help. Results (I ran tests up to authorization_2): - Current setup: 40 minutes - Single JVM (not using child JVM to run all queries): 8 minutes - Single JVM + ram disk: 7 minutes So the ram disk didn't help that much. But running tests in single JVM seems worthwhile doing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8518) Compile time skew join optimization returns duplicated results
[ https://issues.apache.org/jira/browse/HIVE-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680886#comment-14680886 ] Jason Dere commented on HIVE-8518: -- Included this fix to branch-1.0 Compile time skew join optimization returns duplicated results -- Key: HIVE-8518 URL: https://issues.apache.org/jira/browse/HIVE-8518 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0 Reporter: Rui Li Assignee: Rui Li Fix For: 1.1.0 Attachments: HIVE-8518.1.patch Compile time skew join optimization clones the join operator tree and unions the results. The problem here is that we don't properly insert the predicate for the cloned join (relying on an assert statement). To reproduce the issue, run the simple query: {code}select * from tbl1 join tbl2 on tbl1.key=tbl2.key;{code} And suppose there's some skew in tbl1 (specify skew with CREATE or ALTER statement). Duplicated results will be returned if you set hive.optimize.skewjoin.compiletime=true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10062) HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
[ https://issues.apache.org/jira/browse/HIVE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10062: --- Attachment: HIVE-10062.branch-1.patch HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data - Key: HIVE-10062 URL: https://issues.apache.org/jira/browse/HIVE-10062 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Critical Fix For: 1.2.0 Attachments: HIVE-10062.01.patch, HIVE-10062.02.patch, HIVE-10062.03.patch, HIVE-10062.04.patch, HIVE-10062.05.patch, HIVE-10062.branch-1.patch In q.test environment with src table, execute the following query: {code} CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE; CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE; FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1 UNION all select s2.key as key, s2.value as value from src s2) unionsrc INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value; select * from DEST1; select * from DEST2; {code} DEST1 and DEST2 should both have 310 rows. However, DEST2 only has 1 row: tst1 500 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11514) Vectorized version of auto_sortmerge_join_1.q fails during execution with NPE
[ https://issues.apache.org/jira/browse/HIVE-11514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-11514: Attachment: auto_sortmerge_join_1.q Vectorized version of auto_sortmerge_join_1.q fails during execution with NPE - Key: HIVE-11514 URL: https://issues.apache.org/jira/browse/HIVE-11514 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: auto_sortmerge_join_1.q Query from auto_sortmerge_join_1.q: {code} select count(*) FROM bucket_big a JOIN bucket_small b ON a.key = b.key {code} generates stack trace: {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.initializeOp(VectorMapJoinOperator.java:177) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:131) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7997) Potential null pointer reference in ObjectInspectorUtils#compareTypes()
[ https://issues.apache.org/jira/browse/HIVE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7997: - Fix Version/s: 1.0.2 Potential null pointer reference in ObjectInspectorUtils#compareTypes() --- Key: HIVE-7997 URL: https://issues.apache.org/jira/browse/HIVE-7997 Project: Hive Issue Type: Bug Components: Types Reporter: Ted Yu Assignee: Navis Fix For: 1.1.0, 1.0.2 Attachments: HIVE-7997.1.patch.txt {code} if (childFieldsList1 == null && childFieldsList2 == null) { return true; } if (childFieldsList1.size() != childFieldsList2.size()) { return false; } {code} If either childFieldsList1 or childFieldsList2 is null but not both, the second if statement would produce NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8889) JDBC Driver ResultSet.getXXXXXX(String columnLabel) methods Broken
[ https://issues.apache.org/jira/browse/HIVE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-8889: - Fix Version/s: 1.0.2 Included this fix to branch-1.0 JDBC Driver ResultSet.getXX(String columnLabel) methods Broken -- Key: HIVE-8889 URL: https://issues.apache.org/jira/browse/HIVE-8889 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: G Lingle Assignee: Chaoyu Tang Priority: Critical Fix For: 1.1.0, 1.0.2 Attachments: HIVE-8889.1.patch, HIVE-8889.2.patch, HIVE-8889.patch Using hive-jdbc-0.13.1-cdh5.2.0.jar. All of the get-by-column-label methods of HiveBaseResultSet are now broken. They don't take just the column label as they should. Instead you have to pass in table name.column name. This requirement doesn't conform to the java ResultSet API which specifies: columnLabel - the label for the column specified with the SQL AS clause. If the SQL AS clause was not specified, then the label is the name of the column Looking at the code, it seems that the problem is that findColumn() method is looking in normalizedColumnNames instead of the columnNames. BTW, Another annoying issue with the code is that the SQLException thrown gives no indication of what the problem is. It should at least say that the column name wasn't found in the description string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11500: Description: We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: was: We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too. In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin We need to cache file metadata (e.g. 
ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11500: Description: We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. It might be different for splits In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: was: We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. 
to split generation compared to paths, we will probably just filter by fileId In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. It might be different for splits In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
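The cache design described above (footers keyed by a permanent fileId, removed lazily when a file is erased or compacted) can be sketched outside Hive as follows. All class and method names here are invented for illustration; none come from the actual HBase metastore patch.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of the footer-cache idea: entries are keyed by the
 * filesystem's immutable fileId, so a cached footer stays valid until the
 * underlying file is erased or compacted, at which point the entry is
 * dropped lazily by whoever notices the file is gone.
 */
public class FooterCacheSketch {
    private final Map<Long, ByteBuffer> footersByFileId = new ConcurrentHashMap<>();

    public void put(long fileId, ByteBuffer serializedFooter) {
        footersByFileId.put(fileId, serializedFooter);
    }

    /** Returns null on a miss; the caller then reads the footer from the file. */
    public ByteBuffer get(long fileId) {
        return footersByFileId.get(fileId);
    }

    /** Lazy invalidation: called when split generation finds the file missing. */
    public void evict(long fileId) {
        footersByFileId.remove(fileId);
    }
}
```

Filtering by paths and fileIds (rather than by partition predicates) then amounts to a batch of `get` calls against this map.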
[jira] [Commented] (HIVE-8874) Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster
[ https://issues.apache.org/jira/browse/HIVE-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680940#comment-14680940 ] Jason Dere commented on HIVE-8874: -- Included this fix to branch-1.0 Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster --- Key: HIVE-8874 URL: https://issues.apache.org/jira/browse/HIVE-8874 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.1.0, 1.0.2 Attachments: HIVE-8874.1.patch A Hive action workflow on a secure cluster, that does an INSERT INTO regular table FROM hbase table as part of its script will reproduce the issue. And it can be reproduced in Hive 0.13 cluster. {noformat} 10309 [main] ERROR org.apache.hadoop.hive.ql.Driver - FAILED: SemanticException Error while configuring input job properties org.apache.hadoop.hive.ql.parse.SemanticException: Error while configuring input job properties at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:94) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9261) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:206) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:332) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:988) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1053) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:924) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:914) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367) at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633) at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323) at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39) at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.IllegalStateException: Error while configuring input job properties at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:343) at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:279) at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:804) at 
org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:774) at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.convertToWork(SimpleFetchOptimizer.java:241) at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.access$000(SimpleFetchOptimizer.java:207) at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.optimize(SimpleFetchOptimizer.java:112) at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:83) ... 35 more Caused by: org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException:
[jira] [Updated] (HIVE-8874) Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster
[ https://issues.apache.org/jira/browse/HIVE-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-8874: - Fix Version/s: 1.0.2 Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster --- Key: HIVE-8874 URL: https://issues.apache.org/jira/browse/HIVE-8874 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.1.0, 1.0.2 Attachments: HIVE-8874.1.patch A Hive action workflow on a secure cluster, that does an INSERT INTO regular table FROM hbase table as part of its script will reproduce the issue. And it can be reproduced in Hive 0.13 cluster. {noformat} 10309 [main] ERROR org.apache.hadoop.hive.ql.Driver - FAILED: SemanticException Error while configuring input job properties org.apache.hadoop.hive.ql.parse.SemanticException: Error while configuring input job properties at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:94) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9261) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:206) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:332) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:988) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1053) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:924) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:914) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464) 
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633) at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323) at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39) at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.IllegalStateException: Error while configuring input job properties at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:343) at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:279) at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:804) at org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:774) at 
org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.convertToWork(SimpleFetchOptimizer.java:241) at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.access$000(SimpleFetchOptimizer.java:207) at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.optimize(SimpleFetchOptimizer.java:112) at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:83) ... 35 more Caused by: org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients
[jira] [Updated] (HIVE-8330) HiveResultSet.findColumn() parameters are case sensitive
[ https://issues.apache.org/jira/browse/HIVE-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-8330: - Fix Version/s: 1.0.2 Included this fix to branch-1.0 HiveResultSet.findColumn() parameters are case sensitive Key: HIVE-8330 URL: https://issues.apache.org/jira/browse/HIVE-8330 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña Fix For: 1.1.0, 1.0.2 Attachments: HIVE-8330.1.patch, HIVE-8330.2.patch, HIVE-8330.3.patch, HIVE-8330.4.patch Look at the following code:
{noformat}
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connection db = null;
Statement stmt = null;
ResultSet rs = null;
try {
  db = DriverManager.getConnection("jdbc:hive2://localhost:1/default", "hive", "");
  stmt = db.createStatement();
  rs = stmt.executeQuery("SELECT * FROM sample_07 limit 1");
  ResultSetMetaData metaData = rs.getMetaData();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    System.out.println("Column " + i + ": " + metaData.getColumnName(i));
  }
  while (rs.next()) {
    System.out.println(rs.findColumn("code"));
  }
} finally {
  DbUtils.closeQuietly(db, stmt, rs);
}
{noformat}
Above program will generate following result on my cluster:
{noformat}
Column 1: code
Column 2: description
Column 3: total_emp
Column 4: salary
1
{noformat}
However, if the last print sentence is changed as following (using uppercase characters):
{noformat}
System.out.println(rs.findColumn("Code"));
{noformat}
The program will fail at exactly that line. The same happens if the column name is changed to "CODE". Based on the JDBC ResultSet documentation, this method should be case insensitive: "Column names used as input to getter methods are case insensitive" http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
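The case-insensitive lookup that the JDBC spec requires of ResultSet.findColumn() can be sketched as below. The class name and column list are illustrative, and this is not taken from the actual HIVE-8330 patch; an unchecked exception stands in for the SQLException a real driver would throw.

```java
import java.util.List;

/**
 * Sketch of a spec-compliant findColumn(): java.sql.ResultSet requires
 * column-label matching to ignore case, returning a 1-based index.
 */
public class FindColumnSketch {
    private final List<String> columnNames;

    public FindColumnSketch(List<String> columnNames) {
        this.columnNames = columnNames;
    }

    /** Case-insensitive match, as the JDBC documentation mandates. */
    public int findColumn(String label) {
        for (int i = 0; i < columnNames.size(); i++) {
            if (columnNames.get(i).equalsIgnoreCase(label)) {
                return i + 1; // JDBC column indexes are 1-based
            }
        }
        throw new IllegalArgumentException("Column not found: " + label);
    }
}
```

With this lookup, `findColumn("code")`, `findColumn("Code")`, and `findColumn("CODE")` all resolve to the same index, which is the behavior the reporter expected.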
[jira] [Updated] (HIVE-11513) AvroLazyObjectInspector could handle empty data better
[ https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11513: Attachment: HIVE-11513.1.patch.txt RB: https://reviews.apache.org/r/37329/ AvroLazyObjectInspector could handle empty data better -- Key: HIVE-11513 URL: https://issues.apache.org/jira/browse/HIVE-11513 Project: Hive Issue Type: Improvement Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11513.1.patch.txt Currently in the AvroLazyObjectInspector, it looks like we only handle the case when the data sent to deserialize is null[1]. It would be nice to handle the case when it is empty. [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
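The improvement being asked for, treating empty input the same way the existing null check does, reduces to a guard like the following. The class and method names are invented for illustration, not taken from the patch.

```java
/**
 * Minimal sketch of the suggested AvroLazyObjectInspector improvement:
 * short-circuit when there is nothing to deserialize, whether the input
 * is null or merely empty.
 */
public class EmptyDataGuardSketch {
    /** Returns true when deserialization should be skipped. */
    public static boolean isNullOrEmpty(byte[] data) {
        return data == null || data.length == 0;
    }
}
```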
[jira] [Updated] (HIVE-11295) LLAP: clean up ORC dependencies on object pools
[ https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11295: Attachment: HIVE-11295.01.patch Fixed method names LLAP: clean up ORC dependencies on object pools --- Key: HIVE-11295 URL: https://issues.apache.org/jira/browse/HIVE-11295 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11295.01.patch, HIVE-11295.patch Before there's storage API module, we can clean some things up -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11295) LLAP: clean up ORC dependencies on object pools
[ https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11295: Description: Before there's storage API module, we can clean some things up NO PRECOMMIT TESTS was:Before there's storage API module, we can clean some things up LLAP: clean up ORC dependencies on object pools --- Key: HIVE-11295 URL: https://issues.apache.org/jira/browse/HIVE-11295 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11295.01.patch, HIVE-11295.patch Before there's storage API module, we can clean some things up NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7214) Support predicate pushdown for complex data types in ORCFile
[ https://issues.apache.org/jira/browse/HIVE-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-7214: - Labels: ORC (was: ) Support predicate pushdown for complex data types in ORCFile Key: HIVE-7214 URL: https://issues.apache.org/jira/browse/HIVE-7214 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Rohini Palaniswamy Labels: ORC Currently ORCFile does not support predicate pushdown for complex datatypes like map, array and struct while Parquet does. Came across this during discussion of PIG-3760. Our users have a lot of map and struct (tuple in pig) columns and most of the filter conditions are on them. Would be great to have support added for them in ORC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7214) Support predicate pushdown for complex data types in ORCFile
[ https://issues.apache.org/jira/browse/HIVE-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-7214: - Component/s: File Formats Support predicate pushdown for complex data types in ORCFile Key: HIVE-7214 URL: https://issues.apache.org/jira/browse/HIVE-7214 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Rohini Palaniswamy Currently ORCFile does not support predicate pushdown for complex datatypes like map, array and struct while Parquet does. Came across this during discussion of PIG-3760. Our users have a lot of map and struct (tuple in pig) columns and most of the filter conditions are on them. Would be great to have support added for them in ORC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11493) Predicate with integer column equals double evaluates to false
[ https://issues.apache.org/jira/browse/HIVE-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681178#comment-14681178 ] Hive QA commented on HIVE-11493: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12749659/HIVE-11493.02.patch {color:green}SUCCESS:{color} +1 9348 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4912/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4912/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4912/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12749659 - PreCommit-HIVE-TRUNK-Build Predicate with integer column equals double evaluates to false -- Key: HIVE-11493 URL: https://issues.apache.org/jira/browse/HIVE-11493 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Pengcheng Xiong Priority: Blocker Attachments: HIVE-11493.01.patch, HIVE-11493.02.patch Filters with integer column equals double constant evaluates to false everytime. Negative double constant works fine. 
{code:title=explain select * from orc_ppd where t = 10.0;} OK Stage-0 Fetch Operator limit:-1 Select Operator [SEL_2] outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13] Filter Operator [FIL_1] predicate:false (type: boolean) TableScan [TS_0] alias:orc_ppd {code} {code:title=explain select * from orc_ppd where t = -10.0;} OK Stage-0 Fetch Operator limit:-1 Select Operator [SEL_2] outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13] Filter Operator [FIL_1] predicate:(t = (- 10.0)) (type: boolean) TableScan [TS_0] alias:orc_ppd {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity
[ https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681180#comment-14681180 ] Dong Chen commented on HIVE-11498: -- [~dapengsun] Thanks for your contribution! I have commit this to master, branch-1, and branch-1.2. HIVE Authorization v2 should not check permission for dummy entity -- Key: HIVE-11498 URL: https://issues.apache.org/jira/browse/HIVE-11498 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 1.2.0, 1.3.0, 2.0.0 Reporter: Dapeng Sun Assignee: Dapeng Sun Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, HIVE-11498.003.patch The queries like {{SELECT 1+1;}}, The target table and database will set to {{_dummy_database}} {{_dummy_table}}, authorization should skip these kinds of databases or tables. For authz v1. it has skip them. eg1. [Source code at github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L600] {noformat} for (WriteEntity write : outputs) { if (write.isDummy() || write.isPathType()) { continue; } {noformat} eg2. [Source code at github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L633] {noformat} for (ReadEntity read : inputs) { if (read.isDummy() || read.isPathType()) { continue; } ... } {noformat} ... This patch will fix authz v2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
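The v1 skip logic quoted in the description can be sketched as a standalone filter, which is the shape of check the patch extends to authz v2. The Entity interface and class names here are stand-ins, not Hive's actual ReadEntity/WriteEntity types.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the dummy-entity filtering idea: drop dummy and path-type
 * entities before authorization, mirroring the v1 checks in Driver.java.
 */
public class DummyEntityFilterSketch {
    public interface Entity {
        boolean isDummy();
        boolean isPathType();
    }

    /** Simple stand-in implementation for demonstration. */
    public static class SimpleEntity implements Entity {
        private final boolean dummy;
        private final boolean pathType;
        public SimpleEntity(boolean dummy, boolean pathType) {
            this.dummy = dummy;
            this.pathType = pathType;
        }
        public boolean isDummy() { return dummy; }
        public boolean isPathType() { return pathType; }
    }

    public static <T extends Entity> List<T> filterForAuthz(List<T> entities) {
        List<T> result = new ArrayList<>();
        for (T e : entities) {
            if (e.isDummy() || e.isPathType()) {
                continue; // _dummy_database/_dummy_table need no permission check
            }
            result.add(e);
        }
        return result;
    }
}
```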
[jira] [Commented] (HIVE-11493) Predicate with integer column equals double evaluates to false
[ https://issues.apache.org/jira/browse/HIVE-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681182#comment-14681182 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11493: -- +1 for patch 2. Predicate with integer column equals double evaluates to false -- Key: HIVE-11493 URL: https://issues.apache.org/jira/browse/HIVE-11493 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Pengcheng Xiong Priority: Blocker Attachments: HIVE-11493.01.patch, HIVE-11493.02.patch Filters with integer column equals double constant evaluates to false everytime. Negative double constant works fine. {code:title=explain select * from orc_ppd where t = 10.0;} OK Stage-0 Fetch Operator limit:-1 Select Operator [SEL_2] outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13] Filter Operator [FIL_1] predicate:false (type: boolean) TableScan [TS_0] alias:orc_ppd {code} {code:title=explain select * from orc_ppd where t = -10.0;} OK Stage-0 Fetch Operator limit:-1 Select Operator [SEL_2] outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13] Filter Operator [FIL_1] predicate:(t = (- 10.0)) (type: boolean) TableScan [TS_0] alias:orc_ppd {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity
[ https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681187#comment-14681187 ] Dong Chen commented on HIVE-11498: -- Thanks for your review on this patch! [~thejas] HIVE Authorization v2 should not check permission for dummy entity -- Key: HIVE-11498 URL: https://issues.apache.org/jira/browse/HIVE-11498 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 1.2.0, 1.3.0, 2.0.0 Reporter: Dapeng Sun Assignee: Dapeng Sun Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, HIVE-11498.003.patch The queries like {{SELECT 1+1;}}, The target table and database will set to {{_dummy_database}} {{_dummy_table}}, authorization should skip these kinds of databases or tables. For authz v1. it has skip them. eg1. [Source code at github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L600] {noformat} for (WriteEntity write : outputs) { if (write.isDummy() || write.isPathType()) { continue; } {noformat} eg2. [Source code at github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L633] {noformat} for (ReadEntity read : inputs) { if (read.isDummy() || read.isPathType()) { continue; } ... } {noformat} ... This patch will fix authz v2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11462) GenericUDFStruct should constant fold at compile time
[ https://issues.apache.org/jira/browse/HIVE-11462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11462: --- Attachment: HIVE-11462.3.patch To work around Kryo StdInstantiatorStrategy issues, prevent patch from folding deeper than 1 level. Updated golden files. GenericUDFStruct should constant fold at compile time - Key: HIVE-11462 URL: https://issues.apache.org/jira/browse/HIVE-11462 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-11462.1.patch, HIVE-11462.2.patch, HIVE-11462.3.patch, HIVE-11462.WIP.patch HIVE-11428 introduces a constant Struct Object, which is available for the runtime operators to assume as a constant parameter. This operator isn't constant folded during compilation since the UDF returns a complex type, which is logged as a warning by the constant propagation layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity
[ https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681190#comment-14681190 ] Dapeng Sun commented on HIVE-11498: --- Thank [~thejas] and [~dongc] for your review. HIVE Authorization v2 should not check permission for dummy entity -- Key: HIVE-11498 URL: https://issues.apache.org/jira/browse/HIVE-11498 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 1.2.0, 1.3.0, 2.0.0 Reporter: Dapeng Sun Assignee: Dapeng Sun Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, HIVE-11498.003.patch The queries like {{SELECT 1+1;}}, The target table and database will set to {{_dummy_database}} {{_dummy_table}}, authorization should skip these kinds of databases or tables. For authz v1. it has skip them. eg1. [Source code at github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L600] {noformat} for (WriteEntity write : outputs) { if (write.isDummy() || write.isPathType()) { continue; } {noformat} eg2. [Source code at github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L633] {noformat} for (ReadEntity read : inputs) { if (read.isDummy() || read.isPathType()) { continue; } ... } {noformat} ... This patch will fix authz v2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11461) Transform flat AND/OR into IN struct clause
[ https://issues.apache.org/jira/browse/HIVE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680651#comment-14680651 ] Gopal V commented on HIVE-11461: [~jcamachorodriguez]: the PreOrderOnceWalker improves the performance of the optimizer significantly. Patch LGTM - the early exit makes it fast for the miss cases as well. +1 to the patch, golden file updates after HIVE-11398 goes in. Transform flat AND/OR into IN struct clause --- Key: HIVE-11461 URL: https://issues.apache.org/jira/browse/HIVE-11461 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11461.1.patch, HIVE-11461.2.patch, HIVE-11461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
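The transformation HIVE-11461 performs can be shown on plain data rather than Hive's AST: a flat OR of conjunctive equality predicates such as `(a=1 AND b=2) OR (a=3 AND b=4)` becomes `(a, b) IN ((1, 2), (3, 4))`. The string rendering below is purely illustrative.

```java
import java.util.List;

/**
 * Sketch of the flat AND/OR to IN-struct rewrite: given the shared column
 * list and the constant tuples extracted from each OR branch, render the
 * equivalent struct-IN predicate.
 */
public class OrToInStructSketch {
    public static String rewrite(List<String> columns, List<List<Object>> tuples) {
        StringBuilder sb = new StringBuilder("(")
                .append(String.join(", ", columns)).append(") IN (");
        for (int i = 0; i < tuples.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append("(");
            List<Object> t = tuples.get(i);
            for (int j = 0; j < t.size(); j++) {
                if (j > 0) sb.append(", ");
                sb.append(t.get(j));
            }
            sb.append(")");
        }
        return sb.append(")").toString();
    }
}
```

The real optimizer does this over ExprNode trees (and, per the comment, a PreOrderOnceWalker with early exit keeps the miss cases cheap), but the input/output relationship is the same.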
[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow
[ https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680720#comment-14680720 ] Yongzhi Chen commented on HIVE-11502: - [~gopalv], I checked the related hadoop code between the two versions used by 0.13 and 1.2, and there is no change on the hadoop side for DoubleWritable. I think the regression may relate to HIVE-7041, which switched from using Hive's own DoubleWritable to Hadoop's. But just reverting the change causes exceptions; I am still looking at it. Map side aggregation is extremely slow -- Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen For the following query: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the column for group by has many different values (for example 40) and it is in type double, the map side aggregation is very slow. I ran the query which took more than 3 hours; after 3 hours, I had to kill the query. The same query can finish in 7 seconds if I turn off map side aggregation by: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
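One way a double-keyed hash aggregation can collapse like this, offered here as a plausible mechanism consistent with the DoubleWritable lead, not as the confirmed root cause, is a hashCode() that truncates `Double.doubleToLongBits` to its low 32 bits: whole-number doubles have all-zero low mantissa bits, so every such key lands in the same bucket and the hash table degrades to a linear scan.

```java
/**
 * Hedged illustration of hash-quality collapse for double keys.
 * truncatingHash keeps only the low 32 bits of the IEEE 754 encoding,
 * which are zero for all whole-number doubles; mixingHash folds the
 * high bits in and spreads such keys across buckets.
 */
public class DoubleHashSketch {
    /** Poor hash: identical (zero) for 1.0, 2.0, 3.0, ... */
    public static int truncatingHash(double d) {
        return (int) Double.doubleToLongBits(d);
    }

    /** Better hash: XOR-fold the high 32 bits into the low 32. */
    public static int mixingHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }
}
```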
[jira] [Updated] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM
[ https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11467: - Attachment: HIVE-11467.04.patch OK, updated the patch. WriteBuffers rounding wbSize to next power of 2 may cause OOM - Key: HIVE-11467 URL: https://issues.apache.org/jira/browse/HIVE-11467 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 2.0.0 Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, HIVE-11467.03.patch, HIVE-11467.04.patch If wbSize passed to WriteBuffers cstr is not power of 2, it will do a rounding first to the next power of 2
{code}
public WriteBuffers(int wbSize, long maxSize) {
  this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : (Integer.highestOneBit(wbSize) << 1);
  this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
  this.offsetMask = this.wbSize - 1;
  this.maxSize = maxSize;
  writePos.bufferIndex = -1;
  nextBufferToWrite();
}
{code}
That may break existing memory consumption assumption for mapjoin, and potentially cause OOM. The solution will be to pass a power of 2 number as wbSize from upstream during hashtable creation, to avoid this late expansion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
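The rounding in the quoted constructor, and the workaround the comment proposes (hand WriteBuffers a size it will not inflate), can be demonstrated in isolation. The helper names below are invented; only the round-up expression is taken from the constructor above.

```java
/**
 * Sketch of the HIVE-11467 sizing behavior: WriteBuffers rounds a
 * non-power-of-2 wbSize UP to the next power of 2 (nearly doubling the
 * buffer and the memory assumed by mapjoin), so the caller should round
 * DOWN to a power of 2 before constructing it.
 */
public class WbSizeSketch {
    /** What the constructor effectively does for a non-power-of-2 size. */
    public static int roundUpAsWriteBuffersDoes(int wbSize) {
        return Integer.bitCount(wbSize) == 1
                ? wbSize
                : (Integer.highestOneBit(wbSize) << 1);
    }

    /** The proposed upstream fix: never exceed the budgeted size. */
    public static int roundDownToPowerOfTwo(int wbSize) {
        return Integer.highestOneBit(wbSize);
    }
}
```

For example, a budget of 1000 bytes becomes 1024 inside WriteBuffers (over budget), while rounding down upstream yields 512 (within budget).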
[jira] [Commented] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681110#comment-14681110 ] Thejas M Nair commented on HIVE-7224: - [~vgumashta] can you please rebase ? Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Labels: TODOC1.2 Attachments: HIVE-7224.1.patch See HIVE-7221. By default beeline tries to buffer the entire output relation before printing it on stdout. This can cause OOM when the output relation is large. However, beeline has the option of incremental prints. We should keep that as the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7224: -- Affects Version/s: 1.1.0 1.0.0 1.2.0 Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-7224.1.patch See HIVE-7221. By default beeline tries to buffer the entire output relation before printing it on stdout. This can cause OOM when the output relation is large. However, beeline has the option of incremental prints. We should keep that as the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7224: -- Labels: (was: TODOC1.2) Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-7224.1.patch See HIVE-7221. By default beeline tries to buffer the entire output relation before printing it on stdout. This can cause OOM when the output relation is large. However, beeline has the option of incremental prints. We should keep that as the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11500: Attachment: HBase metastore split cache.pdf Attaching the doc implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBase metastore split cache.pdf We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. It might be different for splits In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11500: Attachment: (was: HBase metastore split cache.pdf) implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBase metastore split cache.pdf We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. It might be different for splits In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11500: Attachment: HBase metastore split cache.pdf implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBase metastore split cache.pdf We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. It might be different for splits In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9377) UDF in_file() in WHERE predicate causes NPE.
[ https://issues.apache.org/jira/browse/HIVE-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9377: - Fix Version/s: 1.0.2 Including fix to branch-1.0 UDF in_file() in WHERE predicate causes NPE. Key: HIVE-9377 URL: https://issues.apache.org/jira/browse/HIVE-9377 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.14.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 1.1.0, 1.0.2 Attachments: HIVE-9377.1.patch Consider the following query: {code:sql} SELECT foo, bar from mythdb.foobar where in_file( bar, '/tmp/bar_list.txt' ); {code} Using {{in_file()}} in a WHERE predicate causes the following NPE: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getWritableConstantValue(ObjectInspectorUtils.java:1041) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFInFile.getRequiredFiles(GenericUDFInFile.java:93) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.isDeterministicUdf(ConstantPropagateProcFactory.java:303) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:226) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.access$000(ConstantPropagateProcFactory.java:92) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory$ConstantPropagateFilterProc.process(ConstantPropagateProcFactory.java:623) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagate$ConstantPropagateWalker.walk(ConstantPropagate.java:147) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagate.transform(ConstantPropagate.java:117) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:177) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10032) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:189) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1156) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:701) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:674) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} I have a tentative fix I need advice on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
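The NPE in the trace above comes from constant folding asking for the writable constant value of an expression that is not actually a constant. The failure mode can be sketched generically; the types and methods below are hypothetical stand-ins, not Hive's ConstantPropagateProcFactory API:

```java
// Generic sketch of the failure mode (hypothetical types, not Hive source):
// code that assumes an argument is a constant dereferences null when it is
// not. A null check before the fold turns the NPE into a clean
// "cannot fold" decision.
public class ConstantFoldGuard {
    // Stands in for an expression node; returns null when not a constant.
    interface Expr {
        Object constantValue();
    }

    // Unsafe version: NPEs on non-constant input, like the reported bug.
    static String requiredFileUnsafe(Expr pathArg) {
        return pathArg.constantValue().toString();
    }

    // Guarded version: declines to fold instead of crashing.
    static boolean canResolveFile(Expr pathArg) {
        return pathArg.constantValue() != null;
    }
}
```

The guarded check lets the optimizer simply skip folding `in_file()` when the file-path argument is a column reference rather than a literal.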
[jira] [Resolved] (HIVE-11461) Transform flat AND/OR into IN struct clause
[ https://issues.apache.org/jira/browse/HIVE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-11461. --- Resolution: Fixed Failures unrelated. Committed to master. Thank you [~jcamachorodriguez]! Transform flat AND/OR into IN struct clause --- Key: HIVE-11461 URL: https://issues.apache.org/jira/browse/HIVE-11461 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11461.1.patch, HIVE-11461.2.patch, HIVE-11461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-11461) Transform flat AND/OR into IN struct clause
[ https://issues.apache.org/jira/browse/HIVE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner reopened HIVE-11461: --- Updated wrong jira. My bad. Transform flat AND/OR into IN struct clause --- Key: HIVE-11461 URL: https://issues.apache.org/jira/browse/HIVE-11461 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11461.1.patch, HIVE-11461.2.patch, HIVE-11461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow
[ https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681015#comment-14681015 ] Yongzhi Chen commented on HIVE-11502: - [~gopalv], thanks for the workaround. But I am afraid some users do not want to change their input format. And this HashMap may affect mapjoin too. We helped a user work around this map side aggregation issue with set hive.map.aggr = false; After that, the simple group-by test case performed very well, but a more complicated join query with the group by as a subquery got stuck on mapjoin. So we had to let the user turn off mapjoin with set hive.auto.convert.join=false; The performance hit from this bug is really severe. Without the workaround, none of the queries can finish in several hours. So I think we have to fix it. Map side aggregation is extremely slow -- Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen For a query like the following: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the group-by column has many different values (for example 40) and is of type double, the map side aggregation is very slow. I ran the query for more than 3 hours; after 3 hours, I had to kill it. The same query can finish in 7 seconds if I turn off map side aggregation with: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
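A plausible mechanism for the slowdown, consistent with the HIVE-7041 suspicion in this thread, can be demonstrated in isolation. This is a hedged sketch, not Hive code: a hashCode that truncates Double.doubleToLongBits to its low 32 bits (the shape of hadoop's DoubleWritable.hashCode at the time) collapses every small whole-number double into a single hash bucket, because their low mantissa bits are all zero, so a hash aggregation degenerates into a linear scan per key. An XOR-folding hashCode (the shape of Hive's own older DoubleWritable) keeps them distinct.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of why map-side aggregation on double keys can degrade.
// Neither method reproduces Hive/Hadoop internals exactly; they show the
// two hashCode *shapes* being compared in this thread.
public class DoubleHashDemo {
    // Truncating hash: keeps only the low 32 bits of the IEEE 754 encoding.
    static int truncatingHash(double d) {
        return (int) Double.doubleToLongBits(d);
    }

    // XOR-folding hash: mixes the high 32 bits into the result.
    static int foldingHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        Set<Integer> trunc = new HashSet<>();
        Set<Integer> fold = new HashSet<>();
        // Whole numbers this small need few mantissa bits, so their low
        // 32 encoding bits are all zero and the truncating hash is 0.
        for (double d = 1.0; d <= 100000.0; d += 1.0) {
            trunc.add(truncatingHash(d));
            fold.add(foldingHash(d));
        }
        System.out.println(trunc.size()); // 1 distinct hash value
        System.out.println(fold.size());  // 100000 distinct hash values
    }
}
```

With one bucket holding every key, each probe walks the whole chain, which matches the "hours instead of seconds" behavior reported above.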
[jira] [Updated] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-11515: - Attachment: HIVE-11515.1.patch.txt Still some possible race condition in DynamicPartitionPruner Key: HIVE-11515 URL: https://issues.apache.org/jira/browse/HIVE-11515 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-11515.1.patch.txt Even after HIVE-9976, I could see a race condition in DPP sometimes. Hard to reproduce, but it seemed related to the fact that prune() is called from a thread-pool. With some delay in the queue, events from fast tasks arrive before prune() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-11515: - Description: Even after HIVE-9976, I could see a race condition in DPP sometimes. Hard to reproduce, but it seemed related to the fact that prune() is called from a thread-pool. With some delay in the queue, events from fast tasks arrive before prune() is called. (was: Even after HIVE-9976, I could see a race condition in DPP sometimes. Hard to reproduce, but it seemed related to the fact that init() is called from a thread-pool. With some delay in the queue, events from fast tasks arrive before init() is called.) Still some possible race condition in DynamicPartitionPruner Key: HIVE-11515 URL: https://issues.apache.org/jira/browse/HIVE-11515 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-11515.1.patch.txt Even after HIVE-9976, I could see a race condition in DPP sometimes. Hard to reproduce, but it seemed related to the fact that prune() is called from a thread-pool. With some delay in the queue, events from fast tasks arrive before prune() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
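The race described here — events delivered on a pool thread before initialization has run — has a standard defensive shape: buffer early events and replay them once initialization completes. The sketch below is a generic illustration of that pattern, not DynamicPartitionPruner's actual code; all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Generic sketch of the race fix pattern (not Hive's DynamicPartitionPruner):
// if events can arrive before initialize() runs, hold them and replay them
// in order once initialization completes, instead of losing or mishandling them.
public class InitBufferedHandler {
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean initialized = false;
    private int processed = 0;

    public synchronized void onEvent(String event) {
        if (!initialized) {
            pending.add(event); // arrived before init: buffer it
        } else {
            process(event);
        }
    }

    public synchronized void initialize() {
        initialized = true;
        while (!pending.isEmpty()) {
            process(pending.poll()); // replay buffered events in order
        }
    }

    private void process(String event) {
        processed++; // stands in for the real per-event work
    }

    public synchronized int processedCount() {
        return processed;
    }
}
```

Synchronizing both paths on the same lock is what closes the window: an event can never observe `initialized == true` while the replay loop is still draining the buffer.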
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681038#comment-14681038 ] Aaron Tokhy commented on HIVE-10631: Reading more about hive.stats.reliable, it did not appear appropriate to use in this case; instead it would be better to defer stats calculation for partitioned tables to when partitions are being added to a table (MSCK/ALTER TABLE), and not do it on table creation (CREATE [EXTERNAL] TABLE) create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0, 1.0.0 Reporter: Dongwook Kwon Priority: Minor The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on; however for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't seem to use the result. Fast Stats was implemented by HIVE-3959 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363 From the create_table_core method {code} if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) && !MetaStoreUtils.isView(tbl)) { if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir); } else { // Partitioned table with no partitions. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); } } {code} Particularly Line 1363: // Partitioned table with no partitions. 
{code} MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); {code} This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method because the newDir flag is always true. The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse location, especially for large existing partitions. Also the impact is heightened with HIVE-6727 when the warehouse location is S3; basically it could scan the wrong S3 directory recursively and do nothing with it. I will add more details of these cases in comments -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11515: --- Component/s: Tez Still some possible race condition in DynamicPartitionPruner Key: HIVE-11515 URL: https://issues.apache.org/jira/browse/HIVE-11515 Project: Hive Issue Type: Bug Components: Query Processor, Tez Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-11515.1.patch.txt Even after HIVE-9976, I could see a race condition in DPP sometimes. Hard to reproduce, but it seemed related to the fact that prune() is called from a thread-pool. With some delay in the queue, events from fast tasks arrive before prune() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization
[ https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681052#comment-14681052 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11387: -- [~pxiong] I need to get the patch into branch-1 as well. The patch does not apply cleanly to branch-1. Could you please upload one? Thanks Hari CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization -- Key: HIVE-11387 URL: https://issues.apache.org/jira/browse/HIVE-11387 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 2.0.0 Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, HIVE-11387.06.patch, HIVE-11387.07.patch The main problem is that, due to the return path, we may now have {{(RS1-GBY2)\-(RS3-GBY4)}} when map.aggr=false, i.e., no map aggr. However, in the non-return path, it will be treated as {{(RS1)-(GBY2-RS3-GBY4)}}. The root of the problem is that the optimization does not take that setting into account. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
[ https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-11504: Attachment: HIVE-11504.2.patch Hi [~spena], could you please help review this patch? Thank you! Predicate pushing down doesn't work for float type for Parquet -- Key: HIVE-11504 URL: https://issues.apache.org/jira/browse/HIVE-11504 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, HIVE-11504.patch The predicate builder should use the PrimitiveTypeName type on the parquet side to construct the predicate leaf, instead of the type provided by PredicateLeaf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10625) Handle Authorization for 'select expr' hive queries in SQL Standard Authorization
[ https://issues.apache.org/jira/browse/HIVE-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou resolved HIVE-10625. -- Resolution: Duplicate Same work is going on at HIVE-11498, so closing this one. Handle Authorization for 'select expr' hive queries in SQL Standard Authorization - Key: HIVE-10625 URL: https://issues.apache.org/jira/browse/HIVE-10625 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 1.1.0 Reporter: Nemon Lou Hive internally rewrites this 'select expression' query into 'select expression from _dummy_database._dummy_table', where these dummy db and table are temp entities for the current query. SQL Standard Authorization needs to handle these special objects. Typing select reverse(123); in beeline will produce this error: {code} Error: Error while compiling statement: FAILED: HiveAuthzPluginException Error getting object from metastore for Object [type=TABLE_OR_VIEW, name=_dummy_database._dummy_table] (state=42000,code=4) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11480) CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as input to GenericUDAF
[ https://issues.apache.org/jira/browse/HIVE-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681098#comment-14681098 ] Hive QA commented on HIVE-11480: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12749652/HIVE-11480.03.patch {color:green}SUCCESS:{color} +1 9347 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4911/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4911/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4911/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12749652 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as input to GenericUDAF --- Key: HIVE-11480 URL: https://issues.apache.org/jira/browse/HIVE-11480 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11480.01.patch, HIVE-11480.02.patch, HIVE-11480.03.patch Some of the UDAFs cannot deal with char/varchar correctly when the return path is on, for example udaf_number_format.q. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8326) Using DbTxnManager with concurrency off results in run time error
[ https://issues.apache.org/jira/browse/HIVE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-8326: - Fix Version/s: 1.0.2 Including this fix to branch-1.0 Using DbTxnManager with concurrency off results in run time error - Key: HIVE-8326 URL: https://issues.apache.org/jira/browse/HIVE-8326 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 1.1.0, 1.0.2 Attachments: HIVE-8326.patch Setting {code} hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager hive.support.concurrency=false {code} results in queries failing at runtime with an NPE in DbTxnManager.heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11506) Casting varchar/char type to string cannot be vectorized
[ https://issues.apache.org/jira/browse/HIVE-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-11506: - Attachment: HIVE-11506.2.patch.txt Updated golden files Casting varchar/char type to string cannot be vectorized Key: HIVE-11506 URL: https://issues.apache.org/jira/browse/HIVE-11506 Project: Hive Issue Type: Improvement Components: Vectorization Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-11506.1.patch.txt, HIVE-11506.2.patch.txt It's not defined in the vectorization context. {code} explain select cast(cast(cstring1 as varchar(10)) as string) x from alltypesorc order by x; {code} The mapper is not vectorized due to an exception, {noformat} 2015-08-10 17:02:08,003 INFO [main]: physical.Vectorizer (Vectorizer.java:validateExprNodeDesc(1299)) - Failed to vectorize org.apache.hadoop.hive.ql.metadata.HiveException: Unhandled cast input type: varchar(10) at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getCastToString(VectorizationContext.java:1543) at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUDFBridgeVectorExpression(VectorizationContext.java:1379) at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1177) at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:440) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1293) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1284) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateSelectOperator(Vectorizer.java:1116) at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateMapWorkOperator(Vectorizer.java:906) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: HIVE-10631.patch create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Priority: Minor Attachments: HIVE-10631.patch The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on; however for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't seem to use the result. Fast Stats was implemented by HIVE-3959 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363 From the create_table_core method {code} if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) && !MetaStoreUtils.isView(tbl)) { if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir); } else { // Partitioned table with no partitions. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); } } {code} Particularly Line 1363: // Partitioned table with no partitions. {code} MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); {code} This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method because the newDir flag is always true. The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse location, especially for large existing partitions. Also the impact is heightened with HIVE-6727 when the warehouse location is S3; basically it could scan the wrong S3 directory recursively and do nothing with it. 
I will add more details of these cases in comments -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: (was: HIVE-10631.patch) create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Priority: Minor Attachments: HIVE-10631.patch.1 The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on; however for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't seem to use the result. Fast Stats was implemented by HIVE-3959 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363 From the create_table_core method {code} if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) && !MetaStoreUtils.isView(tbl)) { if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir); } else { // Partitioned table with no partitions. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); } } {code} Particularly Line 1363: // Partitioned table with no partitions. {code} MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); {code} This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method because the newDir flag is always true. The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse location, especially for large existing partitions. Also the impact is heightened with HIVE-6727 when the warehouse location is S3; basically it could scan the wrong S3 directory recursively and do nothing with it. 
I will add more details of these cases in comments -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: HIVE-10631.patch.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)