[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-3454: --- Attachment: (was: HIVE-3454.4.patch) Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp(). Instead, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
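One reading consistent with the 1970-01-16 symptom in HIVE-3454 is that the BIGINT number of seconds is being interpreted as milliseconds since the epoch. The following is a minimal sketch in plain Java, not Hive's cast code; the class name, the method names, and the sample value are illustrative assumptions.

```java
import java.time.Instant;

public class EpochUnitSketch {
    // Treats the value as milliseconds since 1970-01-01T00:00:00Z.
    public static Instant asMillis(long value) {
        return Instant.ofEpochMilli(value);
    }

    // Treats the value as seconds since 1970-01-01T00:00:00Z.
    public static Instant asSeconds(long value) {
        return Instant.ofEpochSecond(value);
    }

    public static void main(String[] args) {
        // Sample value (assumption): a Unix time in seconds from September 2012.
        long unixSeconds = 1347000000L;
        System.out.println(asMillis(unixSeconds));  // prints 1970-01-16T14:10:00Z
        System.out.println(asSeconds(unixSeconds)); // prints 2012-09-07T06:40:00Z
    }
}
```

Read as milliseconds, a seconds value from 2012 covers only about 15.6 days, which lands on 1970-01-16 and matches the date in the report.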
[jira] [Commented] (HIVE-9831) HiveServer2 should use ConcurrentHashMap in ThreadFactory
[ https://issues.apache.org/jira/browse/HIVE-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343949#comment-14343949 ] Thejas M Nair commented on HIVE-9831: - +1 HiveServer2 should use ConcurrentHashMap in ThreadFactory - Key: HIVE-9831 URL: https://issues.apache.org/jira/browse/HIVE-9831 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.0.0, 1.2.0, 1.1.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 1.2.0 Attachments: HIVE-9831.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9830) Test auto_sortmerge_join_8 is flaky [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9830: -- Summary: Test auto_sortmerge_join_8 is flaky [Spark Branch] (was: Test auto_sortmerge_join_8 is flaky) Test auto_sortmerge_join_8 is flaky [Spark Branch] -- Key: HIVE-9830 URL: https://issues.apache.org/jira/browse/HIVE-9830 Project: Hive Issue Type: Bug Components: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch We found that auto_sortmerge_join_8 is flaky for Spark. Sometimes, the output could be wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9744) Move common arguments validation and value extraction code to GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343824#comment-14343824 ] Jason Dere commented on HIVE-9744: -- +1 Move common arguments validation and value extraction code to GenericUDF Key: HIVE-9744 URL: https://issues.apache.org/jira/browse/HIVE-9744 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-9744.1.patch, HIVE-9744.2.patch, HIVE-9744.3.patch, HIVE-9744.5.patch
Most of the UDFs:
- check if arguments are primitive / complex
- check if arguments are particular type or type_group
- get converters to read values
- check if argument is constant
- extract arguments values
Probably we should move these common methods to GenericUDF
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9809) Fix FindBugs found bugs in hive-exec
[ https://issues.apache.org/jira/browse/HIVE-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343888#comment-14343888 ] Jason Dere commented on HIVE-9809: -- Do you have a list of the errors reported by FindBugs? Fix FindBugs found bugs in hive-exec Key: HIVE-9809 URL: https://issues.apache.org/jira/browse/HIVE-9809 Project: Hive Issue Type: Bug Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9809.1.patch, HIVE-9809.2.patch FindBugs finds several bugs in hive-exec project -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9775) LLAP: Add a MiniLLAPCluster for tests
[ https://issues.apache.org/jira/browse/HIVE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-9775: - Attachment: HIVE-9775.1.patch Patch to add a MiniLLAPCluster. This isn't wired into the tests and shims just yet - that needs some more work with circular dependencies and such. Will figure that out in a separate jira. Applies on top of HIVE-9808. LLAP: Add a MiniLLAPCluster for tests - Key: HIVE-9775 URL: https://issues.apache.org/jira/browse/HIVE-9775 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-9775.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9779) ATSHook does not log the end user if doAs=false (it logs the hs2 server user)
[ https://issues.apache.org/jira/browse/HIVE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-9779: -- Attachment: (was: 9979.002.patch) ATSHook does not log the end user if doAs=false (it logs the hs2 server user) - Key: HIVE-9779 URL: https://issues.apache.org/jira/browse/HIVE-9779 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0 Reporter: Abdelrahman Shettia Assignee: Abdelrahman Shettia Attachments: 9979.001.patch, HIVE-9779-testing.xlsx When doAs=false, ATSHook should log the end username in ATS instead of logging the hiveserver2 user's name. The way things are, it is not possible for an admin to identify which query is being run by which user. The end user information is already available in the HookContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9182) avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl
[ https://issues.apache.org/jira/browse/HIVE-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-9182: -- Attachment: HIVE-9182.1.patch avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl - Key: HIVE-9182 URL: https://issues.apache.org/jira/browse/HIVE-9182 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Abdelrahman Shettia Fix For: 1.2.0 Attachments: HIVE-9182.1.patch File systems such as s3, wasb (azure) don't implement Hadoop FileSystem acl functionality. Hadoop23Shims has code that calls getAclStatus on file systems. Instead of calling getAclStatus and catching the exception, we can also check FsPermission#getAclBit. Additionally, instead of catching all exceptions for calls to getAclStatus and ignoring them, it is better to just catch UnsupportedOperationException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
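The two suggestions in the HIVE-9182 description (check the ACL bit before making the RPC, and catch only UnsupportedOperationException) can be sketched roughly as below. This is a self-contained illustration and not the Hadoop23Shims code: FileSystemLike, aclBit, NoAclStore, and readAcls are hypothetical stand-ins for the Hadoop FileSystem, FsPermission#getAclBit, an ACL-less store, and the shim method.

```java
import java.util.Collections;
import java.util.List;

public class AclLookupSketch {
    // Hypothetical stand-in for a file system client; NOT the Hadoop API.
    public interface FileSystemLike {
        boolean aclBit(String path);             // stand-in for FsPermission#getAclBit
        List<String> getAclStatus(String path);  // stand-in for the getAclStatus RPC
    }

    // Returns ACL entries, skipping the RPC entirely when the ACL bit is not set.
    public static List<String> readAcls(FileSystemLike fs, String path) {
        if (!fs.aclBit(path)) {
            return Collections.emptyList();
        }
        try {
            return fs.getAclStatus(path);
        } catch (UnsupportedOperationException e) {
            // Only the "ACLs not supported" case is ignored; other failures propagate.
            return Collections.emptyList();
        }
    }

    // Hypothetical store without ACL support; counts how often the RPC is attempted.
    public static final class NoAclStore implements FileSystemLike {
        public int rpcCalls = 0;

        @Override
        public boolean aclBit(String path) {
            return false;
        }

        @Override
        public List<String> getAclStatus(String path) {
            rpcCalls++;
            throw new UnsupportedOperationException("ACLs not supported");
        }
    }
}
```

With a store like NoAclStore, readAcls returns an empty list without ever attempting the RPC, which is the saving the issue title asks for.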
[jira] [Commented] (HIVE-9182) avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl
[ https://issues.apache.org/jira/browse/HIVE-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344189#comment-14344189 ] Abdelrahman Shettia commented on HIVE-9182: --- Hi [~thejas], I have uploaded the patch file called HIVE-9182.1.patch and used the recommended FsPermission#getAclBit. Thanks -Rahman avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl - Key: HIVE-9182 URL: https://issues.apache.org/jira/browse/HIVE-9182 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Abdelrahman Shettia Fix For: 1.2.0 Attachments: HIVE-9182.1.patch File systems such as s3, wasb (azure) don't implement Hadoop FileSystem acl functionality. Hadoop23Shims has code that calls getAclStatus on file systems. Instead of calling getAclStatus and catching the exception, we can also check FsPermission#getAclBit. Additionally, instead of catching all exceptions for calls to getAclStatus and ignoring them, it is better to just catch UnsupportedOperationException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4940) udaf_percentile_approx.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344219#comment-14344219 ] Wei Zheng commented on HIVE-4940: - HIVE-9833 has been opened to report the same issue. It seems the problem is still there. udaf_percentile_approx.q is not deterministic - Key: HIVE-4940 URL: https://issues.apache.org/jira/browse/HIVE-4940 Project: Hive Issue Type: Sub-task Components: Tests Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4940.D12189.1.patch The test produces different results for 20(S) and 23. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9779) ATSHook does not log the end user if doAs=false (it logs the hs2 server user)
[ https://issues.apache.org/jira/browse/HIVE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-9779: -- Attachment: HIVE-9779-testing.xlsx ATSHook does not log the end user if doAs=false (it logs the hs2 server user) - Key: HIVE-9779 URL: https://issues.apache.org/jira/browse/HIVE-9779 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0 Reporter: Abdelrahman Shettia Assignee: Abdelrahman Shettia Attachments: 9979.001.patch, 9979.002.patch, HIVE-9779-testing.xlsx When doAs=false, ATSHook should log the end username in ATS instead of logging the hiveserver2 user's name. The way things are, it is not possible for an admin to identify which query is being run by which user. The end user information is already available in the HookContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9480) Build UDF TRUNC to implement FIRST_DAY as compared with LAST_DAY
[ https://issues.apache.org/jira/browse/HIVE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9480: - Issue Type: Improvement (was: Bug) Build UDF TRUNC to implement FIRST_DAY as compared with LAST_DAY Key: HIVE-9480 URL: https://issues.apache.org/jira/browse/HIVE-9480 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 1.2.0 Attachments: HIVE-9480.1.patch, HIVE-9480.3.patch, HIVE-9480.4.patch, HIVE-9480.5.patch, HIVE-9480.6.patch, HIVE-9480.7.patch, HIVE-9480.8.patch, HIVE-9480.9.patch Hive already supports the LAST_DAY UDF; in some cases, FIRST_DAY is necessary for date/timestamp-related computation. This JIRA is to track such an implementation. We chose to implement TRUNC, a more standard way to get the first day of a month, e.g., SELECT TRUNC('2009-12-12', 'MM'); will return 2009-12-01, SELECT TRUNC('2009-12-12', 'YEAR'); will return 2009-01-01. Note that this TRUNC is not as feature-complete as the Oracle one: only 'MM' and 'YEAR' are supported as formats. However, it is a base on which other formats can be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9480) Build UDF TRUNC to implement FIRST_DAY as compared with LAST_DAY
[ https://issues.apache.org/jira/browse/HIVE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9480: - Affects Version/s: (was: 0.14.0) Build UDF TRUNC to implement FIRST_DAY as compared with LAST_DAY Key: HIVE-9480 URL: https://issues.apache.org/jira/browse/HIVE-9480 Project: Hive Issue Type: Bug Components: UDF Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 1.2.0 Attachments: HIVE-9480.1.patch, HIVE-9480.3.patch, HIVE-9480.4.patch, HIVE-9480.5.patch, HIVE-9480.6.patch, HIVE-9480.7.patch, HIVE-9480.8.patch, HIVE-9480.9.patch Hive already supports the LAST_DAY UDF; in some cases, FIRST_DAY is necessary for date/timestamp-related computation. This JIRA is to track such an implementation. We chose to implement TRUNC, a more standard way to get the first day of a month, e.g., SELECT TRUNC('2009-12-12', 'MM'); will return 2009-12-01, SELECT TRUNC('2009-12-12', 'YEAR'); will return 2009-01-01. Note that this TRUNC is not as feature-complete as the Oracle one: only 'MM' and 'YEAR' are supported as formats. However, it is a base on which other formats can be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
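The TRUNC behaviour described for HIVE-9480 can be illustrated with a small standalone sketch. This is not the UDF from the attached patches; the class and method names are hypothetical, and throwing on an unsupported format is a choice made for this sketch only, since the description does not say what the UDF does in that case.

```java
import java.time.LocalDate;

public class TruncSketch {
    // Truncates an ISO date string (yyyy-MM-dd) to the first day of its month or year.
    public static String trunc(String date, String format) {
        LocalDate d = LocalDate.parse(date);
        switch (format) {
            case "MM":
                return d.withDayOfMonth(1).toString();
            case "YEAR":
                return d.withDayOfYear(1).toString();
            default:
                // Sketch-only behaviour (assumption); the real UDF may handle this differently.
                throw new IllegalArgumentException("Unsupported format: " + format);
        }
    }

    public static void main(String[] args) {
        System.out.println(trunc("2009-12-12", "MM"));   // prints 2009-12-01
        System.out.println(trunc("2009-12-12", "YEAR")); // prints 2009-01-01
    }
}
```

The two printed values match the examples given in the issue description.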
[jira] [Updated] (HIVE-9830) Map join could dump a small table multiple times [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9830: -- Attachment: HIVE-9830.2-spark.patch Map join could dump a small table multiple times [Spark Branch] --- Key: HIVE-9830 URL: https://issues.apache.org/jira/browse/HIVE-9830 Project: Hive Issue Type: Bug Components: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-9830.1-spark.patch, HIVE-9830.2-spark.patch We found that auto_sortmerge_join_8 is flaky for Spark. Sometimes, the output could be wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9779) ATSHook does not log the end user if doAs=false (it logs the hs2 server user)
[ https://issues.apache.org/jira/browse/HIVE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344027#comment-14344027 ] Abdelrahman Shettia commented on HIVE-9779: --- I have uploaded an Excel sheet called HIVE-9779-testing. It has all the details of the test cases. Thanks -Rahman ATSHook does not log the end user if doAs=false (it logs the hs2 server user) - Key: HIVE-9779 URL: https://issues.apache.org/jira/browse/HIVE-9779 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0 Reporter: Abdelrahman Shettia Assignee: Abdelrahman Shettia Attachments: 9979.001.patch, 9979.002.patch, HIVE-9779-testing.xlsx When doAs=false, ATSHook should log the end username in ATS instead of logging the hiveserver2 user's name. The way things are, it is not possible for an admin to identify which query is being run by which user. The end user information is already available in the HookContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344079#comment-14344079 ] Hive QA commented on HIVE-9277: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12701966/HIVE-9277.06.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7579 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_hybridhashjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2920/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2920/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2920/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12701966 - PreCommit-HIVE-TRUNK-Build Hybrid Hybrid Grace Hash Join - Key: HIVE-9277 URL: https://issues.apache.org/jira/browse/HIVE-9277 Project: Hive Issue Type: New Feature Components: Physical Optimizer Reporter: Wei Zheng Assignee: Wei Zheng Labels: join Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, HIVE-9277.06.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace hash join”_. 
We can benefit from this feature as illustrated below:
* The query will not fail even if the estimated memory requirement is slightly wrong
* Expensive garbage collection overhead can be avoided when hash table grows
* Join execution using a Map join operator even though the small table doesn't fit in memory as spilling some data from the build and probe sides will still be cheaper than having to shuffle the large fact table
The design was based on Hadoop’s parallel processing capability and significant amount of memory available.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9834) VectorGroupByOperator logs too much
[ https://issues.apache.org/jira/browse/HIVE-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9834: --- Attachment: HIVE-9834.patch [~ashutoshc] you appear to have added this. Can you take a look? It logs every row on debug level, causing even q tests to be slow VectorGroupByOperator logs too much --- Key: HIVE-9834 URL: https://issues.apache.org/jira/browse/HIVE-9834 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-9834.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344456#comment-14344456 ] Xin Hao commented on HIVE-9659: --- Hi Rui, I tried to verify this issue with HIVE-9659.1-spark.patch, and it seems that the issue still exists. Could you update Big-Bench to the latest version and double-check (Q12 was updated recently)? Thanks. 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Assignee: Rui Li Attachments: HIVE-9659.1-spark.patch We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log and also a NullPointerException near the end of the log. 
(a) Detail error message for 'Error while trying to create table container':
{noformat}
15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
    at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
    at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
    ... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
    ... 22 more
15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
15/02/12 01:29:49 INFO
[jira] [Commented] (HIVE-9830) Map join could dump a small table multiple times [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344161#comment-14344161 ] Hive QA commented on HIVE-9830: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12702002/HIVE-9830.1-spark.patch {color:green}SUCCESS:{color} +1 7567 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/753/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/753/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-753/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12702002 - PreCommit-HIVE-SPARK-Build Map join could dump a small table multiple times [Spark Branch] --- Key: HIVE-9830 URL: https://issues.apache.org/jira/browse/HIVE-9830 Project: Hive Issue Type: Bug Components: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-9830.1-spark.patch, HIVE-9830.2-spark.patch We found that auto_sortmerge_join_8 is flaky for Spark. Sometimes, the output could be wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344229#comment-14344229 ] Wei Zheng commented on HIVE-9277: - Right now I'm using HIVECONVERTJOINNOCONDITIONALTASK as a threshold to do estimation. Once the memory management part is ready, I can rely on that to provide me an exact number. Hybrid Hybrid Grace Hash Join - Key: HIVE-9277 URL: https://issues.apache.org/jira/browse/HIVE-9277 Project: Hive Issue Type: New Feature Components: Physical Optimizer Reporter: Wei Zheng Assignee: Wei Zheng Labels: join Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, HIVE-9277.06.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace hash join”_. We can benefit from this feature as illustrated below:
* The query will not fail even if the estimated memory requirement is slightly wrong
* Expensive garbage collection overhead can be avoided when hash table grows
* Join execution using a Map join operator even though the small table doesn't fit in memory as spilling some data from the build and probe sides will still be cheaper than having to shuffle the large fact table
The design was based on Hadoop’s parallel processing capability and significant amount of memory available.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9674) *DropPartitionEvent should handle partition-sets.
[ https://issues.apache.org/jira/browse/HIVE-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9674: --- Attachment: HIVE-9736.3.patch [~cdrome] has let me know (thank you!) that I'd neglected to change {{TestMetaStoreEventListener}} for this change. Here's the emended patch. *DropPartitionEvent should handle partition-sets. - Key: HIVE-9674 URL: https://issues.apache.org/jira/browse/HIVE-9674 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9674.2.patch, HIVE-9736.3.patch Dropping a set of N partitions from a table currently results in N DropPartitionEvents (and N PreDropPartitionEvents) being fired serially. This is wasteful, especially so for large N. It also makes it impossible to even try to run authorization-checks on all partitions in a batch. Taking the cue from HIVE-9609, we should compose an {{IterablePartition}} in the event, and expose them via an {{Iterator}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
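The idea in HIVE-9674 of firing one event for a whole set of dropped partitions, and exposing them through an Iterator, might look roughly like the sketch below. The types are hypothetical stand-ins and not the metastore classes: partitions are represented as plain strings, and DropPartitionsEvent and CountingListener exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchedDropEventSketch {
    // Hypothetical event that carries every partition dropped in one operation.
    public static final class DropPartitionsEvent {
        private final List<String> partitionNames;

        public DropPartitionsEvent(List<String> partitionNames) {
            this.partitionNames = new ArrayList<>(partitionNames); // defensive copy
        }

        public Iterator<String> getPartitionIterator() {
            return partitionNames.iterator();
        }
    }

    // Hypothetical listener that records how many events and partitions it has seen.
    public static final class CountingListener {
        public int eventsSeen = 0;
        public int partitionsSeen = 0;

        public void onDropPartitions(DropPartitionsEvent event) {
            eventsSeen++;
            Iterator<String> it = event.getPartitionIterator();
            while (it.hasNext()) {
                it.next();
                partitionsSeen++; // a batch-wide check could run over the same iterator
            }
        }
    }
}
```

Dropping N partitions then triggers a single listener call that can still visit all N partitions, instead of N serial calls.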
[jira] [Commented] (HIVE-9182) avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl
[ https://issues.apache.org/jira/browse/HIVE-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344206#comment-14344206 ] Chris Nauroth commented on HIVE-9182: - Hi [~ashettia] and [~thejas]. Do you think {{setFullFileStatus}} needs to be changed too? avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl - Key: HIVE-9182 URL: https://issues.apache.org/jira/browse/HIVE-9182 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Abdelrahman Shettia Fix For: 1.2.0 Attachments: HIVE-9182.1.patch File systems such as s3, wasb (azure) don't implement Hadoop FileSystem acl functionality. Hadoop23Shims has code that calls getAclStatus on file systems. Instead of calling getAclStatus and catching the exception, we can also check FsPermission#getAclBit. Additionally, instead of catching all exceptions for calls to getAclStatus and ignoring them, it is better to just catch UnsupportedOperationException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9837) LLAP: Decision to use llap or uber is being lost in some reducers
[ https://issues.apache.org/jira/browse/HIVE-9837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-9837. -- Resolution: Fixed Fix Version/s: llap Committed to branch. LLAP: Decision to use llap or uber is being lost in some reducers - Key: HIVE-9837 URL: https://issues.apache.org/jira/browse/HIVE-9837 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: llap Attachments: HIVE-9837.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9779) ATSHook does not log the end user if doAs=false (it logs the hs2 server user)
[ https://issues.apache.org/jira/browse/HIVE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-9779: -- Attachment: (was: 9979.001.patch) ATSHook does not log the end user if doAs=false (it logs the hs2 server user) - Key: HIVE-9779 URL: https://issues.apache.org/jira/browse/HIVE-9779 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0 Reporter: Abdelrahman Shettia Assignee: Abdelrahman Shettia Attachments: HIVE-9779-testing.xlsx When doAs=false, ATSHook should log the end username in ATS instead of logging the hiveserver2 user's name. The way things are, it is not possible for an admin to identify which query is being run by which user. The end user information is already available in the HookContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9834) VectorGroupByOperator logs too much
[ https://issues.apache.org/jira/browse/HIVE-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9834: --- Priority: Trivial (was: Major) VectorGroupByOperator logs too much --- Key: HIVE-9834 URL: https://issues.apache.org/jira/browse/HIVE-9834 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Attachments: HIVE-9834.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9182) avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl
[ https://issues.apache.org/jira/browse/HIVE-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344635#comment-14344635 ] Hive QA commented on HIVE-9182: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12702036/HIVE-9182.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7587 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2925/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2925/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2925/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12702036 - PreCommit-HIVE-TRUNK-Build avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl - Key: HIVE-9182 URL: https://issues.apache.org/jira/browse/HIVE-9182 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Abdelrahman Shettia Fix For: 1.2.0 Attachments: HIVE-9182.1.patch File systems such as s3, wasb (azure) don't implement Hadoop FileSystem acl functionality. Hadoop23Shims has code that calls getAclStatus on file systems. Instead of calling getAclStatus and catching the exception, we can also check FsPermission#getAclBit. 
Additionally, instead of catching all exceptions for calls to getAclStatus and ignoring them, it is better to just catch UnsupportedOperationException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9711) ORC Vectorization DoubleColumnVector.isRepeating=false if all entries are NaN
[ https://issues.apache.org/jira/browse/HIVE-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-9711: - Assignee: Gopal V ORC Vectorization DoubleColumnVector.isRepeating=false if all entries are NaN - Key: HIVE-9711 URL: https://issues.apache.org/jira/browse/HIVE-9711 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gopal V The check that sets isRepeating=true uses Java == equality. Because NaN != NaN, a vector whose entries are all NaN is never marked as repeating. The noNulls case needs the current check folded into the previous loop, while the hasNulls case needs a logical AND of the isNull[] field instead of == comparisons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
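A minimal sketch of the pitfall described in HIVE-9711, written in Python rather than Hive's Java. Python floats follow the same IEEE 754 rule that NaN does not compare equal to itself. The function names are hypothetical and are not taken from the ORC reader; both functions assume a non-empty list of floats with no nulls.

```python
import math


def is_repeating_naive(values):
    """Equality-based check: never true for all-NaN input, since NaN != NaN."""
    first = values[0]
    return all(v == first for v in values[1:])


def is_repeating_nan_aware(values):
    """Treat two NaNs as the same value when deciding whether a vector repeats."""
    first = values[0]
    if math.isnan(first):
        return all(math.isnan(v) for v in values[1:])
    return all(v == first for v in values[1:])
```

For a vector of three NaN values the naive check reports "not repeating" while the NaN-aware check reports "repeating", which is the behavior difference the issue title refers to.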
[jira] [Updated] (HIVE-9825) CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO branch]
[ https://issues.apache.org/jira/browse/HIVE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-9825: -- Summary: CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO branch] (was: CBO (Calcite Return Path): Translate PTFs to Hive Op [CBO branch]) CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO branch] --- Key: HIVE-9825 URL: https://issues.apache.org/jira/browse/HIVE-9825 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: cbo-branch Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: cbo-branch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9827) LLAP: Make stripe level column readers thread safe
[ https://issues.apache.org/jira/browse/HIVE-9827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343524#comment-14343524 ] Prasanth Jayachandran commented on HIVE-9827: - Committed patch to llap branch. LLAP: Make stripe level column readers thread safe -- Key: HIVE-9827 URL: https://issues.apache.org/jira/browse/HIVE-9827 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: llap Attachments: HIVE-9827-llap.patch previousStripeIndex used in OrcColumnVectorProducer is not thread safe because OrcColumnVectorProducer is a singleton. Move it to OrcEncodedDataConsumer, which is a per-query object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository
[ https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9664: Attachment: HIVE-9664.patch Hive add jar command should be able to download and add jars from a repository Key: HIVE-9664 URL: https://issues.apache.org/jira/browse/HIVE-9664 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Anant Nag Labels: hive, patch Attachments: HIVE-9664.patch Currently Hive's add jar command takes a local path to the dependency jar. This clutters the local file system, as users may forget to remove the jar later. It would be nice if Hive supported a Gradle-like notation to download the jar from a repository. Example: add jar org:module:version It should also be backward compatible and still accept a jar from the local file system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository
[ https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9664: Attachment: (was: HIVE-9664.patch) Hive add jar command should be able to download and add jars from a repository Key: HIVE-9664 URL: https://issues.apache.org/jira/browse/HIVE-9664 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Anant Nag Labels: hive, patch Attachments: HIVE-9664.patch Currently Hive's add jar command takes a local path to the dependency jar. This clutters the local file system, as users may forget to remove the jar later. It would be nice if Hive supported a Gradle-like notation to download the jar from a repository. Example: add jar org:module:version It should also be backward compatible and still accept a jar from the local file system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9677) Implement privileges call in HBaseStore
[ https://issues.apache.org/jira/browse/HIVE-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-9677: - Attachment: HIVE-9677.patch This patch is more complicated than many of the previous ones. Due to the hierarchical nature of roles and the fact that users can belong to multiple roles, it was not possible to do all operations by a direct key lookup as it is with fetching tables, partitions, etc. Obviously this makes things more complicated for HBase. To resolve this I stored the information in two different ways: 1) In the ROLES table, each role stores all users and roles that have been directly included in it (that is, granted that role). 2) I added a new table USER_TO_ROLE that, for each user, lists all roles the user is in either directly or indirectly. The USER_TO_ROLE table is built to be very efficient for DML/select queries where we need to quickly know what roles the user participates in. However, it is expensive to build, as each row requires a multi-pass walk of the ROLES table. This is alleviated somewhat by reading the entire ROLES table into memory before rebuilding the table. This does mean that adding a user to a role or dropping him is somewhat expensive, as the row for that user in the USER_TO_ROLE table has to be rebuilt. Adding a role to another role, dropping a role from another role, or dropping a role altogether is very expensive because multiple rows in the USER_TO_ROLE table have to be rebuilt. Given that grant/revoke statements are very rare compared to DML/select queries and rarely performance sensitive, it makes sense to let grants and revokes take a few more seconds in order to shave milliseconds off each DML or select operation.
Implement privileges call in HBaseStore --- Key: HIVE-9677 URL: https://issues.apache.org/jira/browse/HIVE-9677 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-9677.patch All of the list*Grants methods, grantPrivileges, and revokePrivileges need to be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
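The two-table layout described in the HIVE-9677 comment can be sketched as follows. This is an illustrative Python model, not the HBaseStore code: the dictionary layout, roles_for_user, and rebuild_user_to_role are assumptions made for the example. Each ROLES entry holds the users and roles granted that role directly, and the USER_TO_ROLE rows are derived by walking ROLES repeatedly until no new role is found.

```python
def roles_for_user(user, roles_table):
    """Return every role the user holds, directly or through nested roles.

    roles_table maps a role name to {"users": set, "roles": set}, where both
    sets list the principals that were granted that role directly.
    """
    held = {name for name, members in roles_table.items()
            if user in members["users"]}
    changed = True
    while changed:  # multi-pass walk: repeat until a pass adds nothing
        changed = False
        for name, members in roles_table.items():
            if name not in held and members["roles"] & held:
                held.add(name)
                changed = True
    return held


def rebuild_user_to_role(roles_table):
    """Recompute the derived table for every user mentioned in roles_table."""
    users = set().union(*(m["users"] for m in roles_table.values()))
    return {user: roles_for_user(user, roles_table) for user in users}
```

In this model a query-time lookup is a single key read on the derived table, while changing role membership pays the cost of the walk, which mirrors the trade-off argued for in the comment.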
[jira] [Updated] (HIVE-9083) New metastore API to support to purge partition-data directly in dropPartitions().
[ https://issues.apache.org/jira/browse/HIVE-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9083: --- Affects Version/s: (was: 0.15.0) 1.1.0 1.0.0 New metastore API to support to purge partition-data directly in dropPartitions(). -- Key: HIVE-9083 URL: https://issues.apache.org/jira/browse/HIVE-9083 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0, 1.1.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 1.2.0 Attachments: HIVE-9083.3.patch, HIVE-9083.4.patch, HIVE-9083.5.patch HIVE-7100 adds the option to purge table-data when dropping a table (from Hive CLI.) This patch adds HiveMetaStoreClient APIs to support the same for {{dropPartitions()}}. (I'll add a follow-up to support a command-line option for the same.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9827) LLAP: Make stripe level column readers thread safe
[ https://issues.apache.org/jira/browse/HIVE-9827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-9827. - Resolution: Fixed LLAP: Make stripe level column readers thread safe -- Key: HIVE-9827 URL: https://issues.apache.org/jira/browse/HIVE-9827 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: llap Attachments: HIVE-9827-llap.patch previousStripeIndex used in OrcColumnVectorProducer is not thread safe because OrcColumnVectorProducer is a singleton. Move it to OrcEncodedDataConsumer, which is a per-query object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343682#comment-14343682 ] Alexander Pivovarov commented on HIVE-9518: --- I did it on Feb. 27, 2015, 7:38 p.m. - several minor issues. Check RB Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Bug Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch This is used to track work to build an Oracle-like months_between. Here are the semantics: MONTHS_BETWEEN returns the number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in the time components of date1 and date2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
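The semantics quoted in HIVE-9518 can be worked through with a short sketch. This is a Python illustration of the rules as stated in the description, not the Hive UDF from the attached patches; the function names are hypothetical, and rounding of the result is left out.

```python
import calendar
from datetime import datetime


def _is_last_day_of_month(d):
    """True when d falls on the final day of its month."""
    return d.day == calendar.monthrange(d.year, d.month)[1]


def _seconds_into_day(d):
    return d.hour * 3600 + d.minute * 60 + d.second


def months_between(date1, date2):
    """Months from date2 to date1, following the rules quoted above."""
    whole = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    same_day = date1.day == date2.day
    both_last = _is_last_day_of_month(date1) and _is_last_day_of_month(date2)
    if same_day or both_last:
        return float(whole)  # always an integer number of months
    # Fractional part: day and time-of-day difference over a 31-day month.
    time_diff_days = (_seconds_into_day(date1) - _seconds_into_day(date2)) / 86400.0
    return whole + ((date1.day - date2.day) + time_diff_days) / 31.0
```

For example, date1 = 1995-02-02 and date2 = 1995-01-01 give one whole month plus one day out of 31, while 2015-02-28 against 2015-01-31 gives exactly 1.0 because both are the last days of their months.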
[jira] [Commented] (HIVE-9788) Make double quote optional in tsv/csv/dsv output
[ https://issues.apache.org/jira/browse/HIVE-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343683#comment-14343683 ] Naveen Gangam commented on HIVE-9788: - [~Ferd] Thank you for re-working the fix. What if the value contains a separator char? Based on my limited testing, if the data returned by the result set contains the separator char, the formatted string returned by the CsvListWriter accounted for those characters. Looking at the code, it appears to add the quote string around each _quote char_ found in the column value, and then wraps the entire column in a pair of _quotes_. So to me, it appears that it should work with all sorts of characters within the data. Could you please elaborate on the use case where this fails? Also, if we do want to retain backward compatibility, would making disableQuotingForSV a system property instead of a command-line option be better suited? It is easier to stop supporting (ignore) a system property than a command-line switch, if and when we stop supporting this option (or if a new version of super-csv becomes available that does not support this). Just my opinion on this. Thanks again Make double quote optional in tsv/csv/dsv output Key: HIVE-9788 URL: https://issues.apache.org/jira/browse/HIVE-9788 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-9788.1.patch, HIVE-9788.patch Similar to HIVE-7390 some customers would like the double quotes to be optional. So if the data is {{A}} then the output from beeline should be {{A}} which is the same as the Hive CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
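The quoting behavior discussed in the HIVE-9788 comment (embedded quote characters doubled, the field then wrapped in quotes) and the risk of switching quoting off can be illustrated with Python's standard csv module. This is a sketch of the general CSV convention only; it does not use super-csv's CsvListWriter or beeline, and format_row and format_row_unquoted are hypothetical helper names.

```python
import csv
import io


def format_row(values, separator=","):
    """Join values, quoting only the fields that need it (minimal quoting)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=separator, quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(values)
    return buf.getvalue()[:-1]  # drop the trailing line terminator


def format_row_unquoted(values, separator=","):
    """Join values with no quoting at all, as a quoting-disabled mode would."""
    return separator.join(values)
```

With minimal quoting, a plain value such as A is emitted unquoted, and a value containing the separator still round-trips through a CSV reader. With quoting disabled, a value such as "b,c" is indistinguishable from two separate fields, which is the case the question about separator characters is probing.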