[jira] [Commented] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
[ https://issues.apache.org/jira/browse/HIVE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723468#comment-13723468 ] Hive QA commented on HIVE-4951: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594802/HIVE-4951.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 2736 tests executed *Failed tests:* {noformat} org.apache.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/229/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/229/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels) - Key: HIVE-4951 URL: https://issues.apache.org/jira/browse/HIVE-4951 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4951.1.patch combine2.q was updated in HIVE-3253, but the corresponding change is missing in combine2_win.q, causing it to fail on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4638) Thread local PerfLog can get shared by multiple hiveserver2 sessions
[ https://issues.apache.org/jira/browse/HIVE-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723510#comment-13723510 ] Prasad Mujumdar commented on HIVE-4638: --- [~ashutoshc] my apologies for missing the comment earlier. We found the issue in one of our internal integration tests with Cloudera Manager. The exec hook retrieves the query start time via hookContext.getQueryPlan().getQueryStartTime(), which sometimes returned bogus timestamp values. The problem didn't reproduce with the patch. Thread local PerfLog can get shared by multiple hiveserver2 sessions Key: HIVE-4638 URL: https://issues.apache.org/jira/browse/HIVE-4638 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4638-1.patch The PerfLog is accessed as a thread local, which can be shared by multiple hiveserver2 sessions, overwriting query runtime information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
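The failure mode described in this report can be sketched with a small self-contained example (class and value names below are illustrative, not Hive's actual PerfLog API): a static ThreadLocal survives across tasks when an executor reuses pooled threads, so a second "session" scheduled on the same thread observes the first session's state unless the thread-local is explicitly cleared.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalReuseDemo {
    // Stand-in for a per-thread PerfLog: one instance per thread, not per session.
    static final ThreadLocal<StringBuilder> PERF_LOG =
            ThreadLocal.withInitial(StringBuilder::new);

    // Runs two "sessions" on a single pooled thread and returns what the
    // second session observes in the thread-local.
    static String leakedValue() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            pool.submit(() -> PERF_LOG.get().append("session-1 timings")).get();
            // Same pooled thread, second task: without PERF_LOG.remove(),
            // it sees session 1's data instead of a fresh log.
            return pool.submit(() -> PERF_LOG.get().toString()).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(leakedValue()); // prints "session-1 timings"
    }
}
```

This is exactly the HiveServer2 situation: worker threads are pooled across sessions, so thread-local query runtime information leaks between them.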
[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723511#comment-13723511 ] Hive QA commented on HIVE-4870: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594826/HIVE-4870.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 2736 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/230/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/230/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for the Fetch Task (FetchWork). The Map Reduce Task (MapredWork) already does this. The patch adds partition description info to the Fetch Task. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4954) PTFTranslator hardcodes ranking functions
[ https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723513#comment-13723513 ] Hive QA commented on HIVE-4954: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594832/HIVE-4954.1.patch.txt Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/231/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/231/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-231/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'ql/src/test/results/clientpositive/join33.q.out' Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out' Reverted 'ql/src/test/results/clientpositive/bucketcontext_7.q.out' Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out' Reverted 'ql/src/test/results/clientpositive/bucketcontext_2.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin7.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin11.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin2.q.out' Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out' Reverted 'ql/src/test/results/clientpositive/bucketcontext_4.q.out' Reverted 'ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin9.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin13.q.out' Reverted 'ql/src/test/results/clientpositive/union22.q.out' Reverted 'ql/src/test/results/clientpositive/join32.q.out' Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out' Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out' Reverted 'ql/src/test/results/clientpositive/bucketcontext_1.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin10.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin1.q.out' Reverted 'ql/src/test/results/clientpositive/bucketcontext_8.q.out' Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out' Reverted 'ql/src/test/results/clientpositive/bucketcontext_3.q.out' Reverted 'ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin8.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin12.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin3.q.out' Reverted 'ql/src/test/results/clientpositive/join32_lessSize.q.out' 
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out' Reverted 'ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out' Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out' Reverted 'ql/src/test/results/clientpositive/stats11.q.out' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen ql/src/test/results/clientpositive/bucketmapjoin2.q.out.orig ql/src/test/results/clientpositive/join32_lessSize.q.out.orig ql/src/test/results/clientpositive/bucketmapjoin1.q.out.orig ql/src/test/results/clientpositive/bucketmapjoin7.q.out.orig ql/src/test/results/clientpositive/union22.q.out.orig ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out.orig ql/src/test/results/clientpositive/bucketmapjoin3.q.out.orig + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1508328. At revision 1508328. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' {noformat}
[jira] [Commented] (HIVE-4879) Window functions that imply order can only be registered at compile time
[ https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723515#comment-13723515 ] Hive QA commented on HIVE-4879: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594848/HIVE-4879.3.patch.txt Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/232/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/232/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-232/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1508328. At revision 1508328. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' {noformat} This message is automatically generated. 
Window functions that imply order can only be registered at compile time Key: HIVE-4879 URL: https://issues.apache.org/jira/browse/HIVE-4879 Project: Hive Issue Type: Improvement Affects Versions: 0.11.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.12.0 Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, HIVE-4879.3.patch.txt Adding an annotation for impliesOrder -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
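The impliesOrder idea described above can be sketched as a runtime-retained annotation that a function registry could discover reflectively, instead of hardcoding function names at compile time. The annotation and class names below are illustrative assumptions, not necessarily Hive's actual API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation marking window functions that imply an ordering.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface WindowFunctionDescription {
    boolean impliesOrder() default false;
}

// Hypothetical ranking function carrying the property on its class.
@WindowFunctionDescription(impliesOrder = true)
class RankFunction {}

public class ImpliesOrderDemo {
    // A registry can read the property at registration time, so new
    // order-implying functions can be added without translator changes.
    static boolean impliesOrder(Class<?> fn) {
        WindowFunctionDescription d = fn.getAnnotation(WindowFunctionDescription.class);
        return d != null && d.impliesOrder();
    }

    public static void main(String[] args) {
        System.out.println(impliesOrder(RankFunction.class)); // prints "true"
    }
}
```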
[jira] [Commented] (HIVE-4638) Thread local PerfLog can get shared by multiple hiveserver2 sessions
[ https://issues.apache.org/jira/browse/HIVE-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723516#comment-13723516 ] Ashutosh Chauhan commented on HIVE-4638: I see. Can you rebase the patch? Let's get it in. Thread local PerfLog can get shared by multiple hiveserver2 sessions Key: HIVE-4638 URL: https://issues.apache.org/jira/browse/HIVE-4638 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4638-1.patch The PerfLog is accessed as a thread local, which can be shared by multiple hiveserver2 sessions, overwriting query runtime information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4638) Thread local PerfLog can get shared by multiple hiveserver2 sessions
[ https://issues.apache.org/jira/browse/HIVE-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4638: -- Attachment: HIVE-4638-2.patch Rebased patch. Thread local PerfLog can get shared by multiple hiveserver2 sessions Key: HIVE-4638 URL: https://issues.apache.org/jira/browse/HIVE-4638 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4638-1.patch, HIVE-4638-2.patch The PerfLog is accessed as a thread local, which can be shared by multiple hiveserver2 sessions, overwriting query runtime information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
[ https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723526#comment-13723526 ] Thejas M Nair commented on HIVE-4574: - Regarding the bug report, that has gone into the black hole of the Oracle bug reporting system. I haven't heard back from the review process. I wish it was really more *open*! Maybe switching to a different serialization format, as suggested in HIVE-1511, is the best bet. XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck -- Key: HIVE-4574 URL: https://issues.apache.org/jira/browse/HIVE-4574 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4574.1.patch In openjdk7, an XMLEncoder.writeObject call leads to calls to java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe because it uses a static WeakHashMap that can get used from multiple threads. See - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46 Concurrent access to HashMap implementations that are not thread safe can sometimes result in infinite loops and other problems. If jdk7 is in use, it makes sense to synchronize calls to XMLEncoder.writeObject. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
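The workaround suggested in the description, serializing all calls to XMLEncoder.writeObject behind one lock, might look like the following sketch. The class name, helper method, and the choice of a single static lock are assumptions for illustration, not Hive's actual code.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;

public class SafeXmlSerializer {
    // One shared lock guarding every writeObject call (assumption: the
    // serialization throughput needed here tolerates a global lock).
    private static final Object ENCODER_LOCK = new Object();

    public static byte[] toXml(Object obj) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        synchronized (ENCODER_LOCK) {
            // writeObject reaches the JDK-internal MethodFinder, whose static
            // WeakHashMap cache is not thread safe on openjdk7; holding the
            // lock means at most one thread touches that cache at a time.
            try (XMLEncoder enc = new XMLEncoder(out)) {
                enc.writeObject(obj);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xml = new String(toXml("hello"));
        System.out.println(xml.contains("hello"));
    }
}
```

The global lock trades some concurrency for safety, which matches the issue's framing: plan serialization is rare enough that contention is acceptable compared to a wedged HiveServer2.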
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
The mentioned flow is called when you have the unsecure mode of the thrift metastore client-server connection, so one way to avoid this is to use the secure mode.

{code}
public boolean process(final TProtocol in, final TProtocol out) throws TException {
  setIpAddress(in);
  ...

@Override
protected void setIpAddress(final TProtocol in) {
  TUGIContainingTransport ugiTrans = (TUGIContainingTransport) in.getTransport();
  Socket socket = ugiTrans.getSocket();
  if (socket != null) {
    setIpAddress(socket);
{code}

From the above code snippet, it looks like the null pointer exception is not handled if getSocket returns null. Can you check what the ulimit setting is on the server? If it's set to the default, can you set it to unlimited and restart the hcat server? (This is just a wild guess.) Also, the getSocket documentation says: if the underlying TTransport is an instance of TSocket, it returns the Socket object which it contains; otherwise it returns null. So someone from the Thrift gurus needs to tell us what's happening; I have no knowledge of this depth. Maybe Ashutosh or Thejas will be able to help on this. From the netstat CLOSE_WAIT, it looks like the hive metastore server has not closed the connection (do not know why yet); maybe the hive dev guys can help. Are there too many connections in CLOSE_WAIT state? On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote: Looking at the hive metastore server logs we see errors like these: 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message. 
java.lang.NullPointerException at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) This happens at approximately the same time as we see the timeout or connection reset errors. Don't know if this is the cause or a side effect of the connection timeout/connection reset errors. Does anybody have any pointers or suggestions? Thanks On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote: Thanks Nitin! We have a similar setup (identical hcatalog and hive server versions) on another production environment and don't see any errors (it's been running ok for a few months). Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive 0.10 soon. I did see that the last time we ran into this problem, doing a netstat -ntp | grep :1 showed that the server was holding on to one socket connection in CLOSE_WAIT state for a long time (the hive metastore server is running on port 1). Don't know if that's relevant here or not. Can you suggest any hive configuration settings we can tweak, or networking tools/tips we can use to narrow this down? Thanks Agateaaa On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com wrote: Is there any chance you can do an update on a test environment with hcat-0.5 and hive 0.11 (or 0.10) and see if you can reproduce the issue? 
We used to see this error when there was load on the hcat server or some network issue connecting to the server (the second one was a rare occurrence). On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote: Hi All: We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors. The hive metastore server has been allocated enough (12G) memory. This is a critical problem for us and we would appreciate it if anyone has any pointers. We did add retry logic in our client, which seems to help, but I am just wondering how we can narrow down to the root cause of this problem. Could this be a hiccup in networking which causes the hive server to get into an unresponsive state? Thanks Agateaaa Example connection reset error: === org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at
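The missing null check discussed in this thread can be sketched in isolation. The classes below are stand-ins for the real Thrift/Hive types (TUGIContainingTransport is not imported here), chosen only to show the guard: getSocket is documented to return null when the transport does not wrap a TSocket, so the caller must check before dereferencing.

```java
import java.net.Socket;

public class NullSocketGuard {
    // Stand-in for TUGIContainingTransport.getSocket(), which per its
    // documentation returns null when the underlying transport is not a TSocket.
    static Socket getSocket(boolean wrapsTSocket) {
        return wrapsTSocket ? new Socket() : null;
    }

    static String ipOrUnknown(boolean wrapsTSocket) {
        Socket socket = getSocket(wrapsTSocket);
        if (socket == null) {
            // Guard instead of the NullPointerException seen in the server logs.
            return "unknown";
        }
        return String.valueOf(socket.getInetAddress());
    }

    public static void main(String[] args) {
        System.out.println(ipOrUnknown(false)); // prints "unknown"
    }
}
```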
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723541#comment-13723541 ] Jason Dere commented on HIVE-3976: -- Hi Ed/Xuefu, yeah I have similar issues for setting the length parameter for char/varchar types for https://issues.apache.org/jira/browse/HIVE-4844. I've got some prototype code for this; I'm not entirely sure if I have the best approach and was going to try to work through this a bit more, but if you'd like I can post a patch for you guys to take a look at and comment on. Basically I've added parameterized versions of PrimitiveTypeEntry, PrimitiveTypeInfo, and ObjectInspector, with additional factory methods for these types so that the caller can fetch a TypeEntry/TypeInfo/ObjectInspector based on PrimitiveCategory + type parameters. Will definitely need to work out how this interacts with the existing system, as currently all of those types make liberal use of pointer-based equality, and there seem to be some instances where it may not be possible to have access to type params when trying to get the TypeEntry/TypeInfo/ObjectInspector. Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: Improvement Components: Query Processor, Types Reporter: Mark Grover Assignee: Xuefu Zhang HIVE-2693 introduced support for the Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify the scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
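One way to reconcile parameterized types with the pointer-based equality the comment mentions is to intern instances in a factory, so equal parameters always resolve to the same object and existing == comparisons keep working. This is a sketch under that assumption; the class and method names are hypothetical, not Hive's actual PrimitiveTypeInfo implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DecimalTypeInfoFactory {
    // Hypothetical parameterized type info carrying precision and scale.
    static final class DecimalTypeInfo {
        final int precision;
        final int scale;
        DecimalTypeInfo(int precision, int scale) {
            this.precision = precision;
            this.scale = scale;
        }
        @Override
        public String toString() {
            return "decimal(" + precision + "," + scale + ")";
        }
    }

    // Interning cache: equal parameters yield the identical instance, so
    // callers relying on identity comparison behave as with the
    // unparameterized singleton types.
    private static final Map<String, DecimalTypeInfo> CACHE = new ConcurrentHashMap<>();

    static DecimalTypeInfo get(int precision, int scale) {
        return CACHE.computeIfAbsent(precision + "," + scale,
                k -> new DecimalTypeInfo(precision, scale));
    }

    public static void main(String[] args) {
        System.out.println(get(20, 2) == get(20, 2)); // prints "true"
        System.out.println(get(20, 2));               // prints "decimal(20,2)"
    }
}
```

The remaining gap the comment points out still applies: interning only helps where the call site has the parameters in hand when looking the type up.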
[jira] [Commented] (HIVE-3256) Update asm version in Hive
[ https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723567#comment-13723567 ] Hive QA commented on HIVE-3256: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594735/HIVE-3256.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2736 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/233/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/233/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Update asm version in Hive -- Key: HIVE-3256 URL: https://issues.apache.org/jira/browse/HIVE-3256 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Zhenxiao Luo Assignee: Ashutosh Chauhan Attachments: HIVE-3256.patch Hive trunk is currently using asm version 3.1; Hadoop trunk is on 3.2. Any objections to bumping the Hive version to 3.2 to be in line with Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4955) serde_user_properties.q.out needs to be updated
Thejas M Nair created HIVE-4955: --- Summary: serde_user_properties.q.out needs to be updated Key: HIVE-4955 URL: https://issues.apache.org/jira/browse/HIVE-4955 Project: Hive Issue Type: Bug Reporter: Thejas M Nair The testcase TestCliDriver.testCliDriver_serde_user_properties was added in HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 changes the expected results of serde_user_properties.q, causing the test to fail now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
[ https://issues.apache.org/jira/browse/HIVE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723584#comment-13723584 ] Thejas M Nair commented on HIVE-4951: - The test failures are unrelated to this q.out file change. TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask is known to be a flaky test (HIVE-4851). The TestCliDriver.testCliDriver_serde_user_properties failure is also unrelated to this change; created HIVE-4955 to track that. combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels) - Key: HIVE-4951 URL: https://issues.apache.org/jira/browse/HIVE-4951 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4951.1.patch combine2.q was updated in HIVE-3253, but the corresponding change is missing in combine2_win.q, causing it to fail on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4955) serde_user_properties.q.out needs to be updated
[ https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4955: Assignee: Thejas M Nair serde_user_properties.q.out needs to be updated --- Key: HIVE-4955 URL: https://issues.apache.org/jira/browse/HIVE-4955 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair The testcase TestCliDriver.testCliDriver_serde_user_properties was added in HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 changes the expected results of serde_user_properties.q, causing the test to fail now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4955) serde_user_properties.q.out needs to be updated
[ https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4955: Attachment: HIVE-4955.1.patch serde_user_properties.q.out needs to be updated --- Key: HIVE-4955 URL: https://issues.apache.org/jira/browse/HIVE-4955 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4955.1.patch The testcase TestCliDriver.testCliDriver_serde_user_properties was added in HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 changes the expected results of serde_user_properties.q, causing the test to fail now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4955) serde_user_properties.q.out needs to be updated
[ https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4955: Status: Patch Available (was: Open) serde_user_properties.q.out needs to be updated --- Key: HIVE-4955 URL: https://issues.apache.org/jira/browse/HIVE-4955 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4955.1.patch The testcase TestCliDriver.testCliDriver_serde_user_properties was added in HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 changes the expected results of serde_user_properties.q, causing the test to fail now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723612#comment-13723612 ] Hive QA commented on HIVE-2608: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594851/HIVE-2608.8.patch.txt {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 2737 tests executed *Failed tests:* {noformat} org.apache.hcatalog.pig.TestOrcHCatLoader.testReadDataBasic org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_lateral_view_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported2 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/234/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/234/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. Do not require AS a,b,c part in LATERAL VIEW Key: HIVE-2608 URL: https://issues.apache.org/jira/browse/HIVE-2608 Project: Hive Issue Type: Improvement Components: Query Processor, UDF Reporter: Igor Kabiljo Assignee: Navis Priority: Minor Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, HIVE-2608.D4317.6.patch Currently, it is required to state column names when LATERAL VIEW is used. That shouldn't be necessary, since UDTF returns struct which contains column names - and they should be used by default. 
For example, it would be great if this were possible: SELECT t.*, t.key1 + t.key4 FROM some_table LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723673#comment-13723673 ] Hive QA commented on HIVE-4843: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594872/HIVE-4843.4.patch {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 2736 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppr_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_special_char org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2_hadoop20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_decode_name org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape2 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/235/testReport Console output: 
https://builds.apache.org/job/PreCommit-HIVE-Build/235/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability --- Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, HIVE-4843.4.patch Currently, there are static apis in multiple locations in ExecDriver and MapRedTask that can be leveraged if put in the already existing utility class in the exec package. This would help making the code more maintainable, readable and also re-usable by other run-time infra such as tez. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-821) Return better status messages from HWI
[ https://issues.apache.org/jira/browse/HIVE-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723684#comment-13723684 ] manuel aldana commented on HIVE-821: what's the current status of this? as the dependent HIVE-862 + HIVE-795 are resolved we could start to work on this? Return better status messages from HWI -- Key: HIVE-821 URL: https://issues.apache.org/jira/browse/HIVE-821 Project: Hive Issue Type: New Feature Components: Web UI Reporter: Edward Capriolo Assignee: Edward Capriolo Users of HWI only receive numeric status code. We should return the message to them as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-821) Return better status messages from HWI
[ https://issues.apache.org/jira/browse/HIVE-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723685#comment-13723685 ] manuel aldana commented on HIVE-821: also progress (typical hive 0%..100% output) information would be very helpful. Return better status messages from HWI -- Key: HIVE-821 URL: https://issues.apache.org/jira/browse/HIVE-821 Project: Hive Issue Type: New Feature Components: Web UI Reporter: Edward Capriolo Assignee: Edward Capriolo Users of HWI only receive numeric status code. We should return the message to them as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3618) Hive jdbc or thrift server didn't support a method to get job progress when hive client execute a query
[ https://issues.apache.org/jira/browse/HIVE-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723692#comment-13723692 ] manuel aldana commented on HIVE-3618: - is this now possible with HiveServer2? Hive jdbc or thrift server didn't support a method to get job progress when hive client execute a query --- Key: HIVE-3618 URL: https://issues.apache.org/jira/browse/HIVE-3618 Project: Hive Issue Type: Wish Components: JDBC, Thrift API, Web UI Affects Versions: 0.7.1, 0.8.0, 0.9.0 Reporter: chenyukang I am writing a Hive web client to run a Hive query using the Hive JDBC driver or the Hive Thrift Server. Since the data amount is huge, I really would like to see the progress while the query is running. I want to get the job progress; otherwise the user has to wait, blocked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4850) Implement vectorized JOIN operators
[ https://issues.apache.org/jira/browse/HIVE-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated HIVE-4850: --- Attachment: HIVE-4850.1.patch This is an initial implementation of Map join. Multiple join aliases and multiple values per key work. The small aliases are row mode data (writable objects) and get converted to vector values *for each row in the big table* (after filtering). Also the map hash has row mode keys (objects) and the vector mode keys get converted to object keys for lookup of *each row in the big table* (after filtering). Implement vectorized JOIN operators --- Key: HIVE-4850 URL: https://issues.apache.org/jira/browse/HIVE-4850 Project: Hive Issue Type: Sub-task Reporter: Remus Rusanu Assignee: Remus Rusanu Attachments: HIVE-4850.1.patch Easysauce -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4956) Allow multiple tables in from clause if all of them have the same schema, but can be partitioned differently
Amareshwari Sriramadasu created HIVE-4956: - Summary: Allow multiple tables in from clause if all of them have the same schema, but can be partitioned differently Key: HIVE-4956 URL: https://issues.apache.org/jira/browse/HIVE-4956 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu We have a usecase where the table storage partitioning changes over time. For ex: we can have a table T1 which is partitioned by p1. But over time, we want to partition the table on p1 and p2 as well. The new table can be T2. So, if we have to query the table on partition p1, it will be a union query across the two tables T1 and T2. Especially with aggregations like avg, it becomes a costly union query because we cannot make use of mapside aggregations and other optimizations. The proposal is to support queries of the following format : select t.x, t.y from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] and so on. Here we allow the from clause to be a comma-separated list of tables with an alias, and the alias will be used in the full query, and partition pruning will happen on the actual tables to pick up the right paths. This will work because the difference is only in picking up the input paths and the whole operator tree does not change. If this sounds like a good usecase, I can put up the changes required to support the same. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all of them have the same schema, but can be partitioned differently
[ https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723712#comment-13723712 ] Amareshwari Sriramadasu commented on HIVE-4956: --- The same usecase can be applied to tables stored at different rollups, like daily rollups and hourly rollups. Allow multiple tables in from clause if all of them have the same schema, but can be partitioned differently - Key: HIVE-4956 URL: https://issues.apache.org/jira/browse/HIVE-4956 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu We have a usecase where the table storage partitioning changes over time. For ex: we can have a table T1 which is partitioned by p1. But over time, we want to partition the table on p1 and p2 as well. The new table can be T2. So, if we have to query the table on partition p1, it will be a union query across the two tables T1 and T2. Especially with aggregations like avg, it becomes a costly union query because we cannot make use of mapside aggregations and other optimizations. The proposal is to support queries of the following format : select t.x, t.y from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] and so on. Here we allow the from clause to be a comma-separated list of tables with an alias, and the alias will be used in the full query, and partition pruning will happen on the actual tables to pick up the right paths. This will work because the difference is only in picking up the input paths and the whole operator tree does not change. If this sounds like a good usecase, I can put up the changes required to support the same. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
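A sketch of the rewrite the proposal above would avoid (table names T1/T2 and partition column p1 are the hypothetical ones from the description):

{code:sql}
-- Today: the same logical query over both layouts needs an explicit
-- UNION ALL subquery, which blocks map-side aggregation.
SELECT t.x, AVG(t.y)
FROM (SELECT x, y, p1 FROM T1
      UNION ALL
      SELECT x, y, p1 FROM T2) t
WHERE t.p1 = 'x' OR t.p1 = 'y'
GROUP BY t.x;

-- Proposed shorthand: one alias over a comma-separated table list; the
-- operator tree is built once and only the pruned input paths differ.
SELECT t.x, AVG(t.y)
FROM T1, T2 t
WHERE t.p1 = 'x' OR t.p1 = 'y'
GROUP BY t.x;
{code}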
Review Request 13059: HIVE-4850 Implement vector mode map join
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13059/ --- Review request for hive, Eric Hanson and Jitendra Pandey. Bugs: HIVE-4850 https://issues.apache.org/jira/browse/HIVE-4850 Repository: hive-git Description --- This is not the final iteration, but I thought it is easier to discuss it with a review. This implementation works, handles multiple aliases and multiple values per key. The implementation uses the existing hash tables saved by the local task for the map join, which are row mode hash tables (have row mode keys and store row mode writable object values). Going forward we should avoid the size-of-big-table conversions of big table keys to row-mode and conversion of small table values to vector data. This would require either converting on-the-fly the hash tables to vector friendly ones (when loaded) or changing the local task hashtable sink to create a vectorization friendly hash. First approach may have memory consumption problems (potentially two hash tables end up in memory, would have to stream the transformation or transform as reading from serialized format... nasty). 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 82d4b93 ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 31dbf41 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 4da1be8 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 29de38d ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java e579c00 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinDoubleKeys.java d774226 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectKey.java 791bb3f ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java 58a9dc0 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinSingleKey.java 4bff936 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExecMapper.java 083b9b9 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java 41d2001 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 9c90230 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9 ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableDummyDesc.java f15ce48 Diff: https://reviews.apache.org/r/13059/diff/ Testing --- Manually run some join queries on alltypes_orc table. Thanks, Remus Rusanu
[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723726#comment-13723726 ] Hive QA commented on HIVE-4952: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594874/HIVE-4952.D11889.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2737 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/236/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/236/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4952.D11889.1.patch, replay.txt If we have a query like this ...
{code:sql}
SELECT xx.key, xx.cnt, yy.key
FROM (SELECT x.key AS key, count(1) AS cnt
      FROM src1 x JOIN src1 y ON (x.key = y.key)
      GROUP BY x.key) xx
JOIN src yy ON xx.key = yy.key;
{code}
After Correlation Optimizer, the operator tree in the reducer will be
{code}
      JOIN2
        |
       MUX
      /   \
    GBY    |
     |     |
   JOIN1   |
      \   /
       \ /
      DEMUX
{code}
For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even though it has not received any row from the left table. 
The logic related to hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
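A hypothetical repro sketch for the scenario above (hive.optimize.correlation is the switch for the Correlation Optimizer; the query and the src/src1 tables are the ones from the description):

{code:sql}
SET hive.optimize.correlation=true;
-- Force early emission: with interval 1, JOIN2 may flush output after
-- buffering a single row, i.e. before any left-side (GBY) rows have
-- arrived via the MUX operator.
SET hive.join.emit.interval=1;

SELECT xx.key, xx.cnt, yy.key
FROM (SELECT x.key AS key, count(1) AS cnt
      FROM src1 x JOIN src1 y ON (x.key = y.key)
      GROUP BY x.key) xx
JOIN src yy ON xx.key = yy.key;
{code}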
[jira] [Commented] (HIVE-4955) serde_user_properties.q.out needs to be updated
[ https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723781#comment-13723781 ] Hive QA commented on HIVE-4955: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594901/HIVE-4955.1.patch {color:green}SUCCESS:{color} +1 2736 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/238/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/238/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. serde_user_properties.q.out needs to be updated --- Key: HIVE-4955 URL: https://issues.apache.org/jira/browse/HIVE-4955 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4955.1.patch The testcase TestCliDriver.testCliDriver_serde_user_properties was added in HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 has changes that change the expected results of serde_user_properties.q, causing the test to fail now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4955) serde_user_properties.q.out needs to be updated
[ https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723815#comment-13723815 ] Brock Noland commented on HIVE-4955: +1 serde_user_properties.q.out needs to be updated --- Key: HIVE-4955 URL: https://issues.apache.org/jira/browse/HIVE-4955 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4955.1.patch The testcase TestCliDriver.testCliDriver_serde_user_properties was added in HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 has changes that change the expected results of serde_user_properties.q, causing the test to fail now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4955) serde_user_properties.q.out needs to be updated
[ https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4955: --- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) I committed this since it's causing the build to fail. Thanks for your contribution Thejas! serde_user_properties.q.out needs to be updated --- Key: HIVE-4955 URL: https://issues.apache.org/jira/browse/HIVE-4955 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4955.1.patch The testcase TestCliDriver.testCliDriver_serde_user_properties was added in HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 has changes that change the expected results of serde_user_properties.q, causing the test to fail now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3256) Update asm version in Hive
[ https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723833#comment-13723833 ] Brock Noland commented on HIVE-3256: Test failed due to HIVE-4955. Update asm version in Hive -- Key: HIVE-3256 URL: https://issues.apache.org/jira/browse/HIVE-3256 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Zhenxiao Luo Assignee: Ashutosh Chauhan Attachments: HIVE-3256.patch Hive trunk is currently using asm version 3.1; Hadoop trunk is on 3.2. Any objections to bumping the Hive version to 3.2 to be in line with Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jin Adachi updated HIVE-2564: - Attachment: HIVE-2564.patch Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1 Reporter: Shinsuke Sugaya Attachments: hive-2564.patch, HIVE-2564.patch The current Hive implementation ignores a database name at JDBC URL, though we can set it by executing use DBNAME statement. I think it is better to also specify a database name at JDBC URL or database properties. Therefore, I'll attach the patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jin Adachi updated HIVE-2564: - Attachment: HIVE-2564.1.patch Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1 Reporter: Shinsuke Sugaya Attachments: HIVE-2564.1.patch, hive-2564.patch The current Hive implementation ignores a database name at JDBC URL, though we can set it by executing use DBNAME statement. I think it is better to also specify a database name at JDBC URL or database properties. Therefore, I'll attach the patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jin Adachi updated HIVE-2564: - Attachment: (was: HIVE-2564.patch) Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1 Reporter: Shinsuke Sugaya Attachments: HIVE-2564.1.patch, hive-2564.patch The current Hive implementation ignores a database name at JDBC URL, though we can set it by executing use DBNAME statement. I think it is better to also specify a database name at JDBC URL or database properties. Therefore, I'll attach the patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2137) JDBC driver doesn't encode string properly.
[ https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HIVE-2137: - Labels: patch (was: ) JDBC driver doesn't encode string properly. --- Key: HIVE-2137 URL: https://issues.apache.org/jira/browse/HIVE-2137 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Jin Adachi Labels: patch Fix For: 0.12.0 Attachments: HIVE-2137.patch, HIVE-2137.patch The JDBC driver for HiveServer1 decodes strings with the client-side default encoding, which depends on the operating system unless we specify another encoding. It ignores the server-side encoding. For example, when the server-side operating system and encoding are Linux (UTF-8) and the client-side operating system and encoding are Windows (Shift_JIS, a Japanese charset), character corruption happens in the client. In the current implementation of Hive, UTF-8 appears to be expected on the server side, so the client side should encode/decode strings as UTF-8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723883#comment-13723883 ] Hive QA commented on HIVE-4388: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594727/HIVE-4388.patch {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 2717 tests executed *Failed tests:* {noformat} junit.framework.TestSuite.org.apache.hcatalog.hbase.TestSnapshots junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestZNodeSetUp junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestIDGenerator junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestRevisionManagerEndpoint junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseHCatStorageHandler junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseDirectOutputFormat org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeII junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseInputFormat junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestRevisionManager org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeI org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithColumnPrefixes org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseColumnFamilyII org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithTimestamp {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/239/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/239/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests 
failed with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4388: --- Attachment: HIVE-4388.patch Update patch should fix some failures. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
[ https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723972#comment-13723972 ] Edward Capriolo commented on HIVE-4574: --- I am pretty close to having an xstream patch on HIVE-1511 XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck -- Key: HIVE-4574 URL: https://issues.apache.org/jira/browse/HIVE-4574 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4574.1.patch In OpenJDK 7, an XMLEncoder.writeObject call leads to calls to java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe because it uses a static WeakHashMap that can get used from multiple threads. See - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46 Concurrent access to HashMap implementations that are not thread safe can sometimes result in infinite loops and other problems. If JDK 7 is in use, it makes sense to synchronize calls to XMLEncoder.writeObject. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3256) Update asm version in Hive
[ https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-3256: --- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) Thank you for the contribution Ashutosh! I have committed this to trunk. Update asm version in Hive -- Key: HIVE-3256 URL: https://issues.apache.org/jira/browse/HIVE-3256 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Zhenxiao Luo Assignee: Ashutosh Chauhan Fix For: 0.12.0 Attachments: HIVE-3256.patch Hive trunk is currently using asm version 3.1; Hadoop trunk is on 3.2. Any objections to bumping the Hive version to 3.2 to be in line with Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4920) PTest2 handle Spot Price increases gracefully and improve rsync paralllelsim
[ https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4920: --- Attachment: HIVE-4920.patch Minor update to handle the hcat tests (which use TestSuite) correctly. PTest2 handle Spot Price increases gracefully and improve rsync paralllelsim Key: HIVE-4920 URL: https://issues.apache.org/jira/browse/HIVE-4920 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Critical Attachments: HIVE-4920.patch, HIVE-4920.patch, HIVE-4920.patch, Screen Shot 2013-07-23 at 3.35.00 PM.png We should handle spot price increases more gracefully and parallelize rsync to slaves better NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724015#comment-13724015 ] Hive QA commented on HIVE-2564: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594952/HIVE-2564.1.patch {color:red}ERROR:{color} -1 due to 58 failed/errored test(s), 2738 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData org.apache.hive.jdbc.TestJdbcDriver2.testSelectAll org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetColumnsMetaData org.apache.hive.jdbc.TestJdbcDriver2.testDescribeTable org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize org.apache.hive.jdbc.TestJdbcDriver2.testNullType org.apache.hive.jdbc.TestJdbcDriver2.testDuplicateColumnNameOrder org.apache.hive.jdbc.TestJdbcDriver2.testProccedures org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages org.apache.hive.jdbc.TestJdbcDriver2.testSelectAllFetchSize org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes2 org.apache.hive.jdbc.TestJdbcDriver2.testOutOfBoundCols org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll org.apache.hive.jdbc.TestJdbcDriver2.testDriverProperties org.apache.hive.jdbc.TestJdbcDriver2.testErrorDiag org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType org.apache.hive.jdbc.TestJdbcDriver2.testProcCols org.apache.hive.jdbc.TestJdbcDriver2.testBuiltInUDFCol org.apache.hive.jdbc.TestJdbcDriver2.testPostClose org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes org.apache.hive.jdbc.TestJdbcDriver2.testErrorMessages 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetColumns org.apache.hive.jdbc.TestJdbcDriver2.testPrimaryKeys org.apache.hive.jdbc.TestJdbcDriver2.testShowTables org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables org.apache.hive.jdbc.TestJdbcDriver2.testShowDatabases org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData org.apache.hive.jdbc.TestJdbcDriver2.testBadURL org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetCatalogs org.apache.hive.jdbc.TestJdbcDriver2.testImportedKeys org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables org.apache.hive.jdbc.TestJdbcDriver2.testPrepareStatement org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs org.apache.hive.jdbc.TestJdbcDriver2.testInvalidURL org.apache.hive.jdbc.TestJdbcDriver2.testSetCommand org.apache.hive.jdbc.TestJdbcDriver2.testSelectAllMaxRows org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes org.apache.hive.jdbc.TestJdbcDriver2.testSelectAllPartioned org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetTables org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement org.apache.hive.jdbc.TestJdbcDriver2.testDatabaseMetaData org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetSchemas org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet org.apache.hive.jdbc.TestJdbcDriver2.testExprCol org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetTableTypes org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowDatabases {noformat} Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/240/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/240/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 58 tests failed {noformat} This message is automatically generated. Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1 Reporter: Shinsuke Sugaya Attachments: HIVE-2564.1.patch, hive-2564.patch The current Hive implementation ignores a database name at the JDBC URL, though we can set it by executing a use DBNAME statement. I think it is better to also specify a database name at the JDBC URL or database properties.
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724019#comment-13724019 ] Xuefu Zhang commented on HIVE-3976: --- Thanks, Jason. Please do attach a patch so that we can see how you plan to do it, even if it's incomplete. I think our issues belong to the same category, so a generic approach works best. Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: Improvement Components: Query Processor, Types Reporter: Mark Grover Assignee: Xuefu Zhang HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
Brock Noland created HIVE-4957: -- Summary: Restrict number of bit vectors, to prevent out of Java heap memory Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Normally, increasing the number of bit vectors increases calculation accuracy. Let's say {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gets better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Beyond about 50 bit vectors, increasing the count no longer improves accuracy, but it still increases memory usage and can crash Hive if the number is too large. Hive currently doesn't prevent users from using a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
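The guard the report asks for amounts to validating the user-supplied bit-vector count before allocating anything, failing fast instead of running out of heap. A minimal sketch; the cap of 50 comes from the report's own observation, and the class and method names are hypothetical, not Hive's actual API:

```java
public class BitVectorGuard {
    // Per the report, accuracy plateaus past ~50 vectors while memory keeps growing.
    static final int MAX_BIT_VECTORS = 50;

    /** Validates a user-supplied bit-vector count, rejecting unreasonable values. */
    public static int checkNumBitVectors(int requested) {
        if (requested < 1 || requested > MAX_BIT_VECTORS) {
            throw new IllegalArgumentException(
                "numBitVectors must be between 1 and " + MAX_BIT_VECTORS
                + ", got " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(checkNumBitVectors(16)); // accepted
        try {
            checkNumBitVectors(1_000_000);          // rejected up front, no OOM
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Clamping (silently reducing to the cap) would also work; throwing is shown here because it surfaces the mistake to the query author.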
[jira] [Created] (HIVE-4958) AppConfig.init() loads webhcat-*.xml before core-*.xml
Eugene Koifman created HIVE-4958: Summary: AppConfig.init() loads webhcat-*.xml before core-*.xml Key: HIVE-4958 URL: https://issues.apache.org/jira/browse/HIVE-4958 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman This method first loads webhcat-*.xml and then core-*.xml, mapred-*.xml, etc. Shouldn't it be in the opposite order? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
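The order matters because Hadoop-style configuration resources follow last-loaded-wins semantics: loading webhcat-*.xml first lets the later core-*.xml silently override WebHCat's settings. A toy illustration of that semantics with plain maps (the property name is borrowed from WebHCat for flavor; this is not the actual AppConfig code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LoadOrderDemo {
    /** Applies resources in order; later entries override earlier ones (last wins). */
    @SafeVarargs
    public static Map<String, String> load(Map<String, String>... resources) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (Map<String, String> r : resources) {
            conf.putAll(r);
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> core = Map.of("templeton.port", "from-core");
        Map<String, String> webhcat = Map.of("templeton.port", "from-webhcat");
        // Order per the report: webhcat first, then core -> core silently wins.
        System.out.println(load(webhcat, core).get("templeton.port"));
        // Opposite order: core first, then webhcat -> webhcat overrides, as expected.
        System.out.println(load(core, webhcat).get("templeton.port"));
    }
}
```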
[jira] [Updated] (HIVE-3264) Add support for binary datatype to AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3264: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Eli for the patch. Thanks, Mark for testcases. Thanks, Jakob for the review. Add support for binary datatype to AvroSerde --- Key: HIVE-3264 URL: https://issues.apache.org/jira/browse/HIVE-3264 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Jakob Homan Assignee: Eli Reisman Labels: patch Fix For: 0.12.0 Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch, HIVE-3264.7.patch When the AvroSerde was written, Hive didn't have a binary type, so Avro's byte array type is converted to an array of small ints. Now that HIVE-2380 is in, this step isn't necessary and we can convert both Avro's bytes type and probably fixed type to Hive's binary type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4525: --- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Mikhail! Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.12.0 Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow storing timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
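The 2038 boundary follows directly from the 31-bit seconds field: the largest representable instant is 2^31 - 1 seconds past the Unix epoch. A quick check (illustrative only; TimestampWritable's actual encoding is more involved):

```java
import java.time.Instant;

public class TimestampRange {
    /** Largest timestamp representable with 31 bits of seconds since the epoch. */
    public static Instant maxSecondsTimestamp() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE); // 2^31 - 1 seconds
    }

    public static void main(String[] args) {
        // Prints 2038-01-19T03:14:07Z, the classic "year 2038" limit.
        System.out.println(maxSecondsTimestamp());
    }
}
```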
[jira] [Resolved] (HIVE-2702) Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality
[ https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-2702. Resolution: Fixed Committed to trunk. Thanks, Sergey! Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality --- Key: HIVE-2702 URL: https://issues.apache.org/jira/browse/HIVE-2702 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: Aniket Mokashi Assignee: Sergey Shelukhin Fix For: 0.12.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, HIVE-2702.patch, HIVE-2702-v0.patch listPartitionsByFilter supports only string partition keys. This is because it's explicitly specified in generateJDOFilterOverPartitions in ExpressionTree.java: // Can only support partitions whose types are string if (!table.getPartitionKeys().get(partitionColumnIndex).getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME)) { throw new MetaException("Filtering is supported only on partition keys of type string"); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all of them have the same schema, but can be partitioned differently
[ https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724133#comment-13724133 ] Xuefu Zhang commented on HIVE-4956: --- The syntax, select ... from T1, T2 ... without a join, might cause semantic confusion, as in some databases it really means a cross join (Cartesian product), which has a different meaning from yours. From a database point of view, a table is a table, and two tables are two tables. Treating two tables as one seems to go beyond what SQL defines. It might be conceptually clearer if we allowed tables to have heterogeneous partitions. Of course, this may be more involved. Allow multiple tables in from clause if all of them have the same schema, but can be partitioned differently - Key: HIVE-4956 URL: https://issues.apache.org/jira/browse/HIVE-4956 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu We have a usecase where the table storage partitioning changes over time. For ex: we can have a table T1 which is partitioned by p1. But over time, we want to partition the table on p1 and p2 as well. The new table can be T2. So, if we have to query the table on partition p1, it will be a union query across the two tables T1 and T2. Especially with aggregations like avg, it becomes a costly union query because we cannot make use of mapside aggregations and other optimizations. The proposal is to support queries of the following format : select t.x, t.y from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] and so on. Here we allow the from clause as a comma-separated list of tables with an alias, and the alias will be used in the full query, and partition pruning will happen on the actual tables to pick up the right paths. This will work because the difference is only in picking up the input paths and the whole operator tree does not change. 
If this sounds like a good usecase, I can put up the changes required to support it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all of them have the same schema, but can be partitioned differently
[ https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724139#comment-13724139 ] Ashutosh Chauhan commented on HIVE-4956: Completely agreed with [~xuefuz]. Let's not redefine SQL semantics. Allow multiple tables in from clause if all of them have the same schema, but can be partitioned differently - Key: HIVE-4956 URL: https://issues.apache.org/jira/browse/HIVE-4956 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu We have a usecase where the table storage partitioning changes over time. For ex: we can have a table T1 which is partitioned by p1. But over time, we want to partition the table on p1 and p2 as well. The new table can be T2. So, if we have to query the table on partition p1, it will be a union query across the two tables T1 and T2. Especially with aggregations like avg, it becomes a costly union query because we cannot make use of mapside aggregations and other optimizations. The proposal is to support queries of the following format : select t.x, t.y from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] and so on. Here we allow the from clause as a comma-separated list of tables with an alias, and the alias will be used in the full query, and partition pruning will happen on the actual tables to pick up the right paths. This will work because the difference is only in picking up the input paths and the whole operator tree does not change. If this sounds like a good usecase, I can put up the changes required to support it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-2564: --- Status: Open (was: Patch Available) Hey guys, I am excited about this patch! However, since there are test failures I am going to remove the Patch Available status. As soon as you have another one, please mark it Patch Available. Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1 Reporter: Shinsuke Sugaya Attachments: HIVE-2564.1.patch, hive-2564.patch The current Hive implementation ignores a database name at the JDBC URL, though we can set it by executing a use DBNAME statement. I think it is better to also specify a database name at the JDBC URL or database properties. Therefore, I'll attach the patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724152#comment-13724152 ] Sergey Shelukhin commented on HIVE-4051: I've fixed most of the queries; there are a couple of bugs, and sorting is undefined (and changed, as we no longer sort by partition name) in some tests. A couple of stubborn ones remain; hopefully I will update today. Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the MTable/MPartition objects. For a Hive db with 1800 partitions, the metastore took 6-7 seconds to initialize, firing approximately 5900 queries to the MySQL database. Several of those queries fetch exactly one row to create a single object on the client. 
The following 12 queries were repeated for each partition, generating a storm of SQL queries {code} 4 Query SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`=0 4 Query SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`=0 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` =4871 AND `STRING_LIST_ID_KID` IS NOT NULL 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN `SKEWED_COL_VALUE_LOC_MAP` `B0` ON 
`A0`.`STRING_LIST_ID` = `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT (`A0`.`STRING_LIST_ID_KID` IS NULL) {code} This data is not detached or cached, so this operation is performed during every query plan for the partitions, even in the same hive client. The queries are automatically generated by JDO/DataNucleus which makes it nearly impossible to rewrite it into a single denormalized join operation process it locally. Attempts to optimize this with JDO fetch-groups did not bear fruit in improving the query count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
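The fix direction for this kind of 1+N pattern is to replace the per-row round trips with set-based queries that fetch the same data for many partitions at once. A schematic sketch of the idea (table and column names borrowed from the log above; the helper is illustrative, not the actual HIVE-4051 patch):

```java
import java.util.List;
import java.util.stream.Collectors;

public class BatchedPartitionQuery {
    /** One set-based query for all storage descriptors instead of one query each. */
    public static String sdQueryFor(List<Long> sdIds) {
        String in = sdIds.stream()
                         .map(String::valueOf)
                         .collect(Collectors.joining(","));
        return "SELECT `SD_ID`, `LOCATION`, `INPUT_FORMAT`, `OUTPUT_FORMAT` "
             + "FROM `SDS` WHERE `SD_ID` IN (" + in + ")";
    }

    public static void main(String[] args) {
        // Instead of N separate WHERE `SD_ID` = ? round trips, issue one query.
        System.out.println(sdQueryFor(List.of(4871L, 4872L, 4873L)));
    }
}
```

With 1800 partitions this turns thousands of single-row fetches into a handful of queries, at the cost of assembling the objects from flat result sets on the client side.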
[jira] [Commented] (HIVE-4888) listPartitionsByFilter doesn't support lt/gt/lte/gte
[ https://issues.apache.org/jira/browse/HIVE-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724158#comment-13724158 ] Sergey Shelukhin commented on HIVE-4888: There doesn't appear to be any equivalent of Long.parse in DN/JDO. Casts are not it. I am playing with writing a DN SQLMethod plugin, but it would only work if DN is backed by a SQL store, as far as I can see. I will file a DN jira. HIVE-4051 makes pushdown work for SQL, so I might end up doing HIVE-4914 instead and having the server decide to do pushdown to SQL, or no pushdown for JDOQL, for those. Depends on how seamless the plugin would be. listPartitionsByFilter doesn't support lt/gt/lte/gte Key: HIVE-4888 URL: https://issues.apache.org/jira/browse/HIVE-4888 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Filter pushdown could be improved. Based on my experiments, there's no reasonable way to do it with DN 2.0, due to a DN bug in substring and Collection.get(int) not being implemented. With a version as low as 2.1 we can use values.get on the partition to extract values to compare to. Type compatibility is an issue, but is easy for strings and integral values. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724160#comment-13724160 ] Laurent Chouinard commented on HIVE-4051: - Hi, I will be on vacation from July 29th to August 6th inclusively. For any question or emergency, please contact the group mtl-it-production-to...@ubisoft.com. Thanks. Laurent Chouinard IT Production - Tools Programmer Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the MTable/MPartition objects. For a Hive db with 1800 partitions, the metastore took 6-7 seconds to initialize, firing approximately 5900 queries to the MySQL database. Several of those queries fetch exactly one row to create a single object on the client. 
The following 12 queries were repeated for each partition, generating a storm of SQL queries {code} 4 Query SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`=0 4 Query SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`=0 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` =4871 AND `STRING_LIST_ID_KID` IS NOT NULL 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN `SKEWED_COL_VALUE_LOC_MAP` `B0` ON 
`A0`.`STRING_LIST_ID` = `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT (`A0`.`STRING_LIST_ID_KID` IS NULL) {code} This data is not detached or cached, so this operation is performed during every query plan for the partitions, even in the same hive client. The queries are automatically generated by JDO/DataNucleus which makes it nearly impossible to rewrite it into a single denormalized join operation process it locally. Attempts to optimize this with JDO fetch-groups did not bear fruit in improving the query count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan reassigned HIVE-4957: Assignee: Shreepadma Venugopalan Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Normally, increasing the number of bit vectors increases calculation accuracy. Let's say {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gets better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Beyond about 50 bit vectors, increasing the count no longer improves accuracy, but it still increases memory usage and can crash Hive if the number is too large. Hive currently doesn't prevent users from using a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults
[ https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4395: --- Status: Open (was: Patch Available) Prasad, I am cancelling this patch since it doesn't apply. When we have a patch that applies, please change it to Patch Available! Support TFetchOrientation.FIRST for HiveServer2 FetchResults Key: HIVE-4395 URL: https://issues.apache.org/jira/browse/HIVE-4395 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch Currently HiveServer2 only supports fetching the next row (TFetchOrientation.NEXT). This ticket is to implement support for TFetchOrientation.FIRST, which resets the fetch position to the beginning of the resultset. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4957 started by Shreepadma Venugopalan. Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Normally, increasing the number of bit vectors increases calculation accuracy. Let's say {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gets better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Beyond about 50 bit vectors, increasing the count no longer improves accuracy, but it still increases memory usage and can crash Hive if the number is too large. Hive currently doesn't prevent users from using a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
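The accuracy/memory trade-off described above can be sketched outside Hive. The class below is an illustrative FM-style distinct-value sketch, not Hive's actual NumDistinctValueEstimator; the hash function and constants are assumptions for the demo. The point is that memory grows linearly with the bit-vector count, which is why an unchecked count can exhaust the heap while accuracy gains flatten out.

```java
// Illustrative FM-style distinct-value sketch, NOT Hive's actual
// NumDistinctValueEstimator: it only demonstrates the trade-off above.
// Each extra bit vector costs fixed memory, while averaging over more
// vectors yields diminishing accuracy gains.
public class BitVectorSketch {

    // Memory cost grows linearly with the number of bit vectors.
    static long memoryBytes(int numVectors) {
        return numVectors * 4L; // one int-sized (32-bit) vector each
    }

    // Classic Flajolet-Martin estimate averaged over numVectors vectors:
    // record the lowest set bit of each hash, then use the position of the
    // lowest unset bit, averaged across vectors.
    static double estimateDistinct(int[] values, int numVectors) {
        int[] vectors = new int[numVectors];
        for (int v : values) {
            for (int i = 0; i < numVectors; i++) {
                int h = hash(v, i);
                vectors[i] |= Integer.lowestOneBit(h == 0 ? 1 : h);
            }
        }
        double sum = 0;
        for (int bits : vectors) {
            sum += Integer.numberOfTrailingZeros(~bits); // lowest unset bit index
        }
        return Math.pow(2.0, sum / numVectors) / 0.77351; // FM correction factor
    }

    // Simple seeded integer mix; an assumption for the demo, not Hive's hash.
    static int hash(int value, int seed) {
        int h = value * 0x9E3779B9 + seed * 0x85EBCA6B;
        h ^= h >>> 16;
        h *= 0x45D9F3B;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println("16 vectors use " + memoryBytes(16) + " bytes, estimate "
                + Math.round(estimateDistinct(data, 16)));
        System.out.println("1024 vectors use " + memoryBytes(1024) + " bytes, estimate "
                + Math.round(estimateDistinct(data, 1024)));
    }
}
```

A hard cap on numVectors (as this issue proposes) bounds the `vectors` allocation without costing real accuracy.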
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724239#comment-13724239 ] Hive QA commented on HIVE-4388: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594972/HIVE-4388.patch {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 2737 tests executed *Failed tests:* {noformat} org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.bulkModeHCatOutputFormatTestWithDefaultDB org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableIgnoreAbortedAndRunningTransactions org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableIgnoreAbortedTransactions org.apache.hcatalog.hbase.TestHBaseDirectOutputFormat.directModeAbortTest org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableProjectionReadMR org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.hbaseBulkOutputFormatTest org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.importSequenceFileTest org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseInputFormatProjectionReadMR org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.bulkModeHCatOutputFormatTest org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableReadMR org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.bulkModeAbortTest org.apache.hcatalog.hbase.TestHBaseDirectOutputFormat.directHCatOutputFormatTest {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/243/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/243/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. 
HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.
Jitendra Nath Pandey created HIVE-4959: -- Summary: Vectorized plan generation should be added as an optimization transform. Key: HIVE-4959 URL: https://issues.apache.org/jira/browse/HIVE-4959 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Currently the query plan is vectorized at the query run time in the map task. It will be much cleaner to add vectorization as an optimization step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Attachment: HIVE-4844.1.patch.hack Initial patch of progress, as other folks may be interested in type parameters for HIVE-3976. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724260#comment-13724260 ] Jason Dere commented on HIVE-3976: -- I've attached a patch to HIVE-4844, containing the current progress. Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: Improvement Components: Query Processor, Types Reporter: Mark Grover Assignee: Xuefu Zhang HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4920) PTest2 handle Spot Price increases gracefully and improve rsync parallelism
[ https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4920: --- Attachment: HIVE-4920.patch Trivial update to sort failed tests. PTest2 handle Spot Price increases gracefully and improve rsync parallelism Key: HIVE-4920 URL: https://issues.apache.org/jira/browse/HIVE-4920 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Critical Attachments: HIVE-4920.patch, HIVE-4920.patch, HIVE-4920.patch, HIVE-4920.patch, Screen Shot 2013-07-23 at 3.35.00 PM.png We should handle spot price increases more gracefully and parallelize rsync to slaves better NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2137) JDBC driver doesn't encode string properly.
[ https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HIVE-2137: - Attachment: HIVE-2137.patch I forgot to drop the table I added in tearDown. Thank you for your advice, tamtam180! JDBC driver doesn't encode string properly. --- Key: HIVE-2137 URL: https://issues.apache.org/jira/browse/HIVE-2137 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Jin Adachi Labels: patch Fix For: 0.12.0 Attachments: HIVE-2137.patch, HIVE-2137.patch, HIVE-2137.patch The JDBC driver for HiveServer1 decodes strings with the client-side default encoding, which depends on the operating system unless another encoding is specified. It ignores the server-side encoding. For example, when the server-side operating system and encoding are Linux (UTF-8) and the client-side operating system and encoding are Windows (Shift_JIS, a Japanese charset), character corruption happens in the client. In the current implementation of Hive, UTF-8 appears to be expected on the server side, so the client side should encode/decode strings as UTF-8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
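The charset bug class described above can be shown in isolation. This is an illustrative snippet, not the driver's actual code: the buggy pattern relies on the platform default charset, while the fix names the server-side charset (UTF-8) explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only (not the JDBC driver's actual code): bytes written by a
// UTF-8 server must be decoded as UTF-8 on the client, not with whatever
// charset the client OS happens to default to.
public class EncodingDemo {
    // Buggy pattern: new String(bytes) uses the platform default charset,
    // e.g. Shift_JIS on a Japanese Windows client.
    static String decodeWithDefault(byte[] serverBytes) {
        return new String(serverBytes);
    }

    // Fixed pattern: always name the server-side charset explicitly.
    static String decodeAsUtf8(byte[] serverBytes) {
        return new String(serverBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "résumé".getBytes(StandardCharsets.UTF_8);
        // Correct on every client regardless of file.encoding:
        System.out.println(decodeAsUtf8(utf8));
        // decodeWithDefault(utf8) only matches when the default is UTF-8:
        System.out.println(Charset.defaultCharset());
    }
}
```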
Re: Review Request 12795: [HIVE-4827] Merge a Map-only job to its following MapReduce job with multiple inputs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12795/ --- (Updated July 30, 2013, 7:35 p.m.) Review request for hive. Bugs: HIVE-4827 https://issues.apache.org/jira/browse/HIVE-4827 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-4827 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cb59560 conf/hive-default.xml.template e0b7f5c ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 66b84ff ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java bf224e0 ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java f704ec1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java d532bb1 ql/src/test/queries/clientpositive/auto_join33.q 5c85842 ql/src/test/queries/clientpositive/correlationoptimizer1.q 2adf855 ql/src/test/queries/clientpositive/correlationoptimizer3.q fcbb764 ql/src/test/queries/clientpositive/correlationoptimizer4.q 0e84cb7 ql/src/test/queries/clientpositive/correlationoptimizer5.q 1900f5d ql/src/test/queries/clientpositive/correlationoptimizer6.q 88d790c ql/src/test/queries/clientpositive/correlationoptimizer7.q 9b18972 ql/src/test/queries/clientpositive/multiMapJoin1.q 86b0586 ql/src/test/queries/clientpositive/multiMapJoin2.q PRE-CREATION ql/src/test/queries/clientpositive/union34.q a88e395 ql/src/test/results/clientpositive/auto_join0.q.out c48181d ql/src/test/results/clientpositive/auto_join10.q.out deb8eb5 ql/src/test/results/clientpositive/auto_join11.q.out 82bc3f9 ql/src/test/results/clientpositive/auto_join12.q.out 1a170cb ql/src/test/results/clientpositive/auto_join13.q.out 948ca70 ql/src/test/results/clientpositive/auto_join15.q.out aa40cff ql/src/test/results/clientpositive/auto_join16.q.out 06d73d8 ql/src/test/results/clientpositive/auto_join2.q.out a11f347 ql/src/test/results/clientpositive/auto_join20.q.out cae120a 
ql/src/test/results/clientpositive/auto_join21.q.out 423094d ql/src/test/results/clientpositive/auto_join22.q.out 6f418db ql/src/test/results/clientpositive/auto_join23.q.out 6a6bc6c ql/src/test/results/clientpositive/auto_join24.q.out c7e872e ql/src/test/results/clientpositive/auto_join26.q.out 7268755 ql/src/test/results/clientpositive/auto_join28.q.out 89db4aa ql/src/test/results/clientpositive/auto_join29.q.out c3744f3 ql/src/test/results/clientpositive/auto_join32.q.out 312664a ql/src/test/results/clientpositive/auto_join33.q.out 8fc0e84 ql/src/test/results/clientpositive/auto_sortmerge_join_10.q.out da375f6 ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 9769bd8 ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 5c4ba5b ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out 6add99a ql/src/test/results/clientpositive/correlationoptimizer1.q.out db3bd78 ql/src/test/results/clientpositive/correlationoptimizer3.q.out cfa7eff ql/src/test/results/clientpositive/correlationoptimizer4.q.out 285a54f ql/src/test/results/clientpositive/correlationoptimizer6.q.out b0438e6 ql/src/test/results/clientpositive/correlationoptimizer7.q.out f8db2bf ql/src/test/results/clientpositive/join28.q.out 60165e2 ql/src/test/results/clientpositive/join32.q.out 41d183b ql/src/test/results/clientpositive/join33.q.out 41d183b ql/src/test/results/clientpositive/join_star.q.out 797b892 ql/src/test/results/clientpositive/mapjoin_filter_on_outerjoin.q.out 0fab62f ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 2f5f613 ql/src/test/results/clientpositive/mapjoin_subquery.q.out 8243c2c ql/src/test/results/clientpositive/mapjoin_subquery2.q.out 292abe4 ql/src/test/results/clientpositive/mapjoin_test_outer.q.out 37817d9 ql/src/test/results/clientpositive/multiMapJoin1.q.out a3f5c53 ql/src/test/results/clientpositive/multiMapJoin2.q.out PRE-CREATION ql/src/test/results/clientpositive/multi_join_union.q.out 5182bdf 
ql/src/test/results/clientpositive/union34.q.out 166062a Diff: https://reviews.apache.org/r/12795/diff/ Testing --- Running tests. Thanks, Yin Huai
[jira] [Updated] (HIVE-4827) Merge a Map-only job to its following MapReduce job with multiple inputs
[ https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4827: --- Attachment: HIVE-4827.6.patch update Merge a Map-only job to its following MapReduce job with multiple inputs Key: HIVE-4827 URL: https://issues.apache.org/jira/browse/HIVE-4827 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a Map-only job (MapJoin) to its following MapReduce job. But this merge only happens when the MapReduce job has a single input. With Correlation Optimizer (HIVE-2206), it is possible that the MapReduce job can have multiple inputs (for multiple operation paths). It is desired to improve CommonJoinResolver to merge a Map-only job to the corresponding Map task of the MapReduce job. Example: {code:sql} set hive.optimize.correlation=true; set hive.auto.convert.join=true; set hive.optimize.mapjoin.mapreduce=true; SELECT tmp1.key, count(*) FROM (SELECT x1.key1 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) GROUP BY x1.key1) tmp1 JOIN (SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2) GROUP BY x2.key2) tmp2 ON (tmp1.key = tmp2.key) GROUP BY tmp1.key; {code} In this query, join operations inside tmp1 and tmp2 will be converted to two MapJoins. With Correlation Optimizer, the aggregations in tmp1 and tmp2, the join of tmp1 and tmp2, and the last aggregation will be executed in the same MapReduce job (Reduce side). Since this MapReduce job has two inputs, right now, CommonJoinResolver cannot attach two MapJoins to the Map side of a MapReduce job. 
Another example: {code:sql} SELECT tmp1.key FROM (SELECT x1.key2 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) UNION ALL SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1 {code} For this case, we will have three Map-only jobs (two for MapJoins and one for Union). It will be good to use a single Map-only job to execute this query. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724316#comment-13724316 ] Yin Huai commented on HIVE-4952: Seems the failed test is caused by HIVE-4955 When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4952.D11889.1.patch, replay.txt If we have a query like this ... {code:sql} SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; {code} After Correlation Optimizer, the operator tree in the reducer will be {code}
   JOIN2
     |
     |
    MUX
    / \
   /   \
 GBY    |
  |     |
JOIN1   |
   \   /
    \ /
   DEMUX
{code} For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even though it has not got any row from the left table. The logic related to hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4955) serde_user_properties.q.out needs to be updated
[ https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724320#comment-13724320 ] Hudson commented on HIVE-4955: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/]) HIVE-4955: serde_user_properties.q.out needs to be updated (Thejas M Nair via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508429) * /hive/trunk/ql/src/test/results/clientpositive/serde_user_properties.q.out serde_user_properties.q.out needs to be updated --- Key: HIVE-4955 URL: https://issues.apache.org/jira/browse/HIVE-4955 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4955.1.patch The testcase TestCliDriver.testCliDriver_serde_user_properties was added in HIVE-2906, which was committed just a few minutes before HIVE-4825. HIVE-4825 has changes that alter the expected results of serde_user_properties.q, causing the test to fail now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4928) Date literals do not work properly in partition spec clause
[ https://issues.apache.org/jira/browse/HIVE-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724322#comment-13724322 ] Hudson commented on HIVE-4928: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/]) HIVE-4928 : Date literals do not work properly in partition spec clause (Jason Dere via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508534) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java * /hive/trunk/ql/src/test/queries/clientpositive/partition_date2.q * /hive/trunk/ql/src/test/results/clientpositive/partition_date2.q.out Date literals do not work properly in partition spec clause --- Key: HIVE-4928 URL: https://issues.apache.org/jira/browse/HIVE-4928 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: HIVE-4928.1.patch.txt, HIVE-4928.D11871.1.patch The partition spec parsing doesn't do any real evaluation of the values in the partition spec, instead just taking the text value of the ASTNode representing the partition value. This works fine for string/numeric literals (expression tree below): (TOK_PARTVAL region 99) But not for Date literals, which are of the form DATE 'yyyy-mm-dd' (expression tree below): (TOK_DATELITERAL '1999-12-31') In this case the parser/analyzer uses TOK_DATELITERAL as the partition column value, when it should really get the value of the child of the DATELITERAL token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724325#comment-13724325 ] Hudson commented on HIVE-4525: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/]) HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail Bautin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508537) * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.12.0 Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
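The 2038 limit mentioned above is easy to check: 2^31 - 1 seconds after the epoch lands in January 2038, and a non-negative 31-bit seconds field cannot represent anything before 1970. A minimal sketch (not Hive's TimestampWritable code):

```java
import java.time.Instant;

// Back-of-the-envelope check of the range limit described above: 31 bits of
// seconds since the epoch run out early in 2038, and a non-negative field
// cannot represent pre-1970 timestamps at all.
public class TimestampRange {
    // Largest value representable in 31 bits of seconds.
    static Instant maxSeconds31() {
        return Instant.ofEpochSecond((1L << 31) - 1); // 2^31 - 1 seconds
    }

    public static void main(String[] args) {
        System.out.println(maxSeconds31()); // 2038-01-19T03:14:07Z
    }
}
```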
[jira] [Commented] (HIVE-2702) Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality
[ https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724324#comment-13724324 ] Hudson commented on HIVE-2702: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/]) HIVE-2702 : Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508539) * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality --- Key: HIVE-2702 URL: https://issues.apache.org/jira/browse/HIVE-2702 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: Aniket Mokashi Assignee: Sergey Shelukhin Fix For: 0.12.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, HIVE-2702.patch, HIVE-2702-v0.patch listPartitionsByFilter supports filtering only on string partition keys. This is because it's explicitly specified in generateJDOFilterOverPartitions in ExpressionTree.java:
{code}
// Can only support partitions whose types are string
if( ! table.getPartitionKeys().get(partitionColumnIndex).
    getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) {
  throw new MetaException("Filtering is supported only on partition keys of type string");
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
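To see why string-only filter evaluation matters for integral partition keys, here is an illustrative comparison (not metastore code): the string forms of numbers sort lexicographically, not numerically, so a filter evaluated over string values gives wrong answers for integral keys.

```java
// Why string-only filter evaluation is a problem for integral partition keys
// (illustrative, not the metastore's code): comparing the string forms of
// numbers gives lexicographic order, not numeric order.
public class StringVsIntegralCompare {
    static boolean stringGreater(String a, String b) {
        return a.compareTo(b) > 0;
    }

    static boolean numericGreater(String a, String b) {
        return Long.parseLong(a) > Long.parseLong(b);
    }

    public static void main(String[] args) {
        // A filter like "part > 10" evaluated over strings keeps partition "9":
        System.out.println(stringGreater("9", "10"));  // true  (wrong for numbers)
        System.out.println(numericGreater("9", "10")); // false (what users expect)
    }
}
```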
[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions
[ https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4954: -- Attachment: HIVE-4879.2.patch.txt PTFTranslator hardcodes ranking functions - Key: HIVE-4954 URL: https://issues.apache.org/jira/browse/HIVE-4954 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-4879.2.patch.txt, HIVE-4954.1.patch.txt {code}
protected static final ArrayList<String> RANKING_FUNCS = new ArrayList<String>();
static {
  RANKING_FUNCS.add("rank");
  RANKING_FUNCS.add("dense_rank");
  RANKING_FUNCS.add("percent_rank");
  RANKING_FUNCS.add("cume_dist");
};
{code} Move this logic to annotations -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
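The "move this logic to annotations" suggestion could look roughly like the following. This is a hypothetical sketch, not Hive's actual API: the @RankingFunction annotation and the stand-in UDAF classes are invented for illustration. Instead of consulting a hardcoded name list, the translator asks the function's class whether it is marked as a ranking function.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical sketch of annotation-driven ranking-function detection; the
// annotation and the stand-in classes below are NOT Hive's actual API.
public class RankingByAnnotation {
    // Marker annotation replacing the hardcoded RANKING_FUNCS name list.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface RankingFunction {}

    @RankingFunction
    static class GenericUDAFRank {}   // stand-in for a ranking UDAF

    static class GenericUDAFSum {}    // stand-in for a non-ranking UDAF

    // The translator checks the class instead of a name table.
    static boolean isRankingFunction(Class<?> udafClass) {
        return udafClass.isAnnotationPresent(RankingFunction.class);
    }

    public static void main(String[] args) {
        System.out.println(isRankingFunction(GenericUDAFRank.class)); // true
        System.out.println(isRankingFunction(GenericUDAFSum.class));  // false
    }
}
```

Adding a new ranking function then requires only annotating its class, with no translator edit.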
[jira] [Updated] (HIVE-4624) Integrate Vectorzied Substr into Vectorized QE
[ https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4624: -- Attachment: HIVE-4624.1-vectorization.patch Integrate Vectorzied Substr into Vectorized QE -- Key: HIVE-4624 URL: https://issues.apache.org/jira/browse/HIVE-4624 Project: Hive Issue Type: Sub-task Reporter: Timothy Chen Assignee: Eric Hanson Attachments: HIVE-4624.1-vectorization.patch Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it can be leveraged. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4960) lastAlias in CommonJoinOperator is not used
Yin Huai created HIVE-4960: -- Summary: lastAlias in CommonJoinOperator is not used Key: HIVE-4960 URL: https://issues.apache.org/jira/browse/HIVE-4960 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor In CommonJoinOperator, there is an object called lastAlias. The initial value of this object is 'null'. After tracing the usage of this object, I found that there is no place to change the value of this object. Also, it is only used in processOp in JoinOperator and MapJoinOperator as {code} if ((lastAlias == null) || (!lastAlias.equals(alias))) { nextSz = joinEmitInterval; } {code} Since lastAlias will always be null, we will assign joinEmitInterval to nextSz every time we get a row. Later in processOp, we have {code} nextSz = getNextSize(nextSz); {code} Because we reset the value of nextSz to joinEmitInterval every time we get a row, it seems that getNextSize will not be used as expected. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
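The dead-code behavior described above can be reproduced in a few lines. The class below is illustrative only (the names mirror CommonJoinOperator, but the getNextSize growth rule and the per-row call are simplifying assumptions): because lastAlias is never assigned, the reset branch fires on every row, so nextSz never compounds.

```java
// Minimal reproduction of the dead code described above; names mirror
// CommonJoinOperator but the class and growth rule are illustrative.
public class LastAliasDemo {
    static final Byte lastAlias = null;   // never written anywhere, as in the bug
    static final int joinEmitInterval = 4;

    static long getNextSize(long sz) {
        return sz * 8;                    // grow the interval, as intended
    }

    static long nextSzAfterRows(int rows, byte alias) {
        long nextSz = joinEmitInterval;
        for (int i = 0; i < rows; i++) {
            if (lastAlias == null || !lastAlias.equals(alias)) {
                nextSz = joinEmitInterval; // always taken: lastAlias stays null
            }
            nextSz = getNextSize(nextSz);
        }
        return nextSz;
    }

    public static void main(String[] args) {
        // Regardless of how many rows were seen, the interval never compounds.
        System.out.println(nextSzAfterRows(1, (byte) 0));    // 32
        System.out.println(nextSzAfterRows(1000, (byte) 0)); // 32
    }
}
```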
[jira] [Updated] (HIVE-4624) Integrate Vectorzied Substr into Vectorized QE
[ https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4624: -- Affects Version/s: vectorization-branch Status: Patch Available (was: Open) Integrate Vectorzied Substr into Vectorized QE -- Key: HIVE-4624 URL: https://issues.apache.org/jira/browse/HIVE-4624 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Timothy Chen Assignee: Eric Hanson Attachments: HIVE-4624.1-vectorization.patch Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it can be leveraged. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4624) Integrate Vectorized Substr into Vectorized QE
[ https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4624: -- Summary: Integrate Vectorized Substr into Vectorized QE (was: Integrate Vectorzied Substr into Vectorized QE) Integrate Vectorized Substr into Vectorized QE -- Key: HIVE-4624 URL: https://issues.apache.org/jira/browse/HIVE-4624 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Timothy Chen Assignee: Eric Hanson Attachments: HIVE-4624.1-vectorization.patch Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it can be leveraged. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4624) Integrate Vectorzied Substr into Vectorized QE
[ https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724350#comment-13724350 ] Eric Hanson commented on HIVE-4624: --- This patch includes changes to VectorizationContext to enable SUBSTR() to run end-to-end, as well as bug fixes and unit test fixes related to StringSubstrColStart and StringSubstrColStartLen. I did ad hoc tests from the console to test a large number of variations of use of SUBSTR() in vectorized mode. Integrate Vectorzied Substr into Vectorized QE -- Key: HIVE-4624 URL: https://issues.apache.org/jira/browse/HIVE-4624 Project: Hive Issue Type: Sub-task Reporter: Timothy Chen Assignee: Eric Hanson Attachments: HIVE-4624.1-vectorization.patch Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it can be leveraged. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2702) Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality
[ https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724420#comment-13724420 ] Hudson commented on HIVE-2702: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/]) HIVE-2702 : Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508539) * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality --- Key: HIVE-2702 URL: https://issues.apache.org/jira/browse/HIVE-2702 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: Aniket Mokashi Assignee: Sergey Shelukhin Fix For: 0.12.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, HIVE-2702.patch, HIVE-2702-v0.patch listPartitionsByFilter supports only non-string partitions. This is because its explicitly specified in generateJDOFilterOverPartitions in ExpressionTree.java. //Can only support partitions whose types are string if( ! table.getPartitionKeys().get(partitionColumnIndex). 
getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) { throw new MetaException("Filtering is supported only on partition keys of type string"); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3256) Update asm version in Hive
[ https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724416#comment-13724416 ] Hudson commented on HIVE-3256: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/]) HIVE-3256: Update asm version in Hive (Ashutosh Chauhan via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508506) * /hive/trunk/ivy/libraries.properties * /hive/trunk/metastore/ivy.xml Update asm version in Hive -- Key: HIVE-3256 URL: https://issues.apache.org/jira/browse/HIVE-3256 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Zhenxiao Luo Assignee: Ashutosh Chauhan Fix For: 0.12.0 Attachments: HIVE-3256.patch Hive trunk is currently using asm version 3.1; Hadoop trunk is on 3.2. Any objections to bumping the Hive version to 3.2 to be in line with Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724421#comment-13724421 ] Hudson commented on HIVE-4525: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/]) HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail Bautin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508537) * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.12.0 Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
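The 1970/2038 bounds in the HIVE-4525 description fall directly out of the 31-bit seconds field; a quick standalone check of the representable range (plain JDK, nothing Hive-specific):

```java
import java.time.Instant;

public class TimestampRange {
    public static void main(String[] args) {
        // 31 bits of seconds since the Unix epoch: 0 .. 2^31 - 1
        long maxSeconds = (1L << 31) - 1;            // 2147483647
        Instant lower = Instant.ofEpochSecond(0);
        Instant upper = Instant.ofEpochSecond(maxSeconds);
        System.out.println(lower);  // 1970-01-01T00:00:00Z
        System.out.println(upper);  // 2038-01-19T03:14:07Z: the classic Y2038 limit
    }
}
```

Anything before the epoch or after early 2038 simply cannot be encoded in those 31 bits, which is what the patch lifts.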
[jira] [Commented] (HIVE-3264) Add support for binary datatype to AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724418#comment-13724418 ] Hudson commented on HIVE-3264: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/]) HIVE-3264 : Add support for binary datatype to AvroSerde (Eli Reisman & Mark Wagner via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508528) * /hive/trunk/data/files/csv.txt * /hive/trunk/ql/src/test/queries/clientpositive/avro_nullable_fields.q * /hive/trunk/ql/src/test/results/clientpositive/avro_nullable_fields.q.out * /hive/trunk/ql/src/test/results/clientpositive/avro_schema_literal.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java Add support for binary datatype to AvroSerde --- Key: HIVE-3264 URL: https://issues.apache.org/jira/browse/HIVE-3264 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Jakob Homan Assignee: Eli Reisman Labels: patch Fix For: 0.12.0 Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch, HIVE-3264.7.patch When the AvroSerde was written, Hive didn't have a binary type, so Avro's byte array type is converted to an array of small ints. Now that HIVE-2380 is in, this step isn't necessary and we can convert both Avro's bytes type and probably fixed type to Hive's binary type. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4928) Date literals do not work properly in partition spec clause
[ https://issues.apache.org/jira/browse/HIVE-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724417#comment-13724417 ] Hudson commented on HIVE-4928: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/]) HIVE-4928 : Date literals do not work properly in partition spec clause (Jason Dere via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508534) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java * /hive/trunk/ql/src/test/queries/clientpositive/partition_date2.q * /hive/trunk/ql/src/test/results/clientpositive/partition_date2.q.out Date literals do not work properly in partition spec clause --- Key: HIVE-4928 URL: https://issues.apache.org/jira/browse/HIVE-4928 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: HIVE-4928.1.patch.txt, HIVE-4928.D11871.1.patch The partition spec parsing doesn't do any real evaluation of the values in the partition spec, instead just taking the text value of the ASTNode representing the partition value. This works fine for string/numeric literals (expression tree below): (TOK_PARTVAL region 99) But not for Date literals, which are of the form DATE 'yyyy-mm-dd' (expression tree below): (TOK_DATELITERAL '1999-12-31') In this case the parser/analyzer uses TOK_DATELITERAL as the partition column value, when it should really get the value of the child of the DATELITERAL token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4960) lastAlias in CommonJoinOperator is not used
[ https://issues.apache.org/jira/browse/HIVE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4960: -- Attachment: HIVE-4960.D11895.1.patch yhuai requested code review of HIVE-4960 [jira] lastAlias in CommonJoinOperator is not used. Reviewers: JIRA first commit In CommonJoinOperator, there is object called lastAlias. The initial value of this object is 'null'. After tracing the usage of this object, I found that there is no place to change the value of this object. Also, it is only used in processOp in JoinOperator and MapJoinOperator as if ((lastAlias == null) || (!lastAlias.equals(alias))) { nextSz = joinEmitInterval; } Since lastAlias will always be null, we will assign joinEmitInterval to nextSz every time we get a row. Later in processOp, we have nextSz = getNextSize(nextSz); Because we reset the value of nextSz to joinEmitInterval every time we get a row, seems that getNextSize will not be used as expected. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11895 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/28341/ To: JIRA, yhuai lastAlias in CommonJoinOperator is not used --- Key: HIVE-4960 URL: https://issues.apache.org/jira/browse/HIVE-4960 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-4960.D11895.1.patch In CommonJoinOperator, there is object called lastAlias. The initial value of this object is 'null'. After tracing the usage of this object, I found that there is no place to change the value of this object. 
Also, it is only used in processOp in JoinOperator and MapJoinOperator as {code} if ((lastAlias == null) || (!lastAlias.equals(alias))) { nextSz = joinEmitInterval; } {code} Since lastAlias will always be null, we will assign joinEmitInterval to nextSz every time we get a row. Later in processOp, we have {code} nextSz = getNextSize(nextSz); {code} Because we reset the value of nextSz to joinEmitInterval every time we get a row, it seems that getNextSize will not be used as expected. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
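The effect described in HIVE-4960 can be simulated outside Hive. The sketch below uses hypothetical stand-ins (this getNextSize is an illustrative geometric-growth placeholder, not the real CommonJoinOperator code), but it shows why nextSz can never grow while lastAlias stays null:

```java
public class LastAliasDemo {
    // Stand-in for CommonJoinOperator.getNextSize: grow geometrically, capped.
    // The real implementation differs; only the reset behaviour matters here.
    static int getNextSize(int sz) {
        return Math.min(sz * 2, 100000);
    }

    public static void main(String[] args) {
        final int joinEmitInterval = 1000;
        Object lastAlias = null;   // never reassigned anywhere, as the report notes
        Object alias = "a";
        int nextSz = joinEmitInterval;

        for (int row = 0; row < 5; row++) {
            // The guard from processOp: always true while lastAlias stays null,
            // so nextSz is reset to joinEmitInterval on every row.
            if ((lastAlias == null) || (!lastAlias.equals(alias))) {
                nextSz = joinEmitInterval;
            }
            nextSz = getNextSize(nextSz);  // grows once, then is reset next row
            System.out.println(nextSz);    // prints 2000 on every row
        }
    }
}
```

The intended geometric growth of the emit interval never happens: each row sees the same reset-then-grow-once cycle.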
[jira] [Updated] (HIVE-4960) lastAlias in CommonJoinOperator is not used
[ https://issues.apache.org/jira/browse/HIVE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4960: --- Status: Patch Available (was: Open) lastAlias in CommonJoinOperator is not used --- Key: HIVE-4960 URL: https://issues.apache.org/jira/browse/HIVE-4960 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-4960.D11895.1.patch In CommonJoinOperator, there is an object called lastAlias. The initial value of this object is 'null'. After tracing the usage of this object, I found that there is no place to change the value of this object. Also, it is only used in processOp in JoinOperator and MapJoinOperator as {code} if ((lastAlias == null) || (!lastAlias.equals(alias))) { nextSz = joinEmitInterval; } {code} Since lastAlias will always be null, we will assign joinEmitInterval to nextSz every time we get a row. Later in processOp, we have {code} nextSz = getNextSize(nextSz); {code} Because we reset the value of nextSz to joinEmitInterval every time we get a row, it seems that getNextSize will not be used as expected. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Status: Open (was: Patch Available) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.patch Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Status: Patch Available (was: Open) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.patch Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4827) Merge a Map-only job to its following MapReduce job with multiple inputs
[ https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724477#comment-13724477 ] Gunther Hagleitner commented on HIVE-4827: -- This looks really good. Just a few smaller things: - Can you add tests for cascading mapjoins? Something like 5 joins that will produce two groups 3 + 2. - Can you add test to show that setting the threshold to 0 effectively turns off the optimization? Finally, I think it makes sense to remove the old flag since it no longer applies. But can you fill out the release notes section of this ticket and describe how 'optimize.mapjoin.mapreduce' is superseded by threshold = 0 or noconditionaltask? Merge a Map-only job to its following MapReduce job with multiple inputs Key: HIVE-4827 URL: https://issues.apache.org/jira/browse/HIVE-4827 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a Map-only job (MapJoin) to its following MapReduce job. But this merge only happens when the MapReduce job has a single input. With Correlation Optimizer (HIVE-2206), it is possible that the MapReduce job can have multiple inputs (for multiple operation paths). It is desired to improve CommonJoinResolver to merge a Map-only job to the corresponding Map task of the MapReduce job. 
Example: {code:sql} set hive.optimize.correlation=true; set hive.auto.convert.join=true; set hive.optimize.mapjoin.mapreduce=true; SELECT tmp1.key, count(*) FROM (SELECT x1.key1 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) GROUP BY x1.key1) tmp1 JOIN (SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2) GROUP BY x2.key2) tmp2 ON (tmp1.key = tmp2.key) GROUP BY tmp1.key; {code} In this query, join operations inside tmp1 and tmp2 will be converted to two MapJoins. With Correlation Optimizer, aggregations in tmp1, tmp2, and join of tmp1 and tmp2, and the last aggregation will be executed in the same MapReduce job (Reduce side). Since this MapReduce job has two inputs, right now, CommonJoinResolver cannot attach two MapJoins to the Map side of a MapReduce job. Another example: {code:sql} SELECT tmp1.key FROM (SELECT x1.key2 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) UNION ALL SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1 {code} For this case, we will have three Map-only jobs (two for MapJoins and one for Union). It will be good to use a single Map-only job to execute this query. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4827) Merge a Map-only job to its following MapReduce job with multiple inputs
[ https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4827: --- Status: Open (was: Patch Available) canceling patch. Will update later Merge a Map-only job to its following MapReduce job with multiple inputs Key: HIVE-4827 URL: https://issues.apache.org/jira/browse/HIVE-4827 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a Map-only job (MapJoin) to its following MapReduce job. But this merge only happens when the MapReduce job has a single input. With Correlation Optimizer (HIVE-2206), it is possible that the MapReduce job can have multiple inputs (for multiple operation paths). It is desired to improve CommonJoinResolver to merge a Map-only job to the corresponding Map task of the MapReduce job. Example: {code:sql} set hive.optimize.correlation=true; set hive.auto.convert.join=true; set hive.optimize.mapjoin.mapreduce=true; SELECT tmp1.key, count(*) FROM (SELECT x1.key1 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) GROUP BY x1.key1) tmp1 JOIN (SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2) GROUP BY x2.key2) tmp2 ON (tmp1.key = tmp2.key) GROUP BY tmp1.key; {code} In this query, join operations inside tmp1 and tmp2 will be converted to two MapJoins. With Correlation Optimizer, aggregations in tmp1, tmp2, and join of tmp1 and tmp2, and the last aggregation will be executed in the same MapReduce job (Reduce side). Since this MapReduce job has two inputs, right now, CommonJoinResolver cannot attach two MapJoins to the Map side of a MapReduce job. 
Another example: {code:sql} SELECT tmp1.key FROM (SELECT x1.key2 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) UNION ALL SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1 {code} For this case, we will have three Map-only jobs (two for MapJoins and one for Union). It will be good to use a single Map-only job to execute this query. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2137) JDBC driver doesn't encode string properly.
[ https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724484#comment-13724484 ] Hive QA commented on HIVE-2137: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595024/HIVE-2137.patch {color:green}SUCCESS:{color} +1 2749 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/245/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/245/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. JDBC driver doesn't encode string properly. --- Key: HIVE-2137 URL: https://issues.apache.org/jira/browse/HIVE-2137 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Jin Adachi Labels: patch Fix For: 0.12.0 Attachments: HIVE-2137.patch, HIVE-2137.patch, HIVE-2137.patch The JDBC driver for HiveServer1 decodes strings with the client-side default encoding, which depends on the operating system unless another encoding is specified. It ignores the server-side encoding. For example, when the server-side operating system and encoding are Linux (UTF-8) and the client-side operating system and encoding are Windows (Shift_JIS, a Japanese charset), character corruption happens in the client. In the current implementation of Hive, UTF-8 appears to be expected on the server side, so the client side should encode/decode strings as UTF-8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
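The corruption described in HIVE-2137 is easy to reproduce with the plain JDK, independent of Hive; a minimal sketch of what happens when UTF-8 bytes from the server are decoded with a Shift_JIS client-side default:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    public static void main(String[] args) {
        String original = "データ";  // Japanese text as the server would send it
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Client decodes with its OS default instead of the server's UTF-8:
        String corrupted = new String(utf8Bytes, Charset.forName("Shift_JIS"));
        String correct = new String(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println(correct.equals(original));    // true
        System.out.println(corrupted.equals(original));  // false: mojibake
    }
}
```

Pinning both sides to UTF-8 (as the patch effectively does) makes the round trip lossless regardless of the client OS default.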
[jira] [Commented] (HIVE-4055) add Date data type
[ https://issues.apache.org/jira/browse/HIVE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724515#comment-13724515 ] Lars Francke commented on HIVE-4055: It'd be great if you could document this new data type in the Wiki. add Date data type -- Key: HIVE-4055 URL: https://issues.apache.org/jira/browse/HIVE-4055 Project: Hive Issue Type: Sub-task Components: JDBC, Query Processor, Serializers/Deserializers, UDF Reporter: Sun Rui Assignee: Jason Dere Fix For: 0.12.0 Attachments: Date.pdf, HIVE-4055.1.patch.txt, HIVE-4055.2.patch.txt, HIVE-4055.3.patch.txt, HIVE-4055.4.patch, HIVE-4055.4.patch.txt, HIVE-4055.D11547.1.patch Add Date data type, a new primitive data type which supports the standard SQL date type. Basically, the implementation can take HIVE-2272 and HIVE-2957 as references. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2608: -- Attachment: HIVE-2608.D4317.7.patch navis updated the revision HIVE-2608 [jira] Do not require AS a,b,c part in LATERAL VIEW. Marked '@Deprecated' for error message which would not be used further Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D4317 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D4317?vs=35643id=36597#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/test/queries/clientpositive/lateral_view_noalias.q ql/src/test/results/clientpositive/lateral_view_noalias.q.out To: JIRA, ashutoshc, navis Cc: ikabiljo Do not require AS a,b,c part in LATERAL VIEW Key: HIVE-2608 URL: https://issues.apache.org/jira/browse/HIVE-2608 Project: Hive Issue Type: Improvement Components: Query Processor, UDF Reporter: Igor Kabiljo Assignee: Navis Priority: Minor Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, HIVE-2608.D4317.6.patch, HIVE-2608.D4317.7.patch Currently, it is required to state column names when LATERAL VIEW is used. That shouldn't be necessary, since UDTF returns struct which contains column names - and they should be used by default. For example, it would be great if this was possible: SELECT t.*, t.key1 + t.key4 FROM some_table LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key3') t; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4954) PTFTranslator hardcodes ranking functions
[ https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724561#comment-13724561 ] Hive QA commented on HIVE-4954: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595032/HIVE-4879.2.patch.txt {color:green}SUCCESS:{color} +1 2749 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/248/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/248/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. PTFTranslator hardcodes ranking functions - Key: HIVE-4954 URL: https://issues.apache.org/jira/browse/HIVE-4954 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-4879.2.patch.txt, HIVE-4954.1.patch.txt protected static final ArrayList<String> RANKING_FUNCS = new ArrayList<String>(); static { RANKING_FUNCS.add("rank"); RANKING_FUNCS.add("dense_rank"); RANKING_FUNCS.add("percent_rank"); RANKING_FUNCS.add("cume_dist"); }; Move this logic to annotations -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
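One way the "move this logic to annotations" suggestion could look; this is a hypothetical sketch (the @RankingFunction annotation below is invented for illustration and is not an actual Hive API):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class RankingFuncs {
    // Hypothetical marker annotation standing in for the proposal above.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface RankingFunction {
        String value();  // the function name the translator should recognize
    }

    @RankingFunction("rank")       static class Rank {}
    @RankingFunction("dense_rank") static class DenseRank {}

    // Instead of a hardcoded RANKING_FUNCS list, collect names by reflection.
    static List<String> rankingFuncs(Class<?>... candidates) {
        List<String> names = new ArrayList<>();
        for (Class<?> c : candidates) {
            RankingFunction rf = c.getAnnotation(RankingFunction.class);
            if (rf != null) names.add(rf.value());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(rankingFuncs(Rank.class, DenseRank.class));
        // [rank, dense_rank]
    }
}
```

The point of the design: adding a new ranking function then means annotating its class, not editing a static list in PTFTranslator.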
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724575#comment-13724575 ] Guilherme Braccialli commented on HIVE-896: --- Harish, I noticed that the NPath class is in the Hive 0.11 source and it's also a known function on hive. Is it working? Could you please give us a sample query? I tried the query below, but it's not working. Thanks. create external table flights_tiny (ORIGIN_CITY_NAME string, DEST_CITY_NAME string, YEAR int, MONTH int, DAY_OF_MONTH int, ARR_DELAY float, FL_NUM string) location '/user/x'; select npath( 'ONTIME.LATE+', 'LATE', arr_delay > 15, 'EARLY', arr_delay < 0, 'ONTIME', arr_delay >= 0 and arr_delay <= 15, 'origin_city_name, fl_num, year, month, day_of_month, size(tpath) as sz, tpath as tpath' ) from flights_tiny; FAILED: NullPointerException null Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. --- Key: HIVE-896 URL: https://issues.apache.org/jira/browse/HIVE-896 Project: Hive Issue Type: New Feature Components: OLAP, UDF Reporter: Amr Awadallah Assignee: Harish Butani Priority: Minor Fix For: 0.11.0 Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt, hive-896.3.patch.txt, HIVE-896.4.patch, HIVE-896.5.patch.txt Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032 -- amr -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3455) ANSI CORR(X,Y) is incorrect
[ https://issues.apache.org/jira/browse/HIVE-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jon Hartlaub updated HIVE-3455: --- Attachment: HIVE3455.corrTest.tar.gz Attached are a data file, a .q file and a .q.out file that exercise the corr merge problem. The patch in this ticket passes this test. (test correlates a variable with itself using a CLUSTER BY column) ANSI CORR(X,Y) is incorrect --- Key: HIVE-3455 URL: https://issues.apache.org/jira/browse/HIVE-3455 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0 Reporter: Maxim Bolotin Labels: patch Attachments: HIVE3455.corrTest.tar.gz, my.patch A simple test with 2 collinear vectors returns a wrong result. The problem is the merge of variances, file: http://svn.apache.org/viewvc/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java?revision=1157222view=markup lines: 347: myagg.xvar += xvarB + (xavgA - xavgB) * (xavgA - xavgB) * myagg.count; 348: myagg.yvar += yvarB + (yavgA - yavgB) * (yavgA - yavgB) * myagg.count; the correct merge should be like this: 347: myagg.xvar += xvarB + (xavgA - xavgB) * (xavgA - xavgB) / myagg.count * nA * nB; 348: myagg.yvar += yvarB + (yavgA - yavgB) * (yavgA - yavgB) / myagg.count * nA * nB; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
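The corrected merge from HIVE-3455 can be sanity-checked numerically. A small standalone sketch (variable naming loosely follows the snippet above, where the "var" fields hold sums of squared deviations, not variances) compares the merged value against a direct computation over the union of the two partial aggregates:

```java
public class CorrMergeCheck {
    static double mean(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.length;
    }

    // sum of squared deviations from the mean (what the UDAF's "var" field holds)
    static double ssd(double[] xs, double m) {
        double s = 0;
        for (double x : xs) s += (x - m) * (x - m);
        return s;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {10, 20, 30, 40};

        double nA = a.length, nB = b.length;
        double avgA = mean(a), avgB = mean(b);
        double ssdA = ssd(a, avgA), ssdB = ssd(b, avgB);

        // Ground truth: direct computation over the union of both groups.
        double[] all = {1, 2, 3, 10, 20, 30, 40};
        double expected = ssd(all, mean(all));

        // Corrected merge from the ticket: divide by the merged count,
        // then scale by nA * nB.
        double merged = ssdA + ssdB
            + (avgA - avgB) * (avgA - avgB) / (nA + nB) * nA * nB;

        System.out.println(Math.abs(merged - expected) < 1e-9);  // true
    }
}
```

With the buggy formula (multiplying by the merged count instead of dividing by it and scaling by nA * nB), the merged value would overshoot the direct computation, which is exactly the CORR error the attached test exercises.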
Review Request (wikidoc): LZO Compression in Hive
Hi, I met with Lefty this afternoon and she was kind enough to spend time adding my documentation to the site, since I still don't have editing privileges :-) Please review the new wikidoc about LZO compression in the Hive language manual. If anything is unclear or needs more information, you can email suggestions to this list or edit the wiki yourself (if you have editing privileges). Here are the links: 1. Language Manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual (new bullet under File Formats) 2. LZO Compression: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO 3. CREATE TABLE: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable (near end of section, pasted in here:) Use STORED AS TEXTFILE if the data needs to be stored as plain text files. Use STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more about CompressedStorage (https://cwiki.apache.org/confluence/display/Hive/CompressedStorage) if you are planning to keep data compressed in your Hive tables. Use INPUTFORMAT and OUTPUTFORMAT to specify the name of a corresponding InputFormat and OutputFormat class as a string literal, e.g., 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For LZO compression, the values to use are 'INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' (see LZO Compression: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO). My cwiki id is https://cwiki.apache.org/confluence/display/~sanjaysubraman...@yahoo.com It would be great if I could get edit privileges. Thanks, sanjay
[jira] [Updated] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4051: -- Attachment: HIVE-4051.D11805.3.patch sershe updated the revision HIVE-4051 [jira] Hive's metastore suffers from 1+N queries when querying partitions is slow. Addressed Phabricator comments, fixed minor bugs (e.g. null checks, setTableName called instead of setDbName), added column schemas that ended up being needed after all, cleaned up the code a bit, added some short circuiting, added order to tests that had undefined order and so depended on the order in which partitions are returned (ORM code returns them by name, SQL by ID). Added some short-circuiting to the queries/getting stuff. I compared the reflection-based dump of SQL- and ORM- based objects from some tests (code not included) and they are the same. The existing tests seem to adequately cover this code. The only concern is that if it fails it's impossible to see as it falls back to ORM... 
Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D11805 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11805?vs=36357&id=36615#toc AFFECTED FILES build.xml common/src/java/org/apache/hadoop/hive/conf/HiveConf.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/test/queries/clientpositive/alter_partition_coltype.q ql/src/test/queries/clientpositive/load_dyn_part3.q ql/src/test/queries/clientpositive/load_dyn_part4.q ql/src/test/queries/clientpositive/load_dyn_part9.q ql/src/test/queries/clientpositive/ppr_pushdown2.q ql/src/test/queries/clientpositive/stats4.q ql/src/test/results/clientpositive/load_dyn_part3.q.out ql/src/test/results/clientpositive/load_dyn_part4.q.out ql/src/test/results/clientpositive/load_dyn_part9.q.out ql/src/test/results/clientpositive/ppr_pushdown2.q.out ql/src/test/results/clientpositive/stats4.q.out To: JIRA, sershe Cc: brock Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, HIVE-4051.D11805.3.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the MTable/MPartition objects. For a hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing approximately 5900 queries to the mysql database. Several of those queries fetch exactly one row to create a single object on the client. 
The following 12 queries were repeated for each partition, generating a storm of SQL queries {code} 4 Query SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`=0 4 Query SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`=0 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` =4871 AND
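The pattern behind those numbers is the classic 1+N problem: one query for the partition list, then a fixed batch of follow-up queries per partition. The following self-contained sketch is not Hive code - the class, method names, and counts are purely illustrative - but it shows why batching lookups, as a direct-SQL path can, collapses the round-trip count from thousands to a handful:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simulates DB round trips for the per-partition (1+N)
// fetch pattern versus a single batched query (e.g. WHERE PART_ID IN (...)).
public class NPlusOneDemo {
    static int queryCount = 0;

    // ORM-style: one "query" per partition id.
    static List<String> fetchPerPartition(List<Integer> partIds) {
        List<String> result = new ArrayList<>();
        for (int id : partIds) {
            queryCount++;                 // SELECT ... WHERE PART_ID = <id>
            result.add("partition-" + id);
        }
        return result;
    }

    // Batched: one query for all ids.
    static List<String> fetchBatched(List<Integer> partIds) {
        queryCount++;                     // SELECT ... WHERE PART_ID IN (...)
        List<String> result = new ArrayList<>();
        for (int id : partIds) {
            result.add("partition-" + id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1800; i++) {
            ids.add(i);
        }

        queryCount = 0;
        fetchPerPartition(ids);
        System.out.println("per-partition round trips: " + queryCount);

        queryCount = 0;
        fetchBatched(ids);
        System.out.println("batched round trips: " + queryCount);
    }
}
```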
[jira] [Resolved] (HIVE-4524) Make the Hive HBaseStorageHandler work under HCat
[ https://issues.apache.org/jira/browse/HIVE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan resolved HIVE-4524. Resolution: Duplicate I'm going to mark this issue as a duplicate of HIVE-4331, since it attempts to correct the same problem, although with a different approach. Make the Hive HBaseStorageHandler work under HCat - Key: HIVE-4524 URL: https://issues.apache.org/jira/browse/HIVE-4524 Project: Hive Issue Type: Bug Components: HBase Handler, HCatalog Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: hbh4.patch Currently, HCatalog has its own HCatHBaseStorageHandler that extends from HBaseStorageHandler to allow for StorageHandler support, and does some translations, like org.apache.mapred -> org.apache.mapreduce wrapping, etc. However, this compatibility layer is not complete in functionality as it still assumes the underlying OutputFormat is a mapred.OutputFormat implementation as opposed to a HiveOutputFormat implementation, and it makes assumptions about config property copies that implementations of the HiveStorageHandler, such as the HBaseStorageHandler, do not make. To fix this, we need to improve the ability for HCat to properly load native-hive-style StorageHandlers. Also, since HCat has its own HBaseStorageHandler and we'd like to not maintain two separate HBaseStorageHandlers, the idea is to deprecate HCat's storage handler over time, and make sure that hive's HBaseStorageHandler works properly from HCat, and over time, have it reach feature parity with the HCat one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724626#comment-13724626 ] Xuefu Zhang commented on HIVE-4844: --- [~jdere] Thanks for sharing your work. I went through your patch and had some initial questions. I understand that your patch is still in progress, but I'm wondering what your thoughts are on how you plan to store the type params. Obviously, type params are metadata of a column, which needs to be stored. I assume that the hive schema needs to change to accommodate this. Secondly, SQL CHAR or VARCHAR seem to be special hive strings with additional restrictions. Your patch seems to treat them independently. Do you think type inheritance works here? Lastly, introducing param types seems non-trivial. Do you think a design doc or wiki page makes sense? Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4051: -- Attachment: HIVE-4051.D11805.4.patch sershe updated the revision HIVE-4051 [jira] Hive's metastore suffers from 1+N queries when querying partitions is slow. Followup - forgot to rerun one query after changing. Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D11805 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11805?vs=36615&id=36621#toc AFFECTED FILES build.xml common/src/java/org/apache/hadoop/hive/conf/HiveConf.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/test/queries/clientpositive/alter_partition_coltype.q ql/src/test/queries/clientpositive/load_dyn_part3.q ql/src/test/queries/clientpositive/load_dyn_part4.q ql/src/test/queries/clientpositive/load_dyn_part9.q ql/src/test/queries/clientpositive/ppr_pushdown2.q ql/src/test/queries/clientpositive/stats4.q ql/src/test/results/clientpositive/alter_partition_coltype.q.out ql/src/test/results/clientpositive/load_dyn_part3.q.out ql/src/test/results/clientpositive/load_dyn_part4.q.out ql/src/test/results/clientpositive/load_dyn_part9.q.out ql/src/test/results/clientpositive/ppr_pushdown2.q.out ql/src/test/results/clientpositive/stats4.q.out To: JIRA, sershe Cc: brock Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the
MTable/MPartition objects. For a hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing approximately 5900 queries to the mysql database. Several of those queries fetch exactly one row to create a single object on the client. The 12 queries repeated for each partition are the same ones listed in the earlier HIVE-4051 notification above. This data is not detached or cached, so this operation is performed during every query plan for the
Re: [jira] [Created] (HIVE-4954) PTFTranslator hardcodes ranking functions
Can I get some +1 love? I have 2 or 3 follow-ons. On Tuesday, July 30, 2013, Hive QA (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724561#comment-13724561] Hive QA commented on HIVE-4954: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595032/HIVE-4879.2.patch.txt {color:green}SUCCESS:{color} +1 2749 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/248/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/248/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. PTFTranslator hardcodes ranking functions - Key: HIVE-4954 URL: https://issues.apache.org/jira/browse/HIVE-4954 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-4879.2.patch.txt, HIVE-4954.1.patch.txt protected static final ArrayList<String> RANKING_FUNCS = new ArrayList<String>(); static { RANKING_FUNCS.add("rank"); RANKING_FUNCS.add("dense_rank"); RANKING_FUNCS.add("percent_rank"); RANKING_FUNCS.add("cume_dist"); }; Move this logic to annotations -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
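One way to read "move this logic to annotations" is a marker annotation on each ranking UDAF class, with the translator collecting the names by reflection instead of maintaining a hardcoded list. The sketch below uses made-up annotation and class names (Hive's actual windowing annotation, if any, may look different):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: each ranking function declares its own name via an
// annotation, replacing the hardcoded RANKING_FUNCS list in PTFTranslator.
public class RankingRegistry {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface RankingFunction {
        String name();
    }

    // Stand-ins for the real UDAF implementation classes.
    @RankingFunction(name = "rank")         static class Rank {}
    @RankingFunction(name = "dense_rank")   static class DenseRank {}
    @RankingFunction(name = "percent_rank") static class PercentRank {}
    @RankingFunction(name = "cume_dist")    static class CumeDist {}

    // Collect names from annotations instead of a static add() list.
    static Set<String> rankingFuncNames(Class<?>... classes) {
        Set<String> names = new HashSet<>();
        for (Class<?> c : classes) {
            RankingFunction rf = c.getAnnotation(RankingFunction.class);
            if (rf != null) {
                names.add(rf.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> funcs = rankingFuncNames(Rank.class, DenseRank.class,
                PercentRank.class, CumeDist.class);
        System.out.println(funcs);
    }
}
```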
Re: [jira] [Created] (HIVE-4844) Add char/varchar data types
As for the param types: how do we enforce these? If we have a LazySimpleSerDe and a varchar(10) column and we are reading a value with 11 chars, what do we do? Maybe we say that char is an alias and we do not actually enforce anything. On Tuesday, July 30, 2013, Xuefu Zhang (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724626#comment-13724626] Xuefu Zhang commented on HIVE-4844: --- [~jdere] Thanks for sharing your work. I went through your patch and had some initial questions. I understand that your patch is still in progress, but I'm wondering what your thoughts are on how you plan to store the type params. Obviously, type params are metadata of a column, which needs to be stored. I assume that the hive schema needs to change to accommodate this. Secondly, SQL CHAR or VARCHAR seem to be special hive strings with additional restrictions. Your patch seems to treat them independently. Do you think type inheritance works here? Lastly, introducing param types seems non-trivial. Do you think a design doc or wiki page makes sense? Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
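For illustration, one possible answer to the varchar(10)/11-character question is a read-side policy that truncates rather than fails. This is purely a hypothetical sketch of that policy, not anything from the patch under discussion:

```java
// Hypothetical read-side enforcement: values longer than the declared
// varchar length are silently truncated rather than rejected.
public class VarcharDemo {

    static String enforceVarchar(String value, int maxLength) {
        if (value == null || value.length() <= maxLength) {
            return value;
        }
        return value.substring(0, maxLength);
    }

    public static void main(String[] args) {
        // An 11-character value read into a varchar(10) column.
        System.out.println(enforceVarchar("hello hive!", 10)); // prints "hello hive"
    }
}
```

The alternative mentioned above - treating char/varchar as pure aliases for string - would make `enforceVarchar` the identity function for every input.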
[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
[ https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724653#comment-13724653 ] Chris Drome commented on HIVE-4574: --- [~thejas], I was wondering why the other methods that use XMLEncoder are not synchronized as well. Is there something specific about serializeExpression that makes it different? XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck -- Key: HIVE-4574 URL: https://issues.apache.org/jira/browse/HIVE-4574 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4574.1.patch In openjdk7, an XMLEncoder.writeObject call leads to calls to java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe because it uses a static WeakHashMap that can get used from multiple threads. See - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46 Concurrent access to HashMap implementations that are not thread safe can sometimes result in infinite loops and other problems. If jdk7 is in use, it makes sense to synchronize calls to XMLEncoder.writeObject. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
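A sketch of the kind of guard being discussed: funnel every XMLEncoder.writeObject call through one global lock so JDK7's non-thread-safe MethodFinder cache is never touched concurrently. The class, method, and lock names here are illustrative, not Hive's actual serializeExpression:

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative workaround for the JDK7 MethodFinder race: serialize under a
// single shared lock. The brief contention is cheap compared to the risk of
// an infinite loop inside a corrupted static WeakHashMap.
public class SafeXmlSerializer {
    private static final Object ENCODER_LOCK = new Object();

    public static String serializeExpression(Object expr) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        synchronized (ENCODER_LOCK) {
            XMLEncoder encoder = new XMLEncoder(out);
            encoder.writeObject(expr);
            encoder.close();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = serializeExpression("hello");
        System.out.println(xml);
    }
}
```

The open question in the comment above still applies to this sketch: if only one call site takes the lock, any other unsynchronized XMLEncoder user can still race it.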
[jira] [Updated] (HIVE-4827) Merge a Map-only job to its following MapReduce job with multiple inputs
[ https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4827: --- Release Note: Before applying this jira to trunk, CommonJoinTaskDispatcher had two methods, mergeMapJoinTaskWithChildMapJoinTask and mergeMapJoinTaskWithMapReduceTask. The first method tries to merge a map-only task (for MapJoin) into its child map-only task. The second method tries to merge a map-only task into its child MapReduce task (a task that has a reducer). There was a flag called hive.optimize.mapjoin.mapreduce that determined whether mergeMapJoinTaskWithMapReduceTask would be called. This work combines mergeMapJoinTaskWithChildMapJoinTask and mergeMapJoinTaskWithMapReduceTask, so a map-only task will be merged into its child task regardless of whether the child task is a map-only task or a MapReduce task, and hive.optimize.mapjoin.mapreduce is not needed any more. If a user wants to disable merging a map-only task into its child task, he or she can use either set hive.auto.convert.join.noconditionaltask=false; or set hive.auto.convert.join.noconditionaltask=true; set hive.auto.convert.join.noconditionaltask.size=0; was: Before applying this jira to trunk, CommonJoinTaskDispatcher has two methods, mergeMapJoinTaskWithChildMapJoinTask and mergeMapJoinTaskWithMapReduceTask. The first method tries to merge a map-only task (for MapJoin) to its child map-only task. The second method tries to merge a map-only task to its child MapReduce task (a task has a reducer). There was a flag called hive.optimize.mapjoin.mapreduce to determine if mergeMapJoinTaskWithMapReduceTask will be called. This work combines mergeMapJoinTaskWithChildMapJoinTask and mergeMapJoinTaskWithMapReduceTask. So, a map-only task will be merged into its child task no matter the child task is a map-only task or a MapReduce task. So hive.optimize.mapjoin.mapreduce is not needed any more. 
If a user wants to disable merging a map-only task to its child task, he or she can use either {code} set hive.auto.convert.join.noconditionaltask=false; {code} or {code} set hive.auto.convert.join.noconditionaltask=true; set hive.auto.convert.join.noconditionaltask.size=0; {code} Merge a Map-only job to its following MapReduce job with multiple inputs Key: HIVE-4827 URL: https://issues.apache.org/jira/browse/HIVE-4827 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a Map-only job (MapJoin) to its following MapReduce job. But this merge only happens when the MapReduce job has a single input. With Correlation Optimizer (HIVE-2206), it is possible that the MapReduce job can have multiple inputs (for multiple operation paths). It is desired to improve CommonJoinResolver to merge a Map-only job to the corresponding Map task of the MapReduce job. Example: {code:sql} set hive.optimize.correlation=true; set hive.auto.convert.join=true; set hive.optimize.mapjoin.mapreduce=true; SELECT tmp1.key, count(*) FROM (SELECT x1.key1 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) GROUP BY x1.key1) tmp1 JOIN (SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2) GROUP BY x2.key2) tmp2 ON (tmp1.key = tmp2.key) GROUP BY tmp1.key; {code} In this query, join operations inside tmp1 and tmp2 will be converted to two MapJoins. With Correlation Optimizer, aggregations in tmp1, tmp2, and join of tmp1 and tmp2, and the last aggregation will be executed in the same MapReduce job (Reduce side). Since this MapReduce job has two inputs, right now, CommonJoinResolver cannot attach two MapJoins to the Map side of a MapReduce job. 
Another example: {code:sql} SELECT tmp1.key FROM (SELECT x1.key2 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) UNION ALL SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1 {code} For this case, we will have three Map-only jobs (two for MapJoins and one for Union). It will be good to use a single Map-only job to execute this query. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira