[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267383#comment-14267383 ]

Hive QA commented on HIVE-3972:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12690472/HIVE-3972.10.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6724 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2276/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2276/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2276/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12690472 - PreCommit-HIVE-TRUNK-Build

Support using multiple reducer for fetching order by results
Key: HIVE-3972
URL: https://issues.apache.org/jira/browse/HIVE-3972
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, HIVE-3972.10.patch.txt, HIVE-3972.8.patch.txt, HIVE-3972.9.patch.txt, HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch

Queries that fetch results and end with an ORDER BY clause make the final MR job run with a single reducer, which can become a bottleneck. For example,
{code}
select value, sum(key) as sum from src group by value order by sum;
{code}
If the number of reducers is reasonable, the multiple result files could be merged into a single sorted stream at the fetcher level.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
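The description's idea of merging several sorted result files into one sorted stream at the fetcher level is a classic k-way merge. Below is a minimal Java sketch. It assumes each reducer's output can be exposed as an iterator that is already sorted on the ORDER BY key. KWayMerge and merge are hypothetical names, and this is not the patch's MergeSortingFetcher. For brevity the sketch collects the result into a list, whereas a fetcher would emit rows one at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Sketch: merge several individually sorted streams (one per reducer output
 * file) into a single globally sorted stream with a k-way merge.
 * Hypothetical illustration only; not the patch's MergeSortingFetcher.
 */
public final class KWayMerge {

    /** Heap entry: the current head row of one stream plus the stream it came from. */
    private static final class Head<T> {
        final T row;
        final Iterator<T> source;

        Head(T row, Iterator<T> source) {
            this.row = row;
            this.source = source;
        }
    }

    public static <T> List<T> merge(List<? extends Iterator<T>> sortedStreams,
                                    Comparator<? super T> cmp) {
        Comparator<Head<T>> byRow = (a, b) -> cmp.compare(a.row, b.row);
        // The heap holds at most one row per stream, so its size is O(k), not O(total rows).
        PriorityQueue<Head<T>> heap = new PriorityQueue<>(Math.max(1, sortedStreams.size()), byRow);
        for (Iterator<T> it : sortedStreams) {
            if (it.hasNext()) {
                heap.add(new Head<>(it.next(), it));
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll();
            out.add(smallest.row);
            // Advance only the stream that supplied the row just emitted.
            if (smallest.source.hasNext()) {
                heap.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three "reducer outputs", each already sorted on the ORDER BY key.
        List<Iterator<Integer>> parts = List.of(
                List.of(1, 4, 9).iterator(),
                List.of(2, 3, 10).iterator(),
                List.of(5, 6, 7, 8).iterator());
        System.out.println(merge(parts, Comparator.<Integer>naturalOrder()));
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

A heap keyed on each stream's current head row keeps every step at O(log k) for k reducer outputs. Only one row per stream is buffered at any time.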
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964821#comment-13964821 ]

Hive QA commented on HIVE-3972:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639337/HIVE-3972.9.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5558 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orderby_query_bucketing
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2195/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2195/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639337
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963757#comment-13963757 ]

Hive QA commented on HIVE-3972:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639142/HIVE-3972.8.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5556 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orderby_query_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2182/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2182/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639142
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963766#comment-13963766 ]

Brock Noland commented on HIVE-3972:

Looks like the .out file contains a ^A or something:
{noformat}
diff --git ql/src/test/results/clientpositive/orderby_query_bucketing.q.out ql/src/test/results/clientpositive/orderby_query_bucketing.q.out
new file mode 100644
index 000..c02b1c9
Binary files /dev/null and ql/src/test/results/clientpositive/orderby_query_bucketing.q.out differ
{noformat}
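One way to confirm a suspicion like this is to scan the result file for control bytes such as ^A (0x01, Hive's default field delimiter) or NUL. Below is a minimal Java sketch. ControlCharScanner is a hypothetical helper, not part of Hive's test framework. It only reports where such bytes occur; it does not model how any particular diff tool decides that a file is binary.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: list the control bytes (e.g. ^A = 0x01, NUL = 0x00) in a file such as
 * a .q.out test result. Tab, LF and CR are treated as ordinary text.
 * Hypothetical helper; not part of Hive's test framework.
 */
public final class ControlCharScanner {

    /** Returns one "line:column:0xNN" entry per control byte; columns count bytes. */
    public static List<String> scan(byte[] data) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int col = 1;
        for (byte b : data) {
            int c = b & 0xFF;                // treat the byte as unsigned
            if (c == '\n') {
                line++;
                col = 1;
                continue;
            }
            boolean control = (c < 0x20 && c != '\t' && c != '\r') || c == 0x7F;
            if (control) {
                hits.add(line + ":" + col + ":0x" + String.format("%02X", c));
            }
            col++;
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Demo on an in-memory sample: "key^Avalue" on line 2.
            byte[] sample = "header\nkey\u0001value\n".getBytes(StandardCharsets.UTF_8);
            System.out.println(scan(sample));   // prints [2:4:0x01]
            return;
        }
        // Usage: java ControlCharScanner ql/src/test/results/clientpositive/orderby_query_bucketing.q.out
        for (String hit : scan(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(hit);
        }
    }
}
```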
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803618#comment-13803618 ]

Hive QA commented on HIVE-3972:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12609781/D8349.7.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4471 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1210/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1210/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801588#comment-13801588 ]

Hive QA commented on HIVE-3972:
---

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12609601/D8349.6.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1193/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1193/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1193/source-prep.txt
+ [[ true == \t\r\u\e ]]
+ rm -rf ivy maven
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf build hcatalog/build
+ svn update
Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1534526.

At revision 1534526.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
patching file conf/hive-default.xml.template
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/MergeSortingFetcher.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
patching file ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java
patching file ql/src/test/queries/clientpositive/orderby_query_bucketing.q
patching file ql/src/test/results/clientpositive/orderby_query_bucketing.q.out
+ [[ ant == \m\a\v\e\n ]]
+ [[ ant == \a\n\t ]]
+ ant -Dtest.continue.on.failure=true -Dtest.silent=false -Divy.default.ivy.user.dir=/data/hive-ptest/working/ivy -Dmvn.local.repo=/data/hive-ptest/working/maven clean package test -Dtestcase=nothing
Buildfile: /data/hive-ptest/working/apache-svn-trunk-source/build.xml

clean:
    [echo] Project: hive

clean:
    [echo] Project: anttasks

clean:
    [echo] Project: shims

clean:
    [echo] Project: common

clean:
    [echo] Project: serde

clean:
    [echo] Project: metastore

clean:
    [echo] Project: ql

clean:
    [echo] Project: contrib

clean:
    [echo] Project: service

clean:
    [echo] Project: cli

clean:
    [echo] Project: jdbc

clean:
    [echo] Project: beeline

clean:
    [echo] Project: hwi

clean:
    [echo] Project: hbase-handler

clean:
    [echo] Project: testutils

clean:
    [echo] hcatalog

clean:
    [echo] hcatalog-core

clean:
    [echo] hcatalog-pig-adapter

clean:
    [echo] hcatalog-server-extensions

clean:
    [echo] webhcat

clean:
    [echo] webhcat-java-client

clean:

clean:
    [echo] Project: odbc
    [exec] rm -rf /data/hive-ptest/working/apache-svn-trunk-source/build/odbc /data/hive-ptest/working/apache-svn-trunk-source/build/service/objs /data/hive-ptest/working/apache-svn-trunk-source/build/ql/objs /data/hive-ptest/working/apache-svn-trunk-source/build/metastore/objs

clean-online:
    [echo] Project: hive

clean-offline:

ivy-init-dirs:
    [echo] Project: hive
    [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/build/ivy
    [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/build/ivy/lib
    [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/build/ivy/report
    [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/build/ivy/maven

ivy-download:
    [echo] Project: hive
    [get] Getting:
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800736#comment-13800736 ]

Hive QA commented on HIVE-3972:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12609386/D8349.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4429 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1182/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1182/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781628#comment-13781628 ]

Navis commented on HIVE-3972:
-

[~ashutoshc] A little. It can be an alternative way to get an ordered result without sampling. If this is included, simple select queries could use it by default, because it's simpler than HIVE-3562 and the number of reducers can also be calculated automatically, in the same way as for a normal MR job. HIVE-3562 would still be useful for producing final output files in totally-ordered form. HIVE-3972 is the order-by counterpart of HIVE-4002.
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780916#comment-13780916 ]

Ashutosh Chauhan commented on HIVE-3972:

[~navis] HIVE-3562 and HIVE-1402 are in now. In light of that, is this optimization still relevant? Are there any queries that could still benefit from this patch even with both of those optimizations on?
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577502#comment-13577502 ]

Navis commented on HIVE-3972:
-

[~ashutoshc] The query above has two RSs (ReduceSink operators), which means it consists of two MR jobs (without HIVE-2340), and the second MR job can still be a target of the top-K optimization. But your comment made me realize that this issue and HIVE-3562 are complementary and should be merged into one. Thanks.

Also, the limit configuration on the fetch task is still active, which means an early exit in the fetch task is still possible without HIVE-3562. It's a merge sort over sorted streams, so it should not demand much memory.
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576831#comment-13576831 ]

Ashutosh Chauhan commented on HIVE-3972:

[~navis] I agree that HIVE-3562 is an orthogonal issue that will make what I am suggesting less of a concern, but there are still some cases. As discussed on HIVE-3562, consider the following query:
{code}
select value, sum(key) as sum from src group by value order by value limit 10;
{code}
In this case, the limit can't be pushed into the map phase, so the HIVE-3562 optimization won't kick in. With the patch as it currently stands on this JIRA, we will generate one MR job with multiple reducers and then do the order-by on the client in the Fetch task. If we don't take advantage of the limit in the query, we might read millions of rows from HDFS, bring all of them into client memory, and then show just 10 to the user. If we instead take the limit into account and stop merging and reading as soon as we have seen 10 rows, we save both HDFS I/O and client memory. Makes sense?
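The early exit described here can be sketched as a lazy k-way merge that stops pulling rows once the limit is reached. Below is a minimal Java sketch. It assumes each reducer's output is an iterator sorted on the ORDER BY key. LimitedMerge and its Counting helper are hypothetical names and do not come from the patch. The demo counts how many input rows are actually read for LIMIT 10 over three sorted streams.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/**
 * Sketch: a lazy k-way merge that stops after LIMIT n rows.
 * Rows are pulled from the sorted per-reducer streams only on demand, so once
 * n rows have been returned no further input is read. Hypothetical class.
 */
public final class LimitedMerge<T> implements Iterator<T> {

    private static final class Head<T> {
        final T row;
        final Iterator<T> source;
        Head(T row, Iterator<T> source) { this.row = row; this.source = source; }
    }

    private final PriorityQueue<Head<T>> heap;
    private int remaining;                   // rows still allowed by the LIMIT clause

    public LimitedMerge(List<? extends Iterator<T>> streams, Comparator<? super T> cmp, int limit) {
        Comparator<Head<T>> byRow = (a, b) -> cmp.compare(a.row, b.row);
        this.heap = new PriorityQueue<>(Math.max(1, streams.size()), byRow);
        this.remaining = limit;
        if (limit > 0) {
            for (Iterator<T> it : streams) {
                if (it.hasNext()) {
                    heap.add(new Head<>(it.next(), it));
                }
            }
        }
    }

    @Override
    public boolean hasNext() {
        return remaining > 0 && !heap.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Head<T> smallest = heap.poll();
        remaining--;
        // Refill only while more rows may still be returned; otherwise stop reading input.
        if (remaining > 0 && smallest.source.hasNext()) {
            heap.add(new Head<>(smallest.source.next(), smallest.source));
        }
        return smallest.row;
    }

    /** Demo helper: wraps a stream and counts how many rows were actually read from it. */
    static final class Counting<E> implements Iterator<E> {
        private final Iterator<E> inner;
        int reads;
        Counting(Iterator<E> inner) { this.inner = inner; }
        @Override public boolean hasNext() { return inner.hasNext(); }
        @Override public E next() { reads++; return inner.next(); }
    }

    public static void main(String[] args) {
        // Three sorted "reducer outputs" of 1,000 rows each: 0,3,6,... / 1,4,7,... / 2,5,8,...
        List<Counting<Integer>> parts = new ArrayList<>();
        for (int r = 0; r < 3; r++) {
            List<Integer> rows = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                rows.add(i * 3 + r);
            }
            parts.add(new Counting<>(rows.iterator()));
        }
        LimitedMerge<Integer> top10 = new LimitedMerge<>(parts, Comparator.<Integer>naturalOrder(), 10);
        List<Integer> out = new ArrayList<>();
        top10.forEachRemaining(out::add);
        int reads = parts.stream().mapToInt(p -> p.reads).sum();
        System.out.println(out + " after reading " + reads + " of 3000 rows");
        // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] after reading 12 of 3000 rows
    }
}
```

With k streams the heap never holds more than k rows. For LIMIT n, at most n + k - 1 input rows are read: 12 here, for n = 10 and k = 3. This bound does not depend on the total input size.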
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571160#comment-13571160 ]

Phabricator commented on HIVE-3972:
---

njain has commented on the revision HIVE-3972 [jira] Support using multiple reducer for fetching order by results.

INLINE COMMENTS
  conf/hive-default.xml.template:1621 nit: reducers for the last MapReduce task for order by
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:1 apache header
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 Can you run explain extended? I think it also shows the number of reducers.
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 It might be easier to create a tmp table with 10 rows initially to reduce the number of results.
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:8 Add some comments; it would be good to have a lot of examples.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5604 What happens if it is -1? Shouldn't useBucketingForOrderBy be false?

REVISION DETAIL
  https://reviews.facebook.net/D8349

To: JIRA, navis
Cc: njain
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572006#comment-13572006 ]

Phabricator commented on HIVE-3972:
---

navis has commented on the revision HIVE-3972 [jira] Support using multiple reducer for fetching order by results.

INLINE COMMENTS
  conf/hive-default.xml.template:1621 ok. It's harder than writing code.
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:1 ah, ok.
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 ok.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5604 It will be calculated from the input size, which may or may not be 1, so it is safer to assume that it's not 1.

REVISION DETAIL
  https://reviews.facebook.net/D8349

To: JIRA, navis
Cc: njain
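The reply assumes the reducer count is estimated from the input size, so the estimate may or may not come out as 1. Below is a minimal Java sketch of such an estimate. It assumes -1 means "estimate automatically" and uses the common MR-style heuristic ceil(input bytes / bytes per reducer), capped at a maximum. Hive exposes similar settings as hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max, but this is not the patch's code. ReducerEstimate and resolveReducers are hypothetical names, and the sizes are illustrative.

```java
/**
 * Sketch: resolve the reducer count for the final ORDER BY job.
 * Assumption, not taken from the patch: a configured value of -1 means
 * "estimate from input size", using ceil(inputBytes / bytesPerReducer)
 * clamped to the range [1, maxReducers]. Names are hypothetical.
 */
public final class ReducerEstimate {

    public static int resolveReducers(int configured, long inputBytes,
                                      long bytesPerReducer, int maxReducers) {
        if (configured > 0) {
            return configured;                            // explicit setting wins
        }
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Ceiling division: any partial chunk of input gets its own reducer.
        long estimate = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        estimate = Math.max(1L, estimate);
        return (int) Math.min((long) maxReducers, estimate);
    }

    public static void main(String[] args) {
        long perReducer = 256L * 1024 * 1024;             // illustrative: 256 MB per reducer
        // 100 MB of input -> 1 reducer; 10 GB of input -> 40 reducers.
        System.out.println(resolveReducers(-1, 100L * 1024 * 1024, perReducer, 999));
        System.out.println(resolveReducers(-1, 10L * 1024 * 1024 * 1024, perReducer, 999));
    }
}
```

Here the same -1 setting yields 1 reducer for a small input and 40 for a larger one. The planner therefore cannot know at compile time that only one reducer will run, which matches the reasoning in the reply.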
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572148#comment-13572148 ]

Navis commented on HIVE-3972:
-

I missed some commits (HIVE-3633, etc.). They need to be merged in correctly.
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569846#comment-13569846 ]

Namit Jain commented on HIVE-3972:
--

Does not look like it. The test does not have any limit.

Can you explain the new parameter you added? It will make the code easier to review. Also, add it to hive-default.xml.template.
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569600#comment-13569600 ]

Ashutosh Chauhan commented on HIVE-3972:

I think this optimization would become more useful if it also considered the limit in the query, since in most queries order by is accompanied by limit. That way, we can stop fetching and merging results as soon as we have the number of records given in the limit clause. Or does this already take the limit into account?