[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2015-01-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267383#comment-14267383
 ] 

Hive QA commented on HIVE-3972:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12690472/HIVE-3972.10.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6724 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2276/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2276/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2276/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12690472 - PreCommit-HIVE-TRUNK-Build

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, 
 HIVE-3972.10.patch.txt, HIVE-3972.8.patch.txt, HIVE-3972.9.patch.txt, 
 HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, 
 HIVE-3972.D8349.4.patch


 Queries whose last clause is ORDER BY force the final MR job to run with a 
 single reducer, which can become a bottleneck. For example, 
 {code}
 select value, sum(key) as sum from src group by value order by sum;
 {code}
 If the number of reducers is reasonable, the multiple result files could be 
 merged into a single sorted stream at the fetcher level.
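A minimal sketch of the fetcher-level merge described above, assuming each reducer's output file is already sorted on the ORDER BY key and can be read as an iterator. `SortedStreamMerger` is a hypothetical name used only for illustration; it is not the `MergeSortingFetcher` class added by the patch. The heap holds one head row per stream, so memory grows with the number of reducers rather than with the result size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of already-sorted streams (one per reducer output file). */
public final class SortedStreamMerger<T> implements Iterator<T> {

  /** Heap entry: the current head row of one stream plus the stream it came from. */
  private static final class Head<R> {
    final R row;
    final Iterator<R> source;
    Head(R row, Iterator<R> source) { this.row = row; this.source = source; }
  }

  private final PriorityQueue<Head<T>> heap;

  public SortedStreamMerger(List<? extends Iterator<T>> sortedStreams, Comparator<? super T> order) {
    // Compare heads by the ORDER BY key; the heap holds at most one row per stream.
    Comparator<Head<T>> byRow = (a, b) -> order.compare(a.row, b.row);
    this.heap = new PriorityQueue<>(Math.max(1, sortedStreams.size()), byRow);
    for (Iterator<T> s : sortedStreams) {
      if (s.hasNext()) {
        heap.add(new Head<>(s.next(), s));
      }
    }
  }

  @Override
  public boolean hasNext() { return !heap.isEmpty(); }

  @Override
  public T next() {
    Head<T> smallest = heap.poll();
    if (smallest == null) throw new NoSuchElementException();
    // Refill from the stream we just consumed so the heap again holds its next head.
    if (smallest.source.hasNext()) {
      heap.add(new Head<>(smallest.source.next(), smallest.source));
    }
    return smallest.row;
  }

  public static void main(String[] args) {
    // Three "reducer outputs", each sorted by the aggregated sum.
    List<Iterator<Integer>> parts = new ArrayList<>();
    parts.add(List.of(1, 4, 9).iterator());
    parts.add(List.of(2, 3, 10).iterator());
    parts.add(List.of(5, 6, 7, 8).iterator());
    SortedStreamMerger<Integer> merged = new SortedStreamMerger<>(parts, Comparator.naturalOrder());
    StringBuilder out = new StringBuilder();
    while (merged.hasNext()) out.append(merged.next()).append(' ');
    System.out.println(out.toString().trim()); // 1 2 3 4 5 6 7 8 9 10
  }
}
```

A heap is used instead of re-sorting the concatenated files because each output row costs only O(log k) comparisons for k reducer files.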



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2014-04-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964821#comment-13964821
 ] 

Hive QA commented on HIVE-3972:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639337/HIVE-3972.9.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5558 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orderby_query_bucketing
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2195/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2195/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639337

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, 
 HIVE-3972.8.patch.txt, HIVE-3972.9.patch.txt, HIVE-3972.D8349.1.patch, 
 HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2014-04-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963757#comment-13963757
 ] 

Hive QA commented on HIVE-3972:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639142/HIVE-3972.8.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5556 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orderby_query_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2182/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2182/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639142

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, 
 HIVE-3972.8.patch.txt, HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, 
 HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2014-04-08 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963766#comment-13963766
 ] 

Brock Noland commented on HIVE-3972:


Looks like the .out file contains a ^A or something:

diff --git ql/src/test/results/clientpositive/orderby_query_bucketing.q.out 
ql/src/test/results/clientpositive/orderby_query_bucketing.q.out
new file mode 100644
index 000..c02b1c9
Binary files /dev/null and 
ql/src/test/results/clientpositive/orderby_query_bucketing.q.out differ
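A quick way to check what Brock describes is to scan the expected-output file for control bytes. The sketch below is a hypothetical helper (`ControlCharScanner` is not part of Hive or the patch); it reports every byte below 0x20 other than tab, CR and LF, plus DEL, with its line number and byte column. One plausible source of a ^A in a .q.out file is Hive's default field delimiter, Ctrl-A (\001), but that is an assumption here, not something the thread confirms.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: list control bytes that may make a text file look binary to diff tools. */
public final class ControlCharScanner {

  /** Returns "line:column:0xNN" for each control byte other than tab, LF and CR (columns count bytes). */
  public static List<String> scan(byte[] content) {
    List<String> hits = new ArrayList<>();
    int line = 1;
    int column = 1;
    for (byte b : content) {
      int v = b & 0xFF; // treat the byte as unsigned
      if (v == '\n') {
        line++;
        column = 1;
        continue;
      }
      if ((v < 0x20 && v != '\t' && v != '\r') || v == 0x7F) {
        hits.add(line + ":" + column + ":0x" + String.format("%02X", v));
      }
      column++;
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    // Scan the file given on the command line, or a small built-in sample containing a ^A.
    byte[] bytes = args.length > 0
        ? Files.readAllBytes(Paths.get(args[0]))
        : "key\u0001value\nok\n".getBytes(StandardCharsets.UTF_8);
    scan(bytes).forEach(System.out::println); // sample prints 1:4:0x01
  }
}
```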

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, 
 HIVE-3972.8.patch.txt, HIVE-3972.9.patch.txt, HIVE-3972.D8349.1.patch, 
 HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-10-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803618#comment-13803618
 ] 

Hive QA commented on HIVE-3972:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12609781/D8349.7.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4471 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1210/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1210/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, 
 HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, 
 HIVE-3972.D8349.4.patch





--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801588#comment-13801588
 ] 

Hive QA commented on HIVE-3972:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12609601/D8349.6.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1193/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1193/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1193/source-prep.txt
+ [[ true == \t\r\u\e ]]
+ rm -rf ivy maven
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf build hcatalog/build
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1534526.

At revision 1534526.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
patching file conf/hive-default.xml.template
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/MergeSortingFetcher.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
patching file ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java
patching file ql/src/test/queries/clientpositive/orderby_query_bucketing.q
patching file ql/src/test/results/clientpositive/orderby_query_bucketing.q.out
+ [[ ant == \m\a\v\e\n ]]
+ [[ ant == \a\n\t ]]
+ ant -Dtest.continue.on.failure=true -Dtest.silent=false 
-Divy.default.ivy.user.dir=/data/hive-ptest/working/ivy 
-Dmvn.local.repo=/data/hive-ptest/working/maven clean package test 
-Dtestcase=nothing
Buildfile: /data/hive-ptest/working/apache-svn-trunk-source/build.xml

clean:
 [echo] Project: hive

clean:
 [echo] Project: anttasks

clean:
 [echo] Project: shims

clean:
 [echo] Project: common

clean:
 [echo] Project: serde

clean:
 [echo] Project: metastore

clean:
 [echo] Project: ql

clean:
 [echo] Project: contrib

clean:
 [echo] Project: service

clean:
 [echo] Project: cli

clean:
 [echo] Project: jdbc

clean:
 [echo] Project: beeline

clean:
 [echo] Project: hwi

clean:
 [echo] Project: hbase-handler

clean:
 [echo] Project: testutils

clean:
 [echo] hcatalog

clean:
 [echo] hcatalog-core

clean:
 [echo] hcatalog-pig-adapter

clean:
 [echo] hcatalog-server-extensions

clean:
 [echo] webhcat

clean:
 [echo] webhcat-java-client

clean:

clean:
 [echo] Project: odbc
 [exec] rm -rf /data/hive-ptest/working/apache-svn-trunk-source/build/odbc 
/data/hive-ptest/working/apache-svn-trunk-source/build/service/objs 
/data/hive-ptest/working/apache-svn-trunk-source/build/ql/objs 
/data/hive-ptest/working/apache-svn-trunk-source/build/metastore/objs

clean-online:
 [echo] Project: hive

clean-offline:

ivy-init-dirs:
 [echo] Project: hive
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/build/ivy
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/build/ivy/lib
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/build/ivy/report
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/build/ivy/maven

ivy-download:
 [echo] Project: hive
  [get] Getting: 

[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800736#comment-13800736
 ] 

Hive QA commented on HIVE-3972:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12609386/D8349.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4429 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1182/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1182/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D8349.5.patch, HIVE-3972.D8349.1.patch, 
 HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-09-30 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13781628#comment-13781628
 ] 

Navis commented on HIVE-3972:
-

[~ashutoshc] A little. It can be an alternative way to get ordered results 
without sampling. If this is included, simple select queries can use it by 
default, because it is simpler than HIVE-3562 and the number of reducers can 
also be calculated automatically, the same way as for a normal MR job. HIVE-3562 
would still be useful for producing final output files in totally-ordered form.

HIVE-3972 is the ORDER BY counterpart of HIVE-4002.
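Navis notes that the reducer count for this job could be derived the same way as for an ordinary MR job. Below is a rough sketch of that kind of input-size heuristic. The two parameters stand in for Hive's `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max` settings; the method itself is a simplified assumption, not Hive's actual estimation code, which handles more cases. Whenever the estimate is above one, the fetcher has several sorted files to merge.

```java
/** Hypothetical sketch of input-size-based reducer estimation. */
public final class ReducerEstimate {

  /**
   * @param totalInputBytes total size of the job's input
   * @param bytesPerReducer stand-in for hive.exec.reducers.bytes.per.reducer
   * @param maxReducers     stand-in for hive.exec.reducers.max
   */
  public static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
    if (bytesPerReducer <= 0 || maxReducers <= 0) {
      throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
    }
    // Ceiling division in integer arithmetic: one reducer per bytesPerReducer of input.
    long wanted = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
    wanted = Math.max(1L, wanted);               // always at least one reducer
    return (int) Math.min(maxReducers, wanted);  // never more than the configured cap
  }

  public static void main(String[] args) {
    long gb = 1L << 30;
    System.out.println(estimateReducers(10 * gb, gb, 999));   // 10
    System.out.println(estimateReducers(100, gb, 999));       // 1
    System.out.println(estimateReducers(5000 * gb, gb, 999)); // 999
  }
}
```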

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, 
 HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780916#comment-13780916
 ] 

Ashutosh Chauhan commented on HIVE-3972:


[~navis] HIVE-3562 and HIVE-1402 are in now. In light of that, is this 
optimization still relevant? Are there any queries that may still see further 
benefits from this patch even after both of those optimizations are on?

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, 
 HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-02-13 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577502#comment-13577502
 ] 

Navis commented on HIVE-3972:
-

[~ashutoshc] The above query has two RSs (ReduceSink operators), which means it 
consists of two MR jobs (without HIVE-2340). The second MR job can still be a 
target of the top-K optimization. But your comment made me realize that this 
issue and HIVE-3562 are complementary and should be combined into one. Thanks. 

Also, the limit configuration on the fetch task is still active, which means 
early exit on the fetch task is still possible without HIVE-3562. It is a merge 
sort over sorted streams, so it would not demand much memory.
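For reference, the top-K idea mentioned above can be sketched as a bounded max-heap that keeps only the K smallest keys seen so far, so memory stays proportional to K rather than to the input. This is a generic illustration under that assumption, not HIVE-3562's actual code; `TopK` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-K collector: keeps the k smallest elements under the given order. */
public final class TopK<T> {
  private final int k;
  private final Comparator<T> order;
  private final PriorityQueue<T> maxHeap; // head is the largest of the kept elements

  public TopK(int k, Comparator<T> order) {
    if (k <= 0) throw new IllegalArgumentException("k must be positive");
    this.k = k;
    this.order = order;
    this.maxHeap = new PriorityQueue<>(k, order.reversed());
  }

  public void offer(T item) {
    if (maxHeap.size() < k) {
      maxHeap.add(item);
    } else if (order.compare(item, maxHeap.peek()) < 0) {
      // The new item beats the current worst of the top K: evict the worst.
      maxHeap.poll();
      maxHeap.add(item);
    }
  }

  /** Returns the kept elements in ascending order. */
  public List<T> result() {
    List<T> out = new ArrayList<>(maxHeap);
    out.sort(order);
    return out;
  }

  public static void main(String[] args) {
    TopK<Integer> top3 = new TopK<>(3, Comparator.naturalOrder());
    for (int v : new int[] {42, 7, 19, 3, 88, 7, 1}) top3.offer(v);
    System.out.println(top3.result()); // [1, 3, 7]
  }
}
```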

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, 
 HIVE-3972.D8349.3.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-02-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576831#comment-13576831
 ] 

Ashutosh Chauhan commented on HIVE-3972:


[~navis] I agree HIVE-3562 is an orthogonal issue that will make what I am 
suggesting less of a problem, but there are still some cases. As discussed on 
HIVE-3562, consider the following query: 
{code}
select value, sum(key) as sum from src group by value order by value limit 10;
{code}
In this case, the limit can't be pushed into the map phase, so the HIVE-3562 
optimization won't kick in. With the patch as it currently stands on this jira, 
we will generate one MR job with multiple reducers and then do the order-by on 
the client in the Fetch task. If you don't take advantage of the fact that the 
query has a limit, you might read millions of rows from HDFS, bring all of them 
into client memory, and then show just 10 to the user. If you instead take the 
limit into account and stop merging and reading as soon as you have seen 10 
rows, you save both HDFS IO and client memory. Makes sense? 
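Ashutosh's point can be made concrete. With a lazy k-way merge over k sorted reducer outputs, emitting the first N rows pulls at most N + k - 1 rows in the variant below, instead of reading every row. This is an assumption-based illustration with hypothetical names (`LimitedMerge`, `CountingIterator`), not the patch's code; it counts how many rows are actually pulled from the streams.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.IntStream;

/** Hypothetical limit-aware merge over sorted reducer outputs, counting rows pulled. */
public final class LimitedMerge {

  /** Wraps an iterator and counts how many rows were actually read from it. */
  static final class CountingIterator implements Iterator<Integer> {
    private final Iterator<Integer> delegate;
    int reads;
    CountingIterator(Iterator<Integer> delegate) { this.delegate = delegate; }
    @Override public boolean hasNext() { return delegate.hasNext(); }
    @Override public Integer next() { reads++; return delegate.next(); }
  }

  /** Merges sorted streams but stops as soon as {@code limit} rows have been emitted. */
  static List<Integer> mergeWithLimit(List<CountingIterator> streams, int limit) {
    // Heap entries are {value, streamIndex}, ordered by value.
    PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
    for (int i = 0; i < streams.size(); i++) {
      if (streams.get(i).hasNext()) heap.add(new int[] {streams.get(i).next(), i});
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty() && out.size() < limit) {
      int[] top = heap.poll();
      out.add(top[0]);
      if (out.size() == limit) break; // early exit: do not read a replacement row
      CountingIterator src = streams.get(top[1]);
      if (src.hasNext()) heap.add(new int[] {src.next(), top[1]});
    }
    return out;
  }

  public static void main(String[] args) {
    int k = 4, rowsPerReducer = 1_000_000, limit = 10;
    List<CountingIterator> streams = new ArrayList<>();
    for (int r = 0; r < k; r++) {
      final int offset = r;
      // Reducer r lazily yields the sorted values offset, offset + k, offset + 2k, ...
      streams.add(new CountingIterator(
          IntStream.range(0, rowsPerReducer).map(x -> x * k + offset).iterator()));
    }
    List<Integer> first = mergeWithLimit(streams, limit);
    int pulled = streams.stream().mapToInt(s -> s.reads).sum();
    System.out.println(first); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    System.out.println("rows pulled: " + pulled + " of " + (long) k * rowsPerReducer); // rows pulled: 13 of 4000000
  }
}
```

Client memory here is bounded by the heap (one row per reducer) plus the N rows returned.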

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, 
 HIVE-3972.D8349.3.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-02-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571160#comment-13571160
 ] 

Phabricator commented on HIVE-3972:
---

njain has commented on the revision HIVE-3972 [jira] Support using multiple 
reducer for fetching order by results.

INLINE COMMENTS
  conf/hive-default.xml.template:1621 nit: reducers


  for the last MapReduce task for order by
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:1 apache header
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 Can you 
run explain extended?
  I think it also shows the number of reducers.
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 Might be 
easier to create a tmp table with 10 rows initially to reduce the number of 
results.
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:8 Add some 
comments - it would be good to have a lot of examples.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5604 What 
happens if it is -1 ?

  Shouldn't useBucketingForOrderBy be false ?

REVISION DETAIL
  https://reviews.facebook.net/D8349

To: JIRA, navis
Cc: njain


 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-02-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572006#comment-13572006
 ] 

Phabricator commented on HIVE-3972:
---

navis has commented on the revision HIVE-3972 [jira] Support using multiple 
reducer for fetching order by results.

INLINE COMMENTS
  conf/hive-default.xml.template:1621 ok. It's harder than writing code.
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:1 ah, ok.
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 ok.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5604 It 
will be calculated from the input size, which might be 1 or not. So it is 
safer to assume that it's not 1.

REVISION DETAIL
  https://reviews.facebook.net/D8349

To: JIRA, navis
Cc: njain


 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-02-05 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572148#comment-13572148
 ] 

Navis commented on HIVE-3972:
-

I missed some commits (HIVE-3633, etc.). They should be merged correctly.

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-02-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13569846#comment-13569846
 ] 

Namit Jain commented on HIVE-3972:
--

Does not look like it. The test does not have any limit.
Can you explain the new parameter you added? It will make the code easier to 
review.
Also, add it to hive-default.xml.template.

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-02-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13569600#comment-13569600
 ] 

Ashutosh Chauhan commented on HIVE-3972:


I think this optimization will become more useful if it also considers the 
limit in the query, since in most cases ORDER BY is accompanied by LIMIT. We 
could then stop fetching and merging results as soon as we have the number of 
records given in the LIMIT clause. Or does this already take the limit into account?

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch

