[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265038#comment-15265038 ]

Lefty Leverenz commented on HIVE-12963:
---

Doc note: This adds *hive.groupby.limit.extrastep* to HiveConf.java, so it needs to be documented in the wiki for release 2.1.0.

* [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

> LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
> --
>
> Key: HIVE-12963
> URL: https://issues.apache.org/jira/browse/HIVE-12963
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.0.0, 1.2.1, 0.13
> Reporter: Alina Abramova
> Assignee: Alina Abramova
> Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-12963.1.patch, HIVE-12963.2.patch, HIVE-12963.3.patch, HIVE-12963.4.patch, HIVE-12963.6.patch
>
>
> I execute the query:
> hive> select age from test1 sort by age.age limit 10;
> Total jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks not specified. Estimated from input data size: 1
> Launching Job 2 out of 2
> Number of reduce tasks determined at compile time: 1
> When I have a large number of rows, the last stage of the job takes a long time. I think we could allow the user to choose the number of reducers for the last job, or skip the extra MR job.
> I observed the same behavior with the query:
> hive> create table new_test as select age from test1 group by age.age limit 10;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262677#comment-15262677 ]

Sergey Shelukhin commented on HIVE-12963:
-

Sorry, forgot about this... the test failed in the above QA run, and it passes for other JIRAs. I'll run it locally to see if it passes and commit if it does.
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261766#comment-15261766 ]

Alina Abramova commented on HIVE-12963:
---

[~sershe] Sorry, is this failed test related to the fix?
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204073#comment-15204073 ]

Alina Abramova commented on HIVE-12963:
---

Hi! What about this test? Does it not work as it should?
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191352#comment-15191352 ]

Sergey Shelukhin commented on HIVE-12963:
-

The groupby1_limit failure might be related. I will try it locally, and commit on Monday if it works and there are no objections.
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190963#comment-15190963 ]

Alina Abramova commented on HIVE-12963:
---

Does anybody have comments on this issue?
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159920#comment-15159920 ]

Hive QA commented on HIVE-12963:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12789230/HIVE-12963.6.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9803 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_limit
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7071/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7071/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7071/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12789230 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142481#comment-15142481 ]

Hive QA commented on HIVE-12963:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12787156/HIVE-12963.4.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9758 tests executed
*Failed tests:*
{noformat}
TestCliDriver-ppd_union.q-udf_var_samp.q-custom_input_output_format.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_limit_extrastep
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_extrastep
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_limit_extrastep
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown_extrastep
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_part
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6940/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6940/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6940/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12787156 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143204#comment-15143204 ]

Sergey Shelukhin commented on HIVE-12963:
-

[~ashutoshc] can you comment? I am not very familiar with this code. Do we have a good test for this?
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135736#comment-15135736 ]

Hive QA commented on HIVE-12963:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12786299/HIVE-12963.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 80 failed/errored test(s), 10052 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_SortUnionTransposeRule
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_union1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input11_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input14_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input1_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input3_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_noalias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_onview
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_join_transpose
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown_negative
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonreserved_keywords_insert_into1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_ppd_optimizer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_mixed_partition_formats2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_script_pipe
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_top_level
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_union1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partitioned_date_time
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_varchar_simple
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_constprog_dpp
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_constprog_dpp
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_pipe
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partitioned_date_time
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127705#comment-15127705 ]

Hive QA commented on HIVE-12963:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12785501/HIVE-12963.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10018 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-orc_merge5.q-vectorization_limit.q-tez_dynpart_hashjoin_1.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.createTable
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testLockRetryLimit
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.updateSelectUpdate
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6832/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6832/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6832/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12785501 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125272#comment-15125272 ]

Hive QA commented on HIVE-12963:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12785214/HIVE-12963.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6812/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6812/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6812/

Messages:
{noformat}
This message was trimmed, see log for full details
[INFO]
[INFO]
[INFO] Building Hive ORC 2.1.0-SNAPSHOT
[INFO]
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-orc ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/orc/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/orc (includes = [datanucleus.log, derby.log], excludes = [])
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-orc ---
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-orc ---
[INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/orc/src/gen/protobuf-java added.
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-orc ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-orc ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/orc/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-orc ---
[INFO] Executing tasks
main:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-orc ---
[INFO] Compiling 60 source files to /data/hive-ptest/working/apache-github-source-source/orc/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-orc ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/orc/src/test/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-orc ---
[INFO] Executing tasks
main:
[mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/orc/target/tmp
[mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/orc/target/warehouse
[mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/orc/target/tmp/conf
[copy] Copying 16 files to /data/hive-ptest/working/apache-github-source-source/orc/target/tmp/conf
[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-orc ---
[INFO] Compiling 12 source files to /data/hive-ptest/working/apache-github-source-source/orc/target/test-classes
[WARNING] /data/hive-ptest/working/apache-github-source-source/orc/src/test/org/apache/orc/impl/TestRunLengthIntegerReader.java: Some input files use or override a deprecated API.
[WARNING] /data/hive-ptest/working/apache-github-source-source/orc/src/test/org/apache/orc/impl/TestRunLengthIntegerReader.java: Recompile with -Xlint:deprecation for details.
[INFO]
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-orc ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-orc ---
[INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/orc/target/hive-orc-2.1.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-orc ---
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-orc ---
[INFO] Installing /data/hive-ptest/working/apache-github-source-source/orc/target/hive-orc-2.1.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-orc/2.1.0-SNAPSHOT/hive-orc-2.1.0-SNAPSHOT.jar
[INFO] Installing /data/hive-ptest/working/apache-github-source-source/orc/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-orc/2.1.0-SNAPSHOT/hive-orc-2.1.0-SNAPSHOT.pom
[INFO]
[INFO]
[INFO] Building Hive Common 2.1.0-SNAPSHOT
[INFO]
[INFO]
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124823#comment-15124823 ]

Alina Abramova commented on HIVE-12963:
---

But I see that if the line that calls genReduceSinkPlan in the method genLimitMapRedPlan is commented out, the final result set is still sorted. Doesn't that mean we could skip creating the extra job and do the sorting in the same MR job?
[jira] [Commented] (HIVE-12963) LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer
[ https://issues.apache.org/jira/browse/HIVE-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124000#comment-15124000 ]

Sergey Shelukhin commented on HIVE-12963:
-

I believe it's caused by the fact that Hive doesn't perform the sort, and relies on MR to sort the data; which means that any job with order by has to have one reducer at some point, so that all the data is sorted together. On non-MR engines like Tez it's less of a problem.
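
Editor's illustration (not part of the original thread): the two-job behavior discussed above can be sketched as a toy model in Python. This is only a simplified simulation under stated assumptions, not Hive's planner code. The function names `stage_one`, `stage_two`, and `sort_by_limit` are hypothetical, and round-robin distribution stands in for however rows actually reach the reducers. The first job lets each reducer sort its own share and keep at most `limit` rows; the second job uses a single reducer to merge those sorted runs and apply the final limit.

```python
import heapq
from typing import List


def stage_one(rows: List[int], num_reducers: int, limit: int) -> List[List[int]]:
    """Toy model of the first job: each reducer sorts its share, keeps `limit` rows."""
    partitions: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        # Round-robin is an assumption made for illustration only.
        partitions[i % num_reducers].append(row)
    return [sorted(p)[:limit] for p in partitions]


def stage_two(partials: List[List[int]], limit: int) -> List[int]:
    """Toy model of the extra job: one reducer merges the sorted runs and limits."""
    return list(heapq.merge(*partials))[:limit]


def sort_by_limit(rows: List[int], num_reducers: int, limit: int) -> List[int]:
    """Run both stages, mirroring the 'Total jobs = 2' plan in the report."""
    return stage_two(stage_one(rows, num_reducers, limit), limit)


if __name__ == "__main__":
    data = [5, 3, 9, 1, 7, 2, 8, 6, 4, 0]
    # Per-reducer output of the first job with three reducers.
    print(stage_one(data, num_reducers=3, limit=4))
    # Final result after the single-reducer merge.
    print(sort_by_limit(data, num_reducers=3, limit=4))
```

In this model, skipping `stage_two` is harmless only when the first job already ran with one reducer, which matches the reporter's log ("Estimated from input data size: 1") and may explain why the output still looked sorted with the extra step commented out. With several reducers, the concatenated per-reducer output is neither globally sorted nor capped at `limit` rows.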