[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326921#comment-14326921 ] Rui Li commented on HIVE-9561: -- Thank you Xuefu! > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Fix For: spark-branch > > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, > HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, > HIVE-9561.6-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325120#comment-14325120 ] Rui Li commented on HIVE-9561: -- Hi [~xuefuz], thanks very much for taking care of this. I can't really work on it due to limited network access. Sorry for the inconvenience. > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, > HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, > HIVE-9561.6-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325053#comment-14325053 ] Hive QA commented on HIVE-9561: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12699339/HIVE-9561.6-spark.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7553 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/737/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/737/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-737/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12699339 - PreCommit-HIVE-SPARK-Build > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, > HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, > HIVE-9561.6-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324671#comment-14324671 ] Hive QA commented on HIVE-9561: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12699299/HIVE-9561.5-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7553 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_streaming org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/736/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/736/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-736/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12699299 - PreCommit-HIVE-SPARK-Build > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, > HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323759#comment-14323759 ] Xuefu Zhang commented on HIVE-9561: --- Unfortunately the patch doesn't apply any more after recent trunk to branch merge. Could you please rebase? > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, > HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323756#comment-14323756 ] Xuefu Zhang commented on HIVE-9561: --- +1 > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, > HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323617#comment-14323617 ] Hive QA commented on HIVE-9561: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12699189/HIVE-9561.4-spark.patch {color:green}SUCCESS:{color} +1 7471 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/734/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/734/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-734/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12699189 - PreCommit-HIVE-SPARK-Build > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, > HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321938#comment-14321938 ] Hive QA commented on HIVE-9561: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698967/HIVE-9561.4-spark.patch {color:red}ERROR:{color} -1 due to 1993 failed/errored test(s), 7069 tests executed *Failed tests:* {noformat} TestSSL - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_change_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_update_status org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition_authorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_cascade org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_update_status org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4 org.a
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321827#comment-14321827 ] Hive QA commented on HIVE-9561: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698947/HIVE-9561.4-spark.patch {color:red}ERROR:{color} -1 due to 1993 failed/errored test(s), 7069 tests executed *Failed tests:* {noformat} TestSSL - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_change_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_update_status org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition_authorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_cascade org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_update_status org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4 org.a
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319776#comment-14319776 ] Hive QA commented on HIVE-9561: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698669/HIVE-9561.3-spark.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7471 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union4 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union4 org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/724/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/724/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-724/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12698669 - PreCommit-HIVE-SPARK-Build > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, > HIVE-9561.3-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318283#comment-14318283 ] Hive QA commented on HIVE-9561: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698417/HIVE-9561.2-spark.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7471 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union4 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/723/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/723/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-723/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12698417 - PreCommit-HIVE-SPARK-Build > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318145#comment-14318145 ] Rui Li commented on HIVE-9561: -- I can't post the patch to RB. Will try again later. > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308589#comment-14308589 ] Rui Li commented on HIVE-9561: -- OK will do. > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308560#comment-14308560 ] Xuefu Zhang commented on HIVE-9561: --- Yeah, hasSortBy and hasOrderBy are for the whole query. Thus, they are not reliable for either detection. I think we can add a flag in ReduceSinkDesc, similar to enforceSort, telling about sortBy/orderBy/None. Yes, I think that optimization makes sense. Please do some prototyping. If all is well, create an JIRA for it. > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308462#comment-14308462 ] Rui Li commented on HIVE-9561: -- Yes I'm trying to avoid using SHUFFLE_SORT for any non-orderBy query. But for this particular case, we'll have QueryProperties.hasSortBy and QueryProperties.hasOrderBy both true. So we can't really distinguish. As to removing the unnecessary ordering for subqueries, that sounds interesting. Maybe we can traverse SparkWork to look for SHUFFLE_SORT edges. If the downstream work on that edge is not a leaf work, we can remove it. What do you think? > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308303#comment-14308303 ] Xuefu Zhang commented on HIVE-9561: --- Does QueryProperties.hasSortBy() help? For a sort by query, we don't need SHUFFLE_SORT, right? MR_SHUFFLE may just be sufficient. For this particular query, it seems making no sense to have either sort by or order by in the subqueries. They make sense only if they are specified for the final output. Do you agree? Maybe we can do some optimization to detect and remove those if they are not for the final output. > SHUFFLE_SORT should only be used for order by query [Spark Branch] > -- > > Key: HIVE-9561 > URL: https://issues.apache.org/jira/browse/HIVE-9561 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-9561.1-spark.patch > > > The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance > and are difficult to control. So we should limit the use of {{sortByKey}} to > order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307504#comment-14307504 ] Hive QA commented on HIVE-9561: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12696704/HIVE-9561.1-spark.patch {color:red}ERROR:{color} -1 due to 57 failed/errored test(s), 7468 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join21 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join23 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ctas org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_clusterby1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_sortby1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_map_multi_single_reducer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8_map org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8_map_skew org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8_noskew org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby9 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_insert_common_distinct org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_identity_project_remove_skip org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input17 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input18 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join20 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join21 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join23 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join40 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multigroupby_singlemr org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parallel org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parallel_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_transform org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_reduce_deduplicate_exclude_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_ppr {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPAR