[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326921#comment-14326921
 ] 

Rui Li commented on HIVE-9561:
--

Thank you Xuefu!

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: spark-branch
>
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
> HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, 
> HIVE-9561.6-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-17 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325120#comment-14325120
 ] 

Rui Li commented on HIVE-9561:
--

Hi [~xuefuz], thanks very much for taking care of this. I can't really work on 
it due to limited network access. Sorry for the inconvenience.

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
> HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, 
> HIVE-9561.6-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325053#comment-14325053
 ] 

Hive QA commented on HIVE-9561:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12699339/HIVE-9561.6-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7553 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/737/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/737/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-737/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12699339 - PreCommit-HIVE-SPARK-Build

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
> HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, 
> HIVE-9561.6-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324671#comment-14324671
 ] 

Hive QA commented on HIVE-9561:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12699299/HIVE-9561.5-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7553 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_streaming
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/736/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/736/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-736/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12699299 - PreCommit-HIVE-SPARK-Build

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
> HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323759#comment-14323759
 ] 

Xuefu Zhang commented on HIVE-9561:
---

Unfortunately the patch doesn't apply any more after recent trunk to branch 
merge. Could you please rebase?

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
> HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323756#comment-14323756
 ] 

Xuefu Zhang commented on HIVE-9561:
---

+1

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
> HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323617#comment-14323617
 ] 

Hive QA commented on HIVE-9561:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12699189/HIVE-9561.4-spark.patch

{color:green}SUCCESS:{color} +1 7471 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/734/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/734/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-734/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12699189 - PreCommit-HIVE-SPARK-Build

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
> HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321938#comment-14321938
 ] 

Hive QA commented on HIVE-9561:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12698967/HIVE-9561.4-spark.patch

{color:red}ERROR:{color} -1 due to 1993 failed/errored test(s), 7069 tests 
executed
*Failed tests:*
{noformat}
TestSSL - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_change_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition_authorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_cascade
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4
org.a

[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321827#comment-14321827
 ] 

Hive QA commented on HIVE-9561:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12698947/HIVE-9561.4-spark.patch

{color:red}ERROR:{color} -1 due to 1993 failed/errored test(s), 7069 tests 
executed
*Failed tests:*
{noformat}
TestSSL - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_change_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition_authorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_cascade
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4
org.a

[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319776#comment-14319776
 ] 

Hive QA commented on HIVE-9561:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12698669/HIVE-9561.3-spark.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7471 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union4
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union4
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/724/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/724/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-724/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12698669 - PreCommit-HIVE-SPARK-Build

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
> HIVE-9561.3-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318283#comment-14318283
 ] 

Hive QA commented on HIVE-9561:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12698417/HIVE-9561.2-spark.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7471 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union4
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/723/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/723/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-723/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12698417 - PreCommit-HIVE-SPARK-Build

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-12 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318145#comment-14318145
 ] 

Rui Li commented on HIVE-9561:
--

I can't post the patch to RB. Will try again later.

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-05 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308589#comment-14308589
 ] 

Rui Li commented on HIVE-9561:
--

OK will do.

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308560#comment-14308560
 ] 

Xuefu Zhang commented on HIVE-9561:
---

Yeah, hasSortBy and hasOrderBy are for the whole query. Thus, they are not 
reliable for either detection. I think we can add a flag in ReduceSinkDesc, 
similar to enforceSort, telling about sortBy/orderBy/None.

Yes, I think that optimization makes sense. Please do some prototyping. If all 
is well, create an JIRA for it.

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-05 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308462#comment-14308462
 ] 

Rui Li commented on HIVE-9561:
--

Yes I'm trying to avoid using SHUFFLE_SORT for any non-orderBy query. But for 
this particular case, we'll have QueryProperties.hasSortBy and 
QueryProperties.hasOrderBy both true. So we can't really distinguish.

As to removing the unnecessary ordering for subqueries, that sounds 
interesting. Maybe we can traverse SparkWork to look for SHUFFLE_SORT edges. If 
the downstream work on that edge is not a leaf work, we can remove it. What do 
you think?

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308303#comment-14308303
 ] 

Xuefu Zhang commented on HIVE-9561:
---

Does QueryProperties.hasSortBy() help?

For a sort by query, we don't need SHUFFLE_SORT, right? MR_SHUFFLE may just be 
sufficient.

For this particular query, it seems making no sense to have either sort by or 
order by in the subqueries. They make sense only if they are specified for the 
final output. Do you agree?

Maybe we can do some optimization to detect and remove those if they are not 
for the final output.

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> --
>
> Key: HIVE-9561
> URL: https://issues.apache.org/jira/browse/HIVE-9561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
> and are difficult to control. So we should limit the use of {{sortByKey}} to 
> order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307504#comment-14307504
 ] 

Hive QA commented on HIVE-9561:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12696704/HIVE-9561.1-spark.patch

{color:red}ERROR:{color} -1 due to 57 failed/errored test(s), 7468 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_clusterby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_sortby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_map_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8_noskew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_insert_common_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_identity_project_remove_skip
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multigroupby_singlemr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parallel_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_transform
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_reduce_deduplicate_exclude_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_ppr
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPAR