[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]

2014-12-16 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9097:
-
Attachment: HIVE-9097.1-spark.patch

The patch splits the original Spark task into two tasks so that conditional map 
joins can be inserted between them to process skewed data.
All changes to the golden files are limited to query plans.

 Support runtime skew join for more queries [Spark Branch]
 -

 Key: HIVE-9097
 URL: https://issues.apache.org/jira/browse/HIVE-9097
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9097.1-spark.patch


 After HIVE-8913, runtime skew join is enabled for Spark. Currently, however, 
 the optimization supports only the simplest case, where the join is the leaf 
 ReduceWork in a work graph. This is because the results of the original join 
 and of the conditional map join have to be unioned before they are fed to 
 downstream works, which is tricky to do in Spark.
 This JIRA is to research and find a way to relax that restriction. A possible 
 solution is to break the original task into two tasks at the join work and 
 insert the conditional task in between.
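
The proposed split can be pictured with a small, self-contained sketch. It is 
not taken from the patch: Task, splitAtJoin and chain are hypothetical names, 
and the work graph is simplified to a linear chain of works. It only shows the 
shape of the idea, assuming the first task ends at the join work, the 
conditional map join task follows it, and the remaining works move to a second 
task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative model only: Task, splitAtJoin and chain are hypothetical names,
// not classes or methods from Hive.
class SkewJoinSplitSketch {

    /** A task here is a named, ordered chain of works plus its child tasks. */
    static final class Task {
        final String name;
        final List<String> works;
        final List<Task> children = new ArrayList<>();

        Task(String name, List<String> works) {
            this.name = name;
            this.works = new ArrayList<>(works); // copy so sub-list views are not shared
        }
    }

    /**
     * Splits the task right after joinWork. The returned task ends with the
     * join; its child is the conditional map join task, whose child (if any
     * works remain) holds the downstream works.
     */
    static Task splitAtJoin(Task original, String joinWork) {
        List<String> works = original.works;
        int idx = works.indexOf(joinWork);
        if (idx < 0) {
            throw new IllegalArgumentException("join work not found: " + joinWork);
        }
        Task upstream = new Task(original.name + "-1", works.subList(0, idx + 1));
        Task conditional = new Task("ConditionalMapJoin",
                Collections.singletonList("MapJoin (skewed keys)"));
        upstream.children.add(conditional);
        // When the join is already the leaf work, there is nothing downstream.
        if (idx + 1 < works.size()) {
            Task downstream = new Task(original.name + "-2",
                    works.subList(idx + 1, works.size()));
            conditional.children.add(downstream);
        }
        return upstream;
    }

    /** Follows the first child of each task and collects the task names. */
    static List<String> chain(Task head) {
        List<String> names = new ArrayList<>();
        for (Task t = head; t != null;
                t = t.children.isEmpty() ? null : t.children.get(0)) {
            names.add(t.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Task original = new Task("Stage-1",
                Arrays.asList("Map 1", "Reducer 2 (join)", "Reducer 3"));
        System.out.println(chain(splitAtJoin(original, "Reducer 2 (join)")));
    }
}
```

In this model the join no longer has to be the leaf of its task: anything 
after the join work simply moves into the second task, which runs after the 
conditional task.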



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]

2014-12-16 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9097:
-
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]

2014-12-16 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9097:
-
Affects Version/s: spark-branch

 Support runtime skew join for more queries [Spark Branch]
 -

 Key: HIVE-9097
 URL: https://issues.apache.org/jira/browse/HIVE-9097
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: spark-branch
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9097.1-spark.patch




[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]

2014-12-16 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9097:
-
Component/s: Spark



[jira] [Updated] (HIVE-9097) Support runtime skew join for more queries [Spark Branch]

2014-12-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9097:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, Rui.

 Support runtime skew join for more queries [Spark Branch]
 -

 Key: HIVE-9097
 URL: https://issues.apache.org/jira/browse/HIVE-9097
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: spark-branch
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-9097.1-spark.patch

