[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Attachment: HIVE-8017.5-spark.patch

Update the golden file for union_remove_25

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch


 HiveKey should be used as the key type because it holds the hash code used for 
 partitioning. While BytesWritable serves partitioning well for simple cases, 
 we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. joins and 
 bucketed tables.
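
The point of the description can be illustrated with a minimal sketch. This is not Hive or Spark code: {{BytesKey}}, {{HashCarryingKey}}, {{partition}} and {{serialize}} are hypothetical stand-ins, and the extra tag byte in the serialized key is an assumption made only to create a case where the key bytes contain more than the partitioning column. Under those assumptions, a hash derived from the raw bytes routes two rows with the same join column to different partitions, while a hash carried by the key routes them together.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyPartitioningSketch {

    /** Hypothetical byte-array key: its hash is derived from every serialized byte.
     *  equals() is omitted for brevity; only hashCode() matters for this sketch. */
    static class BytesKey {
        final byte[] bytes;
        BytesKey(byte[] bytes) { this.bytes = bytes; }
        @Override public int hashCode() { return Arrays.hashCode(bytes); }
    }

    /** Hypothetical key that carries a hash code computed by whoever produced it. */
    static class HashCarryingKey extends BytesKey {
        final int carriedHash;
        HashCarryingKey(byte[] bytes, int carriedHash) {
            super(bytes);
            this.carriedHash = carriedHash;
        }
        @Override public int hashCode() { return carriedHash; }
    }

    /** Hash partitioning: non-negative remainder of the key's hash code. */
    static int partition(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Serializes a join column followed by one extra byte (assumed: a per-table tag). */
    static byte[] serialize(String joinColumn, byte tag) {
        byte[] col = joinColumn.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(col, col.length + 1);
        out[col.length] = tag;
        return out;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        byte[] left = serialize("42", (byte) 0);
        byte[] right = serialize("42", (byte) 1);

        // Byte-derived hashes differ because the tag byte differs,
        // so the two rows land in different partitions.
        System.out.println(partition(new BytesKey(left), numPartitions)
                == partition(new BytesKey(right), numPartitions)); // prints "false"

        // The carried hash covers only the join column, so both rows
        // land in the same partition regardless of the tag byte.
        int joinHash = "42".hashCode();
        System.out.println(partition(new HashCarryingKey(left, joinHash), numPartitions)
                == partition(new HashCarryingKey(right, joinHash), numPartitions)); // prints "true"
    }
}
```

In this sketch the partitioner never needs to know which bytes belong to the partitioning columns; it only calls {{hashCode}} on the key, which is why the choice of key type matters.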



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8017:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to spark branch. Thanks to Rui for the contribution.

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch




[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-12 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8017:
--
Labels: Spark-M1  (was: )

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
  Labels: Spark-M1
 Fix For: spark-branch

 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch




[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-10 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Attachment: HIVE-8017.4-spark.patch

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch




[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Attachment: HIVE-8017.2-spark.patch

This patch fixes some of the qfile test failures caused by the last patch.
Two qtests are still not fixed: {{optimize_nullscan.q}} and {{union_remove_25.q}}.
For {{optimize_nullscan.q}}, I checked the corresponding MR output and found that 
the operator tree in the new output file is closer to the one in the MR 
version. Besides, this failure has an age of 2, so I guess it is not related 
to the patch here.
For {{union_remove_25.q}}, the only diff is the total size of {{outputTbl2}} 
(6812 vs. 6826). I checked the MR version and its total size is also 6812. I'm 
not sure what causes this difference. We may need to run more tests on 
partitioned tables.
[~xuefuz], do you have any idea about this?

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch




[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Attachment: HIVE-8017.3-spark.patch

Use SORT_QUERY_RESULTS instead of SORT_BEFORE_DIFF.
And update golden files for MR/Tez as well.
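
As a rough illustration of why sorted comparison helps here, the sketch below compares a golden result with an actual result after sorting both. It is not Hive's test harness: {{SortedResultCompare}} and {{sameRowsIgnoringOrder}} are hypothetical names, the sample rows are made up, and the premise that a different key hash changes only the row order (not the row set) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedResultCompare {

    /** Returns true when both lists hold the same rows, ignoring their order. */
    static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        // Made-up rows: same content, different order.
        List<String> golden = Arrays.asList("1\tval_1", "2\tval_2", "3\tval_3");
        List<String> actual = Arrays.asList("3\tval_3", "1\tval_1", "2\tval_2");

        // A line-by-line comparison reports a difference.
        System.out.println(golden.equals(actual)); // prints "false"

        // Comparing the sorted rows does not.
        System.out.println(sameRowsIgnoringOrder(golden, actual)); // prints "true"
    }
}
```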

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
 HIVE-8017.3-spark.patch




[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-08 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Attachment: HIVE-8017-spark.patch

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch




[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-08 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Status: Patch Available  (was: Open)

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-8017-spark.patch




[jira] [Updated] (HIVE-8017) Use HiveKey instead of Byteswritable as key type of the pair RDD [Spark Branch]

2014-09-06 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7292

 Use HiveKey instead of Byteswritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li



[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-06 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Summary: Use HiveKey instead of BytesWritable as key type of the pair RDD 
[Spark Branch]  (was: Use HiveKey instead of Byteswritable as key type of the 
pair RDD [Spark Branch])

 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
 Branch]
 ---

 Key: HIVE-8017
 URL: https://issues.apache.org/jira/browse/HIVE-8017
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
