[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Attachment: HIVE-8017.5-spark.patch

Update the golden file for union_remove_25

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li
    Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
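The point made in the issue description can be illustrated with a minimal, self-contained sketch. It does not use the real Hive or Hadoop classes: {{PlainBytesKey}}, {{HashCarryingKey}} and {{partitionFor}} are hypothetical stand-ins, and the modulo-based partitioner is an assumption about how a hash partitioner typically picks a partition. The sketch only shows why a key that carries a producer-assigned hash code can send two differently serialized keys to the same partition, while a key hashed from its raw bytes may not.

```java
import java.util.Arrays;

public class HiveKeySketch {

    /** Hypothetical stand-in for a plain bytes key: the hash is computed from all of its bytes. */
    static final class PlainBytesKey {
        private final byte[] bytes;

        PlainBytesKey(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof PlainBytesKey
                && Arrays.equals(bytes, ((PlainBytesKey) other).bytes);
        }
    }

    /** Hypothetical stand-in for a HiveKey-like key: it carries a hash chosen by the producer. */
    static final class HashCarryingKey {
        private final byte[] bytes;
        private final int hash;

        HashCarryingKey(byte[] bytes, int hash) {
            this.bytes = bytes.clone();
            this.hash = hash;
        }

        @Override
        public int hashCode() {
            // Return the stored hash instead of hashing the serialized bytes.
            return hash;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof HashCarryingKey
                && hash == ((HashCarryingKey) other).hash
                && Arrays.equals(bytes, ((HashCarryingKey) other).bytes);
        }
    }

    /** Assumed partitioning rule: non-negative modulo of the key's hash code. */
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // Two serialized keys that differ only in their last byte (illustrative data).
        byte[] left = {1, 2, 0};
        byte[] right = {1, 2, 1};
        int producerHash = 42; // hash assigned upstream, e.g. from the partitioning columns only

        System.out.println("plain:    " + partitionFor(new PlainBytesKey(left), 4)
            + " vs " + partitionFor(new PlainBytesKey(right), 4));
        System.out.println("carrying: " + partitionFor(new HashCarryingKey(left, producerHash), 4)
            + " vs " + partitionFor(new HashCarryingKey(right, producerHash), 4));
    }
}
```

With the hash-carrying key, both records land in the same partition regardless of how their bytes differ, which is the property the description says joins and bucketed tables depend on.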
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8017:
------------------------------
    Resolution: Fixed
    Fix Version/s: spark-branch
    Status: Resolved (was: Patch Available)

Patch committed to spark branch. Thanks to Rui for the contribution.

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li
    Fix For: spark-branch
    Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8017:
------------------------------
    Labels: Spark-M1 (was: )

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li
    Labels: Spark-M1
    Fix For: spark-branch
    Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Attachment: HIVE-8017.4-spark.patch

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li
    Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Attachment: HIVE-8017.2-spark.patch

This patch fixes some qfile tests that failed because of the last patch. Two qtests are not fixed: {{optimize_nullscan.q}} and {{union_remove_25.q}}.

For {{optimize_nullscan.q}}, I checked the corresponding MR output and found that the operator tree in the new output file is more similar to the one in the MR version output. Besides, this failure is of age 2, so I guess it's not related to the patch here.

For {{union_remove_25.q}}, the only diff is the total size of {{outputTbl2}} (6812 - 6826). I checked the MR version and the total size is also 6812. I'm not sure what causes this difference. We may need to do more tests for partitioned tables.

[~xuefuz] do you have any idea on this?

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li
    Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Attachment: HIVE-8017.3-spark.patch

Use SORT_QUERY_RESULTS instead of SORT_BEFORE_DIFF, and update the golden files for MR/Tez as well.

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li
    Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Attachment: HIVE-8017-spark.patch

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li
    Attachments: HIVE-8017-spark.patch

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Status: Patch Available (was: Open)

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li
    Attachments: HIVE-8017-spark.patch

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of Byteswritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Issue Type: Sub-task (was: Bug)
    Parent: HIVE-7292

Use HiveKey instead of Byteswritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Summary: Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] (was: Use HiveKey instead of Byteswritable as key type of the pair RDD [Spark Branch])

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
-------------------------------------------------------------------------------

    Key: HIVE-8017
    URL: https://issues.apache.org/jira/browse/HIVE-8017
    Project: Hive
    Issue Type: Sub-task
    Components: Spark
    Reporter: Rui Li
    Assignee: Rui Li

HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)