[jira] [Updated] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks

2014-10-28 Thread Vikram Dixit K (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikram Dixit K updated HIVE-8597:
---------------------------------
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> SMB join small table side should use the same set of serialized payloads across tasks
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-8597
>                 URL: https://issues.apache.org/jira/browse/HIVE-8597
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>    Affects Versions: 0.14.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8597.1.patch
>
>
> Each task sees all splits belonging to the bucket being processed by the
> task. At the moment, we end up using different instances of the same
> serialized split, which adds unnecessary memory pressure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks

2014-10-24 Thread Siddharth Seth (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-8597:
---------------------------------
Status: Patch Available  (was: Open)






[jira] [Updated] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks

2014-10-24 Thread Siddharth Seth (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-8597:
---------------------------------
Attachment: HIVE-8597.1.patch

This patch creates one set of serialized splits per bucket and reuses it
across the tasks processing that bucket. It also removes some unused
variables and cleans up others so they can be garbage-collected.

[~vikram.dixit] - please review.
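
For readers following the thread, the idea described above (serialize the splits for a bucket once, then hand the same objects to every task working on that bucket) can be sketched roughly as below. This is only an illustration and not the code in HIVE-8597.1.patch: the class `SharedSplitPayloads`, the method `payloadsForBucket`, and the use of plain strings and `byte[]` in place of real splits and payloads are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: caches one list of serialized payloads per bucket so
// that all tasks processing the same bucket share the same instances.
class SharedSplitPayloads {
    // bucket id -> payloads serialized once for that bucket
    private final Map<Integer, List<byte[]>> payloadsByBucket = new HashMap<>();
    // counts how many times a split was actually serialized
    private int serializationCount = 0;

    // Stand-in for real split serialization (assumption: a split is a String here).
    private byte[] serialize(String split) {
        serializationCount++;
        return split.getBytes(StandardCharsets.UTF_8);
    }

    // Returns the payload list for a bucket, serializing only on first request.
    List<byte[]> payloadsForBucket(int bucketId, List<String> splits) {
        List<byte[]> cached = payloadsByBucket.get(bucketId);
        if (cached == null) {
            List<byte[]> fresh = new ArrayList<>();
            for (String split : splits) {
                fresh.add(serialize(split));
            }
            // Read-only view, since the same list is handed to several tasks.
            cached = Collections.unmodifiableList(fresh);
            payloadsByBucket.put(bucketId, cached);
        }
        return cached;
    }

    int getSerializationCount() {
        return serializationCount;
    }
}
```

In this sketch, two tasks that both call `payloadsForBucket(0, splits)` receive the same list instance, so the serialized bytes exist in memory once per bucket rather than once per task.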



