[jira] [Updated] (HIVE-10104) LLAP: Generate consistent splits and locations for the same split across jobs

2015-03-26 Thread Siddharth Seth (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-10104:
--
Attachment: HIVE-10104.1.txt

The patch orders the original splits by size and then by name.
Each split's location is derived from a hash of its filename and start position.
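
To make the approach concrete, here is a minimal sketch in Java. It is not the code from the attached patch: the class `ConsistentSplits`, the `Split` holder, and the methods `sortSplits` and `pickLocation` are hypothetical names used only for illustration. The largest-first ordering, the start-offset tie-break, and the modulo mapping onto a host list are assumptions as well.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

public class ConsistentSplits {

    /** Hypothetical stand-in for a file split: path, start offset, and length. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Returns a sorted copy: by size (largest first, an assumption), then by
     * name. The start offset is an assumed extra tie-break so that the order
     * does not depend on which thread produced a split first.
     */
    static List<Split> sortSplits(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator
                .comparingLong((Split s) -> s.length).reversed()
                .thenComparing((Split s) -> s.path)
                .thenComparingLong((Split s) -> s.start));
        return sorted;
    }

    /**
     * Picks a host from a hash of the filename and start position.
     * Assumes the host list is the same, in the same order, for every job.
     */
    static String pickLocation(Split split, List<String> hosts) {
        int hash = Objects.hash(split.path, split.start);
        // floorMod keeps the index non-negative even when the hash is negative.
        return hosts.get(Math.floorMod(hash, hosts.size()));
    }
}
```

With this shape, two jobs that see the same files and the same host list get the same split order and the same location for each split, regardless of the order in which the splits were generated.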

[~hagleitn] - could you please take a quick look as a sanity check?

Will commit once I've been able to test it on a cluster with more than one node.

 LLAP: Generate consistent splits and locations for the same split across jobs
 -

 Key: HIVE-10104
 URL: https://issues.apache.org/jira/browse/HIVE-10104
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10104.1.txt


 Locations for splits are currently randomized. The order of splits is also 
 random, since it depends on how the threads end up generating them.
 Add an option to sort the splits and generate repeatable locations, 
 assuming all other factors are the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10104) LLAP: Generate consistent splits and locations for the same split across jobs

2015-03-26 Thread Siddharth Seth (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-10104:
--
Attachment: HIVE-10104.2.txt

Updated patch with the sort removed from the scheduler. Tested on a multi-node 
cluster. Will commit after the next rebase of the LLAP branch.

 LLAP: Generate consistent splits and locations for the same split across jobs
 -

 Key: HIVE-10104
 URL: https://issues.apache.org/jira/browse/HIVE-10104
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10104.1.txt, HIVE-10104.2.txt




