[jira] Updated: (PIG-460) PERFORMANCE: Order by done in 3 MR jobs, could be done in 2

2009-01-28 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated PIG-460:
--

Attachment: sampler2.patch

 PERFORMANCE:  Order by done in 3 MR jobs, could be done in 2
 

 Key: PIG-460
 URL: https://issues.apache.org/jira/browse/PIG-460
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Alan Gates
Assignee: Amir Youssefi
 Fix For: types_branch

 Attachments: sampler.patch


 Currently order by is done in three MR jobs:
 job 1: read data in whatever loader the user requests, store using BinStorage
 job 2: load using RandomSampleLoader, find quantiles
 job 3: load data again and sort
 It is done this way because RandomSampleLoader extends BinStorage, and so 
 needs the data in that format to read it.
 If the logic in RandomSampleLoader were made into an operator instead of being 
 in a loader, then jobs 1 and 2 could be merged.  On average, job 1 takes about 
 15% of the time of an order by script.
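
The idea of sampling in an operator rather than in a loader can be sketched roughly as below. This is an illustrative sketch in plain Python, not Pig code and not the content of the attached patches: the names `SamplingPassThrough`, `quantile_boundaries`, and `partition_for` are hypothetical, and reservoir sampling is an assumed stand-in for whatever sampling RandomSampleLoader actually performs. It shows records flowing through unchanged (as in job 1) while a sample is collected for the quantile step (as in job 2), so the data need not be rewritten in a special format just to be sampled.

```python
import bisect
import random
from typing import Any, Iterable, Iterator, List


class SamplingPassThrough:
    """Hypothetical operator: passes every record through unchanged while
    keeping a fixed-size uniform sample (reservoir sampling, Algorithm R)."""

    def __init__(self, sample_size: int, seed: int = 0) -> None:
        self.sample_size = sample_size
        self.sample: List[Any] = []
        self._seen = 0
        self._rng = random.Random(seed)

    def process(self, records: Iterable[Any]) -> Iterator[Any]:
        for record in records:
            self._seen += 1
            if len(self.sample) < self.sample_size:
                # Fill the reservoir with the first sample_size records.
                self.sample.append(record)
            else:
                # Replace a slot with probability sample_size / records seen.
                j = self._rng.randrange(self._seen)
                if j < self.sample_size:
                    self.sample[j] = record
            yield record  # the record continues down the pipeline untouched


def quantile_boundaries(sample: List[Any], num_partitions: int) -> List[Any]:
    """Pick num_partitions - 1 cut points from the sorted sample."""
    ordered = sorted(sample)
    if not ordered or num_partitions < 2:
        return []
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def partition_for(key: Any, boundaries: List[Any]) -> int:
    """Index of the sort partition (reducer) that should receive this key."""
    return bisect.bisect_right(boundaries, key)
```

In this sketch the sort job would call `partition_for` on each key, so each partition holds a contiguous key range and the sorted partition outputs can be concatenated in order.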

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-460) PERFORMANCE: Order by done in 3 MR jobs, could be done in 2

2009-01-28 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated PIG-460:
--

Attachment: sampler2.patch

 PERFORMANCE:  Order by done in 3 MR jobs, could be done in 2
 

 Key: PIG-460
 URL: https://issues.apache.org/jira/browse/PIG-460
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Alan Gates
Assignee: Amir Youssefi
 Fix For: types_branch

 Attachments: sampler.patch, sampler2.patch






[jira] Updated: (PIG-460) PERFORMANCE: Order by done in 3 MR jobs, could be done in 2

2009-01-28 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated PIG-460:
--

Attachment: (was: sampler2.patch)

 PERFORMANCE:  Order by done in 3 MR jobs, could be done in 2
 

 Key: PIG-460
 URL: https://issues.apache.org/jira/browse/PIG-460
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Alan Gates
Assignee: Amir Youssefi
 Fix For: types_branch

 Attachments: sampler.patch






[jira] Updated: (PIG-460) PERFORMANCE: Order by done in 3 MR jobs, could be done in 2

2009-01-28 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated PIG-460:
--

Attachment: sampler2.patch

 PERFORMANCE:  Order by done in 3 MR jobs, could be done in 2
 

 Key: PIG-460
 URL: https://issues.apache.org/jira/browse/PIG-460
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Alan Gates
Assignee: Amir Youssefi
 Fix For: types_branch

 Attachments: sampler.patch, sampler2.patch


