[jira] [Commented] (MAPREDUCE-6423) MapOutput Sampler

2015-09-12 Thread Ram Manohar Bheemana (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742215#comment-14742215
 ] 

Ram Manohar Bheemana commented on MAPREDUCE-6423:
-------------------------------------------------

Sorry for the delay in responding. I will try to generate the patch as suggested.

> MapOutput Sampler
> -----------------
>
>                 Key: MAPREDUCE-6423
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Ram Manohar Bheemana
>            Assignee: Ram Manohar Bheemana
>            Priority: Minor
>         Attachments: MapOutputSampler.java
>
>
> Need a sampler based on the map output keys. The current InputSampler
> implementation has a major drawback: it samples input keys, so it only works
> when a mapper's input and output keys are the same, which generally isn't the case.
> Approach:
> 1. Create a Sampler which samples the data based on the input.
> 2. Run a small MapReduce job in uber task mode, using the original job's mapper
> and an identity reducer, to generate the required map output sample keys.
> 3. Optionally, allow specifying which input files to sample. For example, given
> input files A and B, we should be able to specify that only file A is used for sampling.
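
The approach quoted above can be illustrated with a rough, Hadoop-free sketch. Everything in it is an assumption made for illustration: the class and method names are hypothetical and are not taken from the attached MapOutputSampler.java, the interval sampling in step 1 is just one possible strategy, and a real implementation would run the job's Mapper inside a small MapReduce job rather than calling a function in-process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch only: simulates the proposed steps in memory, without Hadoop.
public class MapOutputSamplerSketch {

    // Step 1: sample the input by taking every `interval`-th record.
    static <T> List<T> sampleInput(List<T> input, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        List<T> sample = new ArrayList<>();
        for (int i = 0; i < input.size(); i += interval) {
            sample.add(input.get(i));
        }
        return sample;
    }

    // Step 2: apply the job's map function to the sample and collect the
    // map OUTPUT keys in sorted order (stands in for mapper + identity reducer).
    static <T, K extends Comparable<K>> List<K> mapOutputKeys(
            List<T> sample, Function<T, K> mapper) {
        List<K> keys = new ArrayList<>();
        for (T record : sample) {
            keys.add(mapper.apply(record));
        }
        Collections.sort(keys);
        return keys;
    }

    // Pick numPartitions - 1 evenly spaced split points from the sorted keys.
    static <K> List<K> splitPoints(List<K> sortedKeys, int numPartitions) {
        List<K> splits = new ArrayList<>();
        if (sortedKeys.isEmpty()) {
            return splits;
        }
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sortedKeys.get(i * sortedKeys.size() / numPartitions));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Step 3 (optional): the caller passes only the records of the file
        // chosen for sampling, e.g. file A.
        List<String> fileA = Arrays.asList(
                "a,50", "b,10", "c,40", "d,20", "e,30", "f,60");

        // The mapper's output key (the score) differs from its input key,
        // which is the case input-key sampling cannot handle.
        Function<String, Integer> mapper =
                line -> Integer.parseInt(line.split(",")[1]);

        List<String> sample = sampleInput(fileA, 2);
        List<Integer> keys = mapOutputKeys(sample, mapper);
        System.out.println(splitPoints(keys, 3)); // prints [40, 50]
    }
}
```

The split points are computed from keys the mapper actually emits, which is what a total-order partitioner needs when the map output key type differs from the input key type.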



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6423) MapOutput Sampler

2015-08-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707436#comment-14707436
 ] 

Chris Douglas commented on MAPREDUCE-6423:
------------------------------------------

Thanks for taking a look at this. That the sampler only works on input data was 
always a weakness for jobs requiring their output be totally ordered.

Could you generate a patch? The contribution wiki is 
[here|http://wiki.apache.org/hadoop/HowToContribute].

It might be easier for others to use if the Mapper was integrated with the 
InputSampler, but a separate tool is still an improvement.
