[jira] [Commented] (MAPREDUCE-6423) MapOutput Sampler

2015-09-12 Thread Ram Manohar Bheemana (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742215#comment-14742215
 ] 

Ram Manohar Bheemana commented on MAPREDUCE-6423:
-

Sorry for the delay in responding. I will try to generate the patch as suggested.

> MapOutput Sampler
> -
>
> Key: MAPREDUCE-6423
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Ram Manohar Bheemana
> Assignee: Ram Manohar Bheemana
> Priority: Minor
> Attachments: MapOutputSampler.java
>
>
> Need a sampler based on the MapOutput keys. The current InputSampler
> implementation has a major drawback: it requires the input and output of a
> mapper to be the same, which is generally not the case.
> Approach:
> 1. Create a Sampler which samples the data based on the input.
> 2. Run a small MapReduce job in uber task mode, using the original job's mapper
> and an identity reducer, to generate the required MapOutputSample keys.
> 3. Optionally, specify which input files to sample. For example, given input
> files A and B, we should be able to specify that only file A is used for sampling.
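
The approach above can be illustrated with a minimal, self-contained sketch.
It is not the attached MapOutputSampler.java and it uses no Hadoop classes.
The class name, method names, the one-key-per-record mapper, and the
evenly-spaced split rule are all assumptions made for illustration. It
samples input records, runs a mapper function over the sample, and derives
split points from the sorted map output keys, as a total-order partitioner
would need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

// Illustrative only: plain Java, no Hadoop classes; all names are hypothetical.
final class MapOutputSamplerSketch {

    // Runs the mapper over a random subset of the input records and keeps
    // the keys it emits. Assumes one output key per input record.
    static <I, K> List<K> sampleMapOutputKeys(
            List<I> input, Function<I, K> mapper, double frequency, long seed) {
        Random random = new Random(seed);
        List<K> keys = new ArrayList<>();
        for (I record : input) {
            if (random.nextDouble() < frequency) {
                keys.add(mapper.apply(record));
            }
        }
        return keys;
    }

    // Sorts the sampled keys and picks numPartitions - 1 evenly spaced
    // split points, so each partition covers a similar share of the sample.
    static <K extends Comparable<K>> List<K> splitPoints(
            List<K> sampledKeys, int numPartitions) {
        if (sampledKeys.isEmpty() || numPartitions < 1) {
            throw new IllegalArgumentException(
                    "need at least one sample and one partition");
        }
        List<K> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted);
        List<K> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            int index = (int) ((long) i * sorted.size() / numPartitions);
            splits.add(sorted.get(index));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Input records are text lines; the mapper emits an Integer key,
        // so the input and output key types differ.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("row" + i + "," + (i * 37) % 1000);
        }
        Function<String, Integer> mapper =
                line -> Integer.parseInt(line.split(",")[1]);
        List<Integer> keys = sampleMapOutputKeys(records, mapper, 0.1, 42L);
        System.out.println(splitPoints(keys, 4));
    }
}
```

Restricting sampling to file A only (step 3) would correspond to passing just
that file's records as the input argument.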



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MAPREDUCE-6423) MapOutput Sampler

2015-08-21 Thread Chris Douglas (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707436#comment-14707436
]

Chris Douglas commented on MAPREDUCE-6423:
--

Thanks for taking a look at this. That the sampler only works on input data was
always a weakness for jobs requiring their output be totally ordered.

Could you generate a patch? The contribution wiki is
[here|http://wiki.apache.org/hadoop/HowToContribute].

It might be easier for others to use if the Mapper was integrated with the
InputSampler, but a separate tool is still an improvement.

