[jira] [Updated] (MAPREDUCE-6423) MapOutput Sampler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-6423:

Status: Open (was: Patch Available)

MapOutput Sampler

Key: MAPREDUCE-6423
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Ram Manohar Bheemana
Assignee: Ram Manohar Bheemana
Priority: Minor
Attachments: MapOutputSampler.java

Need a sampler based on the MapOutput keys. The current InputSampler implementation has a major drawback: it assumes that the input and output of a mapper are the same, which is generally not the case.

Approach:
1. Create a Sampler which samples the data based on the input.
2. Run a small MapReduce job in uber task mode, using the original job's mapper and an identity reducer, to generate the required MapOutputSample keys.
3. Optionally, allow specifying which input file to sample. For example, given input files A and B, we should be able to specify that only file A is used for sampling.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6423) MapOutput Sampler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ram Manohar Bheemana updated MAPREDUCE-6423:

Status: Patch Available (was: In Progress)

Please review the attached MapOutputSampler.java
[jira] [Updated] (MAPREDUCE-6423) MapOutput Sampler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ram Manohar Bheemana updated MAPREDUCE-6423:

Attachment: MapOutputSampler.java
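
The three-step approach in the issue description can be illustrated with a minimal sketch. This is not the attached MapOutputSampler.java and it does not use any Hadoop API: it is plain Java that runs in-process, standing in for the small uber-mode job. The class name MapOutputSamplerSketch, its methods, the interval-based sampling rule, and the "id,name" record format are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not the patch attached to MAPREDUCE-6423.
public class MapOutputSamplerSketch {

    // Step 3: restrict sampling to the named inputs; an empty list means "use all".
    static List<String> selectRecords(Map<String, List<String>> inputs,
                                      List<String> filesToSample) {
        List<String> records = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : inputs.entrySet()) {
            if (filesToSample.isEmpty() || filesToSample.contains(e.getKey())) {
                records.addAll(e.getValue());
            }
        }
        return records;
    }

    // Step 1: sample the input by taking every interval-th record, up to maxSamples.
    static List<String> sampleInput(List<String> records, int interval, int maxSamples) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        List<String> samples = new ArrayList<>();
        for (int i = 0; i < records.size() && samples.size() < maxSamples; i += interval) {
            samples.add(records.get(i));
        }
        return samples;
    }

    // Step 2: apply the job's own map function to the sampled records and keep
    // the emitted keys. Sorting mimics the key order an identity reducer would see.
    static <K extends Comparable<K>> List<K> sampleMapOutputKeys(
            Map<String, List<String>> inputs, List<String> filesToSample,
            int interval, int maxSamples, Function<String, List<K>> mapper) {
        List<String> samples =
                sampleInput(selectRecords(inputs, filesToSample), interval, maxSamples);
        List<K> keys = new ArrayList<>();
        for (String record : samples) {
            keys.addAll(mapper.apply(record));
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        inputs.put("A", Arrays.asList("3,apple", "1,pear", "2,fig", "9,kiwi"));
        inputs.put("B", Arrays.asList("7,plum"));

        // The mapper reads "id,name" lines but emits the name as its output key,
        // so sampling the input alone would not yield the right keys.
        Function<String, List<String>> mapper =
                line -> Collections.singletonList(line.split(",")[1]);

        List<String> keys =
                sampleMapOutputKeys(inputs, Arrays.asList("A"), 2, 10, mapper);
        System.out.println(keys); // prints [apple, fig]
    }
}
```

Only file A is sampled here, matching the optional file selection in step 3. In a real job, the sampled keys would presumably be written out for the partitioner to consume, which this sketch leaves out.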