[jira] Commented: (PIG-652) Need to give user control of OutputFormat

Hong Tang (JIRA) Tue, 10 Feb 2009 10:53:23 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672362#action_12672362
 ]


Hong Tang commented on PIG-652:
-------------------------------

Again, I might be missing something that is obvious to you. My understanding 
is that any OutputFormat class is constructed through its default constructor, 
so all of its state has to be passed from the JobClient side to the task side 
through JobConf. Here is how people create a basic table directly through 
BasicTableOutputFormat:

On the JobClient side:
{code}
jobConf.setOutputFormat(BasicTableOutputFormat.class);
Path outPath = new Path("path/to/the/BasicTable");
BasicTableOutputFormat.setOutputPath(jobConf, outPath);
String schema = "Name, Age, Salary, BonusPct";
BasicTableOutputFormat.setSchema(jobConf, schema);
{code}

On the task side, in the reducer:
{code}
static class MyReduceClass implements Reducer<K, V, BytesWritable, Tuple> {
  Tuple outRow;
  int idxName, idxAge, idxSalary, idxBonusPct;
 
  public void configure(JobConf job) {
    Schema outSchema = BasicTableOutputFormat.getSchema(job);
    outRow = TypesUtils.createTuple(outSchema);
    idxName = outSchema.getColumnIndex("Name");
    idxAge = outSchema.getColumnIndex("Age");
    idxSalary = outSchema.getColumnIndex("Salary");
    idxBonusPct = outSchema.getColumnIndex("BonusPct");
  }
  public void reduce(K key, Iterator<V> values,         
      OutputCollector<BytesWritable, Tuple> output, Reporter reporter)
      throws IOException {
    String name;
    int age, salary;
    double bonusPct;

    // ... Determine individual field values of the row to be inserted ...

    try {
      // Reuse the tuple created in configure(); the field is named outRow.
      outRow.set(idxName, name);
      outRow.set(idxAge, Integer.valueOf(age));
      outRow.set(idxSalary, Integer.valueOf(salary));
      outRow.set(idxBonusPct, Double.valueOf(bonusPct));
      output.collect(new BytesWritable(name.getBytes()), outRow);
    } catch (ExecException e) {
      // should never happen
    }
  }
 
  public void close() throws IOException { /* no-op */  } 
}
{code}
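
To make the point about the default constructor concrete, here is a minimal, self-contained sketch of the same pattern without any Hadoop dependency. Everything in it is hypothetical: a plain Map stands in for JobConf, and SketchOutputFormat, SCHEMA_KEY and newConfiguredInstance are illustrative names, not part of Hadoop, Pig or BasicTableOutputFormat. It only shows that an object created reflectively through its no-argument constructor can recover its state from the configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Map<String, String> stands in for JobConf.
class ConfPassingSketch {

    // Assumed configuration key; not a real Hadoop or Pig property name.
    static final String SCHEMA_KEY = "sketch.output.schema";

    // Hypothetical output format with nothing but a default constructor,
    // so all of its state has to travel through the configuration.
    public static class SketchOutputFormat {
        private String[] columns = new String[0];

        public SketchOutputFormat() { }

        // Client side: store the schema string in the configuration.
        public static void setSchema(Map<String, String> conf, String schema) {
            conf.put(SCHEMA_KEY, schema);
        }

        // Task side: read the schema back and split it into column names.
        public static String[] getSchema(Map<String, String> conf) {
            String raw = conf.get(SCHEMA_KEY);
            if (raw == null || raw.trim().isEmpty()) {
                return new String[0];
            }
            return raw.trim().split("\\s*,\\s*");
        }

        // Called after construction to load state from the configuration.
        public void configure(Map<String, String> conf) {
            columns = getSchema(conf);
        }

        // Returns the position of the named column, or -1 if it is absent.
        public int getColumnIndex(String name) {
            for (int i = 0; i < columns.length; i++) {
                if (columns[i].equals(name)) {
                    return i;
                }
            }
            return -1;
        }
    }

    // Mimics what a framework does on the task side: instantiate the class
    // through its default constructor, then hand it the configuration.
    public static SketchOutputFormat newConfiguredInstance(
            Class<? extends SketchOutputFormat> cls, Map<String, String> conf)
            throws ReflectiveOperationException {
        SketchOutputFormat format = cls.getDeclaredConstructor().newInstance();
        format.configure(conf);
        return format;
    }

    public static void main(String[] args) throws Exception {
        // Client side.
        Map<String, String> conf = new HashMap<>();
        SketchOutputFormat.setSchema(conf, "Name, Age, Salary, BonusPct");

        // Task side: only the class and the configuration are available.
        SketchOutputFormat format =
            newConfiguredInstance(SketchOutputFormat.class, conf);
        System.out.println("Salary index: " + format.getColumnIndex("Salary"));
    }
}
```

The only things that cross from the client side to the task side here are the class itself and the configuration, which is why the static setSchema/getSchema pair is needed at all.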

> Need to give user control of OutputFormat
> -----------------------------------------
>
>                 Key: PIG-652
>                 URL: https://issues.apache.org/jira/browse/PIG-652
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>
> Pig currently allows users some control over InputFormat via the Slicer and 
> Slice interfaces.  It does not allow any control over OutputFormat and 
> RecordWriter interfaces.  It just allows the user to implement a storage 
> function that controls how the data is serialized.  For hadoop tables, we 
> will need to allow custom OutputFormats that prepare output information and 
> objects needed by a Table store function.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
