Re: Handling bad records

2012-02-28 Thread madhu phatak
Hi Mohit ,
 A and B refer to two different output files (the multi-part name). The file
names will be seq-A* and seq-B*. It's similar to the r in part-r-00000.


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Handling bad records

2012-02-28 Thread Subir S
Can MultipleOutputs be used with Hadoop Streaming?




Re: Handling bad records

2012-02-28 Thread Harsh J
Subir,

No, not unless you use a specialized streaming library (for example,
pydoop or dumbo for Python).



-- 
Harsh J


Handling bad records

2012-02-27 Thread Mohit Anchlia
What's the best way to write records to a different file? I am doing XML
processing, and during processing I might come across invalid XML.
Currently I catch the error in a try/catch block and write it to log4j, but
I think it would be better to write such records to an output file that
contains only the errors.


Re: Handling bad records

2012-02-27 Thread Harsh J
Mohit,

Use the MultipleOutputs API:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
to have a named output of bad records. There is an example of use
detailed on the link.
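
As a rough illustration of the routing idea only, here is a minimal sketch
in plain Java with no Hadoop dependency. It does not use MultipleOutputs;
the class name BadRecordRouter and its methods are hypothetical, and the two
PrintWriter arguments stand in for the main output and a "bad records"
named output. A record counts as bad here if it fails a well-formedness
parse, which is an assumption about what "invalid xml" means in this job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: sends well-formed XML records to one writer
// and everything else to a separate "bad records" writer.
public class BadRecordRouter {

    /** Returns true if the record parses as well-formed XML. */
    public static boolean isWellFormed(String record) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler so parse errors are not
            // printed to stderr; fatal errors still raise SAXException.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                    record.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    /** Routes each record to goodOut or badOut; returns the bad-record count. */
    public static int route(List<String> records,
                            PrintWriter goodOut, PrintWriter badOut) {
        int bad = 0;
        for (String record : records) {
            if (isWellFormed(record)) {
                goodOut.println(record);
            } else {
                badOut.println(record);
                bad++;
            }
        }
        goodOut.flush();
        badOut.flush();
        return bad;
    }

    public static void main(String[] args) {
        StringWriter good = new StringWriter();
        StringWriter bad = new StringWriter();
        int badCount = route(Arrays.asList("<a><b/></a>", "<a><b></a>"),
                new PrintWriter(good), new PrintWriter(bad));
        System.out.println("bad records: " + badCount);
    }
}
```

In a real job the try/catch would stay where it is today; only the catch
branch changes, collecting the record to the named output instead of
logging it.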


-- 
Harsh J


Re: Handling bad records

2012-02-27 Thread Mohit Anchlia
Thanks, that's helpful. In that example, what are A and B referring to? Are
they the output file names?

mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));

