The magic number check is failing, so it looks like the top of the file has
some junk in it. This is the check in DataFileStream that throws:
if (!Arrays.equals(DataFileConstants.MAGIC, magic))
throw new IOException("Not a data file.");
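
For what it is worth, this is the quick local check I would run to see which
bytes the reader is actually comparing. It is only a sketch: the path
"part-r-00000" and the class name MagicCheck are placeholders.
--------------
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import org.apache.avro.file.DataFileConstants;

public class MagicCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder path for the reduce output file.
    InputStream in = new FileInputStream("part-r-00000");
    // Read exactly as many bytes as the Avro container magic.
    byte[] magic = new byte[DataFileConstants.MAGIC.length];
    int n = in.read(magic);
    in.close();
    System.out.println("first bytes:        " + Arrays.toString(magic));
    System.out.println("expected magic:     "
        + Arrays.toString(DataFileConstants.MAGIC));
    System.out.println("matches Avro magic: "
        + (n == magic.length && Arrays.equals(DataFileConstants.MAGIC, magic)));
  }
}
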
I checked the input file (verified by reading it), which has the same
schema. It starts with:

Obj^A^B^Vavro.schema<E0>^D
Whereas the reduce output file has a 0<tab> before the
Obj^A^B^Vavro.schema<E0>^D:
0 Obj^A^B^Vavro.schema<E0>^D
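
To see exactly how many junk bytes precede the container, here is another
small sketch that scans the head of the file for the Avro magic (same
caveats: the path and class name are placeholders). With the 0<tab> prefix I
would expect it to report offset 2.
--------------
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.avro.file.DataFileConstants;

public class FindMagicOffset {
  public static void main(String[] args) throws Exception {
    // Read the first 64 bytes of the reduce output (placeholder path).
    byte[] head = new byte[64];
    InputStream in = new FileInputStream("part-r-00000");
    int n = in.read(head);
    in.close();

    // Look for the 4-byte Avro container magic ("Obj" followed by 0x01).
    byte[] magic = DataFileConstants.MAGIC;
    for (int off = 0; n > 0 && off + magic.length <= n; off++) {
      boolean match = true;
      for (int i = 0; i < magic.length; i++) {
        if (head[off + i] != magic[i]) { match = false; break; }
      }
      if (match) {
        System.out.println("Avro magic found at byte offset " + off);
        return;
      }
    }
    System.out.println("Avro magic not found in the first " + n + " bytes");
  }
}
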
This is not what I expected. Maybe my previous email was unclear.
Thanks,
Nikhil
On 6/8/12 1:35 PM, "Shirahatti, Nikhil" <[email protected]> wrote:
>The reason is that when I try to read the file using GenericDatumReader, I
>get the error: not a data file.
>
>
>Code snippet:
>--------------
>DatumReader<GenericData.Record> reader =
>    new GenericDatumReader<GenericData.Record>(AVRO_SCHEMA);
>
>String MUXDEMUX_FILE = outpath.concat("part-r-00000");
>InputStream in =
>    new BufferedInputStream(new FileInputStream(MUXDEMUX_FILE));
>DataFileStream<GenericData.Record> records =
>    new DataFileStream<GenericData.Record>(in, reader);
>for (GenericData.Record r : records) {
>  System.out.println(r.toString());
>}
>
>
>
>Nikhil
>
>On 6/8/12 12:17 PM, "Doug Cutting" <[email protected]> wrote:
>
>>On Fri, Jun 8, 2012 at 11:49 AM, snikhil0 <[email protected]> wrote:
>>> My expectation is that I can use the same input schema to read the
>>> output file. But alas this is not working.
>>> In the part-r-00000 I have a 0<tab>Obj<Avroschema>....datums......
>>> Why is this?
>>
>>That looks approximately like an Avro data file. How is it not what you
>>expect?
>>
>>> Also, how can I rename the reduce output file to something other than
>>> part-r-0000*?
>>
>>That's the standard name for Hadoop mapreduce output files. You could
>>override it in the OutputFormat, but most folks do not. The name of
>>the directory these are in is normally used to identify the result
>>set. The files within the directory are just fragments of that result
>>set.
>>
>>Doug
>
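
For reference, a minimal sketch of what Doug describes (overriding the name
inside the OutputFormat), assuming the new org.apache.hadoop.mapreduce API.
The class name MyRenamedOutputFormat, the base name "myresult", and the
NullWritable/Text types are made up for illustration; the same override
applies to whichever FileOutputFormat subclass the job already uses.
--------------
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyRenamedOutputFormat extends TextOutputFormat<NullWritable, Text> {
  @Override
  public Path getDefaultWorkFile(TaskAttemptContext context, String extension)
      throws IOException {
    FileOutputCommitter committer =
        (FileOutputCommitter) getOutputCommitter(context);
    // getUniqueFile() appends the "-r-00000" style task suffix, so the
    // reducer output becomes "myresult-r-00000" instead of "part-r-00000".
    return new Path(committer.getWorkPath(),
        getUniqueFile(context, "myresult", extension));
  }
}

The job would then pick it up with
job.setOutputFormatClass(MyRenamedOutputFormat.class); as Doug says, though,
most folks just leave the part-r-* names alone and identify the result set by
the directory name.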