The magic number check is failing, so it looks like the top of the file has
some junk in it. This is the check in DataFileStream that throws:
if (!Arrays.equals(DataFileConstants.MAGIC, magic))
throw new IOException("Not a data file.");
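
For what it is worth, this is the quick local check I would run to see which
bytes the reader is actually comparing. It is only a sketch: the path
"part-r-00000" and the class name MagicCheck are placeholders.
--------------
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import org.apache.avro.file.DataFileConstants;

public class MagicCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder path for the reduce output file.
    InputStream in = new FileInputStream("part-r-00000");
    // Read exactly as many bytes as the Avro container magic.
    byte[] magic = new byte[DataFileConstants.MAGIC.length];
    int n = in.read(magic);
    in.close();
    System.out.println("first bytes:        " + Arrays.toString(magic));
    System.out.println("expected magic:     "
        + Arrays.toString(DataFileConstants.MAGIC));
    System.out.println("matches Avro magic: "
        + (n == magic.length && Arrays.equals(DataFileConstants.MAGIC, magic)));
  }
}
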
I checked the input file (verified by reading it), which has the same
schema. It starts with:

Obj^A^B^Vavro.schema<E0>^D
Whereas the reduce output file has a 0<tab> before the
Obj^A^B^Vavro.schema<E0>^D:
0 Obj^A^B^Vavro.schema<E0>^D
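
To see exactly how many junk bytes precede the container, here is another
small sketch that scans the head of the file for the Avro magic (same
caveats: the path and class name are placeholders). With the 0<tab> prefix I
would expect it to report offset 2.
--------------
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.avro.file.DataFileConstants;

public class FindMagicOffset {
  public static void main(String[] args) throws Exception {
    // Read the first 64 bytes of the reduce output (placeholder path).
    byte[] head = new byte[64];
    InputStream in = new FileInputStream("part-r-00000");
    int n = in.read(head);
    in.close();

    // Look for the 4-byte Avro container magic ("Obj" followed by 0x01).
    byte[] magic = DataFileConstants.MAGIC;
    for (int off = 0; n > 0 && off + magic.length <= n; off++) {
      boolean match = true;
      for (int i = 0; i < magic.length; i++) {
        if (head[off + i] != magic[i]) { match = false; break; }
      }
      if (match) {
        System.out.println("Avro magic found at byte offset " + off);
        return;
      }
    }
    System.out.println("Avro magic not found in the first " + n + " bytes");
  }
}
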
This is not what I expected. Maybe my previous email was unclear.
Thanks,
Nikhil
On 6/8/12 1:35 PM, "Shirahatti, Nikhil" <[email protected]> wrote:
>The reason is that when I try to read the file using GenericDatumReader, I
>get the error: not a data file.
>
>
>Code snippet:
>--------------
>DatumReader<GenericData.Record> reader =
>    new GenericDatumReader<GenericData.Record>(AVRO_SCHEMA);
>
>String MUXDEMUX_FILE = outpath.concat("part-r-00000");
>InputStream in =
>    new BufferedInputStream(new FileInputStream(MUXDEMUX_FILE));
>DataFileStream<GenericData.Record> records =
>    new DataFileStream<GenericData.Record>(in, reader);
>for (GenericData.Record r : records) {
>  System.out.println(r.toString());
>}
>
>
>
>Nikhil
>
>On 6/8/12 12:17 PM, "Doug Cutting" <[email protected]> wrote:
>
>>On Fri, Jun 8, 2012 at 11:49 AM, snikhil0 <[email protected]> wrote:
>>> My expectation is that I can use the same input schema to read the
>>> output file. But alas this is not working.
>>> In the part-r-00000 I have a 0<tab>Obj<Avroschema>....datums......
>>> Why is this?
>>
>>That looks approximately like an Avro data file. How is it not what you
>>expect?
>>
>>> Also, how can I rename the reduce output file to something other than
>>> part-r-0000*?
>>
>>That's the standard name for Hadoop mapreduce output files. You could
>>override it in the OutputFormat, but most folks do not. The name of
>>the directory these are in is normally used to identify the result
>>set. The files within the directory are just fragments of that result
>>set.
>>
>>Doug
>
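
For reference, a minimal sketch of what Doug describes (overriding the name
inside the OutputFormat), assuming the new org.apache.hadoop.mapreduce API.
The class name MyRenamedOutputFormat, the base name "myresult", and the
NullWritable/Text types are made up for illustration; the same override
applies to whichever FileOutputFormat subclass the job already uses.
--------------
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyRenamedOutputFormat extends TextOutputFormat<NullWritable, Text> {
  @Override
  public Path getDefaultWorkFile(TaskAttemptContext context, String extension)
      throws IOException {
    FileOutputCommitter committer =
        (FileOutputCommitter) getOutputCommitter(context);
    // getUniqueFile() appends the "-r-00000" style task suffix, so the
    // reducer output becomes "myresult-r-00000" instead of "part-r-00000".
    return new Path(committer.getWorkPath(),
        getUniqueFile(context, "myresult", extension));
  }
}

The job would then pick it up with
job.setOutputFormatClass(MyRenamedOutputFormat.class); as Doug says, though,
most folks just leave the part-r-* names alone and identify the result set by
the directory name.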