Thanks, Doug.  I just created AVRO-1519.  I have a patch, but I haven't 
attached it to the ticket yet.  I haven't contributed to Avro before, and 
several unrelated tests fail in my build environment even without the 
mapreduce changes, so I'll need to look into that, when I have time, before 
I submit the patch.  I also noticed that there don't appear to be any unit 
tests for the mapred implementation of the metadata storage, which I would 
have used as a template for the mapreduce implementation (but maybe I'm 
missing them?).

-----Original Message-----
From: Doug Cutting [mailto:[email protected]] 
Sent: Friday, May 23, 2014 1:39 PM
To: [email protected]
Subject: Re: Is it possible to write a magic byte in Avro file head?

org.apache.avro.mapred.AvroOutputFormat automatically copies metadata from the 
job configuration to the output files.  In particular, it copies values for 
keys starting with AvroJob.TEXT_PREFIX as strings, and for those starting with 
AvroJob.BINARY_PREFIX it decodes a binary metadata value from the configuration 
value.
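
For illustration, the prefix-based copying described above can be sketched 
roughly as follows.  This is a standalone sketch, not Avro's code: the class 
MetadataPrefixSketch and the helper textMetadata are hypothetical, a plain 
Map stands in for the job configuration, and the prefix value is assumed to 
mirror AvroJob.TEXT_PREFIX (check it against your Avro version).  Binary 
values are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MetadataPrefixSketch {
    // Assumption: mirrors AvroJob.TEXT_PREFIX; verify against your Avro version.
    static final String TEXT_PREFIX = "avro.meta.text.";

    // Hypothetical helper: selects the keys that start with the prefix and
    // strips the prefix, giving the metadata key/value pairs to write.
    static Map<String, String> textMetadata(Map<String, String> conf) {
        Map<String, String> meta = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(TEXT_PREFIX)) {
                meta.put(e.getKey().substring(TEXT_PREFIX.length()), e.getValue());
            }
        }
        return meta;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(TEXT_PREFIX + "job.version", "1.2.3");
        conf.put("mapred.job.name", "example"); // no prefix, so not copied
        System.out.println(textMetadata(conf)); // prints {job.version=1.2.3}
    }
}
```

In a real mapred job the key would presumably be set on the JobConf instead, 
for example job.set(AvroJob.TEXT_PREFIX + "job.version", "1.2.3"), and the 
output format would do the copying.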

It does not appear that the output formats in org.apache.avro.mapreduce have 
yet implemented this feature.  Please file an issue in Jira if you think it 
would be useful, ideally accompanied by a patch that implements it.

Thanks,

Doug

On Fri, May 23, 2014 at 10:16 AM, James Campbell <[email protected]> 
wrote:
> Is it possible to configure an AvroKeyOutputFormat to include specific 
> metadata in each of the output files generated by the job?  I’m 
> thinking of metadata about the time/version of the job that produced 
> the entire file, which would not necessarily need to be stored in each record.
>
> 2014-05-17 7:20 GMT+08:00 Doug Cutting <[email protected]>:
>
>> This incompatibly alters the Avro file format.  Could you perhaps
>> instead add this into the Avro file's metadata?
>>
>> Doug
>>
>> On Thu, May 15, 2014 at 5:44 AM, Fengyun RAO <[email protected]> wrote:
>> > I have a cache file using Avro serialization, and I want to add a magic
>> > byte indicating cache version at the beginning of the file.
>> > I find it's easy to serialize, but difficult to deserialize in C#.
>> > First I open a filestream, read my magic byte, and then pass the stream
>> > to the DataFileReader:
>> >
>> > var reader = DataFileReader<Dictionary<string, MyType>>.OpenReader(stream,
>> >     CACHE_SCHEMA)
>> >
>> > but it throws an AvroRuntimeException("Not an Avro data file")
>> >
>> > I look into the OpenReader() method:
>> >
>> >   // verify magic header
>> >   byte[] magic = new byte[DataFileConstants.Magic.Length];
>> >   inStream.Seek(0, SeekOrigin.Begin);
>> >
>> > It will always seek back to the beginning of the FileStream (which
>> > includes my own byte), and thus throws an Exception.
>> >
>> > However, in java version, I could use DataFileStream which wouldn't seek
>> > back and it works.
>> >
>> > Is there a way to make it work in C# version? I also wonder why there
>> > isn't an equivalent "DataFileStream" class in C#.