org.apache.avro.mapred.AvroOutputFormat automatically copies metadata
from the job configuration to the output files.  In particular, it
copies values for keys starting with AvroJob.TEXT_PREFIX as strings,
and for those starting with AvroJob.BINARY_PREFIX it decodes a binary
metadata value from the configuration value.
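
As a rough illustration of that copying step, here is a stand-alone sketch in plain Java. It does not call Avro or Hadoop. The class MetaCopySketch and its methods are hypothetical, the two prefix strings are my assumption of the values behind AvroJob.TEXT_PREFIX and AvroJob.BINARY_PREFIX, and the URL-encoding of binary values as ISO-8859-1 is likewise an assumption about the encoding rather than something confirmed above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of the prefix convention; not Avro's code.
class MetaCopySketch {
    // Assumed values of AvroJob.TEXT_PREFIX and AvroJob.BINARY_PREFIX.
    static final String TEXT_PREFIX = "avro.meta.text.";
    static final String BINARY_PREFIX = "avro.meta.binary.";

    // Assumed encoding for binary values: bytes -> ISO-8859-1 string -> URL-encoded.
    static String encodeBinary(byte[] value) {
        try {
            String latin1 = new String(value, StandardCharsets.ISO_8859_1);
            return URLEncoder.encode(latin1, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always available
        }
    }

    // Collects the metadata an output format would write to the file header:
    // the prefix is stripped from the key, text values are kept as strings
    // (stored here as UTF-8 bytes), binary values are URL-decoded to bytes.
    static Map<String, byte[]> collectMeta(Map<String, String> conf) {
        Map<String, byte[]> meta = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            String key = e.getKey();
            try {
                if (key.startsWith(TEXT_PREFIX)) {
                    meta.put(key.substring(TEXT_PREFIX.length()),
                             e.getValue().getBytes(StandardCharsets.UTF_8));
                } else if (key.startsWith(BINARY_PREFIX)) {
                    String decoded = URLDecoder.decode(e.getValue(), "ISO-8859-1");
                    meta.put(key.substring(BINARY_PREFIX.length()),
                             decoded.getBytes(StandardCharsets.ISO_8859_1));
                }
            } catch (UnsupportedEncodingException ex) {
                throw new IllegalStateException(ex);
            }
        }
        return meta;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(TEXT_PREFIX + "job.version", "1.2.3");
        conf.put(BINARY_PREFIX + "sig", encodeBinary(new byte[] {0, 127, (byte) 0xff}));
        conf.put("mapred.job.name", "ignored"); // no metadata prefix, so not copied
        for (Map.Entry<String, byte[]> e : collectMeta(conf).entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue().length + " bytes)");
        }
    }
}
```

In a real job the equivalent would be to set a key such as AvroJob.TEXT_PREFIX + "job.version" on the JobConf before submitting, and let AvroOutputFormat do the copying.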

It does not appear that the output formats in
org.apache.avro.mapreduce have yet implemented this feature.  Please
file an issue in Jira if you think it would be useful, ideally
accompanied by a patch that implements it.

Thanks,

Doug

On Fri, May 23, 2014 at 10:16 AM, James Campbell
<ja...@breachintelligence.com> wrote:
> Is it possible to configure an AvroKeyOutputFormat to include specific
> metadata in each of the output files generated by the job?  I’m thinking of
> metadata about the time/version of the job that produced the entire file,
> which would not necessarily need to be stored in each record.
>
> 2014-05-17 7:20 GMT+08:00 Doug Cutting <cutt...@apache.org>:
>
>> This incompatibly alters the Avro file format.  Could you perhaps
>> instead add this into the Avro file's metadata?
>>
>> Doug
>>
>> On Thu, May 15, 2014 at 5:44 AM, Fengyun RAO <raofeng...@gmail.com> wrote:
>> > I have a cache file using Avro serialization, and I want to add a magic byte
>> > indicating cache version at the beginning of the file.
>> > I find it's easy to serialize, but difficult to deserialize in C#.
>> > First I open a filestream, read my magic byte, and then pass the stream to
>> > the DataFileReader:
>> >
>> > var reader = DataFileReader<Dictionary<string, MyType>>.OpenReader(stream,
>> > CACHE_SCHEMA)
>> >
>> > but it throws an AvroRuntimeException("Not an Avro data file")
>> >
>> > I look into the OpenReader() method:
>> >
>> >   // verify magic header
>> >   byte[] magic = new byte[DataFileConstants.Magic.Length];
>> >   inStream.Seek(0, SeekOrigin.Begin);
>> >
>> > It will always seek back to the beginning of the FileStream (which includes
>> > my own byte), and thus throws an Exception.
>> >
>> > However, in java version, I could use DataFileStream which wouldn't seek
>> > back and it works.
>> >
>> > Is there a way to make it work in C# version? I also wonder why there isn't
>> > an equivalent "DataFileStream" class in C#.
>>
>
