Re: Union in AvroMapper.map: Not in Union??

2013-04-09 Thread Martin Kleppmann
Hi Nir, I'm not an expert with the avro.mapred APIs, but as far as I know, AvroJob does not perform schema evolution, so the schema you provide to AvroJob.setInputSchema has to be exactly the schema with which the mappers' input files were encoded. So if your input isn't actually a union

Re: Picking up default value for a union?

2013-04-09 Thread Martin Kleppmann
With Avro, it is generally assumed that your reader is working with the exact same schema as the data was written with. If you want to change your schema, e.g. add a field to a record, you still need the exact same schema as was used for writing (the "writer's schema"), but you can also give the de
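The resolution rule Martin describes (decode with the writer's schema, then project onto the reader's schema, filling reader-only fields from their defaults) can be sketched in plain Python. This is an illustrative toy, not the Avro library; the field names and the `email` default are hypothetical examples.

```python
# Illustrative sketch of Avro schema resolution for an added field:
# values present in the written data win; fields only the reader's
# schema knows about are filled from their declared defaults.

def resolve_record(datum, writer_fields, reader_fields):
    """Project a decoded record onto the reader's schema.

    datum: the record as decoded with the writer's schema.
    writer_fields: set of field names present in the written data.
    reader_fields: dict mapping field name -> default value.
    """
    result = {}
    for name, default in reader_fields.items():
        if name in writer_fields:
            result[name] = datum[name]   # value came from the data
        else:
            result[name] = default       # reader-only field: use default
    return result

written = {"id": 1, "name": "nir"}
reader = {"id": None, "name": None, "email": "unknown@example.com"}
print(resolve_record(written, set(written), reader))
# {'id': 1, 'name': 'nir', 'email': 'unknown@example.com'}
```

Note that this only works because the reader still has the writer's schema available: without it, the decoder cannot tell which fields are in the byte stream in the first place.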

Re: Issue writing union in avro?

2013-04-09 Thread Jeremy Kahn
I will open a JIRA ticket to request a Python StrictJSONEncoder that produces these type-hints. Probably a StrictJSONDecoder needs to be there too -- at any rate, the StrictJSONDecoder would be nice so that Python could consume JSON-encoded output from Java et al. A StrictJSON{Decoder,Encoder} mig
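The type hints Jeremy refers to are part of Avro's JSON encoding as given in the spec: a union value is encoded as JSON `null` for the null branch, and as a single-key object naming the branch for anything else. A minimal pure-Python sketch of that rule (not the proposed StrictJSONEncoder itself):

```python
import json

# Sketch of the Avro spec's JSON encoding rule for unions:
# the null branch encodes as JSON null; any other branch is
# wrapped in a one-key object naming the branch type.

def encode_union(value, branch):
    """branch is the Avro type name of the branch `value` belongs to."""
    if branch == "null":
        return None
    return {branch: value}

print(json.dumps(encode_union(None, "null")))    # null
print(json.dumps(encode_union("hi", "string")))  # {"string": "hi"}
```

The wrapper object is what lets a decoder in Java (or any other language) disambiguate which union branch a JSON value belongs to.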

Parsing a Pair's value - inherited namespace?

2013-04-09 Thread nir_zamir
Hi, I noticed that after calling: /AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(Schema.create(Type.INT), schema));/ (schema is parsed from an avro file and has no namespace), when the M/R job is run there's a call to /AvroJob.getJobOutputSchema/, which calls /Schema.parse/ - which parses

Re: Avro support for streaming in Python

2013-04-09 Thread Jeremy Karn
Thanks. It looks like the RPC code uses HTTP to transport the messages back and forth, and from the code I only see an HTTPTransceiver. Is there support for sending messages directly over standard input/output? Nice name. :) On Tue, Apr 9, 2013 at 10:41 AM, Jeremy Kahn wrote: > It's only pyth

Re: Avro support for streaming in Python

2013-04-09 Thread Jeremy Kahn
It's only python data file logic that doesn't support reading from a stream. Look at the python part of the avro RPC quickstart project (it's available in github, and I point you there only because it's nicely isolated there) - the code is all in the Avro trunk now, I think. -- Jeremy Kahn (oh, if

Avro support for streaming in Python

2013-04-09 Thread Jeremy Karn
I'm looking to use Avro to send data back and forth between a Java process and a Python process. I was planning on just streaming the data across standard input but it looks like Python doesn't support reading from a stream (https://issues.apache.org/jira/browse/AVRO-959). Is there a way around th
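One workaround discussed for non-seekable streams like stdin is to frame each Avro-encoded record yourself, so a forward-only reader never needs `seek()`. The length-prefix framing below is a hypothetical convention for illustration, not part of the Avro data file format:

```python
import io
import struct

# Frame each payload with a 4-byte big-endian length prefix so a
# forward-only stream (e.g. stdin) can be consumed record by record.
# The framing is an assumed convention, not an Avro format.

def write_framed(stream, payload: bytes):
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_framed(stream):
    header = stream.read(4)
    while len(header) == 4:
        (length,) = struct.unpack(">I", header)
        yield stream.read(length)
        header = stream.read(4)

buf = io.BytesIO()
for msg in (b"record-1", b"record-2"):
    write_framed(buf, msg)
buf.seek(0)
print(list(read_framed(buf)))  # [b'record-1', b'record-2']
```

In a real pipeline the payloads would be Avro binary-encoded datums (each side already knowing the schema), written to `sys.stdout.buffer` on one end and read from `sys.stdin.buffer` on the other.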

Failed Serializing Bytes

2013-04-09 Thread Milind Vaidya
//Data to be written unsigned char buffer_data[] = {0x12, 0x34, 0x56, 0x78,0x12,0x34,0x56,0x78,0x12, 0x34, 0x56, 0x78,0x12, 0x34, 0x56, 0x78,0x12,0x34,0x56,0x78,0x12, 0x34, 0x56, 0x78,0x12, 0x34, 0x56, 0x78,0x12,0x34,0x56,0x78,0x12, 0x34, 0x56, 0x78}; //Serialize by
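For reference when debugging this kind of failure: per the Avro spec, a `bytes` value is encoded on the wire as a zigzag/varint-encoded long giving the length, followed by the raw bytes. A pure-Python sketch of that encoding (illustrative, not the avro-c API):

```python
# Avro binary encoding of `bytes`: zigzag/varint length, then raw bytes.

def zigzag_varint(n: int) -> bytes:
    z = (n << 1) ^ (n >> 63)          # zigzag-encode a signed 64-bit long
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_bytes(data: bytes) -> bytes:
    return zigzag_varint(len(data)) + data

# 36-byte buffer, as in the post above
buffer_data = bytes([0x12, 0x34, 0x56, 0x78] * 9)
encoded = encode_bytes(buffer_data)
print(encoded[0])  # 72, i.e. zigzag(36), followed by the 36 payload bytes
```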

Re: avro_value_t or avro_datum_t

2013-04-09 Thread Milind Vaidya
Cool. Thanks again Doug. Worked like a charm. On Mon, Apr 8, 2013 at 8:32 AM, Douglas Creager wrote: > > //Assume path variable to be having proper value and proper exception > > handling in place > > > > PART A: > > avro_value_t data; > > avro_file_reader_t fileReader; > > > > result = avro

Re: Picking up default value for a union?

2013-04-09 Thread Jonathan Coveney
Stepping through the code, it looks like the code only uses defaults for writing, not for reading. I.e., at read time it assumes that the defaults were already filled in. It seems like if the reader evolved the schema to include new fields, it would be desirable for the defaults to get filled in if no

Re: Picking up default value for a union?

2013-04-09 Thread Jonathan Coveney
Please note: {"name":"hey", "type":"record", "fields":[{"name":"a","type":["null","string"],"default":"null"}]} also doesn't work 2013/4/9 Jonathan Coveney > I have the following schema: {"name":"hey", "type":"record", > "fields":[{"name":"a","type":["null","string"],"default":null}]} > > I am

Picking up default value for a union?

2013-04-09 Thread Jonathan Coveney
I have the following schema: {"name":"hey", "type":"record", "fields":[{"name":"a","type":["null","string"],"default":null}]} I am trying to deserialize the following against this schema using Java and the GenericDatumReader: {} I get the following error: Caused by: org.apache.avro.AvroTypeExcept

Re: Enabling compression

2013-04-09 Thread Harsh J
Hi Vinod, In Avro, compression is provided only at the file container level (i.e. block compression). For compressing a simple byte array, you can rely on Hadoop's compression classes such as GzipCodec [1] to compress the byte stream directly (wrapping via a compressed output stream [2] got
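The stream-wrapping idea Harsh describes (a codec wrapping an output stream, as Hadoop's GzipCodec does in Java) looks like this in a pure-Python analogue, compressing a standalone byte array outside any Avro container:

```python
import gzip
import io

# Avro only compresses at the data-file (container) level; to compress a
# standalone byte array you wrap the stream in a codec yourself. This is
# a Python analogue of wrapping a stream with a gzip codec.

raw = b"some avro-encoded bytes " * 100

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as z:
    z.write(raw)                      # bytes are compressed as written
compressed = buf.getvalue()

restored = gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb").read()
print(restored == raw, len(compressed) < len(raw))  # True True
```

The same shape applies on the Java side: wrap the raw `OutputStream` in the codec's compressing stream, write the byte array through it, and close the wrapper to flush the codec's trailer.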