----- Forwarded Message -----
>From: Andrew Kenworthy <[email protected]>
>To: Gaurav Nanda <[email protected]>
>Sent: Thursday, December 8, 2011 3:47 PM
>Subject: Re: Collecting union-ed Records in AvroReducer
>
>
>Hallo Gaurav,
>
>
>Thank you for your reply. My problem is that the writer is a GenericDatumWriter
>invoked via Hadoop, i.e. in my code I only have direct access to an
>AvroCollector object, which - several layers down - invokes a
>GenericDatumWriter. I don't really want to re-implement a lot of the code that
>the avro-mapred package already provides for me.
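>
>For context, the reduce side looks roughly like this (heavily simplified; the
>class name and key/value types are placeholders, and the usual
>org.apache.avro.mapred and Hadoop imports are assumed):
>
>    public static class MyReducer extends AvroReducer<Utf8, GenericRecord, GenericRecord> {
>        @Override
>        public void reduce(Utf8 key, Iterable<GenericRecord> values,
>                           AvroCollector<GenericRecord> collector, Reporter reporter)
>                throws IOException {
>            for (GenericRecord value : values) {
>                // the collector, several layers down, hands the datum to a GenericDatumWriter
>                collector.collect(value);
>            }
>        }
>    }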
>
>
>But I think I can get around this by defining my output schema as one with a
>nested record structure, embedding my type B record within type A. That way I
>am emitting a single record, albeit one holding a composition of my two
>entities.
>
>
>Andrew
>
>
>
>>________________________________
>> From: Gaurav Nanda <[email protected]>
>>To: [email protected]; Andrew Kenworthy <[email protected]>
>>Sent: Thursday, December 8, 2011 3:32 PM
>>Subject: Re: Collecting union-ed Records in AvroReducer
>>
>>You don't need to construct a record object. You can just write your
>>RecordA/RecordB objects directly.
>>
>>Sample Writer:
>>            DatumWriter<Object> datum = new GenericDatumWriter<Object>(schema);
>> DataFileWriter<Object> writer = new DataFileWriter<Object>(datum);
>>
>> FileOutputStream out = new FileOutputStream("h:\\TestFile.avro");
>>
>> writer.create(schema, out);
>> writer.append(1050324); //You can write your recordA/recordB here.
>>
>> writer.close();
>>
>>Sample Reader:
>>
>>            File file = new File("h:\\TestFile.avro");
>>            GenericDatumReader<Object> datum = new GenericDatumReader<Object>();
>>            DataFileReader<Object> reader = new DataFileReader<Object>(file, datum);
>>
>> while (reader.hasNext()) {
>> System.out.println(reader.next());
>> }
>> reader.close();
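>>
>>For the union schema in your question the same pattern should work (a rough,
>>untested sketch; recordASchema/recordBSchema stand for your actual schemas and
>>the usual org.apache.avro imports are assumed): create the writer with the
>>union schema and append each record built against its own branch schema, and
>>GenericDatumWriter picks the matching branch for you.
>>
>>    Schema unionSchema = Schema.createUnion(
>>        Arrays.asList(recordASchema, recordBSchema));
>>
>>    DataFileWriter<Object> unionWriter = new DataFileWriter<Object>(
>>        new GenericDatumWriter<Object>(unionSchema));
>>    unionWriter.create(unionSchema, new FileOutputStream("h:\\UnionFile.avro"));
>>
>>    GenericRecord a = new GenericData.Record(recordASchema);
>>    // ... populate a's fields ...
>>    unionWriter.append(a);  // written as the A branch of the union
>>
>>    GenericRecord b = new GenericData.Record(recordBSchema);
>>    // ... populate b's fields ...
>>    unionWriter.append(b);  // written as the B branch of the union
>>
>>    unionWriter.close();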
>>
>>Hope this helps.
>>
>>Thanks,
>>Gaurav Nanda
>>
>>On Thu, Dec 8, 2011 at 5:40 PM, Andrew Kenworthy <[email protected]>
>>wrote:
>>> Hallo,
>>>
>>> is it possible to write/collect a union-ed record from an avro reducer?
>>>
>>> I have a reduce class (extending AvroReducer), and the output schema is a
>>> union schema of record type A and record type B. In the reduce logic I want
>>> to combine instances of A and B in the same datum, passing it to my
>>> AvroCollector. My code looks a bit like this:
>>>
>>> Record unionRecord = new GenericData.Record(myUnionSchema); // not legal!
>>> unionRecord.put("type A", recordA);
>>> unionRecord.put("type B", recordB);
>>> collector.collect(unionRecord);
>>>
>>> but the GenericData.Record constructor expects a record schema. How can I write
>>> both records such that they appear in the same output datum?
>>>
>>> Andrew
>>
>>
>>
>
>