Thank you, Doug!
That was all I needed to make it work.
Just for the record, this is the code:
// Writing: wrap the pre-encoded record bytes in an Avro container file.
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(schemaString);
File outFile = new File("generated.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, outFile);
dataFileWriter.appendEncoded(body);  // body is the ByteBuffer from the Flume event
dataFileWriter.close();
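To sanity-check the output, the container file can be read back with the generic reader (a minimal sketch, using the same "generated.avro" file and classes as above):

// Read the generated container file back and print each record.
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> fileReader =
    new DataFileReader<GenericRecord>(new File("generated.avro"), reader);
GenericRecord rec = null;
while (fileReader.hasNext()) {
    rec = fileReader.next(rec);
    System.out.println(rec);
}
fileReader.close();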
Thanks again!
On Feb 7, 2014, at 2:29 PM, Doug Cutting <[email protected]> wrote:
> You might use DataFileWriter#appendEncoded:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendEncoded(java.nio.ByteBuffer)
>
> If the body has just a single instance of the record then you'd call this once.
> If you have multiple instances then you might change the body to have the
> schema {"type":"array", "items":"bytes"}.
>
> Doug
>
>
> On Fri, Feb 7, 2014 at 12:06 PM, Daniel Rodriguez <[email protected]>
> wrote:
> Hi all,
>
> Some context (I am not an expert Java programmer, and am just starting with
> Avro/Flume):
>
> I need to transfer Avro files from different servers to HDFS, and I am trying
> to use Flume to do it.
> I have a Flume spooldir source (reading the Avro files) feeding an Avro sink
> on the servers, and an Avro source feeding an HDFS sink on the Hadoop side. Like this:
>
> servers | hadoop
> spooldir src -> avro sink --------> avro src -> hdfs
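>
> For reference, the servers side of that pipeline implies an agent configuration
> along these lines (a sketch; agent, channel, host, path and port names are made up):
>
> agent1.sources = spool
> agent1.channels = ch1
> agent1.sinks = fwd
> agent1.sources.spool.type = spooldir
> agent1.sources.spool.spoolDir = /data/avro-in
> agent1.sources.spool.deserializer = AVRO
> agent1.sources.spool.deserializer.schemaType = LITERAL
> agent1.sources.spool.channels = ch1
> agent1.channels.ch1.type = memory
> agent1.sinks.fwd.type = avro
> agent1.sinks.fwd.hostname = hadoop01
> agent1.sinks.fwd.port = 4141
> agent1.sinks.fwd.channel = ch1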
>
> When the Flume spooldir source deserializes the Avro files, it creates a Flume
> event with two fields: 1) the header contains the schema; 2) the body contains
> the binary Avro record data, not including the schema or the rest of the
> container-file elements. See the Flume docs:
> http://flume.apache.org/FlumeUserGuide.html#avro
>
> So the Avro sink creates an Avro file like this:
>
> {"headers": {"flume.avro.schema.literal":
> "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
> "body": {"bytes": "{BYTES}"}}
>
> So now I am trying to write a serializer, since Flume only includes a
> FlumeEvent serializer that creates Avro files like the one above, not the
> original Avro files from the servers.
>
> I am almost there: I get the schema from the header field and the bytes from
> the body field.
> But now I need to write the Avro file based on those bytes, not on field
> values; I cannot do r.put("field", "value") since I don't have the values,
> just the bytes.
>
> This is the code:
>
> File file = TESTFILE;
>
> // Open the Flume-written container file of Event records.
> DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
> DataFileReader<GenericRecord> dataFileReader =
>     new DataFileReader<GenericRecord>(file, datumReader);
> GenericRecord event = null;
> while (dataFileReader.hasNext()) {
>     event = dataFileReader.next(event);
>
>     // The record schema travels as a literal in the event headers.
>     Map<Utf8, Utf8> headers = (Map<Utf8, Utf8>) event.get("headers");
>     Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
>     String schema = headers.get(schemaHeaderKey).toString();
>
>     // The body is one binary-encoded Avro record, without the schema.
>     ByteBuffer body = (ByteBuffer) event.get("body");
>
>     // Writing...
>     Schema.Parser parser = new Schema.Parser();
>     Schema recordSchema = parser.parse(schema);
>     GenericRecord r = new GenericData.Record(recordSchema);
>
>     // NOT SURE WHAT COMES NEXT
> }
>
> Is it possible to actually create the Avro files from the value bytes?
>
> I appreciate any help.
>
> Thanks,
> Daniel
>