Re: Create Avro from bytes, not by fields

Doug Cutting Fri, 07 Feb 2014 12:38:54 -0800

You might use DataFileWriter#appendEncoded:

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendEncoded(java.nio.ByteBuffer)


If the body has just a single instance of the record then you'd call this
once.  If you have multiple instances then you might change the body to
have the schema {"type":"array", "items":"bytes"}.

Doug
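
The appendEncoded approach suggested above can be sketched roughly as
follows. This is a minimal sketch, not a tested Flume serializer: the class
name EncodedAvroWriter, the SCHEMA_JSON constant, and the encode helper are
hypothetical names introduced for illustration, and encode only stands in
for the bytes that would arrive in the Flume event body. It assumes each
buffer holds exactly one binary-encoded datum of the given schema and is a
heap (array-backed) buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class EncodedAvroWriter {

    // Same schema as the one carried in the flume.avro.schema.literal header.
    public static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},"
        + "{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}";

    // Writes already-encoded records into an Avro container file without
    // decoding them. Assumption: one encoded datum per buffer.
    public static void write(String schemaJson, List<ByteBuffer> bodies, File out)
            throws IOException {
        Schema schema = new Schema.Parser().parse(schemaJson);
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, out);      // writes the container header with the schema
            for (ByteBuffer body : bodies) {
                writer.appendEncoded(body);  // copies the raw bytes in as one datum
            }
        }
    }

    // Hypothetical helper: produces raw Avro binary for one record, standing
    // in for the body bytes of a Flume event.
    public static ByteBuffer encode(Schema schema, GenericRecord record) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bytes, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        return ByteBuffer.wrap(bytes.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Alyssa");
        user.put("favorite_number", 256);    // favorite_color is left null

        File out = File.createTempFile("users", ".avro");
        write(SCHEMA_JSON, Arrays.asList(encode(schema, user)), out);

        // Read the file back to check that it is a regular Avro container file.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(out, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord r : reader) {
                System.out.println(r);
            }
        }
    }
}
```

If the body were changed to an array of bytes, as suggested above, each
array element would be passed to write as one entry of the list.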


On Fri, Feb 7, 2014 at 12:06 PM, Daniel Rodriguez <[email protected]> wrote:

> Hi all,
>
> Some context (not an expert Java programmer, and just starting with
> AVRO/Flume):
>
> I need to transfer Avro files from different servers to HDFS, and I am
> trying to use Flume to do it.
> I have a Flume spooldir source (reading the Avro files) with an Avro sink,
> and an Avro source with an HDFS sink. Like this:
>
>            servers                      |                  hadoop
> spooldir src -> avro sink     -------->       avro src -> hdfs
>
> When the Flume spooldir source deserializes the Avro files, it creates a
> Flume event with two fields: 1) the header contains the schema; 2) the body
> contains the binary Avro record data, not including the schema or the rest
> of the container file elements. See the Flume docs:
> http://flume.apache.org/FlumeUserGuide.html#avro
>
> So the avro sink creates an avro file like this:
>
> {"headers": {"flume.avro.schema.literal":
> "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
> "body": {"bytes": "{BYTES}"}}
>
> So now I am trying to write a serializer, since Flume only includes a
> FlumeEvent serializer that creates Avro files like the one above, not the
> original Avro files on the servers.
>
> I am almost there, I got the schema from the header field and the bytes
> from the body field.
> But now I need to write the Avro file based on the bytes, not the
> values from the fields. I cannot do r.put("field", "value") since I
> don't have the values, just the bytes.
>
> This is the code:
>
> File file = TESTFILE;
>
> DatumReader<GenericRecord> datumReader = new
> GenericDatumReader<GenericRecord>();
> DataFileReader<GenericRecord> dataFileReader = new
> DataFileReader<GenericRecord>(file, datumReader);
> GenericRecord user = null;
> while (dataFileReader.hasNext()) {
>     user = dataFileReader.next(user);
>
>     Map headers = (Map) user.get("headers");
>
>     Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
>     String schema = headers.get(schemaHeaderKey).toString();
>
>     ByteBuffer body = (ByteBuffer) user.get("body");
>
>
>     // Writing...
>     Schema.Parser parser = new Schema.Parser();
>     Schema schemaSimpleWrapper = parser.parse(schema);
>     GenericRecord r =  new GenericData.Record(schemaSimpleWrapper);
>
>     // NOT SURE WHAT COMES NEXT
> }
>
> Is it possible to actually create the Avro files from the value bytes?
>
> I appreciate any help.
>
> Thanks,
> Daniel
>
