Thank you, Doug!
That was all I needed to make it work.
Just for the record, this is the code:
// Writing: wrap the pre-encoded record bytes in an Avro container file.
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(schemaString);
File outFile = new File("generated.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, outFile);
dataFileWriter.appendEncoded(body);  // body is the ByteBuffer from the Flume event
dataFileWriter.close();
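To sanity-check the output, the container file can be read back with the generic reader (a minimal sketch, using the same "generated.avro" file and classes as above):

// Read the generated container file back and print each record.
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> fileReader =
    new DataFileReader<GenericRecord>(new File("generated.avro"), reader);
GenericRecord rec = null;
while (fileReader.hasNext()) {
    rec = fileReader.next(rec);
    System.out.println(rec);
}
fileReader.close();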
Thanks again!
On Feb 7, 2014, at 2:29 PM, Doug Cutting <[email protected]> wrote:
> You might use DataFileWriter#appendEncoded:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendEncoded(java.nio.ByteBuffer)
>
> If the body has just a single instance of the record then you'd call this once.
> If you have multiple instances then you might change the body to have the
> schema {"type":"array", "items":"bytes"}.
>
> Doug
>
>
> On Fri, Feb 7, 2014 at 12:06 PM, Daniel Rodriguez <[email protected]>
> wrote:
> Hi all,
>
> Some context (I am not an expert Java programmer, and am just starting with
> Avro/Flume):
>
> I need to transfer Avro files from different servers to HDFS, and I am trying
> to use Flume to do it.
> I have a Flume spooldir source (reading the Avro files) feeding an Avro sink
> on the servers, and an Avro source feeding an HDFS sink on the Hadoop side. Like this:
>
> servers | hadoop
> spooldir src -> avro sink --------> avro src -> hdfs
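>
> For reference, the servers side of that pipeline implies an agent configuration
> along these lines (a sketch; agent, channel, host, path and port names are made up):
>
> agent1.sources = spool
> agent1.channels = ch1
> agent1.sinks = fwd
> agent1.sources.spool.type = spooldir
> agent1.sources.spool.spoolDir = /data/avro-in
> agent1.sources.spool.deserializer = AVRO
> agent1.sources.spool.deserializer.schemaType = LITERAL
> agent1.sources.spool.channels = ch1
> agent1.channels.ch1.type = memory
> agent1.sinks.fwd.type = avro
> agent1.sinks.fwd.hostname = hadoop01
> agent1.sinks.fwd.port = 4141
> agent1.sinks.fwd.channel = ch1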
>
> When the Flume spooldir source deserializes the Avro files, it creates a Flume
> event with two fields: 1) the header contains the schema; 2) the body contains
> the binary Avro record data, not including the schema or the rest of the
> container-file elements. See the Flume docs:
> http://flume.apache.org/FlumeUserGuide.html#avro
>
> So the Avro sink creates an Avro file like this:
>
> {"headers": {"flume.avro.schema.literal":
> "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
> "body": {"bytes": "{BYTES}"}}
>
> So now I am trying to write a serializer, since Flume only includes a
> FlumeEvent serializer that creates Avro files like the one above, not the
> original Avro files from the servers.
>
> I am almost there: I get the schema from the header field and the bytes from
> the body field.
> But now I need to write the Avro file based on those bytes, not on field
> values; I cannot do r.put("field", "value") since I don't have the values,
> just the bytes.
>
> This is the code:
>
> File file = TESTFILE;
>
> // Open the Flume-written container file of Event records.
> DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
> DataFileReader<GenericRecord> dataFileReader =
>     new DataFileReader<GenericRecord>(file, datumReader);
> GenericRecord event = null;
> while (dataFileReader.hasNext()) {
>     event = dataFileReader.next(event);
>
>     // The record schema travels as a literal in the event headers.
>     Map<Utf8, Utf8> headers = (Map<Utf8, Utf8>) event.get("headers");
>     Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
>     String schema = headers.get(schemaHeaderKey).toString();
>
>     // The body is one binary-encoded Avro record, without the schema.
>     ByteBuffer body = (ByteBuffer) event.get("body");
>
>     // Writing...
>     Schema.Parser parser = new Schema.Parser();
>     Schema recordSchema = parser.parse(schema);
>     GenericRecord r = new GenericData.Record(recordSchema);
>
>     // NOT SURE WHAT COMES NEXT
> }
>
> Is it possible to actually create the Avro files from the value bytes?
>
> I appreciate any help.
>
> Thanks,
> Daniel
>