Please open a bug report in JIRA. I don't have time to look at this now, but someone else might.
On the topic of per-record versioning and how to design a system that does not store schemas per record, there have been useful threads on this mailing list in the past:

http://search-hadoop.com/m/66jvQoopYw/HAvroBase&subj=Re+question+about+completely+untagged+data+
http://search-hadoop.com/m/q7lLU1GVhHd2/HAvroBase&subj=Re+Versioning+of+an+array+of+a+record

On 1/18/11 10:08 AM, "David Rosenstrauch" <[email protected]> wrote:

>I've also found this to be the case, and was wondering about it. I had
>also thought that I could just re-init an existing BinaryEncoder, but
>found that I had to create a new one each time. I didn't think much of
>it at the time, but in retrospect it does sound like it might be a bug.
>Perhaps one of the devs can comment, and/or perhaps you might want to
>open a bug report about this.
>
>DR
>
>On 01/18/2011 03:17 AM, Devajyoti Sarkar wrote:
>> Let me first give some context. I would like to store a datum
>> serialized with a BinaryEncoder without having to place a schema with
>> it (as the DataFileWriter does). Instead, I have created a container
>> record that stores a unique id for the schema version and a payload
>> field of type "bytes". This gives me a self-describing data object
>> (for example, to place in a cell in HBase) without the overhead of a
>> schema per object. (Perhaps there is a better way to do this; if so,
>> please let me know.)
>>
>> The code looks something like this:
>>
>> GenericRecord container = new GenericData.Record(containerSchema);
>> writer.setSchema(containerSchema);
>> container.put(CONTAINER_SCHEMA_ID_FIELD,
>>     datum.getSchema().getProp(SCHEMA_ID_PROPERTY));
>> container.put(CONTAINER_PAYLOAD_FIELD,
>>     ByteBuffer.wrap(datumBits.toByteArray()));
>> ByteArrayOutputStream containerBits = new ByteArrayOutputStream();
>> encoder.init(containerBits);
>> writer.write(container, encoder);
>> encoder.flush();
>> containerBits.flush();
>> containerBits.close();
>>
>> I am trying to reuse an encoder by calling init() to re-initialize it.
>> Perhaps this is what creates the problem: if I create a new encoder
>> each time, everything works fine. However, if I just call init(), then
>> the OutputStream for the encoder is reset but the OutputStream for the
>> SimpleByteWriter within the encoder is not. This seems to be causing
>> the problem, because when the encoder is flushed, it does not write
>> the bytes held in the ByteWriter. Perhaps the init() method is not
>> supposed to be used this way, but it would be nice not to have to
>> create a new encoder each time.
>>
>> Can you please let me know if the above looks right, and advise me on
>> the best way to do the serialization?
>>
>> Thanks,
>> Dev
>>
>>
>>
>> On Tue, Jan 18, 2011 at 4:14 AM, Scott Carey <[email protected]> wrote:
>>
>>> BinaryEncoder buffers data; you may have to call flush() to see it in
>>> the output stream.
>>>
>>>
>>> On 1/17/11 4:53 AM, "Devajyoti Sarkar" <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I am just beginning to use Avro, so I apologize if this is a silly
>>> question.
>>>
>>> I would like to set a field of type "bytes" in Java. I am assuming
>>> that all I need to do is wrap a byte[] in a ByteBuffer to set the
>>> value. Unfortunately, that does not seem to work. I am using a
>>> BinaryEncoder, and looking at its output, it has not written any of
>>> the bytes that were in the array.
>>> The first four values of the array are 0, -128, -128, -128.
>>>
>>> Is it because Java uses 8-bit signed bytes while the Avro spec calls
>>> for 8-bit unsigned bytes in a field of type "bytes"? If so, how does
>>> one convert Java bytes to the kind accepted by Avro?
>>>
>>> Thanks in advance.
>>>
>>> Dev
>>>
>>>
>>
>
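[For readers following the thread: the container record Devajyoti describes at the top is essentially an envelope of schema-version id plus raw datum bytes. A plain-Java sketch of that layout, with no Avro dependency; the fixed 4-byte int id and length prefix are illustrative assumptions, since in the thread the container is itself an Avro record with a schema-id field and a "bytes" payload field.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// A plain-Java sketch (no Avro dependency) of the "container" idea:
// a schema-version id followed by the encoded datum bytes. The int
// id and length prefix are illustrative assumptions, not the thread's
// actual Avro container record.
public class SchemaIdEnvelope {

    // Prepend the schema id and payload length to the encoded datum.
    static byte[] wrap(int schemaId, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(schemaId);       // identifies the writer schema
        out.writeInt(payload.length); // payload size
        out.write(payload);           // the BinaryEncoder output
        out.flush();
        return bos.toByteArray();
    }

    // Read back just the schema id, e.g. to look up the writer schema
    // before decoding the payload.
    static int schemaIdOf(byte[] envelope) throws IOException {
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(envelope));
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] envelope = wrap(42, new byte[] {1, 2, 3});
        System.out.println(schemaIdOf(envelope)); // prints 42
        System.out.println(envelope.length);      // 4 + 4 + 3 = 11
    }
}
```

On the read side, the id is decoded first and used to fetch the writer schema from a registry before the payload is deserialized.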
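[Scott's point about BinaryEncoder buffering applies to any buffered stream: until flush() is called, written bytes sit in an internal buffer and never reach the target OutputStream. A plain-JDK analogy, with BufferedOutputStream standing in for the encoder; this is not Avro code.]

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Why the output looks empty before flush(): BufferedOutputStream
// stands in here for BinaryEncoder's internal buffer (an analogy,
// not Avro code).
public class FlushDemo {

    static int[] sizesBeforeAndAfterFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 1024);

        buffered.write(new byte[] {1, 2, 3});
        int before = sink.size(); // 0: bytes still sit in the buffer

        buffered.flush();
        int after = sink.size(); // 3: bytes have reached the sink
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesBeforeAndAfterFlush();
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints 0 -> 3
    }
}
```

The same reasoning explains why a stale internal buffer after re-initializing an encoder would lose bytes at flush time, as described above.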
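[On the signed/unsigned question at the bottom of the thread: a Java byte is just eight bits, and ByteBuffer.wrap neither copies nor transforms them, so no conversion is needed; -128 printed as a signed byte is the same bit pattern as unsigned 128 (0x80). A small demonstration:]

```java
import java.nio.ByteBuffer;

// A Java byte is just 8 bits: -128 (signed) and 128 / 0x80 (unsigned)
// are the same bit pattern, and ByteBuffer.wrap exposes the backing
// array as-is, so no signed-to-unsigned conversion is needed before
// handing the buffer to an encoder.
public class SignedBytesDemo {

    static int asUnsigned(byte b) {
        return b & 0xFF; // mask to the unsigned value 0..255
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.println(b);             // prints -128 (signed view)
        System.out.println(asUnsigned(b)); // prints 128 (unsigned view)

        byte[] data = {0, (byte) 0x80, (byte) 0x80, (byte) 0x80};
        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println(buf.array() == data); // true: same array, no copy
    }
}
```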
