Here is the content of the file, base64-encoded:

wwHDssAxUVqSKxhUZXN0IE1lc3NhZ2XaBQ==
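Splitting that dump by hand shows the layout. The sketch below assumes the bytes follow the Single Object Encoding layout from the Avro spec: a two-byte C3 01 marker, an 8-byte little-endian schema fingerprint, then the ordinary binary-encoded datum. The helper name `split_single_object` is hypothetical. The script compares the body against the `jsontofrag` output quoted further down in this thread.

```python
import base64
from typing import Tuple

# Base64 dump of the Java BinaryMessageEncoder output, as posted above.
SINGLE_OBJECT_B64 = "wwHDssAxUVqSKxhUZXN0IE1lc3NhZ2XaBQ=="
# Base64 of the bare datum that `avro-tools jsontofrag` produced, quoted below.
FRAGMENT_B64 = "GFRlc3QgTWVzc2FnZdoF"

SINGLE_OBJECT_MARKER = b"\xc3\x01"
HEADER_SIZE = 2 + 8  # marker + 8-byte schema fingerprint


def split_single_object(data: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: return (fingerprint, body) of a single-object payload."""
    if len(data) < HEADER_SIZE or not data.startswith(SINGLE_OBJECT_MARKER):
        raise ValueError("not a single-object payload (expected a C3 01 header)")
    return data[2:HEADER_SIZE], data[HEADER_SIZE:]


payload = base64.b64decode(SINGLE_OBJECT_B64)
fingerprint, body = split_single_object(payload)

print("marker:", payload[:2].hex())
print("fingerprint (little-endian bytes):", fingerprint.hex())
print("fingerprint (integer):", hex(int.from_bytes(fingerprint, "little")))
# Everything after the 10-byte header should be the plain binary datum.
print("body == jsontofrag output:", body == base64.b64decode(FRAGMENT_B64))
```

If the last line prints True, the Java file is a bare datum with a 10-byte prefix. Stripping that prefix is then enough for a plain `BinaryDecoder`.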

On Fri, May 28, 2021 at 12:45 PM Michael A. Smith <mich...@smith-li.com> wrote:

> > I think the issue here is that the Java BinaryMessageEncoder writes the
> > data using a special header that consists of two bytes at the beginning
> > followed by the Avro schema fingerprint.
>
> That sounds like Single Object Encoding
> (https://avro.apache.org/docs/current/spec.html#single_object_encoding).
> That's possible, but I'd find it kinda surprising just because I'd
> expect the tools jar to use similar code to what you wrote in Java,
> and your code doesn't explicitly write the single object encoding
> form.
>
> Can you share the entire binary avro that your code produces? You can
> run `base64` on the file and put it in the email.
>
> On Fri, May 28, 2021 at 11:58 AM Chad Preisler <chad.preis...@gmail.com> wrote:
> >
> > The call to array() produces an array of bytes, so the code is
> > writing out raw binary data. The fact that I can read the data back in
> > from the output file using the Java API makes me think I am writing the
> > data correctly.
> >
> > I think the issue here is that the Java BinaryMessageEncoder writes the
> > data using a special header that consists of two bytes at the beginning
> > followed by the Avro schema fingerprint. I briefly looked at the Python
> > avro.io code and did not see where it would look for a fingerprint and
> > try to do schema resolution. Do you know if the Python code is doing that
> > somewhere? It looks like the Python code is looking for b'Obj' followed by
> > the number 1 in the header. I only spent about an hour looking at the code,
> > so I admit I could be way off on this.
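The b'Obj' followed by 1 is the magic at the start of an Avro object container file. Per the spec, that is a separate framing from both single-object encoding and a bare datum. A rough way to tell the three apart by their leading bytes is sketched below. The function name `sniff_avro_format` is hypothetical. A bare datum has no magic at all, so the last branch is only a fallback guess, and a bare datum could in principle start with C3 01 by coincidence.

```python
CONTAINER_MAGIC = b"Obj\x01"        # Avro object container file header
SINGLE_OBJECT_MARKER = b"\xc3\x01"  # Avro single-object encoding marker


def sniff_avro_format(data: bytes) -> str:
    """Hypothetical helper: guess which Avro framing `data` uses from its first bytes."""
    if data.startswith(CONTAINER_MAGIC):
        # Container files embed the writer's schema; read them with avro.datafile.DataFileReader.
        return "container"
    if data.startswith(SINGLE_OBJECT_MARKER):
        # Skip the 10-byte header, then decode with avro.io.BinaryDecoder + DatumReader.
        return "single-object"
    # No magic: probably a raw binary datum, decodable only with the writer's schema.
    return "raw-datum"


java_output = b"\xc3\x01\xc3\xb2\xc0\x31\x51\x5a\x92\x2b\x18Test Message\xda\x05"
print(sniff_avro_format(java_output))
```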
> >
> > Let me know what you think. I will keep digging on my end.
> >
> > On Fri, May 28, 2021 at 10:39 AM Michael A. Smith <mich...@smith-li.com> wrote:
> >>
> >> > I created a simple example in Java and wrote some Python to try to
> >> > read the record.
> >>
> >> I think the data your java code is producing might not be valid. I
> >> don't know Java very well, so I can't provide specific advice there,
> >> but I do know the java implementation comes with a tool that should
> >> produce a good example:
> >>
> >> ```
> >> $ tail -n 100 preisler.avsc preisler.json
> >> ==> preisler.avsc <==
> >> {
> >>     "type": "record",
> >>     "name": "simpleMessage",
> >>     "fields": [
> >>         {
> >>             "name": "message",
> >>             "type": "string"
> >>         },
> >>         {
> >>             "name": "aNumber",
> >>             "type": "int"
> >>         }
> >>     ]
> >> }
> >>
> >> ==> preisler.json <==
> >> {
> >>   "message": "Test Message",
> >>   "aNumber": 365
> >> }
> >>
> >> $ java -jar ~/dev/avro/lang/java/tools/target/avro-tools-1.11.0-SNAPSHOT.jar \
> >>     jsontofrag --schema-file preisler.avsc preisler.json > preisler.avro.frag
> >> 21/05/28 11:25:43 WARN util.NativeCodeLoader: Unable to load
> >> native-hadoop library for your platform... using builtin-java classes
> >> where applicable
> >>
> >> $ base64 preisler.avro.frag  # so you can tell if we're getting the same results
> >> GFRlc3QgTWVzc2FnZdoF
> >>
> >> $ python -c 'import avro.io, avro.schema
> >> print(
> >>     avro.io.DatumReader(
> >>         avro.schema.parse(open("preisler.avsc", "rb").read())
> >>     ).read(
> >>         avro.io.BinaryDecoder(open("preisler.avro.frag", "rb"))
> >>     )
> >> )'
> >> {'message': 'Test Message', 'aNumber': 365}
> >> ```
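To see what `jsontofrag` emitted, here is a hand-rolled encoder for just this two-field schema. It follows the binary encoding rules in the spec: a string is a zigzag varint byte length followed by UTF-8 bytes, and an int is a zigzag varint. This is a sketch for this one record type, not a general Avro encoder. `zigzag_varint` and `encode_simple_message` are hypothetical names.

```python
import base64


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit value as an Avro zigzag varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_simple_message(message: str, a_number: int) -> bytes:
    """Hypothetical helper: binary-encode a simpleMessage record, fields in schema order."""
    text = message.encode("utf-8")
    return zigzag_varint(len(text)) + text + zigzag_varint(a_number)


frag = encode_simple_message("Test Message", 365)
print(base64.b64encode(frag).decode("ascii"))  # GFRlc3QgTWVzc2FnZdoF
```

Note that a bare datum carries no marker or fingerprint at all. That is why `BinaryDecoder` can read the `jsontofrag` output directly but not the BinaryMessageEncoder output.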
> >>
> >> Sorry my Java is not better. Is it correct to convert the data with
> >> array() before writing it to a file?
> >> (https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/src/main/java/chad/preisler/avro/eamples/AvroWriteReadBinary.java#L50)
> >>
> >> On Fri, May 28, 2021 at 10:41 AM Chad Preisler <chad.preis...@gmail.com> wrote:
> >> >
> >> > Here is the schema:
> >> >
> >> > https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/avroTestSchema.avsc
> >> >
> >> > On Fri, May 28, 2021 at 9:13 AM Michael A. Smith <mich...@smith-li.com> wrote:
> >> >>
> >> >> Hi, Chad,
> >> >>
> >> >> Did you share the schema somewhere? Is that something you're able to
> >> >> share?
> >> >>
> >> >> On Fri, May 28, 2021 at 10:00 AM Chad Preisler <chad.preis...@gmail.com> wrote:
> >> >> >
> >> >> > Hi,
> >> >> > I created a simple example in Java and wrote some Python to try to
> >> >> > read the record. I am getting the following error when trying to read
> >> >> > the Java record in Python:
> >> >> >
> >> >> > Traceback (most recent call last):
> >> >> >   File "/home/chad/python/avroReadTest/avro_read_binary_java.py", line 18, in <module>
> >> >> >     message = read_datum(java_binary_data, schema)
> >> >> >   File "/home/chad/python/avroReadTest/avro_read_binary_java.py", line 10, in read_datum
> >> >> >     return datum_reader.read(decoder)
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 626, in read
> >> >> >     return self.read_data(self.writers_schema, self.readers_schema, decoder)
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 698, in read_data
> >> >> >     return self.read_record(writers_schema, readers_schema, decoder)
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 898, in read_record
> >> >> >     field_val = self.read_data(field.type, readers_field.type, decoder)
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 655, in read_data
> >> >> >     return decoder.read_utf8()
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 312, in read_utf8
> >> >> >     return unicode(self.read_bytes(), "utf-8")
> >> >> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 2: invalid start byte
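This traceback is consistent with a single-object header in front of the datum. Suppose the decoder reads the first string field as a zigzag varint length and passes that length straight to the stream's `read()`, which is an assumption about this avro version. Then the marker bytes C3 01 decode to a length of -98, and `io.BytesIO.read()` with a negative size returns everything up to EOF. The first two remaining bytes, C3 B2, happen to form a valid UTF-8 character. The third, C0, can never start a UTF-8 sequence, hence "byte 0xc0 in position 2". A small reproduction, with a hypothetical `read_long` helper:

```python
import io


def read_long(stream: io.BytesIO) -> int:
    """Hypothetical helper: read one Avro zigzag varint from `stream`."""
    shift = 0
    z = 0
    while True:
        b = stream.read(1)[0]
        z |= (b & 0x7F) << shift  # accumulate 7 bits per byte, little-endian groups
        shift += 7
        if not b & 0x80:  # high bit clear: last byte of the varint
            break
    return (z >> 1) ^ -(z & 1)  # undo zigzag


# The single-object payload shared in this thread.
payload = bytes.fromhex("c301c3b2c031515a922b18") + b"Test Message" + b"\xda\x05"
stream = io.BytesIO(payload)
length = read_long(stream)  # the C3 01 marker, misread as a string length
print(length)  # -98
raw = stream.read(length)  # negative size: BytesIO returns everything that is left
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)
```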
> >> >> >
> >> >> > Here is a link to the Java code:
> >> >> >
> >> >> > https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/src/main/java/chad/preisler/avro/eamples/AvroWriteReadBinary.java
> >> >> >
> >> >> > I'll admit I'm fairly new to Python. Here is my Python code.
> >> >> >
> >> >> > import avro.io
> >> >> > import avro.schema
> >> >> > import io
> >> >> >
> >> >> >
> >> >> > def read_datum(buffer, writers_schema, readers_schema=None):
> >> >> >     reader = io.BytesIO(buffer)
> >> >> >     decoder = avro.io.BinaryDecoder(reader)
> >> >> >     datum_reader = avro.io.DatumReader(writers_schema, readers_schema)
> >> >> >     return datum_reader.read(decoder)
> >> >> >
> >> >> >
> >> >> > java_binary_data = open("/home/chad/app_shared_resources/avroBinaryEncoderTest/java_binary_output.avo", "rb").read()
> >> >> > schemaBytes = open("/home/chad/app_shared_resources/avroBinaryEncoderTest/avroTestSchema.avsc", "rb").read()
> >> >> > print("Schema read in: " + schemaBytes.decode("UTF-8"))
> >> >> > schema = avro.schema.parse(schemaBytes)
> >> >> > print("Schema " + str(schema))
> >> >> > message = read_datum(java_binary_data, schema)
> >> >> > print(message)
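One way to read that file in Python, assuming it really is in single-object form, is to check and strip the 10-byte header before handing the rest to the same `DatumReader` code. This sketch does not look the fingerprint up in a schema store or do schema resolution; it simply trusts that the schema file matches the writer's schema. `strip_single_object_header` and `read_single_object` are hypothetical names, and the paths in the usage comment are placeholders.

```python
import io

SINGLE_OBJECT_MARKER = b"\xc3\x01"
HEADER_SIZE = 10  # 2-byte marker + 8-byte little-endian schema fingerprint


def strip_single_object_header(buffer: bytes) -> bytes:
    """Return the datum body, raising if the single-object marker is absent."""
    if len(buffer) < HEADER_SIZE or buffer[:2] != SINGLE_OBJECT_MARKER:
        raise ValueError("not single-object encoded (expected a C3 01 header)")
    # buffer[2:10] holds the fingerprint; it could be checked against the schema here.
    return buffer[HEADER_SIZE:]


def read_single_object(buffer: bytes, writers_schema, readers_schema=None):
    """Decode one single-object-encoded datum with the avro package."""
    import avro.io  # imported here so the header helper works without avro installed

    decoder = avro.io.BinaryDecoder(io.BytesIO(strip_single_object_header(buffer)))
    datum_reader = avro.io.DatumReader(writers_schema, readers_schema)
    return datum_reader.read(decoder)


# Usage (placeholder paths):
# import avro.schema
# with open("avroTestSchema.avsc", "rb") as f:
#     schema = avro.schema.parse(f.read())
# with open("java_binary_output.avo", "rb") as f:
#     print(read_single_object(f.read(), schema))
```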
> >> >> >
> >> >> > I appreciate any help getting this working.
> >> >> >
> >> >> > Thanks,
> >> >> > Chad
> >> >> >
> >> >> > On Thu, May 27, 2021 at 12:56 PM Michael A. Smith <mich...@smith-li.com> wrote:
> >> >> >>
> >> >> >> They should be compatible.
> >> >> >>
> >> >> >> Take a look at lang/py/avro/test/test_io.py in
> >> >> >>
> >> >> >> https://github.com/apache/avro
> >> >> >>
> >> >> >> Line 239 has a simple function that lays it out.
> >> >> >>
> >> >> >> If you encounter a way in which Java and Python are producing
> >> >> >> incompatible results, please let us know.
> >> >> >>
> >> >> >> On Thu, May 27, 2021 at 13:05 Chad Preisler <chad.preis...@gmail.com> wrote:
> >> >> >>>
> >> >> >>> Hello,
> >> >> >>>
> >> >> >>> I am writing messages in Java using the BinaryMessageEncoder. I
> >> >> >>> would like to read the messages in Python. Is this supported, or is
> >> >> >>> the format written by BinaryMessageEncoder only supported in Java?
> >> >> >>>
> >> >> >>> If it is supported, can you point me to a Python example that reads
> >> >> >>> the binary message format?
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Chad
>
