Here is the content of the file, base64 encoded:

wwHDssAxUVqSKxhUZXN0IE1lc3NhZ2XaBQ==

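In case it helps to check this by hand, here is a minimal sketch using only the standard library (it does not use the avro package). It splits the base64 text above along the Single Object Encoding layout described in the spec: the two-byte marker C3 01, an 8-byte little-endian schema fingerprint, then the plain binary-encoded datum. The helper names (`read_long`, `read_string`, `split_single_object`) are made up for illustration, and the field order assumes the simpleMessage schema quoted in this thread (`message`: string, then `aNumber`: int).

```python
import base64

# Base64 text copied from the message above.
ENCODED = "wwHDssAxUVqSKxhUZXN0IE1lc3NhZ2XaBQ=="

# Per the Avro spec, Single Object Encoding starts with the marker C3 01,
# followed by the 8-byte little-endian CRC-64-AVRO schema fingerprint.
MARKER = b"\xc3\x01"
HEADER_SIZE = len(MARKER) + 8


def read_long(buf, pos):
    """Decode one zigzag varint (Avro int/long); return (value, next_pos)."""
    shift = 0
    raw = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


def read_string(buf, pos):
    """Decode a length-prefixed UTF-8 string; return (text, next_pos)."""
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def split_single_object(data):
    """Hypothetical helper: return (fingerprint, payload) if the marker is present."""
    if data[:len(MARKER)] != MARKER:
        raise ValueError("no single-object marker at start of data")
    return data[len(MARKER):HEADER_SIZE], data[HEADER_SIZE:]


data = base64.b64decode(ENCODED)
fingerprint, payload = split_single_object(data)

# Hand-decode the two fields in schema order (assumed: string, then int).
message, pos = read_string(payload, 0)
a_number, pos = read_long(payload, pos)
print(fingerprint.hex(), message, a_number)

# What a plain binary decoder sees if it starts at byte 0 instead:
# the marker bytes parse as a (negative) string length.
misread_length, _ = read_long(data, 0)
print(misread_length)
```

The first print gives `c3b2c031515a922b Test Message 365`, and the bytes after the 10-byte header are identical to the jsontofrag output quoted below (`GFRlc3QgTWVzc2FnZdoF`). The second print gives `-98`: read from byte 0, the marker looks like a string length, and the fingerprint bytes that follow contain 0xC0 at offset 2, which looks consistent with the UnicodeDecodeError in the traceback quoted below. If that reading is right, dropping the first 10 bytes before handing the data to `avro.io.BinaryDecoder` should be enough for this file.
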
On Fri, May 28, 2021 at 12:45 PM Michael A. Smith <mich...@smith-li.com> wrote:
>
> > I think the issue here is that the Java BinaryMessageEncoder writes the
> > data using a special header that consists of two bytes in the beginning
> > followed by the Avro schema fingerprint.
>
> That sounds like Single Object Encoding
> (https://avro.apache.org/docs/current/spec.html#single_object_encoding).
> That's possible, but I'd find it kinda surprising just because I'd
> expect the tools jar to use similar code to what you wrote in Java,
> and your code doesn't explicitly write the single object encoding
> form.
>
> Can you share the entire binary avro that your code produces? You can
> run `base64` on the file and put it in the email.
>
> On Fri, May 28, 2021 at 11:58 AM Chad Preisler <chad.preis...@gmail.com> wrote:
> >
> > The function call to array produces an array of bytes. So the code is
> > writing out raw binary data. Given that I can read the data back in from
> > the output file using the Java API makes me think I am writing the data
> > correctly.
> >
> > I think the issue here is that the Java BinaryMessageEncoder writes the
> > data using a special header that consists of two bytes in the beginning
> > followed by the Avro schema fingerprint. I briefly looked at the Python
> > avro.io code and did not see where it would look for a fingerprint and
> > try to do schema resolution. Do you know if the Python code is doing that
> > somewhere? It looks like the python code is looking for b'Obj' followed by
> > the number 1 in the header. I only spent about an hour looking at the code
> > so I admit, I could be way off on this.
> >
> > Let me know what you think. I will keep digging on my end.
> >
> > On Fri, May 28, 2021 at 10:39 AM Michael A. Smith <mich...@smith-li.com> wrote:
> >>
> >> > I created a simple example in Java and wrote some Python to try to
> >> > read the record.
> >>
> >> I think the data your java code is producing might not be valid. I
> >> don't know Java very well, so I can't provide specific advice there,
> >> but I do know the java implementation comes with a tool that should
> >> produce a good example:
> >>
> >> ```
> >> $ tail -n 100 preisler.avsc preisler.json
> >> ==> preisler.avsc <==
> >> {
> >>   "type": "record",
> >>   "name": "simpleMessage",
> >>   "fields": [
> >>     {
> >>       "name": "message",
> >>       "type": "string"
> >>     },
> >>     {
> >>       "name": "aNumber",
> >>       "type": "int"
> >>     }
> >>   ]
> >> }
> >>
> >> ==> preisler.json <==
> >> {
> >>   "message": "Test Message",
> >>   "aNumber": 365
> >> }
> >>
> >> $ java -jar ~/dev/avro/lang/java/tools/target/avro-tools-1.11.0-SNAPSHOT.jar jsontofrag --schema-file preisler.avsc preisler.json > preisler.avro.frag
> >> 21/05/28 11:25:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>
> >> $ base64 preisler.avro.frag # so you can tell if we're getting the same results
> >> GFRlc3QgTWVzc2FnZdoF
> >>
> >> $ python -c 'import avro.io, avro.schema
> >> print(
> >>     avro.io.DatumReader(
> >>         avro.schema.parse(open("preisler.avsc", "rb").read())
> >>     ).read(
> >>         avro.io.BinaryDecoder(open("preisler.avro.frag", "rb"))
> >>     )
> >> )'
> >> {'message': 'Test Message', 'aNumber': 365}
> >> ```
> >>
> >> Sorry my java is not better. Is it correct to change the data to
> >> array() before writing it to a file?
> >> (https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/src/main/java/chad/preisler/avro/eamples/AvroWriteReadBinary.java#L50)
> >>
> >> On Fri, May 28, 2021 at 10:41 AM Chad Preisler <chad.preis...@gmail.com> wrote:
> >> >
> >> > Here is the schema
> >> > https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/avroTestSchema.avsc
> >> >
> >> > On Fri, May 28, 2021 at 9:13 AM Michael A. Smith <mich...@smith-li.com> wrote:
> >> >>
> >> >> Hi, Chad,
> >> >>
> >> >> Did you share the schema somewhere? Is that something you're able to
> >> >> share?
> >> >>
> >> >> On Fri, May 28, 2021 at 10:00 AM Chad Preisler <chad.preis...@gmail.com> wrote:
> >> >> >
> >> >> > Hi,
> >> >> > I created a simple example in Java and wrote some Python to try to
> >> >> > read the record. I am getting the following error when trying to
> >> >> > read the Java record in Python.
> >> >> >
> >> >> > Traceback (most recent call last):
> >> >> >   File "/home/chad/python/avroReadTest/avro_read_binary_java.py", line 18, in <module>
> >> >> >     message = read_datum(java_binary_data, schema)
> >> >> >   File "/home/chad/python/avroReadTest/avro_read_binary_java.py", line 10, in read_datum
> >> >> >     return datum_reader.read(decoder)
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 626, in read
> >> >> >     return self.read_data(self.writers_schema, self.readers_schema, decoder)
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 698, in read_data
> >> >> >     return self.read_record(writers_schema, readers_schema, decoder)
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 898, in read_record
> >> >> >     field_val = self.read_data(field.type, readers_field.type, decoder)
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 655, in read_data
> >> >> >     return decoder.read_utf8()
> >> >> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 312, in read_utf8
> >> >> >     return unicode(self.read_bytes(), "utf-8")
> >> >> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 2: invalid start byte
> >> >> >
> >> >> > Here is a link to the Java code.
> >> >> > https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/src/main/java/chad/preisler/avro/eamples/AvroWriteReadBinary.java
> >> >> >
> >> >> > I'll admit I'm fairly new to Python. Here is my Python code.
> >> >> >
> >> >> > import avro.io
> >> >> > import avro.schema
> >> >> > import io
> >> >> >
> >> >> >
> >> >> > def read_datum(buffer, writers_schema, readers_schema=None):
> >> >> >     reader = io.BytesIO(buffer)
> >> >> >     decoder = avro.io.BinaryDecoder(reader)
> >> >> >     datum_reader = avro.io.DatumReader(writers_schema, readers_schema)
> >> >> >     return datum_reader.read(decoder)
> >> >> >
> >> >> >
> >> >> > java_binary_data = open("/home/chad/app_shared_resources/avroBinaryEncoderTest/java_binary_output.avo", "rb").read()
> >> >> > schemaBytes = open("/home/chad/app_shared_resources/avroBinaryEncoderTest/avroTestSchema.avsc", "rb").read()
> >> >> > print ("Schema read in: " + schemaBytes.decode('UTF-8'))
> >> >> > schema = avro.schema.parse(schemaBytes)
> >> >> > print("Schema " + schema.__str__())
> >> >> > message = read_datum(java_binary_data, schema)
> >> >> > print(message)
> >> >> >
> >> >> > I appreciate any help getting this working.
> >> >> >
> >> >> > Thanks,
> >> >> > Chad
> >> >> >
> >> >> > On Thu, May 27, 2021 at 12:56 PM Michael A. Smith <mich...@smith-li.com> wrote:
> >> >> >>
> >> >> >> They should be compatible.
> >> >> >>
> >> >> >> Take a look at lang/py/avro/test/test_io.py in
> >> >> >>
> >> >> >> https://github.com/apache/avro
> >> >> >>
> >> >> >> Line 239 has a simple function that lays it out.
> >> >> >>
> >> >> >> If you encounter a way in which Java and Python are producing
> >> >> >> incompatible results, please let us know.
> >> >> >>
> >> >> >> On Thu, May 27, 2021 at 13:05 Chad Preisler <chad.preis...@gmail.com> wrote:
> >> >> >>>
> >> >> >>> Hello,
> >> >> >>>
> >> >> >>> I am writing messages in Java using the BinaryMessageEncoder. I
> >> >> >>> would like to read the message in python. Is this supported, or
> >> >> >>> is the format written with BinaryMessageEncoder only supported
> >> >> >>> in Java?
> >> >> >>>
> >> >> >>> If it is supported can you point me to a python example that
> >> >> >>> reads the binary message format in python?
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Chad