> I created a simple example in Java and wrote some Python to try to read the
> record.

I think the data your Java code is producing might not be valid. I don't know
Java very well, so I can't provide specific advice there, but I do know the
Java implementation comes with a tool that should produce a good example:

```
$ tail -n 100 preisler.avsc preisler.json
==> preisler.avsc <==
{
  "type": "record",
  "name": "simpleMessage",
  "fields": [
    { "name": "message", "type": "string" },
    { "name": "aNumber", "type": "int" }
  ]
}

==> preisler.json <==
{ "message": "Test Message", "aNumber": 365 }
$ java -jar ~/dev/avro/lang/java/tools/target/avro-tools-1.11.0-SNAPSHOT.jar jsontofrag --schema-file preisler.avsc preisler.json > preisler.avro.frag
21/05/28 11:25:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$ base64 preisler.avro.frag # so you can tell if we're getting the same results
GFRlc3QgTWVzc2FnZdoF
$ python -c 'import avro.io, avro.schema
print(
  avro.io.DatumReader(
    avro.schema.parse(open("preisler.avsc", "rb").read())
  ).read(
    avro.io.BinaryDecoder(open("preisler.avro.frag", "rb"))
  )
)'
{'message': 'Test Message', 'aNumber': 365}
```

Sorry my Java is not better. Is it correct to change the data to array()
before writing it to a file?
(https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/src/main/java/chad/preisler/avro/eamples/AvroWriteReadBinary.java#L50)

On Fri, May 28, 2021 at 10:41 AM Chad Preisler <chad.preis...@gmail.com> wrote:
>
> Here is the schema
> https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/avroTestSchema.avsc
>
> On Fri, May 28, 2021 at 9:13 AM Michael A. Smith <mich...@smith-li.com> wrote:
>>
>> Hi, Chad,
>>
>> Did you share the schema somewhere? Is that something you're able to share?
>>
>> On Fri, May 28, 2021 at 10:00 AM Chad Preisler <chad.preis...@gmail.com> wrote:
>> >
>> > Hi,
>> > I created a simple example in Java and wrote some Python to try to read
>> > the record. I am getting the following error when trying to read the Java
>> > record in Python.
>> >
>> > Traceback (most recent call last):
>> >   File "/home/chad/python/avroReadTest/avro_read_binary_java.py", line 18, in <module>
>> >     message = read_datum(java_binary_data, schema)
>> >   File "/home/chad/python/avroReadTest/avro_read_binary_java.py", line 10, in read_datum
>> >     return datum_reader.read(decoder)
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 626, in read
>> >     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 698, in read_data
>> >     return self.read_record(writers_schema, readers_schema, decoder)
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 898, in read_record
>> >     field_val = self.read_data(field.type, readers_field.type, decoder)
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 655, in read_data
>> >     return decoder.read_utf8()
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 312, in read_utf8
>> >     return unicode(self.read_bytes(), "utf-8")
>> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 2: invalid start byte
>> >
>> > Here is a link to the Java code.
>> > https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/src/main/java/chad/preisler/avro/eamples/AvroWriteReadBinary.java
>> >
>> > I'll admit I'm fairly new to Python. Here is my Python code.
>> >
>> > import avro.io
>> > import avro.schema
>> > import io
>> >
>> >
>> > def read_datum(buffer, writers_schema, readers_schema=None):
>> >     reader = io.BytesIO(buffer)
>> >     decoder = avro.io.BinaryDecoder(reader)
>> >     datum_reader = avro.io.DatumReader(writers_schema, readers_schema)
>> >     return datum_reader.read(decoder)
>> >
>> >
>> > java_binary_data = open("/home/chad/app_shared_resources/avroBinaryEncoderTest/java_binary_output.avo", "rb").read()
>> > schemaBytes = open("/home/chad/app_shared_resources/avroBinaryEncoderTest/avroTestSchema.avsc", "rb").read()
>> > print ("Schema read in: " + schemaBytes.decode('UTF-8'))
>> > schema = avro.schema.parse(schemaBytes)
>> > print("Schema " + schema.__str__())
>> > message = read_datum(java_binary_data, schema)
>> > print(message)
>> >
>> > I appreciate any help getting this working.
>> >
>> > Thanks,
>> > Chad
>> >
>> > On Thu, May 27, 2021 at 12:56 PM Michael A. Smith <mich...@smith-li.com> wrote:
>> >>
>> >> They should be compatible.
>> >>
>> >> Take a look at lang/py/avro/test/test_io.py in
>> >>
>> >> https://github.com/apache/avro
>> >>
>> >> Line 239 has a simple function that lays it out.
>> >>
>> >> If you encounter a way in which Java and Python are producing
>> >> incompatible results, please let us know.
>> >>
>> >> On Thu, May 27, 2021 at 13:05 Chad Preisler <chad.preis...@gmail.com> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I am writing messages in Java using the BinaryMessageEncoder. I would
>> >>> like to read the message in python. Is this supported, or is the format
>> >>> written with BinaryMessageEncoder only supported in Java?
>> >>>
>> >>> If it is supported can you point me to a python example that reads the
>> >>> binary message format in python?
>> >>>
>> >>> Thanks,
>> >>> Chad
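
P.S. In case it helps with comparing bytes: the 15-byte fragment whose base64
is GFRlc3QgTWVzc2FnZdoF can be decoded by hand with nothing but the standard
library. This is only a rough sketch of the plain Avro binary encoding for a
record with a string field followed by an int field (the simpleMessage schema
shown earlier), not a replacement for avro.io. The helper names read_long and
read_string are my own and do not come from the avro package.

```python
import base64


def read_long(buf, pos):
    """Decode one zigzag variable-length int/long starting at buf[pos].

    Returns the decoded value and the position of the next unread byte.
    """
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:            # high bit clear: last byte of the varint
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos  # undo zigzag


def read_string(buf, pos):
    """Decode an Avro string: a long byte count, then that many UTF-8 bytes."""
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


# Same bytes as preisler.avro.frag (taken from the base64 output).
frag = base64.b64decode("GFRlc3QgTWVzc2FnZdoF")

# A record is just its fields in schema order, with no framing of its own.
message, pos = read_string(frag, 0)
a_number, pos = read_long(frag, pos)
record = {"message": message, "aNumber": a_number}
print(record)
```

The first byte of a valid fragment for this schema is the string length (0x18
here, which is 12 after zigzag decoding). If the first bytes of
java_binary_output.avo look different, something may be sitting in front of the
record body. The single-object encoding in the Avro specification, for
instance, puts a two-byte marker (C3 01) and an 8-byte schema fingerprint
before the body, and as far as I know that is the format BinaryMessageEncoder
writes. I have not checked whether that is what is happening in your case.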