> I created a simple example in Java and wrote some Python to try to read the 
> record.

I think the data your Java code is producing might not be valid. I
don't know Java very well, so I can't offer specific advice there,
but I do know the Java implementation ships with a tool that should
produce a known-good example:

```
$ tail -n 100 preisler.avsc preisler.json
==> preisler.avsc <==
{
    "type": "record",
    "name": "simpleMessage",
    "fields": [
        {
            "name": "message",
            "type": "string"
        },
        {
            "name": "aNumber",
            "type": "int"
        }
    ]
}

==> preisler.json <==
{
  "message": "Test Message",
  "aNumber": 365
}

$ java -jar ~/dev/avro/lang/java/tools/target/avro-tools-1.11.0-SNAPSHOT.jar \
    jsontofrag --schema-file preisler.avsc preisler.json > preisler.avro.frag
21/05/28 11:25:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

$ base64 preisler.avro.frag  # so you can tell if we're getting the same results
GFRlc3QgTWVzc2FnZdoF

$ python -c 'import avro.io, avro.schema
print(
    avro.io.DatumReader(
        avro.schema.parse(open("preisler.avsc", "rb").read())
    ).read(
        avro.io.BinaryDecoder(open("preisler.avro.frag", "rb"))
    )
)'
{'message': 'Test Message', 'aNumber': 365}
```
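
To make it easier to compare your bytes against mine, here is a minimal
sketch that decodes the fragment above by hand, without the avro
package. It assumes the two-field schema shown earlier (a string, then
an int); the helper names read_long, read_string and
read_simple_message are mine for illustration, not part of any
library.

```python
import base64
import io


def read_long(stream):
    """Read one Avro int/long: a little-endian base-128 varint, zigzag-encoded."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("ran out of bytes while reading a varint")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:              # high bit clear marks the last byte
            break
        shift += 7
    # Undo zigzag: even values are non-negative, odd values are negative.
    return (accum >> 1) ^ -(accum & 1)


def read_string(stream):
    """Read an Avro string: a long byte count followed by UTF-8 bytes."""
    length = read_long(stream)
    if length < 0:
        raise ValueError("negative string length: %d" % length)
    return stream.read(length).decode("utf-8")


def read_simple_message(buf):
    """Decode one simpleMessage record; fields appear in schema order."""
    stream = io.BytesIO(buf)
    return {"message": read_string(stream), "aNumber": read_long(stream)}


# The base64 text printed by the shell session above.
frag = base64.b64decode("GFRlc3QgTWVzc2FnZdoF")
record = read_simple_message(frag)
print(record)  # {'message': 'Test Message', 'aNumber': 365}
```

The first byte, 0x18, is the zigzag encoding of 12, the length of
"Test Message"; the last two bytes, 0xda 0x05, decode to 365. If your
Java output starts with anything else, the two files are not the same
encoding.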

Sorry my Java isn't better. Is it correct to convert the data with
array() before writing it to a file?
(https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/src/main/java/chad/preisler/avro/eamples/AvroWriteReadBinary.java#L50)
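
One possible explanation, which I have not confirmed against the linked
code: if the Java side uses BinaryMessageEncoder, as the first message
in this thread says, then the output is Avro's single-object encoding.
Per the Avro specification, that format puts a 10-byte header in front
of the binary-encoded datum: the two marker bytes C3 01, then the
8-byte little-endian CRC-64-AVRO fingerprint of the schema. A plain
BinaryDecoder would misread that header as the start of the record.
Below is a small sketch of splitting the header off; the function name
split_single_object and the sample bytes are made up for illustration.

```python
import struct

SINGLE_OBJECT_MARKER = b"\xc3\x01"  # marker bytes from the Avro specification
HEADER_LENGTH = 10                  # 2 marker bytes + 8 fingerprint bytes


def split_single_object(buf):
    """Return (schema_fingerprint, payload) for a single-object encoded message.

    Raises ValueError if buf does not start with the expected marker.
    """
    if len(buf) < HEADER_LENGTH or buf[:2] != SINGLE_OBJECT_MARKER:
        raise ValueError("not an Avro single-object encoded message")
    # The fingerprint is an unsigned 64-bit little-endian integer.
    (fingerprint,) = struct.unpack("<Q", buf[2:HEADER_LENGTH])
    return fingerprint, buf[HEADER_LENGTH:]


# Hypothetical message: a made-up fingerprint (bytes 00..07) followed by
# the same datum bytes as the fragment shown earlier.
example = SINGLE_OBJECT_MARKER + bytes(range(8)) + b"\x18Test Message\xda\x05"
fingerprint, payload = split_single_object(example)
print(hex(fingerprint), payload)
```

If that is what is happening, passing only the payload (wrapped in
io.BytesIO) to avro.io.BinaryDecoder should behave like the fragment
example above. If the file does not start with C3 01, this is not the
cause.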

On Fri, May 28, 2021 at 10:41 AM Chad Preisler <chad.preis...@gmail.com> wrote:
>
> Here is the schema
> https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/avroTestSchema.avsc
>
> On Fri, May 28, 2021 at 9:13 AM Michael A. Smith <mich...@smith-li.com> wrote:
>>
>> Hi, Chad,
>>
>> Did you share the schema somewhere? Is that something you're able to share?
>>
>> On Fri, May 28, 2021 at 10:00 AM Chad Preisler <chad.preis...@gmail.com> 
>> wrote:
>> >
>> > Hi,
>> > I created a simple example in Java and wrote some Python to try to read 
>> > the record. I am getting the following error when trying to read the Java 
>> > record in Python.
>> >
>> > Traceback (most recent call last):
>> >   File "/home/chad/python/avroReadTest/avro_read_binary_java.py", line 18, 
>> > in <module>
>> >     message = read_datum(java_binary_data, schema)
>> >   File "/home/chad/python/avroReadTest/avro_read_binary_java.py", line 10, 
>> > in read_datum
>> >     return datum_reader.read(decoder)
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 
>> > 626, in read
>> >     return self.read_data(self.writers_schema, self.readers_schema, 
>> > decoder)
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 
>> > 698, in read_data
>> >     return self.read_record(writers_schema, readers_schema, decoder)
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 
>> > 898, in read_record
>> >     field_val = self.read_data(field.type, readers_field.type, decoder)
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 
>> > 655, in read_data
>> >     return decoder.read_utf8()
>> >   File "/home/chad/.local/lib/python3.8/site-packages/avro/io.py", line 
>> > 312, in read_utf8
>> >     return unicode(self.read_bytes(), "utf-8")
>> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 2: 
>> > invalid start byte
>> >
>> > Here is a link to the Java code.
>> > https://gitlab.com/chad.preisler/avrojavabinaryencoderexample/-/blob/main/src/main/java/chad/preisler/avro/eamples/AvroWriteReadBinary.java
>> >
>> > I'll admit I'm fairly new to Python. Here is my Python code.
>> >
>> > import avro.io
>> > import avro.schema
>> > import io
>> >
>> >
>> > def read_datum(buffer, writers_schema, readers_schema=None):
>> >     reader = io.BytesIO(buffer)
>> >     decoder = avro.io.BinaryDecoder(reader)
>> >     datum_reader = avro.io.DatumReader(writers_schema, readers_schema)
>> >     return datum_reader.read(decoder)
>> >
>> >
>> > java_binary_data = open(
>> >     "/home/chad/app_shared_resources/avroBinaryEncoderTest/java_binary_output.avo",
>> >     "rb").read()
>> > schemaBytes = open(
>> >     "/home/chad/app_shared_resources/avroBinaryEncoderTest/avroTestSchema.avsc",
>> >     "rb").read()
>> > print ("Schema read in: " + schemaBytes.decode('UTF-8'))
>> > schema = avro.schema.parse(schemaBytes)
>> > print("Schema " + schema.__str__())
>> > message = read_datum(java_binary_data, schema)
>> > print(message)
>> >
>> > I appreciate any help getting this working.
>> >
>> > Thanks,
>> > Chad
>> >
>> > On Thu, May 27, 2021 at 12:56 PM Michael A. Smith <mich...@smith-li.com> 
>> > wrote:
>> >>
>> >> They should be compatible.
>> >>
>> >> Take a look at lang/py/avro/test/test_io.py in
>> >>
>> >> https://github.com/apache/avro
>> >>
>> >> Line 239 has a simple function that lays it out.
>> >>
>> >> If you encounter a way in which Java and Python are producing 
>> >> incompatible results, please let us know.
>> >>
>> >> On Thu, May 27, 2021 at 13:05 Chad Preisler <chad.preis...@gmail.com> 
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I am writing messages in Java using the BinaryMessageEncoder. I would 
>> >>> like to read the message in python. Is this supported, or is the format 
>> >>> written with BinaryMessageEncoder only supported in Java?
>> >>>
>> >>> If it is supported can you point me to a python example that reads the 
>> >>> binary message format in python?
>> >>>
>> >>> Thanks,
>> >>> Chad
