mposdev21 commented on PR #37972:
URL: https://github.com/apache/spark/pull/37972#issuecomment-1264836213
@rangadi
Here are the changes in this PR, which address the following review
comments:
- All references to proto have been changed to protobuf
- to_proto requires a protobuf descriptor for serialization
- In Deserialization, use the binary directly and avoid calling
ByteArrayInputStream
- Handle schema evolution and also check for invalid schemas. Look at the fields
in UnknownFieldSet: if they appear in the original schema, raise an
exception, as this indicates a type mismatch. If they don’t appear in the original
schema, they indicate schema evolution. We have added two new test cases for
schema evolution: oldProducer → newConsumer and newProducer → oldConsumer.
There is also a test case for schema invalidation
- Maps are now converted to MapType. Unit tests were already available.
- Added a TODO comment about sharing code with Avro support
- {DynamicSchema, MessageDefinition}.scala were removed because toProto does
not support schema-less mode. Hence no doc comment is required
- Added an exception as the default case when building the Catalyst schema
from protobuf
- Recursive schemas are now detected. A unit test has been added
- The datetimeRebaseSpec argument of ProtobufDeserializer has been
removed as it was not used
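To make the schema-evolution check and the recursion check above concrete, here is a minimal, library-free Python sketch. It is not the code in this PR: the function names `classify_unknown_fields` and `find_recursion` are hypothetical, field numbers are modeled as plain integers, and message types as a dict from type name to the names of its message-typed fields.

```python
def classify_unknown_fields(unknown_numbers, schema_numbers):
    """Return the unknown field numbers that indicate schema evolution.

    Raises ValueError if an unknown field number is also declared in the
    schema, which this sketch treats as a type mismatch.
    """
    unknown = set(unknown_numbers)
    declared = set(schema_numbers)
    mismatched = sorted(unknown & declared)
    if mismatched:
        raise ValueError(f"type mismatch for field numbers {mismatched}")
    # Numbers the schema does not declare come from a newer or older writer.
    return sorted(unknown - declared)


def find_recursion(root, message_fields):
    """Return a cycle of message type names reachable from root, or None.

    Only the current path is tracked, so a type that is merely used twice
    in different fields is not reported as recursive.
    """
    def visit(name, path):
        if name in path:
            return path[path.index(name):] + [name]
        for child in message_fields.get(name, []):
            cycle = visit(child, path + [name])
            if cycle is not None:
                return cycle
        return None

    return visit(root, [])
```

For example, `classify_unknown_fields([4, 5], [1, 2, 3])` treats fields 4 and 5 as evolution, while an unknown field 2 against the same schema raises.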
Here are the pending items:
- Python support
- Schema registry support
- Proto2 support
Let's discuss how we want to approach the pending items.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]