dhruve commented on issue #23735: [SPARK-26801][SQL] Read avro types other than record
URL: https://github.com/apache/spark/pull/23735#issuecomment-463383894

Ideally I would expect input formats to be backward compatible, unless there is a good reason not to be. I understand your views on creating robust unit tests. So let's say something changed in the format. In that case our tests would continue to pass, but every Spark job reading files generated with the old format would fail, across organizations. IMHO it is better to address issues like this at the framework/library level. This does blend a bit of integration testing into the unit tests, but it can help catch such an issue earlier, which is what I personally prefer.

But we digress from the main PR. I don't know why we didn't add support for reading non-record types in avro/json. We have a use case where a few upstream sources generate Avro files with primitive or other non-record data. Other frameworks, e.g. Pig, can handle them, so users considering a switch to Spark are confused that Spark can read only record types.
