Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/11270#issuecomment-186404419
In order to avoid breaking changes (e.g. we can always read Parquet with
load today), maybe we want to special-case Parquet rather than rely only on
file names.
I looked at the binary protocol (see
https://github.com/Parquet/parquet-format), and it looks like a Parquet file
always starts with the magic number "PAR1". That is to say, if the first
four bytes are 0x50, 0x41, 0x52, 0x31, then it is a Parquet file.
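
A minimal sketch of that check, written in Python for illustration only (it is
not code from this pull request, and the names `PARQUET_MAGIC`,
`has_parquet_magic` and `looks_like_parquet` are made up here). It assumes the
file is readable from the local filesystem; a matching header only suggests
the file is Parquet and does not validate the rest of it.

```python
PARQUET_MAGIC = bytes([0x50, 0x41, 0x52, 0x31])  # ASCII "PAR1"


def has_parquet_magic(header: bytes) -> bool:
    """Return True if the given bytes begin with the 4-byte Parquet magic."""
    return header[:4] == PARQUET_MAGIC


def looks_like_parquet(path: str) -> bool:
    """Read only the first four bytes of the file and compare them to the magic."""
    with open(path, "rb") as f:
        return has_parquet_magic(f.read(4))
```

Something like this could be tried first, falling back to the file name only
when the header does not match.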