Github user cenyuhai commented on the pull request:
https://github.com/apache/spark/pull/4398#issuecomment-73068860
Hi, Owen. I am sorry about that; this is the first time I have created a
pull request.
When reading a single field of a nested column from Parquet, Spark SQL
currently reads and assembles all the fields of that nested column. This pull
request prunes those unnecessary columns (see the example below).
1. I add an 'id' property to DataType.
2. I add a cutUnnecessaryColumns function to ParquetTableScan; it prunes the
unnecessary columns from 'output' and returns new AttributeReferences.
3. Finally, in ParquetTypesConverter.convertToString(), I use a regular
expression to remove the 'id' fields from the JSON string (a rough sketch of
this step follows the list).
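To make step 3 concrete, here is a rough, hypothetical sketch of that
regex-based stripping; the helper name and the exact pattern are my own
illustration, not the code in the pull request:

    // Removes synthetic "id" attributes from a DataType JSON string before it
    // is handed to Parquet. Assumes "id" holds an integer and is not the last
    // field of an object (otherwise a trailing comma would be left behind).
    def stripIdFields(schemaJson: String): String =
      schemaJson.replaceAll("""\"id\"\s*:\s*\d+\s*,?""", "")

    // {"name":"a","id":0,"type":"integer"} becomes {"name":"a","type":"integer"}
    println(stripIdFields("""{"name":"a","id":0,"type":"integer"}"""))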