Github user cenyuhai commented on the pull request:
https://github.com/apache/spark/pull/4398#issuecomment-73068860
Hi, Owen. I am sorry about that; this is the first time I have created a
pull request.
When reading a single field of a nested column from Parquet, Spark SQL
currently reads and assembles all the fields of that nested column. This pull
request prunes those unnecessary columns (see the example below).
1. I add an 'id' property to DataType.
2. I add a cutUnnecessaryColumns function to ParquetTableScan; it prunes the
unnecessary columns from 'output' and returns new AttributeReferences.
3. Finally, in ParquetTypesConverter.convertToString(), I use a regular
expression to remove the 'id' fields from the JSON string (a rough sketch of
this step follows the list).
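To make step 3 concrete, here is a rough, hypothetical sketch of that
regex-based stripping; the helper name and the exact pattern are my own
illustration, not the code in the pull request:

    // Removes synthetic "id" attributes from a DataType JSON string before it
    // is handed to Parquet. Assumes "id" holds an integer and is not the last
    // field of an object (otherwise a trailing comma would be left behind).
    def stripIdFields(schemaJson: String): String =
      schemaJson.replaceAll("""\"id\"\s*:\s*\d+\s*,?""", "")

    // {"name":"a","id":0,"type":"integer"} becomes {"name":"a","type":"integer"}
    println(stripIdFields("""{"name":"a","id":0,"type":"integer"}"""))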