[GitHub] viirya opened a new pull request #23740: [SPARK-26837][SQL] Pruning nested fields from object serializers

GitBox Wed, 06 Feb 2019 07:54:04 -0800

viirya opened a new pull request #23740: [SPARK-26837][SQL] Pruning nested 
fields from object serializers
URL: https://github.com/apache/spark/pull/23740
 
 
   ## What changes were proposed in this pull request?
   
   In SPARK-26619, we made a change to prune unnecessary individual serializers 
when serializing objects. This PR extends SPARK-26619: we can further 
prune nested fields from object serializers if they are not used.
   
   For example, in the following query, we only use one field in a struct column:
   
   ```scala
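   // Assumes a SparkSession named `spark` is in scope (e.g. in spark-shell).
   import spark.implicits._
   val data = Seq((("a", 1), 1), (("b", 2), 2), (("c", 3), 3))
   val df = data.toDS().map(t => (t._1, t._2 + 1)).select("_1._1")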
   ```
   
   So, instead of having a serializer that creates a two-field struct, we can 
prune the unused field from it. This is what this PR proposes to do.
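   
   One way to see what the serializer produces for this query is to print the schema of the mapped Dataset and the query plans. The sketch below is only an illustration under assumptions: the object name `NestedPruningSketch` and the helper `build` are hypothetical, and the SQL config that enables the pruning is not named in this description, so the sketch does not set it. With the pruning disabled, the `SerializeFromObject` node in the plan is expected to still build the full struct.
   
   ```scala
   import org.apache.spark.sql.{DataFrame, SparkSession}

   // Hypothetical helper object, not part of this PR.
   object NestedPruningSketch {
     // Builds the example query from the description and prints its schema and plans.
     def build(spark: SparkSession): DataFrame = {
       import spark.implicits._ // provides toDS() and the tuple encoders

       val data = Seq((("a", 1), 1), (("b", 2), 2), (("c", 3), 3))
       val mapped = data.toDS().map(t => (t._1, t._2 + 1))

       // The mapped Dataset has a struct column `_1` with two nested fields.
       mapped.printSchema()

       // Only the nested field `_1._1` is used downstream.
       val df = mapped.select("_1._1")

       // Prints the parsed, analyzed, optimized and physical plans;
       // look at the SerializeFromObject node to see which fields are serialized.
       df.explain(true)
       df
     }

     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("nested-pruning-sketch")
         .getOrCreate()
       try build(spark).show()
       finally spark.stop()
     }
   }
   ```
   
   Comparing the `explain(true)` output with the feature turned off and on should show whether the unused nested field is dropped from the serializer.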
   
   To keep this change conservative and safe, a SQL config is added 
to control it. It is disabled by default.
   
   TODO: Support pruning nested fields inside MapType's key and value.
   
   ## How was this patch tested?
   
   Added tests.


