viirya opened a new pull request #23740: [SPARK-26837][SQL] Pruning nested fields from object serializers URL: https://github.com/apache/spark/pull/23740 ## What changes were proposed in this pull request? In SPARK-26619, we make change to prune unnecessary individual serializers when serializing objects. This is extension to SPARK-26619. We can further prune nested fields from object serializers if they are not used. For example, in following query, we only use one field in a struct column: ```scala val data = Seq((("a", 1), 1), (("b", 2), 2), (("c", 3), 3)) val df = data.toDS().map(t => (t._1, t._2 + 1)).select("_1._1") ``` So, instead of having a serializer to create a two fields struct, we can prune unnecessary field from it. This is what this PR proposes to do. In order to make this change conservative and safer, a SQL config is added to control it. It is disabled by default. TODO: Support to prune nested fields inside MapType's key and value. ## How was this patch tested? Added tests.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
