Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
Thanks everyone for your contributions, support and patience. It's been a
journey and a half, and I'm excited for the future. I will open a follow-on PR
to address the current known failure scenario (see ignored test) in this patch,
and we can discuss if/how we can get it into 2.4 as well.
I know there are many early adopters of this patch and #16578. Bug reports
will continue to be very helpful.
Beyond this patch, there are many possibilities for widening the scope of
schema pruning. As part of our review process, we've pared the scope of this
capability to just projection. IMHO, the first limitation we should address
post 2.4 is supporting pruning with query filters of nested fields ("where"
clauses). Joins, aggregations and window queries would be powerful enhancements
as well, bringing the scope of schema pruning to analytic queries.
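For illustration only, here is a minimal sketch of the two query shapes discussed above: a pure projection of a nested field, and a query that also filters on a nested field. It is not code from the patch. The `Contact`/`Name` schema, the temp path and the object name are hypothetical, and it assumes a Spark 2.4+ build on the classpath with the `spark.sql.optimizer.nestedSchemaPruning.enabled` setting turned on.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SchemaPruningSketch {
  // Hypothetical nested schema, used only for this illustration.
  case class Name(first: String, last: String)
  case class Contact(id: Int, name: Name, address: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-pruning-sketch")
      // Assumption: nested schema pruning is opt-in on the build being tested.
      .config("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
      .getOrCreate()
    import spark.implicits._

    // Write a tiny Parquet dataset to a throwaway directory.
    val path = Files.createTempDirectory("contacts").toString
    Seq(
      Contact(1, Name("Jane", "Doe"), "123 Main St"),
      Contact(2, Name("John", "Roe"), "456 Oak Ave")
    ).toDF().write.mode("overwrite").parquet(path)

    val contacts = spark.read.parquet(path)

    // Projection only: the case this patch targets.
    contacts.select("name.first").explain()

    // Projection plus a filter on a nested field: the case proposed above
    // as the next limitation to address.
    contacts.where("name.first = 'Jane'").select("name.last").explain()

    spark.stop()
  }
}
```

Comparing the `ReadSchema` entry of the two physical plans is one way to see whether the Parquet scan was narrowed to the referenced nested fields; the exact output will depend on the Spark version in use.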
I believe all of the additional features VideoAmp has implemented for
schema pruning are independent of the underlying column store. Future
enhancements should be automagically inherited by any column store that
implements functionality analogous to `ParquetSchemaPruning.scala`. This should
widen not just the audience that can be reached, but the developer community
that can contribute and review.
Thanks again.