Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/5369#issuecomment-91129184
Okay. However, I think it is still important to simplify the query plan by removing redundant parts, even when those parts are cheap operations.
E.g., for this PR, the redundant filters cause problems. First, because the same predicates appear both in the pushed-down part and in the `Filter` node, you cannot tell which one actually performs the filtering. We have run into this before and failed to notice that the pushdown predicates were not working, because the duplicated `Filter` masked the problem.
Second, if the data source evaluates a specific predicate with different semantics than `Filter` does, the duplicated predicates can produce incorrect results.
The time spent on filtering depends on the number of predicates, their complexity, and the data size, so redundant filtering still carries some performance cost.
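To illustrate the idea of avoiding that duplication, here is a minimal sketch (hypothetical predicate types, not Spark's actual Catalyst API): predicates are partitioned into those the data source can evaluate and a residual set, so the `Filter` node only re-checks what the source cannot handle, with no overlap between the two.

```scala
object PushdownSketch {
  sealed trait Predicate
  case class GreaterThan(attr: String, value: Int) extends Predicate
  case class StringContains(attr: String, sub: String) extends Predicate

  // Assumption for this sketch: the source only understands simple comparisons.
  def sourceCanHandle(p: Predicate): Boolean = p match {
    case _: GreaterThan => true
    case _              => false
  }

  // Returns (predicates pushed down to the source, residual predicates
  // kept in the Filter node). Each predicate lands in exactly one side,
  // so nothing is evaluated twice.
  def split(predicates: Seq[Predicate]): (Seq[Predicate], Seq[Predicate]) =
    predicates.partition(sourceCanHandle)

  def main(args: Array[String]): Unit = {
    val preds = Seq(GreaterThan("age", 21), StringContains("name", "a"))
    val (pushed, residual) = split(preds)
    println(pushed)   // handled by the data source
    println(residual) // handled by the Filter node; disjoint from `pushed`
  }
}
```

Because the split is a partition rather than a copy, it is unambiguous which side performed each filter, which addresses both of the concerns above.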
As @liancheng suggested, I think it would be good to revisit this after the new API comes out.
Thank you.