Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/5369#issuecomment-91129184
Okay. However, I think it is still important to simplify the query plan by removing redundant parts, even when those parts are cheap operations.
E.g., for this PR, the redundant filters cause problems. First, because the same predicates appear both in the pushed-down part and in the `Filter` node, you cannot tell which one actually performs the filtering. We have run into this before and failed to notice that the pushdown predicates were not working, because the duplicated `Filter` masked the problem.
Second, if the data source evaluates a specific predicate with different semantics than `Filter` does, the duplicated predicates can produce incorrect results.
The time spent on filtering depends on the number of predicates, their complexity, and the data size, so redundant filtering still carries some performance cost.
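To illustrate the idea of avoiding that duplication, here is a minimal sketch (hypothetical predicate types, not Spark's actual Catalyst API): predicates are partitioned into those the data source can evaluate and a residual set, so the `Filter` node only re-checks what the source cannot handle, with no overlap between the two.

```scala
object PushdownSketch {
  sealed trait Predicate
  case class GreaterThan(attr: String, value: Int) extends Predicate
  case class StringContains(attr: String, sub: String) extends Predicate

  // Assumption for this sketch: the source only understands simple comparisons.
  def sourceCanHandle(p: Predicate): Boolean = p match {
    case _: GreaterThan => true
    case _              => false
  }

  // Returns (predicates pushed down to the source, residual predicates
  // kept in the Filter node). Each predicate lands in exactly one side,
  // so nothing is evaluated twice.
  def split(predicates: Seq[Predicate]): (Seq[Predicate], Seq[Predicate]) =
    predicates.partition(sourceCanHandle)

  def main(args: Array[String]): Unit = {
    val preds = Seq(GreaterThan("age", 21), StringContains("name", "a"))
    val (pushed, residual) = split(preds)
    println(pushed)   // handled by the data source
    println(residual) // handled by the Filter node; disjoint from `pushed`
  }
}
```

Because the split is a partition rather than a copy, it is unambiguous which side performed each filter, which addresses both of the concerns above.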
As @liancheng suggested, I think it would be good to revisit this after the new API comes out.
Thank you.