GitHub user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/10362#issuecomment-165611857
@yhuai Based on my understanding, our current data source filtering strategy
is very conservative: we do the filtering twice. We let the data sources apply
the filters first, and then Spark applies them again.
For example, given a filter `A or (B AND C)`, if the data source is unable to
process `C`, we still push the filter down; what actually gets pushed is the
weaker filter `A or B`. Spark then re-applies the original filter to ensure
the result is correct. I think the current strategy will still improve
performance in most cases if the data source supports indexing.
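Here is a minimal sketch of that weakening step, using a toy predicate ADT
(the names `Pred`, `Leaf`, and `relax` are made up for illustration and are
not Spark's actual `sources.Filter` API). The invariant is that the pushed-down
filter only ever accepts a superset of the rows the original accepts, which is
safe because Spark re-filters afterwards:
```scala
sealed trait Pred
case class Leaf(name: String) extends Pred          // an atomic predicate, e.g. a = 1
case class And(l: Pred, r: Pred) extends Pred
case class Or(l: Pred, r: Pred) extends Pred
case object True extends Pred                       // "no filtering": always safe to push

// Weaken p so that every leaf the source cannot handle is replaced by True.
def relax(p: Pred, supported: Leaf => Boolean): Pred = p match {
  case l: Leaf => if (supported(l)) l else True
  case And(l, r) => (relax(l, supported), relax(r, supported)) match {
    case (True, rr) => rr                           // dropping a conjunct only weakens the filter
    case (ll, True) => ll
    case (ll, rr)   => And(ll, rr)
  }
  case Or(l, r) => (relax(l, supported), relax(r, supported)) match {
    case (True, _) | (_, True) => True              // an unfilterable branch makes the Or filter nothing
    case (ll, rr)              => Or(ll, rr)
  }
  case True => True
}

// The example above: A or (B AND C), where the source cannot handle C.
val pushed = relax(Or(Leaf("A"), And(Leaf("B"), Leaf("C"))), _.name != "C")
// pushed == Or(Leaf("A"), Leaf("B")), i.e. effectively A or B.
```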
In this JIRA, the root cause is that we failed to process `Not` correctly. In
the original code, the logic was effectively:
```
not(A and B) => not(A) and not(B)
not(A or B) => not(A) or not(B)
```
The above logic is wrong: by De Morgan's laws, `not(A and B)` should become
`not(A) or not(B)`, and `not(A or B)` should become `not(A) and not(B)`. In
particular, the first rewrite pushes down a filter that is stricter than the
original, so rows are dropped at the source and we are unable to get the
correct result.
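To make the failure concrete, here is a small self-contained check (plain
Scala; the truth values are hypothetical) showing that the buggy `And` rewrite
is stricter than the original and drops a row that should be kept:
```scala
val (a, b) = (true, false)
val original = !(a && b)   // not(A and B) = true: the row satisfies the filter
val buggy    = !a && !b    // buggy rewrite = false: the row is dropped at the source
val correct  = !a || !b    // De Morgan's rewrite = true: agrees with the original
assert(original == correct && original != buggy)
```
Because the row is already dropped at the data source, Spark's second
filtering pass cannot bring it back, which is why the result is incorrect.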
@liancheng Please correct me if my understanding is wrong.