GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2060

    [CARBONDATA-2252] Query performance slows down as the number of columns increases in like query with OR expression

    Problem: When a LIKE filter with a contains or ends-with pattern appears inside an OR condition, the filter is pushed down to the carbon layer. This makes the query slower than letting Spark apply the same filter to the results returned from carbon.
    
    Analysis: This happens because a LIKE filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the number of columns increases, the computation time increases, and the query time grows with it.
    
    Fix: If an OR condition contains a LIKE filter, it is better to return all the results to Spark and let Spark do the computation.
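    The fix amounts to a check on the filter tree before push-down. The sketch below is a minimal, hypothetical illustration, not the code from this PR: the `Filter` records only loosely mirror the names in Spark's `org.apache.spark.sql.sources` package, and `containsLike`, `hasLikeUnderOr`, `shouldPushDown` and `pushable` are invented helper names. Following the problem statement, only contains (`StringContains`) and ends-with (`StringEndsWith`) patterns are treated as LIKE filters that block push-down; it also assumes, as in Spark's data source API, that the top-level filters handed to a scan are ANDed together.

```java
import java.util.List;

// Hypothetical sketch, NOT the CarbonData implementation: a tiny filter tree whose
// node names loosely mirror Spark's org.apache.spark.sql.sources filters.
public class LikeOrPushdownSketch {

    interface Filter {}
    record EqualTo(String column, String value) implements Filter {}
    record StringStartsWith(String column, String prefix) implements Filter {}   // LIKE 'x%'
    record StringEndsWith(String column, String suffix) implements Filter {}     // LIKE '%x'
    record StringContains(String column, String substring) implements Filter {} // LIKE '%x%'
    record And(Filter left, Filter right) implements Filter {}
    record Or(Filter left, Filter right) implements Filter {}
    record Not(Filter child) implements Filter {}

    // True if the subtree contains a contains or ends-with LIKE predicate.
    static boolean containsLike(Filter f) {
        if (f instanceof StringContains || f instanceof StringEndsWith) return true;
        if (f instanceof And a) return containsLike(a.left()) || containsLike(a.right());
        if (f instanceof Or o) return containsLike(o.left()) || containsLike(o.right());
        if (f instanceof Not n) return containsLike(n.child());
        return false;
    }

    // True if some OR node in the tree has such a LIKE predicate beneath it.
    static boolean hasLikeUnderOr(Filter f) {
        if (f instanceof Or o) return containsLike(o);
        if (f instanceof And a) return hasLikeUnderOr(a.left()) || hasLikeUnderOr(a.right());
        if (f instanceof Not n) return hasLikeUnderOr(n.child());
        return false;
    }

    // Push a filter to the storage layer only if no OR inside it contains such a LIKE.
    static boolean shouldPushDown(Filter f) {
        return !hasLikeUnderOr(f);
    }

    // Given the ANDed top-level filters, keep the ones that are safe to push down;
    // the rest are left for Spark to evaluate on the rows returned from the scan.
    static List<Filter> pushable(List<Filter> conjuncts) {
        return conjuncts.stream().filter(LikeOrPushdownSketch::shouldPushDown).toList();
    }

    public static void main(String[] args) {
        Filter orLike = new Or(new EqualTo("a", "1"), new StringContains("b", "x"));
        Filter andPrefix = new And(new EqualTo("a", "1"), new StringStartsWith("b", "x"));
        System.out.println(shouldPushDown(orLike));    // false
        System.out.println(shouldPushDown(andPrefix)); // true
        System.out.println(pushable(List.of(new EqualTo("c", "2"), orLike)).size()); // 1
    }
}
```

    In this sketch, a conjunct such as `a = '1' OR b LIKE '%x%'` is withheld from the storage layer, while `c = '2'` in the same query is still pushed down. The PR description does not say whether the actual patch decides per conjunct or per query; the per-conjunct split here is only one possible design.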
    
     - [ ] Any interfaces changed?
     No
     - [ ] Any backward compatibility impacted?
     No
     - [ ] Document update required?
     No
     - [ ] Testing done
     Added test cases
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata like_or_disable_pushdown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2060
    
----
commit c2b1bbc2f01700eab37a65df9ab7bd995973efc6
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-03-13T11:15:48Z

    Problem: When a LIKE filter with a contains or ends-with pattern appears inside an OR condition, the filter is pushed down to the carbon layer. This makes the query slower than letting Spark apply the same filter to the results returned from carbon.
    
    Analysis: This happens because a LIKE filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the number of columns increases, the computation time increases, and the query time grows with it.
    
    Fix: If an OR condition contains a LIKE filter, it is better to return all the results to Spark and let Spark do the computation.

----

