GitHub user manishgupta88 opened a pull request: https://github.com/apache/carbondata/pull/2060

    [CARBONDATA-2252] Query performance slows down as the number of columns increases in like query with OR expression

    Problem:
    When a like query has an OR condition that uses contains or ends-with patterns, the filter is pushed down to the carbon layer. This makes the query slower than when Spark applies the same filter to the results returned from carbon.

    Analysis:
    For a like query the filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the number of columns increases, the computation time increases, thereby increasing the query time.

    Fix:
    If a like query contains any OR condition, it is better to return all the results to Spark and let Spark do the computation.

     - [ ] Any interfaces changed?
           No
     - [ ] Any backward compatibility impacted?
           No
     - [ ] Document update required?
           No
     - [ ] Testing done
           Added test cases
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
           NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata like_or_disable_pushdown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2060

----
commit c2b1bbc2f01700eab37a65df9ab7bd995973efc6
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-03-13T11:15:48Z

    Problem:
    When a like query has an OR condition that uses contains or ends-with patterns, the filter is pushed down to the carbon layer. This makes the query slower than when Spark applies the same filter to the results returned from carbon.

    Analysis:
    For a like query the filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the number of columns increases, the computation time increases, thereby increasing the query time.

    Fix:
    If a like query contains any OR condition, it is better to return all the results to Spark and let Spark do the computation.

----

---
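
For readers who want to see the rule described under "Fix" in concrete form, here is a minimal, self-contained sketch. It is not the code from this pull request. The expression classes (Expr, Or, And, Contains, EndsWith, EqualTo) and the canPushDown method are hypothetical stand-ins invented for illustration; they are not Spark or CarbonData classes. The sketch assumes that a filter is kept out of the carbon layer whenever an OR node has a contains or ends-with predicate anywhere beneath it.

```java
import java.util.Arrays;
import java.util.List;

public class LikeOrPushdownSketch {

    /** Hypothetical, simplified filter expression node (illustration only). */
    static abstract class Expr {
        final List<Expr> children;
        Expr(Expr... children) { this.children = Arrays.asList(children); }
    }

    static final class Or extends Expr { Or(Expr l, Expr r) { super(l, r); } }
    static final class And extends Expr { And(Expr l, Expr r) { super(l, r); } }

    /** Leaf predicate on a single column; it has no child expressions. */
    static abstract class Leaf extends Expr {
        final String column;
        final String value;
        Leaf(String column, String value) {
            super();
            this.column = column;
            this.value = value;
        }
    }

    /** Models: column LIKE '%value%'. */
    static final class Contains extends Leaf { Contains(String c, String v) { super(c, v); } }
    /** Models: column LIKE '%value'. */
    static final class EndsWith extends Leaf { EndsWith(String c, String v) { super(c, v); } }
    /** Models: column = value. */
    static final class EqualTo extends Leaf { EqualTo(String c, String v) { super(c, v); } }

    /** True if the subtree rooted at e holds a contains or ends-with predicate. */
    static boolean hasContainsOrEndsWith(Expr e) {
        if (e instanceof Contains || e instanceof EndsWith) {
            return true;
        }
        for (Expr child : e.children) {
            if (hasContainsOrEndsWith(child)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Assumed rule: refuse pushdown when any OR node has a contains or
     * ends-with predicate beneath it, so that Spark evaluates the filter.
     */
    static boolean canPushDown(Expr e) {
        if (e instanceof Or && hasContainsOrEndsWith(e)) {
            return false;
        }
        for (Expr child : e.children) {
            if (!canPushDown(child)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Expr likeOr = new Or(new Contains("name", "abc"), new EndsWith("city", "xyz"));
        Expr plainOr = new Or(new EqualTo("id", "1"), new EqualTo("id", "2"));
        Expr nested = new And(new EqualTo("id", "1"),
                              new Or(new Contains("name", "abc"), new EqualTo("id", "2")));

        System.out.println(canPushDown(likeOr));   // false
        System.out.println(canPushDown(plainOr));  // true
        System.out.println(canPushDown(nested));   // false
    }
}
```

In this sketch an OR over plain equality predicates is still pushed down, and only an OR that involves a like pattern is left for Spark. Whether the actual patch draws the line in exactly the same place is not stated in the description above.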