Re: [New Feature] Range Filter Optimization

2017-03-21 Thread QiangCai
+1

Best Regards
David QiangCai



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/New-Feature-Range-Filter-Optimization-tp9343p9383.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


Re: [New Feature] Range Filter Optimization

2017-03-21 Thread Kumar Vishal
+1.

This will help in range query as only one time filter will be applied and
range of binary search will decrease as we can do incremental binary
search. Currently in carbon range filter for dictionary column is slow as
number of filter applied can be more and it degrade the filter query
performance, above optimisation will improve the dictionary column also.

-Regards
Kumar Vishal



On Tue, Mar 21, 2017 at 9:14 AM, sounak  wrote:

> Hi,
>
> Currently I am working in a RANGE Filter Optimization. Previously if in
> case there are both Greater Than and Less than predicates on a same column,
> then they are evaluated spearately where multiple times scanning of the
> blocks are involved which is undoubtedly costly.
>
> To optimize this, if "Greater and Less than" predicates are there the a
> same column and joined by AND expression, then these two expression can be
> merged into a single expression and can form Range Expression. Rather than
> evaluating the expressions separately only one time the block will be
> scanned and if the values lies within Range Min and Max then will be
> selected.
>
> Current the filter expression tree is transformed as shown below. The below
> tranformation helps us to carry the Range MIN MAX value so that prunning
> optimization is applied.
>
> //AND  AND
> // | |
> /// \   / \
> //   /   \ /   \
> //Less   Greater =>  TRUE   Range
> // / \/ \   / \
> ///   \  /   \ /   \
> //   a10 a   5 Less   greater
> // /\
>   /\
> //   /  \
> /  \
> // a   10  a
>   5
>
>
> Presence of a Range in a single expression gives few more optimization
> during block prunning. Comparing Filter Min and Max and Block Min and Max
> values few decisions can be taken as shown below.
>
> Case A)  In case the Filter Range lies completely outside of Block range =>
> Then no scanning is required and the block can be completely skipped.
>
>---
>
> |   BLOCK |
>
> ---
>  -
> Min  MAX
> 
> |  FILTER   |
>   or  |
>FILTER |
>--
>
>  
>  Min   Max
>   Min
>  Max
>
> Case B ) The Filter Completely Covers the Block => No scanning is required.
> Select all Rows of the Block. i.e. turn all bits of block to true.
>
> --
>
>|   BLOCK |
>
>--
>
>  --
>|
>  FILTER |
>
>  --
>
> Case C) The Filter Overlaps the Block, But completely donot cover it. =>
> Choose the Block Scan Start or End based on filter overlaping, no need to
> do a binary search to find the Start and End Index of Block.
>
>
> -
>  --
>   |   BLOCK
>   ||
>  BLOCK|
>
> -
>  -
>
>  OR
>   -
>|  FILTER|
>
> | FILTER|
>---
>
>-
>   Start of Block will be default
> StartIndex Value