[jira] [Updated] (CARBONDATA-527) Greater than/less-than/Like filters optimization for dictionary encoded columns

2017-01-16 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated CARBONDATA-527:
---
Assignee: Sujith

> Greater than/less-than/Like filters optimization for dictionary encoded 
> columns
> ---
>
> Key: CARBONDATA-527
> URL: https://issues.apache.org/jira/browse/CARBONDATA-527
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Sujith
>Assignee: Sujith
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Current design 
> In greater than/less-than/Like filters, system first iterates each row 
> present in the dictionary cache for identifying valid filter actual members  
> by applying the filter expression , once evaluation done system will hold the 
> list of identified valid filter actual member values(String), now in next 
> step again  system will look up the dictionary cache in order to identify the 
> dictionary surrogate values of the identified members. this look up is an 
> additional cost to our system even though the look up methodology is an 
> binary search in dictionary cache.
>  
> Proposed design/solution:
> Identify the dictionary surrogate values in filter expression evaluation step 
> itself  when actual dictionary values will be scanned for identifying valid 
> filter members .
> Keep a dictionary counter variable which will be increased  when system 
> iterates through  the dictionary cache in order to retrieve each actual 
> member stored in dictionary cache , after this system will evaluate each row 
> against the filter expression to identify whether its a valid filter member 
> or not, while doing this process itself counter value can be taken as valid 
> selected dictionary value since the actual member values and its  dictionary 
> values will be kept in same order in dictionary cache as the iteration order.
> thus it will eliminate the further dictionary look up step which is required  
> to retrieve the dictionary surrogate value against identified actual valid 
> filter member. this can also increase significantly the filter query 
> performance of such filter queries which require expression evaluation to 
> identify it the filter members by looking up dictionary cache, like greater 
> than/less-than/Like filters .
> Note : this optimization is applicable for dictionary columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-527) Greater than/less-than/Like filters optimization for dictionary encoded columns

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-527:
-
Issue Type: New Feature  (was: Improvement)
   Summary: Greater than/less-than/Like filters optimization for dictionary 
encoded columns  (was: Greater than/less-than/Like filters optimization for 
dictionary columns)

> Greater than/less-than/Like filters optimization for dictionary encoded 
> columns
> ---
>
> Key: CARBONDATA-527
> URL: https://issues.apache.org/jira/browse/CARBONDATA-527
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Sujith
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current design 
> In greater than/less-than/Like filters, system first iterates each row 
> present in the dictionary cache for identifying valid filter actual members  
> by applying the filter expression , once evaluation done system will hold the 
> list of identified valid filter actual member values(String), now in next 
> step again  system will look up the dictionary cache in order to identify the 
> dictionary surrogate values of the identified members. this look up is an 
> additional cost to our system even though the look up methodology is an 
> binary search in dictionary cache.
>  
> Proposed design/solution:
> Identify the dictionary surrogate values in filter expression evaluation step 
> itself  when actual dictionary values will be scanned for identifying valid 
> filter members .
> Keep a dictionary counter variable which will be increased  when system 
> iterates through  the dictionary cache in order to retrieve each actual 
> member stored in dictionary cache , after this system will evaluate each row 
> against the filter expression to identify whether its a valid filter member 
> or not, while doing this process itself counter value can be taken as valid 
> selected dictionary value since the actual member values and its  dictionary 
> values will be kept in same order in dictionary cache as the iteration order.
> thus it will eliminate the further dictionary look up step which is required  
> to retrieve the dictionary surrogate value against identified actual valid 
> filter member. this can also increase significantly the filter query 
> performance of such filter queries which require expression evaluation to 
> identify it the filter members by looking up dictionary cache, like greater 
> than/less-than/Like filters .
> Note : this optimization is applicable for dictionary columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)