[jira] [Updated] (CARBONDATA-527) Greater than/less-than/Like filters optimization for dictionary encoded columns
[ https://issues.apache.org/jira/browse/CARBONDATA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala updated CARBONDATA-527:
---
    Assignee: Sujith

> Greater than/less-than/Like filters optimization for dictionary encoded columns
> ---
>
>                 Key: CARBONDATA-527
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-527
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Sujith
>            Assignee: Sujith
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Current design
> In greater than/less-than/Like filters, the system first iterates over each row
> in the dictionary cache and applies the filter expression to identify the valid
> filter members. Once evaluation is done, the system holds the list of identified
> valid filter member values (String). In the next step, the system looks up the
> dictionary cache again to find the dictionary surrogate values of those members.
> This lookup is an additional cost, even though it is a binary search in the
> dictionary cache.
>
> Proposed design/solution:
> Identify the dictionary surrogate values in the filter expression evaluation step
> itself, while the actual dictionary values are being scanned to identify the
> valid filter members.
> Keep a dictionary counter variable that is incremented as the system iterates
> through the dictionary cache to retrieve each actual member. As the system
> evaluates each row against the filter expression to decide whether it is a valid
> filter member, the current counter value can be taken directly as that member's
> dictionary surrogate value, because the actual member values and their dictionary
> values are kept in the dictionary cache in the same order as the iteration order.
> This eliminates the extra dictionary lookup step otherwise required to retrieve
> the surrogate value of each identified valid filter member.
> This can significantly improve the performance of filter queries that require
> expression evaluation over the dictionary cache to identify the filter members,
> such as greater than/less-than/Like filters.
> Note: this optimization is applicable to dictionary columns.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
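The two designs described in this issue can be contrasted with a minimal Python sketch. It is not CarbonData code: the dictionary cache is modelled as a plain list of member strings that is assumed to be sorted, the member at position i is assumed to have surrogate key i + 1, a Python predicate stands in for the filter expression, and the function names `surrogates_two_pass` and `surrogates_single_pass` are hypothetical.

```python
from bisect import bisect_left
from typing import Callable, List

# Hypothetical model of a dictionary cache: members are kept in a list that is
# assumed to be sorted, and the member at index i is assumed to have surrogate
# key i + 1. Neither assumption is taken from CarbonData's implementation.
Predicate = Callable[[str], bool]


def surrogates_two_pass(members: List[str], matches: Predicate) -> List[int]:
    """Current design: evaluate the filter, then look each match up again."""
    # Pass 1: evaluate the filter expression and keep the matching values.
    matched_values = [value for value in members if matches(value)]
    # Pass 2: binary-search the cache again to recover each surrogate key.
    return [bisect_left(members, value) + 1 for value in matched_values]


def surrogates_single_pass(members: List[str], matches: Predicate) -> List[int]:
    """Proposed design: the iteration counter is the surrogate key."""
    selected: List[int] = []
    # enumerate() supplies the counter that advances with each cached member.
    for counter, value in enumerate(members, start=1):
        if matches(value):
            selected.append(counter)
    return selected


if __name__ == "__main__":
    cache = ["apple", "banana", "cherry", "date", "fig"]
    # Stand-in for a greater-than filter: value > 'banana'.
    print(surrogates_single_pass(cache, lambda v: v > "banana"))  # [3, 4, 5]
    # Stand-in for a LIKE '%a%' filter.
    print(surrogates_single_pass(cache, lambda v: "a" in v))  # [1, 2, 4]
```

Both functions return the same surrogates; the single-pass version just never performs the per-member binary search, which is the saving the proposal describes. It depends on the same property the proposal relies on: the iteration order over the cache matches the surrogate order.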
[jira] [Updated] (CARBONDATA-527) Greater than/less-than/Like filters optimization for dictionary encoded columns
[ https://issues.apache.org/jira/browse/CARBONDATA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-527:
---
    Issue Type: New Feature  (was: Improvement)
       Summary: Greater than/less-than/Like filters optimization for dictionary encoded columns  (was: Greater than/less-than/Like filters optimization for dictionary columns)

> Greater than/less-than/Like filters optimization for dictionary encoded columns
> ---
>
>                 Key: CARBONDATA-527
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-527
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Sujith
>          Time Spent: 40m
>  Remaining Estimate: 0h