[jira] [Created] (CARBONDATA-908) bitmap encode
Jarck created CARBONDATA-908:

Summary: bitmap encode
Key: CARBONDATA-908
URL: https://issues.apache.org/jira/browse/CARBONDATA-908
Project: CarbonData
Issue Type: New Feature
Components: core, data-load, data-query
Reporter: Jarck
Assignee: Jarck

For frequent filter queries on low-cardinality columns, using bitmap encoding can speed up the query.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
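To illustrate why bitmap encoding helps filters on low-cardinality columns, here is a minimal sketch, not CarbonData's implementation. It keeps one `java.util.BitSet` per distinct value, so an IN/equality filter becomes a few bitwise ORs instead of a row-by-row comparison. The class name `BitmapColumnSketch` and the method `filterIn` are hypothetical and exist only for this example.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not a CarbonData class.
final class BitmapColumnSketch {
    // One bitmap per distinct column value; bit i is set when row i holds that value.
    private final Map<String, BitSet> bitmaps = new HashMap<>();
    private final int rowCount;

    BitmapColumnSketch(String[] columnValues) {
        final int n = columnValues.length;
        this.rowCount = n;
        for (int row = 0; row < n; row++) {
            bitmaps.computeIfAbsent(columnValues[row], v -> new BitSet(n)).set(row);
        }
    }

    // Returns the rows whose value equals any of the given filter values.
    BitSet filterIn(String... values) {
        BitSet result = new BitSet(rowCount);
        for (String value : values) {
            BitSet bitmap = bitmaps.get(value);
            if (bitmap != null) {
                result.or(bitmap); // union of the matching rows
            }
        }
        return result;
    }
}
```

The memory cost grows with the number of distinct values, which is why the approach is usually reserved for low-cardinality columns.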
[jira] [Created] (CARBONDATA-754) order by query's performance is very bad
Jarck created CARBONDATA-754:

Summary: order by query's performance is very bad
Key: CARBONDATA-754
URL: https://issues.apache.org/jira/browse/CARBONDATA-754
Project: CarbonData
Issue Type: Improvement
Components: core, spark-integration
Reporter: Jarck
Assignee: Jarck

Currently, the performance of an order-by-dimension query is very bad if there is no filter or the filtered data is still too large. If I am not wrong, it reads all related data at the carbon scan physical level, decodes the sort dimension's data, and sorts all of it in the Spark SQL sort physical plan.

I think we can optimize as below:
1. Push down sort (+limit) to carbon scan.
2. Leverage the fact that a dimension is stored in its natural order at blocklet level to get sorted data in each partition.
3. Implement merge-sort/TopN in Spark's sort physical plan.

Actually, I have optimized for "order by only 1 dimension + limit" based on branch 0.2. The performance is much better: sorting by 1 dimension + limit 1 on 100 million records takes less than 1 second to get and print the result.
1. push down
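Step 3 of the proposal (merge-sort/TopN over partitions that are already sorted) can be sketched as a k-way merge that stops once the limit is reached. This is an illustration under simplifying assumptions: each partition is modelled as a sorted `int[]`, and the names `SortedTopNSketch`, `Cursor`, and `topN` are hypothetical, not part of CarbonData or Spark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; partitions are modelled as ascending int arrays.
final class SortedTopNSketch {
    // Read position inside one already-sorted partition.
    private static final class Cursor {
        final int[] data;
        int pos;

        Cursor(int[] data) {
            this.data = data;
        }

        int current() {
            return data[pos];
        }
    }

    // Merges the sorted partitions and returns at most `limit` smallest values in order.
    static List<Integer> topN(List<int[]> sortedPartitions, int limit) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] partition : sortedPartitions) {
            if (partition.length > 0) {
                heap.add(new Cursor(partition));
            }
        }
        List<Integer> result = new ArrayList<>();
        while (result.size() < limit && !heap.isEmpty()) {
            Cursor smallest = heap.poll();
            result.add(smallest.current());
            smallest.pos++;
            if (smallest.pos < smallest.data.length) {
                heap.add(smallest); // re-insert with its next value as the key
            }
        }
        return result;
    }
}
```

With k partitions and a limit of N, the merge does roughly N heap operations of cost O(log k) each, so it never touches most of the data when N is small.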
[jira] [Created] (CARBONDATA-748) "between and" filter query is very slow
Jarck created CARBONDATA-748:

Summary: "between and" filter query is very slow
Key: CARBONDATA-748
URL: https://issues.apache.org/jira/browse/CARBONDATA-748
Project: CarbonData
Issue Type: Improvement
Reporter: Jarck

Hi,

Currently, in the include and exclude filter case, when a dimension column does not have an inverted index, it does a linear search. We can add a binary search when the data for that column is sorted. To get this information, we can check in the carbon table whether the user has selected no inverted index for that column. If the user selected no inverted index while creating the column, this code is fine; if not, the data will be sorted, so we can add a binary search, which will improve the performance. Please raise a Jira for this improvement.

-Regards
Kumar Vishal

On Fri, Mar 3, 2017 at 7:42 PM, 马云 wrote:

Hi Dev,

I used carbondata version 0.2 on my local machine and found that the "between and" filter query is very slow. The root cause is the code below in IncludeFilterExecuterImpl.java. It takes about 20s in my test. The code's time complexity is O(n*m). I think it needs to be optimized; please confirm.
thanks

  private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
      int numerOfRows) {
    BitSet bitSet = new BitSet(numerOfRows);
    if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
      FixedLengthDimensionDataChunk fixedDimensionChunk =
          (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
      byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
      long start = System.currentTimeMillis();
      for (int k = 0; k < filterValues.length; k++) {
        for (int j = 0; j < numerOfRows; j++) {
          if (ByteUtil.UnsafeComparer.INSTANCE
              .compareTo(fixedDimensionChunk.getCompleteDataChunk(), j * filterValues[k].length,
                  filterValues[k].length, filterValues[k], 0, filterValues[k].length) == 0) {
            bitSet.set(j);
          }
        }
      }
      System.out.println("loop time: " + (System.currentTimeMillis() - start));
    }
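The binary-search idea from the reply can be sketched as follows. This is a standalone illustration, not the change made in CarbonData. It assumes the chunk is a flat `byte[]` of fixed-length keys sorted in ascending unsigned byte order, and that every filter key has the same length as the stored keys. The names `SortedChunkFilterSketch`, `compareRow`, `bound`, and `filter` are hypothetical. For each filter value it finds the run of equal rows with two binary searches, giving O(m log n) comparisons plus the cost of setting the matching bits, instead of O(n*m).

```java
import java.util.BitSet;

// Hypothetical illustration only; assumes fixed-length keys sorted in unsigned byte order.
final class SortedChunkFilterSketch {
    // Unsigned lexicographic comparison of the key stored at `row` against `key`.
    private static int compareRow(byte[] chunk, int row, byte[] key) {
        int offset = row * key.length;
        for (int i = 0; i < key.length; i++) {
            int a = chunk[offset + i] & 0xFF;
            int b = key[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return 0;
    }

    // First row whose key is >= key (strict == false) or > key (strict == true).
    private static int bound(byte[] chunk, int numberOfRows, byte[] key, boolean strict) {
        int lo = 0;
        int hi = numberOfRows;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareRow(chunk, mid, key);
            if (cmp < 0 || (strict && cmp == 0)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Sets the bit of every row (in sorted order) that equals one of the filter values.
    static BitSet filter(byte[] sortedChunk, int numberOfRows, byte[][] filterValues) {
        BitSet bitSet = new BitSet(numberOfRows);
        for (byte[] key : filterValues) {
            int first = bound(sortedChunk, numberOfRows, key, false);
            int end = bound(sortedChunk, numberOfRows, key, true);
            if (first < end) {
                bitSet.set(first, end); // rows [first, end) all equal this key
            }
        }
        return bitSet;
    }
}
```

The bit positions here refer to rows in sorted order; mapping them back to original row ids would need the column's inverted index, which this sketch does not model.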
[jira] [Created] (CARBONDATA-952) add a new Encoding: BITMAP
Jarck created CARBONDATA-952:

Summary: add a new Encoding: BITMAP
Key: CARBONDATA-952
URL: https://issues.apache.org/jira/browse/CARBONDATA-952
Project: CarbonData
Issue Type: Task
Reporter: Jarck
Assignee: Jarck
[jira] [Created] (CARBONDATA-951) create table ddl can specify a bitmap option
Jarck created CARBONDATA-951:

Summary: create table ddl can specify a bitmap option
Key: CARBONDATA-951
URL: https://issues.apache.org/jira/browse/CARBONDATA-951
Project: CarbonData
Issue Type: Task
Reporter: Jarck
Assignee: Jarck