[jira] [Created] (CARBONDATA-908) bitmap encode

2017-04-11 Thread Jarck (JIRA)
Jarck created CARBONDATA-908:


 Summary: bitmap encode
 Key: CARBONDATA-908
 URL: https://issues.apache.org/jira/browse/CARBONDATA-908
 Project: CarbonData
  Issue Type: New Feature
  Components: core, data-load, data-query
Reporter: Jarck
Assignee: Jarck


For frequent filter queries on low-cardinality columns, bitmap encoding can 
speed up the query.
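
As a rough illustration of why a bitmap helps here, the sketch below keeps one BitSet per distinct column value and answers an IN / NOT IN filter by OR-ing and flipping bitmaps instead of comparing every row. The class and method names (BitmapIndexSketch, in, notIn) are hypothetical and are not part of CarbonData; this is a minimal sketch of the general technique, not the encoding proposed in this issue.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a bitmap index over one low-cardinality column; not CarbonData code. */
final class BitmapIndexSketch {
  // One bitmap per distinct column value; bit i is set when row i holds that value.
  private final Map<String, BitSet> bitmaps = new HashMap<>();
  private final int rowCount;

  BitmapIndexSketch(String[] columnValues) {
    final int n = columnValues.length;
    this.rowCount = n;
    for (int row = 0; row < n; row++) {
      bitmaps.computeIfAbsent(columnValues[row], v -> new BitSet(n)).set(row);
    }
  }

  /** Rows matching "column IN (values)": OR together the bitmaps of the requested values. */
  BitSet in(String... values) {
    BitSet result = new BitSet(rowCount);
    for (String value : values) {
      BitSet bitmap = bitmaps.get(value);
      if (bitmap != null) {
        result.or(bitmap);
      }
    }
    return result;
  }

  /** Rows matching "column NOT IN (values)": complement of the IN result within the row range. */
  BitSet notIn(String... values) {
    BitSet result = in(values);
    result.flip(0, rowCount);
    return result;
  }
}
```

With a column holding {"cn", "us", "cn", "de", "us", "cn"}, in("cn", "de") selects rows 0, 2, 3 and 5, and notIn("cn", "de") selects rows 1 and 4. The work per filter value is a word-wise OR over the bitmap, which is why the approach pays off when the column has few distinct values.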



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-754) order by query's performance is very bad

2017-03-09 Thread Jarck (JIRA)
Jarck created CARBONDATA-754:


 Summary: order by query's performance is very bad
 Key: CARBONDATA-754
 URL: https://issues.apache.org/jira/browse/CARBONDATA-754
 Project: CarbonData
  Issue Type: Improvement
  Components: core, spark-integration
Reporter: Jarck
Assignee: Jarck


Currently, the performance of an order-by query on a dimension is very bad if 
there is no filter or the filtered data is still too large. 
If I am not mistaken, it reads all the related data at the carbon scan physical 
level, decodes the sort dimension's data, and then sorts all of it in the Spark 
SQL sort physical plan.

I think we can optimize as below:

1. Push down the sort (+ limit) to the carbon scan.

2. Leverage the fact that a dimension is stored in its natural order at blocklet 
level to get sorted data in each partition.

3. Implement merge-sort/TopN in Spark's sort physical plan.

Actually, I have already optimized the "order by only 1 dimension + limit" case, 
based on branch 0.2. The performance is much better: 
sorting by 1 dimension + limit 1 on 100 million rows takes less than 
1 second to get and print the result.
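
To make step 3 concrete, here is a minimal sketch of a k-way merge with a limit, assuming each partition already returns its values sorted ascending (as step 2 would provide). The class and method names (SortedPartitionMergeSketch, mergeTopN) and the use of plain int arrays are assumptions for illustration only; this is not the code from my branch and not Spark's or CarbonData's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: merge already-sorted partitions and stop after `limit` rows. */
final class SortedPartitionMergeSketch {

  /** Cursor into one partition whose values are already sorted ascending. */
  private static final class Cursor {
    final int[] values;
    int pos;

    Cursor(int[] values) {
      this.values = values;
    }

    int current() {
      return values[pos];
    }
  }

  /** Returns the smallest `limit` values across all partitions, in ascending order. */
  static List<Integer> mergeTopN(List<int[]> sortedPartitions, int limit) {
    // The heap orders cursors by the value they currently point at.
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
    for (int[] partition : sortedPartitions) {
      if (partition.length > 0) {
        heap.add(new Cursor(partition));
      }
    }
    List<Integer> out = new ArrayList<>();
    while (out.size() < limit && !heap.isEmpty()) {
      Cursor cursor = heap.poll();
      out.add(cursor.current());
      cursor.pos++;
      if (cursor.pos < cursor.values.length) {
        heap.add(cursor); // re-insert the cursor keyed by its next value
      }
    }
    return out;
  }
}
```

Because the merge stops as soon as `limit` rows are produced, only a small prefix of each partition has to be read, which is where the saving over a full sort comes from.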













[jira] [Created] (CARBONDATA-748) "between and" filter query is very slow

2017-03-05 Thread Jarck (JIRA)
Jarck created CARBONDATA-748:


 Summary: "between and" filter query is very slow
 Key: CARBONDATA-748
 URL: https://issues.apache.org/jira/browse/CARBONDATA-748
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jarck


Hi,

Currently, in the include and exclude filter cases, a linear search is done
when the dimension column does not have an inverted index. We can use binary
search when the data for that column is sorted. To find this out, we can check
in the carbon table whether the user has selected "no inverted index" for that
column. If the user selected no inverted index while creating the column, the
current code is fine; if not, the data will be sorted, so we can add binary
search, which will improve the performance.
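
A minimal sketch of the binary-search idea, assuming the chunk is a flat byte[] of fixed-length keys sorted in unsigned byte order and that every filter key has the same length as the stored keys. The names (SortedChunkFilterSketch, compareAt, boundary, filter) are hypothetical and this is not the CarbonData implementation:

```java
import java.util.BitSet;

/** Hypothetical sketch: filter a sorted, fixed-length key chunk with binary search. */
final class SortedChunkFilterSketch {

  /** Unsigned lexicographic comparison of the key stored at `row` against `key`. */
  private static int compareAt(byte[] data, int row, byte[] key) {
    int offset = row * key.length;
    for (int i = 0; i < key.length; i++) {
      int diff = (data[offset + i] & 0xFF) - (key[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return 0;
  }

  /** First row in [0, rows) whose key is >= `key`, or > `key` when `strict` is true. */
  private static int boundary(byte[] data, int rows, byte[] key, boolean strict) {
    int lo = 0;
    int hi = rows;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = compareAt(data, mid, key);
      if (cmp < 0 || (strict && cmp == 0)) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Sets the bit of every row equal to one of the filter keys: O(m log n) comparisons. */
  static BitSet filter(byte[] sortedData, int rows, byte[][] filterKeys) {
    BitSet bits = new BitSet(rows);
    for (byte[] key : filterKeys) {
      int from = boundary(sortedData, rows, key, false);
      int to = boundary(sortedData, rows, key, true);
      bits.set(from, to); // no-op when the key is absent (from == to)
    }
    return bits;
  }
}
```

Each filter key costs two binary searches plus one range set, so m filter keys over n rows need O(m log n) comparisons rather than the O(n*m) of a full scan per key.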

Please raise a Jira for this improvement

-Regards
Kumar Vishal


On Fri, Mar 3, 2017 at 7:42 PM, 马云 wrote:

Hi Dev,


I used CarbonData version 0.2 on my local machine and found that the
"between and" filter query is very slow.
The root cause is the code below in IncludeFilterExecuterImpl.java.
It takes about 20s in my test.
The code's time complexity is O(n*m). I think it needs to be optimized;
please confirm. Thanks.





  private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
      int numerOfRows) {
    BitSet bitSet = new BitSet(numerOfRows);
    if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
      FixedLengthDimensionDataChunk fixedDimensionChunk =
          (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
      byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();

      long start = System.currentTimeMillis();
      // Every filter value is compared against every row: O(n*m).
      for (int k = 0; k < filterValues.length; k++) {
        for (int j = 0; j < numerOfRows; j++) {
          if (ByteUtil.UnsafeComparer.INSTANCE
              .compareTo(fixedDimensionChunk.getCompleteDataChunk(), j * filterValues[k].length,
                  filterValues[k].length, filterValues[k], 0, filterValues[k].length) == 0) {
            bitSet.set(j);
          }
        }
      }
      System.out.println("loop time: " + (System.currentTimeMillis() - start));
    }
    return bitSet;
  }








[jira] [Created] (CARBONDATA-952) add a new Encoding: BITMAP


2017-04-18 Thread Jarck (JIRA)
Jarck created CARBONDATA-952:


 Summary: add a new Encoding: BITMAP

 Key: CARBONDATA-952
 URL: https://issues.apache.org/jira/browse/CARBONDATA-952
 Project: CarbonData
  Issue Type: Task
Reporter: Jarck
Assignee: Jarck








[jira] [Created] (CARBONDATA-951) create table ddl can specify a bitmap option

2017-04-18 Thread Jarck (JIRA)
Jarck created CARBONDATA-951:


 Summary: create table ddl can specify a bitmap option
 Key: CARBONDATA-951
 URL: https://issues.apache.org/jira/browse/CARBONDATA-951
 Project: CarbonData
  Issue Type: Task
Reporter: Jarck
Assignee: Jarck





