Re: [Discussion] Implement Partition Table Feature

2017-04-14 Thread Jacky Li
Hi Cao Lu, The overall design looks good to me. I just have the following points to confirm: 1. Is there a delete partition DDL? 2. For the data loading part, does it need to do a global shuffle before the actual data loading? And the partition key should not be included in the SORT_COLUMNS option, right? If

Re: [jira] [Created] (CARBONDATA-836) Error in load using dataframe - columns containing comma

2017-04-11 Thread Jacky Li
Hi Sanoj, This is because the CarbonData loading flow needs to scan the input data twice (once for generating the global dictionary, once for the actual loading). If the user is using a Dataframe to write to CarbonData, and computing the input dataframe is costly, it is better to save it as a temporary

[jira] [Created] (CARBONDATA-882) Add no sort support in dataframe writer

2017-04-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-882: --- Summary: Add no sort support in dataframe writer Key: CARBONDATA-882 URL: https://issues.apache.org/jira/browse/CARBONDATA-882 Project: CarbonData Issue Type

Re: [DISCUSSION]support new feature: Partition Table

2017-04-05 Thread Jacky Li
comments inline > On Apr 1, 2017, at 5:06 PM, a wrote: > > additional suggestion: > 1. support at least two level partition I think we can let the user specify the partition columns; multiple columns together can form a partition key. Is this what you mean by two level partition?

Re: [DISCUSSION]implement delta encoding for numeric type column in SORT_COLUMNS

2017-04-05 Thread Jacky Li
> On Apr 5, 2017, at 6:31 PM, QiangCai wrote: > > Hi all, > >Now we plan to implement delta encoding for the numeric type column in > SORT_COLUMNS. > >1. use delta encoding to encode the numeric type data > I think the adaptive data type conversion still applies here, right? >

Re: [VOTE] Apache CarbonData 1.1.0-incubating (RC1) release

2017-04-05 Thread Jacky Li
I think it is better to resolve the following issues before the 1.1.0 release. Documents that should be synchronized: [CARBONDATA-865] [CARBONDATA-862]. Bug: [CARBONDATA-870]

Re: [DISCUSSION]: (New Feature) Streaming Ingestion into CarbonData

2017-03-29 Thread Jacky Li
k, in that case implementing > streaming file format is a possibility. > > Best Regards, > Aniket > > On Tue, Mar 28, 2017 at 8:22 AM, Jacky Li <jacky.li...@qq.com> wrote: > >> Hi Aniket, >> >> This feature looks great, the overall plan also seems fi

Re: [DISCUSSION]: (New Feature) Streaming Ingestion into CarbonData

2017-03-28 Thread Jacky Li
Hi Aniket, This feature looks great, and the overall plan also seems fine to me. Thanks for proposing it. I have some questions inline. > On Mar 27, 2017, at 6:34 PM, Aniket Adnaik wrote: > > Hi All, > > I would like to open up a discussion for new feature to support streaming >

[jira] [Created] (CARBONDATA-829) DICTIONARY_EXCLUDE is not working when using Spark Datasource DDL

2017-03-28 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-829: --- Summary: DICTIONARY_EXCLUDE is not working when using Spark Datasource DDL Key: CARBONDATA-829 URL: https://issues.apache.org/jira/browse/CARBONDATA-829 Project

[jira] [Created] (CARBONDATA-827) Query statistics log format is incorrect

2017-03-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-827: --- Summary: Query statistics log format is incorrect Key: CARBONDATA-827 URL: https://issues.apache.org/jira/browse/CARBONDATA-827 Project: CarbonData Issue Type

Re: data not input hive

2017-03-27 Thread Jacky Li
Hi, Carbon does not support loading data using Hive yet. You can use Spark to load. Regards, Jacky > On Mar 27, 2017, at 2:17 PM, 风云际会 <1141982...@qq.com> wrote: > > spark 2.1.0 > hive 1.2.1 > Couldn't find corresponding Hive SerDe for data source provider > org.apache.spark.sql.CarbonSource. Persisting

[jira] [Created] (CARBONDATA-823) Refactory of data write step

2017-03-26 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-823: --- Summary: Refactory of data write step Key: CARBONDATA-823 URL: https://issues.apache.org/jira/browse/CARBONDATA-823 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-820) Redundant BitSet created in data load

2017-03-25 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-820: --- Summary: Redundant BitSet created in data load Key: CARBONDATA-820 URL: https://issues.apache.org/jira/browse/CARBONDATA-820 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-812) make vectorized reader as default reader

2017-03-23 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-812: --- Summary: make vectorized reader as default reader Key: CARBONDATA-812 URL: https://issues.apache.org/jira/browse/CARBONDATA-812 Project: CarbonData Issue Type

Re: [PROPOSAL] Update on the Jenkins CarbonData job

2017-03-18 Thread Jacky Li
+1 > On Mar 17, 2017, at 10:48 PM, Jean-Baptiste Onofré wrote: > > Hi guys, > > Tomorrow I plan to update our jobs on Apache Jenkins as the following: > > - carbondata-master-spark-1.5 building master branch with Spark 1.5 profile > - carbondata-master-spark-1.6 building master branch

Re: column auto mapping when loading data from csv file

2017-03-14 Thread Jacky Li
Hi Yinwei, I am OK with this new feature if there is an option in the load script to enable it, so that users can explicitly enable it if they want, without changing the current 2 choices. Regards, Jacky > On Mar 13, 2017, at 10:18 AM, Yinwei Li <251469...@qq.com> wrote: > > Hi all, > > > when loading data from

[jira] [Created] (CARBONDATA-747) Add simple performance test for spark2.1 carbon integration

2017-03-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-747: --- Summary: Add simple performance test for spark2.1 carbon integration Key: CARBONDATA-747 URL: https://issues.apache.org/jira/browse/CARBONDATA-747 Project: CarbonData

[jira] [Created] (CARBONDATA-746) Support spark-sql CLI for spark2.1 carbon integration

2017-03-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-746: --- Summary: Support spark-sql CLI for spark2.1 carbon integration Key: CARBONDATA-746 URL: https://issues.apache.org/jira/browse/CARBONDATA-746 Project: CarbonData

Re: Improving Non-dictionary storage & performance.

2017-03-02 Thread Jacky Li
t; no-dictionary columns. > As you mentioned we can suggest 2-pass for first load and subsequent loads > will use single-pass to improve the performance. > > Regards, > Ravindra. > > On 2 March 2017 at 06:48, Jacky Li <jacky.li...@qq.com> wrote: > >> Hi Ravindra &

Re: Improving Non-dictionary storage & performance.

2017-03-01 Thread Jacky Li
Hi Ravindra & Vishal, Yes, I think this work needs to be done before switching to no-dictionary as the default. So as of now, we should use dictionary as the default. I think we can suggest that users do loading as follows: 1. First load: use 2-pass mode to load, the first scan should discover the cardinality, and

Re: [DISCUSS] Graduation to a TLP (Top Level Project)

2017-03-01 Thread Jacky Li
+1 Thanks JB for driving it. I am super excited! Regards, Jacky > On Mar 2, 2017, at 12:33 AM, Naresh P R wrote: > > +1 > > Thanks for your guidance JB > > Regards, > Naresh P R > > On Mar 1, 2017 3:50 PM, "Jean-Baptiste Onofré" wrote: > > Hi Liang, > >

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
IONARY, then system how to handle this case ? >> >> --- >> For example, SORT_COLUMNS=“C1,C2,C3”, means C1,C2,C3 is MDK and encoded as >> Inverted Index and with Minmax Ind

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
- > For example, SORT_COLUMNS=“C1,C2,C3”, means C1,C2,C3 is MDK and encoded as > Inverted Index and with Minmax Index > Sort it using original value > Regards > Liang > > 2017-02-28 19:35 GMT+08:00 Jacky Li <jacky.li...@qq.com>: > >&

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
Yes, first we should simplify the DDL options. I propose the following options, please check whether they miss some scenario. 1. SORT_COLUMNS, or SORT_KEY This indicates three things: 1) All columns specified in options will be used to construct Multi-Dimensional Key, which will be sorted along this

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
Yes, first we should simplify the DDL options. I propose the following options, please check whether they miss some scenario. 1. SORT_COLUMNS, or SORT_KEY This indicates three things: 1) All columns specified in options will be used to construct Multi-Dimensional Key, which will be sorted along this
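
As a rough illustration of the first point in the SORT_COLUMNS proposal (the listed columns together form one multi-dimensional key that rows are sorted by), here is a minimal Python sketch. It is not CarbonData code: the function name sort_by_columns and the sample rows are hypothetical, and the sketch sorts on original values only, without any dictionary or index encoding.

```python
def sort_by_columns(rows, sort_columns):
    """Sort rows by a composite key built from the given columns, in order.

    Hypothetical helper for illustration; the key is a plain tuple of the
    original column values, compared left to right.
    """
    return sorted(rows, key=lambda row: tuple(row[c] for c in sort_columns))


rows = [
    {"C1": "b", "C2": 2, "C3": "x", "M1": 10},
    {"C1": "a", "C2": 9, "C3": "y", "M1": 20},
    {"C1": "b", "C2": 1, "C3": "z", "M1": 30},
]

# SORT_COLUMNS="C1,C2,C3" would correspond to this column order.
sorted_rows = sort_by_columns(rows, ["C1", "C2", "C3"])
for row in sorted_rows:
    print(row)
```

The column order matters: C1 is compared first, and C2 only breaks ties within equal C1 values, which is why rows sharing the leading columns end up adjacent.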

[ANNOUNCE] Apache CarbonData 1.0.0-incubating released

2017-01-29 Thread Jacky Li
Hi All, The Apache CarbonData PMC team is happy to announce the release of Apache CarbonData version 1.0.0-incubating. Apache CarbonData (incubating) is an indexed columnar data format for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc. The release notes are available

Re: [VOTE] Apache CarbonData 1.0.0-incubating release (RC2)

2017-01-20 Thread Jacky Li
tor-carbondata/tree/master/build> > On Jan 21, 2017, at 9:36 AM, Jacky Li <jacky.li...@qq.com> wrote: > > Hi all, > > Please vote on releasing the following candidate as Apache > CarbonData(incubating) > version 1.0.0. > > Release Notes: > https://issues.apache.

[VOTE] Apache CarbonData 1.0.0-incubating release (RC2)

2017-01-20 Thread Jacky Li
Hi all, Please vote on releasing the following candidate as Apache CarbonData(incubating) version 1.0.0. Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12338020

[jira] [Created] (CARBONDATA-638) Move package in carbon-core module

2017-01-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-638: --- Summary: Move package in carbon-core module Key: CARBONDATA-638 URL: https://issues.apache.org/jira/browse/CARBONDATA-638 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-606) Add a Flink example to read CarbonData files

2017-01-07 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-606: --- Summary: Add a Flink example to read CarbonData files Key: CARBONDATA-606 URL: https://issues.apache.org/jira/browse/CARBONDATA-606 Project: CarbonData Issue

Re: Unable to run Carbon Shell on Spark 2.0

2016-12-29 Thread Jacky Li
Hi Harmeet, Now ThriftServer that uses CarbonSession has been merged into master branch, please try the latest master. Thanks. Regards, Jacky -- View this message in context:

[jira] [Created] (CARBONDATA-571) clean up code for carbon-spark module

2016-12-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-571: --- Summary: clean up code for carbon-spark module Key: CARBONDATA-571 URL: https://issues.apache.org/jira/browse/CARBONDATA-571 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-570) clean up code for carbon-hadoop module

2016-12-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-570: --- Summary: clean up code for carbon-hadoop module Key: CARBONDATA-570 URL: https://issues.apache.org/jira/browse/CARBONDATA-570 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-569) clean up code for carbon-core module

2016-12-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-569: --- Summary: clean up code for carbon-core module Key: CARBONDATA-569 URL: https://issues.apache.org/jira/browse/CARBONDATA-569 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-566) clean up code for carbon-spark2 module

2016-12-26 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-566: --- Summary: clean up code for carbon-spark2 module Key: CARBONDATA-566 URL: https://issues.apache.org/jira/browse/CARBONDATA-566 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-565) Clean up code

2016-12-26 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-565: --- Summary: Clean up code Key: CARBONDATA-565 URL: https://issues.apache.org/jira/browse/CARBONDATA-565 Project: CarbonData Issue Type: Improvement

[jira] [Created] (CARBONDATA-539) Return empty row in map reduce application

2016-12-18 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-539: --- Summary: Return empty row in map reduce application Key: CARBONDATA-539 URL: https://issues.apache.org/jira/browse/CARBONDATA-539 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-538) Add test case to spark2 integration

2016-12-15 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-538: --- Summary: Add test case to spark2 integration Key: CARBONDATA-538 URL: https://issues.apache.org/jira/browse/CARBONDATA-538 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-537) Bug fix for DICTIONARY_EXCLUDE option in spark2 integration

2016-12-15 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-537: --- Summary: Bug fix for DICTIONARY_EXCLUDE option in spark2 integration Key: CARBONDATA-537 URL: https://issues.apache.org/jira/browse/CARBONDATA-537 Project: CarbonData

Re: Some questions about compiling carbondata

2016-12-15 Thread Jacky Li
Hi, You do not need to specify spark.version variable, you can try these: mvn clean package -DskipTests -Pspark-2.0 (to build carbon with spark-2.0.2) mvn clean package -DskipTests (to build carbon with spark-1.5.2, which is default profile) Regards, Jacky -- View this message in context:

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jacky Li
Hi community, Sorry for the incorrect formatting of the previous post. I corrected it in this post. Since CarbonData has a global dictionary feature, loading data into CarbonData currently requires two scans of the input data. The first scan generates the dictionary, the second scan does

[DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jacky Li
Hi community, Since CarbonData has a global dictionary feature, loading data into CarbonData currently requires two scans of the input data. The first scan generates the dictionary, the second scan does the actual data encoding and writes to carbon files. Obviously, this approach is simple,
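
The two-scan flow described in this post can be sketched in a few lines of Python. This is only an illustration of the general idea (one pass to build a dictionary, a second pass to encode with it), not CarbonData's actual loader; the function names build_dictionary and encode_rows, the surrogate key numbering starting at 1, and the sample data are all assumptions made for the example.

```python
def build_dictionary(rows, column):
    """First scan: assign an integer surrogate key to each distinct value."""
    dictionary = {}
    for row in rows:
        value = row[column]
        if value not in dictionary:
            # Keys are handed out in order of first appearance (an assumption).
            dictionary[value] = len(dictionary) + 1
    return dictionary


def encode_rows(rows, column, dictionary):
    """Second scan: replace each value with its surrogate key."""
    return [{**row, column: dictionary[row[column]]} for row in rows]


rows = [
    {"city": "Shenzhen", "sales": 10},
    {"city": "Bangalore", "sales": 7},
    {"city": "Shenzhen", "sales": 3},
]

city_dict = build_dictionary(rows, "city")      # scan 1
encoded = encode_rows(rows, "city", city_dict)  # scan 2
print(city_dict)
print(encoded)
```

The sketch makes the cost visible: the input is iterated twice, so if producing rows is expensive, the whole computation is paid for twice unless the input is materialized first.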

[jira] [Created] (CARBONDATA-531) Remove spark dependency in carbon core

2016-12-13 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-531: --- Summary: Remove spark dependency in carbon core Key: CARBONDATA-531 URL: https://issues.apache.org/jira/browse/CARBONDATA-531 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-513) Reduce number of BigDecimal objects for scan

2016-12-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-513: --- Summary: Reduce number of BigDecimal objects for scan Key: CARBONDATA-513 URL: https://issues.apache.org/jira/browse/CARBONDATA-513 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-512) Reduce number of Timestamp formatter

2016-12-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-512: --- Summary: Reduce number of Timestamp formatter Key: CARBONDATA-512 URL: https://issues.apache.org/jira/browse/CARBONDATA-512 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-511) Integrate with Spark's TaskMemoryManager

2016-12-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-511: --- Summary: Integrate with Spark's TaskMemoryManager Key: CARBONDATA-511 URL: https://issues.apache.org/jira/browse/CARBONDATA-511 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-498) Refactor compression model

2016-12-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-498: --- Summary: Refactor compression model Key: CARBONDATA-498 URL: https://issues.apache.org/jira/browse/CARBONDATA-498 Project: CarbonData Issue Type: Improvement

[jira] [Created] (CARBONDATA-495) Unify compressor interface

2016-12-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-495: --- Summary: Unify compressor interface Key: CARBONDATA-495 URL: https://issues.apache.org/jira/browse/CARBONDATA-495 Project: CarbonData Issue Type: Bug

[jira] [Created] (CARBONDATA-490) Unify all RDD in carbon-spark and carbon-spark2 module

2016-12-02 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-490: --- Summary: Unify all RDD in carbon-spark and carbon-spark2 module Key: CARBONDATA-490 URL: https://issues.apache.org/jira/browse/CARBONDATA-490 Project: CarbonData

[jira] [Created] (CARBONDATA-487) spark2 integration is not compiling

2016-12-02 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-487: --- Summary: spark2 integration is not compiling Key: CARBONDATA-487 URL: https://issues.apache.org/jira/browse/CARBONDATA-487 Project: CarbonData Issue Type: Bug

[jira] [Created] (CARBONDATA-463) Extract spark-common module

2016-11-28 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-463: --- Summary: Extract spark-common module Key: CARBONDATA-463 URL: https://issues.apache.org/jira/browse/CARBONDATA-463 Project: CarbonData Issue Type: Sub-task

[jira] [Created] (CARBONDATA-462) Clean up code before moving to spark-common package

2016-11-28 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-462: --- Summary: Clean up code before moving to spark-common package Key: CARBONDATA-462 URL: https://issues.apache.org/jira/browse/CARBONDATA-462 Project: CarbonData

[jira] [Created] (CARBONDATA-461) Clean partitioner in RDD package

2016-11-28 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-461: --- Summary: Clean partitioner in RDD package Key: CARBONDATA-461 URL: https://issues.apache.org/jira/browse/CARBONDATA-461 Project: CarbonData Issue Type: Sub

Re: [Feature Proposal] Spark 2 integration with CarbonData

2016-11-28 Thread Jacky Li
Hi Ramana, Sure, I can work out a subtasks list and put it under CARBONDATA-322 Regards, Jacky -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Spark-2-integration-with-CarbonData-tp3236p3278.html Sent from the Apache

Re: [Feature ]Design Document for Update/Delete support in CarbonData

2016-11-26 Thread Jacky Li
Hi Aniket, Yes, a background monitor process is preferred in the future. And there are other places that already need this process, like refreshing the caches in the driver and executors. Currently, dictionary caches and index caches are refreshed by checking the timestamp in every query, which introduces

[Feature Proposal] Spark 2 integration with CarbonData

2016-11-26 Thread Jacky Li
in next CarbonData release. What do you think about this idea? All kinds of contribution and suggestions are welcomed. Regards, Jacky Li -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Spark-2-integration-with-CarbonData

[jira] [Created] (CARBONDATA-449) Remove unnecessary log property

2016-11-24 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-449: --- Summary: Remove unnecessary log property Key: CARBONDATA-449 URL: https://issues.apache.org/jira/browse/CARBONDATA-449 Project: CarbonData Issue Type

Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-24 Thread Jacky Li
+1, and comments inline > On Nov 24, 2016, at 12:09 AM, Venkata Gollamudi wrote: > > Hi All, > > CarbonData 0.2.0 has been a good work and stable release with lot of > defects fixed and with number of performance improvements. >

[jira] [Created] (CARBONDATA-448) Solve compilation error in core for spark2

2016-11-24 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-448: --- Summary: Solve compilation error in core for spark2 Key: CARBONDATA-448 URL: https://issues.apache.org/jira/browse/CARBONDATA-448 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-447) Use Carbon log service instead of spark Logging

2016-11-24 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-447: --- Summary: Use Carbon log service instead of spark Logging Key: CARBONDATA-447 URL: https://issues.apache.org/jira/browse/CARBONDATA-447 Project: CarbonData

[jira] [Created] (CARBONDATA-429) Remove unnecessary file name check in dictionary cache

2016-11-21 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-429: --- Summary: Remove unnecessary file name check in dictionary cache Key: CARBONDATA-429 URL: https://issues.apache.org/jira/browse/CARBONDATA-429 Project: CarbonData

Re: Single Pass Data Load Design

2016-11-13 Thread Jacky Li
Hi Ravindra, Thanks for proposing this design. It is really exciting if CarbonData can do 1-pass solution for loading. I have given some comment in the design document. Regards, Jacky -- View this message in context:

Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-10 Thread Jacky Li
+1 binding Regards, Jacky ---Original--- From: "Aniket Adnaik" Date: 2016/11/10 14:43:49 To: "dev";"chenliang613"; Subject: Re: [VOTE] Apache CarbonData 0.2.0-incubating release +1 Regards, Aniket On 9

[jira] [Created] (CARBONDATA-403) add example for data load without using kettle

2016-11-10 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-403: --- Summary: add example for data load without using kettle Key: CARBONDATA-403 URL: https://issues.apache.org/jira/browse/CARBONDATA-403 Project: CarbonData

Re: Use of ANTLR instead of CarbonSqlParser

2016-11-08 Thread Jacky Li
on with Spark 2.0 is planned in near future, we can switch > to ANTLR parser at that time as well. > > On Mon, Nov 7, 2016 at 6:59 AM, Jacky Li <jacky.li...@qq.com> wrote: > >> Hi, >> >> It is because CarbonData currently is integrated with Spark 1.5/1.6 and >

Re: As planed, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-08 Thread Jacky Li
+1 Regards, Jacky > On Nov 9, 2016, at 9:05 AM, Jay <2550062...@qq.com> wrote: > > +1 > regards > Jay > > > > > -- Original Message -- > From: "向志强";; > Sent: Wednesday, November 9, 2016, 8:59 AM > To: "dev"; > > Subject: Re: As planed,

[jira] [Created] (CARBONDATA-331) Support no compression option while loading

2016-10-20 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-331: --- Summary: Support no compression option while loading Key: CARBONDATA-331 URL: https://issues.apache.org/jira/browse/CARBONDATA-331 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-318) Implement an ExternalSorter that makes maximum usage of memory while sorting

2016-10-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-318: --- Summary: Implement an ExternalSorter that makes maximum usage of memory while sorting Key: CARBONDATA-318 URL: https://issues.apache.org/jira/browse/CARBONDATA-318

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jacky Li
Hi, I can offer one more approach for this discussion, since new dictionary values are rare in case of incremental load (ensure first load having as much dictionary value as possible), so synchronization should be rare. So how about using Zookeeper + HDFS file to provide this service. This is

[jira] [Created] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-312: --- Summary: Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation Key: CARBONDATA-312 URL: https://issues.apache.org/jira/browse/CARBONDATA-312

[jira] [Created] (CARBONDATA-309) Support two types of ReadSupport in CarbonRecordReader

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-309: --- Summary: Support two types of ReadSupport in CarbonRecordReader Key: CARBONDATA-309 URL: https://issues.apache.org/jira/browse/CARBONDATA-309 Project: CarbonData

[jira] [Created] (CARBONDATA-308) Support multiple segment in CarbonHadoopFSRDD

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-308: --- Summary: Support multiple segment in CarbonHadoopFSRDD Key: CARBONDATA-308 URL: https://issues.apache.org/jira/browse/CARBONDATA-308 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-307) Support full functionality in CarbonInputFormat

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-307: --- Summary: Support full functionality in CarbonInputFormat Key: CARBONDATA-307 URL: https://issues.apache.org/jira/browse/CARBONDATA-307 Project: CarbonData

Re: [Discussion] Code generation in carbon result preparation

2016-10-12 Thread Jacky Li
Hi Vishal, Which part of the preparation are you considering? The column stitching on the executor side? Regards, Jacky > On Oct 12, 2016, at 9:24 PM, Kumar Vishal wrote: > > Hi All, > Currently we are preparing the final result row wise, as number of columns > present in

Re: Discussion regrading design of data load after kettle removal.

2016-10-10 Thread Jacky Li
Hi Ravindra, It seems the picture is missing, can you post it in a URL and share the link? Regards, Jacky -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-regrading-design-of-data-load-after-kettle-removal-tp1672p1725.html Sent

Re: Discussion about using multi local directorys to improve dataloading perfomance

2016-10-08 Thread Jacky Li
Yes, I think it is a good feature to have. Please feel free to create a JIRA issue and Pull Request. Regards, Jacky > On Oct 9, 2016, at 12:04 AM, caiqiang wrote: > > Hi All, > For each dataloading, we write the sorted temp files into only one different > local directory. I think this

[jira] [Created] (CARBONDATA-285) Use path parameter in Spark datasource API

2016-10-04 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-285: --- Summary: Use path parameter in Spark datasource API Key: CARBONDATA-285 URL: https://issues.apache.org/jira/browse/CARBONDATA-285 Project: CarbonData Issue

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
. If data is append only, I think read and load of index is enough. But if we introduce update into CarbonData, then index should be updatable as well, I think it is better to consider this together in data update feature if there are any in the future? > > Jenny > > -

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
> On Oct 4, 2016, at 5:43 AM, Qingqing Zhou <zhouqq.car...@gmail.com> wrote: > > On Fri, Sep 30, 2016 at 10:31 PM, Jacky Li <jacky.li...@qq.com> wrote: >> However, it also introduces memory consumption of the index tree and >> impact first query time because the process

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
I have created a JIRA and a PR for this: CARBONDATA-284 (https://issues.apache.org/jira/browse/CARBONDATA-284) PR208 (https://github.com/apache/incubator-carbondata/pull/208) Please review the interface Regards, Jacky -- View this message in context:

[jira] [Created] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-03 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-284: --- Summary: Abstracting Index and Segment interface Key: CARBONDATA-284 URL: https://issues.apache.org/jira/browse/CARBONDATA-284 Project: CarbonData Issue Type

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
umar Vishal > > On Mon, Oct 3, 2016 at 1:08 PM, Jacky Li <[hidden email] > > wrote: > > > Agreed. Shall I create a JIRA issue and PR for this abstraction? > > I think reviewing on the interface code will be clearer. > > > > Regards, > > Jack

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
> Aniket > > > > On Sun, Oct 2, 2016 at 10:25 PM, Jacky Li <[hidden email] > > wrote: > > > After a second thought regarding the index part, another option is that to > > have a very simple Segment definition which can only list all f

Re: Abstracting CarbonData's Index Interface

2016-10-02 Thread Jacky Li
). In the future, developers are free to create a MultiIndexSegment to select the index internally. Is this option better? Regards, Jacky > On Oct 3, 2016, at 11:00 AM, Jacky Li <jacky.li...@qq.com> wrote: > > I am currently thinking these abstractions: > > - A SegmentManager is the global manager of

Re: Abstracting CarbonData's Index Interface

2016-10-02 Thread Jacky Li
straction required for both Index and Index store. > Also multi-column index(composite index) needs to be considered. > > Regards, > Ramana > > On Sat, Oct 1, 2016 at 11:01 AM, Jacky Li <jacky.li...@qq.com> wrote: > >> Hi community, >> >>Currentl

[jira] [Created] (CARBONDATA-282) Add segment management in CarbonExample

2016-10-01 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-282: --- Summary: Add segment management in CarbonExample Key: CARBONDATA-282 URL: https://issues.apache.org/jira/browse/CARBONDATA-282 Project: CarbonData Issue Type

Abstracting CarbonData's Index Interface

2016-09-30 Thread Jacky Li
Hi community, Currently CarbonData has built-in index support, which is one of the key strengths of CarbonData. Using the index, CarbonData can do very fast filter queries by filtering at the block and blocklet level. However, it also introduces memory consumption of the index tree and impacts first query

Re: intellij compiling issue

2016-09-28 Thread Jacky Li
Hi Ravindra, Since the release 0.1.1 tag is made, can we start reviewing and merging PR127 and PR132 now? I think it will solve Qingqing’s problem. Regards, Jacky > On Sep 28, 2016, at 5:15 PM, Ravindra Pesala wrote: > > Hi , > > Please have a look into following jira issue to solve

[jira] [Created] (CARBONDATA-265) Improve Dataframe write to CarbonData file from CSV file

2016-09-21 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-265: --- Summary: Improve Dataframe write to CarbonData file from CSV file Key: CARBONDATA-265 URL: https://issues.apache.org/jira/browse/CARBONDATA-265 Project: CarbonData

[jira] [Created] (CARBONDATA-257) Make CarbonData readable through Spark/MapReduce program

2016-09-18 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-257: --- Summary: Make CarbonData readable through Spark/MapReduce program Key: CARBONDATA-257 URL: https://issues.apache.org/jira/browse/CARBONDATA-257 Project: CarbonData

[jira] [Created] (CARBONDATA-240) Use SQLContext to query CarbonData directly without creating table

2016-09-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-240: --- Summary: Use SQLContext to query CarbonData directly without creating table Key: CARBONDATA-240 URL: https://issues.apache.org/jira/browse/CARBONDATA-240 Project

[jira] [Created] (CARBONDATA-212) Use SQLContext to read CarbonData file

2016-09-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-212: --- Summary: Use SQLContext to read CarbonData file Key: CARBONDATA-212 URL: https://issues.apache.org/jira/browse/CARBONDATA-212 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-211) Support compress CarbonData file create table options

2016-09-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-211: --- Summary: Support compress CarbonData file create table options Key: CARBONDATA-211 URL: https://issues.apache.org/jira/browse/CARBONDATA-211 Project: CarbonData

[jira] [Created] (CARBONDATA-210) Support loading compressed CSV file

2016-09-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-210: --- Summary: Support loading compressed CSV file Key: CARBONDATA-210 URL: https://issues.apache.org/jira/browse/CARBONDATA-210 Project: CarbonData Issue Type: Bug

Re: [VOTE] Apache CarbonData 0.1.0-incubating release

2016-08-20 Thread Jacky Li
+1 (binding) > On Aug 20, 2016, at 11:00 AM, Jihong Ma wrote: > > +1 (binding) > > Great work! > > Jihong > > -Original Message- > From: chenliang613 [mailto:chenliang6...@gmail.com] > Sent: Friday, August 19, 2016 7:33 PM > To: dev@carbondata.incubator.apache.org >

Re: Open Discussion:Apache CarbonData Roadmap

2016-08-09 Thread Jacky Li
I think William’s point is valid, we should focus mainly on usability improvement in 0.2.0 Besides what Liang has pointed out, I have a brief list in mind that can be planned in several releases, if they make sense for the community users. They are mainly for more integration and more

Re: [PROPOSAL] How to merge a pull request

2016-08-09 Thread Jacky Li
definitely +1 > On Aug 9, 2016, at 1:33 PM, Jean-Baptiste Onofré wrote: > > Yes good idea. > > I'm thinking about a github PR template too as we use in Beam. > > Regards > JB > > On 08/09/2016 07:31 AM, Henry Saputra wrote: >> This is great stuff, thanks for taking stab at it, JB. >>

[jira] [Created] (CARBONDATA-61) Change Cube to Table

2016-07-18 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-61: -- Summary: Change Cube to Table Key: CARBONDATA-61 URL: https://issues.apache.org/jira/browse/CARBONDATA-61 Project: CarbonData Issue Type: Bug Affects