Re: [ANNOUNCE] Apache CarbonData 1.0.0-incubating released

2017-02-06 Thread Liang Chen
-hoc queries. How can > I leverage CarbonData for my business, please? > > On Sun, Feb 5, 2017 at 5:27 PM, Liang Chen <chenliang6...@gmail.com> > wrote: > > > Hi xiaoqiao > > > > Very happy to see that you will keep contributing on CarbonData, "Do

Re: Discussion about getting execution duration about a query when using sparkshell+carbondata

2017-02-06 Thread Liang Chen
Hi, I used the method below in spark-shell for a demo, for your reference:
    import org.apache.spark.sql.catalyst.util._
    benchmark {
      carbondf.filter($"name" === "Allen" and $"gender" === "Male" and $"province" === "NB" and $"singler" === "false").count
    }
Regards Liang 2017-02-06 22:07 GMT-05:00
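For readers without the Spark catalyst utilities on the classpath, the timing idea in the reply above can be approximated in plain Scala. This is a minimal sketch, not Spark's `benchmark` implementation: `QueryTimer` and `timed` are hypothetical names, and the helper only measures wall-clock time around a single evaluation of the block.

```scala
// Hypothetical helper, not part of Spark or CarbonData.
// Runs a block once, prints the elapsed wall-clock time, and returns
// both the block's result and the elapsed time in milliseconds.
object QueryTimer {
  def timed[A](label: String)(body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body // the by-name block is evaluated exactly once here
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label took $elapsedMs%.3f ms")
    (result, elapsedMs)
  }
}
```

In spark-shell one might wrap the query as `QueryTimer.timed("filter+count") { carbondf.filter(...).count }`. Because Spark evaluates lazily, the timed block should end in an action such as `count`; otherwise only plan construction is measured.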

Re: why there are no join in the official benchmark test

2017-02-06 Thread Liang Chen
Hi We are testing based on the TPC-H/TPC-DS benchmarks; the report will be shared soon. Regards Liang 2017-02-07 1:28 GMT-05:00 Yinwei Li <251469...@qq.com>: > Hi all, > > > In Apache CarbonData Performance Benchmark(0.1.0) there are no join in > all SQLs, what's the main reason? > > > I want to

Re: store location can't be found

2017-02-03 Thread Liang Chen
Hi Have you configured as per the guide : https://github.com/apache/incubator-carbondata/blob/master/docs/installation-guide.md Regards Liang 2017-02-04 10:42 GMT+08:00 Mars Xu : > Hello All, > I met a problem of file not exist. it looks like the store >

Re: [ANNOUNCE] Apache CarbonData 1.0.0-incubating released

2017-02-05 Thread Liang Chen
Hi xiaoqiao Very happy to see that you will keep contributing to CarbonData; "Double Array Trie" is really a good feature to improve the dictionary part. Yes, CarbonData's goal is to solve complex and diverse scenarios. Please let us (the community) know if you deploy CarbonData on scenario system

Re: Introducing V3 format.

2017-02-15 Thread Liang Chen
ld store. So > backward compatibility works even though we jump to V3 format. > > Regards, > Ravindra. > > On 16 February 2017 at 04:18, Liang Chen > chenliang6136@ > wrote: > >> Hi Ravi >> >> Thank you bringing the discussion to mailing list, i h

Re: [DISCUSS] Graduation to a TLP (Top Level Project)

2017-02-20 Thread Liang Chen
Hi JB Thanks for starting the discussion and driving it. I will ping you by Skype and email to complete some TODO tasks. One query: for the license analysis section, why are there many unknown licenses? Do we need to fix it? Regards Liang -- View this message in context:

Re: data lost when loading data from csv file to carbon table

2017-02-20 Thread Liang Chen
Hi Already raised one JIRA issue: How to handle the bad records. https://issues.apache.org/jira/browse/CARBONDATA-714 Regards Liang -- View this message in context:

Re: carbondata vs. impala performance test under benchmark tpc-ds

2017-02-25 Thread Liang Chen
Hi Thank you for sharing the test result. It would be more reasonable to do the comparison with the same compute engine: Spark 2.1 + Parquet vs. Spark 2.1 + CarbonData. Are you interested in doing this test along with us? (carbondata, parquet) Regards Liang 李寅威 wrote > Hi all,

Re: Introducing V3 format.

2017-02-15 Thread Liang Chen
Hi Ravi Thank you for bringing the discussion to the mailing list. I have one question: how do we ensure backward compatibility after introducing the new format? Regards Liang Jean-Baptiste Onofré wrote > Agree. > > +1 > > Regards > JB > > On Feb 15, 2017, 09:09, at 09:09, Kumar Vishal >

Re: Exception throws when I load data using carbondata-1.0.0

2017-02-15 Thread Liang Chen
Hi He xiaoqiao The quick start uses Spark local mode. Your case is a YARN cluster, please check: https://github.com/apache/incubator-carbondata/blob/master/docs/installation-guide.md Regards Liang 2017-02-15 3:29 GMT-08:00 Xiaoqiao He : > hi Manish Gupta, > > Thanks for you

Re: question about presto integration

2017-01-17 Thread Liang Chen
Hi 1. Yes, CarbonData would consider broader integration with different engines, including Presto. 2. As far as I know, one contributor from Ctrip is working on the integration between CarbonData and Presto; once this contributor finishes it, this feature will be considered for the roadmap. Regards

Re: Re: Failed to APPEND_FILE, hadoop.hdfs.protocol.AlreadyBeingCreatedException

2017-01-20 Thread Liang Chen
Hi
    mvn -DskipTests -Pspark-1.5 -Dspark.version=1.5.2 clean package
Please refer to the build doc: https://github.com/apache/incubator-carbondata/tree/master/build Regards Liang 2017-01-20 16:00 GMT+08:00 彭 : > I build the jar with hadoop2.6, like "mvn package -DskipTests >

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Liang Chen
Hi A couple of questions: 1) For the SORT_KEY option: are the "MDK index, inverted index, minmax index" built only for the columns specified in the option (SORT_KEY)? 2) If users don't specify TABLE_DICTIONARY, then all columns don't make dictionary encoding, and all shuffle operations are

Re: List the supported datatypes in carbondata

2016-11-08 Thread Liang Chen
Hi Please find the data type list: https://cwiki.apache.org/confluence/display/CARBONDATA/Carbon+Data+Types Regards Liang cenyuhai wrote > I think we should make it clear that what datatypes are supported in > carbondata. > > these types are confused (int or integer, short or smallint, long or

As planned, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-07 Thread Liang Chen
Hi all In 0.2.0 version of CarbonData, there are major performance improvements like blocklets distribution, support BZIP2 compressed files, and so on added to enhance the CarbonData performance significantly. Along with performance improvement, there are new features added to enhance

Re: join mail list

2016-11-11 Thread Liang Chen
Please send mail to : "dev-subscr...@carbondata.incubator.apache.org" for automatically joining. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/join-mail-list-tp2838p2869.html Sent from the Apache CarbonData Mailing List archive mailing

Re: List of File Formats supported to Load Data

2016-11-06 Thread Liang Chen
Hi Apache CarbonData has made good integration with Apache Spark, so you can first write any type of data to dataframe, then transfer dataframe to CarbonData. Regards Liang Pallavi Singh wrote > Hi, > > Can you please specify what all file formats are supported for Loading > Data > into Carbon

Re: Create table with columns contains spaces in name.

2016-10-19 Thread Liang Chen
Hi Harmeet Thank you for reporting this issue. Would you like to fix this issue? Regards Liang Harmeet Singh wrote > Thanks ravi, I will be raise on Jira. -- View this message in context:

Re: please vote and comment: remove thrift solution

2016-10-24 Thread Liang Chen
Hi I prefer the new solution for fixing thrift issues: directly use Java code (the thrift compiler compiles the carbondata format files to Java code) to build, so users don't need to do any thrift installation. +1 for the new solution. Regards Liang QiangCai wrote > Hi > > Currently, There are two

Re: please vote and comment: remove thrift solution

2016-10-25 Thread Liang Chen
r from repo. If anybody wants to change the thrift > format code then he can compile thrift format code by using separate maven > profile and upload the jar to snapshot repo. > > > Thanks, > Ravi. > > On 24 October 2016 at 14:09, Liang Chen > chenliang6136@ > wrote: &

Beijing Apache CarbonData meetup:https://www.meetup.com/Apache-Carbondata-Meetup/events/235013117/

2016-10-21 Thread Liang Chen
Hi all Saturday, October 29, 2016 1:30 PM to 5:30 PM You can apply through this link : https://www.meetup.com/Apache-Carbondata-Meetup/events/235013117/ Regards Liang

Re: Unable to perform compaction,

2016-10-21 Thread Liang Chen
Hi Can you provide the detailed test steps and error logs? Regards Liang 2016-10-20 19:35 GMT+08:00 prabhatkashyap : > Hello, > There is some issue with compaction. Auto and force compaction are not > working. > In spark logs I got this error: > > > > > > > > > > -- >

Re: Single Pass Data Load Design

2016-11-14 Thread Liang Chen
Hi Yes, good feature. This improvement would significantly improve data load performance. Can you provide a sequence diagram for the whole data load process? Regards Liang 2016-11-14 15:42 GMT+08:00 Jacky Li : > Hi Ravindra, > > Thanks for proposing this design. It is

Re: [Feature ]Design Document for Update/Delete support in CarbonData

2016-11-22 Thread Liang Chen
Hi Aniket Thank you for finishing the good design documents. A couple of inputs from my side: 1. Please also add the info mentioned below (Rowid definition etc.) to the design documents. 2. In page6 :"Schema change operation can run in parallel with Update or Delte operations, but not with another schema

Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-24 Thread Liang Chen
Hi Yes, good proposal. CarbonData 0.x versions integrate with Spark 1.x, and the data load solution of 0.x uses Kettle. CarbonData 1.x versions integrate with Spark 2.x, and the data load solution of 1.x will not use Kettle. That would be helpful to reduce maintenance cost through

Re: Re: CarbonData propose major version number increment for nextversion (to 1.0.0)

2016-11-25 Thread Liang Chen
sh gupta" > tomanishgupta18@ > wrote: > >> +1 >> >> Regards >> Manish Gupta >> >> On Thu, Nov 24, 2016 at 7:30 PM, Kumar Vishal > kumarvishal1802@ > >> wrote: >> >> > +1 >> > >> > -Regards >>

Re: carbon data

2016-11-28 Thread Liang Chen
Hi Lionel, you don't need to create the table first. Please find the example code in ExampleUtils.scala:
    df.write
      .format("carbondata")
      .option("tableName", tableName)
      .option("compress", "true")
      .option("useKettle", "false")
      .mode(mode)
      .save()
Preparing API docs is in progress.

Please vote and advise on building thrift files

2016-11-16 Thread Liang Chen
Hi all Please vote on the proposals below or suggest a better proposal for building thrift files. --- CarbonData is a file format and introduces Apache Thrift to support multiple languages and any language can

Re: [ANN] Kumar Vishal as new CarbonData committer

2016-11-01 Thread Liang Chen
Hi Kumar Vishal Congrats to you and welcome aboard! Regards Liang 2016-11-01 14:23 GMT+08:00 Jean-Baptiste Onofré : > Hi all, > > I'm pleased to announce that the PPMC has invited Kumar Vishal as new > CarbonData committer, and the invite has been accepted ! > > Congrats to

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Liang Chen
Hi jihong I am not sure that users will accept using an extra tool to do this work, because providing a tool or doing a scan the first time per table for most of the global dict costs the same from the users' perspective, and maintaining the dict file also costs the same; they always expect that the system can automatically

Re: Discussion shall CI support run carbondata based on multi version spark?

2016-10-13 Thread Liang Chen
Yes, we need to solve it; the CI should support different Spark versions. Regards Liang zhujin wrote > One issue: > I modified the spark.version in pom.xml,using spark1.6.2, then compilation > failed. > > > Root cause: > There was a "unused import statement" warning in CarbonOptimizer class >

Re: Discussion how to create the CarbonData table with good performance

2016-10-16 Thread Liang Chen
Hi Thanks for sharing this experience. Can you put these FAQs on CWIKI: https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Home Regards Liang bill.zhou wrote > Discussion how to create the CarbonData table with good performance > Suggestion to create Carbon table >

Re: Discussion(New feature) Support Complex Data Type: Map in Carbon Data

2016-10-16 Thread Liang Chen
Hi Vimal Thank you for starting the discussion. Since keys of Map data can only be primitive, can you list the types which will be supported? (Int, String, Double, ...) To discuss more conveniently, you can go ahead and use Google Docs. After the design document is finalized, please archive and upload it

Re: save dataframe error, why loading ./TEMPCSV ?

2016-12-13 Thread Liang Chen
Hi tempCSV is just a temp folder; it will be deleted after the data load to the carbon table finishes. You can set some breakpoints to debug the example DataFrameAPIExample.scala, and you will find the temp folder. Regards Liang 2016-12-14 13:55 GMT+08:00 Li Peng : >

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Liang Chen
Hi Jacky Thank you for starting a good discussion. Let me see if I understand your points: Scenario1 is like the current data load solution (0.2.0). 1.0.0 will provide a new solution option of "single-pass data loading" to meet this kind of scenario: For subsequent data loads if the most dictionary code has

Re: error when save DF to carbondata file

2016-12-13 Thread Liang Chen
Hi As discussed, please use the 0.2.0 version, and use the load method. 2016-12-13 14:08 GMT+08:00 Lu Cao : > Hi Dev team, > I run spark-shell in my local spark standalone mode. It returned error > > java.io.IOException: No input paths specified in job > > when I was trying

Re: [Discussion] Parsing values during data load should adopt a strict check or lenient check mechanism

2016-12-06 Thread Liang Chen
Hi Thank you for starting a good discussion. I propose a strict check mechanism to avoid the problems you mentioned below. And the behavior should be the same for both dimensions and measures. In a word, we need to process the actual data type as per the user's input. Regards Liang

Re: Hi dev,Apache CarbonData CI now is working for auto-checking all PRs

2016-12-06 Thread Liang Chen
Hi Share the full picture with all of you about Apache CarbonData CI. -- 1.CI Environment For supporting more complex CI test(like cluster), we built the Apache CarbonData Jenkins CI which is running on a cloud machine with IP

Re: carbondata test join question

2016-12-14 Thread Liang Chen
Hi geda As we know, CarbonData's key feature is index. About tuning SQL, you can refer to : https://cwiki.apache.org/confluence/display/CARBONDATA/FAQ Regards Liang -- View this message in context:

Re: [Discussion] Some confused properties

2016-12-08 Thread Liang Chen
Hi Thank you for starting the discussion. The storelocation is for storing all CarbonData files. Regards Liang cenyuhai wrote > Hi, all: > I am trying to use carbon, but I am confused about the properties as > below: > > > carbon.storelocation=hdfs://hacluster/Opt/CarbonStore > #Base

Re: carbondata-0.2 load data failed in yarn mode

2016-12-08 Thread Liang Chen
Hi Have you solved this issue after applying new configurations? Regards Liang geda wrote > hello: > i test data in spark local mode ,then load data inpath to table ,works > well. > but when i use yarn-client mode, with 1w rows , size :940k ,but error > happened ,there is no lock find in

Re: query on carbondata table return error.

2016-12-08 Thread Liang Chen
Hi Can you raise one JIRA to report this issue? Regards Liang Cao Lu 曹鲁 wrote > Hi dev team, > I build the carbondata from master branch and distributed to the spark on > yarn cluster. > The data successfully loaded and count(*) is OK, but when I tried to query > the detail data, it returns

Re: About hive integration

2016-12-08 Thread Liang Chen
Hi Agree. Hive has been widely used; this is a consensus. The Apache CarbonData community already has a plan to support Hive integration, and we look forward to seeing your contribution on Hive integration too :) Regards Liang cenyuhai wrote > Hi, all: > Now carbondata is not working in hive

[VOTE] Apache CarbonData 1.0.0-incubating (RC1)

2017-01-10 Thread Liang Chen
Hi Please vote on releasing the following candidate as Apache CarbonData version 1.0.0. The vote will be open for at least 72 hours. If this vote passes (we need at least 3 binding votes, meaning three votes from the PPMC), I will forward to gene...@incubator.apache.org for the IPMC votes. [ ]

Re: [jira] [Created] (CARBONDATA-624) Complete CarbonData document to be present in git and the same needs to sync with the carbondata.apace.org and for further updates.

2017-01-11 Thread Liang Chen
OK, thank you for starting this work. Please note one thing: only put .md files on GitHub; adding other kinds of files, like PDF, text and so on, is not suggested. Regards Liang -- View this message in context:

Re: [VOTE] Apache CarbonData 1.0.0-incubating (RC1)

2017-01-10 Thread Liang Chen
[+1] Support Spark2.1 > [+1]New load data solution without kettle > [-1] IUD(Supported by Spark 1.5) > [+1]Performance improvement > > > > > > On Jan 11, 2017, 12:14 AM +0800, Liang Chen , wrote: > > Hi > > > > Please vote on releasing the following

Re: [VOTE] Apache CarbonData 1.0.0-incubating (RC1)

2017-01-10 Thread Liang Chen
t; Thanks for the wonderful working. > > I am very interesting and want the following features from a customer view. > > > > [+1] Support Spark2.1 > [+1]New load data solution without kettle > [-1] IUD(Supported by Spark 1.5) > [+1]Performance improvement > > > > > &g

Re: Problem while copying file from local store to carbon store

2017-01-09 Thread Liang Chen
Hi Please use spark-shell to create carboncontext, you can refer to these articles : https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67635497 Regards Liang -- View this message in context:

Re: Problem while copying file from local store to carbon store

2017-01-09 Thread Liang Chen
putStream.open0(Native Method) > ... > INFO 10-01 10:29:59,547 - [test_table: Graph - > MDKeyGentest_table][partitionID:0] > ---logs print by liyinwei end - > ERROR 10-01 10:29:59,547 - [test_table: Graph - > MDKeyGentest_table][partiti

Hi dev,Apache CarbonData CI now is working for auto-checking all PRs

2016-12-05 Thread Liang Chen
Hi dev Apache CarbonData CI is now working for auto-checking all PRs. This is a job in Jenkins CI named ApacheCarbonPRBuilder, which is running on a cloud machine with IP http://136.243.101.176:8080/ ; anybody can access this machine and check the build status and result. - When a

Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-12-01 Thread Liang Chen
Hi Thanks for all of your comments; I will change the current master-SNAPSHOT version to 1.0.0. Regards Liang Venkata Gollamudi wrote > Hi All, > > CarbonData 0.2.0 has been a good work and stable release with lot of > defects fixed and with number of performance improvements. >

Re: minor compact throw err 'IndexBuilderException'

2017-01-05 Thread Liang Chen
Hi 1. I just tested the 0.2 version on my machine, and it is working fine. - scala> cc.sql("ALTER TABLE connectdemo1 COMPACT 'MINOR'") INFO 05-01 23:46:54,111 - main Query [ALTER TABLE CONNECTDEMO1 COMPACT 'MINOR'] INFO 05-01

Re: how to make carbon run faster

2017-01-01 Thread Liang Chen
Hi Thanks for starting to try the Apache CarbonData project. There may be various reasons for the test result; I assume that you made a time-based partition for the ORC data, right? 1. Can you tell how many rows the SQL generated? 2. You can try more SQL queries, for example : select *

Re: [UT Fail Report] UT can not pass when run with branch master

2017-01-04 Thread Liang Chen
Hi It is fixed; now master can pass compilation. Thanks for pointing it out. Regards Liang hexiaoqiao wrote > UT fails when run with branch master of carbondata ( > https://github.com/apache/incubator-carbondata/tree/master). > > exception as following: > >>

Re: Reply: how to make carbon run faster

2017-01-04 Thread Liang Chen
Hi First: I suggest you reload the data, loading all 35G of data in one go, and check the query effectiveness again. Second: After you finish the above E2E test, you will understand the whole process of Carbon. Then I suggest you start to read the source code and some technical documents for further

Re: [Improvement] Carbon query gc problem

2016-12-19 Thread Liang Chen
Hi +1. Storing data off-heap to avoid the GC problem will help performance more. Kumar Vishal wrote > There are lots of gc when carbon is processing more number of > records during query, which is impacting carbon query performance. To solve > this gc problem happening when query output is

Re: carbondata Independent reader

2016-12-20 Thread Liang Chen
Hi For Q1: Carbon data is stored under storePath, which can be specified anywhere. Under "storePath", there are two folders: Fact and Metadata. As per the info you provided, you specified the load path as the "storePath"; this is why you cannot find the info in HDFS. For Q2: Please refer to

Re: same query and I change the value than throw a error

2016-12-21 Thread Liang Chen
Hi Are you using hive client to run sql to query carbon table ? jdbc:hive2://172.12.1.24:1> select * from hotel_event_2 where c1 = "key_label_1_10" and c3 > "2005-11-18 00:28:02"; Regards Liang sailingYang wrote > hi I use

Re: How to compile the latest source code of carbondata

2016-12-18 Thread Liang Chen
ster ~]$ cd carbondata/bin/ > [hadoop@master bin]$ ll > total 8 > -rwxrwxr-x 1 hadoop hadoop 3879 Dec 19 14:54 carbon-spark-shell > -rwxrwxr-x 1 hadoop hadoop 2820 Dec 19 14:54 carbon-spark-sql > > > > is this phenomenon normal ? > > > > > > -

Re: [jira] [Created] (CARBONDATA-562) Carbon Context initialization is failed with spark 1.6.3

2016-12-24 Thread Liang Chen
Hi Babulal CarbonData doesn't support Spark 1.6.3; you can try Spark 1.6.1 and 1.6.2. Please refer to : https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration Regards Liang 2016-12-25 13:51 GMT+08:00 Babulal (JIRA) : > Babulal created

Re: Reply: Dictionary file is locked for updation

2016-12-27 Thread Liang Chen
Hi Updated, thanks for pointing out the issue. Regards Liang 李寅威 wrote > thx QiangCai, the problem is solved. > > > so, maybe it's better to correct the document at > https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide, > change the value of

Re: [jira] [Created] (CARBONDATA-559) Job failed at last step

2016-12-25 Thread Liang Chen
Copied the below information from Apache JIRA. -- Hi Lionel Global dictionary is generated successfully but data loading graph is not started because it seems that kettle home at executor side is not set properly as displayed in logs. INFO 23-12 16:58:47,461 -

Re: [Discussion]Simplify the deployment of carbondata

2016-12-25 Thread Liang Chen
Hi Thank you for starting a good discussion. For 1 and 2, I agree; version 1.0.0 will support it. For 3: we need to keep the parameter so users can specify carbon's store location. If users don't specify the carbon store location, the default location you suggested can be used:

Re: etl.DataLoadingException: The input file does not exist

2016-12-22 Thread Liang Chen
Hi This is because you use cluster mode, but the input file is a local file. 1. If you use cluster mode, please load files from Hadoop (HDFS). 2. If you just want to load local files, please use local mode. 李寅威 wrote > Hi, > > when i run the following script: > > > scala>val dataFilePath = new >
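The distinction in the reply above (cluster mode needs a path that every executor can read, while local mode can read local files) can be checked before a load is submitted. A minimal sketch, assuming the path is a well-formed URI string; `LoadPathCheck`, `isLocal`, and `validate` are hypothetical helpers and not part of CarbonData:

```scala
import java.net.URI

// Hypothetical pre-flight check, not a CarbonData API.
object LoadPathCheck {
  // A path counts as local when it has no scheme or uses the "file" scheme.
  // Note: new URI(...) throws URISyntaxException for malformed input
  // (for example, paths containing spaces).
  def isLocal(path: String): Boolean = {
    val scheme = Option(new URI(path).getScheme).map(_.toLowerCase)
    scheme.forall(_ == "file")
  }

  // In cluster mode, reject local input paths up front; otherwise pass through.
  def validate(path: String, clusterMode: Boolean): Either[String, String] =
    if (clusterMode && isLocal(path))
      Left(s"Input '$path' is a local file, but the job runs in cluster mode")
    else
      Right(path)
}
```

Returning `Either` keeps the check side-effect free, so the caller decides whether to abort the load or only log a warning.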

Re: discussion about benchmark standard that carbondata used

2017-01-15 Thread Liang Chen
Hi Agree. Currently we are testing as per TPC-H. In the future we will also test TPC-DS; do you want to join us for the benchmark test work? Regards Liang 2017-01-16 8:58 GMT+08:00 251469031 <251469...@qq.com>: > Hi all, > > > Benchmark test can measure the performance of a system.

Re: Questions about dictionary-encoded column and MDK

2017-03-23 Thread Liang Chen
Hi Can you provide your full exception info? Regards Liang 2017-03-23 13:54 GMT+05:30 Jin Zhou : > Hi, > > Recently I'm doing some tests on spark2.1.0+carbondata1.0.0 and have some > questions: > > 1)Exception is thrown when table created without any dictionary column. > Does

Re: [DISCUSSION] Initiating Apache CarbonData-1.1.0 incubating Release

2017-03-26 Thread Liang Chen
Hi Yes, the update and delete feature with spark-2.x will be supported after 1.1.0. As planned, 1.2 or earlier would support it. Regards Liang xm_zzc wrote > Hi, does this version support for the updating and deleting with > spark-2.1? Seems like it does not support, what time is it planned to >

Re: [DISCUSSION] Initiating Apache CarbonData-1.1.0 incubating Release

2017-03-26 Thread Liang Chen
Hi +1 for starting to prepare the new release 1.1. Great progress; the new file format V3 would significantly improve performance. Regards Liang 2017-03-26 10:46 GMT+05:30 Ravindra Pesala : > Hi All, > > As planned we are going to release Apache CarbonData-1.1.0. Please discuss >

Re: Re:Re:Re:Re:Re:Re: insert into carbon table failed

2017-03-27 Thread Liang Chen
Hi, please enable the vector reader; it might help the limit query.
    import org.apache.carbondata.core.util.CarbonProperties
    import org.apache.carbondata.core.constants.CarbonCommonConstants
    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER, "true")
Regards Liang a wrote

Re: Re:Re:Re:Re:Re:Re: insert into carbon table failed

2017-03-26 Thread Liang Chen
Hi 1. Using your current test environment (CarbonData 1.0 + Spark 1.6), please split the 2 billion data set into 4 pieces (0.5 billion each) and load the data again. 2. For CarbonData 1.0 + Spark 1.6 with Kettle for loading data, please configure the 3 parameters below in carbon.properties (note: please copy

Re: insert into carbon table failed

2017-03-25 Thread Liang Chen
Hi Please provide cardinality info (distinct values) for all columns. Regards Liang ww...@163.com wrote > Hello! > > 0. The failure > When i insert into carbon table,i encounter failure. The failure is as > follow: > Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most >

Re: Need help in configuring dataload.properties

2017-03-30 Thread Liang Chen
Hi Please refer to : https://github.com/apache/incubator-carbondata/blob/master/docs/installation-guide.md Regards Liang 2017-03-30 19:19 GMT+05:30 Srinath Thota : > Hi Team, > > > I have configured Carbon in spark standalone mode as per the documents and > available

Re: Questions about dictionary-encoded column and MDK

2017-03-23 Thread Liang Chen
Hi 1. The system builds the MDK index for dimensions (string columns are dimensions, numeric columns are measures), so you have to specify at least one dimension (string column) for building the MDK index. 2. You can set a numeric column with DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE to build the MDK index. For case2,
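The rule in points 1 and 2 of the reply above can be written down as a small check. This is only an illustration of the rule as stated in the message, not CarbonData's actual logic; `MdkRule`, `Column`, `dimensions`, and `canBuildMdk` are hypothetical names, and data types are compared as plain strings:

```scala
// Hypothetical model of the rule described in the message, not CarbonData code.
object MdkRule {
  final case class Column(name: String, dataType: String)

  // String columns are dimensions by default. A numeric column is treated
  // as a dimension when it is listed in DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE.
  def dimensions(columns: Seq[Column],
                 dictionaryInclude: Set[String] = Set.empty,
                 dictionaryExclude: Set[String] = Set.empty): Seq[Column] =
    columns.filter { c =>
      c.dataType.equalsIgnoreCase("string") ||
        dictionaryInclude.contains(c.name) ||
        dictionaryExclude.contains(c.name)
    }

  // The MDK index needs at least one dimension column.
  def canBuildMdk(columns: Seq[Column],
                  dictionaryInclude: Set[String] = Set.empty,
                  dictionaryExclude: Set[String] = Set.empty): Boolean =
    dimensions(columns, dictionaryInclude, dictionaryExclude).nonEmpty
}
```

Under this model, a table with only numeric columns and no dictionary options has no dimension, which matches the situation raised in the thread, where table creation without any dictionary column failed.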

Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

2017-03-23 Thread Liang Chen
lter table hive_carbon add columns(name string, scale decimal, country > string, salary double); > > > > > > 6.check table schema > > > execute "show create table hive_carbon" > > > > > > 7. execute "select * from hive_carbon" and "

Re: Re:Re: Re: Optimize Order By + Limit Query

2017-03-30 Thread Liang Chen
Hi +1 for simafengyun's optimization, it looks good to me. I propose to do "limit" pushdown first, similar to filter pushdown. What is your opinion? @simafengyun For "order by" pushdown, let us work out an ideal solution to consider all aggregation push down cases. Ravindra's comment is

Re: [DISCUSSION]: (New Feature) Streaming Ingestion into CarbonData

2017-03-29 Thread Liang Chen
Hi Aniket Thanks for your great contribution. The feature of ingesting streaming data into CarbonData would be very useful for some real-time query scenarios. Some inputs from my side: 1. I agree with approach 2 for streaming file format, the performance for query must be ensured. 2. Whether

Re: question about dimension's sort order in blocklet level

2017-03-27 Thread Liang Chen
Hi Can you provide a table to show your info? I can't see it very clearly. Columns of high cardinality (>100) would not be dictionary-encoded. Regards Liang 2017-03-27 14:32 GMT+05:30 马云 : > Hi DEV, > > I create table according to the below SQL > > cc.sql(""" > >

Re: carbondata find a bug

2017-03-27 Thread Liang Chen
Hi tianli First, please send mail to dev-subscr...@carbondata.incubator.apache.org for joining mailing list group. Then you can send and receive mail from dev@carbondata.incubator.apache.org. Can you raise one JIRA at https://issues.apache.org/jira/browse/CARBONDATA, and raise one pull request

Re: Questions about dictionary-encoded column and MDK

2017-03-25 Thread Liang Chen
lared in create table statement > > On Thu, Mar 23, 2017 at 11:51 PM, Liang Chen <chenliang6...@gmail.com> > wrote: > > > Hi > > > > 1.System makes MDK index for dimensions(string columns as dimensions, > > numeric > > columns as measures) , so you have to

Re: Problem with creating a table in Spark 2.

2017-04-03 Thread Liang Chen
Hi Please check if the below path is correct in your machine? /user/hive/warehouse/carbon/ Regards Liang 2017-04-03 18:05 GMT+05:30 Marek Wiewiorka : > Hi All - I'm trying to follow an example from the quick start guide and in > spark-shell trying to create a

Re: Dimension column of integer type - to exclude from dictionary

2017-04-04 Thread Liang Chen
Hi Sanoj First, let me see if I understand your requirement: you only want to build an index for column "Account", but don't want to build a dictionary for column "Account", is that right? If my understanding above is right, then the "SORT_COLUMNS" feature David mentioned will satisfy your requirements.

Re: Question about loading the data dictionary

2017-04-01 Thread Liang Chen
t I don't know which side of the generated dictionary file path > > > -- Original Message ------ > *From:* "Liang Chen";<chenliang...@apache.org>; > *Sent:* April 1, 2017 (Saturday) 4:49 PM > *To:* "于天星"<784606...@qq.com>; > *Subject:* Re: About loading the data dictionary

Re: [DISCUSSION]implement delta encoding for numeric type column in SORT_COLUMNS

2017-04-05 Thread Liang Chen
Hi David Thanks for starting the discussion of this new feature. Can you explain the major benefits of delta encoding for numeric type columns? Regards Liang 2017-04-05 16:01 GMT+05:30 QiangCai : > Hi all, > > Now we plan to implement delta encoding

Re: CarbonData performance benchmarking

2017-04-12 Thread Liang Chen
Hi 1. Did you use the latest master version, or 1.0? I suggest you use master to test. 2. Have you tested other TPC-H queries which include where/filter? 3. In your case, is the query slow, or is the "write.format" below slow? write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj1/")

Re: java.io.FileNotFoundException: file:/data/carbon_data/default/carbon_table/Metadata/schema.write

2017-04-15 Thread Liang Chen
Hi Please check if you have permission for the directory: Constants.METASTORE_DB; you can use "chmod" to add permission. Regards Liang xm_zzc wrote > Hi all: > Please help. I directly ran a CarbonData demo program on Eclipse, which > copy from >

Re: [New Feature] Alter table support in carbondata

2017-03-09 Thread Liang Chen
Hi Thanks for starting this discussion of the alter table feature. A couple of comments: 1. For "change of data type", is only INT to BIGINT supported, or not? 2. Is adjusting the order of columns for MDK supported, with compaction to re-sort data as per the new order of columns, or

Re: I loaded the data with the timestamp field unsuccessful

2017-03-08 Thread Liang Chen
Hi Has the issue been fixed? BTW, you don't need to add the date column to DICTIONARY_INCLUDE; it builds an index for date/timestamp columns. Regards Liang kex wrote > I loaded the data with the timestamp field unsuccessful,and timestamp > field is null. > > my sql: > carbon.sql("create TABLE IF NOT EXISTS

Re: Removing of kettle code from Carbondata

2017-03-10 Thread Liang Chen
Hi Agree, +1. The new data load (through Spark) is quite stable and performs well, so I agree to remove the Kettle flow for data loading. Regards Liang 2017-03-11 9:51 GMT+08:00 Ravindra Pesala : > Hi All, > > I guess it is time to remove the kettle flow from Carbondata

Apache CarbonData got the BLACKDUCK award: https://www.blackducksoftware.com/open-source-rookies-2016

2017-03-10 Thread Liang Chen
Hi ALL *Apache CarbonData got the BLACKDUCK award: * https://www.blackducksoftware.com/open-source-rookies-2016: For nine years, the Black Duck Open Source Rookies of the Year awards have recognized some of the most innovative and influential open source projects launched during the previous

Re: Apache CarbonData online meetup on 13th Mar,2017

2017-03-08 Thread Liang Chen
Hi phalodi Sorry for this. Apache CarbonData community will organize meetup in India soon. Regards Liang phalodi wrote > Hi , I also want to join this meetup but when i register for the meetup > and proceed to pay it will not show the indian banks for payment options. > > On Tue, Mar 7, 2017

Apache CarbonData online meetup on 13th Mar,2017

2017-03-06 Thread Liang Chen
Hi all Welcome to attend Apache CarbonData online meetup on 13th Mar,2017, you can register at : http://edu.csdn.net/huiyiCourse/detail/342 This meetup will focus on introducing code modules. Regards Liang -- View this message in context:

Re: [DISCUSSION] Forceful minor Compaction

2017-04-19 Thread Liang Chen
Hi Kunal Thank you for raising this good topic for discussion. First, let us think about why users want to do forceful minor compaction, and in which cases. Can the current "MAJOR compaction" cover the "forceful MINOR compaction" scenarios? As we know, compaction is mainly for optimizing index

[jira] [Created] (CARBONDATA-694) Optimize quick start document through adding hdfs as storepath

2017-02-04 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-694: - Summary: Optimize quick start document through adding hdfs as storepath Key: CARBONDATA-694 URL: https://issues.apache.org/jira/browse/CARBONDATA-694 Project

[jira] [Created] (CARBONDATA-695) Create DataFrame example in example/spark2, read carbon data to dataframe

2017-02-04 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-695: - Summary: Create DataFrame example in example/spark2, read carbon data to dataframe Key: CARBONDATA-695 URL: https://issues.apache.org/jira/browse/CARBONDATA-695

[jira] [Created] (CARBONDATA-679) Add examples read CarbonData file to dataframe in Spark 2.1

2017-01-24 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-679: - Summary: Add examples read CarbonData file to dataframe in Spark 2.1 Key: CARBONDATA-679 URL: https://issues.apache.org/jira/browse/CARBONDATA-679 Project

[jira] [Created] (CARBONDATA-703) Update build command after optimizing thrift compile issues

2017-02-11 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-703: - Summary: Update build command after optimizing thrift compile issues Key: CARBONDATA-703 URL: https://issues.apache.org/jira/browse/CARBONDATA-703 Project

[jira] [Created] (CARBONDATA-343) Optimize the duplicated definition code in GlobalDictionaryUtil.scala

2016-10-27 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-343: - Summary: Optimize the duplicated definition code in GlobalDictionaryUtil.scala Key: CARBONDATA-343 URL: https://issues.apache.org/jira/browse/CARBONDATA-343

[jira] [Created] (CARBONDATA-336) Align the name description

2016-10-22 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-336: - Summary: Align the name description Key: CARBONDATA-336 URL: https://issues.apache.org/jira/browse/CARBONDATA-336 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-425) Improve build instruction, add BUILD.md file to describe the detailed step for building

2016-11-18 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-425: - Summary: Improve build instruction, add BUILD.md file to describe the detailed step for building Key: CARBONDATA-425 URL: https://issues.apache.org/jira/browse/CARBONDATA-425
