Re: [VOTE] Apache CarbonData 1.4.1(RC1) release

2018-07-31 Thread Liang Chen
ravipesala wrote > Hi > > > I submit the Apache CarbonData 1.4.1 (RC1) for your vote. > > > 1.Release Notes: > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12343148 > > Some key features and improvements in this release: > >1. Supported Local di

Re: New-bie JIRAs for new contributors

2018-07-31 Thread Liang Chen
vikashtalanki wrote > Hi Vikash > > Welcome to Apache CarbonData community. > 1. Firstly, please let me know your apache jira account(email id), i will > add you as contributor. > 2. Secondly,You can run the simple example as per : > https://github.com/apache/carbondata/blob/master/docs/quick-star

[Discussion] Propose to upgrade the version of integration/presto from 0.187 to 0.206

2018-07-24 Thread Liang Chen
Hi Dev The Presto community released 0.206 last week (see the details at https://prestodb.io/docs/current/release/release-0.206.html). This release fixed many issues, so I propose that the Apache CarbonData community upgrade to the latest Presto version for the carbondata integration. please provid

Re: Index file cache will not work when the table has invalid segment.

2018-07-12 Thread Liang Chen
Hi Currently, CarbonData doesn't support map data type Regards Liang carbondata-newuser wrote > Carbon version is 1.4 rc2. > create table( > col1 string, > col2 int, > col2 string, > date string > ) > > *First step:* > insert into table carbonTest select col1,col2,col3,"20180707" from > hiveTa

Re: Question about integrating CarbonData with Presto

2018-06-14 Thread Liang Chen
Hi Please send your questions to the mailing list (cc'ing the mailing list). Currently, "presto read streaming carbondata table" is not supported. Can you share with the community why you need this feature and what your exact requirements are? Regards Liang kevintop wrote on Wed, Jun 13, 2018 at 9:48 AM: > Mr. Chen

Updated release notes . Re: [ANNOUNCE] Apache CarbonData 1.4.0 release

2018-06-05 Thread Liang Chen
Hi Please find the updated 1.4.0 release notes: https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+1.4.0+Release Regards Liang -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Support updating/deleting data for stream table

2018-06-03 Thread Liang Chen
data for one year, > he > need to delete one year ago of data everyday. On the other hand, solution > 2 > is more complicated than solution 1, we need to consider the implement of > solution 2 in depth. > Based on the above reasons, Liang Chen, Jacky, David and I prefered to >

Re: MODERATE for dev@carbondata.apache.org

2018-06-03 Thread Liang Chen
Hi 1. You can get table detail info with the below script: sql("desc formatted xx_your tablename") 2. You can find the more detail docs about datamap at : ../docs/datamap Regards Liang 2018-05-31 17:59 GMT+08:00 < dev-reject-1527760781.11669.gamfjekkdhlpcbigj...@carbondata.apache.org>: > > To

[ANNOUNCE] Apache CarbonData 1.4.0 release

2018-06-01 Thread Liang Chen
Hi Apache CarbonData community is pleased to announce the release of the Version 1.4.0 in The Apache Software Foundation (ASF). CarbonData is a high-performance big data store solution that supports fast filter lookups and ad-hoc OLAP analysis. Due to varied business driven analysis, and the dema

Re: Support updating/deleting data for stream table

2018-05-30 Thread Liang Chen
Hi Thank you for starting this discussion thread. I agree with solution 1: use the easy way to delete data for stream table. Regards Liang xm_zzc wrote > Hi dev: > Sometimes we need to delete some historical data from stream table to > make > the table size not too large, but currently the stream tabl

Re: after load data using SaveMode.Overwrite, query through beeline return all null field

2018-05-23 Thread Liang Chen
Hi Thank you for reporting this issue. Let us check it and respond to you asap. Regards Liang 喜之郎 wrote > hi dev. > carbon version: 1.3.1 > spark version: 2.2.1 > 1) First I create a carbon table through beeline. > 2) Then I use spark-submit and a dataframe to load data to carbon. Query is OK. > 3) Then

Re: [VOTE] Apache CarbonData 1.4.0(RC2) release

2018-05-23 Thread Liang Chen
Hi +1(binding) Regards Liang ravipesala wrote > Hi > > > I submit the Apache CarbonData 1.4.0 (RC2) for your vote. > > > 1.Release Notes: > > *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=1234100 > 1524823736.loc

[ANNOUNCE] Chuanyin Xu as new Apache CarbonData committer

2018-05-01 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Chuanyin Xu as new Apache CarbonData committer, and the invite has been accepted! Congrats to Chuanyin Xu and welcome aboard. Regards Apache CarbonData PMC

[ANNOUNCE] Zhichao Zhang as new Apache CarbonData committer

2018-05-01 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Zhichao Zhang as new Apache CarbonData committer, and the invite has been accepted! Congrats to Zhichao Zhang and welcome aboard. Regards Apache CarbonData PMC

Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-04-26 Thread Liang Chen
Hi Ravi Good thinking. Because the inverted index columns are by default the same as the sort_column columns, from the user's perspective only the no_inverted_index columns within the sort_column columns need to be set, so I proposed to display only the no_inverted_index columns info that was set by the user. Anyw

Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-04-25 Thread Liang Chen
Hi Attaching my proposed "desc_table_info": desc_table_info.txt Regards Liang

Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-04-25 Thread Liang Chen
Hi Thank you for starting the discussion. I propose to completely optimize this part; my suggestion is as below: CREATE TABLE IF NOT EXISTS test_table ( id INT COMMENT 'device id for sensor XYZ', name STRING, salary LONG, tax DOUBLE ) PARTITIONED BY (city STRING) STORED BY 'carbondata

Re: query on string type return error

2018-04-16 Thread Liang Chen
Hi From the log message, it seems the data files can't be found. Can you provide more detailed info: 1. How did you create the carbonsession and how did you load the data? 2. Have you deployed a cluster or only a single machine? Regards Liang 喜之郎 wrote > hi all, when I use carbondata to run a query "select count(*)

Re: Problem on carbondata quering performance tuning

2018-04-02 Thread Liang Chen
Hi Which carbondata+spark version are you using? And can you provide the full configuration inside "carbondata.properties"? Mick Yuan wrote > Hi,all > I have a querying performance tuning case on carbondata. > > *Environment is as below:* > spark on yarn > 4 nodemanagers > 102G,55 cores each nodema

Re: Storing Data Frame as CarbonData Table

2018-04-01 Thread Liang Chen
Hi Michael Yes, it is very easy to save any Spark data to CarbonData. You just need to make a small change to your script, as below: myDF.write .format("carbondata") .option("tableName", "MyTable") .mode(SaveMode.Overwrite) .save() For more detail, you can refer to the examples: https://github.

Re: Re: Getting [Problem in loading segment blocks] error after doing multi update operations

2018-03-23 Thread Liang Chen
Hi We have already arranged to fix this issue and will raise the pull request asap. Thanks for your feedback. Regards Liang yixu2001 wrote > dev > This issue has caused great trouble for our production. I will appreciate > if you have any plan to fix it and let me know. > > > yixu2001 > > From: Babu

Re: Getting [Problem in loading segment blocks] error after doing multi update operations

2018-03-20 Thread Liang Chen
Hi Thanks for your feedback. Let me first reproduce this issue, and check the detail. Regards Liang yixu2001 wrote > I'm using carbondata1.3+spark2.1.1+hadoop2.7.1 to do multi update > operations > here is the replay step: > > import org.apache.spark.sql.SparkSession > import org.apache.spark.

Re: [Discussion] About syntax of compaction on specified segments

2018-03-14 Thread Liang Chen
Hi Thanks to jinzhou for starting this discussion. I also propose to use the solution proposed by manish, which does not impact the current Major and Minor compaction behaviors. Regards Liang manishgupta88 wrote > Hi, > > I agree with @gvramana ; > >1. We should *not u

[ANNOUNCE] Apache CarbonData 1.3.1 release

2018-03-13 Thread Liang Chen
Hi The Apache CarbonData PMC team is happy to announce the release of Apache CarbonData version 1.3.1. We encourage everyone to download the release from https://dist.apache.org/repos/dist/release/carbondata/1.3.1/ and to give feedback via the mailing lists (dev@carbondata.apache.org, u...@carbondata.apache.org)

Re: Carbondata Installation

2018-03-13 Thread Liang Chen
Hi Please first join the mailing list by sending a mail to "dev-subscr...@carbondata.apache.org"; then all mailing list members can see your mails and give you timely help. Can you provide detailed info about the issues? You can see the quickstart document at: https://github.com/apache/carbondat

Re: [VOTE] Apache CarbonData 1.3.1(RC1) release

2018-03-05 Thread Liang Chen
Hi +1(binding) Regards Liang ravipesala wrote > Hi > > I submit the Apache CarbonData 1.3.1 (RC1) for your vote. > > 1.Release Notes: > *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12342754 >

Fwd: Travel Assistance applications open. Please inform your communities

2018-02-18 Thread Liang Chen
Forward ApacheCon info. -- Forwarded message -- From: Gavin McDonald Date: 2018-02-14 17:34 GMT+08:00 Subject: Travel Assistance applications open. Please inform your communities To: travel-assista...@apache.org Hello PMCs. Please could you forward on the below email to your de

[ANNOUNCE] Apache CarbonData 1.3.0 release

2018-02-09 Thread Liang Chen
Hi The Apache CarbonData PMC team is happy to announce the release of Apache CarbonData version 1.3.0. What’s New in Version 1.3.0? In this version of CarbonData, following are the new features added for performance improvements, compatibility, and usability of CarbonData. Support Spark 2.2.1 S

Re: [VOTE] Apache CarbonData 1.3.0(RC2) release

2018-02-03 Thread Liang Chen
Hi +1(binding) Regards Liang 2018-02-04 5:54 GMT+08:00 Ravindra Pesala : > Hi > > I submit the Apache CarbonData 1.3.0 (RC2) for your vote. > > 1.Release Notes: > *https://issues.apache.org/jira/secure/ReleaseNote.jspa? > projectId=12320220&version=12341004 >

Re: Help, carbondata issues on spark

2018-02-03 Thread Liang Chen
Hi 1.no multiple levels partitions , we need three levels partitions, like year,day,hour Reply : Year,day,hour belong to one column(field) or three columns ? Can you explain, what are your exact scenarios? we can help you to design partition + sort columns to solve your specific query iss

Re: Loading data into a partitioned table and then updating it results in the data being deleted

2018-02-02 Thread Liang Chen
> worker-10][partitionID:test3;queryID:376657526113286] sort scope is set > to LOCAL_SORT > 18/02/02 10:06:55 AUDIT CarbonDataRDDFactory$: > [ubuntu][bigdata][Thread-1]Data update is successful for default.test3 > ++ > || > ++ > ++ > > > scala> carbon.sql("SELECT * FROM t

Re: MODERATE for dev@carbondata.apache.org

2018-02-02 Thread Liang Chen
Hi Please join carbon mailing list, you can send mail to dev-subscr...@carbondata.apache.org and follow the guide to join. Please find my reply inline. 1.no multiple levels partitions , we need three levels partitions, like year,day,hour Reply : Year,day,hour belong to one column(field) or thre

Re: Podling Report Reminder - February 2017

2018-01-30 Thread Liang Chen
> On Tue, Jan 30, 2018 at 1:14 AM Liang Chen > wrote: > >> Dear John >> >> Apache CarbonData has graduated in 2017, so i kindly remind IPMC to >> remove CarbonData from podling report list. >> >> Regards >> Liang >> >> 2018-01-30 12:20 GMT+0

Re: CarbonData cannot find method com.univocity.parsers.csv.CsvWriterSettings.setQuoteEscapingEnabled when saving CSV

2018-01-24 Thread Liang Chen
OK, thanks for your feedback. Please modify the pom file under processing and see if it works in 1.2.0: <groupId>com.univocity</groupId> <artifactId>univocity-parsers</artifactId> <version>2.2.1</version> Regards Liang 2018-01-25 11:56 GMT+08:00 Luo Colin : > Chenliang, > > > > Environment: Apache Spark 2.1, CarbonData 1.2, Java > > > >

Re: Select" query failed when executing "COMPACT" and "CLEAN".

2018-01-19 Thread Liang Chen
Hi I can't reproduce it with "spark2.1+ carbondata1.1.1". Maybe the compaction had not completely finished when you ran the query. Have you tried executing the query in the same shell after "compaction and clean"? Regards Liang yixu2001 wrote > dev > > > spark2.1+ carbondata1.1.1 > > "Select" query

Re: Should CarbonData need to integrate with Spark Streaming too?

2018-01-16 Thread Liang Chen
Hi Thanks for starting this discussion about adding spark streaming support. 1. Please try to reuse the current code (structured streaming) rather than adding separate logic for spark streaming. 2. I suggest using structured streaming by default; please consider how to make configuratio

[ANNOUNCE] Kumar Vishal as new PMC for Apache CarbonData

2018-01-10 Thread Liang Chen
Hi We are pleased to announce Kumar Vishal as a new PMC member for Apache CarbonData. Congrats to Kumar Vishal! Apache CarbonData PMC

[ANNOUNCE] David Cai as new PMC for Apache CarbonData

2018-01-10 Thread Liang Chen
Hi We are pleased to announce David Cai as a new PMC member for Apache CarbonData. Congrats to David Cai. Regards Liang

Re: [VOTE] Apache CarbonData 1.3.0(RC1) release

2018-01-10 Thread Liang Chen
Hi Yes, I agree with xm_zzc; it looks like some issues are still open. Please consider an RC2 to fix these open issues. Regards Liang xm_zzc wrote > Hi ravipesala: > I find that there are some unresolved jira bugs related to version 1.3: > > https://issues.apache.org/jira/browse/CARBOND

[ANNOUNCE] Kunal Kapoor as new Apache CarbonData committer

2018-01-08 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Kunal Kapoor as new Apache CarbonData committer, and the invite has been accepted! Congrats to Kunal Kapoor and welcome aboard. Regards Apache CarbonData PMC

Re: Should we use Spark 2.2.1 as default version for Spark-2.2 supported

2017-12-26 Thread Liang Chen
Hi +1 from my side. I just checked spark 2.2.1: more than 200 issues were fixed. But one reminder: if the proposal is accepted by the community, 2.2.1 will replace 2.2.0 as the spark version of the integration module; we *would not support both.* Regards Liang xm_zzc wrote > Hi dev: >

Re: Blog on how to use Carbondata with Presto

2017-11-26 Thread Liang Chen
Hi bhavya Thanks for your sharing, a nice blog. Regards Liang bhavya411 wrote > Hi All, > > Please look at the blog to see how we can use CarbonData with Presto. > > > https://blog.knoldus.com/2017/11/20/integrating-presto-with-carbondata/ >

Re: Where could I download ODBC driver for carbondata

2017-11-24 Thread Liang Chen
Hi Kui Actually, CarbonData doesn't have an ODBC driver. You can connect to Apache Spark with ODBC and use the Spark engine to read carbondata. Regards Liang 2017-11-21 10:58 GMT+08:00 高奎 : > Hi CarbonData Team, > > I am now working on technical research about carbondata. > > We need connect carbondata fr

Re: [Discussion]support user specified segments in major compation

2017-11-20 Thread Liang Chen
Hi Jin Zhou OK, Thanks for your proposal. can you raise one PRs to support the two features? Regards Liang Jin Zhou wrote > @Liang Chen, thank you for your reply. > > > After seriously thinking about your suggestion, I also think the two > problems should be considered se

test

2017-11-16 Thread Liang Chen

test if can receive mailing list mail

2017-11-16 Thread Liang Chen

Re: [Discussion]support table level compaction configuration

2017-11-16 Thread Liang Chen
Hi Jin Zhou Look forward to seeing your pull request. Do you have the contributor right of Apache CarbonData JIRA? If no, please let me know your email id of jira account. Regards Liang Jin Zhou wrote > @xm_zzc, yes, I'm working on this improvement. > > > > -- > Sent from: > http://apache-ca

test whether dev@mailing list working fine, or not ?

2017-11-16 Thread Liang Chen

Re: [DISCUSSION] Regarding to redundancy code and some issues.

2017-11-04 Thread Liang Chen
+1, all are good proposals. Regards Liang David CaiQiang wrote > Hi All, >Here, I listed the following points to improve the code. > > Redundancy: > 1. CarbonLoadModel.isDirectLoad > It is always true, better to remove the related code. > Now CarbonData doesn't pre-partition the input data

Re: Version upgrade for Presto Integration to 0.186

2017-11-03 Thread Liang Chen
+1 Can you raise one PR for this? Regards Liang bhavya411 wrote > Hi All, > > Presto 0.186 version has a lot of improvements that will increase the > performance and improve the reliability. Some of the major issues and > improvements are listed below. > > >- Fix excessive GC overhead c

Re: After MAJOR index lost

2017-11-01 Thread Liang Chen
Hi Yes, I checked the log message, and it looks like there are some issues. Can you share the steps to reproduce: how many machines did you use for the data load, and how many times did you load? Regards Liang yixu2001 wrote > dev > environment spark.2.1.1 carbondata 1.1.1 hadoop 2.7.2 > > run ALTER table e_carbon.prod

Re: [PROPOSAL] Tag Pull Request with feature tag

2017-10-28 Thread Liang Chen
+1, agree with this proposal. Regards Liang

Re: [Discussion] Merging carbonindex files for each segments and across segments

2017-10-26 Thread Liang Chen
Yes, Jin Zhou. Merging all index files in a segment into one would be a useful feature. It would significantly improve query performance. Regards Liang Jin Zhou wrote > Hi, ravipesala > > Thank you for your proposal, merging index file is a very useful feature > as > we have already met serious perfo

Re: [Discussion]support user specified segments in major compation

2017-10-26 Thread Liang Chen
Hi Jin Zhou Thanks for starting this discussion. 1. For your first proposal: currently, a segment is a system-internal concept and is not exposed to the outside. Can you describe the exact problems you encounter? We can find an alternative solution for your problems. --

Re: [DISCUSSION] Optimize the default value for some parameters

2017-10-26 Thread Liang Chen
> property for blocklet size to configure while creating a table. > > Regards, > Ravindra. > > On 11 October 2017 at 13:36, Liang Chen < > chenliang6136@ > > wrote: > >> Hi All >> >> As you know, some default value of parameters need to adju

Re: [Disscussion] Support Streaming Ingest

2017-10-21 Thread Liang Chen
Hi One question: why not support structured streaming instead of spark streaming? --- In first phase implementation, it should support kafka and spark streaming integration. More streaming framework support is preferable in the future. Regards Liang

Re: [Discussion] Merging carbonindex files for each segments and across segments

2017-10-20 Thread Liang Chen
+1 for this proposal and solution, thanks, Ravi Regards Liang 2017-10-20 19:13 GMT+05:30 Ravindra Pesala : > Hi, > > Problem : > The first-time query of carbon becomes very slow. It is because of reading > many small carbonindex files and cache to the driver at the first time. > Many carbonind

Re: [Discussion] Carbon Store abstraction

2017-10-20 Thread Liang Chen
Hi Thank you for starting this discussion. I agree that some optimization work is needed to expose a clear interface to users. Can you list more details about your proposal? For example: which classes you propose to move to carbon store, and which APIs you propose to create and expose to users. I suggest

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

2017-10-19 Thread Liang Chen
Hi Execute the below query, return one row record or multiple row records ? - select a.remark from c_indextest1 a where a.id=b.id Regards Liang yixu2001 wrote > dev > You can follow the steps below to reproduce the problem. > tables c_indextest2 has 1700w rec

Re: Query failed after "update" statement interruptted

2017-10-16 Thread Liang Chen
Hi Can you provide the full script? what is your update script? how to reproduce ? Regards Liang yixu2001 wrote > dev > > On the process of "update" statement execution, interruption happened. > After that, the "select" statement failed. > Sometimes the "select" statement will recover to s

Re: Encountered some problems when querying data

2017-10-16 Thread Liang Chen
Hi Can you raise an Apache JIRA at https://issues.apache.org/jira/projects/CARBONDATA and provide the test data and script needed to reproduce this issue? Regards Liang 刘feng wrote > Hello,dev > >1,When using the 'like' query in sql, I found a bug. > > E.g: select ake005,count(1) from ca_

Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-16 Thread Liang Chen
uild pre-aggregate table as >>> update scenario” >>> User need to drop the associated aggregate table and perform alter >>> table, >>> or data update/delete, or delete segment operation, then he can create >>> the >>> pre-agg table using CT

Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-14 Thread Liang Chen
Hi Jacky Thanks for starting this discussion; this is a great feature for carbondata. One question: for sub_jar "Handle alter table scenarios for aggregation table", please give more detailed info. I just viewed the pdf attachment as below, and it looks like there is no need to do any handling for the agg table if users d

Re: [DISCUSSION] Support only spark 2 in carbon 1.3.0

2017-10-14 Thread Liang Chen
Hi lionel As per the mailing list discussion, there is no objection, so can you create an umbrella jira to remove the spark 1.5 & 1.6 code in 1.3.0? Regards Liang lionel061201 wrote > Hi community, > Currently we have three spark related modules in carbondata(spark 1.5, 1.6, > 2.1), the project has becom

[DISCUSSION] Optimize the default value for some parameters

2017-10-11 Thread Liang Chen
Hi All As you know, the default values of some parameters need to be adjusted for most cases. This discussion is for collecting which parameters' default values need to be optimized: 1. TABLE_BLOCKSIZE: current default is 1G, propose to adjust to 512M 2. Please append here if you propose to adjust

Re: Does index be used when doing "join" operation between a big table and a small table?

2017-10-11 Thread Liang Chen
If the index is used, the number of tasks would be smaller. Can you share your scripts (create table script and query script) so we can check whether you created an effective index for the filter columns? Regards Liang

Re: Does index be used when doing "join" operation between a big table and a small table?

2017-10-11 Thread Liang Chen
Hi If the index is used for filtering data, the number of tasks would be much smaller. Can you share the scripts (create table and query) so we can check whether an effective index was created for the filter columns? Regards Liang Mic Sun wrote > hello, > > I have 2 tables need to do "join" operation by their

Re: [DISCUSSION] support user specified segment reading for query

2017-10-11 Thread Liang Chen
Hi Rahul I suggest only doing "Query HINT". Please finalize the query script: select * from t1 [in SEGMENTS(1,3,5)] or SELECT /*+SEGMENTS(1,3,5) */ * from t1 Regards Liang

Re: [DISCUSSION] Apache CarbonData 1.3.0 scope

2017-10-10 Thread Liang Chen
Hi yuhai I have the same comment as Jacky; please provide more info about this requirement. It would be better if you could create a new topic to discuss this requirement in detail. Regards Liang Jacky Li wrote > Hi Cenyuhai, > > Can you further describe your requirement? Currently carbon supports

[DISCUSSION] Apache CarbonData 1.3.0 scope

2017-09-29 Thread Liang Chen
Hi all First, on behalf of the Apache CarbonData community, thanks to all contributors, who come from 20+ different organizations. This mail is for discussing the 1.3.0 scope (around 3-4 months). I propose that the following features be considered. 1) Spark 2.2.0 integration (propose committer Ravindra to

Re: [DISCUSSION] optimization of OrderBy sorted columns + Limit Query

2017-09-29 Thread Liang Chen
Hi Jarck This solution uses the dictionary to do the limit, right? This solution can't ensure data correctness --- Use orderby +limit optimized carbondata1.2 master code + spark1.6.3 @Ravindra @Jarck : let us discus

Re: [VOTE] Apache CarbonData 1.2.0(RC3) release

2017-09-23 Thread Liang Chen
1.Source code can be compiled successfully with script "mvn clean -DskipTests -Pspark-2.1 -Pbuild-with-format package" 2.Can query carbondata file properly in Spark-shell. 3.License file looks good. 4.Signature file looks good 5.Hash checksum files look good 6.NOTICE file looks good My vote :

Re: [VOTE] Apache CarbonData 1.2.0(RC2) release

2017-09-18 Thread Liang Chen
Hi 1.Source code can be compiled successfully with script "mvn clean -DskipTests -Pspark-2.1 -Pbuild-with-format package" 2.Can query carbondata file properly in Spark-shell. 3.License file looks good. 4.Signature file looks good 5.Hash checksum files look good 6.NOTICE file looks good My vote :

Re: Question about loading data into carbondata

2017-09-18 Thread Liang Chen
Hi I have the same comments as cenyuhai; please provide more detailed info. Which version did you use? Please refer to https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md, for high cardinality columns, you can use a script like TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISD

Re: [VOTE] Apache CarbonData 1.2.0(RC2) release

2017-09-18 Thread Liang Chen
Hi I think you may have entered the wrong description "apache-carbondata-1.2.0-rc1"? 2. The tag to be voted upon : apache-carbondata-1.2.0-rc1(commit: ede03f5c963b13cc640feba799a22466246951c6) *https://github.com/apache/carbondata/relea

Re: MODERATE for dev@carbondata.apache.org

2017-09-16 Thread Liang Chen
Hi First please send one mail to dev-subscr...@carbondata.apache.org for joining mailing list group. After you join mailing list group, please send your question again to dev@carbondata.apache.org Please provide the error log message, and your create table script. Regards Liang 2017-09-15 1

[ANNOUNCE] Lu Cao as new Apache CarbonData committer

2017-09-13 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Lu Cao as new Apache CarbonData committer, and the invite has been accepted ! Congrats to Lu Cao and welcome aboard. Regards The Apache CarbonData PMC -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130

Re: Block B-tree loading failed

2017-09-13 Thread Liang Chen
Hi It looks like the path is invalid. Can you provide the full script showing how you created the carbonsession? - Caused by: org.apache.carbondata.core.datastore.exception.IndexBuilderException: Invalid carbon data file: hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_c

Re: Presto+CarbonData optimization work discussion

2017-09-01 Thread Liang Chen
QC | 57467886 | 1385076 SK | 57385152 | 1382364 YT | 57377556 | 1383900 (13 rows) Query 20170902_033821_6_h6g24, FINISHED, 1 node Splits: 50 total, 50 done (100.00%) 0:03 [18M rows, 0B] [6.62M rows/s, 0B/s] Regards Liang Liang Chen wrote > Hi > > For -- 4) Lazy dec

Re: Apache CarbonData 6th meetup in Shanghai on 2nd Sep,2017 at : https://jinshuju.net/f/X8x5S9?from=timeline

2017-08-30 Thread Liang Chen
Hi Ohh , Really? a big big welcome! Regards Liang Jean-Baptiste Onofré wrote > Awesome. > > I would love to be there. Let me check if I can. > > Regards > JB > > On Aug 23, 2017, 08:48, at 08:48, Liang Chen < > chenliang6136@ > > wrote: >>

Re: [ANNOUNCE] Manish Gupta as new Apache CarbonData committer

2017-08-25 Thread Liang Chen
Correct the title , to add "committer" info. 2017-08-25 23:56 GMT+08:00 Liang Chen : > Hi all > > We are pleased to announce that the PMC has invited Manish Gupta as new > Apache CarbonData committer, and the invite has been accepted ! > > Congrats to Manish Gupta and

[ANNOUNCE] Manish Gupta as new Apache CarbonData

2017-08-25 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Manish Gupta as new Apache CarbonData committer, and the invite has been accepted ! Congrats to Manish Gupta and welcome aboard. Regards The Apache CarbonData PMC

Re: Apache CarbonData 6th meetup in Shanghai on 2nd Sep,2017 at : https://jinshuju.net/f/X8x5S9?from=timeline

2017-08-23 Thread Liang Chen
-- View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Apache-CarbonData-6th-meetup-in-Shanghai-on-2nd-Sep-2017-at-https-jinshuju-net-f-X

Apache CarbonData 6th meetup in Shanghai on 2nd Sep,2017 at : https://jinshuju.net/f/X8x5S9?from=timeline

2017-08-23 Thread Liang Chen

Re: ClassNotFound error when insert carbontable from hive table

2017-08-22 Thread Liang Chen
Hi lionel Can you share with us how you fixed this issue? Regards Liang lionel061201 wrote > This issue had been fixed. > > On Mon, Aug 21, 2017 at 4:04 PM, Lu Cao < > whucaolu@ > > wrote: > >> Hi dev, >> >> I'm trying to insert data from a hive table to carbon table: >> >> cc.sql("insert

Re: [DISCUSSION] About partition table query performance

2017-08-17 Thread Liang Chen
Hi +1.Very nice feature, Thanks for your good contribution. Look forward to seeing the test report. Regards Liang lionel061201 wrote > Hi dev, > Partition feature is now available on master and I just created a guidance > doc in > https://github.com/apache/carbondata/pull/1258 > > I added some

Re: [DISCUSSION] Interfaces for index frame work

2017-08-14 Thread Liang Chen
Hi Nice feature, +1. -- View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Interfaces-for-index-frame-work-tp13274p20217.html Sent from the Apache CarbonData Dev Mailing List archive mailing list archive at Nabble.com.

Re: Can I set a larger HDFS block size, like 4 or 8 GB in production environment? What is the problem with large blocks?

2017-08-08 Thread Liang Chen
Hi, In theory, it should be supported. But practically: 1. It may take a long time to replicate in case any of the replicas is lost/moved due to balancer/mover/replication 2. In case of pipeline recoveries during write/append, if a new node replaces the failed node, then existing data will be copie

Re: carbon data performance doubts

2017-07-23 Thread Liang Chen
Hi simafengyun Can you write an example showing how to use sort_columns, and also update the documents? Thanks. Regards Liang -- View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/carbon-data-performance-doubts-tp18438p18703.html Sent from

Re: carbon data performance doubts

2017-07-21 Thread Liang Chen
Hi Some more info: release 1.1.1 brought a good improvement, "measure filter optimization": the system uses the min/max index to filter on measure columns. So to get good filtering on an INT column: one way is to add the INT column to sort_columns; another way, system will automati

Re: carbon data performance doubts

2017-07-21 Thread Liang Chen
Hi Some more info: release 1.1.1 brought a good improvement, "measure filter optimization": the system uses the min/max index to filter on measure columns. So for INT Regards Liang 2017-07-22 9:22 GMT+08:00 Liang Chen : > Hi Swapnil > > Actually, current s

Re: carbon data performance doubts

2017-07-21 Thread Liang Chen
Hi Swapnil Actually, the current system's behavior is: index and dictionary encoding are decoupled and have no relationship. 1. If you want good filter performance on some columns, just add these columns to sort_columns (like tblproperties('sort_columns'='empno')) to build a good MDK index for these columns

[ANNOUNCE] Apache CarbonData 1.1.1 release

2017-07-20 Thread Liang Chen
Hi All, The Apache CarbonData PMC team is happy to announce the release of Apache CarbonData version 1.1.1. This release (1.1.1) is a patch release; some key improvements and bug fixes are listed below: - Data update and delete with Spark 2.1. - Improve measure filter performance by ~2-4 times. - Some

Re: [question] about new table property "sort_column"

2017-07-20 Thread Liang Chen
Hi Jin zhou Yes, your understanding is correct. The MDK(multi-dimension index) will be created as per your specified sort_columns order. Regards Liang 2017-07-21 10:51 GMT+08:00 Jin Zhou : > > Hi,all > > I notice there is a new table property: sort_column and want to confirm: > > 1) when a NON-

Re: Presto+CarbonData optimization work discussion

2017-07-19 Thread Liang Chen
ether it is really a lazy decoding issue or > not. > > Regards, > Ravindra > > On 20 July 2017 at 08:04, Liang Chen wrote: > > > Hi > > > > For -- 4) Lazy decoding of the dictionary, just i tested 180 millions > rows > > data with the script: >

Re: Presto+CarbonData optimization work discussion

2017-07-19 Thread Liang Chen
Hi For -- 4) Lazy decoding of the dictionary, I just tested 180 million rows of data with the script: "select province,sum(age),count(*) from presto_carbondata group by province order by province" The Spark integration module has "dictionary lazy decode", presto doesn't have "dictionary lazy decode",

Presto+CarbonData optimization work discussion

2017-07-19 Thread Liang Chen
Hi Below are some proposed items for Presto optimization: 1) Remove the extra loops for data conversion in Presto Format to increase the performance. 2) Modularize and optimize the filters . 3) Optimize the Carbondata Metadata reading. 4) Lazy decoding of the dictionary. 5) Batch reading of the

Re: [Discussion] Using Lazy Dictionary Decode for Presto Integration

2017-07-18 Thread Liang Chen
+1, using lazy decode to utilize carbondata's dictionary would improve aggregation performance. Please consider adding this code to the presto integration module; don't directly reuse the spark module code. Regards Liang 2017-07-18 23:46 GMT+08:00 Bhavya Aggarwal : > We were trying the Presto wit

Re: FileNotFoundExceptions while running CarbonData

2017-07-18 Thread Liang Chen
Hi Swapnil I very much look forward to seeing your PR. Please let me know your Apache JIRA email id, and I will add the contributor right for you. Regards Liang 2017-07-18 6:49 GMT+08:00 Swapnil Shinde : > Thanks. I think I fixed it to support maprFS. I will do some more testing and > then add a jira ticket

Re: [1.2.0-SNAPSHOT]-delete problem

2017-07-14 Thread Liang Chen
Hi Please provide your test script; this will help us reproduce your issue. Otherwise, we cannot know what your exact problem is. Regards Liang -- View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/1-2-0-SNAPSHOT-delete-problem-tp18090
