Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-19 Thread Kumar Vishal
Sent: Thursday, December 15, 2016 12:55 AM To: dev@carbondata.incubator.apache.org Subject: [DISCUSSION] CarbonData loading solution discussion Hi community, Since CarbonData has a global dictionary feature, currently when loading

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Ravindra Pesala
Sent: Thursday, December 15, 2016 12:55 AM To: dev@carbondata.incubator.apache.org Subject: [DISCUSSION] CarbonData loading solution discussion Hi community, Since CarbonData has a global dictionary feature, loading data into CarbonData currently requires two scans of

RE: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jihong Ma
Regards, Jihong -----Original Message----- From: Jacky Li [mailto:jacky.li...@qq.com] Sent: Thursday, December 15, 2016 12:55 AM To: dev@carbondata.incubator.apache.org Subject: [DISCUSSION] CarbonData loading solution discussion Hi community, Since CarbonData has a global dictionary feature

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread QiangCai
+1. We should flexibly choose the loading solution according to scenarios 1 and 2; this will bring performance benefits. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-CarbonData-loading-solution-discussion-tp4490p4520.html

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Liang Chen
> 10K, run two jobs using two output formats. Otherwise, run one job that uses TableOutputFormat with single-pass support. 2) For subsequent loads: run one job that uses TableOutputFormat with single-pass support. What do you think of this idea? Regards,
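The load-plan rule quoted above can be sketched as a small decision function. The snippet cuts off what the "10K" threshold is compared against, so this Python sketch assumes, purely for illustration, that it is an estimated cardinality of the dictionary columns and that the comparison is "greater than". `LoadPlan`, `choose_load_plan` and `CARDINALITY_THRESHOLD` are hypothetical names, not CarbonData APIs.

```python
from enum import Enum


class LoadPlan(Enum):
    # Hypothetical labels for the two strategies named in the proposal.
    TWO_JOBS = "two jobs using two output formats"
    SINGLE_PASS = "one job using TableOutputFormat with single-pass support"


# "10K" from the proposal; what it measures is an assumption (see lead-in).
CARDINALITY_THRESHOLD = 10_000


def choose_load_plan(is_first_load: bool, estimated_cardinality: int) -> LoadPlan:
    """Pick a load strategy following the rule sketched in the thread.

    First load above the threshold: two jobs (dictionary job, then data job).
    First load at or below the threshold, and every subsequent load: one
    single-pass job.
    """
    if is_first_load and estimated_cardinality > CARDINALITY_THRESHOLD:
        return LoadPlan.TWO_JOBS
    return LoadPlan.SINGLE_PASS
```

For example, `choose_load_plan(True, 50_000)` selects `TWO_JOBS`, while any call with `is_first_load=False` selects `SINGLE_PASS`, matching "2) For subsequent loads".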

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jacky Li
use TableOutputFormat with single-pass support. What do you think of this idea? Regards, Jacky -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-CarbonData-loading-solution-discussion-tp4490p4491.html

[DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jacky Li
Hi community, Since CarbonData has a global dictionary feature, loading data into CarbonData currently requires two scans of the input data. The first scan generates the dictionary; the second performs the actual data encoding and writes the carbon files. Obviously, this approach is simple,
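The two-scan flow described above, and the single-pass alternative discussed in the thread, can be contrasted with a minimal single-process sketch. This is not CarbonData's implementation: `two_pass_load`, `single_pass_load` and `_key_for` are hypothetical names, rows are plain lists of strings, and surrogate keys are assumed to be assigned in first-seen order starting at 1. The single-pass version assumes the dictionary can be updated while encoding; in a distributed load that would need some shared dictionary service, which is not shown.

```python
from typing import Dict, List

Row = List[str]


def _key_for(dictionary: Dict[str, int], value: str) -> int:
    # Assign the next surrogate key (starting at 1) to values not seen yet.
    if value not in dictionary:
        dictionary[value] = len(dictionary) + 1
    return dictionary[value]


def two_pass_load(rows: List[Row], dict_cols: List[int]) -> List[list]:
    """Scan 1 builds a global dictionary per column; scan 2 encodes."""
    dictionaries: Dict[int, Dict[str, int]] = {c: {} for c in dict_cols}
    # First scan: collect distinct values of every dictionary column.
    for row in rows:
        for c in dict_cols:
            _key_for(dictionaries[c], row[c])
    # Second scan: replace dictionary-column values with surrogate keys.
    return [
        [dictionaries[i][v] if i in dictionaries else v for i, v in enumerate(row)]
        for row in rows
    ]


def single_pass_load(rows: List[Row], dict_cols: List[int]) -> List[list]:
    """One scan: keys are assigned on the fly while encoding each row."""
    dictionaries: Dict[int, Dict[str, int]] = {c: {} for c in dict_cols}
    encoded = []
    for row in rows:
        encoded.append(
            [_key_for(dictionaries[i], v) if i in dictionaries else v
             for i, v in enumerate(row)]
        )
    return encoded
```

With `rows = [["a", "x", "1"], ["b", "x", "2"], ["a", "y", "3"]]` and `dict_cols = [0, 1]`, both functions return `[[1, 1, "1"], [2, 1, "2"], [1, 2, "3"]]`. In this local setting the only difference is that `two_pass_load` reads the input twice, which is the cost the proposal aims to avoid.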

carbondata loading

2016-12-01 Thread Lu Cao
Hi dev team, I'm loading data from a Parquet file into a CarbonData file (reading the Parquet file into a DataFrame, saving it as CSV, then loading the CSV into CarbonData). The job is blocked at "collect at CarbonDataRDDFactory.scala:963". *Job Id* *Description* *Submitted* *Duration* *Stages: Succeeded/Total* *Tasks (for all stage