Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-19 Thread Kumar Vishal
+1. Now users will have the flexibility to choose the output format, and will get a performance benefit if the dictionary files are already generated. -Regards Kumar Vishal On Fri, Dec 16, 2016 at 10:19 AM, Ravindra Pesala wrote: > +1 to have separate output formats, now user can have

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Ravindra Pesala
+1 to having separate output formats; users can now choose flexibly according to their scenario. On Fri, Dec 16, 2016, 2:47 AM Jihong Ma wrote: > > It is great idea to have separate OutputFormat for regular Carbon data > files, index files as well as meta data files, For

RE: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jihong Ma
It is a great idea to have separate OutputFormats for regular Carbon data files, index files, and metadata files (for instance the dictionary file, schema file, global index file, etc.) for writing Carbon-generated files laid out on HDFS, and it is orthogonal to the actual data load process.

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread QiangCai
+1. We should be able to flexibly choose the loading solution according to Scenario 1 and 2, and will get performance benefits.

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Liang Chen
Hi Jacky, Thanks for starting a good discussion. Let me see if I understand your points: Scenario 1 is like the current data load solution (0.2.0). 1.0.0 will provide a new solution option, "single-pass data loading", to meet this kind of scenario: for subsequent data loads, if the most dictionary code has

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jacky Li
Hi community, Sorry for the incorrect formatting of the previous post; I have corrected it in this post. Since CarbonData has a global dictionary feature, loading data into CarbonData currently requires two scans of the input data. The first scan is to generate the dictionary, the second scan to do
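
For readers new to the thread, the two-scan idea can be sketched roughly as follows. This is a toy illustration in plain Python, not CarbonData code: the function names (`build_dictionary`, `encode`, `two_scan_load`) and the sample column are hypothetical, and it assumes the second scan uses the dictionary from the first scan to replace each value with its integer code.

```python
from typing import Dict, Iterable, List


def build_dictionary(values: Iterable[str]) -> Dict[str, int]:
    """First scan: assign an integer code to every distinct value."""
    dictionary: Dict[str, int] = {}
    for value in values:
        if value not in dictionary:
            # Codes start at 1 here; the starting point is an arbitrary choice.
            dictionary[value] = len(dictionary) + 1
    return dictionary


def encode(values: Iterable[str], dictionary: Dict[str, int]) -> List[int]:
    """Second scan (assumed): replace each value with its dictionary code."""
    return [dictionary[value] for value in values]


def two_scan_load(column: List[str]) -> List[int]:
    """Read the input twice: once to build the dictionary, once to encode."""
    dictionary = build_dictionary(column)  # scan 1
    return encode(column, dictionary)      # scan 2


# Hypothetical sample column with one repeated value.
cities = ["shenzhen", "bangalore", "shenzhen", "santa clara"]
encoded = two_scan_load(cities)
```

The sketch makes the cost visible: the input is read in full twice. If a dictionary already existed before the load, the first scan could be skipped, which seems to be the performance benefit mentioned in the replies.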