RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-18 Thread Jihong Ma
vs. can't do as well as the amount of effort/complexity required. Jihong -Original Message- From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] Sent: Tuesday, October 18, 2016 7:21 AM To: dev Subject: Re: Discussion(New feature) regarding single pass data loading solution

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-18 Thread Ravindra Pesala
ormance. > > > Jihong > > -Original Message- > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > Sent: Saturday, October 15, 2016 12:50 AM > To: dev > Subject: Re: Discussion(New feature) regarding single pass data loading > solution. > > Hi Jacky/J

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-17 Thread Jihong Ma
> > There are a lot to worry about for distributed map, and leveraging KV > store is overkill if simply just for dictionary generation. > > > > Regards. > > > > Jihong > > > > -Original Message- > > From: Ravindra Pesala [mailto:ravi.pes...@gmail.c

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-16 Thread Liang Chen
irst time after >> table >> >> is created, any subsequent loads/incremental loads will proceed and is >> >> capable of updating the global dictionary when it encounters new >> value, >> >> this is easiest way of achieving 1 pass data loading process without
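The idea quoted in this snippet, where later loads keep going and extend the global dictionary whenever a new value shows up, could be sketched roughly as follows. This is only an illustration in a single process: `IncrementalDictionary` and `load_single_pass` are hypothetical names, not CarbonData APIs, and the sketch ignores the distributed coordination that the thread is actually debating.

```python
from typing import Dict, Iterable, List, Optional


class IncrementalDictionary:
    """Hypothetical in-memory global dictionary that grows during a load."""

    def __init__(self, existing: Optional[Dict[str, int]] = None) -> None:
        # Start from whatever an earlier load produced (may be empty).
        self._keys: Dict[str, int] = dict(existing or {})
        self._next_key = max(self._keys.values(), default=0) + 1

    def key_for(self, value: str) -> int:
        """Return the surrogate key for value, assigning a new one if unseen."""
        key = self._keys.get(value)
        if key is None:
            key = self._next_key
            self._keys[value] = key
            self._next_key += 1
        return key

    def snapshot(self) -> Dict[str, int]:
        """Copy of the current value-to-key mapping."""
        return dict(self._keys)


def load_single_pass(values: Iterable[str], dictionary: IncrementalDictionary) -> List[int]:
    """Encode values in one scan, adding dictionary entries on the fly."""
    return [dictionary.key_for(v) for v in values]


# A dictionary left over from a first load, then an incremental load
# that contains one value ("c") the dictionary has not seen yet.
dictionary = IncrementalDictionary({"a": 1, "b": 2})
encoded = load_single_pass(["b", "c", "a", "c"], dictionary)
```

In a real cluster, several loading tasks would need to agree on the key handed to each new value, which is where the distributed map and KV store options mentioned elsewhere in the thread come in.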

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-15 Thread Ravindra Pesala
stributed map, and leveraging KV > store is overkill if simply just for dictionary generation. > > > > Regards. > > > > Jihong > > > > -Original Message----- > > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > > Sent: Friday, October 14,

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jacky Li
Jihong > > -Original Message- > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > Sent: Friday, October 14, 2016 11:03 AM > To: dev > Subject: Re: Discussion(New feature) regarding single pass data loading > solution. > > Hi Jihong, > > I agree, we c

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jihong Ma
lobal dictionary encoding, but dictionary is not > > placed before data loading, we error out and direct user to use the tool > > first. > > > > Make sense? > > > > Jihong > > > > -----Original Message- > > From: Ravindra Pesala [mailto: > > >

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Vimal Das Kammath
nal Message- > > From: Liang Chen [mailto:chenliang6...@gmail.com] > > Sent: Thursday, October 13, 2016 5:39 PM > > To: dev@carbondata.incubator.apache.org > > Subject: RE: Discussion(New feature) regarding single pass data loading > > solution. > > >

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
f regular data loading is the key here. > > Jihong > > -Original Message- > From: Liang Chen [mailto:chenliang6...@gmail.com] > Sent: Thursday, October 13, 2016 5:39 PM > To: dev@carbondata.incubator.apache.org > Subject: RE: Discussion(New feature) regarding single pass

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
to perform encoding based on > that, > >> > we only need to handle occasionally adding entries while loading. For > >> > columns specified with global dictionary encoding, but dictionary is > not > >> > placed before data loading, we error out and direct user to use the

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jihong Ma
] Sent: Thursday, October 13, 2016 5:39 PM To: dev@carbondata.incubator.apache.org Subject: RE: Discussion(New feature) regarding single pass data loading solution. Hi jihong I am not sure that users can accept to use extra tool to do this work, because provide tool or do scan at first time per

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Aniket Adnaik
coding based on that, >> > we only need to handle occasionally adding entries while loading. For >> > columns specified with global dictionary encoding, but dictionary is not >> > placed before data loading, we error out and direct user to use the tool >> > first

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Aniket Adnaik
> > -----Original Message- > > From: Ravindra Pesala [mailto: > > > ravi.pesala@ > > > ] > > Sent: Thursday, October 13, 2016 1:12 AM > > To: dev > > Subject: Re: Discussion(New feature) regarding single pass data loading > > solution. >

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Liang Chen
sage- > From: Ravindra Pesala [mailto: > ravi.pesala@ > ] > Sent: Thursday, October 13, 2016 1:12 AM > To: dev > Subject: Re: Discussion(New feature) regarding single pass data loading > solution. > > Hi Jihong/Aniket, > > In the current implementation of carbo

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Jihong Ma
iginal Message- From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] Sent: Thursday, October 13, 2016 1:12 AM To: dev Subject: Re: Discussion(New feature) regarding single pass data loading solution. Hi Jihong/Aniket, In the current implementation of carbondata we are already handling e

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Ravindra Pesala
> -Original Message- > > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > > Sent: Tuesday, October 11, 2016 2:33 AM > > To: dev > > Subject: Discussion(New feature) regarding single pass data loading > > solution. > > > > Hi All, > >

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-12 Thread Qingqing Zhou
On Tue, Oct 11, 2016 at 2:32 AM, Ravindra Pesala wrote: > Currently data is loading to carbon in 2 pass/jobs > 1. Generating global dictionary using spark job. Do we have local dictionaries? If not, what if the column has many distinct values - will the big dictionary be loaded into memory? Regard

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-12 Thread Aniket Adnaik
al step, only > triggered when needed, not a default step as we do currently. > > Jihong > > -Original Message- > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > Sent: Tuesday, October 11, 2016 2:33 AM > To: dev > Subject: Discussion(New feature) regarding

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-11 Thread Jihong Ma
currently. Jihong -Original Message- From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] Sent: Tuesday, October 11, 2016 2:33 AM To: dev Subject: Discussion(New feature) regarding single pass data loading solution. Hi All, This discussion is regarding single pass data load solution

Discussion(New feature) regarding single pass data loading solution.

2016-10-11 Thread Ravindra Pesala
Hi All, This discussion is regarding a single-pass data load solution. Currently, data is loaded into Carbon in two passes (jobs): 1. Generate the global dictionary using a Spark job. 2. Encode the data with the dictionary values and create the carbondata files. This two-pass solution has many disadvantages like it nee
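For readers new to the thread, the two-pass flow described in this opening message can be pictured with a small sketch. It is a simplified single-machine illustration, not CarbonData's Spark implementation; the function names and the sample rows are made up for the example.

```python
from typing import Dict, List


def build_global_dictionary(rows: List[Dict[str, str]], column: str) -> Dict[str, int]:
    """Pass 1: scan every row and give each distinct value a surrogate key."""
    dictionary: Dict[str, int] = {}
    for row in rows:
        value = row[column]
        if value not in dictionary:
            dictionary[value] = len(dictionary) + 1
    return dictionary


def encode_rows(rows: List[Dict[str, str]], column: str,
                dictionary: Dict[str, int]) -> List[int]:
    """Pass 2: read the same rows again and replace values with their keys."""
    return [dictionary[row[column]] for row in rows]


# Hypothetical input; a real load would read this from the source files twice.
rows = [{"city": "Bangalore"}, {"city": "Shenzhen"}, {"city": "Bangalore"}]
city_dict = build_global_dictionary(rows, "city")
encoded = encode_rows(rows, "city", city_dict)
```

The input is scanned once per function call, so the source data is read twice; removing that second read is what the single-pass proposal is after.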