RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-18 Thread Jihong Ma
vs. can't do as well as the amount of effort/complexity required. Jihong -Original Message- From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] Sent: Tuesday, October 18, 2016 7:21 AM To: dev Subject: Re: Discussion(New feature) regarding single pass data loading solution

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-18 Thread Ravindra Pesala
ormance. > > > Jihong > > -Original Message- > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > Sent: Saturday, October 15, 2016 12:50 AM > To: dev > Subject: Re: Discussion(New feature) regarding single pass data loading > solution. > > Hi Jacky/J

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-17 Thread Jihong Ma
> > There are a lot to worry about for distributed map, and leveraging KV > store is overkill if simply just for dictionary generation. > > > > Regards. > > > > Jihong > > > > -Original Message- > > From: Ravindra Pesala [mailto:ravi.pes...@gmail.c

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-16 Thread Liang Chen
irst time after >> table >> >> is created, any subsequent loads/incremental loads will proceed and is >> >> capable of updating the global dictionary when it encounters new >> value, >> >> this is easiest way of achieving 1 pass data loading process without

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-15 Thread Ravindra Pesala
stributed map, and leveraging KV > store is overkill if simply just for dictionary generation. > > > > Regards. > > > > Jihong > > > > -Original Message----- > > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > > Sent: Friday, October 14,

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jacky Li
Jihong > > -Original Message- > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > Sent: Friday, October 14, 2016 11:03 AM > To: dev > Subject: Re: Discussion(New feature) regarding single pass data loading > solution. > > Hi Jihong, > > I agree, we c

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jihong Ma
lobal dictionary encoding, but dictionary is not > > placed before data loading, we error out and direct user to use the tool > > first. > > > > Make sense? > > > > Jihong > > > > -----Original Message- > > From: Ravindra Pesala [mailto: > > >

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Vimal Das Kammath
nal Message----- > > From: Liang Chen [mailto:chenliang6...@gmail.com] > > Sent: Thursday, October 13, 2016 5:39 PM > > To: dev@carbondata.incubator.apache.org > > Subject: RE: Discussion(New feature) regarding single pass data loading > > solution. > >

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
f regular data loading is the key here. > > Jihong > > -Original Message- > From: Liang Chen [mailto:chenliang6...@gmail.com] > Sent: Thursday, October 13, 2016 5:39 PM > To: dev@carbondata.incubator.apache.org > Subject: RE: Discussion(New feature) regarding single pass

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
to perform encoding based on > that, > >> > we only need to handle occasionally adding entries while loading. For > >> > columns specified with global dictionary encoding, but dictionary is > not > >> > placed before data loading, we error out and direct user to use the

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jihong Ma
] Sent: Thursday, October 13, 2016 5:39 PM To: dev@carbondata.incubator.apache.org Subject: RE: Discussion(New feature) regarding single pass data loading solution. Hi Jihong, I am not sure that users will accept using an extra tool to do this work, because providing a tool or doing a scan the first time per

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Aniket Adnaik
coding based on that, >> > we only need to handle occasionally adding entries while loading. For >> > columns specified with global dictionary encoding, but dictionary is not >> > placed before data loading, we error out and direct user to use the tool >> > first

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Aniket Adnaik
> > -----Original Message- > > From: Ravindra Pesala [mailto: > > > ravi.pesala@ > > > ] > > Sent: Thursday, October 13, 2016 1:12 AM > > To: dev > > Subject: Re: Discussion(New feature) regarding single pass data loading > > solution. >

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Liang Chen
sage- > From: Ravindra Pesala [mailto: > ravi.pesala@ > ] > Sent: Thursday, October 13, 2016 1:12 AM > To: dev > Subject: Re: Discussion(New feature) regarding single pass data loading > solution. > > Hi Jihong/Aniket, > > In the current implementation of carbo

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Jihong Ma
iginal Message- From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] Sent: Thursday, October 13, 2016 1:12 AM To: dev Subject: Re: Discussion(New feature) regarding single pass data loading solution. Hi Jihong/Aniket, In the current implementation of carbondata we are already handling e

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Ravindra Pesala
Hi Jihong/Aniket, In the current implementation of carbondata we are already handling external dictionary while loading the data. But the question here is what the default implementation would be: load data without a dictionary? Regards, Ravi On 13 October 2016 at 03:50, Aniket Adnaik wrote: >

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-12 Thread Qingqing Zhou
On Tue, Oct 11, 2016 at 2:32 AM, Ravindra Pesala wrote: > Currently data is loading to carbon in 2 pass/jobs > 1. Generating global dictionary using spark job. Do we have local dictionaries? If not, what if the column has many distinct values - will the big dictionary be loaded into memory? Regard
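
For readers unfamiliar with the two-pass flow quoted above, here is a minimal sketch in Python. It is an illustration only, not CarbonData code: the function names `build_global_dictionary` and `encode_column` are hypothetical, and it assumes a single in-memory column rather than a distributed Spark job. It shows why the data is read twice: once to collect distinct values, and once to replace each value with its key.

```python
from typing import Dict, Iterable, List


def build_global_dictionary(values: Iterable[str]) -> Dict[str, int]:
    """Pass 1: scan the column and assign a surrogate key to each distinct value."""
    dictionary: Dict[str, int] = {}
    for value in values:
        if value not in dictionary:
            # Keys start at 1 here; that is an arbitrary choice for this sketch.
            dictionary[value] = len(dictionary) + 1
    return dictionary


def encode_column(values: Iterable[str], dictionary: Dict[str, int]) -> List[int]:
    """Pass 2: scan the column again and replace each value with its key."""
    return [dictionary[value] for value in values]


column = ["beijing", "shenzhen", "beijing", "bangalore"]
global_dict = build_global_dictionary(column)  # first pass over the data
encoded = encode_column(column, global_dict)   # second pass over the data
print(global_dict)
print(encoded)
```

The dictionary holds one entry per distinct value, so a high-cardinality column makes it large, which is the memory concern raised in the question above.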

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-12 Thread Aniket Adnaik
Hi Ravi, 1. I agree with Jihong that creation of the global dictionary should be optional, so that it can be disabled to improve load performance. Users should be made aware that using a global dictionary may boost query performance. 2. We should have a generic interface to manage global dictiona

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-11 Thread Jihong Ma
A rather straightforward option is to allow the user to supply a global dictionary generated somewhere else, or we build a separate tool just for generating as well as updating the dictionary. The normal data loading process will then encode columns with a local dictionary if none is supplied. This should cover maj
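
The option described above can be sketched roughly as follows. This is a hypothetical illustration in Python, not the CarbonData implementation: `load_column` and its return shape are assumptions made for the example. It encodes a column in a single pass, using a supplied global dictionary when one exists and falling back to a dictionary local to the load otherwise. New values seen during the load are appended, as other messages in this thread suggest.

```python
from typing import Dict, List, Optional, Tuple


def load_column(
    values: List[str],
    supplied: Optional[Dict[str, int]] = None,
) -> Tuple[List[int], Dict[str, int], str]:
    """Encode a column in one pass; returns (encoded, dictionary, kind)."""
    if supplied is not None:
        dictionary = dict(supplied)  # copy so the caller's dictionary is untouched
        kind = "global"
    else:
        dictionary = {}              # no dictionary supplied: build a local one
        kind = "local"

    # Next unused key; does not assume the supplied keys are dense.
    next_key = max(dictionary.values(), default=0) + 1

    encoded: List[int] = []
    for value in values:
        key = dictionary.get(value)
        if key is None:
            # Value not seen before: add an entry while loading.
            key = next_key
            dictionary[value] = key
            next_key += 1
        encoded.append(key)
    return encoded, dictionary, kind


print(load_column(["a", "b", "a"]))
print(load_column(["a", "b", "c"], supplied={"a": 1, "c": 2}))
```

In a distributed load, appending entries would need coordination between tasks so that two tasks do not assign different keys to the same value; the sketch ignores that and only shows the control flow.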