RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-18 Thread Jihong Ma
vs. can't do as well as the amount of effort/complexity required. Jihong -Original Message- From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] Sent: Tuesday, October 18, 2016 7:21 AM To: dev Subject: Re: Discussion(New feature) regarding single pass data loading solution. Hi

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-17 Thread Jihong Ma
nce this tool is only triggered once per table, not considered too much burden on the end users. Making global dictionary generation out of the way of regular data loading is the key here. Jihong -Orig

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-15 Thread Ravindra Pesala
t; > > There are a lot to worry about for distributed map, and leveraging KV > store is overkill if simply just for dictionary generation. > > > > Regards. > > > > Jihong > > > > -Original Message----- > > From: Ravindra Pesala [mailto:rav

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jacky Li
> Regards. > > Jihong > > -Original Message- > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] > Sent: Friday, October 14, 2016 11:03 AM > To: dev > Subject: Re: Discussion(New feature) regarding single pass data loading > solution. > > Hi Jihong, &

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jihong Ma
lumns specified with global dictionary encoding, but dictionary is not placed before data loading, we error out and direct user to use the tool first. > > Make sense? > > Jihong > > -Original Message- > > From: Ravindra Pesala

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
ion out of the way of regular data loading is the key here. > > Jihong > > -Original Message- > From: Liang Chen [mailto:chenliang6...@gmail.com] > Sent: Thursday, October 13, 2016 5:39 PM > To: dev@carbondata.incubator.apache.org > Subject: RE: Discussion(New

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
ionaries is available before normal data loading. We shall be able to perform encoding based on that, we only need to handle occasionally adding entries while loading. For columns specified with global dictionary encoding, but dictionary is not

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Aniket Adnaik
normal data loading. We shall be able to perform encoding based on that, we only need to handle occasionally adding entries while loading. For columns specified with global dictionary encoding, but dictionary is not placed before data loading, we error

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Liang Chen
sage- > From: Ravindra Pesala [mailto:ravi.pesala@] > Sent: Thursday, October 13, 2016 1:12 AM > To: dev > Subject: Re: Discussion(New feature) regarding single pass data loading solution. > > Hi Jihong/Aniket, > > In the current implementation of c

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Jihong Ma
iginal Message- From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] Sent: Thursday, October 13, 2016 1:12 AM To: dev Subject: Re: Discussion(New feature) regarding single pass data loading solution. Hi Jihong/Aniket, In the current implementation of carbondata we are already handling e

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Ravindra Pesala
Hi Jihong/Aniket, In the current implementation of carbondata we are already handling an external dictionary while loading the data. But here the question is what would be the default implementation? Load data without a dictionary? Regards, Ravi On 13 October 2016 at 03:50, Aniket Adnaik

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-12 Thread Qingqing Zhou
On Tue, Oct 11, 2016 at 2:32 AM, Ravindra Pesala wrote: > Currently data is loading to carbon in 2 pass/jobs > 1. Generating global dictionary using spark job. Do we have local dictionaries? If not, what if the column has many distinct values - will the big dictionary

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-12 Thread Aniket Adnaik
Hi Ravi, 1. I agree with Jihong that creation of the global dictionary should be optional, so that it can be disabled to improve load performance. Users should be made aware that using a global dictionary may boost query performance. 2. We should have a generic interface to manage global

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-11 Thread Jihong Ma
A rather straightforward option is to allow the user to supply a global dictionary generated somewhere else, or we build a separate tool just for generating as well as updating the dictionary. Then the normal data loading process will encode columns with a local dictionary if a global one is not supplied. This should cover
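
To make the proposal above concrete, here is a minimal Python sketch of the encoding choice it describes: use a supplied global dictionary when there is one, otherwise build a local dictionary during the load. This is not CarbonData code. The function name `encode_column`, the 1-based surrogate keys, and the choice to raise an error for values missing from a supplied dictionary are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def encode_column(
    values: List[str],
    global_dict: Optional[Dict[str, int]] = None,
) -> Tuple[List[int], Dict[str, int]]:
    """Hypothetical helper: encode column values as integer surrogate keys.

    Returns the encoded values and the dictionary that was used.
    """
    if global_dict is not None:
        # A pre-generated global dictionary was supplied: encode against it.
        # Assumption: a value missing from the dictionary is an error here;
        # the thread also discusses adding entries while loading instead.
        missing = sorted({v for v in values if v not in global_dict})
        if missing:
            raise KeyError(f"values not in supplied dictionary: {missing}")
        return [global_dict[v] for v in values], global_dict

    # No global dictionary supplied: build a local one in a single pass,
    # assigning keys in first-seen order (1-based, an arbitrary choice).
    local_dict: Dict[str, int] = {}
    encoded: List[int] = []
    for v in values:
        if v not in local_dict:
            local_dict[v] = len(local_dict) + 1
        encoded.append(local_dict[v])
    return encoded, local_dict
```

Calling `encode_column(["a", "b", "a"])` builds a local dictionary on the fly, while `encode_column(["b", "a"], {"a": 10, "b": 20})` encodes against the supplied one without a separate dictionary-generation pass.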