vs. can't do as well
as the amount of effort/complexity required.
Jihong
-----Original Message-----
From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
Sent: Tuesday, October 18, 2016 7:21 AM
To: dev
Subject: Re: Discussion(New feature) regarding single pass data loading
solution.
Hi
nce this tool is only triggered once per table, not considered too much
> >> burden on the end users. Making global dictionary generation out of the way
> >> of regular data loading is the key here.
> >>
> >> Jihong
> >>
> >> -Orig
>
> > There is a lot to worry about with a distributed map, and leveraging a KV
> > store is overkill if it is just for dictionary generation.
> >
> > Regards.
> >
> > Jihong
> >
> > -----Original Message-----
> > From: Ravindra Pesala [mailto:rav
> Regards.
>
> Jihong
>
> -----Original Message-----
> From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
> Sent: Friday, October 14, 2016 11:03 AM
> To: dev
> Subject: Re: Discussion(New feature) regarding single pass data loading
> solution.
>
> Hi Jihong,
>
lumns specified with global dictionary encoding, but dictionary is not
> > placed before data loading, we error out and direct user to use the tool
> > first.
> >
> > Make sense?
> >
> > Jihong
> >
> > -----Original Message-----
> > From: Ravindra Pesala
ion out of the way
> of regular data loading is the key here.
>
> Jihong
>
> -----Original Message-----
> From: Liang Chen [mailto:chenliang6...@gmail.com]
> Sent: Thursday, October 13, 2016 5:39 PM
> To: dev@carbondata.incubator.apache.org
> Subject: RE: Discussion(New
ionaries is available before
> >> > normal data loading. We shall be able to perform encoding based on that,
> >> > we only need to handle occasionally adding entries while loading. For
> >> > columns specified with global dictionary encoding, but dictionary is not
>> > normal data loading. We shall be able to perform encoding based on that,
>> > we only need to handle occasionally adding entries while loading. For
>> > columns specified with global dictionary encoding, but dictionary is not
>> > placed before data loading, we error
sage-
> From: Ravindra Pesala [mailto:ravi.pesala@]
> Sent: Thursday, October 13, 2016 1:12 AM
> To: dev
> Subject: Re: Discussion(New feature) regarding single pass data loading
> solution.
>
> Hi Jihong/Aniket,
>
> In the current implementation of c
-----Original Message-----
From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
Sent: Thursday, October 13, 2016 1:12 AM
To: dev
Subject: Re: Discussion(New feature) regarding single pass data loading
solution.
Hi Jihong/Aniket,
In the current implementation of carbondata we are already handling
external dictionary while loading the data.
But the question here is what the default implementation would be. Load
data without a dictionary?
Regards,
Ravi
On 13 October 2016 at 03:50, Aniket Adnaik
On Tue, Oct 11, 2016 at 2:32 AM, Ravindra Pesala wrote:
> Currently data is loaded into carbon in 2 passes/jobs:
> 1. Generating the global dictionary using a Spark job.
Do we have local dictionaries? If not, what if the column has many
distinct values - will the big dictionary
Hi Ravi,
1. I agree with Jihong that creation of global dictionary should be
optional, so that it can be disabled to improve the load performance. User
should be made aware that using global dictionary may boost the query
performance.
2. We should have a generic interface to manage global
A rather straightforward option is to allow the user to supply a global dictionary
generated somewhere else, or we build a separate tool just for generating as well as
updating the dictionary. Then the normal data loading process will encode columns
with a local dictionary if a global one is not supplied. This should cover
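
To make the fallback in the message above concrete, here is a minimal sketch in Python. It is not CarbonData code: the function name `encode_column`, the dict-based dictionary layout, and the rule for assigning new codes are all assumptions made for illustration. If a global dictionary is supplied, values are encoded against a copy of it and unseen values are appended during loading; otherwise a local dictionary is built on the fly.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def encode_column(
    values: Iterable[Hashable],
    global_dict: Optional[Dict[Hashable, int]] = None,
) -> Tuple[List[int], Dict[Hashable, int], str]:
    """Hypothetical helper (not a CarbonData API).

    Returns the encoded values, the dictionary that was used, and
    whether that dictionary was "global" (supplied) or "local" (built here).
    """
    if global_dict is not None:
        # Work on a copy so the caller's dictionary is not mutated.
        dictionary = dict(global_dict)
        kind = "global"
    else:
        dictionary = {}
        kind = "local"

    # Assumption: new entries get the next integer after the largest code.
    next_code = max(dictionary.values(), default=0) + 1

    codes: List[int] = []
    for value in values:
        code = dictionary.get(value)
        if code is None:
            # Occasionally add an entry while loading.
            code = next_code
            dictionary[value] = code
            next_code += 1
        codes.append(code)
    return codes, dictionary, kind
```

Calling `encode_column(["a", "b", "a"])` takes the local-dictionary path, while `encode_column(["b", "c", "a"], {"a": 10, "b": 11})` reuses the supplied codes and adds one entry for `"c"`. A stricter variant could raise an error instead of falling back when a column is declared as globally encoded but no dictionary is supplied, which is the other option raised in this thread.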