vs. can't do as well
as the amount of effort/complexity required.
Jihong
-----Original Message-----
From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
Sent: Tuesday, October 18, 2016 7:21 AM
To: dev
Subject: Re: Discussion(New feature) regarding single pass data loading
solution
ormance.
>
>
> Jihong
>
> -----Original Message-----
> From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
> Sent: Saturday, October 15, 2016 12:50 AM
> To: dev
> Subject: Re: Discussion(New feature) regarding single pass data loading
> solution.
>
> Hi Jacky/J
> > There is a lot to worry about with a distributed map, and leveraging a KV
> > store is overkill if it is just for dictionary generation.
> >
> > Regards.
> >
> > Jihong
> >
> > -----Original Message-----
> > From: Ravindra Pesala [mailto:ravi.pes...@gmail.c
>> >> first time after table
>> >> is created, any subsequent loads/incremental loads will proceed and are
>> >> capable of updating the global dictionary when they encounter a new
>> >> value; this is the easiest way of achieving a 1-pass data loading process
>> >> without
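A minimal single-process sketch of the incremental idea described above, assuming a simple in-memory dictionary per column. The names `GlobalDictionary` and `load_single_pass` are hypothetical, not CarbonData APIs. Each load encodes values in one pass and hands out a new surrogate key only when it meets an unseen value, so keys issued by earlier loads never change. It deliberately ignores the distributed-coordination problem debated in this thread.

```python
from typing import Dict, Iterable, List


class GlobalDictionary:
    """Hypothetical in-memory global dictionary for one column.

    Surrogate keys start at 1 and are never reassigned, so codes written by
    earlier loads stay valid when later loads append new values.
    """

    def __init__(self) -> None:
        self._codes: Dict[str, int] = {}

    def get_or_add(self, value: str) -> int:
        # Assign the next surrogate key only when the value is new.
        code = self._codes.get(value)
        if code is None:
            code = len(self._codes) + 1
            self._codes[value] = code
        return code

    def __len__(self) -> int:
        return len(self._codes)


def load_single_pass(rows: Iterable[str], dictionary: GlobalDictionary) -> List[int]:
    # One pass over the input: encode each value, extending the dictionary on the fly.
    return [dictionary.get_or_add(v) for v in rows]


if __name__ == "__main__":
    d = GlobalDictionary()
    print(load_single_pass(["beijing", "shenzhen", "beijing"], d))  # prints [1, 2, 1]
    print(load_single_pass(["shenzhen", "london"], d))  # incremental load, prints [2, 3]
```

In a real multi-node load, `get_or_add` would have to be backed by some shared service, which is exactly the cost the thread is weighing against a separate dictionary tool.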
> > -Original Message-----
> > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
> > Sent: Friday, October 14,
Jihong
>
> -----Original Message-----
> From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
> Sent: Friday, October 14, 2016 11:03 AM
> To: dev
> Subject: Re: Discussion(New feature) regarding single pass data loading
> solution.
>
> Hi Jihong,
>
> I agree, we c
> > global dictionary encoding whose dictionary is not
> > placed before data loading, we error out and direct the user to use the tool
> > first.
> >
> > Make sense?
> >
> > Jihong
> >
> > -----Original Message-----
> > From: Ravindra Pesala [mailto:
> > -----Original Message-----
> > From: Liang Chen [mailto:chenliang6...@gmail.com]
> > Sent: Thursday, October 13, 2016 5:39 PM
> > To: dev@carbondata.incubator.apache.org
> > Subject: RE: Discussion(New feature) regarding single pass data loading
> > solution.
> >
f regular data loading is the key here.
>
> Jihong
>
> >> > to perform encoding based on that,
> >> > we only need to handle occasionally adding entries while loading. For
> >> > columns specified with global dictionary encoding whose dictionary is
> >> > not placed before data loading, we error out and direct the user to use
> >> > the tool first.
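A sketch of that rule, assuming pre-built dictionaries are passed in as plain `dict`s keyed by column name. `encode_columns` and `MissingDictionaryError` are hypothetical names, and "the tool" stands for whatever offline dictionary generator the thread has in mind. Required dictionaries are checked before any row is touched, and the occasional unseen value is appended with the next free key.

```python
from typing import Dict, List, Mapping, Sequence


class MissingDictionaryError(Exception):
    """Hypothetical error: a column declared with global dictionary encoding has no dictionary."""


def encode_columns(
    rows: Sequence[Mapping[str, str]],
    dict_columns: Sequence[str],
    dictionaries: Dict[str, Dict[str, int]],
) -> List[Dict[str, object]]:
    # Fail fast, before reading any data, if a required dictionary is absent.
    missing = [c for c in dict_columns if c not in dictionaries]
    if missing:
        raise MissingDictionaryError(
            f"no global dictionary for {missing}; run the dictionary tool first"
        )
    encoded: List[Dict[str, object]] = []
    for row in rows:
        out: Dict[str, object] = dict(row)
        for col in dict_columns:
            d = dictionaries[col]
            value = row[col]
            if value not in d:
                # Occasional new value: append it with the next free surrogate key.
                d[value] = max(d.values(), default=0) + 1
            out[col] = d[value]
        encoded.append(out)
    return encoded
```

Using `max(...) + 1` rather than `len(d) + 1` keeps new keys unique even if the externally supplied dictionary has gaps in its key range.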
Sent: Thursday, October 13, 2016 5:39 PM
To: dev@carbondata.incubator.apache.org
Subject: RE: Discussion(New feature) regarding single pass data loading
solution.
Hi Jihong,
I am not sure that users will accept using an extra tool to do this work,
because providing a tool or doing a scan the first time per
> > -----Original Message-----
> > From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
> > Sent: Thursday, October 13, 2016 1:12 AM
> > To: dev
> > Subject: Re: Discussion(New feature) regarding single pass data loading
> > solution.
Hi Jihong/Aniket,
In the current implementation of CarbonData we already handle external
dictionaries while loading the data.
But the question here is: what should the default implementation be? Load
data without a dictionary?
Regards,
Ravi
On 13 October 2016 at 03:50, Aniket Adnaik wrote:
>
On Tue, Oct 11, 2016 at 2:32 AM, Ravindra Pesala wrote:
> Currently data is loaded into Carbon in 2 passes/jobs:
> 1. Generating the global dictionary using a Spark job.
Do we have local dictionaries? If not, what if the column has many
distinct values - will the big dictionary be loaded into memory?
Regards
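For reference, a toy version of the two-pass flow being questioned here, written as plain sequential Python rather than a Spark job. `build_dictionary`, `encode` and `two_pass_load` are hypothetical names; the quoted text only lists the first pass, so treating the second pass as "encode with the finished dictionary" is an assumption. It makes the memory concern concrete: pass 1 must hold one entry per distinct value before pass 2 can start.

```python
from typing import Dict, List, Sequence, Tuple


def build_dictionary(values: Sequence[str]) -> Dict[str, int]:
    # Pass 1: scan the whole input to collect distinct values.
    # Memory grows with the number of distinct values in the column.
    dictionary: Dict[str, int] = {}
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary) + 1
    return dictionary


def encode(values: Sequence[str], dictionary: Dict[str, int]) -> List[int]:
    # Pass 2: scan the input again, replacing each value with its surrogate key.
    return [dictionary[v] for v in values]


def two_pass_load(values: Sequence[str]) -> Tuple[Dict[str, int], List[int]]:
    dictionary = build_dictionary(values)
    return dictionary, encode(values, dictionary)
```

The input is read twice, which is the cost a single-pass design tries to avoid.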
Hi Ravi,
1. I agree with Jihong that creation of the global dictionary should be
optional, so that it can be disabled to improve load performance. The user
should be made aware that using a global dictionary may boost query
performance.
2. We should have a generic interface to manage global dictiona
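One possible shape for such an interface, sketched under the assumption that a dictionary maps (column, value) to an integer surrogate key. `DictionaryManager`, `InMemoryDictionaryManager` and `encode_value` are illustrative names, not existing CarbonData classes. Passing no manager models point 1: the global dictionary is disabled and raw values are kept.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Union


class DictionaryManager(ABC):
    """Hypothetical interface for managing a global dictionary."""

    @abstractmethod
    def lookup(self, column: str, value: str) -> Optional[int]:
        """Return the surrogate key for value, or None if it is unknown."""

    @abstractmethod
    def add(self, column: str, value: str) -> int:
        """Register value (if new) and return its surrogate key."""


class InMemoryDictionaryManager(DictionaryManager):
    def __init__(self) -> None:
        self._dicts: Dict[str, Dict[str, int]] = {}

    def lookup(self, column: str, value: str) -> Optional[int]:
        return self._dicts.get(column, {}).get(value)

    def add(self, column: str, value: str) -> int:
        d = self._dicts.setdefault(column, {})
        # Idempotent: adding a known value returns its existing key.
        if value not in d:
            d[value] = len(d) + 1
        return d[value]


def encode_value(
    manager: Optional[DictionaryManager], column: str, value: str
) -> Union[int, str]:
    # Global dictionary disabled: keep the raw value.
    if manager is None:
        return value
    key = manager.lookup(column, value)
    return key if key is not None else manager.add(column, value)
```

Other implementations (an external, user-supplied dictionary, or one backed by a shared store) could then be swapped in without changing the loading code.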
A rather straightforward option is to allow the user to supply a global dictionary
generated somewhere else, or to build a separate tool just for generating as well as
updating the dictionary. Then the normal data loading process will encode columns
with a local dictionary if none is supplied. This should cover maj
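A sketch of the fallback described above, assuming encoding happens one block at a time (`encode_block` is a hypothetical helper). A supplied global dictionary is used as-is; otherwise each block gets its own local dictionary, whose keys are only meaningful inside that block.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def encode_block(
    values: Sequence[str], global_dict: Optional[Dict[str, int]] = None
) -> Tuple[Dict[str, int], List[int]]:
    """Encode one block of a column, returning (dictionary used, codes)."""
    if global_dict is not None:
        # Assumes the supplied dictionary covers every value (else KeyError).
        return global_dict, [global_dict[v] for v in values]
    # No global dictionary: build a local one stored with this block only.
    local: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in local:
            local[v] = len(local) + 1
        codes.append(local[v])
    return local, codes
```

Because local keys are per block, the same value can receive different keys in different blocks, so encoded values cannot be compared across blocks without decoding; that is the query-side trade-off of skipping the global dictionary.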