Re: Analyze table compute statistics on wide table taking too long

Roger Marin Tue, 07 Apr 2015 20:01:46 -0700

Hi Gopal,

Thanks for that.


I'm happy to look into improving the Regex SerDe performance. Any tips on
where I should start looking?

Regards,
Roger
On 08/04/2015 11:44 AM, "Gopal Vijayaraghavan" <[email protected]> wrote:

>
> > The table also has a large Regex serde.
>
> There are no stats fast paths for Regex SerDe.
>
> The statistics computation is lifting each row into memory, parsing it and
> throwing it away.
>
> Most of your time would be spent in GC (check the GC time millis), due to
> the huge expense of the Regex Serde.
>
> For a direct comparison, you could compute stats while converting the
> table into another format:
>
> set hive.stats.autogather=true;
> create table tmp1 stored as orc as select * from oldtable;
>
> Due to the nature of the columnar SerDes, that ETL would happen in
> parallel to the compute stats off the same stream (i.e. autogather).
>
> That said, I have noticed performance issues with the RegexSerde, but
> haven't bothered to fix it yet - maybe you'd want to take a shot at fixing
> it?
>
>
> Cheers,
> Gopal
>
>
>
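
As a rough illustration of the "lift each row, parse it, throw it away" cost
described above, here is a minimal Java sketch. It is not taken from Hive's
RegexSerDe: the class name, the three-column pattern and both parse methods
are hypothetical, and it only contrasts allocating a new Matcher and array
for every row with reusing them across rows.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual RegexSerDe implementation.
class RegexRowParseSketch {
    // Assumed three-column layout; a wide table would have many more groups.
    static final Pattern ROW = Pattern.compile("(\\S+) (\\S+) (\\S+)");

    // Allocates a new Matcher and a new String[] for every row, all of
    // which become garbage as soon as the row has been counted.
    static String[] parseAllocating(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1);
        }
        return cols;
    }

    // Reuses one Matcher and one output array across rows. The column
    // Strings returned by group() are still allocated per row.
    static final class ReusingParser {
        private final Matcher matcher = ROW.matcher("");
        private final String[] cols = new String[matcher.groupCount()];

        String[] parse(String line) {
            matcher.reset(line);
            if (!matcher.matches()) {
                return null;
            }
            for (int i = 0; i < cols.length; i++) {
                cols[i] = matcher.group(i + 1);
            }
            return cols; // overwritten by the next call to parse()
        }
    }

    public static void main(String[] args) {
        ReusingParser parser = new ReusingParser();
        System.out.println(Arrays.toString(parseAllocating("a b c")));
        System.out.println(Arrays.toString(parser.parse("x y z")));
    }
}
```

Whether this kind of reuse would help in practice depends on where the real
SerDe spends its time, so profiling it first would be the safer starting point.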
