It does sound a lot like https://issues.apache.org/jira/browse/IMPALA-5058
or https://issues.apache.org/jira/browse/IMPALA-6671 - the catalog tries to
maintain some kind of consistency across operations, but that means that
long-running operations can end up blocking others. I'm not involved but I
know some other people are rearchitecting parts of the catalog to avoid
issues like this.
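
If it helps to narrow things down, here is a rough sketch (not something
from the Impala codebase) of timing each DDL statement so that slow ones can
be lined up against whatever else was running on the catalog at that moment.
The helper name time_statements, the 60-second threshold, and the table name
in the usage note are all made up for illustration; execute is any callable
that runs one SQL string, for example the execute method of a DB-API cursor.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_statements(
    execute: Callable[[str], object],
    statements: Iterable[str],
    slow_threshold_s: float = 60.0,  # assumed cutoff, tune to taste
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, float, bool]]:
    """Run each statement in order and return (statement, seconds, is_slow)."""
    results = []
    for sql in statements:
        start = clock()
        execute(sql)  # blocks until the statement finishes
        elapsed = clock() - start
        results.append((sql, elapsed, elapsed >= slow_threshold_s))
    return results


# Hypothetical usage, assuming `cursor` is a DB-API cursor connected to an
# impalad and `db.events` is a placeholder table name:
#
#   for sql, secs, slow in time_statements(
#       cursor.execute, ["ALTER TABLE db.events RECOVER PARTITIONS"]
#   ):
#       print(f"{secs:8.1f}s slow={slow} {sql}")
```

Logging these timings alongside wall-clock timestamps would make it easier
to see whether the slow statements coincide with other long-running DDL.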

On Fri, Sep 21, 2018 at 5:32 AM Fawze Abujaber <[email protected]> wrote:

> Hello Community,
>
> I'm investigating an issue we are running into with Impala DDL statements,
> which sometimes take more than 6-9 minutes.
>
> We have around 144 Impala tables that are partitioned by YYYY/MM/DD.
> We keep the data for between 3 and 13 months depending on the table, and we
> run 3 different DDL statements.
>
> ---> ALTER TABLE RECOVER PARTITIONS every 20 minutes to detect the new data
> generated by a Spark job and written into HDFS.
>
> ---> DROP and CREATE TABLE twice a day to detect new schema changes in the
> data.
>
> I don't see the issue occurring on a specific table or a specific Impala
> daemon.
>
> On the other side, we have 450 Hive tables on which we run the same DDL
> statements using Hive.
>
> I have been trying to find a way to investigate this with no success. For
> example, I want to check the size of the metadata stored at each daemon in
> order to see whether my issue is related to the metadata size, but I'm not
> aware of how to check this.
>
> Any suggestions on how to investigate this issue are much appreciated.
>
> --
> Take Care
> Fawze Abujaber
>