It does sound a lot like https://issues.apache.org/jira/browse/IMPALA-5058 or https://issues.apache.org/jira/browse/IMPALA-6671 - the catalog tries to maintain some kind of consistency for operations, but that means long-running operations can end up blocking others. I'm not involved, but I know some other people are rearchitecting parts of the catalog to avoid issues like this.
On Fri, Sep 21, 2018 at 5:32 AM Fawze Abujaber <[email protected]> wrote:
> Hello Community,
>
> I'm investigating an issue we are running into with Impala DDL statements,
> which sometimes take more than 6-9 minutes.
>
> We have around 144 Impala tables that are partitioned by YYYY/MM/DD.
> We keep the data between 3 and 13 months depending on the table, and we
> are running 3 different DDL statements:
>
> ---> ALTER TABLE ... RECOVER PARTITIONS every 20 minutes to detect the new
> data generated by a Spark job and written into HDFS.
>
> ---> DROP and CREATE TABLE twice a day to detect new schema changes in the
> data.
>
> I don't see the issue occurring on a specific table or a specific Impala
> daemon.
>
> On the other side we have 450 Hive tables that we run the same DDL
> statements on using Hive.
>
> I've been trying to find a way to investigate this with no success. For
> example, I want to check the size of the metadata stored at each daemon in
> order to see whether my issue is related to the metadata size or not, but
> I'm not aware of how to check this.
>
> Any suggestions on how to investigate this issue are much appreciated.
>
> --
> Take Care
> Fawze Abujaber
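On the "how do I check the size of the metadata at each daemon" question: each Impala daemon exposes a debug web UI (impalad typically on port 25000, catalogd on 25020) with /catalog and /metrics pages, and the metrics page can be fetched in JSON form. Below is a minimal sketch of pulling catalog-size counters out of that payload - the port, the `?json` parameter, the payload layout, and the metric names (`catalog.num-tables`, `catalog.num-databases`) are all assumptions from memory, so compare against the /metrics page of your own Impala build before relying on it:

```python
import json
import urllib.request


def extract_catalog_counts(metrics):
    """Pick catalog-size counters out of a parsed metrics list.

    `metrics` is expected to be a list of {"name": ..., "value": ...}
    dicts. The metric names below are assumptions -- inspect your own
    daemon's /metrics page for the exact ones your version exposes.
    """
    wanted = {"catalog.num-tables", "catalog.num-databases"}
    return {m["name"]: m["value"] for m in metrics if m.get("name") in wanted}


def fetch_catalog_counts(host, port=25020):
    """Fetch and summarize catalogd metrics from its debug web UI.

    Port 25020 (catalogd) and the `?json` suffix are assumptions; the
    JSON layout also varies between Impala versions, so the path into
    the payload below may need adjusting.
    """
    with urllib.request.urlopen(f"http://{host}:{port}/metrics?json") as resp:
        payload = json.load(resp)
    metrics = payload.get("metric_group", {}).get("metrics", [])
    return extract_catalog_counts(metrics)
```

Watching these counters (and the /memz page for memory used by metadata) over time would at least tell you whether the slow DDLs correlate with catalog size, or are instead pure lock contention as in the JIRAs above.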
