Metastore performance on HDFS-backed table with 15000+ partitions

Norbert Burger Fri, 21 Feb 2014 19:06:14 -0800

Hi folks,

We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.


In Hive, we have an external table backed by HDFS which has a 3-level
partitioning scheme that currently has 15000+ partitions.

Within the last day or so, queries against this table have started failing.
A simple query which shouldn't take very long at all (select * from ...
limit 10) fails after several minutes with a client OOME.  I get the same
outcome on count(*) queries (which I thought wouldn't send any data back to
the client).  Increasing heap on both client and server JVMs (via
HADOOP_HEAPSIZE) doesn't have any impact.

We were only able to work around the client OOMEs by reducing the number of
partitions in the table.

Looking at the MySQL query log, my thought is that the Hive client is quite
busy making requests for partitions that don't contribute to the query.
Has anyone else had a similar experience with tables this size?

Thanks,
Norbert
