The Hive query optimizer is notoriously heavy on memory, though 15k partitions sounds a bit early for it to fail.
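If it's the CLI that's dying, HADOOP_HEAPSIZE alone sometimes never reaches the client JVM; on most installs the hive wrapper also honours HADOOP_CLIENT_OPTS. Something along these lines is worth a try (the numbers are just an example, untested on your setup):

    # Raise heap for the Hive client JVM. HADOOP_HEAPSIZE (in MB) is read by
    # the hadoop launcher scripts; HADOOP_CLIENT_OPTS is appended to the
    # client-side JVM options, which is usually what the Hive CLI picks up.
    export HADOOP_HEAPSIZE=4096
    export HADOOP_CLIENT_OPTS="-Xmx4g ${HADOOP_CLIENT_OPTS}"
    hive -e "select * from some_table limit 10;"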
What is your heap size?

Regards,
Terje

> On 22 Feb 2014, at 12:05, Norbert Burger <norbert.bur...@gmail.com> wrote:
>
> Hi folks,
>
> We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
>
> In Hive, we have an external table backed by HDFS with a 3-level
> partitioning scheme that currently has 15000+ partitions.
>
> Within the last day or so, queries against this table have started failing.
> A simple query which shouldn't take long at all (select * from ... limit 10)
> fails after several minutes with a client OOME. I get the same outcome on
> count(*) queries (which I thought wouldn't send any data back to the
> client). Increasing heap on both the client and server JVMs (via
> HADOOP_HEAPSIZE) doesn't have any impact.
>
> We were only able to work around the client OOMEs by reducing the number of
> partitions in the table.
>
> Looking at the MySQL query log, my guess is that the Hive client is quite
> busy making requests for partitions that don't contribute to the query.
> Has anyone else had a similar experience with tables this size?
>
> Thanks,
> Norbert
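PS: one more thought. Since even count(*) dies, the client is probably pulling all 15k partition objects from the metastore at plan time. Until the heap question is sorted, pinning every partition column in the WHERE clause should let the planner prune down to a handful of partitions instead of fetching them all. Rough, untested sketch; the table and partition column names (events, dt, region, host) are made up to match a 3-level scheme:

    # Hypothetical query: with all three partition columns fixed, the client
    # should only request the matching partitions from the metastore.
    hive -e "
      select *
      from   events
      where  dt = '2014-02-21'
        and  region = 'us-east'
        and  host = 'web01'
      limit  10;
    "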