The query optimizer in Hive is awful on memory consumption, though 15k 
partitions sounds a bit early for it to fail.
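For what it's worth, the client has to fetch metadata for every partition the 
query might touch, so an unfiltered select over a heavily partitioned table 
pulls all 15k partition objects into the client heap. A quick sanity check 
(the table and partition column names below are made up, substitute your own):

    # how many partitions the table really has
    hive -e "SHOW PARTITIONS events" | wc -l

    # filtering on the partition columns lets Hive prune, so the client only
    # fetches metadata for the matching partitions
    hive -e "SELECT * FROM events WHERE dt='2014-02-21' AND app='web' LIMIT 10"

If a query pruned down to a handful of partitions still OOMs, the problem is 
elsewhere.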

What is your heap size?
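Since you say raising HADOOP_HEAPSIZE had no effect, it's also worth verifying 
that the setting actually reaches the client JVM. A minimal sketch, assuming 
the stock CDH wrapper scripts (the 2g figure is just an example, tune it for 
your box):

    # HADOOP_HEAPSIZE is in MB and is read by the hadoop/hive wrapper scripts
    export HADOOP_HEAPSIZE=2048
    # or target client-side JVMs only
    export HADOOP_CLIENT_OPTS="-Xmx2g $HADOOP_CLIENT_OPTS"
    hive
    # in another shell, 'ps aux | grep Xmx' should now show the larger heap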

Regards,
Terje

> On 22 Feb 2014, at 12:05, Norbert Burger <norbert.bur...@gmail.com> wrote:
> 
> Hi folks,
> 
> We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
> 
> In Hive, we have an HDFS-backed external table with a 3-level partitioning 
> scheme that currently holds 15000+ partitions.
> 
> Within the last day or so, queries against this table have started failing.  
> A simple query which shouldn't take very long at all (select * from ... limit 
> 10) fails after several minutes with a client OOME.  I get the same outcome 
> on count(*) queries (which I thought wouldn't send any data back to the 
> client).  Increasing heap on both client and server JVMs (via 
> HADOOP_HEAPSIZE) doesn't have any impact.
> 
> We were only able to work around the client OOMEs by reducing the number of 
> partitions in the table.
> 
> Looking at the MySQL query log, my thought is that the Hive client is quite 
> busy making requests for partitions that don't contribute to the query.  
> Has anyone else had a similar experience with tables this size?
> 
> Thanks,
> Norbert
