Most interesting. We had an issue recently with querying a table with 15K columns and running out of heap space, but not with 15K partitions.
15K partitions shouldn't be causing a problem, in my humble estimation. Maybe a million, but not 15K. :) So is there a traceback we can look at? Or is it not heap but real memory? And is this the local Hive client or HiveServer?

Thanks,
Stephen.

On Fri, Feb 21, 2014 at 7:05 PM, Norbert Burger <norbert.bur...@gmail.com> wrote:

> Hi folks,
>
> We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
>
> In Hive, we have an external table backed by HDFS with a 3-level
> partitioning scheme that currently has 15,000+ partitions.
>
> Within the last day or so, queries against this table have started
> failing. A simple query which shouldn't take very long at all (select *
> from ... limit 10) fails after several minutes with a client OOME. I get
> the same outcome on count(*) queries (which I thought wouldn't send any
> data back to the client). Increasing heap on both the client and server JVMs
> (via HADOOP_HEAPSIZE) doesn't have any impact.
>
> We were only able to work around the client OOMEs by reducing the number
> of partitions in the table.
>
> Looking at the MySQL query log, my thought is that the Hive client is quite
> busy making requests for partitions that don't contribute to the query.
> Has anyone else had a similar experience with tables this size?
>
> Thanks,
> Norbert
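
For illustration, a rough sketch of the kind of query that sidesteps the problem Norbert describes: filtering on the partition columns lets Hive prune partitions during planning, so the client only asks the metastore for the matching entries instead of all 15K. The table name and the year/month/day scheme below are hypothetical; the actual schema isn't given in the thread.

    -- Hypothetical table and 3-level partition columns; names are illustrative only.
    -- The WHERE clause on partition columns limits the metastore lookup to one
    -- partition, rather than pulling metadata for every partition of the table.
    SELECT *
    FROM events
    WHERE year = '2014' AND month = '02' AND day = '21'
    LIMIT 10;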