This is something that a few of us have run into. I think the bottleneck is in the partition creation calls to the metastore. My workaround was HIVE-10385, which optionally skips partition creation in the metastore, but this isn't a solution for everyone. If you don't require actual partitions in the table but simply partitioned data in HDFS, give it a shot. It may be worthwhile to look into optimizations for this use case.
-Slava

On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:

> Hi All,
>
> I have a table which is partitioned on two columns (customer, date). I'm
> loading some data into the table using a Hive query. The MapReduce job
> completed within a few minutes and needs to "commit" the data to the
> appropriate partitions. There were about 32000 partitions generated. The
> commit phase has been running for almost 16 hours and has not finished yet.
> I've been monitoring jmap, and don't believe it's a memory or gc issue.
> I've also been looking at jstack and not sure why it's so slow. I'm not
> sure what the problem is, but seems to be a Hive performance issue when it
> comes to "highly partitioned" tables.
>
> Any thoughts on this issue would be greatly appreciated.
>
> Thanks in advance,
> Pradeep

--
Slava Markeyev | Engineering | Upsight
Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>