I actually decided to remove one of my two partition columns and make it a bucketing column instead. The same query completed fully in under 10 minutes, with 92 partitions added. This will suffice for me for now.
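
As a rough check on the "approximately 2 more days" estimate in the quoted message below, here is a minimal sketch of the arithmetic. It assumes the metastore keeps adding partitions at the same average rate it has so far, and it treats "about 32000" as exactly 32000. The variable names are only illustrative.

```python
# Figures taken from the quoted messages in this thread.
total_partitions = 32000   # "about 32000 partitions generated"
added_so_far = 10830       # partitions in the metastore so far
elapsed_hours = 24         # time since the Hive job itself completed

# Assumption: the rate of partition creation stays constant.
rate_per_hour = added_so_far / elapsed_hours
remaining = total_partitions - added_so_far
remaining_hours = remaining / rate_per_hour
remaining_days = remaining_hours / 24

print(f"rate: {rate_per_hour:.0f} partitions/hour")
print(f"remaining: {remaining} partitions, "
      f"about {remaining_hours:.0f} hours ({remaining_days:.1f} days)")
```

Under those assumptions the remaining 21170 partitions take about 47 hours, which is in line with the two-day estimate.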

On Thu, Jun 11, 2015 at 2:25 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:

> Hmm... did your performance increase with the patch you supplied? I do
> need the partitions in Hive, but I have a separate tool that has the
> ability to add partitions to the metastore and is definitely much faster
> than this. I just checked my job again: the actual Hive job completed 24
> hours ago and has been adding the dynamic partitions to the metastore since
> then and is still not done. According to the metastore, there are only 10830
> partitions added so far... at this pace, it will take approximately 2 more
> days for it to complete.
>
> On Thu, Jun 11, 2015 at 1:18 PM, Slava Markeyev <
> slava.marke...@upsight.com> wrote:
>
>> This is something that a few of us have run into. I think the bottleneck
>> is in partition creation calls to the metastore. My workaround was
>> HIVE-10385, which optionally removed partition creation in the metastore,
>> but this isn't a solution for everyone. If you don't require actual
>> partitions in the table but simply partitioned data in HDFS, give it a
>> shot. It may be worthwhile looking into optimizations for this use case.
>>
>> -Slava
>>
>> On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota <pradeep...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have a table which is partitioned on two columns (customer, date). I'm
>>> loading some data into the table using a Hive query. The MapReduce job
>>> completed within a few minutes and needs to "commit" the data to the
>>> appropriate partitions. There were about 32000 partitions generated. The
>>> commit phase has been running for almost 16 hours and has not finished yet.
>>> I've been monitoring jmap, and don't believe it's a memory or GC issue.
>>> I've also been looking at jstack and am not sure why it's so slow. I'm not
>>> sure what the problem is, but it seems to be a Hive performance issue when
>>> it comes to "highly partitioned" tables.
>>>
>>> Any thoughts on this issue would be greatly appreciated.
>>>
>>> Thanks in advance,
>>> Pradeep
>>>
>>
>> --
>>
>> Slava Markeyev | Engineering | Upsight
>>
>> Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>
>>