On 2 Jul 2013, at 16:51, Owen O'Malley wrote:

> On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <
> peter.mar...@trilliumsoftware.com> wrote:
>
>> Hi Owen,
>>
>> I’m curious about this advice about partitioning. Is there some
>> fundamental reason why Hive
>> is slow when the number of partitions is 10,000 rather than 1,000?
>
> The precise numbers don't matter. I wanted to give people a ballpark range
> that they should be looking at. Most tables at 1000 partitions won't cause
> big slow downs, but the cost scales with the number of partitions. By the
> time you are at 10,000 the cost is noticeable. I have one customer who has
> a table with 1.2 million partitions. That causes a lot of slow downs.

That still doesn't really answer the question, which is: why is it slower to run a query on a heavily partitioned table than on the same number of files in a less heavily partitioned table?

David