Hi all,

I ran a simple experiment with Spark SQL. I created a partitioned Parquet
table with only one partition (date=20140701). A simple `select count(*)
from table where date=20140701` ran very fast (about 0.1 seconds). However,
as I added more partitions, the query took longer and longer. With about
10,000 partitions, the query took unreasonably long. I would expect a query
that touches only a single partition not to be affected by the total number
of partitions. Is this known behaviour? What is Spark doing here?
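
For context, the setup looked roughly like this (the table and column
names below are placeholders, not my exact DDL):

    CREATE TABLE events (id BIGINT, payload STRING)
    PARTITIONED BY (date INT)
    STORED AS PARQUET;

    -- with a single partition, this returns in ~0.1 seconds
    ALTER TABLE events ADD PARTITION (date=20140701);
    SELECT COUNT(*) FROM events WHERE date=20140701;

    -- after adding ~10,000 partitions, the same single-partition
    -- query slows down dramatically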

Thanks,
Jerrick

Reply via email to