Hi all,

I ran a simple experiment with Spark SQL. I created a partitioned Parquet
table with only one partition (date=20140701). A simple `select count(*)
from table where date=20140701` ran very fast (about 0.1 seconds). However,
as I added more partitions, the query took longer and longer. With about
10,000 partitions, the query took unreasonably long. I would expect a query
that touches only a single partition not to be affected by the total number
of partitions. Is this known behaviour? What is Spark doing here?
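
For context, the setup looked roughly like this (the table and column
names below are placeholders, not my exact DDL):

    CREATE TABLE events (id BIGINT, payload STRING)
    PARTITIONED BY (date INT)
    STORED AS PARQUET;

    -- with a single partition, this returns in ~0.1 seconds
    ALTER TABLE events ADD PARTITION (date=20140701);
    SELECT COUNT(*) FROM events WHERE date=20140701;

    -- after adding ~10,000 partitions, the same single-partition
    -- query slows down dramatically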

Thanks,
Jerrick

Reply via email to