Thanks for the experiments and analysis!
I think Michael already submitted a patch that avoids scanning all columns
for count(*) or count(1).
On Mon, May 12, 2014 at 9:46 PM, Andrew Ash and...@andrewash.com wrote:
Hi Spark devs,
First of all, huge congrats on the parquet integration with
Thanks for filing -- I'm keeping my eye out for updates on that ticket.
Cheers!
Andrew
On Tue, May 13, 2014 at 2:40 PM, Michael Armbrust mich...@databricks.com wrote:
It looks like currently the .count() on parquet is handled incredibly
inefficiently and all the columns are materialized.
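To illustrate why materializing every column is wasteful for a row count, here is a minimal toy sketch in plain Python. It is not Spark or Parquet code: the ColumnarTable class and its methods are hypothetical names invented for this example, and the sketch only assumes that a columnar layout stores one value per row in each column.

```python
# Toy model of a columnar table (hypothetical; not the Spark or Parquet API).
class ColumnarTable:
    def __init__(self, columns):
        # columns: dict mapping column name -> list of values, one per row
        self.columns = columns
        self.columns_read = 0  # number of columns materialized so far

    def read_column(self, name):
        # Simulate the cost of decoding one column from storage.
        self.columns_read += 1
        return list(self.columns[name])

    def count_materializing_all(self):
        # Inefficient: decode every column, rebuild the rows, then count them.
        data = [self.read_column(name) for name in self.columns]
        return len(list(zip(*data)))

    def count_single_column(self):
        # Cheaper: any single column already holds one value per row.
        first = next(iter(self.columns))
        return len(self.read_column(first))


table = ColumnarTable({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "score": [0.5, 0.7, 0.9],
})

slow = table.count_materializing_all()
slow_reads = table.columns_read

table.columns_read = 0
fast = table.count_single_column()
fast_reads = table.columns_read
```

Both counts agree, but the second path touches one column instead of all of them. A real columnar reader could presumably go further and answer from row-count metadata alone, though that is an assumption and not something this thread confirms.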
Hi Spark devs,
First of all, huge congrats on the parquet integration with SparkSQL! This
is an incredible direction forward and something I can see being very
broadly useful.
I was doing some preliminary tests to see how it works with one of my
workflows, and wanted to share some numbers that