If I have a Hive table with six columns and create a DataFrame (Spark 1.4.1) from a sqlContext.sql("select * from ...") query, the physical plan shown by explain projects all six columns, as expected.
If I then call select("one_column") on that first DataFrame, the resulting DataFrame's physical plan still fetches all six columns. Shouldn't the subsequent select() have pruned the projection down to the single requested column?
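A minimal sketch of the steps described, runnable in a spark-shell against a Hive-enabled SQLContext; the table name my_table and column name one_column are placeholders for illustration:

```scala
// Assumes a spark-shell with a HiveContext/SQLContext bound to sqlContext,
// and a Hive table "my_table" with six columns (names are hypothetical).
val df = sqlContext.sql("select * from my_table")
df.explain()        // physical plan projects all six columns, as expected

val pruned = df.select("one_column")
pruned.explain()    // still shows the scan reading all six columns
```

The expectation is that the optimizer's column-pruning rule would push the single-column projection down into the table scan, so the second plan should read only one_column.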