If I have a Hive table with six columns and create a DataFrame (Spark
1.4.1) from a sqlContext.sql("select * from ...") query, the physical
plan shown by explain() lists all six columns, as expected.
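
Roughly, in the Scala shell (where sqlContext is a HiveContext; my_table
is just a stand-in for the real table name):

    val df = sqlContext.sql("select * from my_table")
    // explain() here prints a plan that reads all six columns, as expected
    df.explain()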

If I then call select("one_column") on that first DataFrame, the new
DataFrame's physical plan still shows all six columns being fetched.
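
Continuing from the snippet above:

    val pruned = df.select("one_column")
    // I expected the projection to be pruned here, but the plan still
    // shows all six columns being read from the table
    pruned.explain()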

Shouldn't the subsequent select() have pruned the projections in the
physical plan?
