If I have a Hive table with six columns and create a DataFrame (Spark 1.4.1) from a sqlContext.sql("select * from ...") query, the physical plan shown by explain projects all six columns, as expected.
If I then call select("one_column") on that first DataFrame, the resulting DataFrame's physical plan still fetches all six columns. Shouldn't the subsequent select() have pruned the projection down to the single requested column?
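A minimal sketch of the steps described, runnable in a spark-shell against a Hive-enabled SQLContext; the table name my_table and column name one_column are placeholders for illustration:

```scala
// Assumes a spark-shell with a HiveContext/SQLContext bound to sqlContext,
// and a Hive table "my_table" with six columns (names are hypothetical).
val df = sqlContext.sql("select * from my_table")
df.explain()        // physical plan projects all six columns, as expected

val pruned = df.select("one_column")
pruned.explain()    // still shows the scan reading all six columns
```

The expectation is that the optimizer's column-pruning rule would push the single-column projection down into the table scan, so the second plan should read only one_column.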