I don't know if this is a bug or a feature, but it's a bit counter-intuitive when reading code.
The "b" DataFrame does not have the field "bar" in its schema, yet a filter on that field still resolves:

scala> val a = sc.parallelize(Seq((1,10),(2,20))).toDF("foo","bar")
a: org.apache.spark.sql.DataFrame = [foo: int, bar: int]

scala> a.show
+---+---+
|foo|bar|
+---+---+
|  1| 10|
|  2| 20|
+---+---+

scala> val b = a.select($"foo")
b: org.apache.spark.sql.DataFrame = [foo: int]

scala> b.schema
res3: org.apache.spark.sql.types.StructType = StructType(StructField(foo,IntegerType,false))

scala> b.select($"bar").show
org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];;
[...snip...]

scala> b.where($"bar" === 20).show
+---+
|foo|
+---+
|  2|
+---+
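Until this resolution behavior is made consistent between select and where, one defensive workaround is to check a column name against the DataFrame's own schema (e.g. Spark's real `b.columns`) before filtering, so a column resolved from upstream lineage cannot slip through silently. The helper below is a hypothetical sketch, not part of Spark's API; it only assumes the schema is available as a sequence of column names:

```scala
object ColumnCheck {
  // Hypothetical helper: fail fast if `name` is not in the given schema,
  // mimicking the AnalysisException message that select() produces.
  def requireColumn(columns: Seq[String], name: String): Unit =
    require(
      columns.contains(name),
      s"cannot resolve '$name' given input columns: ${columns.mkString("[", ", ", "]")}"
    )
}
```

Usage would be `ColumnCheck.requireColumn(b.columns, "bar")` immediately before `b.where($"bar" === 20)`, turning the silent lineage-based resolution into an explicit error.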