I have two DataFrames, let's call them A and B. A consists of [unique_id, field1] and B consists of [unique_id, field2]. They have exactly the same number of rows, and every unique_id in A is also present in B.
If I execute a join like A.join(B, Seq("unique_id")).select($"unique_id", $"field1"), Spark will perform an expensive join even though it doesn't have to, because all the fields it needs come from A. Is there some trick I can use so that Catalyst will optimise this join away?
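For reference, here is a minimal sketch of the scenario (the data and column values are illustrative, only the schema [unique_id, field1] / [unique_id, field2] matches the description above). Running explain(true) shows the optimised plan, where the join still appears even though no column from B survives the projection:

```scala
import org.apache.spark.sql.SparkSession

object JoinPruneExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-prune")
      .getOrCreate()
    import spark.implicits._

    // Two DataFrames with the same ids, as described above.
    val a = Seq((1L, "x"), (2L, "y")).toDF("unique_id", "field1")
    val b = Seq((1L, 10), (2L, 20)).toDF("unique_id", "field2")

    // The join in question: only columns from `a` are selected.
    val joined = a.join(b, Seq("unique_id"))
      .select($"unique_id", $"field1")

    // Print parsed/analysed/optimised/physical plans to see whether
    // Catalyst kept the join after column pruning.
    joined.explain(true)

    spark.stop()
  }
}
```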
