I am trying to join df1 with df2, and then join the result with df2 again. df2 is the common DataFrame used in both joins.
    val df3 = df1
      .join(df2, df1("PARTICIPANT_ID") === df2("PARTICIPANT_ID") and df1("BUSINESS_ID") === df2("BUSINESS_ID"))
      .drop(df1("BUSINESS_ID"))    // dropping duplicates
      .drop(df1("PARTICIPANT_ID")) // dropping duplicates
      .select("EMPLOYEE_ID", ...)

    val df4 = df3
      .join(df2, df3("EMPLOYEE_ID") === df2("EMPLOYEE_ID") and df3("BUSINESS_ID") === df2("BUSINESS_ID"))
      .drop(df2("BUSINESS_ID"))    // dropping duplicates
      .drop(df2("EMPLOYEE_ID"))    // dropping duplicates
      .select(...)

I am getting the following warning, and it most likely indicates a Cartesian join, which is not what I want:

    14:30:32.193 12262 [main] WARN org.apache.spark.sql.Column - Constructing trivially true equals predicate, 'EMPLOYEE_ID#83 = EMPLOYEE_ID#83'. Perhaps you need to use aliases.
    14:30:32.195 12264 [main] WARN org.apache.spark.sql.Column - Constructing trivially true equals predicate, 'BUSINESS_ID#312 = BUSINESS_ID#312'. Perhaps you need to use aliases.

As you can see, one of my join predicates is converted to "(EMPLOYEE_ID#83 = EMPLOYEE_ID#83)". I think this should be okay, because they should still be columns from different DataFrames (df3 and df2). I just want to confirm that this warning is harmless in this scenario.

The problem is similar to this one: https://stackoverflow.com/questions/32190828/spark-sql-performing-carthesian-join-instead-of-inner-join
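The warning itself suggests the usual workaround: because df3 is derived from df2, expressions like `df3("EMPLOYEE_ID")` and `df2("EMPLOYEE_ID")` can resolve to the same underlying attribute, so the predicate degenerates to a trivially true comparison. A minimal sketch of the alias-based fix for the second join (column names taken from the post; the sample data and the simplified definition of df3 are my own assumptions, not the real tables):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object AliasJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for df2; the real schema has more columns.
    val df2 = Seq(
      (1, 10, 100),
      (2, 20, 200)
    ).toDF("EMPLOYEE_ID", "BUSINESS_ID", "PARTICIPANT_ID")

    // Simplified stand-in for df3: something derived from df2,
    // so it shares lineage with df2 (the condition that triggers the warning).
    val df3 = df2.select("EMPLOYEE_ID", "BUSINESS_ID")

    // Aliasing both sides lets the analyzer tell the two lineages apart,
    // so the join condition no longer collapses to a trivially true predicate.
    val df4 = df3.alias("l")
      .join(df2.alias("r"),
            col("l.EMPLOYEE_ID") === col("r.EMPLOYEE_ID") and
            col("l.BUSINESS_ID") === col("r.BUSINESS_ID"))
      .drop(col("r.BUSINESS_ID"))  // dropping duplicates
      .drop(col("r.EMPLOYEE_ID")) // dropping duplicates

    df4.show()
    spark.stop()
  }
}
```

With the aliases in place, each join key is qualified by the side it comes from, so the "Constructing trivially true equals predicate" warning should no longer appear and the join stays an inner equi-join rather than falling back toward a Cartesian product.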