Github user daniel-shields commented on the issue:
https://github.com/apache/spark/pull/21449
This case can also occur when the datasets are different but share a common
lineage. Consider the following:
```python
df = spark.range(10)
df1 = df.groupby('id').count()
df2 = df.groupby('id').sum('id')
df1.join(df2, df2['id'].eqNullSafe(df1['id'])).collect()
```
This currently fails with `eqNullSafe`, but works with `==`.
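For comparison, a minimal sketch of the same join rewritten with `==` (the form that works, per the above), assuming the same `spark` session and the `df1`/`df2` defined in the snippet:

```python
# Same join condition expressed with ==; this currently resolves without error.
df1.join(df2, df2['id'] == df1['id']).collect()
```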