Vladimir Prus created SPARK-26782:
-------------------------------------

             Summary: Wrong column resolved when joining twice with the same dataframe
                 Key: SPARK-26782
                 URL: https://issues.apache.org/jira/browse/SPARK-26782
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: Vladimir Prus

# Execute the following code:

{code:java}
{
  // Assumes spark-shell, where spark.implicits._ is in scope for toDF and $
  val events = Seq(("a", 0)).toDF("id", "ts")
  val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")

  // Two aliases of the same underlying dataframe
  val dimOriginal = dim.as("dim")
  val dimShifted = dim.as("dimShifted")

  val r = events
    .join(dimOriginal, "id")
    .where(dimOriginal("start") <= $"ts" && $"ts" < dimOriginal("end"))

  val r2 = r
    .join(dimShifted, "id")
    .where(dimShifted("start") <= $"ts" + 24 && $"ts" + 24 < dimShifted("end"))

  r2.show()
  r2.explain(true)
}
{code}

# Expected effect:
** One row is shown.
** Logical plan shows two independent joins with "dim" and "dimShifted".
# Observed effect:
** No rows are printed.
** Logical plan shows two filters are applied:
*** 'Filter ((start#17 <= ('ts + 24)) && (('ts + 24) < end#18))'
*** Filter ((start#17 <= ts#6) && (ts#6 < end#18))
** Both these filters refer to the same start#17 and end#18 columns, so they are applied to the same dataframe, not two different ones.
** It appears that dimShifted("start") is resolved to be identical to dimOriginal("start").
# I get the desired effect if I replace the second where with

{code:java}
.where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")
{code}
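
For convenience, the workaround from the last step can be put together as a standalone program. This is only a minimal sketch under some assumptions: the object name Spark26782Workaround, the local[1] master and the app name are illustrative choices and not part of the report, and the first filter is also switched to alias-qualified names, which the report itself did not try.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; the name is for illustration only.
object Spark26782Workaround {
  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough to reproduce.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-26782-workaround")
      .getOrCreate()
    import spark.implicits._

    val events = Seq(("a", 0)).toDF("id", "ts")
    val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")

    val dimOriginal = dim.as("dim")
    val dimShifted = dim.as("dimShifted")

    // Refer to columns by alias-qualified name instead of dimOriginal("start"),
    // so each filter is resolved against the alias it names.
    val r = events
      .join(dimOriginal, "id")
      .where($"dim.start" <= $"ts" && $"ts" < $"dim.end")

    val r2 = r
      .join(dimShifted, "id")
      .where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")

    r2.show()
    r2.explain(true)

    spark.stop()
  }
}
```

If the workaround behaves as described in the report, the extended plan printed by explain(true) should show the second filter referring to different attribute ids than the first one.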