[ https://issues.apache.org/jira/browse/SPARK-26782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755987#comment-16755987 ]
Marco Gaido commented on SPARK-26782:
-------------------------------------

This is a duplicate of many others. I also started a thread on the dev mailing list regarding this problem. Let me close this as a duplicate.

> Wrong column resolved when joining twice with the same dataframe
> ----------------------------------------------------------------
>
>                 Key: SPARK-26782
>                 URL: https://issues.apache.org/jira/browse/SPARK-26782
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Vladimir Prus
>            Priority: Major
>
> # Execute the following code:
> {code:java}
> {
>   val events = Seq(("a", 0)).toDF("id", "ts")
>   val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")
>
>   val dimOriginal = dim.as("dim")
>   val dimShifted = dim.as("dimShifted")
>   val r = events
>     .join(dimOriginal, "id")
>     .where(dimOriginal("start") <= $"ts" && $"ts" < dimOriginal("end"))
>   val r2 = r
>     .join(dimShifted, "id")
>     .where(dimShifted("start") <= $"ts" + 24 && $"ts" + 24 < dimShifted("end"))
>
>   r2.show()
>   r2.explain(true)
> }
> {code}
> # Expected effect:
> ** One row is shown.
> ** The logical plan shows two independent joins, with "dim" and "dimShifted".
> # Observed effect:
> ** No rows are printed.
> ** The logical plan shows that two filters are applied:
> *** 'Filter ((start#17 <= ('ts + 24)) && (('ts + 24) < end#18))
> *** Filter ((start#17 <= ts#6) && (ts#6 < end#18))
> ** Both filters refer to the same start#17 and end#18 columns, so they are applied to the same dataframe, not two different ones.
> ** It appears that dimShifted("start") is resolved to be identical to dimOriginal("start").
> # I get the desired effect if I replace the second where with:
> {code:java}
> .where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
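As a sketch of why the workaround helps: after `dim.as("dimShifted")`, a call like `dimShifted("start")` still carries the attribute from the original `dim`, so both filters end up bound to the same columns; a string-qualified reference such as `$"dimShifted.start"` is instead resolved against the alias during analysis. The snippet below reworks the reporter's repro around that idea. It is a minimal sketch, assuming a local SparkSession (the object name `SelfJoinWorkaround` and the `local[*]` master are illustrative, not part of the report):

{code:java}
import org.apache.spark.sql.SparkSession

object SelfJoinWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-26782-workaround")
      .getOrCreate()
    import spark.implicits._

    val events = Seq(("a", 0)).toDF("id", "ts")
    val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")

    val dimOriginal = dim.as("dim")
    val dimShifted = dim.as("dimShifted")

    // Use string-qualified column names ($"alias.col") throughout,
    // instead of df("col"), so each filter binds to its own alias.
    val r = events
      .join(dimOriginal, "id")
      .where($"dim.start" <= $"ts" && $"ts" < $"dim.end")

    val r2 = r
      .join(dimShifted, "id")
      .where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")

    r2.show() // per the report, the desired single row is produced this way

    spark.stop()
  }
}
{code}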