Hi Michael,

scala> spark.version
res0: String = 2.4.0-SNAPSHOT

scala> val r1 = spark.range(1)
r1: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> r1.as("left").join(r1.as("right")).filter($"left.id" === $"right.id").show
+---+---+
| id| id|
+---+---+
|  0|  0|
+---+---+

Am I missing something? When aliasing a Dataset, use the alias as the
qualifier inside the column references (as above).
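For your POSITION example below, that would be something along these lines
(an untested sketch that assumes the column names from your query and the
same prefix as in your snippet):

import org.apache.spark.sql.SparkSession

// In spark-shell, `spark` and the implicits are already in scope.
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._  // for the $-style column references

// Alias both legs of the self-join and qualify every column with its
// alias, so the analyzer can tell the two sides apart.
val pos = spark.read.parquet(prefix + "POSITION.parquet")
val joined = pos.as("p1")
  .join(pos.as("p2"), $"p1.POSITION_ID0" === $"p2.POSITION_ID")
  .select($"p1.POSITION_ID")
joined.show()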
Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Mon, Jan 15, 2018 at 3:26 PM, Michael Shtelma <mshte...@gmail.com> wrote:
> Hi Jacek & Gengliang,
>
> Let's take a look at the following query:
>
> val pos = spark.read.parquet(prefix + "POSITION.parquet")
> pos.createOrReplaceTempView("POSITION")
> spark.sql("SELECT POSITION.POSITION_ID FROM POSITION POSITION JOIN
> POSITION POSITION1 ON POSITION.POSITION_ID0 = POSITION1.POSITION_ID
> ").collect()
>
> This query works for me right now on Spark 2.2.
>
> Now we can try implementing the same logic with the DataFrame API:
>
> pos.join(pos, pos("POSITION_ID0") === pos("POSITION_ID")).collect()
>
> I am getting the following error:
>
> "Join condition is missing or trivial.
> Use the CROSS JOIN syntax to allow cartesian products between these
> relations.;"
>
> I have tried using the alias function, but without success:
>
> val pos2 = pos.alias("P2")
> pos.join(pos2, pos("POSITION_ID0") === pos2("POSITION_ID")).collect()
>
> This also leads to the same error.
> Am I missing something about the usage of alias?
>
> Now let's rename the columns:
>
> val pos3 = pos.toDF(pos.columns.map(_ + "_2"): _*)
> pos.join(pos3, pos("POSITION_ID0") === pos3("POSITION_ID_2")).collect()
>
> It works!
>
> There is one more really odd thing about all this: a colleague of mine
> has managed to get the same exception ("Join condition is missing or
> trivial") with the original SQL query as well, but I think he was using
> empty tables.
>
> Thanks,
> Michael
>
>
> On Mon, Jan 15, 2018 at 11:27 AM, Gengliang Wang
> <gengliang.w...@databricks.com> wrote:
> > Hi Michael,
> >
> > You can use `EXPLAIN` to see how your query is optimized:
> > https://docs.databricks.com/spark/latest/spark-sql/language-manual/explain.html
> > I believe your query is an actual cross join, which is usually very slow
> > in execution.
> >
> > To get rid of this, you can set `spark.sql.crossJoin.enabled` to true.
> >
> >
> > On Jan 15, 2018, at 6:09 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> >
> > Hi Michael,
> >
> > -dev +user
> >
> > What's the query? How do you "fool Spark"?
> >
> > Pozdrawiam,
> > Jacek Laskowski
> > ----
> > https://about.me/JacekLaskowski
> > Mastering Spark SQL https://bit.ly/mastering-spark-sql
> > Spark Structured Streaming https://bit.ly/spark-structured-streaming
> > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> > Follow me at https://twitter.com/jaceklaskowski
> >
> > On Mon, Jan 15, 2018 at 10:23 AM, Michael Shtelma <mshte...@gmail.com>
> > wrote:
> >>
> >> Hi all,
> >>
> >> If I try joining a table with itself using join columns, I am
> >> getting the following error:
> >> "Join condition is missing or trivial. Use the CROSS JOIN syntax to
> >> allow cartesian products between these relations.;"
> >>
> >> This is not true: my join is not trivial and is not a real cross
> >> join. I am providing a join condition and expect to get maybe a couple
> >> of joined rows for each row in the original table.
> >>
> >> There is a workaround for this, which involves renaming all the columns
> >> in the source data frame and only afterwards proceeding with the join.
> >> This allows us to fool Spark.
> >>
> >> Now I am wondering whether there is a better way to get rid of this
> >> problem. I do not like the idea of renaming the columns, because it
> >> makes it really difficult to keep track of the column names in the
> >> result data frames.
> >> Is it possible to deactivate this check?
> >>
> >> Thanks,
> >> Michael
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
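P.S. On the "Is it possible to deactivate this check?" question at the
bottom of the thread: yes, via the flag Gengliang mentioned. For a running
session (Spark 2.x) that would be something like:

spark.conf.set("spark.sql.crossJoin.enabled", "true")

or for the whole application at submit time:

spark-submit --conf spark.sql.crossJoin.enabled=true ...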