Hi Michael,

You can use `EXPLAIN` to see how your query is optimized: https://docs.databricks.com/spark/latest/spark-sql/language-manual/explain.html

I believe your query is an actual cross join, which is usually very slow to execute.
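For example, here is a minimal sketch in Scala (the input path and the join column "id" are placeholders for your own data; calling `explain(true)` on a DataFrame prints the same extended plans as the SQL EXPLAIN EXTENDED statement):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("explain-demo").getOrCreate()

    // Placeholder input; substitute your own table or file
    val df = spark.read.parquet("/path/to/your/table")

    // Hypothetical self-join on a key column "id", aliasing both sides
    // so the two copies of the columns can be told apart
    val joined = df.as("a").join(df.as("b"), col("a.id") === col("b.id"))

    // extended = true prints the parsed, analyzed, optimized and physical plans
    joined.explain(true)

The optimized and physical plans will show whether Spark has actually planned a cartesian product for your query.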
To get rid of this error, you can set `spark.sql.crossJoin.enabled` to true (a minimal sketch of setting this follows after the quoted thread below).

> On 15 Jan 2018, at 6:09 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>
> Hi Michael,
>
> -dev +user
>
> What's the query? How do you "fool spark"?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
> On Mon, Jan 15, 2018 at 10:23 AM, Michael Shtelma <mshte...@gmail.com> wrote:
> Hi all,
>
> If I try joining the table with itself using join columns, I am
> getting the following error:
> "Join condition is missing or trivial. Use the CROSS JOIN syntax to
> allow cartesian products between these relations.;"
>
> This is not true: my join is not trivial and is not a real cross
> join. I am providing a join condition and expect to get maybe a couple
> of joined rows for each row in the original table.
>
> There is a workaround for this, which implies renaming all the columns
> in the source data frame and only afterwards proceeding with the join.
> This allows us to fool Spark.
>
> Now I am wondering if there is a way to get rid of this problem in a
> better way? I do not like the idea of renaming the columns because
> this makes it really difficult to keep track of the names of the
> columns in the result data frames.
> Is it possible to deactivate this check?
>
> Thanks,
> Michael
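As mentioned above, a minimal sketch of enabling the setting (the app name is just a placeholder; you can set the property either when the session is built or on an already running session):

    import org.apache.spark.sql.SparkSession

    // Option 1: set it when the session is built
    val spark = SparkSession.builder()
      .appName("cross-join-demo")
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreate()

    // Option 2: set it on an existing session at runtime
    spark.conf.set("spark.sql.crossJoin.enabled", "true")

Keep in mind this only disables the check; if the optimized plan really is a cartesian product, the join will still be expensive, as noted above.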