Re: [SQL] Two columns in output vs one when joining DataFrames?

Sean Owen Fri, 25 Mar 2016 13:27:07 -0700

Although you understand the two are semantically equivalent, the
second case involves an arbitrary condition not a join on a column per
se. In the general case there is not even a shared column between the
two being joined, so all of both are included.


On Fri, Mar 25, 2016 at 9:19 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> I've read the note about both columns included when DataFrames are
> joined, but don't think it differentiated between versions of join. Is
> this a feature or a bug that the following session shows one _1 column
> with Seq("_1") and two columns for ===?
>
> {code}
> scala> left.join(right, Seq("_1")).show
> +---+---+---+
> | _1| _2| _2|
> +---+---+---+
> |  1|  a|  a|
> |  2|  b|  b|
> +---+---+---+
>
>
> scala> left.join(right, left("_1") === right("_1")).show
> +---+---+---+---+
> | _1| _2| _1| _2|
> +---+---+---+---+
> |  1|  a|  1|  a|
> |  2|  b|  2|  b|
> +---+---+---+---+
> {code}
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: [SQL] Two columns in output vs one when joining DataFrames?

Reply via email to