[jira] [Commented] (SPARK-33871) Cannot access to column after left semi join and left join

2020-12-29 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255978#comment-17255978
 ] 

Hyukjin Kwon commented on SPARK-33871:
--

+1 for [~viirya]'s advice here.

> Cannot access to column after left semi join  and left join
> ---
>
> Key: SPARK-33871
> URL: https://issues.apache.org/jira/browse/SPARK-33871
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Evgenii Samusenko
>Priority: Minor
>
> Cannot access to column after left semi join and left join
> {code}
> val col = "c1"
> val df = Seq((1, "a"),(2, "a"),(3, "a"),(4, "a")).toDF(col, "c2")
> val df2 = Seq(1).toDF(col)
> val semiJoin = df.join(df2, df(col) === df2(col), "left_semi")
> val left = df.join(semiJoin, df(col) === semiJoin(col), "left")
> left.show
> +---+---+----+----+
> | c1| c2|  c1|  c2|
> +---+---+----+----+
> |  1|  a|   1|   a|
> |  2|  a|null|null|
> |  3|  a|null|null|
> |  4|  a|null|null|
> +---+---+----+----+
> left.select(semiJoin(col)).show
> +---+
> | c1|
> +---+
> |  1|
> |  2|
> |  3|
> |  4|
> +---+
> left.select(df(col)).show
> +---+
> | c1|
> +---+
> |  1|
> |  2|
> |  3|
> |  4|
> +---+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33871) Cannot access to column after left semi join and left join

2020-12-23 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254224#comment-17254224
 ] 

L. C. Hsieh commented on SPARK-33871:
-

For a self-join, Spark adds aliases to the ambiguous columns in the join query.
But semiJoin is itself a query over df, so its column col still refers to
df.col. As a result, left.select(semiJoin(col)) and left.select(df(col)) select
essentially the same column.

If you want to access the column col of the semi join in the left join, a
workaround is to give the semi join a relation alias and access col through
that alias.

{code}
scala> val semiJoin = df.join(df2, df(col) === df2(col), "left_semi").as("left_semi")
scala> val left = df.join(semiJoin, df(col) === semiJoin(col), "left")
scala> left.select("left_semi.c1").show
+----+
|  c1|
+----+
|   1|
|null|
|null|
|null|
+----+
{code}
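
A variation on the same workaround, as a minimal sketch rather than anything prescribed in this thread: alias both sides of the left join and refer to every column by its qualified name, so the join condition and the select do not depend on df(col) or semiJoin(col) at all. The alias names "base" and "matched" and the local SparkSession settings are assumptions made for illustration. The snippet is written to be pasted into spark-shell, where getOrCreate() should simply return the existing session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell this reuses the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-33871-sketch")
  .getOrCreate()
import spark.implicits._

// Same data as in the issue description.
val df = Seq((1, "a"), (2, "a"), (3, "a"), (4, "a")).toDF("c1", "c2")
val df2 = Seq(1).toDF("c1")

// Hypothetical alias names: "base" for the left side, "matched" for the semi join.
val base = df.as("base")
val matched = df.join(df2, df("c1") === df2("c1"), "left_semi").as("matched")

// Qualified names keep the two c1 columns apart in both the condition and the select.
val left = base.join(matched, col("base.c1") === col("matched.c1"), "left")
val picked = left.select(col("base.c1"), col("matched.c1"))
picked.show()
```

With this data, base.c1 should be populated for all four rows, while matched.c1 should be non-null only for the row where c1 is 1.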



