Hi Jerry,
I think you are running into an issue similar to SPARK-14040
https://issues.apache.org/jira/browse/SPARK-14040
One way to resolve it is to use an alias. Here is an example that I tried on
trunk, and I do not see any exceptions:
val d1 = base.where($"label" === 0).as("d1")
val d2 = base.where($"label" === 1).as("d2")
d1.join(d2, $"d1.id" === $"d2.id", "left_outer")
  .drop($"d2.label")
  .select($"d1.label")
Hope this helps some.
Best regards,
Sunitha.
> On Mar 28, 2016, at 2:34 PM, Jerry Lam <[email protected]> wrote:
>
> Hi spark users and developers,
>
> I'm using Spark 1.5.1 (I have no choice; it is what we use). I ran into some
> very unexpected behaviour when I did some join operations lately. I cannot
> post my actual code here, and the following code is contrived, but it should
> demonstrate the issue.
>
> val base = sc.parallelize((0 to 49).map(i => (i, 0)) ++ (50 to 99).map((_, 1)))
>   .toDF("id", "label")
> val d1 = base.where($"label" === 0)
> val d2 = base.where($"label" === 1)
> d1.join(d2, d1("id") === d2("id"), "left_outer")
>   .drop(d2("label"))
>   .select(d1("label"))
>
>
> The above code throws an exception saying that the column "label" is not
> found. Is there a reason for throwing an exception here, given that the
> column d1("label") was never dropped?
>
> Best Regards,
>
> Jerry