Hi Cyril,
In the case where there are no documents, it looks like there is a typo in
"addresses" (check the number of "d"s) — the query says "addresses" while the
schema says "adresses":

| scala> df.select(explode(df("addresses.id")).as("aid"), df("id"))   <== addresses
| org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among (adresses
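For what it's worth, a mismatch like this is easy to surface by listing the DataFrame's actual column names before selecting. A minimal sketch, assuming a hypothetical schema in which the field really is spelled "adresses" (one "d"), as the error message suggests:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder().master("local[*]").appName("typo-check").getOrCreate()
import spark.implicits._

// Hypothetical reconstruction of the data: the field is spelled "adresses".
case class Adresse(id: String)
case class Doc(id: String, adresses: Seq[Adresse])
val df = Seq(Doc("d1", Seq(Adresse("a1"), Adresse("a2")))).toDF()

// df.columns lists the real top-level names, which makes a misspelling obvious.
println(df.columns.mkString(", "))

// Using the spelling that actually exists in the schema resolves fine.
val rows = df.select(explode(df("adresses.id")).as("aid"), df("id")).collect()
```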
Nobody has the answer?
Another thing I've seen is that if I have no documents at all:
scala> df.select(explode(df("addresses.id")).as("aid")).collect
res27: Array[org.apache.spark.sql.Row] = Array()
Then:
scala> df.select(explode(df("addresses.id")).as("aid"), df("id"))
org.apache.spark.sql.
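One detail worth noting for the empty case: explode() silently drops any row whose array is empty or null, so an input with no array elements always collects to Array(). A minimal sketch with a hypothetical single-document schema (and, for Spark 2.2+, explode_outer as the row-preserving variant):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, explode_outer}

val spark = SparkSession.builder().master("local[*]").appName("empty-explode").getOrCreate()
import spark.implicits._

case class Adresse(id: String)
case class Doc(id: String, adresses: Seq[Adresse])

// One document whose "adresses" array is empty: explode() drops it entirely.
val df = Seq(Doc("d1", Seq.empty[Adresse])).toDF()

val dropped = df.select(explode(df("adresses.id")).as("aid")).collect()

// explode_outer (Spark 2.2+) keeps the row, with a null in place of the element.
val kept = df.select(explode_outer(df("adresses.id")).as("aid")).collect()
```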
Hi Ashish,
The issue is not related to converting an RDD to a DF; I already did that. I was
just asking whether I should do it differently.
The issue concerns the exception thrown when using array_contains with a
sql.Column instead of a literal value.
I found another way to do it using explode as follows:
df.select(explod
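That idea can be completed with hypothetical names (`df`, `other`, and `address_id` are stand-ins): since array_contains expects a literal value rather than a Column in the Spark versions of this era, exploding the array turns the membership test into an ordinary equi-join:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder().master("local[*]").appName("explode-join").getOrCreate()
import spark.implicits._

case class Address(id: String)
case class Doc(id: String, addresses: Seq[Address])

val df = Seq(
  Doc("d1", Seq(Address("a1"), Address("a2"))),
  Doc("d2", Seq(Address("a3")))
).toDF()

// Hypothetical lookup table standing in for the other side of the join.
val other = Seq(("a1", "enrichment-1")).toDF("address_id", "extra")

// array_contains(df("addresses.id"), other("address_id")) fails because
// array_contains takes a literal value, not a Column. Exploding the array
// yields one row per element, which can then be equi-joined normally.
val exploded = df.select(explode(df("addresses.id")).as("aid"), df("id"))
val enriched = exploded.join(other, exploded("aid") === other("address_id"))
```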
Is there any reason you don't want to convert it? I don't think a join between
an RDD and a DF is supported.
On Sat, May 7, 2016 at 11:41 PM, Cyril Scetbon wrote:
> Hi,
>
> I have an RDD built during a Spark Streaming job and I'd like to join it to
> a DataFrame (E/S input) to enrich it.
> It seems that I
Hi,
I have an RDD built during a Spark Streaming job and I'd like to join it to a
DataFrame (E/S input) to enrich it.
It seems that I can't join the RDD and the DF without first converting the RDD
to a DF (tell me if I'm wrong). Here are the schemas of both DFs:
scala> df
res32: org.apache.spark
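The conversion in question can be sketched as follows, under hypothetical names (`Event` stands in for the streaming records, `esDf` for the E/S input DataFrame):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("rdd-join").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the records produced during the streaming job.
case class Event(id: String, payload: String)
val events = spark.sparkContext.parallelize(Seq(Event("1", "p1"), Event("2", "p2")))

// Hypothetical stand-in for the E/S input DataFrame used for enrichment.
val esDf = Seq(("1", "extra1")).toDF("id", "enrichment")

// An RDD has no schema, so DataFrame.join() can't accept it directly;
// convert it with toDF() first, then join on the shared column.
val eventsDf = events.toDF()
val enriched = eventsDf.join(esDf, "id")
```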