One way is to split -> explode -> pivot.
These are Column and DataFrame operations.
Here are quick examples from the web:
https://www.google.com/amp/s/sparkbyexamples.com/spark/spark-split-dataframe-column-into-multiple-columns/amp/
https://www.google.com/amp/s/sparkbyexamples.com/spark/explode-spark-array-a
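
For concreteness, a minimal PySpark sketch of the split + explode steps (the column name "value" and the sample rows are assumptions taken from the question further down; the pivot step would only be needed if you want to reshape the words into columns rather than rows):

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, explode

spark = SparkSession.builder.getOrCreate()

# Assumed sample data, mirroring the RDD example in the question
df = spark.createDataFrame([("a few words",), ("ba na ba na",)], ["value"])

# split() turns the string column into an array of words;
# explode() then emits one output row per array element
words = df.select(explode(split(df["value"], " ")).alias("word"))
words.show()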
Is this the Scala syntax?
Yes, in Scala I know how to do it by converting the DataFrame to a Dataset.
How can I do this in PySpark?
Thanks
On 2022/2/9 10:24, oliver dd wrote:
df.flatMap(row => row.getAs[String]("value").split(" "))
Hi,
You can achieve your goal by:
df.flatMap(row => row.getAs[String]("value").split(" "))
—
Best Regards,
oliverdding
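
Note that a PySpark DataFrame has no flatMap method of its own, so the closest translation of the snippet above is to drop to the underlying RDD and convert back. A sketch, again assuming a string column named "value" and the sample data from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Assumed sample data matching the question below
df = spark.createDataFrame([("a few words",), ("ba na ba na",)], ["value"])

# df.rdd yields Row objects; flatMap over the split words,
# then wrap each word in a tuple so toDF() can infer a schema
words = (df.rdd
           .flatMap(lambda row: row["value"].split(" "))
           .map(lambda w: (w,))
           .toDF(["word"]))
words.show()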
Hello,
For an RDD I can apply the flatMap method:
>>> sc.parallelize(["a few words", "ba na ba na"]).flatMap(lambda x: x.split(" ")).collect()
['a', 'few', 'words', 'ba', 'na', 'ba', 'na']
But for a DataFrame, how can I flatMap it as above?
>>> df.show()
+-----------+
|      value|
+-----------+
|a few words|
|ba na ba na|
+-----------+