So it seems the only way I've found for now is recursive handling of the Row
instances directly, but to do that I have to go back to RDDs. I've put together
a simple test case demonstrating the problem:
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.scalatest.{FlatSpec, Matchers}
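For reference, here is a minimal sketch of that recursive Row rewrite, going
through the RDD API and back. The helper names (updateField, withNestedColumn)
and the assumptions (a non-empty dotted path, the replacement value keeping the
field's original data type) are illustrative, not from the actual test case:

import org.apache.spark.sql.{DataFrame, Row}

// Rebuild a Row, replacing the value found at a dotted field path.
def updateField(row: Row, path: Seq[String], f: Any => Any): Row = path match {
  case Seq(leaf) =>
    val i = row.fieldIndex(leaf)
    Row.fromSeq(row.toSeq.updated(i, f(row.get(i))))
  case head +: tail =>
    val i = row.fieldIndex(head)
    Row.fromSeq(row.toSeq.updated(i, updateField(row.getStruct(i), tail, f)))
}

// Round trip through RDD[Row], reusing the original schema (so f must
// return a value of the same data type as the field it replaces).
def withNestedColumn(df: DataFrame, path: String, f: Any => Any): DataFrame =
  df.sparkSession.createDataFrame(
    df.rdd.map(updateField(_, path.split('.').toSeq, f)),
    df.schema)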
Hi Michael,
Well, for nested structs, I saw in the tests the behaviour defined by
SPARK-12512 for the "a.b.c" handling in withColumn, and even if it's not ideal
for me, I managed to make it work anyway, like this:
> df.withColumn("a", struct(struct(myUDF(df("a.b.c." // I didn't put back the
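A fuller sketch of that workaround, with an assumed schema
a: struct<b: struct<c: string, d: int>> (the sibling field "d" is made up for
illustration): the struct has to be rebuilt, re-listing every untouched field
by hand.

import org.apache.spark.sql.functions.struct

val patched = df.withColumn("a",
  struct(
    struct(
      myUDF(df("a.b.c")).as("c"),
      df("a.b.d").as("d")  // every sibling of "c" repeated manually
    ).as("b")
    // ... and likewise every sibling of "b" inside "a"
  ))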
Is what you are looking for a withColumn that supports in-place modification
of nested columns? Or is it some other problem?
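For example, hypothetically, something like this, rewriting only the nested
leaf and leaving the rest of "a" untouched (today withColumn treats "a.b.c" as
a plain top-level column name, not a path):

val fixed = df.withColumn("a.b.c", myUDF(df("a.b.c")))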
On Wed, Sep 14, 2016 at 11:07 PM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
I tried to use the RowEncoder but got stuck along the way: the main issue
really is that even if it's possible (however tedious) to generically pattern
match Row(s) and target the nested field that you need to modify, Rows
being immutable data structures without a method like a case class's copy or
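To make the missing-copy point concrete, the closest generic equivalent is to
rebuild the whole Row from its parts, e.g. with a helper along these lines
(illustrative, and assuming the new value keeps the field's data type):

import org.apache.spark.sql.Row

// A poor man's Row "copy": rebuild the Row with one value swapped out.
def copyRow(row: Row, index: Int, value: Any): Row =
  Row.fromSeq(row.toSeq.updated(index, value))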
+1 to this request. I talked last week with a product group within IBM that
is struggling with the same issue. It's pretty common in data cleaning
applications for data in the early stages to have nested lists or sets and
inconsistent or incomplete schema information.
Fred