Hi.
Surprisingly, this code solves my problem:
    private static Column namedStruct(Column... cols) {
        List<Expression> exprs = Arrays.stream(cols)
            .flatMap(c ->
                Stream.of(
                    new Literal(UTF8String.fromString(((NamedExpression) c.expr()).name()), DataTypes.StringType),
                    c.expr()
                )
            )
            .collect(Collectors.toList());
        return new Column(new CreateNamedStruct(JavaConversions.asScalaBuffer(exprs).toSeq()));
    }
...
    DataFrame profiles = df.select(
        column("_id"),
        namedStruct(
            column("name.first").as("first_name"),
            column("name.last").as("last_name"),
            column("friends")
        ).as("profile")
    )...
I didn't dig any deeper or look for the root cause of the problem.
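For anyone unfamiliar with the argument layout that CreateNamedStruct takes, the flatMap in namedStruct builds a flat list that alternates field name and field value. Below is a minimal plain-Java sketch of just that interleaving step, with strings standing in for Catalyst expressions. The class NamedStructSketch, the Field type and the interleave method are hypothetical names used only for illustration; they are not part of Spark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NamedStructSketch {
    // Hypothetical stand-in for a named expression: an output field name
    // plus the expression it refers to (represented here as a plain string).
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Emits name1, value1, name2, value2, ... preserving the field order.
    static List<String> interleave(List<Field> fields) {
        return fields.stream()
                .flatMap(f -> Stream.of(f.name, f.value))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
                new Field("first_name", "name.first"),
                new Field("last_name", "name.last"),
                new Field("friends", "friends"));
        // prints [first_name, name.first, last_name, name.last, friends, friends]
        System.out.println(interleave(fields));
    }
}
```

In the real helper the name is wrapped in a string Literal and the value is the column's expression, but the ordering of the list is the same.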
Best regards, Alexander Chermenin.
Web: http://chermenin.ru
Mail: a...@chermenin.ru
06.05.2016, 14:19, "Alexander Chermenin" <a...@chermenin.ru>:
Hi everybody!

This code:

    DataFrame df = sqlContext.read().json(FILE_NAME);
    DataFrame profiles = df.select(
        column("_id"),
        struct(
            column("name.first").as("first_name"),
            column("name.last").as("last_name"),
            column("friends")
        ).as("profile")
    ).limit(1);

    profiles.select(column("_id"), column("profile")).toJavaRDD().collect().forEach(r -> printRowFields(r.getStruct(1))); // #1

    sqlContext.udf().register("schema", (UDF1<Row, Void>) r -> printRowFields(r), DataTypes.NullType); // #2
    profiles.select(column("_id"), callUDF("schema", column("profile"))).show();

out:

#1:
    StructField(first_name,StringType,true)
    StructField(last_name,StringType,true)
    StructField(friends,ArrayType(StructType(StructField(id,LongType,true), StructField(name,StringType,true)),true),true)

#2:
    StructField(col1,StringType,true)
    StructField(col2,StringType,true)
    StructField(i[2],ArrayType(StructType(StructField(id,LongType,true), StructField(name,StringType,true)),true),true)

But why are the field names lost in the UDF? What's wrong?

Best regards, Alex Chermenin.