Re: Casting nested columns and updated nested struct fields.

2018-11-23 Thread Colin Williams
Looks like it's been reported already. It's too bad the issue has been open
for a year, but the fix should be released in Spark 3:
https://issues.apache.org/jira/browse/SPARK-22231
On Fri, Nov 23, 2018 at 8:42 AM Colin Williams wrote:
>
> Seems like it's worthy of filing a bug against withColumn
>
> On Wed, Nov 21, 2018, 6:25 PM Colin Williams wrote:
>>
>> Hello,
>>
>> I'm currently trying to update the schema for a dataframe with nested
>> columns. I would either like to update the schema itself or cast the
>> column without having to explicitly select all the columns just to
>> cast one.
>>
>> With regard to updating the schema, it looks like I would probably need
>> to write a more complex map over the schema to find the StructFields I
>> want to update and update them. I haven't found any examples of this,
>> but it seems like there should be a simpler way to do it.
>>
>> With regard to changing the column on the dataframe itself, using e.g.
>>
>> val newDF = df.withColumn("existing.top.level.FIELD_NAME",
>>   df.col("existing.top.level.FIELD_NAME").cast(LongType))
>>
>> I end up with a new column named "existing.top.level.FIELD_NAME" at
>> the root level instead of the nested column being updated to the new
>> type. Has anybody worked out how to update a nested column's datatype,
>> and also how to update the column type in the nested schema StructType?
>> Are there any easy ways to do this, or is there a reason it is not trivial?
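
For readers on Spark 3.1 or later, a nested field can be replaced in place
with Column.withField, which avoids the root-level column described above.
The following is a minimal sketch rather than a definitive fix: the sample
JSON row, the object name CastNestedWithField and the helper castNested are
made up for illustration, and only the column path comes from the original
message.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

object CastNestedWithField {
  // Replace existing.top.level.FIELD_NAME with a LongType version of itself.
  // The column being replaced is the top-level struct "existing"; withField
  // takes the dotted path of the field inside that struct.
  def castNested(df: DataFrame): DataFrame =
    df.withColumn(
      "existing",
      col("existing").withField(
        "top.level.FIELD_NAME",
        col("existing.top.level.FIELD_NAME").cast(LongType)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cast-nested-with-field")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample row; JSON inference reads "42" as a string.
    val df = spark.read.json(
      Seq("""{"existing": {"top": {"level": {"FIELD_NAME": "42", "other": "x"}}}}""").toDS())

    castNested(df).printSchema()
    spark.stop()
  }
}
```

Passing the dotted name to withColumn itself still creates a new top-level
column, so the struct ("existing") has to be the column that is replaced.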




Re: Casting nested columns and updated nested struct fields.

2018-11-23 Thread Colin Williams
Seems like it's worthy of filing a bug against withColumn

On Wed, Nov 21, 2018, 6:25 PM Colin Williams <colin.williams.seat...@gmail.com> wrote:

> Hello,
>
> I'm currently trying to update the schema for a dataframe with nested
> columns. I would either like to update the schema itself or cast the
> column without having to explicitly select all the columns just to
> cast one.
>
> With regard to updating the schema, it looks like I would probably need
> to write a more complex map over the schema to find the StructFields I
> want to update and update them. I haven't found any examples of this,
> but it seems like there should be a simpler way to do it.
>
> With regard to changing the column on the dataframe itself, using e.g.
>
> val newDF = df.withColumn("existing.top.level.FIELD_NAME",
>   df.col("existing.top.level.FIELD_NAME").cast(LongType))
>
> I end up with a new column named "existing.top.level.FIELD_NAME" at
> the root level instead of the nested column being updated to the new
> type. Has anybody worked out how to update a nested column's datatype,
> and also how to update the column type in the nested schema StructType?
> Are there any easy ways to do this, or is there a reason it is not trivial?
>
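
The "map on the schema" idea from the question can be sketched without any
Spark 3 API: rewrite the StructType recursively, then cast the top-level
struct column to the rewritten type. This relies on Spark accepting a
struct-to-struct cast when the field count matches and every field is
castable. The names RetypeNestedField and retype are hypothetical, and the
sample row is invented for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object RetypeNestedField {
  // Return a copy of `dt` in which the field reached by `path` has `newType`.
  // Fields that are not on the path are kept unchanged.
  def retype(dt: DataType, path: List[String], newType: DataType): DataType =
    (dt, path) match {
      case (_, Nil) => newType
      case (st: StructType, head :: tail) =>
        StructType(st.fields.map { f =>
          if (f.name == head)
            // A failed cast yields null, so the retyped leaf is marked nullable.
            f.copy(
              dataType = retype(f.dataType, tail, newType),
              nullable = f.nullable || tail.isEmpty)
          else f
        })
      // The path runs through a non-struct type: leave it as it is.
      case (other, _) => other
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("retype-nested-field")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample row; JSON inference reads "42" as a string.
    val df = spark.read.json(
      Seq("""{"existing": {"top": {"level": {"FIELD_NAME": "42", "other": "x"}}}}""").toDS())

    // Rewrite only the type of the "existing" struct, then cast the whole struct.
    val target = retype(
      df.schema("existing").dataType, List("top", "level", "FIELD_NAME"), LongType)
    val newDF = df.withColumn("existing", col("existing").cast(target))

    newDF.printSchema()
    spark.stop()
  }
}
```

Only the top-level column is named in withColumn, so no extra root-level
column appears, and the other fields of the struct do not have to be selected
one by one.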