Hi Michael,

thank you for your reply. I had tried explode before but didn't have much
success, and I couldn't find an example where it was used the way you
suggested.

I applied your suggestion and it works like a charm!
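
In case it helps anyone else who hits this: the root cause seems to be that
array columns come back from Row.getAs as a Seq (a WrappedArray) rather than
a JVM array, and getAs[T] is just a cast. A rough sketch of the failure with
plain Scala collections, no Spark needed (the value below is a made-up
stand-in for what getAs returns):

```scala
// Stand-in for the runtime value Row.getAs hands back for an array column:
// a Seq (WrappedArray in Scala 2.12, ArraySeq in 2.13), not a JVM array.
val values: Any = Array("a", "b").toSeq

// Casting to Seq succeeds, because the runtime value really is a Seq:
val asSeq = values.asInstanceOf[Seq[String]]

// Casting to Array throws the same ClassCastException as in my stack trace:
val arrayCastFailed =
  try { values.asInstanceOf[Array[String]]; false }
  catch { case _: ClassCastException => true }

println(s"asSeq = $asSeq, array cast failed = $arrayCastFailed")
```

So provider.getAs[Seq[Row]]("contract") works where getAs[Array[Row]] blows
up at runtime.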

Thanks,
Nuno


Nuno Carvalho
Software Engineer - Big Data & Analytics

On 19 October 2015 at 21:55, Michael Armbrust <mich...@databricks.com>
wrote:

> Quickfix is probably to use Seq[Row] instead of Array (the types that are
> returned are documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#data-types)
>
> Really though you probably want to be using explode.  Perhaps something
> like this would help?
>
> import org.apache.spark.sql.functions._
> dataFrame.select(explode($"provider.contract").as("contract"))
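
(A note for anyone reading this in the archive: explode emits one output row
per element of the array column. On plain Scala collections the same
reshaping is just a flatMap; Provider and Contract below are made-up
stand-ins for the schema in my original message.)

```scala
// Hypothetical stand-ins for the provider/contract schema in the question:
case class Contract(contractId: String, countryCode: String)
case class Provider(accountId: String, contracts: Seq[Contract])

val providers = Seq(
  Provider("acct-1", Seq(Contract("c-1", "GB"), Contract("c-2", "PT"))),
  Provider("acct-2", Seq(Contract("c-3", "US")))
)

// explode($"provider.contract") produces one row per array element; on
// plain collections that is a flatMap over the nested Seq:
val exploded = providers.flatMap(p => p.contracts.map(c => (p.accountId, c)))

exploded.foreach(println)
```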
>
> On Mon, Oct 19, 2015 at 8:08 AM, nunomrc <nuno.carva...@rightster.com>
> wrote:
>
>> Hi, I am fairly new to Spark and I am trying to flatten the following
>> structure:
>>
>>  |-- provider: struct (nullable = true)
>>  |    |-- accountId: string (nullable = true)
>>  |    |-- contract: array (nullable = true)
>>
>> And then provider is:
>> root
>>  |-- accountId: string (nullable = true)
>>  |-- contract: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- details: struct (nullable = true)
>>  |    |    |    |-- contractId: string (nullable = true)
>>  |    |    |    |-- countryCode: string (nullable = true)
>>  |    |    |    |-- endDate: string (nullable = true)
>>  |    |    |    |-- noticePeriod: long (nullable = true)
>>  |    |    |    |-- startDate: string (nullable = true)
>>  |    |    |-- endDate: string (nullable = true)
>>  |    |    |-- startDate: string (nullable = true)
>>  |    |    |-- other: struct (nullable = true)
>>  |    |    |    |-- type: string (nullable = true)
>>  |    |    |    |-- values: array (nullable = true)
>>  |    |    |    |    |-- element: struct (containsNull = true)
>>  |    |    |    |    |    |-- key: string (nullable = true)
>>  |    |    |    |    |    |-- value: long (nullable = true)
>>
>>
>> I am trying the following:
>>
>> dataFrame.map { case Row(....., provider: Row, .....) =>
>>    val list = provider.getAs[Array[Row]]("contract")
>>
>> At this point, I get the following exception:
>> [info]   org.apache.spark.SparkException: Job aborted due to stage
>> failure:
>> Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in
>> stage 4.0 (TID 9, localhost): java.lang.ClassCastException:
>> scala.collection.mutable.WrappedArray$ofRef cannot be cast to
>> [Lorg.apache.spark.sql.Row;
>> [info]  at com.mycode.Deal$$anonfun$flattenDeals$1.apply(Deal.scala:62)
>>
>> I tried many variations of this and tried to get the actual data type of
>> the array elements, without any success.
>> This kind of method for flattening JSON data structures was working for me
>> with previous versions of Spark, but after upgrading from 1.4.1 to 1.5.1 I
>> started getting this error.
>>
>> What am I doing wrong?
>> Any help would be appreciated.
>>
>> Thanks,
>> Nuno
>>
>>
>
