A quick fix is probably to use Seq[Row] instead of Array[Row] (the Scala types returned for each SQL data type are documented here: http://spark.apache.org/docs/latest/sql-programming-guide.html#data-types)
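To see why the Seq[Row] change matters, here is a minimal plain-Scala sketch (no Spark required; `CastDemo` is a made-up name): Spark stores an array column in a Row as a Scala Seq (a WrappedArray), not a JVM Array, so a cast to Seq succeeds where a cast to Array throws exactly the ClassCastException from the stack trace below.

```scala
// Minimal sketch (no Spark needed) of why the cast in the stack trace fails.
// Spark returns array columns as scala.collection.mutable.WrappedArray,
// which is a Seq but NOT a JVM array, so a cast to Array[...] blows up.
object CastDemo {
  def main(args: Array[String]): Unit = {
    // Array.toSeq wraps the array in a Seq (a WrappedArray on Scala 2.12),
    // which is roughly what Spark stores in a Row for an array column.
    val fromRow: Any = Array("a", "b").toSeq

    // getAs[Seq[Row]]-style access works: the value really is a Seq.
    val ok = fromRow.asInstanceOf[Seq[String]]
    println(ok.mkString(","))   // prints a,b

    // getAs[Array[Row]]-style access fails with ClassCastException,
    // exactly like the error reported in the question.
    val threw =
      try { fromRow.asInstanceOf[Array[String]]; false }
      catch { case _: ClassCastException => true }
    println(threw)              // prints true
  }
}
```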
Really though, you probably want to be using explode. Perhaps something like this would help?

import org.apache.spark.sql.functions._
dataFrame.select(explode($"provider.contract").as("contract"))

On Mon, Oct 19, 2015 at 8:08 AM, nunomrc <nuno.carva...@rightster.com> wrote:
> Hi, I am fairly new to Spark and I am trying to flatten the following
> structure:
>
>  |-- provider: struct (nullable = true)
>  |    |-- accountId: string (nullable = true)
>  |    |-- contract: array (nullable = true)
>
> And then provider is:
>
> root
>  |-- accountId: string (nullable = true)
>  |-- contract: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- details: struct (nullable = true)
>  |    |    |    |-- contractId: string (nullable = true)
>  |    |    |    |-- countryCode: string (nullable = true)
>  |    |    |    |-- endDate: string (nullable = true)
>  |    |    |    |-- noticePeriod: long (nullable = true)
>  |    |    |    |-- startDate: string (nullable = true)
>  |    |    |-- endDate: string (nullable = true)
>  |    |    |-- startDate: string (nullable = true)
>  |    |    |-- other: struct (nullable = true)
>  |    |    |    |-- type: string (nullable = true)
>  |    |    |    |-- values: array (nullable = true)
>  |    |    |    |    |-- element: struct (containsNull = true)
>  |    |    |    |    |    |-- key: string (nullable = true)
>  |    |    |    |    |    |-- value: long (nullable = true)
>
> I am trying the following:
>
> dataFrame.map { case Row(....., provider: Row, .....) =>
>   val list = provider.getAs[Array[Row]]("contract")
>
> At this point, I get the following exception:
>
> [info] org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in
> stage 4.0 (TID 9, localhost): java.lang.ClassCastException:
> scala.collection.mutable.WrappedArray$ofRef cannot be cast to
> [Lorg.apache.spark.sql.Row;
> [info] at com.mycode.Deal$$anonfun$flattenDeals$1.apply(Deal.scala:62)
>
> I tried many different variations of this and tried to get the actual data
> type of the elements of the array, without any success.
> This kind of method for flattening JSON data structures was working for me
> with previous versions of Spark, but I am now trying to upgrade from 1.4.1
> to 1.5.1 and started getting this error.
>
> What am I doing wrong?
> Any help would be appreciated.
>
> Thanks,
> Nuno
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/flattening-a-JSON-data-structure-tp25120.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
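Putting the two suggestions together, a hedged sketch of the full flattening, assuming a spark-shell session (so `spark` and `$`-interpolation are in scope) and the schema posted above; the field names are taken from that printSchema output, and `flat` is a made-up variable name:

```scala
// Sketch assuming spark-shell and the schema from the question.
import org.apache.spark.sql.functions._
import spark.implicits._

// Option 1: explode the contract array into one row per element, then
// pull the nested struct fields out with dot paths.
val flat = dataFrame
  .select($"provider.accountId", explode($"provider.contract").as("contract"))
  .select(
    $"accountId",
    $"contract.details.contractId",
    $"contract.details.countryCode",
    $"contract.startDate",
    $"contract.endDate")

// Option 2: if you really need the array inside a closure, ask for a Seq,
// not an Array, matching the quick fix above:
//   row.getAs[Seq[Row]]("contract")
```

explode keeps you inside the DataFrame API, so Spark resolves the element type for you instead of your having to guess the runtime collection class in a map closure.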