Hi Michael, thank you for your reply. I tried to use explode before, but didn't have much success, and I couldn't find an example where it is used the way you suggested.
I applied your suggestion and it works like a charm!

Thanks,
Nuno

Nuno Carvalho
Software Engineer - Big Data & Analytics

On 19 October 2015 at 21:55, Michael Armbrust <mich...@databricks.com> wrote:

> A quick fix is probably to use Seq[Row] instead of Array (the types that
> are returned are documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#data-types).
>
> Really, though, you probably want to be using explode. Perhaps something
> like this would help?
>
> import org.apache.spark.sql.functions._
> dataFrame.select(explode($"provider.contract").as("contract"))
>
> On Mon, Oct 19, 2015 at 8:08 AM, nunomrc <nuno.carva...@rightster.com>
> wrote:
>
>> Hi, I am fairly new to Spark and I am trying to flatten the following
>> structure:
>>
>>  |-- provider: struct (nullable = true)
>>  |    |-- accountId: string (nullable = true)
>>  |    |-- contract: array (nullable = true)
>>
>> And then provider is:
>>
>> root
>>  |-- accountId: string (nullable = true)
>>  |-- contract: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- details: struct (nullable = true)
>>  |    |    |    |-- contractId: string (nullable = true)
>>  |    |    |    |-- countryCode: string (nullable = true)
>>  |    |    |    |-- endDate: string (nullable = true)
>>  |    |    |    |-- noticePeriod: long (nullable = true)
>>  |    |    |    |-- startDate: string (nullable = true)
>>  |    |    |-- endDate: string (nullable = true)
>>  |    |    |-- startDate: string (nullable = true)
>>  |    |    |-- other: struct (nullable = true)
>>  |    |    |    |-- type: string (nullable = true)
>>  |    |    |    |-- values: array (nullable = true)
>>  |    |    |    |    |-- element: struct (containsNull = true)
>>  |    |    |    |    |    |-- key: string (nullable = true)
>>  |    |    |    |    |    |-- value: long (nullable = true)
>>
>> I am trying the following:
>>
>> dataFrame.map { case Row(....., provider: Row, .....)
>>   =>
>>     val list = provider.getAs[Array[Row]]("contract")
>>
>> At this point, I get the following exception:
>>
>> [info] org.apache.spark.SparkException: Job aborted due to stage failure:
>> Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in
>> stage 4.0 (TID 9, localhost): java.lang.ClassCastException:
>> scala.collection.mutable.WrappedArray$ofRef cannot be cast to
>> [Lorg.apache.spark.sql.Row;
>> [info] at com.mycode.Deal$$anonfun$flattenDeals$1.apply(Deal.scala:62)
>>
>> I tried many different variations of this and tried to get the actual
>> data type of the elements of the array, without any success.
>> This kind of method for flattening JSON data structures was working for
>> me with previous versions of Spark, but I am now trying to upgrade from
>> 1.4.1 to 1.5.1 and started getting this error.
>>
>> What am I doing wrong?
>> Any help would be appreciated.
>>
>> Thanks,
>> Nuno
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/flattening-a-JSON-data-structure-tp25120.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
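
[Editor's note, for readers hitting the same ClassCastException: Spark returns an array column as a scala.collection.Seq (backed by a WrappedArray), which is not a JVM array, so casting it to Array[Row] fails at runtime while getAs[Seq[Row]] works. A minimal plain-Scala sketch of the same cast behaviour, with no Spark required — the List of Strings below is a hypothetical stand-in for the Seq of Rows that Row.getAs hands back:]

```scala
object CastDemo {
  // Stand-in for what Spark returns for an array column: a Seq, typed as
  // Any here, just as the value is untyped before Row.getAs casts it.
  val column: Any = List("a", "b")

  // Casting to a JVM array fails: the runtime value is a Seq, not an Array.
  val arrayCastFails: Boolean =
    try { column.asInstanceOf[Array[String]]; false }
    catch { case _: ClassCastException => true }

  // Casting to Seq succeeds, which is why getAs[Seq[Row]] is the fix.
  val seq: Seq[String] = column.asInstanceOf[Seq[String]]

  def main(args: Array[String]): Unit = {
    println(arrayCastFails)    // prints true
    println(seq.mkString(",")) // prints a,b
  }
}
```

The same reasoning applies to the original code: replacing `provider.getAs[Array[Row]]("contract")` with `provider.getAs[Seq[Row]]("contract")` avoids the cast to a JVM array entirely.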