https://issues.apache.org/jira/browse/SPARK-22228
On Sun, Oct 8, 2017 at 11:58 AM, kant kodali <kanth...@gmail.com> wrote: > I have the following so far > > private StructType getSchema() { > return new StructType() > .add("name", StringType) > .add("address", StringType) > .add("docs", StringType); > } > > > ds.select(explode_outer(from_json(ds.col("value"), > ArrayType.apply(getSchema()))).as("result")).selectExpr("result.*"); > > This didn't quite work for me so just to clarify I have Json array of > documents as my input string > and I am trying to keep the values of my name, address, docs columns as a > string as well except > my input array string is flattened out by explode function. > Any suggestions will be great > Thanks! > > > On Sat, Oct 7, 2017 at 10:00 AM, Jules Damji <dmat...@comcast.net> wrote: > >> You might find these blogs helpful to parse & extract data from complex >> structures: >> >> https://databricks.com/blog/2017/06/27/4-sql-high-order-lamb >> da-functions-examine-complex-structured-data-databricks.html >> >> https://databricks.com/blog/2017/06/13/five-spark-sql-utilit >> y-functions-extract-explore-complex-data-types.html >> >> Cheers >> Jules >> >> >> Sent from my iPhone >> Pardon the dumb thumb typos :) >> >> On Oct 7, 2017, at 12:30 AM, kant kodali <kanth...@gmail.com> wrote: >> >> I have a Dataset<String> ds which consists of json rows. >> >> *Sample Json Row (This is just an example of one row in the dataset)* >> >> [ >> {"name": "foo", "address": {"state": "CA", "country": "USA"}, >> "docs":[{"subject": "english", "year": 2016}]} >> {"name": "bar", "address": {"state": "OH", "country": "USA"}, >> "docs":[{"subject": "math", "year": 2017}]} >> >> ] >> >> ds.printSchema() >> >> root >> |-- value: string (nullable = true) >> >> Now I want to convert into the following dataset using Spark 2.2.0 >> >> name | address | docs >> ---------------------------------------------------------------------------------- >> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": >> 2016}] >> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": >> 2017}] >> >> Preferably Java but Scala is also fine as long as there are functions >> available in Java API >> >> >