What you suggested works in Spark 2.3, but in the version that I am using
(2.1) it fails with the following type mismatch error:


found   : org.apache.spark.sql.types.ArrayType

required: org.apache.spark.sql.types.StructType

       ds.select(from_json($"news", schema) as "news_parsed").show(false)


Is it feasible to backport a function from 2.3 to 2.1? What other options
do I have?


Thank you.



From: Magnus Nilsson <ma...@kth.se> 
Sent: Saturday, February 23, 2019 3:43 PM
Cc: user@spark.apache.org
Subject: Re: How can I parse an "unnamed" json array present in a column?


Use spark.sql.types.ArrayType instead of a Scala Array as the root type when 
you define the schema and it will work.
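
A minimal sketch of that suggestion, assuming a Spark version whose from_json accepts an ArrayType as the root schema (elsewhere in this thread 2.3 is reported to work and 2.1 is not). The name `newsEntry` is illustrative only; the field names come from the original post.

```scala
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

// Element type: one struct per entry of the JSON array.
val newsEntry = new StructType()
  .add("source", StringType)
  .add("name", StringType)

// Root type: a Spark SQL ArrayType, not a Scala Array.
val schema = ArrayType(newsEntry)

// In spark-shell, with `ds` defined as in the original post:
// ds.select(from_json($"news", schema) as "news_parsed").show(false)
```

The only change from the original attempt is the root type: `ArrayType(...)` is a Spark SQL data type, whereas `Array(...)` builds a Scala collection of schemas, which from_json cannot accept.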






On Fri, Feb 22, 2019 at 11:15 PM Yeikel <em...@yeikel.com> wrote:

I have an "unnamed" json array stored in a *column*.  

The format is the following : 

column name : news

Data : 

[
  {
    "source": "source1",
    "name": "News site1"
  },
  {
    "source": "source2",
    "name": "News site2"
  }
]

Ideally, I'd like to parse it as:

news ARRAY<struct<source:string, name:string>>

I've tried the following : 

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types._;

val entry = scala.io.Source.fromFile("1.txt").mkString

val ds = Seq(entry).toDF("news")

val schema = Array(new StructType().add("name", StringType).add("source", StringType))

ds.select(from_json($"news", schema) as "news_parsed").show(false)

But this is not allowed : 

found   : Array[org.apache.spark.sql.types.StructType]
required: org.apache.spark.sql.types.StructType

I also tried passing the following schema : 

val schema = StructType(new StructType().add("name",
StringType).add("source", StringType))

But this only parsed the first record : 

+--------------------+
|news_parsed         |
+--------------------+
|[News site1,source1]|
+--------------------+

I am aware that if I fix the JSON like this : 

{
  "news": [
    {
      "source": "source1",
      "name": "News site1"
    },
    {
      "source": "source2",
      "name": "News site2"
    }
  ]
}

The parsing works as expected, but I would like to avoid doing that if possible.
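
One way the same wrapping could be done inside the query, leaving the stored data untouched, is sketched below. It assumes from_json with a StructType root behaves on 2.1 as it does in later releases; the names `raw`, `entry`, `wrapperSchema` and `wrapped`, and the inline sample string standing in for 1.txt, are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, from_json, lit}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

// Local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("wrap-json-array")
  .getOrCreate()
import spark.implicits._

// Sample input standing in for the contents of 1.txt.
val raw = """[{"source":"source1","name":"News site1"},{"source":"source2","name":"News site2"}]"""
val ds = Seq(raw).toDF("news")

// StructType root with the array as a named field.
val entry = new StructType().add("source", StringType).add("name", StringType)
val wrapperSchema = new StructType().add("news", ArrayType(entry))

// Build {"news": <original array>} as a column expression.
val wrapped = concat(lit("""{"news":"""), col("news"), lit("}"))

// Parse the wrapped string, then pull the array back out of the wrapper struct.
val parsed = ds.select(from_json(wrapped, wrapperSchema).getField("news") as "news_parsed")
parsed.show(false)
```

The string concatenation assumes every row holds a well-formed JSON array; rows that do not would be expected to parse to null rather than fail.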

Another approach that I can think of is to map over it and parse it using a
third-party library like Gson, but I am not sure if this is any better
than fixing the JSON beforehand.

I am running Spark 2.1

Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
