Hello,

You're absolutely right: the syntax you're using returns the json4s
value objects (JValue), not native types like Int or Long. Fix that
problem and then everything else (filters) will work as you expect.
This is a short snippet of a larger example: [1]

    val lines = sc.textFile("likes.json")
    val user_interest = lines.map(line => {
      // Parse each line of JSON; the overall map yields an RDD[JValue]
      parse(line)
    }).map(json => {
      // Extract the values we need to populate the UserInterest class
      implicit lazy val formats = org.json4s.DefaultFormats
      val name = (json \ "name").extract[String]
      val location_x = (json \ "location" \ "x").extract[Double]
      val location_y = (json \ "location" \ "y").extract[Double]
      val likes = (json \ "likes").extract[Seq[String]]
        .map(_.toLowerCase()).mkString(";")
      UserInterest(name, location_x, location_y, likes)
    })


The key parts are "implicit lazy val formats = org.json4s.DefaultFormats"
being defined before you work with the JSON, and "(json \ "location" \
"x").extract[Double]" to extract the parts you need as native types.
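Applied to your example, once the values are extracted as native Ints the
filter works with ordinary comparisons. A minimal self-contained sketch
(using a local Seq of JSON strings rather than an RDD, and your field
names; the object/method names here are just for illustration):

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object FilterExample {
  implicit val formats: Formats = DefaultFormats

  // Extract (age, typeId) as native Ints so comparisons work downstream
  def extractPairs(lines: Seq[String]): Seq[(Int, Int)] =
    lines.map(l => parse(l)).map { json =>
      val age = (json \ "person" \ "age").extract[Int]
      val typeId = (json \ "Action" \ "Content" \ "TypeID").extract[Int]
      (age, typeId)
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      """{"person":{"age":12},"Action":{"Content":{"TypeID":5}}}""",
      """{"person":{"age":32},"Action":{"Content":{"TypeID":6}}}""",
      """{"person":{"age":40},"Action":{"Content":{"TypeID":7}}}"""
    )
    // Plain Ints now, so filtering on age just works
    println(extractPairs(lines).filter(_._1 > 20))
  }
}
```

With an RDD the same two map steps and the filter apply unchanged.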

One thing to be wary of: if your JSON is not consistent, i.e. fields are
not always set, then the "extract[Double]" method will raise exceptions.
In that case you may wish to use an alternate way to pull out the values
as a String and process them yourself, e.g.

    val id = compact(render(json \ "facebook" \ "id"))
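Another option for inconsistent JSON is extractOpt, which returns an
Option instead of throwing when the field is absent. A minimal sketch
(the object/method names here are just for illustration):

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object OptionalFieldExample {
  implicit val formats: Formats = DefaultFormats

  // extractOpt yields None for a missing field rather than an exception
  def ageOf(line: String): Option[Double] =
    (parse(line) \ "person" \ "age").extractOpt[Double]

  def main(args: Array[String]): Unit = {
    println(ageOf("""{"person":{"age":42}}""")) // Some(42.0)
    println(ageOf("""{"person":{}}"""))         // None
  }
}
```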

Good luck playing with JSON and Spark!  :o)

Best,

MC


[1] UserInterestsExample.scala
https://gist.github.com/cotdp/b471cfff183b59d65ae1





On 11 June 2014 23:26, SK <skrishna...@gmail.com> wrote:

> I have the following piece of code that parses a json file and extracts the
> age and TypeID
>
> val p = sc.textFile(log_file)
>                    .map(line => { parse(line) })
>                    .map(json =>
>                       {  val v1 = json \ "person" \ "age"
>                          val v2 = json \ "Action" \ "Content" \ "TypeID"
>                          (v1, v2)
>                       }
>                     )
>
> p.foreach(r => println(r))
>
> The result is:
>
> (JInt(12),JInt(5))
> (JInt(32),JInt(6))
> (JInt(40),JInt(7))
>
> 1) How can I extract the values (i.e. without the JInt) ? I tried returning
> (v1.toInt, v2.toInt) from the map but got a compilation error stating that
> toInt is not a valid operation.
>
> 2) I would also like to know how  I can filter the above tuples based on
> the
> age values. For e.g. I added the following after the second map operation:
>
>   p.filter(tup => tup._1 > 20)
>
> I got a compilation errror: value > is not a member of org.json4s.JValue
>
> Thanks for your help.
>
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/json-parsing-with-json4s-tp7430.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
