You can use sc.wholeTextFiles to read each file as a complete String, though it 
requires each file to be small enough for one task to process.
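
A minimal sketch of that approach, assuming each input file holds exactly one
JSON object and that json4s-jackson is on the classpath. The object name
WholeFileJson and the use of args(0) as an input directory or glob are
illustrative assumptions, not something from the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Hypothetical driver object; the name is only for illustration.
object WholeFileJson {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WholeFileJson")
    val sc = new SparkContext(conf)

    // wholeTextFiles yields one (path, content) pair per file, so a JSON
    // document that spans many lines arrives as a single String.
    val files = sc.wholeTextFiles(args(0))

    val names = files.map { case (_, content) =>
      // Formats is created inside the closure so nothing extra is captured
      // from the driver.
      implicit val formats: Formats = DefaultFormats
      (parse(content) \ "name").extract[String]
    }

    names.collect().foreach(println)
    sc.stop()
  }
}
```

Because every file is materialized as one String, this only suits files that
comfortably fit in the memory of a single task.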

On August 26, 2014 at 4:01:45 PM, Chris Fregly (ch...@fregly.com) wrote:

i've seen this done using mapPartitions() where each partition represents a 
single, multi-line json file.  you can rip through each partition (json file) 
and parse the json doc as a whole.

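this assumes you use sc.textFile("<path>/*.json") or equivalent to load 
multiple files at once.  each json file will be its own partition as long as 
it is small enough to fit in a single input split; larger files get split 
across several partitions.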

not sure if this satisfies your use case, but might be a good starting point.

-chris
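
A rough sketch of the mapPartitions() idea, assuming every file matched by the
path holds exactly one JSON object and is small enough that Spark keeps it in
a single partition. textFile can split a larger file (or a lone file) across
several partitions, in which case the parse below would fail. The object name
PartitionJson and the json4s-jackson import are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Hypothetical driver object; the name is only for illustration.
object PartitionJson {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PartitionJson")
    val sc = new SparkContext(conf)

    // e.g. args(0) = "<path>/*.json"
    val lines = sc.textFile(args(0))

    val names = lines.mapPartitions { iter =>
      implicit val formats: Formats = DefaultFormats
      // Glue the partition's lines back together into one document.
      val doc = iter.mkString("\n")
      if (doc.trim.isEmpty) Iterator.empty // skip empty partitions
      else Iterator((parse(doc) \ "name").extract[String])
    }

    names.collect().foreach(println)
    sc.stop()
  }
}
```

If the one-file-per-partition assumption cannot be guaranteed, reading whole
files at once is the safer route.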


On Mon, Jul 14, 2014 at 2:55 PM, SK <skrishna...@gmail.com> wrote:
Hi,

I have a json file where the definition of each object spans multiple lines.
An example of one object definition appears below.

 {
    "name": "16287e9cdf",
    "width": 500,
    "height": 325,
    "width": 1024,
    "height": 665,
    "obj": [
      {
        "x": 395.08,
        "y": 82.09,
        "w": 185.48677,
        "h": 185.48677,
        "min": 50,
        "max": 59,
        "attr1": 2,
        "attr2": 68,
        "attr3": 8
      },
      {
        "x": 519.1,
        "y": 225.8,
        "w": 170,
        "h": 171,
        "min": 20,
        "max": 29,
        "attr1": 7,
        "attr2": 93,
        "attr3": 10
      }
   ]
}

I used the following Spark code to parse the file. However, the parsing
fails, I think because it expects one JSON object definition per line. I
can try to preprocess the input file to remove the newlines, but I would
like to know whether it is possible to parse a JSON object definition that
spans multiple lines directly in Spark.

val inp = sc.textFile(args(0))
val res = inp.map(line => { parse(line) })
             .map(json => {
                implicit lazy val formats = org.json4s.DefaultFormats
                val image = (json \ "name").extract[String]
                image
             })


Thanks for your help.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Parsing-Json-object-definition-spanning-multiple-lines-tp9659.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
