You can use sc.wholeTextFiles to read each file as a single String (it returns an RDD of (path, content) pairs), though each file has to be small enough to fit in the memory of the one task that processes it.
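
A rough sketch of what that could look like, in case it helps. This assumes the json4s Jackson backend is on the classpath (the original snippet does not say which backend it uses), and the object name MultiLineJson and the helper imageName are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Hypothetical names: MultiLineJson and imageName are not from the original post.
object MultiLineJson {

  // Parse one complete JSON document and pull out its "name" field.
  def imageName(content: String): String = {
    implicit val formats: Formats = DefaultFormats
    (parse(content) \ "name").extract[String]
  }

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("MultiLineJson"))

    // wholeTextFiles yields one (path, content) pair per file, so an
    // object that spans several lines is parsed as a whole document.
    val names = sc.wholeTextFiles(args(0)).map {
      case (path, content) => (path, imageName(content))
    }

    names.collect().foreach { case (path, name) => println(s"$path -> $name") }
    sc.stop()
  }
}
```

This only works when each file holds exactly one JSON document. A file containing several objects would need to be split into documents before parsing.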

On August 26, 2014 at 4:01:45 PM, Chris Fregly (ch...@fregly.com) wrote:

i've seen this done using mapPartitions() where each partition represents a single, multi-line json file. you can rip through each partition (json file) and parse the json doc as a whole.

this assumes you use sc.textFile("<path>/*.json") or equivalent to load in multiple files at once. each json file will be a partition.

not sure if this satisfies your use case, but might be a good starting point.

-chris

On Mon, Jul 14, 2014 at 2:55 PM, SK <skrishna...@gmail.com> wrote:

Hi,

I have a json file where the definition of each object spans multiple lines. An example of one object definition appears below.

    {
      "name": "16287e9cdf",
      "width": 500,
      "height": 325,
      "width": 1024,
      "height": 665,
      "obj": [
        {
          "x": 395.08,
          "y": 82.09,
          "w": 185.48677,
          "h": 185.48677,
          "min": 50,
          "max": 59,
          "attr1": 2,
          "attr2": 68,
          "attr3": 8
        },
        {
          "x": 519.1,
          "y": 225.8,
          "w": 170,
          "h": 171,
          "min": 20,
          "max": 29,
          "attr1": 7,
          "attr2": 93,
          "attr3": 10
        }
      ]
    }

I used the following Spark code to parse the file. However, the parsing is failing because I think it expects one Json object definition per line. I can try to preprocess the input file to remove the new lines, but I would like to know if it is possible to parse a Json object definition that spans multiple lines, directly in Spark.

    val inp = sc.textFile(args(0))
    val res = inp.map(line => { parse(line) })
                 .map(json => {
                    implicit lazy val formats = org.json4s.DefaultFormats
                    val image = (json \ "name").extract[String]
                 })

Thanks for your help.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Parsing-Json-object-definition-spanning-multiple-lines-tp9659.html