Nick,

If you don't want to use Avro, Thrift, Protobuf, etc., use a library like lift-json: write each JSON object out as a single-line string, read the file back in as a text file, and deserialize each line with lift-json. You can use standard separators like comma, tab, etc.
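A minimal sketch of that approach, assuming one JSON object per line; the file name people.jsonl, the Person case class, and its fields are made up for illustration:

    import net.liftweb.json._
    import org.apache.spark.SparkContext

    case class Person(name: String, age: Int)

    object JsonLinesExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[*]", "json-lines")

        // One self-contained JSON object per line, e.g. {"name":"Alice","age":30}
        val people = sc.textFile("people.jsonl").map { line =>
          // Formats is created inside the closure so nothing
          // non-serializable is captured from the driver.
          implicit val formats: Formats = DefaultFormats
          parse(line).extract[Person]
        }

        people.take(5).foreach(println)
        sc.stop()
      }
    }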
I am sure there will be better ways to do it, but I am new to Spark as well...

Deb

On Feb 23, 2014 9:10 PM, "nicholas.chammas" <nicholas.cham...@gmail.com> wrote:

> I'm new to this field, but it seems like most "Big Data" examples --
> Spark's included -- begin with reading in flat lines of text from a file.
>
> How would I go about having Spark turn a large JSON file into an RDD?
>
> So the file would just be a text file that looks like this:
>
> [{...}, {...}, ...]
>
> where the individual JSON objects are arbitrarily complex (i.e. not
> necessarily flat) and may or may not be on separate lines.
>
> Basically, I'm guessing Spark would need to parse the JSON since it cannot
> rely on newlines as a delimiter. That sounds like a costly thing.
>
> Is JSON a "bad" format to have to deal with, or can Spark efficiently
> ingest and work with data in this format? If it can, can I get a pointer as
> to how I would do that?
>
> Nick
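For the multi-line layout Nick asks about, where the objects in the top-level array may span lines, one option is to read each file whole and parse the array in one go. This is a sketch, not something either poster ran: it assumes an existing SparkContext sc, a Spark version that provides SparkContext.wholeTextFiles, files small enough to parse on a single executor, and the same illustrative Person case class and path as above:

    import net.liftweb.json._

    case class Person(name: String, age: Int)

    // wholeTextFiles yields (path, fileContents) pairs; parsing the full
    // contents lets lift-json handle objects that span multiple lines.
    val records = sc.wholeTextFiles("data/*.json").flatMap { case (_, content) =>
      implicit val formats: Formats = DefaultFormats
      parse(content).extract[List[Person]]   // top-level JSON array -> List[Person]
    }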