Nick,

If you don't want to use Avro, Thrift, Protobuf, etc., use a library like lift-json: write each JSON object out as a single-line string, read the file back in as a text file, and deserialize each line with lift-json. You can use standard separators like comma, tab, etc.
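A minimal sketch of that approach, assuming one JSON object per line; the file name people.jsonl, the Person case class, and its fields are made up for illustration:

    import net.liftweb.json._
    import org.apache.spark.SparkContext

    case class Person(name: String, age: Int)

    object JsonLinesExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[*]", "json-lines")

        // One self-contained JSON object per line, e.g. {"name":"Alice","age":30}
        val people = sc.textFile("people.jsonl").map { line =>
          // Formats is created inside the closure so nothing
          // non-serializable is captured from the driver.
          implicit val formats: Formats = DefaultFormats
          parse(line).extract[Person]
        }

        people.take(5).foreach(println)
        sc.stop()
      }
    }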
I am sure there will be better ways to do it, but I am new to Spark as well...

Deb

On Feb 23, 2014 9:10 PM, "nicholas.chammas" <nicholas.cham...@gmail.com> wrote:

> I'm new to this field, but it seems like most "Big Data" examples --
> Spark's included -- begin with reading in flat lines of text from a file.
>
> How would I go about having Spark turn a large JSON file into an RDD?
>
> So the file would just be a text file that looks like this:
>
> [{...}, {...}, ...]
>
> where the individual JSON objects are arbitrarily complex (i.e. not
> necessarily flat) and may or may not be on separate lines.
>
> Basically, I'm guessing Spark would need to parse the JSON since it cannot
> rely on newlines as a delimiter. That sounds like a costly thing.
>
> Is JSON a "bad" format to have to deal with, or can Spark efficiently
> ingest and work with data in this format? If it can, can I get a pointer as
> to how I would do that?
>
> Nick
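For the multi-line layout Nick asks about, where the objects in the top-level array may span lines, one option is to read each file whole and parse the array in one go. This is a sketch, not something either poster ran: it assumes an existing SparkContext sc, a Spark version that provides SparkContext.wholeTextFiles, files small enough to parse on a single executor, and the same illustrative Person case class and path as above:

    import net.liftweb.json._

    case class Person(name: String, age: Int)

    // wholeTextFiles yields (path, fileContents) pairs; parsing the full
    // contents lets lift-json handle objects that span multiple lines.
    val records = sc.wholeTextFiles("data/*.json").flatMap { case (_, content) =>
      implicit val formats: Formats = DefaultFormats
      parse(content).extract[List[Person]]   // top-level JSON array -> List[Person]
    }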