hi UMESH, I think you've misunderstood the JSON definition. there is only one top-level value in a json file:
for the file people.json below:

--------------------------------------------------------------------------------------------
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
-----------------------------------------------------------------------------------------------

it has two valid formats:

1.
--------------------------------------------------------------------------------------------
[
  {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}},
  {"name":"Michael", "address":{"city":null, "state":"California"}}
]
-----------------------------------------------------------------------------------------------

2.
--------------------------------------------------------------------------------------------
{"name": ["Yin", "Michael"],
 "address": [ {"city":"Columbus","state":"Ohio"},
              {"city":null, "state":"California"} ]
}
-----------------------------------------------------------------------------------------------

On Thu, Mar 31, 2016 at 4:53 PM, UMESH CHAUDHARY <umesh9...@gmail.com> wrote:

> Hi,
> Look at the image below, which is from json.org:
>
> [image: Inline image 1]
>
> The image above describes the object formulation of the following JSON:
>
> Object 1 => {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
> Object 2 => {"name":"Michael", "address":{"city":null, "state":"California"}}
>
> Note that "address" is also an object.
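to make this concrete, here is a quick sanity check with python's json module (just an illustration, not how spark actually parses the file): the raw two-line people.json is not one valid json value, but each line is, and format 1 above is a single valid value:

```python
import json

# the raw people.json content: two objects separated by a newline
raw = '''{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}'''

# the whole file is NOT one valid JSON value (parsing stops at "Extra data")
try:
    json.loads(raw)
    whole_file_valid = True
except json.JSONDecodeError:
    whole_file_valid = False
print("whole file is one valid JSON value:", whole_file_valid)  # False

# ...but each line on its own is a valid JSON object, which is the
# one-record-per-line layout spark's json reader expects
records = [json.loads(line) for line in raw.splitlines()]
print([r["name"] for r in records])  # ['Yin', 'Michael']

# format 1: wrapping the records in a top-level array gives one valid value
as_array = json.loads('[' + ','.join(raw.splitlines()) + ']')
print(len(as_array))  # 2
```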
> On Thu, Mar 31, 2016 at 1:53 PM, charles li <charles.up...@gmail.com> wrote:
>
>> as this post says, in spark we can load a json file in the way below:
>>
>> *post*:
>> https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
>>
>> -----------------------------------------------------------------------------------------------
>> sqlContext.jsonFile(file_path)
>> or
>> sqlContext.read.json(file_path)
>> -----------------------------------------------------------------------------------------------
>>
>> and the *json file format* looks like below, say *people.json*:
>>
>> --------------------------------------------------------------------------------------------
>> {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
>> {"name":"Michael", "address":{"city":null, "state":"California"}}
>> -----------------------------------------------------------------------------------------------
>>
>> and here come my *problems*:
>>
>> Is that the *standard json format*? according to http://www.json.org/,
>> I don't think so. it's just a *collection of records* [ a dict ], not a
>> valid json document. per the official json doc, the standard json format of
>> people.json should be:
>>
>> --------------------------------------------------------------------------------------------
>> {"name": ["Yin", "Michael"],
>>  "address": [ {"city":"Columbus","state":"Ohio"},
>>               {"city":null, "state":"California"} ]
>> }
>> -----------------------------------------------------------------------------------------------
>>
>> So, why do we define the json format as a collection of records in spark?
>> I mean, it leads to some inconvenience: if we had a large standard json
>> file, we would first need to reformat it to make it readable in spark,
>> which is inefficient, time-consuming, incompatible and space-consuming.
>>
>> great thanks,
>>
>> --
>> *--------------------------------------*
>> a spark lover, a quant, a developer and a good man.
>>
>> http://github.com/litaotao
>>

--
*--------------------------------------*
a spark lover, a quant, a developer and a good man.

http://github.com/litaotao
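[editor's note] for what it's worth, going the other way — from a standard json array file to the one-record-per-line layout spark's reader accepts — is only a few lines of plain python (a sketch with the sample data from this thread, not spark code):

```python
import json

# a "standard" json file: one top-level array of records
standard = '''[
  {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}},
  {"name":"Michael", "address":{"city":null, "state":"California"}}
]'''

# reformat to one complete JSON object per line, the layout that
# sqlContext.read.json(file_path) expects for each input line
records = json.loads(standard)
line_delimited = '\n'.join(json.dumps(r) for r in records)
print(line_delimited)
```

each emitted line is itself a complete, parseable json object, so the result can be written out and read back record by record.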