As this post says, in Spark we can load a JSON file like below:
*post*: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html

-----------------------------------------------------------------------------------------------
sqlContext.jsonFile(file_path)
sqlContext.read.json(file_path)
-----------------------------------------------------------------------------------------------

and the *json file format* looks like below, say *people.json*:

-----------------------------------------------------------------------------------------------
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
-----------------------------------------------------------------------------------------------

And here comes my *problem*: is that the *standard JSON format*? According to http://www.json.org/ , I don't think so. It is just a *collection of records* [ one dict per line ], not a single valid JSON document. Following the official JSON spec, the standard form of people.json should be:

-----------------------------------------------------------------------------------------------
{"name": ["Yin", "Michael"],
 "address": [ {"city":"Columbus","state":"Ohio"},
              {"city":null, "state":"California"} ]}
-----------------------------------------------------------------------------------------------

So why does Spark define its JSON input as a collection of records? It leads to real inconvenience: if we have a large standard JSON file, we first have to reformat it into one-record-per-line form before Spark can read it correctly (see the sketch at the end of this mail), which is inefficient, time-consuming, incompatible, and space-consuming.

great thanks,

--
*--------------------------------------*
a spark lover, a quant, a developer and a good man.
http://github.com/litaotao
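P.S. To make the two points above concrete, here is a minimal sketch of reading the line-delimited people.json with PySpark. It assumes Spark 1.4+ (where sqlContext.read.json is available) and a people.json in the working directory; the app name is arbitrary.

-----------------------------------------------------------------------------------------------
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="json-lines-example")
sqlContext = SQLContext(sc)

# each line of people.json is one self-contained JSON object (one record)
people = sqlContext.read.json("people.json")
people.printSchema()   # Spark infers the nested address struct and the name string
people.show()
-----------------------------------------------------------------------------------------------

And here is a hedged sketch of the reformatting step I complained about: turning a standard JSON document laid out like my example above (parallel "name" and "address" arrays) into the line-delimited form Spark expects, in plain Python. The file names people_standard.json and people_lines.json are hypothetical. Note that json.load has to parse the whole document into memory in one shot, which is exactly the inefficiency I mean for large files.

-----------------------------------------------------------------------------------------------
import json

# read the whole standard JSON document into memory at once
with open("people_standard.json") as f:
    doc = json.load(f)

# zip the parallel columns back into records, one JSON object per line
with open("people_lines.json", "w") as f:
    for name, address in zip(doc["name"], doc["address"]):
        f.write(json.dumps({"name": name, "address": address}) + "\n")
-----------------------------------------------------------------------------------------------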