As this post says, in Spark we can load a JSON file like below:
*post*: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html

-----------------------------------------------------------------------------------------------
sqlContext.jsonFile(file_path)
sqlContext.read.json(file_path)
-----------------------------------------------------------------------------------------------

and the *json file format* looks like below, say *people.json*:

-----------------------------------------------------------------------------------------------
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
-----------------------------------------------------------------------------------------------

And here comes my *problem*: is that the *standard JSON format*? According to http://www.json.org/ , I don't think so. It is just a *collection of records* [ one dict per line ], not a single valid JSON document. Following the official JSON spec, the standard form of people.json should be:

-----------------------------------------------------------------------------------------------
{"name": ["Yin", "Michael"],
 "address": [ {"city":"Columbus","state":"Ohio"},
              {"city":null, "state":"California"} ]}
-----------------------------------------------------------------------------------------------

So why does Spark define its JSON input as a collection of records? It leads to real inconvenience: if we have a large standard JSON file, we first have to reformat it into one-record-per-line form before Spark can read it correctly (see the sketch at the end of this mail), which is inefficient, time-consuming, incompatible, and space-consuming.

great thanks,

--
*--------------------------------------*
a spark lover, a quant, a developer and a good man.
http://github.com/litaotao
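P.S. To make the two points above concrete, here is a minimal sketch of reading the line-delimited people.json with PySpark. It assumes Spark 1.4+ (where sqlContext.read.json is available) and a people.json in the working directory; the app name is arbitrary.

-----------------------------------------------------------------------------------------------
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="json-lines-example")
sqlContext = SQLContext(sc)

# each line of people.json is one self-contained JSON object (one record)
people = sqlContext.read.json("people.json")
people.printSchema()   # Spark infers the nested address struct and the name string
people.show()
-----------------------------------------------------------------------------------------------

And here is a hedged sketch of the reformatting step I complained about: turning a standard JSON document laid out like my example above (parallel "name" and "address" arrays) into the line-delimited form Spark expects, in plain Python. The file names people_standard.json and people_lines.json are hypothetical. Note that json.load has to parse the whole document into memory in one shot, which is exactly the inefficiency I mean for large files.

-----------------------------------------------------------------------------------------------
import json

# read the whole standard JSON document into memory at once
with open("people_standard.json") as f:
    doc = json.load(f)

# zip the parallel columns back into records, one JSON object per line
with open("people_lines.json", "w") as f:
    for name, address in zip(doc["name"], doc["address"]):
        f.write(json.dumps({"name": name, "address": address}) + "\n")
-----------------------------------------------------------------------------------------------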