You are correct that it does not take the standard JSON file format. From the Spark Docs:

"Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail."
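To make the line-per-object requirement concrete, here is a minimal sketch in plain Python (standard library only, no Spark) that turns a regular multi-line JSON array into the one-object-per-line layout described in the docs. The helper name `to_json_lines` is hypothetical, and the sample records are the people.json data from this thread; a very large file would likely need a streaming approach instead of loading everything into memory.

```python
import json


def to_json_lines(records):
    """Serialize a list of dicts as one compact JSON object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)


# A regular JSON document: a single top-level array spread over several lines.
standard_json = """
[
  {"name": "Yin", "address": {"city": "Columbus", "state": "Ohio"}},
  {"name": "Michael", "address": {"city": null, "state": "California"}}
]
"""

records = json.loads(standard_json)
json_lines = to_json_lines(records)

# Each line must parse on its own, independently of the others.
parsed = [json.loads(line) for line in json_lines.splitlines()]

print(json_lines)
```

Written to a file, the resulting text should be in the shape that sqlContext.read.json(file_path) expects, since every line is a separate, self-contained JSON object.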
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets

On Mar 31, 2016, at 5:30 AM, charles li <charles.up...@gmail.com> wrote:

hi, UMESH, have you tried to load that json file on your machine? I did try it before, and here is the screenshot:

<屏幕快照 2016-03-31 下午5.27.30.png>
<屏幕快照 2016-03-31 下午5.27.39.png>

On Thu, Mar 31, 2016 at 5:19 PM, UMESH CHAUDHARY <umesh9...@gmail.com> wrote:

Hi Charles,

The definition of object from www.json.org:

An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).

It's pretty much the OOP paradigm, isn't it?

Regards,
Umesh

On Thu, Mar 31, 2016 at 2:34 PM, charles li <charles.up...@gmail.com> wrote:

hi, UMESH, I think you've misunderstood the json definition.

there is only one object in a json file:

for the file, people.json, as below:

--------------------------------------------------------------------------------------------
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
--------------------------------------------------------------------------------------------

it does have two valid formats:

1.

--------------------------------------------------------------------------------------------
[
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}},
{"name":"Michael", "address":{"city":null, "state":"California"}}
]
--------------------------------------------------------------------------------------------

2.
--------------------------------------------------------------------------------------------
{"name": ["Yin", "Michael"],
 "address":[ {"city":"Columbus","state":"Ohio"},
             {"city":null, "state":"California"} ]
}
--------------------------------------------------------------------------------------------

On Thu, Mar 31, 2016 at 4:53 PM, UMESH CHAUDHARY <umesh9...@gmail.com> wrote:

Hi,

Look at the image below, which is from json.org:

<image.png>

The above image describes how the objects in the JSON below are formed:

Object 1 => {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
Object 2 => {"name":"Michael", "address":{"city":null, "state":"California"}}

Note that "address" is also an object.

On Thu, Mar 31, 2016 at 1:53 PM, charles li <charles.up...@gmail.com> wrote:

as this post says, in spark we can load a json file in the way below:

post: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html

--------------------------------------------------------------------------------------------
sqlContext.jsonFile(file_path)
or
sqlContext.read.json(file_path)
--------------------------------------------------------------------------------------------

and the json file format looks like below, say people.json
--------------------------------------------------------------------------------------------
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
--------------------------------------------------------------------------------------------

and here comes my problem:

Is that the standard json format? according to http://www.json.org/ , I don't think so. it's just a collection of records [ a dict ], not a valid json format.

according to the official json doc, the standard json format of people.json should be:

--------------------------------------------------------------------------------------------
{"name": ["Yin", "Michael"],
 "address":[ {"city":"Columbus","state":"Ohio"},
             {"city":null, "state":"California"} ]
}
--------------------------------------------------------------------------------------------

So why does spark define the json format as a collection of records? It is inconvenient: if we have a large standard json file, we first need to reformat it so that spark can read it correctly, which is inefficient, time-consuming, incompatible and space-consuming.

great thanks,

--
--------------------------------------
a spark lover, a quant, a developer and a good man.

http://github.com/litaotao