hi UMESH, I think you've misunderstood the JSON definition. there is only one top-level value in a json file:
for the file people.json below:

--------------------------------------------------------------------------------------------
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
-----------------------------------------------------------------------------------------------

it has two valid formats:

1.
--------------------------------------------------------------------------------------------
[
  {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}},
  {"name":"Michael", "address":{"city":null, "state":"California"}}
]
-----------------------------------------------------------------------------------------------

2.
--------------------------------------------------------------------------------------------
{"name": ["Yin", "Michael"],
 "address": [ {"city":"Columbus","state":"Ohio"},
              {"city":null, "state":"California"} ]
}
-----------------------------------------------------------------------------------------------

On Thu, Mar 31, 2016 at 4:53 PM, UMESH CHAUDHARY <umesh9...@gmail.com> wrote:

> Hi,
> Look at the image below, which is from json.org:
>
> [image: Inline image 1]
>
> The image above describes the object formulation of the following JSON:
>
> Object 1 => {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
> Object 2 => {"name":"Michael", "address":{"city":null, "state":"California"}}
>
> Note that "address" is also an object.
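to make this concrete, here is a quick sanity check with python's json module (just an illustration, not how spark actually parses the file): the raw two-line people.json is not one valid json value, but each line is, and format 1 above is a single valid value:

```python
import json

# the raw people.json content: two objects separated by a newline
raw = '''{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}'''

# the whole file is NOT one valid JSON value (parsing stops at "Extra data")
try:
    json.loads(raw)
    whole_file_valid = True
except json.JSONDecodeError:
    whole_file_valid = False
print("whole file is one valid JSON value:", whole_file_valid)  # False

# ...but each line on its own is a valid JSON object, which is the
# one-record-per-line layout spark's json reader expects
records = [json.loads(line) for line in raw.splitlines()]
print([r["name"] for r in records])  # ['Yin', 'Michael']

# format 1: wrapping the records in a top-level array gives one valid value
as_array = json.loads('[' + ','.join(raw.splitlines()) + ']')
print(len(as_array))  # 2
```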
> On Thu, Mar 31, 2016 at 1:53 PM, charles li <charles.up...@gmail.com> wrote:
>
>> as this post says, in spark we can load a json file in the way below:
>>
>> *post*:
>> https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
>>
>> -----------------------------------------------------------------------------------------------
>> sqlContext.jsonFile(file_path)
>> or
>> sqlContext.read.json(file_path)
>> -----------------------------------------------------------------------------------------------
>>
>> and the *json file format* looks like below, say *people.json*:
>>
>> --------------------------------------------------------------------------------------------
>> {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
>> {"name":"Michael", "address":{"city":null, "state":"California"}}
>> -----------------------------------------------------------------------------------------------
>>
>> and here come my *problems*:
>>
>> Is that the *standard json format*? according to http://www.json.org/,
>> I don't think so. it's just a *collection of records* [ a dict ], not a
>> valid json document. per the official json doc, the standard json format of
>> people.json should be:
>>
>> --------------------------------------------------------------------------------------------
>> {"name": ["Yin", "Michael"],
>>  "address": [ {"city":"Columbus","state":"Ohio"},
>>               {"city":null, "state":"California"} ]
>> }
>> -----------------------------------------------------------------------------------------------
>>
>> So, why do we define the json format as a collection of records in spark?
>> I mean, it leads to some inconvenience: if we had a large standard json
>> file, we would first need to reformat it to make it readable in spark,
>> which is inefficient, time-consuming, incompatible and space-consuming.
>>
>> great thanks,
>>
>> --
>> *--------------------------------------*
>> a spark lover, a quant, a developer and a good man.
>>
>> http://github.com/litaotao
>>

--
*--------------------------------------*
a spark lover, a quant, a developer and a good man.

http://github.com/litaotao
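[editor's note] for what it's worth, going the other way — from a standard json array file to the one-record-per-line layout spark's reader accepts — is only a few lines of plain python (a sketch with the sample data from this thread, not spark code):

```python
import json

# a "standard" json file: one top-level array of records
standard = '''[
  {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}},
  {"name":"Michael", "address":{"city":null, "state":"California"}}
]'''

# reformat to one complete JSON object per line, the layout that
# sqlContext.read.json(file_path) expects for each input line
records = json.loads(standard)
line_delimited = '\n'.join(json.dumps(r) for r in records)
print(line_delimited)
```

each emitted line is itself a complete, parseable json object, so the result can be written out and read back record by record.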