Re: JSON Input files

Madabhattula Rajesh Kumar Mon, 15 Dec 2014 18:58:08 -0800

Thank you Peter for the clarification.

Regards,
Rajesh


On Tue, Dec 16, 2014 at 12:42 AM, Michael Armbrust <mich...@databricks.com>
wrote:
>
> Under the covers, jsonFile uses TextInputFormat, which splits files
> correctly on newlines.  Thus, there is no fixed maximum size for a JSON
> object (other than the fact that it must fit into memory on the
> executors).
>
> On Mon, Dec 15, 2014 at 7:22 AM, Madabhattula Rajesh Kumar <
> mrajaf...@gmail.com> wrote:
>>
>> Hi Peter,
>>
>> Thank you for the clarification.
>>
>> So we need to store each JSON object on one line. Is there any limit on
>> the length of a JSON object, so that it does not spill onto the next
>> line?
>>
>> What will happen if the JSON object is very large?  Will it be stored on
>> a single line in HDFS?
>>
>> What will happen if the JSON object contains a BLOB/CLOB value? Is the
>> entire JSON object stored on a single line in HDFS?
>>
>> What will happen if a JSON object exceeds the HDFS block size, for
>> example when a single JSON object is split across two different worker
>> nodes? In that case, how will Spark read the JSON object?
>>
>> Could you please clarify the above questions?
>>
>> Regards,
>> Rajesh
>>
>>
>> On Mon, Dec 15, 2014 at 6:52 PM, Peter Vandenabeele <
>> pe...@vandenabeele.com> wrote:
>>>
>>>
>>>
>>> On Sat, Dec 13, 2014 at 5:43 PM, Helena Edelson <
>>> helena.edel...@datastax.com> wrote:
>>>
>>>> One solution can be found here:
>>>> https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets
>>>>
>>>>
>>> As far as I understand, the people.json file is not really a proper json
>>> file, but a file documented as:
>>>
>>>   "... JSON files where each line of the files is a JSON object.".
>>>
>>> This means that it is a file with multiple lines, where each line must
>>> contain a fully self-contained JSON object (this is initially confusing:
>>> it will not parse a standard multi-line JSON file). We are working to
>>> clarify this in
>>> https://github.com/apache/spark/pull/3517
>>>
>>> HTH,
>>>
>>> Peter
>>>
>>>
>>>
>>>
>>>> - Helena
>>>> @helenaedelson
>>>>
>>>> On Dec 13, 2014, at 11:18 AM, Madabhattula Rajesh Kumar <
>>>> mrajaf...@gmail.com> wrote:
>>>>
>>>> Hi Team,
>>>>
>>>> I have a large JSON file in Hadoop. Could you please let me know
>>>>
>>>> 1. How to read the JSON file
>>>> 2. How to parse the JSON file
>>>>
>>>> Please share an example program written in Scala.
>>>>
>>>> Regards,
>>>> Rajesh
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Peter Vandenabeele
>>> http://www.allthingsdata.io
>>> http://www.linkedin.com/in/petervandenabeele
>>> https://twitter.com/peter_v
>>> gsm: +32-478-27.40.69
>>> e-mail: pe...@vandenabeele.com
>>> skype: peter_v_be
>>>
>>
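A note on the two points made in this thread (one JSON object per line, and objects larger than a split or block): the following is a minimal Python sketch, not Spark or Hadoop code. It assumes a simplified model of a line-based record reader such as the one behind TextInputFormat: a reader whose split does not start at offset 0 discards its first, possibly partial, line, and every reader finishes the last line it started even if that line runs past the end of its split. The names to_json_lines and read_split and the 32-byte split size are hypothetical, chosen only for illustration.

```python
import io
import json


def to_json_lines(records):
    """Serialize each record as one self-contained JSON object per line."""
    # json.dumps escapes newlines inside string values as \n, so a single
    # object never spans more than one physical line.
    return "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")


def read_split(data, start, end):
    """Return the lines owned by the byte range [start, end).

    Simplified model (an assumption, not the actual Hadoop source):
    - a split that does not start at offset 0 skips its first line, because
      the reader of the previous split is responsible for finishing it;
    - a split keeps reading while its position is <= end, so the line that
      crosses (or starts exactly at) the boundary is read in full.
    """
    stream = io.BytesIO(data)
    stream.seek(start)
    if start != 0:
        stream.readline()  # discard the partial or already-owned line
    lines = []
    while stream.tell() <= end:
        line = stream.readline()
        if not line:  # end of data
            break
        lines.append(line)
    return lines


if __name__ == "__main__":
    records = [
        {"name": "Michael"},
        {"name": "Andy", "age": 30},
        # One object much larger than the split size, with an embedded newline.
        {"name": "Justin", "bio": "line one\nline two " + "x" * 200},
    ]
    data = to_json_lines(records)
    split_size = 32  # hypothetical, far smaller than the largest object
    for start in range(0, len(data), split_size):
        end = min(start + split_size, len(data))
        for line in read_split(data, start, end):
            print((start, end), sorted(json.loads(line).keys()))
```

Because json.dumps escapes newlines inside string values, even a large text value stays on a single line, and in this model each object is returned by exactly one split regardless of where the split boundaries fall.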
