Yes

On Thu, Jun 18, 2020 at 12:34 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi,
>
> So you have a single JSON record spanning multiple lines?
> And all 50 GB is in one file?
>
> Regards,
> Gourav
>
> On Thu, 18 Jun 2020, 14:34 Chetan Khatri, <chetan.opensou...@gmail.com> wrote:
>
>> It is dynamically generated and written to an S3 bucket, not historical data,
>> so I guess it doesn't have the jsonlines format.
>>
>> On Thu, Jun 18, 2020 at 9:16 AM Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> It depends on the data types you use.
>>>
>>> Do you have it in jsonlines format? Then the amount of memory plays much
>>> less of a role.
>>>
>>> Otherwise, if it is one large object or array, I would not recommend it.
>>>
>>> > On 18.06.2020 at 15:12, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>> >
>>> > Hi Spark Users,
>>> >
>>> > I have a 50 GB JSON file that I would like to read and persist to HDFS
>>> > so it can be used in the next transformation. I am trying to read it as
>>> > spark.read.json(path), but this gives an out-of-memory error on the
>>> > driver. Obviously, I can't afford 50 GB of driver memory. In general,
>>> > what is the best practice for reading a large JSON file like 50 GB?
>>> >
>>> > Thanks
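[For readers finding this thread later: below is a minimal sketch, not from the original posters, illustrating the distinction Jörn is drawing between JSON Lines input (one record per line, splittable across executors) and a single large multi-line JSON document (read via the multiLine option as one record per file, which is where memory pressure comes from). The bucket paths, app name, and output location are placeholders.]

```scala
import org.apache.spark.sql.SparkSession

object ReadLargeJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-large-json") // placeholder app name
      .getOrCreate()

    // Case 1: JSON Lines -- each line is a complete JSON record, so Spark can
    // split the file into partitions and parse it in parallel on the executors.
    val dfLines = spark.read.json("s3a://my-bucket/events/data.jsonl") // placeholder path

    // Case 2: a single multi-line JSON object or array -- requires multiLine=true
    // and is parsed as one record per file, so memory use grows with file size.
    val dfMulti = spark.read
      .option("multiLine", "true")
      .json("s3a://my-bucket/events/data.json") // placeholder path

    // Persist the parsed result to HDFS (here as Parquet) so downstream
    // transformations read a columnar, splittable format instead of raw JSON.
    dfLines.write.mode("overwrite").parquet("hdfs:///data/parsed_json") // placeholder path

    spark.stop()
  }
}
```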