Hi,

So you have a single JSON record spanning multiple lines? And all 50 GB is in one file?
Regards,
Gourav

On Thu, 18 Jun 2020, 14:34 Chetan Khatri, <chetan.opensou...@gmail.com> wrote:
> It is dynamically generated and written to an S3 bucket, not historical
> data, so I guess it doesn't have the jsonlines format.
>
> On Thu, Jun 18, 2020 at 9:16 AM Jörn Franke <jornfra...@gmail.com> wrote:
>
>> It depends on the data types you use.
>>
>> Is it in jsonlines format? Then the amount of memory plays much less of
>> a role.
>>
>> Otherwise, if it is one large object or array, I would not recommend it.
>>
>> > On 18.06.2020 at 15:12, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>> >
>> > Hi Spark Users,
>> >
>> > I have a 50 GB JSON file that I would like to read and persist to HDFS
>> > so it can be used in the next transformation. I am trying to read it
>> > as spark.read.json(path), but this gives an out-of-memory error on the
>> > driver. Obviously, I can't afford to have 50 GB of driver memory. In
>> > general, what is the best practice for reading a large JSON file like
>> > 50 GB?
>> >
>> > Thanks
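
For what it's worth, here is a minimal sketch of the two read paths discussed
above: jsonlines (splittable, safe at 50 GB) versus a single multi-line JSON
document (not recommended). The bucket and HDFS paths are made-up placeholders,
not from this thread:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("read-large-json")
  .getOrCreate()

// Case 1: JSON Lines (one complete JSON object per line). Spark splits the
// input into many partitions and parses lines on the executors, so the
// 50 GB never has to fit in the driver's (or any single task's) memory.
val linesDf = spark.read.json("s3a://some-bucket/events/")   // hypothetical path

// Case 2: one large multi-line JSON document. multiLine makes Spark treat
// each file as a single record, so a 50 GB file lands in one task and will
// run out of memory -- this is the case Jörn advises against.
val bigDocDf = spark.read
  .option("multiLine", true)
  .json("s3a://some-bucket/one-big-file.json")               // hypothetical path

// Persist to HDFS (Parquet here) so the next transformation reads a
// splittable, columnar copy instead of re-parsing the JSON.
linesDf.write.mode("overwrite").parquet("hdfs:///staging/events_parquet")

spark.stop()

If the file really is one huge object or array, the usual workaround is to
rewrite it upstream as jsonlines (or split it into many smaller files) before
handing it to Spark. Passing an explicit schema via .schema(...) also avoids
the extra schema-inference pass over the data.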