Yes

On Thu, Jun 18, 2020 at 12:34 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi,
>
> So you have a single JSON record spanning multiple lines?
> And all 50 GB is in one file?
>
> Regards,
> Gourav
>
> On Thu, 18 Jun 2020, 14:34 Chetan Khatri, <chetan.opensou...@gmail.com> wrote:
>
>> It is dynamically generated and written to an S3 bucket, not historical data,
>> so I guess it doesn't have the jsonlines format.
>>
>> On Thu, Jun 18, 2020 at 9:16 AM Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> It depends on the data types you use.
>>>
>>> Do you have it in jsonlines format? Then the amount of memory plays much
>>> less of a role.
>>>
>>> Otherwise, if it is one large object or array, I would not recommend it.
>>>
>>> > On 18.06.2020 at 15:12, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>> >
>>> > Hi Spark Users,
>>> >
>>> > I have a 50 GB JSON file that I would like to read and persist to HDFS
>>> > so it can be used in the next transformation. I am trying to read it as
>>> > spark.read.json(path), but this gives an out-of-memory error on the
>>> > driver. Obviously, I can't afford 50 GB of driver memory. In general,
>>> > what is the best practice for reading a large JSON file like 50 GB?
>>> >
>>> > Thanks
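[For readers finding this thread later: below is a minimal sketch, not from the original posters, illustrating the distinction Jörn is drawing between JSON Lines input (one record per line, splittable across executors) and a single large multi-line JSON document (read via the multiLine option as one record per file, which is where memory pressure comes from). The bucket paths, app name, and output location are placeholders.]

```scala
import org.apache.spark.sql.SparkSession

object ReadLargeJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-large-json") // placeholder app name
      .getOrCreate()

    // Case 1: JSON Lines -- each line is a complete JSON record, so Spark can
    // split the file into partitions and parse it in parallel on the executors.
    val dfLines = spark.read.json("s3a://my-bucket/events/data.jsonl") // placeholder path

    // Case 2: a single multi-line JSON object or array -- requires multiLine=true
    // and is parsed as one record per file, so memory use grows with file size.
    val dfMulti = spark.read
      .option("multiLine", "true")
      .json("s3a://my-bucket/events/data.json") // placeholder path

    // Persist the parsed result to HDFS (here as Parquet) so downstream
    // transformations read a columnar, splittable format instead of raw JSON.
    dfLines.write.mode("overwrite").parquet("hdfs:///data/parsed_json") // placeholder path

    spark.stop()
  }
}
```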