Thanks, you meant in a for loop. Could you please share pseudocode in Spark?

On Fri, Jun 19, 2020 at 8:39 AM Jörn Franke <jornfra...@gmail.com> wrote:
> Make every JSON object a line and then read it as JSON Lines, not as multiline.
>
> On 19.06.2020 at 14:37, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
>> All transactions are in JSON; it is not a single array.
>>
>> On Thu, Jun 18, 2020 at 12:55 PM Stephan Wehner <step...@buckmaster.ca> wrote:
>>
>>> It's an interesting problem. What is the structure of the file? One big
>>> array? Or a hash with many key-value pairs?
>>>
>>> Stephan
>>>
>>> On Thu, Jun 18, 2020 at 6:12 AM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>
>>>> Hi Spark Users,
>>>>
>>>> I have a 50 GB JSON file that I would like to read and persist to HDFS
>>>> so it can be used in the next transformation. I am trying
>>>> spark.read.json(path), but it gives an out-of-memory error on the
>>>> driver. Obviously, I can't afford 50 GB of driver memory. In general,
>>>> what is the best practice for reading a large JSON file like this?
>>>>
>>>> Thanks
>>>
>>> --
>>> Stephan Wehner, Ph.D.
>>> The Buckmaster Institute, Inc.
>>> 2150 Adanac Street
>>> Vancouver BC V5L 2E7
>>> Canada
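A minimal sketch of Jörn's suggestion (the helper name `to_jsonlines` and the file paths are made up for illustration). For simplicity it assumes the input parses as one JSON array; since Chetan says the file is not a single array, the real split step would depend on the actual layout, and a genuine 50 GB file would need a streaming parser rather than `json.load`:

```python
import json

def to_jsonlines(src_path, dst_path):
    """Rewrite a JSON array file as JSON Lines: one object per line."""
    # NOTE: json.load materializes the whole input in memory. That is fine
    # for a sketch; for a real 50 GB file, swap in a streaming parser
    # (e.g. the third-party ijson library) that yields objects one at a time.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for obj in json.load(src):
            dst.write(json.dumps(obj) + "\n")

# Once the file is line-delimited, Spark splits it across executors and
# nothing is collected on the driver (multiLine defaults to false):
#
#   df = spark.read.json("hdfs:///data/transactions.jsonl")
#   df.write.parquet("hdfs:///data/transactions_parquet")
```

The point of the conversion is that JSON Lines files are splittable: each executor can parse its own byte range line by line, whereas a single multiline JSON document must be parsed as one record.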