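Hi Pradeep,

I'm afraid you're running into a fundamental Java limitation rather than a Spark one. `wholeTextFiles` reads each file into a single string, and Java strings are indexed with signed 32-bit integers, so a string cannot be longer than 2^31 - 1 (approximately 2 billion) characters. Could you use `textFile` as a workaround? It gives you an RDD of the file's lines instead of one string per file, so no single string has to hold the whole file.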
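To make the limit concrete, here is a minimal sketch in plain Java (no Spark involved). The class and method names are made up for illustration. It assumes one character per byte, as with ASCII text, and that "2 gig" means 2 GiB; the practical limit on a given JVM may be somewhat lower than `Integer.MAX_VALUE`.

```java
// Illustrative sketch only: StringLengthLimit and fitsInString are hypothetical names.
class StringLengthLimit {
    // String.length() returns an int, so this is the hard upper bound on characters.
    static final long MAX_STRING_CHARS = Integer.MAX_VALUE; // 2_147_483_647

    // Returns true if a text of charCount characters could be indexed by a Java String.
    static boolean fitsInString(long charCount) {
        return charCount >= 0 && charCount <= MAX_STRING_CHARS;
    }

    public static void main(String[] args) {
        // Assumption: one character per byte, sizes in binary units (GiB).
        long oneAndAHalfGiB = 3L * 1024 * 1024 * 1024 / 2; // 1_610_612_736
        long twoGiB = 2L * 1024 * 1024 * 1024;             // 2_147_483_648

        System.out.println(fitsInString(oneAndAHalfGiB)); // true
        System.out.println(fitsInString(twoGiB));         // false
    }
}
```

Under those assumptions a 2 GiB file is exactly one character over the bound, which fits the behaviour you describe: 1.5 GB loads, 2 GB does not.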
In general, the guide at http://spark.apache.org/contributing.html explains how to contribute to Spark, including how to file bug reports (which does not apply in this case, as it isn't a bug in Spark).

regards,
--Jakob

On Mon, Dec 12, 2016 at 7:34 PM, Pradeep <pradeep.mi...@mail.com> wrote:
> Hi,
>
> Why there is an restriction on max file size that can be read by
> wholeTextFile() method.
>
> I can read a 1.5 gigs file but get Out of memory for 2 gig file.
>
> Also, how can I raise this as an defect in spark jira. Can someone please
> guide.
>
> Thanks,
> Pradeep
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org