Re: laziness in textFile reading from HDFS?

2015-10-06 Thread Jonathan Coveney
on-for-large-lzo-files > > > Mohammed > > > -----Original Message----- > From: Matt Narrell [mailto:matt.narr...@gmail.com] > Sent: Tuesday, October 6, 2015 4:08 PM > To: Mohammed Guller > Cc: davidkl; user@spark.apache.org > Subject: Re: laziness in textFile

RE: laziness in textFile reading from HDFS?

2015-10-06 Thread Mohammed Guller
spark-hadoop-throws-exception-for-large-lzo-files Mohammed -Original Message- From: Matt Narrell [mailto:matt.narr...@gmail.com] Sent: Tuesday, October 6, 2015 4:08 PM To: Mohammed Guller Cc: davidkl; user@spark.apache.org Subject: Re: laziness in textFile reading from HDFS? Agreed. This is

Re: laziness in textFile reading from HDFS?

2015-10-06 Thread Matt Narrell
> save operation, I don't see how caching would help. > > Mohammed > > > -----Original Message----- > From: Matt Narrell [mailto:matt.narr...@gmail.com] > Sent: Tuesday, October 6, 2015 3:32 PM > To: Mohammed Guller > Cc: davidkl; user@spark.apache.org >

RE: laziness in textFile reading from HDFS?

2015-10-06 Thread Mohammed Guller
idkl; user@spark.apache.org > Subject: Re: laziness in textFile reading from HDFS? > > Is there any more information or best practices here? I have the exact same > issues when reading large data sets from HDFS (larger than available RAM) and > I cannot run without setting the RDD persi

Re: laziness in textFile reading from HDFS?

2015-10-06 Thread Matt Narrell
Mohammed >> >> -----Original Message----- >> From: davidkl [mailto:davidkl...@hotmail.com] >> Sent: Monday, September 28, 2015 1:40 AM >> To: user@spark.apache.org >> Subject: laziness in textFile reading from HDFS? >> >> Hello, >> >> I nee

RE: laziness in textFile reading from HDFS?

2015-10-05 Thread Mohammed Guller
: laziness in textFile reading from HDFS? Is there any more information or best practices here? I have the exact same issues when reading large data sets from HDFS (larger than available RAM) and I cannot run without setting the RDD persistence level to MEMORY_AND_DISK_SER, and using nearly all the

Re: laziness in textFile reading from HDFS?

2015-10-03 Thread Matt Narrell
ad operation is lazy > 4) It is okay to have more partitions than cores. > > Mohammed > > -----Original Message----- > From: davidkl [mailto:davidkl...@hotmail.com] > Sent: Monday, September 28, 2015 1:40 AM > To: user@spark.apache.org > Subje

RE: laziness in textFile reading from HDFS?

2015-09-29 Thread Mohammed Guller
[mailto:davidkl...@hotmail.com] Sent: Monday, September 28, 2015 1:40 AM To: user@spark.apache.org Subject: laziness in textFile reading from HDFS? Hello, I need to process a significant amount of data every day, about 4TB. This will be processed in batches of about 140GB. The cluster this will

laziness in textFile reading from HDFS?

2015-09-28 Thread davidkl
spark-user-list.1001560.n3.nabble.com/laziness-in-textFile-reading-from-HDFS-tp24837.html