I posted this question yesterday, but its formatting was very bad, so I am
posting it again. Below is my question:

I am reading a directory of files using wholeTextFiles. After that I call a
function on each element of the RDD using map. The whole program uses only
the first 50 lines of each file. The code is at the link below:

https://gist.github.com/ashwini-anand/0e468da9b4ab7863dff14833d34de79e
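
In case the gist is hard to access, here is a minimal sketch of what my code
does (the directory path and the per-file logic are placeholders, written
spark-shell style where sc is already defined):

// Read each file in the directory as one (path, content) pair,
// then keep only the first 50 lines of each file. The real per-file
// function from the gist would run where take(50) is applied.
val rdd = sc.wholeTextFiles("hdfs:///path/to/dir")
  .map { case (path, content) =>
    (path, content.split("\n").take(50).mkString("\n"))
  }
rdd.count()  // force evaluation, just for illustration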

Each file in the directory can be very large in my case, which makes the
wholeTextFiles API inefficient here: right now it loads the full content of
every file into memory. Can we make wholeTextFiles load only the first 50
lines of each file? Apart from wholeTextFiles, the only other solution I can
think of is iterating over the files of the directory one by one (see the
sketch below), but that also seems inefficient. I am new to Spark. Please
let me know if there is an efficient way to do this.
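
By iterating one by one, I mean something like the following driver-side
loop using the Hadoop FileSystem API (again spark-shell style; the path is a
placeholder). It opens each file as a stream and reads only the first 50
lines, so the rest of each file is never loaded:

import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

val fs = FileSystem.get(sc.hadoopConfiguration)
// List every regular file in the directory
val files = fs.listStatus(new Path("hdfs:///path/to/dir"))
  .filter(_.isFile)
  .map(_.getPath)

// For each file, stream only the first 50 lines
val first50 = files.map { p =>
  val in = fs.open(p)
  try {
    (p.toString, Source.fromInputStream(in).getLines().take(50).toVector)
  } finally {
    in.close()
  }
}

This avoids loading full files, but it runs sequentially on the driver with
no parallelism, which is why it also seems inefficient to me.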



