Re: Problem when sorting big file

2014-05-19 Thread Andrew Ash
Is your RDD made up of Strings? If so, make sure to use the Kryo serializer instead of the default Java one. It stores strings as UTF-8 rather than Java's default UTF-16 representation, which can halve memory usage in the right situation (for example, mostly ASCII text). Try setting the persistence level on the RDD
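
To make the "half the memory" point in the reply above concrete, here is a minimal sketch that compares the encoded size of the same text in UTF-8 and UTF-16. It uses only the Java standard library; the class name, helper names, and sample record are made up for illustration, and it shows the encoding arithmetic only, not Kryo's actual wire format or Spark's in-memory layout.

```java
import java.nio.charset.StandardCharsets;

public class StringEncodingSize {
    // Bytes needed to encode the text as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to encode the text as UTF-16 (big-endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Hypothetical ASCII-only record: 1 byte per character in UTF-8,
        // 2 bytes per character in UTF-16.
        String ascii = "user_12345,2014-05-16,click";
        // Two CJK characters: 3 bytes each in UTF-8, 2 bytes each in UTF-16.
        String cjk = "\u4e2d\u6587";

        System.out.println("ascii: " + utf8Bytes(ascii) + " vs " + utf16Bytes(ascii));
        System.out.println("cjk:   " + utf8Bytes(cjk) + " vs " + utf16Bytes(cjk));
    }
}
```

The saving only appears when the text is mostly ASCII. For text dominated by characters such as CJK, UTF-8 can be larger than UTF-16, which is presumably what "in the right situation" refers to.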

Problem when sorting big file

2014-05-16 Thread Gustavo Enrique Salazar Torres
Hi there: I have a dataset (about 12 GB) that I need to sort by key. I used the sortByKey method, but when I try to save the result to disk (HDFS in this case), some tasks seem to time out because they have too much data to save and it doesn't fit in memory. I say this because before the