Also, you will need to bounce the Spark services from a new ssh session to
make the ulimit changes take effect (if you changed the value in
/etc/limits).
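
As a quick sanity check, the limit the executor JVMs actually see can differ
from what a login shell reports. One rough way to confirm it from inside a
running JVM, e.g. by pasting into spark-shell on a worker, is the sketch
below (it assumes a HotSpot/OpenJDK runtime where the com.sun.management
MXBean is available):

    import java.lang.management.ManagementFactory
    import com.sun.management.UnixOperatingSystemMXBean

    // Reports the file descriptor limit as seen by this JVM process,
    // which is what matters for the shuffle, not the login shell's ulimit.
    ManagementFactory.getOperatingSystemMXBean match {
      case os: UnixOperatingSystemMXBean =>
        println("max open files:  " + os.getMaxFileDescriptorCount)
        println("currently open:  " + os.getOpenFileDescriptorCount)
      case _ =>
        println("file descriptor counts not exposed on this platform")
    }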

Sent from my mobile phone
On Jan 20, 2014 5:32 PM, "Jey Kottalam" <j...@cs.berkeley.edu> wrote:

> Can you try ulimit -n to make sure the increased limit has taken effect?
>
> On Monday, January 20, 2014, Ryan Compton <compton.r...@gmail.com> wrote:
>
>> I've got
>>
>>     System.setProperty("spark.shuffle.consolidate.files", "true");
>>
>> but I'm getting the same error.
>>
>> The output of the distinct count will be 101,230,940 (I did it in
>> pig). I've got 13 nodes and each node allows 13,069,279 open files. So
>> even with 1 record per file I think I've got enough. But what do the
>> rest of you have for /proc/sys/fs/file-max?
>>
>> On Sun, Jan 19, 2014 at 5:12 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>> > You should try setting spark.shuffle.consolidate.files to true.
>> >
>> >
>> > On Sun, Jan 19, 2014 at 4:49 PM, Ryan Compton <compton.r...@gmail.com>
>> > wrote:
>> >>
>> >> I think I've shuffled this data before (I often join on it), and I
>> >> know I was using distinct() in 0.7.3 for the same computation.
>> >>
>> >> What do people usually have in /proc/sys/fs/file-max? I'm really
>> >> surprised that 13M isn't enough.
>> >>
>> >> On Sat, Jan 18, 2014 at 11:47 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>> >> > distinct() needs to do a shuffle -- which is resulting in the need to
>> >> > materialize the map outputs as files.  count() doesn't.
>> >> >
>> >> >
>> >> > On Sat, Jan 18, 2014 at 10:33 PM, Ryan Compton <compton.r...@gmail.com> wrote:
>> >> >>
>> >> >> I'm able to open ~13M files. I expect the output of
>> >> >> .distinct().count() to be under 100M, so why do I need so many files
>> >> >> open?
>> >> >>
>> >> >> rfcompton@node19 ~> cat /etc/redhat-release
>> >> >> CentOS release 5.7 (Final)
>> >> >> rfcompton@node19 ~> cat /proc/sys/fs/file-max
>> >> >> 13069279
>> >> >>
>> >> >> On Sat, Jan 18, 2014 at 9:12 AM, Jey Kottalam <j...@cs.berkeley.edu>
>> >> >> wrote:
>> >> >> > The "too many open files" error is due to running out of available
>> >> >> > FDs, usually due to a limit set in the OS.
>> >> >> >
>> >> >> > The fix will depend on your specific OS, but under Linux it usually
>> >> >> > involves the "fs.file-max" sysctl.
>> >> >> >
>> >> >> > On Fri, Jan 17, 2014 at 3:02 PM, Ryan Compton
>> >> >> > <compton.r...@gmail.com>
>> >> >> > wrote:
>> >> >> >> When I try .distinct() my jobs fail. Possibly related:
>> >> >> >> https://groups.google.com/forum/#!topic/shark-users/j2TO-GINuFo
>> >> >> >>
>> >> >> >> This works
>> >> >> >>
>> >> >> >>     //get the node ids
>> >> >> >>     val nodes = dupedKeyedEdgeList.map(x => x._1).cache()
>> >> >> >>     //count the nodes
>> >> >> >>     val numNodes = nodes.count()
>> >> >> >>     logWarning("numNodes:\t"+numNodes)
>> >> >> >>
>> >> >> >> this fails
>> >> >> >>
>> >> >> >>     //get the node ids
>> >> >> >>     val nodes = dupedKeyedEdgeList.map(x => x._1).cache()
>> >> >> >>     //count the nodes
>> >> >> >>     val numNodes = nodes.distinct().count()
>> >> >> >>     logWarning("numNodes:\t"+numNodes)
>> >> >> >>
>> >> >> >> with these stacktraces:
>> >> >> >>
>> >> >> >> 14/01/17 14:54:37 WARN scripts.ComputeNetworkStats: numEdges: 915189977
>> >> >> >> 14/01/17 14:54:37 INFO rdd.MappedRDD: Removing RDD 1 from persistence list
>> >> >> >> --
>> >> >> >> 14/01/17 14:56:07 INFO cluster.ClusterTaskSetManager: Loss was due to java.io.IOException
>> >> >> >> java.io.IOException: Filesystem closed
>> >> >> >> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:299)
>> >> >> >> at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:77)
>> >> >> >> at
>>
>
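
For anyone who finds this thread later, here is roughly how the pieces
discussed above fit together in a pre-1.0 driver. This is only a sketch: the
master URL, input path, and edge format are made up for illustration, and the
consolidation property has to be set before the SparkContext is constructed.

    import org.apache.spark.SparkContext

    // Consolidate shuffle outputs so map tasks reuse a small pool of files
    // instead of opening one file per reduce partition (pre-1.0 style config;
    // it must be set before the SparkContext is created).
    System.setProperty("spark.shuffle.consolidate.files", "true")

    val sc = new SparkContext("spark://master:7077", "ComputeNetworkStats")

    // Hypothetical stand-in for dupedKeyedEdgeList: tab-separated (src, dst) ids
    val dupedKeyedEdgeList = sc.textFile("hdfs:///data/edges").map { line =>
      val Array(src, dst) = line.split("\t")
      (src.toLong, dst.toLong)
    }

    // get the node ids
    val nodes = dupedKeyedEdgeList.map(x => x._1).cache()

    // count() alone is a simple action: no shuffle, no map-output files
    val totalRecords = nodes.count()

    // distinct() inserts a shuffle stage, which materializes map outputs as
    // files and is what runs into the open-file limit without consolidation
    val numNodes = nodes.distinct().count()

The other half of the fix is on the OS side: raising fs.file-max and the
per-process nofile ulimit on every worker, then restarting the Spark daemons
from a fresh session so they pick up the new limits.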
