Also, you will need to bounce the Spark services from a new ssh session for the ulimit changes to take effect (if you changed the value in /etc/limits).
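[Editor's note: a quick way to confirm the raised limits are actually visible before restarting the Spark daemons. This is a sketch; the point is that per-login limits only apply to sessions started after the change, which is why a fresh ssh session is needed.]

```shell
# Run from a NEW ssh session on each worker: limits configured in
# /etc/security/limits.conf (or /etc/limits, depending on the distro)
# only apply to logins that start after the change.
echo "per-process fd limit: $(ulimit -n)"       # what 'too many open files' actually hits
echo "system-wide fd cap:   $(cat /proc/sys/fs/file-max)"
```

If `ulimit -n` still shows the old value in the new session, restarting Spark from that session will not help; the limits change itself has not taken effect.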
Sent from my mobile phone

On Jan 20, 2014 5:32 PM, "Jey Kottalam" <j...@cs.berkeley.edu> wrote:
> Can you try ulimit -n to make sure the increased limit has taken effect?
>
> On Monday, January 20, 2014, Ryan Compton <compton.r...@gmail.com> wrote:
> > I've got
> >
> > System.setProperty("spark.shuffle.consolidate.files", "true");
> >
> > but I'm getting the same error.
> >
> > The output of the distinct count will be 101,230,940 (I did it in
> > pig). I've got 13 nodes and each node allows 13,069,279 open files. So
> > even with 1 record per file I think I've got enough. But what do the
> > rest of you have for /proc/sys/fs/file-max?
> >
> > On Sun, Jan 19, 2014 at 5:12 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> > > You should try setting spark.shuffle.consolidate.files to true.
> > >
> > > On Sun, Jan 19, 2014 at 4:49 PM, Ryan Compton <compton.r...@gmail.com> wrote:
> > > > I think I've shuffled this data before (I often join on it), and I
> > > > know I was using distinct() in 0.7.3 for the same computation.
> > > >
> > > > What do people usually have in /proc/sys/fs/file-max? I'm real
> > > > surprised that 13M isn't enough.
> > > >
> > > > On Sat, Jan 18, 2014 at 11:47 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> > > > > distinct() needs to do a shuffle -- which is resulting in the need to
> > > > > materialize the map outputs as files. count() doesn't.
> > > > >
> > > > > On Sat, Jan 18, 2014 at 10:33 PM, Ryan Compton <compton.r...@gmail.com> wrote:
> > > > > > I'm able to open ~13M files. I expect the output of
> > > > > > .distinct().count() to be under 100M, why do I need so many files
> > > > > > open?
> > > > > > rfcompton@node19 ~> cat /etc/redhat-release
> > > > > > CentOS release 5.7 (Final)
> > > > > > rfcompton@node19 ~> cat /proc/sys/fs/file-max
> > > > > > 13069279
> > > > > >
> > > > > > On Sat, Jan 18, 2014 at 9:12 AM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
> > > > > > > The "too many open files" error is due to running out of available
> > > > > > > FDs, usually due to a limit set in the OS.
> > > > > > >
> > > > > > > The fix will depend on your specific OS, but under Linux it usually
> > > > > > > involves the "fs.file-max" sysctl.
> > > > > > >
> > > > > > > On Fri, Jan 17, 2014 at 3:02 PM, Ryan Compton <compton.r...@gmail.com> wrote:
> > > > > > > > When I try .distinct() my jobs fail. Possibly related:
> > > > > > > > https://groups.google.com/forum/#!topic/shark-users/j2TO-GINuFo
> > > > > > > >
> > > > > > > > This works:
> > > > > > > >
> > > > > > > > // get the node ids
> > > > > > > > val nodes = dupedKeyedEdgeList.map(x => x._1).cache()
> > > > > > > > // count the nodes
> > > > > > > > val numNodes = nodes.count()
> > > > > > > > logWarning("numNodes:\t" + numNodes)
> > > > > > > >
> > > > > > > > This fails:
> > > > > > > >
> > > > > > > > // get the node ids
> > > > > > > > val nodes = dupedKeyedEdgeList.map(x => x._1).cache()
> > > > > > > > // count the nodes
> > > > > > > > val numNodes = nodes.distinct().count()
> > > > > > > > logWarning("numNodes:\t" + numNodes)
> > > > > > > >
> > > > > > > > with these stack traces:
> > > > > > > >
> > > > > > > > 14/01/17 14:54:37 WARN scripts.ComputeNetworkStats: numEdges: 915189977
> > > > > > > > 14/01/17 14:54:37 INFO rdd.MappedRDD: Removing RDD 1 from persistence list
> > > > > > > > --
> > > > > > > > 14/01/17 14:56:07 INFO cluster.ClusterTaskSetManager: Loss was due to
> > > > > > > > java.io.IOException
> > > > > > > > java.io.IOException: Filesystem closed
> > > > > > > > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:299)
> > > > > > > > at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:77)
> > > > > > > > at ...
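[Editor's note: the confusion in this thread ("even with 1 record per file I think I've got enough") comes from shuffle file counts scaling with tasks, not records. Without spark.shuffle.consolidate.files, each map task writes one shuffle file per reduce partition, so the file count is roughly map_tasks × reduce_partitions. A back-of-envelope sketch; the task counts below are hypothetical, not taken from the thread:]

```shell
# All numbers are illustrative assumptions, not from the job above.
map_tasks=7000          # e.g. roughly one per input block
reduce_partitions=7000  # partition count of the shuffled RDD
nodes=13

# Without consolidation: one shuffle file per (map task, reduce partition) pair.
total=$((map_tasks * reduce_partitions))
per_node=$((total / nodes))
echo "shuffle files total:    $total"
echo "shuffle files per node: $per_node"
```

With consolidation enabled, map tasks running in the same core slot reuse shuffle files, so the per-node count scales with cores × reduce partitions instead of map tasks × reduce partitions; that is why Mark's suggestion attacks the file-descriptor problem directly, independent of the OS limits.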