Can you try ulimit -n to make sure the increased limit has taken effect?
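
For example, something along these lines (just a rough sketch: it assumes a
SparkContext named sc and that the worker nodes have a POSIX shell on the
PATH) would print the soft limit the executors actually see, since a limit
raised in the driver's shell doesn't necessarily propagate to them:

    import scala.sys.process._
    // "ulimit" is a shell builtin, so invoke it through sh -c on each executor
    val limits = sc.parallelize(1 to sc.defaultParallelism)
      .map(_ => Seq("sh", "-c", "ulimit -n").!!.trim)
      .collect()
      .distinct
    limits.foreach(l => println("executor ulimit -n: " + l))

If that still reports the old value, the new limit probably isn't being
applied to the processes that actually open the shuffle files.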

On Monday, January 20, 2014, Ryan Compton <[email protected]> wrote:

> I've got
>
>     System.setProperty("spark.shuffle.consolidate.files", "true");
>
> but I'm getting the same error.
>
> The output of the distinct count will be 101,230,940 (I did it in
> Pig). I've got 13 nodes and each node allows 13,069,279 open files. So
> even with 1 record per file I think I've got enough. But what do the
> rest of you have for /proc/sys/fs/file-max?
>
> On Sun, Jan 19, 2014 at 5:12 PM, Mark Hamstra <[email protected]>
> wrote:
> > You should try setting spark.shuffle.consolidate.files to true.
> >
> >
> > On Sun, Jan 19, 2014 at 4:49 PM, Ryan Compton <[email protected]>
> > wrote:
> >>
> >> I think I've shuffled this data before (I often join on it), and I
> >> know I was using distinct() in 0.7.3 for the same computation.
> >>
> >> What do people usually have in /proc/sys/fs/file-max? I'm really
> >> surprised that 13M isn't enough.
> >>
> >> On Sat, Jan 18, 2014 at 11:47 PM, Mark Hamstra <[email protected]>
> >> wrote:
> >> > distinct() needs to do a shuffle -- which is resulting in the need to
> >> > materialize the map outputs as files.  count() doesn't.
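> >> >
> >> > (As a rough sketch rather than the exact code path: distinct() boils
> >> > down to something like
> >> >
> >> >     nodes.map(x => (x, null)).reduceByKey((a, b) => a).map(_._1)
> >> >
> >> > so it introduces a full shuffle stage, and without consolidation each
> >> > map task writes one file per reduce partition -- the file count scales
> >> > with tasks x partitions, not with the number of distinct values.)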
> >> >
> >> >
> >> > On Sat, Jan 18, 2014 at 10:33 PM, Ryan Compton <[email protected]>
> >> > wrote:
> >> >>
> >> >> I'm able to open ~13M files. I expect the output of
> >> >> .distinct().count() to be under 100M, so why do I need so many files
> >> >> open?
> >> >>
> >> >> rfcompton@node19 ~> cat /etc/redhat-release
> >> >> CentOS release 5.7 (Final)
> >> >> rfcompton@node19 ~> cat /proc/sys/fs/file-max
> >> >> 13069279
> >> >>
> >> >> On Sat, Jan 18, 2014 at 9:12 AM, Jey Kottalam <[email protected]>
> >> >> wrote:
> >> >> > The "too many open files" error is due to running out of available
> >> >> > FDs, usually due to a limit set in the OS.
> >> >> >
> >> >> > The fix will depend on your specific OS, but under Linux it usually
> >> >> > involves the "fs.file-max" sysctl.
> >> >> >
> >> >> > On Fri, Jan 17, 2014 at 3:02 PM, Ryan Compton
> >> >> > <[email protected]>
> >> >> > wrote:
> >> >> >> When I try .distinct() my jobs fail. Possibly related:
> >> >> >> https://groups.google.com/forum/#!topic/shark-users/j2TO-GINuFo
> >> >> >>
> >> >> >> This works
> >> >> >>
> >> >> >>     //get the node ids
> >> >> >>     val nodes = dupedKeyedEdgeList.map(x => x._1).cache()
> >> >> >>     //count the nodes
> >> >> >>     val numNodes = nodes.count()
> >> >> >>     logWarning("numNodes:\t"+numNodes)
> >> >> >>
> >> >> >> This fails
> >> >> >>
> >> >> >>     //get the node ids
> >> >> >>     val nodes = dupedKeyedEdgeList.map(x => x._1).cache()
> >> >> >>     //count the nodes
> >> >> >>     val numNodes = nodes.distinct().count()
> >> >> >>     logWarning("numNodes:\t"+numNodes)
> >> >> >>
> >> >> >> with these stacktraces:
> >> >> >>
> >> >> >> 14/01/17 14:54:37 WARN scripts.ComputeNetworkStats: numEdges: 915189977
> >> >> >> 14/01/17 14:54:37 INFO rdd.MappedRDD: Removing RDD 1 from persistence list
> >> >> >> --
> >> >> >> 14/01/17 14:56:07 INFO cluster.ClusterTaskSetManager: Loss was due to java.io.IOException
> >> >> >> java.io.IOException: Filesystem closed
> >> >> >> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:299)
> >> >> >> at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:77)
> >> >> >> at
>
