On Tue, Dec 17, 2013 at 11:05 PM, Azuryy Yu <[email protected]> wrote:

> I think you need to increase the ulimit to avoid the 'too many open files'
> error; then the FileNotFoundException should disappear.
>

That was our initial thought too... but this is happening on even trivial
jobs that worked fine a few days ago.

And I'm not even sure on which machine I would do that, or for which user.
What user does Spark run as?  It seems to write its output into Hadoop as a
userid that doesn't exist on the cluster - only on the machine from which
I'm running the job.
Also, the error gets reported against a seemingly random worker node each
time (the problem appears to occur on several nodes but is only reported on
one - and, as I said before, lsof shows no sign of trouble on any of them).
And on each of these nodes, I can open a large number of files just fine
outside Spark.
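
For reference, a quick spark-shell sketch along these lines (assuming the
workers run a Sun/OpenJDK JVM, which exposes
com.sun.management.UnixOperatingSystemMXBean, and that sc is the shell's
SparkContext) would report, from inside each executor JVM, which OS user it
runs as and its open/maximum file-descriptor counts:

  import java.lang.management.ManagementFactory
  import java.net.InetAddress
  import com.sun.management.UnixOperatingSystemMXBean

  // Spread a batch of tasks across the workers; each task reports the
  // hostname, OS user, and file-descriptor usage/limit of the JVM it ran in.
  val report = sc.parallelize(1 to 1000, 100).mapPartitions { _ =>
    val os = ManagementFactory.getOperatingSystemMXBean
               .asInstanceOf[UnixOperatingSystemMXBean]
    Iterator(InetAddress.getLocalHost.getHostName +
      " user=" + System.getProperty("user.name") +
      " openFDs=" + os.getOpenFileDescriptorCount +
      " maxFDs=" + os.getMaxFileDescriptorCount)
  }.collect()

  // Expect several lines per host (open counts vary between tasks).
  report.sorted.distinct.foreach(println)

If maxFDs on the workers turns out to be the default 1024, or the reported
user isn't the one whose limits were raised, that would point back at the
ulimit theory.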

Actually, I did eventually find some FileNotFound exceptions in the worker
logs too - both on the machines the client reported problems on, and on
other machines.


-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  [email protected]
