On Tue, Dec 17, 2013 at 11:05 PM, Azuryy Yu <[email protected]> wrote:
> I think you need to increase ulimit to avoid 'too many open files' error,
> then FileNotFoundException should disappear.

That was our initial thought too... but this is happening even on trivial jobs that worked fine a few days ago. And I'm not even sure on which machine I would do this, or for which user. As what user does Spark run? It seems to write output into Hadoop using a userid that doesn't exist on the cluster, only on the machine from which I'm running the job.

Also, the problem seems to get reported randomly on one of our worker nodes: it appears to occur on several, but is reported on only one, and as I said before, lsof shows no sign of a problem. On each of these nodes I can open a significant number of files just fine outside Spark.

Actually, I did eventually find some FileNotFoundExceptions in the worker logs too, both on machines the client reported had problems and on other machines.

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: [email protected]
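[Editor's note: since the thread turns on whether the worker process actually has a raised file-descriptor limit, here is a minimal sketch, not from the original thread, of how one might check the limit and the current open-descriptor count as seen by the worker's user on Linux. The PID value is hypothetical; the real worker PID would come from something like jps or ps.]

    import os
    import resource

    # Limit seen by the *current* process; run this as the same user the
    # Spark worker runs as to see the limit that actually applies to it.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit: %d, hard limit: %d" % (soft, hard))

    # On Linux, count descriptors currently open in another process by PID.
    worker_pid = 12345  # hypothetical Spark worker PID
    fd_dir = "/proc/%d/fd" % worker_pid
    if os.path.isdir(fd_dir):
        n_open = len(os.listdir(fd_dir))
        print("open fds for pid %d: %d" % (worker_pid, n_open))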
