ah, now that does sound suspicious...

On 2 Sep 2015, at 14:09, Sigurd Knippenberg <sig...@knippenberg.com> wrote:
> Yep. I know. It was set to 32K when I ran this test. If I bump it to 64K the issue goes away. It still doesn't make sense to me that the Spark job doesn't release its file handles until the end of the job instead of doing that while my loop iterates.
>
> Sigurd
>
> On Wed, Sep 2, 2015 at 4:33 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
>> On 31 Aug 2015, at 19:49, Sigurd Knippenberg <sig...@knippenberg.com> wrote:
>>
>>> I know I can adjust the max open files allowed by the OS but I'd rather fix the underlying issue.
>>
>> bumping up the OS handle limits is step #1 of installing a hadoop cluster
>>
>> https://wiki.apache.org/hadoop/TooManyOpenFiles
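For readers hitting the same wall: the 32K/64K figures above are the per-process open-file limit. A minimal sketch of checking and raising it from a shell session (making the change permanent is distribution-specific; /etc/security/limits.conf is the usual place on Linux, which is an assumption about the target system):

```shell
# Show the current soft and hard limits on open file descriptors
ulimit -Sn
ulimit -Hn

# Raise the soft limit up to the hard limit for this shell session only.
# A persistent, system-wide change belongs in /etc/security/limits.conf
# (or systemd unit LimitNOFILE=) and requires a re-login to take effect.
ulimit -n "$(ulimit -Hn)"
```

Note this only affects the current shell and its children; the Spark/Hadoop daemons must be restarted from an environment where the higher limit is in force.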