Hi,

I have a cluster where a parallel networked file system provides our
major data storage, and our nodes have ~750G of local SSD space.  To
speed things up, we configure yarn.nodemanager.local-dirs to use the
local SSD for local caching.
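For concreteness, the relevant part of our yarn-site.xml looks roughly like this (a sketch, not our exact config; the disk-utilization property shown is the standard knob behind YARN's 90% check, and 90.0 is its default):

```xml
<!-- Illustrative yarn-site.xml fragment; directory taken from our logs. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/l/ssd/achutest/localstore/yarn-nm</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
```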

Recently, I've been trying to do a terasort of 2 terabytes of data over
8 nodes w/ Hadoop 2.7.3.  That's about 6000 gigs of local SSD space
available for caching, or 5400 gigs once Hadoop applies its 90%
disk-utilization check.
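To show my arithmetic (assuming all 750G per node is usable by YARN):

```python
# Back-of-the-envelope cluster-wide SSD capacity.
nodes = 8
ssd_per_node_gb = 750                 # approximate local SSD per node
total_gb = nodes * ssd_per_node_gb    # 6000 gigs raw
usable_gb = total_gb * 0.90           # 5400 gigs after the 90% cutoff
print(total_gb, usable_gb)
```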

I consistently get disk-full errors like the following when running it:

2017-04-11 12:31:44,062 WARN 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory 
/l/ssd/achutest/localstore/yarn-nm error, used space above threshold of 90.0%, 
removing from list of valid directories
2017-04-11 12:31:44,063 INFO 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) 
failed: 1/1 local-dirs are bad: /l/ssd/achutest/localstore/yarn-nm;
2017-04-11 12:31:44,063 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the 
disks failed. 1/1 local-dirs are bad: /l/ssd/achutest/localstore/yarn-nm;

What I don't understand is how I am getting disk-full errors.  Within
terasort, I should have at most 2000 gigs of map intermediate data and
at most 2000 gigs of merged data in the reducers.  Even allowing for
some overhead from Hadoop, I should have more than enough space for
this benchmark to complete, given that maps and reducers are spread
evenly across nodes.
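Per node, my budget works out like this (a rough sketch; it assumes the map output and reducer merge data really are spread evenly and that spills are cleaned up promptly):

```python
# Rough per-node space budget for a 2 TB terasort over 8 nodes.
nodes = 8
data_gb = 2000                        # total terasort input
map_per_node = data_gb / nodes        # ~250 gigs of map intermediate data
reduce_per_node = data_gb / nodes     # ~250 gigs of reducer merge data
budget_per_node = 750 * 0.90          # ~675 gigs usable under the 90% cutoff
print(map_per_node + reduce_per_node, "vs", budget_per_node)
```

So ~500 gigs of expected usage against a ~675 gig budget per node, which is why the disk-full errors surprise me.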

So my assumption is that something else is being stored in local-dirs
that I'm not accounting for.  Is there any other data I should consider
when coming up with my estimates?

One guess I had: is it possible that spill data from reducer merges is
not deleted until a reducer completes?  If so, given my example above,
the total amount of merged data across reducers could exceed 2000 gigs
at some point.

Al

-- 
Albert Chu
[email protected]
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory


