Hi,

I have a cluster where a parallel networked file system provides our main data storage and each node has ~750G of local SSD space. To speed things up, we point yarn.nodemanager.local-dirs at the local SSD for local caching.
Recently I've been trying to run a terasort of 2 terabytes of data across 8 nodes with Hadoop 2.7.3. That works out to about 6000 gigs of local SSD space for caching, or 5400 gigs once Hadoop's 90% disk-full check is taken into account. Every run fails with disk-full errors such as the following:

2017-04-11 12:31:44,062 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /l/ssd/achutest/localstore/yarn-nm error, used space above threshold of 90.0%, removing from list of valid directories
2017-04-11 12:31:44,063 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed: 1/1 local-dirs are bad: /l/ssd/achutest/localstore/yarn-nm;
2017-04-11 12:31:44,063 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs are bad: /l/ssd/achutest/localstore/yarn-nm;

What I don't understand is how I'm getting disk-full errors at all. Within terasort I should have at most 2000 gigs of map intermediate output and at most 2000 gigs of merged data on the reduce side. Even allowing for some Hadoop overhead, that should leave more than enough space for the benchmark to complete, given that maps and reducers are spread evenly across the nodes. So my assumption is that something else is being cached in local-dirs that I'm not accounting for. Is there any other data I should factor into my estimates?

One guess: is it possible that data spilled during reducer merges is not deleted until the reducer completes? If so, given my example above, the total amount of merged data across reducers could exceed 2000 gigs at some point.

Al

--
Albert Chu
[email protected]
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
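P.S. In case it helps anyone check my numbers, here's the back-of-the-envelope arithmetic I'm working from, as a quick Python sketch. The sizes are approximate, and the assumption that terasort's map output and reduce-side merge data are each roughly the size of the input is mine:

    # Rough local-disk budget for the terasort run described above.
    nodes = 8
    ssd_per_node_gb = 750                 # approximate local SSD per node
    disk_full_threshold = 0.90            # node manager's used-space cutoff, per the log above

    raw_total_gb = nodes * ssd_per_node_gb                 # ~6000
    usable_total_gb = raw_total_gb * disk_full_threshold   # ~5400

    input_gb = 2000                       # terasort input size
    map_output_gb = input_gb              # assume map intermediate output ~= input size
    reduce_merge_gb = input_gb            # assume reduce-side merged data ~= input size

    peak_estimate_gb = map_output_gb + reduce_merge_gb     # ~4000

    print("usable local space across the cluster: %d GB" % usable_total_gb)
    print("estimated peak intermediate data:      %d GB" % peak_estimate_gb)
    print("headroom:                              %d GB" % (usable_total_gb - peak_estimate_gb))

By that estimate there should be roughly 1400 gigs of headroom, which is why the disk-full errors surprise me.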
