[ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615894#comment-13615894
 ] 

Omkar Vinit Joshi commented on YARN-467:
----------------------------------------

The underlying problem here is that ResourceLocalization tries to localize 
more files into a single directory than the underlying local file system 
allows per directory.

Proposed solution (for public resources, which are localized under 
<local-dirs>/filecache/):

We are going to maintain a hierarchical directory structure inside the local 
directories for filecache, so the directory structure will look like this:

.../filecache/<default-~8192-files>
.../filecache/<36 directories (0-9 & a-z)>/<default-~8192-files>
.../filecache/<36 directories (0-9 & a-z)>/<36 directories (0-9 & a-z)>
.....................

So, in all, every directory will hold at most (8192-36) localized files plus 
36 sub directories named 0-9 and a-z. These sub directories are created only 
when they are required, never in advance. Every sub directory has the same 
structure.
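
One way to picture the naming scheme is to number the sub directories breadth 
first and convert that number into a path of single-character components. The 
sketch below is only an illustration under that assumption; the class 
HierarchicalPathSketch and the method relativePath are hypothetical names and 
the numbering order is not necessarily the one the patch uses.

```java
// Hypothetical helper, not taken from the patch: maps a sequential
// sub-directory number to a relative path built from the 36 single-character
// names 0-9 and a-z. Number 0 is the filecache root itself.
final class HierarchicalPathSketch {
  static final int DIRECTORIES_PER_LEVEL = 36;

  static String relativePath(int directoryNo) {
    if (directoryNo < 0) {
      throw new IllegalArgumentException("directoryNo must be >= 0");
    }
    StringBuilder sb = new StringBuilder();
    int n = directoryNo;
    // Bijective base-36 conversion: 1..36 are the first level ("0".."z"),
    // the next 36*36 numbers are the second level ("0/0".."z/z"), and so on.
    while (n > 0) {
      n--;
      char c = Character.forDigit(n % DIRECTORIES_PER_LEVEL,
          DIRECTORIES_PER_LEVEL);
      if (sb.length() > 0) {
        sb.insert(0, '/');
      }
      sb.insert(0, c);
      n /= DIRECTORIES_PER_LEVEL;
    }
    return sb.toString();
  }
}
```

With this numbering, relativePath(0) is the empty string (the root), 
relativePath(1) is "0", relativePath(36) is "z" and relativePath(37) is "0/0".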

To manage files and to limit the number of files per directory to 
HierarchicalDirectory#PER_DIR_FILE_LIMIT (in this case 8192), I am introducing 
the classes / implementation below.

* LocalResourcesTrackerImpl :-
** maintainHierarchicalDir  :- a boolean flag. It should be set when you want 
to use this resource tracker to track resources with hierarchical directory 
structure.
** directoryMap :- Map of <Path, HierarchicalDirectory>. It makes sure that we 
have one HierarchicalDirectory for every localPath. ( For example if we have 
two local-dirs configured then it will have 2 entries.)
** inProgressRsrcMap :- Map of <LocalResourceRequest, Path>. This is used while 
a local resource is being localized. This map helps in two ways:
*** If localization fails for that resource, we can retrieve the path and 
remove the file reservation (file count).
*** If a LocalResourceRequest comes in again for the same resource (which is 
highly unlikely with today's implementation), it can return the same path 
back.
** getPathForLocalResource :- This method should be called to retrieve the 
Hierarchical directory path for the local-dir identified by the localDirPath. 
Internally it adds this request and returned path to inProgressRsrcMap and 
makes a reservation into the HierarchicalDirectory tracking this local-dir-path.
** decFileCountForHierarchicalPath :- It retrieves the localizedPath from 
either inProgressRsrcMap or from LocalizedResource and then reduces file count 
for the HierarchicalDirectory tracking it.
** localizationCompleted :- (Parameter - success) If true then it will only 
update inProgressRsrcMap; otherwise it will update inProgressRsrcMap and will 
also call decFileCountForHierarchicalPath.
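
The reservation lifecycle described above can be sketched roughly as follows. 
This is a simplified illustration, not the code in the patch: requests and 
paths are plain Strings instead of LocalResourceRequest and Path, 
DirectoryCounter is a hypothetical stand-in for HierarchicalDirectory that 
always hands out the root, and only the in-progress case of 
decFileCountForHierarchicalPath is modelled.

```java
import java.util.HashMap;
import java.util.Map;

final class LocalResourcesTrackerSketch {

  /** Hypothetical stand-in for HierarchicalDirectory: it only counts. */
  static final class DirectoryCounter {
    private int reserved;
    String reserve() { reserved++; return ""; } // always the root directory
    void release(String relativePath) { if (reserved > 0) reserved--; }
    int count() { return reserved; }
  }

  /** Remembers which local-dir and relative path a request reserved. */
  static final class Reservation {
    final String localDirPath;
    final String relativePath;
    Reservation(String localDirPath, String relativePath) {
      this.localDirPath = localDirPath;
      this.relativePath = relativePath;
    }
    String fullPath() {
      return relativePath.isEmpty()
          ? localDirPath : localDirPath + "/" + relativePath;
    }
  }

  // One directory tracker per configured local-dir.
  private final Map<String, DirectoryCounter> directoryMap = new HashMap<>();
  // Requests whose localization has started but not yet finished.
  private final Map<String, Reservation> inProgressRsrcMap = new HashMap<>();

  synchronized String getPathForLocalResource(String request,
      String localDirPath) {
    Reservation r = inProgressRsrcMap.get(request);
    if (r == null) {
      // First request for this resource: reserve a slot and remember it.
      DirectoryCounter dir = directoryMap.computeIfAbsent(
          localDirPath, k -> new DirectoryCounter());
      r = new Reservation(localDirPath, dir.reserve());
      inProgressRsrcMap.put(request, r);
    }
    // A repeated request gets the same path without a second reservation.
    return r.fullPath();
  }

  synchronized void localizationCompleted(String request, boolean success) {
    Reservation r = inProgressRsrcMap.remove(request);
    if (r != null && !success) {
      // Failed localization: give the reserved slot back.
      directoryMap.get(r.localDirPath).release(r.relativePath);
    }
  }

  synchronized int reservedCount(String localDirPath) {
    DirectoryCounter dir = directoryMap.get(localDirPath);
    return dir == null ? 0 : dir.count();
  }
}
```

In this sketch a successful localization keeps its reservation, since the file 
now occupies a slot in the directory, while a failed one releases it.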

* HierarchicalDirectory :- It just helps in managing hierarchical directories.
** PER_DIR_FILE_LIMIT :- Limits the number of files per directory and per sub 
directory. It is configurable 
(YarnConfiguration.NM_LOCAL_CACHE_NUM_FILES_PER_DIRECTORY) but should not be 
set too low.
** DIRECTORIES_PER_LEVEL (constant 36) :- Every directory/sub-directory will 
have at most 36 sub directories (0-9 and a-z), created only if they are 
required. Single-character names are used because of the file path length 
limit on Windows.
** vacantSubDirectories :- Queue<HierarchicalSubDirectory>. At the beginning 
this holds the root of the HierarchicalDirectory as the only sub directory. If 
the queue becomes empty, a new sub directory is created, starting with 0. 
Note:- This only creates the internal tracking entry; it doesn't create an 
actual directory on the file system.
** knownSubDirectories :- Map of <String, HierarchicalSubDirectory>. The root 
directory is identified by an empty string "" and other sub directories by 
their relative paths. For example, directory 0 is keyed by "0" and 0/a by 
"0/a".
** getHierarchicalPath :- (synchronized) This method returns the relative path 
of a sub directory which still has room (has not reached its directory file 
limit). If no such sub directory is present then it will create one using 
totalSubDirectories.
** decFileCountForPath :- (synchronized) This method reduces the count for the 
HierarchicalSubDirectory representing the passed in relative path.
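
A minimal sketch of how these pieces could fit together is shown below. It is 
an illustration under stated assumptions, not the patch itself: the class and 
helper names (HierarchicalDirectorySketch, SubDirectory, relativePath) are 
hypothetical, sub directories are numbered breadth first, and each directory 
is assumed to accept (PER_DIR_FILE_LIMIT - 36) files so that room is left for 
the 36 possible sub directories.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

final class HierarchicalDirectorySketch {
  static final int DIRECTORIES_PER_LEVEL = 36;

  /** Tracking entry only; no directory is created on the file system. */
  static final class SubDirectory {
    final String relativePath;
    int fileCount;
    SubDirectory(String relativePath) { this.relativePath = relativePath; }
  }

  private final int maxFilesPerDir;
  private final Queue<SubDirectory> vacantSubDirectories = new ArrayDeque<>();
  private final Map<String, SubDirectory> knownSubDirectories = new HashMap<>();
  private int totalSubDirectories = 0; // sub directories created below root

  HierarchicalDirectorySketch(int perDirFileLimit) {
    if (perDirFileLimit <= DIRECTORIES_PER_LEVEL) {
      throw new IllegalArgumentException("limit must exceed 36");
    }
    this.maxFilesPerDir = perDirFileLimit - DIRECTORIES_PER_LEVEL;
    SubDirectory root = new SubDirectory(""); // root is keyed by ""
    knownSubDirectories.put(root.relativePath, root);
    vacantSubDirectories.add(root);
  }

  /** Reserves one file slot and returns the relative path that holds it. */
  synchronized String getHierarchicalPath() {
    SubDirectory dir = vacantSubDirectories.peek();
    if (dir == null) {
      // Every known sub directory is full: start tracking the next one.
      totalSubDirectories++;
      dir = new SubDirectory(relativePath(totalSubDirectories));
      knownSubDirectories.put(dir.relativePath, dir);
      vacantSubDirectories.add(dir);
    }
    dir.fileCount++;
    if (dir.fileCount >= maxFilesPerDir) {
      vacantSubDirectories.remove(); // dir is the head of the queue
    }
    return dir.relativePath;
  }

  /** Releases one file slot in the given sub directory. */
  synchronized void decFileCountForPath(String relativePath) {
    SubDirectory dir = knownSubDirectories.get(relativePath);
    if (dir == null || dir.fileCount == 0) {
      return;
    }
    boolean wasFull = dir.fileCount >= maxFilesPerDir;
    dir.fileCount--;
    if (wasFull) {
      vacantSubDirectories.add(dir); // it has room again
    }
  }

  // Assumed numbering: 1..36 -> "0".."z", 37 -> "0/0", and so on.
  private static String relativePath(int directoryNo) {
    StringBuilder sb = new StringBuilder();
    for (int n = directoryNo; n > 0; n /= DIRECTORIES_PER_LEVEL) {
      n--;
      if (sb.length() > 0) {
        sb.insert(0, '/');
      }
      sb.insert(0, Character.forDigit(n % DIRECTORIES_PER_LEVEL,
          DIRECTORIES_PER_LEVEL));
    }
    return sb.toString();
  }
}
```

Keeping only non-full sub directories in the queue means getHierarchicalPath 
never scans the whole tree, and a directory that frees a slot is simply 
appended to the queue again.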

                
> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-467
>                 URL: https://issues.apache.org/jira/browse/YARN-467
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.0.0-alpha
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>         Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path
>
>
> If we have multiple jobs which use the distributed cache with small files, 
> the directory limit is reached before the cache size limit, and creating any 
> further directories in the file cache (PUBLIC) fails. The jobs start failing 
> with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>       at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>       at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>       at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>       at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>       at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>       at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>       at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>       at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>       at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:662)
> We need a mechanism wherein we can create a directory hierarchy and limit 
> the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
