By the way, it seems this ended up as a hard-coded environment variable name, "LOCAL_DIRS", rather than ApplicationConstants.LOCAL_DIR_ENV, which we can't find defined anywhere.

John
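For reference, here is a minimal sketch of reading that directory list in plain Java (stdlib only). It assumes the variable is named "LOCAL_DIRS", as observed above, and that it holds comma-separated paths. The class and method names are hypothetical. parseDirList stands in for Hadoop's StringUtils.getTrimmedStrings rather than calling it. The "." fallback relies on the per-container work directory described later in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: read the container's local-dir list from the environment.
 * Assumption: the variable is named "LOCAL_DIRS" (the name observed in this
 * thread) and holds a comma-separated list of paths.
 */
public class LocalDirs {
    // Hard-coded name taken from the thread, not from a Hadoop constant.
    static final String LOCAL_DIRS_ENV = "LOCAL_DIRS";

    /** Splits a comma-separated value, trimming entries and dropping empty ones. */
    static List<String> parseDirList(String value) {
        List<String> dirs = new ArrayList<>();
        if (value == null) {
            return dirs; // variable not set
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    /** Returns the container's local dirs, or the current work dir if none are set. */
    static List<String> containerLocalDirs() {
        List<String> dirs = parseDirList(System.getenv(LOCAL_DIRS_ENV));
        if (dirs.isEmpty()) {
            dirs.add("."); // per-container work dir, cleaned up when the container exits
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(containerLocalDirs());
    }
}
```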
-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Monday, October 21, 2013 12:11 PM
To: <user@hadoop.apache.org>
Subject: Re: temporary file locations for YARN applications

The dirs in that env-var are app-specific and are for the app's user to utilize. You shouldn't have any permission issues working within them.

The LocalDirAllocator is still somewhat MR-bound, but you can still make it work by giving it a config with the values it needs.

On Mon, Oct 21, 2013 at 8:49 PM, John Lilley <john.lil...@redpoint.net> wrote:
> Thanks again. This gives me a lot of options; we will see what works.
>
> Do you know if there are any permissions issues if we directly access the
> folders of LOCAL_DIR_ENV?
>
> Regarding LocalDirAllocator, I see its constructor: LocalDirAllocator(String
> contextCfgItemName) and a note mentioning that an example of this item is
> "mapred.local.dir". Is that the correct usage, or is there something
> YARN-generic?
>
> Cheers,
> john
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Sunday, October 20, 2013 11:58 PM
> To: <user@hadoop.apache.org>
> Subject: Re: temporary file locations for YARN applications
>
> Hi,
>
> MR does use multiple disks when spilling. But the work directory is also
> round-robined to spread I/O.
>
> YARN sets an environment property that's a list (comma-separated values)
> of directories (ApplicationConstants.LOCAL_DIR_ENV) that your app container
> can use together. Perhaps read it in with
> StringUtils.getTrimmedStrings(System.getenv(ApplicationConstants.LOCAL_DIR_ENV));
> and then round robin internally over those paths (with free space handling)?
>
> Perhaps you can even reuse the org.apache.hadoop.fs.LocalDirAllocator
> class, which is what MR uses. It hasn't been declared publicly stable,
> but we can do that via a JIRA.
>
> On Mon, Oct 21, 2013 at 2:05 AM, John Lilley <john.lil...@redpoint.net> wrote:
>> Harsh, thanks for the quick response.
>> These files don't need to be on the DFS (although we use that too). These
>> are local files used during sorting, joining, and transitive closure.
>>
>> The task-relative folder might be good enough, but our app *can* make use of
>> multiple temp folders if they are available. Our YARN app can be fairly I/O
>> intensive; is it possible to allocate more than one temp folder on different
>> physical devices?
>>
>> Or perhaps YARN might help us. Will YARN assign tasks to CWD folders on
>> different disks so that they do not compete with each other for I/O?
>>
>> For that matter, where does MR allocate the temporary files generated by
>> Mapper output? Presumably MR has the same I/O parallelism requirements that
>> we do.
>>
>> Thanks
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com]
>> Sent: Sunday, October 20, 2013 10:49 AM
>> To: <user@hadoop.apache.org>
>> Subject: Re: temporary file locations for YARN applications
>>
>> Every container gets its own local work directory (you can use the relative
>> path ./) that's automatically cleaned up at the end of the container's life.
>> This is the best place to store the temporary files. This is not something
>> you need custom configuration for.
>>
>> Do the files need to be on a distributed FS or a local one?
>>
>> On Sun, Oct 20, 2013 at 8:54 PM, John Lilley <john.lil...@redpoint.net>
>> wrote:
>>> We have a pure YARN application (no MapReduce) that needs to
>>> store a significant amount of temporary data. How can we know the
>>> best location for these files? How can we ensure that our YARN
>>> tasks have write access to these locations? Is this something that must be
>>> configured outside of YARN?
>>> Thanks,
>>> John
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J

--
Harsh J
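A minimal sketch of the "round robin internally over those paths (with free space handling)" suggestion, in plain Java so it runs without Hadoop on the classpath. RoundRobinDirChooser and its methods are hypothetical names. This is not how LocalDirAllocator is implemented; it only illustrates the same idea. The directory list is assumed to come from the container's local-dir environment variable.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Hadoop's LocalDirAllocator): spreads temp files
 * across several local dirs in round-robin order, skipping dirs that are
 * missing or lack enough usable space.
 */
public class RoundRobinDirChooser {
    private final List<File> dirs = new ArrayList<>();
    private int next = 0; // index of the dir to try first on the next call

    public RoundRobinDirChooser(List<String> paths) {
        for (String p : paths) {
            dirs.add(new File(p));
        }
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no local directories given");
        }
    }

    /**
     * Returns the next dir, in round-robin order, that exists and has at least
     * bytesNeeded usable bytes; throws IOException if none qualifies.
     */
    public synchronized File chooseDir(long bytesNeeded) throws IOException {
        for (int i = 0; i < dirs.size(); i++) {
            File candidate = dirs.get((next + i) % dirs.size());
            if (candidate.isDirectory() && candidate.getUsableSpace() >= bytesNeeded) {
                next = (next + i + 1) % dirs.size(); // start after this dir next time
                return candidate;
            }
        }
        throw new IOException("no local dir has " + bytesNeeded + " usable bytes");
    }

    /** Creates a temp file (e.g. a sort spill) in the next suitable dir. */
    public File createTempFile(String prefix, long bytesNeeded) throws IOException {
        return File.createTempFile(prefix, ".tmp", chooseDir(bytesNeeded));
    }
}
```

Usage, assuming dirs holds the parsed directory list: File spill = new RoundRobinDirChooser(dirs).createTempFile("sort-spill-", 64L << 20); Note that getUsableSpace() is only a point-in-time hint; another process can fill the disk between the check and the write, so writes should still handle out-of-space errors.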