Thanks everyone. Seems like I hit a dead end. It's kind of funny when I read that JIRA: run it 4 times and everything will work... where does that magic number come from? lol
Respects,

On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <[email protected]> wrote:
> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>
> is the bug that Robin is referring to.
>
> --
> Arpit Gupta
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J." <[email protected]> wrote:
>
> This is similar to issues I ran into with permissions/ownership of
> mapred.system.dir when using the fair scheduler. We are instructed to set
> the ownership of mapred.system.dir to mapred:hadoop, and then when the job
> tracker starts up (running as user mapred) it explicitly sets the
> permissions on this directory to 700. Meanwhile, when I go to run a job as
> a regular user, it tries to write into mapred.system.dir but it
> can't, due to the ownership/permissions that have been established.
>
> Per discussion with Arpit Gupta, this is a bug with the fair scheduler, and
> it appears from your experience that there are similar issues with
> hadoop.tmp.dir. The whole idea of the fair scheduler is to run jobs under
> the user's identity rather than as user mapred. This is good from a
> security perspective, yet it seems no one bothered to account for this in
> terms of the permissions that need to be set in the various directories to
> enable it.
>
> Until this is sorted out by the Hadoop developers, I've put my attempts to
> use the fair scheduler on hold.
>
> Regards,
> Robin Goldstone, LLNL
>
> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <[email protected]> wrote:
>
> Hi Harsh,
> Thanks for breaking it down clearly. I would say I am 98% successful
> with the instructions.
> The 2% is about hadoop.tmp.dir.
>
> Let's say I have 2 users:
> userA is the user that starts HDFS and MapReduce
> userB is a regular user
>
> If I use the default value of hadoop.tmp.dir, /tmp/hadoop-${user.name},
> I can submit a job as userA but not as userB:
> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
> :userA:supergroup:drwxr-xr-x
>
> I googled around; someone recommended changing hadoop.tmp.dir to
> /tmp/hadoop. That way it almost works; the thing is,
> if I submit as userA, it will create /tmp/hadoop on the local machine,
> owned by userA:userA,
> and once I try to submit a job from the same machine as userB, I get
> "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
> Permission denied"
> (because /tmp/hadoop is owned by userA:userA). Vice versa, if I delete
> /tmp/hadoop and let the directory be created by userB, userA will not
> be able to submit jobs.
>
> Which is the right approach I should work with?
> Please suggest.
>
> Patai
>
> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <[email protected]> wrote:
> > Hi Patai,
> >
> > Reply inline.
> >
> > On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
> > <[email protected]> wrote:
> > > Thanks for the input,
> > > I am reading the document; I forgot to mention that I am on CDH3u4.
> >
> > That version should have the support for all of this.
> >
> > > > If you point your poolname property to mapred.job.queue.name, then you
> > > > can leverage the per-queue ACLs
> > >
> > > Does that mean that if I plan on 3 pools in the fair scheduler, I have to
> > > configure 3 queues of the capacity scheduler, in order to have each pool
> > > leverage the per-queue ACL of each queue?
> >
> > Queues are not hard-tied into CapacityScheduler. You can have generic
> > queues in MR, and FairScheduler can bind its Pool concept into the
> > Queue configuration.
> >
> > All you need to do is the following:
> >
> > 1. Map the FairScheduler pool name to reuse queue names itself:
> >
> > mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
> >
> > 2. Define your required queues:
> >
> > mapred.job.queues set to "default,foo,bar", for example, for 3 queues:
> > default, foo and bar.
> >
> > 3. Define submit ACLs for each queue:
> >
> > mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
> > (usernames groupnames)
> >
> > mapred.queue.foo.acl-submit-job set to "spam eggs"
> >
> > Likewise for the remaining queues, as you need them.
> >
> > 4. Enable ACLs and restart the JT:
> >
> > mapred.acls.enabled set to "true"
> >
> > 5. Users then use the right API to set queue names before submitting
> > jobs, or use -Dmapred.job.queue.name=value via the CLI (if using Tool):
> >
> > http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
> >
> > 6. Done.
> >
> > Let us know if this works!
> >
> > --
> > Harsh J
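For anyone following along, Harsh's steps 1-4 would look roughly like the sketch below in mapred-site.xml. This is for MR1/CDH3-era property names, and the queue names and ACL entries are just the examples from the thread, not recommendations; note that in stock MR1 the queue-list property is mapred.queue.names (step 2 above calls it mapred.job.queues), so check the defaults for your exact release.

```xml
<!-- Sketch of steps 1-4 above (mapred-site.xml, MR1 / CDH3-era names).
     Queue names, usernames, and group names are the thread's examples. -->

<!-- 1. FairScheduler derives its pool name from the job's queue name -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>mapred.job.queue.name</value>
</property>

<!-- 2. Define the queues (mapred.queue.names in stock MR1) -->
<property>
  <name>mapred.queue.names</name>
  <value>default,foo,bar</value>
</property>

<!-- 3. Per-queue submit ACLs: "usernames groupnames" -->
<property>
  <name>mapred.queue.default.acl-submit-job</name>
  <value>patai,foobar users,adm</value>
</property>
<property>
  <name>mapred.queue.foo.acl-submit-job</name>
  <value>spam eggs</value>
</property>

<!-- 4. Enable ACL checking (restart the JobTracker afterwards) -->
<property>
  <name>mapred.acls.enabled</name>
  <value>true</value>
</property>
```

A job would then pick a queue either in code via JobConf.setQueueName("foo"), or on the command line (if the driver uses Tool/ToolRunner) with something like: hadoop jar myjob.jar MyDriver -Dmapred.job.queue.name=foo in out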
