Hi Sandy,

Thanks for your prompt reply!
The JIRA you pointed out would make it easy for us to do the automatic mapping and get closer to enforcing a policy automatically. Any idea when it will be incorporated into CDH/Hadoop releases, and whether it could be back-ported to cdh3u2, which we are currently running in production?

Currently we are getting around this by using -Dmapred.job.queue.name="X" and the subsequent mapping of the map-reduce job queue to a fair-share scheduler pool. We are using ACLs [more of a white-list], configured in mapred-queue-acls.xml, to ensure people can only submit to the right queue.

*Two limitations of this round-about approach are*

1. It is manual.
2. It exposes the policy, where user A is asked to submit jobs to queue X and user B is asked to submit jobs to queue Y [with different scheduler properties].

We want this to be completely transparent to the users of our cluster. The JIRA above would be a great first step towards such automatic mapping!

Cheers,
Sagar

On Wed, Apr 24, 2013 at 11:41 PM, Sandy Ryza <[email protected]> wrote:

> Hi Sagar,
>
> This capability currently does not exist in the fair scheduler (or other
> schedulers, as far as I know), but a JIRA has been filed recently that
> addresses a similar need. Would
> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
> trying to do? If not, would you mind filing a new JIRA for the
> functionality you'd want?
>
> -Sandy
>
>
> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <[email protected]> wrote:
>
>> Hi Guys,
>>
>> We have a general purpose Hive cluster [about 200 nodes] which is used
>> for various jobs like
>>
>> - Production
>> - Experimental/Research
>> - Adhoc queries
>>
>> We are using the fair-share scheduler to schedule them and for this we
>> have corresponding 3 pools in the scheduler.
>>
>> *Here is what we want.*
>>
>> *A hive query submitted by a user with user-name A should go to one of
>> the pools above based on a pre-defined mapping. We are wondering where/how
>> to specify this mapping?*
>>
>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>> particular job run.*
>>
>> This puts the job on the map-reduce queue named "X" and the following
>> configuration in the fair-share scheduler
>>
>> <property>
>> <name>mapred.fairscheduler.poolnameproperty</name>
>> <value>mapred.job.queue.name</value>
>> </property>
>>
>> maps this to a pool named "X" in the fair-share scheduler.
>>
>> However we [while wearing our Hadoop developer/admin hat] don't want the
>> user/analyst to specify that so as to enforce some cluster-use policy.
>>
>> Based on his/her username we want to automatically select which hadoop
>> queue and subsequently which fair-share scheduler pool, his/her job should
>> go to. I'm pretty sure this is a common use-case and wondering how to do
>> this in Hadoop.
>>
>> Any help/insights/pointers would be greatly appreciated.
>>
>> Sagar
>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>
>
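
For illustration only, the username-to-queue mapping discussed in this thread could be approximated outside Hadoop with a small wrapper around the Hive command line. This is a rough sketch under stated assumptions, not the mechanism proposed in MAPREDUCE-5132: the user names, queue names, and the helper functions `queue_for_user` and `build_hive_command` are all hypothetical, and it assumes the installed Hive CLI accepts `-hiveconf` and `-e`. It only sets `mapred.job.queue.name`, which the `mapred.fairscheduler.poolnameproperty` setting quoted above then maps to a pool.

```python
import getpass

# Hypothetical user-to-queue mapping; these names are illustrative only
# and do not come from the thread.
USER_TO_QUEUE = {
    "etl_prod": "production",
    "alice": "research",
}

# Hypothetical fallback queue for users with no explicit entry.
DEFAULT_QUEUE = "adhoc"


def queue_for_user(user):
    """Return the queue name (and hence the scheduler pool) for a user."""
    return USER_TO_QUEUE.get(user, DEFAULT_QUEUE)


def build_hive_command(query, user=None):
    """Build a hive CLI argument list that pins the job queue for the user.

    Assumes the Hive CLI supports -hiveconf and -e; check this against
    the installed version before relying on it.
    """
    if user is None:
        user = getpass.getuser()
    queue = queue_for_user(user)
    return [
        "hive",
        "-hiveconf", "mapred.job.queue.name=" + queue,
        "-e", query,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no
    # dependency on a Hadoop or Hive installation.
    print(" ".join(build_hive_command("SELECT 1", user="alice")))
```

A wrapper like this hides the queue choice from the analyst but does not enforce it, since a user can still call hive directly; the queue ACLs in mapred-queue-acls.xml would still be needed for enforcement.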
