Hi Nitin,

Thanks for your reply.

Yes, this is exactly what we are doing: we ask each user to modify his/her .hiverc, and then use ACLs [white-lists] configured in mapred-queue-acls.xml to ensure people don't submit [or are not allowed to submit] to the wrong queues.

As I said in one of the other threads, besides being a manual approach, it also exposes the policy: user A is asked to modify his/her .hiverc to submit jobs to queue X, and user B is asked to modify his/her .hiverc to submit jobs to queue Y, potentially with different scheduling properties. We want this to be more or less transparent to the user. We have a decent-sized cluster [200 nodes] with more than 30 different users.

I think the JIRA that Sandy pointed out below is a good first step in that direction.

Sagar

On Thu, Apr 25, 2013 at 3:04 AM, Nitin Pawar <[email protected]> wrote:

> the current capacity scheduler guarantees that which users can submit jobs
> to which queue and other related features.
> More of which you can read at
> http://hadoop.apache.org/docs/stable/capacity_scheduler.html
>
> but on the hive side, unless you set mapred.job.queue.name on the hive
> cli, they will be submitted to default job queue.
>
> So basically what you would like to do is create user, associate it with a
> queue on scheduler and ask the user to modify its queue on local hiverc
> file.
>
> I am not sure if this can be part of hive's metastore. Because one user
> can be allowed to submit the job to multiple queues and then best way to
> handle it is via setting the property each time you open the session or via
> hiverc file
>
>
> On Thu, Apr 25, 2013 at 12:11 PM, Sandy Ryza <[email protected]> wrote:
>
>> Hi Sagar,
>>
>> This capability currently does not exist in the fair scheduler (or other
>> schedulers, as far as I know), but a JIRA has been filed recently that
>> addresses a similar need. Would
>> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what
>> you're trying to do? If not, would you mind filing a new JIRA for the
>> functionality you'd want?
>>
>> -Sandy
>>
>>
>> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <[email protected]> wrote:
>>
>>> Hi Guys,
>>>
>>> We have a general purpose Hive cluster [about 200 nodes] which is used
>>> for various jobs like
>>>
>>> - Production
>>> - Experimental/Research
>>> - Adhoc queries
>>>
>>> We are using the fair-share scheduler to schedule them and for this we
>>> have corresponding 3 pools in the scheduler.
>>>
>>> *Here is what we want.*
>>>
>>> *A hive query submitted by a user with user-name A should go to one of
>>> the pools above based on a pre-defined mapping. We are wondering where/how
>>> to specify this mapping?*
>>>
>>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>>> particular job run.*
>>>
>>> This puts the job on the map-reduce queue named "X" and the following
>>> configuration in the fair-share scheduler
>>>
>>> <property>
>>>   <name>mapred.fairscheduler.poolnameproperty</name>
>>>   <value>mapred.job.queue.name</value>
>>> </property>
>>>
>>> maps this to a pool named "X" in the fair-share scheduler.
>>>
>>> However we [while wearing our Hadoop developer/admin hat] don't want the
>>> user/analyst to specify that so as to enforce some cluster-use policy.
>>>
>>> Based on his/her username we want to automatically select which hadoop
>>> queue and subsequently which fair-share scheduler pool, his/her job should
>>> go to. I'm pretty sure this is a common use-case and wondering how to do
>>> this in Hadoop.
>>>
>>> Any help/insights/pointers would be greatly appreciated.
>>>
>>> Sagar
>>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>>
>>
>
> --
> Nitin Pawar
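
For illustration only, here is a minimal sketch of how the manual per-user .hiverc step discussed in this thread could be scripted on the admin side. It assumes a hypothetical username-to-queue table: the user names, queue names and helper functions are made up for the example. The only pieces taken from the thread are the mapred.job.queue.name property and the .hiverc file; the `set key=value;` line is the usual Hive CLI way of setting a property. It does not make the policy transparent to the user; it only automates writing the line each user is currently asked to add by hand.

```python
# Hypothetical username -> queue table (illustrative names, not from the thread).
USER_TO_QUEUE = {
    "alice": "production",
    "bob": "research",
}

# Hypothetical fallback queue for users without an explicit entry.
DEFAULT_QUEUE = "adhoc"


def queue_for(user):
    """Return the queue assigned to `user`, or the fallback queue."""
    return USER_TO_QUEUE.get(user, DEFAULT_QUEUE)


def hiverc_line(user):
    """Build the .hiverc line that points the user's Hive jobs at their queue."""
    return "set mapred.job.queue.name={};".format(queue_for(user))


if __name__ == "__main__":
    # Print the line an admin would place in each user's .hiverc.
    for name in ("alice", "bob", "carol"):
        print(name, "->", hiverc_line(name))
```

With mapred.fairscheduler.poolnameproperty set to mapred.job.queue.name as in the quoted configuration, the queue name written here would also be the fair-share scheduler pool name.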
