Hi Guys,
We have a general-purpose Hive cluster (about 200 nodes) that is used for
various kinds of jobs:
- Production
- Experimental/Research
- Ad hoc queries
We use the fair-share scheduler to schedule them, with three corresponding
pools defined in the scheduler, one per job type.
*Here is what we want:*
*A Hive query submitted by a user with username A should go to one of the
pools above based on a predefined mapping. Where and how can we specify
this mapping?*
*We can do this manually by adding -Dmapred.job.queue.name="X" to a
particular job run.*
This puts the job on the MapReduce queue named "X", and the following
fair-share scheduler configuration
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>mapred.job.queue.name</value>
</property>
maps that queue name to a pool named "X" in the fair-share scheduler.
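For context, the pools themselves are declared in the fair scheduler's
allocation file. Below is a minimal sketch of what such a file could look like
for three pools. The pool names and all numbers are placeholders made up for
illustration, not our actual settings.

```xml
<?xml version="1.0"?>
<!-- Illustrative allocation file; pool names and values are hypothetical -->
<allocations>
  <!-- Pool for production jobs: guaranteed minimum slots, higher weight -->
  <pool name="production">
    <minMaps>100</minMaps>
    <minReduces>50</minReduces>
    <weight>2.0</weight>
  </pool>
  <!-- Pool for experimental/research jobs -->
  <pool name="research">
    <weight>1.0</weight>
  </pool>
  <!-- Pool for ad hoc queries: cap the number of concurrently running jobs -->
  <pool name="adhoc">
    <maxRunningJobs>10</maxRunningJobs>
    <weight>1.0</weight>
  </pool>
</allocations>
```

With the poolnameproperty setting above, a job submitted with
mapred.job.queue.name set to "production" would land in the "production" pool
of this sketch.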
However, we (wearing our Hadoop developer/admin hats) don't want the
user/analyst to specify the queue, so that we can enforce a cluster-use
policy. Based on the username, we want to automatically select the Hadoop
queue, and therefore the fair-share scheduler pool, that the user's job
goes to. I'm pretty sure this is a common use case, and I'm wondering how
to do this in Hadoop.
Any help/insights/pointers would be greatly appreciated.
Sagar
PS: We are using Cloudera CDH3u2, and the user jobs are Hive queries.