Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Sagar Mehta Fri, 26 Apr 2013 10:36:08 -0700

Hi Nitin,

Thanks for your reply.


Yes, this is exactly what we are doing: we ask each user to modify their
.hiverc, and we use ACLs [white-lists] configured in mapred-queue-acls.xml
to ensure people cannot submit to the wrong queues.
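
For illustration, here is a minimal sketch of how such white-lists could be
generated from a single table rather than edited by hand. The queue and user
names are hypothetical. The property name follows the
mapred.queue.<queue-name>.acl-submit-job convention of mapred-queue-acls.xml,
and the ACLs are only checked when mapred.acls.enabled is set to true in
mapred-site.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical queue -> allowed users table; all names are illustrative only.
QUEUE_USERS = {
    "production": ["etl"],
    "research": ["alice", "bob"],
    "adhoc": ["carol"],
}


def build_queue_acls(queue_users):
    """Return a <configuration> element with one submit ACL per queue."""
    root = ET.Element("configuration")
    for queue, users in sorted(queue_users.items()):
        prop = ET.SubElement(root, "property")
        name = "mapred.queue.%s.acl-submit-job" % queue
        ET.SubElement(prop, "name").text = name
        # ACL value: comma-separated users (a group list could follow a space).
        ET.SubElement(prop, "value").text = ",".join(users)
    return root


def render(queue_users):
    """Serialize the generated ACL configuration to an XML string."""
    return ET.tostring(build_queue_acls(queue_users), encoding="unicode")
```

Calling render(QUEUE_USERS) gives the body of a mapred-queue-acls.xml that
allows only the listed users to submit to each queue.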

As I said in one of the other threads, besides being a manual approach, it
also exposes the policy where user A is asked to modify his/her .hiverc to
submit jobs to queue X and user B is asked to modify his/her .hiverc to
submit jobs to queue Y potentially with different scheduling properties. We
want this to be more or less transparent to the user.
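
Until the scheduler can do this itself, one stopgap is a site-wide wrapper
around the hive command that picks the queue from the caller's username. The
sketch below is only an illustration: the mapping, the fallback queue, and the
function names are all hypothetical. It relies on the Hive CLI's --hiveconf
option to set mapred.job.queue.name at startup.

```python
import getpass
import os
import sys

# Hypothetical user -> queue policy, maintained by the cluster admins.
USER_QUEUE = {"alice": "research", "etl": "production"}
DEFAULT_QUEUE = "adhoc"  # assumed fallback for users not listed above


def queue_for(user, mapping=USER_QUEUE, default=DEFAULT_QUEUE):
    """Look up the queue for a user, falling back to the default queue."""
    return mapping.get(user, default)


def hive_command(user, args):
    """Build the hive invocation with the queue pinned for this user."""
    queue = queue_for(user)
    return ["hive", "--hiveconf", "mapred.job.queue.name=%s" % queue] + list(args)


def main():
    """Replace the current process with hive, passing through all arguments."""
    cmd = hive_command(getpass.getuser(), sys.argv[1:])
    os.execvp(cmd[0], cmd)
```

A wrapper script installed ahead of the real hive binary on the PATH would
simply call main(). A user can still override the property inside a session,
so the queue ACLs remain the actual enforcement; the wrapper only removes the
need for per-user .hiverc edits.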

We have a decent-sized cluster [200 nodes] with more than 30 different
users.

I think the JIRA that Sandy pointed out below is a good first step in that
direction.

Sagar

On Thu, Apr 25, 2013 at 3:04 AM, Nitin Pawar <[email protected]> wrote:

> The current capacity scheduler controls which users can submit jobs to
> which queue, along with other related features.
> More of which you can read at
> http://hadoop.apache.org/docs/stable/capacity_scheduler.html
>
> But on the Hive side, unless you set mapred.job.queue.name in the Hive
> CLI, jobs will be submitted to the default job queue.
>
> So basically what you would like to do is create a user, associate it
> with a queue in the scheduler, and ask the user to set that queue in
> their local .hiverc file.
>
> I am not sure this can be part of Hive's metastore, because one user
> can be allowed to submit jobs to multiple queues. The best way to handle
> that is to set the property each time you open a session, or via the
> .hiverc file.
>
>
> On Thu, Apr 25, 2013 at 12:11 PM, Sandy Ryza <[email protected]>wrote:
>
>> Hi Sagar,
>>
>> This capability currently does not exist in the fair scheduler (or other
>> schedulers, as far as I know), but a JIRA has been filed recently that
>> addresses a similar need.   Would
>> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what
>> you're trying to do?  If not, would you mind filing a new JIRA for the
>> functionality you'd want?
>>
>> -Sandy
>>
>>
>> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <[email protected]>wrote:
>>
>>> Hi Guys,
>>>
>>> We have a general purpose Hive cluster [about 200 nodes] which is used
>>> for various jobs like
>>>
>>>    - Production
>>>    - Experimental/Research
>>>    - Adhoc queries
>>>
>>> We are using the fair-share scheduler to schedule them and for this we
>>> have corresponding 3 pools in the scheduler.
>>>
>>> *Here is what we want.*
>>>
>>> *A hive query submitted by a user with user-name A should go to one of
>>> the pools above based on a pre-defined mapping. We are wondering where/how
>>> to specify this mapping?*
>>>
>>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>>> particular job run.*
>>>
>>> This puts the job on the map-reduce queue named "X" and the following
>>> configuration in the fair-share scheduler
>>>
>>>   <property>
>>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>>     <value>mapred.job.queue.name</value>
>>>   </property>
>>>
>>> maps this to a pool named "X" in the fair-share scheduler.
>>>
>>> However we [while wearing our Hadoop developer/admin hat] don't want the
>>> user/analyst to specify that so as to enforce some cluster-use policy.
>>>
>>> Based on his/her username we want to automatically select which hadoop
>>> queue and subsequently which fair-share scheduler pool, his/her job should
>>> go to. I'm pretty sure this is a common use-case and wondering how to do
>>> this in Hadoop.
>>>
>>> Any help/insights/pointers would be greatly appreciated.
>>>
>>> Sagar
>>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>>
>>>
>>>
>>>
>>
>
>
> --
> Nitin Pawar
>
