Re: Question on running simultaneous jobs

2008-01-10 Thread Doug Cutting
Aaron Kimball wrote: Multiple students should be able to submit jobs, and if one student's poorly-written task is grinding up a lot of cycles on a shared cluster, other students still need to be able to test their code in the meantime. I think a simple approach to address this is to limit the
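
The approach Doug sketches here, capping how many tasks a single job may run at once, reduces to a one-line scheduling test. A minimal illustration in Java; the class and method names are hypothetical and do not come from Hadoop's actual JobTracker:

    // Hypothetical sketch of a hard per-job task cap.
    public class CappedScheduler {
        private final int maxTasksPerJob;

        public CappedScheduler(int maxTasksPerJob) {
            this.maxTasksPerJob = maxTasksPerJob;
        }

        // A job may claim a free slot only while under its cap.
        public boolean maySchedule(int runningTasksForJob) {
            return runningTasksForJob < maxTasksPerJob;
        }
    }

Under such a cap, one runaway job can hold at most maxTasksPerJob slots, leaving the rest of the cluster for other students.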

Re: Question on running simultaneous jobs

2008-01-10 Thread Khalil Honsali
used by a job. From: Xavier Stevens [mailto:[EMAIL PROTECTED] Sent: Wed 1/9/2008 2:57 PM To: hadoop-user@lucene.apache.org Subject: RE: Question on running simultaneous jobs This doesn't work to solve this issue because it sets

RE: Question on running simultaneous jobs

2008-01-10 Thread Joydeep Sen Sarma
[mailto:[EMAIL PROTECTED] Sent: Thu 1/10/2008 9:50 AM To: hadoop-user@lucene.apache.org Subject: Re: Question on running simultaneous jobs Aaron Kimball wrote: Multiple students should be able to submit jobs and if one student's poorly-written task is grinding up a lot of cycles on a shared cluster

Re: Question on running simultaneous jobs

2008-01-10 Thread Doug Cutting
Joydeep Sen Sarma wrote: if the cluster is unused - why restrict parallelism? if someone's willing to wake up at 4am to beat the crowd - they would just absolutely hate this. [It would be better to make your comments in Jira.] But if someone starts a long-running job at night that uses the

RE: Question on running simultaneous jobs

2008-01-10 Thread Runping Qi
[EMAIL PROTECTED] Sent: Thursday, January 10, 2008 9:57 AM To: hadoop-user@lucene.apache.org; hadoop-user@lucene.apache.org Subject: RE: Question on running simultaneous jobs This may be simple - but is this the right solution? (and I have the same concern about HOD) if the cluster

Re: Question on running simultaneous jobs

2008-01-10 Thread Doug Cutting
Runping Qi wrote: An improvement over Doug's proposal is to make the limit soft in the following sense: 1. A job is entitled to run up to the limit number of tasks. 2. If there are free slots and no other job is waiting for its entitled slots, a job can run more tasks than the limit. 3. When a job
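
Runping's soft limit keeps the cap as an entitlement but lets a job borrow idle capacity. A sketch of the decision rule for the two conditions stated above, again with hypothetical names rather than Hadoop scheduler internals:

    // Hypothetical sketch of the "soft limit" rules quoted above.
    public class SoftLimitScheduler {
        private final int limit; // tasks each job is entitled to run

        public SoftLimitScheduler(int limit) {
            this.limit = limit;
        }

        // Rule 1: a job may always run up to its limit.
        // Rule 2: beyond the limit, it may take a free slot only while
        // no other job is still waiting for its entitled share.
        public boolean maySchedule(int runningTasks, int freeSlots,
                                   boolean otherJobUnderEntitlement) {
            if (runningTasks < limit) {
                return true;
            }
            return freeSlots > 0 && !otherJobUnderEntitlement;
        }
    }

The suspend-versus-kill exchange later in the thread is about what happens to tasks running beyond the limit when another job arrives to claim its entitlement.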

Re: Question on running simultaneous jobs

2008-01-10 Thread Arun C Murthy
On Thu, Jan 10, 2008 at 10:26:46AM -0800, Doug Cutting wrote: Joydeep Sen Sarma wrote: if the cluster is unused - why restrict parallelism? if someone's willing to wake up at 4am to beat the crowd - they would just absolutely hate this. [It would be better to make your comments in Jira.] But

Re: Question on running simultaneous jobs

2008-01-10 Thread Ted Dunning
Aaron Kimball wrote: Multiple students should be able to submit jobs, and if one student's poorly-written task is grinding up a lot of cycles on a shared cluster, other students still need to be able to test their code in the meantime. I think a simple

RE: Question on running simultaneous jobs

2008-01-10 Thread Joydeep Sen Sarma
@lucene.apache.org Subject: Re: Question on running simultaneous jobs Runping Qi wrote: An improvement over Doug's proposal is to make the limit soft in the following sense: 1. A job is entitled to run up to the limit number of tasks. 2. If there are free slots and no other job waits

Re: Question on running simultaneous jobs

2008-01-10 Thread Doug Cutting
Joydeep Sen Sarma wrote: can we suspend jobs (just Unix suspend) instead of killing them? We could, but they'd still consume RAM and disk. The RAM might eventually get paged out, but relying on that is probably a bad idea. So, this could work for tasks that don't use much memory and whose
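
For concreteness, "just Unix suspend" means sending the task's process SIGSTOP and later SIGCONT. A self-contained sketch, assuming a POSIX system, a known task pid, and shelling out to kill; nothing here is a Hadoop API:

    import java.io.IOException;

    // Illustrative only: suspend and resume an OS process by pid via
    // POSIX job-control signals. As noted above, a stopped task still
    // holds its RAM and disk until the OS pages the memory out.
    public class TaskSuspender {
        private static void signal(String sig, int pid)
                throws IOException, InterruptedException {
            new ProcessBuilder("kill", "-" + sig, Integer.toString(pid))
                    .start().waitFor();
        }

        public static void suspend(int pid)
                throws IOException, InterruptedException {
            signal("STOP", pid); // SIGSTOP cannot be caught or ignored
        }

        public static void resume(int pid)
                throws IOException, InterruptedException {
            signal("CONT", pid); // SIGCONT resumes the stopped process
        }
    }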

RE: Question on running simultaneous jobs

2008-01-10 Thread Joydeep Sen Sarma
are blessed to be in this state). From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Thu 1/10/2008 2:24 PM To: hadoop-user@lucene.apache.org Subject: Re: Question on running simultaneous jobs Joydeep Sen Sarma wrote: can we suspend jobs (just unix suspend

Re: Question on running simultaneous jobs

2008-01-09 Thread Michael Bieniosek
Hadoop-0.14 introduced job priorities (https://issues.apache.org/jira/browse/HADOOP-1433); you might be able to get somewhere with this. Another possibility is to create two mapreduce clusters on top of the same dfs cluster. The mapred.tasktracker.tasks.maximum doesn't do what you think --
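
Using the HADOOP-1433 priorities from client code looks roughly like the following; the job class is hypothetical, and this assumes the mapred.job.priority property (values VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HighPriorityJob { // hypothetical job class
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(HighPriorityJob.class);
            // ... set mapper, reducer, input and output paths here ...
            conf.set("mapred.job.priority", "HIGH"); // assumed property name
            JobClient.runJob(conf);
        }
    }

A higher-priority job's tasks should be offered free slots ahead of NORMAL jobs, which helps short test jobs make progress on a busy cluster.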

Re: Question on running simultaneous jobs

2008-01-09 Thread Ted Dunning
You may need to upgrade, but 0.15.1 does just fine with multiple jobs in the cluster. Use conf.setNumMapTasks(int) and conf.setNumReduceTasks(int). On 1/9/08 11:25 AM, Xavier Stevens [EMAIL PROTECTED] wrote: Does Hadoop support running simultaneous jobs? If so, what parameters do I need to
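
Ted's suggestion in context, as a minimal job setup (the job class name and the task counts are illustrative):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SmallFootprintJob { // hypothetical job class
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SmallFootprintJob.class);
            // ... set mapper, reducer, input and output paths here ...
            conf.setNumMapTasks(10);   // a hint only: the real map count
                                       // follows the InputFormat's splits
            conf.setNumReduceTasks(2); // honored exactly
            JobClient.runJob(conf);
        }
    }

That setNumMapTasks is only a hint is also why it interacts badly with large inputs, as the ArrayIndexOutOfBoundsException reported elsewhere in the thread suggests.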

RE: Question on running simultaneous jobs

2008-01-09 Thread Xavier Stevens
Sent: Wednesday, January 09, 2008 1:50 PM To: hadoop-user@lucene.apache.org Subject: Re: Question on running simultaneous jobs You may need to upgrade, but 0.15.1 does just fine with multiple jobs in the cluster. Use conf.setNumMapTasks(int) and conf.setNumReduceTasks(int). On 1/9/08 11:25 AM, Xavier

RE: Question on running simultaneous jobs

2008-01-09 Thread Joydeep Sen Sarma
Subject: RE: Question on running simultaneous jobs This doesn't work to solve this issue because it sets the total number of map/reduce tasks. When setting the total number of map tasks I get an ArrayIndexOutOfBoundsException within Hadoop; I believe because of the input dataset size (around 90

Re: Question on running simultaneous jobs

2008-01-09 Thread Aaron Kimball
the number of machines used by a job. From: Xavier Stevens [mailto:[EMAIL PROTECTED] Sent: Wed 1/9/2008 2:57 PM To: hadoop-user@lucene.apache.org Subject: RE: Question on running simultaneous jobs This doesn't work to solve this issue because it sets the total

Re: Question on running simultaneous jobs

2008-01-09 Thread Ted Dunning
To: hadoop-user@lucene.apache.org Subject: RE: Question on running simultaneous jobs This doesn't work to solve this issue because it sets the total number of map/reduce tasks. When setting the total number of map tasks I get an ArrayIndexOutOfBoundsException within Hadoop; I believe because

Re: Question on running simultaneous jobs

2008-01-09 Thread Jeff Hammerbacher
My understanding is that with HOD you can restrict the number of machines used by a job. From: Xavier Stevens [mailto:[EMAIL PROTECTED] Sent: Wed 1/9/2008 2:57 PM To: hadoop-user@lucene.apache.org Subject: RE: Question on running simultaneous jobs