Re: [gridengine users] Jobs and reservation

2011-10-04 Thread William Hay
2011/10/4 Carlos Fernández Iglesias :
> Hello,
>
> Is there a way to associate a job to a reservation so it would only
> execute when the reservation starts and in the node the reservation is made?
>
> Thanks.
The -ar flag to qsub does this, I believe.
William

Re: [gridengine users] huge array job - queues in "E" mode

2011-10-04 Thread William Hay
On 4 October 2011 15:40, Schmidt U. wrote: > Dear all, > sometimes I have trouble with array jobs. > e.g. #$ -t 1-5000 > Then it happens, that some jobs are submitted and some are rejected back > to "qw". In that case the "touched" queues are set into "E" mode. > I have a cron job to "qmod -cq "*"

Re: [gridengine users] cannot run in PE "mpich" because it only offers 0 slots

2011-10-06 Thread William Hay
On 6 October 2011 09:39, wzlu wrote: > Dear All, > > There are 144 nodes in my queue and I configured 1 slot for each node. That > is 144 nodes with 144 slots. > The PE is used 121 slots now. One job need 12 PE's slots and there are > enough nodes and slots for this job. > But it queued by "cannot

Re: [gridengine users] changing your own job's order

2011-10-07 Thread William Hay
On 7 October 2011 09:45, Balint Takacs wrote: > Can I somehow change the *relative* priority of my own jobs? > I am working in a company environment where lots of people are competing for > grid resources, and jobs usually have to queue. Some of my jobs are more > important than others, but they so

Re: [gridengine users] Parallel GE jobs on 48-way nodes

2011-10-11 Thread William Hay
On 11 October 2011 12:55, Reuti wrote: > On 10.10.2011 at 20:46, Gerald Ragghianti wrote: > >> We have a cluster consisting of 48-core compute nodes where we need to run >> parallel (MPI) jobs across nodes.  There is a hardware limitation on the QDR >> Infiniband cards that limits the available

Re: [gridengine users] Parallel GE jobs on 48-way nodes

2011-10-12 Thread William Hay
On 11 October 2011 23:33, Gerald Ragghianti wrote: > >> Like the OP mentioned, one could use a consumable complex for 6.1. If you >> add "complex_values network=16" to the queue, and "load_thresholds >> network=15" it will be pushed to alarm state automatically and you can avoid >> the load sen

[gridengine users] Order of prolog start_proc_args

2011-10-18 Thread William Hay
Grid engine allows you to define: a)prolog in the grid engine config (qconf -sconf) b)prolog in the queue definition (qconf -sq) c)start_proc_args in a pe definition. Is the order in which these are run defined anywhere? Likewise for epilog and stop_proc_args. I'm hoping I can avoid having to re

Re: [gridengine users] Order of prolog start_proc_args

2011-10-18 Thread William Hay
On 18 October 2011 10:58, Reuti wrote: > On 18.10.2011 at 11:42, William Hay wrote: > >> Grid engine allows you to define: >> a)prolog in the grid engine config (qconf -sconf) >> b)prolog in the queue definition (qconf -sq) >> c)start_proc_args in a pe definition.

[gridengine users] Gaussian Linda & SGE

2011-10-21 Thread William Hay
We've purchased a license for Gaussian with support for parallelism via Linda. A quick google doesn't show up any tight integrations for Linda/SGE. i)Does anyone have a working tight integration config for Linda? Or even a Gaussian specific one? ii)If not does anyone have experience of running Li

Re: [gridengine users] Two clusters, one gridengine to rule them all?

2011-11-04 Thread William Hay
On 4 November 2011 07:24, Johan Finstadsveen wrote: > Hi, > Unsure whether this is the correct forum for this debate. > > We are currently in the process of acquiring a gpu-cluster. From before we > have a cpu-based cluster running Rocks 5.3 and SGE. The desire from the > users is to have three di

Re: [gridengine users] Beware Univa FUD

2011-11-10 Thread William Hay
On 10 November 2011 03:46, Ron Chen wrote: > > 4) Fritz was telling customers (including William Hay) that open source Grid > Engine is "buggy, unstable, hard to debug", and to use SGE in production > customers need to buy support from Univa. I should point out this wa

Re: [gridengine users] Announcing Grid Engine 2011.11 and the Scalable Grid Engine Support Program

2011-11-10 Thread William Hay
On 9 November 2011 16:22, Rayson Ho wrote: > The Open Grid Scheduler Project is releasing a new release: Grid > Engine 2011.11. We are going back to the open source model that was So the software is still called Grid Engine even though the project is Open Grid Scheduler? > used by Sun Microsystem

[gridengine users] qrsh wrappers

2011-11-11 Thread William Hay
Looking at the various rsh impersonating qrsh wrappers provided with SGE I notice that the difference between the mpi/rsh and the mpi/openmpi/rsh wrapper is that the openmpi variant uses the -V option to pass all environment variables through to the slave processes. What if anything is the downside

[gridengine users] Intel MPI tight integration?

2011-11-14 Thread William Hay
Are there any guides to doing a tight integration between SGE and Intel MPI? There is a guide to loose integration on the Intel website with a comment suggesting that the mpich2_mpd integration on the sunsource site should work (presumably the same as the mpd section at http://arc.liv.ac.uk/SGE/ho

Re: [gridengine users] Intel MPI tight integration?

2011-11-14 Thread William Hay
On 14 November 2011 12:50, Reuti wrote: > Hi, > > On 14.11.2011 at 13:41, William Hay wrote: > >> Are there any guides to doing a tight integration between SGE and >> Intel MPI?  There is a guide to loose integration on the Intel website >> with a comment

Re: [gridengine users] Intel MPI tight integration?

2011-11-14 Thread William Hay
On 14 November 2011 14:28, Reuti wrote: > On 14.11.2011 at 15:24, William Hay wrote: > >> On 14 November 2011 12:50, Reuti wrote: >>> Hi, >>> >>> On 14.11.2011 at 13:41, William Hay wrote: >>> >>>> Are there any guides to doing a t

Re: [gridengine users] execution daemon on host * didn't accept task

2011-11-16 Thread William Hay
On 16 November 2011 03:29, Vang Le wrote: > Hello GridUsers, > My grid is running, it can deliver jobs, but they only run on one nodes at a > time. > When I tried running with mpirun in a batch script, i get errors like > "execution daemon on host   didn't accept task" as shown at the > bottom

Re: [gridengine users] Beware Univa FUD

2011-11-16 Thread William Hay
On 16 November 2011 00:10, Dave Love wrote: > William Hay writes: > >> On 10 November 2011 03:46, Ron Chen wrote: >> >>> >>> 4) Fritz was telling customers (including William Hay) that open source >>> Grid Engine is "buggy, unstable, hard to d

Re: [gridengine users] Beware Univa FUD

2011-11-16 Thread William Hay
On 16 November 2011 09:38, Reuti wrote: > While I on my own use SGE on all machines I set up, we have access to a > cluster using Torque and I noticed something similar. Besides that we need a > tight integration of parallel jobs using the Linda library (i.e. Gaussian), > and as there is nothi

[gridengine users] Useful side effect of profiling. Any harmful ones?

2011-11-16 Thread William Hay
I added PROFILE=1 to the params of sched_conf in order to measure where the scheduler was spending its time. I discovered a fortuitous side effect is that any "job BLAH should have finished since" lines appear between the PROF: sge_mirror and PROF: static urgency lines. We have a script that pro

Re: [gridengine users] Beware Univa FUD

2011-11-16 Thread William Hay
On 16 November 2011 09:38, Reuti wrote: > On 16.11.2011 at 10:24, William Hay wrote: > >> On 16 November 2011 00:10, Dave Love wrote: >>> William Hay writes: >>> >>>> On 10 November 2011 03:46, Ron Chen wrote: >>>> >>>>>

Re: [gridengine users] Beware Univa FUD

2011-11-16 Thread William Hay
On 16 November 2011 11:51, Reuti wrote: > On 16.11.2011 at 12:45, William Hay wrote: > >>> >>> While I on my own use SGE on all machines I set up, we have access to a >>> cluster using Torque and I noticed something similar. Besides that we need >>>

Re: [gridengine users] execution daemon on host * didn't accept task

2011-11-16 Thread William Hay
On 16 November 2011 13:52, Vang Le wrote: > Hi William and Reuti, > Thank you for your suggestions and your time. They are really helpful. I > solved almost of my problems. > > I installed rsh-redone-client and rsh-redone-server, also I modify my PE so > that "control_slaves TRUE" is set. I can ru

Re: [gridengine users] execution daemon on host * didn't accept task

2011-11-16 Thread William Hay
On 16 November 2011 13:52, Vang Le wrote: > I googled and there was something mentioned about editing /etc/hosts.equiv > file to permit rsh and rlogin without password. However, typing "qconf > -mconf" at the management host, I saw this: > > rlogin_daemon/usr/sbin/sshd -i > r

Re: [gridengine users] execution daemon on host * didn't accept task

2011-11-16 Thread William Hay
On 16 November 2011 13:52, Vang Le wrote: > Hi William and Reuti, > Thank you for your suggestions and your time. They are really helpful. I > solved almost of my problems. > > I installed rsh-redone-client and rsh-redone-server, also I modify my PE so > that "control_slaves TRUE" is set. I can ru

Re: [gridengine users] robustness/scalability

2011-11-17 Thread William Hay
On 16 November 2011 23:44, Dave Love wrote: > William Hay writes: >> The main issue we >> currently have with SGE is the time a scheduling cycle takes.  We're >> currently trying to tweak the configuration to minimise the work SGE >> has to do while still impl

Re: [gridengine users] qrsh wrappers

2011-11-18 Thread William Hay
On 17 November 2011 10:27, Reuti wrote: > The wrappers are no longer used in case you use a recent version of Open MPI > (compiled --with-sge) or MPICH2. Both call directly `qrsh -inherit -V ...` if > they discover that they are running under SGE and entries in > start-/stop_proc_args can be s

Re: [gridengine users] cannot run in PE ... because it only offers 0 slots

2011-11-18 Thread William Hay
On 18 November 2011 14:21, Gerard Henry wrote: > hello all, > > i got trouble to configure a queue on SGE 6.2u5 (linux) > > I have two machines amd64, with this topology: SCCSCC so the total of > cores is 8. > > first, i defined a group: > # qconf -shgrp @qlong > group_name @qlong > hostlist charyb

Re: [gridengine users] define PE on special hosts

2011-11-19 Thread William Hay
On 19 November 2011 04:53, mahbube rustaee wrote: > Hi, > I defined a queue on @node-grp (a group of nodes). > I defined mpi2 parallel environment as: > start_proc_args    /opt/gridengine/mpi/startmpi. > sh $pe_hostfile > stop_proc_args /opt/gridengine/mpi/stopmpi.sh > allocation_rule    2 > c

Re: [gridengine users] define h_vmem via RQS

2011-11-19 Thread William Hay
On 19 November 2011 05:03, mahbube rustaee wrote: > Hi, > > I define slots of all hosts with: > { >    name limit-slots-of-hosts >    description  limits slots of clusters's hosts >    enabled  TRUE >    limit    hosts {@gpu} to slots=48 >    limit    hosts {@xeon} to slots=24

Re: [gridengine users] define PE on special hosts

2011-11-19 Thread William Hay
On 19 November 2011 09:58, mahbube rustaee wrote: > > > On Sat, Nov 19, 2011 at 12:05 PM, William Hay wrote: >> >> On 19 November 2011 04:53, mahbube rustaee wrote: >> > Hi, >> > I defined a queue on @node-grp (a group of nodes). >> > I defined mp

Re: [gridengine users] SGE (univa 8.0.1) - anyone running SGE with Centrify active directory integration?

2011-11-23 Thread William Hay
On 22 November 2011 20:05, Chris Dagdigian wrote: > > Hi folks, > > I'm hands-on with a shiny new cluster running Univa's 8.0.1 release and > am having some issues running jobs as a non-root user via an account > that lives in Active Directory. > > The cluster is the standard sort of RHEL 5.7 base

[gridengine users] CFX tight integration

2011-11-24 Thread William Hay
Are there any instructions for getting CFX working under tight integration? It appears to work OK loosely integrated but it doesn't appear to work under our existing integrations. If we use an rsh resembling wrapper around qrsh I get the following output: + cfx5solve -max-elapsed-time '14 [min]'

Re: [gridengine users] CFX tight integration

2011-11-24 Thread William Hay
On 24 November 2011 12:59, Reuti wrote: > On 24.11.2011 at 12:51, William Hay wrote: > >> Are there any instructions for getting CFX working under tight >> integration?  It appears to work OK loosely integrated but it doesn't >> appear to work under our existing in

Re: [gridengine users] CFX tight integration

2011-11-27 Thread William Hay
B_ID and TASK_ID to a file just before invoking qrsh and all seems sensible). William > > Brian > > -Original Message- > From: wish.dum...@gmail.com [mailto:wish.dum...@gmail.com] On Behalf Of > William Hay > Sent: Saturday, November 26, 2011 2:21 AM > To: Murphy, B

[gridengine users] PE range

2011-12-06 Thread William Hay
128 > 2147483648 A user has submitted a job requesting 128 slots: qstat -j producing the following output: parallel environment: qlc-[1ABCDEFGHIJTWKLMNOPX] range: 128 qalter -w v produces the following: Job 404311 cannot run in PE "qlc-H" because it only offers 2147483648 slots Job 404311 cannot

Re: [gridengine users] PE range

2011-12-06 Thread William Hay
On 6 December 2011 09:48, Reuti wrote: > Hi, > > On 06.12.2011 at 10:04, William Hay wrote: > >> 128 > 2147483648 >> >> A user has submitted a job requesting 128 slots: >> qstat -j producing the following output: >> parallel environment:  qlc-[1A

Re: [gridengine users] qacct and exclusive host

2011-12-06 Thread William Hay
On 6 December 2011 10:21, Reuti wrote: > Hi, > > On 04.12.2011 at 11:57, mahbube rustaee wrote: > >> I defined an exclusive  tag in complex resources for users that can request >> "-l excl=1 " >> Such users, lock free slots of hosts  that have added excl=true to >> consumable resources. >> >> 1

Re: [gridengine users] qacct and exclusive host

2011-12-06 Thread William Hay
On 6 December 2011 10:21, Reuti wrote: > Hi, > > On 04.12.2011 at 11:57, mahbube rustaee wrote: > >> I defined an exclusive  tag in complex resources for users that can request >> "-l excl=1 " >> Such users, lock free slots of hosts  that have added excl=true to >> consumable resources. >> >> 1

Re: [gridengine users] PE range

2011-12-06 Thread William Hay
On 6 December 2011 13:10, Reuti wrote: > On 06.12.2011 at 12:16, William Hay wrote: > >> On 6 December 2011 09:48, Reuti wrote: >>> Hi, >>> >>> On 06.12.2011 at 10:04, William Hay wrote: >>> >>>> 128 > 2147483648 >>

Re: [gridengine users] PE range

2011-12-06 Thread William Hay
On 6 December 2011 16:03, Reuti wrote: > On 06.12.2011 at 17:01, William Hay wrote: > >> On 6 December 2011 13:10, Reuti wrote: >>> On 06.12.2011 at 12:16, William Hay wrote: >>> >>>> On 6 December 2011 09:48, Reuti wrote: >>>>> H

Re: [gridengine users] PE range

2011-12-07 Thread William Hay
On 6 December 2011 16:44, Reuti wrote: > On 06.12.2011 at 17:32, William Hay wrote: > >> On 6 December 2011 16:03, Reuti wrote: >>> On 06.12.2011 at 17:01, William Hay wrote: >>> >>>> On 6 December 2011 13:10, Reuti wrote: >>>>> Am 0

[gridengine users] Spontaneously unheld job

2011-12-07 Thread William Hay
One of our users is complaining that jobs they have put on hold with qalter -h u are becoming unheld without intervention from them. Is there any practical way to investigate this? AFAICS Grid Engine doesn't provide logging of hold and release events. I would expect it to be enabled by the joblo

Re: [gridengine users] PE range

2011-12-07 Thread William Hay
On 7 December 2011 11:24, Reuti wrote: > On 07.12.2011 at 09:33, William Hay wrote: > >> On 6 December 2011 16:44, Reuti wrote: >>> On 06.12.2011 at 17:32, William Hay wrote: >>> >>>> On 6 December 2011 16:03, Reuti wrote: >>>>> Am 0

Re: [gridengine users] PE range

2011-12-07 Thread William Hay
On 7 December 2011 11:24, Reuti wrote: > Can you try to create a copy of the job with `qresub` and change for the copy > the resource requests like a time limit. Any change? Well here I think I've found where my little trick with the JSV does bite me. If I qresub this job then it is resubmitte

Re: [gridengine users] no record accounting after qdel

2011-12-13 Thread William Hay
On 10 December 2011 09:01, mahbube rustaee wrote: > Hi all, > > some slicker  users  submit jobs and save needed output , then run qdel and > delete the job on sge. > there isn't any record accounting for such jobs whereas user has saved > outputs. > > Any suggestion to prevent mentioned scenario?

Re: [gridengine users] round robin PE config

2011-12-13 Thread William Hay
On 13 December 2011 15:25, Lars van der bijl wrote: > Hey everyone, > > we have been running our sge for a while now but we implemented a new > technique and I'm having trouble figuring out how to make the grid > help with it. > > I have the following task / dependency structure. > > task1 > > tas

Re: [gridengine users] round robin PE config

2011-12-13 Thread William Hay
Possibly assigning a fair share to each job with -js would cause them to change priority between scheduling runs so different jobs would snaffle the reservations on each run. On 13 December 2011 15:57, Lars van der bijl wrote: > hey Reuti, > > I wrote a python api using networkx and a database la

Re: [gridengine users] redirect prolog output

2011-12-14 Thread William Hay
On 14 December 2011 07:50, mahbube rustaee wrote: > Hi , > > 1) By default prolog output  is output of user's job. > How  can I set other path/filename for prolog output? Well, since your prolog appears to be a shell script, add a line at the top of the script: exec >/location/of/prolog/output > > 2) I set
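The exec redirection suggested in this reply can be sketched roughly as below. This is only an illustration, not the poster's prolog: the variable PROLOG_LOG, its default path and the function name run_prolog are made up for the example, and the body is wrapped in a function and subshell purely so it can be tried outside Grid Engine. In a real prolog the exec line would simply sit at the top of the script.

```shell
#!/bin/sh
# Sketch: send everything a prolog prints to its own file rather than the job's output.
# PROLOG_LOG is a hypothetical variable; the default path is a placeholder.
PROLOG_LOG="${PROLOG_LOG:-/tmp/prolog.$$.log}"

run_prolog() {
    # The subshell keeps the redirection local to the prolog body.
    (
        # After this line, stdout and stderr of the subshell go to the log file.
        exec >>"$PROLOG_LOG" 2>&1
        echo "prolog running on $(uname -n)"
    )
}
```

Calling run_prolog prints nothing to the caller's stdout; the message is appended to the file named by PROLOG_LOG.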

Re: [gridengine users] Access complex resources from prolog script

2011-12-14 Thread William Hay
On 13 December 2011 19:11, Christoph Müller wrote: > Hi Reuti, > >> -Original Message- >> From: Reuti [mailto:re...@staff.uni-marburg.de] >> Sent: Tuesday, 13 December 2011 19:20 >> To: Christoph Müller >> Cc: users@gridengine.org >> Subject: Re: AW: AW: [gridengine users] Acce

Re: [gridengine users] Access complex resources from prolog script

2011-12-14 Thread William Hay
On 14 December 2011 08:53, Christoph Müller wrote: > Hi William, > >> -Original Message- >> From: wish.dum...@gmail.com [mailto:wish.dum...@gmail.com] On Behalf >> Of William Hay >> Sent: Wednesday, 14 December 2011 09:47 >> To:

Re: [gridengine users] Managing Load Average

2011-12-14 Thread William Hay
On 13 December 2011 23:46, Gowtham wrote: > > In some of our Rocks 5.4.2 clusters running SGE > 6.2u5, I have been noticing the load average on > several compute nodes being significantly higher > than others when all cores/processors in all > compute nodes involved are doing about the same > amou

Re: [gridengine users] Access complex resources from prolog script

2011-12-14 Thread William Hay
On 14 December 2011 10:06, Christoph Müller wrote: > Hi William, > >> -Original Message- >> From: wish.dum...@gmail.com [mailto:wish.dum...@gmail.com] On Behalf >> Of William Hay >> Sent: Wednesday, 14 December 2011 10:09 >> To:

[gridengine users] Schedule time stamp

2011-12-16 Thread William Hay
The schedule file contains lots of information on jobs. I believe that the 4th field for a RUNNING job is the start time of the job (in seconds since the epoch). Can someone (or better yet some docs) confirm this and if so is it guaranteed to match up with the start time for the head node in the a
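For anyone wanting to pull that field out, a rough sketch follows. It assumes the schedule file records are colon-separated with the job id, task id and state in the first three fields and the start time fourth, which matches the belief stated in the post but is not confirmed by it; the function name running_job_starts is made up for the example.

```shell
#!/bin/sh
# Sketch: print "job.task start_time" for every RUNNING record in a schedule file.
# Assumption: records look like job:task:STATE:start:... (colon-separated).
running_job_starts() {
    # $1 is the path to the schedule file; $4 is taken to be seconds since the epoch.
    awk -F: '$3 == "RUNNING" { print $1 "." $2, $4 }' "$1"
}
```

The printed timestamp is left as raw epoch seconds so it can be compared directly with whatever start time the accounting data reports.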

Re: [gridengine users] Schedule time stamp

2011-12-16 Thread William Hay
On 16 December 2011 14:21, Reuti wrote: > On 16.12.2011 at 14:08, William Hay wrote: > >> The schedule file contains lots of information on jobs.  I believe >> that the 4th field for a RUNNING job is the start time of the job (in >> seconds since the epoch). > > I

Re: [gridengine users] old sun wiki

2011-12-22 Thread William Hay
On 21 December 2011 20:53, Rayson Ho wrote: > Hi Dave, > > Is the original wiki really under a free license? I could not find > references that explicitly give anyone the permission to use it. > I don't know but a colleague of mine noticed the disappearance as well. We found a copy of the page we

Re: [gridengine users] old sun wiki

2011-12-22 Thread William Hay
On 22 December 2011 14:04, Dave Love wrote: > William Hay writes: > >> I don't know but a colleague of mine noticed the disappearance as >> well.  We found a copy of the page we were looking for in the wayback >> machine so Dave isn't the only one to copy it.

[gridengine users] Intel MPI and hydra

2012-01-10 Thread William Hay
According to: http://arc.liv.ac.uk/pipermail/gridengine-users/2010-December/033190.html The mpiexec.hydra provided with Intel MPI doesn't tightly integrate with Grid Engine and one therefore has to use MPD (which is a pain). However the MPICH2 FAQ claims(http://wiki.mcs.anl.gov/mpich2/index.php/Fr

Re: [gridengine users] Intel MPI and hydra

2012-01-10 Thread William Hay
On 10 January 2012 15:40, Reuti wrote: > On 10.01.2012 at 16:03, William Hay wrote: > >> According to: >> http://arc.liv.ac.uk/pipermail/gridengine-users/2010-December/033190.html >> The mpiexec.hydra provided with Intel MPI doesn't tightly integrate >> with G

Re: [gridengine users] How setup queue priority?

2012-01-12 Thread William Hay
On 12 January 2012 11:41, Semi wrote: > I need to setup high and low priority queues for the same nodes. > I preferred to make it without subordinate lists. > I know, that the following parameters are dealing with this: > seq_no                10 The seq_no is used to determine which queue a j

Re: [gridengine users] how to change queue of a pending job ?

2012-01-27 Thread William Hay
On 27 January 2012 08:03, Gerard Henry wrote:
> hello all,
> sorry if this is a trivial question, but i don't find how to change a
> pending job from one queue to other queue:
> # status -a
qalter -q big should do it
> ...
>  queue   used   free
> --
> CLUSTER     0      

Re: [gridengine users] Implementing a credit system in a cluster

2012-01-27 Thread William Hay
On 27 January 2012 14:01, Martin Gumbau wrote: > Hi, > > I don't known if it is possible and the best way to make it (if was > possible): > > SCENARIO: > > - 2 Cells (cell-A and cell-B) > > - Cell-A have 3 queues (q1-A, q2-A,q3-A) > > - Cell-B have 2 queues (q1-B, q2-B) > > ONLY IN CELL cell-A > >

Re: [gridengine users] SIGKILL as last resort terminate_method

2012-01-31 Thread William Hay
On 31 January 2012 08:09, Anton Löfgren wrote: > Any hints or insight on how this works by default would be much appreciated. sudo? > > Regards, > Anton > ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users

Re: [gridengine users] SIGKILL as last resort terminate_method

2012-01-31 Thread William Hay
On 31 January 2012 08:55, Anton Löfgren wrote: >  Really? Or is that speculation? > > -Original Message- > From: wish.dum...@gmail.com [mailto:wish.dum...@gmail.com] On Behalf Of > William Hay > Sent: den 31 januari 2012 09:45 > To: Anton Löfgren > Cc: users@grid

Re: [gridengine users] NIS service doesn't work

2012-02-01 Thread William Hay
On 1 February 2012 09:49, Harris He, Kun - CD wrote: > Hi all, > > > > When I add a new user by NIS, I can access every servers by this user > account. But I cannot submit any job including interactive job at all. > System just return back “critical error: can't resolve group”. > > The NIS passwd

Re: [gridengine users] Simplifying Parallel Environments

2012-02-02 Thread William Hay
On 1 February 2012 19:42, Brian Smith wrote: > I've started a github page for some tools I've put together from various > bits of code, how-tos, etc. to simplify the setup of parallel > environments so that they work universally for all MPI implementations > (on x86_64 Linux) w/ tight-integration

Re: [gridengine users] submitting a job that submits jobs

2012-02-02 Thread William Hay
On 2 February 2012 15:33, Robert Hutton wrote: > Hi Everyone, > > I've just set up a small Grid Engine cluster, but I'm new to using Grid > Engine, and would like some advice on the best way to submit jobs that > in turn submit jobs.  What I'd like to do is: > > Run a regular shell script that loo

[gridengine users] Spurious queue membership

2012-02-06 Thread William Hay
We discovered a host where the infiniband connection was playing up. Our normal procedure for this is to remove the host from the hostgroups it is normally in and add it to a hostgroup associated with queues that only accept single node jobs (ie serial jobs and PEs with an allocation method of $pe_

Re: [gridengine users] Spurious queue membership

2012-02-06 Thread William Hay
On 6 February 2012 22:13, Reuti wrote: > On 06.02.2012 at 10:55, William Hay wrote: > >> We discovered a host where the infiniband connection was playing up. >> Our normal procedure for this is to remove the host from the >> hostgroups it is normally in and add it to a h

Re: [gridengine users] Another MATLAB + SGE Question

2012-02-08 Thread William Hay
On 8 February 2012 15:37, Prentice Bisbal wrote: > So I finally have MATLAB set up and working fine with SGE. I can submit > parallel and distributed jobs from MATLAB to SGE, and then SGE does its > thing. > I have one remaining problem, and I thought I'd ask here first before > talking to Mathwor

Re: [gridengine users] Advance reservations and disabled nodes/queues

2012-02-09 Thread William Hay
On 9 February 2012 16:28, Sabine Kreidl wrote: > Hi, > > I'm experiencing (SGE 8.0.0a) that disabled nodes (we have only one > queue) are added to advance reservations. As those nodes are usually > disabled for a reason (defect hard drive, etc...) this is not what I > would want to happen (especia

Re: [gridengine users] job cannot run in parallel environment "smp" because it only offers 2 slots

2012-02-22 Thread William Hay
On 21 February 2012 19:20, Txema Heredia Genestar wrote: > Hello all, > > I am having some problems to run threaded jobs in SGE 6.1u4. In our > cluster, h_vmem is defined as a consumable attribute in all nodes. It is > mandatory, all jobs must request it, with a default value of 6Gb. That > constr

Re: [gridengine users] job cannot run in parallel environment "smp" because it only offers 2 slots

2012-02-22 Thread William Hay
On 22 February 2012 08:21, Hay, William wrote: > On 21 February 2012 19:20, Txema Heredia Genestar > wrote: >> Hello all, >> >> I am having some problems to run threaded jobs in SGE 6.1u4. In our >> cluster, h_vmem is defined as a consumable attribute in all nodes. It is >> mandatory, all jobs m

[gridengine users] Determining why a job is in hqw

2012-02-22 Thread William Hay
We just got bit by https://arc.liv.ac.uk/trac/SGE/ticket/802 and it took me a lot longer to figure it out than it should have in part because there does not appear to be any indication when a job has an array dependency on another job (at least in 6.2u3 which we're using). All holds and dependencies

Re: [gridengine users] Determining why a job is in hqw

2012-02-23 Thread William Hay
On 22 February 2012 19:04, Dave Love wrote: > On Wed, 22 Feb 2012 14:39:00 + > William Hay wrote: > >> We just got bit by https://arc.liv.ac.uk/trac/SGE/ticket/802 and it > > Could you attach a script to submit dummy jobs that reproduce it? > I don't know for a f

Re: [gridengine users] Tricky consumables problem

2012-02-23 Thread William Hay
On 23 February 2012 00:36, Maes, Richard wrote: > Reuti, > For the example below where you spec which PE to instantiate. >> $ qsub -pe ixia* 1 job.sh > > Can this accept something other than wildcards?  Is there a way to make > it do REGEX?  Or ranges? > For a case where I have Ixia1, Ixia2, and

Re: [gridengine users] job cannot run in parallel environment "smp" because it only offers 2 slots

2012-02-23 Thread William Hay
On 22 February 2012 17:57, Txema Heredia Genestar wrote: > William - Yours is my best bet. Long time ago I tried tinkering with the > "slots" attribute, but never thought about adding this threaded one. I > only see one (minor) flaw in your solution: I cannot ask for an interval > of threads (fro

Re: [gridengine users] Tricky consumables problem

2012-02-23 Thread William Hay
On 23 February 2012 09:31, Reuti wrote: > On 23.02.2012 at 10:01, William Hay wrote: > >> On 23 February 2012 00:36, Maes, Richard wrote: >>> Reuti, >>> For the example below where you spec which PE to instantiate. >>>> $ qsub -pe ixia* 1 job.sh

Re: [gridengine users] Tricky consumables problem

2012-02-23 Thread William Hay
On 23 February 2012 11:56, Reuti wrote: >>> >> That the pe is interpreted as a full pattern (per sge_types) which can >> be set to ixia[12] >> from the server side JSV is the undocumented part.  Sorry if I was unclear. > > Argh, this was an extension beyond 6.2u5 and the pe_name can be any > obj

Re: [gridengine users] Tricky consumables problem

2012-02-23 Thread William Hay
On 23 February 2012 13:49, Reuti wrote: > On 23.02.2012 at 14:32, William Hay wrote: > >> On 23 February 2012 11:56, Reuti wrote: >> >>>>> >>>> That the pe is interpreted as a full pattern (per sge_types) which can >>>> be set to ixia[1

Re: [gridengine users] Restricting queue to certain job types

2012-02-24 Thread William Hay
On 24 February 2012 14:58, Reuti wrote: ? > > Default values are only for consumables. True but you can get a similar effect by putting a default request in $SGE_ROOT/$SGE_CELL/sge_request. William

Re: [gridengine users] Restricting queue to certain job types

2012-02-24 Thread William Hay
On 24 February 2012 15:11, William Hay wrote: > On 24 February 2012 14:58, Reuti wrote: >  ? >> >> Default values are only for consumables. > True but you can get a similar effect by putting a default request in > $SGE_ROOT/$SGE_CELL/sge_request. Sorry, meant S$

Re: [gridengine users] Restricting queue to certain job types

2012-02-24 Thread William Hay
On 24 February 2012 15:27, Reuti wrote: > On 24.02.2012 at 16:12, William Hay wrote: > >> On 24 February 2012 15:11, William Hay wrote: >>> On 24 February 2012 14:58, Reuti wrote: >>>  ? >>>> >>>> Default values are only for consumables.

Re: [gridengine users] Restricting queue to certain job types

2012-02-24 Thread William Hay
On 24 February 2012 15:53, Reuti wrote: > On 24.02.2012 at 16:40, William Hay wrote: > >> On 24 February 2012 15:27, Reuti wrote: >>> On 24.02.2012 at 16:12, William Hay wrote: >>> >>>> On 24 February 2012 15:11, William Hay wrote: >

Re: [gridengine users] check resource request for some users

2012-02-26 Thread William Hay
On 26 February 2012 09:13, mahbube rustaee wrote: > Hi all, > > How can prevent some users to use some resource with -l option ? JSV is best > way for that ? > > Thx If it is a consumable you could try configuring a quota of 0 for them. William
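A quota of 0 along those lines might look roughly like the resource quota set below. This is a sketch only: the rule name, the user names alice and bob, and the complex name my_consumable are placeholders, and the layout simply follows the usual resource quota set format (it would typically be added with something like qconf -arqs).

```
{
   name         deny_resource_example
   description  "Hypothetical rule: listed users get none of a consumable"
   enabled      TRUE
   limit        users {alice,bob} to my_consumable=0
}
```

With the limit at 0, a request such as -l my_consumable=1 from either user can never be granted, so the job would stay pending instead of being rejected at submission; a JSV would be needed to refuse it outright.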

Re: [gridengine users] check resource request for some users

2012-02-27 Thread William Hay
On 26 February 2012 12:41, Reuti wrote: > On 26.02.2012 at 12:32, William Hay wrote: > >> On 26 February 2012 09:13, mahbube rustaee wrote: >>> Hi all, >>> >>> How can prevent some users to use some resource with -l option ? JSV is best >>> way

Re: [gridengine users] how to put a sleep before every job?

2012-02-28 Thread William Hay
On 28 February 2012 11:02, Stefano Bridi wrote: > Hi list, I have a problem on a SGE setup where the home directory are > shared through glusterfs and some job failed to start because of a > latency on the filesystem propagation between the login node and the > compute node. > What happen is that a

Re: [gridengine users] how to put a sleep before every job?

2012-02-28 Thread William Hay
On 28 February 2012 11:02, Stefano Bridi wrote: > Hi list, I have a problem on a SGE setup where the home directory are > shared through glusterfs and some job failed to start because of a > latency on the filesystem propagation between the login node and the > compute node. > What happen is that a

Re: [gridengine users] Restricting / controlling the access to $TMPDIR

2012-02-29 Thread William Hay
On 29 February 2012 17:47, Reuti wrote: > Hi, > > Am 29.02.2012 um 18:07 schrieb Txema Heredia Genestar: > >> I want to control the usage of the local disk of our execution nodes. As far >> as I have found, the only related option offered by SGE is the h_fsize >> limit. But that will not work be

Re: [gridengine users] mem_free being treated as "per core" on certain queues

2012-02-29 Thread William Hay
On 29 February 2012 22:14, Joe Whitney wrote: > Hello, > > I am having a simple problem where the behaviour of mem_free resource > requests are being treated differently on two different queues (actually, > separate installations of SGE). > > For context, the hosts servicing queue.A have 32G/4core

Re: [gridengine users] mem_free being treated as "per core" on certain queues

2012-03-01 Thread William Hay
On 1 March 2012 07:41, Rayson Ho wrote: > On Thu, Mar 1, 2012 at 2:03 AM, William Hay wrote: >> It is possible that it is also consumable in installation/queue A but per >> JOB. > > Joe, > > As pointed out by William, that can likely be the root of the issue - >

[gridengine users] Job overconsuming slots

2012-03-07 Thread William Hay
We have multiple queue instances on each node each with slots equal to the number of cpus. To prevent oversubscription I added a slots consumable to each host restricting it to a number of slots equal to the cpus on the node. This has worked up to now but this morning there are a couple of jobs th

Re: [gridengine users] Job overconsuming slots

2012-03-07 Thread William Hay
On 7 March 2012 09:47, William Hay wrote: > We have multiple queue instances on each node each with slots equal to > the number of cpus.  To prevent oversubscription I added a slots > consumable to each host restricting it to a number of slots equal to > the cpus on the node. > Thi

Re: [gridengine users] Job overconsuming slots

2012-03-07 Thread William Hay
on't have the same issue. Is there any reason to believe the RQS solution will be more reliable than the host consumable solution (which has worked pretty well up to now)? > Regards, > On Wed, Mar 7, 2012 at 11:00 AM, William Hay wrote: >> >> On 7 March 2012 09:47, William

Re: [gridengine users] Job overconsuming slots

2012-03-07 Thread William Hay
On 7 March 2012 11:36, Reuti wrote: > Am 07.03.2012 um 11:18 schrieb William Hay: > >> On 7 March 2012 10:11, Mazouzi wrote: >>> I remember Reuti  proposed a solution using RQS: >>> >>>  { >>>    name         noverload >>>    descript

Re: [gridengine users] h_vmem consumable and mmap() files

2012-03-09 Thread William Hay
On 9 March 2012 14:50, Stuart Barkley wrote: > We are running into an awkward problem with one of our users jobs. > > We have h_vmem set as a consumable resource and have it set to the > physical memory (minus a small amount) on the systems.  These are > diskless systems and have no swap defined.

Re: [gridengine users] CPU time limit exceeded

2012-03-13 Thread William Hay
On 13 March 2012 09:59, Lars van der bijl wrote: > Hey everyone, > > Where having the following problem. > > randomly on some task we start getting "CPU time limit exceeded". we > don't specify a time limit. we do specify h_vmem. > this only happens on some tasks and not other. even between same t

[gridengine users] SGE config for a host

2012-03-13 Thread William Hay
We recently made a host-specific sge_conf change (alternate prolog). This didn't propagate out until we soft-stopped and then restarted the execd, even though we left it for a day. Is this normal, or should we be concerned? William

[gridengine users] Posix priority increase for part running array job

2012-03-15 Thread William Hay
A colleague of mine increased the POSIX priority of an array job, part of which was running. The running parts have increased ppri, npprior and prior, but qstat of the queued portion shows only ppri and npprior increased while prior remains as it was. In practice new tasks from the array job seem to

Re: [gridengine users] Posix priority increase for part running array job

2012-03-15 Thread William Hay
On 15 March 2012 11:23, Hay, William wrote: > A colleague of mine increased the posix priority of an array job part > of which was running.  The running parts have increased ppri,npprior > and prior but qstat of the queued portion shows only > ppri and npprior increased while prior remains as it w
