Re: [gridengine users] Strategies for batch deleting jobs....

2016-08-12 Thread Reuti
.. > > in reality these jobs # are not sequential. > > my problem/question is how do I cancel/qdel all of Team B's jobs easily > It doesn't appear you can delete by job name You can, even with wildcards: $ qdel "*TeamB*" The explanation of "wc_job_range_l
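
A minimal sketch of batch deletion by pattern or by owner (the name fragment and the user below are placeholders; qdel accepts wildcards in job names as well as a user list):

  $ qdel "*TeamB*"     # delete all jobs whose name matches the wildcard
  $ qdel -u someuser   # delete all jobs owned by the given user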

Re: [gridengine users] jobs allocate cores on node but do nothing

2016-08-12 Thread Reuti
uld be a `mpihello.c` where you put an almost endless loop inside and switch off all optimizations during compilation to check whether these slave processes are distributed in the correct way. Note that some applications will check the number of cores they are running on and start by OpenMP (not

Re: [gridengine users] How can i make gridengine not to use ssh?

2016-08-11 Thread Reuti
> Am 11.08.2016 um 12:14 schrieb Ulrich Hiller <hil...@mpia-hd.mpg.de>: > > Dear Reuti and Hugh, > > thank you for your quick replies. > > I have changed control_slaves to TRUE and job_is_first_task to FALSE, > but it did not help. > > on my side: >

Re: [gridengine users] How can i make gridengine not to use ssh?

2016-08-11 Thread Reuti
el job may be delayed by one to two minutes (until they face a timeout). -- Reuti > Am 10.08.2016 um 21:15 schrieb Ulrich Hiller <hil...@mpia-hd.mpg.de>: > > Hello, > > My problem: How can i make gridengine not to use ssh? > > Installed: > openmpi-2.0.0 - config

Re: [gridengine users] How can i make gridengine not to use ssh?

2016-08-10 Thread Reuti
he slaves (or even locally on the master node of the parallel job) at times of MPICH(1) (will it allow [n] or [n-1] calls, also to check proper programming by this). Nowadays Open MPI will start only one daemon per exechost by `qrsh -inherit ...` and the granted limit won't be reached anyway.

Re: [gridengine users] evenly balancing multiple array jobs from the same user

2016-08-10 Thread Reuti
Am 10.08.2016 um 21:35 schrieb Jaime Huerta Cepas: > Thanks Reuti, > the problem I have is that the queue system is used to handle jobs submitted > through a web service. Each job is an array job, and if 10 web users submit a > job, the last one in submitting has to wait un

Re: [gridengine users] How can i make gridengine not to use ssh?

2016-08-10 Thread Reuti
'switch' where i can let qsub run _not_ over ssh? > > If is is of interest, the output of 'qconf -sp orte' is: > pe_nameorte > slots 999 > user_lists NONE > xuser_listsNONE > start_proc_argsNONE > stop_proc_args NONE
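
For reference, a sketch of the PE fields usually set for a tight Open MPI integration (assuming Open MPI was built with SGE support; slots and allocation_rule are site-specific):

  $ qconf -sp orte
  ...
  allocation_rule    $fill_up
  control_slaves     TRUE
  job_is_first_task  FALSE

With control_slaves TRUE, Open MPI starts its per-host daemons via `qrsh -inherit ...` instead of ssh.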

Re: [gridengine users] evenly balancing multiple array jobs from the same user

2016-08-10 Thread Reuti
...` on a per job basis (could even be put in a JSV) b) the setting "max_aj_instances" in SGE's configuration `qconf -mconf` HTH -- Reuti > thanks a lot in advance, > -jaime > ___ > users mailing list > users@g
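
A short sketch of option b) (the value is only illustrative):

  $ qconf -mconf
  ...
  max_aj_instances   10    # at most 10 tasks of any single array job run concurrently

On a per-job basis, qsub's `-tc <n>` option (if your version supports it) limits the concurrently running tasks of that one array job in a similar way.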

Re: [gridengine users] Hybrid PE

2016-07-22 Thread Reuti
nly the given hosts in the file you provide here) - export OMP_NUM_THREADS=5 * The format is explained in `man sge_pe` start_proc_args "$pe_hostfile". -- Reuti PS: Maybe a prolog could also alter the provided $SGE_JOB_SPOOL_DIR/pe_hostfile directly - I never tested it to alter it but
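
A rough sketch of what such a script sees in $pe_hostfile (format per `man sge_pe`; host names and slot counts are invented):

  node01 4 all.q@node01 UNDEFINED
  node02 4 all.q@node02 UNDEFINED

Each line is "host slots queue processor_range"; a start_proc_args script or prolog can parse this, write its own filtered copy, and export OMP_NUM_THREADS accordingly.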

Re: [gridengine users] Grid efficiency from the point of view of memory usage

2016-07-21 Thread Reuti
Then h_vmem will be reserved and at one point your jobs will start essentially. -- Reuti > > Regards, > Sudha > > The information contained in this electronic message and any attachments to > this message are intended for the exclusive use of the addressee(s) and may
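
A minimal example of such a request (assuming h_vmem is defined as a consumable on the exechosts and max_reservation in the scheduler configuration is greater than zero; the value is a placeholder):

  $ qsub -R y -l h_vmem=4G job.sh

The -R y switches on reservation, so large-memory jobs get resources set aside for them instead of being starved by a stream of smaller jobs.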

Re: [gridengine users] num_proc and hyperthreading

2016-07-19 Thread Reuti
itching it off on a BIOS level will also change the internal operation of the pipeline in the CPU as there are no threads to handle any more. IIRC there was something about it on the Beowulf list. -- Reuti ___ users mailing list users@gridengine.org https://g

Re: [gridengine users] Listener on job parameters change ?

2016-07-13 Thread Reuti
t in the queue definition. This is not related to a particular job setting. How would you change the priority for a job? -- Reuti > Thanks in advance, > Julien > ___ > users mailing list > users@gridengine.org > https://gridengi

Re: [gridengine users] job # was rejected cause it can't be written: No such file or directory.

2016-07-11 Thread Reuti
mountpoint? Maybe it was (re)mounted in read only mode - can you create a file in /opt/gridengine/default/spool/qmaster by hand? -- Reuti ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users
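
A quick check by hand could look like this (path taken from the message above):

  $ touch /opt/gridengine/default/spool/qmaster/write_test && rm /opt/gridengine/default/spool/qmaster/write_test

If the touch fails with "Read-only file system" or "Permission denied", the mount or its permissions are the problem rather than SGE itself.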

Re: [gridengine users] Tailoring the output using the qacct command

2016-07-05 Thread Reuti
to these tools. -- Reuti > Am 05.07.2016 um 13:18 schrieb Nacheva, Eva <eva.nach...@kcl.ac.uk>: > > Hello, > > I am using the qacct command for sge. However, as I don't need all the > information that the command provides, I have been wondering if there is a > way to tailor
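
A small sketch of trimming the output with standard text tools (the job ID is a placeholder; the field names are those printed by `qacct -j`, adjust to whatever your version shows):

  $ qacct -j 12345 | egrep '^(qname|hostname|jobname|ru_wallclock|maxvmem|exit_status)'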

Re: [gridengine users] Prolog and Epilog options

2016-07-04 Thread Reuti
Hi, > Am 04.07.2016 um 15:14 schrieb Nacheva, Eva <eva.nach...@kcl.ac.uk>: > > Dear Reuti, > > In more detail, we are trying to use the prolog and epilog to gather > information about each job and send this information to a database which is > connected

Re: [gridengine users] Prolog and Epilog options

2016-07-03 Thread Reuti
ob during submission time? -- Reuti ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users

Re: [gridengine users] Batch job on interactive queue

2016-07-01 Thread Reuti
Hi, > Am 01.07.2016 um 17:01 schrieb Jerome <jer...@ibt.unam.mx>: > > Dear Reuti > > Le 29/06/2016 à 16:34, Reuti a écrit : >> Hi, > >> Am 30.06.2016 um 03:39 schrieb Jerome : >> >> Dear Reuti >> >> Thank's for your answer. I use

Re: [gridengine users] Configuration management

2016-06-30 Thread Reuti
Hi, > How ? > > To add the node, I think it is: > qconf -ae $exechostname > > But how to add it to host list without any interactive (term/display) way (I > am doing it remotely through SSH or salt). $ qconf -aattr hostgroup hostlist node01 @intel2670 reuti@login modified

Re: [gridengine users] Batch job on interactive queue

2016-06-30 Thread Reuti
Hi, > Am 30.06.2016 um 03:39 schrieb Jerome <jer...@ibt.unam.mx>: > > Dear Reuti > > Thank's for your answer. I use SGE for a while, and don't realize what you > explain here with -now yes It specifies whether the job must start immediately, or can start at any la

Re: [gridengine users] Batch job on interactive queue

2016-06-29 Thread Reuti
ial? Interactive is more like immediate, and you can run an interactive job in a plain batch queue with "-now no", and, respectively, a batch job in an interactive queue if you request "-now yes". Note that parallel jobs are unrelated to this setting and even with qtype NONE they will run as l
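
Two minimal examples of this switch:

  $ qsub -now y batch_job.sh   # batch job that must be dispatched immediately, e.g. into an interactive-type queue
  $ qrsh -now no               # interactive session that is allowed to wait in the queue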

Re: [gridengine users] bsub -w "started(aJob)"

2016-06-29 Thread Reuti
Hi, Am 29.06.2016 um 11:28 schrieb Ueki Hikonuki: > Hi Reuti, > > I tried your example, but since I'm not familiar with AR, I couldn't achieve > what I wanted. > > Finally I came up with the following Ruby script, which waits until a > specified job > becom

Re: [gridengine users] How to removed jobs stuck in DR state

2016-06-28 Thread Reuti
still thinks they are present there, you can use (as SGE manager): $ qdel -f <job_id> to adjust the bookkeeping. -- Reuti > Any advice would be greatly appreciated. > > Regards, > Sean Smith > > ___ > us

Re: [gridengine users] Create high priority users on a specific queue

2016-06-28 Thread Reuti
the job requirements, it would be better to implement it in another way: by using a forced boolean or string complex with an attached urgency. Special users can then request this complex (either by themselves or by a JSV automatism) which must also be attached to a queue of your choice

Re: [gridengine users] Configuration management

2016-06-28 Thread Reuti
t depends whether you choose classic spooling or Berkeley DB during installation. Although changing the files by hand in the former case may work in some cases, it is best to use the `qconf` command to change certain settings. Otherwise changes may not be reflected instantly. As these files are onl

Re: [gridengine users] exclusive scheduling on SGE 6.2u5p2 (under openMPI 1.4.4)

2016-06-27 Thread Reuti
can be requested though. > Previous SGE versions used to have the option "qsub -l exclusive..", but > 6.2u5p2 does not. This is a different issue but nevertheless: did you define a complex "exclusive" with "relop" = "excl" and attach it to
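
For reference, a sketch of the usual exclusive-host recipe (the column layout follows `qconf -mc`; the names and the urgency value are conventional, not mandatory):

  exclusive   excl   BOOL   EXCL   YES   YES   0   1000

attached to each host with `qconf -me <hostname>` via "complex_values exclusive=true", and requested at submission time with `qsub -l exclusive=true ...`.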

Re: [gridengine users] -inherit and -notify

2016-06-27 Thread Reuti
ce depending on the time the USR2 arrives, there is a forked sleep 1 or not which will use the default behavior. You can try: while (trap - usr2; exec sleep 1); do -- Reuti > Any idea ? > > Best regards, > Julien > __

Re: [gridengine users] user list restriction

2016-06-21 Thread Reuti
and would like to restrict users from viewing others > queue\nodes that are not assign to them(for ex: qstat\qhost commands)? No. There is no such feature in SGE. You can use a wrapper, but users can bypass it once they got the knowledge. -- Reuti __

Re: [gridengine users] Error message- failed receving gdi request when calling qsub, but job is started

2016-06-21 Thread Reuti
message: “Unable to run job: > failed receiving gdi request. Exiting” Are all machines running the same version of SGE? Do the jobs later succeed on exactly the same nodes they failed before? -- Reuti > But the job runs successfully when it is seen later with qstat. > > We tri

Re: [gridengine users] prolog execution location and behavior?

2016-06-08 Thread Reuti
No, the prolog and epilog are run on their own like the jobscript. The jobscript isn't a child of the prolog. For such a behavior you would need a starter_method. Still on vacation - Reuti Von meinem iPhone gesendet > Am 08.06.2016 um 12:00 schrieb Chris Dagdigian <d...@sonsor

Re: [gridengine users] pe submitted jobs don't run

2016-04-29 Thread Reuti
qconf -sq all.q ... pe_list smp -- Reuti > Or did i misunderstand your question? > > Kind regards, ulrich > > > > On 04/28/2016 07:38 PM, Reuti wrote: >> Hi, >> >> Am 28.04.2016 um 19:18 schrieb Ulrich Hiller: >> >>> D

Re: [gridengine users] pe submitted jobs don't run

2016-04-28 Thread Reuti
ccounting_summary FALSE Was this PE also attached to any of your queues? -- Reuti > > Then i created i simple file: > ~> cat teste.ttt > #!/bin/bash > # > #$ -cwd > #$ -S /bin/bash > #$ -o out.txt > #$ -e err.txt > #$ -pe smp 8 > /bin/date >
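
If it is not, it can be attached non-interactively, e.g. (queue and PE names taken from this thread):

  $ qconf -aattr queue pe_list smp all.q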

Re: [gridengine users] email coming from compute farm

2016-04-18 Thread Reuti
" should be mangled here, which is not done by default). I can't find it in the short term, but I'll look it up. Maybe it's done on some of your servers. -- Reuti > -Original Message- > From: Feng Zhang [mailto:prod.f...@gmail.com] > Sent: Monday, April 18, 2016 10:33 AM >

Re: [gridengine users] email coming from compute farm

2016-04-18 Thread Reuti
ot /usr/sge/bin/lx24-em64t/sge_execd root root \_ /bin/sh /usr/sge/cluster/tmpspace.sh sgeadmin root \_ sge_shepherd-61837 -bg ... -- Reuti > Anywhere else I should be looking? > > Thanks for the time. > > -Original Message- > From: Reuti [mailto:re...@s

Re: [gridengine users] grid engine nodes don't submit job

2016-04-18 Thread Reuti
.s...@gmail.com>: > Hi, > > I tried to add four nodes into my grid, I have other 4 nodes working fine. I > can see the nodes information using qhost, but for a reason they don't submit > the job. This may be static output, based on the information you added with `qconf -a

Re: [gridengine users] Scheduler configuration regd high mem jobs in high priority queue

2016-04-18 Thread Reuti
g the high priority queue they could have their personal ".sge_request" in their home directory or even specific subdirectories. Another way would be to implement a JSV (job submission verifier) and attach "-R y" only to certain jobs depending on some criteria. -- Re
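
A minimal example of such a per-user default file (this is the standard sge_request mechanism; the only option shown is the one discussed above):

  $ cat ~/.sge_request
  -R y

Every option listed there is applied to the user's submissions as if it had been given on the qsub command line.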

Re: [gridengine users] DRMAA: environment variables values truncated at equal sign

2016-04-18 Thread Reuti
his bug is still here... AFAICS this targets DRMAA_V_ENV. Using the DRMAA_NATIVE_SPECIFICATION may work by setting "-v MY_VAR=foo=bar". -- Reuti > 2016-04-06 17:11 GMT+02:00 Julien Nicoulaud <julien.nicoul...@gmail.com>: > Hi all, > > When submitting jobs via DRMAA with en

Re: [gridengine users] email coming from compute farm

2016-04-16 Thread Reuti
Am 16.04.2016 um 21:48 schrieb Smith, David [ETHUS]: > I use sendmail. When you say the daemon, do you mean sendmail or grid engine > daemon? Mainly the `sendmail` one here, but essentially both should run as root. -- Reuti > Original Message > > Subject:

Re: [gridengine users] queue XXX.q marked QERROR as result of job NNN's failure at host compute-N-M2-10.local

2016-04-16 Thread Reuti
ed admin email address will get a copy where an error may show up like "Can't open input/output file" or "No such file or directory" which might point to an NFS bottleneck. Can you please check the emails of SGE sent to: $ qconf -sconf ... administrator_mail reuti

Re: [gridengine users] Specifying operating system version in qsub command

2016-04-16 Thread Reuti
"MATCHING TYPES". Despite that these values can be supplied just with a loop in `qconf -aattr exechost`, some prefer to use a load sensor to get the correct values set automatically. -- Reuti > > Regards, > Sudha > The information contained in this electron

Re: [gridengine users] allowing a single level of qsub recursion

2016-04-16 Thread Reuti
uld be changed/created to read: $ qconf -sconf nodeXY qlogin_command /bin/false But there won't be a nice error message, it will just try to establish the connection and fails essentially. https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html section "LOCAL CONFIGURA

Re: [gridengine users] email coming from compute farm

2016-04-16 Thread Reuti
ot for all exec hosts. Does > anyone know where this is controlled? This depends on the used mailer. Postfix or sendmail? Either a specific setting is made there to change the sender of the email automatically, or maybe the daemon is running under a user account itself. -- Reuti > >

Re: [gridengine users] SoGE 8.1.8 - Job IDs getting reset very fast 9999999 ==> 1 - 6-7 times in a month

2016-04-15 Thread Reuti
n3380a02000-3380a0300"..., 1024) = 1024 > [pid 1048] write(2, "tils.so.1.3\n3380a02000-3380a0300"..., 1024) = 1024 > [pid 1048] read(9, " \n7fc6ba459000-7fc6bc00 ---p"..., 1024) = 1024 > [pid 1048] write(2, " \n7fc6ba459000-7fc6bc00 ---p"

Re: [gridengine users] SoGE: hyperthreading on or off?

2016-03-19 Thread Reuti
chines, there are also several CPUs without HT capability. -- Reuti ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users

Re: [gridengine users] SoGE 8.1.8 - Job IDs getting reset very fast 9999999 ==> 1 - 6-7 times in a month

2016-03-14 Thread Reuti
Hi, > Am 13.03.2016 um 08:53 schrieb Yuri Burmachenko <yur...@mellanox.com>: > > Hello Reuti, > > We will try that, but we have also found another issue. > > We see that also our SoGE fails and failovers from master to shadow and > vice-versa during the same t

Re: [gridengine users] alias for queue

2016-03-10 Thread Reuti
> Am 10.03.2016 um 13:03 schrieb William Hay <w@ucl.ac.uk>: > > On Thu, Mar 10, 2016 at 11:15:58AM +0100, Reuti wrote: >> >>> Am 10.03.2016 um 10:45 schrieb William Hay <w@ucl.ac.uk>: >>> >>> On Thu, Mar 10, 2016 at 08:1

Re: [gridengine users] alias for queue

2016-03-10 Thread Reuti
queue is needed. Would there be any flaw if the internal check by GridEngine is done only after the JSV has run? This would allow the JSV to correct any mis-submission and/or pseudo/dummy/alias to be resolved, being it names of queues or complexes. -- Reuti signature.asc Description: Messa

Re: [gridengine users] Large cluster with memory reservation leaving cores idle

2016-03-09 Thread Reuti
Hi Chris, Am 08.03.2016 um 18:29 schrieb Christopher Black: > Thanks for the reply Reuti! > > Sounds like some of the suggestions are moving limits out of RQS and into > complexes and consumable resources. Yep. > How do we make that happen without > requiring us

Re: [gridengine users] Large cluster with memory reservation leaving cores idle

2016-03-08 Thread Reuti
.. Here one could use a global complex for each type of queue, as long as the users specify the particular queue. One loses the ability for a job to potentially be scheduled to different types of queues, as long as resource requests are met. I can't predict whether this would improve anyt

Re: [gridengine users] SoGE 8.1.8 - Job IDs getting reset very fast 9999999 ==> 1 - 6-7 times in a month

2016-03-08 Thread Reuti
> Am 08.03.2016 um 10:59 schrieb Yuri Burmachenko <yur...@mellanox.com>: > > Hello Reuti, > > See below: > > Job ID    Job schedule time > 97453 29-02-2016_03:18:55 > 97454 29-02-2016_03:18:57 > 563 29-02-2016_03:23:44

Re: [gridengine users] SGE crashes immediately after re-start

2016-03-06 Thread Reuti
en references to this condition being fixed by deleting the > job, but how do I do this? We use BDB spooling. This grid is running > SGE 6.2U5. Is the job still running? It looks like it finished already. Nevertheless: did you try a `qdel -f <job_id>`? -- Reuti ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users

Re: [gridengine users] SoGE 8.1.8 - Max number of grid nodes

2016-03-06 Thread Reuti
Hi, Am 06.03.2016 um 18:48 schrieb Yuri Burmachenko: > Hello Reuti, > > Just last question, is adjusting MAX_DYN_EC is a disruptive action or not and > is the effect immediate or restart required for SGE master? All changes are live and won't interrupt any scheduling. In fact:

Re: [gridengine users] SoGE 8.1.8 - Max number of grid nodes

2016-03-06 Thread Reuti
Am 06.03.2016 um 18:21 schrieb Yuri Burmachenko: > Hi Reuti, > > No we didn't change that setting, but anyway I could not find it in `qconf > -mconf`, see below the contents of this command: Please have a look at `man sge_conf` about it. - Reuti > execd_spool_dir

Re: [gridengine users] SoGE 8.1.8 - Job IDs getting reset very fast 9999999 ==> 1 - 6-7 times in a month

2016-03-06 Thread Reuti
our systems including SGE master/shadow and its execution hosts. To elaborate this. When it suddenly jumps to : what was the highest JOB_ID which was recorded before that skip in the accounting file? -- Reuti > Perhaps something sets $SGE_ROOT/default/spool/qmaster/jobseqnum to “999

Re: [gridengine users] SoGE 8.1.8 - Max number of grid nodes

2016-03-06 Thread Reuti
There is a setting MAX_DYN_EC in SGE's configuration `qconf -mconf`, but it should default to 99. Did you already increase it beyond that, to 950? -- Reuti > Does this mean that the limit is 950 grid nodes? > Is it possible to increase this number? > > What are our options if we reach 950 grid no

Re: [gridengine users] rebooting nodes nicely - what happened?

2016-03-01 Thread Reuti
ally in `qconf -mc`) to each exechost and request this when submitting the reboot job? Even one slot would be enough then to get exclusive access to each node. -- Reuti > script: > > ME=`hostname` > > EXECHOSTS=`qconf -sel` > > for TARGETHOST in $EXECHOSTS; do > >

Re: [gridengine users] Understanding load_formula and load calculations for queue overloads..

2016-03-01 Thread Reuti
es are being used for further scheduling. Are these quad-CPU machines without hyperthreading to gain a total of 56 cores? -- Reuti > - and I suppose that's exactly what the default "np_load_avg" does.. awesome! > > > we basically have 2 kinds of queue - a workhorse queue "

Re: [gridengine users] Understanding load_formula and load calculations for queue overloads..

2016-02-29 Thread Reuti
never ever > put our nodes in alarm mode, we use zabbix to monitor machine's health and we > automatically take it out of the cluster (by disabling all of it's queues) in > cases of "mess" (disk failures, out of space, mounting issues, stuff like > that). Are these in

Re: [gridengine users] Understanding load_formula and load calculations for queue overloads..

2016-02-28 Thread Reuti
tion - the alarm_threshold would take care of this and avoid further jobs to be started on this machine). If you have serial jobs only or parallel jobs which scale well, this isn't necessary to be set. (BTW: we use alarm_threshold only to put a machine in alarm state to avoid further dispatc

Re: [gridengine users] invalid task number 0

2016-02-16 Thread Reuti
" order I never saw it before. And it's not an array task which got an illegal lower boundary? -- Reuti > 02/16/2016 08:20:07|worker|nfs-2|W|Skipping remaining 352 orders > 02/16/2016 08:20:07|worker|nfs-2|E|reinitialization of "scheduler" > > They persist over sc

Re: [gridengine users] Strange CPU time usages being recorded for jobs

2016-02-13 Thread Reuti
| RT | start_time | > FROM_UNIXTIME(end_time) | sbmt_time| (CPU/3600) | > +-++-+-+-+---+---+---+ > > > | pangjx | 143320 |277.7775 | 145.3114 | 2015-07-23 21:56:51 | > 2015-07-29 23:15:32 | 2015-07-23 21:56:40 | 277.777

Re: [gridengine users] qstat sometimes doesn't report all currently running jobs

2016-02-12 Thread Reuti
ob), not to mention parallel jobs where you can get a bunch of entries (depending on the PE setting "accounting_summary"). What's the goal behind it? Do you want to start another job or start something external? Besides looking into DRMAA, there is also a small tool `qevent

Re: [gridengine users] RoundRobin scheduling among users

2016-01-27 Thread Reuti
e assigned and limited on an exechost level to 4 (this works, unless the user fouls the system and requests 0 for this complex, but a JSV could handle it). The complex could also be assigned with an arbitrarily high value on a cluster level and an RQS could limit it on certain machines. -- Reuti > Sh
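
A rough sketch of the RQS variant (the consumable name "concurrent" is invented and would be defined as a consumable INT in `qconf -mc` with a high value on the cluster level):

  $ qconf -arqs
  {
     name     per_host_limit
     enabled  TRUE
     limit    users {*} hosts {*} to concurrent=4
  }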

Re: [gridengine users] "cannot run until clean up of an previous run has finished"

2016-01-27 Thread Reuti
> Am 27.01.2016 um 04:40 schrieb Marlies Hankel <m.han...@uq.edu.au>: > > On 01/26/2016 11:22 PM, Reuti wrote: >>> Am 25.01.2016 um 23:32 schrieb Marlies Hankel <m.han...@uq.edu.au>: >>> >>> Hi all, >>> >>> Thank you. So I

Re: [gridengine users] "cannot run until clean up of an previous run has finished"

2016-01-26 Thread Reuti
list while > all other exechosts have NONE there. All of these job IDs belong to jobs that > are either running fine or still queueing. Did these jobs run on node3 before and were they already rescheduled - are there any `qacct` entries for them? -- Reuti > But I do not know from where this file

Re: [gridengine users] nodes group

2016-01-25 Thread Reuti
ee=sse4a, sse=sse4.1 and sse4.2 attached to the exechosts complex_values line. Then one can request: $ qsub -l "sse=sse4.?" ... to get only the latter two. The complete documentation is in `man sge_types` section "MATCHING TYPES". -- Reuti Am 25.01.2016 um 20:27 schrieb

Re: [gridengine users] "cannot run until clean up of an previous run has finished"

2016-01-25 Thread Reuti
The directory node17 will be recreated automatically when the execd starts on node17. -- Reuti > > Also, does anyone know what might have caused this error to appear in the > first place? > > Best wishes and thank you in advance for your help > > Marlies > >

Re: [gridengine users] RoundRobin scheduling among users

2016-01-25 Thread Reuti
ight_urgency 0.10 weight_priority 1.00 max_reservation 32 default_duration 8760:00:00 -- Reuti > On Mon, Jan 25, 2016 at 11:25:53AM -0800, Christopher Heiny wrote: >> >> Hi all, >> >> We've been
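
For completeness, a sketch of the settings usually combined for an even per-user distribution (parameter names from `man sge_conf` and `man sched_conf`; the share values are arbitrary):

  $ qconf -mconf    # enforce_user auto, auto_user_fshare 100
  $ qconf -msconf   # weight_tickets_functional 10000

With functional tickets enabled, pending jobs of different users are interleaved instead of being dispatched strictly in submission order.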

Re: [gridengine users] nodes group

2016-01-25 Thread Reuti
n some nodes due to architecture/hardware constraints? -- Reuti > -- > Atte. > > Dimar González Soto > Ingeniero Civil en Informática > Universidad Austral de Chile > > > ___ > users mailing list > users@grideng

Re: [gridengine users] How to set the location of checkpoint files per user?

2016-01-08 Thread Reuti
ition could be placed in each script, but this way it's necessary to change it only in the checkpointing definition in case you want to move it to a different location. -- Reuti *) This could be placed in a starter_method too. ___ users mailing list users@g

Re: [gridengine users] May I build hybrid SGE?

2015-12-17 Thread Reuti
> Am 17.12.2015 um 13:19 schrieb Steven Du <edgef...@gmail.com>: > > Hi Reuti/Joshua, > > I tried but failed. Here I wonder ( I am using 2011.11p1 version.): > > 1. If all GRID members have to be NFS shared spooling dir, such as > $SGE_CELL/common. Otherwise, t

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-17 Thread Reuti
Maybe `mpirun` doesn't support/use Hydra. Although not required, the MPI standard specifies `mpiexec` as a portable startup mechanism. Doesn't Intel MPI also have an `mpiexec`, which would match the `mpirun` behavior (and doesn't use Hydra)? -- Reuti > Am 17.12.2015 um 15:06 schrieb Gowt

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-16 Thread Reuti
call to `qrsh` automatically set up as you would need to start certain daemons beforehand. Does your version still need mpdboot? Do you request a proper set up PE in your job submission? -- Reuti > > Thank you for your time and help. > > Best regards, > g > > -- > Gow

Re: [gridengine users] May I build hybrid SGE?

2015-12-15 Thread Reuti
opt/software/$ARC/foobar Having directories /opt/software/lx-amd64 and /opt/software/lx-x86 you will always get the correct binary. I even use this to distinguish between amd64 and em64t to get the correct binaries for each type of CPU although the jobscript stays the same. -- Reuti > I wil

Re: [gridengine users] Old OGS bug

2015-12-14 Thread Reuti
AFAIK it's fixed in SoGE. If you don't want to upgrade you can try an instant fix: $ cat trim.sh #!/bin/sh sed -i "/=()/s/.*/& ; }/" $SGE_JOB_SPOOL_DIR/environment and: $ qconf -sq all.q ... prolog sgeadmin@/usr/sge/cluster/trim.sh -- Reuti PS: One error is

Re: [gridengine users] Possible BDB corruption?

2015-11-17 Thread Reuti
") > fscanf(fp, sge_u32, &job_nr) > guess_job_nr = guess_highest_job_number(); > job_nr = MAX(job_nr, guess_job_nr); I wasn't aware that the BDB is also checked. For a fresh installation it should be sufficient to fill the jobseqnum file with the proper value to start from. -- Reuti >

Re: [gridengine users] Suspension of jobs and consumeable resources

2015-11-16 Thread Reuti
adays, you could try to have >> swap as large as the built-in memory. >> The result should be, that any suspended job can be swapped out completely. >> >> -- Reuti > > So long I just lowered the default value of h_vmem (assigned when it isn't > ordered explicit with

Re: [gridengine users] Suspension of jobs and consumeable resources

2015-11-16 Thread Reuti
e. Although I usually use only 2 GB of swap nowadays, you could try to have swap as large as the built-in memory. The result should be, that any suspended job can be swapped out completely. -- Reuti ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users

Re: [gridengine users] sharetree reporting?

2015-11-11 Thread Reuti
> Am 10.11.2015 um 17:33 schrieb Carl G. Riches <c...@u.washington.edu>: > > On Tue, 10 Nov 2015, Reuti wrote: > >> >>> Am 09.11.2015 um 19:02 schrieb Carl G. Riches <c...@u.washington.edu>: >>> >>> On Sat, 7 Nov 2015, Reuti wrote: >

Re: [gridengine users] sharetree reporting?

2015-11-10 Thread Reuti
> Am 09.11.2015 um 19:02 schrieb Carl G. Riches <c...@u.washington.edu>: > > On Sat, 7 Nov 2015, Reuti wrote: > >> Hi, >> >> Am 06.11.2015 um 22:03 schrieb Carl G. Riches: >> >>> >>> We've configured a sharetree policy and would l

Re: [gridengine users] Failure in launching jobs with qrsh

2015-11-09 Thread Reuti
any custom script which tries to access /tmp/9572476.1.preprod.q/pid - I can't remember that it's created by SGE in this directory by default. -- Reuti > Can you please let me know the reason for these errors. > > Regards, > Sudha > The information contained in this elect

Re: [gridengine users] sharetree reporting?

2015-11-07 Thread Reuti
rue reporting=true \ > flush_time=00:00:15 joblog=true sharelog=21600 Were any jobs dispatched during the interval? -- Reuti > but nothing is being written to the reporting file. Is there something we > missed? > > We are using Open Grid Scheduler/Grid

Re: [gridengine users] error message "not enough slots available in the system" for parallel jobs

2015-11-06 Thread Reuti
option too. To ease the handling for the users, a JSV could add the exclusive resource in case a particular PE is requested. It could also adjust the number of requested slots from any value to the range "1-" to use the complete node, independent from the number of built-in cores which

Re: [gridengine users] Queue configurations still stored in text files?

2015-11-03 Thread Reuti
they're still text files. However you have to change them through the > qconf command. Fiddling > around with them directly isn't supported. They should be safe to read though. As the page states: only for classic spooling w/o a DB. But a `grep` to get the values is no big deal, as to c

Re: [gridengine users] Possible Causes of: critical error: unrecoverable error - contact systems manager

2015-11-03 Thread Reuti
export whatever this blank environment variable key is, it's > core dumping. I always tell my users to avoid using -V. Sometimes they made some changes in the actual shell to the environment, submit a job and when the job starts after 3 days and crashes it's hard to investigate, as t

Re: [gridengine users] SoGE 8.1.8 - Qsub issue when using request able variable and parallel environment - need your help.

2015-10-31 Thread Reuti
Oops - typo... Am 31.10.2015 um 19:36 schrieb Leior Varon: > Hi Yuri, > I think it's best that you be the contact with the forum but I think it would > be good to answer something similar to the below: > > Hi Reuti, > > Thank you for the reply. > We also noticed tha

Re: [gridengine users] SoGE 8.1.8 - Qsub issue when using request able variable and parallel environment - need your help.

2015-10-31 Thread Reuti
Am 31.10.2015 um 19:36 schrieb Leior Varon: > Hi Yuri, > I think it's best that you be the contact with the forum but I think it would > be good to answer something similar to the below: > > Hi Reuti, > > Thank you for the reply. > We also noticed that requesting &

Re: [gridengine users] SoGE 8.1.8 - Qsub issue when using request able variable and parallel environment - need your help.

2015-10-30 Thread Reuti
E_ROOT/bin/lx-amd64/qsub -V -cwd -b y -q host_in_grid -pe testpe 1 xterm > > Any ideas why does this strange behavior occur? Is this some kind of a bug? > How this can be resolved? Unfortunately I have no idea, but I observed already in former versions that instead of: -l h=foo -q bar it's

Re: [gridengine users] SGE job-status email: anyway to customize the included information?

2015-10-29 Thread Reuti
Am 30.10.2015 um 00:16 schrieb Lane, William: > Is it possible to customize the kind of information included w/an SGE > job-status email that is sent via the -m switch to qsub? I've noticed it > includes (at least on our SoGE install): > > User = lanew >

Re: [gridengine users] Trying to code C program to process SGE job-status email

2015-10-22 Thread Reuti
and end time of a job based on the job number parsed from the subject? Besides race conditions: one could parse the accounting file. -- Reuti -Bill L. From: users-boun...@gridengine.org [users-boun...@gridengine.org] on behalf of Daniel Gruber [dgru...@univa.com] Sent: Tuesday, October 20,
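
A hedged sketch of reading start/end times from the accounting file instead of the email (field positions follow accounting(5) - job number in field 6, start and end time as epoch seconds in fields 10 and 11 - verify against your version; the job ID is a placeholder and "default" is the usual cell name):

  $ awk -F: '$6 == 12345 {print $10, $11}' $SGE_ROOT/default/common/accounting

or simply `qacct -j 12345`, which prints qsub_time, start_time and end_time in readable form.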

Re: [gridengine users] Trying to code C program to process SGE job-status email

2015-10-20 Thread Reuti
Hi, Are there local configurations overriding the global setting: $ qconf -sconfl -- Reuti Am 20.10.2015 um 09:24 schrieb Lane, William: > I've been trying to code a C program to process the SGE job-status email to > determine if a job failed due to exceeding the h_rt limitation of a d

Re: [gridengine users] execution node installation error

2015-10-15 Thread Reuti
Is the $SGE_ROOT shared too? The location of the spool directory for the exechosts can be checked in `qconf -sconf` ("execd_spool_dir"). -- Reuti > > -- Shazly > > On Thu, Oct 15, 2015 at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote: > Hi, &

Re: [gridengine users] execution node installation error

2015-10-15 Thread Reuti
output you may get. -- Reuti > Am 15.10.2015 um 14:52 schrieb Hatem Elshazly <hmelsha...@gmail.com>: > > Yes it is. > > Why do you think that the exec dirs weren't created? all the permissions and > ownerships are granted. > I'm using this script: inst_sge_sc to m

Re: [gridengine users] installing gridengine-exec noninteractive mode

2015-10-15 Thread Reuti
hared between them. -- Reuti > On Wed, Oct 14, 2015 at 12:20 PM, Reuti <re...@staff.uni-marburg.de> wrote: > Hi, > > > Am 14.10.2015 um 05:42 schrieb Hatem Elshazly <hmelsha...@gmail.com>: > > > > Hello, > > > > I'm a debian user and I'm having

Re: [gridengine users] execution node installation error

2015-10-15 Thread Reuti
entials for connection > > Does this mean that this user should get SUDO privileges? No. Do you start the daemons as root user? -- Reuti > On Thu, Oct 15, 2015 at 2:56 PM, Reuti <re...@staff.uni-marburg.de> wrote: > The spool directory is created when the execd starts

Re: [gridengine users] execution node installation error

2015-10-15 Thread Reuti
They must run as root to allow switching to any user to run a job. reuti@node:~> ps -eo user,ruser,command | grep sge sgeadmin root /usr/sge/bin/lx24-amd64/sge_execd > Am 15.10.2015 um 15:43 schrieb Hatem Elshazly <hmelsha...@gmail.com>: > > No. the daemon process is

Re: [gridengine users] sge_execd: corrupted mem or cpu at startup and buggy resource usage report

2015-10-14 Thread Reuti
$ qconf -sconf | grep gid_range Are real groups occupying the same specified range and processes outside of SGE use these too? -- Reuti > > > We run OGS/GE 2011.11p1 under Rocks 6.1.1. We had the same problem > (sge_execd) with Rocks 5.x. > > Any pointers/hint/suggestions/etc w

Re: [gridengine users] installing gridengine-exec noninteractive mode

2015-10-14 Thread Reuti
ring to the Debian packages from their distribution? Then it's better to ask on the Debian list, where the maintainer of these packages will read it. This list targets the configuration of SGE and compilation from scratch, but not individual Linux distributions. -- Reuti > > Any

Re: [gridengine users] exceeded h_rt limit, job aborts, exit status still 0?

2015-09-26 Thread Reuti
ossible of course, Do you submit jobs with "-notify" and/or a set s_rt? --Reuti > CPU = 00:00:00 > Max vmem = 10.229M > failed assumedly after job because: > job 187.1 died through signal KILL (9) > > I had thought an exit status of 0 indicat
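
A minimal example of that combination (times are placeholders; s_rt is set slightly below h_rt so the warning arrives before the hard kill):

  $ qsub -notify -l s_rt=0:58:0,h_rt=1:0:0 job.sh

An exceeded s_rt delivers SIGUSR1 to the job, and -notify additionally sends SIGUSR2 a queue-configurable time before the final SIGKILL, so the job can exit with a meaningful status instead of dying from signal 9.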

Re: [gridengine users] Create short.q queue definition that limits the runtime of a job

2015-09-26 Thread Reuti
Hi, Am 24.09.2015 um 23:52 schrieb Lane, William: > Reuti, > > 1. > The exechost isn't the head node is it? We've always referred to our SGE > clusters as having > three types of nodes: submit nodes, compute nodes and head nodes. It's better to think of machines as of

Re: [gridengine users] Create short.q queue definition that limits the runtime of a job

2015-09-24 Thread Reuti
Hi, Am 23.09.2015 um 21:18 schrieb Lane, William: > Reuti, > > 1. > If more than one compute node takes part in the compute ring, how does one > determine > which one is the exechost? What do you mean by compute ring - a parallel job? The exechost is the one where the job
