Or if it’s an NFS share, perhaps it’s become unmounted on one or more exec
nodes.
-Hugh
> On Mar 7, 2020, at 10:55, Reuti wrote:
>
> Hi,
>
> is it always failing on one and the same node? Or are several nodes affected?
> One guess could be that the file system is full.
>
> -- Reuti
>
>
What’s the output of ‘qconf -sq long.q’? Are you sure it doesn’t still
reference the old hostname, maybe within a hostgroup?
-Hugh
From: users-boun...@gridengine.org On Behalf Of
Mun Johl
Sent: Tuesday, October 29, 2019 1:38 PM
To: dpo...@gmail.com
Cc: users@gridengine.org
Subject: Re:
David,
Best for qhost sorting would be to rename your 'cluster' hosts with zero-padded
numbers, if you really want that kind of sorting. Or you could create an alias
like 'qhost | sort -nk 1.8', assuming every hostname starts with 'cluster' (the
8th character is where the numeric sort begins).
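The sort trick can be sketched as follows; the 'cluster' hostnames below are hypothetical, and the point is that a plain lexical sort misorders them while a numeric sort keyed on character 8 does not:

```shell
# A lexical sort puts cluster10 before cluster2 ...
printf 'cluster1\ncluster10\ncluster2\n' | sort
# ... while a numeric sort on field 1 starting at character 8 orders naturally:
printf 'cluster1\ncluster10\ncluster2\n' | sort -nk 1.8
# Zero-padded names (cluster01, cluster02, ...) sort correctly even lexically.
```

The `1.8` key means "field 1, starting at character 8", which is where the digits begin in an 8-character-prefixed name like 'clusterN'.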
As Skylar says, if you want
cpu 19305.760s
mem 7.463TBs
io 70.435GB
iow 0.000s
maxvmem 532.004MB
arid undefined
ar_sub_time undefined
category -l hostname=karun10 -pe make 1
Thanks, ulrich
On 5/14/19 3:28 PM, MacMullan IV, Hugh wrote:
It's a limit of some sort being reached. Do you have an RQS of any kind (qconf
-srqs)? We see this for job-requested or system-set RAM exhaustion (the OOM
killer; as mentioned, 'dmesg -T' on the compute nodes is often useful), as well
as for time limits being reached. What is the whole output from 'qacct -j JOBID'?
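For reference, an RQS of the kind that produces such rejections might look like this in 'qconf -srqs' output (the rule name and the 100-slot cap here are made up for illustration):

```
{
   name         max_slots_per_user
   description  "cap each user at 100 slots cluster-wide"
   enabled      TRUE
   limit        users {*} to slots=100
}
```

A job asking for more slots than the limit allows will sit queued (or be rejected at verify time) until usage under the rule drops.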
Iyad:
As part of our JSV (jsv.sh), we force any non-qsub CLIENT to our 'interactive'
queue, like:
jsv_on_verify() {
    MYQ=$(jsv_get_param 'q_hard')
    # force non-qsub to interactive queue
    if [[ $(jsv_get_param 'CLIENT') != 'qsub' ]]; then
        NEWQ=$(echo $MYQ | sed
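The snippet above is cut off in the archive; a complete jsv_on_verify along those lines might look like the sketch below. The jsv_* helpers come from SGE's bundled jsv_include.sh, and the queue name 'interactive.q' is an assumption:

```shell
# Sketch of a JSV verify function (assumes SGE's jsv_include.sh has been
# sourced, which provides the jsv_get_param/jsv_set_param/jsv_correct helpers).
jsv_on_verify() {
    MYQ=$(jsv_get_param 'q_hard')
    # force non-qsub clients (qrsh, qlogin, qmon, DRMAA) to the interactive queue
    if [[ $(jsv_get_param 'CLIENT') != 'qsub' ]]; then
        # rewrite whatever queue was requested to interactive.q (hypothetical name)
        NEWQ=$(echo "$MYQ" | sed 's/.*/interactive.q/')
        jsv_set_param 'q_hard' "$NEWQ"
    fi
    jsv_correct 'Job was modified by JSV'
}
```

The real sed expression in the original mail may have been more surgical (e.g. rewriting only part of the queue name); this version simply replaces the whole request.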
Perhaps you can 'wrap' the 'work' in a small script (work.sh), like:
#!/bin/bash
## pre-work
echo TMP: $TMP
OUT=$TMP/env.$JOB_ID.$OMPI_COMM_WORLD_RANK.txt
## WORK
env > $OUT 2>&1
## report and clean up?
ls -la $TMP
rsync -av $OUT $SGE_CWD_PATH
Then use a wrapper.sh job script to 'mpiexec
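That wrapper idea, sketched (the PE name and slot count are hypothetical, and the echo stands in for the real launch so the sketch runs outside SGE):

```shell
#!/bin/bash
# wrapper.sh -- hypothetical job script; submit with: qsub -pe orte 8 wrapper.sh
# Under SGE, $NSLOTS holds the granted slot count; default it for a dry run.
NSLOTS=${NSLOTS:-8}
CMD="mpiexec -n $NSLOTS ./work.sh"
echo "$CMD"    # in the real job script, replace this echo with: exec $CMD
```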
Sweet! Glad to see you’re up and running.
-H
From: Dimar Jaime González Soto
Sent: Thursday, December 6, 2018 11:57 AM
To: MacMullan IV, Hugh
Subject: Re: [gridengine users] problem with concurrent jobs
I disabled that quota and now I can see 60 processes running. Thanks
On Thu., Dec. 6, 2018
Yes, as Reuti said:
What’s the output from ‘qconf -srqs’ (line 1 of the max_slots rule)? Looks like
you’re being blocked there (RQS).
-H
From: Dimar Jaime González Soto
Sent: Thursday, December 6, 2018 11:22 AM
To: MacMullan IV, Hugh
Subject: Re: [gridengine users] problem with concurrent jobs
Also: I'm surprised 'qalter -w p ' doesn't show any output. Did you
forget the JOBID?
-H
-Original Message-
From: users-boun...@gridengine.org On Behalf Of
Reuti
Sent: Thursday, December 6, 2018 11:04 AM
To: Dimar Jaime González Soto
Cc: users@gridengine.org
Subject: Re: [gridengine
You could put that code in the exec startup script, above the line that starts
execd, and error out if that part doesn’t complete successfully.
Seems weird, though. Shouldn’t the node build include all necessary bits to set
up your execd nodes to ‘run jobs’? Are these on your Cygwin cluster?
You could try ‘qstat -j 11 -explain E’ to see if there’s an explanation for the
error.
On Oct 31, 2017, at 19:56, Simon Matthews
> wrote:
I have finally got a small test grid running with an execd running
under Cygwin on a Windows
You can turn on the system firewalls and allow all inbound TCP traffic from
the cluster nodes only, and then open the SSH port to on-site or some other
restricted set of subnets. Perhaps that will satisfy your InfoSec team.
If you use Univa GridEngine, you can specify the ‘port_range’ option
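As a sketch of that firewall approach with firewalld (the subnets are hypothetical): put the cluster subnet in a fully trusted zone and allow only SSH from the wider site:

```
# cluster peers: allow everything
firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24
# on-site subnet: SSH only
firewall-cmd --permanent --zone=internal --add-source=172.16.0.0/16
firewall-cmd --permanent --zone=internal --add-service=ssh
firewall-cmd --reload
```

This keeps the ephemeral ports that sge_execd and qrsh negotiate open between cluster nodes without exposing them off-cluster.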
Absolutely! Take a look at the qconf man page for details on using files
(-[AMRD]attr) and / or the -[amrd]attr options (for specific objects).
Hope that's useful!
-Hugh
On Apr 20, 2017, at 14:13, Douglas Duckworth
> wrote:
Hey!
Is
Good call, Reuti. Thanks for the expansion and details!
> On Aug 10, 2016, at 16:25, Reuti <re...@staff.uni-marburg.de> wrote:
>
>
>> Am 10.08.2016 um 21:46 schrieb MacMullan IV, Hugh:
>>
>> Hi Ulrich:
>>
>> I haven't gone past op
Hi Ulrich:
I haven't gone past openmpi v1.10, but you'll likely want to change
'control_slaves' in your PE conf to 'TRUE', to signal that you have tight
integration. https://www.open-mpi.org/faq/?category=sge (and probably
'job_is_first_task' to 'FALSE')
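For reference, the relevant lines of such a tightly integrated PE, as shown by 'qconf -sp orte' (the PE name 'orte' and the slot count are just examples), would be:

```
pe_name            orte
slots              9999
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
```

control_slaves TRUE lets Open MPI start its remote ranks via qrsh under sge_execd's control (so accounting and limits apply), and job_is_first_task FALSE tells SGE the job script itself doesn't consume one of the granted slots.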
Does that help?
-Hugh
-Original
Another angle to consider is from the MATLAB side: using the
'-singleCompThread' option in your matlab and worker scripts
($MATLAB_ROOT/bin/(matlab|worker)):
$ diff matlab.dist matlab
497c497
< arglist=""
---
> arglist="-singleCompThread"
$ diff worker.dist worker
19c19
< exec
I'm no Rocks guy, but maybe this thread will help:
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2013-January/060918.html
-Original Message-
From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On
Behalf Of Pat Haley
Sent: Wednesday, May 25, 2016 10:59 AM
We use HT in AWS when running a large set of single-thread jobs across a
cluster. While the individual jobs run more slowly, the total set of jobs
completes more quickly, saving us money. Generally HT gets about a 15% speed
increase (and corresponding cost savings) over 2x ... in other words 2x
Tested queue re-write within a $SGE_ROOT/$SGE_CELL/sge_request launched JSV ...
it works! Hooray! That eliminates 4 dummy queues for us. Good, good stuff.
We're very JSV now.
-Original Message-
From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On
Behalf Of Reuti
Limiting the slots on exec hosts (some options):
http://www.softpanorama.org/HPC/Grid_engine/Resources/slot_limits.shtml
Exclusive complex configuration (2010 post):
https://web.archive.org/web/20101027190030/http://wikis.sun.com/display/gridengine62u3/Configuring+Exclusive+Scheduling