ine all the time. Only after
my suggestion to add h_vmem on the exechost level to avoid oversubscription,
all the jobs crashed due to no memory being available (as h_vmem = 0 was
effectively applied as an automatically set limit).
Essentially: the default value in a complex definition is ign
Hi,
Any consumables in place like memory or other resource requests? Any output of
`qalter -w v …` or `qalter -w p`?
-- Reuti
> Am 11.06.2020 um 20:32 schrieb Chris Dagdigian :
>
> Hi folks,
>
> Got a bewildering situation I've never seen before with simple SMP/threade
SH itself it is possible, with the "Match" option in "sshd_config", to
allow only certain users from certain nodes.
Nevertheless: maybe adding "-v" to the `ssh` command will output additional
info, also the messages of `sshd` might be in some log file.
-- Reuti
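A minimal sshd_config sketch of the Match approach mentioned above (the network range and group name are placeholders, not from the original mail):

```
# sshd_config sketch: from the cluster network, only members of
# the "allowed" group may log in; everything else is refused.
Match Address 10.1.0.0/16
    AllowGroups allowed
```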
> Feedb
Hi,
It might be that the application is ignoring the set OMP_NUM_THREADS (or
assumes a maximum value if unset) and is using all cores in the machine. How
many cores are installed?
-- Reuti
Am 07.05.2020 um 01:04 schrieb Jerome IBt:
> Dear all
>
> I'm facing a strange problem with some
> Am 02.05.2020 um 00:15 schrieb Mun Johl :
>
> Hi Reuti,
>
> Thank you for your reply.
> Please see my comments below.
>
>> Hi,
>>
>> Am 01.05.2020 um 20:44 schrieb Mun Johl:
>>
>>> Hi,
>>>
>>> I am using SGE
he login via SSH. Hence I'm not sure what you
are looking for to be set.
Maybe you want to configure SGE to always use `ssh -X`?
https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
-- Reuti
>
> Please advise.
>
> Thank you and regards,
>
> --
> Mun
> ___
Hi,
Is it always failing on one and the same node? Or are several nodes affected?
One guess could be that the file system is full.
-- Reuti
> Am 05.03.2020 um 18:46 schrieb Jerome :
>
> Dear all
>
> I'm facing a strange error in SGE. One job is declared as in err
> Am 31.01.2020 um 18:23 schrieb Jerome IBt :
>
> Le 31/01/2020 à 10:19, Reuti a écrit :
>> Hi Jérôme,
>>
>> Personally I would prefer to keep the output of `qquota` short and use it
>> only for users' limits. I.e. defining the slot limit on an exechost
experience is that sometimes RQS are screwed up, especially if used in
combination with some load values (although $num_proc is of course fixed in
your case).
-- Reuti
> Am 31.01.2020 um 17:00 schrieb Jerome :
>
> Dear all
>
> I'm facing a new problem on my cluster with SG
Hi,
I never used SGE OGS/GE 2011.11p1, and for other derivatives it seems to work
as intended. Is there any output in the messages file of the executing host
where it mentions trying to kill the process due to exhausted wallclock time?
-- Reuti
> Am 28.01.2020 um 03:50 schrieb Derrick
longer than 48 hours.
Are you directing these commands to SSH?
-- Reuti
> I am wondering if this is a known issue?
>
> I am running open source version of SGE OGS/GE 2011.11p1
>
> Cheers,
> Derrick
> ___
> users mailing list
n the value of PATH.
Another option could be an "adjustment" of the PATH variable by a JSV.
-- Reuti
>
>> and epilog scripts run with the submission environment but possibly in the
>> context of a different user (i.e. a user could point a root-running prolog
>> s
> Am 22.01.2020 um 15:14 schrieb WALLIS Michael :
>
>
> From: Reuti
>
>>> (for the record, if the number of used_slots is higher than the number
>>> of slots, no jobs using that PE will run. Don't know how that's even
>>> possible.)
>
>>
using that PE will run. Don't know how that's even possible.)
You mean the setting of "slots" in the definition of a particular PE?
-- Reuti
> Cheers,
>
> Mike
>
> --
> Mike Wallis x503305
> University of Edinburgh, Research Services,
> Argyle House, 3 Lady L
There are patches around to attach the additional group id to the ssh daemon:
https://arc.liv.ac.uk/SGE/htmlman/htmlman8/pam_sge-qrsh-setup.html
rlogin is used for an interactive login by `qrsh`, rsh for `qrsh` with a
command.
-- Reuti
> Am 09.12.2019 um 18:39 schrieb Korzennik, Sylv
we need something different
> in the GE configuration to enable this?
Are you using the "builtin" method for the startup or SSH, i.e. what are the
settings in rsh_daemon and rsh_client?
-- Reuti
> Cheers,
> Sylvain
> --
> __
mmunication of the daemons/clients for `qrsh` …
in SGE has no support for X11 forwarding. Hence the approach by `qrsh xterm`
should give a suitable result when set up to use "/usr/bin/ssh -X -Y".
-- Reuti
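A sketch of the global configuration change this implies (`qconf -mconf`); the paths are illustrative and must match your system:

```
# qconf -mconf (global) sketch: route qrsh through ssh with X11 forwarding
rsh_command      /usr/bin/ssh -X -Y
rsh_daemon       /usr/sbin/sshd -i
rlogin_command   /usr/bin/ssh -X -Y
rlogin_daemon    /usr/sbin/sshd -i
```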
> Your job 2657108 ("INTERACTIVE") has been submitted
> wai
ramdisk@node19
common@node27
common@node23
common@node23
common@node28
-- Reuti
>
> Regards,
>
> --
> Mun
>
>
> From: Mun Johl
> Sent: Friday, October 25, 2019 5:42 PM
> To: dpo...@gmail.com
> Cc: Skylar Thompson ;
rnal name changes, while the
internal ones stay the same?
-- Reuti
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
XECD_PORT, ssh, scp, something else)?
It uses its own protocol. No SSH inside the cluster is necessary.
> What are the system-level requirements for successfully sending the
> submit scripts (for example: same UID for sge across the cluster, same
> UID<->username for the user submitting the job across the cluster, etc)?
Yes.
-- Reuti
out this again - does the subordinate queue
> setting accept 'queue@@hostgroup' syntax like everything else? Don't
> remember if I ever tried that.
Yes, one can limit it to be available on certain machines only:
subordinate_list NONE,[@intel2667v4=short]
-- Reuti
> Tina
>
ond correctly to SIGSTOP, but the GPU portion keeps
> running).
>
> Is there any way, with our current number of queues, to exempt jobs
> using a GPU resource complex (-l gpu) from being suspended by short jobs?
Not that I'm aware of. Almost 10 years ago I had a sim
le: the number of slots will be corrected in the copy of the input
file to the number of granted slots. Let me know if you would like to get them.
-- Reuti
> I was told the this should be possible in slurm (which we don't have,
> and to which w
minating the "jclass" column, which doesn't
> contain any information, but I can only find ways to add columns, not take
> them away. Is there a way to make this column go away?
Besides `cut -b`: what type of output are you looking for? There is a `qstatus`
AWK scri
Hi Ilya,
Am 31.07.2019 um 00:55 schrieb Ilya M:
> Hi Reuti,
>
> So /home is not mounted via NFS as it's usually done?
> Correct.
>
>
> How exactly is your setup? I mean: you want to create some kind of pseudo
> home directory on the nodes (hence "-b y"
getting
> meaningful debug output.
How exactly is your setup? I mean: you want to create some kind of pseudo home
directory on the nodes (hence "-b y" can't be used with user binaries) and the
staged job script (by SGE) will then execute the job
attach it to the exechosts, although
attaching them to a queue might be shorter and give a central location for all
definitions.
-- Reuti
etree" for
> reading: No such file or directory
Did you transfer the old configuration, or does this pop up in a freshly
installed system?
Unfortunately the procedure might be changed by the ROCKS distribution compared
to the
mon got.
The changes to the "nofile" setting should also be visible in the shell when
you log in.
-- Reuti
> Currently, my only workaround is to rebuild the Compute Node (reinstall OS
> etc) so that it corrects this issue.
>
> >> Can you check the limits that are set i
ettings but the rest are fine.
Several ulimits can be set in the queue configuration, and so can be different
for each queue or exechost.
-- Reuti
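For instance, a queue configuration fragment with per-host overrides could look like this (limit names are real queue_conf parameters; the values and host name are illustrative):

```
# sge_queue_conf sketch: per-queue ulimits, overridable per exechost
h_vmem    4G,[bignode01=32G]
h_stack   16M
```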
> I am wondering if this is SGE related? And idea is welcomed.
>
> Cheers,
> Derrick
> ___
>
that your job and the sum of all of its processes passed this limit and kill
the job. SGE can do this, by using the additional group ID which is attached to
all processes by a particular job, while the kernel might only watch a certain
process.
Using now an assigned cgroup, even the kernel ca
t wait until all running jobs without
this constraint were drained.
-- Reuti
"Finished Jobs".
But it won't retrieve jobs which were already drained from the listing. Also
after a restart of the `qmaster`, the list will initially be empty.
-- Reuti
AFAICS the kill sent by SGE happens after a task has already returned with an
error. SGE would in this case use the kill signal to be sure to kill all child
processes. Hence the question would be: what was the initial command in the
job script, and what output/error did it generate?
-- Reuti
> Am 25.04.2019 um 17:41 schrieb Mun Johl :
>
> Hi Skyler, Reuti,
>
> Thank you for your reply.
> Please see my comments below.
>
> On Thu, Apr 25, 2019 at 08:03 AM PDT, Reuti wrote:
>> Hi,
>>
>>> Am 25.04.2019 um 16:53 schrieb Mun Joh
rched the man pages and web for definitions of the output of
> qacct, but I have not been able to find a complete reference (just bits
> and pieces here and there).
>
> Can anyone point me to a complete reference so that I can better
> understand the output of qacct?
There is a man page about
Am 09.04.2019 um 21:08 schrieb Mun Johl:
> Hi Reuti,
>
> One clarification question below ...
>
> On Tue, Apr 09, 2019 at 09:05 AM PDT, Reuti wrote:
>>> Am 09.04.2019 um 17:43 schrieb Mun Johl :
>>>
>>> Hi Reuti,
>>>
>>>
Am 09.04.2019 um 21:08 schrieb Mun Johl:
> Hi Reuti,
>
> One clarification question below ...
>
> On Tue, Apr 09, 2019 at 09:05 AM PDT, Reuti wrote:
>>> Am 09.04.2019 um 17:43 schrieb Mun Johl :
>>>
>>> Hi Reuti,
>>>
>>>
> Am 09.04.2019 um 17:43 schrieb Mun Johl :
>
> Hi Reuti,
>
> Thank you for your reply!
> Please see my comments below.
>
> On Mon, Apr 08, 2019 at 10:27 PM PDT, Reuti wrote:
>> Hi,
>>
>>> Am 09.04.2019 um 05:37 schrieb Mun Johl :
>>>
he contractor only need an account on serverA in
> order to utilize SGE? Or would he need an account on the grid master as
> well?
Are you not using a central user administration by NIS or LDAP?
AFAICS he needs an entry only on the execution host (and on the submission host
of course).
-- Reuti
NONE,[@special_hosts=allowed_users]
and the whitelisted users have to be in the allowed_users ACL, while
@special_hosts are the machines ordinary users are banned from.
-- Reuti
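Put together, the queue configuration fragment could look like this (using the ACL and hostgroup names from above):

```
# sge_queue_conf sketch: ordinary users banned on @special_hosts
user_lists   NONE,[@special_hosts=allowed_users]
```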
> Am 12.03.2019 um 15:55 schrieb David Trimboli :
>
>
> On 3/5/2019 12:34 PM, David Trimboli wrote:
>>
>> On 3/5/2019 12:18 PM, Reuti wrote:
>>>> Am 05.03.2019 um 18:06 schrieb David Trimboli
>>>> :
>>>>
>>>> I
ave this problem on a local scratch directory for $TMPDIR though.
===
BTW: did I mention it: no need to be root anywhere.
-- Reuti
multi-spawn.sh
Description: Binary data
__SGE_PLANCHET__.tgz
Description: Binary data
cluster.tgz
Description: Binary data
__
oesn't seem to change the order in
> which they run.
You mean the value you set with "-p"?
To which value did you change this for certain jobs?
$ qstat -pri
might give a hint about the overall values which are assigned to a job in
column "npprior".
-- Reuti
>
> Can any
.q} wouldn't hurt as it means "for each entry in the list",
and the only entry is all.q. But to lower the impact I would leave this out.
-- Reuti
> }
>
> I get the feeling that will limit the number of slots that all users can
> collectively use simultaneously to 10. I w
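For reference, a sketch of the two RQS variants (the rule name is made up):

```
# "users *"   = one shared limit: 10 slots for everybody together.
# "users {*}" = expanded per user: 10 slots for each user separately.
{
   name    slot_limit
   enabled TRUE
   limit   users * queues all.q to slots=10
}
```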
Hi,
> Am 27.02.2019 um 22:07 schrieb Kandalaft, Iyad (AAFC/AAC)
> :
>
> HI Reuti
>
> I'm implementing only a share-tree.
Then you can set:
policy_hierarchy S
The past usage is stored in the user object, hence auto_user_delete_time
should be zer
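A sketch of the relevant scheduler settings (`qconf -msconf`) for a pure share-tree setup; the weight and halftime values are illustrative, not from the original mail:

```
# sched_conf sketch for share-tree-only scheduling
policy_hierarchy       S
weight_tickets_share   100000
halftime               168
```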
Hi,
there is a man page "man sge_priority". Which policy do you intend to use:
share-tree (honors past usage) or functional (current use), or both?
-- Reuti
> Am 25.02.2019 um 15:03 schrieb Kandalaft, Iyad (AAFC/AAC)
> :
>
> Hi all,
>
> I recently implement
ing info via qacct. I am wondering
> what is the common way to achieve this without giving access to the qmaster
> node?
You mean, $SGE_ROOT is not shared in your cluster?
-- Reuti
f <(zcat accounting.0.gz)
-- Reuti
> --
> JY
> --
> "All ideas and opinions expressed in this communication are
> those of the author alone and do not necessarily reflect the
> ideas and opinions of anyone else."
>
how to read it. Any helpful insight
> much appreciated
Did you try to stop and start the qmaster?
-- Reuti
> qping -i 5 -info hpc-s 6444 qmaster 1
> 01/26/2019 01:12:18:
> SIRM version: 0.1
> SIRM message id: 1
> start time: 01/26/2
as no access (as the permission bits look fine). Essentially
the exporting NFS machine will deny access according to certain permission
bits. Does the line from above contain a plus sign on the exporting
machine, like:
-rwxr-xr-x+ 1 root root 1941408 Feb 28 2016 /opt/sge/bin/lx-amd64/qh
> Am 24.01.2019 um 20:29 schrieb David Triimboli :
>
> On 1/24/2019 2:05 PM, Reuti wrote:
>> Do the permissions for the directories include the x flag and not only r?
>>
>> drwxr-xr-x 2 root root 4.0K Jan 13 2010 man1
>> drwxr-xr-x 2 root root 4.0K Jan 13 20
> Am 24.01.2019 um 19:28 schrieb David Triimboli :
>
> On 1/24/2019 1:14 PM, Reuti wrote:
>>> Am 24.01.2019 um 19:10 schrieb David Triimboli :
>>>
>>> On 1/24/2019 12:44 PM, Reuti wrote:
>>>> Hi,
>>>>
>>>>> Am 24.0
> Am 24.01.2019 um 19:28 schrieb David Triimboli :
>
> On 1/24/2019 1:14 PM, Reuti wrote:
>>> Am 24.01.2019 um 19:10 schrieb David Triimboli :
>>>
>>> On 1/24/2019 12:44 PM, Reuti wrote:
>>>> Hi,
>>>>
>>>>> Am 24.0
> Am 24.01.2019 um 19:10 schrieb David Triimboli :
>
> On 1/24/2019 12:44 PM, Reuti wrote:
>> Hi,
>>
>>> Am 24.01.2019 um 18:28 schrieb David Triimboli :
>>>
>>> This is just a silly question. Using Son of Grid Engine 8.1.9, I installed
>>
? How do I get it to do
> so?
What is the output of the command:
manpath
on both machines?
-- Reuti
d memory listed by `ipcs`
and it starts to swap?
-- Reuti
>
> Regards,
>
> Derek
> -Original Message-
> From: Reuti
> Sent: January 18, 2019 11:26 AM
> To: Derek Stephenson
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Dilemma with e
> Am 18.01.2019 um 18:06 schrieb David Triimboli :
>
> On 1/18/2019 11:49 AM, Reuti wrote:
>>> Am 18.01.2019 um 17:41 schrieb David Triimboli :
>>>
>>> On 1/18/2019 11:22 AM, Reuti wrote:
>>>> Hi,
>>>>
>>>>> Am 1
> Am 18.01.2019 um 17:41 schrieb David Triimboli :
>
> On 1/18/2019 11:22 AM, Reuti wrote:
>> Hi,
>>
>>> Am 18.01.2019 um 17:09 schrieb David Triimboli :
>>>
>>> Hi, all. I've got a twenty-four-node cluster running versions of CentOS 5
>>
> Am 18.01.2019 um 16:26 schrieb Derek Stephenson
> :
>
> Hi Reuti,
>
> I don't believe anyone has adjusted the scheduler from defaults but I see:
> schedule_interval 00:00:04
> flush_submit_sec 1
> flush_finish_sec
ript inside SGE isn't prepared for
your actual kernel, i.e. a case for 4.* kernels is missing. What does:
$ $SGE_ROOT/util/arch
return?
-- Reuti
cket fd 7
For interactive jobs: any firewall in place, blocking the communication between
the submission host and the exechost – maybe switched on at a later point in
time? SGE will use a random port for the communication. After the reboot it
worked instantly again?
-- Reuti
> Now I've seen a serie
> Am 11.01.2019 um 00:30 schrieb Derrick Lin :
>
> Hi Reuti
>
> Thanks for the input. But how does this help on troubleshooting the prolog
> script?
You asked for the meaning of the "-i" option, and I tried to outline its
behavior.
-- Reuti
> I will also tr
> Am 09.01.2019 um 23:39 schrieb Derrick Lin :
>
> Hi Reuti and Iyad,
>
> Here is my prolog script, it just does one thing, setting quota on the XFS
> volume for each job:
>
> The prolog_exec_xx_xx.log file was generated, so I assumed the first exec
> comman
Hi,
Am 09.01.2019 um 23:35 schrieb Derrick Lin:
> Hi Reuti,
>
> I have to say I am still not familiar with the "-i" in qsub after reading the
> man page, what does it do?
It will be fed as stdin to the jobscript. Hence:
$ qsub -i myfile foo.sh
is like:
$ foo.sh
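The idea can be tried without SGE at all; a sketch with plain redirection (file and script names are made up):

```shell
# A stand-in "job script" that just copies its stdin to stdout.
printf 'alpha\nbeta\n' > myinput.txt
cat > jobscript.sh <<'EOF'
#!/bin/sh
cat    # whatever arrives on stdin is what "-i" supplied
EOF
sh jobscript.sh < myinput.txt   # roughly what `qsub -i myinput.txt jobscript.sh` arranges
rm -f myinput.txt jobscript.sh
```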
ed.
Is there any statement in the prolog which could wait for stdin – and in a
batch job there is just no stdin, hence it continues? Could be tested with "-i"
to a batch job.
-- Reuti
> qsub job are working fine.
>
> Any idea will be appreciated
>
> Cheers,
> Derrick
n
interactive.q; maybe you can remove the PE smp there, unless you want to use it
interactively too.
-- Reuti
> $ qconf -sp smp
> pe_name            smp
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_
ed the session: rsh, ssh or built-in. IIRC the
last `if [ -n "$MYJOBID" ];` section had only the purpose to display a message,
which was set with "-ac" during submission and might not be necessary here.
-- Reuti
MYPARENT=`ps -p $$ -o ppid --no-header`
#MYPARENT=`ps -p $MYPARENT -o
Am 07.12.2018 um 21:33 schrieb Derrick Lin:
> Reuti,
>
> My further tests confirm that $TMP is set inside PROLOG, $TMPDIR is not.
>
> Both $TMPDIR and $TMP are set in job's environment.
>
> So technically my problem is solved by switching to $TMP.
>
ot receive TMPDIR which should be created by
> the scheduler.
Is $TMP set?
-- Reuti
>
> Other variables such as JOB_ID, PE_HOSTFILE are available though.
>
> We have been using the same script on the CentOS6 cluster with OGS/GE
> 2011.11p1 without an issue
I found my entry about this:
https://arc.liv.ac.uk/trac/SGE/ticket/570
-- Reuti
> Am 06.12.2018 um 19:03 schrieb Reuti :
>
> Hi,
>
>> Am 06.12.2018 um 18:36 schrieb Dan Whitehouse :
>>
>> Hi,
>> I've been running some MPI jobs and I expected that whe
were supposed to run in a dedicated cleaner.q
only with no limits regarding slots (hence they started as soon as they were
eligible to start), but got a job hold on the actual job which submitted them
to wait until it finished.
-- Reuti
>
> --
> Dan Whitehouse
> Research System
slots limited:
$ qconf -se global
especially the line "complex_values".
And next: any RQS?
$ qconf -srqs
-- Reuti
> El jue., 6 dic. 2018 a las 12:55, Reuti ()
> escribió:
>
> > Am 06.12.2018 um 15:19 schrieb Dimar Jaime González Soto
> > :
> >
> >
ooks fine. So we have other settings to investigate:
$ qconf -sconf
#global:
execd_spool_dir /var/spool/sge
...
max_aj_tasks 75000
Is max_aj_tasks limited in your setup?
-- Reuti
>
> El jue., 6 dic. 2018 a las 11:13, Reuti ()
> escribió:
>
11:04:02
>1 17-60:1
Aha, so they are running already on remote nodes – fine. As the setting in the
queue configuration is per host, this should work and provide more processes
per node instead of four.
Is there a setting for the exechosts:
qconf -se ubuntu-node2
or a running job. There should be a line for each
executing task, while the waiting ones are abbreviated in one line.
-- Reuti
>
> I would try running qalter -w p against the job id to see what it says.
>
> William
>
>
>
>>
>>> Am 05.12.2018 um 19:10 s
or the
communication? The PE would deliver a hostlist to the application, which then
can be used to start processes on other nodes too. Some MPI libraries even
discover
BTW: as you wrote "16 logical threads": often it's advisable for HPC to disable
Hyperthr
with 744 permissions.
Were the sge_execd on these new machines started by root or sgeadmin?
$ ps -e f -o user,ruser,command | grep sge
sgeadmin root /usr/sge/bin/lx24-em64t/sge_execd
root root \_ /bin/sh /usr/sge/cluster/tmpspace.sh
-- Reuti
>
>
> The user directories,
ob_is_first_task FALSE
> urgency_slots min
> accounting_summary FALSE
>
> Then we have run our application "Maker" like this,
> qsub -cwd -N -b y -V -pe mpi /opt/mpich-install/bin/mpiexec
> maker
Which version of MPICH are you using? Maybe it's not tightly integrat
Hi,
> Am 23.10.2018 um 20:31 schrieb Dj Merrill :
>
> Hi Reuti,
> Thank you for your response. I didn't describe our environment very
> well, and I apologize. We only have one queue. We've had a few
> instances of people forgetting they ran a job that doesn't app
There is an introduction to use the checkpoint interface here:
https://arc.liv.ac.uk/SGE/howto/checkpointing.html
-- Reuti
users can check the result of the computation even
without logging into the cluster.
e) this mail-wrapper script needs again to be defined with: qconf -mconf
mailer /usr/sge/cluster/mailer.sh
-- Reuti
.
-- Reuti
Von meinem iPhone gesendet
> Am 15.09.2018 um 19:15 schrieb Simon Matthews :
>
> Is there any way to bring up an execd node, without explicitly
> configuring it at the qmaster? Perhaps it could come up and be added
> to a default queue?
>
> If it is possible to do
ombine processes (like for MPI) and
threads (like for OpenMP).
In your case, it looks to me that you assume the necessary cores are
available, independent of the actual usage of each node?
-- Reuti
PS: I assume with CPUS you refer to CORES.
> I have written up an article at:
>
etup file?
>
> Running on Ubuntu. Installed by `sudo apt-get install gridengine-master
> gridengine-client` and accepted all defaults.
I have no clue about the Ubuntu issue. But usually you have to run a setup
beforehand twice: once for the master, once for the client. Do you have
se), the
jobscript is first transferred by SGE's protocol to the node, where the execd
writes the jobscript in the shared space, which is on the headnode again.
If you peek into the given file, you will hence find the original jobscript of
the user. Does the jobscript try to modify itself, and the
s and they ran and completed without a problem.
Could it be a race condition with the shared file system?
-- Reuti
> I am wondering what may have caused this situation in general?
>
> Cheers,
> Derrick
> ___
> users mailing list
> users
> Am 01.08.2018 um 03:06 schrieb Derrick Lin :
>
> HI Reuti,
>
> The prolog script is set to run by root indeed. The xfs quota requires root
> privilege.
>
> I also tried the 2nd approach but it seems that the addgrpid file has not
> been created when the prolog
> Am 30.07.2018 um 02:31 schrieb Derrick Lin :
>
> Hi Reuti,
>
> The approach sounds great.
>
> But the prolog script seems to be run by root, so this is what I got:
>
> XFS_PROJID:uid=0(root) gid=0(root) groups=0(root),396(sfcb)
This is quite unusual. Do
> Am 28.07.2018 um 03:00 schrieb Derrick Lin :
>
> Thanks Reuti,
>
> I know little about group ID created by SGE, and also pretty much confused
> with the Linux group ID.
Yes, SGE assigns a conventional group ID to each job to track the CPU and
memory consumpt
e.
This can be found in the `id` command's output, or under the execd_spool_dir
spool directory at
${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
-- Reuti
> That's why I am trying to implement the xfs_projid to be independent from SGE.
>
>
>
> On Thu, J
did for generating project ID:
>
> XFS_PROJID_CF="/tmp/xfs_projid_counter"
>
> echo $JOB_ID >> $XFS_PROJID_CF
> xfs_projid=$(wc -l < $XFS_PROJID_CF)
The xfs_projid is then the number of lines in the file? Why not use $JOB_ID
directly? Is there a limit on the max. project ID an
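A minimal sketch of that suggestion, assuming the prolog may simply reuse the job id (the JOB_ID value and the xfs_quota call in the comment are illustrative):

```shell
JOB_ID=123456                # set by SGE in the real prolog environment
xfs_projid=$JOB_ID           # unique per job, no shared counter file needed
echo "projid=$xfs_projid"
# then e.g.: xfs_quota -x -c "project -s -p $TMPDIR $xfs_projid" /volume
```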
,56,98,134 failed to finish.
>
> What can I do to only run the failed job now? Can I use -t option in anyway
> or do I have to submit it one by one?
You have to submit them one by one, possibly in a `for` loop, but you can use
-t to specify the index to be used, at least as a single number.
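A sketch of such a loop; the task ids here are stand-ins (the original list was truncated), and the `echo` only prints the commands – drop it to actually submit:

```shell
# Resubmit each failed array task individually via -t.
for i in 56 98 134; do
  echo qsub -t "$i" myjob.sh
done
```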
> Am 11.06.2018 um 18:43 schrieb Ilya M <4ilya.m+g...@gmail.com>:
>
> Hello,
>
> Thank you for the suggestion, Reuti. Not sure if my users' pipelines can deal
> with multiple job ids, perhaps they will be willing to modify their code.
Also other commands in SGE
to get all the runs listed
though.
-- Reuti
> This is my test script:
>
> #!/bin/bash
>
> #$ -S /bin/bash
> #$ -l s_rt=0:0:5,h_rt=0:0:10
> #$ -j y
>
> set -x
> set -e
> set -o pipefail
> set -u
>
> trap "exit 99" SIGUSR1
>
>
taining all GPU nodes:
qconf -mattr queue calendar wartung common@@myGPUgroup
-- Reuti
> Ilya.
>
>
> On Wed, Jun 6, 2018 at 2:41 AM, Mark Dixon wrote:
> On Tue, 5 Jun 2018, Ilya M wrote:
> ...
> Is there a way to submit AR when there are projects attached to queues? I am
l /var/spool/sge
The default in the script itself I set to:
UNCONFIGURED=no
ACTION_ON=2
ACTIONSIZE=1024
KEEPOLD=3
Note: to read old accounting files in `qacct` on the fly you can use:
$ qacct -o reuti -f <(zcat /usr/sge/default/common/accounting.1.gz)
-- Reuti
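The `<(zcat …)` idiom itself can be tried without SGE; bash (or another shell with process substitution) is required, and the file here is a throwaway stand-in for a rotated accounting file:

```shell
tmp=$(mktemp)
printf 'job1:ok\n' | gzip > "$tmp.gz"
cat <(zcat "$tmp.gz")    # same pattern as: qacct -f <(zcat accounting.1.gz)
rm -f "$tmp" "$tmp.gz"
```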
>
> Thank
B instead of being left as unlimited. You
can compare the limits for the interactive access and a job submission by using:
echo hard limits
ulimit -aH
echo soft limits
ulimit -aS
-- Reuti
>
> On Fri, May 04, 2018 at 01:45:24PM +, Simon Andrews wrote:
>> I've got a strange pro
e recent discussion about the problem of using load values and
RQS which could prevent scheduling. But the value could be used for any alarm
or suspend threshold.
-- Reuti
Hi,
Am 20.04.2018 um 22:55 schrieb Ilya M:
> Hi Reuti,
>
> There are dozens of hosts in @gpu. In my test submissions, however, I am
> using only one host that I specify with '-l hostname='. I disabled all other
> queues on this host to make sure nothing else but my test j
1 - 100 of 1847 matches