jobs. Schedd_job_info is already false.
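For reference, the value can be confirmed through the scheduler configuration; a minimal sketch, assuming a standard install with qconf on the PATH:

qconf -ssconf | grep schedd_job_info   # show the current scheduler setting
qconf -msconf                          # edit it interactively if it needs changing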
Mfg,
Juan Jimenez
System Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
On 29.06.17, 15:47, "Mark Dixon" <m.c.di...@leeds.ac.uk> wrote:
On Tue, 27 Jun 2017, juanesteban.jime...@mdc-berlin.de wrote:
On 28.06.17, 12:12, "William Hay" <w@ucl.ac.uk> wrote:
On Wed, Jun 28, 2017 at 08:35:52AM +, juanesteban.jime...@mdc-berlin.de
wrote:
> I figured it would complain if I did that live so I did shut it down
first. Good advice anyway.
>
> It wasn't one
, this will reset the master job list and give me back
control?
Mfg,
Juan Jimenez
System Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
On 27.06.17, 11:12, "William Hay" <w@ucl.ac.uk> wrote:
On Tue, Jun 27, 2017 at 08:44:30AM +, juanest
2800
On 27.06.17, 10:41, "William Hay" <w@ucl.ac.uk> wrote:
On Tue, Jun 27, 2017 at 08:30:55AM +, juanesteban.jime...@mdc-berlin.de
wrote:
> Never mind. One of my users submitted a job with 139k subjobs.
>
> A few other questions:
>
the existing data in /opt/sge?
Mfg,
Juan Jimenez
System Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
On 27.06.17, 10:04, "SGE-discuss on behalf of
juanesteban.jime...@mdc-berlin.de" <sge-discuss-boun...@liverpool.ac.uk on
behalf of juanesteban.jime...@
I’ve got a problem with my qmaster. It is running but is unresponsive to
commands like qstat. The process status is mostly D for disk sleep, and when I
run it in non-daemon debug mode it spends a LOT of time reading the
Master_Job_List.
Any clues?
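One way to narrow this down, as a rough sketch (the PID is found at run time; nothing below is specific to this cluster), is to confirm the kernel state and look at what the daemon is reading:

pgrep -x sge_qmaster                                 # find the qmaster PID
grep State /proc/$(pgrep -x sge_qmaster)/status      # D = uninterruptible I/O wait
ls -l /proc/$(pgrep -x sge_qmaster)/fd               # open spool/job files
# strace -p <pid> -e trace=read,openat               # optionally watch the reads live (can slow the daemon further)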
Mfg,
Juan Jimenez
System Administrator, BIH
Esteban
Cc: sge-disc...@liverpool.ac.uk
Subject: Re: [SGE-discuss] Ulimit -u in qrsh
Are the system's limits in effect for these login sessions, which could be
lower? Do the system's limits match these settings?
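A quick comparison along those lines, assuming qrsh can reach an execution host (a sketch only):

ulimit -u                             # limit on the submission/login host
qrsh bash -c 'ulimit -u'              # limit inside an interactive grid job
grep nproc /etc/security/limits.conf  # PAM limits that may apply to login sessions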
-- Reuti
> Am 09.06.2017 um 14:02 schrieb "juanesteban.jime...@mdc-b
17 8:08 PM,
"juanesteban.jime...@mdc-berlin.de<mailto:juanesteban.jime...@mdc-berlin.de>"
<juanesteban.jime...@mdc-berlin.de<mailto:juanesteban.jime...@mdc-berlin.de>>
wrote:
A daemon -is- a process...
Mfg,
Juan Jimenez
System Administrator, HPC
MDC Be
Where should I start looking to resolve this? I've got a user complaining about
this, even though I told him the util is more for the installation of the
daemon, and that he should be using hostname instead
$ /opt/sge/utilbin/lx-amd64/gethostname
error resolving local host: can't resolve
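A few standard checks for that symptom, assuming it is the node's own name that fails to resolve (a sketch):

hostname                       # name the OS reports
getent hosts $(hostname)       # does the resolver (files/DNS) know it?
grep $(hostname) /etc/hosts    # is there a static entry?
# /opt/sge/default/common/host_aliases can also map names if DNS can't be fixed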
means nothing can actually start, as every malloc() will fail with ENOMEM.
Simple.
>-Original Message-
>From: SGE-discuss [mailto:sge-discuss-boun...@liverpool.ac.uk] On Behalf Of
>juanesteban.jime...@mdc-berlin.de
>Sent: Thursday, June 01, 2017 9:49 AM
>To: Reuti <re...@st
From: Reuti [re...@staff.uni-marburg.de]
Sent: Tuesday, May 30, 2017 11:36
To: Jimenez, Juan Esteban
Cc: SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Another QRSH problem
> Am 30.05.2017 um 11:32 schrieb juanesteban.jime...@mdc-berlin.de:
>
>
Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
On 29.05.17, 19:45, "SGE-discuss on behalf of
juanesteban.jime...@mdc-berlin.de" <sge-discuss-boun...@liverpool.ac.uk on
behalf of juanesteban.jime...@mdc-berlin.de> wrote:
How is the shepherd bringing up th
To: Jimenez, Juan Esteban
Cc: SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Another QRSH problem
> Am 29.05.2017 um 18:00 schrieb juanesteban.jime...@mdc-berlin.de:
>
> On 29.05.17, 17:56, "Reuti" <re...@staff.uni-marburg.de> wrote:
>
>
>> Am 29.05.2017 um
On 29.05.17, 17:56, "Reuti" <re...@staff.uni-marburg.de> wrote:
> Am 29.05.2017 um 17:26 schrieb juanesteban.jime...@mdc-berlin.de:
>
> I am getting this very specific error:
>
> debug1: ssh_exchange_identification: /usr/sbin/sshd: error
,
Juan Jimenez
System Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
On 29.05.17, 16:39, "Reuti" <re...@staff.uni-marburg.de> wrote:
> Am 29.05.2017 um 16:08 schrieb juanesteban.jime...@mdc-berlin.de:
>
> Out of the b
BTW, to try to troubleshoot this I did the following in qconf -mconf:
rsh_command /usr/bin/ssh -Y -A -
But where does qrsh put the result of the - option?
Mfg,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
:12:42 schrieb "juanesteban.jime...@mdc-berlin.de"
<juanesteban.jime...@mdc-berlin.de>:
> I am just telling you what my colleagues say they were told by Univa.
>
> Mfg,
> Juan Jimenez
> System Administrator, HPC
> MDC Berlin /
Esteban
Cc: William Hay; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] GPUs as a resource
> Am 19.05.2017 um 16:35 schrieb juanesteban.jime...@mdc-berlin.de:
>
>> You are being told by who or what? If it is a what then the exact message
>> is helpful?
>
> By my colleagu
> It does indeed but not by a whole lot for a queue on a couple of nodes.
> Since you want to reserve these nodes for GPU users, the extra queue is
> needless.
> I suggest:
> 1. Make the GPU complex FORCED (so users who don't request a gpu can't end up
> on a node with gpus).
> 2. Define the
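A sketch of step 1, assuming a consumable complex named gpu (the name and the sample row are illustrative of the complex list format, not taken from this cluster):

qconf -mc    # opens the complex list in an editor; a FORCED consumable entry looks like:
# name  shortcut  type  relop  requestable  consumable  default  urgency
# gpu   gpu       INT   <=     FORCED       YES         0        0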
...@staff.uni-marburg.de]
Sent: Friday, May 19, 2017 16:37
To: Jimenez, Juan Esteban
Cc: Kamel Mazouzi; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] GPUs as a resource
> Am 19.05.2017 um 16:33 schrieb juanesteban.jime...@mdc-berlin.de:
>
> I put them in /opt/sge/default/common/sge-
> You are being told by who or what? If it is a what then the exact message is
> helpful?
By my colleagues who are running a 2nd cluster using Univa GridEngine. This was
a warning from Univa not to do it that way because it increases qmaster workload.
Juan
...@staff.uni-marburg.de]
Sent: Friday, May 19, 2017 14:20
To: Jimenez, Juan Esteban
Cc: Kamel Mazouzi; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] GPUs as a resource
Hi,
> Am 18.05.2017 um 14:15 schrieb juanesteban.jime...@mdc-berlin.de:
>
> I tried it according to the instructions, but it w
So, I now have a working gpu.q. However, users in the ACL eat up slots even if
they have not requested a gpu resource. How do I keep out jobs that do not
specifically request a gpu? I only want jobs to run on that queue/node if they
want to use one of the two GPUs.
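Making the complex FORCED, as suggested above, keeps jobs that do not request it off the node; to see which jobs actually requested it and how much of the consumable remains, something like this should work (the complex name gpu and the job id are placeholders):

qstat -F gpu -q gpu.q             # remaining gpu consumable per queue instance
qstat -j <jobid> | grep resource  # did the job request -l gpu=1 ?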
thanks!
Juan
for prolog and epilog or ??
Mfg,
Juan Jimenez
System Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
From: Kamel Mazouzi <mazo...@gmail.com>
Date: Thursday, 18. May 2017 at 13:07
To: "Jimenez, Juan Esteban" <juanesteban.jime...@mdc-berlin.de&
16.05.2017 um 22:07 schrieb juanesteban.jime...@mdc-berlin.de:
> In our cluster we have one node with two Nvidia GPUs. I have been trying to
> figure out how to set them up as consumable resources tied to an ACL, but I
> can't get SGE to handle them correctly. It always says the resource is not
...@liverpool.ac.uk
Subject: Re: [SGE-discuss] Tying resource use to AD/Linux groups
Hi Juan,
On 16 May 2017 at 12:32,
juanesteban.jime...@mdc-berlin.de<mailto:juanesteban.jime...@mdc-berlin.de>
<juanesteban.jime...@mdc-berlin.de<mailto:juanesteban.jime...@mdc-berlin.de>>
wrot
In our cluster we have one node with two Nvidia GPUs. I have been trying to
figure out how to set them up as consumable resources tied to an ACL, but I
can't get SGE to handle them correctly. It always says the resource is not
available.
Can someone walk me through the steps required to set
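A rough outline of the usual recipe, with every name below (gpu, gpunode01, gpu_users, gpu.q, alice, bob) a placeholder rather than something from this cluster:

qconf -mc                                              # 1. add a consumable:  gpu  gpu  INT  <=  YES  YES  0  0
qconf -aattr exechost complex_values gpu=2 gpunode01   # 2. the node advertises its two GPUs
qconf -au alice,bob gpu_users                          # 3. create/extend an access list
qconf -mq gpu.q                                        # 4. set user_lists and hostlist on the queue
qsub -l gpu=1 job.sh                                   # 5. jobs then request a GPU explicitly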
Has anyone ever managed to tie permission to use a resource like GPUs on a
node to membership in an Active Directory and/or Linux Group?
Mfg,
Juan Jimenez
System Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
rs/loveshack/SGE/
Thanks & Regards
Yasir Israr
-Original Message-
From: juanesteban.jime...@mdc-berlin.de
[mailto:juanesteban.jime...@mdc-berlin.de]
Sent: 27 April 2017 04:00 PM
To: ya...@orionsolutions.co.in; 'Maximilian Friedersdorff';
sge-dis
oun...@liverpool.ac.uk] On Behalf
> Of juanesteban.jime...@mdc-berlin.de
> Sent: Wednesday, April 12, 2017 5:15 PM
> To: William Hay <w@ucl.ac.uk>
> Cc: SGE-discuss@liv.ac.uk <sge-disc...@liverpool.ac.uk>
> Subject: Re: [SGE-discuss]
On 12.04.17, 10:21, "William Hay" <w@ucl.ac.uk> wrote:
On Tue, Apr 11, 2017 at 05:11:58PM +, juanesteban.jime...@mdc-berlin.de
wrote:
> I've got a serious problem here with authentication with AD and
Kerberos. I have already done away with all the possibilities I ca
-marburg.de]
Sent: Sunday, April 09, 2017 17:09
To: Jimenez, Juan Esteban
Cc: Jesse Becker; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Hi,
Am 09.04.2017 um 12:38 schrieb juanesteban.jime...@mdc-berlin.de:
> Update.
>
> We
o: Jimenez, Juan Esteban
Cc: Jesse Becker; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster
> Am 21.03.2017 um 16:15 schrieb juanesteban.jime...@mdc-berlin.de:
>
>> The "size" of job metadata (scripts, ENV, etc) doesn't really affect
>> the RAM
>The "size" of job metadata (scripts, ENV, etc) doesn't really affect
>the RAM usage appreciably that I've seen. We routinely have job
>ENVs of almost 4k or more, and it's never been a problem. The
>"data" processed by jobs isn't a factor in qmaster RAM usage, so far as
>I
From: SGE-discuss [sge-discuss-boun...@liverpool.ac.uk] on behalf of
juanesteban.jime...@mdc-berlin.de [juanesteban.jime...@mdc-berlin.de]
Sent: Tuesday, March 21, 2017 09:41
To: Jesse Becker
Cc: SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster
+, juanesteban.jime...@mdc-berlin.de
wrote:
>Hi folks,
>
>I just ran into my first episode of the scheduler crashing because of too many
>submitted jobs. It pegged memory usage to as much as I could give it (12 GB at
>one point) and still crashed while it tried to work
Hi folks,
I just ran into my first episode of the scheduler crashing because of too many
submitted jobs. It pegged memory usage to as much as I could give it (12 GB at
one point) and still crashed while it tried to work its way through the stack.
I need to figure out how to size a box properly
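One possible mitigation is capping how much the qmaster has to track; these are standard sge_conf(5) global-configuration parameters, and the numbers below are only illustrative:

qconf -mconf   # global configuration; relevant knobs include:
#   max_jobs          500000   # total jobs allowed in the system (0 = unlimited)
#   max_u_jobs        20000    # jobs a single user may have in the system
#   max_aj_tasks      75000    # tasks allowed per array job
#   max_aj_instances  2000     # array tasks running concurrently per job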
Today I did some more testing and the problem appears to be specific to GPFS.
I changed the script to put the logs in a folder on an NFS share and *without*
the throttling, there are no errors.
Juan
On 02/02/2017, 00:23, "SGE-discuss on behalf of
juanesteban.jime...@mdc-berlin.de&
Hi Folks,
New to the list! I am the sysadmin of an HPC cluster using SGE 8.1.8. The
cluster has 100+ nodes running CentOS 7 with a shared DDN storage cluster
configured as a GPFS device and a number of NFS mounts to a CentOS 7 server.
Some of my users are reporting problems with qsub that have