On 12/30/11 05:57 PM, Reuti wrote:
On 30.12.2011 at 16:46, Gerard Henry wrote:

On 12/30/11 04:21 PM, Reuti wrote:
On 30.12.2011 at 16:19, Gerard Henry wrote:

On 12/30/11 04:04 PM, Reuti wrote:
On 30.12.2011 at 15:53, Gerard Henry wrote:

On 12/30/11 03:39 PM, Reuti wrote:
On 30.12.2011 at 15:29, Gerard Henry wrote:

Hello all,
I'm having trouble with a resource limit, and I can't find where it is set.
Several of my jobs are being killed, as reported in the SGE error file:
/local/export/sge/default/spool/charybde/job_scripts/14561: line 12: 25776 Killed ./benchruntime

Is there more memory installed than 4G? It could be the oom-killer. Can you 
please check /var/log/messages of the exechosts.
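
A minimal sketch of that check, assuming a Linux exechost whose kernel messages are written to /var/log/messages. The helper name find_oom_kills and the search patterns are illustrative assumptions; the exact wording of OOM-killer messages differs between kernel versions, so an empty result only means these patterns did not match.

```shell
#!/bin/sh
# find_oom_kills: hypothetical helper, not part of SGE.
# Prints lines from a syslog file that look like OOM-killer activity.
find_oom_kills() {
    logfile="${1:-/var/log/messages}"
    # The patterns are an assumption; kernels phrase these messages differently.
    grep -iE 'oom-killer|out of memory|killed process' "$logfile"
}

# Example (run as a user who may read the log on the exechost):
#   find_oom_kills /var/log/messages
```

If the log has been rotated, the same function can be pointed at the older file by passing its path as the first argument.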


Yes, the exechosts have 8 GB. There is nothing in 
/local/export/sge/default/spool/qmaster/messages or in /var/log/messages. All 
the killed jobs show a maxvmem of 4 GB, even though the job was not submitted with a limit:
qsub -q long myjob

Is this by any chance only a 32-bit application?

$ file ./benchruntime



Not really:
-sh-3.2$ file ./benchruntime-0.70
./benchruntime-0.70: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

Okay, then: any ulimit in effect already for the running execd?

$ ulimit -aH
$ ulimit -aS

-- Reuti
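
A process inherits its limits from its parent, so limits on the running execd can carry over to the jobs it starts. One way to look at the daemon itself rather than at a login shell is sketched below. It assumes a kernel that exposes /proc/<pid>/limits (older kernels may not) and that the daemon process is named sge_execd. The helper name show_proc_limits is hypothetical.

```shell
#!/bin/sh
# show_proc_limits: hypothetical helper, not part of SGE.
# Prints the soft/hard limit table of one process.
# Assumes the kernel provides /proc/<pid>/limits.
show_proc_limits() {
    pid="$1"
    limits="/proc/$pid/limits"
    if [ -r "$limits" ]; then
        cat "$limits"
    else
        echo "cannot read $limits" >&2
        return 1
    fi
}

# Example (assumes the daemon is named sge_execd; run on the exechost):
#   show_proc_limits "$(pgrep -o sge_execd)"
```

In the resulting table, the "Max address space" row is the one to compare against the 4 GB maxvmem seen on the killed jobs.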

I don't think so:
-sh-3.2$ ulimit -aH

Is this the output of the job? In an interactive session it might be different.

BTW: What does the complete jobscript look like? Any traps or limits set, or 
does the called binary set a limit on its own?
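
To see the limits as the job sees them, a small test job can print them and be submitted the same way as the failing one. This is only a sketch: the function name report_limits is made up for illustration, and the script assumes nothing beyond a POSIX-like shell with the ulimit builtin.

```shell
#!/bin/sh
# Hypothetical test job: report the limits the job itself runs under.
# Submit it like the failing job, for example: qsub -q long <this script>
report_limits() {
    echo "== hard limits =="
    ulimit -aH
    echo "== soft limits =="
    ulimit -aS
}

report_limits
```

The output ends up in the job's output file. Comparing it with the same two commands run in an interactive shell on the exechost shows whether the batch environment adds a limit, in particular on virtual memory.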


I'm running the same test on another exechost, because I'm not sure whether older jobs ever exceeded this limit.

Thank you very much for your valuable help. I'm not sure I'll have new results soon, so let me wish a happy new year to you and to all the helpful people on this list!

gerard
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
