Reuti wrote:
> On 28.02.2011 at 17:58, Prentice Bisbal wrote:
>
>> Hello Everyone. I'm using SGE 6.2u3. One of my users suddenly
>> reported getting this error when running Mathematica jobs (which use DRMAA):
>>
>>
>> Java::excptn: A Java exception occurred:
>> org.ggf.drmaa.DeniedByDrmException: error: no suitable queues .
>> at com.sun.grid.drmaa.SessionImpl.nativeRunJob(Native Method)
>> at com.sun.grid.drmaa.SessionImpl.runJob(SessionImpl.java:349)
>>
>>
>> Any ideas what could be causing this? Not sure exactly how to debug
>> DRMAA problems, and I can't think of anything I've changed in the past
>> few months that would affect this at all. I have a lot of users who used
>> gridMathematica heavily in the past, so this definitely worked before...
>
> This doesn't look like a DRMAA error per se - the job requests something
> which can't be satisfied.
>
That's what I thought, too, but I can't find anything that it could be
requesting that can't be satisfied. It's only running on 4 processors,
and I can run other jobs; it's only this one, which uses DRMAA, that hangs.
I should point out that the job starts to run, then fails when DRMAA
is invoked. Here's the jobscript I'm submitting:

#!/bin/bash
#$ -V
#$ -cwd
#$ -m abe
#$ -l h_rt=00:10:00
#$ -R y
export CLASSPATH=/usr/local/share/sge/lib/drmaa.jar:/usr/local/share/sge/lib/jgdi.jar:/usr/local/share/sge/lib/juti.jar
export LD_LIBRARY_PATH=/usr/local/share/sge/lib/lx24-amd64
/usr/local/share/bin/math < factor.m

And here's the Mathematica file:

Needs["ClusterIntegration`"]
kernels = LaunchKernels[SGE["aurora.sns.ias.edu"], 4]
ParallelMap[FactorInteger, (10^Range[20, 30] - 1)/9]
CloseKernels[]

And here's more of the output:

Mathematica 7.0 for Linux x86 (64-bit)
Copyright 1988-2009 Wolfram Research, Inc.
In[1]:=
In[2]:=
Java::excptn: A Java exception occurred:
org.ggf.drmaa.DeniedByDrmException: error: no suitable queues .
at com.sun.grid.drmaa.SessionImpl.nativeRunJob(Native Method)
at com.sun.grid.drmaa.SessionImpl.runJob(SessionImpl.java:349)
Out[2]= $Failed
In[3]:=
ParallelCombine::nopar:
No parallel kernels available; proceeding with sequential evaluation.
Out[3]= {{{11, 1}, {41, 1}, {101, 1}, {271, 1}, {3541, 1}, {9091, 1},
... the rest is just the answer...
So the problem seems to lie somewhere in the DRMAA invocation. How can I
debug/troubleshoot this? The job doesn't run long enough to do a qstat
-j <jobid>, and I wonder if that would give me anything useful in this
case.
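
One thing that might help narrow this down, as a rough sketch rather than a known fix: list the requests embedded in the job script and replay them through qsub's verify mode ("-w v"), which validates a request without submitting anything. This assumes the rejection can also be reproduced with plain qsub, which has not been confirmed here. The function names and the /bin/true probe job are made up for illustration and are not part of SGE.

```shell
#!/bin/bash
# Sketch only: the helper names below are hypothetical, not part of SGE.

# Print the embedded "#$" requests of a job script, one per line.
sge_requests() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Replay those requests for a trivial binary job in verify mode.
# "-w v" asks qsub to validate the request without submitting it.
probe_requests() {
    local -a opts
    # Split the request lines into separate qsub arguments.
    read -r -a opts <<< "$(sge_requests "$1" | tr '\n' ' ')"
    if command -v qsub >/dev/null 2>&1; then
        qsub -w v "${opts[@]}" -b y /bin/true
    else
        # Assumption: on a machine without SGE, just show the command.
        echo "qsub not found; would run: qsub -w v ${opts[*]} -b y /bin/true"
    fi
}
```

Calling probe_requests with the path to the job script, both on the submit host and from inside a running job just before the math line, should show whether the same requests are accepted in both places.
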

Is there any special configuration needed to enable DRMAA in SGE? I
don't remember ever setting anything, and this exact program has been
working without a single hiccup for over a year, but it could be that I
accidentally changed a setting.

--
Prentice