Reuti wrote:
> Am 28.02.2011 um 17:58 schrieb Prentice Bisbal:
> 
>> Hello Everyone. I'm using SGE 6.2u3. One of my users suddenly
>> reported getting this error when running Mathematica jobs (which use DRMAA):
>>
>>
>> Java::excptn: A Java exception occurred:
>>    org.ggf.drmaa.DeniedByDrmException: error: no suitable queues  .
>>        at com.sun.grid.drmaa.SessionImpl.nativeRunJob(Native Method)
>>        at com.sun.grid.drmaa.SessionImpl.runJob(SessionImpl.java:349)
>>
>>
>> Any ideas what could be causing this? Not sure exactly how to debug
>> DRMAA problems, and I can't think of anything I've changed in the past
>> few months that would affect this at all. I have a lot of users who used
>> gridMathematica heavily in the past, so this definitely worked before...
> 
> This doesn't look like a DRMAA error per se - the job requests something 
> which can't be satisfied. 
> 

That's what I thought, too, but I can't find anything it could be
requesting that can't be satisfied. It's only running on 4 processors,
and I can run other jobs; only this one, which uses DRMAA, fails.

I should point out that the job starts to run, then fails when DRMAA
is invoked. Here's the job script I'm submitting:

#!/bin/bash
#$ -V
#$ -cwd
#$ -m abe
#$ -l h_rt=00:10:00
#$ -R y

export CLASSPATH=/usr/local/share/sge/lib/drmaa.jar:/usr/local/share/sge/lib/jgdi.jar:/usr/local/share/sge/lib/juti.jar
export LD_LIBRARY_PATH=/usr/local/share/sge/lib/lx24-amd64

/usr/local/share/bin/math < factor.m

And here's the Mathematica file:

Needs["ClusterIntegration`"]
kernels = LaunchKernels[SGE["aurora.sns.ias.edu"], 4]
ParallelMap[FactorInteger, (10^Range[20, 30] - 1)/9]
CloseKernels[]

And here's more of the output:

Mathematica 7.0 for Linux x86 (64-bit)
Copyright 1988-2009 Wolfram Research, Inc.

In[1]:=
In[2]:=
Java::excptn: A Java exception occurred:
    org.ggf.drmaa.DeniedByDrmException: error: no suitable queues  .
        at com.sun.grid.drmaa.SessionImpl.nativeRunJob(Native Method)
        at com.sun.grid.drmaa.SessionImpl.runJob(SessionImpl.java:349)

Out[2]= $Failed

In[3]:=
ParallelCombine::nopar:
   No parallel kernels available; proceeding with sequential evaluation.

Out[3]= {{{11, 1}, {41, 1}, {101, 1}, {271, 1}, {3541, 1}, {9091, 1},
... the rest is just the answer...

So the problem seems to lie somewhere in the DRMAA invocation. How can I
debug/troubleshoot this? The job doesn't run long enough for me to do a
qstat -j <jobid>, and I wonder whether that would give me anything useful
in this case anyway.
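
One possible way to narrow this down, sketched here as an assumption
rather than a known fix: qsub's "-w v" option validates a request
against the cluster configuration without submitting anything, so the
same resource requests can be tested by hand. The helper name
check_request and the DRY_RUN variable below are hypothetical, and the
requests that ClusterIntegration actually passes through DRMAA are not
known from the output above, so the arguments have to be guessed or
looked up.

```shell
#!/bin/bash
# Sketch only: check_request is a hypothetical helper, not part of SGE.
# It wraps "qsub -w v", which validates a request without submitting a job.
check_request() {
    # /bin/true is a placeholder command; -b y treats it as a binary.
    local cmd=(qsub -w v -b y "$@" /bin/true)
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it (no cluster needed).
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, "check_request -l h_rt=00:10:00" would test the run-time
limit from the job script above. For jobs that have already finished,
"qacct -j <jobid>" reads the accounting record, which may help where
qstat -j is too slow to catch the job.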

Is there any special configuration needed to enable DRMAA in SGE? I
don't remember ever setting anything, and this exact program has been
working without a single hiccup for over a year, but it could be that I
accidentally changed a setting.

-- 
Prentice
