Hi Rad,

Are you sure that the execution daemons are running on your compute nodes? Can 
you log in to one of the nodes, say ‘compute001’, and run ps to look for the 
execd? When an execd is functioning normally it reports the load, memory, 
etc. None of your nodes are showing that.
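
To see at a glance which hosts are not reporting load, you can scan the qhost output for a "-" in the LOAD column. Below is a minimal Python sketch, assuming the plain column layout shown in your quoted output (HOSTNAME, ARCH, NCPU, LOAD, ...); the function name hosts_without_load is hypothetical and not part of Grid Engine.

```python
def hosts_without_load(qhost_output):
    """Return hostnames whose LOAD column is '-' in plain `qhost` output.

    Assumes whitespace-separated columns in the order
    HOSTNAME ARCH NCPU LOAD ..., with each host on a single line.
    """
    missing = []
    for line in qhost_output.splitlines():
        fields = line.split()
        # Skip blank lines and the dashed separator (fewer than 4 fields),
        # plus the header row and the 'global' pseudo-host.
        if len(fields) < 4 or fields[0] in ("HOSTNAME", "global"):
            continue
        # The fourth column is LOAD; '-' means no value was reported.
        if fields[3] == "-":
            missing.append(fields[0])
    return missing
```

Pass it the text printed by qhost (for example, saved to a file first). Any host it returns is one where the qmaster has no load report, which is the symptom described above.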

Regards,

Bill.

> On May 30, 2016, at 1:20 PM, Radhouane Aniba <arad...@gmail.com> wrote:
> 
> Hello all,
> 
> I am trying to submit a simple "hello world" to test a gridengine (I used it 
> before with no problems)
> 
> The problem is that my job is waiting in the queue forever
> 
> The qhost command shows a weird state of the compute nodes
> 
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> compute001              lx26-amd64      4     -   31.4G       -     0.0       -
> compute002              lx26-amd64      4     -   31.4G       -     0.0       -
> compute003              lx26-amd64      4     -   31.4G       -     0.0       -
> compute004              lx26-amd64      4     -   31.4G       -     0.0       -
> compute005              lx26-amd64      4     -   31.4G       -     0.0       -
> compute006              lx26-amd64      4     -   31.4G       -     0.0       -
> compute007              lx26-amd64      4     -   31.4G       -     0.0       -
> compute008              lx26-amd64      4     -   31.4G       -     0.0       -
> compute009              lx26-amd64      4     -   31.4G       -     0.0       -
> compute010              lx26-amd64      4     -   31.4G       -     0.0       -
> compute011              lx26-amd64      4     -   31.4G       -     0.0
> 
> Normally, even when the compute nodes are idle, I would see some 
> information in the LOAD and MEMUSE columns
> 
> I am not an SGE person but I am familiar with all the commands, any help 
> would be much appreciated
> 
> the qstat -f command shows all my nodes in au state. I've been reading a lot 
> about it and I understood it's an alarm state (overloaded ?)
> 
> the only heavy activity I had on the head node was a script downloading 19T 
> of data, could the headnode be the problem and not the compute nodes ?
> 
> sge_execd is working on all the compute/exec nodes :/
> 
> --
> Rad
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

William Bryce | VP Products
Univa Corporation, Toronto
E: bbr...@univa.com | D: 647-9742841 | Toll-Free (800) 370-5320
W: Univa.com <http://univa.com/>
FB: facebook.com/univa.corporation <http://facebook.com/univa.corporation>
T: twitter.com/Grid_Engine <http://twitter.com/Grid_Engine>

