I'm trying to set up TORQUE 2.1.8, and every time I submit a job from the pbs_server node I get the error "qsub: Bad UID for job execution" along with the following log entries:

03/26/2010 11:26:27;0100;PBS_Server;Req;;Type AuthenticateUser request received from situr...@cluster.fing.edu.uy, sock=11
03/26/2010 11:26:27;0100;PBS_Server;Req;;Type QueueJob request received from situr...@cluster.fing.edu.uy, sock=10
03/26/2010 11:26:27;0080;PBS_Server;Req;req_reject;Reject reply code=15023(Bad UID for job execution), aux=0, type=QueueJob, from situr...@cluster.fing.edu.uy

I'm not submitting the job as root, and the user is shared among all the cluster nodes. When I execute `id siturria` I get the same UID on every node:

[situr...@node02 hola_mundo]$ id siturria
uid=524(siturria) gid=501(clusterusers) groups=501(clusterusers),515(pbs),516(maui)

The pbs_server node has two network interfaces: cluster.fing.edu.uy (164.73.47.186) and node01.cluster.fing (192.168.242.1). The pbs_server hostname is cluster.fing.edu.uy.

Here's my pbs_server configuration (I modified a couple of things while trying to identify the problem, so there's some unnecessary stuff in it):

Qmgr: print server
#
# Create queues and set their attributes.
#
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq resources_max.cput = 10000:00:00
set queue workq resources_max.ncpus = 64
set queue workq resources_max.nodect = 8
set queue workq resources_max.walltime = 10000:00:00
set queue workq resources_min.cput = 00:00:01
set queue workq resources_min.ncpus = 1
set queue workq resources_min.nodect = 1
set queue workq resources_min.walltime = 00:00:01
set queue workq resources_default.cput = 10000:00:00
set queue workq resources_default.ncpus = 1
set queue workq resources_default.nodect = 1
set queue workq resources_default.walltime = 10000:00:00
set queue workq resources_available.nodect = 8
set queue workq enabled = True
set queue workq started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = r...@cluster.fing.edu.uy
set server managers += situr...@cluster.fing.edu.uy
set server managers += situr...@node02.cluster.fing
set server managers += situr...@node01.cluster.fing
set server operators = r...@cluster.fing.edu.uy
set server operators += situr...@cluster.fing.edu.uy
set server operators += situr...@node02.cluster.fing
set server operators += situr...@node01.cluster.fing
set server default_queue = workq
set server log_events = 64
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.ncpus = 64
set server resources_available.nodect = 8
set server resources_available.nodes = 8
set server resources_max.ncpus = 64
set server resources_max.nodes = 8
set server scheduler_iteration = 60
set server node_check_rate = 150
set server tcp_timeout = 6
set server pbs_version = 2.1.8
set server submit_hosts = node01.cluster.fing
set server submit_hosts += cluster.fing.edu.uy
set server submit_hosts += node02.cluster.fing
set server allow_node_submit = True

Our /etc/hosts contains the following:

[situr...@cluster ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
::1             localhost.localdomain localhost
192.168.242.20  node20.cluster.fing node20
192.168.242.19  node19.cluster.fing node19
192.168.242.18  node18.cluster.fing node18
192.168.242.17  node17.cluster.fing node17
192.168.242.16  node16.cluster.fing node16
192.168.242.15  node15.cluster.fing node15
192.168.242.14  node14.cluster.fing node14
192.168.242.13  node13.cluster.fing node13
192.168.242.12  node12.cluster.fing node12
192.168.242.11  node11.cluster.fing node11
192.168.242.10  node10.cluster.fing node10
192.168.242.9   node09.cluster.fing node09
192.168.242.8   node08.cluster.fing node08
192.168.242.7   node07.cluster.fing node07
192.168.242.6   node06.cluster.fing node06
192.168.242.5   node05.cluster.fing node05
192.168.242.4   node04.cluster.fing node04
192.168.242.3   node03.cluster.fing node03
192.168.242.2   node02.cluster.fing node02
192.168.242.1   node01.cluster.fing node01 oscar_server nfs_oscar pbs_oscar
164.73.47.186   cluster.fing.edu.uy cluster

I also tried changing /var/spool/pbs/server_name from "cluster.fing.edu.uy" to "pbs_oscar", but had no luck: I still get the "Bad UID for job execution" error.

Any ideas on what could be the problem?

Regards,
Santiago.
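
P.S. In case it helps anyone looking at a similar server log, here is a rough Python sketch for pulling the reject entries out of log lines like the ones quoted above. The field names (timestamp, event code, daemon, class, object, message) are my own labels guessed from the semicolon-separated layout, not official TORQUE terminology, and `find_rejects` is just a hypothetical helper name.

```python
import re

# Sample taken from the log lines quoted earlier in this message.
SAMPLE_LOG = """\
03/26/2010 11:26:27;0100;PBS_Server;Req;;Type AuthenticateUser request received from situr...@cluster.fing.edu.uy, sock=11
03/26/2010 11:26:27;0100;PBS_Server;Req;;Type QueueJob request received from situr...@cluster.fing.edu.uy, sock=10
03/26/2010 11:26:27;0080;PBS_Server;Req;req_reject;Reject reply code=15023(Bad UID for job execution), aux=0, type=QueueJob, from situr...@cluster.fing.edu.uy
"""

# Pattern for the message part of a reject entry. The optional space after
# "from" tolerates copies of the log where that space has been lost.
REJECT_RE = re.compile(
    r"Reject reply code=(?P<code>\d+)\((?P<reason>[^)]*)\), "
    r"aux=(?P<aux>\d+), type=(?P<type>\w+), from ?(?P<sender>\S+)"
)


def find_rejects(log_text):
    """Return one dict per reject entry found in log_text."""
    rejects = []
    for line in log_text.splitlines():
        # Assumed layout: timestamp;event;daemon;class;object;message
        fields = line.split(";", 5)
        if len(fields) != 6:
            continue  # not a line in the assumed layout
        timestamp, _event, _daemon, _cls, obj, message = fields
        if obj != "req_reject":
            continue
        match = REJECT_RE.search(message)
        if match:
            rejects.append({
                "timestamp": timestamp,
                "code": int(match.group("code")),
                "reason": match.group("reason"),
                "type": match.group("type"),
                "sender": match.group("sender"),
            })
    return rejects


if __name__ == "__main__":
    for entry in find_rejects(SAMPLE_LOG):
        print(entry["timestamp"], entry["code"], entry["reason"], entry["sender"])
```

The sender field is the part I'd look at first, since it shows which user@host the server thinks the request came from.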