Ok, after a little more poking around, I looked through the source and learned a few things, but I don't think any of that has made much of a difference yet. I reviewed my server definition and added ACL host information to it. Now I can submit a job and run qstat, but qsub comes back with a Bad UID message. I have synced the passwd, group, and shadow files across the machines. I submitted the job as shown below.

Any thoughts on this? I am sooo close now. Thanks for everyone's help.

Ben

[EMAIL PROTECTED] bdsimmns]$ qsub -q [EMAIL PROTECTED] -u [EMAIL PROTECTED] surface.pbs
qsub: Bad UID for job execution
[EMAIL PROTECTED] bdsimmns]$
[EMAIL PROTECTED] bdsimmns]$
[EMAIL PROTECTED] bdsimmns]$ cat surface.pbs
#PBS -N se_
#PBS -o AxiCircleNoDowel.log
#PBS -e AxiCircleNoDowel.pbs
#PBS -q [EMAIL PROTECTED]
cd /home/1/bdsimmns
echo /home/1/bdsimmns/AxiCircleNoDowel.fe
/home/1/bdsimmns/bin/evolver /home/1/bdsimmns/AxiCircleNoDowel.fe
#All done
[EMAIL PROTECTED] bdsimmns]$
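The server ACL step mentioned above might look like the following qmgr calls, run on borg.memphis.edu as a PBS manager. This is a sketch, assuming the OpenPBS server attributes acl_hosts and acl_host_enable; QMGR is a hypothetical override so the commands can be previewed with QMGR=echo first. Enabling the ACL restricts which hosts the server accepts requests from, so passing the server's own name as well is the cautious choice. Note that the host ACL is a separate check from the Bad UID error: if I read pbs_error.h correctly, that message is PBSE_BADUSER, which concerns the user mapping rather than the host.

```shell
#!/bin/sh
# Sketch: allow requests from extra hosts through the OpenPBS server host ACL.
# Assumptions: run on the pbs_server host as a PBS manager; QMGR is a
# hypothetical override (QMGR=echo prints the commands instead of running them).
QMGR=${QMGR:-qmgr}

# allow_submit_hosts HOST [HOST...]: add each host to acl_hosts, then enforce it.
allow_submit_hosts() {
    if [ "$#" -eq 0 ]; then
        echo "usage: allow_submit_hosts HOST [HOST...]" >&2
        return 1
    fi
    for host in "$@"; do
        "$QMGR" -c "set server acl_hosts += $host"
    done
    # Once enabled, only the listed hosts are accepted, so include the server too.
    "$QMGR" -c "set server acl_host_enable = True"
}

# Print the server configuration so the ACL can be checked.
show_server_config() {
    "$QMGR" -c "print server"
}

if [ "$#" -gt 0 ]; then
    allow_submit_hosts "$@"
    show_server_config
fi
```

For example, `QMGR=echo sh acl.sh viper.memphis.edu borg.memphis.edu` (the script name is hypothetical) only prints the qmgr commands, which is a safe way to review them before running for real.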


Frank Crawford wrote:

Ben,
        I'd do this on some other workstation, rather than on the cluster
itself.  Get the src.rpm and install it (rpm -ivh openpbs.src.rpm; check
the name).  You may need to do this as root if you haven't set up some
rpm macros.  After this, run "rpmbuild -bp openpbs.spec" in the SPECS
directory the spec file was installed into.  At that point you have the
patched source, which you can grep and search in other ways.

        A simple start on it is "find . -type f -exec grep PBSE_BADHOST {}
/dev/null \;" to find the error report and then go searching from there.

Frank
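Frank's steps can be wrapped in a small script. This is a sketch, assuming the spec file is named openpbs.spec and that rpmbuild is installed; prepare_source and find_error_sites are hypothetical helper names. It asks rpm for %{_topdir} rather than hard-coding where SPECS and BUILD live, and uses grep -rn as a rough equivalent of Frank's find/grep pipeline (it also prints line numbers).

```shell
#!/bin/sh
# Sketch: unpack and patch the OpenPBS source RPM, then list every place an
# error symbol such as PBSE_BADHOST is referenced.
# Assumptions: spec file is openpbs.spec; helper names are hypothetical.

# Install the source RPM and run only the %prep stage (unpack + apply patches).
# Progress output goes to stderr; stdout is just the BUILD directory path.
prepare_source() {
    srpm=$1
    rpm -ivh "$srpm" >&2 || return 1
    topdir=$(rpm --eval '%{_topdir}')
    (cd "$topdir/SPECS" && rpmbuild -bp openpbs.spec) >&2 || return 1
    echo "$topdir/BUILD"
}

# Roughly: find TREE -type f -exec grep SYMBOL {} /dev/null \;
# Output lines look like path:line:text.
find_error_sites() {
    symbol=$1
    tree=${2:-.}
    grep -rn -- "$symbol" "$tree"
}

if [ "$#" -ge 1 ]; then
    builddir=$(prepare_source "$1") || exit 1
    find_error_sites "${2:-PBSE_BADHOST}" "$builddir"
fi
```

Each hit can then be opened at the reported line to see which check raises the error.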

On Wed, 2004-03-03 at 13:06, Benjamin Simmons wrote:

How do I dig into the code as you suggest? I am OK doing this, I'm just not sure how to get started.

Ben

Frank Crawford wrote:

Ben,
        The last time I came across this sort of thing, I eventually had to
dive into the code and add extra information to find out what and why it
was complaining.  There are about 16 places where the PBSE_BADHOST (i.e.
the error you are getting) can come out.  A lot of those can probably be
dropped as they aren't the right function, but it could be any one of
the rest.

        I don't know if that is an option for you, but it might be something to
consider, even if it is just a code review you do.

Frank

On Wed, 2004-03-03 at 04:57, Benjamin Simmons wrote:

I added viper.memphis.edu and borg.memphis.edu to both the clienthosts and the restricted entries, and added the /home entries as well, although both systems share a common home directory system.
The same error message persists.
Ben

Frank Crawford wrote:

Ben,
        Have a look at the file config in /var/spool/pbs/mom_priv.  You may
need to add either a restricted or a clienthosts line for them.  See the
pbs_mom man page for more details.

Frank
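Frank's suggestion can be sketched as a helper that appends host entries to pbs_mom's config file without duplicating them. Assumptions: PBS_HOME defaults to /var/spool/pbs as in this thread, the directive spellings are $clienthost and $restricted as I recall them from the OpenPBS pbs_mom man page (worth confirming there, since the reply above says "clienthosts"), and add_mom_entry is a hypothetical name. As I understand it, pbs_mom only rereads this file on restart or SIGHUP.

```shell
#!/bin/sh
# Sketch: add "$directive host" lines to pbs_mom's config, skipping duplicates.
# Assumptions: PBS_HOME defaults to /var/spool/pbs; directive names
# $clienthost / $restricted per the pbs_mom man page (verify locally).
PBS_HOME=${PBS_HOME:-/var/spool/pbs}

# add_mom_entry DIRECTIVE HOST  -> appends "$DIRECTIVE HOST" once.
add_mom_entry() {
    cfg="$PBS_HOME/mom_priv/config"
    line="\$$1 $2"                # literal "$" followed by the directive name
    mkdir -p "$PBS_HOME/mom_priv"
    touch "$cfg"
    # -x whole-line match, -F fixed string, so "$" is not a regex anchor.
    if ! grep -qxF -- "$line" "$cfg"; then
        printf '%s\n' "$line" >> "$cfg"
    fi
}

if [ "$#" -gt 0 ]; then
    for host in "$@"; do
        add_mom_entry clienthost "$host"
        add_mom_entry restricted "$host"
    done
    echo "Restart pbs_mom (or send it SIGHUP) so it rereads its config." >&2
fi
```

Because the helper checks for an exact existing line first, it is safe to rerun on every node.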

On Tue, 2004-03-02 at 12:14, Benjamin Simmons wrote:

I added +viper.memphis.edu to /etc/hosts.equiv on borg.memphis.edu. Viper is the machine I want to submit jobs from, and borg.memphis.edu is the cluster server.

This is the error message listed in the logs on borg.memphis.edu:

03/01/2004 19:09:42;0100;PBS_Server;Req;;Type 49 request received from [EMAIL PROTECTED], sock=11
03/01/2004 19:09:42;0080;PBS_Server;Req;req_reject;Reject reply code=15008, aux=0, type=49, from [EMAIL PROTECTED]

This is the error message I received when I tried the qsub on viper:

[EMAIL PROTECTED] bdsimmns]$ qsub -q [EMAIL PROTECTED] surface.pbs
pbs_iff: error returned: 15008
pbs_iff: Access from host not allowed, or unknown host
No Permission.
qsub: cannot connect to server borg.memphis.edu (errno=15007)
[EMAIL PROTECTED] bdsimmns]$
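The numeric codes in these messages are PBS error numbers from OpenPBS's pbs_error.h. A small lookup limited to the codes seen in this thread makes logs and transcripts easier to read. The mapping reflects my reading of that header together with the text qsub prints (15008 pairs with "Access from host not allowed, or unknown host" above), so it is worth checking against the installed pbs_error.h; pbs_error_name and decode_pbs_errors are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: translate the PBS error numbers seen in this thread into names.
# Only these codes are covered; pbs_error.h has the full list.
pbs_error_name() {
    case "$1" in
        15007) echo "PBSE_PERM (No permission)" ;;
        15008) echo "PBSE_BADHOST (Access from host not allowed, or unknown host)" ;;
        15023) echo "PBSE_BADUSER (Bad UID for job execution)" ;;
        *)     echo "unknown code $1 (see pbs_error.h)" ;;
    esac
}

# Read a server log or qsub transcript on stdin, find "code=NNNNN",
# "errno=NNNNN" or "error returned: NNNNN", and decode each number found.
decode_pbs_errors() {
    grep -oE '(code=|errno=|error returned: )1[0-9]{4}' |
        grep -oE '1[0-9]{4}$' |
        while read -r code; do
            printf '%s: %s\n' "$code" "$(pbs_error_name "$code")"
        done
}
```

Piping the server log excerpt or the qsub output above through decode_pbs_errors should report 15008 as PBSE_BADHOST and 15007 as PBSE_PERM.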

I can ssh to and from the two machines without a password, and they share a common home directory system. I can submit jobs to the queue workq while logged into borg.

Any other thoughts as to things to look for?

Ben

Frank Crawford wrote:

Ben,
        Did you add something to /etc/hosts.equiv on the other server
(borg.memphis.edu)?  Anyway, I think problems with hosts.equiv show
up as a 15023 (Bad User) error.

        What are you seeing in the logs on the other server, i.e.
borg.memphis.edu?  You should either see something logged about the
connection and possibly a rejection, or you would have a deeper level
problem.

Frank
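Frank's question about the server logs can be answered with a quick grep on borg.memphis.edu. A sketch, assuming the default PBS_HOME of /var/spool/pbs and daily server log files named YYYYMMDD under server_logs (the OpenPBS layout as I understand it); show_rejects is a hypothetical helper name. grep -B1 also prints the line just before each rejection, which in the excerpt Ben posted is the incoming request naming the requester.

```shell
#!/bin/sh
# Sketch: show request rejections in a pbs_server log.
# Assumptions: PBS_HOME defaults to /var/spool/pbs; logs are
# $PBS_HOME/server_logs/YYYYMMDD.
PBS_HOME=${PBS_HOME:-/var/spool/pbs}

# show_rejects [YYYYMMDD]  (defaults to today's log)
show_rejects() {
    day=${1:-$(date +%Y%m%d)}
    log="$PBS_HOME/server_logs/$day"
    if [ ! -r "$log" ]; then
        echo "no readable server log at $log" >&2
        return 1
    fi
    # Each req_reject line plus the line logged just before it.
    grep -B1 'req_reject' "$log"
}
```

Running `show_rejects 20040301` right after a failed qsub should show the same pair of lines quoted in Ben's reply, if the request reached the server at all.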

On Tue, 2004-03-02 at 10:41, Benjamin Simmons wrote:

OK, here is what I have set up right now:
I edited /var/spool/pbs/server_name and set it to the cluster server.
I edited hosts.equiv and set it to +outside.computer.name.
This computer is connected through eth1 of the cluster server, so I edited pfilter.conf to trust all of eth1, to make sure the firewall would not play a role for now.

I created the following script and executed it with qsub.

The error message follows below; it is permissions-based.
Thanks,
Ben

[EMAIL PROTECTED] bdsimmns]$ vi surface.pbs

#PBS -N se_
#PBS -o AxiCircleNoDowel.log
#PBS -e AxiCircleNoDowel.pbs
cd /home/1/bdsimmns
echo /home/1/bdsimmns/AxiCircleNoDowel.fe
/home/1/bdsimmns/bin/evolver /home/1/bdsimmns/AxiCircleNoDowel.fe
#All done
[EMAIL PROTECTED] bdsimmns]$ qsub -q [EMAIL PROTECTED] surface.pbs
pbs_iff: error returned: 15008
pbs_iff: Access from host not allowed, or unknown host
No Permission.
qsub: cannot connect to server borg.memphis.edu (errno=15007)
[EMAIL PROTECTED] bdsimmns]$

Jeremy Enos wrote:

The extra pbs server and routing queue shouldn't be necessary. Try setting up a hosts.equiv on the pbs_server machine which includes the machine you want to run qsub from. Also, on the machine you're running qsub from, set the server_name file appropriately for the remote server. qsub/qstat commands should work then. As I mentioned before though, I'm not positive that the hosts.equiv is all that is necessary, but I am sure that you don't need two pbs_servers.

Jeremy
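Jeremy's two-file setup can be sketched as two shell functions. Assumptions: PBS_HOME defaults to /var/spool/pbs, ETC is a hypothetical override (default /etc) so the functions can be tried on a scratch directory first, and the hosts.equiv entry is written as a bare host name. The files in this thread use the +hostname form, so check hosts.equiv(5) on your system for which forms it accepts; Jeremy also notes hosts.equiv may not be the only requirement.

```shell
#!/bin/sh
# Sketch of the setup described above: no second pbs_server is involved.
# Assumptions: PBS_HOME defaults to /var/spool/pbs; ETC (hypothetical
# override) defaults to /etc.
PBS_HOME=${PBS_HOME:-/var/spool/pbs}
ETC=${ETC:-/etc}

# On the pbs_server host: trust the submitting host (added only once).
trust_submit_host() {
    touch "$ETC/hosts.equiv"
    grep -qxF -- "$1" "$ETC/hosts.equiv" || printf '%s\n' "$1" >> "$ETC/hosts.equiv"
}

# On the submitting host: point qsub/qstat at the remote server by default.
set_default_server() {
    mkdir -p "$PBS_HOME"
    printf '%s\n' "$1" > "$PBS_HOME/server_name"
}
```

In this thread that would be `trust_submit_host viper.memphis.edu` on borg.memphis.edu and `set_default_server borg.memphis.edu` on viper; afterwards `qstat -q` on viper should list borg's queues if the server accepts the connection.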

At 03:03 PM 3/1/2004, Benjamin Simmons wrote:

I already went through that pain to get a username and password there. The way I read the manuals, I need a pbs_server running on the machine I am submitting from, with a queue defined on that machine as well. This machine's queue is a routing queue, and I can define one or more destination queues for it. For now I only want it to route to the queue on my cluster server, but later I will need to direct jobs to other clusters around our campus.

I can privately send you the PDF of the admin guide from the site you mentioned, or the reference pages I think I am using correctly.

Am I misunderstanding how this is supposed to work to go between different physical machines?

Thanks,
Ben
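For reference, the routing-queue design described above would look roughly like this in qmgr on the submitting host's own pbs_server. Jeremy's replies suggest it is unnecessary for a single remote server, but it is the mechanism for routing to several clusters later. A sketch only: make_route_queue and the queue name passed to it are hypothetical, each destination is given as queue@server (the real destination in this thread is redacted), and QMGR=echo previews the commands.

```shell
#!/bin/sh
# Sketch: create a routing queue that forwards jobs to remote queues.
# Assumptions: run as a PBS manager on the submitting host's pbs_server;
# QMGR is a hypothetical override (QMGR=echo previews the commands).
QMGR=${QMGR:-qmgr}

# make_route_queue NAME DEST [DEST...]   each DEST is queue@server
make_route_queue() {
    name=$1
    shift
    "$QMGR" -c "create queue $name queue_type = route"
    for dest in "$@"; do
        "$QMGR" -c "set queue $name route_destinations += $dest"
    done
    # A queue must be enabled to accept jobs and started to route them.
    "$QMGR" -c "set queue $name enabled = True"
    "$QMGR" -c "set queue $name started = True"
}
```

Additional clusters would later just be extra DEST arguments, i.e. more route_destinations entries.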

Jeremy Enos wrote:

At 01:32 PM 3/1/2004, Benjamin Simmons wrote:

Ok, I think I set everything up correctly, but when I try to submit a job I get an error saying the job has been rejected by all possible destinations.

Any thoughts on what I need to look at? Or do I need to post the server and queue configs for the machine I am submitting from and the one that is receiving?

Not sure why you would have server/queue configs at all on the machine you're submitting from... you shouldn't have a server on there at all, right? I know there are admin guides available at http://www.openpbs.org that may also be of help. You will need to register a user name, though. (I had some trouble finding a link to do this there.)

Jeremy

Thanks for the help,
Ben Simmons

Jeremy Enos wrote:

It's been a while since I looked into it, but I think a hosts.equiv file is needed to allow submission from other hosts. I don't have reliable details beyond that at the moment.

Jeremy

At 07:57 PM 2/26/2004, Benjamin Simmons wrote:

Has anyone tried to set up a PBS queue that routes to a different server outside their cluster? Or do most people just have users submit jobs on the server they should run on?

I am getting many different types of errors, and I am looking to see whether there is an experience base to draw on here, or whether I need to look to another list.

Thanks to all in advance,

Ben Simmons
The University of Memphis
http://viper.memphis.edu

-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
