Re: [Mauiusers] Output and error files are missing

2007-10-30 Thread Preethi Chockalingam
Hi,

I have fixed the problem. It was with ssh: in the known_hosts file I had 
included the IP address with the public RSA key, but not the hostname with 
the public RSA key. Host name verification was what failed.

Thank you to everyone who helped me with this. Thanks a lot!

Thanks,
-Preethi
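
For readers hitting the same failure: OpenSSH's known_hosts format lets one line carry a comma-separated list of names, so the hostname and the IP address can share a single key entry. A minimal sketch follows; the helper name, host name, address, and key string are placeholders for illustration and do not come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print one known_hosts line that covers both the
# hostname and the IP address of a server.
# Arguments: hostname, IP address, key type, base64 public key.
known_hosts_entry() {
    printf '%s,%s %s %s\n' "$1" "$2" "$3" "$4"
}

# Example with placeholder values (not a real host or key):
known_hosts_entry server.example.org 192.0.2.10 ssh-rsa AAAAB3NzaC1yc2EPLACEHOLDER
```

In practice the key itself would come from the server, for example via `ssh-keyscan -t rsa <hostname>`, and should be verified before it is trusted.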


----- Original Message ----
From: James A. Peltier <[EMAIL PROTECTED]>
To: Preethi Chockalingam <[EMAIL PROTECTED]>
Cc: Alexander Piavka <[EMAIL PROTECTED]>; mauiusers@supercluster.org
Sent: Tuesday, 30 October, 2007 9:06:01 PM
Subject: Re: [Mauiusers] Output and error files are missing

Preethi Chockalingam wrote:
> This is the output of /var/log/messages file..
> 
>  
> 
> Oct 30 09:00:07 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp 
> -rpB /var/spool/torque/spool/85.academyl.ER 
> [EMAIL PROTECTED]:/home/jaya/torque-2.1.6/trial.e85' 
> failed with status=1, giving up after 4 attempts

Are you using NFS/NIS or some similar setup?  If so, try adding the 
following to your PBS MOM config file:

$usecp *:/ /

-- 
James A. Peltier
Technical Director, RHCE
SCIRF | GrUVi @ Simon Fraser University - Burnaby Campus
Phone  : 778-782-3610
Fax: 778-782-3045
Mobile  : 778-840-6434
E-Mail  : [EMAIL PROTECTED]
Website : http://gruvi.cs.sfu.ca | http://scirf.cs.sfu.ca
MSN: [EMAIL PROTECTED]


___
mauiusers mailing list
mauiusers@supercluster.org
http://www.supercluster.org/mailman/listinfo/mauiusers


Re: [Mauiusers] Output and error files are missing

2007-10-30 Thread James A. Peltier

Preethi Chockalingam wrote:

> This is the output of /var/log/messages file..
> 
> Oct 30 09:00:07 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp 
> -rpB /var/spool/torque/spool/85.academyl.ER 
> [EMAIL PROTECTED]:/home/jaya/torque-2.1.6/trial.e85' 
> failed with status=1, giving up after 4 attempts


Are you using NFS/NIS or some similar setup?  If so, try adding the 
following to your PBS MOM config file:


$usecp *:/ /
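
As background on the directive above: `$usecp` tells pbs_mom to deliver files with a local copy instead of scp/rcp when the destination matches the given host pattern and path prefix. That only helps if the destination path is on a filesystem shared between the nodes. The form `*:/ /` matches every path on every host. Below is a sketch of adding the line idempotently; the helper name is hypothetical, and the config location in the comment is an assumption based on the /var/spool/torque paths in the posted logs.

```shell
#!/bin/sh
# Hypothetical helper: append the $usecp line to a pbs_mom config file
# unless an identical line is already present.
# Assumed typical location: /var/spool/torque/mom_priv/config
add_usecp() {
    config_file=$1
    line='$usecp *:/ /'
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$config_file" 2>/dev/null ||
        printf '%s\n' "$line" >> "$config_file"
}
```

pbs_mom has to be restarted (or told to re-read its configuration) before the change takes effect.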

--
James A. Peltier
Technical Director, RHCE
SCIRF | GrUVi @ Simon Fraser University - Burnaby Campus
Phone   : 778-782-3610
Fax : 778-782-3045
Mobile  : 778-840-6434
E-Mail  : [EMAIL PROTECTED]
Website : http://gruvi.cs.sfu.ca | http://scirf.cs.sfu.ca
MSN : [EMAIL PROTECTED]


Re: [Mauiusers] Output and error files are missing

2007-10-30 Thread Preethi Chockalingam
' failed with status=1, giving up 
after 4 attempts
Oct 30 09:30:21 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/90.academyl.ER to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.e90
Oct 30 09:37:02 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/91.academyl.OU [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.o91' failed with status=1, giving up 
after 4 attempts
Oct 30 09:37:02 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/91.academyl.OU to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.o91
Oct 30 09:37:06 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/91.academyl.ER [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.e91' failed with status=1, giving up 
after 4 attempts
Oct 30 09:37:06 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/91.academyl.ER to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.e91
Oct 30 09:49:39 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/92.academyl.OU [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.o92' failed with status=1, giving up 
after 4 attempts
Oct 30 09:49:39 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/92.academyl.OU to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.o92
Oct 30 09:49:44 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/92.academyl.ER [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.e92' failed with status=1, giving up 
after 4 attempts
Oct 30 09:49:44 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/92.academyl.ER to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/trial.e92
Oct 30 10:08:14 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/93.academyl.OU [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/loop.out' failed with status=1, giving up 
after 4 attempts
Oct 30 10:08:14 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/93.academyl.OU to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/loop.out
Oct 30 10:08:18 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/93.academyl.ER [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/loop.error' failed with status=1, giving up 
after 4 attempts
Oct 30 10:08:18 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/93.academyl.ER to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/loop.error
Oct 30 10:08:49 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/94.academyl.OU [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/loop.out' failed with status=1, giving up 
after 4 attempts
Oct 30 10:08:49 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/94.academyl.OU to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/loop.out
Oct 30 10:08:53 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/94.academyl.ER [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/loop.error' failed with status=1, giving up 
after 4 attempts
Oct 30 10:08:53 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/94.academyl.ER to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/loop.error
Oct 30 10:11:27 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/95.academyl.OU [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/try.o95' failed with status=1, giving up 
after 4 attempts
Oct 30 10:11:27 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/95.academyl.OU to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/try.o95
Oct 30 10:11:31 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/95.academyl.ER [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/try.e95' failed with status=1, giving up 
after 4 attempts
Oct 30 10:11:31 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/95.academyl.ER to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/try.e95
Oct 30 10:22:36 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/96.academyl.OU [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/try.o96' failed with status=1, giving up 
after 4 attempts
Oct 30 10:22:36 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/96.academyl.OU to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/try.o96
Oct 30 10:22:40 academylab3 pbs_mom: sys_copy, command '/usr/bin/scp -rpB 
/var/spool/torque/spool/96.academyl.ER [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/try.e96' failed with status=1, giving up 
after 4 attempts
Oct 30 10:22:40 academylab3 pbs_mom: req_cpyfile, Unable to copy file 
/var/spool/torque/spool/96.academyl.ER to [EMAIL 
PROTECTED]:/home/jaya/torque-2.1.6/try.e96
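
A log like the one above can be condensed to the list of spool files that were never delivered. This is only a sketch: the helper name is hypothetical, and it assumes each syslog entry sits on a single line in the real file (the lines above were wrapped by the mail client).

```shell
#!/bin/sh
# Hypothetical helper: list the distinct spool files that pbs_mom reported
# it could not copy, given a syslog file as $1.
undelivered_files() {
    grep 'pbs_mom: req_cpyfile, Unable to copy file' "$1" |
        sed 's/.*Unable to copy file \([^ ]*\) to .*/\1/' |
        sort -u
}
```

Typical use would be `undelivered_files /var/log/messages` on the pbs_mom node.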




----- Original Message ----
From: Alexander Piavka <[EMAIL

Re: [Mauiusers] Output and error files are missing

2007-10-30 Thread Preethi Chockalingam
Hi,

Yes, I do find the files in /var/spool/torque/undelivered, but they have 
different extensions (.ER and .OU). Why is this happening? 

Regarding the messages file, I am not able to make out what is wrong. I do find 
keywords like sshd. Should I look for some specific message?

Thanks,
-Preethi.C


----- Original Message ----
From: Alexander Piavka <[EMAIL PROTECTED]>
To: Preethi Chockalingam <[EMAIL PROTECTED]>
Cc: rishi pathak <[EMAIL PROTECTED]>; mauiusers@supercluster.org
Sent: Tuesday, 30 October, 2007 4:16:56 PM
Subject: Re: [Mauiusers] Output and error files are missing

On Tue, 30 Oct 2007, Preethi Chockalingam wrote:

> Hi,
>
> There are no error messages reg scp and ssh.. on pbs_mom node.

  So you don't have any errors in /var/log/messages on the pbs_mom node 
regarding scp/ssh?

  On the pbs_mom node, do you have the output and error files of your job in 
/var/spool/pbs/undelivered
or /var/spool/pbs/spool while the job is in the E state and after it exits the
queue?
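
One way to run the check described above is sketched here. The helper name is hypothetical, and the TORQUE home directory is passed as an argument because it differs between installs (/var/spool/pbs in this message, /var/spool/torque in the logs posted elsewhere in the thread).

```shell
#!/bin/sh
# Hypothetical helper: list job output (.OU) and error (.ER) files still
# sitting under the TORQUE home directory given as $1.
spooled_job_files() {
    for dir in "$1/spool" "$1/undelivered"; do
        # Skip directories that do not exist on this install.
        [ -d "$dir" ] && find "$dir" -type f \( -name '*.OU' -o -name '*.ER' \)
    done
    return 0
}
```

For example: `spooled_job_files /var/spool/torque`.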


> Job status displays the name of the error and output path, but I dont find 
> the files in the specified path.

You can run:
qmgr  -c "set queue NAME keep_completed = 600"
so that when the job completes, it is still tracked for 10 minutes.

Then, after the job completes, run 'qstat -f jid'.
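
The full `qstat -f` output is long, so a small filter for the attributes that matter here may help. This is a sketch with a hypothetical helper name; it assumes the usual `name = value` layout of `qstat -f` and does not rejoin values that qstat may wrap onto continuation lines.

```shell
#!/bin/sh
# Hypothetical helper: reduce `qstat -f` output (read from stdin) to the
# attributes most relevant to missing output and error files.
job_delivery_info() {
    grep -E '^[[:space:]]*(job_state|exit_status|Output_Path|Error_Path) = '
}
```

Typical use would be `qstat -f jid | job_delivery_info`, with `jid` replaced by the job ID.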


>
> Thanks
> -Preethi
>
>
> - Original Message 
> From: Alexander Piavka <[EMAIL PROTECTED]>
> To: Preethi Chockalingam <[EMAIL PROTECTED]>
> Cc: rishi pathak <[EMAIL PROTECTED]>; mauiusers@supercluster.org
> Sent: Tuesday, 30 October, 2007 1:43:07 PM
> Subject: Re: [Mauiusers] Output and error files are missing
>
>
>  Look for scp/ssh errors in the syslog messages on the pbs_mom node.
> What does 'qstat -f jid' give?
>
>
> On Tue, 30 Oct 2007, Preethi Chockalingam wrote:
>
>> Hi Rishi,
>>
>> I checked my pbs_server and mom logs. I don't find any errors.
>> I am able to scp from all nodes in the cluster to the server node, but 
>> the output and error files are still not created.
>> What else do you think could be wrong?
>>
>> Thanks,
>> -Preethi
>>
>> ----- Original Message ----
>> From: rishi pathak <[EMAIL PROTECTED]>
>> To: Preethi Chockalingam <[EMAIL PROTECTED]>
>> Cc: mauiusers@supercluster.org
>> Sent: Tuesday, 30 October, 2007 11:36:22 AM
>> Subject: Re: [Mauiusers] Output and error files are missing
>>
>> HI,
>>  Check your mom logs and pbs_server logs for 'post job file processing 
>> error'.
>> Also check whether you can rsh/rcp (as a cluster user) from any compute node 
>> to the node where pbs_server is running.
>> This is not related to Maui.
>>
>> I suggest you post your mom_logs and server_logs for better identification 
>> of the problem.
>>
>>
>>
>> On 10/30/07, Preethi Chockalingam <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> I have been integrating Maui and Torque. When I submit jobs through Torque, 
>> they appear in state 'E' and then leave the queue.
>>
>> I am not able to find the output and input files anywhere.
>>
>> Any suggestions on this please??
>>
>> Thanks in Advance,
>> Preethi.C
>>
>> --
>> Regards--
>> Rishi Pathak
>>




Re: Re: Re: [Mauiusers] Output and error files are missing

2007-10-30 Thread Jan Ploski
Preethi Chockalingam <[EMAIL PROTECTED]> wrote on 10/30/2007 
10:57:50 AM:

> Hi,
> 
> How do I check the maximum space of spool directory?? How do I 
> resolve the prob?? Should I increase it??

Just check your disk space with df. This is not TORQUE-specific.

Regards,
Jan Ploski
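
A minimal sketch of that check, wrapped in a hypothetical helper so the directory can be varied. The spool path in the usage note is taken from the log lines posted elsewhere in this thread and may differ on other installs.

```shell
#!/bin/sh
# Hypothetical helper: print how full the filesystem holding directory $1 is.
# -P requests the POSIX output format, so each filesystem stays on one line.
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { print $5 }'
}
```

For example: `fs_use_percent /var/spool/torque/spool`. A value close to 100% would point to the disk-space explanation.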


Re: Re: [Mauiusers] Output and error files are missing

2007-10-30 Thread Preethi Chockalingam
Hi,

How do I check the maximum space of the spool directory? How do I resolve the 
problem? Should I increase it?

Thanks,
-Preethi.C

----- Original Message ----
From: Jan Ploski <[EMAIL PROTECTED]>
To: Preethi Chockalingam <[EMAIL PROTECTED]>
Cc: mauiusers@supercluster.org
Sent: Tuesday, 30 October, 2007 2:07:07 PM
Subject: Re: Re: [Mauiusers] Output and error files are missing

[EMAIL PROTECTED] wrote on 10/30/2007 08:19:43 AM:

> Hi Rishi,
> 
> I checked my pbs_server and mom logs. I don't find any errors. 
> I am able to scp from all nodes in the cluster to the server node, 
> but the output and error files are still not created. 
> What else do you think could be wrong?

Maybe you are running out of disk space in the spool directory on the 
target node.

Regards,
Jan Ploski




Re: [Mauiusers] Output and error files are missing

2007-10-30 Thread Preethi Chockalingam
Hi,

There are no error messages regarding scp or ssh on the pbs_mom node.
The job status displays the error and output paths, but I don't find the 
files in the specified paths.

Thanks
-Preethi


----- Original Message ----
From: Alexander Piavka <[EMAIL PROTECTED]>
To: Preethi Chockalingam <[EMAIL PROTECTED]>
Cc: rishi pathak <[EMAIL PROTECTED]>; mauiusers@supercluster.org
Sent: Tuesday, 30 October, 2007 1:43:07 PM
Subject: Re: [Mauiusers] Output and error files are missing


  Look for scp/ssh errors in the syslog messages on the pbs_mom node.
What does 'qstat -f jid' give?


On Tue, 30 Oct 2007, Preethi Chockalingam wrote:

> Hi Rishi,
>
> I checked my pbs_server and mom logs. I don't find any errors.
> I am able to scp from all nodes in the cluster to the server node, but the 
> output and error files are still not created.
> What else do you think could be wrong?
>
> Thanks,
> -Preethi
>
> - Original Message 
> From: rishi pathak <[EMAIL PROTECTED]>
> To: Preethi Chockalingam <[EMAIL PROTECTED]>
> Cc: mauiusers@supercluster.org
> Sent: Tuesday, 30 October, 2007 11:36:22 AM
> Subject: Re: [Mauiusers] Output and error files are missing
>
> HI,
>  Check your mom logs and pbs_server logs for 'post job file processing error'.
> Also check whether you can rsh/rcp (as a cluster user) from any compute node 
> to the node where pbs_server is running.
> This is not related to Maui.
>
> I suggest you post your mom_logs and server_logs for better identification of 
> the problem.
>
>
>
> On 10/30/07, Preethi Chockalingam <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have been integrating Maui and Torque. When I submit jobs through Torque, 
> they appear in state 'E' and then leave the queue.
>
> I am not able to find the output and input files anywhere.
>
> Any suggestions on this please??
>
> Thanks in Advance,
> Preethi.C
>
> -- 
> Regards--
> Rishi Pathak
>


Re: Re: [Mauiusers] Output and error files are missing

2007-10-30 Thread Jan Ploski
[EMAIL PROTECTED] wrote on 10/30/2007 08:19:43 AM:

> Hi Rishi,
> 
> I checked my pbs_server and mom logs. I don't find any errors. 
> I am able to scp from all nodes in the cluster to the server node, 
> but the output and error files are still not created. 
> What else do you think could be wrong?

Maybe you are running out of disk space in the spool directory on the 
target node.

Regards,
Jan Ploski


Re: [Mauiusers] Output and error files are missing

2007-10-29 Thread Preethi Chockalingam
Hi Rishi,

I checked my pbs_server and mom logs. I don't find any errors. 
I am able to scp from all nodes in the cluster to the server node, but the 
output and error files are still not created. 
What else do you think could be wrong?

Thanks,
-Preethi

----- Original Message ----
From: rishi pathak <[EMAIL PROTECTED]>
To: Preethi Chockalingam <[EMAIL PROTECTED]>
Cc: mauiusers@supercluster.org
Sent: Tuesday, 30 October, 2007 11:36:22 AM
Subject: Re: [Mauiusers] Output and error files are missing

HI,
   Check your mom logs and pbs_server logs for 'post job file processing error'.
Also check whether you can rsh/rcp (as a cluster user) from any compute node to 
the node where pbs_server is running.
This is not related to Maui.

I suggest you post your mom_logs and server_logs for better identification of 
the problem.



On 10/30/07, Preethi Chockalingam <[EMAIL PROTECTED]> wrote:
Hi all,
 
I have been integrating Maui and Torque. When I submit jobs through Torque, they 
appear in state 'E' and then leave the queue.
 
I am not able to find the output and input files anywhere.
 
Any suggestions on this please?? 
 
Thanks in Advance,
Preethi.C








-- 
Regards--
Rishi Pathak




Re: [Mauiusers] Output and error files are missing

2007-10-29 Thread rishi pathak
HI,
   Check your mom logs and pbs_server logs for 'post job file processing
error'.
Also check whether you can rsh/rcp (as a cluster user) from any compute node to
the node where pbs_server is running.
This is not related to Maui.

I suggest you post your mom_logs and server_logs for better identification of
the problem.
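
When running the copy test suggested above, it may help to mimic how pbs_mom itself copies files. The log lines posted elsewhere in this thread show it invoking `/usr/bin/scp -rpB`; `-B` is scp's batch mode, which never prompts, so a host key that an interactive session would merely ask about makes the copy fail outright. The sketch below uses a hypothetical helper name and a DRY_RUN switch added purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy a file with the same scp flags that appear in
# the pbs_mom log lines in this thread.
# $1 = source file, $2 = destination in user@host:/path form.
# Set DRY_RUN=1 to print the command instead of running it.
mom_style_copy() {
    src=$1
    dest=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "/usr/bin/scp -rpB $src $dest"
    else
        /usr/bin/scp -rpB "$src" "$dest"
    fi
}
```

Running it as the job owner from a compute node, with the server addressed by the same host name that appears in the job's output path, should reproduce a failure that a plain interactive scp can hide.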


On 10/30/07, Preethi Chockalingam <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I have been integrating Maui and Torque. When I submit jobs through Torque,
> they appear in state 'E' and then leave the queue.
>
> I am not able to find the output and input files anywhere.
>
> Any suggestions on this please??
>
> Thanks in Advance,
> Preethi.C
>
>
>


-- 
Regards--
Rishi Pathak