From: carlos vasco [mailto:[EMAIL PROTECTED]]
Sent: Tue 28/03/2006 01:55
To: Bernard Li
Cc: [email protected]
Subject: Re: Oscar 4.2.1b5 testing
Hi Bernard,

I have been searching the torque list about the Time Use issue, and found this:

>> We are using torque-1.2.0p5 and Maui-3.2.6p13. When I do a qstat I
>> see that the 'Time Use' is only a couple of seconds, yet the jobs
>> have been running for a couple of hours. We are running Matlab jobs
>> which are launched from a script. They are only single cpu (no mpi).
>
> This is a bug fixed in 1.2.0p6...

Since my problem is very similar (not an MPI issue either), and OSCAR 4.2.1
ships torque-1.2.0p5 (I think), the solution could be to use
torque-1.2.0p6. Is there an easy way to update torque?

Thanks,
Carlos
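A minimal sketch (not part of OSCAR or TORQUE) for listing which running jobs show the symptom described above, i.e. hours of wall-clock time but a CPU time of 00:00:00. It assumes that `qstat -f` prints a `Job Id:` line per job followed by indented `resources_used.cput` and `resources_used.walltime` attributes as HH:MM:SS; check this against the installed TORQUE version. The function names are hypothetical. It only detects the symptom; according to the torque-list reply quoted above, the actual fix is in 1.2.0p6.

```python
import re
import shutil
import subprocess
import sys

# Patterns for the `qstat -f` layout assumed here: a "Job Id: <id>" header,
# then indented "resources_used.<name> = HH:MM:SS" attribute lines.
JOB_RE = re.compile(r"^Job Id:\s*(\S+)")
USED_RE = re.compile(r"^\s*resources_used\.(cput|walltime)\s*=\s*(\d+):(\d{2}):(\d{2})\s*$")


def parse_qstat_full(text):
    """Return {job_id: {"cput": seconds, "walltime": seconds}} from `qstat -f` text."""
    jobs = {}
    current = None
    for line in text.splitlines():
        header = JOB_RE.match(line)
        if header:
            current = header.group(1)
            jobs[current] = {}
            continue
        used = USED_RE.match(line)
        if used and current is not None:
            hours, minutes, seconds = (int(x) for x in used.group(2, 3, 4))
            jobs[current][used.group(1)] = hours * 3600 + minutes * 60 + seconds
    return jobs


def zero_cput_jobs(jobs, min_walltime=60):
    """Jobs that ran at least `min_walltime` seconds yet report exactly 0 s of CPU time."""
    return sorted(
        job for job, used in jobs.items()
        if used.get("walltime", 0) >= min_walltime and used.get("cput") == 0
    )


def main():
    # Needs a TORQUE client on PATH; exits quietly otherwise.
    if shutil.which("qstat") is None:
        print("qstat not found on PATH", file=sys.stderr)
        return
    out = subprocess.run(["qstat", "-f"], capture_output=True, text=True, check=True).stdout
    for job in zero_cput_jobs(parse_qstat_full(out)):
        print(f"{job}: cput 00:00:00 despite walltime; matches the Time Use symptom")


if __name__ == "__main__":
    main()
```

Run it on the head node while jobs are active; an empty result after upgrading would be consistent with the accounting bug being gone.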
On 3/28/06, carlos vasco <[EMAIL PROTECTED]> wrote:
> Hi Bernard (and oscar-devel, I forgot last time to cc them):
>
> The test now worked OK, apart from ganglia, but this has been
> reconfigured by our IT people, so it should be OK.
>
> TORQUE still reports 00:00:00 ...
>
> Carlos
>
> > On 3/28/06, carlos vasco <[EMAIL PROTECTED]> wrote:
> > > Not sure what I did, but apparently while installing OSCAR I modified
> > > ssh_config on the server instead of sshd_config, which is why I
> > > thought I had forgotten to modify sshd_config.
> > >
> > > I am trying the test again.
> > >
> > > Carlos
> > >
> > > On 3/28/06, Bernard Li <[EMAIL PROTECTED]> wrote:
> > > >
> > > > [ CC:ing oscar-devel on this ]
> > > >
> > > > You shouldn't need to manually modify your /etc/ssh/sshd_config to add
> > > > the "PermitRootLogin" option - this should already be done for you.
> > > >
> > > > In your error log, it indicates that you have put the option in
> > > > /etc/ssh/ssh_config, which is _wrong_. Try taking out that line and
> > > > re-running the tests (that option belongs in sshd_config, not
> > > > ssh_config, but as I mentioned, you shouldn't need to modify it manually).
> > > >
> > > > Cheers,
> > > >
> > > > Bernard
> > > >
> > > > ________________________________
> > > > From: carlos vasco [mailto:[EMAIL PROTECTED]]
> > > > Sent: Mon 27/03/2006 22:18
> > > > To: Bernard Li
> > > > Subject: Re: [Oscar-devel] Oscar 4.2.1b5 testing
> > > >
> > > > Hi, Bernard,
> > > >
> > > > I don't know exactly which files are the logs; I can only find the
> > > > following in the /home/oscartst/ directory:
> > > >
> > > > drwxr-xr-x 2 oscartst oscartst 4096 Mar 27 15:30 ganglia
> > > > drwxr-xr-x 2 oscartst oscartst 4096 Mar 27 15:31 lam
> > > > drwxr-xr-x 2 oscartst oscartst 4096 Mar 22 16:49 maui
> > > > drwxr-xr-x 2 oscartst oscartst 4096 Mar 27 15:30 mpich
> > > > -rwxr-xr-x 1 oscartst oscartst 4826 Mar 27 15:30 pbs_test
> > > > drwxr-xr-x 2 oscartst oscartst 4096 Mar 27 15:30 pvm
> > > > -rwxr-xr-x 1 oscartst oscartst  927 Mar 27 15:30 ssh_user_tests
> > > > -rwxr-xr-x 1 oscartst oscartst 7326 Mar 27 15:30 test_cluster
> > > > -rwxr-xr-x 1 oscartst oscartst 3562 Mar 27 15:30 testprint
> > > > drwxr-xr-x 2 oscartst oscartst 4096 Mar 27 15:30 torque
> > > >
> > > > In mpich:
> > > > -rw-r--r-- 1 oscartst oscartst   3093 Mar 18 00:30 cpi.c
> > > > -rw-r--r-- 1 oscartst oscartst   1732 Mar 18 00:30 cxxhello.cc
> > > > -rw-r--r-- 1 oscartst oscartst   1647 Mar 18 00:30 f77hello.f
> > > > -rwxrwxr-x 1 oscartst oscartst 337512 Mar 27 15:30 mpich-cpi
> > > > -rw------- 1 oscartst oscartst    136 Mar 27 15:30 mpichtest.err
> > > > -rw------- 1 oscartst oscartst    454 Mar 27 15:30 mpichtest.out
> > > > -rwxr-xr-x 1 oscartst oscartst   1412 Mar 18 00:30 pbs_script.mpich
> > > > -rw-rw-r-- 1 oscartst oscartst    510 Mar 27 13:51 PI21051
> > > > -rw-rw-r-- 1 oscartst oscartst    510 Mar 27 12:37 PI3408
> > > > -rwxr-xr-x 1 oscartst oscartst   2837 Mar 18 00:30 test_user
> > > >
> > > > I attach the mpichtest files.
> > > >
> > > > I am not sure how to track down the TORQUE problem; maybe I can
> > > > configure it the same way we configured the other clusters.
> > > >
> > > > Thanks,
> > > > Carlos
> > > >
> > > > On 3/27/06, Bernard Li <[EMAIL PROTECTED]> wrote:
> > > > > Hi Carlos:
> > > > >
> > > > > > No problems have been found during installation, but some errors
> > > > > > did occur during the test phase (see attachment).
> > > > >
> > > > > Can you post the relevant logs in /home/oscartst?
> > > > >
> > > > > > Another problem is that qstat reports 00:00:00 in the
> > > > > > Time Use field.
> > > > >
> > > > > I wonder if this is a TORQUE bug or a bug in how we set it up - do
> > > > > you think you can dig deeper into this?
> > > > >
> > > > > > During installation, I think I forgot to put PermitRootLogin yes in
> > > > > > sshd_config, and after the nodes were created, I cpushed the corrected
> > > > > > sshd_config file. Could this be related to the errors?
> > > > >
> > > > > You shouldn't need to edit sshd_config manually - anyway, we should
> > > > > be able to figure out what's wrong by investigating the log files.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Bernard
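Bernard's diagnosis in this thread is that `PermitRootLogin` is a server keyword for /etc/ssh/sshd_config and was mistakenly placed in the client file /etc/ssh/ssh_config. The following minimal sketch (not an OSCAR tool) reports where the keyword appears in each file so a misplaced line can be found and removed. Commented-out lines are ignored and, as OpenSSH does for keywords, the match is case-insensitive. The function name is hypothetical, and reading sshd_config may require root.

```python
import re
from pathlib import Path

# Server-side keyword from the thread; it belongs in sshd_config only.
KEYWORD = "PermitRootLogin"


def directive_lines(text, keyword=KEYWORD):
    """Return 1-based line numbers where `keyword` is set.

    Lines starting with '#' are comments and never match. Keywords are
    matched case-insensitively; both "Keyword value" and "Keyword=value"
    forms are recognised.
    """
    pattern = re.compile(rf"^\s*{re.escape(keyword)}\b", re.IGNORECASE)
    return [n for n, line in enumerate(text.splitlines(), start=1) if pattern.match(line)]


def main():
    for path in (Path("/etc/ssh/ssh_config"), Path("/etc/ssh/sshd_config")):
        if not path.is_file():
            continue
        try:
            hits = directive_lines(path.read_text(errors="replace"))
        except PermissionError:
            print(f"{path}: not readable (try as root)")
            continue
        if path.name == "ssh_config":
            # Any hit in the client file is the misplacement Bernard describes.
            if hits:
                print(f"{path}: {KEYWORD} on line(s) {hits}; remove it, it belongs in sshd_config")
        else:
            print(f"{path}: {KEYWORD} on line(s) {hits or 'none'}")


if __name__ == "__main__":
    main()
```

Per the thread, OSCAR should already set the sshd_config option during installation, so the script is only meant to locate a stray entry in ssh_config, not to add one anywhere.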
