Already doing so.  Iterating off the list; we'll post back when the issue
is finally solved.

On Fri, 4 Apr 2003, Jeremy Enos wrote:

> Ashish-
> That's good news.  It does appear that PBS isn't the problem, short of some
> strange interaction between the other tests and PBS.  Remember to keep the
> oscar-users list copied on our progress; that way the LAM helpers can
> assist when necessary.
> LAM folks- can you instruct Ashish on how to manually launch a very simple
> LAM test (with or w/o PBS) so that we can narrow this issue further?
> thx-
>
>          Jeremy
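
For reference, a manual LAM run could be sketched from the commands that already appear in lamtest.out and lamtest.err further down in this thread (lamboot, mpicc, mpirun C, lamclean, lamhalt). This is only a sketch under assumptions: the function names `run` and `lam_manual_test`, the DRY_RUN switch, and the hostfile argument (a file listing one node name per line) are all hypothetical, and cpi.c is assumed to be in the current directory.

```shell
# Hypothetical wrapper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical manual LAM test built from the commands quoted in this thread.
# $1 is an assumed hostfile listing one node name per line.
lam_manual_test() {
    hostfile=$1
    run lamboot "$hostfile" &&
    run mpicc cpi.c -o lam-cpi &&
    run mpirun C lam-cpi &&
    run lamclean
    rc=$?
    # Always try to shut the LAM daemons down, even if a step failed.
    run lamhalt
    return $rc
}
```

Setting DRY_RUN=1 before calling `lam_manual_test hosts.txt` only prints the five commands, which is a safe way to review them before running anything on the cluster.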
>
> At 03:20 PM 4/4/2003 +0000, Ashish  Navaney wrote:
> >Jeremy-
> >
> >I ran the PBS shell test manually: PBS seems to be working fine.
> >
> >Here's the output of shelltest, run after su - oscartst in the pbs directory:
> >
> >shelltest.err : blank
> >
> >shelltest.out :
> >
> >node2.theory
> >node1.theory
> >Hello, date is 04/04/03, time is 20:35:52
> >Hello, date is 04/04/03, time is 20:35:52
> >
> >PVM and LAM/MPI tests still fail.
> >Thx in advance
> >
> >-Ashish Navaney
> >([EMAIL PROTECTED])
> >
> >On Thu, 03 Apr 2003 Jeremy Enos wrote :
> >>Ashish-
> >>Try running the pbs_shell test manually for me:
> >>
> >>Starting as root:
> >>su - oscartst
> >>cd pbs
> >>qsub -l nodes=2:ppn=1 pbs_script.shell
> >>cat shelltest.err (should be blank)
> >>cat shelltest.out
> >>
> >>This should tell us if PBS itself is working ok or not.
> >>
> >>         Jeremy
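
The check on the two output files could be scripted roughly as follows. This is a sketch, not part of OSCAR: the function name `check_shelltest` is hypothetical, it assumes the job has already finished and written shelltest.err and shelltest.out into the given directory, and it assumes that any non-empty line without a space in shelltest.out is a node name.

```shell
# Hypothetical helper: inspect the output files of the PBS shell test.
# Usage: check_shelltest DIR EXPECTED_NODE_COUNT
check_shelltest() {
    dir=$1
    want=$2
    # shelltest.err must exist and be blank.
    if [ ! -f "$dir/shelltest.err" ] || [ -s "$dir/shelltest.err" ]; then
        echo "FAIL: shelltest.err is missing or not blank"
        return 1
    fi
    if [ ! -f "$dir/shelltest.out" ]; then
        echo "FAIL: shelltest.out is missing"
        return 1
    fi
    # Assumption: node names are the non-empty lines that contain no space.
    got=$(grep -v ' ' "$dir/shelltest.out" | grep . | sort -u | wc -l | tr -d ' ')
    if [ "$got" -ne "$want" ]; then
        echo "FAIL: expected $want distinct nodes, found $got"
        return 1
    fi
    echo "OK: $got distinct nodes reported"
}
```

After the qsub job completes, something like `check_shelltest /home/oscartst/pbs 2` would report whether both nodes answered.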
> >>
> >>At 01:32 PM 4/2/2003 +0000, Ashish  Navaney wrote:
> >>>Jeremy -
> >>>
> >>>Here's the output of "pbsnodes -a" with 2 compute nodes in the cluster:
> >>>
> >>>[EMAIL PROTECTED] oscar-2.2]# pbsnodes -a
> >>>node1.theory
> >>>       state = free
> >>>       np = 1
> >>>       properties = all
> >>>       ntype = cluster
> >>>
> >>>node2.theory
> >>>       state = free
> >>>       np = 1
> >>>       properties = all
> >>>       ntype = cluster
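
Since the cluster test complains about "not enough free nodes", it may help to count the free nodes directly from output like the above. A minimal sketch, assuming the `state = free` line format shown here; the function name `count_free_nodes` is hypothetical.

```shell
# Hypothetical helper: count nodes reported as "state = free".
# Reads `pbsnodes -a` style output on stdin and prints the count.
count_free_nodes() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| :" keeps the function's exit status at 0.
    grep -c '^[[:space:]]*state = free[[:space:]]*$' || :
}
```

It would be used as `pbsnodes -a | count_free_nodes`; the result can then be compared against the number of nodes the test asks for.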
> >>>
> >>>- Using OSCAR 2.2 with RH 7.3 on machines with homogeneous configurations.
> >>>
> >>>My hardware configuration :
> >>>
> >>>P4 processor (2 GHz)
> >>>Mainboard KOB P4M266 NDFSMX (VIA Chipset)
> >>>256 MB SDRAM
> >>>40 GB Harddisk
> >>>Built-in Ethernet LAN 10BaseT/100BaseTX
> >>>External Ethernet Card (D-Link DFE-538TX) 10/100 Mbps Adapter
> >>>24-port 100 MBPS D-Link Network Switch
> >>>
> >>>
> >>>Note : I am using the external card for networking the cluster.
> >>>
> >>>- The PVM test and the LAM/MPI test fail immediately during the cluster
> >>>   test (there's no timeout), and there is no error message.
> >>>
> >>>- I have excluded HDF5 from the present configuration.
> >>>
> >>>Here are the relevant PVM and LAM output and error files -
> >>>
> >>>1)
> >>>
> >>>/home/oscartst/pvm/pvmtest.out : blank
> >>>
> >>>2)
> >>>
> >>>/home/oscartst/pvm/pvmtest.err :
> >>>
> >>>
> >>>/var/spool/pbs/mom_priv/jobs/16.arjun.th.SC:pvmd:command not found
> >>>master1.c:37:18 : pvm3.h : No such file or directory
> >>>slave1.c:34:18 : pvm3.h : No such file or directory
> >>>/var/spool/pbs/mom_priv/jobs/16.arjun.th.SC: ./master1 : No such file or
> >>>directory
> >>>pvmd3: no process killed
> >>>
> >>>3)
> >>>
> >>>/home/oscartst/lam/lamtest.out:
> >>>
> >>>Running LAM/MPI test
> >>>
> >>>MPI C Bindings Test -->
> >>>
> >>>TEST FAILED!
> >>>
> >>>Commands : mpicc cpi.c -o lam-cpi && mpirun C lam-cpi && lamclean
> >>>
> >>>4)
> >>>
> >>>/home/oscartst/lam/lamtest.err :
> >>>
> >>>/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:lamboot : command not found
> >>>/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:mpicc : command not found
> >>>/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:lamhalt : command not found
> >>>
> >>>
> >>>Thx in advance,
> >>>Ashish
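
The "command not found" lines in pvmtest.err and lamtest.err suggest that the shell running the job could not find pvmd, lamboot, mpicc, or lamhalt on its PATH. A minimal sketch for checking that, assuming a POSIX shell; the function name `missing_cmds` is hypothetical.

```shell
# Hypothetical helper: print every named command that is NOT found on PATH.
# Returns 0 if all commands were found, 1 otherwise.
missing_cmds() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found: $cmd"
            rc=1
        fi
    done
    return $rc
}
```

Running `missing_cmds pvmd lamboot mpicc lamhalt; echo "PATH=$PATH"` once in a login shell (after su - oscartst) and once inside a PBS job script would show whether the two environments differ.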
> >>>
> >>>
> >>>>
> >>>>On Wed, 02 Apr 2003 Jeremy Enos wrote :
> >>>>>Ashish-
> >>>>>Please send your "pbsnodes -a" output.  What type of hardware are you
> >>>>>running on?
> >>>>>thx-
> >>>>>
> >>>>>         Jeremy
> >>>>>
> >>>>>At 03:43 AM 4/2/2003 +0000, Ashish  Navaney wrote:
> >>>>>>Hi,
> >>>>>>
> >>>>>>Thanks for replying.
> >>>>>>
> >>>>>>OSCAR 2.2 + RH 7.3
> >>>>>>All machines have the same configuration (P4).
> >>>>>>
> >>>>>>1)
> >>>>>>
> >>>>>>When I ran 'pbsnodes -a' on the cluster, the output showed that the
> >>>>>>nodes are coming online.
> >>>>>>(I don't have the exact output at the moment,
> >>>>>>but I do remember that all the nodes were listed and, for each of
> >>>>>>them, 'properties' and 'state' showed as 'free', with 'ntype=cluster'.)
> >>>>>>
> >>>>>>2)
> >>>>>>
> >>>>>>After that I reconfigured OSCAR without the HDF5 package, so the PBS
> >>>>>>HDF5 test didn't run, but
> >>>>>>now the MPICH (via PBS) test fails, i.e. it times out.
> >>>>>>The same error message appears: "Checking for 2 free nodes...not
> >>>>>>enough free nodes...tests incomplete.There were some issues running
> >>>>>>some user tests. Please check ur logs."
> >>>>>>
> >>>>>>
> >>>>>>When I change the switcher option to LAM/MPI, the LAM/MPI (via
> >>>>>>PBS) test still fails.
> >>>>>>
> >>>>>>Any suggestions? I have been stuck on this for days now. I even
> >>>>>>tried OSCAR 2.1, but I get the same problem.
> >>>>>>
> >>>>>>3)
> >>>>>>
> >>>>>>Is there any way I can manually install MPICH / LAM to work with the
> >>>>>>OSCAR cluster even though the OSCAR MPI fails?
> >>>>>>
> >>>>>>I'm a final-year computer engineering student from India and need prompt help.
> >>>>>>
> >>>>>>Thanks in advance
> >>>>>>Ashish Navaney
> >>>>>>([EMAIL PROTECTED])
> >>>>>>
> >>>>>>Message 1 :
> >>>>>>Ashish-
> >>>>>>It sounds like your second node isn't coming online. Add it and then
> >>>>>>run a
> >>>>>>"pbsnodes -a" for me and send the output.
> >>>>>>
> >>>>>>Jeremy
> >>>>>>
> >>>>>>At 02:21 PM 3/30/2003 +0000, Ashish Navaney wrote:
> >>>>>>>Hi,
> >>>>>>>I need some help urgently.
> >>>>>>>I'm trying OSCAR 2.2 on RH 7.3.
> >>>>>>>
> >>>>>>>With 1 server and 1 node the cluster tests successfully,
> >>>>>>>but on adding even one more node the PBS HDF5 test fails during the
> >>>>>>>cluster test (it hits the 30-second timeout).
> >>>>>>>The following message appears:
> >>>>>>>
> >>>>>>>"Checking for 2 free nodes...not enough free nodes...tests incomplete.
> >>>>>>>There were some issues running some user tests. Please check ur logs."
> >>>>>>>
> >>>>>>>Also, when I delete the 2nd node, the cluster passes the test.
> >>>>>>>
> >>>>>>>Please help.
> >>>>>>>Thanks,
> >>>>>>>Ashish Navaney
> >>>>>>>([EMAIL PROTECTED])
> >>>
> >>>
> >>>
> >>>
> >>>_______________________________________________________________________
> >>>Odomos - the only  mosquito protection outside 4 walls -
> >>>Click here to know more!
> >>>http://r.rediff.com/r?http://clients.rediff.com/odomos/Odomos.htm&&odomos&&wn
> >
> >
> >
> >
>
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: ValueWeb:
> Dedicated Hosting for just $79/mo with 500 GB of bandwidth!
> No other company gives more support or power for your dedicated server
> http://click.atdmt.com/AFF/go/sdnxxaff00300020aff/direct/01/
> _______________________________________________
> Oscar-users mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/oscar-users
>

-- 
{+} Jeff Squyres
{+} [EMAIL PROTECTED]
{+} http://www.lam-mpi.org/

