Already doing so. Iterating off the list; we'll post back when the issue is finally solved.
On Fri, 4 Apr 2003, Jeremy Enos wrote:

> Ashish-
> That's good news. It does appear that PBS isn't the problem, short of some
> strange interaction between the other tests and PBS. Remember to keep the
> oscar-users list copied on our progress; that way the LAM helpers can
> assist when necessary.
> LAM folks- can you instruct Ashish on how to manually launch a very simple
> LAM test (with or w/o PBS) so that we can narrow this issue further?
> thx-
>
>     Jeremy
>
> At 03:20 PM 4/4/2003 +0000, Ashish Navaney wrote:
> >Jeremy-
> >
> >I ran the pbs shell_test manually : pbs seems to be wkg fine
> >
> >here's the output of shelltest ran after su - oscartst in pbs directory:
> >
> >shelltest.err : blank
> >
> >shelltest.out :
> >
> >node2.theory
> >node1.theory
> >Hello, date is 04/04/03, time is 20:35:52
> >Hello, date is 04/04/03, time is 20:35:52
> >
> >PVM and LAM/MPI tests still fail.
> >Thx in advance
> >
> >-Ashish Navaney
> >([EMAIL PROTECTED])
> >
> >On Thu, 03 Apr 2003 Jeremy Enos wrote :
> >>Ashish-
> >>Try running the pbs_shell test manually for me:
> >>
> >>Starting as root:
> >>su - oscartst
> >>cd pbs
> >>qsub -l nodes=2:ppn=1 pbs_script.shell
> >>cat shelltest.err (should be blank)
> >>cat shelltest.out
> >>
> >>This should tell us if PBS itself is working ok or not.
> >>
> >>    Jeremy
> >>
> >>At 01:32 PM 4/2/2003 +0000, Ashish Navaney wrote:
> >>>Jeremy -
> >>>
> >>>Here's the output of "pbsnodes -a" with 2 compute nodes in the cluster:
> >>>
> >>>[EMAIL PROTECTED] oscar-2.2]# pbsnodes -a
> >>>node1.theory
> >>>    state = free
> >>>    np = 1
> >>>    properties = all
> >>>    ntype = cluster
> >>>
> >>>node2.theory
> >>>    state = free
> >>>    np = 1
> >>>    properties = all
> >>>    ntype = cluster
> >>>
> >>>- using oscar2.2 with rh 7.3 on m/cs with homogenous configns.
> >>>
> >>>My hardware configuration :
> >>>
> >>>P4 processor (2 GHz)
> >>>Mainboard KOB P4M266 NDFSMX (VIA Chipset)
> >>>256 MB SDRAM
> >>>40 GB Harddisk
> >>>Built-in Ethernet LAN 10BaseT/100BaseTX
> >>>External Ethernet Card (D-Link DFE-538TX) 10/100 Mbps Adapter
> >>>24-port 100 MBPS D-Link Network Switch
> >>>
> >>>
> >>>Note : I am using the external card for networking the cluster.
> >>>
> >>>- The PVM test and the LAM/MPI test fail immediately during cluster
> >>>  test(theres no timeout) and there is no error msg.
> >>>
> >>>- have excluded HDF5 from present confign
> >>>
> >>>Here are the relevant PVM and LAM o/p and err files -
> >>>
> >>>1)
> >>>
> >>>/home/oscartst/pvm/pvmtest.out : blank
> >>>
> >>>2)
> >>>
> >>>/home/oscartst/pvm/pvmtest.err :
> >>>
> >>>
> >>>/var/spool/pbs/mom_priv/jobs/16.arjun.th.SC:pvmd:command not found
> >>>master1.c:37:18 : pvm3.h : No such file or directory
> >>>slave1.c:34:18 : pvm3.h : No such file or directory
> >>>/var/spool/pbs/mom_priv/jobs/16.arjun.th.SC: ./master1 : No such file or
> >>>directory
> >>>pvmd3: no process killed
> >>>
> >>>3)
> >>>
> >>>/home/oscartst/lam/lamtest.out:
> >>>
> >>>Running LAM/MPI test
> >>>
> >>>MPI C Bindings Test -->
> >>>
> >>>TEST FAILED!
> >>>
> >>>Commands : mpicc cpi.c -o lam-cpi && mpirun C lam-cpi && lamclean
> >>>
> >>>4)
> >>>
> >>>/home/oscartst/lam/lamtest.err :
> >>>
> >>>/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:lamboot : command not found
> >>>/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:mpicc : command not found
> >>>/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:lamhalt : command not found
> >>>
> >>>
> >>>Thx in advance,
> >>>Ashish
> >>>
> >>>
> >>>>
> >>>>On Wed, 02 Apr 2003 Jeremy Enos wrote :
> >>>>>Ashish-
> >>>>>Please send your "pbsnodes -a" output. What type of hardware are you
> >>>>>running on?
> >>>>>thx-
> >>>>>
> >>>>>    Jeremy
> >>>>>
> >>>>>At 03:43 AM 4/2/2003 +0000, Ashish Navaney wrote:
> >>>>>>Hi,
> >>>>>>
> >>>>>>thx for replying
> >>>>>>
> >>>>>>oscar2.2 + rh7.3
> >>>>>>all m/cs with same confign.( P4 )
> >>>>>>
> >>>>>>1)
> >>>>>>
> >>>>>>when i ran 'pbsnodes -a' on the cluster the o/p shows that the nodes
> >>>>>>are coming online
> >>>>>>(i dont have the exact output at this moment )
> >>>>>>but i do remember that all the nodes were listed and for each of
> >>>>>>them, their 'properties' and 'state' showed as 'free' and 'ntype=cluster'
> >>>>>>
> >>>>>>2)
> >>>>>>
> >>>>>>henceforth i reconfigured oscar without the HDF5 package so the PBS
> >>>>>>HDF5 test didnt happen but
> >>>>>>now the MPICH(via PBS) test fails i.e. it times out
> >>>>>>the same error message comes on "Checking for 2 free nodes...not
> >>>>>>enough free nodes...tests incomplete.There were some issues running
> >>>>>>some user tests. Please check ur logs."
> >>>>>>
> >>>>>>
> >>>>>>when i change the switcher option to LAM/MPI, still the LAM/MPI(via
> >>>>>>PBS) Test fails.
> >>>>>>
> >>>>>>any suggestions...i hv been stuck up with this for days now...even
> >>>>>>tried oscar 2.1 but i get the same prob.
> >>>>>>
> >>>>>>3)
> >>>>>>
> >>>>>>is there any way i can manually install MPICH / LAM to work with the
> >>>>>>oscar cluster even though the oscar MPI fails ?
> >>>>>>
> >>>>>>i'm an final year computer engg student from india....need prompt help.
> >>>>>>
> >>>>>>Thx in adavance
> >>>>>>Ashish Navaney
> >>>>>>([EMAIL PROTECTED])
> >>>>>>
> >>>>>>Message 1 :
> >>>>>>Ashish-
> >>>>>>It sounds like your second node isn't coming online. Add it and then
> >>>>>>run a
> >>>>>>"pbsnodes -a" for me and send the output.
> >>>>>>
> >>>>>>Jeremy
> >>>>>>
> >>>>>>At 02:21 PM 3/30/2003 +0000, Ashish Navaney wrote:
> >>>>>>>hi,
> >>>>>>>i need some help urgently....
> >>>>>>>trying oscar 2.2 on rh7.3...
> >>>>>>>
> >>>>>>>with 1 server and 1 node the cluster tests successfully
> >>>>>>>but on adding even one more node the PBS HDF5 test fails during the
> >>>>>>>cluster test...the 30 secs timeout
> >>>>>>>the foll message appears :
> >>>>>>>
> >>>>>>>"Checking for 2 free nodes...not enough free nodes...tests incomplete.
> >>>>>>>There were some issues running some user tests. Please check ur logs."
> >>>>>>>
> >>>>>>>also when i delete the 2nd node the cluster passes the test.
> >>>>>>>
> >>>>>>>pls help
> >>>>>>>thx
> >>>>>>>Ashish Navaney
> >>>>>>>([EMAIL PROTECTED])
> >>>
> >>>_______________________________________________________________________
> >>>Odomos - the only mosquito protection outside 4 walls -
> >>>Click here to know more!
> >>>http://r.rediff.com/r?http://clients.rediff.com/odomos/Odomos.htm&&odomos&&wn
> >
>
> -------------------------------------------------------
> This SF.net email is sponsored by: ValueWeb:
> Dedicated Hosting for just $79/mo with 500 GB of bandwidth!
> No other company gives more support or power for your dedicated server
> http://click.atdmt.com/AFF/go/sdnxxaff00300020aff/direct/01/
> _______________________________________________
> Oscar-users mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/oscar-users
>

--
{+} Jeff Squyres
{+} [EMAIL PROTECTED]
{+} http://www.lam-mpi.org/
