Hi Michael,

Your analysis is perfectly correct. After replacing the ethernet cables with new ones and relocating the switch to a more suitable place (it had been sitting on top of a hot computer, which might have caused it to malfunction), the bizarre errors were gone and the client installation was done within a minute. My new cluster is now up and running.
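For anyone hitting the same symptom, here is a minimal sketch of one way to look for a bad cable or switch port before reinstalling anything. It assumes a Linux node that exposes per-interface counters under /sys/class/net and an interface named eth0. The function names (read_link_counters, nonzero_counters) and the choice of counters are illustrative only and are not part of OSCAR.

```python
from pathlib import Path

# Counters that tend to climb when cabling or a switch port is faulty.
# Which ones matter most on a given NIC is an assumption, not a rule.
SUSPECT_COUNTERS = (
    "rx_errors",
    "tx_errors",
    "rx_crc_errors",
    "rx_dropped",
    "tx_carrier_errors",
    "collisions",
)


def read_link_counters(iface="eth0", sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for the counters that exist for iface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    counters = {}
    for name in SUSPECT_COUNTERS:
        path = stats_dir / name
        if path.is_file():
            counters[name] = int(path.read_text().strip())
    return counters


def nonzero_counters(counters):
    """Keep only the counters that have recorded at least one event."""
    return {name: value for name, value in counters.items() if value > 0}


if __name__ == "__main__":
    bad = nonzero_counters(read_link_counters("eth0"))
    if bad:
        print("Possible cable/switch problem on eth0:", bad)
    else:
        print("No link-level errors recorded on eth0")
```

Running it on both the head node and a client, before and after a test transfer, shows whether the counters are growing. Steadily rising values point at the physical network rather than at the OS or OSCAR configuration.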

I cannot thank you enough for the help. All along I had assumed that the problem was in the computers themselves. It never occurred to me that the problem was actually in the ethernet cables and/or the switch!

Shiang-Tai Lin

Michael Edwards wrote:
> Have you tried a different switch? Or maybe even a crossover cable...
> That would eliminate the network fabric as the problem.
> This is probably a long shot, but it looks like you are having network
> issues of some kind.
>
> Also, check /var/log/messages to see if there are any obvious network
> errors.
>
> You could also try and find some 10/100 ethernet cards and try imaging
> using those (the network drivers are better). That would let you know
> if the network card drivers are bad.
>
> If the network cards are on the motherboard, you could check for
> firmware/bios updates at your motherboard mfg website. Some network
> cards are known to be flaky with linux as well, so you might check
> boards and google for your network card model and your linux distribution.
>
> On Mon, Apr 28, 2008 at 10:46 PM, stlin <[EMAIL PROTECTED]> wrote:
>
> Hi Michael:
>
> I followed your suggestion and did a clean OS (FC8) and OSCAR (5.1b2)
> install. The client installation again got stuck at "Quietly installing
> image..." and failed after about 2 hrs. Note that I have turned off the
> firewall on the head node during the OS install. The oscar log file is
> posted at
> from beginning to step 3: http://pastebin.ca/1001279
> steps 4 to 6: http://pastebin.ca/1001283
> and the client terminal output: http://pastebin.ca/1001286
>
> Although the client installation was not complete, I can ssh from the
> client to the server. So for some unknown reason, the data transfer seems
> to be extremely slow between the head and client nodes and rsync seemed
> to fail after several hundred minutes.
>
> Thanks a lot for looking into this problem for me.
>
> Sincerely,
> Shiang-Tai Lin
>
> Michael Edwards wrote:
> > Personally I would start from a clean OS install. I might even try
> > downloading the repositories again. It sounds like something is very
> > broken. When you try again try running the wizard by doing
> >
> >     env OSCAR_VERBOSE=3 ./install_cluster eth0
> >
> > I wouldn't trust your current setup though, it seems like it has
> > gotten beyond useful debugging.
> >
> > On Sat, Apr 26, 2008 at 1:50 AM, stlin <[EMAIL PROTECTED]> wrote:
> >
> > Hi Michael,
> >
> > Thanks for your reply.
> >
> > I tried "/etc/init.d/mysqld status" and the result showed that
> > mysql was running. And even if I execute "/etc/init.d/mysqld
> > restart", I still get "DBD::mysql::st execute failed: MySQL server
> > has gone away at /opt/oscar/lib/OSCAR/oda.pm line 802." when I
> > click on the "Setup Networking" button.
> >
> > I then tried to start over (/opt/oscar/scripts/start_over),
> > rebooted the server, and launched the OSCAR wizard. But this time
> > the client installation got stuck after receiving
> > boel_binaries.tar.gz. The terminal output from the client is at
> > http://pastebin.ca/998123 and the oscar log file is posted at
> > http://pastebin.ca/998121.
> >
> > Any hint will be highly appreciated. Thank you.
> >
> > Sincerely,
> >
> > Shiang-Tai
> >
> > Michael Edwards wrote:
> >> It looks like your mysql daemon on the head node died or was not
> >> started.
> >>
> >> Try doing "/etc/init.d/mysqld restart" and try reimaging the nodes.
> >>
> >> On Fri, Apr 25, 2008 at 11:09 AM, Shiang-Tai Lin
> >> <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi Michael,
> >>
> >> Thanks for the instruction.
> >>
> >> The oscar log is posted on http://pastebin.ca/997165
> >>
> >> The complete client terminal message is posted on
> >> http://pastebin.ca/997170
> >>
> >> Thanks in advance for any hints.
> >>
> >> Shiang-Tai
> >>
> >> Michael Edwards wrote:
> >> > Logs etc would be helpful. Post them at pastebin.ca and post a
> >> > link here.
> >> >
> >> > On Fri, Apr 25, 2008 at 10:30 AM, Shiang-Tai Lin
> >> > <[EMAIL PROTECTED]> wrote:
> >> >
> >> > Hi,
> >> >
> >> > My attempt to set up a cluster using OSCAR 5.1b2 with Fedora Core 8
> >> > failed during the client installation (Step 6, Monitoring Cluster
> >> > Deployment). The installation of the client image seems to be
> >> > abnormally slow (data transfer speed was about 50 Kb/s) and failed
> >> > after about 150 minutes with nc and rsync errors. The last few lines
> >> > from the client terminal output are
> >> >
> >> > ************quote from the client terminal********************************
> >> > Quietly installing image...
> >> > /rsync -aHS --exclude=lost+found/ --exclude=/proc/* --numeric-ids 10.0.3.230::oscarimage/ /a/
> >> > /nc: connect: No route to host
> >> > -|/-|/-|/-nc: connect: No route to host
> >> > |/-|/-|/-nc: connect: No route to host
> >> > |/-|/-|/-|nc: connect: No route to host
> >> > -|/-|/-|/-|/-|/-|/-|/-|/rsync error: timeout in data send/receive (code 30) at io.c(165) [sender=2.6.9]
> >> > rsync: read error: Connection reset by peer (104)
> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(759) [receiver=3.0.0pre6]
> >> > rsync: connection unexpectedly closed (776458 bytes received so far) a
> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(600) a
> >> > Killing off running processes.
> >> >
> >> > write_variables
> >> >
> >> > **************************************************************
> >> >
> >> > The hardware spec (both for the server and the client) is
> >> > CPU: dual Intel Xeon E5345 (quad core)
> >> > Motherboard: Tyan S2692 Tempest i5000XL #D1796-100
> >> > Network Card: Intel(R) PRO/1000 Gigabit Server Adapter (Intel GbE from
> >> > ESB2(w/ single port "Gilgal")-ASF2.0)
> >> > Hard Drive: 250GB SATA2
> >> >
> >> > I have upgraded the BIOS to the latest version and used "UYOK" in
> >> > network setup. I have also stopped the firewall (service iptables stop).
> >> > Any hints to solve the problem are greatly appreciated. Please let me
> >> > know if it is necessary to post the complete oscar log file and the
> >> > client terminal messages (they are over 50 Kb). Thanks a lot.
> >> >
> >> > Sincerely,
> >> > Shiang-Tai Lin
> >> >
> >> > -------------------------------------------------------------------------
> >> > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> >> > Don't miss this year's exciting event.
> >> > There's still time to save $100.
> >> > Use priority code J8TL2D2.
> >> > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> >> > _______________________________________________
> >> > Oscar-users mailing list
> >> > Oscar-users@lists.sourceforge.net
> >> > https://lists.sourceforge.net/lists/listinfo/oscar-users