My cluster had been running great until I updated all the Fedora 2 RPMs using yum. The install went fine, but now when I reload an image, my nodes get halfway through a rebuild and die. I had this exact problem before, when I built the head node, updated it, and then tried to push to the clients. So this time I took someone’s advice and built the cluster before adding any updates, and it worked just fine.

 

Now when a node starts rebuilding from the TFTP image, it loads normally until it gets to the point where it is placing the files on the drive. Then it stops at a random spot and crashes out. Here is the error that I’m getting:

 

########################

 

Rsync: read error: Connection reset by peer

Rsync: error: error in rsync protocol data stream (code 12) at io.c(177)

Rsync: connection unexpectedly closed (1729062 bytes read so far)

Rsync: error in rsync protocol data stream (code 12) at io.c(165)

Killing off running processes

 

#######################

 

I had noticed this error before, so I did not update rsync with yum; I left that RPM out. Any ideas on what I can do to repair this? I don’t want to reinstall again. It would be nice if there were a way to remove all the patches that I added to the head node, but there are about 644 of them.

 

**Oh, and apart from that problem, it seems like about half the time I halt a node, it comes up unclean after I restart it and I have to reload the image. Is that normal? It seems much pickier than a straight install.
