Are you up to date on patches for DHCP?
From the symptoms you describe, you pretty clearly
have a bug in DHCP itself, or a corrupted installation
of it.  That might not be your only problem, of course,
but it's the place to start.  Once you've eliminated that
variable, if you still have problems we can go from there.

For Solaris 10, the SPARC patch is 118879-01 and
the x86 patch is 118880-01.  (Since you're new to Solaris:
the way to check patch revs is 'showrev -p'.)
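
To pick out just these patches instead of reading the whole list, something
like the sketch below may help. It assumes each installed patch appears in
the 'showrev -p' output as a line starting with "Patch: <id>-<rev>"; the
function name patch_revs is made up for illustration.

```shell
# Print the installed revision(s) of one patch, reading `showrev -p` output
# from stdin. Assumes lines of the form "Patch: 118879-01 Obsoletes: ...".
# patch_revs is a hypothetical helper name, not a Solaris command.
patch_revs() {
    sed -n "s/^Patch: $1-\([0-9][0-9]*\).*/\1/p"
}

# On the server (SPARC patch shown; use 118880 on x86):
#   showrev -p | patch_revs 118879
```

No output means no revision of that patch is installed.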

-Bob

Paul Bryan wrote:

On Thu, Jul 27, 2006 at 10:20:07PM -0600, Paul Greidanus wrote:
Can you clarify what you mean by dropouts? 60 Sunrays should easily work off a single interface.

Hi Paul. Thanks for the quick response. I didn't get back into the office
on Friday after posting my question and came in Monday morning to find
all the Sunrays down!

The problem (sorry, I should have mentioned it to begin with) is that from
time to time a few of the Sunrays lose their connection and drop out with
error code 21. The DHCP server is still running, though. I see a lot
of errors like

Worker7 NOTICE: readMessage::socket looping limit exceeded.Close it.

and
Network: TCBDaemon: packet receive error

in the /var/opt/SUNWut/log/messages file.

This happens a few times per day and has been going on for a few weeks.

When they all went down this morning, I checked the DHCP config with
dhcpmgr, and it reported "Internal error" and showed no addresses at all.
"pntadm -P" gave the same result. I removed the SUNWbinfiles file
for our network from /var/dhcp and recreated the IP addresses. It's working
(for now at least!). I'm not really sure what happened. It seems as though
the file was corrupted, but I have no idea why!

So my concern now is whether this problem will recur, and what (if any)
the link is between the intermittent dropouts and the total crash.

What you can do is set up different switches for groups of 20 Sunrays, and have the 3 NICs serve 3 groups of 20 machines. Alternatively, if the server isn't linked at gigabit, moving to gigabit might help as well. This configuration is in the SRSS administration manual, with diagrams.

We're running at 100 Mbit, not gigabit. That may change some time in the future.

I don't think we have the extra switches for this. The configuration at the
moment is using LAN connections and not per-interface configuration. There is
only one IP address listed for the authserver though. Do I need to provide a
list of all the IP addresses to balance across the interfaces?

Why not use SRSS 3.0? From everything I've read, it makes more effective use of bandwidth.


I'm fairly new to Solaris (I've managed the odd box in amongst my Linux
systems) and Sunrays are completely new to me. I've only just started this
job, so I didn't want to go changing software until I'm a bit more familiar
with the setup.

Cheers,
Paul.


_______________________________________________
SunRay-Users mailing list
[email protected]
http://www.filibeto.org/mailman/listinfo/sunray-users
