Well, the first thing you should do is install the SRSS-supplied JRE for Linux (you can install it in a non-default location so that only SRSS uses it) and point the /etc/opt/SUNWut/jre symlink at it.
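A minimal sketch of the symlink step, assuming you have already unpacked the SRSS-supplied JRE into a directory of your choosing. The function name `repoint_srss_jre` and any JRE path you pass to it are hypothetical. The function only swaps the /etc/opt/SUNWut/jre link and leaves the old JRE in place. It needs a POSIX shell (on Solaris 10, use /usr/xpg4/bin/sh or ksh rather than the legacy /bin/sh).

```shell
#!/bin/sh
# Minimal sketch (POSIX sh): repoint the SRSS JRE symlink at a separately
# installed JRE. The JRE directory passed in is the caller's choice
# (hypothetical); nothing here knows where the SRSS-supplied JRE was unpacked.
#
# Usage: repoint_srss_jre NEW_JRE_DIR [LINK]
#   LINK defaults to /etc/opt/SUNWut/jre
repoint_srss_jre() {
    new_jre=$1
    link=${2:-/etc/opt/SUNWut/jre}

    # Refuse to point the link at something that does not look like a JRE.
    if [ ! -x "$new_jre/bin/java" ]; then
        echo "error: $new_jre/bin/java not found or not executable" >&2
        return 1
    fi

    # Show the current target so the change can be undone by hand if needed.
    if [ -L "$link" ]; then
        echo "old link:"
        ls -ld "$link"
    fi

    # Remove the old symlink itself (never the JRE it points to), then create
    # the new one. Using rm + ln avoids "ln -sf" creating the new link *inside*
    # the directory that the old link points to.
    rm -f "$link" || return 1
    ln -s "$new_jre" "$link" || return 1

    echo "new link:"
    ls -ld "$link"
}
```

Run it as root, e.g. `repoint_srss_jre /opt/srss-jre/jre` (hypothetical path). An already-running utauthd keeps the java binary it was started with, so SRSS presumably has to be restarted (e.g. with utrestart) before the new JRE is actually used.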

We have seen many problems in the past with vendor-supplied JREs and utauthd, which is why we supply (and test and support) our own for use with utauthd. From what you describe, this may not affect the current issue (since you are also seeing it on Solaris), but it will reduce the risk of other problems.

We'll continue to pursue the other information you provided.

-Bob

Jens Langner wrote:
Hi Bob,

Bob Doolittle wrote:

[...]
Any further suggestions welcome.
Does utgstatus start reporting correctly again after a few minutes? If
so, this might be a thread scheduling issue. The Group Manager
is responsible for sending out broadcast/multicast group membership
"advertisements" every 20 seconds, and there is a dedicated thread in
utauthd for sending and collecting such advertisements that determines
group membership. It's conceivable that if the other threads were
somehow preventing the GM thread from being scheduled (which shouldn't
be possible, but...) a problem like this could occur. I'd expect it to
"repair itself" after a couple of minutes, however.
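To make the "repair itself" expectation concrete, here is a toy model of heartbeat-style membership. It is NOT SRSS code, and the timeout of three advertisement intervals is a hypothetical value; only the 20-second interval comes from the description above. A peer counts as a member only while its last advertisement is recent enough. If the GM thread is starved, advertisements stop being sent and collected, so entries age past the timeout and peers drop out. Once the thread runs again, fresh advertisements bring them back without intervention.

```shell
#!/bin/sh
# Toy model of heartbeat-based group membership (not SRSS code).
ADVERT_INTERVAL=20                  # seconds between advertisements (from the text)
TIMEOUT=$((3 * ADVERT_INTERVAL))    # hypothetical: drop a peer after ~3 missed adverts

# live_members NOW HOST:LAST_SEEN ...
# Prints each host whose last advertisement arrived no more than TIMEOUT
# seconds before NOW (all times in seconds).
live_members() {
    now=$1
    shift
    for entry in "$@"; do
        host=${entry%%:*}    # part before the colon
        last=${entry#*:}     # part after the colon
        if [ $((now - last)) -le "$TIMEOUT" ]; then
            echo "$host"
        fi
    done
}
```

For example, `live_members 100 sr1:95 sr2:30 sr3:60` prints sr1 and sr3. sr2 was last heard 70 seconds earlier, which exceeds the assumed 60-second timeout. As soon as a new advertisement from sr2 updates its last-seen time, it is listed again.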

Well, sometimes not all of the servers in the FOG fail immediately when
one server goes down. Depending on how long it is down, or perhaps on
the exact moment it becomes unavailable to the others, either one or all
of the servers in the FOG lose their connection to the group, which
interrupts the Sun Ray sessions. Only running "utrestart" on each server
that reports it cannot establish the group connection (utgstatus reports
an error) restores the sessions.

How large is your site? How many servers in this FOG, how many Sun Rays?
Are you using the JRE supplied with SRSS (i.e. did you specify it during
utinstall, so that /etc/opt/SUNWut/jre now points to it)? How much RAM
is in your server, and what does ps report for utauthd:
On Solaris: # pargs `pgrep -f utauthd`
On Linux: # ps wwwaux | grep utauthd

In fact, we have two failover groups. One consists of two Solaris 10
servers running SRSS 4.1 in kiosk mode to serve uttsc/rdesktop sessions.
The other consists of three servers with different SRSS versions (4.1
and 4.0) and operating systems (Linux/Solaris). In both failover groups
the problem occurs as soon as I reboot one of the servers or suddenly
interrupt the network connection. As soon as I reissue "utrestart" on
the affected servers, the sessions return correctly. All in all, we have
about 90 Sun Rays connected to these two failover groups with 5 servers.
However, the number of Sun Rays connected to an individual Sun Ray
server does not really seem to matter.

On our Solaris (SPARC) machines the jre link points to:

/etc/opt/SUNWut/jre -> /usr/java
java version "1.5.0_17"

And on our Linux (Ubuntu x86_64) machines the jre link points to:

/etc/opt/SUNWut/jre -> /usr/lib/jvm/ia32-java-6-sun/jre
java version "1.6.0_10"

Here is the output of the ps commands:

Solaris:
19495:  /etc/opt/SUNWut/jre/bin/java -client auth.utauthd.utauthd
argv[0]: /etc/opt/SUNWut/jre/bin/java
argv[1]: -client
argv[2]: auth.utauthd.utauthd

Linux:
root      9157  0.0  0.0 220984 18132 ?        Sl   12:10   0:07 /etc/opt/SUNWut/jre/bin/java -client auth.utauthd.utauthd

cheers,
jens

_______________________________________________
SunRay-Users mailing list
[email protected]
http://www.filibeto.org/mailman/listinfo/sunray-users
