Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1

2011-01-11 Thread Devin Nate
Dear SunRay users (hopefully Bob, Craig also?);

We're continuing to have problems with random 26B's, and don't even know where 
to begin to help debug.

We don't see any particular errors in log files, besides several pam messages. 
The 26B's are happening on new sun ray 3's and older sun ray 2's.

So far, our only theory is a PAM problem ... we appear to have many of the 
following style messages.

dtlogin[990]: [ID 691260 user.notice] pam_sunray_hotdesk:pam_sm_auth: 
ut_getTokenByDisplay failed -1 for display :51

We run fully stock PAM, no customizations.
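
To see whether these messages cluster on particular displays, the log can be tallied per display. This is a minimal sketch, assuming the messages keep exactly the wording quoted above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern taken from the message quoted above; other wordings are not covered.
FAIL_RE = re.compile(
    r"pam_sunray_hotdesk:pam_sm_auth: "
    r"ut_getTokenByDisplay failed (-?\d+) for display (:\d+)"
)


def count_token_failures(log_text):
    """Count ut_getTokenByDisplay failures per display, e.g. {':51': 12}."""
    # Mail clients and terminals may wrap long lines, so collapse all
    # whitespace runs to single spaces before matching.
    flat = " ".join(log_text.split())
    return Counter(m.group(2) for m in FAIL_RE.finditer(flat))
```

Feeding it the text of the syslog file that carries the dtlogin messages would show whether one display dominates or the failures are spread evenly.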

In any event, does anyone have any theories as to how to debug 26B's? In the 
most recent case, card removal corrected (went back to solaris login screen), 
and card insert worked exactly as expected and 26B was gone.

Please, any ideas on troubleshooting? We don't even know what the likely 
causes of a 26B are, so we don't know where to start.

Thanks,
Devin





From: sunray-users-boun...@filibeto.org 
[mailto:sunray-users-boun...@filibeto.org] On Behalf Of Devin Nate
Sent: Tuesday, January 04, 2011 6:00 PM
To: 'sunray-users@filibeto.org'
Subject: [SunRay-Users] Sun Ray 26B - several models - SRS 5.1.1

Hi Folks;

We recently applied some sun ray patches, and are now experiencing some new 
problems. In particular, the following updates:

1. Applied all patches from smpatch for Solaris 10, x86-64. Approx 300 patches 
were applied.
a. Performed prescribed reboots, single user patching, and configuration 
reboots.

2. Applied Java SDK 1.6.0_23, and activated as default java instance. 
Previously, we were at 1.5 (for some unknown reason).

3. Applied SRS 5.1.1. We were previously at patch level -03 (now -06 it seems), 
as well as SRWC 2.3 (previously 2.2).



The new undesirable behavior we are seeing:

1. Our users all use a Kiosk app to access windows terminal servers using 
uttsc. Intermittently, with card inserted and the kiosk app running (i.e. 
uttsc), the sun ray will display a 26B dialog box floating around for no 
apparent reason. The user is still able to fully use the system, just the 
annoyance of the 26B window floating around.
a. Our policy requires full encryption + client authentication. These dtu's are 
all authenticated and have worked continuously for a long time without this 
symptom.
b. A Stop-A tends to make it go away. It "sometimes" comes back after "some" 
unknown period of time, and it doesn't affect all users.
c. Removal of the card properly takes the user back to the standard Solaris 
login screen. Re-insertion of the card returns them to the terminal server 
session.
d. Seen on SunRay 2 DTU and new SunRay 3.

2. Possibly related: In the SRS management website, ever since the first 
patching (and still visible now), sessions that are clearly connected show as 
disconnected. In the most extreme case, I was in an OVDC session logged into a 
kiosk session (into a terminal server). I was on that terminal server looking 
at the srs website, at my session, which was identified as 'disconnected'. 
Further investigation showed that on initial login, the session shows as 
connected for a few minutes, after which it goes to disconnected, even though 
nothing becomes disconnected (i.e. still using the session). Likewise, 
reconnection works just fine.


Any feedback helpful, thanks,
Devin

___
SunRay-Users mailing list
SunRay-Users@filibeto.org
http://www.filibeto.org/mailman/listinfo/sunray-users


Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1

2011-01-11 Thread Bob Doolittle

26B means that communications between the client and the X server (or YUV 
client) have been interrupted.

Typical causes are:
- network failure
- X server crashed, and wasn't properly restarted
- X server hung
- it's a YUV session (used to display error or state-indicating icons without 
an X server) and the YUV client (yuvfile) isn't rendering to the screen 
properly or has died undetected

So the first step is to test whether you can ping the DTU from the server. If 
you can, you need to investigate the X server.
I wouldn't suspect that PAM could cause the X server to become horribly hung, 
although theoretically anything could tickle a bug resulting in an X server 
crash. Seems unlikely to me.
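
The order of checks above can be written down as a small triage helper. It is only a sketch of the reasoning: the field names are invented, and how each observation is gathered (ping, process listing, and so on) is left to the administrator.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    ping_ok: bool             # DTU answers ping from the server
    yuv_session: bool         # session only shows an icon via yuvfile
    xserver_running: bool     # an X server process exists for the display
    xserver_responsive: bool  # the X server still answers clients


def suspects(obs):
    """Map observations to the typical 26B causes listed above."""
    if not obs.ping_ok:
        return ["network failure"]
    if obs.yuv_session:
        return ["YUV client (yuvfile) not rendering or died undetected"]
    if not obs.xserver_running:
        return ["X server crashed and was not properly restarted"]
    if not obs.xserver_responsive:
        return ["X server hung"]
    return ["none of the typical causes matches"]
```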

It appears you're on Solaris. Is it Solaris 10 or OpenSolaris?
What version and patch rev of SRSS?

I'd first of all try to identify the session that the client is supposedly 
servicing. utwho -ca should help there.

What type of session is it? Is it a greeter, RHA (session locked), YUV (error 
icon of some sort), or logged-in session? You can look in 
/var/opt/SUNWut/displays/DISPLAYNUM at the SESSION_TYPE. If it's default then 
the Display Manager (dtlogin for S10, GDM for OpenSolaris/S11/Linux) is 
responsible for restarting the server if it dies. Otherwise it's SRSS's 
responsibility.
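
As a rough illustration, the lookup could be scripted like this. The sketch assumes the displays file holds KEY=value lines, which is an assumption about the file layout rather than something stated here; the function names are invented.

```python
def read_session_type(text):
    """Return the SESSION_TYPE value from the displays file text, or None."""
    for line in text.splitlines():
        # Assumed layout: one KEY=value pair per line.
        key, sep, value = line.partition("=")
        if sep and key.strip() == "SESSION_TYPE":
            return value.strip()
    return None


def restart_owner(session_type, os_name="S10"):
    """Say who should restart a dead X server for this session type."""
    if session_type is None:
        return "unknown"
    if session_type == "default":
        # Display Manager: dtlogin on S10, GDM on OpenSolaris/S11/Linux.
        return "dtlogin" if os_name == "S10" else "GDM"
    return "SRSS"
```

For display :51 the input would be the contents of /var/opt/SUNWut/displays/51.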

Then I'd look for the Xnewt process servicing that display. Is there one? If 
so, I'd try a pstack and also look in the Xserver error log for clues:
S10: /var/dt/Xerrors
S11 (or Linux): /var/log/gdm/:DISPLAYNUM
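
A sketch of that lookup, assuming the process list comes from `ps -eo pid,args` and that Xnewt receives the display (for example :51) as its own argument. Both the argument layout and the function names are assumptions for illustration.

```python
def find_xnewt(ps_text, display):
    """Return PIDs of Xnewt processes whose arguments include the display."""
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Expect "PID COMMAND ARGS..."; skip the header and short lines.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        args = fields[1:]
        if args[0].endswith("Xnewt") and display in args[1:]:
            pids.append(int(fields[0]))
    return pids


def xserver_log(display, os_name="S10"):
    """Pick the X server error log location named above."""
    return "/var/dt/Xerrors" if os_name == "S10" else "/var/log/gdm/" + display
```

An empty list for a display that shows 26B would point at a crashed server that was not restarted; a PID is the starting point for pstack.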

Sometimes we've observed that 26 can occur when /tmp gets clobbered and the 
/tmp/SUNWut directory structure has been disturbed, or the host has run out of 
VM/swap space at some point and couldn't write to /tmp/SUNWut when it needed 
to. We do a lot of book-keeping in that area and if it's corrupted the software 
can misbehave. Check /var/adm/messages for signs of VM starvation.
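
One way to do that check is to filter the log for swap and space warnings. The phrases below are examples of wording such warnings commonly use and are an assumption; adjust them to what the server actually logs.

```python
import re

# Example phrases only; they are assumptions, not an exhaustive list.
HINTS = [re.compile(p, re.IGNORECASE) for p in (
    r"no swap space",
    r"swap space limit exceeded",
    r"file system full",
    r"out of memory",
    r"not enough space",
)]


def starvation_lines(log_text):
    """Return log lines that hint at VM or /tmp space starvation."""
    return [ln for ln in log_text.splitlines()
            if any(h.search(ln) for h in HINTS)]
```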

-Bob


Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1

2011-01-11 Thread Nishimura, Scott L (IT Solutions)
Just to emphasize something Bob mentioned, make sure you don't have a cron job 
that periodically deletes stuff in /tmp.  This is not uncommon on a lot of 
non-SRS systems but can be disastrous on an SRS box.


Scott
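
A quick way to look for such a job is to scan crontab text for entries that combine a deletion tool with /tmp. This is a heuristic sketch only: the tool list is an assumption, and a hit still needs a human to read the entry.

```python
import re

# Tools commonly used by cleanup jobs (an assumed, non-exhaustive list).
CLEANER_RE = re.compile(r"\b(rm|find|tmpwatch|tmpreaper)\b")
# Match /tmp as a top-level path, not /var/tmp or /tmpfoo.
TMP_RE = re.compile(r"(?<![\w/.])/tmp(?=/|\s|$)")


def suspicious_cron_lines(crontab_text):
    """Return active crontab lines that appear to clean up /tmp."""
    hits = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out entries
        if TMP_RE.search(line) and CLEANER_RE.search(line):
            hits.append(line)
    return hits
```

The input would be the output of `crontab -l` for root and any other account that runs maintenance jobs.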


Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1

2011-01-11 Thread Devin Nate
Hi Bob, Sun Ray Users;

Thank you so much. I think we've already looked at many of these 
possibilities, but we will review them again in the context of your email. 
This Sun Ray environment ran without 26B's before the patches (SRS 5.0 -> SRS 
5.1.1, Solaris patched via smpatch with roughly 300 patches, and Java 1.5 -> 
Java 1.6.0_23 32-bit). My concern is a newly introduced bug.

1. Pings all work. It's not a specific DTU either, although there's a chance 
this is more prevalent on the newer Sun Ray 3's. Even removing a card brings 
the session back to life, so there is no lack of network connectivity.

2. Xserver is Xnewt in our case. We (maybe) got a new one going from SRSS 
patchlevel -03 to -06 (srs 5.1.1). Will try to dig into Xnewt some more. 
Reviewing /var/dt/Xerrors now. Nothing particularly out of the ordinary in the 
cursory review but checking in depth.

3. Solaris 10 (not OpenSolaris, not Solaris 11) on Sun hardware, supported with 
current paid contract from Oracle. All patches from Oracle (smpatch update). 
Also paid for RTU's and maint (we run enterprise supported environment).

4. SRSS is 5.1.1 (140994-06). Was previously 5.0 with 140994-03. SRWC is 2.3 
now.

5. Will check utwho -ac output. 

6. All working sessions have default session type. I will need to wait till the 
next 26B report to see if it is default or something different. We do use 
dtlogin. Presumably, we may have got a new dtlogin program or supporting files 
with new Solaris patches.

7. We go out of our way to NOT clobber /tmp - absolutely no cleanup scripts. df 
reports tons of free space. 
swap  1439960  5380  1434580  1%  /tmp
swap  1435504   924  1434580  1%  /var/run

8. /var/adm/messages reports the standard SRWC message, plus PAM messages. We 
know the SRWC message; it just indicates a Windows session shutdown. The PAM 
ones in this file lead us to suspect people accidentally pressing keys (a book 
resting on the keyboard, etc.) at the dtlogin login screen.

Sun Ray Connector proxy:[10389]: [ID 855542 user.error] Child closed socket 
prematurely, session shutdown
dtlogin[14867]: [ID 937900 user.error] sunray_get_user:isValidUsername: Invalid 
characters found in username
dtlogin[14867]: [ID 817952 user.error] sunray_get_user:pam_sm_auth: Username 
validation failed: Error -1
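
The df figures in item 7 can be turned into headroom numbers with a few lines of Python. This sketch assumes they are `df -k` rows with the columns kbytes, used, avail, capacity and mount point; that column reading is an assumption, and the function names are invented.

```python
def parse_df_line(line):
    """Split one assumed `df -k` data row into named fields."""
    fs, total, used, avail, cap, mount = line.split()
    return {
        "filesystem": fs,
        "kbytes": int(total),
        "used": int(used),
        "avail": int(avail),
        "capacity_pct": int(cap.rstrip("%")),
        "mount": mount,
    }


def avail_mb(row):
    """Available space in whole megabytes (1 MB = 1024 KB)."""
    return row["avail"] // 1024
```

Since /tmp is swap-backed here, the available figure is also a view of how much virtual memory headroom is left.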


Thanks and will update as we find more info.
Devin



Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1

2011-01-11 Thread Devin Nate
Hi Craig;

Thanks for the follow up.

1. Unless a solaris patchset added a /tmp cleaner, we absolutely do not do 
this. Our kiosk code requires readonly access to a perfect copy of /tmp/SUNWut 
- anything that messes with that would be horrible for us.

2. Yes, I was personally working on a station with the uttsc kiosk session 
active and usable, and the 26B floating around. However, today we watched a 
different but similar scenario, where a user was working, got the 26B floating 
around, and was unable to proceed. They removed their card and the Solaris 
dtlogin came up; they re-inserted the card and got back to their session as 
normal, with no more 26B. It's not consistent.

3. Java was upgraded from 1.5 to 1.6.0_23 Solaris x86 32-bit.

4. Will review utsession -p when we get our next 26B. We may have one now: a 
remote user left for lunch with one showing, but I can't reach them right now.

5. All of our sessions are kiosk sessions, so I'd say yes.

6. /etc/opt/SUNWut/jre points to /usr/java, which is the newly installed 
1.6.0_23:

/etc/opt/SUNWut/jre/bin/java -version
java version 1.6.0_23
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) Server VM (build 19.0-b09, mixed mode)

7. Yes, updated all DTUs to GUI4.2_140993-06_2010.10.08.21.53. 
Both members of the FOG were updated equally. I just double-checked the 
version info on the DTUs and they do report that level.

8. Don't believe we have any disk space issues. Our 'smallest' member of the 
fog reports no more than 25% disk utilization.

9. Very custom kiosk script in Perl. However, it essentially runs two utaction 
commands and finishes by calling uttsc. The final line of Perl is:
system("/opt/SUNWuttsc/bin/uttsc ...");

10. Again, no /tmp cleanup processes at all.
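
The Java check in item 6 can be automated by parsing the `java -version` text. This is a sketch that relies on HotSpot builds labelling 64-bit VMs with "64-Bit" in the VM line, so the absence of that label is taken to mean 32-bit; the function name is invented.

```python
import re


def parse_java_version(output):
    """Return (version, is_64bit) from `java -version` output text."""
    # The version may or may not be wrapped in double quotes.
    m = re.search(r'java version "?([\w.]+)"?', output)
    version = m.group(1) if m else None
    # Assumption: 64-bit HotSpot VMs print "64-Bit" in the VM line.
    is_64bit = "64-Bit" in output
    return version, is_64bit
```

Note that `java -version` writes to standard error, so a wrapper has to capture that stream rather than standard output.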

Thanks,
Devin




-----Original Message-----
From: sunray-users-boun...@filibeto.org 
[mailto:sunray-users-boun...@filibeto.org] On Behalf Of Craig Bender
Sent: Tuesday, January 11, 2011 12:22 PM
To: SunRay-Users mailing list
Subject: Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 
5.1.1

Hi Devin,

Just to be clear, users are actively using the client and can continue to do so 
when the On Screen Display pops up with 26B?  OSD code of 26 is basically 
telling you the DTU is waiting on the Xserver to start sending it traffic.

I've seen a few different causes for code 26: the wrong Java version, crashing 
Xservers, or a cron job that cleans out /tmp periodically and deletes critical 
session files in /tmp/SUNWut.  But I don't think I've ever seen a case where 
the user could keep on using the session.

A few questions.

What does utsession -p report when the OSD is on the screen?
Does it only happen to kiosk sessions?
Does /etc/opt/SUNWut/jre point to a 32 bit version of the 1.6 JRE?

Did you update the DTUs with the new firmware image?
No disk space issues?
No cleanup scripts in crontab that might be clearing out /tmp?
Are you using the built-in Kiosk Script for SRWC?  Any customizations there?
