Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1
Dear SunRay users (hopefully Bob, Craig also?);

We're continuing to have problems with random 26B's, and don't know where to begin debugging. We don't see any particular errors in the log files, besides several PAM messages. The 26B's are happening on new Sun Ray 3's and older Sun Ray 2's.

So far, our only theory is a PAM problem: we have many messages of the following style.

dtlogin[990]: [ID 691260 user.notice] pam_sunray_hotdesk:pam_sm_auth: ut_getTokenByDisplay failed -1 for display :51

We run fully stock PAM, no customizations.

In any event, does anyone have any theories as to how to debug 26B's? In the most recent case, card removal corrected it (the DTU went back to the Solaris login screen), and card insert worked exactly as expected and the 26B was gone.

Please, any ideas on troubleshooting? We don't even know what the likely causes of a 26B are, so we have no starting point.

Thanks,
Devin

From: sunray-users-boun...@filibeto.org [mailto:sunray-users-boun...@filibeto.org] On Behalf Of Devin Nate
Sent: Tuesday, January 04, 2011 6:00 PM
To: 'sunray-users@filibeto.org'
Subject: [SunRay-Users] Sun Ray 26B - several models - SRS 5.1.1

Hi Folks;

We recently applied some Sun Ray patches, and are now experiencing some new problems. In particular, the following updates:

1. Applied all patches from smpatch for Solaris 10, x86-64. Approx 300 patches were applied.
   a. Performed prescribed reboots, single user patching, and configuration reboots.
2. Applied Java SDK 1.6.0_23, and activated it as the default Java instance. Previously, we were at 1.5 (for some unknown reason).
3. Applied SRS 5.1.1. We were previously at patch level -03 (now -06 it seems), as well as SRWC 2.3 (previously 2.2).

The new undesirable behavior we are seeing:

1. Our users all use a Kiosk app to access Windows terminal servers using uttsc. Intermittently, with the card inserted and the kiosk app running (i.e. uttsc), the Sun Ray will display a 26B dialog box floating around for no apparent reason. The user is still able to fully use the system; the only problem is the annoyance of the 26B window floating around.
   a. Our policy requires full encryption + client authentication. These DTUs are all authenticated and have worked continuously for a long time without this symptom.
   b. A Stop-A tends to make it go away. It sometimes comes back after some unknown period of time, and it doesn't impact all users.
   c. Removal of the card properly takes the user back to the standard Solaris login screen. Re-insertion of the card returns the user to the terminal server session.
   d. Seen on Sun Ray 2 DTUs and the new Sun Ray 3.
2. Possibly related: in the SRS management website, after first patching (and still viewable), sessions that are clearly connected show as disconnected. In the most extreme case, I was in an OVDC session logged into a kiosk session (into a terminal server). I was on that terminal server looking at the SRS website, at my own session, which was identified as 'disconnected'. Further investigation showed that on initial login, the session shows as connected for a few minutes, after which it goes to disconnected, even though nothing becomes disconnected (i.e. I was still using the session). Likewise, reconnection works just fine.

Any feedback helpful, thanks,
Devin

___
SunRay-Users mailing list
SunRay-Users@filibeto.org
http://www.filibeto.org/mailman/listinfo/sunray-users
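One way to test the PAM theory is to tally the pam_sunray_hotdesk failures per display and compare the busiest displays against the DTUs that showed a 26B. The following is only a sketch: the pattern is taken from the single dtlogin message quoted above, and the function name is made up for illustration. To use it against a real log, pass it the lines of the syslog file (typically /var/adm/messages on Solaris).

```python
import re
from collections import Counter

# Pattern modelled on the one dtlogin message quoted in this thread.
HOTDESK_FAILURE = re.compile(
    r"pam_sunray_hotdesk:pam_sm_auth: "
    r"ut_getTokenByDisplay failed (-?\d+) for display (:\d+)"
)


def count_hotdesk_failures(lines):
    """Return a Counter mapping display (e.g. ':51') to failure count."""
    counts = Counter()
    for line in lines:
        match = HOTDESK_FAILURE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Small in-memory demonstration; the second and third lines are invented.
sample_log = [
    "dtlogin[990]: [ID 691260 user.notice] pam_sunray_hotdesk:pam_sm_auth: "
    "ut_getTokenByDisplay failed -1 for display :51",
    "dtlogin[991]: [ID 691260 user.notice] pam_sunray_hotdesk:pam_sm_auth: "
    "ut_getTokenByDisplay failed -1 for display :52",
    "dtlogin[992]: [ID 691260 user.notice] pam_sunray_hotdesk:pam_sm_auth: "
    "ut_getTokenByDisplay failed -1 for display :51",
]
for display, failures in count_hotdesk_failures(sample_log).most_common():
    print(display, failures)
```

If the displays with the most failures never coincide with a 26B report, that would weaken the PAM theory.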
Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1
26B means that communication between the client and the X server (or YUV client) has been interrupted. Typical causes are:

- network failure
- X server crashed, and wasn't properly restarted
- X server hung
- it's a YUV session (used to display error or state-indicating icons without an X server) and the YUV client (yuvfile) isn't rendering to the screen properly or has died undetected

So the first step is to test whether you can ping the DTU from the server. If you can, you need to investigate the X server.

I wouldn't suspect that PAM could cause the X server to become horribly hung, although theoretically anything could tickle a bug resulting in an X server crash. Seems unlikely to me.

It appears you're on Solaris. Is it Solaris 10 or OpenSolaris? What version and patch rev of SRSS?

I'd first of all try to identify the session that the client is supposedly servicing. utwho -ca should help there. What type of session is it? Is it a greeter, RHA (session locked), YUV (error icon of some sort), or logged-in session?

You can look in /var/opt/SUNWut/displays/DISPLAYNUM at the SESSION_TYPE. If it's default then the Display Manager (dtlogin for S10, GDM for OpenSolaris/S11/Linux) is responsible for restarting the server if it dies. Otherwise it's SRSS's responsibility.

Then I'd look for the Xnewt process servicing that display. Is there one? If so, I'd try a pstack and also look in the X server error log for clues:

  S10: /var/dt/Xerrors
  S11 (or Linux): /var/log/gdm/:DISPLAYNUM

Sometimes we've observed that 26 can occur when /tmp gets clobbered and the /tmp/SUNWut directory structure has been disturbed, or the host has run out of VM/swap space at some point and couldn't write to /tmp/SUNWut when it needed to. We do a lot of book-keeping in that area and if it's corrupted the software can misbehave. Check /var/adm/messages for signs of VM starvation.
-Bob
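Bob's rule about who restarts a dead X server can be written down as a small lookup. This is a sketch under a loud assumption: the thread only says that the file /var/opt/SUNWut/displays/DISPLAYNUM contains a SESSION_TYPE, so the KEY=VALUE layout parsed here is a guess, and both function names are hypothetical.

```python
def parse_display_file(text):
    """Parse KEY=VALUE lines into a dict.

    ASSUMPTION: the displays file uses one KEY=VALUE pair per line.
    The thread does not show the real layout.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def restart_owner(session_type):
    """Say who should restart the X server, following Bob's description."""
    if session_type is None:
        return "unknown (no SESSION_TYPE found)"
    if session_type == "default":
        return "display manager (dtlogin on S10, GDM on OpenSolaris/S11/Linux)"
    return "SRSS"


# Hypothetical file contents, for illustration only.
example = "SESSION_TYPE=default\n"
print(restart_owner(parse_display_file(example).get("SESSION_TYPE")))
```

Knowing which component owns the restart tells you whose logs to read first when the Xnewt process for a display is missing.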
Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1
Just to emphasize something Bob mentioned, make sure you don't have a cron job that periodically deletes stuff in /tmp. This is not uncommon on a lot of non-SRS systems but can be disastrous on a SRS box.

Scott
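Scott's check can be roughly automated by scanning crontab text for entries that both delete files and mention /tmp. The sketch below is a heuristic only: it will miss cleanup done inside scripts that cron calls, and the function name and sample entries are invented for illustration. Feed it the output of crontab -l for each user of interest.

```python
import re

# Heuristic: a deleting command (rm, or find ... -delete) on the same
# line as a path under /tmp. /var/tmp is deliberately not matched.
DELETES = re.compile(r"\brm\b|-delete\b")
TMP_PATH = re.compile(r"(?<![\w/])/tmp(?:/|\b)")


def risky_cron_lines(crontab_text):
    """Return crontab entries that look like they delete things in /tmp."""
    hits = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        if DELETES.search(stripped) and TMP_PATH.search(stripped):
            hits.append(stripped)
    return hits


# Invented crontab, for illustration only.
sample_crontab = """\
# nightly cleanup
0 3 * * * find /tmp -mtime +7 -exec rm -f {} +
10 3 * * * /usr/lib/newsyslog
0 4 * * * rm -rf /var/tmp/cache
"""
for entry in risky_cron_lines(sample_crontab):
    print("check this entry:", entry)
```

An empty result does not prove /tmp is safe; it only rules out the most obvious kind of cleanup job.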
Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1
Hi Bob, Sun Ray Users;

Thank you so much. I think we've reviewed many of these problems but will review them again in the context of your email.

This Sun Ray environment ran without 26B's before the patches (SRS 5.0 to SRS 5.1.1, Solaris patched via smpatch with roughly 300 patchsets, and Java 1.5 to Java 1.6.0_23 32-bit). My concern is a newly introduced bug.

1. Pings all work. It's not a specific DTU either, although there's a chance this is more prevalent on the newer Sun Ray 3's. Even removing a card brings the session back alive, so there is no lack of network connectivity.

2. The X server is Xnewt in our case. We (maybe) got a new one going from SRSS patch level -03 to -06 (SRS 5.1.1). Will try to dig into Xnewt some more. Reviewing /var/dt/Xerrors now. Nothing particularly out of the ordinary in a cursory review, but checking in depth.

3. Solaris 10 (not OpenSolaris, not Solaris 11) on Sun hardware, supported with a current paid contract from Oracle. All patches are from Oracle (smpatch update). We also paid for RTUs and maintenance (we run an enterprise supported environment).

4. SRSS is 5.1.1 (140994-06). It was previously 5.0 with 140994-03. SRWC is 2.3 now.

5. Will check utwho -ac output.

6. All working sessions have the default session type. I will need to wait until the next 26B report to see if it is default or something different. We do use dtlogin. Presumably, we may have got a new dtlogin program or supporting files with the new Solaris patches.

7. We go out of our way to NOT clobber /tmp: absolutely no cleanup scripts. df reports tons of free space.

swap  1439960  5380  1434580  1%  /tmp
swap  1435504   924  1434580  1%  /var/run

8. /var/adm/messages reports the standard SRWC message, plus PAM messages. We know the SRWC message; it just indicates a Windows session shutdown. The PAM ones in this file lead us to suspect people accidentally pressing buttons, leaving a book on the keyboard, etc. at the dtlogin login screen.
Sun Ray Connector proxy:[10389]: [ID 855542 user.error] Child closed socket prematurely, session shutdown
dtlogin[14867]: [ID 937900 user.error] sunray_get_user:isValidUsername: Invalid characters found in username
dtlogin[14867]: [ID 817952 user.error] sunray_get_user:pam_sm_auth: Username validation failed: Error -1

Thanks and will update as we find more info.

Devin
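For keeping an eye on swap-backed /tmp over time, the df rows can be parsed and checked against a threshold. The sketch assumes the usual six-column df -k layout (filesystem, kbytes, used, avail, capacity, mount point). The sample figures are a reading of the df output posted in this thread, whose columns ran together in the archive, and the 90% threshold and function names are arbitrary choices for illustration.

```python
def parse_df_k(output):
    """Parse df -k style rows into dicts, skipping the header line."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[1].isdigit():
            continue  # header ("Mounted on" makes 7 fields) or wrapped line
        rows.append({
            "filesystem": fields[0],
            "kbytes": int(fields[1]),
            "used": int(fields[2]),
            "avail": int(fields[3]),
            "capacity_pct": int(fields[4].rstrip("%")),
            "mount": fields[5],
        })
    return rows


def nearly_full(rows, mount, threshold_pct=90):
    """True if the given mount point is at or above threshold_pct."""
    return any(
        row["mount"] == mount and row["capacity_pct"] >= threshold_pct
        for row in rows
    )


# Figures as read from the df output in this thread (columns reconstructed).
sample_df = """\
Filesystem kbytes used avail capacity Mounted on
swap 1439960 5380 1434580 1% /tmp
swap 1435504 924 1434580 1% /var/run
"""
rows = parse_df_k(sample_df)
print("/tmp nearly full:", nearly_full(rows, "/tmp"))
```

A single healthy reading does not rule out a transient shortage earlier on, which is why Bob also suggests checking /var/adm/messages for signs of VM starvation.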
Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1
Hi Craig;

Thanks for the follow up.

1. Unless a Solaris patchset added a /tmp cleaner, we absolutely do not do this. Our kiosk code requires read-only access to a perfect copy of /tmp/SUNWut; anything that messes with that would be horrible for us.

2. Yes, I was personally working on a station with the uttsc kiosk session active and usable, and a 26B floating around. However, today we watched a different, similar scenario, where a user was working, GOT the 26B floating around and was unable to proceed. They removed the card and the Solaris dtlogin came up. They re-inserted the card and got back to their session as normal, with no more 26B. It's not consistent.

3. Java was upgraded from 1.5 to 1.6.0_23 Solaris x86 32-bit.

4. Will review utsession -p when we get our next 26B. We may have one now: a remote user left for lunch with one, but I can't reach them right now.

5. All of our sessions are kiosk sessions, so I'd say yes.

6. /etc/opt/SUNWut/jre points to /usr/java, which is the newly installed 1.6.0_23:

/etc/opt/SUNWut/jre/bin/java -version
java version "1.6.0_23"
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) Server VM (build 19.0-b09, mixed mode)

7. Yes, updated all DTUs to GUI4.2_140993-06_2010.10.08.21.53. Both members of the FOG are equally updated. I just double checked the version info on the DTUs and they do report that level.

8. Don't believe we have any disk space issues. Our 'smallest' member of the FOG reports no more than 25% disk utilization.

9. Very custom kiosk script in Perl. However, it essentially runs 2 utactions and finishes by calling uttsc. The final Perl is:

system("/opt/SUNWuttsc/bin/uttsc ...");

10. Again, no /tmp cleanup processes at all.
Thanks,
Devin

-Original Message-
From: sunray-users-boun...@filibeto.org [mailto:sunray-users-boun...@filibeto.org] On Behalf Of Craig Bender
Sent: Tuesday, January 11, 2011 12:22 PM
To: SunRay-Users mailing list
Subject: Re: [SunRay-Users] Bob, Craig - Sun Ray 26B - several models - SRS 5.1.1

Hi Devin,

Just to be clear, users are actively using the client and can continue to do so when the On Screen Display pops up with 26B?

OSD code of 26 is basically telling you the DTU is waiting on the Xserver to start sending it traffic.

I've seen a few different causes for code 26. Wrong Java version, crashing Xservers, a cron job that cleans out /tmp periodically and deletes critical session files /tmp/SUNWut. But I don't think I've ever seen a case where the user could keep on using the session.

A few questions.

What does utsession -p report when the OSD is on the screen?
Does it only happen to kiosk sessions?
Does /etc/opt/SUNWut/jre point to a 32 bit version of the 1.6 JRE?
Did you update the DTUs with the new firmware image?
No disk space issues?
No cleanup scripts in crontab that might be clearing out /tmp
Are you using the built-in Kiosk Script for SRWC? Any customizations there?
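Craig's question about the JRE (32-bit, version 1.6) can be answered from the java -version banner. The sketch relies on the HotSpot convention that 64-bit builds include "64-Bit" in the VM line; the banner posted in this thread has no such marker, which is consistent with a 32-bit JVM. The function names are made up for illustration, and read_java_banner is not called here because it needs a real JRE on the host.

```python
import re
import subprocess


def read_java_banner(java_path):
    """Run `java -version` and return its banner (written to stderr)."""
    proc = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True
    )
    return proc.stderr + proc.stdout


def java_version(banner):
    """Extract the quoted version string, or None if it is not found."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def jvm_is_64bit(banner):
    """HotSpot 64-bit builds name themselves '64-Bit ... VM'."""
    return "64-Bit" in banner


# Banner as posted in this thread.
thread_banner = (
    'java version "1.6.0_23"\n'
    "Java(TM) SE Runtime Environment (build 1.6.0_23-b05)\n"
    "Java HotSpot(TM) Server VM (build 19.0-b09, mixed mode)\n"
)
print(java_version(thread_banner), "64-bit:", jvm_is_64bit(thread_banner))
```

On a server this would be pointed at /etc/opt/SUNWut/jre/bin/java, the path Devin quotes, for example read_java_banner("/etc/opt/SUNWut/jre/bin/java").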