Paul Greidanus wrote:
Bob Doolittle wrote:
No, not if you avoid the "-c" option to utrestart. Restarting
utauthd is safe and will result only in a temporary disconnection
of the DTUs - they'll immediately bind to the other server.
But, the other server tells me that they won't have the login sessions
still connected.. possibly they could reconnect to the backup, but
that's not solving the problem..
I'm sorry, I don't follow you. What do you mean
"the other server tells me ..."?
Do you *want* people to lose their sessions on the backup
server? If so, use utrestart -c. If not, use utrestart. I don't
know your policy and whether you use smartcards or not,
but it's like I said before:
- If you are using smartcards, as soon as people insert their
smartcard, the dedicated server will do session location,
and if the smartcard has a session on the backup server, it
will connect to it.
- If not using smartcards, and using NSCM, as soon as the
user enters their name, if they have a session on the backup
server they will connect to it.
- If not using smartcards, and not using NSCM, or
- If using smartcards, and people left their cards in and don't
remove/reinsert them, and
- people don't power-cycle the Sun Ray
Then if people had a session on the backup server they will
not connect to it. They must remove/reinsert their card or
power-cycle the DTU (and not log in to the dedicated server)
to connect to their session on the backup server.
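
The cases above can be written down as a small decision function. This
is only a sketch of the rules as described in this mail, not SRSS code:
the Dtu record and its field names are made up for illustration, and it
assumes a power cycle always triggers session location (which still
requires that the user does not log in to the dedicated server first).

```python
from dataclasses import dataclass


@dataclass
class Dtu:
    """Hypothetical record of what happened at one Sun Ray after the restart."""
    uses_smartcard: bool    # a smartcard is the login token
    uses_nscm: bool         # NSCM: the user types a name at the greeter
    card_reinserted: bool   # card was removed and reinserted
    username_entered: bool  # user typed their name at the NSCM greeter
    power_cycled: bool      # the DTU was power-cycled


def finds_backup_session(dtu: Dtu) -> bool:
    """Return True if session location runs, so an existing session on
    the backup server would be found."""
    if dtu.power_cycled:
        # Assumption: a power cycle forces session location in every mode.
        return True
    if dtu.uses_smartcard:
        # A card left in the reader triggers nothing.
        return dtu.card_reinserted
    if dtu.uses_nscm:
        # Session location happens once the user name is known.
        return dtu.username_entered
    # No smartcard, no NSCM, no power cycle: the session is not found.
    return False
```

For example, a DTU with a card left in over night and nothing else done
gives False, which is the orphaned-session case.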
If you use NSCM, this isn't a problem, because session
location and load balancing happens after you enter
your user name, at which time both servers will be
up.
The only problem is people who leave their cards in
overnight. In that case, if they don't remove and
then reinsert their cards (or power-cycle the Sun Ray)
in the morning, they'll log in and get a new session on
the dedicated server, which is not what you want.
Shouldn't there be policy against this sort of thing? It's a huge
security risk IMHO
Explain the security issue here please.
It's a physical security issue, like getting users to lock their
session, not a network security thing.. This is especially
needed in my area because the card removal doesn't lock the session..
but, this is really dependent on the security needs of your environment.
As soon as a user's session is disconnected it is
automatically locked (see man -M /opt/SUNWut/man
utxlock). So card removal *should* lock the
session. If not, you may be seeing one of the
many xscreensaver bugs :-( If you have the latest
patch for xscreensaver it should work properly. Do
you have pam_sunray.so as 'sufficient' at the top of
the xscreensaver stack in /etc/pam.conf? You need
this, and depending on the order in which you installed
SRSS and xscreensaver patches and other upgrades
you might not have it.
Maybe you could write a script that issues "utsession -k"
to all the idle sessions?
We're discussing how we might add a feature to
disconnect DTUs and cause them to rebalance in
the next release.
Would this save session state, and move processes between servers?
No, that's impossible. Things like TCP connection
and hardware I/O state (think of hardware buffered
data in the process of being flushed to a device,
possibly flow-controlled) can't be moved from one
server to another. What we're envisioning would
simply solve the problem you requested. Without
NSCM, or if cards are inserted, after a disconnect
DTUs will reconnect and immediately be redirected
to servers which are online. With NSCM, the
initial "username" greeter session might connect to
an offline server, but once people enter their
username if they have no session on offline
servers they'll be redirected to an online server
for their new session.
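
As a rough sketch of that redirect rule (not how SRSS implements it):
the function name, the dict of existing sessions and the list of online
servers are all assumptions for illustration, and taking the first
online server stands in for whatever load balancing would really do.

```python
def choose_server(user, sessions, online):
    """Pick the server a user ends up on once their name is known.

    sessions: dict mapping user name -> server holding that user's
              existing session (may be an offline server)
    online:   list of servers currently accepting new sessions
    """
    existing = sessions.get(user)
    if existing is not None:
        # An existing session wins, even if its server is offline.
        return existing
    if not online:
        raise RuntimeError("no online server available for a new session")
    # Placeholder for real load balancing: take the first online server.
    return online[0]
```

So choose_server("alice", {"alice": "backup"}, ["dedicated"]) stays on
"backup", while a user with no session goes to "dedicated".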
Well, not impossible.. for instance, OpenBSD uses a virtual shared MAC
address to keep session information between CARP-paired firewalls, so
N machines have a single MAC address and IP, which they use for
routing.. the problem becomes harder to solve when you have the
endpoint of the sessions on the machine,
Exactly. This is quite common - any application
that uses a network connection is subject to this
issue. And don't forget things like UARTs or
other devices with internal hardware state on the
server.
but you could do something like lockstep the 2 machines together, and
run them like that.. breaking off a half of the mirrored machines
wouldn't kill the sessions. But, now we're getting into redundant
hardware type of setup, which may be a nuke to kill a flea type of
setup..
Sorry, we don't make any redundant
hardware/lockstep fault-tolerant machines any
more, and fewer and fewer manufacturers are doing
this. It was just not cost-effective for the last
fraction of a percent of RAS.
I'm imagining something like a "disconnect all"
capability that would redistribute all idle sessions
across the servers which are currently up and
online.
Ok.. I wasn't aware this was an issue.. I'm not running a large enough
environment yet to have run into that issue..
But you are, with your backup/dedicated scenario!
If you could "disconnect all" on the backup and
dedicated servers right now, you wouldn't have to
worry about the scenarios I describe at the top of
this mail. There would be no possibility of any
"orphaned sessions". We'd probably have to do
some coordination in SRSS to avoid immediately
granting sessions to give time for all the servers
to start participating in LB/SL.
This would be very very cool.. but a huge challenge. It would be
very neat if you could use InfiniBand, or something, and have HA
Sun Ray sessions, not service, but sessions..
If you can show me a graduate thesis from
*anybody* showing how to move a general-purpose
process between two servers (barring redundant
Fault Tolerant hardware), I'll be very surprised.
We actually had a Stanford intern in our group
working on this for limited types of processes for
a while, but it didn't pan out in any way that was
useful.
I'd be surprised as well.. I can't see that it's impossible, but I
will concede impractical, and probably not financially feasible to
build this for such a small problem.. it's not the end of the world
for a few users to lose their session in the case of a server
failure. They're able to use their session much quicker than if they
had the hardware failure on a desktop under their desk.
My assumption stated above is non-redundant
hardware, and I still maintain that with such an
assumption this problem is intractable for the
general case of processes. I agree it can be
solved with redundant hardware, but then your
price tag more than doubles, and the computer
manufacturers who tried to make a living producing
such hardware are long gone (or at least those
lines of computers are - Sun tried this also at
one point).
-Bob
_______________________________________________
SunRay-Users mailing list
http://www.filibeto.org/mailman/listinfo/sunray-users