On Feb 20, 2007, at 1:27 AM, Johan Compagner wrote:

Even if the session is many times larger than a single page, it
*could* be cheaper to write the entire thing to disk or a database
when the session goes "idle" than to perform lots of small writes (I
assume it's one per request for stateful pages).


not really, because what takes even more time than the actual writing is the
serialization. Under really heavy load the two sort of crawl towards each
other, but under normal load the serialization cost (which is roughly
constant per page) is much, much higher. So if we constantly dumped the
complete session, the serialization would just cost more and more. And
saving many small things shouldn't matter too much: a disk has a cache and
will save many small new files pretty much as fast as one.
Ahhh, I wasn't thinking about the serialization cost -- makes sense. Whether or not it's actually slower than the write should depend on CPU vs disk throughput (you could have a fast, modern CPU with a single SATA drive that's already heavily loaded, for example). But in any case, serialization cost == CPU load and that's obviously a big factor.

Regarding a disk cache, I don't believe write acceleration is enabled on most drives due to the inherent danger. Battery-backed write caches in combination with a RAID controller aren't too uncommon, and those will indeed coalesce small write operations (as will the OS cache, I believe). Nonetheless, when I benchmarked the write speed of one of our new servers a few weeks ago (no battery backed write cache, but RAID 5 w/15k drives), I got nearly twice the throughput with 1MB writes as I did with 256K writes. I don't know if that slowing trend continues below 256K -- perhaps write coalescing would start to kick in at that point.
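The relative cost is easy to check directly. A quick sketch that isolates the serialization step (the `makeSession` helper is a stand-in of my own for a session holding many stateful pages, not anything from Wicket):

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

public class SerializationCost {
    // Stand-in for a session holding N stateful pages; assumes pages are
    // Serializable object graphs, roughly 40 KB each here.
    static Serializable makeSession(int pages) {
        HashMap<String, int[]> session = new HashMap<>();
        for (int i = 0; i < pages; i++) {
            session.put("page-" + i, new int[10_000]);
        }
        return session;
    }

    // Serialize to a byte array and report the size, so we measure pure
    // CPU/serialization cost with no disk involved.
    static long serialize(Serializable s) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(s);
            oos.flush();
            return bos.size();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long small = serialize(makeSession(1));
        long t1 = System.nanoTime();
        long big = serialize(makeSession(50));
        long t2 = System.nanoTime();
        System.out.printf("1 page:   %d bytes in %.2f ms%n", small, (t1 - t0) / 1e6);
        System.out.printf("50 pages: %d bytes in %.2f ms%n", big, (t2 - t1) / 1e6);
    }
}
```

Comparing the per-page time against the whole-session time makes Johan's point concrete: dumping the complete session pays the full serialization cost every time, even when only one page changed.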


> as for clustering, i don't think that is a big problem, sticky
> session is the way to go anyway (i don't believe in the non-sticky
> session variant at all, that doesn't gain you anything)
I beg to differ here. If you don't have to rely on sticky sessions
then you get rock-solid failover and can therefore add and remove
nodes at will (you're basically failing over constantly under normal
operating conditions). The other big advantage is improved resource
utilization across the farm. Your load balancers distribute on a
request-by-request basis, and that can make a significant difference
in overall throughput when compared to distributing on a
session-by-session basis.



this is just not true.
first: you also get rock-solid failover if you use sticky sessions.
it doesn't mean that the sessions are not replicated at all!
most of the time you have a buddy system, so one server has a backup server,
and if the normal server fails the buddy takes over.
and the utilization across the farm is also the same. if new sessions are
distributed round robin, then with 4 servers and 100 incoming sessions every
server gets 25 sessions. it would be really, really strange for one server
to end up much busier than the others. that is just a mathematical
calculation. for example, if 15 of those 100 sessions terminate, it is very
likely that the terminations are spread pretty equally across all servers.
By "rock solid" I meant more robust than a typical sticky session environment. With a non-sticky session configuration, you don't care which other servers might or might not have the replicated session -- all servers have access to all sessions so any of them (including multiple nodes) can be pulled or can fail with no user interruptions.

While sessions will distribute evenly, what really matters with regard to resource utilization is the distribution of individual requests and their operations. The finer grained model of distributing per request has an advantage in that the system as a whole reacts much faster to changing load (it's essentially instantaneous).


The performance penalty you get with true round-robin requests is huge.
The session must be synced to all servers before the response leaves the
server (serialized and sent over).
Also, frames/browser tabs or async ajax requests are really horrible. What
happens if 2 requests go to the farm at the same time? One request goes to
A and the other to B. Then we suddenly need cluster-wide session locking!
I want to avoid synchronize(xxx) in java as much as possible because that
is a real concurrency killer, but a synchronizeOverCluster(session) would
be really horrible.
That's not really an issue in a "typical" non-sticky session configuration: session data is read at the beginning of a request and, if the session is updated, written at the end of the request to a central database. The idea is to push concurrency issues to the database instead of handling them in code. The DB can handle simultaneous selects and/or updates from different servers easily and, if you want serialized requests, you can use SELECT FOR UPDATE or the equivalent lock when you read the session row.

It's not a terribly elegant solution and it has a built-in bottleneck, plus a single point of failure that can only be partially mitigated with database replication. However, it's extremely simple and can perform better than you might think.
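The pattern is simple enough to sketch. Here a `ConcurrentHashMap` stands in for the central database table and a per-session `ReentrantLock` stands in for the row lock a `SELECT ... FOR UPDATE` would take; all class and method names are my own, purely illustrative:

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the non-sticky pattern: load the session at the start of a
// request, store it back at the end. Concurrency is pushed to the
// "database": simultaneous requests for the SAME session serialize on the
// row lock, while requests for different sessions proceed in parallel.
public class CentralSessionStore {
    private final Map<String, Serializable> rows = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String sessionId) {
        return locks.computeIfAbsent(sessionId, id -> new ReentrantLock());
    }

    // Equivalent of: SELECT data FROM sessions WHERE id = ? FOR UPDATE
    public Serializable loadForUpdate(String sessionId) {
        lockFor(sessionId).lock();
        return rows.get(sessionId);
    }

    // Equivalent of: UPDATE sessions SET data = ? WHERE id = ?; COMMIT
    public void storeAndRelease(String sessionId, Serializable data) {
        rows.put(sessionId, data);
        lockFor(sessionId).unlock();
    }
}
```

The answer to the two-tabs-hit-A-and-B scenario falls out of this: whichever server grabs the row lock first runs its request; the other blocks on the read until the first commits. No `synchronizeOverCluster` in application code.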

To be clear -- I'm not trying to convince anyone that non-sticky sessions are the holy grail of scalability. The advantages of sticky sessions (Igor brought up the cache data locality issue, and there are plenty of others) can easily outweigh the advantages of non-sticky sessions. I just think it's a viable option in certain situations. The more important point is that, imo, a web framework shouldn't dictate scalability choices if possible.



I'd like to run some tests when the SLCSS file store gets to a point
where you're reasonably happy with the performance. It will be
difficult to compare it to Tomcat's PersistentManager, but I'll try
to come up with something meaningful.



do remember we are trying to avoid serialization and writing of the file in
the request thread itself, so the request can be fast. Only the background
thread(s) should do the serialization and writing.
The only time the request thread does the work itself is when a request
comes back in and the page it wants to access is not yet serialized (saved
or not doesn't matter; serialized or not is the question here), because the
page must be serialized before we can release it again, since that request
could change the page.

johan
Thanks, I didn't quite understand that before. No question that it's a compelling design.
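If I've understood it, the design is roughly this shape -- request threads only queue pages for serialization, a background thread does the actual work, and the only blocking happens when a returning request beats the writer to a page. This is a toy sketch with invented names (`AsyncPageStore` etc.), not Wicket's real API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncPageStore {
    // Single daemon writer thread; in a real store this thread would also
    // write the serialized bytes to disk.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "page-writer");
        t.setDaemon(true);
        return t;
    });
    private final Map<String, Future<byte[]>> pending = new ConcurrentHashMap<>();

    static byte[] serialize(Serializable page) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(page);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Called at the end of a request: returns immediately, serialization
    // happens on the background thread.
    public void storeAsync(String pageId, Serializable page) {
        pending.put(pageId, writer.submit(() -> serialize(page)));
    }

    // Called when a request comes back for the page: blocks only if the
    // background thread hasn't finished serializing this page yet, because
    // the returning request may mutate the page.
    public byte[] ensureSerialized(String pageId) throws Exception {
        Future<byte[]> f = pending.get(pageId);
        return f == null ? null : f.get();
    }
}
```

The `Future.get()` in `ensureSerialized` is the one place a request thread can pay the serialization cost, which matches what Johan describes: the common path is fully asynchronous.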

-Ryan


