We treat crawl bots specially: we keep a single session for all of them and
restore each bot request to that session ID. To do that, we override
WOApplication's createSessionForRequest(WORequest request), check the
user-agent against a list of known bots, and assign any match to the shared
session. Something like this:
@Override
public WOSession createSessionForRequest(WORequest request) {
    String userAgent = request.headerForKey("user-agent");
    WOSession session = null;
    if (userAgent != null && userAgent.toLowerCase().contains("googlebot")) {
        session = super.sessionStore().checkOutSessionWithID(sessionIDForRobots, request);
        if (session == null) {
            session = super.createSessionForRequest(request);
            session.setStoresIDsInCookies(true);
            session.setStoresIDsInURLs(false); // any URLs generated for bots will be without the session ID
            sessionIDForRobots = session.sessionID();
            log.debug("NEW SESSION CREATED FROM " + SHWORequestUtilities.clientIP(request) + " (" + userAgent + ")");
        } else {
            // no session created, so we need to "fix" the activeSessionsCount,
            // which is incremented on every call to this method
            ERXKeyValueCodingUtilities.takePrivateValueForKey(this, activeSessionsCount() - 1, "_activeSessionsCount");
        }
        log.debug("Known bot hitting application. User-Agent: " + userAgent);
    } else {
        session = super.createSessionForRequest(request);
        log.debug("NEW SESSION CREATED FROM " + SHWORequestUtilities.clientIP(request) + " (" + userAgent + ")");
    }
    return session;
}
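For completeness: the override above references a couple of members that
aren't shown (sessionIDForRobots, log), and as mentioned at the top we really
check against a list of known bots rather than just Googlebot. A rough sketch
of how those pieces might look on the application class; everything except
sessionIDForRobots is a placeholder on my part (logger choice, bot list,
helper name), so adjust to taste:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.log4j.Logger;

    // Members assumed by the override above; only sessionIDForRobots appears
    // in the original code, the rest are illustrative placeholders.
    private static final Logger log = Logger.getLogger(Application.class);

    // The single session ID shared by all recognized crawlers.
    private String sessionIDForRobots;

    // Lowercased user-agent fragments identifying known bots (example list).
    private static final List<String> KNOWN_BOTS =
            Arrays.asList("googlebot", "bingbot", "yandexbot", "baiduspider");

    // True if the request's user-agent matches any known bot fragment;
    // this would replace the bare contains("googlebot") check above.
    private boolean isKnownBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase();
        for (String bot : KNOWN_BOTS) {
            if (ua.contains(bot)) {
                return true;
            }
        }
        return false;
    }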
The only oddity is that when we check out an existing session (the bot case),
the active session count still goes up even though no session is created,
hence the activeSessionsCount workaround in the else branch. I don't remember
where I got that bit of code. :) This has been in production for a long time
and hasn't caused us any problems, but your mileage may vary.
Good luck,
Lon
On Thu, Mar 24, 2016 at 10:21 AM, OC <[email protected]> wrote:
> Hello there,
>
> one of my applications drained its memory quickly; it looks like the
> primary cause was the creation of thousands of sessions, which itself was
> caused by lots of Google requests containing spurious (probably stored
> years ago) session IDs. It seems each such request creates a new session,
> and hilarity quickly ensues.
>
> Of course, moving session IDs to cookies would help (and kicking out those
> bloody Google bots from the server would help tremendously), but I wonder...
>
> ... first, sometimes session IDs in URLs are needed, e.g., to allow
> concurrent work in different sessions from multiple browser tabs/windows.
> Besides, such an attack could just as well be mounted maliciously with
> spurious session IDs in cookies.
>
> Is there a known and tested way to reliably prevent this kind of
> session-induced death? Some trick to create new sessions only for valid
> requests? Perhaps handleSessionRestorationErrorInContext redirecting to a
> static address without a session ID, or something like that?
>
> Thanks,
> OC