We treat crawler bots specially: all of them share a single session, and
every bot request is restored to that session ID.

To do that, we override WOApplication’s createSessionForRequest(WORequest request).

We then check the user-agent against a list of known bots, and assign them
to a single session.  Something like this:

@Override
public WOSession createSessionForRequest(WORequest request) {
    String userAgent = request.headerForKey("user-agent");
    WOSession session = null;

    if (userAgent != null && userAgent.toLowerCase().contains("googlebot")) {
        // try to reuse the one shared bot session
        session = super.sessionStore().checkOutSessionWithID(sessionIDForRobots, request);

        if (session == null) {
            session = super.createSessionForRequest(request);
            session.setStoresIDsInCookies(true);
            // this means any URLs generated for bots will be without the session ID
            session.setStoresIDsInURLs(false);
            sessionIDForRobots = session.sessionID();
            log.debug("NEW SESSION CREATED FROM " + SHWORequestUtilities.clientIP(request) + " (" + userAgent + ")");
        } else {
            // no session created, so we need to "fix" the activeSessionsCount,
            // which is incremented on every call to this method
            ERXKeyValueCodingUtilities.takePrivateValueForKey(this, activeSessionsCount() - 1, "_activeSessionsCount");
        }

        log.debug("Known bot hitting application. User-Agent: " + userAgent);
    } else {
        session = super.createSessionForRequest(request);
        log.debug("NEW SESSION CREATED FROM " + SHWORequestUtilities.clientIP(request) + " (" + userAgent + ")");
    }

    return session;
}

The only oddity is that when we’re checking out an existing session (bots),
the session count goes up even though no session is created, hence the
_activeSessionsCount adjustment.  I don’t remember where I got that bit of
code. :)  This has been in production for a long time and hasn’t caused us
any problems, but your mileage may vary.
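
The snippet above only matches "googlebot"; for the "list of known bots"
part, a minimal sketch might look like the following.  The class name
BotDetector, the method isKnownBot and the token list are illustrative
assumptions, not part of the code above, and the list would need to be
adjusted to the crawlers that actually hit your application.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class BotDetector {

    // Hypothetical list of lowercase user-agent fragments; not exhaustive.
    private static final List<String> KNOWN_BOT_TOKENS =
            Arrays.asList("googlebot", "bingbot", "yandexbot", "baiduspider");

    // Returns true if the user-agent contains any known bot token
    // (case-insensitive). A missing user-agent is treated as "not a bot".
    public static boolean isKnownBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : KNOWN_BOT_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

The condition in createSessionForRequest would then become something like
if (BotDetector.isKnownBot(userAgent)) { ... }.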

Good luck,

Lon

On Thu, Mar 24, 2016 at 10:21 AM, OC wrote:

> Hello there,
>
> one of my applications drained its memory quickly; it looks like the
> primary cause was the creation of thousands of sessions, which itself was
> caused by lots of Google requests containing spurious (probably stored
> years ago) session IDs. It seems each such request creates a new session,
> and hilarity quickly ensues.
>
> Of course, moving session IDs to cookies would help (and kicking out those
> bloody Google bots from the server would help tremendously), but I wonder...
>
> ... first, sometimes session IDs in URLs are needed, e.g., to allow
> concurrent work in different sessions from multiple browser tabs/windows.
> Besides, an attack can be created maliciously with spurious session IDs in
> cookies just as well.
>
> Is there a known and tested way to reliably prevent this kind of
> session-induced death? Some trick to create new sessions only for valid
> requests? Perhaps handleSessionRestorationErrorInContext redirecting to a
> static address without a session ID, or something like that?
>
> Thanks,
> OC
>
>
>  _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Webobjects-dev mailing list      ([email protected])
> Help/Unsubscribe/Update your Subscription:
>
> https://lists.apple.com/mailman/options/webobjects-dev/lon.varscsak%40gmail.com
>
> This email sent to [email protected]