I've gone ahead and moved the referrer spam checking logic into the RefererFilter so that we can send an appropriate response to spammers. The drawback is that the RefererFilter now requires a lookup of a WebsiteData object per request, which can be time-consuming.
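For reference, the blacklist side of that check amounts to something like the following. This is a minimal sketch, not Roller's actual API: the `ReferrerBlacklist` class and its method names are illustrative, and in the real RefererFilter a match would be answered with a 403 (HttpServletResponse.SC_FORBIDDEN) before any database work happens.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch of an in-memory referrer blacklist check.
// In Roller this logic would live in the RefererFilter, which could
// answer a matching request with a 403 before touching the database.
public class ReferrerBlacklist {

    private final List<Pattern> patterns;

    public ReferrerBlacklist(List<String> regexes) {
        this.patterns = regexes.stream().map(Pattern::compile).toList();
    }

    /** Returns true if the referrer matches any blacklist pattern. */
    public boolean isSpam(String referrer) {
        if (referrer == null) {
            return false; // no Referer header, nothing to check
        }
        for (Pattern p : patterns) {
            if (p.matcher(referrer).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReferrerBlacklist bl =
            new ReferrerBlacklist(List.of("casino", "cheap-pills\\.example"));
        System.out.println(bl.isSpam("http://casino.example/win"));       // true
        System.out.println(bl.isSpam("http://a-friendly-blog.example/")); // false
    }
}
```

Since the whole point is to avoid a db hit, everything here runs against in-memory state; the open question from the thread (weblog-specific blacklists) would mean keeping one such object per weblog in a cache.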
I *very* highly recommend that we add an L2 Hibernate cache for the WebsiteData object to help counteract this. Without it the asynchronous referrer processing is much less effective.

-- Allen

On Wed, 2006-01-04 at 14:40, Allen Gilliland wrote:
> On Tue, 2006-01-03 at 13:51, David M Johnson wrote:
> > On Jan 3, 2006, at 4:01 PM, Matthew Schmidt wrote:
> > > Definitely useful, but I question how we plan on blocking requests
> > > from referrers that are bad? If everything is pushed into the
> > > queue, wouldn't the request just continue as normal with the
> > > blacklist processing happening later?
> >
> > Yes, that appears to be a shortcoming of this proposal.
> >
> > If we want to answer referrer spammers with a 403 access denied, as
> > we do now, then I guess we could do something like this: when the
> > request comes in, check it against the blacklist, which is in memory.
> > If it matches, then pitch it out with a 403. Otherwise, put it in the
> > queue for storage in the DB.
> >
> > With that approach, we'd still do some work for each referrer but we
> > wouldn't have to hit the DB.
>
> I don't mind doing that, but currently the spam checker stuff wants a
> full WebsiteData object passed in to do the spam check, and that means
> a trip to the db. So we would need a way to check the blacklist
> without requiring any objects from the db.
>
> I don't see anywhere that would cache a weblog-specific blacklist, so
> I'm not sure how to make that work. That means however we hack this,
> it couldn't check a weblog-specific blacklist. Maybe it's good enough
> even if we don't check the weblog's custom blacklist?
>
> Another idea is to create a special SpamFilter which would check for
> spam itself and return 403 responses. The problem is still the same
> though: we wouldn't want to put that in front of the cache filters,
> because then you are hitting the db on every request just to check for
> referrer spam.
> So that wouldn't work unless it was specifically designed to cache the
> weblog-customized blacklists. If the custom blacklists are cached then
> it would probably be okay to put it as one of the first filters in
> line. I don't know how big those blacklists could get, though.
>
> -- Allen
>
> > - Dave
> >
> > > -Matt
> > >
> > > -----Original Message-----
> > > From: Allen Gilliland [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, January 03, 2006 3:18 PM
> > > To: [email protected]
> > > Subject: Proposal: Asynchronous Referrer Processing
> > >
> > > This is already linked on the Roller 2.2 proposal page, but I
> > > thought I'd send it out directly as well.
> > >
> > > http://rollerweblogger.org/wiki/Wiki.jsp?page=AsynchronousReferrerProcessing
> > >
> > > This will allow Roller admins to optionally process referrers in an
> > > asynchronous manner, i.e. not tied to the http request/response
> > > cycle.
> > >
> > > Thoughts/comments always welcome.
> > >
> > > -- Allen
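For anyone skimming the thread: the request-time side of the proposal reduces to "enqueue and return", with persistence handled off the request thread. A minimal sketch of that shape, assuming a hypothetical `ReferrerQueue` class (none of these names are Roller's, and the real worker would do a Hibernate write rather than bump a counter):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of asynchronous referrer processing: the request thread only
// enqueues the hit; a background daemon thread drains the queue and
// does the (potentially slow) persistence work, so the http response
// is never blocked on the database.
public class ReferrerQueue {

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger persisted = new AtomicInteger();

    public ReferrerQueue() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String referrer = queue.take();
                    // stand-in for the real db write of this referrer
                    persisted.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Called from the request path; returns immediately. */
    public void record(String referrer) {
        queue.offer(referrer);
    }

    public int persistedCount() {
        return persisted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ReferrerQueue q = new ReferrerQueue();
        q.record("http://a.example/");
        q.record("http://b.example/");
        Thread.sleep(200); // give the worker a moment to drain the queue
        System.out.println(q.persistedCount());
    }
}
```

The blacklist/403 check discussed above would sit in front of `record()`, so only non-spam referrers ever reach the queue.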
