Hello, I still haven't found the time to write a blog post about this, so I just put the code on pastebin:
http://pastebin.org/31242

I'm looking forward to your feedback :)

I tested this filter on Jetty and Tomcat (with Firefox's User Agent
Switcher), where it worked fine. However, as stated in the code, some app
servers might behave a little differently, so YMMV.

greetings,
Rüdiger

On Monday, 14.04.2008, at 16:37 +0200, Korbinian Bachl - privat wrote:
> Yeah, it's quite a shame that Google doesn't open-source their logic ;)
>
> It would be nice if you could give us the code, however, so we could
> have a look at it :)
>
> Rüdiger Schulz wrote:
> > Hm, SEO really is a little bit like black science sometimes *g*
> >
> > This (German) article states that SID cloaking would be OK with Google:
> > http://www.trafficmaxx.de/blog/google/gutes-cloaking-schlechtes-cloaking
> >
> > Some more googling, and here someone seems to confirm this:
> > http://www.webmasterworld.com/cloaking/3201743.htm
> > "I was actually at SMX West and Matt Cutts specifically said that this
> > is OK"
> >
> > All I can say in our case is that I added this filter several months
> > ago, and I can't see any negative effects so far.
> >
> > greetings,
> >
> > Rüdiger
> >
> > 2008/4/14, Korbinian Bachl - privat <[EMAIL PROTECTED]>:
> >> Hi Rüdiger,
> >>
> >> AFAIK this could lead to some punishment by Google, as it browses the
> >> site multiple times using different agents and origin IPs, and if it
> >> sees different behaviour it suspects cloaking/prepared content and
> >> will act accordingly;
> >>
> >> This is usually noticed after the regular Google index refreshes that
> >> happen a few times a year - you should keep an eye on this;
> >>
> >> Best,
> >>
> >> Korbinian
> >>
> >> Rüdiger Schulz wrote:
> >>
> >>> Hello everybody,
> >>>
> >>> I just want to add my 2 cents to this discussion.
> >>>
> >>> At IndyPhone we too wanted to get rid of jsessionid URLs in Google's
> >>> index.
> >>> Yeah, it would be nice if the Google bot were as clever as the one
> >>> from Yahoo and just removed them itself. But it doesn't.
> >>>
> >>> So I implemented a servlet filter which checks the User-Agent header
> >>> for the Google bot and skips the URL rewriting just for those
> >>> clients. As this will generate lots of new sessions, the filter
> >>> invalidates the session right after the request. Also, if a crawler
> >>> makes a request containing a jsessionid (which it stored before the
> >>> filter was implemented), the filter redirects the crawler to the
> >>> same URL, just without the jsessionid parameter. That way, the index
> >>> will be updated for those old URLs.
> >>>
> >>> Now we have almost none of those URLs in Google's index.
> >>>
> >>> If anyone is interested in the code, I'd be willing to publish it.
> >>> As it is not Wicket-specific, I could share it with some generic
> >>> servlet tools OS project - is there something like that at Apache or
> >>> elsewhere?
> >>>
> >>> But maybe Google is smarter by now, and it is not required anymore?
> >>>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
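PS: For anyone who doesn't want to dig through the pastebin link, the filter logic described above boils down to a few decisions per request. This is just an illustrative sketch with hypothetical names, not the actual pastebin code. The session handling and response wrapping need the Servlet API, so that part is described in the comments, and the two pure helpers - crawler detection and jsessionid stripping - are shown as plain Java:

```java
// Illustrative helpers for a crawler filter like the one described above.
// In a real javax.servlet.Filter, doFilter() would:
//   1. pass non-crawler requests straight through the chain;
//   2. 301-redirect crawler requests whose URL still carries a jsessionid
//      (request.isRequestedSessionIdFromURL()) to stripJsessionid(...);
//   3. wrap the response so encodeURL()/encodeRedirectURL() are no-ops
//      for crawlers, and invalidate any session created during the
//      request in a finally block, so crawler sessions don't pile up.
public class CrawlerUrlUtil {

    /** Very simple User-Agent check; real code might match more bots. */
    public static boolean isGooglebot(String userAgent) {
        return userAgent != null
                && userAgent.toLowerCase().contains("googlebot");
    }

    /**
     * Remove the ";jsessionid=..." path parameter from a request URI and
     * re-append the query string, so old crawler-stored URLs can be
     * redirected to their clean form.
     */
    public static String stripJsessionid(String requestUri, String queryString) {
        int semi = requestUri.indexOf(';');
        String clean = (semi >= 0) ? requestUri.substring(0, semi) : requestUri;
        return (queryString != null) ? clean + "?" + queryString : clean;
    }
}
```

As noted in the thread, your app server may differ in how it reports and rewrites session IDs, so test against your own container.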
