AW: AW: AW: URL scheme for Wicket 2.0

Korbinian Bachl Sat, 04 Nov 2006 07:37:21 -0800

 

> -----Original Message-----
> 
> How would I detect a crawler? By the user agent string.
> This is not cloaking. Nor is it a sneaky redirect! In fact,
> there is no redirect for crawlers at all.
> 
> The only difference is that if there is a link in document
> /my/page and a crawler follows the link, the page gets displayed.
>
> However, if a regular visitor (not a crawler) follows the
> link, he is redirected to /my/page[24] (for example).
> 
> It's either a) or b). Neither Google nor any other crawler
> will see both of those.
> 
> The idea is to hide as much session-related stuff from
> Google as possible.
>


Ah, here lies the problem. You assume a crawler arrives, announces that it
is a crawler, and then indexes the page. In reality it goes more like this:

-> The search engine wants to index foo.com/bar.
-> A spider visits foo.com/bar with the user agent "Googlebot" and a known
   google.com IP.
-> The data from that spider is saved by google.com.
-> Some time goes by.
-> A second spider visits foo.com/bar with the user agent "IE 6.0" (or any
   other plausible browser) and an unknown IP; this is nevertheless also a
   spider.
-> The data from spider 2 is saved by google.com.
-> Because the results from spiders 1 and 2 differ, the procedure is
   repeated, with the same result: the response for "Googlebot" differs
   from the response for "IE 6.0".
-> The site is marked as cloaked, is no longer visited, and is banned from
   the index.
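
The comparison described in the steps above can be sketched as a small
simulation. This is only an illustration of the idea, not Google's actual
algorithm: the function names, the (status, body-or-location) response
tuple, and the user agent strings are all assumptions made up for the
example.

```python
from typing import Callable, Tuple

# Hypothetical response type: (HTTP status, page body or redirect target).
Response = Tuple[int, str]
Server = Callable[[str, str], Response]

# Assumption: the site detects crawlers by a substring of the user agent.
CRAWLER_TOKENS = ("googlebot",)


def ua_sniffing_server(path: str, user_agent: str) -> Response:
    """Scheme from the quoted message: crawlers get the page directly,
    everyone else is redirected to a versioned URL such as /my/page[24]."""
    if any(token in user_agent.lower() for token in CRAWLER_TOKENS):
        return (200, f"content of {path}")
    return (302, f"{path}[24]")


def uniform_server(path: str, user_agent: str) -> Response:
    """A site that ignores the user agent and treats every client alike."""
    return (200, f"content of {path}")


def looks_cloaked(server: Server, path: str) -> bool:
    """Fetch the same path once as a declared spider and once disguised
    as a browser; differing responses are what gets flagged."""
    declared = server(path, "Googlebot/2.1")
    disguised = server(path, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")
    return declared != disguised
```

Under these assumptions, looks_cloaked(ua_sniffing_server, "/my/page") is
True, while looks_cloaked(uniform_server, "/my/page") is False. The point is
that the check never needs to know the site's intent; it only sees that two
user agents got different answers.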

This behavior is known: spiders rarely announce that they are spiders, and
they often fake user agents and IPs to detect fraud. That you are not
committing any fraud does not matter to Google; they see your behaviour and
react according to their guidelines. Note that even JavaScript redirects
for different user agents can be treated as cloaking (I am referring to the
action Google took against bmw.de and some other very big companies some
months ago).

Regards,

Korbinian
