On Jan 27, 2010, at 9:47am, Claudio Martella wrote:

Hello,

I'm crawling our intranet site, and I see that the default configuration
normalizes URLs by removing '?', which means no queries. This basically
means you crawl only static data. Most of our table-based sites
handle paging with '?' and '=' queries, like 99% of the sites out there.
What is the rationale behind this choice, then?

If you don't exclude queries, you can get a huge number of URLs that point to the same content, but have different query parameters.

But I agree, these days many sites rely on query parameters to get to specific content, so having this normalization enabled by default is odd.
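
As a rough illustration of the duplicate-URL problem, and of a middle ground between dropping every query URL and keeping them all, here is a minimal Python sketch. It is not how the crawler itself does it; the parameter names in IGNORED_PARAMS and the example host are hypothetical. The idea is to strip only the parameters assumed not to affect content and sort the rest, so that paging parameters survive while equivalent URLs collapse to one form.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names assumed not to change the page content.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source"}

def canonicalize(url):
    """Return url with ignored query parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create distinct URLs
    # Drop the fragment as well; it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Two URLs that differ only in a session id map to the same canonical form,
# while a different page number stays distinct.
a = canonicalize("http://intranet.example/list?page=2&sid=abc")
b = canonicalize("http://intranet.example/list?sid=xyz&page=2")
c = canonicalize("http://intranet.example/list?page=3")
```

With a scheme like this, a and b are treated as one page and c as another, so paging still works without an explosion of near-duplicate URLs.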

-- Ken


Claudio Martella
Digital Technologies
Unit Research & Development - Analyst

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



