On Jan 27, 2010, at 9:47am, Claudio Martella wrote:
Hello,
I'm crawling our intranet site, and I see that the default configuration
normalizes URLs by removing '?', which means no queries. This basically
means that you crawl only static data. Most of our table-based sites
handle paging with '?' and '=' queries, like 99% of the sites out there.
What is the rationale behind this choice, then?
If you don't exclude queries, you can get a huge number of URLs that
point to the same content, but have different query parameters.
But I agree, these days many sites rely on query parameters to get to
specific content, so having this normalization enabled by default is
odd.
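
To make the duplicate-URL concern concrete, here is a rough Python sketch of a middle ground: keep the query string, but canonicalize it so that URLs differing only in parameter order or in content-neutral parameters collapse to one form. This is not the crawler's own normalizer; the function name `canonicalize`, the `IGNORED_PARAMS` set, and the example URLs are all made up for illustration, and which parameters are safe to drop depends entirely on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters assumed not to change the page content.
# A real list would have to be tuned per site.
IGNORED_PARAMS = {"sid", "sessionid", "utm_source", "utm_medium"}


def canonicalize(url):
    """Return the URL with a sorted, filtered query string and no fragment."""
    parts = urlsplit(url)
    # Keep blank values so a parameter like "flag=" is not silently lost.
    pairs = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    ]
    # Sorting makes "?a=1&b=2" and "?b=2&a=1" compare equal.
    pairs.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(pairs), "")
    )


if __name__ == "__main__":
    a = canonicalize("http://intranet.example/table?page=2&sid=abc")
    b = canonicalize("http://Intranet.example/table?sid=xyz&page=2")
    print(a)
    print(a == b)
```

With something along these lines, paging parameters such as `page` survive, while the two example URLs above reduce to the same string and would be fetched only once.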
-- Ken
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g