subject:"url normalization"

url normalization

2010-01-27 Thread Claudio Martella

Hello, I'm crawling our intranet site, and I see that the default configuration normalizes URLs by removing '?', which means no queries. This basically means you crawl only static data. Most of our table-based sites handle paging with '?' and '=' queries, like 99% of the sites out there. What is

Re: url normalization

2010-01-27 Thread Ken Krugler

On Jan 27, 2010, at 9:47am, Claudio Martella wrote: Hello, I'm crawling our intranet site, and I see that the default configuration normalizes URLs by removing '?', which means no queries. This basically means you crawl only static data. Most of our table-based sites are handled with

Re: url normalization

2010-01-27 Thread Claudio Martella

Ken Krugler wrote: On Jan 27, 2010, at 9:47am, Claudio Martella wrote: Hello, I'm crawling our intranet site, and I see that the default configuration normalizes URLs by removing '?', which means no queries. This basically means you crawl only static data. Most of our table-based sites

Re: url normalization

2010-01-27 Thread Jesse Hires

This also prevents things like over-indexing generated calendars, where the next day/month/year link will always produce output no matter how far it goes.

Jesse

int GetRandomNumber()
{
   return 4; // Chosen by fair roll of dice
             // Guaranteed to be random
} // xkcd.com

On Wed,

Re: url normalization

2010-01-27 Thread Claudio Martella

I do understand this problem. But then, how do you handle this? Avoiding queries completely is suicide. Google indexes queries. How do you think it can do it? Jesse Hires wrote: This also prevents things like over-indexing generated calendars, where the next day/month/year link will always

URL normalization ...

2009-03-22 Thread David M. Cole

-insensitive. I have Googled "Nutch URL normalization", but those postings seem to deal with issues such as http://my.domain.com:80/ vs. http://my.domain.com/ ... Any thoughts about how to resolve this (admittedly minor) problem would be appreciated. Thanks. \dmc
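
For readers unfamiliar with the port issue mentioned in this snippet, the http://my.domain.com:80/ vs. http://my.domain.com/ case can be illustrated with a small Python sketch. This is only an illustration built on the standard library, not Nutch's normalizer; the function name `drop_default_port` and the `DEFAULT_PORTS` table are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed table of default ports; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443}

def drop_default_port(url: str) -> str:
    """Lowercase the scheme and host and drop an explicit default port.

    Simplification: userinfo and IPv6 bracket syntax are not preserved.
    """
    parts = urlsplit(url)            # scheme comes back lowercased
    host = parts.hostname or ""      # hostname comes back lowercased
    port = parts.port                # None when no port was given
    netloc = host
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        netloc = f"{host}:{port}"    # keep non-default ports
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))

print(drop_default_port("http://my.domain.com:80/"))
```

Only the scheme and host are lowercased here; the path is left alone, because paths are case-sensitive on many servers.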

url normalization

2007-12-05 Thread Lyndon Maydwell

Is there a way to apply regex normalization to the URLs currently in the database? E.g., I would like to make www.asdf.com equivalent to asdf.com.
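
The kind of rewrite asked about here can be pictured as a single regex substitution applied to each URL. The following is a minimal Python sketch of that idea, not Nutch's own mechanism and not a way to update an existing crawl database; the name `strip_www` and the pattern are hypothetical.

```python
import re

# Hypothetical pattern: an optional http(s) scheme followed by a leading "www."
_WWW = re.compile(r"^(https?://)?www\.", re.IGNORECASE)

def strip_www(url: str) -> str:
    """Remove a leading 'www.' host label, keeping the scheme if present."""
    return _WWW.sub(lambda m: m.group(1) or "", url, count=1)

print(strip_www("www.asdf.com"))
print(strip_www("http://www.asdf.com/page"))
```

Treating the two hosts as equivalent is a judgment call: it assumes both names serve the same content, which is common but not guaranteed.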

7 matches