Hi,

We use an adaptive scheduler for our crawl. This works fine for most cases, but 
a specific type of page is crawled more often than it should be. These are 
usually news or article archives such as news/archive/12345. Most websites 
generate these pages dynamically. The problem is that whenever a new item is 
posted, all news/archive/* pages become modified: every article or item shifts 
one position, which changes thousands of URLs.
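
To make the shift concrete, here is a minimal sketch in Python. The page size, 
the item names and the helper functions are all made up for illustration and 
are not taken from any particular crawler. It only assumes that the archive 
lists items newest first and that a page counts as modified when its content 
hash changes.

```python
import hashlib

PAGE_SIZE = 3  # assumed page size, purely illustrative


def paginate(items, page_size=PAGE_SIZE):
    """Split a newest-first list of items into archive pages."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def page_signatures(items):
    """Return one content hash per archive page (a stand-in for a crawler signature)."""
    return [
        hashlib.md5("\n".join(page).encode("utf-8")).hexdigest()
        for page in paginate(items)
    ]


def changed_pages(before, after):
    """Count pages whose signature differs, plus pages that were added or removed."""
    changed = sum(1 for old, new in zip(before, after) if old != new)
    return changed + abs(len(after) - len(before))


old_items = [f"article-{n}" for n in range(9, 0, -1)]  # article-9 ... article-1
new_items = ["article-10"] + old_items                # one new post on top

before = page_signatures(old_items)
after = page_signatures(new_items)
print(changed_pages(before, after), "of", len(after), "pages changed")
```

With these made-up numbers a single new post changes every archive page, 
even though only one item is actually new.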

The problem this poses for adaptive scheduling should be obvious by now: every 
fetch finds these pages modified, so their fetch interval keeps shrinking. I 
have given it some thought over the past few weeks, but I haven't figured out a 
generic solution just yet, so any advice or out-of-the-box ideas are very much 
appreciated!

Thanks
Markus
