Hi,

We use an adaptive scheduler for our crawl. This works fine in most cases, but one specific type of page is crawled more often than it should be: news or article archives such as news/archive/12345. Most websites generate these pages dynamically. Whenever a new item is posted, every article or item shifts one position, so all news/archive/* pages count as modified and thousands of URLs change at once.
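To make the effect concrete, here is a minimal sketch in Python. The page size, the content signature and the interval rule (`next_interval` with its `dec_rate` and `inc_rate` parameters) are assumptions chosen for illustration, not the scheduler actually in use. It only shows that one new item changes the signature of every existing archive page, which a change-driven scheduler would read as "crawl these more often".

```python
import hashlib

PAGE_SIZE = 10  # assumed number of items per archive page


def paginate(items, page_size=PAGE_SIZE):
    """Split a newest-first item list into archive pages."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def signature(page):
    """Content signature a crawler might use to detect modification."""
    return hashlib.sha256("\n".join(page).encode("utf-8")).hexdigest()


def changed_pages(old_items, new_items):
    """Indexes of archive pages whose signature differs between two crawls."""
    old = [signature(p) for p in paginate(old_items)]
    new = [signature(p) for p in paginate(new_items)]
    # Only compare pages that exist in both crawls.
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


def next_interval(interval, modified, dec_rate=0.2, inc_rate=0.4,
                  min_interval=60.0, max_interval=30 * 86400.0):
    """Hypothetical adaptive rule: shrink the interval after a change,
    grow it otherwise, clamped to [min_interval, max_interval]."""
    factor = (1.0 - dec_rate) if modified else (1.0 + inc_rate)
    return max(min_interval, min(max_interval, interval * factor))


items = [f"article-{n}" for n in range(50, 0, -1)]  # newest first, 5 pages
after_post = ["article-51"] + items                 # one new item is posted

print(changed_pages(items, after_post))  # → [0, 1, 2, 3, 4]
```

Under these assumptions every existing page is reported as modified after a single post, so a rule like `next_interval` keeps shrinking the fetch interval for all of them, even though only one genuinely new article exists.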
The problem this poses for adaptive scheduling should be obvious by now. I have given it some thought over the past few weeks, but I haven't figured out a generic solution yet, so any advice or out-of-the-box ideas are very much appreciated!

Thanks,
Markus

