Any ideas?
-----Original message-----
> From: Markus Jelsma <[email protected]>
> Sent: Mon 02-Jul-2012 23:05
> To: [email protected]
> Subject: Adaptive scheduling, but different
>
> Hi,
>
> We use an adaptive scheduler for our crawl. This works fine for most cases,
> but a specific type of page is crawled more often than it should be. These
> are usually news or article archives such as news/archive/12345. Most
> websites generate these pages dynamically. The problem is that whenever a
> new item is posted, all news/archive/* pages become modified: every article
> or item shifts one position and changes thousands of URLs.
>
> The problem of adaptive scheduling for these pages should be obvious by now.
> I have given it some thought over the past few weeks, but I haven't figured
> out a generic solution just yet, so any advice or out-of-the-box ideas are
> very much appreciated!
>
> Thanks
> Markus
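
To make the problem concrete, here is a minimal sketch of how an adaptive interval typically behaves on such pages. It is only an illustration under assumptions: the function names, the rates (0.4 and 0.2) and the interval bounds are hypothetical and are not taken from our actual scheduler configuration. The idea is simply that the interval shrinks on every observed modification and grows otherwise.

```python
# Hypothetical adaptive rule: shrink the interval when the page changed,
# grow it when it did not, and clamp to [min_interval, max_interval].
# All names and default values here are illustrative assumptions.

def next_interval(interval, modified, inc_rate=0.4, dec_rate=0.2,
                  min_interval=60.0, max_interval=2592000.0):
    """Return the next fetch interval in seconds."""
    if modified:
        interval *= (1.0 - dec_rate)
    else:
        interval *= (1.0 + inc_rate)
    return max(min_interval, min(max_interval, interval))


def simulate(observations, start=86400.0):
    """Apply the rule once per fetch; observations is a list of booleans."""
    interval = start
    for modified in observations:
        interval = next_interval(interval, modified)
    return interval


# An archive page looks modified on every fetch because items shift by one.
archive = simulate([True] * 50)
# A single article page rarely changes after publication.
article = simulate([False] * 10)

print("archive page interval:", archive)
print("article page interval:", article)
```

With these assumed settings the archive page is driven down to the minimum interval and stays there, while the unchanged article drifts toward longer intervals, which is the opposite of what we want for archive pages.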

