By default, Nutch only crawls the first 100 outlinks on a page. Maybe that's your problem?
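To illustrate why a per-page outlink cap can hide pages, here is a small simulation. It is not Nutch code. The toy link graph, the `crawl` helper, and the round-by-round model are hypothetical and only mimic the behaviour described in this thread. In later Nutch releases the cap appears to be set by the `db.max.outlinks.per.page` property, but treat that name as an assumption and check the nutch-default.xml that ships with your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OutlinkCapDemo {

    // Hypothetical toy link graph: one "hub" page linking to n leaf pages.
    static Map<String, List<String>> hubWithLinks(int n) {
        List<String> links = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            links.add("page" + i);
        }
        Map<String, List<String>> web = new HashMap<>();
        web.put("hub", links);
        return web;
    }

    // Simplified round-based crawl (not Nutch's implementation): each round
    // fetches only URLs that are already known but not yet fetched, and keeps
    // at most maxOutlinks links from each fetched page.
    static Set<String> crawl(Map<String, List<String>> web, String seed,
                             int rounds, int maxOutlinks) {
        Set<String> known = new LinkedHashSet<>();
        Set<String> fetched = new HashSet<>();
        known.add(seed);
        for (int r = 0; r < rounds; r++) {
            // Snapshot the fetch list first so new links wait for the next round.
            List<String> toFetch = new ArrayList<>();
            for (String url : known) {
                if (!fetched.contains(url)) {
                    toFetch.add(url);
                }
            }
            for (String url : toFetch) {
                fetched.add(url);
                List<String> outlinks = web.getOrDefault(url, List.of());
                // Links beyond the cap are silently dropped.
                known.addAll(outlinks.subList(0, Math.min(maxOutlinks, outlinks.size())));
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = hubWithLinks(150);
        System.out.println(crawl(web, "hub", 2, 100).size());               // 101
        System.out.println(crawl(web, "hub", 2, Integer.MAX_VALUE).size()); // 151
        System.out.println(crawl(web, "hub", 1, 100).size());               // 1
    }
}
```

With a cap of 100, fifty of the hub's links are never discovered, no matter how many rounds you run. With only one round, nothing beyond the seed is fetched at all.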
On 4/14/05, Matthias Jaekle <[EMAIL PROTECTED]> wrote:
> > try
> > +^http://news.buaa.edu.cn/*
> This should not be the reason.
> Your regex matches URLs starting with:
> http://news.buaa.edu.cn
> http://news.buaa.edu.cn/
> http://news.buaa.edu.cn//
> http://news.buaa.edu.cn/// ...
>
> The only thing I would try is to escape some characters to make it more
> correct. A dot matches any character. Better:
> +^http:\/\/news\.buaa\.edu\.cn
>
> Did you run enough rounds to reach the depth you want?
> Each round only fetches links that are already known.
>
> Matthias
>
> --
> http://www.eventax.com - eventax GmbH
> http://www.umkreisfinder.de - Die Suchmaschine für Lokales und Events

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
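A quick way to check the two patterns quoted above is to run them through `java.util.regex`. This is a sketch, not Nutch's filter code. It assumes the filter tests each URL with substring-style matching (`find()`), so a pattern anchored only with `^` accepts any URL that starts with a match. The leading `+`, which marks an include rule in the filter file, is left out. The class name and test URLs are made up for illustration.

```java
import java.util.regex.Pattern;

public class UrlFilterDemo {
    // Unescaped pattern from the thread: '.' matches any character and
    // '/*' matches zero or more slashes.
    static final Pattern LOOSE = Pattern.compile("^http://news.buaa.edu.cn/*");
    // Escaped version from the thread: dots (and slashes) are literal.
    static final Pattern ESCAPED = Pattern.compile("^http:\\/\\/news\\.buaa\\.edu\\.cn");

    // Assumption: the URL filter uses find(), so the pattern only needs to
    // match a prefix of the URL (because of '^'), not the whole URL.
    static boolean accepts(Pattern p, String url) {
        return p.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://news.buaa.edu.cn/",
            "http://news.buaa.edu.cn/a/b/page.html",
            "http://newsXbuaa.edu.cn/",   // accepted only by LOOSE: '.' matched 'X'
            "http://www.buaa.edu.cn/"     // rejected by both
        };
        for (String url : urls) {
            System.out.printf("%-40s loose=%b escaped=%b%n",
                url, accepts(LOOSE, url), accepts(ESCAPED, url));
        }
    }
}
```

Both patterns accept every page under news.buaa.edu.cn, which supports the point that the filter is unlikely to be what limits the crawl. Escaping only stops the dots from matching arbitrary characters.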
