Hi everybody,

I'm really new to Nutch. I've read through the documentation and many months of mailing list archives, and I don't think this question has been answered.
I have two tasks I would like Nutch to handle.

First, I would like it to crawl and index ONLY a specific set of URLs. This is a stronger limitation than confining the crawl to specific sites (so db.ignore.external.links is insufficient): it should not follow ANY links on the pages in the list.

Second, after creating the crawl and index of those sites, I would like to occasionally add SINGLE URLs to the index.

Is this possible? If so, is it trivially possible with something like '-topN 0' (or should that be '-topN 1')? Or could I create a single local web page with all the links on it and run the crawler with '-depth 1'?

Apologies if this is an overasked or misguided question; if so, I'd appreciate pointers to the appropriate documentation or code so I can figure it out on my own.

Thanks!
-k7
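A minimal sketch of one way to express the first requirement, assuming the regex URL filter is active (crawl-urlfilter.txt for the one-step crawl command, regex-urlfilter.txt otherwise; check the file names for your Nutch version). It generates rules that accept each listed URL exactly and reject everything else, so outlinks discovered on fetched pages are filtered out. The helper name `exact_url_filter` and the example URLs are hypothetical, and this has not been tested against a real Nutch install.

```python
import re


def exact_url_filter(urls):
    """Return URL-filter lines that accept only the given URLs.

    Assumption: Nutch regex-filter semantics, where each rule line is '+'
    (accept) or '-' (reject) followed by a Java regex, lines starting with
    '#' are comments, and the first matching rule wins.
    """
    lines = ["# Accept exactly these URLs and nothing else (generated)"]
    for url in urls:
        # re.escape makes '.', '?', etc. literal; the escapes it emits are
        # also valid in Java regex syntax. '^' and '$' anchor the whole URL.
        lines.append("+^" + re.escape(url) + "$")
    # Catch-all: reject every other URL, including discovered outlinks.
    lines.append("-.")
    return lines


if __name__ == "__main__":
    seeds = ["http://www.example.com/a.html", "http://www.example.org/b?id=1"]
    print("\n".join(exact_url_filter(seeds)))
```

Rule order matters here: the `+` lines must come before the trailing `-.`, which mirrors the "skip everything else" rule at the end of the stock crawl-urlfilter.txt.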