Jared Dunne wrote:
> Can someone explain the motivation for the nutch crawl command to use a
> different file, namely crawl-urlfilter.txt, instead of just using
> regex-urlfilter.txt?
>
> Seems unnecessary, but I'm assuming there's a compelling reason.

Originally, the 'crawl' command was intended for simple one-step intranet crawling. The URL filters for that case would be very different from those for Internet-wide crawling, so a separate config file seemed appropriate.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
