Hello Steve,

I think I can explain more.

regex-urlfilter.txt is used by the RegexURLFilter plugin, while crawl-urlfilter.txt is used by the CrawlTool (configured through crawl-tool.xml). So you do not need to rename anything: which file is read depends on which tool you run.

I hope that makes it clear.

Best regards,
/Jack

======= At 2005-03-31, 17:51:21 you wrote: =======

>Olaf, Cheers, I am still confused though.
>How does nutch know which of the two to use,
>that is, how do I tell nutch if its doing intranet
>or internet? Do I rename regex-urlfilter.txt
>to crawl-urlfilter.txt to if I want to do internet crawls?
>Steve
>
>
>-----Original Message-----
>From: Olaf Thiele [mailto:[EMAIL PROTECTED]
>Sent: Thursday, March 31, 2005 4:40 PM
>To: [email protected]
>Subject: Re: What's the difference between crawl-urlfilter.txt and
>regex-urlfilter.txt?
>
>
>Hi Steve,
>the crawl-urlfilter is for intranet crawling while regex-urlfilter is
>for internet crawling.
>
>Kind regards,
>Olaf
>
>
>On Thu, 31 Mar 2005 12:01:19 +0800, Steve Follmer <[EMAIL PROTECTED]>
>wrote:
>>
>> What's the difference between crawl-urlfilter.txt and
>> regex-urlfilter.txt? They look very similar. Why does nutch have both,
>> and what do they do different?
>>
>> My best guess is that the first is used only by the crawl tool and the
>> second is used by nutch proper. The crawl tool and nutch proper seem
>> to also have separate .xml config files. I further guess that this is
>> just an artifact of having two separate tools that need separate but
>> equal configuration?
>>
>> -Poindexter
>>
>
>--
>
><SimpleHuman gender="male">
>   <Physical name="Olaf Thiele" />
>   <Virtual adress="http://www.olafthiele.de" />
></SimpleHuman>
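
P.S. In case it helps to see why the two files look so similar: as far as I understand it, both use the same rule format (one rule per line, a leading + or - followed by a regular expression, first matching rule decides, no match means the URL is rejected). Only the rules differ, and if I remember correctly crawl-tool.xml simply points the filter at crawl-urlfilter.txt through the urlfilter.regex.file property, so please check your own conf directory. Below is a rough Python sketch of that evaluation, not the plugin's actual code. The function names, the example.com domain, and both rule sets are made up for illustration and are not copies of the shipped files.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        # Store (allow?, compiled pattern) in file order.
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose pattern is found in the URL decides.

    A URL that matches no rule is rejected (assumed behaviour).
    """
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Hypothetical intranet-style rules: accept one domain, reject the rest.
CRAWL_STYLE = r"""
-^(file|ftp|mailto):
+^http://([a-z0-9]*\.)*example\.com/
-.
"""

# Hypothetical internet-style rules: reject a few schemes, accept the rest.
REGEX_STYLE = r"""
-^(file|ftp|mailto):
+.
"""

crawl_rules = parse_rules(CRAWL_STYLE)
regex_rules = parse_rules(REGEX_STYLE)
```

With these made-up rules, accepts(crawl_rules, url) only lets through URLs under example.com, while accepts(regex_rules, url) lets through any URL that is not file:, ftp: or mailto:. That is the intranet versus internet distinction Olaf described, expressed purely through the contents of the two files.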
