Hi,

I resolved the Eclipse crawling problem by deleting nutch-site.xml and
nutch-default.xml from the nutch-0.9.jar file. Hope this helps.

-Bala

Mark J. Hoy wrote:
>
> Volkan -
>
> You need to remove the comment (#) from the line:
>
> #+^http://([a-z0-9]*\.)*sabah.com/
>
> to allow it to crawl the sabah.com domain. You can keep the "-." line at
> the bottom, as Nutch processes the restrictions in the order they are
> found.
>
> Volkan Ebil wrote:
>> OK, I'll post it, but there is no problem without Eclipse.
>> Thanks for your interest.
>>
>> -----Original Message-----
>> From: Christoph M. Pflügler
>> [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, January 17, 2008 3:04 PM
>> To: [email protected]
>> Subject: RE: Eclipse-Crawl Problem
>>
>> I just saw that you only changed the one line in urlfilter.txt you
>> described.
>>
>> So I suppose it still contains the "-." line. If so, try it without that
>> line; this might solve your problem.
>>
>> Chris
>>
>> On Thursday, 17.01.2008, at 14:20 +0200, Volkan Ebil wrote:
>>
>>> Yes, I know how to start the crawl process. I have created the URL
>>> text file in the specified folder. The problem occurs in the Eclipse
>>> environment.
>>> Does anybody know something about my problem?
>>> Thanks.
>>>
>>> -----Original Message-----
>>> From: Christoph M. Pflügler
>>> [mailto:[EMAIL PROTECTED]
>>> Sent: Thursday, January 17, 2008 12:44 PM
>>> To: [email protected]
>>> Subject: Re: Eclipse-Crawl Problem
>>>
>>> Hey Volkan,
>>>
>>> did you specify any seed URLs in an arbitrary file in the folder you
>>> pass to Nutch with the parameter -urls? This is necessary to give
>>> Nutch some point(s) to start the crawl from.
>>>
>>> Greets,
>>> Christoph
>>>
>>> On Thursday, 17.01.2008, at 12:27 +0200, Volkan Ebil wrote:
>>>
>>>> I configured Eclipse following the RunNutchInEclipse0.9 document.
>>>> But when I give the arguments to Eclipse and run the project, it
>>>> gives the message "No URLs to fetch - check your seed list and URL
>>>> filters".
>>>> I have changed the line in the crawl-urlfilter file
>>>> +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
>>>> to
>>>> +.
>>>> as was suggested before.
>>>> But it didn't solve my problem.
>>>> Thanks for your help.
>>>>
>>>> Volkan.

--
View this message in context: http://www.nabble.com/Eclipse-Crawl-Problem-tp14916065p17593974.html
Sent from the Nutch - User mailing list archive at Nabble.com.
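
For anyone reading this thread later, Mark's point about rule order can be
illustrated with a small Python sketch. This is not Nutch code. It assumes
the behaviour described in the thread (rules are tried top to bottom, the
first rule that matches decides, lines starting with # are ignored), and it
additionally assumes that a URL matching no rule is rejected and that a
pattern may match anywhere in the URL. The names parse_rules and accepts are
made up for this illustration.

```python
import re


def parse_rules(text):
    """Parse filter lines into (accept, regex) pairs, keeping file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments define no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose pattern is found in the URL decides the outcome."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False  # assumption: a URL that matches no rule is rejected


# The filter as posted: the sabah.com line is still commented out.
COMMENTED = r"""
#+^http://([a-z0-9]*\.)*sabah.com/
-.
"""

# The filter after removing the leading # as Mark suggested.
UNCOMMENTED = r"""
+^http://([a-z0-9]*\.)*sabah.com/
-.
"""

if __name__ == "__main__":
    print(accepts(parse_rules(COMMENTED), "http://www.sabah.com/"))    # False
    print(accepts(parse_rules(UNCOMMENTED), "http://www.sabah.com/"))  # True
    print(accepts(parse_rules(UNCOMMENTED), "http://example.org/"))    # False
```

Under these assumptions, the commented-out version leaves only the "-." rule,
so every URL is rejected, which would be consistent with the "No URLs to
fetch" message. With the + line active and placed first, sabah.com URLs are
accepted before "-." is ever reached.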
