Volkan -

You need to remove the comment character (#) from the line:

#+^http://([a-z0-9]*\.)*sabah.com/

so that the crawler is allowed onto the sabah.com domain. You can keep the "-."
line at the bottom, since Nutch processes the filter rules in the order they
appear and uses the first one that matches.
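
In other words, the relevant part of conf/crawl-urlfilter.txt should end up
looking roughly like this (the surrounding default rules may differ slightly
in your copy):

# accept URLs on the sabah.com domain
+^http://([a-z0-9]*\.)*sabah.com/

# skip everything else
-.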




Volkan Ebil wrote:
Ok, I'll post it, but there is no problem without Eclipse.
Thanks for your interest.

-----Original Message-----
From: Christoph M. Pflügler [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 17, 2008 3:04 PM
To: [email protected]
Subject: RE: Eclipse-Crawl Problem

I just saw that you only changed the one line in urlfilter.txt you
described.

So I suppose it still contains the "-." line. If so, try it without that
line; this might solve your problem.

Chris

On Thursday, 17.01.2008, 14:20 +0200, Volkan Ebil wrote:
Yes, I know how to start the crawl process. I have created the URL txt file
in the specified folder. The problem occurs in the Eclipse environment.
Does anybody know something about my problem?
Thanks.

-----Original Message-----
From: Christoph M. Pflügler [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 17, 2008 12:44 PM
To: [email protected]
Subject: Re: Eclipse-Crawl Problem

Hey Volkan,

did you specify any seed URLs in a file inside the folder you pass to Nutch
with the -urls parameter? This is necessary to give Nutch some point(s)
to start off the crawl with.
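
For example (just a sketch; the folder and file names here are placeholders),
a seed file such as urls/seeds.txt containing the single line

http://www.sabah.com/

is enough. From the command line the crawl would then be started with
something like

bin/nutch crawl urls -dir crawl -depth 3 -topN 50

and in Eclipse the same values go into the program arguments of the Crawl
class, e.g. "urls -dir crawl -depth 3 -topN 50".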


Greets,
Christoph
On Thursday, 17.01.2008, 12:27 +0200, Volkan Ebil wrote:
I configured Eclipse following the RunNutchInEclipse0.9 document. But when I
give the arguments to Eclipse and run the project, it gives "No URLs to fetch
- check your seed list and URL filters".
I have changed the line in crawl-urlfilter.txt from

+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

to

+.

as was suggested before, but it didn't solve my problem.
Thanks for your help.
Volkan.

