Volkan -

You need to remove the comment character (#) from the line:

#+^http://([a-z0-9]*\.)*sabah.com/

so that the crawler is allowed onto the sabah.com domain. You can keep the "-."
line at the bottom, since Nutch processes the filter rules in the order they
appear and uses the first one that matches.
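
In other words, the relevant part of conf/crawl-urlfilter.txt should end up
looking roughly like this (the surrounding default rules may differ slightly
in your copy):

# accept URLs on the sabah.com domain
+^http://([a-z0-9]*\.)*sabah.com/

# skip everything else
-.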




Volkan Ebil wrote:
Ok, I'll post it, but there is no problem without Eclipse.
Thanks for your interest.

-----Original Message-----
From: Christoph M. Pflügler [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 17, 2008 3:04 PM
To: [email protected]
Subject: RE: Eclipse-Crawl Problem

I just saw that you only changed the one line in urlfilter.txt you
described.

So I suppose it still contains the "-." line. If so, try it without that
line; this might solve your problem.

Chris

On Thursday, 17.01.2008, 14:20 +0200, Volkan Ebil wrote:
Yes, I know how to start the crawl process. I have created the URL txt file
in the specified folder. The problem occurs in the Eclipse environment.
Does anybody know something about my problem?
Thanks.

-----Original Message-----
From: Christoph M. Pflügler [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 17, 2008 12:44 PM
To: [email protected]
Subject: Re: Eclipse-Crawl Problem

Hey Volkan,

did you specify any seed URLs in a file inside the folder you pass to Nutch
with the -urls parameter? This is necessary to give Nutch some point(s)
to start off the crawl with.
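
For example (just a sketch; the folder and file names here are placeholders),
a seed file such as urls/seeds.txt containing the single line

http://www.sabah.com/

is enough. From the command line the crawl would then be started with
something like

bin/nutch crawl urls -dir crawl -depth 3 -topN 50

and in Eclipse the same values go into the program arguments of the Crawl
class, e.g. "urls -dir crawl -depth 3 -topN 50".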


Greets,
Christoph
On Thursday, 17.01.2008, 12:27 +0200, Volkan Ebil wrote:
I configured Eclipse following the RunNutchInEclipse0.9 document. But when I
give the arguments to Eclipse and run the project, it gives "No URLs to fetch
- check your seed list and URL filters".
I have changed the line in crawl-urlfilter.txt from

+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

to

+.

as was suggested before, but it didn't solve my problem.
Thanks for your help.
Volkan.

