Re: Using nutch for niche/country specific TLD

Vlad Berditchevskiy Mon, 07 Nov 2005 07:03:17 -0800

"Insurance Squared Inc." <[EMAIL PROTECTED]> writes:

> - Can nutch crawl only specific TLDs (e.g. .it or .uk.com)?  My 
> suspicion is that I could easily modify nutch to do this.


You could use regex-urlfilter. Put something like this in
conf/regex-urlfilter.txt:

+^http://.*\.tld/

Don't forget to remove the "+." line at the end of the file, otherwise
every URL that doesn't match an earlier rule is still accepted.
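
If you want to try a pattern before starting a crawl, here is a rough
Python sketch of how I understand the filter to behave: rules are tried
top to bottom, the first pattern found in the URL decides, and a URL
matching no rule is rejected. The names RULES and accepts, and the ".it"
TLD, are made up for illustration; this is not Nutch code.

```python
import re

# Hypothetical rule list standing in for conf/regex-urlfilter.txt.
# Each rule is (sign, pattern); ".it" is only an example TLD.
RULES = [
    ("+", re.compile(r"^http://.*\.it/")),
]

# A tighter variant (an assumption, not from the original mail):
# [^/]+ keeps the match inside the host part of the URL.
STRICT_RULES = [
    ("+", re.compile(r"^http://[^/]+\.it/")),
]


def accepts(url, rules=RULES):
    """Return True if the first rule found in the URL is a '+' rule.

    A URL that matches no rule at all is rejected, which is why a
    trailing "+." rule would let everything through.
    """
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    return False
```

Note that ".*" can also run past the host name, so with RULES a URL such
as http://www.example.com/mirror.it/ is accepted too, while STRICT_RULES
rejects it. Whether that matters depends on how clean you need the crawl
to be.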

> - Can I run crawlers on two separate machines, then merge the results 
> for search?  I'm guessing yes, just looking for confirmation.

Yes.

> - If I only use a specific TLD, I think I would need a 'submit your 
> site' function.  Does nutch do this?  I didn't see it in our install, 
> wondering if it's a common practice.

AFAIK you have to write such a function yourself (unless someone already
did it). But it should be pretty simple: just inject the submitted URL
(maybe after a sanity check).
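
A minimal sketch of what such a sanity check might look like, in Python.
Everything here is an assumption for illustration: the function names,
the ".it" suffix, and the idea of collecting accepted URLs in a plain
text seed file that you later hand to Nutch's inject step.

```python
from urllib.parse import urlsplit

ALLOWED_SUFFIX = ".it"  # assumption: the TLD the crawl is limited to


def sanitize_submission(raw_url, allowed_suffix=ALLOWED_SUFFIX):
    """Return a normalized URL if the submission looks acceptable, else None.

    Only plain http URLs whose host ends with the allowed suffix pass.
    Port, query string and fragment are dropped in this sketch.
    """
    try:
        parts = urlsplit(raw_url.strip())
    except ValueError:
        return None  # malformed input, e.g. a broken IPv6 literal
    if parts.scheme != "http":
        return None
    host = (parts.hostname or "").lower()
    # Require something in front of the suffix, not just ".it" itself.
    if not host.endswith(allowed_suffix) or len(host) <= len(allowed_suffix):
        return None
    path = parts.path or "/"
    return f"http://{host}{path}"


def append_to_seed_file(url, seed_path):
    """Append one accepted URL to a seed file, one URL per line."""
    with open(seed_path, "a", encoding="utf-8") as seed_file:
        seed_file.write(url + "\n")
```

A form handler would call sanitize_submission on the user input and only
pass non-None results to append_to_seed_file. You would probably also
want duplicate detection and some rate limiting before trusting it on a
public site.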


-- 
\  /                                       [EMAIL PROTECTED]
 \/lad                                     http://www.hashbang.de
