"Insurance Squared Inc." <[EMAIL PROTECTED]> writes:

> - Can nutch only crawl specific TLD's? (i.e. like .it, or .uk.com). My
> suspicion is that I could easily modify nutch to do this.

You could use regex-urlfilter. Put something like this in
conf/regex-urlfilter.txt:

+^http://.*\.tld/

Don't forget to remove the "+." line.

> - Can I run crawlers on two separate machines, then merge the results
> for search? I'm guessing yes, just looking for confirmation.

Yes.

> - If I only use a specific TLD, I think I would need a 'submit your
> site' function. Does nutch do this? I didn't see it in our install,
> wondering if it's a common practice.

AFAIK you have to write such a function yourself (unless someone
already did it). But it should be pretty simple, just inject the
submitted URL (maybe after a sanity check).

-- 
\ / [EMAIL PROTECTED]
\/lad http://www.hashbang.de
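
To illustrate why the "+." line has to go, here is a rough Python sketch of how I understand the regex-urlfilter rules to be evaluated: rules are tried top to bottom, the first pattern found in the URL decides (+ accepts, - rejects), and a URL that matches nothing is dropped. This is not Nutch code; parse_rules and accepts are made-up names, and ".it" stands in for the "tld" placeholder above.

```python
import re


def parse_rules(text):
    """Parse regex-urlfilter style lines into (accept, compiled_pattern) pairs.

    Hypothetical helper, not part of Nutch. Blank lines and lines
    starting with '#' are skipped.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in url.

    Assumption: first match wins and an unmatched URL is rejected.
    """
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# Only the TLD rule: URLs outside .it fall through and are rejected.
RULES = parse_rules(r"""
# accept only http URLs on hosts ending in .it
+^http://.*\.it/
""")

# Same rule with the catch-all "+." left in: everything else still gets in.
RULES_WITH_CATCH_ALL = parse_rules(r"""
+^http://.*\.it/
+.
""")

if __name__ == "__main__":
    for url in ("http://www.example.it/index.html", "http://www.example.com/"):
        print(url, accepts(RULES, url), accepts(RULES_WITH_CATCH_ALL, url))
```

Note that ".*" also crosses slashes, so the pattern would accept a URL on another host that merely has ".it/" somewhere in its path; something like +^http://[^/]+\.it/ is stricter. A check along these lines could also serve as the sanity check before injecting a submitted URL.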
