Hi,

We're barely past the install stage with Nutch, so I'd like to ask the more experienced users a few general questions before I jump in with both feet.

I'm thinking about creating a country-specific (by TLD) search engine.

- Can Nutch crawl only specific TLDs (e.g. .it or .uk.com)? My suspicion is that I could easily modify Nutch to do this.
- Can I run crawlers on two separate machines and then merge the results for search? I'm guessing yes; I'm just looking for confirmation.
- If I only use a specific TLD, I think I would need a "submit your site" function. Does Nutch provide this? I didn't see it in our install, and I'm wondering whether it's common practice.
- In the future I think I'd want to branch out to other TLDs while keeping the results country-specific (i.e. .com sites that are relevant to the country). I'm guessing this is a largish project that would require substantial changes to the ranking algorithm to score a site's "country-specificness"?
- I'm also considering hand-editing the crawl. Is this reasonably possible? That is, I unleash the crawler on a seed set of sites, and then I have to hand-approve any further sites the crawler finds from there. Actually, I guess that's a double question: first, is it currently technically possible, and second, am I an idiot for even thinking of such a task? :)

Thanks. I'm trying to get a handle on things I might run into before I get too far into this. I'm confident I can make minor tweaks if needed, but some of the above seem to need heavy-duty work if they aren't already available, perhaps more than I can take on for what I'm treating as my next hobby.
Thanks!
