Jimmy Forrester wrote:

1. How tricky is it to get nutch running for a server newbie like myself?
If you are familiar with Linux and configuration work (editors, config files...), you'll be just fine.

2. What's Nutch like for limiting the types of sites that get crawled?
Nutch can be fully customized; the catch is that you'll have to do the customization yourself, sometimes site by site. This will be the most time-consuming part.
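Much of that customization is done with regex URL-filter rules: each line is a regular expression prefixed with `+` (include) or `-` (exclude), the first matching pattern decides, and a URL that matches no pattern is ignored. The Python below is only a sketch that mimics those semantics so you can test rules before a crawl; it is not Nutch code, and the rule list and the `example.org` domain are hypothetical.

```python
import re

# Hypothetical rules modelled on "+regex" / "-regex" URL-filter lines.
# Order matters: the first rule whose pattern matches decides.
RULES = [
    ("-", r"^(file|ftp|mailto):"),                     # skip non-HTTP schemes
    ("-", r"\.(gif|jpg|png|css|js|zip|exe|mp3)$"),     # skip binary/junk files (case-sensitive here)
    ("-", r"[?*!@=]"),                                 # skip query strings and session-ID style URLs
    ("+", r"^https?://([a-z0-9-]+\.)*example\.org/"),  # hypothetical: only crawl this domain
]


def accept(url, rules=RULES):
    """Return True if the first matching rule is '+', False otherwise."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: the URL is ignored


for u in ["http://www.example.org/index.html",
          "http://www.example.org/logo.png",
          "http://www.example.org/search?q=x",
          "http://other.net/"]:
    print(u, accept(u))
```

Putting the `-` rules before the final `+` rule is what lets junk be dropped even inside the domain you want; reverse the order and the domain rule would accept everything first.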

3. I'm building my search engine as a hobby - will I need to purchase a dedicated server to run Nutch? (I so can't afford that.) Or does anyone know a good cheap hosting company that can definitely get Nutch up and running?
For a small index like yours, you don't need a dedicated server. I can refer you to the person I rented a VPS of sorts from ($22 a month, but I got a special offer, so you might pay more); tell me if you are interested. However, I didn't do the crawling from his server; I just used it to store the index database (a few gigs and several hours of uploading, but it was worth the wait). The last time I used his server I uploaded almost a million pages and it worked without a glitch.

For starters, do the crawling from home, so you get used to Nutch, the types of errors, the configuration, etc. All of this is a bit more complicated if you have to do it remotely, which can be slightly discouraging. On a 1.5 Mbps home DSL line, you can get maybe half a million pages a day if you tweak your regexes enough to skip all kinds of junk. Of course, this depends on what you want to crawl in the first place, the types of hosts, etc.

Also, use a firewall: sometimes a bunch of hosts will try to ping you back on a bunch of ports as soon as you hit them, and will keep pinging you on and on. A quick change of IP helps in this case.
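A quick sanity check on the half-a-million-pages figure, as a sketch: it assumes the 1.5 Mbps line is saturated around the clock and ignores protocol overhead, DNS lookups and retries, so the real per-page budget is lower. Under those assumptions the line moves about 16.2 GB a day, which leaves roughly 32 KB per page on average; that is why filtering out large binary "junk" matters so much.

```python
LINE_BITS_PER_SEC = 1_500_000  # 1.5 Mbps DSL, as in the post
SECONDS_PER_DAY = 86_400
PAGES_PER_DAY = 500_000        # the "maybe half a million pages a day" estimate


def bytes_per_day(bits_per_sec, seconds=SECONDS_PER_DAY):
    """Bytes a fully saturated line can transfer in `seconds`."""
    return bits_per_sec // 8 * seconds


def avg_page_budget(bits_per_sec, pages_per_day):
    """Average bytes per page the line allows at the given page rate."""
    return bytes_per_day(bits_per_sec) / pages_per_day


print(bytes_per_day(LINE_BITS_PER_SEC))                   # 16200000000 bytes per day
print(avg_page_budget(LINE_BITS_PER_SEC, PAGES_PER_DAY))  # 32400.0 bytes per page
```

The same numbers also mean fetching about 5.8 pages per second on average (500,000 / 86,400), which gives a rough target when tuning the number of fetcher threads.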

4. Is my own search engine worth continuing? Or will it simply be too slow and inaccurate for people to use?
Nutch is a fast and powerful engine; you'll discover that in time.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
