If you want to use a cheap host and still use Nutch, you will have to be creative.
No cheap host will let you run the crawler; it's just too much CPU power for a single account to use. However, your PC might be strong enough to handle the task. You might be able to do all of the fetching and indexing on your PC, move the index to your server, and have Nutch read that index there. I have not tried that myself, but in theory it should work.

Your index might be big, so FTPing it over to your server might be a bit of an issue, but you could probably use an automated FTP client to do the task for you. Your Nutch server would then just read the index and search from it (very fast), while your home PC builds the index and FTPs it over. Once the new index is on the server, restart Nutch and it will be up to date with the new websites. You can probably update the index as often as you can finish building one.

If you want me to suggest a cheap JSP host, just let me know.

Regards,
Paul

On 9/16/05, Michael Ji <[EMAIL PROTECTED]> wrote:
> For development and testing purposes, a reasonably powerful PC is
> enough (I used a Dell P4, and Nutch runs well there), but you
> definitely need a high-speed internet connection for everything
> Nutch requires.
>
> Michael Ji
>
> --- Jimmy Forrester <[EMAIL PROTECTED]> wrote:
>
> > Hi, I'm 21 and currently writing my own search engine in PHP & MySQL.
> > I wanted to build a search engine that only searches fashion,
> > entertainment, nightlife and gay websites. I've built this using
> > PHP & MySQL; you can see an example of it running here:
> >
> > http://onescene.com/search/
> >
> > As you can see, it isn't branded yet - still finding a good domain -
> > and it's tiny. I only ran it for a few hours and it filled 200MB on my
> > database, so my hosts told me to stop or they would charge me an
> > obscene amount for going over the 200MB allowance. It's really very
> > basic, just using a full-text search over the non-common words within
> > the page and the meta data.
> > It kinda works (yet it is very inaccurate), but I'm worried that if I
> > move hosts and keep developing it, it will become too slow to use once
> > I get 100k web pages in there, even if I optimize the code loads.
> >
> > I'm worried that I'm not going to be good enough at server config and
> > stuff to get Nutch running well for me. I've been working for an hour
> > so far and have just finally got Java downloading; I may not even
> > manage to get Tomcat running at all! Here are my few questions to the
> > community:
> >
> > 1. How tricky is it to get Nutch running for a server newbie like
> >    myself?
> > 2. What's Nutch like for limiting the type of site which gets crawled?
> >    My current site assesses if the site is "gay enough" to be added to
> >    the search domains.
> > 3. I'm building my search engine as a hobby - will I need to purchase
> >    a dedicated server to run Nutch? (I so can't afford that.) Or does
> >    anyone know a good cheap hosting company that I can definitely get
> >    Nutch up and running with?
> > 4. Is my own search engine worth continuing, or will it simply be too
> >    slow & inaccurate for people to use?
> >
> > Thank you all for taking the time to read this,
> >
> > looking forward to any responses!
> >
> > Jimmy
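
The "build the index at home, FTP it to the server" step suggested in the reply could be automated along these lines. This is only a rough sketch, not something tested against a real Nutch install: it assumes the index is an ordinary directory of files, and the function name `upload_tree`, the host `ftp.example.com`, the credentials and both paths are made up for illustration. Restarting Nutch on the server is left out, since how that is done depends on the host.

```python
import os
from ftplib import FTP, error_perm


def upload_tree(ftp, local_dir, remote_dir):
    """Upload every file under local_dir to remote_dir over an open FTP
    connection and return the list of remote paths that were sent."""
    sent = []
    for root, _dirs, files in os.walk(local_dir):
        rel = os.path.relpath(root, local_dir)
        if rel == ".":
            target = remote_dir
        else:
            target = remote_dir + "/" + rel.replace(os.sep, "/")
        try:
            ftp.mkd(target)  # create the matching remote directory
        except error_perm:
            pass  # assumption: the directory already exists
        for name in sorted(files):
            remote_path = target + "/" + name
            with open(os.path.join(root, name), "rb") as fh:
                ftp.storbinary("STOR " + remote_path, fh)
            sent.append(remote_path)
    return sent


if __name__ == "__main__":
    # Hypothetical host, login and paths; replace with your own.
    with FTP("ftp.example.com") as ftp:
        ftp.login("user", "password")
        for path in upload_tree(ftp, "crawl/index", "nutch/index"):
            print("uploaded", path)
```

Run from a scheduled job on the home PC after each crawl finishes, something like this would play the role of the "automated FTP client"; uploading into a fresh directory and switching over only once the transfer completes would avoid the server reading a half-copied index.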
