Hi,

Can you enlighten me on how you classify whether a web page is "gay enough"? Is it just certain keywords, or is it pages from a particular source?

If search engines are your hobby, then it might be worth sticking with your PHP search engine implementation, just because it will be more fun and you can tailor it to your requirements more easily (not that Nutch is hard to customise). Lucene, which Nutch is built on, is worth looking at for development projects.

Also, hosting a Nutch engine on anything other than your own machine will be a pain in the butt. If you're at university, they might be able to help you out :)

regards
gk

----- Original Message -----
From: "Jimmy Forrester" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, September 16, 2005 10:46 PM
Subject: hello nutchers!


Hi, I'm 21 and currently writing my own search engine in PHP & MySQL. I
wanted to build a search engine that only searches fashion, entertainment,
nightlife and gay websites. You can see an example of it running here:

http://onescene.com/search/

As you can see, it isn't branded yet (I'm still looking for a good domain),
and it's tiny: I only ran it for a few hours and it filled 200MB of my
database, so my host told me to stop or they would charge me an obscene
amount for going over the 200MB allowance. It's really very basic, just a
full-text search over the non-common words within the page and the
metadata. It kinda works (though it's very inaccurate), but I'm worried
that if I move hosts and keep developing it, it will become too slow to use
once I get 100k web pages in there, even if I optimize the code a lot.

I'm worried that I'm not going to be good enough at server config and stuff
to get Nutch running well for me. I've been working on it for an hour so far
and have only just got Java downloading; I may not even manage to get Tomcat
running at all! Here are my few questions for the community:

  1. How tricky is it to get Nutch running for a server newbie like
  myself?
  2. What's Nutch like for limiting the type of site that gets crawled?
  My current site assesses whether a site is "gay enough" to be added to
  the search domains.
  3. I'm building my search engine as a hobby. Will I need to purchase a
  dedicated server to run Nutch? (I so can't afford that.) Or does anyone
  know a good, cheap hosting company that can definitely get Nutch up and
  running?
  4. Is my own search engine worth continuing, or will it simply be too
  slow and inaccurate for people to use?
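
The reply asks whether "gay enough" means certain keywords or pages from a particular source. A sketch of both approaches combined, in Python: a page is accepted if it comes from an allowlisted domain, and otherwise only if its weighted keyword density clears a threshold. The keyword list, weights, threshold, the `gay-guide.example` domain and all function names are hypothetical placeholders, not what Jimmy's engine actually uses:

```python
import re
from urllib.parse import urlparse

# Hypothetical topic keywords and weights (assumption); tune for the real site.
TOPIC_KEYWORDS = {"gay": 3, "lgbt": 3, "pride": 2, "nightlife": 1, "club": 1}
# Hypothetical sources that are always accepted regardless of content.
TRUSTED_DOMAINS = {"gay-guide.example"}
# Hypothetical minimum weighted score per 100 words for a page to count as on-topic.
THRESHOLD = 2.0


def is_trusted_source(url):
    """True if the URL's host is a trusted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def topic_score(text):
    """Weighted keyword hits per 100 words, so long pages are not favoured."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    hits = sum(TOPIC_KEYWORDS.get(w, 0) for w in words)
    return 100.0 * hits / len(words)


def should_index(url, text):
    """Accept pages from trusted sources; otherwise fall back to keyword scoring."""
    return is_trusted_source(url) or topic_score(text) >= THRESHOLD


if __name__ == "__main__":
    print(should_index("http://www.gay-guide.example/events", ""))  # True
    print(should_index("http://news.example.org/markets", "Stock markets fell today"))  # False
```

Normalising by page length keeps a long page that mentions a keyword once from outranking a short, focused one. If the crawl were done with Nutch instead, the by-source half corresponds to its URL filter rules (regular expressions that accept or reject URLs before fetching), while a content-based check like the keyword score would most likely need a custom plugin.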

Thank you all for taking the time to read this,

Looking forward to any responses!

Jimmy




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
