Hello,
I am building my database for the spider to fill but I have a problem.
I am trying to make a thematic search engine. I would like to let users
run complex searches such as "the red rabbit".
But since it is intended to run on any web server, even free ones (it
will be a light search engine system released under the GPL), it has to
consume as few resources as possible. So I simply cannot store the whole
page as Yahoo or other big search engines do.
At first I thought about indexing only the words that seem relevant, but
this way I can only make simple searches (e.g. "rabbit"). Then I thought
about indexing each word together with the previous one and the next one.
This way I should be able to make complex searches even on more than 3
words, since each word can find the next or previous one and so on, e.g.:
the -> red -> rabbit -> with -> a -> big -> tail
It seems quite a good way to do it, but since I would like to avoid
indexing "noise words" such as "the" or "a", it is not really satisfying.
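To make the word / previous / next idea concrete, here is a minimal sketch
of it, assuming SQLite through Python's standard sqlite3 module. The table
name word_link, its columns, and the two helper functions are hypothetical
names chosen only for illustration, and noise words are deliberately not
filtered here.

```python
import sqlite3


def build_index(conn, pages):
    """Store one row per word occurrence: (page, previous word, word, next word).

    `pages` maps a page id to its plain text. Table and column names are
    hypothetical; no noise-word filtering is done in this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_link ("
        " page_id INTEGER, prev_word TEXT, word TEXT, next_word TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_word ON word_link (word)")
    for page_id, text in pages.items():
        words = text.lower().split()
        for i, word in enumerate(words):
            prev_word = words[i - 1] if i > 0 else None
            next_word = words[i + 1] if i + 1 < len(words) else None
            conn.execute(
                "INSERT INTO word_link VALUES (?, ?, ?, ?)",
                (page_id, prev_word, word, next_word),
            )
    conn.commit()


def search_phrase(conn, phrase):
    """Return the ids of pages where every word of the phrase has the
    expected neighbours (previous and next word within the phrase)."""
    words = phrase.lower().split()
    if not words:
        return set()
    result = None
    for i, word in enumerate(words):
        sql = "SELECT DISTINCT page_id FROM word_link WHERE word = ?"
        params = [word]
        if i > 0:
            sql += " AND prev_word = ?"
            params.append(words[i - 1])
        if i + 1 < len(words):
            sql += " AND next_word = ?"
            params.append(words[i + 1])
        pages = {row[0] for row in conn.execute(sql, params)}
        # Keep only pages that satisfy every link checked so far.
        result = pages if result is None else result & pages
        if not result:
            break
    return result
```

Because no word positions are stored, a page can match a phrase longer
than 3 words even when the matching links come from different places in
the page, so the result is a set of candidates rather than exact matches.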
Does anyone have suggestions on how to achieve this?
A database scheme would be perfect in fact :-)
Thanks a lot for your time.
Best regards
Jean-Marc
--
This message was sent by the Internet robots and spiders discussion list
([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message
to "[EMAIL PROTECTED]".