Hi Charles

As you know, there is no such thing as a one-size-fits-all relevancy-ranking
algorithm: relevancy is, by definition, tied to the purpose of the
"query". The vanilla IR relevancy-ranking algorithm (you can look up the
SMART papers, or many other papers from Bruce Croft's group at UMass)
assumes that the query is looking for the "largest overlap" (in some
measure) between the query and the documents; Google's relevancy ranking
assumes that the query is looking for "authoritative" sources.
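One common way to make the "largest overlap" idea concrete is the
vector-space model associated with the SMART work: weight each term by
TF-IDF and rank documents by the cosine similarity between their vectors
and the query's. The Python sketch below is a minimal illustration under
stated assumptions; the function names, the tokenizer, and the plain
log(N/df) IDF variant are illustrative choices, not taken from SMART itself.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_tfidf(query, docs):
    """Return (doc_index, score) pairs, best match first.

    Term weight = raw term frequency * log(N / df), where N is the number
    of documents and df is the number of documents containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}

    def vectorize(tokens):
        # Query terms that never occur in the collection are dropped.
        return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}

    query_vec = vectorize(tokenize(query))
    scores = [cosine(query_vec, vectorize(toks)) for toks in tokenized]
    # sorted() is stable, so documents with equal scores keep their order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


docs = [
    "the spider crawls the web",
    "relevancy ranking of web documents",
    "cooking pasta at home",
]
print(rank_by_tfidf("relevancy ranking", docs))
```

Because IDF is computed from the whole collection, these scores shift as
the spider adds documents; under this variant a term that occurs in every
document gets zero weight.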
If these algorithms are not good enough for you (and I don't know for
sure how much of Google's algorithm you can use, given patents and such),
then you need to devise your own for the purpose(s) of your queries.
Essentially, you need to model what you are looking for (based on the
purpose of your queries), "compare" each incoming document with your
model, and assign it a relevancy score based on its "closeness" to the
model. From what I can tell from your question, that means you first need
to build a model of the purpose of your queries.
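As one hypothetical way to "model what you are looking for", the sketch
below uses a Rocchio-style centroid: average the term distributions of a
few example documents you judge to match your purpose, then score each new
document by cosine similarity to that centroid. The technique choice, the
example-document input, and all names here are assumptions for
illustration; any other model plus closeness measure fits the same outline.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Raw term counts for a piece of text (lower-cased, alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_model(example_docs):
    """Rocchio-style centroid: the average of the example documents'
    length-normalized term distributions. Empty documents are skipped."""
    vectors = [term_vector(d) for d in example_docs]
    vectors = [v for v in vectors if v]
    model = Counter()
    for vec in vectors:
        total = sum(vec.values())
        for term, count in vec.items():
            model[term] += count / total / len(vectors)
    return dict(model)


def closeness(model, text):
    """Cosine similarity between the model and a document, in [0, 1]."""
    vec = term_vector(text)
    dot = sum(w * vec.get(t, 0) for t, w in model.items())
    norm_m = math.sqrt(sum(w * w for w in model.values()))
    norm_d = math.sqrt(sum(c * c for c in vec.values()))
    if norm_m == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_m * norm_d)


# Hypothetical example documents that capture the purpose of the search.
model = build_model([
    "web spider crawls and indexes pages",
    "a crawler that indexes web pages for search",
])
for page in ["spider software that crawls web pages", "easy pasta recipes"]:
    print(f"{closeness(model, page):.3f}  {page}")
```

Because each score is computed against a fixed model rather than against
the other documents, a page can be scored on a 0..1 scale as soon as the
spider fetches it.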

Hope it helps,
Krishna Jha

Charles Bedard wrote:
>
> Hi,
>
> Can anyone share their ideas and knowledge on how
> one would create a relevancy ranking system. I am
> writing a spider that goes out on the web and
> retrieves documents for as long as I let it run.
>
> Before starting a search, I specify keywords that
> the spider will look for, and it will only index
> those documents that contain the keywords specified
> for the search. All other documents are not indexed.
>
> Currently, the spider only checks for the mere
> presence of one of (or all of) the keywords, but does
> not rank the results.
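The presence check described above amounts to a boolean OR (any keyword)
or AND (all keywords) filter. A minimal Python sketch, assuming
single-word keywords and a simple alphanumeric tokenizer; the function
name and the `require_all` flag are hypothetical.

```python
import re


def matches_keywords(text, keywords, require_all=False):
    """True if the text contains any keyword (or every keyword, when
    require_all is True) as a whole word, ignoring case.

    Assumes single-word keywords; phrases such as "web crawler" would need
    a separate phrase match.
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    hits = [kw.lower() in tokens for kw in keywords]
    return all(hits) if require_all else any(hits)


page = "Relevancy ranking for web spiders"
print(matches_keywords(page, ["ranking", "pasta"]))                    # True
print(matches_keywords(page, ["ranking", "pasta"], require_all=True))  # False
```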
>
> I started counting the occurrences of keywords in
> different parts of the documents and assigning a different
> weight depending on the location in the document. For
> example, if a keyword appears in the title or in one of
> the META tags, then the occurrence weighs more than if it
> is in the body.
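The location-weighted counting described above might look like the
following sketch. The field names and the values in `FIELD_WEIGHTS` are
hypothetical (the message only says title and META occurrences should
count more than body ones), and the sketch assumes the title, META
content, and body text have already been extracted from the HTML.

```python
import re

# Hypothetical weights: the message only says title and META occurrences
# should count more than body occurrences.
FIELD_WEIGHTS = {"title": 3.0, "meta": 2.0, "body": 1.0}


def weighted_keyword_score(fields, keywords, weights=FIELD_WEIGHTS):
    """Sum keyword occurrences over a document's fields, multiplying each
    field's count by that field's weight (unknown fields weigh 1.0).

    `fields` maps a field name to text already extracted from the page.
    """
    wanted = {kw.lower() for kw in keywords}
    score = 0.0
    for field, text in fields.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        occurrences = sum(1 for tok in tokens if tok in wanted)
        score += weights.get(field, 1.0) * occurrences
    return score


doc = {
    "title": "Relevancy ranking",
    "meta": "ranking, search",
    "body": "Ranking web pages by relevancy is hard.",
}
print(weighted_keyword_score(doc, ["ranking", "relevancy"]))  # 10.0
```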
>
> So I sum up all the occurrences of keywords and come up
> with a total weight. But as you may have guessed already,
> that doesn't let me give each document a score on a fixed scale.
>
> The only way I can assign scores with this technique is
> to wait for the whole search to finish, then take the
> maximum and minimum weights and build a scale from
> those values (i.e. Max = 100, Min = 0, and all other documents
> fall in between).
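The rescaling described above is min-max normalization. A minimal Python
sketch, assuming the raw weights for the finished search are collected in
a list (the function name is hypothetical):

```python
def min_max_scale(raw_scores, low=0.0, high=100.0):
    """Map the largest raw weight to `high`, the smallest to `low`, and the
    rest linearly in between. If all weights are equal, every document gets
    `high` (an arbitrary choice for this sketch)."""
    if not raw_scores:
        return []
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        return [high] * len(raw_scores)
    return [low + (s - lo) * (high - low) / (hi - lo) for s in raw_scores]


print(min_max_scale([10.0, 4.0, 7.0]))  # [100.0, 0.0, 50.0]
```

Every scaled score depends on the batch's current maximum and minimum, so
a single new document with a higher raw weight changes the scale for all
the others, which is why this only works once the search has finished.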
>
> I find this method a little crude, and wondered if anyone
> could give me other methods or point me to information
> regarding relevancy computations.
>
> Thanks for any help you can give me.
>
> Charles Bedard
