Re: [Sqlalchemy-users] ...SORT BY score DESC...

Qvx Mon, 03 Apr 2006 08:08:04 -0700

Stored procedure will be different for each database. I can help you with Oracle.

I would make my query like this:

SELECT doc.id, SUM( his.cnt/doc.word_count) weight
FROM doc, his
WHERE doc.id = his.doc_id
   AND his.word IN (:w1, :w2, :w3)
GROUP BY doc.id
HAVING COUNT(*) = :num_search_words
ORDER BY 2 DESC

But then again, this is not what you want. You want product and I don't know of any. For that you will have to loop and multiply yourself.

On 4/3/06, Jonathan Hayward http://JonathansCorner.com < [EMAIL PROTECTED]> wrote:
> Thank you! I've just added that column.
>
>  Would this be an appropriate time to have an SQL stored procedure? I'm
> thinking pseudocode like:
>
>  function count(document, list_of_keywords)
>      result = 1.0
>      if document.word_count > 0:
>          for keyword in list_of_keywords:
>              get histogram result (= histogram count of number of
> occurrences of document keyword; if this is 0, return 0.)
>              result = result * float(histogram_count) /
> float(document.word_count)
>      return result
>
>  Sorry if this is crude; I don't know SQL well, but I have the impression
> that a stored procedure might look good in a case like this.
>
>
> On 4/3/06, Qvx <[EMAIL PROTECTED]> wrote:
> > I would definitely add a "word_count" column to document table no
> > matter what your resulting query will look like. But you are going to
> > have some trouble making an efficient query which will also take into
> > account the given formula (the part where you expect element "0/4").
> >
> > Tvrtko
> >
> > On 4/3/06, Jonathan Hayward http://JonathansCorner.com
> > < [EMAIL PROTECTED]> wrote:
> > > I owe a BIG thank you to those of you who have patiently helped me. My
> > > program
> > >  is basically running now, and though it's definitely a prototype, it
> works.
> > >  (I had started the move in the hopes that a sluggish program could be
> sped
> > > up,
> > >  and it is DEFINITELY speeding up.)
> > >
> > >  There is one immediate help request for a complex use of SQLAlchemy:
> > >
> > >  A document_table represents a webpage (roughly); a histogram_table
> > > represents
> > >  an entry into a histogram of words. If the webpage says "egg bacon spam
> > > spam"
> > >  it will have:
> > >
> > >  * one histogram with its primary key, "egg", and a count of 1
> > >  * one histogram with its primary key, "bacon", and a count of 1
> > >  * one histogram with its primary key, "spam", and a count of 2
> > >
> > >  In the kind of search I want to enable, a webpage has a score which is
> > >  calculated by dividing its count for each search keyword by the total
> word
> > > count for the webpage. Thus if someone searches for "egg bacon spam",
> the
> > > score
> > >  will be 1/4 * 1/4 * 2/4 = 0.03125. (This is actually a very high score,
> > >  althoguh it looks low.) If a keyword is missing, it will have a score
> of 0:
> > >  "egg sausage spam" will have a score of 1/4 * 0/4 * 2/4 = 0.0.
> > >
> > >  What I want to do is select all documents with a nonzero score based on
> the
> > >  keyword search terms, and sort them by score descending. I know I could
> > >  bludgeon it and eventually get it working, but this is complex enough
> > > (should I
> > >  make a separate histogram_grand_total_table with the total word count
> for a
> > >  webpage?) that I wanted to ask for help.
> > >
> > >
> > > --
> > > ++ Jonathan Hayward, [EMAIL PROTECTED]
> > > ** To see an award-winning website with stories, essays, artwork,
> > > ** games, and a four-dimensional maze, why not visit my home page?
> > > ** All of this is waiting for you at http://JonathansCorner.com
> > >
> > > ** If you'd like a Google Mail ( gmail.com ) account, please tell me!
> >
>
>
>
> --
>
> ++ Jonathan Hayward, [EMAIL PROTECTED]
> ** To see an award-winning website with stories, essays, artwork,
> ** games, and a four-dimensional maze, why not visit my home page?
> ** All of this is waiting for you at http://JonathansCorner.com
>
> ** If you'd like a Google Mail ( gmail.com) account, please tell me!

Re: [Sqlalchemy-users] ...SORT BY score DESC...

Reply via email to