I owe a BIG thank you to those of you who have patiently helped me. My program
is basically running now, and though it's definitely a prototype, it works.
(I had started the move in the hope that a sluggish program could be sped up,
and it is DEFINITELY speeding up.)

I have one immediate request for help with a fairly complex use of SQLAlchemy:

A document_table row represents a webpage (roughly); a histogram_table row
represents one entry in that page's histogram of words. If the webpage says
"egg bacon spam spam", it will have:

* one histogram row for that document with the word "egg" and a count of 1
* one histogram row for that document with the word "bacon" and a count of 1
* one histogram row for that document with the word "spam" and a count of 2
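To make the histogram layout concrete, here is a minimal sketch of how the rows for one page could be generated. It assumes each row is a hypothetical (document_id, word, count) tuple, and that words are split on whitespace only; it is not the actual schema.

```python
from collections import Counter

def histogram_rows(document_id, text):
    """Return one hypothetical (document_id, word, count) row per distinct word."""
    counts = Counter(text.split())  # naive whitespace tokenization (assumption)
    return sorted((document_id, word, n) for word, n in counts.items())
```

For example, `histogram_rows(1, "egg bacon spam spam")` yields `[(1, 'bacon', 1), (1, 'egg', 1), (1, 'spam', 2)]`.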

In the kind of search I want to enable, a webpage's score is the product, over
all search keywords, of that keyword's count divided by the total word count
for the webpage. Thus if someone searches for "egg bacon spam", the score
will be 1/4 * 1/4 * 2/4 = 0.03125. (This is actually a very high score,
although it looks low.) If a keyword is missing, its factor is 0, so the
whole score is 0: "egg sausage spam" will have a score of
1/4 * 0/4 * 2/4 = 0.0.
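The scoring rule above can be checked with a short pure-Python sketch. The function name `score` is hypothetical, and it assumes the page is already split into a list of words.

```python
from collections import Counter
from math import prod

def score(words, keywords):
    """Product over keywords of (count of keyword / total words on the page)."""
    counts = Counter(words)  # missing keywords count as 0
    total = len(words)
    if total == 0:
        return 0.0
    return prod(counts[k] / total for k in keywords)
```

With the page "egg bacon spam spam", `score(page, ["egg", "bacon", "spam"])` gives 0.03125, and any query containing "sausage" gives 0.0, because a single zero factor zeroes the product.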

What I want to do is select all documents with a nonzero score based on the
keyword search terms, and sort them by score descending. I know I could
bludgeon it and eventually get it working, but this is complex enough (should I
make a separate histogram_grand_total_table with the total word count for a
webpage?) that I wanted to ask for help.
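One possible shape for the query, sketched in plain SQL through Python's built-in `sqlite3` rather than SQLAlchemy. The table and column names (`document`, `histogram`, `occurrences`) and the `product` aggregate are assumptions for illustration. SQLite has no built-in product aggregate, so one is registered with `create_aggregate`. The per-page total is computed with `SUM` in a CTE, so a separate grand-total table isn't strictly required, though caching the total on the document row may be faster. Because a missing keyword zeroes the score, "nonzero score" reduces to "the page contains every keyword", which the `HAVING` clause checks.

```python
import sqlite3
from collections import Counter

class Product:
    """Custom SQL aggregate that multiplies its inputs."""
    def __init__(self):
        self.value = 1.0
    def step(self, x):
        self.value *= x
    def finalize(self):
        return self.value

def make_db(pages):
    """Build an in-memory DB; pages maps a hypothetical document id to its text."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("product", 1, Product)
    conn.executescript("""
        CREATE TABLE document (id INTEGER PRIMARY KEY);
        CREATE TABLE histogram (
            document_id INTEGER REFERENCES document(id),
            word TEXT,
            occurrences INTEGER,
            PRIMARY KEY (document_id, word)
        );
    """)
    for doc_id, text in pages.items():
        conn.execute("INSERT INTO document (id) VALUES (?)", (doc_id,))
        conn.executemany(
            "INSERT INTO histogram VALUES (?, ?, ?)",
            [(doc_id, w, n) for w, n in Counter(text.split()).items()],
        )
    return conn

def search(conn, keywords):
    """Return (document_id, score) pairs with nonzero score, best first."""
    terms = sorted(set(keywords))  # assumption: a repeated keyword counts once
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    sql = f"""
        WITH totals AS (
            SELECT document_id, SUM(occurrences) AS total
            FROM histogram GROUP BY document_id
        )
        SELECT h.document_id,
               product(CAST(h.occurrences AS REAL) / t.total) AS score
        FROM histogram AS h
        JOIN totals AS t ON t.document_id = h.document_id
        WHERE h.word IN ({placeholders})
        GROUP BY h.document_id
        HAVING COUNT(*) = ?  -- every keyword present, so the score is nonzero
        ORDER BY score DESC
    """
    return conn.execute(sql, [*terms, len(terms)]).fetchall()
```

With pages `{1: "egg bacon spam spam", 2: "egg spam", 3: "sausage spam"}`, searching for "egg spam" returns `[(2, 0.25), (1, 0.125)]`. On a database without custom aggregates, `exp(sum(ln(x)))` is a common stand-in for a product; the same SQL could also be passed through SQLAlchemy's `text()` construct.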


--
++ Jonathan Hayward, [EMAIL PROTECTED]
** To see an award-winning website with stories, essays, artwork,
** games, and a four-dimensional maze, why not visit my home page?
** All of this is waiting for you at http://JonathansCorner.com

** If you'd like a Google Mail (gmail.com) account, please tell me!
