What kind of speed do you need? I had to go to temp tables for a logging
application, but not until my table got upwards of a million records or so..
For the numbers your quoting, it should be pretty quick unless your engine
On Friday 11 February 2005 11:56 am, Matthew Weier O'Phinney wrote:
> * Micah Stevens <[EMAIL PROTECTED]>:
> > Sounds like a self join should work wonders.
> > I didn't test this, but the idea should work:
> > select t1.app_id as a1, t2.app_id as a2, t2.word, t2.score
> > from tablename as t1
> > left join tablename as t2
> > on t1.resource_id = t2.resource_id and t1.app_id != t2.app_id
> > group by word
> > order by word
> > voila, no temp tables. Am I missing something?
> Speed. :-)
> I had tried this as well. The temporary tables really do offer much
> better performance. The issues I was having were (1) bad resultsets
> (I've now got that fixed) and (2) speed. I still don't have (2)
> completely fixed, and it may be something I can't fix.
> > On Thursday 10 February 2005 07:56 pm, Matthew Weier O'Phinney wrote:
> >> I have a table which contains the following:
> >> id (primary key, auto incrementing)
> >> app_id (integer, foreign key)
> >> resource_id (integer, foreign key)
> >> word
> >> score
> >> (This is a search index.) I want to find all resource_ids from one
> >> app_id that match resource_ids in another app_id by word.
> >> I have created a temporary table 'tmp1' that contains all resource_ids
> >> from the second app_id (the one whose resources I wish to retrieve). I
> >> am then looping through all resource_ids in the main table with the
> >> first app_id, and doing the following:
> >> * Creating a temporary table tmp2 with a single column 'word'
> >> populated by the words associated with resource_id in the main
> >> table
> >> * Selecting all distinct resource_ids from tmp1 INNER JOIN'd on tmp2
> >> on the word field
> >> The issues I'm running into are that (1) each resource_id cycle takes a
> >> good amount of time, and (2) I seem to be getting either too many
> >> resource_ids or not enough.
> >> (1) may be something I just have to deal with. As it is, I'm planning on
> >> running the full indexing once, and then doing incremental updates, so
> >> it may not be that big of an issue (unless it takes too much time to
> >> create the initial index). As for (2), unfortunately, I'm not sur ehow
> >> to really trouble shoot the issue. I know, for instance, that in once
> >> case, I have a list of 343 words that generates a list of ~12,000
> >> resource_ids (of a possible 18,000) -- but I don't quite know how to
> >> spot check 300 values to be certain that this is reasonable.
> >> In a previous incarnation of the script, I was looping through each word
> >> of each resource_id and then selecting out of tmp1 based on the single
> >> word value. The results were very different (very few matches), and,
> >> again, the script ran long.
> >> Any ideas?
> Matthew Weier O'Phinney | WEBSITES:
> Webmaster and IT Specialist | http://www.garden.org
> National Gardening Association | http://www.kidsgardening.com
> 802-863-5251 x156 | http://nationalgardenmonth.org
> mailto:[EMAIL PROTECTED] | http://vermontbotanical.org
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php