[SQL] Suggestions on finetuning this search?
I would welcome any suggestions for fine-tuning this search to run faster. Here is the SQL. Basically what we're allowing people to do is to specify words to search our article index. THE TABLES: knowledge = the knowledge base of articles kb_categories = the category that each article is assigned to kbwords = an index of every word in the knowledge base (kbid, wordid) wordindex = an index of every word in the knowledge base (wordid, word) AMPLIFICATION: kbwords = a list of all the words that appear in a specific article (might return a list of 5 articles where the word 'monk' appears) wordindex = a unique list of all the words that appear in all the articles (would only return a single entry for the word 'monk') THE SQL: select * from knowledge k, kb_categories c , kbwords w0 , kbwords w1 WHERE k.catid=c.catid AND ((k.kbid=w0.kbid and w0.wordid=(SELECT wordid from wordindex where word='BUDDIST')) AND (k.kbid=w1.kbid and w1.wordid=(SELECT wordid from wordindex where word='MONK'))) ORDER BY k.regionid , k.ctryid , k.catid , k.title ; WHAT "EXPLAIN" RETURNS: Sort (cost=2796577.40..2796577.40 rows=2878549 width=332) InitPlan -> Index Scan using wordindex_word on wordindex (cost=0.00..247.71 rows=376 width=4) -> Index Scan using wordindex_word on wordindex (cost=0.00..247.71 rows=376 width=4) -> Merge Join (cost=21187.45..21993.59 rows=2878549 width=332) -> Merge Join (cost=11060.50..11140.94 rows=52582 width=324) -> Sort (cost=933.56..933.56 rows=960 width=316) -> Hash Join (cost=1.43..885.97 rows=960 width=316) -> Seq Scan on knowledge k (cost=0.00..559.25 rows=2825 width=284) -> Hash (cost=1.34..1.34 rows=34 width=32) -> Seq Scan on kb_categories c (cost=0.00..1.34 rows=34 width=32) -> Sort (cost=10126.95..10126.95 rows=5474 width=8) -> Seq Scan on kbwords w0 (cost=0.00..9787.02 rows=5474 width=8) -> Sort (cost=10126.95..10126.95 rows=5474 width=8) -> Seq Scan on kbwords w1 (cost=0.00..9787.02 rows=5474 width=8) Please e-mail suggestions to [EMAIL PROTECTED] Thanks! To see the code in action, visit http://www.strategicnetwork.org/index.asp?loc=kb
[SQL] Optimizing Query
Any suggestions welcome! Here is my query: select k.*, c.category from knowledge k, kb_categories c , kbwords w0 , kbwords w1 WHERE k.catid=c.catid AND ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbid and w1.wordid=85369)) ORDER BY k.kbid DESC LIMIT 25; Now for the details knowledge k = 3,150 records kbwords = 2-field database (kbid, wordid), 825,748 records Each word in the knowledge base is stored in a database called wordindex, which has 50,000 records or so. The system first explodes the query string and pulls the word #s from this database, which is where we get 42743 and 85369 above, "ASIA" and "CHILDREN" respectively.) The idea is then to pull all the articles in the knowledge base which contain both of these words. Here is the EXPLAIN for the query: NOTICE: QUERY PLAN: Nested Loop (cost=0.00..527690060.67 rows=2878549 width=308) -> Nested Loop (cost=0.00..9472443.40 rows=52582 width=304) -> Nested Loop (cost=0.00..6278.63 rows=960 width=300) -> Index Scan Backward using knowledge_kbid_key on knowledge k (cost=0.00..1292.51 rows=2825 width=284) -> Seq Scan on kb_categories c (cost=0.00..1.34 rows=34 width=16) -> Seq Scan on kbwords w0 (cost=0.00..9787.02 rows=5474 width=4) -> Seq Scan on kbwords w1 (cost=0.00..9787.02 rows=5474 width=4) This takes quite a while to return results... prohibitively long. There are indexes on k.catid, c.catid, k.kbid, w0.kbid, w0.wordid. Any suggestions for further optimization would be very welcome. We get about 3,000 searches on our database daily... Blessings, Justin Long ________ Justin Long Network for Strategic Missions [EMAIL PROTECTED] 1732 South Park Court http://www.strategicnetwork.org Chesapeake, VA 23320, USA Reality Check e-zine: [EMAIL PROTECTED] Law: Never retreat. Never surrender. Never cut a deal with a dragon. Corollary: No armor? Unclean life? Then do not mess in the affairs of dragons, for you are crunchy and taste good with ketchup. ________ Justin Long Network for Strategic Missions [EMAIL PROTECTED] 1732 South Park Court http://www.strategicnetwork.org Chesapeake, VA 23320, USA Reality Check e-zine: [EMAIL PROTECTED] Law: Never retreat. Never surrender. Never cut a deal with a dragon. Corollary: No armor? Unclean life? Then do not mess in the affairs of dragons, for you are crunchy and taste good with ketchup.
Re: [SQL] Optimizing Query
Does that mean that if you have 3 indexes on a table and you cluster one, it deletes the other 2? At 04:45 PM 3/5/2001 -0500, you wrote: Yes. > On Mon, Mar 05, 2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote: > > Have you tried VACUUM ANALYZE and CLUSTER? > > I assume CLUSTER still drops all indexes except the one you're clustering > on? > > Mathijs > -- > It's not that perl programmers are idiots, it's that the language > rewards idiotic behavior in a way that no other language or tool has > ever done. > Erik Naggum > -- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 ____ Justin Long Network for Strategic Missions [EMAIL PROTECTED] 1732 South Park Court http://www.strategicnetwork.org Chesapeake, VA 23320, USA Reality Check e-zine: [EMAIL PROTECTED] Law: Never retreat. Never surrender. Never cut a deal with a dragon. Corollary: No armor? Unclean life? Then do not mess in the affairs of dragons, for you are crunchy and taste good with ketchup.
Re: [SQL] Optimizing Query
Yes, it drops indexes, much to my chagrin, as I just realized ... including SERIALs... Justin At 04:45 PM 3/5/2001 -0500, you wrote: Yes. > On Mon, Mar 05, 2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote: > > Have you tried VACUUM ANALYZE and CLUSTER? > > I assume CLUSTER still drops all indexes except the one you're clustering > on? > > Mathijs > -- > It's not that perl programmers are idiots, it's that the language > rewards idiotic behavior in a way that no other language or tool has > ever done. > Erik Naggum > -- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 ____ Justin Long Network for Strategic Missions [EMAIL PROTECTED] 1732 South Park Court http://www.strategicnetwork.org Chesapeake, VA 23320, USA Reality Check e-zine: [EMAIL PROTECTED] Law: Never retreat. Never surrender. Never cut a deal with a dragon. Corollary: No armor? Unclean life? Then do not mess in the affairs of dragons, for you are crunchy and taste good with ketchup.
Re: [SQL] Optimizing Query
Ok, now I have another question... it doesn't seem to be accessing the index. explain select k.kbid,k.title from knowledge k , kbwords w0 , kbwords w1 WHERE ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbid and w1.wordid=85369)) NOTICE: QUERY PLAN: Merge Join (cost=32339.30..35496.97 rows=19262538 width=24) -> Merge Join (cost=16530.24..16668.77 rows=233274 width=20) -> Sort (cost=15809.06..15809.06 rows=8257 width=4) -> Seq Scan on kbwords w1 (cost=0.00..15271.85 rows=8257 width=4) -> Sort (cost=721.18..721.18 rows=2825 width=16) -> Seq Scan on knowledge k (cost=0.00..559.25 rows=2825 width=16) -> Sort (cost=15809.06..15809.06 rows=8257 width=4) -> Seq Scan on kbwords w0 (cost=0.00..15271.85 rows=8257 width=4) Note the sequential scans... there is a wordindex where w0.wordid=42743... why isn't it doing an indexscan? wouldn't that be more efficient? Justin At 04:45 PM 3/5/2001 -0500, you wrote: Yes. > On Mon, Mar 05, 2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote: > > Have you tried VACUUM ANALYZE and CLUSTER? > > I assume CLUSTER still drops all indexes except the one you're clustering > on? > > Mathijs > -- > It's not that perl programmers are idiots, it's that the language > rewards idiotic behavior in a way that no other language or tool has > ever done. > Erik Naggum > -- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 Justin Long Network for Strategic Missions [EMAIL PROTECTED] 1732 South Park Court http://www.strategicnetwork.org Chesapeake, VA 23320, USA Reality Check e-zine: [EMAIL PROTECTED] Law: Never retreat. Never surrender. Never cut a deal with a dragon. Corollary: No armor? Unclean life? Then do not mess in the affairs of dragons, for you are crunchy and taste good with ketchup.
Re: [SQL] Optimizing Query
Wow. I can't believe the difference. It didn't take too long. I'll set up a script in my etc/cron.weekly to run it... would there be any benefit to doing a vacuum analyze nightly? Justin Long At 11:10 PM 3/5/2001 -0500, you wrote: Justin Long <[EMAIL PROTECTED]> writes: > Ok, now I have another question... it doesn't seem to be accessing the index. > explain select k.kbid,k.title from knowledge k , kbwords w0 , kbwords w1 > WHERE ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbid and > w1.wordid=85369)) > NOTICE: QUERY PLAN: > Merge Join (cost=32339.30..35496.97 rows=19262538 width=24) > -> Merge Join (cost=16530.24..16668.77 rows=233274 width=20) > -> Sort (cost=15809.06..15809.06 rows=8257 width=4) > -> Seq Scan on kbwords w1 (cost=0.00..15271.85 rows=8257 > width=4) > -> Sort (cost=721.18..721.18 rows=2825 width=16) > -> Seq Scan on knowledge k (cost=0.00..559.25 rows=2825 > width=16) > -> Sort (cost=15809.06..15809.06 rows=8257 width=4) > -> Seq Scan on kbwords w0 (cost=0.00..15271.85 rows=8257 width=4) > Note the sequential scans... there is a wordindex where w0.wordid=42743... > why isn't it doing an indexscan? wouldn't that be more efficient? It probably thinks not, because the estimated number of hits (8257) is so high. That estimate is currently driven by the frequency of the most common value in the column (mainly because that's the only stat we have :-(). I am guessing that you have a few very common words, which are skewing the stats for kbwords and causing it not to pick an indexscan. Does your setup have a notion of "stop words" that shouldn't be indexed, like "a", "an", "the", etc? Perhaps you need to add such a feature, or throw in a few more stopwords if you already have 'em. regards, tom lane ---(end of broadcast)------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] Justin Long Network for Strategic Missions [EMAIL PROTECTED] 1732 South Park Court http://www.strategicnetwork.org Chesapeake, VA 23320, USA Reality Check e-zine: [EMAIL PROTECTED] Law: Never retreat. Never surrender. Never cut a deal with a dragon. Corollary: No armor? Unclean life? Then do not mess in the affairs of dragons, for you are crunchy and taste good with ketchup.
