Re: [PERFORM] why index scan not working when using 'like'?
Tom Lane kirjutas T, 25.11.2003 kell 23:29: > Josh Berkus <[EMAIL PROTECTED]> writes: > > In regular text fields containing words, your problem is solvable with full > > text indexing (FTI). Unfortunately, FTI is not designed for arbitrary > > non-language strings. It could be adapted, but would require a lot of > > hacking. > > I'm not sure why you say that FTI isn't a usable solution. As long as > the gene symbols are separated by whitespace or some other non-letters > (eg, "foo mif bar" not "foomifbar"), I'd think FTI would work. If he wants to search on arbitrary substring, he could change tokeniser in FTI to produce trigrams, so that "foomifbar" would be indexed as if it were text "foo oom omi mif ifb fba bar" and search for things like %mifb% should first do a FTI search for "mif" AND "ifb" and then simple LIKE %mifb% to weed out something like "mififb". There are ways to use trigrams for 1 and 2 letter matches as well. - Hannu ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] why index scan not working when using 'like'?
Josh Berkus <[EMAIL PROTECTED]> writes: > In regular text fields containing words, your problem is solvable with full > text indexing (FTI). Unfortunately, FTI is not designed for arbitrary > non-language strings. It could be adapted, but would require a lot of > hacking. I'm not sure why you say that FTI isn't a usable solution. As long as the gene symbols are separated by whitespace or some other non-letters (eg, "foo mif bar" not "foomifbar"), I'd think FTI would work. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] why index scan not working when using 'like'?
Lianhe, > I want to use index on the gene_symbol column in my > query and gene_symbol is indexed. but when I use > lower (gene_symbol) like lower('%mif%'), the index > is not used. While when I change to > lower(gene_symbol) = lower('mif'), the index is used > and index scan works, but this is not what I like. I > want all the gene_symbols containing substring > 'mif' are pulled out, and not necessarily exactly match. LIKE '%mif%' is what's called an "unanchored text search" and it cannot use an index. The database *has* to scan the full text looking for the substring. This is true of all database platforms I know of. In regular text fields containing words, your problem is solvable with full text indexing (FTI). Unfortunately, FTI is not designed for arbitrary non-language strings. It could be adapted, but would require a lot of hacking. So you will need to find a way to restructure you data to avoid needing unanchored text searches.One way would be to break down the gene_symbol field into its smallest atomic components and store those in an indexed child table. Or if you're searching on the same values all the time, you can create a partial index. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] why index scan not working when using 'like'?
Hi, Searches with like or regexes often can't use the index. Think of the index as a sorted list of your items. It's easy to find an item when you know it starts with mif so ('mif%' should use the index). But when you use a 'like' that starts with '%' the index is useless and the search needs to do a sequential scan. Regards, Dror On Tue, Nov 25, 2003 at 07:48:49PM +, LIANHE SHAO wrote: > Hi all, > > I want to use index on the gene_symbol column in my > query and gene_symbol is indexed. but when I use > lower (gene_symbol) like lower('%mif%'), the index > is not used. While when I change to > lower(gene_symbol) = lower('mif'), the index is used > and index scan works, but this is not what I like. I > want all the gene_symbols containing substring > 'mif' are pulled out, and not necessarily exactly match. > > could anybody give me some hints how to deal with > this. If I do not used index, it take too long for > the query. > > > PGA> explain select distinct probeset_id, chip, > gene_symbol, title, sequence_description, pfam from > affy_array_annotation where lower(gene_symbol) like > upper('%mif%'); >QUERY PLAN > - > Unique (cost=29576.44..29591.44 rows=86 width=265) >-> Sort (cost=29576.44..29578.59 rows=857 > width=265) > Sort Key: probeset_id, chip, gene_symbol, > title, sequence_description, pfam > -> Seq Scan on affy_array_annotation > (cost=0.00..29534.70 rows=857 width=265) >Filter: (lower((gene_symbol)::text) > ~~ 'MIF%'::text) > (5 rows) > > > PGA=> explain select distinct probeset_id, chip, > gene_symbol, title, sequence_description, pfam from > affy_array_annotation where lower(gene_symbol) = > upper('%mif%'); > > QUERY PLAN > - > Unique (cost=3433.44..3448.44 rows=86 width=265) >-> Sort (cost=3433.44..3435.58 rows=857 width=265) > Sort Key: probeset_id, chip, gene_symbol, > title, sequence_description, pfam > -> Index Scan using gene_symbol_idx_fun1 > on affy_array_annotation (cost=0.00..3391.70 > rows=857 width=265) >Index Cond: > (lower((gene_symbol)::text) = '%MIF%'::text) > (5 rows) > > > > > > Regards, > William > > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]