[PERFORM] why index scan not working when using 'like'?
Hi all, I want to use index on the gene_symbol column in my query and gene_symbol is indexed. but when I use lower (gene_symbol) like lower('%mif%'), the index is not used. While when I change to lower(gene_symbol) = lower('mif'), the index is used and index scan works, but this is not what I like. I want all the gene_symbols containing substring 'mif' are pulled out, and not necessarily exactly match. could anybody give me some hints how to deal with this. If I do not used index, it take too long for the query. PGA explain select distinct probeset_id, chip, gene_symbol, title, sequence_description, pfam from affy_array_annotation where lower(gene_symbol) like upper('%mif%'); QUERY PLAN - Unique (cost=29576.44..29591.44 rows=86 width=265) - Sort (cost=29576.44..29578.59 rows=857 width=265) Sort Key: probeset_id, chip, gene_symbol, title, sequence_description, pfam - Seq Scan on affy_array_annotation (cost=0.00..29534.70 rows=857 width=265) Filter: (lower((gene_symbol)::text) ~~ 'MIF%'::text) (5 rows) PGA= explain select distinct probeset_id, chip, gene_symbol, title, sequence_description, pfam from affy_array_annotation where lower(gene_symbol) = upper('%mif%'); QUERY PLAN - Unique (cost=3433.44..3448.44 rows=86 width=265) - Sort (cost=3433.44..3435.58 rows=857 width=265) Sort Key: probeset_id, chip, gene_symbol, title, sequence_description, pfam - Index Scan using gene_symbol_idx_fun1 on affy_array_annotation (cost=0.00..3391.70 rows=857 width=265) Index Cond: (lower((gene_symbol)::text) = '%MIF%'::text) (5 rows) Regards, William ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] why index scan not working when using 'like'?
Hi, Searches with like or regexes often can't use the index. Think of the index as a sorted list of your items. It's easy to find an item when you know it starts with mif so ('mif%' should use the index). But when you use a 'like' that starts with '%' the index is useless and the search needs to do a sequential scan. Regards, Dror On Tue, Nov 25, 2003 at 07:48:49PM +, LIANHE SHAO wrote: Hi all, I want to use index on the gene_symbol column in my query and gene_symbol is indexed. but when I use lower (gene_symbol) like lower('%mif%'), the index is not used. While when I change to lower(gene_symbol) = lower('mif'), the index is used and index scan works, but this is not what I like. I want all the gene_symbols containing substring 'mif' are pulled out, and not necessarily exactly match. could anybody give me some hints how to deal with this. If I do not used index, it take too long for the query. PGA explain select distinct probeset_id, chip, gene_symbol, title, sequence_description, pfam from affy_array_annotation where lower(gene_symbol) like upper('%mif%'); QUERY PLAN - Unique (cost=29576.44..29591.44 rows=86 width=265) - Sort (cost=29576.44..29578.59 rows=857 width=265) Sort Key: probeset_id, chip, gene_symbol, title, sequence_description, pfam - Seq Scan on affy_array_annotation (cost=0.00..29534.70 rows=857 width=265) Filter: (lower((gene_symbol)::text) ~~ 'MIF%'::text) (5 rows) PGA= explain select distinct probeset_id, chip, gene_symbol, title, sequence_description, pfam from affy_array_annotation where lower(gene_symbol) = upper('%mif%'); QUERY PLAN - Unique (cost=3433.44..3448.44 rows=86 width=265) - Sort (cost=3433.44..3435.58 rows=857 width=265) Sort Key: probeset_id, chip, gene_symbol, title, sequence_description, pfam - Index Scan using gene_symbol_idx_fun1 on affy_array_annotation (cost=0.00..3391.70 rows=857 width=265) Index Cond: (lower((gene_symbol)::text) = '%MIF%'::text) (5 rows) Regards, William ---(end of broadcast)--- TIP 8: explain analyze is your friend -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] why index scan not working when using 'like'?
Josh Berkus [EMAIL PROTECTED] writes: In regular text fields containing words, your problem is solvable with full text indexing (FTI). Unfortunately, FTI is not designed for arbitrary non-language strings. It could be adapted, but would require a lot of hacking. I'm not sure why you say that FTI isn't a usable solution. As long as the gene symbols are separated by whitespace or some other non-letters (eg, foo mif bar not foomifbar), I'd think FTI would work. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])