Re: [PERFORM] why index scan not working when using 'like'?

2003-11-26 Thread Hannu Krosing
Tom Lane kirjutas T, 25.11.2003 kell 23:29:
> Josh Berkus <[EMAIL PROTECTED]> writes:
> > In regular text fields containing words, your problem is solvable with full 
> > text indexing (FTI).   Unfortunately, FTI is not designed for arbitrary 
> > non-language strings.  It could be adapted, but would require a lot of 
> > hacking.
> 
> I'm not sure why you say that FTI isn't a usable solution.  As long as
> the gene symbols are separated by whitespace or some other non-letters
> (eg, "foo mif bar" not "foomifbar"), I'd think FTI would work.

If he wants to search on arbitrary substring, he could change tokeniser
in FTI to produce trigrams, so that "foomifbar" would be indexed as if
it were text "foo oom omi mif ifb fba bar" and search for things like
%mifb% should first do a FTI search for "mif" AND "ifb" and then simple
LIKE %mifb% to weed out something like "mififb".

There are ways to use trigrams for 1 and 2 letter matches as well.

-
Hannu


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] why index scan not working when using 'like'?

2003-11-25 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes:
> In regular text fields containing words, your problem is solvable with full 
> text indexing (FTI).   Unfortunately, FTI is not designed for arbitrary 
> non-language strings.  It could be adapted, but would require a lot of 
> hacking.

I'm not sure why you say that FTI isn't a usable solution.  As long as
the gene symbols are separated by whitespace or some other non-letters
(eg, "foo mif bar" not "foomifbar"), I'd think FTI would work.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] why index scan not working when using 'like'?

2003-11-25 Thread Josh Berkus
Lianhe,

> I want to use index on the gene_symbol column in my
> query and gene_symbol is indexed. but when I use
> lower (gene_symbol) like lower('%mif%'), the index
> is not used. While when I change to
> lower(gene_symbol) = lower('mif'), the index is used
> and index scan works, but this is not what I like. I
> want all the gene_symbols  containing substring
> 'mif' are pulled out, and not necessarily exactly match.

LIKE '%mif%' is what's called an "unanchored text search" and it cannot use an 
index.   The database *has* to scan the full text looking for the substring.  
This is true of all database platforms I know of.

In regular text fields containing words, your problem is solvable with full 
text indexing (FTI).   Unfortunately, FTI is not designed for arbitrary 
non-language strings.  It could be adapted, but would require a lot of 
hacking.

So you will need to find a way to restructure you data to avoid needing 
unanchored text searches.One way would be to break down the gene_symbol 
field into its smallest atomic components and store those in an indexed child 
table.   Or if you're searching on the same values all the time, you can 
create a partial index.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] why index scan not working when using 'like'?

2003-11-25 Thread Dror Matalon

Hi,

Searches with like or regexes often can't use the index. Think of the index as
a sorted list of your items. It's easy to find an item when you know it
starts with mif so ('mif%' should use the index). But when you use a
'like' that starts with '%' the index is useless and the search needs to
do a sequential scan.

Regards,

Dror

On Tue, Nov 25, 2003 at 07:48:49PM +, LIANHE SHAO wrote:
> Hi all,
> 
> I want to use index on the gene_symbol column in my
> query and gene_symbol is indexed. but when I use
> lower (gene_symbol) like lower('%mif%'), the index
> is not used. While when I change to
> lower(gene_symbol) = lower('mif'), the index is used
> and index scan works, but this is not what I like. I
> want all the gene_symbols  containing substring
> 'mif' are pulled out, and not necessarily exactly match.
> 
> could anybody give me some hints how to deal with 
> this. If I do not used index, it take too long for
> the query.
> 
>
> PGA> explain select distinct probeset_id, chip,
> gene_symbol, title, sequence_description, pfam from
> affy_array_annotation where lower(gene_symbol) like
> upper('%mif%');
>QUERY PLAN
> -
>  Unique  (cost=29576.44..29591.44 rows=86 width=265)
>->  Sort  (cost=29576.44..29578.59 rows=857
> width=265)
>  Sort Key: probeset_id, chip, gene_symbol,
> title, sequence_description, pfam
>  ->  Seq Scan on affy_array_annotation 
> (cost=0.00..29534.70 rows=857 width=265)
>Filter: (lower((gene_symbol)::text)
> ~~ 'MIF%'::text)
> (5 rows)
> 
> 
> PGA=> explain select distinct probeset_id, chip,
> gene_symbol, title, sequence_description, pfam from
> affy_array_annotation where lower(gene_symbol) =
> upper('%mif%');
>
>  QUERY PLAN
> -
>  Unique  (cost=3433.44..3448.44 rows=86 width=265)
>->  Sort  (cost=3433.44..3435.58 rows=857 width=265)
>  Sort Key: probeset_id, chip, gene_symbol,
> title, sequence_description, pfam
>  ->  Index Scan using gene_symbol_idx_fun1
> on affy_array_annotation  (cost=0.00..3391.70
> rows=857 width=265)
>Index Cond:
> (lower((gene_symbol)::text) = '%MIF%'::text)
> (5 rows)
> 
> 
> 
> 
> 
> Regards,
> William
> 
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]