Re: [GENERAL] efficiency of wildcards at both ends

2012-06-23 Thread Jasen Betts
On 2012-06-20, Sam Z J  wrote:
> --0016e6d999db24c4c704c2ea7a97
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all
>
> I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
> How efficient is it if that's the only search criteria against a large
> table? how much does indexing the column help

fulltextsearch can be abused with a custom lexer that fragments the string in 
every possible
way and that can be matched against, but not using LIKE.

Works OK on tables with tens of thaousands of rows haven't tried it on
larger rows.


-- 
⚂⚃ 100% natural


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Thomas Kellerer

Alan Hodgson wrote on 20.06.2012 19:39:

I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
How efficient is it if that's the only search criteria against a large
table? how much does indexing the column help and roughly how much more
space is needed for the index?



Indexing helps not at all. If the search string starts with a wildcard you
will always get a sequential scan of the whole table.


Not necessarily: 
http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Edson Richter

Em 20/06/2012 15:03, John R Pierce escreveu:

On 06/20/12 10:37 AM, Edson Richter wrote:


select * from tb1
  where nome like 'CARLOS%' or reverse(nome) like reverse('%CARLOS')

Should return same results as
select * from tb1
  where nome like '%CARLOS%'


no, that won't match 'abcCARLOSxyx'




Perfect. I did not realized this case.

Thanks,

Edson.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread John R Pierce

On 06/20/12 10:37 AM, Edson Richter wrote:


select * from tb1
  where nome like 'CARLOS%' or reverse(nome) like reverse('%CARLOS')

Should return same results as
select * from tb1
  where nome like '%CARLOS%'


no, that won't match 'abcCARLOSxyx'



--
john r pierceN 37, W 122
santa cruz ca mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Thomas Kellerer

Sam Z J wrote on 20.06.2012 19:10:

Hi all

I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
How efficient is it if that's the only search criteria against a large table? 
how much does indexing the column help and roughly how much more space is 
needed for the index?

if the answers are too long, please point me to the relavant text =D



Since 9.1 you can speed up such a query using a trigram index.

http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html

Another option might be to use the wildspeed extension

http://www.sai.msu.su/~megera/wiki/wildspeed

(never used that myself though)






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Sam Z J
thank you all for the useful information =D

On Wed, Jun 20, 2012 at 1:39 PM, Alan Hodgson  wrote:

> On Wednesday, June 20, 2012 01:10:03 PM Sam Z J wrote:
> > Hi all
> >
> > I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
> > How efficient is it if that's the only search criteria against a large
> > table? how much does indexing the column help and roughly how much more
> > space is needed for the index?
> >
>
> Indexing helps not at all. If the search string starts with a wildcard you
> will always get a sequential scan of the whole table.
>
> Look at the full text search documentation for a better approach.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
Zhongshi (Sam) Jiang
sammyjiang...@gmail.com


Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Alan Hodgson
On Wednesday, June 20, 2012 01:10:03 PM Sam Z J wrote:
> Hi all
> 
> I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
> How efficient is it if that's the only search criteria against a large
> table? how much does indexing the column help and roughly how much more
> space is needed for the index?
> 

Indexing helps not at all. If the search string starts with a wildcard you 
will always get a sequential scan of the whole table.

Look at the full text search documentation for a better approach.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Edson Richter
Just ocurred to me that would be possible to create some sort of 
"hybrid" solution...


create index idx1 on tb1 (nome);
create index idx2 on tb1 (reverse(nome));

select * from tb1
  where nome like 'CARLOS%' or reverse(nome) like reverse('%CARLOS')

Should return same results as
select * from tb1
  where nome like '%CARLOS%'

I supposed that this hybrid solution will be optmized by the indexes 
(but at what cost?).


Can some PostgreSQL expert tell if this assumption is right? Would be 
possible to have PostgreSQL doing that automatically? Something like


create index idx1 on tb1 (nome) with options (optimize wildcards);
select * from tb1 where nome like '%CARLOS%';

and then this get expanded as the example above? Then, what happens with 
the following query:


select * from tb1 where nome like '%CARLOS%ERICKSSON%';

?

Edson

Em 20/06/2012 14:28, Edson Richter escreveu:
AFAIK, wildcards at both ends are not optimized at all, unless you use 
some sort of specialized index (may be Gist or FullText).
Until 9.1 there is no such "Index Scan" feature, that would help (very 
little).
Other databases (like MS SQL Server) solve this kind of query by 
executing an Index Scan, then merge join with rest of the query.


This is all I know about LIKE optimization in PostgreSQL:

LIKE 'str%' -> optimized by normal indexes
LIKE '%str%' -> not optimized. You can use FullText, but then your 
wildcards will have to change to something not SQL-standard compatible 
solution...
LIKE '%str' -> can be optimized if you create index with column 
content reversed then query reversed as well. See code below for details.


How did I optimized "%str" queries (code implemented with help from 
the PgSql community):


CREATE OR REPLACE FUNCTION reverse(input character varying)
  RETURNS character varying AS
$BODY$
DECLARE
  result character varying = '';
  i int;
BEGIN
  FOR i IN 1..length(input) BY 2 LOOP
result = substr(input,i+1,1) || substr(input,i,1) || result;
  END LOOP;
  RETURN result;
END$BODY$
  LANGUAGE plpgsql IMMUTABLE STRICT
  COST 100;
create index idx on tb1 (reverse(nome));
select * from tb1 where reverse(nome) like reverse('%RICHTER');


Regards,

Edson.

Em 20/06/2012 14:10, Sam Z J escreveu:

Hi all

I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
How efficient is it if that's the only search criteria against a 
large table? how much does indexing the column help and roughly how 
much more space is needed for the index?


if the answers are too long, please point me to the relavant text =D

thanks

--
Zhongshi (Sam) Jiang
sammyjiang...@gmail.com 








Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Edson Richter
AFAIK, wildcards at both ends are not optimized at all, unless you use 
some sort of specialized index (may be Gist or FullText).
Until 9.1 there is no such "Index Scan" feature, that would help (very 
little).
Other databases (like MS SQL Server) solve this kind of query by 
executing an Index Scan, then merge join with rest of the query.


This is all I know about LIKE optimization in PostgreSQL:

LIKE 'str%' -> optimized by normal indexes
LIKE '%str%' -> not optimized. You can use FullText, but then your 
wildcards will have to change to something not SQL-standard compatible 
solution...
LIKE '%str' -> can be optimized if you create index with column content 
reversed then query reversed as well. See code below for details.


How did I optimized "%str" queries (code implemented with help from the 
PgSql community):


CREATE OR REPLACE FUNCTION reverse(input character varying)
  RETURNS character varying AS
$BODY$
DECLARE
  result character varying = '';
  i int;
BEGIN
  FOR i IN 1..length(input) BY 2 LOOP
result = substr(input,i+1,1) || substr(input,i,1) || result;
  END LOOP;
  RETURN result;
END$BODY$
  LANGUAGE plpgsql IMMUTABLE STRICT
  COST 100;
create index idx on tb1 (reverse(nome));
select * from tb1 where reverse(nome) like reverse('%RICHTER');


Regards,

Edson.

Em 20/06/2012 14:10, Sam Z J escreveu:

Hi all

I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
How efficient is it if that's the only search criteria against a large 
table? how much does indexing the column help and roughly how much 
more space is needed for the index?


if the answers are too long, please point me to the relavant text =D

thanks

--
Zhongshi (Sam) Jiang
sammyjiang...@gmail.com 





Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Andy Colson

On 6/20/2012 12:10 PM, Sam Z J wrote:

Hi all

I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
How efficient is it if that's the only search criteria against a large
table? how much does indexing the column help and roughly how much more
space is needed for the index?

if the answers are too long, please point me to the relavant text =D

thanks

--
Zhongshi (Sam) Jiang
sammyjiang...@gmail.com 


An index will not be used for that kind of search.  PG will scan the 
entire table to find matches.  PG can only use an index if you have a 
search LIKE 'str%'


There are options like full text search, and pg_trgm that you might be 
able to use.


-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Lonni J Friedman
On Wed, Jun 20, 2012 at 10:10 AM, Sam Z J  wrote:
> Hi all
>
> I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
> How efficient is it if that's the only search criteria against a large
> table? how much does indexing the column help and roughly how much more
> space is needed for the index?
>
> if the answers are too long, please point me to the relavant text =D

My limited understanding is that any time you need to resort to using
wildcards, indices are never used, and you're falling back to using
the inefficient table scan.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] efficiency of wildcards at both ends

2012-06-20 Thread Sam Z J
Hi all

I'm curious how is wildcards at both ends implemented, e.g. LIKE '%str%'
How efficient is it if that's the only search criteria against a large
table? how much does indexing the column help and roughly how much more
space is needed for the index?

if the answers are too long, please point me to the relavant text =D

thanks

-- 
Zhongshi (Sam) Jiang
sammyjiang...@gmail.com