Re: [PERFORM] name search query speed

2005-03-04 Thread stig erikson
Jeremiah Jahn wrote:
I have about 5M names stored on my DB. Currently the searches are very
quick unless, they are on a very common last name ie. SMITH. The Index
is always used, but I still hit 10-20 seconds on a SMITH or Jones
search, and I average about 6 searches a second and max out at about
30/s. Any suggestions on how I could arrange things to make this search
quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
can increase this speed w/o a HW upgrade.
thanx,
-jj-
is there a chance you could benefit from indices spanning over multiple columns?
maybe the user that searches for SMITH knows more then the last name, ie first 
name, location (zip code, name of city, etc.)?

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] name search query speed

2005-03-03 Thread Ken Egervari
I'm not sure what the answer is but maybe I can help?  Would clustering the 
name index make this faster? I thought that would bunch up the pages so the 
names were more or less in order, which would improve search time.  Just a 
guess though.

Ken
- Original Message - 
From: Jeremiah Jahn [EMAIL PROTECTED]
To: postgres performance pgsql-performance@postgresql.org
Sent: Thursday, March 03, 2005 11:38 AM
Subject: [PERFORM] name search query speed


I have about 5M names stored on my DB. Currently the searches are very
quick unless, they are on a very common last name ie. SMITH. The Index
is always used, but I still hit 10-20 seconds on a SMITH or Jones
search, and I average about 6 searches a second and max out at about
30/s. Any suggestions on how I could arrange things to make this search
quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
can increase this speed w/o a HW upgrade.
thanx,
-jj-

--
You probably wouldn't worry about what people think of you if you could
know how seldom they do.
   -- Olin Miller.
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] name search query speed

2005-03-03 Thread Jeremiah Jahn
yes, it does. I forgot to mention, that I also have clustering on that
table by my name_field index. My Bad.

On Thu, 2005-03-03 at 12:00 -0500, Ken Egervari wrote:
 I'm not sure what the answer is but maybe I can help?  Would clustering the 
 name index make this faster? I thought that would bunch up the pages so the 
 names were more or less in order, which would improve search time.  Just a 
 guess though.
 
 Ken
 
 - Original Message - 
 From: Jeremiah Jahn [EMAIL PROTECTED]
 To: postgres performance pgsql-performance@postgresql.org
 Sent: Thursday, March 03, 2005 11:38 AM
 Subject: [PERFORM] name search query speed
 
 
 I have about 5M names stored on my DB. Currently the searches are very
  quick unless, they are on a very common last name ie. SMITH. The Index
  is always used, but I still hit 10-20 seconds on a SMITH or Jones
  search, and I average about 6 searches a second and max out at about
  30/s. Any suggestions on how I could arrange things to make this search
  quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
  can increase this speed w/o a HW upgrade.
 
  thanx,
  -jj-
 
 
 
  -- 
  You probably wouldn't worry about what people think of you if you could
  know how seldom they do.
 -- Olin Miller.
 
 
  ---(end of broadcast)---
  TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org
  
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
-- 
You probably wouldn't worry about what people think of you if you could
know how seldom they do.
-- Olin Miller.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] name search query speed

2005-03-03 Thread John A Meinel
Jeremiah Jahn wrote:
I have about 5M names stored on my DB. Currently the searches are very
quick unless, they are on a very common last name ie. SMITH. The Index
is always used, but I still hit 10-20 seconds on a SMITH or Jones
search, and I average about 6 searches a second and max out at about
30/s. Any suggestions on how I could arrange things to make this search
quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
can increase this speed w/o a HW upgrade.
thanx,
-jj-

 

It sounds like the problem is just that you have a lot of rows that need 
to be returned. Can you just put a limit on the query? And then change 
the client app to recognize when the limit is reached, and either give a 
link to more results, or refine query, or something like that.

John
=:-


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] name search query speed

2005-03-03 Thread Josh Berkus
Jeremiah,

 I have about 5M names stored on my DB. Currently the searches are very
 quick unless, they are on a very common last name ie. SMITH. The Index
 is always used, but I still hit 10-20 seconds on a SMITH or Jones
 search, and I average about 6 searches a second and max out at about
 30/s. Any suggestions on how I could arrange things to make this search
 quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
 can increase this speed w/o a HW upgrade.

First off, see http://www.powerpostgresql.com/PerfList about your 
configuration settings.

The problem you're running into with SMITH is that, if your query is going to 
return a substantial number of rows (variable, but generally anything over 5% 
of the table and 1000 rows) is not able to make effective use of an index.
This makes it fall back on a sequential scan, and based on you execution 
time, I'd guess that the table is a bit too large to fit in memory.

AFTER you've made the configuration changes above, AND run VACUUM ANALYZE on 
your database, if you're still having problems post an EXPLAIN ANALYZE of the 
query to this list.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] name search query speed

2005-03-03 Thread Markus Schaber
Hi, Jeremiah,

Jeremiah Jahn schrieb:
 yes, it does. I forgot to mention, that I also have clustering on that
 table by my name_field index. My Bad.

Fine. Did you run ANALYZE and CLUSTER on the table after every large
bunch of insertions / updates?

Markus


-- 
Markus Schaber | Dipl. Informatiker | Software Development GIS

Fight against software patents in EU! http://ffii.org/
  http://nosoftwarepatents.org/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] name search query speed

2005-03-03 Thread Jeremiah Jahn
On Thu, 2005-03-03 at 11:46 -0600, John A Meinel wrote:
 Jeremiah Jahn wrote:
 
 I have about 5M names stored on my DB. Currently the searches are very
 quick unless, they are on a very common last name ie. SMITH. The Index
 is always used, but I still hit 10-20 seconds on a SMITH or Jones
 search, and I average about 6 searches a second and max out at about
 30/s. Any suggestions on how I could arrange things to make this search
 quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
 can increase this speed w/o a HW upgrade.
 
 thanx,
 -jj-
 
 
 
   
 
 It sounds like the problem is just that you have a lot of rows that need 
 to be returned. Can you just put a limit on the query? And then change 
 the client app to recognize when the limit is reached, and either give a 
 link to more results, or refine query, or something like that.
Not really, about 2% of the returned rows are thrown away for security
reasons based on the current user, security groups they belong to and
different flags in the data itself. So the count for this is generated
on the fly needed for pagination in the app which expresses the total
number of finds, but only displays 40 of them. If any one knows a way to
determine the total number of matches without needing to iterate through
them using jdbc, I'm all ears as this would save me huge amounts of time
and limit/offset would become an option. 

 
 John
 =:-
 
-- 
A power so great, it can only be used for Good or Evil!
-- Firesign Theatre, The Giant Rat of Summatra


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] name search query speed

2005-03-03 Thread Jeremiah Jahn
On Thu, 2005-03-03 at 09:44 -0800, Josh Berkus wrote:
 Jeremiah,
 
  I have about 5M names stored on my DB. Currently the searches are very
  quick unless, they are on a very common last name ie. SMITH. The Index
  is always used, but I still hit 10-20 seconds on a SMITH or Jones
  search, and I average about 6 searches a second and max out at about
  30/s. Any suggestions on how I could arrange things to make this search
  quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
  can increase this speed w/o a HW upgrade.
 
 First off, see http://www.powerpostgresql.com/PerfList about your 
 configuration settings.
 
 The problem you're running into with SMITH is that, if your query is going to 
 return a substantial number of rows (variable, but generally anything over 5% 
 of the table and 1000 rows) is not able to make effective use of an index.
 This makes it fall back on a sequential scan, and based on you execution 
 time, I'd guess that the table is a bit too large to fit in memory.
 
 AFTER you've made the configuration changes above, AND run VACUUM ANALYZE on 
 your database, if you're still having problems post an EXPLAIN ANALYZE of the 
 query to this list.
 

ie. throw more hardware at it. All of the other things on the list,
except for effective_cache_size have always been done. I bumped it up
from the default to 260. Will see if that makes a difference.

thanx,
-jj-


-- 
A power so great, it can only be used for Good or Evil!
-- Firesign Theatre, The Giant Rat of Summatra


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] name search query speed

2005-03-03 Thread Dave Held
 -Original Message-
 From: Jeremiah Jahn [mailto:[EMAIL PROTECTED]
 Sent: Thursday, March 03, 2005 2:15 PM
 To: John A Meinel
 Cc: postgres performance
 Subject: Re: [PERFORM] name search query speed
 
 [...]
 So the count for this is generated on the fly needed for
 pagination in the app which expresses the total number of
 finds, but only displays 40 of them. If any one knows a way
 to determine the total number of matches without needing to 
 iterate through them using jdbc, I'm all ears as this would
 save me huge amounts of time and limit/offset would become
 an option. 

Is there a reason you can't do a count(field) query first?  If
so, you can get the number of records returned by setting
absolute(-1) and getting the row number.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] name search query speed

2005-03-03 Thread Jeremiah Jahn
doesn't that cause two queries? I used to do it that way and cut my time
substantially by counting in-line. Even though the results were cached
it still took more time. Also since the tables is constantly be updated
the returned total would not always match the number of results on the
second query.

On Thu, 2005-03-03 at 14:26 -0600, Dave Held wrote:
  -Original Message-
  From: Jeremiah Jahn [mailto:[EMAIL PROTECTED]
  Sent: Thursday, March 03, 2005 2:15 PM
  To: John A Meinel
  Cc: postgres performance
  Subject: Re: [PERFORM] name search query speed
  
  [...]
  So the count for this is generated on the fly needed for
  pagination in the app which expresses the total number of
  finds, but only displays 40 of them. If any one knows a way
  to determine the total number of matches without needing to 
  iterate through them using jdbc, I'm all ears as this would
  save me huge amounts of time and limit/offset would become
  an option. 
 
 Is there a reason you can't do a count(field) query first?  If
 so, you can get the number of records returned by setting
 absolute(-1) and getting the row number.
 
 __
 David B. Held
 Software Engineer/Array Services Group
 200 14th Ave. East,  Sartell, MN 56377
 320.534.3637 320.253.7800 800.752.8129
-- 
A power so great, it can only be used for Good or Evil!
-- Firesign Theatre, The Giant Rat of Summatra


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] name search query speed

2005-03-03 Thread John Arbash Meinel
Jeremiah Jahn wrote:
On Thu, 2005-03-03 at 11:46 -0600, John A Meinel wrote:

...
Not really, about 2% of the returned rows are thrown away for security
reasons based on the current user, security groups they belong to and
different flags in the data itself. So the count for this is generated
on the fly needed for pagination in the app which expresses the total
number of finds, but only displays 40 of them. If any one knows a way to
determine the total number of matches without needing to iterate through
them using jdbc, I'm all ears as this would save me huge amounts of time
and limit/offset would become an option.

Well, what is wrong with select count(*) from the query I would have
done?
Are you saying 2% are thrown away, or only 2% are kept?
Is this being done at the client side? Is there a way to incorporate the
security info into the database, so that the query actually only returns
the rows you care about? That seems like it would be a decent way to
speed it up, if you can restrict the number of rows that it needs to
look at.
There are other alternatives, such as materialized views, or temp
tables, where you select into the temp table the rows that the user
would request, and then you generate limit/offset from that. The first
query would be a little slow, since it would get all the rows, but all
subsequent accesses for that user could be really fast.
The other possibility is to do limit 200, and then in your list of
pages, you could have:
1, 2, 3, 4, 5, ...
This means that you don't have to worry about getting 10,000 entries,
which probably isn't really useful for the user anyway, and you can
still break things into 40 entry pages, just 200 entries at a time.
John
=:-
John
=:-




signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] name search query speed

2005-03-03 Thread Markus Schaber
Hi, Jeremiah,

Jeremiah Jahn schrieb:
 doesn't that cause two queries? I used to do it that way and cut my time
 substantially by counting in-line. Even though the results were cached
 it still took more time.

This sounds rather strange.

 Also since the tables is constantly be updated
 the returned total would not always match the number of results on the
 second query.

Did you run both queries in the same transaction, with transaction
isolation level set to serializable? If yes, you found a serious bug in
PostgreSQL transaction engine.

Markus

-- 
Markus Schaber | Dipl. Informatiker | Software Development GIS

Fight against software patents in EU! http://ffii.org/
  http://nosoftwarepatents.org/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] name search query speed

2005-03-03 Thread William Yu
Jeremiah Jahn wrote:
I have about 5M names stored on my DB. Currently the searches are very
quick unless, they are on a very common last name ie. SMITH. The Index
is always used, but I still hit 10-20 seconds on a SMITH or Jones
search, and I average about 6 searches a second and max out at about
30/s. Any suggestions on how I could arrange things to make this search
quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
can increase this speed w/o a HW upgrade.
If it's just SMITH, the only fix is to throw more hardware at the 
problem. I've got my own database of medical providers  facilities in 
the millions and anytime somebody tries to search for MEDICAL FACILITY, 
it takes forever. I've tried every optimization possible but when you 
have 500K records with the word MEDICAL in it, what can you do? You've 
got to check all 500K records to see if it matches your criteria.

For multi-word searches, what I've found does work is to periodically 
generate stats on work frequencies and use those stats to search the 
least common words first. For example, if somebody enters ALTABATES 
MEDICAL HOSPITAL, I can get the ~50 providers with ALTABATES in the 
name and then do a 2nd and 3rd pass to filter against MEDICAL and HOSPITAL.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org