Re: [PERFORM] PostgreSQL OR performance

2008-11-17 Thread Віталій Тимчишин
2008/11/15 Tom Lane [EMAIL PROTECTED]

 Віталій Тимчишин [EMAIL PROTECTED] writes:
  I am not. I can't see how materialize can multiply the number of rows it
  gets from the sort by 100.

 Is it the right-hand input of a merge join?  If so you're looking at
 mark/restore rescans, ie, repeated fetches of the same tuples.  There
 must be a huge number of duplicate join keys in that relation to make
 for such an increase though.  Normally the planner avoids putting a
 table with lots of duplicates as the RHS of a merge, but if it doesn't
 have good statistics for the join key then it might not realize the
 problem.


OK, thanks for clearing up some of the mystery.
But, returning to the original question: does anyone know why it chooses the
plan from *OR-plan.txt* instead of *union-plan.txt*? The first is
cost=4588.13..960900482668.95, the latter cost=266348.42..272953.14,
according to the statistics postgres has, so I suppose the planner would
select the latter if it could evaluate it.


Re: [PERFORM] Improve Seq scan performance

2008-11-17 Thread PFC


OK, I see your problem. Try this:

read this: http://www.postgresql.org/docs/current/static/pgtrgm.html
locate and \i the pg_trgm.sql file
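
That is, run the contrib script against your database; the path below is
only a typical location, so adjust it for your installation:

\i /usr/share/postgresql/contrib/pg_trgm.sql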

CREATE TABLE dict( s TEXT );

I loaded the English-German dictionary into a test table. I didn't parse
it, so it's just a bunch of 418552 strings, English and German mixed.


test=> EXPLAIN ANALYZE SELECT * FROM dict WHERE s LIKE '%tation%';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on dict  (cost=0.00..7445.90 rows=133 width=13) (actual time=0.102..217.155 rows=802 loops=1)
   Filter: (s ~~ '%tation%'::text)
 Total runtime: 217.451 ms
(3 rows)

Time: 217.846 ms

Since this data does not change very often, let's use a GIN index.

CREATE INDEX trgm_idx ON dict USING gin (s gin_trgm_ops);
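
An aside: pg_trgm also provides a GiST opclass, generally slower to search
but cheaper to update, in case the data does change often:

CREATE INDEX trgm_gist_idx ON dict USING gist (s gist_trgm_ops);

Either index type supports the % similarity operator.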

With trigrams we can search by similarity. So, we can issue this:

EXPLAIN ANALYZE SELECT s, similarity(s, 'tation') AS sml FROM dict WHERE s % 'tation' ORDER BY sml DESC, s;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1114.44..1115.49 rows=419 width=13) (actual time=190.778..190.980 rows=500 loops=1)
   Sort Key: (similarity(s, 'tation'::text)), s
   Sort Method:  quicksort  Memory: 37kB
   ->  Bitmap Heap Scan on dict  (cost=35.80..1096.19 rows=419 width=13) (actual time=113.486..188.825 rows=500 loops=1)
         Filter: (s % 'tation'::text)
         ->  Bitmap Index Scan on trgm_idx  (cost=0.00..35.69 rows=419 width=0) (actual time=112.011..112.011 rows=15891 loops=1)
               Index Cond: (s % 'tation'::text)
 Total runtime: 191.189 ms

It is not much faster than the seq scan, but it can give you useful  
results, correct spelling errors, etc.
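
The % operator matches when the trigram similarity exceeds a threshold, 0.3
by default; pg_trgm lets you inspect and change it:

SELECT show_limit();     -- current similarity threshold
SELECT set_limit(0.5);   -- require closer matches for %

Raising the threshold would trim the weaker matches in the sample below.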

Perhaps it's faster when data is not cached.
Sample of results:

 taxation|  0.6
 station |  0.5
 tabulation  |  0.5
 taction |  0.5
 Taxation {f}|  0.5
 Taxation {f}|  0.5

If you do not want to correct for spelling errors, you can do it like this:

EXPLAIN ANALYZE SELECT s FROM dict WHERE s LIKE '%tation%' AND s % 'tation';

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on dict  (cost=35.70..1096.09 rows=1 width=13) (actual time=66.583..80.980 rows=306 loops=1)
   Filter: ((s ~~ '%tation%'::text) AND (s % 'tation'::text))
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..35.69 rows=419 width=0) (actual time=65.799..65.799 rows=15891 loops=1)
         Index Cond: (s % 'tation'::text)
 Total runtime: 81.140 ms
(5 rows)

Time: 81.652 ms

In this case the trigram index is used to narrow the search, and the LIKE
filter to keep only the exact matches.


Careful though, it might not always match: for instance if you search for
'rat' you won't find 'consideration', because the search string is too
short.
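
You can see why with pg_trgm's similarity function; the exact value varies,
but it is far below the default 0.3 threshold:

SELECT 'consideration' LIKE '%rat%';         -- true: plain substring match
SELECT similarity('consideration', 'rat');   -- tiny: few shared trigrams
SELECT 'consideration' % 'rat';              -- false: below the threshold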


Anyway, I would suggest changing your strategy.

You could try preloading everything into an in-memory array of strings;
this would be much faster.
You could also try building a list of unique words from your dictionary,
which contains lots of expressions. Then, when the user enters a query,
get the words that contain the entered text, and use a full-text index to
search your dictionary.
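
A hypothetical sketch of that word-list idea ('words' is an invented name,
'dict' is the test table from above, and regexp_split_to_table needs
PostgreSQL 8.4 or later):

CREATE TABLE words AS
  SELECT DISTINCT lower(word) AS w
  FROM (SELECT regexp_split_to_table(s, E'\\s+') AS word FROM dict) AS t;

SELECT w FROM words WHERE w LIKE '%tation%';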


I first tested only some words, and later '%a%', '%b%', etc. When I
re-query the table with a previously used term (e.g. 1. '%a%' - slow,
2. '%b%' - slow, 3. '%a%' - fast), it is faster than the old method.


When the user enters a very short string like 'a' or 'is', I don't think
it is relevant to display all entries that contain it, because that
could be most of your dictionary. Instead, why not display all unique
words which start with this string? Far fewer results, faster, and
probably more useful too. Then the user can select a longer word and
search with that.
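
Continuing the hypothetical 'words' table from the earlier sketch, prefix
searches can use a plain btree index (text_pattern_ops makes LIKE 'x%'
indexable when the locale is not C):

CREATE INDEX words_prefix_idx ON words (w text_pattern_ops);
SELECT w FROM words WHERE w LIKE 'is%' ORDER BY w LIMIT 100;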


Also, pagination is overrated. If there are 50 pages of results, the user  
will never click on them anyway. They are more likely to refine their  
query instead. So, just display the first 100 results and be done with it  
;)








[PERFORM] Bad performance on simple query

2008-11-17 Thread Dimi Paun
Hi folks,

I have a simple table that keeps track of a user's access history.
It has a few fields, but the important ones are:
  - ownerId: the user's ID, an int8
  - accessTS: the timestamp of the record

The table right now is small, only 1942 records.
The user I test with (10015) has only 89 entries.

What I want is to get the last 5 accesses of a user:
   SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5
 
If I create a composite index *and* analyze:
   create index IDX_TRIP_HISTORY_OWNER_ACCESS_TS on tripHistory (ownerId, 
accessTS);
   ANALYZE triphistory;

It takes 0.091s (!):
perpedes_db=# EXPLAIN ANALYZE SELECT * FROM triphistory WHERE ownerId = 10015 
ORDER BY accessTS DESC LIMIT 5;

                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7.99 rows=5 width=106) (actual time=0.024..0.042 rows=5 loops=1)
   ->  Index Scan Backward using idx_trip_history_owner_access_ts on triphistory  (cost=0.00..142.20 rows=89 width=106) (actual time=0.021..0.034 rows=5 loops=1)
         Index Cond: (ownerid = 10015)
 Total runtime: 0.091 ms
(4 rows)


BTW, this is after several runs of the query; shouldn't all this stuff be in
memory?

This is not a fast machine, but this seems rather excessive, no? 

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Alan Hodgson
On Monday 17 November 2008, Dimi Paun [EMAIL PROTECTED] wrote:
 It takes 0.091s (!):
 perpedes_db=# EXPLAIN ANALYZE SELECT * FROM triphistory WHERE ownerId =
 10015 ORDER BY accessTS DESC LIMIT 5;
                                        QUERY PLAN
 --------------------------------------------------------------------------------
  Limit  (cost=0.00..7.99 rows=5 width=106) (actual time=0.024..0.042 rows=5 loops=1)
    ->  Index Scan Backward using idx_trip_history_owner_access_ts on triphistory  (cost=0.00..142.20 rows=89 width=106) (actual time=0.021..0.034 rows=5 loops=1)
          Index Cond: (ownerid = 10015)
  Total runtime: 0.091 ms

That's 0.091 milliseconds (0.000091 seconds).


-- 
Corporations will ingest natural resources and defecate garbage until all 
resources are depleted, debt can no longer be repaid and our money becomes 
worthless - Jay Hanson



Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Scott Marlowe
On Mon, Nov 17, 2008 at 8:53 AM, Dimi Paun [EMAIL PROTECTED] wrote:
 Hi folks,

 I have a simple table that keeps track of a user's access history.
 It has a few fields, but the important ones are:
  - ownerId: the user's ID, an int8
  - accessTS: the timestamp of the record

 The table right now is small, only 1942 records.
 The user I test with (10015) has only 89 entries.

 What I want is to get the last 5 accesses of a user:
   SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5

 If I create a composite index *and* analyze:
   create index IDX_TRIP_HISTORY_OWNER_ACCESS_TS on tripHistory (ownerId, 
 accessTS);
   ANALYZE triphistory;

 It takes 0.091s (!):
 perpedes_db=# EXPLAIN ANALYZE SELECT * FROM triphistory WHERE ownerId = 10015 
 ORDER BY accessTS DESC LIMIT 5;

                                                                 QUERY PLAN
 --------------------------------------------------------------------------------
  Limit  (cost=0.00..7.99 rows=5 width=106) (actual time=0.024..0.042 rows=5 loops=1)
    ->  Index Scan Backward using idx_trip_history_owner_access_ts on triphistory  (cost=0.00..142.20 rows=89 width=106) (actual time=0.021..0.034 rows=5 loops=1)
          Index Cond: (ownerid = 10015)
  Total runtime: 0.091 ms
 (4 rows)


 BTW, this is after several runs of the query; shouldn't all this stuff be in
 memory?

Are you saying it's excessive that you need the compound index?  Cause
that's running in 91 microseconds, as pointed out by Alan.



Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Dimi Paun

On Mon, 2008-11-17 at 09:53 -0700, Scott Marlowe wrote:
 
 Are you saying it's excessive that you need the compound index?  Cause
 that's running in 91 microseconds, as pointed out by Alan.

Of course, my bad. I read that as 91ms (blush).

Confusion came from the fact that pgadminIII reports the query
taking 20-40ms, so I read the 0.091 as seconds not ms.

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Scott Marlowe
On Mon, Nov 17, 2008 at 10:07 AM, Dimi Paun [EMAIL PROTECTED] wrote:

 On Mon, 2008-11-17 at 09:53 -0700, Scott Marlowe wrote:

 Are you saying it's excessive that you need the compound index?  Cause
 that's running in 91 microseconds, as pointed out by Alan.

 Of course, my bad. I read that as 91ms (blush).

 Confusion came from the fact that pgadminIII reports the query
 taking 20-40ms, so I read the 0.091 as seconds not ms.

Ahhh.  Keep in mind that if you just run the query, pgadminIII will
tell you how long it took to run AND return all the data across the
network, so it will definitely take longer.  But most of that's
network I/O wait, so it's not a real issue unless you're saturating your
network.



Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Dimi Paun

On Mon, 2008-11-17 at 10:16 -0700, Scott Marlowe wrote:
 Ahhh.  Keep in mind that if you just run the query, pgadminIII will
 tell you how long it took to run AND return all the data across the
 network, so it will definitely take longer.  But most of that's
 network I/O wait, so it's not a real issue unless you're saturating your
 network.

But that is brutal -- there's no way it can take 20ms for a request 
across an unloaded network.

Moreover, I got something like this:

           pgadminIII | pgsql
w/o index: 45ms       | 0.620ms
w/ index:  20ms       | 0.091ms

Now when I try to replicate, I get 45ms in both cases. This is
very misleading...

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Scott Marlowe
On Mon, Nov 17, 2008 at 10:31 AM, Dimi Paun [EMAIL PROTECTED] wrote:

 On Mon, 2008-11-17 at 10:16 -0700, Scott Marlowe wrote:
 Ahhh.  Keep in mind that if you just run the query, pgadminIII will
 tell you how long it took to run AND return all the data across the
 network, so it will definitely take longer.  But most of that's
 network I/O wait, so it's not a real issue unless you're saturating your
 network.

 But that is brutal -- there's no way it can take 20ms for a request
 across an unloaded network.

 Moreover, I got something like this:

            pgadminIII | pgsql
 w/o index: 45ms       | 0.620ms
 w/ index:  20ms       | 0.091ms

 Now when I try to replicate, I get 45ms in both cases. This is
 very misleading...

I'm guessing a fair bit of that time is pgadminIII prettifying the
output for you, etc.  I.e. it's not all transfer time.  Hard to say
without hooking some kind of profiler into pgadminIII.  Is psql running
locally and pgadminIII remotely?  Or are they both remote?  If both psql
and pgadminIII are remote (i.e. same basic circumstances) then it's
got to be a difference in the client causing the extra time.  Or is
this the output of EXPLAIN ANALYZE?



Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread ries van Twisk


On Nov 17, 2008, at 12:40 PM, Scott Marlowe wrote:


On Mon, Nov 17, 2008 at 10:31 AM, Dimi Paun [EMAIL PROTECTED] wrote:


On Mon, 2008-11-17 at 10:16 -0700, Scott Marlowe wrote:

Ahhh.  Keep in mind that if you just run the query, pgadminIII will
tell you how long it took to run AND return all the data across the
network, so it will definitely take longer.  But most of that's
network I/O wait, so it's not a real issue unless you're saturating
your network.


But that is brutal -- there's no way it can take 20ms for a request
across an unloaded network.

Moreover, I got something like this:

            pgadminIII | pgsql
 w/o index: 45ms       | 0.620ms
 w/ index:  20ms       | 0.091ms

 Now when I try to replicate, I get 45ms in both cases. This is
 very misleading...


I'm guessing a fair bit of that time is pgadminIII prettifying the
output for you, etc.  I.e. it's not all transfer time.  Hard to say
without hooking some kind of profiler into pgadminIII.  Is psql running
locally and pgadminIII remotely?  Or are they both remote?  If both psql
and pgadminIII are remote (i.e. same basic circumstances) then it's
got to be a difference in the client causing the extra time.  Or is
this the output of EXPLAIN ANALYZE?



Side note: I have never seen pgAdminIII show a time below 20ms (the
time in the bottom right corner).


When I run a query like this: SELECT 1; it takes around 20ms according
to pgAdminIII (whatever that time means).


What I normally do to find the real query time is put an EXPLAIN
ANALYZE in front of my query.


Ries









Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Dimi Paun

On Mon, 2008-11-17 at 10:40 -0700, Scott Marlowe wrote:
 I'm guessing a fair bit of that time is pgadminIII prettifying the
 output for you, etc.  I.e. it's not all transfer time.  Hard to say
 without hooking some kind of profiler into pgadminIII.  Is psql running
 locally and pgadminIII remotely?  Or are they both remote?  If both psql
 and pgadminIII are remote (i.e. same basic circumstances) then it's
 got to be a difference in the client causing the extra time.  Or is
 this the output of EXPLAIN ANALYZE?

With \timing on I get basically the same output (local vs remote)
in psql (0.668ms vs. 0.760ms). More like it.


WTH is pgadminIII reporting?!?

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Dave Page
On Mon, Nov 17, 2008 at 6:14 PM, Dimi Paun [EMAIL PROTECTED] wrote:

 On Mon, 2008-11-17 at 10:40 -0700, Scott Marlowe wrote:
 I'm guessing a fair bit of that time is pgadminIII prettifying the
 output for you, etc.  I.e. it's not all transfer time.  Hard to say
 without hooking some kind of profiler into pgadminIII.  Is psql running
 locally and pgadminIII remotely?  Or are they both remote?  If both psql
 and pgadminIII are remote (i.e. same basic circumstances) then it's
 got to be a difference in the client causing the extra time.  Or is
 this the output of EXPLAIN ANALYZE?

 With \timing on I get basically the same output (local vs remote)
 in psql (0.668ms vs. 0.760ms). More like it.


 WTH is pgadminIII reporting?!?

Exactly what it's supposed to be; however, it's using libpq's
asynchronous query interface and has to pass the query result through
the wxWidgets event handling system, both of which seem to add a few
milliseconds to the overall query time from the quick testing I've
just done. In a GUI app like pgAdmin, we need to use this kind of
architecture to allow the UI to continue processing events (such as
button clicks, redraws etc), and to allow multiple windows to work
independently without one query locking up the whole app.

Note that the rendering time that Tom mentioned the other day, which
used to confuse things, has not been an issue for a couple of years -
it was dependent on resultset size and could lead to much bigger
variations. That was fixed by having libpq act as a virtual data store
for the UI instead of transferring data from the PGresult to the data
grid's own data store.

I think the bottom line is that you cannot compare psql and pgAdmin's
timings because the architectures of the two apps are very different.
Further, pgAdmin isn't the best choice for micro-optimisation of
extremely fast queries.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com
