Re: [PERFORM] PostgreSQL OR performance
2008/11/15 Tom Lane [EMAIL PROTECTED]:
> Віталій Тимчишин [EMAIL PROTECTED] writes:
>> I am not. I can't see how materialize can multiply the number of rows it
>> gets from the sort by 100.
>
> Is it the right-hand input of a merge join?  If so you're looking at
> mark/restore rescans, ie, repeated fetches of the same tuples.  There must
> be a huge number of duplicate join keys in that relation to make for such
> an increase though.  Normally the planner avoids putting a table with lots
> of duplicates as the RHS of a merge, but if it doesn't have good
> statistics for the join key then it might not realize the problem.

OK, thanks for clearing up some of the mystery.

But, returning to the original question: does anyone know why it chooses the plan from *OR-plan.txt* instead of *union-plan.txt*? The first is cost=4588.13..960900482668.95, the latter is cost=266348.42..272953.14 according to the statistics postgres has, so I suppose the planner would select the latter if it could evaluate it.
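For readers who don't have the attached plan files, the comparison is between a join condition written with OR and the same query rewritten as a UNION of two single-condition joins. A minimal sketch of that rewrite, with invented table and column names (the original schema is not shown in this message):

    -- OR form: one join condition spanning two different keys. With a join
    -- clause like this the planner typically cannot split the OR into two
    -- separately indexed joins, which is what produces the huge cost estimate.
    SELECT c.*
    FROM customer c
    JOIN orders o ON (o.customer_id = c.id OR o.billing_customer_id = c.id);

    -- UNION form: each branch joins on a single key and can use its own index;
    -- UNION then removes duplicate rows. Only equivalent to the OR form when
    -- duplicate result rows don't matter to the application.
    SELECT c.* FROM customer c JOIN orders o ON o.customer_id = c.id
    UNION
    SELECT c.* FROM customer c JOIN orders o ON o.billing_customer_id = c.id;

    -- If the Materialize blow-up comes from poor statistics on the join key,
    -- raising the statistics target and re-analyzing may also help:
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE orders;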
Re: [PERFORM] Improve Seq scan performance
OK, I see your problem. Try this:

- read this: http://www.postgresql.org/docs/current/static/pgtrgm.html
- locate and \i the pg_trgm.sql file

CREATE TABLE dict( s TEXT );

I loaded the English-German dictionary into a test table. I didn't parse it, so it's just a bunch of 418552 strings, English and German mixed.

test=> EXPLAIN ANALYZE SELECT * FROM dict WHERE s LIKE '%tation%';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on dict  (cost=0.00..7445.90 rows=133 width=13) (actual time=0.102..217.155 rows=802 loops=1)
   Filter: (s ~~ '%tation%'::text)
 Total runtime: 217.451 ms
(3 lignes)

Temps : 217,846 ms

Since this data does not change very often, let's use a GIN index.

CREATE INDEX trgm_idx ON dict USING gin (s gin_trgm_ops);

With trigrams we can search by similarity, so we can issue this:

EXPLAIN ANALYZE SELECT s, similarity(s, 'tation') AS sml FROM dict WHERE s % 'tation' ORDER BY sml DESC, s;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=1114.44..1115.49 rows=419 width=13) (actual time=190.778..190.980 rows=500 loops=1)
   Sort Key: (similarity(s, 'tation'::text)), s
   Sort Method:  quicksort  Memory: 37kB
   ->  Bitmap Heap Scan on dict  (cost=35.80..1096.19 rows=419 width=13) (actual time=113.486..188.825 rows=500 loops=1)
         Filter: (s % 'tation'::text)
         ->  Bitmap Index Scan on trgm_idx  (cost=0.00..35.69 rows=419 width=0) (actual time=112.011..112.011 rows=15891 loops=1)
               Index Cond: (s % 'tation'::text)
 Total runtime: 191.189 ms

It is not much faster than the seq scan, but it can give you useful results, correct spelling errors, etc. Perhaps it's faster when the data is not cached. Sample of returns:

 taxation     | 0.6
 station      | 0.5
 tabulation   | 0.5
 taction      | 0.5
 Taxation {f} | 0.5
 Taxation {f} | 0.5

If you do not want to correct for spelling errors, you can do it like this:

EXPLAIN ANALYZE SELECT s FROM dict WHERE s LIKE '%tation%' AND s % 'tation';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on dict  (cost=35.70..1096.09 rows=1 width=13) (actual time=66.583..80.980 rows=306 loops=1)
   Filter: ((s ~~ '%tation%'::text) AND (s % 'tation'::text))
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..35.69 rows=419 width=0) (actual time=65.799..65.799 rows=15891 loops=1)
         Index Cond: (s % 'tation'::text)
 Total runtime: 81.140 ms
(5 lignes)

Temps : 81,652 ms

In this case the trigram index is used to narrow the search, and the LIKE keeps only the exact matches. Careful though, it might not always match: for instance if you search for "rat" you won't find "consideration", because the search string is too small.

Anyway, I would suggest changing your strategy. You could try preloading everything into an in-memory array of strings; this would be much faster. You could also build a list of the unique words in your dictionary, which contains lots of expressions. Then, when the user enters a query, get the words that contain the entered text, and use a full-text index to search your dictionary.

> I tested first with only some words, and later with '%a%', '%b%', etc.
> When I re-query the table with a term already used (e.g. 1. '%a%' - slow,
> 2. '%b%' - slow, 3. '%a%' - fast), it is faster than the old method.

When the user enters a very short string like 'a' or 'is', I don't think it is relevant to display all entries that contain it, because that could be most of your dictionary. Instead, why not display all unique words which start with this string? Far fewer results, faster, and probably more useful too. Then the user can select a longer word and use that.

Also, pagination is overrated. If there are 50 pages of results, the user will never click through them anyway; they are more likely to refine their query instead.
So, just display the first 100 results and be done with it ;)
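To make the "unique words" idea above concrete, here is a rough sketch. The dict_words table name and the word-splitting regex are invented for illustration (they are not from the original message), and regexp_split_to_table() requires PostgreSQL 8.3 or later:

    -- Build a table of the distinct words occurring in the dictionary entries.
    CREATE TABLE dict_words AS
        SELECT DISTINCT lower(regexp_split_to_table(s, E'[^[:alnum:]]+')) AS word
        FROM dict;

    DELETE FROM dict_words WHERE word = '';

    -- text_pattern_ops lets LIKE 'prefix%' use the btree index even with a
    -- non-C locale.
    CREATE INDEX dict_words_word_idx ON dict_words (word text_pattern_ops);

    -- "All unique words starting with what the user typed so far":
    SELECT word FROM dict_words WHERE word LIKE 'tat%' ORDER BY word LIMIT 100;

The prefix search then touches only a handful of index pages instead of scanning every dictionary entry.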
[PERFORM] Bad performance on simple query
Hi folks,

I have a simple table that keeps track of a user's access history. It has a few fields, but the important ones are:
 - ownerId: the user's ID, an int8
 - accessTS: the timestamp of the record

The table right now is small, only 1942 records. The user I test with (10015) has only 89 entries.

What I want is to get the last 5 accesses of a user:

   SELECT * FROM triphistory
   WHERE ownerId = 10015
   ORDER BY accessTS DESC
   LIMIT 5

If I create a composite index *and* analyze:

   create index IDX_TRIP_HISTORY_OWNER_ACCESS_TS on tripHistory (ownerId, accessTS);
   ANALYZE triphistory;

it takes 0.091s (!):

perpedes_db=# EXPLAIN ANALYZE SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..7.99 rows=5 width=106) (actual time=0.024..0.042 rows=5 loops=1)
   ->  Index Scan Backward using idx_trip_history_owner_access_ts on triphistory  (cost=0.00..142.20 rows=89 width=106) (actual time=0.021..0.034 rows=5 loops=1)
         Index Cond: (ownerid = 10015)
 Total runtime: 0.091 ms
(4 rows)

BTW, this is after several runs of the query; shouldn't all this stuff be in memory?

This is not a fast machine, but this seems rather excessive, no?

--
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.
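As an aside not raised in the thread: the same ordered retrieval can also be served by an index declared with a descending second key (available since 8.3), which turns the backward scan into a forward one. A hypothetical variant, not something the plan above requires, since the backward index scan already avoids a sort:

    -- Equivalent access path for "ORDER BY accessTS DESC" with ownerId fixed.
    CREATE INDEX idx_trip_history_owner_access_ts_desc
        ON tripHistory (ownerId, accessTS DESC);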
Re: [PERFORM] Bad performance on simple query
On Monday 17 November 2008, Dimi Paun [EMAIL PROTECTED] wrote:
> it takes 0.091s (!):
>
> perpedes_db=# EXPLAIN ANALYZE SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=0.00..7.99 rows=5 width=106) (actual time=0.024..0.042 rows=5 loops=1)
>    ->  Index Scan Backward using idx_trip_history_owner_access_ts on triphistory  (cost=0.00..142.20 rows=89 width=106) (actual time=0.021..0.034 rows=5 loops=1)
>          Index Cond: (ownerid = 10015)
>  Total runtime: 0.091 ms

That's 0.091 milliseconds (0.000091 seconds), not 0.091 seconds.

--
Corporations will ingest natural resources and defecate garbage until all
resources are depleted, debt can no longer be repaid and our money becomes
worthless - Jay Hanson
Re: [PERFORM] Bad performance on simple query
On Mon, Nov 17, 2008 at 8:53 AM, Dimi Paun [EMAIL PROTECTED] wrote:
> Hi folks,
>
> I have a simple table that keeps track of a user's access history. It has
> a few fields, but the important ones are:
>  - ownerId: the user's ID, an int8
>  - accessTS: the timestamp of the record
>
> The table right now is small, only 1942 records. The user I test with
> (10015) has only 89 entries.
>
> What I want is to get the last 5 accesses of a user:
>
>    SELECT * FROM triphistory
>    WHERE ownerId = 10015
>    ORDER BY accessTS DESC
>    LIMIT 5
>
> If I create a composite index *and* analyze:
>
>    create index IDX_TRIP_HISTORY_OWNER_ACCESS_TS on tripHistory (ownerId, accessTS);
>    ANALYZE triphistory;
>
> it takes 0.091s (!):
>
> perpedes_db=# EXPLAIN ANALYZE SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=0.00..7.99 rows=5 width=106) (actual time=0.024..0.042 rows=5 loops=1)
>    ->  Index Scan Backward using idx_trip_history_owner_access_ts on triphistory  (cost=0.00..142.20 rows=89 width=106) (actual time=0.021..0.034 rows=5 loops=1)
>          Index Cond: (ownerid = 10015)
>  Total runtime: 0.091 ms
> (4 rows)
>
> BTW, this is after several runs of the query; shouldn't all this stuff be in memory?

Are you saying it's excessive that you need the compound index? Because that's running in 91 microseconds, as pointed out by Alan.
Re: [PERFORM] Bad performance on simple query
On Mon, 2008-11-17 at 09:53 -0700, Scott Marlowe wrote:
> Are you saying it's excessive that you need the compound index? Because
> that's running in 91 microseconds, as pointed out by Alan.

Of course, my bad. I read that as 91ms (blush). The confusion came from the fact that pgAdmin III reports the query taking 20-40ms, so I read the 0.091 as seconds, not ms.

--
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.
Re: [PERFORM] Bad performance on simple query
On Mon, Nov 17, 2008 at 10:07 AM, Dimi Paun [EMAIL PROTECTED] wrote:
> On Mon, 2008-11-17 at 09:53 -0700, Scott Marlowe wrote:
>> Are you saying it's excessive that you need the compound index? Because
>> that's running in 91 microseconds, as pointed out by Alan.
>
> Of course, my bad. I read that as 91ms (blush). The confusion came from
> the fact that pgAdmin III reports the query taking 20-40ms, so I read the
> 0.091 as seconds, not ms.

Ahhh. Keep in mind that if you just run the query, pgAdmin III will tell you how long it took to run AND return all the data across the network, so it will definitely take longer then. But most of that is network I/O wait, so it's not a real issue unless you're saturating your network.
Re: [PERFORM] Bad performance on simple query
On Mon, 2008-11-17 at 10:16 -0700, Scott Marlowe wrote:
> Ahhh. Keep in mind that if you just run the query, pgAdmin III will tell
> you how long it took to run AND return all the data across the network,
> so it will definitely take longer then. But most of that is network I/O
> wait, so it's not a real issue unless you're saturating your network.

But that is brutal -- there's no way it can take 20ms for a request across an unloaded network. Moreover, I got something like this:

              pgAdmin III |  psql
   w/o index:        45ms |  0.620ms
   w/  index:        20ms |  0.091ms

However, now when I try to replicate, I get 45ms in both cases. This is very misleading...

--
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.
Re: [PERFORM] Bad performance on simple query
On Mon, Nov 17, 2008 at 10:31 AM, Dimi Paun [EMAIL PROTECTED] wrote:
> On Mon, 2008-11-17 at 10:16 -0700, Scott Marlowe wrote:
>> Ahhh. Keep in mind that if you just run the query, pgAdmin III will tell
>> you how long it took to run AND return all the data across the network,
>> so it will definitely take longer then. But most of that is network I/O
>> wait, so it's not a real issue unless you're saturating your network.
>
> But that is brutal -- there's no way it can take 20ms for a request
> across an unloaded network. Moreover, I got something like this:
>
>               pgAdmin III |  psql
>    w/o index:        45ms |  0.620ms
>    w/  index:        20ms |  0.091ms
>
> However, now when I try to replicate, I get 45ms in both cases. This is
> very misleading...

I'm guessing a fair bit of that time is pgAdmin III prettifying the output for you, etc. I.e. it's not all transfer time. Hard to say without hooking some kind of profiler into pgAdmin III.

Is psql running locally and pgAdmin III remotely? Or are they both remote? If both psql and pgAdmin III are remote (i.e. same basic circumstances) then it's got to be a difference in the client causing the extra time. OR is this the output of explain analyze?
Re: [PERFORM] Bad performance on simple query
On Nov 17, 2008, at 12:40 PM, Scott Marlowe wrote:
> On Mon, Nov 17, 2008 at 10:31 AM, Dimi Paun [EMAIL PROTECTED] wrote:
>> But that is brutal -- there's no way it can take 20ms for a request
>> across an unloaded network. Moreover, I got something like this:
>>
>>               pgAdmin III |  psql
>>    w/o index:        45ms |  0.620ms
>>    w/  index:        20ms |  0.091ms
>>
>> However, now when I try to replicate, I get 45ms in both cases. This is
>> very misleading...
>
> I'm guessing a fair bit of that time is pgAdmin III prettifying the
> output for you, etc. I.e. it's not all transfer time. Hard to say without
> hooking some kind of profiler into pgAdmin III.
>
> Is psql running locally and pgAdmin III remotely? Or are they both
> remote? If both psql and pgAdmin III are remote (i.e. same basic
> circumstances) then it's got to be a difference in the client causing the
> extra time. OR is this the output of explain analyze?

Side note: I have never seen pgAdmin III show a time below 20ms (the time in the bottom right corner). When I do a query like "select 1;" it takes, according to pgAdmin III, around 20ms (whatever that time is). What I normally do to find my real query time is put EXPLAIN ANALYZE in front of my query.

Ries
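Ries' suggestion in practice: comparing psql's \timing with the Total runtime line from EXPLAIN ANALYZE separates client-side and transfer overhead from server-side execution time. A sketch of such a session (output abbreviated; the timings shown are simply the figures already quoted in this thread, not new measurements):

    perpedes_db=# \timing
    Timing is on.
    perpedes_db=# SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5;
     ...
    Time: 0.760 ms                  -- round trip as seen by the client

    perpedes_db=# EXPLAIN ANALYZE SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5;
     ...
     Total runtime: 0.091 ms        -- execution time inside the server
    (4 rows)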
Re: [PERFORM] Bad performance on simple query
On Mon, 2008-11-17 at 10:40 -0700, Scott Marlowe wrote:
> I'm guessing a fair bit of that time is pgAdmin III prettifying the
> output for you, etc. I.e. it's not all transfer time. Hard to say without
> hooking some kind of profiler into pgAdmin III.
>
> Is psql running locally and pgAdmin III remotely? Or are they both
> remote? If both psql and pgAdmin III are remote (i.e. same basic
> circumstances) then it's got to be a difference in the client causing the
> extra time. OR is this the output of explain analyze?

With \timing on I get basically the same output (local vs. remote) in psql (0.668ms vs. 0.760ms). More like it. So WTH is pgAdmin III reporting?!?

--
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.
Re: [PERFORM] Bad performance on simple query
On Mon, Nov 17, 2008 at 6:14 PM, Dimi Paun [EMAIL PROTECTED] wrote:
> With \timing on I get basically the same output (local vs. remote) in
> psql (0.668ms vs. 0.760ms). More like it. So WTH is pgAdmin III
> reporting?!?

Exactly what it's supposed to. However, it's using libpq's asynchronous query interface and has to pass the query result through the wxWidgets event handling system, both of which seem to add a few milliseconds to the overall query time, from the quick testing I've just done. In a GUI app like pgAdmin, we need to use this kind of architecture to allow the UI to continue processing events (such as button clicks, redraws etc.), and to allow multiple windows to work independently without one query locking up the whole app.

Note that the rendering time that Tom mentioned the other day, which used to confuse things, has not been an issue for a couple of years - that was dependent on resultset size and could lead to much bigger variations. It was fixed by having libpq act as a virtual data store for the UI instead of transferring the data from the PGresult to the data grid's own data store.

I think the bottom line is that you cannot compare psql's and pgAdmin's timings because the architectures of the two apps are very different. Further, pgAdmin isn't the best choice for micro-optimisation of extremely fast queries.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com