[PERFORM] response time when querying via JDBC and via psql differs
Hi all, I have a strange performance problem with PostgreSQL (8.1.9). My problem, shortly: I'm using PostgreSQL via the JDBC driver (postgresql-8.1-404.jdbc3.jar) and searching a table with approximately 3,000,000 records. I have created a functional index, table(lower(href) varchar_pattern_ops), for case-insensitive LIKE searching. When I ask the database directly from psql, it returns the result in 0.5 ms, but when I issue the same command via the JDBC driver, it returns in 10,000 ms. Where can the problem be? Any problem with PostgreSQL tuning? The command is:

select df.id as id, df.c as c, df.href as href, df.existing as existing, df.filesize as filesize
from documentfile df
where (lower(href) like 'aba001!_223581.djvu' escape '!')
order by id limit 1

Thank you very much for any help.

Kind regards, Pavel Rotek
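For readers reproducing the setup described above, the index in question would look roughly like this (the table and column names come from the query; the index name is my own invention):

```sql
-- Functional index supporting case-insensitive LIKE searches.
-- varchar_pattern_ops is needed so the planner can use the index for
-- LIKE prefix patterns in non-C locales.
CREATE INDEX documentfile_lower_href_idx
    ON documentfile (lower(href) varchar_pattern_ops);
```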
Re: [PERFORM] Q on views and performance
On Fri, 22 Feb 2008, Kynn Jones wrote:
> Hi. I'm trying to optimize...
> (Q1) SELECT a1.word, a2.word FROM T a1 JOIN T a2 USING ( zipk ) WHERE a1.type = int1 AND a2.type = int2;

Okay, try this: create an index on T(type, zipk), and then CLUSTER on that index. That will effectively group all the data for one type together and sort it by zipk, making a merge join very quick indeed. I'm not sure whether Postgres will notice that, but it's worth a try.

> More specifically, how can I go about building table T and the views Vint?'s to maximize the performance of (Q1)? For example, I'm thinking that if T had an additional id column and were built in such a way that all the records belonging to each Vint? were physically contiguous, and (say) had contiguous values in the id column, then I could define each view like this

The above index and CLUSTER will effectively do this - you don't need to introduce another field.

Alternatively, you could go *really evil* and pre-join the table. Something like this:

CREATE TABLE evilJoin AS
    SELECT a1.type AS type1, a2.type AS type2, a1.zipk,
           a1.word AS word1, a2.word AS word2
    FROM T AS a1, T AS a2
    WHERE a1.zipk = a2.zipk
    ORDER BY a1.type, a2.type, a1.zipk;
CREATE INDEX evilIndex1 ON evilJoin(type1, type2, zipk);

Then your query becomes:

SELECT word1, word2
    FROM evilJoin
    WHERE type1 = int1
      AND type2 = int2

which should run quickly. However, your cache usefulness will be reduced because of the extra volume of data.

Matthew

-- [About NP-completeness] These are the problems that make efficient use of the Fairy Godmother. -- Computer Science Lecturer

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
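In concrete terms, the index-plus-CLUSTER suggestion above amounts to something like the following (the index name is invented; note that the 8.1/8.2 releases discussed in this thread use the older `CLUSTER indexname ON tablename` spelling, while 8.3 introduced `CLUSTER tablename USING indexname`):

```sql
CREATE INDEX t_type_zipk_idx ON T (type, zipk);
CLUSTER t_type_zipk_idx ON T;   -- 8.3+ spelling: CLUSTER T USING t_type_zipk_idx
ANALYZE T;                      -- refresh statistics after the physical reorder
```

CLUSTER rewrites the table in index order, so rows sharing a type land on adjacent pages; it is a one-off reorder, and subsequent inserts are not kept in order until the next CLUSTER.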
Re: [PERFORM] Q on views and performance
So, this email is directed much more towards the Postgres Powers That Be. I came across this problem a while ago, and I haven't checked whether it has been improved.

On Mon, 25 Feb 2008, I wrote:
> > Hi. I'm trying to optimize...
> > (Q1) SELECT a1.word, a2.word FROM T a1 JOIN T a2 USING ( zipk ) WHERE a1.type = int1 AND a2.type = int2;
> Create an index on T(type, zipk), and then CLUSTER on that index. That will effectively group all the data for one type together and sort it by zipk, making a merge join very quick indeed. I'm not sure whether Postgres will notice that, but it's worth a try.

Statistics are generated on fields in a table, and the one I'm interested in is the correlation coefficient, which tells Postgres how costly an index scan sorted on that field would be. This entry is ONLY useful when the result needs to be sorted by that exact field only. For example:

CREATE TABLE test (a int, b int);
-- insert a bazillion entries
CREATE INDEX testIndex ON test(a, b);
CLUSTER testIndex ON test;
ANALYSE;

So now we have a table sorted by (a, b), but the statistics only record the fact that it is sorted by a, and completely unsorted by b. If we run:

SELECT * FROM test ORDER BY a;

then the query will run quickly, doing an index scan. However, if we run:

SELECT * FROM test ORDER BY a, b;

then Postgres will not be able to use the index, because it cannot tell how sequential the fetches from the index will be. The same is true if we run:

SELECT * FROM test WHERE a = something ORDER BY b;

So, these observations were made a long time ago, and I don't know if they have been improved. A while back I suggested a partial sort algorithm that could take a stream sorted by a and turn it into a stream sorted by (a, b) at small cost. That would fix some instances of the problem. However, now I suggest that the statistics are in the wrong place.
At the moment, the correlation coefficient, which is an entry purely designed to indicate how good an index is at index scans, is a statistic on the first field of the index. Why not create a correlation coefficient statistic for the index as a whole instead, and store it elsewhere in the statistics data? That way, instead of having to infer from the first field how correlated an index is, and getting it wrong beyond the first field, you can just look up the correlation for the index.

Opinions?

Matthew

-- If you let your happiness depend upon how somebody else feels about you, now you have to control how somebody else feels about you. -- Abraham Hicks
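The per-column correlation being discussed can be inspected directly; a sketch, using the example table from the message above:

```sql
-- After ANALYZE, pg_stats exposes the per-column correlation the planner
-- uses to cost index scans. Note there is a row for column a and a row
-- for column b individually, but nothing describing the (a, b) index as
-- a whole -- which is exactly the gap Matthew is pointing out.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'test';
```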
Re: [PERFORM] response time when querying via JDBC and via psql differs
The thing to remember here is that prepared statements are only planned once, and straight queries are planned for each query. When you give the query planner some concrete input like in your example, it will happily use the index, because it can check whether the input starts with % or _. If you use JDBC to set up a prepared statement like:

select df.id as id, df.c as c, df.href as href, df.existing as existing, df.filesize as filesize
from documentfile df
where (lower(href) like ? escape '!')
order by id limit 1

then the query planner takes the safe route, as Markus said, and doesn't use the index. I think your best bet is to use connection.createStatement instead of connection.prepareStatement. The gain in query performance will offset the loss in planning overhead. I'm reasonably sure the plans are cached anyway.

--Nik

On Mon, Feb 25, 2008 at 6:10 AM, Markus Bertheau [EMAIL PROTECTED] wrote:
> 2008/2/25, Pavel Rotek [EMAIL PROTECTED]:
> > I have created a functional index table(lower(href) varchar_pattern_ops) for case-insensitive LIKE searching. When I ask the database directly from psql, it returns the result in 0.5 ms, but when I issue the same command via the JDBC driver, it returns in 10,000 ms. Where can the problem be? Any problem with PostgreSQL tuning?
> Most likely the problem is that the JDBC driver uses prepared statements, in which the query is planned without the concrete argument value. For LIKE, only patterns that don't start with % or _ can use the index. Without the argument value PostgreSQL can't tell whether that is the case, so it takes the safe route and chooses a sequential scan.
> To solve this particular problem, you have to convince JDBC not to use a prepared statement for this particular query.
> Markus
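The effect Markus and Nik describe can be reproduced from psql as well, since a server-side prepared statement is essentially what the JDBC driver creates behind the scenes (a sketch; on the 8.1 server in question, the plan is built once, without seeing the parameter value):

```sql
PREPARE q(text) AS
    SELECT df.id, df.c, df.href, df.existing, df.filesize
    FROM documentfile df
    WHERE lower(href) LIKE $1 ESCAPE '!'
    ORDER BY id
    LIMIT 1;

-- EXPLAIN EXECUTE shows the generic plan: a sequential scan, because the
-- planner cannot prove the unknown pattern has no leading % or _.
EXPLAIN EXECUTE q('aba001!_223581.djvu');
```

Compare with `EXPLAIN SELECT ... LIKE 'aba001!_223581.djvu' ESCAPE '!'`, where the literal pattern lets the planner choose the varchar_pattern_ops index.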
Re: [PERFORM] Weird issue with planner choosing seq scan
On Feb 24, 2008, at 4:27 PM, Scott Marlowe wrote:
> On Sun, Feb 24, 2008 at 6:05 PM, Sean Leach [EMAIL PROTECTED] wrote:
> > On Feb 24, 2008, at 4:03 PM, Scott Marlowe wrote:
> > > What version pgsql is this? If it's pre 8.0 it might be worth looking into migrating for performance and maintenance reasons.
> > It's the latest 8.3.0 release :(
> Urg. Then I wonder how your indexes are bloating but your table is not... you got autovac running? No weird lock issues? It's a side issue right now since the table is showing as non-bloated (unless you've got a long running transaction and that number is WAY off from your vacuum)

Autovac is running, but probably not tuned. I am looking at upping my max_fsm_pages setting as vacuum says, but I'm not sure which value to use (all the posts on the web refer to what looks like an old vacuum output format). Is this the line to look at?

INFO: u_counts: found 0 removable, 6214708 nonremovable row versions in 382344 pages
DETAIL: 2085075 dead row versions cannot be removed yet.

I.e. I need 382344 max_fsm_pages? No weird lock issues that we have seen. So should I do a vacuum full and then hope this doesn't happen again? Or should I run a VACUUM FULL after each aggregation run?

Thanks!
Sean
Re: [PERFORM] Weird issue with planner choosing seq scan
On Sun, 24 Feb 2008, Tom Lane wrote:
> Sean Leach [EMAIL PROTECTED] writes:
> > I have a table that currently has a little over 3 million records in production. In development, the same table has about 10 million records (we cleaned production a few weeks ago).
> You mean the other way around, to judge by the rowcounts from EXPLAIN.
> > - Index Scan using u_counts_i2 on u_counts c (cost=0.00..53.53 rows=1082 width=4) (actual time=0.277..1224.582 rows=392173 loops=1)
> I kinda think the devel system wouldn't be using an indexscan either if it had up-to-date ANALYZE statistics. But even with the 1082 row estimate that seems a remarkably low cost estimate.

Seems pretty obvious to me. The table is obviously going to be well ordered by the timestamp, if that's the time that the entries are inserted into the table. So the index is going to have a very good correlation with the order of the table, which is why the estimated cost for the index scan is so low.

The production table will be more active than the development table, so the entries in it will be more recent. The entries that were cleaned out a while ago are all irrelevant, because they will be old ones, and we are specifically searching for new entries. Because the production table is more active, even though it is smaller, the results of the search will be bigger (as seen in the explain analyse results), pushing it over the limit and making a sequential scan more attractive.

Matthew

-- Failure is not an option. It comes bundled with your Microsoft product. -- Ferenc Mantfeld
Re: [PERFORM] Q on views and performance
On Mon, Feb 25, 2008 at 8:45 AM, Matthew [EMAIL PROTECTED] wrote: On Fri, 22 Feb 2008, Kynn Jones wrote: Hi. I'm trying to optimize... (Q1) SELECT a1.word, a2.word FROM T a1 JOIN T a2 USING ( zipk ) WHERE a1.type = int1 AND a2.type = int2; Okay, try this: Create an index on T(type, zipk), and then CLUSTER on that index... This is just GREAT!!! It fits the problem to a tee. Many, many thanks! Also, including zipk in the index is a really nice extra boost. (If you hadn't mentioned it I would have just settled for clustering only on type...) Thanks for that also! Kynn
Re: [PERFORM] Q on views and performance
On Mon, 25 Feb 2008, Kynn Jones wrote:
> This is just GREAT!!! It fits the problem to a tee.

It makes the queries quick then?

Matthew

-- The only secure computer is one that's unplugged, locked in a safe, and buried 20 feet under the ground in a secret location... and I'm not even too sure about that one. --Dennis Huges, FBI
Re: [PERFORM] LISTEN / NOTIFY performance in 8.3
Joel Stevenson [EMAIL PROTECTED] writes:
> > Also, it might be worth enabling log_lock_waits to see if the slow notifies are due to having to wait on some lock or other.
> Turning on log_lock_waits shows that there is a lot of waiting for locks on the pg_listener table ala:

Interesting. The LISTEN/NOTIFY mechanism itself takes ExclusiveLock on pg_listener, but never for very long at a time (assuming pg_listener doesn't get horribly bloated, which we know isn't happening for you). Another thought that comes to mind is that maybe the delays you see come from these lock acquisitions getting blocked behind autovacuums of pg_listener. I did not see that while trying to replicate your problem, but maybe the issue requires more update load on pg_listener than the test script can create by itself, or maybe some nondefault autovacuum setting is needed --- what are you using?

regards, tom lane
Re: [PERFORM] LISTEN / NOTIFY performance in 8.3
> Also, it might be worth enabling log_lock_waits to see if the slow notifies are due to having to wait on some lock or other.

Turning on log_lock_waits shows that there is a lot of waiting for locks on the pg_listener table, ala:

process 22791 still waiting for ExclusiveLock on relation 2614 of database 16387 after 992.397 ms
...
process 22791 acquired ExclusiveLock on relation 2614 of database 16387 after 1433.152 ms

deadlock_timeout is left at the default 1s setting. These entries are being generated during application activity - running the simulation script does produce 300ms - 600ms NOTIFY statements but doesn't (at the moment) trigger a lock_wait log entry.
Re: [PERFORM] LISTEN / NOTIFY performance in 8.3
At 2:57 PM -0500 2/25/08, Tom Lane wrote:
> It's weird that the behavior is robust for you but I can't make it happen at all. Would you show the output of pg_config, as well as all your nondefault postgresql.conf settings?

pg_config:

BINDIR = /usr/local/pgsql/bin
DOCDIR = /usr/local/pgsql/doc
INCLUDEDIR = /usr/local/pgsql/include
PKGINCLUDEDIR = /usr/local/pgsql/include
INCLUDEDIR-SERVER = /usr/local/pgsql/include/server
LIBDIR = /usr/local/pgsql/lib
PKGLIBDIR = /usr/local/pgsql/lib
LOCALEDIR =
MANDIR = /usr/local/pgsql/man
SHAREDIR = /usr/local/pgsql/share
SYSCONFDIR = /usr/local/pgsql/etc
PGXS = /usr/local/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = 'CFLAGS=-O2 -pipe' '--with-openssl' '--enable-thread-safety' '--with-includes=/usr/kerberos/include' '--with-perl'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/kerberos/include
CFLAGS = -O2 -pipe -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -fno-strict-aliasing
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/usr/local/pgsql/lib'
LDFLAGS_SL =
LIBS = -lpgport -lssl -lcrypto -lz -lreadline -ltermcap -lcrypt -ldl -lm
VERSION = PostgreSQL 8.3.0

Non-default postgresql.conf settings:

max_connections = 80
ssl = on
shared_buffers = 1GB
work_mem = 100MB
maintenance_work_mem = 100MB
max_fsm_pages = 204800
vacuum_cost_delay = 100
wal_buffers = 124kB
wal_writer_delay = 200ms
commit_delay = 100
checkpoint_segments = 6
effective_cache_size = 6GB
[PERFORM] when is a DELETE FK trigger planned?
I have a cascading delete trigger that is obviously using a seqscan. (EXPLAIN ANALYZE shows that trigger as taking over 1000s while all other triggers are 1s. The value in the test delete didn't even appear in this child table, so an index scan would have been almost instant.) If I do

DELETE FROM child_table WHERE fkey = value;

I get an index scan. Why doesn't the trigger do that, and how can I force it to re-plan?
Re: [PERFORM] Weird issue with planner choosing seq scan
Sean Leach wrote:
> On Feb 24, 2008, at 4:27 PM, Scott Marlowe wrote:
> > Urg. Then I wonder how your indexes are bloating but your table is not... you got autovac running? No weird lock issues? It's a side issue right now since the table is showing as non-bloated (unless you've got a long running transaction and that number is WAY off from your vacuum)
> Autovac is running, but probably not tuned. I am looking at upping my max_fsm_pages setting as vacuum says, but I'm not sure which value to use (all the posts on the web refer to what looks like an old vacuum output format). Is this the line to look at?
> INFO: u_counts: found 0 removable, 6214708 nonremovable row versions in 382344 pages
> DETAIL: 2085075 dead row versions cannot be removed yet.
> I.e. I need 382344 max_fsm_pages? No weird lock issues that we have seen.

I think the hint and warning are referring to this line:

281727 pages contain useful free space.

But you're likely to have other relations in your database that have useful free space too. What this warning is saying is that at least some of the useful free space in that table will not be re-used for new rows or row versions, because it is impossible for the free space map to have references to all of the pages with usable space, since it is too small to hold that much information.

> So should I do a vacuum full and then hope this doesn't happen again? Or should I run a VACUUM FULL after each aggregation run?

If your usage pattern results in generating all of that unused space in one transaction, and no further inserts or updates to that table till the next time you run the same process, then my guess is that you probably should run a vacuum full on that table after each aggregation run. In that case you wouldn't have to increase max_fsm_pages solely to keep track of a large amount of unused space in that table, since you're cleaning it up as soon as you're generating it.

You earlier had 5.5 million row versions, 2 million of them dead but not yet removable, and you said (even earlier) that the table had 3.3 million rows in it. You now say you've got 6.2 million row versions (with the same 2M dead). So it looks like you're creating new row versions at quite a pace, in which case increasing max_fsm_pages, and not worrying about doing a vacuum full _every_ time, is probably a good idea.

Have you checked Scott Marlowe's note:
> unless you've got a long running transaction

How come those 2 million dead rows are not removable yet? My guess (based on a quick search of the mailing lists) would be that they were generated from your aggregation run, and that a long running transaction started before your aggregation run committed (possibly even before it started), and that transaction is still alive. Alternatively, it may be a different 2 million dead row versions now than earlier, and may simply be a side effect of your particular usage, and nothing to worry about. (Though it is exactly the same number of rows, which strongly hints at being exactly the same rows.)

Regards, Stephen Denne.
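For anyone following along, the numbers in this exchange come from VACUUM VERBOSE, and the knob being discussed lives in postgresql.conf. A sketch of how to gather them (this applies to the 8.x releases in this thread; the manually sized free space map was removed in 8.4):

```sql
VACUUM VERBOSE;       -- the final lines report total pages with free space
                      -- versus the current max_fsm_pages setting
SHOW max_fsm_pages;   -- current value; changing it requires a server restart
-- then, in postgresql.conf, something like (value is illustrative only):
--   max_fsm_pages = 500000   # must cover free-space pages across ALL relations
```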
Re: [PERFORM] Weird issue with planner choosing seq scan
On Feb 25, 2008, at 1:19 PM, Stephen Denne wrote:
> > So should I do a vacuum full and then hope this doesn't happen again? Or should I run a VACUUM FULL after each aggregation run?
> If your usage pattern results in generating all of that unused space in one transaction, and no further inserts or updates to that table till the next time you run the same process, then my guess is that you probably should run a vacuum full on that table after each aggregation run. In that case you wouldn't have to increase max_fsm_pages solely to keep track of a large amount of unused space in that table, since you're cleaning it up as soon as you're generating it.
> You earlier had 5.5 million row versions, 2 million of them dead but not yet removable, and you said (even earlier) that the table had 3.3 million rows in it. You now say you've got 6.2 million row versions (with the same 2M dead). So it looks like you're creating new row versions at quite a pace, in which case increasing max_fsm_pages, and not worrying about doing a vacuum full _every_ time, is probably a good idea.

So 281727 should be the minimum I bump it to, correct?

> Have you checked Scott Marlowe's note:
> > unless you've got a long running transaction
> How come those 2 million dead rows are not removable yet? My guess (based on a quick search of the mailing lists) would be that they were generated from your aggregation run, and that a long running transaction started before your aggregation run committed (possibly even before it started), and that transaction is still alive. Alternatively, it may be a different 2 million dead row versions now than earlier, and may simply be a side effect of your particular usage, and nothing to worry about. (Though it is exactly the same number of rows, which strongly hints at being exactly the same rows.)

Great detective work, you are correct. We have a daemon that runs and is constantly adding new data to that table, then we aggregate it daily (I said weekly before, I was incorrect) - which deletes several rows as it updates a bunch of others. So it sounds like upping max_fsm_pages is the best option.

Thanks again everyone!
Re: [PERFORM] Weird issue with planner choosing seq scan
Sean Leach wrote:
> On Feb 25, 2008, at 1:19 PM, Stephen Denne wrote:
> > If your usage pattern results in generating all of that unused space in one transaction, and no further inserts or updates to that table till the next time you run the same process, then my guess is that you probably should run a vacuum full on that table after each aggregation run. In that case you wouldn't have to increase max_fsm_pages solely to keep track of a large amount of unused space in that table, since you're cleaning it up as soon as you're generating it.
> > You earlier had 5.5 million row versions, 2 million of them dead but not yet removable, and you said (even earlier) that the table had 3.3 million rows in it. You now say you've got 6.2 million row versions (with the same 2M dead). So it looks like you're creating new row versions at quite a pace, in which case increasing max_fsm_pages, and not worrying about doing a vacuum full _every_ time, is probably a good idea.
> So 281727 should be the minimum I bump it to, correct?

Please know that I'm very new at advising PostgreSQL users how they should tune their system... My understanding of your vacuum verbose output was that it was pointing out that max_fsm_pages was currently smaller than 281727, so therefore there was no way it could contain mappings to all the reusable space. However, I don't think it is hinting at, nor recommending, a value that you should be using.

If you do nothing, then this number of pages with reusable space will probably continue to grow; therefore, it probably has been growing. So, for example, if your max_fsm_pages is currently only 20000, then perhaps 20000 of the 281727 pages with reusable space are in the free space map. The remaining 261727 pages _may_ have been generated through 20 different processes, each of which created 13000 more pages with reusable space than the map could reference. If that was the case, then a max_fsm_pages of 33000 might be large enough. Do you see what I'm getting at?

I think that you should do a vacuum full of that table once, then monitor the number of pages in it with reusable space for a while (over a few iterations of your regular processes). That should give you information about how much larger your max_fsm_pages should be than it currently is.

Regards, Stephen Denne.
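Stephen's suggestion translates to roughly this sequence (the table name comes from earlier in the thread; note that VACUUM FULL takes an exclusive lock on the table, so it should be scheduled outside busy periods):

```sql
VACUUM FULL u_counts;      -- one-off: compact the table and reclaim the space
-- then, after each aggregation run, watch how much reusable space accumulates:
VACUUM VERBOSE u_counts;   -- note the "pages contain useful free space" line
```

The growth of that figure between runs indicates how much headroom max_fsm_pages needs beyond its current value.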
Re: [PERFORM] Weird issue with planner choosing seq scan
On Feb 25, 2008, at 2:59 PM, Stephen Denne wrote:
> Please know that I'm very new at advising PostgreSQL users how they should tune their system...

I'd never have known it if you hadn't said anything

> My understanding of your vacuum verbose output was that it was pointing out that max_fsm_pages was currently smaller than 281727, so therefore there was no way it could contain mappings to all the reusable space. However, I don't think it is hinting at, nor recommending, a value that you should be using.
> If you do nothing, then this number of pages with reusable space will probably continue to grow; therefore, it probably has been growing. So, for example, if your max_fsm_pages is currently only 20000, then perhaps 20000 of the 281727 pages with reusable space are in the free space map. The remaining 261727 pages _may_ have been generated through 20 different processes, each of which created 13000 more pages with reusable space than the map could reference. If that was the case, then a max_fsm_pages of 33000 might be large enough. Do you see what I'm getting at?
> I think that you should do a vacuum full of that table once, then monitor the number of pages in it with reusable space for a while (over a few iterations of your regular processes). That should give you information about how much larger your max_fsm_pages should be than it currently is.

This sounds sane to me, will do. Thanks again!