Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(
On 17/01/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Try a), b), and c) in order on the offending tables as they address the problem at increasing cost... Thanks a lot for the detailed information! The entire concept of vacuum isn't yet that clear to me, so your explanations and hints are very much appreciated. I'll definitely try these steps this weekend, when the next full vacuum is scheduled :-) Thanks guys, that pretty much answered my question(s) too. I have a sneaking suspicion that vacuuming won't do too much for us, however... now that I think about it, we do very little removing, pretty much only inserts and selects. I will give it a vacuum full and see what happens. Cheers, Antoine -- This is where I should put some witty comment.
[PERFORM] Autovacuum / full vacuum
hi, I'm curious as to why autovacuum is not designed to do full vacuum. I know that the necessity of doing full vacuums can be reduced by increasing the FSM, but in my opinion that is the wrong decision for many applications. My application does not continuously insert/update/delete tuples at a constant rate. Basically there are long periods of relatively few modifications and short bursts of high activity. Increasing the FSM so that even during these bursts most space would be reused would mean reducing the available memory for all other database tasks. So my question is: What's the use of an autovacuum daemon if I still have to use a cron job to do full vacuums? Wouldn't it just be a minor job to enhance autovacuum to be able to perform full vacuums, if one really wants it to do that - even if some developers think that it's the wrong approach? Mike ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Autovacuum / full vacuum
So my question is: What's the use of an autovacuum daemon if I still have to use a cron job to do full vacuums? wouldn't it just be a minor job to enhance autovacuum to be able to perform full vacuums, if one really wants it to do that - even if some developers think that it's the wrong approach? You should never have to do full vacuums... Chris
Re: [PERFORM] Autovacuum / full vacuum
Hi, did you read my post? In the first part I explained why I don't want to increase the FSM that much. Mike So my question is: What's the use of an autovacuum daemon if I still have to use a cron job to do full vacuums? wouldn't it just be a minor job to enhance autovacuum to be able to perform full vacuums, if one really wants it to do that - even if some developers think that it's the wrong approach? You should never have to do full vacuums... Chris
[PERFORM] Use of Stored Procedures and
Hi, I have already read the documentation on using SPI_PREPARE and SPI_EXEC..., but honestly I don't understand how to use them in my statements. I looked for examples, but I didn't find good ones :(.. Can somebody help me? Thanks. Marcos.
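[For what it's worth, here is a minimal sketch of the SPI calls being asked about. The function name `count_mytable` and the table `mytable` are hypothetical, error handling is trimmed, and this only compiles against the PostgreSQL server headers as a loadable C module, not as a standalone program.]

```c
#include "postgres.h"
#include "executor/spi.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(count_mytable);

Datum
count_mytable(PG_FUNCTION_ARGS)
{
    int32 processed = 0;

    /* Open an SPI session; every SPI_* call must happen between
     * SPI_connect() and SPI_finish(). */
    if (SPI_connect() != SPI_OK_CONNECT)
        elog(ERROR, "SPI_connect failed");

    /* SPI_exec parses, plans and runs a query string in one step;
     * SPI_prepare / SPI_execp split those phases so the plan can be
     * reused across calls. */
    if (SPI_exec("SELECT 1 FROM mytable", 0) == SPI_OK_SELECT)
        processed = (int32) SPI_processed;

    SPI_finish();

    PG_RETURN_INT32(processed);
}
```

After building the module, the function would be declared in SQL with CREATE FUNCTION ... AS 'modulename', 'count_mytable' LANGUAGE C.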
Re: [PERFORM] Autovacuum / full vacuum
You should never have to do full vacuums... I would rather say: you should never have to do full vacuums by any periodic means. It may be done on an ad-hoc basis, when you have figured out that your table is never going to grow that big again. On 1/17/06, Christopher Kings-Lynne [EMAIL PROTECTED] wrote: So my question is: What's the use of an autovacuum daemon if I still have to use a cron job to do full vacuums? wouldn't it just be a minor job to enhance autovacuum to be able to perform full vacuums, if one really wants it to do that - even if some developers think that it's the wrong approach? You should never have to do full vacuums... Chris
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 11:33:02AM +0100, Michael Riess wrote: did you read my post? In the first part I explained why I don't want to increase the FSM that much. Since you didn't quantify it, that wasn't much of a data point. (IOW, you'd generally have to be seriously resource constrained before the FSM would be a significant source of memory consumption--in which case more RAM would probably be a much better solution than screwing with autovacuum.) Mike Stone
Re: [PERFORM] Autovacuum / full vacuum
Michael Riess wrote: hi, I'm curious as to why autovacuum is not designed to do full vacuum. Because a VACUUM FULL is too invasive. Lazy vacuum is so light on the system w.r.t. locks that it's generally not a problem to start one at any time. On the contrary, vacuum full could be a disaster in some situations. What's more, in general a lazy vacuum is enough to keep the dead space within manageability, given a good autovacuum configuration and good FSM configuration, so there's mostly no need for full vacuum. (This is the theory at least.) For the situations where there is a need, we tell you to issue it manually. So my question is: What's the use of an autovacuum daemon if I still have to use a cron job to do full vacuums? wouldn't it just be a minor job to enhance autovacuum to be able to perform full vacuums, if one really wants it to do that - even if some developers think that it's the wrong approach? Yes, it is a minor job to enhance it to perform vacuum full. The problem is having a good approach to determining _when_ to issue a full vacuum, and having a way to completely disallow it. If you want to do the development work, be my guest (but let us know your design first). If you don't, I guess you would have to wait until it comes high enough on someone's to-do list, maybe because you convinced him (or her, but we don't have Postgres-ladies at the moment AFAIK) monetarily or something. You can, of course, produce a patch and use it internally. This is free software, remember. -- Alvaro Herrera Developer, http://www.PostgreSQL.org God is real, unless declared as int
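[For readers landing here from the archives: in 8.1 the integrated autovacuum daemon is off by default and depends on the row-level statistics collector. A sketch of the relevant postgresql.conf lines, with illustrative values rather than tuning advice:]

```
# postgresql.conf (8.1): autovacuum needs row-level statistics
stats_start_collector = on
stats_row_level = on

autovacuum = on
autovacuum_naptime = 60                 # seconds between checks
autovacuum_vacuum_scale_factor = 0.4    # fraction of a table's rows changed
                                        # before a lazy VACUUM is triggered
```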
Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(
On Tue, Jan 17, 2006 at 09:14:27AM +0100, Antoine wrote: think about it - we do very little removing, pretty much only inserts and selects. I will give it a vacuum full and see what happens. UPDATES? Remember that, in Postgres, UPDATE is effectively DELETE + INSERT (from the point of view of storage, not the point of view of the user). A -- Andrew Sullivan | [EMAIL PROTECTED] The plural of anecdote is not data. --Roger Brinner
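[To make Andrew's point concrete, a sketch with a hypothetical table `t`: each UPDATE leaves the previous row version behind as dead space that only VACUUM reclaims.]

```sql
CREATE TABLE t (id int PRIMARY KEY, val text);
INSERT INTO t VALUES (1, 'a');

-- MVCC: this writes a new row version; the old one becomes a dead tuple.
UPDATE t SET val = 'b' WHERE id = 1;

-- Lazy vacuum records the dead tuple's space in the free space map
-- so later inserts/updates can reuse it.
VACUUM VERBOSE t;
```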
Re: [PERFORM] new to postgres (and db management) and performance
Hi, Tom, Tom Lane wrote: in our db system (for a website), i notice performance boosts after a vacuum full. but then, a VACUUM FULL takes 50min+ during which the db is not really accessible to web-users. is there another way to perform maintenance tasks AND leave the db fully operable and accessible? You're not doing regular vacuums often enough. It may also help to increase the max_fsm_pages setting, so postmaster has more memory to remember freed pages between VACUUMs. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Re: [PERFORM] new to postgres (and db management) and performance
Hi, Thomas, [EMAIL PROTECTED] wrote: Try a), b), and c) in order on the offending tables as they address the problem at increasing cost... Thanks a lot for the detailed information! The entire concept of vacuum isn't yet that clear to me, so your explanations and hints are very much appreciated. I'll definitely try these steps this weekend when the next full vacuum is scheduled :-) Basically, VACUUM scans the whole table and looks for pages containing garbage rows (or row versions), deletes the garbage, and adds those pages to the free space map (if there are free slots). When allocating new rows / row versions, PostgreSQL first tries to fit them in pages from the free space map before allocating new pages. This is why a high max_fsm_pages setting can help when VACUUM frequency is low. VACUUM FULL additionally moves rows between pages, trying to concentrate all the free space at the end of the tables (aka defragmentation), so it can then truncate the files and release the space to the filesystem. CLUSTER basically rebuilds the tables by copying all rows into a new table, in index order, and then dropping the old table, which also reduces fragmentation, though not as strongly as VACUUM FULL might. ANALYZE creates statistics about the distribution of values in a column, allowing the query optimizer to estimate the selectivity of query criteria. (This explanation is rather simplified, and ignores indices as well as the fact that a table can consist of multiple files. Also, I believe that newer PostgreSQL versions allow VACUUM to truncate files when free pages happen to appear at the very end of the file.) HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
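[A sketch of the four operations Markus describes, on a hypothetical table `mytable`; the CLUSTER form shown is the 8.1-era syntax taking an index name.]

```sql
VACUUM mytable;                   -- lazy: reclaims dead rows into the FSM,
                                  -- no exclusive lock
VACUUM FULL mytable;              -- compacts pages and truncates the file;
                                  -- exclusive lock for the duration
CLUSTER mytable_pkey ON mytable;  -- rewrites the table in index order
ANALYZE mytable;                  -- refreshes planner statistics only
```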
Re: [PERFORM] Autovacuum / full vacuum
I'm curious as to why autovacuum is not designed to do full vacuum. Because that's terribly invasive due to the locks it takes out. Lazy vacuum may chew some I/O, but it does *not* block your application for the duration. VACUUM FULL blocks the application. That is NOT something that anyone wants to throw into the activity mix randomly. -- let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];; http://linuxdatabases.info/info/slony.html Signs of a Klingon Programmer #11: This machine is a piece of GAGH! I need dual Pentium processors if I am to do battle with this code!
Re: [PERFORM] Autovacuum / full vacuum
VACUUM FULL blocks the application. That is NOT something that anyone wants to throw into the activity mix randomly. There must be a way to implement a daemon which frees up space of a relation without blocking it too long. It could abort after a certain number of blocks have been freed and then move to the next relation.
Re: [PERFORM] Autovacuum / full vacuum
Well, I think that the documentation is not exactly easy to understand. I always wondered why there are no examples for common postgresql configurations. All I know is that the default configuration seems to be too low for production use. And while running postgres I get no hints as to which setting needs to be increased to improve performance. I have no chance to see if my FSM settings are too low other than to run vacuum full verbose in psql, pipe the result to a text file and grep for some words to get a somewhat comprehensive idea of how much unused space there is in my system. Don't get me wrong - I really like PostgreSQL and it works well in my application. But somehow I feel that it might run much better ... about the FSM: You say that increasing the FSM is fairly cheap - how should I know that? did you read my post? In the first part I explained why I don't want to increase the FSM that much. No, you didn't. You explained *that* you thought you didn't want to increase the FSM. You didn't explain why. FSM expansion comes fairly cheap ...
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote: hi, I'm curious as to why autovacuum is not designed to do full vacuum. I Because nothing that runs automatically should ever take an exclusive lock on the entire database, which is what VACUUM FULL does. activity. Increasing the FSM so that even during these bursts most space would be reused would mean to reduce the available memory for all other database tasks. I don't believe the hit is enough that you should even notice it. You'd have to post some pretty incredible use cases to show that the tiny loss of memory to FSM is worth (a) an exclusive lock and (b) the loss of efficiency you get from having some preallocated pages in tables. A -- Andrew Sullivan | [EMAIL PROTECTED] The fact that technology doesn't work is no bar to success in the marketplace. --Philip Greenspun
Re: [PERFORM] Autovacuum / full vacuum
Hi, hi, I'm curious as to why autovacuum is not designed to do full vacuum. I Because nothing that runs automatically should ever take an exclusive lock on the entire database, which is what VACUUM FULL does. I thought that vacuum full only locks the table which it currently operates on? I'm pretty sure that once a table has been vacuumed, it can be accessed without any restrictions while the vacuum process works on the next table. activity. Increasing the FSM so that even during these bursts most space would be reused would mean to reduce the available memory for all other database tasks. I don't believe the hit is enough that you should even notice it. You'd have to post some pretty incredible use cases to show that the tiny loss of memory to FSM is worth (a) an exclusive lock and (b) the loss of efficiency you get from having some preallocated pages in tables. I have 5000 tables and a workstation with 1 GB RAM which hosts an Apache Web Server, Tomcat Servlet Container and PostgreSQL. RAM is not something that I have plenty of ... and the hardware is fixed and cannot be changed.
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 03:50:38PM +0100, Michael Riess wrote: about the FSM: You say that increasing the FSM is fairly cheap - how should I know that? Why would you assume otherwise, to the point of not considering changing the setting? The documentation explains how much memory is used for FSM entries. If you look at vacuum verbose output it will tell you how much memory you're currently using for the FSM. Mike Stone
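[A sketch of that check: the trailing INFO lines of a database-wide VACUUM VERBOSE summarize current free-space-map usage against max_fsm_pages and max_fsm_relations, though the exact wording varies by version.]

```sql
-- Current limits:
SHOW max_fsm_pages;
SHOW max_fsm_relations;

-- A plain (lazy) database-wide vacuum; its final INFO lines report how
-- much of the free space map is in use -- no VACUUM FULL required.
VACUUM VERBOSE;
```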
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 03:50:38PM +0100, Michael Riess wrote: always wondered why there are no examples for common postgresql configurations. You mean like this one? (for 8.0): http://www.powerpostgresql.com/Downloads/annotated_conf_80.html All I know is that the default configuration seems to be too low for production use. Define production use. It may be too low for you. chance to see if my FSM settings are too low other than to run vacuum full verbose in psql, pipe the result to a text file and grep for some Not true. You don't need a FULL on there to figure this out. about the FSM: You say that increasing the FSM is fairly cheap - how should I know that? Do the math. The docs say this: --snip--- max_fsm_pages (integer) Sets the maximum number of disk pages for which free space will be tracked in the shared free-space map. Six bytes of shared memory are consumed for each page slot. This setting must be more than 16 * max_fsm_relations. The default is 20000. This option can only be set at server start. max_fsm_relations (integer) Sets the maximum number of relations (tables and indexes) for which free space will be tracked in the shared free-space map. Roughly seventy bytes of shared memory are consumed for each slot. The default is 1000. This option can only be set at server start. ---snip--- So by default, you have 6 B * 20,000 = 120,000 bytes for the FSM pages. By default, you have 70 B * 1,000 = 70,000 bytes for the FSM relations. Now, there are two knobs. One of them tracks the number of relations. How many relations do you have? Count the number of indexes and tables you have, and give yourself some headroom in case you add some more, and poof, you have your number for the relations. Now all you need to do is figure out what your churn rate is on tables, and count up how many disk pages that's likely to be. Give yourself a little headroom, and the number of FSM pages is done, too.
This churn rate is often tough to estimate, though, so you may have to fiddle with it from time to time. A -- Andrew Sullivan | [EMAIL PROTECTED] I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin
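[Andrew's arithmetic can be wrapped up in a few lines; the 6-byte and 70-byte per-slot costs are the ones quoted from the docs above.]

```python
def fsm_memory_bytes(max_fsm_pages: int, max_fsm_relations: int) -> int:
    """Shared memory consumed by the free space map, per the docs:
    6 bytes per page slot, ~70 bytes per relation slot."""
    return 6 * max_fsm_pages + 70 * max_fsm_relations

# The defaults discussed in this thread:
print(fsm_memory_bytes(20_000, 1_000))      # 190000 bytes, i.e. under 200 kB
# Even a much larger FSM is cheap:
print(fsm_memory_bytes(1_000_000, 5_000))   # 6350000 bytes, about 6 MB
```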
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 09:09:02AM -0500, Matthew T. O'Connor wrote: vacuum. As long as that percentage is small enough, the effect on performance is negligible. Have you measured to see if things are truly Actually, as long as the percentage is small enough and the pages are really empty, the performance effect is positive. If you have a VACUUM FULLed table, inserts have to extend the table before inserting, whereas in a table with some space reclaimed, the I/O effect of having to allocate another disk page is already done. A -- Andrew Sullivan | [EMAIL PROTECTED] When my information changes, I alter my conclusions. What do you do sir? --attr. John Maynard Keynes
Re: [PERFORM] Autovacuum / full vacuum
On 1/17/06, Michael Riess [EMAIL PROTECTED] wrote: about the FSM: You say that increasing the FSM is fairly cheap - how should I know that? The comment from the original postgresql.conf file seems pretty obvious: #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000 # min 100, ~70 bytes each basically setting max_fsm_pages to 1000000 consumes 6 megabytes, and i definitely doubt you will ever hit that high. depesz
Re: [PERFORM] Autovacuum / full vacuum
Michael Riess [EMAIL PROTECTED] writes: I'm curious as to why autovacuum is not designed to do full vacuum. Locking considerations. VACUUM FULL takes an exclusive lock, which blocks any foreground transactions that want to touch the table --- so it's really not the sort of thing you want being launched at unpredictable times. regards, tom lane
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 04:04:41PM +0100, Michael Riess wrote: I thought that vacuum full only locks the table which it currently operates on? I'm pretty sure that once a table has been vacuumed, it can be accessed without any restrictions while the vacuum process works on the next table. Yes, I think the way I phrased it was unfortunate. But if you issue VACUUM FULL you'll get an exclusive lock on everything, although not all at the same time. But of course, if your query load is like this BEGIN; SELECT from t1, t2 where t1.col1 = t2.col2; [application logic] UPDATE t3 . . . COMMIT; you'll find yourself blocked in the first statement on both t1 and t2; and then on t3 as well. You sure don't want that to happen automagically, in the middle of your business day. I have 5000 tables and a workstation with 1 GB RAM which hosts an Apache Web Server, Tomcat Servlet Container and PostgreSQL. RAM is not something that I have plenty of ... and the hardware is fixed and cannot be changed. I see. Well, I humbly submit that your problem is not the design of the PostgreSQL server, then. The hardware is fixed and cannot be changed, is the first optimisation I'd make. Heck, I gave away a box to charity only two weeks ago that would solve your problem better than automatically issuing VACUUM FULL. A -- Andrew Sullivan | [EMAIL PROTECTED] Information security isn't a technological problem. It's an economics problem. --Bruce Schneier
Re: [PERFORM] Autovacuum / full vacuum
Michael Riess [EMAIL PROTECTED] writes: But actually I never understood why the database system slows down at all when there is much unused space in the files. Perhaps some of your common queries are doing sequential scans? Those would visit the empty pages as well as the full ones. regards, tom lane
Re: [PERFORM] Autovacuum / full vacuum
On Tue, 2006-01-17 at 09:08, Andrew Sullivan wrote: On Tue, Jan 17, 2006 at 03:50:38PM +0100, Michael Riess wrote: always wondered why there are no examples for common postgresql configurations. You mean like this one? (for 8.0): http://www.powerpostgresql.com/Downloads/annotated_conf_80.html I have to admit, looking at the documentation, that we really don't explain this all that well in the administration section, and I can see how easily led astray beginners are. I think it's time I joined the pgsql-docs mailing list...
Re: [PERFORM] Autovacuum / full vacuum
[EMAIL PROTECTED] (Andrew Sullivan) writes: On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote: hi, I'm curious as to why autovacuum is not designed to do full vacuum. I Because nothing that runs automatically should ever take an exclusive lock on the entire database, which is what VACUUM FULL does. That's a bit more than what autovacuum would probably do... autovacuum does things table by table, so that what would be locked should just be one table. Even so, I'd not be keen on having anything that runs automatically take an exclusive lock on even as much as a table. activity. Increasing the FSM so that even during these bursts most space would be reused would mean to reduce the available memory for all other database tasks. I don't believe the hit is enough that you should even notice it. You'd have to post some pretty incredible use cases to show that the tiny loss of memory to FSM is worth (a) an exclusive lock and (b) the loss of efficiency you get from having some preallocated pages in tables. There is *a* case for setting up full vacuums of *some* objects. If you have a table whose tuples all get modified in the course of some common query, that will lead to a pretty conspicuous bloating of *that table.* Even with a big FSM, the pattern of how updates take place will lead to that table having ~50% of its space being dead/free, which is way higher than the desirable stable proportion of 10-15%. For that sort of table, it may be attractive to run VACUUM FULL on a regular basis. Of course, it may also be attractive to try to come up with an update process that won't kill the whole table's contents at once ;-). -- let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];; http://cbbrowne.com/info/x.html As long as each individual is facing the TV tube alone, formal freedom poses no threat to privilege. 
--Noam Chomsky
[PERFORM] sum of left join greater than its parts
8.1.1, everything vacuumed/analyzed. Basically I have two queries that when executed individually run quite quickly, but if I try to left join the second query onto the first, everything gets quite a bit slower.

rms=# explain analyze
SELECT
    software_download.*
FROM
    (
        SELECT
            host_id, max(mtime) as mtime
        FROM
            software_download
        WHERE
            bds_status_id not in (6,17,18)
        GROUP BY
            host_id, software_binary_id
    ) latest_download
    JOIN software_download using (host_id, mtime)
    JOIN software_binary b USING (software_binary_id)
WHERE
    binary_type_id IN (3,5,6);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=870.00..992.56 rows=1 width=96) (actual time=90.566..125.782 rows=472 loops=1)
   Hash Cond: ((outer.host_id = inner.host_id) AND (outer.?column2? = inner.mtime))
   ->  HashAggregate  (cost=475.88..495.32 rows=1555 width=16) (actual time=51.300..70.761 rows=10870 loops=1)
         ->  Seq Scan on software_download  (cost=0.00..377.78 rows=13080 width=16) (actual time=0.010..23.700 rows=13167 loops=1)
               Filter: ((bds_status_id <> 6) AND (bds_status_id <> 17) AND (bds_status_id <> 18))
   ->  Hash  (cost=379.37..379.37 rows=2949 width=96) (actual time=39.167..39.167 rows=639 loops=1)
         ->  Hash Join  (cost=5.64..379.37 rows=2949 width=96) (actual time=0.185..37.808 rows=639 loops=1)
               Hash Cond: (outer.software_binary_id = inner.software_binary_id)
               ->  Seq Scan on software_download  (cost=0.00..277.16 rows=13416 width=96) (actual time=0.008..19.338 rows=13416 loops=1)
               ->  Hash  (cost=5.59..5.59 rows=20 width=4) (actual time=0.149..0.149 rows=22 loops=1)
                     ->  Seq Scan on software_binary b  (cost=0.00..5.59 rows=20 width=4) (actual time=0.011..0.108 rows=22 loops=1)
                           Filter: ((binary_type_id = 3) OR (binary_type_id = 5) OR (binary_type_id = 6))
 Total runtime: 126.704 ms
(13 rows)

rms=# explain analyze
SELECT
    entityid, rmsbinaryid, rmsbinaryid as software_binary_id,
    timestamp as downloaded, ia.host_id
FROM
    (SELECT
        entityid, rmsbinaryid, max(msgid) as msgid
     FROM
        msg306u
     WHERE
        downloadstatus=1
     GROUP BY entityid, rmsbinaryid
    ) a1
    JOIN myapp_app ia on (entityid=myapp_app_id)
    JOIN
    (SELECT *
     FROM msg306u
     WHERE
        downloadstatus != 0
    ) a2 USING (entityid, rmsbinaryid, msgid);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=1733.79..4620.38 rows=1 width=20) (actual time=81.160..89.826 rows=238 loops=1)
   ->  Nested Loop  (cost=1733.79..4615.92 rows=1 width=20) (actual time=81.142..86.826 rows=238 loops=1)
         Join Filter: (outer.rmsbinaryid = inner.rmsbinaryid)
         ->  HashAggregate  (cost=1733.79..1740.92 rows=570 width=12) (actual time=81.105..81.839 rows=323 loops=1)
               ->  Bitmap Heap Scan on msg306u  (cost=111.75..1540.65 rows=25752 width=12) (actual time=4.490..41.233 rows=25542 loops=1)
                     ->  Bitmap Index Scan on rht3  (cost=0.00..111.75 rows=25752 width=0) (actual time=4.248..4.248 rows=25542 loops=1)
         ->  Index Scan using msg306u_entityid_msgid_idx on msg306u  (cost=0.00..5.02 rows=1 width=20) (actual time=0.008..0.010 rows=1 loops=323)
               Index Cond: ((outer.entityid = msg306u.entityid) AND (outer.?column3? = msg306u.msgid))
               Filter: (downloadstatus <> '0'::text)
   ->  Index Scan using myapp_app_pkey on myapp_app ia  (cost=0.00..4.44 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=238)
         Index Cond: (outer.entityid = ia.myapp_app_id)
 Total runtime: 90.270 ms
(12 rows)

and here
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 09:59:25AM -0600, Scott Marlowe wrote: I have to admit, looking at the documentation, that we really don't explain this all that well in the administration section, and I can see how easily led astray beginners are. I understand what you mean, but I suppose my reaction would be that what we really need is a place to keep these things, with a note in the docs that the best practice settings for these are documented at some url, and evolve over time as people gain expertise with the new features. I suspect, for instance, that nobody knows exactly the right settings for any generic workload yet under 8.1 (although probably people know them well enough for particular workloads). A -- Andrew Sullivan | [EMAIL PROTECTED] This work was visionary and imaginative, and goes to show that visionary and imaginative work need not end up well. --Dennis Ritchie
Re: [PERFORM] Autovacuum / full vacuum
Hi, Michael, Michael Riess wrote: But actually I never understood why the database system slows down at all when there is much unused space in the files. Are the unused pages cached by the system, or is there another reason for the impact on the performance? No, they are not cached as such, but PostgreSQL caches whole pages, and you don't have only empty pages, but also lots of partially empty pages, so the signal/noise ratio is worse (meaning PostgreSQL has to fetch more pages to get the same data). Sequential scans etc. are also slower. And some file systems get slower when files get bigger or there are more files, but this effect should not really be noticeable here. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Re: [PERFORM] Autovacuum / full vacuum
Chris Browne wrote: [EMAIL PROTECTED] (Andrew Sullivan) writes: On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote: hi, I'm curious as to why autovacuum is not designed to do full vacuum. I Because nothing that runs automatically should ever take an exclusive lock on the entire database, which is what VACUUM FULL does. That's a bit more than what autovacuum would probably do... autovacuum does things table by table, so that what would be locked should just be one table. Even a database-wide vacuum does not take locks on more than one table. The table locks are acquired and released one by one, as the operation proceeds. And as you know, autovacuum (both 8.1's and contrib) does issue database-wide vacuums, if it finds a database close to an xid wraparound. -- Alvaro Herrera http://www.advogato.org/person/alvherre "Women are like slingshots: the more resistance they have, the farther you can go with them" (Jonas Nightingale, Leap of Faith)
Re: [PERFORM] Autovacuum / full vacuum
On Tue, 2006-01-17 at 11:16, Andrew Sullivan wrote: On Tue, Jan 17, 2006 at 09:59:25AM -0600, Scott Marlowe wrote: I have to admit, looking at the documentation, that we really don't explain this all that well in the administration section, and I can see how easily led astray beginners are. I understand what you mean, but I suppose my reaction would be that what we really need is a place to keep these things, with a note in the docs that the best practice settings for these are documented at some url, and evolve over time as people gain expertise with the new features. I suspect, for instance, that nobody knows exactly the right settings for any generic workload yet under 8.1 (although probably people know them well enough for particular workloads). But the problem is bigger than that. The administrative docs were obviously evolved over time, and now they kind of jump around and around covering the same subject from different angles and at different depths. Even I find it hard to find what I need, and I know PostgreSQL administration well enough to be pretty darned good at it. For the beginner, it must seem much more confusing. The more I look at the administration section of the docs, the more I want to reorganize the whole thing, and rewrite large sections of it as well.
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 11:43:14AM -0500, Chris Browne wrote: [EMAIL PROTECTED] (Andrew Sullivan) writes: Because nothing that runs automatically should ever take an exclusive lock on the entire database, That's a bit more than what autovacuum would probably do... Or even VACUUM FULL, as I tried to make clearer in another message: the way I phrased it suggests that it's a simultaneous lock on the entire database (when it is most certainly not). I didn't intend to mislead; my apologies. Note, though, that the actual effect for a user might look worse than a lock on the entire database, if you consider statement_timeout and certain use patterns. Suppose you want to issue occasional VACUUM FULLs, but your application is prepared for this, and depends on statement_timeout to tell it sorry, too long, try again. Now, if the exclusive lock on any given table is held longer than statement_timeout, so that statements cannot complete in their allotted time, the application looks like it's having an outage _even though_ it is actually just blocked on vacuums. (Yes, it's poor application design. There's plenty of that in the world, and you can't always fix it.) There is *a* case for setting up full vacuums of *some* objects. If you have a table whose tuples all get modified in the course of some common query, that will lead to a pretty conspicuous bloating of *that table.* Sure. And depending on your use model, that might be good. In many cases, though, a rotor table + view + truncate approach would be better, and would allow improved uptime. If you don't care about uptime, and can take long outages every day, then the discussion is sort of moot anyway. And _all_ of this is moot, as near as I can tell, given the OP's claim that the hardware is adequate and immutable, even though the former claim is demonstrably false. A -- Andrew Sullivan | [EMAIL PROTECTED] The fact that technology doesn't work is no bar to success in the marketplace. 
--Philip Greenspun ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Autovacuum / full vacuum
[EMAIL PROTECTED] (Alvaro Herrera) writes: Chris Browne wrote: [EMAIL PROTECTED] (Andrew Sullivan) writes: On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote: hi, I'm curious as to why autovacuum is not designed to do full vacuum. I Because nothing that runs automatically should ever take an exclusive lock on the entire database, which is what VACUUM FULL does. That's a bit more than what autovacuum would probably do... autovacuum does things table by table, so that what would be locked should just be one table. Even a database-wide vacuum does not take locks on more than one table. The table locks are acquired and released one by one, as the operation proceeds. And as you know, autovacuum (both 8.1's and contrib) does issue database-wide vacuums, if it finds a database close to an xid wraparound. Has that changed recently? I have always seen vacuumdb or SQL VACUUM (without table specifications) running as one long transaction which doesn't release the locks that it is granted until the end of the transaction. -- cbbrowne,@,acm.org http://cbbrowne.com/info/spiritual.html My nostalgia for Icon makes me forget about any of the bad things. I don't have much nostalgia for Perl, so its faults I remember. -- Scott Gilbert comp.lang.python ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Autovacuum / full vacuum
Chris Browne [EMAIL PROTECTED] writes: [EMAIL PROTECTED] (Alvaro Herrera) writes: Even a database-wide vacuum does not take locks on more than one table. The table locks are acquired and released one by one, as the operation proceeds. Has that changed recently? I have always seen vacuumdb or SQL VACUUM (without table specifications) running as one long transaction which doesn't release the locks that it is granted until the end of the transaction. You sure? It's not supposed to, and watching a database-wide vacuum with select * from pg_locks doesn't look to me like it ever has locks on more than one table (plus the table's indexes and toast table). regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
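Tom's suggestion can be tried directly. A sketch of a monitoring query, run from a second session while a database-wide VACUUM proceeds (column names per the pg_locks system view; at any given moment only the table currently being vacuumed, plus its indexes and TOAST table, should appear):

```sql
-- Poll the lock table from another session while vacuumdb runs.
-- Excluding our own backend leaves just the vacuum's locks visible.
SELECT l.relation::regclass AS locked_relation,
       l.mode,
       l.granted
FROM pg_locks l
WHERE l.relation IS NOT NULL
  AND l.pid <> pg_backend_pid();
```

Repeating this query a few times during the vacuum should show the locked relation changing as the vacuum moves from table to table, rather than an accumulating set of locks.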
Re: [PERFORM] Suspending SELECTs
Tom Lane wrote: Alessandro Baretta [EMAIL PROTECTED] writes: I am aware that what I am dreaming of is already available through cursors, but in a web application, cursors are bad boys, and should be avoided. What I would like to be able to do is to plan a query and run the plan to retrieve a limited number of rows as well as the executor's state. This way, the burden of maintaining the cursor on hold, between activations of the web resource which uses it, is transferred from the DBMS to the web application server. This is a pipe dream, I'm afraid, as the state of a cursor does not consist exclusively of bits that can be sent somewhere else and then retrieved. There are also locks to worry about, as well as the open transaction itself, and these must stay alive inside the DBMS because they affect the behavior of other transactions. As an example, once the cursor's originating transaction closes, there is nothing to stop other transactions from modifying or removing rows it would have read. I understand most of these issues, and expected this kind of reply. Please, allow me to insist that we reason on this problem and try to find a solution. My reason for doing so is that the future software industry is likely to see more and more web applications retrieving data from virtually endless databases, and in such contexts, it is sensible to ask the final client--the web client--to store the cursor state, because web interaction is intrinsically asynchronous, and you cannot count on users logging out when they're done, releasing resources allocated to them. Think of Google. Let me propose a possible solution strategy for the problem of client-side cursors. * Let us admit the limitation that a client-side cursor can only be declared in a transaction where no inserts, updates or deletes are allowed, so that such a transaction is virtually non-existent to other transactions. This allows the backend to close the transaction and release locks as soon as the cursor is declared. 
* When the cursor state is pushed back to the backend, no new transaction is instantiated, but the XID of the original transaction is reused. In the MVCC system, this allows us to achieve a perfectly consistent view of the database at the instant the original transaction started, unless a VACUUM command has been executed in the meantime, in which case I would lose track of tuples which would have been live in the context of the original transaction, but have been updated or deleted and later vacuumed; however, this does not bother me at all. Is this not a viable solution? Alex -- * http://www.barettadeit.com/ Baretta DEIT A division of Baretta SRL tel. +39 02 370 111 55 fax. +39 02 370 111 54 Our technology: The Application System/Xcaml (AS/Xcaml) http://www.asxcaml.org/ The FreerP Project http://www.freerp.org/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Suspending SELECTs
On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote: I understand most of these issues, and expected this kind of reply. Please, allow me to insist that we reason on this problem and try to find a solution. My reason for doing so is that the future software industry is likely to see more and more web applications retrieving data from virtually endless databases, and in such contexts, it is sensible to ask the final client--the web client--to store the cursor state, because web interaction is intrinsically asynchronous, and you cannot count on users logging out when they're done, releasing resources allocated to them. Think of Google. I don't understand why it is better to rework the db instead of just having the web middleware keep track of what cursors are associated with what sessions? Mike Stone ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
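Mike's suggestion, tracking which cursors belong to which web sessions in the middleware, can be sketched in a few lines. This is a minimal illustration only: all class and method names are invented for the example, sqlite3 stands in for a real PostgreSQL connection so the sketch is self-contained, and the timeout-based reaping is exactly the "magical log-out detection" compromise discussed elsewhere in this thread.

```python
import sqlite3
import time

SESSION_TIMEOUT = 600  # seconds of inactivity before resources are reclaimed

class CursorPool:
    """Hypothetical middleware-side registry mapping session tokens to cursors."""

    def __init__(self):
        self._sessions = {}  # token -> (cursor, last_used_timestamp)

    def open_cursor(self, token, conn, query, params=()):
        # Run the query and remember the cursor under the session token.
        cur = conn.execute(query, params)
        self._sessions[token] = (cur, time.time())
        return cur

    def fetch_page(self, token, n=20):
        # Pull the next n rows for this session and refresh its activity stamp.
        cur, _ = self._sessions[token]
        self._sessions[token] = (cur, time.time())
        return cur.fetchmany(n)

    def reap(self, now=None):
        # Close cursors whose sessions have gone quiet (the timeout part).
        now = now if now is not None else time.time()
        dead = [t for t, (_, ts) in self._sessions.items()
                if now - ts > SESSION_TIMEOUT]
        for t in dead:
            self._sessions.pop(t)[0].close()
        return dead

# Demonstration with an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

pool = CursorPool()
pool.open_cursor("sess-1", conn, "SELECT id FROM t ORDER BY id")
page = pool.fetch_page("sess-1", 20)
```

The point of the sketch is that none of this needs backend support: the database only ever sees an ordinary open cursor, and the session bookkeeping lives entirely in the application server.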
Re: [PERFORM] Suspending SELECTs
Craig A. James wrote: Alessandro Baretta [EMAIL PROTECTED] writes: I think you're trying to do something at the wrong layer of your architecture. This task normally goes in your middleware layer, not your database layer. I am developing my applications in Objective Caml, and I have written the middleware layer myself. I could easily implement a cursor-pooling strategy, but there is no perfect solution to the problem of guaranteeing that cursors be closed. Remember that web applications require the user to open a session by connecting to the appropriate HTTP resource, but users are never required to log out. Hence, in order to eventually reclaim all cursors, I must use a magical log-out detection algorithm, which is usually implemented with a simple timeout. This guarantees the required property of safety (the population of cursors does not diverge) but does not guarantee the required property of liveness (a user connecting to the application, who has opened a session but has not logged out, and thus possesses a session token, should have access to the execution context identified by his token). Alex -- * http://www.barettadeit.com/ Baretta DEIT A division of Baretta SRL tel. +39 02 370 111 55 fax. +39 02 370 111 54 Our technology: The Application System/Xcaml (AS/Xcaml) http://www.asxcaml.org/ The FreerP Project http://www.freerp.org/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] wildcard search performance with like
Hi, I have a postgres 8.1.1 table with over 29 million rows in it. The column (file_name) that I need to search on has entries like the following: MOD04_L2.A2005311.1400.004.2005312013848.hdf MYD04_L2.A2005311.0700.004.2005312013437.hdf I have an index on this column, but an index scan is performed only when I give the full file_name to search for: testdbspc=# explain select file_name from catalog where file_name = 'MOD04_L2.A2005311.1400.004.2005312013848.hdf'; QUERY PLAN Index Scan using catalog_pk_idx on catalog (cost=0.00..6.01 rows=1 width=404) Index Cond: (file_name = 'MOD04_L2.A2005311.1400.004.2005312013848.hdf'::bpchar) (2 rows) What I really need to do most of the time is a multi-wildcard search on this column, which is now doing a whole table scan without using the index at all: testdbspc=# explain select file_name from catalog where file_name like 'MOD04_L2.A2005311.%.004.2005312013%.hdf'; QUERY PLAN Seq Scan on catalog (cost=0.00..429.00 rows=1 width=404) Filter: (file_name ~~ 'MOD04_L2.A2005311.%.004.2005312013%.hdf'::text) (2 rows) Obviously, the performance of the table scan on such a large table is not acceptable. I tried full-text indexing and searching. It did NOT work on this column because all the letters and numbers are linked together with '.' and considered one big word by to_tsvector. Any solutions for this column to use an index search with multiple wild cards? Thanks a lot, Yantao Shi ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Suspending SELECTs
Alessandro Baretta [EMAIL PROTECTED] writes: * When the cursor state is pushed back to the backend, no new transaction is instantiated, but the XID of the original transaction is reused. In the MVCC system, this allows us to achieve a perfectly consistent view of the database at the instant the original transaction started, unless a VACUUM command has been executed in the meantime, in which case I would lose track of tuples which would have been live in the context of the original transaction, but have been updated or deleted and later vacuumed; however, this does not bother me at all. Is this not a viable solution? No. I'm not interested in solutions that can be translated as you may or may not get the right answer, and there's no way even to know whether you did or not. That might be acceptable for your particular application but you certainly can't argue that it's of general usefulness. Also, I can't accept the concept of pushing the entire execution engine state out to the client and then back again; that state is large enough that doing so for every few dozen rows would yield incredibly bad performance. (In most scenarios I think it'd be just as efficient for the client to pull the whole cursor output at the start and page through it for itself.) Worse yet: this would represent a security hole large enough to wheel West Virginia through. We'd have no reasonable way to validate the data the client sends back. Lastly, you underestimate the problems associated with not holding the locks the cursor is using. As an example, it's likely that a btree indexscan wouldn't successfully restart at all, because it couldn't find where it had been if the index page had been split or deleted meanwhile. So not running VACUUM is not enough to guarantee the query will still work. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Suspending SELECTs
On Tue, Jan 17, 2006 at 09:06:53PM +0100, Alessandro Baretta wrote: Craig A. James wrote: Alessandro Baretta [EMAIL PROTECTED] writes: I think you're trying to do something at the wrong layer of your architecture. This task normally goes in your middleware layer, not your database layer. I am developing my applications in Objective Caml, and I have written the middleware layer myself. I could easily implement a cursor-pooling strategy, but there is no perfect solution to the problem of guaranteeing that cursors be closed. Remember that web applications require the user to open a session by connecting to the appropriate HTTP resource, but users are never required to log out. Hence, in order to eventually reclaim all cursors, I must use a magical log-out detection algorithm, which is usually implemented with a simple timeout. This guarantees the required property of safety (the population of cursors does not diverge) but does not guarantee the required property of liveness (a user connecting to the application, who has opened a session but has not logged out, and thus possesses a session token, should have access to the execution context identified by his token). With some AJAX magic, it would probably be pretty easy to create an application that let you know very quickly if a user left the application (ie: browsed to another site, or closed the browser). Essentially, you should be able to set it up so that it will ping the application server fairly frequently (like every 10 seconds), so you could drastically reduce the timeout interval. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] wildcard search performance with like
Yantao Shi [EMAIL PROTECTED] writes: testdbspc=# explain select file_name from catalog where file_name like 'MOD04_L2.A2005311.%.004.2005312013%.hdf'; QUERY PLAN Seq Scan on catalog (cost=0.00..429.00 rows=1 width=404) Filter: (file_name ~~ 'MOD04_L2.A2005311.%.004.2005312013%.hdf'::text) (2 rows) I'm betting you are using a non-C locale. You need either to run the database in C locale, or to create a special index type that is compatible with LIKE searches. See http://www.postgresql.org/docs/8.1/static/indexes-opclass.html regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
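Tom's suggested fix can be sketched concretely. Since the plan output shows the column being cast to ::bpchar, the bpchar_pattern_ops opclass would be the one to use (the index name here is made up; adjust the opclass if the column type is actually varchar or text):

```sql
-- An index whose opclass sorts byte-wise, so left-anchored LIKE patterns
-- can use it even in a non-C locale:
CREATE INDEX catalog_fname_pattern_idx
    ON catalog (file_name bpchar_pattern_ops);

-- The planner can now turn the fixed prefix before the first wildcard
-- into an index range scan; the remaining wildcards are checked as a filter:
EXPLAIN SELECT file_name FROM catalog
 WHERE file_name LIKE 'MOD04_L2.A2005311.%.004.2005312013%.hdf';
```

The key limitation is that only the literal prefix before the first `%` narrows the index scan; the later wildcards still have to be applied as a filter over the matching range, which is fine here because the prefix is quite selective.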
Re: [PERFORM] sum of left join greater than its parts
Hmmm, this looks like a planner bug to me:

Hash Join (cost=870.00..992.56 rows=1 width=96) (actual time=90.566..125.782 rows=472 loops=1)
  Hash Cond: (("outer".host_id = "inner".host_id) AND ("outer"."?column2?" = "inner".mtime))
  ->  HashAggregate (cost=475.88..495.32 rows=1555 width=16) (actual time=51.300..70.761 rows=10870 loops=1)
...
Nested Loop (cost=1733.79..4620.38 rows=1 width=20) (actual time=81.160..89.826 rows=238 loops=1)
  ->  Nested Loop (cost=1733.79..4615.92 rows=1 width=20) (actual time=81.142..86.826 rows=238 loops=1)
        Join Filter: ("outer".rmsbinaryid = "inner".rmsbinaryid)
        ->  HashAggregate (cost=1733.79..1740.92 rows=570 width=12) (actual time=81.105..81.839 rows=323 loops=1)
              ->  Bitmap Heap Scan on msg306u (cost=111.75..1540.65 rows=25752 width=12) (actual time=4.490..41.233 rows=25542 loops=1)

Notice that for both queries, the estimates are reasonably accurate (within +/- 4x) until they get to left joining the subquery, at which point the estimate of rows joined becomes exactly 1. That looks suspicious to me ... Tom? Neil? -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Suspending SELECTs
On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote: I understand most of these issues, and expected this kind of reply. Please, allow me to insist that we reason on this problem and try to find a solution. My reason for doing so is that the future software industry is likely to see more and more web applications retrieving data from virtually endless databases, and in such contexts, it is sensible to ask the final client--the web client--to store the cursor state, because web interaction is intrinsically asynchronous, and you cannot count on users logging out when they're done, releasing resources allocated to them. Think of Google. What is wrong with LIMIT and OFFSET? I assume your results are ordered in some manner. Especially with web users, who become bored if the page doesn't flicker in a way that appeals to them, how could one have any expectation that the cursor would ever be useful at all? As a 'general' solution, I think optimizing the case where the same query is executed multiple times, with only the LIMIT and OFFSET parameters changing, would be a better bang for the buck. I'm thinking along the lines of materialized views, for queries executed more than a dozen times in a short length of time... :-) In the mean time, I successfully use LIMIT and OFFSET without such an optimization, and things have been fine for me. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Suspending SELECTs
I am developing my applications in Objective Caml, and I have written the middleware layer myself. I could easily implement a cursor-pooling strategy, but there is no perfect solution to the problem of guaranteeing that cursors be closed. Remember that web applications require the user to open a session by connecting the appropriate HTTP resource, but users as never required to log out. Hence, in order to eventually reclaim all cursors, I must use magical log-out detection algorithm, which is usually implemented with a simple timeout. This guarantees the required property of safety (the population of cursors does not diverge) but does not guarantee the required property of liveness (a user connecting to the application, who has opened a session but has not logged out, and thus possesses a session token, should have access the execution context identified by his token). I fail to see the problem here. Why should liveness be a required property? If is it simply that you can't promptly detect when a user is finished with their web session so you can free resources, then remember that there is no requirement that you dedicate a connection to their session in the first place. Even if you're using your own custom middleware, it isn't a very complicated or conceptually difficult thing to implement (see my previous post). Certainly it's simpler than allowing clients to pass around runtime state. As far as implementing this sort of thing in the back-end, it would be really hard with the PostgreSQL versioning model. Oracle can more easily (and kind of does) support cursors like you've described because they implement MVCC differently than PostgreSQL, and in their implementation you're guaranteed that you always have access to the most recent x megabytes of historical rows, so even without an open transaction to keep the required rows around you can still be relatively sure they'll be around for long enough. 
In PostgreSQL, historical rows are kept in the tables themselves and periodically vacuumed, so there is no such guarantee, which means that you would need to either implement a lot of complex locking for little material gain, or just hold the cursors in moderately long-running transactions, which leads back to the solution suggested earlier. -- Mark Lewis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Multiple Order By Criteria
I'm trying to query a table with 250,000+ rows. My query requires I provide 5 columns in my "order by" clause:

select column
from table
where column = '2004-3-22 0:0:0'
order by ds.receipt desc, ds.carrier_id asc, ds.batchnum asc, encounternum asc, ds.encounter_id ASC
limit 100 offset 0

I have an index built for each of these columns in my order by clause. This query takes an unacceptable amount of time to execute. Here are the results of the explain:

Limit (cost=229610.78..229611.03 rows=100 width=717)
  ->  Sort (cost=229610.78..230132.37 rows=208636 width=717)
        Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
        ->  Seq Scan on detail_summary ds (cost=0.00..22647.13 rows=208636 width=717)
              Filter: (receipt = '2004-03-22'::date)

When the order by has just 1 criterion (just ds.receipt DESC), it's fine:

Limit (cost=0.00..177.71 rows=100 width=717)
  ->  Index Scan Backward using detail_summary_receipt_id_idx on detail_summary ds (cost=0.00..370756.84 rows=208636 width=717)
        Index Cond: (receipt = '2004-03-22'::date)

I've increased my work_mem to up to 256meg with no speed increase. I think there's something here I just don't understand. How do I make this go fast ?
Re: [PERFORM] Multiple Order By Criteria
J, I have an index built for each of these columns in my order by clause. This query takes an unacceptable amount of time to execute. Here are the results of the explain: You need a single index which has all five columns, in order. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Multiple Order By Criteria
I created the index, in order. Did a vacuum analyze on the table and my explain still says:

Limit (cost=229610.78..229611.03 rows=100 width=717)
  ->  Sort (cost=229610.78..230132.37 rows=208636 width=717)
        Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
        ->  Seq Scan on detail_summary ds (cost=0.00..22647.13 rows=208636 width=717)
              Filter: (receipt = '2004-03-22'::date)

So, for fun I did: set enable_seqscan to off. But that didn't help. For some reason, the sort wants to do a seq scan and not use my super new index. Am I doing something wrong ? - Original Message - From: Josh Berkus josh@agliodbs.com To: pgsql-performance@postgresql.org Cc: [EMAIL PROTECTED] Sent: Tuesday, January 17, 2006 5:25 PM Subject: Re: [PERFORM] Multiple Order By Criteria J, I have an index built for each of these columns in my order by clause. This query takes an unacceptable amount of time to execute. Here are the results of the explain: You need a single index which has all five columns, in order. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Multiple Order By Criteria
On Tue, 17 Jan 2006, Josh Berkus wrote: J, I have an index built for each of these columns in my order by clause. This query takes an unacceptable amount of time to execute. Here are the results of the explain: You need a single index which has all five columns, in order. I think he'll also need a reverse opclass for the first column in the index or for the others since he's doing desc, asc, asc, asc, asc. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Autovacuum / full vacuum
On Tue, Jan 17, 2006 at 03:50:38PM +0100, Michael Riess wrote: Well, I think that the documentation is not exactly easy to understand. I always wondered why there are no examples for common postgresql configurations. All I know is that the default configuration seems to be too low for production use. And while running postgres I get no hints as to which setting needs to be increased to improve performance. I have no There's a number of sites that have lots of info on postgresql.conf tuning. Google for 'postgresql.conf tuning' or 'annotated postgresql.conf'. chance to see if my FSM settings are too low other than to run vacuum full verbose in psql, pipe the result to a text file and grep for some words to get a somewhat comprehensive idea of how much unused space there is in my system. Don't get me wrong - I really like PostgreSQL and it works well in my application. But somehow I feel that it might run much better ... about the FSM: You say that increasing the FSM is fairly cheap - how should I know that? [EMAIL PROTECTED]:26]/opt/local/share/postgresql8:3%grep fsm \ postgresql.conf.sample #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000 # min 100, ~70 bytes each [EMAIL PROTECTED]:26]/opt/local/share/postgresql8:4% -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Suspending SELECTs
On Tue, 17 Jan 2006 16:12:59 -0500 [EMAIL PROTECTED] wrote: In the mean time, I successfully use LIMIT and OFFSET without such an optimization, and things have been fine for me. Same here. - Frank Wiles [EMAIL PROTECTED] http://www.wiles.org - ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Suspending SELECTs
Alessandro, I understand most of these issues, and expected this kind of reply. Please, allow me to insist that we reason on this problem and try to find a solution. My reason for doing so is that the future software industry is likely to see more and more web applications retrieving data from virtually endless databases, and in such contexts, it is sensible to ask the final client--the web client--to store the cursor state, because web interaction is intrinsically asynchronous, and you cannot count on users logging out when they're done, releasing resources allocated to them. Think of Google. I think you're trying to use an unreasonably difficult method to solve a problem that's already been solved multiple times. What you want is called query caching. There are about 800 different ways to do this on the middleware or application layer which are 1000% easier than what you're proposing. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
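The middleware-level query caching Josh points at can be sketched in a few lines. This is a toy illustration only (class and attribute names are invented, and sqlite3 stands in for the real database): results for a given (sql, params) pair are kept and reused, so repeated page views never re-run the query. A production cache would add expiry and invalidation on writes.

```python
import sqlite3

class QueryCache:
    """Hypothetical middleware query cache keyed by (sql, params)."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql, params=()):
        key = (sql, params)
        if key in self.cache:
            # Served from the cache; the database is never touched.
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.cache[key] = rows
        return rows

# Demonstration with an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "a"), (2, "b")])

qc = QueryCache(conn)
first = qc.query("SELECT name FROM item WHERE id = ?", (1,))
second = qc.query("SELECT name FROM item WHERE id = ?", (1,))  # cache hit
```

Because the cached result is an ordinary materialized list, paging through it (the LIMIT/OFFSET equivalent) becomes a slice of an in-memory list rather than a held-open cursor.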
Re: [PERFORM] Multiple Order By Criteria
try adding the keyword 'date' before the date in your query. I ran into this quite a while back, but I'm not sure I remember the solution. In Reply to: Tuesday January 17 2006 04:29 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I created the index, in order. Did a vacuum analyze on the table and my explain still says:

Limit (cost=229610.78..229611.03 rows=100 width=717)
  ->  Sort (cost=229610.78..230132.37 rows=208636 width=717)
        Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
        ->  Seq Scan on detail_summary ds (cost=0.00..22647.13 rows=208636 width=717)
              Filter: (receipt = '2004-03-22'::date)

So, for fun I did: set enable_seqscan to off. But that didn't help. For some reason, the sort wants to do a seq scan and not use my super new index. Am I doing something wrong ? - Original Message - From: Josh Berkus josh@agliodbs.com To: pgsql-performance@postgresql.org Cc: [EMAIL PROTECTED] Sent: Tuesday, January 17, 2006 5:25 PM Subject: Re: [PERFORM] Multiple Order By Criteria J, I have an index built for each of these columns in my order by clause. This query takes an unacceptable amount of time to execute. Here are the results of the explain: You need a single index which has all five columns, in order. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Suspending SELECTs
People: To follow up further, what Alessandro is talking about is known as a keyset cursor. Sybase and SQL Server used to support them; I believe that they were strictly read-only and had weird issues with record visibility. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Multiple Order By Criteria
On Tue, 17 Jan 2006 [EMAIL PROTECTED] wrote: I created the index like this: CREATE INDEX rcbee_idx ON detail_summary USING btree (receipt, carrier_id, batchnum, encounternum, encounter_id); Is this correct ? That would work if you were asking for all the columns ascending or descending, but we don't currently use it for mixed orders. How do I make a reverse opclass ? There's some information at the following: http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php ---(end of broadcast)--- TIP 6: explain analyze is your friend
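The linked archive posts boil down to the following pattern. This is a hedged sketch only, shown for int4 as in those posts (the poster would need an equivalent opclass for the receipt column's actual type, and all names here are illustrative): the comparison function simply swaps its arguments, and the operator slots are listed in reversed order.

```sql
-- A "reverse sort" btree operator class: compare with arguments swapped,
-- so an ascending scan of the index yields descending order for this column.
CREATE FUNCTION int4_revcmp(int4, int4) RETURNS int4
    AS 'SELECT btint4cmp($2, $1)' LANGUAGE sql IMMUTABLE;

CREATE OPERATOR CLASS int4_revop
    FOR TYPE int4 USING btree AS
        OPERATOR 1 > ,
        OPERATOR 2 >= ,
        OPERATOR 3 = ,
        OPERATOR 4 <= ,
        OPERATOR 5 < ,
        FUNCTION 1 int4_revcmp(int4, int4);
```

With such an opclass attached to the one DESC column of a multi-column index, a single forward index scan can then satisfy a mixed desc/asc/asc/asc/asc ORDER BY without a separate sort step.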
Re: [PERFORM] Ensuring data integrity with fsync=off
On Sat, Jan 14, 2006 at 01:41:43PM -0500, Tom Lane wrote: Benjamin Arai [EMAIL PROTECTED] writes: Right now I run sync after the updates have finished to ensure that the data is synced to disk but I am concerned about the segment data and anything else I am missing that PostgreSQL explicitly handles. Is there something I can do in addition to sync to tell PostgreSQL explicitly that it is time to ensure everything is stored in its final destination and etc? You need to give PG a CHECKPOINT command to flush stuff out of its internal buffers. After that finishes, a manual sync command will push everything down to disk. You realize, of course, that a system failure while the updates are running might leave your database corrupt? As long as you are prepared to restore from scratch, this might be a good tradeoff ... but don't let yourself get caught without an up-to-date backup ... Another alternative that may (or may not) be simpler would be to run everything in one transaction and just let that commit at the end. Also, there is ongoing work towards allowing certain operations to occur without generating any log writes. Currently there is code submitted that allows COPY into a table that was created in the same transaction to go un-logged, though I think it's only in HEAD. In any case, there should be some features that could be very useful to you in 8.2. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
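Tom's two-step sequence can be sketched as a psql session run after the bulk updates finish (the `\!` line is a psql shell escape, so this assumes the commands are issued through psql):

```sql
-- 1. Flush dirty pages out of PostgreSQL's shared buffers to the kernel:
CHECKPOINT;

-- 2. Then ask the OS to push its own write cache down to disk:
\! sync
```

The ordering matters: running sync first would flush the kernel cache while PostgreSQL still held dirty pages in shared buffers, so the checkpoint must complete before the sync.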
Re: [PERFORM] Multiple Order By Criteria
I've read all of this info, closely. I wish when I was searching for an answer for my problem these pages came up. Oh well. I am getting an idea of what I need to do to make this work well. I was wondering if there is more information to read on how to implement this solution in a more simple way. Much of what's written seems to be towards an audience that should understand certain things automatically. - Original Message - From: Stephan Szabo [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Josh Berkus josh@agliodbs.com; pgsql-performance@postgresql.org Sent: Tuesday, January 17, 2006 6:39 PM Subject: Re: [PERFORM] Multiple Order By Criteria On Tue, 17 Jan 2006 [EMAIL PROTECTED] wrote: I created the index like this: CREATE INDEX rcbee_idx ON detail_summary USING btree (receipt, carrier_id, batchnum, encounternum, encounter_id); Is this correct ? That would work if you were asking for all the columns ascending or descending, but we don't currently use it for mixed orders. How do I make a reverse opclass ? There's some information at the following: http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Suspending SELECTs
[EMAIL PROTECTED] wrote: On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote: What is wrong with LIMIT and OFFSET? I assume your results are ordered in some manner. Especially with web users, who become bored if the page doesn't flicker in a way that appeals to them, how could one have any expectation that the cursor would ever be useful at all? As a 'general' solution, I think optimizing the case where the same query is executed multiple times, with only the LIMIT and OFFSET parameters changing, would be a better bang for the buck. I'm thinking along the lines of materialized views, for queries executed more than a dozen times in a short length of time... :-) In the mean time, I successfully use LIMIT and OFFSET without such an optimization, and things have been fine for me. Second that. I do seem to recall a case where I used a different variant of this method (possibly a database product that didn't have OFFSET, or maybe because OFFSET was expensive for the case in point), where the ORDER BY key for the last record on the page was saved and the query amended to use it to filter for the 'next' screen - e.g: 1st time in: SELECT ... FROM table WHERE ... ORDER BY id LIMIT 20; Suppose this displays records for id 1 - 10020. When the user hits next, the page saves id=10020 in the session state and executes: SELECT ... FROM table WHERE ... AND id > 10020 ORDER BY id LIMIT 20; Clearly you have to be a little careful about whether to use '>' or '>=' depending on whether 'id' is unique or not (to continue using '>' in the non unique case, you can just save and use all the members of the primary key too). Cheers Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
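Mark's save-the-last-key technique (keyset pagination) can be simulated in a few lines of Python over an in-memory table; the row layout and page size are invented for illustration. With a real database, the `WHERE id > :last_id` clause lets an index on id locate the page start directly instead of skipping OFFSET rows:

```python
# Toy stand-in for a table with a unique, indexed id column.
rows = [{"id": i, "val": i * 10} for i in range(1, 101)]

def next_page(last_id, page_size=20):
    # Equivalent of:
    #   SELECT ... WHERE id > last_id ORDER BY id LIMIT page_size
    matching = sorted((r for r in rows if r["id"] > last_id),
                      key=lambda r: r["id"])
    return matching[:page_size]

page1 = next_page(last_id=0)
# The session stores only the last id seen, not an offset:
page2 = next_page(last_id=page1[-1]["id"])
```

Because id is unique here, the strict `>` is correct; for a non-unique sort key the whole primary key would have to be carried along, as the message notes.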
[PERFORM] Getting pg to use index on an inherited table (8.1.1)
Hi, I have two tables foobar and foobar2 (which inherits from foobar, no extra columns). foobar2 has all the data (574,576 rows), foobar is empty. Both foobar and foobar2 have an index on the only column 'id'. Now I have a list of ids in a tmp_ids table. A query on foobar2 (child table) uses the index, whereas the same query via foobar (parent) doesn't. Even if I set seq_scan off, it still doesn't use the index on the child table while queried via the parent table. Details are given below. Any help is appreciated.

# analyze foobar;
ANALYZE
# analyze foobar2;
ANALYZE
# explain analyze select * from foobar2 join tmp_ids using (id);
                                  QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=0.00..3013.69 rows=85856 width=4) (actual time=0.038..234.864 rows=44097 loops=1)
   ->  Seq Scan on tmp_ids  (cost=0.00..1.52 rows=52 width=4) (actual time=0.008..0.102 rows=52 loops=1)
   ->  Index Scan using foobar2_idx1 on foobar2  (cost=0.00..37.29 rows=1651 width=4) (actual time=0.007..1.785 rows=848 loops=52)
         Index Cond: (foobar2.id = "outer".id)
 Total runtime: 302.963 ms
(5 rows)

# explain analyze select * from foobar join tmp_ids using (id);
                                  QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=1.65..13267.85 rows=149946 width=4) (actual time=7.338..3837.060 rows=44097 loops=1)
   Hash Cond: ("outer".id = "inner".id)
   ->  Append  (cost=0.00..8883.16 rows=576716 width=4) (actual time=0.012..2797.555 rows=574576 loops=1)
         ->  Seq Scan on foobar  (cost=0.00..31.40 rows=2140 width=4) (actual time=0.002..0.002 rows=0 loops=1)
         ->  Seq Scan on foobar2 foobar  (cost=0.00..8851.76 rows=574576 width=4) (actual time=0.004..1027.422 rows=574576 loops=1)
   ->  Hash  (cost=1.52..1.52 rows=52 width=4) (actual time=0.194..0.194 rows=52 loops=1)
         ->  Seq Scan on tmp_ids  (cost=0.00..1.52 rows=52 width=4) (actual time=0.003..0.094 rows=52 loops=1)
 Total runtime: 3905.074 ms
(8 rows)

# select version();
 PostgreSQL 8.1.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)
(1 row)

# \d foobar
      Table "public.foobar"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "foobar_idx1" btree (id)

# \d foobar2
      Table "public.foobar2"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "foobar2_idx1" btree (id)
Inherits: foobar

# \d tmp_ids
      Table "public.tmp_ids"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |

# set enable_seqscan=off;
SET
# explain analyze select * from foobar join tmp_ids using (id);
                                  QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=20001.65..300013267.85 rows=149946 width=4) (actual time=7.352..3841.221 rows=44097 loops=1)
   Hash Cond: ("outer".id = "inner".id)
   ->  Append  (cost=1.00..28883.16 rows=576716 width=4) (actual time=0.012..2803.547 rows=574576 loops=1)
         ->  Seq Scan on foobar  (cost=1.00..10031.40 rows=2140 width=4) (actual time=0.003..0.003 rows=0 loops=1)
         ->  Seq Scan on foobar2 foobar  (cost=1.00..18851.76 rows=574576 width=4) (actual time=0.005..1032.148 rows=574576 loops=1)
   ->  Hash  (cost=10001.52..10001.52 rows=52 width=4) (actual time=0.194..0.194 rows=52 loops=1)
         ->  Seq Scan on tmp_ids  (cost=1.00..10001.52 rows=52 width=4) (actual time=0.004..0.098 rows=52 loops=1)
 Total runtime: 3909.332 ms
(8 rows)

Output of show all (remember I just turned off seq_scan above):
 enable_bitmapscan | on  | Enables the planner's use of bitmap-scan plans.
 enable_hashagg    | on  | Enables the planner's use of hashed aggregation plans.
 enable_hashjoin   | on  | Enables the planner's use of hash join plans.
 enable_indexscan  | on  | Enables the planner's use of index-scan plans.
 enable_mergejoin  | on  | Enables the planner's use of merge join plans.
 enable_nestloop   | on  | Enables the planner's use of nested-loop join plans.
 enable_seqscan    | off
Re: [PERFORM] Suspending SELECTs
Alessandro Baretta wrote: I think you're trying to do something at the wrong layer of your architecture. This task normally goes in your middleware layer, not your database layer. I am developing my applications in Objective Caml, and I have written the middleware layer myself. I could easily implement a cursor-pooling strategy... You're trying to solve a very hard problem, and you're rewriting a lot of stuff that's been worked on for years by teams of people. If there's any way you can switch to using something like JBOSS, it might save you a lot of grief and hard work. I eliminated this problem a different way, using what we call a hitlist. Basically, every query becomes a select into, something like this: insert into hitlist_<userid> (select id from ...), where <userid> is your user's id. Once you do this, it's trivial to return each page to the user almost instantly using offset/limit, or by adding a ROW_NUM column of some sort. We manage very large hitlists -- millions of rows. Going from page 1 to page 100,000 takes a fraction of a second. It also has the advantage that the user can come back in a week or a month and the results are still there. The drawbacks are:

1. Before the user gets the first page, the entire query must complete.
2. You need a way to clean up old hitlists.
3. If you have tens of thousands of users, you'll have a large number of hitlists, and you have to use tablespaces to ensure that Linux filesystem directories don't get too large.
4. It takes space to store everyone's data. (But disk space is so cheap this isn't much of an issue.)

You can eliminate #3 with a single shared hitlist that has a column of UserIDs. But experience shows that a big shared hitlist doesn't work very well: inserts get slower because the UserID column must be indexed, and you can truncate individual hitlists but you have to delete from a shared hitlist.
Craig ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
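Craig's hitlist pattern can be sketched with Python's stdlib sqlite3 standing in for PostgreSQL: materialize the query result once per user, then page through it cheaply with LIMIT/OFFSET. The schema, user id, and score filter are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, i % 7) for i in range(1, 1001)])

user_id = 42  # hypothetical user
# Run the expensive query ONCE, materializing the ids into a per-user table.
conn.execute(f"CREATE TABLE hitlist_{user_id} AS "
             "SELECT id FROM items WHERE score = 3 ORDER BY id")

def hitlist_page(page, page_size=20):
    # Paging now touches only the small, pre-sorted hitlist.
    cur = conn.execute(
        f"SELECT id FROM hitlist_{user_id} ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size))
    return [row[0] for row in cur]

first = hitlist_page(0)
later = hitlist_page(5)
```

This also makes drawback #1 concrete: the `CREATE TABLE ... AS SELECT` must finish before `hitlist_page(0)` can return anything.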
Re: [PERFORM] Suspending SELECTs
Mark Kirkwood [EMAIL PROTECTED] writes: SELECT ... FROM table WHERE ... ORDER BY id LIMIT 20; Suppose this displays records for id 1 - 10020. When the user hits next, the page saves id=10020 in the session state and executes: SELECT ... FROM table WHERE ... AND id > 10020 ORDER BY id LIMIT 20; Clearly you have to be a little careful about whether to use '>' or '>=' depending on whether 'id' is unique or not (to continue using '>' in the non unique case, you can just save and use all the members of the primary key too). This is actually fairly painful to get right for a multi-column key at the moment. It'll be much easier once I finish up the SQL-spec-row-comparison project. See this thread for background: http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
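The SQL-spec row comparison Tom mentions, (a, b) > (x, y), is lexicographic, which is exactly how Python compares tuples; a toy check that the hand-expanded multi-column condition agrees with row comparison (data invented):

```python
# SQL-spec row comparison (a, b) > (x, y) is lexicographic.
# The form you must hand-expand without it is: a > x OR (a = x AND b > y).
def expanded(a, b, x, y):
    return a > x or (a == x and b > y)

pairs = [(1, 1), (1, 2), (2, 0), (2, 5)]
last = (1, 2)  # last (multi-column) key seen on the previous page

via_tuples = [p for p in pairs if p > last]                     # row comparison
via_expanded = [p for p in pairs if expanded(p[0], p[1], *last)]  # hand-expanded
```

The hand-expanded form grows one OR branch per extra key column, which is the painfulness Tom is referring to; the row-comparison syntax collapses it back to a single predicate.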
Re: [PERFORM] Getting pg to use index on an inherited table (8.1.1)
Hari Warrier [EMAIL PROTECTED] writes: A query on foobar2 (child table) uses the index, whereas the same query via foobar (parent) doesn't. A query just on foobar should be able to use the index AFAIR. The problem here is that you have a join, and we are not very good about situations involving joins against inheritance sets (nor joins against UNION ALL subqueries, which is really about the same thing). I'm hoping to get a chance to look into improving this during the 8.2 development cycle. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Multiple Order By Criteria
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Wednesday, 18 January 2006 07:23 To: Stephan Szabo Cc: Josh Berkus; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Multiple Order By Criteria I've read all of this info, closely. I wish when I was searching for an answer for my problem these pages came up. Oh well. Well, I think you have to know about btree indexes. Btree is good enough, although it's not the best. It will perform best if it doesn't index too many columns. In your case, you have to concentrate on the 2 or 3 fields that you will use frequently. Put the most duplicated values at the front and the others behind. E.g.: receipt, carrier_id, batchnum are the most frequently used, but the most duplicated values are: carrier_id, receipt, and batchnum, so make the btree index (carrier_id, receipt, batchnum). Btree will not suffer, and we will also benefit if the table has a relationship with another table on the same fields in the same order. We don't have to make another index for that relation. Best regards, ahmad fajar. I am getting an idea of what I need to do to make this work well. I was wondering if there is more information to read on how to implement this solution in a more simple way. Much of what's written seems to be towards an audience that should understand certain things automatically. - Original Message - From: Stephan Szabo [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Josh Berkus josh@agliodbs.com; pgsql-performance@postgresql.org Sent: Tuesday, January 17, 2006 6:39 PM Subject: Re: [PERFORM] Multiple Order By Criteria On Tue, 17 Jan 2006 [EMAIL PROTECTED] wrote: I created the index like this: CREATE INDEX rcbee_idx ON detail_summary USING btree (receipt, carrier_id, batchnum, encounternum, encounter_id); Is this correct ? That would work if you were asking for all the columns ascending or descending, but we don't currently use it for mixed orders. How do I make a reverse opclass ?
There's some information at the following: http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
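The leading-column-prefix behaviour of composite btree indexes that ahmad fajar's column-ordering advice relies on can be demonstrated with sqlite3 (stdlib). Column names come from the thread; the extra amount column and the data types are my own assumptions. A query filtering on the leading columns can use the index, while one filtering only on a trailing column cannot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail_summary ("
             "carrier_id INTEGER, receipt INTEGER, "
             "batchnum INTEGER, amount INTEGER)")
conn.execute("CREATE INDEX ds_idx ON detail_summary "
             "(carrier_id, receipt, batchnum)")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return " ".join(row[-1]
                    for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Filters on a prefix of the index columns -> index is usable.
prefix_plan = plan("SELECT * FROM detail_summary "
                   "WHERE carrier_id = 1 AND receipt = 2")
# Filters only on the trailing column -> full table scan.
suffix_plan = plan("SELECT * FROM detail_summary WHERE batchnum = 3")
```

The same prefix rule holds for PostgreSQL btrees, which is why the choice of leading columns in the thread matters.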
Re: [PERFORM] Suspending SELECTs
Tom Lane wrote: Mark Kirkwood [EMAIL PROTECTED] writes: SELECT ... FROM table WHERE ... ORDER BY id LIMIT 20; Suppose this displays records for id 1 - 10020. When the user hits next, the page saves id=10020 in the session state and executes: SELECT ... FROM table WHERE ... AND id > 10020 ORDER BY id LIMIT 20; Clearly you have to be a little careful about whether to use '>' or '>=' depending on whether 'id' is unique or not (to continue using '>' in the non unique case, you can just save and use all the members of the primary key too). This is actually fairly painful to get right for a multi-column key at the moment. It'll be much easier once I finish up the SQL-spec-row-comparison project. Right, I think it was actually an Oracle 7.3 based web app (err... showing age here...) that I used this technique on. Cheers Mark ---(end of broadcast)--- TIP 6: explain analyze is your friend