Re: [PERFORM] partitioning question 1
-----Original Message-----
From: Ben [mailto:midfi...@gmail.com]
Sent: Thursday, October 28, 2010 12:37 PM
To: pgsql-performance@postgresql.org
Subject: partitioning question 1

hello -- my last email was apparently too long to respond to, so I'll split it up into shorter pieces. My first question:

My understanding of how range partitioning and constraint exclusion work leads me to believe that partitioning does not buy any query performance that a clustered index doesn't already give you -- the advantages are all in maintainability. An index is able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions. The I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index: result pages in a small range are very close to each other on disk. Finally, since constraint exclusion isn't as flexible as indexing (I've seen old mailing list posts saying that constraint exclusion only works with static constants in WHERE clauses, and only with simple operators, which basically forces btree indexes when I want to use GiST), it is indeed likely that partitioning can be slower than one big table with a clustered index.

Is my intuition completely off on this?

best regards, ben

If your SELECT retrieves a substantial number of records, a table scan can be more efficient than index access. Now, if while retrieving a large number of records the WHERE clause of that SELECT still satisfies the constraints on some partition(s), then obviously one (or a few) partition scans will be more efficient than a full table scan of a non-partitioned table.

So yes, partitioning provides performance improvements, not only maintenance convenience.

Regards,
Igor Neyman
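As a rough illustration of the setup being discussed (the table and column names here are hypothetical, not from the thread), constraint exclusion only helps when the planner can compare each partition's CHECK constraint against literal constants in the WHERE clause:

  -- Parent table plus two range partitions
  CREATE TABLE events (id bigint, logged_at date, payload text);

  CREATE TABLE events_2010_q3 (
      CHECK (logged_at >= DATE '2010-07-01' AND logged_at < DATE '2010-10-01')
  ) INHERITS (events);

  CREATE TABLE events_2010_q4 (
      CHECK (logged_at >= DATE '2010-10-01' AND logged_at < DATE '2011-01-01')
  ) INHERITS (events);

  -- With constraint exclusion enabled, a literal date predicate lets the
  -- planner skip partitions whose CHECK constraints cannot match.
  SET constraint_exclusion = partition;
  EXPLAIN SELECT * FROM events WHERE logged_at = DATE '2010-10-28';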
Re: [PERFORM] BBU Cache vs. spindles
On Fri, Oct 29, 2010 at 11:43 AM, Robert Haas <robertmh...@gmail.com> wrote:
> Well, we COULD keep the data in shared buffers, and then copy it into
> an mmap()'d region rather than calling write(), but I'm not sure
> there's any advantage to it.  Managing address space mappings is a
> pain in the butt.

I could see this being a *theoretical* benefit in the case that the background writer gains the ability to write out all blocks associated with a file in order. In that case, you might get a win because you could get a single mmap of the entire file, just wholesale memcpy blocks across, then sync/unmap it.

This, of course, assumes a few things that must hold for it to be performant:

0) A list of blocks to be written, grouped by file, is readily available.
1) The pages you write to must be in the page cache, or your memcpy is going to fault them in. With a plain write, you don't need the over-written page in the cache.
2) Now, instead of the torn-page problem being FS block/sector sized, you can actually have a possibly arbitrary amount of the block memory written when the kernel writes out the page. You *really* need full-page writes.
3) The mmap overhead required for the kernel to set up the mappings is less than the repeated syscalls of a simple write().

All those things seem like something that somebody could synthetically benchmark to prove value before even trying to bolt it into PostgreSQL.

a.

--
Aidan Van Dyk                     Create like a god,
ai...@highrise.ca                 command like a king,
http://www.highrise.ca/           work like a slave.
Re: [PERFORM] BBU Cache vs. spindles
Robert Haas <robertmh...@gmail.com> writes:
> On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> It's true that we don't know whether write() causes an immediate or
>> delayed disk write, but we generally don't care that much.  What we do
>> care about is being able to ensure that a WAL write happens before
>> the data write, and with mmap we don't have control over that.
>
> Well, we COULD keep the data in shared buffers, and then copy it into
> an mmap()'d region rather than calling write(), but I'm not sure
> there's any advantage to it.  Managing address space mappings is a
> pain in the butt.

In principle that ought to be right about the same speed as using write() to copy the data from shared buffers to kernel disk buffers, anyway.

			regards, tom lane
Re: [PERFORM] partitioning question 1
On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote:
>> is my intuition completely off on this?
>
> If your SELECT retrieves a substantial number of records, a table scan
> can be more efficient than index access.
>
> Now, if while retrieving a large number of records the WHERE clause of
> that SELECT still satisfies the constraints on some partition(s), then
> obviously one (or a few) partition scans will be more efficient than a
> full table scan of a non-partitioned table.
>
> So yes, partitioning provides performance improvements, not only
> maintenance convenience.

My impression was that a *clustered* index would give a lot of the same I/O benefits, in a more flexible way. If you're clustered on the column in question, then an index scan for a range is much like a sequential scan over a partition (as far as I understand).

b
Re: [PERFORM] partitioning question 1
-----Original Message-----
From: Ben [mailto:midfi...@gmail.com]
Sent: Friday, October 29, 2010 12:16 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: partitioning question 1

On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote:
>> is my intuition completely off on this?
>
> If your SELECT retrieves a substantial number of records, a table scan
> can be more efficient than index access.
>
> Now, if while retrieving a large number of records the WHERE clause of
> that SELECT still satisfies the constraints on some partition(s), then
> obviously one (or a few) partition scans will be more efficient than a
> full table scan of a non-partitioned table.
>
> So yes, partitioning provides performance improvements, not only
> maintenance convenience.

My impression was that a *clustered* index would give a lot of the same I/O benefits, in a more flexible way. If you're clustered on the column in question, then an index scan for a range is much like a sequential scan over a partition (as far as I understand).

b

Even with a clustered index you still read index + table, which is more expensive than just a table scan (in the situation I described above). A PostgreSQL clustered index is not the same as a SQL Server clustered index (which includes the actual table pages at the leaf level).

Igor Neyman
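For comparison, a minimal sketch of the clustered-index alternative Ben describes (same hypothetical names as above). Note that, as Igor points out, PostgreSQL's CLUSTER is a one-time physical reordering: it is not maintained as rows are inserted or updated, and an index scan still visits both the index and the heap:

  CREATE INDEX events_logged_at_idx ON events (logged_at);
  CLUSTER events USING events_logged_at_idx;   -- rewrites the table; takes an exclusive lock
  ANALYZE events;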
[PERFORM] typoed column name, but postgres didn't grump
I've been having trouble with a query. The query is a cross join between two tables. Initially, I mistyped the query, and one of the columns specified in the query doesn't exist; however, the query ran nonetheless. The actual query:

  select gid
  from t2, t3
  where t2.name = t3.name
    and t3.scope = 'city'
    and t3.hierarchy = 'STANDARD'
    and t2.adiv = t3.adiv
  limit 1;

However, there *is* no column 'name' in table 't2'. When I ran the query, it took a *really* long time to run (670 seconds). When I corrected the query to use the right column name (city_name), the query ran in 28 ms.

The question, then, is: why didn't postgres grump about the non-existent column name?

The version is 8.4.5 on x86_64, openSUSE 11.3:
PostgreSQL 8.4.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292], 64-bit

--
Jon
Re: [PERFORM] BBU Cache vs. spindles
On Fri, Oct 29, 2010 at 11:56 AM, Aidan Van Dyk <ai...@highrise.ca> wrote:
> 1) The pages you write to must be in the page cache, or your memcpy is
> going to fault them in. With a plain write, you don't need the
> over-written page in the cache.

I seem to remember a time many years ago when I got bitten by this problem. The fact that our I/O is in 8K pages means this could be a pretty severe hit, I think.

> 2) Now, instead of the torn-page problem being FS block/sector sized,
> you can actually have a possibly arbitrary amount of the block memory
> written when the kernel writes out the page. You *really* need
> full-page writes.

Yeah.

> 3) The mmap overhead required for the kernel to set up the mappings is
> less than the repeated syscalls of a simple write().

You'd expect to save something from that; but on the other hand, at least on 32-bit systems, there's a very limited number of 1GB files that can be simultaneously mapped into one address space, and it's a lot smaller than the number of file descriptors that you can have open. Rumor has it that cutting down the number of fds that can stay open simultaneously is pretty bad for performance, so cutting it down to a number you can count on one hand (maybe one finger) would probably be bad. Maybe on 64-bit it would be OK, but it seems like an awful lot of complexity for at most a minor savings (and a pretty bad anti-savings if point #1 kicks in).

Anyway this is all totally off-topic...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[PERFORM] CPUs for new databases
Hello,

What is the general view of CPU performance these days when it comes to PostgreSQL? Which CPU is the better choice with regard to RAM access times, stream speed, cache synchronization, etc.? Which is the better CPU given the limitation of using AMD64 (x86-64)?

We're getting ready to replace our (now) aging db servers with brand new ones with a higher core count. The old ones are 4-socket dual-core Opteron 8218's with 48GB RAM. Right now the disk subsystem is not the limiting factor, so we're aiming for a higher core count as well as faster and more RAM. We're also moving into the territory of version 9.0 with streaming replication, to be able to offload at least a part of the read-only queries to the slave database.

The connection count on the database usually lies in the region of ~2500 connections, and the database is small enough that it can be kept entirely in RAM (a dump is about 2.5GB).

Regards,
Christian Elmerot
[PERFORM] MVCC and Implications for (Near) Real-Time Application
Hi experts,

I have a (near) real-time application in which inserts into the database need to be visible to queries from other threads with minimal delay. The inserts are triggered by real-time events and are therefore asynchronous (i.e. many performance tips I have read related to batch inserts or COPY do not apply here, since these events cannot be predicted or batched), and the inserted data need to be available within a couple of seconds to other threads (for example, an inserted row that only appears to other query threads 5 seconds or more after the insert is not acceptable). The delay should be under 2 seconds maximum; sub-1 second would be great.

My questions are:
(1) Does the MVCC architecture introduce significant delays between an insert by one thread and visibility by other threads (I am unclear about how multiple versions are collapsed or reconciled, as well as how different query threads see which version)?
(2) Are there any available benchmarks that can measure this delay?
(3) What are the relevant config parameters that will reduce this delay?

Thanks for your patience with my ignorance of MVCC (still learning more about it),

Steve
Re: [PERFORM] Stored procedure declared as VOLATILE = no good optimization is done
Thank you for all of the responses. This was really helpful.

Damon

On Sat, Oct 16, 2010 at 12:54 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, Oct 15, 2010 at 10:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Tatsuo Ishii <is...@postgresql.org> writes:
>>> So can I say that if a function is marked IMMUTABLE, then it should
>>> never modify the database? Is there any counter example? It seems that
>>> if the above is correct, I can say STABLE functions should never modify
>>> databases as well.
>>
>> Both of those things are explicitly stated here:
>> http://developer.postgresql.org/pgdocs/postgres/xfunc-volatility.html
>
> Ok, being pedantic here, but: I think more interesting is *why* the
> 'immutable shall not modify the database' requirement is there. IOW,
> suppose you ignore the warnings in the docs and force immutability on a
> function that writes (via the function loophole) to the database -- why
> exactly is this a bad idea? The reasoning given in the documentation
> explains a problematic symptom of doing so, but gives little technical
> reasoning for why it should never be done.
>
> One reason why writing to the database breaks immutability is that
> writing to the database depends on resources that can change after the
> fact: function immutability also pertains to failure -- if a function
> errors (or not) with a given set of inputs, it should always do so. If
> you write to a table, you could violate a constraint from one call to
> the next, or the table may not even be there at all...
>
> Writing to the database means you are influencing other systems, and via
> constraints they are influencing you, so it makes it wrong by definition.
> That said, if you were writing to, say, a table with no meaningful
> constraints, this actually wouldn't be so bad as long as you can also
> deal with the other big issue with immutability, namely that there is no
> 1:1 correspondence between when the function is logically evaluated and
> when it is executed. This more or less eliminates logging (at least
> outside of debugging purposes), the only thing I can figure you can
> usefully do on a table with no enforceable constraints.
>
> Also, a big use case for immutable functions is to allow their use in
> indexing, and it would be just crazy (again, debugging purposes aside)
> to write to a table on index evaluation.
>
> merlin
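As a small, hypothetical illustration of the "immutable functions can be used in indexing" point above (none of these names come from the thread): the function writes nothing and depends only on its input, so it can safely back an expression index.

  CREATE FUNCTION normalize_city(text) RETURNS text AS $$
      SELECT lower(trim($1));
  $$ LANGUAGE sql IMMUTABLE;

  -- hypothetical table and column
  CREATE INDEX cities_norm_idx ON cities (normalize_city(city_name));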
Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle
Dear All,

Thanks for your inputs on the insert performance part. Any suggestions on the storage requirement? VACUUM is certainly not an option, because that relates to maintenance AFTER insertion. I am talking about the plain storage requirement w.r.t. Oracle, which I observed is twice that of Oracle when millions of rows are inserted. Has anybody tried to analyze the average storage requirement of PG w.r.t. Oracle?

Best Regards,
Divakar

From: Merlin Moncure <mmonc...@gmail.com>
To: Robert Haas <robertmh...@gmail.com>
Cc: Mladen Gogala <mladen.gog...@vmsinfo.com>; pgsql-performance@postgresql.org
Sent: Wed, October 27, 2010 4:46:53 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On Tue, Oct 26, 2010 at 6:50 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala <mladen.gog...@vmsinfo.com> wrote:
>> The table is created with the "on commit obliterate rows" option, which
>> means that there is no need to do truncate. The truncate command is
>> heavy artillery. Truncating a temporary table is like shooting ducks in
>> a duck pond, with a howitzer.
>
> This is just not true. ON COMMIT DELETE ROWS simply arranges for a
> TRUNCATE to happen immediately before each commit. See
> PreCommit_on_commit_actions() in tablecmds.c.

Quite so. If you are doing anything performance sensitive with 'on commit drop', you are better off organizing a cache around txid_current() (or now(), pid for older pg versions). That skips the writes to the system catalogs and the truncate.

merlin
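For reference, a minimal sketch of the ON COMMIT DELETE ROWS behavior being discussed (the table name is made up): the rows are removed by an implicit TRUNCATE at each commit rather than by an explicit TRUNCATE statement.

  CREATE TEMPORARY TABLE scratch (
      id   bigint,
      data text
  ) ON COMMIT DELETE ROWS;

  BEGIN;
  INSERT INTO scratch VALUES (1, 'visible only until commit');
  COMMIT;          -- scratch is empty again after the commit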
Re: [PERFORM] temporary tables, indexes, and query plans
If you alter default_statistics_target or any of the specific statistics targets (via ALTER TABLE ... SET STATISTICS), the change will not have an effect until an ANALYZE is performed. This is implied by http://www.postgresql.org/docs/9.0/static/planner-stats.html and http://www.postgresql.org/docs/9.0/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET, but it might save questions like this if it were much more explicit.

On Wed, Oct 27, 2010 at 2:52 PM, Jon Nelson <jnelson+pg...@jamponi.net> wrote:
> On Wed, Oct 27, 2010 at 1:32 PM, Reid Thompson <reid.thomp...@ateb.com> wrote:
>> On Wed, 2010-10-27 at 13:23 -0500, Jon Nelson wrote:
>>> set it to 500 and restarted postgres.
>>
>> did you re-analyze?
>
> Not recently. I tried that, initially, and there was no improvement.
> I'll try it again now that I've set the stats to 500. The most recent
> experiment shows me that, unless I create whatever indexes I would like
> to see used *before* the large (first) update, then they just don't get
> used. At all. Why would I need to ANALYZE the table immediately
> following index creation? Isn't that part of the index creation process?
>
> Currently executing is a test where I place an ANALYZE foo after the
> COPY, first UPDATE, and first index, but before the other (much smaller)
> updates.
>
> ..
>
> Nope. The ANALYZE made no difference. This is what I just ran:
>
>   BEGIN;
>   CREATE TEMPORARY TABLE foo ...
>   COPY ...
>   UPDATE ...                               -- 1/3 of table, approx
>   CREATE INDEX foo_rowB_idx ON foo (rowB);
>   ANALYZE ...
>   -- queries from here to 'killed' use WHERE rowB = 'someval'
>   UPDATE ...                               -- 7 rows. seq scan!
>   UPDATE ...                               -- 242 rows, seq scan!
>   UPDATE ...                               -- 3700 rows, seq scan!
>   UPDATE ...                               -- 3100 rows, seq scan!
>   killed.
>
> --
> Jon
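A minimal sketch of the point above, using the table and column names from the test script in the thread: the larger target only influences statistics gathered afterwards, so an explicit ANALYZE is needed before the planner sees any difference.

  ALTER TABLE foo ALTER COLUMN rowB SET STATISTICS 500;
  ANALYZE foo;

  -- or, raising the session-wide default before re-analyzing:
  SET default_statistics_target = 500;
  ANALYZE foo;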
Re: [PERFORM] temporary tables, indexes, and query plans
Jason Pitts:
> RE: changing default_statistics_target (or via ALTER TABLE SET STATS)
> not taking effect until ANALYZE is performed.
>
> I did already know that, but it's probably good to put into this thread.
> However, you'll note that this is a temporary table created at the
> beginning of a transaction.

( giving up on replying to the group; the list will not accept my posts )

I've been following the thread so long I had forgotten that. I rather strongly doubt that ANALYZE can reach that table's contents inside that transaction, if you are creating, populating, and querying it all within that single transaction.
Re: [PERFORM] MVCC and Implications for (Near) Real-Time Application
On Oct 25, 2010, at 2:46 PM, Steve Wong wrote:
> Hi experts,
>
> I have a (near) real-time application in which inserts into the database
> need to be visible to queries from other threads with minimal delay. The
> inserts are triggered by real-time events and are therefore asynchronous
> (i.e. many performance tips I have read related to batch inserts or COPY
> do not apply here, since these events cannot be predicted or batched),
> and the inserted data need to be available within a couple of seconds to
> other threads (for example, an inserted row that only appears to other
> query threads 5 seconds or more after the insert is not acceptable).
> The delay should be under 2 seconds maximum; sub-1 second would be great.
>
> My questions are: (1) Does the MVCC architecture introduce significant
> delays between an insert by one thread and visibility by other threads
> (I am unclear about how multiple versions are collapsed or reconciled,
> as well as how different query threads see which version)? (2) Are there
> any available benchmarks that can measure this delay? (3) What are the
> relevant config parameters that will reduce this delay?

There is no way to know without testing whether your hardware, OS, database schema, and database load can meet your demands. However, there is no technical reason why PostgreSQL could not meet your timing goals: MVCC does not inherently introduce delays, although the PostgreSQL implementation requires a cleanup process which can introduce latency.

If you find that your current architecture is not up to the task, consider using LISTEN/NOTIFY with a payload (new in 9.0), which we are using for a similar live-update system.

Cheers,
M
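A minimal sketch of the LISTEN/NOTIFY-with-payload approach mentioned above (the channel name and payload are made up for illustration); pg_notify() is handy when the channel or payload is built dynamically, e.g. inside a trigger:

  -- in the inserting transaction (delivered to listeners at commit):
  NOTIFY new_rows, 'events:12345';
  -- equivalent form:
  SELECT pg_notify('new_rows', 'events:12345');

  -- in each listening session:
  LISTEN new_rows;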
Re: [PERFORM] MVCC and Implications for (Near) Real-Time Application
Steve Wong <powerpch...@yahoo.com> wrote:
> (1) Does the MVCC architecture introduce significant delays between
> insert by a thread and visibility by other threads (I am unclear about
> how multiple versions are collapsed or reconciled, as well as how
> different query threads are seeing which version)?

As soon as the inserting transaction commits, the inserted row is visible to new snapshots. If you are in an explicit transaction, the commit will have occurred before the return from the COMMIT request; otherwise it will have completed before the return from the INSERT request.

You will get a new snapshot for every statement in READ COMMITTED (or lower) transaction isolation. You will get a new snapshot for each database transaction in higher isolation levels.

-Kevin
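To make the visibility timing concrete, here is a hypothetical two-session sequence under the default READ COMMITTED isolation (the table name is made up):

  -- Session 1:
  BEGIN;
  INSERT INTO events VALUES (1, now());
  -- Session 2, while session 1 is still open (its snapshot predates the commit):
  SELECT count(*) FROM events;    -- does not include the new row
  -- Session 1:
  COMMIT;
  -- Session 2: the next statement takes a fresh snapshot
  SELECT count(*) FROM events;    -- now includes the new row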
Re: [PERFORM] typoed column name, but postgres didn't grump
Jon Nelson <jnelson+pg...@jamponi.net> writes:
> Initially, I mistyped the query, and one of the columns specified in
> the query doesn't exist; however, the query ran nonetheless. The actual
> query:
>
>   select gid
>   from t2, t3
>   where t2.name = t3.name
>     and t3.scope = 'city'
>     and t3.hierarchy = 'STANDARD'
>     and t2.adiv = t3.adiv
>   limit 1;
>
> However, there *is* no column 'name' in table 't2'.

This is the old automatic-cast-from-record-to-text-string issue, ie it treats this like (t2.*)::name. We've been over this a few times before, but it's not clear that we can make this throw an error without introducing unpleasant asymmetry into the casting behavior, as in you couldn't get the cast when you did want it.

BTW this seems pretty far off-topic for pgsql-performance.

			regards, tom lane
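In other words, a sketch of the interpretation described here (assuming, as in the thread, that t2 has no 'name' column and no matching name(t2) function exists), these spellings all end up meaning the same thing: the whole row cast to the type name.

  SELECT t2.name       FROM t2;
  SELECT name(t2)      FROM t2;
  SELECT (t2.*)::name  FROM t2;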
Re: [PERFORM] temporary tables, indexes, and query plans
On Wed, Oct 27, 2010 at 3:44 PM, Justin Pitts <justinpi...@gmail.com> wrote:
> Jason Pitts:
>> RE: changing default_statistics_target (or via ALTER TABLE SET STATS)
>> not taking effect until ANALYZE is performed.
>>
>> I did already know that, but it's probably good to put into this
>> thread. However, you'll note that this is a temporary table created at
>> the beginning of a transaction.
>
> ( giving up on replying to the group; the list will not accept my posts )

Evidently it's accepting some of them...

> I've been following the thread so long I had forgotten that. I rather
> strongly doubt that ANALYZE can reach that table's contents inside that
> transaction, if you are creating, populating, and querying it all within
> that single transaction.

Actually I don't think that's a problem, at least for a manual ANALYZE.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] typoed column name, but postgres didn't grump
Tom Lane <t...@sss.pgh.pa.us> wrote:
> BTW this seems pretty far off-topic for pgsql-performance.

It is once you understand what's happening. It was probably the 11+ minutes for the mistyped query run, versus the 28 ms without the typo, that led them to this list.

I remembered this as an issue that has come up before, but couldn't come up with good search criteria for finding the old thread before you posted. If you happen to have a reference or search criteria for a previous thread, could you post it? Otherwise, a brief explanation of why this is considered a feature worth keeping would be good. I know it has been explained before, but it just looks wrong, on the face of it.

Playing around with it a little, it seems like a rather annoying foot-gun which could confuse people and burn a lot of development time:

  test=# create domain make text;
  CREATE DOMAIN
  test=# create domain model text;
  CREATE DOMAIN
  test=# create table vehicle (id int primary key, make make);
  NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "vehicle_pkey" for table "vehicle"
  CREATE TABLE
  test=# insert into vehicle values (1,'Toyota'),(2,'Ford'),(3,'Rambler');
  INSERT 0 3
  test=# select v.make, v.model from vehicle v;
    make   |    model
  ---------+-------------
   Toyota  | (1,Toyota)
   Ford    | (2,Ford)
   Rambler | (3,Rambler)
  (3 rows)

If someone incorrectly thinks they've added a column, and the purported column name happens to match any character-based type or domain name, they can get a query which behaves in a rather unexpected way. In this simple query it's pretty easy to spot, but it could surface in a much more complex query. If a mistyped query runs for 11 days instead of 11 minutes, they may have a hard time spotting the problem. A typo like this could be particularly hazardous in a DELETE or UPDATE statement.

-Kevin
Re: [PERFORM] BBU Cache vs. spindles
Excerpts from Greg Smith's message of jue oct 21 14:04:17 -0300 2010:
> What I would like to do is beef up the documentation with some concrete
> examples of how to figure out if your cache and associated write path
> are working reliably or not. It should be possible to include "does this
> handle full page writes correctly?" in that test suite. Until we have
> something like that, I'm concerned that bugs in filesystem or controller
> handling may make full_page_writes unsafe even with a BBU, and we'd have
> no way for people to tell if that's true or not.

I think if you assume that there are bugs in the filesystem which you need to protect against, you are already hosed. I imagine there must be some filesystem bug that makes it safe to have full_page_writes=on but unsafe to have full_page_writes=off; but I'd probably discard those as a rare minority and thus not worth worrying about.

I agree it would be worth testing, though.

--
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PERFORM] typoed column name, but postgres didn't grump
[ please continue any further discussion in pgsql-bugs only ]

Kevin Grittner <kevin.gritt...@wicourts.gov> writes:
> Tom Lane <t...@sss.pgh.pa.us> wrote:
>> BTW this seems pretty far off-topic for pgsql-performance.
>
> It is once you understand what's happening. It was probably the 11+
> minutes for the mistyped query run, versus the 28 ms without the typo,
> that led them to this list.
>
> I remembered this as an issue that has come up before, but couldn't
> come up with good search criteria for finding the old thread before you
> posted. If you happen to have a reference or search criteria for a
> previous thread, could you post it? Otherwise, a brief explanation of
> why this is considered a feature worth keeping would be good. I know it
> has been explained before, but it just looks wrong, on the face of it.

What's going on here is an unpleasant interaction of several different features:

1. The notations a.b and b(a) are equivalent: either one can mean the column b of a table a, or an invocation of a function b() that takes a's composite type as parameter. This is an ancient PostQUEL-ism, but we've preserved it because it is helpful for things like emulating computed columns via functions.

2. The notation t(x) will be taken to mean x::t if there's no function t() taking x's type, but there is a cast from x's type to t. This is just as ancient as #1. It doesn't really add any functionality, but I believe we would break a whole lot of users' code if we took it away. Because of #1, this also means that x.t could mean x::t.

3. As of 8.4 or so, there are built-in casts available from pretty much any type (including composites) to all the built-in string types, viz text, varchar, bpchar, name.

Upshot is that t.name is a cast to type name if there's no column or user-defined function that can match the call. We've seen bug reports on this with respect to both the name and text cases, though I'm too lazy to trawl the archives for them just now.

So, if you want to throw an error for this, you have to choose which of these other things you want to break. I think if I had to pick a proposal, I'd say we should disable #2 for the specific case of casting a composite type to something else. The intentional uses I've seen were all scalar types; and before 8.4 there was no built-in functionality that such a call could match. If we slice off some other part of the functionality, we risk breaking apps that've worked for many years.

			regards, tom lane
Re: [PERFORM] BBU Cache vs. spindles
On Fri, 29 Oct 2010, Robert Haas wrote:
> On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> James Mansion <ja...@mansionfamily.plus.com> writes:
>>> Tom Lane wrote:
>>>> The other and probably worse problem is that there's no application
>>>> control over how soon changes to mmap'd pages get to disk. An msync
>>>> will flush them out, but the kernel is free to write dirty pages
>>>> sooner. So if they're depending for consistency on writes not
>>>> happening until msync, it's broken by design. (This is one of the big
>>>> reasons we don't use mmap'd space for Postgres disk buffers.)
>>>
>>> Well, I agree that it sucks for the reason you give - but you use
>>> write and that's *exactly* the same in terms of when it gets written,
>>> as when you update a byte on an mmap'd page.
>>
>> Uh, no, it is not. The difference is that we can update a byte in a
>> shared buffer, and know that it *isn't* getting written out before we
>> say so. If the buffer were mmap'd then we'd have no control over that,
>> which makes it mighty hard to obey the write-WAL-before-data paradigm.
>>
>> It's true that we don't know whether write() causes an immediate or
>> delayed disk write, but we generally don't care that much. What we do
>> care about is being able to ensure that a WAL write happens before the
>> data write, and with mmap we don't have control over that.
>
> Well, we COULD keep the data in shared buffers, and then copy it into
> an mmap()'d region rather than calling write(), but I'm not sure
> there's any advantage to it. Managing address space mappings is a pain
> in the butt.

Keep in mind that you have no way of knowing in what order the data in the mmap region gets written out to disk.

David Lang
Re: [PERFORM] MVCC and Implications for (Near) Real-Time Application
> My questions are: (1) Does the MVCC architecture introduce significant
> delays between insert by a thread and visibility by other threads

As said by others, once committed it is immediately visible to all.

> (2) Are there any available benchmarks that can measure this delay?

Since you will not be batching INSERTs, you will use 1 INSERT per transaction. If you use autocommit mode, that's it. If you don't, you will get a few extra network round trips after the INSERT, to send the COMMIT.

One INSERT is usually extremely fast unless you're short on RAM and the indexes that need updating need some disk seeking. Anyway, doing lots of INSERTs each in its own transaction is usually very low-throughput, because at each COMMIT, postgres must always be sure that all the data is actually written to the hard disks. So, depending on the speed of your hard disks, each COMMIT can take up to 10-20 milliseconds.

On a 7200rpm hard disk, it is absolutely impossible to do more than 7200 commits/minute if you want to be sure each time that the data really is written on the hard disk, unless:

- you use several threads (one disk write can group several commits from different connections, see the config file docs)
- you turn off synchronous_commit; in this case commit is instantaneous, but if your server loses power or crashes, the last few seconds of data may be lost (database integrity is still guaranteed, though)
- you use a battery-backed cache on your RAID controller; in this case "written to the hard disks" is replaced by "written to battery-backed RAM", which is a lot faster

If you don't use a battery-backed cache, place the xlog on a different RAID1 array than the tables/indexes; this allows committing of xlog records (which is the time-critical part) to proceed smoothly and not be disturbed by other I/O on the indexes/tables. Also consider tuning your bgwriter and checkpoints, after experimentation under realistic load conditions.

So, when you benchmark your application, if you get disappointing results, think about this...
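A sketch of the synchronous_commit trade-off mentioned above (the table name is hypothetical): it can be changed per session or per transaction, so only the traffic that can tolerate losing the last few seconds of commits after a crash needs to give up the flush-on-commit guarantee.

  SET synchronous_commit = off;            -- for the whole session

  -- or for a single transaction:
  BEGIN;
  SET LOCAL synchronous_commit = off;
  INSERT INTO events VALUES (2, now());
  COMMIT;                                  -- returns without waiting for the WAL flush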
[PERFORM] Slow Query- Bad Row Estimate
Unfortunately I have not received a response to this question. Is more information needed? Does anyone have any ideas why the estimates may be bad, or what I might be able to do to speed this up?

Thanks

From: Ozer, Pam
Sent: Tuesday, October 26, 2010 4:27 PM
To: 'pgsql-performance@postgresql.org'
Subject: Slow Query- Bad Row Estimate

I have the following query:

  select distinct Region.RegionShort as RegionShort, County.County as County
  from Region
  join PostalCodeRegionCountyCity
    on (PostalCodeRegionCountyCity.RegionId = Region.RegionId)
  join DealerGroupGeoCache
    on (DealerGroupGeoCache.RegionId = PostalCodeRegionCountyCity.RegionId)
   and (DealerGroupGeoCache.CountyId = PostalCodeRegionCountyCity.CountyId)
   and (DealerGroupGeoCache.CityId = PostalCodeRegionCountyCity.CityId)
  join County
    on (PostalCodeRegionCountyCity.CountyId = County.CountyId)
  where (DealerGroupGeoCache.DealerGroupId = 13)
    and (PostalCodeRegionCountyCity.RegionId = 5)

With the following explain:

  HashAggregate  (cost=6743.96..6747.36 rows=34 width=11) (actual time=854.407..854.425 rows=57 loops=1)
    ->  Nested Loop  (cost=0.00..6743.28 rows=34 width=11) (actual time=0.062..762.698 rows=163491 loops=1)
          ->  Nested Loop  (cost=0.00..6679.19 rows=34 width=11) (actual time=0.053..260.001 rows=163491 loops=1)
                ->  Index Scan using region_i00 on region  (cost=0.00..3.36 rows=1 width=5) (actual time=0.009..0.011 rows=1 loops=1)
                      Index Cond: (regionid = 5)
                ->  Merge Join  (cost=0.00..6672.43 rows=34 width=10) (actual time=0.040..189.654 rows=163491 loops=1)
                      Merge Cond: ((postalcoderegioncountycity.countyid = dealergroupgeocache.countyid) AND (postalcoderegioncountycity.cityid = dealergroupgeocache.cityid))
                      ->  Index Scan using postalcoderegioncountycity_i06 on postalcoderegioncountycity  (cost=0.00..716.05 rows=2616 width=10) (actual time=0.018..1.591 rows=2615 loops=1)
                            Index Cond: (regionid = 5)
                      ->  Index Scan using dealergroupgeocache_i01 on dealergroupgeocache  (cost=0.00..5719.56 rows=9055 width=10) (actual time=0.015..87.689 rows=163491 loops=1)
                            Index Cond: ((dealergroupgeocache.dealergroupid = 13) AND (dealergroupgeocache.regionid = 5))
          ->  Index Scan using county_i00 on county  (cost=0.00..1.77 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=163491)
                Index Cond: (county.countyid = dealergroupgeocache.countyid)
  Total runtime: 854.513 ms

The statistics have been updated recently, and that does not change the bad estimates. The DealerGroupGeoCache table has 765392 rows, and the query returns 57 rows.

I am not at all involved in the way the server is set up, so being able to change the settings is not very likely unless it will make a huge difference. Is there any way for me to speed up this query without changing the settings? If not, what changes do you think would be needed? We are currently running Postgres 8.4 with the following settings:

  shared_buffers = 500MB                  # min 128kB
  effective_cache_size = 1000MB
  max_connections = 100
  temp_buffers = 100MB
  work_mem = 100MB
  maintenance_work_mem = 500MB
  max_files_per_process = 1
  seq_page_cost = 1.0
  random_page_cost = 1.1
  cpu_tuple_cost = 0.1
  cpu_index_tuple_cost = 0.05
  cpu_operator_cost = 0.01
  default_statistics_target = 1000
  autovacuum_max_workers = 1
  #log_min_messages = DEBUG1
  #log_min_duration_statement = 1000
  #log_statement = all
  #log_temp_files = 128
  #log_lock_waits = on
  #log_line_prefix = '%m %u %d %h %p %i %c %l %s'
  #log_duration = on
  #debug_print_plan = on

Any help is appreciated,

Pam
Re: [PERFORM] BBU Cache vs. spindles
Tom Lane wrote:
> Uh, no, it is not. The difference is that we can update a byte in a
> shared buffer, and know that it *isn't* getting written out before we
> say so.

Well, I don't know where you got the idea I was referring to that sort of thing - it's the same as writing to a buffer before copying to the mmap'd area.

> It's true that we don't know whether write() causes an immediate or
> delayed disk write, but we generally don't care that much.

Which is what I was referring to.

> What we do care about is being able to ensure that a WAL write happens
> before the data write, and with mmap we don't have control over that.

I think you have just the same control either way, because you can only force ordering with an appropriate explicit sync, and in the absence of such a sync all bets are off for whether/when each disk page is written out. And if you can't ensure that the controller and disk are write-through, you'd better do a hardware cache flush too, right? A shame that so many systems have relatively poor handling of that hardware flush.
Re: [PERFORM] Slow Query- Bad Row Estimate
On 10/29/10 1:54 PM, Ozer, Pam wrote:
>  ->  Index Scan using dealergroupgeocache_i01 on dealergroupgeocache
>      (cost=0.00..5719.56 rows=9055 width=10)
>      (actual time=0.015..87.689 rows=163491 loops=1)

This appears to be your problem here.

a) when was dealergroupgeocache last ANALYZED?

b) try increasing the stats_target on dealergroupid and regionid, to say 500, and re-analyzing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
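A sketch of what suggestion (b) could look like, using the column names visible in the plan; the new targets only matter once the table is re-analyzed:

  ALTER TABLE dealergroupgeocache ALTER COLUMN dealergroupid SET STATISTICS 500;
  ALTER TABLE dealergroupgeocache ALTER COLUMN regionid      SET STATISTICS 500;
  ANALYZE dealergroupgeocache;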
Re: [PERFORM] BBU Cache vs. spindles
On Fri, 29 Oct 2010, James Mansion wrote:
> Tom Lane wrote:
>> Uh, no, it is not. The difference is that we can update a byte in a
>> shared buffer, and know that it *isn't* getting written out before we
>> say so.
>
> Well, I don't know where you got the idea I was referring to that sort
> of thing - it's the same as writing to a buffer before copying to the
> mmap'd area.
>
>> It's true that we don't know whether write() causes an immediate or
>> delayed disk write, but we generally don't care that much.
>
> Which is what I was referring to.
>
>> What we do care about is being able to ensure that a WAL write happens
>> before the data write, and with mmap we don't have control over that.
>
> I think you have just the same control either way, because you can only
> force ordering with an appropriate explicit sync, and in the absence of
> such a sync all bets are off for whether/when each disk page is written
> out. And if you can't ensure that the controller and disk are
> write-through, you'd better do a hardware cache flush too, right? A
> shame that so many systems have relatively poor handling of that
> hardware flush.

The issue is that when you update an mmap'd chunk of data, it could be written out immediately without you doing _anything_ (and thanks to multiple cores/threads, it could get written out while you are still in the middle of updating it). When you update an internal buffer and then write that, you know that nothing will hit the disk before you issue the write command.

David Lang
Re: [PERFORM] Slow Query- Bad Row Estimate
"Ozer, Pam" <po...@automotive.com> writes:
> Unfortunately I have not received a response to this question. Is more
> information needed? Does anyone have any ideas why the estimates may be
> bad, or what I might be able to do to speed this up?

The most likely explanation for the bad rowcount estimates is that there is correlation between the regionid/countyid/cityid columns, only the planner doesn't know it. Can you reformulate that data representation at all, or at least avoid depending on it as a join key?

			regards, tom lane
Re: [PERFORM] Slow Query- Bad Row Estimate
"Ozer, Pam" <po...@automotive.com> wrote:
> Is more information needed?

Table layouts of the tables involved (including indexes) would be interesting. A description of the machine would be useful, including OS, CPUs, RAM, and disk system.

I know you said you might have trouble changing the config, but some of these seem problematic.

> shared_buffers = 500MB
> effective_cache_size = 1000MB
> max_connections = 100
> temp_buffers = 100MB

So you will allow up to 10GB to be tied up in space reserved for temporary tables, but only expect to cache 1GB of your database? That hardly seems optimal.

> work_mem = 100MB

That could be another 10GB or more in work memory at any moment, if each connection was running a query which needed one work_mem allocation.

> seq_page_cost = 1.0
> random_page_cost = 1.1
> cpu_tuple_cost = 0.1
> cpu_index_tuple_cost = 0.05
> cpu_operator_cost = 0.01

Those settings are OK if the active portion of the database is fully cached. Is it?

> default_statistics_target = 1000

If plan times get long with complex queries, you might want to back that off; otherwise, OK.

> autovacuum_max_workers = 1

That seems like a bad idea. Allowing multiple workers helps reduce bloat and improve statistics. If autovacuum is affecting performance, you would be better off tweaking the autovacuum cost limits.

-Kevin
Re: [PERFORM] Slow Query- Bad Row Estimate
I am not sure what you mean by "reformulate the data representation." Do you mean that I have to join on all three columns?

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Friday, October 29, 2010 2:18 PM
To: Ozer, Pam
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate

"Ozer, Pam" <po...@automotive.com> writes:
> Unfortunately I have not received a response to this question. Is more
> information needed? Does anyone have any ideas why the estimates may be
> bad, or what I might be able to do to speed this up?

The most likely explanation for the bad rowcount estimates is that there is correlation between the regionid/countyid/cityid columns, only the planner doesn't know it. Can you reformulate that data representation at all, or at least avoid depending on it as a join key?

			regards, tom lane
Re: [PERFORM] Slow Query- Bad Row Estimate
I had just analyzed the dealergroupgeocache table. Wow. Thank you. That did the trick. Can you give me an explanation of how default_statistics_target works? I don't think I completely understand what it means when you set it to 500 instead of 1000.

Thanks

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Josh Berkus
Sent: Friday, October 29, 2010 2:10 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate

On 10/29/10 1:54 PM, Ozer, Pam wrote:
>  ->  Index Scan using dealergroupgeocache_i01 on dealergroupgeocache
>      (cost=0.00..5719.56 rows=9055 width=10)
>      (actual time=0.015..87.689 rows=163491 loops=1)

This appears to be your problem here.

a) when was dealergroupgeocache last ANALYZED?

b) try increasing the stats_target on dealergroupid and regionid, to say 500, and re-analyzing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [PERFORM] Slow Query- Bad Row Estimate
"Ozer, Pam" <po...@automotive.com> writes:
> I am not sure what you mean by "reformulate the data representation."
> Do you mean that I have to join on all three columns?

No, I was wondering if you could change things so that you join on just one column, instead of two that each tell part of the truth.

BTW, did you check your current statistics target? If it's small then raising it might possibly fix the problem by itself.

			regards, tom lane
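One way to check the current per-column targets (a value of -1 means the column falls back to default_statistics_target):

  SELECT attname, attstattarget
  FROM   pg_attribute
  WHERE  attrelid = 'dealergroupgeocache'::regclass
  AND    attnum > 0 AND NOT attisdropped;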
Re: [PERFORM] Slow Query- Bad Row Estimate
On 10/29/10 2:47 PM, Ozer, Pam wrote:
> I had just analyzed the dealergroupgeocache table. Wow. Thank you. That
> did the trick. Can you give me an explanation of how
> default_statistics_target works? I don't think I completely understand
> what it means when you set it to 500 instead of 1000.

You're already at 1000?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [PERFORM] Slow Query- Bad Row Estimate
Yes. The default statistics target was at 1000. So that would be what the column was using, correct?

-----Original Message-----
From: Josh Berkus [mailto:j...@agliodbs.com]
Sent: Friday, October 29, 2010 2:55 PM
To: Ozer, Pam
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate

On 10/29/10 2:47 PM, Ozer, Pam wrote:
> I had just analyzed the dealergroupgeocache table. Wow. Thank you. That
> did the trick. Can you give me an explanation of how
> default_statistics_target works? I don't think I completely understand
> what it means when you set it to 500 instead of 1000.

You're already at 1000?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [PERFORM] Slow Query- Bad Row Estimate
"Ozer, Pam" <po...@automotive.com> writes:
> Yes. The default statistics target was at 1000. So that would be what
> the column was using, correct?

But you evidently didn't have stats. Perhaps you have autovacuum turned off? What PG version is this anyway?

			regards, tom lane
Re: [PERFORM] Slow Query- Bad Row Estimate
It's 8.4. On the column, stats_target was -1 before I changed it. Autovacuum is set to on. I actually did a full analyze of the database and then ran the query again. So what am I missing?

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Friday, October 29, 2010 3:03 PM
To: Ozer, Pam
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate

"Ozer, Pam" <po...@automotive.com> writes:
> Yes. The default statistics target was at 1000. So that would be what
> the column was using, correct?

But you evidently didn't have stats. Perhaps you have autovacuum turned off? What PG version is this anyway?

			regards, tom lane