[PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Or, if you want to actually read that query plan, try: http://explain.depesz.com/s/qYq
Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
2010/2/10 Bryce Nesbitt bry...@obviously.com: Or, if you want to actually read that query plan, try: http://explain.depesz.com/s/qYq

Hello, check your work_mem settings. Hash join is very slow in your case.

Pavel
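[A minimal sketch of Pavel's suggestion: raise work_mem for the current session only and re-check the plan. The 256MB figure is an assumption for illustration, not a recommendation from the thread.]

SHOW work_mem;             -- current per-sort/per-hash limit
SET work_mem = '256MB';    -- affects only this session
EXPLAIN ANALYZE SELECT ...;  -- re-run the problem query from the thread
RESET work_mem;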
Re: [PERFORM] [GENERAL] PostgreSQL - case studies
On 10/02/2010 6:49, Scott Marlowe wrote:

Quick note, please stick to text-formatted email for the mailing list; it's the preferred format.

On Tue, Feb 9, 2010 at 9:09 PM, Jayadevan M jayadevan.maym...@ibsplc.com wrote: Hello all, apologies for the long mail. I work for a company that provides solutions mostly on a Java/Oracle platform. Recently we moved one of our products to PostgreSQL. The main reason was PostgreSQL's GIS capabilities and the inability of government departments (especially road/traffic) to spend a lot of money on such projects. This product is used to record details about accidents and related analysis (type of road, when/why etc.) with maps. Fortunately, even in India, an accident reporting application does not have to handle many tps :). So I can't say PostgreSQL's performance was really tested in this case. Later, I tested one screen of one of our products - load testing with JMeter. We tried it with Oracle, DB2, PostgreSQL and Ingres, and PostgreSQL easily out-performed the rest. We tried a transaction mix with 20+ SELECTs, updates, deletes and a few inserts.

Please note that benchmarking Oracle (and a few other commercial DBs) and then publishing those results without permission of Oracle is considered to be in breach of their contract. Yeah, another wonderful aspect of using Oracle. That said, and as someone who is not an Oracle licensee in any way, this mimics my experience that PostgreSQL is a match for Oracle, DB2, and most other databases in the simple, single-database-on-commodity-hardware scenario.

After a really good experience with the database, I subscribed to all PostgreSQL groups (my previous experience is all-Oracle) and reading these mails, I realized that many organizations are using plain, 'not customized' PostgreSQL for databases that handle critical applications. Since there is no company trying to 'sell' PostgreSQL, many of us are not aware of such cases.

Actually there are several companies that sell pgsql service, and some that sell customized versions: Red Hat, Command Prompt, EnterpriseDB, and so on.

Could some of you please share some info on such scenarios - where you are supporting/designing/developing databases that run into at least a few hundred GBs of data (I know, that is small by today's standards)?

There are other instances of folks on the list sharing this kind of info; you can find them by searching the archives. I've used pgsql for about 10 years for anywhere from a few megabytes to hundreds of gigabytes, and all kinds of applications. Where I currently work we have a main data store for a web app that is about 180 gigabytes and growing, running on three servers with Slony replication. We handle somewhere in the range of 10k to 20k queries per minute (a mix of 90% or so reads to 10% writes). Peak load can be into the 30k or higher requests/minute. The two big servers that handle this load are dual quad-core Opteron 2.1GHz machines with 32GB RAM and 16 15krpm SAS drives configured as 2 in RAID-1 for OS and pg_xlog, 2 hot spares, and 12 in a RAID-10 for the main data. The HW RAID controller is the Areca 1680, which is mostly stable, except for the occasional (once a year or so) hang problem which has been described, and which Areca has assured me they are working on. Our total downtime due to database outages in the last year or so has been 10 to 20 minutes, and that was due to a RAID card driver bug that hits us about once every 300 to 400 days.
The majority of the downtime has been waiting for our hosting provider to hit the big red switch and restart the main server. Our other pgsql servers provide the search facility, with a db size of around 300GB, and statistics at around ~1TB.

I am sure PostgreSQL has matured a lot more from the days when these case studies were posted. I went through the case studies at EnterpriseDB and similar vendors too. But those are customized PostgreSQL servers.

Not necessarily. They sell support more than anything, and the majority of customization is not for stability but for additional features, such as MPP queries or replication etc. The real issue you run into is that many people don't want to tip their hand that they are using pgsql, because it is a competitive advantage. It's inexpensive, capable, and relatively easy to use. If your competitor is convinced that Oracle or MSSQL Server with $240k in licensing each year is the best choice, and you're whipping them with pgsql, the last thing you want is for them to figure that out and switch.

Following on that subject, there are many apps in the world that are using PostgreSQL for their business. We are planning the design and deployment of a large PostgreSQL cluster for DWH-ODS-BI apps. We are documenting everything so the information can later be published in the PostgreSQL case studies section. We are using Slony-I for replication and PgBouncer for pooling.
Re: [PERFORM] PostgreSQL - case studies
Jayadevan M jayadevan.maym...@ibsplc.com wrote: Could some of you please share some info on such scenarios - where you are supporting/designing/developing databases that run into at least a few hundred GBs of data (I know, that is small by today's standards)?

I'm a database administrator for the Wisconsin Courts. We've got about 200 PostgreSQL database clusters on about 100 servers spread across the state. Databases range from tiny (a few MB) to 1.3 TB. Check out this for more info: http://www.pgcon.org/2009/schedule/events/129.en.html

I hope that helps. If you have any particular questions not answered by the above, just ask.

-Kevin
Re: [PERFORM] PostgreSQL - case studies
* Kevin Grittner (kevin.gritt...@wicourts.gov) wrote: Could some of you please share some info on such scenarios - where you are supporting/designing/developing databases that run into at least a few hundred GBs of data (I know, that is small by today's standards)?

Just saw this, so figured I'd comment:

tsf=> \l+
                                      List of databases
 Name | Owner    | Encoding | Collation   | Ctype       | Access privileges | Size    | Tablespace | Description
------+----------+----------+-------------+-------------+-------------------+---------+------------+-------------
 beac | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres      | 1724 GB | pg_default |

Doesn't look very pretty, but the point is that it's 1.7TB. There are a few other smaller databases on that system too. PG handles it quite well, though this is primarily for data-mining.

Thanks, Stephen
Re: [PERFORM] How exactly PostgreSQL allocates memory for its needs?
On 2/10/2010 12:10 AM, Anton Maksimenkov wrote: Can anybody briefly explain to me how each postgres process allocates memory for its needs? I mean, what is the biggest size of malloc() it may want? How many such chunks? What is the average size of allocations? I think that at first it allocates a contiguous piece of shared memory for shared buffers (rather big, hundreds of megabytes usually, in one chunk). What next? temp_buffers, work_mem, maintenance_work_mem - are they allocated as contiguous too? What about other needs? In what sizes are they typically allocated?

There is no short answer to this; you should read section 18 of the manual http://www.postgresql.org/docs/8.4/interactive/runtime-config.html specifically section 18.4 http://www.postgresql.org/docs/8.4/interactive/runtime-config-resource.html and the performance section of the wiki http://wiki.postgresql.org/wiki/Performance_Optimization Here is a link to an annotated postgresql.conf: http://www.pgcon.org/2008/schedule/attachments/44_annotated_gucs_draft1.pdf

Keep in mind each connection/client that connects to the server creates a new process on the server. Each one of the settings you list above is the max amount of memory each one of those sessions is allowed to consume.
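[A minimal sketch of that per-session behavior, viewed from psql; the values below are assumptions for illustration, not tuning advice.]

SHOW shared_buffers;        -- one shared region, allocated once at server start
SHOW work_mem;              -- per sort/hash operation, per session
SET work_mem = '32MB';      -- overrides the limit for the current session only
SET maintenance_work_mem = '256MB';  -- used by VACUUM, CREATE INDEX, etc.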
Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline
On Feb 10, 2010, at 1:37 AM, Greg Smith wrote: Jeff wrote: I'd done some testing a while ago on the schedulers and at the time deadline or noop smashed cfq. Now, it is 100% possible since then that they've made vast improvements to cfq and/or the VM to get better or similar performance. I recall a vintage of 2.6 where they severely messed up the VM. Glad I didn't upgrade to that one :) Here's the old post: http://archives.postgresql.org/pgsql-performance/2008-04/msg00155.php

pgiosim doesn't really mix writes into there though, does it? The mixed read/write situations are the ones where the scheduler stuff gets messy.

It has the ability to rewrite blocks randomly as well - but I honestly don't remember if I did that during my cfq/deadline test. I'd wager I didn't. Maybe I'll get some time to run some more tests on it in the next couple of days.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com

-- Jeff Trout j...@jefftrout.com http://www.stuarthamm.net/ http://www.dellsmartexitin.com/
Re: [PERFORM] Deferred constraint and delete performance
On Wednesday 10 February 2010 15:56:40 Tom Lane wrote: Franck Routier franck.rout...@axege.com writes: I am wondering if deferring foreign key constraints (instead of disabling them) would increase performance, compared to non-deferred constraints. No, it wouldn't make any noticeable difference AFAICS. It would postpone the work from end-of-statement to end-of-transaction, but not make the work happen any more (or less) efficiently.

It could make a difference if the transaction is rather long and updates the same row repeatedly, because of better cache usage. But I admit that's a bit of a constructed scenario (where one would likely run into trigger-queue size problems as well).

Andres
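[For reference, a minimal sketch of what "deferring" means here; table and constraint names are hypothetical. The FK checks are queued and run once at COMMIT instead of at the end of each statement.]

ALTER TABLE line_item
    ADD CONSTRAINT line_item_order_fk
    FOREIGN KEY (order_id) REFERENCES orders (order_id)
    DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS line_item_order_fk DEFERRED;
-- statements may transiently violate the FK; it is verified once, at COMMIT
DELETE FROM orders WHERE order_id = 42;
DELETE FROM line_item WHERE order_id = 42;
COMMIT;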
Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline
On Feb 9, 2010, at 10:37 PM, Greg Smith wrote: Jeff wrote: I'd done some testing a while ago on the schedulers and at the time deadline or noop smashed cfq. Now, it is 100% possible since then that they've made vast improvements to cfq and/or the VM to get better or similar performance. I recall a vintage of 2.6 where they severely messed up the VM. Glad I didn't upgrade to that one :) Here's the old post: http://archives.postgresql.org/pgsql-performance/2008-04/msg00155.php pgiosim doesn't really mix writes into there though, does it? The mixed read/write situations are the ones where the scheduler stuff gets messy.

Also, read/write mix performance depends on the file system, not just the scheduler. The block device readahead parameter can have a big impact too. If you test xfs, make sure you configure the 'allocsize' mount parameter properly as well. If there are any sequential reads or writes mixed in with other reads/writes, that can have a big impact on how fragmented the filesystem gets. Ext3 has several characteristics for writes that might favor cfq that other file systems do not. Features like delayed allocation, extents, and write barriers significantly change the pattern of writes seen by the I/O scheduler. In short, one scheduler may be best for one filesystem but a bad idea for others.

And then on top of that, it all depends on what type of DB you're running. Lots of small, fast, mostly-read queries? A large number of small writes? Large bulk writes? Large reporting queries? Different configuration and tuning is required to maximize performance for each. There is no single rule for Postgres on Linux that I can think of, other than never have ext3 in 'ordered' or 'journal' mode for your WAL on the same filesystem as your data.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
On Wed, Feb 10, 2010 at 3:29 AM, Bryce Nesbitt bry...@obviously.com wrote: Or, if you want to actually read that query plan, try: http://explain.depesz.com/s/qYq

Much better, though I prefer a text attachment... anyhow, I think the root of the problem may be that both of the subquery scans under the append node are seeing hundreds of times more rows than they're expecting, which is causing the planner to choose nested loops higher up that it otherwise might have preferred to implement in some other way. I'm not quite sure why, though.

...Robert
[PERFORM] perf problem with huge table
Hi all, I am trying to move my app from M$sql to PGsql, but I need a bit of help :)

On M$sql, I had certain tables that were made as follows (sorry, pseudo code):

contab_y: date, amt, uid
contab_ym: date, amt, uid
contab_ymd: date, amt, uid

and so on. This was used to consolidate (aggregate... btw sorry for my terrible English) the data. So basically I get:

contab_y: date = 2010, amt = 100, uid = 1
contab_ym: date = 2010-01, amt = 10, uid = 1; date = 2010-02, amt = 90, uid = 1
contab_ymd: date = 2010-01-01, amt = 1, uid = 1; and so on.

In that way, when I need to do a query for a long range (ie: 1 year) I just take the rows contained in contab_y; if I need a query for a couple of days, I can go to ymd; if I need to get some data for another timeframe, I can do some cool intersections between the different tables using some huge (but fast) queries.

Now, the matter is that this design is hard to maintain, and the tables are difficult to check. What I have tried is to go for a normal approach, using just one table that contains all the data, with some proper indexing. The issue is that this table can easily contain 100M rows :) That's why the other guys did all this work to speed up queries, splitting data across different tables and precalculating the sums.

I am here to ask for advice from the PGsql experts: what do you think I can do to better manage this situation? Are there some other cases I can take a look at? Maybe some documentation, or some technique that I don't know? Any advice is really appreciated!
Re: [PERFORM] perf problem with huge table
On 2/10/2010 5:13 PM, rama wrote: ... Now, the matter is that this design is hard to maintain, and the tables are difficult to check. What I have tried is to go for a normal approach, using just one table that contains all the data, with some proper indexing. The issue is that this table can easily contain 100M rows :) ... what do you think I can do to better manage this situation?

Look into table partitioning: http://www.postgresql.org/docs/current/static/ddl-partitioning.html

It's similar to what you are doing, but it simplifies the queries and the logic needed to access large data sets.
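[A minimal sketch of that approach on an 8.x-era server, using the inheritance-based partitioning the docs describe. All names are hypothetical, borrowed from rama's pseudo code; current PostgreSQL releases offer declarative PARTITION BY RANGE instead.]

CREATE TABLE contab (
    d   date    NOT NULL,
    amt numeric NOT NULL,
    uid integer NOT NULL
);

-- one child per month, with a CHECK constraint so the planner can skip it
CREATE TABLE contab_2010_01 (
    CHECK (d >= DATE '2010-01-01' AND d < DATE '2010-02-01')
) INHERITS (contab);
CREATE INDEX contab_2010_01_d ON contab_2010_01 (d);

-- with constraint_exclusion = on, this scans only the January child:
-- SELECT sum(amt) FROM contab WHERE d >= '2010-01-01' AND d < '2010-02-01';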
Re: [PERFORM] perf problem with huge table
Hi Rama

I'm actually looking at going in the other direction. I have an app using PG where we have a single table to which we just added a lot of data, and I'm ending up with many millions of rows, and I'm finding that the single-table schema simply doesn't scale.

In PG, table partitioning is only handled by the database for reads; for insert/update you need to do quite a lot of DIY (setting up triggers, etc.), so I am planning to just use named tables and generate the necessary DDL / DML in vanilla SQL, the same way that your older code does.

My experience is mostly with Oracle, which is not MVCC, so I've had to relearn some stuff:

- Oracle often answers simple queries (e.g. counts and max / min) using only the index, which is of course pre-sorted. PG has to go out and fetch the rows to see if they are still in scope, and if they are stored all over the place on disk it means an 8K random page fetch for each row. This means that adding an index to PG is not nearly the silver bullet that it can be with some non-MVCC databases.

- PG's indexes seem to be quite a bit larger than Oracle's, but that's gut feel, I haven't been doing true comparisons ... however, for my app I have limited myself to only two indexes on that table, and each index is larger (in disk space) than the table itself ... I have 60GB of data and 140GB of indexes :-)

- There is a lot of row turnover in my big table (I age out data) ... a big delete (millions of rows) in PG seems a bit more expensive to process than in Oracle, however PG is not nearly as sensitive to transaction sizes as Oracle is, so you can cheerfully throw out one big DELETE FROM foo WHERE ... and let the database chew on it.

I am interested to hear about your progress.

Cheers
Dave

On Wed, Feb 10, 2010 at 4:13 PM, rama rama.r...@tiscali.it wrote: ...
Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
That sure looks like the source of the problem to me too. I've seen similar behavior in queries not very different from that. It's hard to guess what the problem is exactly without having more knowledge of the data distribution in article_words, though.

Given the results of analyze, I'd try to run the deepest subquery and see if I could get the estimate to match reality, either by altering statistics targets, or by tweaking the query to give more information to the planner. For example, I'd check whether the number of expected rows from

SELECT context_key FROM article_words JOIN words USING (word_key) WHERE word = 'insider'

is much less accurate than the estimate for

SELECT context_key FROM article_words WHERE word_key = (whatever the actual word_key for 'insider' is)

Robert Haas robertmh...@gmail.com 02/10/10 2:31 PM: On Wed, Feb 10, 2010 at 3:29 AM, Bryce Nesbitt bry...@obviously.com wrote: Or, if you want to actually read that query plan, try: http://explain.depesz.com/s/qYq Much better, though I prefer a text attachment... anyhow, I think the root of the problem may be that both of the subquery scans under the append node are seeing hundreds of times more rows than they're expecting, which is causing the planner to choose nested loops higher up that it otherwise might have preferred to implement in some other way. I'm not quite sure why, though. ...Robert
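[A sketch of the first knob mentioned above: raising the statistics target on the misestimated column and re-checking the estimate. The column choice and the value 500 are assumptions for illustration; word_key = 3675 is the value from later in the thread.]

-- collect a larger sample for word_key, then see if the row estimate improves
ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 500;
ANALYZE article_words;
EXPLAIN ANALYZE SELECT context_key FROM article_words WHERE word_key = 3675;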
Re: [PERFORM] perf problem with huge table
On Wed, Feb 10, 2010 at 4:16 PM, Dave Crooke dcro...@gmail.com wrote: ... My experience is mostly with Oracle, which is not MVCC, so I've had to relearn some stuff:

Just a nit, but Oracle implements MVCC. 90% of the databases out there do.

... There is a lot of row turnover in my big table (I age out data) ... a big delete (millions of rows) in PG seems a bit more expensive to process than in Oracle, however PG is not nearly as sensitive to transaction sizes as Oracle is, so you can cheerfully throw out one big DELETE FROM foo WHERE ... and let the database chew on it.

I find partitioning pretty useful in this scenario if the data allows it. Aging out data just means dropping a partition rather than running a delete statement.
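[To make that concrete, a sketch against the hypothetical monthly partitions from earlier in the thread; dropping a child table is effectively instant and leaves no dead rows for vacuum to clean up.]

-- detach and drop an expired month instead of DELETEing millions of rows
ALTER TABLE contab_2009_01 NO INHERIT contab;
DROP TABLE contab_2009_01;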
Re: [PERFORM] perf problem with huge table
On Wed, Feb 10, 2010 at 5:30 PM, Jon Lewison jlewis...@gmail.com wrote: Just a nit, but Oracle implements MVCC. 90% of the databases out there do.

Sorry, I spoke imprecisely. What I meant was the difference in how the rows are stored internally. In Oracle, the main tablespace contains only the newest version of a row, which is (where possible) updated in place - queries in a transaction that can still see an older version have to pull it from the UNDO tablespace (rollback segments in Oracle 8 and older). In Postgres, all versions of all rows are in the main table, and have validity ranges associated with them (this version of this row existed between transaction ids x and y). Once a version goes out of scope, it has to be garbage collected by the vacuuming process so the space can be re-used.

In general, this means Oracle is faster *if* you're only doing lots of small transactions (consider how these different models handle an update to a single field in a single row), but it is more sensitive to the scale of transactions ... doing a really big transaction against a database with an OLTP workload can upset Oracle's digestion, as it causes a lot of UNDO lookups; PG's performance is a lot more predictable in this regard. Both models have benefits and drawbacks ... when designing a schema for performance it's important to understand these differences.

I find partitioning pretty useful in this scenario if the data allows it. Aging out data just means dropping a partition rather than a delete statement.

Forgot to say this - yes, absolutely agree ... dropping a table is a lot cheaper than a transactional delete. In general, I think partitioning is more important / beneficial with PG's style of MVCC than with Oracle or SQL Server (which I think is closer to Oracle than PG).

Cheers
Dave
Re: [PERFORM] perf problem with huge table
On Wed, Feb 10, 2010 at 4:48 PM, Dave Crooke dcro...@gmail.com wrote: ... doing a really big transaction against a database with an OLTP workload can upset Oracle's digestion, as it causes a lot of UNDO lookups; PG's performance is a lot more predictable in this regard. Both models have benefits and drawbacks ... when designing a schema for performance it's important to understand these differences.

Yes, absolutely. It's not unusual to see the UNDO tablespace increase in size by several gigs for a large bulk load.

Speaking of rollback segments: I'm assuming that since all storage for non-visible row versions is in the main table, PostgreSQL has no equivalent of an ORA-01555.

- Jon
Re: [PERFORM] perf problem with huge table
Actually, in a way it does: "No space left on device" or similar ;-)

Cheers
Dave

P.S. for those not familiar with Oracle, ORA-01555 translates to "your query / transaction is kinda old and I've forgotten the data, so I'm just going to throw an error at you now". If you're reading, your SELECT randomly fails; if you're writing, it forces a rollback of your transaction.

On Wed, Feb 10, 2010 at 6:09 PM, Jon Lewison jlewis...@gmail.com wrote: Speaking of rollback segments: I'm assuming that since all storage for non-visible row versions is in the main table, PostgreSQL has no equivalent of an ORA-01555. - Jon
Re: [PERFORM] perf problem with huge table
rama wrote: In that way, when I need to do a query for a long range (ie: 1 year) I just take the rows contained in contab_y; if I need a query for a couple of days, I can go to ymd; if I need to get some data for another timeframe, I can do some cool intersections between the different tables using some huge (but fast) queries. Now, the matter is that this design is hard to maintain, and the tables are difficult to check.

You sound like you're trying to implement something like materialized views one at a time; have you considered adopting the more general techniques used to maintain those, so that you're not doing custom development each time for the design?

http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
http://www.pgcon.org/2008/schedule/events/69.en.html

I think that sort of approach is more practical than it would have been for you in MySQL, so maybe this wasn't on your list of possibilities before.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
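[A rough illustration of the trigger-maintained approach those links describe; PostgreSQL of this era has no built-in MATERIALIZED VIEW. All names are hypothetical, reusing the contab sketch from earlier in the thread, and this ignores concurrent-update races.]

-- summary table standing in for contab_y
CREATE TABLE contab_y_mv (
    yr  integer PRIMARY KEY,
    amt numeric NOT NULL
);

CREATE OR REPLACE FUNCTION contab_y_mv_refresh() RETURNS trigger AS $$
BEGIN
    -- fold each new detail row into its year bucket
    UPDATE contab_y_mv SET amt = amt + NEW.amt
     WHERE yr = EXTRACT(year FROM NEW.d)::integer;
    IF NOT FOUND THEN
        INSERT INTO contab_y_mv VALUES (EXTRACT(year FROM NEW.d)::integer, NEW.amt);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER contab_y_mv_ins AFTER INSERT ON contab
    FOR EACH ROW EXECUTE PROCEDURE contab_y_mv_refresh();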
Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline
Scott Marlowe wrote: I'd love to see someone do a comparison of early to mid 2.6 kernels (2.6.18 like RHEL5) to very up to date 2.6 kernels. On fast hardware.

I'd be happy just to find fast hardware that works on every kernel from the RHEL5 2.6.18 up to the latest one without issues.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline
On Wed, 10 Feb 2010, Greg Smith wrote: Scott Marlowe wrote: I'd love to see someone do a comparison of early to mid 2.6 kernels (2.6.18 like RHEL5) to very up to date 2.6 kernels. On fast hardware. I'd be happy just to find fast hardware that works on every kernel from the RHEL5 2.6.18 up to the latest one without issues.

It depends on your definition of 'fast hardware'. I have boxes that were very fast at the time that work on all these kernels, but they wouldn't be considered fast by today's standards. Remember that there is a point release about every 3 months; 2.6.33 is about to be released, so this is a 3 x (33-18) = ~45 month old kernel. Hardware progresses a LOT in 4 years. Most of my new hardware has no problems with the old kernels as well, but once in a while I run into something that doesn't work.

David Lang
Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
If you guys succeed in making this class of query perform, you'll have beaten out the professional consulting firm we hired, which was all but useless! The query is usually slow, but particular combinations of words seem to make it obscenely slow. The query plans are now attached (sorry I did not start there: many lists reject attachments), or you can click on "text" at the query planner analysis site http://explain.depesz.com/s/qYq

Here's typical server load:

Tasks: 166 total, 1 running, 165 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.2%us, 0.9%sy, 0.0%ni, 86.5%id, 11.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 32966936k total, 32873860k used, 93076k free, 2080k buffers
Swap: 33554424k total, 472k used, 33553952k free, 30572904k cached

Configurations modified from the Postgres 8.3 defaults are:

listen_addresses = '10.100.2.11, 10.101.2.11'  # what IP address(es) to listen on;
port = 5432                       # (change requires restart)
max_connections = 400             # (change requires restart)
shared_buffers = 4096MB           # min 128kB or max_connections*16kB
work_mem = 16MB                   # min 64kB
max_fsm_pages = 50                # default:2 min:max_fsm_relations*16,6 bytes see:MAIN-5740
max_fsm_relations = 2700          # min 100, ~70 bytes each
checkpoint_segments = 20          # in logfile segments, min 1, 16MB each
random_page_cost = 2.0            # same scale as above
effective_cache_size = 28672MB
default_statistics_target = 150   # range 1-1000
log_destination = 'syslog'        # Valid values are combinations of
log_min_error_statement = error   # values in order of decreasing detail:
log_min_duration_statement = 5000 # -1 is disabled, 0 logs all statements
log_checkpoints = on              # default off
autovacuum_naptime = 5min         # time between autovacuum runs
escape_string_warning = off       # default:on (See bepress MAIN-4857)
standard_conforming_strings = off # default:off (See bepress MAIN-4857)

production=# EXPLAIN ANALYZE SELECT context_key FROM article_words JOIN words USING (word_key) WHERE word = 'insider';

Nested Loop (cost=0.00..23393.15 rows=23 width=4) (actual time=0.077..15.637 rows=4003 loops=1)
  -> Index Scan using words_word on words (cost=0.00..5.47 rows=1 width=4) (actual time=0.049..0.051 rows=1 loops=1)
       Index Cond: ((word)::text = 'insider'::text)
  -> Index Scan using article_words_wc on article_words (cost=0.00..23234.38 rows=12264 width=8) (actual time=0.020..7.237 rows=4003 loops=1)
       Index Cond: (article_words.word_key = words.word_key)
Total runtime: 19.776 ms

production=# EXPLAIN ANALYZE SELECT context_key FROM article_words WHERE word_key = 3675;

Index Scan using article_words_wc on article_words (cost=0.00..21433.53 rows=11309 width=4) (actual time=0.025..7.579 rows=4003 loops=1)
  Index Cond: (word_key = 3675)
Total runtime: 11.704 ms

production=# explain analyze select count(*) from words;

Aggregate (cost=263831.63..263831.64 rows=1 width=0) (actual time=35851.654..35851.655 rows=1 loops=1)
  -> Seq Scan on words (cost=0.00..229311.30 rows=13808130 width=0) (actual time=0.043..21281.124 rows=13808184 loops=1)
Total runtime: 35851.723 ms

production=# select count(*) from words;
13,808,184

production=# explain analyze select count(*) from article_words;

Aggregate (cost=5453242.40..5453242.41 rows=1 width=0) (actual time=776504.017..776504.018 rows=1 loops=1)
  -> Seq Scan on article_words (cost=0.00..4653453.52 rows=319915552 width=0) (actual time=0.034..438969.347 rows=319956663 loops=1)
Total runtime: 776504.177 ms

production=# select count(*) from article_words;
319,956,720

Aggregate (cost=139279.85..139279.86 rows=1 width=4) (actual time=502493.000..502493.001 rows=1 loops=1)
  -> Hash Join (cost=131314.31..139279.84 rows=2 width=4) (actual time=501787.397..502492.316 rows=622 loops=1)
        Hash Cond: (matview_82034.context_key = articles.context_key)
        -> Seq Scan on matview_82034 (cost=0.00..6322.20 rows=438220 width=4) (actual time=0.014..462.312 rows=438220 loops=1)
        -> Hash (cost=131314.30..131314.30 rows=1 width=16) (actual time=501553.755..501553.755 rows=622 loops=1)
              -> Nested Loop IN Join (cost=46291.79..131314.30 rows=1 width=16) (actual time=467.546..501550.735 rows=622 loops=1)
                    Join Filter: (a.context_key = articles.context_key)
                    -> Nested Loop (cost=46291.79..46323.15 rows=2 width=12) (actual time=179.760..303.474 rows=1473 loops=1)
                          -> Nested Loop (cost=46291.79..46314.35
Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline
da...@lang.hm wrote: most of my new hardware has no problems with the old kernels as well, but once in a while I run into something that doesn't work.

Quick survey just of what's within 20 feet of me:

-Primary desktop: 2 years old, requires 2.6.23 or later for SATA to work
-Server: 3 years old, requires 2.6.22 or later for the Areca card not to panic under load
-Laptops: both about 2 years old, and require 2.6.28 to work at all; mostly wireless issues, but some power management ones that impact the processor working right too, occasional SATA ones too.

I'm looking into a new primary desktop to step up to 8 HT cores; I fully expect it won't boot anything older than 2.6.28 and may take an even newer kernel just for the basic processor and disk parts to work.

We're kind of at a worst-case point right now for this sort of thing, on the tail side of the almost 3 year old RHEL5 using a 3.5 year old kernel as the standard for so many Linux server deployments. Until RHEL6 is ready to go, there's little motivation for the people who make server hardware to get all their drivers perfect in the newer kernels. Just after that ships will probably be a good time to do that sort of comparison, like it was possible to compare RHEL4 using 2.6.9 and RHEL5 with 2.6.18 easily in mid to late 2007, with many bits of high-performance hardware known to work well on each.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [PERFORM] [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
On Monday 08 February 2010 05:53:23 Robert Haas wrote: On Sun, Feb 7, 2010 at 10:09 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Andres Freund wrote: I personally think the fsync on the directory should be added to the stable branches - other opinions? If wanted I can prepare patches for that. Yeah, it seems there are two patches here -- one is the addition of fsync_fname() and the other is the fsync_prepare stuff. Andres, you want to take a crack at splitting this up?

I hope I didn't duplicate Greg's work, but I didn't hear back from him, so...

Everything before 8.1 is hopeless because cp is used there... I didn't see it worth replacing that. The patch applies cleanly for 8.1 to 8.4 and survives the regression tests.

Given pg's heavy commit model I didn't see a point in splitting the patch for 9.0 as well...

Andres

diff --git a/src/port/copydir.c b/src/port/copydir.c
index 72fbf36..b057ffa 100644
*** a/src/port/copydir.c
--- b/src/port/copydir.c
*** copydir(char *fromdir, char *todir, bool
*** 50,55 ****
--- 50,56 ----
  {
      DIR *xldir;
      struct dirent *xlde;
+     int dirfd;
      char fromfile[MAXPGPATH];
      char tofile[MAXPGPATH];

*** copydir(char *fromdir, char *todir, bool
*** 91,96 ****
--- 92,116 ----
  }
  FreeDir(xldir);
+
+ /*
+  * fsync the directory to make sure data has reached the
+  * disk. While needed by most filesystems, the window got bigger
+  * with newer ones like ext4.
+  */
+ dirfd = BasicOpenFile(todir,
+                       O_RDONLY | PG_BINARY,
+                       S_IRUSR | S_IWUSR);
+ if (dirfd == -1)
+     ereport(ERROR,
+             (errcode_for_file_access(),
+              errmsg("could not open directory for fsync \"%s\": %m", todir)));
+
+ if (pg_fsync(dirfd) == -1)
+     ereport(ERROR,
+             (errcode_for_file_access(),
+              errmsg("could not fsync directory \"%s\": %m", todir)));
+ close(dirfd);
  }

/*
Re: [PERFORM] How exactly PostgreSQL allocates memory for its needs?
On Wed, Feb 10, 2010 at 9:43 AM, Justin Graf jus...@magwerks.com wrote: Keep in mind each connection/client that connects to the server creates a new process on the server. Each one of the settings you list above is the max amount of memory each one of those sessions is allowed to consume.

It's even worse for work_mem (formerly sort_mem) in that each individual hash agg or sort can grab that much memory. A complex query with 4 sorts and 2 hash aggregates could chew through 6 x work_mem if it needed it. Which is why work_mem can be such a horrific foot gun.
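[One common way to defuse that foot gun, sketched here on the assumption of a single big reporting query: keep the global work_mem modest and raise it only where needed, scoped to one transaction.]

BEGIN;
SET LOCAL work_mem = '256MB';  -- value is an assumption; applies only inside this transaction
-- run the one big sort/hash-heavy query here
COMMIT;  -- work_mem reverts automatically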
Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Bryce Nesbitt bry...@obviously.com writes: The query plans are now attached (sorry I did not start there: many lists reject attachments). Or you can click on "text" at the query planner analysis site http://explain.depesz.com/s/qYq

At least some of the problem is the terrible quality of the rowcount estimates in the IN subquery, as you extracted here:

Nested Loop (cost=0.00..23393.15 rows=23 width=4) (actual time=0.077..15.637 rows=4003 loops=1)
  -> Index Scan using words_word on words (cost=0.00..5.47 rows=1 width=4) (actual time=0.049..0.051 rows=1 loops=1)
       Index Cond: ((word)::text = 'insider'::text)
  -> Index Scan using article_words_wc on article_words (cost=0.00..23234.38 rows=12264 width=8) (actual time=0.020..7.237 rows=4003 loops=1)
       Index Cond: (article_words.word_key = words.word_key)
Total runtime: 19.776 ms

Given that it estimated 1 row out of words (quite correctly) and 12264 rows out of each scan on article_words, you'd think that the join size estimate would be 12264, which would be off by only a factor of 3 from the true result. Instead it's 23, off by a factor of 200 :-(. Running a roughly similar test case here, I see that 8.4 gives significantly saner estimates, which I think is because of this patch: http://archives.postgresql.org/pgsql-committers/2008-10/msg00191.php

At the time I didn't want to risk back-patching it, because there were a lot of other changes in the same general area in 8.4. But it would be interesting to see what happens with your example if you patch 8.3 similarly. (Note: I think only the first diff hunk is relevant to 8.3.)

regards, tom lane