Re: [PERFORM] PostgreSQL 9.3.2 Performance tuning for 32 GB server
On Fri, Jan 31, 2014 at 8:55 AM, Julien Cigar jci...@ulb.ac.be wrote: On Sat, Jan 25, 2014 at 12:02:59PM +0530, RAMAKRISHNAN KANDASAMY wrote: Hi All, I have configured the below parameters for a 32 GB server. Is this correct? shared_buffers = 6GB going over 2GB probably doesn't help That is true on a 32-bit system. On a 64-bit system with 32GB of RAM, there is a lot of value potentially to be gained by having shared_buffers significantly higher than 2GB. It's considered a bad habit to change the cost settings, but I often raise the default cpu_tuple_cost to 0.08 (instead of 0.01) too. -- --Regards RAMAKRISHNAN KANDASAMY -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. -- Thomas John
Re: [PERFORM] PostgreSQL 9.3.2 Performance tuning for 32 GB server
On Sat, Jan 25, 2014 at 12:02:59PM +0530, RAMAKRISHNAN KANDASAMY wrote: Hi All, I have configured the below parameters for a 32 GB server. Is this correct? shared_buffers = 6GB going over 2GB probably doesn't help work_mem = 24MB maintenance_work_mem = 250MB work_mem depends a lot on your queries and the number of clients, but with 32GB RAM setting a default work_mem of 128MB would probably not hurt. Your maintenance_work_mem is too low; raise it to 2GB. effective_cache_size = 16GB if it's a dedicated server you can raise it to 24GB shared_preload_libraries = 'pg_stat_statements' pg_stat_statements.max = 1 pg_stat_statements.track = all wal_buffers = 8MB checkpoint_segments = 32 depends on your load; 10 is reasonable for light loads, 50 or 100 isn't uncommon for heavier ones. Keep in mind that every increase of 30 will cost you 1 gigabyte of disk space in pg_xlog and an extra ~2-5 minutes (depending on your I/O) of recovery time after a crash. checkpoint_completion_target = 0.9 It's considered a bad habit to change the cost settings, but I often raise the default cpu_tuple_cost to 0.08 (instead of 0.01) too. -- --Regards RAMAKRISHNAN KANDASAMY -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.
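Pulling the thread's suggestions together, a hedged postgresql.conf sketch for a dedicated 64-bit, 32 GB server (every value is a starting point to validate under your own workload, not a recommendation from any single poster):

# memory
shared_buffers = 6GB               # ~20% of RAM; worthwhile on a 64-bit build
work_mem = 128MB                   # per sort/hash node, per backend -- watch the total
maintenance_work_mem = 2GB         # speeds up VACUUM and index builds
effective_cache_size = 24GB        # dedicated box: roughly RAM minus shared_buffers
# checkpoints / WAL
wal_buffers = 8MB
checkpoint_segments = 32           # raise further for heavy write loads
checkpoint_completion_target = 0.9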
[PERFORM] PostgreSQL 9.3.2 Performance tuning for 32 GB server
Hi All, I have configured the below parameters for a 32 GB server. Is this correct? shared_buffers = 6GB work_mem = 24MB maintenance_work_mem = 250MB effective_cache_size = 16GB shared_preload_libraries = 'pg_stat_statements' pg_stat_statements.max = 1 pg_stat_statements.track = all wal_buffers = 8MB checkpoint_segments = 32 checkpoint_completion_target = 0.9 -- --Regards RAMAKRISHNAN KANDASAMY
[PERFORM] PostgreSQL 9.3.2 Performance issues
We have 64GB of memory on RHEL 6.4:
shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 1GB
effective_cache_size = 48GB
I found this list of recommended parameters for memory management in PostgreSQL. About shared_buffers:
- Below 2GB of RAM, set to 20% of total system memory.
- Below 32GB, set to 25% of total system memory.
- Above 32GB, set to 8GB.
About work_mem: this parameter can cause a huge speed-up if set properly, however it can use that amount of memory per planning node. Here are some recommendations to set it up:
- Start low: 32-64MB.
- Look for ‘temporary file’ lines in logs.
- Set to 2-3x the largest temp file.
About maintenance_work_mem, some recommendations were:
- 10% of system memory, up to 1GB.
- Maybe even higher if you are having VACUUM problems.
About effective_cache_size, guidelines suggested:
- Set to the amount of file system cache available.
- If you don't know, set it to 50% of total system memory.
We have real-time 24/7 data ingest processes running on our 9.3.2 database, 7TB in size. Do these settings look correct for 9.3.2? thanks
Re: [PERFORM] PostgreSQL 9.3.2 Performance issues
fburg...@radiantblue.com fburg...@radiantblue.com wrote: We have 64GB of Memory on RHEL 6.4 shared_buffers = 8GB work_mem = 64MB maintenance_work_mem = 1GB effective_cache_size = 48GB Do these settings look correct for 9.3.2? Maybe. What is your max_connections setting? I find that a good place to start with work_mem is to ignore the factors you quoted, and to set it somewhere near 25% of machine RAM divided by max_connections. It might be possible to go up from there, but monitor closely for peaks of activity which cause enough memory allocation to flush the OS cache and cause high disk read rates, killing performance until the cache repopulates. This can take a while, since the high disk read rates slow queries, causing more of them to compete, leading to higher total work_mem allocations, and thus preventing recovery from the performance degradation. In other words, setting this too high leads to unstable performance. It looks better than a lower setting until too many users hit Enter at about the same time, causing performance to collapse for a while. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
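As a worked example of that rule of thumb (the max_connections value is illustrative): with 64GB of RAM and max_connections = 100,

work_mem ~= (64GB * 0.25) / 100 = 16GB / 100 ~= 160MB

which is a starting point to adjust downward if peak activity starts flushing the OS cache.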
Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks
I am having the exact same problems. I reduced shared_buffers, as that seems to have done the trick in this thread. If things improve I'll post back and confirm.
Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks
On Thu, 14 Mar 2013 06:53:55 +0200, Jeff Janes jeff.ja...@gmail.com wrote: On Friday, March 8, 2013, Emre Hasegeli wrote: PostgreSQL writes several of the following logs during the problem, which I never saw before 9.2.3: LOG: process 4793 acquired ExclusiveLock on extension of relation 305605 of database 16396 after 2348.675 ms The key here is not that it is an ExclusiveLock, but rather that it is the relation extension lock. I don't think the extension lock is ever held across user code, or transaction boundaries, or anything like that. It is held over some small IOs. So if it blocked on that for over 2 seconds, you almost surely have some serious IO congestion. And this particular message is probably more a symptom of that congestion than anything else. You said you rolled back to 9.2.2 and the stalling is still there. Are you still seeing the log message, or are you now seeing silent stalls? Did you roll back all other changes that were made at the same time as the upgrade to 9.2.3 (kernel versions, filesystem changes/versions, etc.)? I did not try a different kernel or file system. It was not because of 9.2.3; the same problem occurred on both 9.2.2 and 9.2.3. Increasing max_connections makes it worse. It lasted almost 15 minutes the last time. There was not much disk utilization while it was happening; top was pointing out most of the CPU usage in the %sy column, and there was no IO wait. I saw allocstalls increasing in atop. There were a lot of slow insert statements in the logs besides the ExclusiveLock waits. We were using 64 GiB of shared buffers. RhodiumToad suggested reducing it on the IRC channel. It has not happened since then. It was a real problem for us. I could not find anything related to it. I cannot let it happen again in the production environment, but I would be happy to share more experience if it would help you fix it.
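While such a stall is in progress, the extension-lock waits are visible in pg_locks (a quick diagnostic sketch; 'extend' is the locktype PostgreSQL uses for relation extension):

SELECT locktype, relation::regclass AS rel, pid, granted
FROM pg_locks
WHERE locktype = 'extend';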
Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks
Emre, LOG: process 4793 acquired ExclusiveLock on extension of relation 305605 of database 16396 after 2348.675 ms The reason you're seeing that message is that you have log_lock_waits turned on. That message says that some process waited for 2.3 seconds to get a lock for expanding the size of relation 16396/305605, which is most likely an index. This is most likely due to changes in your application, or an increase in concurrent write activity. --Josh
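For reference, the logging behavior described here comes from settings along these lines (a minimal sketch; a lock wait is logged once it exceeds deadlock_timeout):

# postgresql.conf
log_lock_waits = on      # log when a lock wait exceeds deadlock_timeout
deadlock_timeout = 1s    # default; doubles as the lock-wait logging threshold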
[PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks
On Friday, March 8, 2013, Emre Hasegeli wrote: PostgreSQL writes several of the following logs during the problem, which I never saw before 9.2.3: LOG: process 4793 acquired ExclusiveLock on extension of relation 305605 of database 16396 after 2348.675 ms The key here is not that it is an ExclusiveLock, but rather that it is the relation extension lock. I don't think the extension lock is ever held across user code, or transaction boundaries, or anything like that. It is held over some small IOs. So if it blocked on that for over 2 seconds, you almost surely have some serious IO congestion. And this particular message is probably more a symptom of that congestion than anything else. You said you rolled back to 9.2.2 and the stalling is still there. Are you still seeing the log message, or are you now seeing silent stalls? Did you roll back all other changes that were made at the same time as the upgrade to 9.2.3 (kernel versions, filesystem changes/versions, etc.)? Cheers, Jeff
Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks
2013-03-08 13:27:16 +0200 Emre Hasegeli emre.haseg...@tart.com.tr: PostgreSQL writes several of the following logs during the problem, which I never saw before 9.2.3: LOG: process 4793 acquired ExclusiveLock on extension of relation 305605 of database 16396 after 2348.675 ms I tried:
* to downgrade to 9.2.2
* to disable autovacuum
* to disable synchronous commit
* to write less on the big tables
* to increase checkpoint segments
* to increase max connections
* to move pg_xlog to a separate disk
None of them helps to avoid the downtimes. I could not find anything related to it. Do you have any idea? Have you ever experienced something like this?
Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks
On Fri, 08 Mar 2013 20:39:45 +0200, AI Rumman rumman...@gmail.com wrote: Knowing your problem, I read the docs and found this: "Fix performance problems with autovacuum truncation in busy workloads (Jan Wieck). Truncation of empty pages at the end of a table requires exclusive lock, but autovacuum was coded to fail (and release the table lock) when there are conflicting lock requests. Under load, it is easily possible that truncation would never occur, resulting in table bloat. Fix by performing a partial truncation, releasing the lock, then attempting to re-acquire the lock and continue. This fix also greatly reduces the average time before autovacuum releases the lock after a conflicting request arrives." (http://www.postgresql.org/docs/9.2/static/release-9-2-3.html) This could be a reason for your locking. Yes, I saw this. It is commit b19e4250b45e91c9cbdd18d35ea6391ab5961c8d by Jan Wieck. He also seems worried in the commit message about this patch. Do you think this is the exact reason for the problem? I have downgraded to 9.2.2, decreased autovacuum_max_workers to 2 from 10, and increased max_connections to 500 from 200 in the meantime. There have not been any ExclusiveLocks since then.
Re: [PERFORM] Postgresql Replication Performance
Thank you all for the valuable information. Now we have decided to go with streaming replication. I did the setup on a machine and it is working well. Now I have to implement automatic failover. Please share a solution for this. Saurabh Gupta
Re: [PERFORM] Postgresql Replication Performance
On 01/04/2012 12:41 PM, Saurabh wrote: Thank you all for the valuable information. Now we have decided to go with streaming replication. I did the setup on a machine and it is working well. Now I have to implement automatic failover. Please share a solution for this. Saurabh Gupta You can use pgpool-II for automatic failover and connection caching. This article is good enough: http://pgpool.projects.postgresql.org/contrib_docs/simple_sr_setting/index.html Also do not forget to configure Postgres max_connections to at least (pgpool) num_init_children * max_pool if you'll use the connection cache. -- Best regards Aleksej Trofimov
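To make that sizing rule concrete (the numbers are illustrative, not from the original post): with pgpool-II set to num_init_children = 32 and max_pool = 4, Postgres must accept at least 32 * 4 = 128 backends:

# pgpool.conf
num_init_children = 32   # concurrent client connections pgpool accepts
max_pool = 4             # cached backend connections per pgpool child

# postgresql.conf
max_connections = 128    # >= num_init_children * max_pool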
Re: [PERFORM] Postgresql Replication Performance
On Thu, Dec 29, 2011 at 3:33 AM, sgupta saurabh@gmail.com wrote: I am doing a POC on PostgreSQL replication. I am using the latest version of PostgreSQL, i.e. 9.1. There are multiple replication solutions available in the market (PGCluster, Pgpool-II, Slony-I). PostgreSQL also provides built-in replication solutions (streaming replication, warm standby and hot standby). I am confused about which solution is best for the financial application for which I am doing the POC. The application will write around 160 million records with a row size of 2.5 KB to the database. My question is, for the following scenarios, which replication solution will be suitable: If I would require replication for backup purposes only If I would require to scale the reads If I would require High Availability and Consistency Also it will be very helpful if you can share performance numbers or experience with PostgreSQL replication solutions. The built-in HS/SR integrates with the postgres engine (over the WAL system) at a very low level and is going to be generally faster and more robust. More importantly, it has a very low administrative overhead -- the underlying mechanism of log shipping has been tweaked and refined continually since PITR was released in 8.0. Once you've done it a few times, it's a five minute procedure to replicate a database (not counting, heh, the base database copy). The main disadvantage of HS/SR is inflexibility: you get an exact replica of a database cluster. Slony (which is a trigger-based system) and pgpool (which is statement replication) can do a lot of funky things that HS/SR can't do -- so they definitely fill a niche depending on what your requirements are. merlin
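The low administrative overhead mentioned here amounts to only a handful of settings; a minimal 9.1-era sketch (the hostname, user, and segment count are placeholders):

# primary postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 128    # keep enough WAL around for the standby to catch up

# standby postgresql.conf
hot_standby = on           # allow read-only queries on the standby

# standby recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'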
Re: [PERFORM] Postgresql Replication Performance
On 12/29/2011 11:33 AM, sgupta wrote: I am doing a POC on PostgreSQL replication. I am using the latest version of PostgreSQL, i.e. 9.1. There are multiple replication solutions available in the market (PGCluster, Pgpool-II, Slony-I). PostgreSQL also provides built-in replication solutions (streaming replication, warm standby and hot standby). I am confused about which solution is best for the financial application for which I am doing the POC. The application will write around 160 million records with a row size of 2.5 KB to the database. My question is, for the following scenarios, which replication solution will be suitable: If I would require replication for backup purposes only If I would require to scale the reads If I would require High Availability and Consistency Also it will be very helpful if you can share performance numbers or experience with PostgreSQL replication solutions. Thanks Which replication solution will be suitable depends on your needs and database architecture. Starting with PGCluster, I can only say that PGCluster was last released in 2005, so you cannot use it with Postgres 9.0 =) Slony-I is a good solution if you want to have cascading replication from slave to slave or you want to replicate only several parts of your database (because Slony performs table-level replication). PGPool-II is an advanced load balancer and pooling solution, which also has replication support. Pgpool-II is a query-based replication utility, which performs queries on several database servers. If you are looking for performance and stability I do not recommend using PGPool as replication software. Postgres streaming replication is WAL-based replication, so using this type of replication you will have absolutely identical database servers, which is the best choice for HA and scaling reads. This choice also has practically no performance impact, because it does not add any latency to the database layer. You could also read about the difference between Slony and streaming replication here: http://scanningpages.wordpress.com/2010/10/09/9-0-streaming-replication-vs-slony/ -- Best regards Aleksej Trofimov
Re: [PERFORM] Postgresql Replication Performance
On 12/29/2011 05:00 PM, Claudio Freire wrote: Second, WAL activity on streaming replication or WAL shipping is documented to contain more data than on non-replicated setups. What is not clear is how much more data. This not only affects our network bandwidth estimations, but also I/O load on the master server, slowing writes (and some reads that cannot happen on the slave). Our database has about 2MB/s of writes on the WAL array; we had about 160 IOPS on average when replication was switched off, and 165-170 IOPS with replication. This, I think, could be explained by statistical error, so we have not experienced any added I/O load on our master server since replication was configured. -- Best regards Aleksej Trofimov
[PERFORM] Postgresql Replication Performance
I am doing a POC on PostgreSQL replication. I am using the latest version of PostgreSQL, i.e. 9.1. There are multiple replication solutions available in the market (PGCluster, Pgpool-II, Slony-I). PostgreSQL also provides built-in replication solutions (streaming replication, warm standby and hot standby). I am confused about which solution is best for the financial application for which I am doing the POC. The application will write around 160 million records with a row size of 2.5 KB to the database. My question is, for the following scenarios, which replication solution will be suitable: If I would require replication for backup purposes only If I would require to scale the reads If I would require High Availability and Consistency Also it will be very helpful if you can share performance numbers or experience with PostgreSQL replication solutions. Thanks
Re: [PERFORM] Postgresql Replication Performance
On Thu, Dec 29, 2011 at 11:33 AM, Aleksej Trofimov aleksej.trofi...@ruptela.lt wrote: Postgres streaming replication is WAL-based replication, so using this type of replication you will have absolutely identical database servers, which is the best choice for HA and scaling reads. This choice also has practically no performance impact, because it does not add any latency to the database layer. Let me chime in, because I'm in a similar situation. I'm preparing a POC WAL-replicated environment, and testing up until now has been inconclusive since we lack the kind of hardware in our test environment. I know I should require it, since testing on similar hardware is the only way to get reliable results, but getting the budget approved would take way too long, and right now we're in a hurry to scale reads. So getting the hardware is not an option; my option is asking those who have the experience :-) I gather WAL replication introduces only a few possible bottlenecks. First, network bandwidth between master and slaves, and my app does write a lot - our monitoring tools show, today, an average of 1MB/s of writes on the WAL array, with peaks exceeding 8MB/s, which can easily saturate our lowly 100Mb/s links. No worries, we can upgrade to 1Gb/s links. Second, WAL activity on streaming replication or WAL shipping is documented to contain more data than on non-replicated setups. What is not clear is how much more data. This not only affects our network bandwidth estimations, but also I/O load on the master server, slowing writes (and some reads that cannot happen on the slave). So, my question is, in your experience, how much of an increase in WAL activity can be expected?
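A quick sanity check on those numbers (straightforward unit conversion): the 8MB/s peak is 64Mb/s on the wire, uncomfortably close to a 100Mb/s link's practical ceiling, while the 1MB/s average is only 8Mb/s:

8 MB/s * 8 bits/byte = 64 Mb/s   # peak WAL rate vs. a 100 Mb/s link
1 MB/s * 8 bits/byte =  8 Mb/s   # average WAL rate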
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Tue, Aug 4, 2009 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: There are a handful of other compression algorithms very similar to LZO in performance / compression level under various licenses. LZO is just the best known and most widely used. And after we get done with the license question, we need to ask about patents. The compression area is just a minefield of patents. gzip is known to avoid all older patents (and would be pretty solid prior art against newer ones). I'm far less confident about lesser-known systems. I did a little bit of research. LZO and friends are variants of LZW. The main LZW patent died in 2003, and AFAIK there have been no patent enforcement cases brought against LZO or its cousins (LZO dates to 1996). OK, I'm no attorney, etc., but the internet seems to believe that the algorithms are patent free. LZO is quite widely used, in both open source and some relatively high profile commercial projects. I downloaded the libraries and did some tests on a 2.5G sql dump:
compression time:
zlib:   4m 1s
lzo:    17s
fastlz: 28.8s
liblzf: 26.7s
compression size:
zlib:   609M (75%)
lzo:    948M (62%)
fastlz: 936M (62.5%)
liblzf: 916M (63.5%)
A couple of quick notes: liblzf produces (possibly) architecture-dependent archives according to its header, and fastlz is not declared 'stable' according to its website. merlin
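Timings like these can be reproduced with commands along the following lines (a sketch; dump.sql stands in for your own dump file, and the lzop binary has to be installed separately):

# wall-clock compression time, output discarded
time gzip -c dump.sql > /dev/null
time lzop -c dump.sql > /dev/null
# compressed size in bytes
gzip -c dump.sql | wc -c
lzop -c dump.sql | wc -c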
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 8/5/09 7:12 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Aug 4, 2009 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: There are a handful of other compression algorithms very similar to LZO in performance / compression level under various licenses. LZO is just the best known and most widely used. And after we get done with the license question, we need to ask about patents. The compression area is just a minefield of patents. gzip is known to avoid all older patents (and would be pretty solid prior art against newer ones). I'm far less confident about lesser-known systems. I did a little bit of research. LZO and friends are variants of LZW. The main LZW patent died in 2003, and AFAIK there have been no patent enforcement cases brought against LZO or its cousins (LZO dates to 1996). OK, I'm no attorney, etc., but the internet seems to believe that the algorithms are patent free. LZO is quite widely used, in both open source and some relatively high profile commercial projects. That doesn't sound right to me; LZW is patent protected in a few ways, and is an LZ78 scheme. LZO, zlib, and the others here are LZ77 schemes, which avoid the LZW patents. There are some other patents in the territory with respect to how the hash lookups are done for the LZ77 'sliding window' approach. Most notably, using a tree is patented, and a couple of other (obvious) tricks that are generally avoided anyway for any algorithms that are trying to be fast rather than produce the highest compression.
http://en.wikipedia.org/wiki/Lossless_data_compression#Historical_legal_issues
http://en.wikipedia.org/wiki/LZ77_and_LZ78
http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch
http://www.faqs.org/faqs/compression-faq/part1/section-7.html
http://www.ross.net/compression/patents.html
Note, US patents last either 17 years after grant, or 20 years after filing. A very large chunk of those in this space have expired, but a few were filed/granted in the early 90's -- though those are generally more specific and easy to avoid, or very obvious duplicates of previous patents. More notably, one of these, if interpreted broadly, would apply to zlib as well (Gibson and Graybill), but the patent mentions LZRW1, and any broader scope would have prior art conflicts with ones that are now long expired. It's 17 years after grant on that, but not 20 years after filing. I downloaded the libraries and did some tests on a 2.5G sql dump:
compression time:
zlib:   4m 1s
lzo:    17s
fastlz: 28.8s
liblzf: 26.7s
compression size:
zlib:   609M (75%)
lzo:    948M (62%)
fastlz: 936M (62.5%)
liblzf: 916M (63.5%)
Interesting how that conflicts with some other benchmarks out there (where LZO and the others are about the same). But they're all an order of magnitude faster than gzip/zlib. A couple of quick notes: liblzf produces (possibly) architecture-dependent archives according to its header, and fastlz is not declared 'stable' according to its website. merlin
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Mon, Aug 3, 2009 at 5:30 PM, PFC li...@peufeu.com wrote: lzo is much, much, (much) faster than zlib. decompression speed is even more awesome... Note, I've tried several times to contact the author to get clarification on licensing terms and have been unable to get a response. lzop and the LZO library are distributed under the terms of the GNU General Public License (GPL). source: http://www.lzop.org/lzop_man.php yeah...I have another project I'm working on that is closed source, plus I was curious if something could be worked out for pg...lzo seems ideal for database usage. The author is MIA or too busy hacking to answer his email :-). merlin
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 8/4/09 8:30 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Merlin Moncure wrote: On Mon, Aug 3, 2009 at 5:30 PM, PFC li...@peufeu.com wrote: lzo is much, much, (much) faster than zlib. decompression speed is even more awesome... Note, I've tried several times to contact the author to get clarification on licensing terms and have been unable to get a response. lzop and the LZO library are distributed under the terms of the GNU General Public License (GPL). source: http://www.lzop.org/lzop_man.php yeah...I have another project I'm working on that is closed source, plus I was curious if something could be worked out for pg...lzo seems ideal for database usage. I think this was already discussed here. It turns out that a specific exception for PG wouldn't be acceptable because of the multiple commercial derivatives. LZO would have to become BSD, which presumably the author just doesn't want to do. Maybe we could have a --enable-lzo switch similar to what we do with readline. Of course, a non-LZO-enabled build would not be able to read a dump from such a build. (We could also consider LZO for TOAST compression). There are a handful of other compression algorithms very similar to LZO in performance / compression level under various licenses. LZO is just the best known and most widely used. http://www.fastlz.org/ (MIT) http://www.quicklz.com/ (GPL again) http://oldhome.schmorp.de/marc/liblzf.html (BSD-ish) ZFS uses LZJB (CDDL), source code here: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/compress.c (a good read for one of the most simple LZ compression algorithms in terms of lines of code -- about 100 lines) Fastlz, with its MIT license, is probably the most obvious choice. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey sc...@richrelevance.com writes: There are a handful of other compression algorithms very similar to LZO in performance / compression level under various licenses. LZO is just the best known and most widely used. And after we get done with the license question, we need to ask about patents. The compression area is just a minefield of patents. gzip is known to avoid all older patents (and would be pretty solid prior art against newer ones). I'm far less confident about lesser-known systems. regards, tom lane
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 7/31/09 4:01 PM, PFC li...@peufeu.com wrote: On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: I did some tracing and verified that pg_dump passes data to deflate() one table row at a time. I'm not sure about the performance implications of that, but it does seem like it might be something to look into. I suspect if this was a problem the zlib people would have added internal buffering ages ago. I find it hard to believe we're not the first application to use it this way. I dug into this a bit more. zlib *does* have internal buffering --- it has to, because it needs a minimum lookahead of several hundred bytes to ensure that compression works properly. The per-call overhead of deflate() looks a bit higher than one could wish when submitting short chunks, but oprofile shows that pg_dump -Fc breaks down about like this: During dump (size of dump is 2.6 GB):
No Compression:
- postgres at 70-100% CPU and pg_dump at something like 10-20% - dual core is useful (a bit...)
- dump size 2.6G
- dump time 2m25.288s
Compression Level 1:
- postgres at 70-100% CPU and pg_dump at 20%-100% - dual core is definitely useful
- dump size 544MB
- dump time 2m33.337s
Since this box is mostly idle right now, eating CPU for compression is no problem... I get very different (contradictory) behavior. Server with fast RAID, 32GB RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2, 8.3.6. No disk wait time during any test. One test beforehand was used to prime the disk cache. 100% CPU in the below means one core fully used; 800% means the system is fully loaded.
pg_dump > file (on a subset of the DB with lots of tables with small tuples): 6m 27s, 4.9GB; 12.9MB/sec; 50% CPU in postgres, 50% CPU in pg_dump
pg_dump -Fc > file.gz: 9m 6s, output is 768M (6.53x compression); 9.18MB/sec; 30% CPU in postgres, 70% CPU in pg_dump
pg_dump | gzip > file.2.gz: 6m 22s, 13MB/sec; 50% CPU in postgres, 50% CPU in pg_dump, 50% CPU in gzip
The default (5) compression level was used. So, when using pg_dump alone, I could not get significantly more than one core of CPU (all on the same box). No matter how I tried, pg_dump plus the postgres process dumping data always totaled about 102% -- it would fluctuate in top, give or take 15% at times, but the two were always very close (within 3%) of this total. Piping the whole thing to gzip gets some speedup. This indicates that perhaps the implementation or use of gzip is inappropriate on pg_dump's side, or the library version is older or slower. Alternatively, the use of gzip inside pg_dump fails to pipeline CPU usage as well as piping it does, as the above shows 50% more CPU utilization when piping. I can do the same test with a single table that is 10GB later (which does dump much faster than 13MB/sec and has rows that average about 500 bytes in size). But overall I have found pg_dump's performance sorely lacking, and this is a data risk in the big picture. Postgres is very good about not losing data, but that only goes up to the limits of the hardware and OS, which is not good enough. Because of long disaster recovery times and poor replication/contingency features, it is a fairly unsafe place for data once it gets beyond a certain size and a BC plan requires minimal downtime. Adding an option to use LZO instead of gzip could be useful...
Compressing the uncompressed 2.6GB dump:
- gzip -1: compressed size 565 MB; compression throughput 28.5 MB/s; decompression throughput 74 MB/s
- LZO -1: compressed size 696M; compression throughput 86 MB/s; decompression throughput 247 MB/s
Conclusion: LZO could help for fast disks (RAID) or slow disks on a CPU-starved server... LZO would be a great option; it is very fast, especially decompression. With gzip, one rarely gains by going below gzip -3 or above gzip -6.
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey sc...@richrelevance.com writes: I get very different (contradictory) behavior. Server with fast RAID, 32GB RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2, 8.3.6. No disk wait time during any test. One test beforehand was used to prime the disk cache. 100% CPU in the below means one core fully used; 800% means the system is fully loaded.
pg_dump > file (on a subset of the DB with lots of tables with small tuples): 6m 27s, 4.9GB; 12.9MB/sec; 50% CPU in postgres, 50% CPU in pg_dump
pg_dump -Fc > file.gz: 9m 6s, output is 768M (6.53x compression); 9.18MB/sec; 30% CPU in postgres, 70% CPU in pg_dump
pg_dump | gzip > file.2.gz: 6m 22s, 13MB/sec; 50% CPU in postgres, 50% CPU in pg_dump, 50% CPU in gzip
I don't see anything very contradictory here. What you're demonstrating is that it's nice to be able to throw a third CPU at the compression part of the problem. That's likely to remain true if we shift to a different compression algorithm. I suspect if you substituted lzo for gzip in the third case, the picture wouldn't change very much. regards, tom lane
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 8/3/09 11:56 AM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: I get very different (contradictory) behavior. Server with fast RAID, 32GB RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2, 8.3.6. No disk wait time during any test. One test beforehand was used to prime the disk cache. 100% CPU in the below means one core fully used; 800% means the system is fully loaded.
pg_dump > file (on a subset of the DB with lots of tables with small tuples): 6m 27s, 4.9GB; 12.9MB/sec; 50% CPU in postgres, 50% CPU in pg_dump
pg_dump -Fc > file.gz: 9m 6s, output is 768M (6.53x compression); 9.18MB/sec; 30% CPU in postgres, 70% CPU in pg_dump
pg_dump | gzip > file.2.gz: 6m 22s, 13MB/sec; 50% CPU in postgres, 50% CPU in pg_dump, 50% CPU in gzip
I don't see anything very contradictory here. The other poster got nearly 2 CPUs of work from just pg_dump + postgres. That contradicts my results (but could be due to data differences or postgres version differences). In the other use case, compression was not slower, but just used more CPU (also contradicting my results). What you're demonstrating is that it's nice to be able to throw a third CPU at the compression part of the problem. No, 1.5 CPUs. Full use of a second would even be great. I'm also demonstrating that there is some artificial bottleneck somewhere preventing postgres and pg_dump from operating concurrently. Instead, one waits while the other does work. Your claim earlier in this thread was that there was already pipelined work being done due to pg_dump + postgresql -- which seems to be true for the other test case but not mine. As a consequence, adding compression throttles the postgres process even though the compression hasn't caused 100% CPU (or close) on any task involved. That's likely to remain true if we shift to a different compression algorithm. I suspect if you substituted lzo for gzip in the third case, the picture wouldn't change very much. That is exactly the point. LZO would be nice (and would help mitigate this problem), but it doesn't solve the real problem here. pg_dump is slow and artificially throttled without even getting 100% CPU from itself or postgres. The problem still remains: dumping with -Fc can be significantly slower than raw piped to a compression utility, even if no task is CPU or I/O bound. Dumping and piping to gzip is faster. But parallel restore won't work without the custom or raw format. regards, tom lane
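One way to get both the external-compressor speedup and parallel restore (a sketch of the workaround implied above; database names and the job count are placeholders): dump in custom format with compression disabled, compress out-of-band, and decompress again before restoring with multiple jobs.

pg_dump -Fc -Z0 mydb | lzop -c > mydb.dump.lzo   # custom format, compression done externally
lzop -d -c mydb.dump.lzo > mydb.dump             # pg_restore needs a seekable file
pg_restore --jobs=4 -d mydb_new mydb.dump        # 8.4 parallel restore requires a custom-format archive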
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
I get very different (contradictory) behavior. Server with fast RAID, 32GB RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2, 8.3.6. That's a very different setup from my (much less powerful) box, so that would explain it... No disk wait time during any test. One test beforehand was used to prime the disk cache. 100% CPU in the below means one core fully used; 800% means the system is fully loaded. pg_dump > file (on a subset of the DB with lots of tables with small tuples): 6m 27s, 4.9GB; 12.9MB/sec; 50% CPU in postgres, 50% CPU in pg_dump If there is no disk wait time, then why do you get 50/50 and not 100/100, or at least 1 core maxed out? That's interesting...
COPY annonces TO '/dev/null';
COPY 413526
Temps : 13871,093 ms
\copy annonces to '/dev/null'
Temps : 14037,946 ms
time pg_dump -Fc -t annonces -U annonces --compress=0 annonces > /dev/null
real    0m14.596s
user    0m0.700s
sys     0m0.372s
In all 3 cases postgres maxes out one core (I've repeated the test until all data was cached, so there is no disk access at all in vmstat). Size of dump is 312MB.
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Mon, Aug 3, 2009 at 2:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't see anything very contradictory here. What you're demonstrating is that it's nice to be able to throw a third CPU at the compression part of the problem. That's likely to remain true if we shift to a different compression algorithm. I suspect if you substituted lzo for gzip in the third case, the picture wouldn't change very much. lzo is much, much, (much) faster than zlib. Note, I've tried several times to contact the author to get clarification on licensing terms and have been unable to get a response.
[r...@devdb merlin]# time lzop -c dump.sql > /dev/null
real    0m16.683s
user    0m15.573s
sys     0m0.939s
[r...@devdb merlin]# time gzip -c dump.sql > /dev/null
real    3m43.090s
user    3m41.471s
sys     0m1.036s
merlin
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
lzo is much, much, (much) faster than zlib. decompression speed is even more awesome... Note, I've tried several times to contact the author to get clarification on licensing terms and have been unable to get a response. lzop and the LZO library are distributed under the terms of the GNU General Public License (GPL). source: http://www.lzop.org/lzop_man.php
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey wrote: On 7/30/09 11:24 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I did some checking with the DBA who normally copies these around for development and test environments. He confirmed that when the source and target are on the same machine, a pg_dump piped to psql takes about two hours. If he pipes across the network, it runs more like three hours. My pg_dump to custom format ran for six hours. The single-transaction restore from that dump file took two hours, with both on the same machine. I can confirm with benchmarks, but this guy generally knows what he's talking about (and we do create a lot of development and test databases this way). Either the compression is tripling the dump time, or there is something inefficient about how pg_dump writes to the disk. Seems about right - compression in pg_dump -Fc is a serious bottleneck, and unless we can significantly speed it up or make it use multiple cores (either for the dump itself - which would be awesome - or for the compression) I would recommend not using it at all. That's not an option when a dump compressed is 200GB and uncompressed is 1.3TB, for example. Yeah, that was not meant as "don't use compression at all" but rather as "use a different way to compress than what pg_dump provides internally". Stefan
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
That's true. I tried lowering work_mem from 192MB to 64MB, and it caused a total slowdown. By the way, are there any performance tips for tuning joins? I noticed that my joins slowed down on 8.4; on 8.3 they were a bit faster. 2009/7/31 Scott Marlowe scott.marl...@gmail.com: On Thu, Jul 30, 2009 at 10:10 PM, Greg Smith gsm...@gregsmith.com wrote: On Thu, 30 Jul 2009, Rauan Maemirov wrote: maintenance_work_mem = 1GB work_mem = 192MB shared_buffers = 7680MB max_connections = 80 My box is Nehalem 2xQuad 2.8 with RAM 32Gb While it looks like you sorted out your issue downthread, I wanted to point out that your setting for work_mem could be dangerously high here and contribute to problems. The real danger here is that you can set up your pg server to fail ONLY under heavy load, when it runs out of memory and goes into a swap storm. So, without proper load testing and profiling, you may not know you're headed for danger until your server goes unresponsive midday at the most critical of times. And restarting it will just lead to the same failure again as the clients all reconnect and pummel your server. Meanwhile, going from 192 to 16MB might result in a total slowdown measured in a fraction of a percentage overall, and prevent this kind of failure. If there's one single request you can justify big work_mem for, then set it for just that one query. It's not uncommon to have a reporting user limited to a few connections and with alter user reportinguser set work_mem='512MB'; so that it can run fast but not deplete your server's resources by accident during heavy load.
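The per-user override mentioned above looks like this in practice (the role name is illustrative); a per-query override inside a transaction is another option:

-- give one reporting role a larger sort/hash budget
ALTER ROLE reportinguser SET work_mem = '512MB';

-- or raise it for a single query only
BEGIN;
SET LOCAL work_mem = '512MB';
-- ... run the big reporting query here ...
COMMIT;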
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Thu, Jul 30, 2009 at 10:07 AM, Rauan Maemirov ra...@maemirov.com wrote: Unfortunately had to downgrade back to 8.3. Now having troubles with that and still solving them. For future upgrade, what are the basic steps? Was the database analyzed recently? Hm... there was something like an auto-analyzer in the server log when I started it the first time, but I didn't pay attention to it. Should I analyze the whole db? How to do it? And how should I change the _cost variables? I/O was very high. At first memory usage grew and then it began to fill swap. There is at least one known case of a memory leak in 8.4.0. Possibly you got hit by that or another early-adopter bug. I think in your case it's probably too soon to have upgraded... with 10k connections even minor annoyances can be real nasty. I'd wait a few months while in the meantime staging your app on a testing database and double-checking the important query plans to make sure there are no big performance regressions. Each version of pg has a couple, for various reasons. merlin
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Greg Stark gsst...@mit.edu writes: On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: I did some tracing and verified that pg_dump passes data to deflate() one table row at a time. I'm not sure about the performance implications of that, but it does seem like it might be something to look into. I suspect if this was a problem the zlib people would have added internal buffering ages ago. I find it hard to believe we're not the first application to use it this way. I dug into this a bit more. zlib *does* have internal buffering --- it has to, because it needs a minimum lookahead of several hundred bytes to ensure that compression works properly. The per-call overhead of deflate() looks a bit higher than one could wish when submitting short chunks, but oprofile shows that pg_dump -Fc breaks down about like this:
samples   %        image name     symbol name
1103922   74.7760  libz.so.1.2.3  longest_match
215433    14.5927  libz.so.1.2.3  deflate_slow
55368      3.7504  libz.so.1.2.3  compress_block
41715      2.8256  libz.so.1.2.3  fill_window
17535      1.1878  libc-2.9.so    memcpy
13663      0.9255  libz.so.1.2.3  adler32
4613       0.3125  libc-2.9.so    _int_malloc
2942       0.1993  libc-2.9.so    free
2552       0.1729  libc-2.9.so    malloc
2155       0.1460  libz.so.1.2.3  pqdownheap
2128       0.1441  libc-2.9.so    _int_free
1702       0.1153  libz.so.1.2.3  deflate
1648       0.1116  libc-2.9.so    mempcpy
longest_match is the core lookahead routine and is not going to be affected by submission sizes, because it isn't called unless adequate data (ie, the longest possible match length) is available in zlib's internal buffer. It's possible that doing more buffering on our end would reduce the deflate_slow component somewhat, but it looks like the most we could hope to get that way is in the range of 10% speedup. So I'm wondering if anyone can provide concrete evidence of large wins from buffering zlib's input. regards, tom lane
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: I did some tracing and verified that pg_dump passes data to deflate() one table row at a time. I'm not sure about the performance implications of that, but it does seem like it might be something to look into. I suspect if this was a problem the zlib people would have added internal buffering ages ago. I find it hard to believe we're not the first application to use it this way. I dug into this a bit more. zlib *does* have internal buffering --- it has to, because it needs a minimum lookahead of several hundred bytes to ensure that compression works properly. The per-call overhead of deflate() looks a bit higher than one could wish when submitting short chunks, but oprofile shows that pg_dump -Fc breaks down about like this: During dump (size of dump is 2.6 GB):
No Compression:
- postgres at 70-100% CPU and pg_dump at something like 10-20% - dual core is useful (a bit...)
- dump size 2.6G
- dump time 2m25.288s
Compression Level 1:
- postgres at 70-100% CPU and pg_dump at 20%-100% - dual core is definitely useful
- dump size 544MB
- dump time 2m33.337s
Since this box is mostly idle right now, eating CPU for compression is no problem... Adding an option to use LZO instead of gzip could be useful...
Compressing the uncompressed 2.6GB dump:
- gzip -1: compressed size 565 MB; compression throughput 28.5 MB/s; decompression throughput 74 MB/s
- LZO -1: compressed size 696M; compression throughput 86 MB/s; decompression throughput 247 MB/s
Conclusion: LZO could help for fast disks (RAID) or slow disks on a CPU-starved server...
[PERFORM] PostgreSQL 8.4 performance tuning questions
Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune before and everything worked fine for me. And now I have ~93% cpu load. Here are the changed values of the config:
default_statistics_target = 50
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 192MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7680MB
max_connections = 80
My box is Nehalem 2xQuad 2.8 with 32Gb of RAM, and there's only postgresql working on it. For connection pooling I'm using pgbouncer's latest version with pool_size 20 (used 30 before, but now lowered) and 10k connections. What parameters should I give more attention to?
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune before and everything worked fine for me. And now I have ~93% cpu load. Here are the changed values of the config: default_statistics_target = 50 maintenance_work_mem = 1GB constraint_exclusion = on checkpoint_completion_target = 0.9 effective_cache_size = 22GB work_mem = 192MB wal_buffers = 8MB checkpoint_segments = 16 shared_buffers = 7680MB max_connections = 80 My box is Nehalem 2xQuad 2.8 with 32Gb of RAM, and there's only postgresql working on it. For connection pooling I'm using pgbouncer's latest version with pool_size 20 (used 30 before, but now lowered) and 10k connections. What parameters should I give more attention to? All the values seem quite reasonable to me. What about the _cost variables? I guess one or more queries are evaluated using a different execution plan, probably sequential scan instead of index scan, hash join instead of merge join, or something like that. Try to log the slow statements - see log_min_duration_statement. That might give you slow queries (although not necessarily the ones causing problems), and you can analyze them. What is the general I/O activity? Is there a lot of data read/written to the disks, is there a lot of I/O wait? regards Tomas PS: Was the database analyzed recently?
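The slow-statement logging suggested here is a one-line setting (the 500 ms threshold is an illustrative choice):

# postgresql.conf
log_min_duration_statement = 500   # log any statement that runs longer than 500 ms
                                   # 0 logs everything; -1 (the default) disables it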
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Unfortunately had to downgrade back to 8.3. Now having troubles with that and still solving them. For future upgrade, what are the basic steps?
1. create database
2. dump the data from the old database
3. load the data into the new database
4. analyze etc. (I prefer to do this manually at the beginning)
5. check that everything is working (that the correct execution plans are used, etc.)
You may even run (2) and (3) at once - use a pipe instead of a file. Was the database analyzed recently? Hm... there was something like an auto-analyzer in the server log when I started it the first time, but I didn't pay attention to it. Should I analyze the whole db? How to do it? Just execute 'ANALYZE' and the whole database will be analyzed, but when the autovacuum daemon is running this should be performed automatically (I guess - check pg_stat_user_tables; there's information about the last manual/automatic vacuuming and/or analysis). And how should I change the _cost variables? As far as I can see you've not modified those variables, so don't change them. I/O was very high. At first memory usage grew and then it began to fill swap. OK, this seems to be the cause. What were the original values of the config variables? If you've lowered the work_mem and you need to sort a lot of data, this may be a problem. What amounts of data are you working with? If the data were not analyzed recently, the execution plans will be inefficient and this may be the result. regards Tomas
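Checking when tables were last vacuumed and analyzed, per the suggestion above (a minimal query against the standard statistics view):

SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;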
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Unfortunately had to downgrade back to 8.3. Now having troubles with that and still solving them. For future upgrade, what are the basic steps? Was the database analyzed recently? Hm... there was something like an auto-analyzer in the server log when I started it the first time, but I didn't pay attention to it. Should I analyze the whole db? How to do it? And how should I change the _cost variables? I/O was very high. At first memory usage grew and then it began to fill swap. 2009/7/30 t...@fuzzy.cz: Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune before and everything worked fine for me. And now I have ~93% cpu load. Here are the changed values of the config: default_statistics_target = 50 maintenance_work_mem = 1GB constraint_exclusion = on checkpoint_completion_target = 0.9 effective_cache_size = 22GB work_mem = 192MB wal_buffers = 8MB checkpoint_segments = 16 shared_buffers = 7680MB max_connections = 80 My box is Nehalem 2xQuad 2.8 with 32Gb of RAM, and there's only postgresql working on it. For connection pooling I'm using pgbouncer's latest version with pool_size 20 (used 30 before, but now lowered) and 10k connections. What parameters should I give more attention to? All the values seem quite reasonable to me. What about the _cost variables? I guess one or more queries are evaluated using a different execution plan, probably sequential scan instead of index scan, hash join instead of merge join, or something like that. Try to log the slow statements - see log_min_duration_statement. That might give you slow queries (although not necessarily the ones causing problems), and you can analyze them. What is the general I/O activity? Is there a lot of data read/written to the disks, is there a lot of I/O wait? regards Tomas PS: Was the database analyzed recently?
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
OK, this seems to be the cause. What were the original values of the config variables? If you've lowered work_mem and you need to sort a lot of data, this may be a problem. What amounts of data are you working with? If the data were not analyzed recently, the execution plans will be inefficient and this may be the result. The reason is that I'm using the config I used before, and it worked perfectly. work_mem is 192MB. I tried ANALYZE, but it didn't change anything. Amounts of data... at least, the backup is over ~2.2GB. I tried EXPLAIN ANALYZE on the slow queries I got from the serverlog, but that didn't help either. 2009/7/30 t...@fuzzy.cz: Unfortunately I had to downgrade back to 8.3. Now having troubles with that and still solving them. For future upgrade, what are the basic steps?
1. create database
2. dump the data from the old database
3. load the data into the new database
4. analyze etc. (I prefer to do this manually at the beginning)
5. check that everything is working (that the correct execution plans are used, etc.)
You may even run (2) and (3) at once - use a pipe instead of a file. Was the database analyzed recently? Hm... there was something like an auto analyzer in the serverlog when I started it the first time, but I didn't pay attention to it. Should I analyze the whole db? How do I do it? Just execute 'ANALYZE' and the whole database will be analyzed; when the autovacuum daemon is running this should be performed automatically (I guess - check pg_stat_user_tables, there's information about the last manual/automatic vacuuming and/or analysis). And how should I change the _cost variables? As far as I can see you haven't modified those variables, so don't change them. I/O was very high: at first memory usage grew and then the system began to swap heavily. OK, this seems to be the cause. What were the original values of the config variables? If you've lowered work_mem and you need to sort a lot of data, this may be a problem. What amounts of data are you working with? If the data were not analyzed recently, the execution plans will be inefficient and this may be the result. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
My additional comments: t...@fuzzy.cz wrote: ... For future upgrade, what are the basic steps? 0. Create a test database - work out bugs and performance issues before going live. 1. create database ...cluster. You only need to create the individual database if the options you select for the dump do not create the database(s). 2. dump the data from the old database ...using the dump tools from the *new* version. With several cores, you might want to consider using the binary dump options in pg_dump if you want to use the new parallel restore feature in pg_restore, with a possibly dramatic increase in restore speed (benchmarks I've seen suggest that with 8 cores you may even see an almost 8x restore speedup, so it's worth the effort). The manual suggests that setting --jobs to the number of cores on the server is a good first approximation. See the -Fc option on pg_dump and the --jobs option in pg_restore for details. Cheers, Steve
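A sketch of that dump-and-parallel-restore flow; the file name, database names, and the job count are assumptions for illustration:

pg_dump -Fc -f mydb.dump mydb              # custom-format dump, made with the *new* version's pg_dump
pg_restore --jobs=8 -d newdb mydb.dump     # one job per core as a first approximation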
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Steve Crawford scrawf...@pinpointresearch.com wrote: benchmarks I've seen suggest that with 8 cores you may even see an almost 8x restore speedup I'm curious what sort of data in what environment showed that ratio. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Kevin Grittner wrote: Steve Crawford scrawf...@pinpointresearch.com wrote: benchmarks I've seen suggest that with 8 cores you may even see an almost 8x restore speedup I'm curious what sort of data in what environment showed that ratio. I was going from memory of a presentation I watched. Reports on the web have shown anything from a 3x increase using 8 cores to other non-detailed reports of up to 8x improvement. If you have one big table, don't expect much if any improvement. If you have lots of smaller tables/indexes then parallel restore will probably benefit you. This is all based on the not-atypical assumption that your restore will be CPU bound. I don't think parallel restore will be much use beyond the point you hit IO limits. Cheers, Steve
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Thu, 30 Jul 2009, Kevin Grittner wrote: Steve Crawford scrawf...@pinpointresearch.com wrote: benchmarks I've seen suggest that with 8 cores you may even see an almost 8x restore speedup I'm curious what sort of data in what environment showed that ratio. It depends on a lot of things. However, certainly for index creation, tests on servers over here have indicated that running four CREATE INDEX statements at a time runs four times as fast, assuming the table fits in maintenance_work_mem. Matthew -- I have an inferiority complex. But it's not a very good one. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
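A sketch of what that looks like in practice: independent CREATE INDEX statements issued at the same time from separate psql sessions. The table and index names are invented for illustration:

-- psql session 1:
CREATE INDEX orders_customer_idx ON orders (customer_id);
-- psql session 2, running concurrently:
CREATE INDEX orders_date_idx ON orders (order_date);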
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Matthew Wakeling matt...@flymine.org wrote: tests on servers over here have indicated that running four CREATE INDEX statements at a time runs four times as fast, assuming the table fits in maintenance_work_mem. I'm benchmarking a patch to the parallel restore, and just out of curiosity I've been comparing the multi-job approach, with various numbers of jobs, to a restore within a single database transaction; and I'm seeing (on serious production-quality servers) the parallel restore run in 55% to 75% of the time of a restore running off the same dump file using the -1 switch. The 16 processor machine got the best results, running with anywhere from 12 to 20 jobs. The 2 processor machine got the lesser benefit, running with 2 to 4 jobs. (The exact number of jobs really didn't make a difference big enough to emerge from the noise.) I've got 431 user tables with 578 indexes in a database which, freshly restored, is 70GB. (That's 91GB with the fragmentation and reasonable dead space we have in production.) Real production data; nothing synthetic. Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. I'm sure there are cases where people don't have the option to pipe things through, or that there may sometimes be big enough savings in the multiple jobs to pay off, even without overlapping the dump and restore, and with the necessity to write and read the data an extra time; but there are clearly situations where the piped approach is faster. We may want to try to characterize the conditions under which each is a win, so we can better target our advice. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
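The comparison Tom suggests is easy to script; the file and database names here are illustrative:

time pg_dump -Fc -Z0 -f mydb.z0.dump mydb    # custom format, compression disabled
time pg_dump -Fc -f mydb.zdef.dump mydb      # custom format, default compression, for comparison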
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Tom Lane t...@sss.pgh.pa.us wrote: Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I'm not sure without benchmarking that. I was writing to the same RAID as the database I was dumping, so contention was probably a significant issue. But it would be interesting to compare different permutations to see what impact each has alone and in combination. I'm OK with setting up a benchmark run each night for a while, to shake out what I can, on this and the artificial cases. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I did some checking with the DBA who normally copies these around for development and test environments. He confirmed that when the source and target are on the same machine, a pg_dump piped to psql takes about two hours. If he pipes across the network, it runs more like three hours. My pg_dump to custom format ran for six hours. The single-transaction restore from that dump file took two hours, with both on the same machine. I can confirm with benchmarks, but this guy generally knows what he's talking about (and we do create a lot of development and test databases this way). Either the compression is tripling the dump time, or there is something inefficient about how pg_dump writes to the disk. All of this is on a RAID 5 array with 5 drives using xfs with noatime,nobarrier and a 256MB BBU controller. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Tom Lane wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I always dump with -Z0 (and compress afterwards, or even in a pipe to get two cores busy) because otherwise custom dump times are simply ridiculous. However, Kevin is onto something here - on the typical 4-8 core boxes I tested, I managed to get around a cores/2 speedup for the restore, which means that for a pure upgrade, or for testing similar to what Kevin is doing, custom dumps + parallel restore might result in no win or even a loss. On one of our datasets I did some benchmarking a while ago (for those who attended Bruce's pg_migrator talk @pgcon, these are the same numbers):
* 150GB database (on-disk - ~100GB as a plain text dump)
time to dump (-Z0): 120min
time to restore (single threaded): 180min
time to restore (-j 16): 59min
However, the problem is that this does not actually mean that parallel restore shaves ~120min off your dump/restore time, because you get the following real runtimes:
plain text dump + single threaded restore in a pipe: 188min
custom dump to file + parallel restore: 179min
This is without compression; with the default compression, custom dump + parallel restore is way slower than the simple approach on reasonable hardware. Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
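A sketch of Stefan's approach, with gzip keeping a second core busy in the pipe. Note that a parallel pg_restore needs a seekable archive file, so the dump has to be gunzipped again before restoring; all names are illustrative:

pg_dump -Fc -Z0 mydb | gzip > mydb.dump.gz   # dump on one core, compress on another
gunzip mydb.dump.gz                          # pg_restore -j cannot read from a pipe
pg_restore -j 16 -d newdb mydb.dump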
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I did some checking with the DBA who normally copies these around for development and test environments. He confirmed that when the source and target are on the same machine, a pg_dump piped to psql takes about two hours. If he pipes across the network, it runs more like three hours. My pg_dump to custom format ran for six hours. The single-transaction restore from that dump file took two hours, with both on the same machine. I can confirm with benchmarks, but this guy generally knows what he's talking about (and we do create a lot of development and test databases this way). Either the compression is tripling the dump time, or there is something inefficient about how pg_dump writes to the disk. Seems about right - compression in pg_dump -Fc is a serious bottleneck, and unless we can significantly speed it up or make it use multiple cores (either for the dump itself - which would be awesome - or for the compression) I would recommend not using it at all. Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 7/30/09 11:14 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I did some checking with the DBA who normally copies these around for development and test environments. He confirmed that when the source and target are on the same machine, a pg_dump piped to psql takes about two hours. If he pipes across the network, it runs more like three hours. My pg_dump to custom format ran for six hours. The single-transaction restore from that dump file took two hours, with both on the same machine. I can confirm with benchmarks, but this guy generally knows what he's talking about (and we do create a lot of development and test databases this way). Either the compression is tripling the dump time, or there is something inefficient about how pg_dump writes to the disk. All of this is on a RAID 5 array with 5 drives using xfs with noatime,nobarrier and a 256MB BBU controller. Of course compression has a HUGE effect if your I/O system is half-decent. Max GZIP compression speed with the newest Intel CPUs is something like 50MB/sec (it is data dependent, obviously -- it is usually closer to 30MB/sec). Max gzip decompression ranges from 50 to 150MB/sec (it can get really high only if the ratio is extremely large, like if you compress a repeating sequence of 256 bytes). The new parallel restore is nice and all, but we're still limited by the week it takes to dump the whole thing compressed. Parallel restore is a lot faster when restoring compressed dumps though, even without any indexes to make, since all that decompression is CPU hungry. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey sc...@richrelevance.com wrote: Max GZIP compression speed with the newest Intel CPU's is something like 50MB/sec (it is data dependant, obviously -- it is usually closer to 30MB/sec). Applying 30MB/sec to the 70GB accounts for 40 minutes. If those numbers are good, there's something else at play here. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 7/30/09 11:14 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: Tom Lane wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I always dump with -Z0 (and compress afterwards, or even in a pipe to get two cores busy) because otherwise custom dump times are simply ridiculous. However, Kevin is onto something here - on the typical 4-8 core boxes I tested, I managed to get around a cores/2 speedup for the restore, which means that for a pure upgrade, or for testing similar to what Kevin is doing, custom dumps + parallel restore might result in no win or even a loss. On one of our datasets I did some benchmarking a while ago (for those who attended Bruce's pg_migrator talk @pgcon, these are the same numbers):
* 150GB database (on-disk - ~100GB as a plain text dump)
time to dump (-Z0): 120min
time to restore (single threaded): 180min
time to restore (-j 16): 59min
Note also that with ext3 and XFS (untuned), parallel restore = HORRIBLY FRAGMENTED tables, to the point of sequential scans being rather slow. At least they're mostly just interleaved with each other, so there is little seeking backwards, but still... beware. XFS with allocsize=64m or so interleaves them in reasonably large chunks, though, and prevents significant fragmentation. However, the problem is that this does not actually mean that parallel restore shaves ~120min off your dump/restore time, because you get the following real runtimes:
plain text dump + single threaded restore in a pipe: 188min
custom dump to file + parallel restore: 179min
On the other hand, for the use case where one DB is dumped to a backup and that backup is then restored on several others, I find parallel restore extremely useful. Dump needs to be parallelized or at least pipelined to use more cores. COPY on one thread, compression on another? One trick with a dump, which works only if you have tables or schemas that can safely be dumped in different transactions, is to dump different slices of the DB concurrently by hand. This makes a huge difference if that is possible. This is without compression; with the default compression, custom dump + parallel restore is way slower than the simple approach on reasonable hardware. Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
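A rough sketch of the manual slicing trick: several pg_dump processes over disjoint table sets. Each pg_dump takes its own snapshot, so this is only safe while nothing is writing; the table names are made up:

pg_dump -Fc -t big_table_a -f slice_a.dump mydb &
pg_dump -Fc -t big_table_b -f slice_b.dump mydb &
pg_dump -Fc -T big_table_a -T big_table_b -f rest.dump mydb &   # everything else
wait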
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 7/30/09 11:24 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I did some checking with the DBA who normally copies these around for development and test environments. He confirmed that when the source and target are on the same machine, a pg_dump piped to psql takes about two hours. If he pipes across the network, it runs more like three hours. My pg_dump to custom format ran for six hours. The single-transaction restore from that dump file took two hours, with both on the same machine. I can confirm with benchmarks, but this guy generally knows what he's talking about (and we do create a lot of development and test databases this way). Either the compression is tripling the dump time, or there is something inefficient about how pg_dump writes to the disk. Seems about right - compression in pg_dump -Fc is a serious bottleneck, and unless we can significantly speed it up or make it use multiple cores (either for the dump itself - which would be awesome - or for the compression) I would recommend not using it at all. That's not an option when the compressed dump is 200GB and the uncompressed one is 1.3TB, for example. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 30-7-2009 20:46 Scott Carey wrote: Of course compression has a HUGE effect if your I/O system is half-decent. Max GZIP compression speed with the newest Intel CPU's is something like 50MB/sec (it is data dependent, obviously -- it is usually closer to 30MB/sec). Max gzip decompression ranges from 50 to 150MB/sec (it can get really high only if the ratio is extremely large, like if you compress a repeating sequence of 256 bytes). I just ran some quick numbers on our lightly loaded Nehalem X5570 (2.93+ GHz depending on turbo mode). I compressed a 192MB text file I had at hand using gzip -1, -2, -3, -6 and -9 and sent the output to /dev/null. The file was in the kernel's file cache the whole time and I ran the tests 3 times. Gzip -1 reached 54MB/s, -2 got 47MB/s, -3 got 32MB/s, -6 got 18MB/s and -9 got to 12MB/s. Just running cat on the file made it do 6400MB/s (i.e. it took 0.030 seconds to copy the file from memory to nowhere). The resulting files were respectively 69MB, 66MB, 64MB, 59MB and 58MB. Gunzip on the -1 file took 1.66 seconds, i.e. it read data at 41MB/s and wrote it to /dev/null at 115MB/s. The -9 file took 1.46s, so it read 40MB/s and wrote 131MB/s. Best regards, Arjen -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
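Numbers like these are easy to reproduce; read the file once first so it sits in the page cache, then time each level (the file name is arbitrary):

cat bigfile.txt > /dev/null     # warm the page cache
for level in 1 2 3 6 9; do time gzip -$level < bigfile.txt > /dev/null; done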
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey sc...@richrelevance.com writes: Dump needs to be parallelized or at least pipelined to use more cores. COPY on one thread, compression on another? We already do that (since compression happens on the pg_dump side). regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 7/30/09 11:58 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Scott Carey sc...@richrelevance.com wrote: Max GZIP compression speed with the newest Intel CPU's is something like 50MB/sec (it is data dependent, obviously -- it is usually closer to 30MB/sec). Applying 30MB/sec to the 70GB accounts for 40 minutes. If those numbers are good, there's something else at play here. It is rather data dependent; try gzip on the command line as a test on some data. On a random tarball on my Nehalem system, I just got a 23MB/sec compression rate on an incompressible file. Decompression with gunzip was 145MB/sec. On a text file that I manually created with randomly placed repeating segments that compresses 200x to 1, compression was 115MB/sec (bytes in per sec), and decompression (bytes out per sec) was 265MB/sec. The array in this machine will do 800MB/sec reads with 'dd' and 700MB/sec writes. One core has no chance. Now, what needs to be known with pg_dump is not just how fast compression can go (assuming it's gzip) but also what the duty cycle of the compression is. If it is single threaded, there is all the network and disk time to cut out of this, as well as all the CPU time that pg_dump spends outside of compression. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 7/30/09 1:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: Dump needs to be parallelized or at least pipelined to use more cores. COPY on one thread, compression on another? We already do that (since compression happens on the pg_dump side). regards, tom lane Well, that isn't what I meant. pg_dump uses CPU outside of compression doing various things. If that CPU usage is 10% as much as the compression's, then splitting them onto separate threads would yield a ~10% gain when CPU bound. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey sc...@richrelevance.com wrote: Now, what needs to be known with the pg_dump is not just how fast compression can go (assuming its gzip) but also what the duty cycle time of the compression is. If it is single threaded, there is all the network and disk time to cut out of this, as well as all the CPU time that pg_dump does without compression. Well, I established a couple messages back on this thread that pg_dump piped to psql to a database on the same machine writes the 70GB database to disk in two hours, while pg_dump to a custom format file at default compression on the same machine writes the 50GB file in six hours. No network involved, less disk space written. I'll try it tonight at -Z0. One thing I've been wondering about is what, exactly, is compressed in custom format. Is it like a .tar.gz file, where the compression is a layer over the top, or are individual entries compressed? If the latter, what's the overhead on setting up each compression stream? Is there some minimum size before that kicks in? (I know, I should go check the code myself. Maybe in a bit. Of course, if someone already knows, it would be quicker) -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
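Short of reading the code, the archive's table of contents can be inspected without restoring anything; every line listed is a separate entry in the custom-format file (the file name is illustrative):

pg_restore -l mydb.dump | head -n 40   # list TOC entries: schema objects, TABLE DATA items, ...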
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 7/30/09 1:58 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Scott Carey sc...@richrelevance.com wrote: Now, what needs to be known with the pg_dump is not just how fast compression can go (assuming its gzip) but also what the duty cycle time of the compression is. If it is single threaded, there is all the network and disk time to cut out of this, as well as all the CPU time that pg_dump does without compression. Well, I established a couple messages back on this thread that pg_dump piped to psql to a database on the same machine writes the 70GB database to disk in two hours, while pg_dump to a custom format file at default compression on the same machine writes the 50GB file in six hours. No network involved, less disk space written. I'll try it tonight at -Z0. So, I'm not sure what the pg_dump custom format overhead is minus the compression -- there is probably some overhead in that format beyond the compression itself. -Z1 might be interesting too, but obviously it takes some time. Interesting that your uncompressed case is only 40% larger. For me, the compressed dump is in the range of 20% the size of the uncompressed one. One thing I've been wondering about is what, exactly, is compressed in custom format. Is it like a .tar.gz file, where the compression is a layer over the top, or are individual entries compressed? It is instructive to open up a compressed custom format file in 'less' or another text viewer. Basically, it is the same as the uncompressed dump with all the DDL uncompressed, but the binary chunks compressed. It would seem (an educated guess, from looking at the raw file and not the code) that the table data is compressed and the DDL points to an index in the file where the compressed blob for the copy lives. If the latter, what's the overhead on setting up each compression stream? Is there some minimum size before that kicks in? (I know, I should go check the code myself. Maybe in a bit. Of course, if someone already knows, it would be quicker) Gzip does have some quirky performance behavior depending on the chunk size of data you stream into it. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey sc...@richrelevance.com wrote: Gzip does have some quirky performance behavior depending on the chunk size of data you stream into it. Yeah, I've run into that before. If we're sending each individual datum to a gzip function rather than waiting until we've got a decent-size buffer, that could explain it. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Kevin Grittner kevin.gritt...@wicourts.gov writes: One thing I've been wondering about is what, exactly, is compressed in custom format. Is it like a .tar.gz file, where the compression is a layer over the top, or are individual entries compressed? Individual entries. Eyeball examination of a dump file shows that we only compress table-data entries, and don't for example waste time firing up the compressor to process a function body. It's possible that it'd be worth trying to have some lower limit on the amount of data in a table before we bother to compress it, but I bet that it wouldn't make any difference on your databases ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey sc...@richrelevance.com writes: Gzip does have some quirky performance behavior depending on the chunk size of data you stream into it. Can you enlarge on that comment? I'm not sure that pg_dump is aware that there's anything to worry about there. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: Gzip does have some quirky performance behavior depending on the chunk size of data you stream into it. Can you enlarge on that comment? I'm not sure that pg_dump is aware that there's anything to worry about there. If the library used here is anything like the native library used by Java, it'd be worth putting a buffer layer ahead of the calls to gzip, so it isn't dealing with each individual value as a separate call. I seem to remember running into that issue in Java, where throwing a BufferedOutputStream in there fixed the performance issue. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Hey guys, thanks for the help. I solved the problem. The cause was a bad query that I'd accidentally committed right after upgrading. PostgreSQL 8.4 is perfect! Analyze works like a charm, and MUCH better than in 8.3. 2009/7/31 Kevin Grittner kevin.gritt...@wicourts.gov: Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: Gzip does have some quirky performance behavior depending on the chunk size of data you stream into it. Can you enlarge on that comment? I'm not sure that pg_dump is aware that there's anything to worry about there. If the library used here is anything like the native library used by Java, it'd be worth putting a buffer layer ahead of the calls to gzip, so it isn't dealing with each individual value as a separate call. I seem to remember running into that issue in Java, where throwing a BufferedOutputStream in there fixed the performance issue. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On 7/30/09 2:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: Gzip does have some quirky performance behavior depending on the chunk size of data you stream into it. Can you enlarge on that comment? I'm not sure that pg_dump is aware that there's anything to worry about there. regards, tom lane For example, one of the things that gzip does is calculate the CRC of the item being compressed. Calculating that incrementally is less efficient than doing it in bulk. For whatever reason, some other internals of gzip tend to perform much better if submitting say, 4k or 8k or 16k chunks rather than little bits at a time. But I'm sure some of that also depends on what library you're using, since they all vary somewhat. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey sc...@richrelevance.com writes: On 7/30/09 2:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: Gzip does have some quirky performance behavior depending on the chunk size of data you stream into it. Can you enlarge on that comment? I'm not sure that pg_dump is aware that there's anything to worry about there. For whatever reason, some other internals of gzip tend to perform much better if submitting say, 4k or 8k or 16k chunks rather than little bits at a time. But I'm sure some of that also depends on what library you're using since they all vary somewhat. AFAIK there is only one widely-used implementation of zlib, and it hasn't changed much in a long time. I did some tracing and verified that pg_dump passes data to deflate() one table row at a time. I'm not sure about the performance implications of that, but it does seem like it might be something to look into. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Thu, Jul 30, 2009 at 11:30 PM, Tom Lanet...@sss.pgh.pa.us wrote: I did some tracing and verified that pg_dump passes data to deflate() one table row at a time. I'm not sure about the performance implications of that, but it does seem like it might be something to look into. I suspect if this was a problem the zlib people would have added internal buffering ages ago. I find it hard to believe we're not the first application to use it this way. I suppose it wouldn't be the first time a problem like this went unfixed though. Is the zlib software actively maintained or was your earlier comment implying it's currently an orphaned codebase? -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Greg Stark gsst...@mit.edu wrote: I suspect if this was a problem the zlib people would have added internal buffering ages ago. I find it hard to believe we're not the first application to use it this way. I think that most uses of this library are on entire files or streams. They may have felt that adding another layer of buffering would just hurt performance for the typical use case, and anyone using it in some other way could always use their own buffering layer. In Java adding that layer took 30 characters of code, so it didn't make a very big impression on me -- it took a while to even remember I'd had to do it. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Thu, 30 Jul 2009, Rauan Maemirov wrote:
maintenance_work_mem = 1GB
work_mem = 192MB
shared_buffers = 7680MB
max_connections = 80
My box is Nehalem 2xQuad 2.8 with 32GB RAM While it looks like you sorted out your issue downthread, I wanted to point out that your setting for work_mem could be dangerously high here and contribute to problems with running out of memory or using swap. If each of your 80 clients was doing a sort at the same time, you'd be using 80 * 192MB + 7680MB = 23040MB (roughly 22.5GB) of RAM just for the server. The problem is that each client could do multiple sorts, so usage might go even higher. Unless you have a big data warehouse setup, more common work_mem settings are in the 16-64MB range rather than going this high. Just something to keep an eye on if you find a lot of memory is being used by the database processes. I really need to refine the pgtune model to more carefully account for this particular problem; it's a bit too aggressive here for people who aren't proactively watching the server's RAM after changing the settings. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
On Thu, Jul 30, 2009 at 10:10 PM, Greg Smith gsm...@gregsmith.com wrote: On Thu, 30 Jul 2009, Rauan Maemirov wrote:
maintenance_work_mem = 1GB
work_mem = 192MB
shared_buffers = 7680MB
max_connections = 80
My box is Nehalem 2xQuad 2.8 with 32GB RAM While it looks like you sorted out your issue downthread, I wanted to point out that your setting for work_mem could be dangerously high here and contribute to problems The real danger here is that you can set up your pg server to fail ONLY under heavy load, when it runs out of memory and goes into a swap storm. So, without proper load testing and profiling, you may not know you're headed for danger until your server goes unresponsive midday at the most critical of times. And restarting it will just lead to the same failure again as the clients all reconnect and pummel your server. Meanwhile, going from 192MB to 16MB might result in a total slowdown measured in a fraction of a percent overall, and prevent this kind of failure. If there's one single request you can justify a big work_mem for, then set it for just that one query. It's not uncommon to have a reporting user limited to a few connections and with alter user reportinguser set work_mem='512MB'; so that it can run fast but not deplete your server's resources by accident during heavy load. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
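Scott's per-role override spelled out; the role names and values are illustrative. The settings take effect for new sessions of each role:

ALTER USER reportinguser SET work_mem = '512MB';   -- big sorts allowed for the few reporting connections
ALTER USER webapp SET work_mem = '16MB';           -- keep the many OLTP connections modest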
Re: [PERFORM] PostgreSQL OR performance
2008/11/15 Tom Lane [EMAIL PROTECTED] Віталій Тимчишин [EMAIL PROTECTED] writes: I am not. I can't see how materialize can multiply the number of rows it gets from sort by 100. Is it the right-hand input of a merge join? If so you're looking at mark/restore rescans, ie, repeated fetches of the same tuples. There must be a huge number of duplicate join keys in that relation to make for such an increase though. Normally the planner avoids putting a table with lots of duplicates as the RHS of a merge, but if it doesn't have good statistics for the join key then it might not realize the problem. OK, thanks for clearing up some of the mystery. But, returning to the original question: does anyone know why it chooses the plan from OR-plan.txt instead of union-plan.txt? The former is cost=4588.13..960900482668.95, the latter is cost=266348.42..272953.14 according to the statistics postgres has, so I suppose the planner would select the latter if it were able to produce it.
Re: [PERFORM] PostgreSQL OR performance
Sorry for the delayed response - it was a very busy week. 2008/11/7 David Wilson [EMAIL PROTECTED] On Fri, Nov 7, 2008 at 4:14 AM, Віталій Тимчишин [EMAIL PROTECTED] wrote: Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual time=30292.802..755751.242 rows=34749 loops=1) Have you tried increasing the default_statistics_target? The planner is expecting 1.3 billion rows to be produced from a query that's only actually producing 35k, which probably indicates some very bad statistics. The planner seems to think that every second pair from the company-company join will succeed with this join expression (1386158171 ~ 52648^2 / 2). That is not true. Anyway, I've tried setting default_statistics_target to 1000, then analyze. Nothing changed. At the same time, the materialize step produces 242 million rows when the planner only expects to produce 2.3 million, indicating a similar problem in the opposite direction. This probably means that the planner is choosing plans that would be optimal if it was making good guesses but are decidedly sub-optimal for your actual data. That is even more strange, because the materialize step must produce exactly the rows it takes from the sort, that is 2316503, so I don't get how table scan + sort + materialize can multiply the number of rows by 100.
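For reference, a per-column statistics target on the join key is often a better-aimed experiment than raising the global default; this sketch uses the column from the query in this thread:

ALTER TABLE production.company ALTER COLUMN name SET STATISTICS 1000;
ANALYZE production.company;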
Re: [PERFORM] PostgreSQL OR performance
Віталій Тимчишин [EMAIL PROTECTED] writes: I am not. I can't see how materialize can multiply the number of rows it gets from sort by 100. Is it the right-hand input of a merge join? If so you're looking at mark/restore rescans, ie, repeated fetches of the same tuples. There must be a huge number of duplicate join keys in that relation to make for such an increase though. Normally the planner avoids putting a table with lots of duplicates as the RHS of a merge, but if it doesn't have good statistics for the join key then it might not realize the problem. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL OR performance
On Fri, Nov 7, 2008 at 4:14 AM, Віталій Тимчишин [EMAIL PROTECTED] wrote: Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual time=30292.802..755751.242 rows=34749 loops=1) Have you tried increasing the default_statistics_target? The planner is expecting 1.3 billion rows to be produced from a query that's only actually producing 35k, which probably indicates some very bad statistics. At the same time, the materialize step produces 242 million rows when the planner only expects to produce 2.3 million, indicating a similar problem in the opposite direction. This probably means that the planner is choosing plans that would be optimal if it was making good guesses but are decidedly sub-optimal for your actual data. -- - David T. Wilson [EMAIL PROTECTED] -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL OR performance
From what I see in your OR-plan.txt, you are doing too many sequential scans. Create some indexes for those tables on the fields that you use; it may help you. OBS: if you already have lots of indexes on your tables, it may be a good time to re-think your strategy, because it's not working. Tips: 1 - create indexes for the tables on the fields that you will use in the query, if it is your most important query. If other queries are in use, please post them here and we can help you design a better plan. 2 - You could give us the configuration of the hardware and the postgresql configuration file, and we can see what is going on. Regards On Thu, Nov 6, 2008 at 8:46 AM, Віталій Тимчишин [EMAIL PROTECTED] wrote: My main message is that I can see this in many queries and many times. But OK, I can present an exact example. 2008/11/5 Jeff Davis [EMAIL PROTECTED] On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote: For a long time already I can see very poor OR performance in postgres. If one has a query like select something from table where condition1 or condition2 it may take ages to execute, while select something from table where condition1 and select something from table where condition2 are executed very fast, and select something from table where condition1 and not condition2 union all select something from table where condition2 gives the required results fast. What version are you using? Server version 8.3.3 Have you run VACUUM ANALYZE? I have autovacuum, but for this example I did a vacuum analyze of the whole DB. The real-life query (autogenerated) looks like the next: select t0.id as pk1, t1.id as pk2, t0.run_id as f1_run_id, t1.run_id as f2_run_id from tmpv_unproc_null_production_company_dup_cons_company as t0, (select * from production.company where run_id in (select id from production.run where name='test')) as t1 where t0.name = t1.name or (t0.name, t1.name) in (select s1.name, s2.name from atom_match inner join atoms_string s1 on atom_match.atom1_id = s1.id inner join atoms_string s2 on atom_match.atom2_id = s2.id where s1.atom_type_id = -1 and match_function_id = 2) with tmpv_unproc_null_production_company_dup_cons_company: create temporary view tmpv_unproc_null_production_company_dup_cons_company as select * from production.company where 1=1 and status='unprocessed' and run_id in (select id from production.run where name='test') Next, do: EXPLAIN ANALYZE select something from table where condition1 or condition2; The plan without analyze is in OR-plan.txt. Plans for only condition1, only condition2 and the union are also attached -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Helio Campos Mello de Andrade
Re: [PERFORM] PostgreSQL OR performance
My main message is that I can see this in many queries and many times. But OK, I can present an exact example. 2008/11/5 Jeff Davis [EMAIL PROTECTED] On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote: For a long time already I can see very poor OR performance in postgres. If one has a query like select something from table where condition1 or condition2 it may take ages to execute, while select something from table where condition1 and select something from table where condition2 are executed very fast, and select something from table where condition1 and not condition2 union all select something from table where condition2 gives the required results fast. What version are you using? Server version 8.3.3 Have you run VACUUM ANALYZE? I have autovacuum, but for this example I did a vacuum analyze of the whole DB. The real-life query (autogenerated) looks like the next: select t0.id as pk1, t1.id as pk2, t0.run_id as f1_run_id, t1.run_id as f2_run_id from tmpv_unproc_null_production_company_dup_cons_company as t0, (select * from production.company where run_id in (select id from production.run where name='test')) as t1 where t0.name = t1.name or (t0.name, t1.name) in (select s1.name, s2.name from atom_match inner join atoms_string s1 on atom_match.atom1_id = s1.id inner join atoms_string s2 on atom_match.atom2_id = s2.id where s1.atom_type_id = -1 and match_function_id = 2) with tmpv_unproc_null_production_company_dup_cons_company: create temporary view tmpv_unproc_null_production_company_dup_cons_company as select * from production.company where 1=1 and status='unprocessed' and run_id in (select id from production.run where name='test') Next, do: EXPLAIN ANALYZE select something from table where condition1 or condition2; The plan without analyze is in OR-plan.txt. Plans for only condition1, only condition2 and the union are also attached:

Nested Loop  (cost=4588.13..960900482668.95 rows=1386158171 width=32)
  Join Filter: (((production.company.name)::text = (production.company.name)::text) OR (subplan))
  ->  Hash IN Join  (cost=1.56..73814.22 rows=52648 width=30)
        Hash Cond: ((production.company.run_id)::bigint = production.run.id)
        ->  Seq Scan on company  (cost=0.00..64599.29 rows=2316503 width=30)
              Filter: ((status)::text = 'unprocessed'::text)
        ->  Hash  (cost=1.55..1.55 rows=1 width=8)
              ->  Seq Scan on run  (cost=0.00..1.55 rows=1 width=8)
                    Filter: ((name)::text = 'test'::text)
  ->  Nested Loop  (cost=1183.27..39219.67 rows=52648 width=30)
        ->  HashAggregate  (cost=1.55..1.56 rows=1 width=8)
              ->  Seq Scan on run  (cost=0.00..1.55 rows=1 width=8)
                    Filter: ((name)::text = 'test'::text)
        ->  Bitmap Heap Scan on company  (cost=1181.72..38435.51 rows=62608 width=30)
              Recheck Cond: ((production.company.run_id)::bigint = production.run.id)
              ->  Bitmap Index Scan on comp_run  (cost=0.00..1166.07 rows=62608 width=0)
                    Index Cond: ((production.company.run_id)::bigint = production.run.id)
  SubPlan
    ->  Materialize  (cost=3403.29..4005.74 rows=35745 width=28)
          ->  Hash Join  (cost=928.57..3122.55 rows=35745 width=28)
                Hash Cond: ((atom_match.atom1_id)::integer = s1.id)
                ->  Hash Join  (cost=445.80..1880.19 rows=35745 width=18)
                      Hash Cond: ((atom_match.atom2_id)::integer = s2.id)
                      ->  Seq Scan on atom_match  (cost=0.00..674.81 rows=35745 width=8)
                            Filter: ((match_function_id)::integer = 2)
                      ->  Hash  (cost=260.91..260.91 rows=14791 width=18)
                            ->  Seq Scan on atoms_string s2  (cost=0.00..260.91 rows=14791 width=18)
                ->  Hash  (cost=297.89..297.89 rows=14791 width=18)
                      ->  Seq Scan on atoms_string s1  (cost=0.00..297.89 rows=14791 width=18)
                            Filter: ((atom_type_id)::integer = (-1))

Merge Join  (cost=89373.23..97526.15 rows=525975 width=32) (actual time=276.869..523.669 rows=34749 loops=1)
  Merge Cond: ((production.company.name)::text = (production.company.name)::text)
  ->  Sort  (cost=44764.87..44896.49 rows=52648 width=30) (actual time=120.036..144.925 rows=15507 loops=1)
        Sort Key: production.company.name
        Sort Method: external merge  Disk: 704kB
        ->  Nested Loop  (cost=1183.27..39376.19 rows=52648 width=30) (actual time=1.898..72.693 rows=15507 loops=1)
              ->  HashAggregate  (cost=1.55..1.56 rows=1 width=8) (actual time=0.020..0.022 rows=1 loops=1)
                    ->  Seq Scan on run  (cost=0.00..1.55 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
                          Filter: ((name)::text = 'test'::text)
              ->  Bitmap Heap Scan on company  (cost=1181.72..38592.03 rows=62608 width=30) (actual time=1.873..29.251 rows=15507 loops=1)
                    Recheck
Re: [PERFORM] PostgreSQL OR performance
2008/11/6 Helio Campos Mello de Andrade [EMAIL PROTECTED] From what I see in your OR-plan.txt, you are doing too many sequential scans. Create some indexes for those tables on the fields that you use; it may help you. OBS: if you already have lots of indexes on your tables, it may be a good time to re-think your strategy, because it's not working. Tips: 1 - create indexes for the tables on the fields that you will use in the query, if it is your most important query. If other queries are in use, please post them here and we can help you design a better plan. As you can see from the other plans, it does have all the indexes needed to perform its work fast (when given part by part). It simply does not wish to use them. My question: is this a configuration problem, or is it that the postgresql optimizer simply can't do such a query rewrite? Actually I did rewrite the query to work properly, as you can see from union-plan.txt. My question is whether postgresql can do this automatically, because such a rewrite is not always easy/possible (esp. for generated queries)?
Re: [PERFORM] PostgreSQL OR performance
Віталій Тимчишин wrote: As you can see from the other plans, it does have all the indexes needed to perform its work fast (when given part by part). It simply does not wish to use them. My question: is this a configuration problem, or is it that the postgresql optimizer simply can't do such a query rewrite? I must admit, I haven't managed to figure out what your query is trying to do, but then that's a common problem with autogenerated queries. The main question that needs answering is why the planner thinks you're going to get 1.3 billion rows in the OR query: Nested Loop (cost=4588.13..960900482668.95 rows=1386158171 width=32) You don't show explain analyse for this query, so there's no way of knowing how many rows get returned, but presumably you're expecting around 88000. What does explain analyse return? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL OR performance
As far as I know, if you created the indexes properly and postgres sees that they will give some improvement, it will use them. - Look at the index creation page; we may be forgetting something. http://www.postgresql.org/docs/8.3/static/indexes.html I have to go to the hospital now. Tomorrow I will take a look at the manual and try to understand everything necessary for postgresql to use an index. Regards On Thu, Nov 6, 2008 at 2:33 PM, Віталій Тимчишин [EMAIL PROTECTED] wrote: 2008/11/6 Helio Campos Mello de Andrade [EMAIL PROTECTED] From what I see in your OR-plan.txt, you are doing too many sequential scans. Create some indexes for those tables on the fields that you use; it may help you. OBS: if you already have lots of indexes on your tables, it may be a good time to re-think your strategy, because it's not working. Tips: 1 - create indexes for the tables on the fields that you will use in the query, if it is your most important query. If other queries are in use, please post them here and we can help you design a better plan. As you can see from the other plans, it does have all the indexes needed to perform its work fast (when given part by part). It simply does not wish to use them. My question: is this a configuration problem, or is it that the postgresql optimizer simply can't do such a query rewrite? Actually I did rewrite the query to work properly, as you can see from union-plan.txt. My question is whether postgresql can do this automatically, because such a rewrite is not always easy/possible (esp. for generated queries)? -- Helio Campos Mello de Andrade
Re: [PERFORM] PostgreSQL OR performance
2008/11/6 Richard Huxton [EMAIL PROTECTED] Віталій Тимчишин wrote: As you can see from the other plans, it does have all the indexes needed to perform its work fast (when given part by part). It simply does not wish to use them. My question: is this a configuration problem, or is it that the postgresql optimizer simply can't do such a query rewrite? I must admit, I haven't managed to figure out what your query is trying to do, but then that's a common problem with autogenerated queries. That's easy - I am looking for duplicates within a subset of companies. Two companies are equal when their names are simply equal or there is an entry in the match table for the names. The main question that needs answering is why the planner thinks you're going to get 1.3 billion rows in the OR query: Nested Loop (cost=4588.13..960900482668.95 rows=1386158171 width=32) You don't show explain analyse for this query, so there's no way of knowing how many rows get returned, but presumably you're expecting around 88000. What does explain analyse return? Yes, the query should output exactly the same result as in the union plan. I will run the slow explain analyze now and will repost after it completes (tomorrow?). BTW: I'd say the planner should estimate the rows as the sum of the ORs' estimates minus the intersection, but no more than the sum of the ORs (if the intersection is 0). For the first condition it has rows=525975, for the second it has rows=2403 (with other plans, of course), so it's strange that it has such a high estimate. It's exactly 50% of the full cartesian join of the merge, so it does think that every second pair would succeed, which is not true.
[PERFORM] PostgreSQL OR performance
Hello. For a long time already I have been seeing very poor OR performance in postgres. If one has a query like: select something from table where condition1 or condition2 it may take ages to execute, while select something from table where condition1 and select something from table where condition2 are executed very fast, and select something from table where condition1 and not condition2 union all select something from table where condition2 gives the required results fast. For example, in my current query the optimizer gives 88252 for condition1; for condition1 and not condition2 it is 88258; for condition2 it is 99814. And for condition1 or condition2 it is 961499627680. And it does perform this way. All is more or less fine when the select part is easy and the query can be easily rewritten. But for complex queries it looks ugly, and if the query is autogenerated, moving the autogeneration mechanism from producing a simple clean where clause to producing unions is not an easy task. So the question is: am I missing something? Can this be optimized?
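As a concrete illustration of the rewrite described above (table and condition names are placeholders, not the actual schema):

    select something from t where condition1 and not condition2
    union all
    select something from t where condition2;

The AND NOT on the first branch keeps rows matching both conditions from appearing twice, which is what makes the cheap UNION ALL (rather than the duplicate-removing UNION) safe here.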
Re: [PERFORM] PostgreSQL OR performance
Віталій Тимчишин [EMAIL PROTECTED] writes: For a long time already I have been seeing very poor OR performance in postgres. If you would provide a concrete example rather than handwaving, we might be able to offer some advice ... regards, tom lane
Re: [PERFORM] PostgreSQL OR performance
On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote: For a long time already I have been seeing very poor OR performance in postgres. If one has a query like select something from table where condition1 or condition2 it may take ages to execute, while select something from table where condition1 and select something from table where condition2 are executed very fast, and select something from table where condition1 and not condition2 union all select something from table where condition2 gives the required results fast. What version are you using? Have you run VACUUM ANALYZE? Next, do: EXPLAIN ANALYZE select something from table where condition1 or condition2; for each of the queries, unless a query takes so long that you don't want to wait for the result. In that case, omit the ANALYZE and just do EXPLAIN. Then post those results to the list. These tell us what plans PostgreSQL is choosing and what it estimates the costs to be. If it's the output of EXPLAIN ANALYZE, it also runs the query and tells us what the costs really are. From that, we can see where the planner is going wrong, and what you might do to change it. Regards, Jeff Davis
Re: [PERFORM] PostgreSQL+Hibernate Performance
Thanks Mark, we are using DBCP, and I had found something about pgpool in some forum threads, which raised some questions for me. But I am clear now. On Wed, Aug 20, 2008 at 8:59 PM, Mark Lewis [EMAIL PROTECTED] wrote: Yes, we use connection pooling. As I recall Hibernate ships with c3p0 connection pooling built-in, which is what we use. [snip] -- Best Regards Kranti Kiran Kumar Parisa M: +91 - 9391 - 438 - 738 +91 - 9849 - 625 - 625
Re: [PERFORM] PostgreSQL+Hibernate Performance
Thanks Matthew, does that mean I can just have index1, index3 and index4? On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling [EMAIL PROTECTED] wrote: On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote: will creating multiple indexes on the same column affect performance? for example: index1: column1, column2, column3 index2: column1 index3: column2 index4: column3 index5: column1, column2 The sole purpose of indexes is to affect performance. However, if you have index1, there is no point in having index2 or index5. Matthew -- Isn't Microsoft Works something of a contradiction? -- Best Regards Kranti Kiran Kumar Parisa M: +91 - 9391 - 438 - 738 +91 - 9849 - 625 - 625
Re: [PERFORM] PostgreSQL+Hibernate Performance
On Thu, 2008-08-21 at 12:33 +0530, Kranti K K Parisa™ wrote: On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling [EMAIL PROTECTED] wrote: The sole purpose of indexes is to affect performance. However, if you have index1, there is no point in having index2 or index5. Thanks Matthew, does that mean I can just have index1, index3 and index4? (trying to get the thread back into newest-comments-last order) Well, yes you can get away with just index1, index3 and index4, and it may well be the optimal solution for you, but it's not entirely clear-cut. It's true that PG can use index1 to satisfy queries of the form SELECT x FROM y WHERE column1=somevalue, or of the form WHERE column1=a AND column2=b. It will not be as fast as an index lookup from a single-column index, but depending on the size of the tables/indexes and the selectivity of the leading column(s) in the index, the difference in speed may be trivial. On the other hand, if you have individual indexes on column1, column2 and column3 but no multi-column index, PG can combine the individual indexes in memory with a bitmap. This is not as fast as a normal lookup in the multi-column index would be, but can still be a big win over not having an index at all. To make an educated decision you might want to read over some of the online documentation about indexes, in particular these two sections: http://www.postgresql.org/docs/8.3/interactive/indexes-multicolumn.html and http://www.postgresql.org/docs/8.3/interactive/indexes-bitmap-scans.html -- Mark
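A minimal sketch of the two strategies Mark describes, using his hypothetical table and column names:

    -- Hypothetical table matching the example above.
    CREATE TABLE y (column1 int, column2 int, column3 int, x text);

    -- Strategy 1: one multi-column index; it also serves queries
    -- constraining only the leading column(s), e.g. WHERE column1 = ...
    CREATE INDEX y_c1_c2_c3 ON y (column1, column2, column3);

    -- Strategy 2: single-column indexes, which the planner can combine
    -- in memory with a BitmapAnd when several columns are constrained.
    CREATE INDEX y_c1 ON y (column1);
    CREATE INDEX y_c2 ON y (column2);
    CREATE INDEX y_c3 ON y (column3);

    -- Compare the chosen plans: expect an Index Scan for strategy 1
    -- versus Bitmap Index Scans combined by a BitmapAnd for strategy 2.
    EXPLAIN SELECT x FROM y WHERE column1 = 1 AND column2 = 2;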
Re: [PERFORM] PostgreSQL+Hibernate Performance
Hi Mark, thanks again for the info. I shall create different sets of indexes and compare the query execution times. One of the tables might reach around 700,000 records over a period of 4-5 months, so what other measures should I focus on? I thought of the following: 1) indexes 2) better hardware (RAM, HDD). And how can I estimate the size of a row? Is it based on the data types of the columns I have in the table? Do you have any info to guide me on this? On Thu, Aug 21, 2008 at 7:32 PM, Mark Lewis [EMAIL PROTECTED] wrote: Well, yes you can get away with just index1, index3 and index4, and it may well be the optimal solution for you, but it's not entirely clear-cut. [snip] -- Best Regards Kranti Kiran Kumar Parisa M: +91 - 9391 - 438 - 738 +91 - 9849 - 625 - 625
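On the row-size question, one rough approach is to let PostgreSQL report the on-disk sizes of an existing table and divide by the row count; a sketch, with a hypothetical table name (these size functions are available in 8.1 and later):

    -- Heap size only, and heap plus indexes plus toast:
    SELECT pg_size_pretty(pg_relation_size('mytable'));
    SELECT pg_size_pretty(pg_total_relation_size('mytable'));

    -- Average bytes per row, including per-row overhead:
    SELECT pg_relation_size('mytable') / NULLIF(count(*), 0) AS avg_row_bytes
    FROM mytable;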
[PERFORM] PostgreSQL+Hibernate Performance
Hi, can anyone suggest performance tips for PostgreSQL used with Hibernate? One of the questions: PostgreSQL has an INDEX concept and Hibernate also has column INDEXes. Which is better among them? Is creating either of them enough, or do we need to create both? And are there any more performance aspects? Thanks in advance. == KP
Re: [PERFORM] PostgreSQL+Hibernate Performance
On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote: Hi, can anyone suggest performance tips for PostgreSQL used with Hibernate? PostgreSQL has an INDEX concept and Hibernate also has column INDEXes. Which is better among them? Is creating either of them enough, or do we need to create both? Hibernate is a library for accessing a database such as PostgreSQL. It does not offer any add-on capabilities to the storage layer itself. So when you tell Hibernate that a column should be indexed, all that it does is create the associated PostgreSQL index when you ask Hibernate to build the DB tables for you. This is part of Hibernate's effort to protect you from the implementation details of the underlying database, in order to make supporting multiple databases with the same application code easier. So there is no performance difference between a PG index and a Hibernate column index, because they are the same thing. The most useful Hibernate performance-tuning advice isn't PG-specific at all; there are just things that you need to keep in mind when developing for any database to avoid pathologically bad performance, and those tips are really beyond the scope of this mailing list; Google is your friend here. I've been the architect for an enterprise-class application for a few years now using PostgreSQL and Hibernate together in a performance-critical context, and honestly I can't think of one time that I've been bitten by a PG-specific performance issue (a lot of performance issues with Hibernate that affected all databases, though; you need to know what you're doing to make Hibernate apps that run fast). If you do run into problems, you can figure out the actual SQL that Hibernate is issuing and do the normal PostgreSQL explain analyze on it; such problems are usually caused by a missing index. -- Mark
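To make the equivalence concrete: a sketch of what Hibernate's schema export might emit on the PostgreSQL side for a column marked as indexed (the table and index names here are made up for illustration):

    CREATE TABLE customer (id bigint PRIMARY KEY, name varchar(255));
    CREATE INDEX idx_customer_name ON customer (name);

There is nothing Hibernate-specific about the resulting index; it behaves exactly like one created by hand.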
Re: [PERFORM] PostgreSQL+Hibernate Performance
Hi Mark, thank you very much for the information. I will analyse the DB structure and create the indexes on PG directly. Are you using any connection pooling like DBCP or pgpool? Regards, KP On Wed, Aug 20, 2008 at 8:05 PM, Mark Lewis [EMAIL PROTECTED] wrote: Hibernate is a library for accessing a database such as PostgreSQL. It does not offer any add-on capabilities to the storage layer itself. [snip] -- Best Regards Kranti Kiran Kumar Parisa M: +91 - 9391 - 438 - 738 +91 - 9849 - 625 - 625
Re: [PERFORM] PostgreSQL+Hibernate Performance
The only thing that has bitten me about hibernate + postgres is that when inserting into partitioned tables, postgres does not reply with the number of rows that hibernate expected. My current (not great) solution is to define a specific SQLInsert annotation and tell it not to do any checking, like so: @SQLInsert(sql="insert into bigmetric (account_id, a, b, timestamp, id) values (?, ?, ?, ?, ?)", check=ResultCheckStyle.NONE) I just steal the SQL from hibernate's logs. On Wed, Aug 20, 2008 at 10:40 AM, Mark Lewis [EMAIL PROTECTED] wrote: Hibernate is a library for accessing a database such as PostgreSQL. It does not offer any add-on capabilities to the storage layer itself. [snip]
Re: [PERFORM] PostgreSQL+Hibernate Performance
Will creating multiple indexes on the same columns affect performance? for example: index1: column1, column2, column3 index2: column1 index3: column2 index4: column3 index5: column1, column2 That is, I am trying to fire SQL queries with these columns in the where conditions, and the possibilities are like the above. If we create such indexes, will it affect performance? And what is the best way to go in this case? On Wed, Aug 20, 2008 at 8:10 PM, Mark Lewis [EMAIL PROTECTED] wrote: Hibernate is a library for accessing a database such as PostgreSQL. It does not offer any add-on capabilities to the storage layer itself. [snip] -- Best Regards Kranti Kiran Kumar Parisa M: +91 - 9391 - 438 - 738 +91 - 9849 - 625 - 625
Re: [PERFORM] PostgreSQL+Hibernate Performance
On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote: will creating multiple indexes on the same columns affect performance? for example: index1: column1, column2, column3 index2: column1 index3: column2 index4: column3 index5: column1, column2 The sole purpose of indexes is to affect performance. However, if you have index1, there is no point in having index2 or index5. Matthew -- Isn't Microsoft Works something of a contradiction?
Re: [PERFORM] PostgreSQL+Hibernate Performance
The tradeoffs for multiple indexes are more or less as follows: 1. Having the right indexes makes queries faster, often dramatically so. 2. But more indexes make inserts/updates slower, although generally not dramatically slower. 3. Each index requires disk space; with several indexes, you can easily have more of your disk taken up by indexes than by actual data. I would be careful to create only the indexes you need, but it's probably worse to have too few indexes than too many. Depends on your app, though. -- Mark On Wed, 2008-08-20 at 20:40 +0530, Kranti K K Parisa™ wrote: will creating multiple indexes on the same columns affect performance? for example: index1: column1, column2, column3 index2: column1 index3: column2 index4: column3 index5: column1, column2 [snip]
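A sketch of how one might audit points 2 and 3 using the standard statistics views (run it after a representative workload has accumulated):

    SELECT indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
           idx_scan
    FROM pg_stat_user_indexes
    ORDER BY pg_relation_size(indexrelid) DESC;

    -- Indexes with idx_scan = 0 are paying the insert/update and disk
    -- costs without ever being used, so they are candidates for dropping.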
Re: [PERFORM] PostgreSQL+Hibernate Performance
Yes, we use connection pooling. As I recall Hibernate ships with c3p0 connection pooling built-in, which is what we use. We were happy enough with c3p0 that we ended up moving our other non-hibernate apps over to it, away from DBCP. pgpool does connection pooling at the socket level instead of at the local library level, so really it's a very different thing. If your app is the only thing talking to this database, and you don't have a multi-database configuration, then it will be easier for you to use a Java-based connection pooling library like c3p0 or DBCP than to use pgpool. -- Mark On Wed, 2008-08-20 at 20:32 +0530, Kranti K K Parisa™ wrote: Hi Mark, thank you very much for the information. I will analyse the DB structure and create the indexes on PG directly. Are you using any connection pooling like DBCP or pgpool? Regards, KP [snip]
Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on view using max
Hello, actually what I expected from the planner for this query (select max(transid) from view) was something like this: select max(transid) from (select max(transid) from archive.transaction union all select max(transid) from public.transaction), applying the max function to each branch of the union. This is what happens when you use a where condition: it uses the indexes on each subquery of the view, e.g. select transid from view where transid = 12; That way it would be fast enough. I was expecting the same for order by and limit. Thank you for your time, Ioana Danes Constraint exclusion and inheritance won't help him. The problem is that he has two indexes, and he needs to find the max between both of them. PostgreSQL isn't smart enough to recognize that it can use two indexes, find the max in each one, and find the max of those two values. Regards, Jeff Davis
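For reference, the rewrite Ioana describes, written out as runnable SQL (it needs a subquery alias, and with 8.1's min/max index optimization each branch can then use its own index):

    select max(transid)
    from (select max(transid) as transid from public.transaction
          union all
          select max(transid) from archive.transaction) as t;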
Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on
Hi, I tried, and this does not work either. Thank you, Ioana --- Jim C. Nasby [EMAIL PROTECTED] wrote: On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis wrote: On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote: Sorry, I don't have the earlier part of this thread, but what about... SELECT greatest(max(a), max(b)) ...? To fill you in, we're trying to get the max of a union (a view across two physical tables). UNION or UNION ALL? You definitely don't want to do a plain UNION if you can possibly avoid it. It can be done if you're creative with the query; I suggested a query that selects the max of the max()es of the individual tables. Your query could work too. However, the trick would be getting postgresql to recognize that it can transform SELECT max(x) FROM foo into that, where foo is a view of a union. If PostgreSQL could sort the result of a union by merging the results of two index scans, I think the problem would be solved. Is there something preventing this, or is it just something that needs to be added to the planner? Hrm... it'd be worth trying the old ORDER BY ... LIMIT 1 trick just to see if it works in this case, but I don't have much hope for that. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on
Hello, it looks like some of you missed my first email, but my problem is not to find a replacement for this select: select max(transid) from someunionview. There are plenty of solutions for doing that... My point is to split a table in two and make this transparent for the developers as a first step. In a second step they will improve some of the queries, but that is another story. So I would like to know if there is any plan to improve this type of query on views in the near future, or maybe it is already improved in the 8.2 version? I have the same problem and question for: select transid from someunionview order by transid desc limit 1; Thank you for your time, Ioana Danes --- Tom Lane [EMAIL PROTECTED] wrote: Jeff Davis [EMAIL PROTECTED] writes: If PostgreSQL could sort the result of a union by merging the results of two index scans, I think the problem would be solved. Is there something preventing this, or is it just something that needs to be added to the planner? It's something on the wish-list. Personally I'd be inclined to try to rewrite the query as a plain MAX() across rewritten per-table indexed queries, rather than worry about mergesort or anything like that. There wasn't any very good way to incorporate that idea when planagg.c was first written, but now that the planner has an explicit notion of append relations it might be relatively straightforward. regards, tom lane
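For completeness, the ORDER BY ... LIMIT 1 form can be rewritten by hand the same way, pushing the limit into each branch (a workaround sketch, not something the 8.1 planner does on its own):

    select transid
    from ((select transid from public.transaction order by transid desc limit 1)
          union all
          (select transid from archive.transaction order by transid desc limit 1)) as t
    order by transid desc
    limit 1;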
[PERFORM] Postgresql 8.1.4 - performance issues for select on view using max
Hi everyone, I am doing a test for a scenario where I have 2 schemas, one (public) for the operational data and another one (archive) for old, archived data. So basically I want to split the data from some huge tables in two: all data before 2006 in archive and all data from 2006 on in public. Let's say I have a table named public.AllTransactions with data before and after 2006. I want to move all the data < 2006 into a new table named archive.transaction (in the archive schema), and all data >= 2006 into a new table named public.transaction (in the public schema). In order to make this transparent for the developers I want to drop the original table public.AllTransactions and create a view with the same name that is a union between the two new tables:

create view public.AllTransactions as
select * from public.transaction
union all
select * from archive.transaction

On this view I will create rules for insert, update, delete... Testing some selects that I know we have in the application, I ran into a scenario where my plan does not work without a code change: select max(transid) from alltransaction; because the planner does not use the existing indexes on the 2 new tables public.transaction and archive.transaction. Here are the results of explain analyze:

1. Selecting from only one table is OK:

# explain select max(transid) from public.transaction;
                              QUERY PLAN
--------------------------------------------------------------------
 Result (cost=0.04..0.05 rows=1 width=0)
   InitPlan
     ->  Limit (cost=0.00..0.04 rows=1 width=8)
           ->  Index Scan Backward using pk_transaction on transaction (cost=0.00..357870.46 rows=9698002 width=8)
                 Filter: (transid IS NOT NULL)
(5 rows)

2. Selecting from the view does a sequential scan:

# explain analyze select max(transid) from alltransaction;
                              QUERY PLAN
--------------------------------------------------------------------
 Aggregate (cost=200579993.70..200579993.71 rows=1 width=8) (actual time=115778.101..115778.103 rows=1 loops=1)
   ->  Append (cost=1.00..200447315.74 rows=10614237 width=143) (actual time=0.082..95146.144 rows=10622206 loops=1)
         ->  Seq Scan on transaction (cost=1.00..100312397.02 rows=9698002 width=143) (actual time=0.078..56002.778 rows=9706475 loops=1)
         ->  Seq Scan on transaction (cost=1.00..100028776.35 rows=916235 width=143) (actual time=8.822..2799.496 rows=915731 loops=1)
 Total runtime: 115778.200 ms
(5 rows)

Is this a bug, or is this how the planner is supposed to work? I have the same problem with the following select: select transid from alltransaction order by transid desc limit 1; Thank you for your time, Ioana
Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on view using max
Hi, On Wednesday, October 18, 2006 at 21:51, Ioana Danes wrote: I am doing a test for a scenario where I have 2 schemas, one (public) for the operational data and another one (archive) for old, archived data. So basically I want to split the data from some huge tables in two: all data before 2006 in archive and all data from 2006 on in public. [...] I ran into a scenario where my plan does not work without a code change. This sounds a lot like DDL partitioning; you may want to add some CHECK constraints to your schema and set constraint_exclusion to on in your postgresql.conf. Please read the following documentation material: http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html Regards, -- Dimitri Fontaine http://www.dalibo.com/
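A sketch of the setup Dimitri points to, assuming the split is driven by a date column; the column name transdate is hypothetical, and note Jeff Davis's caveat above that constraint exclusion does not help the max() query itself:

    ALTER TABLE archive.transaction
      ADD CONSTRAINT transaction_old CHECK (transdate < DATE '2006-01-01');
    ALTER TABLE public.transaction
      ADD CONSTRAINT transaction_new CHECK (transdate >= DATE '2006-01-01');
    SET constraint_exclusion = on;

With the constraints in place, queries that filter on transdate let the planner skip whichever table's CHECK contradicts the WHERE clause (in 8.1 this applies to inheritance children; whether it covers a UNION ALL view depends on the server version).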