Re: [PERFORM] mysql to postgresql, performance questions
Greg Smith wrote:
> Tom Lane wrote:
>> So has anyone looked at porting MythTV to PG?
> Periodically someone hacks together something that works; the last big effort I'm aware of was in 2006, and then it bit-rots away. I'm sure we'd get some user uptake on the result -- MySQL corruption is one of the top ten causes of a MythTV system crashing.

It would be the same with PG, unless the PG cluster configuration shipped with MythTV came with a properly configured WAL. I had corrupted tables (and a personal wiki entry -- the other MySQL database in my house) *only* when I sometimes took the risk of not shutting down the machine properly, e.g. when the remote was missing.

regards,
Yeb Havinga

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] mysql to postgresql, performance questions
Yeb Havinga wrote:
> Greg Smith wrote:
>> Tom Lane wrote:
>>> So has anyone looked at porting MythTV to PG?
>> Periodically someone hacks together something that works; the last big effort I'm aware of was in 2006, and then it bit-rots away. I'm sure we'd get some user uptake on the result -- MySQL corruption is one of the top ten causes of a MythTV system crashing.
> It would be the same with PG, unless the PG cluster configuration shipped with MythTV came with a properly configured WAL - I had corrupted tables (and a personal wiki entry

Forgot to add how to fix the corrupted tables, sorry.

> (the other MySQL database in my house) *only* when I sometimes took the risk of not shutting down the machine properly, e.g. when the remote was missing.

regards,
Yeb Havinga
Re: [PERFORM] mysql to postgresql, performance questions
Yeb Havinga wrote:
> Greg Smith wrote:
>> MySQL corruption is one of the top ten causes of a MythTV system crashing.
> It would be the same with PG, unless the PG cluster configuration shipped with MythTV came with a properly configured WAL - I had corrupted tables (and a personal wiki entry -- the other MySQL database in my house) *only* when I sometimes took the risk of not shutting down the machine properly, e.g. when the remote was missing.

You can shut down a PostgreSQL database improperly and it will come back up again just fine, unless a number of things have happened at just the wrong time:

1) You've written something to disk
2) The write is sitting in a write cache, usually on the hard drive, but the OS believes the data has been written
3) There is a hard crash before that data is actually written to disk

Now, this certainly still happens with PostgreSQL; I was just discussing that yesterday with a client who runs an app on desktop hardware in countries with intermittent power, and database corruption is a problem for them. However, that's a fairly heavy write-volume situation, which is not the case with most MythTV servers. The actual window where the WAL will not do what it's supposed to here is pretty narrow; it's easy to trigger if you pull the plug while writing constantly, but that's not a typical MythTV database load. Also, moving forward, we'll see the default filesystem on more Linux systems shift to ext4, which is starting to lose even this vulnerability -- newer kernels will flush the data out to disk in this situation using the appropriate drive command.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
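As a sketch of the durability knobs this discussion turns on (a postgresql.conf fragment; these are the defaults, shown only to make the WAL guarantees concrete -- verify against your version's docs):

```ini
# postgresql.conf -- durability-related settings (sketch)
fsync = on                # flush WAL to stable storage at commit; never
                          # disable on data you care about
synchronous_commit = on   # 'off' can lose the last few transactions after a
                          # crash, but unlike fsync=off it cannot corrupt data
full_page_writes = on     # guards against torn pages after a crash
```

Note that, as Greg describes, none of these help if the drive's volatile write cache claims a write completed before it reached the platters; that part has to be fixed at the hardware or filesystem level (e.g. disabling the drive cache, or a barrier-aware filesystem such as ext4 on a newer kernel).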
Re: [PERFORM] mysql to postgresql, performance questions
reeds...@rice.edu (Ross J. Reedstrom) writes:
> http://www.mythtv.org/wiki/PostgreSQL_Support

That's a pretty hostile presentation... The page has had two states:

a) In 2008, someone wrote up:

   "After some bad experiences with MySQL (data loss caused by a commercial power failure, very bad performance deleting old records, and more) I would prefer to have a MythTV application option to use PostgreSQL. I never saw such bad database behaviour in any other RDBMS than MySQL. I'm ready to contribute to any activity going in that direction (I'm a developer for commercial database applications)."

b) Deleted by GBee in 2009, marked "(Outdated, messy and unsupported)"

--
let name = "cbbrowne" and tld = "gmail.com" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/spreadsheets.html
"A language that doesn't affect the way you think about programming, is not worth knowing."  -- Alan J. Perlis
Re: [PERFORM] mysql to postgresql, performance questions
t...@sss.pgh.pa.us (Tom Lane) writes:
> Ross J. Reedstrom <reeds...@rice.edu> writes:
>> On Sat, Mar 20, 2010 at 10:47:30PM -0500, Andy Colson wrote:
>>> (I added the "and trust" as an afterthought, because I do have one very important 100%-uptime-required MySQL database running. It's my MythTV box at home, and I have to ask permission from my GF before I take the box down to upgrade anything. And heaven forbid if it crashes or anything. So I do have experience with the care and feeding of MySQL. And no, I'm not kidding.)
>> Andy, you are so me! I have the exact same one-and-only-one mission-critical MySQL DB, but the gatekeeper is my wife. And experience with that instance has made me love and trust PostgreSQL even more.
> So has anyone looked at porting MythTV to PG?

It has come up several times on the MythTV list:
http://david.hardeman.nu/files/patches/mythtv/mythletter.txt
http://www.mythtv.org/pipermail/mythtv-dev/2004-August/025385.html
http://www.mythtv.org/pipermail/mythtv-users/2006-July/141191.html

Probably worth asking David Härdeman and Danny Brow, who have proposed such to the MythTV community, what happened. (It's possible that they will get cc'ed on this.) If there's a meaningful way to help, that would be cool. If not, then we might as well not run slipshod across the same landmines that blew the idea up before.

--
"Transported to a surreal landscape, a young girl kills the first woman she meets and then teams up with three complete strangers to kill again."
-- Unknown, Marin County newspaper's TV listing for _The Wizard of Oz_
Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster
On 03/22/2010 03:21 PM, Tom Lane wrote:
> The fundamental reason why you're getting a bad plan choice is the factor-of-100 estimation error here. I'm not sure you can do a whole lot about that without rethinking the query --- in particular I would suggest trying to get rid of the non-constant range bounds. You're apparently already plugging in an external variable for the date, so maybe you could handle the time of day similarly instead of joining to sysstrings for it.

Tom and Peter, I thought you might like to know the outcome of this. I was able to get both the 8.0 and the 8.2 planner to correctly run the query. There were two issues. As Tom pointed out, the 'sysstrings' lookup seems to be the main culprit. Which makes sense: how can the planner know how to run the query when it doesn't know, until the query has started, approximately what range it will bracket?

The other part of the solution is a bit concerning. I had to do a dump and load (and VACUUM ANALYZE) to get the planner to work correctly even after I rewrote the query. FYI, I had run VACUUM ANALYZE (and sometimes REINDEX TABLE x) between each test.

--
Christian Brink
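To illustrate Tom's suggestion about non-constant range bounds (all table and column names here are hypothetical, since the original query wasn't posted): instead of joining to a settings table for the bound, fetch the value first and plug it in as a constant, so the planner can estimate the range's selectivity.

```sql
-- Before: the planner can't estimate how many rows fall in the range,
-- because the upper bound comes from another table at run time.
SELECT s.*
FROM   sales s
JOIN   sysstrings cfg ON cfg.name = 'close_of_business'
WHERE  s.ts >= '2010-03-22'::date
AND    s.ts <  '2010-03-22'::date + cfg.value::interval;

-- After: look the bound up in a separate query, then pass it in as a
-- literal/parameter, giving the planner a constant range to estimate.
SELECT s.*
FROM   sales s
WHERE  s.ts >= '2010-03-22 00:00'
AND    s.ts <  '2010-03-22 17:30';  -- value fetched from sysstrings beforehand
```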
Re: [PERFORM] Forcing index scan on query produces 16x faster
On Wed, Mar 17, 2010 at 9:01 PM, Eger, Patrick <pe...@automotive.com> wrote:
> I'm running 8.4.2 and have noticed a similar heavy preference for sequential scans and hash joins over index scans and nested loops. Our database can basically fit in cache 100%, so this may not be applicable to your situation, but the following params seemed to help us:
>
> seq_page_cost = 1.0
> random_page_cost = 1.01
> cpu_tuple_cost = 0.0001
> cpu_index_tuple_cost = 0.5
> cpu_operator_cost = 0.25
> effective_cache_size = 1000MB
> shared_buffers = 1000MB
>
> Might I suggest the Postgres developers reconsider these defaults for the 9.0 release, or perhaps provide a few sets of tuning params for different workloads in the default install/docs? The cpu_*_cost values in particular seem to be way off, afaict. I may be dead wrong though, fwiw =)

The defaults assume that the database is not cached in RAM. If it is, you want to lower seq_page_cost and random_page_cost to something much smaller, and typically make them equal. I often recommend 0.005, but I know others have had success with higher values. Ultimately it would be nice to have a better model of how data gets cached in shared_buffers and the OS buffer cache, but that is not so easy.

...Robert
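Expressed as a postgresql.conf fragment (a sketch of Robert's suggestion; 0.005 is the figure he mentions, and the effective_cache_size value is a placeholder you would set from your own machine's RAM):

```ini
# Planner costs for a database that is fully cached in RAM
seq_page_cost = 0.005        # page fetches come from cache, so they're cheap...
random_page_cost = 0.005     # ...and random access costs the same as sequential
effective_cache_size = 20GB  # placeholder: roughly shared_buffers + OS cache
```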
[PERFORM] memory question
PostgreSQL 8.4.3
Linux Red Hat 5.0

Question: How much memory do I really need?

From my understanding there are two primary strategies for setting up PostgreSQL in relationship to memory:

1) Rely on Linux to cache the files. In this approach you set shared_buffers to a relatively low number.
2) Set shared_buffers to a very large percentage of your memory, so that PostgreSQL reserves the memory for the database.

I am currently using option #1. I have 24 gig of memory on my server and the database takes up 17 gig of disk space. When I run the Linux command "top" I notice that 19 gig is allocated for cache. Is there a way for me to tell how much of that cache is associated with the caching of database files?

I am basically asking how much memory I really need. Maybe I have complete overkill. Maybe I am getting to a point where I might need more memory. My thought was that I could use option #2 and set the number to a lower amount; if the performance is bad, then slowly work the number up.

Our server manager seems to think that I have way too much memory. He thinks that we only need 5 gig. I don't really believe that, but I want to cover myself. With money tight I don't want to be the person who is wasting resources. We need to replace our database servers, so I want to do the right thing.

Thanks,
Lance Campbell
Software Architect/DBA/Project Manager
Web Services at Public Affairs
217-333-0382
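There is no single stock command that attributes the kernel's page cache to particular files, but one can look one level down: the contrib module pg_buffercache reports what is sitting in shared_buffers (not the OS cache, but the same question for PostgreSQL's own pool). A sketch -- on 8.4 the module is installed by running its contrib SQL script rather than CREATE EXTENSION, and the query assumes the default 8 kB block size:

```sql
-- Which relations occupy the most space in shared_buffers?
SELECT c.relname,
       count(*) * 8 / 1024 AS cached_mb   -- 8 kB pages -> megabytes
FROM   pg_buffercache b
JOIN   pg_class c ON b.relfilenode = c.relfilenode
GROUP  BY c.relname
ORDER  BY count(*) DESC
LIMIT  10;
```

On the OS side, mincore()-based tools such as fincore (where available) can report how much of a given data file under the cluster's data directory is resident in the kernel cache.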
Re: [PERFORM] Forcing index scan on query produces 16x faster
Ok, the wording is a bit unclear in the documentation as to whether it is the cost for an entire *page* of tuples, or for actual tuples. So something like the following might give better results for a fully-cached DB?

seq_page_cost = 1.0
random_page_cost = 1.1  # even memory has random-access costs: lack of readahead, TLB misses, etc.
cpu_tuple_cost = 1.0
cpu_index_tuple_cost = 0.5
cpu_operator_cost = 0.25
effective_cache_size = 1000MB
shared_buffers = 1000MB

-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Wednesday, March 24, 2010 5:47 PM
To: Eger, Patrick
Cc: Christian Brink; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Forcing index scan on query produces 16x faster

On Wed, Mar 17, 2010 at 9:01 PM, Eger, Patrick <pe...@automotive.com> wrote:
> I'm running 8.4.2 and have noticed a similar heavy preference for sequential scans and hash joins over index scans and nested loops. Our database can basically fit in cache 100%, so this may not be applicable to your situation, but the following params seemed to help us:
>
> seq_page_cost = 1.0
> random_page_cost = 1.01
> cpu_tuple_cost = 0.0001
> cpu_index_tuple_cost = 0.5
> cpu_operator_cost = 0.25
> effective_cache_size = 1000MB
> shared_buffers = 1000MB
>
> Might I suggest the Postgres developers reconsider these defaults for the 9.0 release, or perhaps provide a few sets of tuning params for different workloads in the default install/docs? The cpu_*_cost values in particular seem to be way off, afaict. I may be dead wrong though, fwiw =)

The defaults assume that the database is not cached in RAM. If it is, you want to lower seq_page_cost and random_page_cost to something much smaller, and typically make them equal. I often recommend 0.005, but I know others have had success with higher values. Ultimately it would be nice to have a better model of how data gets cached in shared_buffers and the OS buffer cache, but that is not so easy.
...Robert
Re: [PERFORM] memory question
On Wed, Mar 24, 2010 at 6:49 PM, Campbell, Lance <la...@illinois.edu> wrote:
> PostgreSQL 8.4.3
> Linux Red Hat 5.0
> Question: How much memory do I really need?

The answer is: as much as is needed to hold your entire database in memory, plus a few gig left over for sorts and backends to play in.

> From my understanding there are two primary strategies for setting up PostgreSQL in relationship to memory:
> 1) Rely on Linux to cache the files. In this approach you set shared_buffers to a relatively low number.
> 2) Set shared_buffers to a very large percentage of your memory, so that PostgreSQL reserves the memory for the database.

The kernel is better at caching large amounts of memory. Pg is better at handling somewhat smaller amounts, and at not flushing out random-access data in favor of sequential-access data.

> I am currently using option #1. I have 24 gig of memory on my server and the database takes up 17 gig of disk space. When I run the Linux command "top" I notice that 19 gig is allocated for cache. Is there a way for me to tell how much of that cache is associated with the caching of database files?

Probably nearly all of that 19G of cache is allocated to pg files. Not sure how to tell off the top of my head, though.

> I am basically asking how much memory I really need. Maybe I have complete overkill. Maybe I am getting to a point where I might need more memory.

Actually, there are three levels of caching that are possible:

1: The entire db, tables and indexes, can fit in RAM. This is the fastest option. Worth the extra $ for RAM if you can afford it / the db isn't too huge.
2: The indexes can fit in RAM; some of the tables can. Still pretty fast. Definitely worth paying a little extra for.
3: Neither indexes nor tables can wholly fit in RAM. At this point the speed of your large disk array becomes important, and you want a fast caching RAID controller. Both of these items (disk array and RAID controller) are considerably more costly than 16 or 32 gigs of RAM.
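A quick way to see which of those three levels applies, using standard catalog functions (run in the database in question):

```sql
-- Level 1: does the whole database (tables + indexes) fit in RAM?
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Level 2: do at least the indexes fit? Total size of all indexes:
SELECT pg_size_pretty(sum(pg_relation_size(indexrelid))::bigint)
FROM   pg_index;
```

Compare the results against the RAM you can spare for shared_buffers plus the OS cache.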
> My thought was that I could use option #2 and set the number to a lower amount. If the performance is bad then slowly work the number up.

I'm not sure what you mean. Install less RAM and let PG do all the caching? Usually a bad idea. Usually. I'm sure there are use cases where it might be a good idea. But keep in mind, a large shared_buffers doesn't JUST buffer your reads; it also results in a much larger memory space to keep track of in terms of things that need to get written out, etc. I'm actually about to reduce shared_buffers from 8G on one reporting server down to 1 or 2G, because that's plenty, and it's having a hard time keeping up with the huge checkpoints it's having to do.

> Our server manager seems to think that I have way too much memory. He thinks that we only need 5 gig.

How much do you absolutely need to boot up, run PostgreSQL, and not run out of memory? That's what you "need", and it's probably around 1 gig. That number is no less arbitrary than 5G. Did he show you how he arrived at his? If your DB is 17 gig on disk, it's foolish to be cheap on memory.

> I don't really believe that. But I want to cover myself. With money tight I don't want to be the person who is wasting resources. We need to replace our database servers, so I want to do the right thing.

You can waste your time (valuable, but a sunk cost), waste other people's time (more valuable, also a sunk cost), or waste a few dollars on memory. 24 gig isn't that expensive, really, compared to, say, 10 seconds per transaction for 100 users, 1000 times a day -- that's 11 user-days lost in a single day. Tens of seconds start to add up.
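The checkpoint pressure Scott describes is usually tuned with these 8.x-era postgresql.conf settings (a sketch; the values are illustrative, not recommendations):

```ini
shared_buffers = 2GB                # smaller pool -> less dirty data per checkpoint
checkpoint_segments = 32            # (pre-9.5) allow more WAL between checkpoints
checkpoint_completion_target = 0.9  # spread checkpoint writes across the interval
```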
Re: [PERFORM] Forcing index scan on query produces 16x faster
On Wed, Mar 24, 2010 at 8:59 PM, Eger, Patrick <pe...@automotive.com> wrote:
> Ok, the wording is a bit unclear in the documentation as to whether it is the cost for an entire *page* of tuples, or for actual tuples. So something like the following might give better results for a fully-cached DB?
>
> seq_page_cost = 1.0
> random_page_cost = 1.1  # even memory has random-access costs: lack of readahead, TLB misses, etc.
> cpu_tuple_cost = 1.0
> cpu_index_tuple_cost = 0.5
> cpu_operator_cost = 0.25
> effective_cache_size = 1000MB
> shared_buffers = 1000MB

Yeah, you can do it that way, by jacking up the cpu_tuple costs. I prefer to lower the {random,seq}_page_cost values because it keeps the cost values in the range I'm used to seeing, but it works out to the same thing.

I am not sure that there is any benefit from making random_page_cost > seq_page_cost on a fully cached database. What does readahead mean in the context of cached data? The data isn't likely to be physically contiguous in RAM, and I'm not sure it would matter much if it were. Basically, what random_page_cost > seq_page_cost tends to do is discourage the use of index scans in borderline cases, so you want to benchmark it and figure out which way is faster and then tune accordingly.

...Robert
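A benchmarking sketch along the lines Robert suggests (the table and query are hypothetical -- substitute one of your own borderline queries): time the same statement under both settings and compare the actual runtimes, not just the plan costs.

```sql
-- SET affects only the current session, so this is safe to try live.
SET random_page_cost = 1.1;    -- slight penalty for random access
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

SET random_page_cost = 1.0;    -- equal costs: the fully-cached assumption
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

RESET random_page_cost;        -- back to the server default
```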