Re: [PERFORM] Typecast bug?
Tom Lane wrote:
>> Craig James [EMAIL PROTECTED] writes:
>> This seems like a bug to me, but it shows up as a performance problem.
>> emol_warehouse_1= explain analyze select version_id, parent_id from version where version_id = 999;
>
> If you actually *need* so many 9's here as to force it out of the range of bigint, then why is your id column not declared numeric? This seems to me to be about on par with complaining that intcol = 4.2e1 won't be indexed. We have a numeric data type hierarchy, learn to work with it ...

Your suggestion to "learn to work with it" doesn't fly. A good design separates the database schema details from the application to the greatest extent possible. What you're suggesting is that every application that queries a Postgres database should know the exact range of every numeric data type of every indexed column in the schema, simply because Postgres can't recognize an out-of-range numeric value. In this case, the optimizer could have instantly returned zero results with no further work, since the query was out of range for that column. This seems like a pretty simple optimization to me, and it seems like a helpful suggestion to make to this forum.

BTW, this query came from throwing lots of junk at a web app in an effort to uncover exactly this sort of problem. It's not a real query, but then, hackers don't use real queries. The app checks that its input is a well-formed integer expression, but then assumes Postgres can deal with it from there.

Craig

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
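The situation under discussion can be sketched as follows; this is an illustrative reconstruction (column type, table names, and the number of 9s are assumptions, since the original literal was elided):

```sql
-- Illustrative sketch, assuming version_id is a bigint with an index.
-- A literal too wide for bigint is parsed as numeric, so the comparison
-- happens in numeric and the bigint index cannot be used:
EXPLAIN SELECT version_id, parent_id
FROM version
WHERE version_id = 99999999999999999999;       -- numeric comparison, seq scan

-- One application-side guard is to cast the literal explicitly; the
-- out-of-range value then fails immediately instead of scanning:
SELECT version_id, parent_id
FROM version
WHERE version_id = 99999999999999999999::int8; -- ERROR: bigint out of range
```

Whether an immediate error or an instant empty result is preferable depends on the application, but either beats a silent full-table scan.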
Re: [PERFORM] Typecast bug?
On 6/26/08, Craig James [EMAIL PROTECTED] wrote:
> This seems like a bug to me, but it shows up as a performance problem. Since the column being queried is an integer, the second query (see below) can't possibly match, yet Postgres uses a typecast, forcing a full table scan for a value that can't possibly be in the table.

Which version are you using? 8.3 removes a lot of implicit casts (all? not sure), so this may already be your fix.

Cheers, Frank
Re: [PERFORM] Typecast bug?
On Thu, Jun 26, 2008 at 9:02 AM, Frank Joerdens [EMAIL PROTECTED] wrote:
> Which version are you using? 8.3 removes a lot of implicit casts (all? not sure), so this may already be your fix.

8.3 only removed implicit casts from non-text types to text (date -> text, int -> text, interval -> text, ...) to avoid unexpected behaviours.

-- Guillaume
[PERFORM] Re: [PERFORM] Postgresql update op is very very slow
I know the problem. There are about 35 million rows, which cost about 12G of disk space, and checkpoint segments are set to 64, but the update operation runs in one transaction, which quickly fills up the checkpoint segments and forces frequent checkpoints; checkpoints cost a lot of resources, so the update operation becomes slower and slower, and bgwriter won't write because it's not committed yet. Creating a new table may be a quick solution, but it's not appropriate in some cases. If we could commit every 1000 rows per round, it might resolve the problem. But PG doesn't support transaction control within functions yet?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Heikki Linnakangas
Sent: 25 June 2008 18:11
To: jay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgresql update op is very very slow

jay wrote:
> I've a table with about 34601755 rows. When I execute 'update msg_table set type=0;' it is very very slow, costing several hours and still not complete. Why is postgresql so slow? Is it the PG MVCC problem?

Possibly. Because of MVCC, a full-table update will actually create a new version of each row.

I presume that's a one-off query, or a seldom-run batch operation, and not something your application needs to do often. In that case, you could drop all indexes and recreate them after the update, which should help a lot:

BEGIN;
DROP INDEX index name, index name 2, ...; -- for each index
UPDATE msg_table SET type = 0;
CREATE INDEX ... -- Recreate indexes
COMMIT;

Or even better, instead of using UPDATE, do a SELECT INTO a new table, drop the old one, and rename the new one in its place. That has the advantage that the new table doesn't contain the old row versions, so you don't need to vacuum right away to reclaim the space.
Actually, there's an even more clever trick to do roughly the same thing:

ALTER TABLE msg_table ALTER COLUMN type TYPE int4 USING 0;

(assuming type is int4; replace with the actual data type if necessary)

This will rewrite the table, similar to a DROP + CREATE, and rebuild all indexes, but all in one command.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
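The copy-into-a-new-table variant Heikki mentions might look roughly like this; the column names are illustrative, since the real schema isn't shown in the thread:

```sql
-- Illustrative sketch of the rewrite-into-a-new-table approach.
-- Assumes msg_table has columns id, body, type (names are made up).
BEGIN;
CREATE TABLE msg_table_new AS
    SELECT id, body, 0 AS type
    FROM msg_table;
DROP TABLE msg_table;
ALTER TABLE msg_table_new RENAME TO msg_table;
-- Recreate indexes, constraints and permissions here.
COMMIT;
```

Because the new table is written fresh, it carries no dead row versions, so no immediate VACUUM is needed afterwards.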
[PERFORM] Re: [PERFORM] Re: [PERFORM] Postgresql update op is very very slow
2008/6/26 jay [EMAIL PROTECTED]:
> If we could commit every 1000 rows per round, it might resolve the problem. But PG doesn't support transaction control within functions yet?

Yeah, transaction control is not supported inside functions. There are some hacks using dblink to do transactions inside functions. You may want to check that out.

I had suggested another hack in the past for very simplistic updates, when you are sure that the tuple length does not change between updates and you are ready to handle a half-updated table if there is a crash or failure in between. Maybe for your case, where you are updating a single column of the entire table and setting it to some default value for all the rows, it may work fine. But please be aware of the data consistency issues before you try that. And it must be a once-in-a-lifetime kind of hack.

http://postgresql-in.blogspot.com/2008/04/postgresql-in-place-update.html

Thanks, Pavan

-- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
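The dblink hack Pavan refers to might be sketched like this; the connection string, table, and batch ranges are all illustrative assumptions, and it requires the dblink module to be installed:

```sql
-- Hypothetical sketch: each batch is pushed through a separate dblink
-- connection, so it commits independently of the calling function's
-- own transaction. Requires the dblink module; connstr, table name and
-- row count are illustrative.
CREATE OR REPLACE FUNCTION update_in_batches() RETURNS void AS $$
DECLARE
    lo integer := 1;
BEGIN
    WHILE lo <= 35000000 LOOP
        PERFORM dblink_exec(
            'dbname=mydb',
            'UPDATE msg_table SET type = 0 '
            || 'WHERE id BETWEEN ' || lo || ' AND ' || (lo + 999)
        );
        lo := lo + 1000;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Each dblink_exec runs as its own autocommitted statement on the remote connection, which is what makes the batching effective; it also means a crash midway leaves the table half updated, as Pavan warns.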
Re: Re: [PERFORM] Postgresql update op is very very slow
jay wrote:
> I know the problem. There are about 35 million rows, which cost about 12G of disk space, and checkpoint segments are set to 64, but the update operation runs in one transaction, which quickly fills up the checkpoint segments and forces frequent checkpoints; checkpoints cost a lot of resources, so the update operation becomes slower and slower, and bgwriter won't write because it's not committed yet. Creating a new table may be a quick solution, but it's not appropriate in some cases. If we could commit every 1000 rows per round, it might resolve the problem.

Committing more frequently won't help you with checkpoints. The updates will generate just as much WAL regardless of how often you commit, so you will have to checkpoint just as often. And commits have no effect on bgwriter either; bgwriter will write just as much regardless of how often you commit.

One idea would be to partition the table vertically: split the table into two tables, so that the columns you need to update like that are in one table, together with the primary key, and the rest of the columns are in another table. That way the update won't need to scan or write the columns that are not changed. You can create a view on top of the two tables to make them look like the original table to the application.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
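Heikki's vertical split might be sketched as follows; all table, column, and view names are illustrative, since the real schema isn't shown:

```sql
-- Illustrative sketch of the vertical partitioning described above.
-- The frequently updated column lives in its own narrow table.
CREATE TABLE msg_static (
    id   integer PRIMARY KEY,
    body text
);
CREATE TABLE msg_volatile (
    id   integer PRIMARY KEY REFERENCES msg_static,
    type integer
);

-- A view makes the pair look like the original table to the
-- application (for reads; updates go directly to msg_volatile):
CREATE VIEW msg_table AS
    SELECT s.id, s.body, v.type
    FROM msg_static s
    JOIN msg_volatile v USING (id);

-- The mass update now rewrites only the narrow table:
UPDATE msg_volatile SET type = 0;
```

The win comes from the mass update writing new row versions only for the narrow table, instead of copying every wide row in full.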
Re: [PERFORM] Re: Postgresql update op is very very slow
Hi -

I have been following this thread and find some of the recommendations really surprising. I understand that MVCC necessarily creates overhead and that in-place updates would not be safe against crashes etc., but I have a hard time believing that this is such a huge problem for an RDBMS in 2008. How do large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC (maybe a different kind?) as well, but I cannot believe that large updates still pose such big problems. Are there no options (algorithms) for adaptively choosing different update strategies that do not incur the full MVCC overhead?

Holger

(Disclaimer: I'm not a professional DBA, just a curious developer.)
Re: [PERFORM] Re: Postgresql update op is very very slow
Holger Hoffstaette wrote:
> I have been following this thread and find some of the recommendations really surprising. I understand that MVCC necessarily creates overhead, in-place updates would not be safe against crashes etc. but have a hard time believing that this is such a huge problem for RDBMS in 2008. How do large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC (maybe a different kind?) as well, but I cannot believe that large updates still pose such big problems. Are there no options (algorithms) for adaptively choosing different update strategies that do not incur the full MVCC overhead?

I think Pg already does in-place updates, or close to it, if the tuples being replaced aren't referenced by any in-flight transaction. I noticed a while ago that if I'm doing bulk load/update work and there aren't any other transactions, no MVCC bloat seems to occur and updates are faster.

I'd be interested to have this confirmed, as I don't think I've seen it documented anywhere. Is it a side-effect/benefit of HOT somehow?

-- Craig Ringer
Re: [PERFORM] Hardware suggestions for high performance 8.3
I've seen some concerns about buying database performance hardware from DELL. Are there at least some RAID cards that work well with Linux, or should I stay clear of DELL permanently?

Thanks!
//Henke

On 25 Jun 2008, at 17:45, Greg Smith wrote:
> On Wed, 25 Jun 2008, Henrik wrote:
>> Would you turn off fsync if you had a controller with BBU? =)
>
> Turning off fsync has some potential to introduce problems even in that environment, so better not to do that. The issue is that you might have, say, 1GB of OS-level cache but 256MB of BBU cache, and if you turn fsync off it won't force the OS cache out to the controller when it's supposed to, and that can cause corruption. Also, if you've got a controller with BBU, the overhead of fsync for regular writes is low enough that you don't really need to turn it off. If writes are cached, the fsync is almost free.
>
> -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Hardware vs Software RAID
On Wed, 25 Jun 2008, Andrew Sullivan wrote:
> the key thing to do is to ensure you have good testing infrastructure in place to check that things will work before you deploy to production.

This is true whether you're using Linux or completely closed-source software. There are two main differences from my view:

-OSS software lets you look at the code before a typical closed-source company would have pushed a product out the door at all. The downside is that you need to recognize that. Linux kernels, for example, need a significant amount of real-world exposure after release before they're ready for most people.

-If your OSS program doesn't work, you can potentially find the problem yourself. I find that I don't fix issues when I come across them very often, but being able to browse the source code for something that isn't working frequently makes it easier to understand what's going on as part of troubleshooting. It's not like closed-source software doesn't have the same kinds of bugs.

The way commercial software (and projects like PostgreSQL) gets organized into a smaller number of official releases tends to focus the QA process a bit better, though, so that regular customers don't see as many rough edges. Linux used to do a decent job of this with their development vs. stable kernels, which I really miss. Unfortunately there's just not enough time for the top-level developers to manage that while still keeping up with the pace needed just for new work. Sorting out which are the stable kernel releases seems to have become the job of the distributors (RedHat, SuSE, Debian, etc.) instead of the core kernel developers.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Hardware vs Software RAID
On Jun 25, 2008, at 11:35 AM, Matthew Wakeling wrote:
> On Wed, 25 Jun 2008, Greg Smith wrote:
>> A firewire-attached log device is an extremely bad idea.
>
> Anyone have experience with IDE, SATA, or SAS-connected flash devices like the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a transfer rate of 100MB/s, and doesn't degrade much in performance when writing small random blocks. But what's it actually like, and is it reliable?

None of these manufacturers rates these drives for massive amounts of writes. They're sold as suitable for laptop/desktop use, which normally is not a heavy wear-and-tear operation like a DB. Once they claim suitability for this purpose, be sure that I and a lot of others will dive in to see how well it really works. Until then, it will just be an expensive brick-making experiment, I'm sure.
Re: [PERFORM] Re: Postgresql update op is very very slow
On Thu, Jun 26, 2008 at 02:40:59PM +0200, Holger Hoffstaette wrote:
> large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC (maybe a different kind?) as well, but I cannot believe that large updates still pose such big problems.

DB2 does not use MVCC. This is why lock escalation is such a big problem for them. Oracle uses a kind of MVCC based on rollback segments: your work goes into the rollback segment so that it can be undone, and the update happens in place. This causes a different kind of pain: you can run out of rollback segments (partway through a long-running transaction, even) and then have to undo everything in order to do any work at all. Every system involves trade-offs, and different systems make different ones. The bulk update problem is PostgreSQL's weak spot, and for that cost one gets huge other benefits.

> Are there no options (algorithms) for adaptively choosing different update strategies that do not incur the full MVCC overhead?

How would you pick? But one thing you could do is create the table with a non-standard fill factor, which might allow HOT to work its magic.

A

-- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
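The non-standard fill factor Andrew mentions is set as a storage parameter; 70 here is just an illustrative value, and the table definition is made up:

```sql
-- Leave ~30% free space in each heap page so HOT updates can place
-- the new row version on the same page (70 is an illustrative value).
CREATE TABLE msg_table (
    id   integer PRIMARY KEY,
    type integer
) WITH (fillfactor = 70);

-- Or on an existing table; this affects only newly written pages,
-- so a table rewrite is needed for existing pages to gain free space:
ALTER TABLE msg_table SET (fillfactor = 70);
```

Trading disk space for update locality this way only pays off on tables that are updated heavily in place.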
Re: [PERFORM] Re: Postgresql update op is very very slow
On Thu, Jun 26, 2008 at 09:16:25PM +0800, Craig Ringer wrote:
> I think Pg already does in-place updates, or close, if the tuples being replaced aren't referenced by any in-flight transaction. I noticed a while ago that if I'm doing bulk load/update work, if there aren't any other transactions no MVCC bloat seems to occur and updates are faster.

Are you on 8.3? That may be HOT working for you. MVCC doesn't get turned off if there are no other transactions (it can't: what if another transaction starts partway through yours?).

A

-- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [PERFORM] Re: Postgresql update op is very very slow
Holger Hoffstaette wrote:
> I have been following this thread and find some of the recommendations really surprising. I understand that MVCC necessarily creates overhead, in-place updates would not be safe against crashes etc. but have a hard time believing that this is such a huge problem for RDBMS in 2008. How do large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC (maybe a different kind?) as well, but I cannot believe that large updates still pose such big problems. Are there no options (algorithms) for adaptively choosing different update strategies that do not incur the full MVCC overhead?

My opinion: any system that provides cheap UPDATE operations is either not ACID compliant or is not designed for highly concurrent access, possibly both.

By ACID compliant, I mean that both the OLD and NEW versions need to take space on the hard disk in order to guarantee that if a failure occurs in the middle of the transaction, one can select only the OLD versions for future transactions, or if it fails after the end of the transaction, one can select only the NEW versions for future transactions. If both must be on disk, it follows that updates are expensive. Even with Oracle rollback segments, the rollback segments need to be written. Perhaps they will be more sequential and able to be written more efficiently, but the data still needs to be written.

The other option is to make sure that only one person is doing updates at a time, in which case it becomes possible (although not necessarily safe unless one implements the ACID-compliant behaviour described in the previous point) for one operation to complete before the next begins.

The HOT changes introduced recently into PostgreSQL should reduce the cost of updates in many cases (but not all - I imagine that updating ALL rows is still expensive).

There is a third system I can think of, but I think it's more theoretical than practical. That is, remember the list of changes to each row/column and replay them on query. The database isn't ever stored in a built state, but is only kept as pointers that allow any part of the table to be rebuilt on access. The UPDATE statement could be recorded cheaply, but queries against the UPDATE statement might be very expensive. :-)

Cheers, mark

-- Mark Mielke [EMAIL PROTECTED]
Re: [PERFORM] Re: [PERFORM] Postgresql update op is very very slow
jay [EMAIL PROTECTED] writes:
> I know the problem. There are about 35 million rows, which cost about 12G of disk space, and checkpoint segments are set to 64, but the update operation runs in one transaction, which quickly fills up the checkpoint segments and forces frequent checkpoints; checkpoints cost a lot of resources, so the update operation becomes slower and slower, and bgwriter won't write because it's not committed yet. Creating a new table may be a quick solution, but it's not appropriate in some cases. If we could commit every 1000 rows per round, it might resolve the problem.

No, that's utterly unrelated. Transaction boundaries have nothing to do with checkpoints.

regards, tom lane
[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Postgresql update op is very very slow
2008/6/26 Pavan Deolasee [EMAIL PROTECTED]:
> 2008/6/26 jay [EMAIL PROTECTED]:
>> If we could commit every 1000 rows per round, it might resolve the problem. But PG doesn't support transaction control within functions yet?
>
> Yeah, transaction control is not supported inside functions. There are some hacks using dblink to do transactions inside functions. You may want to check that out.

If you need autonomous transactions. For most people, savepoints and exception catching seem to be an acceptable form of transaction control.

> I had suggested another hack in the past for very simplistic updates, when you are sure that the tuple length does not change between updates and you are ready to handle a half-updated table if there is a crash or failure in between. Maybe for your case, where you are updating a single column of the entire table and setting it to some default value for all the rows, it may work fine. But please be aware of the data consistency issues before you try that. And it must be a once-in-a-lifetime kind of hack.
>
> http://postgresql-in.blogspot.com/2008/04/postgresql-in-place-update.html

In a way that's what pg_bulkloader does.
[PERFORM] Re: [PERFORM] Re: [PERFORM] Postgresql update op is very very slow
2008/6/26 Tom Lane [EMAIL PROTECTED]:
> jay [EMAIL PROTECTED] writes:
>> I know the problem. There are about 35 million rows, which cost about 12G of disk space, and checkpoint segments are set to 64, but the update operation runs in one transaction, which quickly fills up the checkpoint segments and forces frequent checkpoints; checkpoints cost a lot of resources, so the update operation becomes slower and slower, and bgwriter won't write because it's not committed yet. Creating a new table may be a quick solution, but it's not appropriate in some cases. If we could commit every 1000 rows per round, it might resolve the problem.
>
> No, that's utterly unrelated. Transaction boundaries have nothing to do with checkpoints.

True. But if you update 1000 rows and vacuum, you can keep the bloat to something reasonable.

On another note, I haven't seen anyone suggest adding the appropriate where clause to keep from updating rows that already match. Cheap compared to updating the whole table even if a large chunk aren't a match, i.e.:

... set col = 0 where col <> 0;

That should be the first thing you reach for in this situation, if it can help.
Re: [PERFORM] Hardware vs Software RAID
On Thu, 26 Jun 2008, Vivek Khera wrote:
>> Anyone have experience with IDE, SATA, or SAS-connected flash devices like the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a transfer rate of 100MB/s, and doesn't degrade much in performance when writing small random blocks. But what's it actually like, and is it reliable?
>
> None of these manufacturers rates these drives for massive amounts of writes. They're sold as suitable for laptop/desktop use, which normally is not a heavy wear and tear operation like a DB. Once they claim suitability for this purpose, be sure that I and a lot of others will dive into it to see how well it really works. Until then, it will just be an expensive brick-making experiment, I'm sure.

It claims an MTBF of 2,000,000 hours, but no further reliability information seems forthcoming. I thought the idea that flash couldn't cope with many writes was no longer true these days?

Matthew

-- I work for an investment bank. I have dealt with code written by stock exchanges. I have seen how the computer systems that store your money are run. If I ever make a fortune, I will store it in gold bullion under my bed. -- Matthew Crosby
Re: [PERFORM] Hardware vs Software RAID
On Thu, Jun 26, 2008 at 10:14 AM, Matthew Wakeling [EMAIL PROTECTED] wrote:
> It claims an MTBF of 2,000,000 hours, but no further reliability information seems forthcoming. I thought the idea that flash couldn't cope with many writes was no longer true these days?

What's mainly happened is that a great increase in storage capacity has allowed flash-based devices to spread their writes out over so many cells that the time it takes to overwrite all the cells enough to get dead ones is measured in much longer intervals. Instead of dying in weeks or months, they'll now die, for most workloads, in years or more.

However, I've tested a few less expensive solid state storage devices, and for some transactional loads they were much faster, but then for things like report queries scanning whole tables they were factors slower than a SW RAID-10 array of just four spinning disks. But pg_bench was quite snappy using the solid state storage for pg_xlog.
Re: [PERFORM] Hardware vs Software RAID
On Thu, Jun 26, 2008 at 9:49 AM, Peter T. Breuer [EMAIL PROTECTED] wrote:
> Also sprach Merlin Moncure:
>> As discussed down thread, software raid still gets benefits of write-back caching on the raid controller...but there are a couple of
>
> (I wish I knew what write-back caching was!)

hardware raid controllers generally have some dedicated memory for caching. the controllers can be configured in one of two modes (the jargon is so common it's almost standard):

write back: raid controller can lie to host o/s. when o/s asks controller to sync, controller can hold data in cache (for a time)

write through: raid controller can not lie. all sync requests must pass through to disk

The thinking is, the BBU on the controller can hold scheduled writes in memory (for a time) and replay them to disk when the server restarts in the event of a power failure. This is a reasonable compromise between data integrity and performance. 'write back' caching provides insane burst IOPS (because you are writing to controller cache) and somewhat improved sustained IOPS because the controller is reorganizing writes on the fly in (hopefully) optimal fashion.

> This imposes a considerable extra resource burden. It's a mystery to me. However the lack of extra buffering is really deliberate (double buffering is a horrible thing in many ways, not least because of the

snip

completely unconvincing. the overhead of various cache layers is completely minute compared to a full fault to disk that requires a seek, which is several orders of magnitude slower. The linux software raid algorithms are highly optimized, and run on a presumably (much faster) cpu than what the controller supports. However, there is still some extra oomph you can get out of letting the raid controller do what the software raid can't...namely delay sync for a time.

merlin
Re: [PERFORM] Hardware vs Software RAID
On Thu, Jun 26, 2008 at 12:14 PM, Matthew Wakeling [EMAIL PROTECTED] wrote:
>> None of these manufacturers rates these drives for massive amounts of writes. They're sold as suitable for laptop/desktop use, which normally is not a heavy wear and tear operation like a DB. Once they claim suitability for this purpose, be sure that I and a lot of others will dive into it to see how well it really works. Until then, it will just be an expensive brick-making experiment, I'm sure.
>
> It claims an MTBF of 2,000,000 hours, but no further reliability information seems forthcoming. I thought the idea that flash couldn't cope with many writes was no longer true these days?

Flash and disks have completely different failure modes, and you can't do apples-to-apples MTBF comparisons. In addition, there are many different types of flash (MLC/SLC), and the flash cells themselves can be organized in particular ways involving various trade-offs. The best flash drives, combined with smart wear leveling, are anecdotally believed to provide lifetimes that are good enough to warrant use in high-duty server environments. The main issue is lousy random write performance that basically makes them useless for any kind of OLTP operation. There are a couple of software (hacks?) out there which may address this problem if the technology doesn't get there first. If the random write problem were solved, a single ssd would provide the equivalent of a stack of 15k disks in a raid 10.

see:
http://www.bigdbahead.com/?p=44
http://feedblog.org/2008/01/30/24-hours-with-an-ssd-and-mysql/

merlin
Re: [PERFORM] Hardware vs Software Raid
Also sprach Merlin Moncure:
> write back: raid controller can lie to host o/s. when o/s asks

This is not what the linux software raid controller does, then. It does not queue requests internally at all, nor ack requests that have not already been acked by the components (modulo the fact that one can deliberately choose to have a slow component not be in sync by allowing write-behind on it, in which case the controller will ack the incoming request after one of the components has been serviced, without waiting for both).

> integrity and performance. 'write back' caching provides insane burst IOPS (because you are writing to controller cache) and somewhat improved sustained IOPS because the controller is reorganizing writes on the fly in (hopefully) optimal fashion.

This is what is provided by the Linux file system and (ordinary) block device driver subsystem. It is deliberately eschewed by the soft raid driver, because any caching will already have been done above and below the driver, either in the FS or in the components.

>> However the lack of extra buffering is really deliberate (double buffering is a horrible thing in many ways, not least because of the

snip

> completely unconvincing.

But true. Therefore the problem in attaining conviction must be at your end. Double buffering just doubles the resources dedicated to a single request, without doing anything for it! It doubles the frequency with which one runs out of resources, it doubles the frequency of the burst limit being reached. It's deadly (deadlockly :) in the situation where the receiving component device also needs resources in order to service the request, such as when the transport is network tcp (and I have my suspicions about scsi too).

> the overhead of various cache layers is completely minute compared to a full fault to disk that requires a seek which is several orders of magnitude slower.

That's absolutely true when by overhead you mean computation cycles, and absolutely false when by overhead you mean memory resources, as I do. Double buffering is a killer.

> The linux software raid algorithms are highly optimized, and run on a

I can confidently tell you that that's balderdash, both as a Linux author and as a software RAID linux author (check the attributions in the kernel source, or look up something like "Raiding the Noosphere" on google).

> presumably (much faster) cpu than what the controller supports. However, there is still some extra oomph you can get out of letting the raid controller do what the software raid can't...namely delay sync for a time.

There are several design problems left in software raid in the linux kernel. One of them is the need for extra memory to dispatch requests with and as (i.e. buffer heads and buffers, both). bhs should be OK since the small cache per device won't be exceeded while the raid driver itself serialises requests, which is essentially the case (it does not do any buffering, queuing, whatever ... and tries hard to avoid doing so). The need for extra buffers for the data is a problem. On different platforms different aspects of that problem are important (would you believe that on ARM mere copying takes so much cpu time that one wants to avoid it at all costs, whereas on intel it's a forgettable trivium).

I also wouldn't absolutely swear that request ordering is maintained under ordinary circumstances. But of course we try.

Peter
Re: [PERFORM] Hardware vs Software Raid
On Thu, Jun 26, 2008 at 1:03 AM, Peter T. Breuer [EMAIL PROTECTED] wrote:
> Also sprach Merlin Moncure:
>> write back: raid controller can lie to host o/s. when o/s asks
>
> This is not what the linux software raid controller does, then. It
> does not queue requests internally at all, nor ack requests that have
> not already been acked by the components (modulo the fact that one can
> deliberately choose to have a slow component not be sync by allowing
> write-behind on it, in which case the controller will ack the incoming
> request after one of the components has been serviced, without waiting
> for both).
>
>> integrity and performance. 'write back' caching provides insane burst
>> IOPS (because you are writing to controller cache) and somewhat
>> improved sustained IOPS because the controller is reorganizing writes
>> on the fly in (hopefully) optimal fashion.
>
> This is what is provided by Linux file system and (ordinary) block
> device driver subsystem. It is deliberately eschewed by the soft raid
> driver, because any caching will already have been done above and
> below the driver, either in the FS or in the components.
>
>>> However the lack of extra buffering is really deliberate (double
>>> buffering is a horrible thing in many ways, not least because of the
>> snip
>> completely unconvincing.
>
> But true. Therefore the problem in attaining conviction must be at
> your end. Double buffering just doubles the resources dedicated to a
> single request, without doing anything for it! It doubles the
> frequency with which one runs out of resources, it doubles the
> frequency of the burst limit being reached. It's deadly (deadlockly :)
> in the situation where

Only if those resources are drawn from the same pool. You are
oversimplifying a calculation that has many variables such as cost.
CPUs, for example, are introducing more cache levels (l1, l2, l3), etc.
Also, the different levels of cache have different capabilities. Only
the hardware controller cache is (optionally) allowed to delay
acknowledgment of a sync.

In postgresql terms, we get roughly the same effect with the computer's
entire working memory with fsync disabled... so that we are trusting,
rightly or wrongly, that all writes will eventually make it to disk. In
this case, the raid controller cache is redundant and marginally useful.

> the receiving component device also needs resources in order to
> service the request, such as when the transport is network tcp (and I
> have my suspicions about scsi too).
>
>> the overhead of various cache layers is completely minute compared to
>> a full fault to disk that requires a seek which is several orders of
>> magnitude slower.
>
> That's absolutely true when by overhead you mean computation cycles,
> and absolutely false when by overhead you mean memory resources, as I
> do. Double buffering is a killer.

Double buffering is most certainly _not_ a killer (or at least, _the_
killer) in practical terms. Most database systems that do any amount of
writing (that is, interesting databases) are bound by the ability to
randomly read and write to the storage medium, and only that. This is
why raid controllers come with a relatively small amount of cache...
there are diminishing returns from reorganizing writes. This is also why
up and coming storage technologies (like flash) are so interesting. Disk
drives have made only marginal improvements in speed since the early
80's.

>> The linux software raid algorithms are highly optimized, and run on a
>
> I can confidently tell you that that's balderdash both as a Linux
> author

I'm just saying here that there is little/no cpu overhead for using
software raid on modern hardware.

> believe that on ARM mere copying takes so much cpu time that one wants
> to avoid it at all costs, whereas on intel it's a forgettable trivium).

This is a database list. The main area of interest is in dealing with
server class hardware.
merlin
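Merlin's fsync analogy above can be seen directly in the server's settings. A minimal sketch (illustration only; never run with fsync off where you care about the data):

```sql
-- With fsync = off, PostgreSQL no longer forces WAL to stable storage
-- at commit, so the OS page cache behaves like one giant write-back
-- cache: commits are acknowledged before they are durable, and a crash
-- can lose or corrupt acknowledged transactions.
SHOW fsync;   -- 'on' by default

-- The unsafe speed-over-durability choice lives in postgresql.conf:
--   fsync = off
```

A battery-backed controller cache gives a slice of the same burst behavior while keeping the durability guarantee, which is why it is the usual recommendation instead.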
Re: [PERFORM] Hardware vs Software Raid
On Thu, 26 Jun 2008, Peter T. Breuer wrote:
> Also sprach Merlin Moncure:
>> The linux software raid algorithms are highly optimized, and run on a
>
> I can confidently tell you that that's balderdash both as a Linux
> author and as a software RAID linux author (check the attributions in
> the kernel source, or look up something like "Raiding the Noosphere"
> on google).
>
>> presumably (much faster) cpu than what the controller supports.
>> However, there is still some extra oomph you can get out of letting
>> the raid controller do what the software raid can't... namely delay
>> sync for a time.
>
> There are several design problems left in software raid in the linux
> kernel. One of them is the need for extra memory to dispatch requests
> with and as (i.e. buffer heads and buffers, both). bhs should be OK
> since the small cache per device won't be exceeded while the raid
> driver itself serialises requests, which is essentially the case (it
> does not do any buffering, queuing, whatever... and tries hard to
> avoid doing so). The need for extra buffers for the data is a problem.
> On different platforms different aspects of that problem are important
> (would you believe that on ARM mere copying takes so much cpu time
> that one wants to avoid it at all costs, whereas on intel it's a
> forgettable trivium). I also wouldn't absolutely swear that request
> ordering is maintained under ordinary circumstances.

Which flavor of linux raid are you talking about? (The two main
families I am aware of are the md and dm ones.)

David Lang
[PERFORM] Federated Postgresql architecture ?
Hi List;

Anyone have any experiences to share per setting up a federated
architecture with PostgreSQL? I wonder if the dblink contrib works well
in a federated scenario, specifically in the setup of the federated
views which equate to a "select * from the same table" on each
federated server?

Thanks in advance...

/Kevin
Re: [PERFORM] Federated Postgresql architecture ?
On Thu, Jun 26, 2008 at 4:33 PM, kevin kempter [EMAIL PROTECTED] wrote:
> Anyone have any experiences to share per setting up a federated
> architecture with PostgreSQL? I wonder if the dblink contrib works
> well in a federated scenario, specifically in the setup of the
> federated views which equate to a select * from the same table on each
> federated server?

Because Postgres currently lacks the ability to push down predicates to
individual nodes over a database link, you have to spend a good amount
of time writing PL set-returning functions capable of adding appropriate
WHERE clauses to queries sent over the link. There are other things you
can do, but it's mostly hackery at this point in time. IIRC, David
Fetter is trying to get some of the required predicate information
exposed for use in DBI-Link.

Not to self-plug, but if you require it, EnterpriseDB includes
Oracle-style database links (SELECT col FROM [EMAIL PROTECTED]) which
support predicate push-down.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | [EMAIL PROTECTED]
Edison, NJ 08837                        | http://www.enterprisedb.com/
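For reference, a minimal dblink-based federated view of the kind Kevin describes might look like the sketch below (connection strings and the table layout are made up for illustration, and it needs the dblink contrib module installed). It also shows the push-down limitation concretely: a WHERE clause on the view is applied locally, after every node has shipped its full result set.

```sql
-- Hypothetical two-node federated view over identical 'orders' tables.
CREATE VIEW federated_orders AS
    SELECT * FROM dblink('host=node1 dbname=shop',
                         'SELECT order_id, customer_id, total FROM orders')
        AS t(order_id int, customer_id int, total numeric)
    UNION ALL
    SELECT * FROM dblink('host=node2 dbname=shop',
                         'SELECT order_id, customer_id, total FROM orders')
        AS t(order_id int, customer_id int, total numeric);

-- Both remote servers run their full SELECT; this filter is evaluated
-- only after all rows have arrived locally -- nothing is pushed down:
SELECT * FROM federated_orders WHERE customer_id = 42;
```

This is why the set-returning-function workaround Jonah mentions exists: the function can splice the caller's predicate into the query text before sending it over the link.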
Re: [PERFORM] Federated Postgresql architecture ?
On Thu, Jun 26, 2008 at 5:41 PM, Josh Berkus [EMAIL PROTECTED] wrote:
>> Not to self-plug, but if you require it, EnterpriseDB includes
>> Oracle-style database links (SELECT col FROM [EMAIL PROTECTED]) which
>> support predicate push-down.
>
> Also check out Skytools: http://skytools.projects.postgresql.org/doc/

Hmm, I didn't think the Skype tools could really provide federated
database functionality without a good amount of custom work. Or, am I
mistaken?

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | [EMAIL PROTECTED]
Edison, NJ 08837                        | http://www.enterprisedb.com/
Re: [PERFORM] ??: Postgresql update op is very very slow
On Thu, 26 Jun 2008, Holger Hoffstaette wrote:
> How do large databases treat mass updates? AFAIK both DB2 and Oracle
> use MVCC (maybe a different kind?) as well

An intro to the other approaches used by Oracle and DB2 (not MVCC) is at
http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Transaction_Locking_and_Scalability
(a URL which I really need to shorten one day).

> Are there no options (algorithms) for adaptively choosing different
> update strategies that do not incur the full MVCC overhead?

If you stare at the big picture of PostgreSQL's design, you might notice
that it usually aims to do things one way and get that implementation
right for the database's intended audience. That intended audience cares
about data integrity and correctness and is willing to suffer the
overhead that goes along with operating that way. There are few "I don't
care about reliability here so long as it's fast" switches you can flip,
and not having duplicate code paths to support them helps keep the code
simpler and therefore more reliable.

This whole area is one of those good/fast/cheap trios. If you want good
transaction guarantees on updates, you either get the hardware and
settings right to handle that (!cheap), or it's slow. The idea of
providing a !good/fast/cheap option for updates might have some
theoretical value, but I think you'd find it hard to get enough support
for that idea to get work done on it compared to the other things
developer time is being spent on right now.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
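To make the MVCC overhead in question concrete: under PostgreSQL's MVCC, an UPDATE writes a complete new row version and leaves the old one behind as a dead tuple for VACUUM to reclaim, so a mass update roughly doubles the table's I/O footprint. A small illustration (the table name is made up):

```sql
-- Every row touched here gets a brand-new tuple; the old versions stay
-- on disk as dead tuples until VACUUM reclaims the space.
UPDATE accounts SET balance = balance * 1.01;

-- Reports how many dead row versions the mass update left behind:
VACUUM VERBOSE accounts;
```

This is the cost that buys readers a consistent snapshot without blocking writers, which is the trade-off Greg is defending.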
Re: [PERFORM] Federated Postgresql architecture ?
Jonah,

> Hmm, I didn't think the Skype tools could really provide federated
> database functionality without a good amount of custom work. Or, am I
> mistaken?

Sure, what do you think pl/proxy is for?

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
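For readers following along, a pl/proxy remote call is declared as a function body on the proxy node; the cluster name, argument, and table behind it below are hypothetical. The application calls the function and pl/proxy routes it to one partition:

```sql
-- Runs on the proxy database; pl/proxy forwards the call to the
-- partition selected by hashing the argument, where a function of the
-- same name is expected to exist.
CREATE FUNCTION get_user_email(i_username text)
RETURNS text AS $$
    CLUSTER 'usercluster';
    RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
```

This also illustrates Jonah's objection: the application must be written against such functions, rather than issuing arbitrary distributed SQL.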
Re: [PERFORM] Hardware vs Software Raid
On Thu, 26 Jun 2008, Peter T. Breuer wrote:
> Double buffering is a killer.

No, it isn't; it's a completely trivial bit of overhead. It only exists
during the time when blocks are queued to write but haven't been written
yet. On any database system, in those cases I/O congestion at the disk
level (probably things backed up behind seeks) is going to block writes
way before the memory used or the bit of CPU time making the extra copy
becomes a factor on anything but minimal platforms.

You seem to know quite a bit about the RAID implementation, but you are
a) extrapolating from that knowledge into areas of database performance
you need to spend some more time researching first and b) extrapolating
based on results from trivial hardware, relative to what the average
person on this list is running a database server on in 2008. The weakest
platform I deploy PostgreSQL on and consider relevant today has two
cores and 2GB of RAM, for a single-user development system that only has
to handle a small amount of data relative to what the real servers
handle. If you note the kind of hardware people ask about here, that's
pretty typical.

You have some theories here; Merlin and I have positions that come from
running benchmarks, and watching theories suffer a brutal smack-down
from the real world is one of those things that happens every day. There
is absolutely some overhead from paths through the Linux software RAID
that consume resources. But you can't even measure that in
database-oriented comparisons against hardware setups that don't use
those resources, which means that for practical purposes the overhead
doesn't exist in this context.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Federated Postgresql architecture ?
On Thu, Jun 26, 2008 at 6:31 PM, Josh Berkus [EMAIL PROTECTED] wrote:
> Sure, what do you think pl/proxy is for?

Well, considering that an application must be written specifically to
make use of it, and for very specific scenarios, I wouldn't consider it
as making PostgreSQL a federated database. The pl/proxy architecture
certainly doesn't resemble "federated" in the sense of the other
database vendors.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | [EMAIL PROTECTED]
Edison, NJ 08837                        | http://www.enterprisedb.com/
Re: [PERFORM] Hardware vs Software RAID
On Wednesday 25 June 2008 11:24:23 Greg Smith wrote:
> What I often do is get a hardware RAID controller, just to accelerate
> disk writes, but configure it in JBOD mode and use Linux or other
> software RAID on that platform.

JBOD + RAIDZ2 FTW ;-)

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Hardware suggestions for high performance 8.3
On Thu, Jun 26, 2008 at 10:47 PM, Greg Smith [EMAIL PROTECTED] wrote:
> On Thu, 26 Jun 2008, Henrik wrote:
>> I've seen some concerns about buying database performance hardware
>> from DELL. Are there at least some of the RAID cards that work well
>> with Linux or should I stay clear of DELL permanently?
>
> People seem to be doing OK if the RAID card is their Perc/6i, which
> has an LSI Logic MegaRAID SAS 1078 chipset under the hood. There's
> some helpful benchmark results and follow-up messages related to one
> of those at
> http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

Yeah, the problems I've had have been with the internal RAID (Perc 5?)
LSI-based controllers. They kick their drives offline. Dell has a
firmware update, but we haven't had a chance to install it just yet to
see if it fixes the problem with that one.

> That said, I consider the rebranded LSI cards a pain and hate the
> quality of Dell's hardware.

Yeah, I'd just as soon get a regular LSI BIOS as the remade one Dell
seems intent on pushing. Also, we just discovered the broadcom chipsets
we have in our Dell 1950s and 1850s will not negotiate to gigabit with
our Nortel switches. Everything else I've plugged in just worked. Went
looking at Dell's site, and for the 1950 they recommend buying a dual
port Intel NIC for it. Why couldn't they just build in better NICs to
start?