On Wed, 2006-11-08 at 10:59 -0700, Daniel C. wrote: > Is it possible to get more information from your friend about this?
I think the information posted in response to my post probably will tell you the technical details. Here is what he had to say to me: (10:16:25) Mark Lewis: PostgreSQL is nirvana, as far as development is concerned. Oracle is almost as pleasant to develop with, and has more enterprise features to boot, so it's on my 'A' list too. SQLServer and MySQL seem more or less on the same level; they're based on an outmoded and difficult to work with underlying data model, which prevents them from really behaving as nicely as PG and Oracle, but they're at least decent implementations of a flawed idea (although SQLServer has some absurdly annoying documentation where MS tries to repeatedly convince you that you don't really want all those pesky ACID features that are so difficult to implement with their code base). Then there's DB2. Based on the same old model as SQLServer and MySQL, it's a dinosaur. It hasn't substantially changed since hard disks were invented. Working with it is like chipping fossils out of the rock. (10:17:36) Mark Lewis: Sorry, didn't really mean to go into a tirade there, just venting some frustration because I need to pick up some DB2 work again today. (10:34:16) Mark Lewis: Anyway, PG and Oracle are interesting because although they both implement the same basic idea (keeping multiple versions of the same row around), they come at it from two different angles. Oracle always keeps only the most recent version of the row in its main tablespace, and older versions get copied into the redo logs. PostgreSQL keeps everything in the main table and differentiates each row with a version id. (10:36:35) torriem: interesting (10:37:09) Mark Lewis: This explains why you need to periodically vacuum PG databases, to get rid of the old versions, but with Oracle you just need to back up or throw away the old redo logs once you don't need them anymore. On the other hand, rollbacks in PG are an O(1) operation, it just invalidates the transaction ID that modified a row. With Oracle, it takes as long to rollback a transaction as it took to do it in the first place, because everything needs to be copied back from the redo logs. (10:38:35) torriem: true (10:38:57) torriem: In my experience, though, on huge tables, vacuuming takes a long time (4GB table) and during that time the table is read-only (10:39:14) Mark Lewis: This is also why you'll almost certainly never see flashback queries in any database besides Oracle; the lock-based DB's by definition only store the most recent version of the row in a usable format, and when PG vacuums it discards all inactive rows, so you'd have holes in your history. (10:39:37) Mark Lewis: In newer versions of PG, vacuum doesn't lock the table. (10:39:43) torriem: nice (10:39:48) torriem: I'll have to move to 8.x soon then (10:40:33) Mark Lewis: They've also added the autovacuum daemon into the core, so it'll be smart enough to run vacuums exactly as often as you need them. (10:41:01) Mark Lewis: Although I still think you need to turn autovacuum on, it doesn't run by default. (10:43:20) torriem: good. that's at the expense of rollback history, though, right? (10:43:29) torriem: (forgive me; my database experience and knowledge is not too great) (10:44:52) Mark Lewis: Well, not really. Vacuum will never remove anything that could still be rolled back. It will only remove rows which have definitely been outdated by committed transactions. (10:45:16) Mark Lewis: But yes in the sense that vacuuming is why flashback queries won't work in PG. (10:47:22) Mark Lewis: 8.0 introduced the WAL, which is nice from an administrative perspective (point-in-time recovery, much easier online backups), but also significantly improved write performance, especially for lots of small transactions. They also did a good amount of performance tuning on the hotspots, resulting in respectable across-the-board query improvements. (10:48:38) Mark Lewis: 8.1 is really nice especially if you have complicated queries with multiple 'and' terms, because it introduces bitmap index scans which can make it much faster to use the intersection of two indexes to answer a query. (10:49:50) Mark Lewis: They did some of the work of making vacuum less intrusive in 7.4, but most of it was in the 8.0 release. (10:51:43) Mark Lewis: Oh, and with 8.0 they got rid of the big shared memory allocator lock, so performance increases significantly when you have lots of memory and multiple processors. (10:53:44) torriem: sorry about that. had to go away for a moment (10:54:20) Mark Lewis: No prob. I probably gave you more than you really wanted to know :) (10:54:32) torriem: no I appreciate what you've told me (10:55:05) torriem: I definitely need to roll my database server over to 8.0. It's still 7.3. Of course we don't do anything heavy, but I do have one application (the pr0n url sniffer) that has 4 GB of data) (10:55:23) torriem: Seems to me that with 8.0 I shouldn't really need mysql for much. (10:55:47) torriem: Although mysql's auto-incrementing indexes are nice. With PG I have to use a sequence and a default function (10:57:06) Mark Lewis: You can use the serial data type in PG, which makes a PG column look like an auto-increment column in MySQL by creating a sequence behind the scenes for you. (10:57:51) Mark Lewis: http://www.postgresql.org/docs/8.1/interactive/datatype.html#DATATYPE-SERIAL (10:59:15) torriem: nice! (11:02:25) Mark Lewis: For non-concurrent applications, MySQL still has a performance advantage especially with trivial inserts and selects, although the difference is much smaller than it used to be. So there's still a place for MySQL. Although I wouldn't touch it for other reasons, most notably that it has a bad habit of automatically guessing what you really meant to do and doing something bad without giving you an error. (11:03:28) Mark Lewis: And if you actually need referential integrity then you need to use InnoDB tables which aren't any faster than PG. > > > Well it's likely that PHP and Django don't use any advanced DB features > > like triggers, constraints, or stored procedures, so it really doesn't > > matter in the least what db you pick, as long as the load is manageable. > > PHP can, Django doesn't have it (except foreign key constraints) > natively. But since we'll be running reports and things from outside > of Django, those things do come into play. > > Dan > > /* > PLUG: http://plug.org, #utah on irc.freenode.net > Unsubscribe: http://plug.org/mailman/options/plug > Don't fear the penguin. > */ > /* PLUG: http://plug.org, #utah on irc.freenode.net Unsubscribe: http://plug.org/mailman/options/plug Don't fear the penguin. */
