Re: [PERFORM] Update table performance
Erik Jones wrote:
> Decibel! wrote:
>> I should mention that if you can handle splitting the update into
>> multiple transactions, that will help a lot since it means you won't
>> be doubling the size of the table.
>
> As I mentioned above, when you do an update you're actually inserting a
> new row and deleting the old one. That deleted row is still considered
> part of the table (for reasons of concurrency, read up on the
> concurrency chapter in the manual for the details) and, once it is no
> longer visible to any live transactions, can be re-used by future
> inserts. So, if you update one column on every row of a one million row
> table all at once, you have to allocate and write out one million new
> rows. But, if you do the update a quarter million at a time, the last
> three updates would be able to re-use many of the rows deleted in
> earlier updates.

Only if you vacuum between the updates.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM] Update table performance
On Aug 8, 2007, at 3:00 AM, Heikki Linnakangas wrote:
> Erik Jones wrote:
>> Decibel! wrote:
>>> I should mention that if you can handle splitting the update into
>>> multiple transactions, that will help a lot since it means you won't
>>> be doubling the size of the table.
>>
>> As I mentioned above, when you do an update you're actually inserting
>> a new row and deleting the old one. That deleted row is still
>> considered part of the table (for reasons of concurrency, read up on
>> the concurrency chapter in the manual for the details) and, once it
>> is no longer visible to any live transactions, can be re-used by
>> future inserts. So, if you update one column on every row of a one
>> million row table all at once, you have to allocate and write out one
>> million new rows. But, if you do the update a quarter million at a
>> time, the last three updates would be able to re-use many of the rows
>> deleted in earlier updates.
>
> Only if you vacuum between the updates.

This is true. In fact, the chapter on Routine Database Maintenance
Tasks that discusses vacuuming explains all of this.

Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
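A rough sketch of what batching with vacuums in between might look like
(table, column, and key ranges invented for illustration):

    -- one big rewrite doubles the heap:
    -- UPDATE foo SET bar = bar + 1;

    -- batched version: each VACUUM lets the next batch re-use dead rows
    UPDATE foo SET bar = bar + 1 WHERE id BETWEEN 1 AND 250000;
    VACUUM foo;
    UPDATE foo SET bar = bar + 1 WHERE id BETWEEN 250001 AND 500000;
    VACUUM foo;
    UPDATE foo SET bar = bar + 1 WHERE id BETWEEN 500001 AND 750000;
    VACUUM foo;
    UPDATE foo SET bar = bar + 1 WHERE id BETWEEN 750001 AND 1000000;
    VACUUM foo;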
Re: [PERFORM] Update table performance
On Tue, Aug 07, 2007 at 08:46:20PM -0500, Erik Jones wrote:
> Vertical partitioning is where you split up your table on disk by
> columns, i.e. on the vertical lines. He quoted it because Postgres
> doesn't actually support it transparently, but you can always fake it
> by splitting up your table. For example, given the following table
> wherein column bar gets updated a lot but the others don't:
>
> create table foo (
>     id  int not null,
>     bar int,
>     baz int,
>
>     primary key (id)
> );
>
> You could split it up like so:
>
> create table foo_a (
>     id  int,
>     baz int,
>
>     primary key (id)
> );
>
> create table foo_b (
>     foo_id int,
>     bar    int,
>
>     constraint foo_a_id foreign key (foo_id) references foo_a (id)
> );

FWIW, the cases where I've actually used this have been on much wider
tables where a number of the attributes are infrequently accessed. An
example would be if you keep snail-mail address info for users; you
probably don't use those fields very often, so they would be good
candidates for going into a second table.

When does it actually make sense to use this? When you do a *lot* with
a small number of fields in the table. In this example, perhaps you
very frequently need to look up either user_name or user_id, probably
via joins. Having a table with just name, id, perhaps password and a
few other fields might add up to 50 bytes per row (with overhead),
while the address information by itself could easily be 50 bytes. So by
pushing that out to another table, you cut the size of the main table
in half. That means more efficient use of cache, faster seqscans, etc.

The case Erik is describing is more unique to PostgreSQL and how it
handles MVCC. In some cases, splitting a frequently updated column out
to a separate table might not gain as much once we get HOT, but it's
still a good tool to consider.

Depending on what you're doing, another useful technique is to not
update the field as often, by logging updates to be performed into a
separate table and periodically processing that information into the
main table.

--
Decibel!, aka Jim Nasby                        [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
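A rough sketch of that user/address split (table and column names
invented for illustration):

    -- narrow, "hot" table: the fields you hit constantly
    create table users (
        user_id   serial primary key,
        user_name text not null,
        password  text
    );

    -- rarely-read attributes pushed out to a second table
    create table user_addresses (
        user_id int not null references users (user_id),
        street  text,
        city    text,
        state   text,
        postal  text
    );

    -- the common lookups only ever touch the narrow table, e.g.:
    -- select user_id from users where user_name = 'alice';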
Re: [PERFORM] When/if to Reindex
On Jul 18, 2007, at 1:08 PM, Steven Flatt wrote:
> Some background: we make extensive use of partitioned tables. In
> fact, I'm really only considering reindexing partitions that have
> "just closed". In our simplest/most general case, we have a table
> partitioned by a timestamp column, each partition 24 hours wide. The
> partition will have an index on the timestamp column as well as a few
> other indexes, including a primary key.

If all you ever did was insert into that table, then you probably
don't need to reindex. If you did mass updates/deletes mixed with
your inserts, then perhaps you do.

Do some experiments comparing pg_class.relpages for your table and
its indexes before and after a reindex. Decide if the number of
pages you save on the index is worth the trouble. If it shaves off
just a handful of pages, I'd vote no...
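A rough sketch of that relpages check (the partition name is just an
example; run VACUUM or ANALYZE first so relpages is up to date):

    SELECT c.relname, c.relpages
    FROM pg_class c
    WHERE c.oid = 'foo_20070808'::regclass
       OR c.oid IN (SELECT indexrelid
                    FROM pg_index
                    WHERE indrelid = 'foo_20070808'::regclass);

    -- note the index relpages, REINDEX TABLE foo_20070808,
    -- then re-run the query and compare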
Re: [PERFORM] When/if to Reindex
On 8/8/07, Vivek Khera <[EMAIL PROTECTED]> wrote:
>
> If all you ever did was insert into that table, then you probably
> don't need to reindex. If you did mass updates/deletes mixed with
> your inserts, then perhaps you do.
>
> Do some experiments comparing pg_class.relpages for your table and
> its indexes before and after a reindex. Decide if the number of
> pages you save on the index is worth the trouble. If it shaves off
> just a handful of pages, I'd vote no...

What's interesting is that an insert-only table can benefit
significantly from reindexing after the table is fully loaded. I had
done experiments exactly as you suggest (looking at pg_class.relpages)
and determined that reindexing results in about a 30% space savings for
all indexes except the PK index. The PK index (integer based on a
sequence) does not benefit at all. By setting fillfactor=100 on the
index prior to reindexing, I get another 10% space savings on all the
indexes.

Not to mention the general performance improvements when reading from
the table...

So, we decided that reindexing partitions after they're fully loaded
*was* worth it.

Steve
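The fillfactor-then-reindex step on a just-closed partition might look
something like this (partition and index names invented; btree indexes
default to fillfactor 90):

    ALTER INDEX foo_20070808_ts_idx SET (fillfactor = 100);
    ALTER INDEX foo_20070808_pkey   SET (fillfactor = 100);
    REINDEX TABLE foo_20070808;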
Re: [PERFORM] When/if to Reindex
In response to "Steven Flatt" <[EMAIL PROTECTED]>: > On 8/8/07, Vivek Khera <[EMAIL PROTECTED]> wrote: > > > > If all you ever did was insert into that table, then you probably > > don't need to reindex. If you did mass updates/deletes mixed with > > your inserts, then perhaps you do. > > > > Do some experiments comparing pg_class.relpages for your table and > > its indexes before and after a reindex. Decide if the number of > > pages you save on the index is worth the trouble. If it shaves off > > just a handful of pages, I'd vote no... > > > What's interesting is that an insert-only table can benefit significantly > from reindexing after the table is fully loaded. I had done experiments > exactly as you suggest (looking at pg_class.relpages), and determined that > reindexing results in about a 30% space savings for all indexes except the > PK index. The PK index (integer based on a sequence) does not benefit at > all. By setting fillfactor=100 on the index prior to reindexing, I get > another 10% space savings on all the indexes. > > Not to mention the general performance improvements when reading from the > table... > > So, we decided that reindexing partitions after they're fully loaded *was* > worth it. I've had similar experience. One thing you didn't mention that I've noticed is that VACUUM FULL often bloats indexes. I've made it SOP that after application upgrades (which usually includes lots of ALTER TABLES and other massive schema and data changes) I VACUUM FULL and REINDEX (in that order). Lots of ALTER TABLEs seem to bloat the database size considerably, beyond what normal VACUUM seems to fix. A FULL seems to fix that, but it appears to bloat the indexes, thus a REINDEX helps. I would expect that setting fillfactor to 100 will encourage indexs to bloat faster, and would only be recommended if you didn't expect the index contents to change? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] How to ENABLE SQL capturing???
Hello all,

I am trying to enable capturing of the SQL submitted by an
application... how do I do this in Postgres? Performance is SLOW on my
server and I have autovacuum enabled as well as rebuilt
indexes... what else should be looked at?

Thanks... Michelle
Re: [PERFORM] How to ENABLE SQL capturing???
On Wed, Aug 08, 2007 at 01:02:24PM -0700, smiley2211 wrote:
> I am trying to enable capturing of the SQL submitted by an
> application... how do I do this in Postgres? Performance is SLOW on my
> server and I have autovacuum enabled as well as rebuilt
> indexes... what else should be looked at?

Try "log_min_duration_statement = 100" in postgresql.conf; it will show
all statements that take more than 100 ms. Set it to 0 to log _all_
statements, or to -1 to turn the logging back off.

/* Steinar */
--
Homepage: http://www.sesse.net/
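A minimal sketch of the change and reload (the reload can also be done
with "pg_ctl reload"):

    # postgresql.conf
    log_min_duration_statement = 100    # log statements slower than 100 ms

    -- then, from psql as a superuser:
    SELECT pg_reload_conf();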
Re: [PERFORM] When/if to Reindex
On Wed, Aug 08, 2007 at 03:27:57PM -0400, Bill Moran wrote:
> I've had similar experience. One thing you didn't mention that I've
> noticed is that VACUUM FULL often bloats indexes. I've made it SOP
> that after application upgrades (which usually include lots of ALTER
> TABLEs and other massive schema and data changes) I VACUUM FULL and
> REINDEX (in that order).

You'd be better off with a CLUSTER in that case. It'll be faster, and
you'll ensure that the table has optimal ordering.

> Lots of ALTER TABLEs seem to bloat the database size considerably,
> beyond what normal VACUUM seems to fix. A FULL seems to fix that, but
> it appears to bloat the indexes, thus a REINDEX helps.

Hrm, are you sure that's still true? I just did an ALTER TABLE ... TYPE
and it created a new file, meaning no bloating.

> I would expect that setting fillfactor to 100 will encourage indexes
> to bloat faster, and would only be recommended if you didn't expect
> the index contents to change?

Yes.

--
Decibel!, aka Jim Nasby                        [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
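A sketch of the CLUSTER alternative (8.2-era syntax; table and index
names are examples):

    -- rewrites the heap in pkey order and rebuilds every index on foo
    CLUSTER foo_pkey ON foo;
    ANALYZE foo;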
Re: [PERFORM] How to ENABLE SQL capturing???
We currently have logging enabled for all queries over 100ms, and keep
the last 24 hours of logs before we rotate them. I've found this tool
very helpful in diagnosing new performance problems that crop up:

http://pgfouine.projects.postgresql.org/

Bryan

On 8/8/07, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote:
> On Wed, Aug 08, 2007 at 01:02:24PM -0700, smiley2211 wrote:
> > I am trying to enable capturing of the SQL submitted by an
> > application... how do I do this in Postgres? Performance is SLOW on
> > my server and I have autovacuum enabled as well as rebuilt
> > indexes... what else should be looked at?
>
> Try "log_min_duration_statement = 100" in postgresql.conf; it will
> show all statements that take more than 100 ms. Set it to 0 to log
> _all_ statements, or to -1 to turn the logging back off.
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
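For reference, a rough sketch of logging settings that keep about the
last 24 hours as one file per hour (parameter names as in 8.3;
logging_collector was called redirect_stderr before 8.3):

    # postgresql.conf
    log_min_duration_statement = 100
    logging_collector = on
    log_directory = 'pg_log'
    log_filename = 'postgresql-%H.log'   # one file per hour of the day
    log_rotation_age = 60                # rotate every 60 minutes
    log_truncate_on_rotation = on        # overwrite that hour's file the next day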
Re: [PERFORM] When/if to Reindex
Bill Moran <[EMAIL PROTECTED]> writes:
> In response to "Steven Flatt" <[EMAIL PROTECTED]>:
>> What's interesting is that an insert-only table can benefit
>> significantly from reindexing after the table is fully loaded.
>
> I've had similar experience. One thing you didn't mention that I've
> noticed is that VACUUM FULL often bloats indexes. I've made it SOP
> that after application upgrades (which usually include lots of ALTER
> TABLEs and other massive schema and data changes) I VACUUM FULL and
> REINDEX (in that order).

Actually, if that is your intent then the best plan is: drop indexes,
VACUUM FULL, create indexes from scratch. A huge proportion of VACUUM
FULL's time goes into updating the indexes, and that work is basically
wasted if you are going to reindex afterwards.

CLUSTER is a good substitute for V.F. partly because it doesn't try to
update the indexes incrementally, but just does the equivalent of
REINDEX after it's reordered the heap.

I'd make the same remark about Steven's case: if possible, don't create
the indexes at all until you've loaded the table fully.

regards, tom lane
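That ordering, sketched out (table, column, and index names invented for
illustration):

    DROP INDEX foo_bar_idx;
    ALTER TABLE foo DROP CONSTRAINT foo_pkey;   -- drops the PK index as well
    VACUUM FULL foo;                            -- no index maintenance to pay for
    ALTER TABLE foo ADD PRIMARY KEY (id);
    CREATE INDEX foo_bar_idx ON foo (bar);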
[PERFORM] mid 2007 "best bang for the buck" hardware opinions
I saw an interesting topic in the archives on the best bang for the
buck for $20k... about a year old now.

So what are the thoughts on a current combined rack/disks/cpu combo
around the $10k-$15k point?

I can configure a Dell PowerEdge 2900 for $9k, but am wondering if I'm
missing out on something better. ($9k spent with Mr. Dell gets you 2x
quad core Xeon X535s, so that's 8 cores total, with 24GB of memory and
8x 146GB 15k serial attached SCSI disks in RAID 10, attached to a PERC
5/i, with redundant power supplies, all in what looks like a 3U
chassis.) But the system is pretty maxed out like that, with no room
for future expansion.

Better options? Or a better balance for a pure DB box that is mostly
smaller reads and large indexes? This would go into a rack with
existing but older equipment that can be warm standby, so I don't have
to split the cost here for getting redundancy.

Thanks for any 2007 advice!
Re: [PERFORM] mid 2007 "best bang for the buck" hardware opinions
On 8/8/07, justin <[EMAIL PROTECTED]> wrote:
> I saw an interesting topic in the archives on best bang for the buck
> for $20k... about a year old now.
>
> So whats the thoughts on a current combined rack/disks/cpu combo
> around the $10k-$15k point, currently?
>
> I can configure up a Dell poweredge 2900 for $9k, but am wondering if
> I'm missing out on something better.
> ($9k spent with mr dell gets you 2x quad core xeon X535s, so thats 8
> cores total, with 24gb of memory and 8x 146gb 15k serial attached
> scsi connected, raid10, attached to a perc 5/i with redundant power
> supplies, all in what looks like a 3u chassis). But the system is
> pretty maxxed out like that, no room for future expansion.
>
> better options? or a better balance for a pure db box that is mostly
> smaller reads and large indexes?

That's not much kit for $20k. I went to www.aberdeeninc.com and spec'd
out a box with 24 750G Barracudas, battery-backed cache RAID, dual
quad-core 2.66GHz Xeons, and 16 gigs of RAM for $15k. Another 16 gigs
of RAM would probably put it around $18k or so, but they didn't have
the config option enabled there.

I consider Dells to be mediocre hardware with mediocre support. There
are lots of medium-sized integration shops out there that make better
servers with lots more storage for a lot less.
Re: [PERFORM] mid 2007 "best bang for the buck" hardware opinions
No, it wouldn't be much kit for $20k, but that example is currently
coming in at $9k (the $20k referred to was last year's topic). I think
I can spend up to $15k, but it would have to be clearly
faster/better/more expandable than this config. Or I can spend $9k with
someone else if I can convince myself it is just a better option.

I have no doubt that there are better options out there; it's why I
posted, hoping for some solid leads on what & why.

Scott Marlowe wrote:
> On 8/8/07, justin <[EMAIL PROTECTED]> wrote:
>> I saw an interesting topic in the archives on best bang for the buck
>> for $20k... about a year old now.
>>
>> So whats the thoughts on a current combined rack/disks/cpu combo
>> around the $10k-$15k point, currently?
>>
>> I can configure up a Dell poweredge 2900 for $9k, but am wondering if
>> I'm missing out on something better.
>> ($9k spent with mr dell gets you 2x quad core xeon X535s, so thats 8
>> cores total, with 24gb of memory and 8x 146gb 15k serial attached
>> scsi connected, raid10, attached to a perc 5/i with redundant power
>> supplies, all in what looks like a 3u chassis). But the system is
>> pretty maxxed out like that, no room for future expansion.
>>
>> better options? or a better balance for a pure db box that is mostly
>> smaller reads and large indexes?
>
> That's not much kit for $20k. I went to www.aberdeeninc.com and spec'd
> out a box with 24 750G Barracudas, battery-backed cache RAID, dual
> quad-core 2.66GHz Xeons, and 16 gigs of RAM for $15k. Another 16 gigs
> of RAM would probably put it around $18k or so, but they didn't have
> the config option enabled there.
>
> I consider Dells to be mediocre hardware with mediocre support. There
> are lots of medium-sized integration shops out there that make better
> servers with lots more storage for a lot less.