Re: [PERFORM] Update table performance

2007-08-08 Thread Heikki Linnakangas
Erik Jones wrote:
> Decibel! wrote:
>> I should mention that if you can handle splitting the
>> update into multiple transactions, that will help a
>> lot since it means you won't be doubling the size of
>> the table.
> 
> As I mentioned above, when you do an update you're actually inserting a
> new row and deleting the old one.  That deleted row is still considered
> part of the table (for reasons of concurrency, read up on the
> concurrency chapter in the manual for the details) and, once it is no
> longer visible to any live transactions, can be re-used by future
> inserts.  So, if you update one column on every row of a one million row
> table all at once, you have to allocate and write out one million new
> rows.  But, if you do the update a quarter million at a time, the last
> three updates would be able to re-use many of the rows deleted in
> earlier updates.

Only if you vacuum between the updates.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Update table performance

2007-08-08 Thread Erik Jones

On Aug 8, 2007, at 3:00 AM, Heikki Linnakangas wrote:

> Erik Jones wrote:
>> Decibel! wrote:
>>> I should mention that if you can handle splitting the
>>> update into multiple transactions, that will help a
>>> lot since it means you won't be doubling the size of
>>> the table.
>>
>> As I mentioned above, when you do an update you're actually inserting a
>> new row and deleting the old one.  That deleted row is still considered
>> part of the table (for reasons of concurrency, read up on the
>> concurrency chapter in the manual for the details) and, once it is no
>> longer visible to any live transactions, can be re-used by future
>> inserts.  So, if you update one column on every row of a one million row
>> table all at once, you have to allocate and write out one million new
>> rows.  But, if you do the update a quarter million at a time, the last
>> three updates would be able to re-use many of the rows deleted in
>> earlier updates.
>
> Only if you vacuum between the updates.

This is true.  In fact, the chapter on Routine Database Maintenance
tasks that discusses vacuuming explains all of this.
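
For illustration, here is a minimal sketch of that batch-plus-vacuum
pattern (table, column, and id ranges are hypothetical; note that VACUUM
cannot run inside a transaction block, so each statement is issued
separately):

-- Update a million-row table in quarter-million-row batches, vacuuming
-- between batches so later batches can re-use the dead rows.
update foo set bar = bar + 1 where id between 1 and 250000;
vacuum foo;
update foo set bar = bar + 1 where id between 250001 and 500000;
vacuum foo;
update foo set bar = bar + 1 where id between 500001 and 750000;
vacuum foo;
update foo set bar = bar + 1 where id between 750001 and 1000000;
vacuum foo;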


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Update table performance

2007-08-08 Thread Decibel!
On Tue, Aug 07, 2007 at 08:46:20PM -0500, Erik Jones wrote:
> Vertical partitioning is where you split up your table on disk by
> columns, i.e. on the vertical lines.  He quoted it because Postgres
> doesn't actually support it transparently, but you can always fake it
> by splitting up your table.  For example, given the following table
> wherein column bar gets updated a lot but the others don't:
> 
> create table foo (
> id    int not null,
> bar   int,
> baz   int,
> 
> primary key (id)
> );
> 
> You could split it up like so:
> 
> create table foo_a (
> id    int,
> baz   int,
> 
> primary key (id)
> );
> 
> create table foo_b (
> foo_id    int,
> bar       int,
> 
> foreign key (foo_id) references foo_a (id)
> );
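
(As an aside, a view can present the split tables as the original single
table; a sketch, assuming the quoted definitions and that the original
foo table has been dropped or renamed:

create view foo as
select a.id, b.bar, a.baz
from foo_a a
left join foo_b b on b.foo_id = a.id;
)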

FWIW, the cases where I've actually used this have been on much wider
tables, and a number of the attributes are infrequently accessed. An
example would be if you keep snail-mail address info for users; you
probably don't use those fields very often, so they would be good
candidates for going into a second table.

When does it actually make sense to use this? When you do a *lot* with a
small number of fields in the table. In this example, perhaps you very
frequently need to look up either user_name or user_id, probably via
joins. Having a table with just name, id, perhaps password and a few
other fields might add up to 50 bytes per row (with overhead), while
address information by itself could easily be 50 bytes. So by pushing
that out to another table, you cut the size of the main table in half.
That means more efficient use of cache, faster seqscans, etc.

The case Erik is describing is more specific to PostgreSQL and how it
handles MVCC. In some cases, splitting a frequently updated column out to a
separate table might not gain as much once we get HOT, but it's still a
good tool to consider. Depending on what you're doing, another useful
technique is to update the field less often by logging the updates to be
performed into a separate table and periodically rolling that
information into the main table, as in the sketch below.
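
A minimal sketch of that pattern, with hypothetical names (run the
roll-up in one transaction; it also assumes no new log rows arrive
during the roll-up, otherwise snapshot a cutoff first):

-- Instead of updating foo.bar in place, log the increments:
create table foo_bar_log (
foo_id  int not null,
delta   int not null
);

-- Periodically fold the accumulated deltas into the main table:
begin;
update foo set bar = bar + s.total
from (select foo_id, sum(delta) as total
from foo_bar_log
group by foo_id) s
where foo.id = s.foo_id;
delete from foo_bar_log;
commit;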
-- 
Decibel!, aka Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] When/if to Reindex

2007-08-08 Thread Vivek Khera


On Jul 18, 2007, at 1:08 PM, Steven Flatt wrote:

> Some background: we make extensive use of partitioned tables.  In fact,
> I'm really only considering reindexing partitions that have "just
> closed".  In our simplest/most general case, we have a table partitioned
> by a timestamp column, each partition 24 hours wide.  The partition will
> have an index on the timestamp column as well as a few other indexes,
> including a primary key.

If all you ever did was insert into that table, then you probably
don't need to reindex.  If you did mass updates/deletes mixed with
your inserts, then perhaps you do.

Do some experiments comparing pg_class.relpages for your table and
its indexes before and after a reindex.  Decide if the number of
pages you save on the index is worth the trouble.  If it shaves off
just a handful of pages, I'd vote no...
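
For example, something along these lines reports those page counts
(hypothetical table name):

select c.relname, c.relpages
from pg_class c
where c.oid = 'foo'::regclass
or c.oid in (select indexrelid from pg_index
where indrelid = 'foo'::regclass);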


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] When/if to Reindex

2007-08-08 Thread Steven Flatt
On 8/8/07, Vivek Khera <[EMAIL PROTECTED]> wrote:
>
> If all you ever did was insert into that table, then you probably
> don't need to reindex.  If you did mass updates/deletes mixed with
> your inserts, then perhaps you do.
>
> Do some experiments comparing pg_class.relpages for your table and
> its indexes before and after a reindex.  Decide if the number of
> pages you save on the index is worth the trouble.  If it shaves off
> just a handful of pages, I'd vote no...


What's interesting is that an insert-only table can benefit significantly
from reindexing after the table is fully loaded.  I had done experiments
exactly as you suggest (looking at pg_class.relpages), and determined that
reindexing results in about a 30% space savings for all indexes except the
PK index.  The PK index (integer based on a sequence) does not benefit at
all.  By setting fillfactor=100 on the index prior to reindexing, I get
another 10% space savings on all the indexes.
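
For reference, that step looks like the following sketch (hypothetical
index name; ALTER INDEX ... SET (fillfactor) requires 8.2 or later):

alter index foo_ts_idx set (fillfactor = 100);
reindex index foo_ts_idx;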

Not to mention the general performance improvements when reading from the
table...

So, we decided that reindexing partitions after they're fully loaded *was*
worth it.

Steve


Re: [PERFORM] When/if to Reindex

2007-08-08 Thread Bill Moran
In response to "Steven Flatt" <[EMAIL PROTECTED]>:

> On 8/8/07, Vivek Khera <[EMAIL PROTECTED]> wrote:
> >
> > If all you ever did was insert into that table, then you probably
> > don't need to reindex.  If you did mass updates/deletes mixed with
> > your inserts, then perhaps you do.
> >
> > Do some experiments comparing pg_class.relpages for your table and
> > its indexes before and after a reindex.  Decide if the number of
> > pages you save on the index is worth the trouble.  If it shaves off
> > just a handful of pages, I'd vote no...
> 
> 
> What's interesting is that an insert-only table can benefit significantly
> from reindexing after the table is fully loaded.  I had done experiments
> exactly as you suggest (looking at pg_class.relpages), and determined that
> reindexing results in about a 30% space savings for all indexes except the
> PK index.  The PK index (integer based on a sequence) does not benefit at
> all.  By setting fillfactor=100 on the index prior to reindexing, I get
> another 10% space savings on all the indexes.
> 
> Not to mention the general performance improvements when reading from the
> table...
> 
> So, we decided that reindexing partitions after they're fully loaded *was*
> worth it.

I've had similar experience.  One thing you didn't mention that I've noticed
is that VACUUM FULL often bloats indexes.  I've made it SOP that
after application upgrades (which usually include lots of ALTER TABLEs and
other massive schema and data changes) I VACUUM FULL and REINDEX (in that
order).

Lots of ALTER TABLEs seem to bloat the database size considerably, beyond
what normal VACUUM seems to fix.  A FULL seems to fix that, but it appears
to bloat the indexes, thus a REINDEX helps.

I would expect that setting fillfactor to 100 will encourage indexes to bloat
faster, and would only be recommended if you didn't expect the index contents
to change?

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] How to ENABLE SQL capturing???

2007-08-08 Thread smiley2211

Hello all,

I am trying to enable capturing of the SQL statements submitted by an
application...how do I do this in Postgres?  Performance is SLOW on my
server and I have autovacuum enabled and have rebuilt indexes...what else
should be looked at?

Thanks...Michelle


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] How to ENABLE SQL capturing???

2007-08-08 Thread Steinar H. Gunderson
On Wed, Aug 08, 2007 at 01:02:24PM -0700, smiley2211 wrote:
> I am trying to enable capturing of the SQL statements submitted by an
> application...how do I do this in Postgres?  Performance is SLOW on my
> server and I have autovacuum enabled and have rebuilt indexes...what else
> should be looked at?

Try "log_min_duration_statement = 100" in postgresql.conf; it will show all
statements that take more than 100ms. Set to 0 to log _all_ statements, or
-1 to turn the logging back off.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] When/if to Reindex

2007-08-08 Thread Decibel!
On Wed, Aug 08, 2007 at 03:27:57PM -0400, Bill Moran wrote:
> I've had similar experience.  One thing you didn't mention that I've noticed
> is that VACUUM FULL often bloats indexes.  I've made it SOP that
> after application upgrades (which usually include lots of ALTER TABLEs and
> other massive schema and data changes) I VACUUM FULL and REINDEX (in that
> order).

You'd be better off with a CLUSTER in that case. It'll be faster, and
you'll ensure that the table has optimal ordering.
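
A sketch, with hypothetical names (pre-8.3 syntax):

cluster some_index on some_table;

CLUSTER rewrites the heap in index order and rebuilds all of the table's
indexes as a side effect.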

> Lots of ALTER TABLEs seem to bloat the database size considerably, beyond
> what normal VACUUM seems to fix.  A FULL seems to fix that, but it appears
> to bloat the indexes, thus a REINDEX helps.

Hrm, are you sure that's still true? I just did an ALTER TABLE ... TYPE
and it created a new file, meaning no bloating.
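
(The kind of statement I mean, with hypothetical names:

alter table foo alter column bar type bigint;

which writes the table out into a fresh file rather than bloating the
old one.)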

> I would expect that setting fillfactor to 100 will encourage indexes to bloat
> faster, and would only be recommended if you didn't expect the index contents
> to change?

Yes.
-- 
Decibel!, aka Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] How to ENABLE SQL capturing???

2007-08-08 Thread Bryan Murphy
We currently have logging enabled for all queries over 100ms, and keep
the last 24 hours of logs before we rotate them.  I've found this tool
very helpful in diagnosing new performance problems that crop up:

http://pgfouine.projects.postgresql.org/

Bryan

On 8/8/07, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote:
> On Wed, Aug 08, 2007 at 01:02:24PM -0700, smiley2211 wrote:
> > I am trying to enable capturing of the SQL statements submitted by an
> > application...how do I do this in Postgres?  Performance is SLOW on my
> > server and I have autovacuum enabled and have rebuilt indexes...what else
> > should be looked at?
>
> Try "log_min_duration_statement = 100" in postgresql.conf; it will show all
> statements that take more than 100ms. Set to 0 to log _all_ statements, or
> -1 to turn the logging back off.
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] When/if to Reindex

2007-08-08 Thread Tom Lane
Bill Moran <[EMAIL PROTECTED]> writes:
> In response to "Steven Flatt" <[EMAIL PROTECTED]>:
>> What's interesting is that an insert-only table can benefit significantly
>> from reindexing after the table is fully loaded.

> I've had similar experience.  One thing you didn't mention that I've noticed
> is that VACUUM FULL often bloats indexes.  I've made it SOP that
> after application upgrades (which usually include lots of ALTER TABLEs and
> other massive schema and data changes) I VACUUM FULL and REINDEX (in that
> order).

Actually, if that is your intent then the best plan is: drop indexes,
VACUUM FULL, create indexes from scratch.  A huge proportion of VACUUM
FULL's time goes into updating the indexes, and that work is basically
wasted if you are going to reindex afterwards.
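
In outline, with hypothetical names:

drop index foo_bar_idx;
vacuum full foo;
create index foo_bar_idx on foo (bar);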

CLUSTER is a good substitute for V.F. partly because it doesn't try to
update the indexes incrementally, but just does the equivalent of
REINDEX after it's reordered the heap.

I'd make the same remark about Steven's case: if possible, don't create
the indexes at all until you've loaded the table fully.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] mid 2007 "best bang for the buck" hardware opinions

2007-08-08 Thread justin
I saw an interesting topic in the archives on best bang for the buck for
$20k... about a year old now.

So what are the thoughts on a current combined rack/disks/cpu combo around
the $10k-$15k point?

I can configure a Dell PowerEdge 2900 for $9k, but am wondering if
I'm missing out on something better.
($9k spent with Mr. Dell gets you 2x quad core Xeon X5355s, so that's
8 cores total, with 24GB of memory and 8x 146GB 15k serial attached
SCSI disks in RAID 10 on a PERC 5/i, with redundant power supplies,
all in what looks like a 3U chassis.)  But the system is pretty maxed
out like that, no room for future expansion.

Better options?  Or a better balance for a pure db box that is mostly
smaller reads and large indexes?

This would go into a rack with existing but older equipment that can
serve as warm standby, so I don't have to split the cost here on
getting redundancy.

Thanks for any 2007 advice!


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] mid 2007 "best bang for the buck" hardware opinions

2007-08-08 Thread Scott Marlowe
On 8/8/07, justin <[EMAIL PROTECTED]> wrote:
> I saw an interesting topic in the archives on best bang for the buck for
> $20k... about a year old now.
>
> So what are the thoughts on a current combined rack/disks/cpu combo around
> the $10k-$15k point?
>
> I can configure a Dell PowerEdge 2900 for $9k, but am wondering if
> I'm missing out on something better.
> ($9k spent with Mr. Dell gets you 2x quad core Xeon X5355s, so that's
> 8 cores total, with 24GB of memory and 8x 146GB 15k serial attached
> SCSI disks in RAID 10 on a PERC 5/i, with redundant power supplies,
> all in what looks like a 3U chassis.)  But the system is pretty maxed
> out like that, no room for future expansion.
>
> Better options?  Or a better balance for a pure db box that is mostly
> smaller reads and large indexes?

That's not much kit for $20k.

I went to www.aberdeeninc.com and spec'd out a box with 24 750GB
Barracudas, battery-backed cache RAID, dual quad core 2.66GHz
Xeons, and 16GB of RAM for $15k.  Another 16GB of RAM would
probably put it around $18k or so, but they didn't have the config
option enabled there.

I consider Dells to be mediocre hardware with mediocre support.

There are lots of medium sized integration shops out there that make
better servers with lots more storage for a lot less.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] mid 2007 "best bang for the buck" hardware opinions

2007-08-08 Thread justin

No, it wouldn't be much kit for $20k,
but that example is currently coming in at $9k... (the $20k referred to
is last year's topic).

I think I can spend up to $15k, but it would have to be clearly
faster/better/more expandable than this config.
Or I can spend $9k with someone else if I can convince myself it is just
a better option.

That there might be better options out there, I've no doubt;
it's why I posted, hoping for some solid leads on what & why.

Scott Marlowe wrote:

> On 8/8/07, justin <[EMAIL PROTECTED]> wrote:
>> I saw an interesting topic in the archives on best bang for the buck for
>> $20k... about a year old now.
>>
>> So what are the thoughts on a current combined rack/disks/cpu combo around
>> the $10k-$15k point?
>>
>> I can configure a Dell PowerEdge 2900 for $9k, but am wondering if
>> I'm missing out on something better.
>> ($9k spent with Mr. Dell gets you 2x quad core Xeon X5355s, so that's
>> 8 cores total, with 24GB of memory and 8x 146GB 15k serial attached
>> SCSI disks in RAID 10 on a PERC 5/i, with redundant power supplies,
>> all in what looks like a 3U chassis.)  But the system is pretty maxed
>> out like that, no room for future expansion.
>>
>> Better options?  Or a better balance for a pure db box that is mostly
>> smaller reads and large indexes?
>
> That's not much kit for $20k.
>
> I went to www.aberdeeninc.com and spec'd out a box with 24 750GB
> Barracudas, battery-backed cache RAID, dual quad core 2.66GHz
> Xeons, and 16GB of RAM for $15k.  Another 16GB of RAM would
> probably put it around $18k or so, but they didn't have the config
> option enabled there.
>
> I consider Dells to be mediocre hardware with mediocre support.
>
> There are lots of medium sized integration shops out there that make
> better servers with lots more storage for a lot less.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster