Re: [PERFORM] Update table performance

2007-08-09 Thread Piotr Kołaczkowski
Hi,

update valley set test='this is a test'

Such a query updates ALL of the records in the table.
5 million records * 47 fields - that can be several gigabytes of data.
The system has to scan those gigabytes to change every record. This is a huge
task. Try vacuuming and see if it helps. It can help a lot if you perform
such 'whole table updates' often.
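
As a minimal sketch (the table and statement are the ones from this thread),
the pattern being suggested is simply:

update valley set test = 'this is a test';
-- A plain VACUUM marks the dead row versions left behind by the update as
-- reusable, so the next whole-table update doesn't keep growing the file:
vacuum verbose valley;
-- VACUUM FULL would also shrink the file, at the cost of an exclusive lock:
-- vacuum full valley;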

Best regards,
Piotr Kolaczkowski



Re: [PERFORM] Update table performance

2007-08-09 Thread Kenneth Marshall
Mark,

You are not alone: when you post your system specifications, CPU and
memory are always listed while the disk I/O subsystem invariably is not.
This is a very disk-intensive operation and I suspect that your disk
system is maxed out. If you want it faster, you will need more I/O
capacity.

Regards,
Ken

On Tue, Aug 07, 2007 at 05:58:35AM -0700, Mark Makarowsky wrote:
> I have a table with 4,889,820 records in it.  The
> table also has 47 fields.  I'm having problems with
> update performance.  Just as a test, I issued the
> following update:
> 
> update valley set test='this is a test'
> 
> This took 905641 ms.  Isn't that kind of slow?  There
> aren't any indexes, triggers, constraints or anything
> on this table.  The version of Postgres is "PostgreSQL
> 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe
> (GCC) 3.4.2 (mingw-special)".  The operating
> environment is Windows 2003 Standard Edition w/service
> pack 2.  It is 2.20 Ghz with 1.0 GB of RAM.  Here is
> the results from Explain:
> 
> "Seq Scan on valley  (cost=0.00..1034083.57
> rows=4897257 width=601)"
> 
> Here are the settings in the postgresql.conf.  Any
> ideas or is this the expected speed?



Re: [PERFORM] Update table performance

2007-08-09 Thread Decibel!
On Thu, Aug 09, 2007 at 06:04:09PM +0530, Merlin Moncure wrote:
> That said, 'update' is the operation where PostgreSQL is slowest relative
> to other databases that are not MVCC.

Actually, it depends on how you do MVCC. In Oracle, DELETE is actually
the most expensive operation, because they not only have to remove the
row from the heap, they also have to copy it to the undo log. And they
need to do something with indexes as well. Whereas we just update 4 bytes
in the heap and that's it.

An UPDATE in Oracle OTOH just needs to store whatever fields have
changed in the undo log. If you haven't messed with indexed fields, it
doesn't have to touch those either.
-- 
Decibel!, aka Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-09 Thread Trevor Talbot
On 8/9/07, Michael Stone <[EMAIL PROTECTED]> wrote:
> On Thu, Aug 09, 2007 at 06:04:09PM +0530, Merlin Moncure wrote:
> >keep an eye out for the HOT feature, which will hopefully make it into 8.3
> >and should greatly reduce the penalty for (small) updates in many cases.
>
> Is there an overview somewhere about how this feature works and what it
> is expected to do? There have been a lot of references to it over time,
> and it's possible to understand it if you follow list traffic over time,
> but starting cold it's hard to know what it is. The name was poorly
> chosen as far as google is concerned. :)

This is what I found when I went looking for info earlier:
http://archives.postgresql.org/pgsql-patches/2007-07/msg00142.php
http://archives.postgresql.org/pgsql-patches/2007-07/msg00360.php



Re: [PERFORM] Update table performance

2007-08-09 Thread Michael Stone

On Thu, Aug 09, 2007 at 06:04:09PM +0530, Merlin Moncure wrote:

> keep an eye out for the HOT feature, which will hopefully make it into 8.3
> and should greatly reduce the penalty for (small) updates in many cases.


Is there an overview somewhere about how this feature works and what it 
is expected to do? There have been a lot of references to it over time, 
and it's possible to understand it if you follow list traffic over time, 
but starting cold it's hard to know what it is. The name was poorly 
chosen as far as google is concerned. :)


Mike Stone



Re: [PERFORM] Update table performance

2007-08-09 Thread Merlin Moncure
On 8/8/07, Mark Makarowsky <[EMAIL PROTECTED]> wrote:
> Can you provide more detail on what you mean by your
> two suggestions below:
>
> Yeah, I've used "vertical partitioning" very
> successfully in the past, though I've never done it
> for just a single field. I'll typically leave the few
> most common fields in the "main" table and pull
> everything else into a second table.
>
> I should mention that if you can handle splitting the
> update into multiple transactions, that will help a
> lot since it means you won't be doubling the size of
> the table.
>
> I guess I was just surprised by how long it takes to
> update the field in Postgres, since on an almost
> identical table in FoxPro (400,000 records less), it
> updates the table with the exact same update
> statement in about 4 minutes.

FoxPro is a single-process DBF-based system with some SQL access.
When you update the records, it updates them in place, since all the
records are fixed size and padded.  Be careful with this
comparison... while certain operations like the above may feel faster,
the locking in FoxPro is extremely crude compared to PostgreSQL.
There are many other things about DBF systems in general which are
pretty lousy from a performance perspective.

That said, 'update' is the operation where PostgreSQL is slowest relative
to other databases that are not MVCC.  This is balanced by extremely
efficient locking and good performance under multi-user loads.
PostgreSQL likes to be used a certain way... you will find that when
used properly it is extremely fast.

Keep an eye out for the HOT feature, which will hopefully make it into 8.3
and should greatly reduce the penalty for (small) updates in many cases.

merlin



Re: [PERFORM] Update table performance

2007-08-08 Thread Decibel!
On Tue, Aug 07, 2007 at 08:46:20PM -0500, Erik Jones wrote:
> Vertical partitioning is where you split up your table on disk by  
> columns, i.e on the vertical lines.  He quoted it because Postgres  
> doesn't actually support it transparently but you can always fake it  
> by splitting up your table.  For example, given the following table  
> wherein column bar gets updated a lot but the others don't:
> 
> create table foo (
>     id    int not null,
>     bar   int,
>     baz   int,
> 
>     primary key (id)
> );
> 
> You could split it up like so:
> 
> create table foo_a (
>     id    int,
>     baz   int,
> 
>     primary key (id)
> );
> 
> create table foo_b (
>     foo_id  int,
>     bar     int,
> 
>     foreign key (foo_id) references foo_a (id)
> );

FWIW, the cases where I've actually used this have been on much wider
tables, where a number of the attributes are infrequently accessed. An
example would be if you keep snail-mail address info for users; you
probably don't use those fields very often, so they would be good
candidates for going into a second table.

When does it actually make sense to use this? When you do a *lot* with a
small number of fields in the table. In this example, perhaps you very
frequently need to look up either user_name or user_id, probably via
joins. Having a table with just name, id, perhaps password and a few
other fields might add up to 50 bytes per row (with overhead), while
address information by itself could easily be 50 bytes. So by pushing
that out to another table, you cut the size of the main table in half.
That means more efficient use of cache, faster seqscans, etc.
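
For what it's worth, a rough way to check the per-row numbers being discussed
here (just a sketch; 'foo' stands in for whatever the real table is called,
and reltuples is only the estimate kept by the last VACUUM/ANALYZE):

select pg_relation_size('foo')::numeric / nullif(reltuples, 0) as bytes_per_row
from pg_class
where relname = 'foo';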

The case Erik is describing is more specific to PostgreSQL and how it
handles MVCC. In some cases, splitting a frequently updated field out to a
separate table might not gain as much once we get HOT, but it's still a
good tool to consider. Depending on what you're doing, another useful
technique is to update the field less often by logging updates to be
performed into a separate table and periodically processing that
information into the main table.
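
A hypothetical sketch of that last idea, reusing the foo_b table from above
(the pending table is my own invention, it assumes at most one pending row
per id, and a production version would also need to think about concurrent
inserts):

create table foo_b_pending (
    foo_id  int,
    bar     int
);

-- writers insert into foo_b_pending instead of updating foo_b directly;
-- a periodic job then folds the logged values into the main table:
begin;
update foo_b set bar = p.bar
from foo_b_pending p
where foo_b.foo_id = p.foo_id;
delete from foo_b_pending;
commit;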
-- 
Decibel!, aka Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-08 Thread Erik Jones

On Aug 8, 2007, at 3:00 AM, Heikki Linnakangas wrote:

> Erik Jones wrote:
> > Decibel! wrote:
> > > I should mention that if you can handle splitting the
> > > update into multiple transactions, that will help a
> > > lot since it means you won't be doubling the size of
> > > the table.
> >
> > As I mentioned above, when you do an update you're actually inserting a
> > new row and deleting the old one.  That deleted row is still considered
> > part of the table (for reasons of concurrency, read up on the
> > concurrency chapter in the manual for the details) and once it is no
> > longer visible to any live transactions it can be re-used by future
> > inserts.  So, if you update one column on every row of a one million row
> > table all at once, you have to allocate and write out one million new
> > rows.  But, if you do the update a quarter million at a time, the last
> > three updates would be able to re-use many of the rows deleted in
> > earlier updates.
>
> Only if you vacuum between the updates.

This is true.  In fact, the chapter on Routine Database Maintenance
Tasks that discusses vacuuming explains all of this.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [PERFORM] Update table performance

2007-08-08 Thread Heikki Linnakangas
Erik Jones wrote:
> Decibel! wrote:
>> I should mention that if you can handle splitting the
>> update into multiple transactions, that will help a
>> lot since it means you won't be doubling the size of
>> the table.
> 
> As I mentioned above, when you do an update you're actually inserting a
> new row and deleting the old one.  That deleted row is still considered
> part of the table (for reasons of concurrency, read up on the
> concurrency chapter in the manual for the details) and once it is no
> longer visible to any live transactions it can be re-used by future
> inserts.  So, if you update one column on every row of a one million row
> table all at once, you have to allocate and write out one million new
> rows.  But, if you do the update a quarter million at a time, the last
> three updates would be able to re-use many of the rows deleted in
> earlier updates.

Only if you vacuum between the updates.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Update table performance

2007-08-07 Thread Erik Jones

On Aug 7, 2007, at 6:13 PM, Mark Makarowsky wrote:


> Can you provide more detail on what you mean by your
> two suggestions below:
>
> Yeah, I've used "vertical partitioning" very
> successfully in the past, though I've never done it
> for just a single field. I'll typically leave the few
> most common fields in the "main" table and pull
> everything else into a second table.


Vertical partitioning is where you split up your table on disk by
columns, i.e. on the vertical lines.  He quoted it because Postgres
doesn't actually support it transparently, but you can always fake it
by splitting up your table.  For example, given the following table,
wherein column bar gets updated a lot but the others don't:


create table foo (
id  int not null,
bar int,
baz int,

primary key (id)
);

You could split it up like so:

create table foo_a (
id  int,
baz int,

primary key (id)
);

create table foo_b (
foo_id  int,
bar int,

foreign key (foo_id) references foo_a (id)
);

The reason you'd ever want to do this is that when Postgres goes to
update a row, what it actually does is insert a new row with the new
value(s) that you changed and mark the old one as deleted.  So, if
you have a wide table and frequently update only certain columns,
you'll take a performance hit as you're having to re-write a lot of
static values.
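
To make the payoff concrete (just a sketch; the view name is made up), the
frequent update now rewrites only the narrow foo_b rows, while readers can
still see one logical table through a view:

create view foo_joined as
select a.id, b.bar, a.baz
from foo_a a
left join foo_b b on b.foo_id = a.id;

-- the hot update touches only the two-column table:
update foo_b set bar = 42 where foo_id = 1;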




> I should mention that if you can handle splitting the
> update into multiple transactions, that will help a
> lot since it means you won't be doubling the size of
> the table.


As I mentioned above, when you do an update you're actually inserting
a new row and deleting the old one.  That deleted row is still
considered part of the table (for reasons of concurrency, read up on
the concurrency chapter in the manual for the details) and once it is
no longer visible to any live transactions it can be re-used by future
inserts.  So, if you update one column on every row of a one million
row table all at once, you have to allocate and write out one million
new rows.  But, if you do the update a quarter million at a time, the
last three updates would be able to re-use many of the rows deleted
in earlier updates.
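
A hypothetical sketch of that approach, assuming the table has an indexed
integer key called id (which the thread doesn't show); each batch is its own
transaction, and the VACUUM between batches is what makes the space reusable:

begin;
update valley set test = 'this is a test' where id between 1 and 1250000;
commit;
vacuum valley;   -- vacuum can't run inside a transaction block

begin;
update valley set test = 'this is a test' where id between 1250001 and 2500000;
commit;
vacuum valley;
-- ...and so on for the remaining batches.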


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [PERFORM] Update table performance

2007-08-07 Thread Mark Makarowsky
Can you provide more detail on what you mean by your
two suggestions below:

Yeah, I've used "vertical partitioning" very
successfully in the past, though I've never done it
for just a single field. I'll typically leave the few
most common fields in the "main" table and pull
everything else into a second table.

I should mention that if you can handle splitting the
update into multiple transactions, that will help a
lot since it means you won't be doubling the size of
the table.

I guess I was just surprised by how long it takes to
update the field in Postgres, since on an almost
identical table in FoxPro (400,000 records less), it
updates the table with the exact same update
statement in about 4 minutes.
--- Decibel! <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 07, 2007 at 02:36:18PM -0500, Scott Marlowe wrote:
> > On 8/7/07, Decibel! <[EMAIL PROTECTED]> wrote:
> > > On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> > > > Mark Makarowsky wrote:
> > > > > I have a table with 4,889,820 records in it.  The
> > > > > table also has 47 fields.  I'm having problems with
> > > > > update performance.  Just as a test, I issued the
> > > > > following update:
> > > > >
> > > > > update valley set test='this is a test'
> > > > >
> > > > > This took 905641 ms.  Isn't that kind of slow?
> > > >
> > > > The limiting factor here will be how fast you can write to your disk.
> > >
> > > Well, very possibly how fast you can read, too. Using your assumption of
> > > 1k per row, 5M rows means 5G of data, which might well not fit in
> > > memory. And if the entire table's been updated just once before, even
> > > with vacuuming you're now at 10G of data.
> >
> > Where one might have to update just one column of a wide table often,
> > it's often a good idea to move that column into its own dependent
> > table.
>
> Yeah, I've used "vertical partitioning" very successfully in the past,
> though I've never done it for just a single field. I'll typically leave
> the few most common fields in the "main" table and pull everything else
> into a second table.
>
> > Or just don't update one column of every row in the table...
>
> Yeah, that too. :) Though sometimes you can't avoid it.
>
> I should mention that if you can handle splitting the update into
> multiple transactions, that will help a lot since it means you won't be
> doubling the size of the table.
> --
> Decibel!, aka Jim Nasby  [EMAIL PROTECTED]
> EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



  




Re: [PERFORM] Update table performance

2007-08-07 Thread Decibel!
On Tue, Aug 07, 2007 at 02:36:18PM -0500, Scott Marlowe wrote:
> On 8/7/07, Decibel! <[EMAIL PROTECTED]> wrote:
> > On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> > > Mark Makarowsky wrote:
> > > >I have a table with 4,889,820 records in it.  The
> > > >table also has 47 fields.  I'm having problems with
> > > >update performance.  Just as a test, I issued the
> > > >following update:
> > > >
> > > >update valley set test='this is a test'
> > > >
> > > >This took 905641 ms.  Isn't that kind of slow?
> > >
> > > The limiting factor here will be how fast you can write to your disk.
> >
> > Well, very possibly how fast you can read, too. Using your assumption of
> > 1k per row, 5M rows means 5G of data, which might well not fit in
> > memory. And if the entire table's been updated just once before, even
> > with vacuuming you're now at 10G of data.
> 
> Where one might have to update just one column of a wide table often,
> it's often a good idea to move that column into its own dependent
> table.

Yeah, I've used "vertical partitioning" very successfully in the past,
though I've never done it for just a single field. I'll typically leave
the few most common fields in the "main" table and pull everything else
into a second table.

> > Or just don't update one column of every row in the table...

Yeah, that too. :) Though sometimes you can't avoid it.

I should mention that if you can handle splitting the update into
multiple transactions, that will help a lot since it means you won't be
doubling the size of the table.
-- 
Decibel!, aka Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-07 Thread Scott Marlowe
On 8/7/07, Decibel! <[EMAIL PROTECTED]> wrote:
> On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> > Mark Makarowsky wrote:
> > >I have a table with 4,889,820 records in it.  The
> > >table also has 47 fields.  I'm having problems with
> > >update performance.  Just as a test, I issued the
> > >following update:
> > >
> > >update valley set test='this is a test'
> > >
> > >This took 905641 ms.  Isn't that kind of slow?
> >
> > The limiting factor here will be how fast you can write to your disk.
>
> Well, very possibly how fast you can read, too. Using your assumption of
> 1k per row, 5M rows means 5G of data, which might well not fit in
> memory. And if the entire table's been updated just once before, even
> with vacuuming you're now at 10G of data.

Where one might have to update just one column of a wide table often,
it's often a good idea to move that column into its own dependent
table.

Or just don't update one column of every row in the table...



Re: [PERFORM] Update table performance

2007-08-07 Thread Chris Browne
[EMAIL PROTECTED] (Mark Makarowsky) writes:
> I have a table with 4,889,820 records in it.  The
> table also has 47 fields.  I'm having problems with
> update performance.  Just as a test, I issued the
> following update:
>
> update valley set test='this is a test'
>
> This took 905641 ms.  Isn't that kind of slow?  There
> aren't any indexes, triggers, constraints or anything
> on this table.  The version of Postgres is "PostgreSQL
> 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe
> (GCC) 3.4.2 (mingw-special)".  The operating
> environment is Windows 2003 Standard Edition w/service
> pack 2.  It is 2.20 Ghz with 1.0 GB of RAM.  Here is
> the results from Explain:
>
> "Seq Scan on valley  (cost=0.00..1034083.57
> rows=4897257 width=601)"
>
> Here are the settings in the postgresql.conf.  Any
> ideas or is this the expected speed?

Hmm.  

- You asked to update 4,889,820 records.

- It's a table consisting of 8.5GB of data (based on the cost info)

For this to take 15 minutes doesn't seem particularly outrageous.
-- 
output = ("cbbrowne" "@" "acm.org")
http://cbbrowne.com/info/oses.html
Rules of the Evil Overlord #65.  "If I must have computer systems with
publically available  terminals, the maps  they display of  my complex
will have  a room clearly marked  as the Main Control  Room. That room
will be  the Execution Chamber. The  actual main control  room will be
marked as Sewage Overflow Containment." 



Re: [PERFORM] Update table performance

2007-08-07 Thread Decibel!
On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> Mark Makarowsky wrote:
> >I have a table with 4,889,820 records in it.  The
> >table also has 47 fields.  I'm having problems with
> >update performance.  Just as a test, I issued the
> >following update:
> >
> >update valley set test='this is a test'
> >
> >This took 905641 ms.  Isn't that kind of slow?
> 
> The limiting factor here will be how fast you can write to your disk. 

Well, very possibly how fast you can read, too. Using your assumption of
1k per row, 5M rows means 5G of data, which might well not fit in
memory. And if the entire table's been updated just once before, even
with vacuuming you're now at 10G of data.
-- 
Decibel!, aka Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-07 Thread Alan Hodgson
On Tuesday 07 August 2007 05:58, Mark Makarowsky 
<[EMAIL PROTECTED]> wrote:
> I have a table with 4,889,820 records in it.  The
> table also has 47 fields.  I'm having problems with
> update performance.  Just as a test, I issued the
> following update:
>
> update valley set test='this is a test'
>
> This took 905641 ms.  Isn't that kind of slow?  

PostgreSQL has to write a full new version of every row that gets updated. 
Updates are, therefore, relatively slow. 

I'm guessing you're doing this on a single SATA drive, too, which probably 
doesn't help.

-- 
"If a nation values anything more than freedom, it will lose its freedom;
and the irony of it is that if it is comfort or money that it values more,
it will lose that too." -- Somerset Maugham, Author




Re: [PERFORM] Update table performance

2007-08-07 Thread Scott Marlowe
On 8/7/07, Mark Makarowsky <[EMAIL PROTECTED]> wrote:
> I have a table with 4,889,820 records in it.  The
> table also has 47 fields.  I'm having problems with
> update performance.  Just as a test, I issued the
> following update:
>
> update valley set test='this is a test'
>
> This took 905641 ms.  Isn't that kind of slow?  There
> aren't any indexes, triggers, constraints or anything
> on this table.  The version of Postgres is "PostgreSQL
> 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe
> (GCC) 3.4.2 (mingw-special)".  The operating
> environment is Windows 2003 Standard Edition w/service
> pack 2.  It is 2.20 Ghz with 1.0 GB of RAM.  Here is
> the results from Explain:
>
> "Seq Scan on valley  (cost=0.00..1034083.57
> rows=4897257 width=601)"

Have you done this a few times?  You could easily have a very large
and bloated table if you do this several times in a row.  That would
explain the slow performance.  If you're going to do a lot of updates
without where clauses on large tables, you'll need to run a vacuum
right afterwards to clean things up.
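
A quick way to check whether that has already happened (just a sketch;
relpages and reltuples are only the estimates kept by the last VACUUM or
ANALYZE, so run one first if they look stale):

select relname, relpages, reltuples
from pg_class
where relname = 'valley';
-- vacuum verbose valley;  -- also reports how many dead row versions it found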

I see that you included a lot about your machine, but you didn't
include any specs on your disk subsystem.  When it comes to update
speed, the disk subsystem is probably the most important part.

Note also that Windows is still not the preferred platform for
postgresql from a performance perspective (actually, the only database
where that's true is MS-SQL really).

Have you run any benchmarks on your disk subsystem to see how fast it is?



Re: [PERFORM] Update table performance

2007-08-07 Thread Richard Huxton

Mark Makarowsky wrote:

> I have a table with 4,889,820 records in it.  The
> table also has 47 fields.  I'm having problems with
> update performance.  Just as a test, I issued the
> following update:
>
> update valley set test='this is a test'
>
> This took 905641 ms.  Isn't that kind of slow?


The limiting factor here will be how fast you can write to your disk.
Let's see: 5 million rows in ~900 seconds, that's about 5500
rows/second. Now, you don't say how large your rows are, but assuming
each row is say 1kB, that'd be 5.5MB/sec - not brilliant. The simplest
way to find out the total activity is to check how much disk space PG
is using before and after the update.
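
One concrete way to do that from psql (pg_total_relation_size and
pg_size_pretty are available in 8.1 and later):

select pg_size_pretty(pg_total_relation_size('valley'));
update valley set test='this is a test';
select pg_size_pretty(pg_total_relation_size('valley'));
-- the difference is roughly the volume of new row versions written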


What you'll need to do is monitor disk activity, in particular how many 
writes and how much time the processor spends waiting on writes to complete.


If your standard usage pattern is to update a single field in all rows 
of a large table, then performance isn't going to be sub-second I'm afraid.


--
  Richard Huxton
  Archonet Ltd



[PERFORM] Update table performance

2007-08-07 Thread Mark Makarowsky
I have a table with 4,889,820 records in it.  The
table also has 47 fields.  I'm having problems with
update performance.  Just as a test, I issued the
following update:

update valley set test='this is a test'

This took 905641 ms.  Isn't that kind of slow?  There
aren't any indexes, triggers, constraints or anything
on this table.  The version of Postgres is "PostgreSQL
8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe
(GCC) 3.4.2 (mingw-special)".  The operating
environment is Windows 2003 Standard Edition w/service
pack 2.  It is 2.20 Ghz with 1.0 GB of RAM.  Here is
the results from Explain:

"Seq Scan on valley  (cost=0.00..1034083.57
rows=4897257 width=601)"

Here are the settings in the postgresql.conf.  Any
ideas or is this the expected speed?

#---
# CONNECTIONS AND AUTHENTICATION
#---

# - Connection Settings -

listen_addresses = '*'          # what IP address(es) to listen on;
                                # comma-separated list of addresses;
                                # defaults to 'localhost', '*' = all
                                # (change requires restart)
port = 5432                     # (change requires restart)
max_connections = 20            # (change requires restart)
# Note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 3     # (change requires restart)
#unix_socket_directory = ''             # (change requires restart)
#unix_socket_group = ''                 # (change requires restart)
#unix_socket_permissions = 0777         # octal
                                        # (change requires restart)
#bonjour_name = ''                      # defaults to the computer name
                                        # (change requires restart)

# - Security & Authentication -

#authentication_timeout = 1min          # 1s-600s
#ssl = off                              # (change requires restart)
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = ''                # (change requires restart)
#krb_srvname = 'postgres'               # (change requires restart)
#krb_server_hostname = ''               # empty string matches any keytab entry
                                        # (change requires restart)
#krb_caseins_users = off                # (change requires restart)

# - TCP Keepalives -
# see 'man 7 tcp' for details

#tcp_keepalives_idle = 0                # TCP_KEEPIDLE, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_interval = 0            # TCP_KEEPINTVL, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_count = 0               # TCP_KEEPCNT;
                                        # 0 selects the system default


#---
# RESOURCE USAGE (except WAL)
#---

# - Memory -

shared_buffers = 512MB                  # min 128kB or max_connections*16kB
                                        # (change requires restart)
temp_buffers = 8MB                      # min 800kB
max_prepared_transactions = 5           # can be 0 or more
                                        # (change requires restart)
# Note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 8MB                          # min 64kB
maintenance_work_mem = 16MB             # min 1MB
#max_stack_depth = 4MB                  # min 100kB

# - Free Space Map -

max_fsm_pages = 70                      # min max_fsm_relations*16, 6 bytes each
                                        # (change requires restart)
max_fsm_relations = 1000                # min 100, ~70 bytes each
                                        # (change requires restart)

# - Kernel Resource Usage -

max_files_per_process = 1000            # min 25
                                        # (change requires restart)
#shared_preload_libraries = ''          # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0                  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 0-10000 credits

# - Background writer -

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_percent = 1.0             # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5              # 0-1000 b

Re: [PERFORM] Update table performance problem

2007-06-13 Thread Mark Makarowsky
The version is:

"PostgreSQL 8.2.4 on i686-pc-mingw32, compiled by GCC
gcc.exe (GCC) 3.4.2 (mingw-special)"

Here is the table definition for co and fco.  There
aren't any rules, constraints, triggers, etc. on the
tables.  Only an index on each table for the xno
field.  Any other thoughts?

CREATE TABLE co
(
  xno character(10),
  longitude double precision,
  latitude double precision,
  firest_id character(8),
  fire_dist double precision,
  polst_id character(8),
  pol_dist double precision,
  fnew_id character(10),
  fnew_dist double precision,
  pnew_id character(10),
  pnew_dist double precision,
  seihazm020 bigint,
  acc_val integer,
  valley integer,
  flood_id bigint,
  chance character varying
) 
WITHOUT OIDS;
ALTER TABLE co OWNER TO postgres;
-- Index: co_xno

-- DROP INDEX co_xno;

CREATE UNIQUE INDEX co_xno
  ON co
  USING btree
  (xno);

CREATE TABLE fco
(
  firest_id character(8),
  fire_dist double precision,
  xno character(10)
) 
WITHOUT OIDS;
ALTER TABLE fco OWNER TO postgres;

-- Index: fco_xno

-- DROP INDEX fco_xno;

CREATE UNIQUE INDEX fco_xno
  ON fco
  USING btree
  (xno);

--- Tom Lane <[EMAIL PROTECTED]> wrote:

> Mark Makarowsky <[EMAIL PROTECTED]> writes:
> > "Hash Join  (cost=15590.22..172167.03 rows=383654
> > width=215) (actual time=1473.297..43032.178
> > rows=383654 loops=1)"
> > "  Hash Cond: (co.xno = fco.xno)"
> > "  ->  Seq Scan on co  (cost=0.00..123712.64
> > rows=384964 width=195) (actual time=440.196..37366.682
> > rows=384964 loops=1)"
> > "  ->  Hash  (cost=7422.54..7422.54 rows=383654
> > width=34) (actual time=995.651..995.651 rows=383654
> > loops=1)"
> > "->  Seq Scan on fco  (cost=0.00..7422.54
> > rows=383654 width=34) (actual time=4.641..509.947
> > rows=383654 loops=1)"
> > "Total runtime: 378258.707 ms"
>
> According to the top line, the actual scanning and joining took 43 sec;
> so the rest of the time went somewhere else.  Possibilities include
> the actual data insertion (wouldn't take 5 minutes), index updates
> (what indexes are on this table?), constraint checks, triggers, ...
>
> You failed to mention which PG version this is.  8.1 and up would show
> time spent in triggers separately, so we could eliminate that
> possibility if it's 8.1 or 8.2.  My suspicion without any data is
> a lot of indexes on the table.
>
>   regards, tom lane



   




Re: [PERFORM] Update table performance problem

2007-06-13 Thread Tom Lane
Mark Makarowsky <[EMAIL PROTECTED]> writes:
> "Hash Join  (cost=15590.22..172167.03 rows=383654
> width=215) (actual time=1473.297..43032.178
> rows=383654 loops=1)"
> "  Hash Cond: (co.xno = fco.xno)"
> "  ->  Seq Scan on co  (cost=0.00..123712.64
> rows=384964 width=195) (actual time=440.196..37366.682
> rows=384964 loops=1)"
> "  ->  Hash  (cost=7422.54..7422.54 rows=383654
> width=34) (actual time=995.651..995.651 rows=383654
> loops=1)"
> "->  Seq Scan on fco  (cost=0.00..7422.54
> rows=383654 width=34) (actual time=4.641..509.947
> rows=383654 loops=1)"
> "Total runtime: 378258.707 ms"

According to the top line, the actual scanning and joining took 43 sec;
so the rest of the time went somewhere else.  Possibilities include
the actual data insertion (wouldn't take 5 minutes), index updates
(what indexes are on this table?), constraint checks, triggers, ...

You failed to mention which PG version this is.  8.1 and up would show
time spent in triggers separately, so we could eliminate that
possibility if it's 8.1 or 8.2.  My suspicion without any data is
a lot of indexes on the table.
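
For reference, one quick way to list the indexes on the target table (psql's
\d co shows the same information):

select indexname, indexdef
from pg_indexes
where tablename = 'co';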

regards, tom lane



[PERFORM] Update table performance problem

2007-06-13 Thread Mark Makarowsky
I am trying to update a field in one table with a
field from another table like:

update co set
firest_id=fco.firest_id,fire_dist=fco.fire_dist from
fco where co.xno=fco.xno

Table co has 384964 records
Table fco has 383654 records

The xno fields in both tables are indexed but they
don't seem to be used.  I would expect the update to
be faster than 6.3 minutes or is that expectation
wrong?  Here is the results of Explain Analyze:

"Hash Join  (cost=15590.22..172167.03 rows=383654
width=215) (actual time=1473.297..43032.178
rows=383654 loops=1)"
"  Hash Cond: (co.xno = fco.xno)"
"  ->  Seq Scan on co  (cost=0.00..123712.64
rows=384964 width=195) (actual time=440.196..37366.682
rows=384964 loops=1)"
"  ->  Hash  (cost=7422.54..7422.54 rows=383654
width=34) (actual time=995.651..995.651 rows=383654
loops=1)"
"->  Seq Scan on fco  (cost=0.00..7422.54
rows=383654 width=34) (actual time=4.641..509.947
rows=383654 loops=1)"
"Total runtime: 378258.707 ms"

Thanks, 

Fred


   

