Re: [PERFORM] 121+ million record table perf problems

2007-05-21 Thread Vivek Khera


On May 18, 2007, at 2:30 PM, Andrew Sullivan wrote:


Note also that your approach of updating all 121 million records in
one statement is approximately the worst way to do this in Postgres,
because it creates 121 million dead tuples on your table.  (You've
created some number of those by killing the query as well.)

All of that said, 17 hours seems kinda long.


I don't think that is too long.  Growing the table one page at a time  
takes a long time when you add a lot of pages to a table that big.   
Add in the single disk and you're flying the disk head all over the  
place so it will just be slow.  No way around it.


And just for good measure, I ran a count on one of my big tables  
which consists of two integers and a varchar(7):


db=> select count(*) from mytable;
   count
-----------
 311994721
(1 row)

Time: 157689.057 ms

So I'm going to bet $1 that you're I/O starved.

Also, for memory usage, postgres won't use more than you tell it to...




[PERFORM] 121+ million record table perf problems

2007-05-18 Thread cyber-postgres

I need some help on recommendations to solve a perf problem.

I've got a table with ~121 million records in it.  Select count on it 
currently takes ~45 minutes, and an update to the table to set a value on 
one of the columns I finally killed after it ran 17 hours and had still 
not completed.  Queries into the table are butt slow, and


System:   SUSE LINUX 10.0 (X86-64)
Postgresql:   PostgreSQL 8.2.1
Index type:   btree

A select count took ~48 minutes before I made some changes to the 
postgresql.conf, going from default values to these:

shared_buffers = 24MB
work_mem = 256MB
maintenance_work_mem = 512MB
random_page_cost = 100
stats_start_collector = off
stats_row_level = off

As a test I am trying to do an update on state using the following queries:
update res set state=5001;
select count(res_id) from res;

The update query that started all this I had to kill after 17 hours.  It 
should have updated all 121+ million records.  That brought my select 
count down to 19 minutes, but still a far cry from acceptable.


The system has 2GB of RAM (more is already on order), but top doesn't show it 
running away with RAM.  If anything, I don't think it's using enough, as I only 
see about 6 processes using 26-27 MB each.  It is running on a single disk (I 
guess I will at minimum have to go to a RAID 1).  The workload will consist 
primarily of queries against the indices (which is why there are so many of 
them) and updates to a single record from about 10 clients, where that one 
record will have its md5, state, rval, speed, audit, and date columns updated.  
Those updates don't seem to be a problem, and are generally processed in 
batches of 500 to 5000 at a time.


Here is the schema for the table giving me problems:

CREATE TABLE res
(
  res_id integer NOT NULL DEFAULT nextval('result_id_seq'::regclass),
  res_client_id integer NOT NULL,
  time real DEFAULT 0,
  error integer DEFAULT 0,
  md5 character(32) DEFAULT 0,
  res_tc_id integer NOT NULL,
  state smallint DEFAULT 0,
  priority smallint,
  rval integer,
  speed real,
  audit real,
  date timestamp with time zone,
  gold_result_id integer,
  CONSTRAINT result_pkey PRIMARY KEY (res_id),
  CONSTRAINT unique_res UNIQUE (res_client_id, res_tc_id)
)
WITHOUT OIDS;
ALTER TABLE res OWNER TO postgres;

CREATE INDEX index_audit
  ON res
  USING btree
  (audit);

CREATE INDEX index_event
  ON res
  USING btree
  (error);

CREATE INDEX index_priority
  ON res
  USING btree
  (priority);

CREATE INDEX index_rval
  ON res
  USING btree
  (rval);

CREATE INDEX index_speed
  ON res
  USING btree
  (speed);

CREATE INDEX index_state
  ON res
  USING btree
  (state);

CREATE INDEX index_tc_id
  ON res
  USING btree
  (res_tc_id);

CREATE INDEX index_time
  ON res
  USING btree
  (time);



Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Andrew Sullivan
On Fri, May 18, 2007 at 12:43:40PM -0500, [EMAIL PROTECTED] wrote:
 I've got a table with ~121 million records in it.  Select count on it 
 currently takes ~45 minutes, and an update to the table to set a value on 
 one of the columns I finally killed after it ran 17 hours and had still 
 not completed.  Queries into the table are butt slow, and

I don't think you've told us anything like enough to get started on
solving your problem.  But to start with, you know that in Postgres,
an unrestricted count() on a table always results in reading the
entire table, right?
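
If an approximate number is good enough, the planner's estimate in pg_class is
a common workaround; it's only as accurate as the last ANALYZE, but it returns
instantly:

select reltuples::bigint as approx_rows from pg_class where relname = 'res';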

Standard questions: have you performed any vacuum or analyse?

Your update statement is also a case where you have to touch every
row.  Note that, given that you seem to be setting the state field to
the same value for everything, an index on there will do you not one
jot of good until there's greater selectivity.

How fast is the disk?  Is it fast enough to read and touch every one
of those rows on the table inside of 17 hours?  

Note also that your approach of updating all 121 million records in
one statement is approximately the worst way to do this in Postgres,
because it creates 121 million dead tuples on your table.  (You've
created some number of those by killing the query as well.)
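
A rough sketch of doing it in slices instead (assuming res_id values are
reasonably dense), so each batch's dead tuples can be reclaimed before the
next batch runs:

update res set state=5001 where res_id between 1 and 10000000;
vacuum res;
update res set state=5001 where res_id between 10000001 and 20000000;
vacuum res;
-- ...and so on, up to max(res_id)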

All of that said, 17 hours seems kinda long. 

 As a test I am trying to do an update on state using the following queries:
 update res set state=5001;
 select count(res_id) from res;

What is this testing?

 The update query that started all this I had to kill after 17 hours.  

Does that suggest that the update you're trying to make work well is
_not_ update res set state = 5001?

 each) and is running on a single disk (guess I will likely have to at the 
 minimum go to a RAID1).  Workload will primarily be comprised of queries 

I bet that single disk is your problem.  Iostat is your friend, I'd
say.
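
For example, running it with extended stats while the update is going will
show whether the disk is pegged (iostat is part of the sysstat package on
SUSE):

iostat -x 5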

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Everything that happens in the world happens at some place.
--Jane Jacobs 



Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Joshua D. Drake

[EMAIL PROTECTED] wrote:

I need some help on recommendations to solve a perf problem.

I've got a table with ~121 million records in it.  Select count on it 
currently takes ~45 minutes, and an update to the table to set a value 
on one of the columns I finally killed after it ran 17 hours and had 
still not completed.  Queries into the table are butt slow, and


Scanning 121 million rows is going to be slow even on 16 disks.



System:   SUSE LINUX 10.0 (X86-64)
Postgresql:   PostgreSQL 8.2.1
Index type:   btree


You really should be running 8.2.4.



A select count took ~48 minutes before I made some changes to the 
postgresql.conf, going from default values to these:

shared_buffers = 24MB


This could be increased.


work_mem = 256MB
maintenance_work_mem = 512MB
random_page_cost = 100
stats_start_collector = off
stats_row_level = off

As a test I am trying to do an update on state using the following queries:
update res set state=5001;


You are updating 121 million rows; that takes a lot of time, considering 
you are actually (at a very low level) marking 121 million rows dead and 
inserting 121 million more.


The update query that started all this I had to kill after 17 hours.  It 
should have updated all 121+ million records.  That brought my select 
count down to 19 minutes, but still a far cry from acceptable.


I'm not quite sure what you would consider acceptable, based on what you 
are trying to do.



Sincerely,

Joshua D. Drake









--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Brian Hurt

[EMAIL PROTECTED] wrote:


I need some help on recommendations to solve a perf problem.

I've got a table with ~121 million records in it.  Select count on it 
currently takes ~45 minutes, and an update to the table to set a value 
on one of the columns I finally killed after it ran 17 hours and had 
still not completed.  Queries into the table are butt slow, and


This is way too long.  I just did a select count(*) on a table of mine 
that has 48 million rows and it took only 178 seconds.  And this is on a 
serious POS disk subsystem that's giving me about 1/2 the read speed of 
a single off the shelf SATA disk.

As select count(*) has to read the whole table sequentially, the time it 
takes is linear with the size of the table (once you get large enough 
that the whole table doesn't get cached in memory).  So I'd be surprised 
if a 121 million record table took more than 500 or so seconds to read, 
and would expect it to be less.


So my advice: vacuum.  I'll bet you've got a whole boatload of dead 
tuples kicking around.  Then analyze.  Then consider firing off a 
reindex and/or cluster against the table.  The other thing I'd consider 
is dropping the money on some more hardware: a few hundred bucks to get 
a battery-backed RAID card and half a dozen SATA drives would probably 
do wonders for your performance.
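
In concrete terms, something along these lines (object names are from the
original post; CLUSTER is the heavier option since it rewrites the whole
table):

vacuum verbose res;           -- reclaims dead tuples and reports how many it found
analyze res;                  -- refreshes planner statistics
reindex table res;            -- rebuilds every index on the table
-- or, instead of reindex, rewrite the table itself in primary-key order:
-- cluster result_pkey on res;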




shared_buffers = 24MB


Up your shared buffers.  This is a mistake I made originally as well: 
this is the total number of shared buffers used by the system.  I 
had originally assumed that the number of shared buffers used was this 
times the number of backends, but it's not.


With 2G of memory, I'd start with shared buffers of 512MB, and consider 
upping it to 768MB or even 1024MB.  This will also really help performance.



stats_start_collector = off
stats_row_level = off


I think I'd also recommend turning these back on.

Brian




Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Alan Hodgson
On Friday 18 May 2007 11:51, Joshua D. Drake [EMAIL PROTECTED] wrote:
  The update query that started all this I had to kill after 17 hours.  It
  should have updated all 121+ million records.  That brought my select
  count down to 19 minutes, but still a far cry from acceptable.

You're going to want to drop all your indexes before trying to update 121 
million records.  Updates in PostgreSQL are really quite slow, mostly due 
to all the index updates. Drop indexes, do the updates, create a primary 
key, cluster the table on that key to free up the dead space, then recreate 
the rest of the indexes. That's about as fast as you can get that process.
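
As a concrete sketch, keeping the existing primary key rather than recreating
it (index names are the ones from the original post):

drop index index_audit;
drop index index_event;
-- ...drop the remaining secondary indexes the same way...
update res set state=5001;        -- the bulk update, with no secondary indexes to maintain
cluster result_pkey on res;       -- rewrite the table on the primary key, discarding dead space
analyze res;
create index index_audit on res using btree (audit);
-- ...then recreate the rest of the indexes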

Of course, doing anything big on one disk is also going to be slow, no 
matter what you do. I don't think a table scan should take 19 minutes, 
though, not for 121 million records. You should be able to get at least 
60-70MB/sec out of anything modern. I can only assume your disk is 
thrashing doing something else at the same time as the select.

-- 
We can no more blame our loss of freedom on Congressmen than we can
prostitution on pimps.  Both simply provide broker services for their
customers. -- Dr. Walter Williams




Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Craig James




I've got a table with ~121 million records in it.  Select count on it 
currently takes ~45 minutes, and an update to the table to set a value 
on one of the columns I finally killed after it ran 17 hours and had 
still not completed.  Queries into the table are butt slow, and


The update query that started all this I had to kill after 17 hours.  
It should have updated all 121+ million records.  That brought my 
select count down to 19 minutes, but still a far cry from acceptable.


If you have a column that needs to be updated often for all rows, 
separate it into a different table, and create a view that joins it back 
to the main table so that your application still sees the old schema.


This will greatly speed up your update, since (in Postgres) an update is 
the same as a delete+insert.  By updating that one column, you're 
rewriting your entire 121 million rows.  If you separate it, you're 
only rewriting that one column.  Don't forget to vacuum/analyze and 
reindex when you're done.
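
A sketch of what that could look like (the table and view names here are
invented for illustration, and it assumes the state column has been dropped
from res itself):

create table res_state (
  res_id integer primary key references res (res_id),
  state  smallint default 0
);

create view res_with_state as
  select r.res_id, r.res_client_id, r.time, r.error, r.md5, r.res_tc_id,
         s.state, r.priority, r.rval, r.speed, r.audit, r.date, r.gold_result_id
  from res r join res_state s using (res_id);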


Better yet, if you can stand a short down time, you can drop indexes on 
that column, truncate, then do 121 million inserts, and finally 
reindex.  That will be MUCH faster.


Craig





Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Alvaro Herrera
Craig James wrote:

 Better yet, if you can stand a short down time, you can drop indexes on 
 that column, truncate, then do 121 million inserts, and finally 
 reindex.  That will be MUCH faster.

Or you can do a CLUSTER, which does all the same things automatically.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Greg Smith

On Fri, 18 May 2007, [EMAIL PROTECTED] wrote:


shared_buffers = 24MB
work_mem = 256MB
maintenance_work_mem = 512MB


You should take a minute to follow the suggestions at 
http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm and set 
dramatically higher values for shared_buffers and effective_cache_size for 
your server.  Also, your work_mem figure may be OK for now, but if you ever do 
have 10 people connect to this database at once and run big queries, you 
could have an issue with it set that high; that's a per-client setting.
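
Just as a rough illustration for a 2GB machine (the exact numbers are
illustrative, not a prescription):

shared_buffers = 256MB          # up from 24MB; go higher once the extra RAM arrives
effective_cache_size = 1GB      # roughly what the OS will be caching for you
work_mem = 16MB                 # remember this is per sort/hash, per client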


After you're done with that, you should also follow the suggestions there 
to do a VACUUM ANALYZE.  That may knock out two other potential issues at 
once.  It will take a while to run, but I think you need it badly to sort 
out what you've already done.



random_page_cost = 100


I'm not sure what logic prompted this change, but after you correct the 
above you should return this to its default; if this is helping now it's 
only because other things are so far off from where they should be.



update res set state=5001;
The update query that started all this I had to kill after 17 hours.  It 
should have updated all 121+ million records.  That brought my select count 
down to 19 minutes, but still a far cry from acceptable.


You should work on the select side of this first.  If that isn't running 
in a moderate amount of time, trying to get the much more difficult update 
to happen quickly is hopeless.


Once the select is under control, there are a lot of parameters to adjust 
that will affect the speed of the updates.  The first thing to do is 
dramatically increase checkpoint_segments; I would set that to at least 30 
in your situation.
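
In postgresql.conf terms:

checkpoint_segments = 30        # default is 3; spaces checkpoints much further apart during bulk writes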


Also:  going to RAID-1 won't make a bit of difference to your update 
speed; it could even make it worse.  Adding more RAM may not help much 
either.  If you don't have one already, the real key to improving 
performance in a heavy update situation is to get a better disk controller 
with a cache that helps accelerate writes.  Then the next step is to 
stripe this data across multiple disks in a RAID-0 configuration to split 
the I/O up.


You have a lot of work ahead of you.  Even after you resolve the gross 
issues here, you have a table that has around 10 indexes on it. 
Maintaining those is far from free; every time you update a single record 
in that table, the system has to update each of those indexes on top of 
the record update itself.  So you're really asking your system to do 
around 1.2 billion disk-related operations when you throw out your simple 
batch update against every row, and good luck getting that to run in a 
time frame that's less than days long.


The right way to get a feel for what's going on is to drop all the indexes 
except for the constraints and see how the bulk update runs after the 
parameter changes suggested above are in place and the database has been 
cleaned up with vacuum+analyze.  Once you have a feel for that, add some 
indexes back in and see how it degrades.  Then you'll know how adding each 
one of them impacts your performance.  I suspect you're going to have to 
redesign your indexing scheme before this is over.  I don't think your 
current design is ever going to work the way you expect it to.
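
To go along with those timing comparisons, it's also easy to see how much
space each index on res occupies (the size functions used here exist in 8.1
and later):

select c.relname, pg_size_pretty(pg_relation_size(c.oid)) as size
from pg_class c
join pg_index i on i.indexrelid = c.oid
where i.indrelid = 'res'::regclass
order by pg_relation_size(c.oid) desc;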


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
