Re: RES: RES: [PERFORM] select on 1milion register = 6s

2007-07-30 Thread Decibel!
Please reply-all so others can learn and contribute.

On Sun, Jul 29, 2007 at 09:38:12PM -0700, Craig James wrote:
 Decibel! wrote:
 It's unlikely that it's going to be faster to index scan 2.3M rows than
 to sequential scan them. Try setting enable_seqscan=false and see if it
 is or not.
 
 Out of curiosity ... Doesn't that depend on the table?  Are all of the data 
 for one row stored contiguously, or are the data stored column-wise?  If 
 it's the former, and the table has hundreds of columns, or a few columns 
 with large text strings, then wouldn't the time for a sequential scan 
 depend not on the number of rows, but rather the total amount of data?

Yes, the time for a seqscan is mostly dependent on table size and not
the number of rows. But the number of rows plays a very large role in
the cost of an indexscan.
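
For anyone wanting to run that comparison, a minimal psql recipe (table and
column names here are placeholders):

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM big_table WHERE indexed_col < 12345;
RESET enable_seqscan;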
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828




Re: [PERFORM] Questions on Tags table schema

2007-07-30 Thread Richard Huxton

Jay Kang wrote:

Hello,

I'm currently trying to decide on a database design for tags in my web
2.0 application. The problem I'm facing is that I have 3 separate tables,
i.e. cars, planes, and schools. All three tables need to interact with the
tags, so there will only be one universal set of tags for the three tables.

I read a lot about tags and the best articles I found were:

Road to Web 2.0 ( http://wyome.com/docs/Road_to_Web_2.0:_The_Database_Design )


And what in particular recommended this to you?


Currently, this is my DB design:

Cars (carid, carname, text, etc.)
Planes (planeid, planename, text, etc.)
Schools (schoolname, text, etc.) -- School does not take int as primary
key but a varchar.


You don't mention a primary-key here at all. You're not thinking of 
using schoolname are you?



Tags (tagid, tagname, etc)

--- Now here is where I have the question. I have to link up three separate
tables to use Tags
--- So when a new car is created in the Cars table, should I insert that
carID into the TagsItems table
--- as itemID? So something like this?

TagsItems
(
  tagid INT NOT NULL REFERENCES Tags (TagID),
  itemid INT NULL,          -- really references Cars.carID and Planes.planeID
  schoolname varchar NULL,  -- saves the Schools.schoolname
  itemid + tagId as Unique
)


What's wrong with the completely standard:
  car_tags (carid, tagid)
  plane_tags (planeid, tagid)
  school_tags (schoolid, tagid)


I also have a question on the schoolname field, because it accepts varchar
not integer. There seems to be some design that would better fit my needs.
I'm asking you guys for a little assistance.


Sorry, don't understand this question.

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Questions on Tags table schema

2007-07-30 Thread Jay Kang
Thanks for the reply Richard, but I guess I didn't explain myself well. I
have three tables that need to be mapped to the Tags table. Most of the web
references that I mentioned only map one table to the Tags table. Here is
my Tags table:

CREATE TABLE Tags
(
   TagID serial NOT NULL,
   TagName varchar(64) NOT NULL,
   AddedBy varchar(256) NOT NULL,
   AddedDate timestamp NOT NULL,
   Status int NOT NULL,
   ViewCount int NOT NULL CONSTRAINT DF_tm_Tags_ViewCount  DEFAULT (('0'))
);

Is it your opinion that the most standard solution for my problem would be
to create three separate tables called car_tags, plane_tags and school_tags,
which map to each of the tables:

CREATE TABLE car_tags
(
   CarID integer NOT NULL,
   TagID integer NOT NULL
);

CREATE TABLE plane_tags
(
   PlaneID integer NOT NULL,
   TagID integer NOT NULL
);

CREATE TABLE school_tags
(
   SchoolID integer NOT NULL,
   TagID integer NOT NULL
);

Would TagID for each of these three tables be a foreign key for the Tags
table? Also, would each CarID, PlaneID, and SchoolID be a foreign key for each
corresponding table? Also, won't getting tags for three tables be more
complicated? Isn't there a better solution or is this wishful thinking?

On 7/30/07, Richard Huxton [EMAIL PROTECTED] wrote:

 Jay Kang wrote:
  Hello,
 
  I'm currently trying to decide on a database design for tags in my web
  2.0 application. The problem I'm facing is that I have 3 separate tables,
  i.e. cars, planes, and schools. All three tables need to interact with the
  tags, so there will only be one universal set of tags for the three tables.
 
  I read a lot about tags and the best articles I found were:
 
  Road to Web 2.0 (
 http://wyome.com/docs/Road_to_Web_2.0:_The_Database_Design )

 And what in particular recommended this to you?


The Road to Web 2.0 is an example of tag implementation, just thought it
would be helpful to someone with the same problem that I have.

 Currently, this is my DB design:
 
  Cars (carid, carname, text, etc.)
  Planes (planeid, planename, text, etc.)
  Schools (schoolname, text, etc.) -- School does not take int as
 primary
  key but a varchar.

 You don't mention a primary-key here at all. You're not thinking of
 using schoolname are you?


Yes, I used school name varchar(64) as primary key for the school tables.
You can consider schoolName as Pagename for a wiki.

 Tags (tagid, tagname, etc)
 
  --- Now here is where I have the question. I have to link up three
 separate
  tables to use Tags
  --- So when a new car is created in the Cars table, should I insert that

  carID into the TagsItems table
  --- as itemID? So something like this?
 
  TagsItems
  (
    tagid INT NOT NULL REFERENCES Tags (TagID),
    itemid INT NULL,          -- really references Cars.carID and Planes.planeID
    schoolname varchar NULL,  -- saves the Schools.schoolname
    itemid + tagId as Unique
  )

 What's wrong with the completely standard:
car_tags (carid, tagid)
plane_tags (planeid, tagid)
school_tags (schoolid, tagid)

 I also have a question on the schoolname field, because it accepts varchar
  not integer. There seems to be some design that would better fit my
 needs.
  I'm asking  you guys for a little assistance.

 Sorry, don't understand this question.

 --
Richard Huxton
Archonet Ltd




-- 
Regards,
Jay Kang


Re: [PERFORM] Questions on Tags table schema

2007-07-30 Thread Richard Huxton

Jay Kang wrote:

Thanks for the reply Richard, but I guess I didn't explain myself well. I
have three tables that needs to be mapped to the Tags table. Most of the web
references that I mentioned only maps one table to the Tags table. Here is
my Tags table:


One quick point. SQL is case-insensitive unless you double-quote 
identifiers. This means CamelCase tends not to be used. So instead of 
AddedBy you'd more commonly see added_by.



CREATE TABLE Tags
(
   TagID serial NOT NULL,
   TagName varchar(64) NOT NULL,
   AddedBy varchar(256) NOT NULL,


This is supposed to be a user? But it's not a foreign-key, and you've 
decided that 256 characters will be a good length, but 257 is impossible.



   AddedDate timestamp NOT NULL,


You probably want timestamp with time zone (which represents an 
absolute time) rather than without time-zone (which means 1pm in London 
is different from 1pm in New York).


Also, if it's AddedDate why isn't it a date?


   Status int NOT NULL,
   ViewCount int NOT NULL CONSTRAINT DF_tm_Tags_ViewCount  DEFAULT (('0'))
);


You might not want to mix in details about number of views with details 
of the tag. Particularly if you might record more details later (when 
viewed, by whom etc).



Is it your opinion that the most standard solution for my problem would be
to create three separate tables called car_tags, plane_tags and school_tags,
which map to each of the tables:


Well, yes.


CREATE TABLE car_tags
(
   CarID integer NOT NULL,
   TagID integer NOT NULL
);

[snip other table defs]

Don't forget CarID isn't really an integer (I mean, you're not going to 
be doing sums with car id's are you?) it's actually just a unique code. 
Of course, computers are particularly fast at dealing with 32-bit integers.



Would TagID for each of these three tables be a foreign key for the Tags
table? Also, would each CarID, PlaneID, and SchoolID be a foreign key for each
corresponding table? Also, won't getting tags for three tables be more
complicated? Isn't there a better solution or is this wishful thinking?


Yes, yes, and no.

You have cars which have tags and planes which have tags. Tagging a 
plane is not the same as tagging a car. Either you confuse that issue, 
or you want separate tables to track each relationship.
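
For reference, a minimal sketch of one such table with the constraints
implied above (assuming cars.carid and tags.tagid are the primary keys):

CREATE TABLE car_tags
(
   carid integer NOT NULL REFERENCES cars (carid),
   tagid integer NOT NULL REFERENCES tags (tagid),
   PRIMARY KEY (carid, tagid)
);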


Fetching a list of everything with a specific tag is straightforward enough:

SELECT 'car'::text AS item_type, car_id AS item_id, carname AS item_name
FROM cars JOIN car_tags USING (car_id) WHERE tag_id = x
UNION ALL
SELECT 'plane'::text AS item_type, plane_id AS item_id, planename AS item_name
FROM planes JOIN plane_tags USING (plane_id) WHERE tag_id = x
...

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Questions on Tags table schema

2007-07-30 Thread Richard Huxton

Richard Huxton wrote:



CREATE TABLE car_tags
(
   CarID integer NOT NULL,
   TagID integer NOT NULL
);

[snip other table defs]

Don't forget CarID isn't really an integer (I mean, you're not going to 
be doing sums with car id's are you?) it's actually just a unique code. 
Of course, computers are particularly fast at dealing with 32-bit integers.


Just realised I haven't explained what I meant by that.

CarID is a different type from PlaneID and TagID. As it happens, we are 
using integers to represent them all, but a CarID = 1 is different from 
a PlaneID = 1 and although you can numerically compare the two it is an 
error to do so.


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Questions on Tags table schema

2007-07-30 Thread Jay Kang
Hey Richard,

Thanks again for the reply, its great to hear some feedback. So once again,
here we go:

On 7/30/07, Richard Huxton [EMAIL PROTECTED] wrote:

 Jay Kang wrote:
  Thanks for the reply Richard, but I guess I didn't explain myself well. I
  have three tables that need to be mapped to the Tags table. Most of the web
  references that I mentioned only map one table to the Tags table. Here is
  my Tags table:

 One quick point. SQL is case-insensitive unless you double-quote
 identifiers. This means CamelCase tends not to be used. So instead of
 AddedBy you'd more commonly see added_by.



Yes, I am aware that Postgres is case-insensitive, but I write all queries
with case so it's easier for me to read later on.

 CREATE TABLE Tags
  (
 TagID serial NOT NULL,
 TagName varchar(64) NOT NULL,
 AddedBy varchar(256) NOT NULL,

 This is supposed to be a user? But it's not a foreign-key, and you've
 decided that 256 characters will be a good length, but 257 is impossible.


I'm developing in C# with ASP.NET 2.0, which has a membership provider. I'm
using ASP.NET 2.0 Website Programming / Problem - Design - Solution (Wrox
Press) http://www.amazon.com/gp/product/0764584642 as a reference, so not
having AddedBy as a foreign key within each of the tables was taken directly
from the text. I do not understand your comment about 256 characters with 257
being impossible. Could you elaborate, if you feel it warrants further
elaboration?

AddedDate timestamp NOT NULL,

 You probably want timestamp with time zone (which represents an
 absolute time) rather than without time-zone (which means 1pm in London
 is different from 1pm in New York).


OK, timestamp with time zone it is. To be honest, I've been using postgresql
for a while now, but never tried using timestamp with time zone.


Also, if it's AddedDate why isn't it a date?


I had this first as a date, but asp.net 2.0 didn't like it, and changing it
to a timestamp fixed the problem.

Status int NOT NULL,
 ViewCount int NOT NULL CONSTRAINT DF_tm_Tags_ViewCount  DEFAULT
 (('0'))
  );

 You might not want to mix in details about number of views with details
 of the tag. Particularly if you might record more details later (when
 viewed, by whom etc).


Are you suggesting separating the Tags table into Tags and TagDetails?
Because ViewCount within the Tags table would represent how many times that
tag was clicked, I think others would call this field Popularity. I've been
reading a lot about tags and I am fascinated by all the information that
user tags can provide. Where would I put information such as ViewCount,
AddedBy, Status, etc. if not within the Tags table? Sorry if I'm totally
missing your point.

 Is it your opinion that the most standard solution for my problem would be
  to create three separate tables called car_tags, plane_tags and
 school_tags,
  which maps to each of the tables:

 Well, yes.

  CREATE TABLE car_tags
  (
 CarID integer NOT NULL,
 TagID integer NOT NULL
  );
 [snip other table defs]

 Don't forget CarID isn't really an integer (I mean, you're not going to
 be doing sums with car id's are you?) it's actually just a unique code.
 Of course, computers are particularly fast at dealing with 32-bit
 integers.


Yes, within the Cars table CarID would be a serial so it would auto
increment with each row. I understand your concern.

 Would TagID for each of these three tables be a foreign key for the Tags
  table? Also, would each CarID, PlaneID, and SchoolID be a foreign key for each
  corresponding table? Also, won't getting tags for three tables be more
  complicated? Isn't there a better solution or is this wishful thinking?

 Yes, yes, and no.

 You have cars which have tags and planes which have tags. Tagging a
 plane is not the same as tagging a car. Either you confuse that issue,
 or you want separate tables to track each relationship.


Hmm, so if I have a tag called Saab and a user clicks on Saab, then
information from both the Cars and Planes tables would appear. If I'm
inserting a new row for a tag, wouldn't I need to check whether that tagname
already appears within the Tags table, or would I just create a new row with
that tag name? Sorry, I'm not sure what 'car'::text is doing, but I'm
guessing it's used to group the cars, planes, etc. so it knows which
item_type it is. Brilliant!

Fetching a list of everything with a specific tag is straightforward enough:

 SELECT 'car'::text AS item_type, car_id AS item_id, carname AS item_name
 FROM cars JOIN car_tags USING (car_id) WHERE tag_id = x
 UNION ALL
 SELECT 'plane'::text AS item_type, plane_id AS item_id, planename AS item_name
 FROM planes JOIN plane_tags USING (plane_id) WHERE tag_id = x



Thanks for the query, I'm going to start programming so I can figure it out
as I go along.

...

 --
Richard Huxton
Archonet Ltd




-- 
Regards,
Jay Kang


Re: [PERFORM] Questions on Tags table schema

2007-07-30 Thread Richard Huxton

Jay Kang wrote:

One quick point. SQL is case-insensitive unless you double-quote
identifiers. This means CamelCase tends not to be used. So instead of
AddedBy you'd more commonly see added_by.


Yes, I am aware that Postgres is case-insensitive, but I write all queries
with case so it's easier for me to read later on.


It's SQL that's case insensitive. Pretty much any SQL-based database 
system you use will do case-folding in some way.



CREATE TABLE Tags

(
   TagID serial NOT NULL,
   TagName varchar(64) NOT NULL,
   AddedBy varchar(256) NOT NULL,

This is supposed to be a user? But it's not a foreign-key, and you've
decided that 256 characters will be a good length, but 257 is impossible.



I'm developing in C# with ASP.NET 2.0, which has a membership provider. I'm
using ASP.NET 2.0 Website Programming / Problem - Design - Solution (Wrox
Press) http://www.amazon.com/gp/product/0764584642 as a reference, so not
having AddedBy as a foreign key within each of the tables was taken directly
from the text. I do not understand your comment about 256 characters with 257
being impossible. Could you elaborate, if you feel it warrants further
elaboration?


What is AddedBy - a name, a user-id?
If it's an ID, then it seems very long.
If it's a name, then 256 characters sounds a bit arbitrary as a length. 
Why choose 256?


The advantage of *not* having AddedBy as a foreign-key is that you can 
delete users and not have to update tags with their user-id. The 
disadvantage is the same thing. You can end up with tags added by 
non-existent users.



   AddedDate timestamp NOT NULL,

You probably want timestamp with time zone (which represents an
absolute time) rather than without time-zone (which means 1pm in London
is different from 1pm in New York).


OK, timestamp with time zone it is. To be honest, I've been using postgresql
for a while now, but never tried using timestamp with time zone.


You can get away with it as long as the time-zone setting on your client
stays the same. Then it changes, and you're left wondering why all
your comparisons are hours out.
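
A quick psql illustration of the trap (a sketch; the zone names are just
examples):

SET timezone = 'Europe/London';
SELECT '2007-07-30 13:00'::timestamptz;        -- 2007-07-30 13:00:00+01
SET timezone = 'America/New_York';
SELECT '2007-07-30 13:00:00+01'::timestamptz;  -- 2007-07-30 08:00:00-04, same instant
SELECT '2007-07-30 13:00'::timestamp;          -- plain timestamp: no zone attached at all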



Also, if it's AddedDate why isn't it a date?

I had this first as a date, but asp.net 2.0 didn't like it, and changing it
to a timestamp fixed the problem.


Surely asp.net has a date type? If not, I'd suggest AddedTimestamp as a 
name (or AddedTS if you don't enjoy lots of typing :-). It won't matter 
to you now, but 12 months from now it'll save you looking up data types.



   Status int NOT NULL,

   ViewCount int NOT NULL CONSTRAINT DF_tm_Tags_ViewCount  DEFAULT

(('0'))

);

You might not want to mix in details about number of views with details
of the tag. Particularly if you might record more details later (when
viewed, by whom etc).


Are you suggesting separating the Tags table into Tags and TagDetails?
Because ViewCount within the Tags table would represent how many times that
tag was clicked, I think others would call this field Popularity. I've been
reading a lot about tags and I am fascinated by all the information that
user tags can provide. Where would I put information such as ViewCount,
AddedBy, Status, etc. if not within the Tags table? Sorry if I'm totally
missing your point.


Well, it all depends on your use analysis. You could make a good 
argument that there are two sets of fact data:

1. Identity data - id, name, added-by, added-ts, status
2. Activity data - number of views, last time clicked etc

Depending on how you intend to use the tags, it might make sense to 
separate these. Particularly if you find yourself with no sensible 
values in activity data until a tag is used.


From a separate performance-related point of view, you would expect 
activity data to be updated much more often than identity data.



You have cars which have tags and planes which have tags. Tagging a
plane is not the same as tagging a car. Either you confuse that issue,
or you want separate tables to track each relationship.


Hmm, so if I have a tag called Saab and a user clicks on Saab, then
information from both the Cars and Planes tables would appear.


Well, if you UNION them, yes. Of course you'll need to find columns that 
make sense across all types. If planes have e.g. wingspan then you'll 
need to add 'not applicable'::text in the car-related subquery.


 If I'm inserting a

new row for a tag, wouldn't I need to check whether that tagname already
appears within the Tags table, or would I just create a new row with that tag
name? Sorry, I'm not sure what 'car'::text is doing, but I'm guessing it's
used to group the cars, planes, etc. so it knows which item_type it is.
Brilliant!


Yes, if you're giving a list of all Saabs, I'm assuming your users 
will want to know if it's a plane or car. The ::text is just PostgreSQL 
shorthand for a cast - it's good practice to specify precise types for 
literal values.
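
For reference, the two spellings are equivalent - :: is PostgreSQL's
shorthand, CAST is the SQL-standard form:

SELECT 'car'::text;
SELECT CAST('car' AS text);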


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Questions on Tags table schema

2007-07-30 Thread Richard Huxton

Jay Kang wrote:

Hey Richard,

Sorry for the late reply, I was just making my first test version of the DB
closely resembling your suggested design. Just wanted to write you back
answering your questions. So here we go:


No problem - it's email and what with different timezones it's common to 
have gaps.



What is AddedBy - a name, a user-id?
If it's an ID, then it seems very long.
If it's a name, then 256 characters sounds a bit arbitrary as a length.
Why choose 256?


No, AddedBy is the username of the individual. Why is 256 characters
arbitrary as a length? Would 255 be better, or 32? I guess you're saying it's
too long - shorten it. I'm just going with what the book says, but I really
welcome any comments you have^^


The book doesn't know what you're trying to do. You do. The important 
thing is not whether you choose 256 or 32 or 100, it's that you've 
thought about it first.


Obvious thoughts:
1. Is this going to be a real name (Richard Huxton) or an identifier 
(rhuxton123)? You'll want more characters for the real name than an 
identifier.
2. Where will this be displayed and will it take up too much space? If I 
pick a username of WWW...250 repeats...WWW, does that mess up any formatting?
3. Do we allow HTML-unsafe characters ('<', '>') and escape them when 
used, or just not allow them?


No wrong or right, the process of thinking about it is important.


The advantage of *not* having AddedBy as a foreign-key is that you can

delete users and not have to update tags with their user-id. The
disadvantage is the same thing. You can end up with tags added by
non-existent users.


Thanks, I would like anonymous users to be able to add tags, so I guess I'll
leave it the way it is^^


You would normally use NULL to indicate unknown, which in the case of 
an anonymous user would be true. A NULL foreign-key is allowed (unless 
you define the column not-null of course).
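
One way to get both properties is a nullable foreign key with ON DELETE SET
NULL - a sketch, assuming a users table keyed by username:

   addedby varchar(64) NULL REFERENCES users (username) ON DELETE SET NULL

Anonymous tags store NULL, and deleting a user nulls out their tags instead
of leaving dangling names behind.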


[snip]

Well, it all depends on your use analysis. You could make a good
argument that there are two sets of fact data:
1. Identity data - id, name, added-by, added-ts, status
2. Activity data - number of views, last time clicked etc


If I were to create Identity and Activity for the Tags table, would I be
creating two separate tables called TagActivities and TagIdentities?


That's what I'm talking about, and you'd have a foreign-key constraint 
to make sure activity refers to a real tag identity. Again, I'm not 
saying you *do* want to do this, just that you'll need to think about it.
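
A minimal sketch of that split (the names are illustrative, not prescribed):

CREATE TABLE tag_identities
(
   tagid    serial PRIMARY KEY,
   tagname  varchar(64) NOT NULL,
   addedby  varchar(64),
   addedts  timestamp with time zone NOT NULL,
   status   int NOT NULL
);

CREATE TABLE tag_activities
(
   tagid      int PRIMARY KEY REFERENCES tag_identities (tagid),
   viewcount  int NOT NULL DEFAULT 0,
   lastviewed timestamp with time zone
);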



Currently, I'm not sure how I'll analyze the data for Tags. I know that I
want to use a bigger font if a tag is popular and a smaller font if it's not.
Hmm, I would like to see examples of other websites that utilized Tags tables
to see how they implemented this function. I was thinking of adding tagcount
(popularity) for each user within the user definition table.


For this particular case, you'll almost certainly want to cache the 
results anyway. The popularity isn't going to change that fast, and 
presumably you'll only want to categorise them as VERY BIG, Big, normal 
etc. I assume asp.net allows you to cache this sort of information somehow.



You have cars which have tags and planes which have tags. Tagging a
plane is not the same as tagging a car. Either you confuse that issue,
or you want separate tables to track each relationship.

Hmm, so if I have a tag called Saab and a user clicks on Saab, then
information from both Cars and Planes table would appear.

Well, if you UNION them, yes. Of course you'll need to find columns that
make sense across all types. If planes have e.g. wingspan then you'll
need to add 'not applicable'::text in the car-related subquery.


Hmm, currently I can't visualize the query; again, it would help if I could
see the data to see what you mean. If the planes table had a field called
wingspan, wouldn't the query just show no value for that field, so it
wouldn't need 'not applicable' in the car-related subquery? Not sure really.


Sorry - I'm trying to say that if you UNION together several queries 
they all need to have the same columns. So - if one subquery doesn't 
have that column you'll need to provide a not applicable value instead.
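
For example, a sketch reusing the earlier table names, with a hypothetical
wingspan column on planes:

SELECT 'plane'::text AS item_type, planename AS item_name,
       wingspan::text AS wingspan
  FROM planes JOIN plane_tags USING (plane_id)
 WHERE tag_id = x
UNION ALL
SELECT 'car'::text, carname, 'not applicable'::text
  FROM cars JOIN car_tags USING (car_id)
 WHERE tag_id = x;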


Best of luck with the application, and don't forget to cache query 
results when they don't change often. It'll boost performance quite a bit.


P.S. - try the general mailing list if you want to discuss this sort 
of thing some more. This one is really supposed to be 
performance-related questions only.


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Vacuum looping?

2007-07-30 Thread Steven Flatt
On 7/28/07, Jim C. Nasby [EMAIL PROTECTED] wrote:

 What are your vacuum_cost_* settings? If you set those too aggressively
 you'll be in big trouble.


 autovacuum_vacuum_cost_delay = 100
autovacuum_vacuum_cost_limit = 200

These are generally fine, autovacuum keeps up, and there is minimal impact
on the system.

vacuum_cost_delay = 100
vacuum_cost_limit = 1000

We set this cost_limit a little higher so that, in the few cases where we
have to intervene manually, vacuum runs faster.


The second pass on the vacuum means that maintenance_work_mem isn't
 large enough.


maintenance_work_mem is set to 256MB and I don't think we want to make this
any bigger by default.  Like I say above, generally autovacuum runs fine.
If we do run into this situation again (lots of OOM queries and lots to
cleanup), we'll probably increase maintenance_work_mem locally and run a
vacuum in that session.
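
For the record, the session-local version is just the following (the '1GB'
unit syntax needs 8.2 or later; the table name is a placeholder):

SET maintenance_work_mem = '1GB';  -- affects only this session
VACUUM ANALYZE some_bloated_table;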

Good to know that vacuum was doing the right thing.

Thanks,
Steve


Re: [PERFORM] Slow query with backwards index scan

2007-07-30 Thread Nis Jørgensen
Tilmann Singer wrote:

 But the subselect is not fast for the user with many relationships and
 matched rows at the beginning of the sorted large_table:
 
 testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships WHERE
  user_id=5)
  ORDER BY created_at DESC LIMIT 10;
   
  QUERY PLAN 
 -
  Limit  (cost=6963.94..6963.96 rows=10 width=621) (actual 
 time=53187.349..53187.424 rows=10 loops=1)
-  Sort  (cost=6963.94..6966.96 rows=1211 width=621) (actual 
 time=53187.341..53187.360 rows=10 loops=1)
  Sort Key: lt.created_at
  -  Nested Loop  (cost=39.52..6901.92 rows=1211 width=621) (actual 
 time=201.728..52673.800 rows=69018 loops=1)
-  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual 
 time=178.777..178.966 rows=40 loops=1)
  -  Bitmap Heap Scan on relationships  (cost=4.33..39.49 
 rows=10 width=4) (actual time=47.049..178.560 rows=40 loops=1)
Recheck Cond: (user_id = 5)
-  Bitmap Index Scan on 
 relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) 
 (actual time=28.721..28.721 rows=40 loops=1)
  Index Cond: (user_id = 5)
-  Index Scan using large_user_id_started_at_index on 
 large_table lt  (cost=0.00..6834.04 rows=2268 width=621) (actual 
 time=21.994..1301.375 rows=1725 loops=40)
  Index Cond: (lt.user_id = relationships.contact_id)
  Total runtime: 53188.584 ms
 
 
 
 Using a join now the query for mat

 Any ideas?

It seems to me the subselect plan would benefit quite a bit from not
returning all rows, but only the 10 latest for each user. I believe the
problem is similar to what is discussed for UNIONs here:

http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=stq=postgresql+limit++unionrnum=1hl=en#367f5052b1bbf1c8

which got implemented here:

http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=stq=postgresql+limit++unionrnum=26hl=en#9b3e5bd2d612f565

It seems to me the planner in this case would actually need to push the
limit into the nested loop, since we want the plan to take advantage of
the index (using a backwards index scan). I am ready to be corrected though.

You could try this (quite hackish) attempt at forcing the query planner
to use this plan:

SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE
 user_id=5) AND created_at in (SELECT created_at FROM large_table
lt2 WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
 ORDER BY created_at DESC LIMIT 10;

If that doesn't work, you might have reached the point where you need to
use some kind of record-keeping system to keep track of which records to
look at.

/Nis




Re: [PERFORM] Slow query with backwards index scan

2007-07-30 Thread Tilmann Singer
* Nis Jørgensen [EMAIL PROTECTED] [20070730 18:33]:
 It seems to me the subselect plan would benefit quite a bit from not
 returning all rows, but only the 10 latest for each user. I believe the
 problem is similar to what is discussed for UNIONs here:
 
 http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=stq=postgresql+limit++unionrnum=1hl=en#367f5052b1bbf1c8
 
 which got implemented here:
 
 http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=stq=postgresql+limit++unionrnum=26hl=en#9b3e5bd2d612f565
 
 It seems to me the planner in this case would actually need to push the
 limit into the nested loop, since we want the plan to take advantage of
 the index (using a backwards index scan). I am ready to be corrected though.

If I understand that correctly, this means that it would benefit
the planning of something like

SELECT FROM (q1 UNION ALL q2) ORDER BY ... LIMIT ...

if either q1 or q2 could satisfy the rows requested by the limit early,
instead of q1 and q2 being planned without the limit of the outer
query having an influence.

Unfortunately I'm having problems making my q2 reasonably efficient in
the first place, even before UNIONing it.

 You could try this (quite hackish) attempt at forcing the query planner
 to use this plan:
 
 SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships WHERE
  user_id=5) AND created_at in (SELECT created_at FROM large_table
 lt2 WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
  ORDER BY created_at DESC LIMIT 10;

No; for the user with many matches at the beginning:

testdb=# EXPLAIN ANALYZE
SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE
 user_id=5) AND created_at IN (SELECT created_at FROM large_table lt2
   WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
 ORDER BY created_at DESC LIMIT 10;

 QUERY PLAN

 Limit  (cost=4.94..4.97 rows=10 width=621) (actual 
time=70550.549..70550.616 rows=10 loops=1)
   -  Sort  (cost=4.94..45557.46 rows=605 width=621) (actual 
time=70550.542..70550.569 rows=10 loops=1)
 Sort Key: lt.created_at
 -  Nested Loop  (cost=39.52..45527.99 rows=605 width=621) (actual 
time=2131.501..70548.313 rows=321 loops=1)
   -  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual 
time=0.406..0.615 rows=40 loops=1)
 -  Bitmap Heap Scan on relationships  (cost=4.33..39.49 
rows=10 width=4) (actual time=0.075..0.248 rows=40 loops=1)
   Recheck Cond: (user_id = 5)
   -  Bitmap Index Scan on 
relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) 
(actual time=0.052..0.052 rows=40 loops=1)
 Index Cond: (user_id = 5)
   -  Index Scan using large_user_id_started_at_index on 
large_table lt  (cost=0.00..45474.29 rows=1134 width=621) (actual 
time=1762.067..1763.637 rows=8 loops=40)
 Index Cond: (lt.user_id = relationships.contact_id)
 Filter: (subplan)
 SubPlan
   -  Limit  (cost=0.00..34.04 rows=10 width=8) (actual 
time=0.048..0.147 rows=10 loops=69018)
 -  Index Scan Backward using 
large_user_id_created_at_index on large_table lt2  (cost=0.00..7721.24 
rows=2268 width=8) (actual time=0.040..0.087 rows=10 loops=69018)
   Index Cond: (user_id = $0)
 Total runtime: 70550.847 ms


The same plan is generated for the user with few matches and executes
very fast.


 If that doesn't work, you might have reached the point where you need to
 use some kind of record-keeping system to keep track of which records to
 look at.

Yes, I'm considering that unfortunately.

Seeing however that there are 2 different queries which result in very
efficient plans for one of the 2 different cases, but not the other,
makes me hope there is a way to tune the planner into always coming up
with the right plan. I'm not sure if I explained the problem well
enough and will see if I can come up with a reproduction case with
generated data.


Til



Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-30 Thread Jignesh K. Shah
With CLOG 16 the drop comes at about 1150 users, with the following lock 
stats:

bash-3.00# ./4_lwlock_waits.d 16404

Lock Id            Mode        Count
  ProcArrayLock  Shared   2
 XidGenLock   Exclusive   2
 XidGenLock  Shared   4
  WALInsertLock   Exclusive   7
CLogControlLock  Shared   8
   WALWriteLock   Exclusive  46
  ProcArrayLock   Exclusive  64
CLogControlLock   Exclusive 263

Lock Id            Mode        Combined Time (ns)
 XidGenLock        Exclusive             528300
 ProcArrayLock     Shared                968800
 WALInsertLock     Exclusive            4090900
 XidGenLock        Shared              73987600
 WALWriteLock      Exclusive           86200700
 ProcArrayLock     Exclusive          130756000
 CLogControlLock   Shared             240471000
 CLogControlLock   Exclusive         4115158500

So I think 32 is a better option for CLOG buffers before ProcArrayLock 
becomes the bottleneck.


Though I haven't seen what we can do about the ProcArrayLock problem.


Regards,
Jignesh



Jignesh K. Shah wrote:
Using CLOG Buffers 32 and the commit sibling check patch I still see a 
drop at 1200-1300 users..




bash-3.00# ./4_lwlock_waits.d 18250

Lock Id            Mode        Count
 XidGenLock  Shared   1
CLogControlLock  Shared   2
  ProcArrayLock  Shared   2
 XidGenLock   Exclusive   4
CLogControlLock   Exclusive  15
  WALInsertLock   Exclusive  18
   WALWriteLock   Exclusive  38
  ProcArrayLock   Exclusive  77

Lock Id            Mode        Combined Time (ns)
 XidGenLock        Shared                 88700
 WALInsertLock     Exclusive           69556000
 ProcArrayLock     Shared              95656800
 XidGenLock        Exclusive          139634100
 CLogControlLock   Exclusive          148822200
 CLogControlLock   Shared             16163
 WALWriteLock      Exclusive          332781800
 ProcArrayLock     Exclusive         5688265500

bash-3.00# ./4_lwlock_waits.d 18599

Lock Id            Mode        Count
  ProcArrayLock  Shared   2
 XidGenLock   Exclusive   3
 XidGenLock  Shared   4
CLogControlLock  Shared   5
  WALInsertLock   Exclusive  10
CLogControlLock   Exclusive  21
   WALWriteLock   Exclusive  28
  ProcArrayLock   Exclusive  54

Lock Id            Mode        Combined Time (ns)
 XidGenLock        Exclusive            5688800
 WALInsertLock     Exclusive           11424700
 CLogControlLock   Shared              55589100
 ProcArrayLock     Shared             135220400
 WALWriteLock      Exclusive          177906900
 XidGenLock        Shared             524146500
 CLogControlLock   Exclusive          524563900
 ProcArrayLock     Exclusive         5828744500

bash-3.00#
bash-3.00# ./6_lwlock_stack.d 4 18599

Lock Id            Mode        Count
  ProcArrayLock  Shared   1
  ProcArrayLock   Exclusive  52

Lock Id            Mode        Combined Time (ns)
  ProcArrayLock  Shared 41428300
  ProcArrayLock   Exclusive   3858386500

Lock Id   Combined Time (ns)


 postgres`LWLockAcquire+0x1f0
 postgres`GetSnapshotData+0x120
 postgres`GetTransactionSnapshot+0x80
 postgres`PortalStart+0x198
 postgres`exec_bind_message+0x84c
 postgres`PostgresMain+0x17f8
 postgres`BackendRun+0x2f8
 postgres`ServerLoop+0x680
 postgres`PostmasterMain+0xda8
 postgres`main+0x3d0
 postgres`_start+0x17c
 Shared 41428300

 postgres`LWLockAcquire+0x1f0
 postgres`CommitTransaction+0x104
 postgres`CommitTransactionCommand+0xbc
 postgres`finish_xact_command+0x78
 postgres`exec_execute_message+0x42c
 postgres`PostgresMain+0x1838
 postgres`BackendRun+0x2f8
 postgres`ServerLoop+0x680
 postgres`PostmasterMain+0xda8
 postgres`main+0x3d0
 postgres`_start+0x17c
 Exclusive3858386500


-Jignesh



Re: [PERFORM] Slow query with backwards index scan

2007-07-30 Thread Nis Jørgensen
Tilmann Singer wrote:
 * Nis Jørgensen [EMAIL PROTECTED] [20070730 18:33]:
 It seems to me the subselect plan would benefit quite a bit from not
 returning all rows, but only the 10 latest for each user. I believe the
 problem is similar to what is discussed for UNIONs here:

 http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=stq=postgresql+limit++unionrnum=1hl=en#367f5052b1bbf1c8

 which got implemented here:

 http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=stq=postgresql+limit++unionrnum=26hl=en#9b3e5bd2d612f565

 It seems to me the planner in this case would actually need to push the
 limit into the nested loop, since we want the plan to take advantage of
 the index (using a backwards index scan). I am ready to be corrected though.
 
 If I understand that correctly than this means that it would benefit
 the planning for something like

 SELECT FROM (q1 UNION ALL q2) ORDER BY ... LIMIT ...
 
 if any of q1 or q2 would satisfy the rows requested by limit early,
 instead of planning q1 and q2 without having the limit of the outer
 query an influence.

 Unfortunately I'm having problems making my q2 reasonably efficient in
 the first place, even before UNIONing it.

The second branch of your UNION is really equivalent to the following
pseudo-code:

contacts = SELECT contact_id FROM relations WHERE user_id = $id

sql = SELECT * FROM (
SELECT * FROM lt WHERE user_id = contacts[0]
UNION ALL
SELECT * FROM lt WHERE user_id = contacts[1]
.
.
.
) ORDER BY created_at LIMIT 10;

Currently, it seems the subqueries are fetching all rows.

Thus a plan which makes each of the subqueries aware of the LIMIT  might
be able to improve performance. Unlike the UNION case, it seems this
means making the subqueries aware that the plan is valid, not just
changing the cost estimate.


How does this imperative approach perform? I realize you probably
don't want to use this, but it would give us an idea whether we would be
able to get the speed we need by forcing this plan on pg.
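
For reference, a rough sketch of that imperative approach in PL/pgSQL, using
the table and column names from the thread (RETURN NEXT keeps it
8.2-compatible):

CREATE OR REPLACE FUNCTION latest_for_user(p_user integer, p_per_contact integer)
RETURNS SETOF large_table AS $$
DECLARE
    c integer;
    r large_table%ROWTYPE;
BEGIN
    -- for each contact, grab only the newest p_per_contact rows
    FOR c IN SELECT contact_id FROM relationships WHERE user_id = p_user LOOP
        FOR r IN SELECT * FROM large_table
                  WHERE user_id = c
                  ORDER BY created_at DESC
                  LIMIT p_per_contact LOOP
            RETURN NEXT r;
        END LOOP;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- then take the overall top 10 from the (small) combined set:
-- SELECT * FROM latest_for_user(5, 10) ORDER BY created_at DESC LIMIT 10;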

 If that doesn't work, you might have reached the point where you need to
 use some kind of record-keeping system to keep track of which records to
 look at.
 
 Yes, I'm considering that unfortunately.
 
 Seeing however that there are 2 different queries which result in very
 efficient plans for one of the 2 different cases, but not the other,
 makes me hope there is a way to tune the planner into always coming up
 with the right plan. I'm not sure if I explained the problem well
 enough and will see if I can come up with a reproduction case with
 generated data.

I think the problem is that PostgreSQL does not have the necessary
statistics to determine which of the two plans will perform well. There
are basically two unknowns in the query:

- How many uninteresting records do we have to scan through before we get
to the interesting ones (if using plan 1).
- How many matching rows do we find in relationships.

It is not surprising that pg cannot estimate the first one.

I am a little surprised that pg is not able to estimate the second one
better. Increasing the statistics settings might help in getting a
different plan.
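
For example (a sketch; 1000 is the maximum statistics target in 8.x):

ALTER TABLE relationships ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE relationships;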

I am also slightly surprised that the two equivalent formulations of the
query yield such vastly different query plans. In my experience, pg is
quite good at coming up with similar query plans for equivalent queries.
 You might want to fiddle with DISTINCT or indexing to make sure that
they are indeed logically equivalent.

Anyway, it seems likely that you will at some point run into the query
for many matches at the end of the table - which none of your plans will
be good at supplying. So perhaps you can just as well prepare for it now.


Nis




Re: [PERFORM] Questions on Tags table schema

2007-07-30 Thread Ron Mayer
Jay Kang wrote:
 Hello,
 
 I'm currently trying to decide on a database design for tags in my web
 2.0 application. The problem I'm facing is that I have 3 separate tables
 i.e. cars, planes, and schools. All three tables need to interact with
 the tags, so there will only be one universal set of tags for the three
 tables.

It strikes me that some tsearch2 tsvector-like datatype might
work well for this, depending on the types of queries you're doing.



Re: [PERFORM] multicolumn index column order

2007-07-30 Thread Lew

valgog wrote:

On Jul 25, 2:14 am, Lew [EMAIL PROTECTED] wrote:

How about two indexes, one on each column?  Then the indexes will cooperate
when combined in a WHERE clause.
http://www.postgresql.org/docs/8.2/interactive/indexes-bitmap-scans.html

I don't believe the index makes a semantic difference with regard to ascending
or descending.  An index is used to locate records in the selection phase of a
query or modification command.

--
Lew


Ordered indexes (b-tree in this case) are also used to get the needed
record order and it is absolutely not necessary to have a WHERE clause
in your select statement to use them when you are using ORDER BY.


But does that affect anything when you ORDER BY foo ASC vs. when you ORDER 
BY foo DESC?


For use by ORDER BY, separate column indexes are an even better idea.
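
To make the ASC/DESC point concrete (a sketch; the names are placeholders):

CREATE INDEX foo_created_idx ON foo (created_at);

-- both orderings can use the same b-tree; the DESC case shows up as an
-- Index Scan Backward in EXPLAIN output:
SELECT * FROM foo ORDER BY created_at ASC LIMIT 10;
SELECT * FROM foo ORDER BY created_at DESC LIMIT 10;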

--
Lew



Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-30 Thread Dimitri
Luke,

ZFS tuning does not come from general suggestions, but from real
practice...

So,
  - limiting the ARC is a MUST for the moment to keep your database running
comfortably (especially DWH!)
  - the 8K blocksize is chosen to read exactly one page when PG asks to
read one page - don't mix it up with prefetch! When prefetch is detected,
ZFS will read the next blocks without any demand from PG; but otherwise,
why would you need to read more pages each time PG asks for only one?...
  - prefetch is of course not needed for OLTP, but it helps on OLAP/DWH, agreed :)
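
For concreteness, those knobs are usually applied like this on Solaris (a
sketch; the pool/dataset name is a placeholder):

# 8K recordsize on the PostgreSQL data filesystem:
zfs set recordsize=8K tank/pgdata

# ARC cap and prefetch, via /etc/system (takes effect at boot):
set zfs:zfs_arc_max = 0x100000000      # e.g. cap the ARC at 4GB
set zfs:zfs_prefetch_disable = 1       # turn off file-level prefetch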

Rgds,
-Dimitri


On 7/22/07, Luke Lonergan [EMAIL PROTECTED] wrote:
 Josh,

 On 7/20/07 4:26 PM, Josh Berkus [EMAIL PROTECTED] wrote:

  There are some specific tuning parameters you need for ZFS or performance
  is going to suck.
 
  http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
  (scroll down to PostgreSQL)
  http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp
  http://bugs.opensolaris.org/view_bug.do?bug_id=6437054
 
  You also don't say anything about what kind of workload you're running.


 I think we're assuming that the workload is OLTP when putting these tuning
 guidelines forward.  Note that the ZFS tuning guidance referred to in this
 bug article recommend turning vdev prefetching off for random I/O
 (databases).  This is exactly the opposite of what we should do for OLAP
 workloads.

 Also, the lore that setting recordsize on ZFS is mandatory for good database
 performance is similarly not appropriate for OLAP work.

 If the workload is OLAP / Data Warehousing, I'd suggest ignoring all of the
 tuning information from Sun that refers generically to database.  The
 untuned ZFS performance should be far better in those cases.  Specifically,
 these three should be ignored:
 - (ignore this) limit ARC memory use
 - (ignore this) set recordsize to 8K
 - (ignore this) turn off vdev prefetch

 - Luke







Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-30 Thread Luke Lonergan
Hi Dimitri,

Can you post some experimental evidence that these settings matter?

At this point we have several hundred terabytes of PG databases running on ZFS, 
all of them setting speed records for data warehouses.

We did testing on these settings last year on S10U2, perhaps things have 
changed since then.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Dimitri [mailto:[EMAIL PROTECTED]
Sent:   Monday, July 30, 2007 05:26 PM Eastern Standard Time
To: Luke Lonergan
Cc: Josh Berkus; pgsql-performance@postgresql.org; Marc Mamin
Subject:Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

Luke,

ZFS tuning does not come from general suggestions, but from real
practice...

So,
  - limiting the ARC is a MUST for the moment to keep your database running
comfortably (especially DWH!)
  - the 8K blocksize is chosen to read exactly one page when PG asks to
read one page - don't mix it up with prefetch! When prefetch is detected,
ZFS will read the next blocks without any demand from PG; but otherwise,
why would you need to read more pages each time PG asks for only one?...
  - prefetch is of course not needed for OLTP, but it helps on OLAP/DWH, agreed :)

Rgds,
-Dimitri


On 7/22/07, Luke Lonergan [EMAIL PROTECTED] wrote:
 Josh,

 On 7/20/07 4:26 PM, Josh Berkus [EMAIL PROTECTED] wrote:

  There are some specific tuning parameters you need for ZFS or performance
  is going to suck.
 
  http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
  (scroll down to PostgreSQL)
  http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp
  http://bugs.opensolaris.org/view_bug.do?bug_id=6437054
 
  You also don't say anything about what kind of workload you're running.


 I think we're assuming that the workload is OLTP when putting these tuning
 guidelines forward.  Note that the ZFS tuning guidance referred to in this
 bug article recommend turning vdev prefetching off for random I/O
 (databases).  This is exactly the opposite of what we should do for OLAP
 workloads.

 Also, the lore that setting recordsize on ZFS is mandatory for good database
 performance is similarly not appropriate for OLAP work.

 If the workload is OLAP / Data Warehousing, I'd suggest ignoring all of the
 tuning information from Sun that refers generically to database.  The
 untuned ZFS performance should be far better in those cases.  Specifically,
 these three should be ignored:
 - (ignore this) limit ARC memory use
 - (ignore this) set recordsize to 8K
 - (ignore this) turn off vdev prefetch

 - Luke






[PERFORM] Query optimization....

2007-07-30 Thread Karl Denninger
This is a followup to a question I put forward here on performance, which I 
traced to the stats bug (and fixed). Now I'm trying to optimize 
that query and... I'm getting confused fast...

I have the following (fairly complex) statement which is run with some 
frequency:


select post.forum, post.subject, post.replied from post where toppost = 
1 and (replied > (select lastview from forumlog where login='someone' 
and forum=post.forum and number is null)) is not false AND (replied > 
(select lastview from forumlog where login='someone' and 
forum=post.forum and number=post.number)) is not false order by pinned 
desc, replied desc;


This gives me exactly what I'm looking for BUT can be quite slow.

The forumlog table has one tuple for each post and user; it has the 
fields forum, number, login and lastview.  The post items have 
a forum, number and replied field (which is used to match the 
lastview one.) 

When you look at a post (which may have replies) the application 
updates your existing entry in that table if there is one, or INSERTs a 
new tuple if not.


Therefore, for each post you have viewed, there is a tuple in the 
forumlog table which represents the last time you looked at that item.


The problem is that for a person who has NOT visited a specific thread 
of discussion, there is no forumlog entry for that person and post in 
the table.  Thus, to get all posts which (1) you've not seen at all, or 
(2) you've seen but someone has added to since you saw them, the above 
complex query is what I've come up with; there may be a null table 
entry which acts as a wildcard match if it's present - if there is no match 
then the item must also be treated as new.  The above statement works - but 
it's slow.


The following query is VERY fast but only returns those in which there 
IS an entry in the table (e.g. you've visited the item at least once)


select post.forum, post.subject, post.replied from post, forumlog where 
post.number = forumlog.number and post.toppost = 1 and post.replied > 
forumlog.lastview and forumlog.login='someone' order by pinned desc, 
replied desc;


What I haven't been able to figure out is how to structure a query that 
is both fast and will return the posts for which you DO NOT have a 
matching entry in the forumlog table for the specific post but DO 
either (1) match the null number entry (that is, they're posted later 
than that) OR (2) have no match at all.  (The first statement matches 
these other two cases)


Any ideas?  (It's OK if the queries are separate; in other words, it's 
cool if I have to execute two or even three queries and get the results 
separately - in fact, that might be preferable in some circumstances)


Ideas?

--
Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net







Re: [PERFORM] disk filling up

2007-07-30 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 A third (but unlikely) possibility is that there are various dropped tables,
 etc. which need to be deleted, but there are stale postgresql processes
 holding on to the fd.  This would only happen following a postmaster
 crash or some other bizarre scenario, but I've seen it on a production
 box.

Brent Reid reported something similar in bug #3483 but I'm still quite
unclear how it'd happen in any realistic scenario.  Can you create a
test case?

regards, tom lane



Re: [PERFORM] Vacuum looping?

2007-07-30 Thread Decibel!

On Jul 30, 2007, at 9:04 AM, Steven Flatt wrote:
On 7/28/07, Jim C. Nasby [EMAIL PROTECTED] wrote: What are your
vacuum_cost_* settings? If you set those too aggressively
you'll be in big trouble.

autovacuum_vacuum_cost_delay = 100


Wow, that's *really* high. I don't think I've ever set it higher than
25. I'd cut it way back.



autovacuum_vacuum_cost_limit = 200

These are generally fine, autovacuum keeps up, and there is minimal  
impact on the system.


vacuum_cost_delay = 100
vacuum_cost_limit = 1000

We set this cost_limit a little higher so that, in the few cases  
where we have to intervene manually, vacuum runs faster.


IIRC, when the cost delay was initially introduced (8.0), someone did  
testing and decided that the cost limit of 200 was optimal, so I  
wouldn't go changing it like that without good reason.


Normally, I'll use a delay of 10ms on good disk hardware, and 20ms on  
slower hardware.
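
A postgresql.conf sketch along those lines (the values are starting points,
not gospel):

vacuum_cost_delay = 10              # ms; ~20 on slower hardware
autovacuum_vacuum_cost_delay = 10   # ditto
vacuum_cost_limit = 200             # the default discussed above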

--
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


