Re: [HACKERS] Patch committers

2009-11-11 Thread Rick Gigger
Couldn't you have a policy that every patch is reviewed first by someone who 
wants to become an expert in that area, and then by someone who currently is 
an expert?  Then you have the best of both worlds: the patch is reviewed by 
someone who will make sure it won't cause problems, and the would-be expert 
gets training and will eventually be able to be the expert.

On Nov 11, 2009, at 12:58 PM, Josh Berkus wrote:

 
 True, but even I avoid patches I don't understand, and practicing by
 applying them could lead to a very undesirable outcome, e.g.
 instability.
 
 Right, but being responsible for applying the patch is the only real
 incentive anyone has to learn enough to understand its effects.  If a
 contributor is not responsible, they can always think "oh, Tom will get
 it, I'll learn that later."
 
 --Josh Berkus
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.5 release timetable, again

2009-08-26 Thread Rick Gigger


On Aug 24, 2009, at 9:46 PM, Robert Haas wrote:

On Mon, Aug 24, 2009 at 10:15 PM, David Fetter <da...@fetter.org> wrote:

On Mon, Aug 24, 2009 at 08:02:31PM -0400, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

 That is slightly alarmist.  Who are we going to lose these
 users to?



Drizzle.  MySQL forks.  CouchDB.  Any database which has
replication which you don't need a professional DBA to understand.
Whether or not it works.


You haven't explained why we'd lose such folk next year when we
haven't lost them already.  MySQL has had replication (or at least
has checked off the bullet point ;-)) for years.  I'd be seriously
surprised if any of the forks will offer significantly better
replication than is there now, so the competitive situation is not
changing in that regard.

It is true that we're missing a chance to pull some folks away while
the situation on that side of the fence is so messy.  But I don't
see our situation getting worse because of that, just not getting
better.


One possible reason that replication is more critical now than it was
a year ago is the rise of cloud computing.  The ability to fire up
instances on demand is much more useful when some of those boxes can
be database servers and those database servers can replicate the
primary database and start doing something useful.  As far as I can
tell this one feature alone is the thing that makes it hard to
convince people to migrate away from MySQL despite its demonstrable
inferiority in many other areas.  Postgres seems to be winning
mindshare as the real and reliable database of choice for people who
are serious about their data.  But for many, many businesses (many of
whom are really not that serious about their data) easy-to-set-up
replication is just too big of a draw, such that you can't get them to
consider anything without it.


I don't know if current postgres users are really going to switch over
existing projects that were built on postgres, but for new apps
running on EC2 or similar I would not be surprised to see people
choosing MySQL over postgres solely on this one issue.  Database
scalability is becoming an issue for more and more businesses, and
others are filling in the gap.  If postgres could combine its current,
deserved reputation for a robust feature set, standards compliance,
high performance, reliability, stability, etc., with easy-to-use
replication, it would be a slam-dunk, no-brainer decision to go with
postgres on just about anything.


Just my 2 cents.

Rick

P.S. I don't actually use mysql anywhere but I know many who do and  
replication is always the sticking point.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.5 release timetable, again

2009-08-26 Thread Rick Gigger


On Aug 26, 2009, at 8:17 AM, Jean-Michel Pouré wrote:


On Wednesday, August 26, 2009 at 01:36 -0600, Rick Gigger wrote:

One possible reason that replication is more critical now than it
was
a year ago is the rise in cloud computing.  The ability to fire up
instances on demand is much more useful when some of those boxes can
be database servers and those databases servers can replicate the
primary database and start doing something useful.  As far as I can
tell this one feature alone is the one thing that makes it hard to
convince people to migrate away from Mysql despite it's demonstrable
inferiority in many other areas.


I think you should have a deep look at
these two manuals that I wrote for Drupal:

Guidelines for writing MySQL and PostgreSQL compliant SQL
http://drupal.org/node/14

and

Guidelines for writing efficient SQL code:
http://drupal.org/node/559302

I have been using PostgreSQL since 1998. I took part in the development
of pgAdmin 1 and pgAdmin 2. I am very proud of the PostgreSQL community,
but I think it should fix some important issues in the domain of SQL
compliance and compatibility.

When reading developers' comments on the Drupal web site, it seems that
there is a deep misunderstanding between developers and SQL databases. I
would say that 1% of developers know database technology. For example,
most Drupal developers think that an INNER JOIN is faster than a LEFT
JOIN.

The reality is that developers will not even test PostgreSQL, and will
stop using it after the first SQL warning or error.

So I would recommend focussing on usability.

Then you can work on replication and materialized views. You probably
know that a cloud requires several computers. With materialized views, a
single computer can perform 100 times better. I see plenty of
possibilities to improve speed using materialized views.

But first and foremost ... focus on usability. Then make a pre-release
of a PostgreSQL 8.4.2 release or so ... We need to wipe out this MySQL
issue once and for all.

If there is a compat MySQL mode or functions, then include it in core.
This is too important for PostgreSQL success.

Once MySQL-level usability is achieved, add materialized views and MySQL
is dead in the performance statistics, beaten 99% of the time by
PostgreSQL.


This may be your experience, and maybe there are stats to back this
up.  I was simply saying that, in my experience, I have consulted with
companies using cloud computing services (i.e., EC2) and MySQL.  They
are performance conscious.  When they have to fire up more boxes, they
pay for it immediately.  When they ran into problems getting good
performance out of MySQL it was very easy to show them how to get
better performance using postgres.  (Often this was just: do the same
thing in postgres, and look, it's faster!)  But they also rely on
replication to be able to scale.  And without it they just weren't
interested.


Porting any project has its own set of issues; I was speaking to the
case where people are evaluating databases for a new project.  I was
not, however, trying to make any kind of statement as to how important
it is compared to any other specific feature.


I was just trying to say that, in my experience, current trends indicate
that having easy-to-set-up replication is getting more important over
time, not less, and not staying the same.  Other features may be more
important.  Getting it right is certainly more important than getting
it soon (for reasonable values of soon and right, of course, IMHO).
The experience of others may be totally different, but that is mine.


Just wanted to clarify what I was actually trying to say because this  
response seems to indicate that I didn't make certain things clear.


- Rick




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synch Rep for CommitFest 2009-07

2009-07-16 Thread Rick Gigger

On Jul 16, 2009, at 12:07 AM, Heikki Linnakangas wrote:


Dimitri Fontaine wrote:

On Jul 15, 2009, at 23:03, Heikki Linnakangas wrote:
Furthermore, the counter-argument against having the primary
able to send data from the archives to some standby is that it should
still work when primary's dead, but as this is only done in the setup
phase, I don't see that being able to continue preparing a not-yet-ready
standby against a dead primary is buying us anything.


The situation arises also when the standby falls badly behind. A simple
solution to that is to add a switch in the master to specify always
keep X MB of WAL in pg_xlog. The standby will then still find it in
pg_xlog, making it harder for a standby to fall so much behind that it
can't find the WAL it needs in the primary anymore. Tom suggested that
we can just give up and re-sync with a new base backup, but that really
requires built-in base backup capability, and is only practical for
small databases.


If you use an rsync like algorithm for doing the base backups wouldn't  
that increase the size of the database for which it would still be  
practical to just re-sync?  Couldn't you in fact sync a very large  
database if the amount of actual change in the files was a small  
percentage of the total size?
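
Roughly what I have in mind, as a sketch only (the 8 kB block size and
the assumption that both copies are locally readable, say over NFS, are
just for illustration; real rsync exchanges checksums over the network
instead):

import hashlib, os, shutil

BLOCK = 8192  # compare the files in 8 kB chunks

def block_sums(path):
    """Return one SHA-1 digest per 8 kB block of the file."""
    sums = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            sums.append(hashlib.sha1(chunk).digest())
    return sums

def sync_file(src, dst):
    """Rewrite only the blocks of dst that differ from src."""
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)          # new file: copy it whole
        return
    old = block_sums(dst)
    with open(src, 'rb') as s, open(dst, 'r+b') as d:
        for i, new_sum in enumerate(block_sums(src)):
            if i >= len(old) or old[i] != new_sum:
                s.seek(i * BLOCK)
                d.seek(i * BLOCK)
                d.write(s.read(BLOCK))     # changed block: re-copy it
        d.truncate(os.path.getsize(src))   # handle files that shrank

If only a small fraction of the blocks changed, the amount of data
actually rewritten stays proportional to the churn, not to the total
database size.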

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synch Rep for CommitFest 2009-07

2009-07-16 Thread Rick Gigger

On Jul 16, 2009, at 11:09 AM, Greg Stark wrote:


On Thu, Jul 16, 2009 at 4:41 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:

Rick Gigger wrote:
If you use an rsync like algorithm for doing the base backups  
wouldn't

that increase the size of the database for which it would still be
practical to just re-sync?  Couldn't you in fact sync a very large
database if the amount of actual change in the files was a small
percentage of the total size?


It would certainly help to reduce the network traffic, though you'd
still have to scan all the data to see what has changed.


The fundamental problem with pushing users to start over with a new
base backup is that there's no relationship between the size of the
WAL and the size of the database.

You can plausibly have a system with extremely high transaction rate
generating WAL very quickly, but where the whole database fits in a
few hundred megabytes. In that case you could be behind by only a few
minutes and have it be faster to take a new base backup.

Or you could have a petabyte database which is rarely updated. In
which case it might be faster to apply weeks' worth of logs than to
try to take a base backup.

Only the sysadmin is actually going to know which makes more sense.
Unless we start tying WAL parameters to the database size or
something like that.


Once again, wouldn't an rsync like algorithm help here?  Couldn't you
have the default be to just create a new base backup for them, but
then allow you to specify an existing base backup if you've already
got one?
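
Just to illustrate Greg's point that only the relative sizes can decide
this, here is a toy helper (all the rates and sizes below are made up;
in practice the sysadmin would have to measure them):

def catchup_plan(pending_wal_bytes, database_bytes,
                 replay_bytes_per_sec, copy_bytes_per_sec):
    """Pick 'replay' or 'new-base-backup', with a rough time estimate."""
    replay_time = pending_wal_bytes / float(replay_bytes_per_sec)
    backup_time = database_bytes / float(copy_bytes_per_sec)
    if replay_time <= backup_time:
        return 'replay', replay_time
    return 'new-base-backup', backup_time

# Tiny but busy database: a fresh base backup wins.
print(catchup_plan(50 * 2**30, 200 * 2**20, 20 * 2**20, 80 * 2**20))
# Huge but quiet database: replaying weeks of WAL wins.
print(catchup_plan(10 * 2**30, 1 * 2**40, 20 * 2**20, 80 * 2**20))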


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.4 release planning

2009-01-27 Thread Rick Gigger


On Jan 27, 2009, at 2:41 AM, Mark Kirkwood wrote:


Dave Page wrote:

On Mon, Jan 26, 2009 at 8:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:


Josh Berkus j...@agliodbs.com writes:


So, some feedback to make this decision more difficult:
 Users: care about HS more than anything else in the world.

I don't think this is correct.  There are certainly a lot of users who
would like an in-core replication solution, but HS by itself is not that
--- you also need (near) real-time log shipping, which we have already
decided to punt to 8.5.  That being the case, I think the argument
that HS is a must-have feature for 8.4 is actually rather weak.



I don't buy that. Sure, sync-rep would be the icing on the cake, but
HS with a small archive_timeout (even of the order of 10 or 15
minutes) would have been extremely useful on a number of systems I
used to run.




+1

I have customers who want exactly this - a simple to administer,  
query-able slave that does DDL transparently and is up to date  
within a controllable time frame. Bluntly, it looks like a killer  
feature.


regards


+1

So, I am just a lurker here.  I mostly follow hackers to find out if
any new features are coming out that will make it worth upgrading, and
to keep up on any backward compatibility changes that I should be aware
of.  I am on 8.1, it performs well, and no features added since then
have seemed worth taking the whole system down to do the upgrade for.
However, a simple to administer, query-able slave that does DDL
transparently and is up to date within a controllable time frame is
something that would undoubtedly make it worth the upgrade.  Whatever
version this feature makes it into will probably be the one I
upgrade to.


Of course this is just one developer giving you anecdotal evidence, and
there are obviously many concerns other than just how in demand it is,
but I just wanted to register my vote that this is a very sought-after
feature, and it is hard for me to imagine a situation (especially for a
24x7 web application) where having an easy-to-administer hot standby
server wouldn't help your local DBA sleep better at night.


Rick

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Rick Gigger

Yup, that is *exactly* the point.  A wiki page is a zero-setup-cost,
flexible way of experimenting with tracking commit-fest issues.
A year from now, we might have enough experience to decide that some
more-rigidly-structured tool will do what we need, but we don't have
it today.  We spent enough time fighting the limitations of Bruce's
mhonarc page that we ought to be wary of adopting some other tool that
wants you to do things its way.


In case you don't recognize my name/email address I am just someone  
who has been lurking on hackers for several years now.  I have been  
following this thread closely and I thought it might be useful for  
someone to try to summarize what it seems like people need so everyone  
can see if they are on the same page. After stating each problem I  
will also summarize proposed solutions and introduce a few of my own,  
just to see what people think.  Also I have been a web developer for
7 years, so I may be able to help out with this, as long as the
time span is sufficiently long.  Please feel free to correct anything
below (as if I have to say that).  Remember I am not trying to push  
any idea here, I am just trying to summarize everyone else's ideas and  
humbly propose a few ideas that might help.


It's clear that you want to keep the email list as the primary means  
of communication.  So that obviously has to stay.  The web archives  
should stay the primary means of referencing past discussion.


Problem #1: The current archive system breaks threads across monthly  
boundaries.

Solution 1A: Find other off-the-shelf software that does this better.
Solution 1B: Write some custom software to do this.  Can you set up
hooks into the mail server so that a script could be run each time a
new email is accepted?  Given the full message headers and body, what
is the algorithm for linking messages into threads?  Given the right
answers to those two questions, this might be a fairly simple task.
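
(On the threading question: the usual answer is to follow the
Message-ID / In-Reply-To / References headers, roughly as below.  This
is only a stripped-down sketch; a real implementation would also need a
subject-based fallback for clients that drop those headers.)

from email import message_from_string
from collections import defaultdict

def thread_root(msg, known_roots):
    """Return the Message-ID of the thread this message belongs to."""
    refs = (msg.get('References', '') + ' ' +
            msg.get('In-Reply-To', '')).split()
    for ref in refs:
        if ref in known_roots:      # parent already seen: same thread
            return known_roots[ref]
    return msg['Message-ID']        # otherwise it starts a new thread

def build_threads(raw_messages):
    known_roots = {}                # message-id -> root message-id
    threads = defaultdict(list)     # root message-id -> [messages]
    for raw in raw_messages:        # assumes roughly chronological order
        msg = message_from_string(raw)
        root = thread_root(msg, known_roots)
        known_roots[msg['Message-ID']] = root
        threads[root].append(msg)
    return threads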


Problem #2: When people are looking for something to do we need a list  
of all pending issues that can be easily searched.  Ideally the  
maintenance of this list will be as low as possible.
Solution 2: Create a custom tracker.  The above email and others seem  
to indicate nothing off the shelf will do.  Can a hook be set up into  
the mail server so that incoming emails can not only be read and  
indexed but also altered with a script?  Each new thread from patches  
could automatically create a tracker item.  At the bottom of each  
message a link could be added to the tracker item for that thread.   
Then going from email message (in your MUA or the web archives) to the  
tracker would be quick and easy.  Each email from hackers could have a  
link at the bottom to create a tracker item for it.  So patches  
threads get automatic tracker items.  Hackers threads can be added  
manually.
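
As a rough sketch of the kind of delivery hook this would take
(everything here, from the table layout to the tracker URL, is invented
for illustration), the list server could pipe each incoming message
through something like:

import sys, sqlite3
from email import message_from_string

TRACKER_URL = "http://tracker.example.org/item/%d"   # made-up URL

def process(raw, db):
    msg = message_from_string(raw)
    parent = (msg.get('In-Reply-To') or '').strip()
    row = db.execute("SELECT item FROM msgs WHERE msgid = ?",
                     (parent,)).fetchone()
    if row:                            # a reply: reuse the parent's item
        item = row[0]
    else:                              # a new thread: open a new item
        cur = db.execute(
            "INSERT INTO items(subject, status) VALUES (?, ?)",
            (msg.get('Subject', ''), 'unreviewed'))
        item = cur.lastrowid
    db.execute("INSERT OR IGNORE INTO msgs(msgid, item) VALUES (?, ?)",
               (msg.get('Message-ID', ''), item))
    db.commit()
    # Append the tracker link before the message goes out to the list.
    return raw + "\n\n-- \nTracker item: " + TRACKER_URL % item + "\n"

if __name__ == '__main__':
    conn = sqlite3.connect('tracker.db')
    conn.execute("CREATE TABLE IF NOT EXISTS items("
                 "item INTEGER PRIMARY KEY, subject TEXT, status TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS msgs("
                 "msgid TEXT PRIMARY KEY, item INTEGER)")
    sys.stdout.write(process(sys.stdin.read(), conn))

Appending to the raw text like this only works for plain-text mail, but
it shows the flow: the first message in a thread creates the item, and
every later message just links back to it.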


The tracker page for each message would include whatever metadata was  
needed.  For instance: has this tracker item been processed yet?  Is  
it a bug or a feature request or just a request for information or a  
fix to infrastructure?  Is there a reviewer for the patch?  Which fest  
does it belong to?  Or any other metadata you might want to add to the  
item.  Also on the page would be the thread that started the item.   
Initially it would show only subjects.  Clicking on a subject will  
show the body of the message inline with the thread. Clicking it again  
will collapse it again.  There will be a reply link for each message.   
If you reply via the web site it will simply send a message to the  
mail server just as it would if you had replied within your MUA.  That  
way there is no difference between replying from within the tracker  
and replying from within your normal mail client.  But you can still  
use either and the communication doesn't get scattered all over the  
place.  There would also be an option to let you choose another  
tracker item to merge with the current one so that relevant threads  
can be aggregated into the same tracker item.
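
The per-item metadata could be as small as this (the field names are
just the ones mentioned above, not a real schema):

from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class TrackerItem:
    subject: str
    kind: str = 'unclassified'   # bug / feature / question / infrastructure
    processed: bool = False
    reviewer: Optional[str] = None
    fest: Optional[str] = None   # which commit fest it belongs to
    threads: List[str] = field(default_factory=list)  # root message-ids

def merge(into: TrackerItem, other: TrackerItem) -> TrackerItem:
    """Fold one item into another so related threads live together."""
    into.threads.extend(other.threads)
    into.processed = into.processed or other.processed
    into.reviewer = into.reviewer or other.reviewer
    return into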


At this point you could have a page that lists all unassigned tracker  
items so that someone looking for some work to do could quickly scan a  
short easy to read list and pick something.


Problem #3: When a new contributor posts a patch, they need
feedback quickly so they know at least that they did the right thing
and that someone is aware of the patch.
Solution 3: The tracker has a list of all new threads that haven't
been looked at.  Someone can then go through the list of unprocessed
items and see if each has a patch.  If it does, he can flag that item as
a patch submission and it will immediately show up on the list for patch
reviewers to look through.  It can also be assigned right there to a
specific fest, but will default to the soonest one.  Once it is flagged,
an email will automatically go out telling the submitter that their
patch is pending review.


Problem #4: Patches may or may not, on rare 

Re: [HACKERS] Commit fest queue

2008-04-11 Thread Rick Gigger


On Apr 11, 2008, at 9:30 AM, Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes:
Personally I don't think we *know* what we want to do and that's
why the wiki is a good interim tool.


Yup, that is *exactly* the point.  A wiki page is a zero-setup-cost,
flexible way of experimenting with tracking commit-fest issues.
A year from now, we might have enough experience to decide that some
more-rigidly-structured tool will do what we need, but we don't have
it today.  We spent enough time fighting the limitations of Bruce's
mhonarc page that we ought to be wary of adopting some other tool that
wants you to do things its way.

Perhaps an example will help make the point.  Throughout this past fest
I was desperately wishing for a way to group and label related issues
--- we had a pile of items focused around index AM API questions, and
another pile focused around mapping problems (FSM/DSM/Visibility
map/etc), and being able to put those together would have made it a
lot clearer what needed to be looked at together with what else.
On a wiki page it'd have been no trouble at all to create ad-hoc
sub-headings and sort the individual items into whatever grouping and
ordering made sense (in fact, Alvaro did some of that on his own).
It was basically impossible to do any such thing with Bruce's mhonarc
page, though he did kluge up some ways to partially address the issue
by merging threads.  The bug trackers I've dealt with haven't got much
flexibility in this respect either --- the sorts of global views you can
get are entirely determined by the tool.  I'm fairly certain that a
tracker designed around the assumption that different entries are
essentially independent isn't going to be very helpful.


In case you don't recognize my name/email address I am just someone  
who has been lurking on hackers for several years now.  I have been  
following this thread closely and I thought it might be useful for  
someone to try to summarize what it seems like people need so everyone  
can see if they are on the same page. After stating each problem I  
will also summarize proposed solutions and introduce a few of my own,  
just to see what people think.  Also I have been a web developer for
7 years, so I may be able to help out with this, as long as the
time span is sufficiently long.  Please feel free to correct anything
below (as if I have to say that).  Remember I am not trying to push  
any idea here, I am just trying to summarize everyone else's ideas and  
humbly propose a few ideas that might help.


It's clear that you want to keep the email list as the primary means  
of communication.  So that obviously has to stay.  The web archives  
should stay the primary means of referencing past discussion.


Problem #1: The current archive system breaks threads across monthly  
boundaries.

Solution 1A: Find other off-the-shelf software that does this better.
Solution 1B: Write some custom software to do this.  Can you set up
hooks into the mail server so that a script could be run each time a
new email is accepted?  Given the full message headers and body, what
is the algorithm for linking messages into threads?  Given the right
answers to those two questions, this might be a fairly simple task.


Problem #2: When people are looking for something to do we need a list  
of all pending issues that can be easily searched.  Ideally the  
maintenance of this list will be as low as possible.
Solution 2: Create a custom tracker.  The above email and others seem  
to indicate nothing off the shelf will do.  Can a hook be set up into  
the mail server so that incoming emails can not only be read and  
indexed but also altered with a script?  Each new thread from patches  
could automatically create a tracker item.  At the bottom of each  
message a link could be added to the tracker item for that thread.   
Then going from email message (in your MUA or the web archives) to the  
tracker would be quick and easy.  Each email from hackers could have a  
link at the bottom to create a tracker item for it.  So patches  
threads get automatic tracker items.  Hackers threads can be added  
manually.


The tracker page for each message would include whatever metadata was  
needed.  For instance: has this tracker item been processed yet?  Is  
it a bug or a feature request or just a request for information or a  
fix to infrastructure?  Is there a reviewer for the patch?  Which fest  
does it belong to?  Or any other metadata you might want to add to the  
item.  Also on the page would be the thread that started the item.   
Initially it would show only subjects.  Clicking on a subject will  
show the body of the message inline with the thread. Clicking it again  
will collapse it again.  There will be a reply link for each message.   
If you reply via the web site it will simply send a message to the  
mail server just as it would if you had replied within your MUA.  That  
way there is no difference between 

Re: [HACKERS] should I worry?

2007-11-05 Thread Rick Gigger
Doesn't DROP TRIGGER require the name of the trigger?  He says they  
are unnamed.  How then does he drop them?



On Nov 5, 2007, at 6:31 AM, Tom Lane wrote:


[EMAIL PROTECTED] writes:

On Sun, 4 Nov 2007, Tom Lane wrote:
So you have a *bunch* of partially broken FK constraints in that source
database.



I just talked to my customer and he/we'll make a big batch deleting and
recreating all foreign keys on 8.2.5.
The question is, how do we get rid of those useless "unnamed" triggers?


DROP TRIGGER should be fine.

regards, tom lane

---(end of  
broadcast)---

TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that  
your

  message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] should I worry?

2007-11-05 Thread Rick Gigger

Ah, yes it was the quotes.  I guess that makes me a newbie.  :)


On Nov 5, 2007, at 1:53 PM, Tom Lane wrote:


Heikki Linnakangas [EMAIL PROTECTED] writes:

Rick Gigger wrote:
Doesn't DROP TRIGGER require the name of the trigger?  He says they are
unnamed.  How then does he drop them?



They're not really unnamed. pg_dump just replaces the real name with
unnamed.


And \d will show the real names of the triggers, so it's not really
that hard to drop them:

u=# \d t1
        Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 f1     | integer | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (f1)
Triggers:
    "RI_ConstraintTrigger_229629" AFTER DELETE ON t1 FROM t2 NOT
DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE
"RI_FKey_noaction_del"('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2', 'f1')
    "RI_ConstraintTrigger_229630" AFTER UPDATE ON t1 FROM t2 NOT
DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE
"RI_FKey_noaction_upd"('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2', 'f1')

u=# drop trigger "RI_ConstraintTrigger_229629" on t1;
DROP TRIGGER

I do recall newbies forgetting to double-quote the mixed-case trigger
names when this came up in times past, though.

regards, tom lane




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-02-10 Thread Rick Gigger

Jim Nasby wrote:

On Feb 5, 2007, at 12:53 PM, Andrew Hammond wrote:

On Jan 26, 2:38 pm, [EMAIL PROTECTED] (Tom Lane) wrote:

Rick Gigger [EMAIL PROTECTED] writes:

I thought that the following todo item just barely missed 8.2:
Allow a warm standby system to also allow read-only statements [pitr]


No, it's a someday-wishlist item; the work involved is not small.


Slony1 has supported log-shipping replication for about a year now. It
provides similar functionality.


Not really

1) It's not possible for a PITR 'slave' to fall behind to a state where 
it will never catch up, unless it's just on inadequate hardware. Same 
isn't true with slony.

2) PITR handles DDL seamlessly
3) PITR is *much* simpler to configure and maintain


Which is why I was hoping for a PITR-based solution.  Oh well, I will
have to figure out what my best option is now that I know it will not be
available any time in the near future.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-02-06 Thread Rick Gigger

Tom Lane wrote:

Rick Gigger [EMAIL PROTECTED] writes:

I thought that the following todo item just barely missed 8.2:
Allow a warm standby system to also allow read-only statements [pitr]


No, it's a someday-wishlist item; the work involved is not small.


Thanks very much for the info.  I'm not sure why I thought that one was
near completion.  I can now come up with an alternative plan.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-02-06 Thread Rick Gigger

Andrew Hammond wrote:

On Jan 26, 2:38 pm, [EMAIL PROTECTED] (Tom Lane) wrote:

Rick Gigger [EMAIL PROTECTED] writes:

I thought that the following todo item just barely missed 8.2:
Allow a warm standby system to also allow read-only statements [pitr]

No, it's a someday-wishlist item; the work involved is not small.


Slony1 has supported log-shipping replication for about a year now. It
provides similar functionality.


Yes but Slony is much more complicated, has significantly more 
administrative overhead, and as far as I can tell is much more likely to 
impact my production system than this method would.


Slony is a lot more flexible and powerful but I don't need that.  I just 
want a backup that is reasonably up to date that I can do queries on and 
fail over to in case of hardware failure on my primary db.


I am going to be looking more closely at Slony now that it seems to be 
the best option for this.  I am not looking forward to how it will 
complicate my life though.  (Not saying it is bad, just complicated.  At 
least more complicated than simple postgres log shipping.)



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-02-06 Thread Rick Gigger

Gregory Stark wrote:

Rick Gigger [EMAIL PROTECTED] writes:


I thought that the following todo item just barely missed 8.2:

Allow a warm standby system to also allow read-only statements [pitr]
This is useful for checking PITR recovery.


No, nobody worked on it prior to 8.2. Afaik there's still nobody working on
it. It's not trivial. Consider for example that your read-only query would
still need to come up with a snapshot and there's nowhere currently to find
out what transactions were in-progress at that point in the log replay.

There's also the problem that currently WAL replay doesn't allow for any
locking, so there's no way for read-only queries to protect themselves
against the WAL replay thrashing the buffer pages they're looking at.

It does seem to be doable and I agree it would be a great feature, but as far
as I know nobody's working on it for 8.3.


Thanks again for the update.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-26 Thread Rick Gigger

I thought that the following todo item just barely missed 8.2:

Allow a warm standby system to also allow read-only statements [pitr]
This is useful for checking PITR recovery.

I assume it's not on this list either because it is already complete and 
slated for 8.3, or it is going to take too long to make it into 8.3, or 
it has been rejected as a good idea entirely, or it's just not a big 
enough priority for anyone to push for it to get into 8.3.


It is the one feature that would make the most difference to me as it 
would allow me to very easily set up a server for reporting purposes 
that could always be within minutes of the live data.  I know there are 
other solutions for this but if this feature is just around the corner 
it would be my first choice.


Does anyone know the status of this feature?

Thanks,

Rick Gigger




Joshua D. Drake wrote:

Or so... :)

Thought I would do a poll of what is happening in the world for 8.3. I have:

Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
Jonah Harris: WITH/Recursive Queries?
Andrei Kovalesvki: Some Win32 work with Magnus
Magnus Hagander: VC++ support (thank goodness)
Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
Oleg Bartunov: Tsearch2 in core
Neil Conway: Patch Review (including enums), pg_fcache

Vertical projects:

Pavel Stehule: PLpsm
Alexey Klyukin: PLphp
Andrei Kovalesvki: ODBCng

I am sure there are more, the ones with question marks are unknowns but
heard of in the ether somewhere. Any additions or confirmations?

Sincerely,

Joshua D. Drake






---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] 8.2 features status

2006-08-05 Thread Rick Gigger
If people are going to start listing features they want, here are some
things I think would be nice.  I have no idea, though, whether they
would be useful to anyone else:


1) Hierarchical / recursive queries.  I realize it's just been
discussed at length, but since there was some question as to whether
or not there's demand for it, I am just weighing in that I think
there is.  I have to deal with hierarchy tables all the time and I
simply have several standard methods of dealing with them depending
on the data set / format (a sketch of one such workaround appears
after this list).  But they all suck.  I've just gotten used to using
the workarounds since there is nothing else.  If you are not hearing
the screams, it's just because it's become a fact of life for most
people (unless you're using Oracle) that you've just got to work
around it.  And everyone already has some code to do this and they've
already done it everywhere it needs to be done.  And as long as you're
a little bit clever you can always work around it without taking a big
performance hit.  But it would sure be nice to have next time I have
to deal with a tree table.


2) PITR on a per database basis.  I think this would be nice but I'm  
guessing that the work involved is big and that few people really  
care or need it, so it will probably never happen.


3) A further refinement of PITR where some sort of daemon ships small
log segments as they are created, so that the hot standby doesn't have
to be updated in 16MB increments or have to wait for some timeout to
occur (see the second sketch after this list).  It could always be
up-to-the-minute data.


4) All the Greenplum Bizgress MPP goodness.  In reality (and I don't
know if Bizgress MPP can actually do this) I'd like to have a cluster
of cheap boxes.  I'd like to install postgres on all of them and
configure them in such a way that it automatically partitions and
mirrors each table so that each piece of data is always on two boxes,
and large tables and indexes get divided up intelligently (the third
sketch after this list shows the placement idea).  Sort of like
RAID 10 at the database level.  This way any one box could die and I
would be fine.  Enormous queries could be handled efficiently and I
could scale up by just dropping in new hardware.


Maybe Greenplum has done this.  Maybe we will get their changes soon
enough, maybe not.  Maybe this sort of functionality will never
happen.  My guess is that all the little bits and pieces of this will
trickle in over the next several years and this sort of setup will be
slowly converged on over time as lots of little things come
together.  Tablespaces and constraint exclusion come to mind here as
things that could eventually evolve to contribute to a larger solution.


5) Somehow make it so I NEVER HAVE TO THINK ABOUT OR DEAL WITH VACUUM
AGAIN.  Once I get everything set up right everything works great, but
if there's one thing I think everyone would love, it would be getting
postgres to the point where you don't even need to ship vacuumdb
because there's no way the user could outsmart postgres's attempts to
do garbage collection on its own.


6) Genuine updatable views, such that you just add an updatable
keyword when you create the view and it's automagically updatable.
I'm guessing that we'll get something like that, but its real magic
will be throwing an error to tell you when you try to make a view
updatable and it can't figure out how to create the rules properly.


7) allow some way to extract the data files from a single database  
and insert them into another database cluster.  In many cases it  
would be a lot faster to copy the datafiles across the network than  
it is to dump, copy dump file, reload.


8) some sort of standard hooks to be used for replication.  I guess  
when the replication people all get their heads together and tell the  
core developers what they all need something like this could evolve.
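
For item 1, the workaround I mean is usually nothing more than pulling
the whole adjacency list and assembling the tree in the application;
the table and column names below are invented:

from collections import defaultdict

def build_tree(rows):
    """rows: (id, parent_id, name) tuples; returns roots as nested dicts."""
    nodes = {}
    children = defaultdict(list)
    for node_id, parent_id, name in rows:
        nodes[node_id] = {'id': node_id, 'name': name, 'children': []}
        children[parent_id].append(node_id)
    for parent_id, kids in children.items():
        if parent_id in nodes:
            nodes[parent_id]['children'] = [nodes[k] for k in kids]
    return [nodes[k] for k in children[None]]   # NULL parent = root

# e.g. from:  SELECT id, parent_id, name FROM categories
tree = build_tree([(1, None, 'root'), (2, 1, 'a'), (3, 1, 'b'), (4, 2, 'a1')])

It works, but every walk of the hierarchy either drags the whole table
across the wire or turns into one query per level, which is exactly the
pain a recursive query would remove.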

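For item 3, the daemon could be little more than the loop below; the
paths, the host name and the use of rsync are all assumptions made for
the sake of the sketch:

import os, subprocess, time

WAL_DIR = '/var/lib/pgsql/data/pg_xlog'       # assumed primary layout
DEST    = 'standby:/var/lib/pgsql/walship/'   # assumed standby target

def ship_forever(interval=5):
    while True:
        for name in sorted(os.listdir(WAL_DIR)):
            src = os.path.join(WAL_DIR, name)
            if os.path.isfile(src):
                # rsync only sends the changed tail of each segment, so
                # re-shipping the one currently being written is cheap.
                subprocess.call(['rsync', '-a', '--inplace', src, DEST])
        time.sleep(interval)

if __name__ == '__main__':
    ship_forever()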

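And for item 4, the placement rule could be as dumb as hashing each key
onto a node and mirroring it on that node's neighbour (purely
illustrative; a real system needs rebalancing, failure detection and so
on):

import hashlib

def placement(key, nodes):
    """Return the two distinct nodes responsible for this key."""
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    primary = h % len(nodes)
    mirror = (primary + 1) % len(nodes)   # the neighbour holds the copy
    return nodes[primary], nodes[mirror]

print(placement('orders:1234', ['db1', 'db2', 'db3', 'db4']))
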
Like I said, postgres more than satisfies my needs.  I am  
especially happy when you factor in the cost of the software (free),  
and the quality of the community support (excellent).


And you can definitely say that the missing list is shrinking.  But  
I think of it like this.  There are tiers of database functionality  
that different people need:
A) Correct me if I'm wrong but as great as postgres is there are  
still people out there that MUST HAVE Oracle or DB2 to get done what  
they need to get done.  They just do things that the others can't.   
They may be expensive.  They may suck to use and administer but the  
simple fact is that they have features that people need that are not  
offered in less expensive databases.
B) Very, very powerful databases but lack the biggest, most  
complicated enterprise features.
C) Light weight db for taking care of the basic need to store data  
and query it with sql. (some would call these toy databases)
D) databases which are experimental, unreliable or have other limits  
that make them not practical compared with the other options


I 

Re: [HACKERS] 8.2 features status

2006-08-04 Thread Rick Gigger
This has been a very interesting thread, if for no other reason than
to catalog all of the changes going into 8.2.  I am going to be
changing some hardware around, so I need to decide if I want to a)
change the hardware now and don't bother with 8.2, b) wait to upgrade
hardware and do the upgrade to 8.2 at the same time, or c) upgrade the
hardware now and then upgrade to 8.2 when it is released.


What I do basically depends on how much benefit I'm going to get from  
8.2 and whether it's worth planning my hardware upgrades around or  
incurring additional downtime in order to do the postgres upgrade.   
Doing a dump/reload is not a problem at all, it doesn't take that  
long with the data I've got.  It's just having to kick everyone off  
the system and shut everything down in order to do it.


My applications are not that demanding on the DB so there isn't  
anything that I NEED that postgres doesn't already have.  Because  
of this I stayed on 7.3 for way, way too long.  It just didn't seem  
worth the effort to do the upgrade or the additional testing against  
each postgres version to make sure it didn't break my apps.  But I  
finally bit the bullet and upgraded straight to 8.1.  I couldn't  
believe what a huge performance difference it made even though I  
didn't use any of the big headlining features (i.e., PITR, two-phase
commit, etc.).  So even though I don't even understand what most of
the items on Tom's huge list are, and I certainly don't understand
where they will come into play for me, I'm sure that once I've
upgraded and used it for a while, I'll be very glad I did.


That being said I think that Bruce has a point in that there is a  
certain class of features (which seems hard to define) which existed  
in 8.0 and 8.1 but not 8.2.  I would define it like this:


A) There are features that PHBs would look for.
B) There are features that a casual database user would look for.
C) There are features that an experienced database user would look  
for / understand
D) and then there are features that a database guru/developer of  
database software would understand.


Features of class A contribute to the bullet-point lists, the buzzword
checklists, etc., and are easily used for marketing.  For instance, if
someone were to package up postgres and put it in a box, you would not
have a list of "new features in 8.2!" that included "lazy vacuums are
ignored by other processes."  Only an existing postgres user who has
had a setup complex enough to need a tricky vacuuming strategy would
ever even know what that meant, much less care about it.  It might be
life and death for many users, but it doesn't go on the back of the
box.


So define "major features" as class A features.  In this case "major"
doesn't mean important or useful or difficult to implement, just that
they are the sort of features that one might be told to look for when
shopping for a database.  In terms of marketing, PITR, two-phase
commit, and Win32 support were very much major features.


If people had expectations that are not being met, it could be because
8.0 and 8.1 had so many of these headlining, market-friendly,
buzzword-compliant major features.  It doesn't make this release any
less impressive technically or less useful for the actual users of the
database, but it DOES make a difference as to how this release will be
perceived by a lot of people.  Not that it's a problem, but many
people, I think, will see this release as less major than the 8.0 or
8.1 releases.  I think this is reflected in the fact that that release
was picked to jump up to version 8.0 instead of 7.5.


I will upgrade at some point but mostly because experience has taught  
me that each release of postgres is significantly better than the  
last one, even if I won't see how until I actually use it in production.


That being said I think that two of the not-yet-reviewed features are  
just as major as the major features from the past two releases.


1) updatable views - I won't really use this, but it just seems like
one of those features that people use when doing RDBMS feature
comparison charts.  I think that having updatable views will be
considered by the masses to be a major feature.


2) restartable recovery (allow checkpoints for a hot-standby server)  
- Having the ability to have a hot standby database is a HUGE feature  
in my book.  Maybe I'm just biased because it's a feature that I  
actually need but I think it's very big.  Yes you can sort of do the  
same thing or something better with slony but this is what I really  
want (and I'm guessing that this will get used A LOT if it makes it  
in).  And it's built in, unlike slony.  And it seems much easier to  
set up and maintain and less likely to have problems or complicate  
things than slony.  In terms of having a setup with no single point  
of failure this goes a long way.  And it builds on something that I  
will want set up anyway 

Re: pg_upgrade (was: [HACKERS] 8.2 features status)

2006-08-04 Thread Rick Gigger

I had a few thoughts on this issue:

The objective is to smoothly upgrade to the new version with minimal  
downtime.


The different proposals as far as I can see are as follows:

Proposal A - the big one time reformatting
1) shutdown the db
2) run a command that upgrades the data directory to the new format
3) start up the new postgres version with the new data dir

Pros: only pg_upgrade (or whatever it's called) needs to know about
the old and new formats; each version of postgres knows about its
format and that's it.  The postgres code stays clean.
Cons: your database is down while the upgrade takes place.  This
sucks because the people who need this are the same people who are
trying to avoid downtime.  It's faster than a dump/reload but it
doesn't completely solve the problem, it just mitigates it.


Proposal B - the gradual upgrade
1) shutdown the db
2) start it back up with the new version of postgres
3) the new postgres version upgrades things in place as needed

Pros: very short downtime; only the time to shut down the old postgres
version and start up the new one.
Cons: the postgres code gets filled with cruft.  Each version has to
know about the old versions' on-disk data format and how to upgrade it.
Until it is finished you will be left with a database that is part
old format, part new format.  This could introduce bugs for people
who never needed the feature in the first place.


Here is another proposal that I haven't heard anyone else suggest.
My apologies in advance if it's obviously not workable or has already
been discussed.


Proposal C - PITR with on-the-fly disk upgrades
1) setup PITR
2) run pg_upgrade on your latest backed up data directories
3) start up the new pg on that data directory in restartable
recovery / read-only / hot-standby mode
4) update the recovery log importer so that it can update the log
files on the fly as it applies them
5) failover to the hot standby as you normally would

Pros: essentially no downtime, just any incidental time needed for
the failover to occur.  Cruft in the postgres main codebase is
minimized: it's limited to the log importer.  All other parts of
postgres are unaffected.
Cons: requires another server or double the disk space on the
original server.  Is this a problem for people with databases so
large that a dump/reload is unacceptable?
Perhaps there are technical issues with postgres that I don't
understand that would make this too hard.
Maybe it would take too long to update each log file as it's applied,
so it wouldn't be able to catch up.


Oh yeah there's another way
Proposal D - Use slony
But I figured since that's been working for a long time, if slony  
solved their problem then they wouldn't be looking for something else.


I have no need for this feature as a dump reload is not a problem for  
me.  I've always wondered though if that was a feasible answer to  
this problem.  Each time it crops up people propose solutions A and B  
but never C.



On Aug 4, 2006, at 1:30 PM, Jim C. Nasby wrote:


On Fri, Aug 04, 2006 at 02:12:16PM -0400, Stephen Frost wrote:

* Jim C. Nasby ([EMAIL PROTECTED]) wrote:

On Thu, Aug 03, 2006 at 11:20:48PM -0700, Josh Berkus wrote:

* In-place upgrades (pg_upgrade)


BTW, I may get Sun to contribute an engineer for this; will get  
you posted.


How would such a thing handle changes to page formats?


Couldn't this be done by converting a table/partial-table at a time?
It wouldn't be something which could run while the system is live,  
but

it'd probably take less time than dump/restore and wouldn't require
double the disk space of the whole database... no?


True, but if you're going to go about creating code that can deal  
with 2
different versions of on-disk data, why not go one better: put that  
code

into the database itself, so that pages are converted on-the-fly as
they're dirtied. That way you have *no* downtime (or almost no,  
anyway).

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of  
broadcast)---

TIP 2: Don't 'kill -9' the postmaster




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.2 feature set

2006-08-02 Thread Rick Gigger
Sorry if this is the wrong place for this but as far as I can tell  
there are only 2 features so far that I've seen discussed on hackers  
that are looking really good to me.  I'm sure all the little changes  
will add up to a big win but these are the only two that would make  
me feel an urgent need to go through the hassle of upgrading.


1) Standby mode
2) Improvements to PITR making it easier to set up.

If I could have asked for any two features to be implemented in
postgres it would probably have been these two, so I am very
curious.  Are these two features definitely in, definitely out, or
still up in the air as far as 8.2 is concerned?



On Aug 2, 2006, at 4:09 PM, Alvaro Herrera wrote:


Bruce Momjian wrote:

I am thinking we will portray 8.2 as a release focused on usability
improvements.  We do have a few large features, like perhaps bit-mapped
indexes, but in general, the release has a lot of additions that make
things easier to use, rather than adding new features or performance.


Hum, haven't we seen a lot of performance improvements committed?  I'd
say 8.2 is performance + usability :-)  It's true that we don't have as
many major features in 8.2 as compared to 8.1 or 8.0.

--
Alvaro Herrera                         http://www.CommandPrompt.com/

PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Rick Gigger
Just out of curiosity, Mark, didn't you write your session daemon so
that you don't have to put sessions in postgres anymore?  Or are you
just giving that as an example of a very wide, very heavily updated
table?  My session tables have been an extreme case of this problem,
but no other table that I have is so adversely affected by this
behavior.  My decision was not to pull postgres out entirely, just
to use other session handlers.


Rick

On Jun 22, 2006, at 7:59 AM, Mark Woodward wrote:


After a long battle with technology, [EMAIL PROTECTED] (Mark
Woodward), an earthling, wrote:
Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward)  
mumbled into

her beard:

[snip]


1.  The index points to all the versions, until they get vacuumed out.

It can't point to all versions, it points to the last current version
as updated by vacuum, or the first version of the row.


No, it points to *all* the versions.

Suppose I take a table with two rows:

INFO:  analyzing public.test
INFO:  test: 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Then, over and over, I remove and insert one entry with the same PK:

sample=# delete from test where id = 2;insert into test (id)  
values (2);

DELETE 1


[snip]


Now, I vacuum it.

sample=# vacuum verbose analyze test;
INFO:  vacuuming public.test
INFO:  index test_id_key now contains 2 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  test: removed 10 row versions in 1 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  test: found 10 removable, 2 nonremovable row versions in  
1 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing public.test
INFO:  test: 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Notice that the index contained 10 versions of that one row.

It pointed to *ALL* the versions.


Hmm, OK, then the problem is more serious than I suspected.
This means that every index on a row has to be updated on every
transaction that modifies that row. Is that correct?

I am attaching some code that shows the problem with regard to
applications such as web server session management, when run, each  
second
the system can handle fewer and fewer connections. Here is a brief  
output:


[EMAIL PROTECTED]:~/pgfoo$ ./footest
1307 sessions per second, elapsed: 1
1292 sessions per second, elapsed: 2
1287 sessions per second, elapsed: 3

1216 sessions per second, elapsed: 25
1213 sessions per second, elapsed: 26
1208 sessions per second, elapsed: 27

1192 sessions per second, elapsed: 36
1184 sessions per second, elapsed: 37
1183 sessions per second, elapsed: 38

1164 sessions per second, elapsed: 58
1170 sessions per second, elapsed: 59
1168 sessions per second, elapsed: 60

As you can see, in about a minute at high load, this very simple table
lost about 10% of its performance, and I've seen worse based on update
frequency.  Before you say this is an obscure problem, I can tell you it
isn't. I have worked with more than a few projects that had to switch
away from PostgreSQL because of this behavior.

Obviously this is not a problem with small sites, but this is a real
problem with an enterprise level web site with millions of visitors and
actions a day. Quite frankly it is a classic example of something that
does not scale. The more and more updates there are, the higher the load
becomes. You can see it on top as the footest program runs.

There has to be a more linear way of handling this scenario.
footest.c

---(end of  
broadcast)---

TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Rick Gigger


On Jun 22, 2006, at 10:33 AM, Mark Woodward wrote:

On a fine day, Thu, 2006-06-22 at 09:59, Mark Woodward wrote:

After a long battle with technology, [EMAIL PROTECTED] (Mark
Woodward), an earthling, wrote:
Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward)  
mumbled

into

It pointed to *ALL* the versions.


Hmm, OK, then the problem is more serious than I suspected.
This means that every index on a row has to be updated on every
transaction that modifies that row. Is that correct?


Yes.


I am attaching some code that shows the problem with regard to
applications such as web server session management, when run, each
second
the system can handle fewer and fewer connections. Here is a brief
output:

[EMAIL PROTECTED]:~/pgfoo$ ./footest
1307 sessions per second, elapsed: 1
1292 sessions per second, elapsed: 2
1287 sessions per second, elapsed: 3

1216 sessions per second, elapsed: 25
1213 sessions per second, elapsed: 26
1208 sessions per second, elapsed: 27

1192 sessions per second, elapsed: 36
1184 sessions per second, elapsed: 37
1183 sessions per second, elapsed: 38

1164 sessions per second, elapsed: 58
1170 sessions per second, elapsed: 59
1168 sessions per second, elapsed: 60

As you can see, in about a minute at high load, this very simple  
table
lost about 10% of its performance, and I've seen worse based on  
update
frequency.  Before you say this is an obscure problem, I can tell  
you it
isn't. I have worked with more than a few projects that had to  
switch

away
from PostgreSQL because of this behavior.


You mean systems that are designed so exactly, that they can't  
take 10%

performance change ?


No, that's not really the point, performance degrades over time, in  
one

minute it degraded 10%.

The update to session ratio has a HUGE impact on PostgreSQL. If you  
have a

thousand active sessions, it may take a minute to degrade 10% assuming
some level of active vs operations per session per action.


How big are your sessions?  Running with about 1000 sessions, running
vacuum on just the session table is so fast it is barely noticeable.
Vacuuming my session table every 5 minutes keeps it very, very small
and easy to vacuum, and performance degradation is not an issue.  I
could probably do it every minute if I had to and it would be fine.
But my sessions are only about 5k on average.
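
For concreteness, the frequent, targeted vacuum I mean is just
something like this run from cron every few minutes (the table name
sessions here is hypothetical, whatever your session handler uses):

VACUUM ANALYZE sessions;

-- optional: keep an eye on how big the table and its indexes stay
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname LIKE 'sessions%';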


What is a bigger concern for me is the massive amount of writes to
the disk that happen in postgres to make sure the data is safe.  It's
just a waste of disk bandwidth for data that is transient anyway.


To me postgres (and RDBMSs in general) is just not good for handling
session data for web apps.  Once again, isn't that why you wrote
mcache?




If an active user causes a session update once a second, that is  
not too
bad, but if an active user updates a session more often, then it is  
worse.


Generally speaking, sessions aren't updated when they change, they are
usually updated per HTTP request. The data in a session may not  
change,
but the session handling code doesn't know this and simply updates  
anyway.


This problem is more or less specific to PHP, no?  Because it reads
the whole session on session open and writes the whole thing on
close.  Because of this I am looking into smarter ways of handling
sessions.  Because yes, the session data RARELY changes.  I am
looking into ways of only doing updates when the data changes.  In
fact, for a very similar problem, where I had tons of tiny requests
coming in that would NEVER alter the sessions, I skipped
session_start and used my own session_touch function to update the
timestamp on the session, because that's all I needed to do.  It
saved TONS of wasted overhead.
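
Roughly, the difference looks like this in SQL, with a hypothetical
sessions table holding a serialized data column and a last_active
timestamp (all the names are made up for illustration):

-- what the stock PHP handler effectively does on every request:
UPDATE sessions
SET data = '...entire serialized session...', last_active = now()
WHERE session_id = 'abc123';

-- the touch-only path for requests that never change the session:
UPDATE sessions
SET last_active = now()
WHERE session_id = 'abc123';

Both still create a new row version under MVCC, but the second at
least avoids shipping the whole session blob over the wire again on
every request (and, when the blob is TOASTed, avoids rewriting it).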


I don't mean to get off topic, but it seems like these sorts of
problems are better solved outside of postgres.  I think your session
daemon is in fact the right approach here.  If you have other tables
with similar problems that is one thing, but if it is just PHP
session tables then I think we need to find a better use case before
looking into this.




In a heavily AJAX site, you may have many smaller HTTP requests  
returning
items in a page. So, a single page may consist of multiple HTTP  
requests.
Worse yet, as a user drags an image around, there are lots of  
background
requests being made. Each request typically means a session lookup  
and a
session update. This is compounded by the number of active users.  
Since

the object of a site is to have many active users, this is always a
problem. It is less intrusive now that non-locking vacuum is there,  
but

that doesn't mean it isn't a problem.


Once again, I think that to run an enterprise app (and by that I mean
one that scales well) you need enterprise-class session management.
The PHP model is not good for this, and neither is postgres.  It's
just not the right tool for this job in my opinion.  I would think
you could

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Rick Gigger


On Jun 22, 2006, at 2:36 PM, Mark Woodward wrote:



What you seem not to grasp at this point is a large web-farm,  
about 10

or
more servers running PHP, Java, ASP, or even perl. The database is
usually
the most convenient and, aside from the particular issue we are  
talking

about, best suited.


	The answer is sticky sessions : each user is assigned to one and  
only one
webserver in the cluster and his session is maintained locally, in  
RAM. No

locks, no need to manage distributed session...

I actually have a good number of years of experience in this  
topic, and
memcached or file system files are NOT the best solutions for a  
server

farm.


	If sessions are distributed, certainly, but if sessions are  
sticky to

their own server ?


And what if a particular server goes down?  Or gets too high a
percentage of the load?


Yes, I don't think that sticky sessions are the answer.  But PHP's
session handling behavior could be greatly improved upon.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Blog post on EnterpriseDB...maybe off topic

2006-02-16 Thread Rick Gigger
Any comments on this?  Is he referring to EnterpriseDB extensions  
that

they don't make public?


I've noticed a lot of press lately is mentioning their name next to  
ingres

as an alternative to MySQL, so the MySQL folks might be feeling some
Postgres heat from their direction.

I also wonder where their project is too - they seem publicly  
opaque about
progress, etc.  From the web site's statements it looks like  
they've written

a tool to tune the postgresql.conf file from which they claim a 50%
speed-up, but that's not new or unique fork-level functionality.


What they don't say is whether that is a 50% speed-up from the
default settings or a 50% improvement over a carefully hand-tuned
file.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_hba.conf alternative

2006-02-13 Thread Rick Gigger

how? is there some kernel patch to completely enable you to deny
access to root?
Tino Wildenhain pointed out SELinux has a feature like that.


I still don't get your problem (apart from that you can always
google for SELinux).

Why aren't the other admins trustworthy?  And why do you
have many of them?  If they only check logs and create users,
why do they have to be admins?  They could use carefully
configured sudo as well to fulfill their tasks w/o full
access to the system.

I'd say, grep your problem at the root (literally spoken)


Yes.  Exactly.  I guess I misunderstood the situation.  Admin is a
vague word.  It could mean DB admins, it could mean a system
administrator for that computer, etc.  I apologize if that was
specified earlier in the discussion.  I just assumed that if you
didn't want them to be able to edit the conf file then they wouldn't
have root, because, well... that just seems obvious.  I realize
though that you don't need real security but rather a small barrier
to give the management the warm fuzzies.


I'm sure that you have your reasons, but if you could make them
non-root users and give them the privileges they need via sudo or
something, while not giving them permissions on the hba file, that
would seem to be a better solution all around than compiling your own
custom postgres.


Just a suggestion.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] pg_hba.conf alternative

2006-02-10 Thread Rick Gigger
But why do they need access to the files in the file system?  Why not
put them on the local box but not give them permission to edit the
pg_hba file?  They should still be able to connect.


On Feb 9, 2006, at 5:56 PM, Q Beukes wrote:

I did consider that, but the software we use (which again uses  
postgresql)

atm only supports local connection to the database.

I am the database admin, the other admins just manage stuff like user
accounts,
checking logs, etc...

Unfortunately there is no other way to set it up, and like I mentioned
government security is not required.

I did however statically code the pg_hba.conf file into pg binaries.

The only way I found to access the db now would be to replace the
binary and possibly sniff traffic.  But we're not worried about that.
They're not really criminally minded people.

thx for everyones help anyway ;


korry wrote:

Why would you not simply set this up on a seperate machine to  
which only
the trusted admins had access? Most data centers I am familiar  
with use
single purpose machines anyway. If someone is trusted as root on  
your
box they can screw you no matter what you do. Pretending  
otherwise is

just folly.




Agreed - that would be a much better (easier and more secure)  
solution where

practical.

-- Korry

---(end of  
broadcast)---

TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq






---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] PostgreSQL 8.0.6 crash

2006-02-09 Thread Rick Gigger


On Feb 9, 2006, at 11:22 AM, Stephen Frost wrote:


* Tom Lane ([EMAIL PROTECTED]) wrote:

Mark Woodward [EMAIL PROTECTED] writes:
Again, regardless of OS used, hashagg will exceed working  
memory as

defined in postgresql.conf.


So?  If you've got OOM kill enabled, it can zap a process whether  
it's
strictly adhered to work_mem or not.  The OOM killer is entirely  
capable

of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).


Unless I've missed something here, disabling the OOM killer doesn't
really solve the problem here.  What solves the problem is running
ANALYZE but it's still certainly the case that it would make some  
sense

for the Postmaster, upon realizing that it's gone well beyond its
work_mem boundary, to ideally switch plans to one which isn't going to
exceed its work_mem limit.  Less ideally, it could give up and  
issue an

error to the user instead of running the box out of memory.


So is the work_mem parameter not strictly enforced?  Is it more like
a suggestion as to what it SHOULD use and not a hard limit on how
much memory each process is ALLOWED to use?


If his work_mem is set to 1 MB and that process is using 500 MB for
tasks that are supposed to stay in work_mem, then doesn't that mean
that the limit is not really a hard limit but rather a suggestion?


Rick

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] PostgreSQL 8.0.6 crash

2006-02-09 Thread Rick Gigger


On Feb 9, 2006, at 12:49 PM, Mark Woodward wrote:


On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:

Mark Woodward [EMAIL PROTECTED] writes:
Again, regardless of OS used, hashagg will exceed working  
memory as

defined in postgresql.conf.


So?  If you've got OOM kill enabled, it can zap a process  
whether it's

strictly adhered to work_mem or not.  The OOM killer is entirely

capable

of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).


Sorry, I must strongly disagree here. The postgresql.conf  
working mem

is
a VERY IMPORTANT setting, it is intended to limit the consumption of
memory by the postgresql process. Often times PostgreSQL will  
work along


Actually, no, it's not designed for that at all.


I guess that's a matter of opinion.




side other application servers on the same system, infact, may be a
sub-part of application servers on the same system. (This is, in  
fact,

how
it is used on one of my site servers.)

Clearly, if the server will use 1000 times this number (Set for  
1024K,

but
exceeds 1G) this is broken, and it may cause other systems to  
fail or

perform very poorly.

If it is not something that can be fixed, it should be clearly
documented.


work_mem (integer)

Specifies the amount of memory to be used by internal sort
operations and hash tables before switching to temporary disk  
files.
The value is specified in kilobytes, and defaults to 1024  
kilobytes

(1 MB). Note that for a complex query, several sort or hash
operations might be running in parallel; each one will be  
allowed to
use as much memory as this value specifies before it starts to  
put
data into temporary files. Also, several running sessions  
could be
doing such operations concurrently. So the total memory used  
could

be many times the value of work_mem; it is necessary to keep this
fact in mind when choosing the value. Sort operations are used  
for

ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
joins, hash-based aggregation, and hash-based processing of IN
subqueries.

So it says right there that it's very easy to exceed work_mem by a  
very
large amount. Granted, this is a very painful problem to deal with  
and
will hopefully be changed at some point, but it's pretty clear as  
to how

this works.


Well, if you read that paragraph carefully, I'll admit that I was a
little too literal in my statement, applying it to the process and
not specific functions within the process, but in the documentation:

each one will be allowed to use as much memory as this value  
specifies

before it starts to put data into temporary files.

According to the documentation the behavior of hashagg is broken.  
It did
not use up to this amount and then start to use temporary files, it  
used

1000 times this limit and was killed by the OS.

I think it should be documented as the behavior is unpredictable.


It seems to me that the solution for THIS INCIDENT is to run an  
analyze.  That should fix the problem at hand.  I have nothing to say  
about the OOM issue except that hopefully the analyze will prevent  
him from running out of memory at all.


However, if hashagg truly does not obey the limit that is supposed to
be imposed by work_mem then it really ought to be documented.  Is
there a misunderstanding here and it really does obey it?  Or is
hashagg an exception but the other work_mem-associated operations
work fine?  Or is it possible for them all to go out of bounds?
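
To make the behavior being discussed concrete, a minimal sketch
(big_table and some_col are hypothetical; on 8.x work_mem is given in
kilobytes, and if I remember right the 7.x knob was called sort_mem):

SET work_mem = 1024;     -- 1 MB
ANALYZE big_table;       -- without stats the planner can badly
                         -- underestimate the number of groups
EXPLAIN ANALYZE
SELECT some_col, count(*)
FROM big_table
GROUP BY some_col;

The planner only picks HashAggregate when it estimates the hash table
will fit in work_mem, and as I understand it that hash table is never
spilled to disk in these releases, so a badly underestimated group
count (e.g. on a never-analyzed table) is exactly how a 1 MB setting
can balloon into hundreds of MB of actual memory use.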


Even if you've got 100 terabytes of swap space, though, it seems like
if your system is very heavy on reads then you would really want that
single backend to start using up your disk space and leave your
memory alone, so that most of your data can stay cached and largely
unaffected by the problem of one backend.


If your bottleneck is writing to the disk then it doesn't really seem
to matter.  You just need to make sure that a huge out-of-control
hashagg never occurs.  If your disks get saturated with writes
because of the hashagg of one backend, then all other processes that
need to write a lot of info to disk are going to come to a grinding
halt, queries are not going to complete quickly, they will build up,
and you will have a huge mess on your hands that will essentially
prevent postgres from being able to do its job even if it doesn't
actually die.  In this situation disk bandwidth is a scarce
commodity, and whether you let the OS handle it all with virtual
memory or you let postgres swap everything out to disk for that one
operation, you are still using disk to make up for a lack of RAM.  At
some point you've either got to stock up on enough RAM to run your
queries properly or alter how your queries run to use less RAM.
Having a process go out of control in resource usage is going to
cause big problems one way or another.



Re: [HACKERS] [pgsql-advocacy] Audio interview

2006-02-08 Thread Rick Gigger

On Feb 8, 2006, at 7:00 AM, Bruce Momjian wrote:


David Fetter wrote:

On Tue, Feb 07, 2006 at 11:43:40PM -0500, Bruce Momjian wrote:

I did an audio interview today, and it is online now:

  http://bsdtalk.blogspot.com/2006/02/bsdtalk015-interview-with- 
postgresql.html


Great interview.  You hit a lot of the high points :)

You mentioned in-place upgrade scripts.  Are those in contrib/
somewhere?  On GBorg?  On PgFoundry?  If not, could you put them


/contrib/pgupgrade


somewhere?  As far as converting them from shell to Perl, I'm sure
you'll find a flock of volunteers to help.


Yea, but the problem with modifying the disk pages is still a problem.


Maybe this is totally crazy, but for those who are not using slony
but are using incremental backup and want to upgrade without doing a
time-consuming dump/reload (this is not actually a problem for me, as
my data is not so large that a dump/reload is a huge problem): would
it be possible to apply pgupgrade to the physical backup before you
restore, and then also alter each WAL record as it is restored so
that it writes all new pages in the new format?


Then you could do all the work on a different box and quickly switch  
over to it after the restore is complete.  You could eliminate most  
of the downtime.


Is that even feasible?  Not something that would help me now but it  
might make some people very happy (and maybe someday I will need it  
as well.)


Rick

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [pgsql-advocacy] PGUpgrade WAS: Audio interview

2006-02-08 Thread Rick Gigger


On Feb 8, 2006, at 12:55 PM, Josh Berkus wrote:


Andrew,

This would be a very fine project for someone to pick up (maybe  
one of the corporate supporters could sponsor someone to work on it?)


We looked at it for Greenplum but just couldn't justify putting it  
near the top of the priority list.  The work/payoff ratio is terrible.


One justification for in-place upgrades is to be faster than dump/ 
reload.  However, if we're assuming the possibility of new/modified  
header fields which could then cause page splits on pages which are  
90% capacity,  then this time savings would be on the order of no  
more than 50% of load time, not the 90% of load time required to  
justify the programming effort involved -- especially when you take
into account needing to provide multiple conversions, e.g.
7.3 --> 8.1, 7.4 --> 8.1, etc.


I just posted an idea for first upgrading the physical backup of the
data directory that you would create when doing online backups, and
then also altering the WAL records as they are applied during
recovery.  That way the actual load time might still be huge, but
since it could run in parallel with the running server it would
probably eliminate 99% of the downtime.  Would that be worth the
effort?


Also all the heavy lifting could be offloaded to a separate box while  
your production server just keeps running unaffected.


The second reason for in-place upgrade is for large databases where  
the owner does not have enough disk space for two complete copies  
of the database.  Again, this is not solvable; if we want in-place  
upgrade to be fault-tolerant, then we need the doubled disk space  
anyway (you could do a certain amount with compression, but you'd  
still need 150%-175% space so it's not much help).


Yeah, anyone who has so much data that they need this feature but  
isn't willing to back it up is crazy.  Plus disk space is cheap.


Overall, it would be both easier and more effective to write a  
Slony automation wrapper which does the replication, population,  
and switchover for you.


Now that is something that I would actually use.  I think that a
little bit of automation would greatly increase the number of users
using Slony.


Rick


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and

2006-02-06 Thread Rick Gigger
I was thinking the exact same thing, except for the "and just
fsync() dirty pages on commit" part.  Wouldn't that actually make the
situation worse?  I thought the whole point of WAL was that it was
more efficient to fsync all of the changes in one sequential write to
one file rather than fsyncing all of the separate dirty pages.




On Feb 6, 2006, at 7:24 PM, Christopher Kings-Lynne wrote:


* Allow WAL logging to be turned off for a table, but the table
  might be dropped or truncated during crash recovery [walcontrol]

  Allow tables to bypass WAL writes and just fsync() dirty pages on
  commit.  This should be implemented using ALTER TABLE, e.g. ALTER
  TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ].  Tables using
  non-default logging should not use referential integrity with
  default-logging tables.  A table without dirty buffers during a
  crash could perhaps avoid the drop/truncate.


This would be such a sweet feature for website session tables...

Chris


---(end of  
broadcast)---

TIP 2: Don't 'kill -9' the postmaster




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and

2006-02-06 Thread Rick Gigger

Rick Gigger wrote:

I was thinking the exact same thing.  Except  the and just fsync()
dirty pages on commit part.  Wouldn't that actually make the
situation worse?  I thought the whole point of WAL was that it was
more efficient to fsync all of the changes in one sequential write in
one file rather than fsyncing all of the separate dirty pages.


Uh, supposedly the WAL traffic is not as efficient as fsyncing whole
pages if you are doing lots of full pages.


So then you would want to use this particular feature only when
updating/inserting large amounts of info at a time?


For instance, if you have a table with rows that are 12 bytes wide,
using this feature would be very bad, because it would fsync out a
whole bunch of extra data just to get those 12 bytes written.  But on
a table that wrote out several K of data at a time it would help,
because it would be filling up entire pages and not doing any
wasteful fsyncing?
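
Rough numbers, assuming the standard 8 KB block size and ignoring
tuple and page overhead:

  8192 bytes fsync'd per dirty page / 12 bytes of new row data    = ~680x write amplification
  8192 bytes fsync'd per dirty page / ~4096 bytes of new row data = ~2x

So the fsync-dirty-pages-on-commit approach only starts to look
attractive once rows are a sizeable fraction of a page.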


I guess that probably would help session tables because it would  
avoid writing the data twice.


In the case of session tables, though, I for one don't care if that
data is recoverable or not.  If my database just crashed I've
probably got bigger problems than just dropped sessions.  Would it be
possible to a) not WAL-log that table, and b) not fsync that table at
all?  Then if that table is in an inconsistent state, just recreate
it?


I'm guessing that is getting a little too crazy but I have been  
thinking about setting up a separate postgres instance listening on a  
different port with fsync off as a possible way to avoid the fsync  
overhead for the sessions.
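
If I do try it, the sketch is just a second cluster (separate initdb
and data directory) whose postgresql.conf differs from the main one
in something like the following (the port number is arbitrary):

port = 5433
fsync = off        # session data is disposable; nothing here needs
                   # to survive a crash

The obvious caveat is that an OS crash can leave that whole cluster
corrupt, which is only acceptable because the sessions can simply be
recreated.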



-- 
-





On Feb 6, 2006, at 7:24 PM, Christopher Kings-Lynne wrote:


* Allow WAL logging to be turned off for a table, but the table
  might be dropped or truncated during crash recovery [walcontrol]

	  Allow tables to bypass WAL writes and just fsync() dirty  
pages on
	  commit.  This should be implemented using ALTER TABLE, e.g.  
ALTER

  TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ].  Tables using
  non-default logging should not use referential integrity with
  default-logging tables.  A table without dirty buffers during a
  crash could perhaps avoid the drop/truncate.


This would be such a sweet feature for website session tables...

Chris


---(end of
broadcast)---
TIP 2: Don't 'kill -9' the postmaster




---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org



--
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square,  
Pennsylvania 19073


---(end of  
broadcast)---

TIP 5: don't forget to increase your free space map settings




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Multiple logical databases

2006-02-03 Thread Rick Gigger


On Feb 3, 2006, at 6:47 AM, Chris Campbell wrote:


On Feb 3, 2006, at 08:05, Mark Woodward wrote:

Using the /etc/hosts file or DNS to maintain host locations for  
is a
fairly common and well known practice, but there is no such  
mechanism for
ports. The problem now becomes a code issue, not a system  
administration

issue.


What if you assigned multiple IPs to a machine, then used ipfw (or  
something) to forward connections to port 5432 for each IP to the  
proper IP and port?


If he had multiple IPs, couldn't he just make each postmaster listen
only on one specific IP (instead of '*') and just use the default
port?
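
For what it's worth, a sketch of that with 8.x settings (the
addresses here are made up): each cluster's postgresql.conf gets its
own address, something like

# cluster A
listen_addresses = '192.0.2.10'
port = 5432

# cluster B
listen_addresses = '192.0.2.11'
port = 5432

so each logical cluster answers on the default port of its own
address and clients never have to track non-standard ports.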


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] New project launched : PostgreSQL GUI Installer for

2006-01-31 Thread Rick Gigger

On Jan 31, 2006, at 12:54 AM, Tino Wildenhain wrote:


Rick Gigger schrieb:
I don't see why anyone has a problem with this.  I am certainly  
never  going to use it but if it helps someone who isn't a linux  
person to  use it on a project when they would have used something  
else (like  mysql) or if it convinces someone to run postgres on  
linux instead of  windows because they now have a graphical  
installer on linux then it  seems like a good thing to me.  More  
users = bigger community =  larger potential pool of people to  
help out.  Even if people can't  code they can answer newbie (or  
advanced) questions on the mailing  lists or write documentation  
or even just tell their dba friends  about it.
The more people using postgres the better.  If this will help  
then I'm all for it.  Just because I would rather do a ./configure,
make, make install doesn't mean that that's the best route for
everyone.


As was said, a gui to produce postgresql.conf files (off host)
can be of value. Also for the tune-people a package builder
can be useful too.

For other people - if they don't learn a bit about their package
system on their chosen system - they will run into other problems
sooner or later.


Why would they necessarily have to run into problems with their
packaging system?  If someone installs from source it doesn't cause
problems with packaging systems.  Why should this have to be any
different?




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] New project launched : PostgreSQL GUI Installer for

2006-01-30 Thread Rick Gigger
I don't see why anyone has a problem with this.  I am certainly never  
going to use it but if it helps someone who isn't a linux person to  
use it on a project when they would have used something else (like  
mysql) or if it convinces someone to run postgres on linux instead of  
windows because they now have a graphical installer on linux then it  
seems like a good thing to me.  More users = bigger community =  
larger potential pool of people to help out.  Even if people can't  
code they can answer newbie (or advanced) questions on the mailing  
lists or write documentation or even just tell their dba friends  
about it.


The more people using postgres the better.  If this will help then  
I'm all for it.  Just because I would rather do a ./configure, make,
make install doesn't mean that that's the best route for everyone.


Rick


On Jan 30, 2006, at 8:58 PM, Marc G. Fournier wrote:


On Mon, 30 Jan 2006, Joshua D. Drake wrote:


On my Debian systems, I can install PostgreSQL quite readily via the
command apt-get install postgresql-8.1, which can get GUIed at  
least

somewhat if I run aptitude, synaptic, or such...

Yes Christopher, you can... I can, and Devrim can

As more and more people come on board people are going to want to  
download a .exe (a metaphor),
double click and have it open an installer, they will then want to  
click next, next, continue, finish.


You don't get that with apt-get install.

There is a reason that even Oracle has a graphical installer on  
Linux, because most people installing

the software:

A. Don't know how to use it
B. Probably don't know how to use Linux
C. Don't want to.


i can't agree more ... I don't care whether you are running FreeBSD
or Linux or Solaris ... if you want broader adoption of non-Microsoft
OSs, you have to make it simpler for 'the masses' to make use of ...
and GUIs tend to follow KISS very closely ...



Marc G. Fournier   Hub.Org Networking Services (http:// 
www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ:  
7615664


---(end of  
broadcast)---

TIP 2: Don't 'kill -9' the postmaster




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Working happily on 8.1 (Was: panic on 7.3)

2006-01-21 Thread Rick Gigger
While this little bug was a menace to me at a bad time, my biggest
problem was that I didn't have a good enough vacuum strategy and my
reasonably sized database became the size of the world.  At that
point it couldn't be vacuumed without jamming up the whole server.
So I have some questions.  If this is the wrong place let me know and
I will submit it to general.


1) What about these settings?  It is a dual 2.8 GHz Xeon box with six
15,000 RPM SCSI drives in RAID 5 (I know I should be using 0+1 or
something) and 2 GB of RAM.


max_connections = 1024
shared_buffers = 15000
work_mem = 1024
maintenance_work_mem = 10
max_fsm_pages = 100
checkpoint_segments = 10
checkpoint_timeout = 1000
effective_cache_size = 5

My base directory is 618 MB.  All other performance-related settings
I left at the defaults.


I know it depends on my data set and load etc., but it would be great
if someone could tell me if anything in there is a little crazy.  The
max_fsm_pages setting seemed a bit high, but I really want vacuum to
be fast and painless, and if I read everything right it still doesn't
take up much memory.
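
One way I've seen to sanity-check max_fsm_pages, assuming an 8.x
server: a database-wide VACUUM VERBOSE prints a free space map
summary at the very end of its output.

VACUUM VERBOSE;
-- the final INFO/DETAIL lines report how many FSM page slots are in
-- use, how many would be needed to track all free space, and the
-- current max_fsm_pages / max_fsm_relations limits; if "needed" is
-- close to the limit, raise max_fsm_pages (each slot costs only a
-- few bytes of shared memory)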


2) I didn't touch the vacuum delay, background writer, or autovacuum
settings because I wasn't familiar enough with them.  Are the default
values very restrictive?  I realized a little too late that leaving
some of the 7.3 defaults in place came back to bite me when my load
went up.  Since these are performance-enhancing features that didn't
exist in older versions, I figured that the defaults would still be
better than 7.3 without those features.  Or are the defaults too
conservative and I need to change them ASAP?


3) Several times there were backends running that were just bringing
down the system.  Is there a way to signal a single backend to die
without restarting the whole db server?  I looked on Google, searched
the archives and the docs, and couldn't find any way to do this.


Thanks again,

Rick


On Jan 21, 2006, at 12:05 AM, Rick Gigger wrote:


Thanks very much!

I've decided to go straight to 8.1 though.  There are just too many
performance improvements at this point that I might regret not
having, and I don't want to do a dump/reload again.  I am about to
compile it now.  If it isn't a panic-grade failure in the latest
8.1 code then I'd just as soon take the stock release source code.
I don't care at all if this kills one connection at the ultra-low
frequency with which it occurs, but what I can't have is the whole
server rebooting itself in the middle of processing hundreds of
transactions.  Once that happens all of the web clients hang onto
their bad connections and then eventually die.  Considering that
I'm moving to 8.1 and am not too familiar with applying patches, am
I crazy for just going with the stock 8.1 code?


On Jan 20, 2006, at 10:36 PM, Tom Lane wrote:


Rick Gigger [EMAIL PROTECTED] writes:

I don't know if 2K could have passed since the last checkpoint.
...
now that I think about it I was getting about 400 page requests /
minute and each of those would have been doing at least 2
transactions so yes, I guess that is very likely.


Good, 'cause if you didn't have a couple thousand transactions  
between

checkpoints then we need another theory ;-)


You realize of course that that's pretty old ...



Yes.  I will be upgrading immediately.


You'll want to include this patch:
http://archives.postgresql.org/pgsql-committers/2006-01/msg00289.php
(or see adjacent messages if you plan to move to something newer than
7.3.*).  We probably will not put out another set of releases until
next month, unless something really big comes along.  This one  
doesn't

qualify as really big IMHO, because it's not a PANIC-grade failure in
the later branches.  But having been burnt once, I'm sure you'll want
a patched copy ...

regards, tom lane




---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

  http://archives.postgresql.org




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Working happily on 8.1 (Was: panic on 7.3)

2006-01-21 Thread Rick Gigger

Rick Gigger [EMAIL PROTECTED] writes:

2) I didn't touch the Vacuum delay, background writer or autovacuum
settings because I wasn't familiar enough with them.  Are the default
values very restricting?


By default, autovacuum isn't even turned on --- you have to enable it
and also stats_row_level if you want to use autovac.  I don't have
enough experience with it to say whether the other settings are
adequate.


Yes, I realized this not long after I started things up, so I will
have to wait until a time when I can restart postgres to try it out.
For now I have come up with something that I think will work fine.
Vacuum seems to be about a million times faster now due to a number
of factors.  I am going to keep a close eye on the sessions table,
making sure that it can't start getting bloated again, and I think
I'll be OK.  It's a Saturday though, so we'll really see how it holds
up on Monday.





3) Several times there were backends  running that were just bringing
down the system.  Is there a way to signal a single backend to die
without restarting the whole db server?


SIGINT (ie query cancel) is safe enough.  If that doesn't work  
within a

few seconds, try SIGTERM (there is controversy over how safe that is,
but people do use it).
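
For the archives, a sketch of doing that from SQL rather than the
shell, assuming an 8.x server where pg_cancel_backend() is available
(and stats_command_string turned on if you want current_query
populated); the pid below is made up:

SELECT procpid, usename, current_query
FROM pg_stat_activity;              -- find the pid of the runaway backend

SELECT pg_cancel_backend(12345);    -- send it a query cancel (SIGINT)

Anything stronger (SIGTERM) still means kill from the shell in these
versions, with the caveats above.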


Thanks again!

Rick

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] panic on 7.3

2006-01-20 Thread Rick Gigger

I got this message:

2006-01-20 11:50:51 PANIC:  creation of file /var/lib/pgsql/data/ 
pg_clog/0292 failed: File exists


In 7.3.  It caused the server to restart.

Can anyone tell me what it means?

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] panic on 7.3

2006-01-20 Thread Rick Gigger

Postgres version 7.3.4

... a whole bunch of other files
-rw---  1 postgres postgres 262144 Jan 18 22:42 027D
-rw---  1 postgres postgres 262144 Jan 19 07:38 027E
-rw---  1 postgres postgres 262144 Jan 19 08:25 027F
-rw---  1 postgres postgres 262144 Jan 19 09:07 0280
-rw---  1 postgres postgres 262144 Jan 19 09:59 0281
-rw---  1 postgres postgres 262144 Jan 19 11:07 0282
-rw---  1 postgres postgres 262144 Jan 19 12:22 0283
-rw---  1 postgres postgres 262144 Jan 19 13:29 0284
-rw---  1 postgres postgres 262144 Jan 19 14:26 0285
-rw---  1 postgres postgres 262144 Jan 19 15:58 0286
-rw---  1 postgres postgres 262144 Jan 19 19:55 0287
-rw---  1 postgres postgres 262144 Jan 19 23:47 0288
-rw---  1 postgres postgres 262144 Jan 20 02:35 0289
-rw---  1 postgres postgres 262144 Jan 20 04:21 028A
-rw---  1 postgres postgres 262144 Jan 20 06:16 028B
-rw---  1 postgres postgres 262144 Jan 20 07:20 028C
-rw---  1 postgres postgres 262144 Jan 20 08:22 028D
-rw---  1 postgres postgres 262144 Jan 20 09:24 028E
-rw---  1 postgres postgres 262144 Jan 20 10:24 028F
-rw---  1 postgres postgres 262144 Jan 20 11:04 0290
-rw---  1 postgres postgres 262144 Jan 20 11:50 0291
-rw---  1 postgres postgres 262144 Jan 20 13:27 0292
-rw---  1 postgres postgres 262144 Jan 20 14:24 0293
-rw---  1 postgres postgres 262144 Jan 20 14:53 0294
-rw---  1 postgres postgres 262144 Jan 20 17:10 0295

That is right now.  Right after it started up it went up to 0292.  
There are a lot of files before the ones listed here right now  
though.  Do you need to see their names?



On Jan 20, 2006, at 3:58 PM, Tom Lane wrote:


Rick Gigger [EMAIL PROTECTED] writes:

I got this message:
2006-01-20 11:50:51 PANIC:  creation of file /var/lib/pgsql/data/
pg_clog/0292 failed: File exists



In 7.3.  It caused the server to restart.



Can anyone tell me what it means?


7.3.what?

What file names exist in the pg_clog directory?

regards, tom lane

---(end of  
broadcast)---

TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that  
your

   message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] panic on 7.3

2006-01-20 Thread Rick Gigger
It is the version that shipped with fedora core 1.  The version  
string from psql is (PostgreSQL) 7.3.4-RH.  I assume that it must  
have been the first bug since I had plenty of disk space.



On Jan 20, 2006, at 5:35 PM, Bruce Momjian wrote:


Rick Gigger wrote:

Postgres version 7.3.4

... a whole bunch of other files
-rw---  1 postgres postgres 262144 Jan 18 22:42 027D
-rw---  1 postgres postgres 262144 Jan 19 07:38 027E
-rw---  1 postgres postgres 262144 Jan 19 08:25 027F
-rw---  1 postgres postgres 262144 Jan 19 09:07 0280
-rw---  1 postgres postgres 262144 Jan 19 09:59 0281
-rw---  1 postgres postgres 262144 Jan 19 11:07 0282
-rw---  1 postgres postgres 262144 Jan 19 12:22 0283
-rw---  1 postgres postgres 262144 Jan 19 13:29 0284
-rw---  1 postgres postgres 262144 Jan 19 14:26 0285
-rw---  1 postgres postgres 262144 Jan 19 15:58 0286
-rw---  1 postgres postgres 262144 Jan 19 19:55 0287
-rw---  1 postgres postgres 262144 Jan 19 23:47 0288
-rw---  1 postgres postgres 262144 Jan 20 02:35 0289
-rw---  1 postgres postgres 262144 Jan 20 04:21 028A
-rw---  1 postgres postgres 262144 Jan 20 06:16 028B
-rw---  1 postgres postgres 262144 Jan 20 07:20 028C
-rw---  1 postgres postgres 262144 Jan 20 08:22 028D
-rw---  1 postgres postgres 262144 Jan 20 09:24 028E
-rw---  1 postgres postgres 262144 Jan 20 10:24 028F
-rw---  1 postgres postgres 262144 Jan 20 11:04 0290
-rw---  1 postgres postgres 262144 Jan 20 11:50 0291
-rw---  1 postgres postgres 262144 Jan 20 13:27 0292
-rw---  1 postgres postgres 262144 Jan 20 14:24 0293
-rw---  1 postgres postgres 262144 Jan 20 14:53 0294
-rw---  1 postgres postgres 262144 Jan 20 17:10 0295

That is right now.  Right after it started up it went up to 0292.
There are a lot of files before the ones listed here right now
though.  Do you need to see their names?


I assume you are missing one of these fixes in 7.3.X current which  
were

done _after_ 7.3.4 was released:

 * Fix race condition in transaction log management
   There was a narrow window in which an I/O operation could be
   initiated for the wrong page, leading to an Assert
   failure or data corruption.

 * Repair incorrect order of operations in GetNewTransactionId()
   This bug could result in failure under out-of-disk-space
   conditions, including inability to restart even after
   disk space is freed.

-- 
-






On Jan 20, 2006, at 3:58 PM, Tom Lane wrote:


Rick Gigger [EMAIL PROTECTED] writes:

I got this message:
2006-01-20 11:50:51 PANIC:  creation of file /var/lib/pgsql/data/
pg_clog/0292 failed: File exists



In 7.3.  It caused the server to restart.



Can anyone tell me what it means?


7.3.what?

What file names exist in the pg_clog directory?

regards, tom lane

---(end of
broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that
your
   message can get through to the mailing list cleanly




---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org



--
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square,  
Pennsylvania 19073


---(end of  
broadcast)---

TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] panic on 7.3

2006-01-20 Thread Rick Gigger

Thanks.  I'm not quite sure what version I am going to upgrade to yet.

Rick

On Jan 20, 2006, at 5:59 PM, Andrew Dunstan wrote:



Updates for FC1 are available here:

http://download.fedoralegacy.org/fedora/1/updates/i386/

they have 7.3.9 dated in March last year.

Or grab the source for 7.3.13 and build it yourself.

cheers

andrew

Rick Gigger wrote:

It is the version that shipped with fedora core 1.  The version   
string from psql is (PostgreSQL) 7.3.4-RH.  I assume that it must   
have been the first bug since I had plenty of disk space.



On Jan 20, 2006, at 5:35 PM, Bruce Momjian wrote:


Rick Gigger wrote:


Postgres version 7.3.4

... a whole bunch of other files
-rw---  1 postgres postgres 262144 Jan 18 22:42 027D
-rw---  1 postgres postgres 262144 Jan 19 07:38 027E
-rw---  1 postgres postgres 262144 Jan 19 08:25 027F
-rw---  1 postgres postgres 262144 Jan 19 09:07 0280
-rw---  1 postgres postgres 262144 Jan 19 09:59 0281
-rw---  1 postgres postgres 262144 Jan 19 11:07 0282
-rw---  1 postgres postgres 262144 Jan 19 12:22 0283
-rw---  1 postgres postgres 262144 Jan 19 13:29 0284
-rw---  1 postgres postgres 262144 Jan 19 14:26 0285
-rw---  1 postgres postgres 262144 Jan 19 15:58 0286
-rw---  1 postgres postgres 262144 Jan 19 19:55 0287
-rw---  1 postgres postgres 262144 Jan 19 23:47 0288
-rw---  1 postgres postgres 262144 Jan 20 02:35 0289
-rw---  1 postgres postgres 262144 Jan 20 04:21 028A
-rw---  1 postgres postgres 262144 Jan 20 06:16 028B
-rw---  1 postgres postgres 262144 Jan 20 07:20 028C
-rw---  1 postgres postgres 262144 Jan 20 08:22 028D
-rw---  1 postgres postgres 262144 Jan 20 09:24 028E
-rw---  1 postgres postgres 262144 Jan 20 10:24 028F
-rw---  1 postgres postgres 262144 Jan 20 11:04 0290
-rw---  1 postgres postgres 262144 Jan 20 11:50 0291
-rw---  1 postgres postgres 262144 Jan 20 13:27 0292
-rw---  1 postgres postgres 262144 Jan 20 14:24 0293
-rw---  1 postgres postgres 262144 Jan 20 14:53 0294
-rw---  1 postgres postgres 262144 Jan 20 17:10 0295

That is right now.  Right after it started up it went up to 0292.
There are a lot of files before the ones listed here right now
though.  Do you need to see their names?



I assume you are missing one of these fixes in 7.3.X current  
which  were

done _after_ 7.3.4 was released:

 * Fix race condition in transaction log management
   There was a narrow window in which an I/O operation could be
   initiated for the wrong page, leading to an Assert
   failure or data corruption.

 * Repair incorrect order of operations in GetNewTransactionId()
   This bug could result in failure under out-of-disk-space
   conditions, including inability to restart even after
   disk space is freed.

 
-- -






On Jan 20, 2006, at 3:58 PM, Tom Lane wrote:


Rick Gigger [EMAIL PROTECTED] writes:


I got this message:
2006-01-20 11:50:51 PANIC:  creation of file /var/lib/pgsql/data/
pg_clog/0292 failed: File exists




In 7.3.  It caused the server to restart.




Can anyone tell me what it means?



7.3.what?

What file names exist in the pg_clog directory?

regards, tom lane

---(end of
broadcast)---
TIP 1: if posting/reading through Usenet, please send an  
appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so  
that

your
   message can get through to the mailing list cleanly




---(end of   
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org



--
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square,   
Pennsylvania 19073


---(end of   
broadcast)---

TIP 6: explain analyze is your friend




---(end of  
broadcast)---

TIP 6: explain analyze is your friend






---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] panic on 7.3

2006-01-20 Thread Rick Gigger

Rick Gigger [EMAIL PROTECTED] writes:

Postgres version 7.3.4


You realize of course that that's pretty old ...



Yes.  I will be upgrading immediately.


That is right now.  Right after it started up it went up to 0292.


So it was the latest file eh?  I thought maybe you had some problem
with a corrupted XID leading to trying to touch a clog file
out-of-order, but that seems ruled out.


Well that sounds like a good thing.




2006-01-20 11:50:51 PANIC:  creation of file /var/lib/pgsql/data/
pg_clog/0292 failed: File exists


Digging in the 7.3 sources, it seems that error message could only  
have

come from here:

fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
if (fd < 0)
{
    if (errno != ENOENT)
        elog(PANIC, "open of %s failed: %m", path);
    fd = BasicOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
                       S_IRUSR | S_IWUSR);
    if (fd < 0)
        elog(PANIC, "creation of file %s failed: %m", path);
}


Yes, I found that too (by accident with Google) but didn't really
have the slightest clue what exactly would have caused it.




AFAICS, it is simply not possible for the second open() to fail with
that errno, *unless* someone else created the same file in the
microseconds between the two open calls.

The code doing this has a lock on the particular clog buffer it's  
trying

to write out, so no-one else could be trying to write the same buffer;
but now that I look at it, it's entirely legal for someone else to be
trying to write a different clog buffer.  This leads to the following
theory:

1. The clog page that would be first in the 0292 segment got  
created in

clog buffers, but there was no reason to write it out for awhile.  (In
normal operation, only a checkpoint would be cause to write out the
frontmost page of clog.)

2. More than 2K transactions elapsed, so the page that would be second
in the 0292 segment also got set up in clog buffers.  (Rick, is the  
load

on your machine such that several thousand transactions might have
elapsed between checkpoints?)  Perhaps there were even enough
transactions so that more than two pages were dirty and pending write
in the clog buffers, yet the file hadn't been created yet.


I don't know if 2K could have passed since the last checkpoint.  Part
of the reason I haven't upgraded in so long is that it has been
running like a champ for about 3 years.  Recently, though, the load
on the site just shot through the roof.  Not only am I going to
upgrade the version of postgres, but I need to do some major tuning.
I am still using a lot of defaults.  I am using the default
checkpoint settings, but I am not sure how often checkpoints are
happening.  Actually, now that I think about it, I was getting about
400 page requests / minute and each of those would have been doing
at least 2 transactions, so yes, I guess that is very likely.
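
Rough arithmetic with those numbers, assuming the default
checkpoint_timeout of 300 seconds:

  400 page requests / minute * 2 transactions each     = ~800 transactions / minute
  800 transactions / minute * 5 minutes per checkpoint = ~4000 transactions per checkpoint

Even if checkpoint_segments forces checkpoints earlier under write
load, that still leaves it comfortably in the couple-of-thousand
range you asked about.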



3. Two different backends decided to try to write different clog pages
concurrently.  Probably one was writing the frontmost page because it
was doing a checkpoint, and another needed to read in an older clog  
page

so it had to swap out one of the other dirty buffers.

4. Kaboom.


Yeah Kaboom.  It was really bad timing too.  :)


If this theory is correct, the bug has been there since the clog code
was first written.  But the conditions for having it happen are narrow
enough that it's not too surprising we haven't seen it before.


Wow it's great to be the first.


I think that a sufficient fix might just be to remove the O_EXCL flag
from the second open() call --- ie, if someone else creates the file
in this narrow window, it should be considered OK.  Comments?


Well, just a little FYI, I don't know if any of this will help, but I
was suffering from severe table bloat.  The data in my session table
is unfortunately quite large and being updated constantly, so the
session table, its two indexes, and especially its toast table were
impossible to vacuum.  Also, the vacuum and fsm settings were the
defaults, making the problem worse.




regards, tom lane



Thanks so much for the help.

Rick

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] panic on 7.3

2006-01-20 Thread Rick Gigger


On Jan 20, 2006, at 6:02 PM, Tom Lane wrote:


Rick Gigger [EMAIL PROTECTED] writes:

Postgres version 7.3.4


You realize of course that that's pretty old ...


That is right now.  Right after it started up it went up to 0292.


So it was the latest file eh?  I thought maybe you had some problem
with a corrupted XID leading to trying to touch a clog file
out-of-order, but that seems ruled out.


2006-01-20 11:50:51 PANIC:  creation of file /var/lib/pgsql/data/
pg_clog/0292 failed: File exists


Digging in the 7.3 sources, it seems that error message could only  
have

come from here:

fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
if (fd < 0)
{
    if (errno != ENOENT)
        elog(PANIC, "open of %s failed: %m", path);
    fd = BasicOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
                       S_IRUSR | S_IWUSR);
    if (fd < 0)
        elog(PANIC, "creation of file %s failed: %m", path);
}

AFAICS, it is simply not possible for the second open() to fail with
that errno, *unless* someone else created the same file in the
microseconds between the two open calls.

The code doing this has a lock on the particular clog buffer it's  
trying

to write out, so no-one else could be trying to write the same buffer;
but now that I look at it, it's entirely legal for someone else to be
trying to write a different clog buffer.  This leads to the following
theory:

1. The clog page that would be first in the 0292 segment got  
created in

clog buffers, but there was no reason to write it out for awhile.  (In
normal operation, only a checkpoint would be cause to write out the
frontmost page of clog.)

2. More than 2K transactions elapsed, so the page that would be second
in the 0292 segment also got set up in clog buffers.  (Rick, is the  
load

on your machine such that several thousand transactions might have
elapsed between checkpoints?)  Perhaps there were even enough
transactions so that more than two pages were dirty and pending write
in the clog buffers, yet the file hadn't been created yet.


So what I think I'm getting killed on right now is the disk writes.
I was thinking of changing away from the default checkpoint
settings.  My load is going to continue to go up, so 2000+
transactions between checkpoints are going to start happening pretty
fast.  I have lots of disk space, so I was going to increase the time
between checkpoints.  Will that increase the chances of this
happening again, or was this such a strange freak-of-nature
coincidence that it can't really even happen again?  Also, I've
decided to upgrade all the way to the latest 8.1 code.



3. Two different backends decided to try to write different clog pages
concurrently.  Probably one was writing the frontmost page because it
was doing a checkpoint, and another needed to read in an older clog  
page

so it had to swap out one of the other dirty buffers.

4. Kaboom.

If this theory is correct, the bug has been there since the clog code
was first written.  But the conditions for having it happen are narrow
enough that it's not too surprising we haven't seen it before.

I think that a sufficient fix might just be to remove the O_EXCL flag
from the second open() call --- ie, if someone else creates the file
in this narrow window, it should be considered OK.  Comments?

regards, tom lane

---(end of  
broadcast)---

TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] panic on 7.3

2006-01-20 Thread Rick Gigger

Thanks very much!

I've decided to go straight to 8.1 though.  There are just too many
performance improvements at this point that I might regret not
having, and I don't want to do a dump/reload again.  I am about to
compile it now.  If it isn't a panic-grade failure in the latest
8.1 code then I'd just as soon take the stock release source code.
I don't care at all if this kills one connection at the ultra-low
frequency with which it occurs, but what I can't have is the whole
server rebooting itself in the middle of processing hundreds of
transactions.  Once that happens all of the web clients hang onto
their bad connections and then eventually die.  Considering that
I'm moving to 8.1 and am not too familiar with applying patches, am
I crazy for just going with the stock 8.1 code?


On Jan 20, 2006, at 10:36 PM, Tom Lane wrote:


Rick Gigger [EMAIL PROTECTED] writes:

I don't know if 2K could have passed since the last checkpoint.
...
now that I think about it I was getting about 400 page requests /
minute and each of those would have been doing at least 2
transactions so yes, I guess that is very likely.


Good, 'cause if you didn't have a couple thousand transactions between
checkpoints then we need another theory ;-)


You realize of course that that's pretty old ...



Yes.  I will be upgrading immediately.


You'll want to include this patch:
http://archives.postgresql.org/pgsql-committers/2006-01/msg00289.php
(or see adjacent messages if you plan to move to something newer than
7.3.*).  We probably will not put out another set of releases until
next month, unless something really big comes along.  This one doesn't
qualify as really big IMHO, because it's not a PANIC-grade failure in
the later branches.  But having been burnt once, I'm sure you'll want
a patched copy ...

regards, tom lane




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [DOCS] Online backup vs Continuous backup

2006-01-03 Thread Rick Gigger

How about:

Use "Online backup" or "Hot backup" to refer to either method of
backup, since they are both done while the system is online or hot.

If you want to get specific, refer to "doing a SQL dump" etc. for
using pg_dump.  Then use "Incremental backup" to refer to the whole
process of WAL archival, etc.

Refer to the actual log files themselves as "transaction logs".

That all seems pretty intuitive, unambiguous, and non-confusing to
me.


On Dec 26, 2005, at 11:44 AM, Tom Lane wrote:


Bruce Momjian pgman@candle.pha.pa.us writes:

I suggest the following patch to rename our capability Continuous
Backup.


This doesn't seem like an improvement.  Online backup is the  
standard

terminology AFAIK.

regards, tom lane



Re: [HACKERS] [DOCS] Online backup vs Continuous backup

2006-01-03 Thread Rick Gigger

I think it would all make more sense if we described the use of
archive_command = something as being in WAL Archive Mode. That would
then allow us to say:
You can only take Online Backups while in WAL Archive Mode.
If you ever wish to perform PITR, you must use WAL Archive Mode.
If you backed-up in WAL Archive Mode, you can perform an Archive
Recovery.


It seems to me there are two different contexts in which one would be
making statements like this, and what we are allowed to say depends
greatly on context.  These contexts are as follows:


1) Explaining the feature set of postgres to a potential user.
2) Explaining to an actual postgres user how to actually do something.

In the first case it makes the most sense to me to use industry-standard
or very intuitive terminology to the extent that it exists, e.g.
Transaction Logs vs. WAL.  Incremental Backup and Point in Time Recovery
seem to be fairly commonly used and understood database buzzwords for
someone investigating the feature set of an RDBMS.


In the second case it seems to me that the most important thing is that
you pick terminology that is consistent, unambiguous and clearly
defined.  Log archival, PITR, etc. are not point-and-click operations
like they are in, say, MS SQL Server.  This gives us more flexibility,
but it also requires a deeper understanding.  If someone is unwilling or
unable to learn whatever terminology you happen to come up with, then it
seems to me they shouldn't even be attempting to set up one of those
features.  At the same time, if the terminology you use changes all the
time (it is not consistent), or if you can't figure out what any of the
terms mean (they are not clearly defined), or if you use a term like
online backup to cover both types of backup but then use it in a
specific circumstance where only one usage is appropriate (you are using
the terms ambiguously), then users will be confused and it will be your
fault, not theirs.
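
To be concrete about the operations all of these names are describing,
here is a rough sketch of what taking an Online Backup while in WAL
Archive Mode looks like at the SQL level (the label is made up, and the
filesystem copy step is elided):

    -- Assumes archive_command is already set in postgresql.conf,
    -- i.e. the server is in what the quoted text calls WAL Archive Mode.
    SELECT pg_start_backup('nightly-base-backup');

    -- ... copy the cluster's data directory with tar/rsync/etc. here ...

    SELECT pg_stop_backup();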


Just my 2 cents

Rick Gigger



Re: [HACKERS] Incremental Backup Script

2006-01-03 Thread Rick Gigger

I would certainly like some instructions on this as well.

On Jan 3, 2006, at 8:41 PM, Zach Bagnall wrote:


On 12/26/05 11:04, Qingqing Zhou wrote:

Gregor Zeitlinger [EMAIL PROTECTED] wrote
Also, I was wondering whether it is always safe to copy the  
current WAL file, i.e. may the current WAL file be invalid in any  
circumstance?


If you mean current WAL file is the xlog segment in use, then it  
is dangerous. We only backup the xlog segments that have been  
fully used up.


As per docs, if the databases are rarely updated it could take a long
time for the WAL segment to roll over. We need to back up the current
segment to guarantee we have the latest transactions archived at the
time of failure.


http://www.postgresql.org/docs/8.1/interactive/backup-online.html
If you are concerned about being able to recover right up to the  
current instant, you may want to take additional steps to ensure  
that the current, partially-filled WAL segment is also copied  
someplace. This is particularly important if your server generates  
only little WAL traffic (or has slack periods where it does so),  
since it could take a long time before a WAL segment file is  
completely filled and ready to archive. One possible way to handle  
this is to set up a cron job that periodically (once a minute,  
perhaps) identifies the current WAL segment file and saves it  
someplace safe.


Gregor: can you explain how to identify the current file? I had  
implemented a backup and restore script for PITR but stumbled at  
this point. The page above does not specify how this is to be done.


I appreciate the addition of PITR - it's better than nothing  
(nothing being full dumps) in some respects. Ideally, we need to be  
able to dump deltas for a single database. In practice, restoration  
using the PITR method is awkward. I guess you would tarball the  
current data files, do a full restore, do a full dump of the  
database you are interested in, ditch the restored data files and  
replace them with the ones you tarballed, then do a database load  
from the full dump. The only way to avoid having the other  
databases on the server offline is to restore to a second  
postgresql instance. Not complaining, just saying :-)





Regards,
Qingqing


Zach.



Re: [HACKERS] Automatic function replanning

2005-12-20 Thread Rick Gigger

It seems to me like there are two classes of problems here:

1) Simply invalidating plans made with out of date statistics.
2) Using run-time collected data to update the plan to something more  
intelligent.


It also seems like #1 would be fairly straightforward and simple,
whereas #2 would be much more complex.  #1 would do me a world of good,
and probably other people as well.  Postgres's query planning has
always been fine for me, or at least I have always been able to
optimize my queries once I've got a representative data set to work
with.  Query plan caching only bites me when the plan is created before
the statistics needed to produce a good plan are available; a concrete
sketch of that case follows.
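
To make that concrete, here is a minimal, made-up illustration (the
table and statement names are hypothetical, and PREPARE stands in for
the plan caching that stored functions do):

    CREATE TABLE events (id serial PRIMARY KEY, kind int4, payload text);

    -- Planned immediately, while the table is still empty, so the
    -- cached plan is based on empty-table statistics.
    PREPARE events_by_kind (int4) AS
        SELECT * FROM events WHERE kind = $1;

    -- ... bulk load a few million rows here ...

    ANALYZE events;

    -- Still runs with the plan made against the empty table.  Under #1
    -- above, the arrival of new ANALYZE statistics would invalidate
    -- that cached plan and force a replan.
    EXECUTE events_by_kind (42);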


Just one user's 2 cents.

- Rick Gigger


On Dec 19, 2005, at 12:00 PM, Jim C. Nasby wrote:


On Sat, Dec 17, 2005 at 01:07:10PM -0500, Bruce Momjian wrote:

Jim C. Nasby wrote:

Is cardinality the only thing we'd need to worry about? My idea was
actually to track the amount of work normally required by a  
stored query
plan, and if a query uses that plan but requires a very different  
amount
of work it's a good indication that we either need to replan or  
store
multiple plans for that query. Though if we're certain that  
cardinality

is the only thing that could make a cached plan go bad it would
certainly simplify things greatly.


This gets into another area of re-optimizing when the executor finds
that the actual tables don't match the optimizer estimates.  I  
think we
decided that was too hard/risky, but invalidating the plan might  
help,

though I am thinking re-planning might just generate the same plan as
before.  I think something would need to have happened since the last
planning, like ANALYZE or something.


Well, in the stored plan case, presumably what's changed is one of the
bound parameters. And if we want to be sophisticated about it, we  
won't

just throw out the old plan; rather we'll try and figure out what
parameter it is that's wanting a different plan.


Updated TODO:

* Flush cached query plans when the dependent objects change,
  when the cardinality of parameters changes dramatically, or
  when new ANALYZE statistics are available


Probably worth pointing to this thread in the TODO...
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Rick Gigger

Just like MySQL!

On Dec 5, 2005, at 10:35 PM, Jan Wieck wrote:


On 12/5/2005 8:18 PM, Gustavo Tonini wrote:


replication (master/slave, multi-master, etc) implemented inside
postgres...I would like to know what has been make in this area.


We do not plan to implement replication inside the backend.  
Replication needs are so diverse that pluggable replication support  
makes a lot more sense. To me it even makes more sense than keeping
transaction support outside of the database itself and adding it via a
pluggable storage add-on.



Jan



Gustavo.
P.S. Sorry for my bad English.
2005/12/5, Chris Browne [EMAIL PROTECTED]:


[EMAIL PROTECTED] (Gustavo Tonini) writes:
 What about replication or data distribution inside the  
backend.  This

 is a valid issue?

I'm not sure what your question is...
--
(reverse (concatenate 'string gro.gultn @ enworbbc))
http://www.ntlug.org/~cbbrowne/x.html
Love is like a snowmobile flying over the frozen tundra that  
suddenly

flips, pinning you underneath.  At night, the ice weasels come.
-- Matt Groening





--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #




Re: [HACKERS] Replication on the backend

2005-12-06 Thread Rick Gigger

  - Asynchronous master to multi-slave. We have a few of those with
Mommoth-Replicator and Slony-I being the top players. Slony-I does
need some cleanup and/or reimplementation after we have a general
pluggable replication API in place.


Does this API actually have people working on it, or is it just
something on the todo list?





Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Rick Gigger
I agree.  I would never, ever want it to silently start doing table
locks.  I would simply avoid using MERGE at all if that were a
possibility.


However, it seems like the idea is to eventually flesh out a
full-fledged MERGE.  And to do that it sounds like you would need to do
one of the following:


1) implement predicate locking beyond the simple match on unique  
index case that we have here

2) do full table locks.

It sounds like #1 isn't going to happen for a while, so in order to do
more complicated merges you will need #2.  If you are going to implement
more complicated MERGE functionality, I certainly wouldn't want it
throwing a warning about a table lock if I already knew it would take
the table lock and had decided to go ahead with using MERGE anyway.


Could you let the user create the lock himself to handle this
situation?  For instance:

    analyze the merge
    if merge condition matches unique index
        merge without table locking
    elseif needed table lock already exists
        merge
    else
        throw an error

You could also just add something like ALLOW TABLE LOCK to the MERGE
syntax.  The idea is just that the user can explicitly allow the table
lock, and thus the more complicated merge; a rough sketch follows.
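
Something like this is what I have in mind.  The MERGE statement itself
is hypothetical syntax (PostgreSQL doesn't have MERGE yet), but LOCK
TABLE is real, so the user could take the needed lock explicitly up
front:

    BEGIN;
    -- Take the lock the complicated merge would need, explicitly,
    -- so nothing gets escalated silently behind my back.
    LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;

    -- MERGE INTO mytable ...          (hypothetical syntax)
    -- or: MERGE ... ALLOW TABLE LOCK, if that ends up in the syntax instead

    COMMIT;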


I don't really know anything about the implementation details, but that
is the behavior that I would prefer.  That way I could always do a
complicated merge if I wanted to, but there is no way it would ever take
an implicit table lock on me.  And it would never throw an error or
warning unless I actually did something questionable.


Does that make sense?

Rick Gigger

On Nov 16, 2005, at 7:49 AM, Tom Lane wrote:


Christopher Kings-Lynne [EMAIL PROTECTED] writes:
We should probably throw a notice or warning if we go to a table  
lock,

too.


That's not very useful, because you can only do something about it  
AFTER

the 1 hour exclusive lock merge has already run :)


We shouldn't do anything remotely like that.  A statement whose  
locking

effects can't be predicted on sight is horrid both from the user's
viewpoint and from the implementation viewpoint.  In particular, if we
have to do planning before we can determine whether the table needs  
just
a SELECT lock or something stronger, then we have to take a weak  
lock to

do the planning and then we are faced with upgrading to the stronger
lock at runtime.  Can you say deadlock risk?

I think we should do REPLACE-like functionality that simply fails  
if the
match condition isn't equality on a primary key.  If we can use SQL- 
spec

MERGE syntax for this, that's fine, but let's not think in terms of
silently changing to a stronger table lock and a much slower
implementation when the condition isn't a primary key.  That's a whole
lot of work that isn't solving any real-world problems, and *is*
creating a foot-gun for people to cause themselves performance and
deadlock problems anytime they spell the WHERE condition slightly  
wrong.


regards, tom lane



Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Rick Gigger



Conceptually, a MERGE statement is just a long string of INSERTs and
UPDATEs in the same transaction and I think we should treat it as
such.


I've just got one question about this.  Everyone seems to be saying
that "try to insert, and if that fails, update" is the same as "try to
insert, and if that fails, delete and then insert".


What about the following scenario:

    mytable (
        id    serial primary key,
        a     int4,
        b     int4,
        data  text
    )

I've got an id field on every table because it simplifies a lot of
things (such as Slony configuration, for example).  But I've also got a
unique key on (a, b), and if I were to do a merge I would most likely do
it on (a, b), not id.


If MERGE does a delete then an insert, it creates a new value for the
id column, which could cause me problems.  Basically, any defaulted
fields are going to change (or, for that matter, any fields not
specified would be reinitialized), whereas an update would leave them
in place.


It seems to me that "try to update, and if that fails, insert" is the
best approach for not messing with existing data.  I guess "try to
insert, and if that fails, update" gets you the same effect; a rough
sketch of what I mean is below.
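
Here is a rough sketch of the kind of upsert I mean, using the mytable
example above with made-up values (and ignoring the concurrency issues
this thread is really about); note that the id column and any other
defaulted fields on an existing row are left alone:

    BEGIN;
    -- Try the update first, matching on the (a, b) unique key.
    UPDATE mytable SET data = 'new value' WHERE a = 1 AND b = 2;

    -- If no row matched, insert instead; id is drawn from its sequence
    -- only for genuinely new rows, so existing ids never change.
    INSERT INTO mytable (a, b, data)
        SELECT 1, 2, 'new value'
        WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE a = 1 AND b = 2);
    COMMIT;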


- Rick Gigger



Re: [HACKERS] Contrib -- PostgreSQL shared variables

2004-08-25 Thread Rick Gigger
LockShared('name');
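
That is, one more call I'd find useful on top of what is proposed below.
For what it's worth, here is a rough usage sketch of how I would expect
these functions to be called; the names and signatures are just the ones
from the proposal (plus the suggested LockShared()), none of which exist
in released PostgreSQL, so treat it purely as an illustration:

    -- Initialize a shared counter, visible to all backends.
    SELECT SetShared('active_jobs', 0);

    -- Suggested addition: serialize access around a read-modify-write.
    SELECT LockShared('active_jobs');

    -- Add to the counter and get the new value back.
    SELECT AddSharedInt('active_jobs', 1);

    -- Read it from another session.
    SELECT GetSharedInt('active_jobs');

    -- Clean up.
    SELECT RemovedShared('active_jobs');
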
[EMAIL PROTECTED] wrote:
This is a first pass on a simple shared memory variable system for
PostgreSQL. I would appreciate anyone interested in this functionality
ripping it apart.
It basically adds this functionality:
SetShared('name', value);
GetSharedInt('name');
SetSharedText('name');
RemovedShared('name');
I also added two extra functions that are sort of a kludge, but could be
very helpful.
AddSharedInt('name', value);
SubSharedInt('name', value);
These add or subtract the 'value' from the variable and return the result.



Re: [GENERAL] [HACKERS] Slony-I goes BETA

2004-06-06 Thread Rick Gigger
The link you have down there is not the one on the site.  All of the 
links to that file work just fine for me on the live site.

Jan Wieck wrote:
On 6/4/2004 4:47 AM, Karel Zak wrote:
On Fri, Jun 04, 2004 at 01:01:19AM -0400, Jan Wieck wrote:
Yes, Slonik's,
it's true. After nearly a year the Slony-I project is entering the 
BETA phase for the 1.0 release. Please visit

http://gborg.postgresql.org/project/slony1/news/newsfull.php?news_id=174

 Jan, the link
http://postgresql.org/~wieck/slony1/Slony-I-concept.pdf
 that is used on project pages doesn't work :-(
 
Karel

Great ... and there is no way to modify anything on gborg ... this is 
the first and last project I manage on any site where I don't have shell 
access to the content.

Jan


Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Rick Gigger
 On Tue, 2003-02-11 at 10:20, Tom Lane wrote:
  Merlin Moncure [EMAIL PROTECTED] writes:
   May I make a suggestion that maybe it is time to start thinking about
   tuning the default config file, IMHO its just a little bit too
   conservative,
 
  It's a lot too conservative.  I've been thinking for awhile that we
  should adjust the defaults.
 
  The original motivation for setting shared_buffers = 64 was so that
  Postgres would start out-of-the-box on machines where SHMMAX is 1 meg
  (64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data
  structures).  At one time SHMMAX=1M was a pretty common stock kernel
  setting.  But our other data structures blew past the 1/2 meg mark
  some time ago; at default settings the shmem request is now close to
  1.5 meg.  So people with SHMMAX=1M have already got to twiddle their
  postgresql.conf settings, or preferably learn how to increase SHMMAX.
  That means there is *no* defensible reason anymore for defaulting to
  64 buffers.
 
  We could retarget to try to stay under SHMMAX=4M, which I think is
  the next boundary that's significant in terms of real-world platforms
  (isn't that the default SHMMAX on some BSDen?).  That would allow us
  350 or so shared_buffers, which is better, but still not really a
  serious choice for production work.
 
  What I would really like to do is set the default shared_buffers to
  1000.  That would be 8 meg worth of shared buffer space.  Coupled with
  more-realistic settings for FSM size, we'd probably be talking a shared
  memory request approaching 16 meg.  This is not enough RAM to bother
  any modern machine from a performance standpoint, but there are probably
  quite a few platforms out there that would need an increase in their
  stock SHMMAX kernel setting before they'd take it.
 
  So what this comes down to is making it harder for people to get
  Postgres running for the first time, versus making it more likely that
  they'll see decent performance when they do get it running.
 
  It's worth noting that increasing SHMMAX is not nearly as painful as
  it was back when these decisions were taken.  Most people have moved
  to platforms where it doesn't even take a kernel rebuild, and we've
  acquired documentation that tells how to do it on all(?) our supported
  platforms.  So I think it might be okay to expect people to do it.
 
  The alternative approach is to leave the settings where they are, and
  to try to put more emphasis in the documentation on the fact that the
  factory-default settings produce a toy configuration that you *must*
  adjust upward for decent performance.  But we've not had a lot of
  success spreading that word, I think.  With SHMMMAX too small, you
  do at least get a pretty specific error message telling you so.
 
  Comments?

 I'd personally rather have people stumble trying to get PostgreSQL
 running, up front, rather than allowing the lowest common denominator
 more easily run PostgreSQL only to be disappointed with it and move on.

 After it's all said and done, I would rather someone simply say, it's
 beyond my skill set, and attempt to get help or walk away.  That seems
 better than them being able to run it and say, it's a dog, spreading
 word-of-mouth as such after they left PostgreSQL behind.  Worse yet,
 those that do walk away and claim it performs horribly are probably
 doing more harm to the PostgreSQL community than expecting someone to be
 able to install software ever can.

 Nutshell:
 Easy to install but is horribly slow.

 or

 Took a couple of minutes to configure and it rocks!



 Seems fairly cut-n-dry to me.  ;)

The type of person who can't configure it or doesn't think to try is
probably not doing a project that requires any serious performance.  As
long as you are running it on decent hardware, postgres will run great
for anything but a very heavy load.  I think there may be many people
out there who have little experience but want an RDBMS to manage their
data.  Those people need something very, very easy.  Look at the
following that MySQL gets despite how poor a product it is.  It's very,
very easy.  MySQL works great for many people's needs, but then when
they need to do something real they have to move to a different
database entirely.  I think there is a huge advantage to having a
product that can be set up very quickly out of the box.  Those who need
serious performance will, hopefully for their employers' sake, be more
likely to take a few minutes to do some quick performance tuning.

Rick Gigger

