Re: [HACKERS] Documentation access problems.

2007-03-23 Thread Matthew T. O'Connor

Joshua D. Drake wrote:

The big thing for me, is a single document, zero clicks, that is
searchable. PDF and plain text are the only thing that give me that. If
you are really zealous you can even use Beagle (which I don't) to
preindex the PDF for you for easy searching.


Lots of projects publish their HTML docs in two formats: One Big HTML 
file with everything; Broken up into many HTML files that link to each 
other.  This would allow you you have one big searchable document.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Documentation access problems.

2007-03-23 Thread Matthew T. O'Connor

Bruce Momjian wrote:

Tom Lane wrote:

"Matthew T. O'Connor"  writes:
Lots of projects publish their HTML docs in two formats: One Big HTML 
file with everything; Broken up into many HTML files that link to each 
other.  This would allow you you have one big searchable document.

The key word there being "big" ;-) ... I don't have any problem with
making such a version available on the website, but I don't think
shipping two versions of the HTML docs in our tarballs is reasonable.


I think having the single HTML file version available on our web site is
enough.


Agreed.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] What X86/X64 OS's do we need coverage for?

2007-04-05 Thread Matthew T. O'Connor

Larry Rosenman wrote:

I might use that as the base then, since the hardware finishes getting here
tomorrow.


The other thing to consider is that CentOS 5 has Xen built right in, so 
you should be able run VMs without VMWare on it.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Autovacuum versus rolled-back transactions

2007-06-01 Thread Matthew T. O'Connor

Tom Lane wrote:

ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:

Our documentation says
| analyze threshold = analyze base threshold
|   + analyze scale factor * number of tuples
| is compared to the total number of tuples inserted, updated, or deleted
| since the last ANALYZE. 



but deleted tuples are not considered in the total number, because the delta
of {n_live_tuples + n_dead_tuples} is not changed by DELETE. We add the number
of DELETE into n_live_tuples and subtract it from n_dead_tuples.


Yeah, I was concerned about that when I was making the patch, but didn't
see any simple fix.  A large number of DELETEs (without any inserts or
updates) would trigger a VACUUM but not an ANALYZE, which in the worst
case would be bad because the stats could have shifted.

We could fix this at the cost of carrying another per-table counter in
the stats info, but I'm not sure it's worth it.


I believe that whenever autovacuum performs a VACUUM it actually 
performs a VACUUM ANALYZE at leas the old contrib version did and I 
think Alvaro copied that.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Matthew T. O'Connor

Florian G. Pflug wrote:

Work done so far:
-
 .) Don't start autovacuum and bgwriter. 


Do table stats used by the planner get replicated on a PITR slave?  I 
assume so, but if not, you would need autovac to do analyzes.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Simon Riggs wrote:

On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote:

Florian G. Pflug wrote:

Work done so far:
-
 .) Don't start autovacuum and bgwriter. 
Do table stats used by the planner get replicated on a PITR slave?  I 
assume so, but if not, you would need autovac to do analyzes.

The replication is an exact block-level replication of the master. We
can't write very much at all on the slave.


Hmm, something to keep in mind is forcing cache invals when the master
causes them (for example relation cache, catalog caches and plan
caches).



Perhaps if you are as PITR master and you have active readonly slaves 
then there should be a WAL record to note plan invalidations, etc?


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] [BUGS] BUG #3326: Invalid lower bound of autovacuum_cost_limit

2007-06-07 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera <[EMAIL PROTECTED]> writes:

But this is misleading (started postmaster with good value, then edited
postgresql.conf and entered "-2"):
17903 LOG:  received SIGHUP, reloading configuration files
17903 LOG:  -2 is outside the valid range for parameter 
"autovacuum_vacuum_cost_limit" (-1 .. 1000)
Note how it still says the range is -1 .. 1000.

Can we redefine things to make zero be the "disabled" value, thus
keeping the range of valid values contiguous?


That would be another solution ... though it would be different from the
valid value for autovacuum_vacuum_cost_delay (on which 0 is a valid
value).  Also it would be a different value from previous versions.

I don't think either of these is a showstopper, so let's go for that if
nobody objects.


Can you make 0 and -1 both valid disabled values?  That way it will be 
compatible with previous releases.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-07 Thread Matthew T. O'Connor

Tom Lane wrote:

"Andrew Hammond" <[EMAIL PROTECTED]> writes:

Hmmm... it seems to me that points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be
better to say that setting the naptime really high is a Bad Idea.


It seems like we should have an upper limit on the GUC variable that's
less than INT_MAX ;-).  Would an hour be sane?  10 minutes?

This is independent of the problem at hand, though, which is that we
probably want the launcher to notice postmaster death in less time
than autovacuum_naptime, for reasonable values of same.


Do we need a configurable autovacuum naptime at all?  I know I put it in 
the original contrib autovacuum because I had no idea what knobs might 
be needed.  I can't see a good reason to ever have a naptime longer than 
the default 60 seconds, but I suppose one might want a smaller naptime 
for a very active system?


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] COPYable logs status

2007-06-08 Thread Matthew T. O'Connor

Andrew Dunstan wrote:
The situation with this patch is that I now have it in a state where I 
think it could be applied, but there is one blocker, namely that we do 
not have a way of preventing the interleaving of log messages from 
different backends, which leads to garbled logs. This is an existing 
issue about which we have had complaints, but it becomes critical for a 
facility the whole purpose of which is to provide logs in a format 
guaranteed to work with our COPY command.


Unfortunately, there is no solution in sight for this problem, certainly 
not one which I think can be devised and implemented simply at this 
stage of the cycle. The solution we'd like to use, LWLocks, is not 
workable in his context. In consequence, I don't think we have any 
option but to shelve this item for the time being.


I think this will get shot down, but here goes anyway...

How about creating a log-writing-process?  Postmaster could write to the 
log files directly until the log-writer is up and running, then all 
processes can send their log output through the log-writer.




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] COPYable logs status

2007-06-08 Thread Matthew T. O'Connor

Tom Lane wrote:

"Matthew T. O'Connor" <[EMAIL PROTECTED]> writes:
How about creating a log-writing-process?  Postmaster could write to the 
log files directly until the log-writer is up and running, then all 
processes can send their log output through the log-writer.


We *have* a log-writing process.  The problem is in getting the data to it.


By that I assume you mean the bgwriter, I thought that was for WAL data, 
I didn't think it could or perhaps should be used for normal log file 
writing, but I also know I'm way outside my comfort area in talking 
about this, so excuse the noise if this is way off base.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-08 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Jim C. Nasby escribió:

There *is* reason to allow setting the naptime smaller, though (or at
least there was; perhaps Alvero's recent changes negate this need):
clusters that have a large number of databases. I've worked with folks
who are in a hosted environment and give each customer their own
database; it's not hard to get a couple hundred databases that way.
Setting the naptime higher than a second in such an environment would
mean it could be hours before a database is checked for vacuuming.


Yes, the code in HEAD is different -- each database will be considered
separately.  So the huge database taking all day to vacuum will not stop
the tiny databases from being vacuumed in a timely manner.

And the very huge table in that database will not stop the other tables
in the database from being vacuumed either.  There can be more than one
worker in a single database.


Ok, but I think the question posed is that in say a virtual hosting 
environment there might be say 1,000 databases in the cluster. Am I 
still going to have to wait a long time for my database to get vacuumed? 
 I don't think this has changed much no?


(If default naptime is 1 minute, then autovacuum won't even look at a 
given database but once every 1,000 minutes (16.67 hours) assuming that 
there isn't enough work to keep all the workers busy.)


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Jim C. Nasby wrote:

FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.


How about pushing thresholds all the way down to 0?



As long as it handles small (or zero row) tables ok then yes.  The 
base_threshold in the originial contrib autovacuum was just an easy way 
to not vacuum really small tables too often.  If a table has only 10 
rows, it's going to get vacuumed every time one row is updated.  I guess 
that's not a big problem with a table that small but still seems excessive.


If you think this isn't a problem with the current autovacuum, then sure 
turn it down to zero, and perhaps we can even get rid of it altogether 
in another release or two.




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Still recommending daily vacuum...

2007-07-06 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
Well, if a table has 10 rows, and we keep the current threshold of 1000
rows, then this table must have 1002 dead tuples (99% dead tuples, 1002
dead + 10 live) before being vacuumed.  This seems wasteful because
there are 500 dead tuples on it and only 10 live tuples.  So each scan
must wade through all the dead tuples.

Another small table with 100 tuples will be vacuumed on every iteration
as well, even if there are just two dead tuples.  So you are right --
maybe dropping it all the way to 0 is too much.  But a small value of 10
is reasonable?  That will make the 10 tuple table be vacuumed when there
are 10 dead tuples (50% of dead tuples), and the 100 tuple table when
there are 11 (11% of dead tuples).  It decreases quickly to the scale
factor (2%, or do we want to decrease it to 1%?)


I think it's probably fine. I think, that the optimal number for the 
base_threhold is probably dependant on the width of the row, for a very 
narrow row where you might have many on the same page, 20 or 50 might be 
right, but for a very wide table a smaller number might be optimal, 
however I think it probably doesn't matter much anyway.


Reducing the default to 10 seems fine, and perhaps even removing it as a 
tuning knob.  I think there are too many autovacuum knobs and it 
confuses people.  Is it too late to possibly remove this GUC altogether?



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] More logging for autovacuum

2007-08-07 Thread Matthew T. O'Connor

Gregory Stark wrote:

I'm having trouble following what's going on with autovacuum and I'm finding
the existing logging insufficient. In particular that it's only logging vacuum
runs *after* the vacuum finishes makes it hard to see what vacuums are running
at any given time. Also, I want to see what is making autovacuum decide to
forgo vacuuming a table and the log with that information is at DEBUG2.

So I would like to suggest adding two options:

log_autovacuum_jobs - output every time a vacuum or analyze *starts*

log_autovacuum_level - set the log level for the autovacuum process

I would also suggest raising the level of the DEBUG2 message indicating why
tables were chosen or not. At least to DEBUG1 if not to INFO.

Am I missing anything? Are there ways to get this info already that I'm
missing? I imagine it would be pretty simple to add these and I'll be happy to
do it and send the patch to -patches assuming others (Alvaro? :) agree.



I think this sounds good.  There was talk a while ago about need a 
special log level setting just for autovacuum, but nobody did the leg 
work.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] First steps with 8.3 and autovacuum launcher

2007-10-01 Thread Matthew T. O'Connor

Tom Lane wrote:

Alvaro Herrera <[EMAIL PROTECTED]> writes:
  

This is an interesting idea, but I think it's attacking the wrong
problem.  To me, the problem here is that an ANALYZE should not block
CREATE INDEX or certain forms of ALTER TABLE.



I doubt that that will work; in particular I'm pretty dubious that you
can safely make CREATE INDEX and VACUUM run together.  Since they'd be
unlikely to be using the identical OldestXmin horizon, you'd likely end
up with dangling index entries (ie, CREATE INDEX indexes a tuple that
the VACUUM removes shortly afterward).


I think the main issue is ANALYZE not VACUUM (at least in this thread) 
since it's DB load times that are in question.  Can CREATE INDEX and 
ANALYZE be made to run concurrently?




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] First steps with 8.3 and autovacuum launcher

2007-10-01 Thread Matthew T. O'Connor

Tom Lane wrote:

If you insist on crafting a solution that only fixes this problem for
pg_restore's narrow usage, you'll be back revisiting it before beta1
has been out a month.



I don't know much about what is involved in crafting these solutions, 
but it seems we're close to beta and probably don't want to make drastic 
changes to anything.  As such it seems to me that solving the problem 
for analyze is a nice piece of low-hanging fruit that solves an 
immediate problem that has been reported.  I would think that reducing 
the locking implications of  VACUUM is much more involved, no?  Also, I 
would think that the fix for ANALYZE will be totally different than the 
fix for VACUUM no?


Are you proposing that we solve the VACUUM locking problem before we 
release 8.3?




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Concurrent VACUUM and ANALYZE

2008-07-21 Thread Matthew T. O'Connor

Jonah H. Harris wrote:

On Mon, Jul 21, 2008 at 10:19 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
  

I don't find this a compelling argument, at least not without proof that
the various vacuum-improvement projects already on the radar screen
(DSM-driven vacuum, etc) aren't going to fix your problem.



Is DSM going to be in 8.4?  The last I had heard, DSM+related
improvements weren't close to being guaranteed for this release.  If
it doesn't make it, waiting another year and a half for something
easily fixed would be fairly unacceptable.  Should I provide a patch
in the event that DSM doesn't make it?


Can't hurt to submit a patch.  Also, could you do something to help 
mitigate the worse case, something like don't update the stats in 
pg_class if the analyze finishes after a vacuum has finished since the 
current analyze started?


Matt


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-23 Thread Matthew T. O'Connor

Tom Lane wrote:

"Greg Sabino Mullane" <[EMAIL PROTECTED]> writes:

Code outside of core, is, in reality, less reviewed, less likely to work
well with recent PG versions, and more likely to cause problems. It's also
less likely to be found by people, less likely to be used by people, and
less likely to be included by distros. Not to say that everything should get
shoved into core, of course, but there are strong arguments for both sides.


These are all true statements, of course, but ISTM they should be looked
on as problems to be solved.  Pushing stuff into core instead of solving
these problems is not a scalable long-term answer.


A few random thoughts...

The application that comes to mind first for me when you talk plugins is 
Firefox.  They make it very easy to browse for plugins and to install, 
update, remove them.  Their plug-in system also tries to account for 
Firefox version and OS platform which we would need to do also.


Perhaps one thing that would help PostgreSQL plug-ins is a nice GUI 
plug-in browser and management application.  The logical place to add 
this IMHO is PGAdmin since it is GUI, already talks to the DB and is 
cross platform.  I'm not saying a GUI should be required to manage 
plug-ins, a fully CLI option should be made available too.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] VACUUM Improvements - WIP Patch

2008-08-24 Thread Matthew T. O'Connor

Joshua D. Drake wrote:

Merlin Moncure wrote:

Well, there doesn't seem to be a TODO for partial/restartable vacuums,
which were mentioned upthread.  This is a really desirable feature for
big databases and removes one of the reasons to partition large
tables.
I would agree that partial vacuums would be very useful. 



I think everyone agrees that partial vacuums would be useful / *A Good 
Thing* but it's the implementation that is the issue.  I was thinking 
about Alvaro's recent work to make vacuum deal with TOAST tables 
separately, which is almost like a partial vacuum since it effectively 
splits the vacuum work up into multiple independent blocks of work, the 
limitation obviously being that it can only split the work around 
TOAST.  Is there anyway that vacuum could work per relfile since we 
already split tables into files that are never greater than 1G?  I would 
think that if Vacuum never had more than 1G of work to do at any given 
moment it would make it much more manageable.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] VACUUM Improvements - WIP Patch

2008-08-24 Thread Matthew T. O'Connor

Tom Lane wrote:

"Matthew T. O'Connor" <[EMAIL PROTECTED]> writes:
  
I think everyone agrees that partial vacuums would be useful / *A Good 
Thing* but it's the implementation that is the issue.



I'm not sure how important it will really be once we have support for
dead-space-map-driven vacuum.


Is that something we can expect any time soon? I haven't heard much 
about it really happening for 8.4.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] September CommitFest Closed

2008-10-01 Thread Matthew T. O'Connor

Josh Berkus wrote:
For the September commitfest, 29 patches were applied (one to pgFoundry) 
 and 18 patches were sent back for more work.


More importantly, six *new* reviewers completed reviews of of various 
patches: Abbas Butt, Alex Hunsaker, Markus Wanner, Ibrar Ahmed, Ryan 
Bradetich and Gianni Colli.  Several other new reviewers volunteered, 
but we ran out of patches to check, and Kenneth Marshall had to quit 
reviewing because of Hurricane Ike.


Yay, reviewers.



If nothing else ever came of the Commit Fest approach, if it creates 
more reviewers, then I think the Commit Fest process would be a success.


I think the Commit Fest approach does this by lowering the bar of entry 
to become a reviewer.  It does this because it brings more focus to 
patch review on a regular basis and since it changes the patch review 
process from a last minute dash that only experience hackers should get 
involved with to a process with more time before the final deadline, 
hence more friendly for new reviewers to get involved.  Good news all 
around!




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.3 .4 + Vista + MingW + initdb = ACCESS_DENIED

2008-10-15 Thread Matthew T. O'Connor

Tom Lane wrote:

Andrew Chernow <[EMAIL PROTECTED]> writes:

Be careful.  From LockFileEx docs:


"However, the time it takes for the operating system to unlock these 
locks depends upon available system resources. Therefore, it is 
recommended that your process explicitly unlock all files it has locked 
when it terminates. If this is not done, access to these files may be 
denied if the operating system has not yet unlocked them."


ROTFL ... so to translate: "If your program crashes, please release
locks before crashing."


Obviously that wasn't the intent of the above, but I guess it is the net 
effect.  Either way, I don't think it's a huge problem, it just means 
that PG may not be able to restart for a few seconds until the OS has 
time to clean-up the locks.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RAM-only temporary tables

2008-11-05 Thread Matthew T. O'Connor

Kevin Grittner wrote:

An idea for a possible enhancement to PostgreSQL: allow creation of a
temporary table without generating any disk I/O.  (Creating and
dropping a three-column temporary table within a database transaction
currently generates about 150 disk writes).
 
If some circumstances don't make it feasible to always do this as a

RAM-only operation, perhaps a clause could be added to the CREATE
TEMPORARY TABLE syntax to specify this behavior along with whatever
limitations on the temporary table are required for this to work. 
(For example, maybe this is only feasible if the table will be dropped

by the end of the transaction?)


As someone else already pointed out you can put temp tables on a RAM 
disk, but the larger issue is that temp tables still cause system table 
churn which will always need to be on stable media.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Matthew T. O'Connor

Tom Lane wrote:

Decibel! <[EMAIL PROTECTED]> writes:
  
I think that's pretty seriously un-desirable. It's not at all  
uncommon for databases to stick around for a very long time and then  
jump ahead many versions. I don't think we want to tell people they  
can't do that.



Of course they can do that --- they just have to do it one version at a
time.


Also, people may be less likely to stick with an old outdated version 
for years and years if the upgrade process is easier.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block-level CRC checks

2008-11-17 Thread Matthew T. O'Connor

Aidan Van Dyk wrote:

* Greg Stark <[EMAIL PROTECTED]> [081117 03:54]:
I thought of saying that too but it doesn't really solve the problem.  
Think of what happens if someone sets a hint bit on a dirty page.


If the page is dirty from a "real change", then it has a WAL backup block
record already, so the torn-page on disk is going to be fixed with the wal
replay ... *because* of the torn-page problem already being "solved" in PG.
You don't get the hint-bits back, but that's no different from the current
state.  But nobody's previously cared if hint-bits wern't set on WAL replay.



What if all changes to a page (even hit bits) are WAL logged when 
running with Block-level CRC checks enables, does that make things 
easier?  I'm sure it would result in some performance loss, but anyone 
enabling Block Level CRCs is already trading some performance for safety.


Thoughts?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-23 Thread Matthew T. O'Connor

Tom Lane wrote:

However, my comment above was too optimistic, because in an insert-only
scenario autovac would in fact not trigger VACUUM at all, only ANALYZE.

So it seems like we do indeed want to rejigger autovac's rules a bit
to account for the possibility of wanting to apply vacuum to get
visibility bits set.


I'm sure I'm missing something, but I thought the point of this was to 
lessen the impact of VACUUM and now you are suggesting that we have to 
add vacuums to tables that have never needed one before.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.5 release timetable, again

2009-08-26 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

This seems a good idea.  Possibly pushing the betas more aggresively to
current users would make them tested not only by PG hackers ...


Isn't this the purpose of the new alpha releases, at lease to some extent.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Posting to hackers and patches lists

2008-05-07 Thread Matthew T. O'connor

Alex Hunsaker wrote:

In fact I
would argue -patches should go away so we dont have that split.


+1I think the main argument for the split is to keep the "large" 
patch emails off the hackers list, but I don't think that limit is so 
high that it's a problem.  People have to gzip their patches to the 
patches list fairly often anyway.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Posting to hackers and patches lists

2008-05-07 Thread Matthew T. O'connor

Alex Hunsaker wrote:

A big part of my problem with the split is if there is a discussion
taking place on -hackers I want to be able to reply to the discussion
and say "well, here is what I was thinking".  Sending it to -patches
first waiting for it to hit the archive so I can link to it in my
reply on -hackers seems pointless and convoluted.

But if thats what you want, thats what ill try to do from now on :)

For instance the patch Tom reviewed of mine yesterday only -hackers
was copied, so I maintained that but also added -patches because I was
sending in a patch...

I think It will be an ongoing problem though especially for new people
as they probably wont understand the "logical" split...


Patches are an integral part of the conversation about development, I 
think trying to split them up is awkward at best.  Do people really 
still think that the potential for larger messages is really a problem?  
By the way, what is the actual size limit on hackers vs patches.  I 
would imagine that most patches would already fit in the current hackers 
limit, especially since you can gzip.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XIDs and big boxes again ...

2008-05-12 Thread Matthew T. O'Connor

Hans-Juergen Schoenig wrote:
i suggest to introduce a --with-long-xids flag which would give me 62 / 
64 bit XIDs per vacuum on the entire database.

this should be fairly easy to implement.
i am not too concerned about the size of the tuple header here - if we 
waste 500 gb of storage here i am totally fine.


As you say later in the thread, you are on 8.1.  Alot of work has gone 
into reducing the effect, impact and frequency of XID wrap around  and 
vacuuming since then.  In 8.3 transactions that don't actually update a 
table no long use a real XID and autovacuum you no longer need a 
database wide vacuum to solve the XID wraparound problem, so I think the 
answer is upgrade to 8.3 and see if you still have this problem.


Matthew O'Connor

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuuming leaked temp tables (once again)

2008-06-27 Thread Matthew T. O'Connor

Tom Lane wrote:

We might have to rearrange the logic a bit to make that happen (I'm not
sure what order things get tested in), but a log message does seem like
a good idea.  I'd go for logging anytime an orphaned table is seen,
and dropping once it's past the anti-wraparound horizon.


Is there an easy way for an Admin clean-up the lost temp tables that 
autovacuum is complaining about?  It seems like it could be along time 
and a lot of log messages between when they are first orphaned and and 
finally dropped due to anti-wraparound protection.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and ALTER TABLE / ADD FOREIGN KEY

2002-06-22 Thread Matthew T. O'Connor

> However, others don't believe constraints other than foreign keys
> should go unchecked.
>
> That said, is this functionality wanted outside of pg_dump /
> pg_restore?

pg_dump should reload a database as it was stored in the previous database.  
If  your old data is not clean, pg_dump / restore is not a very good tool for 
cleaning it up.  I think ignoring contrains is a good thing if it will load 
the data faster (at least when you are doing a database backup / restore).  
Why can't we do all alter table commands (that add constraints) after we load 
the data, that way we don't need to alter syntax at all.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[HACKERS] Vacuum Daemon

2002-06-29 Thread Matthew T. O'Connor

>From the ToDo list:
Vacuum: * Provide automatic running of vacuum in the background (Tom)

As of 7.2 we have lazy vacuum.  The next logical step is setting up vacuum to 
run automatically in the background either as some type of daemon or as 
something kicked off by the postmaster.

I am interested in working on this to do item, although I see it is assigned 
to Tom right now.  

First: is this something we still want (I assume it is since its in the 
todo.).  

Second: There was some discussion 
(http://archives.postgresql.org/pgsql-hackers/2002-05/msg00970.php) about 
this not being neede once UNDO is on place, what is the current view on this?

Matthew



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org





Re: [HACKERS] Vacuum Daemon

2002-06-29 Thread Matthew T. O'Connor

On Saturday 29 June 2002 08:14 pm, Tom Lane wrote:
> Launching VACUUMs on some automatic schedule, preferably using feedback
> about where space needs to be reclaimed, seems like a pretty
> straightforward small-matter-of-programming.  The thing that would
> really be needed to make it unobtrusive is to find a way to run the
> vacuum processing at low priority, or at least when the system is not
> heavily loaded.  I don't know a good way to do that.  Nice'ing the
> vacuum process won't work because of priority-inversion problems.
> Making it suspend itself when load gets high might do; but how to
> detect that in a reasonably portable fashion?

Are we sure we want it to be unobtrusive?  If vacuum is performed only where 
and when it's needed, it might be better for overall throughput to have it 
run even when the system is loaded.  Such as a constantly updated table.

As for a portable way to identify system load (if this is what we want) I was 
thinking of looking at the load average (such as the one reported by the top 
command) but I don't know much about portability issues.  

Since there appears to be sufficient interest in some solution, I'll start 
working on it.  I would like to hear a quick description of what 
small-matter-of-programming means.  Do you have specific ideas about what how 
best to get that feedback?

Matthew



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org





Re: [HACKERS] (A) native Windows port

2002-07-02 Thread Matthew T. O'Connor

> The question is not how to replace some .EXE and .DLL files or modify
> something in the registry. The question is what to do with the existing
> databases in the case of a catalog version change. You have to dump and
> restore. 

pg_upgrade?

Otherwise: no upgrades persay, but you can intall the new version into a new 
directory and then have an automated pg_dump / restore between the old and 
the new.  This would require a lot of disk space, but I don't see any other 
clean way to automate it.



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly





Re: [HACKERS] (A) native Windows port

2002-07-09 Thread Matthew T. O'Connor

> > Keys to this working:
> > 1.) Must not require the old version executable backend.  There are a 
number 
> > of reasons why this might be, but the biggest is due to the way much 
> > upgrading works in practice -- the old executables are typically gone by 
the 
> > time the new package is installed.
> 
> Oh, that is a problem.  We would have to require the old executables.

Could this be solved with packaging? Meaning can postmasters from old versions 
be packed with a new release strictly for the purpose of upgrading?  It is my 
understanding that the only old executable needed is the postmaster is that 
correct?  Perhaps this also requires adding functionality so that pg_dump can 
run against a singer user postmaster.

Example: When PG 7.3 is released, the RPM / deb / setup.exe include the 
postmaster binary for v 7.2 (perhaps two or three older versions...).  An 
upgrade script is included that does the automatic dump / restore described 
eariler in this thread.  Effectivly, you are using old versions of the 
postmaster as your standalone dumper. 

I think this could sidestep the problem of having to create / test / maintain 
new version of a dumper or pg_upgrade for every release.

By default perhaps the postmaster for the previous version of postgres is 
included, and postmasters from older versions are distrubuted in separate 
packages, so if I am still runnig 6.5.3 and I want to upgrade to 7.3, I have 
do install the 6.5.3 upgrade package.  Or perhaps there i one pg_upgrade rpm 
package that includes every postmaster since 6.4.  This would allow the 
upgrade script to know that it all backends are availble to it depeding on 
what it finds in PG_VERSION, it also allows the admin to removed them all 
easily once they are no longer needed.



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org





Re: [HACKERS] Why is MySQL more chosen over PostgreSQL?

2002-07-30 Thread Matthew T. O'Connor

On Mon, 2002-07-29 at 08:53, [EMAIL PROTECTED] wrote:
> > Just a long standing curiosity?
> e) Inertia.  MySQL got more popular way back when; the reasons may no longer 

f) Win32 Support.  I can download a setup.exe for mysql and have it up
and running quickly on Windows.  I think that native Win32 support will
go a long way toward making Postgres more "popular"


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[HACKERS] More CVS Problems

2002-08-14 Thread Matthew T. O'Connor

I have been getting this for at least two days:

[matthew@zeut src]$ cvs -v
Concurrent Versions System (CVS) 1.11.2 (client/server)

[matthew@zeut src]$ cvs -z3 -d 
:pserver:[EMAIL PROTECTED]:/projects/cvsroot co -P pgsql

[...]

cvs server: Updating pgsql/src/backend/utils/mb/conversion_procs/ascii_and_mic
cvs server: failed to create lock directory for 
`/projects/cvsroot/pgsql-server/src/backend/utils/mb/conversion_procs/ascii_and_mic' 
(/projects/cvsroot/pgsql-server/src/backend/utils/mb/conversion_procs/ascii_and_mic/#cvs.lock):
 
Permission denied
cvs server: failed to obtain dir lock in repository 
`/projects/cvsroot/pgsql-server/src/backend/utils/mb/conversion_procs/ascii_and_mic'
cvs [server aborted]: read lock failed - giving up


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[HACKERS] Coding help

2002-08-15 Thread Matthew T. O'Connor

Hello, I'm playing with creating an auto vacuum daemon, but it is my first 
time inside the pg source code and I'm a bit lost.

I have gotten as far as having a vacuum daemon created on postmaster startup.  
It's just a fork from the postmaster, cribbed mostly from the stat collector 
code. 

Inside the main loop of the autovac daemon, I am trying to call vacuum() but I 
get the following error: 
FATAL:  VACUUM cannot be executed from a function

I don't understand why it thinks I'm in a function, I believe the error is 
being generated by this is vacuum.c:

/* Running VACUUM from a function would free the function context */
if (vacstmt->vacuum && !MemoryContextContains(QueryContext, vacstmt))
elog(ERROR, "%s cannot be executed from a function", stmttype);

So, I assume it has something to do with the memory context that I'm in when I 
call the vacuum command, so I have been playing with switching contexts and 
such, but I have had no luck, obviously I don't really know what is going on 
here.

The code snippet that is actually calling the vacuum looks like this:
{
VacuumStmt *n = makeNode(VacuumStmt);
n->vacuum = true;
n->analyze = false;
n->full = false;
n->freeze = false;
n->verbose = false;
n->relation = NULL;
n->va_cols = NIL;
vacuum(n);  
}

Any help would be greatly appreciated.

Thanks,

Matt

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Coding help

2002-08-16 Thread Matthew T. O'Connor

Absolutely, I have been looking into this and I have some thoughts, but right 
now all I was trying to do was some rough implementations just to help me 
make sure I understand all / most of the issues.  I am very new to hacking on 
the guts of the backend.  

I plan on posting a formal proposal when I feel more confident as to what I'm 
talking about.

On Friday 16 August 2002 10:10 am, Jan Wieck wrote:
> "Matthew T. O'Connor" wrote:
> > Hello, I'm playing with creating an auto vacuum daemon, but it is my
> > first time inside the pg source code and I'm a bit lost.
> >
> > I have gotten as far as having a vacuum daemon created on postmaster
> > startup. It's just a fork from the postmaster, cribbed mostly from the
> > stat collector code.
>
> I recall that there has been discussion and so far the conclusion that
> an automatic vacuum daemon is not the solution everyone needs. If you
> really want to spend the effort on doing this, can we please see some
> proposal about possible configuration options, how the daemon decides
> what to vacuum when and the like?
>
>
> Jan


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] [PATCHES] fix for palloc() of user-supplied length

2002-08-28 Thread Matthew T. O'Connor

> > > Anyone want to argue that we should keep the v0 protocol support any
> > > longer?
> > 
> > Nope, exactly the same thought crossed my mind while I was reading
> > through the code...
> 
> Feel free to rip it out.

Should probably be mentioned in the release notes.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] --with-maxbackends

2002-09-07 Thread Matthew T. O'Connor

On Saturday 07 September 2002 12:52 pm, Bruce Momjian wrote:
> Peter Eisentraut wrote:
> > Didn't we want to remove that option?
>
> I didn't know it was still in there.  I see no reason for it.

How about --enable-depend, that's not still needed is it?  Or is that 
something other than the new dependancy system?


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Postgresql Automatic vacuum

2002-09-24 Thread Matthew T. O'Connor

> It doesn't have to make its way into the postgresql daemon itself -- in
fact since some people like tuning the vacuuming, it makes more sense to
make this a daemon. No, my suggestion is simple that some sort of
auto-vacuumer be compiled as a stand-alone app and included in the standard
postgresql tar.gz file, and the install instructions recommend the site
adding it as a cron job.

unless I missed something the point of a daemon is so that we don't need
to use cron.

I also think that some type of daemon should be included in the pg sources,
and installed with the rest of the system, and if configured to do so, the
postmaster launches the auto vac daemon.  I think this still makes sense
even with the proposed setup (autovac client is just special client app).


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Reconstructing FKs in pg_dump

2002-09-27 Thread Matthew T. O'Connor

From: "Tom Lane" <[EMAIL PROTECTED]>
> However, if we are going to put that kind of knowledge into pg_dump,
> it would only be a small further step to have it dump these triggers
> as ALTER TABLE ADD CONSTRAINT commands instead.  Which would be a lot
> better for forward compatibility than dumping the raw triggers.

There was some talk of adding Rod Taylor's identifies upgrade script to
contrib, or mentioning it in the release.  I think that it upgrades Foreign
key, Unique, and Serial constraints, is that relevant here?  Could it be
used (or modified) to handle this situation?  Just a thought.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Request for supported platforms

2002-10-28 Thread Matthew T. O'Connor
Are you compiling from CVS or from a released tarball?  The bison
requirement was recently raised to bison 1.5 or above (1.75 was recently
released also.)  This is an issue only when compiling from CVS, since
the bison stuff is preprocessed for released tarballs.  So you might
want to try the just release beta3.

On Mon, 2002-10-28 at 08:32, Jason Tishler wrote:
> Dave,
> 
> Thanks for the heads up...
> 
> On Mon, Oct 28, 2002 at 10:31:00AM -, Dave Page wrote:
> > > -Original Message-
> > > From: Bruce Momjian [mailto:pgman@;candle.pha.pa.us] 
> > > Sent: 26 October 2002 03:17
> > > Subject: [HACKERS] Request for supported platforms
> > >
> > > Folks. start sending in those plaform reports, OS name and 
> > > version number please.
> > 
> > CYGWIN_NT-5.1 PC9 1.3.10(0.51/3/2) 2002-02-25 11:14 i686 unknown
> ^^
> 
> Please try with Cygwin 1.3.14-1 while I attempt to deal with at least
> the following Cygwin build issues with PostgreSQL CVS as of today at
> about 7:00 AM EST:
> 
> 1. pg_config.h.in HAVE_FSEEKO ifdef:
> 
> make[4]: Entering directory `/home/jt/src/pgsql/src/backend/access/common'
> gcc -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -I../../../../src/include  
>-DBUILDING_DLL  -c -o heaptuple.o heaptuple.c
> In file included from ../../../../src/include/c.h:56,
>  from ../../../../src/include/postgres.h:48,
>  from heaptuple.c:21:
> /usr/include/stdio.h:207: parse error before `('
> 
> 2. Cygwin bison limit exceeded:
> 
> make[4]: Entering directory `/home/jt/src/pgsql/src/interfaces/ecpg/preproc'
> [snip]
> bison -y -d  preproc.y
> preproc.y:5560: fatal error: maximum table size (32767) exceeded
> 
> > Make check failed with the normal spurious errors.
> 
> I would stick with make installcheck due to the Cygwin (i.e., Windows)
> backlog issue.
> 
> > Make installcheck also failed on horology, copy2 and domain - see
> > attached output. 
> > 
> > The clocks changed here on Saturday night, so I guess that shouldn't
> > have caused the first error (or should the docs be updated?).
> > 
> > The second 2 errors are both with copys - related to the problem with
> > the listen() backlog queue in the parallel test perhaps?
> 
> I haven't looked into the above yet due to the build problems.  Any help
> regarding these issues is gratefully appreciated.
> 
> Thanks,
> Jason
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Win32 port

2002-11-05 Thread Matthew T. O'Connor
On Wed, 2002-11-06 at 01:32, Justin Clift wrote:
> Bruce Momjian wrote:
> > 
> > I have copies of Peer Direct's (Jan's company) port of PostgreSQL to
> > Win32, and SRA's port to Win32, and permission to generate a merged
> > patch that can be applied to 7.4.
> > 
> > Now that 7.3 is almost complete, I am going to start work on that.  I
> > will post patches that deal with specific portability issues, like
> > fork/exec and path separator handling, and once reviewed, apply them to
> > the main CVS tree for 7.4.
> 
> Whoo Hooo!
> 
> :-)
> 
> + Justin

Couldn't agree with Justin more.  Even though I won't use it in
production, We have developers that use postgres on their windows
laptops for development and cygwin just doesn't cut it.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] RC1?

2002-11-14 Thread Matthew T. O'Connor
> Tom, would you really be able to ask Permaine to retest 7.3?  Have a
> feeling we might be able to leverage the PlayStation2 brand name here
> for the Advocacy project.
>
> :-)
>

Anyone try it on an Xbox yet?

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[HACKERS] Auto Vacuum Daemon (again...)

2002-11-26 Thread Matthew T. O'Connor
Several months ago tried to implement a special postgres backend as an
Auto Vacuum Daemon (AVD), somewhat like the stats collector.  I failed
due to my lack of experience with the postgres source.  

On Sep 23, Shridhar Daithankar released an AVD written in C++ that acted
as a client program rather than part of the backend. I rewrote it in C,
and have been playing with it ever since.  At this point I need feedback
and direction from the hacker group.

First: Do we want AVD integrated into the main source tree, or should it
remain a "tool" that can be downloaded from gborg. I would think it
should be controlled by the postmaster, and configured from GUC (at
least basic on off settings)

Second: Assuming we want it integrated into the source tree, can it
remain a client app?  Can a non backend program that connects to the
postmaster using libpq be a child of the postmaster that the postmaster
can control (start and stop).

Third:  If a special backend version is preferred, I don't personally
know how to have a backend monitor and vacuum multiple databases.  I
guess it could be similar to the client app and fire up new back
everytime a database needs to be vacuumed.

Fourth: I think AVD is a feature that is needed in some form or
fashion.  I am willing to work on it, but if it needs to be a backend
version I  will probably need some help.

Anyway for you reading pleasure, I have attached a plot of results from
a simple test program I wrote. As you can see from the plot, AVD keeps
the file size under control.  Also, the first few Xacts are faster in
the non AVD case, but after that AVD keeps the average Xact time down. 
The periodic spikes in the AVD run correspond to when the AVD has fired
off a vacuum.  Also when the table file gets to approx 450MB performance
drops off horribly I assume this is because my system can no longer
cache the whole file (I have 512M in my machine).  Also, I had been
developing against 7.2.3 until recently, and I wound up doing some of
these benchmarks against both 7.2.3 and 7.3devel and 7.3 perfoms much
better, that is it 7.2 slowed down much sooner under this test.

Thanks,

Matthew

ps, The test program performs the following: 

create table pgavdtest_table (id int,num numeric(10,2),txt char(512))

while i<1000
insert into pgavdtest_table (id,num,txt) values (i,i.i,'string i')

while i<1000
update pgavdtest_table set num=num+i, txt='update string %i'


pps, I can post the source (both the AVD and the test progam) to the
list, or email it to individuals if they would like.



<>
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Auto Vacuum Daemon (again...)

2002-11-28 Thread Matthew T. O'Connor
On Thu, 2002-11-28 at 01:58, Shridhar Daithankar wrote:
> There are differences in approach here. The reason I prefer polling rather than 
> signalig is IMO vacuum should always be a low priority activity and as such it 
> does not deserve a signalling overhead.
> 
> A simpler way of integrating would be writing a C trigger on pg_statistics 
> table(forgot the exact name). For every insert/update watch the value and 
> trigger the vacuum daemon from a separate thread. (Assuming that you can create 
> a trigger on view)
> 
> But Tom has earlier pointed out that even a couple of lines of trigger on such 
> a table/view would be a huge performance hit in general..
> 
> I would still prefer polling. It would serve the need for foreseeable future..

Well this is a debate that can probably only be solved after doing some
legwork, but I was envisioning something that just monitored the same
messages that get send to the stats collector, I would think that would
be pretty lightweight, or even perhaps extending the stats collector to
also fire off the vacuum processes since it already has all the
information we are polling for.

> The reason I brought up issue of multiple processes/connection is starvation of 
> a DB.
> 
> Say there are two DBs which are seriously hammered. Now if a DB starts 
> vacuuming and takes long, another DB just keeps waiting for his turn for 
> vacuuming and by the time vacuum is triggered, it might already have suffered 
> some performance hit.
> 
> Of course these things are largely context dependent and admin should be abe to 
> make better choice but the app. should be able to handle the worst situation..

agreed

> The other way round is make AVD vacuum only one database. DBA can launch 
> multiple instances of AVD for each database as he sees fit. That would be much 
> simpler..

interesting thought.  I think this boils down to how many knobs do we
need to put on this system. It might make sense to say allow upto X
concurrent vacuums, a 4 processor system might handle 4 concurrent
vacuums very well.  I understand what you are saying about starvation, I
was erring on the conservative side by only allowing one vacuum at a
time (also simplicity of code :-) Where the worst case scenario is that
you "suffer some performance hit" but the hit would be finite since
vacuum will get to it fairly soon.

> Please send me the code offlist. I would go thr. it and get back to you by 
> early next week(bit busy, right now)

already sent.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Auto Vacuum Daemon (again...)

2002-11-29 Thread Matthew T. O'Connor
On Thursday 28 November 2002 23:26, Shridhar Daithankar wrote:
> On 28 Nov 2002 at 10:45, Tom Lane wrote:
> > "Matthew T. O'Connor" <[EMAIL PROTECTED]> writes:
> > > interesting thought.  I think this boils down to how many knobs do we
> > > need to put on this system. It might make sense to say allow upto X
> > > concurrent vacuums, a 4 processor system might handle 4 concurrent
> > > vacuums very well.
> >
> > This is almost certainly a bad idea.  vacuum is not very
> > processor-intensive, but it is disk-intensive.  Multiple vacuums running
> > at once will suck more disk bandwidth than is appropriate for a
> > "background" operation, no matter how sexy your CPU is.  I can't see
> > any reason to allow more than one auto-scheduled vacuum at a time.
>
> Hmm.. We would need to take care of that as well..

Not sure what you mean by that, but it sounds like the behaviour of my AVD 
(having it block until the vacuum command completes) is fine, and perhaps 
preferrable. 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] nested transactions

2002-11-29 Thread Matthew T. O'Connor
On Friday 29 November 2002 00:56, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> But we already have a recycling mechanism for pg_clog.  AFAICS,
> >> creating a parallel log file with a separate recycling mechanism is
> >> a study in wasted effort.
> >
> > But that recycling requires the vacuum of every database in the system.
> > Do people do that frequently enough?
>
> Once the auto vacuum code is in there, they won't have any choice ;-)

OK, I know postgres needs to be vacuumed every so often (I think its to 
guarantee safe XID wraparound?)  I think the AVD should do something to 
guarnatee this is hapening.  Since I am working on AVD, what are the criterea 
for this?  From the above I assume it also pertains to pg_clog recycling 
(which is related to XID wraparound?), but I know nothing about that.

Right now AVD only performs vacuum analyze on specific tables as it deems they 
need it, it does not perform vacuum on entire databases at any point yet.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] 7.4 Wishlist

2002-11-29 Thread Matthew T. O'Connor
pg_dump, our upgrade process is painful enough having to do a dump, reload.
I think we should be able to guarantee (or at least let much closer to it)
that the process works in all cases.

Personally pg_upgrade would be even nicer.

- Original Message -
From: "Christopher Kings-Lynne" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, November 29, 2002 1:51 PM
Subject: [HACKERS] 7.4 Wishlist


> Hi guys,
>
> Just out of interest, if someone was going to pay you to hack on Postgres
> for 6 months, what would you like to code for 7.4?
>
> My ones are:
>
> * Compliant ADD COLUMN
> * Integrated full text indexes
> * pg_dump dependency ordering
>
> What would you guys do?  Even if it isn't feasible right now...
>
> Chris
>
>
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Auto Vacuum Daemon (again...)

2002-12-02 Thread Matthew T. O'Connor
- Original Message -
From: "Shridhar Daithankar" <[EMAIL PROTECTED]>
To: "Matthew T. O'Connor" <[EMAIL PROTECTED]>
Sent: Monday, December 02, 2002 11:12 AM
Subject: Re: [HACKERS] Auto Vacuum Daemon (again...)


> On 28 Nov 2002 at 3:02, Matthew T. O'Connor wrote:
> I went thr. it today and I have some comments to make.
>
> 1. The idea of using single database is real great. I really liked that
> idea which keeps configuration simple.

I'm no longer think this is a good idea.  Tom Lane responded to our thread
on the hacker list saying that it would never be a good idea to have more
than one vacuum process running at a time, even on different databases as
vacuum is typically io bound.  Since never want to run more than one vacuum
at a time, it is much simpler to have it all managed by one AVD, rather than
one AVD for each database on a server.

> 2. You are fetching all the statistics in the list. This could get big if
> there are thousands of table or for a hosting companies where there are
tons
> of databases. That is the reason  I put a table in there..
>
> Of course not that it won't work, but by putting a table I thought it
> cause some less code in the app.

I don't see how putting a table in is any different than checking the view.
First I don't like the idea of having to have tables in someones database, I
find that intrusive.  I know that some packages such as PGAdmin do this, and
I never liked it as a developer.  Second, the only reason that it would be
less work for the server is that you may not have an entry in your table for
all tables in the database.  This can be accomplished through some type of
exclusion list that could be part of the configuration system.

> I will hack in a add-on for parallel vacuums by tom. and send you. Just
> put a command line switch(never played with getopt). Basically,after list
of
> database is read, fork a child that sleeps and vacuums only one database.

See comments above.

> Besides I have couple of bugreports which I will check against your
> version as well..

Please let me know what you find, I know it's far from a polished piece of
work yet :-)

> After a thorough look of code, I will come up with more of these but next
> time I will send you patched rather than comments..

I look forward to it.

Also, I wanted to let you know that I am working on integrating it into the
main Postgres source tree right now.  From what I have heard on the hackers
list it seems that they are hoping to have this be a core feature that they
can depend on so that they can guarantee that databases are vacuumed every
so often as required for 24x7 operation.  Basically I will still have it as
a separate executable, but the postmaster will take care of launching it
with proper arguments, restarting it if it dies (much like the stats
collector) and stop the AVD on shutdown.  This should be fairly easy to
do, I still don't know if others think this is a good idea, as I got to
response to that part of my other email, but it is the best idea I have
right now.

> Sorry for late reply. Still fighting with some *very* stupid bugs in my
> daytime jobs ( like 'if (k < 60)' evaluating to false for k=0 in release
version
> only etc..)

Good luck with your work, I hope you find all the bugs quickly, Its not the
fun part of coding.

Thanks again for the feedback, I really want this feature in postgres.

Matthew


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] psql's \d commands --- end of the line for

2002-12-09 Thread Matthew T. O'Connor
> >"\D" works though.)
> >
> >Any objections out there?
>
> My only complaint here is being forced to use the 'shift' key on commands
> that will be common.

\dd perhaps?


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Matthew T. O'Connor
On Tue, 2003-02-11 at 13:01, Tom Lane wrote:
> "Jon Griffin" <[EMAIL PROTECTED]> writes:
> > So it appears that linux at least is way above your 8 meg point, unless I
> > am missing something.
> 
> Yeah, AFAIK all recent Linuxen are well above the range of parameters
> that I was suggesting (and even if they weren't, Linux is particularly
> easy to change the SHMMAX setting on).  It's other Unixoid platforms
> that are likely to have a problem.  Particularly the ones where you
> have to rebuild the kernel to change SHMMAX; people may be afraid to
> do that.

The issue as I see it is: 
Better performing vs. More Compatible Out of the box Defaults.

Perhaps a compromise (hack?):
Set the default to some default value that performs well, a value we all
agree is not too big (16M? 32M?). On startup, if the OS can't give us
what we want, instead of failing, we can try again with a smaller
amount, perhaps half the default, if that fails try again with half
until we reach some bottom threshold (1M?).

The argument against this might be: When I set shared_buffers=X, I want
X shared buffers. I don't want it to fail silently and give me less than
what I need / want.  To address this we might want to add a guc option
that controls this behavior. So we ship postgresql.conf with 32M of
shared memory and auto_shared_mem_reduction = true.  With a comment that
the administrator might want to turn this off for production.

Thoughts?  

I think this will allow most uninformed users get decent performing
defaults as most systems will accommodate this larger value.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Plans for solving the VACUUM problem

2001-05-17 Thread Matthew T. O'Connor

> Free space map details
> --
>
> Accesses to the FSM could create contention problems if we're not careful.

Another quick thought for handling FSM contention problems.  A backend could
give up waiting for access to the FSM after a short period of time, and just
append it's data to the end of the file the same way it's done now.  Dunno
if that is feasable but it seemed like an idea to me.

Other than that, I would just like to say this will be a great improvement
for pgsql.  Tom, you and several other on this list continue to impress the
hell out of me.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] Help with Vacuum Failure

2001-08-14 Thread Matthew T. O'Connor

Hello, I'm having a problem vacuum a table and I didn't see an answer using
the fts engine.

I have two questions:

1) Is this a big problem, can it be fixed, do I have to dump / restore this
table?
2) I found this problem from my nightly cron driven vacuum -a -z.  When it
hits this error the entire vacuumdb process stops immediately thus skipping
any remaining databases.  Should it do this?  Or should it continue on and
vacuum the other databases?

Here is the error:

cms_beau=# vacuum hits;  (It works without the analyze phase of backup.)
VACUUM
cms_beau=# VACUUM verbose analyze hits;
NOTICE:  --Relation hits--
NOTICE:  Pages 8389: Changed 0, reaped 2, Empty 0, New 0; Tup 834575: Vac 0,
Keep/VTL 4/4, Crash 0, UnUsed 6, MinLen 52, MaxLen 121; Re-using:
Free/Avail. Space 376/64; EndEmpty/Avail. Pages 0/1. CPU 0.34s/0.05u sec.
NOTICE:  Index hits_id_key: Pages 1831; Tuples 834575: Deleted 0. CPU
0.11s/0.56u sec.
NOTICE:  Rel hits: Pages: 8389 --> 8389; Tuple(s) moved: 0. CPU 0.00s/0.00u
sec.
NOTICE:  --Relation pg_toast_6742393--
NOTICE:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail.
Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_toast_6742393_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
NOTICE:  Analyzing...
ERROR:  MemoryContextAlloc: invalid request size 4294079565
cms_beau=#

Additional information:

sort_mem = 16384
shared_buffers = 8192

cms_beau=# select version();
   version
-
 PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

cms_beau=# \d hits
  Table "hits"
  Attribute  |   Type   |   Modifier
-+--+---

 id  | integer  | not null default
nextval('hits_id_seq'::text)
 operator_id | integer  |
 connected   | timestamp with time zone | default 'now'
 page| text |
Index: hits_id_key

cms_beau=# select count(*) from hits;
 count

 834539
(1 row)


Please let me know if there is any other information you need.

Thank you much,

Matt O'Connor


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] CREATEDB Where ??

2001-08-20 Thread Matthew T. O'Connor

Yes and no :-).  The files were created but all postgres data files are now
idententified by numbers (oids I think), so you will not find a file or
directory anywhere in your filesystem named "mydb", or "mytable".

- Original Message -
From: "Peter Moscatt" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, August 17, 2001 9:05 AM
Subject: [HACKERS] CREATEDB Where ??


> I am pretty new to PostgreSQL so please bare with me  :-)
>
> When issuing the CREATEDB MyDb  then creating some tables with CREATE
> TABLE, I then go back and do a search for the file I have just created
> (MyDb) but can't find the physical file.
>
> Does one actually exist ??
>
> Pete
>
>
> ---(end of broadcast)---
> TIP 6: Have you searched our list archives?
>
> http://www.postgresql.org/search.mpl


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Link to bug webpage

2001-08-21 Thread Matthew T. O'Connor

> > > > > I disagree. Unless you are omniscient, we will only ever have a
partial
> > > > > list.
> > > but there wasn't enough interest for someone to take on
> > > the maintenance.
> >
> > We need someone willing to be a kibo. Or is that too arcane a reference?
>
> Gotta admit, I haven't heard that in a while.  But I think I'm nearing
> a solution.  Stay tuned.
>
I don't know what a kibo is, but I would be willing to put in some time
helping maintaing a bug reporting system.  One of the helpful things with
bugzilla setup with some other big projects is that the bug gets assigned to
a developer and the bug submitter gets emailed updates any time there is a
status change.

I agree that a bug database is not a replacement for the mailing lists, but
I do think it could serve the project well if it is done correctly.  I think
most uses look for a bugzilla type bug reporting tool these days.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] List response time...

2001-08-22 Thread Matthew T. O'Connor

> > Actually, the 'multi-day' delay is generally related to posts from ppl
> > that aren't subscribed to the lists that I have to approve manually ...
>
> I have been getting delayed duplicates from people (ie, Tom Lane)
addressed
> to only the hackers list (which I know he's subscribed to).  Up to a week
> after reading it once already.
>

I can confirm this also.  I have seen delayed (up to several days later)
duplicates of emails I have already received.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Abort state on duplicated PKey in transactions

2001-09-08 Thread Matthew T. O'Connor


> A solution, could be to query for the existance of the PK, just before the
> insertion. But there is a little span between the test and the
> insertion, where another insertion from another transaction could void
> the existance test. Any clever ideas on how to solve this? Using
> triggers maybe? Other solutions?
>

All you need to do is use a sequence.  If you set the sequence to be the
primary key with a default value of nextval(seq_name) then you will never
have a collision.  Alternatly if you need to know that number before you
start inserting you can select next_val(seq_name) before you inser and use
that.  By the way the datatype serial automates exactly what I described.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Free-space-map management thoughts

2003-02-26 Thread Matthew T. O'Connor
> PS: Another idea I'm toying with is to dump out the FSM contents at
> postmaster shutdown and reload them at restart, so that the FSM doesn't
> have to start from ground zero on every restart cycle.  But that's
> independent of the management algorithm...

Correct me if I'm wrong, but the FSM is only populated by vacuum, so there
is no FSM information for any given table / database until it's vacuumed, in
a long running production enviornment this may not be that important, but it
could result in a large increase in file size any time the database is
restarted.

I think this change, while independent of your proposal, is important.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] Two weeks to feature freeze

2003-06-18 Thread Matthew T. O'Connor
On Wed, 2003-06-18 at 21:27, Christopher Kings-Lynne wrote:
> Do we have any "killer" features added to 7.4 that we can shout about?
> There's usually been one or two in the past...?

Isn't the index growth problem solved in this release?  I think that is
a killer feature that solves a big problem for alot of people.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Updating psql for features of new FE/BE protocol

2003-06-25 Thread Matthew T. O'Connor
From: "Tom Lane" <[EMAIL PROTECTED]>
> It would be easy (and essentially free, since libpq already gets the info)
> to add such a notice to psql startup.  How do other people feel about
> it?  How would you word the notice exactly?
> "psql: server version is FOO, psql version is BAR, some things may not
work"
> seems awfully vague, but I doubt we can be much more specific ...

Do we have any documentation on psql compatibility across versions?  If so,
we could refer the user to that document.  Might be nice to know that most
\commands will not work, but ad hoc queries will be fine.

Matthew


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] persistant psql feature suggestion

2003-06-27 Thread Matthew T. O'Connor
On Fri, 2003-06-27 at 03:21, James Pye wrote:
> Greets,
> 
>   Just a thought for a psql enhancement, afiak, it is not easily possible for 
> persistent connections to a database in a shell script..
>   The ability for psql to remain in the background reading from stdin and 
> writing to stdout until explicitly killed. More specifically, so a shell scriptor 
> can have "persistent" connections to the database by calling psql once(leaving it in 
> the bg), and redirecting stdio through a fifo(mkfifo)(sending queries by echo > 
> fifo, and fetching results by cat fifo).
>   When I have tried this in the past it will read the query, execute it, and 
> exit when the results are cat'd from the fifo.

Not sure if it's exactly what you are looking for, nor how well it's
still maintained, but

I believe there is a took out there called pgbash which is a modified
version of bash that understands database queries natively.  I think
it's just what you are looking for.

Check out:  http://www.psn.co.jp/PostgreSQL/pgbash/index-e.html

Looks like it was updated for 7.3

Matthew


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] pg_autovacuum bug and feature request

2003-07-07 Thread Matthew T. O'Connor
Sorry for the slow response, I was away for the 4th.

On Fri, 2003-07-04 at 14:53, Christopher Browne wrote:
> Vincent Van Leeuwen wrote:
> > 2411 All DBs checked in: -717533400 usec, will sleep for 30 secs.
> 
> That sure looks like a 32 bit wraparound bug; 

Agreed, please check with pg_autovaccum from CVS and let us know if you
still see this problem.

> > Also, I'd like to see a way to tell pg_autovacuum which tables it
> > should monitor. I understand most setups would like to have all tables
> > monitored, but on our setup pg_autovacuum is wasting most of it's time
> > (and a fair amount of serverload) vacuuming some large tables 

See my other email about your choice of settings.  I believe that you
need to tweak your scaling factor settings to get pg_autovacuum to
behave in the manner you are looking for.  In fact I think the default
values used by pg_autovacuum (CVS version) will work for you.  If not,
please let me know.  In most instances -V < .5 will probably have a net
loss of performance, but it's there for testing purposes.

> The whole point of the architecture of pg_autovacuum is for it to be
> totally unnecessary to give any indication of which tables are to be
> vacuumed, and for the daemon to come up with reasonable answers all by
> itself.

Agreed this was a design principle for pg_autovacuum, and I think it
will get reasonable answers with reasonable settings.

> The FIRST approach I'd take would be to see if there are "tweaks" that
> might be made to the model it uses to determine when to vacuum.  Perhaps
> the formula should take account of table size, and thus vacuum less
> often for larger tables, 

It already does this.  That is what the scaling factor -V does, it is a
percentage of the table size that is added to -v (a base value) to
determine the threshold for vacuums.

> perhaps throwing in a "[-z] Table Pages
> Factor", where the calculation of "how often" would get added into it
> the value:
> 
>pg_class.relpages * table_pages_factor
> 
> [Jan, Matthew; if you have thoughts on this, feel free to suggest
> further.]

There are lots of things we could try, but the next step I am planning
on taking, is to use the FSM as a guide to what should be vacuumed
when.  It has several benefits over the current setup.  The only thing I
don't know is if we can use it alone, or if we will still need to
monitor other sources of information such as the stats system.

> Of course, there is always the answer:  "Use the Source, Luke!"
> 
> The "local kludge" would be for you to customize pg_autovacuum to
> exclude your "not favorite" tables.  That oughtn't be too difficult to
> do, actually.  If you have several tables you don't want to deal with,
> you could do something like the following:
> 
> 
>   if ((strcmp(tbl->table_name, "table_i_dont_want")) ||
>   (strcmp(tbl->table_name, "another_table_i_dont_want")) ||
>   (strcmp(tbl->table_name, "still_another_table_i_dont_want"))) {
> /* do nothing */
>   } else {
> /* proceed with usual logic */
> if ((tbl->curr_vacuum_count - tbl->CountAtLastVacuum) >=
> tbl->vacuum_threshold)
> ...

A kludge for sure, but it is open source so...

> To provide the answer that you asked for, in a more general way, would
> require introducing either a data file parser (which is what made pgavd
> a serious pain to deploy) or that pg_autovacuum set up its own
> PostgreSQL tables and store data in them.  

I think a better setup would be to have the config information in the
database itself, or add a new system table that allows it.  A new system
table would require that pg_autovacuum be accecpted as a core component
of postgresql.  I don't see this happening, not as long as it's a libpq
based client app.  Using a table inside of a database makes it easy for
the settings to be tweaked live and reduces complexity.

> There _is_ merit to that; one present shortcoming of pg_autovacuum is
> that it can only talk to one postmaster.  If one were to, for instance,
> have _four_ backends (with 4 separate port numbers) on one server, you
> need four instances of pg_autovacuum, and they would be perfectly happy
> to trample the I/O bus if they each concurrently figure they need to
> vacuum some big tables in the respective instances.

Agreed, this was never built into the design, nor do I think it should
be.



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] pg_autovacuum bug and feature request

2003-07-07 Thread Matthew T. O'Connor
On Fri, 2003-07-04 at 13:40, Vincent van Leeuwen wrote:
> I've been using pg_autovacuum for a couple of weeks now

Glad to hear it.

>  and have noticed one
> weird little bug: sometimes the daemon calculates it used a negative amount of
> time for the last vacuum it did, and waits no time at all before checking if
> it needs to run anything again. Sample output:
> 
> 2411 All DBs checked in: -717533400 usec, will sleep for 30 secs.

Strange, I have never seen this.  I run redhat and have tested with
RH7.3, 8.0, 9.  Christopher Browne has also worked on pg_autovacuum and
I have never heard of this problem from him either. 

I would suggest upgrading to the version that is in cvs and seeing if
it's any better.

> The 30 secs is only because I ran it like this:
> pg_autovacuum -d 2 -s 30 -S 0 -t 250 -T 0.01 -U postgres
> 
> I'm using PostgreSQL 7.3.2 on Debian Linux, kernel 2.4.21-rc3.
> 
> 
> Also, I'd like to see a way to tell pg_autovacuum which tables it should
> monitor. I understand most setups would like to have all tables monitored, but
> on our setup pg_autovacuum is wasting most of it's time (and a fair amount of
> serverload) vacuuming some large tables (several GB's of data, the vacuums
> regularly take half an hour per table or something in the very rough vicinity)
> which doesn't give a large win in performance anyway, while it should be
> focusing it's efforts on a few intensively used small tables, where frequent
> vacuums are a much larger win for performance. I vacuum everything nightly
> anyway, so those large tables can be totally ignored by pg_autovacuum in my
> setup. As you can see from the weird -t and -T parameters I already tried to
> make it favor those smaller tables (which get about the same amount of updates
> as the large tables), but I'm not quite sure I'm doing it the right way.

First issue is that you are using an old version of pg_autovaccum,
please update.  Also many of the command line options have changed, the
threshold settings (-t, -T) have been broken up into independent
settings for separate vacuum and analyze thresholds (-v -V and -a -A).

If your large tables are being vacuumed too often, then your scaling
factor is too small.  The -V option says vacuum this table when the
number of update / inserts / deletes = -T percent of the total tuples in
the table.  So, -V = .01 says vacuum when 1% of the tables has been
updated, so if a table has 100k rows, it will get vacuumed every 1k
updates.  

I tried to address this problem by providing -v and -V.  pg_autovacuum
vacuums when (-v + -V*(num_rows_in_table)) updates occur (See the
README.pg_autovacuum for more details on the calculations).  So I would
set your scaling factor higher.  The default settings in cvs are now -v
= 1000 and -V = 2.0

Currently there is no way to specifically tell pg_autovacuum what tables
to check and which to ignore.  I have considering adding an option of
looking in the current database for a pg_autovacuum table that would
provide a list of tables to check / ignore and allow for custom values
of scaling factors etc... on a per database or table basis, but this is
not in cvs and won't be put in for 7.4.  Hopefully for 7.5 there will be
something integrated into the backend making this whole issue moot.

Good luck with this, and please email if you have any questions /
problems.

Matthew T. O'Connor


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] 7.4 Beta RPMS?

2003-07-13 Thread Matthew T. O'Connor
I know it's early, but I was just wondering if there would be 7.4 rpms
during beta?  

Thanks,

Matthew


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Win32 port - current status?

2003-07-13 Thread Matthew T. O'Connor
On Mon, 2003-07-14 at 01:58, Claudio Natoli wrote:
> I'm just (one of the many?) hanging out for this, to justify continued use
> of Postgres to the powers that be. Seems like there has been no word on this
> for a couple weeks, and I'm not even sure whether or not it has made/will
> make it into 7.4? Perhaps I've missed a crucial message somewhere...

I certainly can't comment on it's current status, but it most certainly
will not be in 7.4.

I too am very excited by the port, but it just didn't get done in time
for 7.4, I hope it makes it into 7.5 devel early on in the cycle.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] full text archives working?

2003-08-28 Thread Matthew T. O'Connor
Hey, I just tried to to a search of the mail archives and got an error. 
I was trying to go here:

http://archives.postgresql.org/search.php?q=autovacuum&ps=50&wm=wrd&o=0&ul=http%3A%2F%2Farchives.postgresql.org%2Fpgsql-hackers%2F&m=all&wf=11

and got the following error:

DB err: could not connect to server: Connection refused Is the server
running on host neptune.hub.org and accepting TCP/IP connections on port
5432?

The error was repeated about 5 times.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Stats Collector Error 7.4beta1 and 7.4beta2

2003-09-03 Thread Matthew T. O'Connor
On Wed, 2003-09-03 at 23:50, Tom Lane wrote:
> Does 'ps' show that the stats collector and stats buffer postmaster
> child processes are alive?  Are there any suggestive complaints in
> the postmaster's log?

As Adam mentioned, I took a look at his system since the initial report
was about a problem with pg_autovacuum.  Anyway, Yes ps shows the two
stats collector related processes running, and no the log files don't
show anything helpful, however I didn't try to change any logging
settings.  Initially I saw an error in the logs about an IPv6 address
error but after I recompiled everthing with a simple ./configure
--prefix=/home/user/somethingelse/ I didn't get the IPv6 error in the
logs anymore.

Short answer is I have no idea why this is happening, and I didn't see
any obvious configuration problems that might cause this (make check
passed all tests).

Tom, Adam was able to give me a login to his machine, maybe he would do
the same for you.

Anyway, that is all I was able to see, hence the call for more help on
hackers :-)

Matthew



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Stats Collector Error 7.4beta1 and 7.4beta2

2003-09-03 Thread Matthew T. O'Connor
On Thu, 2003-09-04 at 01:23, Tom Lane wrote:
> Hm.  Could it be an IPv6 issue --- that is, the stats collector is alive
> and faithfully listening on some UDP port, but it's not the same port
> the backends try to send to?  Given the discussion over the past couple
> of days about bizarre interpretations of loopback addresses in
> pg_hba.conf, I could sure believe there's some similar kind of issue for
> the stats collector.

I had a similar thought, but I have no idea how I would verify this. 
The thing is, when I recompiled postgresql myself, I left pg_hba.conf at
default settings, and it's running on RH9, which I am running and have
not had a problem with...


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-04 Thread Matthew T. O'Connor
Ouch... sorry, my fault.  I'll fix this tomorrow (Friday) and submit a
patch, or if you want to submit a patch that would be fine.  All you
have to do is change the the sql statements to put quotes around the
relation name. 

Thanks for catching this.

Matthew T. O'Connor

On Thu, 2003-09-04 at 18:39, Adam Kavan wrote:
> Now that I have pg_autovacuum working I've bumped into another small 
> bug.  When pg_autovacuum goes to vacuum or analyze one of my tables it runs...
> 
> analyze public.ConfigBackup
> 
> Because ConfigBackup is mixed case it cannot find the relation.  I fixed 
> this by going to the function init_table_info and increasing the malloc for 
> new_tbl->table_name by 2 and adding "'s to either side of the table 
> name.  Is there anything wrong with this approach?  Is there a config I can 
> set to make this non-case sensitive?
> 
> Thanks again for your time.
> 
> --- Adam Kavan
> --- [EMAIL PROTECTED]
> 
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faqs/FAQ.html
> 


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-04 Thread Matthew T. O'Connor
On Thu, 2003-09-04 at 18:39, Adam Kavan wrote:
> Now that I have pg_autovacuum working I've bumped into another small 
> bug.  When pg_autovacuum goes to vacuum or analyze one of my tables it runs...

Also, has this been officially fixed?  All I have heard so far is that
you commented out the check and now now it works for you.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 11:08, Jan Wieck wrote:
> > Another way to give autovacuum some hints would be to return some number 
> > as commandtuples from vacuum. like the number of tuples actually 
> > vacuumed. That together with the new number of reltuples in pg_class 
> > will tell autovacuum how frequent a relation really needs scanning.
> 
> Which actually would be much better because it'd work without the 
> statistics collector configured for gathering IO stats.

Which is certainly a good thing.  Using the stats system is a measurable
performance hit.

I still want to play with pg_autovacuum ignoring the stats system and
just looking at the FSM data.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 11:17, Shridhar Daithankar wrote:
> On 22 Aug 2003 at 11:03, Jan Wieck wrote:
> > That's why I think it needs one more pg_stat column to count the number 
> > of vacuumed tuples. If one does
> > 
> >  tuples_updated + tuples_deleted - tuples_vacuumed
> > 
> > he'll get approximately the number of tuples a regular vacuum might be 
> > able to reclaim. If that number is really small, no need for autovacuum 
> > to cause any big trouble by scanning the relation.
> > 
> > Another way to give autovacuum some hints would be to return some number 
> > as commandtuples from vacuum. like the number of tuples actually 
> > vacuumed. That together with the new number of reltuples in pg_class 
> > will tell autovacuum how frequent a relation really needs scanning.
> 
> This kind of information does not really help autovacuum. If we are talking 
> about modifying backend stat collection algo., so that vacuum does minimum 
> work, is has translate to cheaper vacuum analyze so that autovacuum can fire it 
> at will any time. In the best case, another resident process like stat 
> collector can keep cleaning the deads.

I believe what Jan is talking about is knowing when to use a normal
vacuum, and when to do a vacuum decent.  So his proposal is working
under the assumption that there would be a cheaper vacuum analyze that
can be run most of the time.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 10:45, Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> Right.  One big question mark in my mind about these "partial vacuum"
> proposals is whether they'd still allow adequate FSM information to be
> maintained.  If VACUUM isn't looking at most of the pages, there's no
> very good way to acquire info about where there's free space.

Well, pg_autovacuum really needs to be looking at the FSM anyway.  It
could look at the FSM, and choose to to do a vacuum normal when there
the amount of FSM data becomes inadequate.  Of course I'm not sure how
you would differentiate a busy table with "inadequate" FSM data and an
inactive table that doesn't even register in the FSM.  Perhaps you would
still need to consult the stats system.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-10 Thread Matthew T. O'Connor
On Wed, 2003-09-10 at 15:57, Bruce Momjian wrote:
> I assume the attached patch is what you want done to fix this.  Applied.
> 
> It quotes table names for vacuum and analyze, and uppercases the
> keywords for clarity.

Yeah, this is basically what I meant, sorry I didn't get to it quicker. 

However, I tested it out a little and the patch you made doesn't work
because it produces commands like:

VACUUM ANALYZE "public.FooBar"

Which doesn't work, so I made my own patch that creates commands like:

VACUUM ANALYZE "public"."FooBar"

This allows for mixed case schema names as well as tables.

Adam, can you please give this a test as you are the person who caught
the bug in the first place.

Thanks, 

Matthew T. O'Connor

*** pg_autovacuum.c.orig	2003-09-10 23:13:51.95072 -0400
--- pg_autovacuum.c	2003-09-10 23:59:25.672571940 -0400
***
*** 88,103 
  
  	new_tbl->table_name = (char *)
  		malloc(strlen(PQgetvalue(res, row, PQfnumber(res, "relname"))) +
! 			   strlen(new_tbl->schema_name) + 2);
  	if (!new_tbl->table_name)
  	{
  		log_entry("init_table_info: malloc failed on new_tbl->table_name");
  		fflush(LOGOUTPUT);
  		return NULL;
  	}
! 	strcpy(new_tbl->table_name, new_tbl->schema_name);
! 	strcat(new_tbl->table_name, ".");
  	strcat(new_tbl->table_name, PQgetvalue(res, row, PQfnumber(res, "relname")));
  
  	new_tbl->CountAtLastAnalyze =
  		(atol(PQgetvalue(res, row, PQfnumber(res, "n_tup_ins"))) +
--- 88,108 
  
  	new_tbl->table_name = (char *)
  		malloc(strlen(PQgetvalue(res, row, PQfnumber(res, "relname"))) +
! 			   strlen(new_tbl->schema_name) + 6);
  	if (!new_tbl->table_name)
  	{
  		log_entry("init_table_info: malloc failed on new_tbl->table_name");
  		fflush(LOGOUTPUT);
  		return NULL;
  	}
! 
! 	/* Put both the schema and table name in quotes so that 
! 		we can work with mixed case table names */
! 	strcpy(new_tbl->table_name, "\"");
! 	strcat(new_tbl->table_name, new_tbl->schema_name);
! 	strcat(new_tbl->table_name, "\".\"");
  	strcat(new_tbl->table_name, PQgetvalue(res, row, PQfnumber(res, "relname")));
+ 	strcat(new_tbl->table_name, "\"");
  
  	new_tbl->CountAtLastAnalyze =
  		(atol(PQgetvalue(res, row, PQfnumber(res, "n_tup_ins"))) +
***
*** 581,587 
  	{
  		PGresult   *res = NULL;
  
! 		res = send_query("vacuum", dbi);
  		/* FIXME: Perhaps should add a check for PQ_COMMAND_OK */
  		PQclear(res);
  		return 1;
--- 586,592 
  	{
  		PGresult   *res = NULL;
  
! 		res = send_query("VACUUM", dbi);
  		/* FIXME: Perhaps should add a check for PQ_COMMAND_OK */
  		PQclear(res);
  		return 1;
***
*** 733,739 
  	PGresult   *res = NULL;
  	int			ret = 0;
  
! 	res = send_query("show stats_row_level", dbi);
  	ret =
  		strcmp("on", PQgetvalue(res, 0, PQfnumber(res, "stats_row_level")));
  	PQclear(res);
--- 738,744 
  	PGresult   *res = NULL;
  	int			ret = 0;
  
! 	res = send_query("SHOW stats_row_level", dbi);
  	ret =
  		strcmp("on", PQgetvalue(res, 0, PQfnumber(res, "stats_row_level")));
  	PQclear(res);
***
*** 1082,1088 
   */
  if ((tbl->curr_vacuum_count - tbl->CountAtLastVacuum) >= tbl->vacuum_threshold)
  {
! 	snprintf(buf, sizeof(buf), "vacuum analyze %s", tbl->table_name);
  	if (args->debug >= 1)
  	{
  		sprintf(logbuffer, "Performing: %s", buf);
--- 1087,1093 
   */
  if ((tbl->curr_vacuum_count - tbl->CountAtLastVacuum) >= tbl->vacuum_threshold)
  {
! 	snprintf(buf, sizeof(buf), "VACUUM ANALYZE %s", tbl->table_name);
  	if (args->debug >= 1)
  	{
  		sprintf(logbuffer, "Performing: %s", buf);
***
*** 1096,1102 
  }
  else if ((tbl->curr_analyze_count - tbl->CountAtLastAnalyze) >= tbl->analyze_threshold)
  {
! 	snprintf(buf, sizeof(buf), "analyze %s", tbl->table_name);
  	if (args->debug >= 1)
  	{
  		sprintf(logbuffer, "Performing: %s", buf);
--- 1101,1107 
  }
  else if ((tbl->curr_analyze_count - tbl->CountAtLastAnalyze) >= tbl->analyze_threshold)
  {
! 	snprintf(buf, sizeof(buf), "ANALYZE %s", tbl->table_name);
  	if (args->debug >= 1)
  	{
  		sprintf(logbuffer, "Performing: %s", buf);

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Vote: Adding flex/bison derived files in WIN32_DEV

2003-09-11 Thread Matthew T. O'Connor
On Wed, 2003-09-10 at 12:03, Bruce Momjian wrote:
> Because MinGW/Msys doesn't come with flex/bison by default, I have added
> those derived files to the WIN32_DEV branch in CVS.  

I'm confused.  Right on the MinGW download page is a link for
bison-1.875.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-11 Thread Matthew T. O'Connor
On Thu, 2003-09-11 at 15:02, Bruce Momjian wrote:
> Patch applied.  You might want to look at pg_dump/dumputils.c::fmtId()
> for a function that does smart quoting.

OK, thanks.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-11 Thread Matthew T. O'Connor
On Thu, 2003-09-11 at 08:12, Christopher Browne wrote:
> Something I am feeling a little suspicious of is that I haven't seen,
> in the logs, pg_autovacuum looking at pg_ tables.  
> 
> I know that if we don't periodically vacuum such system tables as
> pg_class, pg_attribute, pg_statistic, and pg_type, they can get to
> "pretty evil size."
> 
> [Rummaging around...]  These tables are being added for template1, but
> apparently not for "main" databases.  That looks like a bit of a fly
> in the ointment...

I designed it that way.  It was my understanding that all of the system
tables pg_class etc... are shared tables, available in all databases,
but actually stored as only one central set of real tables.  Hence
vacuuming pg_class from template1 helps every database that accesses
pg_class.

Did I make a design error?



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-11 Thread Matthew T. O'Connor
On Thu, 2003-09-11 at 17:11, Tom Lane wrote:
> "Matthew T. O'Connor" <[EMAIL PROTECTED]> writes:
> > I designed it that way.  It was my understanding that all of the system
> > tables pg_class etc... are shared tables, available in all databases,
> > but actually stored as only one central set of real tables.
> 
> You are very badly mistaken.
> 
> Only the tables marked "relisshared" in pg_class (currently pg_shadow,
> pg_group, pg_database, and their indexes and toast tables) are shared
> across a cluster.  The rest have separate copies per-database.

hrm OK.  Patch forthcoming



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-11 Thread Matthew T. O'Connor
On Thu, 2003-09-11 at 18:25, Tom Lane wrote:
> BTW, I am not sure it is a good idea to suppress "redundant" vacuuming
> of shared tables in the first place.  The trouble with doing so is that
> if you only vacuum pg_shadow through template1, then only template1 will
> ever have up-to-date statistics about it.  That's not good.
> 
> You might be able to get away with doing actual vacuums only through
> template1, and doing just ANALYZEs every so often in other DBs.

ok I will see what I can do about that.  So I assume that the vacuumdb
script handle this just does redundant vacuums / analyzes on shared
tables so that it doesn't have a problem with this.

If we can supress "redundant" vacuuming I think that would be a good
thing as pg_autovacuum is supposed to make the required vacuuming as
efficient as possible.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-12 Thread Matthew T. O'Connor
On Thu, 2003-09-11 at 18:25, Tom Lane wrote:
> "Matthew T. O'Connor" <[EMAIL PROTECTED]> writes:
> > hrm OK.  Patch forthcoming
> 
> BTW, I am not sure it is a good idea to suppress "redundant" vacuuming
> of shared tables in the first place.  The trouble with doing so is that
> if you only vacuum pg_shadow through template1, then only template1 will
> ever have up-to-date statistics about it.  That's not good.
> 
> You might be able to get away with doing actual vacuums only through
> template1, and doing just ANALYZEs every so often in other DBs.

I made a patch to fix this, but in testing it I noticed that the stats
system doesn't work on shared tables as I was expecting it too (as my
latest patch requires it too :-). It treats instances of shared tables
in separate databases as totally unique tables.  This makes it hard to
know how much activity has really gone on for a shared table.

Is the behavior of the following example expected / desired?

template1=# select ... (query details snipped)
   relname   | relisshared | n_tup_ins | n_tup_upd | n_tup_del
-+-+---+---+---
 pg_database | t   |28 | 0 |28
(1 row)

template1=# create database foo; drop database foo;
CREATE DATABASE
DROP DATABASE
template1=# select 
   relname   | relisshared | n_tup_ins | n_tup_upd | n_tup_del
-+-+---+---+---
 pg_database | t   |29 | 0 |29
(1 row)

template1=# \c matthew
You are now connected to database "matthew".
matthew=# select 
   relname   | relisshared | n_tup_ins | n_tup_upd | n_tup_del
-+-+---+---+---
 pg_database | t   | 2 | 0 | 2
(1 row)

matthew=# create database foo; drop database foo;
CREATE DATABASE
DROP DATABASE
matthew=# select 
   relname   | relisshared | n_tup_ins | n_tup_upd | n_tup_del
-+-+---+---+---
 pg_database | t   | 3 | 0 | 3
(1 row)

matthew=# \c template1
You are now connected to database "template1".
template1=# select 
   relname   | relisshared | n_tup_ins | n_tup_upd | n_tup_del
-+-+---+---+---
 pg_database | t   |29 | 0 |29
(1 row)



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-12 Thread Matthew T. O'Connor
On Fri, 2003-09-12 at 09:35, Bruce Momjian wrote:
> Matthew T. O'Connor wrote:
> > I made a patch to fix this, but in testing it I noticed that the stats
> > system doesn't work on shared tables as I was expecting it too (as my
> > latest patch requires it too :-). It treats instances of shared tables
> > in separate databases as totally unique tables.  This makes it hard to
> > know how much activity has really gone on for a shared table.
> > 
> > Is the behavior of the following example expected / desired?
> 
> I suspect is just a bug because no one noticed it before.  As I
> understand it, the stats system is recorded per-database.  We could add
> stuff so the global tables are only recorded in template1 or perhaps
> record in template1 but report template1's numbers for all databases.

OK, well as we wait on the fix for the stats system, let me submit my
patch for pg_autovacuum.  This patch assumes that the stats system will
be fixed so that all inserts, updates and deletes performed on shared
tables reguardless of what database those commands were executed from,
will show up in the stats shown in each database.

Even if this the stats system isn't fixed, this patch still is much
better about monitoring system tables that aren't shared, so it's an
improvement no matter what.

Matthew T. O'Connor


*** pg_autovacuum.c	2003-09-12 02:28:38.012400601 -0400
--- /home/matthew/downloads/pg_autovacuum.c	2003-09-12 02:42:35.891499368 -0400
***
*** 118,130 
  	new_tbl->reltuples = atoi(PQgetvalue(res, row, PQfnumber(res, "reltuples")));
  	new_tbl->relpages = atoi(PQgetvalue(res, row, PQfnumber(res, "relpages")));
  
- 	log_entry(PQgetvalue(res, row, PQfnumber(res, "relisshared")));
- 
- 	if(strcmp("t", PQgetvalue(res, row, PQfnumber(res, "relisshared"
- 		new_tbl->relisshared = 0;
- 	else 
- 		new_tbl->relisshared = 1;
- 
  	new_tbl->analyze_threshold =
  		args->analyze_base_threshold + args->analyze_scaling_factor * new_tbl->reltuples;
  	new_tbl->vacuum_threshold =
--- 118,123 
***
*** 220,226 
  		 * both remove tables from the list that no longer exist and add
  		 * tables to the list that are new
  		 */
! 		res = send_query((char *) TABLE_STATS_QUERY, dbi);
  		t = PQntuples(res);
  
  		/*
--- 213,219 
  		 * both remove tables from the list that no longer exist and add
  		 * tables to the list that are new
  		 */
! 		res = send_query(query_table_stats(dbi), dbi);
  		t = PQntuples(res);
  
  		/*
***
*** 360,366 
  {
  	sprintf(logbuffer, "  table name: %s.%s", tbl->dbi->dbname, tbl->table_name);
  	log_entry(logbuffer);
! 	sprintf(logbuffer, " relfilenode: %i;   relisshared: %i", tbl->relfilenode, tbl->relisshared);
  	log_entry(logbuffer);
  	sprintf(logbuffer, " reltuples: %i;  relpages: %i", tbl->reltuples, tbl->relpages);
  	log_entry(logbuffer);
--- 353,359 
  {
  	sprintf(logbuffer, "  table name: %s.%s", tbl->dbi->dbname, tbl->table_name);
  	log_entry(logbuffer);
! 	sprintf(logbuffer, " relfilenode: %i", tbl->relfilenode);
  	log_entry(logbuffer);
  	sprintf(logbuffer, " reltuples: %i;  relpages: %i", tbl->reltuples, tbl->relpages);
  	log_entry(logbuffer);
***
*** 595,601 
  
  		res = send_query("VACUUM", dbi);
  		/* FIXME: Perhaps should add a check for PQ_COMMAND_OK */
- 		/* FIXME: We should also reset all table stats since we just vacuumed every table */
  		PQclear(res);
  		return 1;
  	}
--- 588,593 
***
*** 698,703 
--- 690,707 
  
  /* Begninning of misc Functions */
  
+ 
+ char *
+ query_table_stats(db_info * dbi)
+ {
+ 	if (!strcmp(dbi->dbname, "template1"))		/* Use template1 to
+  * monitor the system
+  * tables */
+ 		return (char *) TABLE_STATS_ALL;
+ 	else
+ 		return (char *) TABLE_STATS_USER;
+ }
+ 
  /* Perhaps add some test to this function to make sure that the stats we need are available */
  PGconn *
  db_connect(db_info * dbi)
***
*** 749,757 
  	if (NULL == dbi->conn)
  		return NULL;
  
- if (args->debug >= 4)
- 		log_entry(query);
- 
  	res = PQexec(dbi->conn, query);
  
  	if (!res)
--- 753,758 
***
*** 1054,1060 
  
  if (0 == xid_wraparound_check(dbs));
  {
! 	res = send_query((char *) TABLE_STATS_QUERY, dbs);		/* Get an updated
  		 * snapshot of this dbs
  		 * table stats */
  	for (j = 0; j < PQntuples(res); j++)
--- 1055,1061 
  
  if (0 == xid_wraparound_check(dbs));
  {
! 	res = send_query(query_table_stats(dbs), dbs);		/* Get an updated
  		 * snapshot of this dbs
  

Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-12 Thread Matthew T. O'Connor
On Fri, 2003-09-12 at 12:46, Tom Lane wrote:
> How will it act with shared tables if the stats system isn't fixed?
> We may decide that tracking shared tables correctly will have to wait
> for 7.5.

The behavior in the patch will vacuum a shared table only from
template1, and only analyze from all other databases. If the stats
system isn't fixed for 7.4, then I can easily change this to vacuum
shared tables from all databases. The real problem is that each database
will only see activity that was performed in it, so we might need to
vacuum far before pg_autovacuum is shown the required amount of activity
to do something about it.

So we would have a problem if commands that effect these tables are done
from lots of different databases.  In reality, I don't think these
tables change that much (pg_database, pg_shadow, and pg_group), and most
of commands that do effect these tables are usually done from template1.

I can hardwire in something to hedge this off like setting the threshold
for shared tables much much lower than normal thresholds.  I could also
do something more complicated and try to aggregate all the activity seen
by all the databases and when the sum exceeds the threshold then have
then perform a vacuum from template1 and analyze from all other
databases.

Thoughts?



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Another small bug (pg_autovacuum)

2003-09-12 Thread Matthew T. O'Connor
On Fri, 2003-09-12 at 13:06, Tom Lane wrote:
> > I can hardwire in something to hedge this off like setting the threshold
> > for shared tables much much lower than normal thresholds.  I could also
> > do something more complicated and try to aggregate all the activity seen
> > by all the databases and when the sum exceeds the threshold then have
> > then perform a vacuum from template1 and analyze from all other
> > databases.
> 
> That seems like more work than it's worth for a short-term stopgap.
> 
> If Jan concludes that fixing pgstats is *really* hard and will not
> happen for awhile, then we could talk about more extensive workarounds
> in pg_autovacuum, but right now I doubt it's needed.

OK, I'll hold for now.  The patches I just submitted should be fine, and
if the stats system doesn't get fixed, I'll make the small change where
vacuum is done from all databases, not just template1.

Thanks


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] pg_dump doesn't dump binary compatible casts

2003-09-24 Thread Matthew T. O'Connor
On Tue, 2003-09-23 at 22:40, Greg Stark wrote:
> [But then I'm not a fan of treating pg_dump files as if they were backups.]

If you don't use pg_dump for backups what do you use?  Stop the database
and copy the data directory? That is not a valid choice for most people.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] initdb failure (was Re: [GENERAL] sequence's plpgsql)

2003-09-26 Thread Matthew T. O'Connor
On Fri, 2003-09-26 at 11:01, Tom Lane wrote:
> so it appears that cygwin's "echo" generates a different newline style
> than what got put into sql_features.txt.  A possible way to fix this is
> to put the "\." line into sql_features.txt, but maybe there's a cleaner
> answer.  Peter, any thoughts?

Does cygwin still have the install time option of what type of line feed
to use?  I know at one point (a long time ago) when I installed cygwin,
and chose windows line feeds (CRLF) that it caused problems with several
applications.  So the problem might be that with CYGWIN you could have
either type of line feed depending on what the user selected during
install.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Autovacuum improvements

2007-01-14 Thread Matthew T. O'Connor
First, thanks for working on this.  I hope to be helpful with the design 
discussion and possibly some coding if I can find the time.


My initial reaction to this proposal is that it seems overly complex, 
however I don't see a more elegant solution.  I'm a bit concerned that 
most users won't figure out all the knobs.


Alvaro Herrera wrote:

I've been thinking how to improve autovacuum so that we can convince
more people that it can be enabled by default.  


I would like to see it enabled by default too, however the reason it 
isn't already enabled by default is that it caused failures in the 
regression test when we tried to turn it on during the 8.2 dev cycle and 
it was too close to beta to fix everything.  All this new machinery is 
great, but it doesn't address that problem.



Here are my thoughts.
There are two areas of improvements:

1. scheduling, and
2. process handling, i.e., how to have multiple vacuum processes running
   at any time.


Fail enough, but I would say the two biggest area for improvement are 
scheduling and preventing "HOT" tables from becoming vacuum starved 
(essentially what you said, but with a different emphasis).


[snip]


Process Handling


My idea here is to morph the current autovacuum daemon from an agent
that itself runs a vacuum command, into something that launches other
processes to run those vacuum commands.  I'll call this "the autovacuum
launcher process", or the launcher for short.  The idea here is that the
launcher can take care of the scheduling while the worker processes do
their work.  If the launcher then determines that a particular instant
there should be two vacuums running, then it simply starts two worker
processes.


How about calling it the autovacuum_master process?

[snip autovacuum launcher process description]

That all sounds reasonable to me.  I think the harder part is what you 
are getting at below (how to get the launcher to figure out what to 
vacuum when).



Scheduling
==
We introduce the following concepts:

1. table groups.  We'll have a system catalog for storing OID and group
name, and another catalog for membership, linking relid to group OID.

pg_av_tablegroup
  tgrname   name

pg_av_tgroupmembers
  groupid   oid
  relid oid

>

2. interval groups.  We'll have a catalog for storing igroup name and
OID, and another catalog for membership.  We identify an interval by:
   - month of year
   - day of month
   - day of week
   - start time of day
   - end time of day

This is modelled after crontabs.

pg_av_intervalgroup
 igrnamename

pg_av_igroupmembers
 groupidoid
 month  int
 domint
 dowint
 starttime  timetz
 endtimetimetz


This seems to assume that the start and end time for an interval will be 
on the same day, you probably need to specify a start month, dom, dow, 
time and an end month, dom, dow and time.


Since this is modeled after cron, do we allow wild-cards, or any of the 
other cron tricks like */20 or 1-3,5,7,9-11?


Also your notation above is ambiguous, it took me a while to realize 
that pg_av_igroupmembers.groupid wasn't referencing the id from 
pg_av_tablegroup.



Additionally, we'll have another catalog on which we'll store table
groups to interval groups relationships.  On that catalog we'll also
store those autovacuum settings that we want to be able to override:
whether to disable it for this interval group, or the values for the
vacuum/analyze equations.

pg_av_schedule
 tgroup oid
 igroup oid
 enabledbool
 queue  int
 vac_base_threshint
 vac_scale_factor   float
 anl_base_threshint
 anl_scal_factorfloat
 vac_cost_delay int
 vac_cost_limit int
 freeze_min_age int
 freeze_max_age int



What is queue for?


So the scheduler, at startup, loads the whole schedule in memory, and
then wakes up at reasonable intervals and checks whether these equations
hold for some of the tables it's monitoring.  If they do, then launch a
new worker process to do the job.

We need a mechanism for having the scheduler rescan the schedule when a
user modifies the catalog -- maybe having a trigger that sends a signal
to the process is good enough (implementation detail: the signal must be
routed via the postmaster, since the backend cannot hope to know the
scheduler's PID.  This is easy enough to do.)


This all looks reasonable if not a bit complex.  Question, what happens 
to the current pg_autovacuum relation?


Also what about system defaults, will we have a hard coded default 
interval of always on, and one default table group that contains all the 
tables with one default entry in pg_av_schedule?


I think we need more discussion on scheduling, we need to make sure this 
solves the vacuum starvation problem.  Does the launcher process 
consider each row in pg_av_schedule t

Re: [HACKERS] Autovacuum improvements

2007-01-14 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
  

Alvaro Herrera wrote:


pg_av_igroupmembers
groupid oid
month   int
dom int
dow int
starttime   timetz
endtime timetz
  
This seems to assume that the start and end time for an interval will be 
on the same day, you probably need to specify a start month, dom, dow, 
time and an end month, dom, dow and time.



Actually, I was thinking that if you want intervals that cross day
boundaries, you just add more tuples (one which finishes at 23:59:59 and
another which starts at 00:00:00 the next day).
  


This still seems ambiguous to me, how would I handle a maintenance 
window of Weekends from Friday at 8PM though Monday morning at 6AM? My 
guess from what said is:

mon dom dow starttime endtime
null  null6  20:00  null
null  null1  null  06:00

So how do we know to vacuum on Saturday or Sunday?  I think clearly 
defined intervals with explicit start and stop times is cleaner.


This all looks reasonable if not a bit complex.  Question, what happens 
to the current pg_autovacuum relation?



I had two ideas: one was to make pg_autovacuum hold default config for
all tables not mentioned in any group, so sites which are OK with 8.2's
representation can still use it.  The other idea was to remove it and
replace it with this mechanism.

  


Probably best to just get rid of it.  GUC variables hold the defaults 
and if we create a default interval / group, it will also have defaults.


I think we need more discussion on scheduling, we need to make sure this 
solves the vacuum starvation problem.  Does the launcher process 
consider each row in pg_av_schedule that applies at the current time 
separately?  That is say there are three entries in pg_av_schedule that 
apply right now, does that mean that the launcher can fire off three 
different vacuums?  Perhaps we need to add a column to pg_av_tablegroup 
that specifies the max number of concurrent worker processes for this 
table group.



My idea was to assign each table, or maybe each group, to a queue, and
then have as much workers as there are queues.  So you could put them
all in a single queue and it would mean there can be at most one vacuum
running at any time.  Or you could put each group in a queue, and then
there could be as many workers as there are groups.  Or you could mix.

And also there would be a "autovac concurrency limit", which would be
a GUC var saying how many vacuums to have at any time.


Hmm... this seems like queue is nearly a synonym for group.  Can't we 
just add num_workers property to table groups?  That seems to accomplish 
the same thing.  And yes, a GUC variable to limits the total number of 
concurrent autovacuums is probably a good idea.




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

I'd like to hear other people's opinions on Darcy Buskermolen proposal
to have a log table, on which we'd register what did we run, at what
time, how long did it last, how many tuples did it clean, etc.  I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).


I'm not familiar with his proposal, but I'm not sure what I think of 
logging vacuum (and perhaps analyze) commands to a table.  We have never 
logged anything to tables inside PG.  I would be worried about this 
eating a lot of space in some situations.


I think most people would just be happy if we could get autovacuum to 
log it's actions at a much higher log level.  I think that "autovacuum 
vacuumed table x" is important and shouldn't be all the way down at the 
debug level.


The other (more involved) solution to this problem was proposed which 
was create a separate set of logging control params for autovacuum so 
that you can turn it up or down independent of the general server logging.



Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
that manually-run vacuums would be logged as well.


+1


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Autovacuum improvements

2007-01-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
  
This still seems ambiguous to me, how would I handle a maintenance 
window of Weekends from Friday at 8PM though Monday morning at 6AM? My 
guess from what said is:

mon dom dow starttime endtime
null  null6  20:00  null
null  null1  null  06:00

So how do we know to vacuum on Saturday or Sunday?  I think clearly 
defined intervals with explicit start and stop times is cleaner.



mon dom dow start   end
nullnull5   20:00   23:59:59
nullnull6   00:00   23:59:59
nullnull7   00:00   23:59:59
nullnull1   00:00   06:00

(1 = monday, 5 = friday)
  


So it takes 4 lines to handle one logical interval, I don't really like 
that.  I know that your concept of interval groups will help mask this 
but still.



Now I'm starting to wonder what will happen between 23:59:59 of day X
and 00:00:00 of day (X+1) ...  Maybe what we should do is not specify
an end time, but a duration as an interval:

month   int
dom int
dow int
start   time
durationinterval

That way you can specify the above as
mon dom dow start   duration
nullnull5   20:00   (4 hours + 2 days + 6 hours)

Now, if a DST boundary happens to fall in that interval you'll be an
hour short, or it'll last an hour too long :-)
  


I certainly like this better than the first proposal, but I still don't 
see how it's better than a  full set of columns for start and end 
times.  Can you tell me why you are trying to avoid that design? 

Hmm... this seems like queue is nearly a synonym for group.  Can't we 
just add num_workers property to table groups?  That seems to accomplish 
the same thing.  And yes, a GUC variable to limits the total number of 
concurrent autovacuums is probably a good idea.



queue = group of groups.  But I'm not sure about this at all, which is
why I took it away from the proposal.


I think we can live without the groups of groups, at least for now. 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

This is how I think autovacuum should change with an eye towards being
able to run multiple vacuums simultaneously:


[snip details]


Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, so that scheduling will continue
to be as it is today.  Thus the scheduling discussions are being
deferred until they can be actually useful and implementable.


I can't really speak to the PostgreSQL signaling innards, but this sound 
logical to me.  I think having the worker processes be children of the 
postmaster and having them be single-minded (or single-tasked) also 
makes a lot of sense.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] autovacuum next steps

2007-02-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

After staring at my previous notes for autovac scheduling, it has become
clear that this basics of it is not really going to work as specified.
So here is a more realistic plan:


[Snip Detailed Description]


How does this sound?


On first blush, I'm not sure I like this as it doesn't directly attack 
the table starvation problem, and I think it could be a net loss of speed.


VACUUM is I/O bound, as such, just sending multiple vacuum commands at a 
DB isn't going to make things faster, you are now going to have multiple 
processes reading from multiple tables at the same time.  I think in 
general this is a bad thing (unless we someday account for I/O made 
available from multiple tablespaces).  In general the only time it's a 
good idea to have multiple vacuums running at the same time is when a 
big table is starving a small hot table and causing bloat.


I think we can extend the current autovacuum stats to add one more 
column that specifies "is hot" or something to that effect.  Then when 
the AV launcher sends a worker to a DB, it will first look for tables 
marked as hot and work on them.  While working on hot tables, the 
launcher need not send any additional workers to this database, if the 
launcher notices that a worker is working on regular tables, it can send 
another worker which will look for hot tables to working, if the worker 
doesn't find any hot tables that need work, then it exits leaving the 
original working to continue plodding along.


Thoughts?



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] autovacuum next steps

2007-02-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
On first blush, I'm not sure I like this as it doesn't directly attack 
the table starvation problem, and I think it could be a net loss of speed.


VACUUM is I/O bound, as such, just sending multiple vacuum commands at a 
DB isn't going to make things faster, you are now going to have multiple 
processes reading from multiple tables at the same time.  I think in 
general this is a bad thing (unless we someday account for I/O made 
available from multiple tablespaces).


Yeah, I understand that.  However, I think that can be remedied by using
a reasonable autovacuum_vacuum_cost_delay setting, so that each worker
uses less than the total I/O available.  The main point of the proposal
is to allow multiple workers on a DB while also allowing multiple
databases to be processed in parallel.


So you are telling people to choose an autovacuum_delay so high that 
they need to run multiple autovacuums at once to keep up?  I'm probably 
being to dramatic, but it seems inconsistent.


I think we can extend the current autovacuum stats to add one more 
column that specifies "is hot" or something to that effect.  Then when 
the AV launcher sends a worker to a DB, it will first look for tables 
marked as hot and work on them.  While working on hot tables, the 
launcher need not send any additional workers to this database, if the 
launcher notices that a worker is working on regular tables, it can send 
another worker which will look for hot tables to working, if the worker 
doesn't find any hot tables that need work, then it exits leaving the 
original working to continue plodding along.


How would you define what's a "hot" table?


I wasn't clear, I would have the Admin specified it, and we can store it 
as an additional column in the pg_autovacuum_settings table.  Or perhaps 
if the table is below some size threshold and autovacuum seems that it 
needs to be vacuumed every time it checks it 10 times in a row or 
something like that.	


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] autovacuum next steps, take 2

2007-02-21 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Ok, scratch that :-)  Another round of braindumping below.


I still think this is solution in search of a problem.  The main problem 
we have right now is that hot tables can be starved from vacuum.  Most 
of this proposal doesn't touch that.  I would like to see that problem 
solved first, then we can talk about adding multiple workers per 
database or per tablespace etc...



(This idea can be complemented by having another GUC var,
autovacuum_hot_workers, which allows the DBA to have more than one
worker on hot tables (just for the case where there are too many hot
tables).  This may be overkill.)


I think this is more along the lines of what we need first.


Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc.  He proposes that instead of having
cost_delay etc, we have a mbytes_per_second parameter of some sort.
This strikes me a good idea, but I think we could make that after this
proposal is implemented.  So this "take 2" could be implemented, and
then we could switch the cost_delay stuff to using a MB/s kind of
measurement somehow (he says waving his hands wildly).


Agree this is probably a good idea in the long run, but I agree this is 
lower on the priority list and should come next.



Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace.  I say we're not :-)  If we
consider Ron Mayer's idea of measuring MB/s, but we do it per
tablespace, then we would inflict the correct amount of vacuum pain to
each tablespace, sleeping as appropriate.  I think this would require
workers of different databases to communicate what tablespaces they are
using, so that all of them can utilize the correct amount of bandwidth.


I agree that in the long run it might be better to have multiple workers 
with MB/s throttle and tablespace aware, but we don't have any of that 
infrastructure right now.  I think the piece of low-hanging fruit that 
your launcher concept can solve is the hot table starvation.


My Proposal:  If we require admins to identify hot tables tables, then: 
1) Launcher fires-off a worker1 into database X.

2) worker1 deals with "hot" tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime. 
4) worker2 (or 3 or 4 etc...) sees it is alone in DB X, if so it acts as 
worker1 did above.  If worker1 is still working in DB X then worker2 
looks for hot tables that are being starved because worker1 got busy. 
If worker2 finds no hot tables that need work, then worker2 exits.


This seems a very simple solution (given your launcher work) that can 
solve the starvation problem.



Thoughts?

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] autovacuum next steps, take 2

2007-02-22 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
  
My Proposal:  If we require admins to identify hot tables tables, then: 
1) Launcher fires-off a worker1 into database X.

2) worker1 deals with "hot" tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime. 
4) worker2 (or 3 or 4 etc...) sees it is alone in DB X, if so it acts as 
worker1 did above.  If worker1 is still working in DB X then worker2 
looks for hot tables that are being starved because worker1 got busy. 
If worker2 finds no hot tables that need work, then worker2 exits.



Rather than required people to manually identify hot tables, what if we
just prioritize based on table size? So if a second autovac process hits
a specific database, it would find the smallest table in need of
vacuuming that it should be able to complete before the next naptime and
vacuum that. It could even continue picking tables until it can't find
one that it could finish within the naptime. Granted, it would have to
make some assumptions about how many pages it would dirty.

ISTM that's a lot easier than forcing admins to mark specific tables.


So the heuristic would be:
* Launcher fires off workers into a database at a given interval 
(perhaps configurable?)
* Each worker works on tables in size order. 
* If a worker ever catches up to an older worker, then the younger 
worker exits.


This sounds simple and workable to me, perhaps we can later modify this 
to include some max_workers variable so that a worker would only exit if 
it catches an older worker and there are max_workers currently active.



Thoughts?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] autovacuum next steps, take 2

2007-02-22 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Thu, Feb 22, 2007 at 09:32:57AM -0500, Matthew T. O'Connor wrote:
  

So the heuristic would be:
* Launcher fires off workers into a database at a given interval 
(perhaps configurable?)
* Each worker works on tables in size order. 
* If a worker ever catches up to an older worker, then the younger 
worker exits.


This sounds simple and workable to me, perhaps we can later modify this 
to include some max_workers variable so that a worker would only exit if 
it catches an older worker and there are max_workers currently active.



That would likely result in a number of workers running in one database,
unless you limited how many workers per database. And if you did that,
you wouldn't be addressing the frequently update table problem.

A second vacuum in a database *must* exit after a fairly short time so
that we can go back in and vacuum the important tables again (well or
the 2nd vacuum has to periodically re-evaluate what tables need to be
vacuumed).
  


I'm not sure this is a great idea, but I don't see how this would result 
in large numbers of workers working in one database.   If workers work 
on tables in size order, and exit as soon as they catch up to an older 
worker, I don't see the problem.  Newer works are going to catch-up to 
older workers pretty quickly since small tables will vacuum fairly quickly.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Jim C. Nasby wrote:


That's why I'm thinking it would be best to keep the maximum size of
stuff for the second worker small. It probably also makes sense to tie
it to time and not size, since the key factor is that you want it to hit
the high-update tables every X number of seconds.

If we wanted to get fancy, we could factor in how far over the vacuum
threshold a table is, so even if the table is on the larger size, if
it's way over the threshold the second vacuum will hit it.


Ok, I think we may be actually getting somewhere.


Me too.


I propose to have two different algorithms for choosing the tables to
work on.  The worker would behave differently, depending on whether
there is one or more workers on the database already or not.

The first algorithm is the plain threshold equation stuff we use today.
If a worker connects and determines that no other worker is in the
database, it uses the "plain worker" mode.  A worker in this mode would
examine pgstats, determine what tables to vacuum/analyze, sort them by
size (smaller to larger), and goes about its work.  This kind of worker
can take a long time to vacuum the whole database -- we don't impose any
time limit or table size limit to what it can do.


Right, I like this.


The second mode is the "hot table worker" mode, enabled when the worker
detects that there's already a worker in the database.  In this mode,
the worker is limited to those tables that can be vacuumed in less than
autovacuum_naptime, so large tables are not considered.  Because of
this, it'll generally not compete with the first mode above -- the
tables in plain worker were sorted by size, so the small tables were
among the first vacuumed by the plain worker.  The estimated time to
vacuum may be calculated according to autovacuum_vacuum_delay settings,
assuming that all pages constitute cache misses.


How can you determine what tables can be vacuumed within 
autovacuum_naptime?  I agree that large tables should be excluded, but I 
don't know how we can do that calculation based on autovacuum_naptime.


So at:
t=0*autovacuume_naptime: worker1 gets started on DBX
t=1*autovacuume_naptime: worker2 gets started on DBX
worker2 determines all tables that need to be vacuumed,
worker2 excludes tables that are too big from it's to-do list,
worker2 gets started working,
worker2 exits when it either:
a) Finishes it's entire to-do-list.
b) Catches up to worker1

I think the questions are 1) What is the exact math you are planning on 
using to determine which tables are too big?  2) Do we want worker2 to 
exit when it catches worker1 or does the fact that we have excluded 
tables that re "too big" mean that we don't have to worry about this?



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


  1   2   3   4   >