Re: [HACKERS] Parsing config files in a directory

2009-10-26 Thread Greg Smith

On Mon, 26 Oct 2009, Alvaro Herrera wrote:

some things are defined in postgresql.conf by initdb and you probably 
want to be able to change them by SET PERSISTENT anyway (e.g. 
lc_messages, listen_addresses, shared_buffers)


An obvious next step once the directory parsing is committed is to change 
initdb to put all of its changes into a separate file.  Ideally, 8.5 would 
ship with a postgresql.conf having zero active settings, and the conf/ 
directory would have two entries:


initdb.conf : shared_buffers, lc_messages, listen_addresses, etc.
persistent.conf : Blank except for comment text

People who want to continue managing just the giant postgresql.conf are 
free to collapse the initdb.conf back into the larger file instead.  If we 
wanted to make that transition easier, an option to initdb saying "do
things the old way" might make sense.  I think the best we can do here is
make a path where new users who don't ask for anything special get a setup 
that's easy for tools to work on, while not completely deprecating the old 
approach for those who want it--but you have to ask for it.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parsing config files in a directory

2009-10-26 Thread Greg Smith

On Mon, 26 Oct 2009, Alvaro Herrera wrote:


But to me this also says that SET PERSISTENT has to go over
00initdb.conf and add a comment mark to the setting.


Now you're back to being screwed if the server won't start because of your 
change, because you've lost the original working setting.


I think the whole idea of making tools find duplicates and comment them 
out as part of making their changes is fundamentally broken, and it's just 
going to get worse when switching to use more config files.  The fact that 
user edits can introduce the same problem, where something is set in more 
than one file but only one of them works, means that you can run into this 
even if tool editing hygiene is perfect.


A whole new approach is needed if you're going to get rid of this problem 
both for tools and for manual edits.  What I've been thinking of is making 
it possible to run a configuration file check that scans the config 
structure exactly the same way as the server, but when it finds a 
duplicate setting it produces a warning showing where the one being 
ignored is.  The patch added near the end of 8.4 development that
remembers the source file and line number of lines already parsed made 
that more straightforward I think.  Not having that data is what made this 
hard to write when I last considered it a while ago.


If you had that utility, it's a simple jump to then make it run in a 
--fix mode that just comments out every such ignored duplicate.  Now 
you've got a solution to this problem that handles any sort of way users 
can mess with the configuration.  One might even make a case that this 
tool should get run just after every time the server starts successfully.
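A minimal Python sketch of that check and its --fix mode follows. It assumes the config files have already been collected in the order the server parses them. It also uses a simplified `name [=] value` regex instead of the server's real lexer. `scan`, `warnings`, and `fix` are hypothetical names, not an existing tool. Commenting out rather than deleting keeps the overridden value in the file, so a working setting isn't lost if the surviving one turns out to be bad.

```python
import re
from dataclasses import dataclass

# Simplified "name [=] value [# comment]" matcher. This is a hypothetical
# stand-in for the server's real lexer (guc-file.l); it ignores quoting and
# treats include directives like ordinary settings.
SETTING_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*?)\s*(#.*)?$")


@dataclass
class Setting:
    name: str
    value: str
    file: str
    line: int  # 1-based, like pg_settings.sourceline


def scan(files):
    """Scan (filename, lines) pairs in the order the server would parse them.

    Returns (active, ignored): active maps each name to the assignment that
    wins (the last one seen); ignored lists every earlier assignment that a
    later one overrides.
    """
    active, ignored = {}, []
    for fname, lines in files:
        for lineno, text in enumerate(lines, start=1):
            m = SETTING_RE.match(text)
            if not m:
                continue  # blank line or comment
            s = Setting(m.group(1).lower(), m.group(2), fname, lineno)
            if s.name in active:
                ignored.append(active[s.name])
            active[s.name] = s
    return active, ignored


def warnings(active, ignored):
    """One warning per ignored duplicate, pointing at the line that wins."""
    out = []
    for s in ignored:
        w = active[s.name]
        out.append(f"{s.file}:{s.line}: {s.name} is ignored; "
                   f"overridden by {w.file}:{w.line}")
    return out


def fix(files, ignored):
    """The --fix mode: comment out every ignored duplicate, keeping its text."""
    doomed = {(s.file, s.line) for s in ignored}
    return {
        fname: ["#" + t if (fname, i) in doomed else t
                for i, t in enumerate(lines, start=1)]
        for fname, lines in files
    }
```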


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-26 Thread Greg Smith

On Mon, 26 Oct 2009, Tom Lane wrote:


BTW, why do we actually need an includedir mechanism for this?
A simple include of a persistent.conf file seems like it would be
enough.


Sure, you could do it that way.  This patch is more about elegance than
strict necessity.  The general consensus here seemed to be
that if you're going to start shipping the database with more than one
config file, rather than just hacking those in one at a time it would be
preferable to grab a directory of them.  That seems to be how similar
programs handle things once the number of shipped config files goes from
one to more than one.


One thing this discussion has made me reconsider is whether one of those 
files needs to be enforced as always the last one to be parsed, similar to 
how postgresql.conf is always the first one.  I am slightly concerned that 
a future SET PERSISTENT mechanism might update a setting that's later
overridden by a file that just happens to be found later than the mythical
persistent.conf.  I'd rather worry about that in the future than burden
the current design with that detail, though.  Alvaro already introduced
the init-script way of handling this by suggesting the configuration file
name 00initdb; using that and 99persistent would seem to be a reasonable
solution that's quite familiar to much of the target audience here.  Note
that I don't think that standard requires anything beyond what the 
proposed patch already does, processing files in alphabetical order.
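To make the ordering concrete, here is a small Python sketch. It assumes "alphabetical" means a plain byte-wise comparison like strcmp(). Whether the patch sorts that way or uses locale rules is an assumption here; the thread doesn't say. The example shows why the two-digit prefixes matter: an unpadded "9extra.conf" sorts after "99persistent.conf".

```python
def parse_order(names):
    """Return the *.conf names in the order a plain strcmp()-style sort
    would process them. Python compares str by code point, which matches
    byte-wise comparison for ASCII names."""
    return sorted(n for n in names if n.endswith(".conf"))


# "99persistent.conf" and "9extra.conf" first differ at their second
# character, and '9' < 'e', so the unpadded name lands last.
order = parse_order(["99persistent.conf", "00initdb.conf", "9extra.conf",
                     "50local.conf", "README"])
print(order)  # ['00initdb.conf', '50local.conf', '99persistent.conf', '9extra.conf']
```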


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-26 Thread Greg Smith

On Mon, 26 Oct 2009, Tom Lane wrote:

When and if there is some evidence of people actually getting confused, 
we could consider trying to auto-comment-out duplicate settings.  But 
I've never heard of any other tool doing that, and fail to see why we 
should think Postgres needs to.


It's what people tend to do when editing the postgresql.conf file(s) by 
hand, which is why I think there's some expectation that tools will 
continue that behavior.  What everyone should understand is that we don't 
have more tools exactly because their design always gets burdened with 
details like that.  This is easy to handle by hand, but hard to get a 
program to do in a way that satisfies what everyone is looking for. 
Raising the bar like that for tool-assisted changes (and I'm including SET
PERSISTENT in that category) is one reason so few such tools have been
written.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-26 Thread Greg Smith

On Mon, 26 Oct 2009, Kevin Grittner wrote:


We do find the include capabilities useful.  For example, for our 72
production servers for county Circuit Court systems, we copy an
identical postgresql.conf file to each county, with the last line
being an include to an overrides conf file in /etc/.  For most
counties that file is empty.  For counties where we've installed extra
RAM or where data is not fully cached, we override settings like
effective_cache_size or the page costs.  I can't see where any of the
options under discussion would do much to help an environment like
ours -- they seem more likely to help shops with fewer servers or more
relaxed deployment procedures.


That's exactly the kind of use case the "parsing config files in a directory"
feature aims to make easier to manage.  You can just mix and match files
that adjust a subset of the postgresql.conf without having to explicitly 
include them.  For this sort of situation, you could create a base set of 
configuration changes, then a set that customizes for less common server 
configurations, and possibly even server-specific ones.  Copy in the 
subset from that master list of possible configuration sets that apply to 
this server and you're done.


Since variations on this feedback keep coming up, let's be clear here:
there is nothing this patch aims to add that you can't already do with include
files.  It's just a way to make more aggressive use of include files 
easier to manage, and therefore make doing so in the default configuration 
less objectionable.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-26 Thread Greg Smith

On Mon, 26 Oct 2009, Greg Stark wrote:

When scanning postgresql.conf.d we should follow the Apache/Debian 
standard of scanning only files which match a single simple hard-coded 
template. I think the convention is basically the regexp 
^[0-9a-zA-Z-]*\.conf$. It's important that it exclude typical backup file
conventions like foo~ or foo.bak and lock file conventions like .#foo. 
There's no need for this to be configurable and I think that would be 
actively harmful.


If the default glob pattern is *.conf, won't all those already be screened 
out?  I can see your point that letting it be adjustable will inevitably
result in some fool one day writing a bad matching pattern that does grab 
backup/lock files.  But is that concern so important that we should limit 
what people who know what they're doing are allowed to do?
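The difference between a *.conf glob and the stricter template is easy to check directly. This Python sketch uses fnmatch for the glob. That is an assumption: the actual patch may use the C library's glob() or its own matcher. It also uses Greg Stark's regexp with the dot escaped.

```python
import fnmatch
import re

# Greg Stark's suggested template, with the "." before "conf" escaped so it
# only matches a literal dot.
STRICT_RE = re.compile(r"^[0-9a-zA-Z-]*\.conf$")


def glob_ok(name):
    # fnmatchcase applies "*.conf" without giving a leading "." any special
    # meaning (POSIX glob() would skip dot files unless the pattern starts
    # with "."), and without platform-dependent case folding.
    return fnmatch.fnmatchcase(name, "*.conf")


def strict_ok(name):
    return STRICT_RE.match(name) is not None


for name in ["00initdb.conf", "foo.conf~", "foo.conf.bak", ".#foo.conf",
             "my_tuning.conf"]:
    print(name, glob_ok(name), strict_ok(name))
```

Both matchers accept 00initdb.conf and reject foo.conf~ and foo.conf.bak. They disagree in two cases. The first is .#foo.conf, a lock-file style name: fnmatch accepts it, but POSIX glob() would skip it because it starts with a dot. The second is my_tuning.conf, which the strict template rejects because of the underscore. So whether *.conf screens out lock files depends partly on which matcher the implementation uses.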


That also seems to be the theme of the rest of your comments about how to 
reorganize the postgresql.conf file.  Your comments about what should and 
shouldn't be configurable presume it's OK for your priorities and
preferences to be enforced as policy on everyone.  Whether or not I agree
with you, I object to the idea of dictating in this area because it just 
encourages argument.  The goal here is to add flexibility and ways people 
can choose to work with the configuration, not to replace what's being 
done now outright with an approach everyone must adopt.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-26 Thread Greg Smith
It sounds like there's a consensus brewing here on what should get done 
with this particular patch now.  Let me try to summarize:


-The new feature should be activated by allowing you to specify a 
directory to include in the postgresql.conf like this:


  includedir 'conf'

With the same basic semantics for how that directory name is interpreted 
as the existing include directive.  Tom has some concerns on how this 
will be implemented, with glob portability to Windows and error cleanup 
as two of the issues to consider.


-Within that directory, only file names of the form *.conf will be 
processed.  More flexibility is hard to implement and of questionable 
value.


-The order they are processed in will be alphabetical.  This allows (but 
doesn't explicitly require) using the common convention of names like
99name to get a really obvious ordering.


-The default postgresql.conf should be updated to end with the sample 
includedir statement shown above.  This will make anything that goes into 
there be processed after the main file, and therefore override anything in 
it.


-An intended purpose here is making tools easier to construct.  It's 
impractical to expect every tool that touches files in the config 
directory to do an exhaustive sweep to find every other place there might 
be a conflict and comment them all out.  The fact that pg_settings shows 
users the exact file and line of the setting that is the active one is a
good enough tool to allow DBAs to work through most of the problem cases.
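The semantics summarized above can be sketched in a few lines of Python. Everything here is illustrative. `load_config`, the regexes, and the injectable `read_lines`/`listdir` hooks are hypothetical. The line parsing is far simpler than guc-file.l. The relative-path rule mirrors what the existing include directive does. Each value is recorded with its source file and line, the same information pg_settings exposes in its sourcefile and sourceline columns.

```python
import os
import re

# Hypothetical, simplified line matchers; not the server's guc-file.l grammar.
INCLUDEDIR_RE = re.compile(r"^\s*includedir\s*=?\s*'([^']*)'\s*(#.*)?$")
SETTING_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*?)\s*(#.*)?$")


def read_file_lines(path):
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def resolve_includedir(referencing_file, dirname):
    # Same rule as the existing include directive: a relative path is taken
    # relative to the directory containing the file with the directive.
    if os.path.isabs(dirname):
        return dirname
    return os.path.join(os.path.dirname(referencing_file), dirname)


def load_config(path, read_lines=read_file_lines, listdir=os.listdir,
                settings=None):
    """Return {name: (value, sourcefile, sourceline)}; later lines win.

    read_lines and listdir are injectable so the logic can be exercised
    without touching the filesystem.
    """
    if settings is None:
        settings = {}
    for lineno, text in enumerate(read_lines(path), start=1):
        m = INCLUDEDIR_RE.match(text)
        if m:
            d = resolve_includedir(path, m.group(1))
            # Only *.conf files, parsed right here in sorted order, so the
            # directive's position decides what overrides what.
            for name in sorted(n for n in listdir(d) if n.endswith(".conf")):
                load_config(os.path.join(d, name), read_lines, listdir,
                            settings)
            continue
        m = SETTING_RE.match(text)
        if m:
            settings[m.group(1).lower()] = (m.group(2), path, lineno)
    return settings
```

If postgresql.conf ends with includedir 'conf', a shared_buffers line in conf/00initdb.conf overrides the one in the main file. Moving the includedir line to the top of postgresql.conf would reverse that, which is why the thread wants the line to be explicit.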


And as far as how it impacts planning:

-A future patch to initdb could move the changes it makes from the primary 
file to one in the config directory.  It might make sense to use a name 
like 00initdb.conf to encourage a known good naming practice for files in 
the config directory; that doesn't need to get nailed down now though.


-This patch makes it easier to envision implementing a smaller default 
postgresql.conf, but it doesn't require such a change to be useful.


-SET PERSISTENT is still a bit away.  This patch assists in providing a 
cleaner preferred way to implement that, and certainly doesn't make it 
harder to build.  The issue of how to handle backing out changes that 
result in a non-functional server configuration is still there.  And 
there's some support for the idea that the SQL interface should do more 
sanity checks to make sure its setting changes aren't being overridden by 
config files parsed later than we might expect from external tuning tools.


Magnus, was there anything else you wanted feedback on here?

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-27 Thread Greg Smith

On Tue, 27 Oct 2009, Dimitri Fontaine wrote:


I parse the current status as always reading files in the
postgresql.conf.d directory located in the same place as the current
postgresql.conf file.


Way upthread I pointed out that what some packagers have really wanted for 
a while now is to put the local postgresql.conf changes into /etc rather 
than have them live where the database does.  Allowing the directory to be 
customized makes that possible.  The idea is to improve flexibility and
options for DBAs and packagers as long as it's not difficult to implement 
the idea, and allowing for a relocatable config directory isn't that hard.



Tom had reservations about allowing the user to control the overloading
behavior, but it appears that what we're trying to provide is a way for
tools not to fight against the DBA but to help him/her. So Greg Stark's idea
does sound better: .d/ files are read first in alphabetical order,
then postgresql.conf is read. If the DBA wants to manually edit the
configuration and be sure his edit will take effect, he just edits
postgresql.conf. No wondering.


We're trying to make allowances and a smooth upgrade path for old-school 
users who don't want to use this approach.  At the same time, let's be 
clear:  people who do that are going to find themselves increasingly 
cut off from recommended practice moving forward.  I want to make it
possible for them to continue operating as they have been, while making it 
obvious that approach is on its way out.


If you want a future where it's easier for tools to operate, the config 
directory goes last and overrides anything put in the primary 
postgresql.conf in the default config.  Having it inserted as an explicit 
includedir line lets the DBA move it to the front themselves if they want 
to.  One thing we cannot do is make the includedir line implicit.  It must 
be the case that someone who opens a new postgresql.conf file and browses 
it sees exactly what's being done, so they can disable it or move the 
order it happens in around.



The regexp is still to be agreed upon: [0-9a-zA-Z-_.]+\.conf or something like that.


This is being left to the author of the code to decide.  There's reason to 
believe that *.conf is going to be hard enough to implement, and that's 
acceptable.  If it turns out that it's easier than expected to make a full 
regex syntax possible here, maybe this should get revisited on next 
review.



Then the pg_settings view could also embed the comments.


That whole bit you outlined is an interesting idea, but it doesn't impact 
this patch so I'd rather not see it drag discussion out further right now.



00-initdb.conf if you want some bikeshedding to happen


That's a future patch anyway, we can bikeshed more after it's been 
submitted.  One file per GUC is certainly never going to fly though, it's 
been hard enough getting people to accept going from one file to more than 
one.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-27 Thread Greg Smith

On Tue, 27 Oct 2009, Kevin Grittner wrote:

I have 200 clusters.  I understand the proposal.  I see no benefit to 
me.


-Kevin, the troglodyte  ;-)


It looks like we'll have to settle this the only way your kind understands 
then:  a battle to the death using clubs.  See you at the next conference!


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-27 Thread Greg Smith

On Tue, 27 Oct 2009, Greg Stark wrote:


If they all had to edit the same file then they have to deal with
writing out values and also reading them back. Everyone would need a
config file parser and have to make deductions about what other tools
were trying to do and how to interact with them.


Exactly, that's the situation we're trying to escape from now in a 
nutshell.


To answer Tom's question about providing better guidelines for tool 
authors, I was hoping to provide the first such tool and submit a patch 
for refactoring initdb using the same approach before 8.5 is done.  I'd 
rather see that nailed down with a concrete proof of concept attached that 
implements a candidate approach by example rather than to just talk about 
it in general.  I don't think that needs to hold up work on this patch 
though, particularly given that I'm dependent on this one being committed 
for my plan to work.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-27 Thread Greg Smith

On Tue, 27 Oct 2009, Robert Haas wrote:

I guess I didn't consider the possibility that someone might reuse an 
8.4 postgresql.conf on an 8.5 server.  That could be awkward.


Happens all the time, and it ends up causing problems like people still 
having settings for GUCs that don't even exist anymore.  You know how we
could make this problem less likely to bite people?  By putting everything 
the user wants to customize that isn't done by initdb into another file. 
Then they can just move that file into the new version.  That's the 
direction we're trying to move here, except much slower than you're 
suggesting because we've already thought about some of these gotchas.
Obviously you could do the same thing by completely gutting the whole 
postgresql.conf, but I was hoping for a step in the right direction that 
doesn't require something that drastic yet.


The length of this thread has already proven why it's not worth even 
trying to completely trim the file down.  Had you never brought that up 
this discussion would be done already.  If you have a strong feeling about 
this, write a patch and submit it; I'm not going to talk about this 
anymore.


I was thinking that the algorithm would be something like: Read the old 
postgresql.conf and write it back out to a new file line by line


This sounds familiar...oh, that's right, this is almost the same algorithm 
pgtune uses.  And it sucks, and it's a pain to convert the tool into C
because of it, and the fact that you have to write this sort of boring 
code before you can do a single line of productive work is one reason why 
we don't have more tools available; way too much painful grunt work to 
write.
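For anyone who hasn't written one, this is roughly the shape of that boring code. It's a hypothetical Python sketch, not pgtune's actual implementation. It re-emits the file line by line and comments out every active assignment of a changed parameter. It writes the new value after the last old assignment and appends parameters that weren't set at all. It still relies on a simplified regex rather than the real grammar.

```python
import re

# Simplified matcher for an active "name [=] value [# comment]" line; a
# hypothetical stand-in for the real postgresql.conf grammar.
ACTIVE_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*?)\s*(#.*)?$")


def rewrite_settings(lines, changes):
    """Return new file lines with each parameter in `changes` set to its value.

    Every old active assignment of a changed parameter is kept as a comment,
    the new value goes right after the last old one, and parameters that were
    never set are appended at the end.
    """
    changes = {k.lower(): v for k, v in changes.items()}

    def name_of(text):
        m = ACTIVE_RE.match(text)
        return m.group(1).lower() if m else None

    # First pass: find the last active assignment of each changed parameter.
    last = {}
    for i, text in enumerate(lines):
        name = name_of(text)
        if name in changes:
            last[name] = i

    # Second pass: copy the file line by line, editing as we go.
    out = []
    for i, text in enumerate(lines):
        name = name_of(text)
        if name in changes:
            out.append("#" + text)
            if last[name] == i:
                out.append(f"{name} = {changes[name]}")
        else:
            out.append(text)
    for name, value in changes.items():
        if name not in last:
            out.append(f"{name} = {value}")
    return out
```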



True, but actually having a good SET PERSISTENT command would solve
most of this problem, because the tools could just use that.


The system running the tool and the one where the changes are being made 
are not the same.  The database isn't necessarily even up when the tool is 
being run yet.  The main overlap here is that one of the output formats 
available to future tools could be a series of SET PERSISTENT commands one 
could then run elsewhere, which is already on my pgtune roadmap when it's 
possible to implement.


You're doing a good job of reminding me why I didn't have a good vision of 
where this all needed to go until after I wrote a working tuning tool, to 
get a feel for the painful parts.  I wish I could share all of the 
postgresql.conf files I've seen so you could better appreciate how people 
torture the poor file in the field.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Alvaro Herrera wrote:


Huh, isn't this code in initdb.c already?


The sketched out design I have for a contrib/pgtune in C presumes that I'd 
start by refactoring the relevant bits from initdb into a library for both 
programs to use.  But the initdb code doesn't care about preserving 
existing values when making changes to them; it just throws in its new 
settings and moves along.  So the existing code only handles about half of
the annoying parts most people would expect from a tuning tool that reads
the existing file and operates on it.


Also, I wouldn't be surprised to find that it chokes on some real-world 
postgresql.conf files.  The postgresql.conf.sample it's being fed is 
fairly pristine.  A tuning tool that intends to read any postgresql.conf 
it's fed can't always assume it's in exactly standard form.  I've recently 
started collecting complicated postgresql.conf lines that crashed my 
Python code as people submit bug reports with those.  You might be 
surprised at all of the places people put whitespace at.
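To show how much tolerance is needed, here is a hand-written Python line parser for the common cases. It handles an optional "=", arbitrary spaces and tabs, single-quoted values with '' for an embedded quote, and "#" comments that don't count inside quotes. It's a simplified sketch built around a hypothetical `parse_line` function. It does not handle backslash escapes or validate values, and it is not a substitute for guc-file.l.

```python
def parse_line(text):
    """Parse one postgresql.conf line into (name, value), or None for blank
    and comment-only lines. Simplified; not equivalent to guc-file.l."""
    i, n = 0, len(text)

    def skip_ws():
        nonlocal i
        while i < n and text[i] in " \t\r\f\v":
            i += 1

    skip_ws()
    if i == n or text[i] == "#":
        return None
    start = i
    while i < n and (text[i].isalnum() or text[i] in "_."):
        i += 1
    name = text[start:i]
    if not name:
        raise ValueError(f"expected a parameter name: {text!r}")
    skip_ws()
    if i < n and text[i] == "=":  # the "=" is optional
        i += 1
        skip_ws()
    if i < n and text[i] == "'":
        # Quoted value: '' is a literal quote; "#" inside quotes is not a
        # comment. Backslash escapes are deliberately not handled here.
        i += 1
        chars = []
        while True:
            if i >= n:
                raise ValueError(f"unterminated quoted string: {text!r}")
            if text[i] == "'":
                if i + 1 < n and text[i + 1] == "'":
                    chars.append("'")
                    i += 2
                    continue
                i += 1
                break
            chars.append(text[i])
            i += 1
        value = "".join(chars)
    else:
        start = i
        while i < n and text[i] not in " \t#":
            i += 1
        value = text[start:i]
    skip_ws()
    if i < n and text[i] != "#":
        raise ValueError(f"unexpected text after value: {text!r}")
    return name.lower(), value
```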


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Tom Lane wrote:

Why in the world are you looking at initdb?  The standard reference for 
postgresql.conf-reading code, by definition, is guc-file.l.  I think the 
odds of building something that works right, without borrowing that same 
flex logic, are about nil.


initdb was the only sample around that actually makes changes to the 
postgresql.conf.  It's also a nice simple standalone program that's easy 
to borrow pieces from, which guc-file.l is not.  That's the reason it 
looks tempting at first.


If, as you say, the only right way to do this is to use the flex logic, that
just reinforces how high the bar is for someone who wants to write a tool
that modifies the file.  Periodically we get people who show up saying
"hey, I'd like to write a little [web|cli|gui] tool to help people update
their postgresql.conf file", and when the answer they get includes "first
you need to implement this grammar...", that scares off almost all of
them.  It didn't scare me off because I used to write compilers for fun
before flex existed.  But even I just skimmed it and pragmatically wrote a
simpler postgresql.conf parser implementation that worked well enough to
get a working prototype out the door, rather than properly implementing
the whole grammar.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Greg Stark wrote:


It's also a blatant violation of packaging rules for Debian if not
every distribution. If you edit the user's configuration file then
there's no way to install a modified default configuration file. You
can't tell the automatic modifications apart from the user's
modifications. So the user will get a prompt asking if he wants the
new config file or to keep his modifications which he never remembered
making.


The postgresql.conf file being modified is generated by initdb, and it's 
already being customized per install by the initdb-time rules like 
detection for maximum supported shared_buffers. It isn't one of the files 
installed by the package manager where the logic you're describing kicks 
in.  The conflict case would show up, to use a RHEL example, if I edited a 
/etc/sysconfig/postgresql file and then a changed version of that file 
appeared upstream.  Stuff in PGDATA is all yours and not tracked as a 
config file.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Josh Berkus wrote:


It's the basic and unsolvable issue of how do you have a file which is
both perfectly human-readable-and-editable *and* perfectly
machine-readable-and-editable at the same time.


Let's see...if I remember correctly from the last two rounds of this 
discussion, this is the point where someone pops up and says that 
switching to XML for the postgresql.conf will solve this problem. 
Whoever does that this time goes into the ring with Kevin and me, but they
don't get a club.  (All fight proceeds to benefit SPI of course).


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Robert Haas wrote:


It would be completely logical to break up the configuration file into
subfiles by TOPIC.  That would complicate things for tool-writers
because they would need to get each setting into the proper file, and
we currently don't have any infrastructure for that.


Already done:

# select name,category from pg_settings limit 1;
       name       |                             category
------------------+-------------------------------------------------------------------
 add_missing_from | Version and Platform Compatibility / Previous PostgreSQL Versions


You could make one per category, and pgtune for example already knows all 
this info.  The somewhat arbitrary category assignments Josh put things 
into are what Peter was complaining about upthread.  Questions like "is
'effective_cache_size' a memory parameter or an optimizer one?" show why
this is not trivial to do well.
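The mechanical part of a one-file-per-category split is small, which is the point: the hard part is agreeing on the categories. Here is a Python sketch. It assumes rows shaped like the output of SELECT name, category, setting FROM pg_settings. The file naming scheme (top-level category only, lowercased with dashes) and the helper names are hypothetical.

```python
import re
from collections import defaultdict


def category_file(category):
    """'Resource Usage / Memory' -> 'resource-usage.conf' (top level only)."""
    top = category.split("/")[0].strip().lower()
    return re.sub(r"[^a-z0-9]+", "-", top).strip("-") + ".conf"


def split_by_category(rows):
    """rows: (name, category, setting) tuples. Returns {file: [lines]}."""
    files = defaultdict(list)
    for name, category, setting in rows:
        quoted = setting.replace("'", "''")  # '' is an escaped quote
        files[category_file(category)].append(f"{name} = '{quoted}'")
    return dict(files)


rows = [
    ("add_missing_from",
     "Version and Platform Compatibility / Previous PostgreSQL Versions", "off"),
    ("work_mem", "Resource Usage / Memory", "1MB"),
]
for fname, lines in split_by_category(rows).items():
    print(fname, lines)
```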


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Patch set under development to add usage reporting.

2009-10-30 Thread Greg Smith

On Fri, 30 Oct 2009, John Murtari wrote:

We now have a basic patch set that works and is basically stable (not 
recommended for production servers!).  We've dedicated a page at our web 
site and it hopefully has answers to most of your questions, and also 
has the patch set for download.  These are for 7.4.19 - the version 
included with RHEL 4.


This is kind of interesting, but targeting 7.4.19 isn't going to get you 
very far toward code anyone else will use.  That release is 6 years old, 
it's filled with unsolvable limitations, it's basically at end of life. 
The fact that it's bundled with RHEL4 and there are some legacy installs 
still floating around are the only reason it's not completely gone from 
everyone's radar.


In short, if you actually care about your data, you should be running a 
newer version of the database regardless of what RHEL ships.  And you 
should be building patches against no earlier than 8.4 if you want 
something that has any hope of being accepted into mainstream development. 
Eventually the patch will need to apply to the 8.5 work in progress source 
code tree before it's even a candidate to merge.  You can probably get 
away with developing against a more stable version like 8.4.1, if you must 
target something people can also deploy, but even that's not ideal and 
will eventually turn into a code merge hurdle.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] next CommitFest

2009-11-10 Thread Greg Smith

On Sun, 8 Nov 2009, Robert Haas wrote:


I would personally prefer not to be involved in the management of the
next CommitFest.  Having done all of the July CommitFest and a good
chunk of the September CommitFest, I am feeling a bit burned out.


I was just poking around on the Wiki, and it looks like the role of the 
CommitFest manager isn't very well documented yet.  Since you've done all 
of them since introducing the new CF software, I'm not sure if anyone else 
even knows exactly what you've been doing.  The transition over to that 
was so successful there isn't even a copy of the schedule for 8.5 on the 
Wiki itself.  Could you find some time this week to rattle off an outline 
of the work involved?  It's hard to decide whether to volunteer to help 
without having a better idea of what's required.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Patch committers

2009-11-11 Thread Greg Smith

Bruce Momjian wrote:

True, but even I avoid patches I don't understand, and practicing by
applying them could lead to a very undesirable outcome, e.g.
instability.
  
The usual type of practice here should come from applying trivial 
patches, or ones that don't impact code quality.  Docs patches come to 
mind as a good way someone could get used to the commit process without 
introducing much potential mayhem along the way.  As far as keeping new 
people away from complicated patches, ultimately you just have to trust 
that anyone who can commit has a reasonable idea of their own 
capabilities.  I seriously doubt you're going to find a new committer 
jumping right in by committing hot standby out of the gate just because 
they could do so.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Patch committers

2009-11-11 Thread Greg Smith

Robert Haas wrote:

I tried to help, but I was fairly tied up with overall CommitFest management and
did not have time for a full read-through of every patch.
  
I think it's completely unreasonable to expect the CF manager to do any 
patch review themselves.  It's a hard enough job to keep going without 
actually getting your hands into the details.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Patch committers

2009-11-11 Thread Greg Smith

Bruce Momjian wrote:

I also think the bad economy is making it harder for people/companies to
devote time to community stuff when paid work is available.
  
I think this explains away more of the recent situation than you're 
giving it credit for.  When everybody's fat and happy and it's easy to 
generate/raise money, it's also easy to throw money toward the 
community.  When times are tight, giving away work that you might charge 
for (or have already charged for) is harder for a company to justify.  
It's easy to plan to have someone do community work when you hire them, 
only to realize down the line that business has dried up enough that 
you're stuck with the choice between them doing that and a job that will 
make or break an upcoming payroll.  And that's where a lot more 
businesses are at right now than at any time in a long while.


After looking for an example of the boom/bust cycle impacting this 
community's work that's old enough to be clearer in hindsight, I would 
suggest noting that Great Bridge was officially announced in May of 2000 
and was gone by the end of 2001.  Overlay those dates on top of 
http://www.google.com/finance?q=INDEXNASDAQ:.IXIC after switching Zoom 
to show 10 years.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] next CommitFest

2009-11-11 Thread Greg Smith

Selena Deckelmann wrote:

On Tue, Nov 10, 2009 at 10:40 PM, Greg Smith gsm...@gregsmith.com wrote:
  

I was just poking around on the Wiki, and it looks like the role of the
CommitFest manager isn't very well documented yet.



It's pretty straightforward. Robert has actually done a great job of
communicating about this to the patch reviewers.
  
That's good to hear.  What I was hinting at was that some of the 
community knowledge here should start getting written down now that the 
process has matured, rather than trying to directly transfer just to one 
other person.  I'm not sure if Robert has shared 100% of what he does 
with the reviewers or not, but in general the easiest way to divest 
yourself of a position is to document how someone else can do it.  I 
don't know that having to poke through list archives or chat with 
someone is necessarily the best way to transfer that knowledge.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [HACKERS] next CommitFest

2009-11-11 Thread Greg Smith

Robert Haas wrote:

Here's an attempt.
http://wiki.postgresql.org/wiki/Running_a_CommitFest
  
Perfect, that's the sort of thing I was looking for the other day but 
couldn't find anywhere.  I just made a pass at wiki-fying that better 
and linking it to the related pages in this area.


Two things look to be true at the moment:

1) The call for reviewers is already running late and needs to start ASAP.
2) Some of the experienced helpers from the previous CFs, like Selena, 
should eventually be able to help, but everybody is busy right when 
the first round of action has to happen here.


Given all that, I'm thinking that unless we get an enthusiastic 
volunteer by tomorrow, I'll kick off the call for reviewers myself and 
follow that through to initial patch assignments.  I don't expect to 
have as much time as Robert put into the last couple of CommitFests 
after that, but this one looks smaller and with more familiar patches 
than those.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Smith

Tom Lane wrote:

Fujii Masao masao.fu...@gmail.com writes:
  

The problem is that fsync needs to be issued too frequently, which would
be harmless in asynchronous replication, but not in synchronous one.
A transaction would have to wait for the primary's and standby's fsync
before returning a success to a client.



Surely that is exactly what is *required* if the user has asked for
synchronous replication.
  
This is a distressingly common thing people get wrong about replication.  
You can either have synchronous replication, which as you say has to be 
slow:  you must wait for an fsync ACK from the secondary and a return 
trip before you can say something is committed on the primary.  Or you 
can get better performance by not waiting for all of those things, but 
the minute you do that it's *not* synchronous replication anymore.  You 
can't get high-performance and true synchronous behavior; you have to 
pick one.  The best you can do if you need both is work on accelerating 
fsync everywhere using the standard battery-backed write cache technique.
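A back-of-the-envelope model of why synchronous commit has to be slow, as a minimal sketch: it assumes the primary ships WAL to the standby while flushing locally, so a synchronous commit waits for whichever finishes last, the local fsync or the network round trip plus the standby's fsync. The function name and any timings fed to it are hypothetical, not measurements.

```python
def commit_latency_ms(local_fsync_ms: float, rtt_ms: float,
                      remote_fsync_ms: float, synchronous: bool) -> float:
    """Hypothetical estimate of how long the primary waits before reporting a commit.

    Assumes WAL is sent to the standby in parallel with the local flush.
    """
    if not synchronous:
        # Asynchronous: only the local WAL flush sits on the commit path.
        return local_fsync_ms
    # Synchronous: also wait for the standby's fsync ACK plus the return trip.
    return max(local_fsync_ms, rtt_ms + remote_fsync_ms)
```

Under this model's assumptions, making fsync cheap on both ends (for example with a battery-backed write cache) shrinks the synchronous case toward the bare network round trip, which no amount of local disk tuning removes.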


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



[HACKERS] CommitFest 2009-11 Call for Reviewers

2009-11-12 Thread Greg Smith
In a few days the 3rd 8.5 development CommitFest, 2009-11, is going to 
kick off, with the end goal being an alpha3 prerelease.  If you have a 
patch in progress, you'll need to submit it before the deadline of 
2009-11-15 00:00:00 GMT for it to be considered during this round:  
http://wiki.postgresql.org/wiki/Submitting_a_Patch


The actual process of the CommitFest itself is fairly well documented at 
this point:


http://wiki.postgresql.org/wiki/Reviewing_a_Patch
http://wiki.postgresql.org/wiki/RRReviewers
http://wiki.postgresql.org/wiki/Running_a_CommitFest

For lack of a more qualified volunteer, I'll be handling the initial 
round of patch assignments and reviewer organization.  I suspect we'll 
reorganize on the fly as things proceed based on who has time; I'd 
certainly welcome patch-chasing help in addition to reviewing.  Since 
the backlog for this CommitFest is so far lighter than we've seen 
recently, the small patches that don't already have an active reviewer 
shouldn't be too difficult to get through.


Please send me an email (without copying the list) if you are available 
to help with review.  Include any information that might be helpful in 
assigning you an appropriate patch.  If there's a specific one you want 
to claim, by all means let me know that.  All reviewers will need to be 
subscribed to the RRR mailing list, so when you write me please also 
follow the subscription link at 
http://archives.postgresql.org/pgsql-rrreviewers/ to add yourself to 
that list, too, if you're not already there.


The set of patches I have the least feel for are the five ECPG 
submissions, some of which were reviewed already.  I would particularly 
appreciate any early information reviewers might provide about their 
capability/willingness to work on that set.  Those are not so easy to 
just split among multiple people due to how they relate to one another.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] [RRR] CommitFest 2009-11 Call for Reviewers

2009-11-12 Thread Greg Smith

Josh Berkus wrote:

On 11/12/09 9:45 AM, Greg Smith wrote:
  

For lack of a more qualified volunteer, I'll be handling the initial
round of patch assignments and reviewer organization.



Hmmm?  Who's more qualified than you, exactly?
  

I was alluding to the fact that Robert isn't available to handle this one.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [HACKERS] CommitFest 2009-11 Call for Reviewers

2009-11-12 Thread Greg Smith

Tom Lane wrote:

AFAIK the ecpg patches are all waiting on Michael Meskes to have time
to review/commit them.  ecpg is pretty much his turf and no other
committers are likely to touch these patches.

Great to know, and since some of the regular reviewers already made a 
pass through them there's probably not too much general feedback left 
anyway.  I just marked all of those as having Michael as the reviewer.  
If it gets to where those are the main remaining hold-up, I guess we'll 
revisit who else might help out then.  I would rather get the patches 
that are more obvious to handle out of the way first.


Not considering those, HS/SR, or other patches with an already assigned 
reviewer, we're at 16 patches in the queue, and I've got 9 reviewer 
volunteers just so far today.  Barring a flood of last-minute entries, 
if I can get each reviewer to handle one patch and a moderate percentage 
of them to handle two, that should be all it takes for this round.  Will 
move the rest of the discussion here to just rrreviewers.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Patch committers

2009-11-12 Thread Greg Smith

Tom Lane wrote:

While I'm not against promoting more committers to deal with the influx of 
patches,
the only way I know for people to get to the skill level of being fully
competent reviewers is to have done a lot of patch writing themselves.
  
The dynamic going on right now is that many people who might otherwise 
be writing their own patches are instead doing patch review to try and 
keep the project as a whole moving forward.  I actually had two off-list 
discussions about that just today; that topic pops up pretty regularly 
as I talk with contributors at all levels.


Since most people have an upper limit on how much community time they 
can spend, every minute spent reviewing is a minute you're not spending 
on your own patches.  The way you're describing the qualification 
process, it would be easy to conclude that there's a reviewer ladder, 
and a developer ladder, and only climbing the latter leads to being a 
committer--that no matter how much review you do, it doesn't really 
count as a committer grade skill.


I'm not sure that's the message you want to be sending, because anyone 
who dreams of being a committer is going to stay as far away from doing 
review as they can if that notion spreads.  Based on the growing 
frustration I keep hearing, along the lines of "doing review doesn't 
leave me with time for my own patches", that perception is already 
something to be wary of.  If the primary criterion is generating patches 
that apply with minimal changes, you could make a case that someone 
who's gotten skilled enough as a reviewer to only pass through patches 
of that quality should get some recognition even if they didn't write 
them.  That's clearly a useful subset of the skills needed to commit 
patches: passing them along only if they look ready for it.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Smith

Fujii Masao wrote:

Personally, I think that semi-synchronous replication is sufficient for HA.
  
Whether or not you think it's sufficient for what you have in mind, 
synchronous replication requires a return ACK from the secondary 
before you say things are committed on the primary.  If you don't do 
that, it's not true sync replication anymore; it's asynchronous 
replication.  Plenty of people decide that a local commit combined with 
a promise to synchronize as soon as possible to the slave is good enough 
for their apps, which as you say is getting referred to as 
semi-synchronous replication nowadays.  That's an awful name though, 
because it's not true--that's asynchronous replication, just aiming for 
minimal lag.  It's OK to say that's what you want, but you can't say 
it's really a synchronous commit anymore if you do things that way.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Smith

Fujii Masao wrote:

Umm... what is your definition of synchronous? I'm planning to provide
four synchronization modes as follows, for v8.5. Does this fit in your
thought?

  The primary waits ... before returning success of a transaction;
  * nothing   - asynchronous replication
  * recv ACK  - semi-synchronous replication
  * fsync ACK - semi-synchronous replication
  * redo ACK  - synchronous replication

Or, in synchronous replication, we must wait a fsync and a redo ACK?
  
Right, those are the possibilities; all four of them have valid use 
cases in the field and are worth implementing.  I don't like the label 
semi-synchronous replication myself, but it's a valuable feature to 
implement, and that is unfortunately the term other parts of the 
industry use for that approach.


But everyone needs to be extremely careful with the terminology here:  
if you say synchronous replication, that *only* means what you're 
labeling redo ACK (WAL ACK really).  Synchronous replication 
should not be used as a group term that includes the semi-synchronous 
variations, which are in fact asynchronous despite their marketing 
name.  If someone means semi-synchronous, but they say synchronous 
thinking it's a shared term also applicable to the semi-synchronous 
variations here, that's just going to be confusing for everyone.
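To pin the terminology down, here is a minimal sketch that labels the four wait points from Fujii's list the way this message argues they should be named. The enum and function names are hypothetical and do not correspond to any PostgreSQL setting.

```python
from enum import Enum


class WaitFor(Enum):
    """What the primary waits for before reporting success (hypothetical names)."""
    NOTHING = "nothing"
    RECV_ACK = "recv ACK"
    FSYNC_ACK = "fsync ACK"
    REDO_ACK = "redo ACK"


def is_synchronous(mode: WaitFor) -> bool:
    # Per this thread, only waiting for the standby's redo (WAL) ACK
    # counts as true synchronous replication.
    return mode is WaitFor.REDO_ACK


def replication_label(mode: WaitFor) -> str:
    if mode is WaitFor.NOTHING:
        return "asynchronous"
    if is_synchronous(mode):
        return "synchronous"
    # recv/fsync ACK: marketed as semi-synchronous, still asynchronous underneath.
    return "semi-synchronous (asynchronous)"
```

The point of keeping is_synchronous() separate is that nothing short of the redo ACK should ever answer "yes" to it, whatever the label says.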


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] [PATCH] SE-PgSQL/lite (r2429)

2009-11-12 Thread Greg Smith
KaiGai Kohei wrote:
 In the v8.4 development cycle, I got a suggestion to reduce
 a burden of reviewer to split off a few functionalities, such
 as security_context system column and row-level access controls.
   
I lost track of this patch and related bits somewhere along the way; I had
to triage my unread mail a few times. Could someone summarize how it now
fits into plans for more general row-level access controls in the
database? I know incompatibilities between the SEPostgreSQL model for row
filtering and thoughts for a more general permissions feature that did
something similar were a major design issue in the early 8.4 versions of
SEPostgreSQL, and that, as you say, you've been working on that. I'm not
sure what relationship there is between those two today though, or
exactly where general non-SELinux row filtering is on the roadmap.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Smith

Fujii Masao wrote:

On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith g...@2ndquadrant.com wrote:
  

Right, those are the possibilities, all four of them have valid use cases in
the field and are worth implementing.  I don't like the label
semi-synchronous replication myself, but it's a valuable feature to
implement, and that is unfortunately the term other parts of the industry
use for that approach.



BTW, MySQL and DRBD use the term semi-synchronous:
http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication
http://www.drbd.org/users-guide/s-replication-protocols.html
  
Yeah, those are the other parts of the industry I was referring to.  
MySQL uses semi-synchronous to distinguish between its completely 
asynchronous default replication mode and one where it provides a 
somewhat safer implementation.  The description reads more as 
asynchronous with some synchronous elements, not one style of 
synchronous implementation.  None of their documentation wanders into 
the problem area here by calling it a true synchronous solution when 
it's really not--MySQL Cluster is their synchronous vehicle. 

It's fine to adopt the term semi-synchronous, as it's become quite 
popular and people are going to label the PG implementation with it 
regardless of what is settled on here.  But we should all try to be 
careful to use it as correctly as possible.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] [PATCH] SE-PgSQL/lite (r2429)

2009-11-12 Thread Greg Smith
KaiGai Kohei wrote:
 I found a uncertain term in your comment.
 It seems to me the model has two meanings in this context.
 - The way to make access control decision (allowed? or denied?).
 - The granularity of access controls (tables? columns? or tuples?).
   
What I meant by the SEPostgreSQL model for row filtering was the
original implementation you had, where row filtering was handled by code
specific to SEPostgreSQL, not something generic enough to be used for
other purposes. I wasn't sure what if anything from there was still in
the patch, and you answered that clearly enough. Thanks for clarifying
where things are at.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] next CommitFest

2009-11-13 Thread Greg Smith

Simon Riggs wrote:

All the CF manager needs to do is ensure that every patch submitted
chalks up one review. If you think about it, we wouldn't actually need
any rr reviewers at all then, because if we have 20 patches we would
have 20 reviews due. So the whole scheme is self-balancing

In fact, just suggesting the guideline that everyone who submits a patch 
should review one here was enough to pull in a number of submitters 
who volunteered to do a single review as well, moving some distance 
toward what you're describing.  It seems there was a perception that 
joining rrreviewers committed you to doing multiple patch reviews; 
I've let multiple submitters who were trying to help out know it's OK to 
just grab one patch and review it without even getting involved on that list.


Take a look at 
https://commitfest.postgresql.org/action/commitfest_view?id=4 right 
now.  I've been suggesting to people that they assign themselves to the 
patches they like, and it's nearly fully populated two days before 
the CommitFest has even started.  I have 6 reviewers that haven't been 
assigned anything yet and there are only 8 unassigned patches out 
there.  In several cases, assigning the reviewer turned out to be quite 
easy because so many submitters joined in--just assign someone who 
submitted a patch in the same area.
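The "assign someone who submitted a patch in the same area" heuristic can be sketched as a simple greedy pass. This is only an illustration: the data layout (patch name mapped to area and author, reviewer mapped to the areas they have submitted in) and the one-patch-per-reviewer limit are assumptions, not how the CommitFest app works.

```python
def assign_reviewers(patches, reviewers):
    """Greedily assign one reviewer per patch, preferring same-area submitters.

    `patches` maps patch name -> (area, author); `reviewers` maps reviewer
    name -> set of areas they have submitted patches in.  Both layouts are
    hypothetical.  Nobody is assigned their own patch.
    """
    available = dict(reviewers)  # copy so the caller's dict is untouched
    assignments = {}
    for patch, (area, author) in patches.items():
        candidates = [r for r in available if r != author]
        if not candidates:
            continue  # leave unassigned for manual follow-up
        same_area = [r for r in candidates if area in available[r]]
        chosen = (same_area or candidates)[0]
        assignments[patch] = chosen
        del available[chosen]  # one patch per reviewer in this sketch
    return assignments
```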


So far it looks sufficient to introduce the expectation that 
submitters should also do a review, without even needing to make that a 
firm rule.  That helps increase the reviewer pool significantly, 
addressing the general problem Robert has been fighting, while not 
forcing people like Dave who have other pulls on their time into a 
review role.  We'll see whether the follow-through here is good or not; 
maybe this will decay yet.  For now, simply telling submitters that the 
review of their own patches might be influenced by whether they do a 
good job reviewing someone else's has improved things considerably over 
past CommitFests, and it's hard to imagine how someone could complain 
about a guideline that fair.


The most difficult part here remains finding reviewers for the really 
big patches.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Commitfest patches

2008-03-29 Thread Greg Smith

On Fri, 28 Mar 2008, Gregory Stark wrote:


I described which interfaces worked on Linux and Solaris based on empirical
tests. I posted source code for synthetic benchmarks so we could test it on a
wide range of hardware. I posted graphs based on empirical results.


Is it possible to post whatever script generates the graph (gnuplot?) 
so people can compare the results they get to yours?  It's when I realized 
I didn't have that and would have to recreate one myself that my intention 
to just run some quick results and ship them to you lost momentum and I 
never circled back to it.  It would be nice if you made it easier for 
people to generate fancy results here as immediate gratification for them 
(while of course just asking them to send the raw data).


I can run some tests on smaller Linux/Solaris systems to see if they don't 
show a regression; that was my main concern about this experiment.  Some 
of the discussion that followed your original request for tests was kind 
of confusing as far as how to interpret the results as well; I think I 
know what to look for but certainly wouldn't mind some more guidance 
there, too.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] New boxes available for QA

2008-04-01 Thread Greg Smith

On Tue, 1 Apr 2008, Guillaume Smet wrote:

I wonder if it's not worth it to have a very simple thing already 
reporting results as the development cycle for 8.4 has already started 
(perhaps several pgbench unit tests testing various type of queries with 
a daily tree)


The pgbench-tools utilities I was working on at one point anticipated this 
sort of test starting one day.  You can't really get useful results out of 
pgbench without running it enough times that you get average or median 
values.  I dump everything into a results database which can be separated 
from the databases used for running the test, and then it's easy to 
compare day to day aggregate results across different query types.
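Aggregating repeated runs by query type might look like the sketch below. It assumes results arrive as (query_type, tps) pairs, a hypothetical stand-in for rows read back from the separate results database; any numbers used with it are toy inputs, not benchmark results.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(runs):
    """Group repeated pgbench runs by query type and report mean/median TPS.

    `runs` is an iterable of (query_type, tps) pairs, a hypothetical layout
    for rows read back from a separate results database.
    """
    by_type = defaultdict(list)
    for query_type, tps in runs:
        by_type[query_type].append(tps)
    return {
        qt: {"runs": len(values), "mean": mean(values), "median": median(values)}
        for qt, values in by_type.items()
    }
```

Reporting the median alongside the mean is the cheap way to keep one outlier run from dominating a day-to-day comparison.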


I haven't had a reason to work on that recently, but if you've got a 
semi-public box ready for benchmarks now I do.  Won't be able to run any 
serious benchmarks on the systems you described, but should be great for 
detecting basic regressions and testing less popular compile-time options 
as you describe.


As far as the other more powerful machines you mentioned go, I would need to 
know a bit more about the disks and disk controller in there to comment 
about whether those are worth the trouble to integrate.  The big missing 
piece of community hardware that remains elusive would be a system with

>=4 cores, >=8GB RAM, and >=8 disks with a usable write-caching controller

in it.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [JDBC] Re: [HACKERS] How embarrassing: optimization of a one-shot query doesn't work

2008-04-01 Thread Greg Smith

On Tue, 1 Apr 2008, Guillaume Smet wrote:


A good answer is probably to plan optional JDBC benchmarks in the
benchfarm design - not all people want to run Java on their boxes but
we have servers of our own to do so.


The original pgbench was actually based on an older test named JDBCbench. 
That code is kind of old and buggy at this point.  But with some care and 
cleanup it's possible not only to benchmark relative Java performance with 
it, but also to compare it with pgbench running the same queries on the 
same tables to see how much overhead going through Java is adding.


Original code is at http://mmmysql.sourceforge.net/performance/ ; there are 
also some improved versions at 
http://developer.mimer.com/features/feature_16.htm


I'm not sure if all of those changes are net positive for PostgreSQL 
though; they weren't the last time I played with this.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] build multiple indexes in single table pass?

2008-04-01 Thread Greg Smith

On Tue, 1 Apr 2008, Andrew Dunstan wrote:

I don't know if this has come up before exactly, but is it possible that we 
could get a performance gain from building multiple indexes from a single 
sequential pass over the base table?


It pops up regularly; you might even have walked by a discussion of this 
idea with myself, Jan, and Jignesh over the weekend.  Jignesh pointed out 
that index creation was a major drag on his PostgreSQL benchmarking 
operations and I've run into that myself.  I have a large dataset where 
creating a simple index takes around 70% of the time it takes to load the 
data in the first place, and his tables with multiple indexes took 
multiples of the load time to index.  Considering that the bulk load isn't 
exactly speedy either, this gives you an idea how much room for 
improvement there is.


The idea we were bouncing around went a step past that and considered 
this:  if you have good statistics on a table, and you have a sample set 
of queries you want to execute against it, how would you use that 
information to plan what indexes should be created?  Needing to be able to 
create multiple indexes at once efficiently was an implementation detail 
to pull that off.
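The single-pass idea can be illustrated with a toy model: scan the table once, spool a (key, row position) entry into every index being built, then sort each spool independently, loosely like a btree build sorts its input. Rows as dicts and the function name are assumptions for illustration; this is not PostgreSQL's index build code.

```python
def build_indexes_single_pass(rows, index_columns):
    """Build several sorted (key, row_position) lists from one scan of `rows`.

    `rows` (a list of dicts) and `index_columns` are hypothetical stand-ins
    for a heap table and the columns of several CREATE INDEX commands.
    """
    spools = {col: [] for col in index_columns}
    for pos, row in enumerate(rows):  # the single sequential pass over the table
        for col in index_columns:
            spools[col].append((row[col], pos))
    # Each index is then finished independently by sorting its own spool.
    return {col: sorted(spool) for col, spool in spools.items()}
```

The table is read once no matter how many indexes are requested; the per-index sort work remains, so the savings come from the scan, not the sort.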


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[HACKERS] Patch queue - wiki (was varadic patch)

2008-04-02 Thread Greg Smith

On Wed, 2 Apr 2008, Bruce Momjian wrote:

The new permanent ones are permanent against mailbox movement, and in 
fact the comments and thread merging also travels with the email.


The "someone replied to your comment" links in the e-mail messages I've been 
getting the last few days have all been working, which is a first.  The 
configuration you're running right now I'd consider the first candidate to 
be a stable version, so thumbs up from me for reaching that point.


It's clear to me only now that you can think of the patch queue as being a 
list with this structure:


1) Patch name (defaults to the subject of the first message)
2) List of messages related to that patch
3) List of comments
4) Status
5) Assigned reviewers

Bruce's toolchain converts an mbox of messages to generate the first two, 
then has a web interface to allow adding the third.  Right now the message 
list is internally consistent but not useful in the long term (doesn't 
have links to the archives, just this temporary page).  Until the "search 
for message ID" feature is added to the archives I don't know that this 
situation can be improved.
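As a data structure, the five-part patch queue entry listed above could be sketched like this. Field names, the default status, and the helper are hypothetical; the helper fills in only the first two parts, the ones Bruce's mbox toolchain produces automatically.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatchEntry:
    """One patch queue entry; field names are hypothetical."""
    name: str                                             # 1) subject of first message
    message_ids: List[str] = field(default_factory=list)  # 2) related messages
    comments: List[str] = field(default_factory=list)     # 3) comments
    status: str = "pending"                               # 4) status (assumed default)
    reviewers: List[str] = field(default_factory=list)    # 5) assigned reviewers


def entry_from_thread(subject: str, message_ids: List[str]) -> PatchEntry:
    """Fill in parts 1 and 2, the ones generated from an mbox; 3-5 start empty."""
    return PatchEntry(name=subject, message_ids=list(message_ids))
```

Storing message IDs rather than page links is what would let an entry point at the archives later, once they can be searched by message ID.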


Those hacking on tools to convert Bruce's currently preferred working form 
(that revolves around mbox files) into something else that's web oriented 
are stuck with considering how all the above information is going to be 
handled before everybody will be satisfied.  I can see how a script that 
converts the current pages into wiki markup, with placeholders where 
someone can manually update the comments to summarize those on the page, 
would be helpful.  That basically creates an easier-to-read "Queue 
summary" like Stephan was doing for 8.3--that included items 1, 4, and 5 from 
the above.  But that's a one-way operation that doesn't really help with 
the commenting situation, and it's inevitably going to lag behind the 
mailbox-centered queue unless it's made fully automatic.  I can't think of 
anything better that doesn't require building some sort of database that 
holds all this information and drives page generation.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Patch queue - wiki (was varadic patch)

2008-04-04 Thread Greg Smith

On Fri, 4 Apr 2008, Dave Page wrote:


We must be talking at cross purposes because I really cannot believe
you're asking me how to add a link to a wiki page :-o


He wants to know how to automate turning an entire mbox file full of them 
into wiki markup, not how to do one at a time.  Other people have been 
running such tools for Bruce but he doesn't have one he can become 
comfortable with running himself yet.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] modules

2008-04-04 Thread Greg Smith

On Thu, 3 Apr 2008, Joshua D. Drake wrote:


IMO the core modules should be compiled via configure with something
like:
./configure --enable-module=ALL


If you really want to make the problems with using contrib modules go 
away, so they are a) installed even by lazy ISPs who just do 
configure/make/make install, and b) not viewed as second-class citizens when 
people have to ask them to be installed, this won't do it.  You should 
default to installing all the modules and provide configure options to 
turn them off instead.  All PostgreSQL installations should have them all 
available (but not installed in the database, as you point out) unless 
someone goes out of their way to circumvent that.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Commit fest queue

2008-04-09 Thread Greg Smith

On Wed, 9 Apr 2008, Tom Lane wrote:


Gregory Stark [EMAIL PROTECTED] writes:

What would move us in the direction of this mythical patch tracker would be
if we knew exactly what our workflow was. Once we know what our workflow is
then we could pick a tool which enforces that workflow.


Well, I don't think we want or need an enforced workflow.  What we
need is just a list of pending patches so that nothing falls through the
cracks.


Making sure nothing falls through the cracks is exactly the point of an 
enforced workflow.  It might be a manual operation, it might be some piece 
of software, but ultimately you need a well-defined process where things 
move around but don't get dropped.  Exactly how said enforcement happens 
is certainly open to discussion though.


Last time I chimed in on this subject I tried unsuccessfully to move 
discussion into this area--trying to nail down the structure of a patch 
processing workflow--but all I managed to kick off was a discussion 
of the trivia involved with one step.  A better attempt is below.


As you say, most of the work is in recognizing which emails deserve to 
be entered into the list, and that's not subject to automation (not in 
this decade anyway).


Sure, but that can still be an input to the workflow.

Since I'm unfazed by criticism and have been watching this whole 'Fest 
fairly closely, I'll even throw out a sample of a more formal workflow 
outline.  It's always easier to map this stuff out when you've got a dummy 
proposal to beat up.  This is meant to look somewhat like what happened 
this time around (except using the newer tools, which are basically built 
now) rather than to be a grander vision:


Input:  submissions to -patches and -hackers
Processing:  Saved via mail reader software
Output:  mbox file with relevant items
Person:  Bruce

Input:  mbox file
Processing:  Run script
Output:  Patch queue detail wiki page, with links to the archives
Person:  Greg Stark via his script

Input:  Patch queue detail
Processing:  Manually editing page, perhaps with some tool assistance
Output:  Patch queue summary wiki page
Person:  Alvaro

Input:  Patch queue summary
Processing:  Patch committed, removed from page
Output:  Updated patch queue summary, e-mail to author
Person:  Tom, Bruce, other committers

Input:  Patch queue summary
Processing:  Patch changed to be a TODO item
Output:  Expanded TODO list, updated patch queue summary, e-mail to author
Person:  Bruce

Input:  Patch queue summary
Processing:  Patch rejected or bounced back with comments
Output:  Reduced patch queue summary, e-mail to author
Person:  Bruce
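To make the "nothing falls through the cracks" goal concrete, here is a minimal C sketch that treats the steps above as a state machine. The state names and functions are hypothetical, invented for this illustration; they are not part of any existing PostgreSQL tool. Once every step is a defined transition, checking whether anything fell through becomes mechanical: count the patches that never reached a terminal state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states, one per step in the outline above. */
typedef enum {
    PATCH_SUBMITTED,   /* posted to -patches or -hackers */
    PATCH_SAVED,       /* saved into the mbox file */
    PATCH_DETAILED,    /* listed on the patch queue detail page */
    PATCH_SUMMARIZED,  /* listed on the patch queue summary page */
    PATCH_COMMITTED,   /* terminal: committed, author e-mailed */
    PATCH_TODO,        /* terminal: turned into a TODO item, author e-mailed */
    PATCH_RETURNED     /* terminal: rejected or bounced back, author e-mailed */
} PatchState;

/* True if going from 'from' to 'to' is one of the outlined steps. */
static bool patch_transition_ok(PatchState from, PatchState to)
{
    switch (from) {
    case PATCH_SUBMITTED:
        return to == PATCH_SAVED;
    case PATCH_SAVED:
        return to == PATCH_DETAILED;
    case PATCH_DETAILED:
        /* the summarizing step, where the outline has a known hole */
        return to == PATCH_SUMMARIZED;
    case PATCH_SUMMARIZED:
        return to == PATCH_COMMITTED || to == PATCH_TODO || to == PATCH_RETURNED;
    default:
        return false;  /* terminal states have no outgoing steps */
    }
}

static bool patch_is_terminal(PatchState s)
{
    return s == PATCH_COMMITTED || s == PATCH_TODO || s == PATCH_RETURNED;
}

/* Patches still somewhere in the pipeline; once a CommitFest closes,
 * anything counted here has fallen through the cracks. */
static size_t patches_in_flight(const PatchState *queue, size_t n)
{
    size_t pending = 0;
    assert(queue != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!patch_is_terminal(queue[i]))
            pending++;
    return pending;
}
```

Whether the tracking is done by hand on a wiki page or by software, the same two checks apply: each move must be a legal step, and the in-flight count must reach zero.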

There's a clear hole for messages to fall into during the step where 
they're summarized into the patch queue summary; I recall Tom saying 
something about items that didn't make it into the current summary.  That 
needs to be improved a bit.  I also note that I didn't diagram separate 
review steps, because I didn't see them happen in a formal way this time 
around that I could use as a model.


As a sideline observer here, it seems to me that Bruce already has a good, 
hard-to-replace process going to kick this all off, so don't mess with 
that.  It would be nice to find vict...err, volunteers to pull him out of 
the later steps though, for a net reduction in his time.  Simply getting 
things organized better from the start should help with getting more 
people helping out with review; the common complaint seemed to be "I can't 
figure out what to help with in this big mess," which having a summary 
from the start should improve.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Commit fest queue

2008-04-09 Thread Greg Smith

On Wed, 9 Apr 2008, Marc G. Fournier wrote:


Do other large projects accept patches 'ad hoc' like we do?  FreeBSD?  Linux?
KDE?


The Linux procedure is documented at 
http://www.mjmwired.net/kernel/Documentation/SubmittingPatches


Linux was forced into some structure by the SCO lawsuit circa 2004, in 
that they now track more carefully where patches came from.  But the 
process of submission to the Linux kernel developers' mailing list is even 
less organized than here; as stated in that document, they will drop 
patches without comment whenever they please.  However, they do have a 
person designated "Trivial Patch Monkey," which is such a great title that 
you have to forgive the rest of the problems in the process.


FreeBSD includes a program called send-pr just for submitting problem 
reports, which can include feature changes, into their system.  You can 
get an idea of how sophisticated their tracking of bug patches is by 
looking at http://www.freebsd.org/cgi/query-pr-summary.cgi?query


KDE's process works similarly to here, e-mail based with specific people 
assigned to track submissions to the various portions of the project: 
http://developer.kde.org/documentation/other/developer-faq.html#q2.21


GNOME makes all submitters create a report in bugzilla and tracks from 
there:

http://live.gnome.org/GnomeLove/SubmittingPatches

Apache also pushes everything through bugzilla: 
http://httpd.apache.org/dev/patches.html


The interesting quote there is:

Traditionally, patches have been submitted on the developer's mailing 
list as well as through the bug database. Unfortunately, this has made it 
hard to easily track the patches. And without being able to easily track 
them, too many of them have been ignored.  Patches must now be submitted 
through the bug database...


The obvious thing that would go away if this project were to switch to 
such a model: right now, lots of ideas go by that would never be submitted 
as patches that way, but Bruce snags them and turns them into TODO items 
and such rather than letting the ideas just get lost in the archives.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Commit fest queue

2008-04-10 Thread Greg Smith

On Thu, 10 Apr 2008, Brendan Jurd wrote:

[Automatic e-mail notification] is trivial to configure in a real 
tracker.  Less so for a wiki page, but it could still be accomplished 
with the careful application of script-fu.


Right now, anyone who is interested can sign up for e-mail notification 
whenever a specific wiki page is modified; that's a standard MediaWiki 
feature.  If you wanted, you could even sign up a mailing list as the 
entity being notified.  That's not exactly what you had in mind, I think, 
but it's close enough to be useful for now.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] How to submit a patch

2008-04-16 Thread Greg Smith

On Wed, 16 Apr 2008, Joshua D. Drake wrote:


I've added a redirect at http://wiki.postgresql.org/wiki/CommitFest
which currently points to May, but should be updated whenever we close
a commitfest against new submissions.


We should also update the FAQ.


I wouldn't bother with that yet.  That whole area of the wiki is still 
moving around a bit, and I expect some more usefully targeted pages will 
emerge ("How to submit a patch" comes to mind).  Having a stable 
CommitFest URL is handy, but I don't think it's where the FAQ should be 
sending people.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Plan targetlists in EXPLAIN output

2008-04-17 Thread Greg Smith

On Thu, 17 Apr 2008, Tom Lane wrote:


For debugging the planner work I'm about to do, I'm expecting it will be
useful to be able to get EXPLAIN to print the targetlist of each plan
node, not just the quals (conditions) as it's historically done.


I've heard that some of the academic users of PostgreSQL were hoping to 
add features in this area in order to make better use of planner 
internals for educational purposes.  It would be nice if that output were 
available for such purposes without having to recompile.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Lessons from commit fest

2008-04-17 Thread Greg Smith

On Fri, 18 Apr 2008, Gregory Stark wrote:

The reason I was asking these questions was because I was thinking about 
how hard it would be to generate the list from a textual analysis 
instead of using object files.


Is there some reason I don't understand why the listing doxygen creates 
isn't good enough here?  http://doxygen.postgresql.org/globals_type.html


Scraping that HTML seems like it would be pretty straightforward.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] How to submit a patch

2008-04-19 Thread Greg Smith

On Wed, 16 Apr 2008, Heikki Linnakangas wrote:

Based on my observations, there's basically three different workflows a patch 
can follow (assuming the patch gets committed in the end)


This list was so good that I used it as the basis for a new page on the 
wiki:  http://wiki.postgresql.org/wiki/Submitting_a_Patch


I just did a big cleanup of the whole developer's area there.  Rather than 
the nested mess there before, there's now a fairly complete entry page:


http://wiki.postgresql.org/wiki/Development_information

That should have the majority of what most people are looking for.  The 
previous project management page was collapsed into the above.  There's 
still a Development projects subpage there, but that's fairly specific 
to people who know what they're looking for I think.  The March 
Commitfest section might be slimmed down a bit after the May one is 
better defined.


One small change I'd suggest on the main site: 
http://www.postgresql.org/developer/coding links to 
http://wiki.postgresql.org/wiki/Developer_and_Contributor_Resources which 
is now a redirect to the above page.  I separated the advocacy 
contributors out into their own section, which made the longer title 
unneeded.  It would be nice one day to change that link to use the shorter 
Development_information URL instead.  It would also be worth considering a 
direct link to that URL in the manual; I believe it will remain stable 
now.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] How to submit a patch

2008-04-19 Thread Greg Smith

On Sat, 19 Apr 2008, Joshua D. Drake wrote:


Greg Smith wrote:

One small change I'd suggest on the main site: 
http://www.postgresql.org/developer/coding links to 
http://wiki.postgresql.org/wiki/Developer_and_Contributor_Resources which 
is now a redirect to the above page. 


This request should be on -www and as a note I don't know that I like the 
idea.


There were two suggestions there, and technically one of them goes to 
-docs instead if we're gonna get picky.


1) Switch the URL that's already on the coding page to more directly point 
to a URL that actually exists, rather than a redirect.  That seems 
undebatable.


2) Put a link to an area that contains information like current CommitFest 
progress in the development section of the manual.


(2) was already suggested here recently; I said I didn't think that was a 
good idea until the content there stabilized because I planned a 
reorganization.  I was just announcing that I believe that to be stable 
now, and I nominate the revised Developer information page as the one to 
link to.


If you don't like the idea of embedding a few choice URLs from the wiki 
into the main documentation in general, I don't know why.  The manual is 
great for some things, the wiki is great for others, and the best way to 
use both for what they're good at is to start coupling them together at 
appropriate points.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Commitfest namespacing (was: TODO, FAQs to Wiki?)

2008-04-21 Thread Greg Smith

On Tue, 22 Apr 2008, Brendan Jurd wrote:


I wonder if we should namespace the CommitFest pages by year as well
as month (i.e., move CommitFest:May to CommitFest:May2008).


This already came up on pgsql-www and as I just replied to over there, the 
current structure has some things I'd like to fix beyond just this (and 
there's a pending namespace vs. categories argument brewing there). 
That's the list where this sort of thing will get hashed out.  Please 
come join so you can get sucke...err, volunteer to help out even more than 
you already have.


This way, even after we've had a CommitFest:May in 2009/2010/etc., the 
history of the May 2008 CommitFest will still be easily viewable as a 
discrete item.


There ultimately should be pages for CommitFest:2008 and 
CommitFest:8.4 that the Wiki generates itself.  I'd prefer not to see 
any band-aid changes made in this area that aren't thinking forward to 
address those as well.  Work on improving the structure for May instead 
like you've been doing, that's much more valuable right now IMHO.



We probably need to have the following redirects in place:
* CommitFest:Current (for reviewers)...


Ditto here.  I already intend to eliminate the CommitFest redirect you've 
put there and one day replace it with a page listing the available views, 
and I'd prefer not to see more of these floating around.


Redirects are designed to be a useful hack when a page gets removed or to 
handle common shortcuts/errors.  In general, if you're relying on them 
heavily for external navigation structure, you're probably not using the 
right tool for that sort of job.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] TODO, FAQs to Wiki?

2008-04-21 Thread Greg Smith

On Mon, 21 Apr 2008, Tino Wildenhain wrote:


Alvaro Herrera wrote:

I suggest we start an experiment with the FAQ in XML Docbook, which is
amenable to automatic processing, and move from there.


Well... or reStructuredText which has the advantage of being human
editable? (without specialized editor that is)


reST is a reasonable tool for building small documents; I don't use it 
because it really doesn't scale well to larger ones.  Given that the rest 
of the project is already committed to using DocBook for those larger 
documents, I think it's hard to justify the additional toolchain needed 
for reST processing just to make the FAQ a little easier to edit.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-22 Thread Greg Smith

On Tue, 22 Apr 2008, PFC wrote:


Example : let's imagine a cache priority setting.


Which we can presume the DBA will set incorrectly because the tools needed 
to set that right aren't easy to use.


An alternative would be for the background writer to keep some stats and do 
the thing for us :

- begin bgwriter scan
- setup hashtable of [relid = page count]
- at each page that is scanned, increment page count for this 
relation...


I've already got a plan sketched out that does this, which I didn't 
manage to get finished in time for 8.3.  What I wanted it for was not this 
purpose, but instrumentation of what's in the cache that admins can look 
at.  Right now you can get that out of pg_buffercache, but that's kind of 
intrusive because of the locks it takes.  In many cases I'd be perfectly 
happy with an approximation of what's inside the buffer cache, accumulated 
while the page header is being locked anyway as the BGW passes over it. 
And as you note, having this data available can be handy for internal 
self-tuning as well once it's out there.
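A minimal C sketch of the per-relation page count described above, assuming a simplified buffer pool represented as an array of relation OIDs (0 for an unused buffer). The type and function names are hypothetical; this is not the background writer's actual code, and it ignores the buffer header locking a real scan would have to coordinate with. The last function shows the kind of "fraction cached" number a planner-side adjustment might consume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REL_SLOTS 64   /* power of two; assumed larger than the relation count */

typedef struct {
    uint32_t relid;    /* relation OID; 0 marks an empty slot */
    uint32_t pages;    /* buffers seen for this relation in the last scan */
} RelPageCount;

typedef struct {
    RelPageCount slots[REL_SLOTS];
} RelCacheStats;

static size_t rel_hash(uint32_t relid)
{
    return (size_t) ((relid * 2654435761u) & (REL_SLOTS - 1));  /* multiplicative hash */
}

/* Find or claim the slot for relid (linear probing); NULL if the table is full. */
static RelPageCount *relstats_slot(RelCacheStats *st, uint32_t relid)
{
    size_t i = rel_hash(relid);
    for (size_t probe = 0; probe < REL_SLOTS; probe++) {
        RelPageCount *s = &st->slots[i];
        if (s->relid == relid || s->relid == 0) {
            s->relid = relid;
            return s;
        }
        i = (i + 1) & (REL_SLOTS - 1);
    }
    return NULL;
}

/* One pass over the pool: buffer_relids[b] is the relation owning buffer b, or 0. */
static void relstats_scan(RelCacheStats *st, const uint32_t *buffer_relids, size_t nbuffers)
{
    memset(st, 0, sizeof(*st));
    for (size_t b = 0; b < nbuffers; b++) {
        if (buffer_relids[b] == 0)
            continue;                      /* unused buffer */
        RelPageCount *s = relstats_slot(st, buffer_relids[b]);
        if (s != NULL)
            s->pages++;
        /* else: table full; the stats are only an approximation anyway */
    }
}

/* Pages of relid seen in the last scan (0 if none). */
static uint32_t relstats_pages(const RelCacheStats *st, uint32_t relid)
{
    size_t i = rel_hash(relid);
    for (size_t probe = 0; probe < REL_SLOTS; probe++) {
        const RelPageCount *s = &st->slots[i];
        if (s->relid == relid)
            return s->pages;
        if (s->relid == 0)
            return 0;
        i = (i + 1) & (REL_SLOTS - 1);
    }
    return 0;
}

/* Fraction of a relation's rel_pages pages currently in shared buffers. */
static double relstats_cached_fraction(const RelCacheStats *st, uint32_t relid, uint32_t rel_pages)
{
    assert(rel_pages > 0);
    uint32_t cached = relstats_pages(st, relid);
    return cached >= rel_pages ? 1.0 : (double) cached / rel_pages;
}
```

Rebuilding the table on every scan keeps it reflecting what is in the cache now, rather than the long-term hit statistics discussed below.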


Jim threw out that you can just look at the page hit percentages instead. 
That's not completely true.  If you've had some nasty query blow out your 
buffer cache, or if the server has been up a looong time and the total 
stats don't really reflect recent reality, what's in the buffer cache and 
what the stats say has historically been cached can diverge.



This would not examine whatever is in the OS' cache, though.


I don't know that it's too unrealistic to model the OS as just being an 
extrapolated bigger version of the buffer cache.  I can think of a couple 
of ways those can diverge:


1) Popular pages that get high usage counts can end up with a higher 
representation in shared_buffers than in the OS cache.


2) If you've been doing something like a bulk update, you can have lots 
of recently written pages in the OS cache that aren't really accounted for 
fully in shared_buffers, because they never get a high enough usage count 
to stay there (they're only used once) but can fill the OS cache as 
they're spooled up to write.


I'm not sure that either of these cases is so strong that it invalidates 
your basic idea, though.  There's a pending 8.4 TODO to investigate whether 
increasing the maximum usage count a buffer can get would be an 
improvement.  If that number got bumped up, I could see (2) becoming more 
of a problem.


I'd be somewhat concerned about turning this mechanism on by default, 
though, at least at first.  A hybrid approach that gives the DBA some 
control might work well.  Maybe have an "adjust estimates for cache 
contents" knob that you can toggle on a per-session or per-table basis?


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches

2008-04-30 Thread Greg Smith

On Thu, 1 May 2008, Andrej Ricnik-Bay wrote:


Not a hacker, just a curious reader ... are there equivalent frameworks
for the other supported platforms?  E.g. MacOS, *BSD, Windows?


SELinux is a Linux implementation of ideas from an earlier NSA project 
named Flask.  There is a port of another variant of that, Flask/TE, that 
is making its way into the BSD variants via a project called SEBSD. 
TrustedBSD, Darwin (OS X), and OpenSolaris all have projects in this area 
already (the Solaris one just launched last month).  A good starter page 
is http://www.trustedbsd.org/sebsd.html


Particularly given the common heritage, I suspect that the PostgreSQL side 
of all these projects will be similar, and that once those hooks are in 
place it will just be a matter of tying them into the higher levels of the 
other frameworks.  It would be too ambitious to target all of them at once 
for a first pass, but it may be worth a look at the fundamentals of SEBSD 
to make sure the right hooks look like they're in place.


Windows has this thing called Group Policy that has supposedly leaped 
forward for Windows Server 2008.  They are now advertising it as "like 
SELinux, but better."  The presentation PDF I just read on that subject 
sounds like something written by the crazy guy at Broadway & 57th Street I 
used to walk by, as he talked on fruit as if they were his cell phone. 
It's such a deluded and wildly misguided bit of sales fluff that you can't 
take it seriously, and the whole thing just leaves me feeling sorry for 
them instead.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches

2008-05-01 Thread Greg Smith

On Wed, 30 Apr 2008, KaiGai Kohei wrote:


[1/4] sepostgresql-pgace-8.4devel-3-r739.patch
 provides PGACE (PostgreSQL Access Control Extension) framework.
  http://sepgsql.googlecode.com/files/sepostgresql-pgace-8.4devel-3-r739.patch


For those overwhelmed by sheer volume here, this is the patch to start 
with, because it's got all the core changes to the server.  I'm also in 
the camp that would like to see this feature added, but rather than just 
giving it a +1 I started looking at it.


The overall code is nice:  easy to understand, structured modularly.  I 
have some concerns though.  The first two things that jump out at me on an 
initial review appear right from the beginning for those who want to take 
a look:


-I'm a bit unnerved by both the performance and reliability implications 
from how the security check calls are done in every case, even if there is 
no SELinux support included.  Those checks are sitting in some pretty low 
level tuple and heap calls.


The approach taken here is to put all the #ifdef logic into the 
underlying ACE interface (see patch [2/4]), so that the caller doesn't 
have to care.  If SELinux support is off, then the calls turn into


  void x(y) {} or
  bool a(b) { return true; }

This is a very clean design, but it's putting extra (possibly optimized 
away) calls into a lot of places.  While it would be uglier, it might make 
sense to put that on/off logic in all the places where the calls are made, 
so that when you turn SELinux support off most of the code really does go 
completely away rather than just turning into stubs.
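A small C sketch contrasting the two approaches. HAVE_SELINUX_SKETCH is a hypothetical configure symbol and ace_tuple_visible() a hypothetical check function; neither name comes from the patch. Built with the symbol undefined, the stub version still contains a call that the compiler may or may not inline away, while the call-site #ifdef version compiles the check out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * HAVE_SELINUX_SKETCH is a hypothetical configure symbol; leave it undefined
 * to build this example.  If it were defined, ace_tuple_visible() would have
 * to be provided elsewhere.
 */

/* Style 1 (the patch's approach): the #ifdef lives inside the interface, and
 * with support off every call site still calls a do-nothing stub. */
#ifdef HAVE_SELINUX_SKETCH
extern bool ace_tuple_visible(size_t tupleno);
#else
static inline bool ace_tuple_visible(size_t tupleno)
{
    (void) tupleno;                   /* no policy to consult */
    return true;
}
#endif

static size_t count_visible_stub(size_t ntuples)
{
    size_t visible = 0;
    for (size_t t = 0; t < ntuples; t++)
        if (ace_tuple_visible(t))     /* a call the optimizer may remove */
            visible++;
    return visible;
}

/* Style 2 (the alternative): the #ifdef lives at the call site, so with
 * support off the check is not compiled in at all. */
static size_t count_visible_ifdef(size_t ntuples)
{
#ifdef HAVE_SELINUX_SKETCH
    size_t visible = 0;
    for (size_t t = 0; t < ntuples; t++)
        if (ace_tuple_visible(t))
            visible++;
    return visible;
#else
    return ntuples;                   /* every tuple visible, no calls made */
#endif
}
```

Both give the same answer with support off; the difference is only in what the compiler is left to clean up, at the cost of repeating the #ifdef at every call site.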


-The only error reporting and handling method used is elog(ERROR, ...). 
That seems a bit heavy-handed for something that can be expected to happen 
all the time.


If I understand this correctly, when you're scanning a table with 1000 
rows where you're only allowed to see 50% of them, that's going to be 500 
calls to elog(), one for each tuple you can't see.  Having a tuple get 
screened out isn't really an error per se, and while I can see how 
sensitive installs would want those all reported, there are others where 
this volume of log activity would be too much.  When someone with 
classified clearance is looking at a big table that also has a lot of 
secret info in it, not all installs will want a million errors reported 
just because there's data available that person can't see.


At a minimum, this needs some finer log control, and maybe a rethinking 
altogether of how to handle error cases.
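One possible shape for finer control, sketched in C under the assumption that a per-tuple policy check can report "hidden" without raising an error: count the screened-out tuples during the scan and emit a single summary line per scan. All names here are hypothetical, including the demo policy that hides every odd-numbered tuple so the 1000-row, 50% example above can be reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-tuple policy check: true if the current user may see it. */
typedef bool (*tuple_visible_fn)(size_t tupleno);

typedef struct {
    size_t returned;   /* tuples passed back to the user */
    size_t filtered;   /* tuples screened out by policy */
} ScanSummary;

/* Screened-out tuples are counted rather than reported one by one. */
static ScanSummary filtered_scan(size_t ntuples, tuple_visible_fn visible)
{
    ScanSummary s = {0, 0};
    assert(visible != NULL);
    for (size_t t = 0; t < ntuples; t++) {
        if (visible(t))
            s.returned++;
        else
            s.filtered++;
    }
    return s;
}

/* One log line per scan; returns snprintf's result (length, or <0 on error). */
static int format_scan_summary(char *buf, size_t buflen, const char *relname, ScanSummary s)
{
    return snprintf(buf, buflen, "%s: %zu rows returned, %zu filtered by policy",
                    relname, s.returned, s.filtered);
}

/* Demo policy for illustration only: odd-numbered tuples are hidden. */
static bool demo_policy(size_t tupleno)
{
    return tupleno % 2 == 0;
}
```

With this shape, the 500 screened-out rows in the example become one line such as "secretdata: 500 rows returned, 500 filtered by policy" instead of 500 separate errors.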


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches

2008-05-05 Thread Greg Smith

On Mon, 5 May 2008, Tom Lane wrote:


elog() should not be used for user-facing errors.  I couldn't easily
tell just which of the messages are likely to be seen by users and
which ones should be can't happen cases, but certainly there are
a whole lot of these that need to be ereport()s.  Likely there need
to be some new ERRCODEs too.


And it would be a nice step toward the scenarios I was asking about if 
there was a GUC variable for what level to log security violations at.  I 
realize now the tuple-level warnings are going into the SELinux logs 
rather than the PostgreSQL ones, but it should be easier to change policy 
violations that impact the server to something other than just ERROR.
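A sketch of what such a setting might look like, in C. The setting name security_violation_log_level and everything below are hypothetical, not an existing PostgreSQL GUC; the value names just borrow PostgreSQL's message-level vocabulary. The idea is that only "error" aborts the statement, while the other values report the violation (or not, for "off") and let the query continue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Possible values for a hypothetical "security_violation_log_level" setting. */
typedef enum {
    SECLOG_OFF,        /* don't report policy violations at all */
    SECLOG_DEBUG,
    SECLOG_LOG,
    SECLOG_WARNING,
    SECLOG_ERROR       /* the patch's current behavior: abort the statement */
} SecLogLevel;

static const struct {
    const char *name;
    SecLogLevel level;
} seclog_options[] = {
    {"off", SECLOG_OFF},
    {"debug", SECLOG_DEBUG},
    {"log", SECLOG_LOG},
    {"warning", SECLOG_WARNING},
    {"error", SECLOG_ERROR},
};

/* Parse a setting value; returns false (leaving *out alone) if unknown. */
static bool seclog_parse(const char *value, SecLogLevel *out)
{
    assert(value != NULL && out != NULL);
    for (size_t i = 0; i < sizeof seclog_options / sizeof seclog_options[0]; i++) {
        if (strcmp(value, seclog_options[i].name) == 0) {
            *out = seclog_options[i].level;
            return true;
        }
    }
    return false;
}

/* Whether a violation should be reported at all. */
static bool seclog_reports(SecLogLevel level)
{
    return level != SECLOG_OFF;
}

/* Only the error setting stops the query; other levels just report. */
static bool seclog_aborts(SecLogLevel level)
{
    return level == SECLOG_ERROR;
}
```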


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches

2008-05-06 Thread Greg Smith

On Tue, 6 May 2008, Tom Lane wrote:

And of course the next question after that is why we should want to 
depend on SELinux at all, rather than implementing row filtering in the 
framework of SQL permissions...


It may be the case that clean row and column filtering at the SQL layer 
are prerequisites for a clean SELinux implementation, where the only 
difference is that the permission checks are handled by asking SELinux 
instead of looking in the catalog.


As for why SQL restrictions alone aren't enough...the simple answer is 
because it's not SELinux, which I say in all seriousness because it is 
turning into a requirement in some places.


SELinux lets you control what a user login is capable of no matter what 
application they run, and managing those capabilities can happen in one 
place--the SELinux tools.  There are lots of ways to address OS login 
problems.  Let's say the logins have a PAM plug-in that restricts what you 
can log in to based on what machine you're on, and also requires one of 
those key cards that generate random codes, so that you can't just steal 
someone else's username/password.  If you've got a scheme like that, and the database 
enforces SELinux restrictions, it doesn't matter whether your DBA followed 
all the PostgreSQL security rules correctly, as long as they got the 
SELinux mapping part right.  And you don't have to make sure whatever 
custom security mechanism you've integrated into the login or post-login 
process is recognized by the database proper at all, as long as the 
restrictions can be mapped to the SELinux+database space.


Simple example of something hard to replicate without this framework: 
you discover someone is a rat.  You update your list of active users and 
push that to all your servers.  Now even if said rat is already logged 
into the database server and busy doing 'psql -o /disk/usbkey -c "select * 
from secretdata"', you just cut them off in the middle of the 
query--without needing to find all the database servers and execute "alter 
table secretdata set ...", just by doing simple user account maintenance 
the way people are already comfortable with and have procedures for.
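A toy C model of the "cut off mid-query" behavior, assuming the database consults an externally maintained policy (standing in for the SELinux side) before returning each row, rather than deciding once for the whole statement. Everything here is hypothetical; whether a real implementation rechecks per row, per statement, or caches decisions is a performance trade-off this sketch doesn't address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_USERS 8

/* Stand-in for the externally maintained policy: the list of active users
 * pushed out to every server. */
typedef struct {
    const char *active[MAX_USERS];
    size_t nactive;
} Policy;

static bool policy_allows(const Policy *p, const char *user)
{
    for (size_t i = 0; i < p->nactive; i++)
        if (strcmp(p->active[i], user) == 0)
            return true;
    return false;
}

/* Account maintenance done outside the database. */
static void policy_revoke(Policy *p, const char *user)
{
    for (size_t i = 0; i < p->nactive; i++) {
        if (strcmp(p->active[i], user) == 0) {
            p->active[i] = p->active[--p->nactive];   /* swap-remove */
            return;
        }
    }
}

/* Scan that asks the policy before returning each row.  revoke_at simulates
 * the updated user list arriving mid-query (SIZE_MAX means never). */
static size_t scan_as(Policy *p, const char *user, size_t nrows, size_t revoke_at)
{
    size_t returned = 0;
    assert(p->nactive <= MAX_USERS);
    for (size_t row = 0; row < nrows; row++) {
        if (row == revoke_at)
            policy_revoke(p, user);
        if (!policy_allows(p, user))
            break;                 /* cut off in the middle of the query */
        returned++;
    }
    return returned;
}
```

The database side never learns why access was withdrawn; it only asks the policy, which is the "one layer" the next paragraph describes.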


That's the basic idea here--put the authorization into one layer where 
it's easy (for some definitions of easy) to manage and extensible as 
needed, without having to touch the individual applications directly, just 
by adjusting what permissions you publish when data is requested.  I'm 
sure someone can raise issues or suggest alternate implementations for my 
specific examples, but much like other privilege escalation defense 
mechanisms these environments look for redundant layers of security.  In 
reality users of this would aim for a completely locked down base 
PostgreSQL *and* a completely locked down SELinux implementation 
integrated into that, reinforcing one another, rather than just relying on 
one level of security.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Link requirements creep

2008-05-17 Thread Greg Smith

On Sat, 17 May 2008, Tom Lane wrote:


I was displeased to discover just now that in a standard RPM build of
PG 8.3, psql and the other basic client programs pull in libxml2 and
libxslt; this creates a package dependency that should not be there
by any stretch of the imagination.


When we noticed this recently, my digging suggested you'll be hard pressed 
to find a Red Hat system now without those two installed.  The libxslt RPM 
provides necessary components for KDE, GNOME, and Sun's Java RPM. 
libxml2 is far more intertwined even than that.  These dependencies are 
unpleasant technically, but I don't think they introduce any real 
functional creep.  It would be difficult to even strip a system down to 
the point where these packages weren't available.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] New DTrace probes proposal

2008-05-18 Thread Greg Smith

On Sat, 17 May 2008, Robert Lor wrote:

I'd like to propose adding the following probes (some of which came from 
Simon) to 8.4.


There's also a big DTrace probe set patch available from OmniTI: 
https://labs.omniti.com/project-dtrace/trunk/postgresql/

http://labs.omniti.com/trac/project-dtrace/wiki/Applications#PostgreSQL

I don't know if you've looked at that before.  There's some overlap, but 
each set has many unique and handy probes.  I think it would be nice to 
consider a superset union of the two.  I would guess OmniTI would be glad 
to have their set assimilated into core as well, so they don't have to 
maintain their patch past 8.3; hopefully Theo or Robert will chime in on 
that.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Can't t compile current HEAD

2008-05-18 Thread Greg Smith

On Thu, 15 May 2008, Nikhils wrote:


On Thu, May 15, 2008 at 11:59 AM, Pavel Stehule [EMAIL PROTECTED]
wrote:

I always use a ~/.cvsrc containing

My .cvsrc also includes:


Good hints, and there's now a little section including them all at 
http://wiki.postgresql.org/wiki/Working_with_CVS#Initial_setup


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [GSoC08]some detail plan of improving hash index

2008-05-19 Thread Greg Smith

On Fri, 16 May 2008, Josh Berkus wrote:


For a hard-core benchmark, I'd try EAStress (SpecJAppserver Lite)


This reminds me...Jignesh had some interesting EAStress results at the 
East conference I was curious to try and replicate more publicly one day. 
Now that there are some initial benchmarking servers starting to become 
available, it strikes me that this would make a good test case to run on 
some of those periodically.  I don't have a spare $2K for a commercial 
license right now, but there's a cheap ($250) non-profit license for 
EAStress around.  That might be a useful purchase for one of the PG 
non-profits to make one day though.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] triggers on prepare, commit, rollback... ?

2008-05-20 Thread Greg Smith

On Tue, 20 May 2008, Hannu Krosing wrote:

Tell others that this trx failed, maybe log a failure ? OTOH, this can 
be implemented by a daemon that sits on tail -f logfile | grep 
ROLLBACK


In order to follow the log files like that successfully in many 
environments, you need to stay in sync as the underlying log file changes 
(it might rotate every day for example).  Unfortunately it's not as simple 
as just using tail.
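A POSIX C sketch of the check a log follower needs on each poll, assuming rotation either renames the file away and creates a new one (so the inode changes) or truncates it in place (so it shrinks below our read position). GNU tail's -F option (follow by name, with retry) covers much of this for interactive use; the function names below are hypothetical. A real follower should also finish reading the old file before switching, which this sketch skips.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True if 'path' no longer refers to the file we have open as 'fp': it was
 * renamed away and recreated (different device/inode), or it was truncated
 * in place (now shorter than our read position). */
static bool log_was_rotated(const char *path, FILE *fp)
{
    struct stat open_st, path_st;

    assert(path != NULL && fp != NULL);
    if (fstat(fileno(fp), &open_st) != 0)
        return true;                  /* our handle is unusable: reopen */
    if (stat(path, &path_st) != 0)
        return false;                 /* new file not created yet: keep the old one */
    if (open_st.st_dev != path_st.st_dev || open_st.st_ino != path_st.st_ino)
        return true;
    long pos = ftell(fp);
    return pos >= 0 && path_st.st_size < (off_t) pos;
}

/* One poll step: switch to the new file if rotation happened. */
static FILE *log_follow_step(const char *path, FILE *fp)
{
    if (!log_was_rotated(path, fp))
        return fp;
    FILE *fresh = fopen(path, "r");
    if (fresh == NULL)
        return fp;                    /* try again on the next poll */
    fclose(fp);
    return fresh;
}
```

A daemon watching for ROLLBACK would call log_follow_step() between reads and keep scanning whatever handle it returns.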


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Catching exceptions from COPY

2008-05-28 Thread Greg Smith

On Wed, 28 May 2008, Darren Reed wrote:


Is it feasible to add the ability to catch exceptions from COPY?


Depends on what you consider feasible.  There's a start to a plan for that 
on the TODO list:  http://www.postgresql.org/docs/faqs.TODO.html but it's 
not trivial to implement.


It's also possible to do this right now using pgloader: 
http://pgfoundry.org/projects/pgloader/ That requires some setup and 
there's overhead to passing through that loading layer.


A third possibility is to write a short script specifically aimed at your 
copy need that breaks your input files into smaller chunks and loads them, 
kicking back the ones that don't load, or breaking them into even smaller 
chunks until you've found the problem line or lines.
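A C sketch of that chunk-and-bisect approach, assuming a hypothetical loader callback that loads a chunk atomically (for example, one COPY per transaction, so a failed chunk loads nothing). Failing chunks are split in half until single bad lines are isolated, so a clean chunk costs one load call and each bad line costs on the order of log2(chunk size) extra calls. The demo loader, which treats lines starting with '!' as malformed, is purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical loader: tries to load lines[0..n) as one atomic unit (e.g. one
 * COPY in its own transaction) and returns false if anything in it failed. */
typedef bool (*load_chunk_fn)(const char *const *lines, size_t n, void *ctx);

typedef struct {
    size_t *bad;        /* output: indexes of lines that fail on their own */
    size_t max_bad;     /* capacity of bad[] */
    size_t nbad;        /* bad lines found (may exceed max_bad) */
} BadLines;

/* Try the whole range; on failure split it in half and recurse, so good
 * stretches still load and single bad lines are isolated. */
static void load_range(const char *const *lines, size_t start, size_t n,
                       load_chunk_fn load, void *ctx, BadLines *out)
{
    if (n == 0 || load(lines + start, n, ctx))
        return;
    if (n == 1) {
        if (out->nbad < out->max_bad)
            out->bad[out->nbad] = start;
        out->nbad++;
        return;
    }
    size_t half = n / 2;
    load_range(lines, start, half, load, ctx, out);
    load_range(lines, start + half, n - half, load, ctx, out);
}

/* Break the input into fixed-size chunks and load each with bisection. */
static void load_in_chunks(const char *const *lines, size_t nlines, size_t chunk,
                           load_chunk_fn load, void *ctx, BadLines *out)
{
    assert(chunk > 0);
    for (size_t start = 0; start < nlines; start += chunk) {
        size_t n = nlines - start < chunk ? nlines - start : chunk;
        load_range(lines, start, n, load, ctx, out);
    }
}

/* Demo loader for illustration: lines starting with '!' are "malformed";
 * ctx counts rows that loaded successfully. */
static bool demo_load(const char *const *lines, size_t n, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        if (lines[i][0] == '!')
            return false;
    *(size_t *) ctx += n;
    return true;
}
```

The chunk size is the main tuning knob: larger chunks mean fewer loads when the data is clean, smaller ones mean less rework around each bad line.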


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [PERFORM] Memory question on win32 systems

2008-05-29 Thread Greg Smith

On Thu, 29 May 2008, Justin wrote:

I'm confussed trying to figure out how caches are being use and being 
moving through postgresql backend.


The shared_buffers cache holds blocks from the database files.  That's it. 
If you want more information about how that actually works, head to 
http://www.westnet.com/~gsmith/content/postgresql/ and read "Inside the 
PostgreSQL Buffer Cache".


The work memory allocated for sorting is separate from that, and it 
doesn't cache anything.  It just provides working room for a query that's 
being executed right now.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Greg Smith

On Thu, 29 May 2008, David Fetter wrote:

It's a giant up-hill slog to sell warm standby to those in charge of 
making resources available because the warm standby machine consumes SA 
time, bandwidth, power, rack space, etc., but provides no tangible 
benefit, and this feature would have exactly the same problem.


This is an interesting commentary on the priorities of the customers 
you're selling to, but I don't think you can extrapolate from that too 
much.  The deployments I normally deal with won't run a system unless 
there's a failover backup available, period, and the fact that such a 
feature is not integrated into the core yet is a major problem for them. 
Read-only slaves are very nice to have, but by no means a prerequisite 
before core replication will be useful to some people.  Hardware/machine 
resources are only worth a tiny fraction of what the data is in some 
environments, and in some of those downtime is really, really expensive.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Greg Smith

On Thu, 29 May 2008, Tom Lane wrote:

There's no point in having read-only slave queries if you don't have a 
trustworthy method of getting the data to them.


This is a key statement that highlights the difference in how you're 
thinking about this compared to some other people here.  As far as some 
are concerned, the already working log shipping *is* a trustworthy method 
of getting data to the read-only slaves.  There are plenty of applications 
(web oriented ones in particular) where if you could direct read-only 
queries against a slave, the resulting combination would be a giant 
improvement over the status quo even if that slave was as much as 
archive_timeout behind the master.  That quantity of lag is perfectly fine 
for a lot of the same apps that have read scalability issues.


If you're someone who falls into that camp, the idea of putting the sync 
replication job before the read-only slave one seems really backwards.


I fully accept that it may be the case that it doesn't make technical 
sense to tackle them in any order besides sync first, then read-only 
slaves, because of dependencies in the implementation between the two.  If that's the 
case, it would be nice to explicitly spell out what that was to deflect 
criticism of the planned prioritization.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Greg Smith

On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:


Then you ship 16 MB binary stuff every 30 second or every minute but
you only have some kbyte real data in the logfile.


Not if you use pg_clearxlogtail ( 
http://www.2ndquadrant.com/replication.htm ), which got lost in the giant 
March commitfest queue but should probably wander into contrib as part of 
8.4.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Greg Smith

On Sat, 31 May 2008, Gurjeet Singh wrote:


Not if you use pg_clearxlogtail


This means we need to modify pg_standby to not check for filesize when
reading XLogs.


No, the idea is that you run the segments through pg_clearxlogtail | gzip, 
which then compresses lightly used segments massively because all the 
unused bytes are 0.  The file comes out the same size at the other side, 
but you didn't ship a full 16MB if only a few KB of it were used.
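A quick way to see why this works, using simulated data rather than real WAL. The sketch below builds a 16MB segment with 8KB of random "used" bytes and gzips it twice. The first time the tail is zeroed, which is what pg_clearxlogtail produces. The second time the tail is random, a worst-case stand-in for stale data left in a recycled segment. `shipped_size` is a hypothetical helper.

```python
import gzip
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size (16MB)


def shipped_size(used_bytes: int, zero_tail: bool) -> int:
    """Gzipped size of a simulated WAL segment.

    The first `used_bytes` are random, standing in for real WAL records
    that compress poorly. The rest is either zeroed, as after
    pg_clearxlogtail, or random, as a worst case for stale recycled data.
    """
    tail_len = SEGMENT_SIZE - used_bytes
    tail = bytes(tail_len) if zero_tail else os.urandom(tail_len)
    segment = os.urandom(used_bytes) + tail
    assert len(segment) == SEGMENT_SIZE  # full size once decompressed
    return len(gzip.compress(segment, compresslevel=6))


if __name__ == "__main__":
    for zero in (False, True):
        label = "zeroed tail" if zero else "stale tail"
        print(label, shipped_size(8 * 1024, zero))
```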


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-05-30 Thread Greg Smith

On Fri, 30 May 2008, Josh Berkus wrote:

1) Add several pieces of extra information to guc.c in the form of extra 
gettext commands:  default value, subcategory, long description, 
recommendations, enum lists.

2) Incorporate this data into pg_settings


When you last brought this up in February (I started on a long reply to 
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00759.php that I 
never quite finished) the thing I got stuck on was how to deal with the 
way people tend to comment in these files as they change things.


One problem I didn't really see addressed by the improvements you're 
suggesting is how to handle migrating customized settings to a new version 
(I'm talking about 8.4 to 9.0 after this is in place; 8.3 to 8.4 is a whole 
different problem).  It would be nice to preserve the history of what people 
did, as in your examples (which look exactly like what I find myself 
running into in the field).  Now, that will get a lot easier just by 
virtue of having a smaller config file, but I think that adding something 
into pg_settings that allows saving user-added commentary would be a nice 
step toward some useful standardization on that side of things.  It would 
make future automated tools aimed at parsing and generating new files, as 
part of things like version upgrades, a lot easier if there was a standard 
way such comments were handled in addition to the raw data itself.


The other thing I'd like to see make its way into pg_settings, so that 
tools can operate on it just by querying the database, is noting what file 
the setting came from so that you can track things like include file 
usage.  I think with those two additions (comments and source file 
tracking) it would even be conceivable to clone a working facsimile of 
even a complicated postgresql.conf file set remotely just by reading 
pg_settings.
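Suppose pg_settings exposed both of the proposed additions. Rebuilding the configuration from a query result would then be a small grouping exercise. This Python sketch uses a hypothetical `Setting` record, which is not an actual pg_settings row type. It assumes values are already quoted the way postgresql.conf needs them, and the include file name in the test data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Setting:
    """Hypothetical row: a pg_settings entry plus the two proposed fields."""
    name: str
    value: str                        # assumed already quoted for the file
    sourcefile: Optional[str] = None  # None means the built-in default
    comment: Optional[str] = None     # user-added commentary, if any


def rebuild_files(settings: List[Setting]) -> Dict[str, str]:
    """Regenerate the text of each config file from setting records.

    Settings still at their built-in default are skipped. The rest are
    grouped by the file they came from, each preceded by its comment.
    """
    by_file: Dict[str, List[Setting]] = defaultdict(list)
    for s in settings:
        if s.sourcefile is not None:
            by_file[s.sourcefile].append(s)
    files = {}
    for path, items in by_file.items():
        lines = []
        for s in items:
            if s.comment:
                lines.append(f"# {s.comment}")
            lines.append(f"{s.name} = {s.value}")
        files[path] = "\n".join(lines) + "\n"
    return files
```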


While a bit outside of the part you're specifically aiming to improve 
here, if you could slip these two additions in I think it would be a boon 
to future writers of multi-server management tools as well.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Greg Smith

On Sun, 1 Jun 2008, Peter Eisentraut wrote:


Josh Berkus wrote:

1. Most people have no idea how to set these.

Could you clarify this?  I can't really believe that people are incapable of
editing a configuration file.


The big problem isn't the editing, it's knowing what to set the 
configuration values to.


This is not to say that editing a configuration file should be considered 
reasonable.  Any GUCS overhaul should include a design goal of being able 
to completely manage the configuration system using, say, pgadmin (the 
"manage settings via port access" part that Josh already mentioned). 
This is why I was suggesting additions aimed at assimilating all the 
things that are in the postgresql.conf file.


Joshua has been banging a drum for a while now that all this data needs to 
get pushed into the database itself.  The GUCS data is clearly structured 
like a database table.  Josh's suggested changes are basically adding all 
the columns needed to it in order to handle everything you'd want to do to 
the table.  If you think of it in those terms and make it possible to 
manipulate that data using the tools already available for updating 
tables, you'll open up the potential to add a whole new class of 
user-friendly applications for making configuration easier to manage.


However, I don't fully agree with taking that idea as far as Joshua has 
suggested (only having the config data in the database), because having 
everything in a simple text file that can be managed with SCM etc. has 
significant value.  It's nice to allow admins to be able to make simple 
changes with just a file edit.  It's nice that you can look at all the 
parameters in one place and browse them.  However, I do think that the 
internal database representation must be capable of holding everything in 
the original postgresql.conf file and producing an updated version of the 
file, either locally or remotely, as needed.



4. We don't seem to be getting any closer to autotuning.

True.  But how does your proposal address this?


The idea that Josh's suggestions are working toward is simplifying the 
construction of tools that operate on the server configuration file, so 
that it's easier to write an autotuning tool.  Right now, writing such a 
tool in a generic way gets so bogged down just in parsing/manipulating the 
postgresql.conf file that it's hard to focus on actually doing the tuning 
part.  If we go back to his original suggestion:

http://wiki.postgresql.org/wiki/GUCS_Overhaul
Add a script called pg_generate_conf to generate a postgresql.conf 
based on guc.c and command-line switches (rather than 
postgresql.conf.sample)


It's an easy jump from there to imagine a pg_generate_conf that provides a 
wizard interface to update a configuration file.  I foresee a little GUI 
or web app that connects to a server on port 5432, finds out some basic 
information about the server, and gives something like this:


Parameter             Current  Recommended  Change?
shared_buffers        32MB     1024MB       [X]
effective_cache_size  128MB    3000MB       [ ]
work_mem              1MB      16MB         [ ]

Josh has the actual brains behind such an app all planned out if you look 
at his presentations, but without the larger overhaul it's just not 
possible to make the implementation elegant.
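One small piece such a wizard cannot avoid is normalizing memory values before comparing them. This Python sketch parses the unit suffixes and computes how far each recommendation is from the current value. The rows are copied from the mockup table rather than computed. The sketch only recognizes explicit kB, MB and GB suffixes, and the function names are hypothetical.

```python
import re

# Memory units in postgresql.conf use powers of 1024 and are
# case-sensitive. Only these three are handled in this sketch.
_UNIT_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}


def parse_memory_kb(text: str) -> int:
    """Convert a value such as '32MB' to kB.

    Bare numbers are rejected: PostgreSQL reads them in each parameter's
    own base unit (8kB pages for shared_buffers, for example), which would
    need per-parameter knowledge this sketch does not have.
    """
    m = re.fullmatch(r"\s*(\d+)\s*(kB|MB|GB)\s*", text)
    if m is None:
        raise ValueError(f"unsupported memory value: {text!r}")
    return int(m.group(1)) * _UNIT_KB[m.group(2)]


def comparison_rows(rows):
    """Yield (parameter, current, recommended, factor) tuples."""
    for name, current, recommended in rows:
        factor = parse_memory_kb(recommended) / parse_memory_kb(current)
        yield name, current, recommended, factor


MOCKUP = [
    ("shared_buffers", "32MB", "1024MB"),
    ("effective_cache_size", "128MB", "3000MB"),
    ("work_mem", "1MB", "16MB"),
]

if __name__ == "__main__":
    print(f"{'Parameter':<22}{'Current':>9}{'Recommended':>13}{'Factor':>9}")
    for name, cur, rec, factor in comparison_rows(MOCKUP):
        print(f"{name:<22}{cur:>9}{rec:>13}{factor:>8.1f}x")
```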


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Greg Smith

On Sun, 1 Jun 2008, Joshua D. Drake wrote:

Well I don't know that a minimum of comments is what I am arguing as 
much as not too much comments.


Josh's proposal included making three levels of documentation-level 
comments available:  terse, normal, and verbose.  The verbose comment 
level probably should include a web link to full documentation.  The way 
the comments litter the existing file, the status quo that's called normal 
mode in this proposal, is IMHO a complete mess.  Most use cases I can 
think of want either no comments or really verbose ones; the limited 
commentary in the current sample postgresql.conf is a middle ground 
that's not right for anybody.


The key thing here in my mind is that it should be possible to 
transform between those three different verbosity levels without losing 
any settings or user-added comments.  They're really just different views 
on the same data, and which view you're seeing should be easy to change 
without touching the data.
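One way to implement "different views on the same data" is to store the system documentation and the user's own comments as separate fields, then generate the file at whatever verbosity level is requested. The Python sketch below works under that assumption. `ConfEntry`, `render`, and the `#!` marker that tags user comments are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

LEVELS = ("terse", "normal", "verbose")


@dataclass
class ConfEntry:
    name: str
    value: str
    short_doc: str                      # system text shown in normal mode
    long_doc: str                       # system text shown in verbose mode
    doc_url: Optional[str] = None       # link shown in verbose mode
    user_comment: Optional[str] = None  # shown at every level


def render(entries: List[ConfEntry], level: str) -> str:
    """Render entries as postgresql.conf text at the requested verbosity.

    Only system-supplied documentation varies with the level; settings
    and user comments are emitted unchanged in every view.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    lines = []
    for e in entries:
        if level == "normal":
            lines.append(f"# {e.short_doc}")
        elif level == "verbose":
            lines.append(f"# {e.long_doc}")
            if e.doc_url:
                lines.append(f"# See {e.doc_url}")
        if e.user_comment:
            lines.append(f"#! {e.user_comment}")
        lines.append(f"{e.name} = {e.value}")
    return "\n".join(lines) + "\n"
```

Because `render` never modifies the entries, switching levels back and forth cannot lose a setting or a user comment.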


I just extracted the original design proposal and some of the relevant 
follow-up in this thread, made some additional suggestions, and put the 
result at http://wiki.postgresql.org/wiki/GUCS_Overhaul I think reading 
that version makes it a bit clearer what the proposed overhaul is aiming 
for.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Greg Smith

On Mon, 2 Jun 2008, Jignesh K. Shah wrote:

Most people I have seen will increase one or few but not all parameters 
related to memory which can result in loss of performance and 
productivity in figuring out.


If it becomes easier to build a simple tool that helps people tune 
their configurations, that should help here without having to do anything 
more complicated than that.



What happened to AvailRAM setting and base all memory gucs on that.


Like some of the other GUC simplification ideas that show up sometimes 
(unifying all I/O and limiting background processes based on that total is 
another), this is hard to do internally.  Josh's proposal has a fair 
amount of work involved, but the code itself doesn't need to be clever or 
too intrusive.  Unifying all the memory settings would require being both 
clever and intrusive, and I doubt you'll find anybody who could pull it 
off who isn't already overtasked with more important improvements for the 
8.4 timeframe.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Greg Smith

On Mon, 2 Jun 2008, Tom Lane wrote:


Greg Smith [EMAIL PROTECTED] writes:

Joshua has been banging a drum for a while now that all this data needs to
get pushing into the database itself.


This is, very simply, not going to happen.


Right, there are also technical challenges in the way of that ideal.  I 
was only mentioning the reasons why it might not be the best idea even if 
it were feasible.  However, I do not see why the limitations you bring up 
must get in the way of thinking about how to interact with and manage the 
configuration data in a database context, even though it ultimately must 
be imported and exported to a flat file.


The concerns you bring up again about leaving the database in an 
unstartable state are a particularly real danger in the "only has access 
to 5432" hosted provider case that this redesign is trying to satisfy.  I 
added a "Gotchas" section to the wiki page so that this issue doesn't get 
forgotten about.  The standard way to handle this situation is to have a 
known good backup configuration floating around.  Adding something in that 
area may end up being a hard requirement before remote editing makes 
sense.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-03 Thread Greg Smith

On Tue, 3 Jun 2008, Paul van den Bogaard wrote:

So overhauling the GUC parameters is one step, but adding proper 
instrumentation in order to really measure the impact of the new setting 
is necessary too.


Correct, but completely off-topic regardless.  One problem to be solved 
here is to take PostgreSQL tuning from zero to, say, 50% automatic. 
Wander the user lists for a few months; the number of completely 
misconfigured systems out there is considerable, partly because the 
default values for many parameters are completely unreasonable for modern 
hardware and there's no easy way to improve on that without someone 
educating themselves.  Getting distracted by the requirements of the 
high-end systems will give you a problem you have no hope of executing in 
a reasonable time period.


By all means bring that up as a separate (and much, much larger) project: 
"Database Benchmarking and Sensitivity Analysis of Performance Tuning 
Parameters" would make a nice PhD project for somebody, and there's 
probably a good patent in there somewhere.  Even if you had such a tool, 
it wouldn't be usable by non-experts unless the mundane GUC generation 
issues are dealt with first, and that's where this is at right now.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-04 Thread Greg Smith

On Wed, 4 Jun 2008, Andreas Pflug wrote:

When reading this thread, I'm wondering if anybody ever saw a config file for 
a complex software product that was easily editable and understandable.


I would recommend Apache's httpd.conf as an example of something that's 
easy to edit and follow.  Like any complex product, the comments in the 
configuration file itself can't possibly be sufficient by themselves. 
But in general I've found Apache's config file to have enough comments to 
jog my memory when I'm editing it while not being overwhelming.  They 
provide enough detail that when I run into a setting I don't understand 
there's enough context provided that it's easy to search for more 
information.


Poking around with Google for a bit, here's a reasonable sample: 
http://webdav.org/goliath/dav_on_x/apache.conf



IMHO the best compromise in machine and human readability is an XML format.


If the primary PostgreSQL configuration file becomes XML I will quit 
working with the project.  I'm not kidding.  If you think XML is easy to 
generate, edit by hand, and use revision control on, we are at such an 
fundamental disagreement that I wouldn't even try and directly argue with 
you.  Instead I'll quote Eric Raymond:


The most serious problem with XML is that it doesn't play well with 
traditional Unix tools. Software that wants to read an XML format needs an 
XML parser; this means bulky, complicated programs. 
http://www.catb.org/esr/writings/taoup/html/ch05s02.html#id2907018


Let me suggest the following requirement instead which naturally rules it 
out:  it should be possible for a DBA-level coder to write a simple shell 
script that does something useful with the configuration file in order for 
a text-based configuration to be useful in this context.  To give a 
simple example, I can write a single line [sed|awk|perl] command that will 
let me update the value for one parameter in the current postgresql.conf 
file.  When you can give me a one-liner that does that on an XML file in 
any shell language in that class, then we might have something to talk 
about.
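For comparison with that one-liner test, here is roughly what the same update looks like written out as a Python function instead of a sed expression. This is a sketch with several assumptions. Parameters must be written as `name = value`, although postgresql.conf also allows the `=` to be omitted. Values must not contain `#`. `set_param` is a hypothetical name.

```python
import re


def set_param(conf_text: str, name: str, value: str) -> str:
    """Return conf_text with parameter `name` set to `value`.

    Every active `name = ...` line is updated (PostgreSQL uses the last
    one it reads) and any trailing comment is kept.  If the parameter only
    appears commented out, as in the sample file, the first such line is
    uncommented.  Otherwise a new line is appended.
    """
    key = re.escape(name)
    tail = r"[^#\n]*?([ \t]*(?:#.*)?)$"  # old value, then optional comment
    active = re.compile(rf"^([ \t]*{key}[ \t]*=[ \t]*){tail}", re.MULTILINE)
    text, n = active.subn(lambda m: m.group(1) + value + m.group(2), conf_text)
    if n:
        return text
    commented = re.compile(rf"^[ \t]*#[ \t]*({key}[ \t]*=[ \t]*){tail}",
                           re.MULTILINE)
    text, n = commented.subn(lambda m: m.group(1) + value + m.group(2),
                             conf_text, count=1)
    if n:
        return text
    sep = "" if not conf_text or conf_text.endswith("\n") else "\n"
    return f"{conf_text}{sep}{name} = {value}\n"
```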


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-04 Thread Greg Smith

On Wed, 4 Jun 2008, Tom Lane wrote:


The real problem we need to solve is how to allow newbies to have the
system auto-configured to something that more or less solves their
problems.  Putting the config settings in XML does not accomplish that,
and neither does putting them inside the database.


The subtle issue here is that what makes sense for the database 
configuration changes over time; there's not just one initial generation 
and you're done.  postgresql.conf files can end up moving from one machine 
to another for example.  I think something that doesn't recognize that 
reality and move toward a tune-up capability as well as initial 
generation wouldn't be as useful, and that's where putting the settings 
inside the database helps so much.


Also, there's a certain elegance to having an optimization tool that works 
against either a new installation or an existing one.  I personally have 
zero interest in a one-shot config generator.  It just doesn't solve the 
problems I see in the field.  Performance starts out just fine even with 
the default settings when people first start, and then goes to hell after 
the system has been running for a while (and possibly moved to another 
machine).  By that point nobody wants to mess with their configuration 
file unless it's one simple change at a time.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] rfc: add pg_dump options to dump output

2008-06-04 Thread Greg Smith

On Tue, 3 Jun 2008, Tom Lane wrote:


Well, the stuff included into the dump by pg_dump -v is informative,
too.  But we stopped doing that by default because of complaints.
I remain unconvinced that this proposal won't suffer the same fate.


I think it would be reasonable to only include the list of options used in 
the dump if you use one that changes what appears in the dump.  That way, 
you wouldn't see anything by default.  But if you make a modification that 
will likely break a diff with an existing dump done with the default 
parameters, the option change that introduced that should show at the very 
beginning.
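Here is a Python sketch of the rule proposed here. It compares the options actually used against the defaults and emits a header comment only when something differs. The option names and the function name are hypothetical.

```python
from typing import Dict


def dump_header(options: Dict[str, object], defaults: Dict[str, object]) -> str:
    """Return SQL comment lines naming non-default dump options.

    Returns an empty string when every option matches its default, so a
    default dump looks exactly as it does today.
    """
    changed = {k: v for k, v in options.items() if defaults.get(k) != v}
    if not changed:
        return ""
    lines = ["--", "-- Dumped with non-default options:"]
    for key in sorted(changed):
        lines.append(f"--   {key} = {changed[key]}")
    lines.append("--")
    return "\n".join(lines) + "\n"
```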


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-04 Thread Greg Smith

On Wed, 4 Jun 2008, Andrew Dunstan wrote:


Tom Lane wrote:

* Can we build a configuration wizard to tell newbies what settings
they need to tweak?


That would trump all the other suggestions conclusively. Anyone good at 
expert systems?


Sigh.  I guess we need to start over again.

Last year around this time, there was one of the recurring retreads of 
this topic named "PostgreSQL Configuration Tool for Dummies": 
http://archives.postgresql.org/pgsql-performance/2007-06/msg00386.php


Josh Berkus pointed out that he already had the expert system part of 
this problem solved pretty well with a spreadsheet:


http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc (that's in 
the OpenOffice Calc format if you don't know the extension)


That particular spreadsheet has more useful tuning suggestions in this 
area than 99.9% of PostgreSQL users have or will ever know.  You can 
nitpick the exact recommendations, but the actual logic and thinking 
involved is pretty well solved.  It could use a touch of tweaking and 
modernization but it's not too far off from being as good as you're likely 
to get at making guesses without asking the user too many questions. 
There is one ugly technical issue, that you can't increase shared_buffers 
usefully in many situations because of SHMMAX restrictions, and that issue 
will haunt any attempt to be completely automatic.
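Here is a Python sketch of how an automatic tool might have to clip its shared_buffers suggestion. It assumes the caller knows SHMMAX (on Linux this can be read from /proc/sys/kernel/shmmax). The caller must also supply its own estimate of the non-buffer shared memory overhead, because that depends on settings like max_connections. `capped_shared_buffers` is a hypothetical helper.

```python
from typing import Optional

BLOCK_SIZE = 8192  # default BLCKSZ; shared_buffers is counted in these pages


def capped_shared_buffers(desired_bytes: int, shmmax_bytes: int,
                          overhead_bytes: int) -> Optional[int]:
    """Largest shared_buffers, in pages, that is no bigger than desired
    and still fits under SHMMAX together with the given overhead.

    Returns None when even the overhead alone leaves no room.
    """
    room = shmmax_bytes - overhead_bytes
    if room < BLOCK_SIZE:
        return None
    return min(desired_bytes, room) // BLOCK_SIZE
```

When the result comes out below the desired size, a fully automatic tool can't just apply it. It has to tell the user to raise SHMMAX, which is exactly the step that breaks full automation.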


Where Josh got hung up, where I got hung up, where Lance Campbell stopped 
at with his Dummies tool, and what some unknown number of other people 
have been thwarted by, is that taking that knowledge and turning it into a 
tool useful to users is surprisingly difficult.  The reason for that is 
the current postgresql.conf file and how it maps internally to GUC 
information isn't particularly well suited to automated generation, 
analysis, or updates.  I think Josh got lost somewhere in the "parsing the 
file" stage.  The parts I personally got stuck on were distinguishing 
user-added comments from ones the system put in, plus being completely 
dissatisfied with how lossy the internal GUC process was (I would like a 
lot more information out of pg_settings than is currently there). 
Lance's helper tool was hobbled by the limitations of being a simple web 
application.


That's the background to Josh's proposal.  It has about an 80% overlap 
with what I was working on suggesting, which is why I jumped on his 
bandwagon so fast.  The outline at 
http://wiki.postgresql.org/wiki/GUCS_Overhaul includes the superset of our 
respective thinking on the first step here toward straightening out this 
mess, further expanded with observations made in this thread.


I would respectfully point out that comments about the actual tuning 
itself have no bearing whatsoever on this proposal.  This is trying to 
nail down all the features needed to support both doing an initial 
generation and subsequent incremental improvements to the postgresql.conf 
file, while also reducing some redundancy in the code itself.  Reducing 
the scope to only handling initial generation would make this a smaller 
task.  But it turns out that the work required to cut down on the 
redundancy and the work required to better support incremental updates 
are almost the same.  Josh's stated agenda is to get this 
right in one swoop, with only one version worth of disruption to the 
format, and that goal is served better IMHO as well by addressing all 
these changes as one batch.


I will attempt to resist further outbursts about non-productive comments 
here; each time I am tempted, I will instead work on prototyping the code 
I think this really needs.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-04 Thread Greg Smith

On Wed, 4 Jun 2008, Aidan Van Dyk wrote:


* Are backends always writing out dirty buffers because there are no free
 ones?  This might mean tweaking settings affecting bgwriter.


What you mean on the first one is "are backends always writing out dirty 
buffers because there are no *clean* ones?"; the server operates with no 
*free* buffers in normal operation.  Figuring that out is now easy in 
8.3 with the pg_stat_bgwriter view.



* Are the evicted buffers ones with really high usage counts?  This
 might mean an increase shared buffers would help?


Evicted buffers must have a 0 usage count.  The correct question to ask is 
"are buffers never getting high usage counts because they keep getting 
evicted too fast?"  You can look at that in 8.3 using pg_buffercache; 
I've got suggested queries as part of my buffer cache presentation at 
http://www.westnet.com/~gsmith/content/postgresql/
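Here is a Python sketch of summarizing that view. The query groups pg_buffercache by its usagecount column, and the helper converts the rows into the share of in-use buffers at each count. Unused buffers report a NULL usagecount, so they are left out. The helper name is hypothetical. Reading a skew toward low counts as evidence of premature eviction is an interpretation, not a rule.

```python
HISTOGRAM_SQL = """
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount
"""


def usage_distribution(rows):
    """rows: (usagecount, buffers) pairs, usagecount None for unused buffers.

    Returns {usagecount: fraction of in-use buffers at that count}.
    """
    in_use = [(u, n) for u, n in rows if u is not None]
    total = sum(n for _, n in in_use)
    if total == 0:
        return {}
    return {u: n / total for u, n in in_use}
```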



* Are we always spilling small amounts of data to disk for sorting?  A
 a small work_mem increase might help...


I was just talking to someone today about building a monitoring tool for 
this.  Not having a clear way to recommend that people monitor use of work_mem 
and its brother "spilled to disk sorts" is an issue right now; I'll whack 
that one myself if someone doesn't beat me to it before I get time.
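In the meantime, one rough approach is to set log_temp_files = 0, which logs every temporary file along with its size (available from 8.3), and then summarize the log. The Python sketch below assumes the message text looks like `temporary file: path "...", size N`, and that format should be checked against the server version's actual log output. The 16MB threshold for "small" and the function name are arbitrary.

```python
import re

TEMP_FILE_RE = re.compile(r'temporary file: path "([^"]+)", size (\d+)')


def summarize_temp_files(lines, small_bytes=16 * 1024 * 1024):
    """Count temp files in log lines, their total size, and how many
    were at or below `small_bytes` (candidates for a work_mem bump)."""
    sizes = []
    for line in lines:
        m = TEMP_FILE_RE.search(line)
        if m:
            sizes.append(int(m.group(2)))
    return {
        "files": len(sizes),
        "total_bytes": sum(sizes),
        "small_files": sum(1 for s in sizes if s <= small_bytes),
    }
```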



* Are all our reads from disk really quick?  This probably means OS
 pagecache has our whole DB, and means random_page_cost could be
 tweaked?


This is hard to do with low overhead in an OS-independent way.  The best 
solution available now would use dtrace to try and nail it down.  There's 
movement in this area (systemtap for Linux, recent discussion at the PGCon 
Developer Meeting of possibly needing more platform-specific code) but 
it's not quite there yet.


So everything you mentioned is either recently added/documented or being 
actively worked on somewhere, and the first two were things I worked on 
myself after noticing they were missing.  Believe me, I feel the items 
that still aren't there, but they're moving along at their own pace. 
There's already more tuning knowledge available than tools to help apply 
that knowledge to other people's systems, which is why I think a diversion 
to focus just on that part is so necessary.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-04 Thread Greg Smith
, so that it's easy to figure out what 
people changed.  That is sometimes handy to include as part of this sort 
of analysis, and it's necessary to provide improvements like a "strip the 
unnecessary junk out of this file" that many people would like from this 
sort of tool.


When you show people that you recommend increasing a value to something 
larger, any comments about that setting will be shown and they'll know not 
to follow the tool's advice if there's a history there.


This seems like such a better place to be that I'd rather drive toward the 
server-side changes necessary to support it rather than fight the 
difficult tool creation problems.  That's why the focus on a new API for 
'writing my config' for me; that particular goal is just one part of a 
set of revisions that streamline the tool creation process in a not 
necessarily obvious way.  Unless, of course, you've tried to write a 
full-circle config tuning tool, in which case most of the proposed changes 
in this overhaul jump right out at you.


[1] In the shared_buffers case, it may be possible to just recommend a 
value without caring one bit what the current one is.  But for work_mem, 
you really need to actually understand the value if you want any real 
intelligence that combines that information with the maximum connections, 
so that you can compute how much memory is left over for things like 
effective_cache_size.
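The arithmetic that footnote points at can be sketched as below. It is deliberately crude and rests on two assumptions. First, it charges one work_mem allocation per connection, even though a single complex query can use several. Second, `os_reserve_mb` is a made-up allowance for the OS and everything else. All values are in MB, and the function name is hypothetical.

```python
def memory_left_for_cache(total_ram_mb: int, shared_buffers_mb: int,
                          work_mem_mb: int, max_connections: int,
                          os_reserve_mb: int) -> int:
    """Crude estimate of RAM left for the OS page cache, i.e. roughly
    what effective_cache_size is meant to describe."""
    committed = (shared_buffers_mb
                 + max_connections * work_mem_mb  # one allocation each
                 + os_reserve_mb)
    return max(total_ram_mb - committed, 0)
```

This is why work_mem can't be recommended in isolation: raising it from 1MB to 16MB with 100 connections on a 4GB machine moves the estimate from 2716MB to 1216MB.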


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-05 Thread Greg Smith

On Thu, 5 Jun 2008, Heikki Linnakangas wrote:

A configuration wizard would be nice, but it would be a good start to add a 
section to the manual on how to do the basic tuning. AFAICS we don't have 
one. Clear instructions on how to set the few most important settings like 
shared_buffers and checkpoint_timeout/segments would probably be enough, with 
a link to the main configuration section that explains the rest of the 
settings.


It hasn't gelled yet but I'm working on that.  Most of the text needed is 
now linked to at http://wiki.postgresql.org/wiki/Performance_Optimization


I already talked with Chris Browne about merging his document I put first 
in that list with useful pieces from some of mine into one more 
comprehensive document on the Wiki, covering everything you mention here. 
If we took a snapshot of that when it's done and dumped that into the 
manual, I don't think that would be a problem to wrap up before 8.4 is 
done.  I'd like to include a link to the above performance page in that 
section of the manual as well, both so that people are more likely to find 
fresh content as well as to give them pointers toward more resources than 
the manual can possibly cover.


If people don't read the manual, we can add a link to it from 
postgresql.conf.sample, add a screen to the Windows installer suggesting 
to read it, or even open postgresql.conf in Notepad.


They don't.  Putting pointers toward a relatively simple performance 
tuning document a bit more in people's faces might help lower some of the 
criticism the project takes over providing low defaults for so many 
things.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-05 Thread Greg Smith

On Thu, 5 Jun 2008, Magnus Hagander wrote:

We really need a proper API for it, and the stuff in pgAdmin isn't 
even enough to base one on.


I would be curious to hear your opinion on whether the GUC overhaul 
discussed in this thread is a useful precursor to building such a proper 
API.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-05 Thread Greg Smith

On Thu, 5 Jun 2008, Alvaro Herrera wrote:


I must say that I am confused by this thread.  What's the discussed GUC
overhaul?


http://wiki.postgresql.org/wiki/GUCS_Overhaul

I drop that URL in every other message in hopes that people might start 
commenting on it directly if they see it enough; the fact that you're 
confused says I may need to keep that up :(



(1) Add a lot more comments to each setting
(2) Add documentation links to each setting
(3) Move more frequently used settings to the top of the file
(4) Ship different sample config files
(5) Create an expert system to suggest tuning
(6) Other random ideas (XML, settings in database, others?)

To me, there are two ideas that are doable right now, which are (2) and
(4).  (1) seems to be a step backwards in pg_hba.conf experience, and we
would have to maintain duplicate documentation.  (3) seems messy.  (5)
is a lot of work; do we have volunteers?  As for (6), the two examples I
give can be easily dismissed.
(2) and (4) do not seem necessary to get the config API built.


(1) is in that proposal but is strictly optional as something to put in 
the configuration file itself.  The idea behind (2) is to give tool 
authors an easier way to suggest where to head for more 
information.  I'd like for it to be trivial for a tool to say "Suggested 
value for x is y; see 
http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html 
for more information."  I know what most of the settings I tinker with do, 
but even I'd like it to be easier to find the right spot in the manual; 
for newbies it's vital.  You are correct that (2) isn't strictly necessary 
here, but it's valuable and will be easier to wrap into this than to bolt 
on later.


(3), (4), (5), and (6) were off-topic diversions.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-05 Thread Greg Smith

On Thu, 5 Jun 2008, Aidan Van Dyk wrote:

People like me don't want to have postgresql.conf be *only* a 
machine-generated file, which I am not allowed to edit anymore because 
next DBA doing a SET PERSISTANT type of command is going to cause 
postgres to write out something else, over-writing my carefully 
documented reason for some particular setting.


This is why there's the emphasis on preserving comments as they pass into 
the GUC structure and back to an output file.  This is one of the 
implementation details I haven't fully made up my mind on:  how to clearly 
label user comments in the postgresql.conf to distinguish them from 
verbose ones added to the file.  I have no intention of letting manual 
user edits go away; what I'm trying to do here (and this part is much more 
me than Josh) is make them more uniform such that they can co-exist with 
machine edits without either stomping on the other.  Right now doing that 
is difficult, because it's impossible to tell the default comments from 
the ones the users added, and the current comment structure bleeds onto the 
same lines as the settings.



But the big issue I have (not that it really matters, because I'm not
one of the ones working on it, so I please don't take this as me telling
anyone what they can or can't do) is that that goal doesn't solve any of
the listed problems stated in the proposal
  1.  Most people have no idea how to set these.


Making it much easier to build recommendation tools is how this helps 
here.



  2. The current postgresql.conf file is a huge mess of 194 options,
 the vast majority of which most users will never touch.


The proposed pg_generate_conf tool includes options to spit out a basic 
configuration file instead of the complete one.



  3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
 and settings.sgml), which are only synched with each other manually.


The proposal throws away having a separate postgresql.conf file, so that 
reduces it from 3 places to 2.  That's moving in the right direction.



  4. We don't seem to be getting any closer to autotuning.


If you try to build a tuning tool, these areas end up being the 
unnecessarily hard parts.


Thanks for the comments on the proposal.  I'm only bothering to respond to 
messages like yours now; I'm deleting all of the continuing attempts to 
divert the discussion over to parameter tuning details or expanding the 
scope here.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-05 Thread Greg Smith

On Thu, 5 Jun 2008, Alvaro Herrera wrote:


FWIW smb.conf uses ; for one purpose and # for the other.


They're actually combining the way UNIX files use # with how Windows INI 
files use ; in a config file context, which I personally find a little 
weird.


I was already considering keeping user comments as # while making all 
system-inserted ones #!.  Many people are already used to #! having a 
special system-related meaning from its use in UNIX shell scripting, which 
makes it easier to remember.
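To make the #/#! split concrete, here is a minimal sketch of how a tool could tell the two kinds of comment apart, assuming the convention proposed above (lines starting with #! are system-inserted, any other # line belongs to the user). The function names are hypothetical; nothing like this exists in the tree.

```python
def classify_line(line):
    """Classify one postgresql.conf line under the proposed #! convention.

    Returns "system", "user", "setting", or "blank".
    """
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("#!"):
        return "system"   # tool-generated: safe for a tool to regenerate
    if stripped.startswith("#"):
        return "user"     # hand-written: a tool must preserve it verbatim
    return "setting"


def strip_system_comments(text):
    """Drop only tool-inserted comments, keeping user comments and settings."""
    kept = [ln for ln in text.splitlines() if classify_line(ln) != "system"]
    return "\n".join(kept)


sample = """#! shared_buffers: memory used for caching data pages
# raised after the 2008-05 benchmark (gsmith)
shared_buffers = 1024MB
"""
print(strip_system_comments(sample))
```

A tool regenerating its own commentary would strip the #! lines and write fresh ones, leaving everything the user typed untouched.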


I think the next step to this whole plan is to generate a next-gen 
postgresql.conf mock-up showing what each of the outputs from the 
pg_generate_conf tool might look like to get feedback on that; it will 
make what is planned here a bit easier to understand as well.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-06 Thread Greg Smith

On Fri, 6 Jun 2008, Peter Eisentraut wrote:


- What settings do newbies (or anyone else) typically need to change?
Please post a list.
- What values would you set those settings to?  Please provide a description
for arriving at a value, which can later be transformed into code.  Note that
in some cases, not even the documentation provides more than handwaving help.


Josh's spreadsheet at 
http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc provides 
five different models for setting the most critical parameters based on 
different types of workloads.  Everyone can quibble over the fine tuning, 
but having a good starter set of reasonable settings for these parameters 
is a solved problem.  It's just painful to build a tool to apply the 
available expert knowledge that is already around.



- If we know better values, why don't we set them by default?


Because there's not enough information available; the large differences 
between how you tune for different workloads are one example.  Another is 
that people tune for peak and projected activity rather than just what's 
happening right now.  Every model suggested for a tuning wizard recognizes 
you need to ask some set of questions to nail things down.  I continue to 
repeat, in broken-record style, that exactly what a tuning tool will ask 
about and what settings it will suggest is not important; getting into that 
is an entirely different discussion (one that gets hashed out every single 
day on pgsql-performance).  The fact that writing such a tool is harder 
than it should be is the issue here.


Another orthogonal stumbling block on the way to making all of this 
automatic is that the surely criticial shared_buffers setting will in 
any useful configuration require messing around with kernel settings 
that no PostgreSQL tool can really help with.


Yes.  So?  All you can do is point this out to users.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-06 Thread Greg Smith

On Fri, 6 Jun 2008, Heikki Linnakangas wrote:

Or perhaps we should explicitly mark the settings the tool has generated, and 
comment out:


#shared_buffers = 32MB   # commented out by wizard on 2008-06-05
shared_buffers = 1024MB  # automatically set by wizard on 2008-06-05


What I would like to do is make the tool spit out a revision history in 
the same way I find all big IT shops handling this already:  by putting 
a revision history style commentary above the current setting.  Here's a 
sample:


# 2008-03-02 : 32MB : postgres : Database default
# 2008-05-02 : 512MB : pg_autotune : Wizard update
# 2008-05-15 : 1024MB : gsmith : Increased after benchmark tests
shared_buffers = 1024MB

If the first tuning tool that comes into existence used this format, and 
the format was reasonable, I think it would be possible to get people 
making manual edits to adopt it as well.
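A rough sketch of a tool writing that revision-history format, assuming the exact layout in the sample above ("# date : value : who : note" lines directly above the active "name = value" line). The function is hypothetical, and pg_autotune is only the placeholder tool name from the sample.

```python
import datetime


def record_change(lines, name, new_value, who, note, when=None):
    """Return config lines with `name` set to `new_value`, appending a
    "# date : value : who : note" history entry above the active setting."""
    when = when or datetime.date.today().isoformat()
    entry = f"# {when} : {new_value} : {who} : {note}"
    out = []
    found = False
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if not line.lstrip().startswith("#") and key == name:
            out.append(entry)                 # newest history entry goes last
            out.append(f"{name} = {new_value}")
            found = True
        else:
            out.append(line)                  # older history and other lines kept
    if not found:
        out.extend([entry, f"{name} = {new_value}"])
    return out


conf = ["# 2008-03-02 : 32MB : postgres : Database default",
        "shared_buffers = 32MB"]
updated = record_change(conf, "shared_buffers", "512MB", "pg_autotune",
                        "Wizard update", when="2008-05-02")
print("\n".join(updated))
```

Because old entries are never rewritten, a manual editor and a tool can both append to the same history without losing each other's notes.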


The exact details of how this should look are off-topic for the main 
discussion here, though, so I'd prefer if this whole line of discussion 
died off.  Anyone who wants to comment on this whole area should feel free 
to contact me off-list or edit the Wiki page (which has a section on this 
topic now) to hash out suggestions; I'm trying to keep this thread 
somewhat focused now.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-06 Thread Greg Smith

On Fri, 6 Jun 2008, Tom Lane wrote:


I grow weary of this thread.


If we keep it up for, oh, another three years, then maybe you'll be as 
weary as I am of struggling with problems in this area.  Striking a 
balance between the wants and needs of people who want a fancy GUI tool 
for configuring database settings and those who want to edit things 
manually is a difficult problem that is not going away.  If this didn't 
keep coming back to haunt me all the time, I'd like to forget about it 
myself.


I will say it once more: I do not believe for one instant that the 
current formatting of postgresql.conf is the major impediment, or even a 
noticeable impediment, to producing a useful configuration wizard.


Arguments about formatting changes to postgresql.conf are a tangent to the 
central questions here, and having just closed some open comments on that, 
I am with you on ignoring those as off-topic, the same way I keep 
minimizing "what are the parameters to tune?" comments.


Here are the relevant questions, around since the first message, that are 
not attracting discussion:


1) Is it worthwhile to expand the information stored in the GUC structure 
to make it better capable of supporting machine generation and to provide 
more information for tool authors via pg_settings?  The exact fields that 
should or shouldn't be included remain controversial; consider default 
value, per-session/runtime/restart, and enum lists as the list of 
things that are most needed there.


2) Should the sample postgresql.conf file be replaced by a program that 
generates it using that beefed up structure instead, therefore removing 
one file that has to be manually kept in sync with the rest of the code 
base right now?
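For question 2, a minimal sketch of what generating the sample file from structured metadata could look like. GucInfo, GUCS and generate_conf are hypothetical stand-ins for whatever guc.c would actually export, the three entries are illustrative, and the basic_only switch mirrors the proposed "basic file versus complete file" option of pg_generate_conf.

```python
from dataclasses import dataclass


@dataclass
class GucInfo:
    name: str
    default: str
    context: str       # e.g. "postmaster" (needs restart) or "user" (per-session)
    description: str
    basic: bool        # include in the short "basic" config file?


# Hypothetical, illustrative stand-in for the metadata that would live in guc.c.
GUCS = [
    GucInfo("shared_buffers", "32MB", "postmaster",
            "Memory used for caching data pages", True),
    GucInfo("work_mem", "1MB", "user",
            "Memory per sort or hash operation", True),
    GucInfo("vacuum_cost_delay", "0", "user",
            "Cost-based vacuum delay in milliseconds", False),
]


def generate_conf(gucs, basic_only=False):
    """Emit a sample config with every setting commented out at its default."""
    out = []
    for g in gucs:
        if basic_only and not g.basic:
            continue
        out.append(f"# {g.description} (context: {g.context})")
        out.append(f"#{g.name} = {g.default}")
        out.append("")
    return "\n".join(out)


print(generate_conf(GUCS, basic_only=True))
```

The point of the design is that the file becomes an output of the same table the server uses, so there is nothing left to keep in sync by hand.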


3) What now makes sense for a way to update database parameters for users 
whose primary (or only in some cases) access to the server is over the 
database port, given the other changes have improved automatic config file 
generation?


If you wish to prove otherwise, provide a complete wizard except for the 
parts that touch the config file, and I will promise to finish it.


You do realize that if I provided you with such a sample, the "not 
implemented yet" config API stubs it needs to work would be exactly what 
the proposal page suggests adding, right?  I (and Josh) didn't just make 
them all up out of nowhere, you know.  I already wrote a message here 
about the seemingly inevitable path the budding wizard-tool hacker 
follows, and why that leads to some of the changes suggested.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-06 Thread Greg Smith

On Fri, 6 Jun 2008, Gregory Stark wrote:


Greg Smith [EMAIL PROTECTED] writes:


1) Is it worthwhile to expand the information stored in the GUC structure to
make it better capable of supporting machine generation and to provide more
information for tool authors via pg_settings?  The exact fields that should or
shouldn't be included remains controversial; consider default value,
per-session/runtime/restart, and enum lists as the list of things that are
most needed there.


Isn't that a list of what's *already* there?


I should have been clearer there.  Some of the items suggested are already 
in the structure, but aren't visible via pg_settings. In those cases it's 
just exporting information that's already there.  In others (like the 
suggestion to add a URL to the documentation) it is actually a new field 
being added as well as its corresponding entry in the settings view.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-06 Thread Greg Smith

On Fri, 6 Jun 2008, Tom Lane wrote:


Well, you can't see the default or reset values in pg_settings, only the
current value.  However, I fail to see the use of either of those for
a configure wizard.


I'm under the impression that the primary reason to put the default in 
there is to make it easier for a file generator program to be decoupled a 
bit from the internal representation.  Regardless, these values should be 
exposed for tool writers.  If you build a prototype interface for an 
interactive settings changing tool, you quickly discover that showing the 
default, range, and recommended setting are all valuable things people 
would like to see when deciding what to change a setting to.  And there's 
no reason accumulating all that info should be the responsibility of a 
tool writer when it's easy to expose and keep up to date inside the 
database itself.
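A small sketch of the prototype-tool side of this, assuming the server exposed current value, default, range and a documentation URL for each setting. The SettingInfo record and both functions are hypothetical, and the work_mem numbers are illustrative rather than read from a real server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SettingInfo:
    name: str
    current: int
    default: int
    min_val: int
    max_val: int
    unit: str
    doc_url: str
    recommended: Optional[int] = None   # computed by the tool, not the server


def describe(s: SettingInfo) -> str:
    """One line with the facts a settings tool wants to show together."""
    parts = [
        f"{s.name}: current {s.current}{s.unit}",
        f"default {s.default}{s.unit}",
        f"range {s.min_val}..{s.max_val}",
    ]
    if s.recommended is not None:
        parts.append(f"suggested {s.recommended}{s.unit}")
    return "; ".join(parts) + f" (see {s.doc_url})"


def validate(s: SettingInfo, proposed: int) -> int:
    """Refuse a proposed value outside the allowed range before writing it."""
    if not s.min_val <= proposed <= s.max_val:
        raise ValueError(f"{s.name}={proposed} outside {s.min_val}..{s.max_val}")
    return proposed


# Illustrative numbers only; a real tool would read them from the server.
wm = SettingInfo("work_mem", 1024, 1024, 64, 2097151, "kB",
                 "http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html",
                 recommended=8192)
print(describe(wm))
```

Everything in SettingInfo except `recommended` is information the database already knows, which is the argument for exposing it there instead of making each tool author collect it.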


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Greg Smith

On Mon, 9 Jun 2008, Tom Lane wrote:


It should also be pointed out that the whole thing becomes uninteresting
if we get real-time log shipping implemented.  So I see absolutely no
point in spending time integrating pg_clearxlogtail now.


There are remote replication scenarios over a WAN (mainly aimed at 
disaster recovery) that want to keep a fairly updated database without 
putting too much traffic over the link.  People in that category really 
want zeroed tail+compressed archives, but probably not the extra overhead 
that comes with shipping smaller packets in a real-time implementation.
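A quick way to see why zeroed tails matter for WAN archiving: compare how well a mostly unused segment compresses with a zeroed tail versus a tail full of leftover bytes. This is a simulation, not pg_clearxlogtail itself. Random bytes stand in for both the real records and the stale contents of a recycled segment (a pessimistic assumption, since real WAL compresses somewhat), and the segment is scaled down from the default 16MB so it runs quickly.

```python
import os
import zlib


def fake_segment(size, used, zero_tail):
    """Fake WAL segment: `used` bytes of new data followed by a tail that is
    either zeroed (what a tail-clearing tool leaves) or stale leftover bytes."""
    tail = bytes(size - used) if zero_tail else os.urandom(size - used)
    return os.urandom(used) + tail


def compressed_size(data):
    return len(zlib.compress(data, 6))


size = 1024 * 1024        # scaled down from the default 16MB segment size
used = 64 * 1024          # segment switched early, so mostly unused
stale = compressed_size(fake_segment(size, used, zero_tail=False))
zeroed = compressed_size(fake_segment(size, used, zero_tail=True))
print(f"stale tail: {stale} bytes, zeroed tail: {zeroed} bytes")
```

The compressed size with a zeroed tail tracks the amount of real data in the segment rather than the segment size, which is what a bandwidth-limited disaster-recovery link cares about.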


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] How to Sponsor a Feature

2008-06-11 Thread Greg Smith

On Wed, 11 Jun 2008, Andrew Dunstan wrote:

If we want to help people to sponsor features, then I think we need to 
deal with subjects like finding someone to undertake the development, 
the sponsor's relationship with the developer, methods and times of 
payment, etc.


The bit on the wiki is helpful for developers trying to get a new feature 
implemented but I think that's where its scope ends.


There seems to be the occasional person wandering by here whom it really 
doesn't help, though.  Periodically you'll see "I want feature $X in 
PostgreSQL.  I'm willing to help fund it.  What do I do?"  In most of 
those that have wandered by recently, $X is a known feature any number of 
other people want.  Good sample cases here are recent requests to help 
fund or implement materialized views, supporting queries on read-only 
slaves, and SQL window support.


I don't think these people need guidance on how to manage the project; 
they need some sort of way to feel comfortable saying "I will pledge $Y for 
feature $X" in a way that makes sense on both sides.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-11 Thread Greg Smith

On Wed, 11 Jun 2008, Tom Lane wrote:


Who said anything about loops?  What I am talking about is what happens
during
set memory_usage = X;  // implicitly sets work_mem = X/100, say
set work_mem = Y;
set memory_usage = Z;
What is work_mem now, and what's your excuse for saying so, and how
will you document the behavior so that users can understand it?
(Just to make things interesting, assume that some of the above SETs
happen via changing postgresql.conf rather than directly.)


People are already exposed to issues in this area via things like the 
include file mechanism.  You can think of that two ways.  You can say, 
"there are already problems like this, so who cares if there's another 
one?"  Or you can say, "let's not add even more confusion like that."


Having a mini programming language for setting parameters is interesting 
and all, and it might be enough to do a good job of handling the basic 
newbie setup chores.  But I don't think it's a complete solution and 
therefore I find moving in that direction a bit of a distraction; your 
concerns about ambiguity just amplify that feeling.  It's unlikely that it 
will get powerful enough to enable the one true config file that just 
works for everybody.  There are too many things that depend a bit on both 
the data access pattern and the overall database size/structure, no matter 
what you do.
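To show why Tom's memory_usage/work_mem sequence needs a documented rule, here is a toy model that picks one possible answer: an explicit SET of a derived parameter pins it, so later changes to the umbrella setting stop moving it. The memory_usage parameter is hypothetical (it comes from Tom's example), this is not a proposal for the real semantics, and it ignores the postgresql.conf-versus-SET source question entirely.

```python
class Settings:
    """Toy model of Tom's example: memory_usage implicitly sets
    work_mem = memory_usage / 100.  Rule chosen here (one of several
    possible): an explicit SET of work_mem pins it."""

    def __init__(self):
        self.explicit = {}         # values the user set directly
        self.memory_usage = None   # the hypothetical umbrella setting

    def set(self, name, value):
        if name == "memory_usage":
            self.memory_usage = value
        else:
            self.explicit[name] = value

    def get(self, name):
        if name == "memory_usage":
            return self.memory_usage
        if name in self.explicit:
            return self.explicit[name]
        if name == "work_mem" and self.memory_usage is not None:
            return self.memory_usage // 100   # derived value
        raise KeyError(name)


s = Settings()
s.set("memory_usage", 100000)   # work_mem now derives to 1000
s.set("work_mem", 4096)         # explicit value pins work_mem
s.set("memory_usage", 200000)
print(s.get("work_mem"))        # 4096 here; "last SET wins" would give 2000
```

Either rule is implementable; the difficulty Tom raises is that users have to be able to predict which one applies, especially once some of the SETs arrive through the config file instead.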


[If only there were some technology that did workload profiling and set 
the server parameters based on that.  Some sort of dynamic tuning tool; 
wouldn't that be great?  Oh well, that's just a dream right now I guess.]


I'm not sure if I've stated this explicitly yet, but I personally have no 
interest in just solving the newbie problem.  I want a tool to help out 
tuning medium to large installs, and generating a simple config file is 
absolutely something that should come out of that as a bonus.  Anything 
that just targets the simple installs, though, I'm not very motivated to 
chase after.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Overhauling GUCS

2008-06-12 Thread Greg Smith

On Thu, 12 Jun 2008, Bruce Momjian wrote:


I am thinking a web-based wizard would make the most sense.


I have not a single customer I work with who could use an external 
web-based wizard.  Way too many companies have privacy policy restrictions 
that nobody dares cross by giving out any info about their server, or 
sometimes that they're even using PostgreSQL inside the firewall.  If it's 
not a tool that you can run on the same server you're running PostgreSQL 
on, I'd consider that another diversion that's not worth pursuing.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] How to Sponsor a Feature

2008-06-12 Thread Greg Smith

On Thu, 12 Jun 2008, Alvaro Herrera wrote:


Incidentally, we have minutes from the meeting.  Is it OK to publish
them openly?


There's a set of minutes already up at 
http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting


There was no solution proposed to the escrow problem, nor to allow 
sponsoring of one feature by multiple independent individuals.


Pity, as those are the main things I get asked about.  I've been thinking 
about this a fair amount recently, and it is difficult to figure out how 
SPI can handle this in a reasonable way.  It almost has to keep a hands-off 
approach, but the central organizers here are where people would think 
they should come for advice in this area.


The best approach I've thought of is to have something like 
http://www.postgresql.org/support/professional_support, except that this 
would instead be a catalog of companies and/or associated worker bees who 
have successfully had submissions committed.  Then the only interaction 
SPI/Core would have is to confirm that the claims people were making about 
what patches they were involved in were factual, which should be easy 
enough to verify just with the release notes, while disclaiming any role 
in contracting with said companies/individuals.  This implements a 
meritocracy suggesting who people might work with by noting what areas 
they've worked in successfully before.


For example, the last time I fielded one of these, the person I was 
advising wanted some PITR work done.  I of course pointed them toward 
2ndquadrant because everything they asked about was in code Simon wrote in 
the first place, and some pointers over to the release notes were 
sufficient to prove that was true.


As for a format, I was thinking the directory would be organized like 
this:


Company
  Person A
    8.3 features involved in
    8.2 features
  Person B
    8.2 features
    ...
  Current/future projects
    8.4 add feature
    Eventually add feature

Nothing new, really, I'm just suggesting an alternate view on the data 
that's available if you know how to look for it, structured in a way that 
would make it easier for potential sponsors to navigate.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] How to Sponsor a Feature

2008-06-15 Thread Greg Smith

On Mon, 16 Jun 2008, Peter Eisentraut wrote:


Joshua D. Drake wrote:

In reality though, what should happen is we should have a list of
companies and consultants that are willing to be paid to implement
features, todos and bug fixes.


I think the professional support company listing is already that list.


It is a much larger superset of that list.  There's a lot of entries there 
that provide support in various ways, but not core code customizations. 
You cannot expect anyone not already involved in the community to have any 
idea which of those companies have any track record of getting new 
features implemented.


Maybe all that's needed is to extend the provides section there with a 
tag for those who are willing to take that sort of work on.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS]

2008-06-26 Thread Greg Smith

On Wed, 25 Jun 2008, yuan fang wrote:

i am studying the source code of postgresql and want to become a 
developer of it.What should i do?


1) If you send e-mail to pgsql-hackers, include a useful subject

2) Read the intros at http://www.postgresql.org/developer/

3) For browsing the code itself, I like
http://doxygen.postgresql.org/

4) Notes on how to deal with version control issues, patch submission, and 
to find out what development is going on currently are all at 
http://wiki.postgresql.org/wiki/Development_information


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Location for pgstat.stat

2008-07-01 Thread Greg Smith

On Tue, 1 Jul 2008, Tom Lane wrote:


Magnus Hagander [EMAIL PROTECTED] writes:

Tom Lane wrote:

Hmm ... that would almost certainly result in the stats being lost over
a system shutdown.  How much do we care?



Only for those who put it on a ramdrive. The default, unless you
move/sync it off, would still be the same as it is today. While not
perfect, the performance difference of going to a ramdrive might easily
be enough to offset that in some cases, I think.


Well, what I was wondering about is whether it'd be worth adding logic
to copy the file to/from a safer location at startup/shutdown.


Anyone who needs fast stats storage badly enough that they're going to symlink 
it to RAM should be perfectly capable of scripting server startup/shutdown 
to shuffle that to/from a more permanent location.  Compared to the admin 
chores you're likely to encounter before reaching that scale it's a pretty 
easy job, and it's not like losing that data file is a giant loss in any 
case.  The only thing I could see putting into the server code to help 
support this situation is rejecting an old stats file and starting from 
scratch instead if they restored a previous version after a crash that 
didn't save an updated copy.
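A sketch of the startup/shutdown shuffling an admin might script, including the "reject an old copy after a crash" rule described above, done here with a marker file. The paths and the .clean marker are hypothetical, and these helpers would be called from your own start/stop scripts; PostgreSQL does not call them.

```python
import os
import shutil
import tempfile


def save_stats(ram_path, disk_path):
    """After a clean shutdown: copy the stats file off the RAM disk and
    mark the saved copy as current."""
    shutil.copy2(ram_path, disk_path)
    open(disk_path + ".clean", "w").close()


def restore_stats(ram_path, disk_path):
    """Before startup: restore the saved copy only if it came from a clean
    shutdown.  The marker is consumed, so after a crash (no save_stats run)
    the old copy is ignored and the server starts with no stats file."""
    marker = disk_path + ".clean"
    if os.path.exists(marker) and os.path.exists(disk_path):
        shutil.copy2(disk_path, ram_path)
        os.remove(marker)
        return True
    return False


# Demo using temporary paths in place of a real RAM disk and data directory.
workdir = tempfile.mkdtemp()
ram = os.path.join(workdir, "ramdisk_pgstat.stat")
disk = os.path.join(workdir, "saved_pgstat.stat")
with open(ram, "wb") as f:
    f.write(b"stats")
save_stats(ram, disk)
os.remove(ram)                    # reboot wipes the RAM disk
first = restore_stats(ram, disk)  # clean copy restored
os.remove(ram)                    # crash plus reboot, save_stats never ran
second = restore_stats(ram, disk) # stale copy rejected
```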


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] posix advises ...

2008-07-14 Thread Greg Smith

On Sat, 12 Jul 2008, Abhijit Menon-Sen wrote:


At 2008-07-12 00:52:42 +0100, [EMAIL PROTECTED] wrote:


The later versions of mine had a GUC named effective_spindle_count
which I think is nicely abstracted away from the implementation
details.


Yes, that does sound much better. (The patch I read had a
preread_pages_bitmapscan variable instead.)


This patch does need a bit of general care in a couple of areas.  The 
reviewing game plan I'm working through goes like this:


1) Update the original fadvise test program Greg Stark wrote to be a bit 
easier to use for testing general compatibility of this approach.  I want 
to collect some data from at least two Linux and Solaris systems with 
different disk setups.


2) Check out effective_spindle_count and see if it looks like a reasonable 
way to tune this feature.  If so, will probably need to merge that in to 
Zoltan's version of the patch.  May need some other cleanup in that patch 
set as well--I'm not sure that closed XLOG patch that got pushed into here 
as well is really helpful for example.


3) Generate a sequential scan test program aimed to hobble the Linux 
kernel in the way Zoltan described as motivation for his work.  I'm 
working with Jeff Davis this week to try and repurpose some of his 
synchronized scan test programs to handle this while we're both in the same 
place for a bit.


4) Generate a bitmap scan test program to check the original patch.

5) If the performance results look useful and consistent, then move toward 
cleaning up broader compatibility issues like the segfault concerns Zoltan 
mentioned.


Going to take a while to work through all that, but performance patches 
with platform-specific benefit are always painful like this.
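For readers who haven't seen the fadvise approach, a minimal sketch of the idea being tested: before each read, tell the kernel about the next few blocks with POSIX_FADV_WILLNEED so several disks can be kept busy at once. This is an independent illustration, not Greg Stark's test program or either patch. prefetch_depth is a hypothetical knob standing in for the role effective_spindle_count would play, and the advice is skipped where os.posix_fadvise is unavailable.

```python
import os
import tempfile

BLOCK = 8192


def read_blocks(path, block_numbers, prefetch_depth=4):
    """Read blocks in the given order, first issuing POSIX_FADV_WILLNEED for
    up to `prefetch_depth` upcoming blocks so the kernel can fetch them
    concurrently."""
    can_advise = hasattr(os, "posix_fadvise")   # missing on some platforms
    out = []
    advised = 0   # index of the first block not yet advised
    fd = os.open(path, os.O_RDONLY)
    try:
        for i, blk in enumerate(block_numbers):
            if can_advise:
                # Keep the advice window running prefetch_depth blocks ahead.
                limit = min(len(block_numbers), i + 1 + prefetch_depth)
                while advised < limit:
                    os.posix_fadvise(fd, block_numbers[advised] * BLOCK, BLOCK,
                                     os.POSIX_FADV_WILLNEED)
                    advised += 1
            out.append(os.pread(fd, BLOCK, blk * BLOCK))
    finally:
        os.close(fd)
    return out


# Demo: a 16-block file where every byte of block n equals n.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for n in range(16):
        f.write(bytes([n]) * BLOCK)
    demo_path = f.name
blocks = read_blocks(demo_path, [3, 7, 1, 15], prefetch_depth=2)
os.remove(demo_path)
print([b[0] for b in blocks])   # [3, 7, 1, 15]
```

The out-of-order block list is what a bitmap heap scan produces, which is where readahead by the kernel alone doesn't help.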


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Sorting writes during checkpoint

2008-07-15 Thread Greg Smith

On Mon, 7 Jul 2008, ITAGAKI Takahiro wrote:


I will have a plan to test it on RAID-5 disks, where sequential writing
are much better than random writing. I'll send the result as an evidence.


If you're running more tests here, please turn on log_checkpoints and 
collect the logs while the test is running.  I'm really curious if there's 
any significant difference in what that reports here in the sorted case 
vs. the regular one.


Smoothed checkpoint in 8.3 spreads write(), but calls fsync() at once. 
With sorted writes, we can call fsync() segment-by-segment for each 
writes of dirty pages contained in the segment. It could improve worst 
response time during checkpoints.


Further decreasing the amount of data that is fsync'd at any point in time 
might be a bigger improvement than just the sorting itself is doing (so 
far I haven't seen anything really significant just from the sort but am 
still testing).
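A sketch of the write ordering being discussed: sort the dirty pages, group them by 1GB segment file, and fsync each segment right after its own writes instead of issuing every fsync at the end. This models the idea only and is not the patch; checkpoint_plan is hypothetical, and the segment size assumes the default 8kB block and 1GB segment (131072 blocks per segment).

```python
from itertools import groupby

RELSEG_SIZE = 131072   # blocks per 1GB segment file, assuming 8kB blocks


def checkpoint_plan(dirty):
    """Turn unordered dirty (relfilenode, block) pairs into an ordered list of
    ("write", rel, seg, block) and ("fsync", rel, seg) steps, so each
    segment is written sequentially and fsync'd right after its writes."""
    def seg_key(page):
        rel, block = page
        return (rel, block // RELSEG_SIZE)

    plan = []
    # sorted() orders by (rel, block), so pages of one segment are contiguous.
    for (rel, seg), pages in groupby(sorted(dirty), key=seg_key):
        for _, block in pages:
            plan.append(("write", rel, seg, block))
        plan.append(("fsync", rel, seg))
    return plan


dirty = {(16384, 5), (16385, 2), (16384, 131073), (16384, 1)}
for step in checkpoint_plan(dirty):
    print(step)
```

Each fsync then only has to flush one segment's worth of data, which is the property that might matter more for worst-case latency than the sort itself.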


One thing I didn't see any comments from you on is how/if the sorted 
writes patch lowers worst-case latency.  That's the area I'd hope an 
improved fsync protocol would help most with, rather than TPS, which might 
even go backwards because writes won't be as bunched and therefore will 
have more seeking.  It's easy enough to analyze the data coming from 
pgbench -l to figure that out; here's an example shell snippet that shows 
just the worst ones:


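pgbench -l -N db &
p=$!                # PID of the backgrounded pgbench; it names the log file
wait $p
mv pgbench_log.${p} pgbench.log
# field 3 is each transaction's latency; show the ten slowest
cut -f 3 -d " " pgbench.log | sort -n | tail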

Actually graphing the latencies can be even more instructive; I have some 
examples of that on my web page you may have seen before.
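If the log is going to be fed into a graphing step anyway, the same worst-case extraction is easy to do in Python, along with a nearest-rank percentile. This assumes the layout the shell pipeline relies on: whitespace-separated fields with the per-transaction latency (in microseconds, as I understand the pgbench log of this era) in the third column. The sample log lines are made up for illustration.

```python
def latencies(lines):
    """Per-transaction latencies from pgbench -l output (third field)."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            out.append(int(fields[2]))
    return out


def worst(values, n=10):
    """The n largest values, ascending, like `sort -n | tail`."""
    return sorted(values)[-n:]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))   # ceil(len * pct / 100)
    return ordered[int(rank) - 1]


log = [
    "0 0 1520 0 1216100000 100",
    "0 1 980 0 1216100000 2000",
    "1 0 250000 0 1216100001 500",   # a stall, e.g. during a checkpoint
    "1 1 1100 0 1216100001 900",
]
lat = latencies(log)
print(worst(lat, 2), percentile(lat, 50))
```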



In addition, the current smgr layer is completely useless because
it cannot be extended dynamically and cannot handle multiple md-layer
modules. I would rather merge current smgr and part of bufmgr into
a new smgr and add smgr_hook() than bulk_io_hook().


I don't have a firm enough opinion about the code here to comment on this 
specific suggestion, but I will say that I've found the amount of layering 
in this area makes it difficult to understand just what's going on 
sometimes (especially when new to it).  A lot of that abstraction felt a 
bit pass-through to me, and anything that would collapse that a bit would 
be helpful for streamlining the code instrumentation going on with things 
like dtrace.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [DOCS] [ADMIN] shared_buffers and shmmax

2008-07-26 Thread Greg Smith

On Thu, 24 Jul 2008, Greg Sabino Mullane wrote:


Bite the bullet and start showing the buffer settings as a pure number of bytes
everywhere, and get rid of the confusing '8kB' unit in pg_settings?


There are already some changes needed in this area to execute the 
full GUC cleanup/wizard plan that's being worked on.  The pg_settings view 
really should show the value both as the user input it and as it's stored 
internally for cases like these, which lowers the confusion here a bit 
even without going so far as converting everything to bytes.
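A small sketch of what showing both forms involves: turn the (setting, unit) pair as pg_settings reports it, for example shared_buffers as 4096 in '8kB' units, into bytes, then render it the way a user would normally write it in postgresql.conf. The helper names are hypothetical and only the memory units are handled.

```python
# Multipliers for the memory units pg_settings reports.
UNIT_BYTES = {"kB": 1024, "8kB": 8192, "MB": 1024 ** 2, "GB": 1024 ** 3}


def setting_bytes(setting, unit):
    """Convert a pg_settings (setting, unit) pair to a byte count."""
    return int(setting) * UNIT_BYTES[unit]


def pretty(nbytes):
    """Render bytes with the largest unit that divides evenly."""
    for name in ("GB", "MB", "kB"):
        if nbytes % UNIT_BYTES[name] == 0:
            return f"{nbytes // UNIT_BYTES[name]}{name}"
    return f"{nbytes} bytes"


stored = setting_bytes("4096", "8kB")     # internal form: 4096 x 8kB pages
print(stored, pretty(stored))             # 33554432 32MB
```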


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD


