Re: [HACKERS] Core Extensions relocation

2011-06-11 Thread Greg Smith

Peter Eisentraut wrote:

For the directory name, I'd prefer either src/extensions (since there is
more than one), or if you want to go for short somehow, src/ext.  (Hmm,
I guess the installation subdirectory is also called "extension".  But
it felt wrong on first reading anyway.)
  


I jumped between those two a couple of times myself, settling on 
"extension" to match the installation location as you figured out.  
Assuming that name shouldn't change at this point, this seemed the best 
way to name the new directory, even though I agree it seems weird at first.




What version did you branch this off? :)
  


Long enough ago that apparently I've missed some major changes; Magnus 
already pointed out I needed to revisit how MODULEDIR was used.  Looks 
like I need to rebuild the first patch in this series yet again, which 
shouldn't be too bad.  The second time I did that, I made the commits 
atomic enough that the inevitable third one would be easy.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Creating new remote branch in git?

2011-06-09 Thread Greg Smith

On 06/10/2011 12:19 AM, Alex Hunsaker wrote:

It looks like if you push the remote branch first everything should work nicely:
git checkout master
git push origin origin:refs/heads/REL9_1_STABLE
git fetch # fetch the new branch
git checkout REL9_1_STABLE


This is basically the state of the art right now for the most frequently 
deployed versions of git.  I don't think checking out master first is 
necessary though.


Potentially useful automation/trivia for alternate approaches includes:

1) Write a little script to do this messy chore, so you don't have to 
remember this weird "create a new branch using a full refspec" syntax 
(a minimal sketch appears after this list).  
There is an example named git-create-branch along with a short tutorial 
on this subject at 
http://www.zorched.net/2008/04/14/start-a-new-branch-on-your-remote-git-repository/


2) Use git_remote_branch https://github.com/webmat/git_remote_branch 
which is the swiss army knife of remote branch hackery automation.


3) Rather than manually hack the config files, use "git config" to do 
it.  Not sure if this is completely workable, but something like this 
might connect the newly created branch to your local one after pushing 
it out, without actually opening the config with an editor:


git config branch.REL9_1_STABLE.remote origin
git config branch.REL9_1_STABLE.merge refs/heads/REL9_1_STABLE

4) Use a system with git>=1.7.0, which adds:

git branch --set-upstream REL9_1_STABLE origin/REL9_1_STABLE
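
For anyone who wants the option (1) wrapper without chasing down that 
tutorial, a minimal sketch of such a script might look like this (untested; 
the branch name is just an argument):

#!/bin/sh
# git-create-branch: push a new remote branch, then check out a local
# tracking branch with the same name
branch="$1"
git push origin origin:refs/heads/"$branch"
git fetch origin
git checkout --track origin/"$branch"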

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] tuning autovacuum

2011-06-09 Thread Greg Smith

On 06/09/2011 05:41 PM, Tom Lane wrote:

As Robert said, we're already seeing scalability problems with the
pg_stats subsystem.  I'm not eager to add a bunch more per-table
counters, at least not without some prior work to damp down the ensuing
performance hit.
   


That's fair.  Anyone who is running into the sort of autovacuum issues 
prompting this discussion would happily pay the overhead to get better 
management of that; it's one of the easiest things to justify more 
per-table stats on IMHO.  Surely the per-tuple counters are vastly more 
of a problem than these messages could ever be.


But concerns about stats overload are why I was highlighting issues 
around sending multiple messages per vacuum, and why incremental updates 
as it runs are unlikely to work out.  Balancing that trade-off--getting 
enough data to help, but not so much that the overhead is obnoxious--is the 
non-obvious, tricky part of the design here.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] tuning autovacuum

2011-06-09 Thread Greg Smith

On 06/09/2011 04:43 PM, Bernd Helmle wrote:
I'd go further and expose the info or details issued by VACUUM VERBOSE 
into the view, too, at least the number of pages visited and cleaned 
(or dead but not yet cleaned). Customers are heavily interested in 
these numbers and i've found pgfouine to provide those numbers very 
useful.


Agreed there.  The fact that VACUUM VERBOSE reports them suggests 
they're not too terribly difficult to track either.


What we'd probably need to do with those is handle them like the other 
stats in the system:  store a total number for visited/cleaned/dead for 
each relation, then increment the total as each vacuum finishes.  That 
way, you could point a standard monitoring system at it and see trends.  
Just saving the last snapshot of data there isn't as useful.


I'm seeing these as being like the counters in pg_stat_bgwriter; while 
it's easy to think of VACUUM "what work happened?" data as info you just 
want the last snapshot of, a continuous incrementing counter can do that 
and a lot of other things too.  Anyone who is extracting useful data 
from pg_stat_bgwriter can use the same logic to track this data, even if 
it only moves forward in big chunks as vacuum completes.  And it may be 
feasible to update it in the middle, too.


Stepping into implementation for a second, the stats that are showing up 
in pg_stat_user_tables are being driven by a PgStat_MsgVacuum message 
sent to the stats collector when a vacuum finishes.  While that's the 
obvious place to put some more stuff, that's not necessarily the right 
way to build a better monitoring infrastructure.  Two things to consider:


-It's not really aimed at being called multiple times for one operation 
("needs vacuum", "started vacuum", "finished vacuum").
-There is a mix of things that make sense as long-term counters and 
things that update as snapshots--the timestamps are the main thing there.


I haven't thought about it enough to have a real opinion on whether you 
can squeeze everything into the existing message by adding more fields, 
or if another type of message is necessary.  Just pointing out that it's 
not trivially obvious which approach is better.


What is unambiguous is that all this new data is really going to need a 
new view for it, pg_stat_vacuum or something like that.  The fields that 
are already in pg_stat_user_tables can stay there as deprecated for a 
while, but this all wants to be in its own new view.
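
To make that concrete, querying such a view might eventually look something 
like the sketch below.  Every column name here is hypothetical--just 
placeholders for the counters discussed above--since none of this exists yet:

psql -d pgbench -c "SELECT relname, pages_visited, pages_cleaned, pages_dead,
    last_autovacuum_start, last_autovacuum_end
  FROM pg_stat_vacuum ORDER BY pages_dead DESC"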


This would really be a nice medium sized feature that DBAs would love, 
and it would help adoption on big sites.  I have some ideas on how to 
get some funding to develop it because I keep running into this, but if 
someone wants to run with the idea I'd be happy to just help instead.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] tuning autovacuum

2011-06-09 Thread Greg Smith

Robert Haas wrote:

Well, if there are more tables that need vacuuming than there are
workers available at any given time, there will be a delay.  We
probably don't keep track of that delay at present, but we could.
  


There are at least four interesting numbers to collect each time 
autovacuum runs:


1) This one, when was the threshold crossed.  I believe one of the AV 
workers would have to pause periodically to update these if they're all 
busy doing work.

2) What time did the last autovacuum start at
3) How many dead rows were there at the point when it started
4) When did the last autovacuum end (currently the only value stored)

There may be a 5th piece of state I haven't looked at yet worth 
exposing/saving, something related to how much work was skipped by the 
partial vacuum logic introduced in 8.4.  I haven't looked at that code 
enough to know which is the right metric to measure its effectiveness 
by, but I have this gut feeling it's eventually going to be critical for 
distinguishing between the various common types of vacuum-heavy 
workloads that show up.


All of these need to be stored in a system table/view, so that an admin 
can run a query to answer questions like:


-What is AV doing right now?
-How far behind is AV on tables it needs to clean but hasn't even 
started on?

-How long is the average AV taking on my big tables?
-As I change the AV parameters, what does it do to the runtimes against 
my big tables?


As someone who is found by a lot of people whose problems revolve around 
databases with heavy writes or update churn, limitations in the current 
state of tracking what autovacuum does have been moving way up my 
priority list the last year.  I now have someone who is always running 
autovacuum on the same table, 24x7.  It finishes every two days, and 
when it does the 20% threshold is already crossed for it to start 
again.  The "wait until a worker was available" problem isn't there, but 
I need a good way to track all of the other three things to have a hope 
of improving their situation.  Right now getting the data I could use 
takes parsing log file output and periodic dumps of pg_stat_user_tables, 
then stitching the whole mess together.
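
For reference, the pieces that are already exposed come from queries like 
this one (these columns have been in pg_stat_user_tables since 8.3); 
everything else currently has to be dug out of the logs:

psql -d pgbench -c "SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10"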


You can't run a heavily updated database in the TB+ range and make sense 
of what autovacuum is doing without a large effort matching output from 
log_autovacuum_min_duration and the stats that are visible in 
pg_stat_user_tables.  It must get easier than that to support the sort 
of bigger tables it's possible to build now.  And if this data starts 
getting tracked, we can start to move toward AV parameters that are 
actually aiming at real-world units, too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Core Extensions relocation

2011-06-09 Thread Greg Smith

Vinicius Abrahao wrote:
This is my first post at Hackers, so sorry if I am been a noob here, 
but I am pretty confused about

how to create the extension pg_buffercache.


This list is for talking about development of new features, normally on 
the latest development version of the software (right now 9.1).  There 
is no such thing as CREATE EXTENSION in versions before that.  A 
question like "how do I install pg_buffercache for 9.0?" should normally 
get sent to one of the other mailing lists; any of pgsql-performance, 
pgsql-admin, or pgsql-general would be an appropriate place to ask that.  This 
one really isn't.  It's also better to avoid taking someone else's 
discussion and replying to it with your questions.



But even so, I need to ask, because my production is on another versions:
What is the right way to install this contrib at 9.0.1, 9.0.2 and 9.0.4 ?


But since I happen to know this answer, here's an example from a RedHat 
derived Linux system running PostgreSQL 9.0.4, logged in as the postgres 
user:


-bash-3.2$ locate pg_buffercache.sql
/usr/pgsql-9.0/share/contrib/pg_buffercache.sql
/usr/pgsql-9.0/share/contrib/uninstall_pg_buffercache.sql
-bash-3.2$ psql -d pgbench -f 
/usr/pgsql-9.0/share/contrib/pg_buffercache.sql

SET
CREATE FUNCTION
CREATE VIEW
REVOKE
REVOKE
-bash-3.2$ psql -d pgbench -c "select count(*) from pg_buffercache"
count
---
 4096

The location of the file will be different on other platforms, but 
that's the basic idea of how you install it.
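
For completeness:  on the 9.1 development version this thread is actually 
about, the same module is instead installed through the new extension 
mechanism:

-bash-3.2$ psql -d pgbench -c "CREATE EXTENSION pg_buffercache"
CREATE EXTENSION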


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-06-07 Thread Greg Smith

On 06/05/2011 08:50 AM, Mitsuru IWASAKI wrote:

It seems that I don't have enough time to complete this work.
You don't need to keep cc'ing me, and I'm very happy if postgres to be
the first DBMS which support buffer cache hibernation feature.
   


Thanks for submitting the patch, and we'll see what happens from here.  
I've switched to bcc'ing you here, and we should get you off everyone 
else's cc: list soon.  If this feature ends up getting committed, 
I'll try to remember to drop you a note about it so you can see what 
happened.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Getting a bug tracker for the Postgres project

2011-06-01 Thread Greg Smith

On 05/31/2011 05:41 PM, Alvaro Herrera wrote:

Excerpts from Josh Berkus's message of mar may 31 17:05:23 -0400 2011:

   

BTW, we talked to Debian about debbugs ages ago, and the Debian project
said that far too much of debbugs was not portable to other projects.
 

The good news is that the GNU folk proved them wrong, as evidenced
elsewhere in the thread.
   


What happened is that one of the authors got motivated (not sure 
why/how) to put a major amount of work into making the code portable so 
that sites other than Debian could use it.  So past perceptions about it 
being really hard were correct, that's just been fixed since then.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-06-01 Thread Greg Smith

On 06/01/2011 03:03 AM, Tatsuo Ishii wrote:

Also I really want to see the performance comparison between these two
approaches in the real world database.
   


Well, tell me how big of a performance improvement you want PgFincore to 
win by, and I'll construct a benchmark where it does that.  If you pick 
a database size that fits in the OS cache, but is bigger than 
shared_buffers, the difference between the approaches is huge.  The 
opposite--trying to find a case where this hibernation approach wins--is 
extremely hard to do.
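
As a rough sketch of how to construct that case:  on a machine with, say, 
8GB of RAM and shared_buffers set to 256MB, something like the following 
does it.  The scale and run length are just placeholder values to adjust 
for the test hardware:

pgbench -i -s 300 pgbench     # ~4.5GB, fits in the OS cache, not in shared_buffers
pgbench -S -M prepared -c 8 -j 2 -T 300 pgbench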


Anyway, further discussion of this patch is kind of a waste right now.  
We've never gotten the patch actually sent to the list to establish a 
proper contribution (just pointers to a web page), and no feedback on 
that or other suggestions for redesign (extension repackaging, GUC 
renaming, removing unused code, and a few more).  Unless the author 
shows up again in the next two weeks, this is getting bounced back with 
no review as code we can't use.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] How can I check the treatment of bug fixes?

2011-05-30 Thread Greg Smith
On 05/27/2011 08:36 AM, MauMau wrote:
> I posted a patch for bug #6011 to pgsql-hackers several days ago. How
> can I check the status of bug fixes? I'm worried that the patch might
> be forgotten, because bug #5842 was missed for two months until Bruce
> noticed it.

Discussion here seems to have wandered far away from useful suggestions
for you; let's see if it's possible to return to that. The best way to
confirm when a bug is resolved is to subscribe to the pgsql-committers
mailing list. If a commit for this fix appears, odds are good the
original bug number will be referenced. Even if it isn't, you may
recognize it via its description. Until you see that, the bug is almost
certainly still open.

Bugs that are considered to impact the current version under development
are sometimes listed at http://wiki.postgresql.org/wiki/Open_Items
Adding a bug to there that's not really specific to the new version may
not be considered good form by some. It is the closest thing to an open
bug tracker around though, and adding items to there means they won't be
forgotten about; it's checked regularly by developers considering when
it's a good time to release another alpha or beta.

In my mind, clarifying under what circumstances it's appropriate for people to
put a bug onto the Open Items list would be a useful way to spend a
little time. Certainly more productive than firing more bullets
at the unkillable zombie that is bug tracking software.

-- 
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Getting a bug tracker for the Postgres project

2011-05-30 Thread Greg Smith

On 05/29/2011 05:17 AM, Peter Eisentraut wrote:

Here is a list to choose from:
http://en.wikipedia.org/wiki/Comparison_of_issue_tracking_systems
   


I turned this into a spreadsheet to sort and prune more easily; if 
anyone wants that let me know, it's not terribly useful beyond what I'm 
posting here.  44 total, 16 that are open-source.  I would say that 
having an e-mail interface is the next major cut to make.  While 
distasteful, it's possible for this project to adopt a solution that 
doesn't use PostgreSQL, and one interesting candidate is in that 
category.  It's not feasible to adopt one that doesn't integrate well 
with e-mail though.


List of software without listed e-mail integration:  Fossil, GNATS, 
Liberum Help Desk, MantisBT, org-mode, Flyspray, ikiwiki, Trac.


The 8 F/OSS programs left after that filter are:

OTRS
Debbugs
Request Tracker
Zentrack
LibreSource
Redmine
Roundup
Bugzilla

The next two filters you might apply are:

Support for Git:  Redmine, Bugzilla
PostgreSQL back-end:  OTRS, Request Tracker, LibreSource, Redmine, 
Roundup, Bugzilla


There are a couple of additional nice to have items I saw on the feature 
list, and they all seem to spit out just Redmine & Bugzilla.  Those are 
the two I've ended up using the most on PostgreSQL related projects, 
too, so that isn't a surprise to me.  While I'm not a strong fan of 
Redmine, it has repeatedly been the lesser of the evils available here 
for three different companies I've worked at or dealt with.


Greg Stark is right that Debbugs has a lot of interesting features 
similar to the desired workflow here.  It's not tied to just Debian 
anymore; the GNU project is also using it now.  And the database backend 
isn't that terrible to consider:  it's flat files with a BerkeleyDB index 
built on top.  I think if it was perfect except for that, it would still 
be worth considering.  Debbugs is far from a general purpose solution 
though, so any customization to support differences in this project's 
workflow would likely end up being one-off hacks.  The VCS support might 
be a problem, but I've gotten the impression that git is increasingly 
popular for other Debian work.  Since the program is in Perl, I can't 
imagine it's a gigantic task to switch that out, and it's probably a change other 
people would like to see.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] pgbench--new transaction type

2011-05-29 Thread Greg Smith

On 05/29/2011 03:11 PM, Jeff Janes wrote:

If you use "pgbench -S -M prepared" at a scale where all data fits in
memory, most of what you are benchmarking is network/IPC chatter, and
table locking.


If you profile it, you'll find a large amount of the time is actually 
spent doing more mundane things, like memory allocation.  The network 
and locking issues are really not the bottleneck at all in a surprising 
number of these cases.  Your patch isn't really dependent on your being 
right about the cause here, which means this doesn't impact your 
submissions any.  Just wanted to clarify that what people expect are 
slowing things down in this situation and what actually shows up when 
you profile are usually quite different.


I'm not sure whether this feature makes sense to add to pgbench, but 
it's interesting to have it around for developer testing.  The way 
you've built this isn't messing with the code too much to accomplish 
that, and your comments about it being hard to do this using "-f" are 
all correct.  Using a custom test file aims to shoot your foot unless 
you apply a strong grip toward doing otherwise.



some numbers for single client runs on 64-bit AMD Opteron Linux:
12,567 sps  under -S
19,646 sps  under -S -M prepared
58,165 sps  under -P
   


10,000 is too big of a burst to run at once.  The specific thing I'm 
concerned about is what happens if you try this mode when using "-T" to 
enforce a runtime limit, and your SELECT rate isn't high.  If you're 
only doing 100 SELECTs/second because your scale is big enough to be 
seek bound, you could overrun by nearly two minutes.


I think this is just a matter of turning the optimization around a bit.  
Rather than starting with a large batch size and presuming that's ideal, 
in this case a different approach is really needed.  You want the 
smallest batch size that gives you a large win here.  My guess is that 
most of the gain here comes from increasing batch size to something in 
the 10 to 100 range; jumping to 10K is probably overkill.  Could you try 
some smaller numbers and see where the big increases start falling off?


Obligatory code formatting nitpick:  try not to overrun the right margin 
any further than the existing code around line 1779, where you add more 
ttype comments.  That needs to turn into a multi-line comment.  Rest of 
the patch looks fine, and don't worry about resubmitting for that; just 
something to tweak on your next update.  A slightly more descriptive 
filename for the patch would help out those of us who look at a lot of 
pgbench patches, too.  Something like "pgbench_loop_v1.patch" for 
example would make it easier for me to remember which patch this was by 
its name.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-26 Thread Greg Smith

On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote:

For 1, I've just finish my work.  The latest patch is available at:
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch
   


Reminder here--we can't accept code based on it being published to a web 
page.  You'll need to e-mail it to the pgsql-hackers mailing list to be 
considered for the next PostgreSQL CommitFest, which is starting in a 
few weeks.  Code submitted to the mailing list is considered a release 
of it to the project under the PostgreSQL license, which we can't just 
assume for things when given only a URL to them.


Also, you suggested you were out of time to work on this.  If that's the 
case, we'd like to know that so we don't keep cc'ing you about things in 
expectation of an answer.  Someone else may pick this up as a project to 
continue working on.  But it's going to need a fair amount of revision 
before it matches what people want here, and I'm not sure how much of 
what you've written is going to end up in any commit that may happen 
from this idea.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] tackling full page writes

2011-05-25 Thread Greg Smith

On 05/24/2011 04:34 PM, Robert Haas wrote:

we could name this feature "partial full page writes" and enable it
either with a setting of full_page_writes=partial


+1 to overloading the initial name, but only if the parameter is named 
'maybe', 'sometimes', or 'perhaps'.


I've been looking into a similar refactoring of the names here, where we 
bundle all of these speed over safety things (fsync, full_page_writes, 
etc.) into one control so they're easier to turn off at once.  Not sure 
if it should be named "web_scale" or "do_you_feel_lucky_punk".



3. Going a bit further, Greg proposed the idea of ripping out our
current WAL infrastructure altogether and instead just having one WAL
record that says "these byte ranges on this page changed to have these
new contents".


The main thing that makes this idea particularly interesting to me, over 
the other two, is that it might translate well into the idea of using 
sync_file_range to aim for a finer fsync call on Linux than is currently 
possible.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] New/Revised TODO? Gathering actual read performance data for use by planner

2011-05-25 Thread Greg Smith

Michael Nolan wrote:
Based on last year's discussion of this TODO item, it seems thoughts 
have been focused on estimating how much data is
being satisfied from PG's shared buffers.  However, I think that's 
only part of the problem.  


Sure, but neither it nor what you're talking about are the real gating 
factor on making an improvement here.  Figuring out how to expose all 
this information to the optimizer so it can use it when planning is the 
hard part.  Collecting a read time profile is just one of the many ways 
you can estimate what's in cache, and each of the possible methods has 
good and bad trade-offs.  I've been suggesting that people assume that's 
a solved problem--I'm pretty sure what you're proposing was done by Greg 
Stark once, and even built as a prototype--and instead ask what you would 
do next if you had this data.


This data would probably need to be kept separately for each table or 
index, as some tables or indexes
may be mostly or fully in cache or on faster physical media than 
others, although in the absence of other
data about a specific table or index, data about other relations in 
the same tablespace might be of some use. 


This is the important part.  Model how the data needs to get stored such 
that the optimizer can make decisions using it, and I consider it easy 
to figure out how it will get populated later.  There are actually 
multiple ways to do it, and it may end up being something people plug-in 
an implementation that fits their workload into, rather than just having 
one available.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Greg Smith

On 05/24/2011 01:35 PM, Josh Berkus wrote:

I would suggest instead adding a new page to postgresql.org/developer
which lists the development schedule, rather than linking to that wiki
page.  Maybe on this page?

http://www.postgresql.org/developer/roadmap
   


Now that I look at the roadmap page again, I think all that would really 
be needed here is to tweak its wording a bit.  If the description on 
there of the link to the wiki looked like this:


General development information
  A wiki page about various aspects of the PostgreSQL development 
process, including detailed schedules and submission guidelines


I think that's enough info to keep there.  Putting more information back 
onto the main site when it can live happily on the wiki seems 
counterproductive to me; if there's concerns about things like 
vandalism, we can always lock the page.  I could understand the argument 
that "it looks more professional to have it on the main site", but 
perception over function only goes so far for me.


The idea of adding a link back to the wiki from the 
https://commitfest.postgresql.org/ page would complete being able to 
navigate among the three major sites here, no matter which one people 
started at.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Greg Smith

On 05/24/2011 05:03 PM, Peter Eisentraut wrote:

On mån, 2011-05-23 at 22:44 -0400, Greg Smith wrote:
   

-Given that work in August is particularly difficult to line up with
common summer schedules around the world, having the other>1 month
gap in the schedule go there makes sense.
 

You might want to add a comment on the schedule page about the
June/July/August timing, because it looks like a typo, and the meeting
minutes are also inconsistent how they talk about June and July.
   


Yes, I was planning to (and just did) circle back to the minutes to make 
everything match up.  It's now self-consistent, same dates as the 
schedule, and explains the rationale better.


I'm not sure how to address the feeling of typo you have on the schedule 
page beyond that.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Greg Smith

David Fetter wrote:

I thought we'd agreed on the timing for the first CF, and that I was
to announce it in the PostgreSQL Weekly News, so I did just that.
  


Yes, and excellent.  The other ideas were:

-Publish information about the full schedule to some of the more popular 
mailing lists


-Link to this page more obviously from postgresql.org (fixed redirect 
URL is probably the right approach) to "bless" it, and potentially 
improve its search rank too.


The specific new problem being highlighted to work on here is that the 
schedule and development process is actually quite good as open-source 
projects go, but that fact isn't visible at all unless you're already on 
the inside of the project.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





[HACKERS] 9.2 schedule

2011-05-23 Thread Greg Smith
At the developer meeting last week:  
http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting there was 
an initial schedule for 9.2 hammered out and dutifully transcribed at  
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan , and 
the one part I wasn't sure I had written down correctly I see Robert 
already fixed.


There was a suggestion to add some publicity around the schedule for 
this release.  There's useful PR value to making it more obvious to 
people that the main development plan is regular and time-based, even if 
the release date itself isn't fixed.  The right time to make an initial 
announcement like that is "soon", particularly if a goal here is to get 
more submitted into the first 9.2 CF coming in only a few weeks.  Anyone 
have changes to suggest before this starts working its way toward an 
announcement?


The main parts of the discussion leading to changes from the 9.1 
schedule, as I recall them, are:


-Shooting for a slightly earlier branch/initial 9.2 CommitFest in June 
helps some with patch developer bit-rot, and may let developers who are 
focused on new features be productive for more of the year.  The 
perception that new development is unwelcome between the final CF and 
version release seems to have overshot a bit from its intention.  It's 
not the best period to chat on this list, with many people distracted by 
release goals.  But some people just aren't in the right position to 
work on alpha/beta testing and stability work then, and the intention 
was not to block them from doing something else if that's the case.  (A 
similar bit brought up during one of the patch prep talks is that review 
is also welcome outside of a CF, which isn't really made clear.)


-The last CF of the release is tough to reschedule usefully due to 
concerns about December/beginning of the year holidays.


-Given that work in August is particularly difficult to line up with 
common summer schedules around the world, having the other >1 month gap 
in the schedule go there makes sense.


As for why there aren't more changes, it's hard to argue that the 9.1 
process was broken such that it needs heavy modification.  There were a 
large number of new features committed, people seem satisfied with the 
quality of the result so far, and the schedule didn't go too far off the 
rails.  Outside of the manpower issues (which are serious), the sections 
that strained the most against problems seem really hard to identify 
with anything other than hindsight.  The tension between "ship it" and 
"make the release better" is a really fundamental one to software 
development.


The two main ideas that pop up regularly, organizing more CommitFests or 
making them shorter, are both hard to adopt without more active 
volunteers working on review (both at the initial and committer level) 
and an increase in available CF manager time.  An idea I heard a couple 
of people suggest is that it would take a CF manager focused exclusively 
on the "patch chasing" parts of the role--not someone who is also trying 
to develop, commit, or review during the CF--before this would be 
feasible to consider.  Some sort of relief for making that role less 
demanding is needed here, before it's practical to schedule those even 
more often.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf

2011-05-18 Thread Greg Smith

Two things that could be changed from this example to make it more useful:

-Document at least a little bit more how this is different from the 
"all/all" rule.  I can imagine users wondering "do I use this instead of 
the other one?  In addition?  Is it redundant if I have 'all' in there?  
A little more description here aiming at the expected FAQs here would 
make this much more useful.


-The default database is based on your user name, which is postgres in 
most packaged builds but not if you compile your own.  I don't know 
whether it's practical to consider substituting that into this file, or 
if it's just enough to mention that as an additional doc comment.
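
For what it's worth, the entry being discussed boils down to a single 
pg_hba.conf line; the address, user, and authentication method below are 
only placeholders:

echo 'host  replication  postgres  192.0.2.10/32  md5' >> $PGDATA/pg_hba.conf
pg_ctl reload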


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Why not install pgstattuple by default?

2011-05-18 Thread Greg Smith

Greg Smith wrote:
Any packager who grabs the shared/postgresql/extension directory in 
9.1, which I expect to be all of them, shouldn't need any changes to 
pick up this adjustment.  For example, pgstattuple installs these files:


share/postgresql/extension/pgstattuple--1.0.sql
share/postgresql/extension/pgstattuple--unpackaged--1.0.sql
share/postgresql/extension/pgstattuple.control

And these are the same locations they were already at.


...and the bit I missed here is that there's a fourth file here:

lib/postgresql/pgstattuple.so

If you look at a 9.1 spec file, such as 
http://svn.pgrpms.org/browser/rpm/redhat/9.1/postgresql/EL-6/postgresql-9.1.spec 
, you'll find:


%files contrib
...
%{pgbaseinstdir}/lib/pgstattuple.so

Which *does* require a packager change to relocate it from the 
contrib package to the main server one.  So the theory that a 
change here might happen without pushing a repackaging suggestion toward 
packagers is busted.  This does highlight that some packaging guidelines 
would be needed here to complete this work.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Why not install pgstattuple by default?

2011-05-18 Thread Greg Smith

Greg Smith wrote:
Attached is a second patch to move a number of extensions from 
contrib/ to src/test/.  Extensions there are built by the default 
built target, making installation of the postgresql-XX-contrib package 
unnecessary for them to be available.


That was supposed to be contrib/ to src/extension; no idea where that 
"test" bit came from.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] cache estimates, cache access cost

2011-05-15 Thread Greg Smith

Cédric Villemain wrote:

http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache
  


This rebases easily to make Cedric's changes move to the end; I just 
pushed a version with that change to 
https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone 
wants a cleaner one to browse.  I've attached a patch too if that's more 
your thing.


I'd recommend not getting too stuck on the particular hook Cédric has 
added here to compute the cache estimate, which uses mmap and mincore to 
figure it out.  It's possible to compute similar numbers, albeit less 
accurate, using an approach similar to how pg_buffercache inspects 
things.  And I even once wrote a background writer extension that 
collected this sort of data as it was running the LRU scan anyway.  
Discussions of this idea seem to focus on how the "what's in the cache?" 
data is collected, which as far as I'm concerned is the least important 
part.  There are multiple options, some work better than others, and 
there's no reason that can't be swapped out later.  The more important 
question is how to store the data collected and then use it for 
optimizing queries.
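
For anyone who wants to see the pg_buffercache-style inspection mentioned 
above, a typical query for it looks like this.  It only sees shared_buffers, 
not the OS cache, which is exactly the accuracy trade-off involved; the 
database name is a placeholder:

psql -d pgbench <<'SQL'
SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY 2 DESC LIMIT 10;
SQL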


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


diff --git a/contrib/Makefile b/contrib/Makefile
index 6967767..47652d5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 		lo		\
 		ltree		\
 		oid2name	\
+		oscache	\
 		pageinspect	\
 		passwordcheck	\
 		pg_archivecleanup \
diff --git a/contrib/oscache/Makefile b/contrib/oscache/Makefile
new file mode 100644
index 000..8d8dcc5
--- /dev/null
+++ b/contrib/oscache/Makefile
@@ -0,0 +1,15 @@
+# contrib/oscache/Makefile
+
+MODULE_big = oscache
+OBJS = oscache.o
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/oscache
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/oscache/oscache.c b/contrib/oscache/oscache.c
new file mode 100644
index 000..1ad7dc2
--- /dev/null
+++ b/contrib/oscache/oscache.c
@@ -0,0 +1,151 @@
+/*-
+ *
+ * oscache.c
+ *
+ *
+ * Copyright (c) 2011, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/oscache/oscache.c
+ *
+ *-
+ */
+/* { POSIX stuff */
+#include <stdlib.h> /* exit, calloc, free */
+#include <sys/stat.h> /* stat, fstat */
+#include <sys/types.h> /* size_t, mincore */
+#include <unistd.h> /* sysconf, close */
+#include <sys/mman.h> /* mmap, mincore */
+/* } */
+
+/* { PostgreSQL stuff */
+#include "postgres.h" /* general Postgres declarations */
+#include "utils/rel.h" /* Relation */
+#include "storage/bufmgr.h"
+#include "catalog/catalog.h" /* relpath */
+/* } */
+
+PG_MODULE_MAGIC;
+
+void		_PG_init(void);
+
+float4 oscache(Relation, ForkNumber);
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+	/* Install hook. */
+	OSCache_hook = &oscache;
+}
+
+/*
+ * oscache process the os cache inspection for the relation.
+ * It returns the percentage of blocks in OS cache.
+ */
+float4
+oscache(Relation relation, ForkNumber forkNum)
+{
+	int  segment = 0;
+	char *relationpath;
+	char filename[MAXPGPATH];
+	int fd;
+	int64  total_block_disk = 0;
+	int64  total_block_mem  = 0;
+
+	/* OS things */
+	int64 pageSize  = sysconf(_SC_PAGESIZE); /* Page size */
+	register int64 pageIndex;
+
+	relationpath = relpathperm(relation->rd_node, forkNum);
+
+	/*
+	 * For each segment of the relation
+	 */
+	snprintf(filename, MAXPGPATH, "%s", relationpath);
+	while ((fd = open(filename, O_RDONLY)) != -1)
+	{
+		// for stat file
+		struct stat st;
+		// for mmap file
+		void *pa = (char *)0;
+		// for calloc file
+		unsigned char *vec = (unsigned char *)0;
+		int64  block_disk = 0;
+		int64  block_mem  = 0;
+
+		if (fstat(fd, &st) == -1)
+		{
+			close(fd);
+			elog(ERROR, "Can not stat object file : %s",
+				filename);
+			return 0;
+		}
+
+		/*
+		* if file ok
+		* then process
+		*/
+		if (st.st_size != 0)
+		{
+			/* number of block in the current file */
+			block_disk = st.st_size/pageSize;
+
+			/* TODO We need to split mmap size to be sure (?) to be able to mmap */
+			pa = mmap(NULL, st.st_size, PROT_NONE, MAP_SHARED, fd, 0);
+			if (pa == MAP_FAILED)
+			{
+				close(fd);
+				elog(ERROR, "Can not mmap object file : %s, errno = %i,%s\nThis error can happen if there is not enought space in memory to do the projection. Please mail ced...@2ndquadrant.fr with '[oscache] ENOMEM' as subject.",
+					filename, errno, strerror(errno));
+				return 0;
+			}
+
+			/* Prepare our vector containing all blocks informatio

Re: [HACKERS] performance-test farm

2011-05-12 Thread Greg Smith

Tom Lane wrote:

There's no such thing as a useful performance test that runs quickly
enough to be sane to incorporate in our standard regression tests.
  


To throw a hard number out:  I can get a moderately useful performance 
test on a SELECT-only workload from pgbench in about one minute.  That's 
the bare minimum; stepping up to 5 minutes is really necessary before 
I'd want to draw any real conclusions.
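
For anyone who wants to calibrate that against their own hardware, the sort 
of minimal run I'm talking about is just this (assuming an already 
initialized pgbench database):

pgbench -S -T 60 -c 4 -j 2 pgbench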


More importantly, a large portion of the time I'd expect regression test 
runs to be happening with debug/assert on.  We've well established this 
trashes pgbench performance.  One of the uglier bits of code needed to 
add the "performance farm" feature to the buildfarm code was hacking in 
a whole different set of build options for it.


Anyway, what I was envisioning here was that performance farm systems 
would also execute the standard buildfarm tests, but not the other way 
around.  We don't want performance numbers from some platform that is 
failing the basic tests.  I would just expect that systems running the 
performance tests would cycle through regression testing much less 
often, as they might miss a commit because they were running a longer 
test then.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] performance-test farm

2011-05-11 Thread Greg Smith

Tomas Vondra wrote:

Actually I was not aware of how the buildfarm works, all I
knew was there's something like that because some of the hackers mention
a failed build on the mailing list occasionally.

So I guess this is a good opportunity to investigate it a bit ;-)

Anyway I'm not sure this would give us the kind of environment we need
to do benchmarks ... but it's worth to think of.
  


The idea is that buildfarm systems that are known to have a) reasonable 
hardware and b) no other concurrent work going on could also do 
performance tests.  The main benefit of this approach is it avoids 
duplicating all of the system management and source code building work 
needed for any sort of thing like this; just leverage the buildfarm 
parts when they solve similar enough problems.  Someone has actually 
done all that already; source code was last sync'd to the build farm 
master at the end of March:  https://github.com/greg2ndQuadrant/client-code


By far the #1 thing needed to move this forward from where it's stuck at 
now is someone willing to dig into the web application side of this.  
We're collecting useful data.  It needs to now be uploaded to the 
server, saved, and then reports of what happened generated.  Eventually 
graphs of performance results over time will be straighforward to 
generate.  But the whole idea requires someone else (not Andrew, who has 
enough to do) sits down and figures out how to extend the web UI with 
these new elements.




I guess we could run a script that collects all those important
parameters and then detect changes. Anyway we still need some 'really
stable' machines that are not changed at all, to get a long-term baseline.
  


I have several such scripts I use, and know where two very serious ones 
developed by others are, too.  This part is not a problem.  If the 
changes are big enough to matter, they will show up as a difference on 
the many possible "how is the server configured?" reports; we just need 
to pick the most reasonable one.  It's a small detail I'm not worried 
about yet.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Infinity bsearch crash on Windows

2011-05-11 Thread Greg Smith

Tom Lane wrote:

SELECT 'INFINITY'::TIMESTAMP;



Hmm ... I bet this is related to the recent reports about ALTER USER
VALID UNTIL 'infinity' crashing on Windows.  Can the people seeing this
get through the regression tests?  Perhaps more to the point, what is
their setting of TimeZone?  What does the pg_timezone_abbrevs view show
for them?
  


I must have missed that thread; I think I'm missing one of these lists 
(pgsql-bugs maybe?).  I've cc'd Mark Watson so maybe you can get better 
responses without me in the middle if needed; for this one, he reports:


Show timezone gives US/Eastern
Select * from pg_timezone_abbrevs returns zero rows


My Linux system that doesn't have this problem is also in US/Eastern, 
too, but I get 189 rows in pg_timezone_abbrevs.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD





[HACKERS] Infinity bsearch crash on Windows

2011-05-10 Thread Greg Smith
A 9.1Beta1 test report from Richard Broersma (and confirmed on another 
system by Mark Watson) showed up on pgsql-testers this week at 
http://archives.postgresql.org/pgsql-testers/2011-05/msg0.php with 
the following test crashing his Windows server every time:


SELECT 'INFINITY'::TIMESTAMP;

That works fine for me on Linux.  Richard chased the error in the logs, 
which was a generic "you can't touch that memory" one, down to a full 
stack trace:  
http://archives.postgresql.org/pgsql-testers/2011-05/msg9.php


It looks like it's losing its mind inside of 
src/backend/utils/adt/datetime.c, specifically at this line in datebsearch:


   3576 while (last >= base)
   3577 {
   3578 position = base + ((last - base) >> 1);
   3579 result = key[0] - position->token[0];

Why crash there only on Windows?  Was the problem actually introduced 
above this part of the code?  These are all questions I have no answer for.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-10 Thread Greg Smith

Heikki Linnakangas wrote:
Well, my first patch was two-phase commit. And I had never even used 
PostgreSQL before I dived into the source tree and started to work on that


Well, everyone knows you're awesome.  A small percentage of the people 
who write patches end up having the combination of background skills, 
mindset, and approach to pull something like that off.  But there are at 
least a dozen submissions that start review with "I don't think this 
will ever work, but I can't even read your malformed patch to be sure" 
for every one that went like 2PC.  If every submitter was a budding 
Heikki we wouldn't need patch submission guidelines at all.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-09 Thread Greg Smith

Josh Berkus wrote:

As I don't think we can change this, I think the best answer is to tell people
"Don't submit a big patch to PostgreSQL until you've done a few small
patches first.  You'll regret it".
  


When I last did a talk about getting started writing patches, I had a 
few people ask me afterwards if I'd ever run into problems with having 
patch submissions rejected.  I said I hadn't.  When asked what my secret 
was, I told them my first serious submission modified exactly one line 
of code[1].  And *that* I had to defend in regards to its performance 
impact.[2]


Anyway, I think the intro message should be "Don't submit a big patch to 
PostgreSQL until you've done a small patch and some patch review" 
instead though.  It's both a good way to learn what not to do, and it 
helps with one of the patch acceptance bottlenecks.



The problem is not the process itself, but that there is little
documentation of that process, and that much of that documentation does
not match the defacto process.  Obviously, the onus is on me as much as
anyone else to fix this.
  


I know the documentation around all this has improved a lot since then.  
Unfortunately there are plenty of submissions from people who never 
read it.  Sometimes it's because people didn't know about it; in others 
I suspect it was seen but some hard parts ignored because it seemed like 
too much work.


One of these days I'm going to write the "Formatting Curmudgeon Guide to 
Patch Submission", to give people an idea just how much diff reading and 
revision a patch should go through in order to keep common issues like 
spurious whitespace diffs out of it.  Submitters can either spend a 
block of time sweating those details out themselves, or force the job 
onto a reviewer/committer; they're always there.  And every minute it's 
sitting in someone else's hands who is doing that job instead of reading 
the real code, the odds of the patch being kicked back go up.


[1] http://archives.postgresql.org/pgsql-patches/2007-03/msg00553.php
[2] http://archives.postgresql.org/pgsql-patches/2007-02/msg00222.php

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Why not install pgstattuple by default?

2011-05-09 Thread Greg Smith

On 05/09/2011 02:31 PM, Robert Haas wrote:

I don't think we should be too obstinate about trying to twist the arm
of packagers who (as Tom points out) will do whatever they want in
spite of us, but the current state of contrib, with all sorts of
things of varying type, complexity, and value mixed together cannot
possibly be a good thing.


I think the idea I'm running with for now means that packagers won't 
actually have to do anything.  I'd expect typical packaging for 9.1 to 
include share/postgresql/extension from the build results without being 
more specific.  You need to grab 3 files from there to get the plpgsql 
extension, and I can't imagine any packager listing them all by name.  
So if I dump the converted contrib extensions to put files under there, 
and remove them from the contrib build area destination, I suspect they 
will magically jump from the contrib to the extensions area of the 
server package at next package build; no packager level changes 
required.  The more I look at this, the less obtrusive of a change it 
seems to be.  The only people who will really notice are users who discover 
more in the basic server package, and of course committers with 
backporting to do.
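
Checking where those files actually land for a given installation is easy 
enough to do, for example:

ls $(pg_config --sharedir)/extension/pgstattuple*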


Since packaged builds of 9.1 current with beta1 seem to be in short 
supply still, this theory looks hard to prove just yet.  I'm very 
excited that it's packaging week here however (rolls eyes), so I'll 
check it myself.  I'll incorporate the suggestions made since I posted 
that test patch and do a bigger round of this next, end to end with an 
RPM set as the output.  It sounds like everyone who has a strong opinion 
on what this change might look like has sketched a similar looking 
bikeshed.  Once a reasonable implementation is hammered out, I'd rather 
jump to the big argument between not liking change vs. the advocacy 
benefits to PostgreSQL of doing this; they are considerable in my mind.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Why not install pgstattuple by default?

2011-05-09 Thread Greg Smith

On 05/09/2011 03:31 PM, Alvaro Herrera wrote:

For executables we already have src/bin. Do we really need a separate
place for, say, pg_standby or pg_upgrade?
   


There are really no executables in contrib that I find myself regularly 
desperate for/angry at because they're not installed as an integral part 
of the server, the way I regularly find myself cursing some things that 
are now extensions.  The only contrib executable I use often is pgbench, 
and that's normally early in server life--when it's still possible to 
get things installed easily most places.  And it's something that can be 
removed when finished, in cases where people are nervous about the 
contrib package.


Situations where pg_standby or pg_upgrade suddenly pop up as emergency 
needs seem unlikely too, which is also the case with oid2name, 
pg_test_fsync, pg_archivecleanup, and vacuumlo.  I've had that happen 
with pg_archivecleanup exactly once since it appeared--running out of 
space and that was the easiest way to make the problem go away 
immediately and permanently--but since it was on an 8.4 server we had to 
download the source and build anyway.


Also, my experience is that people are not that paranoid about running 
external binaries, even though they could potentially do harm to the 
database.  Can always do a backup beforehand.  But the idea of loading a 
piece of code that lives in the server all the time freaks them out.  
The way the word contrib implies (and sometimes is meant to mean) low 
quality, while stuff that ships with the main server package does not, 
has been beaten up here for years already.  It's only a few cases where 
that's not fully justified, and the program can easily become an 
extension, that I feel are really worth changing here.  There are 49 
directories in contrib/ ; at best maybe 20% of them will ever fall into 
that category.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-09 Thread Greg Smith

On 05/09/2011 12:06 PM, Andrew Dunstan wrote:
The fact that we can do in place upgrades of the data only addresses 
one pain point in upgrading. Large legacy apps require large retesting 
efforts when upgrading, often followed by lots more work renovating 
the code for backwards incompatibilities. This can be a huge cost for 
what the suits see as little apparent gain, and making them do it more 
frequently in order to stay current will not win us any friends.


I just had a "why a new install on 8.3?" conversation today, and it was 
all about the application developer not wanting to do QA all over again 
for a later release.


Right now, one of the major drivers for "why upgrade?" has been the 
performance improvements in 8.3, relative to any older version.  The 
main things pushing happy 8.3 sites to 8.4 or 9.0 that I see are either 
VACUUM issues (improved with partial vacuum in 8.4) or wanting real-time 
replication (9.0).  I predict many sites that don't want either are 
likely to sit on 8.3 for a really long time.  The community won't be 
able to offer a compelling reason why smaller sites in particular should 
go through the QA an upgrade requires.  The fact that the app QA time is 
now the main driver--not the dump and reload time--is good, because it 
does make it easier for the people with the biggest data sets 
to move.  They're the ones that need the newer versions the most anyway, 
and in that regard having in-place upgrade start showing up as of 8.3 
was really just in time.


I think 8.3 is going to be one of those releases like 7.4, where people 
just keep running it forever.  At least shortening the upgrade path has 
made that concern a little bit better.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improvements to pgtune

2011-05-08 Thread Greg Smith
ased, while 
working on simpler programs that don't aim so high leads to software 
that ships to the world and finds users.  The only reason pgtune is now 
available in packaged form on multiple operating systems is that I 
ignored all advice about aiming for a complicated tool and instead wrote 
a really simple one.  That was hard enough to finish.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-07 Thread Greg Smith

Mitsuru IWASAKI wrote:

the patch is available at:
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110508.patch 
  


We can't accept patches just based on a pointer to a web site.  Please 
e-mail this to the mailing list so that it can be considered a 
submission under the project's licensing terms.



I hope this would be committable and the final version.
  


PostgreSQL has high standards for code submissions.  Extremely few 
submissions are committed without significant revisions to them based on 
code review.  So far you've gotten a first round of high-level design 
review, there's several additional steps before something is considered 
for a commit.  The whole process is outlined at 
http://wiki.postgresql.org/wiki/Submitting_a_Patch


From a couple of minutes of reading the patch, the first things that 
pop out as problems are:


-All of the ControlFile -> controlFile renaming has add a larger 
difference to ReadControlFile than I would consider ideal.

-Touching StrategyControl is not something this patch should be doing.
-I don't think your justification ("debugging or portability") for 
keeping around your original code in here is going to be sufficient to 
do so.
-This should not be named enable_buffer_cache_hibernation.  That very 
large diff you ended up with in the regression tests is because all of 
the settings named enable_* are optimizer control settings.  Using the 
name "buffer_cache_hibernation" instead would make a better starting point.


From a bigger picture perspective, this really hasn't addressed any of 
my comments about shared_buffers only being the beginning of the useful 
cache state to worry about here.  I'd at least like the solution to the 
buffer cache save/restore to have a plan for how it might address that 
too one day.  This project is also picky about only committing code that 
fits into the long-term picture for desired features.


Having a working example of a server-side feature doing cache storage 
and restoration is helpful though.  Don't think your work here is 
unappreciated--it is.  Getting this feature added is just a harder 
problem than what you've done so far.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not install pgstattuple by default?

2011-05-07 Thread Greg Smith
Attached patch is a first cut at what moving one contrib module (in this 
case pg_buffercache) to a new directory structure might look like.  The 
idea is that src/extension could become a place for "first-class" 
extensions to live.  Those are ones the community is committed to providing 
in core, but are just better implemented as extensions than in-database 
functions, for reasons that include security.  This idea has been shared 
by a lot of people for a while, only problem is that it wasn't really 
practical to implement cleanly until the extensions code hit.  I think 
it is now, this attempts to prove it.


Since patches involving file renaming are clunky, the changes might be 
easier to see at 
https://github.com/greg2ndQuadrant/postgres/commit/507923e21e963c873a84f1b850d64e895776574f 
where I just pushed this change too.  The install step for the modules 
looks like this now:


gsmith@grace:~/pgwork/src/move-contrib/src/extension/pg_buffercache$ 
make install

/bin/mkdir -p '/home/gsmith/pgwork/inst/move-contrib/lib/postgresql'
/bin/mkdir -p 
'/home/gsmith/pgwork/inst/move-contrib/share/postgresql/extension'
/bin/sh ../../../config/install-sh -c -m 755  pg_buffercache.so 
'/home/gsmith/pgwork/inst/move-contrib/lib/postgresql/pg_buffercache.so'
/bin/sh ../../../config/install-sh -c -m 644 ./pg_buffercache.control 
'/home/gsmith/pgwork/inst/move-contrib/share/postgresql/extension/'
/bin/sh ../../../config/install-sh -c -m 644 ./pg_buffercache--1.0.sql 
./pg_buffercache--unpackaged--1.0.sql  
'/home/gsmith/pgwork/inst/move-contrib/share/postgresql/extension/'

$ psql -c "create extension pg_buffercache"
CREATE EXTENSION

The only clunky bit I wasn't really happy with is the amount of code 
duplication that comes from having a src/extension/Makefile that looks 
almost, but not quite, identical to contrib/Makefile.  The rest of the 
changes don't seem too bad to me, and even that's really only 36 lines 
that aren't touched often.  Yes, the paths are different, so backports 
won't happen without an extra step.  But the code changes required were 
easier than I was expecting, due to the general good modularity of the 
extensions infrastructure.  So long as the result ends up in 
share/postgresql/extension/ , whether they started in contrib/ 
or src/extension/ doesn't really matter to CREATE EXTENSION.  
But having them broke out this way makes it easy for the default 
Makefile to build and install them all.  (I recognize I didn't do that 
last step yet though)


I'll happily go convert pgstattuple and the rest of the internal 
diagnostics modules to this scheme, and do the doc cleanups, this 
upcoming week if it means I'll be able to use those things without 
installing all of contrib one day.  Ditto for proposing RPM and Debian 
packaging changes that match them.  All that work will get paid back the 
first time I don't have to fill out a bunch of paperwork (again) at a 
customer site justifying why they need to install the contrib [RPM|deb] 
package (which has some scary stuff in it) on all their servers, just so 
I can get some bloat or buffer inspection module.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


diff --git a/contrib/Makefile b/contrib/Makefile
index 6967767..04cf330 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,7 +30,6 @@ SUBDIRS = \
 		pageinspect	\
 		passwordcheck	\
 		pg_archivecleanup \
-		pg_buffercache	\
 		pg_freespacemap \
 		pg_standby	\
 		pg_stat_statements \
diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile
deleted file mode 100644
index 323c0ac..000
--- a/contrib/pg_buffercache/Makefile
+++ /dev/null
@@ -1,18 +0,0 @@
-# contrib/pg_buffercache/Makefile
-
-MODULE_big = pg_buffercache
-OBJS = pg_buffercache_pages.o
-
-EXTENSION = pg_buffercache
-DATA = pg_buffercache--1.0.sql pg_buffercache--unpackaged--1.0.sql
-
-ifdef USE_PGXS
-PG_CONFIG = pg_config
-PGXS := $(shell $(PG_CONFIG) --pgxs)
-include $(PGXS)
-else
-subdir = contrib/pg_buffercache
-top_builddir = ../..
-include $(top_builddir)/src/Makefile.global
-include $(top_srcdir)/contrib/contrib-global.mk
-endif
diff --git a/contrib/pg_buffercache/pg_buffercache--1.0.sql b/contrib/pg_buffercache/pg_buffercache--1.0.sql
deleted file mode 100644
index 9407d21..000
--- a/contrib/pg_buffercache/pg_buffercache--1.0.sql
+++ /dev/null
@@ -1,17 +0,0 @@
-/* contrib/pg_buffercache/pg_buffercache--1.0.sql */
-
--- Register the function.
-CREATE FUNCTION pg_buffercache_pages()
-RETURNS SETOF RECORD
-AS 'MODULE_PATHNAME', 'pg_buffercache_pages'
-LANGUAGE C;
-
--- Create a view for convenient access.
-CREATE VIEW pg_buffercache AS
-	SELECT P.* FROM pg_buffercache_pages() AS P
-	(bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid,
-	 r

Re: [HACKERS] Why not install pgstattuple by default?

2011-05-07 Thread Greg Smith

On 05/06/2011 04:06 PM, Tom Lane wrote:

FWIW, I did move pg_config from -devel to the "main" (really client)
postgresql package in Fedora, as of 9.0.  That will ensure it's present
in either client or server installations.  Eventually that packaging
will reach RHEL ...
   


We should make sure that the PGDG packages adopt that for 9.1 then, so 
it starts catching on more.  Unless Devrim has changed it since I 
last installed an RPM set, in the 9.0 packages it's still in the same place:


$ rpm -qf /usr/pgsql-9.0/bin/pg_config
postgresql90-devel-9.0.2-2PGDG.rhel5

While Peter's question about whether it's really all that useful is 
reasonable, I'd at least like to get a better error message when you 
don't have everything needed to compile extensions.  I think the 
shortest path to that is making pg_config more likely to be installed, 
then to check whether the file "pg_config --pgxs" references exists.  
I'll see if I can turn that idea into an actual change to propose.
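
The shape of the check I have in mind is roughly this sketch; the exact 
messages and where it hooks in are placeholders rather than a worked-out 
proposal:

#!/bin/bash
# Sketch only:  confirm the pieces PGXS needs are actually installed
# before trying to build an extension.

if ! command -v pg_config >/dev/null 2>&1; then
  echo "pg_config not found; install the PostgreSQL development package" >&2
  exit 1
fi

pgxs=`pg_config --pgxs`
if [ ! -f "$pgxs" ]; then
  echo "PGXS makefile $pgxs is missing; extensions cannot be built here" >&2
  exit 1
fi

echo "extension build support looks usable: $pgxs"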


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not install pgstattuple by default?

2011-05-07 Thread Greg Smith

On 05/07/2011 12:42 PM, Peter Eisentraut wrote:

On fre, 2011-05-06 at 14:32 -0400, Greg Smith wrote:
   

Given the other improvements in being able to build extensions in 9.1,
we really should push packagers to move pg_config from the PostgreSQL
development package into the main one starting in that version.  I've
gotten bit by this plenty of times.
 

Do you need pg_config to install extensions?
   


No, but you still need it to build them.  PGXN is a source code 
distribution method, not a binary one.  It presumes users can build 
modules they download using PGXS.  No pg_config, no working PGXS, no 
working PGXN.  For such a small binary to ripple out to that impact is bad.


The repmgr program we distribute has the same problem, so I've been 
getting first-hand reports of just how many people are likely to run 
into this recently.  You have to install postgresql-devel with RPM, and 
on Debian the very non-obvious postgresql-server-dev-$version package.
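
For anyone hitting this, the incantations are roughly the following; the 
version numbers are just examples, adjust them to whatever you're running:

# RHEL/CentOS with the PGDG repository packages
yum install postgresql90-devel

# Debian/Ubuntu
apt-get install postgresql-server-dev-9.0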


Anyway, didn't want to hijack this thread beyond pointing out that if 
there any package reshuffling that happens for contrib changes, it 
should check for and resolve this problem too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not install pgstattuple by default?

2011-05-06 Thread Greg Smith

On 05/06/2011 05:58 PM, Josh Berkus wrote:

Yeah, I wasn't thinking of including all of contrib. There's a lot of
reasons not to do that.  I was asking about pgstattuple in particular,
since it's:
(a) small
(b) has no external dependancies
(c) adds no stability risk or performance overhead
(d) is usually needed on production systems when it's needed at all

It's possible that we have one or two other diagnostic utilities which
meet the above profile. pageinspect, maybe?
   


I use pgstattuple, pageinspect, pg_freespacemap, and pg_buffercache 
regularly enough that I wish they were more common.  Throw in pgrowlocks 
and you've got the whole group Robert put into the debug set.  It makes 
me sad every time I finish a utility using one of these and realize I'll 
have to include the whole "make sure you have the contrib modules 
installed" disclaimer in its documentation again.


These are the only ones I'd care about moving into a more likely place.  
The rest of the contrib modules are the sort where if you need them, you 
realize that early and get them installed.  These are different by 
virtue of their need popping up most often during emergencies.  The fact 
that I believe they all match the low impact criteria too makes it even 
easier to consider.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-06 Thread Greg Smith

On 05/05/2011 05:06 AM, Mitsuru IWASAKI wrote:

In summary, PgFincore's target is File System Buffer Cache, Buffer
Cache Hibernation's target is DB Buffer Cache(shared buffers).
   


Right.  The thing to realize is that shared_buffers is becoming a 
smaller fraction of the total RAM used by the database every year.  On 
Windows it's been stuck at useful settings being less than 512MB for a 
while now.  And on UNIX systems, around 8GB seems to be the effective upper 
limit.  Best case, shared_buffers is only going to be around 25% of 
total RAM; worst-case, you might have a Windows server with 
64GB of RAM where shared_buffers is less than 1% of total RAM.


There's nothing wrong with the general idea you're suggesting.  It's 
just only targeting a small (and shrinking) subset of the real problem 
here.  Rebuilding cache state starts with shared_buffers, but that's not 
enough of the problem to be an effective tweak on many systems.


I think that all the complexity with CRCs etc. is unlikely to lead 
anywhere too, and those two issues are not completely unrelated.  The 
simplest, safest thing here is the right way to approach this, not the 
most complicated one, and a simpler format might add some flexibility 
here to reload more cache state too.  The bottleneck on reloading the 
cache state is reading everything from disk.  Trying to micro-optimize 
any other part of that is moving in the wrong direction to me.  I doubt 
you'll ever measure a useful benefit that overcomes the expense of 
maintaining the code.  And you seem to be moving to where someone can't 
restore cache state when they change shared_buffers.  A simpler 
implementation might still work in that situation; reload until you run 
out of buffers if shared_buffers shrinks, or stop once everything saved 
from the original size has been read back in if it grows.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not install pgstattuple by default?

2011-05-06 Thread Greg Smith

Christopher Browne wrote:

I'm getting "paper cuts" quite a bit these days over the differences
between what different packaging systems decide to install.  The one
*I* get notably bit on, of late, is that I have written code that
expects to have pg_config to do some degree of self-discovery, only to
find production folk complaining that they only have "psql" available
in their environment.
  


Given the other improvements in being able to build extensions in 9.1, 
we really should push packagers to move pg_config from the PostgreSQL 
development package into the main one starting in that version.  I've 
gotten bit by this plenty of times.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compiling a PostgreSQL 7.3.2 project with Eclipse

2011-05-06 Thread Greg Smith

Krešimir Križanović wrote:



However, where I compile and run the project, in the Eclipse console I 
get:


POSTGRES backend interactive interface

$Revision: 1.115 $ $Date: 2006/02/06 01:19:46 $

I don’t know what to do with this backend interface. I would like to 
get a postmaster running and to connect to a data base with psql. 
However, when i try to start psql, it says that there is no postmaster 
running.




An example of a session with that interface is at 
http://archives.postgresql.org/pgsql-hackers/2000-01/msg01471.php ; you 
may find it useful at some point.


Your problem is caused by a change made to PostgreSQL's naming 
convention after the 7.3 fork you're using. In earlier versions, 
"postgres" meant start the server in single user mode: 
http://www.postgresql.org/docs/7.3/static/app-postgres.html


While "postmaster" started it as a proper server: 
http://www.postgresql.org/docs/7.3/static/app-postmaster.html


In modern versions, they are the same thing. The Eclipse example uses 
"postgres", which starts the regular server now, but in 7.3 only started 
single user mode. Change where you run the program to use "postmaster" 
instead and it should work more like what you're expecting.
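
For example, from the command line it's the difference between these two; 
the data directory path and database name here are just placeholders:

# 7.3 era:  single-user backend, which is what Eclipse is launching now
postgres -D /usr/local/pgsql/data mydb

# 7.3 era:  the actual multi-user server you want here
postmaster -D /usr/local/pgsql/data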


Doing something useful with the TelegraphCQ code is probably going to 
take you a while.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Greg Smith

Alvaro Herrera wrote:

As for gain, I have heard of test setups requiring hours of runtime in
order to prime the buffer cache.
  


And production ones too.  I have multiple customers where a server 
restart is almost a planned multi-hour downtime.  The system may be back 
up, but for a couple of hours performance is so terrible it's barely 
usable.  You can watch the MB/s ramp up as the more random data fills in 
over time; getting that taken care of in a larger block more amenable to 
elevator sorting would be a huge help.


I never bothered with this particular idea though because shared_buffers 
is only a portion of the important data.  Cedric's pgfincore code digs 
into the OS cache, too, which can then save enough to be really useful 
here.  And that's already got a snapshot/restore feature.  The slides at 
http://www.pgcon.org/2010/schedule/events/261.en.html have a useful intro 
to that; pages 30 through 34 are the neat ones.  That provides some 
other neat APIs for preloading popular data into cache too.  I'd rather 
work on getting something like that into core, rather than adding 
something that only is targeting just shared_buffers.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate locking

2011-05-03 Thread Greg Smith

Kevin Grittner wrote:

Check where the plan goes from a table scan to an indexed access.
Also look at what is showing for SIRead locks in pg_locks as you go.
Between those two bits of information, it should become apparent.


OK, so this doesn't look to be an index lock related thing at all here.  
Updated test case does this to create the table and show some additional 
state:


drop table t;
create table t (id bigint, value bigint);
insert into t(id,value) (select s,1 from generate_series(1,348) as s);
create index t_idx on t(id);
begin transaction;
set transaction isolation level serializable;
explain analyze select * from t where id = 2;
select pid,locktype,relation::regclass,page,tuple from pg_locks where 
mode='SIReadLock';

insert into t (id, value) values (-2, 1);
select pid,locktype,relation::regclass,page,tuple from pg_locks where 
mode='SIReadLock';


Do the same thing as before on the second process:

begin transaction;
set transaction isolation level serializable;
select * from t where id = 3;
insert into t (id, value) values (-3, 0);
commit;

Then return to the first client to commit.  When I execute that with 348 
records, the case that fails, it looks like this:


gsmith=# explain analyze select * from t where id = 2;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..6.35 rows=2 width=16) (actual time=0.106..0.286 rows=1 loops=1)
   Filter: (id = 2)
 Total runtime: 0.345 ms
(3 rows)

gsmith=# select pid,locktype,relation::regclass,page,tuple from pg_locks 
where mode='SIReadLock';

 pid  | locktype | relation | page | tuple
------+----------+----------+------+-------
 1495 | relation | t        |      |

So it's actually grabbing a lock on the entire table in that situation.  
The other client does the same thing, and they collide with the 
described serialization failure.


The minute I try that with table that is 349 rows instead, it switches 
plans:


gsmith=# explain analyze select * from t where id = 2;
                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=4.27..6.29 rows=2 width=16) (actual time=0.169..0.171 rows=1 loops=1)
   Recheck Cond: (id = 2)
   ->  Bitmap Index Scan on t_idx  (cost=0.00..4.27 rows=2 width=0) (actual time=0.144..0.144 rows=1 loops=1)
         Index Cond: (id = 2)
 Total runtime: 0.270 ms
(5 rows)

gsmith=# select pid,locktype,relation::regclass,page,tuple from pg_locks 
where mode='SIReadLock';

 pid  | locktype | relation | page | tuple
------+----------+----------+------+-------
 1874 | page     | t_idx    |    1 |
 1874 | tuple    | t        |    0 |     2
(2 rows)

Grabbing a lock on the index page and the row, as Dan explained it 
would.  This actually eliminates this particular serialization failure 
altogether here though, even with these still on the same table and 
index page.


So the root problem with Vlad's test isn't the index lock at all; it's 
heavy locking from the sequential scan that's executing on the trivial 
cases.  If he expands his tests to use a larger amount of data, such 
that the plan switches to a realistic one, his results with the new 
serialization mode may very well be more satisfying.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate locking

2011-05-03 Thread Greg Smith

Kevin Grittner wrote:

I don't think Vlad is being unreasonable here; he's provided a
test case demonstrating the behavior he'd like to see, and shown
it doesn't work as expected.

 
... on a toy table with contrived values.  How different is this
from the often-asked question about why a query against a four-line
table is not using the index they expect, and how can we expect it
to scale if it doesn't?


It's not, but in that case I've been known to show someone that the 
behavior they're seeing doesn't happen on a larger table.  My point was 
just that no one has really done that here yet:  provided an example 
showing SSI serialization working as a substitute for predicate locking 
in this sort of use case.  I trust that the theory is sound here, and 
yet I'd still like to see that demonstrated.  This is why I launched 
into making a less trivial test case.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate locking

2011-05-03 Thread Greg Smith

Dan Ports wrote:

Yes, you're right -- the current implementation of SSI only locks
indexes at the granularity of index pages. So although those
transactions don't actually access the same records, they're detected
as a conflict because they're on the same index page.


Let's try to demonstrate that with an update to Vlad's example.  Run 
this on a first client to generate the same type of table, but with an 
easy to vary number of rows in it:


drop table t;
create table t (id bigint, value bigint);
insert into t(id,value) (select s,1 from generate_series(1,348) as s);
create index t_idx on t(id);
begin transaction;
set transaction isolation level serializable;
select * from t where id = 2;
insert into t (id, value) values (-2, 1);

Execute this on the second client:

begin transaction;
set transaction isolation level serializable;
select * from t where id = 3;
insert into t (id, value) values (-3, 0);
commit;

And then return to the first one to run COMMIT.  I'm getting a 
serialization failure in that case.  However, if I increase the 
generate_series to create 349 rows (or more) instead, it works.  I don't 
see where it crosses a page boundary in table or index size going from 
348 to 349, so I'm not sure why I'm seeing a change happening there.  In 
both cases, there's only one non-header index block involved.


I don't think Vlad is being unreasonable here; he's provided a test case 
demonstrating the behavior he'd like to see, and shown it doesn't work 
as expected.  If we can prove that test does work on non-trivial sized 
tables, and that it only suffers from to-be-optimized broader locks than 
necessary, that would make a more compelling argument against the need 
for proper predicate locks.  I don't fully understand why this attempt of 
mine to do that is working the way it does, though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improvements to pgtune

2011-04-27 Thread Greg Smith

Daniel Farina wrote:

It seems like in general it lacks a feedback mechanism to figure out
settings
from workloads, instead relying on Greg Smith's sizable experience to
do some arithmetic and get you off the ground in a number of common cases.
  


To credit appropriately, the model used right now actually originated 
with a Josh Berkus spreadsheet, from before I was doing this sort of 
work full-time.  That's held up pretty well, but it doesn't fully 
reflect how I do things nowadays.  The recent realization that pgtune is 
actually shipping as a package for Debian/Ubuntu now has made realize 
this is a much higher profile project now, one that I should revisit 
doing a better job on.


Every time I've gotten pulled into discussions of setting parameters 
based on live monitoring, it's turned into a giant black hole--absorbs a 
lot of energy, nothing useful escapes from it.  I credit completely 
ignoring that idea altogether, and using the simplest possible static 
settings instead, as one reason I managed to ship code here that people 
find useful.  I'm not closed to the idea, just not optimistic it will 
lead anywhere useful.  That makes it hard to work on when there are so 
many obvious things guaranteed to improve the program that could be done 
instead.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improvements to pgtune

2011-04-27 Thread Greg Smith

Shiv wrote:
 On the program I hope to learn as much about professional software 
engineering principles as PostgreSQL. My project is aimed towards 
extending and hopefully improving upon pgtune. If any of you have some 
ideas or thoughts to share. I am all ears!!


Well, first step on the software engineering side is to get a copy of 
the code in a form you can modify.  I'd recommend grabbing it from 
https://github.com/gregs1104/pgtune ; while there is a copy of the 
program on git.postgresql.org, it's easier to work with the one on 
github instead.  I can push updates over to the copy on postgresql.org 
easily enough, and that way you don't have to worry about getting an 
account on that server.


There's a long list of suggested improvements to make at 
https://github.com/gregs1104/pgtune/blob/master/TODO


Where I would recommend getting started is doing some of the small items 
on there, some of which I have already put comments into the code about 
but just not finished yet.  Some examples:


-Validate against min/max
-Show original value in output
-Limit shared memory use on Windows (see notes on shared_buffers at 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more 
information)

-Look for postgresql.conf file using PGDATA environment variable
-Look for settings files based on path of the pgtune executable
-Save settings reference files for newer versions of PostgreSQL (right 
now I only target 8.4) and allow passing in the version you're configuring.


A common mistake made by GSOC students is to dive right in to trying to 
make big changes.  You'll be more successful if you get practice at 
things like preparing and sharing patches on smaller changes first.


At the next level, there are a few larger features that I would consider 
valuable that are not really addressed by the program yet:


-Estimate how much shared memory is used by the combination of 
settings.  See Table 17-2 at 
http://www.postgresql.org/docs/9.0/static/kernel-resources.html ; those 
numbers aren't perfect, and improving that table is its own useful 
project.  But it gives an idea how they fit together.  I have some notes 
at the end of the TODO file on how I think the information needed to 
produce this needs to be passed around the inside of pgtune.


-Use that estimate to produce a sysctl.conf file for one platform; Linux 
is the easiest one to start with.  I've attached a prototype showing how 
to do that, written in bash.


-Write a Python-TK or web-based front-end for the program.

Now that I know someone is going to work on this program again, I'll see 
what I can do to clean some parts of it up.  There are a couple of 
things it's easier for me to just fix rather than to describe, like the 
way I really want to change how it adds comments to the settings it changes.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


#!/bin/bash

# Output lines suitable for sysctl configuration based
# on total amount of RAM on the system.  The output
# will allow up to 50% of physical memory to be allocated
# into shared memory.

# On Linux, you can use it as follows (as root):
# 
# ./shmsetup >> /etc/sysctl.conf
# sysctl -p

# Early FreeBSD versions do not support the sysconf interface
# used here.  The exact version where this works hasn't
# been confirmed yet.

page_size=`getconf PAGE_SIZE`
phys_pages=`getconf _PHYS_PAGES`

if [ -z "$page_size" ]; then
  echo Error:  cannot determine page size
  exit 1
fi

if [ -z "$phys_pages" ]; then
  echo Error:  cannot determine number of memory pages
  exit 2
fi

shmall=`expr $phys_pages / 2`
shmmax=`expr $shmall \* $page_size` 

echo \# Maximum shared segment size in bytes
echo kernel.shmmax = $shmmax
echo \# Maximum total shared memory in pages
echo kernel.shmall = $shmall

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving the memory allocator

2011-04-26 Thread Greg Smith

On 04/25/2011 05:45 PM, Andres Freund wrote:

The profile I used in this case was:

pgbench -h /tmp/ -p5433 -s 30 pgbench -S -T 20
   


I'd suggest collecting data from running this with "-M prepared" at some 
point too, so that you can get a basic idea which of these allocations 
are avoided when using prepared statements.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] branching for 9.2devel

2011-04-25 Thread Greg Smith

On 04/25/2011 02:26 PM, Josh Berkus wrote:

Overall, I think the advantages to a faster/shorter CF cycle outweigh
the disadvantages enough to make it at least worth trying.  I'm willing
to run the first 1-week CF, as well as several of the others during the
9.2 cycle to try and make it work.
   


It will be interesting to see if it's even possible to get all the 
patches assigned a reviewer in a week.  The only idea I've come up with 
for making this idea more feasible is to really buckle down on the idea 
that all submitters should also be reviewing.  I would consider a fair 
policy to say that anyone who doesn't volunteer to review someone else's 
patch gets nudged toward the bottom of the reviewer priority list.  
Didn't offer to review someone else's patch?  Don't be surprised if you 
get bumped out of a week long 'fest.


This helps with two problems.  It helps fill in short-term reviewers, 
and in the long-term it makes submitters more competent.  The way things 
are setup now, anyone who submits a patch without having done a review 
first is very likely to get their patch bounced back; the odds of 
getting everything right without that background are low.  Ideally 
submitters might even start fixing their own patches without reviewer 
prompting, once they get into someone else's and realize what they 
didn't do.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fsync reliability

2011-04-25 Thread Greg Smith

On 04/23/2011 09:58 AM, Matthew Woodcraft wrote:

As far as I can make out, the current situation is that this fix (the
auto_da_alloc mount option) doesn't work as advertised, and the ext4
maintainers are not treating this as a bug.

See https://bugzilla.kernel.org/show_bug.cgi?id=15910
   


I agree with the resolution that this isn't a bug.  As pointed out 
there, XFS does the same thing, and this behavior isn't going away any 
time soon.  Leaving behind zero-length files in situations where 
developers tried to optimize away a necessary fsync happens.


Here's the part where the submitter goes wrong:

"We first added a fsync() call for each extracted file. But scattered 
fsyncs resulted in a massive performance degradation during package 
installation (factor 10 or more, some reported that it took over an hour 
to unpack a linux-headers-* package!) In order to reduce the I/O 
performance degradation, fsync calls were deferred..."


Stop right there; the slow path was the only one that had any hope of 
being correct.  It can actually slow things by a factor of 100X or more, 
worst-case.  "So, we currently have the choice between filesystem 
corruption or major performance loss":  yes, you do.  Writing files is 
tricky and it can either be slow or safe.  If you're going to avoid even 
trying to enforce the right thing here, you're really going to get 
really burned.


It's unfortunate that so many people are used to the speed you get in 
what has been the common situation for a while now, ext3 and cheap hard drives:  
all writes are cached unsafely, but the filesystem resists a few bad 
behaviors.  Much of the struggle where people say "this is so much 
slower, I won't put up with it" and try to code around it is futile, and 
it's hard to separate out the attempts to find such optimizations from 
the legitimate complaints.


Anyway, you're right to point out that the filesystem is not necessarily 
going to save anyone from some of the tricky rename situations even with 
the improvements made to delayed allocation.  They've fixed some of the 
worst behavior of the earlier implementation, but there are still 
potential issues in that area it seems.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fsync reliability

2011-04-25 Thread Greg Smith

On 04/24/2011 10:06 PM, Daniel Farina wrote:

On Thu, Apr 21, 2011 at 8:51 PM, Greg Smith  wrote:
   

There's still the "fsync'd a data block but not the directory entry yet"
issue as fall-out from this too.  Why doesn't PostgreSQL run into this
problem?  Because the exact code sequence used is this one:

open
write
fsync
close

And Linux shouldn't ever screw that up, or the similar rename path.  Here's
what the close man page says, from http://linux.die.net/man/2/close :
 

Theodore Ts'o addresses this *exact* sequence of events, and suggests
if you want that rename to definitely stick that you must fsync the
directory:

http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync
   


Not exactly.  That's talking about the sequence used for creating a 
file, plus a rename.  When new WAL files are being created, I believe 
the ugly part of this is avoided.  The path when WAL files are recycled 
using rename does seem to be the one with the most likely edge case.


The difficult case Tso's discussion is trying to satisfy involves 
creating a new file and then swapping it for an old one atomically.  
PostgreSQL never does that exactly.  It creates new files, pads them 
with zeros, and then starts writing to them; it also renames old files 
that are already of the correct length.  Combined with the fact that 
there are always fsyncs after writes to the files, and this case really 
isn't exactly the same as any of the others people are complaining about.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fsync reliability

2011-04-22 Thread Greg Smith

On 04/22/2011 09:32 AM, Simon Riggs wrote:

OK, that's good, but ISTM we still have a hole during
RemoveOldXlogFiles() where we don't fsync or open/close the file, just
rename it.
   


This is also something that many applications rely upon working as hoped 
for here, even though it's not technically part of POSIX.  Early 
versions of ext4 broke that, and it caused a giant outcry of 
complaints.  
http://www.h-online.com/open/news/item/Ext4-data-loss-explanations-and-workarounds-740671.html 
has a good summary.  This was broken on ext4 from around 2.6.28 to 
2.6.30, but the fix for it was so demanded that it's even been ported by 
the relatively lazy distributions to their 2.6.28/2.6.29 kernels.


There may be a small window for metadata issues here if you've put the 
WAL on ext2 and there's a crash in the middle of rename.  That factors 
into why any suggestions I make about using ext2 come with a load of 
warnings about the risk of not journaling.  It's hard to predict every 
type of issue that fsck might force you to come to terms with after a 
crash on ext2, and if there was a problem with this path I'd expect it 
to show up as something to be reconciled then.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fsync reliability

2011-04-22 Thread Greg Smith

Simon Riggs wrote:

We do issue fsync and then close, but only when we switch log files.
We don't do that as part of the normal commit path.
  


Since all these files are zero-filled before use, the space is allocated 
for them, and the remaining important filesystem layout metadata gets 
flushed during the close.  The only metadata that changes after 
that--things like the last access time--isn't important to the WAL 
functioning.  So the metadata doesn't need to be updated after a normal 
commit, it's already there.  There are two main risks when crashing 
while fsync is in the middle of executing a push out to physical 
storage: torn pages due to partial data writes, and other out of order 
writes.  The only filesystems where this isn't true are the copy on 
write ones, where the blocks move around on disk too.  But those all 
have their own more careful guarantees about metadata too.



The issue you raise above where "fsync is not safe for Write Ahead
Logging" doesn't sound good. I don't think what you've said has fully
addressed that yet. We could replace the commit path with O_DIRECT and
physically order the data blocks, but I would guess the code path to
durable storage has way too many bits of code tweaking it for me to
feel happy that was worth it.
  


As far as I can tell the CRC is sufficient protection against that.  
This is all data that hasn't really been committed being torn up here.  
Once you trust that the metadata problem isn't real, reordered writes 
are only going to destroy things that are in the middle of being 
flushed to disk.  A synchronous commit mangled this way will be rolled 
back regardless because it never really finished (fsync didn't return); 
an asynchronous one was never guaranteed to be on disk.


On many older Linux systems O_DIRECT is a less reliable code path than 
than write/fsync is, so you're right that isn't necessarily a useful 
step forward.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-04-22 Thread Greg Smith

Andrew Dunstan wrote:
Personally, I want pgindent installed in /usr/local/ or similar. That 
way I can have multiple trees and it will work in all of them without 
my having to build it for each. What I don't want is for the installed 
patched BSD indent to conflict with the system's indent, which is why 
I renamed it. If you still think that's a barrier to easy use, then I 
think we need a way to provide hooks in the makefiles for specifying 
the install location, so we can both be satisfied.


I don't think there's a conflict here, because the sort of uses I'm 
worried about don't want to install the thing at all; just to run it.  I 
don't really care what "make install" does because I never intend to run 
it; dumping into /usr/local is completely reasonable for the people who do.


Since there's no script I know of other than your prototype, I don't 
think repackaging is likely to break anything. That makes it worth 
doing *now* rather than later.


But frankly, I'd rather do without an extra script if possible.


Fine, the renaming bit I'm not really opposed to.  The odds there's 
anyone using this thing that isn't reading this message exchange are 
pretty low.  There is the documentation backport issue if you make any 
serious change to it though.  Maybe put the new version in another 
location, leave the old one where it is?


There's a fair number of steps to this though.  It's possible to 
automate them all such that running the program is trivial.  I don't 
know how we'd ever get that same ease of use without some sort of 
scripting for the whole process.  Could probably do it in a makefile 
instead, but I don't know if that's really any better.  The intersection 
between people who want to run this and people who don't have bash 
available is pretty slim I think.  I might re-write in Perl to make it 
more portable, but I think that will be at the expense of making it 
harder for people to tweak if it doesn't work out of the box.  More 
code, too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench \for or similar loop

2011-04-21 Thread Greg Smith

Kevin Grittner wrote:

I'm not clear on exactly what you're proposing there, but the thing
I've considered doing is having threads to try to keep a FIFO queue
populated with a configurable transaction mix, while a configurable
number of worker threads pull those transactions off the queue and...
  


This is like the beginning of an advertisement for how Tsung is useful 
for simulating complicated workloads.  The thought of growing pgbench to 
reach that level of capabilities makes my head hurt.


When faced with this same issue, the sysbench team decided to embed Lua 
as their scripting language; sample scripts:  
http://bazaar.launchpad.net/~sysbench-developers/sysbench/0.5/files/head:/sysbench/tests/db/


This isn't very well known because the whole MySQL community fracturing 
has impacted their ability to actually release this overhaul--seems like 
they spend all their time just trying to add support for each new engine 
of the month.  I don't even like Lua, yet this still seems like a much 
better idea than trying to build on top of the existing pgbench codebase.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench \for or similar loop

2011-04-21 Thread Greg Smith

Alvaro Herrera wrote:

Why do we have pgbench at all in the first place?  Surely we could
rewrite it in plpgsql with proper stored procedures.
  


pgbench gives you a driver program with the following useful properties:

1) Multiple processes are spawned and each gets its own connection
2) A time/transaction limit is enforced across all of the connections at 
once

3) Timing information is written to a client-side log file
4) The work of running the clients can happen on a remote system, so 
that it's possible to just test the server-side performance
5) The program is similar enough to any other regular client, using the 
standard libpq interface, that connection-related overhead should be 
similar to a real workload.


All of those have some challenges before you could duplicate them in a 
stored procedure context.


My opinion of this feature is similar to the one Aiden already 
expressed:  there's already so many ways to do this sort of thing using 
shell-oriented approaches (as well as generate_series) that it's hard to 
get too excited about implementing it directly in pgbench.  Part of the 
reason for adding the \shell and \setshell commands was to make tricky 
things like this possible without having to touch the pgbench code 
further.  I for example would solve the problem you're facing like this:


1) Write a shell script that generates the file I need
2) Call it from pgbench using \shell, passing the size it needs.  Have 
that write a delimited file with the data required.

3) Import the whole thing with COPY.
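
A rough sketch of the standalone version of that workflow follows; the 
table name, column list, file path, and row count are all placeholders, 
and inside an actual pgbench script the load step would just be a COPY 
statement following the \shell call:

#!/bin/bash
# gen_data.sh:  write a delimited data file of the requested size.
# A pgbench custom script could drive this with something like
#     \shell ./gen_data.sh 100000
# before loading the result.

rows=${1:-100000}
datafile=/tmp/pgbench_data.csv

# Crude check to avoid regenerating a file that is already there
if [ ! -s "$datafile" ]; then
  seq 1 "$rows" | awk '{printf "%d,1\n", $1}' > "$datafile"
fi

# Load it in one shot; client-side \copy avoids needing superuser rights
psql <<SQL
TRUNCATE t;
\copy t (id, value) from '$datafile' with csv
SQL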

And next thing you know you've even got the CREATE/COPY optimization as 
a possibility to avoid WAL, as well as the ability to avoid creating the 
data file more than once if the script is smart enough.


Sample data file generation can be difficult; most of the time I'd 
rather solve it in a general programming language.  The fact that simple 
generation cases could be done with the mechanism you propose is true.  
However, this only really helps cases that are too complicated to 
express with generate_series, yet not so complicated that you really 
want a full programming language to generate the data.  I don't think 
there's that much middle ground in that use case.


But if this is what you think makes your life easier, I'm not going to 
tell you you're wrong.  And I don't feel that your desire for this 
features means you must tackle a more complicated thing instead--even 
though what I personally would much prefer is something making this sort 
of thing easier to do in regression tests, too.  That's a harder 
problem, though, and you're only volunteering to solve an easier one 
than that.


Stepping aside from debate over usefulness, my main code concern is that 
each time I look at the pgbench code for yet another tacked on bit, it's 
getting increasingly creakier and harder to maintain.  It's never going 
to be a good benchmark driver program capable of really complicated 
tasks.  And making it try keeps piling on the risk of breaking it for 
its intended purpose of doing simple tests.  If you can figure out how 
to keep the code contortions to implement the feature under control, 
there's some benefit there.  I can't think of a unique reason for it; 
again, lots of ways to solve this already.  But I'd probably use it if 
it were there.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fsync reliability

2011-04-21 Thread Greg Smith

On 04/21/2011 04:26 AM, Simon Riggs wrote:

However, that begs the question of what happens with WAL. At present,
we do nothing to ensure that "the entry in the directory containing
the file has also reached disk".
   



Well, we do, but it's not obvious why that is unless you've stared at 
this for far too many hours.  A clear description of the possible issue 
you and Dan are raising showed up on LKML a few years ago:  
http://lwn.net/Articles/270891/


Here's the most relevant part, which directly addresses the WAL case:

"[fsync] is unsafe for write-ahead logging, because it doesn't really 
guarantee any _ordering_ for the writes at the hard storage level.  So 
aside from losing committed data, it can also corrupt structural 
metadata.  With ext3 it's quite easy to verify that fsync/fdatasync 
don't always write a journal entry.  (Apart from looking at the kernel 
code :-)


Just write some data, fsync(), and observe the number of writes in 
/proc/diskstats.  If the current mtime second _hasn't_ changed, the 
inode isn't written.  If you write data, say, 10 times a second to the 
same place followed by fsync(), you'll see a little more than 10 write 
I/Os, and less than 20."
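
For anyone who wants to reproduce that observation, something along these 
lines does the trick on Linux; the device name and test file location are 
placeholders for your own system:

#!/bin/bash
# Count write I/Os completed on one device while doing 10 small
# write+fsync cycles against the same file.  Assumes the test file
# lives on the device named below.
dev=sda
testfile=/tmp/fsync-test

writes() {
  awk -v d=$dev '$3 == d {print $8}' /proc/diskstats
}

before=`writes`
for i in `seq 1 10`; do
  dd if=/dev/zero of=$testfile bs=8192 count=1 conv=notrunc,fsync 2>/dev/null
done
after=`writes`

echo "write I/Os completed during test: `expr $after - $before`"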


There's a terrible hack suggested where you run fchmod to force the 
journal out in the next fsync that makes me want to track the poster 
down and shoot him, but this part raises a reasonable question.


The main issue he's complaining about here is a moot one for 
PostgreSQL.  If the WAL rewrites have been reordered but have not 
completed, the minute WAL replay hits the spot with a missing block the 
CRC32 will be busted and replay is finished.  The fact that he's 
assuming a database would have such a naive WAL implementation that it 
would corrupt the database if blocks are written out of order before the 
fsync call returns is one of the reasons this whole idea never got 
more traction--hard to get excited about a proposal whose fundamentals 
rest on an assumption that doesn't turn out to be true on real databases.


There's still the "fsync'd a data block but not the directory entry yet" 
issue as fall-out from this too.  Why doesn't PostgreSQL run into this 
problem?  Because the exact code sequence used is this one:


open
write
fsync
close

And Linux shouldn't ever screw that up, or the similar rename path.  
Here's what the close man page says, from 
http://linux.die.net/man/2/close :


"A successful close does not guarantee that the data has been 
successfully saved to disk, as the kernel defers writes. It is not 
common for a filesystem to flush the buffers when the stream is closed. 
If you need to be sure that the data is physically stored use fsync(2). 
(It will depend on the disk hardware at this point.)"


What this is alluding to is that if you fsync before closing, the close 
will write all the metadata out too.  You're busted if your write cache 
lies, but we already know all about that issue.


There was a discussion of issues around this on LKML a few years ago, 
with Alan Cox getting the good pull quote at 
http://lkml.org/lkml/2009/3/27/268 : "fsync/close() as a pair allows the 
user to correctly indicate their requirements."  While fsync doesn't 
guarantee that metadata is written out, and neither does close, kernel 
developers seem to all agree that fsync-before-close means you want 
everything on disk.  Filesystems that don't honor that will break all 
sorts of software.


It is of course possible there are bugs in some part of this code path, 
where a clever enough test case might expose a window of strange 
file/metadata ordering.  I think it's too weak of a theorized problem to 
go specifically chasing after though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-04-21 Thread Greg Smith
're not participating in the thing that needs the 
most help.  This is not a problem you make better with fuzzy management 
directives to be nicer to people.  There are real software engineering 
issues about how to ensure good code quality at its core.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-04-21 Thread Greg Smith

On 04/21/2011 12:39 PM, Robert Haas wrote:

In fact, I've been wondering if we shouldn't consider extending the
support window for 8.2 past the currently-planned December 2011.
There seem to be quite a lot of people running that release precisely
because the casting changes in 8.3 were so painful, and I think the
incremental effort on our part to extend support for another year
would be reasonably small.


The pending EOL for 8.2 is the only thing that keeps me sane when 
speaking with people who refuse to upgrade, yet complain that their 8.2 
install is slow.  This last month, that seems to be more than the usual 
number of "why does autovacuum suck so much?" complaints that would all 
go away with an 8.3 upgrade.  Extending the EOL is not doing any of these 
users a favor.  Every day that goes by when someone is on a version of 
PostgreSQL that won't ever allow in-place upgrade just makes the eventual 
dump and reload they face worse.  The time spent porting to 
8.3 is a one-time thing; the suffering you get trying to have a 2011 
sized database on 2006's 8.2 just keeps adding up the longer you 
postpone it.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-04-17 Thread Greg Smith

Andrew Dunstan wrote:
Now we could certainly make this quite a bit slicker. Apart from 
anything else, we should change the indent source code tarball so it 
unpacks into its own directory. Having it unpack into the current 
directory is ugly and unfriendly. And we should get rid of the "make 
clean" line in the install target of entab's makefile, which just 
seems totally ill-conceived.


I think the script I submitted upthread has most of the additional 
slickness needed here.  Looks like we both were working on documenting a 
reasonable way to do this at the same time the other day.  Having any 
program here rely on being able to write to /usr/local/bin, as your 
example did, makes this harder for people to run; that's why I built 
everything in the build tree and just pushed the appropriate directories 
into the PATH.


Since I see providing a script to automate this whole thing as the 
preferred way to make this easier, re-packaging the indent source 
tarball to extract to a directory doesn't seem worth the backwards 
compatibility trouble it will introduce.  Improving the entab makefile I 
don't have an opinion on.


It might also be worth setting it up so that instead of having to pass 
a path to a typedefs file on the command line, we default to a file 
sitting in, say, /usr/local/etc. Then you'd just be able to say 
"pgindent my_file.c".


OK, so I need to update my script to handle either indenting a single 
file, or doing all of them.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-04-17 Thread Greg Smith

Robert Haas wrote:

But it turns out that it doesn't really matter.  Whitespace or no
whitespace, if you don't read the diff before you hit send, it's
likely to contain some irrelevant cruft, whether whitespace changes or
otherwise.
  


Right.  Presuming that pgindent will actually solve anything leaps over 
two normally incorrect assumptions:


-That the main tree was already formatted with pgindent before you 
started, so no stray diffs will result from it touching things the 
submitter isn't even involved in.


-There are no larger code formatting or diff issues beyond spacing.

This has been a nagging loose end for a while, so I'd like to see 
pgindent's rough edges get sorted out so it's easier to use.  But 
whitespace errors because of bad editors are normally just a likely sign 
of a patch with bigger problems, rather than an isolated thing that can 
be fixed to make the submission good.  There is no substitute for the 
discipline of reading your own diff before submission.  I'll easily 
obsess over mine for an hour before I submit something major, and that 
time is always well spent.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-04-17 Thread Greg Smith

Andrew Dunstan wrote:
What makes you think it isn't possible to run pgindent? There are no 
secret incantations.


The first hit newbies find looking for info about pgindent is 
http://blog.hagander.net/archives/185-pgindent-vs-dash.html which sure 
looks like secret incantations to me.  The documentation 
src/tools/pgindent/README reads like a magic spell too:


   find . -name '*.[ch]' -type f -print | \
   egrep -v -f src/tools/pgindent/exclude_file_patterns | \
   xargs -n100 pgindent src/tools/pgindent/typedefs.list

And it doesn't actually work as written unless you've installed 
pgindent, entab/detab, and the specially patched NetBSD indent into the 
system PATH somewhere--unreasonable given that this may be executing on 
a source-only tree that has never been installed.  The fact that the 
documentation is only in the README and not with the rest of the code 
conventions isn't helping either.


The last time I tried to do this a few years ago I failed miserably and 
never came back.  I know way more about building software now though, 
and just got this to work for the first time.  Attached is a WIP wrapper 
script for running pgindent that builds all the requirements into 
temporary directories, rather than expecting you to install anything 
system-wide or into a PostgreSQL destination directory.  Drop this into 
src/tools/pgindent, make it executable, and run it from that directory.  
Should do the right thing on any system that has "make" as an alias for 
"gmake" (TODO to be better about that in the file, with some other 
nagging things).


When I just ran it against master I got a bunch of modified files, but 
most of them look like things that have been touched recently so I think 
it did the right thing.  A test of my work here from someone who isn't 
running this for the first time would be helpful.  If this works well 
enough, I think it would make a good helper script to include in the 
distribution.  The loose ends to fix I can take care of easily enough 
once basic validation is finished.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


#!/bin/bash -ex

# Presume that we're in the pgindent directory at the start.
# TODO this should switch to the directory where this script is located,
# in case someone calls src/tools/pgindent/run-pgindent

# Back to the base directory
pushd ../../..

# Sanity check we're in the right place
if [ ! -d src/tools/pgindent ] ; then
  echo run-pgindent can only be run from within the src/tools/pgindent directory, aborting
  popd
  exit 1
fi

echo pgindent setting up environment
wget ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz
mkdir -p indent
cd indent
zcat ../indent.netbsd.patched.tgz | tar xvf -
rm -f ../indent.netbsd.patched.tgz
make
INDENT_DIR=`pwd`
cd ..

pushd src/tools/entab
make
ln -s entab detab
ENTAB_DIR=`pwd`
popd

export PATH="$INDENT_DIR:$ENTAB_DIR:$PATH"

wget -O src/tools/pgindent/typedefs.list http://buildfarm.postgresql.org/cgi-bin/typedefs.pl

# This cleanup can only happen if there is already a makefile; assume
# that if there isn't, this tree is clean enough
if [ -f GNUmakefile ] ; then
  # TODO this may need to be "gmake" on some systems instead
  make maintainer-clean
fi

echo pgindent starting run
find . -name '*.[ch]' -type f -print | \
  egrep -v -f src/tools/pgindent/exclude_file_patterns | \
  xargs -n100 src/tools/pgindent/pgindent src/tools/pgindent/typedefs.list

# Cleanup of utilities built temporarily here
unlink src/tools/entab/detab
rm -rf indent

popd
echo pgindent run complete

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Greg Smith

Radosław Smogura wrote:
Yes, but, hmm... in Netbeans I had really long gaps (probably 8 
spaces, from tabs), so deeper "ifs", comments at the end of variables, 
went of out my screen. I really wanted to not format this, but 
sometimes I needed.


The guide at 
http://www.open-source-editor.com/editors/how-to-make-netbeans-use-tabs-for-indention.html 
seems to cover how to fix this in Netbeans.  You want it to look like 
that screen shot:  4 spaces per indent with matching tab size of 4, and 
"Expand Tabs to Spaces" unchecked.


Generally, if you look at the diff you've created, and your new code 
doesn't line up right with what's already there, that means the 
tab/space setup isn't quite right when you were editing.  Reading the 
diff is useful for catching all sorts of other issues, too, so it's just 
generally a good practice.  As Peter already mentioned, the big problem 
here is that you checked in a modified configure file.


I also note that you use C++ style "//" comments, which aren't allowed 
under the coding guidelines--even though they work fine on many common 
platforms.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-04-16 Thread Greg Smith

Joshua Berkus wrote:

Then you can say that politely and firmly with direct reference to the problem, 
rather than making the submitter feel bad.
  


That's exactly what happened.  And then you responded that it was 
possible to use a patch without fixing the formatting first.  That's not 
true, and those of us who do patch review are tired of even trying. 




Our project has an earned reputation for being rejection-happy curmudgeons.  
This is something I heard more than once at MySQLConf, including from one 
student who chose to work on Drizzle instead of PostgreSQL for that reason.  I 
think that we could stand to go out of our way to be helpful to first-time 
submitters.
  


I'll trade you anecdotes by pointing out that I heard from half a dozen 
business people that the heavy emphasis on quality control and standards 
was the reason they were looking into leaving MySQL derived 
distributions for PostgreSQL.


I've spent days of time working on documentation to help new submitters 
get their patches improved to where they meet this community's 
standards.  This thread just inspired another round of that.  What 
doesn't help is ever telling someone they can ignore those and still do 
something useful we're interested in.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Greg Smith

Robert Haas wrote:

The OP says that this patch maintains the WAL-before-data rule without any 
explanation of how it accomplishes that seemingly quite amazing feat.  I assume 
I'm going to have to read this patch at some point to refute this assertion, 
and I think that sucks.


I don't think you have to read any patch that doesn't follow the 
submission guidelines.  The fact that you do is a great contribution to 
the community.  But if I were suggesting how your time would be best 
spent improving PostgreSQL, "reviewing patches that don't meet coding 
standards" would be at the bottom of the list.  There's always something 
better for the project you could be working on instead.


I just added 
http://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned 
, recycling some existing text, adding some new suggestions.


I hope I got the tone of that text right.  The intention was to have a 
polite but clear place to point submitters to when their suggestion 
doesn't meet the normal standards here, such that they might even get 
bounced before even entering normal CommitFest review.  This MMAP patch 
looks like it has all 5 of the problems mentioned on that now more 
focused list.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Greg Smith

Tom Lane wrote:

* On the other side of the coin, I have seen many a patch that was
written to minimize the length of the diff to the detriment of
readability or maintainability of the resulting code, and that's *not*
a good tradeoff.
  


Sure, that's possible.  But based on the reviews I've done, I'd say that 
the fact someone is even aware that minimizing their diff is something 
important to consider automatically puts them far ahead of the average 
new submitter.  A high percentage of patches are ones where the 
submitter generates a diff and sends it without even looking at it.  
That a person would look at their diff and go too far in trying to 
make it small doesn't happen nearly as much.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Greg Smith

Joshua Berkus wrote:

Guys, can we *please* focus on the patch for now, rather than the formatting, 
which is fixable with sed?
  


Never, and that's not true.  Heikki was being nice; I wouldn't have even 
slogged through it long enough to ask the questions he did before 
kicking it back as unusable.  A badly formatted patch makes it 
impossible to evaluate whether the changes from a submission are 
reasonable or not without the reviewer fixing it first.  And you can't 
automate correcting it, it takes a lot of tedious manual work.  Start 
doing a patch review every CommitFest cycle and you very quickly realize 
it's not an ignorable problem.  And lack of discipline in minimizing 
one's diff is always a sign of other code quality issues.


Potential contributors to PostgreSQL should know that a badly formatted 
patch faces an automatic rejection, because no reviewer can work with it 
easily.  This fact is not a mystery; in fact it's documented at 
http://wiki.postgresql.org/wiki/Submitting_a_Patch :  "The easiest way 
to get your patch rejected is to make lots of unrelated changes, like 
reformatting lines, correcting comments you felt were poorly worded etc. 
Each patch should have the minimum set of changes required to fulfil the 
single stated objective."  I think I'll go improve that text 
next--something like "Ways to get your patch rejected" should be its own 
section.


The problem here isn't whether someone used an IDE or not, it's that 
this proves they didn't read their own patch before submitting it.  
Reading one's own diff and reflecting on what you've changed is one of 
the extremely underappreciated practices of good open-source software 
development.  Minimizing the size of that diff is perhaps the most 
important thing someone can do in order to make their changes to a piece 
of software better.  Not saying something that leads in that direction 
would be a disservice to the submitter.


P.S. You know what else I feel should earn an automatic rejection 
without any reviewer even looking at the code?  Submitting a patch that 
claims to improve performance and not attaching the test case you used, 
along with detailed notes about before/after tests on your own 
hardware.  A hand wave "it's faster" is never good enough, and it's 
extremely wasteful of our limited reviewer resources to try and 
duplicate what the submitter claimed.  Going to add something about that 
to the submission guidelines too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Single client performance on trivial SELECTs

2011-04-15 Thread Greg Smith
I did some more research into the memory allocation hotspots found in 
the profile, in regards to what MySQL has done in those areas.  (And by 
"research" I mean "went to the bar last night and asked Baron about 
it")  The issue of memory allocation being a limiter on performance has 
been nagging that community for long enough that the underlying malloc 
used can even be swapped with a LD_PRELOAD trick:  
http://dev.mysql.com/doc/refman/5.5/en/mysqld-safe.html#option_mysqld_safe_malloc-lib


Plenty of people have benchmarked that and seen a big difference between 
the implementations; some sample graphs:


http://locklessinc.com/articles/mysql_performance/
http://blogs.sun.com/timc/entry/mysql_5_1_memory_allocator
http://mysqlha.blogspot.com/2009/01/double-sysbench-throughput-with_18.html

To quote from the last of those, "Malloc is a bottleneck for sysbench 
OLTP readonly", so this problem is not unique to PostgreSQL.  As of 5.5 
the better builds are all defaulting to TCMalloc, which is interesting 
but probably not as useful because it's focused on improving 
multi-threaded performance:  
http://goog-perftools.sourceforge.net/doc/tcmalloc.html
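
Anyone who wants to poke at the same question on the PostgreSQL side can 
run essentially the same experiment, since LD_PRELOAD works on any 
dynamically linked binary.  Something along these lines (the library name 
and path are just an example; adjust for wherever tcmalloc or another 
allocator lands on your system):

    LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so pg_ctl start
    pgbench -S -n -c 1 -T 60 pgbench

The postmaster and every backend it forks inherit that environment, so 
the whole server ends up running on the alternate allocator.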


I'm not sure exactly what is useful to be learned from that specific 
work.  But it does suggest two things:  one, this is far from an easy 
thing to fix.  Two, the only reason MySQL does so well on it is because 
there was some focused work on it, taking a quite a while to accomplish, 
and involving many people.  Doing better for PostgreSQL is something I 
see as more of a long-term goal, rather than something it would be 
reasonable to expect quick progress on.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Single client performance on trivial SELECTs

2011-04-14 Thread Greg Smith

Heikki Linnakangas wrote:
In this case you could just use prepared statements and get rid of all 
the parser related overhead, which includes much of the allocations.


Trying that gives me around 9200 TPS instead of 5500 on my laptop, so a 
pretty big gain here.  Will have to include that in my next round of 
graphs across multiple client loads once I'm home again and can run 
easily on my server.
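
(For reference, that's just the same select-only test switched over to 
the prepared statement protocol, i.e. something like "pgbench -S -M 
prepared -n -c 1 -T 60 pgbench".)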


To provide a matching profile from the same system as the one I already 
submitted from, for archival sake, here's what the profile I get looks 
like with "-M prepared":


samples  %        image name       symbol name
33093 4.8518  postgres AllocSetAlloc
30012 4.4001  postgres hash_seq_search
27149 3.9803  postgres MemoryContextAllocZeroAligned
26987 3.9566  postgres hash_search_with_hash_value
25665 3.7628  postgres hash_any
16820 2.4660  postgres _bt_compare
14778 2.1666  postgres LockAcquireExtended
12263 1.7979  postgres AllocSetFreeIndex
11727 1.7193  postgres tas
11602 1.7010  postgres SearchCatCache
11022 1.6159  postgres pg_encoding_mbcliplen
10963 1.6073  postgres MemoryContextAllocZero
9296  1.3629  postgres MemoryContextCreate
8368  1.2268  postgres fmgr_isbuiltin
7973  1.1689  postgres LockReleaseAll
7423  1.0883  postgres ExecInitExpr
7309  1.0716  postgres     pfree

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Single client performance on trivial SELECTs

2011-04-14 Thread Greg Smith
This week several list regulars here waded into the MySQL Convention.  I 
decided to revisit PostgreSQL vs. MySQL performance using the sysbench 
program as part of that.  It's not important to what I'm going to 
describe to understand exactly what statements sysbench runs here or how 
to use it, but if anyone is curious I've got some more details about how 
I ran the tests in my talk slides at 
http://projects.2ndquadrant.com/talks  The program has recently gone 
through some fixes that make it run a bit better both in general and 
against PostgreSQL.  The write tests are still broken against 
PostgreSQL, but it now seems to do a reasonable job simulating a simple 
SELECT-only workload.  A fix from Jignesh recently made its way into the 
database generation side of the code that makes it less tedious to test 
with it too. 

The interesting part was how per-client scaling compared between the two 
databases; graph attached.  On my 8 core server, PostgreSQL scales 
nicely up to a steady 50K TPS.  I see the same curve, almost identical 
numbers, with PostgreSQL and pgbench--no reason to suspect sysbench is 
doing anything shady.  The version of MySQL I used hits around 67K TPS 
with innodb when busy with lots of clients.  That part doesn't bother 
me; nobody expects PostgreSQL to be faster on trivial SELECT statements 
and the gap isn't that big.


The shocking part was the single client results.  I'm used to seeing 
Postgres get around 7K TPS per core on those, which was the case here, 
and I never considered that an interesting limitation to think about 
before.  MySQL turns out to hit 38K TPS doing the same work.  Now that's 
a gap interesting enough to make me wonder what's going on.


Easy enough to exercise the same sort of single client test case with 
pgbench and put it under a profiler:


sudo opcontrol --init
sudo opcontrol --setup --no-vmlinux
createdb pgbench
pgbench -i -s 10 pgbench
psql -d pgbench -c "vacuum"
sudo opcontrol --start
sudo opcontrol --reset
pgbench -S -n -c 1 -T 60 pgbench
sudo opcontrol --dump ; sudo opcontrol --shutdown
opreport -l image:$HOME/pgwork/inst/test/bin/postgres

Here are the top calls, from my laptop rather than the server that I 
generated the graph against.  It does around 5.5K TPS with 1 client and 
10K with 2 clients, so same basic scaling:


samples  %        image name       symbol name
53548 6.7609  postgres AllocSetAlloc
32787 4.1396  postgres MemoryContextAllocZeroAligned
26330 3.3244  postgres base_yyparse
21723 2.7427  postgres hash_search_with_hash_value
20831 2.6301  postgres SearchCatCache
19094 2.4108  postgres hash_seq_search
18402 2.3234  postgres hash_any
15975 2.0170  postgres AllocSetFreeIndex
14205 1.7935  postgres _bt_compare
13370 1.6881  postgres core_yylex
10455 1.3200  postgres MemoryContextAlloc
10330 1.3042  postgres LockAcquireExtended
10197 1.2875  postgres ScanKeywordLookup
9312  1.1757  postgres MemoryContextAllocZero

I don't know nearly enough about the memory allocator to comment on 
whether it's possible to optimize it better for this case to relieve any 
bottleneck.  Might just get a small gain then push the limiter to the 
parser or hash functions.  I was surprised to find that's where so much 
of the time was going though.


P.S. When showing this graph in my talk, I pointed out that anyone who 
is making decisions about which database to use based on trivial SELECTs 
on small databases isn't going to be choosing between PostgreSQL and 
MySQL anyway--they'll be deploying something like MongoDB instead if 
that's the important metric.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgresql on multi-core CPU's: is this old news?

2011-04-06 Thread Greg Smith

On 04/05/2011 02:21 PM, Mischa Sandberg wrote:


Came across the following in a paper from Oct 2010. Was wondering is 
this is old news I missed in this group.


http://pdos.csail.mit.edu/papers/linux:osdi10.pdf

about Linux optimization on multi-core CPU's.



Only a little old; 
http://postgresql.1045698.n5.nabble.com/MIT-benchmarks-pgsql-multicore-up-to-48-performance-td3173545.html 
shows most of the obvious comments to be made about it.  There is more 
detail explaining why the hand-waving done in the paper about increasing 
NUM_LOCK_PARTITIONS is not a simple improvement at 
http://postgresql.1045698.n5.nabble.com/Lock-partitions-td1952557.html


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-31 Thread Greg Smith

On 03/24/2011 03:36 PM, Jim Nasby wrote:

On Mar 23, 2011, at 5:12 PM, Tom Lane wrote:
   

Robert Haas  writes:
 

It looks like the only way anything can ever get put on the free list
right now is if a relation or database is dropped.  That doesn't seem
too good.
   

Why not?  AIUI the free list is only for buffers that are totally dead,
ie contain no info that's possibly of interest to anybody.  It is *not*
meant to substitute for running the clock sweep when you have to discard
a live buffer.
 

Turns out we've had this discussion before: 
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01088.php and 
http://archives.postgresql.org/pgsql-hackers/2010-12/msg00689.php
   


Investigating this has been on the TODO list for four years now:

http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php

I feel that work in this area is blocked behind putting together a 
decent mix of benchmarks that can be used to test whether changes here 
are actually good or bad.  All of the easy changes to buffer allocation 
strategy, ones that you could verify by inspection and simple tests, 
were made in 8.3.  The stuff that's left has the potential to either 
improve or reduce performance, and which will happen is very workload 
dependent.


Setting up systematic benchmarks of multiple workloads to run 
continuously on big hardware is a large, boring, expensive problem that 
few can justify financing (except for Jim of course), and even fewer 
want to volunteer time toward.  This whole discussion of cache policy 
tweaks is fun, but I just delete all the discussion now because it's 
just going in circles without a good testing regime.  The right way to 
start is by saying "this is the benchmark I'm going to improve with this 
change, and it has a profiled hotspot at this point".


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2011 Eager MV implementation proposal

2011-03-31 Thread Greg Smith

On 03/31/2011 04:38 PM, AAMIR KHAN wrote:
I would like to implement an eager materialized view. An eager 
materialized view will be updated whenever the view changes. This is 
done with a system of triggers on all of the underlying tables.


Last summer someone worked on snapshot materialized views.  That didn't 
result in anything that could be committed.  If you wanted to work on 
trying to actually finish that, that might go somewhere useful.  There 
are a number of hard problems in getting a working implementation of 
materialized views that all get ignored by all of the student proposals 
we get, and what you're talking about doesn't address any of them.


You really should read all of the messages in the following threads:

http://archives.postgresql.org/pgsql-hackers/2010-04/msg00479.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00743.php
http://archives.postgresql.org/pgsql-hackers/2010-07/msg00396.php

And the following summaries:

http://wiki.postgresql.org/wiki/Materialized_Views_GSoC_2010
http://rhaas.blogspot.com/2010/04/materialized-views-in-postgresql.html

And then say how what you're suggesting fits into the issues raised last 
summer.  The theory and way to implement eager MVs are interesting 
problems.  But working on them won't lead toward code that can be 
committed to PostgreSQL this year.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Performance Farm Release

2011-03-27 Thread Greg Smith
I just dusted off this code and brought it back to current again.  
Basically a lot of reformatting the new performance farm parts to 
minimize their diff.  Once that was done, all of the other buildfarm 
client updates since then applied cleanly.


The result is now sitting as a fork of Andrew's client code repo at 
https://github.com/greg2ndQuadrant/client-code , replacing the repo 
Scott published at https://github.com/slux/PostgreSQL-Performance-Farm ; 
much easier to avoid future bit-rot with this structure.


The main changes made here are now pretty easy to read on github:  
https://github.com/greg2ndQuadrant/client-code/commit/d0339a59ceb4711a6b042d3f1d053c77f07720f4 
and I've attached the code as a patch here.  It also uses some scripts 
from pgbench-tools that aren't the interesting part.


I got a nibble from Endpoint at PGEast about interest in hacking the 
buildfarm server code to add support for a new test type, so this may 
start moving forward again if that works out.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


diff --git a/build-farm.conf b/build-farm.conf
index 77ff694..fc4b103 100644
--- a/build-farm.conf
+++ b/build-farm.conf
@@ -39,6 +39,16 @@ my $branch;
 	 build_root => '/path/to/buildroot',
 	 use_vpath => undef, # set true to do vpath builds
 
+	 # Performance farm testing
+	 make_perf => 1, # set true to make performance farms, must keep on if not doing a build farm	 
+	 clients => [qw(8 12 24)],
+	 maxworkers => '12',
+	 sets => '1',
+	 scale => [qw(1 10)],
+	 scriptvar => '-S',
+	 transvar => '-T 60',
+	 pgopts => '-l',
+
  # path to directory with auxiliary web script 
  # if relative, the must be relative to buildroot/branch
 	 # possibly only necessary now on WIndows, if at all
@@ -150,6 +160,18 @@ my $branch;
 		 --with-openssl		 
 		 )],
 
+	# Options for performance tests		 
+	perf_opts => 
+	 [qw(
+		 --enable-nls
+		 --enable-integer-datetimes
+		 --with-perl 
+		 --with-python 
+		 --with-tcl
+		 --with-krb5 --with-includes=/usr/include/et
+		 --with-openssl		 
+		 )],
+
 	 # per-branch contents of extra config for check stage
 	 # each branch has an array of setting lines (no \n required)
 	 extra_config => 
diff --git a/run_build.pl b/run_build.pl
index 666268f..81487af 100755
--- a/run_build.pl
+++ b/run_build.pl
@@ -139,6 +139,14 @@ my ($buildroot,$target,$animal, $print_success, $aux_path, $trigger_filter,
 my  $scm_timeout_secs = 
 $PGBuild::conf{scm_timeout_secs} || $PGBuild::conf{cvs_timeout_secs}; 
 
+my $make_perf = $PGBuild::conf{make_perf};
+my $clients = $PGBuild::conf{clients};
+my $maxworkers = $PGBuild::conf{maxworkers};
+my $sets = $PGBuild::conf{sets};
+my $scale = $PGBuild::conf{scale};
+my $scriptvar= $PGBuild::conf{scriptvar};
+my $pgopts = $PGBuild::conf {pgopts};
+
 print scalar(localtime()),": buildfarm run for $animal:$branch starting\n"
 	if $verbose;
 
@@ -152,6 +160,7 @@ if (ref($force_every) eq 'HASH')
 }
 
 my $config_opts = $PGBuild::conf{config_opts};
+my $perf_opts = $PGBuild::conf{perf_opts};
 my $scm = new PGBuild::SCM \%PGBuild::conf;
 
 my $buildport;
@@ -261,6 +270,9 @@ chdir $buildroot || die "chdir to $buildroot: $!";
 
 mkdir $branch unless -d $branch;
 
+my $bench ="$branch/bench";
+mkdir $bench unless -d $bench;
+
 chdir $branch || die "chdir to $buildroot/$branch";
 
 # rename legacy status files/directories
@@ -542,6 +554,15 @@ set_last('run.snap',$current_snap) unless $nostatus;
 
 my $started_times = 0;
 
+# Additional setup for performance farm
+my $perf_farm = 0;
+my $time = time;
+my $benchpath = "$bench/$time";
+if($make_perf)
+{
+	$perf_farm = 1;
+}
+
 # each of these routines will call send_result, which calls exit,
 # on any error, so each step depends on success in the previous
 # steps.
@@ -593,6 +614,13 @@ foreach my $locale (@locales)
 
 	make_install_check($locale);
 
+	if($perf_farm)
+	{
+		system("mkdir $buildroot/$benchpath");
+		print time_str(),"running pgbench analysis ...\n" if $verbose;
+		benchmark();
+	}	
+
 	# releases 8.0 and earlier don't support the standard method for testing 
 	# PLs so only check them for later versions
 
@@ -739,6 +767,218 @@ sub check_make
 	return 'OK';
 }
 
+sub benchmark
+{
+	return if $skip_steps{benchmark};
+	my $script;
+	my @scripts;
+	my @path;
+	chdir "$installdir/bin/";
+	system("./createdb pgbench");
+	my ($scount,$setcount,$ccount,$workers);
+	my $shorthost;
+	foreach $scount (@$scale)
+	{
+		my @benchout = `./pgbench -i -s $scount pgbench >$devnull  2>&1`;
+		for($setcount = 0; $setcount<$sets; $setcount++)
+		{
+			foreach $ccount (@$clients)
+			{
+my $transvar = $PGBuild::conf{transvar};
+if

Re: [HACKERS] sync rep design architecture (was "disposition of remaining patches")

2011-02-25 Thread Greg Smith

Daniel Farina wrote:

Server A syncreps to Server B

Now I want to provision server A-prime, which will eventually take the
place of A.

Server A syncreps to Server B
Server A syncreps to Server A-prime

Right now, as it stands, the syncrep patch will be happy as soon as
the data has been fsynced to either B or A-prime; I don't think we can
guarantee at any point that A-prime can become the leader, and feed B.
  


One of the very fundamental breaks between how this patch implements 
sync rep and what some people might expect is this concern.  Having such 
tight control over the exact order of failover isn't quite here yet, so 
sometimes people will need to be creative to work within the 
restrictions of what is available.  The path for this case is probably:


1) Wait until A' is caught up
2) Switchover to B as the right choice to be the new master, with A' as 
its standby and A going off-line at the same time.
3) Switchover the master role from B to A'.  Bring up B as its standby. 


There are other possible transition plans available too.

I appreciate that you would like to do this as an atomic operation, 
rather than handling it as two steps--one of which puts you in a middle 
point where B, a possibly inferior standby, is operating as the master.  
There are a dozen other complicated "my use case says I want  and it 
must be done as " requests for Sync Rep floating around here, too.  
They're all getting ignored in favor of something smaller that can get 
built today. 

The first question I'd ask is whether you could settle for this 
switchover plan, more cumbersome than you'd prefer, for now.  The second 
is whether implementing what this feature currently does would get in the 
way of coding what you really want eventually. 

I didn't get the Streaming Rep + Hot Standby features I wanted in 9.0 
either.  But committing what was reasonable to include in that version 
let me march forward with very useful new code, doing another year of 
development on my own projects and getting some new things fixed in 
core.  And so far it looks like 9.1 will sort out all of the kinks I was 
unhappy about.  The same sort of thing will need to happen to get Sync 
Rep committed and then appropriate for more use cases.  There isn't any 
margin left for discussions of scope creep here; really it's "is 
this subset useful for some situations and stable enough to commit" now.



2. The unprivileged user can disable syncrep, in any situation. This
flexibility is *great*, but you don't really want people to do it when
one is performing the switchover.


For the moment you may have to live with a situation where user 
connections must be blocked during the brief moment of switchover to 
eliminate this issue.  That's what I end up doing with 9.0 production 
systems to get a really clean switchover, there's a second of hiccup 
even in the best case.  I'm not sure yet of the best way to build a 
UI to make that more transparent in the sync rep case.  It's sure not a 
problem that's going to get solved in this release though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disposition of remaining patches

2011-02-23 Thread Greg Smith

Robert Haas wrote:

2. Synchronous replication.  Splitting up this patch has allowed some


This has gotten a bunch of review, on several different threads.  I
assume Simon will publish an update when he gets back to his
keyboard...
  


That was the idea.  If anyone has any serious concerns about the current 
patch, please don't hold off just because you know Simon is away for a 
bit.  We've been trying to keep that from impacting community progress 
too badly this week.


On top of 4 listed reviewers I know Dan Farina is poking at the last 
update, so we may see one more larger report on top of what's already 
shown up.  And Jaime keeps kicking the tires too.  What Simon was hoping 
is that a week of others looking at this would produce enough feedback 
that it might be possible to sweep the remaining issues up soon after 
he's back.  It looks to me like that's about when everything else that's 
still open will probably settle too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Greg Smith

Daniel Farina wrote:

As it will be somewhat hard to prove the durability guarantees of
commit without special heroics, unless someone can suggest a
mechanism.


Could you introduce a hack creating deterministic server side crashes in 
order to test this out?  The simplest thing that comes to mind is a rule 
like "kick shared memory in the teeth to force a crash after every 100 
commits", then see if #100 shows up as expected.  Pick two different 
small numbers for the interval and you could probably put that on both 
sides to simulate all sorts of badness.
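
Just to sketch the sort of throwaway hack I have in mind--using a plain 
PANIC rather than literally stomping on shared memory, but the effect of 
a full crash and recovery cycle is the same for this purpose, and exactly 
where to wedge it into the commit path is left as an exercise:

    /* force a crash/recovery cycle every N commits; never to be committed */
    #define CRASH_TEST_EVERY 100

    static int crash_test_commits = 0;    /* per-backend counter */
    ...
    if (++crash_test_commits >= CRASH_TEST_EVERY)
        elog(PANIC, "crash test: dying after %d commits", crash_test_commits);

Run that with one interval on the master and a different one on the 
standby and you should get repeatable badness on both sides.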


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Debian readline/libedit breakage

2011-02-16 Thread Greg Smith

Andrew Dunstan wrote:
You're assuming a fact not in evidence, namely the existence of an 
identifiable group of "libedit folks". Last time I looked there was no 
such group.


There appear to be two people working periodically on the upstream 
NetBSD libedit:  
http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/?sortby=date


And a third who periodically packages that at 
http://www.thrysoee.dk/editline/


Those are the group as far as I can tell. 

It's not encouraging that the Debian issue with libedit+UTF8 has been 
documented for almost a year now:  
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579729


(And we shouldn't assume that GnuTLS is the right replacement for 
OpenSSL either, BTW).


The idea of using NSS instead is an interesting one.  Looking at 
http://en.wikipedia.org/w/index.php?title=Comparison_of_TLS_Implementations 
it does seem to match the basic feature set of OpenSSL.  And the 
nss_compat_ossl compatibility layer might be useful: 
http://fedoraproject.org/wiki/Nss_compat_ossl


I find it hard to get excited about working to replace the software that 
has a reasonable license here (readline) rather than trying to eliminate 
dependence on the one with an unreasonable license (OpenSSL).


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-16 Thread Greg Smith

Tom Lane wrote:

I think that might be a good idea --- it'd reduce the cross-platform
variability of the results quite a bit, I suspect.  random() is not
to be trusted everywhere, but I think erand48 is pretty much the same
wherever it exists at all (and src/port/ provides it elsewhere).
  


Given that pgbench will run with threads in some multi-worker 
configurations, after some more portability research I think odds are 
good we'd get nailed by 
http://sourceware.org/bugzilla/show_bug.cgi?id=10320 : "erand48 
implementation not thread safe but POSIX says it should be".  The AIX 
docs have a similar warning on them, so who knows how many versions of 
that library have the same issue.


Maybe we could make sure the one in src/port/ is thread safe and make 
sure pgbench only uses it.  This whole area continues to be messy enough 
that I think the patch needs to brew for another CF before it will all 
be sorted out properly.  I'll mark it accordingly and can pick this back 
up later.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Debian readline/libedit breakage

2011-02-14 Thread Greg Smith

Markus Wanner wrote:

Anybody realized that this Debian bug (and several others) got closed in
the mean time (Sunday)?  According to the changelog [1], Martin Pitt
(which I'm CC'ing here, as he might not be aware of this thread, yet)
worked around this issue by pre-loading readline via LD_PRELOAD for psql.

Personally, I'm a bit suspicious about that solution (technically as
well as from a licensing perspective), but it's probably the simplest
way to let only psql link against readline.
  


This originated in 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=608442 , and from what 
I'm reading there it sounds like Martin is inserting this as a 
workaround but it hasn't passed through Debian Legal yet.  I would 
expect them to reject this as unacceptable.  Dynamic linking via 
LD_PRELOAD is still linking, even if it happens at runtime.  I commend 
Martin for buying some time here by doing that, but this doesn't change 
the urgency to come up with an alternate solution much to me.  As I see 
it, that change could be reverted at any time via pushback from legal.
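
For reference, the workaround boils down to something like this (the 
exact library path and where Debian hooks it in are details I haven't 
verified; this is just to show the mechanism):

    LD_PRELOAD=/lib/libreadline.so.6 psql

psql itself still links against libedit; the GPL'd readline just gets 
pulled into the process image first at run time, which is exactly the 
sort of dynamic linking that is still linking.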


As far as working around this by releasing our own packages goes, that's 
useful, but I'd also characterize that as only a workaround rather than 
a real solution.  OpenSSL is open-source, but it's not "free software" 
by the standards of the FSF, which I feel is a completely reasonable 
position given the license.  When you depend on a software stack built 
from unambiguously free software, having components that aren't free 
wedged in there and depended on is never a good idea.  I won't 
consider this truly resolved until GnuTLS support for PostgreSQL is in core.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Debian readline/libedit breakage

2011-02-12 Thread Greg Smith

Dimitri Fontaine wrote:

Now, what I think I would do about the core package is a quite simple
backport of them, using Martin's excellent work.  Do we want our own QA
on them?  If yes, I think I would need some help here, maybe with some
build farm support for running from our debian packages rather than from
either CVS or git.
  


What the RPM packaging does is run this (approximately):

   pushd src/test/regress
   make all
   make MAX_CONNECTIONS=5 check

Something similar might be sufficient for QA on the Debian packaging 
too.  The overhead of the buildfarm makes sense if you want to rebuild 
after every single commit.  It may be overkill to go through that just 
for testing .deb packaging, which realistically you're only going to 
want to do after each point release.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Debian readline/libedit breakage

2011-02-11 Thread Greg Smith

charles.mcdev...@emc.com wrote:

The GNU people will never be 100% satisfied by anything you do to psql, other 
than making it GPL.
Readline is specifically licensed in a way to try to force this (but many 
disagree with their ability to force this).
  


The "GNU people" are perfectly content with the license of PostgreSQL.  
They are unhappy with the license terms of OpenSSL, which is fair 
because they are ridiculous.  Eric Young and the rest of the 
contributors produced a useful piece of software, and made it considerably 
less valuable to the world due to the ego trip terms:  
http://www.openssl.org/source/license.html -- the worst specific problem 
is the requirement to acknowledge OpenSSL use in advertising of projects 
that use it.


The PostgreSQL community has had similar issues with popular software 
commonly used on top of PostgreSQL, that happened to use a non-standard 
license with unique terms.  It would be both hypocritical and incorrect 
to now blame the GNU projects for taking a similar stand on this one.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Debian readline/libedit breakage

2011-02-11 Thread Greg Smith

Stephen Frost wrote:

-Adding GnuTLS support to PostgreSQL would require solving several
code quality issues



I'm curious about this, but I don't know that I've got time to dive into
it and solve it. :/
  


Note that the past discussion was on the difficulty of matching the 
existing OpenSSL API using GnuTLS, which is apparently difficult to do.  
I wasn't trying to suggest there were issues specificially with GnuTLS's 
code quality.  It's more that the APIs are just different enough that 
it's not trivial to do a swap--which is surprising given how many people 
have seemingly needed to do exactly this conversion.  You'd think 
there'd be a simple "OpenSSL-like" interface available for GnuTLS by now 
or something.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



Re: [HACKERS] Debian readline/libedit breakage

2011-02-11 Thread Greg Smith
-Known and serious bugs/limitations in libedit relative 
to libreadline
-Adding GnuTLS support to PostgreSQL would require solving several code 
quality issues


Ideologically, I find the worst offender here to be the OpenSSL 
license.  From a license purity perspective I'd like to see their 
ridiculous requirements bypassed altogether by doing whatever is 
necessary to get GnuTLS support working.  But pragmatically, fixing the 
bugs and adding features to libedit may be the easier route here.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Spread checkpoint sync

2011-02-10 Thread Greg Smith
e/16384/16752 time=0.002 msec
2011-02-10 02:54:09 EST: DEBUG:  checkpoint sync: number=19 
file=base/16384/16761_fsm time=0.001 msec
2011-02-10 02:54:09 EST: DEBUG:  checkpoint sync: number=20 
file=base/16384/16749_vm time=0.001 msec
2011-02-10 02:54:09 EST: DEBUG:  checkpoint sync: number=21 
file=base/16384/16385 time=0.001 msec
2011-02-10 02:54:09 EST: DEBUG:  checkpoint sync: number=22 
file=base/16384/16761.1 time=175.575 msec
2011-02-10 02:54:10 EST: LOG:  checkpoint complete: wrote 242614 buffers 
(46.3%); 0 transaction log file(s) added, 0 removed, 34 recycled; 
write=716.637 s, sync=54.659 s, total=772.976 s; sync files=22, 
longest=20.874 s, average=2.484 s


That's 12 minutes for the write phase, even though checkpoints should be 
happening every 5 minutes here.  With that bad of a write phase overrun, 
spread sync had no room to work, so no net improvement at all.  What is 
happening here is similar to the behavior I described seeing on my 
client system but didn't have an example to share until now.  During the 
write phase, looking at "Dirty:" in /proc/meminfo showed the value 
peaking at over 1GB while writes were happening, and eventually the 
background writer process wasn't getting any serious CPU time compared 
to the backends; this is what it looked like via ps:


%CPU  %MEM    TIME+  COMMAND
40     0     1:51.28  /home/gsmith/pgwork/inst/spread-sync/bin/pgbench -f /home/gsmith/pgbench-tools

28.1   0     0:39.71  postgres: gsmith pgbench ::1(43871) UPDATE
28     0     0:39.28  postgres: gsmith pgbench ::1(43875) UPDATE
28.1   0     0:39.92  postgres: gsmith pgbench ::1(43865) UPDATE
28.1   0     0:39.54  postgres: gsmith pgbench ::1(43868) UPDATE
28     0     0:39.36  postgres: gsmith pgbench ::1(43870) INSERT
28.1   0     0:39.47  postgres: gsmith pgbench ::1(43877) UPDATE
18     0     0:39.39  postgres: gsmith pgbench ::1(43864) COMMIT
18.1   0     0:39.78  postgres: gsmith pgbench ::1(43866) UPDATE
18     0     0:38.99  postgres: gsmith pgbench ::1(43867) UPDATE
18.1   0     0:39.55  postgres: gsmith pgbench ::1(43872) UPDATE
18.1   0     0:39.90  postgres: gsmith pgbench ::1(43873) UPDATE
18.1   0     0:39.64  postgres: gsmith pgbench ::1(43876) UPDATE
18.1   0     0:39.93  postgres: gsmith pgbench ::1(43878) UPDATE
18.1   0     0:39.83  postgres: gsmith pgbench ::1(43863) UPDATE
18     0     0:39.47  postgres: gsmith pgbench ::1(43869) UPDATE
18.1   0     0:40.11  postgres: gsmith pgbench ::1(43874) UPDATE
10     0     0:11.91  [flush-9:1]
0      0    27:43.75  [xfsdatad/6]
0      9.4   0:02.21  postgres: writer process
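
The Dirty: figure mentioned above is easy to track in another terminal 
while a test runs, with something along the lines of:

    watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'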

I want to make this problem go away, but as you can see spreading the 
sync calls around isn't enough.  I think the main write loop needs to 
get spread out more, too, so that the background writer is trying to 
work at a more reasonable pace.  I am pleased I've been able to 
reproduce this painful behavior at home using test data, because that 
much improves my odds of being able to isolate its cause and test 
solutions.  But it's a tricky problem, and I'm certainly not going to 
fix it in the next week.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-10 Thread Greg Smith

Stephen Frost wrote:

Just wondering, did you consider just calling random() twice and
smashing the result together..?
  


I did.  The problem is that even within the 32 bits that random() 
returns, it's not uniformly distributed.  Combining two of them isn't 
really going to solve the distribution problem, just move it around.  
Some number of lower-order bits are less random than the others, and 
which they are is implementation dependent.


Poking around a bit more, I just discovered another possible approach is 
to use erand48 instead of rand in pgbench, which is either provided by 
the OS or emulated in src/port/erand48.c.  That's way more resolution 
than needed here, given that 2^48 pgbench accounts would be a scale of 
2.8M, which makes for a database of about 42 petabytes.
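
Just to sketch the idea--this isn't the patch, and it ignores for the 
moment the thread-safety questions around the erand48 state array:

    #include <stdlib.h>
    #include <stdint.h>

    /* uniform random int64 in [min,max]; erand48() returns a double in
     * [0.0, 1.0) driven by 48 bits of state kept in the caller's array */
    static int64_t
    getrand64(int64_t min, int64_t max)
    {
        static unsigned short seed[3] = {0x330e, 0x1234, 0xabcd};   /* arbitrary */

        return min + (int64_t) ((max - min + 1) * erand48(seed));
    }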


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-09 Thread Greg Smith
oncerned about actually appear.  Unfortunately I can't fit all of those 
tests in right now, as throwing around one of these 300GB data sets is 
painful--when you're only getting 72 TPS, looking for large scale 
patterns in the transactions takes a long time to do.  For example, if I 
really wanted a good read on how bad the data distribution skew due to 
small random range is, I'd need to let some things run for a week just 
for a first pass.


I'd like to see this go in, but the problems I've spotted are such that 
I would completely understand this being considered not ready by 
others.  Just having this patch available here is a very useful step 
forward in my mind, because now people can always just grab it and do a 
custom build if they run into a larger system.


Wavering between Returned with Feedback and Ready for Committer here.  
Thoughts?


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 55ca1e8..5b9b582 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -60,6 +60,8 @@
 #define INT64_MAX	INT64CONST(0x7FFF)
 #endif
 
+#define MAX_RANDOM_VALUE64	INT64_MAX
+
 /*
  * Multi-platform pthread implementations
  */
@@ -364,15 +366,84 @@ usage(const char *progname)
 		   progname, progname);
 }
 
+/*
+ * strtoint64 -- convert a string to 64-bit integer
+ *
+ * This function is a modified version of scanint8() from
+ * src/backend/utils/adt/int8.c.
+ *
+ */
+static int64
+strtoint64(const char *str)
+{
+	const char *ptr = str;
+	int64		result = 0;
+	int			sign = 1;
+
+	/*
+	 * Do our own scan, rather than relying on sscanf which might be broken
+	 * for long long.
+	 */
+
+	/* skip leading spaces */
+	while (*ptr && isspace((unsigned char) *ptr))
+		ptr++;
+
+	/* handle sign */
+	if (*ptr == '-')
+	{
+		ptr++;
+
+		/*
+		 * Do an explicit check for INT64_MIN.	Ugly though this is, it's
+		 * cleaner than trying to get the loop below to handle it portably.
+		 */
+		if (strncmp(ptr, "9223372036854775808", 19) == 0)
+		{
+			result = -INT64CONST(0x7fff) - 1;
+			ptr += 19;
+			goto gotdigits;
+		}
+		sign = -1;
+	}
+	else if (*ptr == '+')
+		ptr++;
+
+	/* require at least one digit */
+	if (!isdigit((unsigned char) *ptr))
+		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+
+	/* process digits */
+	while (*ptr && isdigit((unsigned char) *ptr))
+	{
+		int64		tmp = result * 10 + (*ptr++ - '0');
+
+		if ((tmp / 10) != result)		/* overflow? */
+			fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+		result = tmp;
+	}
+
+gotdigits:
+
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+		ptr++;
+
+	if (*ptr != '\0')
+		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+
+	return ((sign < 0) ? -result : result);
+}
+
 /* random number generator: uniform distribution from min to max inclusive */
-static int
-getrand(int min, int max)
+static int64
+getrand(int64 min, int64 max)
 {
 	/*
 	 * Odd coding is so that min and max have approximately the same chance of
 	 * being selected as do numbers between them.
 	 */
-	return min + (int) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
+	return min + (int64) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
 }
 
 /* call PQexec() and exit() on failure */
@@ -887,7 +958,7 @@ top:
 		if (commands[st->state] == NULL)
 		{
 			st->state = 0;
-			st->use_file = getrand(0, num_files - 1);
+			st->use_file = (int) getrand(0, num_files - 1);
 			commands = sql_files[st->use_file];
 		}
 	}
@@ -1007,7 +1078,7 @@ top:
 		if (pg_strcasecmp(argv[0], "setrandom") == 0)
 		{
 			char	   *var;
-			int			min,
+			int64		min,
 		max;
 			char		res[64];
 
@@ -1019,15 +1090,15 @@ top:
 	st->ecnt++;
 	return true;
 }
-min = atoi(var);
+min = strtoint64(var);
 			}
 			else
-min = atoi(argv[2]);
+min = strtoint64(argv[2]);
 
 #ifdef NOT_USED
 			if (min < 0)
 			{
-fprintf(stderr, "%s: invalid minimum number %d\n", argv[0], min);
+fprintf(stderr, "%s: invalid minimum number " INT64_FORMAT "\n", argv[0], min);
 st->ecnt++;
 return;
 			}
@@ -1041,22 +1112,22 @@ top:
 	st->ecnt++;
 	return true;
 }
-max = atoi(var);
+max = strtoint64(var);
 			}
 			else
-max = atoi(argv[3]);
+max = strtoint64(argv[3]);
 
-			if (max < min || max > MAX_RANDOM_VALUE)
+			if (max < min || max > MAX_RANDOM_VALUE64)
 			{
-fprintf(stderr

[HACKERS] Allow pg_archivecleanup to ignore extensions

2011-02-07 Thread Greg Smith
One bit of feedback I keep getting from people who archive their WAL 
files is that the fairly new pg_archivecleanup utility doesn't handle 
the case where those archives are compressed.  As the sort of users who 
are concerned about compression are also often ones with giant archives 
they struggle to clean up, they would certainly appreciate having a 
bundled utility to take care of that. 

The attached patch provides an additional option to the utility to 
provide this capability.  It just strips a provided extension off any 
matching file it considers before running the test for whether it should 
be deleted or not.  It includes updates to the usage message and some 
docs about how this might be used.  Code by Jaime Casanova and myself.


Here's an example of it working:

$ psql -c "show archive_command"
 archive_command  


cp -i %p archive/%f < /dev/null && gzip archive/%f
[Yes, I know that can be written more cleanly.  I call external scripts 
with more serious error handling than you can put into a single command 
line for this sort of thing in production.]


$ psql -c "select pg_start_backup('test',true)"
$ psql -c "select pg_stop_backup()"
$ psql -c "checkpoint"
$ psql -c "select pg_switch_xlog()"

$ cd $PGDATA/archive
$ ls
00010025.gz
00010026.gz
00010027.gz
00010028.0020.backup.gz
00010028.gz
00010029.gz

$ pg_archivecleanup -d -x .gz `pwd` 00010028.0020.backup
pg_archivecleanup: keep WAL file 
"/home/gsmith/pgwork/data/archivecleanup/archive/00010028" 
and later
pg_archivecleanup: removing file 
"/home/gsmith/pgwork/data/archivecleanup/archive/00010025.gz"
pg_archivecleanup: removing file 
"/home/gsmith/pgwork/data/archivecleanup/archive/00010027.gz"
pg_archivecleanup: removing file 
"/home/gsmith/pgwork/data/archivecleanup/archive/00010026.gz"

$ ls
00010028.0020.backup.gz
00010028.gz
00010029.gz

We recently got some on-list griping that pg_standby doesn't handle 
archive files that are compressed, either.  Given how the job I'm 
working on this week is going, I'll probably have to add that feature 
next.  That's actually an easier source code hack than this one, because 
of how the pg_standby code modularizes the concept of a restore command.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


diff --git a/contrib/pg_archivecleanup/pg_archivecleanup.c b/contrib/pg_archivecleanup/pg_archivecleanup.c
index 7989207..a95b659 100644
*** a/contrib/pg_archivecleanup/pg_archivecleanup.c
--- b/contrib/pg_archivecleanup/pg_archivecleanup.c
*** const char *progname;
*** 36,41 
--- 36,42 
  
  /* Options and defaults */
  bool		debug = false;		/* are we debugging? */
+ char	   *additional_ext = NULL;	/* Extension to remove from filenames */
  
  char	   *archiveLocation;	/* where to find the archive? */
  char	   *restartWALFileName; /* the file from which we can restart restore */
*** static void
*** 93,105 
--- 94,135 
  CleanupPriorWALFiles(void)
  {
  	int			rc;
+ 	int			chop_at;
  	DIR		   *xldir;
  	struct dirent *xlde;
+ 	char		walfile[MAXPGPATH];
  
  	if ((xldir = opendir(archiveLocation)) != NULL)
  	{
  		while ((xlde = readdir(xldir)) != NULL)
  		{
+ 			strncpy(walfile, xlde->d_name, MAXPGPATH);
+ 			/* 
+ 			 * Remove any specified additional extension from the filename
+ 			 * before testing it against the conditions below.
+ 			 */
+ 			if (additional_ext)
+ 			{
+ chop_at = strlen(walfile) - strlen(additional_ext);
+ /*
+  * Only chop if this is long enough to be a file name and the
+  * extension matches.
+  */
+ if ((chop_at >= (XLOG_DATA_FNAME_LEN - 1)) && 
+ 	(strcmp(walfile + chop_at,additional_ext)==0))
+ {
+ 	walfile[chop_at] = '\0';
+ 	/* 
+ 	 * This is too chatty even for regular debug output, but
+ 	 * leaving it in for program testing.
+ 	 */
+ 	if (false)
+ 		fprintf(stderr, 
+ 			"removed extension='%s' from file=%s result=%s\n",
+ 			additional_ext,xlde->d_name,walfile);
+ }
+ 			}
+ 
  			/*
  			 * We ignore the timeline part of the XLOG segment identifiers in
  			 * deciding whether a segment is still needed.	This ensures that
*** CleanupPriorWALFiles(void)
*** 113,122 
  			 * file. Note that this means files are not removed in the order
  			 * they were originally written, in case this worries you.
  			 */
! 			if (strlen(xlde->d_name) == XLO

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Greg Smith

Kevin Grittner wrote:

There are occasional posts from those wondering why their read-only
queries are so slow after a bulk load, and why they are doing heavy
writes.  (I remember when I posted about that, as a relative newbie,
and I know I've seen others.)
  


Sure; I created http://wiki.postgresql.org/wiki/Hint_Bits a while back 
specifically to have a resource explaining that mystery to offer 
people.  But there's a difference between having a performance issue 
that people don't understand, and having a real bottleneck you can't get 
rid of.  My experience is that people who have hint bit issues run into 
them as a minor side-effect of a larger vacuum issue, and that if you 
get that under control they're only a minor detail in comparison.  Makes 
it hard to get too excited about optimizing them.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-07 Thread Greg Smith
The update on the work to push towards a bigger pgbench is that I now 
have the patch running and generating databases larger than any 
previously possible scale:


$ time pgbench -i -s 25000 pgbench
...
2500000000 tuples done.
...
real    258m46.350s
user    14m41.970s
sys     0m21.310s

$ psql -d pgbench -c "select 
pg_size_pretty(pg_relation_size('pgbench_accounts'));"

pg_size_pretty

313 GB

$ psql -d pgbench -c "select 
pg_size_pretty(pg_relation_size('pgbench_accounts_pkey'));"

pg_size_pretty

52 GB

$ time psql -d pgbench -c "select count(*) from pgbench_accounts"
  count   


2500000000

real    18m48.363s
user    0m0.010s
sys     0m0.000s

The only fix the already posted patch needed to reach this 
point was this line:


for (k = 0; k < naccounts * scale; k++)

Which needed a (int64) cast for the multiplied value in the middle there.
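
In other words, forcing the bound computation into 64-bit math, roughly:

	/* cast so naccounts * scale is evaluated as int64, not int */
	for (k = 0; k < (int64) naccounts * scale; k++)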

Unfortunately the actual test itself doesn't run yet.  Every line I see 
when running the SELECT-only test says:


client 0 sending SELECT abalance FROM pgbench_accounts WHERE aid = 1;

So something about the updated random generation code isn't quite right 
yet.  Now that I have this monster built, I'm going to leave it on the 
server until I can sort that out, which hopefully will finish up in the 
next day or so.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Greg Smith

Cédric Villemain wrote:

Is it worth a new thread with the different IO improvements done so
far or on-going and how we may add new GUC(if required !!!) with
intelligence between those patches ? ( For instance, hint bit IO limit
needs probably a tunable to define something similar to
hint_write_completion_target and/or IO_throttling strategy, ...items
which are still in gestation...)
  


Maybe, but I wouldn't bring all that up right now.  Trying to wrap up 
the CommitFest, too distracting, etc.


As a larger statement on this topic, I'm never very excited about 
redesigning here starting from any point other than "saw a bottleneck 
doing <something> on a production system".  There's a long list of such things 
already around waiting to be addressed, and I've never seen any good 
evidence of work related to hint bits being on it.  Please correct me if 
you know of some--I suspect you do from the way you're bringing this up.  
If we were to consider kicking off some larger work here, I would start 
by asking where the data is that shows the work is actually necessary.  
It's hard enough to fix a bottleneck that's staring right at you; trying 
to address one that's only theorized is impossible.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2011-02-06 Thread Greg Smith
pening then is that there can be a 
long pause between the end of the write phase and when syncs start to 
happen, which I consider a good thing.  Gives the kernel a little more 
time to try and get writes moving out to disk.  Here's what that looks 
like on my development desktop:


2011-02-07 00:46:24 EST: LOG:  checkpoint starting: time
2011-02-07 00:48:04 EST: DEBUG:  checkpoint sync:  estimated segments=10
2011-02-07 00:48:24 EST: DEBUG:  checkpoint sync: naps=99
2011-02-07 00:48:36 EST: DEBUG:  checkpoint sync: number=1 
file=base/16736/16749.1 time=12033.898 msec
2011-02-07 00:48:36 EST: DEBUG:  checkpoint sync: number=2 
file=base/16736/16749 time=60.799 msec

2011-02-07 00:48:48 EST: DEBUG:  checkpoint sync: naps=59
2011-02-07 00:48:48 EST: DEBUG:  checkpoint sync: number=3 
file=base/16736/16756 time=0.003 msec

2011-02-07 00:49:00 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:00 EST: DEBUG:  checkpoint sync: number=4 
file=base/16736/16750 time=0.003 msec

2011-02-07 00:49:12 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:12 EST: DEBUG:  checkpoint sync: number=5 
file=base/16736/16737 time=0.004 msec

2011-02-07 00:49:24 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:24 EST: DEBUG:  checkpoint sync: number=6 
file=base/16736/16749_fsm time=0.004 msec

2011-02-07 00:49:36 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:36 EST: DEBUG:  checkpoint sync: number=7 
file=base/16736/16740 time=0.003 msec

2011-02-07 00:49:48 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:48 EST: DEBUG:  checkpoint sync: number=8 
file=base/16736/16749_vm time=0.003 msec

2011-02-07 00:50:00 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:50:00 EST: DEBUG:  checkpoint sync: number=9 
file=base/16736/16752 time=0.003 msec

2011-02-07 00:50:12 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:50:12 EST: DEBUG:  checkpoint sync: number=10 
file=base/16736/16754 time=0.003 msec
2011-02-07 00:50:12 EST: LOG:  checkpoint complete: wrote 14335 buffers 
(43.7%); 0 transaction log file(s) added, 0 removed, 64 recycled; 
write=47.873 s, sync=127.819 s, total=227.990 s; sync files=10, 
longest=12.033 s, average=1.209 s


Since this is ext3 the spike during the first sync is brutal, anyway, 
but it tried very hard to avoid that:  it waited 99 * 200ms = 19.8 
seconds between writing the last buffer and when it started syncing them 
(00:48:04 to 00:48:24).  Given the slow write for #1, it was then 
behind, so it immediately moved onto #2.  But after that, it was able to 
insert a moderate nap time between successive syncs--60 naps is 12 
seconds, and it keeps that pace for the remainder of the sync.  This is 
the same sort of thing I'd worked out as optimal on the system this 
patch originated from, except it had a lot more dirty relations; that's 
why its naptime was the 3 seconds hard-coded into earlier versions of 
this patch.


Results on XFS with mini-server class hardware should be interesting...

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 4df69c2..f58ac3e 100644
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
*** static bool am_bg_writer = false;
*** 168,173 
--- 168,175 
  
  static bool ckpt_active = false;
  
+ static int checkpoint_flags = 0;
+ 
  /* these values are valid when ckpt_active is true: */
  static pg_time_t ckpt_start_time;
  static XLogRecPtr ckpt_start_recptr;
*** static pg_time_t last_xlog_switch_time;
*** 180,186 
  
  static void CheckArchiveTimeout(void);
  static void BgWriterNap(void);
! static bool IsCheckpointOnSchedule(double progress);
  static bool ImmediateCheckpointRequested(void);
  static bool CompactBgwriterRequestQueue(void);
  
--- 182,188 
  
  static void CheckArchiveTimeout(void);
  static void BgWriterNap(void);
! static bool IsCheckpointOnSchedule(double progress,double target);
  static bool ImmediateCheckpointRequested(void);
  static bool CompactBgwriterRequestQueue(void);
  
*** CheckpointWriteDelay(int flags, double p
*** 691,696 
--- 693,701 
  	if (!am_bg_writer)
  		return;
  
+ 	/* Cache this value for a later spread sync */
+ 	checkpoint_flags=flags;
+ 
  	/*
  	 * Perform the usual bgwriter duties and take a nap, unless we're behind
  	 * schedule, in which case we just try to catch up as quickly as possible.
*** CheckpointWriteDelay(int flags, double p
*** 698,704 
  	if (!(flags & CHECKPOINT_IMMEDIATE) &&
  		!shutdown_requested &&
  		!ImmediateCheckpointRequested() &&
! 		IsCheckpointOnSchedule(progress))
  	{
  		if (got_SIGHUP)
  		{
--- 703,709 
  	if (!(flags & CHECKPOINT_IMMEDIATE) &&

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2011-02-05 Thread Greg Smith

Tomas Vondra wrote:

Because when I create a database, the field is
NULL - that's true. But once I connect to the database, the stats are
updated and the field is set (thanks to the logic in pgstat.c).
  


OK--so it does what I was hoping for, I just didn't test it the right 
way.  Let's call that a documentation issue and move on.


Attached is an updated patch that fixes the docs and some other random 
bits.  Looks ready for committer to me now.  Make sure to adjust 
PGSTAT_FILE_FORMAT_ID, do a catversion bump, and set final OIDs for the 
new functions.


Below is what changed since the last posted version, mainly as feedback 
for Tomas:


-Explained more clearly that pg_stat_reset and 
pg_stat_reset_single_counters will both touch the database reset time, 
and that it's initialized upon first connection to the database.


-Added the reset time to the list of fields in pg_stat_database and 
pg_stat_bgwriter.


-Fixed some tab/whitespace issues.  It looks like you had tab stops set 
at 8 characters during some points when you were editing non-code 
files.  Also, there were a couple of spots where you used a tab while 
text in the area used spaces.  You can normally see both types of errors 
if you read a patch; they show up as misaligned things in the context 
diff.


-Removed some extra blank lines that didn't fit the style of the 
surrounding code.


Basically, all the formatting bits I'm nitpicking about I found just by 
reading the patch itself; they all stuck right out.  I'd recommend a 
pass of that before submitting things if you want to try and avoid those 
in the future.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ca83421..100f938 100644
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*** postgres: user 
   
  
--- 267,273 
by backends (that is, not by the background writer), how many times
those backends had to execute their own fsync calls (normally the
background writer handles those even when the backend does its own
!   write), total buffers allocated, and time of last statistics reset.
   
   
  
*** postgres: user 
   
  
--- 278,286 
number of transactions committed and rolled back in that database,
total disk blocks read, total buffer hits (i.e., block
read requests avoided by finding the block already in buffer cache),
!   number of rows returned, fetched, inserted, updated and deleted, the
total number of queries cancelled due to conflict with recovery (on
!   standby servers), and time of last statistics reset.
   
   
  
*** postgres: user 
  
   
+   pg_stat_get_db_stat_reset_time(oid)
+   timestamptz
+   
+Time of the last statistics reset for the database.  Initialized to the
+system time during the first connection to each database.  The reset time
+is updated when you call pg_stat_reset on the
+database, as well as upon execution of
+pg_stat_reset_single_table_counters against any
+table or index in it.
+   
+  
+ 
+  
pg_stat_get_numscans(oid)
bigint

*** postgres: user bgwriter_lru_maxpages parameter

   
+  
+  
+   pg_stat_get_bgwriter_stat_reset_time()
+   timestamptz
+   
+ Time of the last statistics reset for the background writer, updated
+ when executing pg_stat_reset_shared('bgwriter')
+ on the database cluster.
+   
+  
  
   
pg_stat_get_buf_written_backend()
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 718e996..904714f 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_stat_database AS
*** 523,529 
  pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
  pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
  pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
! pg_stat_get_db_conflict_all(D.oid) AS conflicts
  FROM pg_database D;
  
  CREATE VIEW pg_stat_database_conflicts AS
--- 523,530 
  pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
  pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
  pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
! pg_stat_get_db_conflict_all(D.oid) AS conflicts,
! pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
  FROM pg_database D;
  
  CREATE VIEW pg_stat_database_conflicts AS
*** CREATE VIEW pg_stat_bgwriter AS
*** 570,576 **

Re: [HACKERS] Linux filesystem performance and checkpoint sorting

2011-02-04 Thread Greg Smith

Josh Berkus wrote:

So: Linux flavor?  Kernel version?  Disk system and PG directory layout?
  


OS configuration and PostgreSQL settings are saved into the output from 
the later runs (I added that somewhere in the middle):


http://www.2ndquadrant.us/pgbench-results/294/pg_settings.txt

That's Ubuntu 10.04, kernel 2.6.32. 

There is a test rig bug that queries the wrong PostgreSQL settings in 
the later ones, but they didn't change after #294 here.  The kernel 
configuration stuff is accurate throughout, which confirms exactly which 
settings for the dirty_* parameters were in effect for each of the 
tests where I was changing those around.


16GB of RAM, 8 Hyperthreaded cores (4 real ones) via Intel i7-870.  
Areca ARC-1210 controller, 256MB of cache.


Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/sda1  40G  7.5G   30G  20% /
/dev/md1  838G   15G  824G   2% /stripe
/dev/sdd1 149G  2.1G  147G   2% /xlog

/stripe is a 3 disk RAID0, setup to only use the first section of the 
drive ("short-stroked").  That makes its performance a little more like 
a small SAS disk, rather than the cheapo 7200RPM SATA drives they 
actually are (Western Digital 640GB WD6400AAKS-65A7B).  /xlog is a 
single disk, 160GB WD1600AAJS-00WAA.  OS, server logs, and test results 
information all go to the root filesystem on a different drive.  My aim 
was to get similar performance to what someone with an 8-disk RAID10 
array might see, except without the redundancy.  Basic entry-level 
database server here in 2011.


bonnie++ on the main database disk:  read 301MB/s write 215MB/s, seeks 
423.4/second.  Measured around 10K small commits/second to prove the 
battery-backed write cache works fine.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Linux filesystem performance and checkpoint sorting

2011-02-04 Thread Greg Smith

Mark Kirkwood wrote:
Are you going to do some runs with ext4? I'd be very interested to see 
how it compares (assuming that you are on a kernel version 2.6.32 or 
later so ext4 is reasonably stable...).


Yes, before I touch this system significantly I'll do ext4 as well, and 
this is running the Ubuntu 10.04 2.6.32 kernel so ext4 should be stable 
enough.  I have some PostgreSQL work that needs to get finished first 
though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2011-02-04 Thread Greg Smith
As already mentioned in the broader discussion at 
http://archives.postgresql.org/message-id/4d4c4610.1030...@2ndquadrant.com 
, I'm seeing no solid performance swing in the checkpoint sorting code 
itself.  Better sometimes, worse others, but never by a large amount.


Here's what the statistics part derived from the sorted data looks like 
on a real checkpoint spike:


2011-02-04 07:02:51 EST: LOG:  checkpoint starting: xlog
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 10 dirty blocks in 
relation.segment_fork 17216.0_2
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 159 dirty blocks in 
relation.segment_fork 17216.0_1
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 10 dirty blocks in 
relation.segment_fork 17216.3_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 548 dirty blocks in 
relation.segment_fork 17216.4_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 808 dirty blocks in 
relation.segment_fork 17216.5_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 799 dirty blocks in 
relation.segment_fork 17216.6_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 807 dirty blocks in 
relation.segment_fork 17216.7_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 716 dirty blocks in 
relation.segment_fork 17216.8_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 3857 buffers to write, 8 
total dirty segment file(s) expected to need sync
2011-02-04 07:03:31 EST: DEBUG:  checkpoint sync: number=1 
file=base/16384/17216.5 time=1324.614 msec
2011-02-04 07:03:31 EST: DEBUG:  checkpoint sync: number=2 
file=base/16384/17216.4 time=0.002 msec
2011-02-04 07:03:31 EST: DEBUG:  checkpoint sync: number=3 
file=base/16384/17216_fsm time=0.001 msec
2011-02-04 07:03:47 EST: DEBUG:  checkpoint sync: number=4 
file=base/16384/17216.10 time=16446.753 msec
2011-02-04 07:03:53 EST: DEBUG:  checkpoint sync: number=5 
file=base/16384/17216.8 time=5804.252 msec
2011-02-04 07:03:53 EST: DEBUG:  checkpoint sync: number=6 
file=base/16384/17216.7 time=0.001 msec
2011-02-04 07:03:54 EST: DEBUG:  compacted fsync request queue from 
32768 entries to 2 entries
2011-02-04 07:03:54 EST: CONTEXT:  writing block 1642223 of relation 
base/16384/17216
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=7 
file=base/16384/17216.11 time=6350.577 msec
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=8 
file=base/16384/17216.9 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=9 
file=base/16384/17216.6 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=10 
file=base/16384/17216.3 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=11 
file=base/16384/17216_vm time=0.001 msec
2011-02-04 07:04:00 EST: LOG:  checkpoint complete: wrote 3813 buffers 
(11.6%); 0 transaction log file(s) added, 0 removed, 64 recycled; 
write=39.073 s, sync=29.926 s, total=69.003 s; sync files=11, 
longest=16.446 s, average=2.720 s


You can see that it ran out of fsync absorption space in the middle of 
the sync phase, which is usually when compaction is needed, but the 
recent patch to fix that kicked in and did its thing.


Couple of observations:

-The total number of buffers I'm computing based on the checkpoint 
writes being sorted is not a perfect match to the number reported by the 
"checkpoint complete" status line.  Sometimes they are the same, 
sometimes not.  Not sure why yet.


-The estimate for "expected to need sync" computed as a by-product of 
the checkpoint sorting is not completely accurate either.  This 
particular one has a fairly large error in it, percentage-wise, being 
off by 3 with a total of 11.  Presumably these are absorbed fsync 
requests that were already queued up before the checkpoint even 
started.  So any time estimate I drive based off of this count is only 
going to be approximate.


-The order in which the sync phase processes files is unrelated to the 
order in which they are written out.  Note that 17216.10 here, the 
biggest victim (cause?) of the I/O spike, isn't even listed among the 
checkpoint writes!


The fuzziness here is a bit disconcerting, and I'll keep digging for why 
it happens.  But I don't see any reason not to continue forward using 
the rough count here to derive a nap time from, which I can then feed 
into the "useful leftovers" patch that Robert already refactored here.  
Can always sharpen up that estimate later; I need to get some solid 
results I can share on what the delay time does to the 
throughput/latency pattern next.
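
To sketch what I mean by deriving a nap time from that rough count 
(purely hypothetical names here, not code from the patch): take the 
time budgeted for the sync phase, divide by the number of files 
expected to need a sync, and cap it at the 3 second value earlier 
versions hard-coded:

/*
 * Hypothetical sketch only.  budget_secs is the time left for the sync
 * phase, files_expected the estimate computed during BufferSync above.
 */
static int
sync_file_nap_msec(double budget_secs, int files_expected)
{
	double		per_file;

	if (files_expected <= 0)
		return 0;
	per_file = budget_secs / files_expected;
	if (per_file > 3.0)			/* cap at the old hard-coded 3 second nap */
		per_file = 3.0;
	return (int) (per_file * 1000.0);	/* msec to sleep after each fsync */
}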


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2011-02-04 Thread Greg Smith

Michael Banck wrote:

On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote:
  

For example, the pre-release Squeeze numbers we're seeing are awful so
far, but it's not really done yet either. 



Unfortunately, it does not look like Debian squeeze will change any more
(or has changed much since your post) at this point, except for maybe
further stable kernel updates.  


Which file system did you see those awful numbers on and could you maybe
go into some more detail?
  


Once the release comes out any day now I'll see if I can duplicate them 
on hardware I can talk about fully, and share the ZCAV graphs if it's 
still there.  The server I've been running all of the extended pgbench 
tests in this thread on is running Ubuntu simply as a temporary way to 
get 2.6.32 before Squeeze ships.  Last time I tried installing one of 
the Squeeze betas I didn't get anywhere; hoping the installer bug I ran 
into has been sorted when I try again.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



[HACKERS] Linux filesystem performance and checkpoint sorting

2011-02-04 Thread Greg Smith
Switching to a new thread for this summary since there's some much more 
generic info here...at this point I've finished exploring the major 
Linux filesystem and tuning options I wanted to, as part of examining 
changes to the checkpoint code.  You can find all the raw data at 
http://www.2ndquadrant.us/pgbench-results/index.htm  Here are some 
highlights of what's been demonstrated there recently, with a summary of 
some of the more subtle and interesting data in the attached CSV file too:


-On ext3, tuning the newish kernel tunables dirty_bytes and 
dirty_background_bytes down to a lower level than was possible using the 
older dirty_*ratio ones shows a significant reduction in maximum 
latency; it drops to about 1/4 of the worst-case behavior.  
Unfortunately transactions per second takes a 10-15% hit in the 
process.  Not shown in the data there is that the VACUUM cleanup time 
between tests was really slowed down, too, running at around half the 
speed of when the system has a full-size write cache.


-Switching from ext3 to xfs gave over a 3X speedup on the smaller test 
set:  from the 600-700 TPS range to around 2200 TPS.  TPS rate on the 
larger data set actually slowed down a touch on XFS, around 10%.  Still, 
such a huge win when it's better makes it easy to excuse the occasional 
cases where it's a bit slower.  And the latency situation is just wildly 
better, the main thing that drove me toward using XFS more in the first 
place.  Anywhere from 1/6 to 1/25 of the worst-case latency seen on 
ext3.  With abusively high client counts for this hardware, you can 
still see >10 second pauses, but you don't see >40 second ones at 
moderate client counts like ext3 experiences.


-Switching to the lower possible dirty_*bytes parameters on XFS was 
negative in every way.  TPS was cut in half, and maximum latency 
actually went up.  Between this and the nasty VACUUM slowdown, I don't 
really see that much potential for these new tunables.  They do lower 
latency on ext3 a lot, but even there the penalty you pay for that is 
quite high.  VACUUM in particular seems to really, really benefit from 
having a giant write cache to dump its work into--possibly due to the 
way the ring buffer implementation avoids using the database's own cache 
for that work.


-Since earlier tests suggested sorting checkpoints gave little change on 
ext3, I started testing that with XFS instead.  The result is a bit 
messy.  At the lower scale, TPS went up a bit, but so did maximum 
latency.  At the higher scale, TPS dropped in some cases (typically less 
than 1%), but most latency results were better too.


At this point I would say checkpoint sorting remains a wash:  you can 
find workloads it benefits a little, and others it penalizes a little.  
I would say that it's neutral enough on average that if it makes sense 
to include for other purposes, that's unlikely to be a really bad change 
for anyone.  But I wouldn't want to see it committed by itself; there 
needs to be some additional benefit from the sorting before it's really 
worthwhile.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

"Compact fsync",,"ext3",,,"XFS + Regular Writes",,"Sorted Writes"
"scale","clients","tps","max_latency","XFS Speedup","tps","max_latency","tps","max_latency","TPS Delta","%","Latency Delta"
500,16,631,17116.31,3.49,2201,1290.73,2210,2070.74,9,0.41%,780.01
500,32,655,24311.54,3.37,2205,1379.14,2357,1971.2,152,6.89%,592.06
500,64,727,38040.39,3.11,2263,1440.48,2332,1763.29,69,3.05%,322.81
500,128,687,48195.77,3.2,2201,1743.11,2221,2742.18,20,0.91%,999.07
500,256,747,46799.48,2.92,2184,2429.74,2171,2356.14,-13,-0.60%,-73.6
1000,16,321,40826.58,1.21,389,1586.17,386,1598.54,-3,-0.77%,12.37
1000,32,345,27910.51,0.91,314,2150.94,331,2078.02,17,5.41%,-72.91
1000,64,358,45138.1,0.94,336,6681.57,320,6469.71,-16,-4.76%,-211.87
1000,128,372,47125.46,0.88,328,8707.42,330,9037.63,2,0.61%,330.21
1000,256,350,83232.14,0.91,317,11973.35,315,11248.18,-2,-0.63%,-725.17

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-03 Thread Greg Smith

Robert Haas wrote:

At least in my book, we need to get this committed in the next two
weeks, or wait for 9.2.
  


Yes, I was just suggesting that I was not going to get started in the 
first week or two given the other pgbench related tests I had queued up 
already.  Those are closing up nicely, and I'll start testing 
performance of this change over the weekend.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2011-02-03 Thread Greg Smith
Thinking I should start with why I think this patch is neat...most of 
the servers I deal with are up 24x7 minus small amounts of downtime, 
presuming everyone does their job right that is.  In that environment, 
having a starting timestamp for when the last stats reset happened lets 
you quickly compute some figures in per-second terms that are pretty 
close to actual average activity on the server.  Some examples of how I 
would use this:


psql -c "
SELECT
 CAST(buffers_backend * block_size AS numeric) / seconds_uptime / (1024*1024)
   AS backend_mb_per_sec
FROM
 (SELECT *,extract(epoch from now()-stats_reset) AS seconds_uptime,
 (SELECT cast(current_setting('block_size') AS int8)) AS block_size
  FROM pg_stat_bgwriter) AS raw WHERE raw.seconds_uptime > 0
"
backend_mb_per_sec

  4.27150807681618

psql -c "
SELECT
 datname,CAST(xact_commit AS numeric) / seconds_uptime
   AS commits_per_sec
FROM
 (SELECT *,extract(epoch from now()-stats_reset) AS seconds_uptime
  FROM pg_stat_database) AS raw WHERE raw.seconds_uptime > 0
"

 datname  |  commits_per_sec  
---+

template1 | 0.0338722604313051
postgres  | 0.0363144438470267
gsmith| 0.0820573653236174
pgbench   |  0.059147072347085

Now I reset, put some load on the system and check the same stats 
afterward; watch how close these match up:


$ psql -d pgbench -c "select pg_stat_reset()"
$ pgbench -j 4 -c 32 -T 30 pgbench
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 32
number of threads: 4
duration: 30 s
number of transactions actually processed: 6604
tps = 207.185627 (including connections establishing)
tps = 207.315043 (excluding connections establishing)

 datname  |  commits_per_sec  
---+

pgbench   |   183.906308135572

Both these examples work as I expected, and some playing around with the 
patch didn't find any serious problems with the logic it implements.  
One issue though, an oversight I think can be improved upon; watch what 
happens when I create a new database:


$ createdb blank
$ psql -c "select datname,stats_reset from pg_stat_database where 
datname='blank'"

datname | stats_reset
-+-
blank   |

That's not really what I would hope for here.  One major sort of 
situation I'd like this feature to work against is the one where someone 
asks for help but has never touched their database stats before, which 
is exactly what I'm simulating here.  In this case that person would be 
out of luck, the opposite of the experience I'd like a newbie to have at 
this point.


The logic Tomas put in here to initialize things in the face of never 
having a stat reset is reasonable.  But I think to really be complete, 
this needs to hook database creation and make sure the value gets 
initialized with the current timestamp, not just be blank.  Do that, and 
I think this will make a nice incremental feature on top of the existing 
stats structure.
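
The shape of that fix should be small.  Somewhere in pgstat.c, at the 
point where the collector creates a brand new per-database entry, stamp 
it with the current time--a sketch only, with the field name being 
whatever the patch ends up calling it:

/*
 * Sketch, not from the patch:  when a fresh per-database entry is
 * created, initialize its reset time so a newly created database
 * never reports a NULL stats_reset.
 */
dbentry->stat_reset_timestamp = GetCurrentTimestamp();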


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

