Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Greg Smith
rk. Linus was 
directly aiming to maximize the number of person-hours thrown at 
debugging and development, even at the possible cost of instability in 
the code and user-base burnout if any serious bug proved intractable." I 
continue to be disappointed at how contributing code to PostgreSQL is 
far more likely to come with a dose of argument and frustration than 
with reward, and this discussion is a perfect example.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-07 Thread Greg Smith

Erik Rijkers wrote:

Everything together: the raid is what Areca call 'raid10(1E)'.
(to be honest I don't remember what that 1E exactly means -
extra flexibility in the number of disks, I think).


Standard RAID10 only supports an even number of disks.  The 1E variants 
also allow putting an odd number in.  If you're using an even number, 
like in your case, the results are the same until you start losing 
drives, at which point the performance degradation pattern changes a bit 
due to differences in how things are striped.  See these for more info:


http://bytepile.com/raid_class.php and 
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#IBM_ServeRAID_1E

http://publib.boulder.ibm.com/infocenter/eserver/v1r2/index.jsp?topic=/diricinfo/fqy0_craid1e.html

I don't think using RAID1E has anything to do with your results, but it 
is possible given your test configuration that part of the difference 
you're seeing relates to where on disk the blocks are stored.  If you 
take a hard drive and write two copies of something onto it, the second 
copy will be a little slower than the first.  That's just because of how 
drive speed drops over the surface as you move further along it.  There 
are some examples of what that looks like at 
http://projects.2ndquadrant.it/sites/default/files/pg-hw-bench-2010.pdf 
on pages 21-23.


Returning to your higher level results:  one of the variables you weren't 
sure how to account for was caching effects on the standby--the 
possibility that it just didn't have the data cached the same as the 
master, impacting results.  The usual way I look for that is by graphing 
the pgbench TPS over time.  In that situation, you can see it shoot 
upwards over time in a very obvious pattern.  Example at 
http://projects.2ndquadrant.it/sites/default/files/pgbench-intro.pdf on 
pages 36-37.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Greg Smith

Bruce Momjian wrote:

Remember, delaying wal application just delays making the standby a
master and makes the slave data appear staler.  We can just tell people
that the larger their queries are, the larger this delay will be.  If
they want to control this, they can set 'statement_timeout' already.


While a useful defensive component, statement_timeout is a user setting, 
so it can't provide guaranteed protection against a WAL application 
denial of service from a long-running query.  A user that overrides the 
system setting and kicks off a long query puts you right back into 
needing a timeout to ensure forward progress of standby replay.
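
As a sketch of the loophole (the long query here is hypothetical):

SET statement_timeout = 0;  -- any user can override the system-wide default
SELECT count(*) FROM huge_table a, huge_table b;  -- now free to run, and hold back conflicting replay, indefinitely

Nothing short of a limit enforced by the server itself closes that hole.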


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Greg Smith

Yeb Havinga wrote:

Rob Wultsch wrote:

I can not imagine setting this value to anything other than a bool and
most of the time that bool would be -1.

That's funny because when I was reading this thread, I was thinking 
the exact opposite: having max_standby_delay always set to 0 so I know 
the standby server is as up-to-date as possible.


If you ask one person about this, you'll discover they only consider one 
behavior here sane, and any other setting is crazy.  Ask five people, 
and you'll likely find someone who believes the complete opposite.  Ask 
ten and carefully work out the trade-offs they're willing to make given 
the fundamental limitations of replication, and you'll arrive at the 
range of behaviors available right now, plus some more that haven't been 
built yet.  There are a lot of different types of database applications 
out there, each with their own reliability and speed requirements to 
balance.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Greg Smith

Heikki Linnakangas wrote:

Robert Haas wrote:

I am not convinced it will be unpredictable.  The only caveats that
I've seen so far are:

- You need to run ntpd.
- Queries will get cancelled like crazy if you're not using streaming
replication.



And also in situations where the master is idle for a while and then
starts doing stuff. That's the most significant source of confusion,
IMHO; I wouldn't mind the requirement of ntpd so much.


I consider it mandatory to include a documentation update here that 
says "if you set max_standby_delay > 0, and do not run something that 
regularly generates activity on the master like [example], you will get 
unnecessary query cancellation on the standby".  As well as something 
like what Josh was suggesting, adding warnings that this is "for 
advanced users only", to borrow his wording.  This is why my name has 
been on the open items list for a while now--to make sure I follow 
through on that.


I haven't written it yet because there were still changes to the 
underlying code being made up until moments before beta started, and 
then this discussion started without a break between.  There is a clear 
set of user-land things that can be done to make up for the deficiencies 
in the state of the server code, but we won't even get to see how they 
work out in the field (feedback needed to improve the 9.1 design) if 
this capability goes away altogether.


Is it not clear that there are some people who consider the occasional 
bit of cancellation OK, because they can correct for it at the 
application layer and they're willing to factor it in to their design if 
it allows using the otherwise idle HA standby?  I'm fine with expanding 
that section of the documentation too, to make it more obvious that's 
the only situation this aspect of HS is aimed at and suitable for.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-05 Thread Greg Smith

Greg Stark wrote:

On Thu, May 6, 2010 at 2:36 AM, Tom Lane  wrote:

One reason I believe this isn't so critical as all that is that it only
matters for cases where the operation on the master took an exclusive
lock.



Uhm, or a vacuum ran. Or a HOT page cleanup occurred, or a btree page
split deleted old tuples.


Right; because there are so many regularly expected causes for query 
cancellation, the proposed boolean setup really hurts the ability of a 
server whose primary goal is high availability to run queries of any 
useful duration.  For years I've been hearing "my HA standby is idle, 
how can I put it to use?"; that's the back story of the users I thought 
everyone knew were the audience waiting for this feature.


If the UI for vacuum_defer_cleanup_age, which prevents these things, 
were good, I would agree that the cases where max_standby_delay does 
something useful are marginal.  That's why I tried to get someone 
working on SR to provide a hook for that purpose months ago.  But since 
the vacuum adjustment we have is in completely obtuse xid units, that 
leaves max_standby_delay as the only tunable here that you can even 
think about in terms of human time.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] On a somewhat disappointing correspondence

2010-05-05 Thread Greg Smith

Bruce Momjian wrote:

We are not very good at _removing_ functionality/GUCs, and based on the
discussion so far, I think there is a very slim chance we would get it
right for 9.0, which is why I suggested converting it to a boolean and
revisiting this for 9.1.


There's some feedback you can only get by exposing a complicated feature 
to the users and seeing what they make of it.  This one hasn't even had 
a full week to gather beta user reports.  Given that it's easy to 
disable (just limiting the range on what is effectively a 3-way switch 
to two positions), I don't understand why you're pushing at this point 
for its removal.  You could be encouraging testing instead, which I 
believe is needed to know exactly what the right thing to do in 9.1 is.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-05 Thread Greg Smith

Heikki Linnakangas wrote:

Let's rip out the concept of a delay altogether, and make it a boolean.
If you really want your query to finish, set it to -1 (using the current
max_standby_delay nomenclature). If recovery is important to you, set it
to 0.


So the only user options would be "allow long-running queries to block 
WAL application forever" and "always cancel queries on conflict"?  That 
would be taking away the behavior I was going to suggest as the default 
to many customers I work with.  I expect a non-trivial subset of people 
using this feature will set max_standby_delay to some small number of 
minutes, similarly to how archive_timeout is sized now.  Enough time to 
get reasonably sized queries executed, not so long as to allow something 
that might try to run for hours on the standby to increase failover 
catchup time very much.


The way the behavior works is admittedly limited, and certainly some 
people are going to want to set it to either 0 or -1.  But taking it 
away altogether is going to cripple one category of potential Hot 
Standby use in the field.  Consider this for a second:  do you really 
think that Simon would have waded into this coding mess, or that I would 
have spent as much energy as I have highlighting issues with its use, if 
there wasn't demand for it?  If it wouldn't hurt the usefulness of 
PostgreSQL 9.0 significantly to cut it, I'd have suggested that myself 
two months ago and saved everyone (especially myself) a lot of trouble.



If you have the monitoring in place to sensibly monitor the delay
between primary and standby, and you want a limit on that, you can put
together a script to flip the switch in postgresql.conf if the standby
falls too much behind.


There are a couple of things you should do in order for 
max_standby_delay to work as well as it can.  Watching clock sync and 
forcing periodic activity are two of them that always come up.  Those 
are both trivial to script for, and something I wouldn't expect any 
admin to object to.
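
As a sketch of the clock-sync half (assuming ntpd is in use; the 
hostnames are placeholders):

ntpq -p                             # run on both nodes; confirm they sync to the same peers with small offsets
psql -h master -c "SELECT now()"    # quick manual cross-check of the two
psql -h standby -c "SELECT now()"   # servers' ideas of the current time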


If you need a script that involves changing a server setting to do 
something, that translates into "you can't do that" for a typical DBA.  
The idea of a program regularly changing a server configuration setting 
on a production system is one you just can't sell.  That makes this idea 
incredibly more difficult to use in the field than any of the 
workarounds that cope with the known max_standby_delay issues.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Greg Smith

Josh Berkus wrote:

Having built-in replication in PostgreSQL was supposed to give the *majority* 
of users a *simple*
option for 2-server failover, not cater only to the high end.


If that's your criteria, 9.0 has already failed that goal.  Just the 
fact that you have to make your own base backup and manage that whole 
area alone excludes "simple" as a goal anyone can claim 9.0 meets with a 
straight face, long before you get to the mechanics of how HS handles 
query cancellation.  The new replication oriented features are 
functional, but neither is close to simple yet.  Based on the 
complication level of replication in other database products, I wouldn't 
put money on that even being possible.  You can make a simpler path the 
default one, but the minute you want to support more than one use case 
the complexity involved in setting up replication explodes.


Anyway, I have no idea where the idea comes from that recommending time 
synchronization is somehow a "high end" requirement, given that every 
OS I'm aware of makes that trivial nowadays.  Slave servers that drift 
too far away from the master time are going to cause all sorts of 
problems for user apps too.  Any app that gauges how long ago something 
happened by comparing a database timestamp with now() is going to give 
misleading results, for example, and I know I see those all the time.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Greg Smith

Tom Lane wrote:

1. The timestamps we are reading from the log might be historical,
if we are replaying from archive rather than reading a live SR stream.
In the current implementation that means zero grace period for standby
queries.  Now if your only interest is catching up as fast as possible,
that could be a sane behavior, but this is clearly not the only possible
interest --- in fact, if that's all you care about, why did you allow
standby queries at all?


If the standby is not current, you may not want people to execute 
queries against it.  In some situations, returning results against 
obsolete data is worse than not letting the query execute at all.  As I 
see it, the current max_standby_delay implementation includes the 
expectation that the results you are getting are no more than 
max_standby_delay behind the master, presuming that new data is still 
coming in.  If the standby has really fallen further behind than that, 
there are situations where you don't want it doing anything but catching 
up until that is no longer the case, and you especially don't want it 
returning stale query data.


The fact that tuning in that direction could mean the standby never 
actually executes any queries is something you need to monitor for--it 
suggests the standby isn't powerful enough, or well connected enough to 
the master, to keep up--but that's not necessarily the wrong behavior.  
Saying "I only want the standby to execute queries if it's not too far 
behind the master" is the answer to "why did you allow standby queries 
at all?" when tuning for that use case.
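
As a sketch of that kind of monitoring in 9.0 (hostnames are 
placeholders):

psql -h master  -c "SELECT pg_current_xlog_location()"
psql -h standby -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location()"

If the replay location keeps drifting further behind the master's 
current location, the standby isn't keeping up, regardless of how the 
cancellation knobs are set.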



2. There could be clock skew between the master and slave servers.


Not the database's problem to worry about.  Document that time should be 
carefully sync'd and move on.  I'll add that.



3. There could be significant propagation delay from master to slave,
if the WAL stream is being transmitted with pg_standby or some such.
Again this results in cutting into the standby queries' grace period,
for no defensible reason.


Then people should adjust their max_standby_delay upwards to account for 
that.  For high availability purposes, it's vital that the delay number 
be referenced to the commit records on the master.  If lag is eating a 
portion of that, again it's something people should be monitoring for, 
but not something we can correct.  The whole idea here is that 
max_standby_delay is an upper bound on how stale the data on the standby 
can be, and whether or not lag is a component to that doesn't impact how 
the database is being asked to act.
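
For example, someone whose archive shipping adds around a minute of lag, 
and who wants five minutes of query grace on top of that, might use 
something like this in the standby's postgresql.conf (numbers purely 
illustrative):

max_standby_delay = 6min   # ~1min expected transport lag + 5min budget for queries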



In addition to these fundamental problems there's a fatal implementation
problem: the actual comparison is not to the master's current clock
reading, but to the latest commit, abort, or checkpoint timestamp read
from the WAL.

Right; this has been documented for months at 
http://wiki.postgresql.org/wiki/Hot_Standby_TODO and on the list before 
that, i.e. "If there's little activity in the master, that can lead to 
surprising results."  The suggested long-term fix has been adding 
keepalive timestamps into SR, which seems to get reinvented every time 
somebody plays with this for a bit.  The HS documentation improvements 
I'm working on will suggest that you make sure this doesn't happen--that 
people run some sort of keepalive, WAL-generating activity on the 
master regularly--if they expect max_standby_delay to work reasonably in 
the face of an idle master.  It's not ideal, but it's straightforward to 
work around in user space.
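
A minimal sketch of such a keepalive, using a hypothetical one-row table 
plus a cron entry on the master:

-- run once on the master:
CREATE TABLE keepalive (last_ping timestamptz);
INSERT INTO keepalive VALUES (now());

# then from cron, every minute:
* * * * * psql -d postgres -c "UPDATE keepalive SET last_ping = now()"

Each UPDATE produces a commit record with a fresh timestamp, so the 
standby's idea of the master's time keeps advancing even when the real 
workload is idle.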



I'm inclined to think that we should throw away all this logic and just
have the slave cancel competing queries if the replay process waits
more than max_standby_delay seconds to acquire a lock.  This is simple,
understandable, and behaves the same whether we're reading live data or
not.


I don't consider something that allows queries to execute when not 
replaying recent "live" data to necessarily be a step forward, from the 
perspective of implementations preferring high availability.  It's 
reasonable for some people to expect that the last thing a standby 
that's not current should be doing is answering queries, when it 
doesn't have current data and should be working on catchup instead.


Discussion here obviously has wandered past your fundamental objections 
and onto implementation trivia, but I didn't think the difference 
between what you expected and what's actually committed was properly 
addressed before doing that.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Greg Smith

Erik Rijkers wrote:

  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
Barracuda ES.2)


To fill in from data you already mentioned upthread:
32 GB RAM
CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5

Thanks for all the reporting you've done here; really helpful.  
Questions to make sure I'm trying to duplicate the right thing:


Is your disk array all configured as one big RAID10 volume, so 
essentially a 6-disk stripe with redundancy, or something else?  In 
particular I want to know whether the WAL/database/archives are split 
onto separate volumes or all on one big one when you were testing.


Is this is on ext3 with standard mount parameters?

Also, can you confirm that every test you ran only had a single pgbench 
worker thread (-j 1 or not specified)?  That looked to be the case from 
the ones I saw where you posted the whole command used.  It would not 
surprise me to find that the CPU usage profile of a standby is just 
different enough from the primary that it results in the pgbench program 
not being scheduled enough time, due to the known Linux issues in that 
area.  Not going to assume that, of course, just one thing I want to 
check when trying to replicate what you've run into. 
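
For reference, an invocation that takes pgbench's own scheduling out of 
the equation by splitting clients across worker threads might look like 
this (client counts and duration illustrative):

pgbench -S -c 8 -j 4 -T 300 pgbench   # 8 clients spread over 4 pgbench worker threads, select-only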

I didn't see any glaring HS performance issues like you've been 
reporting the last time I tried performance testing in this area, just a 
small percentage drop.  But I didn't specifically go looking for it 
either.  With your testing rig out of service, we're going to try to 
replicate that on a system here.  My home server is like a scaled down 
version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5 
disks instead of 12) and it's running the same CentOS version.  If the 
problem is really universal, I should see it here too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] missing file in git repo

2010-05-03 Thread Greg Smith

Aidan Van Dyk wrote:

HINT:  It's all been done and posted to -hackers before too... Along
with comparisons on how the "one-time" conversions fare (parsecvs,
cvs2svn/git), etc., as well as long discussion on which keywords you
want expanded, and which you don't, etc. bla bla bla...


And in some cases, even indexed to make them easier to find again due to 
links at 
http://wiki.postgresql.org/wiki/Switching_PostgreSQL_from_CVS_to_Git


There's an item on there that talks about Tom's specific requests 
related to git backbranch workflow, from the last time this came up.  
Given Andrew's breakthrough with the buildfarm today, and so many 
developers due to be in the same place here this month, that might be 
something worth working through the details of live during PGCon.  Most 
of the other steps beyond that are on Peter's plate, which isn't as 
concerning to me because we at least know for sure he's on-board with 
pushing toward the conversion.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] pg_migrator to /contrib in a later 9.0 beta

2010-05-03 Thread Greg Smith

Bruce Momjian wrote:

As a summary, let me list the migrations pg_migrator supports:
8.3 -> 8.4
8.4 -> 9.0
8.3 -> 9.0
Surprisingly, it is 8.3 -> 8.4 that has the most restrictions because it
doesn't have access to the features we added in Postgres 9.0.
Tom is right that the code could be cleaned up if we removed 8.3 -> 8.4,
but more importantly the documentation would be clearer.


I think it's extremely valuable that either 8.3 or 8.4 can be upgraded 
to 9.0.  But let's face it:  in the long run, the number of people who 
are going to use pg_migrator for an 8.3->8.4 migration, but who haven't 
already done so, is small.  The feature set improvement in 8.4 had a lot 
of great stuff, but few features that were compelling from a "now I can 
do something completely impossible before!" standpoint.  As was noted 
during the "Native DB replication for PG" discussion over on 
pgsql-general last week, there are plenty of people happily running a 
stable 8.3 who just ignore 8.4 altogether for that reason.


The replication features in 9.0 are compelling in that way though, and I 
expect to see plenty of upgrades to that version from both 8.3 and 8.4 
installs.  If that works fine right now, I would prefer to see it 
documented as a special-case, two-versions-at-once situation that people 
shouldn't necessarily expect in the future, but certainly valuable to 
keep going if the maintenance burden isn't so bad.


Balancing out development reality with the ideal situation from the 
perspective of [potential|current] customers that I deal with every day, 
what I would prefer to see here is:


1) Commit a streamlined pg_migrator that only handles conversions with 
9.0 as a target into contrib, and ship it with 9.0.  Like Bruce, I had 
presumed that the discussion about whether that was going to happen 
would happen in parallel with beta (read:  right now), rather than its 
already being too late to even consider.  I think it's completely 
bizarre from an advocacy standpoint to even consider that you wouldn't 
ship such a tool with the core database, now that it's been around for 
long enough to have a positive track record.


2) Deprecate the pg_migrator hosted on pg_foundry as only being 
recommended for limited 8.3->8.4 upgrades.  Essentially stop active 
development on the version there, and focus on the one in contrib/ 
instead.  People who want an improved 8.3->8.4 tool can always contract 
with someone to backport fixes needed for their particular use case.  I 
think we're past the point where the community at large (meaning:  
mainly Bruce right now) should be expected to do that, now that 9.0 is 
coming out, so long as 8.3 to 9.0 conversions are available too.  I 
can't imagine suggesting to anyone that they upgrade in-place from 8.3 
to 8.4 right now.  Everybody I talk to who isn't already on 8.4 is 
delaying upgrades in anticipation of 9.0 later this year or early next.


My main issues with this project continuing to be hosted in pgfoundry are:

1) Perceived lack of confidence and/or legitimacy for it as an in-place 
upgrade solution, which would be a terrible PR move.  When people ask 
about in-place upgrades and I tell them "there's a tool you can download 
for that", they look at me in terror and ask if I'm serious that it 
isn't just included in the core code.  The improvement between answering 
that way and saying "yes, the tool for 8.3 and 8.4 is included with the 
core distribution", from the perspective of selling people on adopting 
PostgreSQL, cannot be overstated.


2) Anyone who looks at pgfoundry for more than a few minutes walks away 
covered with the scent of dead projects.  One reason for that is how 
painful it is to develop there.  I don't want to reignite a full 
anti-pgfoundry discussion here.  Suffice it to say that there are many 
of us who just can't bear working with CVS anymore and have just given 
up on doing anything useful for projects hosted there.  Something 
that's in core (and therefore included in the git conversion already 
being published) is much easier to work with and submit patches 
against.  I'm already dumping git clones of the PG repo on every system 
I do serious work on.  If each of those were then capable of generating 
pg_migrator patches I could submit, I would actually do that each time I 
use the tool for an upgrade and notice how it could be improved.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

2010-04-29 Thread Greg Smith

Aidan Van Dyk wrote:

It's all about the path of least *surprise*.  My "least surprise" would
have been that a similar config I have now with my PITR slaves would
pretty much still work, and not suddenly start accepting connections,
and even worse, at some later date when I've already verified that it
was doing what I expected with the configuration.
  


The first time I set up a system to test HS, when I got to the point 
where the slave was up and running as a standard warm standby, I 
expected there to be something else I had to do in order for it to be 
available for queries.  When I fired up psql and was able to run 
queries without doing anything extra, I was surprised--but it was the 
fun, "everything just works when I expected it to be harder" kind of 
surprise.


One of the reasons the version number was bumped up to 9.0 was to put 
people on warning that they should not assume their old setups would 
port forward without behavioral changes.  The fact that existing 
warm-standby server users will be surprised to find they can run queries 
without doing anything special could be considered under that banner.  
If you feel that's not obvious enough, that argues for more prominent 
documentation of that fact, rather than turning it off.  The idea that 
it should be made harder to enable just to protect the expectations of 
current users, and therefore introduce yet another place where 
PostgreSQL is less friendly to get started with than it could be, is 
backwards from the perspective of making things as easy as possible for 
new users.


Arguing from a usability standpoint needs to consider both new and 
existing user requirements, and those are quite opposed to one another 
in terms of what default makes more sense IMHO.  Now, if the argument is 
from the perspective of "this adds performance/reliability issues that 
weren't there before", and those go away if the feature is disabled by 
default, that's a respectable and indisputable reason to do so.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: pg_start_backup and pg_stop_backup Re: [HACKERS] Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

2010-04-28 Thread Greg Smith

Dimitri Fontaine wrote:

IMO the real fun begins when we talk about multi-slaves support and
their roles (a failover slave wants the master to wait for it to have
applied the WAL before to commit, a reporting slave not so much). So
you'd set the Availability level on each slave and wouldn't commit on
the master until each slave got what it's configured for, or something
like that.


Ultimately the commit is stuck waiting for the slowest committing sync 
operation on the list; it's the bottleneck.  Let's presume that the 
commit waits can be done in parallel, after sending the transaction to 
every slave.  Given that and the situation you describe, having per-node 
sync levels only turns out to be a useful optimization if the reporting 
slave commits slower than the failover slave does.  The master is going 
to be stuck waiting for the slowest one of the batch regardless of 
whether you've optimized them individually.


There is a related situation that I think a per-node sync option would 
be more obviously useful for:  local failover slave, remote disaster 
recovery slave over a WAN, where you accept that a serious disaster 
taking out a whole data center will lose some transactions.  In that 
situation, you'd probably want fsync for the local slave, while going 
async for the remote datacenter.


If the commits are done in a serial fashion, tuning sync per-node would 
be much more valuable in many use cases.


Regardless, I wouldn't want to burden the first sync rep version with 
this requirement.  Let's wait until the current scope is cleared before 
trying to move the goalposts for the people working on that.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: pg_start_backup and pg_stop_backup Re: [HACKERS] Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

2010-04-28 Thread Greg Smith

Aidan Van Dyk wrote:

I remember a presentation at pgcon a while ago, it was probably Fujii
(from NTT?) about their log streaming, and at that time, they talked
about different "sync" options...


It's all outlined at 
http://wiki.postgresql.org/wiki/Streaming_Replication#Synchronization_capability


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Greg Smith

Erik Rijkers wrote:

This is the same behaviour (i.e. extreme slow standby) that I saw
earlier (and which caused the original post, btw).  In that earlier
instance, the extreme slowness disappeared later, after many hours,
maybe even days (without bouncing either primary or standby).


Any possibility the standby is built with assertions turned on?  That's 
often the cause of this type of difference between pgbench results on 
two systems, and is easy to introduce when everyone is building from 
source.  You should try this on both systems:


psql -c "show debug_assertions"


just to rule that out.
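
If you'd rather check the build itself than a running server, the 
configure flags recorded in each install should show it too:

pg_config --configure | grep cassert   # any output means --enable-cassert was used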

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] shared_buffers documentation

2010-04-20 Thread Greg Smith

Jim Nasby wrote:

I've also seen large shared buffer settings perform poorly outside of IO 
issues, presumably due to some kind of internal lock contention. I tried 
running 8.3 with 24G for a while, but dropped it back down to our default of 8G 
after noticing some performance problems. Unfortunately I don't remember the 
exact details, let alone having a repeatable test case.

We got a report from Jignesh at Sun once that he had a benchmark workload 
where there was a clear performance wall at around 10GB of 
shared_buffers.  At 
http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best he says: 

"Shared Bufferpool getting better in 8.2, worth to increase it to 3GB 
(for 32-bit PostgreSQL) but still

not great to increase it more than 10GB (for 64-bit PostgreSQL)"

So you running into the same wall around the same amount just fuels the 
existing idea that there's an underlying scalability issue in there.  
Nobody with the right hardware has put it under the light of a profiler 
yet as far as I know.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] shared_buffers documentation

2010-04-19 Thread Greg Smith

Kevin Grittner wrote:

Perhaps, but be aware the current default benchmarked better
than a larger setting in bulk loads.

http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php

The apparent reason is that when there were fewer of them the WAL
files were re-used before the RAID controller flushed them from BBU
cache, causing an overall reduction in disk writes.  I have little
doubt that *without* a good BBU cached controller a larger setting
is better, but it's not universally true that bigger is better on
this one.


After running some tests, I believe what you observed is more universal 
than that, because I've been able to replicate a performance drop from a 
checkpoint_segments increase on a system without a BBWC (laptop with 
write caching turned off), where I really expected the increase to 
help.  My working theory is that there is a broader set of situations 
where limiting the working set of WAL files to a small number in order 
to decrease cache disruption applies than just when you've got hardware 
caching involved.


However, I believe the guideline of increasing this parameter along 
with shared_buffers still applies.  The real case for wins with more 
segments is when you also have a large buffer cache, because that's 
where the write savings from postponed database writes to often used 
blocks becomes easy to measure.  I've found it difficult to 
demonstrate a slam-dunk bulk loading improvement through a 
checkpoint_segments increase when shared_buffers is fixed at its default 
of ~32MB.  If that keeps up, I might soon have enough data to bust the 
idea that it alone improves bulk loading performance when you haven't 
touched anything else in the default config, which was unexpected to 
me.  Will report back once I've got a full handle on it.
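
For concreteness, the two kinds of configuration being compared look 
roughly like this (values illustrative):

# untouched default: small cache, few segments
shared_buffers = 32MB
checkpoint_segments = 3

# tuned pair: larger cache plus enough segments to spread its writes out
shared_buffers = 4GB
checkpoint_segments = 64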


Thanks for reminding me about this counter-example; it slipped by in 
that broader thread before, and I didn't try it myself until today, 
when I saw that you're onto something there.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] shared_buffers documentation

2010-04-16 Thread Greg Smith

Robert Haas wrote:

Well, why can't they just hang out as dirty buffers in the OS cache,
which is also designed to solve this problem?


If the OS were guaranteed to be as suitable for this purpose as the 
approach taken in the database, this might work.  But much like the 
clock sweep approach should outperform a simpler OS caching 
implementation in many common workloads, there are a couple of spots 
where making dirty writes the OS's problem can fall down:


1) That presumes that OS write coalescing will solve the problem for you 
by merging repeat writes, which, depending on the implementation, it 
might not.


2) On some filesystems, such as ext3, any write with an fsync behind it 
will flush the whole write cache out and defeat this optimization.  
Since the spread checkpoint design has some such writes going to the 
data disk in the middle of the currently processing checkpoint, in those 
situations that's likely to push the first write of a block to disk 
before it can be combined with a second.  If you'd kept it in the 
buffer cache, it might survive as long as a full checkpoint cycle longer.


3) The "timeout" as it were for shared buffers is driven by the distance 
between checkpoints, typically as long as 5 minutes.  The longest a 
filesystem will hold onto a write is probably less.  On Linux it's 
typically 30 seconds before the OS considers a write important to get 
out to disk, longest case; if you've already filled a lot of RAM with 
writes it can be substantially less.
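
For reference, the Linux knobs behind that behavior (defaults vary a bit 
by kernel version):

cat /proc/sys/vm/dirty_expire_centisecs   # typically 3000, i.e. dirty data older than 30s must be written
cat /proc/sys/vm/dirty_background_ratio   # % of RAM dirtied before background writeback starts earlier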



I guess the obvious question is whether Windows "doesn't need" more
shared memory than that, or whether it "can't effectively use" more
memory than that.


It's probably "can't effectively use".  We know for a fact that 
applications where blocks regularly accumulate high usage counts and 
have repeat read/writes to them, which includes pgbench, benefit in 
several easy to measure ways from using larger amounts of database 
buffer cache.  There's just plain old less churn of buffers going in and 
out of there.  The alternate explanation, "Windows is just so much 
better at read/write caching that you should give it most of the RAM 
anyway", doesn't really sound as probable as the more commonly proposed 
theory, "Windows doesn't handle large blocks of shared memory well".


Note that there's no discussion of the why behind this in the commit 
you just did, just the description of what happens.  The reasons why are 
left undefined, which I feel is appropriate given we really don't know 
for sure.  Still waiting for somebody to let loose the Visual Studio 
profiler and measure what's causing the degradation at larger sizes.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] shared_buffers documentation

2010-04-16 Thread Greg Smith

Robert Haas wrote:

It seems intuitive to me that setting shared_buffers too small will
also cause a performance problem, especially for write-heavy
workloads, but I'm less sure I can clearly explain why.  


More text to add:

When the server needs to allocate more space for reading or writing 
blocks, and the next space available is a block that's been modified 
but not used recently, that block will be written out to the operating 
system.  With large settings for shared_buffers, that prefers evicting 
infrequently used blocks from the cache.  The main downside to tuning 
in that direction is that all recently modified blocks not already 
written must be flushed to disk during each checkpoint, which can cause 
large amounts of disk writes grouped together.  But if shared_buffers 
is instead set too low, so that only a portion of the active working 
set can be kept in the buffer cache at once, the same block can end up 
being written out more frequently than is optimal.



And I'm curious why the correct setting is different on Windows than it is on
other platforms.  Can anyone shed some light on this?


No one has ever come up with a good explanation for why this is, other 
than "Windows doesn't seem to like large amounts of shared memory".  But 
we've seen it show up in too many benchmarks to dismiss.  Dave and 
Greg Stark did benchmarks focused on this:  
http://archives.postgresql.org/pgsql-hackers/2008-12/msg3.php that 
Magnus concurred with last time I tried to dig for more info about this 
specific subject.  And the last time I remember this coming up, it was 
with someone who suggested 8MB (!) worked best on their Windows system:  
http://archives.postgresql.org/pgsql-general/2009-12/msg00475.php


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] Rogue TODO list created

2010-04-14 Thread Greg Smith

Bruce Momjian wrote:

http://wiki.postgresql.org/wiki/User:Simon

Well, unless Simon wants to keep it for some reason, it should be
removed, and if kept, renamed.  Simon?


I already retitled the copy left on the personal page and deleted the 
one that was causing the confusion.  I doubt anyone will accidentally 
consider official a page labeled "Simon's Work in Progress: Prioritised 
Todo" that's attached to User:Simon, that nothing links to, and that 
doesn't show up on the first set of results if you search for "todo" either.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] Rogue TODO list created

2010-04-14 Thread Greg Smith

Bruce Momjian wrote:

What is "Prioritised" Todo?  It looks like a copy of the TODO list that
was created on March 23, 2010, and only you and Simon have modified it:


http://wiki.postgresql.org/index.php?title=Prioritised_Todo&action=history


Well, the updates I made to that one were strictly an accident; I didn't 
notice I was editing the forked version.  I have put everything I did in 
that session back onto the right one.  The "Prioritised Todo" page 
wasn't linked to from anywhere you'd find it, except via a bit of bad 
late night searching like I did.

I'm not sure what Simon was tinkering with there, but having fallen 
victim to it myself I agree having it there with that name is not a 
great choice.  I moved that bit of work in progress he was doing to 
http://wiki.postgresql.org/wiki/User:Simon and deleted the one with the 
confusing name.  Sorry about propagating my own confusion to others.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] shared_buffers documentation

2010-04-14 Thread Greg Smith

Kevin Grittner wrote:

I wonder if we should add any hints telling people
what they might see as problems if they are too far one way or the
other.  (Or does that go beyond the scope of what makes sense in TFM?)


It's hard to figure that out.  One of the talks I'm doing at PGCon next 
month focuses on how to monitor things when increasing shared_buffers 
and the related checkpoint parameters, so that you don't make things 
worse.  It's going to take a solid 45 minutes to cover that, and a 
section of the manual covering this bit of trivia would be a few pages 
long and hard to follow.  Maybe I'll get that in shape to insert into 
TFM eventually, but it's a bit bleeding edge to put in there now.  
Trying to explain it live to other people a couple of times should make 
it clearer how to describe what I do.


As for updating the size recommendations, the text at 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been 
beaten into the status quo by a number of people.  Here's what might 
make sense from there to insert into the docs, removing the bits 
referring to older versions, rewriting a bit for manual tone, and noting 
the checkpoint issues:


If you have a system with 1GB or more of RAM, a reasonable starting 
value for shared_buffers on a dedicated database server is 25% of the 
memory in your system. If you have less RAM, you'll have to account more 
carefully for how much memory the operating system is taking up, 
allocating a fraction of the free memory instead. There are some 
workloads where even larger settings for shared_buffers are effective.  
But given the way PostgreSQL also relies on the operating system cache, 
it's unlikely you'll find using more than 40% of RAM to work better than 
a smaller amount.


On Windows, large values for shared_buffers aren't as effective.  You 
may find better results keeping the setting relatively low and using the 
OS cache more instead. The useful size range for shared_buffers on 
Windows systems is generally from 64MB to 512MB of RAM.


Larger settings for shared_buffers usually require a corresponding 
increase in checkpoint_segments, in order to spread out writing large 
quantities of changed or new data in the cache over a longer period of time.
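
Worked out for a hypothetical dedicated server with 16GB of RAM, that 
guidance translates to something like:

shared_buffers = 4GB        # 25% of RAM as a starting point
checkpoint_segments = 32    # raised along with the larger cache; tune from there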


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

I just worry about any feature which doesn't get as far as a
user-visible implementation.  If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.


While they're limited, there are completely viable prototype-quality 
implementations possible here without a large amount of work to get them 
started.  I'm not worried too much about this feature being unused.  As 
I was just reminded when assembling a page on the wiki about it:  
http://wiki.postgresql.org/wiki/Materalized_Views it's currently ranked 
#1--by a large margin--on the UserVoice feature request survey that 
Peter kicked off.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

What would be the use case for (1) by itself?


There isn't any use case for just working on the infrastructure, just 
like there's no use case for "Syntax for partitioning" on its own.  
That's why people rarely work on that part of these problems--it's 
boring and produces no feature of value on its own.  I believe that in 
both cases, attempts to build the more complicated parts without first 
addressing some of the core infrastructure will continue to produce 
only prototypes.


I don't want to see Materialized Views wander down the same path as 
partitioning, where lots of people produce "fun parts" patches while 
ignoring the grunt work of things like production quality catalog 
support for the feature.  I think Pavel's proposal got that part right 
by starting with the grammar and executor setup trivia.  And Robert's 
comments about the details in that area that are easy to forget about 
hit the mark too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

There are basically 2 major parts for materialized views:
A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;
B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).


I've run into more problems where it's perfectly fine to specify using 
the materialized view directly in the query, but keeping that view up to 
date usefully was the real problem.  The whole idea of getting an MV 
used automatically is valuable, but far down the roadmap as I see it.


Not everyone would agree of course, and your description does suggest a 
better way to organize a high-level summary; here's a first cut:


1) Creation of materialized view
Current state:  using "CREATE TABLE AS" or similar mechanism, maintained 
manually
Optimal:  "CREATE MATERIALIZED VIEW" grammar, metadata to store MV data, 
dump/reload support


2) Updating materialized views
Current state:  periodically create new snapshots, or maintain using 
triggers
Optimal:  Built-in refresh via multiple strategies, with minimal locking 
so as to improve concurrent access


3) Using materialized views in the planner
Current state:  specify the manually created MV in queries that can use it
Optimal:  Automatically accelerate queries that could be satisfied by 
substituting available MVs


With (1) being what I think is the only GSoC sized subset here.

I'm not saying someone can't jump right into (3), using the current 
implementations of (1) and (2) that are floating around out there.  I 
just think it would end up wasting a fair amount of work on prototypes 
that don't work quite the same way as the eventual fully integrated 
version.  You certainly can start working on (3) without a fully fleshed 
out implementation of (2); I don't know that it makes sense to work on 
it before (1), though.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Greg Smith

Robert Haas wrote:

I also think that you're underestimating the number of problems that
will have to be solved to get this done.  It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system.  (What should
be the value of pg_class.relkind?  Where should the node
representation of the snapshot query be stored?  And did we handle all
of those OID dependencies correctly?)


I don't think I'm underestimating all that, but I suspect Pavel is by a 
considerable amount.  This is why I've been suggesting that a GSoC scope 
here might just be wrestling with this area of the problem for the whole 
summer--not even getting into updates beyond a completely trivial 
implementation, if any at all.  Things like "handle OID dependencies" 
are definitely not on the fun side of the development work that people 
tend to think about in advance.



Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views.  I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner.


Exactly my concern; committing this part without knowing how that's 
later going to fit into place strikes me as the sort of thing this 
project doesn't like to do.  The alternate approach of starting with the 
update machinery is less likely, IMHO, to get stuck wondering if there's 
a future blind spot coming or not, since you'd be building from the 
bottom up, starting with the hardest parts.


From the rest of your comments, I'm comfortable that you're in sync 
with the not necessarily obvious risky spots here I wanted to raise 
awareness of.  It's unreasonable to expect we'll have exactly the same 
priorities, and I doubt it's useful to debate how I perceive the merit 
of various development subsets compared to how you do.  I don't think 
it's really important whether anyone agrees with me or not about exactly 
the value of a full table lock implementation.  The main thing I'm 
concerned about is just that it's noted as a known risky part, one that 
could end up blocking the project's ability to commit even a subset of 
the proposed patch here.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Greg Smith

Robert Haas wrote:

It's not obvious to me
that a brief full-table lock wouldn't be acceptable for an initial
implementation.  Obviously it wouldn't be suitable for every use case
but since we're talking about manually refreshed views that was bound
to be true anyway.


There already is an initial implementation of sorts.  There are a couple 
of ways you can build these right now, so any new development has to 
look like it will end with good odds of being an improvement over what's 
already available before it's worth putting development resources into 
it.


As a rough idea of what people want these for in the field, based on 
what I've seen requests for, imagine that someone has a 1TB table 
they're materializing a view on in order to get at least a 10:1, and 
hopefully close to a 100:1, speedup on viewing summary data.  Now, 
picture what happens if you have someone doing a sequential scan on the 
MV, which is still quite big, the updater process lines up to grab an 
exclusive lock when it's done, and now a second user wanting to read a 
single row quickly comes along behind it.  Given a full-table lock 
implementation, that scenario is unlikely to play out with the second 
user getting a fast response.  They'll likely sit in a lock queue for 
some potentially long period of time instead, waiting for the active seq 
scan to finish then the update to happen.  You have to build it that way 
or a steady stream of people reading could block out updates forever.


To be frank, that makes for a materialized view implementation of little 
value over what you can currently do as far as I'm concerned.  It might 
be interesting as a prototype, but that's not necessarily going to look 
like what's needed to do this for real at all.  I'm not a big fan of 
dumping work into projects when you can see exactly how it's going to 
fail before you even get started.  As I see it, if you know where it's 
going to fall down, you don't need to build a prototype as an exercise 
to show you how to build it--you should work on that part first instead.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Greg Smith

Heikki Linnakangas wrote:

Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing "CREATE
MATERIALIZED VIEW ... SELECT ..." doesn't seem any easier than just
writing "CREATE TABLA AS ...". But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.
  


You've hit upon the core issue here.  You can build materialized views 
right now using "CREATE TABLE AS".  You can even update them by creating 
a new table the same way, with a new name, and doing the 
LOCK/RENAME/DROP shuffle--what people used to do for rewriting bloated 
tables before there was CLUSTER.  The first step in the proposal here is 
essentially syntax to give an easier UI for that.  That's an interesting 
step, but recognize that it doesn't actually provide anything you can't 
do already.
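
For reference, a minimal sketch of that shuffle (the table and column 
names here are made up for illustration; a real version also has to 
worry about dependencies, permissions, and rebuilding indexes):

psql <<EOF
BEGIN;
CREATE TABLE mv_sales_new AS
    SELECT region, sum(amount) AS total_amount
    FROM sales GROUP BY region;
-- swap the new copy into place; readers queue up behind this lock
LOCK TABLE mv_sales IN ACCESS EXCLUSIVE MODE;
ALTER TABLE mv_sales RENAME TO mv_sales_old;
ALTER TABLE mv_sales_new RENAME TO mv_sales;
DROP TABLE mv_sales_old;
COMMIT;
EOF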


If you then note that doing any sort of incremental update to the view 
is a hard problem, and that a lot of the useful cases for materialized 
views involve tables where it's impractical to recreate the whole thing 
anyway, you'll inevitably find yourself deeply lost in the minutia of 
how to handle the updates.  It's really the core problem in building 
what people expect from a materialized view implementation in a serious 
database.  Chipping away at the other pieces around it doesn't move the 
feature that far forward, even if you get every single one of them 
except incremental updates finished, because everything else combined is 
still not that much work in comparison to the issues around updates.


There certainly are a fair number of subproblems you can break out of 
here.  I just think it's important to recognize that the path that leads 
to a useful GSoC project and the one that gives a production quality 
materialized view implementation may not have that much in common, and 
to manage expectations on both sides accordingly.  If Pavel thinks he's 
going to end up being able to say "I added materialized views to 
PostgreSQL" at the end of the summer, that's going to end in 
disappointment.  And if people think this project plan will lead to 
being able to claim PostgreSQL now has this feature, that's also not 
going to go well.  If the scope is "add initial grammar and rewriting 
moving toward a future materialized view feature", with the underlying 
implementation noted as a stub prototype, that might work out OK.  This 
is why I likened it to the work on "Syntax for partitioning", which has 
a similarly focused subgoal structure.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Greg Smith
 a way that will be useful (and by that 
I mean committable quality) until there's a better way to handle updates 
than writing a whole new table and grabbing a full relation lock to 
switch to it.  To do a good job just on the first two steps should take 
at least a whole summer anyway--there's a whole stack of background 
research needed I haven't seen anyone do yet, and that isn't on your 
plan yet.  There is a precedent for taking this approach.  After getting 
stalled trying to add the entirety of easy partitioning to PostgreSQL, 
the current scope has been scaled back to just trying to get the syntax 
and on-disk structure right, then finish off the implementation.  See 
http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how 
that's been broken into those two major chunks.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Greg Smith
An introduction to the current state of work in progress for adding 
improved partitioning features to PostgreSQL is documented at 
http://wiki.postgresql.org/wiki/Table_partitioning


If you can find a small, targeted piece of that overall plan that builds 
on the work already done, and is in the direction of the final goal 
here, you may be able to make useful progress in a few months' time.  
This area is extremely well explored already.  There are 13 mailing list 
threads you'll need to browse through carefully just to have enough 
background that you're likely to build something new, rather than just 
wandering down a path that's already been followed but leads to a dead end.


You have picked a PostgreSQL feature that is dramatically more difficult 
than it appears to be, and I wouldn't expect you'll actually finish even 
a fraction of your goals in a summer of work.  You're at least in 
plentiful company--most students do the same.  As a rule, if you see a 
feature on our TODO list that looks really useful and fun to work on, 
it's only still there because people have tried multiple times to build 
it completely but not managed to do so because it's harder than it 
appears.  This is certainly the case with improving the partitioning 
support that's built in to the database.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SELECT constant; takes 15x longer on 9.0?

2010-04-06 Thread Greg Smith

Merlin Moncure wrote:

On Tue, Apr 6, 2010 at 3:58 PM, Tom Lane  wrote:
  

Greg has the right idea: show debug_assertions.



why not the entire set of configure options?
  


Given that the standard way to find those is pg_config, there are a 
couple of reasons not to prefer that, on top of those Tom already 
mentioned:


1) pg_config isn't in the standard PostgreSQL package set in some 
distributions (it's sometimes in the -devel package), so it may not be 
available; debug_assertions is always there if you have psql.  For my 
goals, which include benchmarking scripts I often distribute to other 
people, that matters.


2) It's easy to get pg_config output from your client that doesn't 
actually match the running server, particularly when developing.  That's 
been the source of more than one occasion where I was running a debug 
build on the server but didn't notice it, and therefore would have 
produced worthless performance numbers.  Given that the main slowdowns 
from having assertions turned on are server-side, whether the local 
client running things like psql has them turned on doesn't worry me as 
much.


3) It's a little easier to check the value of "show" in a script to 
confirm you're not running a bad build than to parse the output from 
pg_config.  Here's the recipe I use for shell scripts:


#!/bin/sh
DEBUG=`psql -At -c "show debug_assertions"`
if [ "$DEBUG" = "on" ] ; then
 echo "Debug build - aborting performance test"
 exit 1
fi

Pushing this data into something like version() would solve the first 
two items above, while making the issue of how to parse the results in a 
test client even harder, given there's already too much junk in one big 
string there.  You couldn't make the above check much simpler, which 
makes it hard to justify any alternative approach to grab this data.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us



Re: [HACKERS] SELECT constant; takes 15x longer on 9.0?

2010-04-06 Thread Greg Smith

David E. Wheeler wrote:

By stupidly having configured with --enable-cassert --enable-debug without 
realizing it. I've just rebuilt without them and run the tests again using the 
default postgresql.conf and I'm back down to 57s and 46s over two runs.
  


Every performance test I run, regardless of where the binaries come from 
or how I thought they were built, starts like this:


postgres=# show debug_assertions;
 debug_assertions 
------------------
 off
(1 row)


It's a really good habit to get into, or even enforce in your testing 
script if practical.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha release this week?

2010-04-05 Thread Greg Smith

Josh Berkus wrote:

I started with pgbench performance comparisons:
http://wiki.postgresql.org/wiki/Pgbenchtesting
  


I'd already created 
http://wiki.postgresql.org/wiki/Regression_Testing_with_pgbench for this 
purpose, and it looks like you started where I ended that, more or less, 
which is good because you didn't duplicate anything I'd already 
written.  I just recently finished a full exploration of how the 
multi-threaded pgbench ends up working in practice, and will 
update/merge those two as part of that once I get the full data 
published where people can look at it.  I've given up on expecting 
ad-hoc pgbench testing done without an extremely clear methodology to 
produce anything useful; it's tough to get good results out of pgbench 
without a clear plan to follow for finding useful data points.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enable_joinremoval

2010-03-29 Thread Greg Smith

Tom Lane wrote:

The problem with this line of thought is that it imagines you can look
at worked-out alternative plans.  You can't, because the planner doesn't
pursue rejected alternatives that far (and you'd not want to wait long
enough for it to do so...)
  


Not on any production system, sure.  I know plenty of people who would 
gladly let a rejected plan enumerator run for *a day* on their 
development box if it let them figure out exactly why the costing on the 
plan they expected ended up higher than the plan they actually got.  
While I know you don't run into this, regular people can easily spend a 
week on one such problem without gaining even that much insight, given 
the current level of instrumentation and diagnostic tools available.  
"Read the source" and "ask Tom" are both effective ways to resolve that 
but have their limits.  (Not because of you, of course--my bigger 
problem is people who just can't share their plans with the lists for 
privacy or security reasons)


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enable_joinremoval

2010-03-29 Thread Greg Smith

Robert Haas wrote:

The query planner is a great piece of code but it
is not so transparently simple that it doesn't need debugging or
instrumentation, and "why did the planner do X" has got to be one of
our top ten most-frequently asked questions.


Debugging and instrumentation are two slightly different issues.  There 
is a lot more instrumentation needed in the query optimizer before 
people have better odds of understanding what's going on in this part of 
the database.  Recent features like pg_stat_statements and auto_explain 
are just the first round of what people really want here.  Now that we 
can get the explain data out in usable formats (XML, JSON, YAML) for a 
tool to manage them, the thing at the top of my list in this area for 
9.1 is to track down the rumored patch that exports information about 
the rejected plans considered and get that committed.  That always seems 
to be what I want to look at when answering the question "why this plan 
instead of what I was expecting?"


Stepping away from that, from the debugging perspective it seems one way 
to answer the question "is this unexpected behavior being caused by the 
new join removal code or not?" is to provide a way to toggle it off and 
see what changes.  Much like enable_seqscan, just because we don't ever 
want people to use it in production doesn't necessarily mean it's a bad 
idea to expose it.


Also, given that this is a complicated feature, I think it's reasonable 
to ask whether allowing it to be turned off is the right thing just from 
the pragmatic basis that it provides a, ahem, backup plan in case 
there's unexpected difficulty with it in the field.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Query

2010-03-29 Thread Greg Smith

gaurav gupta wrote:
My idea is to add a functionality of Auto tuning and Auto Indexing/ 
Reindexing in DB languages.


Ah, the classic request to start with the tuning and index wizards.  
Really fun to work on, always lots of interest in it.  Here's the 
thing:  projects in this area attract endless interest.  If it were 
possible to write something useful in a couple of months, we'd have a 
hundred such programs fighting for attention.  So the fact that we 
actually have zero of them should tell you something about the actual 
difficulty level of the work.  You could spend the whole summer just 
reading research papers on this topic and maybe catch up to the late 
90's by the end.


Here's the usual advice I give to students looking to make a useful 
contribution to any mature development project:  the more boring the 
work sounds, the more likely it is you'll actually do something people 
can use.  It's easy to find people who want to work on fun projects--so 
easy that they've all been done already.  What's left is either much 
harder than it looks, or kind of dull to do.  The idea behind 
intentionally picking a boring one is that you're more likely to get one 
that's unfinished for that reason, rather than because it's actually a 
year or two of work to complete.  Or, in the case you're asking about, a 
decade or three if you were to start from scratch and were really 
smart.  If you started working on this now, rather than stopping to 
follow the research already done, you might catch up to 
http://portal.acm.org/citation.cfm?id=810505 in a couple of months.


Similarly using the no. of select hits on a table we can check that if 
maximum no. of times it is on a non-index field we can index on that 
field to make select faster.


It's impractical to figure out where indexes should go without 
simulating what the optimizer would then do with them against a sample 
set of queries.  You can't do anything useful just with basic statistics 
about the tables.


I would recommend 
http://msdn.microsoft.com/en-us/library/aa226167(SQL.70).aspx as a good, 
practical introduction to the topic of what it takes to figure out where 
indexes go, from someone who came up with a reasonable solution to 
that problem.  You can find a list of the underlying research they cite 
(and an idea what has been done since then) at 
http://portal.acm.org/citation.cfm?id=673646


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for 9.1: initdb -C option

2010-03-28 Thread Greg Smith

David Christensen wrote:

Enclosed is a patch to add a -C option to initdb to allow you to easily append 
configuration directives to the generated postgresql.conf file for use in 
programmatic generation.


We had a patch not quite make it for 9.0 that switched over the 
postgresql.conf file to make it easy to scan a whole directory looking 
for configuration files:  
http://archives.postgresql.org/message-id/9837222c0910240641p7d75e2a4u2cfa6c1b5e603...@mail.gmail.com


The idea there was to eventually reduce the amount of postgresql.conf 
hacking that initdb and other tools have to do.  Your patch would add 
more code into a path that I'd like to see reduced significantly.


That implementation would make something easy enough for your use case 
too (below is untested, but it shows the general idea):


$ for cluster in 1 2 3 4 5 6;
do initdb -D data$cluster
(
cat <<EOF
# whatever per-cluster settings you need; hypothetical example:
port = 600$cluster
EOF
) > data$cluster/conf.d/99clustersetup
done

This would actually work just fine for what you're doing right now if 
you used ">> data$cluster/postgresql.conf" for that next-to-last line 
instead.  There would be duplicates, which I'm guessing is what you 
wanted to avoid with this patch, but the later values set for the 
parameters added to the end would win and be the active ones.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] last_statrequest is in the future

2010-03-26 Thread Greg Smith

Tom Lane wrote:

Should we redesign the stats signaling logic to work around this,
or just hope we can nag kernel people into fixing it?
  


Even if there were something to be done in kernel space, how many years 
from now would it be before it made this problem go away for the 
majority of near-future 9.0 users?  We've been seeing a fairly regular 
stream of "pgstat wait timeout" reports come in.  The one I reported was 
from recent hardware and a very mainstream Linux setup.  I'm not real 
optimistic that this one can get punted toward the OS and have anything 
done about it in time to head off problems in the field.


This particularly pathological case with jaguar is great because it's 
made it possible to nail down how to report the problem.  I don't think 
it's possible to make a strong conclusion about how to resolve this just 
from that data, though.  What we probably need is for your additional 
logging code to catch this again on some systems that are not so 
obviously broken, to get a better idea what a normal (rather than 
extreme) manifestation looks like.  How much skew is showing up, whether 
those do in fact correspond with the wait timeouts, that sort of thing.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-22 Thread Greg Smith

Takahiro Itagaki wrote:

Greg Smith  wrote:

  
Attached is an updated version that I think is ready to commit.  Only 
changes are docs--I rewrote those to improve the wording some.



Thanks for the correction. Applied.
  


By the way: the pgbench.sgml that you committed looks like it passed 
through a system that added a CR to every line in it.  Probably not the 
way you intended to commit that.


So far I've done over 40 hours of pgbench runtime worth of testing (>500 
runs) using the patched version without any issues; the code itself 
continues to act fine.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Greg Smith

Robert Haas wrote:

On Thu, Mar 18, 2010 at 4:28 PM, Marc G. Fournier  wrote:
  

On Thu, 18 Mar 2010, Joshua D. Drake wrote:



On Thu, 2010-03-18 at 10:18 -0700, Josh Berkus wrote:
  


Or, let's put it another way: I've made my opinion clear in the past
that I think that we ought to ship with a minimal postgresql.conf with
maybe 15 items in it.

+1
  

+1 ... but, why the 'top 15'?  why not just those that are uncommented to
start with, and leave those that are commented out as 'in the docs' ... ?


+1 to either proposal.
  


If this is turning into a vote:  -1 from me for any work on this until 
http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items is cleared.  
It boggles my mind that anyone could have a different prioritization 
right now.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Greg Smith

Joshua D. Drake wrote:

As usual, the postgresql.conf is entirely too full. We should ship with
the top 15.


Maybe, but what we should do is ship, and then talk about this again 
when it's appropriate--earlier in the release cycle.  Let me try to cut 
this one off before it generates a bunch of traffic by summarizing where 
this is stuck.


We started this release with a good plan for pulling off a major 
postgresql.conf trimming effort that I still like a lot ( 
http://wiki.postgresql.org/wiki/PgCon_2009_Developer_Meeting#Auto-Tuning 
).  The first step was switching over to a directory-based structure 
that allowed being all things to all people just by selecting which of 
the provided files you put in there.  We really need the things initdb 
touches to go into a separate file, rather than into the bloated sample, 
in a way that's easy to manage; if the server just reads every file 
dropped into a directory, that's the easiest route.  Extending to 
include the top 15 or whatever other subset people want is easy after 
that.


Now, that didn't go anywhere in this release due to development focus 
constraints, but I'm willing to take "has what we can advertise as 
built-in replication" as a disappointing but acceptable substitute in 
lieu of that.  (rolls eyes)  I think it will fit nicely into the "9.1 
adds the polish" theme already gathering around the replication features 
being postponed to the next release.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-18 Thread Greg Smith

Takahiro Itagaki wrote:

The log filenames are "pgbench_log.<pid>.<thread-id>"
for each thread, but the first thread (including single-threaded) still uses
"pgbench_log.<pid>" for the name because of compatibility.
  


Attached is an updated version that I think is ready to commit.  Only 
changes are docs--I rewrote those to improve the wording some.  The code 
looked and tested fine to me.  I just added support for the new format 
to pgbench-tools and am back to happily running large batches of tests 
using it again.


I confirmed a few things:

-On my CentOS system, the original problem is masked if you have 
"--enable-thread-safety" on; the multi-threaded output shows up without 
any broken lines in the single file.  As I suspected, it's only the 
multi-process implementation that shows the issue here.  Since Tom 
points out that's luck rather than something that should be relied upon, 
I don't think that actually changes what to do here; it just explains 
why this wasn't obvious in earlier testing--normally I have thread 
safety on nowadays.


-Patch corrects the problem.  I took a build without thread safety on, 
demonstrated the issue with its pgbench.  Apply the patch, rebuild just 
pgbench, run again; new multiple log files have no issue.


-It's easy to convert existing scripts to utilize the new multiple log 
file format.  The current idiom you're forced into when running pgbench 
scripts is to track the PID it's run as, then use something like:


mv pgbench_log.${PID} pgbench.log

To convert to a stable filename for later processing.  Now you just use 
something like this instead:


cat pgbench_log.${PID}* > pgbench.log
rm -f pgbench_log.${PID}*

And that works fine. 


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 0019db4..28a8c84 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -131,11 +131,9 @@ int			fillfactor = 100;
 #define ntellers	10
 #define naccounts	10
 
-FILE	   *LOGFILE = NULL;
-
 bool		use_log;			/* log transaction latencies to a file */
-
-int			is_connect;			/* establish connection for each transaction */
+bool		is_connect;			/* establish connection for each transaction */
+int			main_pid;			/* main process id used in log filename */
 
 char	   *pghost = "";
 char	   *pgport = "";
@@ -183,6 +181,7 @@ typedef struct
  */
 typedef struct
 {
+	int			tid;			/* thread id */
 	pthread_t	thread;			/* thread handle */
 	CState	   *state;			/* array of CState */
 	int			nstate;			/* length of state[] */
@@ -741,7 +740,7 @@ clientDone(CState *st, bool ok)
 
 /* return false iff client should be disconnected */
 static bool
-doCustom(CState *st, instr_time *conn_time)
+doCustom(CState *st, instr_time *conn_time, FILE *log_file)
 {
 	PGresult   *res;
 	Command   **commands;
@@ -778,7 +777,7 @@ top:
 		/*
 		 * transaction finished: record the time it took in the log
 		 */
-		if (use_log && commands[st->state + 1] == NULL)
+		if (log_file && commands[st->state + 1] == NULL)
 		{
 			instr_time	now;
 			instr_time	diff;
@@ -791,12 +790,12 @@ top:
 
 #ifndef WIN32
 			/* This is more than we really ought to know about instr_time */
-			fprintf(LOGFILE, "%d %d %.0f %d %ld %ld\n",
+			fprintf(log_file, "%d %d %.0f %d %ld %ld\n",
 	st->id, st->cnt, usec, st->use_file,
 	(long) now.tv_sec, (long) now.tv_usec);
 #else
 			/* On Windows, instr_time doesn't provide a timestamp anyway */
-			fprintf(LOGFILE, "%d %d %.0f %d 0 0\n",
+			fprintf(log_file, "%d %d %.0f %d 0 0\n",
 	st->id, st->cnt, usec, st->use_file);
 #endif
 		}
@@ -857,7 +856,7 @@ top:
 		INSTR_TIME_ACCUM_DIFF(*conn_time, end, start);
 	}
 
-	if (use_log && st->state == 0)
+	if (log_file && st->state == 0)
 		INSTR_TIME_SET_CURRENT(st->txn_begin);
 
 	if (commands[st->state]->type == SQL_COMMAND)
@@ -1833,7 +1832,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'C':
-is_connect = 1;
+is_connect = true;
 break;
 			case 's':
 scale_given = true;
@@ -1955,6 +1954,12 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	/*
+	 * save main process id in the global variable because process id will be
+	 * changed after fork.
+	 */
+	main_pid = (int) getpid();
+
 	if (nclients > 1)
 	{
 		state = (CState *) realloc(state, sizeof(CState) * nclients);
@@ -1980,20 +1985,6 @@ main(int argc, char **argv)
 		}
 	}
 
-	if (use_log)
-	{
-		char		logpath[64];
-
-		snprintf(logpath, 64, "pgbench_log.%d", (int) getpid());
-		LOGFILE = fopen(logpath, "w");
-
-		if (LOGFILE == NULL)
-		{
-			fprintf(stderr, "Couldn't open logfile \"%s\": %s", logpath, strerror(errno));

Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Greg Smith

Takahiro Itagaki wrote:

  1. Use explicit locks. The lock primitive will be pthread_mutex for
 multi-threaded implementations or semaphore for multi-process ones.
  2. Use per-thread log files.
 File names would be "pgbench_log.<pid>.<thread-id>".
  


I'm concerned that the locking itself will turn into a new pgbench 
bottleneck, just as we're reaching the point where, for the first time 
in a while, it isn't one.  And that sounds like it has its own potential 
risks/complexity involved.


I could live with per-thread log files.  I think my pgbench-tools is the 
main consumer of these latency logs floating around right now (I just 
pushed a 9.0 update today to handle the multiple workers option, which 
is how I discovered this).  It doesn't make any difference to what I'm 
doing how many files I have to process; just a few lines of extra shell 
code for me to pull the rest into the import.  That seems like the 
simplest solution that's guaranteed to work: push the problem onto the 
client side instead, where it's easier to deal with.


Unless someone feels strongly that these have to be interleaved into one 
file, based on Andrew's suggestion that this is a hard problem to get 
right and Tom's suggestion that this might even extend into the proper 
threaded version too, I think a log file per worker is the easiest way 
out of this.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Greg Smith
Just noticed a problem popping up sometimes with the new multi-threaded 
pgbench.  This is on a Linux RPM build (the alpha4 set) compiled with 
'--disable-thread-safety'.  Still trying to nail down whether that's a 
requirement for this problem to appear or not.  I did most of my review 
of this feature with it turned on, and haven't been seeing this problem 
on other systems that are thread-safe.  Not sure if that's cause and 
effect or coincidence yet.


Here's a sample invocation that produces ragged output for me on my one 
system:


pgbench -S -T 5 -c 4 -j 4 -l pgbench

The log file produced by this (pgbench_log.<pid>) is supposed to consist 
of a series of lines in the following format:


client,trans,latency,filenum,sec,usec

It looks like the switch between clients running on separate workers can 
lead to a mix of their respective lines showing up though.  Here's a 
couple of typical samples, with the bad line in the middle of each set:


1 138 178 0 1268665788 607559
1 139 182 0 1268665788 607751
1 1402 0 2491 0 1268665788 586135
2 1 264 0 1268665788 586463
2 2 192 0 1268665788 586665

1 274 160 0 1268665788 632966
1 275 178 0 1268665788 633154
1 276 184 0 126866578 178 0 1268665788 614015
2 141 190 0 1268665788 614252
2 142 169 0 1268665788 614430

2 274 178 0 1268665788 639218
2 275 175 0 1268665788 639402
2 276 169 0 126866578 171 0 1268665788 626933
0 141 185 0 1268665788 627165
0 142 202 0 1268665788 627377

Looks like sometimes a client is only getting part of its line written 
out before getting stomped on by the next one.  I think one of the 
assumptions being made about how to safely write to this log file may be 
broken by the multi-process implementation, which is what you get when 
thread-safety is not available.


Since there should only be 6 fields here, I think you can find whether a 
given log file has this problem or not like this:


cat pgbench_log.x | cut -d " " -f 7 | sort | uniq

If anything comes out of that, the latency log file has at least one bad 
line in it.


Similarly, this:

cat pgbench_log.x | cut -d " " -f 1 | sort | uniq

Should only show the client numbers; here there's some first columns 
with much bigger numbers too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-15 Thread Greg Smith

Dimitri Fontaine wrote:

Maybe some more admin level tutorial would be great to have too, such as
how to find what's locking, how to monitor table and index usage to
determine which indexes to drop, which to create, how to monitor
(slave lag, hit ratio, transactions, I/U/D activity, you name
it).
  


Wow, that's at least one order of magnitude more ambitious than the 
actual scope of work on the docs that should be getting focused on for 
beta right now, perhaps two.  Regardless, I already have stubs for the 
first couple of these sitting on the wiki at 
http://wiki.postgresql.org/wiki/Category:Administration (locks, 
monitoring).  I know I'd rather see work done on those, where we can 
continue to improve without doc commits and easily make things available 
for all versions, until that content is good.  Maybe then we can talk 
about merging some of that back into the main docs.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] machine-readable pg_controldata?

2010-03-05 Thread Greg Smith

Josh Berkus wrote:

Oh, I wasn't proposing doing *anything* for 9.0.  I wanted to get
something on the TODO list for 9.1.  As far as I'm concerned, 9.0 is
closed to new ideas.  We have enough bugs to fix as it is.
  


I didn't get that initially from how you characterized this as "past 
time" to add.  It's at 
http://wiki.postgresql.org/wiki/Todo#Point-In-Time_Recovery_.28PITR.29 now.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] machine-readable pg_controldata?

2010-03-05 Thread Greg Smith

Heikki Linnakangas wrote:

Then again, if you don't use the copy in shared memory but just open the
pg_control file and read it in the UDF, you could implement this as a
pgfoundry module that works with older versions too.
  


This is the direction I'd prefer to see this go in a 9.0 context.  It's 
easy enough to build a fully functional version that works via the same 
proposed final UDF interface, just with the extra step of reading the 
file.  Get that working, and you've just added a useful module 
supporting all the way back to 8.2 (I think--not sure if there have been 
any other changes that would break this) that people would love to have. 

Once it's done, the UI is solid, and all the data is known to be 
exposed the way people actually want it, then do the simple conversion 
to grab it from shared memory instead and add it as an official 9.1 
feature.  I'm not feeling any pressure that this is a must-fix item for 
the 9.0 release freeze--as warts here go, this is both a small one and 
one that doesn't have to be fixed in core, so two strikes against it 
being critical.


I would rather have the ability to tweak this for a few months to get 
everything right, while being able to expose regular updates outside of 
core, than to commit "this is the best we've got so far" just under the 
wire for 9.0 without necessarily enough time to do it well.  The few 
messages that have shown up here already have left me with the opinion 
that just getting the requirements and preferred implementation nailed 
down here is going to take a few rounds of development to work out to 
everyone's satisfaction.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] machine-readable pg_controldata?

2010-03-05 Thread Greg Smith

Magnus Hagander wrote:

Huh? It's fixed width, you don't need regexps for that. Just split the
string apart.

Taking options for single fields might have a better usecase, of course :-)
  


I do find it a bit hard to imagine that any program capable of shelling 
out to call pg_controldata and doing something with the output would hit 
a major hurdle parsing the format that's already there.  Moving toward 
single fields I could see as being better for some cases, but going all 
the way to XML/JSON is a step backwards from the plain text format as 
far as I'm concerned.  Anything that can parse one of those complicated 
formats should be able to trivially chew the existing one already.  
Seriously:  I have bash scripts that parse that thing.
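
Those scripts amount to little more than this sort of thing (minimal 
sketch, assuming $PGDATA points at the cluster):

#!/bin/sh
# pull one field out of pg_controldata's fixed-width output
CKPT=`pg_controldata "$PGDATA" | \
  awk -F': *' '/^Latest checkpoint location/ {print $2}'`
echo "Latest checkpoint location: $CKPT"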



Even better would be the ability to get everything which is in
pg_controldata currently as part of a system view in a running
PostgreSQL; I can get most of them, but certainly not all.



+1 for having all the information available from inside the backend,
if that's technically possible (which I assume it should be)
  


I revisit this every time I write yet another user-space parser and ask 
myself why I haven't exposed it in the server yet.  The primary answer 
so far has always been "because you can't execute a query on the standby 
while it's in recovery", making half the stuff I wanted the data for 
(e.g. standby lag monitoring like 
http://www.kennygorman.com/wordpress/?p=249 ) unable to use that 
interface anyway.  Now that Hot Standby cracks that objection, it's 
worth talking about for a minute.


pg_controldata itself just reads the file in directly and dumps the 
data.  There is a copy of it kept around all the time in shared memory 
though (ControlFile in xlog.c), protected by a LWLock.  At a high level 
you can imagine a new function in xlog.c that acquires that lock, copies 
the block into a space the backend allocated for saving it, releases the 
lock, and then returns the whole structure.  Then just wrap some number 
of superuser-only UDFs around it (I'd guess nobody wants regular ones 
able to hold a lock on ControlFile) and you've exposed the results to 
user-space.


Two questions before I'd volunteer to write that:

1) How do you handle the situation where the pg_controldata is invalid?  
"Not read in yet" and "CRC is bad" are the two most obvious ones that 
can happen.  Return a null for every field, try and guess (the way 
pg_resetxlog does), don't return a row of output at all, or throw an 
error?  Each of these has slightly different implications for how admin 
code that will do something with these values will have to be structured.


2) While it's easy to say "I only want one or two of these values" and 
expose a whole set of UDFs to grab them individually (perhaps wrapping 
into a system view via that popular approach), I am concerned that 
people are going to call any single-value versions provided one at a 
time and get an inconsistent set.  I think the only reasonable interface 
to this would not return a single field, it would pop out all of them so 
you got a matching set from the point in time the lock was held.  And if 
that's the case, I'm not sure what the most reasonable UI is.  Just return 
a whole row with a column for each field in the file, and then people 
can select out just the ones they want?  (That's probably the right 
one)  Produce the mess as a series of rows with (name,value) pairs?  Put 
them into an array?


Having re-raised these concerns to myself, this is usually the point in 
this exercise where I go "screw it, I'll just parse pg_controldata 
again" and do exactly that.  This is happening so much lately that I 
think Josh's suggestion that it's just unworkable to keep going via that 
model forever has merit.  I find it hard to mark this as 9.0 territory 
though, given the data is not actually difficult to grab--and that trail 
is already well blazed, nothing new in this version.


In short:  I'd vote for a TODO item, one I would happily write myself 
for 9.1 given reasonable agreement on the questions raised above, but -1 
for doing anything about it right now.  Given both the existence of 
completely reasonable workarounds and of much more serious blocker 
problems sitting on the roadmap to release, I can't get too excited 
about this as the thing to worry about right now.  Same reason I ignored 
the idea when Joshua Tolley brought it up last month:  
http://archives.postgresql.org/message-id/4b69caeb.9513f30a.731a.3...@mx.google.com


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-03-02 Thread Greg Smith

Bruce Momjian wrote:

Right now you can't choose "master bloat", but you can choose the other
two.  I think that is acceptable for 9.0, assuming the other two don't
have the problems that Tom foresees.



I was wrong.  You can choose "master bloat" with
vacuum_defer_cleanup_age, but only crudely because it is measured in
xids and the master defers no matter what queries are running on the
slave...


OK with you finding the situation acceptable, so long as it's an 
informed decision.  From how you're writing about this, I'm comfortable 
you (and everybody else still involved here) have absorbed the issues 
enough that we're all talking about the same thing now.  Since there are 
a couple of ugly user-space hacks possible for prioritizing "master 
bloat", and nobody is stepping up to work on resolving this via my 
suggestion involving better SR integration, seems to me heated 
discussion of code changes has come to a resolution of sorts I (and 
Simon, just checked) can live with.  Sounds like we have three action 
paths here:


-Tom already said he was planning a tour through the HS/SR code; I 
wanted that to happen with him aware of this issue.
-Josh will continue doing his testing, also better informed about this 
particular soft spot.
-I'll continue test-case construction for the problems there are still 
concerns about (pathological max_standby_delay and b-tree split issues 
being the top two on that list), and keep sharing particularly 
interesting ones here to help everyone else's testing. 

If it turns out any of those paths leads to a must-fix problem that 
doesn't have an acceptable solution, at least the idea of this as a 
"plan B" is both documented and more widely understood then when I 
started ringing this particular bell.


I just updated the Open Items list:  
http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items to officially 
put myself on the hook for the following HS related documentation items 
that have come up recently, aiming to get them all wrapped up before 
or during early beta:


-Update Hot Standby documentation: clearly explain relationships between 
the 3 major setup trade-offs, "buffer cleanup lock", notes on which 
queries are killed once max_standby_delay is reached, measuring XID 
churn on master for setting vacuum_defer_cleanup_age
-Clean up archive_command docs related to the recent "/bin/true" 
addition.  Given that's where I expect people who run into the recently 
added pg_stop_backup warning message will end up, noting its value for 
escaping from that particular case might be useful too.


To finish airing my personal 9.0 TODO list now that I've gone this far, 
I'm also still working on completing the following patches, initial 
versions of which have already been submitted; I was close to finishing 
both before getting side-tracked onto this larger issue:


-pgbench > 4000 scale bug fix:  
http://archives.postgresql.org/message-id/4b621ba3.7090...@2ndquadrant.com
-Improving the logging/error reporting/no timestamp issues in pg_standby 
re-raised recently by Selena:  
http://archives.postgresql.org/message-id/2b5e566d1001250945oae17be8n6317f827e3bd7...@mail.gmail.com


If nobody else claims them as something they're working on beforehand, I 
suspect I'll then move onto building some of the archiver UI 
improvements discussed most recently as part of the "pg_stop_backup does 
not complete" thread, despite Heikki having crushed my dreams of a 
simple solution to those by pointing out the shared memory limitation 
involved.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] building postgres-A4.tex-pdf crushed

2010-03-02 Thread Greg Smith

Oleg Bartunov wrote:

just to inform, that building of postgres-A4.tex-pdf (8.4) crushed on my
Ubuntu 9.10 machine. No problem with CVS HEAD.


Alvaro and I are both having the opposite problem on Ubuntu 9.04:  8.4 
works fine, but CVS HEAD dumps core:  
http://archives.postgresql.org/message-id/20100215141242.ga2...@alvh.no-ip.org


I wonder if someone more familiar with doing git bisects than me might 
track down the exact commit that swapped the behavior here.  Since this 
looks like an awful openjade bug regardless, not sure what we could do 
about it; would be nice to know exactly how it happened though.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup does not complete

2010-03-02 Thread Greg Smith

Simon Riggs wrote:

On Tue, 2010-03-02 at 13:13 +, Greg Stark wrote:
  
Why do we disallow turning off archive_mode anyways? 



Because it is needed for safety and nobody has got around to coding the
idea of turning it on/off during normal running, which is possible, with
appropriate care.
  


It's actually made it pretty high up on the list of desired features for 
some of the replication projects:  
http://wiki.postgresql.org/wiki/ClusterFeatures#Start.2Fstop_archiving_at_runtime


Since that is one of the easier items on that list to actually knock off 
(probably an order of magnitude easier than the average feature there), 
it's completely feasible somebody will do so for 9.1.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup does not complete

2010-03-01 Thread Greg Smith

Fujii Masao wrote:

We would be easily able to calculate the last archived log file from
the existence of archive status files.
  


Right, but you have to actually scan the whole archive directory to 
figure that out, and I'd rather not see that code get duplicated 
somewhere else when it's already inside the archive_command logic.  If 
it just shared that info with the rest of the system instead, this would 
be trivial to discover.
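
The scan itself is simple enough--an untested sketch of the sort of 
thing that ends up getting duplicated, leaning on the .done marker files 
the archiver leaves in archive_status:

#!/bin/sh
# report the newest WAL segment the archiver has finished with;
# assumes a standard $PGDATA layout with archive_mode on
LAST=`ls "$PGDATA/pg_xlog/archive_status" 2>/dev/null | \
  grep '\.done$' | sort | tail -1`
echo "Last archived segment: ${LAST%.done}"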


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup does not complete

2010-03-01 Thread Greg Smith

Fujii Masao wrote:

On Fri, Feb 26, 2010 at 2:47 AM, Bruce Momjian  wrote:
  

Postgres 9.0 will be the first release to mention /bin/true as a way of
turning off archiving in extraordinary circumstances:

   http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html

Setting archive_command to a command that does nothing but return true, e.g. /bin/true,



"return true" seems ambiguous for me. How about writing clearly
"return a zero exit status" instead?
  


This is a good catch, and I have a work-in-progress update to that doc 
section that fixes that wording, as well as rearranging the recent 
additions a bit.  Really that whole "/bin/true" bit needs to go after 
the example.  A very brief intro to what "exit status" means on various 
platforms might be in order too.  I'm adjusting all that to read better; 
once I'm happy with it I'll submit a doc patch in the next week or two 
with the final result.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us



Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-03-01 Thread Greg Smith

Robert Haas wrote:

I just read through the current documentation and it doesn't really
seem to explain very much about how HS decides which queries to kill.
Can someone try to flesh that out a bit?


I believe it just launches on a mass killing spree once things like 
max_standby_delay expire.  This I want to confirm via testing (can 
simulate with a mix of long and short running pgbench queries) and then 
intend to update the docs to clarify.
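
Roughly this sort of thing (sketch only; the standby host name and 
durations are placeholders):

# a couple of deliberately slow queries holding snapshots on the standby
echo "SELECT pg_sleep(30);" > slow.sql
pgbench -h standby -n -f slow.sql -c 2 -T 600 &

# plus a steady stream of short read-only ones
pgbench -h standby -n -S -c 8 -T 600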



It also uses the term
"buffer cleanup lock", which doesn't seem to be used anywhere else in
the documentation (though it does appear in the source tree, including
README.HOT).
  


This loose end was already noted in my last docs update.  I wrote an 
initial description, but Bruce and I decided to leave it out until 
something more thorough could be put together.  This is also on my docs 
cleanup list; I will get to it somewhere along the beta timeline.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-03-01 Thread Greg Smith

Bruce Momjian wrote:

Joachim Wieland wrote:
  

1) With the current implementation they will see better performance on
the master and more aggressive vacuum (!), since they have less
long-running queries now on the master and autovacuum can kick in and
clean up with less delay than before. On the other hand their queries
on the standby might fail and they will start thinking that this HS+SR
feature is not as convincing as they thought it was...



I assumed they would set max_standby_delay = -1 and be happy.
  


The admin in this situation might be happy until the first time the 
primary fails and a failover is forced, at which point there is an 
unbounded amount of recovery data to apply that was stuck waiting behind 
whatever long-running queries were active.  I don't know if you've ever 
watched what happens to a pre-8.2 cold standby when you start it up with 
hundreds or thousands of backed up WAL files to process before the 
server can start, but it's not a fast process.  I watched a production 
8.1 standby get >4000 files behind once due to an archive_command bug, 
and it's not something I'd like to ever chew my nails off to again.  If 
your goal was HA and you're trying to bring up the standby, the server 
is down the whole time that's going on.


This is why no admin who prioritizes HA would consider 
'max_standby_delay = -1' a reasonable setting, and those are the sort of 
users Joachim's example was discussing.  It only takes one rogue query 
that runs for a long time to leave the standby so far behind that it's 
useless for HA purposes.  And you also have to ask yourself "if recovery 
is halted while waiting for this query to run, how stale is the data on 
the standby getting?".  That's true for any large setting for this 
parameter, but using -1 for the unlimited setting also gives the maximum 
possible potential for such staleness.


'max_standby_delay = -1' is really only a reasonable idea if you are 
absolutely certain all queries are going to be short, which we can't 
dismiss as an unfounded use case so it has value.  I would expect you 
have to also combine it with a matching reasonable statement_timeout to 
enforce that expectation to make that situation safer.
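
That guard can be as simple as this on the standby (sketch; the 
five-minute value is a placeholder to match against your real query mix):

# in the standby's postgresql.conf, alongside max_standby_delay = -1
echo "statement_timeout = '5min'" >> $PGDATA/postgresql.conf
pg_ctl -D $PGDATA reload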


In any of the "offload batch queries to the failover standby" 
situations, it's unlikely an unlimited value for this setting will be 
practical.  Perhaps you set max_standby_delay to some number of hours, 
to match your expected worst-case query run time and reduce the chance 
of cancellation.  Not putting a limit on it at all is something no DBA 
with healthy paranoia is going to be happy with in a HA environment, 
given that unbounded staleness and unbounded recovery time are then both 
possible.  The potential of a failed long-running query is much less 
risky than either of those.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us



Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-03-01 Thread Greg Smith

Josh Berkus wrote:

However, this leaves aside Greg's point about snapshot age and
successive queries; does anyone dispute his analysis?  Simon?
  


There's already a note on the Hot Standby TODO about unexpectedly bad 
max_standby_delay behavior being possible on an idle system, with no 
suggested resolution for it besides better SR integration.  The issue 
Greg Stark has noted is another variation on that theme.  It's already 
on my list of theorized pathological but as yet undemonstrated concerns 
that Simon and I identified, the one I'm working through creating test 
cases to prove/disprove.  I'm past "it's possible..." talk at this point 
though, so as not to spook anyone unnecessarily, and am only raising 
things I can show concrete examples of in action.  White box testing at 
some point does require pausing one's investigation of what's in the box 
and getting on with the actual testing instead.


The only real spot where my opinion diverges here is that I have yet to 
find any situation where 'max_standby_delay=-1' makes any sense to me.  
When I try running my test cases with that setting, the whole system 
just reacts far too strangely.  My first patch here is probably going to 
be adding more visibility into the situation when queries are blocking 
replication forever, because the times I find myself at "why is the 
system hung right now?" are when that happens, and it's not obvious as 
an admin what's going on.


Also, the idea that a long-running query on the standby could cause an 
unbounded delay in replication is so foreign to my sensibilities that I 
don't ever include it in the list of useful solutions to the problems 
I'm worried about.  The option is there, and I'm not disputing that it 
makes sense for some people, because there seems to be some demand for 
it; I just can't see how it fits into any of the use cases I'm concerned 
about.


I haven't said anything about query retry mainly because I can't imagine 
any way it's possible to build it in time for this release, so whether 
it's eventually feasible or not doesn't enter into what I'm worried 
about right now.  In any case, I would prioritize that behind work on 
preventing the most common situations that cause cancellations in the 
first place, until those are handled so well that retry is the most 
effective improvement left to consider.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-03-01 Thread Greg Smith

Josh Berkus wrote:

And I think we can measure bloat in a pgbench test, no?  When I get a
chance, I'll run one for a couple hours and see the difference that
cleanup_age makes.
  


The test case I attached at the start of this thread runs just the 
UPDATE to the tellers table.  Running something similar that focuses 
just on UPDATEs to the pgbench_accounts table, without the rest of the 
steps done by the standard test, is the fastest route to bloat.  The 
standard test will do it too; it just does a lot of extra stuff that 
doesn't impact results (SELECT, INSERT), so it wastes some resources 
compared to a targeted bloater script.
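
A minimal custom pgbench script along those lines, assuming the standard 
tables created by "pgbench -i"; the file name and run parameters below 
are illustrative only:

\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;

Save that as bloat.sql and run it with something like "pgbench -n -f 
bloat.sql -c 8 -T 600 yourdb" (-n skips the usual pre-test vacuum).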


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-02-28 Thread Greg Smith

Josh Berkus wrote:

Well, we could throw this on the user if we could get them some
information on how to calculate that number.  For example, some way for
them to calculate the number of XIDs per minute via a query, and then
set vacuum_defer_cleanup_age appropriately on the master.  Sure, it's
clunky, but we've already warned people that 9.0 will be clunky and hard
to administer.  And it's no worse than setting FSM_pages used to be.

However, first we need to test that setting vacuum_defer_cleanup_age
actually benefits query cancel issues.
  


Proving that setting works as expected is already on my test case grid, 
and it seems fine in my limited testing so far.  I've started looking 
into ways to monitor XID churn so the parameter can be set better.  I'll 
take care of providing all that in my next test case update.  My intent 
here is to take the ideas outlined in my "Hot Standby Tradeoffs" blog 
post and turn that into a new documentation section making it more clear 
where the problem steps are, regardless of what else happens here.  And 
I need some concrete example of XID burn rate measurement to finish that 
job.
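
As a sketch of the sort of measurement I have in mind--the numbers here 
are invented, and txid_current() consumes an XID itself, so this is only 
approximate:

-- Sample the transaction ID counter twice, a minute apart:
SELECT txid_current();   -- suppose this returns 1023491
SELECT pg_sleep(60);
SELECT txid_current();   -- suppose 1029491:  ~6000 XIDs burned/minute
-- At that rate, deferring cleanup behind a one hour standby query would
-- take a vacuum_defer_cleanup_age of roughly 6000 * 60 = 360000.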


The main problem with setting vacuum_defer_cleanup_age high isn't 
showing it works; it's a pretty simple bit of code.  It's when you 
recognize that it penalizes all cleanup all the time, whether or not the 
standby is actually executing a long-running query, that you see the 
second level of pain in increasing it.  Returning to the idea of 
"how is this different from a site already in production?", it may very 
well be the case that a site that sets vacuum_defer_cleanup_age high 
enough to support off-peak batch reporting cannot tolerate how that will 
impact vacuums during their peak time of day.  The XID export 
implementation sidesteps that issue by only making the vacuum delay 
increase when queries that require it are running, turning this back 
into a standard "what's the best time of day to run my big reports?" 
issue that people understand how to cope with already.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-02-28 Thread Greg Smith

Josh Berkus wrote:

First, from the nature of the arguments, we need to eventually have both
versions of SR: delay-based and xmin-pub.  And it would be fantastic if
Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
ready as well.
  


As I see it, the main technical obstacle here is that a subset of a 
feature already on the SR roadmap needs to get built earlier than 
expected to pull this off.  I don't know about Tom, but I have no 
expectation it's possible for me to get up to speed on that code fast 
enough to contribute anything there.  I expect the thing I'd be most 
productive at as far as moving the release forward is to continue 
testing this pair of features looking for rough edges, which is what I 
have planned for the next month. 

I'm not even close to finished with generating test cases specifically 
probing for bad behavior suspected after a look at the implementation 
details--this is just what I came up with in my first week of that.  
Count me in for more testing, but out for significant development here.  
It's not what I've got my time allocated for, because it's not where I 
think I'll be most productive.



2) A more usable vacuum_defer_cleanup_age.  If it was feasible for a
user to configure the master to not vacuum records less than, say, 5
minutes dead, then that would again offer the choice to the user of
slightly degraded performance on the master (acceptable) vs. lots of
query cancel (unacceptable).  I'm going to test Greg's case with
vacuum_cleanup_age used fairly liberally to see if this approach has merit.
  


I've been down that road and it leads quickly to the following 
question:  "how can I tell how old in time-based units an xid is?"  If 
there were an easy answer to that question, vacuum_defer_cleanup_age 
would already be set in time units.  It's the obvious UI to want, it's 
just not obvious how to build it internally.  Maybe I missed something, 
but my guess is that vacuum_defer_cleanup_age is already as good as it's 
going to get.
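
A hedged sketch of how a user could build that time-to-xid mapping 
themselves today--nothing here is an existing facility, just a sampling 
idea:

-- On the master, keep a small history of when each sampled xid was
-- current; an external job (cron, say) inserts a row every minute:
CREATE TABLE xid_samples (
    sampled_at timestamptz DEFAULT now(),
    xid        bigint      DEFAULT txid_current()
);
INSERT INTO xid_samples DEFAULT VALUES;
-- Later, "how many xids old is five minutes ago?" becomes a lookup
-- against this table, which is exactly the memory of past xid activity
-- the server itself doesn't keep.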


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-02-28 Thread Greg Smith

Joachim Wieland wrote:

Instead, I assume that most people who will grab 9.0 and use HS+SR do
already have a database with a certain query profile. Now with HS+SR
they will try to put the most costly and longest read-only queries to
the standby but in the end will run the same number of queries with
the same overall complexity.
  


This is a nice summary of the primary use-case I am trying to optimize 
usability for, because I know for a fact there's a stack of pent-up 
requests for exactly this form of improvement from existing warm standby 
users.  And your subsequent discussion of how administrators will react 
in each of the possible configurations here matches my own concerns.  I 
would highly recommend anyone who feels this is not a critical feature 
to fix carefully read Joachim's message from an advocacy perspective, 
that's a better user-oriented prediction than mine of exactly how this 
is going to play out in the field post-release.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-27 Thread Greg Smith

Greg Stark wrote:

On Sun, Feb 28, 2010 at 5:28 AM, Greg Smith  wrote:
  

The idea of the workaround is that if you have a single long-running query
to execute, and you want to make sure it doesn't get canceled because of a
vacuum cleanup, you just have it connect back to the master to keep an open
snapshot the whole time.

Also, I'm not sure this actually works. When your client makes this
additional connection to the master it's connecting at some
transaction in the future from the slave's point of view. The master
could have already vacuumed away some record which the snapshot the
client gets on the slave will have in view.


Right, and there was an additional comment in the docs alluding to some 
sleep time on the master that intends to try and improve things.  If you 
knew how long archive_timeout was, you could try to sleep longer than it 
to increase your odds of avoiding an ugly spot.  But there are 
race conditions galore possible here, particularly if your archiver or 
standby catchup is backlogged.



Still it's a handy practical trick even if it isn't 100% guaranteed to
work. But I don't think it provides the basis for something we can
bake in.
  


Agreed on both counts, which is why it's in the current docs as a 
workaround people can consider, but not what I've been advocating as the 
right way to proceed.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-02-27 Thread Greg Smith

Robert Haas wrote:

It seems to me that if we're forced to pass the xmin from the
slave back to the master, that would be a huge step backward in terms
of both scalability and performance, so I really hope it doesn't come
to that.


Not forced to--have the option of.  There are obviously workloads where 
you wouldn't want this.  At the same time, I think there are some pretty 
common ones people are going to expect HS+SR to work on transparently 
where this would obviously be the preferred trade-off to make, were it 
available as one of the options.  The test case I put together shows an 
intentionally pathological but not completely unrealistic example of 
such a workload.



I wish I understood better exactly what you mean by "the
notion of synchronizing the WAL stream against slave queries" and why
you don't think it will work.  Can you elaborate?
  


There's this constant WAL stream coming in from the master to the 
slave.  Each time the slave is about to apply a change from that stream, 
it considers "will this disrupt one of the queries I'm already 
executing?".  If so, it has to make a decision about what to do; that's 
where the synchronization problem comes from.


The current two options are "delay applying the change", at which point 
the master and standby will drift out of sync until the query ends and 
it can catch back up, or "cancel the query".  There are tunables for 
each of these, and they all seem to work fine (albeit without too much 
testing in the field yet).  My concern is that the tunable that tries to 
implement the other thing you might want to optimize for--"avoid letting 
the master generate WAL entries that are the most likely ones to 
conflict"--just isn't very usable in its current form.


Tom and I don't see completely eye to eye on this, in that I'm not so 
sure the current behaviors are "fundamentally wrong and we will never be 
able to make [them] work".  If that's really the case, you may not ever 
get the scalability/performance results you're hoping for from this 
release, and really we're all screwed if those are the only approaches 
available.


What I am sure of is that a SR-based xmin passing approach is simpler, 
easier to explain, more robust for some common workloads, and less 
likely to give surprised "wow, I didn't think *that* would cancel my 
standby query" reports from the field than any way you can configure Hot 
Standby alone right now.  And since I never like to bet against Tom's 
gut feel, having it around as a "plan B" in case he's right about an 
overwhelming round of bug reports piling up against the 
max_standby_delay etc. logic doesn't hurt either.


I spent a little time today seeing if there was any interesting code I 
might steal from the early "synchrep" branch at 
http://git.postgresql.org/gitweb?p=users/fujii/postgres.git;a=summary , 
but sadly when I tried to rebase that against the master to separate out 
just the parts unique to it the merge conflicts were overwhelming.  I 
hate getting beaten by merge bitrot even when Git is helping.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-27 Thread Greg Smith

Bruce Momjian wrote:
"The first option is to connect to the primary server and keep a query 
active for as long as needed to run queries on the standby. This 
guarantees that a WAL cleanup record is never generated and query 
conflicts do not occur, as described above. This could be done using 
contrib/dblink  and pg_sleep(), or via other mechanisms."



I am unclear how you would easily advance the snapshot as each query
completes on the slave.
  


The idea of the workaround is that if you have a single long-running 
query to execute, and you want to make sure it doesn't get canceled 
because of a vacuum cleanup, you just have it connect back to the master 
to keep an open snapshot the whole time.  That's basically the same idea 
that vacuum_defer_cleanup_age implements, except you don't have to 
calculate a value--you just hold open the snapshot to do it.


When that query ended, its snapshot would be removed, and then the 
master would advance to whatever the next latest one is.  Nothing 
fancier than that.  The only similarity is that if you made every query 
that happened on the standby do that, it would effectively be the same 
behavior I'm suggesting could be available via the standby->master xmin 
publication.
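
A minimal sketch of that workaround using contrib/dblink, with a 
placeholder connection string and sleep length; as discussed elsewhere 
in this thread it's subject to race conditions, so it improves the odds 
rather than guaranteeing anything:

-- From the standby-side client, open a named connection back to the
-- master and leave a query running there for the expected report length:
SELECT dblink_connect('master_pin', 'host=master dbname=postgres');
SELECT dblink_send_query('master_pin', 'SELECT pg_sleep(7200)');

-- ... run the long report against the standby here ...

-- Disconnecting ends the pg_sleep() on the master, releasing the open
-- snapshot so its xmin horizon can advance again:
SELECT dblink_disconnect('master_pin');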


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Performance Patches Was: [HACKERS] Lock Wait Statistics (next commitfest)

2010-02-27 Thread Greg Smith

Mark Kirkwood wrote:
While I completely agree that the submitter should be required to 
supply a test case and their results, so the rest of us can try to 
reproduce said improvement - rejecting the patch out of hand is a bit 
harsh I feel - Hey, they may just have forgotten to supply these things! 


I didn't put any strong wording in the Wiki; I was just noting that my 
personal position is far less tolerant of this than the current project 
policy.  What I added was:


"If the patch is intended to improve performance, it's a good idea to 
include some reproducible tests to demonstrate the improvement. If a 
reviewer cannot duplicate your claimed performance improvement in a 
short period of time, it's very likely your patch will be bounced. Do 
not expect that a reviewer is going to find your performance feature so 
interesting that they will build an entire test suite to prove it works. 
You should have done that as part of patch validation, and included the 
necessary framework for testing with the submission."


Finding a reviewer for a performance patch and getting them up to speed 
to evaluate any submitted patch is time intensive, and it really sucks 
from the perspective of the CF manager and any reviewer who is handed a 
messy one.  The intention was not to cut people off without warning 
them.  The position I would advocate as being a fair one is that if you 
don't provide a test case for a performance improvement patch, you can't 
then expect that you'll be assigned a reviewer by the CF manager either 
until that's corrected.  And if time for the CF runs out before you do 
that, you're automatically moved to "returned with 
feedback"--specifically, "write us a test case".


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-27 Thread Greg Smith

Josh Berkus wrote:

Hey, take it easy!  I read Stark's post as tongue-in-cheek, which I
think it was.
  


Yeah, I didn't get that.  We've already exchanged mutual off-list 
apologies for the misunderstanding in both directions, I stopped just 
short of sending flowers.


I did kick off this discussion with noting a clear preference this not 
wander into any personal finger-pointing.  And I am far too displeased 
with the technical situation here to have much of a sense of humor left 
about it either.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lock Wait Statistics (next commitfest)

2010-02-27 Thread Greg Smith

Mark Kirkwood wrote:
I don't mean to be ungrateful about the actual reviews at all - and I 
did value the feedback received (which I hope was reasonably clear in 
the various replies I sent). I sense a bit of attacking the messenger 
in your tone...


I thought there was a moderately big difference between the reality of 
the review you got and how you were characterizing it, and I was just 
trying to provide some perspective on how badly a true "bit of review" 
alone would have served you.  Since I saw you disclaimed that wording 
with a smiley I know it wasn't intended to be ungrateful, and I didn't 
intend to shoot the messenger.  Apologies if my tone grazed you though.


In any case, process feedback noted and assimilated into recommended 
practice:  I just added a section about WIP patches to 
http://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission


While I was in there I also added some more notes on my personal top 
patch submission peeve:  patches whose purpose in life is to improve 
performance, but that don't come with an associated, easy to run test 
case, including a sample of that test running on a system that shows the 
speedup clearly.  If I were in charge I would just make it standard 
project policy to immediately reject any performance patch without those 
characteristics.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-27 Thread Greg Smith

Josh Berkus wrote:


Now that I think about it, the xmin thing really doesn't seem
conceptually difficult.  If the slave just opens a 2nd, special query
connection back to the master and publishes its oldest xmin there, as
far as the master is concerned, it's just another query backend.
Could it be that easy?
  


Something just like that is in fact already suggested as a workaround in 
the Hot Standby manual:


"The first option is to connect to the primary server and keep a query 
active for as long as needed to run queries on the standby. This 
guarantees that a WAL cleanup record is never generated and query 
conflicts do not occur, as described above. This could be done using 
contrib/dblink  and pg_sleep(), or via other mechanisms."


And the idea of doing it mainly in client land has its attractions. 

The main reason I wandered toward asking about it in the context of SR 
is that there's already this open "Standby delay on idle system" issue 
with Hot Standby, and the suggested resolution for that problem involves 
publishing keep-alive data with timestamps over SR.  While all these 
problems and potential solutions have been floating around for a long 
time, as you pointed out, the little flash of insight I had here was 
that it's possible to bundle these two problems together with a combined 
keep-alive timestamp+xmin message that goes in both directions.  That 
removes one serious Hot Standby issue altogether, and adds an additional 
conflict avoidance mechanism for people who want to enable it, all with 
something that needs to get done sooner or later anyway for sync rep.


The part I still don't have good visibility on is how much of the 
necessary SR infrastructure needed to support this communications 
channel is already available in some form.  I had though the walsender 
on the master was already receiving messages sometimes from the 
walreceiver on the standby, but I'm getting the impression from Heikki's 
comments that this not the case at all yet.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lock Wait Statistics (next commitfest)

2010-02-27 Thread Greg Smith

Gokulakannan Somasundaram wrote:
Statspack works by the following way -a) it takes a copy of important 
catalog tables(pg_ tables) which store the information like wait 
statistics against wait events, i/o statistics cumulative against each 
SQL_Hash( and SQL_Text), whether a particular plan went for hard 
parse/ soft parse(because of plan caching) and the status of different 
in-memory data structures etc.


This is actually missing the real work that went into building this 
feature into Oracle.  Before it was possible to build statspack, first 
they went to the trouble of noting every source of this form of 
latency--lock waits, I/O statistics and waits, buffer pool waits--and 
instrumenting every single one of them internally.  Basically, every 
time a wait happens for a resource whose latency you might later want to 
monitor, start and end timestamps for that wait are noted, and 
ultimately the difference between them--how long the event took--is 
stored into the database.  That's the data you must have collected at 
some point in order to get the summary.


Meanwhile, PostgreSQL development is resistant to making any changes in 
this direction, under the theory that a) it adds a lot of code 
complexity and b) constant calls to get the current system time are too 
expensive on some platforms to do them all the time.  Until those two 
things are sorted out, what the high-level interface to the direction 
you're suggesting looks like doesn't really matter.  DTrace support has 
managed to clear both of those hurdles due to its optional nature, its 
perceived low overhead, and the fact that *storing* all the events 
generated is offloaded to something that happens outside of the database.


I agree with you that something like statspack is the direction 
PostgreSQL eventually needs to go, but it's going to be an uphill battle 
the whole time to get it built.  The objections will be that it adds 
too much overhead at the lowest levels, where the data needed to support 
it is collected.  Once that is cleared, the high-level interface is 
easy to build.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lock Wait Statistics (next commitfest)

2010-02-27 Thread Greg Smith

Mark Kirkwood wrote:

Greg Smith wrote:
Returned with feedback in October after receiving a lot of review, no 
updated version submitted since then:


https://commitfest.postgresql.org/action/patch_view?id=98



Hmm - I would say a bit of review rather than a lot :-)


It looks like you got useful feedback from at least three people, and 
people were regularly looking at your patch in some form for about three 
months.  That's a lot of review.  In many other open-source projects, 
your first patch would have been rejected after a quick look as 
unsuitable and that would have been the end of things for you.  I feel 
lucky every time I get a volunteer to spend time reading my work and 
suggesting how it could be better; your message here doesn't seem to 
share that perspective.


I'd also like to take the opportunity to express a little frustration 
about the commitfest business - really all I wanted was the patch 
*reviewed* as WIP - it seemed that in order to do that I needed to 
enter it into the various commitfests... then I was faced with 
comments to the effect that it was not ready for commit so should not 
have been entered into a commifest at all... sigh, a bit of an 
enthusiasm killer I'm afraid...


To lower your frustration level next time, make sure to label the e-mail 
and the entry on the CommitFest app with the magic abbreviation "WIP" 
and this shouldn't be so much of an issue.  The assumption for patches 
is that someone submitted them as commit candidates, and therefore they 
should be reviewed to that standard, unless clearly labeled otherwise.  
You briefly disclaimed yours as not being in that category in the 
initial text of your first message, but it was easy to miss that, 
particularly once it had been >8 months from when that messages showed 
up and it was still being discussed.


If you wanted to pick this back up again, I'd think that a look at 
what's been happening with the lock_timeout GUC patch would be 
informative--I'd think that has some overlap with the sort of thing you 
were trying to do.


FYI, I thought your patch was useful, but didn't spend time on it 
because it's not ambitious enough.  I would like to see statistics on a 
lot more types of waiting than just locks, and keep trying to find time 
to think about that big problem rather than worrying about the 
individual pieces of it.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lock Wait Statistics (next commitfest)

2010-02-26 Thread Greg Smith

Bruce Momjian wrote:

What happened to this patch?
  


Returned with feedback in October after receiving a lot of review, no 
updated version submitted since then:


https://commitfest.postgresql.org/action/patch_view?id=98

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Aidan Van Dyk wrote:

Would we (ya, the royal we) be willing to say that if you want the
benifit of removing the MVCC overhead of long-running queries you need
to run PITR backup/archive recovery, and if you want SR, you get a
closed-loop master-follows-save-xmin behaviour?
  


To turn that question around a little, I think it's reasonable to say 
that closed-loop master-follows-slave-xmin behavior is only practical to 
consider implementing with SR--and even there, it should be optional 
rather than required until there's more field experience on the whole 
thing.  Whether it's the default or not could take a bit of debate to 
sort out too.


If you think of it in those terms, the idea that "you need to run PITR 
backup/archive recovery" to not get that behavior isn't an important 
distinction anymore.  If you run SR with the option enabled you could 
get it, any other setup and you won't.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Greg Stark wrote:

But if they move from having a plain old PITR warm standby to having
one they can run queries on they might well assume that the big
advantage of having the standby to play with is precisely that they
can do things there that they have never been able to do on the master
previously without causing damage.
  


Just not having the actual query running on the master is such a 
reduction in damage that I think it's delivering the essence of what 
people are looking for regardless.  That it might be possible in some 
cases to additionally avoid the overhead that comes along with any 
long-running query is a nice bonus, and it's great the design allows for 
that possibility.  But if that's only possible with risk, heavy 
tweaking, and possibly some hacks, I'm not sure that's making the right 
trade-offs for everyone.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Joshua D. Drake wrote:

On Sat, 27 Feb 2010 00:43:48 +, Greg Stark  wrote:
  

I want my ability to run large batch queries without any performance
or reliability impact on the primary server.



+1

I can use any number of other technologies for high availability.
  


Remove "must be an instant-on failover at the same time" from the 
requirements and you don't even need 9.0 to handle that, this has been a 
straightforward to solve problem since 8.2.  It's the combination of HA 
and queries that make things hard to do.


If you just want batch queries on another system without being concerned 
about HA at the same time, the first option is to just fork the base 
backup and WAL segment delivery to another server and run queries there.


Some simple filesystem snapshot techniques will also suffice to handle 
it all on the same standby.  Stop warm standby recovery, snapshot, 
trigger the server, run your batch job; once finished, roll back to the 
snapshot, grab the latest segment files, and resume standby catchup.  
Even the lame Linux LVM snapshot features can handle that job--one of my 
coworkers even has the whole thing scripted, this situation is so common.


And if you have to go live because there's a failover, you're back to 
the same "cold standby" situation a large max_standby_delay puts you at, 
so it's not even very different from what you're going to get in 9.0 if 
this is your priority mix.  The new version is just lowering the 
operational complexity involved.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Greg Stark wrote:

Eh? That's not what I meant at all. Actually it's kind of the exact
opposite of what I meant.
  


Sorry about that--I think we just hit one of those language usage drift 
bits of confusion.  "Sit in the corner" has a very negative tone to it 
in US English and I interpreted your message badly as a result.  A 
Google search for images using that phrase will quickly show you what I 
mean.



What I meant was that your description of the "High Availability first
and foremost" is only one possible use case. Simon in the past
expressed the same single-minded focus on that use case. It's a
perfectly valid use case and I would probably agree if we had to
choose just one it would be the most important.
  


Sure, there are certainly others, and as much as possible more 
flexibility here is a good thing.  What I was suggesting is that if the 
only good way to handle long-running queries has no choice but to 
sacrifice high availability--which is the situation if 
max_standby_delay is the approach you use--then the most obvious users 
for this feature are not being well served by that situation.  I would 
guess a large portion of the users looking forward to Hot Standby are in 
the "have an underutilized high-availability standby I'd like to use for 
offloading long running reports" camp, and if there is no way to serve 
them well this feature is missing the mark a bit. 

You really can't do any better without better master/standby integration 
though, and as pointed out a couple of times here that was considered 
and just not followed through on yet.  I'm increasingly concerned that 
nothing else will really do though.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Heikki Linnakangas wrote:

One such landmine is that the keepalives need to flow from client to
server while the WAL records are flowing from server to client. We'll
have to crack that problem for synchronous replication too, but I think
that alone is a big enough problem to make this 9.1 material.
  


This seems to be the real sticking point then, given that the 
xmin/PGPROC side on the master seems logically straightforward.  For 
some reason I thought the sync rep feature had the reverse message flow 
already going, and that some other sort of limitation just made it 
impractical to merge into the main codebase this early.  My hope was 
that just this particular part could get cherry-picked out of there, and 
that it might even have been thought about already in that context given 
the known HS keepalive "serious issue".  If there was a solution or 
partial solution in progress to that floating around, my thought was 
that just piggybacking this extra xid info on top of it would be easy 
enough.


If there's not already a standby to primary communications backchannel 
implementation available that can be harvested from that work, your 
suggestion that this may not be feasible at all for 9.0 seems like a 
more serious concern than I had thought it was going to be.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Bruce Momjian wrote:

Well, I think the choice is either you delay vacuum on the master for 8
hours or pile up 8 hours of WAL files on the slave, and delay
application, and make recovery much slower.  It is not clear to me which
option a user would prefer because the bloat on the master might be
permanent.
  


But if you're running the 8 hour report on the master right now, aren't 
you already exposed to a similar pile of bloat issues while it's going?  
If I have the choice between "sometimes queries will get canceled" vs. 
"sometimes the master will experience the same long-running transaction 
bloat issues as in earlier versions even if the query runs on the 
standby", I feel like leaning toward the latter at least leads to a 
problem people are used to. 

This falls into the principle of least astonishment category to me.  
Testing the final design for how transactions get canceled here led me 
to some really unexpected situations, and the downside for a mistake is 
"your query is lost".  Had I instead discovered that sometimes 
long-running transactions on the standby can ripple back to cause a 
maintenance slowdown on the master, that's not great.  But it would not 
have been so surprising, and it won't result in lost query results. 

I think people will expect that their queries cancel because of things 
like DDL changes.  And the existing knobs allow inserting some slack for 
things like locks taking a little bit of time to acquire sometimes.  
What I don't think people will see coming is that a routine update on an 
unrelated table is going to kill a query they might have been waiting 
hours for the result of, just because that update crossed an autovacuum 
threshold for the other table and introduced a dead row cleanup.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Greg Stark wrote:

Well you can go sit in the same corner as Simon with your high
availability servers.

I want my ability to run large batch queries without any performance
or reliability impact on the primary server.
  


Thank you for combining a small personal attack with a selfish 
commentary about how yours is the only valid viewpoint.  Saves me a lot 
of trouble replying to your messages, can just ignore them instead if 
this is how you're going to act.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Bruce Momjian wrote:

5 Early cleanup of data still visible to the current query's
  snapshot

#5 could be handled by using vacuum_defer_cleanup_age on the master.
Why is vacuum_defer_cleanup_age not listed in postgresql.conf?
  


I noticed that myself and fired off a corrective patch to Simon 
yesterday; he said it was intentional, but I'm not sure why that is yet.  
We'll sort that out.


You are correct that my suggestion is targeting primarily #5 on this 
list. There are two problems with the possible solutions using that 
parameter though:


-vacuum_defer_cleanup_age is set in a unit that people cannot be 
expected to work in--transaction ids. The UI is essentially useless, 
and there's no obvious way to make a better one. The best you can do 
will still be really fragile.


-If you increase vacuum_defer_cleanup_age, it's active all the time. 
You're basically making every single transaction that could be cleaned 
up pay for the fact that a query *might* be running on the standby it 
needs to avoid.


You can think of the idea of passing an xmin back from the standby as 
being like an auto-tuning vacuum_defer_cleanup_age. It's 0 when no 
standby queries are running, but grows in size to match longer ones. And 
you don't have to know anything to set it correctly; just toggle on the 
proposed "feedback xid from the standby" feature and you're safe.


Expecting that anyone will ever set vacuum_defer_cleanup_age correctly 
in the field in its current form is pretty unreasonable I think. Since 
there's no timestamp-based memory of past xid activity, it's difficult 
to convert it to that form instead, and I think something in terms of 
time is what people would like to set this in.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Bruce Momjian wrote:

Doesn't the system already adjust the delay based on the length of slave
transactions, e.g. max_standby_delay.  It seems there is no need for a
user switch --- just max_standby_delay really high.
  


The first issue is that you're basically saying "I don't care about high 
availability anymore" when you increase max_standby_delay to a high 
value.  Want to offload an 8 hour long batch report every day to the 
standby?  You can do it with max_standby_delay=8 hours.  But the day 
your master crashes 7 hours into that, you're in for a long wait before 
your standby is available while it replays all the queued up segments.  
Your 'hot standby' has actually turned into the old form of 'cold 
standby' just when you need it to be responsive.


This is also the reason why the whole "pause recovery" idea is a 
fruitless path to wander down.  The whole point of this feature is that 
people have a secondary server available for high-availability, *first 
and foremost*, but they'd like it to do something more interesting than 
sit idle all the time.  The idea that you can hold off on applying 
standby updates for long enough to run seriously long reports is 
completely at odds with the idea of high-availability.


The second major problem is that the day the report actually takes 8.1 
hours instead, because somebody else ran another report that slowed you 
down a little, you're screwed if that's something you depend on being 
available--it just got canceled only *after* wasting 8 hours of 
reporting resource time.


max_standby_delay is IMHO only useful for allowing non-real-time web-app 
style uses of HS (think "Facebook status updates"), where you say "I'm 
OK giving people slightly out of date info sometimes if it lets me split 
the query load over two systems".  Set max_standby_delay to a few 
seconds or maybe a minute, enough time to service a typical user query, 
make your app tolerate the occasional failed query and only send big 
ones to the master, and you've just scaled up all the small ones.  
Distributed queries with "eventual consistency" on all nodes is where 
many of the web app designs are going, and this feature is a reasonable 
match for that use case.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith

Tom Lane wrote:

I don't see a "substantial additional burden" there.  What I would
imagine is needed is that the slave transmits a single number back
--- its current oldest xmin --- and the walsender process publishes
that number as its transaction xmin in its PGPROC entry on the master.
  


That is exactly the core idea I was trying to suggest in my rambling 
message.  Just that small additional bit of information transmitted and 
published to the master via that route, and it's possible to optimize 
this problem in a way not available now.  And it's a way that I believe 
will feel more natural to some users who may not be well served by any 
of the existing tuning possibilities.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A thought: should we run pgindent now?

2010-02-26 Thread Greg Smith

Bruce Momjian wrote:

The diff is here:

http://momjian.us/tmp/pgindent.diff

and I checked into CVS a copy of the typedef list I used from Andrew
Dunstan.
  


Is it worthwhile to consider writing up a "how to run pgindent like 
Bruce does" page on the wiki?  I've been scared off of that topic before 
because it always seemed like it was perilous and there were missing 
pieces.  I think we'd get some uptake from contributors doing this 
themselves before submitting patches if it were made easy enough for 
them to do.  I know I'd do it just to get rid of the accidental 
tab/space errors that always manage to sneak into non-trivial work, if I 
could automate the whole thing as a pass to do just before generating a 
final diff to submit.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-02-26 Thread Greg Smith
rious 
issue you can run into (that itself would be great to eliminate):


"Requires keep-alives with timestamps to be added to sync rep feature"

If those keep-alives flowed in both directions, and included both 
timestamps *and* xid visibility information, the master could easily be 
configured to hold open xid snapshots needed for long running queries on 
the standby when that was necessary. I might be missing an 
implementation detail here, but from a high level it seems like you 
could make the walsender on the master publish the information about 
where the standby has advanced to as a bit of ProcArray xmin data.  Then 
the master would only advance its cleanup horizon past the point the 
standby says it no longer needs visibility behind.


This is a much more elegant solution than any of the hacks available so 
far.  It would turn Hot Standby + Streaming Replication into a system 
that stepped out of the way of the worst of the technical limitations of 
HS alone.  The master would easily and automatically avoid advancing 
past where the queries running on the standby needed visibility back to, 
essentially the same way cleanup is blocked during a long-running query 
on the primary--except with the actual main query work offloaded to the 
standby, the idea all along.


I don't know how difficult the keepalive feature was expected to be, and 
there's certainly plenty of potential landmines in this whole xid export 
idea.  How to handle situations where the standby goes away for a while, 
such as a network outage, so that it doesn't block the master from ever 
cleaning up dead tuples is a concern.  I wouldn't expect that to be too 
serious of a blocker, given that if the standby isn't talking it 
probably isn't running queries you need to worry about canceling 
either.  Not sure where else this can fall down, and unfortunately I 
don't know nearly enough about the SR code to help myself with 
implementing this feature.  (I think Simon is in a similar 
position--it's just not what we've been staring at the last few months).


But I do know that the current Hot Standby implementation is going to be 
frustrating to configure correctly for people.  If it's possible to make 
most of that go away just by doing some final integration between it and 
Streaming Replication that just wasn't practical to accomplish until 
now, I think it's worth considering how to make that happen before the 
final 9.0 release.


I really hope this discussion can stay focused on if and how it's 
possible to improve this area, with the goal being to deliver a product 
everyone can be proud of with the full feature set that makes this next 
release a killer one. The features that have managed to all get into 
this release already are fantastic, everyone who contributed should be 
proud of that progress, and it's encouraging that the alpha4 date was 
nailed.  It would be easy to descend into finger-pointing for why 
exactly this particular problem is only getting more visibility now, or 
into schedule-oriented commentary suggesting it must be ignored because 
it's too late to do anything about it.  I hope everyone appreciates 
wandering that way will not help make PostgreSQL 9.0 a better release.  
This issue is so easy to encounter, and looks so bad when it happens, 
that I feel it could easily lead to an embarrassing situation for the 
community if something isn't done about it before release.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us



hs-demo.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup does not complete

2010-02-25 Thread Greg Smith

Greg Stark wrote:

In an ideal world it would be best if pg_stop_backup could actually
print the error status of the archiving command. Is there any way for
it to get ahold of the fact that the archiving is failing?
  


This is in the area I mentioned I'd proposed a patch to improve not too 
long ago.  The archiver doesn't tell anyone anything about what it's 
doing right now, or even save its state information.  I made a proposal 
for making the bit it's currently working on (or just finished, or both) 
visible:  
http://archives.postgresql.org/message-id/4b4fea18.5080...@2ndquadrant.com


The main content for that was tracking disk space, which wandered into a 
separate discussion, but it would be easy enough to use the information 
that patch intends to export ("what archive file is currently being 
processed?") and print it in the error message too.  That makes it easy 
enough for people to infer the command is failing if the same segment 
number shows up every time in that message.


I didn't finish that only because the CF kicked off and I switched out 
of new development to review.  Since this class of error keeps popping 
up, I could easily finish that patch off by next week and see if it 
helps here.  I thought it was a long overdue bit of monitoring to add to 
the database anyway, just never had the time to work on it before.



And do we have closure on whether a "fast" shutdown is hanging? Or was
that actually a smart shutdown?
  


When I tested this myself, a smart shutdown hung every time, while a 
fast one blew right through the problem--matching what's described in 
the manual.  Josh suggested at one point he might have seen a situation 
where fast shutdown wasn't sufficient to work around this and an 
immediate one was required.  Certainly possible that happened for an as 
yet unknown reason--I've seen plenty of situations where fast shutdown 
didn't work--but I haven't been able to replicate it.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup does not complete

2010-02-24 Thread Greg Smith

Tom Lane wrote:

The one thing I'm undecided about is whether we want the immediate
NOTICE, as opposed to dialing down the time till the first WARNING
to something like 5 or 10 seconds.  I think the main argument for the
latter approach would be to avoid log-spam in normal operation


I thought about that for a minute, but didn't think pg_stop_backup is a 
common enough operation that anyone will complain that it's a little 
more verbose in its logging now.  I know when I was new to this, I used 
to wonder just what it was busy doing after executing this command 
when it hung there for a while sometimes, and would have welcomed this 
extra bit of detail--preferably immediately, not even after a 5 or 10 
second delay.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup does not complete

2010-02-24 Thread Greg Smith

Tom Lane wrote:

The value of the HINT I think would be to make them (a) not afraid to
hit control-C and (b) aware of the fact that their archiver has got
a problem.
  
Agreed on both points.  Patch attached that implements something similar 
to Josh's wording, tweaking the original warning too.  Here's what it 
looks like when you run into the bad situation (which I easily simulated 
with "archive_command='/bin/false'") from the client's perspective:


gsm...@meddle:~/pgwork/src/master/src$ psql -c "select 
pg_start_backup('test')"

 pg_start_backup
-----------------
 0/520
(1 row)

gsm...@meddle:~/pgwork/src/master/src$ psql
psql (9.0devel)
Type "help" for help.

gsmith=# select pg_stop_backup();
NOTICE:  pg_stop_backup cleanup done, waiting for required segments to 
archive
WARNING:  pg_stop_backup still waiting for all required segments to 
archive (60 seconds elapsed)
HINT:  Confirm your archive_command is executing successfully.  
pg_stop_backup can be aborted safely, but the resulting backup will not 
be usable.

^CCancel request sent
ERROR:  canceling statement due to user request

And this is the sort of thing that shows up in the logs with default 
logging behavior while all this is happening; you don't see the NOTICE, 
but the WARNING and HINT are both there which I think is good:


LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: /bin/false
WARNING:  transaction log file "0001" could not be 
archived: too many failures
WARNING:  pg_stop_backup still waiting for all required segments to 
archive (60 seconds elapsed)
HINT:  Confirm your archive_command is executing successfully.  
pg_stop_backup can be aborted safely, but the resulting backup will not 
be usable.


Does this solve the logging side of this?  You can still make a case for 
a more forceful pg_stop_backup, but this seems to at least remove much 
of the mystery and frustration from the whole exercise.  This patch plus 
a little documentation suggesting how to recover from this issue might 
be enough.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ca088b0..c09ede9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8125,6 +8125,9 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 	BackupHistoryFileName(histfilename, ThisTimeLineID, _logId, _logSeg,
 		  startpoint.xrecoff % XLogSegSize);
 
+	ereport(NOTICE,
+			(errmsg("pg_stop_backup cleanup done, waiting for required segments to archive")));
+
 	seconds_before_warning = 60;
 	waits = 0;
 
@@ -8139,8 +8142,10 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 		{
 			seconds_before_warning *= 2;		/* This wraps in >10 years... */
 			ereport(WARNING,
-	(errmsg("pg_stop_backup still waiting for archive to complete (%d seconds elapsed)",
-			waits)));
+	(errmsg("pg_stop_backup still waiting for all required segments to archive (%d seconds elapsed)",
+			waits),
+ 	 errhint("Confirm your archive_command is executing successfully.  "
+			 "pg_stop_backup can be aborted safely, but the resulting backup will not be usable.")));
 		}
 	}
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup does not complete

2010-02-24 Thread Greg Smith

Josh Berkus wrote:

Thing is, if archive_command is failing, then the backup is useless
regardless until it's fixed.  And sending the archives to /dev/null (the
fix you're essentially recommending above) doesn't make the backup any
more useful.


That's not what I said to do first.  If it's possible to fix your 
archive_command, and it never returned bad "I'm saying success but I 
didn't really do the right thing" information to the server--it just 
failed--this situation is completely recoverable with no damage to the 
backup.  Just fix the archive_command, reload the configuration, and the 
queue of archived files will flow and eventually your consistent backup 
completes.  This is the only behavior someone who is trying to recover 
from a mistake in production is likely to find acceptable, and as Simon 
has pointed out, that is what the current behavior is optimized for.
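
For example, if the original command failed cleanly, the fix can be as 
small as this (a sketch--the corrected command and archive path here are 
hypothetical):

-- corrected entry in postgresql.conf for a simple cp-based setup:
--   archive_command = 'cp %p /mnt/archive/%f'
-- reload from SQL (same effect as "pg_ctl reload"):
SELECT pg_reload_conf();
-- confirm the new value is active:
SELECT name, setting FROM pg_settings WHERE name = 'archive_command';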


Only in the situation where the archive_command was so bad that it 
returned the wrong data to the server--saying the segment was saved but 
it really wasn't--did I suggest that you might as well change 
archive_command to go nowhere.  Because in that case, your backup is 
already screwed, you lost an essential piece of it.


As far as your comment about treating this like it's a problem specific 
to you, did you miss the part where I pointed out I was just expressing 
concerns about poor visibility into this area ("what is the archiver 
doing?") recently?  I'm well aware this path is full of difficult to 
escape from holes.  We just need to be careful not to do something that 
screws over production users in the name of reducing the learning curve.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup does not complete

2010-02-24 Thread Greg Smith

Josh Berkus wrote:

pg_stop_backup() doesn't complete until all the WAL segments needed to
restore from the backup are archived. If archive_command is failing,
that never happens.



OK, so we need a way out of that cycle if the user is issuing
pg_stop_backup because they *already know* that archive_command is
failing.  Right now, there's no way out other than a fast shutdown,
which is a bit user-hostile.
  

gsmith=# select name,context from pg_settings where name like 'archive%';
      name       |  context
-----------------+------------
 archive_command | sighup
 archive_mode    | postmaster
 archive_timeout | sighup

I expect that for your particular bad situation, you can replace the 
archive_command with a corrected one, use "pg_ctl reload" to send a 
SIGHUP to make that fix active, and escape from this.  That's the only 
right way out of this situation.  You can't just abort a backup someone 
has asked for and allow the server to shut down cleanly merely because 
archiving is failing.  That's the wrong thing to do for production 
setups; the last thing you want is for a system with archiving issues 
to be stopped normally just because that's interfering with an explicit 
admin-requested backup.


There's not necessarily any reason that backup even needs to fail, and 
no reason for the server to get restarted in this situation at all.  If 
the archive_command never returned false information, and in fact just 
returned a valid error code, all of the segments needed to make the 
backup consistent will be queued up waiting for the problem to be 
fixed.  Put the fixed archive_command in place, and you're off and 
running again.  If that's impossible, because the archive_command was 
really screwed up, we can just tell people to swap in an archive_command 
that always returns success, and let all of the queued-up segments get 
tossed away.  That backup will be bad; they fix the archive_command, 
send SIGHUP, and start over with a new backup.
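
The "give up on the queued segments" swap is equally small (a sketch; 
/bin/true just stands in for any command that always reports success):

-- throwaway entry in postgresql.conf; every queued segment is
-- "archived" to nowhere and discarded:
--   archive_command = '/bin/true'
SELECT pg_reload_conf();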


There are some doc patches that could certainly guide how to handle this 
situation better, but I don't see any code changes needed.  Everything 
is working as designed, optimized for production use at the expense of 
some confusion over how to recover if you configure things badly.


I suggested a patch a few weeks ago to make "what is the archiver 
doing?" easier to monitor, but got the impression people felt it was 
redundant, given that SR was the preferred path moving forward and that 
eventually this whole archive_command bit would be going away.  I could 
revive that work if you feel this is such a bad issue that we need a 
better way to watch what the archiver is doing.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOTIFY/LISTEN on read-only slave?

2010-02-17 Thread Greg Smith

Bruce Momjian wrote:

Oops, I did cleanup on the HS docs all day today in response to a doc
patch that was posted in December.  How extensive are your changes?
  


Not very--wording improvements, typos, some technical clarification 
after quizzing Simon on internals that were described in a fuzzy way.  
I'll just wait until I see your commit and then rebase on top of that, 
no big deal.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOTIFY/LISTEN on read-only slave?

2010-02-17 Thread Greg Smith

Greg Smith wrote:

Tom Lane wrote:

The explanation is wrong, but it's still disallowed.
  
What's the actual reason for the restriction then?  I did a whole 
proofreading round on the HS documentation the other day and am 
working on a patch to clean up everything I found, can add better 
notes about this to it.


Will answer my own question now, from the commit log:

"There is not yet any facility for HS slaves to receive notifications 
generated on the master,

although such a thing is possible in future."

I'll include that detail in the doc patch I'm working on.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NOTIFY/LISTEN on read-only slave?

2010-02-17 Thread Greg Smith

Tom Lane wrote:

Bruce Momjian  writes:
  

Our documentation says listen/notify will return an error if executed on
the hot standby server:

	o LISTEN, UNLISTEN, NOTIFY since they currently write to system tables 

With the listen/notify system now implemented in memory, is this still 
true?



The explanation is wrong, but it's still disallowed.
  


What's the actual reason for the restriction then?  I did a whole 
proofreading round on the HS documentation the other day and am working 
on a patch to clean up everything I found, can add better notes about 
this to it.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psycopg2 license changed

2010-02-15 Thread Greg Smith

Federico Di Gregorio wrote:

Even though the tests and example code are almost never distributed except
in the psycopg2 source package? A couple of other people contributed to
the tests: if you really feel it is so important I'll contact them
and ask their permission to use the LGPL3 + exception (the contribution
was without the exception) or remove the code (we won't lose much.)
  


I understand that from a technical perspective these are all different 
bits.  But the sort of people who get stressed about licenses might not 
see it that way, and that's why it's always better to have a simple, 
standard, unified license that covers the entire chunk of software 
you're packaging.  If the examples show up in the source package, that 
means the source package has two licenses instead of one, and that's a 
bad thing.  It's not a huge issue; I'm just afraid that if you don't get 
this nailed down now, there's just going to be another round of this 
tedious license investigation one day.  I'd think it's better for you 
and everyone else in the long run to just completely unify the license.


And if it takes another release for the examples to get that license 
change, I think that's OK.  I wouldn't hold up the big work 
here--getting your next release out with the big LGPL3 switch for the 
main code--over this bit of trivia.  I just think it's a potential 
future headache you should try to remove when you can.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-15 Thread Greg Smith

Jakub Ouhrabka wrote:

I've found similar reports but with older versions of postgres:
http://old.nabble.com/100--of-CPU-utilization-postgres-process-tt27302021.html 



Those all looked like a FreeBSD issue; I doubt it's related to yours.


The pgstat.stat is ~20MB. There are 650 databases, 140GB total.
default_statistics_target = 1000
The system is running Proxmox linux distribution. PostgreSQL is in 
OpenVZ container.


With this many databases and this high of a statistics target, running 
in a VM, suspecting autovacuum seems reasonable.  You might want to try 
setting log_autovacuum_min_duration=0 in the postgresql.conf, restarting 
or signalling (pg_ctl reload) the server, and watching just what it's 
doing.  You might need to reduce how aggressively that runs, or limit 
the higher target to only the tables that need it, to get this under 
control.  You're really pushing what you can do in a VM with this many 
databases of this size.
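
For example, something along these lines (a sketch--the table and column 
names are hypothetical):

-- in postgresql.conf, to log every autovacuum run, then reload:
--   log_autovacuum_min_duration = 0
-- keep the high target only on the columns that really need it:
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
-- then default_statistics_target can drop back down in postgresql.conf,
-- and the statistics get rebuilt with:
ANALYZE orders;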


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Explain buffers display units.

2010-02-15 Thread Greg Smith

Greg Stark wrote:

We do *not* display raw block numbers anywhere else. Generally I think
we should have a policy of outputting human-readable standard units of
memory whenever displaying a memory quantity. Actually I thought we
already had that policy, hence things like...
  


The first counterexample I thought of is log_checkpoints, which looks 
like this:


LOG: checkpoint complete: wrote 133795 buffers (25.5%); 0 transaction 
log file(s) added, 0 removed, 98 recycled; write=112.281 s, sync=108.809 
s, total=221.166 s



 Probably the XML schema should include the units as an attribute for
each tag so tools don't have to hard-code knowledge about what unit
each tag is in.
  


I don't know if it's practical at this point, but it might be helpful 
for the truly machine-targeted output formats to specifically include 
BLCKSZ somewhere in their header--just so there's a universal way to 
interpret the output even if the user built with a non-default value.
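
There is at least already a way for tools to check the compiled-in value 
at run time today:

SHOW block_size;  -- reports BLCKSZ; 8192 unless built with another value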


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us



Re: [HACKERS] psycopg2 license changed

2010-02-15 Thread Greg Smith

Federico Di Gregorio wrote:

I just wanted all interested people to know that psycopg2 2.0.14, to be
released in the next few days, will be under the LGPL3 + OpenSSL
exception (example code and tests under the LGPL3 alone because they are
never linked to OpenSSL).
  


Great news and I look forward to the release.  One small thing to 
consider:  having more than one license can turn into a cost to users of 
your software who are required to have each license reviewed for legal 
issues, and I'd think that maintaining two has some cost for you too.  
If it's possible for you to fold all these into a single license, that 
would really be a lot nicer.  Being able to say "psycopg2 is LGPL3 + 
OpenSSL exception", period, is much easier for people to deal with than 
having two licenses and needing to include the description you gave 
above for explanation.  Having to educate a lawyer on how linking works, 
so they understand the subtle distinction for why the two licenses 
exist, is no fun at all.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Explain buffers display units.

2010-02-15 Thread Greg Smith

Greg Stark wrote:

We can always continue to tweak the details of the format such as adding
spaces before the units to make it similar to the pg_size_pretty().
I'm not sure I like the idea of making it exactly equivalent because
pg_size_pretty() doesn't print any decimals so it's pretty imprecise
for smaller values.
  


That's a reasonable position; I'd be fine with upgrading the 
requirements for a text scraping app to handle either "8 kB" or "1.356 
kB" if it wanted to share some code to consume either type of info, if 
all you did was throw a space in there.  I'd suggest either removing the 
PB units support from your implementation, or adding it to 
pg_size_pretty, just to keep those two routines more like one another in 
terms of what they might produce as output given the same scale of input.
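
To illustrate the rounding difference (values picked arbitrarily; the 
output shown is from memory, so treat it as approximate):

SELECT pg_size_pretty(1356::bigint);   -- "1356 bytes"
SELECT pg_size_pretty(13560::bigint);  -- "13 kB", rounded, no decimals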


Also, a quick comment in the new code explaining what you just said 
above might be helpful, just to preempt a similar "how is this different 
from pg_size_pretty?" question from popping up again one day.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FWD] About "Our CLUSTER implementation is pessimal" patch

2010-02-15 Thread Greg Smith

Leonardo F wrote:

Could at least the message:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php
be added to the TODO page, under 
"Improve CLUSTER performance by sorting to reduce random I/O"?
It would be sad if the patch got lost...


You should read http://wiki.postgresql.org/wiki/Submitting_a_Patch and 
follow the instructions there.  As outlined in the "Submission timing" 
section, you're asking about this at the wrong time in the development 
cycle--that's why you're not getting any real feedback.  Add your patch 
to the next CommitFest by linking to your message at 
https://commitfest.postgresql.org/ and it won't get lost--it will get 
assigned a reviewer when the time comes to look at completely new 
patches again.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Explain buffers display units.

2010-02-14 Thread Greg Smith

Greg Stark wrote:

b) Used units of memory -- I formatted them with 3 significant digits
(unless the unit is bytes or kB where that would be silly). It's just
what looked best to my eye.
  


How does this compare with what comes out of pg_size_pretty 
(src/backend/utils/adt/dbsize.c)? I already have code floating around 
that parses the output from pg_size_pretty when I'm slurping in things 
from PostgreSQL, and it's not immediately obvious to me what having a 
format that's similar to but not quite the same as that one is buying here.
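
This is the sort of call whose output that scraping code has to consume:

SELECT pg_size_pretty(pg_database_size(current_database()));
-- returns something like "6517 MB"--no decimals, unit picked by magnitude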


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Documentation build issues on Debian/Ubuntu

2010-02-13 Thread Greg Smith
I can't seem to build the PDF version of the documentation on any of my 
Ubuntu 9.04 systems, and wonder if there's anything that can/should 
get done about it.


The problem happens like this:

gsm...@gsmith-desktop:~/pgwork/doc/src/sgml$ make postgres-US.pdf
openjade  -D . -D . -c 
/usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d 
./stylesheet.dsl -t tex -V tex-backend -i output-print -i include-index 
-V texpdf-output -V '%paper-type%'=USletter -o postgres-US.tex-pdf 
postgres.sgml
openjade:./stylesheet.dsl:621:2:E: flow object not accepted by port; 
only display flow objects accepted

make: *** [postgres-US.tex-pdf] Segmentation fault
make: *** Deleting file `postgres-US.tex-pdf'

These "flow object not accepted by port" errors showing up and then 
causing a crash go back a long ways:  
http://archives.postgresql.org/pgsql-docs/2003-12/msg00024.php


There used to be a warning in the docs about not building with openjade 
1.4, which is certainly what I've got here:


$ openjade --version
openjade:I: "OpenJade" version "1.4devel"
openjade:I: "OpenSP" version "1.5.2"

The problems here don't seem limited to this project; I see a similar 
report of the same style of crash against some Linux kernel docbooks at 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=521148


That wanders toward suggesting it's a 32/64 bit issue, but that's not 
true here--I get the same crash on both architectures.


Any suggestions?

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

