Re: [HACKERS] [PATCH] PL/Python: Add spidata to all spiexceptions

2013-01-27 Thread Heikki Linnakangas

On 09.01.2013 21:29, Karl O. Pinc wrote:

On 01/09/2013 01:08:39 PM, Jan Urbański wrote:

I actually still prefer to keep the signatures of
PLy_get_spi_sqlerrcode
and PLy_get_spi_error_data similar, so I'd opt for keeping the patch
as-is :)


I will send it on to a committer.


Looks good to me. Committed, thanks!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay

2013-01-27 Thread Peter Geoghegan
On 28 January 2013 03:34, Noah Misch  wrote:
> Would you commit to the same git repository the pgbench-tools data for the
> graphs appearing in that blog post?  I couldn't readily tell what was
> happening below 16 clients due to the graphed data points blending together.

I'm afraid that I no longer have that data. Of course, I could fairly
easily recreate it, but I don't think I'll have time tomorrow. Is it
important? Are you interested in both the insert and tpc-b cases?

-- 
Regards,
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: pgbench - aggregation of info written into log

2013-01-27 Thread Tatsuo Ishii
>> On Thu, Jan 17, 2013 at 7:43 PM, Tatsuo Ishii  wrote:
>>> So if my understating is correct, 1)Tomas Vondra commits to work on
>>> Windows support for 9.4, 2)on the assumption that one of Andrew
>>> Dunstan, Dave Page or Magnus Hagander will help him in Windows
>>> development.
>>>
>>> Ok? If so, I can commit the patch for 9.3 without Windows support. If
>>> not, I will move the patch to next CF (for 9.4).
>>>
>>> Please correct me if I am wrong.
>> 
>> +1 for this approach.  I agree with Dave and Magnus that we don't want
>> Windows to become a second-class platform, but this patch isn't making
>> it so.  The #ifdef that peeks inside of an instr_time is already
>> there, and it's not Tomas's fault that nobody has gotten around to
>> fixing it before now.
> 
> Right.
> 
>> OTOH, I think that this sort of thing is quite wrong:
>> 
>> +#ifndef WIN32
>> +   "  --aggregate-interval NUM\n"
>> +   "   aggregate data over NUM seconds\n"
>> +#endif
>> 
>> The right approach if this can't be supported on Windows is to still
>> display the option in the --help output, and to display an error
>> message if the user tries to use it, saying that it is not currently
>> supported on Windows.  That fact should also be mentioned in the
>> documentation.
> 
> Agreed. This seems to be a much better approach.
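
For illustration, the behaviour suggested above might look roughly like the
following hypothetical sketch (the function name is invented; this is not the
attached patch):

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of the suggested behaviour: the option is always
 * advertised in --help, but using it on Windows fails with an explicit
 * error instead of the option silently not existing.
 */
static int
parse_aggregate_interval(const char *arg)
{
#ifdef WIN32
	fprintf(stderr, "--aggregate-interval is not currently supported on Windows\n");
	exit(1);
#else
	int			interval = atoi(arg);

	if (interval <= 0)
	{
		fprintf(stderr, "invalid aggregation interval: \"%s\"\n", arg);
		exit(1);
	}
	return interval;
#endif
}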

Here is the new patch.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 3ca120f..e8867a6 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -153,6 +153,7 @@ char	   *index_tablespace = NULL;
 
 bool		use_log;			/* log transaction latencies to a file */
 bool		use_quiet;			/* quiet logging onto stderr */
+int			agg_interval;		/* log aggregates instead of individual transactions */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -248,6 +249,18 @@ typedef struct
 	char	   *argv[MAX_ARGS]; /* command word list */
 } Command;
 
+typedef struct
+{
+
+	long	start_time;			/* when does the interval start */
+	int 	cnt;/* number of transactions */
+	double	min_duration;		/* min/max durations */
+	double	max_duration;
+	double	sum;/* sum(duration), sum(duration^2) - for estimates */
+	double	sum2;
+	
+} AggVals;
+
 static Command **sql_files[MAX_FILES];	/* SQL script files */
 static int	num_files;			/* number of script files */
 static int	num_commands = 0;	/* total number of Command structs */
@@ -381,6 +394,8 @@ usage(void)
 		   "  -l   write transaction times to log file\n"
 		   "  --sampling-rate NUM\n"
 		   "   fraction of transactions to log (e.g. 0.01 for 1%% sample)\n"
+		   "  --aggregate-interval NUM\n"
+		   "   aggregate data over NUM seconds\n"
 		   "  -M simple|extended|prepared\n"
 		   "   protocol for submitting queries to server (default: simple)\n"
 		   "  -n   do not run VACUUM before tests\n"
@@ -834,9 +849,25 @@ clientDone(CState *st, bool ok)
 	return false;/* always false */
 }
 
+static
+void agg_vals_init(AggVals * aggs, instr_time start)
+{
+	/* basic counters */
+	aggs->cnt = 0;		/* number of transactions */
+	aggs->sum = 0;		/* SUM(duration) */
+	aggs->sum2 = 0;		/* SUM(duration*duration) */
+
+	/* min and max transaction duration */
+	aggs->min_duration = 0;
+	aggs->max_duration = 0;
+
+	/* start of the current interval */
+	aggs->start_time = INSTR_TIME_GET_DOUBLE(start);
+}
+
 /* return false iff client should be disconnected */
 static bool
-doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile)
+doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVals * agg)
 {
 	PGresult   *res;
 	Command   **commands;
@@ -901,22 +932,74 @@ top:
 			if (sample_rate == 0.0 ||
 				pg_erand48(thread->random_state) <= sample_rate)
 			{
-
 				INSTR_TIME_SET_CURRENT(now);
 				diff = now;
 				INSTR_TIME_SUBTRACT(diff, st->txn_begin);
 				usec = (double) INSTR_TIME_GET_MICROSEC(diff);
 
+/* should we aggregate the results or not? */
+if (agg_interval > 0)
+{
+	/* are we still in the same interval? if yes, accumulate the
+	* values (print them otherwise) */
+	if (agg->start_time + agg_interval >= INSTR_TIME_GET_DOUBLE(now))
+	{
+		agg->cnt += 1;
+		agg->sum  += usec;
+		agg->sum2 += usec * usec;
+
+		/* first in this aggregation interval */
+		if ((agg->cnt == 1) || (usec < agg->min_duration))
+			agg->min_duration =  usec;
+
+		if ((agg->cnt == 1) || (usec > agg->max_duration))
+			agg->max_duration = usec;
+	}
+	else
+	{
+		/* Loop until we reach the interval of the current transaction (and
+		 * print all the empty intervals in between). */
+		while (agg->start_time + agg_interval 
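
For readers following the aggregation logic above, here is a minimal
hypothetical sketch of how the AggVals accumulators (cnt, sum, sum2, min, max)
can be reduced to per-interval statistics when an interval closes; the helper
name is invented and this is not part of the attached patch.

#include <stdio.h>
#include <math.h>

/*
 * Hypothetical helper: derive per-interval statistics from AggVals-style
 * accumulators (transaction count, sum of durations, sum of squared
 * durations, and the interval's min/max durations).
 */
static void
print_interval(long start_time, int cnt, double sum, double sum2,
			   double min_duration, double max_duration)
{
	double		mean = (cnt > 0) ? sum / cnt : 0.0;
	/* population variance from the running sums: E[X^2] - E[X]^2 */
	double		variance = (cnt > 0) ? (sum2 / cnt) - (mean * mean) : 0.0;
	double		stddev = (variance > 0.0) ? sqrt(variance) : 0.0;

	printf("%ld %d %.0f %.0f %.3f %.3f %.3f %.3f\n",
		   start_time, cnt, sum, sum2,
		   min_duration, max_duration, mean, stddev);
}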

Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay

2013-01-27 Thread Peter Geoghegan
On 28 January 2013 03:34, Noah Misch  wrote:
> On the EBS configuration with volatile fsync timings, the variability didn't
> go away with 15s runs.  On systems with stable fsync times, 15s was no better
> than 2s.  Absent some particular reason to believe 5s is better than 2s, I
> would leave it alone.

I'm not recommending doing so because I thought you'd be likely to get
better numbers on EBS; obviously the variability you saw there likely
had a lot to do with the fact that the underlying physical machines
have multiple tenants. It has just been my observation that more
consistent figures can be obtained (on my laptop) by using a
pg_test_fsync --secs-per-test of about 5. That being the case, why
take the chance with 2 seconds? It isn't as if people run
pg_test_fsync every day, or that they cannot set --secs-per-test to
whatever they like themselves. On the other hand, the cost of setting
it too low could be quite high now, because the absolute values (and
not just how different wal_sync_methods compare) are now important.

-- 
Regards,
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] allowing privileges on untrusted languages

2013-01-27 Thread Craig Ringer
On 01/28/2013 02:15 AM, Robert Haas wrote:
>
> I am not sure whether it's really true that a capability mechanism
> could "never really satisfy" anyone.  It worked for Linux.
I have no concern about using a capabilities approach for this, but I
don't think Linux is a great example here. Linux's capabilities have
been defined in a somewhat ad-hoc fashion and a huge amount of stuff is
bundled into CAP_SYS_ADMIN. Several capabilities provide escalation
routes to root / CAP_SYS_ADMIN. See:

https://lwn.net/Articles/486306/
http://dl.packetstormsecurity.net/papers/attack/exploiting_capabilities_the_dark_side.pdf

There's nothing wrong with capability systems; it's just clear that they
need to be designed, documented and maintained carefully. Adding ad-hoc
capabilities is exactly the wrong route to take, and will lead us into
the same mess Linux is in now.
> But, I think event triggers are a credible answer, too, and they
> certainly are more flexible.
Yes,  but with the caveat that leaving security design to user triggers
will provide users with more opportunities for error - failure to think
about schemas and search_path, testing role membership via some
hacked-together queries instead of the built-in system information
functions, failure to consider SECURITY DEFINER and the effect of
session_user vs current_user, etc. Some docs on writing security
triggers and some standard triggers in an extension module would go a
long way to mitigating that, though. The appeal of the trigger based
approach is that it means core doesn't land up needing
CAP_CAN_EXECUTE_PLPERLU_ON_TUESDAYS_AFTER_MIDDAY_ON_A_FULL_MOON_IN_A_LEAPYEAR.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-27 Thread Steve Singer

On 13-01-24 11:15 AM, Steve Singer wrote:

On 13-01-24 06:40 AM, Andres Freund wrote:


Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?


I was able to get something that generated output for INSERT 
statements in a format similar to what a modified slony apply trigger 
would want.  This was with the list of tables to replicate hard-coded 
in the plugin.  This was with the patchset from the last commitfest.  I 
had gotten a bit hung up on the UPDATE and DELETE support because 
slony allows you to use an arbitrary user specified unique index as 
your key.  It looks like better support for tables with a unique 
non-primary key is in the most recent patch set.  I am hoping to have 
time this weekend to update my plugin to use parameters passed in on 
the init and other updates in the most recent version.  If I make some 
progress I will post a link to my progress at the end of the weekend.  
My big issue is that I have limited time to spend on this.






A few more comments:

In decode.c DecodeDelete

+   if (r->xl_len <= (SizeOfHeapDelete + SizeOfHeapHeader))
+   {
+   elog(DEBUG2, "huh, no primary key for a delete on wal_level = logical?");
+   return;
+   }
+

I think we should be passing deletes with candidate key data logged to 
the plugin.  If the table isn't a replicated table then ignoring the 
delete is fine.  If the table is a replicated table but someone has 
deleted the unique index from the table then the plugin will receive 
INSERT changes on the table but not DELETE changes. If this happens the 
plugin wouldn't have any way of knowing that it is missing delete changes.  
If my plugin gets passed a DELETE change record but with no key data 
then my plugin could do any of

1.  Start screaming for help (ie log errors)
2.  Drop the table from replication
3.  Pass the delete (with no key values) onto the replication client and 
let it deal with it (see 1 and 2)


Also, 'huh' isn't one of our standard log message phrases :)


How do you plan on dealing with sequences?
I don't see my plugin being called on sequence changes and I don't see 
XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer.  Is there a reason 
why this can't be easily added?


Also, what do we want to do about TRUNCATE support?  I could always leave 
a TRUNCATE trigger in place that logged the truncate to an sl_truncates table 
and have my replication daemon respond to the insert on the sl_truncates 
table by actually truncating the data on the replica.


I've spent some time this weekend updating my prototype plugin that 
generates slony 2.2 style COPY output.  I have attached my progress here 
(also https://github.com/ssinger/slony1-engine/tree/logical_repl).  I 
have not gotten as far as modifying slon to act as a logical log 
receiver, or made a version of the slony apply trigger that would 
process these changes.  I haven't looked into the details of what is 
involved in setting up a subscription with the snapshot exporting.


I couldn't get the options on the START REPLICATION command to parse so 
I just hard coded some list building code in the init method. I do plan 
on passing the list of tables to replicate from the replica to the plugin 
(because this list comes from the replica).   Passing what could be a 
few thousand table names as a list of arguments is a bit ugly and I 
admit my list processing code is rough.  Does this make us want to 
reconsider the format of the option_list ?


I guess I should provide an opinion on whether I think that the patch in this 
CF, if committed could be used to act as a source for slony instead of 
the log trigger.



The biggest missing piece, as I mentioned in my email yesterday, is that we 
aren't logging the old primary key on row UPDATEs.  I don't see building 
a credible replication system where you don't allow users to update any 
column of a row.


The other issues I've raised (DecodeDelete hiding bad deletes, 
replication options not parsing for me) look like easy fixes.


The lack of WAL decoding support for sequences or TRUNCATE is something I 
could work around by doing things much like slony does today.  The SYNC 
can still capture the sequence changes in  a table (where the INSERT's 
would be logged) and I can have a trigger capture truncates.


I mostly did this review from the point of view of someone trying to use 
the feature, I haven't done a line-by-line review of the code.


I suspect Andres can address these issues and get an updated patch out 
during this CF.   I think a more detailed code review by someone more 
familiar with postgres internals will reveal a handful of other issues 
that hopefully can be fixed without a lot of effort. If this were the 
only patch in the commitfest I would encourage Andres to push to get 
these changes done.  If the standard for CF4 is that a patch needs to be 
basically in a commitable stat

Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay

2013-01-27 Thread Noah Misch
On Mon, Jan 28, 2013 at 12:16:24AM +, Peter Geoghegan wrote:
> On 27 January 2013 02:31, Noah Misch  wrote:
> > I did a few more benchmarks along the spectrum.
> 
> > So that's a nice 27-53% improvement, fairly similar to the pattern for your
> > laptop pgbench numbers.
> 
> I presume that this applies to a tpc-b benchmark (the pgbench
> default). Note that the really compelling numbers that I reported in
> that blog post (where there is an increase of over 80% in transaction
> throughput at lower client counts) occur with an insert-based
> benchmark (i.e. a maximally commit-bound workload).

Correct.  The pgbench default workload is already rather friendly toward
commit_delay, so I wanted to stay away from even-friendlier tests.

Would you commit to the same git repository the pgbench-tools data for the
graphs appearing in that blog post?  I couldn't readily tell what was
happening below 16 clients due to the graphed data points blending together.

> >> !   
> >> !Since the purpose of commit_delay is to allow
> >> !the cost of each flush operation to be more effectively amortized
> >> !across concurrently committing transactions (potentially at the
> >> !expense of transaction latency), it is necessary to quantify that
> >> !cost when altering the setting.  The higher that cost is, the more
> >> !effective commit_delay is expected to be in
> >> !increasing transaction throughput.  The
> >
> > That's true for spinning disks, but I suspect it does not hold for storage
> > with internal parallelism, notably virtualized storage.  Consider an iSCSI
> > configuration with high bandwidth and high latency.  When network latency is
> > the limiting factor, will sending larger requests less often still help?
> 
> Well, I don't like to speculate about things like that, because it's
> just too easy to be wrong. That said, it doesn't immediately occur to
> me why the statement that you've highlighted wouldn't be true of
> virtualised storage that has the characteristics you describe. Any
> kind of latency at flush time means that clients idle, which means
> that the CPU is potentially not kept fully busy for a greater amount
> of wall time, where it might otherwise be kept more busy.

On further reflection, I retract the comment.  Regardless of internal
parallelism of the storage, PostgreSQL issues WAL fsyncs serially.

> > One would be foolish to run a performance-sensitive workload like those in
> > question, including the choice to have synchronous_commit=on, on spinning
> > disks with no battery-backed write cache.  A cloud environment is more
> > credible, but my benchmark showed no gain there.
> 
> In an everyday sense you are correct. It would typically be fairly
> senseless to run an application that was severely limited by
> transaction throughput like this, when a battery-backed cache could be
> used at the cost of a couple of hundred dollars. However, it's quite
> possible to imagine a scenario in which the economics favoured using
> commit_delay instead. For example, I am aware that at Facebook, a
> similar Facebook-flavoured-MySQL setting (sync_binlog_timeout_usecs)
> is used. Furthermore, it might not be obvious that fsync speed is an
> issue in practice. Setting commit_delay to 4,000 has seemingly no
> downside on my laptop - it *positively* affects both average and
> worst-case transaction latency - so with spinning disks, it probably
> would actually be sensible to set it and forget it, regardless of
> workload.

I agree that commit_delay is looking like a safe bet for spinning disks.

> I attach a revision that I think addresses your concerns. I've
> polished it a bit further too - in particular, my elaborations about
> commit_delay have been concentrated at the end of wal.sgml, where they
> belong. I've also removed the reference to XLogInsert, because, since
> all XLogFlush call sites are now covered by commit_delay, XLogInsert
> isn't particularly relevant.

I'm happy with this formulation.

> I have also increased the default time that pg_test_fsync runs - I
> think that the kind of variability commonly seen in its output, that
> you yourself have reported, justifies doing so in passing.

On the EBS configuration with volatile fsync timings, the variability didn't
go away with 15s runs.  On systems with stable fsync times, 15s was no better
than 2s.  Absent some particular reason to believe 5s is better than 2s, I
would leave it alone.

I'm marking this patch Ready for Committer, qualified with a recommendation to
adopt only the wal.sgml changes.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay

2013-01-27 Thread Peter Geoghegan
Hi Noah,

On 27 January 2013 02:31, Noah Misch  wrote:
> I did a few more benchmarks along the spectrum.

> So that's a nice 27-53% improvement, fairly similar to the pattern for your
> laptop pgbench numbers.

I presume that this applies to a tpc-b benchmark (the pgbench
default). Note that the really compelling numbers that I reported in
that blog post (where there is an increase of over 80% in transaction
throughput at lower client counts) occur with an insert-based
benchmark (i.e. a maximally commit-bound workload).

> Next, based on your comment about the possible value
> for cloud-hosted applications

> -clients-   -tps@commit_delay=0--tps@commit_delay=500-
> 32  1224,1391,1584  1175,1229,1394
> 64  1553,1647,1673  1544,1546,1632
> 128 1717,1833,1900  1621,1720,1951
> 256 1664,1717,1918  1734,1832,1918
>
> The numbers are all over the place, but there's more loss than gain.

I suspected that the latency of cloud storage might be relatively
poor. Since that is evidently not actually the case with Amazon EBS,
it makes sense that commit_delay isn't compelling there. I am not
disputing whether or not Amazon EBS should be considered
representative of such systems in general - I'm sure that it should
be.

> There was no appreciable
> performance advantage from setting commit_delay=0 as opposed to relying on
> commit_siblings to suppress the delay.  That's good news.

Thank you for doing that research; I had investigated myself that the fast
path in MinimumActiveBackends() works well, but it's useful to have my
findings verified.

> On the GNU/Linux VM, pg_sleep() achieves precision on the order of 10us.
> However, the sleep was consistently around 70us longer than requested.  A
> 300us request yielded a 370us sleep, and a 3000us request gave a 3080us sleep.
> Mac OS X was similarly precise for short sleeps, but it could oversleep a full
> 1000us on a 35000us sleep.

Ugh.

> The beginning of this paragraph stills says "commit_delay causes a delay just
> before a synchronous commit attempts to flush WAL to disk".  Since it now
> applies to every WAL flush, that should be updated.

Agreed.

> There's a similar problem at the beginning of this paragraph; it says
> specifically, "The commit_delay parameter defines for how many microseconds
> the server process will sleep after writing a commit record to the log with
> LogInsert but before performing a LogFlush."

Right.

> As a side note, if we're ever going to recommend a fire-and-forget method for
> setting commit_delay, it may be worth detecting whether the host sleep
> granularity is limited like this.  Setting commit_delay = 20 for your SSD and
> silently getting commit_delay = 1 would make for an unpleasant surprise.

Yes, it would. Note on possible oversleeping added.

>> !   
>> !Since the purpose of commit_delay is to allow
>> !the cost of each flush operation to be more effectively amortized
>> !across concurrently committing transactions (potentially at the
>> !expense of transaction latency), it is necessary to quantify that
>> !cost when altering the setting.  The higher that cost is, the more
>> !effective commit_delay is expected to be in
>> !increasing transaction throughput.  The
>
> That's true for spinning disks, but I suspect it does not hold for storage
> with internal parallelism, notably virtualized storage.  Consider an iSCSI
> configuration with high bandwidth and high latency.  When network latency is
> the limiting factor, will sending larger requests less often still help?

Well, I don't like to speculate about things like that, because it's
just too easy to be wrong. That said, it doesn't immediately occur to
me why the statement that you've highlighted wouldn't be true of
virtualised storage that has the characteristics you describe. Any
kind of latency at flush time means that clients idle, which means
that the CPU is potentially not kept fully busy for a greater amount
of wall time, where it might otherwise be kept more busy.

> One would be foolish to run a performance-sensitive workload like those in
> question, including the choice to have synchronous_commit=on, on spinning
> disks with no battery-backed write cache.  A cloud environment is more
> credible, but my benchmark showed no gain there.

In an everyday sense you are correct. It would typically be fairly
senseless to run an application that was severely limited by
transaction throughput like this, when a battery-backed cache could be
used at the cost of a couple of hundred dollars. However, it's quite
possible to imagine a scenario in which the economics favoured using
commit_delay instead. For example, I am aware that at Facebook, a
similar Facebook-flavoured-MySQL setting (sync_binlog_timeout_usecs)
is used. Furthermore, it might not be obvious that fsync speed is an
issue in practice. Setting commit_delay to 4,000 has seemingly no
downside on my laptop - it *positively* af

Re: [HACKERS] Visual Studio 2012 RC

2013-01-27 Thread Andrew Dunstan


On 01/27/2013 06:51 PM, james wrote:
On the contrary, only a few months ago there was a far from 
groundless fear that Microsoft would do just that. Following 
considerable outcry they changed their mind. But this is definitely 
not just paranoia. As for w64 support, the mingw-64 project exists 
more or less explicitly to produce 64 bit compilers, including those 
hosted on mingw/msys.



Huh.  The only reason we have to use mingw64 or one of the assorted 
personal builds is because 'mingw support' doesn't deliver on its own, 
and last I looked there was a confusing variety of personal builds 
with various strengths and weaknesses. I managed to make some progress 
but we seem to be a ways off having a reference download (and ideally 
one with clang too I guess).


I'd very much like there to be a good reference implementation, but 
the whole mingw/mingw64 thing is indicative of some problems, and 
reminds me of egcs.


You have references to back up your statements, and demonstrate that 
it wasn't primarily FUD?  FWIW I think the higher entry prices of 
pay-for VStudio almost guarantees continued availability of a free 
compiler, though it might end up slightly crippled, but I'm not a 
product planner for MS any more than you are.



I blogged about it at the time: 
 
and here when they relented a month later: 
. 
That was based on sources I considered credible at the time. But 
frankly, I'm not going to spend a lot of time digging up old references. 
If you think it's FUD then feel free, but I was and am far from alone in 
thinking it wasn't. Given the work I've put in over the years in 
supporting PostgreSQL on Windows I don't think any suggestion I have a 
bias against Microsoft has much credibility.


As for mingw64, I still use this one: 
. 
The reason there isn't a later more official build is that they have 
screwed up their automated build process (which I also blogged about ;-) 
). I'm going to 
try to get to the bottom of that when I get a chance. But this compiler 
works perfectly well AFAICT. It's been running on one of my buildfarm 
members quite happily for quite a while.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visual Studio 2012 RC

2013-01-27 Thread james

On the contrary, only a few months ago there was a far from groundless fear 
that Microsoft would do just that. Following considerable outcry they changed 
their mind. But this is definitely not just paranoia. As for w64 support, the 
mingw-64 project exists more or less explicitly to produce 64 bit compilers, 
including those hosted on mingw/msys.



Huh.  The only reason we have to use mingw64 or one of the assorted 
personal builds is because 'mingw support' doesn't deliver on its own, 
and last I looked there was a confusing variety of personal builds with 
various strengths and weaknesses.  I managed to make some progress but 
we seem to be a ways off having a reference download (and ideally one 
with clang too I guess).


I'd very much like there to be a good reference implementation, but the 
whole mingw/mingw64 thing is indicative of some problems, and reminds me 
of egcs.


You have references to back up your statements, and demonstrate that it 
wasn't primarily FUD?  FWIW I think the higher entry prices of pay-for 
VStudio almost guarantees continued availability of a free compiler, 
though it might end up slightly crippled, but I'm not a product planner 
for MS any more than you are.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)

2013-01-27 Thread Dimitri Fontaine
Hi,

Please find attached a new version of the patch, addressing most of
your review points. I'll post another version shortly with support
for pg_dump and alter owner/rename.

The main priority was to confirm that the implementation is conforming
to the rough specs and design we agreed on before with Tom and Heikki, in
order to be able to adjust anything I would have misunderstood there.

Stephen Frost  writes:
> What's with removing the OBJECT_TABLESPACE case?  Given that

Merge artifact or fat fingers, something like that.

> ExtensionControlFile seemed like a good name, just changing that to
> "ExtensionControl" doesn't seem as nice, tho that's a bit of bike
> shedding, I suppose.

Yeah, well, the values in there now can be fetched from a catalog, so I
thought I should reflect that change somehow. Will revert to the previous
name if that's the consensus.

> I'm not sure we have a 'dile system'... :)
> Also pretty sure we only have one catalog
> ('get_ext_ver_list_from_catalogs')

Fixed.

> 'Template' seems like a really broad term which might end up being
> associated with things beyond extensions, yet there are a number of
> places where you just use 'TEMPLATE', eg, ACL_KIND_TEMPLATE.  Seems like
> it might become an issue later.

Fixed. Any other straight TEMPLATE reference would be another error missed
when cleaning up the patch, though my reading tonight didn't catch any.

> Also, no pg_dump/restore support..?  Seems like that'd be useful..

Yeah, that's the obvious next step. The design is that we absolutely
want to dump and restore those templates, that's the key here :)

> That's just a real quick run-through with my notes.  If this patch is
> really gonna go into 9.3, I'll try to take a deeper look.

Thanks for that!

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



templates.v1.patch.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-01-27 Thread Jeff Davis
On Sat, 2013-01-26 at 23:23 -0500, Robert Haas wrote:
> > If we were to try to defer writing the WAL until the page was being
> > written, the most it would possibly save is the small XLOG_HINT WAL
> > record; it would not save any FPIs.
> 
> How is the XLOG_HINT_WAL record kept small and why does it not itself
> require an FPI?

There's a maximum of one FPI per page per cycle, and we need the FPI for
any modified page in this design regardless.

So, deferring the XLOG_HINT WAL record doesn't change the total number
of FPIs emitted. The only savings would be on the trivial XLOG_HINT wal
record itself, because we might notice that it's not necessary in the
case where some other WAL action happened to the page.
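
To illustrate the rule being relied on here (a hypothetical sketch, not
PostgreSQL source): a full-page image is only needed the first time a page is
modified after a checkpoint, which is why at most one FPI per page is emitted
per cycle.

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* illustrative stand-in, not the real type */

/*
 * Hypothetical illustration of the "one FPI per page per checkpoint cycle"
 * rule: an FPI is needed only when the page has not been written to WAL
 * since the current checkpoint's redo pointer.
 */
static bool
needs_full_page_image(XLogRecPtr page_lsn, XLogRecPtr redo_ptr)
{
	return page_lsn <= redo_ptr;
}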

> > At first glance, it seems sound as long as the WAL FPI makes it to disk
> > before the data. But to meet that requirement, it seems like we'd need
> > to write an FPI and then immediately flush WAL before cleaning a page,
> > and that doesn't seem like a win. Do you (or Simon) see an opportunity
> > here that I'm missing?
> 
> I am not sure that isn't a win.  After all, we can need to flush WAL
> before flushing a buffer anyway, so this is just adding another case -

Right, but if we get the WAL record in earlier, there is a greater
chance that it goes out with some unrelated WAL flush, and we don't need
to flush the WAL to clean the buffer at all. Separating WAL insertions
from WAL flushes seems like a fairly important goal, so I'm a little
skeptical of a proposal to narrow that gap so drastically.

It's hard to analyze without a specific proposal on the table. But if
cleaning pages requires a WAL record followed immediately by a flush, it
seems like that would increase the number of actual WAL flushes we need
to do by a lot.

> and the payoff is that the initial access to a page, setting hint
> bits, is quickly followed by a write operation, we avoid the need for
> any extra WAL to cover the hint bit change.  I bet that's common,
> because if updating you'll usually need to look at the tuples on the
> page and decide whether they are visible to your scan before, say,
> updating one of them

That's a good point, I'm just not sure how avoid that problem without a
lot of complexity or a big cost. It seems like we want to defer the
XLOG_HINT WAL record for a short time; but not wait so long that we need
to clean the buffer or miss a chance to piggyback on another WAL flush.

> > By the way, the approach I took was to add the heap buffer to the WAL
> > chain of the XLOG_HEAP2_VISIBLE wal record when doing log_heap_visible.
> > It seemed simpler to understand than trying to add a bunch of options to
> > MarkBufferDirty.
> 
> Unless I am mistaken, that's going to heavy penalize the case where
> the user vacuums an insert-only table.  It will emit much more WAL
> than currently.

Yes, that's true, but I think that's pretty fundamental to this
checksums design (and of course it only applies if checksums are
enabled). We need to make sure an FPI is written and the LSN bumped
before we write a page.

That's why I was pushing a little on various proposals to either remove
or mitigate the impact of hint bits (load path, remove PD_ALL_VISIBLE,
cut down on the less-important hint bits, etc.). Maybe those aren't
viable, but that's why I spent time on them.

There are some other options, but I cringe a little bit thinking about
them. One is to simply exclude the PD_ALL_VISIBLE bit from the checksum
calculation, so that a torn page doesn't cause a problem (though
obviously that one bit would be vulnerable to corruption). Another is to
use a double-write buffer, but that didn't seem to go very far. Or, we
could abandon the whole thing and tell people to use ZFS/btrfs/NAS/SAN.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vacuuming template0

2013-01-27 Thread Tom Lane
Jeff Janes  writes:
> I thought that template0 did not need vacuuming because everything in
> it was frozen.  But it looks like it does need vacuuming, and no one
> but autovac can connect in order to do that vacuum.

Everything in it *should* be frozen, typically.  My recollection is that
we used to exclude it from autovacuuming, but decided to treat it the
same as every other database on the grounds that somebody might've
connected to it and modified something, then restored the normal
not-datallowconn marking.  (A valid reason for doing that would be to
apply some maintenance correction to the system catalogs, as we've
occasionally had to recommend in update release notes.)

> Is this a real problem?  Presumably no one systematically crashes
> their database shortly after start up on a production system; but that
> doesn't mean there are not other ways to get into the situation.

I can't get excited about this scenario.  Given that template0 should be
(a) small and (b) all-frozen already, it should not take a noticeable
amount of time for autovac to look through it once per freeze cycle.
If your database is under such stress that that can't get done, you've
got *serious* problems, probably much worse than whether template0
itself is getting processed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-27 Thread Simon Riggs
On 27 January 2013 17:11, Robert Haas  wrote:
> On Sun, Jan 27, 2013 at 4:17 AM, Simon Riggs  wrote:
>> On 25 January 2013 17:19, Robert Haas  wrote:
>>> We
>>> could easily run across a system where pg_class order happens to be
>>> better than anything else we come up with.
>>
>> I think you should read that back to yourself and see if you still
>> feel the word "easily" applies here.
>
> I absolutely do.

> You will not convince me that whacking around the
> behavior of autovacuum in a maintenance release is a remotely sane
> thing to do.

This is a different argument. It would be better to say this than to
come up with implausible problems as a way of rejecting something.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visual Studio 2012 RC

2013-01-27 Thread Andrew Dunstan


On 01/27/2013 02:48 PM, james wrote:

Anyway, this is getting way off track. The point is that the MS SDKs and
compilers are a bit of a mess and that MinGW support is useful because
we can't rely on them continuing to offer free SDKs and compilers in 
future.


Well, more compilers are always useful, but complaining that Microsoft 
might withdraw their working compilers smacks of 'what if?' paranoia. 
What if mingw support for Win64 was (sometimes/often/always/still) a 
bit rubbish?  Oh wait ...




On the contrary, only a few months ago there was a far from groundless 
fear that Microsoft would do just that. Following considerable outcry 
they changed their mind. But this is definitely not just paranoia. As 
for w64 support, the mingw-64 project exists more or less explicitly to 
produce 64 bit compilers, including those hosted on mingw/msys.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] vacuuming template0

2013-01-27 Thread Jeff Janes
I have a stress test of the WAL replay which panics the
database over and over again to make sure it recovers correctly.

This is in 9.3dev.

The test was eventually freezing up because of wraparound.  The
problem was that, on fast enough hardware, the intentional crashes
were always happening before autovac could do its thing.

So I added a  periodic "bin/vacuumdb -a" command in a place where
crashes are inhibited.

It was still freezing eventually with: database is not accepting
commands to avoid wraparound data loss in database "template0"

I thought that template0 did not need vacuuming because everything in
it was frozen.  But it looks like it does need vacuuming, and no one
but autovac can connect in order to do that vacuum.

Is this a real problem?  Presumably no one systematically crashes
their database shortly after start up on a production system; but that
doesn't mean there are not other ways to get into the situation.  (I
can't think of any of them--that is why I'm asking here.)

I guess if it has been like this forever then it must not be a problem
or it would have been noticed.  But if this need to vacuum template0
arose recently, it could be a problem.  (Doing git bisect on
over-night runs is no fun, so if someone happens to know off the top
of their head...)

So, is this a real problem or purely a fantastical one, and does
anyone know how old it would be?

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enhanced error fields

2013-01-27 Thread Tom Lane
Peter Geoghegan  writes:
> On 27 January 2013 18:57, Tom Lane  wrote:
>> However, this patch is not that, and mere documentation isn't going to buy a
>> thing here IMO.  Especially not user-facing documentation, as opposed
>> to something that might be in a developers' face when he's
>> copying-and-pasting code somewhere.  This patch didn't even touch the
>> one place in the documentation that might be somewhat useful from a
>> developer's standpoint, which is 49.2. Reporting Errors Within the
>> Server.

> Well, an entry should probably be added to 49.2 too, then. Why should
> documentation (of whatever kind deemed appropriate) not buy a thing?
> Don't we want to prevent the kind of problems that I describe above?
> How are people supposed to know about something that isn't written
> down anywhere? Surely documentation is better than nothing?

I don't think I said or implied that we should not have any
documentation about this.  What I'm trying to say is that I find this
approach to documentation unhelpful, both to users and developers.
It's based on the notion that there's a rigorous connection between
particular SQLSTATEs and the auxiliary fields that should be provided;
an assumption already proven false within the very tiny set of SQLSTATEs
dealt with in this first patch.  That's a connection that we probably
could have made valid if we'd been assigning SQLSTATEs with that idea
in mind from the beginning, but we didn't and now it's too late.  Future
development will almost surely expose even more inconsistencies, not
eliminate them.

I think we'd be better off providing docs that say "this is what this
auxiliary field means, if it's provided" and then encourage developers
to provide it wherever that meaning applies.  We can at the same time
note something like "As of Postgres 9.3, only errors in Class 23 provide
this information", so that users (a) don't have unrealistic expectations
about what's provided, and (b) don't get the idea that the current set
of auxiliary fields is fixed.  But a SQLSTATE-by-SQLSTATE listing
doesn't seem to me to be very helpful.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Event Triggers: adding information

2013-01-27 Thread Dimitri Fontaine
Robert Haas  writes:
>> Well I would think that "we don't support droping several objects at
>> once yet in 9.3" is an acceptable answer.
>
> On that point, I disagree.

Ok, will provide a patch for that soon, then. I did give higher priority
to exposing more information, ok to switch. Read some more for caveats.

> I don't really know how to respond to this.  The CommitFest time is
> supposed to be when you stop working on your own stuff and work on
> reviewing other people's stuff, and yet, both this year and last year,
> you seem to have no compunctions whatever about continuing to code
> right through the CommitFest, which to me misses the point.  The

You might realize that I'm not a committer, so the commit fest is when I get
some review and a chance to react quickly so that we get enough momentum
to reach a consensus and maybe commit something. Also, it seems to me
that I've been very careful to review as many patches as I could in all
commit fests where I had a patch involved.

Also, please consider that if I only allowed myself to write code in
between commit fests, I would still have zero patches in PostgreSQL.
None. Zilch.

> CommitFest, and especially the last one, is supposed to be the time to
> commit what is committable now, not to go write new code.  And it is
> not as if I didn't make these same design points last year around this
> time.

Well, a commit fest for me is the only time when we can reach a common
design, call it a consensus and have matching code to evaluate and maybe
commit. And it's not as if I didn't rewrite this patch series several
times already from scratch to follow your design reviews.

The points where I didn't rewrite the code to match your ideas are those
that I didn't give priority to, or for which I still think my proposal
is better than yours, and so we didn't get to a consensus yet. It seems
to me to be a pretty normal state of affairs, and again, if I am to
follow your advice and never come up with new code while a commit fest
is in progress, I'd better just stop trying to contribute to PostgreSQL.

>> So, do you want to see a patch implementing that to be able to provide
>> information about DROP commands targeting more than one object and same
>> information in other cases (object id, name, schema, operation name,
>> object type)? That could be the next patch of the series, by Thursday.
>
> I think this is approximately correct as a next logical step, but are
> we really clear on what the design for that looks like?  I thought you
> were saying upthread that you wanted to make this information
> available via some kind of pseudo-table for 9.4.  In any event, adding
> a new event type (sql_drop) is a separate patch, and adding
> information to the existing types of event triggers is a separate
> patch, so you should try to do one of those two, not both.

That's exactly what I've been saying from the beginning, arguing
that the first step should be providing enough information to user
functions called by event triggers.

If I'm to provide a solution for the multiple object drops without any
specific information associated with firing a trigger, I don't much see
the point of the exercise.

So in my view, either we have specific information and a proper DROP
CASCADE support, or only specific information without support for
multiple DROP targets, or both of them.

Support for CASCADE with no information about the objects being DROPed?
Why would we want that?

> Doing some of it one way and some of it the other way seems kind of
> grotty, though.

It's only following existing design:

  
http://www.postgresql.org/docs/9.2/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

  39.6.6.1. Obtaining information about an error

  There are two ways to get information about the current exception in
  PL/pgSQL: special variables and the GET STACKED DIAGNOSTICS command.

And I think that having special variables for the most common cases (ID,
name, schema) and a more complex API (later) for more complex cases only
makes sense. And in 9.3, for the most complex cases, you will need to
code your Event Trigger in C and walk the parse tree.

What's wrong with that, in details?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-27 Thread Jeff Janes
On Thu, Jan 24, 2013 at 1:57 PM, Alvaro Herrera
 wrote:
> Hi,
>
> I have a bug pending that autovacuum fails to give priority to
> for-wraparound tables.  When xid consumption rate is high and dead tuple
> creation is also high, it is possible that some tables are waiting for
> for-wraparound vacuums that don't complete in time because the workers
> are busy processing other tables that have accumulated dead tuples; the
> system is then down because it's too near the Xid wraparound horizon.
> Apparently this is particularly notorious in connection with TOAST
> tables, because those are always put in the tables-to-process list after
> regular tables.

Is something killing off your autovacuum workers routinely, such that
they rarely reach the end of their to-do list?

Otherwise it seems like the tables would come up for vacuuming in a
cyclic fashion, staggered for each worker; and it being a circle it
shouldn't systematically matter where in it they were added.

What are the various settings for vacuuming?

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enhanced error fields

2013-01-27 Thread Tom Lane
Peter Geoghegan  writes:
> On 26 January 2013 22:36, Tom Lane  wrote:
>> BTW, one thing that struck me in a quick look-through is that the
>> ERRCODE_FOREIGN_KEY_VIOLATION patches seem to inconsistently send
>> either the PK or FK rel as the "errtable".  Is this really per spec?
>> I'd have sort of expected that the reported table ought to be the one
>> that the constraint belongs to, namely the FK table.

> Personally, on the face of it I'd expect the "inconsistency" to simply
> reflect the fact that the error related to the referencing table or
> referenced table.

I looked in the spec a bit, and what I found seems to support my
recollection about this.  In SQL99, it's 19.1  that defines the usage of these fields, and I see

f) If the value of RETURNED_SQLSTATE corresponds to integrity
  constraint violation, transaction rollback - integrity
  constraint violation, or a triggered data change violation
  that was caused by a violation of a referential constraint,
  then:

  i) The values of CONSTRAINT_CATALOG and CONSTRAINT_SCHEMA are
 the  and the  of the
  of the schema containing the constraint or
 assertion. The value of CONSTRAINT_NAME is the  of the constraint or assertion.

 ii) Case:

 1) If the violated integrity constraint is a table
   constraint, then the values of CATALOG_NAME, SCHEMA_NAME,
   and TABLE_NAME are the , the
of the , and
   the  or ,
   respectively, of the table in which the table constraint
   is contained.

The notion of a constraint being "contained" in a table is a bit weird;
I guess they mean contained in the table's schema description.  Anyway
it seems fairly clear to me that it's supposed to be the table that the
constraint belongs to, and that has to be the FK table not the PK table.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enhanced error fields

2013-01-27 Thread Peter Geoghegan
On 27 January 2013 18:57, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> I think we may be talking at cross purposes here. Guarantee may have
>> been too strong a word, or the wrong word entirely. All that I really
>> want here is for there to be a coding standard instituted, so that in
>> future client code will not be broken by a failure to include some
>> field in a new ereport site that related to what is effectively the
>> same error as an existing ereport site. I'm sure you'll agree that
>> subtly breaking client code when refactoring Postgres is unacceptable.
>
> [ shrug... ] If you have a way of making a guarantee that future
> versions introduce no new bugs, patent it and you'll soon have all the
> money in the world.

Is that kind of sarcasm really necessary?

Certain sets of ereport call sites within Postgres relate to either
very similar errors (i.e. ereports that have the same errcode) or
arguably identical errors (e.g. the various ERRCODE_CHECK_VIOLATION
sites within nbtinsert.c, that have identical error texts). It would
seem quite unfortunate to me if client code were to break based only on
an internal implementation detail that differed between Postgres
versions, or based on the current phase of the moon. This is the kind
of problem that I'd hoped to prevent by documenting a set of required
fields for a small number of errcodes going forward.

Now, you could take the view that all of this is only for the purposes
of error handling, and it isn't terribly critical that things work
very reliably. That isn't my view, though.
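
As a concrete illustration of the kind of client code at stake, here is a
minimal sketch that keys its behaviour off the auxiliary fields; it assumes
the field codes added by this patch series (such as PG_DIAG_SCHEMA_NAME,
PG_DIAG_TABLE_NAME and PG_DIAG_CONSTRAINT_NAME) are available in libpq.

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Sketch of client code that relies on the auxiliary error fields being
 * present for constraint-violation errors.
 */
static void
report_constraint_violation(const PGresult *res)
{
	const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
	const char *schema = PQresultErrorField(res, PG_DIAG_SCHEMA_NAME);
	const char *table = PQresultErrorField(res, PG_DIAG_TABLE_NAME);
	const char *constraint = PQresultErrorField(res, PG_DIAG_CONSTRAINT_NAME);

	/* Class 23 covers integrity constraint violations */
	if (sqlstate && strncmp(sqlstate, "23", 2) == 0)
		fprintf(stderr, "constraint \"%s\" on \"%s\".\"%s\" violated\n",
				constraint ? constraint : "?",
				schema ? schema : "?",
				table ? table : "?");
}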

> It's conceivable that we could adapt some static checker to look for
> ereport calls that mention particular ERRCODEs and lack particular
> helper functions, but even that wouldn't be a cast-iron guarantee
> because of the possibility of call sites using non-constant errcode
> values.  It'd probably be good enough in practice though.

I thought about ways of doing that, but it didn't seem worth pursuing right now.

> However, this patch is not that, and mere documentation isn't going to buy a
> thing here IMO.  Especially not user-facing documentation, as opposed
> to something that might be in a developers' face when he's
> copying-and-pasting code somewhere.  This patch didn't even touch the
> one place in the documentation that might be somewhat useful from a
> developer's standpoint, which is 49.2. Reporting Errors Within the
> Server.

Well, an entry should probably be added to 49.2 too, then. Why should
documentation (of whatever kind deemed appropriate) not buy a thing?
Don't we want to prevent the kind of problems that I describe above?
How are people supposed to know about something that isn't written
down anywhere? Surely documentation is better than nothing?

> At some point we might want to undertake a round of refactoring that
> makes this type of information available; and once we do, we'd probably
> want to expose it in the regular message texts not just the auxiliary
> fields.  I think that sort of work is out-of-scope for this patch
> though.  IMO what we should be doing here is getting the infrastructure
> in place, and then decorating some basic set of messages with aux info
> in places where not a lot of new code is needed to make that happen.
> Extending the decoration beyond that is material for future work.

Fair enough.

-- 
Regards,
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)

2013-01-27 Thread Phil Sorber
On Sun, Jan 27, 2013 at 2:38 PM, Phil Sorber  wrote:
> On Fri, Jan 25, 2013 at 11:20 AM, Fujii Masao  wrote:
>> On Fri, Jan 25, 2013 at 4:10 AM, Phil Sorber  wrote:
>>> On Thu, Jan 24, 2013 at 1:12 PM, Fujii Masao  wrote:
 set_pglocale_pgservice() should be called?

 I think that the command name (i.e., pg_isready) should be given to
 PQpingParams() as fallback_application_name. Otherwise, the server
 by default uses "unknown" as the application name of pg_isready.
 It's undesirable.

 Why isn't the following message output only when invalid option is
 specified?

 Try \"%s --help\" for more information.
>>>
>>> I've updated the patch to address these three issues. Attached.
>>>

 When the conninfo string including the hostname or port number is
 specified in -d option, pg_isready displays the wrong information
 as follows.

 $ pg_isready -d "port="
 /tmp:5432 - no response

>>>
>>> This is what i asked about in my previous email about precedence of
>>> the parameters. I can parse that with PQconninfoParse, but what are
>>> the rules for merging both individual and conninfo params together?
>>
>> If I read conninfo_array_parse() correctly, PQpingParams() prefer the
>> option which is set to its keyword array later.
>
> It would be really nice to expose conninfo_array_parse() or some
> wrapped version directly to a libpq consumer. Otherwise, I need to
> recreate this behavior in pg_isready.c.
>
> Thoughts on adding:
>   PQconninfoOption *PQparamsParse(const char **keywords, const char
> **values, char **errmsg, bool use_defaults, int expand_dbname)
> or similar?
>
> Or perhaps there is a better way to accomplish this that I am not aware of?
>

It would also be nice to be able to pass use_defaults to PQconninfoParse().

>>
>> Regards,
>>
>> --
>> Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visual Studio 2012 RC

2013-01-27 Thread james

Anyway, this is getting way off track. The point is that the MS SDKs and
compilers are a bit of a mess and that MinGW support is useful because
we can't rely on them continuing to offer free SDKs and compilers in future.


Well, more compilers are always useful, but complaining that Microsoft 
might withdraw their working compilers smacks of 'what if?' paranoia. 
What if mingw support for Win64 was (sometimes/often/always/still) a bit 
rubbish?  Oh wait ...






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)

2013-01-27 Thread Phil Sorber
On Fri, Jan 25, 2013 at 11:20 AM, Fujii Masao  wrote:
> On Fri, Jan 25, 2013 at 4:10 AM, Phil Sorber  wrote:
>> On Thu, Jan 24, 2013 at 1:12 PM, Fujii Masao  wrote:
>>> set_pglocale_pgservice() should be called?
>>>
>>> I think that the command name (i.e., pg_isready) should be given to
>>> PQpingParams() as fallback_application_name. Otherwise, the server
>>> by default uses "unknown" as the application name of pg_isready.
>>> It's undesirable.
>>>
>>> Why isn't the following message output only when invalid option is
>>> specified?
>>>
>>> Try \"%s --help\" for more information.
>>
>> I've updated the patch to address these three issues. Attached.
>>
>>>
>>> When the conninfo string including the hostname or port number is
>>> specified in -d option, pg_isready displays the wrong information
>>> as follows.
>>>
>>> $ pg_isready -d "port="
>>> /tmp:5432 - no response
>>>
>>
>> This is what I asked about in my previous email about precedence of
>> the parameters. I can parse that with PQconninfoParse, but what are
>> the rules for merging both individual and conninfo params together?
>
> If I read conninfo_array_parse() correctly, PQpingParams() prefers the
> option that is set later in its keyword array.

It would be really nice to expose conninfo_array_parse() or some
wrapped version directly to a libpq consumer. Otherwise, I need to
recreate this behavior in pg_isready.c.

Thoughts on adding:
  PQconninfoOption *PQparamsParse(const char **keywords, const char
**values, char **errmsg, bool use_defaults, int expand_dbname)
or similar?

Or perhaps there is a better way to accomplish this that I am not aware of?
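
For what it's worth, if that reading of conninfo_array_parse() is right,
the later-entry-wins behaviour can already be leaned on from the caller's
side.  A tiny sketch -- nothing here beyond the documented PQpingParams()
interface, and the precedence itself is an assumption based on the reading
above, not something I've verified:

#include <libpq-fe.h>

/*
 * Illustration only: duplicate keywords in the arrays.  If later
 * duplicates win (as the reading above suggests), this pings port
 * 5433, not 5432.
 */
static PGPing
ping_example(void)
{
    const char *keywords[] = {"host", "port", "port", NULL};
    const char *values[]   = {"localhost", "5432", "5433", NULL};

    return PQpingParams(keywords, values, 0);
}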

>
> Regards,
>
> --
> Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-27 Thread Jeff Janes
On Fri, Jan 25, 2013 at 10:02 AM, Robert Haas  wrote:
>
> I'm worried about the case of a very, very frequently updated table
> getting put ahead of a table that needs a wraparound vacuum, but only
> just.  It doesn't sit well with me to think that the priority of that
> goes from 0 (we don't even try to update it) to infinity (it goes
> ahead of all tables needing to be vacuumed for dead tuples) the
> instant we hit the vacuum_freeze_table_age.

What if it were the instant we hit autovacuum_freeze_max_age, not
vacuum_freeze_table_age?  Or does the current behavior already do
this?  Which process is responsible for enforcing
autovacuum_freeze_max_age?


Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-27 Thread Jeff Janes
On Fri, Jan 25, 2013 at 9:19 AM, Robert Haas  wrote:
>
> I think that to do this right, we need to consider not only the status
> quo but the trajectory.  For example, suppose we have two tables to
> process, one of which needs a wraparound vacuum and the other one of
> which needs dead tuples removed.  If the table needing the wraparound
> vacuum is small and just barely over the threshold, it isn't urgent;

But being small, it also won't take long to vacuum.  Why not just do it?

> but if it's large and way over the threshold, it's quite urgent.
> Similarly, if the table which needs dead tuples removed is rarely
> updated, postponing vacuum is not a big deal, but if it's being
> updated like crazy, postponing vacuum is a big problem.

I don't see this as being the case.  If it is being updated like
crazy, it doesn't matter whether it meets the threshold to have tuples
removed *right at the moment* or not.  It will meet that threshold
soon.  If you can't keep up with that need with your current settings,
you have a steady-state problem.  Changing the order, or not changing
the order, isn't going to make a whole lot of difference, you need to
overcome the steady-state problem.

> Categorically
> putting autovacuum wraparound tables ahead of everything else seems
> simplistic, and thinking that more dead tuples is more urgent than
> fewer dead tuples seems *extremely* simplistic.
>
> I ran across a real-world case where a user had a small table that had
> to be vacuumed every 15 seconds to prevent bloat.  If we change the
> algorithm in a way that gives other things priority over that table,

Eventually an anti-wraparound vacuum is going to be done, and once it starts
it does have priority, because things already underway don't get
preempted.  Have they ever reached that point?  Did it cause problems?

> then that user could easily get hosed when they install a maintenance
> release containing this change.

Yeah, I don't know that back-patching is a good idea, or at least not soon.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Event Triggers: adding information

2013-01-27 Thread Robert Haas
On Sun, Jan 27, 2013 at 12:57 PM, Dimitri Fontaine
 wrote:
> Robert Haas  writes:
>>> The current patch implementation is to fill in the object id, name and
>>> schema with NULL when we have something else than a single object as the
>>> target. I did that when I realized we have a precedent with statement
>>> triggers and that we would maybe share the implementation of the "record
>>> set variable" facility for PLs here.
>>
>> But I don't see how *that's* going to be any good for logical
>> replication either.  If the raw query string isn't useful, getting the
>> OID in some cases but not all surely can't be any better.
>
> Well I would think that "we don't support dropping several objects at
> once yet in 9.3" is an acceptable answer.

On that point, I disagree.

>>> So we have two proposals here:
>>>
>>>   - Have the cascading drop calls back to process utility with a new
>>> context value of PROCESS_UTILITY_CASCADE and its parse node, wherein
>>> you only stuff the ObjectAddress, and provide event triggers support
>>> for this new cascade command;
>>>
>>>   - Implement a new event called "sql_drop" where you provide the same
>>> amount of information as in a ddl_command event (same lookups,
>>> etc), but without any parsetree nor I suppose the command string
>>> that the user would have to type to drop just that object.
>>>
>>> You objected to the first on modularity violation grounds, and on the
>>> second on development cycle timing grounds. And now you're saying that
>>> because we don't have a practical solution, I'm not sure, apparently
>>> it's dead, but what is?
>>>
>>> Please help me decipher your process of thoughts and conclusions.
>>
>> OK, so I object to the first one on modularity grounds (as did Tom).
>> I don't object to the second one at all except that, AFAIK, nobody's
>> written the code yet.  Maybe I'm misunderstanding something.
>
> I mean object as in "not in 9.3, too late", which you seem to me to be
> confirming here. So what do we want to provide in 9.3 exactly?
>
> Note that I have enough time allocated on finishing that patch that I
> surely can come up with an implementation of whatever we finally decide
> within this commit fest, if the implementation is straightforward
> enough. As I've been playing in and out around those topics for 2 years
> now, plenty of things are a SMOP now: the problem is reaching an
> agreement about how to implement things.

I don't really know how to respond to this.  The CommitFest time is
supposed to be when you stop working on your own stuff and work on
reviewing other people's stuff, and yet, both this year and last year,
you seem to have no compunctions whatever about continuing to code
right through the CommitFest, which to me misses the point.  The
CommitFest, and especially the last one, is supposed to be the time to
commit what is committable now, not to go write new code.  And it is
not as if I didn't make these same design points last year around this
time.

On the other hand, if it's really true that you can bang this out in a
couple of days and come up with something committable, will I commit
it?  Yeah, probably.  But I think that is likely to take several
months of work, not several days, and the longer we go on with this
patch the less time there is for me to review other things that might
be a whole lot more ready to go than code that isn't written yet.

>> Actually, I think we could probably use a similar set of
>> infrastructure *either* to implement sql_drop OR to provide the
>> information to ddl_command_end.  What I'm thinking about is that we
>> might have something sort of like after-trigger queue that we use for
>> ordinary triggers.  Instead of calling a trigger as the drops happen,
>> we queue up a bunch of events that say "xyz got dropped".  And then,
>> we can either fire something like ddl_command_end (and pass in the
>> whole list as an array or a table) or we can call something like
>> sql_drop (n times, passing details on one object per invocation).  In
>> either approach, the triggers fire *after* the drops rather than in
>> the middle of them - it's just a question of whether you want one call
>> for all the drops or one call for each drop.
>
> So, do you want to see a patch implementing that to be able to provide
> information about DROP commands targeting more than one object and same
> information in other cases (object id, name, schema, operation name,
> object type)? That could be the next patch of the series, by Thursday.

I think this is approximately correct as a next logical step, but are
we really clear on what the design for that looks like?  I thought you
were saying upthread that you wanted to make this information
available via some kind of pseudo-table for 9.4.  In any event, adding
a new event type (sql_drop) is a separate patch, and adding
information to the existing types of event triggers is a separate
patch, so you should try to do one of those two, not both.

>> Well, the po

Re: [HACKERS] enhanced error fields

2013-01-27 Thread Tom Lane
Peter Geoghegan  writes:
> On 26 January 2013 22:36, Tom Lane  wrote:
>> I'm inclined to remove the "requirements" business altogether and just
>> document that these fields may be supplied, or words to that effect.

> I think we may be talking at cross purposes here. Guarantee may have
> been too strong a word, or the wrong word entirely. All that I really
> want here is for there to be a coding standard instituted, so that in
> future client code will not be broken by a failure to include some
> field in a new ereport site that related to what is effectively the
> same error as an existing ereport site. I'm sure you'll agree that
> subtly breaking client code when refactoring Postgres is unacceptable.

[ shrug... ] If you have a way of making a guarantee that future
versions introduce no new bugs, patent it and you'll soon have all the
money in the world.

It's conceivable that we could adapt some static checker to look for
ereport calls that mention particular ERRCODEs and lack particular
helper functions, but even that wouldn't be a cast-iron guarantee
because of the possibility of call sites using non-constant errcode
values.  It'd probably be good enough in practice though.  However,
this patch is not that, and mere documentation isn't going to buy a
thing here IMO.  Especially not user-facing documentation, as opposed
to something that might be in a developers' face when he's
copying-and-pasting code somewhere.  This patch didn't even touch the
one place in the documentation that might be somewhat useful from a
developer's standpoint, which is 49.2. Reporting Errors Within the
Server.

>> A lot of that looks pretty broken to me, eg the changes in
>> ExecEvalCoerceToDomain are just hokum.  (Even if the expression is
>> executing in a statement that has a result table, there's no very good
>> reason to think that the error has anything to do with the result
>> table.)

> I must admit that that particular change wasn't very well thought out.
> I was trying to appease Stephen, who seemed to think that having a
> table_name where it was in principle available was particularly
> important.

Well, if we can get an accurate and relevant value then I'm all for
adding it.  But field values that are misleading or even plain wrong
in corner cases are not an improvement.

At some point we might want to undertake a round of refactoring that
makes this type of information available; and once we do, we'd probably
want to expose it in the regular message texts not just the auxiliary
fields.  I think that sort of work is out-of-scope for this patch
though.  IMO what we should be doing here is getting the infrastructure
in place, and then decorating some basic set of messages with aux info
in places where not a lot of new code is needed to make that happen.
Extending the decoration beyond that is material for future work.
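
To make "decorating" concrete, the kind of change being talked about is
adding a single helper call to an existing ereport() site -- a sketch
only, with an errtablecol()-style helper assumed from the patch (name and
signature not settled) and rel/attnum standing in for whatever locals the
call site already has:

ereport(ERROR,
        (errcode(ERRCODE_NOT_NULL_VIOLATION),
         errmsg("null value in column \"%s\" violates not-null constraint",
                NameStr(att->attname)),
         errtablecol(rel, attnum)));   /* the only new line */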

>> BTW, one thing that struck me in a quick look-through is that the
>> ERRCODE_FOREIGN_KEY_VIOLATION patches seem to inconsistently send
>> either the PK or FK rel as the "errtable".  Is this really per spec?
>> I'd have sort of expected that the reported table ought to be the one
>> that the constraint belongs to, namely the FK table.

> Personally, on the face of it I'd expect the "inconsistency" to simply
> reflect the fact that the error related to the referencing table or
> referenced table. Pavel's original patch followed the same convention
> (though it also had a constraint_table field). I'm having a hard time
> figuring out the standard's intent here, and I'm not sure that we
> should even care, because that applies only to GET DIAGNOSTICS, which
> isn't really the same thing as what we have here. I defer to you,
> though - it's not as if I feel too strongly about it.

A large part of the argument for doing this patch at all is to satisfy
the standard's requirements for information reported to a client.
(I believe that GET DIAGNOSTICS is in the end a client-side requirement,
ie in principle ecpg or similar library should be able to implement
it based on what the server reports.)  So to the extent that the spec
defines what should be in the fields, we need to follow that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2013-01-27 Thread Alexander Korotkov
On Fri, Jan 25, 2013 at 11:47 AM, Erik Rijkers  wrote:

> On Wed, January 23, 2013 08:36, Alexander Korotkov wrote:
> > Hi!
> >
> > Some quick answers to the part of notes/issues. I will provide rest of
> > answers soon.
> >
> [...]
> > trgm-regexp-0.10.patch.gz27 k
>
> Trying to build this I get, after 'make install' in contrib/ :
>
> /usr/bin/install: cannot stat `./pg_trgm--1.1.sql': No such file or
> directory
> /usr/bin/install: cannot stat `./pg_trgm--1.0--1.1.sql': No such file or
> directory
> make[1]: *** [install] Error 1
> make: *** [install-pg_trgm-recurse] Error 2
>

I forgot to include these files in the patch.

Other changes in the new version of the patch:
1) Get rid of recursion in trigramsMatchGraph and addKeys.
2) Both bos[0] and eos[0] are now also included in the checks.
3) Get rid of strnlen.
4) I found that the collation handling in the previous version of the patch
was fixed in the wrong way. The collation of the operator must match the
collation of the indexed column for an index scan. The only thing to fix is
passing the collation in gincost_pattern.
5) Some tests were added which improve the coverage.

Now I'm working on additional comments.

--
With best regards,
Alexander Korotkov.


trgm-regexp-0.11.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] allowing privileges on untrusted languages

2013-01-27 Thread Robert Haas
On Sun, Jan 27, 2013 at 1:09 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jan 25, 2013 at 2:59 PM, Kohei KaiGai  wrote:
>>> 2013/1/20 Tom Lane :
 The traditional answer to that, which not only can be done already in
 all existing releases but is infinitely more flexible than any
 hard-wired scheme we could implement, is that you create superuser-owned
 security-definer functions that can execute any specific operation you
 want to allow, and then GRANT EXECUTE on those functions to just the
 people who should have it.
>
>> This is valid, but I think that the people who want this functionality
>> are less interested in avoiding bugs in trusted procedures than they are
>> in avoiding the necessity for the user to have to learn the local
>> admin-installed collection of trusted procedures.
>
> Sure, but given that we are working on event triggers, surely the
> correct solution is to make sure that user-provided event triggers can
> cover permissions-checking requirements, rather than to invent a whole
> new infrastructure that's guaranteed to never really satisfy anybody.

I am not sure whether it's really true that a capability mechanism
could "never really satisfy" anyone.  It worked for Linux.

But, I think event triggers are a credible answer, too, and they
certainly are more flexible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-27 Thread Robert Haas
On Sun, Jan 27, 2013 at 6:30 AM, Josh Berkus  wrote:
> So while testing some replication stuff on 9.2.2 I discovered that it's
> completely possible to connect a replica to itself.  Seems like we ought
> to at least be able to detect and log *that*.

We could certainly alter the protocol so that it can detect that
situation, but like Simon, I dowanna.  I rarely get the chance to
agree wholeheartedly with Simon, so let me just take a moment to revel
in it here: you have discovered a non-problem problem.  Sure, if you
do that, nothing useful will happen.  But there are lots of non-useful
things in the world you can do, and it is neither practical nor
sensible to try to prevent them all.  And again, yes, you could do
that by accident when you meant to do something more sane, but again,
there are any number of other ways to accidentally do something truly
worthless.

If we're going to start installing safeguards against doing stupid
things, there's a long list of scenarios that happen far more
regularly than this ever will and cause far more damage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] allowing privileges on untrusted languages

2013-01-27 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 25, 2013 at 2:59 PM, Kohei KaiGai  wrote:
>> 2013/1/20 Tom Lane :
>>> The traditional answer to that, which not only can be done already in
>>> all existing releases but is infinitely more flexible than any
>>> hard-wired scheme we could implement, is that you create superuser-owned
>>> security-definer functions that can execute any specific operation you
>>> want to allow, and then GRANT EXECUTE on those functions to just the
>>> people who should have it.

> This is valid, but I think that the people who want this functionality
> are less interested in avoiding bugs in trusted procedures than they are
> in avoiding the necessity for the user to have to learn the local
> admin-installed collection of trusted procedures.

Sure, but given that we are working on event triggers, surely the
correct solution is to make sure that user-provided event triggers can
cover permissions-checking requirements, rather than to invent a whole
new infrastructure that's guaranteed to never really satisfy anybody.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Event Triggers: adding information

2013-01-27 Thread Dimitri Fontaine
Robert Haas  writes:
>> The current patch implementation is to fill in the object id, name and
>> schema with NULL when we have something else than a single object as the
>> target. I did that when I realized we have a precedent with statement
>> triggers and that we would maybe share the implementation of the "record
>> set variable" facility for PLs here.
>
> But I don't see how *that's* going to be any good for logical
> replication either.  If the raw query string isn't useful, getting the
> OID in some cases but not all surely can't be any better.

Well I would think that "we don't support dropping several objects at
once yet in 9.3" is an acceptable answer. And for DROP, it's a special
case anyway as what happens is that you *stop* receiving any data from
those tables/objects, so the problem is one of cleanup.

>> So we have two proposals here:
>>
>>   - Have the cascading drop calls back to process utility with a new
>> context value of PROCESS_UTILITY_CASCADE and its parse node, wherein
>> you only stuff the ObjectAddress, and provide event triggers support
>> for this new cascade command;
>>
>>   - Implement a new event called "sql_drop" where you provide the same
>> amount of information as in a ddl_command event (same lookups,
>> etc), but without any parsetree nor I suppose the command string
>> that the user would have to type to drop just that object.
>>
>> You objected to the first on modularity violation grounds, and on the
>> second on development cycle timing grounds. And now you're saying that
>> because we don't have a practical solution, I'm not sure, apparently
>> it's dead, but what is?
>>
>> Please help me decipher your process of thoughts and conclusions.
>
> OK, so I object to the first one on modularity grounds (as did Tom).
> I don't object to the second one at all except that, AFAIK, nobody's
> written the code yet.  Maybe I'm misunderstanding something.

I mean object as in "not in 9.3, too late", which you seem to me to be
confirming here. So what do we want to provide in 9.3 exactly?

Note that I have enough time allocated on finishing that patch that I
surely can come up with an implementation of whatever we finally decide
within this commit fest, if the implementation is straightforward
enough. As I've been playing in and out around those topics for 2 years
now, plenty of things are a SMOP now: the problem is reaching an
agreement about how to implement things.

> Actually, I think we could probably use a similar set of
> infrastructure *either* to implement sql_drop OR to provide the
> information to ddl_command_end.  What I'm thinking about is that we
> might have something sort of like after-trigger queue that we use for
> ordinary triggers.  Instead of calling a trigger as the drops happen,
> we queue up a bunch of events that say "xyz got dropped".  And then,
> we can either fire something like ddl_command_end (and pass in the
> whole list as an array or a table) or we can call something like
> sql_drop (n times, passing details on one object per invocation).  In
> either approach, the triggers fire *after* the drops rather than in
> the middle of them - it's just a question of whether you want one call
> for all the drops or one call for each drop.

So, do you want to see a patch implementing that to be able to provide
information about DROP commands targeting more than one object and same
information in other cases (object id, name, schema, operation name,
object type)? That could be the next patch of the series, by Thursday.

> Well, the point is that if you have a function that maps a parse tree
> onto an object name, any API or ABI changes can be reflected in an
> updated definition for that function.  So suppose I have the command
> "CREATE TABLE public.foo (a int)".  And we have a call
> pg_target_object_namespace(), which will return "public" given the
> parse tree for the foregoing command.  And we have a call
> pg_target_object_name(), which will return "foo".  We can whack around
> the underlying parse tree representation all we want and still not
> break anything - because any imaginable parse tree representation will
> allow the object name and object namespace to be extracted.  Were that
> not possible it could scarcely be called a parse tree any longer.

In my view, we have no DDL operation where we don't want to be able to
get at the Object ID, name and schema of the operation. So while more
complex cases might well require a full function based API, I think a
set of magic variables TG_* for basic information is going to make life
easier for everybody.

Then, more complex cases are left alone in 9.3 and we have whole
development cycle to address them, with some data type and a set of
functions maybe.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)

2013-01-27 Thread Pavel Stehule
2013/1/27 Tom Lane :
> Craig Ringer  writes:
>> That's what it sounds like - confirming that PostgreSQL is really fully
>> shut down.
>
>> I'm not sure how you could do that over a protocol connection, myself.
>> I'd just read the postmaster pid from the pidfile on disk and then `kill
>> -0` it in a delay loop until the `kill` command returns failure. This
>> could be a useful convenience utility but I'm not convinced it should be
>> added to pg_isready because it requires local and possibly privileged
>> execution, unlike pg_isready's network based operation. Privileges could
>> be avoided by using an aliveness test other than `kill -0`, but you
>> absolutely have to be local to verify that the postmaster has fully
>> terminated - and it wouldn't make sense for a non-local process to care
>> about this anyway.
>
> This problem is actually quite a bit more difficult than it looks.
> In particular, the mere fact that the postmaster process is gone does
> not prove that the cluster is idle: it's possible that the postmaster
> crashed leaving orphan backends behind, and the orphans are still busily
> modifying on-disk state.  A real postmaster knows how to check for that
> (by looking at the nattch count of the shmem segment cited in the old
> lockfile) but I can't see any shell script getting it right.
>
> So ATM I wouldn't trust any method short of "try to start a new
> postmaster and see if it works", which of course is not terribly helpful
> if your objective is to get to a stopped state.
>
> We could consider transposing the shmem logic into a new pg_ctl command.
> It might be better though to have a new switch in the postgres
> executable that just runs postmaster startup as far as detecting
> lockfile conflicts, and reports what it found (without ever launching
> any child processes that could confuse matters).  Then "pg_ctl isdone"
> could be a frontend for that, instead of duplicating logic.
>

+1

Pavel

> regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)

2013-01-27 Thread Tom Lane
Craig Ringer  writes:
> That's what it sounds like - confirming that PostgreSQL is really fully
> shut down.

> I'm not sure how you could do that over a protocol connection, myself.
> I'd just read the postmaster pid from the pidfile on disk and then `kill
> -0` it in a delay loop until the `kill` command returns failure. This
> could be a useful convenience utility but I'm not convinced it should be
> added to pg_isready because it requires local and possibly privileged
> execution, unlike pg_isready's network based operation. Privileges could
> be avoided by using an aliveness test other than `kill -0`, but you
> absolutely have to be local to verify that the postmaster has fully
> terminated - and it wouldn't make sense for a non-local process to care
> about this anyway.

This problem is actually quite a bit more difficult than it looks.
In particular, the mere fact that the postmaster process is gone does
not prove that the cluster is idle: it's possible that the postmaster
crashed leaving orphan backends behind, and the orphans are still busily
modifying on-disk state.  A real postmaster knows how to check for that
(by looking at the nattch count of the shmem segment cited in the old
lockfile) but I can't see any shell script getting it right.
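
For the record, the nattch test itself is only a few lines of SysV shmem
calls; the part a script can't get right is the lockfile parsing and the
corner cases around it.  A rough sketch, deliberately glossing over
recycled keys, permissions, and the other cases the real postmaster logic
handles:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdbool.h>

/*
 * Rough sketch: given the shmem key recorded in the old lockfile,
 * is anything still attached to that segment?
 */
static bool
segment_has_attached_processes(key_t shmkey)
{
    struct shmid_ds buf;
    int         shmid = shmget(shmkey, 0, 0);

    if (shmid < 0)
        return false;       /* segment is gone */
    if (shmctl(shmid, IPC_STAT, &buf) < 0)
        return false;       /* can't stat it; assume gone */
    return buf.shm_nattch > 0;
}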

So ATM I wouldn't trust any method short of "try to start a new
postmaster and see if it works", which of course is not terribly helpful
if your objective is to get to a stopped state.

We could consider transposing the shmem logic into a new pg_ctl command.
It might be better though to have a new switch in the postgres
executable that just runs postmaster startup as far as detecting
lockfile conflicts, and reports what it found (without ever launching
any child processes that could confuse matters).  Then "pg_ctl isdone"
could be a frontend for that, instead of duplicating logic.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v4

2013-01-27 Thread Steve Singer

On 13-01-22 11:30 AM, Andres Freund wrote:

Hi,

I pushed a new rebased version (the xlogreader commit made it annoying
to merge).

The main improvements are
* way much coherent code internally for intializing logical rep
* explicit control over slots
* options for logical replication



Exactly what is the syntax for using that?  My reading of your changes to
repl_gram.y makes me think that any of the following should work (but
they don't).


START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1')
 ERROR:  syntax error: unexpected character "("

"START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1' 'val1')
 ERROR:  syntax error: unexpected character "("

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1','opt2')
ERROR:  syntax error: unexpected character "("

I'm also attaching a patch to pg_receivellog that allows you to specify
these options on the command line.  I'm not saying that it is appropriate
to be adding more bells and whistles to the utilities two weeks into the
CF, but I found this useful for testing, so I'm sharing it.






Greetings,

Andres Freund

--
  Andres Freund                 http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services





>From 176087bacec6cbf0b86e4ffeb918f41b4a5b8d7a Mon Sep 17 00:00:00 2001
From: Steve Singer 
Date: Sun, 27 Jan 2013 12:24:33 -0500
Subject: [PATCH] allow pg_receivellog to pass plugin options from the command line to the plugin

---
 src/bin/pg_basebackup/pg_receivellog.c |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
index 04bedbe..30b3cea 100644
--- a/src/bin/pg_basebackup/pg_receivellog.c
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -54,7 +54,7 @@ static XLogRecPtr	startpos;
 static bool do_init_slot = false;
 static bool do_start_slot = false;
 static bool do_stop_slot = false;
-
+static const char * plugin_opts="";
 
 static void usage(void);
 static void StreamLog();
@@ -84,6 +84,7 @@ usage(void)
 	printf(_("  -s, --status-interval=INTERVAL\n"
 			 " time between status packets sent to server (in seconds)\n"));
 	printf(_("  -S, --slot=SLOTuse existing replication slot SLOT instead of starting a new one\n"));
+	printf(_("  -o --options=OPTIONS   A comma separated list of options to the plugin\n"));
 	printf(_("\nAction to be performed:\n"));
 	printf(_("  --init initiate a new replication slot (for the slotname see --slot)\n"));
 	printf(_("  --startstart streaming in a replication slot (for the slotname see --slot)\n"));
@@ -264,8 +265,8 @@ StreamLog(void)
 slot);
 
 	/* Initiate the replication stream at specified location */
-	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
-			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X (%s)",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos,plugin_opts);
 	res = PQexec(conn, query);
 	if (PQresultStatus(res) != PGRES_COPY_BOTH)
 	{
@@ -560,6 +561,7 @@ main(int argc, char **argv)
 		{"init", no_argument, NULL, 1},
 		{"start", no_argument, NULL, 2},
 		{"stop", no_argument, NULL, 3},
+		{"options",required_argument,NULL,'o'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -584,7 +586,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:",
+	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:o:",
 			long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -659,6 +661,10 @@ main(int argc, char **argv)
 			case 3:
 do_stop_slot = true;
 break;
+			case 'o':
+if(optarg != NULL)
+	plugin_opts = pg_strdup(optarg);
+break;
 /* action */
 
 			default:
-- 
1.7.0.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Event Triggers: adding information

2013-01-27 Thread Robert Haas
On Sun, Jan 27, 2013 at 12:08 PM, Steve Singer  wrote:
> On 13-01-26 11:11 PM, Robert Haas wrote:
>> On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
>>  wrote:
>>>
>>> My understanding is that if the command string we give to event triggers
>>> is ambiguous (sub-object names, schema qualifications, etc), it comes
>>> useless for logical replication use. I'll leave it to the consumers of
>>> that to speak up now.
>>
>> Yeah, that's probably true.  I think it might be useful for other
>> purposes, but I think we need a bunch of infrastructure we don't have
>> yet to make logical replication of DDL a reality.
>
> I agree.  Does anyone have a specific use case other than DDL replication
> where an ambiguous command string would be useful? Even for use cases like
> automatically removing a table from replication when it is dropped, I would
> want to be able to determine which table is being dropped unambiguously.
> Could I determine that from an oid? I suspect so, but parsing a command
> string and then trying to figure out the table from the search_path doesn't
> sound very appealing.

I was thinking about logging or auditing applications - e.g. log all
command strings whose first word is "drop".  We do get customers
coming to us with that sort of request from time to time.

>> Well, the point is that if you have a function that maps a parse tree
>> onto an object name, any API or ABI changes can be reflected in an
>> updated definition for that function.  So suppose I have the command
>> "CREATE TABLE public.foo (a int)".  And we have a call
>> pg_target_object_namespace(), which will return "public" given the
>> parse tree for the foregoing command.  And we have a call
>> pg_target_object_name(), which will return "foo".  We can whack around
>> the underlying parse tree representation all we want and still not
>> break anything - because any imaginable parse tree representation will
>> allow the object name and object namespace to be extracted.  Were that
>> not possible it could scarcely be called a parse tree any longer.
>
> How do you get the fully qualified type of the first column?
> col1=pg_target_get_column(x, 0)
> pg_target_get_type(col1);
>
> or something similar.

Or maybe return all the data for all the columns as an array of records...

> I think that could work but we would be adding a lot of API functions to get
> all the various bits of info one would want the API to expose.

Yeah, that's the principal disadvantage of this method, AFAICS.  OTOH,
I am not sure anything else we might do is any better.   You can avoid
the complexity of defining and specifying all that interface by just
taking the parse tree and running nodeToString() on it and lobbing it
over the fence, but that is basically abdicating the responsibility of
defining a sane interface in favor of just chucking whatever you
happen to have handy over the wall and hoping the user is OK with
calling that an interface.  Assuming you think that's a bad idea (and
Tom and I do, at least), you've got to put in the effort to design
something, and that thing, however constituted, is going to be fairly
extensive.

> I also
> suspect executing triggers that had to make lots of function calls to walk a
> tree would be much slower than an extension that could just walk the
> parse-tree or some other abstract tree like structure.

I am kind of doubtful that this is a real problem - why should it be
any slower than, say, parsing a JSON object into an AST?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-27 Thread Robert Haas
On Sun, Jan 27, 2013 at 4:17 AM, Simon Riggs  wrote:
> On 25 January 2013 17:19, Robert Haas  wrote:
>> We
>> could easily run across a system where pg_class order happens to be
>> better than anything else we come up with.
>
> I think you should read that back to yourself and see if you still
> feel the word "easily" applies here.

I absolutely do.  You will not convince me that whacking around the
behavior of autovacuum in a maintenance release is a remotely sane
thing to do.  There are plenty of things wrong with the way autovacuum
works today, and I am all in favor of fixing them - but not in the
back-branches.  Every time we whack behavior around in the back
branches, no matter how innocuous it looks, somebody's environment
gets broken, and then they won't apply patch releases, and it causes
all sorts of headaches.  At least, that's my experience at
EnterpriseDB. YMMV.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Event Triggers: adding information

2013-01-27 Thread Steve Singer

On 13-01-26 11:11 PM, Robert Haas wrote:

On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
 wrote:

My understanding is that if the command string we give to event triggers
is ambiguous (sub-object names, schema qualifications, etc), it comes
useless for logical replication use. I'll leave it to the consumers of
that to speak up now.


Yeah, that's probably true.  I think it might be useful for other
purposes, but I think we need a bunch of infrastructure we don't have
yet to make logical replication of DDL a reality.



I agree.  Does anyone have a specific use case other than DDL 
replication where an ambiguous command string would be useful? Even for 
use cases like automatically removing a table from replication when it 
is dropped, I would want to be able to determine which table is being 
dropped unambiguously. Could I determine that from an oid? I suspect so, 
but parsing a command string and then trying to figure out the table 
from the search_path doesn't sound very appealing.



Well, the point is that if you have a function that maps a parse tree
onto an object name, any API or ABI changes can be reflected in an
updated definition for that function.  So suppose I have the command
"CREATE TABLE public.foo (a int)".  And we have a call
pg_target_object_namespace(), which will return "public" given the
parse tree for the foregoing command.  And we have a call
pg_target_object_name(), which will return "foo".  We can whack around
the underlying parse tree representation all we want and still not
break anything - because any imaginable parse tree representation will
allow the object name and object namespace to be extracted.  Were that
not possible it could scarcely be called a parse tree any longer.


How do you get the fully qualified type of the first column?
col1=pg_target_get_column(x, 0)
pg_target_get_type(col1);

or something similar.

I think that could work but we would be adding a lot of API functions to 
get all the various bits of info one would want the API to expose. I 
also suspect executing triggers that had to make lots of function calls 
to walk a tree would be much slower than an extension that could just 
walk the parse-tree or some other abstract tree like structure.


Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-01-27 Thread Robert Haas
On Sun, Jan 27, 2013 at 3:50 AM, Simon Riggs  wrote:
> If we attempted to defer the FPI last thing before write, we'd need to
> cope with the case that writes at checkpoint occur after the logical
> start of the checkpoint, and also with the overhead of additional
> writes at checkpoint time.

Oh, good point.  That's surely a good reason not to do it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Request for vote to move forward with recovery.conf overhaul

2013-01-27 Thread Robert Haas
On Sun, Jan 27, 2013 at 1:01 AM, Michael Paquier
 wrote:
>> So... what happens when recovery ends?  Do the settings loaded from
>> recovery.conf get reverted, or what?
>
> With current patch the settings are kept if set in postgresql.conf and
> discarded if they are loaded as GUC after a server restart or reload.

I can't understand that answer.  Is there a word missing or something?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-27 Thread Simon Riggs
On 27 January 2013 11:30, Josh Berkus  wrote:

> So while testing some replication stuff on 9.2.2 I discovered that it's
> completely possible to connect a replica to itself.  Seems like we ought
> to at least be able to detect and log *that*.

How do we do that?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-27 Thread MauMau

From: "Fujii Masao" 

On Sun, Jan 27, 2013 at 12:17 AM, MauMau  wrote:
Although you said the fix will solve my problem, I don't feel it will.  The
discussion is about the crash when the standby "re"starts after the primary
vacuums and truncates a table.  On the other hand, in my case, the standby
crashed during failover (not at restart), emitting a message that some WAL
record refers to an "uninitialized" page (not a non-existent page) of an
"index" (not a table).

In addition, fujii_test.sh did not reproduce the mentioned crash on
PostgreSQL 9.1.6.

I'm sorry to cause you trouble, but could you elaborate on how the fix
relates to my case?


Maybe I had not been understanding your problem correctly.
Could you show the self-contained test case which reproduces the problem?
Is the problem still reproducible in REL9_1_STABLE?


As I said before, it's very hard to reproduce the problem.  All I did was
repeat the following sequence:


1. run "pg_ctl stop -mi" against the primary while the applications were 
performing INSERT/UPDATE/SELECT.
2. run "pg_ctl promote" against the standby of synchronous streaming 
standby.
3. run pg_basebackup on the stopped (original) primary to create a new 
standby, and start the new standby.


I did this failover test dozens of times, probably more than a hundred.  And 
I encountered the crash only once.


Although I saw the problem only once, the result is catastrophic.  So, I
really wish Heikki's patch (in cooperation with Horiguchi-san and you) could
fix the issue.


Do you think of anything?

Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]Fix for ecpglib's native language messages output

2013-01-27 Thread Michael Meskes
On Sun, Jan 27, 2013 at 12:42:48PM +0800, Chen Huajun wrote:
> I found that ecpg programs cannot output the native language messages
> defined in ecpglib6-9.x.mo.
> The reason is a mistake in "src/interfaces/ecpg/ecpglib/misc.c",
> and I made a patch for that.

Thanks for finding and fixing the problem, patch committed.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-27 Thread Josh Berkus
All,

So while testing some replication stuff on 9.2.2 I discovered that it's
completely possible to connect a replica to itself.  Seems like we ought
to at least be able to detect and log *that*.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-27 Thread Fujii Masao
On Sun, Jan 27, 2013 at 12:17 AM, MauMau  wrote:
> From: "Fujii Masao" 
>>
>> On Thu, Jan 24, 2013 at 11:53 PM, MauMau  wrote:
>>>
>>> I'm wondering if the fix discussed in the above thread solves my problem.
>>> I
>>> found the following differences between Horiguchi-san's case and my case:
>>>
>>> (1)
>>> Horiguchi-san says the bug outputs the message:
>>>
>>> WARNING:  page 0 of relation base/16384/16385 does not exist
>>>
>>> On the other hand, I got the message:
>>>
>>>
>>> WARNING:  page 506747 of relation base/482272/482304 was uninitialized
>>>
>>>
>>> (2)
>>> Horiguchi-san produced the problem when he shut the standby immediately
>>> and
>>> restarted it.  However, I saw the problem during failover.
>>>
>>>
>>> (3)
>>> Horiguchi-san did not use any index, but in my case the WARNING message
>>> refers to an index.
>>>
>>>
>>> But there's a similar point.  Horiguchi-san says the problem occurs after
>>> DELETE+VACUUM.  In my case, I shut the primary down while the application
>>> was doing INSERT/UPDATE.  As the below messages show, some vacuuming was
>>> running just before the immediate shutdown:
>>>
>>> ...
>>> LOG:  automatic vacuum of table "GOLD.scm1.tbl1": index scans: 0
>>> pages: 0 removed, 36743 remain
>>> tuples: 0 removed, 73764 remain
>>> system usage: CPU 0.09s/0.11u sec elapsed 0.66 sec
>>> LOG:  automatic analyze of table "GOLD.scm1.tbl1" system usage: CPU
>>> 0.00s/0.14u sec elapsed 0.32 sec
>>> LOG:  automatic vacuum of table "GOLD.scm1.tbl2": index scans: 0
>>> pages: 0 removed, 12101 remain
>>> tuples: 40657 removed, 44142 remain system usage: CPU 0.06s/0.06u sec
>>> elapsed 0.30 sec
>>> LOG:  automatic analyze of table "GOLD.scm1.tbl2" system usage: CPU
>>> 0.00s/0.06u sec elapsed 0.14 sec
>>> LOG:  received immediate shutdown request
>>> ...
>>>
>>>
>>> Could you tell me the details of the problem discussed and fixed in the
>>> upcoming minor release?  I would to like to know the phenomenon and its
>>> conditions, and whether it applies to my case.
>>
>>
>>
>> http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp
>>
>> Could you read the discussion in the above thread?
>
>
> Yes, I've just read the discussion (it was difficult for me...)
>
> Although you said the fix will solve my problem, I don't feel it will.  The
> discussion is about the crash when the standby "re"starts after the primary
> vacuums and truncates a table.  On the other hand, in my case, the standby
> crashed during failover (not at restart), emitting a message that some WAL
> record refers to an "uninitialized" page (not a non-existent page) of an
> "index" (not a table).
>
> In addition, fujii_test.sh did not reproduce the mentioned crash on
> PostgreSQL 9.1.6.
>
> I'm sorry to cause you trouble, but could you elaborate on how the fix
> relates to my case?

Maybe I had not been understanding your problem correctly.
Could you show the self-contained test case which reproduces the problem?
Is the problem still reproducible in REL9_1_STABLE?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-27 Thread Simon Riggs
On 25 January 2013 17:19, Robert Haas  wrote:

> We
> could easily run across a system where pg_class order happens to be
> better than anything else we come up with.

I think you should read that back to yourself and see if you still
feel the word "easily" applies here.

I agree with Tom that it's hard for almost any prioritisation not to be
better than we have now.

But also, we should keep it fairly simple to avoid introducing new
behaviour that defeats people with a highly tuned vacuum config.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-01-27 Thread Simon Riggs
On 25 January 2013 20:29, Robert Haas  wrote:

>> The checksums patch also introduces another behavior into
>> SetBufferCommitInfoNeedsSave, which is to write an XLOG_HINT WAL record
>> if checksums are enabled (to avoid torn page hazards). That's only
>> necessary for changes where the caller does not write WAL itself and
>> doesn't bump the LSN of the data page. (There's a reason the caller
>> can't easily write the XLOG_HINT WAL itself.) So, we could introduce
>> another flag "needsWAL" that would control whether we write the
>> XLOG_HINT WAL or not (only applies with checksums on, of course).
>
> I thought Simon had the idea, at some stage, of writing a WAL record
> to cover hint-bit changes only at the time we *write* the buffer and
> only if no FPI had already been emitted that checkpoint cycle.  I'm
> not sure whether that approach was sound, but if so it seems more
> efficient than this approach.

The requirement is that we ensure that a FPI is written to WAL before
any changes to the block are made.

The patch does that by inserting an XLOG_HINT_WAL record when we set a
hint. The insert is a no-op if we've already written the FPI in this
checkpoint cycle and we don't even reach there except when dirtying a
clean data block.
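
In rough pseudo-C the rule looks like this (identifiers are hypothetical
stand-ins, not the patch's actual code):

/*
 * Sketch only: before a hint-bit change may dirty a clean page, make
 * sure a full-page image has been logged since the current checkpoint's
 * redo pointer.  If the page LSN is already past the redo pointer, some
 * WAL record has covered this page in the current cycle and nothing
 * more is needed.
 */
if (checksums_enabled && PageGetLSN(page) <= GetRedoRecPtr())
{
    XLogRecPtr  lsn = emit_xlog_hint_record(buffer);   /* hypothetical helper */

    PageSetLSN(page, lsn);
}
MarkBufferDirty(buffer);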

If we attempted to defer the FPI last thing before write, we'd need to
cope with the case that writes at checkpoint occur after the logical
start of the checkpoint, and also with the overhead of additional
writes at checkpoint time.

I don't see any advantage in deferring the FPI, but I do see
disadvantage in complicating this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers