Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Robert Haas writes: > On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs wrote: >> This fixes both the subtrans and clog bugs in one patch. > I don't see the point of changing StartupCLOG() to be an empty > function and adding a new function TrimCLOG() that does everything > StartupCLOG() used to do. +1 ... I found that overly cute also. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs wrote: > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs wrote: > >> Brewing a patch now. > > Latest thinking... confirmations or other error reports please. > > This fixes both the subtrans and clog bugs in one patch. I don't see the point of changing StartupCLOG() to be an empty function and adding a new function TrimCLOG() that does everything StartupCLOG() used to do. Seems simpler to just move the calls to StartupCLOG() wherever they need to be - i.e. remove the one that happens before WAL replay, and extricate the one at end-of-recovery from the if block which currently contains it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct 26, 2011, at 18:08, Simon Riggs wrote: > On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug wrote: >> On Oct 26, 2011, at 15:57, Florian Pflug wrote: >> Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but >> before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint >> whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts() >> takes to finish its work the more likely it becomes (assuming that CLOG writing >> and syncing doesn't happen at the very end). This fits the OP's observation of the >> problem vanishing when pg_start_backup() does an immediate checkpoint. > > As it turns out the correct fix is actually just to skip StartupClog() > until the end of recovery because it does nothing useful when executed > at that time. When I wrote the original code I remember thinking that > StartupClog() is superfluous at that point. Hm, that fixes the problem in the hot standby case, but as I said in my reply to Chris Redekop, normal crash recovery is also at risk (although the probability of hitting the bug is much smaller there). Here's my reasoning from that other mail: Per my theory about the cause of the problem in my other mail, I think you might see StartupCLOG failures even during crash recovery, provided that wal_level was set to hot_standby when the primary crashed. Here's how:

1) We start a checkpoint, and get as far as LogStandbySnapshot().
2) A backend does AssignTransactionId(), and gets as far as GetNewTransactionId(). The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG extension makes it to the disk, and before any trace of the XID assigned in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up with a nextXid whose corresponding CLOG page doesn't exist.
Quite aside from that concern, I think it's probably not a good idea for the nextXid value of a checkpoint to depend on whether wal_level was set to hot_standby or not. Our recovery code is already quite complex and hard to test, and this just adds one more combination that has to be thought about while coding and that needs to be tested. My suggestion is to fix the CLOG problem in the same way that you fixed the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before CheckPointGuts(). Here's what I imagine CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() to modify the nextXid, since we fill out nextXid after LogStandbySnapshot() and will thus derive a higher value than LogStandbySnapshot() would have. We could then also fold GetOldestActiveTransactionId() back into your proposed LogStandbySnapshot() and thus don't need two ProcArray traversals. best regards, Florian Pflug
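Florian's proposed reordering can be illustrated with a small simulation. This is an illustrative model only, not PostgreSQL code: the helper names and the state dictionary are hypothetical stand-ins that mimic the ordering of events inside CreateCheckPoint(), showing why letting LogStandbySnapshot() re-read nextXid after CheckPointGuts() can stamp the checkpoint with a nextXid whose CLOG page was never fsynced.

```python
# Illustrative model only -- not PostgreSQL code. All names are hypothetical.
def run_checkpoint(snapshot_before_guts):
    state = {"next_xid": 100, "clog_synced_upto": 0}

    def assign_xid():            # concurrent backend: extends CLOG in memory only
        state["next_xid"] += 1

    def checkpoint_guts():       # writes + fsyncs CLOG up to the current nextXid
        state["clog_synced_upto"] = state["next_xid"]

    if snapshot_before_guts:
        # Proposed order: snapshot + nextXid first, then fsync everything.
        ckpt_next_xid = state["next_xid"]
        assign_xid()             # concurrent backend activity
        checkpoint_guts()
    else:
        # Old order (wal_level = hot_standby): guts first, snapshot second.
        checkpoint_guts()
        assign_xid()             # concurrent backend activity
        ckpt_next_xid = state["next_xid"]   # LogStandbySnapshot() re-reads nextXid

    # True when the checkpoint's nextXid is covered by the fsynced CLOG.
    return ckpt_next_xid <= state["clog_synced_upto"]
```

In this model, `run_checkpoint(False)` (the old ordering) records a nextXid past what was synced, while `run_checkpoint(True)` (the proposed ordering) is always covered by the later CheckPointGuts() fsync.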
Re: [HACKERS] Cannot
On Oct 26, 2011, at 2:58 PM, Tom Lane wrote: >> I read this as equivalent to "can be not released." Which of course is >> silly, so as I read it I realize what it means, but it trips up my overly >> logical brain. It interrupts the flow. There is no such confusion in "cannot >> be released" and thus no tripping up on meaning. > > This particular change seems like an improvement to me, but it's hardly > an adequate argument for a global search-and-replace. There might be > other places where such a change renders things *less* readable. The patch is actually quite modest; there are only a few instances of "can not". Attached. Best, David cannot.patch
Re: [HACKERS] Cannot
"David E. Wheeler" writes: > On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote: >> Why? "can not" is perfectly grammatical AFAIK. > True, but there's a logic issue. Take this example from > doc/src/sgml/func.sgml: >> >> pg_advisory_xact_lock works the same as >> pg_advisory_lock, expect the lock is automatically released >> at the end of the current transaction and can not be released explicitly. >> > I read this as equivalent to "can be not released." Which of course is silly, > so as I read it I realize what it means, but it trips up my overly logical > brain. It interrupts the flow. There is no such confusion in "cannot be > released" and thus no tripping up on meaning. This particular change seems like an improvement to me, but it's hardly an adequate argument for a global search-and-replace. There might be other places where such a change renders things *less* readable. regards, tom lane
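Tom's worry about a global search-and-replace is easy to demonstrate. The sentences below are hypothetical examples (not from the PostgreSQL docs): a blind s/can not/cannot/ is safe only where "not" binds to "can"; where "not" binds to what follows, as in "can not only ... but also", the replacement silently changes the meaning.

```python
import re

def blind(s):
    # The mechanical replacement from the proposed one-liner.
    return re.sub(r"can not", "cannot", s)

ok = "The lock can not be released explicitly."
tricky = "A session can not only hold the lock but also extend it."

assert blind(ok) == "The lock cannot be released explicitly."
# The substitution still fires here, but the result reads wrong:
assert blind(tricky) == "A session cannot only hold the lock but also extend it."
```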
Re: [HACKERS] Cannot
On 10/26/2011 05:15 PM, David E. Wheeler wrote: On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote: Suggested doc “patch”: grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g' Why? "can not" is perfectly grammatical AFAIK. True, but there's a logic issue. Take this example from doc/src/sgml/func.sgml: pg_advisory_xact_lock works the same as pg_advisory_lock, expect the lock is automatically released at the end of the current transaction and can not be released explicitly. I read this as equivalent to "can be not released." Which of course is silly, so as I read it I realize what it means, but it trips up my overly logical brain. It interrupts the flow. There is no such confusion in "cannot be released" and thus no tripping up on meaning.

Here's what I would do:
1. s/expect/except that/
2. s/can not be released explicitly/can not be explicitly released/

cheers andrew
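Andrew's two fixes can be applied mechanically. The sentence below is reproduced from the thread (the original doc excerpt, including its "expect" typo); the two substitutions are just his s/// commands expressed as Python regexes.

```python
import re

sentence = ("pg_advisory_xact_lock works the same as pg_advisory_lock, "
            "expect the lock is automatically released at the end of the "
            "current transaction and can not be released explicitly.")

# 1. s/expect/except that/
fixed = re.sub(r"\bexpect\b", "except that", sentence)
# 2. s/can not be released explicitly/can not be explicitly released/
fixed = re.sub(r"can not be released explicitly",
               "can not be explicitly released", fixed)

assert "except that the lock is automatically released" in fixed
assert fixed.endswith("can not be explicitly released.")
```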
Re: [HACKERS] Cannot
On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote: >> Suggested doc “patch”: >> >> grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g' > > Why? "can not" is perfectly grammatical AFAIK. True, but there's a logic issue. Take this example from doc/src/sgml/func.sgml: > > pg_advisory_xact_lock works the same as > pg_advisory_lock, expect the lock is automatically released > at the end of the current transaction and can not be released explicitly. > I read this as equivalent to "can be not released." Which of course is silly, so as I read it I realize what it means, but it trips up my overly logical brain. It interrupts the flow. There is no such confusion in "cannot be released" and thus no tripping up on meaning. Best, David
Re: [HACKERS] Cannot
On 10/26/2011 02:16 PM, David E. Wheeler wrote: Suggested doc “patch”: grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g' Why? "can not" is perfectly grammatical AFAIK. cheers andrew
Re: [HACKERS] autovacuum workers warning
2011/10/26 Euler Taveira de Oliveira : > I'm not saying that is not the right direction, I'm arguing that a hint is > better than nothing. Right now the only way to know if it is out of workers > is to query pg_stat_activity frequently. The current number of autovacuum workers could go in the errmsg instead of the errhint; the errhint could then be omitted from the patch if there isn't a good hint to report. -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
Re: [HACKERS] autovacuum workers warning
Excerpts from Euler Taveira de Oliveira's message of Wed Oct 26 16:57:18 -0300 2011: > > On 26-10-2011 16:14, Alvaro Herrera wrote: > > Well, just increasing the number of workers would do nothing to solve > > the problem, because the more workers there are, the slower they work. > > The actual solution to the problem would be decreasing > > autovacuum_vacuum_cost_delay, and/or related knobs. > > > Why not? You're saying that parallelizing the work won't help? As for > autovacuum_vacuum_cost_delay, don't you think that 20ms isn't small enough to > suggest decreasing instead of increasing the number of workers? I am saying that if you have two workers running, they increase their cost delay to 40ms internally. If you increase the max to four, they would run at an effective delay of 80ms. > > Wasn't there some discussion recently on measuring the length of the > > work queue, or something like that? > > > Yes, there is. As I said, it is an expensive and approximate measure. I'm not > saying that is not the right direction, I'm arguing that a hint is better > than > nothing. Right now the only way to know if it is out of workers is to query > pg_stat_activity frequently. Well, I'm not saying there aren't other ways. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
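Alvaro's point can be put as a one-line model. This is a deliberate simplification: the real balancing code (autovac_balance_cost()) distributes the configured cost limit and delay across the running workers so the total I/O budget stays constant, and the net effect is roughly that each worker's effective delay scales with the number of active workers, which is what his 20ms/40ms/80ms figures describe.

```python
# Simplified model of autovacuum cost-delay balancing: the configured
# I/O budget is shared, so each worker's effective delay grows in
# proportion to the number of active workers. (Hypothetical helper name;
# the real logic lives in autovac_balance_cost().)
def effective_delay_ms(base_delay_ms, active_workers):
    return base_delay_ms * active_workers

assert effective_delay_ms(20, 2) == 40   # two workers -> 40ms each
assert effective_delay_ms(20, 4) == 80   # four workers -> 80ms each
```

This is why simply raising autovacuum_max_workers doesn't make vacuuming faster: the same budget is just sliced more thinly.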
Re: [HACKERS] autovacuum workers warning
On 26-10-2011 16:14, Alvaro Herrera wrote: Well, just increasing the number of workers would do nothing to solve the problem, because the more workers there are, the slower they work. The actual solution to the problem would be decreasing autovacuum_vacuum_cost_delay, and/or related knobs. Why not? You're saying that parallelizing the work won't help? As for autovacuum_vacuum_cost_delay, don't you think that 20ms isn't small enough to suggest decreasing it instead of increasing the number of workers? Wasn't there some discussion recently on measuring the length of the work queue, or something like that? Yes, there is. As I said, it is an expensive and approximate measure. I'm not saying that is not the right direction, I'm arguing that a hint is better than nothing. Right now the only way to know if it is out of workers is to query pg_stat_activity frequently. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
FYI I have given this patch a good test and can now no longer reproduce either the subtrans or the clog error. Thanks guys! On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs wrote: > On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs > wrote: > > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs > wrote: > > > >> Brewing a patch now. > > > > Latest thinking... confirmations or other error reports please. > > > > This fixes both the subtrans and clog bugs in one patch. > > I'll be looking to commit that tomorrow afternoon as two separate > patches with appropriate credits. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] autovacuum workers warning
Excerpts from Euler Taveira de Oliveira's message of Tue Oct 25 16:56:12 -0300 2011: > Hi, > > Some time ago [1], I proposed printing a message every time there isn't > an autovacuum slot available and it asks for another one. It is not a complete > solution for autovacuum tuning but it would at least give us a hint that the > number of workers is insufficient to keep up with the current load. The > accurate number of slots needed would be the optimal solution but that > information is not free (it would have to check every table in the databases > available to get the approximate number of slots needed. Approximate because > some table could be finishing the operation). A new warning is better than > nothing. If we decide to improve this area in the future we should remove the > warning but right now it would be an excellent hint to tune autovacuum. Well, just increasing the number of workers would do nothing to solve the problem, because the more workers there are, the slower they work. The actual solution to the problem would be decreasing autovacuum_vacuum_cost_delay, and/or related knobs. I'm sure we need more data on autovacuum activities to properly tune autovac, but I'm not sure that "maxxing out max_workers" is one of them. Wasn't there some discussion recently on measuring the length of the work queue, or something like that? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] autovacuum workers warning
Euler Taveira de Oliveira writes:
> +		if (!can_launch)
> +			ereport(LOG,
> +					(errmsg("maximum number of autovacuum workers reached"),
> +					 errhint("Consider increasing autovacuum_max_workers (currently %d).",
> +							 autovacuum_max_workers)));

Isn't it normal for the launcher to max out the number of workers? A log message that's generated routinely in normal operation doesn't sound particularly helpful to me ... regards, tom lane
Re: [HACKERS] Updated version of pg_receivexlog
On Tue, Oct 25, 2011 at 12:37, Magnus Hagander wrote: > On Mon, Oct 24, 2011 at 14:40, Magnus Hagander wrote: >> On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas >> wrote: + /* + * Looks like an xlog file. Parse it's position. >>> >>> s/it's/its/ >>> + */ + if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3) + { + fprintf(stderr, _("%s: could not parse xlog filename \"%s\"\n"), + progname, dirent->d_name); + disconnect_and_exit(1); + } + log *= XLOG_SEG_SIZE; >>> >>> That multiplication by XLOG_SEG_SIZE could overflow, if logid is very high. >>> It seems completely unnecessary, anyway, >> >> How do you mean completely unnecessary? We'd have to change the points >> that use it to divide by XLOG_SEG_SIZE otherwise, no? That might be a >> way to get around the overflow, but I'm not sure that's what you mean? > > Talked to Heikki on IM about this one, turns out we were both wrong. > It's needed, but there was a bug hiding under it, due to (once again) > mixing up segments and offsets. Has been fixed now. > >>> In pg_basebackup, it would be a good sanity check to check that the systemid >>> returned by IDENTIFY_SYSTEM in the main connection and the WAL-streaming >>> connection match. Just to be sure that some connection pooler didn't hijack >>> one of the connections and point to a different server. And better check >>> timelineid too while you're at it. >> >> That's a good idea. Will fix. > > Added to the new version of the patch. > > >>> How does this interact with synchronous replication? If a base backup that >>> streams WAL is in progress, and you have synchronous_standby_names set to >>> '*', I believe the in-progress backup will count as a standby for that >>> purpose. That might give a false sense of security. >> >> Ah yes. Did not think of that. Yes, it will have this problem. > > Actually, thinking more, per other mail, it won't. Because it will > never report that the data is synced to disk, so it will not be > considered for sync standby. 
> > This is something we might consider in the future (it could be a > reasonable scenario where you had this), but not in the first version. > > Updated version of the patch attached. I've applied this version with a few more minor changes that Heikki found. His comment about .partial files still applies, and I intend to address this in a follow-up commit, along with some further documentation enhancements. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
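The filename parsing Heikki commented on can be sketched as follows. This is an illustrative Python rendering, not the actual pg_receivexlog code: WAL segment names are 24 hex digits (timeline, "log" id, segment number), and because tli/log/seg are 32-bit unsigned integers in the C code, the `log *= XLOG_SEG_SIZE` step can overflow for large log ids, which is the hazard Heikki raised. Python integers don't wrap, so the uint32 behavior has to be simulated explicitly.

```python
# Sketch only -- mirrors sscanf(name, "%08X%08X%08X", &tli, &log, &seg).
XLOG_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segments

def parse_xlog_name(name):
    if len(name) != 24:
        raise ValueError("could not parse xlog filename %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)

tli, log, seg = parse_xlog_name("000000010000000200000003")
assert (tli, log, seg) == (1, 2, 3)

# Simulated uint32 arithmetic: for log ids >= 0x100, log * XLOG_SEG_SIZE
# wraps to zero in 32 bits -- the overflow being discussed.
assert (0x100 * XLOG_SEG_SIZE) & 0xFFFFFFFF == 0
```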
[HACKERS] Cannot
Suggested doc “patch”: grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g' Best, David
Re: [HACKERS] Range Types - typo + NULL string constructor
Robert Haas writes: > I believe that we're in trouble with XIDs as soon as you have two > active XIDs that are separated by a billion, because then you could > have a situation where some people think a given XID is in the future > and others think it's in the past. I have been wondering if we should > have some sort of active guard against that scenario; I don't think we > do at present. Sure we do. It's covered by the XID wraparound prevention logic. regards, tom lane
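The future-versus-past flip Robert describes comes from the circular XID comparison. The sketch below models TransactionIdPrecedes-style logic (the XID difference reinterpreted as a signed 32-bit number; the real function also special-cases permanent XIDs below FirstNormalTransactionId, which is omitted here): once two XIDs are more than 2^31 apart, the later-assigned XID starts comparing as being in the past, which is exactly what the wraparound-prevention logic exists to rule out.

```python
# Model of circular XID comparison: diff interpreted as signed int32.
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 0x80000000:        # reinterpret high half as negative
        diff -= 0x100000000
    return diff < 0

assert xid_precedes(100, 200)            # normal ordering
assert xid_precedes(0xFFFFFFF0, 10)      # across wraparound: old precedes new

# Two XIDs ~2.2 billion apart (more than 2^31): the comparison flips,
# and the later XID b now looks like it is in a's past.
a, b = 1000, 1000 + 2_200_000_000
assert xid_precedes(b, a) and not xid_precedes(a, b)
```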
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Magnus Hagander writes: > On Wed, Oct 26, 2011 at 16:37, Tom Lane wrote: >> If that was what he meant, I'd vote against it. There are way too many >> people who will *not* want their databases configured to be able to >> reach out onto the net. This feature should be something that has to be >> installed by explicit user action. > That is not what I meant. > I meant installing the shared library by default, but still requiring > CREATE EXTENSION. Whether the shlib is installed by default is a decision for packagers to make, not us. At best we could make a recommendation. regards, tom lane
[HACKERS] autovacuum workers warning
Hi, Some time ago [1], I proposed printing a message every time there isn't an autovacuum slot available and it asks for another one. It is not a complete solution for autovacuum tuning but it would at least give us a hint that the number of workers is insufficient to keep up with the current load. The accurate number of slots needed would be the optimal solution but that information is not free (it would have to check every table in the databases available to get the approximate number of slots needed. Approximate because some table could be finishing the operation). A new warning is better than nothing. If we decide to improve this area in the future we should remove the warning but right now it would be an excellent hint to tune autovacuum.

[1] http://archives.postgresql.org/pgsql-hackers/2011-06/msg00678.php

-- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3b71232..4ec0f87 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -656,6 +656,12 @@ AutoVacLauncherMain(int argc, char *argv[])
 
 		can_launch = (AutoVacuumShmem->av_freeWorkers != NULL);
 
+		if (!can_launch)
+			ereport(LOG,
+					(errmsg("maximum number of autovacuum workers reached"),
+					 errhint("Consider increasing autovacuum_max_workers (currently %d).",
+							 autovacuum_max_workers)));
+
 		if (AutoVacuumShmem->av_startingWorker != NULL)
 		{
 			int			waittime;
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Wed, Oct 26, 2011 at 19:25, Andrew Dunstan wrote: > > On 10/26/2011 12:47 PM, Magnus Hagander wrote: >>> >>> If that was what he meant, I'd vote against it. There are way too many >>> people who will *not* want their databases configured to be able to >>> reach out onto the net. This feature should be something that has to be >>> installed by explicit user action. >> >> That is not what I meant. >> >> I meant installing the shared library by default, but still requiring >> CREATE EXTENSION. >> > I don't see why it should be different from other standard modules, such as > citext or hstore, both of which have pretty wide use, and less possible > security implications than this. As I stated earlier, it's really back to the old discussion of splitting up contrib. This would be the "additional module" part, but not the "example of how to do things" part of that... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Range Types - typo + NULL string constructor
Heikki Linnakangas writes: > That's not what Jeff is referring to here, though (correct me if I'm > wrong). He's talking about the one-item cache in > TransactionIdLogFetch(). You don't need long-running transactions > for that to get confused. Specifically, this could happen: > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; > The row has xmin = 123456, and it is cached as committed in the > one-item cache by TransactionLogFetch. > 2. A lot of time passes. Everything is frozen, and XID wrap-around > happens. (Session A is idle but not in a transaction, so it doesn't > inhibit freezing.) > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK; > By coincidence, this transaction was assigned XID 123456. > 4. In session A: SELECT * FROM foo WHERE id = 2; > The one-item cache still says that 123456 committed, so we return > the tuple inserted by the aborted transaction. Oops. I think this is probably a red herring, because it's impossible for a session to remain totally idle for that long --- see sinval updating. (If you wanted to be really sure, we could have sinval reset clear the TransactionLogFetch cache, but I doubt it's worth the trouble.) regards, tom lane
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On 10/26/2011 12:47 PM, Magnus Hagander wrote: If that was what he meant, I'd vote against it. There are way too many people who will *not* want their databases configured to be able to reach out onto the net. This feature should be something that has to be installed by explicit user action. That is not what I meant. I meant installing the shared library by default, but still requiring CREATE EXTENSION. I don't see why it should be different from other standard modules, such as citext or hstore, both of which have pretty wide use, and less possible security implications than this. cheers andrew
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs wrote: > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs wrote: > >> Brewing a patch now. > > Latest thinking... confirmations or other error reports please. > > This fixes both the subtrans and clog bugs in one patch. I'll be looking to commit that tomorrow afternoon as two separate patches with appropriate credits. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Range Types - typo + NULL string constructor
On Wed, 2011-10-26 at 12:19 -0400, Robert Haas wrote: > > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; > > The row has xmin = 123456, and it is cached as committed in the one-item > > cache by TransactionLogFetch. > > 2. A lot of time passes. Everything is frozen, and XID wrap-around happens. > > (Session A is idle but not in a transaction, so it doesn't inhibit > > freezing.) > > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK; > > By coincidence, this transaction was assigned XID 123456. > > 4. In session A: SELECT * FROM foo WHERE id = 2; > > The one-item cache still says that 123456 committed, so we return the > > tuple inserted by the aborted transaction. Oops. Yes, that's the scenario I was talking about. > Oh, hmm. That sucks. It didn't seem like a major concern a while ago: http://archives.postgresql.org/message-id/28107.1291264...@sss.pgh.pa.us Regards, Jeff Davis
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Wed, Oct 26, 2011 at 16:37, Tom Lane wrote: > Shigeru Hanada writes: >> (2011/10/25 19:15), Magnus Hagander wrote: >>> I have not looked at the code itself, but I wonder if we shouldn't >>> consider making this a part of core-proper, not just a contrib module. >>> The fact that it isn't *already* available in core surprises a lot of >>> people... > >> Do you mean that pgsql_fdw should be a built-in extension like plpgsql >> so that it's available just after initdb? > > If that was what he meant, I'd vote against it. There are way too many > people who will *not* want their databases configured to be able to > reach out onto the net. This feature should be something that has to be > installed by explicit user action. That is not what I meant. I meant installing the shared library by default, but still requiring CREATE EXTENSION. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Range Types - typo + NULL string constructor
Excerpts from Robert Haas's message of Wed Oct 26 13:19:47 -0300 2011: > On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas > wrote: > > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; > > The row has xmin = 123456, and it is cached as committed in the one-item > > cache by TransactionLogFetch. > > 2. A lot of time passes. Everything is frozen, and XID wrap-around happens. > > (Session A is idle but not in a transaction, so it doesn't inhibit > > freezing.) > > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK; > > By coincidence, this transaction was assigned XID 123456. > > 4. In session A: SELECT * FROM foo WHERE id = 2; > > The one-item cache still says that 123456 committed, so we return the > > tuple inserted by the aborted transaction. Oops. > > Oh, hmm. That sucks. Can this be solved by simple application of the Xid epoch? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Range Types - typo + NULL string constructor
On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas wrote: > On 26.10.2011 18:42, Robert Haas wrote: >> >> On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis wrote: >>> >>> Aren't there a few other cases like this floating around the code? I >>> know the single-xid cache is potentially vulnerable to xid wraparound >>> for the same reason. >> >> I believe that we're in trouble with XIDs as soon as you have two >> active XIDs that are separated by a billion, ... > > That's not what Jeff is referring to here, though (correct me if I'm wrong). > He's talking about the one-item cache in TransactionIdLogFetch(). You don't > need long-running transactions for that to get confused. Specifically, > this could happen: > > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; > The row has xmin = 123456, and it is cached as committed in the one-item > cache by TransactionLogFetch. > 2. A lot of time passes. Everything is frozen, and XID wrap-around happens. > (Session A is idle but not in a transaction, so it doesn't inhibit > freezing.) > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK; > By coincidence, this transaction was assigned XID 123456. > 4. In session A: SELECT * FROM foo WHERE id = 2; > The one-item cache still says that 123456 committed, so we return the > tuple inserted by the aborted transaction. Oops. Oh, hmm. That sucks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
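Heikki's four-step scenario can be replayed with a toy model. This is purely illustrative (the dict-based "clog" and cache are hypothetical stand-ins for pg_clog and the one-item cache in TransactionLogFetch): the cached "committed" verdict for XID 123456 survives a wraparound, so when the same numeric XID is reassigned to a transaction that aborts, the stale cache entry reports the wrong status.

```python
clog = {}     # xid -> "committed"/"aborted": the authoritative transaction log
cache = {}    # session-local one-item cache, never invalidated in this model

def record(xid, status):
    clog[xid] = status

def fetch(xid):
    if xid in cache:
        return cache[xid]      # one-item cache hit -- stale after wraparound
    status = clog[xid]
    cache.clear()              # keep at most one entry, like the real cache
    cache[xid] = status
    return status

record(123456, "committed")
assert fetch(123456) == "committed"   # step 1: session A caches the verdict

# steps 2-3: XID wraparound; the same numeric xid is reused and aborts
record(123456, "aborted")

# step 4: session A consults its cache and gets the wrong answer
assert fetch(123456) == "committed"   # should be "aborted" -- the bug
```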
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs wrote: > Brewing a patch now. Latest thinking... confirmations or other error reports please. This fixes both the subtrans and clog bugs in one patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services oldestActiveXid_fixed.v2.patch
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug wrote: > On Oct26, 2011, at 15:57 , Florian Pflug wrote: >> As you said, the CLOG page corresponding to nextId >> *should* always be accessible at the start of recovery (Unless the whole file >> has been removed by VACUUM, that is). So we shouldn't need to extend CLOG. >> Yet the error suggests that the CLOG is, in fact, too short. What I said >> is that we shouldn't apply any fix (for the CLOG problem) before we >> understand >> the reason for that apparent contradiction. > > Ha! I think I've got a working theory. > > In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint > record, and then call CheckPointGuts() which does the actual writing and > fsyncing. > So far, that's fine. If a transaction ID is assigned before we compute the > checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() > will > make sure the new CLOG page goes to disk. > > But, if wal_level = hot_standby, we also call LogStandbySnapshot() in > CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be > fine too, except that LogStandbySnapshot() re-assigns the *current* value of > ShmemVariableCache->nextXid to the checkpoint's nextXid field. > > Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), > but > before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint > whose CLOG page hasn't necessarily made it to the disk yet. The longer > CheckPointGuts() > takes to finish its work the more likely it becomes (assuming that CLOG > writing > and syncing doesn't happen at the very end). This fits the OP's observation > of the > problem vanishing when pg_start_backup() does an immediate checkpoint. This is the correct explanation. I've just come back into Wifi range, so I was just writing to you with this explanation but your original point that nextxid must be wrong deserves credit. 
OTOH I was just waiting to find out what the reason for the physical read was, rather than guessing. Notice that the nextxid value isn't wrong; it's just not the correct value to use for starting the clog. As it turns out, the correct fix is actually just to skip StartupCLOG() until the end of recovery, because it does nothing useful when executed at that time. When I wrote the original code I remember thinking that StartupCLOG() is superfluous at that point. Brewing a patch now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct26, 2011, at 17:36 , Chris Redekop wrote: >> And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hot standby, recovery ran, properly extended the clog, and then ran as a new master fine. > Yes, this is correct... attempting to start as a hot standby will produce the pg_clog error repeatedly, and then, without changing anything else, just turning hot standby off, it will start up successfully. Yup, because with hot standby disabled (on the client side), StartupCLOG() happens after recovery has completed. That, at the very least, makes the problem very unlikely to occur in the non-hot-standby case. I'm not sure it's completely impossible, though. Per my theory about the cause of the problem in my other mail, I think you might see StartupCLOG failures even during crash recovery, provided that wal_level was set to hot_standby when the primary crashed. Here's how:

1) We start a checkpoint, and get as far as LogStandbySnapshot().
2) A backend does AssignTransactionId(), and gets as far as GetNewTransactionId(). The assigned XID requires a CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG extension makes it to disk, and before any trace of the XID assigned in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up with a nextXid whose corresponding CLOG page doesn't exist. >> This fits the OP's observation of the problem vanishing when pg_start_backup() does an immediate checkpoint. > Note that this is *not* the behaviour I'm seeing. It's possible it happens more frequently without the immediate checkpoint, but I am seeing it happen even with the immediate checkpoint. 
Yeah, I should have said "of the problem's likelihood decreasing" instead of "vanishing". The point is, the longer the checkpoint takes, the higher the chance that nextXid is advanced far enough to require a CLOG extension. That alone isn't enough to trigger the error - the CLOG extension must also *not* make it to disk before the checkpoint completes - but it's a required precondition for the error to occur. best regards, Florian Pflug
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug wrote: > On Oct26, 2011, at 15:12 , Simon Riggs wrote: >> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk wrote: >>> The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. >> You don't actually know that, though I agree it seems a reasonable guess and was my first thought also. > The actual error message also supports that theory. Here's the relevant snippet from the OP's log (found in ca9fd2fe.1d8d2%linas.virba...@continuent.com):
>
> 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
> 2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at offset 32768: Success.
>
> Note that it says "Success" at the end of the second log entry. That can only happen, I think, if we're trying to read the page adjacent to the last page in the file. The seek would be successful, and the subsequent read() would indicate EOF by returning zero bytes. None of the calls would set errno. If there was a real IO error, read() would set errno, and if the page wasn't adjacent to the last page in the file, seek() would set errno. In both cases we'd see the corresponding error message, not "Success".

And even more pointedly, in the original go-around on this: http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056 he reported that clog/ after the pg_start_backup call contained:

-rw------- 1 postgres postgres 8192 Sep 23 14:31

which changed during the rsync phase to this:

-rw------- 1 postgres postgres 16384 Sep 23 14:33

But on the slave, of course, it was copied before it was extended, so it was the original size (that's ok, that's the point of recovery after the backup):

-rw------- 1 postgres postgres 8192 Sep 23 14:31

With the error:

2011-09-23 14:33:46 CEST FATAL: could not access status of transaction 37206
2011-09-23 14:33:46 CEST DETAIL: Could not read from file "pg_clog/" at offset 8192: Success. 
And that error happens *before* recovery can even get attempted. And that if he copied the "recent" clog/ from the master, it did start up. And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hot standby, recovery ran, properly extended the clog, and then ran as a new master fine. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
2011/10/26 Robert Haas : > 2011/10/26 Shigeru Hanada : >> (2011/10/25 19:15), Magnus Hagander wrote: >>> 2011/10/25 Shigeru Hanada: >>>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. >>> I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... >> Do you mean that pgsql_fdw should be a built-in extension like plpgsql so that it's available just after initdb? It would be accomplished with some more changes: >> * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and install a dynamically loadable module during "make install" for core. The pgsql_fdw_handler function can't be included into the core binary because we must avoid linking libpq with the server binary directly. This method is also used for libwalreceiver of the replication module. >> * Create the pgsql_fdw extension during initdb invocation, like plpgsql. >> These are not trivial, but not so difficult. However, I think contrib would be the appropriate place for pgsql_fdw because it's a (relatively) special feature. > I agree. pgsql_fdw will be a nice feature, but there's no reason to think that everyone will want it installed by default, and there are some security reasons to think that they might not. On the flip side, pushing it out of contrib and onto pgfoundry or whatever makes it unnecessarily difficult to install, and not as many people will benefit from it. So contrib seems exactly right to me. I also agree. pgsql_fdw will be worth having in the main tree as a contrib module. It will give us a clear opportunity to test new FDW features that use RDBMS characteristics, such as join push-down. 
However, it should be a separate discussion whether it should be installed by default. Thanks, -- KaiGai Kohei
Re: [HACKERS] Range Types - typo + NULL string constructor
On 26.10.2011 18:42, Robert Haas wrote: > On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis wrote: >> Aren't there a few other cases like this floating around the code? I know the single-xid cache is potentially vulnerable to xid wraparound for the same reason. > I believe that we're in trouble with XIDs as soon as you have two active XIDs that are separated by a billion, ... That's not what Jeff is referring to here, though (correct me if I'm wrong). He's talking about the one-item cache in TransactionLogFetch(). You don't need long-running transactions for that to get confused. Specifically, this could happen:

1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT; The row has xmin = 123456, and it is cached as committed in the one-item cache by TransactionLogFetch.
2. A lot of time passes. Everything is frozen, and XID wrap-around happens. (Session A is idle but not in a transaction, so it doesn't inhibit freezing.)
3. In session B: BEGIN; INSERT INTO foo (id) VALUES (2); ROLLBACK; By coincidence, this transaction was assigned XID 123456.
4. In session A: SELECT * FROM foo WHERE id = 2; The one-item cache still says that 123456 committed, so we return the tuple inserted by the aborted transaction. Oops.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
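Heikki's four-step scenario is easy to reproduce with a toy model of the one-item cache. This is an illustrative Python sketch, not PostgreSQL's actual code; the class and method names are invented:

```python
# A toy version (not PostgreSQL's real implementation) of the one-item
# cache in TransactionLogFetch(), showing how XID reuse after wraparound
# can return a stale commit status.

COMMITTED, ABORTED = "committed", "aborted"

class TransactionLog:
    def __init__(self):
        self.clog = {}            # authoritative per-XID status
        self.cached_xid = None    # one-item cache: last XID looked up
        self.cached_status = None

    def set_status(self, xid, status):
        self.clog[xid] = status

    def fetch(self, xid):
        if xid == self.cached_xid:    # cache hit: the CLOG is not consulted
            return self.cached_status
        status = self.clog[xid]
        self.cached_xid, self.cached_status = xid, status
        return status

log = TransactionLog()
log.set_status(123456, COMMITTED)
assert log.fetch(123456) == COMMITTED   # session A reads a row with xmin = 123456

# ... XID wrap-around; the recycled XID 123456 is assigned to a new
# transaction, which rolls back:
log.set_status(123456, ABORTED)

print(log.fetch(123456))  # "committed" -- stale cache hit, aborted row visible
```

The stale hit in the last line is exactly step 4 above: the cache is never told that the XID now names a different transaction.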
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Stephen Frost writes: > I'm in favor of making that distinction. I would still have pgsql_fdw, file_fdw, etc, be packaged more-or-less the same way and still use the CREATE EXTENSION framework, of course. We called that idea “core extension” at the latest hackers meeting, and Greg Smith had a patch with a first selection of extensions to package this way. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct26, 2011, at 15:57 , Florian Pflug wrote: > As you said, the CLOG page corresponding to nextId *should* always be accessible at the start of recovery (unless the whole file has been removed by VACUUM, that is). So we shouldn't need to extend CLOG. Yet the error suggests that the CLOG is, in fact, too short. What I said is that we shouldn't apply any fix (for the CLOG problem) before we understand the reason for that apparent contradiction. Ha! I think I've got a working theory. In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint record, and then call CheckPointGuts() which does the actual writing and fsyncing. So far, that's fine. If a transaction ID is assigned before we compute the checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() will make sure the new CLOG page goes to disk. But, if wal_level = hot_standby, we also call LogStandbySnapshot() in CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be fine too, except that LogStandbySnapshot() re-assigns the *current* value of ShmemVariableCache->nextXid to the checkpoint's nextXid field. Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint whose CLOG page hasn't necessarily made it to disk yet. The longer CheckPointGuts() takes to finish its work, the more likely this becomes (assuming that CLOG writing and syncing doesn't happen at the very end). This fits the OP's observation of the problem vanishing when pg_start_backup() does an immediate checkpoint. I dunno how to fix this, though, since I don't really understand why LogStandbySnapshot() needs to modify the checkpoint's nextXid. Simon, is there some documentation on what assumptions the hot standby code makes about the various XID fields included in a checkpoint? 
best regards, Florian Pflug
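For what it's worth, the window Florian describes can be sketched as a toy simulation. This is illustrative Python, not PostgreSQL code; the dictionaries and function names are invented, with CLOG_XACTS_PER_PAGE matching pg_clog's 8 kB pages at 2 status bits per transaction:

```python
# Sketch of the ordering inside CreateCheckPoint() with wal_level =
# hot_standby: CheckPointGuts() flushes the CLOG pages that exist at that
# moment, but LogStandbySnapshot() later re-reads nextXid into the
# checkpoint record. If the CLOG was extended in between, the recorded
# nextXid can point into a page that was never fsync'd.

CLOG_XACTS_PER_PAGE = 32768  # 8 kB page, 2 bits per transaction

def clog_page(xid):
    return xid // CLOG_XACTS_PER_PAGE

def create_checkpoint(shmem, disk, hot_standby=True):
    """Simulate CreateCheckPoint(): flush, then (optionally) re-read nextXid."""
    ckpt_next_xid = shmem["next_xid"]                  # nextXid captured early
    disk["synced_pages"] = set(shmem["clog_pages"])    # CheckPointGuts()
    # Concurrent activity between CheckPointGuts() and LogStandbySnapshot():
    shmem["next_xid"] += CLOG_XACTS_PER_PAGE           # XID use crosses a page
    shmem["clog_pages"].add(clog_page(shmem["next_xid"]))  # ExtendCLOG (dirty only)
    if hot_standby:
        ckpt_next_xid = shmem["next_xid"]              # LogStandbySnapshot() overwrites it
    return ckpt_next_xid

shmem = {"next_xid": 22811359, "clog_pages": {clog_page(22811359)}}
disk = {"synced_pages": set()}
ckpt_next_xid = create_checkpoint(shmem, disk)

# After a crash, StartupCLOG() would try to read this XID's page:
print(clog_page(ckpt_next_xid) in disk["synced_pages"])  # False: page never hit disk
```

With `hot_standby=False` the early capture of nextXid is kept and its page is always among the flushed ones, which mirrors why the problem is tied to wal_level = hot_standby.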
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
2011/10/26 Shigeru Hanada : > (2011/10/25 19:15), Magnus Hagander wrote: >> 2011/10/25 Shigeru Hanada: >>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. >> I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... > Do you mean that pgsql_fdw should be a built-in extension like plpgsql so that it's available just after initdb? It would be accomplished with some more changes: > > * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and install a dynamically loadable module during "make install" for core. The pgsql_fdw_handler function can't be included into the core binary because we must avoid linking libpq with the server binary directly. This method is also used for libwalreceiver of the replication module. > * Create the pgsql_fdw extension during initdb invocation, like plpgsql. > > These are not trivial, but not so difficult. However, I think contrib would be the appropriate place for pgsql_fdw because it's a (relatively) special feature. I agree. pgsql_fdw will be a nice feature, but there's no reason to think that everyone will want it installed by default, and there are some security reasons to think that they might not. On the flip side, pushing it out of contrib and onto pgfoundry or whatever makes it unnecessarily difficult to install, and not as many people will benefit from it. So contrib seems exactly right to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] cache lookup failed in plpgsql - reason?
Hello, one of my customers reported a random issue. He uses a procedure with the following fragment:

IF NOT EXISTS( SELECT relname FROM pg_class WHERE relname = 'tmp_object_state_change' AND relkind = 'r' AND pg_table_is_visible(oid) ) THEN
  CREATE TEMPORARY TABLE tmp_object_state_change ( object_id INTEGER, object_hid INTEGER, new_states INTEGER[], old_states INTEGER[] );
ELSE
  TRUNCATE tmp_object_state_change;
END IF;

These lines sometimes raise an error:

Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-1] 2011-10-25 20:13:44 CEST adifd 29970 ERROR: cache lookup failed for relation 319883311
Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-2] 2011-10-25 20:13:44 CEST adifd 29970 CONTEXT: SQL statement "SELECT NOT EXISTS( SELECT relname FROM pg_class WHERE relname =
Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-3] 'tmp_object_state_change' AND relkind = 'r' AND pg_table_is_visible(oid) )"
Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-4] PL/pgSQL function "update_object_states" line 2 at IF
Oct 25 20:13:44 db-s-01 postgres: -- postgres[29970]: [3-5] 2011-10-25 20:13:44 CEST adifd 29970 STATEMENT: SELECT update_object_states($1::integer)

I don't see a reason why the cache should be broken on this query. He uses Pg 8.3.15. Any idea? Regards, Pavel Stehule
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hot standby, recovery ran, properly extended the clog, and then ran as a new master fine. Yes, this is correct... attempting to start as a hot standby will produce the pg_clog error repeatedly, and then, without changing anything else, just turning hot standby off, it will start up successfully. > This fits the OP's observation of the problem vanishing when pg_start_backup() does an immediate checkpoint. Note that this is *not* the behaviour I'm seeing. It's possible it happens more frequently without the immediate checkpoint, but I am seeing it happen even with the immediate checkpoint. > This is a different problem and has already been reported by one of your colleagues in a separate thread, and answered in detail by me there. There is no bug related to this error message. Excellent... I will continue this discussion in that thread.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct26, 2011, at 15:12 , Simon Riggs wrote: > On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk wrote: >> The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. > You don't actually know that, though I agree it seems a reasonable guess and was my first thought also. The actual error message also supports that theory. Here's the relevant snippet from the OP's log (found in ca9fd2fe.1d8d2%linas.virba...@continuent.com):

2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at offset 32768: Success.

Note that it says "Success" at the end of the second log entry. That can only happen, I think, if we're trying to read the page adjacent to the last page in the file. The seek would be successful, and the subsequent read() would indicate EOF by returning zero bytes. None of the calls would set errno. If there was a real IO error, read() would set errno, and if the page wasn't adjacent to the last page in the file, seek() would set errno. In both cases we'd see the corresponding error message, not "Success". > The error is very specifically referring to 22811359, which is the nextxid from pg_control and updated by checkpoint. Where does that XID come from? The only reference to that XID that I can find in the archives is in your message CA+U5nMKUUoA8kRG=itfso5nzue3x_kdjz78eaun3_fkmq-u...@mail.gmail.com > 22811359 is mid-way through a clog page, so prior xids will already have been allocated, pages extended and then those pages fsynced before the end of pg_start_backup(). So it shouldn't be possible for that page to be absent from the base backup, unless the base backup was taken without a preceding checkpoint, which it seems is not the case from the script output. Or unless the nextId we store in the checkpoint is for some reason higher than it should be. 
Or unless nextId somehow gets mangled during recovery. Or unless there's some interaction between VACUUM and CHECKPOINTs that we're overlooking... > Note that if you are correct, then the solution is to extend clog, which Florian disagrees with as a solution. That's not what I said. As you said, the CLOG page corresponding to nextId *should* always be accessible at the start of recovery (unless the whole file has been removed by VACUUM, that is). So we shouldn't need to extend CLOG. Yet the error suggests that the CLOG is, in fact, too short. What I said is that we shouldn't apply any fix (for the CLOG problem) before we understand the reason for that apparent contradiction. Doing so anyway just to get rid of the error seems dangerous. What happens, for example, to the CLOG state of transactions earlier than the checkpoint's nextId? Their COMMIT records may very well lie before the checkpoint's REDO pointer, so the CLOG we copied had better contain their correct state. Yet if it does, then why isn't nextId's CLOG page accessible? best regards, Florian Pflug
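The ": Success" detail in both reports follows from POSIX read() semantics: reading at or past end-of-file is not an error at the OS level, it simply returns 0 bytes and leaves errno alone, so strerror(errno) still reports success after the short read. A small self-contained demonstration using a scratch file (Python; assumes a POSIX-style system where os.pread is available):

```python
# Demonstrate that a read at the offset just past EOF returns zero bytes
# rather than failing -- the situation in the report, where nextXid's CLOG
# page lies just beyond the end of the file copied at backup start.

import os
import tempfile

SLRU_PAGESIZE = 8192

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * SLRU_PAGESIZE)   # a CLOG-like file with one 8 kB page
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Read the page *after* the last one in the file:
    data = os.pread(fd, SLRU_PAGESIZE, SLRU_PAGESIZE)
    print(len(data))   # 0 -- EOF, not an I/O error; errno is left untouched
finally:
    os.close(fd)
    os.unlink(path)
```

A real I/O error would instead raise (set errno at the C level), which is why "Success" in the message pinpoints a short read at EOF rather than hardware trouble.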
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Shigeru Hanada writes: > (2011/10/25 19:15), Magnus Hagander wrote: >> I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... > Do you mean that pgsql_fdw should be a built-in extension like plpgsql so that it's available just after initdb? If that was what he meant, I'd vote against it. There are way too many people who will *not* want their databases configured to be able to reach out onto the net. This feature should be something that has to be installed by explicit user action. regards, tom lane
Re: [HACKERS] Range Types - typo + NULL string constructor
On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis wrote: > On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote: >> Hmm, I don't think that's safe. After Oid wraparound, a range type oid might get reused for some other range type, and the cache would return stale values. Extremely unlikely to happen by accident, but could be exploited by an attacker. > Any ideas on how to remedy that? I don't have another plan for making it perform well. Plugging it into the cache invalidation mechanism seems like overkill, but I suppose that would solve the problem. > Aren't there a few other cases like this floating around the code? I know the single-xid cache is potentially vulnerable to xid wraparound for the same reason. I believe that we're in trouble with XIDs as soon as you have two active XIDs that are separated by a billion, because then you could have a situation where some people think a given XID is in the future and others think it's in the past. I have been wondering if we should have some sort of active guard against that scenario; I don't think we do at present. But OID wraparound is not the same as XID wraparound. It's far more common, I think, for a single transaction to use lots of OIDs than it is for it to use lots of XIDs (i.e. have many subtransactions). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
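The "in the future vs. in the past" ambiguity Robert mentions comes from XIDs being compared modulo 2^32. A simplified sketch of that circular comparison (in the spirit of PostgreSQL's TransactionIdPrecedes, but ignoring the special-cased permanent XIDs):

```python
# Circular XID comparison: the difference is interpreted as a signed
# 32-bit integer, so ordering is only well-defined while all live XIDs
# fit within half the XID space (2^31) of each other.

def xid_precedes(id1, id2):
    """True if id1 is 'older' than id2 under wraparound arithmetic."""
    diff = (id1 - id2) & 0xFFFFFFFF
    if diff >= 1 << 31:          # reinterpret as signed 32-bit
        diff -= 1 << 32
    return diff < 0

# Nearby XIDs order as expected:
print(xid_precedes(100, 200))                        # True

# Separated by ~1 billion: still consistent.
print(xid_precedes(100, 100 + 10**9))                # True

# Separated by more than 2^31: the ordering silently flips, so one
# backend can see an XID as "in the past" while another, comparing
# against a different reference XID, sees it as "in the future".
print(xid_precedes(100, (100 + 2**31 + 1) % 2**32))  # False
```

The hard limit on the flip is 2^31; Robert's "a billion" figure leaves headroom before live XIDs get anywhere near that boundary.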
Re: [HACKERS] TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified
On Tue, Oct 25, 2011 at 11:24 PM, Tom Lane wrote: > Even given your recent changes to reduce the overhead of checking for sinval messages, I'm not sure that it'd be practical to move the sinval message processing to just-before-we-look-up-a-cache-entry. Wait a minute: I'm confused. What's at issue here, at least AIUI, is taking a TOAST pointer and fetching the corresponding value. But that must involve reading from the TOAST table, and our usual paradigm for reading from system catalogs is (1) take AccessShareLock, (2) read the relevant rows, (3) release AccessShareLock. If we're doing that here, then AcceptInvalidationMessages() is already getting called. If we're not, this seems horribly unsafe; aside from the current bug, somebody could rewrite the table in the interim. > Right now, we do AcceptInvalidationMessages basically once per table per query (or maybe it's twice or so, but anyway a very small multiplier on that). If we try to do it every time through SearchSysCache, we are probably talking two to three orders of magnitude more checks, which ISTM is certain to push the sinval queue back up to the top of the heap for contention. I think we've pretty much got the *contention* licked, barring an increase in the number of things that generate sinval messages, but calling it too often will still be slow, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk wrote: > The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. You don't actually know that, though I agree it seems a reasonable guess and was my first thought also. The error is very specifically referring to 22811359, which is the nextxid from pg_control and updated by checkpoint. 22811359 is mid-way through a clog page, so prior xids will already have been allocated, pages extended and then those pages fsynced before the end of pg_start_backup(). So it shouldn't be possible for that page to be absent from the base backup, unless the base backup was taken without a preceding checkpoint, which it seems is not the case from the script output. Note that if you are correct, then the solution is to extend clog, which Florian disagrees with as a solution. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 7:43 AM, Simon Riggs wrote: >> It's very likely that it's a PostgreSQL problem, though. It's probably not a pilot error since it happens even for backups taken with pg_basebackup(), so the only explanation other than a PostgreSQL bug is broken hardware or a pretty serious kernel/filesystem bug. > The way forwards here is for someone to show the clog file that causes the error and find out why the call to read() fails. Sorry, I thought the problem was obvious. Either that, or I've completely missed something in these threads... I'll admit to not following this one very closely anymore... When the backup started, the clog was small. So on the "recovering instance", the clog is small. PostgreSQL is supposed to be able to deal with any file as it was when the backup starts. When the backup is stopped, clog is big. But that file was copied after the backup was started, not after the backup finished. So its size is only guaranteed to be as big as it was when the backup started. Recovery is responsible for extending it as it was extended during the backup period on the master. The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Wed, Oct 26, 2011 at 12:16 PM, Florian Pflug wrote: >> Chris' clog error was caused by a file read error. The file was opened, we did a seek within the file and then the call to read() failed to return a complete page from the file. >> The xid shown is 22811359, which is the nextxid in the control file. >> So StartupCLOG() must have failed trying to read the clog page from disk. > Yep. >> That isn't a Hot Standby problem, a recovery problem, nor is it certain it's a PostgreSQL problem. > It's very likely that it's a PostgreSQL problem, though. It's probably not a pilot error since it happens even for backups taken with pg_basebackup(), so the only explanation other than a PostgreSQL bug is broken hardware or a pretty serious kernel/filesystem bug. The way forwards here is for someone to show the clog file that causes the error and find out why the call to read() fails. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] patch for distinguishing PG instances in event log v2
From: "Magnus Hagander" > 2011/7/16 MauMau : >> Hello, The attached file is a revised patch which reflects all review comments by Magnus in: http://archives.postgresql.org/pgsql-hackers/2011-07/msg00839.php > I have applied this patch after another round of rather extensive modifications. Thank you, Magnus. I'll look at the final code later. Regards, MauMau
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct25, 2011, at 13:39 , Florian Pflug wrote: > On Oct25, 2011, at 11:13 , Simon Riggs wrote: >> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs wrote: >>> We are starting recovery at the right place but we are initialising the clog and subtrans incorrectly. Precisely, the oldestActiveXid is being derived later than it should be, which can cause problems if this then means that whole pages are uninitialised in subtrans. The bug only shows up if you do enough transactions (2048 is always enough) to move to the next subtrans page between the redo pointer and the checkpoint record while at the same time we do not have a long running transaction that spans those two points. That's just enough to happen reasonably frequently on busy systems and yet just enough to have slipped through testing. >>> We must either >>> 1. During CreateCheckpoint() we should derive oldestActiveXid before we derive the redo location >> (1) looks the best way forwards in all cases. > Let me see if I understand this. > The problem seems to be that we currently derive oldestActiveXid at the end of the checkpoint, just before writing the checkpoint record. Since we use oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before that checkpoint record (but after the REDO location, of course) may very well contain XIDs earlier than that wrongly derived oldestActiveXID, and if we attempt to touch these XIDs' SUBTRANS state, we error out. > Your patch seems sensible, because the checkpoint "logically" occurs at the REDO location, not the checkpoint's location, so we ought to log an oldestActiveXID corresponding to that location. Thinking about this some more (and tracing through the code), I realized that things are a bit more complicated. What we actually need to ensure, I think, is that the XID we pass to StartupSUBTRANS() is earlier than any top-level XID in XLOG_XACT_ASSIGNMENT records. 
Which, at first glance, implies that we ought to use the nextId at the *beginning* of the checkpoint for SUBTRANS initialization. At second glance, however, that'd be wrong, because backends emit XLOG_XACT_ASSIGNMENT only every PGPROC_MAX_CACHED_SUBXIDS sub-XID assignments. Thus, an XLOG_XACT_ASSIGNMENT written *after* the checkpoint has started may contain sub-XIDs which were assigned *before* the checkpoint started. Using oldestActiveXID works around that because we guarantee that sub-XIDs are always larger than their parent XIDs, and because only active transactions can produce XLOG_XACT_ASSIGNMENT records. So your patch is fine, but I think the reasoning about why oldestActiveXID is the correct value for StartupSUBTRANS deserves an explanation somewhere. best regards, Florian Pflug
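Florian's invariant - every XID carried by an XLOG_XACT_ASSIGNMENT emitted after the checkpoint is at least the checkpoint's oldestActiveXid - can be spot-checked with a small randomized simulation. This is illustrative Python, not PostgreSQL code; it ignores XID wraparound and models only the pieces the argument relies on (sub-XIDs are assigned after, hence greater than, their parents, and only active transactions flush assignment records):

```python
# Randomized property check: records flushed after the "checkpoint" can
# carry sub-XIDs assigned before it, yet none of their XIDs can fall
# below oldestActiveXid as computed at checkpoint time.

import random

PGPROC_MAX_CACHED_SUBXIDS = 64

random.seed(0)
next_xid = 1000
active = {}           # top-level xid -> cached (not-yet-logged) sub-xids
records = []          # (after_ckpt, top_xid, subxids): "XLOG_XACT_ASSIGNMENT"s
checkpointed = False
oldest_active_at_ckpt = None

for step in range(5000):
    op = random.random()
    if op < 0.2 or not active:                # begin: assign a top-level XID
        active[next_xid] = []
        next_xid += 1
    elif op < 0.8:                            # assign a sub-XID to some xact
        top = random.choice(list(active))
        active[top].append(next_xid)
        next_xid += 1
        if len(active[top]) >= PGPROC_MAX_CACHED_SUBXIDS:
            records.append((checkpointed, top, active[top]))  # flush cache
            active[top] = []
    else:                                     # commit/abort some xact
        del active[random.choice(list(active))]
    if step == 2500:                          # "checkpoint" mid-run
        checkpointed = True
        oldest_active_at_ckpt = min(active) if active else next_xid

# Every assignment record emitted after the checkpoint respects the bound:
post = [r for r in records if r[0]]
print(all(min([top] + subs) >= oldest_active_at_ckpt
          for _, top, subs in post))
```

The check holds for the reason Florian gives: a record flushed after the checkpoint belongs to a transaction still active then, so its top-level XID is at least the minimum active XID at checkpoint time, and its sub-XIDs are larger still.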
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct25, 2011, at 14:51 , Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug wrote:
>
>> What I don't understand is how this affects the CLOG. How does
>> oldestActiveXID factor into CLOG initialization?
>
> It is an entirely different error.

Ah, OK. I assumed that you believed the wrong oldestActiveXID computation solved both the SUBTRANS-related *and* the CLOG-related errors, since you said "We are starting recovery at the right place but we are initialising the clog and subtrans incorrectly" at the start of the mail.

> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextXid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.

Yep.

> That isn't a Hot Standby problem, a recovery problem, nor is it certain
> it's a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably not pilot error, since it happens even for backups taken with pg_basebackup, so the only explanation other than a PostgreSQL bug is broken hardware or a pretty serious kernel/filesystem bug.

> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in the first place, I think it's a bad idea to silence them.

best regards,
Florian Pflug
Re: [HACKERS] TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified
On Wed, Oct 26, 2011 at 4:24 AM, Tom Lane wrote:
> Even given your recent changes to reduce the overhead of checking for
> sinval messages, I'm not sure that it'd be practical to move the sinval
> message processing to just-before-we-look-up-a-cache-entry. Right now,
> we do AcceptInvalidationMessages basically once per table per query
> (or maybe it's twice or so, but anyway a very small multiplier on that).
> If we try to do it every time through SearchSysCache, we are probably
> talking two to three orders of magnitude more checks, which ISTM is
> certain to push the sinval queue back up to the top of the heap for
> contention.

Initially I provided an implementation of eager locking, as Robert suggests, but the above is a great argument against doing it that way.

Incidentally, I'd like to focus on the causal reason for these problems. We have static type definitions, which allow us to completely avoid dynamic type management at runtime. That is a performance advantage for us in normal running, but it can also cause operational problems when we look to make changes.

> But in any case, this isn't the core of the problem. The real point
> here is that we need a guarantee that a syscache entry we're going to
> use is/was valid as of some suitable time point later than the start of
> our current transaction. (Once we have taken a snapshot, VACUUM will
> know that it can't remove any tuples that were deleted after the time of
> that snapshot; so even for SnapshotNow fetches, it's important to have
> an MVCC snapshot to protect toast-table dereferences.) Perhaps rather
> than tying the problem into SearchSysCache, we should attach the
> overhead to GetTransactionSnapshot, which is called appealingly few
> times per query. But right offhand it seems like that only protects us
> against the toast-tuple-deletion problem, not against the more general
> one of getting a stale view of the status of some relation.

Do we need to guarantee that?
It seems strange to me to use the words "cache" and "strict" in the same sentence. I'm sure there are many points in the code where strictness is required, though in many cases it should be acceptable to say that a cache miss is not a problem and so does not require strict guarantees.

Do we need a redesign? Or do we need a way to use relaxation/optimistic techniques? I'm aware that it could be a huge undertaking to examine all call points. If you have any ideas of where to investigate or parts of the problem to assist with, I'll be happy to work on this. This seems important to me.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
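One way to make the "relaxed/optimistic" idea concrete is a generation counter: tag each cache entry with a global invalidation counter at fill time, bump the counter when an invalidation arrives, and only pay for revalidation when the counters differ. This is a minimal sketch of that general technique, not PostgreSQL's syscache or sinval machinery; all names are invented.

```c
#include <stdbool.h>

/* Global invalidation generation; bumped on each invalidation event.
 * Comparing against it is far cheaper than reprocessing a message
 * queue on every cache lookup. */
static unsigned long inval_counter = 0;

typedef struct CachedRelation
{
    unsigned int  rel_oid;     /* what the entry describes */
    unsigned long filled_at;   /* inval_counter when the entry was built */
} CachedRelation;

static void
receive_invalidation(void)
{
    inval_counter++;
}

static void
fill_entry(CachedRelation *e, unsigned int rel_oid)
{
    e->rel_oid = rel_oid;
    e->filled_at = inval_counter;
}

/* Cheap optimistic check at lookup time: only when this returns false
 * does the caller need to do the expensive rebuild. */
static bool
entry_still_valid(const CachedRelation *e)
{
    return e->filled_at == inval_counter;
}
```

The trade-off is coarse granularity: one invalidation anywhere forces revalidation of every entry, which may or may not be acceptable for the call points Simon mentions.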
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Tue, Oct 25, 2011 at 10:06 PM, Chris Redekop wrote:
>> Chris, can you rearrange the backup so you copy the pg_control file as
>> the first act after the pg_start_backup?
> I tried this and it doesn't seem to make any difference.

It won't; that was a poor initial diagnosis on my part.

> I also tried the patch and I can no longer reproduce the subtrans error

Good.

> however instead it now starts up, but never gets to the point where
> it'll accept connections. It starts up but if I try to do anything I
> always get "FATAL: the database system is starting up"...even if the
> load is removed from the primary, the standby still never finishes
> "starting up".
...
> 2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG: consistent state delayed
> because recovery snapshot incomplete
...

This is a different problem, and it has already been reported by one of your colleagues in a separate thread and answered in detail by me there. There is no bug related to this error message.

From here, it looks like the published patch fixes the originally reported problem.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
(2011/10/25 19:15), Magnus Hagander wrote:
> 2011/10/25 Shigeru Hanada:
>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
>> contrib module. I think that this module would be the basis of further
>> SQL/MED development for core, e.g. join-push-down and ANALYZE support.
>
> I have not looked at the code itself, but I wonder if we shouldn't
> consider making this a part of core-proper, not just a contrib module.
> The fact that it isn't *already* available in core surprises a lot of
> people...

Do you mean that pgsql_fdw should be a built-in extension like plpgsql, so that it's available just after initdb? It would be accomplished with some more changes:

* Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and install the dynamically loadable module during "make install" for core. The pgsql_fdw_handler function can't be included in the core binary because we must avoid linking libpq with the server binary directly. This method is also used for the libwalreceiver replication module.
* Create the pgsql_fdw extension during initdb invocation, like plpgsql.

These are not trivial, but not so difficult either. However, I think contrib would be the appropriate place for pgsql_fdw because it's a (relatively) special feature.

--
Shigeru Hanada
Re: [HACKERS] Your review of pg_receivexlog/pg_basebackup
(CC'ing pgsql-hackers; this started as an IM discussion yesterday but really belongs in the archives)

On 25.10.2011 23:52, Magnus Hagander wrote:
> There's a tiny chance to get incomplete xlog files with pg_receivexlog
> if you crash:
>
> 1. pg_receivexlog finishes write()ing a file but the system crashes
> before fsync() finishes.
> 2. When pg_receivexlog restarts after the crash, the last WAL file was
> not fully flushed to disk, with holes in the middle, but it has the
> right length. pg_receivexlog will continue streaming from the next file.
>
> Not sure if we care about such a narrow window, but maybe we do.
>
> So how would we go about fixing that? Always unlinking the last file in
> the directory and trying from there would seem dangerous too - what if
> it's not available on the master anymore? Then we might have given up
> on data...

Start streaming from the beginning of the last segment, but don't unlink it first. Just overwrite it as you receive the data.

Or, always create a new xlog file as "0001000100D3.partial", and only when it's fully written, fsync it and then rename it to "0001000100D3". Then you know that if a file doesn't have the .partial suffix, it's complete. The fact that the last partial file always has the .partial suffix needs some extra pushups in the restore_command, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
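The write-to-".partial"-then-rename protocol Heikki describes can be sketched in portable C. This is an illustration of the technique, not the actual pg_receivexlog implementation; the function name and error handling are invented, and it relies on POSIX rename() being atomic within a filesystem.

```c
#include <stdio.h>
#include <unistd.h>

/* Write a segment under a ".partial" suffix, flush it to disk, then
 * rename it into place.  After a crash, any file bearing the final
 * name is guaranteed to be complete and durable; an interrupted write
 * leaves only a ".partial" file behind.  Returns 0 on success. */
static int
finish_segment(const char *final_name, const char *data, size_t len)
{
    char  tmp_name[1024];
    FILE *fp;

    snprintf(tmp_name, sizeof(tmp_name), "%s.partial", final_name);

    fp = fopen(tmp_name, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len)
    {
        fclose(fp);
        return -1;
    }

    /* Flush user-space and kernel buffers before renaming, so a crash
     * can never leave a fully-named segment with holes in it. */
    if (fflush(fp) != 0 || fsync(fileno(fp)) != 0)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Atomic on POSIX filesystems: readers see either no file under
     * the final name, or the complete one. */
    return rename(tmp_name, final_name);
}
```

As the mail notes, the remaining wrinkle is the consumer side: a restore_command has to know to look for the trailing ".partial" file when it wants the segment currently being streamed.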
Re: [HACKERS] Range Types - typo + NULL string constructor
On 25.10.2011 19:37, Jeff Davis wrote:
> On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
>> Hmm, I don't think that's safe. After Oid wraparound, a range type oid
>> might get reused for some other range type, and the cache would return
>> stale values. Extremely unlikely to happen by accident, but could be
>> exploited by an attacker.
>
> Any ideas on how to remedy that? I don't have another plan for making
> it perform well. Plugging it into the cache invalidation mechanism
> seems like overkill, but I suppose that would solve the problem.

I think we should look at the array functions for precedent. array_in et al cache the information in fn_extra, so that when it's called repeatedly in one statement for the same type, the information is only looked up once. That's good enough: it covers repeated execution in a single query, as well as COPY and comparison calls from index searches, for example.

> Aren't there a few other cases like this floating around the code?

Not that I know of. That said, I wouldn't be too surprised if there were.

> I know the single-xid cache is potentially vulnerable to xid wraparound
> for the same reason.

True.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
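The fn_extra pattern Heikki points to amounts to per-call-site memoization: the expensive type lookup runs once, and its result is stashed in a slot the executor keeps alive for the statement's lifetime. The sketch below shows the shape of that pattern in isolation; TypeCacheEntry, lookup_type_info, and the lookup counter are invented stand-ins, not the real array_in internals (which allocate in fn_mcxt rather than with malloc).

```c
#include <stdlib.h>

typedef struct TypeCacheEntry
{
    unsigned int type_oid;   /* type this entry describes */
} TypeCacheEntry;

/* Counts how often the expensive path actually runs. */
static int n_real_lookups = 0;

/* Stand-in for the expensive catalog lookup. */
static TypeCacheEntry *
lookup_type_info(unsigned int type_oid)
{
    TypeCacheEntry *e = malloc(sizeof(TypeCacheEntry));

    e->type_oid = type_oid;
    n_real_lookups++;
    return e;
}

/* On the first call through a given slot (think fn_extra), the slot is
 * NULL and we do the real lookup; later calls through the same slot in
 * the same statement reuse the cached entry.  Because the slot's
 * lifetime is bounded by the statement, there is no window for an Oid
 * to be dropped, wrap around, and be reused while the entry lives. */
static TypeCacheEntry *
get_cached_type_info(void **extra, unsigned int type_oid)
{
    if (*extra == NULL)
        *extra = lookup_type_info(type_oid);
    return (TypeCacheEntry *) *extra;
}
```

Scoping the cache to the call site is exactly what sidesteps the wraparound hazard of a long-lived oid-keyed cache discussed above.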