Re: [HACKERS] Patch to add regression tests for SCHEMA
Reviews are to be posted to pgsql-hackers, and then linked from the CommitFest site. Putting reviews only on the CommitFest site is bad practice. Indeed. Sorry, shame on me! I did not have the original mail in my mailbox because I deleted it, I did not want to create a new thread because that is /also/ bad practice, as I was recently told, and I was not motivated to fetch and reinstate the messages in my mailbox for a short one-liner review. Well, I'll do better next time. Anyway, all of Robins' test cases are basically a very good thing, especially as he tries corner cases, including checking for expected errors and permission denials. -- Fabien.
Re: [HACKERS] Patch to add regression tests for SCHEMA
Completely agree. The poster was kind enough to notify me by email about his update there, but had he not, I wouldn't have dreamt that there was feedback on the site, two months down the line. -- Robins Tharakan On 8 May 2013 09:13, Robert Haas wrote: > On Tue, May 7, 2013 at 7:26 PM, Robins Tharakan > wrote: > > Here is an updated patch that uses different schema / role names for > > different tests (as per commitfest site feedback). > > I'm not sure what's going on here. Reviews are to be posted to > pgsql-hackers, and then linked from the CommitFest site. Putting > reviews only on the CommitFest site is bad practice. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
[HACKERS] remove src/tools/make_keywords?
It doesn't look as though it is used or usable.
Re: [HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement
On Tue, May 7, 2013 at 12:48 PM, Jeff Janes wrote: > Anyway, I don't see this behavior change when turning on wal_debug and > looking in the logfiles for 'xlog flush request' messages. That could have everything to do with the hardware you're using. In general, the higher the cost of an fsync, the more useful it is to amortize that cost among concurrently committing transactions. -- Peter Geoghegan
Re: [HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement
> Are you sure you properly cleared out the stats between profiling sessions? > Also, XLogFlush gets called by background processes like autovac, > checkpointer and bgwriter, in addition to being called by committing > processes. If one profiled session contained a checkpoint and the other did > not, or one just idled a lot longer between when the benchmark finished and > when you shut down the server, perhaps that explains it. > > Anyway, I don't see this behavior change when turning on wal_debug and > looking in the logfiles for 'xlog flush request' messages. Yes, I did reset the stats between profiling sessions. And you are right that I probably did not account for the XLogFlush() calls made by other processes. Also, after doing a few more runs (using pgbench -j 4 -c 32 -T 60) and observing the results, I am getting convinced that there is no surprise in XLogFlush() being called more often: the patch enables backends to return from XLogFlush() more quickly in more cases than before, which in turn lets them proceed to further transactions, which again call XLogFlush(). So the increased number of XLogFlush() calls should be accounted for by the increase in throughput that we see; roughly speaking, if throughput doubles, the number of commit-time XLogFlush() calls should roughly double too. In my earlier mail it may have been wrong of me to conclude that the rise in throughput and the rise in the number of XLogFlush() invocations are not proportional. I think they are, though I could not measure it rigorously. Am I interpreting it correctly? -- Amit Langote
Re: [HACKERS] Patch to add regression tests for SCHEMA
On Tue, May 7, 2013 at 7:26 PM, Robins Tharakan wrote: > Here is an updated patch that uses different schema / role names for > different tests (as per commitfest site feedback). I'm not sure what's going on here. Reviews are to be posted to pgsql-hackers, and then linked from the CommitFest site. Putting reviews only on the CommitFest site is bad practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] XLogFlush invoked about twice as many times after 9.2 group commit enhancement
On Tue, May 7, 2013 at 7:52 PM, Amit Langote wrote: > So, any rise in number of XLogFlush() calls should roughly > be accounted for by increased throughput. Am I right in interpreting > it this way? I think so. There certainly isn't any question that the increased throughput and the increased number of XLogFlush() calls are because of the new group commit behavior. The cost of a WAL write + flush is more effectively amortized, and so XLogFlush() calls become cheaper. I'm not prepared to make any predictions as to exactly how they might relate. -- Peter Geoghegan
Re: [HACKERS] local_preload_libraries logspam
On Tue, May 7, 2013 at 7:01 PM, Tom Lane wrote: > On the other hand, if we have it as DEBUG2 in the EXEC_BACKEND code > path, I would be willing to argue that that's too low. If you're > starting to feel a need to inquire into the backend's behavior, knowing > about loaded modules seems like one of the first things you need to know > about. Hence, I'd vote for DEBUG1. +1 to DEBUG1. -- Peter Geoghegan
Re: [HACKERS] Make targets of doc links used by phpPgAdmin static
Peter Eisentraut wrote: > On Tue, 2013-05-07 at 00:32 -0500, Karl O. Pinc wrote: > > Attached is a documentation patch against head which makes > > static the targets of the on-line PG html documentation that > > are referenced by the phpPgAdmin help system. > > done I wonder about backpatching this to 9.2? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] XLogFlush invoked about twice as many times after 9.2 group commit enhancement
> Why is that surprising? Most of those XLogFlush() calls will recheck > the flushed-up-to point, and realize that another backend assumed the > role of group commit leader, and flushed their WAL for them, so aside > from the wait, the call to XLogFlush is cheap for that individual > backend. It's being invoked twice as many times because backends *can* > invoke it twice as many times. After going through it again, I am getting convinced it is not surprising: since backends are now able to return quickly from XLogFlush(), on average they should also be able to proceed to their next transactions faster, and hence cause XLogFlush() to be invoked more often. So any rise in the number of XLogFlush() calls should roughly be accounted for by the increased throughput. Am I right in interpreting it this way? -- Amit Langote
Re: [HACKERS] local_preload_libraries logspam
Peter Eisentraut writes: > On Tue, 2013-05-07 at 14:28 -0700, Peter Geoghegan wrote: >> It seems like an oversight to me that local_preload_libraries causes a >> new log message to appear each time a new connection is established. > It is correct in my view (but perhaps I have just gotten used to it), > but I wouldn't mind if you wanted to make an informed change. It seems reasonable to me to reduce it to DEBUG1 level. In most use-cases, that message will appear on every backend start, rendering its usefulness debatable. Peter's characterization as "log spam" might be excessive, but probably not by much. On the other hand, if we have it as DEBUG2 in the EXEC_BACKEND code path, I would be willing to argue that that's too low. If you're starting to feel a need to inquire into the backend's behavior, knowing about loaded modules seems like one of the first things you need to know about. Hence, I'd vote for DEBUG1. regards, tom lane
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
(2013/05/07 22:40), Heikki Linnakangas wrote: On 26.04.2013 11:51, KONDO Mitsumasa wrote: So I fixed CreateRestartPoint at the branching point of executing MinRecoveryPoint. It seems to fix this problem, but the attached patch is plain. I didn't understand this. I committed a fix for the issue where recycled WAL files get in the way of recovery, but I'm not sure if you're describing the same issue or something else. But before we dig any deeper into this, can you verify if you're still seeing any issues on 9.3beta1? Yes, my fix was very plain and fixed the wrong point. I have looked at your patch, and your fix is right. I posted about another problem, namely that promotion and PITR cannot proceed. It might be a different problem, but I could not reproduce it... Please wait for more! I will try to analyze and test on 9.2.4 with your patch and on 9.3beta1. Best regards, -- NTT Open Source Software Center Mitsumasa KONDO
Re: [HACKERS] [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint
Greg Stark writes: > If we just reverted your fix and didn't fix it in 9.2 that would also > fix the crash right? The bug was only that it leaked the fact that the > view was provably empty from the definition? Well, it might fail to report a permissions violation when the not-allowed-to-be-accessed relation could be proven to yield no rows. I agree that it's a bit hard to call that a security issue as long as you assume that the attacker has access to the system catalogs; and even if you don't assume that, being able to discern that there's a check constraint on some table doesn't seem like a big leakage. I had originally thought that the issue only occurred in 9.2, but it turns out that the appendrel form of the problem occurs at least as far back as 8.4; for example the following admittedly-artificial query

select * from ((select f1 as x from t1 offset 0)
union all (select f2 as x from t2 offset 0)) ss
where false;

will not throw an error in any current release, even if the caller lacks select privilege on t1 and/or t2. With manipulation of the outer WHERE clause, you could find out about the nature of any check constraints on t1/t2. It's easier to see the bug in 9.2 because you no longer need a UNION ALL, but that doesn't much change the argument about whether it's a security issue. Given that forms of the bug have been around for a long time without anyone noticing, it might be okay to leave it unfixed in the back branches. regards, tom lane
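[Editor's note: a hedged end-to-end sketch of the leak Tom describes, with role and table names of my own invention, not from the thread:]

create table t1 (f1 int check (f1 > 0));
create table t2 (f2 int);
create role snooper login;   -- deliberately granted nothing
set role snooper;
select * from ((select f1 as x from t1 offset 0)
union all (select f2 as x from t2 offset 0)) ss
where false;
-- on affected releases this returns zero rows instead of raising
-- "permission denied for relation t1"; varying the outer WHERE clause
-- then lets the caller probe t1's check constraints
reset role;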
Re: [HACKERS] Make targets of doc links used by phpPgAdmin static
On Tue, 2013-05-07 at 00:32 -0500, Karl O. Pinc wrote: > Attached is a documentation patch against head which makes > static the targets of the on-line PG html documentation that > are referenced by the phpPgAdmin help system. done
Re: [HACKERS] local_preload_libraries logspam
On Tue, 2013-05-07 at 14:28 -0700, Peter Geoghegan wrote: > It seems like an oversight to me that local_preload_libraries causes a > new log message to appear each time a new connection is established. It is correct in my view (but perhaps I have just gotten used to it), but I wouldn't mind if you wanted to make an informed change.
Re: [HACKERS] Patch to add regression tests for SCHEMA
Hi, Here is an updated patch that uses different schema / role names for different tests (as per commitfest site feedback). -- Robins Tharakan On 18 March 2013 17:16, robins wrote: > Hi, > > Attached is an updated patch that uses better schema / role names. > > -- > Robins Tharakan > > On 18 March 2013 12:59, robins wrote: > >> Thanks Alvaro. >> >> Since the tests were running fine, I didn't bother with elaborate names, >> but since you're concerned I guess I'll make that change and re-submit. >> >> And yes, I've already submitted (to Commitfest) another patch related to >> regression tests for SEQUENCE. >> Would submit the SCHEMA patch once the above change is done. >> -- >> Robins >> >> >> On 18 March 2013 09:42, Alvaro Herrera wrote: >> >>> robins wrote: >>> > Hi, >>> > >>> > Please find attached a patch to take 'make check' code-coverage of >>> SCHEMA >>> > from 33% to 98%. >>> > >>> > Any feedback is more than welcome. >>> >>> I think you should use more explicit names for shared objects such as >>> roles -- i.e. not "r1" but "regression_user_1" and so on. (But be >>> careful about role names used by other tests). The issue is that these >>> tests might be run in a database that contains other stuff; certainly we >>> don't want to drop or otherwise affect previously existing roles. >>> >>> > p.s.: I am currently working on more regression tests (USER / VIEW / >>> > DISCARD etc). Please let me know if I need to post these as one large >>> > patch, instead of submitting one patch at a time. >>> >>> I think separate patches is better. Are you adding these patches to the >>> upcoming commitfest, for evaluation? https://commitfest.postgresql.org >>> >>> -- >>> Álvaro Herrera http://www.2ndQuadrant.com/ >>> PostgreSQL Development, 24x7 Support, Training & Services >>> >> >> > regress_schema_v3.patch Description: Binary data
[HACKERS] Re: [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint
On Tue, May 7, 2013 at 6:59 PM, Tom Lane wrote: > The main downside of this > is that fully dummy appendrels (those with no live children at all, > such as in your example) wouldn't be recognized by IS_DUMMY_PATH, > so the quality of planning in the outer query would be slightly > degraded. But such cases are probably sufficiently unusual that this > might be an okay price to pay for a back branch That's kind of dismaying. ORMs have a tendency to create queries like this and people may have even written such queries by hand and tested them to determine that postgres was able to exclude the useless relation. To have them install a security update and discover that something they had previously tested no longer worked would be annoying. If we just reverted your fix and didn't fix it in 9.2 that would also fix the crash right? The bug was only that it leaked the fact that the view was provably empty from the definition? But it's been that way for a long time and after all:

xxx=> select * from pg_views where viewname = 'v';
 schemaname | viewname | viewowner |  definition
------------+----------+-----------+----------------
 public     | v        | stark     |  SELECT 1     +
            |          |           |   WHERE false;
(1 row)

-- greg
Re: [HACKERS] Add some regression tests for SEQUENCE
Hi, Have provided an updated patch as per Fabien's recent response on the Commitfest site. Any and all feedback is appreciated. -- Robins Tharakan regress_sequence_v4.patch Description: Binary data
Re: [HACKERS] about index inheritance
On Mon, May 6, 2013 at 9:30 AM, Vincenzo Melandri wrote: > Hi guys, > > My first post here :) > I stumbled into the same problem as this guy > http://www.postgresql.org/message-id/4be2835a.5020...@cybertec.at > , so since I have some spare time recently, I've set up the development > environment for postgresql and I think I may be able to contribute to the > feature of index inheritance, which is currently unsupported but listed in > TODOs. > > I've spent some time reading the docs and I took a look at the code. Is > anybody out there working on this already? I don't want to overlap someone > else's effort, plus I'll gladly take any advice or join the community efforts > if any, 'cause this feature seems pretty huge to me at first glance.. This is a really hard problem. If you pick this as your first project hacking on PostgreSQL, you will almost certainly fail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
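[Editor's note: for context, a minimal illustration of the limitation being discussed; the table names are mine:]

create table parent (id int primary key);
create table child () inherits (parent);
-- the parent's primary-key index is not inherited: it indexes and
-- enforces uniqueness only for rows physically stored in parent, so a
-- duplicate id can be inserted into child, and scans covering the whole
-- inheritance hierarchy cannot use it for child's rows.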
[HACKERS] local_preload_libraries logspam
It seems like an oversight to me that local_preload_libraries causes a new log message to appear each time a new connection is established. Is there any sympathy for the view that we should have a way of turning this off, or simply not log such messages? We could still have it appear at DEBUG2 level, as sometimes happens in the EXEC_BACKEND case (granted, this is just so that there is behavior equivalent to the !EXEC_BACKEND case for shared_preload_libraries). -- Peter Geoghegan
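[Editor's note: for readers unfamiliar with the setting, a hedged sketch of how it is typically applied; the module name is illustrative, and in this era the library must live under $libdir/plugins to be loadable this way:]

-- in postgresql.conf, or passed per connection at startup, e.g.
--   PGOPTIONS='-c local_preload_libraries=my_module' psql
-- local_preload_libraries = 'my_module'
-- every new connection then loads the module and, as discussed here,
-- logs a message about having done so.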
Re: [HACKERS] issues with dropped columns in plpgsql code again
On 7 May 2013 21:23, Pavel Stehule wrote:
> sorry
>
> my test
>
> create table f1(a int, b int, c varchar, dropped_column numeric, d varchar);
>
> create or replace function f1_trg()
> returns trigger as $$
> declare _f1_var f1;
> begin raise notice 'run trigger';
> _f1_var := new;
> return _f1_var;
> end;
> $$ language plpgsql;
>
> create trigger xx before insert on f1 for row execute procedure f1_trg();
>
> insert into f1 values(1,1,'aaa',1.1,'aaa');
> alter table f1 drop column dropped_column;
>
> insert into f1 values(1,1,'aaa','aaa');

Fails for me as well. I managed to run the last query only after restarting the session or disabling the trigger. Checked on PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit. regards, Szymon
Re: [HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement
On Tue, May 7, 2013 at 2:20 AM, Amit Langote wrote: > Hello, > > I have been trying to understand how the group commit implementation works > the way it does after the 9.2 group commit enhancement patch > (9b38d46d9f5517dab67dda1dd0459683fc9cda9f on REL9_2_STABLE). I admit > it's a pretty old commit, though I seek some clarification as to how it > provides the performance gain that it does. Also, I have observed some > behavior in this regard that I could not understand. > > Profiling results show that XLogFlush is called about twice as much > after this patch while the XLogWrite count remains about the same as > before. Are you sure you properly cleared out the stats between profiling sessions? Also, XLogFlush gets called by background processes like autovac, checkpointer and bgwriter, in addition to being called by committing processes. If one profiled session contained a checkpoint and the other did not, or one just idled a lot longer between when the benchmark finished and when you shut down the server, perhaps that explains it. Anyway, I don't see this behavior change when turning on wal_debug and looking in the logfiles for 'xlog flush request' messages. > I used "pgbench -c 32 -t 1000 pgbench" in both cases with the TPS result > (after applying the patch) not being significantly different (as in > not twice as much on my system). 1000 is a very small number of transactions to run a benchmark for. What was the duration? What were the actual TPS numbers? Does your hardware honor fsyncs? Cheers, Jeff
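[Editor's note: for anyone wanting to repeat Jeff's check, a hedged sketch; wal_debug is a developer option that exists only when the server is compiled with WAL_DEBUG defined, so this may not be available on a stock build, and setting it requires superuser:]

-- in a session on a WAL_DEBUG-enabled build:
set wal_debug = on;
-- run the workload, then count flush requests in the server log,
-- e.g. with: grep -c 'xlog flush request' postmaster.log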
Re: [HACKERS] issues with dropped columns in plpgsql code again
sorry

my test

create table f1(a int, b int, c varchar, dropped_column numeric, d varchar);

create or replace function f1_trg()
returns trigger as $$
declare _f1_var f1;
begin
  raise notice 'run trigger';
  _f1_var := new;
  return _f1_var;
end;
$$ language plpgsql;

create trigger xx before insert on f1 for row execute procedure f1_trg();

insert into f1 values(1,1,'aaa',1.1,'aaa');
alter table f1 drop column dropped_column;

insert into f1 values(1,1,'aaa','aaa');

2013/5/7 Pavel Stehule
> Hello
>
> One user reported an issue with triggers related to dropped columns.
>
> When I checked his code, I found more possible problems.
>
> He uses a trigger of the form:
>
> $$
> DECLARE somevar targettable;
> BEGIN
>   somevar := NEW;
>   -- do something with somevar
>   RETURN somevar;
> END;
> $$
>
> When I dropped a column (a numeric column) I had to end the session,
> because I got an error in the assignment:
>
> LOCATION: exec_stmt_raise, pl_exec.c:2985
> ERROR: 22P02: invalid input syntax for type numeric: "aaa"
> CONTEXT: PL/pgSQL function f1_trg() line 4 at assignment
> LOCATION: set_var_from_str, numeric.c:3253
>
> Regards
>
> Pavel
[HACKERS] issues with dropped columns in plpgsql code again
Hello

One user reported an issue with triggers related to dropped columns. When I checked his code, I found more possible problems. He uses a trigger of the form:

$$
DECLARE somevar targettable;
BEGIN
  somevar := NEW;
  -- do something with somevar
  RETURN somevar;
END;
$$

When I dropped a column (a numeric column) I had to end the session, because I got an error in the assignment:

LOCATION: exec_stmt_raise, pl_exec.c:2985
ERROR: 22P02: invalid input syntax for type numeric: "aaa"
CONTEXT: PL/pgSQL function f1_trg() line 4 at assignment
LOCATION: set_var_from_str, numeric.c:3253

Regards

Pavel
Re: [HACKERS] pg_dump --snapshot
* Andres Freund (and...@2ndquadrant.com) wrote: > > It helps in that once we have the lock, things aren't changing under us. > > The closer we can keep that to when the transaction starts, the better.. > > If you look at my example the timing where we take the snapshot isn't > the problem. While we wait for a lock on one object the not-yet-locked > objects can still change, get dropped et al. That window is so big that > the timing around the snapshot acquisition and trying to get the first > lock is insignificant. I wasn't talking about just getting the first lock but rather all the locks.. I agree that we can get stuck behind something else which is already holding a lock and that's certainly a problem. > Remember this is only a problem *if* there is > concurrent DDL. And in that case we very, very likely will have access > exclusive locks and thus will wait for them and have the described > problem. Yes, I understand that issue. > I don't think that's entirely possible. Think e.g. of constraints, > determination of HOTability, ... All those need to look at the database > state as it's valid now, not as it was valid back when our snapshot > started. There will certainly still need to be parts of the system which are using SnapshotNow, yes. Catalog MVCC is a rather massive discussion which isn't going to be solved here. > In the logical decoding patches (submitted previously, will get > resubmitted when 9.3 is getting somewhere), we export a snapshot once we > have read enough WAL that all changes from that point henceforth can be > streamed out. That snapshot is obviously very useful to get a new node > up and running. At that point, you take a snapshot and want to export it for pg_dump to use to get the same view of the DB and then dump/restore it into a new database where you'll then start applying changes from the logical replication..? Interesting, but certainly also quite a specialized use-case, no? How do you plan to handle the locking concerns which have been raised during this discussion? Do you take locks out before creating the snapshot to pass to pg_dump? Thanks, Stephen
Re: [HACKERS] \watch stuck on execution of commands returning no tuples
On Tue, May 7, 2013 at 2:36 PM, Ian Lawrence Barwick wrote: > As a British native speaker involved in translating some PostgreSQL-related > Japanese text, all I can say is "yes please". (Although for true Japanese > support, the grammar would have to be pretty much reversed, with the verb > being placed last; and WHERE would come before SELECT, which might > challenge the parser a little). I am personally of the opinion that whoever designed SQL was far too concerned about making it look like English and insufficiently concerned about making it pleasant to use. Since the target list comes before the FROM clause, you see (or must write) what you want to select from which table aliases before defining what those table aliases mean. Overall, you end up with an organization where you define the aliases in the middle, and then half the stuff that uses those definitions is at the beginning (in the target list) while the other half is at the end (in the WHERE clause). Strange! But, it's a standard, so, we live with it. And, none of the query languages I've designed have gained quite as much traction as SQL, so who am I to complain? >> (I don't know whether I'm joking, so don't ask.) > > I think I am joking. Keep me posted as things develop. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] \watch stuck on execution of commands returning no tuples
2013/5/8 Robert Haas : > On Thu, May 2, 2013 at 7:53 PM, Bruce Momjian wrote: >> OK, what other database supports British spelling of commands? Can we >> call this a compatibility feature. ;-) The feature has been there >> since 2000: >> >> commit ebe0b236909732c75d665c73363bd4ac7a7aa138 >> Author: Bruce Momjian >> Date: Wed Nov 8 21:28:06 2000 + >> >> Add ANALYSE spelling of ANALYZE for vacuum. > > Maybe we should also support ANALIZAR, ANALYSER, and 分析する. As a British native speaker involved in translating some PostgreSQL-related Japanese text, all I can say is "yes please". (Although for true Japanese support, the grammar would have to be pretty much reversed, with the verb being placed last; and WHERE would come before SELECT, which might challenge the parser a little). > (I don't know whether I'm joking, so don't ask.) I think I am joking. Regards Ian Barwick
Re: [HACKERS] Assertion failure at standby promotion
On 06.05.2013 13:08, Heikki Linnakangas wrote: On 03.05.2013 18:17, Fujii Masao wrote: Hi, I got the following assertion failure when I promoted the standby.

2013-05-04 00:12:31 JST sby1 LOG: received promote request
2013-05-04 00:12:31 JST sby1 FATAL: terminating walreceiver process due to administrator command
2013-05-04 00:12:31 JST sby1 LOG: redo done at 0/6FFE038
2013-05-04 00:12:31 JST sby1 LOG: last completed transaction was at log time 2013-05-04 00:12:25.878909+09
2013-05-04 00:12:31 JST sby1 LOG: selected new timeline ID: 2
2013-05-04 00:12:31 JST sby1 LOG: archive recovery complete
2013-05-04 00:12:31 JST sby1 LOG: checkpoint starting:
TRAP: FailedAssertion("!(sentPtr <= sendTimeLineValidUpto)", File: "walsender.c", Line: 1465)
2013-05-04 00:12:31 JST sby1 LOG: autovacuum launcher started

The way to reproduce this is:
1. Create one master A, one standby B, and one cascade standby C.
2. Run pgbench -i -s 10
3. Promote the standby B before pgbench -i finishes

I was able to reproduce this. The assertion checks that if the system is promoted at WAL location X, we must not have already sent WAL at > X to the client. As the code stands, that assumption is wrong; the walsender will merrily stream WAL that hasn't been replayed yet, and the system can be promoted before replaying all the WAL that has been streamed to a cascading standby. The comment in GetStandbyFlushRecPtr(), which is the function that determines how far the WAL may be streamed to a cascading standby, says this:

/*
 * We can safely send what's already been replayed. Also, if walreceiver
 * is streaming WAL from the same timeline, we can send anything that
 * it has streamed, but hasn't been replayed yet.
 */

There seem to be two bugs here: 1. This used to work in 9.2, because the startup process would always replay all the WAL present in pg_xlog before promoting (the WAL present in pg_xlog was streamed from master). But the refactorings in xlog.c in 9.3devel broke that, so that the startup process can promote earlier. 2. Even after fixing the logic in xlog.c, there is still a corner case where the startup process can promote before all the WAL that has been received from walreceiver has been replayed. That happens if the WAL streaming is terminated at a page boundary, rather than at a record boundary. For example, we might have received WAL up to the page boundary at 0/5BFA000, but the last *complete* record that we have received ends at 0/5BF9BD8. To fix the second issue, I think two things need to happen. First, we need to suppress the check in walsender. Second, we need to teach the WAL replay to back off when that happens. At the moment, the replay in the cascading standby gets stuck, trying to fetch the next page containing the rest of the partial WAL record. Instead, it should throw away the partial record it has, and resync at the end of the last replayed record. I came up with the attached patch for this (fix-standby-promotion-assert-fail-2.patch). You will also need to apply fast-promotion-quick-fix.patch to work around the bug in fast promotion I reported here: http://www.postgresql.org/message-id/5188cffa.3020...@vmware.com. pg_receivexlog has a variant of the same bug. If the server has sent WAL up to a segment boundary, and the segment boundary splits a WAL record in half, and the server is then promoted so that the promotion checkpoint (or end-of-recovery) record goes to the previous segment, pg_receivexlog will not fetch the segment containing the checkpoint record correctly.
When following a timeline switch, it should restart streaming from the new timeline at the point where the timeline switch occurred, rather than at the point on the old timeline where it stopped. Usually it's the same thing, but not if the streaming was paused at a page or segment boundary. To fix this for pg_receivexlog, I added the start position of the next timeline to the result set that the server sends at the end of WAL streaming. Previously, it only sent the TLI of the next timeline, and it was assumed that the starting point is the same as the point where streaming was stopped. It's a bit late to change it, but I have to say I don't like this whole business of relaying WAL to a cascading standby that hasn't been replayed. It would be a lot simpler if you could assume that whatever gets streamed downstream has already been replayed upstream. I understand that it's nice from a performance point of view, and because a cascading standby isn't then lagged behind if the replay gets stalled in the upstream server because of a hot standby conflict. But still.. I'm going to sleep over this and continue testing tomorrow. Please have a look. I think 9.2 has the same bug, BTW. Without support for timeline switches over streaming replication, it was just more difficult to hit. Looking closer, I believe 9.2 is OK. Recovery loops back to retry the whole record correctly; it was just the timeline switch over streaming replication.
Re: [HACKERS] \watch stuck on execution of commands returning no tuples
On Thu, May 2, 2013 at 7:53 PM, Bruce Momjian wrote:
> OK, what other database supports British spelling of commands? Can we call this a compatibility feature. ;-) The feature has been there since 2000:
>
> commit ebe0b236909732c75d665c73363bd4ac7a7aa138
> Author: Bruce Momjian
> Date: Wed Nov 8 21:28:06 2000 +
>
> Add ANALYSE spelling of ANALYZE for vacuum.

Maybe we should also support ANALIZAR, ANALYSER, and 分析する.

(I don't know whether I'm joking, so don't ask.)

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
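[Editor's note: for what it's worth, the British spelling really does parse today; a quick check:]

VACUUM ANALYZE;   -- American spelling
VACUUM ANALYSE;   -- British spelling, accepted since the commit quoted above
ANALYSE;          -- works standalone as well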
Re: [HACKERS] pg_dump --snapshot
On 2013-05-06 13:07:17 -0400, Tom Lane wrote: > I'm afraid that this is institutionalizing a design deficiency in > pg_dump; namely that it takes its snapshot before acquiring locks. I have suggested this before, but if pg_dump would use SELECT FOR SHARE in the queries it uses to build DDL it would detect most if not all modifications for most database objects, including tables. Sure, it would error out, but that's far better than a silently corrupt dump:

S1: =# CREATE TABLE testdump();
S2: =# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
S2: =# SELECT count(*) FROM pg_class; -- acquire snapshot
S1: =# ALTER TABLE testdump ADD COLUMN a text;
S2: =# SELECT * FROM pg_class cls
    -#   JOIN pg_attribute att ON (cls.oid = att.attrelid)
    -#   WHERE cls.oid = 'testdump'::regclass FOR UPDATE;
ERROR: could not serialize access due to concurrent update

The serialization failure could be caught and translated into some error message explaining that concurrent DDL prevented pg_dump from working correctly. I don't immediately see a case where that would prevent valid backups from working. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint
Robert Haas writes: > On Wed, May 1, 2013 at 6:27 PM, Tom Lane wrote: >> Fix permission tests for views/tables proven empty by constraint exclusion. > I believe that this commit is responsible for the fact that the > following test case now crashes the server: > rhaas=# create or replace view foo (x) AS (select 1 union all select 2); > CREATE VIEW > rhaas=# select * from foo where false; > The connection to the server was lost. Attempting reset: Failed. OK, after looking a bit closer, this is actually a situation I had been wondering about last week, but I failed to construct a test case proving there was a problem. The issue is that the "append rel" code path also needs changes to handle the fact that subqueries that are appendrel members might get proven empty by constraint exclusion. (Even aside from the crash introduced here, the excluded subqueries fail to contribute to the final rangetable for permissions checks.) Unfortunately this blows a bit of a hole in the minimal fix I committed last week; there's no nice way to include these dummy subqueries into the plan tree that's passed on to setrefs.c. Ideally we'd add an out-of-band transmission path for them, and I think I will fix it that way in HEAD, but that requires adding a new List field to PlannerInfo. I'm afraid to do that in 9.2 for fear of breaking existing planner plugin modules. One way to fix it in 9.2 is to teach setrefs.c to thumb through the append_rel_list looking for excluded child subqueries to pull up their rangetables. Icky, but it shouldn't cost very many cycles in typical queries. The main downside of this solution is that it would add at least several dozen lines of new-and-poorly-tested code to a back branch, where it would get no additional testing to speak of before being loosed on users. The other way I can think of to handle it without PlannerInfo changes is to lobotomize set_append_rel_pathlist so that it doesn't omit dummy children from the AppendPath it constructs. The main downside of this is that fully dummy appendrels (those with no live children at all, such as in your example) wouldn't be recognized by IS_DUMMY_PATH, so the quality of planning in the outer query would be slightly degraded. But such cases are probably sufficiently unusual that this might be an okay price to pay for a back branch. Thoughts? regards, tom lane
Re: [HACKERS] corrupt pages detected by enabling checksums
On Tue, 2013-05-07 at 13:20 -0400, Robert Haas wrote: > Hmm. Rereading your last email, I see your point: since we now have > HEAP_XLOG_VISIBLE, this is much less of an issue than it would have > been before. I'm still not convinced that simplifying that code is a > good idea, but maybe it doesn't really hurt us much in practice. Given that there's not a big impact one way or another, I don't mind whether this particular patch is committed or not. Whichever you think is more understandable and safer at this late hour. Also, to be clear, the fact that I posted a patch was not meant to advocate either way; merely to present the options. Not sure exactly what you mean about the code simplification, but I agree that anything more substantial than this patch should be left for 9.4. Regards, Jeff Davis
Re: [HACKERS] pg_dump --snapshot
On Tue, May 7, 2013 at 10:02 AM, Dimitri Fontaine wrote: > Rather than take some locks, you can now prevent the database objects > from changing with an event trigger. pg_dump could install that event > trigger in a preparing transaction, then do its work as currently, then > when done either remove or disable the event trigger. > > All the event trigger has to do is unconditionnaly raise an exception > with a message explaining that no DDL command is accepted during when a > dump is in progress. > I'm thinking of a case where a hot standby is executing a pg_dump and DDL is issued on the master -- would that cause any unexpected problems on the hot standby?
Re: [HACKERS] corrupt pages detected by enabling checksums
On Mon, May 6, 2013 at 5:04 PM, Jeff Davis wrote: > On Mon, 2013-05-06 at 15:31 -0400, Robert Haas wrote: >> On Wed, May 1, 2013 at 3:04 PM, Jeff Davis wrote: >> > Regardless, you have a reasonable claim that my patch had effects that >> > were not necessary. I have attached a draft patch to remedy that. Only >> > rudimentary testing was done. >> >> This looks reasonable to me. > > Can you please explain the scenario that loses many VM bits at once > during a crash, and results in a bunch of already-all-visible heap pages > being dirtied for no reason? Hmm. Rereading your last email, I see your point: since we now have HEAP_XLOG_VISIBLE, this is much less of an issue than it would have been before. I'm still not convinced that simplifying that code is a good idea, but maybe it doesn't really hurt us much in practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump --snapshot
On 2013-05-07 16:50:52 +0100, Greg Stark wrote: > What's the worst case for using an old snapshot? If I try to access a > table that doesn't exist any longer I'll get an error. That doesn't > really seem that bad for the use case I described. It's worse for the > full table dump but for an explicit list of tables, eh. Seems ok to > me. It's worse than that: you can get a dump that dumps successfully but doesn't restore: http://archives.postgresql.org/message-id/20130507141526.GA6117%40awork2.anarazel.de But that's not really related to snapshots. And not related to the patch. Imo the whole focus on the time between taking the snapshot and taking the locks is misguided and not really the problem. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_dump --snapshot
On 2013-05-07 11:01:48 -0400, Stephen Frost wrote: > * Andres Freund (and...@2ndquadrant.com) wrote: > > On 2013-05-07 08:54:54 -0400, Stephen Frost wrote: > > > Agreed- but it also isn't currently possible to make it any smaller. > > > > Uh. Why not? I think this is what needs to be fixed instead of making > > the hole marginally smaller elsewhere. > > If we're able to fix it- how would we allow users to take advantage of > that fix when starting their own snapshot and feeding it to pg_dump? Depends on the fix. I think the most realistic one is making sure central definitions haven't changed. In that case they wouldn't have to do anything. If we get some DDL lock or something they would probably need to do something like SELECT pg_prepare_database_dump(); or something - likely the same pg_dump would use. > > You can trivially reproduce the > > problem with pg_dump today: > > Agreed. The idea was to simply avoid making it worse. Your argument > seems to be that it's already horrible and easily broken and therefore > we can go ahead and make it worse and no one will complain because no > one has complained about how bad it is already. I don't follow that. It doesn't change *anything* with the fundamental problems. We could also say we forbid ALTER TABLE taking exclusive locks because that makes the problem far more noticeable. > > > My suggestion was to lock everything that pg_dump locks, which we > > > clearly have locks for since pg_dump is acquiring them. Also, I don't > > > believe it'd be that difficult to identify what pg_dump would lock, at > > > least in a 'default' whole-database run. This is more of a stop-gap > > > than a complete solution. > > > > The problem is that locking - as shown above - doesn't really help all > > that much. > > It helps in that once we have the lock, things aren't changing under us. > The closer we can keep that to when the transaction starts, the better.. If you look at my example the timing where we take the snapshot isn't the problem. While we wait for a lock on one object the not-yet-locked objects can still change, get dropped et al. That window is so big that the timing around the snapshot acquisition and trying to get the first lock is insignificant. Remember this is only a problem *if* there is concurrent DDL. And in that case we very, very likely will have access exclusive locks and thus will wait for them and have the described problem. > > You would have to do it like: > > 1) start txn > > 2) acquire DDL prevention lock > > 3) assert we do not yet have a snapshot > > 4) acquire snapshot > > 5) lock objects > > 6) release DDL lock > > 7) dump objects/data > > 8) commit txn > > We'd need a '4.5' to get the list of objects, no? Yea, I had folded that into lock objects. > > pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That > > would make it unrunnable in many situations. Especially as we cannot > > easily (without using several connections at once) release locks before > > ending a transaction. > > I agree that we wouldn't want pg_dump preventing all catalog updates, > but wouldn't we be able to draw a distinction between objects being > modified and objects being added? I don't easily see how. Often enough object creation modifies rows at some point. So we would need intelligence at some higher level. I guess it might be possible to reuse the event trigger infrastructure if it could do all that already... > > > This will be an interesting thing to consider when > > > implementing MVCC for the catalog.
> > > I think using proper mvcc snapshot for catalog scans doesn't, cannot even > > in all cases, imply having to use the user's transaction's snapshot, just > > one that guarantees a consistent result while a query is running. > > The hope would be to see a consistent view of what you can access while > the transaction is running.. I don't think that's entirely possible. Think e.g. of constraints, determination of HOTability, ... All those need to look at the database state as it's valid now, not as it was valid back when our snapshot started. > > c) Quite possibly the snapshot we need needs to meet some special > > criteria that pg_dump cannot guarantee. > > That's a much more difficult case, of course. It would help to have a > better understanding of what, exactly, Simon's use-case for this feature > is to answer if this is an issue or not. I am sure it's an issue for at least one of Simon's use cases, because one of them is also mine ;). But I think there are many use cases that require this. In the logical decoding patches (submitted previously, will get resubmitted when 9.3 is getting somewhere), we export a snapshot once we have read enough WAL that all changes from that point henceforth can be streamed out. That snapshot is obviously very useful to get a new node up and running. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support
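[Editor's note: for readers following along, a sketch of the snapshot-export machinery this thread builds on; pg_export_snapshot() has existed since 9.2, while the pg_dump --snapshot flag is the patch being discussed, and the snapshot id shown is made up:]

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns e.g. '000003F1-1'
-- the exporting transaction must stay open; in another shell one would
-- then run the proposed invocation:
--   pg_dump --snapshot=000003F1-1 mydb > dump.sql
COMMIT;  -- only once the importer/dump is done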
Re: [HACKERS] pg_dump --snapshot
Robert Haas writes: > On Mon, May 6, 2013 at 1:07 PM, Tom Lane wrote: >> I'm afraid that this is institutionalizing a design deficiency in >> pg_dump; namely that it takes its snapshot before acquiring locks. >> Ideally that would happen the other way around. I don't have a good >> idea how we could fix that --- but a feature that allows imposition >> of an outside snapshot will permanently foreclose ever fixing it. I don't even understand how you could take locks for a later snapshot; it seems to me you're talking about something much harder to design and code than making the catalogs fully MVCC compliant. >> What's more, this would greatly widen the risk window between when >> the snapshot is taken and when we have all the locks and can have >> some confidence that the DB isn't changing under us. Rather than take some locks, you can now prevent the database objects from changing with an event trigger. pg_dump could install that event trigger in a preparing transaction, then do its work as currently, then when done either remove or disable the event trigger. All the event trigger has to do is unconditionally raise an exception with a message explaining that no DDL command is accepted while a dump is in progress. > Now, I'll grant you that this technique couldn't be used together with > the proposed --snapshot option, but so what? Many people do DDL > infrequently enough that this is not a problem in practice. But even > if it is a problem, I don't see why that can't simply be a documented > limitation of --snapshot. IME it's a problem in practice, because people tend to want to run their pg_dump and their rollouts within the same maintenance window. I mean, those lucky guys out there having such a thing as a maintenance window. Again, it seems to me that the proper long term solution is both fully MVCC catalogs and reduced locking for most commands. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
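[Editor's note: a minimal sketch of the event trigger Dimitri describes, assuming the 9.3 event trigger facility; the names are illustrative:]

CREATE FUNCTION forbid_ddl_during_dump() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  RAISE EXCEPTION 'no DDL command is accepted while a dump is in progress';
END;
$$;

CREATE EVENT TRIGGER no_ddl_during_dump
  ON ddl_command_start
  EXECUTE PROCEDURE forbid_ddl_during_dump();

-- ... run pg_dump here ...

DROP EVENT TRIGGER no_ddl_during_dump;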
Re: [HACKERS] Recovery target 'immediate'
On 7 May 2013 13:50, Heikki Linnakangas wrote: >> Can I suggest that we discuss a range of related changes together? So >> we have a roadmap of agreed changes in this area. That will be more >> efficient than discussing each one individually; often each one makes >> sense only as part of the wider context. > > Sure, do you have something else in mind related to this? Not right this second. But I feel it would be better to consider things in a more top-down "what do we need in this area?" approach than the almost random mechanisms we use now. Given each of us seems to be equally surprised by what others are thinking, it would make sense to have a broader topic-level discussion, make a list of the thoughts and priorities in each area and discuss things as a whole. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Recovery target 'immediate'
On Tue, May 7, 2013 at 9:38 PM, Simon Riggs wrote: > On 3 May 2013 14:40, Heikki Linnakangas wrote: >> On 03.05.2013 16:29, Bruce Momjian wrote: >>> >>> On Fri, May 3, 2013 at 01:02:08PM +0200, Cédric Villemain wrote: This changes the existing API which will confuse people that know it and invalidate everything written in software and on wikis as to how to do it. That means all the "in case of fire break glass" instructions are all wrong and need to be rewritten and retested. >>> >>> >>> Yes, *that* is the main reason *not* to make the change. It has a >>> pretty bad cost in backwards compatibility loss. There is a gain, but >>> I don't think it outweighs the cost. >> >> >> So, is there a way to add this feature without breaking the API? > > > Yes, by adding a new parameter exclusively used to control this feature, > something like recovery_target_immediate = 'on/off'. We just need to add a named restore point when ending the backup (in pg_stop_backup() ?). No API change required. Just document that some predefined target names are set during backup. >>> >>> >>> So we auto-add a restore point based on the backup label. Does that >>> work for everyone? >> >> >> Unfortunately, no. There are cases where you want to stop right after >> reaching consistency, but the point where you reach consistency is not at >> the end of a backup. For example, if you take a backup using an atomic >> filesystem snapshot, there are no pg_start/stop_backup calls, and the system >> will reach consistency after replaying all the WAL in pg_xlog. You might >> think that you can just not create a recovery.conf file in that case, or >> create a dummy recovery.conf file with restore_command='/bin/false'. >> However, then the system will not find the existing timeline history files >> in the archive, and can pick a TLI that's already in use. I found this out >> the hard way, and actually ended up writing a restore_command that restores >> timeline history files normally, but returns non-zero for any real other >> files; it wasn't pretty. >> >> Another case is that you take a backup from a standby server; you can't >> write a restore-point WAL record in a standby. >> >> If we want to avoid adding a new option for this, how about a magic restore >> point called "consistent" or "immediate": >> >> recovery_target_name='immediate' >> >> That would stop recovery right after reaching consistency, but there >> wouldn't be an actual restore point record in the WAL stream. > > recovery_target_name='something' > > ...works for me. Either consistent or immediate works. > > I request that the docs recommend this be used in conjunction with > pause_at_recovery_target = on, so that the user can begin inspecting > the database at the first available point and then roll forward from > that point if desired. And, we should forbid users from setting recovery_target_inclusive to false when recovery_target_name is set to something like 'immediate'? Because in this case, recovery would always end before reaching the consistent state and fail. Regards, -- Fujii Masao
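[Editor's note: to make the two alternatives in this thread concrete, a hedged sketch; pg_create_restore_point() is the existing mechanism (9.1+), while the 'immediate' magic name is only a proposal here, and the label is illustrative:]

-- existing approach: an explicit named restore point on the primary
SELECT pg_create_restore_point('end_of_backup');
-- then in recovery.conf:
--   recovery_target_name = 'end_of_backup'
--
-- proposed approach, with no WAL record required:
--   recovery_target_name = 'immediate'
--   pause_at_recovery_target = on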
Re: [HACKERS] pg_dump --snapshot
On Mon, May 6, 2013 at 1:07 PM, Tom Lane wrote: >> Exported snapshots allow you to coordinate a number of actions >> together, so they all see a common view of the database. So this patch >> allows a very general approach to this, much more so than pg_dump >> allows currently since the exact timing of the snapshot is not >> controlled by the user. > I'm afraid that this is institutionalizing a design deficiency in > pg_dump; namely that it takes its snapshot before acquiring locks. > Ideally that would happen the other way around. I don't have a good > idea how we could fix that --- but a feature that allows imposition > of an outside snapshot will permanently foreclose ever fixing it. > > What's more, this would greatly widen the risk window between when > the snapshot is taken and when we have all the locks and can have > some confidence that the DB isn't changing under us. I don't find this argument very convincing. The way you could fix the ordering problem is:

1. take locks on all the objects you think you need to dump
2. update your snapshot and check whether the list of objects you think you need to dump has changed
3. if yes, go to step 1, else done

Now, I'll grant you that this technique couldn't be used together with the proposed --snapshot option, but so what? Many people do DDL infrequently enough that this is not a problem in practice. But even if it is a problem, I don't see why that can't simply be a documented limitation of --snapshot. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump --snapshot
Greg Stark writes: > [ ideas about dumping some past state of a table ] > If I try to access a table whose schema has changed then I might use > the wrong tupledesc and see rows that don't decode properly. That > would be a disaster. Can we protect against that by noticing that the > pg_class row isn't visible to our snapshot and throw an error? Would > that be sufficient to protect against all schema changes? Would it > cause massive false positives based on whether vacuum had happened to > have run recently? No, no, and I'm not sure :-(. It might be sufficient to notice whether the pg_class row and all relevant pg_attribute rows are visible in your snapshot, at least for the narrow purpose of deciding whether you can dump data. (This would probably not, for instance, be enough to give you reliable info about check or foreign key constraints.) In general though, any such facility would surely block vacuuming on the table, indeed probably *all* tables in the database, which would be pretty disastrous in the long run. I think a better answer for people who need such a facility is to keep a WAL archive and use PITR when they actually need to get back yesterday's data. regards, tom lane
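[Editor's note: a minimal sketch of the archive-plus-PITR setup Tom recommends, with illustrative paths, label, and target time:]

-- postgresql.conf on the primary (shown here as comments):
--   wal_level = archive
--   archive_mode = on
--   archive_command = 'cp %p /archive/%f'
-- take a base backup:
SELECT pg_start_backup('nightly');
--   ... copy the data directory aside ...
SELECT pg_stop_backup();
-- to get back yesterday's data, restore the base backup and create a
-- recovery.conf containing:
--   restore_command = 'cp /archive/%f %p'
--   recovery_target_time = '2013-05-06 12:00:00'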
Re: [HACKERS] XLogFlush invoked about twice as many times after 9.2 group commit enhancement
On Tue, May 7, 2013 at 8:13 AM, Amit Langote wrote: > Profiling results show that XLogFlush() is called about twice as many > times after this patch while for XLogWrite() count remains about same > as before. This patch modifies XLogFlush() such that it offers the > said performance gain by alleviating the lock contention on > WALWriteLock using the new LWLockAcquireOrWait(). I do not however > understand why XLogFlush is invoked twice as many times (as seen from > the profiling results) as an effect of this patch. Why should this > patch make XLogFlush() being invoked twice as many times? Why is that surprising? Most of those XLogFlush() calls will recheck the flushed-up-to point, and realize that another backend assumed the role of group commit leader, and flushed their WAL for them, so aside from the wait, the call to XLogFlush is cheap for that individual backend. It's being invoked twice as many times because backends *can* invoke it twice as many times. For the record, the new group commit code did more than just alleviate lock contention. If it was true that amelioration of lock contention fully explained the improvements, then surely sleeping on an exclusive lock on WALWriteLock within XLogFlush would always hurt throughput, when in fact it can considerably help throughput, as demonstrated by 9.3's commit_delay implementation. -- Peter Geoghegan
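[Editor's note: as a concrete reference for the knobs Peter alludes to, a hedged example; the values are arbitrary, and in 9.3 changing commit_delay requires superuser privileges:]

SET commit_delay = 1000;   -- microseconds the group-commit leader sleeps
                           -- inside XLogFlush() before flushing
SET commit_siblings = 5;   -- only delay when at least this many other
                           -- transactions are concurrently active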
Re: [HACKERS] pg_dump --snapshot
Greg,

* Greg Stark (st...@mit.edu) wrote:
> One natural way to do it would be to make an option to pg_dump which
> caused it to do all the normal pre-dump things it would normally do,
> then export a snapshot and wait for the user. (Alternately it could
> even create a prepared transaction which iirc keeps the locks until
> it's committed). That gives users a way to get a snapshot that is
> guaranteed to work until that transaction is exited.

Right, that was one of the suggestions that I made up-thread a bit.

> But I don't think that would really make users happy. I think the
> usual use case for this feature would be to dump a single table or
> small number of tables as of some time in the past that they didn't
> plan in advance that they would need. They might have a cron job
> periodically export a snapshot "just in case" and then want to use it
> later.

The snapshot has to be 'alive' in order to be joined, so unless we're
changing something else, I don't think this cronjob approach would
actually work (unless it forks off and keeps the transaction open, but
I didn't read that from what you wrote..).

> They wouldn't be happy if they had to create a prepared
> transaction for each such snapshot which locked every table in their
> database until they decide they don't actually need it. That would
> mean they could never do any ddl.

If they don't lock the table then any DDL they do would break the
backup *anyway*. This is a pretty clear example of exactly the problem
with simply adding this option to pg_dump.. The view of pg_class would
be from when the snapshot was taken, but without locks, anything could
have changed between then and when the backup tries to run.

> The use case of wanting to dump a single table as of a few hours ago
> (e.g. before some application data loss bug) is pretty compelling. If
> we could do it, I think it would be worth quite a bit.

I certainly like the idea, but it seems rather beyond what Simon was
after with this patch, aiui, and it does come with some other concerns,
like dealing with what happens if the table was modified after the
snapshot was taken (possible because there wasn't a lock on it).

> What's the worst case for using an old snapshot? If I try to access a
> table that doesn't exist any longer I'll get an error. That doesn't
> really seem that bad for the use case I described. It's worse for the
> full table dump, but for an explicit list of tables, eh. Seems ok to
> me.

I'm not convinced that risk of error is the only issue with this..
What if a column was dropped and then a new one put in its place (with
the same name)?

> If I try to access a table whose schema has changed then I might use
> the wrong tupledesc and see rows that don't decode properly. That
> would be a disaster. Can we protect against that by noticing that the
> pg_class row isn't visible to our snapshot and throw an error? Would
> that be sufficient to protect against all schema changes? Would it
> cause massive false positives based on whether vacuum had happened to
> have run recently?

The way this is handled now is that we use syscache to get whatever the
current view of the object is when we're running queries against it.
The problem is that the 'current' view ends up being different from
what we see in pg_class / pg_attribute. Andres provided a good example
which illustrates this upthread.

	Thanks,

		Stephen

signature.asc
Description: Digital signature
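For concreteness, the export/import handshake being discussed (snapshot
ID invented; note the exporting transaction has to stay open the whole
time, which is exactly why a fire-and-forget cron job can't work):

    Session 1:
      BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
      SELECT pg_export_snapshot();   -- returns e.g. '000003A1-1'
      -- ... must stay open while anyone is importing ...

    Session 2:
      BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
      SET TRANSACTION SNAPSHOT '000003A1-1';
      -- this session now sees session 1's snapshot; the proposed
      -- pg_dump --snapshot='000003A1-1' would import it the same way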
Re: [HACKERS] pg_dump --snapshot
On Tue, May 7, 2013 at 1:54 PM, Stephen Frost wrote:
> I believe the point that Tom is making is that we shouldn't paint
> ourselves into a corner by letting users provide old snapshots to
> pg_dump which haven't acquired any of the necessary locks. The goal, at
> least as I read it, is to come up with a workable design (and I don't
> know that we have, but still) which provides a way for the locks to be
> taken at least as quickly as what pg_dump does today and which we could
> modify down the road to take the locks pre-snapshot (presuming we can
> figure out a way to make that work).

One natural way to do it would be to make an option to pg_dump which
caused it to do all the normal pre-dump things it would normally do,
then export a snapshot and wait for the user. (Alternately it could
even create a prepared transaction which iirc keeps the locks until
it's committed). That gives users a way to get a snapshot that is
guaranteed to work until that transaction is exited.

But I don't think that would really make users happy. I think the
usual use case for this feature would be to dump a single table or
small number of tables as of some time in the past that they didn't
plan in advance that they would need. They might have a cron job
periodically export a snapshot "just in case" and then want to use it
later. They wouldn't be happy if they had to create a prepared
transaction for each such snapshot which locked every table in their
database until they decide they don't actually need it. That would
mean they could never do any ddl.

The use case of wanting to dump a single table as of a few hours ago
(e.g. before some application data loss bug) is pretty compelling. If
we could do it, I think it would be worth quite a bit.

What's the worst case for using an old snapshot? If I try to access a
table that doesn't exist any longer I'll get an error. That doesn't
really seem that bad for the use case I described. It's worse for the
full table dump, but for an explicit list of tables, eh. Seems ok to
me.

If I try to access a table whose schema has changed then I might use
the wrong tupledesc and see rows that don't decode properly. That
would be a disaster. Can we protect against that by noticing that the
pg_class row isn't visible to our snapshot and throw an error? Would
that be sufficient to protect against all schema changes? Would it
cause massive false positives based on whether vacuum had happened to
have run recently?

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] XLogFlush invoked about twice as many times after 9.2 group commit enhancement
Hello,

I have been trying to understand how the group commit implementation
works after the 9.2 group commit enhancement patch
(9b38d46d9f5517dab67dda1dd0459683fc9cda9f on REL9_2_STABLE). I have
observed some behavior in this regard that I could not understand.

Profiling results show that XLogFlush() is called about twice as many
times after this patch, while the XLogWrite() count remains about the
same as before. This patch modifies XLogFlush() such that it offers the
said performance gain by alleviating the lock contention on
WALWriteLock, using the new LWLockAcquireOrWait(). I do not, however,
understand why XLogFlush() is invoked twice as many times (as seen from
the profiling results) as an effect of this patch. Why should this
patch make XLogFlush() be invoked twice as many times?

I used "pgbench -c 32 -t 1000 pgbench" in both cases, with the TPS
result after applying the patch not being significantly different
(certainly not twice as much on my system). I used:

pgbench scale = 10
shared_buffers = 256MB
max_connections = 1000
checkpoint_segments = 15
synchronous_commit = on

Comments?

-- Amit Langote

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump --snapshot
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> > Agreed- but it also isn't currently possible to make it any smaller.
>
> Uh. Why not? I think this is what needs to be fixed instead of making
> the hole marginally smaller elsewhere.

If we're able to fix it- how would we allow users to take advantage of
that fix when starting their own snapshot and feeding it to pg_dump?

> You can trivially reproduce the
> problem with pg_dump today:

Agreed. The idea was to simply avoid making it worse. Your argument
seems to be that it's already horrible and easily broken and therefore
we can go ahead and make it worse and no one will complain because no
one has complained about how bad it is already. I don't follow that.

> > My suggestion was to lock everything that pg_dump locks, which we
> > clearly have locks for since pg_dump is acquiring them. Also, I don't
> > believe it'd be that difficult to identify what pg_dump would lock, at
> > least in a 'default' whole-database run. This is more of a stop-gap
> > than a complete solution.
>
> The problem is that locking - as shown above - doesn't really help all
> that much.

It helps in that once we have the lock, things aren't changing under
us. The closer we can keep that to when the transaction starts, the
better..

> You would have to do it like:
> 1) start txn
> 2) acquire DDL prevention lock
> 3) assert we do not yet have a snapshot
> 4) acquire snapshot
> 5) lock objects
> 6) release DDL lock
> 7) dump objects/data
> 8) commit txn

We'd need a '4.5' to get the list of objects, no?

> Unfortunately most of these steps cannot easily/safely be exposed to
> sql. And again, this is a very old situation that doesn't really have
> anything to do with snapshot exporting.

Perhaps we can't expose the individual components, but perhaps we could
have a function which does some/all of the pieces (except the pieces
that must be done by pg_dump) and which users need to grant explicit
execute rights to for their "backup" user?

> pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
> would make it unrunnable in many situations. Especially as we cannot
> easily (without using several connections at once) release locks before
> ending a transaction.

I agree that we wouldn't want pg_dump preventing all catalog updates,
but wouldn't we be able to draw a distinction between objects being
modified and objects being added?

> > It's really a problem for just about everything that uses transactions
> > and locking, isn't it? pg_dump just happens to have it worst since it
> > wants to go and touch every object in the database. It's certainly
> > possible for people to connect to the DB, look at pg_class and then try
> > to access some object which no longer exists (even though it's in
> > pg_class).
>
> Well, normal sql shouldn't need to touch pg_class and will know
> beforehand which locks it will need. But I have to say I more than once
> wished we would throw an error if an object's definition is "newer" than
> the one we started out with.

I agree- that would be nice. I'd also contend that 'normal' SQL quite
often looks at pg_class; I know that we have tons of code which does.
We also basically lock all users out of the database when we're doing
DDL changes to avoid any issues, but it'd certainly be nice if we
didn't have to..

> > This will be an interesting thing to consider when
> > implementing MVCC for the catalog.
> I think using proper mvcc snapshots for catalog scans doesn't, cannot
> even in all cases, imply having to use the user's transaction's
> snapshot, just one that guarantees a consistent result while a query
> is running.

The hope would be to see a consistent view of what you can access while
the transaction is running..

> I am not sure if the correct fix is locking and not just making sure
> the definition of objects hasn't changed since the snapshot started.

I'm guessing that the intent of the locking is to make sure that the
objects don't change under us. :) The problem, as you've explained, is
that the object might change before we get our lock in place. That
window of opportunity gets a lot larger when it's moved outside of our
control.

> But if we go for locking, creating a function which makes sure that
> the source transaction has a certain strong lock wouldn't be that
> hard. We have all the data for it.

Agreed.

> a) exporting a snapshot to a file was discussed and deemed
> unacceptably risky. That's why pg_export_snapshot() exports it itself
> into some internal format somewhere. The user doesn't need to know
> where/how.

It wasn't my intent to export the 'snapshot' to a file but rather to
simply dump out what pg_dump gets when it runs pg_export_snapshot(),
allowing others to connect in and use that snapshot.

> b) When importing a snapshot the source transaction needs to be alive,
> otherwise we cannot guarantee the global xmin hasn't advanced too
> much.

That would
Re: [HACKERS] pg_dump --snapshot
Hi,

On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > > All of which I think I agree with, but I don't agree with the
> > > conclusion that this larger window is somehow acceptable because
> > > there's a very small window (one which can't be made any smaller,
> > > today..) which exists today.
> >
> > The window isn't that small currently:
>
> Agreed- but it also isn't currently possible to make it any smaller.

Uh. Why not? I think this is what needs to be fixed instead of making
the hole marginally smaller elsewhere. You can trivially reproduce the
problem with pg_dump today:

S1: $ psql postgres
    =# CREATE DATABASE pgdumptest;
    =# CREATE DATABASE pgrestoretest;
    =# \c pgdumptest
    =# CREATE TABLE tbl(id serial primary key, data_a int8, data_b float8);
    =# INSERT INTO tbl(data_a, data_b) SELECT random()::int, random() FROM generate_series(1, 10);
    =# BEGIN;
    =# ALTER TABLE tbl RENAME COLUMN data_a TO data_swap;
    =# ALTER TABLE tbl RENAME COLUMN data_b TO data_a;
    =# ALTER TABLE tbl RENAME COLUMN data_swap TO data_b;

S2: $ pg_dump pgdumptest > /tmp/pgdump.sql

S1: =# COMMIT;

S2: $ psql pgrestoretest -f /tmp/pgdump.sql
    psql:/tmp/pgdump.sql:87: ERROR: invalid input syntax for integer: "0.944006722886115313"
    CONTEXT: COPY tbl, line 1, column data_a: "0.944006722886115313"

A ddl upgrade script taking some seconds isn't exactly anything
unusual...

> > > Alright, then let's provide a function which will do that and tell
> > > people to use it instead of just using pg_export_snapshot(), which
> > > clearly doesn't do that.
> >
> > If it were clear cut what to lock and we had locks for
> > everything. Maybe. But we don't have locks for everything.
>
> My suggestion was to lock everything that pg_dump locks, which we
> clearly have locks for since pg_dump is acquiring them. Also, I don't
> believe it'd be that difficult to identify what pg_dump would lock, at
> least in a 'default' whole-database run. This is more of a stop-gap
> than a complete solution.

The problem is that locking - as shown above - doesn't really help all
that much. You would have to do it like:

1) start txn
2) acquire DDL prevention lock
3) assert we do not yet have a snapshot
4) acquire snapshot
5) lock objects
6) release DDL lock
7) dump objects/data
8) commit txn

Unfortunately most of these steps cannot easily/safely be exposed to
sql. And again, this is a very old situation that doesn't really have
anything to do with snapshot exporting.

> > So we would need to take locks preventing any modification on any of
> > the system catalogs, which doesn't really seem like a good thing,
> > especially as we can't release them from sql during the dump, where
> > we can allow creation of temp tables and everything without problems.
>
> That's already an issue when pg_dump runs, no? Not sure why this is
> different.

pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
would make it unrunnable in many situations. Especially as we cannot
easily (without using several connections at once) release locks before
ending a transaction.

> > > I believe the main argument here is really around "you should think
> > > about these issues before just throwing this in" and not "it must be
> > > perfect before it goes in". Perhaps "it shouldn't make things *worse*
> > > than they are now" would also be apt..
> > That's not how I read 8465.1367860...@sss.pgh.pa.us :(
>
> I believe the point that Tom is making is that we shouldn't paint
> ourselves into a corner by letting users provide old snapshots to
> pg_dump which haven't acquired any of the necessary locks. The goal, at
> least as I read it, is to come up with a workable design (and I don't
> know that we have, but still) which provides a way for the locks to be
> taken at least as quickly as what pg_dump does today and which we could
> modify down the road to take the locks pre-snapshot (presuming we can
> figure out a way to make that work).
>
> The proposed patch certainly doesn't make any attempt to address that
> issue and would encourage users to open themselves up to this risk more
> than they are exposed today w/ pg_dump.

I fail to see a difference that is big enough to worry overly much
about. The above problem is easy enough to encounter without any
snapshot exporting and I can't remember a single report.

> > I think there is no point in fixing it somewhere else. The problem is
> > in pg_dump, not the snapshot import/export.
>
> It's really a problem for just about everything that uses transactions
> and locking, isn't it? pg_dump just happens to have it worst since it
> wants to go and touch every object in the database. It's certainly
> possible for people to connect to the DB, look at pg_class and then try
> to access some object which no longer exists (even though it's in
> pg_class).

Well, normal sql shouldn't need to touch pg_class and will know
beforehand which locks it will need. But I have to say I more than once
wished we would throw an error if an object's definition is "newer"
than the one we started out with.
[HACKERS] Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
On 08.04.2013 18:58, Jeff Bohmer wrote:
> On Apr 6, 2013, at 1:24 PM, Jeff Janes wrote:
>> On Sat, Apr 6, 2013 at 1:24 AM, Heikki Linnakangas wrote:
>>> Perhaps we should improve the documentation to make it more explicit
>>> that backup_label must be included in the backup. The docs already
>>> say that, though, so I suspect that people making this mistake have
>>> not read the docs very carefully anyway.
>>
>> I don't think the docs are very clear on that. They say "This file
>> will of course be archived as a part of your backup dump file", but
>> "will be" does not imply "must be". Elsewhere it emphasizes that the
>> label you gave to pg_start_backup is written into the file, but
>> doesn't really say what the file itself is there for. To me it seems
>> to imply that the file is there for your convenience, to hold that
>> label, and not as a critical part of the system. Patch attached, which
>> I hope can be back-patched. I'll also add it to commitfest-Next.
>
> I think this documentation update would be helpful.

Committed that.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
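The procedure being documented, in miniature (label and paths invented);
the backup_label file that pg_start_backup() creates in the data
directory is the critical piece that must travel with the backup:

    =# SELECT pg_start_backup('nightly');
    $ tar -cf /backups/base.tar -C $PGDATA .    # includes backup_label
    =# SELECT pg_stop_backup();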
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
On 26.04.2013 11:51, KONDO Mitsumasa wrote:
> Hi, I discovered the cause of the problem. I think that Horiguchi's
> discovery is a separate problem. The problem is in CreateRestartPoint.
> In recovery mode, PG should not write WAL records, because PG does not
> know the latest WAL file location. But in this problem case, PG (the
> standby) writes a WAL file at the restart point during archive
> recovery. In recovery mode, I think a restart point should only update
> the minimum recovery point. Here is the standby's pg_xlog directory
> when the problem occurs:
>
> [mitsu-ko@localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
> 00020003  00020007  0002000B  0003.history
> 00020004  00020008  0002000C  0003000E
> 00020005  00020009  0002000D  0003000F
> 00020006  0002000A  0002000E  archive_status
>
> This problem case is here:
>
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: attempting to remove WAL segments older than log file 00030002
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3568
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: recycled transaction log file "00010002"
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: recycled transaction log file "00020002"
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
> [Standby] 2013-04-26 04:26:44 EDT LOG: 0: restartpoint complete: wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3, longest=1.176 s, average=0.392 s
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointEnd, xlog.c:7893
> [Standby] 2013-04-26 04:26:44 EDT LOG: 0: recovery restart point at 0/90FE448
> [Standby] 2013-04-26 04:26:44 EDT DETAIL: last completed transaction was at log time 2013-04-26 04:25:53.203725-04
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: CreateRestartPoint, xlog.c:8601
> [Standby] 2013-04-26 04:26:44 EDT LOG: 0: restartpoint starting: xlog
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointStart, xlog.c:7821
> cp: cannot stat `../arc/0003000F': No such file or directory
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: could not restore file "0003000F" from archive: return code 256
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RestoreArchivedFile, xlog.c:3323
> [Standby] 2013-04-26 04:26:44 EDT LOG: 0: unexpected pageaddr 0/200 in log file 0, segment 15, offset 0
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
> cp: cannot stat `../arc/0003000F': No such file or directory
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: could not restore file "0003000F" from archive: return code 256
>
> In recovery, PG normally looks for a WAL file in the archive first. If
> the proper WAL file is not there (the archive command failed), it then
> looks in the pg_xlog directory. Normally the proper WAL file is not in
> pg_xlog, but in this case a file with the proper name - with mistaken,
> invalid contents - is in pg_xlog, so recovery fails and the standby is
> broken. So I fixed CreateRestartPoint at the branch that decides
> whether to update the minimum recovery point. It seems to fix this
> problem, but the attached patch is plain.

I didn't understand this. I committed a fix for the issue where
recycled WAL files get in the way of recovery, but I'm not sure if
you're describing the same issue or something else. But before we dig
any deeper into this, can you verify if you're still seeing any issues
on 9.3beta1?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump --snapshot
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 7 May 2013 01:18, Stephen Frost wrote:
> > * Simon Riggs (si...@2ndquadrant.com) wrote:
> >> If anybody really wanted to fix pg_dump, they could do. If that was so
> >> important, why block this patch, but allow parallel pg_dump to be
> >> committed without it?
> >
> > Because parallel pg_dump didn't make the problem any *worse*..? This
> > does.
>
> Sorry, not accurate. Patch makes nothing *worse*.

I really don't understand that claim. It clearly increases the time
between transaction start and when the locks are all acquired. If this
is used programmatically (instead of someone hand-typing commands into
psql), perhaps it doesn't increase it very much, but increasing it at
all is clearly going in the "worse" direction.

> The existing API *can* be misused in the way you say, and so also
> could pg_dump if the patch is allowed.

And if the patch isn't allowed, then pg_dump isn't any worse off than
it's always been. The corollary is that pg_dump *is* worse off if the
patch goes in. The patch is absolutely *encouraging* such misuse of
this API because there's almost no other way to use this new option to
pg_dump except as an abuse of the API.

> However, there is no reason to suppose that such misuse would be
> common;

The concern is exactly this point. It wouldn't simply be common: were
this patch applied and the option used, it would be the predominant
case, and tools would be built which either completely ignore the
problem (most likely) or which attempt to duplicate what pg_dump
already does, likely very poorly.

> no reason why a timing gap would *necessarily* occur in the
> way your previous example showed, or if it did why it would
> necessarily present a problem for the user. Especially if we put
> something in the docs.

I don't believe papering over this with documentation is really a
solution. I did suggest a few other options which don't seem to be
getting much traction from anyone, so perhaps they're not workable, but
I'll reiterate them for fun anyway:

- Provide a way for users (perhaps even pg_dump) to acquire locks on
  essentially everything in the DB. This could be a plpgsql function or
  perhaps even a new lock type (a sketch of such a function follows
  after this message). Grander ideas might also support the filtering
  options which pg_dump supports.

- Have pg_dump output the snapshot which it has acquired and shared,
  for others to then connect and use. It would do this after acquiring
  all of the locks, of course.

> Snapshots have no WARNING on them.

The documentation is certainly fallible, though this is certainly a
problem which all applications need to address in some way. We address
it as best we can in pg_dump today and the goal is to continue to do
so.

> There is no guarantee in any
> transaction that the table you want will not be dropped before you try
> to access it.

Hence why we do the best we can in pg_dump by immediately locking the
objects, once we've identified what they are.

> > However,
> > there really isn't a way for users to use this new option correctly-
>
> Not accurate.

Then please articulate how they would use it correctly? Would they
write a function which goes and acquires all of the locks which pg_dump
would (assuming they've figured out what locks pg_dump needs), prior to
exporting the snapshot and calling pg_dump? If so, perhaps we could
simplify their lives by providing such a function for them to use
instead of asking each user to write it?
> The argument against this is essentially that we don't trust the user
> to use it well, so we won't let them have it at all. Which makes no
> sense since they already have this API and don't need our permission
> to use it. All that blocking this patch does is to remove any chance
> the user has of coordinating pg_dump with other actions; preventing
> that causes more issues for the user, and so doing nothing is not safe
> or correct either. A balanced viewpoint needs to include the same
> level of analysis on both sides, not just a deep look at the worst
> case on one side and claim everything is rosy with the current
> situation.

I'm not sure where the claim that everything is rosy was made; I've
certainly not seen it nor made any claim that it is. This whole
discussion started with Tom pointing out that pg_dump is essentially
already busted and that this change would end up making that situation
worse.

Perhaps in very select cases users will do the right thing and acquire
all the locks they need before exporting the snapshot and calling
pg_dump, but that description of how to use this option correctly
certainly wasn't articulated anywhere in the initial description or
contents of the patch. There's also nothing done to make that any
easier to do, nor a way to move users using this system to a new
methodology when we figure out how to make pg_dump do this better, thus
putting us in a situation of *always* having this issue, even if we fix
it for 'normal' pg_dump.

	Thanks,

		Stephen
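One possible shape of the lock-everything helper suggested above,
sketched for illustration only (the function name is invented; ACCESS
SHARE matches what pg_dump itself takes, and a real version would also
need to cover non-table objects and pg_dump's filtering options):

    CREATE FUNCTION lock_all_tables() RETURNS void
    LANGUAGE plpgsql AS $$
    DECLARE
        r record;
    BEGIN
        FOR r IN
            SELECT c.oid::regclass AS tbl
              FROM pg_class c
              JOIN pg_namespace n ON n.oid = c.relnamespace
             WHERE c.relkind = 'r'
               AND n.nspname NOT IN ('pg_catalog', 'information_schema')
        LOOP
            -- locks are held until the calling transaction ends
            EXECUTE format('LOCK TABLE %s IN ACCESS SHARE MODE', r.tbl);
        END LOOP;
    END;
    $$;

The backup user would call this first, then pg_export_snapshot(), then
hand the snapshot ID to pg_dump - though, as discussed in this thread,
objects can still change in the gap before the locks are all taken.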
Re: [HACKERS] pg_dump --snapshot
* Andres Freund (and...@2ndquadrant.com) wrote:
> > All of which I
> > think I agree with, but I don't agree with the conclusion that this
> > larger window is somehow acceptable because there's a very small window
> > (one which can't be made any smaller, today..) which exists today.
>
> The window isn't that small currently:

Agreed- but it also isn't currently possible to make it any smaller.

> b) Locking all relations in a big database can take a second or more,
> even if there are no conflicting locks.

Yes, I've noticed.. :( You can also run out of locks, which is quite
painful.

> > Alright, then let's provide a function which will do that and tell
> > people to use it instead of just using pg_export_snapshot(), which
> > clearly doesn't do that.
>
> If it were clear cut what to lock and we had locks for
> everything. Maybe. But we don't have locks for everything.

My suggestion was to lock everything that pg_dump locks, which we
clearly have locks for since pg_dump is acquiring them. Also, I don't
believe it'd be that difficult to identify what pg_dump would lock, at
least in a 'default' whole-database run. This is more of a stop-gap
than a complete solution.

> So we would
> need to take locks preventing any modification on any of the system
> catalogs, which doesn't really seem like a good thing, especially as
> we can't release them from sql during the dump, where we can allow
> creation of temp tables and everything without problems.

That's already an issue when pg_dump runs, no? Not sure why this is
different.

> Also, as explained above, the problem already exists in larger
> timeframes than referenced in this thread, so I really don't see how
> anything that's only based on plain locks on user objects can solve
> the issue in a relevant enough way.

The point is to try and avoid making the problem worse..

> I am comparing the time between 'snapshot acquisition' and 'getting
> the object list' with the time between 'getting the object list' and
> 'locking the object list'. What I am saying is that in many scenarios
> the second part will be the bigger part.

I can see how that can happen, sure.

> > I believe the main argument here is really around "you should think
> > about these issues before just throwing this in" and not "it must be
> > perfect before it goes in". Perhaps "it shouldn't make things *worse*
> > than they are now" would also be apt..
>
> That's not how I read 8465.1367860...@sss.pgh.pa.us :(

I believe the point that Tom is making is that we shouldn't paint
ourselves into a corner by letting users provide old snapshots to
pg_dump which haven't acquired any of the necessary locks. The goal, at
least as I read it, is to come up with a workable design (and I don't
know that we have, but still) which provides a way for the locks to be
taken at least as quickly as what pg_dump does today and which we could
modify down the road to take the locks pre-snapshot (presuming we can
figure out a way to make that work).

The proposed patch certainly doesn't make any attempt to address that
issue and would encourage users to open themselves up to this risk more
than they are exposed today w/ pg_dump.

> I think there is no point in fixing it somewhere else. The problem is
> in pg_dump, not the snapshot import/export.

It's really a problem for just about everything that uses transactions
and locking, isn't it? pg_dump just happens to have it worst since it
wants to go and touch every object in the database.
It's certainly possible for people to connect to the DB, look at
pg_class and then try to access some object which no longer exists
(even though it's in pg_class). This will be an interesting thing to
consider when implementing MVCC for the catalog.

> You did suggest how it can be fixed? You mean
> 20130506214515.gl4...@tamriel.snowman.net?

I suggested how it might be done. :) There are undoubtedly issues with
an all-database-objects lock, but it would certainly reduce the time
between transaction start and getting all the locks acquired, and
shrink the window that much more. If we did implement such a beast, how
could we ensure that the locks were taken immediately after transaction
start if the snapshot is being passed to pg_dump? Basically, if we
*did* solve this issue for pg_dump in some way in the future, how would
we use it if pg_dump can accept an outside snapshot?

One other thought did occur to me- we could simply have pg_dump export
the snapshot that it gets to stdout, a file, whatever, and systems
which are trying to do this magic "everyone gets the same view" could
glom onto the snapshot created by pg_dump, after all the locks have
been acquired..

	Thanks,

		Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] Recovery target 'immediate'
On 07.05.2013 15:38, Simon Riggs wrote:
> On 3 May 2013 14:40, Heikki Linnakangas wrote:
>> If we want to avoid adding a new option for this, how about a magic
>> restore point called "consistent" or "immediate":
>>
>> recovery_target_name='immediate'
>>
>> That would stop recovery right after reaching consistency, but there
>> wouldn't be an actual restore point record in the WAL stream.
>
> recovery_target_name='something' ...works for me. Either consistent or
> immediate works.
>
> I request that the docs recommend this be used in conjunction with
> pause_at_recovery_target = on, so that the user can begin inspecting
> the database at the first available point and then roll forward from
> that point if desired. That would cover my concern that this stopping
> point is arbitrary and not intrinsically worth stopping at in and of
> itself.

Sounds good. I've added this to the TODO.

> Can I suggest that we discuss a range of related changes together? So
> we have a roadmap of agreed changes in this area. That will be more
> efficient than discussing each one individually; often each one makes
> sense only as part of the wider context.

Sure, do you have something else in mind related to this?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery target 'immediate'
On 3 May 2013 14:40, Heikki Linnakangas wrote:
> On 03.05.2013 16:29, Bruce Momjian wrote:
>> On Fri, May 3, 2013 at 01:02:08PM +0200, Cédric Villemain wrote:
>>> This changes the existing API which will confuse people that know it
>>> and invalidate everything written in software and on wikis as to how
>>> to do it. That means all the "in case of fire break glass"
>>> instructions are all wrong and need to be rewritten and retested.
>>
>> Yes, *that* is the main reason *not* to make the change. It has a
>> pretty bad cost in backwards compatibility loss. There is a gain, but
>> I don't think it outweighs the cost.
>
> So, is there a way to add this feature without breaking the API?

Yes, by adding a new parameter exclusively used to control this
feature, something like recovery_target_immediate = 'on/off'.

>>> We just need to add a named restore point when ending the backup (in
>>> pg_stop_backup() ?). No API change required. Just document that some
>>> predefined target names are set during backup.
>>
>> So we auto-add a restore point based on the backup label. Does that
>> work for everyone?
>
> Unfortunately, no. There are cases where you want to stop right after
> reaching consistency, but the point where you reach consistency is not
> at the end of a backup. For example, if you take a backup using an
> atomic filesystem snapshot, there are no pg_start/stop_backup calls,
> and the system will reach consistency after replaying all the WAL in
> pg_xlog. You might think that you can just not create a recovery.conf
> file in that case, or create a dummy recovery.conf file with
> restore_command='/bin/false'. However, then the system will not find
> the existing timeline history files in the archive, and can pick a TLI
> that's already in use. I found this out the hard way, and actually
> ended up writing a restore_command that restores timeline history
> files normally, but returns non-zero for any other files; it wasn't
> pretty.
>
> Another case is that you take a backup from a standby server; you
> can't write a restore-point WAL record in a standby.
>
> If we want to avoid adding a new option for this, how about a magic
> restore point called "consistent" or "immediate":
>
> recovery_target_name='immediate'
>
> That would stop recovery right after reaching consistency, but there
> wouldn't be an actual restore point record in the WAL stream.

recovery_target_name='something' ...works for me. Either consistent or
immediate works.

I request that the docs recommend this be used in conjunction with
pause_at_recovery_target = on, so that the user can begin inspecting
the database at the first available point and then roll forward from
that point if desired. That would cover my concern that this stopping
point is arbitrary and not intrinsically worth stopping at in and of
itself.

Can I suggest that we discuss a range of related changes together? So
we have a roadmap of agreed changes in this area. That will be more
efficient than discussing each one individually; often each one makes
sense only as part of the wider context.

-- Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
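What the proposal would look like in use (the option spelling here is
the one floated in this thread; the committed syntax may end up
different):

    # recovery.conf
    restore_command = 'cp /mnt/server/archive/%f %p'
    recovery_target_name = 'immediate'  # stop as soon as consistency is reached
    pause_at_recovery_target = 'on'     # inspect first, roll forward if desired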
[HACKERS] Fast promotion failure
While testing the bug from the "Assertion failure at standby promotion"
thread, I bumped into a different bug in fast promotion. When the first
checkpoint after fast promotion is performed, there is no guarantee
that the checkpointer process is running with the correct, new,
ThisTimeLineID. In CreateCheckPoint(), we have this:

	/*
	 * An end-of-recovery checkpoint is created before anyone is allowed to
	 * write WAL. To allow us to write the checkpoint record, temporarily
	 * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is
	 * initialized, which we need here and in AdvanceXLInsertBuffer.)
	 */
	if (flags & CHECKPOINT_END_OF_RECOVERY)
		LocalSetXLogInsertAllowed();

That ensures that ThisTimeLineID is updated when performing an
end-of-recovery checkpoint, but it doesn't get executed with fast
promotion. The consequence is that the checkpoint is created with the
old timeline, and subsequent recovery from it will fail.

I ran into this with the attached script. It sets up a master (M), a
standby (B), and a cascading standby (C). I'm not sure why, but when I
tried to simplify the script by removing the cascading standby, it
started to work. The bug occurs in standby B, so I'm not sure why the
presence of the cascading standby makes any difference. Maybe it just
affects the timing.

- Heikki

fast-promotion-bug.sh
Description: application/shellscript

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)
Hello Robert,

>> This very minor patch adds a corresponding long option to all short
>> (one letter) options of pgbench. [...]
>
> I don't really have an opinion on whether this is worth doing, but
> we'd probably want to update all of our client utilities, not just
> pgbench, if we did.

The current status is that "official" clients already have systematic
long options. I have checked psql, pg_dump, pg_dumpall, pg_restore,
pg_basebackup, createdb, createuser, and createlang. Some commands in
contrib possibly still lack long options.

As for the rationale: when I type interactively I tend to use short
options, but in a script I like long options, so that I do not need to
look them up in the documentation as often when reading the script
later. The other rationale is that adding long options is cheap and
straightforward.

-- Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
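For illustration, the intended equivalence (long-option spellings as in
the submitted patch; they may differ in the committed version):

    $ pgbench -c 32 -j 4 -T 60 bench
    $ pgbench --client=32 --jobs=4 --time=60 bench   # same run, long form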
[HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement
Hello,

I have been trying to understand how the group commit implementation
works after the 9.2 group commit enhancement patch
(9b38d46d9f5517dab67dda1dd0459683fc9cda9f on REL9_2_STABLE). I admit
it's a pretty old commit, but I am seeking some clarification as to how
it provides the performance gain that it does. Also, I have observed
some behavior in this regard that I could not understand.

Profiling results show that XLogFlush is called about twice as often
after this patch, while the XLogWrite count remains about the same as
before. XLogFlush is the place where this patch offers the said
performance gain, by alleviating the lock contention on WALWriteLock
using the new LWLockAcquireOrWait(). So far so good. What I do not
understand is why XLogFlush is invoked twice as often (as seen from the
profiling results) as an effect of this patch.

I used "pgbench -c 32 -t 1000 pgbench" in both cases, with the TPS
result (after applying the patch) not being significantly different
(certainly not twice as much on my system). Using:

pgbench scale = 10
shared_buffers = 256MB
max_connections = 1000
checkpoint_segments = 15
synchronous_commit = on

Comments?

-- Amit Langote

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spoonbill vs. -HEAD
On 04/04/2013 02:18 AM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> On 04/03/2013 12:59 AM, Tom Lane wrote:
>>> BTW, on further thought it seems like maybe this is an OpenBSD bug,
>>> at least in part: what is evidently happening is that the temporary
>>> blockage of SIGINT during the handler persists even after we've
>>> longjmp'd back to the main loop. But we're using sigsetjmp(..., 1) to
>>> establish that longjmp handler --- so why isn't the original signal
>>> mask reinstalled when we return to the main loop? If (your version
>>> of) OpenBSD is getting this wrong, it'd explain why we've not seen
>>> similar behavior elsewhere.
>>
>> hmm, trawling the OpenBSD CVS history brings up this:
>> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/sparc64/machdep.c?r1=1.143;sortby=date#rev1.143
>
> That's about alternate signal stacks, which we're not using. I put
> together a simple test program (attached) and tried it on spoonbill,
> and it says that the signal *does* get unblocked when control returns
> to the sigsetjmp(...,1). So now I'm really confused. Somehow the
> results we're getting in a full-fledged backend do not match up with
> the results gotten by this test program ... but why?

As a followup to this - I spent some time upgrading spoonbill to the
latest OpenBSD release (5.3) the other day, and it now passes a full
regression test run when run manually. I will add it to the regular
reporting schedule again. It seems at least part of the problem was an
operating-system-level issue that got fixed in either 5.2 or 5.3
(spoonbill was on 5.1 before).

Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers