Re: [HACKERS] getting rid of freezing

2013-05-25 Thread Hannu Krosing
On 05/24/2013 07:00 PM, Robert Haas wrote:
 On Fri, May 24, 2013 at 11:29 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, May 24, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 [all-visible cannot restore hint bits without FPI because of torn pages]
 I haven't thought about this sufficiently yet. I think we might have
 a chance of working around this, let me ponder a bit.
 Yeah.  I too feel like there might be a solution.  But I don't
 have something specific in mind, yet anyway.
 One thought I had is that it might be beneficial to freeze when a page
 ceases to be all-visible, rather than when it becomes all-visible.
That's what I aimed to describe in my mail earlier, but your
description is much clearer :)
 Any operation that makes the page not-all-visible is going to emit an
 FPI anyway, so we don't have to worry about torn pages in that case.
 Under such a scheme, we'd have to enforce the rule that xmin and xmax
 are ignored for any page that is all-visible; 
Agreed. We already rely on all-visible pages enough that we
can trust them to be correct. Making that a universal rule should not
add any risks.
The rule "page all-visible == assume all tuples frozen" would
also enable VACUUM FREEZE to work only on the
non-all-visible pages.
 and when a page ceases
 to be all-visible, we have to go back and really freeze the
 pre-existing tuples.  
We can do this unconditionally, or in the milder case use vacuum_freeze_min_age
if we want to retain xids for forensic purposes.
 I think we might be able to use the existing
 all_visible_cleared/new_all_visible_cleared flags to trigger this
 behavior, without adding anything new to WAL at all.
This seems to be the easiest approach.
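To illustrate, a minimal sketch of the visibility rule under discussion.
The helper names here are assumptions for illustration, not the actual
PostgreSQL API:

#include <stdbool.h>

typedef struct Page Page;
typedef struct Tuple Tuple;
typedef struct Snapshot Snapshot;

extern bool page_is_all_visible(Page *page);                /* assumed */
extern bool tuple_satisfies_mvcc(Tuple *tup, Snapshot *s);  /* assumed */

/* If the page is marked all-visible, every tuple on it is treated as
 * visible/frozen and the xmin/xmax fields are ignored entirely;
 * otherwise fall through to the normal MVCC checks. */
static bool
tuple_is_visible(Page *page, Tuple *tup, Snapshot *snapshot)
{
    if (page_is_all_visible(page))
        return true;    /* xmin/xmax ignored under the proposed rule */
    return tuple_satisfies_mvcc(tup, snapshot);
}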

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Move unused buffers to freelist

2013-05-25 Thread Amit kapila
On Friday, May 24, 2013 8:22 PM Jim Nasby wrote:
On 5/14/13 8:42 AM, Amit Kapila wrote:
 In the attached patch, bgwriter/checkpointer moves unused (usage_count = 0 &&
 refcount = 0) buffers to the end of the freelist. I have implemented a new API
 StrategyMoveBufferToFreeListEnd() to move buffers to the end of the freelist.


 Instead of a separate function, would it be better to add an argument to 
 StrategyFreeBuffer? 

  Yes, it could be done with a parameter deciding whether to put the
buffer at the head or the tail of the freelist.
  However, the main focus currently is to check in which cases this optimization
can give a benefit.
  Robert ran tests for quite a number of cases where it doesn't show any
significant gain.
  I am also trying various configurations to see if it gives any benefit.
  Robert has given some suggestions to change the way the new function is
currently called;
  I will try them and update with the results.

  I am not very sure that the default pgbench workload is a good scenario to
test this optimization.
  If you have any suggestions for tests where it can show a benefit, that would
be great input.

 ISTM this is similar to the other strategy stuff in the buffer manager, so 
 perhaps it should mirror that...
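For concreteness, a standalone sketch of the single-entry-point idea — one
StrategyFreeBuffer-style function with a head/tail flag instead of a separate
StrategyMoveBufferToFreeListEnd(). The structures are toys for illustration,
not the actual freelist.c code:

#include <stddef.h>
#include <stdbool.h>

typedef struct BufDesc { struct BufDesc *free_next; } BufDesc;
typedef struct { BufDesc *head; BufDesc *tail; } FreeList;

/* at_tail = true makes the buffer the last victim candidate
 * (the bgwriter/checkpointer case); false makes it the next one. */
static void
free_buffer(FreeList *list, BufDesc *buf, bool at_tail)
{
    buf->free_next = NULL;
    if (list->head == NULL)
        list->head = list->tail = buf;        /* list was empty */
    else if (at_tail)
    {
        list->tail->free_next = buf;          /* append at the tail */
        list->tail = buf;
    }
    else
    {
        buf->free_next = list->head;          /* push at the head */
        list->head = buf;
    }
}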

With Regards,
Amit Kapila.



Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-25 Thread Amit kapila

On Thursday, May 23, 2013 4:40 PM Heikki Linnakangas wrote:
 Hi,

 I've been hacking on a tool to allow resynchronizing an old master
 server after failover. The need to do a full backup/restore has been a
 common complaint ever since we've had streaming replication. I saw on
 the wiki that this was discussed in the dev meeting; too bad I couldn't
 make it.

 In a nutshell, the idea is to copy everything that has changed
 between the clusters, like rsync does, but instead of reading through all
 files, use the WAL to determine what has changed. Here's a somewhat more
 detailed explanation, from the README:

This is really a nice idea and an important requirement from many users.

Does this tool handle all kinds of operations a user could have performed after
the new cluster forked off, or does it mandate that the user must not have
performed certain kinds of operations on the old cluster after the fork?

 Theory of operation
 ---

 The basic idea is to copy everything from the new cluster to the old, except
 for the blocks that we know to be the same.

 1. Scan the WAL log of the old cluster, starting from the point where
 the new cluster's timeline history forked off from the old cluster. For
 each WAL record, make a note of the data blocks that are touched. This
 yields a list of all the data blocks that were changed in the old
 cluster, after the new cluster forked off.
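For illustration, a toy sketch of the bookkeeping step 1 implies — a set of
(relfilenode, fork, block) references collected while scanning WAL. This is
an assumed data structure, not the actual pg_rewind code, which would need a
proper hash table:

#include <stdint.h>

typedef struct { uint32_t relfilenode; uint32_t forknum; uint32_t blkno; } BlockRef;

#define MAX_BLOCKS 1024            /* toy bound for the sketch */
static BlockRef changed[MAX_BLOCKS];
static int nchanged = 0;

/* Record one data block touched by a WAL record, deduplicating. */
static void
note_changed_block(uint32_t relfilenode, uint32_t forknum, uint32_t blkno)
{
    for (int i = 0; i < nchanged; i++)
        if (changed[i].relfilenode == relfilenode &&
            changed[i].forknum == forknum &&
            changed[i].blkno == blkno)
            return;                /* already noted */
    if (nchanged < MAX_BLOCKS)
        changed[nchanged++] = (BlockRef){ relfilenode, forknum, blkno };
}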

a. What if, after the new cluster forks off, a new relation gets created in
the old cluster? The tool might then not find that relation's blocks in the
new cluster; and if the new cluster also has a relation of the same name as
the old one but with different data, it might get an error when it starts
to replay the WAL of the new master, as mentioned in point-4.

b. What if, after the new cluster forks off, an update occurs on the old
cluster such that it has to put the new row version in a new heap page? The
WAL will then mention both the old and new row blocks (blk-1 and blk-2).
The new cluster may not have blk-2, so only blk-1 will be copied from the
new cluster; in that scenario, the old cluster would end up with two valid
versions of the same row.


With Regards,
Amit Kapila.



Re: [HACKERS] pg_export_snapshot on standby side

2013-05-25 Thread Simon Riggs
On 21 May 2013 19:16, Fujii Masao masao.fu...@gmail.com wrote:

 We cannot run parallel pg_dump on the standby server because
 pg_export_snapshot() always fails on the standby. Is this an oversight
 in parallel pg_dump or in pg_export_snapshot()?

 pg_export_snapshot() fails on the standby because it always assigns a
 new XID, which is not allowed on the standby. Do we really need
 to assign a new XID even on the standby for the exportable snapshot?

Having looked at the code, I say No, we don't *need* to.

There are various parts of the code that deal with
takenDuringRecovery, so much of this was clearly intended to work in
recovery.

We use the topXid for the name of the snapshot file. That is clearly
unnecessary and we should be using the virtualxid instead like we do
elsewhere. We also use the topXid to test whether it is still running,
though again, we could equally use the virtualxid instead. There is no
problem with virtualxids possibly not being active anymore, since if
we didn't have an xid before and don't have one now, and the xmin is
the same, the snapshot is still valid.

I think we should treat this as a bug and fix it in 9.3 while we're
still in beta. Why? Because once we begin using the topXid as the
filename we can't then change later to using the vxid.
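To make that concrete, a small sketch of the proposed naming change — a
virtual xid is a (backend id, local xid) pair, and it exists even on a
standby, where no real XID can be assigned. The format string and function
name here are illustrative, not the actual server code:

#include <stdio.h>
#include <inttypes.h>

/* Name the exported snapshot file after the virtual xid instead of the
 * top-level xid. */
static void
snapshot_file_name(char *buf, size_t len,
                   unsigned backend_id, uint64_t local_xid)
{
    snprintf(buf, len, "pg_snapshots/%08X-%016" PRIX64,
             backend_id, local_xid);
}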

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-25 Thread Ants Aasma
On Sat, May 25, 2013 at 10:05 AM, Amit kapila amit.kap...@huawei.com wrote:
 In a nutshell, the idea is to copy everything that has changed
 between the clusters, like rsync does, but instead of reading through all
 files, use the WAL to determine what has changed. Here's a somewhat more
 detailed explanation, from the README:

 This is really a nice idea and an important requirement from many users.

+1

 Does this tool handle all kinds of operations a user could have performed after
 the new cluster forked off, or does it mandate that the user must not have
 performed certain kinds of operations on the old cluster after the fork?

Truncate and all kinds of DROP come to mind, also table rewrites from
ALTER. The tool should probably just flag those relation files to be
copied wholesale.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




[HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-25 Thread Simon Riggs
There are a number of changes we'd probably like to make to the way
things work in Postgres. This thread is not about discussing what
those are, just to say that requirements exist and have been discussed
in various threads over time.

The constraint on such changes is that we've decided that we must have
an upgrade path from release to release.

So I'd like to make a formal suggestion of a plan for how we cope with this:

1. Implement online upgrade in 9.4 via the various facilities we have
in-progress. That looks completely possible.

2. Name the next release after that 10.0 (would have been 9.5). We
declare now that
a) 10.0 will support on-line upgrade from 9.4 (only)
b) various major incompatibilities will be introduced in 10.0 - the
change in release number will indicate to everybody that is the case
c) agree that there will be no pg_upgrade path from 9.4 to 10.0, so
that we will not be constrained by that

This plan doesn't presume any particular change. Each change would
need to be discussed on a separate thread, with a separate case for
each. All I'm suggesting is that we have a coherent plan for the
timing of such changes, so we can bundle them together into one
release.

By doing this now we give ourselves lots of time to plan changes that
will see us good for another decade. If we don't do this, then we
simply risk losing the initiative by continuing to support legacy
formats and approaches.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] getting rid of freezing

2013-05-25 Thread Simon Riggs
On 24 May 2013 17:00, Robert Haas robertmh...@gmail.com wrote:
 On Fri, May 24, 2013 at 11:29 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, May 24, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 [all-visible cannot restore hint bits without FPI because of torn pages]

 I haven't thought about this sufficiently yet. I think we might have
 a chance of working around this, let me ponder a bit.

 Yeah.  I too feel like there might be a solution.  But I don't
 have something specific in mind, yet anyway.

 One thought I had is that it might be beneficial to freeze when a page
 ceases to be all-visible, rather than when it becomes all-visible.
 Any operation that makes the page not-all-visible is going to emit an
 FPI anyway, so we don't have to worry about torn pages in that case.
 Under such a scheme, we'd have to enforce the rule that xmin and xmax
 are ignored for any page that is all-visible; and when a page ceases
 to be all-visible, we have to go back and really freeze the
 pre-existing tuples.  I think we might be able to use the existing
 all_visible_cleared/new_all_visible_cleared flags to trigger this
 behavior, without adding anything new to WAL at all.

I like the idea, but it would mean we'd have to freeze in the
foreground path rather than in a background path.

Have we given up on the double buffering idea to remove FPIs
completely? If we did that, then this wouldn't work.

Anyway, I take it the direction of this idea is that we don't need a
separate freezemap, just use the vismap. That seems to be forcing
ideas down a particular route we may regret. I'd rather just keep
those things separate, even if we manage to merge the WAL actions for
most of the time.


Some other related thoughts:

ISTM that if we really care about keeping xids for debug purposes,
it could be a parameter. For the mainline, we just freeze blocks at
the same time we do page pruning.

I think the right way is actually to rethink and simplify all this
complexity of Freezing/Pruning/Hinting/Visibility.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Processing long AND/OR lists

2013-05-25 Thread Gurjeet Singh
When Postgres encounters a long list of AND/OR chains, it errors out at
check_stack_depth() after a limit of a few thousand elements. At around 10,000
elements, the recursion at assign_expr_collations() causes the error. But
at a slightly higher element count, around 15,000, the recursion check errors
out a little earlier, in the stack around transformAExprAnd(). The test
queries were generated using the attached test.sh script.

This is not a synthetic test. Recently I saw a report where Slony generated
a huge list of 'log_actionseq <> nnn and log_actionseq <> nnn and ...' for
a SYNC event, and that SYNC event could not complete due to this error.
Granted that Slony can be fixed by making it generate the condition as
'log_actionseq not in (nnn, nnn)' which is processed non-recursively, but
IMHO Postgres should be capable of handling this trivial construct, however
long it may be (of course, limited by memory).

To that end, I wrote the attached patch, and although I seem to be playing
by the rules, `make check` fails. Some diffs look benign, but others not so
much.

AIUI, a BoolExpr is perfectly capable of holding multiple expressions as a
list. And since the SQL standard does not say anything about short-circuit
evaluation, the evaluation order of these expressions should not affect the
results. So clearly turning a tree of boolean expressions into a list is
not so subtle a change. I suspect that this patch is messing up the join
conditions.

I see two ways to fix it:
1) Fix the order of expressions in BoolExpr such that the order of
evaluation remains unchanged compared to before the patch.
I expect this to take care of the benign diffs. But I doubt this will
address the other diffs.

2) Construct the same tree as before the patch, but do it
non-recursively.
    This is, I guess, the right thing to do, but then we'll have to make
similar tree-traversal changes in other places, e.g. in the BoolExpr walking
in expression_tree_walker(), or maybe just in the stack below
assign_expr_collations().
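For illustration, a standalone sketch of option 2) on toy node types — an
iterative walk whose depth is bounded by memory rather than the C stack, so
check_stack_depth() never trips. The node types are assumed, not the
parser's real ones:

#include <stdlib.h>

typedef enum { NODE_AND, NODE_LEAF } NodeKind;
typedef struct Node { NodeKind kind; struct Node *left, *right; } Node;

/* Collect the leaves of a left-deep AND chain without recursing.
 * Note the leaves come out right-to-left; a caller must reverse them to
 * preserve evaluation order -- exactly the ordering issue noted above. */
static Node **
flatten_and_chain(Node *root, int *count)
{
    int cap = 16, n = 0;
    Node **leaves = malloc(cap * sizeof(Node *));

    while (root->kind == NODE_AND)
    {
        if (n == cap)
            leaves = realloc(leaves, (cap *= 2) * sizeof(Node *));
        leaves[n++] = root->right;  /* right side is a leaf in this chain */
        root = root->left;          /* descend iteratively, no recursion */
    }
    if (n == cap)
        leaves = realloc(leaves, (cap + 1) * sizeof(Node *));
    leaves[n++] = root;             /* the final, leftmost leaf */
    *count = n;
    return leaves;
}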

Best regards,
-- 
Gurjeet Singh

http://gurjeet.singh.im/

EnterpriseDB Inc.


non_recursive_and_or_transformation.patch
Description: Binary data


test.sh
Description: Bourne shell script



Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install

2013-05-25 Thread Cédric Villemain
  If it seems to be on the right track, I'll keep fixing EXTENSION building
  with VPATH.
 
 I haven't tried the patch, but let me just say that Debian (and
 apt.postgresql.org) would very much like to see the VPATH situation
 improved. At the moment we seem to have to invent a new build recipe
 for every extension around.

I have been busy the last week.
I just took time to inspect our contribs: USE_PGXS is not supported by all of
them atm because of SHLIB_PREREQS (it uses submake); I have a patch pending
here to fix that. Once all our contribs can build with USE_PGXS, I'll fix the
VPATH.

The last step is interesting: installcheck/REGRESS. For this one, if I can
know exactly what's required (for the Debian build, for example), then I can
also fix this target.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training




Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-25 Thread Merlin Moncure
On Sat, May 25, 2013 at 4:39 AM, Simon Riggs si...@2ndquadrant.com wrote:
 There are a number of changes we'd probably like to make to the way
 things work in Postgres. This thread is not about discussing what
 those are, just to say that requirements exist and have been discussed
 in various threads over time.

 The constraint on such changes is that we've decided that we must have
 an upgrade path from release to release.

 So I'd like to make a formal suggestion of a plan for how we cope with this:

 1. Implement online upgrade in 9.4 via the various facilities we have
 in-progress. That looks completely possible.

 2. Name the next release after that 10.0 (would have been 9.5). We
 declare now that
 a) 10.0 will support on-line upgrade from 9.4 (only)
 b) various major incompatibilities will be introduced in 10.0 - the
 change in release number will indicate to everybody that is the case
 c) agree that there will be no pg_upgrade path from 9.4 to 10.0, so
 that we will not be constrained by that

 This plan doesn't presume any particular change. Each change would
 need to be discussed on a separate thread, with a separate case for
 each. All I'm suggesting is that we have a coherent plan for the
 timing of such changes, so we can bundle them together into one
 release.

 By doing this now we give ourselves lots of time to plan changes that
 will see us good for another decade. If we don't do this, then we
 simply risk losing the initiative by continuing to support legacy
 formats and approaches.

Huh.  I don't think that bumping the version number to 10.0 vs 9.5 is
justification to introduce breaking changes.  In fact, I would rather
see 10.0 be the version where we formally stop doing that.  I
understand that some stuff needs to be improved but it often doesn't
seem to be worth the cost in the long run.

merlin




Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-25 Thread Jeff Davis
On Sat, 2013-05-25 at 10:39 +0100, Simon Riggs wrote:
 The constraint on such changes is that we've decided that we must have
 an upgrade path from release to release.

Is this proposal only relaxing the binary upgrade requirement, or would
it also relax other compatibility requirements, such as language and API
compatibility?

We need a couple major drivers of the incompatibility that really show
users some value for going through the upgrade pain. Preferably, at
least one would be a serious performance boost, because the users that
encounter the most logical upgrade pain are also the ones that need a
performance boost the most.

Before we set a specific schedule, I think it would be a good idea to
start prototyping some performance improvements that involve breaking
the data format. Then, depending on how achievable it is, we can plan
for however many more 9.X releases we think we need. That being said, I
agree with you that planning in advance is important here, so that
everyone knows when they need to get format-breaking changes in by.

Regards,
Jeff Davis 







Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-25 Thread Simon Riggs
On 25 May 2013 18:13, Jeff Davis pg...@j-davis.com wrote:
 On Sat, 2013-05-25 at 10:39 +0100, Simon Riggs wrote:
 The constraint on such changes is that we've decided that we must have
 an upgrade path from release to release.

 Is this proposal only relaxing the binary upgrade requirement, or would
 it also relax other compatibility requirements, such as language and API
 compatibility?

I'm suggesting that as many as possible of the changes we would like to make
happen in one release. This is for the benefit of users, so that we
don't make every release a source of incompatibilities.

And that release should be the first one where we have online upgrade
possible, which is one after next.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-25 Thread Jon Nelson
On Thu, May 16, 2013 at 7:05 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 5/16/13 9:16 AM, Jon Nelson wrote:

 Am I doing this the right way? Should I be posting the full patch each
 time, or incremental patches?


 There are guidelines for getting your patch in the right format at
 https://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
 that would improve this one.  You have some formatting issues with tab
 spacing at lines 120 through 133 in your v3 patch.  And it looks like there
 was a formatting change on line 146 that is making the diff larger than it
 needs to be.

I've corrected the formatting change (end-of-line whitespace was
stripped) on line 146.
The other whitespace changes are - I think - due to newly-indented
code resulting from a new code block.
Please find included a v4 patch which uses context diffs per the above URL.

 The biggest thing missing from this submission is information about what
 performance testing you did.  Ideally performance patches are submitted with
 enough information for a reviewer to duplicate the same test the author did,
 as well as hard before/after performance numbers from your test system.  It
 often turns tricky to duplicate a performance gain, and being able to run
 the same test used for initial development eliminates a lot of the problems.

This has been a bit of a struggle. While it's true that WAL file
creation doesn't happen with great frequency, and while it's also true
that - with strace and other tests - it can be proven that
fallocate(16MB) is much quicker than writing zeroes by hand,
proving that in the larger context of a running install has been
challenging.

Attached you'll find a small test script (t.sh) which creates a new
cluster in 'foo', changes some config values, starts the cluster, and
then times how long it takes pgbench to prepare a database. I've used
wal_level = hot_standby in the hopes that this generates the largest
number of WAL files (and I set the number of such files to 1024). The
hardware is an AMD 9150e with a 2-disk software RAID1 (SATA disks) on
kernel 3.9.2 and ext4 (x86_64, openSUSE 12.3). The test results are
not that surprising. The longer the test (the larger the scale factor)
the less of a difference using posix_fallocate makes. With a scale
factor of 100, I see an average of 10-11% reduction in the time taken
to initialize the database. With 300, it's about 5.5% and with 900,
it's between 0 and 1.2%. I will be doing more testing but this is what
I started with. I'm very open to suggestions.
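For reference, a minimal standalone comparison of the two allocation
strategies being measured (16MB segment, 8kB blocks, file name arbitrary,
error handling elided) — a sketch of the idea, not the patch itself:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SEG_SIZE (16 * 1024 * 1024)   /* one WAL segment */
#define XLOG_BLCKSZ 8192

/* The traditional path: fill the segment by writing zero blocks. */
static void
fill_with_zeroes(int fd)
{
    char buf[XLOG_BLCKSZ];
    memset(buf, 0, sizeof(buf));
    for (int written = 0; written < SEG_SIZE; written += XLOG_BLCKSZ)
        (void) write(fd, buf, sizeof(buf));   /* error handling elided */
}

int
main(void)
{
    int fd = open("walseg.tmp", O_CREAT | O_WRONLY | O_TRUNC, 0600);

    /* The patched path: one syscall reserves the whole segment. */
    if (posix_fallocate(fd, 0, SEG_SIZE) != 0)
        fill_with_zeroes(fd);                 /* fall back if unsupported */
    close(fd);
    return 0;
}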

 Second bit of nitpicking.  There are already some GUC values that appear or
 disappear based on compile time options.  They're all debugging related
 things though.  I would prefer not to see this one go away when its
 implementation isn't available.  That's going to break any scripts that SHOW
 the setting to see if it's turned on or not, as a first problem.  I think the
 right model to follow here is the IFDEF setup used for
 effective_io_concurrency.  I wouldn't worry about this too much though.
 Having a wal_use_fallocate GUC is good for testing.  But if it works out
 well, when it's ready for commit I don't see why anyone would want it turned
 off on platforms where it works.  There are already too many performance
 tweaking GUCs.  Something has to be very likely to be changed from the
 default before it's worth adding one for it.

Ack.  I've revised the patch to always have the GUC (for now), default
to false, and if configure can't find posix_fallocate (or the user
disables it by way of pg_config_manual.h) then it remains a GUC that
simply can't be changed.

I'll also be re-running the tests.

--
Jon
attachment: plot.png

fallocate-v4.patch
Description: Binary data


t.sh
Description: Bourne shell script



Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-25 Thread Bruce Momjian
On Sat, May 25, 2013 at 10:39:30AM +0100, Simon Riggs wrote:
 There are a number of changes we'd probably like to make to the way
 things work in Postgres. This thread is not about discussing what
 those are, just to say that requirements exist and have been discussed
 in various threads over time.
 
 The constraint on such changes is that we've decided that we must have
 an upgrade path from release to release.
 
 So I'd like to make a formal suggestion of a plan for how we cope with this:
 
 1. Implement online upgrade in 9.4 via the various facilities we have
 in-progress. That looks completely possible.
 
 2. Name the next release after that 10.0 (would have been 9.5). We
 declare now that
 a) 10.0 will support on-line upgrade from 9.4 (only)
 b) various major incompatibilities will be introduced in 10.0 - the
 change in release number will indicate to everybody that is the case
 c) agree that there will be no pg_upgrade path from 9.4 to 10.0, so
 that we will not be constrained by that

Assuming online upgrade is going to require logical replication, you are
also assuming 2x storage as you need to have a second cluster to perform
the upgrade.  pg_upgrade would still be needed to upgrade a cluster
in-place.

This sounds like: "I created a new tool which does some of what the old
tool does. Let's break the old tool to allow some unspecified changes I
might want to make." I consider this thread to be not thought through,
obviously.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] pg_export_snapshot on standby side

2013-05-25 Thread Fujii Masao
On Sat, May 25, 2013 at 6:18 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 21 May 2013 19:16, Fujii Masao masao.fu...@gmail.com wrote:

 We cannot run parallel pg_dump on the standby server because
 pg_export_snapshot() always fails on the standby. Is this an oversight
 in parallel pg_dump or in pg_export_snapshot()?

 pg_export_snapshot() fails on the standby because it always assigns a
 new XID, which is not allowed on the standby. Do we really need
 to assign a new XID even on the standby for the exportable snapshot?

 Having looked at the code, I say No, we don't *need* to.

Good to hear.

 There are various parts of the code that deal with
 takenDuringRecovery, so much of this was clearly intended to work in
 recovery.

 We use the topXid for the name of the snapshot file. That is clearly
 unnecessary and we should be using the virtualxid instead like we do
 elsewhere. We also use the topXid to test whether it is still running,
 though again, we could equally use the virtualxid instead. There is no
 problem with virtualxids possibly not being active anymore, since if
 we didn't have an xid before and don't have one now, and the xmin is
 the same, the snapshot is still valid.

 I think we should treat this as a bug and fix it in 9.3 while we're
 still in beta.

+1

 Why? Because once we begin using the topXid as the
 filename we can't then change later to using the vxid.

I'm afraid that pg_export_snapshot() in 9.2 has already been using
topXid as the filename.

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_export_snapshot on standby side

2013-05-25 Thread Alvaro Herrera
Fujii Masao wrote:
 On Sat, May 25, 2013 at 6:18 PM, Simon Riggs si...@2ndquadrant.com wrote:

  I think we should treat this as a bug and fix it in 9.3 while we're
  still in beta.
 
 +1
 
  Why? Because once we begin using the topXid as the
  filename we can't then change later to using the vxid.
 
 I'm afraid that pg_export_snapshot() in 9.2 has already been using
 topXid as the filename.

I don't think this matters at all.  The filename is an implementation
detail which we don't even need to keep compatible across pg_upgrade.
If we think this is a bug, we should backpatch the fix to 9.2.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] background worker and normal exit

2013-05-25 Thread Fujii Masao
Hi,

I found that a normal exit (i.e., with exit code 0) of a bgworker
always leads to its immediate restart, whatever bgw_restart_time is. Is
this intentional?
Shouldn't a normal exit, and not only a crash, respect bgw_restart_time?

I'm now writing a bgworker which is allowed to run only during recovery.
After recovery ends, that bgworker is expected to exit with code 0. I was
thinking to avoid restarting the bgworker after a normal exit by using
BGW_NEVER_RESTART.

Regards,

-- 
Fujii Masao




Re: [HACKERS] getting rid of freezing

2013-05-25 Thread Josh Berkus
Andres,

 all-visible is only set in vacuum, and it determines which parts of a
 table will be scanned in a non-full-table vacuum. So, since we won't
 regularly start vacuum in the insert-only case, there will still be a
 batch of work at once. But nearly all of that work is *already*
 performed; we would just move the details of it around a bit. *But* since
 we would now only need to vacuum the non-all-visible part, that would get
 noticeably cheaper as well.

Yeah, I can see that.  Seems worthwhile, then.

 I think for that case we should run vacuum more regularly on insert-only
 tables, since we currently don't do it regularly enough, which a) increases
 the amount of work needed at once and b) prevents index-only scans from
 working there.

Yes.   I'm not sure how we would set this though; I think it's another
example of how autovacuum's parameters for when to vacuum etc. are too
simple-minded for the real world.  Doing an all-visible scan on an
insert-only table, for example, should be based on XID age and not on %
inserted, no?
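To sketch what that might look like (assumed names and thresholds; only the
dead-tuple half of this condition exists in autovacuum today, with default
base threshold 50 and scale factor 0.2):

#include <stdbool.h>
#include <stdint.h>

/* Fire a vacuum either on dead tuples (today's rule) or -- for
 * insert-only tables -- on the age of the oldest unfrozen XID. */
static bool
needs_vacuum(double reltuples, double n_dead_tuples,
             uint32_t relfrozenxid_age, uint32_t age_threshold)
{
    double vacthresh = 50 + 0.2 * reltuples;   /* default formula */
    return n_dead_tuples > vacthresh || relfrozenxid_age > age_threshold;
}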

Speaking of which, I need to get on revamping the math for autoanalyze.

Mind you, in the real-world insert-only table case, this does create
extra IO -- real insert-only tables often have a few rows ( < 5% ) which
are updated/deleted.  Vacuum would see these and want to clean the pages
up, which would create much more substantial IO.  It might still be a
good tradeoff, but we should be aware of it.

Unless we want a special VACUUM ALL VISIBLE mode.  I vote no, unless we
demonstrate some really convincing case for it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] View Index and UNION

2013-05-25 Thread Stefan Keller
Hi

I've encountered a fundamental problem which - to me - can only be
solved with a (future/possible) real index on views in PostgreSQL
(like those that already exist in MS SQL Server and Oracle):

Given following schema:

1. TABLE a and TABLE b, each with INDEX on attribute geom.

2. A VIEW with union:

CREATE VIEW myview AS
  SELECT * FROM a
  UNION
  SELECT * FROM b;

3. And a simple query with KNN index and a coordinate mypos :

SELECT * FROM myview
ORDER BY ST_Geomfromtext(mypos) <-> myview.geom

Now, the problem is that for the ORDER BY it is not enough that
each of the two tables calculates the ordering separately: we want a
total ordering over all involved tables!

In fact, the planner realizes that and chooses a seq scan over all
tuples of tables a and b - which is slow and suboptimal!
To me, that's a use case where we would wish to have a distinct index on views.

Any opinions on this?

Yours, Stefan




Re: [HACKERS] View Index and UNION

2013-05-25 Thread William King
Could this scenario not be handled by a step that orders the two tables
independently, then for the view interleaves the presorted results?
Merging two sorted sets into a single sorted set is usually a trivial
task, and it could still take advantage of the existing indexes.
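For example, the merge step itself is a few lines (distances assumed
precomputed and ascending; a real executor node would instead pull lazily
from each index scan, which is what makes KNN queries with LIMIT cheap):

#include <stddef.h>

/* Merge two ascending arrays of distances into one ascending array.
 * out must have room for na + nb elements. */
static void
merge_sorted(const double *a, size_t na,
             const double *b, size_t nb, double *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];          /* drain the remainder of a */
    while (j < nb)
        out[k++] = b[j++];          /* drain the remainder of b */
}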

William King
Senior Engineer
Quentus Technologies, INC
1037 NE 65th St Suite 273
Seattle, WA 98115
Main:   (877) 211-9337
Office: (206) 388-4772
Cell:   (253) 686-5518
william.k...@quentustech.com

On 05/25/2013 05:35 PM, Stefan Keller wrote:
 Hi
 
 I've encountered a fundamental problem which - to me - can only be
 solved with a (future/possible) real index on views in PostgreSQL
 (like those that already exist in MS SQL Server and Oracle):
 
 Given following schema:
 
 1. TABLE a and TABLE b, each with INDEX on attribute geom.
 
 2. A VIEW with union:
 
 CREATE VIEW myview AS
   SELECT * FROM a
   UNION
   SELECT * FROM b;
 
 3. And a simple query with KNN index and a coordinate mypos :
 
 SELECT * FROM myview
 ORDER BY ST_Geomfromtext(mypos) <-> myview.geom
 
 Now, the problem is that for the ORDER BY it is not enough that
 each of the two tables calculates the ordering separately: we want a
 total ordering over all involved tables!
 
 In fact, the planner realizes that and chooses a seq scan over all
 tuples of tables a and b - which is slow and suboptimal!
 
 To me, that's a use case where we would wish to have a distinct index on 
 views.
 
 Any opinions on this?
 
 Yours, Stefan
 
 




Re: [HACKERS] background worker and normal exit

2013-05-25 Thread Michael Paquier
On Sun, May 26, 2013 at 6:16 AM, Fujii Masao masao.fu...@gmail.com wrote:

 I found that a normal exit (i.e., with exit code 0) of a bgworker
 always leads to its immediate restart, whatever bgw_restart_time is. Is
 this intentional?

Yes, per the docs:
http://www.postgresql.org/docs/devel/static/bgworker.html
Background workers are expected to be continuously running; if they exit
cleanly, postgres will restart them immediately.


 Shouldn't a normal exit, and not only a crash, respect bgw_restart_time?

bgw_restart_time determines the interval before a bgworker is restarted, and
it applies in case of a crash only.


 I'm now writing a bgworker which is allowed to run only during recovery.
 After recovery ends, that bgworker is expected to exit with code 0. I was
 thinking to avoid restarting the bgworker after a normal exit by using
 BGW_NEVER_RESTART.

This flag prevents a worker from restarting only in case of a crash. To solve
your problem, you could instead allow your process to restart and put it into
an indefinite sleep while the server is not in recovery, so that it will do
nothing in your case.
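A rough sketch of that approach, assuming the 9.3 bgworker API —
RecoveryInProgress() and pg_usleep() exist in the server, while
do_recovery_work() is a hypothetical placeholder for the worker's actual job
(headers and worker registration omitted):

/* Worker main loop that idles once recovery ends, instead of exiting
 * (which would trigger an immediate restart). */
void
worker_main(Datum main_arg)
{
    for (;;)
    {
        if (RecoveryInProgress())
            do_recovery_work();       /* only useful while in recovery */
        else
            pg_usleep(60 * 1000000L); /* recovery over: idle, don't exit */
        /* a production worker would use WaitLatch() so the sleep is
         * interruptible at postmaster shutdown */
    }
}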

Regards,
-- 
Michael


Re: [HACKERS] Processing long AND/OR lists

2013-05-25 Thread Gurjeet Singh
On Sat, May 25, 2013 at 9:56 AM, Gurjeet Singh gurj...@singh.im wrote:

 When Postgres encounters a long list of AND/OR chains, it errors out at
 check_stack_depth() after a limit of a few thousand elements. At around
 10,000 elements, the recursion at assign_expr_collations() causes the error.
 But at a slightly higher element count, around 15,000, the recursion check
 errors out a little earlier, in the stack around transformAExprAnd(). The
 test queries were generated using the attached test.sh script.

 This is not a synthetic test. Recently I saw a report where Slony
 generated a huge list of 'log_actionseq <> nnn and log_actionseq <> nnn and
 ...' for a SYNC event, and that SYNC event could not complete due to this
 error. Granted that Slony can be fixed by making it generate the condition
 as 'log_actionseq not in (nnn, nnn)' which is processed non-recursively,
 but IMHO Postgres should be capable of handling this trivial construct,
 however long it may be (of course, limited by memory).

 To that end, I wrote the attached patch, and although I seem to be playing
 by the rules, `make check` fails. Some diffs look benign, but others not so
 much.

 AIUI, a BoolExpr is perfectly capable of holding multiple expressions as a
 list. And since the SQL standard does not say anything about short-circuit
 evaluation, the evaluation order of these expressions should not affect the
 results. So clearly turning a tree of boolean expressions into a list is
 not so subtle a change. I suspect that this patch is messing up the join
 conditions.

 I see two ways to fix it:
 1) Fix the order of expressions in BoolExpr such that the order of
 evaluation remains unchanged compared to before the patch.
 I expect this to take care of the benign diffs. But I doubt this will
 address the other diffs.

 2) Construct the same tree as before the patch, but do it
 non-recursively.
 This is, I guess, the right thing to do, but then we'll have to make
 similar tree-traversal changes in other places, e.g. in the BoolExpr walking
 in expression_tree_walker(), or maybe just in the stack below
 assign_expr_collations().


As suspected, the problem was that a JOIN's USING clause processing cooks
up its own join conditions, and those were the ones that couldn't cope with
the changed order of output.

Fixing the order of arguments of the BoolExpr also fixed all the
JOIN-related diffs. Now `make check` passes except for this benign diff:

  |   WHERE (((rsl.sl_color = rsh.slcolor)
AND (rsl.sl_len_cm >= rsh.slminlen_cm)) AND (rsl.sl_len_cm <=
rsh.slmaxlen_cm));
  |   WHERE ((rsl.sl_color = rsh.slcolor)
AND (rsl.sl_len_cm >= rsh.slminlen_cm) AND (rsl.sl_len_cm <=
rsh.slmaxlen_cm));

That's expected, since for cases like these, the patch converts what used
to be a tree of 2-argument BoolExprs into a single BoolExpr with many
arguments.

-- 
Gurjeet Singh

http://gurjeet.singh.im/

EnterpriseDB Inc.


non_recursive_and_or_transformation_v2.patch
Description: Binary data



Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install

2013-05-25 Thread Christoph Berg
Re: Cédric Villemain 2013-05-25 201305251641.28401.ced...@2ndquadrant.com
 I just took time to inspect our contribs: USE_PGXS is not supported by all of
 them atm because of SHLIB_PREREQS (it uses submake); I have a patch pending
 here to fix that. Once all our contribs can build with USE_PGXS, I'll fix the
 VPATH.

The evil part of the problem is that you'd need to fix it all the way
back to 8.4 (or 9.0 once 8.4 is EOL) if we want it to build extensions
on apt.pg.org. (Or find a way that is compatible with as many majors
as possible.)

 The last step is interesting: installcheck/REGRESS. For this one, if I can
 know exactly what's required (for the Debian build, for example), then I can
 also fix this target.

Debian builds don't have root access, so make installcheck would
need to install into a temp dir (probably DESTDIR). Currently this
fails because the server will still insist on reading the extension
control files from PGSHAREDIR. (See the thread starting at
http://www.postgresql.org/message-id/20120221101903.GA15647@gnash for
details.)

Other than that, and my pending pg_regress --host /tmp patch,
things should probably be OK for buildds.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/

