date:20121210

Re: [HACKERS] allowing multiple PQclear() calls

2012-12-10 Thread Tom Lane

Josh Kupershmidt  writes:
> Would it be crazy to add an "already_freed" flag to the pg_result
> struct which PQclear() would set, or some equivalent safety mechanism,
> to avoid this hassle for users?

Yes, it would.  Once the memory has been freed, malloc() is at liberty
to give it out for some other purpose.  The only way we could make that
work reliably is to permanently leak the memory occupied by the PGresult
... which I trust you will agree is a cure worse than the disease.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-12-10 Thread Darren Duncan


I agree with Jeff.

Options that change the language at initdb or create-database time just fragment 
the language.


It is best to just have 1 language where options are providable either 
dynamically per connection or otherwise lexically, so that then they are really 
just shorthands for the current local usage, and the language as a whole is the 
same.


That also means you can have example code out there and know it will work on any 
Postgres install, invariant of static global options.  If language modifiers are 
local or lexical, then any example code presumably would include the switches to 
turn them on for that example.


That all being said, I also think it is best to explicitly overload operators 
with extra parameter types, such as defining another operator with the signature 
of (Nunber,String) with the same base name as string catenation, rather than 
making numbers implicitly stringify.  But I can also accept implicit 
stringification / language behavior changes if it is a lexical/temporary effect 
that the same user is still explicitly turning on.


-- Darren Duncan

Jeff Davis wrote:
On Mon, 2012-12-10 at 14:07 -0500, Robert Haas wrote: 

And we not only don't give them the behavior they want; we
don't even have a meaningful way to give the option of opting into
that behavior at initdb or create-database time.


I strongly object to offering options that change the language in such a
substantial way. initdb-time options still mean that we are essentially
dividing our language, and therefore the applications that support
postgres, in half (or worse). One of the things I really like about
postgres is that we haven't forked the language with a million options
like mysql has. I don't even like the fact that we have a GUC to control
the output format of a BYTEA.

For every developer who says "wow, that mysql query just worked without
modification" there is another one who says "oh, I forgot to test with
option XYZ... postgres is too complex to support, I'm going to drop it
from the list of supported databases".

I still don't see a compelling reason why opting out of overloading on a
per-function basis won't work. Your objections seemed fairly minor in
comparison to how strongly you are advocating this use case.

In particular, I didn't get a response to:

http://archives.postgresql.org/message-id/1354055056.1766.50.camel@sussancws0025

For what it's worth, I'm glad that people like you are pushing on these
usability issues, because it can be hard for insiders to see them
sometimes.

Regards,
Jeff Davis



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

2012-12-10 Thread Kyotaro HORIGUCHI

Hello, I've also found this does not fix this problem.

> >> So I'd say we should update minRecoveryPoint first, then
> >> truncate/delete. But we should still keep the XLogFlush() at the end of
> >> xact_redo_commit_internal(), for the case where files/directories are
> >> created. Patch attached.
> 
> Sounds reasonable.

It makes perfectly sense.

> > Committed and backpatched that. Attached is a script I used to reproduce
> > this problem, going back to 8.4.
> 
> Thanks!
> 
> Unfortunately I could reproduce the problem even after that commit.
> Attached is the script I used to reproduce the problem.

Me too.

> The cause is that CheckRecoveryConsistency() is called before rm_redo(),
> as Horiguchi-san suggested upthead. Imagine the case where
> minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE
> record. When restarting the server with that minRecoveryPoint,
> the followings would happen, and then PANIC occurs.
> 
> 1. XLOG_SMGR_TRUNCATE record is read.
> 2. CheckRecoveryConsistency() is called, and database is marked as
> consistent since we've reached minRecoveryPoint (i.e., the location
> of XLOG_SMGR_TRUNCATE).
> 3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is
> found.
> 4. Since the database has already been marked as consistent, an invalid
> page leads to PANIC.

Exactly.

In smgr_redo, EndRecPtr which is pointing the record next to
SMGR_TRUNCATE, is used as the new minRecoveryPoint. On the other
hand, during the second startup of the standby,
CheckRecoveryConsistency checks for consistency by
XLByteLE(minRecoveryPoint, EndRecPtr) which should be true at
just BEFORE the SMGR_TRUNCATE record is applied. So
reachedConsistency becomes true just before the SMGR_TRUNCATE
record will be applied. Bang!

I said I had no objection to placing CheckRecoveryConsistency
both before and after of rm_redo in previous message, but it was
wrong. Given aminRecoveryPoint value, it should be placed after
rm_redo from the point of view of when the database should be
considered to be consistent.

Actually, simply moving CheckRecoeverConsistency after rm_redo
turned into succeessfully startup, ignoring the another reason
for it should be before, which is unknown to me.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-12-10 Thread Jeff Davis

On Mon, 2012-12-10 at 14:07 -0500, Robert Haas wrote: 
> And we not only don't give them the behavior they want; we
> don't even have a meaningful way to give the option of opting into
> that behavior at initdb or create-database time.

I strongly object to offering options that change the language in such a
substantial way. initdb-time options still mean that we are essentially
dividing our language, and therefore the applications that support
postgres, in half (or worse). One of the things I really like about
postgres is that we haven't forked the language with a million options
like mysql has. I don't even like the fact that we have a GUC to control
the output format of a BYTEA.

For every developer who says "wow, that mysql query just worked without
modification" there is another one who says "oh, I forgot to test with
option XYZ... postgres is too complex to support, I'm going to drop it
from the list of supported databases".

I still don't see a compelling reason why opting out of overloading on a
per-function basis won't work. Your objections seemed fairly minor in
comparison to how strongly you are advocating this use case.

In particular, I didn't get a response to:

http://archives.postgresql.org/message-id/1354055056.1766.50.camel@sussancws0025

For what it's worth, I'm glad that people like you are pushing on these
usability issues, because it can be hard for insiders to see them
sometimes.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Jeff Davis

On Mon, 2012-12-10 at 08:16 -0500, Stephen Frost wrote: 
> I'm trying to figure out why there are all the constraints around this
> command to begin with.  If we're going to support this, why do we
> require the user to create or truncate the table in the same
> transaction?  Clearly that's going to reduce the usefulness of this
> feature and it's not clear to me why that constraint is needed,
> code-wise.

There is a very specific reason:

If you insert frozen tuples into a table that already has tuples in it,
then you no longer have just an isolation issue, you have an *atomicity*
issue (and probably a number of other serious issues) because you can't
roll back. Doing it in the same transaction as the table was created
leaves you with a way to roll back: just delete the table (which will
happen implicitly because the creation is part of the same transaction
anyway).

Perhaps we can take some partial steps toward MVCC-safe access? For
instance, how about trying to recheck the xmin of a pg_class tuple when
starting a scan?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] allowing multiple PQclear() calls

2012-12-10 Thread Josh Kupershmidt

The documentation for PQclear() doesn't say whether it is safe to call
PQclear() more than once on the same PGresult pointer. In fact, it is
not safe, but apparently only because of this last step:
/* Free the PGresult structure itself */
free(res);

The other members of PGresult which may be freed by PQclear are set to
NULL or otherwise handled so as not to not be affected by a subsequent
PQclear().

I find that accounting for whether I've already PQclear'ed a given
PGresult can be quite tedious in some cases. For example, in the
cleanup code at the end of a function where control may goto in case
of a problem, it would be much simpler to unconditionally call
PQclear() without worrying about whether this was already done. One
can see an admittedly small illustration of this headache in
pqSetenvPoll() in our own codebase, where several times PQclear(res);
is called immediately before a goto error_return;

Would it be crazy to add an "already_freed" flag to the pg_result
struct which PQclear() would set, or some equivalent safety mechanism,
to avoid this hassle for users?

Josh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Pavan Deolasee

On Mon, Dec 10, 2012 at 7:02 PM, Stephen Frost  wrote:

>
> I continue to hold that this could end up being a slippery slope for us
> to go down wrt 'correctness' vs. 'do whatever the user wants'.  If we
> keep this to only COPY and where the table has to be truncated/created
> in the same transaction (which requires the user to have sufficient
> privileges to do non-MVCC-safe things on the table already), perhaps
> it's alright.  It'll definitely reduce the interest in finding a real
> solution though, which is unfortunate.
>

I wonder if something more complete can be done by forcing COPY FREEZE
or whatever we call it to take an exclusive lock on the table and run
loading as an append-only operation. By taking a strong lock, we will
block out any concurrent read/writes to the table. If an error occurs
while loading the data, the table will be truncated at the previously
recorded size. We may need some additional book keeping and WAL
logging to handle crash recovery.

To solve the visibility issue for old snapshots that should not see
the new data, we can store some additional visibility information in
the pg_class itself. For example, we can store the size of the table
before the COPY FREEZE started and the XID of the COPY FREEZE
operation. An old snapshot that can not see the XID, can not see the
tuples inserted in the new blocks either. Once the COPY FREEZE
finishes and the lock on the relation is released, new transactions
can start writing to the table and write past the old size of the
table. But that should be OK. If an old snapshot can't see the  tuples
inserted by COPY FREEZE, AFAIK it can't see any of those other tuples
as well.

I'm sure there will still be challenges with this approach. But I
wonder if we can guarantee correctness by proper use of
synchronization and still avoid multiple writes for most common data
loading scenarios.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] pg_upgrade -o/-O regression in 9.2.2

2012-12-10 Thread Bruce Momjian

On Tue, Dec 11, 2012 at 12:17:11AM +0200, Marti Raudsepp wrote:
> Hi!
> 
> It seems that PostgreSQL 9.2.2 has a regression in pg_upgrade, the -o
> and -O options forget to add a space before passing on user options,
> thereby generating unparsable command lines.
> 
> For example:
> pg_upgrade -b /usr/local/pg91/bin -B /usr/bin -d /tmp/91 -D /tmp/92 -O -F
> [...]
> Creating catalog dump   ok
> *failure*
> could not connect to new postmaster started with the command:
> "/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/tmp/92" -o "-p
> 50432 -b -c synchronous_commit=off-F -c listen_addresses='' -c
> unix_socket_permissions=0700 -c unix_socket_directory='/tmp'" start
> 
> Notice the bad argument "synchronous_commit=off-F"
> 
> It's easy enough to work around by adding a space to the command line,
> passing -O ' -F' instead of -O '-F'
> 
> Here's the bad commit:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ed5699dd1b883e193930448b7ad532e233de0bd7;hp=5ed6546cf75623ba426942a3b71659a66cf7ed68
> 
> The attached patch re-introduces the space at the necessary place.

I was super-paranoid about making any changes in that area, but it seems
I wasn't paranoid enough.

Patch applied to head and 9.2.  Thanks for the workaround idea too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Noah Misch

On Mon, Dec 10, 2012 at 08:54:04PM -0500, Stephen Frost wrote:
> I agree that it's unlikely that
> applications are depending on today's behavior of TRUNCATE making
> concurrent transactions see an empty table, but it does *not* follow
> that applications *won't* start depending on this new behavior of COPY
> FREEZE.

That is a good point.  I'm not sure whether that should worry us enough to
implement an error in the scenario before letting COPY write frozen tuples.
But it's the strongest argument I've seen for imposing that prerequisite.

> Even if we could fix that, I'd be against allowing it arbitrairly for
> any regular user INSERT or UPDATE; I'm still not particularly happy with
> this approach for COPY.

Sure; COPY is the 99% important case here.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Multiple --table options for other commands

2012-12-10 Thread Karl O. Pinc

On 10/30/2012 10:14:19 PM, Josh Kupershmidt wrote:

> I went ahead and cooked up a patch to allow pg_restore, clusterdb,
> vacuumdb, and reindexdb to accept multiple --table switches. Hope I
> didn't overlook a similar tool, but these were the only other 
> commands
> I found taking a --table argument.

I believe you need ellipses behind --table in the syntax summaries
of the command reference docs.

(This was all I noticed while reading the patch.)

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Noah Misch

On Mon, Dec 10, 2012 at 08:04:55PM -0500, Robert Haas wrote:
> I think the current behavior, where we treat FREEZE as a hint, is just
> awful.  Regardless of whether the behavior is automatic or manually
> requested, the idea that you might get the optimization or not
> depending on the timing of relcache flushes seems very much
> undesirable.  I mean, if the optimization is actually important for
> performance, then you want to get it when you ask for it.  If it
> isn't, then why bother having it at all?  Let's say that COPY FREEZE
> normally doubles performance on a data load that therefore takes 8
> hours - somebody who suddenly loses that benefit because of a relcache
> flush that they can't prevent or control and ends up with a 16 hour
> data load is going to pop a gasket.

Until these threads, I did not know that a relcache invalidation could trip up
the WAL avoidance optimization, and now this.  I poked at the relevant
relcache.c code, and it already takes pains to preserve the needed facts.  The
header comment of RelationCacheInvalidate() indicates that entries bearing an
rd_newRelfilenodeSubid can safely survive the invalidation, but the code does
not implement that.  I think the comment is right, and this is just an
oversight in the code going back to its beginning (fba8113c).

I doubt the comment at the declaration of rd_createSubid in rel.h, though I
can't presently say what correct comment should replace it.  CLUSTER does
preserve the old value, at least for the main table relation.  CLUSTER
probably should *set* rd_newRelfilenodeSubid.

Thanks,
nm
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
***
*** 2149,2156  RelationCacheInvalidate(void)
/* Must close all smgr references to avoid leaving dangling 
ptrs */
RelationCloseSmgr(relation);

!   /* Ignore new relations, since they are never cross-backend 
targets */
!   if (relation->rd_createSubid != InvalidSubTransactionId)
continue;

relcacheInvalsReceived++;
--- 2149,2162 
/* Must close all smgr references to avoid leaving dangling 
ptrs */
RelationCloseSmgr(relation);

!   /*
!* Ignore new relations; no other backend will manipulate them 
before
!* we commit.  Likewise, before replacing a relation's 
relfilenode, we
!* shall have acquired AccessExclusiveLock and drained any 
applicable
!* pending invalidations.
!*/
!   if (relation->rd_createSubid != InvalidSubTransactionId ||
!   relation->rd_newRelfilenodeSubid != 
InvalidSubTransactionId)
continue;

relcacheInvalsReceived++;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doc patch, further describe and-mask nature of the permission system

2012-12-10 Thread Karl O. Pinc

On 11/14/2012 02:35:54 PM, Karl O. Pinc wrote:
> On 11/13/2012 08:50:55 PM, Peter Eisentraut wrote:
> > On Sat, 2012-09-29 at 01:16 -0500, Karl O. Pinc wrote:
> > > This patch makes some sweeping statements.
> > 
> > Unfortunately, they are wrong.
> 
> I will see if anything can be salvaged.

Here's another try.
(I bundled changes to both paragraphs into a single
patch.)

grants-of-roles-are-additive_v3.patch

Regards,


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein

diff --git a/doc/src/sgml/ref/grant.sgml b/doc/src/sgml/ref/grant.sgml
index fb81af4..b57000c 100644
--- a/doc/src/sgml/ref/grant.sgml
+++ b/doc/src/sgml/ref/grant.sgml
@@ -429,11 +429,32 @@ GRANT role_name [, ...] TO 
 

-A user may perform SELECT, INSERT, etc. on a
-column if he holds that privilege for either the specific column or
-its whole table.  Granting the privilege at the table level and then
+Permission granted to a table grants permission to all the columns
+of a table, regardless of permissions granted to the table's
+columns.  Granting a privilege at the table level and then
 revoking it for one column will not do what you might wish: the
-table-level grant is unaffected by a column-level operation.
+table-level grant is unaffected by a column-level operation.  But
+revoking permission at the table level and granting it at the
+column level does grant permission to the column.
+   
+
+   
+Roles can be fashioned into a permission hierarchy.  Roles having
+the INHERIT attribute (the default) that are
+assigned to other roles in a hierarchical fashion produce a
+permission system which behaves in the fashion of the
+table.column hierarchy.
+E.g. a user's login role can be assigned a role of
+accountant which is in turn assigned a role of
+employee.  The user would have all the permissions of
+an accountant regardless of whether these permissions
+are revoked from the employee role.  And, by
+virtue of the employee/accountant role
+hierarchy, accountants also have all permissions
+granted to employees.  Unlike the fixed
+table.column hierarchy the
+PostgreSQL user is free to fashion roles into
+arbitrary hierarchical structures.

 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
> You know, I hadn't been taking that option terribly seriously, but
> maybe we ought to reconsider it.  It would certainly be simpler, and
> as you point out, it's not really any worse from an MVCC point of view
> than anything else we do.  Moreover, it would make this available to
> clients like pg_dump without further hackery.

I really don't agree with this notion that the behavior of TRUNCATE, a
top-level, seperately permissioned command, makes it OK to introduce
other busted behavior in existing commands.

> I think the current behavior, where we treat FREEZE as a hint, is just
> awful.  

I agree that it's pretty grotty, but I had assumed it was at least
deterministic, ala TRUNCATE/COPY and WAL...  If it isn't, then this
certainly gets really ugly really quickly.

I don't think that means we should go ahead and try to always optimize
it though- even when it isn't explicit, there will be an expectation
that it's going to work when all the 'right' conditions are met.  I know
that's certainly how I feel about TRUNCATE/COPY and WAL'ing.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Stephen Frost

Noah,

* Noah Misch (n...@leadboat.com) wrote:
> I agree we should be reticent to compromise correctness for convenience.
> Compromising mere bug-compatibility, trading one incorrect behavior for
> another incorrect behavior, is not as bad.  Furthermore, today's behavior in
> question is not something I can see applications deliberately and successfully
> relying upon.

I actually don't agree with the notion that one bad bug should allow us
to introduce additional such bugs.  I agree that it's unlikely that
applications are depending on today's behavior of TRUNCATE making
concurrent transactions see an empty table, but it does *not* follow
that applications *won't* start depending on this new behavior of COPY
FREEZE.

> Extending it to cases not involving a just-created or just-truncated table
> really would compromise correctness; errors could leave the table in an
> otherwise-impossible state.  Let's indeed not go there.

Even if we could fix that, I'd be against allowing it arbitrairly for
any regular user INSERT or UPDATE; I'm still not particularly happy with
this approach for COPY.

> > It'll definitely reduce the interest in finding a real
> > solution though, which is unfortunate.
> 
> That effect seems likely, but I do not find it unfortunate.  The change
> variant I have advocated does not stand in contrast to some "real solution" to
> PostgreSQL's treatment for readers of tables created or truncated by a
> transaction not in the reader's snapshot.  The two topics interact at arm's
> length.  Bundling them into one patch, artificially making them to stand or
> fall as a pair, is not a win for PostgreSQL.

Having proper MVCC support for DDL *would* be a win for PostgreSQL and
this *does* reduce the chances of that ever happening.

> That does raise another disadvantage of making the change syntax-controlled:
> if we someday implement the other improvement, COPY FREEZE will have minimal
> reason not to be the default.  FREEZE then becomes a relic noise word.

Indeed, that's certainly unfortunate as well.  Really, though, it just
goes to show how much of a hack this is rather than a real solution.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] [v9.3] OAT_POST_ALTER object access hooks

2012-12-10 Thread Robert Haas

On Mon, Dec 3, 2012 at 9:59 AM, Kohei KaiGai  wrote:
> As we discussed before, it is hard to determine which attributes shall
> be informed to extension via object_access_hook, so the proposed
> post-alter hook (that allows to compare older and newer versions)
> works fine on 99% cases.
> However, I'm inclined to handle SET TABLESPACE as an exception
> of this scenario. For example, an idea is to define OAT_PREP_ALTER
> event additionally, but only invoked very limited cases that takes
> unignorable side-effects until system catalog updates.
> For consistency of hook, OAT_POST_ALTER event may also ought
> to be invoked just after catalog updates of pg_class->reltablespace,
> but is_internal=true.
>
> How about your opinion?

I don't really like it - I think POST should mean POST.  You can
define some other hook that gets called somewhere else, but after
means after.  There are other plausible uses of these hooks besides
sepgsql; for example, logging the completion time of an action.
Putting the hooks in the "wrong" places because that happens to be
more convenient for sepgsql is going to render them useless for any
other purpose.  Maybe nobody else will use them anyway, but I'd rather
not render it impossible before we get off the ground.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Robert Haas

On Sun, Dec 9, 2012 at 3:06 PM, Noah Misch  wrote:
> I favor[1] unconditionally letting older snapshots see the new rows after the
> CREATE+COPY transaction commits.  To recap, making affected scans see an empty
> table is as wrong as making them see those rows.  Robert also listed[2] that
> as a credible option, and I don't recall anyone opining against it in previous
> discussions.  I did perceive an undercurrent preference, all other things
> being equal, for an optimization free from semantic side-effects.  I shared
> that preference, but investigations showed that we must compromise something.

You know, I hadn't been taking that option terribly seriously, but
maybe we ought to reconsider it.  It would certainly be simpler, and
as you point out, it's not really any worse from an MVCC point of view
than anything else we do.  Moreover, it would make this available to
clients like pg_dump without further hackery.

I think the current behavior, where we treat FREEZE as a hint, is just
awful.  Regardless of whether the behavior is automatic or manually
requested, the idea that you might get the optimization or not
depending on the timing of relcache flushes seems very much
undesirable.  I mean, if the optimization is actually important for
performance, then you want to get it when you ask for it.  If it
isn't, then why bother having it at all?  Let's say that COPY FREEZE
normally doubles performance on a data load that therefore takes 8
hours - somebody who suddenly loses that benefit because of a relcache
flush that they can't prevent or control and ends up with a 16 hour
data load is going to pop a gasket.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Michael Paquier

On Mon, Dec 10, 2012 at 11:51 PM, Andres Freund wrote:

> Btw, as an example of the problems caused by renaming:
>
> postgres=# CREATE TABLE a (id serial primary key); CREATE TABLE b(id
> serial primary key, a_id int REFERENCES a);
> CREATE TABLE
> Time: 137.840 ms
> CREATE TABLE
> Time: 143.500 ms
> postgres=# \d b
>  Table "public.b"
>  Column |  Type   |   Modifiers
> +-+
>  id | integer | not null default nextval('b_id_seq'::regclass)
>  a_id   | integer |
> Indexes:
> "b_pkey" PRIMARY KEY, btree (id)
> Foreign-key constraints:
> "b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)
>
> postgres=# REINDEX TABLE a CONCURRENTLY;
> NOTICE:  drop cascades to constraint b_a_id_fkey on table b
> REINDEX
> Time: 248.992 ms
> postgres=# \d b
>  Table "public.b"
>  Column |  Type   |   Modifiers
> +-+
>  id | integer | not null default nextval('b_id_seq'::regclass)
>  a_id   | integer |
> Indexes:
> "b_pkey" PRIMARY KEY, btree (id)
>
Oops. I will fix that in the next version of the patch. There should be an
elegant way to change the dependencies at the swap phase.
-- 
Michael Paquier
http://michael.otacoo.com

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Noah Misch

On Mon, Dec 10, 2012 at 08:32:53AM -0500, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
> > On Fri, Dec 07, 2012 at 06:51:18PM -0500, Stephen Frost wrote:
> > > Now, what I've honestly been hoping for on this thread was for someone
> > > to speak up and point out why I'm wrong about this concern and explain
> > > how this patch addresses that issue.  If that's been done, I've missed
> > > it..
> 
> [...]
> 
> So, apparently I'm not wrong about my concern, but no one seems to share
> my feelings on this change.
> 
> I continue to hold that this could end up being a slippery slope for us
> to go down wrt 'correctness' vs. 'do whatever the user wants'.

I agree we should be reticent to compromise correctness for convenience.
Compromising mere bug-compatibility, trading one incorrect behavior for
another incorrect behavior, is not as bad.  Furthermore, today's behavior in
question is not something I can see applications deliberately and successfully
relying upon.

> If we
> keep this to only COPY and where the table has to be truncated/created
> in the same transaction (which requires the user to have sufficient
> privileges to do non-MVCC-safe things on the table already), perhaps
> it's alright.

Extending it to cases not involving a just-created or just-truncated table
really would compromise correctness; errors could leave the table in an
otherwise-impossible state.  Let's indeed not go there.

I see no particular need to restrict this to COPY; that's just the most
important case by far.  As a side note, the calculus for whether to extend the
optimization to INSERT and UPDATE differs from that for WAL avoidance.  WAL
avoidance can be a substantial loss when the total change is small.  For
pre-hinting/freezing, the worst case is having needlessly checked a few local
variables to rule out the optimization.

> It'll definitely reduce the interest in finding a real
> solution though, which is unfortunate.

That effect seems likely, but I do not find it unfortunate.  The change
variant I have advocated does not stand in contrast to some "real solution" to
PostgreSQL's treatment for readers of tables created or truncated by a
transaction not in the reader's snapshot.  The two topics interact at arm's
length.  Bundling them into one patch, artificially making them to stand or
fall as a pair, is not a win for PostgreSQL.

That does raise another disadvantage of making the change syntax-controlled:
if we someday implement the other improvement, COPY FREEZE will have minimal
reason not to be the default.  FREEZE then becomes a relic noise word.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-12-10 Thread Jaime Casanova

On Fri, Nov 23, 2012 at 4:56 AM, Amit Kapila  wrote:
> On Thursday, November 22, 2012 10:09 PM Fujii Masao wrote:
>
>> Is it helpful to output the notice message like 'Run pg_reload_conf() or
>> restart the server if you want new settings to take effect' always after
>> SET PERSISTENT command?
>
> Not sure because if someone uses SET PERSISTENT command, he should know the
> effect or behavior of same.
> I think it's good to put this in UM along with "PERSISTENT" option
> explanation.
>

can we at least send the source file in the error message when a
parameter has a wrong value?

suppose you do: SET PERSISTENT shared_preload_libraries TO
'some_nonexisting_lib';
when you restart postgres and that lib doesn't exist you can get
confused when looking at postgresql.conf and find an empty string
there

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failing SSL connection due to weird interaction with openssl

2012-12-10 Thread Tom Lane

Robert Haas  writes:
> FWICS, this kind of problem is endemic in OpenSSL, which
> also doesn't seem to believe in comprehensive documentation or code
> comments.  It would be nice if we had an API to some other, less
> crappy encryption library; or maybe even some generic API that lets
> you easily wire it into any library you happen to wish to use.

Awhile back Red Hat was trying to get people to switch to NSS or GnuTLS,
which apparently are better designed.

> Not that I'm volunteering to write the patch... :-(

Me either ... and in fact the lack of interest among upstreams in
rewriting their TLS code is what made the aforesaid effort crash and
burn.  But FWIW, there are better alternatives out there.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failing SSL connection due to weird interaction with openssl

2012-12-10 Thread Robert Haas

On Sat, Dec 8, 2012 at 11:07 AM, Andres Freund  wrote:
> As there hasn't been any new input since this comment I am marking the
> patch as "Rejected" in the CF application.

Sounds good.  FWIW, even if we were going to accept this, I can't
imagine back-patching it.  Users will come after us with pitchforks if
we change something like this in a minor release, and for good reason.
 This could utterly break working applications in a fashion that
requires code changes and a recompile to fix.  That is not a nice kind
of thing for a shared library to do as part of a security/bug fix
update.

If you ask me, the problem here is that OpenSSL's error-reporting
mechanism is just plain badly designed.  I remember programming in
BASIC back in the 80s and thinking to myself: what kind of a stupid
error-handling interface is ON ERROR GOTO?  And can I pummel the
person who came up with it?  This is basically a throwback to that
sort of design, where your error-handlers get to guess where exactly
the program was when the exception happened.  You can make it work if
you try hard enough, but you sure have to try hard.  Frankly, I don't
have a lot of hope of making things a whole lot better here no matter
what we do.  FWICS, this kind of problem is endemic in OpenSSL, which
also doesn't seem to believe in comprehensive documentation or code
comments.  It would be nice if we had an API to some other, less
crappy encryption library; or maybe even some generic API that lets
you easily wire it into any library you happen to wish to use.

Not that I'm volunteering to write the patch... :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Sketch of a Hook into the Logging Collector

2012-12-10 Thread Daniel Farina

On Sat, Dec 8, 2012 at 10:40 AM, Daniel Farina  wrote:
> Hello all,
>
> I am approaching this from the angle of increasing power by exposing
> the log collector ("syslogger") pipe protocol.

I just spotted a better, already-committed patch.  Thanks to Hannu for
pointing it out:

https://commitfest.postgresql.org/action/patch_view?id=717

I'll retract this patch, unless someone finds it interesting for some reason.

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] replication optimization: page writes only at the slave

2012-12-10 Thread Robert Haas

On Mon, Dec 10, 2012 at 10:56 AM, Xin Pan  wrote:
> However, I still witness large amount of page writes.
> Can anyone tell where are the page writes come from?

Probably not without more details.  Things like VACUUM, COPY, and
sequential scans use ring-buffers that are smaller than
shared_buffers, so you often see the cache fill up with your data only
slowly just after starting the database, but if the database really
fits in shared_buffers, you should eventually settle into a rhythm
where dirty buffers are written to disk only once per checkpoint
cycle.

You might want to monitor pg_stat_bgwriter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [SPAM?]: Re: Support for REINDEX CONCURRENTLY

2012-12-10 Thread Andres Freund

On 2012-12-10 22:33:50 +, Simon Riggs wrote:
> On 10 December 2012 22:27, Peter Eisentraut  wrote:
> > On 12/10/12 5:21 PM, Simon Riggs wrote:
> >> On 10 December 2012 22:18, Peter Eisentraut  wrote:
> >>> On 12/8/12 9:40 AM, Tom Lane wrote:
>  I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>  preserve the index name exactly.  Something like adding or removing
>  trailing underscores would probably serve to generate a nonconflicting
>  name that's not too unsightly.
> >>>
> >>> If you think you can rename an index without an exclusive lock, then why
> >>> not rename it back to the original name when you're done?
> >>
> >> Because the index isn't being renamed. An alternate equivalent index
> >> is being created instead.
> >
> > Right, basically, you can do this right now using
> >
> > CREATE INDEX CONCURRENTLY ${name}_tmp ...
> > DROP INDEX CONCURRENTLY ${name};
> > ALTER INDEX ${name}_tmp RENAME TO ${name};
> >
> > The only tricks here are if ${name}_tmp is already taken, in which case
> > you might as well just error out (or try a few different names), and if
> > ${name} is already in use by the time you get to the last line, in which
> > case you can log a warning or an error.
> >
> > What am I missing?
>
> That this is already recorded in my book> ;-)
>
> And also that REINDEX CONCURRENTLY doesn't work like that, yet.

The last submitted patch works pretty similar:

CREATE INDEX CONCURRENTLY $name_cct;
ALTER INDEX $name RENAME TO cct_$name;
ALTER INDEX $name_tmp RENAME TO $tmp;
ALTER INDEX $name_tmp RENAME TO $name_cct;
DROP INDEX CONURRENCTLY $name_cct;

It does that under an exlusive locks, but doesn't handle dependencies
yet...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [SPAM?]: Re: Support for REINDEX CONCURRENTLY

2012-12-10 Thread Andres Freund

On 2012-12-10 17:27:45 -0500, Peter Eisentraut wrote:
> On 12/10/12 5:21 PM, Simon Riggs wrote:
> > On 10 December 2012 22:18, Peter Eisentraut  wrote:
> >> On 12/8/12 9:40 AM, Tom Lane wrote:
> >>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> >>> preserve the index name exactly.  Something like adding or removing
> >>> trailing underscores would probably serve to generate a nonconflicting
> >>> name that's not too unsightly.
> >>
> >> If you think you can rename an index without an exclusive lock, then why
> >> not rename it back to the original name when you're done?
> >
> > Because the index isn't being renamed. An alternate equivalent index
> > is being created instead.
>
> Right, basically, you can do this right now using
>
> CREATE INDEX CONCURRENTLY ${name}_tmp ...
> DROP INDEX CONCURRENTLY ${name};
> ALTER INDEX ${name}_tmp RENAME TO ${name};
>
> The only tricks here are if ${name}_tmp is already taken, in which case
> you might as well just error out (or try a few different names), and if
> ${name} is already in use by the time you get to the last line, in which
> case you can log a warning or an error.
>
> What am I missing?

I don't think this is the problematic side of the patch.

The question is rather how to transfer the dependencies without too much
ugliness or how to swap oids without a race. Either by accepting an
exlusive lock or by playing some games, the latter possibly being easier
with renaming...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [SPAM?]: Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Simon Riggs

On 10 December 2012 22:27, Peter Eisentraut  wrote:
> On 12/10/12 5:21 PM, Simon Riggs wrote:
>> On 10 December 2012 22:18, Peter Eisentraut  wrote:
>>> On 12/8/12 9:40 AM, Tom Lane wrote:
 I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
 preserve the index name exactly.  Something like adding or removing
 trailing underscores would probably serve to generate a nonconflicting
 name that's not too unsightly.
>>>
>>> If you think you can rename an index without an exclusive lock, then why
>>> not rename it back to the original name when you're done?
>>
>> Because the index isn't being renamed. An alternate equivalent index
>> is being created instead.
>
> Right, basically, you can do this right now using
>
> CREATE INDEX CONCURRENTLY ${name}_tmp ...
> DROP INDEX CONCURRENTLY ${name};
> ALTER INDEX ${name}_tmp RENAME TO ${name};
>
> The only tricks here are if ${name}_tmp is already taken, in which case
> you might as well just error out (or try a few different names), and if
> ${name} is already in use by the time you get to the last line, in which
> case you can log a warning or an error.
>
> What am I missing?

That this is already recorded in my book> ;-)

And also that REINDEX CONCURRENTLY doesn't work like that, yet.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [SPAM?]: Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Peter Eisentraut

On 12/10/12 5:21 PM, Simon Riggs wrote:
> On 10 December 2012 22:18, Peter Eisentraut  wrote:
>> On 12/8/12 9:40 AM, Tom Lane wrote:
>>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>>> preserve the index name exactly.  Something like adding or removing
>>> trailing underscores would probably serve to generate a nonconflicting
>>> name that's not too unsightly.
>>
>> If you think you can rename an index without an exclusive lock, then why
>> not rename it back to the original name when you're done?
> 
> Because the index isn't being renamed. An alternate equivalent index
> is being created instead.

Right, basically, you can do this right now using

CREATE INDEX CONCURRENTLY ${name}_tmp ...
DROP INDEX CONCURRENTLY ${name};
ALTER INDEX ${name}_tmp RENAME TO ${name};

The only tricks here are if ${name}_tmp is already taken, in which case
you might as well just error out (or try a few different names), and if
${name} is already in use by the time you get to the last line, in which
case you can log a warning or an error.

What am I missing?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [SPAM?]: Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Simon Riggs

On 10 December 2012 22:18, Peter Eisentraut  wrote:
> On 12/8/12 9:40 AM, Tom Lane wrote:
>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>> preserve the index name exactly.  Something like adding or removing
>> trailing underscores would probably serve to generate a nonconflicting
>> name that's not too unsightly.
>
> If you think you can rename an index without an exclusive lock, then why
> not rename it back to the original name when you're done?

Because the index isn't being renamed. An alternate equivalent index
is being created instead.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Peter Eisentraut

On 12/8/12 9:40 AM, Tom Lane wrote:
> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> preserve the index name exactly.  Something like adding or removing
> trailing underscores would probably serve to generate a nonconflicting
> name that's not too unsightly.

If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] pg_upgrade -o/-O regression in 9.2.2

2012-12-10 Thread Marti Raudsepp

Hi!

It seems that PostgreSQL 9.2.2 has a regression in pg_upgrade, the -o
and -O options forget to add a space before passing on user options,
thereby generating unparsable command lines.

For example:
pg_upgrade -b /usr/local/pg91/bin -B /usr/bin -d /tmp/91 -D /tmp/92 -O -F
[...]
Creating catalog dump   ok
*failure*
could not connect to new postmaster started with the command:
"/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/tmp/92" -o "-p
50432 -b -c synchronous_commit=off-F -c listen_addresses='' -c
unix_socket_permissions=0700 -c unix_socket_directory='/tmp'" start

Notice the bad argument "synchronous_commit=off-F"

It's easy enough to work around by adding a space to the command line,
passing -O ' -F' instead of -O '-F'

Here's the bad commit:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ed5699dd1b883e193930448b7ad532e233de0bd7;hp=5ed6546cf75623ba426942a3b71659a66cf7ed68

The attached patch re-introduces the space at the necessary place.

Regards,
Marti


pg_upgrade_user_options-9.2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] enhanced error fields

2012-12-10 Thread Peter Geoghegan

On 10 December 2012 20:52, David Johnston  wrote:
> Just skimming this topic but if these enhanced error fields are going to be
> used by software, and we have 99% adherence to a standard, then my first
> reaction is why not just supply "" (or "" as
> appropriate) instead of suppressing the field altogether in these (and
> possibly other, future) cases and make adherence for these fields 100%?

Well, this is an area that the standard doesn't have anything to say
about. The standard defines errcodes, but not what fields they make
available. I suppose you could say that the patch proposes that they
become a Postgres extension to the standard.

I probably could have found more places to set the fields. Certainly,
I've already set them in some places where they are not actually
required to be set by the new contract of errcodes.txt/errcodes.h. My
immediate concern is getting something minimal committed, though. I
think the latest revision handles all of the important cases (i.e.
cases where applications may want a well-principled way of detecting a
particular kind of error, like an error due to the violation of a
particular, known constraint).

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-12-10 Thread Pavel Stehule

Hello

2012/12/10 Robert Haas :
> On Tue, Nov 27, 2012 at 4:36 PM, Peter Eisentraut  wrote:
>> On 11/25/12 6:36 PM, Robert Haas wrote:
>>> I think that is true.  But for whatever it's worth, and at the risk of
>>> beating a horse that seems not to be dead yet in spite of the fact
>>> that I feel I've already administered one hell of a beating, the LPAD
>>> case is unambiguous, and therefore it is hard to see what sort of
>>> programming mistake we are protecting users against.
>>
>> Upstream applications passing wrong data down to the database.
>
> The circumstantial evidence suggests that many users don't want to be
> protected against that in the way that we are currently protecting
> them - or at least not all of them do (see Merlin's email elsewhere on
> this thread).  What's frustrating about the status quo is that not
> only do you need lots of extra casts, but there's no real way to
> improve the situation.  If you add an implicit cast from integer to
> text, for example, then 4 || 'foo' breaks.  Now, you may think that
> adding an implicit cast from integer to text is a dumb idea, and maybe
> it is, but don't get too hung up on that example.  The point is that
> if you're unhappy with the quote_literal() case (because it accepts
> too much), or the lpad() case (because it doesn't accept enough), or
> the foo(smallint) case (because it can't be happy with foo(42)), you
> don't really have a lot of options for adjusting the behavior as
> things stand today.  I accept that some people think that decorating
> their code with lots of extra casts helps to avoid errors, and maybe
> it does, but there is plenty of evidence that a lot of users don't
> want to.  And we not only don't give them the behavior they want; we
> don't even have a meaningful way to give the option of opting into
> that behavior at initdb or create-database time.
>

it is looking so our design missing some feature, flag, that can
signalize safety of implicit cast - or allow more exactly to specify
casting rules for related functionality. For some functions we do this
magic inside parser and rewriter, but we don't allow this for custom
functions.

Few years ago I proposed a parser hooks, where this task can be
solved. The parser hook is probably too generic, but probably we don't
design good solution just by simple change of some parameter of
current design, and we should to enhance current design - maybe some
new parameter modifiers?

Regards

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] enhanced error fields

2012-12-10 Thread David Johnston

> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
> ow...@postgresql.org] On Behalf Of Peter Geoghegan
> Sent: Monday, December 10, 2012 3:29 PM
> To: Pavel Stehule
> Cc: PostgreSQL Hackers; Alvaro Herrera; Tom Lane
> Subject: Re: [HACKERS] enhanced error fields
> 
 
> Now, there are one or two places where these fields are not actually
> available even though they're formally required according to a literal
reading
> of the above. This is only because there is clearly no such field sensibly
> available, even in principle - to my mind this cannot be a problem,
because
> the application developer cannot have any reasonable expectation of a
field
> being set. I'm really talking about two cases in particular:
> 
> * For ERRCODE_NOT_NULL_VIOLATION, we don't actually provide
> schema_name and table_name in the event of domains. This was previously
> identified as an issue. If it is judged better to not have any
requirements
> there at all, so be it.
> 
> * For the validateDomainConstraint() ERRCODE_CHECK_VIOLATION ereport
> call, we may not provide a constraint name iff a Constraint.connname is
> NULL. Since there isn't a constraint name to give even in principle, and
this is
> an isolated case, this seems reasonable.
> 

Just skimming this topic but if these enhanced error fields are going to be
used by software, and we have 99% adherence to a standard, then my first
reaction is why not just supply "" (or "" as
appropriate) instead of suppressing the field altogether in these (and
possibly other, future) cases and make adherence for these fields 100%?

>From an "ease-of-use" aspect for the API if I can simply always query each
of those fields and know I will be receiving a string it does at least seem
theoretically easier to interface with.  If I am expecting special string
values (enclosed in symbols making them invalid identifiers) I can then
handle those as desired without either receiving an error or a NULL when I
go to poll the missing field if those couple of instances.

I may be paranoid or mistaken regarding how this work but figured I'd at
least throw it out for consideration.

David J.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Shuffling xlog header files

2012-12-10 Thread Simon Riggs

On 10 December 2012 17:56, Heikki Linnakangas  wrote:

> Any objections?

No objections, though I'm concerned to make as few changes as possible. Thanks

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader

2012-12-10 Thread Heikki Linnakangas


On 10.12.2012 22:22, Heikki Linnakangas wrote:

(Offlist)

Just a quick note that I'm working on this patch now. I pushed some
trivial fixes to my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, xlogreader_v3
branch.


Oops, wasn't offlist :-). Well, if anyone wants to take a look, feel free.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader

2012-12-10 Thread Heikki Linnakangas


(Offlist)

Just a quick note that I'm working on this patch now. I pushed some 
trivial fixes to my git repository at 
git://git.postgresql.org/git/users/heikki/postgres.git, xlogreader_v3 
branch.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The tarball's README has bad install instructions

2012-12-10 Thread Andrew Dunstan



On 12/10/2012 02:42 PM, Karl O. Pinc wrote:


I made a tarball from head, but did not look at it. :-(


Note that the INSTALL file is not present in the git repository, it's
generated and included in the tarball. See README.git.

That's my problem, I had checked out from git.
Thanks and sorry to bother you.




To see what the tarball will contain, run "make dist".

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doc patch, distinguish sections with an empty row in error code table

2012-12-10 Thread Karl O. Pinc

On 11/08/2012 11:55:19 AM, Karl O. Pinc wrote:
> On 11/08/2012 11:10:39 AM, Robert Haas wrote:

> > Ah, well, as to that, I think you'd have to take that suggestion to
> > pgsql-www.  The style sheets used for the web site are - just to
> make
> > things exciting - stored in a completely different source code
> > repository to which I don't have access.  Some kind of CSS
> > frobnication along the lines you suggest might be worth discussing,
> > but I don't really work on that stuff.
> 
> Without being able to pass additional style from the source
> docs through to the html it seems a bit spooky to do this.

Since the existing style sheets aren't maintained upstream
and don't pass the necessary style through to the generated
style sheets, and since even if it did the style sheets
of the official docs on postgresql.org would not reflect
any changes made here, it seems like this patch should be
rejected.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The tarball's README has bad install instructions

2012-12-10 Thread Karl O. Pinc

On 12/10/2012 01:29:03 PM, Heikki Linnakangas wrote:
> On 10.12.2012 21:19, Karl O. Pinc wrote:
> > The top-level README in the source tarball says:
> >
> > -
> > See the file INSTALL for instructions on how to build and install
> > PostgreSQL.  That file also lists supported operating systems and
> > hardware platforms and contains information regarding any other
> > software packages that are required to build or run the PostgreSQL
> > system.  Changes between all PostgreSQL releases are recorded in 
> the
> > file HISTORY.  Copyright and license information can be found in 
> the
> > file COPYRIGHT.  A comprehensive documentation set is included in
> this
> > distribution; it can be read as described in the installation
> > instructions.
> > -
> >
> > There is no INSTALL file, and although there is documentation
> > it's scattered in a bunch of sgml files.
> 
> Which tarball did you look at? 

I made a tarball from head, but did not look at it. :-(

> Note that the INSTALL file is not present in the git repository, it's 
> generated and included in the tarball. See README.git.

That's my problem, I had checked out from git.  
Thanks and sorry to bother you.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The tarball's README has bad install instructions

2012-12-10 Thread Heikki Linnakangas


On 10.12.2012 21:19, Karl O. Pinc wrote:

The top-level README in the source tarball says:

-
See the file INSTALL for instructions on how to build and install
PostgreSQL.  That file also lists supported operating systems and
hardware platforms and contains information regarding any other
software packages that are required to build or run the PostgreSQL
system.  Changes between all PostgreSQL releases are recorded in the
file HISTORY.  Copyright and license information can be found in the
file COPYRIGHT.  A comprehensive documentation set is included in this
distribution; it can be read as described in the installation
instructions.
-

There is no INSTALL file, and although there is documentation
it's scattered in a bunch of sgml files.


Which tarball did you look at? The INSTALL file is there in the 9.2.2 
tarball at least:


~$ tar tvjf postgresql-9.2.2.tar.bz2 | grep INSTALL
-rw-r--r-- pgsql/pgsql   76825 2012-12-03 22:34 postgresql-9.2.2/INSTALL

Note that the INSTALL file is not present in the git repository, it's 
generated and included in the tarball. See README.git.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-12-10 Thread Jeff Davis

It looks like there are still some problems with this patch.

  CREATE TABLE foo(ir int4range);
  insert into foo select 'empty' from generate_series(1,1);
  insert into foo select int4range(NULL, g, '(]')
from generate_series(1,100) g;
  insert into foo select int4range(g, NULL, '[)')
from generate_series(1,100) g;
  insert into foo select int4range(g, ((g*1.01)+10)::int4, '[]')
from generate_series(1,100) g;
  CREATE TABLE bar(ir) AS select * from foo order by random();
  ANALYZE bar;

Now:
  EXPLAIN ANALYZE SELECT * FROM bar
WHERE ir @> int4range(1,2);

The estimates are "-nan". Similar for many other queries.

And I have a few other questions/comments:

* Why is "summ" spelled with two "m"s? Is it short for "summation"? If
so, might be good to use "summation of" instead of "integrate" in the
comment.

* Why does get_length_hist_frac return 0.0 when i is the last value? Is
that a mistake?

* I am still confused by the distinction between rbound_bsearch and
rbound_bsearch_bin. What is the intuitive purpose of each?

* You use "constant value" in the comments in several places. Would
"query value" or "search key" be better?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] The tarball's README has bad install instructions

2012-12-10 Thread Karl O. Pinc

Hi,

The top-level README in the source tarball says:

-
See the file INSTALL for instructions on how to build and install
PostgreSQL.  That file also lists supported operating systems and
hardware platforms and contains information regarding any other
software packages that are required to build or run the PostgreSQL
system.  Changes between all PostgreSQL releases are recorded in the
file HISTORY.  Copyright and license information can be found in the
file COPYRIGHT.  A comprehensive documentation set is included in this
distribution; it can be read as described in the installation
instructions.
-

There is no INSTALL file, and although there is documentation
it's scattered in a bunch of sgml files.

I was going send in a patch that referred the user to
the on-line postgres docs but it occurred to me that it
would be possible to include the documentation in plain text
form in the tarballs.  This would add at least 1.7M to
the size of the tarball, and complicate the building
of the tarball due to all the dependencies needed to build
the docs.

All the same, it's aesthetically pleasing to have build instructions
included in the tarball.

A minimal replacement of the above text might be:

-
See the chapter titled "Installation from Source Code" found
in the PostgreSQL documentation available from
http://www.postgresql.org/docs/ for installation instructions,
supported platforms, and build requirements.  A record of the
changes made between PostgreSQL releases is recorded in
an appendix to the documentation.  Copyright and license 
information can be found in the file COPYRIGHT.
-

Is anyone interested in a patch that includes plain text
documentation in the tarballs?  How about a patch to the
README per the text above?

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-12-10 Thread Robert Haas

On Tue, Nov 27, 2012 at 4:36 PM, Peter Eisentraut  wrote:
> On 11/25/12 6:36 PM, Robert Haas wrote:
>> I think that is true.  But for whatever it's worth, and at the risk of
>> beating a horse that seems not to be dead yet in spite of the fact
>> that I feel I've already administered one hell of a beating, the LPAD
>> case is unambiguous, and therefore it is hard to see what sort of
>> programming mistake we are protecting users against.
>
> Upstream applications passing wrong data down to the database.

The circumstantial evidence suggests that many users don't want to be
protected against that in the way that we are currently protecting
them - or at least not all of them do (see Merlin's email elsewhere on
this thread).  What's frustrating about the status quo is that not
only do you need lots of extra casts, but there's no real way to
improve the situation.  If you add an implicit cast from integer to
text, for example, then 4 || 'foo' breaks.  Now, you may think that
adding an implicit cast from integer to text is a dumb idea, and maybe
it is, but don't get too hung up on that example.  The point is that
if you're unhappy with the quote_literal() case (because it accepts
too much), or the lpad() case (because it doesn't accept enough), or
the foo(smallint) case (because it can't be happy with foo(42)), you
don't really have a lot of options for adjusting the behavior as
things stand today.  I accept that some people think that decorating
their code with lots of extra casts helps to avoid errors, and maybe
it does, but there is plenty of evidence that a lot of users don't
want to.  And we not only don't give them the behavior they want; we
don't even have a meaningful way to give the option of opting into
that behavior at initdb or create-database time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_database_size issue an error (ERROR: could not stat file "base/16384/20041": Permission denied)

2012-12-10 Thread Christophe GUILLOT

Hi all,
The usage of the pg_database_size function generate a lot of "permission
denied" log errors.
Typical message is: ERROR:  could not stat file "base/16384/20041":
Permission denied.

As a short description:
- PostgreSQL 9.2.1/64 bits
- our system is running on windows platform (W7, W2008)
- frequent alter/create/drop of tables
- call to pg_database_size, sometimes.

When logging file system activity (using procmon, W7), the query made on
the file printed in the log message return a “DELETE PENDING” state and i
only find a handle to this file owned by a postgres executable.

After code exploration, it looks like the functions from "dbsize.c" should
use the same trick as found in "md.c" (i mean the FILE_POSSIBLY_DELETED
macro) for testing each "stat()" returned error code.

So, is it a good idea to patch dbsize.c with the same macro, which on
windows platform include the EACCES error code as a possibly deleted file,
or not. Or is it the wrong way ? I didn't try to patch yet.

Thanks for your help,

Christophe

Re: [HACKERS] Shuffling xlog header files

2012-12-10 Thread Andres Freund

Hi,

The xlog_fn.h patch was Alvaro's idea (and patch) btw, I previously had
used an ugly typedef for Datum to get arround defining
FRONTEND/including postgres.h...

On 2012-12-10 19:56:49 +0200, Heikki Linnakangas wrote:
> We still need the "#define FRONTEND 1" ugly hack in pg_controldata and
> pg_resetxlog with this, but we get rid of it in pg_basebackup. I think
> that's reasonable, pg_controldata and pg_resetxlog are more low-level
> programs than pg_basebackup.

It's also for the pg_receivellog introduced in one of my other
patches... So it seems to be a good idea to me.

> The name xlog_internal.h is a bit of a misnomer now, it's
> not very internal anymore, if it can now actually be included by external
> programs. But the point is that the file contains declarations related to
> the WAL file format.

We could rename it to xlog_details.h or such, but I guess the noise
would outhweigh the benefits.

> Any objections?

Unsurprisingly none from my side.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] PL/Python: Add spidata to all spiexceptions

2012-12-10 Thread Karl O. Pinc

On 12/09/2012 10:33:59 PM, Karl O. Pinc wrote:
> Hi,
> 
> I don't feel particularly qualified to comment, but in the
> interest of (hopefully) helping with the patch review process
> I'm going to comment anyway.

I've gone ahead and signed up to review this patch.

I can confirm that it compiles against head and
the tests pass.  I started up a server
and tried it and it works for me, trapping
the exception and executing the exception code.

So, looks good, as far as it goes.
I await your response to my previous message
in this thread before sending it on a
a committer.  There were 2 outstanding
issue raised:

Is this useful enough/proceeding in the right
direction to commit now?

Should some of the logic be moved out of a
subroutine and into the calling code?

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

2012-12-10 Thread Fujii Masao

On Tue, Dec 11, 2012 at 1:33 AM, Heikki Linnakangas
 wrote:
> On 10.12.2012 13:50, Heikki Linnakangas wrote:
>>
>> So I'd say we should update minRecoveryPoint first, then
>> truncate/delete. But we should still keep the XLogFlush() at the end of
>> xact_redo_commit_internal(), for the case where files/directories are
>> created. Patch attached.

Sounds reasonable.

> Committed and backpatched that. Attached is a script I used to reproduce
> this problem, going back to 8.4.

Thanks!

Unfortunately I could reproduce the problem even after that commit.
Attached is the script I used to reproduce the problem.

The cause is that CheckRecoveryConsistency() is called before rm_redo(),
as Horiguchi-san suggested upthead. Imagine the case where
minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE
record. When restarting the server with that minRecoveryPoint,
the followings would happen, and then PANIC occurs.

1. XLOG_SMGR_TRUNCATE record is read.
2. CheckRecoveryConsistency() is called, and database is marked as
consistent since we've reached minRecoveryPoint (i.e., the location
of XLOG_SMGR_TRUNCATE).
3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is
found.
4. Since the database has already been marked as consistent, an invalid
page leads to PANIC.

Regards,

-- 
Fujii Masao

fujii_test.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Shuffling xlog header files

2012-12-10 Thread Heikki Linnakangas

Andres Freund's xlogreader patch contains a change to move the 
declarations of SQL-callable functions in xlogfuncs.c to a new header 
file, xlog_fn.h. The purpose is to allow xlog_internal.h to be included 
in a frontend program, as the PG_FUNCTION_ARGS and Datum used in the 
prototypes require fmgr.h, which is backend-only. I think his patch 
missed a trick: pg_basebackup, pg_controlinfo and pg_resetxlog currently 
use this for the same purpose:


/*
 * We have to use postgres.h not postgres_fe.h here, because there's so 
much

 * backend-only stuff in the XLOG include files we need.  But we need a
 * frontend-ish environment otherwise.  Hence this ugly hack.
 */
#define FRONTEND 1

#include "postgres.h"
...
#include "access/xlog.h"

But this got me thinking whether we should do the xlog_fn.h refactoring, 
so that we could get rid of that ugly hack in the existing programs, 
too. I ended up with the attached. It allows xlog_internal.h to be 
included in frontend programs. The name xlog_internal.h is a bit of a 
misnomer now, it's not very internal anymore, if it can now actually be 
included by external programs. But the point is that the file contains 
declarations related to the WAL file format.


We still need the "#define FRONTEND 1" ugly hack in pg_controldata and 
pg_resetxlog with this, but we get rid of it in pg_basebackup. I think 
that's reasonable, pg_controldata and pg_resetxlog are more low-level 
programs than pg_basebackup.


Any objections?

- Heikki
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 862e3fa..c958a4f 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "utils/guc.h"
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 9bd6b8e..0ef53e7 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index d345761..40c0bd6 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -18,6 +18,7 @@
 
 #include "access/htup_details.h"
 #include "access/xlog.h"
+#include "access/xlog_fn.h"
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 709ccf1..ff0a9cb 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 18e6a4e..c8a68a1 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index a6c0aea..b075231 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4f22116..4249c04 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -11,13 +11,7 @@
  *-
  */
 
-/*
- * We have to use postgres.h not postgres_fe.h here, because there's so much
- * backend-only stuff in the XLOG include files we need.  But we need a
- * frontend-ish environment otherwise.	Hence this ugly hack.
- */
-#define FRONTEND 1
-#include "postgres.h"
+#include "postgres_fe.h"
 #include "libpq-fe.h"
 
 #include 
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 5452483..7f6121a 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -12,13 +12,7 @@
  *-
  */
 
-/*
- * We have to use postgres.h not postgres_fe.h here, because there's so much
- * backend-only stuff in the XLOG include files we need.  But we need a
- * frontend-ish environment otherwise.	Hence this ugly hack.
- */
-#define FRONTEND 1
-#include "postgres.h"
+#include "postgres_fe.h"
 #include "libpq-fe.h"
 #in

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-10 Thread Tomas Vondra


Dne 10.12.2012 16:38, Andres Freund napsal:

On 2012-12-08 17:07:38 +0100, Tomas Vondra wrote:

I've done some test and yes - once there are other objects the
optimization falls short. For example for tables with one index, it
looks like this:

  1) unpatched

  one by one:  28.9 s
  100 batches: 23.9 s

  2) patched

  one by one:  44.1 s
  100 batches:  4.7 s

So the patched code is by about 50% slower, but this difference 
quickly

disappears with the number of indexes / toast tables etc.

I see this as an argument AGAINST such special-case optimization. My
reasoning is this:

* This difference is significant only if you're dropping a table 
with
  low number of indexes / toast tables. In reality this is not going 
to

  be very frequent.

* If you're dropping a single table, it really does not matter - the
  difference will be like 100 ms vs. 200 ms or something like that.


I don't particularly buy that argument. There are good reasons (like
avoiding deadlocks, long transactions) to drop multiple tables
in individual transactions.
Not that I have a good plan to how to work around that though :(


Yeah, if you need to drop the tables one by one for some reason, you
can't get rid of the overhead this way :-(

OTOH in the example above the overhead is ~50%, i.e. 1.5ms / table with 
a
single index. Each such associated relation (index, TOAST table, ...) 
means

a relation that needs to be dropped and on my machine, once I reach ~5
relations there's almost no difference as the overhead is balanced by 
the

gains.

Not sure how to fix that in an elegant way, though :-(

Tomas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] replication optimization: page writes only at the slave

2012-12-10 Thread Hannu Krosing


On 12/10/2012 04:56 PM, Xin Pan wrote:

Assumption: I have enough memory to cache all the database pages.
Goal:
Master never write pages. Slave replays logs from master and writes 
pages.

Benefits:
Reduce the page IO overhead at master, save money in EC2 cloud.


I have suggested something similar on table-by-table basis but this has 
not yet

generated much traction. I'll come back to this in coming weeks

For whole WAL you can achieve this by putting WAL on a large-enough 
ramdrive.


Hannu



Question:
Can you give me some comments on this idea?
And I cannot turn of page writes in Postgresql.

I adjust the following parameters:
shared_buffers = 3GB

bgwriter_delay = 2000ms # 10-1ms between rounds
bgwriter_lru_maxpages = 0   # 0-1000 max buffers 
written/round
bgwriter_lru_multiplier = 0 # 0-10.0 multipler on buffers 
scanned/round


checkpoint_segments = 256# in logfile segments, min 1, 
16MB each

checkpoint_timeout = 1h  # range 30s-1h
checkpoint_completion_target = 1.0  # checkpoint target duration, 
0.0 - 1.0


However, I still witness large amount of page writes.
Can anyone tell where are the page writes come from?
Can I turn off that part of page writes by configuration?
If not, which part of source code should I adjust to achieve my goal 
(turn of page writes)?



Thanks!

Xin







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

2012-12-10 Thread Heikki Linnakangas


On 10.12.2012 13:50, Heikki Linnakangas wrote:

So I'd say we should update minRecoveryPoint first, then
truncate/delete. But we should still keep the XLogFlush() at the end of
xact_redo_commit_internal(), for the case where files/directories are
created. Patch attached.


Committed and backpatched that. Attached is a script I used to reproduce 
this problem, going back to 8.4.


- Heikki


minrecoverypoint-recipe.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] replication optimization: page writes only at the slave

2012-12-10 Thread Xin Pan


Assumption: I have enough memory to cache all the database pages.
Goal:
Master never write pages. Slave replays logs from master and writes pages.
Benefits:
Reduce the page IO overhead at master, save money in EC2 cloud.

Question:
Can you give me some comments on this idea?
And I cannot turn of page writes in Postgresql.

I adjust the following parameters:
shared_buffers = 3GB

bgwriter_delay = 2000ms # 10-1ms between rounds
bgwriter_lru_maxpages = 0   # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 0 # 0-10.0 multipler on buffers 
scanned/round


checkpoint_segments = 256# in logfile segments, min 1, 
16MB each

checkpoint_timeout = 1h  # range 30s-1h
checkpoint_completion_target = 1.0  # checkpoint target duration, 
0.0 - 1.0


However, I still witness large amount of page writes.
Can anyone tell where are the page writes come from?
Can I turn off that part of page writes by configuration?
If not, which part of source code should I adjust to achieve my goal 
(turn of page writes)?



Thanks!

Xin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Tom Lane

Michael Paquier  writes:
> On 2012/12/10, at 18:28, Simon Riggs  wrote:
>> If I have to choose between (1) keeping the same name OR (2) avoiding
>> an AccessExclusiveLock then I would choose (2). Most other people
>> would also, especially when all we would do is add/remove an
>> underscore. Even if that is user visible. And if it is we can support
>> a LOCK option that does (1) instead.

> Ok. Removing the switch name part is only deleting 10 lines of code in 
> index_concurrent_swap.
> Then, do you guys have a preferred format for the concurrent index name? For 
> the time being an inelegant _cct suffix is used. The underscore at the end?

You still need to avoid conflicting name assignments, so my
recommendation would really be to use the select-a-new-name code already
in use for CREATE INDEX without an index name.  The underscore idea is
cute, but I doubt it's worth the effort to implement, document, or
explain it in a way that copes with repeated REINDEXes and conflicts.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-10 Thread Andres Freund

On 2012-12-08 17:07:38 +0100, Tomas Vondra wrote:
> On 8.12.2012 15:49, Tomas Vondra wrote:
> > On 8.12.2012 15:26, Andres Freund wrote:
> >> On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote:
> >>> I've re-run the tests with the current patch on my home workstation, and
> >>> the results are these (again 10k tables, dropped either one-by-one or in
> >>> batches of 100).
> >>>
> >>> 1) unpatched
> >>>
> >>> dropping one-by-one:15.5 seconds
> >>> dropping in batches of 100: 12.3 sec
> >>>
> >>> 2) patched (v3.1)
> >>>
> >>> dropping one-by-one:32.8 seconds
> >>> dropping in batches of 100:  3.0 sec
> >>>
> >>> The problem here is that when dropping the tables one-by-one, the
> >>> bsearch overhead is tremendous and significantly increases the runtime.
> >>> I've done a simple check (if dropping a single table, use the original
> >>> simple comparison) and I got this:
> >>>
> >>> 3) patched (v3.2)
> >>>
> >>> dropping one-by-one:16.0 seconds
> >>> dropping in batches of 100:  3.3 sec
> >>>
> >>> i.e. the best of both. But it seems like an unnecessary complexity to me
> >>> - if you need to drop a lot of tables you'll probably do that in a
> >>> transaction anyway.
> >>>
> >>
> >> Imo that's still a pretty bad performance difference. And your
> >> single-table optimization will probably fall short as soon as the table
> >> has indexes and/or a toast table...
> >
> > Why? I haven't checked the code but either those objects are droppped
> > one-by-one (which seems unlikely) or they're added to the pending list
> > and then the new code will kick in, making it actually faster.
> >
> > Yes, the original code might be faster even for 2 or 3 objects, but I
> > can't think of a simple solution to fix this, given that it really
> > depends on CPU/RAM speed and shared_buffers size.
>
> I've done some test and yes - once there are other objects the
> optimization falls short. For example for tables with one index, it
> looks like this:
>
>   1) unpatched
>
>   one by one:  28.9 s
>   100 batches: 23.9 s
>
>   2) patched
>
>   one by one:  44.1 s
>   100 batches:  4.7 s
>
> So the patched code is by about 50% slower, but this difference quickly
> disappears with the number of indexes / toast tables etc.
>
> I see this as an argument AGAINST such special-case optimization. My
> reasoning is this:
>
> * This difference is significant only if you're dropping a table with
>   low number of indexes / toast tables. In reality this is not going to
>   be very frequent.
>
> * If you're dropping a single table, it really does not matter - the
>   difference will be like 100 ms vs. 200 ms or something like that.

I don't particularly buy that argument. There are good reasons (like
avoiding deadlocks, long transactions) to drop multiple tables
in individual transactions.
Not that I have a good plan to how to work around that though :(

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CommitFest #3 and upcoming schedule

2012-12-10 Thread Andres Freund

On 2012-12-10 09:22:25 +0530, Amit Kapila wrote:
> On Sunday, December 09, 2012 9:27 PM Simon Riggs
> > On 16 November 2012 07:20, Greg Smith  wrote:
> >
> >
> > Let's bring balance to the situation through our own actions. Please
> > review one patch now for every one you submitted.
>
> In CF-3, I am Author of 5 and Reviewer of 5
>
> 3 of my patches as Author have been moved from CF-2

You're not alone in that ;)

> 4 of the patches where I am reviewer have been moved from CF-2
>
> One of my Patch : Patch for option in pg_resetxlog for restore from WAL
> files
> is dependent on another patch XLogReader, so I am expecting to get it done
> only after XLogReader.

Btw, I posted the current version of this at:
http://archives.postgresql.org/message-id/20121204175212.GB12055%40awork2.anarazel.de

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Andres Freund

On 2012-12-10 15:51:40 +0100, Andres Freund wrote:
> On 2012-12-10 15:03:59 +0900, Michael Paquier wrote:
> > I have updated the patch (v4) to take care of updating reltoastidxid for
> > toast parent relations at the swap step by using index_update_stats. In
> > prior versions of the patch this was done when concurrent index was built,
> > leading to toast relations using invalid indexes if there was a failure
> > before the swap phase. The update of reltoastidxids of toast relation is
> > done with RowExclusiveLock.
> > I also added a couple of tests in src/test/isolation. Btw, as for the time
> > being the swap step uses AccessExclusiveLock to switch old and new
> > relnames, it does not have any meaning to run them...
>
> Btw, as an example of the problems caused by renaming:
> Looking at the patch for a bit now.

Some review comments:

* Some of the added !is_reindex in index_create don't seem safe to
  me. Why do we now support reindexing exlusion constraints?

* REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
  concurrent reindexing for user-tables and non-concurrent for system
  tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...

* ISTM index_concurrent_swap should get exlusive locks on the relation
  *before* printing their names. This shouldn't be required because we
  have a lock prohibiting schema changes on the parent table, but it
  feels safer.

* temporary index names during swapping should also be named via
  ChooseIndexName

* why does create_toast_table pass an unconditional 'is_reindex' to
  index_create?

* would be nice (but thats probably a step #2 thing) to do the
  individual steps of concurrent reindex over multiple relations to
  avoid too much overall waiting for other transactions.

* ReindexConcurrentIndexes:

  * says " Such indexes are simply bypassed if caller has not specified
anything." but ERROR's. Imo ERROR is fine, but the comment should be
adjusted...

  * should perhaps be names ReindexIndexesConcurrently?

  * Imo the PHASE 1 comment should be after gathering/validitating the
chosen indexes

  * It seems better to me to do use individual transactions + snapshots
for each index, no need to keep very long transactions open (PHASE
2/3)

  * s/same whing/same thing/

  * Shouldn't a CacheInvalidateRelcacheByRelid be done after PHASE 2 and
5 as well?

  * PHASE 6 should acquire exlusive locks on the indexes

* can some of index_concurrent_* infrastructure be reused for
  DROP INDEX CONCURRENTLY?

* in CREATE/DROP INDEX CONCURRENTLY 'CONCURRENTLY comes before the
  object name, should we keep that conventioN?

Thats all I have for now.

Very nice work! Imo the code looks cleaner after your patch...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Andres Freund

On 2012-12-10 15:03:59 +0900, Michael Paquier wrote:
> I have updated the patch (v4) to take care of updating reltoastidxid for
> toast parent relations at the swap step by using index_update_stats. In
> prior versions of the patch this was done when concurrent index was built,
> leading to toast relations using invalid indexes if there was a failure
> before the swap phase. The update of reltoastidxids of toast relation is
> done with RowExclusiveLock.
> I also added a couple of tests in src/test/isolation. Btw, as for the time
> being the swap step uses AccessExclusiveLock to switch old and new
> relnames, it does not have any meaning to run them...

Btw, as an example of the problems caused by renaming:

postgres=# CREATE TABLE a (id serial primary key); CREATE TABLE b(id
serial primary key, a_id int REFERENCES a);
CREATE TABLE
Time: 137.840 ms
CREATE TABLE
Time: 143.500 ms
postgres=# \d b
 Table "public.b"
 Column |  Type   |   Modifiers
+-+
 id | integer | not null default nextval('b_id_seq'::regclass)
 a_id   | integer |
Indexes:
"b_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)

postgres=# REINDEX TABLE a CONCURRENTLY;
NOTICE:  drop cascades to constraint b_a_id_fkey on table b
REINDEX
Time: 248.992 ms
postgres=# \d b
 Table "public.b"
 Column |  Type   |   Modifiers
+-+
 id | integer | not null default nextval('b_id_seq'::regclass)
 a_id   | integer |
Indexes:
"b_pkey" PRIMARY KEY, btree (id)


Looking at the patch for a bit now.

Regards,

Andres
--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical changeset generation v3

2012-12-10 Thread Andres Freund

Hi,

On 2012-11-19 09:50:30 +0100, Andres Freund wrote:
> On 2012-11-19 16:28:55 +0900, Michael Paquier wrote:
> > After launching some SQLs, the logical receiver is stuck just after sending
> > INIT_LOGICAL_REPLICATION, please see bt of process waiting:
>
> Its waiting till it sees initial an initial xl_running_xacts record. The
> easiest way to do that is manually issue a checkpoint. Sorry, should
> have included that in the description.
> Otherwise you can wait till the next routine checkpoint comes arround...
>
> I plan to cause more xl_running_xacts record to be logged in the
> future. I think the timing of those currently is non-optimal, you have
> the same problem as here in normal streaming replication as well :(

This is "fixed" now with the changes I pushed a second ago. Unless some
longrunning transactions are arround we now immediately jump to a ready
state.
This is achieved by
1. regularly logging xl_running_xacts (in background writer)
2. logging xl_runnign_xacts at the beginning of INIT_LOGICAL_REPLICATION

This also has the advantage that the xmin horizon can be increased much
more frequently.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Stephen Frost

* Noah Misch (n...@leadboat.com) wrote:
> On Fri, Dec 07, 2012 at 06:51:18PM -0500, Stephen Frost wrote:
> > Now, what I've honestly been hoping for on this thread was for someone
> > to speak up and point out why I'm wrong about this concern and explain
> > how this patch addresses that issue.  If that's been done, I've missed
> > it..

[...]

So, apparently I'm not wrong about my concern, but no one seems to share
my feelings on this change.

I continue to hold that this could end up being a slippery slope for us
to go down wrt 'correctness' vs. 'do whatever the user wants'.  If we
keep this to only COPY and where the table has to be truncated/created
in the same transaction (which requires the user to have sufficient
privileges to do non-MVCC-safe things on the table already), perhaps
it's alright.  It'll definitely reduce the interest in finding a real
solution though, which is unfortunate.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-10 Thread Stephen Frost

Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> Agreed, but that is also be a silent change of current behaviour.

Sure, proper MVCC for catalog entries would be a change, but I think
it's one which would actually reduce the surprises for our users.  Today
we tell people we have transactional DDL, and that's somewhat true, but
it's not MVCC-safe and that can lead to surprises.

> But the above will only work for CREATE TABLE, not for TRUNCATE.

I'm trying to figure out why there are all the constraints around this
command to begin with.  If we're going to support this, why do we
require the user to create or truncate the table in the same
transaction?  Clearly that's going to reduce the usefulness of this
feature and it's not clear to me why that constraint is needed,
code-wise.

Also, what about adding FREEZE options to INSERT and maybe even UPDATE?
Surely it would reduce the overhead associated with those commands as
well.

> I've invested a lot of time in trying to improve the situation and
> investigated many routes forwards, none of them very satisfying. Until
> someone comes up with a better plan, FREEZE is a pragmatic way
> forwards that improves things now rather than waiting for the perfect
> solution.

I agree that the perfect can sometimes be the enemy of the good, but I
still feel that this is quite a slippery slope that's going to end up
getting ourselves into trouble- regardless of how much we document it or
set up constraints around it.

> And if we want checksums anytime soon we need ways to
> ameliorate the effect of hints on checksums, which this does,
> soemwhat.

I don't buy into this argument in the least.

> Better plans, with code, welcome.

While I appreciate the mentality that those-who-code-win, I don't
believe that it can be used in an argument based on *correctness*.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Switching timeline over streaming replication

2012-12-10 Thread Amit Kapila

> From: Heikki Linnakangas [mailto:hlinnakan...@vmware.com]
> Sent: Friday, December 07, 2012 9:22 PM
> To: Amit Kapila
> Cc: 'PostgreSQL-development'; 'Thom Brown'
> Subject: Re: [HACKERS] Switching timeline over streaming replication
> 
> On 06.12.2012 15:39, Amit Kapila wrote:
> > On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:
> >> On 05.12.2012 14:32, Amit Kapila wrote:
> >>> On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
>  After some diversions to fix bugs and refactor existing code, I've
>  committed a couple of small parts of this patch, which just add
>  some sanity checks to notice incorrect PITR scenarios. Here's a new
>  version of the main patch based on current HEAD.
> >>>
> >>> After testing with the new patch, the following problems are
> observed.
> >>>
> >>> Defect - 1:
> >>>
> >>>   1. start primary A
> >>>   2. start standby B following A
> >>>   3. start cascade standby C following B.
> >>>   4. start another standby D following C.
> >>>   5. Promote standby B.
> >>>   6. After successful time line switch in cascade standby C&
> D,
> >> stop D.
> >>>   7. Restart D, Startup is successful and connecting to standby
> C.
> >>>   8. Stop C.
> >>>   9. Restart C, startup is failing.
> >>
> >> Ok, the error I get in that scenario is:
> >>
> >> C 2012-12-05 19:55:43.840 EET 9283 FATAL:  requested timeline 2 does
> >> not contain minimum recovery point 0/3023F08 on timeline 1 C
> >> 2012-12-05
> >> 19:55:43.841 EET 9282 LOG:  startup process (PID 9283) exited with
> >> exit code 1 C 2012-12-05 19:55:43.841 EET 9282 LOG:  aborting startup
> >> due to startup process failure
> >>
> >
> >>
> Well, it seems wrong for the control file to contain a situation like
> this:
> 
> pg_control version number:932
> Catalog version number:   201211281
> Database system identifier:   5819228770976387006
> Database cluster state:   shut down in recovery
> pg_control last modified: pe  7. joulukuuta 2012 17.39.57
> Latest checkpoint location:   0/3023EA8
> Prior checkpoint location:0/260
> Latest checkpoint's REDO location:0/3023EA8
> Latest checkpoint's REDO WAL file:00020003
> Latest checkpoint's TimeLineID:   2
> ...
> Time of latest checkpoint:pe  7. joulukuuta 2012 17.39.49
> Min recovery ending location: 0/3023F08
> Min recovery ending loc's timeline:   1
> 
> Note the latest checkpoint location and its TimelineID, and compare them
> with the min recovery ending location. The min recovery ending location
> is ahead of latest checkpoint's location; the min recovery ending
> location actually points to the end of the checkpoint record. But how
> come the min recovery ending location's timeline is 1, while the
> checkpoint record's timeline is 2.
> 
> Now maybe that would happen to work if remove the sanity check, but it
> still seems horribly confusing. I'm afraid that discrepancy will come
> back to haunt us later if we leave it like that. So I'd like to fix
> that.
> 
> Mulling over this for some more, I propose the attached patch. With the
> patch, we peek into the checkpoint record, and actually perform the
> timeline switch (by changing ThisTimeLineID) before replaying it. That
> way the checkpoint record is really considered to be on the new timeline
> for all purposes. At the moment, the only difference that makes in
> practice is that we set replayEndTLI, and thus minRecoveryPointTLI, to
> the new TLI, but it feels logically more correct to do it that way.

This has fixed both the problems reported in below link:
http://archives.postgresql.org/pgsql-hackers/2012-12/msg00267.php

The code is also fine.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2012-12-10 Thread Amit Kapila

On Monday, December 10, 2012 2:41 PM Kyotaro HORIGUCHI wrote:
> Thank you.
> 
> > >heap_attr_get_length_and_check_equals:
> ..
> > >- This function returns always false for attrnum <= 0 as whole
> > >  tuple or some system attrs comparison regardless of the real
> > >  result, which is a bit different from the anticipation which
> > >  the name gives. If you need to keep this optimization, it
> > >  should have the name more specific to the purpose.
> >
> > The heap_attr_get_length_and_check_equals function is similar to
> heap_tuple_attr_equals,
> > the attrnum <= 0 check is required for heap_tuple_attr_equals.
> 
> Sorry, you're right.
> 
> > >haap_delta_encode:
> > >
> > >- Some misleading variable names (like match_not_found),
> > >  some reatitions of similiar codelets (att_align_pointer,
> pglz_out_tag),
> > >  misleading slight difference of the meanings of variables of
> > >  similar names(old_off and new_off and the similar pairs),
> > >  and bit tricky use of pglz_out_add and pglz_out_tag with length =
> 0.
> > >
> > >  These are welcome to be modified for better readability.
> >
> > The variable names are modified, please check them once.
> >
> > The (att_align_pointer, pglz_out_tag) repetition code is added to take
> care of padding only incase of values are equal.
> > Use of pglz_out_add and pglz_out_tag with length = 0 is done because
> of code readability.
> 
> Oops! Sorry for mistake. My point was that the bases for old_off
> (of match_off) and dp, not new_off. It is no unnatural. Namings
> had not been the problem and the function was perfect as of the
> last patch. 

I think new naming I have done are more meaningful, do you think I should
revert to previous patch one's.

> I'd been confised by the asymmetry between match_off
> to pglz_out_tag and dp to pglz_out_add.

If we see the usage of pglz_out_tag and pglz_out_literal in pglz_compress(),
it is same as I have used.

 
> > Another change is also done to handle the history size of 2 bytes
> which is possible with the usage of LZ macro's for delta encoding.
> 
> Good catch. This seems to have been a potential bug which does no
> harm when called from pglz_compress..
> 
> ==
> 
> Looking into wal_update_changes_mod_lz_v6.patch, I understand
> that this patch experimentally adds literal data segment which
> have more than single byte in PG-LZ algorithm.  According to
> pglz_find_match, memCMP is slower than 'while(*s && *s == *d)' if
> len < 16 and I suppose it is probably true at least for 4 byte
> length data. This is also applied on encoding side. If this mod
> does no harm to performance, I want to see this applied also to
> pglz_comress.

Where in pglz_comress(),  do you want to see similar usage?
Or do you want to see such use in function
heap_attr_get_length_and_check_equals(), where it compares 2 attributes.



>  By the way, the comment on pg_lzcompress.c:690 seems to quite
> differ from what the code does.

I shall fix this.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

2012-12-10 Thread Heikki Linnakangas


On 10.12.2012 03:52, Kyotaro HORIGUCHI wrote:

I think that minRecoveryPoint should be updated before the data-file
changes, so XLogFlush() should be called before smgrtruncate(). No?


Mmm.. As far as I saw in xact_redo_commit_internal, XLogFlush
seems should be done AFTER redo substantial operation. Is it
wrong? If so, I suppose xact_redo_commit_internal could shold be
fixed in the same way.


So, two options:

1. Redo truncation, then XLogFlush()

There's a window where the original problem could still occur: The file 
is truncated, and you crash before XLogFlush() finishes.


2. XLogFlush(), then redo truncation.

If the truncation fails for some reason (disk failure?) and you have 
already updated minRecoveryPoint, you can not start up until you somehow 
fix the problem so that the truncation can succeed, and then finish 
recovery.


Both options have their merits. The window in 1. is very small, and in 
2., the probability that an unlink() or truncation fails is very small 
as well.


We've chosen 1. in xact_redo_commit_internal(), but I don't think that 
was a carefully made decision, it just imitates what happens in the 
corresponding non-redo commit path. In xact_redo_commit_internal(), it 
makes sense to do XLogFlush() afterwards, for CREATE DATABASE and CREATE 
TABLESPACE. Those create files, and if you e.g run out of disk space, or 
non-existent path, you don't want minRecoveryPoint to be updated yet. 
Otherwise you can no longer recover to the point just before the 
creation of those files. But in case of DROP DATABASE, you have the 
exact same situation as with truncation: if you have already deleted 
some files, you must not be able to stop recovery at a point before that.


So I'd say we should update minRecoveryPoint first, then 
truncate/delete. But we should still keep the XLogFlush() at the end of 
xact_redo_commit_internal(), for the case where files/directories are 
created. Patch attached.


- Heikki
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
***
*** 4617,4639  xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn,
  	}
  
  	/* Make sure files supposed to be dropped are dropped */
! 	for (i = 0; i < nrels; i++)
  	{
! 		SMgrRelation srel = smgropen(xnodes[i], InvalidBackendId);
! 		ForkNumber	fork;
  
! 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
! 			XLogDropRelation(xnodes[i], fork);
! 		smgrdounlink(srel, true);
! 		smgrclose(srel);
  	}
  
  	/*
  	 * We issue an XLogFlush() for the same reason we emit ForceSyncCommit()
! 	 * in normal operation. For example, in DROP DATABASE, we delete all the
! 	 * files belonging to the database, and then commit the transaction. If we
! 	 * crash after all the files have been deleted but before the commit, you
! 	 * have an entry in pg_database without any files. To minimize the window
  	 * for that, we use ForceSyncCommit() to rush the commit record to disk as
  	 * quick as possible. We have the same window during recovery, and forcing
  	 * an XLogFlush() (which updates minRecoveryPoint during recovery) helps
--- 4617,4660 
  	}
  
  	/* Make sure files supposed to be dropped are dropped */
! 	if (nrels > 0)
  	{
! 		/*
! 		 * First update minimum recovery point to cover this WAL record. Once
! 		 * a relation is deleted, there's no going back. The buffer manager
! 		 * enforces the WAL-first rule for normal updates to relation files,
! 		 * so that the minimum recovery point is always updated before the
! 		 * corresponding change in the data file is flushed to disk, but we
! 		 * have to do the same here since we're bypassing the buffer manager.
! 		 *
! 		 * Doing this before the deleting the files means that if a deletion
! 		 * fails for some reason, you cannot start up the system even after
! 		 * restart, until you fix the underlying situation so that the
! 		 * deletion will succeed. Alternatively, we could update the minimum
! 		 * recovery point after deletion, but that would leave a small window
! 		 * where the WAL-first rule would be violated.
! 		 */
! 		XLogFlush(lsn);
  
! 		for (i = 0; i < nrels; i++)
! 		{
! 			SMgrRelation srel = smgropen(xnodes[i], InvalidBackendId);
! 			ForkNumber	fork;
! 
! 			for (fork = 0; fork <= MAX_FORKNUM; fork++)
! XLogDropRelation(xnodes[i], fork);
! 			smgrdounlink(srel, true);
! 			smgrclose(srel);
! 		}
  	}
  
  	/*
  	 * We issue an XLogFlush() for the same reason we emit ForceSyncCommit()
! 	 * in normal operation. For example, in CREATE DATABASE, we copy all the
! 	 * files from the template database, and then commit the transaction. If we
! 	 * crash after all the files have been copied but before the commit, you
! 	 * have files in the data directory without an entry in pg_database.
! 	 * To minimize the window
  	 * for that, we use ForceSyncCommit() to rush the commit record to disk as
  	 * quick as possible. We have the same window during recovery, and forcing
  	 * an XLogFlush() (which updates minRecoveryPoi

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Michael Paquier



--
Michael Paquier
http://michael.otacoo.com

On 2012/12/10, at 18:28, Simon Riggs  wrote:

> On 10 December 2012 06:03, Michael Paquier  wrote:
>>> On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
 Andres Freund  writes:
 I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
 preserve the index name exactly.  Something like adding or removing
 trailing underscores would probably serve to generate a nonconflicting
 name that's not too unsightly.  Or just generate a new name using the
 same rules that CREATE INDEX would when no name is specified.  Yeah,
 it's a hack, but what about the CONCURRENTLY commands isn't a hack?
>>> 
>>> I have no problem with ending up with a new name or something like
>>> that. If that is what it takes: fine, no problem.
>> 
>> For the indexes that are created internally by the system like toast or
>> internal primary keys this is acceptable. However in the case of indexes
>> that have been created externally I do not think it is acceptable as this
>> impacts the user that created those indexes with a specific name.
> 
> If I have to choose between (1) keeping the same name OR (2) avoiding
> an AccessExclusiveLock then I would choose (2). Most other people
> would also, especially when all we would do is add/remove an
> underscore. Even if that is user visible. And if it is we can support
> a LOCK option that does (1) instead.
> 
> If we make it an additional constraint on naming, it won't be a
> problem... namely that you can't create an index with/without an
> underscore at the end, if a similar index already exists that has an
> identical name apart from the suffix.
> 
> There are few, if any, commands that need the index name to remain the
> same. For those, I think we can bend them to accept the index name and
> then add/remove the underscore to get that to work.
> 
> That's all a little bit crappy, but this is too small a problem with
> an important feature to allow us to skip.
Ok. Removing the switch name part is only deleting 10 lines of code in 
index_concurrent_swap.
Then, do you guys have a preferred format for the concurrent index name? For 
the time being an inelegant _cct suffix is used. The underscore at the end?

Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-12-10 Thread Simon Riggs

On 10 December 2012 06:03, Michael Paquier  wrote:
>> On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
>> > Andres Freund  writes:
>> > I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>> > preserve the index name exactly.  Something like adding or removing
>> > trailing underscores would probably serve to generate a nonconflicting
>> > name that's not too unsightly.  Or just generate a new name using the
>> > same rules that CREATE INDEX would when no name is specified.  Yeah,
>> > it's a hack, but what about the CONCURRENTLY commands isn't a hack?
>>
>> I have no problem with ending up with a new name or something like
>> that. If that is what it takes: fine, no problem.
>
> For the indexes that are created internally by the system like toast or
> internal primary keys this is acceptable. However in the case of indexes
> that have been created externally I do not think it is acceptable as this
> impacts the user that created those indexes with a specific name.

If I have to choose between (1) keeping the same name OR (2) avoiding
an AccessExclusiveLock then I would choose (2). Most other people
would also, especially when all we would do is add/remove an
underscore. Even if that is user visible. And if it is we can support
a LOCK option that does (1) instead.

If we make it an additional constraint on naming, it won't be a
problem... namely that you can't create an index with/without an
underscore at the end, if a similar index already exists that has an
identical name apart from the suffix.

There are few, if any, commands that need the index name to remain the
same. For those, I think we can bend them to accept the index name and
then add/remove the underscore to get that to work.

That's all a little bit crappy, but this is too small a problem with
an important feature to allow us to skip.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] proposal: fix corner use case of variadic fuctions usage

2012-12-10 Thread Vik Reykja

On Sat, Dec 1, 2012 at 1:14 PM, Pavel Stehule wrote:

> Hello
>
> >
> > Hi Pavel.
> >
> > I am trying to review this patch and on my work computer everything
> compiles
> > and tests perfectly. However, on my laptop, the regression tests don't
> pass
> > with "cache lookup failed for type XYZ" where XYZ is some number that
> does
> > not appear to be any type oid.
> >
> > I don't really know where to go from here. I am asking that other people
> try
> > this patch to see if they get errors as well.
> >
>
> yes, I checked it on .x86_64 and I had a same problems
>
> probably there was more than one issue - I had to fix a creating a
> unpacked params and I had a issue with gcc optimalization when I used
> a stack variable for fcinfo.
>
> Now I fixed these issues and I hope  so it will work on all platforms
>

It appears to work a lot better, yes.  I played around with it a little bit
and wasn't able to break it, so I'm marking it as ready for committer.
Some wordsmithing will need to be done on the code comments.

Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2012-12-10 Thread Kyotaro HORIGUCHI

Thank you.

> >heap_attr_get_length_and_check_equals: 
..
> >- This function returns always false for attrnum <= 0 as whole 
> >  tuple or some system attrs comparison regardless of the real 
> >  result, which is a bit different from the anticipation which 
> >  the name gives. If you need to keep this optimization, it 
> >  should have the name more specific to the purpose. 
> 
> The heap_attr_get_length_and_check_equals function is similar to 
> heap_tuple_attr_equals, 
> the attrnum <= 0 check is required for heap_tuple_attr_equals. 

Sorry, you're right.

> >haap_delta_encode: 
> > 
> >- Some misleading variable names (like match_not_found), 
> >  some reatitions of similiar codelets (att_align_pointer, pglz_out_tag), 
> >  misleading slight difference of the meanings of variables of 
> >  similar names(old_off and new_off and the similar pairs), 
> >  and bit tricky use of pglz_out_add and pglz_out_tag with length = 0. 
> > 
> >  These are welcome to be modified for better readability. 
> 
> The variable names are modified, please check them once. 
> 
> The (att_align_pointer, pglz_out_tag) repetition code is added to take care 
> of padding only incase of values are equal. 
> Use of pglz_out_add and pglz_out_tag with length = 0 is done because of code 
> readability. 

Oops! Sorry for mistake. My point was that the bases for old_off
(of match_off) and dp, not new_off. It is no unnatural. Namings
had not been the problem and the function was perfect as of the
last patch. I'd been confised by the asymmetry between match_off
to pglz_out_tag and dp to pglz_out_add.

> Another change is also done to handle the history size of 2 bytes which is 
> possible with the usage of LZ macro's for delta encoding.

Good catch. This seems to have been a potential bug which does no
harm when called from pglz_compress..

==

Looking into wal_update_changes_mod_lz_v6.patch, I understand
that this patch experimentally adds literal data segment which
have more than single byte in PG-LZ algorithm.  According to
pglz_find_match, memCMP is slower than 'while(*s && *s == *d)' if
len < 16 and I suppose it is probably true at least for 4 byte
length data. This is also applied on encoding side. If this mod
does no harm to performance, I want to see this applied also to
pglz_comress.

 By the way, the comment on pg_lzcompress.c:690 seems to quite
differ from what the code does.


regards,


*1: 
http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C38285495B0@szxeml509-mbx

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

2012-12-10 Thread Amit Kapila

Monday, December 10, 2012 7:16 AM Kyotaro HORIGUCHI wrote:
> Thank you.
> 
> > I think moving CheckRecoveryConsistency() after redo apply loop might
> cause
> > a problem.
> > As currently it is done before recoveryStopsHere() function, which can
> allow
> > connections
> > on HOTSTANDY. But now if due to some reason recovery pauses or stops
> due to
> > above function,
> > connections might not be allowed as CheckRecoveryConsistency() is not
> > called.
> 
> It depends on the precise meaning of minRecoveryPoint. I've not
> found the explicit explanation for it.
> 
> Currently minRecoveryPoint is updated only in XLogFlush. Other
> updates of it seems to reset to InvalidXLogRecPtr. XLogFlush
> seems to be called AFTER the redo core operation has been done,
> at least in xact_redo_commit_internal. I shows me that the
> meaning of minRecoveryPoint is that "Just AFTER applying the XLog
> at current LSN, the database files will be assumed to be
> consistent."
> 
> At Mon, 10 Dec 2012 00:36:31 +0900, Fujii Masao 
> wrote in
> 
> > Yes, so we should just add the CheckRecoveryConsistency() call after
> > rm_redo rather than moving it? This issue is related to the old
> discussion:
> > http://archives.postgresql.org/pgsql-bugs/2012-09/msg00101.php
> 
> Since I've not cleary understood the problem of missing it before
> redo, and it also seems to have no harm on performance, I have no
> objection to place it both before and after of redo.

I have tried that way as well, but it didn't completely resolved the problem
reported in above link.
As the situation of defect got arised when it does first time ReadRecord(). 

To resolve the defect mentioned in link by Fujii Masao, actually we need to
check and 
try to reset the backupStartPoint before each ReadRecord.
The reason is that in ReadRecord(), it can go and start waiting for records
from Master.
So now if backupStartPoint is not set and CheckRecoveryConsistency() is not
done, it can keep on waiting
Records from Master and no connections will be allowed on Standby.

I have modified the code of XLogPageRead(...) such that before it calls
WaitForWALToBecomeavailable(), following code will be added

if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) && 

XLByteLE(ControlFile->backupEndPoint, EndRecPtr)) 
{ 
/* 
 * We have reached the end of base
backup, the point where 
 * the minimum recovery point in
pg_control indicates. The 
 * data on disk is now consistent.
Reset backupStartPoint 
 * and backupEndPoint. 
 */ 
elog(DEBUG1, "end of backup
reached"); 

LWLockAcquire(ControlFileLock,
LW_EXCLUSIVE); 

MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr)); 
MemSet(&ControlFile->backupEndPoint,
0, sizeof(XLogRecPtr)); 
ControlFile->backupEndRequired =
false; 
UpdateControlFile(); 

LWLockRelease(ControlFileLock); 
}

CheckRecoveryConsistency();

This had completely resolved the problem reported on above link for me.

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

68 matches

Mail list logo