Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Peter Eisentraut
On sön, 2012-06-24 at 16:05 -0400, Robert Haas wrote:
  diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
  index ebc4825..7d66da9 100644
  --- a/src/backend/catalog/genbki.pl
  +++ b/src/backend/catalog/genbki.pl
  @@ -19,6 +19,8 @@
   use strict;
   use warnings;
 
  +local $SIG{__WARN__} = sub { die $_[0] };
  +
   my @input_files;
   our @include_path;
   my $output_path = '';
 
  would address that.
 
  Could that cause any other problems?  Should it be added to all Perl
  scripts?
 
 This seems like a band-aid.

I'd think of it as a safety net.

 How about if we instead add whatever error-handling the script is
 missing, so that it produces an appropriate, human-readable error
 message?

That could also be worthwhile, but I think given the audience of that
script and the complexity of the possible failures, it could be a very
large and aimless project.  But there should still be a catch-all for
uncaught failures.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Guillaume Lelarge
Hi Pavel,

On Mon, 2012-06-25 at 08:26 +0300, Pavel Golub wrote:
 Hello, Pgsql-bugs.
 
 According to the Moving tablespaces thread started by Bruce
 http://archives.postgresql.org/pgsql-docs/2011-12/msg3.php
 pg_tablespace.spclocation column is removed in the 9.2beta. However
 this breaks backward compatibility for a bunch of products, e.g.
 pgAdmin, phpPgAdmin, PgMDD etc.
 
 I'm not sure this is the best choice, because each application with
 tablespace support will now need an additional check to determine which
 way to use for obtaining the tablespace location:
 pg_get_tablespace_location(oid) or pg_tablespace.spclocation
 
 I'm aware of the problems caused by this hard-coded column. My proposal is
 to convert pg_tablespace to a system view, maybe?
 

I don't see why it causes you so much trouble. You should already have
many locations in your code where you need to check the version to be
compatible with the latest major releases. I know pgAdmin does. So I
guess that one more is not a big deal.
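
For illustration only, here is a minimal client-side sketch of the kind of
version switch being discussed, using libpq's PQserverVersion(); note that
the function 9.2 actually ships is pg_tablespace_location(oid) (the
pg_get_tablespace_location spelling above appears to be a slip), and the
rest of the snippet is hypothetical rather than taken from pgAdmin or any
other tool.

#include <libpq-fe.h>

/*
 * Sketch: pick the tablespace-location query based on the server version.
 * PQserverVersion() returns e.g. 90200 for a 9.2 server.  Illustrative only.
 */
static const char *
tablespace_location_query(const PGconn *conn)
{
	if (PQserverVersion(conn) >= 90200)
		return "SELECT spcname, pg_tablespace_location(oid) AS location "
			   "FROM pg_catalog.pg_tablespace";
	else
		return "SELECT spcname, spclocation AS location "
			   "FROM pg_catalog.pg_tablespace";
}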

And this change in PostgreSQL greatly helps DBAs who want to move
tablespaces (not a really common task AFAIK, I agree).


-- 
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Pavel Golub
Hello, Guillaume.

You wrote:

GL Hi Pavel,

GL On Mon, 2012-06-25 at 08:26 +0300, Pavel Golub wrote:
 Hello, Pgsql-bugs.
 
 According to the Moving tablespaces thread started by Bruce
 http://archives.postgresql.org/pgsql-docs/2011-12/msg3.php
 pg_tablespace.spclocation column is removed in the 9.2beta. However
 this breaks backward compatibility for a bunch of products, e.g.
 pgAdmin, phpPgAdmin, PgMDD etc.
 
 I'm not sure this is the best choice, because each application with
 tablespace support will now need an additional check to determine which
 way to use for obtaining the tablespace location:
 pg_get_tablespace_location(oid) or pg_tablespace.spclocation
 
 I'm aware of the problems caused by this hard-coded column. My proposal is
 to convert pg_tablespace to a system view, maybe?
 

GL I don't see why it causes you so much trouble.

Not so much. However.

GL You should already have
GL many locations in your code where you need to check the version to be
GL compatible with the latest major releases.

This is wholly true.

GL I know pgAdmin does. So I
GL guess that one more is not a big deal.

GL And this change in PostgreSQL greatly helps DBAs who want to move
GL tablespaces (not a really common task AFAIK, I agree).

I know. I just followed the advice of Josh Berkus and added this as a
bug.

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-06-25 Thread Vlad Arkhipov

On 05/31/2012 11:52 AM, Jeff Davis wrote:

On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote:

Hi all,


as part of my master's thesis I have created a temporal support patch
for PostgreSQL. It enables the creation of special temporal tables
with entry versioning. Modifying operations (UPDATE, DELETE,
TRUNCATE) on these tables don't cause permanent changes to entries,
but create new versions of them. Thus the user can easily get to past
states of the table.


I would be very interested to see this, thank you for working on it.

There are quite a few aspects to a temporal database system, and you are
working on a system-maintained transaction-time historical table, right?
Or are there other aspects to your proposal?

Some general comments:

* I'd very much like to see you make use of Range Types from 9.2; in
particular, TSTZRANGE would be much better than holding two timestamps.
If a standard requires you to display two timestamps in certain
situations, perhaps you could use ranges internally and display the
boundaries as timestamps when needed.
It's not sufficient to store only a period of validity for a row. If two
transactions that started at the same time change the same record, you have a
problem with the TSTZRANGE type, because it's normalized to an empty interval.
The other issue is how to handle multiple changes of the same record
within a transaction. Should they be stored or not?
Also it's necessary to store some kind of operation type that was 
applied to the record (insert/update/delete). For example, there is a 
table with one record with validity period [0, ) and value 'A'.


First way
1. Delete this record in time 1, now there is [0, 1), A in the history 
table.
2. Insert a new record in time 1, now there is [0, 1), A in the history 
table and [1, ), B record in the current data table.


Second way
1. Update this record in time 1, now there is [0, 1), A in the history 
table and [1, ), B record in the current data table.


So you have the same data in the tables, but the actions that led to this
configuration were different, and part of the history has been lost.



* There is other useful information that could be recorded, such as the
user who inserted/updated/deleted the record.
I'm not sure that the database user is the proper thing to be stored in
the history table. Many applications connect to a database using
some virtual user and have their own users/roles tables to handle
privileges. There should be some way to substitute the stored user in
the history table with the application's one. It's also helpful to store
the transaction id that inserted/updated/deleted the record.



* For some purposes, it's very useful to keep track of the columns that
changed. For instance, a query like "show me any time a salary was
changed over the last month" (or some other rare event) would be very
slow to run if there was not some explicit annotation on the historical
records (e.g. a "columns changed" bitmap or something).
It's a great proposal, but it seems impossible to implement with a
triggers-only solution, doesn't it? Is there any kind of hook on ALTER
TABLE ... in PostgreSQL to update the changed-columns bitmaps when the
table structure changes?

* In general, I'm not fond of adorning queries with TRANSACTION TIME AS
OF... kinds of things. Those constructs are redundant with a WHERE
clause (on a range type, you'd use the contains operator). If a
standard requires that, maybe it would be OK to allow such things as
syntactic sugar.
In SQL:2011 there is only one table with all the data, historical and
current. So it's not very convenient to specify a WHERE condition on
system time everywhere and for all tables in the query. By default only
the current data is selected by a query like SELECT * FROM table.

* As Jim mentioned, it might make sense to use something resembling
inheritance so that selecting from the historical table includes the
current data (but with no upper bound for the range).
We have had good experience with inheritance in our trigger-based
solution. It's completely transparent to the existing applications and
does not have any impact on performance.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] compiler warnings on mingw

2012-06-25 Thread Peter Eisentraut
I've tried to cross-compile PostgreSQL from Linux to Windows, following
the ideas of Andrew Dunstan [0].  This works quite well.  I see two
compiler warnings altogether, which might be worth getting rid of:

#1

mingwcompat.c:60:1: warning: ‘RegisterWaitForSingleObject’ redeclared without 
dllimport attribute: previous dllimport ignored [-Wattributes]

This can apparently go away with this:

diff --git a/src/backend/port/win32/mingwcompat.c 
b/src/backend/port/win32/mingwcompat.c
index 0978e8c..b1a5ca5 100644
--- a/src/backend/port/win32/mingwcompat.c
+++ b/src/backend/port/win32/mingwcompat.c
@@ -56,6 +56,7 @@
(PHANDLE, HANDLE, WAITORTIMERCALLBACK, PVOID, ULONG, ULONG);
 static __RegisterWaitForSingleObject _RegisterWaitForSingleObject = NULL;
 
+__attribute__((dllimport))
 BOOL   WINAPI
 RegisterWaitForSingleObject(PHANDLE phNewWaitObject,
HANDLE hObject,

Oddly, the mingw buildfarm member[1] complains about

mingwcompat.c:66: warning: no previous prototype for 
'RegisterWaitForSingleObject'

instead.  So there might be some divergent header files around.

Anyone know details about this?

#2

pg_stat_statements.c: In function ‘pgss_ProcessUtility’:
pg_stat_statements.c:840:4: warning: unknown conversion type character ‘l’ in 
format [-Wformat]
pg_stat_statements.c:840:4: warning: too many arguments for format 
[-Wformat-extra-args]

We use a replacement snprintf and set the int64 format to %lld and %llu
based on that.  But pg_stat_statements.c uses sscanf, for which we have
no replacement.  The configure check comments

# MinGW uses '%I64d', though gcc throws an warning with -Wall,
# while '%lld' doesn't generate a warning, but doesn't work.

So assuming that sscanf in the mingw C library works consistently with
snprintf, that might mean that pg_stat_statements is broken on that
platform.  (The claim that %lld doesn't generate a warning is also
questionable here.)
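
To make the asymmetry concrete, here is a hedged sketch of the pattern in
question (the function, variable names and the COPY tag are illustrative,
not the actual pg_stat_statements code): the int64 format macro is chosen
to suit PostgreSQL's replacement snprintf, while sscanf still comes from
the MinGW C runtime, which may expect %I64d instead.

#include "postgres.h"		/* int64 and INT64_FORMAT */
#include <stdio.h>

/*
 * Illustrative sketch of the snprintf/sscanf mismatch: INT64_FORMAT matches
 * what PostgreSQL's replacement snprintf understands, but the sscanf call
 * below resolves to the platform C library, which on MinGW may not accept
 * the same conversion, so the parse can fail even though the corresponding
 * snprintf works.
 */
static int64
parse_copy_rowcount(const char *completionTag)
{
	int64		rows = 0;

	if (sscanf(completionTag, "COPY " INT64_FORMAT, &rows) != 1)
		rows = 0;				/* parse failed; wrong format for this libc? */
	return rows;
}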


[0]: 
http://people.planetpostgresql.org/andrew/index.php?/archives/264-Cross-compiling-PostgreSQL-for-WIndows.html
[1]: 
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=narwhal&dt=2012-06-22%2004%3A00%3A05&stg=make

PS: Instructions for Debian:

apt-get install gcc-mingw-w64

./configure --build=$(config/config.guess) --host=i686-w64-mingw32 
--without-zlib --without-readline ZIC=/usr/sbin/zic
make world



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] compiler warnings on mingw

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 11:42 AM, Peter Eisentraut pete...@gmx.net wrote:
 I've tried to cross-compile PostgreSQL from Linux to Windows, following
 the ideas of Andrew Dunstan [0].  This works quite well.  I see two
 compiler warnings altogether, which might be worth getting rid of:

 #1

 mingwcompat.c:60:1: warning: ‘RegisterWaitForSingleObject’ redeclared without 
 dllimport attribute: previous dllimport ignored [-Wattributes]

 This can apparently go away with this:

 diff --git a/src/backend/port/win32/mingwcompat.c 
 b/src/backend/port/win32/mingwcompat.c
 index 0978e8c..b1a5ca5 100644
 --- a/src/backend/port/win32/mingwcompat.c
 +++ b/src/backend/port/win32/mingwcompat.c
 @@ -56,6 +56,7 @@
            (PHANDLE, HANDLE, WAITORTIMERCALLBACK, PVOID, ULONG, ULONG);
  static __RegisterWaitForSingleObject _RegisterWaitForSingleObject = NULL;

 +__attribute__((dllimport))
  BOOL       WINAPI
  RegisterWaitForSingleObject(PHANDLE phNewWaitObject,
                            HANDLE hObject,

Seems like a proper fix to me - but could be verified by checking what
the actual mingw header looks like. Or maybe that's what you did
already..


 Oddly, the mingw buildfarm member[1] complains about

 mingwcompat.c:66: warning: no previous prototype for 
 'RegisterWaitForSingleObject'

I think that one is just laziness - in the case when we're injecting
that API function into mingw we should declare it in our own headers.
It was likely just left out because the proper API headers already
carry it, it was just missing in mingw. So it should be added to the
port header, under an #ifdef.
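
A minimal sketch of what such a guarded declaration in the port header could
look like; the HAVE_REGISTERWAITFORSINGLEOBJECT symbol is an assumed
configure result invented for the example, not an existing macro, and the
snippet presumes <windows.h> has already been included for the Windows
typedefs.

/*
 * Hypothetical addition to the win32 port header: declare the function that
 * mingwcompat.c injects, but only for MinGW builds whose headers lack it.
 * HAVE_REGISTERWAITFORSINGLEOBJECT is an assumption for illustration.
 */
#if defined(__MINGW32__) && !defined(HAVE_REGISTERWAITFORSINGLEOBJECT)
extern BOOL WINAPI RegisterWaitForSingleObject(PHANDLE phNewWaitObject,
											   HANDLE hObject,
											   WAITORTIMERCALLBACK Callback,
											   PVOID Context,
											   ULONG dwMilliseconds,
											   ULONG dwFlags);
#endif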


 instead.  So there might be some divergent header files around.

 Anyone know details about this?

Perhaps mingw has added it to their api *properly* this time, and the
whole function should go away from mingwcompat.c? In that case it'd
obviously require a configure test, since it doesn't exist in previous
releases.


 #2

 pg_stat_statements.c: In function ‘pgss_ProcessUtility’:
 pg_stat_statements.c:840:4: warning: unknown conversion type character ‘l’ in 
 format [-Wformat]
 pg_stat_statements.c:840:4: warning: too many arguments for format 
 [-Wformat-extra-args]

 We use a replacement snprintf and set the int64 format to %lld and %llu
 based on that.  But pg_stat_statements.c uses sscanf, for which we have
 no replacement.  The configure check comments

 # MinGW uses '%I64d', though gcc throws an warning with -Wall,
 # while '%lld' doesn't generate a warning, but doesn't work.

 So assuming that sscanf in the mingw C library works consistently with
 snprintf, that might mean that pg_stat_statements is broken on that
 platform.  (The claim that %lld doesn't generate a warning is also
 questionable here.)

Can't comment on that part without more investigation.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts

2012-06-25 Thread Ryan Kelly
Shigeru:

Thank you very much for your review. Comments are inline below, and a
new patch is attached.

On Fri, Jun 22, 2012 at 10:06:53AM +0900, Shigeru HANADA wrote:
 (2012/05/01 0:30), Ryan Kelly wrote:
 On Mon, Apr 30, 2012 at 09:02:33AM -0400, Alvaro Herrera wrote:
 Well, do *you* want it?
 Of course. That way I can stop patching my psql and go back to using the
 one that came with my release :)
 
 Here is result of my review of v4 patch.
 
 Submission
 --
 - The patch is in context diff format
 - It needs rebase for HEAD, but patch command takes care automatically.
 - Make finishes cleanly, and all regression tests pass
 
 Functionality
 -
 After applying this patch, I could cancel a connection attempt at
 start-up by pressing Ctrl+C on the terminal just after invoking the psql
 command with an unused IP address.  Without this patch, such an attempt
 ends up with an error such as "No route to host" after a few
 uninterruptible seconds (the duration until the error probably depends
 on the environment).
 
 A connection attempt via the \connect command could also be canceled by
 pressing Ctrl+C at the psql prompt.
 
 In addition, I tried setting PGCONNECT_TIMEOUT to 0 (infinite), but
 psql gave up after a few seconds, for both start-up and re-connect.
 Is this intentional behavior?
A timeout of 0 (infinite) means to keep trying until we succeed or fail,
not keep trying forever. As you mentioned above, your connection
attempts error out after a few seconds. This is what is happening. In my
environment no such error occurs and as a result psql continues to
attempt to connect for as long as I let it.
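
For readers following along, a much-simplified sketch of the interruptible
connect loop this approach is built around, using only documented libpq
calls (PQconnectStart, PQconnectPoll, PQsocket); the real do_connect changes
additionally honour PQconnectTimeout and psql's cancel handling, so this is
illustrative rather than the patch itself.

#include <sys/select.h>
#include <libpq-fe.h>

/*
 * Simplified sketch of an interruptible connection attempt using libpq's
 * asynchronous API.  Cancel-key handling, timeout accounting and most
 * error paths are omitted; this is illustrative, not psql's code.
 */
static PGconn *
connect_interruptibly(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	PostgresPollingStatusType st = PGRES_POLLING_WRITING;

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		return conn;			/* out of memory, or failed immediately */

	while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
	{
		fd_set		fds;
		struct timeval tv = {1, 0};		/* wake up once a second */
		int			sock = PQsocket(conn);

		FD_ZERO(&fds);
		FD_SET(sock, &fds);
		/* wait for the direction PQconnectPoll asked for, with a timeout */
		if (st == PGRES_POLLING_READING)
			select(sock + 1, &fds, NULL, NULL, &tv);
		else
			select(sock + 1, NULL, &fds, NULL, &tv);

		/* a real client would check for Ctrl+C / cancel requests here */
		st = PQconnectPoll(conn);
	}
	return conn;				/* caller checks PQstatus()/PQerrorMessage() */
}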

 Detail of changes
 -
 Basically this patch consists of three changes:
 
 1) defer setup of the cancel handler until the start-up connection has been established
 2) new libpq API PQconnectTimeout which returns the connect_timeout
 value of the current session
 3) use the asynchronous connection API for psql's \connect command; this
 requires the API added by 2)
 
 These seem reasonable for achieving cancellation of connection attempts at
 start-up and \connect, but, as Ryan says, the code added to do_connect
 might need more simplification.
 
 I have some random comments.
 
 - Checking status by calling PQstatus just after
 PQconnectStartParams is necessary.
Yes, I agree.

 - Copying only the select() part of pqSocketPoll seems not enough;
 should we use poll(2) if it is supported?
I did not think the additional complexity was worth it in this case.
Unless you see some reason to use poll(2) that I do not.

 - Don't we need to clear the error message stored in PGconn after
 PQconnectPoll returns OK status, like connectDBComplete does?
I do not believe there is a client interface for clearing the error
message. Additionally, the documentation states that PQerrorMessage
"returns the error message most recently generated by an operation on
the connection". This seems to indicate that the error message should be
cleared, as this behavior is part of the contract of PQerrorMessage.

 
 Regards,
 -- 
 Shigeru HANADA

-Ryan Kelly
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 5c5dd68..79f4bc0 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1496,6 +1496,24 @@ char *PQoptions(const PGconn *conn);
   </para>
  </listitem>
 </varlistentry>
+
+<varlistentry id="libpq-pqconnecttimeout">
+ <term>
+  <function>PQconnectTimeout</function>
+  <indexterm>
+   <primary>PQconnectTimeout</primary>
+  </indexterm>
+ </term>
+
+ <listitem>
+  <para>
+   Returns the connect_timeout property as given to libpq.
+<synopsis>
+char *PQconnectTimeout(const PGconn *conn);
+</synopsis>
+  </para>
+ </listitem>
+</varlistentry>
 </variablelist>
   </para>
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 0926786..23b0106 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1506,7 +1506,7 @@ static bool
 do_connect(char *dbname, char *user, char *host, char *port)
 {
 	PGconn	   *o_conn = pset.db,
-			   *n_conn;
+			   *n_conn = NULL;
 	char	   *password = NULL;
 
 	if (!dbname)
@@ -1539,7 +1539,7 @@ do_connect(char *dbname, char *user, char *host, char *port)
 
 	while (true)
 	{
-#define PARAMS_ARRAY_SIZE	8
+#define PARAMS_ARRAY_SIZE	9
 		const char **keywords = pg_malloc(PARAMS_ARRAY_SIZE * sizeof(*keywords));
 		const char **values = pg_malloc(PARAMS_ARRAY_SIZE * sizeof(*values));
 
@@ -1557,30 +1557,140 @@ do_connect(char *dbname, char *user, char *host, char *port)
 		values[5] = pset.progname;
		keywords[6] = "client_encoding";
		values[6] = (pset.notty || getenv("PGCLIENTENCODING")) ? NULL : "auto";
-		keywords[7] = NULL;
-		values[7] = NULL;
+		keywords[7] = "connect_timeout";
+		values[7] = PQconnectTimeout(o_conn);
+		keywords[8] = NULL;
+		values[8] = NULL;
 
-		n_conn = PQconnectdbParams(keywords, values, true);
-
-		free(keywords);
-		free(values);
-
-		/* We can immediately discard the password -- no longer needed */
-		if (password)
-			free(password);
-
-		if (PQstatus(n_conn) == 

[HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Kevin Grittner
On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
build:

+ pg_upgrade -d
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
-B
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
Performing Consistency Checks
-
Checking current, bin, and data directories ok
Checking cluster versions   ok
Some required control information is missing;  cannot find:
  first log file ID after reset
  first log file segment after reset

Cannot continue without required control information, terminating
Failure, exiting



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 4:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 22, 2012 at 12:38 PM, Euler Taveira eu...@timbira.com wrote:
 On 20-06-2012 17:40, Marko Kreen wrote:
 On Wed, Jun 20, 2012 at 10:05 PM, Florian Pflug f...@phlo.org wrote:
 I'm starting to think that relying on SSL/TLS for compression of
 unencrypted connections might not be such a good idea after all. We'd
 be using the protocol in a way it quite clearly never was intended to
 be used...

 Maybe, but what is the argument that we should avoid
 on encryption+compression at the same time?

 AES is quite lightweight compared to compression, so should
 be no problem in situations where you care about compression.

 If we could solve the compression problem without AES, that would make
 things easier. Compression-only via encryption is a weird way to solve
 the problem from the user's POV.

 RSA is noticeable, but only for short connections.
 Thus easily solvable with connection pooling.

 RSA overhead is not the main problem. SSL/TLS setup is.

 And for really special compression needs you can always
 create a UDF that does custom compression for you.

 You have to own the code to modify it; it is not always an option.

 So what exactly is the situation we need to solve
 with postgres-specific protocol compression?

 Compression only support. Why do I need to set up SSL/TLS just for 
 compression?

 IMHO SSL/TLS use is no different from relying on another library to handle
 compression for the protocol, and what's more, it is compression-specific.
 That way, we could implement other algorithms in such a library without
 needing to modify libpq code. Using SSL/TLS you are bound by what SSL/TLS
 software products decide to use as compression algorithms. I'll be happy to
 maintain the code if it is postgres-specific or even as close as possible to core.

 I guess my feeling on this is that, so far as I can see, supporting
 compression via OpenSSL involves work and trade-offs, and supporting
 it without depending on OpenSSL also involves work, and trade-offs.

Nice summary :)

 So it's not real evident to me that we should prefer one to the other
 on general principle.  It seems to me that a lot might come down to
 performance.  If someone can demonstrate that using an external
 library involves gets significantly better compression, chews up
 significantly less CPU time, and/or is significantly less code than
 supporting this via OpenSSL, then maybe we ought to consider it.

I think we should, yes. But as you say, we need to know first. It's
also a question of whether one of these compression schemes is trivial
enough that we could embed the code rather than rely on it externally
- I have no idea if that's even remotely possible, but that would move
the goalposts a bit too.

 OpenSSL is kind of an ugly piece of code, and all things being equal
 depending on it more heavily would not be my first choice.

Indeed.

But we should really stop saying "rely on openssl" and start saying
"rely on the SSL library". There are other SSL libraries which are not
quite so ugly, which we should eventually support.

That said, it's *still* a bit ugly to rely on the SSL library for this, IMO.


 On the other hand, this does not seem to me to be a situation where we
 should accept a patch to use an external library just because someone
 takes the time to write one, because there is also a non-trivial cost
 for the entire project to depending on more things; or if the
 compression code gets put into the backend, then there's a cost to us
 to maintain that code inside our source base.  So I think we really
 need someone to try this both ways and compare.  Right now it seems
 like we're mostly speculating on how well either approach would work,
 which is not as good as having some experimental results.  If, for
 example, someone can demonstrate that an awesomebsdlz compresses 10x
 as fast as OpenSSL...  that'd be pretty compelling.

Or that it takes less code/generates cleaner code...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-06-25 Thread Kohei KaiGai
Fujita-san,

The revised patch looks almost good to me.
The only point I noticed is the check for redundant or duplicated options
I pointed out in my previous post.
So, if you agree with the attached patch, I'd like to hand this patch
over to the committers.

All I changed is:
 --- a/src/backend/commands/copy.c
 +++ b/src/backend/commands/copy.c
 @@ -122,6 +122,11 @@ typedef struct CopyStateData
@@ -217,7 +221,7 @@ index 98bcb2f..0143d81 100644
}
 +			else if (strcmp(defel->defname, "convert_binary") == 0)
 +			{
-+				if (cstate->convert_binary)
++				if (cstate->convert_selectively)
 +					ereport(ERROR,
 +							(errcode(ERRCODE_SYNTAX_ERROR),
 +							 errmsg("conflicting or redundant options")));

Thanks,

2012/6/20 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
 Hi KaiGai-san,

 Thank you for the review.

 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
 Sent: Wednesday, June 20, 2012 1:26 AM
 To: Etsuro Fujita
 Cc: Robert Haas; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
 foreign tables

 Hi Fujita-san,

 Could you rebase this patch against the latest tree?
 It was not possible to apply the patch cleanly.

 Sorry, I updated the patch.  Please find attached an updated version of the
 patch.

 I looked over the patch, then noticed a few points.

 At ProcessCopyOptions(), defel->arg can be NIL, can't it?
 If so, cstate->convert_binary is not a suitable flag to check for a redundant
 option. It seems to me cstate->convert_selectively is a more suitable flag
 to check it with.

 +       else if (strcmp(defel->defname, "convert_binary") == 0)
 +       {
 +           if (cstate->convert_binary)
 +               ereport(ERROR,
 +                       (errcode(ERRCODE_SYNTAX_ERROR),
 +                        errmsg("conflicting or redundant options")));
 +           cstate->convert_selectively = true;
 +           if (defel->arg == NULL || IsA(defel->arg, List))
 +               cstate->convert_binary = (List *) defel->arg;
 +           else
 +               ereport(ERROR,
 +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 +                        errmsg("argument to option \"%s\" must be a list of column names",
 +                               defel->defname)));
 +       }

 Yes, defel->arg can be NIL.  defel->arg is a List structure listing all the
 columns that need to be converted to binary representation, which is NIL for
 the case where no columns need to be converted.  For example,
 defel->arg is NIL for SELECT COUNT(*).  In this case, while
 cstate->convert_selectively is set to true, no columns are converted at
 NextCopyFrom().  Most efficient case!  In short, cstate->convert_selectively
 represents whether to do selective binary conversion at NextCopyFrom(), and
 cstate->convert_binary represents all the columns to be converted at
 NextCopyFrom(), which can be NIL.

 At NextCopyFrom(), this routine computes default values if configured.
 In cases where these values are not referenced, it might be possible to skip
 the unnecessary calculations.
 Would it be possible to add logic to avoid constructing cstate->defmap for
 unreferenced columns at BeginCopyFrom()?

 I think that we don't need to add the above logic because file_fdw does
 BeginCopyFrom() with attnamelist = NIL, in which case BeginCopyFrom()
 doesn't construct cstate->defmap at all.

 I fixed a bug and made some minor optimizations in check_binary_conversion(),
 which is renamed to check_selective_binary_conversion() in the updated
 version, and now file_fdw gives up on selective binary conversion in the
 following cases:

  a) BINARY format
  b) CSV/TEXT format and whole row reference
  c) CSV/TEXT format and all the user attributes needed


 Best regards,
 Etsuro Fujita

 Thanks,

 2012/5/11 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
  -Original Message-
  From: Robert Haas [mailto:robertmh...@gmail.com]
  Sent: Friday, May 11, 2012 1:36 AM
  To: Etsuro Fujita
  Cc: pgsql-hackers@postgresql.org
  Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
  file foreign tables
 
  On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
  fujita.ets...@lab.ntt.co.jp
  wrote:
   I would like to propose to improve parsing efficiency of
   contrib/file_fdw by selective parsing proposed by Alagiannis et
   al.[1], which means that for a CSV/TEXT file foreign table,
   file_fdw performs binary conversion only for the columns needed for
   query processing.  Attached is a WIP patch implementing the feature.
 
  Can you add this to the next CommitFest?  Looks interesting.
 
  Done.
 
  Best regards,
  Etsuro Fujita
 
  --
  Robert Haas
  EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
  Company
 
 
 
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
  make changes to your subscription:
  

Re: [HACKERS] libpq compression

2012-06-25 Thread Florian Pflug
On Jun25, 2012, at 04:04 , Robert Haas wrote:
 If, for
 example, someone can demonstrate that an awesomebsdlz compresses 10x
 as fast as OpenSSL...  that'd be pretty compelling.

That, actually, is demonstrably the case for at least Google's snappy.
(and LZO, but that's not an option since its license is GPL) They state in
their documentation that

  In our tests, Snappy usually is faster than algorithms in the same class
  (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
  compression ratios.

The only widely supported compression method for SSL seems to be DEFLATE,
which is also what gzip/zlib uses. I've benchmarked LZO against gzip/zlib
a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by
an order of magnitude.

The compression ratio achieved by DEFLATE/gzip/zlib is much better, though.
The snappy documentation states

  Typical compression ratios (based on the benchmark suite) are about
  1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
  JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
  in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.

Here are a few numbers for LZO vs. gzip. Snappy should be comparable to
LZO - I tested LZO because I still had the command-line compressor lzop
lying around on my machine, whereas I'd have needed to download and compile
snappy first.

$ dd if=/dev/random of=data bs=1m count=128
$ time gzip -1 < data > data.gz
real0m6.189s
user0m5.947s
sys 0m0.224s
$ time lzop < data > data.lzo
real0m2.697s
user0m0.295s
sys 0m0.224s
$ ls -lh data*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:43 data
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.gz
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.lzo

$ dd if=/dev/zero of=zeros bs=1m count=128
$ time gzip -1 < zeros > zeros.gz
real0m1.083s
user0m1.019s
sys 0m0.052s
$ time lzop < zeros > zeros.lzo
real0m0.186s
user0m0.123s
sys 0m0.053s
$ ls -lh zeros*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:47 zeros
-rw-r--r--  1 fgp  staff   572K Jun 25 14:47 zeros.gz
-rw-r--r--  1 fgp  staff   598K Jun 25 14:47 zeros.lzo

To summarize, on my 2.66 Ghz Core2 Duo Macbook Pro, LZO compresses about
350MB/s if the data is purely random, and about 800MB/s if the data
compresses extremely well. (Numbers based on user time since that indicates
the CPU time used, and ignores the IO overhead, which is substantial)

IMHO, the only compelling argument (and a very compelling one) to use
SSL compression was that it requires very little code on our side. We've
since discovered that it's not actually that simple, at least if we want
to support compression without authentication or encryption, and don't
want to restrict ourselves to using OpenSSL forever. So unless we give
up at least one of those requirements, the arguments for using
SSL-compression are rather thin, I think.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread k...@rice.edu
On Mon, Jun 25, 2012 at 03:12:46PM +0200, Florian Pflug wrote:
 On Jun25, 2012, at 04:04 , Robert Haas wrote:
  If, for
  example, someone can demonstrate that an awesomebsdlz compresses 10x
  as fast as OpenSSL...  that'd be pretty compelling.
 
 That, actually, is demonstrably the case for at least Google's snappy.
 (and LZO, but that's not an option since its license is GPL) They state in
 their documentation that
 
   In our tests, Snappy usually is faster than algorithms in the same class
   (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
   compression ratios.
 
 The only widely supported compression method for SSL seems to be DEFLATE,
 which is also what gzip/zlib uses. I've benchmarked LZO against gzip/zlib
 a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by
 an order of magnitude.
 
 The compression ratio achieved by DEFLATE/gzip/zlib is much better, though.
 The snappy documentation states
 
   Typical compression ratios (based on the benchmark suite) are about
   1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
   JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
   in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.
 
 Here are a few numbers for LZO vs. gzip. Snappy should be comparable to
 LZO - I tested LZO because I still had the command-line compressor lzop
 lying around on my machine, whereas I'd have needed to download and compile
 snappy first.
 
 $ dd if=/dev/random of=data bs=1m count=128
 $ time gzip -1 < data > data.gz
 real  0m6.189s
 user  0m5.947s
 sys   0m0.224s
 $ time lzop < data > data.lzo
 real  0m2.697s
 user  0m0.295s
 sys   0m0.224s
 $ ls -lh data*
 -rw-r--r--  1 fgp  staff   128M Jun 25 14:43 data
 -rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.gz
 -rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.lzo
 
 $ dd if=/dev/zero of=zeros bs=1m count=128
 $ time gzip -1 < zeros > zeros.gz
 real  0m1.083s
 user  0m1.019s
 sys   0m0.052s
 $ time lzop < zeros > zeros.lzo
 real  0m0.186s
 user  0m0.123s
 sys   0m0.053s
 $ ls -lh zeros*
 -rw-r--r--  1 fgp  staff   128M Jun 25 14:47 zeros
 -rw-r--r--  1 fgp  staff   572K Jun 25 14:47 zeros.gz
 -rw-r--r--  1 fgp  staff   598K Jun 25 14:47 zeros.lzo
 
 To summarize, on my 2.66 Ghz Core2 Duo Macbook Pro, LZO compresses about
 350MB/s if the data is purely random, and about 800MB/s if the data
 compresses extremely well. (Numbers based on user time since that indicates
 the CPU time used, and ignores the IO overhead, which is substantial)
 
 IMHO, the only compelling argument (and a very compelling one) to use
 SSL compression was that it requires very little code on our side. We've
 since discovered that it's not actually that simple, at least if we want
 to support compression without authentication or encryption, and don't
 want to restrict ourselves to using OpenSSL forever. So unless we give
 up at least one of those requirements, the arguments for using
 SSL-compression are rather thin, I think.
 
 best regards,
 Florian Pflug
 
+1 for http://code.google.com/p/lz4/ support. It has a BSD license too.
Using SSL libraries gives all the complexity without any real benefit.
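
As a rough sense of how little caller-side code such a library needs, here
is a hedged sketch using the present-day liblz4 C API (which postdates this
thread); buffer management is simplified and nothing here reflects an actual
libpq design.

#include <stdlib.h>
#include <string.h>
#include <lz4.h>

/* Round-trip a buffer through LZ4; returns 0 on success, -1 otherwise. */
static int
lz4_roundtrip(const char *src, int srclen)
{
	int			bound = LZ4_compressBound(srclen);
	char	   *compressed = malloc(bound);
	char	   *restored = malloc(srclen);
	int			result = -1;

	if (compressed != NULL && restored != NULL)
	{
		int			clen = LZ4_compress_default(src, compressed, srclen, bound);

		if (clen > 0 &&
			LZ4_decompress_safe(compressed, restored, clen, srclen) == srclen &&
			memcmp(src, restored, srclen) == 0)
			result = 0;
	}
	free(compressed);
	free(restored);
	return result;
}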

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On sön, 2012-06-24 at 16:05 -0400, Robert Haas wrote:
 +local $SIG{__WARN__} = sub { die $_[0] };

 This seems like a band-aid.

 I'd think of it as a safety net.

+1 for the concept of turning warnings into errors, but is that really
the cleanest, most idiomatic way to do so in Perl?  Sheesh.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 03:08:51 AM Robert Haas wrote:
 On Sun, Jun 24, 2012 at 5:11 PM, Andres Freund and...@2ndquadrant.com 
wrote:
  There are some interesting problems related to locking and snapshots
  here. Not sure if they are resolvable:
  
  We need to restrict SnapshotNow to represent the view it had back when
  the WAL record we're currently decoding was created. Otherwise we would
  possibly get wrong column types and similar. As we're working in the past,
  locking doesn't protect us against much here. I have that (mostly and
  inefficiently).

  One interesting problem is table rewrites (TRUNCATE, CLUSTER, some ALTER
  TABLEs) and dropping tables. Because we nudge SnapshotNow to the past
  view it had back when the WAL record was created, we get the old
  relfilenode, which might have been dropped as part of the transaction
  cleanup...
  With most types that's not a problem. Even things like records and arrays
  aren't problematic. More interesting cases include VACUUM FULL $systable
  (e.g. pg_enum) and vacuum full'ing a table which is used in the *_out
  function of a type (like a user-level pg_enum implementation).

  The only theoretical way I see around that problem would be to postpone
  all relation unlinks until everything that could possibly read them has
  finished. Doesn't seem too alluring, although it would be needed if we
  ever move more things off SnapshotNow.
  
  Input/Ideas/Opinions?
 
 Yeah, this is slightly nasty.  I'm not sure whether or not there's a
 way to make it work.
Postponing all non-rollback unlinks to the next logical checkpoint is the 
only thing I can think of...

 I had another idea.  Suppose decoding happens directly on the primary,
 because I'm still hoping there's a way to swing that.  Suppose further
 that we handle DDL by insisting that (1) any backend which wants to
 add columns or change the types of existing columns must first wait
 for logical replication to catch up and (2) if a backend which has
 added columns or changed the types of existing columns then writes to
 the modified table, decoding of those writes will be postponed until
 transaction commit.  I think that's enough to guarantee that the
 decoding process can just use the catalogs as they stand, with plain
 old SnapshotNow.
I don't think it's that easy. If you e.g. have multiple ALTERs in the same
transaction interspersed with inserted rows, they will all have different
TupleDescs.
I don't see how that's resolvable without either replicating DDL to the target
system or changing what SnapshotNow does...

 The downside of this approach is that it makes certain kinds of DDL
 suck worse if logical replication is in use and behind.  But I don't
 necessarily see that as prohibitive because (1) logical replication
 being behind is likely to suck for a lot of other reasons too and (2)
 adding or retyping columns isn't a terribly frequent operation and
 people already expect a hit when they do it.  Also, I suspect that we
 could find ways to loosen those restrictions at least in common cases
 in some future version; meanwhile, less work now.
Agreed.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Tom Lane
Pavel Golub pa...@microolap.com writes:
 I'm aware of the problems caused by this hard-coded column. My proposal is
 to convert pg_tablespace to a system view, maybe?

It's not that easy to make a 100% backwards compatible view for a system
catalog.  The main sticking point is the OID column; see recent problems
with pg_roles' OID column for an example.  On the whole I don't think
renaming pg_tablespace and inserting a mostly-compatible view would be
a net advance.

More generally, I don't see that this particular incompatibility in
the system catalogs is worse than many others that we've perpetrated.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 3:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Pavel Golub pa...@microolap.com writes:
 I'm aware of the problems caused by this hard-coded column. My proposal is
 to convert pg_tablespace to a system view, maybe?

 It's not that easy to make a 100% backwards compatible view for a system
 catalog.  The main sticking point is the OID column; see recent problems
 with pg_roles' OID column for an example.  On the whole I don't think
 renaming pg_tablespace and inserting a mostly-compatible view would be
 a net advance.

 More generally, I don't see that this particular incompatibility in
 the system catalogs is worse than many others that we've perpetrated.

I'd say it's a lot less bad than some others. At least for this one
you can reasonably connect and figure it out. There were the changes
for database config, I think, which made at least pgadmin break even
before it managed to connect properly. (It got the connection in the
libpq sense, but not in the pgadmin sense).

Bottom line is, any admin tool will *always* have to know
about the specific versions and have code to deal with being able to
run different queries on different versions anyway. And this one is
trivial to reimplement with a different query, compared to most
others.

Yes, if we had a set of those stable-system-views that people keep
asking for every now and then, this is one of the few changes that
they *would* actually help with. But there would still be changes that
even those couldn't deal with, because they simply can't know the
future...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
 build:

 + pg_upgrade -d
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
 -B
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
 Performing Consistency Checks
 -
 Checking current, bin, and data directories                 ok
 Checking cluster versions                                   ok
 Some required control information is missing;  cannot find:
  first log file ID after reset
  first log file segment after reset

 Cannot continue without required control information, terminating
 Failure, exiting

On MacOS X, on latest sources, initdb fails:

creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in
/Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... FATAL:  control file contains invalid data
child process exited with exit code 1
initdb: data directory
/Users/rhaas/pgsql/src/test/regress/./tmp_check/data not removed at
user's request

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 22 June 2012 01:04, Tom Lane t...@sss.pgh.pa.us wrote:
 This is nonsense.  There are at least three buildfarm machines running
 compilers that do not pretend to be gcc (at least, configure
 recognizes them as not gcc) and are not MSVC either.

 So those three don't have medium to high degrees of compatibility with GCC?

Uh, they all compile C, so perforce they have reasonable degrees of
compatibility with gcc.  That doesn't mean they implement gcc's
nonstandard extensions.

 We ought to have more IMO, because software monocultures are
 dangerous.  Of those three, two pass the quiet inline test and one ---
 the newest of the three if I guess correctly --- does not.  So it is
 not the case that !USE_INLINE is dead code, even if you adopt the
 position that we don't care about any compiler not represented in the
 buildfarm.

 I note that you said that it doesn't pass the quiet inline test, and
 not that it doesn't support inline functions.

What's your point?  If the compiler isn't implementing inline the same
way gcc does, we can't use the same inlining arrangements.  I will be
the first to agree that C99's definition of inline sucks, but that
doesn't mean we can assume that gcc's version is implemented everywhere.
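
For readers not following the patch series, a rough sketch of the
arrangement being argued about; the type and function names are illustrative
(loosely modelled on the proposed list code), and only USE_INLINE
corresponds to an existing symbol.

#include <stdbool.h>
#include <stddef.h>

/* Illustrative doubly-linked list head, loosely following the patch series. */
typedef struct ilist_node
{
	struct ilist_node *prev;
	struct ilist_node *next;
} ilist_node;

typedef struct ilist_head
{
	ilist_node	head;
} ilist_head;

#ifdef USE_INLINE
/*
 * Compilers that pass the "quiet inline" configure test (no warnings about
 * unreferenced static inline functions in headers) get a header-only
 * definition ...
 */
static inline bool
ilist_is_empty(const ilist_head *h)
{
	return h->head.next == NULL || h->head.next == &h->head;
}
#else
/* ... everyone else falls back to an out-of-line definition in a .c file. */
extern bool ilist_is_empty(const ilist_head *h);
#endif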

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Friday, June 22, 2012 02:04:02 AM Tom Lane wrote:
 This is nonsense.  There are at least three buildfarm machines running
 compilers that do not pretend to be gcc (at least, configure
 recognizes them as not gcc) and are not MSVC either.

 Should there be no other trick - I think there is though - we could just 
 specify -W2177 as an alternative parameter to test in the 'quiet static 
 inline' test.

What is that, an MSVC switch?  If so it's rather irrelevant to non-MSVC
compilers.

 I definitely do not want to bar any sensible compiler from compiling postgres
 but the keyword here is 'sensible'. If it requires some modest force/trickery
 to behave sensibly, that's ok, but if we need to ship around huge unreadable
 crufty macros just to support them I don't find it ok.

So you propose to define any compiler that strictly implements C99 as
not sensible and not one that will be able to compile Postgres?  I do
not think that's acceptable.  I have no problem with producing better
code on gcc than elsewhere (as we already do), but being flat out broken
for compilers that don't match gcc's interpretation of inline is not
good enough.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Thom Brown
On 25 June 2012 13:11, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
 build:

 + pg_upgrade -d
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
 -B
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
 Performing Consistency Checks
 -
 Checking current, bin, and data directories                 ok
 Checking cluster versions                                   ok
 Some required control information is missing;  cannot find:
  first log file ID after reset
  first log file segment after reset

 Cannot continue without required control information, terminating
 Failure, exiting

I get precisely the same on 64-bit Linux.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread David E. Wheeler
On Jun 25, 2012, at 3:35 PM, Tom Lane wrote:

 +1 for the concept of turning warnings into errors, but is that really
 the cleanest, most idiomatic way to do so in Perl?  Sheesh.

It’s the most backward-compatible, but the most idiomatic way to do it 
lexically is:

use warnings 'FATAL';

However, that works only for the current lexical scope. If there are warnings 
in the code you are calling from the current scope, the use of `local 
$SIG{__WARN__}` is required.

HTH,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 9:43 AM, Andres Freund and...@2ndquadrant.com wrote:
  The only theoretical way I see around that problem would be to postpone
  all relation unlinks until everything that could possibly read them has
  finished. Doesn't seem too alluring, although it would be needed if we
  ever move more things off SnapshotNow.
 
  Input/Ideas/Opinions?

 Yeah, this is slightly nasty.  I'm not sure whether or not there's a
 way to make it work.
 Postponing all non-rollback unlinks to the next logical checkpoint is the
 only thing I can think of...

There are a number of cool things we could do if we postponed unlinks.
 Like, why can't we allow concurrent read-only queries while a CLUSTER
operation is in progress?  Well, two reasons.  The first is that we
currently can't do ANY DDL with less than a full table lock because of
SnapshotNow-related race conditions.  The second is that people might
still need to look at the old heap after the CLUSTER transaction
commits.  Some kind of delayed unlink facility where we
garbage-collect relation backing files when their refcount falls to
zero would solve the second problem - not that that's any help by
itself without a solution to the first one, but hey.

 I had another idea.  Suppose decoding happens directly on the primary,
 because I'm still hoping there's a way to swing that.  Suppose further
 that we handle DDL by insisting that (1) any backend which wants to
 add columns or change the types of existing columns must first wait
 for logical replication to catch up and (2) if a backend which has
 added columns or changed the types of existing columns then writes to
 the modified table, decoding of those writes will be postponed until
 transaction commit.  I think that's enough to guarantee that the
 decoding process can just use the catalogs as they stand, with plain
 old SnapshotNow.
  I don't think it's that easy. If you e.g. have multiple ALTERs in the same
  transaction interspersed with inserted rows, they will all have different
  TupleDescs.

If new columns were added, then tuples created with those older
tuple-descriptors can still be interpreted with the latest
tuple-descriptor.

Columns that are dropped or retyped are a little trickier, but
honestly... how much do we care about those cases?  How practical is
it to suppose we're going to be able to handle them sanely anyway?
Suppose that the user defines a type which works just like int4 except
that the output functions writes out each number in pig latin (and the
input function parses pig latin).  The user defines the types as
binary coercible to each other and then does ALTER TABLE on a large
table with an int4 column, transforming it into an int4piglatin
column.  Due to Noah Misch's fine work, we will conclude that no table
rewrite is needed.  But if logical replication is in use, then in
theory we should scan the whole table and generate an LCR for each row
saying the row with primary key X was updated, and column Y, which
used to contain 42, now contains ourty-two-fay.  Otherwise, if we're
doing heterogenous replication into a system that just stores that
column as text, it'll end up with the wrong contents.  On the other
hand, if we're trying to ship data to another PostgreSQL instance
where the column hasn't yet been updated, then all of those LCRs are
just going to error out when we try to apply them.

A more realistic scenario where you have the same problem is with
something like ALTER TABLE .. ADD COLUMN .. DEFAULT.   If you add a
column with a default in a single step (as opposed to first adding the
column and then setting its default), we rewrite the table and set
every row to the default value.  Should that generate LCRs showing
every row being updated to add that new value, or should we generate
no LCRs and assume that the DBA will independently do the same
operation on the remote side?  Either answer could be correct,
depending on how the LCRs are being used.  If you're just rewriting
with a constant default, then perhaps the sensible thing is to
generate no LCRs, since it will be more efficient to mimic the
operation on the remote side than to replay the changes row-by-row.
But what if the default isn't a constant, like maybe it's
nextval('new_synthetic_pkey_seq') or even something like now().  In
those cases, it seems quite likely that if you don't generate LCRs,
manual user intervention will be required to get things back on track.
 On the other hand, if you do generate LCRs, the remote side will
become horribly bloated on replay, unless the LCRs also instruct the
far side that they should be applied via a full-table rewrite.

Can we just agree to punt all this complexity for version 1 (and maybe
versions 2, 3, and 4)?  I'm not sure what Slony does in situations
like this but I bet for a lot of replication systems, the answer is
do a full resync.  In other words, we either forbid the operation
outright when the table is enabled for logical replication, or else we
emit an LCR that 

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Friday, June 22, 2012 02:04:02 AM Tom Lane wrote:
  This is nonsense.  There are at least three buildfarm machines running
  compilers that do not pretend to be gcc (at least, configure
  recognizes them as not gcc) and are not MSVC either.
  
  Should there be no other trick - I think there is though - we could just
  specify -W2177 as an alternative parameter to test in the 'quiet static
  inline' test.
 What is that, an MSVC switch?  If so it's rather irrelevant to non-MSVC
 compilers.
HP-UX/aCC, the only compiler in the buildfarm I found that seems to fall short 
in the quiet inline test.

MSVC seems to work fine in supported versions; USE_INLINE is defined
these days.

  I definitely do not want to bar any sensible compiler from compiling
  postgres but the keyword here is 'sensible'. If it requires some modest
  force/trickery to behave sensibly, that's ok, but if we need to ship
  around huge unreadable crufty macros just to support them I don't find
  it ok.
 So you propose to define any compiler that strictly implements C99 as
 not sensible and not one that will be able to compile Postgres?  I do
 not think that's acceptable.  I have no problem with producing better
 code on gcc than elsewhere (as we already do), but being flat out broken
 for compilers that don't match gcc's interpretation of inline is not
 good enough.
I propose to treat any compiler which has no way to get to equivalent 
behaviour as not sensible. Yes. I don't think there really are many of those 
around. As you pointed out there is only one compiler in the buildfarm with 
problems and I think those can be worked around (can't test it yet though, the 
only HP-UX I could get my hands on quickly is at 11.11...).

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 24-06-2012 23:04, Robert Haas wrote:
 So I think we really
 need someone to try this both ways and compare.  Right now it seems
 like we're mostly speculating on how well either approach would work,
 which is not as good as having some experimental results.
 
Not a problem. That's what I'm thinking too but I would like to make sure that
others don't object to general idea. Let me give it a try in both ideas...


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On MacOS X, on latest sources, initdb fails:

 creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
 creating subdirectories ... ok
 selecting default max_connections ... 100
 selecting default shared_buffers ... 32MB
 creating configuration files ... ok
 creating template1 database in
 /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
 initializing pg_authid ... ok
 initializing dependencies ... ok
 creating system views ... ok
 loading system objects' descriptions ... ok
 creating collations ... ok
 creating conversions ... ok
 creating dictionaries ... FATAL:  control file contains invalid data
 child process exited with exit code 1

Same for me.  It's crashing here:

    if (ControlFile->state < DB_SHUTDOWNED ||
        ControlFile->state > DB_IN_PRODUCTION ||
        !XRecOffIsValid(ControlFile->checkPoint))
        ereport(FATAL,
                (errmsg("control file contains invalid data")));

state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?

(I suppose the reason this is only failing on some machines is
platform-specific variations in xlog entry size, but it's still a bit
distressing that this got committed in such a broken state.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Alvaro Herrera

Excerpts from David E. Wheeler's message of lun jun 25 11:23:34 -0400 2012:
 On Jun 25, 2012, at 3:35 PM, Tom Lane wrote:
 
  +1 for the concept of turning warnings into errors, but is that really
  the cleanest, most idiomatic way to do so in Perl?  Sheesh.
 
 It’s the most backward-compatible, but the most idiomatic way to do it 
 lexically is:
 
 use warnings 'FATAL';
 
 However, that works only for the current lexical scope. If there are warnings 
 in the code you are calling from the current scope, the use of `local 
 $SIG{__WARN__}` is required.

So let's add 'FATAL' to the already existing use warnings lines in
Catalog.pm and genbki.pl.

I think the other files we should add this to are generate-errcodes.pl,
generate-plerrorcodes.pl, generate-spiexceptions.pl, Gen_fmgrtab.pl.
Maybe psql/create_help.pl too.

We have a bunch of files in ECPG and MSVC areas and others in src/tools;
not sure about those.

We also have gen_qsort_tuple.pl which amusingly does not even
use warnings.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Robert Haas
On Sat, Jun 23, 2012 at 6:26 PM, Peter Eisentraut pete...@gmx.net wrote:
 About the new --maintenance-db options:

 Why was this option not added to createuser and dropuser?  In the
 original discussion[0] they were mentioned, but it apparently never made
 it into the code.

Oops.  That was an oversight.

 I find the name to be unfortunate.  For example, I think of running
 vacuum as maintenance.  So running vacuumdb --maintenance-db=X would
 imply that the vacuum maintenance is done on X.  In fact, the whole
 point of this option is to find out where the maintenance is to be run,
 not to run the maintenance.  Maybe something like --initial-db would be
 better?

As Dave says, I picked this because pgAdmin has long used that terminology.

 What is the purpose of these options?  The initial discussion was
 unclear on this.  The documentation contains no explanation of why they
 should be used.  If we want to really support the case where both
 postgres and template1 are removed, an environment variable might be
 more useful than requiring this to be typed out for every command.

 [0]: 
 http://archives.postgresql.org/message-id/ca+tgmoacjwsis9nnqjgaaml1vg6c8b6o1ndgqnuca2gm00d...@mail.gmail.com

Well, I would be opposed to having ONLY an environment variable,
because I think that anything that can be controlled via an
environment variable should be able to be overridden on the command
line.  It might be OK to have both an environment variable AND a
command-line option, but I tend to think it's too marginal to justify
that.

In retrospect, it seems as though it might have been a good idea to
make the postgres database read-only and undroppable, so that all
client utilities could count on being able to connect to it and get a
list of databases in the cluster without the need for all this
complexity.  Or else having some other way for a client to
authenticate and list out all the available databases.  In the absence
of such a mechanism, I don't think we can turn around and say that not
having a postgres database is an unsupported configuration, and
therefore we need some way to cope with it when it happens.

I think the original report that prompted this change was a complaint
that pg_upgrade failed when the postgres database had been dropped.
Now, admittedly, pg_upgrade fails for all kinds of crazy stupid
reasons and the chances of fixing that problem completely any time in
the next 5 years do not seem good, but that's not a reason not to keep
plugging the holes we can.  Anyhow, the same commit that introduced
--maintenance-db fixed that problem by arranging to try both
postgres and template1 before giving up...  but having two hard-coded
database names, either of which can be dropped or renamed, seems only
marginally better than having one, hence the switch.  Really, I think
pg_upgrade needs this option too, unless we're going to kill the
problem at its root by providing a reliable way to enumerate database
names without first knowing the name of one that you can connect to.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
 So you propose to define any compiler that strictly implements C99 as
 not sensible and not one that will be able to compile Postgres?

 I propose to treat any compiler which has no way to get to equivalent 
 behaviour as not sensible. Yes.

Well, my response is no.  I could see saying that we require (some) C99
features at this point, but not features that are in no standard, no
matter how popular gcc might be.

 I don't think there really are many of those 
 around. As you pointed out there is only one compiler in the buildfarm with 
 problems

This just means we don't have a wide enough collection of non-mainstream
machines in the buildfarm.  Deciding to break any platform with a
non-gcc-equivalent compiler isn't going to improve that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread David E. Wheeler
On Jun 25, 2012, at 5:51 PM, Alvaro Herrera wrote:

 However, that works only for the current lexical scope. If there are 
 warnings in the code you are calling from the current scope, the use of 
 `local $SIG{__WARN__}` is required.
 
 So lets add 'FATAL' to the already existing use warnings lines in
 Catalog.pm and genbki.pl.
 
 I think the other files we should add this to  are generate-errcodes.pl,
 generate-plerrorcodes.pl, generate-spiexceptions.pl, Gen_fmgrtab.pl.
 Maybe psql/create_help.pl too.
 
 We have a bunch of files in ECPG and MSVC areas and others in src/tools;
 not sure about those.
 
 We also have gen_qsort_tuple.pl which amusingly does not even
 use warnings.

Hrm, I think that `use warnings 'FATAL';` might only work for core warnings. 
Which is annoying. I missed what was warning up-thread, but the most foolproof 
way to make all warnings fatal is the originally suggested

  local $SIG{__WARN__} = sub { die shift };

A *bit* cleaner is to use Carp::croak:

use Carp;
local $SIG{__WARN__} = \&croak;

Or if you wanted to get a stack trace out of it, use Carp::confess:

use Carp;
local $SIG{__WARN__} = \&confess;

Exception-handling in Perl is one of the few places that annoy me regularly.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On MacOS X, on latest sources, initdb fails:

 creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... 
 ok
 creating subdirectories ... ok
 selecting default max_connections ... 100
 selecting default shared_buffers ... 32MB
 creating configuration files ... ok
 creating template1 database in
 /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
 initializing pg_authid ... ok
 initializing dependencies ... ok
 creating system views ... ok
 loading system objects' descriptions ... ok
 creating collations ... ok
 creating conversions ... ok
 creating dictionaries ... FATAL:  control file contains invalid data
 child process exited with exit code 1

 Same for me.  It's crashing here:

    if (ControlFile->state < DB_SHUTDOWNED ||
        ControlFile->state > DB_IN_PRODUCTION ||
        !XRecOffIsValid(ControlFile->checkPoint))
        ereport(FATAL,
                (errmsg("control file contains invalid data")));

 state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
 ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?

 (I suppose the reason this is only failing on some machines is
 platform-specific variations in xlog entry size, but it's still a bit
 distressing that this got committed in such a broken state.)

I'm guessing that the problem is as follows: in the old code, the
XLogRecord header could not be split, so any offset that was closer to
the end of the page than SizeOfXLogRecord was a sure sign of trouble.
But commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d relaxed that
restriction, so now it IS legal for the checkpoint record to be where
it is.  But it seems that XRecOffIsValid() didn't get the memo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Tom Lane
David E. Wheeler da...@justatheory.com writes:
 Hrm, I think that `use warnings 'FATAL';` might only work for core warnings. 
 Which is annoying. I missed what was warning up-thread, but the most 
 foolproof way to make all warnings fatal is the originally suggested

   local $SIG{__WARN__} = sub { die shift };

Sigh, let's do it that way then.

 A *bit* cleaner is to use Carp::croak:

 use Carp;
  local $SIG{__WARN__} = \&croak;

Just as soon not add a new module dependency if we don't have to.
In this case, since we're not really expecting the warnings to get
thrown, it seems like there'd be little value added by doing so.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Ryan Kelly
On Mon, Jun 25, 2012 at 12:07:55PM -0400, Tom Lane wrote:
 David E. Wheeler da...@justatheory.com writes:
  Hrm, I think that `use warnings 'FATAL';` might only work for core 
  warnings. Which is annoying. I missed what was warning up-thread, but the 
  most foolproof way to make all warnings fatal is the originally suggested
 
local $SIG{__WARN__} = sub { die shift };
 
 Sigh, let's do it that way then.
 
  A *bit* cleaner is to use Carp::croak:
 
  use Carp;
   local $SIG{__WARN__} = \&croak;
 
 Just as soon not add a new module dependency if we don't have to.
Carp is core since Perl 5 [1994-10-17].

 In this case, since we're not really expecting the warnings to get
 thrown, it seems like there'd be little value added by doing so.
 
   regards, tom lane
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-Ryan Kelly

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove sanity test in XRecOffIsValid.

2012-06-25 Thread Tom Lane
Robert Haas rh...@postgresql.org writes:
 Remove sanity test in XRecOffIsValid.

 Commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d changed the rules
 for splitting xlog records across pages, but neglected to update this
 test.  It's possible that there's some better action here than just
 removing the test completely, but this at least appears to get some
 of the things that are currently broken (like initdb on MacOS X)
 working again.

Offhand, I'm wondering why this macro doesn't include a MAXALIGN test.
If it did, I don't think that the upper-limit test would have any
use anymore.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I bet for a lot of replication systems, the answer is do a full
 resync.  In other words, we either forbid the operation outright
 when the table is enabled for logical replication, or else we emit
 an LCR that says, in effect, transaction 12345 monkeyed with the
 table, please resync.  It strikes me that it's really the job of
 some higher-level control logic to decide what the correct
 behavior is in these cases; the decoding process doesn't really
 have enough information about what the user is trying to do to
 make a sensible decision anyway.
 
This is clearly going to depend on the topology.  You would
definitely want to try to replicate the DDL for the case on which
Simon is focused (which seems to me to be essentially physical
replication of catalogs with logical replication of data changes
from any machine to all others).  What you do about transactions in
flight is the hard part.  You could try to suppress concurrent DML
of the same objects or have some complex matrix of rules for trying
to resolve the transactions in flight.  I don't see how the latter
could ever be 100% accurate.
 
In our shop it is much easier.  We always have one database which is
the only valid source for any tuple, although rows from many such
databases can be in one table, and one row might replicate to many
databases.  Thus, we don't want automatic replication of DDL.
 
 - When a column is going to be added to the source machines, we
   first add it to the targets, with either a default or as
   NULL-capable.
 
 - When a column is going to be deleted from the source machines, we
   make sure it is NULL-capable or has a default on the replicas. 
   We drop it from all replicas after it is gone from all sources.
 
 - If a column is changing name or is changing to a fundamentally
   different type we need to give the new column a new name, have
   triggers to convert old to new (and vice versa) on the replicas,
   and drop the old after all sources are updated.
 
 - If a column is changing in a minor way, like its precision, we
   make sure the replicas can accept either format until all sources
   have been converted.  We update the replicas to match the sources
   after all sources are converted.
 
We most particularly *don't* want DDL to replicate automatically,
because the schema changes are deployed along with related software
changes, and we like to pilot any changes for at least a few days. 
Depending on the release, the rollout may take a couple months, or
we may slam it in everywhere a few days after the first pilot
deployment.
 
So you could certainly punt all of this for any release as far as
Wisconsin Courts are concerned.  We need to know table and column
names, before and after images, and some application-supplied
metadata.
 
I don't know that what we're looking for is any easier (although I
doubt that it's any harder), but I'm starting to wonder how much
mechanism they can really share.  The 2Q code is geared toward page
format OIDs and data values for automated DDL distribution and
faster replication, while we're looking for something which works
between releases, architectures, and OSes.  We keep coming back to
the idea of one mechanism because both WAL and a logical transaction
stream would have after tuples, although they need them in
different formats.
 
I think the need for truly logical replication is obvious, since so
many different people have developed trigger-based versions of that.
And it sure seems like 2Q has clients who are willing to pay for the
other.
 
Perhaps the first question is: Is there enough in common between
logical replication (and all the topologies that might be created
with that) and the proposal on the table (which seems to be based
around one particular topology with a vague notion of bolting
logical replication on to it after the fact) to try to resolve the
differences in one feature?  Or should the identical schema with
multiple identical copies case be allowed to move forward more or
less in isolation, with logical replication having its own design if
and when someone wants to take it on?  Two non-compromised features
might be cleaner -- I'm starting to feel like we're trying to design
a toaster which can also water your garden.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 12:42 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Perhaps the first question is: Is there enough in common between
 logical replication (and all the topologies that might be created
 with that) and the proposal on the table (which seems to be based
 around one particular topology with a vague notion of bolting
 logical replication on to it after the fact) to try to resolve the
 differences in one feature?  Or should the identical schema with
 multiple identical copies case be allowed to move forward more or
 less in isolation, with logical replication having its own design if
 and when someone wants to take it on?  Two non-compromised features
 might be cleaner -- I'm starting to feel like we're trying to design
 a toaster which can also water your garden.

I think there are a number of shared pieces.  Being able to read WAL
and do something with it is a general need that both solutions share;
I think actually that might be the piece that we should try to get
committed first.  I suspect that there are a number of applications
for just that and nothing more - for example, it might allow a contrib
module that reads WAL as it's generated and prints out a debug trace,
which I can imagine being useful.

Also, I think that even for MMR there will be a need for control
logic, resynchronization, and similar mechanisms.  I mean, suppose you
have four servers in an MMR configuration.  Now, you want to deploy a
schema change that adds a new column and which, as it so happens,
requires a table rewrite to add the default.  It is very possible that
you do NOT want that to automatically replicate around the cluster.
Instead, you likely want to redirect load to the remaining three
servers, do the change on the fourth, put it back into the ring and
take out a different one, do the change on that one, and so on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove sanity test in XRecOffIsValid.

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas rh...@postgresql.org writes:
 Remove sanity test in XRecOffIsValid.

 Commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d changed the rules
 for splitting xlog records across pages, but neglected to update this
 test.  It's possible that there's some better action here than just
 removing the test completely, but this at least appears to get some
 of the things that are currently broken (like initdb on MacOS X)
 working again.

 Offhand, I'm wondering why this macro doesn't include a MAXALIGN test.
 If it did, I don't think that the upper-limit test would have any
 use anymore.

Yeah, I wondered that, too, but wasn't sure enough about what the real
alignment requirements were to do it myself.
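
For context, here is a standalone sketch of the kind of alignment test being
discussed: WAL records always begin at MAXALIGN'd byte positions, so a
validity macro could test alignment instead of an upper page-offset limit.
The names and the alignment constant below are illustrative only, not the
committed definition.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-ins; the real values come from pg_config.h et al. */
    #define DEMO_MAXIMUM_ALIGNOF 8
    #define DemoRecPtrIsAligned(recptr) \
        ((recptr) != 0 && ((recptr) % DEMO_MAXIMUM_ALIGNOF) == 0)

    int
    main(void)
    {
        uint64_t checkPoint = UINT64_C(19972072);   /* 0x130BFE8, from the report */

        /* prints "yes": the rejected checkpoint location is in fact aligned */
        printf("MAXALIGN'd: %s\n", DemoRecPtrIsAligned(checkPoint) ? "yes" : "no");
        return 0;
    }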

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
 build:

This appears to be because of the following hunk from commit
dfda6ebaec6763090fb78b458a979b558c50b39b:

@@ -558,10 +536,10 @@ PrintControlValues(bool guessed)
snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT,
 ControlFile.system_identifier);

-   printf(_("First log file ID after reset:%u\n"),
-  newXlogId);
-   printf(_("First log file segment after reset:   %u\n"),
-  newXlogSeg);
+   XLogFileName(fname, ControlFile.checkPointCopy.ThisTimeLineID, newXlogSe
+
+   printf(_("First log segment after reset:%s\n"),
+  fname);
printf(_("pg_control version number:%u\n"),
   ControlFile.pg_control_version);
printf(_("Catalog version number:   %u\n"),

Evidently, Heikki failed to realize that pg_upgrade gets the control
data information by parsing the output of pg_controldata.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:57:51 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
  So you propose to define any compiler that strictly implements C99 as
  not sensible and not one that will be able to compile Postgres?
  
  I propose to treat any compiler which has no way to get to equivalent
  behaviour as not sensible. Yes.

 Well, my response is no.  I could see saying that we require (some) C99
 features at this point, but not features that are in no standard, no
 matter how popular gcc might be.
I fail to see how gcc is the relevant point here given that there are 
equivalent definitions available from multiple compiler vendors.

Also, 'static inline' *is* C99 conforming as far as I can see? The problem 
with it is that some compilers may warn if the function isn't used in the same 
translation unit. That doesn't make not using a function non-standard-
conforming though.

  I don't think there really are many of those
  around. As you pointed out there is only one compiler in the buildfarm
  with problems
 This just means we don't have a wide enough collection of non-mainstream
 machines in the buildfarm.  Deciding to break any platform with a
 non-gcc-equivalent compiler isn't going to improve that.
No, it won't improve that. But neither will the contrary.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Greg Jaskiewicz
Wasn't this more of an issue in de-coupling compression from encryption ?

 
On 25 Jun 2012, at 16:36, Euler Taveira wrote:

 On 24-06-2012 23:04, Robert Haas wrote:
 So I think we really
 need someone to try this both ways and compare.  Right now it seems
 like we're mostly speculating on how well either approach would work,
 which is not as good as having some experimental results.
 
 Not a problem. That's what I'm thinking too but I would like to make sure that
 others don't object to general idea. Let me give it a try in both ideas...
 
 
 -- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

This breaks pg_resetxlog -l. When I ran pg_resetxlog -l 0x01,0x01,0x01
in the HEAD, I got the following error message though the same command
successfully completed in 9.1.

pg_resetxlog: invalid argument for option -l
Try pg_resetxlog --help for more information.

I think the attached patch needs to be applied.

Regards,

-- 
Fujii Masao


resetxlog_bugfix_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:34:13 PM Robert Haas wrote:
 On Mon, Jun 25, 2012 at 9:43 AM, Andres Freund and...@2ndquadrant.com 
wrote:
   The only theoretical way I see against that problem would be to
   postpone all relation unlinks until everything that could possibly
   read them has finished. Doesn't seem too alluring although it would be
   needed if we ever move more things off SnapshotNow.
   
   Input/Ideas/Opinions?
  
  Yeah, this is slightly nasty.  I'm not sure whether or not there's a
  way to make it work.
  
  Postponing all non-rollback unlinks to the next logical checkpoint is
  the only thing I can think of...
 There are a number of cool things we could do if we postponed unlinks.
  Like, why can't we allow concurrent read-only queries while a CLUSTER
 operation is in progress?  Well, two reasons.  The first is that we
 currently can't do ANY DDL with less than a full table lock because of
 SnapshotNow-related race conditions.  The second is that people might
 still need to look at the old heap after the CLUSTER transaction
 commits.  Some kind of delayed unlink facility where we
 garbage-collect relation backing files when their refcount falls to
 zero would solve the second problem - not that that's any help by
 itself without a solution to the first one, but hey.
It's an argument why related infrastructure would be interesting to more than 
that patch and that's not bad.
If the garbage collecting is done in a very simplistic manner it doesn't sound 
too hard... The biggest problem is probably crash-recovery of that knowledge 
and how to hook knowledge into it that logical rep needs that data...

  I had another idea.  Suppose decoding happens directly on the primary,
  because I'm still hoping there's a way to swing that.  Suppose further
  that we handle DDL by insisting that (1) any backend which wants to
  add columns or change the types of existing columns must first wait
  for logical replication to catch up and (2) if a backend which has
  added columns or changed the types of existing columns then writes to
  the modified table, decoding of those writes will be postponed until
  transaction commit.  I think that's enough to guarantee that the
  decoding process can just use the catalogs as they stand, with plain
  old SnapshotNow.
  
  I don't think its that easy. If you e.g. have multiple ALTER's in the
  same transaction interspersed with inserted rows they will all have
  different TupleDesc's.
 
 If new columns were added, then tuples created with those older
 tuple-descriptors can still be interpreted with the latest
 tuple-descriptor.
But you need to figure that out. If you have just the before-after images of 
the tupledescs you don't know what happened in there... That would mean either 
doing special things on catalog changes or reassembling the meaning from the 
changed pg_* rows. Neither seems enticing.

 Columns that are dropped or retyped are a little trickier, but
 honestly... how much do we care about those cases?  How practical is
 it to suppose we're going to be able to handle them sanely anyway?
 Suppose that the user defines a type which works just like int4 except
 that the output function writes out each number in pig latin (and the
 input function parses pig latin).  The user defines the types as
 binary coercible to each other and then does ALTER TABLE on a large
 table with an int4 column, transforming it into an int4piglatin
 column.  Due to Noah Misch's fine work, we will conclude that no table
 rewrite is needed.  But if logical replication is in use, then in
 theory we should scan the whole table and generate an LCR for each row
 saying the row with primary key X was updated, and column Y, which
 used to contain 42, now contains ourty-two-fay.  Otherwise, if we're
 doing heterogenous replication into a system that just stores that
 column as text, it'll end up with the wrong contents.  On the other
 hand, if we're trying to ship data to another PostgreSQL instance
 where the column hasn't yet been updated, then all of those LCRs are
 just going to error out when we try to apply them.

 A more realistic scenario where you have the same problem is with
 something like ALTER TABLE .. ADD COLUMN .. DEFAULT.   If you add a
 column with a default in a single step (as opposed to first adding the
 column and then setting its default), we rewrite the table and set
 every row to the default value.  Should that generate LCRs showing
 every row being updated to add that new value, or should we generate
 no LCRs and assume that the DBA will independently do the same
 operation on the remote side?  Either answer could be correct,
 depending on how the LCRs are being used.  If you're just rewriting
 with a constant default, then perhaps the sensible thing is to
 generate no LCRs, since it will be more efficient to mimic the
 operation on the remote side than to replay the changes row-by-row.
 But what if the default isn't a constant, like maybe it's
 

Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

I found the typo.

In walsender.c
-reply.write.xlogid, reply.write.xrecoff,
-reply.flush.xlogid, reply.flush.xrecoff,
-reply.apply.xlogid, reply.apply.xrecoff);
+(uint32) (reply.write << 32), (uint32) reply.write,
+(uint32) (reply.flush << 32), (uint32) reply.flush,
+(uint32) (reply.apply << 32), (uint32) reply.apply);

<< should be >>. The attached patch fixes this typo.
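
For illustration, a standalone sketch of why the shift direction matters when
printing a 64-bit position as two 32-bit halves (the value and names below are
made up, not the walsender.c code):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t write_pos = UINT64_C(0x0000000245E67D30);

        /* Correct: the high word is the value shifted right by 32 bits. */
        printf("right shift: %X/%X\n",
               (uint32_t) (write_pos >> 32), (uint32_t) write_pos);

        /* The typo: a left shift makes the high word come out as zero. */
        printf("left shift:  %X/%X\n",
               (uint32_t) (write_pos << 32), (uint32_t) write_pos);
        return 0;
    }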

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Tue, Jun 26, 2012 at 2:53 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

 I found the typo.

 In walsender.c
 -                reply.write.xlogid, reply.write.xrecoff,
 -                reply.flush.xlogid, reply.flush.xrecoff,
 -                reply.apply.xlogid, reply.apply.xrecoff);
 +                (uint32) (reply.write << 32), (uint32) reply.write,
 +                (uint32) (reply.flush << 32), (uint32) reply.flush,
 +                (uint32) (reply.apply << 32), (uint32) reply.apply);

  << should be >>. The attached patch fixes this typo.

Oh, I forgot to attach the patch.. Here is the patch.

Regards,

-- 
Fujii Masao


walsender_typo_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote:
  << should be >>. The attached patch fixes this typo.

 Oh, I forgot to attach the patch.. Here is the patch.

I committed both of the patches you posted to this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 1:50 PM, Andres Freund and...@2ndquadrant.com wrote:
  It's an argument why related infrastructure would be interesting to more than
  that patch and that's not bad.
 If the garbage collecting is done in a very simplistic manner it doesn't sound
 too hard... The biggest problem is probably crash-recovery of that knowledge
 and how to hook knowledge into it that logical rep needs that data...

I suppose the main reason we haven't done it already is that it
increases the period of time during which we're using 2X the disk
space.

  I don't think its that easy. If you e.g. have multiple ALTER's in the
  same transaction interspersed with inserted rows they will all have
  different TupleDesc's.

 If new columns were added, then tuples created with those older
 tuple-descriptors can still be interpreted with the latest
 tuple-descriptor.
 But you need to figure that out. If you have just the before-after images of
 the tupledescs you don't know what happened in there... That would mean either
 doing special things on catalog changes or reassembling the meaning from the
 changed pg_* rows. Neither seems enticing.

I think there is absolutely nothing wrong with doing extra things in
ALTER TABLE when logical replication is enabled.  We've got code
that's conditional on Hot Standby being enabled in many places in the
system; why should logical replication be any different?  If we set
the bar for logical replication at the system can't do anything
differently when logical replication is enabled then I cheerfully
submit that we are doomed.  You've already made WAL format changes to
support logging the pre-image of the tuple, which is a hundred times
more likely to cause a performance problem than any monkeying around
we might want to do in ALTER TABLE.

 Can we just agree to punt all this complexity for version 1 (and maybe
 versions 2, 3, and 4)?  I'm not sure what Slony does in situations
 like this but I bet for a lot of replication systems, the answer is
 do a full resync.  In other words, we either forbid the operation
 outright when the table is enabled for logical replication, or else we
 emit an LCR that says, in effect, transaction 12345 monkeyed with the
 table, please resync.  It strikes me that it's really the job of some
 higher-level control logic to decide what the correct behavior is in
 these cases; the decoding process doesn't really have enough
 information about what the user is trying to do to make a sensible
 decision anyway.  It would be nice to be able to support some simple
 cases like adding a column that has no default or dropping a
 column without punting, but going much further than that seems like
 it will require embedding policy decisions that should really be
 happening at a higher level.
 I am totally fine with saying that we do not support everything from the
 start. But we need to choose an architecture where its possible to add that
 support gradually and I don't think without looking inside transaction makes
 that possible.

I am deeply skeptical that we need to look inside of transactions that
do full-table rewrites.  But even if we do, I don't see that what I'm
proposing precludes it.  For example, I think we could have ALTER
TABLE emit WAL records specifically for logical replication that allow
us to disentangle which tuple descriptor to use at which point in the
transaction.  I don't see that that would even be very difficult to
set up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
Hi,

(munching the mail from Robert and Kevin together)

On Monday, June 25, 2012 06:42:41 PM Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
  I bet for a lot of replication systems, the answer is do a full
  resync.  In other words, we either forbid the operation outright
  when the table is enabled for logical replication, or else we emit
  an LCR that says, in effect, transaction 12345 monkeyed with the
  table, please resync.  It strikes me that it's really the job of
  some higher-level control logic to decide what the correct
  behavior is in these cases; the decoding process doesn't really
  have enough information about what the user is trying to do to
  make a sensible decision anyway.
 
 This is clearly going to depend on the topology.  You would
 definitely want to try to replicate the DDL for the case on which
 Simon is focused (which seems to me to be essentially physical
 replication of catalogs with logical replication of data changes
 from any machine to all others).  What you do about transactions in
 flight is the hard part.  You could try to suppress concurrent DML
 of the same objects or have some complex matrix of rules for trying
 to resolve the transactions in flight.  I don't see how the latter
 could ever be 100% accurate.
Yes. That's why I dislike that proposal. I don't think that's going to be 
understandable and robust enough.

If we really look inside transactions (3b and 1) that shouldn't be a problem 
though. So I think it really has to be one of those.


 In our shop it is much easier.  We always have one database which is
 the only valid source for any tuple, although rows from many such
 databases can be in one table, and one row might replicate to many
 databases.  Thus, we don't want automatic replication of DDL.
 
  - When a column is going to be added to the source machines, we
first add it to the targets, with either a default or as
NULL-capable.
 
  - When a column is going to be deleted from the source machines, we
make sure it is NULL-capable or has a default on the replicas.
We drop it from all replicas after it is gone from all sources.
 
  - If a column is changing name or is changing to a fundamentally
different type we need to give the new column a new name, have
triggers to convert old to new (and vice versa) on the replicas,
and drop the old after all sources are updated.
 
  - If a column is changing in a minor way, like its precision, we
make sure the replicas can accept either format until all sources
have been converted.  We update the replicas to match the sources
after all sources are converted.

 We most particularly *don't* want DDL to replicate automatically,
 because the schema changes are deployed along with related software
 changes, and we like to pilot any changes for at least a few days.
 Depending on the release, the rollout may take a couple months, or
we may slam it in everywhere a few days after the first pilot
 deployment.
That's sensible for your use-case - but I do not think that's the 
appropriate behaviour for anything which is somewhat out-of-the-box...

 So you could certainly punt all of this for any release as far as
 Wisconsin Courts are concerned.  We need to know table and column
 names, before and after images, and some application-supplied
 metadata.
I am not sure we're going to get all that into 9.3. More on that below.

On Monday, June 25, 2012 07:09:38 PM Robert Haas wrote:
 On Mon, Jun 25, 2012 at 12:42 PM, Kevin Grittner wrote:
  I don't know that what we're looking for is any easier (although I
  doubt that it's any harder), but I'm starting to wonder how much
  mechanism they can really share.  The 2Q code is geared toward page
  format OIDs and data values for automated DDL distribution and
  faster replication, while we're looking for something which works
  between releases, architectures, and OSes.  We keep coming back to
  the idea of one mechanism because both WAL and a logical transaction
  stream would have after tuples, although they need them in
  different formats.
  
  I think the need for truly logical replication is obvious, since so
  many different people have developed trigger-based versions of that.
  And it sure seems like 2Q has clients who are willing to pay for the
  other.
 
  Perhaps the first question is: Is there enough in common between
  logical replication (and all the topologies that might be created
  with that) and the proposal on the table (which seems to be based
  around one particular topology with a vague notion of bolting
  logical replication on to it after the fact) to try to resolve the
  differences in one feature?  Or should the identical schema with
  multiple identical copies case be allowed to move forward more or
  less in isolation, with logical replication having its own design if
  and when someone wants to take it on?  Two non-compromised features
  might be cleaner -- I'm starting to feel like we're trying 

Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Alvaro Herrera

Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:

 Really, I think
 pg_upgrade needs this option too, unless we're going to kill the
 problem at its root by providing a reliable way to enumerate database
 names without first knowing the name of one that you can connect to.

I think pg_upgrade could do this one task by using a standalone backend
instead of a full-blown postmaster.  It should be easy enough ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 
 We most particularly *don't* want DDL to replicate automatically,
 because the schema changes are deployed along with related
 software changes, and we like to pilot any changes for at least a
 few days.  Depending on the release, the rollout may take a
 couple months, or we may slam in out everywhere a few days after
 the first pilot deployment.
 Thats a sensible for your use-case - but I do not think its thats
 the appropriate behaviour for anything which is somewhat
 out-of-the box...
 
Right.  We currently consider the issues involved in a change and
script it by hand.  I think we want to continue to do that.  The
point was that, given the variety of timings and techniques for
distributing schema changes, maybe is was only worth doing that
automatically for the most constrained and controlled cases.
 
 So you could certainly punt all of this for any release as far as
 Wisconsin Courts are concerned.  We need to know table and column
 names, before and after images, and some application-supplied
 metadata.
 I am not sure were going to get all that into 9.3.
 
Sure, that was more related to why I was questioning how much these
use cases even *could* integrate -- whether it even paid to
*consider* these use cases at this point.  Responses from Robert and
you have pretty much convinced me that I was being overly
pessimistic on that.
 
One fine point regarding before and after images -- if a value
doesn't change in an UPDATE, there's no reason to include it in both
the BEFORE and AFTER tuple images, as long as we have the null
column bitmaps -- or some other way of distinguishing unchanged from
NULL.  (This could be especially important when the unchanged column
was a 50 MB bytea.)  It doesn't matter to me which way this is
optimized -- in our trigger-based system we chose to keep the full
BEFORE tuple and just show AFTER values which were distinct from the
corresponding BEFORE values, but it would be trivial to adapt the
code to the other way.
 
Sorry for that bout of pessimism.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 2:49 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:
 Really, I think
 pg_upgrade needs this option too, unless we're going to kill the
 problem at its root by providing a reliable way to enumerate database
  names without first knowing the name of one that you can connect to.

 I think pg_upgrade could do this one task by using a standalone backend
 instead of a full-blown postmaster.  It should be easy enough ...

Maybe, but it seems like baking even more hackery into a tool that's
already got too much hackery.  It's also hard for pg_upgrade to know
things like - whether pg_hba.conf prohibits access to certain
users/databases/etc. or just requires the use of authentication
methods that happen to fail.  From pg_upgrade's perspective, it would
be nice to have a flag that starts the server in some mode where
nobody but pg_upgrade can connect to it and all connections are
automatically allowed, but it's not exactly clear how to implement
nobody but pg_upgrade can connect to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Alvaro Herrera

Excerpts from Robert Haas's message of lun jun 25 14:58:25 -0400 2012:
 
 On Mon, Jun 25, 2012 at 2:49 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:
  Really, I think
  pg_upgrade needs this option too, unless we're going to kill the
  problem at its root by providing a reliable way to enumerate database
   names without first knowing the name of one that you can connect to.
 
  I think pg_upgrade could do this one task by using a standalone backend
  instead of a full-blown postmaster.  It should be easy enough ...
 
 Maybe, but it seems like baking even more hackery into a tool that's
 already got too much hackery.  It's also hard for pg_upgrade to know
 things like - whether pg_hba.conf prohibits access to certain
 users/databases/etc. or just requires the use of authentication
 methods that happen to fail.  From pg_upgrade's perspective, it would
 be nice to have a flag that starts the server in some mode where
 nobody but pg_upgrade can connect to it and all connections are
 automatically allowed, but it's not exactly clear how to implement
 nobody but pg_upgrade can connect to it.

Well, have it specify a private socket directory, listen only on that
(not TCP), and bypass all pg_hba rules.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
Hi,

On Monday, June 25, 2012 08:13:54 PM Robert Haas wrote:
 On Mon, Jun 25, 2012 at 1:50 PM, Andres Freund and...@2ndquadrant.com 
wrote:
  Its an argument why related infrastructure would be interesting to more
  than that patch and thats not bad.
  If the garbage collecting is done in a very simplistic manner it doesn't
  sound too hard... The biggest problem is probably crash-recovery of that
  knowledge and how to hook knowledge into it that logical rep needs that
  data...
 I suppose the main reason we haven't done it already is that it
 increases the period of time during which we're using 2X the disk
 space.
I find that an acceptable price if its optional. Making it such doesn't seem 
to be a problem for me.

   I don't think its that easy. If you e.g. have multiple ALTER's in the
   same transaction interspersed with inserted rows they will all have
   different TupleDesc's.
  
  If new columns were added, then tuples created with those older
  tuple-descriptors can still be interpreted with the latest
  tuple-descriptor.
  
  But you need to figure that out. If you have just the before-after images
  of the tupledescs you don't know what happened in there... That would
  mean either doing special things on catalog changes or reassembling the
  meaning from the changed pg_* rows. Neither seems enticing.
 
 I think there is absolutely nothing wrong with doing extra things in
 ALTER TABLE when logical replication is enabled.  We've got code
 that's conditional on Hot Standby being enabled in many places in the
 system; why should logical replication be any different?  If we set
 the bar for logical replication at the system can't do anything
 differently when logical replication is enabled then I cheerfully
 submit that we are doomed.  You've already made WAL format changes to
 support logging the pre-image of the tuple, which is a hundred times
 more likely to cause a performance problem than any monkeying around
 we might want to do in ALTER TABLE.

 I am deeply skeptical that we need to look inside of transactions that
 do full-table rewrites.  But even if we do, I don't see that what I'm
 proposing precludes it.  For example, I think we could have ALTER
 TABLE emit WAL records specifically for logical replication that allow
 us to disentangle which tuple descriptor to use at which point in the
 transaction.  I don't see that that would even be very difficult to
 set up.
Sorry, I was imprecise above: I have no problem doing some changes during 
ALTER TABLE if logical rep is enabled. I am worried though that to make that 
robust you would need loads of places that emit additional information:
* ALTER TABLE
* ALTER FUNCTION
* ALTER OPERATOR
* ALTER/CREATE CAST
* TRUNCATE
* CLUSTER
* ...

I have the feeling that, should we want to do that, the full amount of required 
information would be rather high and end up being essentially the catalog. 
Just having an intermediate tupledesc doesn't help that much if you have e.g. 
record_out doing type lookups of its own.

There also is the issue you have talked about before, that a user-type might 
depend on values in other tables. Unless we're ready to break at least 
transactional behaviour for those (for now...) I don't see how decoding outside 
of the transaction is ever going to be valid? I wouldn't have a big problem 
declaring that as broken for now...

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Dimitri Fontaine
Magnus Hagander mag...@hagander.net writes:
 Or that it takes less code/generates cleaner code...

So we're talking about some LZO things such as snappy from google, and
that would be another run time dependency IIUC.

I think it's time to talk about fastlz:

  http://fastlz.org/
  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c

  551 lines of C code under MIT license, works also under windows

I guess it would be easy (and safe) enough to embed in our tree should
we decide to go this way.
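
For a feel of the API, a minimal round-trip sketch against the fastlz.h header
from that tree (the sample string and buffer sizes are mine; fastlz asks for an
output buffer somewhat larger than the input, so 256 bytes is generous here):

    #include <stdio.h>
    #include <string.h>
    #include "fastlz.h"

    int
    main(void)
    {
        const char *src = "SELECT relname FROM pg_class WHERE relkind = 'r' ORDER BY 1;";
        int         srclen = (int) strlen(src);
        char        packed[256];
        char        restored[256];
        int         packedlen, restoredlen;

        /* level 1 favours speed, level 2 trades some speed for ratio */
        packedlen = fastlz_compress_level(1, src, srclen, packed);
        restoredlen = fastlz_decompress(packed, packedlen, restored, sizeof(restored));

        printf("%d bytes -> %d bytes, round-trip %s\n", srclen, packedlen,
               (restoredlen == srclen && memcmp(src, restored, srclen) == 0)
               ? "ok" : "FAILED");
        return 0;
    }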

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Florian Pflug
On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
 Magnus Hagander mag...@hagander.net writes:
 Or that it takes less code/generates cleaner code...
 
 So we're talking about some LZO things such as snappy from google, and
 that would be another run time dependency IIUC.
 
 I think it's time to talk about fastlz:
 
  http://fastlz.org/
  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c
 
  551 lines of C code under MIT license, works also under windows
 
 I guess it would be easy (and safe) enough to embed in our tree should
 we decide going this way.

Agreed. If we extend the protocol to support compression, and not rely
on SSL, then let's pick one of these LZ77-style compressors, and let's
integrate it into our tree.

We should then also make it possible to enable compression only for
the server -> client direction. Since those types of compressions are
usually pretty easy to decompress, that reduces the amount of work
non-libpq clients have to put in to take advantage of compression.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Phil Sorber
On Mon, Jun 25, 2012 at 3:45 PM, Florian Pflug f...@phlo.org wrote:
 On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
 Magnus Hagander mag...@hagander.net writes:
 Or that it takes less code/generates cleaner code...

 So we're talking about some LZO things such as snappy from google, and
 that would be another run time dependency IIUC.

 I think it's time to talk about fastlz:

  http://fastlz.org/
  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c

  551 lines of C code under MIT license, works also under windows

 I guess it would be easy (and safe) enough to embed in our tree should
 we decide going this way.

 Agreed. If we extend the protocol to support compression, and not rely
 on SSL, then let's pick one of these LZ77-style compressors, and let's
 integrate it into our tree.

 We should then also make it possible to enable compression only for
 the server - client direction. Since those types of compressions are
 usually pretty easy to decompress, that reduces the amount to work
 non-libpq clients have to put in to take advantage of compression.

+1


 best regards,
 Florian Pflug


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Monday, June 25, 2012 05:57:51 PM Tom Lane wrote:
 Well, my response is no.  I could see saying that we require (some) C99
 features at this point, but not features that are in no standard, no
 matter how popular gcc might be.

 Also, 'static inline' *is* C99 conforming as far as I can see?

Hmm.  I went back and re-read the C99 spec, and it looks like most of
the headaches we had in the past with C99 inline are specific to the
case where you want an extern declaration to be available.  For a
function that exists *only* as a static it might be all right.  So maybe
I'm misremembering how well this would work.  We'd have to be sure we
don't need any extern declarations, though.

Having said that, I'm still of the opinion that it's not so hard to deal
with that we should just blow off compilers where inline doesn't work
well.  I have no sympathy at all for the "we'd need two copies"
argument.  First off, if the code is at any risk whatsoever of changing
intra-major-release, it is not acceptable to inline it (there would be
inline copies in third-party modules where we couldn't ensure
recompilation).  So that's going to force us to use this only in cases
where the code is small and stable enough that two copies aren't such
a big problem.  Second, it's not that hard to set things up so there's
only one source-code copy, as was noted upthread.
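
For illustration only (none of these file or macro names exist in the tree;
they just show the shape of the single-copy arrangement): the function bodies
live in one header, exposed as static inline where that works, and compiled
exactly once as ordinary externs elsewhere.

/* ilist.h */
#ifndef ILIST_H
#define ILIST_H

typedef struct ilist_s_node
{
	struct ilist_s_node *next;
} ilist_s_node;

#ifdef USE_INLINE				/* assumed configure-detected symbol */
#define ILIST_FUNC static inline
#include "ilist_impl.h"
#else
extern void ilist_s_push(ilist_s_node *head, ilist_s_node *node);
#endif

#endif   /* ILIST_H */

/* ilist_impl.h -- the only copy of the function bodies */
ILIST_FUNC void
ilist_s_push(ilist_s_node *head, ilist_s_node *node)
{
	node->next = head->next;
	head->next = node;
}

/* ilist.c -- built only when USE_INLINE is not defined */
#include "ilist.h"
#define ILIST_FUNC
#include "ilist_impl.h"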

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of lun jun 25 14:50:54 -0400 2012:

 One fine point regarding before and after images -- if a value
 doesn't change in an UPDATE, there's no reason to include it in both
 the BEFORE and AFTER tuple images, as long as we have the null
 column bitmaps -- or some other way of distinguishing unchanged from
 NULL.  (This could be especially important when the unchanged column
 was a 50 MB bytea.)

Yeah, probably the best is to have the whole thing in BEFORE, and just
send AFTER values for those columns that changed, and include the
'replace' bool array (probably packed as a bitmap), so that the update
can be trivially constructed at the other end just like in
heap_modify_tuple.
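
To make that concrete, a toy sketch of the receiving side (plain C over string
columns instead of real Datums; the column values are invented): keep the full
BEFORE image, receive only the changed columns plus the replace array, and
merge them the same way heap_modify_tuple does.

#include <stdbool.h>
#include <stdio.h>

#define NCOLS 4

int
main(void)
{
	const char *before[NCOLS]  = {"42", "alice", "old@example.org", "active"};
	const char *changed[NCOLS] = {NULL, NULL, "new@example.org", NULL};
	bool		replace[NCOLS] = {false, false, true, false};
	const char *after[NCOLS];
	int			i;

	/* take the new value where replace[] says so, otherwise keep BEFORE */
	for (i = 0; i < NCOLS; i++)
		after[i] = replace[i] ? changed[i] : before[i];

	for (i = 0; i < NCOLS; i++)
		printf("col %d: %s\n", i, after[i]);
	return 0;
}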

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 From pg_upgrade's perspective, it would
 be nice to have a flag that starts the server in some mode where
 nobody but pg_upgrade can connect to it and all connections are
 automatically allowed, but it's not exactly clear how to implement
 nobody but pg_upgrade can connect to it.

The implementation I've wanted to see for some time is that you can
start a standalone backend, but it speaks FE/BE protocol to its caller
(preferably over pipes, so that there is no issue whatsoever of where
you can securely put a socket or anything like that).  Making that
happen might be a bit too much work if pg_upgrade were the only use
case, but there are a lot of people who would like to use PG as an
embedded database, and this might be close enough for such use-cases.
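
For what it's worth, the plumbing for the "over pipes" part could be as simple
as the sketch below (the child binary name is a placeholder -- no such
standalone FE/BE mode exists today, which is exactly the point): hand the
backend one end of a socketpair and drive the protocol over the other end,
with no socket file anywhere.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>

int
main(void)
{
	int		sv[2];
	pid_t	pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
	{
		perror("socketpair");
		return 1;
	}

	pid = fork();
	if (pid == 0)
	{
		/* child: would become the standalone backend, talking FE/BE on sv[1] */
		close(sv[0]);
		dup2(sv[1], STDIN_FILENO);
		dup2(sv[1], STDOUT_FILENO);
		execl("./standalone-backend-stub", "standalone-backend-stub", (char *) NULL);
		perror("execl");
		_exit(1);
	}

	/* parent: libpq (or pg_upgrade) would drive the FE/BE protocol over sv[0] */
	close(sv[1]);
	printf("talking to backend child %d over fd %d\n", (int) pid, sv[0]);
	return 0;
}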

However, that has got little to do with whether --maintenance-db is a
worthwhile thing or not, because that's about external client-side
tools, not pg_upgrade.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Peter Geoghegan
On 25 June 2012 20:59, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 Also, 'static inline' *is* C99 conforming as far as I can see?

 Hmm.  I went back and re-read the C99 spec, and it looks like most of
 the headaches we had in the past with C99 inline are specific to the
 case where you want an extern declaration to be available.  For a
 function that exists *only* as a static it might be all right.  So maybe
 I'm misremembering how well this would work.  We'd have to be sure we
 don't need any extern declarations, though.

Yeah, the extern inline functions sound at least superficially
similar to what happened with extern templates in C++ - exactly one
compiler vendor implemented them to the letter of the standard (they
remained completely unimplemented elsewhere), and subsequently went
bust, before they were eventually removed from the standard last year.

Note that when you build Postgres with Clang, it's implicitly and
automatically building C code as C99. There is an excellent analysis
of the situation here, under C99 inline functions:

http://clang.llvm.org/compatibility.html

 Having said that, I'm still of the opinion that it's not so hard to deal
 with that we should just blow off compilers where inline doesn't work
 well.

Fair enough.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread k...@rice.edu
On Mon, Jun 25, 2012 at 09:45:26PM +0200, Florian Pflug wrote:
 On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
  Magnus Hagander mag...@hagander.net writes:
  Or that it takes less code/generates cleaner code...
  
  So we're talking about some LZO things such as snappy from google, and
  that would be another run time dependency IIUC.
  
  I think it's time to talk about fastlz:
  
   http://fastlz.org/
   http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c
  
   551 lines of C code under MIT license, works also under windows
  
  I guess it would be easy (and safe) enough to embed in our tree should
  we decide going this way.
 
 Agreed. If we extend the protocol to support compression, and not rely
 on SSL, then let's pick one of these LZ77-style compressors, and let's
 integrate it into our tree.
 
 We should then also make it possible to enable compression only for
 the server - client direction. Since those types of compressions are
 usually pretty easy to decompress, that reduces the amount to work
 non-libpq clients have to put in to take advantage of compression.
 
 best regards,
 Florian Pflug
 

Here is the benchmark list from the Google lz4 page:

Name            Ratio   C.speed D.speed
LZ4 (r59)       2.084   330     915
LZO 2.05 1x_1   2.038   311     480
QuickLZ 1.5 -1  2.233   257     277
Snappy 1.0.5    2.024   227     729
LZF             2.076   197     465
FastLZ          2.030   190     420
zlib 1.2.5 -1   2.728   39      195
LZ4 HC (r66)    2.712   18      1020
zlib 1.2.5 -6   3.095   14      210

lz4 absolutely dominates on compression/decompression speed. While fastlz
is faster than zlib(-1) on compression, lz4 is almost 2X faster still.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu
Hi all,

I've been working on a new system view, pg_stat_lwlocks, to observe
LWLocks, and just completed my 'proof-of-concept' code that can work
with version 9.1.

Now, I'd like to know the possibility of this feature for future
release.

With this patch, DBA can easily determine a bottleneck around lwlocks.
--
postgres=# SELECT * FROM pg_stat_lwlocks ORDER BY time_ms DESC LIMIT 10;
 lwlockid | calls  | waits | time_ms
----------+--------+-------+--------
       49 | 193326 | 32096 |   23688
        8 |   3305 |   133 |    1335
        2 |     21 |     0 |       0
        4 | 135188 |     0 |       0
        5 |  57935 |     0 |       0
        6 |    141 |     0 |       0
        7 |  24580 |     1 |       0
        3 |   3282 |     0 |       0
        1 |     41 |     0 |       0
        9 |      3 |     0 |       0
(10 rows)

postgres=#
--

In this view,
  'lwlockid' column represents LWLockId used in the backends.
  'calls' represents how many times LWLockAcquire() was called.
  'waits' represents how many times LWLockAcquire() needed to wait
  within it before lock acquisition.
  'time_ms' represents how long LWLockAcquire() totally waited on
  a lwlock.

And lwlocks that use a LWLockId range, such as BufMappingLock or
LockMgrLock, would be grouped and summed up in a single record.
For example, lwlockid 49 in the above view represents LockMgrLock
statistics.

Now, I know there are some considerations.

(1) Performance

  I've measured LWLock performance both with and without the patch,
  and confirmed that this patch does not affect the LWLock performance
  at all.

  pgbench scores with the patch:
tps = 900.906658 (excluding connections establishing)
tps = 908.528422 (excluding connections establishing)
tps = 903.900977 (excluding connections establishing)
tps = 910.470595 (excluding connections establishing)
tps = 909.685396 (excluding connections establishing)

  pgbench scores without the patch:
tps = 909.096785 (excluding connections establishing)
tps = 894.868712 (excluding connections establishing)
tps = 910.074669 (excluding connections establishing)
tps = 904.022770 (excluding connections establishing)
tps = 895.673830 (excluding connections establishing)

  Of course, this experiment was not I/O bound, and the cache hit ratio
  was 99.9%.

(2) Memory space

  In this patch, I added three new members to LWLock structure
  as uint64 to collect statistics.

  It means that those members must be held in the shared memory,
  but I'm not sure whether it's appropriate.

  I think another possible option is holding those statistics
  values in local (backend) process memory, and send them through
  the stat collector process (like other statistics values).
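
  For illustration, the bookkeeping behind those three members boils down to
  something like this standalone sketch (member and function names are mine,
  not necessarily the patch's); it also shows why the gettimeofday()
  implementation matters, since a pair of calls brackets every wait.

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

typedef struct LWLockCounters
{
	uint64_t	calls;			/* LWLockAcquire() invocations */
	uint64_t	waits;			/* how many of those had to sleep */
	uint64_t	time_ms;		/* total time spent sleeping */
} LWLockCounters;

static void
account_wait(LWLockCounters *c)
{
	struct timeval start, end;
	long		ms;

	c->waits++;
	gettimeofday(&start, NULL);
	usleep(20 * 1000);			/* stands in for sleeping on the semaphore */
	gettimeofday(&end, NULL);
	ms = (end.tv_sec - start.tv_sec) * 1000 +
		 (end.tv_usec - start.tv_usec) / 1000;
	c->time_ms += (uint64_t) ms;
}

int
main(void)
{
	LWLockCounters lock = {0, 0, 0};

	lock.calls++;				/* an acquisition that did not block */
	lock.calls++;				/* and one that did */
	account_wait(&lock);

	printf("calls=%llu waits=%llu time_ms=%llu\n",
		   (unsigned long long) lock.calls,
		   (unsigned long long) lock.waits,
		   (unsigned long long) lock.time_ms);
	return 0;
}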

(3) LWLock names (or labels)

  Now, the pg_stat_lwlocks view shows the LWLockId itself. But an LWLockId does
  not make it easy for a DBA to determine the actual lock type.

  So, I want to show LWLock names (or labels), like 'WALWriteLock'
  or 'LockMgrLock', but how should I implement it?

Any comments?

Regards,
-- 
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp
diff -rc postgresql-9.1.2.orig/src/backend/catalog/postgres.bki 
postgresql-9.1.2/src/backend/catalog/postgres.bki
*** postgresql-9.1.2.orig/src/backend/catalog/postgres.bki  2012-06-20 
03:32:46.0 +0900
--- postgresql-9.1.2/src/backend/catalog/postgres.bki   2012-06-26 
01:51:52.0 +0900
***
*** 1553,1558 
--- 1553,1559 
  insert OID = 3071 ( pg_xlog_replay_pause 11 10 12 1 0 0 f f f t f v 0 0 2278 
 _null_ _null_ _null_ _null_ pg_xlog_replay_pause _null_ _null_ _null_ )
  insert OID = 3072 ( pg_xlog_replay_resume 11 10 12 1 0 0 f f f t f v 0 0 2278 
 _null_ _null_ _null_ _null_ pg_xlog_replay_resume _null_ _null_ _null_ )
  insert OID = 3073 ( pg_is_xlog_replay_paused 11 10 12 1 0 0 f f f t f v 0 0 
16  _null_ _null_ _null_ _null_ pg_is_xlog_replay_paused _null_ _null_ _null_ 
)
+ insert OID = 3764 ( pg_stat_get_lwlocks 11 10 12 1 100 0 f f f f t s 0 0 2249 
 {20,20,20,20} {o,o,o,o} {lwlockid,calls,waits,time_ms} _null_ 
pg_stat_get_lwlocks _null_ _null_ _null_ )
  insert OID = 2621 ( pg_reload_conf 11 10 12 1 0 0 f f f t f v 0 0 16  
_null_ _null_ _null_ _null_ pg_reload_conf _null_ _null_ _null_ )
  insert OID = 2622 ( pg_rotate_logfile 11 10 12 1 0 0 f f f t f v 0 0 16  
_null_ _null_ _null_ _null_ pg_rotate_logfile _null_ _null_ _null_ )
  insert OID = 2623 ( pg_stat_file 11 10 12 1 0 0 f f f t f v 1 0 2249 25 
{25,20,1184,1184,1184,1184,16} {i,o,o,o,o,o,o} 
{filename,size,access,modification,change,creation,isdir} _null_ pg_stat_file 
_null_ _null_ _null_ )
diff -rc postgresql-9.1.2.orig/src/backend/catalog/postgres.description 
postgresql-9.1.2/src/backend/catalog/postgres.description
*** postgresql-9.1.2.orig/src/backend/catalog/postgres.description  
2012-06-20 03:32:46.0 +0900
--- 

Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 25-06-2012 16:45, Florian Pflug wrote:
 Agreed. If we extend the protocol to support compression, and not rely
 on SSL, then let's pick one of these LZ77-style compressors, and let's
 integrate it into our tree.
 
If we have an option to have it out of our tree, good; if not, let's integrate
it. I, particularly, don't see a compelling reason to integrate compression
code into our tree; I mean, if we want to support more than one algorithm, it is
clear that the overhead of maintaining the compression code is too high (a lot
of my-new-compression-algorithms).

 We should then also make it possible to enable compression only for
 the server - client direction. Since those types of compressions are
 usually pretty easy to decompress, that reduces the amount to work
 non-libpq clients have to put in to take advantage of compression.
 
I don't buy this idea. My use case (data load) will not be covered if we only
enable server -> client compression. I figure that there are use cases for
server -> client compression (replication, for example) but also there are
important use cases for client -> server (data load, for example). If you
implement decompression, why not code compression too?


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 25-06-2012 14:30, Greg Jaskiewicz wrote:
 Wasn't this more of an issue in de-coupling compression from encryption ?
 
Let me try to draw a conclusion. If we decide on independent
compression code instead of an SSL-based one, here is a possibility to be
tested: SSL + SSL-based compression vs. SSL + our compression code. If the
latter is faster then we could discuss the possibility of disabling compression
in our SSL code and putting in the documentation that it is recommended to enable
compression in SSL connections.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Merlin Moncure
On Mon, Jun 25, 2012 at 3:38 PM, Euler Taveira eu...@timbira.com wrote:
 On 25-06-2012 16:45, Florian Pflug wrote:
 Agreed. If we extend the protocol to support compression, and not rely
 on SSL, then let's pick one of these LZ77-style compressors, and let's
 integrate it into our tree.

 If we have an option to have it out of our tree, good; if not, let's integrate
 it. I, particularly, don't see a compelling reason to integrate compression
 code in our tree, I mean, if we want to support more than one algorithm, it is
 clear that the overhead for maintain the compression code is too high (a lot
 of my-new-compression-algorithms).

 We should then also make it possible to enable compression only for
 the server - client direction. Since those types of compressions are
 usually pretty easy to decompress, that reduces the amount to work
 non-libpq clients have to put in to take advantage of compression.

 I don't buy this idea. My use case (data load) will not be covered if we only
 enable server - client compression. I figure that there are use cases for
 server - client compression (replication, for example) but also there are
 important use cases for client - server (data load, for example). If you
 implement decompression, why not code compression code too?

+1.  lz4, which is looking like a strong candidate, has C#, Java,
etc. implementations, which are the main languages that are going to lag behind in
terms of protocol support.  I don't think you're saving a lot by going
only one way (although you could make a case for the client to signal
interest in compression separately from decompression?)

another point:
It's been obvious for years now that zlib is somewhat of a dog in
terms of cpu usage for what it gives you.  However, raw performance numbers
are not the whole story -- it would be interesting to compress real
world protocol messages to/from the server in various scenarios to see
how compression would work, in particular with OLTP type queries --
for example pgbench runs, etc. That would speak a lot more to the
benefits than canned benchmarks.
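
As a starting point, even something as small as the sketch below (zlib only,
over a made-up pgbench-style statement -- a capture of real FE/BE traffic
would be the interesting input) gives a first idea of what levels 1 and 6 do
to short OLTP-sized payloads:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(void)
{
	/* stand-in for a short OLTP query message; a real capture would be better */
	const char *msg =
		"UPDATE pgbench_accounts SET abalance = abalance + 42 WHERE aid = 100249;";
	uLong		srclen = (uLong) strlen(msg);
	Bytef		dest[1024];
	int			levels[] = {1, 6};
	int			i;

	for (i = 0; i < 2; i++)
	{
		uLongf		destlen = sizeof(dest);

		if (compress2(dest, &destlen, (const Bytef *) msg, srclen,
					  levels[i]) != Z_OK)
		{
			fprintf(stderr, "compress2 failed\n");
			return 1;
		}
		printf("level %d: %lu -> %lu bytes\n",
			   levels[i], (unsigned long) srclen, (unsigned long) destlen);
	}
	return 0;
}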

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Josh Berkus
On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:
 (1) Performance
 
   I've measured LWLock performance both with and without the patch,
   and confirmed that this patch does not affect the LWLock perfomance
   at all.

This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since trace_lwlocks
has quite a noticeable one in my experience.  However, the answer to that
is to submit the patch and let people test.

I will remark that it would be far more useful to me if we could also
track lwlocks per session.  Overall counts are somewhat useful, but more
granular counts are even more useful.  What period of time does the
table cover?  Since last reset?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Heikki Linnakangas

On 25.06.2012 21:01, Robert Haas wrote:

On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote:

 should be. The attached patch fixes this typo.


Oh, I forgot to attach the patch.. Here is the patch.


I committed both of the patches you posted to this thread.


Thanks Robert. I was thinking that pg_resetxlog -l would accept a WAL 
file name, instead of comma-separated tli, xlogid, segno arguments. The 
latter is a bit meaningless now that we don't use the xlogid+segno 
combination anywhere else. Alvaro pointed out that pg_upgrade was broken 
by the change in pg_resetxlog -n output - I changed that too to print 
the First log segment after reset information as a WAL file name, 
instead of logid+segno. Another option would be to print the 64-bit 
segment number, but I think that's worse, because the 64-bit segment 
number is harder to associate with a physical WAL file.


So I think we should change pg_resetxlog -l option to take a WAL file 
name as argument, and fix pg_upgrade accordingly.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

2012-06-25 Thread Steve Singer

On 12-06-21 04:37 AM, Andres Freund wrote:

Hi Steve,
Thanks!



Attached is a detailed review of the patch.

Very good analysis, thanks!

Another reasons why we cannot easily do 1) is that subtransactions aren't
discernible from top-level transactions before the top-level commit happens,
we can only properly merge in the right order (by sorting via lsn) once we
have seen the commit record which includes a list of all committed
subtransactions.



Based on that and your comments further down in your reply (and that no 
one spoke up and disagreed with you), it sounds like doing (1) isn't 
going to be practical.



I also don't think 1) would be particularly welcome by people trying to
replicate into foreign systems.



They could still sort the changes into transaction groups before 
applying to the foreign system.



I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If it has
been reached for one transaction all existing changes are spilled to disk. New
changes again can be kept in memory till it's reached again.

Do you want max_changes_in_memory_per_txn or do you want to put a limit 
on the total amount of memory that the cache is able to use? How are you 
going to tell a DBA to tune max_changes_in_memory_per_txn? They know how 
much memory their system has and how much of it they can devote to the apply cache 
versus other things; asking them to estimate how many 
open transactions they might have at a point in time and how many 
WAL change records each transaction generates seems like a step 
backwards from the progress we've been making in getting PostgreSQL to 
be easier to tune.  The maximum number of transactions that could be 
opened at a time is governed by max_connections on the master at the 
time the WAL was generated, so I don't even see how the machine 
processing the WAL records could autotune/guess that.
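
For reference, the cutoff quoted above boils down to something like this
sketch (all names here are invented); the tuning question is whether that
per-transaction constant, a total memory budget, or both should be the knob
exposed to the DBA.

#include <stdio.h>

#define MAX_CHANGES_IN_MEMORY_PER_TXN 4096	/* the cutoff under discussion */

typedef struct TxnBuffer
{
	unsigned int xid;
	int			nchanges_in_memory;
	long		nchanges_spilled;
} TxnBuffer;

static void
spill_to_disk(TxnBuffer *txn)
{
	/* serialize the buffered changes for this xid and release the memory */
	txn->nchanges_spilled += txn->nchanges_in_memory;
	txn->nchanges_in_memory = 0;
}

static void
add_change(TxnBuffer *txn)
{
	txn->nchanges_in_memory++;
	if (txn->nchanges_in_memory >= MAX_CHANGES_IN_MEMORY_PER_TXN)
		spill_to_disk(txn);		/* memory fills again until the next spill */
}

int
main(void)
{
	TxnBuffer	txn = {1234, 0, 0};
	int			i;

	for (i = 0; i < 10000; i++)
		add_change(&txn);

	printf("xid %u: %ld spilled, %d still in memory\n",
		   txn.xid, txn.nchanges_spilled, txn.nchanges_in_memory);
	return 0;
}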





We need to support serializing the cache for crash recovery + shutdown of the
receiving side as well. Depending on how we do the wal decoding we will need
it more frequently...



Have you described your thoughts on crash recovery on another thread?

I am thinking that this module would have to serialize some state 
every time it calls cache->commit() to ensure that consumers don't get 
invoked twice on the same transaction.


If the apply module is making changes to the same backend that the apply 
cache serializes to, then both the state for the apply cache and the 
changes that committed transactions make will be persisted (or 
not persisted) together.   What if I am replicating from x86 to x86_64 
via an apply module that does textout conversions?


x86                    Proxy                           x86_64
WAL  ------------>  apply
                    cache
                      |      (proxy catalog)
                      |
                    apply module
                      textout  ---------------------->  SQL statements


How do we ensure that the commits are all visible (or not visible) on 
the catalog of the proxy instance used for decoding WAL, on the destination 
database, and in the state + spill files of the apply cache -- that they all stay consistent 
in the event of a crash of either the proxy or the target?
I don't think you can (unless we consider two-phase commit, and I'd 
rather we didn't).  Can we come up with a way of avoiding the need for 
them to be consistent with each other?


I think apply modules will need to be able to be passed the same 
transaction twice (once before the crash and again after) and come up 
with a way of deciding if that transaction has a) been applied to the 
translation/proxy catalog and b) been applied on the replica instance.   
How is the walreceiver going to decide which WAL segments it needs to 
re-process after a crash?  I would want to see more of these details 
worked out before we finalize the interface between the apply cache and 
the apply modules and how the serialization works.



Code Review
=

applycache.h
---
+typedef struct ApplyCacheTupleBuf
+{
+	/* position in preallocated list */
+	ilist_s_node node;
+
+	HeapTupleData tuple;
+	HeapTupleHeaderData header;
+	char data[MaxHeapTupleSize];
+} ApplyCacheTupleBuf;

Each ApplyCacheTupleBuf will be about 8k (BLKSZ) big no matter how big 
the data in the transaction is? Wouldn't workloads with inserts of lots 
of small rows in a transaction eat up lots of memory that is allocated 
but storing nothing?  The only alternative I can think of is dynamically 
allocating these and I don't know what the cost/benefit of that overhead 
will be versus spilling to disk sooner.


+ * FIXME: better name
+ */
+ApplyCacheChange*
+ApplyCacheGetChange(ApplyCache*);

How about:

ApplyCacheReserveChangeStruct(..)
ApplyCacheReserveChange(...)
ApplyCacheAllocateChange(...)

as ideas?
+/*
+ * Return an unused ApplyCacheChange struct
+ */
+void
+ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);


Re: [HACKERS] WAL format changes

2012-06-25 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of lun jun 25 20:09:34 -0400 2012:
 On 25.06.2012 21:01, Robert Haas wrote:
  On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote:
   should be. The attached patch fixes this typo.
 
  Oh, I forgot to attach the patch.. Here is the patch.
 
  I committed both of the patches you posted to this thread.
 
 Thanks Robert. I was thinking that pg_resetxlog -l would accept a WAL 
 file name, instead of comma-separated tli, xlogid, segno arguments. The 
 latter is a bit meaningless now that we don't use the xlogid+segno 
 combination anywhere else. Alvaro pointed out that pg_upgrade was broken 
 by the change in pg_resetxlog -n output - I changed that too to print 
 the First log segment after reset information as a WAL file name, 
 instead of logid+segno. Another option would be to print the 64-bit 
 segment number, but I think that's worse, because the 64-bit segment 
 number is harder to associate with a physical WAL file.
 
 So I think we should change pg_resetxlog -l option to take a WAL file 
 name as argument, and fix pg_upgrade accordingly.

The only thing pg_upgrade does with the tli/logid/segno combo, AFAICT,
is pass it back to pg_resetxlog -l, so this plan seems reasonable.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess

2012-06-25 Thread Tom Lane
I wrote:
 Barring objections or better ideas, I'll push forward with applying
 this patch and the dependency-fixup patch.

Applied and back-patched.

BTW, I had complained at the end of the pgsql-bugs thread about bug #6699
that it seemed like FK constraints would get prevented from being
restored in parallel fashion if they depended on a constraint-style
unique index, because the dependency for the FK constraint would
reference the index's dump ID, which is nowhere to be seen in the dump.
But in fact they are restored in parallel, and the reason is that
pg_restore silently ignores any references to unknown dump IDs (a hack
put in specifically because of the bogus dependency data, no doubt).
So that raises the opposite question: how come pg_restore doesn't fall
over from trying to create the FK constraint before the unique index it
depends on is created?  And the answer to that is dumb luck, more or
less.  The locking dependencies hack in pg_restore knows that creation
of an FK constraint requires exclusive lock on both tables, so it won't
try to restore the FK constraint before creation of the constraint-style
index is done.  So getting the dependency information fixed is a
necessary prerequisite for any attempt to reduce the locking
requirements there.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 So I think we should change pg_resetxlog -l option to take a WAL file 
 name as argument, and fix pg_upgrade accordingly.

Seems reasonable I guess.  It's really specifying a starting WAL
location, but only to file granularity, so treating the argument as a
file name is sort of a type cheat but seems convenient.

If we do it that way, we'd better validate that the argument is a legal
WAL file name, so as to catch any cases where somebody tries to do it
old-style.
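
Validation could be as simple as the sketch below (illustrative only; whether
to also range-check the last component the way the old code did is the patch
author's call): insist on exactly 24 hex digits and let tli/log/seg fall out
of them.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and fills tli/log/seg if fname looks like a WAL segment name. */
static int
parse_wal_file_name(const char *fname,
					unsigned int *tli, unsigned int *log, unsigned int *seg)
{
	int			i;

	if (strlen(fname) != 24)
		return 0;
	for (i = 0; i < 24; i++)
		if (!isxdigit((unsigned char) fname[i]))
			return 0;
	return sscanf(fname, "%08X%08X%08X", tli, log, seg) == 3;
}

int
main(int argc, char **argv)
{
	unsigned int tli, log, seg;

	if (argc != 2 || !parse_wal_file_name(argv[1], &tli, &log, &seg))
	{
		fprintf(stderr, "expected a WAL file name such as 000000010000000A00000042\n");
		return 1;
	}
	printf("tli %u, log %u, seg %u\n", tli, log, seg);
	return 0;
}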

BTW, does pg_resetxlog's logic for setting the default -l value (from
scanning pg_xlog to find the largest existing file name) still work?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu

2012/06/26 6:44, Josh Berkus wrote:

On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:

(1) Performance

   I've measured LWLock performance both with and without the patch,
   and confirmed that this patch does not affect the LWLock perfomance
   at all.


This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since trace_lwlocks
has quite a noticable one in my experience.  However, the answer to that
is to submit the patch and let people test.


Thanks. I will submit the patch to the CommitFest page with some fixes
to be able to work with the latest PostgreSQL on Git.


I will remark that it would be far more useful to me if we could also
track lwlocks per session.  Overall counts are somewhat useful, but more
granular counts are even more useful.  What period of time does the
table cover?  Since last reset?


Yes, it has not been implemented yet since this code is just a PoC,
but it is another design issue which needs to be discussed.

To implement it, a new array can be added in local process memory
to hold lwlock statistics, and the counters can be updated both in shared
memory and in local process memory at once. Then, the session can
retrieve 'per-session' statistics from the local process memory
via some dedicated function.

Does it make sense? Any comments?

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-06-25 Thread Jeff Janes
On Mon, Jun 18, 2012 at 5:42 PM, Robert Haas robertmh...@gmail.com wrote:

 Hmm.  That was actually a gloss I added on existing code to try to
 convince myself that it was safe; I don't think that the changes I
 made make that any more or less safe than it was before.

Right, sorry.  I thought there was some strength reduction going on
there as well.

Thanks for the various explanations, they address my concerns.  I see
that v2 applies over v1.

I've verified performance improvements using 8 cores with my proposed
pgbench -P benchmark, with a scale that fits in shared_buffers.
It brings it most of the way, but not quite, up to the btree performance.


I've marked this as ready for committer.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu

2012/06/26 7:04, Kevin Grittner wrote:

Josh Berkusj...@agliodbs.com  wrote:

On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:

(1) Performance

   I've measured LWLock performance both with and without the
   patch, and confirmed that this patch does not affect the LWLock
   perfomance at all.


This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since
trace_lwlocks has quite a noticable one in my experience.
However, the answer to that is to submit the patch and let people
test.


I think overhead is going to depend quite a bit on the
gettimeofday() implementation, since that is called twice per lock
wait.


Yes.
It's one of my concerns, and what I actually want hackers to test.


It looked to me like there was nothing to prevent concurrent updates
of the counts while gathering the accumulated values for display.
Won't this be a problem on 32-bit builds?


Actually, I'd like to know how I can improve my code in a 32bit box.

However, unfortunately I don't have any 32bit (physical) box now,
so I want someone to test it if it needs to be tested.


Please add this to the Open CommitFest for a proper review:

https://commitfest.postgresql.org/action/commitfest_view/open


Will submit soon. Thanks.



-Kevin



Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Amit Kapila
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  So I think we should change pg_resetxlog -l option to take a WAL file 
  name as argument, and fix pg_upgrade accordingly.

 Seems reasonable I guess.  It's really specifying a starting WAL
 location, but only to file granularity, so treating the argument as a
 file name is sort of a type cheat but seems convenient.

 If we do it that way, we'd better validate that the argument is a legal
 WAL file name, so as to catch any cases where somebody tries to do it
 old-style.

 BTW, does pg_resetxlog's logic for setting the default -l value (from
 scanning pg_xlog to find the largest existing file name) still work?
  

It finds the segment number for the largest existing file name in pg_xlog and
then compares it with the input provided by the
user for the -l option; if the input is greater, it will use the input to set the value in the
control file.

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-25 Thread Alvaro Herrera
I cleaned up the framework patch a bit.  My version's attached.  Mainly,
returning false for failure in some code paths that are only going to
have the caller elog(FATAL) is rather pointless -- it seems much better
to just have the code itself do the elog(FATAL).  In any case, I
searched for reports of those error messages being reported in the wild
and saw none.

There are other things but they are in the nitpick department.  (A
reference to ->check_timeout remains that needs to be fixed too).

There is one thing that still bothers me on this whole framework patch,
but I'm not sure it's easily fixable.  Basically the API to timeout.c is
the whole list of timeouts and their whole behaviors.  If you want to
add a new one, you can't just call into the entry points, you have to
modify timeout.c itself (as well as timeout.h as well as whatever code
it is that you want to add timeouts to).  This may be good enough, but I
don't like it.  I think it boils down to proctimeout.h is cheating.

The interface I would actually like to have is something that lets the
modules trying to get a timeout register the timeout-checking function
as a callback.  API-wise, it would be much cleaner.  However, I'm not
really sure that code-wise it would be any cleaner at all.  In fact I
think it'd be rather cumbersome.  However, if you could give that idea
some thought, it'd be swell.
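
To illustrate the shape I mean (every name below is invented, nothing from the
patch): callers hand timeout.c a handler and get an id back, and timeout.c
keeps only a table of (deadline, handler) entries without knowing what any
particular timeout is for.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*timeout_handler) (void);

typedef struct
{
	bool			active;
	long			deadline_ms;	/* absolute, on whatever clock the caller uses */
	timeout_handler handler;
} Timeout;

#define MAX_TIMEOUTS 16
static Timeout timeouts[MAX_TIMEOUTS];

static int
register_timeout(timeout_handler handler)
{
	int			i;

	for (i = 0; i < MAX_TIMEOUTS; i++)
		if (timeouts[i].handler == NULL)
		{
			timeouts[i].handler = handler;
			return i;
		}
	return -1;					/* table full */
}

static void
enable_timeout(int id, long deadline_ms)
{
	timeouts[id].deadline_ms = deadline_ms;
	timeouts[id].active = true;
}

/* what the SIGALRM interrupt service would end up calling */
static void
fire_expired(long now_ms)
{
	int			i;

	for (i = 0; i < MAX_TIMEOUTS; i++)
		if (timeouts[i].active && now_ms >= timeouts[i].deadline_ms)
		{
			timeouts[i].active = false;
			timeouts[i].handler();
		}
}

static void
deadlock_check(void)
{
	printf("deadlock check would run here\n");
}

int
main(void)
{
	int			id = register_timeout(deadlock_check);

	enable_timeout(id, 1000);
	fire_expired(1500);			/* pretend the alarm went off at t=1500ms */
	return 0;
}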

I haven't looked at the second patch at all yet.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


1-timeout-framework-v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proof concept - access to session variables on client side

2012-06-25 Thread Pavel Stehule
Hello

I worked on a simple patch that enables access from the server side to
client-side data. It adds two new hooks to libpq - one for returning the
local context, and a second for setting the local context.

The motivation is to integrate the possibilities of the psql console
with a stronger language - plpgsql. A second goal is to make it
possible to save the result of some server-side processing in psql. It
improves the vars feature in psql.

pavel ~/src/postgresql/src $ cat test.sql
\echo value of external parameter is :myvar

do $$
begin
  -- we can take any session variable on client side
  -- it is safe against SQL injection
  raise notice 'external parameter accessed from plpgsql is %',
hgetvar('myvar');

  -- we can change this session variable and finish transaction
  perform hsetvar('myvar', 'Hello, World');
end;
$$ language plpgsql;

\echo new value of session variable is :myvar

cat test.sql | psql postgres -v myvar=Hello
value of external parameter is Hello
NOTICE:  external parameter accessed from plpgsql is Hello
DO
new value of session variable is Hello, World

This is just a proof of concept - there should be better integration with
PL languages, using a cache for reads on the server side, ...

Notices?

Regards

Pavel Stehule


client-session-vars.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers