Re: [HACKERS] Simple improvements to freespace allocation

2014-01-08 Thread Simon Riggs
On 8 January 2014 07:43, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 01/08/2014 08:56 AM, Simon Riggs wrote:

 Current freespace code gives a new block insert target (NBT) from
 anywhere in the table. That isn't very useful with bigger tables, and it
 would be useful to be able to specify different algorithms for
 producing NBTs.


 I've actually been surprised how little demand there has been for
 alternative algorithms. When I wrote the current FSM implementation, I
 expected people to start coming up with all kinds of wishes, but it didn't
 happen. There have been very few complaints; everyone seems to be satisfied
 with the way it works now. So I'm not convinced there's much need for this.

That would require someone to conduct detailed analysis on problems,
which few people are capable of doing at a level that we would accept.
No doubt I will soon be challenged to prove beyond doubt that anything
here is required, which becomes a chicken-and-egg problem. For the vast
majority of cases, the general approach we have works well enough - this
area has had lots of very useful attention over the years.

The problem is tables have multiple use cases and we support only one,
with no easy way for people to experiment with alternatives in
production.

It's been on my list for years... but it's not been a top priority, for sure.


 ALTER TABLE foo WITH (freespace = );

 Three simple and useful models come to mind

 * CONCURRENT
 This is the standard/current model. Naming it like this emphasises
 why we pick NBTs in the way we do.

 * PACK
 We want the table to be smaller, so rather than run a VACUUM FULL we
 want to force the table to choose an NBT at the start of the table, even at
 the expense of concurrency. By avoiding putting new data at the top of
 the table we allow the possibility that VACUUM will shrink table size.
 This is the same as current except we always reset the FSM pointer to zero
 and re-seek from there. This takes some time to have an effect, but is
 much less invasive than VACUUM FULL.


 We already reset the FSM pointer to zero on vacuum. Would the above actually
 make any difference in practice?

The Pack algo would emphasise tight packing over assigning concurrent
blocks. It would be useful if that also included not doing HOT updates
in favour of migrating rows to an earlier block in the table. The emphasis
is on avoiding VACUUM FULL in certain cases, not for general use.


 * RECENT
 For large tables with an append-mostly use case it would be useful to
 prefer NBTs from the last two 1GB segments of a table, allowing them
 to be more easily cached. This is the same as current except when we wrap
 we don't go to block 0, we go to the first block of the penultimate (max - 1)
 segment. For tables of <= 2 segments this is no change from the existing
 algorithm. For larger tables it would focus updates/inserts into a
 much reduced and yet still large area and allow better caching.


 Umm, wouldn't that bloat the table with no limit? Putting my DBA/developer
 hat on, I don't understand when I would want to use that setting.

That would depend on the use case; no algo suggested works for all or
even general cases.

If you have a large table, allocating freespace in blocks unlikely to
be accessed by queries means we introduce additional cache pressure.
If we allocate NBTs from newer blocks we will more likely find them in
cache.

Allowing older data to become write-seldom allows us to consider
things like compressing particular segments or moving them onto
cheaper storage.


(Another suggestion might be to use the VM so that we tend to add data
to already dirty blocks.)

 There's one policy that I'd like to see: maintaining cluster order. When
 inserting a new tuple, try to place it close to other tuples with similar
 keys, to keep the table clustered.

Agreed. Do you have a particular algorithm in mind? I can think of a few.

 In practice, CLUSTER CONCURRENTLY might be more useful, though.

I think we want both wholesale and retail. Certainly in the absence of
the former, the latter seems a good addition.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Turning off HOT/Cleanup sometimes

2014-01-08 Thread Simon Riggs
VACUUM cleans up blocks, which is nice because it happens offline in a
lazy manner.

We also make SELECT clean up blocks as it goes. That is useful in OLTP
workloads, but it means that large SQL queries and pg_dump effectively
do much the same work as VACUUM, generating huge amounts of I/O and
WAL on the master, the cost and annoyance of which is experienced
directly by the user. That is avoided on standbys.

The effect is that long-running statements often run much longer
than we want, increasing bloat as a result. It also produces wildly
varying response times, depending upon the extent of cleanup required.

It is a simple task to make that behaviour optional on the master.

I propose a USERSET parameter, prune_cost_limit (<--- insert better name here)
which will make the behaviour optional, default -1, in normal user
processes. VACUUM will ignore this parameter and so its actions will
never be deferred.

In detail, this parameter would disable pruning for any scan larger
than the cost limit. So large scans will disable the behaviour. The
default, -1, means never disable pruning, which is the current
behaviour.

We track the number of pages dirtied by the current statement. When
this reaches prune_cost_limit, we will apply these behaviours to all
shared_buffer block accesses...

(1) avoid running heap_page_prune_opt()

(2) avoid dirtying the buffer for hints. (This is safe because the
hinted changes will either be lost or will be part of the full page
image when we make a logged-change).

(i.e. doesn't apply to temp tables)

For example, if we set prune_cost_limit = 4 this behaviour allows
small index lookups via bitmapheapscan to continue to clean up, while
larger index and seq scans will avoid cleanup.



There would be a postgresql.conf parameter prune_cost_limit, as well
as a table level parameter that would prevent pruning except via
VACUUM.
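
A sketch of how the knobs might be used, purely illustrative (parameter
name, reloption name and values are part of the proposal, not an existing
feature):

-- session level: stop opportunistic pruning once a statement has dirtied
-- more than 4 pages; -1 (the default) keeps the current behaviour
SET prune_cost_limit = 4;

-- hypothetical table-level setting: leave all cleanup of this table to VACUUM
ALTER TABLE big_history SET (prune_cost_limit = 0);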

This will help in these ways
* Reduce write I/O from SELECTs and pg_dump - improving backups and BI queries
* Allow finer grained control over Hot Standby conflicts
* Potentially allow diagnostic inspection of older data via SeqScan

Prototype patch shows this is possible and simple enough for 9.4.
Major objections? Or should I polish up and submit?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-08 Thread Craig Ringer
Dean,

Short version
-

Looks amazing overall. Very clever to zip up the s.b. quals, let the
rest of the rewriter and planner do their work normally, then unpack them
into subqueries inserted in the planner once inheritance appendrels are
expanded, etc.

My main concern is that the securityQuals appear to bypass all later
rewrite stages, inheritance expansion during planning, etc. I suspect
this might be hard to get around (because these are disembodied quals
which may have nonsense varnos), but I'm looking into it now.

There's also an assertion failure whenever a correlated subquery appears
as a security barrier view qual.  Again, looking at it.

Ideas on that issue?
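
A minimal sketch of the kind of qual I mean (table and view names invented,
just to make the correlated-subquery case concrete; the actual failing case
may differ):

CREATE TABLE accounts (id int, owner text, balance numeric);
CREATE TABLE memberships (grp text, member text);

CREATE VIEW my_accounts WITH (security_barrier) AS
  SELECT id, owner, balance
  FROM accounts a
  WHERE EXISTS (SELECT 1 FROM memberships m
                WHERE m.member = current_user
                  AND m.grp = a.owner);   -- correlated reference to the outer rel

UPDATE my_accounts SET balance = balance + 1;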



Much longer version: My understanding of how it works
-

My understanding from reading the patch is that it:

- Flattens target views in rewriteTargetView, as in current master. If
the target view is a security barrier view, the view quals are appended
to a list of security barrier quals on the new RTE, instead of being
appended to the RTE's normal quals as for normal views.

After rewrite the views are fully flattened down to an RTE_RELATION,
which becomes the resultRelation. An unreferenced RTE for each view
that's been rewritten is preserved in the range-table for permissions
checking purposes only (same as current master).

- Inheritance expansion, tlist expansion, etc. then occur as normal.

- In planning, in inheritance_planner, if any RTE has any stashed
security quals in its RangeTableEntry, expand_security_qual is invoked.
This iteratively wraps the base relation in a subquery with the saved
security barrier quals, creating nested subqueries around the original
RTE. At each pass resultRelation is changed to point to the new
outer-most subquery.


As a result of this approach everything looks normal to
preprocess_targetlist, row-marking, etc, because they're seeing a normal
RTE_RELATION as resultRelation. The security barrier quals are, at this
stage, stashed aside. If there's inheritance involved, RTEs copied
during appendrel expansion get copies of the security quals in the
parent RTE.

Problem with inheritance, views, etc in s.b. quals
--

After inheritance expansion, tlist expansion, etc, the s.b. quals are
unpacked to create subqueries wrapping the original RTEs.


So, with:

CREATE TABLE t1 (x float, b integer, secret1 text, secret2 text);
CREATE TABLE t1child (z integer) INHERITS (t1);

INSERT INTO t1 (x, b, secret1, secret2)
VALUES
(0,0,'secret0', 'supersecret'),
(1,1,'secret1', 'supersecret'),
(2,2,'secret2', 'supersecret'),
(3,3,'secret3', 'supersecret'),
(4,4,'secret4', 'supersecret'),
(5,6,'secret5', 'supersecret');

INSERT INTO t1child (x, b, secret1, secret2, z)
VALUES
(8,8,'secret8', 'ss', 8),
(9,9,'secret8', 'ss', 9),
(10,10,'secret8', 'ss', 10);

CREATE VIEW v1
WITH (security_barrier)
AS
SELECT b AS b1, x AS x1, secret1
FROM t1 WHERE b % 2 = 0;

CREATE VIEW v2
WITH (security_barrier)
AS
SELECT b1 AS b2, x1 AS x2
FROM v1 WHERE b1 % 4 = 0;



then a statement like:

UPDATE v2
SET x2 = x2 + 32;

will be rewritten into something like (imaginary SQL)

UPDATE t1 WITH SECURITY QUALS ((b % 2 == 0), (b % 4 == 0))
SET x = x + 32

inheritance-expanded and tlist-expanded into something like (imaginary SQL)


UPDATE
 (t1 WITH SECURITY QUALS ((b % 2 == 0), (b % 4 == 0)))
 UNION ALL
 (t1child WITH SECURITY QUALS ((b % 2 == 0), (b % 4 == 0)))
SET x = x + 32;


after which security qual expansion occurs, giving us something like:


UPDATE
 t1, t1child --- resultRelations
 (
SELECT v2.ctid, v2.*
FROM (
  SELECT v1.ctid, v1.*
  FROM (
SELECT t1.ctid, t1.*
FROM t1
WHERE b % 2 == 0
  ) v1
  WHERE b % 4 == 0
) v2

UNION ALL

SELECT v2.ctid, v2.*
FROM (
  SELECT v1.ctid, v1.*
  FROM (
SELECT t1child.ctid, t1child.*
FROM t1child
WHERE b % 2 == 0
  ) v1
  WHERE b % 4 == 0
) v2

 )
SET x = x + 32;


Giving a plan looking like:

EXPLAIN UPDATE v2 SET x2 = 32


                              QUERY PLAN
------------------------------------------------------------------------
 Update on t1 t1_2  (cost=0.00..23.35 rows=2 width=76)
   ->  Subquery Scan on t1  (cost=0.00..2.18 rows=1 width=74)
         ->  Subquery Scan on t1_3  (cost=0.00..2.17 rows=1 width=74)
               Filter: ((t1_3.b % 4) = 0)
               ->  Seq Scan on t1 t1_4  (cost=0.00..2.16 rows=1 width=74)
                     Filter: ((b % 2) = 0)
   ->  Subquery Scan on t1_1  (cost=0.00..21.17 rows=1 width=78)
         ->  Subquery Scan on t1_5  (cost=0.00..21.16 rows=1 width=78)
               Filter: ((t1_5.b % 4) = 0)
               ->  Seq Scan on t1child  (cost=0.00..21.10 rows=4 width=78)
                     Filter: ((b % 2) = 0)
(11 rows)




So far this looks like a really clever approach. My only real concern is
that the security quals are currently 

Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Heikki Linnakangas

On 11/13/2013 03:09 PM, Rajeev rastogi wrote:

This patch implements the following TODO item:

Add a new eager synchronous mode that starts out synchronous but reverts to 
asynchronous after a failure timeout period
This would require some type of command to be executed to alert administrators 
of this change.
http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php

This patch implementation is along the same lines as given in the earlier
thread.
Some of the additional important changes are:

1.   Added two GUC variables to take commands from the user to be executed:

a.   Master_to_standalone_cmd: To be executed before the master switches to
standalone mode.

b.   Master_to_sync_cmd: To be executed before the master switches from
standalone mode back to sync mode.

2.   The master mode switch will happen only if the corresponding command
executes successfully.

3.   The replication timeout is used to decide whether the synchronous
standby has gone down, i.e. only after wal_sender_timeout expires will the
master switch from sync mode to standalone mode.

Please provide your opinion or any other expectation out of this patch.


I'm going to say right off the bat that I think the whole notion to 
automatically disable synchronous replication when the standby goes down 
is completely bonkers. If you don't need the strong guarantee that your 
transaction is safe in at least two servers before it's acknowledged to 
the client, there's no point enabling synchronous replication in the 
first place. If you do need it, then you shouldn't fall back to a 
degraded mode, at least not automatically. It's an idea that keeps 
coming back, but I have not heard a convincing argument why it makes 
sense. It's been discussed many times before, most recently in that 
thread you linked to.


Now that I got that out of the way, I concur that some sort of hooks or 
commands that fire when a standby goes down or comes back up makes 
sense, for monitoring purposes. I don't much like this particular 
design. If you just want to write a log entry when all the standbys are 
disconnected, running a shell command seems like an awkward interface. 
It's OK for raising an alarm, but there are many other situations where 
you might want to raise alarms, so I'd rather have us implement some 
sort of a generic trap system, instead of adding this one particular 
extra config option. What do people usually use to monitor replication?


There are two things we're trying to solve here: raising an alarm when 
something interesting happens, and changing the configuration to 
temporarily disable synchronous replication. What would be a good API to 
disable synchronous replication? Editing the config file and SIGHUPing 
is not very nice. There's been talk of an ALTER command to change the 
config, but I'm not sure that's a very good API either. Perhaps expose 
the sync_master_in_standalone_mode variable you have in your patch via 
new SQL-callable functions. Something like:


pg_disable_synchronous_replication()
pg_enable_synchronous_replication()

I'm not sure where that state would be stored. Should it persist across 
restarts? And you probably should get some sort of warnings in the log 
when synchronous replication is disabled.


In summary, more work is required to design a good 
user/admin/programming interface. Let's hear a solid proposal for that, 
before writing patches.


BTW, calling an external command with system(), while holding 
SyncRepLock in exclusive-mode, seems like a bad idea. For starters, 
holding a lock will prevent a new WAL sender from starting up and 
becoming a synchronous standby, and the external command might take a 
long time to return.


- Heikki




Re: [HACKERS] extra_float_digits and casting from real to numeric

2014-01-08 Thread Christoph Berg
Re: Tom Lane 2014-01-07 14979.1389112...@sss.pgh.pa.us
  But if extra_float_digits > 0 is set, I'd expect not only the float4
  output to be affected by it, but also casts to other datatypes,
 
 This proposal scares me.  extra_float_digits is strictly a matter of
 I/O representation, it does not affect any internal calculations.
 Moreover, since one of the fundamental attributes of type numeric
 is that it's supposed to give platform-independent results, I don't
 like the idea that you're likely to get platform-dependent results
 of conversions from float4/float8.

I forgot to mention one bit here, and that's actually what made me
think wtf and post here. The server log is of course also affected
by this, so you even get different parameters depending on
extra_float_digits, yet the numeric result is the same bad one:

2014-01-08 10:13:53 CET LOG:  execute unnamed: INSERT INTO s VALUES($1)
2014-01-08 10:13:53 CET DETAIL:  parameters: $1 = '1.2'
2014-01-08 10:14:18 CET LOG:  execute unnamed: INSERT INTO s VALUES($1)
2014-01-08 10:14:18 CET DETAIL:  parameters: $1 = '1.1797'

Of course this is all consistent and in practice sums up to don't use
real/single...
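
A quick psql illustration of the asymmetry (the exact digits shown are
illustrative and platform-dependent):

SET extra_float_digits = 3;
SELECT '1.2'::real;            -- text output now shows e.g. 1.20000005
SELECT '1.2'::real::numeric;   -- the cast still yields 1.2, ignoring the setting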

 I think your customer got bit by his own bad coding practice, and
 that should be the end of it.

What about this patch to mention this gotcha more explicitly in the
documentation?

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 0386330..968f4a7
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*** NUMERIC
*** 689,694 
--- 689,697 
     <literal>0</literal>, the output is the same on every platform
supported by PostgreSQL.  Increasing it will produce output that
more accurately represents the stored value, but may be unportable.
+   Casts to other numeric datatypes and the <literal>to_char</literal>
+   function are not affected by this setting, it affects only the text
+   representation.
   /para
  /note
  

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Andres Freund
On 2014-01-08 11:07:48 +0200, Heikki Linnakangas wrote:
 I'm going to say right off the bat that I think the whole notion to
 automatically disable synchronous replication when the standby goes down is
 completely bonkers. If you don't need the strong guarantee that your
 transaction is safe in at least two servers before it's acknowledged to the
 client, there's no point enabling synchronous replication in the first
 place.

I think that's likely caused by the misconception that synchronous
replication is synchronous in apply, not just remote write/fsync. I have
now seen several sites that assumed that and set up sync rep on that
basis, expecting to be able to query standbys instead of the primary
after the commit finished.
If that assumption were true, supporting a timeout that way would
possibly be helpful, but it is not atm...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Bug in visibility map WAL-logging

2014-01-08 Thread Matheus de Oliveira
On Tue, Jan 7, 2014 at 10:42 PM, Matheus de Oliveira 
matioli.math...@gmail.com wrote:

 How did you set up the standby? Did you initialize it from an offline
 backup of the master's data directory, perhaps? The log shows that the
 startup took the "crash recovery first, then start archive recovery"
 path, because there was no backup label file. In that mode, the standby
 assumes that the system is consistent after replaying all the WAL in
 pg_xlog, which is correct if you initialize from an offline backup or
 atomic filesystem snapshot, for example. But "WAL contains references to
 invalid pages" could also be a symptom of an inconsistent base backup,
 caused by an incorrect backup procedure. In particular, I have to ask
 because I've seen it before: you didn't delete backup_label from the
 backup, did you?


 Well, I cannot answer this right now, but it all makes sense and is possible.


I've just confirmed. That was indeed the case: the script was removing the
backup_label. I've removed that line and synced it again, and it has been
running nicely (for the past hour at least).

Thank you guys for all your help, and sorry for all the confusion I caused.

Best regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [HACKERS] Bug in visibility map WAL-logging

2014-01-08 Thread Heikki Linnakangas

On 01/08/2014 02:32 PM, Matheus de Oliveira wrote:

On Tue, Jan 7, 2014 at 10:42 PM, Matheus de Oliveira 
matioli.math...@gmail.com wrote:


How did you set up the standby? Did you initialize it from an offline
backup of the master's data directory, perhaps? The log shows that the
startup took the "crash recovery first, then start archive recovery"
path, because there was no backup label file. In that mode, the standby
assumes that the system is consistent after replaying all the WAL in
pg_xlog, which is correct if you initialize from an offline backup or
atomic filesystem snapshot, for example. But "WAL contains references to
invalid pages" could also be a symptom of an inconsistent base backup,
caused by an incorrect backup procedure. In particular, I have to ask because
I've seen it before: you didn't delete backup_label from the backup, did
you?


Well, I cannot answer this right now, but it all makes sense and is possible.


I've just confirmed. That was indeed the case: the script was removing the
backup_label. I've removed that line and synced it again, and it has been
running nicely (for the past hour at least).


A-ha! ;-)


Thank you guys for all your help, and sorry for all the confusion I caused.


That seems to be a very common mistake to make. I wish we could do 
something about it. Do you think it would've helped in your case if 
there was a big fat warning at the beginning of backup_label, along the 
lines of "# DO NOT REMOVE THIS FILE FROM A BACKUP"? Any other ideas 
how we could've made it more obvious to the script author to not remove it?


- Heikki




Re: [HACKERS] Bug in visibility map WAL-logging

2014-01-08 Thread Andres Freund
On 2014-01-08 14:37:34 +0200, Heikki Linnakangas wrote:
 That seems to be a very common mistake to make. I wish we could do something
 about it. Do you think it would've helped in your case if there was a big
 fat warning at the beginning of backup_label, along the lines of "# DO NOT
 REMOVE THIS FILE FROM A BACKUP"? Any other ideas how we could've made it
 more obvious to the script author to not remove it?

I've been wondering about the possibility of setting a boolean in
checkpoint records indicating that a backup label needs to be used when
starting from that checkpoint. That boolean would only get checked when
using a recovery.conf and we've started with pg_control indicating that
it was written by a primary (i.e. state = DB_SHUTDOWNING).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Bug in visibility map WAL-logging

2014-01-08 Thread Andres Freund
On 2014-01-07 22:42:59 -0200, Matheus de Oliveira wrote:
 @andres, if it is really removing backup_label it could also cause that
 other issue we saw on Monday, right? (yes I did run the same script).

It might be in your case, since that's an easy way to generate that
situation, but there have been several other reports of that bug, so
it's good that we've discussed it ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Bug in visibility map WAL-logging

2014-01-08 Thread Heikki Linnakangas

On 01/08/2014 07:29 AM, Greg Stark wrote:

On Tue, Jan 7, 2014 at 11:36 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is
incorrect, but I don't immediately see how that could cause the PANIC. Why
is the page uninitialized in the standby? If VACUUM is removing some dead
tuples from it, it certainly should exist and be correctly initialized.


Unless the vacuum subsequently truncated the file to be shorter and
the backup was taken after that?


In that case WAL replay should also see the truncation record before 
reaching consistency. We only PANIC on an uninitialized/missing page 
after reaching consistency, before that it's indeed normal if the file 
was later truncated or deleted.


- Heikki




[HACKERS] Specifying both recovery_target_xid and recovery_target_time

2014-01-08 Thread Heikki Linnakangas

The docs say:


At most one of recovery_target_time, recovery_target_name or 
recovery_target_xid can be specified


However, the code actually allows them all to be specified at the same time:


else if (strcmp(item->name, "recovery_target_name") == 0)
{
    /*
     * if recovery_target_xid specified, then this overrides
     * recovery_target_name
     */
    if (recoveryTarget == RECOVERY_TARGET_XID)
        continue;
    recoveryTarget = RECOVERY_TARGET_NAME;


The precedence is XID, time, name.

I think the documented behavior would make more sense, ie. throw an 
error if you try to specify multiple targets. Anyone remember if that 
was intentional? Any objections to change the code to match the docs, in 
master?


- Heikki




Re: [HACKERS] Specifying both recovery_target_xid and recovery_target_time

2014-01-08 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 The docs say:
 At most one of recovery_target_time, recovery_target_name or 
 recovery_target_xid can be specified

 However, the code actually allows them all to be specified at the same time:

 else if (strcmp(item->name, "recovery_target_name") == 0)
 {
     /*
      * if recovery_target_xid specified, then this overrides
      * recovery_target_name
      */
     if (recoveryTarget == RECOVERY_TARGET_XID)
         continue;
     recoveryTarget = RECOVERY_TARGET_NAME;

 The precedence is XID, time, name.

 I think the documented behavior would make more sense, ie. throw an 
 error if you try to specify multiple targets. Anyone remember if that 
 was intentional? Any objections to change the code to match the docs, in 
 master?

Hm.  I can see potential uses for specifying more than one if the behavior
were OR, that is, stop as soon as any of the specified conditions is
satisfied.

It looks like the actual behavior is to randomly choose one specified
mode and ignore the others, which I concur is bogus.  But maybe we
should try to do something useful instead of just throwing an error.

regards, tom lane




Re: [HACKERS] Changeset Extraction Interfaces

2014-01-08 Thread Andres Freund
On 2014-01-07 17:54:21 +0100, Andres Freund wrote:
 On 2013-12-12 16:49:33 +0100, Andres Freund wrote:
  On 2013-12-12 10:01:21 -0500, Robert Haas wrote:
   On Thu, Dec 12, 2013 at 7:04 AM, Andres Freund and...@2ndquadrant.com 
   wrote:
As far as I have been thinking of, this would be another catalog table 
like
pg_decoding_plugin(oid, dpname name, dpload regproc).
   
   Instead of adding another catalog table, I think we should just define
   a new type.  Again, please look at the way that foreign data wrappers
   do this:
  
  I don't really see what the usage of a special type has to do with this,
  but I think that's besides your main point. What you're saying is that
  the output plugin is just defined by a function name, possibly schema
  prefixed. That has an elegance to it. +1
 
 Ok, so I've implemented this, but I am not so sure it's sufficient;
 there's an issue:
 Currently a logical replication slot has a plugin assigned, previously
 that has just been identified by the basename of a .so. But with the
 above proposal the identifier is pointing to a function, currently via
 its oid. But what happens if somebody drops or recreates the function?
 We can't make pg_depend entries or anything since that won't work on a
 standby.
 Earlier, if somebody removed the .so we'd just error out, but pg's
 dependency tracking always only mattered to things inside the catalogs.
 
 I see the following possible solutions for this:
 
 1) accept that fact, and throw an error if the function doesn't exist
 anymore, or has an unsuitable signature. We can check the return type of
 output_plugin_callbacks, so that's a pretty specific test.
 
 2) Create a pg_output_plugin catalog and prevent DROP OUTPUT PLUGIN (or
 similar) when there's a slot defined. But how'd that work if the slot is
 only defined on standbys? We could have the redo routine block and/or
 kill the slot if necessary?
 
 3) Don't assign a specific output plugin to a slot, but have it
 specified every time data is streamed, not just when a slot is
 created. Currently that wouldn't be a problem, but I am afraid it will
 constrict some future optimizations.
 
 Good ideas?

So, Robert and I had an IM discussion about this. Neither of us was
particularly happy about the proposed solutions.

So, what we've concluded is that using a function as the handler doesn't
work out well enough given the constraints (primarily the inability to
create dependency records on a HS node). The best way forward is a
variant of the current implementation where the output plugin is
specified as a dynamic library. Which is:
CREATE_REPLICATION_SLOT slot_name LOGICAL OUTPUT_PLUGIN library_name;
but in contrast to the current code, where each individual output plugin
callback is dlsym()ed via a fixed function name, only a
_PG_output_plugin_init() function is looked up & called, which fills out
a struct containing the individual callbacks. Additionally the init
and cleanup output plugin callbacks will be renamed to
startup/shutdown to avoid possible confusion.

This unfortunately still prohibits implementing output plugins within
the core postgres binary, but that can be solved by shipping
core-provided output plugins - should they ever exist - as shared
objects, like it's already done for libpqwalreceiver.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Simple improvements to freespace allocation

2014-01-08 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 01/08/2014 08:56 AM, Simon Riggs wrote:
 * IN-MEMORY
 A large table may only have some of its blocks in memory. It would be
 useful to force an NBT to be a block already in shared_buffers IFF a
 table is above a certain size (use same threshold as seq scans, i.e.
 25% of shared_buffers). That may be difficult to achieve in practice,
 so not sure about this one. Like it? Any ideas?

 Yeah, that seems nice, although I have feeling that it's not worth the 
 complexity.

Not only would that be rather expensive to do, but I think it would be
self-defeating.  Pages that are in memory would be particularly likely
to have been modified by someone else recently, so that the FSM's info
about their available space is stale, and thus once you actually got
to the page it'd be more likely to not have the space you need.

The existing FSM algorithm is intentionally designed to hand out pages
that nobody else has tried to insert into lately, with one goal being to
minimize the number of retries needed because of stale info.  (Or at
least, it worked that way originally, and I don't think Heikki's rewrite
changed that aspect.)  I'm concerned that the alternatives Simon proposes
would lead to more processes ganging up on the same pages, with not only a
direct cost in concurrency but an indirect cost in repeated FSM searches
due to believing stale available-space data.  Indeed, a naive
implementation could easily get into infinite loops of handing back the
same page.

I realize that the point is exactly to sacrifice some insertion
performance in hopes of getting better table packing, but it's not clear
to me that there's an easy fix that makes packing better without a very
large hit on the other side.

Anyway, these fears could be proven or disproven with some benchmarking of
a trial patch, and I have no objection if Simon wants to do that
experimentation.  But I'd be hesitant to see such a feature committed
in advance of experimental proof that it's actually useful.

regards, tom lane




Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission

2014-01-08 Thread taskov
Hello,
could you tell me where I can find the latest version of the MERGE PATCH
file? I need to use it on PostgreSQL 9.3.
I couldn't find it anywhere in git.

Regards,
Nikolay








Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Simon Riggs
On 8 January 2014 09:07, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 I'm going to say right off the bat that I think the whole notion to
 automatically disable synchronous replication when the standby goes down is
 completely bonkers.

Agreed

We had this discussion across 3 months and we don't want it again.
This should not have been added as a TODO item.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Support for pg_stat_archiver view

2014-01-08 Thread Simon Riggs
On 4 January 2014 13:01, Gabriele Bartolini
gabriele.bartol...@2ndquadrant.it wrote:

 I'd suggest making the view on top of an SRF like pg_stat_replication
 and pg_stat_activity (for example), instead of a whole lot of separate
 function calls like the older stats views.

 Ok, good idea.

Not sure I see why it needs to be an SRF. It only returns one row.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Specifying both recovery_target_xid and recovery_target_time

2014-01-08 Thread Simon Riggs
On 8 January 2014 15:38, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 The docs say:

 At most one of recovery_target_time, recovery_target_name or
 recovery_target_xid can be specified


 However, the code actually allows them all to be specified at the same time:

 else if (strcmp(item->name, "recovery_target_name") == 0)
 {
     /*
      * if recovery_target_xid specified, then this overrides
      * recovery_target_name
      */
     if (recoveryTarget == RECOVERY_TARGET_XID)
         continue;
     recoveryTarget = RECOVERY_TARGET_NAME;


 The precedence is XID, time, name.

 I think the documented behavior would make more sense, ie. throw an error if
 you try to specify multiple targets. Anyone remember if that was
 intentional? Any objections to change the code to match the docs, in master?

It seems like I was grasping at some meaning but didn't quite achieve it.

Changing it to mean OR would make sense, but that would be more work.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Support for pg_stat_archiver view

2014-01-08 Thread Magnus Hagander
On Wed, Jan 8, 2014 at 6:42 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 4 January 2014 13:01, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:

  I'd suggest making the view on top of an SRF like pg_stat_replication
  and pg_stat_activity (for example), instead of a whole lot of separate
  function calls like the older stats views.
 
  Ok, good idea.

 Not sure I see why it needs to be an SRF. It only returns one row.


Good point, it could/should be a general function returning a composite
type.
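
Something along these lines, perhaps (function and column names are
hypothetical, just to sketch the shape):

CREATE VIEW pg_stat_archiver AS
    SELECT s.archived_count,
           s.last_archived_wal,
           s.last_archived_time,
           s.failed_count,
           s.stats_reset
    FROM pg_stat_get_archiver() s;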

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] BDR-project

2014-01-08 Thread Simon Riggs
On 25 December 2013 12:01, Andreas Joseph Krogh andr...@officenet.no wrote:

 Ref:
 http://wiki.postgresql.org/wiki/BDR_Project

 Is implementing main BDR features into core Postgres a probable objective to
 version 9.4?

I've not updated hackers recently on this point, so thanks for asking
the question. This was discussed in the ending keynote of the PG
Europe conference, but I appreciate that's not the same thing as
saying it here.

The plan is

* submit the core logical replication technology for 9.4
* submit online upgrade as a feature for 9.5, allowing upgrades from 9.4+
* submit full BDR features for 9.6

BDR code will be released as a separate open source project until/if
core accepts/modifies that. There's lots of work and discussion to be
had yet, so the above plan is a reasonable schedule for achieving
change allowing input from all. Design to full feature submission
would be 4.5 years, plus we expect the features to mature/extend after
that, so there's no rush, just steady movement.

No attempts to publicise that as yet, but if all goes well we expect
to do that once 9.4 is released.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] WIP patch: improve documentation for user defined final functions of aggregates

2014-01-08 Thread Mark Dilger
The fact that a final function may be called
multiple times does not appear to be mentioned
in the docs, and a reasonable reading of the
docs gives the alternate impression.  For
people writing final functions in C, there
should be some warning about changing the
transition value, such as exists in the comments
in src/backend/executor/nodeWindowAgg.c
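
A self-contained way to see this happen (hypothetical aggregate, written in
SQL so the state cannot actually be damaged): when an aggregate is used as
a window function, the final function runs once per output row over the
same transition value.

CREATE FUNCTION sum_sq_trans(numeric[], numeric) RETURNS numeric[]
    LANGUAGE sql AS $$ SELECT ARRAY[$1[1] + $2] $$;
CREATE FUNCTION sum_sq_final(numeric[]) RETURNS numeric
    LANGUAGE sql AS $$ SELECT $1[1] * $1[1] $$;
CREATE AGGREGATE sum_sq(numeric) (
    sfunc     = sum_sq_trans,
    stype     = numeric[],
    finalfunc = sum_sq_final,
    initcond  = '{0}'
);

-- the final function is invoked for every row of the running window
SELECT x, sum_sq(x) OVER (ORDER BY x) FROM generate_series(1, 3) AS t(x);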


I'm not sure if this warning should be placed
in the docs of the CREATE AGGREGATE command, or
in section 35.10.  The documentation for the
CREATE AGGREGATE command talks more about what
final functions do, and hence is the more natural
placement in that sense, but for users who are
not doing C programming, changing the state of
the transition value is probably not a problem.

WIP WIP WIP WIP WIP


diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index d15fcba..d4603cc 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -111,6 +111,12 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
   </para>
 
   <para>
+   The final function may be invoked at other times during the processing
+   of all rows and must not damage the internal state value.  It is unwise
+   to assume that the final function will only be invoked once.
+  </para>
+
+  <para>
    An aggregate function can provide an initial condition,
    that is, an initial value for the internal state value.
    This is specified and stored in the database as a value of type




Re: [HACKERS] WIP patch: improve documentation for user defined final functions of aggregates

2014-01-08 Thread Tom Lane
Mark Dilger markdil...@yahoo.com writes:
 The fact that a final function may be called
 multiple times does not appear to be mentioned
 in the docs,

It's in xaggr.sgml, near the discussion of AggCheckCallContext
(the end of section 35.10, in current HEAD's section numbering).

In general, people writing C functions should not expect that they
can scribble on pass-by-ref arguments, so I don't see a need to
belabor the point in multiple places.

regards, tom lane




[HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Josh Berkus
Hackers,

Per the Developer Meeting, we are scheduled to do a final triage of 9.4
patches the week before CF4 starts, which is *now*.  The goal of this
triage is to divide patches already in queue into 5 groups:

1) Good To Go: patches which are 100% ready for final testing and commit.

2) Needs a Little Work: smaller patches which can be included in 9.4 if
they get a few hours of love from a committer or major hacker.

3) Big Patches: big, important patches which will need a major time
commitment to commit even though they are 90% ready, just due to size.

4) Not Nearly Ready: Patches which need major work and/or spec
discussions before commitment.

5) WIP: any patch which is acknowledged to be there just for review, or any
brand-new patch which wasn't in CF3 and is non-trivial.

Obviously, any patches in groups 4 and 5 aren't going into 9.4.  The
idea is to handle patches in CF4 in this order:

1. do immediately

2. do after (1) is complete

3. assign 1 senior hacker reviewer to each patch

4. review as time permits after 1-3

5. review as time permits after 1-3

Let the triage begin!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-08 Thread Robert Haas
On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 6, 2014 at 4:04 PM, james ja...@mansionfamily.plus.com wrote:
 The point remains that you need to duplicate it into every process that
 might want to use it subsequently, so it makes sense to DuplicateHandle
 into the parent, and then to advertise that handle value publicly so that
 other child processes can DuplicateHandle it back into their own process.

 Well, right now we just reopen the same object from all of the
 processes, which seems to work fine and doesn't require any of this
 complexity.  The only problem I don't know how to solve is how to make
 a segment stick around for the whole postmaster lifetime.  If
 duplicating the handle into the postmaster without its knowledge gets
 us there, it may be worth considering, but that doesn't seem like a
 good reason to rework the rest of the existing mechanism.

 I think one has to try this to see if it works as per the need. If it's not
 urgent, I can try this early next week?

Anything we want to get into 9.4 has to be submitted by next Tuesday,
but I don't know that we're going to get this into 9.4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] BDR-project

2014-01-08 Thread Andreas Joseph Krogh
On Wednesday 8 January 2014 at 18:57:52, Simon Riggs si...@2ndquadrant.com wrote:
 On 25 December 2013 12:01, Andreas Joseph Krogh andr...@officenet.no wrote:

  Ref:
  http://wiki.postgresql.org/wiki/BDR_Project
 
  Is implementing main BDR features into core Postgres a probable objective to
  version 9.4?

 I've not updated hackers recently on this point, so thanks for asking
 the question. This was discussed in the ending keynote of the PG
 Europe conference, but I appreciate that's not the same thing as
 saying it here.

 The plan is

 * submit the core logical replication technology for 9.4
 * submit online upgrade as a feature for 9.5, allowing upgrades from 9.4+
 * submit full BDR features for 9.6

 BDR code will be released as a separate open source project until/if
 core accepts/modifies that. There's lots of work and discussion to be
 had yet, so the above plan is a reasonable schedule for achieving
 change allowing input from all. Design to full feature submission
 would be 4.5 years, plus we expect the features to mature/extend after
 that, so there's no rush, just steady movement.

 No attempts to publicise that as yet, but if all goes well we expect
 to do that once 9.4 is released.

Thanks for the update!

--
Andreas Joseph Krogh andr...@officenet.no      mob: +47 909 56 963
Senior Software Developer / CTO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc

Re: [HACKERS] ALTER SYSTEM SET command to change postgresql.conf parameters

2014-01-08 Thread Robert Haas
On Mon, Jan 6, 2014 at 11:37 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 12:52 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 6, 2014 at 9:48 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 Couldn't we also handle this by postponing FreeConfigVariables until
 after the if (error) block?

Wouldn't doing it that way lead to a bigger memory leak if the error level
is ERROR? Though with the current fix it can also leak memory, it will be
just for ErrorConfFile_save. I think a similar case can happen for
'pre_value' in the current code as well; that's why I have fixed it in a
similar way in the patch.

 I was assuming that error-recovery would reset the containing memory
 context, but I'm not sure what memory context we're executing in at
 this point.

 This function is called from multiple places, and the memory context
 varies based on where it is called from. During startup, it gets called in
 the Postmaster context, and if some backend runs pg_reload_conf(), then
 it will get called from the other background processes (WAL Writer,
 Checkpointer, etc.) in their respective contexts (for WAL Writer, the
 context will be WAL Writer, ..).

 In the current code, the only time it can go to the error path with elevel
 ERROR is during Postmaster startup
 (context == PGC_POSTMASTER), at which point it will anyway upgrade
 ERROR to FATAL, so it should not be a problem to move the
 FreeConfigVariables() call after the error block check. However, if in
 future someone adds another ERROR (the chances of which seem quite
 small), it could cause a leak; maybe that's why the original
 code was written that way.

 If you think it's better to fix this by moving FreeConfigVariables() after
 the error block check, then I can update the patch to do so and incorporate
 the other change (directly use PG_AUTOCONF_FILENAME) you suggested
 as well?

Yeah, let's do it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread David Fetter
On Wed, Jan 08, 2014 at 10:45:37AM -0800, Josh Berkus wrote:
 Hackers,
 
 Per the Developer Meeting, we are scheduled to do a final triage of 9.4
 patches the week before CF4 starts, which is *now*.  The goal of this
 triage is to divide patches already in queue into 5 groups:

With utmost respect, there are unsent patches which don't fit into the
categories below.  Are you saying now that the cut-off is today, and
not actually 1/15, as everyone understands it to be?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Stephen Frost
* David Fetter (da...@fetter.org) wrote:
 On Wed, Jan 08, 2014 at 10:45:37AM -0800, Josh Berkus wrote:
  Per the Developer Meeting, we are scheduled to do a final triage of 9.4
  patches the week before CF4 starts, which is *now*.  The goal of this
  triage is to divide patches already in queue into 5 groups:
 
 With utmost respect, there are unsent patches which don't fit into the
 categories below.  Are you saying now that the cut-off is today, and
 not actually 1/15, as everyone understands it to be?

The categories are based on the degree of completion- how can a patch
not fit into one of those?

As for the cut-off, I would guess that the by-the-book approach would
be to allow until the 15th, but if it's a big patch and it hasn't been
seen in prior CF's then it shouldn't particularly matter.  That doesn't
preclude us from working on the triage in advance of the CF starting.

Thanks,

Stephen




Re: [HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Josh Berkus
On 01/08/2014 11:07 AM, David Fetter wrote:
 On Wed, Jan 08, 2014 at 10:45:37AM -0800, Josh Berkus wrote:
 Hackers,

 Per the Developer Meeting, we are scheduled to do a final triage of 9.4
 patches the week before CF4 starts, which is *now*.  The goal of this
 triage is to divide patches already in queue into 5 groups:
 
 With utmost respect, there are unsent patches which don't fit into the
 categories below.  Are you saying now that the cut-off is today, and
 not actually 1/15, as everyone understands it to be?

It has always been our policy that non-trivial patches which appear for
the first time in CF4 go to the bottom of the priority list.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-08 Thread knizhnik

On 01/08/2014 10:51 PM, Robert Haas wrote:

On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote:

On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:

On Mon, Jan 6, 2014 at 4:04 PM, james ja...@mansionfamily.plus.com wrote:

The point remains that you need to duplicate it into every process that
might
want to use it subsequently, so it makes sense to DuplicateHandle into the
parent, and then to advertise that  handle value publicly so that other
child
processes can DuplicateHandle it back into their own process.

Well, right now we just reopen the same object from all of the
processes, which seems to work fine and doesn't require any of this
complexity.  The only problem I don't know how to solve is how to make
a segment stick around for the whole postmaster lifetime.  If
duplicating the handle into the postmaster without its knowledge gets
us there, it may be worth considering, but that doesn't seem like a
good reason to rework the rest of the existing mechanism.

I think one has to try this to see if it works as per the need. If it's not
urgent, I can try this early next week?

Anything we want to get into 9.4 has to be submitted by next Tuesday,
but I don't know that we're going to get this into 9.4.


I wonder what the intended use case of dynamic shared memory is.
Is it primarily oriented towards PostgreSQL extensions, or will it also be
used in the PostgreSQL core?
In the case of extensions, shared memory may be needed to store some
collected/calculated information which will be used by extension functions.


The main advantage of DSM (from my point of view) compared with the existing
mechanism of preloaded extensions is that it is not necessary to restart the
server to add a new extension requiring shared memory.
A DSM segment can be attached or created by the _PG_init function of the
loaded module.
But there will not be much sense in this mechanism if the segment
is deleted when there are no more processes attached to it.
So to make DSM really useful for extensions, it needs some mechanism to
pin a segment in memory for the whole server/extension lifetime.


Maybe I am wrong, but I do not see any reason for an extension to create
multiple DSM segments.
And the total number of DSM segments is expected to be not very large (<10).
The same is true for the synchronization primitives (LWLocks, for example)
needed to synchronize access to these DSM segments. So I am not sure if
the possibility to place locks in DSM is really so critical...
We could just reserve some space for LWLocks which can be used by
extensions, so that LWLockAssign() can be used without
RequestAddinLWLocks, or RequestAddinLWLocks can be used not only from
preloaded extensions.


IMHO the main trouble with DSM is the lack of a guarantee that a segment is
always mapped to the same virtual address.
Without such a guarantee it is not possible to use direct (normal)
pointers inside DSM.

But there seems to be no reasonable solution.





Re: [HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Tom Lane
David Fetter da...@fetter.org writes:
 With utmost respect, there are unsent patches which don't fit into the
 categories below.  Are you saying now that the cut-off is today, and
 not actually 1/15, as everyone understands it to be?

I think Josh is merely suggesting that we could start triaging the
patches we have in hand.  If you're aware of new submissions pending
for any of them, of course, it wouldn't hurt to tell people about it.

regards, tom lane




Re: [HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Andres Freund
On 2014-01-08 14:50:52 -0500, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  With utmost respect, there are unsent patches which don't fit into the
  categories below.  Are you saying now that the cut-off is today, and
  not actually 1/15, as everyone understands it to be?
 
 I think Josh is merely suggesting that we could start triaging the
 patches we have in hand.  If you're aware of new submissions pending
 for any of them, of course, it wouldn't hurt to tell people about it.

What about doing the triage the first two days or so of the CF? Then we
know which patches have been submitted/updated and we haven't yet
performed too many reviews of patches which aren't going to make it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Josh Berkus

 What about doing the triage the first two days or so of the CF? Then we
 know which patches have been submitted/updated and we haven't yet
 performed too many reviews of patches which aren't going to make it.

If we triage the patches we already have NOW, then it'll be possible to
triage the new/updated stuff which comes in in the first 48 hours of the
CF.  If we wait until the CF begins, we'll spend at least the first week
of the CF triaging.

That's why we set this schedule at the developer meeting.

And besides, we already know what category *your* patch belongs in ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Bruce Momjian
On Wed, Jan  8, 2014 at 05:39:23PM +, Simon Riggs wrote:
 On 8 January 2014 09:07, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 
  I'm going to say right off the bat that I think the whole notion to
  automatically disable synchronous replication when the standby goes down is
  completely bonkers.
 
 Agreed
 
 We had this discussion across 3 months and we don't want it again.
 This should not have been added as a TODO item.

I am glad Heikki and Simon agree, but I don't.  ;-)

The way that I understand it is that you might want durability, but
might not want to sacrifice availability.  Phrased that way, it makes
sense, and notifying the administrator seems the appropriate action.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Hans-Jürgen Schönig

On Jan 8, 2014, at 9:27 PM, Bruce Momjian wrote:

 On Wed, Jan  8, 2014 at 05:39:23PM +, Simon Riggs wrote:
 On 8 January 2014 09:07, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 
 I'm going to say right off the bat that I think the whole notion to
 automatically disable synchronous replication when the standby goes down is
 completely bonkers.
 
 Agreed
 
 We had this discussion across 3 months and we don't want it again.
 This should not have been added as a TODO item.
 
 I am glad Heikki and Simon agree, but I don't.  ;-)
 
 The way that I understand it is that you might want durability, but
 might not want to sacrifice availability.  Phrased that way, it makes
 sense, and notifying the administrator seems the appropriate action.
 

Technically and conceptually I agree with Andres and Simon, but from daily 
experience I would say that we should make it configurable.
Some people have had nasty experiences when their systems stopped working.

+1 for a GUC to control this one.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-01-08 Thread Alvaro Herrera
Alvaro Herrera escribió:
 Robert Haas escribió:
 
  I think this direction has some potential.  I'm not sure it's right in
  detail.  The exact scheme you propose above won't work if you want to
  leave out the schema name altogether, and more generally it's not
  going to help very much with anything other than substituting in
  identifiers.  What if you want to add a column called satellite_id to
  every table that gets created, for example?  What if you want to make
  the tables UNLOGGED?  I don't see how that kind of things is going to
  work at all cleanly.
 
 Thanks for the discussion.  I am building some basic infrastructure to
 make this possible, and will explore ideas to cover these oversights
 (not posting anything concrete yet because I expect several iterations
 to crash and burn before I have something sensible to post).

Here's a working example.  Suppose the user runs

CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";

In an event trigger, the function pg_event_trigger_get_creation_commands()
returns the following JSON blob:

{"authorization":{"authorization_role":"some guy",
  "output":"AUTHORIZATION %i{authorization_role}"},
 "if_not_exists":"IF NOT EXISTS",
 "name":"some schema",
 "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}

wherein I have chosen to have a JSON element with the hardcoded name of
output which is what needs to be expanded; for each %{} parameter
found in it, there is an equally-named element in the JSON blob.  This
can be a string, a NULL, or another JSON object.

If it's a string, it expands to that value; if it's an object,
recursively an output element is expanded in the same way, and the
expanded string is used.

If there's a NULL element when expanding an object, the whole thing
expands to empty.  For example, if no AUTHORIZATION
clause is specified, authorization element is still there, but the
authorization_role element within it is NULL, and so the whole
AUTHORIZATION clause expands to empty and the resulting command contains
no authorization clause.  This is useful to support the case that
someone doesn't have an AUTHORIZATION clause in the CREATE SCHEMA
command, and the event trigger injects one simply by setting the
authorization_role to some role name.

IF NOT EXISTS is handled by defining it to either the string IF NOT
EXISTS or to empty if no such clause was specified.

The user can modify elements in the JSON to get a different version of
the command.  (I reckon the output can also be modified, but this is
probably a bad idea in most/all cases.  I don't think there's a need to
prohibit this explicitly.)  Also, someone might define if_not_exists
to something completely unrelated, but that would be their own fault.
(Maybe we can have some cross-check that the if_not_exists element in
JSON cannot be anything other than IF NOT EXISTS or the empty string;
and that the output element remains the same at expansion time as it
was at generation time.  Perhaps we should even hide the output
element from the user completely and only add them to the JSON at time
of expansion.  Not sure it's worth the trouble.)

There is another function,
pg_event_trigger_expand_creation_command(json), which will expand the
above JSON blob and return the following text:

CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"

Note the identifiers are properly quoted (there are quotes in the JSON
blob, but they correspond to JSON's own delimiters).  I have defined a
'i' modifier to have %i{} elements, which means that the element is an
identifier which might need quoting.

I have also defined a %d{} modifier that means to use the element to
expand a possibly-qualified dotted name.  (There would be no output
element in this case.)  This is to support the case where you have 

CREATE TABLE public.foo
which results in
{"table_name":{"schema":"public",
   "relname":"foo"}}

and you want to edit the table_name element in the root JSON and set
the schema to something else (perhaps NULL), so in the event trigger
after expansion you can end up with CREATE TABLE foo or CREATE TABLE
private.foo or whatever.

Most likely there are some more rules that will need to be created, but
so far this looks sensible.

I'm going to play some more with the %d{} stuff, and also with the idea
of representing table elements such as columns and constraints as an
array.  In the meantime please let me know whether this makes sense.
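
To make this a bit more concrete, here is a minimal sketch of an event
trigger using the two functions proposed above (they do not exist in any
released server; function and trigger names are purely illustrative):

CREATE FUNCTION show_creation_commands() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    cmd json;
BEGIN
    FOR cmd IN SELECT pg_event_trigger_get_creation_commands()
    LOOP
        -- A real trigger could tweak elements of cmd here (for example the
        -- "schema" part of "table_name") before expanding it back into text.
        RAISE NOTICE 'replayable command: %',
            pg_event_trigger_expand_creation_command(cmd);
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER show_creation_commands ON ddl_command_end
    EXECUTE PROCEDURE show_creation_commands();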

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Heikki Linnakangas

On 01/08/2014 10:27 PM, Bruce Momjian wrote:

On Wed, Jan  8, 2014 at 05:39:23PM +, Simon Riggs wrote:

On 8 January 2014 09:07, Heikki Linnakangas hlinnakan...@vmware.com wrote:


I'm going to say right off the bat that I think the whole notion to
automatically disable synchronous replication when the standby goes down is
completely bonkers.


Agreed

We had this discussion across 3 months and we don't want it again.
This should not have been added as a TODO item.


I am glad Heikki and Simon agree, but I don't.  ;-)

The way that I understand it is that you might want durability, but
might not want to sacrifice availability.  Phrased that way, it makes
sense, and notifying the administrator seems the appropriate action.


They want to have the cake and eat it too. But they're not actually 
getting that. What they actually get is extra latency when things work, 
with no gain in durability.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reproduce serialization failure for a read only transaction.

2014-01-08 Thread Florian Pflug
On Jan7, 2014, at 20:11 , Kevin Grittner kgri...@ymail.com wrote:
 Yeah, neither of the provided examples rolled back the read only
 transaction itself;

Actually, the fixed version [1] of my example does.

[1] 
http://www.postgresql.org/message-id/8721aad3-7a3a-4576-b10e-f2cbd1e53...@phlo.org

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Bruce Momjian
On Wed, Jan  8, 2014 at 10:46:51PM +0200, Heikki Linnakangas wrote:
 On 01/08/2014 10:27 PM, Bruce Momjian wrote:
 On Wed, Jan  8, 2014 at 05:39:23PM +, Simon Riggs wrote:
 On 8 January 2014 09:07, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 
 I'm going to say right off the bat that I think the whole notion to
 automatically disable synchronous replication when the standby goes down is
 completely bonkers.
 
 Agreed
 
 We had this discussion across 3 months and we don't want it again.
 This should not have been added as a TODO item.
 
 I am glad Heikki and Simon agree, but I don't.  ;-)
 
 The way that I understand it is that you might want durability, but
 might not want to sacrifice availability.  Phrased that way, it makes
 sense, and notifying the administrator seems the appropriate action.
 
 They want to have the cake and eat it too. But they're not actually
 getting that. What they actually get is extra latency when things
 work, with no gain in durability.

They are getting guaranteed durability until they get a notification ---
that seems valuable.  When they get the notification, they can
reevaluate if they want that tradeoff.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reproduce serialization failure for a read only transaction.

2014-01-08 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 On Jan7, 2014, at 20:11 , Kevin Grittner kgri...@ymail.com wrote:

 Yeah, neither of the provided examples rolled back the read only
 transaction itself;

 Actually, the fixed version [1] of my example does.

 [1] 
 http://www.postgresql.org/message-id/8721aad3-7a3a-4576-b10e-f2cbd1e53...@phlo.org

Due to my lame email provider, that post didn't show for me until I
had already replied.  :-(  You had already showed an example almost
exactly like what I described in my post.  I tweaked it a bit more
for the Wiki page to show more clearly why SSI has to care about
what the writing transaction reads.  For all the database engine
knows, what was read contributed to whether the application allowed
it to successfully commit.  By using the value from the SELECT in
the UPDATE it is easier to see why it matters, although it needs to
be considered either way.

In other words, we seem to be in full agreement, just using
different language to describe it.  :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 Heikki Linnakangas wrote:

 They want to have the cake and eat it too. But they're not
 actually getting that. What they actually get is extra latency
 when things work, with no gain in durability.

 They are getting guaranteed durability until they get a
 notification --- that seems valuable.  When they get the
 notification, they can reevaluate if they want that tradeoff.

My first reaction to this has been that if you want synchronous
replication without having the system wait if the synchronous
target goes down, you should configure an alternate target.  With
the requested change we can no longer state that when a COMMIT
returns with an indication of success that the data has been
persisted to multiple clusters.  We would be moving to a situation
where the difference between synchronous and asynchronous replication is subtle -- either way
the data may or may not be on a second cluster by the time the
committer is notified of success.  We wait up to some threshold
time to try to make the success indication indicate that, but then
return success even if the guarantee has not been provided, without
any way for the committer to know the difference.

On the other hand, we keep getting people saying they want the
database to make the promise of synchronous replication, and tell
applications that it has been successful even when it hasn't been,
as long as there's a line in the server log to record the lie.  Or,
more likely, to record the boundaries of time blocks where it has
been a lie.  This appears to be requested because other products
behave that way.

I'm torn on whether we should cave to popular demand on this; but
if we do, we sure need to be very clear in the documentation about
what a successful return from a commit request means.  Sooner or
later, Murphy's Law being what it is, if we do this someone will
lose the primary and blame us because the synchronous replica is
missing gobs of transactions that were successfully committed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 I'm torn on whether we should cave to popular demand on this; but
 if we do, we sure need to be very clear in the documentation about
 what a successful return from a commit request means.  Sooner or
 later, Murphy's Law being what it is, if we do this someone will
 lose the primary and blame us because the synchronous replica is
 missing gobs of transactions that were successfully committed.

I'm for not caving.  I think people who are asking for this don't
actually understand what they'd be getting.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Andres Freund
On 2014-01-08 13:34:08 -0800, Kevin Grittner wrote:
 On the other hand, we keep getting people saying they want the
 database to make the promise of synchronous replication, and tell
 applications that it has been successful even when it hasn't been,
 as long as there's a line in the server log to record the lie.

Most people having such a position I've talked to have held that
position because they thought synchronous replication would mean that
apply (and thus visibility) would also be synchronous. Is that
different from your experience?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 12:27 PM, Bruce Momjian wrote:
 I am glad Heikki and Simon agree, but I don't.  ;-)
 
 The way that I understand it is that you might want durability, but
 might not want to sacrifice availability.  Phrased that way, it makes
 sense, and notifying the administrator seems the appropriate action.

I think there's a valid argument to want things the other way, but I
find the argument not persuasive.  In general, people who want
auto-degrade for sync rep either:

a) don't understand what sync rep actually does (lots of folks confuse
synchronous with simultaneous), or

b) want more infrastructure than we actually have around managing sync
replicas

Now, the folks who want (b) have a legitimate need, and I'll point out
that we always planned to have more features around sync rep, it's just
that we never actually worked on any.  For example, quorum sync was
extensively discussed and originally projected for 9.2, only certain
hackers changed jobs and interests.

If we just did the minimal change, that is, added an auto-degrade GUC
and an alert to the logs each time the master server went into degraded
mode, as Heikki says we'd be loading a big foot-gun for a bunch of
ill-informed DBAs.  People who want that are really much better off with
async rep in the first place.

If we really want auto-degrading sync rep, then we'd (at a minimum) need
a way to determine *from the replica* whether or not it was in degraded
mode when the master died.  What good do messages to the master log do
you if the master no longer exists?

Mind you, being able to determine on the replica whether it was
synchronous or not when it lost communication with the master would be a
great feature to have for sync rep groups as well, and would make them
practical (right now, they're pretty useless).  However, I seriously
doubt that someone is going to code that up in the next 5 days.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Joshua D. Drake


On 01/08/2014 01:34 PM, Kevin Grittner wrote:


I'm torn on whether we should cave to popular demand on this; but
if we do, we sure need to be very clear in the documentation about
what a successful return from a commit request means.  Sooner or
later, Murphy's Law being what it is, if we do this someone will
lose the primary and blame us because the synchronous replica is
missing gobs of transactions that were successfully committed.


I am trying to follow this thread and perhaps I am just being dense but 
it seems to me that:


If you are running synchronous replication, as long as the target 
(subscriber) is up, synchronous replication operates as it should. That 
is, the origin will wait for a notification from the subscriber that 
the write has been successful before continuing.


However, if the subscriber is down, the origin should NEVER wait. That 
is just silly behavior and makes synchronous replication pretty much 
useless. Machines go down, that is the nature of things. Yes, we should 
log and log loudly if the subscriber is down:


ERROR: target xyz is non-communicative: switching to async replication.

We then should store the wal logs up to wal_keep_segments.

When the subscriber comes back up, it will then replicate in async mode 
until the two are back in sync and then switch (perhaps by hand) to sync 
mode. This of course assumes that we have a valid database on the 
subscriber and we have not overrun wal_keep_segments.


Sincerely,

Joshua D. Drake



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Heikki Linnakangas

On 01/08/2014 11:37 PM, Andres Freund wrote:

On 2014-01-08 13:34:08 -0800, Kevin Grittner wrote:

On the other hand, we keep getting people saying they want the
database to make the promise of synchronous replication, and tell
applications that it has been successful even when it hasn't been,
as long as there's a line in the server log to record the lie.


Most people having such a position I've talked to have held that
position because they thought synchronous replication would mean that
apply (and thus visibility) would also be synchronous.


And I totally agree that it would be a useful mode if apply was 
synchronous. You could then build a master-standby pair where it's 
guaranteed that when you commit a transaction in the master, it's 
thereafter always seen as committed in the standby too. In that usage, 
if the link between the two is broken, you could set up timeouts e.g so 
that the standby stops accepting new queries after 20 seconds, and then 
the master proceeds without the standby after 25 seconds. Then the 
guarantee would hold.


I don't know if the people asking for the fallback mode are thinking 
that synchronous replication means synchronous apply, or if they're 
trying to have the cake and eat it too wrt. durability and availability.


Synchronous apply would be cool..

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 If we really want auto-degrading sync rep, then we'd (at a minimum) need
 a way to determine *from the replica* whether or not it was in degraded
 mode when the master died.  What good do messages to the master log do
 you if the master no longer exists?

How would it be possible for a replica to know whether the master had
committed more transactions while communication was lost, if the master
dies without ever restoring communication?  It sounds like pie in the
sky from here ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 However, if the subscriber is down, the origin should NEVER wait. That 
 is just silly behavior and makes synchronous replication pretty much 
 useless. Machines go down, that is the nature of things. Yes, we should 
 log and log loudly if the subscriber is down:

 ERROR: target xyz is non-communicative: switching to async replication.

 We then should store the wal logs up to wal_keep_segments.

 When the subscriber comes back up, it will then replicate in async mode 
 until the two are back in sync and then switch (perhaps by hand) to sync 
 mode. This of course assumes that we have a valid database on the 
 subscriber and we have not overrun wal_keep_segments.

It sounds to me like you are describing the existing behavior of async
mode, with the possible exception of exactly what shows up in the
postmaster log.

Sync mode is about providing a guarantee that the data exists on more than
one server *before* we tell the client it's committed.  If you don't need
that guarantee, you shouldn't be using sync mode.  If you do need it,
it's not clear to me why you'd suddenly not need it the moment the going
actually gets tough.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 On 2014-01-08 13:34:08 -0800, Kevin Grittner wrote:

 On the other hand, we keep getting people saying they want the
 database to make the promise of synchronous replication, and
 tell applications that it has been successful even when it
 hasn't been, as long as there's a line in the server log to
 record the lie.

 Most people having such a position I've talked to have held that
 position because they thought synchronous replication would mean
 that apply (and thus visibility) would also be synchronous. Is
 that different from your experience?

I haven't pursued it that far because we don't have
maybe-synchronous mode yet and seem unlikely to ever support it.
I'm not sure why that use-case is any better than any other.  You
still would never really know whether the data read is current.  If
we were to implement this, the supposedly synchronous replica could
be out-of-date by any arbitrary amount of time (from milliseconds
to months).  (Consider what could happen if the replication
connection authorizations got messed up while application
connections to the replica were fine.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-08 Thread Alexander Korotkov
On Mon, Jan 6, 2014 at 12:35 PM, Amit Langote amitlangot...@gmail.com wrote:

 On Sat, Dec 21, 2013 at 4:36 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 
  Yet another version. The encoding/decoding code is now quite isolated in
  ginpostinglist.c, so it's easy to experiment with different encodings.
 This
  patch uses varbyte encoding again.
 
  I got a bit carried away, experimented with a bunch of different
 encodings.
  I tried rice encoding, rice encoding with block and offset number delta
  stored separately, the simple9 variant, and varbyte encoding.
 
  The compressed size obviously depends a lot on the distribution of the
  items, but in the test set I used, the differences between different
  encodings were quite small.
 
  One fatal problem with many encodings is VACUUM. If a page is completely
  full and you remove one item, the result must still fit. In other words,
  removing an item must never enlarge the space needed. Otherwise we have
 to
  be able to split on vacuum, which adds a lot of code, and also makes it
  possible for VACUUM to fail if there is no disk space left. That's
  unpleasant if you're trying to run VACUUM to release disk space. (gin
 fast
  updates already has that problem BTW, but let's not make it worse)
 
  I believe that eliminates all encodings in the Simple family, as well as
  PForDelta, and surprisingly also Rice encoding. For example, if you have
  three items in consecutive offsets, the differences between them are
 encoded
  as 11 in rice encoding. If you remove the middle item, the encoding for
 the
  next item becomes 010, which takes more space than the original.
 
  AFAICS varbyte encoding is safe from that. (a formal proof would be nice
  though).
 
  So, I'm happy to go with varbyte encoding now, indeed I don't think we
 have
  much choice unless someone can come up with an alternative that's
  VACUUM-safe. I have to put this patch aside for a while now, I spent a
 lot
  more time on these encoding experiments than I intended. If you could
 take a
  look at this latest version, spend some time reviewing it and cleaning up
  any obsolete comments, and re-run the performance tests you did earlier,
  that would be great. One thing I'm slightly worried about is the
 overhead of
  merging the compressed and uncompressed posting lists in a scan. This
 patch
  will be in good shape for the final commitfest, or even before that.
 


 I just tried out the patch gin-packed-postinglists-varbyte2.patch
 (which looks like the latest one in this thread) as follows:

 1) Applied patch to the HEAD (on commit
 94b899b829657332bda856ac3f06153d09077bd1)
 2) Created a test table and index

 create table test (a text);
 copy test from '/usr/share/dict/words';
 create index test_trgm_idx on test using gin (a gin_trgm_ops);

 3) Got the following error on a wildcard query:

 postgres=# explain (buffers, analyze) select count(*) from test where
 a like '%tio%';
 ERROR:  lock 9447 is not held
 STATEMENT:  explain (buffers, analyze) select count(*) from test where
 a like '%tio%';
 ERROR:  lock 9447 is not held


Thanks for reporting. Fixed version is attached.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-varbyte3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] commit fest manager?

2014-01-08 Thread Peter Eisentraut
Anyone else?

Or you'll have to deal with me again?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-01-08 Thread Pavel Stehule
Hello

I don't like this direction. What can we do with JSON from plpgsql? Moreover,
JSON is not a very robust format against future changes.

Regards

Pavel
Dne 8.1.2014 21:43 Alvaro Herrera alvhe...@2ndquadrant.com napsal(a):

 Alvaro Herrera escribió:
  Robert Haas escribió:
 
   I think this direction has some potential.  I'm not sure it's right in
   detail.  The exact scheme you propose above won't work if you want to
   leave out the schema name altogether, and more generally it's not
   going to help very much with anything other than substituting in
   identifiers.  What if you want to add a column called satellite_id to
   every table that gets created, for example?  What if you want to make
   the tables UNLOGGED?  I don't see how that kind of things is going to
   work at all cleanly.
 
  Thanks for the discussion.  I am building some basic infrastructure to
  make this possible, and will explore ideas to cover these oversights
  (not posting anything concrete yet because I expect several iterations
  to crash and burn before I have something sensible to post).

 Here's a working example.  Suppose the user runs

 CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";

 In an event trigger, the function pg_event_trigger_get_creation_commands()
 returns the following JSON blob:

 {"authorization":{"authorization_role":"some guy",
   "output":"AUTHORIZATION %i{authorization_role}"},
  "if_not_exists":"IF NOT EXISTS",
  "name":"some schema",
  "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}

 wherein I have chosen to have a JSON element with the hardcoded name of
 output which is what needs to be expanded; for each %{} parameter
 found in it, there is an equally-named element in the JSON blob.  This
 can be a string, a NULL, or another JSON object.

 If it's a string, it expands to that value; if it's an object,
 recursively an output element is expanded in the same way, and the
 expanded string is used.

 If there's a NULL element when expanding an object, the whole thing
 expands to empty.  For example, if no AUTHORIZATION
 clause is specified, authorization element is still there, but the
 authorization_role element within it is NULL, and so the whole
 AUTHORIZATION clause expands to empty and the resulting command contains
 no authorization clause.  This is useful to support the case that
 someone doesn't have an AUTHORIZATION clause in the CREATE SCHEMA
 command, and the event trigger injects one simply by setting the
 authorization_role to some role name.

 IF NOT EXISTS is handled by defining it to either the string IF NOT
 EXISTS or to empty if no such clause was specified.

 The user can modify elements in the JSON to get a different version of
 the command.  (I reckon the output can also be modified, but this is
 probably a bad idea in most/all cases.  I don't think there's a need to
 prohibit this explicitly.)  Also, someone might define if_not_exists
 to something completely unrelated, but that would be their own fault.
 (Maybe we can have some cross-check that the if_not_exists element in
 JSON cannot be anything other than IF NOT EXISTS or the empty string;
 and that the output element remains the same at expansion time as it
 was at generation time.  Perhaps we should even hide the output
 element from the user completely and only add them to the JSON at time
 of expansion.  Not sure it's worth the trouble.)

 There is another function,
 pg_event_trigger_expand_creation_command(json), which will expand the
 above JSON blob and return the following text:

 CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"

 Note the identifiers are properly quoted (there are quotes in the JSON
 blob, but they correspond to JSON's own delimiters).  I have defined a
 'i' modifier to have %i{} elements, which means that the element is an
 identifier which might need quoting.

 I have also defined a %d{} modifier that means to use the element to
 expand a possibly-qualified dotted name.  (There would be no output
 element in this case.)  This is to support the case where you have

 CREATE TABLE public.foo
 which results in
 {"table_name":{"schema":"public",
    "relname":"foo"}}

 and you want to edit the table_name element in the root JSON and set
 the schema to something else (perhaps NULL), so in the event trigger
 after expansion you can end up with CREATE TABLE foo or CREATE TABLE
 private.foo or whatever.

 Most likely there are some more rules that will need to be created, but
 so far this looks sensible.

 I'm going to play some more with the %d{} stuff, and also with the idea
 of representing table elements such as columns and constraints as an
 array.  In the meantime please let me know whether this makes sense.

 --
 Álvaro Herrera  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 

Re: [HACKERS] Add CREATE support to event triggers

2014-01-08 Thread Alvaro Herrera
Pavel Stehule escribió:
 Hello
 
 I don't like this direction. What can we do with JSON from plpgsql?

We have plenty of JSON functions and operators in SQL, and more to come
soon.  Is that not enough?

 Moreover, JSON is not a very robust format against future changes.

Not sure what you mean.  This JSON is generated and consumed by our own
code, so we only need to concern ourselves with making sure that we can
consume (in the expansion function) what we generated in the previous
phase.  There is no transmission to the exterior.

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Joshua D. Drake


On 01/08/2014 01:55 PM, Tom Lane wrote:


Sync mode is about providing a guarantee that the data exists on more than
one server *before* we tell the client it's committed.  If you don't need
that guarantee, you shouldn't be using sync mode.  If you do need it,
it's not clear to me why you'd suddenly not need it the moment the going
actually gets tough.


As I understand it what is being suggested is that if a subscriber or 
target goes down, then the master will just sit there and wait. When I 
read that, I read that the master will no longer process write 
transactions. If I am wrong in that understanding then cool. If I am not 
then that is a serious problem with a production scenario. There is an 
expectation that a master will continue to function if the target is 
down, synchronous or not.


Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Andres Freund
On 2014-01-08 14:23:34 -0800, Joshua D. Drake wrote:
 
 On 01/08/2014 01:55 PM, Tom Lane wrote:
 
 Sync mode is about providing a guarantee that the data exists on more than
 one server *before* we tell the client it's committed.  If you don't need
 that guarantee, you shouldn't be using sync mode.  If you do need it,
 it's not clear to me why you'd suddenly not need it the moment the going
 actually gets tough.
 
 As I understand it what is being suggested is that if a subscriber or target
 goes down, then the master will just sit there and wait. When I read that, I
 read that the master will no longer process write transactions. If I am
 wrong in that understanding then cool. If I am not then that is a serious
 problem with a production scenario. There is an expectation that a master
 will continue to function if the target is down, synchronous or not.

I don't think you've understood synchronous replication. There wouldn't
be *any* benefit to using it if it worked the way you wish since there
wouldn't be any additional guarantees. A single reconnect of the
streaming rep connection, without any permanent outage, would
potentially lead to data loss if the primary crashed in the wrong
moment.
So you'd buy no guarantees with a noticeable loss in performance.

Just use async mode if you want things to work like that.
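
For what it's worth, an application can already opt into exactly that
behavior per session or per transaction through the existing
synchronous_commit setting, without touching the cluster-wide
configuration; a quick illustration:

-- Wait only for the local WAL flush, not for the synchronous standby,
-- for everything this session commits:
SET synchronous_commit = local;

-- Or for a single transaction only:
BEGIN;
SET LOCAL synchronous_commit = local;
-- ... work that does not need the standby guarantee ...
COMMIT;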

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] commit fest manager?

2014-01-08 Thread Josh Berkus
On 01/08/2014 02:04 PM, Peter Eisentraut wrote:
 Anyone else?
 
 Or you'll have to deal with me again?
 
 

I vote for Peter.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Joshua D. Drake


On 01/08/2014 02:34 PM, Andres Freund wrote:


I don't think you've understood synchronous replication. There wouldn't
be *any* benefit to using it if it worked the way you wish since there
wouldn't be any additional guarantees. A single reconnect of the
streaming rep connection, without any permanent outage, would
potentially lead to data loss if the primary crashed in the wrong
moment.
So you'd buy no guarantees with a noticeable loss in performance.

Just use async mode if you want things work like that.


Well no. That isn't what I am saying. Consider the following scenario:

db0->db1 in synchronous mode

The idea is that we know that data on db0 is not written until we know 
for a fact that db1 also has that data. That is great and a guarantee of 
data integrity between the two nodes.


If we have the following:

db0->db1:down

Using the model (as I understand it) that is being discussed we have 
increased our failure rate because the moment db1:down we also lose db0. 
The node db0 may be up but if it isn't going to process transactions it 
is useless. I can tell you that I have exactly 0 customers that would 
want that model because a single node failure would cause a double node 
failure.


All the other stuff with wal_keep_segments is just idea throwing. I 
don't care about that at this point. What I care about specifically is 
that a single node failure regardless of replication mode should not be 
able to (automatically) stop the operation of the master node.


Sincerely,

JD




--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] commit fest manager?

2014-01-08 Thread Joshua D. Drake


On 01/08/2014 02:38 PM, Josh Berkus wrote:


On 01/08/2014 02:04 PM, Peter Eisentraut wrote:

Anyone else?

Or you'll have to deal with me again?



I vote for Peter.



+1

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Andres Freund
On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:
 
 On 01/08/2014 02:34 PM, Andres Freund wrote:
 
 I don't think you've understood synchronous replication. There wouldn't
 be *any* benefit to using it if it worked the way you wish since there
 wouldn't be any additional guarantees. A single reconnect of the
 streaming rep connection, without any permanent outage, would
 potentially lead to data loss if the primary crashed in the wrong
 moment.
 So you'd buy no guarantees with a noticeable loss in performance.
 
 Just use async mode if you want things work like that.
 
 Well no. That isn't what I am saying. Consider the following scenario:
 
 db0->db1 in synchronous mode
 
 The idea is that we know that data on db0 is not written until we know for a
 fact that db1 also has that data. That is great and a guarantee of data
 integrity between the two nodes.

That guarantee is never there. The only thing guaranteed is that the
client isn't notified of the commit until db1 has received the data.

 If we have the following:
 
 db0->db1:down
 
 Using the model (as I understand it) that is being discussed we have
 increased our failure rate because the moment db1:down we also lose db0. The
 node db0 may be up but if it isn't going to process transactions it is
 useless. I can tell you that I have exactly 0 customers that would want that
 model because a single node failure would cause a double node failure.

That's why you should configure a second standby as another (candidate)
synchronous replica, also listed in synchronous_standby_names.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-01-08 Thread Alvaro Herrera
CC to hackers restored.

Pavel Stehule escribió:
 Dne 8.1.2014 23:17 Alvaro Herrera alvhe...@2ndquadrant.com napsal(a):
 
  Pavel Stehule escribió:
   Hello
  
   I don't like this direction. What can we do with JSON from plpgsql?
 
  We have plenty of JSON functions and operators in SQL, and more to come
  soon.  Is that not enough?
 
 No, it is not. I'm sure. It is wrong to require parsing system-internal data
 that is available in structured form. You create a string that then has to be
 parsed again.
 
 A few functions with OUT parameters are better than any semi-structured string.

That was shot down, for good reasons: we assume that the users of this
are going to want to modify the command before considering it final.
Maybe they want to add a column to each table being created, or they
want to change the tablespace if the table name ends with _big, or
they want to change the schema in which it is created.

This JSON representation lets you receive the table creation data in a
well-known JSON schema; you can tweak individual elements without having
to parse the SQL command.  And when you're done tweaking, there's a
function that lets you produce the SQL command that corresponds to the
original with the tweaks you just did.

(Please note that, thus far, this facility DOES NOT let you change the
table that was created, at least not directly: these event triggers are
run AFTER the creation command has completed.  You can tweak the command
that would be sent to a remote server in a replication swarm, for
example.  Or create a mirror table for audit purposes.  Or perhaps even
generate an ALTER TABLE command for the new table.)

If by few functions with OUT parameters you mean that we need to have
one record type that is able to receive all possible CREATE TABLE
options, so that you can change them as a record in plpgsql, this
doesn't sound too good to me, for three reasons: a) it's going to
require too many types and functions (one per statement type); b)
cramming the stuff in pg_type.h / pg_proc.h is going to be a horrid
task; c) any change is going to require an initdb.

-- 
Alvaro Herrera


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 On 01/08/2014 01:55 PM, Tom Lane wrote:
 Sync mode is about providing a guarantee that the data exists on more than
 one server *before* we tell the client it's committed.  If you don't need
 that guarantee, you shouldn't be using sync mode.  If you do need it,
 it's not clear to me why you'd suddenly not need it the moment the going
 actually gets tough.

 As I understand it what is being suggested is that if a subscriber or 
 target goes down, then the master will just sit there and wait. When I 
 read that, I read that the master will no longer process write 
 transactions. If I am wrong in that understanding then cool. If I am not 
 then that is a serious problem with a production scenario. There is an 
 expectation that a master will continue to function if the target is 
 down, synchronous or not.

Then you don't understand the point of sync mode, and you shouldn't be
using it.  The point is *exactly* to refuse to commit transactions unless
we can guarantee the data's been replicated.

There might be other interpretations of synchronous replication in which
it makes sense to continue accepting transactions whether or not there are
any up-to-date replicas; but in the meaning Postgres ascribes to the term,
it does not make sense.  You should just use async mode if that behavior
is what you want.

Possibly we need to rename synchronous replication, or document it
better.  And I don't have any objection in principle to developing
additional replication modes that offer different sets of guarantees and
performance tradeoffs.  But for the synchronous mode that we've got, the
proposed switch is insane, and asking for it merely proves that you don't
understand the difference between async and sync modes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Joshua D. Drake


On 01/08/2014 02:46 PM, Andres Freund wrote:


db0->db1 in synchronous mode

The idea is that we know that data on db0 is not written until we know for a
fact that db1 also has that data. That is great and a guarantee of data
integrity between the two nodes.


That guarantee is never there. The only thing guaranteed is that the
client isn't notified of the commit until db1 has received the data.


Well ugh on that.. but that is for another reply.



That's why you should configure a second standby as another (candidate)
synchronous replica, also listed in synchronous_standby_names.


I don't have a response to this that does not involve a great deal of 
sarcasm.


Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] nested hstore patch (sgml typo)

2014-01-08 Thread Erik Rijkers
On Wed, January 8, 2014 22:29, Oleg Bartunov wrote:
 Attached is a new version of patch, which addresses most issues raised
 by Andres.

 [ nested_hstore-0.42.patch.gz  ]

Building documentation fails:

openjade:hstore.sgml:1010:18:E: end tag for element A which is not open
openjade:hstore.sgml:1011:13:E: document type does not allow element TYPE here
openjade:hstore.sgml:1012:8:E: document type does not allow element TYPE here
openjade:hstore.sgml:1012:27:E: document type does not allow element TYPE here
openjade:hstore.sgml:1013:15:E: document type does not allow element 
PROGRAMLISTING here
openjade:hstore.sgml:1024:8:E: end tag for TYPE omitted, but OMITTAG NO was 
specified
openjade:hstore.sgml:1010:3: start tag was here
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'

This is caused by a small tag typo.

The attached fixes that hstore.sgml typo.

thanks,

Erikjan


--- doc/src/sgml/hstore.sgml.orig	2014-01-08 23:32:29.493548857 +0100
+++ doc/src/sgml/hstore.sgml	2014-01-08 23:33:02.554527949 +0100
@@ -1007,7 +1007,7 @@
 
   <para>
    But <literal>populate_record()</> supports more complicated records and nested
-   <type>hstore</a> values, as well. It makes an effort to convert
+   <type>hstore</> values, as well. It makes an effort to convert
    from <type>hstore</> data types to PostgreSQL types, including arrays,
    <type>json</>, and <type>hstore</> values:
 <programlisting>
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:
 Using the model (as I understand it) that is being discussed we have
 increased our failure rate because the moment db1:down we also lose db0. The
 node db0 may be up but if it isn't going to process transactions it is
 useless. I can tell you that I have exactly 0 customers that would want that
 model because a single node failure would cause a double node failure.

 That's why you should configure a second standby as another (candidate)
 synchronous replica, also listed in synchronous_standby_names.

Right.  If you want to tolerate one node failure, *and* have a guarantee
that committed data is on at least two nodes, you need at least three
nodes.  Simple arithmetic.  If you only have two nodes, you only get to
have one of those properties.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 That's why you should configure a second standby as another (candidate)
 synchronous replica, also listed in synchronous_standby_names.

Perhaps we should stress in the docs that this is, in fact, the *only*
reasonable mode in which to run with sync rep on?  Where there are
multiple replicas, because otherwise Drake is correct that you'll just
end up having both nodes go offline if the slave fails.

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Andres Freund
On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  That's why you should configure a second standby as another (candidate)
  synchronous replica, also listed in synchronous_standby_names.
 
 Perhaps we should stress in the docs that this is, in fact, the *only*
 reasonable mode in which to run with sync rep on?  Where there are
 multiple replicas, because otherwise Drake is correct that you'll just
 end up having both nodes go offline if the slave fails.

Which, as it happens, is actually documented.

http://www.postgresql.org/docs/devel/static/warm-standby.html#SYNCHRONOUS-REPLICATION
25.2.7.3. Planning for High Availability

Commits made when synchronous_commit is set to on or remote_write will
wait until the synchronous standby responds. The response may never
occur if the last, or only, standby should crash.

The best solution for avoiding data loss is to ensure you don't lose
your last remaining synchronous standby. This can be achieved by naming
multiple potential synchronous standbys using
synchronous_standby_names. The first named standby will be used as the
synchronous standby. Standbys listed after this will take over the role
of synchronous standby if the first one should fail.
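
For illustration, with something like synchronous_standby_names =
'standby1, standby2' on the primary (the standby names here are made up),
the role each standby currently plays is visible in the existing
pg_stat_replication view:

-- Run on the primary: the first connected standby named in
-- synchronous_standby_names reports sync_state 'sync'; the next candidate
-- reports 'potential' and takes over if the synchronous one goes away.
SELECT application_name, state, sync_priority, sync_state
  FROM pg_stat_replication;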


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Joshua D. Drake


On 01/08/2014 02:49 PM, Tom Lane wrote:


Then you don't understand the point of sync mode, and you shouldn't be
using it.  The point is *exactly* to refuse to commit transactions unless
we can guarantee the data's been replicated.


I understand exactly that and I don't disagree, except in the case where 
it is going to bring down the master (see my further reply). I now 
remember arguing about this a few years ago when we started down the 
sync path.


Anyway, perhaps this is just something of a knob that can be turned. We 
don't have to continue the argument. Thank you for considering what I 
was saying.


Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 01:49 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 If we really want auto-degrading sync rep, then we'd (at a minimum) need
 a way to determine *from the replica* whether or not it was in degraded
 mode when the master died.  What good do messages to the master log do
 you if the master no longer exists?
 
 How would it be possible for a replica to know whether the master had
 committed more transactions while communication was lost, if the master
 dies without ever restoring communication?  It sounds like pie in the
 sky from here ...

Oh, right.  Because the main reason for a sync replica degrading is that
it's down.  In which case it isn't going to record anything.  This would
still be useful for sync rep candidates, though, and I'll document why
below.  But first, lemme demolish the case for auto-degrade.

So here's the case that we can't possibly solve for auto-degrade.
Anyone who wants auto-degrade needs to come up with a solution for this
case as a first requirement:

1. A data center network/power event starts.

2. The sync replica goes down.

3. A short time later, the master goes down.

4. Data center power is restored.

5. The master is fried and is a permanent loss.  The replica is ok, though.

Question: how does the DBA know whether data has been lost or not?

With current sync rep, it's easy: no data was lost, because the master
stopped accepting writes once the replica went down.  If we support
auto-degrade, though, there's no way to know; the replica doesn't have
that information, and anything which was on the master is permanently
lost.  And the point several people have made is: if you can live with
indeterminacy, then you're better off with async rep in the first place.

Now, what we COULD definitely use is a single-command way of degrading
the master when the sync replica is down.  Something like ALTER SYSTEM
DEGRADE SYNC.  Right now you have to push a change to the conf file and
reload, and there's no way to salvage the transaction which triggered
the sync failure.  This would be a nice 9.5 feature.
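
For anyone who hasn't had to do it, the manual degrade is roughly this
(assuming superuser access on the master):

    -- edit postgresql.conf, or in 9.4 use ALTER SYSTEM, so that:
    --   synchronous_standby_names = ''
    SELECT pg_reload_conf();
    -- commits stuck waiting on the vanished standby then complete locally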

HOWEVER, we've already kind of set up an indeterminate situation with
allowing sync rep groups and candidate sync rep servers.  Consider this:

1. Master server A is configured with sync replica B and candidate sync
replica C

2. A rolling power/network failure event occurs, which causes B and C to
go down sometime before A, and all of them to go down before the
application does.

3. On restore, only C is restorable; both A and B are a total loss.

Again, we have no way to know whether or not C was in sync replication
when it went down.  If C went down before B, then we've lost data; if B
went down before C, we haven't.  But we can't find out.  *This* is where
it would be useful to have C log whenever it went into (or out of)
synchronous mode.
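
(Right now the only place that knowledge lives is the master itself -- something
like

    SELECT application_name, state, sync_priority, sync_state
      FROM pg_stat_replication;

run against A while it was still alive -- which is exactly the box we just
declared a permanent loss.)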

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Andres Freund
On 2014-01-08 14:52:07 -0800, Joshua D. Drake wrote:
 On 01/08/2014 02:46 PM, Andres Freund wrote:
 The idea is that we know that data on db0 is not written until we know for a
 fact that db1 also has that data. That is great and a guarantee of data
 integrity between the two nodes.
 
 That guarantee is never there. The only thing guaranteed is that the
 client isn't notified of the commit until db1 has received the data.
 
 Well ugh on that.. but that is for another reply.

You do realize that locally you have the same guarantees? If the client
didn't receive a reply to a COMMIT you won't know whether the tx
committed or not. If that's not sufficient you need to use 2PC and a
transaction manager.
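
Roughly, with the transaction manager driving the second phase (the GID
'txn_42' is just a placeholder, and max_prepared_transactions must be non-zero):

    BEGIN;
    -- ... do the work ...
    PREPARE TRANSACTION 'txn_42';   -- survives a crash; visible in pg_prepared_xacts
    -- later, once the transaction manager has durably recorded its decision:
    COMMIT PREPARED 'txn_42';       -- or ROLLBACK PREPARED 'txn_42'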

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
  * Andres Freund (and...@2ndquadrant.com) wrote:
   That's why you should configure a second standby as another (candidate)
   synchronous replica, also listed in synchronous_standby_names.
  
  Perhaps we should stress in the docs that this is, in fact, the *only*
  reasonable mode in which to run with sync rep on?  Where there are
  multiple replicas, because otherwise Drake is correct that you'll just
  end up having both nodes go offline if the slave fails.
 
 Which, as it happens, is actually documented.

I'm aware, my point was simply that we should state, up-front in
25.2.7.3 *and* where we document synchronous_standby_names, that it
requires at least three servers to be involved to be a workable
solution.

Perhaps we should even log a warning if only one value is found in
synchronous_standby_names...

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
Stephen,


 I'm aware, my point was simply that we should state, up-front in
 25.2.7.3 *and* where we document synchronous_standby_names, that it
 requires at least three servers to be involved to be a workable
 solution.

It's a workable solution with 2 servers.  That's a low-availability,
high-integrity solution; the user has chosen to double their risk of
not accepting writes against never losing a write.  That's a perfectly
valid configuration, and I believe that NTT runs several applications
this way.

In fact, that can already be looked at as a kind of auto-degrade mode:
if there aren't two nodes, then the database goes read-only.

Might I also point out that transactions are synchronous or not
individually?  The sensible configuration is for only the important
writes being synchronous -- in which case auto-degrade makes even less
sense.
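
(Concretely, that per-transaction control is just synchronous_commit, e.g.:

    BEGIN;
    SET LOCAL synchronous_commit = local;  -- don't wait on the standby for this one
    -- ... unimportant write ...
    COMMIT;

with synchronous_commit = on left as the server default for the writes that
actually matter.)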

I really think that demand for auto-degrade is coming from users who
don't know what sync rep is for in the first place.  The fact that other
vendors are offering auto-degrade as a feature instead of the ginormous
foot-gun it is adds to the confusion, but we can't help that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 I'm aware, my point was simply that we should state, up-front in
 25.2.7.3 *and* where we document synchronous_standby_names, that it
 requires at least three servers to be involved to be a workable
 solution.

It only requires that if your requirements include both redundant
data storage and tolerating single-node failure.  Now admittedly,
most people who want replication want it so they can have failure
tolerance, but I don't think it's insane to say that you want to
stop accepting writes if either node of a 2-node server drops out.
If you can only afford two nodes, and you need guaranteed redundancy
for business reasons, then that's where you end up.

Or in short, I'm against throwing warnings for this kind of setup.
I do agree that we need some doc improvements, since this is
evidently not clear enough yet.

regards, tom lane




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Stephen Frost
Josh,

* Josh Berkus (j...@agliodbs.com) wrote:
  I'm aware, my point was simply that we should state, up-front in
  25.2.7.3 *and* where we document synchronous_standby_names, that it
  requires at least three servers to be involved to be a workable
  solution.
 
 It's a workable solution with 2 servers.  That's a low-availability,
 high-integrity solution; the user has chosen to double their risk of
 not accepting writes against never losing a write.  That's a perfectly
 valid configuration, and I believe that NTT runs several applications
 this way.

I really don't agree with that when the standby going offline can take
out the master.  Note that I didn't say we shouldn't allow it, but I
don't think we should accept that it's a real-world solution.

 I really think that demand for auto-degrade is coming from users who
 don't know what sync rep is for in the first place.  The fact that other
 vendors are offering auto-degrade as a feature instead of the ginormous
 foot-gun it is adds to the confusion, but we can't help that.

Do you really feel that a WARNING and increasing the docs to point
out that three systems are necessary, particularly under the 'high
availability' documentation and options, is a bad idea?  I fail to see
how that does anything but clarify the use-case for our users.

Thanks,

Stephen




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 03:18 PM, Stephen Frost wrote:
 Do you really feel that a WARNING and increasing the docs to point
 out that three systems are necessary, particularly under the 'high
 availability' documentation and options, is a bad idea?  I fail to see
 how that does anything but clarify the use-case for our users.

I think the warning is dumb, and that the suggested documentation change
is insufficient.  If we're going to clarify things, then we need to have
a full-on several-page doc showing several examples of different sync
rep configurations and explaining their tradeoffs (including the
different sync modes and per-transaction sync).  Anything short of that
is just going to muddy the waters further.

Mind you, someone needs to take a machete to the HA section of the docs
anyway.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 HOWEVER, we've already kind of set up an indeterminate situation with
 allowing sync rep groups and candidate sync rep servers.  Consider this:

 1. Master server A is configured with sync replica B and candidate sync
 replica C

 2. A rolling power/network failure event occurs, which causes B and C to
 go down sometime before A, and all of them to go down before the
 application does.

 3. On restore, only C is restorable; both A and B are a total loss.

 Again, we have no way to know whether or not C was in sync replication
 when it went down.  If C went down before B, then we've lost data; if B
 went down before C, we haven't.  But we can't find out.  *This* is where
 it would be useful to have C log whenever it went into (or out of)
 synchronous mode.

Good point, but C can't solve this for you just by logging.  If C was the
first to go down, it has no way to know whether A and B committed more
transactions before dying; and it's unlikely to have logged its own crash,
either.

More fundamentally, if you want to survive the failure of M out of N
nodes, you need a sync configuration that guarantees data is on at least
M+1 nodes before reporting commit.  The above example doesn't meet that,
so it's not surprising that you're screwed.

What we lack, and should work on, is a way for sync mode to have M larger
than one.  AFAICS, right now we'll report commit as soon as there's one
up-to-date replica, and some high-reliability cases are going to want
more.
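
As a strawman only -- this syntax is hypothetical, nothing like it exists
today -- it would mean being able to require replies from, say, two of three
listed standbys before reporting commit:

    -- hypothetical syntax, for illustration only; not something you can run today:
    ALTER SYSTEM SET synchronous_standby_names = '2 (standby1, standby2, standby3)';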

regards, tom lane




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Jeff Janes
On Wed, Jan 8, 2014 at 2:23 PM, Joshua D. Drake j...@commandprompt.com wrote:


 On 01/08/2014 01:55 PM, Tom Lane wrote:

  Sync mode is about providing a guarantee that the data exists on more than
 one server *before* we tell the client it's committed.  If you don't need
 that guarantee, you shouldn't be using sync mode.  If you do need it,
 it's not clear to me why you'd suddenly not need it the moment the going
 actually gets tough.


 As I understand it what is being suggested is that if a subscriber or
 target goes down, then the master will just sit there and wait. When I read
 that, I read that the master will no longer process write transactions. If
 I am wrong in that understanding then cool. If I am not then that is a
 serious problem with a production scenario. There is an expectation that a
 master will continue to function if the target is down, synchronous or not.


My expectation is that the master stops writing checks when it finds it can
no longer cash them.

Cheers,

Jeff


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 03:27 PM, Tom Lane wrote:
 Good point, but C can't solve this for you just by logging.  If C was the
 first to go down, it has no way to know whether A and B committed more
 transactions before dying; and it's unlikely to have logged its own crash,
 either.

Sure.  But if we *knew* that C was not in synchronous mode when it went
down, then we'd expect some data loss.  As you point out, though, the
converse is not true; even if C was in sync mode, we don't know that
there's been no data loss, since B could come back up as a sync replica
before going down again.

 What we lack, and should work on, is a way for sync mode to have M larger
 than one.  AFAICS, right now we'll report commit as soon as there's one
 up-to-date replica, and some high-reliability cases are going to want
 more.

Yeah, we talked about having this when sync rep originally went in.  It
involves a LOT more bookkeeping on the master though, which is why nobody
has been willing to attempt it -- and why we went with the
single-replica solution in the first place.  Especially since most
people who want quorum sync really want multi-master replication anyway.

Sync N times is really just a guarantee against data loss as long as
you lose N-1 servers or fewer.  And it becomes an even
lower-availability solution if you don't have at least N+1 replicas.
For that reason, I'd like to see some realistic actual user demand
before we take the idea seriously.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Jeff Janes
On Wed, Jan 8, 2014 at 2:56 PM, Stephen Frost sfr...@snowman.net wrote:

 * Andres Freund (and...@2ndquadrant.com) wrote:
  That's why you should configure a second standby as another (candidate)
  synchronous replica, also listed in synchronous_standby_names.

 Perhaps we should stress in the docs that this is, in fact, the *only*
 reasonable mode in which to run with sync rep on?


I don't think it is the only reasonable way to run it.  Most of the time
that the master can't communicate with rep1, it is because of a network
problem.  So, the master probably can't talk to rep2 either, and adding the
second one doesn't really get you all that much in terms of availability.

Cheers,

Jeff


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 01/08/2014 03:27 PM, Tom Lane wrote:
 What we lack, and should work on, is a way for sync mode to have M larger
 than one.  AFAICS, right now we'll report commit as soon as there's one
 up-to-date replica, and some high-reliability cases are going to want
 more.

 Sync N times is really just a guarantee against data loss as long as
 you lose N-1 servers or fewer.  And it becomes an even
 lower-availability solution if you don't have at least N+1 replicas.
 For that reason, I'd like to see some realistic actual user demand
 before we take the idea seriously.

Sure.  I wasn't volunteering to implement it, just saying that what
we've got now is not designed to guarantee data survival across failure
of more than one server.  Changing things around the margins isn't
going to improve such scenarios very much.

It struck me after re-reading your example scenario that the most
likely way to figure out what you had left would be to see if some
additional system (think Nagios monitor, or monitors) had records
of when the various database servers went down.  This might be
what you were getting at when you said logging, but the key point
is it has to be logging done on an external server that could survive
failure of the database server.  postmaster.log ain't gonna do it.

regards, tom lane




Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Jim Nasby

On 1/8/14, 6:05 PM, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

On 01/08/2014 03:27 PM, Tom Lane wrote:

What we lack, and should work on, is a way for sync mode to have M larger
than one.  AFAICS, right now we'll report commit as soon as there's one
up-to-date replica, and some high-reliability cases are going to want
more.

Sync N times is really just a guarantee against data loss as long as
you lose N-1 servers or fewer.  And it becomes an even
lower-availability solution if you don't have at least N+1 replicas.
For that reason, I'd like to see some realistic actual user demand
before we take the idea seriously.

Sure.  I wasn't volunteering to implement it, just saying that what
we've got now is not designed to guarantee data survival across failure
of more than one server.  Changing things around the margins isn't
going to improve such scenarios very much.

It struck me after re-reading your example scenario that the most
likely way to figure out what you had left would be to see if some
additional system (think Nagios monitor, or monitors) had records
of when the various database servers went down.  This might be
what you were getting at when you said logging, but the key point
is it has to be logging done on an external server that could survive
failure of the database server.  postmaster.log ain't gonna do it.


Yeah, and I think that the logging command that was suggested allows for that 
*if configured correctly*.

Automatic degradation to async is useful for protecting you against all modes 
of a single failure: Master fails, you've got the replica. Replica fails, 
you've got the master.

But fit hits the shan as soon as you get a double failure, and that double 
failure can be very subtle. Josh's case is not subtle: You lost power AND the 
master died. You KNOW you have two failures.

But what happens if there's a network blip that's not large enough to notice 
(but large enough to degrade your replication) and the master dies? Now you 
have no clue if you've lost data.

Compare this to async: if the master goes down (one failure), you have zero 
clue if you lost data or not. At least with auto-degradation you know you have 
to have 2 failures to suffer data loss.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] nested hstore patch

2014-01-08 Thread Jim Nasby

On 12/23/13, 9:47 AM, Pavel Stehule wrote:

Has anybody looked into how hard it would be to add method notation
to PostgreSQL, so that instead of calling

getString(hstorevalue, n)

we could use

hstorevalue.getString(n)


yes, I played with it some years ago. I stopped early; there was a problem with 
the parser when I tried to append a new rule, and because there was no simple 
solution, I didn't continue.

But it could be a nice feature - at a minimum for plpgsql coders.


Isn't there also some major problem with differentiating between 
schema/table/field with that too? I recall discussion along those lines, though 
maybe it was for the idea of recursive schemas.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Simple improvements to freespace allocation

2014-01-08 Thread Jim Nasby

On 1/8/14, 1:43 AM, Heikki Linnakangas wrote:

I've wanted the cluster case for a long time. I also see the use for the RECENT 
scenario, especially if we had CLUSTER CONCURRENT that let you shrink the head 
of the table as needed.

I suspect the in-memory case would only be useful if it could look into the OS 
cache as well, at least until we can recommend you give Postgres 90% of memory 
instead of 25%. Even then, I'm not sure how useful it would ultimately be...


* PACK
We want the table to be smaller, so rather than run a VACUUM FULL we
want to force the table to choose an NBT at start of table, even at
the expense of concurrency. By avoiding putting new data at the top of
the table we allow the possibility that VACUUM will shrink table size.
This is same as current except we always reset the FSM pointer to zero
and re-seek from there. This takes some time to have an effect, but is
much less invasive than VACUUM FULL.


We already reset the FSM pointer to zero on vacuum. Would the above actually 
make any difference in practice?


What if your first request is for a large chunk of free space? You could skip a 
lot of blocks, even if the FSM is bucketized.

But there's probably a more important point to this one: for you to have any 
chance of packing you MUST get everything out of the tail of the table. 
Resetting to zero on every request is one possible way to do that, though it 
might be better to do something like reset only once the pointer goes past 
block X. The other thing you'd want is a way to force tuples off the last X 
pages. Due to a lack of ctid operators that was already hard, and HOT makes 
that even harder (BTW, related to this you'd ideally want HOT to continue to 
operate on the front of the table, but not the back.)
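
For the record, the manual version of that today is a hack along these lines
(table and column names made up, block number arbitrary):

    -- no-op update of every row at or beyond block 100000, hoping the new
    -- versions land in free space earlier in the table
    UPDATE blah SET some_col = some_col
     WHERE (ctid::text::point)[0] >= 100000;
    VACUUM blah;   -- may now be able to truncate the emptied tail

and even that only works if HOT doesn't keep the new versions on the same
pages, which is exactly the problem above.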

All that said, I've definitely wanted the ability to shrink tables in the past, 
though TBH I've wanted that more for indexes.

Ultimately, what I really want on this front is:

PACK TABLE blah BACKGROUND;
CLUSTER TABLE blah BACKGROUND;
REINDEX INDEX blah BACKGROUND;

where BACKGROUND would respect a throttle setting. (While I'm dreaming, it'd be 
nice to have DATABASE/TABLESPACE/SCHEMA alternate specifications too...)
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Add CREATE support to event triggers

2014-01-08 Thread Craig Ringer
On 01/09/2014 04:42 AM, Alvaro Herrera wrote:

 If there's a NULL element when expanding an object, the whole thing
 expands to empty.  For example, if no AUTHORIZATION
 clause is specified, authorization element is still there, but the
 authorization_role element within it is NULL, and so the whole
 AUTHORIZATION clause expands to empty and the resulting command contains
 no authorization clause.

I'd like to see this applied consistently to argument-less clauses like
IF NOT EXISTS too, so the same rules apply.

 IF NOT EXISTS is handled by defining it to either the string IF NOT
 EXISTS or to empty if no such clause was specified.

I'm not keen on this bit. It puts syntax clauses into value strings
other than the special output key, and those strings aren't easily
distinguished from user data without clause-specific knowledge.

Instead, can't we use your already proposed subclause structure?

{"authorization":{"authorization_role":"some guy",
  "output":"AUTHORIZATION %i{authorization_role}"},
 "if_not_exists": {"output": "IF NOT EXISTS"},
 "name":"some schema",
 "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}

i.e. if_not_exists becomes an object. All clauses are objects, all
non-object values are user data. (right?). If the clause is absent, the
output key is the empty string.

The issue with that (and with your original proposal) is that you can't
tell what these clauses are supposed to be if they're not present in the
original query. You can't *enable* IF NOT EXISTS without pulling
knowledge of that syntax from somewhere else.

Depending on the problem you intend to solve there, that might be fine.

If it isn't, then instead there just needs to be a key to flag such
clauses as present or not.


{"authorization":{"authorization_role":"some guy",
  "output":"AUTHORIZATION %i{authorization_role}"},
 "if_not_exists": {"output": "IF NOT EXISTS",
   "present": true},
 "name":"some schema",
 "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}

Am I just over-complicating something simple here?

My reasoning is that it'd be good to be able to easily tell the
difference between *structure* and *user data* in these query trees and
do so without possibly version-specific and certainly
syntax/clause-specific knowledge about the meaning of every key of every
clause.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] commit fest manager?

2014-01-08 Thread Fabrízio de Royes Mello
On Wed, Jan 8, 2014 at 8:43 PM, Joshua D. Drake j...@commandprompt.com
wrote:


 On 01/08/2014 02:38 PM, Josh Berkus wrote:


 On 01/08/2014 02:04 PM, Peter Eisentraut wrote:

 Anyone else?

 Or you'll have to deal with me again?


 I vote for Peter.


 +1


+1

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


[HACKERS] newlines at end of generated SQL

2014-01-08 Thread Peter Eisentraut
Is there a reason why the programs in src/bin/scripts all put newlines
at the end of the SQL commands they generate?  This produces useless
empty lines in the server log (and client output, if selected).






Re: [HACKERS] Add CREATE support to event triggers

2014-01-08 Thread Alvaro Herrera
Craig Ringer wrote:

 Instead, can't we use your already proposed subclause structure?
 
 {"authorization":{"authorization_role":"some guy",
   "output":"AUTHORIZATION %i{authorization_role}"},
  "if_not_exists": {"output": "IF NOT EXISTS"},
  "name":"some schema",
  "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}
 
 i.e. if_not_exists becomes an object. All clauses are objects, all
 non-object values are user data. (right?). If the clause is absent, the
 output key is the empty string.
 
 The issue with that (and with your original proposal) is that you can't
 tell what these clauses are supposed to be if they're not present in the
 original query. You can't *enable* IF NOT EXISTS without pulling
 knowledge of that syntax from somewhere else.
 
 Depending on the problem you intend to solve there, that might be fine.

Hmm.  This seems like a reasonable thing to do, except that I would like
the "output" to always be the constant, and have some other way to
enable or disable the clause.  With your "present" boolean it would be
something like:

"if_not_exists": {"output": "IF NOT EXISTS",
  "present": true/false}

In fact, I'm now wondering whether this is a better idea than not
emitting anything when some element in the output expands to NULL; so it
would apply to authorization as well; if the command includes the
clause, it'd be

 {"authorization":{"authorization_role":"some guy",
   "present": true,
   "output":"AUTHORIZATION %i{authorization_role}"},

and if there wasn't anything, you'd have

 {"authorization":{"authorization_role": null,
   "present": false,
   "output":"AUTHORIZATION %i{authorization_role}"},

so if you want to turn it on and it wasn't, you need to change both the
present boolean and also set the authorization_role element; and if you
want to turn it off when it was present, just set present to false.

 Am I just over-complicating something simple here?

I think it's a fair point.

 My reasoning is that it'd be good to be able to easily tell the
 difference between *structure* and *user data* in these query trees and
 do so without possibly version-specific and certainly
 syntax/clause-specific knowledge about the meaning of every key of every
 clause.

Sounds reasonable.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] integrate pg_upgrade analyze_new_cluster.sh into vacuumdb

2014-01-08 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 pg_upgrade creates a script analyze_new_cluster.{sh|bat} that runs
 vacuumdb --analyze-only in three stages with different statistics target
 settings to get a fresh cluster analyzed faster.  I think this behavior
 is also useful for clusters or databases freshly created by pg_restore
 or any other loading mechanism, so it's suboptimal to have this
 constrained to pg_upgrade.

 Therefore, I suggest to add this functionality into the vacuumdb
 program.

Seems reasonable.

 There are some details to be considered about how pg_upgrade would call
 this.  For example, would we keep creating the script and just have the
 script call vacuumdb with the new option, or would we skip the script
 altogether and just print a message from pg_upgrade?  Also, pg_upgrade
 contains logic to run a vacuum (not only analyze) in the final run when
 upgrading from PostgreSQL 8.4 to deal with the freespace map.  Not sure
 how to adapt that; maybe just keep the script and run a non-analyze
 vacuum after the analyze.

I don't think this vacuumdb feature should deal with any
version-conversion issues.  So it sounds like the thing to do is keep the
wrapper script, which will give us a place to put any such special actions
without having to kluge up vacuumdb's behavior.  That'll avoid breaking
scripts that users might've built for using pg_upgrade, too.
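
If memory serves, what the generated script does boils down to roughly this per
database (just driven through vacuumdb and PGOPTIONS rather than typed by hand):

    SET default_statistics_target = 1;    ANALYZE;   -- stage 1: minimal stats, fast
    SET default_statistics_target = 10;   ANALYZE;   -- stage 2: usable stats
    RESET default_statistics_target;      ANALYZE;   -- stage 3: full stats at the default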

regards, tom lane




Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier

2014-01-08 Thread Mark Dilger
Michael Paquier wrote:
 A possible input for a test that users could provide would be something like 
 that:

 # Node information for tests
 nodes {
 {node1, postgresql.conf params, recovery.conf params}
 {node2, postgresql.conf params, recovery.conf params, slave of node1}
 }
 # Run test
 init node1
 run_sql node1 file1.sql
 # Check output
 init node2
 run_sql node2 file2.sql
 # Check that results are fine
 # Process

 The main problem is actually how to do that. Having some smart shell
 infrastructure would be simple and would facilitate (?) the maintenance
 of code used to run the tests. On the contrary having a C program would
 make the maintenance of code to run the tests more difficult (?) for a
 trade with more readable test suite input like the one I wrote above.
 This might also make the test input more readable for a human eye, in
 the shape of what is already available in src/test/isolation.

I like making this part of src/test/isolation, if folks do not object.
The core infrastructure in src/test/isolation seems applicable to 
replication testing, and I'd hate to duplicate that code.

As for the node setup in your example above, I don't think it can be as
simple as defining nodes first, then running tests.  The configurations
themselves may need to be changed during the execution of a test, and
services stopped and started, all under test control and specified in
the same easy format.

I have started working on this, and will post WIP patches from time to
time, unless you all feel the need to point me in a different direction.


mark






On Sunday, January 5, 2014 6:13 PM, Michael Paquier michael.paqu...@gmail.com 
wrote:
 


On Mon, Jan 6, 2014 at 4:51 AM, Mark Dilger markdil...@yahoo.com wrote:
 I am building a regression test system for replication and came across
 this email thread.  I have gotten pretty far into my implementation, but
 would be happy to make modifications if folks have improvements to
 suggest.  If the community likes my design, or a modified version based
 on your feedback, I'd be happy to submit a patch.
Yeah, this would be nice to look at, core code definitely needs to have some 
more infrastructure for such a test suite. I didn't get the time to go back to 
it since I began this thread though :)

 Currently I am canibalizing src/test/pg_regress.c, but that could instead
 be copied to src/test/pg_regress_replication.c or whatever.  The regression
 test creates and configures multiple database clusters, sets up the
 replication configuration for them, runs them each in nonprivileged mode
 and bound to different ports, feeds all the existing 141 regression tests
 into the master database with the usual checking that all the right results
 are obtained, and then checks that the standbys have the expected
 data.  This is possible all on one system because the database clusters
 are chroot'ed to see their own /data directory and not the /data directory
 of the other chroot'ed clusters, although the rest of the system, like /bin
 and /etc and /dev are all bind mounted and visible to each cluster.
Having vanilla regressions run in a cluster with multiple nodes and check the 
results on a standby is the tip of the iceberg though. What I had in mind when 
I began this thread was to have more than a copy/paste of pg_regress, but an 
infrastructure that people could use to create and customize tests by having an 
additional control layer on the cluster itself. For example, testing 
replication is not only a matter of creating and setting up the nodes, but you 
might want to be able to initialize, add, remove nodes during the tests. Node 
addition would be either a new fresh master (this would be damn useful for a 
test suite for logical replication I think), or a slave node with custom 
recovery parameters to test replication, as well as PITR, archiving, etc. Then 
you need to be able to run SQL commands on top of that to check if the results 
are consistent with what you want.

A possible input for a test that users could provide would be something like 
that:
# Node information for tests
nodes
{
    {node1, postgresql.conf params, recovery.conf params}
    {node2, postgresql.conf params, recovery.conf params, slave of node1}
}
# Run test
init node1
run_sql node1 file1.sql
# Check output
init node2
run_sql node2 file2.sql
# Check that results are fine
# Process


The main problem is actually how to do that. Having some smart shell 
infrastructure would be simple and would facilitate (?) the maintenance of code 
used to run the tests. On the contrary having a C program would make the 
maintenance of code to run the tests more difficult (?) for a trade with more 
readable test suite input like the one I wrote above. This might also make the 
test input more readable for a human eye, in the shape of what is already 
available in src/test/isolation.


Another possibility could also be to integrate a recovery/backup manager 
directly in PG core, and have some tests for it, 

Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Robert Treat
On Wed, Jan 8, 2014 at 6:15 PM, Josh Berkus j...@agliodbs.com wrote:
 Stephen,


 I'm aware, my point was simply that we should state, up-front in
 25.2.7.3 *and* where we document synchronous_standby_names, that it
 requires at least three servers to be involved to be a workable
 solution.

 It's a workable solution with 2 servers.  That's a low-availability,
 high-integrity solution; the user has chosen to double their risk of
 not accepting writes against never losing a write.  That's a perfectly
 valid configuration, and I believe that NTT runs several applications
 this way.

 In fact, that can already be looked at as a kind of auto-degrade mode:
 if there aren't two nodes, then the database goes read-only.

 Might I also point out that transactions are synchronous or not
 individually?  The sensible configuration is for only the important
 writes being synchronous -- in which case auto-degrade makes even less
 sense.

 I really think that demand for auto-degrade is coming from users who
 don't know what sync rep is for in the first place.  The fact that other
 vendors are offering auto-degrade as a feature instead of the ginormous
 foot-gun it is adds to the confusion, but we can't help that.


I think the problem here is that we tend to have a limited view of
the right way to use synch rep. If I have 5 nodes, and I set 1
synchronous and the other 3 asynchronous, I've set up a known
successor in the event that the leader fails. In this scenario
though, if the successor fails, you actually probably want to keep
accepting writes; since you weren't using synchronous for durability
but for operational simplicity. I suspect there are probably other
scenarios where users are willing to trade latency for improved and/or
directed durability but not at the expense of availability, don't you?

In fact there are entire systems that provide that type of thing. I
feel like it's worth mentioning that there's a nice primer on tunable
consistency in the Riak docs; strongly recommended.
http://docs.basho.com/riak/1.1.0/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/.
I'm not entirely sure how well it maps into our problem space, but it
at least gives you a sane working model to think about. If you were
trying to explain the Postgres case, async is like the N value (I want
the data to end up on this many nodes eventually) and sync is like the
W value (it must be written to this many nodes, or it should fail). Of
course, we only offer an R = 1, W = 1 or 2, and N = all. And it's
worse than that, because we have golden nodes.

This isn't to say there isn't a lot of confusion around the issue.
Designing, implementing, and configuring different guarantees in the
presence of node failures is a non-trivial problem. Still, I'd prefer
to see Postgres head in the direction of providing more options in
this area rather than drawing a firm line at being a CP-oriented
system.

Robert Treat
play: xzilla.net
work: omniti.com




Re: [HACKERS] Planning time in explain/explain analyze

2014-01-08 Thread Andreas Karlsson

A patch with updated documentation is attached.

On 01/02/2014 04:08 AM, Robert Haas wrote:

I'm wondering whether the time should be stored inside the PlannedStmt
node instead of passing it around separately. One possible problem
with the way you've done things here is that, in the case of a
prepared statement, EXPLAIN ANALYZE will emit the time needed to call
GetCachedPlan(), even if that function didn't do any replanning.  Now
you could argue that this is the correct behavior, but I think there's
a decent argument that what we ought to show there is the amount of
time that was required to create the plan that we're displaying at the
time it was created, rather than the amount of time that was required
to figure out that we didn't need to replan.

A minor side benefit of this approach is that you wouldn't need to
change the signature for ExplainOnePlan(), which would avoid breaking
extensions that may call it.


A possible argument against printing the time to create the plan is that 
unless it was created when running EXPLAIN we will not know it. I do not 
think we want to always measure the time it took to generate a plan due 
to slow clocks on some architectures. Also I feel that such a patch 
would be more invasive.


Just my reasoning for the current solution. I welcome any opinions about 
how to print planning time for prepared statements since I am not a 
heavy user of them.


--
Andreas Karlsson
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
new file mode 100644
index 2af1738..482490b
*** a/doc/src/sgml/perform.sgml
--- b/doc/src/sgml/perform.sgml
*** EXPLAIN SELECT * FROM tenk1;
*** 89,94 
--- 89,95 
   QUERY PLAN
  -
   Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244)
+  Planning time: 0.113 ms
  /screen
 /para
  
*** EXPLAIN SELECT * FROM tenk1;
*** 162,167 
--- 163,174 
 /para
  
 para
+ The literalPlanning time/literal shown is the time it took to generate
+ the query plan from the parsed query and optimize it. It does not include
+ rewriting and parsing.
+/para
+ 
+para
  Returning to our example:
  
  screen
*** EXPLAIN SELECT * FROM tenk1;
*** 170,175 
--- 177,183 
   QUERY PLAN
  -
   Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244)
+  Planning time: 0.113 ms
  /screen
 /para
  
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 198,203 
--- 206,212 
  
   Seq Scan on tenk1  (cost=0.00..483.00 rows=7001 width=244)
 Filter: (unique1 lt; 7000)
+  Planning time: 0.104 ms
  /screen
  
  Notice that the commandEXPLAIN/ output shows the literalWHERE/
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 234,239 
--- 243,249 
 Recheck Cond: (unique1 lt; 100)
 -gt;  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
   Index Cond: (unique1 lt; 100)
+  Planning time: 0.093 ms
  /screen
  
  Here the planner has decided to use a two-step plan: the child plan
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 262,267 
--- 272,278 
 Filter: (stringu1 = 'xxx'::name)
 -gt;  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
   Index Cond: (unique1 lt; 100)
+  Planning time: 0.089 ms
  /screen
  
  The added condition literalstringu1 = 'xxx'/literal reduces the
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 283,288 
--- 294,300 
  -
   Index Scan using tenk1_unique1 on tenk1  (cost=0.29..8.30 rows=1 width=244)
 Index Cond: (unique1 = 42)
+  Planning time: 0.076 ms
  /screen
  
  In this type of plan the table rows are fetched in index order, which
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 311,316 
--- 323,329 
 Index Cond: (unique1 lt; 100)
   -gt;  Bitmap Index Scan on tenk1_unique2  (cost=0.00..19.78 rows=999 width=0)
 Index Cond: (unique2 gt; 9000)
+  Planning time: 0.094 ms
  /screen
  
  But this requires visiting both indexes, so it's not necessarily a win
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 331,336 
--- 344,350 
 -gt;  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..71.27 rows=10 width=244)
   Index Cond: (unique2 gt; 9000)
   Filter: (unique1 lt; 100)
+  Planning time: 0.087 ms
  /screen
 /para
  
*** WHERE t1.unique1 lt; 10 AND t1.unique2
*** 364,369 
--- 378,384 
 Index Cond: (unique1 lt; 10)
 -gt;  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..7.91 rows=1 width=244)
   Index Cond: (unique2 = 

Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier

2014-01-08 Thread Michael Paquier
On Thu, Jan 9, 2014 at 12:34 PM, Mark Dilger markdil...@yahoo.com wrote:
 Michael Paquier wrote:
 A possible input for a test that users could provide would be something
 like that:

 # Node information for tests
 nodes {
 {node1, postgresql.conf params, recovery.conf params}
 {node2, postgresql.conf params, recovery.conf params, slave of node1}
 }
 # Run test
 init node1
 run_sql node1 file1.sql
 # Check output
 init node2
 run_sql node2 file2.sql
 # Check that results are fine
 # Process

 The main problem is actually how to do that. Having some smart shell
 infrastructure would be simple and would facilitate (?) the maintenance
 of code used to run the tests. On the contrary having a C program would
 make the maintenance of code to run the tests more difficult (?) for a
 trade with more readable test suite input like the one I wrote above.
 This might also make the test input more readable for a human eye, in
 the shape of what is already available in src/test/isolation.

 I like making this part of src/test/isolation, if folks do not object.
 The core infrastructure in src/test/isolation seems applicable to
 replication testing, and I'd hate to duplicate that code.

 As for the node setup in your example above, I don't think it can be as
 simple as defining nodes first, then running tests.  The configurations
 themselves may need to be changed during the execution of a test, and
 services stopped and started, all under test control and specified in
 the same easy format.
Yes, my example was very basic :). What you actually need is the
possibility to perform actions on nodes during a test run, basically:
stop, start, init, reload, run SQL, change params/create new conf
files (like putting a node in recovery could be = create recovery.conf
+ restart). The place of the code does not matter much, but I don't
think that it should be part of isolation, as clustering and isolation
are too different test suites. I would for example have seen that as
src/test/cluster, with src/test/common for things that are shared
between test infrastructures.

As mentioned by Steve, the test suite of Slony might be interesting to
look at to get some ideas.

Regards,
-- 
Michael

