Re: [HACKERS] Issue installing doc tools on OSX

2015-02-16 Thread Florian Pflug
On Feb16, 2015, at 23:18 , Peter Eisentraut pete...@gmx.net wrote:
 On 2/15/15 9:23 PM, David Steele wrote:
 That seems a bit incredible, since port should be able to resolve the
 dependencies by itself.  I suggest that this should be reported as a bug
 to MacPorts.
 
 Sure, that has been my experience, but the error message was very clear.
 Unfortunately I did not capture the error before I changed the order
 and the log file was removed on the next run.

I just tried this on OS X 10.9.5 running MacPorts 2.3.3 and a ports tree
as of a few minutes ago.

After uninstalling docbook, opensp and openjade, re-installing them with

  sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl
  libxslt openjade opensp

completed without a hitch. It even logged

  --->  Computing dependencies for openjade
  --->  Dependencies to be installed: opensp

BTW, why does the list of suggested packages include docbook-xml? I was
under the impression that postgres used only the SGML version of docbook.
And I previously only had the SGML version installed, and I'm pretty sure
that I was able to build the documentation successfully.

best regards,
Florian Pflug





Re: [HACKERS] restrict global access to be readonly

2015-02-15 Thread Florian Pflug
On Feb15, 2015, at 10:13 , David G Johnston david.g.johns...@gmail.com wrote:
 happy times wrote
 Sure, we can utilize the runtime parameter
 default_transaction_read_only, however, it does not restrict user from
 changing transaction attribute to non-readonly mode, so is not safe. 
 
 ISTM that implementing a means to make this setting only super-user
 changeable would be a quick(er) solution to the problem - the decision as to
 what to disallow is already decided.
 
 It seems like it would have to be a separate GUC but it wouldn't require any
 new SQL but would simply leverage the existing settings system to setup
 database-user values that only a super-user can change.

I've wished for a way to prevent regular users from changing specific settings
in the past. Maybe we could have 

  ALTER DATABASE database FORCE parameter TO value
  ALTER ROLE role [ IN DATABASE db ] FORCE parameter TO value

In postgresql.conf, we could use a syntax like

  parameter =!= value

to indicate that the parameter value can only be changed by super-users.
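
For illustration, the proposed syntax (hypothetical, not implemented) might
be used like this; the database and role names are made up:

  ALTER DATABASE reports FORCE default_transaction_read_only TO on;
  ALTER ROLE analyst IN DATABASE reports FORCE statement_timeout TO '5min';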

We'd have to figure out what happens if a database- or role-specific
FORCEd setting attempts to override a value already FORCEd in postgresql.conf.

Ideally, we'd allow database- or role-specific settings created by super-users
to override previously FORCEd values, but that would require us to store the
role that creates such settings in pg_db_role_setting. For SET clauses attached
to functions, we'd complain if they attempt to change a FORCEd value, unless
they are called by a super-user, or are marked SECURITY DEFINER and owned by
a super-user.

best regards,
Florian Pflug





Re: [HACKERS] Question about RI checks

2014-10-24 Thread Florian Pflug
On Oct23, 2014, at 17:45 , Kevin Grittner kgri...@ymail.com wrote:
 Every way I look at it, inside a REPEATABLE READ or SERIALIZABLE
 transaction a check for child rows when validating a parent DELETE
 should consider both rows which exist according to the transaction
 snapshot and according to a current snapshot.

I've pondered this further, and unfortunately it seems that this
isn't sufficient to guarantee true serializability :-(

Verifying that both snapshots contain exactly the same rows does not
prevent a child row from being inserted and immediately deleted again,
if both actions happen *after* the parent-updating transaction
took its snapshot, but *before* it takes the crosscheck snapshot.

Let parent(id) and child(id, parent_id) again be two tables with a
FK constraint between them, let child be initially empty, and let
parent contain a single row (1).

Assume PD is a transaction which deletes all rows from parent,
CI a transaction which inserts the row (1, 1) into child, and
CD a transaction which deletes that row again.

Even with the extended cross-checking you propose, we'd still allow
the following concurrent schedule

 1. PD takes snapshot
 2. CI starts and completes
 3. CD starts and completes
 4. PD deletes from parent without complaining, since there were
no conflicting rows at time (1), and none at time (4).

So far, all is well. But add two more tables, called ci_before_pd
and pd_before_cd, both initially containing one row. Let CI scan
ci_before_pd, let PD delete from ci_before_pd and scan pd_before_cd,
and let CD delete from pd_before_cd. In the concurrent schedule from
above, CI will see the row in ci_before_pd, and PD will delete it, and
PD will see the row in pd_before_cd that CD deletes. Note that SSI *will*
allow that schedule to occur without raising a serialization error.
The only serial schedule which yields the same results for the various
queries pertaining to ci_before_pd and pd_before_cd is

  CI -> PD -> CD,

i.e. PD has to run *between* CI and CD. But in that serial schedule,
PD *should* see a FK key violation, since CI has inserted a child which
CD hasn't yet deleted.

There is thus *no* serial schedule which yields the same results as the
concurrent schedule above for the queries pertaining to parent and child,
*and* for the queries pertaining to ci_before_pd and pd_before_cd, i.e.
the concurrent schedule is *not* serializable. Yet even the extended cross-
check won't detect this.
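
For concreteness, here's a rough SQL rendering of that schedule - a
simplification of the attached spec file, using the tables described above,
with ci_before_pd and pd_before_cd each containing one row:

  -- Session PD
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT * FROM pd_before_cd;   -- takes PD's snapshot; sees the row CD deletes

  -- Session CI starts and completes
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT * FROM ci_before_pd;   -- sees the row PD deletes later
  INSERT INTO child VALUES (1, 1);
  COMMIT;

  -- Session CD starts and completes
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  DELETE FROM pd_before_cd;
  DELETE FROM child;            -- removes (1, 1) again
  COMMIT;

  -- Session PD continues
  DELETE FROM ci_before_pd;
  DELETE FROM parent;           -- no FK violation, no serialization error
  COMMIT;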

Attached is an isolationtester spec file which implements this example,
and the corresponding out-file which shows that SSI permits the concurrent
schedule. Since SSI doesn't concern itself with RI enforcement queries,
it would also permit that schedule if we extended the cross-check, I think.

(I used REL9_4_STABLE as of today to try this, commit
 1cf54b00ba2100083390223a8244430643c1ec07)

best regards,
Florian Pflug


fk-consistency2.spec
Description: Binary data


fk-consistency2.out
Description: Binary data



Re: [HACKERS] Question about RI checks

2014-10-24 Thread Florian Pflug
On Oct24, 2014, at 18:42 , Robert Haas robertmh...@gmail.com wrote:
 I don't think you can count on being able to figure out all of the
 recent lockers by looking at xmax; it can get overwritten.  For
 example, suppose transaction A locks the row and then commits.  Then
 transaction B comes along and again locks the row and commits.  My
 understanding is that B won't create a multi-xact since there's just
 one locker; it'll just stomp on the previous xmax.
 
 Now, you could change that, but I suspect it would have pretty drastic
 implications on systems that have both foreign keys and long-running
 transactions.

Yes, we'd have to create multi-xids in some cases where we don't do that
today. How big the effect would be, I honestly don't know. I guess it
depends on how costly multi-xid creation is, compared to updating the
parent row (to change its xmax) and updating or deleting child rows.
My hope would be that it's cheap compared to that, but maybe that's not
true, especially since multi-xids are xlog'ed nowadays.

The only other option I see would be to teach the executor to check
whether *any* snapshot between the transaction's snapshot and a current
snapshot would see a different set of rows. Simply checking whether both
the current snapshot and the transaction's snapshot see the same set isn't
sufficient, per the counter-example in my other mail :-(

An advantage of the latter is that it'd put the burden on the session
that attempts to remove a parent row, instead of on the sessions which
add or remove children.

best regards,
Florian Pflug







Re: [HACKERS] Question about RI checks

2014-10-24 Thread Florian Pflug
On Oct24, 2014, at 19:32 , Robert Haas robertmh...@gmail.com wrote:
 On Fri, Oct 24, 2014 at 1:28 PM, Florian Pflug f...@phlo.org wrote:
 The only other option I see would be to teach the executor to check
 whether *any* snapshot between the transaction's snapshot and a current
 snapshot would see a different set of rows. Simply checking whether both
 the current snapshot and the transaction's snapshot see the same set isn't
 sufficient, per the counter-example in my other mail :-(
 
 What about doing one scan using SnapshotAny and then testing each
 returned row for visibility under both relevant snapshots?  See
 whether there is any tuple for which they disagree.

See my other mail - testing whether the snapshots agree isn't enough,
you'd have to check whether there could have been *any* snapshot taken
between the two which would see a different result. Or at least that's what
I currently think - I don't yet have an actual proof at hand that doing
this always preserves serializability. But it seems plausible that it
does, because it ensures that there is a rather large time frame during
which the enforcement query can be placed in a serial schedule without
changing its result.

This applies only to SERIALIZABLE transactions, though. For REPEATABLE
READ, it might be that checking whether the two snapshots agree is
sufficient.

So a third way forward would be to not skip the SSI logic for RI enforcement
queries. Though I fear that, due to the imprecise nature of SIREAD locks,
doing so would drive the false positive rate way up for workloads involving 
lots of parent and child row modifications.

best regards,
Florian Pflug





Re: [HACKERS] Question about RI checks

2014-10-24 Thread Florian Pflug
On Oct24, 2014, at 20:24 , Robert Haas robertmh...@gmail.com wrote:
 On Fri, Oct 24, 2014 at 2:12 PM, Florian Pflug f...@phlo.org wrote:
 What about doing one scan using SnapshotAny and then testing each
 returned row for visibility under both relevant snapshots?  See
 whether there is any tuple for which they disagree.
 
 See my other mail - testing whether the snapshots agree isn't enough,
 you'd have to check whether there could have been *any* snapshot taken
 between the two which would see a different result.
 
 Oh, hmm.  I had thought what I was proposing was strong enough to
 handle that case, but now I see that it isn't.  However, I'm not
 entirely sure that it's the RI code's job to prevent such cases, or at
 least not when the transaction isolation level is less than
 serializable.

Well, the responsibility was laid onto the RI code by the decision to
not do SIREAD locking for RI enforcement queries. Simply enabling SIREAD
locking without doing anything else is not a good solution, I think -
it will drive up the false-positive rate. And it would make workloads
consisting purely of serializable transactions pay the price for the
row-level locking done by the RI triggers, without getting anything in
return. 

 Is there an argument that the anomaly that results is
 unacceptable at REPEATABLE READ?

I'm not aware of any case that is clearly unacceptable for REPEATABLE
READ, but neither am I convinced that no such case exists. REPEATABLE READ
mode is hard because there doesn't seem to be any formal definition of
what we do and do not guarantee. The guarantees provided by the SQL
standard seem to be so much weaker than what we offer that they're not
really helpful :-(

I believe the best way forward is to first find a solution for SERIALIZABLE
transactions, and then check if it can be applied to REPEATABLE READ
mode too. For SERIALIZABLE mode, it's at least clear what we're aiming
for -- offering true serializability.

best regards,
Florian Pflug







Re: [HACKERS] Question about RI checks

2014-10-24 Thread Florian Pflug
On Oct24, 2014, at 22:50 , Kevin Grittner kgri...@ymail.com wrote:
 I need to spend some more time looking at it, and I have another
 couple things in front of this on my personal TODO list, but I
 think that if we had a row lock which was stronger than current
 SELECT FOR UPDATE behavior, and the delete of a parent row (or
 updated of its referenced key) read the children using it, this
 would be solved.  Essentially I'm thinking of something just like
 FOR UPDATE except that a transaction which attempted a concurrent
 UPDATE or DELETE of the row (before the locking transaction ended)
 would error out with some form of serialization failure.  I believe
 this would be consistent with the behavior of SELECT FOR UPDATE in
 Oracle, so it would allow a path for those migrating from Oracle to
 maintain their current logic (with a slight change to the FOR
 strength clause).

Hm, I don't believe any form of traditional row-level lock on the
child row can fix this. If you look at the second failure case
(an updated isolation tester spec file which includes comments is
attached), you see that it fails even if you define the FK constraint
as CASCADE instead of RESTRICT. So even a full UPDATE or DELETE
of the child rows doesn't help.

But maybe I misunderstood what you proposed.

best regards,
Florian Pflug



fk-consistency2.spec
Description: Binary data



Re: [HACKERS] idea: allow AS label inside ROW constructor

2014-10-23 Thread Florian Pflug
On Oct23, 2014, at 15:39 , Andrew Dunstan and...@dunslane.net wrote:
 On 10/23/2014 09:27 AM, Merlin Moncure wrote:
 On Thu, Oct 23, 2014 at 4:34 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 postgres=# select row_to_json(row(10 as A, row(30 as c, 20 AS B) as x));
  row_to_json
 ------------------------------
  {"a":10,"x":{"c":30,"b":20}}
 (1 row)
 
 wow -- this is great.   I'll take a a look.
 
 
 Already in  9.4:
 
 andrew=# select 
 json_build_object('a',10,'x',json_build_object('c',30,'b',20));
    json_build_object
 --------------------------------------
  {"a" : 10, "x" : {"c" : 30, "b" : 20}}
 (1 row)

 So I'm not sure why we want another mechanism unless it's needed in some 
 other context.

I've wanted to name the fields of rows created with ROW() on more than
one occasion, quite independent from whether the resulting row is converted
to JSON or not. And quite apart from usefulness, this is a matter of
orthogonality. If we have named fields in anonymous record types, we should
provide a convenient way of specifying the field names.
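
For illustration, under the proposed (hypothetical) syntax the field names
would be usable independently of any JSON conversion:

  SELECT (ROW(10 AS a, 20 AS b)).a;            -- field access by name
  SELECT row_to_json(ROW(10 AS a, 20 AS b));   -- => {"a":10,"b":20}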

So to summarize, I think this is an excellent idea, json_build_object
notwithstanding.

best regards,
Florian Pflug





[HACKERS] KEY UPDATE / UPDATE / NO KEY UPDATE distinction vs. README.tuplock

2014-10-23 Thread Florian Pflug
Hi,

It seems that README.tuplock never got updated when the KEY SHARE patch's
lock levels were changed from being KEY UPDATE / UPDATE / SHARE / KEY SHARE
to UPDATE / NO KEY UPDATE / SHARE / KEY SHARE.

Thus, as it stands, that file implies that SELECT FOR UPDATE obtains a
weaker lock than an actual UPDATE might take (if that update modifies key
columns) -- according to README.tuplock, the former doesn't conflict with
KEY SHARE while the actual UPDATE would.

But this isn't actually the case in the committed version of the patch -
one now needs to explicitly request that weaker lock level with SELECT FOR
NO KEY UPDATE.
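
For illustration (t being some hypothetical table):

  SELECT * FROM t WHERE id = 1 FOR NO KEY UPDATE;  -- weaker lock; doesn't
                                                   -- conflict with KEY SHARE
  SELECT * FROM t WHERE id = 1 FOR UPDATE;         -- stronger lock; conflicts
                                                   -- with KEY SHARE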

The attached patch updates README.tuplock accordingly.

best regards,
Florian Pflug


README.tuplock.patch
Description: Binary data



Re: [HACKERS] Question about RI checks

2014-10-23 Thread Florian Pflug
On Oct23, 2014, at 17:45 , Kevin Grittner kgri...@ymail.com wrote:
 Every way I look at it, inside a REPEATABLE READ or SERIALIZABLE
 transaction a check for child rows when validating a parent DELETE
 should consider both rows which exist according to the transaction
 snapshot and according to a current snapshot.  Interestingly, the
 run of the query passes both snapshots through to the executor, but
 for this query the estate-es_crosscheck_snapshot field (which
 contains the transaction snapshot) doesn't seem to be consulted.
 It makes me wonder whether we were at some point doing this right
 and it later got broken.

I've been pondering a completely different way to fix this. Many years
ago I tried to get rid of the crosscheck snapshot completely by changing
the way locking conflicts are treated for REPEATABLE READ transactions
and above.

The basic idea is that taking a share lock on a row implies that you're
going to apply further changes whose correctness depends on existence
of the row you lock. That, in particular, applies to the locks taken
by RI triggers -- we lock the parent row before we add children, because
the children's existence necessitates the existence of the parent. If
you take an exclusive lock, OTOH, that implies a modification of the row
itself (we never explicitly take that lock during an UPDATE or DELETE,
but we do so implicitly, because UPDATEs and DELETEs conflict with SHARE
locks). So after obtaining such a lock, it's the lock holder's responsibility
to check that the desired update doesn't break anything, i.e. in the case
of RI that it doesn't create any orphaned children.

The only reason we need the crosscheck snapshot to do that is because
children may have been added (and the change committed) *after* the
transaction which removed the parent has taken its snapshot, but *before*
that transaction locks the parent row.

My proposal is to instead extend the locking protocol to prevent that.
Essentially, we have to raise a serialization error whenever

  1) We attempt to exclusively lock a row (this includes updating or deleting
 it), and

  2) somebody else did hold a share lock on that row recently, and

  3) That somebody is invisible to us according to our snapshot.

My initial attempt to do that failed, because we used to have very little
means of storing the locking history - the only source of information was
the xmax field, so any update of a tuple removed information about previous
lock holders - even if that update was later aborted. I pondered using
multi-xids for this, but at the time I deemed that too risky - back at the
time, they had a few wraparound issues and the like which were OK for share
locks, but not for what I intended.

But now that we have KEY SHARE locks, the situation changes. We now rely on
multi-xids to a much greater extent, and AFAIK multi-xid wraparound is now
correctly dealt with. We also already ensure that the information contained
in multi-xids is preserved across tuple updates (otherwise, updating a row
on which someone holds a KEY SHARE lock would be broken).

So all that is missing, I think, is

  1) To make sure we only remove a multi-xid if none of the xids are invisible
 to any snapshot (i.e. lie before GlobalXmin or something like that).

  2) When we acquire a lock (either explicitly or implicitly by doing an
 UPDATE or DELETE) check if all previous committed lock holders are
 visible according to our snapshot, and raise a serialization error
 if not.
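
To illustrate the intended effect with a (hypothetical) session trace,
using the parent/child schema from my other mail:

  -- Session A takes its snapshot
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT count(*) FROM parent;

  -- Session B: the RI trigger share-locks parent row (1) before inserting
  BEGIN;
  INSERT INTO child (id, parent_id) VALUES (2, 1);
  COMMIT;

  -- Session A: under the proposed protocol this raises a serialization
  -- error -- B recently held a share lock on the row, and B is invisible
  -- to A's snapshot
  DELETE FROM parent WHERE id = 1;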

The big advantage of doing that over fixing the crosscheck logic would be
that it'd make it possible to write concurrency-safe FK triggers in any
procedural language.

best regards,
Florian Pflug





Re: [HACKERS] Question about RI checks

2014-10-22 Thread Florian Pflug
Florian Pflug f...@phlo.org wrote:
 So in conclusion, the lock avoids raising constraint violation errors in
 a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
 constraint violation errors into serialization failures. Or at least that's
 how it looks to me.

I got the REPEATABLE READ case wrong there, and that case is actually broken!
I initially assumed that trying to lock a row whose latest version is invisible
to a transaction in mode REPEATABLE READ or above will result in a serialization
error. But for the queries run by ri_triggers.c, this is not the case.

AFAICS, the only executor node which takes the crosscheck snapshot into account
is nodeModifyTable. So both a plain SELECT and a SELECT FOR [KEY] SHARE | UPDATE
will simply run with the snapshot passed to the SPI, which for the query in
question is always a *current* snapshot (we pass the original snapshot as the
crosscheck snapshot in REPEATABLE READ). Thus, if there's a child row that
is visible to the transaction deleting the parent row, but that child was
meanwhile removed, it seems that we currently *won't* complain. The same holds,
of course, for mode SERIALIZABLE -- we don't distinguish the two in ri_triggers.c.

But that's wrong. The transaction's snapshot *would* see that row, so we
ought to raise an error. Note that this applies also to mode SERIALIZABLE, and
breaks true serializability in some cases, since we don't do conflict detection
for RI enforcement queries.

Here's a test case, involving two transaction A and B. I tried this on
REL9_4_STABLE.

Setup
-
CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
CREATE TABLE child (id int NOT NULL PRIMARY KEY,
parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (1, 1);

Failure Case

A:: set default_transaction_isolation='serializable';
A:: begin;
A:: select * from child;
 ->  id | parent_id
    ----+-----------
      1 |         1
B:: set default_transaction_isolation='serializable';
B:: begin;
B:: delete from child;
 -> DELETE 1
B:: commit;
A:: delete from parent;
 -> DELETE 1
A:: commit;

A can neither come before B in any serializable history (since the DELETE
should fail then), nor after it (since it shouldn't see the row in the child
table then). Yet we don't complain, even though both transactions run in
SERIALIZABLE mode.

On Oct21, 2014, at 19:27 , Nick Barnes nickbarne...@gmail.com wrote:
 On Wed, Oct 22, 2014 at 3:19 AM, Kevin Grittner kgri...@ymail.com wrote:
 
 It doesn't seem like this analysis considers all of the available ON
 DELETE and ON UPDATE behaviors available.  Besides RESTRICT there is
 CASCADE, SET NULL, SET DEFAULT, and NO ACTION.  Some of those
 require updating the referencing rows.
  
 I think the logic in question is specific to RESTRICT and NO ACTION.
 The other cases don't look like they need to explicitly lock anything;
 the UPDATE / DELETE itself should take care of that.

Exactly. We run the dubious SELECT FROM fkrel ... FOR KEY SHARE query *only*
for RESTRICT and NO ACTION.

I've traced through the revisions of ri_triggers.c a bit. Locking the
conflicting child rows in the parent table's DELETE and UPDATE triggers was
added in commit 0882951b0cdb4c6686e57121d56216cb2044f7eb which is dated
1999-12-08. (This was before we even had row-level SHARE locks, so the code
did a SELECT .. FOR UPDATE back then). This is also the commit which added
FOR UPDATE locking to RI_FKey_check, i.e. which ensured that parent rows are
locked when adding new child rows. Its commit message unfortunately doesn't
offer much in terms of explanations - it just says "Fixed concurrent
visibility bug".

When SHARE locks were implemented in commit
bedb78d386a47fd66b6cda2040e0a5fb545ee371, dating from 2005-04-28, it seems
that FOR UPDATE was simply replaced by FOR SHARE throughout ri_triggers.c.
The same seems to have happened when KEY SHARE locks were introduced on
2013-01-13 with commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182.

best regards,
Florian Pflug





Re: [HACKERS] Question about RI checks

2014-10-22 Thread Florian Pflug
 This should not be considered a problem for repeatable read
 transactions because the change in visible rows meet the definition
 of phantom reads, which are allowed in repeatable read: A
 transaction re-executes a query returning a set of rows that
 satisfy a search condition and finds that the set of rows
 satisfying the condition has changed due to another
 recently-committed transaction. 

Now I'm confused. Isn't the whole point of REPEATABLE READ to provide, well, 
repeatable reads? Also, note that after the DELETE FROM parent, further SELECTs
in the same transaction will use the original snapshot again, and thus will see
the conflicting child rows again that were ignored by the RI trigger. But they 
won't, of course, see the parent row.
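
Concretely, continuing transaction A from the failure case in my earlier
mail, after its DELETE FROM parent succeeded:

  SELECT * FROM child;    -- original snapshot: still sees the child row
                          -- that B concurrently deleted
  SELECT * FROM parent;   -- A's own DELETE is visible: no rows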

IOW, transaction A will, after the delete, see a state of the database in which
the FK constraint is broken. I don't think that's acceptable in any isolation
level.

Best regards,
Florian Pflug



Re: [HACKERS] Patch: Add launchd Support

2014-10-21 Thread Florian Pflug
On Oct21, 2014, at 02:53 , Wim Lewis w...@omnigroup.com wrote:
 
 2) AFAICS, this .plist file doesn't do anything about launchd's habit of
 not waiting for the network to come up. 

 If true, the job will be kept alive as long as the network is up, where
 up is defined as at least one non-loopback  interface being up and having
 IPv4 or IPv6 addresses assigned to them.  If false, the job will be kept
 alive in the inverse condition.
 
 On the other hand, it’s not unreasonable to have postgres running on a
 machine with only a loopback interface, depending on the use.

This, I think, shows the gist of the problem -- "networking is up" is not
a clearly defined state on modern desktop machines. Interfaces can come and
go all the time, and even things like ethernet may take a while to come up
during boot, or even depend on a user logging in if something like 802.1X
is used.

 We might be able to put something in LaunchEvents that gets it to fire
 when the network launches, but documentation is hella thin (and may only
 be supported on Yosemite, where there are a bunch of poorly-documented
 launchd changes).
 
 If one were desperate enough... it’s possible to dig through the launchd
 sources to make up for the gaps in the documentation (also on
 opensource.apple.com; there used to be a community-ish site for it at
 macosforge.org as well). It’s rough going, though, IIRC.

The correct way to deal with this would be to teach postgres to react to
connectivity changes dynamically. During postmaster launch, we'd ignore all
listen entries which mention non-bindable addresses. If the network
configuration changes, we'd rescan the listen entries, and attempt to
create any missing sockets.

OS X provides an API (via the SystemConfiguration framework IIRC) which allows
to subscribe to networking-changed notifications. I believe systemd provides
something similar via dbus or some such.

Whether or not doing this is worth the effort though, I don't know. I guess
it depends on whether postgres on OS X is actually used for production --
my guess is that most installations are for development and testing purposes,
but maybe my view's biased there.

 (3) I don't think you want Disabled = true.
 
 It’s the default. When you run `launchctl load -w` it overrides it to false
 in its database. I’m fine to have it be less opaque, though.
 
 Yes, AFAICT it’s conventional to specify Disabled=true in a launchd plist and
 use launchctl to enable the item.

Yup, macports also has Disabled=true in the launchd items they install.

best regards,
Florian Pflug






Re: [HACKERS] Question about RI checks

2014-10-21 Thread Florian Pflug
(CCing Alvaro, since he implemented KEY SHARE locks)

On Oct16, 2014, at 15:51 , Nick Barnes nickbarne...@gmail.com wrote:
 One of the queries in ri_triggers.c has me a little baffled.
 
 For (relatively) obvious reasons, a FK insert triggers a SELECT 1 FROM pk_rel 
 ... FOR KEY SHARE.
 For not-so-obvious reasons, a PK delete triggers a SELECT 1 FROM fk_rel ... 
 FOR KEY SHARE.
 
 I can't see what the lock on fk_rel achieves. Both operations are already
 contending for the lock on the PK row, which seems like enough to cover
 every eventuality.

I remember wondering about this too, quite a while ago. That was before we
had KEY locks, i.e. it simply read FOR SHARE back then.
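
For reference, the two queries look roughly like this - my paraphrase of
what ri_triggers.c generates; the exact text varies between versions:

  -- FK insert (RI_FKey_check): lock the parent row
  SELECT 1 FROM ONLY pk_rel x WHERE pkcol = $1 FOR KEY SHARE OF x;

  -- PK delete, RESTRICT / NO ACTION: look for (and lock) child rows
  SELECT 1 FROM ONLY fk_rel x WHERE $1 = fkcol FOR KEY SHARE OF x;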

I don't think I ever figured out if and why that lock is necessary - so here's
an attempt at unraveling this.

The lock certainly isn't required to ensure that we see all the child rows in
the fk_rel -- since we're doing this in an AFTER trigger, we're already holding
the equivalent of an UPDATE lock on the parent row, so no new fk_rel rows can
appear concurrently. Note that "seeing" here refers to an up-to-date snapshot
-- in REPEATABLE READ mode and above, our transaction's snapshot doesn't
necessarily see those rows, but ri_PerformCheck ensures that we error out if our
transaction's snapshot prevents us from seeing any newly added child rows
(c.f. detectNewRows).

However, DELETEing from fk_rel isn't restricted by any of the constraint triggers,
so child rows may still be deleted concurrently. So without the lock, it might
happen that we report a constraint violation, yet the offending row is already
gone by the time we report the error. Note that ri_PerformCheck's detectNewRows
flag doesn't enter here -- AFAIK, it only raises an error if the transaction's
snapshot sees *fewer* rows than a current snapshot. 

So AFAICS, the current behaviour (sans the KEY SHARE / SHARE confusion, see
below) is this:

In READ COMMITTED mode, we'll ignore rows that are deleted or reparented
concurrently (after waiting for the concurrent transaction to commit, of course)

In REPEATABLE READ mode and above, we'll report a serialization error if a child
row is deleted or reparented concurrently (if the concurrent transaction
committed, of course)

Without the lock, we'd report a constraint violation in both cases.

So in conclusion, the lock avoids raising constraint violation errors in
a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
constraint violation errors into serialization failures. Or at least that's
how it looks to me.

 And even if the lock serves a purpose, KEY SHARE is an odd choice, since
 the referencing field is, in general, not a key in this sense.

Hm, yeah, that's certainly weird. AFAICS, what this will do is prevent any
concurrent update of any columns mentioned in any unique index on the *fk_rel*
- but as you say, that set doesn't necessarily match the set of columns in the
FK constraint at all.

So currently, it seems that the lock only prevents concurrent DELETEs, but
not necessarily concurrent UPDATEs, even if such an UPDATE changes the parent
that a child row refers to.

Independent from whether the lock is actually desirable or not, that
inconsistency certainly looks like a bug to me...

best regards,
Florian Pflug





Re: [HACKERS] [TODO] Process pg_hba.conf keywords as case-insensitive

2014-09-10 Thread Florian Pflug
On Sep10, 2014, at 10:54 , Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp 
wrote:
 Under the new specifications, next_token will work as following,
 
  - USER  : token: USER  , case-insensitive
  - USeR: token: USeR  , case-SENSITIVE
  - +uSeR   : token: +uSeR , case-SENSITIVE
  - +UsER   : token: +UsEr , case-insensitive
  - USeR  : token: USeR , case-insensitive
 
  - +USER : token: USER  , case-insensitive, group_name
  - +uSeR   : token: uSeR  , case_SENSITIVE, group_name
  - +UsEr : token: UsEr , case-insensitive, group_name
 
  - + : token: + , (useless?)
  - @ : token: @ , (useless?)
  - @hoge: token: hoge, file_inclusion (not confirmed)
 
 
 There's a concern that Case-insensitive matching is accomplished
 by full-scan on pg_database or pg_authid so it would be rather
 slow than case-sensitive matching. This might not be acceptable
 by the community.

That does indeed sound bad. Couldn't we handle this the same
way we handle SQL identifiers, i.e. simply downcase unquoted
identifiers, and then compare case-sensitively?

So foo, Foo and FOO would all match the user called foo,
but "Foo" would match the user called Foo, and "FOO" the
user called FOO.

An unquoted + would cause whatever follows it to be interpreted
as a group name, whereas a quoted "+" would simply become part of
the user name (or group name, if there's an additional unquoted
+ before it).

So +foo would refer to the group foo, +"FOO" to the group FOO,
and +"+A" to the group +A.

I haven't checked if such an approach would be sufficiently
backwards-compatible, though.
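
For illustration, with these (hypothetical) roles

  CREATE ROLE foo LOGIN;
  CREATE ROLE "Foo" LOGIN;

an unquoted pg_hba.conf entry Foo would be downcased and match role foo,
while a quoted entry "Foo" would match role Foo.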

best regards,
Florian Pflug





Re: [HACKERS] PL/pgSQL 2

2014-09-04 Thread Florian Pflug
On Sep4, 2014, at 20:50 , Pavel Stehule pavel.steh...@gmail.com wrote:
 2014-09-04 20:31 GMT+02:00 Josh Berkus j...@agliodbs.com:
 * The ability to compile functions/procedures for faster execution.
 
 This point is more complex, because bottleneck is not in plpgsql - it is
 terrible fast against noncompiled pcode interpreted PL/SQL and it is
 comparable with PL/SQL - due different design. A expression evaluation is
 slower, partially due using a SQL expression interpret, partially due our
 arrays and strings are immutable, and any composition are slow.

That, in principle, is just an inlining problem, though. Say we translate
PL/pgSQL into LLVM bytecode in the simplest possible way by simply traversing
the parse tree, and emitting calls to the functions that the interpreter calls
now. Now, that alone wouldn't buy much, as you say. But if we additionally
compile (at least parts of) the executor machinery to LLVM bytecode too
(just once, while building postgres), the LLVM optimizer should in principle
be able to inline at least some of these calls, which *could* have considerable
benefit. The hard part would probably be to figure out how to inform the
executor which parts it may consider to be *constant* (i.e. what constitutes
the execution *plan*) and which parts can change from one execution to the
next (i.e. the executor state). 

In fact, such an approach would allow all expression evaluations to be
JITed - not only those appearing in PL/pgSQL functions but also in plain SQL.

 Cost of hidden IO cast is negative too. If we can change it, then we can
 increase a speed.

But the whole power of PL/pgSQL comes from the fact that it allows you to
use the full set of postgres data types and operators, and that it's seamlessly
integrated with SQL. Without that, PL/pgSQL is about as appealing as BASIC
as a programming language...

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-13 Thread Florian Pflug
On Apr13, 2014, at 20:23 , Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 OK, I'm marking this ready for committer attention, on the
 understanding that that doesn't include the invtrans_minmax patch.
 
 I've finished reviewing/revising/committing this submission.

Cool! Thanks!

 Some notes:
 
 * I left out the EXPLAIN ANALYZE statistics, as I still feel that they're
 of little if any use to regular users, and neither very well defined nor
 well documented.  We can revisit that later of course.
 
 * I also left out the table documenting which aggregates have this
 optimization.  That's not the kind of thing we ordinarily document,
 and it seems inevitable to me that such a table would be noteworthy
 mostly for wrong/incomplete/obsolete information in the future.

I can live with leaving each of these out individually, but leaving
them both out means there's now no way of determining how a particular
sliding window aggregation will behave. That leaves our users a bit
out in the cold IMHO.

For example, with this patch you'd usually want to write
  SUM(numeric_value::numeric(n,m))
instead of
  SUM(numeric_value)::numeric(n,m)
in a sliding window aggregation, and there should be *some* way for
users to figure this out.
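
To make that concrete (ledger, amount and ts are made-up names):

  -- cast the input, not the result, so the moving-aggregate
  -- implementation sees a fixed scale:
  SELECT ts,
         SUM(amount::numeric(12,2)) OVER w
  FROM ledger
  WINDOW w AS (ORDER BY ts ROWS BETWEEN 99 PRECEDING AND CURRENT ROW);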

We have exhaustive tables of all the functions, operators and aggregates
we support, and maintenance of those seems to work well for the most
part. Why would a table of invertible aggregates be any different? If
it's the non-locality that bothers you - I guess the same information
could be added to the descriptions of the individual aggregates, instead
of having one big table.

 * I rejected the invtrans_collecting sub-patch altogether.  I didn't
 like anything about the idea of adding a poorly-documented field to
 ArrayBuildState and then teaching some random subset of the functions
 using that struct what to do with it.  I think it would've been simpler,
 more reliable, and not that much less efficient to just do memmove's in
 shiftArrayResult. The string-aggregate changes were at least more
 localized, but they still seemed awfully messy.  And there's a bigger
 issue: these aggregates have to do O(N) work per row for a frame of length
 N anyway, so it's not clear to me that there's any big-O gain to be had
 from making them into moving aggregates.

Yeah, those probably aren't super-useful. I initially added array_agg
because with the nonstrict-forward/strict-reverse rule still in place,
there wouldn't otherwise have been an in-core aggregate which exercises
the non-strict case, which seemed like a bad idea. For the string case -
I didn't expect that to turn out to be *quite* this messy when I started
implementing it.

 Anyway, this is nice forward progress for 9.4, even if we get no further.

Yup! Thanks to everybody who made this happen!

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-11 Thread Florian Pflug
On Apr11, 2014, at 17:09 , Tom Lane t...@sss.pgh.pa.us wrote:
 Basically, this comes down to a design judgment call, and my judgment
 is differing from yours.  In the absence of opinions from others,
 I'm going to do it my way.

Ok. Are you going to do the necessary changes, or shall I? I'm happy to
leave it to you, but if you lack the time I can try to find some.

 ... SUM(int4) wouldn't need
 *any* extra state at all to be invertible, if it weren't for those pesky
 issues surrounding NULL handling. In fact, an earlier version of the
 invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The
 only reason this doesn't work nowadays is that Dean didn't like the
 forward-nonstrict-but-inverse-strict special case that made this work.
 
 Tell me about the special case here again?  Should we revisit the issue?

My original coding allows the combination of non-strict forward with strict
reverse transition functions. The combination behaved like a strict aggregate
regarding NULL handling - i.e., neither the forward nor the reverse transition
function received NULL inputs. But if the initial state was NULL, the forward
transition function *would* be called with that NULL state value upon seeing
the first non-NULL input. In the window case, the aggregation machinery would
also take care to reset the state to its initial value when it removed
the last non-NULL input from the aggregation state (just like it does for
strict aggregates today). This had two advantages

  1) First, it allows strict aggregates to use state type internal. As it
 stands now, aggregates with state type internal must implement their
 own NULL handling, even if they behave exactly like most standard
 aggregates do, namely ignore NULLS and return NULL only if there were
 no non-NULL inputs. 

  2) It allows strict aggregates whose state type and input type aren't
 binary coercible to return NULL if all inputs were NULL without any
 special coding. As it stands today, this doesn't work without some
 kind of counter in the state, because the final function otherwise
 won't know if there were non-NULL inputs or not. Aggregates whose state
 and input types are binary coercible get around that by setting the
 initial value to NULL while *still* being strict, and relying on the
 state replacement behaviour of the aggregate machinery.
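
As a minimal sketch of the workaround described in (2) - my_max here is a
hypothetical clone of MAX(int4):

  -- The sfunc is strict and no INITCOND is given, so the initial state is
  -- NULL; the aggregate machinery replaces the NULL state with the first
  -- non-NULL input, and the result is NULL iff all inputs were NULL.
  CREATE AGGREGATE my_max (int4) (
      SFUNC = int4larger,
      STYPE = int4
  );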

It, however, also has a few disadvantages

  3) It means that one needs to look at the inverse transition function's
 strictness setting even if that function is never used. 

  4) It feels a bit hacky.

(2) is what affects SUM(int4). The only reason it tracks the number of inputs
is to be able to return NULL instead of 0 if no inputs remain.

One idea to fix (3) and (4) was *explicitly* flagging aggregates as
NULL-handling or NULL-ignoring, and also as what one might call
weakly strict, i.e. as returning NULL exactly if there were no non-NULL
inputs. There are various variations of that theme possible - one flag for
each behaviour, or simply a single common behaviour flag. In the end, I
decided not to pursue that, mostly because the aggregates affected by that
issue turned out to be relatively easy to fix. For the ones with state type
internal, I simply added a counter, and I made SUM(int4) use AVG's transition
function.

I don't feel too strongly either way on this one - I initially implemented the
exception because I noticed that the NULL handling of some aggregates was
broken otherwise, and it seemed simpler to fix this in one place than going
over all the aggregates separately. OTOH, when I wrote the docs, I noticed
how hard it was to describe the behaviour accurately, which made me like it
less and less. And Dean wasn't happy with it at all, so that finally settled it.

best regards,
Florian Pflug






Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-11 Thread Florian Pflug
On Apr11, 2014, at 19:04 , Tom Lane t...@sss.pgh.pa.us wrote:
 It strikes me that your second point
 
  2) It allows strict aggregates whose state type and input type aren't
 binary coercible to return NULL if all inputs were NULL without any
 special coding. As it stands today, this doesn't work without some
 kind of counter in the state, because the final function otherwise
 won't know if there were non-NULL inputs or not. Aggregates whose state
 and input types are binary coercible get around that by setting the
 initial value to NULL while *still* being strict, and relying on the
 state replacement behaviour of the aggregate machinery.
 
 could be addressed by the initfunc idea, but I'm still not sufficiently
 excited by that one.  Given the problem with internal-type transition
 values, I think this could only win if there are cases where it would let
 us use a regular SQL type instead of internal for the transition value;
 and I'm not sure that there are many/any such cases.

Yes, the idea had come up at some point during the review discussion. I
agree that it's only worthwhile if it works for state type internal - though
I think there ought to be a way to allow that. We could for example simply
decree that the initfunc's first parameter must be of type internal if its
return type is, and then just pass NULL for that parameter.

What I like about the initfunc idea is that it also naturally extends to
ordered-set aggregates, I think it'd be very useful for some possible
ordered-set aggregates to receive their direct arguments beforehand and not
afterwards. 

But that all seems largely orthogonal to the invtrans patch.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-11 Thread Florian Pflug
On Apr11, 2014, at 19:42 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 Yes, the idea had come up at some point during the review discussion. I
 agree that it's only worthwhile if it works for state type internal - though
 I think there ought to be a way to allow that. We could for example simply
 decree that the initfunc's first parameter must be of type internal if it's
 return type is, and then just pass NULL for that parameter.
 
 I had thought about that, but it doesn't really work since it'd be
 violating the strictness spec of the function.

Only if we insist on passing SQL NULL, not if we passed a non-NULL value
that happens to be (char*)0.

Or we could simply require the initfunc to be non-strict in the case of
state type internal.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-10 Thread Florian Pflug
On Apr10, 2014, at 02:13 , Florian Pflug f...@phlo.org wrote:
 On Apr9, 2014, at 23:17 , Florian Pflug f...@phlo.org wrote:
 On Apr9, 2014, at 21:35 , Tom Lane t...@sss.pgh.pa.us wrote:
 A quick test says that avg(int4)
 is about five percent slower than sum(int4), so that's the kind of hit
 we'd be taking on non-windowed aggregations if we do it like this.
 
  That's rather surprising though. AFAICS, there isn't much difference
 between the two transfer functions int4_sum and int4_avg_accum at all.
 
 The differences come down to (+ denoting things which ought to make
 int4_avg_accum slower compared to int4_sum, - denoting the opposite)
 
 1. +) int4_avg_accum calls AggCheckCallContext
 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
  to verify that the state is a 2-element array without NULL entries
 4. -) int4_sum checks if the input is NULL
 
 I've done a bit of profiling on this (using Instruments.app on OSX). One
 thing I missed is that inv4_avg_accum also calls pg_detoast_datum - that
 calls comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly,
 since we know that the datum cannot possibly be toasted I think (or if it
 was, nodeAgg.c should do the de-toasting).
 
 The profile also attributes a rather large percent of the total runtime of
 int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting
 rid of that would help quite a few transition functions, invertible or not.
 That certainly seems doable - we'd need a way to mark functions as internal
 support functions, and prevent user-initiated calls of such functions. 
 Transition functions marked that way could then safely scribble over their
 state arguments without having to consult AggCheckCallContext() first.
 
 ...
 
 However, I still believe the best approach at this point is to just work
 on making int4_avg_accum faster. I still see no principal reason what it
 has to be noticeably slower - the only additional work it absolutely *has*
 to perform is *one* 64-bit increment.

I played with this a bit.

Currently, int4_avg_accum invokes AggCheckCallContext every time, and also
repeatedly checks whether the array has the required dimension - which,
incidentally, is the only big difference between int4_avg_accum and int4_sum.

To avoid that, I added a flags field to fcinfo which nodeAgg uses to tell
transition functions whether they're called for the first time, or are being
called with whatever state they themselves returned last time, i.e the
n-th time.

If the n-th time flag is set, int4_avg_accum simply retrieves the state with
PG_GETARG_DATUM() instead of PG_GETARG_ARRAYTYPE_P(), relying on the fact
that it never returns toasted datums itself, and also doesn't bother to validate
the array size, for the same reason.

If the flag is not set, it uses PG_GETARG_ARRAYTYPE_COPY_P and does validate
the array size. In theory, it could further distinguish between a 1st call
in an aggregation context (where the copy is unnecessary), and a user-initiated
call (i.e. outside an aggregation). But that seems unnecessary - one additional
copy per aggregation group seems unlikely to be a problem.

With this in place, instruction-level profiling using Apple's Instruments.app
shows that int4_avg_accum takes about 1.5% of the total runtime of a simple
aggregation, while int4_sum takes about 1.2%. 

A (very rough) patch is attached.

I haven't been able to repeat Tom's measurement which shows a 5% difference in
performance between the invertible and the non-invertible versions of SUM(int4),
so I cannot say if this removes that. But the profiling I've done would
certainly indicate it should.

best regards,
Florian Pflug



invtrans_sumint4_opt2.patch
Description: Binary data



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-10 Thread Florian Pflug
On Apr10, 2014, at 21:34 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 10 April 2014 19:54, Tom Lane t...@sss.pgh.pa.us wrote:
 So if we go with that terminology, perhaps these names for the
 new CREATE AGGREGATE parameters:
 
 initfuncapplies to plain aggregation, mutually exclusive with 
 initcond
 msfunc  (or just mfunc?) forward transition for moving-agg mode
 mifunc  inverse transition for moving-agg mode
 mstype  state datatype for moving-agg mode
 msspace space estimate for mstype
 mfinalfunc  final function for moving-agg mode
 minitfunc   firsttrans for moving-agg mode
 minitcond   mutually exclusive with minitfunc
 
 I think I prefer mfunc to msfunc, but perhaps that's just my
 natural aversion to the ms prefix :-)
 
 Also, perhaps minvfunc rather than mifunc because i by itself
 could mean initial.

I still think you're getting ahead of yourselves here. The number of
aggregates which benefit from this is tiny: SUM(int2,int4) and maybe
BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
for the others, the state type is already pass-by-ref.

I don't think we should be adding that much additional machinery
until it has been conclusively demonstrated that the performance
regression cannot be fixed any other way. Which, quite frankly, I
don't believe. Nothing in int4_avg_accum looks particularly expensive,
and the things that *do* seem to cost something certainly can be made
cheaper. c.f. the patch I just posted.

Another reason I'm so opposed to this is that inverse transition
functions might not be the last kind of transition functions we ever
add. For example, if we ever get ROLLUP/CUBE, we might want to have
a mergefunc which takes two aggregation states and combines them 
into one. What do we do if we add those? Add yet a another set of
mergable transition functions? What about the various combinations
of invertible/non-invertible mergable/non-mergable that could result?
The opportunity cost seems pretty high here...

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-10 Thread Florian Pflug
On Apr11, 2014, at 00:07 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 I still think you're getting ahead of yourselves here. The number of
 aggregates which benefit from this is tiny SUM(int2,int4) and maybe
 BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
 for the others, the state type is already pass-by-ref.
 
 That argument is reasonable for the initfunc idea, but it doesn't apply
 otherwise.

Why not? AFAICS, the increase in cost comes from going from a
by-value to a by-reference state type. Once you're using a by-reference
type, you already pay the overhead of the additional dereferences, and
for calling AggCheckCallContext() or some equivalent. 

 Another reason I'm so opposed to this is that inverse transition
 functions might not be the last kind of transition functions we ever
 add. For example, if we ever get ROLLUP/CUBE, we might want to have
 a mergefunc which takes two aggregation states and combines them 
 into one. What do we do if we add those?
 
 Make more pg_aggregate columns.  What exactly do you think is either
 easier or harder about such cases?  Also, maybe I'm misremembering
 the spec, but ROLLUP/CUBE wouldn't apply to window functions would
 they?  So if your argument is based on the assumption that these
 features need to combine, I'm not sure it's true.

Well, it was just an example - there might be other future extensions
which *do* need to combine. And we've been known to go beyond the spec
sometimes...

 Furthermore, I do not buy the argument that if we hack hard enough,
 we can make the performance cost of forcing the sfunc to do double duty
 negligible.  In the first place, that argument is unsupported by much
 evidence, and in the second place, it will certainly cost us complexity
 to make the performance issue go away.  Instead we can just design the
 problem out, for nothing that I see as a serious drawback.

My argument is that it costs us more complexity to duplicate everything
for the invertible case, *and* the result seems less flexible - not
from the POV of aggregate implementations, but from the POV of future
extensions.

As for evidence - have you looked at the patch I posted? I'd be very
interested to know if it removes the performance differences you saw.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-10 Thread Florian Pflug
On Apr11, 2014, at 01:30 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 As for evidence - have you looked at the patch I posted? I'd be very
 interested to know if it removes the performance differences you saw.
 
 (1) You can't really prove the absence of a performance issue by showing
 that one specific aggregate doesn't have an issue.

I'm claiming that SUM(int4) is about as simple as it gets, so if the
effect can be mitigated there, it can be mitigated everywhere. The more
complex a forward-only transition function is, the less will an added if
or two hurt.

 (2) These results
 (mine as well as yours) seem mighty platform-dependent, so the fact you
 don't see an issue doesn't mean someone else won't.

Yeah, though *any* change - mine, the one your propose, and any other -
has the potential to hurt some platform due to weird interactions (say,
cache trashing). 

 (3) A new
 FunctionCallInfoData field just for this?  Surely not.  There's got to be
 a distributed cost to that (although I notice you didn't bother
 initializing the field most places, which is cheating).

I think the field doesn't actually increase the size of the structure at
all - at least not if the bool before it has size 1 and the short following
it is 2-byte aligned. Or at least that was why I made it a char, and added
it at the place I did. But I might be missing something...

Also, InitFunctionCallInfoData *does* initialize the flags to zero. Though
maybe not everybody uses that - I didn't check, this was just a quick hack.

 Now point 3 could be addressed by doing the signaling in some other way
 with the existing context field.  But it's still the case that you're
 trying to band-aid a bad design.  There's no good reason to make the sfunc
 do extra work to be invertible in contexts where we know, with certainty,
 that that work is useless.

This is the principal point where we disagree, I think. From my POV, the
problem isn't invertibility here at all. Heck, SUM(int4) wouldn't need
*any* extra state at all to be invertible, if it weren't for those pesky
issues surrounding NULL handling. In fact, an earlier version of the
invtrans patch *did* use int4_sum as SUM(int4)'s transition function! The
only reason this doesn't work nowadays is that Dean didn't like the
forward-nonstrict-but-inverse-strict special case that made this work.

The way I see it, the main problem is the drop in performance that comes
from using a pass-by-ref state type. Which IMHO happens because this
*already* is a heavily band-aided design - all the state validation and
if (AggCheckCallContext(,NULL)) stuff really works around the fact that
transition functions *aren't* supposed to be user-called, yet they are,
and must deal.

Which is why I feel that having two separate sets of transition functions
and state types solves the wrong problem. If we find a way to prevent
transition functions from being called directly, we'll shave a few cycles
of quite a few existing aggregates, invertible or not. If we find a way
around the initfunc-for-internal-statetype problem you discovered, the
implementation would get much simpler, since we could then make nearly
all of them strict. And this would again shave off a few cycles - for lots
of NULL inputs, the effect could even be large.

Compared to that, the benefit of having a completely separate set of
transition functions and state types for the invertible case is much
less tangible, IMHO.

 Especially not when we know that even a few instructions of extra work
 can be performance-significant.

But where do we draw that line? nodeWindowAgg.c quite certainly wastes
about as many cycles as int4_avg_accum does on various checks that are
unnecessary in the non-sliding-window case. Do we duplicate those
functions too?

best regards,
Florian Pflug





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-09 Thread Florian Pflug
On Apr9, 2014, at 21:35 , Tom Lane t...@sss.pgh.pa.us wrote:
 As a quick check, I compared aggregation performance in HEAD, non-assert
 builds, with and without --disable-float8-byval on a 64-bit machine.
 So this tests replacing a pass-by-val transition datatype with a
 pass-by-ref one without any other changes.  There's essentially no
 difference in performance of sum(int4), AFAICT, but that's because
 int4_sum goes out of its way to cheat and avoid palloc overhead.

 I looked to the bit_and() aggregates to see what would happen to
 an aggregate not thus optimized.  As expected, int4 and int8 bit_and
 are just about the same speed if int8 is pass by value ... but if it's
 pass by ref, the int8 case is a good 60% slower.

True, but that just means that aggregate transition functions really
ought to update the state in-place, no?
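
Something like this minimal sketch - loosely modeled on int4_avg_accum, not
actual core code - where the pass-by-ref state is updated in place, so no
palloc happens per input row:

  #include "postgres.h"
  #include "fmgr.h"
  #include "utils/array.h"

  PG_FUNCTION_INFO_V1(int4_sum_inplace_sketch);

  Datum
  int4_sum_inplace_sketch(PG_FUNCTION_ARGS)
  {
      ArrayType  *state = PG_GETARG_ARRAYTYPE_P(0);   /* {count, sum} */
      int64      *vals;

      if (!AggCheckCallContext(fcinfo, NULL))
          elog(ERROR, "must be called in aggregate context");

      /* nodeAgg.c owns the state, so scribbling over it is safe */
      vals = (int64 *) ARR_DATA_PTR(state);
      vals[0] += 1;                          /* number of inputs */
      vals[1] += PG_GETARG_INT32(1);         /* running sum */

      PG_RETURN_ARRAYTYPE_P(state);
  }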

 So added palloc overhead, at least, is a no-go.  I see that the patched
 version of sum(int4) avoids that trap, but nonetheless it's replaced a
 pretty cheap transition function with a less cheap function, namely the
 function previously used for avg(int4).  A quick test says that avg(int4)
 is about five percent slower than sum(int4), so that's the kind of hit
 we'd be taking on non-windowed aggregations if we do it like this.

That's rather surprising though. AFAICS, there isn't much difference
between the two transfer functions int4_sum and int4_avg_accum at all.

The differences come down to (+ denoting things which ought to make
int4_avg_accum slower compared to int4_sum, - denoting the opposite)

 1. +) int4_avg_accum calls AggCheckCallContext
 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
   to verify that the state is a 2-element array without NULL entries
 4. -) int4_sum checks if the input is NULL

The number of conditional branches should be about the same (and all are
seldom taken). The validity checks on the state array, i.e. (3), should
be rather cheap I think - not quite as cheap as PG_ARGISNULL maybe, but
not so much more expensive either. That leaves the AggCheckCallContext call.
If that call costs us 5%, maybe we can find a way to make it faster, or
get rid of it entirely? Still seems a lot for a call to a not-very-complex
function, though...

I'll go and check the disassembly - maybe something in int4_avg_accum turns
out to be more complex than is immediately obvious. I'll also try to create
a call profile, unless you already have one from your test runs.

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-09 Thread Florian Pflug
On Apr9, 2014, at 20:20 , Tom Lane t...@sss.pgh.pa.us wrote:
 There was discussion upthread of providing
 two separate forward transition functions, but Florian argued that that
 would do nothing that you couldn't accomplish with a runtime check in
 the forward function.  I think that's nonsense though, because one of
 the key points here is that an invertible aggregate may require a more
 complex transition state data structure --- in particular, if you're
 forced to go from a pass-by-value to a pass-by-reference data type, right
 there you are going to take a big hit in aggregate performance, and there
 is no way for the forward transition function to avoid it.

To be precise, my point was that *only* having a separate non-invertible
forward transition function is pointless, exactly because of the reason
you gave - that won't allow a cheaper state representation for the
non-invertible case. So all such a non-invertible forward transition
function can do is to skip some bookkeeping that's required by the
inverse transition function - and *that* can just as easily be accomplished
by a runtime check.

I was (and still am) not in favour of duplicating the whole quadruple of
(state, initialvalue, transferfunction, finalfunction) because it seems
excessive. In fact, I believed that doing this would probably be grounds for
outright rejection of the patch, on the basis of catalog bloat. And your
initial response to this suggestion seemed to confirm this.

 The patch has in fact already done that to a couple of basic aggregates like
 sum(int4).  Has anyone bothered to test what side-effects that has on
 non-windowed aggregation performance?

I'm pretty sure David Rowley did some benchmarking. The results should be
in this thread somewhere I think, but they currently evade me... Maybe David
can re-post, if he's following this...

 I think what'd make sense is to have a separate forward function *and*
 separate state datatype to be used when we want invertible aggregation
 (hm, and I guess a separate final function too).  So an aggregate
 definition would include two entirely independent implementations.
 If they chance to share sfunc and stype, fine, but they don't have to.
 
 This would mean we'd need some new names for the doppelgangers of the
 CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond
 (but not sortop).  I guess that'd open up a chance to use a more uniform
 naming scheme, but I'm not too sure what would be good.

If we really go down that road (and I'm far from convinced), then maybe
instead of having a bunch of additional fields, we could have separate
entries in pg_aggregate for the two cases, with links between them.

The non-invertible aggregates would have something like _noninv or so
appended to their name, and we'd automatically use them if we know we
won't need to remove entries from the aggregation state. That would also
allow the users to *force* the non-invertible aggregate to be used by
simply saying SUM_NONINV instead of SUM.

Then all we'd need would be an additional OID field that links the
invertible to the non-invertible definition.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-09 Thread Florian Pflug
On Apr9, 2014, at 23:17 , Florian Pflug f...@phlo.org wrote:

 On Apr9, 2014, at 21:35 , Tom Lane t...@sss.pgh.pa.us wrote:
 A quick test says that avg(int4)
 is about five percent slower than sum(int4), so that's the kind of hit
 we'd be taking on non-windowed aggregations if we do it like this.
 
 That's rather surprising though. AFAICS, there isn't much difference
 between the two transfer functions int4_sum and int4_avg_accum at all.
 
 The differences come down to (+ denoting things which ought to make
 int4_avg_accum slower compared to int4_sum, - denoting the opposite)
 
 1. +) int4_avg_accum calls AggCheckCallContext
 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
   to verify that the state is a 2-element array without NULL entries
 4. -) int4_sum checks if the input is NULL

I've done a bit of profiling on this (using Instruments.app on OSX). One
thing I missed is that int4_avg_accum also calls pg_detoast_datum - that
call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly,
since we know that the datum cannot possibly be toasted I think (or if it
was, nodeAgg.c should do the de-toasting).
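
If we can rely on that, the detoast test could be skipped whenever we're
called as a transition function - a sketch of the shortcut (my assumption,
not something the current patch does):

  ArrayType  *state;

  if (AggCheckCallContext(fcinfo, NULL))
      state = (ArrayType *) PG_GETARG_POINTER(0);   /* skip pg_detoast_datum */
  else
      state = PG_GETARG_ARRAYTYPE_P(0);             /* user call: play it safe */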

The profile also attributes a rather large percent of the total runtime of
int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting
rid of that would help quite a few transition functions, invertible or not.
That certainly seems doable - we'd need a way to mark functions as internal
support functions, and prevent user-initiated calls of such functions. 
Transition functions marked that way could then safely scribble over their
state arguments without having to consult AggCheckCallContext() first.

I've also compared the disassemblies of int4_sum and int4_avg_accum, and
apart from these differences, they are pretty similar.

Also, the *only* reason that SUM(int2|int4) cannot use int8 as its
transition type is that it needs to return NULL, not 0, if zero rows
were aggregated. It might seem that it could just use int8 as state
with initial value NULL, but that only works if the transition functions
are non-strict (since the special case of state type == input type doesn't
apply here). And non-strict transition functions need to deal with
NULL inputs themselves, which means counting the number of non-NULL inputs.

That problem would go away though if we had support for an initial function,
which would receive the first input value and initialize the state. In
the case of SUM(), the initial function would be strict, and thus would be
called on the first non-NULL input value, which it'd convert to int8 and
return as the new state. 
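
Sketched in C, such a hypothetical initial function for SUM(int4) would be
trivial (the initfunc API itself doesn't exist, of course - names are made up):

  #include "postgres.h"
  #include "fmgr.h"

  PG_FUNCTION_INFO_V1(int4_sum_init_sketch);

  /* Strict, so it'd fire on the first non-NULL input and turn it into
   * the initial int8 state - no NULL counting required afterwards. */
  Datum
  int4_sum_init_sketch(PG_FUNCTION_ARGS)
  {
      PG_RETURN_INT64((int64) PG_GETARG_INT32(0));
  }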

However, I still believe the best approach at this point is to just work
on making int4_avg_accum faster. I still see no principled reason why it
has to be noticeably slower - the only additional work it absolutely *has*
to perform is *one* 64-bit increment.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-08 Thread Florian Pflug
On Apr9, 2014, at 02:55 , David Rowley dgrowle...@gmail.com wrote:
 On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug f...@phlo.org wrote:
 
 As explained above, invtrans_bool is a bit problematic, since it carries
 a real risk of performance regressions. It's included for completeness'
 sake, and should probably not be committed at this time.
 
 Did you mean to write invtrans_minmax? Otherwise you didn't explain about
 your concerns with bool.

Grmpf. Should have re-read that once more before sending :-(

Yes, I meant invtrans_minmax is problematic! invtrans_bool is fine, the
inverse transition function never fails for BOOL_AND and BOOL_OR. This
is why I factored it out into a separate patch, to make it easy to not
apply the MIN/MAX stuff, while still applying the BOOL stuff. Sorry for
the confusion.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-07 Thread Florian Pflug
On Apr5, 2014, at 09:55 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 5 April 2014 08:38, Dean Rasheed dean.a.rash...@gmail.com wrote:
 [snip] releasing it in this state feels a little half-baked
 to me.
 
 
 I regret writing that almost as soon as I sent it. The last of those
 queries is now over 10 times faster than HEAD, which is certainly
 worthwhile. What bugs me is that there is so much more potential in
 this patch.

Well, the perfect is the enemy of the good as they say. By all means,
let's get rid of the O(n^2) behaviour in 9.5, but let's get a basic
version into 9.4.

 I think it's pretty close to being committable, but I fear that time
 is running out for 9.4.

I'm not aware of any open issues in the basic patch, other than
scaling the reported calling statistics by nloops, which should be
trivial. I'll post an updated patch later today.

I don't really expect all the add-on patches to make it into 9.4  -
they don't seem to have gotten much attention yet - but at least 
the inverse transition functions for the basic arithmetic aggregates
should be doable I hope.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-04 Thread Florian Pflug
 ), which seem reasonable. But
 then I started testing performance, and I found cases where the
 improvement is not nearly what I expected.
 
 The example cited at the start of this thread is indeed orders of
 magnitude faster than HEAD:
 
 SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
 [...]
 I'm not sure how much additional work is required to sort this out,
 but to me it looks more realistic to target 9.5 than 9.4, so at this
 point I tend to think that the patch ought to be marked as returned
 with feedback.
 
 Thoughts?
 
 Regards,
 Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-04 Thread Florian Pflug

 On 04.04.2014, at 09:40, Dean Rasheed dean.a.rash...@gmail.com wrote:
 
 I'm not sure how much additional work is required to sort this out,
 but to me it looks more realistic to target 9.5 than 9.4, so at this
 point I tend to think that the patch ought to be marked as returned
 with feedback.

I think the patch is worthwhile, even without this additional optimization. In
fact, if the optimization were part of the patch, there would probably be calls
to factor it out, on the grounds that the patch is already rather large.

I don't see what bumping the whole thing to 9.5 buys, compared to applying what
we have now, and optimizing further in 9.5.

best regards,
Florian Pflug

PS: Sorry for the broken mail I sent earlier - mis-touched on my phone ;-(

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-03-27 Thread Florian Pflug
First, sorry guys for letting this slide - I was overwhelmed by other work,
and this kind of slipped my mind :-(

On Mar27, 2014, at 09:04 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 26 March 2014 19:43, David Rowley dgrowle...@gmail.com wrote:
 On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 
 David Rowley dgrowle...@gmail.com writes:
 I've attached an updated invtrans_strictstrict_base patch which has the
 feature removed.
 
 What is the state of play on this patch?  Is the latest version what's in
 
 http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org
 plus this sub-patch?  Is everybody reasonably happy with it?  I don't
 see it marked ready for committer in the CF app, but time is running
 out.
 
 
 As far as I know the only concern left was around the extra stats in the
 explain output, which I removed in the patch I attached in the previous
 email.
 
 
 Agreed. That was my last concern regarding the base patch, and I agree
 that removing the new explain output is probably the best course of
 action, given that we haven't reached consensus as to what the most
 useful output would be.

After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e.
simply reporting the total number of invocations of the forward transition
functions, and the total number of invocations of the reverse transition
function, over reporting nothing. The labels of the two counts would simply
be "Forward Transitions" and "Reverse Transitions".

But I don't want this issue to prevent us from getting this patch into 9.4,
so if there are objections to this, I'll rip out the EXPLAIN stuff
altogether.

 The invtrans_strictstrict_base.patch in my previous email replaces the
 invtrans_strictstrict_base_038070.patch that Florian sent here
 http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org
 all of the other patches are unchanged so it's safe to use Florian's latest
 ones
 
 Perhaps Dean can confirm that there's nothing else outstanding?
 
 
 Florian mentioned upthread that the docs hadn't been updated to
 reflect the latest changes, so I think they need a little attention.

I'll see to updating the docs, and will post a final patch within the next
few days.

Dean, have you by chance looked at the other patches yet?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-03-07 Thread Florian Pflug
On Mar5, 2014, at 23:49 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 On Mar5, 2014, at 18:37 , Tom Lane t...@sss.pgh.pa.us wrote:
 My advice is to lose the EXPLAIN output entirely.  If the authors of
 the patch can't agree on what it means, what hope have everyday users
 got of making sense of it?
 
 The question isn't what the current output means, but whether it's a
 good metric to report or not.
 
 If you can't agree, then it isn't.

Probably, yes, so let's find something that *is* a good metric.

(BTW, it's not the authors who disagree here. It was me who put the EXPLAIN
feature in, and Dean reviewed it and found it confusing. The original
author David seems to have run out of time to work on this, and AFAIK hasn't
weighed in on that particular feature at all)

 If we don't report anything, then how would a user check whether a query
 is slow because of O(n^2) behaviour of a windowed aggregate, or because
 of some other reasons?
 
 [ shrug... ]  They can see whether the Window plan node is where the time
 is going.  It's not apparent to me that the extra numbers you propose to
 report will edify anybody.

By the same line of reasoning, every metric except execution time is
superfluous. Comparing execution times really is a horrible way to measure
this - not only does it include all kinds of things that have nothing to do
with the restart behaviour of aggregates, you'd also have to construct a
base-line query first which is guaranteed to not restart.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ctl status with nonexistent data directory

2014-03-06 Thread Florian Pflug
On Mar6, 2014, at 00:08 , Bruce Momjian br...@momjian.us wrote:
 I have addressed this issue with the attached patch:
 
   $ pg_ctl -D /lkjasdf status
    pg_ctl: directory "/lkjasdf" does not exist
    $ pg_ctl -D / status
    pg_ctl: directory "/" is not a database cluster directory
 
 One odd question is that pg_ctl status has this comment for reporting
 the exit code for non-running servers:
 
printf(_("%s: no server running\n"), progname);

/*
 * The Linux Standard Base Core Specification 3.1 says this should return
 * '3'
 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
 */
exit(3);
 
 If they haven't passed us a data directory, we don't really know if the
 server is running or not, so the patch just returns '1'.

Why change the exit code at all in the ENOENT-case? If the directory
does not exist, it's fairly certain that the server is not running, so
3 seems fine. Technically, changing the return value is an API change
and might break things, so why do it if there's no clear benefit?

In the EPERM case (or, rather the non-ENOENT case), I agree with Amit
that 4 (meaning "program or service status is unknown") fits much better
than 1 (meaning "program is dead and /var/run pid file exists"). So *if*
we change it at all, we should change it to 4, not to some other, equally
arbitrary value.
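
In code, the policy I'm arguing for would look roughly like this (just a
sketch, not the actual patch; write_stderr, progname and pg_data as in
pg_ctl.c, plus <sys/stat.h>, <errno.h> and <string.h>):

  struct stat statbuf;

  if (stat(pg_data, &statbuf) != 0)
  {
      if (errno == ENOENT)
      {
          write_stderr(_("%s: directory \"%s\" does not exist\n"),
                       progname, pg_data);
          exit(3);    /* LSB: program is not running */
      }
      write_stderr(_("%s: could not access directory \"%s\": %s\n"),
                   progname, pg_data, strerror(errno));
      exit(4);        /* LSB: program or service status is unknown */
  }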

best regards,
Florian Pflug







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-03-05 Thread Florian Pflug
On Mar4, 2014, at 21:09 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 3 March 2014 23:00, Florian Pflug f...@phlo.org wrote:
 * In show_windowagg_info(), this calculation looks suspicious to me:
 
   double tperrow = winaggstate->aggfwdtrans /
   (inst->nloops * inst->ntuples);
 
 If the node is executed multiple times, aggfwdtrans will be reset in
 each loop, so the transitions per row figure will be under-estimated.
 ISTM that if you want to report on this, you'd need aggfwdtrans to be
 reset once per query, but I'm not sure exactly how to do that.
 
 ...
 
 Actually, I think it's misleading to only count forward transition
 function calls, because a call to the inverse transition function
 still represents a state transition, and is likely to be around the
 same cost. For a window of size 2, there would not be much advantage
 to using inverse transition functions, because it would be around 2
 transitions per row either way.
 
 True. In fact, I pondered whether to avoid using the inverse transition
 function for windows of 2 rows. In the end, I didn't because I felt that
 it makes custom aggregates harder to test.
 
 On the question of whether to count inverse transition function calls -
 the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
 number of state transitions, but rather to show whether the aggregation
 has O(n) or O(n^2) behaviour. The idea being that a value close to 1
 means inverse transition function works as expected, and larger values
 mean not working so well.
 
 Regarding multiple evaluations - I think I based the behaviour on how
 ntuples works, which also only reports the value of the last evaluation
 I think. But maybe I'm confused about this.
 
 
 No, it doesn't look like that's correct for multiple loops. Consider
 this example:
 
 ...
 
 It turns out that show_windowagg_info() is only called once at the
 end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
 tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
 1, you'd have to use this formula:
 
 double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

Hm, so I *was* confused - seems I mixed up ntuples (which counts the
total number of tuples over all loops) with what we report as rows
(i.e. the average number of rows per loop). Thanks for clearing that up!

 I'm still not convinced that's the most useful thing to report though.
 Personally, I'd prefer to just see the separate counts, e.g.:
 
   ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual
 time=0.019..0.094 rows=25 loops=4)
 Output: sum(i.i) OVER (?)
 Forward transitions: 25  Inverse transitions: 25
 
 IMO that gives a clearer picture of what's going on.

My problem with this is that if there are multiple aggregates, the
numbers don't really count the number of forward and reverse transfer
function invocations. What nodeWindowAgg.c really counts is the number
of *rows* it has to fetch from the tuple store to perform forward
aggregations - the relevant code snippet is

  if (numaggs_restart > 0)
      winstate->aggfwdtrans += (winstate->aggregatedupto
                                - winstate->frameheadpos);
  else
      winstate->aggfwdtrans += (winstate->aggregatedupto
                                - aggregatedupto_nonrestarted);

Now we could of course report these figures per aggregate instead of
only once per aggregation node. But as I said earlier, my guess is that
people usually won't be interested in the precise counts - most likely,
what you want to know is "how much do the restarts cost me".

When I added the EXPLAIN stuff, I initially simply reported the number
of times nodeWindowAgg has to restart the aggregation. The problem with
that approach is that not all restarts carry the same cost. If the frames
either don't overlap at all or are identical, restarts don't cause any
additional work. And for fixed-length frames (BETWEEN n PRECEDING AND
m FOLLOWING), the performance effects of restarts depend on m-n.

Which is why I made it count the number of aggregated input rows instead.

Having said that, I'm not really 100% happy with the name
"Transitions Per Row" for this - it was simply the best I could come up with
that was reasonably short. And I'm certainly open to reporting the absolute
count instead of a factor relative to ntuples.

If we made it an absolute count, would calling this "Aggregated Rows" work
for you?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-03-05 Thread Florian Pflug
On Mar5, 2014, at 18:37 , Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 I think we really need a larger consensus on this though, so I'd be
 interested to hear what others think.
 
 My advice is to lose the EXPLAIN output entirely.  If the authors of
 the patch can't agree on what it means, what hope have everyday users
 got of making sense of it?

The question isn't what the current output means, but whether it's a
good metric to report or not.

If we don't report anything, then how would a user check whether a query
is slow because of O(n^2) behaviour of a windowed aggregate, or because
of some other reasons? If inevitability where a purely static property,
then maybe we could get away with that, and say check whether your
aggregates are invertible or not. But since we have partially invertible
aggregates, the performance characteristics depends on the input data,
so we IMHO need some way for users to check what's actually happening.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-03-05 Thread Florian Pflug
On Mar5, 2014, at 18:27 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 5 March 2014 14:35, Florian Pflug f...@phlo.org wrote:
 When I added the EXPLAIN stuff, I initially simply reported the number
 of times nodeWindowAgg has to restart the aggregation. The problem with
 that approach is that not all restarts carry the same cost. If the frames
 either don't overlap at all or are identical, restarts don't cause any
 additional work. And for fixed-length frames (BETWEEN n PRECEDING AND
 m FOLLOWING), the performance effects of restarts depend on m-n.
 
 Which is why I made it count the number of aggregated input rows instead.
 
 Having said that, I'm not really 100% happy with the name
 "Transitions Per Row" for this - it was simply the best I could come up with
 that was reasonably short. And I'm certainly open to reporting the absolute
 count instead of a factor relative to ntuples.
 
 If we made it an absolute count, would calling this "Aggregated Rows" work
 for you?
 
 I'm not sure about naming, but I think my preference would be to
 output the correct absolute counts for both the forward and inverse
 transitions (i.e. multiply by the right combinations of numaggs and
 numaggs_restart). The EXPLAIN output already tells you how many
 aggregates there are, and how many rows there are, so you'd be able to
 work out from that how much extra work it's doing.

Hm, if we do that we might as well go all the way and simply report these
numbers per aggregate, instead of once per window aggregation node. That'd
provide the maximum amount of information, and be quite unambiguous.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-28 Thread Florian Pflug
On Feb28, 2014, at 15:45 , Emre Hasegeli e...@hasegeli.com wrote:
 The problem is that pg_dump --binary-upgrade dumps objects in the extension
 on the old cluster, not just the CREATE EXTENSION statement. pg_upgrade
 fails to restore them, if the new operator class already exists on the new
 cluster as the default. It affects all users who use the extension, even
 if they are not using the inet and cidr operator classes in it.

Hm, what if we put the new opclass into an extension of its own, say inet_gist,
instead of into core? 

Couldn't we then remove the inet support from the latest version of btree_gist
(the one we'd ship with 9.4)? People who don't use the old inet opclass could
then simply upgrade the extension after running pg_upgrade to get rid of the
old, broken version. People who *do* use the old inet opclass would need to
drop their indices before doing that, then install the extension inet_gist,
and finally re-create their indices. 

People who do nothing would continue to use the old inet opclass.

inet_gist's SQL script could check whether btree_gist has been upgraded, and
if not, fail with an error like "btree_gist must be upgraded to at least
version x.y before inet_gist can be installed". That would avoid failing with
a rather cryptic error later.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal

2014-02-28 Thread Florian Pflug
On Feb28, 2014, at 05:29 , Tan Tran tankimt...@gmail.com wrote:
 I'm applying for GSoC 2014 with Postgresql and would appreciate your comments
 on my proposal (attached).
 pg_gsoc2014_TanTran.pdf

First, please include your proposal as plain, inline text next time.
That makes it easier to quote the relevant parts when replying, and
also allows your mail to be indexed correctly by the mailing list
archive.

Regarding your proposal, I think you need to explain what exactly it
is you want to achieve in more detail.

 In particular, text and bytea are EXTERNAL by default, so that substring
 operations can seek straight to the exact slice (which is O(1)) instead
 of de-toasting the whole datum (which is O(file size)). Specifically,
 varlena.c’s text_substring(...) and bytea_substring(...) call
 DatumGetTextPSlice(...), which retrieves only the slice(s) at an
 easily-computed offset.
 
 ...
 
 1. First, I will optimize array element retrieval and UTF-8 substring
 retrieval. Both are straightforward, as they involve calculating slice
 numbers and using similar code to above.

I'm confused by that - text_substring *already* attempts to only fetch
the relevant slice in the case of UTF-8. It can't do so precisely - it
needs to use a conservative estimate - but I fail to see how that can
be avoided. Since UTF-8 maps a character to anything from 1 to 6 bytes,
you can't compute the byte offset of a given character index precisely.

You could store a constant number of *characters* per slice, instead of
a constant number of *bytes*, but due to the rather large worst-case of
4 bytes per character, that would increase the storage and access overhead
4-fold for languages which can largely be represented with 1 byte per
character. That's not going to go down well...
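
To make the conservative estimate concrete, this is roughly the arithmetic
text_substring() is limited to (a sketch; the constants are assumptions):

  /* fetch characters [first, first + len) from a toasted UTF-8 text */
  int eml   = 4;       /* pg_database_encoding_max_length() for UTF-8 */
  int first = 1000;    /* character offset of the substring's start   */
  int len   = 10;      /* substring length in characters              */

  /* Every character occupies 1..eml bytes, so the wanted characters all
   * lie within the first (first + len) * eml bytes - that's the slice
   * to fetch without decoding anything: */
  int slice_size = (first + len) * eml;

  /* The fetched slice must then be walked character by character to
   * find the exact byte boundaries - the leading characters cannot
   * simply be skipped. */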

I haven't looked at how we currently handle arrays, but the problems
there are similar. For arrays containing variable-length types, you can't
compute the byte offset from the index. It's even worse than for varchar,
because the range of possible element lengths is much wider - one array
element might be only a few bytes long, while another may be 1kB or more...

 2. Second, I will implement a SPLITTER clause for the CREATE TYPE
 statement. As 1 proposes, one would define a type, for example:
   CREATE TYPE my_xml
 LIKE xml
 SPLITTER my_xml_splitter;

As far as I can tell, the idea is to allow a datatype to influence how
it's split into chunks for TOASTing so that functions can fetch only
the required slices more easily. To judge whether that is worthwhile or
not, you'd have to provide a concrete example of when such a facility
would be useful.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-27 Thread Florian Pflug
On Feb27, 2014, at 11:39 , Emre Hasegeli e...@hasegeli.com wrote:
 2014-02-24 17:55, Bruce Momjian br...@momjian.us:
 
 pg_upgrade has _never_ modified the old cluster, and I am hesitant to do
 that now.  Can we make the changes in the new cluster, or in pg_dump
 when in binary upgrade mode?
 
 It can be possible to update the new operator class in the new cluster
 as not default, before restore. After the restore, pg_upgrade can upgrade
 the btree_gist extension and reset the operator class as the default.
 pg_upgrade suggests to re-initdb the new cluster if it fails, anyway. Do
 you think it is a better solution? I think it will be more complicated
 to do in pg_dump when in binary upgrade mode.

Maybe I'm missing something, but isn't the gist of the problem here that
pg_dump won't explicitly state the operator class if it's the default?

If so, can't we just make pg_dump always spell out the operator class
explicitly? Then changing the default will never change the meaning of
database dumps, so upgraded clusters will simply continue to use the
old (now non-default) opclass, no?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-27 Thread Florian Pflug
On Feb27, 2014, at 17:56 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 Maybe I'm missing something, but isn't the gist of the problem here that
 pg_dump won't explicitly state the operator class if it's the default?
 
 That's not a bug, it's a feature, for much the same reasons that pg_dump
 tries to minimize explicit schema-qualification.

I fail to see the value in this for opclasses. It's certainly nice for
schema qualifications, because dumping one schema and restoring into a
different schema is probably quite common. But how often does anyone dump
a database and restore it into a database with a different default opclass
for some type?

Or is the idea just to keep the dump as readable as possible?

 At least, it's a feature for ordinary dumps.  I'm not sure whether it
 might be a good idea for binary_upgrade dumps.  That would depend on
 our policy about whether a new opclass has to have a new (and necessarily
 weird) name.  If we try to make the new opclass still have the nicest
 name then it won't help at all for pg_dump to dump the old name.

Well, given the choice between not ever being able to change the default
opclass of a type, and not being able to re-use an old opclass' name,
I'd pick the latter. Especially because for default opclasses, users don't
usually have to know the name anyway.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-02-25 Thread Florian Pflug
On Feb24, 2014, at 17:50 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 20 February 2014 01:48, Florian Pflug f...@phlo.org wrote:
 On Jan29, 2014, at 13:45 , Florian Pflug f...@phlo.org wrote:
 In fact, I'm
 currently leaning towards just forbidding non-strict forward transition
 functions with strict inverses, and adding non-NULL counters to the
 aggregates that then require them. It's really only the SUM() aggregates
 that are affected by this, I think.
 
 I finally got around to doing that, and the results aren't too bad. The
 attached patches required that the strictness settings of the forward and
 reverse transition functions agree, and employ exactly the same NULL-skipping
 logic we always had.
 
 The only aggregates seriously affected by that change were SUM(int2) and
 SUM(int4).
 
 I haven't looked at this in any detail yet, but that seems much neater
 to me. It seems perfectly sensible that the forward and inverse
 transition functions should have the same strictness settings, and
 enforcing that keeps the logic simple, as well as hopefully making it
 easier to document.

Good to hear that you agree! I'll try to find some time to update the docs. 

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: IMPORT FOREIGN SCHEMA statement.

2014-02-21 Thread Florian Pflug
On Feb21, 2014, at 12:09 , Ronan Dunklau ronan.dunk...@dalibo.com wrote:
 I haven't had a look at the patch yet since I don't have a nice editor right
 now, but how do you handle interoperability between datatypes?
 Specifically, how do you handle those datatypes which have a different name
 from the PostgreSQL name for them and/or are stored in a different manner?
 
 For the postgres_fdw POC implementation, this is done by parsing the 
 attributes type from the query result with the regtype input functions. The 
 attribute typmod is injected too.

Who says that the OIDs are the same on the local and the remote postgres
instance? For user-defined types, that's certainly not going to be true...

Also, why do you aggregate the lists of columns, types and oids into arrays
when querying them from the remote server? Just producing a query that returns
one row per table column seems much simpler, both conceptually and
implementation-wise.

Finally, I think there are a few missing pieces. For example, you could quite
easily copy over NOT NULL flags, but you currently don't. Similarly, what about
inheritance relationships between remote tables? There's a patch in the current
CF, I believe, which adds support for inheritance to foreign tables, so all
you'd have to do is to make the foreign table's inheritance structure match
the remote table's.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Florian Pflug
On Feb21, 2014, at 16:46 , Craig Ringer cr...@2ndquadrant.com wrote:
 The real question IMO is why it's taking so long. It looks like
 cfindloop(...) is being called multiple times, with each call taking a
 couple of seconds.

Yeah, I wondered about this too. I've shortened the example a bit - here
are a few observations

postgres=# select regexp_matches(' $a$b$c$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 33.026 ms

postgres=# select regexp_matches('  $a$b$c$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 60.594 ms

postgres=# select regexp_matches('   $a$b$c$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 114.410 ms

postgres=# select regexp_matches('$a$b$c$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 227.467 ms

postgres=# select regexp_matches(' $a$b$c$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 452.739 ms

postgres=# select regexp_matches('  $a$b$c$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 943.098 ms

postgres=# select regexp_matches(' $a$b$c$d$e$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 200.795 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 298.264 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$g$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 444.219 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$g$h$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 696.137 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$g$h$i$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 974.502 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$g$h$i$j$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 1369.703 ms

postgres=# select regexp_matches('  $a$b$c$d$e$f$g$h$i$j$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 2747.766 ms

In other words, the observed runtime is roughly 2^i * 1.5^j for inputs
consisting of i leading spaces (any character will probably do) followed
by j substrings of the form $X$ (X is an arbitrary character).

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SPI_connect on multi-threaded app

2014-02-21 Thread Florian Pflug
On Feb21, 2014, at 13:44 , John Williams jdwilliams1...@gmail.com wrote:
 I'm writing a pgsql extension in C, which is multithreaded. The SPI
 connection is global, so do I have to implement a lock to make sql
 queries in each thread, or can I make a connection on a per-thread basis?

Postgres backends aren't multi-threaded, and offer no support for
multiple threads whatsoever. AFAIK, the only safe way to use multiple
threads is to either restrict calls to *any* postgres function to only
one thread, or to use a single, big mutex that prevents multiple threads
from accessing postgres internals simultaneously.

You should also prevent auxiliary threads from executing the signal handlers
that postgres installs. See pthread_sigmask.
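
For the signal handler part, the usual plain-POSIX pattern (nothing
postgres-specific, just a sketch) is to block all signals around
pthread_create(), so the worker inherits a fully-blocked mask and the
backend's handlers only ever run on the main thread:

  #include <pthread.h>
  #include <signal.h>
  #include <stddef.h>

  static void *
  worker_main(void *arg)
  {
      /* never call into postgres from this thread */
      return NULL;
  }

  static int
  spawn_worker(pthread_t *tid)
  {
      sigset_t    all,
                  saved;
      int         rc;

      sigfillset(&all);
      pthread_sigmask(SIG_SETMASK, &all, &saved);   /* inherited by new thread */
      rc = pthread_create(tid, NULL, worker_main, NULL);
      pthread_sigmask(SIG_SETMASK, &saved, NULL);   /* restore creating thread */
      return rc;
  }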

And you'll need to make sure that everything (i.e. the whole of postgres
and your extension) is built with multi-threading enabled. Otherwise,
things like errno might end up not being thread-local, meaning any syscall
your auxiliary threads do can interfere with postgres' error handling.

You might want to check whether you can run the multi-threaded parts
of your extension as a separate process, and communicate with the backend
via some IPC mechanism.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Florian Pflug
On Feb21, 2014, at 17:29 , Craig Ringer cr...@2ndquadrant.com wrote:
 The problem report claims that the issue does not occur on 9.1, but yet:
 
 git diff REL9_1_STABLE master  -- ./src/backend/utils/adt/regexp.c
 
 is utterly trivial; a copyright date line change, and 1609797c which
 just tweaks the includes. 9.0 has a much bigger diff.

On 9.1.12:

postgres=# select regexp_matches('  $a$b$c$d$e$f$g$h$i$j$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
   regexp_matches
-
 {"  $a$b$c$d$e$f$g$h$i$j",NULL}
(1 row)
Time: 1.048 ms

On HEAD

postgres=# select regexp_matches('  $a$b$c$d$e$f$g$h$i$j$',
$REG$((?:[^'"$;]+|"[^"]*"|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
 regexp_matches  
-
 {"  ",NULL}
 {a,NULL}
 {b,NULL}
 {c,NULL}
 {d,NULL}
 {e,NULL}
 {f,NULL}
 {g,NULL}
 {h,NULL}
 {i,NULL}
 {j,NULL}
(11 rows)
Time: 4787.239 ms

Aha! Since we got rid of regex_flavor pre-9.1, I don't have an immediate
suspect...

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 11:45 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-13 15:34:09 +0100, Florian Pflug wrote:
 On Feb10, 2014, at 17:38 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 So what we need to do is to acquire a write barrier between the
 assignments to lwWaitLink and lwWaiting, i.e.
    proc->lwWaitLink = NULL;
    pg_write_barrier();
    proc->lwWaiting = false;
 
 You didn't really explain why you think that ordering is necessary?
 Each proc being awoken will surely see both fields updated, and other
 procs won't be examining these fields at all, since we already delinked
 all these procs from the LWLock's queue.
 
 The problem is that one of the released backends could wake up concurrently
 because of an unrelated, or previous, PGSemaphoreUnlock(). It could see
 lwWaiting = false, and thus wakeup and acquire the lock, even if the
 store for lwWaitLink hasn't arrived (or performed, there's no guaranteed
 ordering here) yet.
 Now, it may well be that there's no practical consequence of that, but I
 am not prepared to bet on it.
 
 AFAICS there is a potential problem if three backends are involved, since
 by the time the waiting backend's lwWaitLink is set to NULL after the
 original lock holder released the lock, the waiting backend might already
 have acquired the lock (due to a spurious wakeup) *and* a third backend
 might have already enqueued behind it.
 
 The exact sequence for backends A,B,C that corrupts the wait queue is:
 
 A: Release lock, set B's lwWaiting to false
 B: Wakes up spuriously, takes the lock
 C: Enqueues behind B
 A: Sets B's lwWaitLink back to NULL, thereby truncating the queue and
   causing C and anyone behind it to block indefinitely.
 
 I don't think that can actually happen because the head of the wait list
 isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
 for a while...

Hm, true, but does that protect us under all circumstances? If another
backend grabs the lock before B gets a chance to do so, B will become the
wait queue's head, and anyone who enqueues behind B will do so by updating
B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
by the original lock holder.

The specific sequence I have in mind is:

A: Take lock
B: Enqueue
A: Release lock, set B's lwWaiting to false
D: Acquire lock
B: Enqueue after spurious wakeup
   (lock->head := B)
C: Enqueue behind B
   (B->lwWaitLink := C, lock->tail := C)
A: Set B's lwWaitLink back to NULL, thereby corrupting the queue
   (B->lwWaitLink := NULL)
D: Release lock, dequeue and wakeup B
   (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
B: Take and release lock, queue appears empty!
   C blocks indefinitely.

 I wonder whether LWLockRelease really needs to update lwWaitLink at all.
 We take the backends we awake off the queue by updating the queue's head and
 tail, so the contents of lwWaitLink should only matter once the backend is
 re-inserted into some wait queue. But when doing that, we reset lwWaitLink
back to NULL anyway.
 
 I don't like that, this stuff is hard to debug already, having stale
 pointers around doesn't help. I think we should just add the barrier in
 the releases with barrier.h and locally use a volatile var in the
 branches before that.

I like the idea of updating lwWaiting and lwWaitLink while still holding the
spinlock better. The costs are probably negligible, and it'd make the code much
easier to reason about.
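
A sketch of that variant (heavily simplified - the real LWLockRelease only
wakes waiters up to the first exclusive one, and MAX_WAKEUP is made up here;
error handling omitted):

  PGPROC     *wakeup[MAX_WAKEUP];
  int         nwakeup = 0;
  int         i;

  SpinLockAcquire(&lock->mutex);
  while (lock->head != NULL && nwakeup < MAX_WAKEUP)
  {
      PGPROC *proc = lock->head;

      lock->head = proc->lwWaitLink;
      proc->lwWaitLink = NULL;
      proc->lwWaiting = false;    /* both stores happen under the spinlock */
      wakeup[nwakeup++] = proc;
  }
  SpinLockRelease(&lock->mutex);

  /* only touch the semaphores once the spinlock is released */
  for (i = 0; i < nwakeup; i++)
      PGSemaphoreUnlock(&wakeup[i]->sem);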

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 13:36 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 13:28:47 +0100, Florian Pflug wrote:
 I don't think that can actually happen because the head of the wait list
 isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
 for a while...
 
 Hm, true, but does that protect us under all circumstances? If another
 backend grabs the lock before B gets a chance to do so, B will become the
 wait queue's head, and anyone who enqueues behind B will do so by updating
 B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
 by the original lock holder.
 
 The specific sequence I have in mind is:
 
 A: Take lock
 B: Enqueue
 A: Release lock, set B's lwWaiting to false
 D: Acquire lock
 B: Enqueue after spurious wakeup
   (lock->head := B)
 C: Enqueue behind B
   (B->lwWaitLink := C, lock->tail := C)
 A: Set B's lwWaitLink back to NULL, thereby corrupting the queue
   (B->lwWaitLink := NULL)
 D: Release lock, dequeue and wakeup B
   (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
 B: Take and release lock, queue appears empty!
   C blocks indefinitely.
 
 That's assuming either reordering by the compiler or an out-of-order
 store architecture, right?

Yes, it requires that a backend exits out of the PGSemaphoreLock loop in
LWLockAcquire before lwWaitLink has been reset to NULL by the previous lock
holder's LWLockRelease. Only if that happens is there a risk of the PGPROC
being on a wait queue by the time lwWaitLink is finally reset to NULL.

 I wonder whether LWLockRelease really needs to update lwWaitLink at all.
 We take the backends we awake off the queue by updating the queue's head 
 and
 tail, so the contents of lwWaitLink should only matter once the backend is
 re-inserted into some wait queue. But when doing that, we reset lwWaitLink
 back to NULL anway.
 
 I don't like that, this stuff is hard to debug already, having stale
 pointers around doesn't help. I think we should just add the barrier in
 the releases with barrier.h and locally use a volatile var in the
 branches before that.
 
 I like the idea of updating lwWaiting and lwWaitLink while still holding the
 spinlock better. The costs are probably negligible, and it'd make the code
 much easier to reason about.
 
 I agree we should do that, but imo not in the backbranches. Anything
 more than than the minimal fix in that code should be avoided in the
 stable branches, this stuff is friggin performance sensitive, and the
 spinlock already is a *major* contention point in many workloads.

No argument there. But unless I missed something, there actually is a bug
there, and using volatile won't fix it. A barrier would, but what about the
back branches that don't have barrier.h? AFAICS the only other choices we have
on these branches are to either ignore the bug - it's probably *extremely*
unlikely - or to use a spinlock acquire/release cycle to create a barrier. The
former
leaves me with a bit of an uneasy feeling, and the latter quite certainly has
a larger performance impact than moving the PGPROC updates within the section
protected by the spinlock and using an array to remember the backends to wakeup.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 14:07 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
 I agree we should do that, but imo not in the backbranches. Anything
 more than than the minimal fix in that code should be avoided in the
 stable branches, this stuff is friggin performance sensitive, and the
 spinlock already is a *major* contention point in many workloads.
 
 No argument there. But unless I missed something, there actually is a bug
 there, and using volatile won't fix it. A barrier would, but what about the
 back branches that don't have barrier.h?
 
 Yea, but I don't see a better alternative. Realistically the likelihood
 of a problem without the compiler reordering stuff is minuscule on any
 relevant platform that's supported by the !barrier.h branches. Newer
 ARMs are the only realistic suspect, and the support for them in older
 releases wasn't so good...

Isn't POWER/PowerPC potentially affected by this? 

Also, there is still the alternative of not resetting lwWaitLink in 
LWLockRelease.
I can understand why you oppose that - it's certainly nicer to not have stray
pointers lying around. But since (at least as far as we know)

a) resetting lwWaitLink is not strictly necessary
b) resetting lwWaitLink introduces a race condition (however unlikely)

we'll trade correctness for cleanliness if we continue to reset lwWaitLink
without protecting against the race. That's a bit of a weird trade-off to make.

Another idea for a fix would be to conflate lwWaiting and lwWaitLink into one
field. We could replace lwWaiting by lwWaitLink != NULL everywhere it's
tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to true.

We'd then depend on pointer-sized stores being atomic, which I think we depend
on in other places already.

best regards,
Florian Pflug





Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 16:32 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 10:26:07 -0500, Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 Another idea for a fix would be to conflate lwWaiting and lwWaitLink into 
 one
 field. We could replace lwWaiting by lwWaitLink != NULL everywhere it's
 tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
 enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to 
 true.
 
 We'd then depend on pointer-sized stores being atomic, which I think we 
 depend
 on in other places already.
 
 I don't believe that's true; neither that we depend on it now, nor that
 it would be safe to do so.
 
 Yea, we currently rely on 4 byte stores being atomic (most notably for
 xids), but we don't rely on anything bigger. I am not sure if there are
 architectures with 64bit pointers that aren't accessed atomically when
 aligned? Even if, that's certainly nothing that should be introduced
 when backpatching.

Hm, we could use 4-byte stores instead of 8-byte stores if we turned lwWaitLink
into an index into the proc array instead of a pointer to the PGPROC struct.

We could then use 0xFFFFFFFF in place of NULL to indicate not waiting,
and PROCARRAY_MAXPROCS to indicate waiting, but no successor in the queue.
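
A sketch of that encoding (the type and macro names here are invented):

  typedef uint32 WaitLink;        /* index into the PGPROC array */

  #define WAITLINK_NOT_WAITING    ((WaitLink) 0xFFFFFFFF)
  #define WAITLINK_NO_SUCCESSOR   ((WaitLink) PROCARRAY_MAXPROCS)

  /* what used to be the lwWaiting flag becomes a test on lwWaitLink */
  static inline bool
  waitlink_is_waiting(WaitLink link)
  {
      return link != WAITLINK_NOT_WAITING;
  }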

best regards,
Florian Pflug





Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 16:51 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 15:03:16 +0100, Florian Pflug wrote:
 On Feb14, 2014, at 14:07 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
 I agree we should do that, but imo not in the backbranches. Anything
 more than the minimal fix in that code should be avoided in the
 stable branches, this stuff is friggin performance sensitive, and the
 spinlock already is a *major* contention point in many workloads.
 
 No argument there. But unless I missed something, there actually is a bug
 there, and using volatile won't fix it. A barrier would, but what about the
 back branches that don't have barrier.h?
 
 Yea, but I don't see a better alternative. Realistically the likelihood
 of a problem without the compiler reordering stuff is minuscule on any
 relevant platform that's supported by the !barrier.h branches. Newer
 ARMs are the only realistic suspect, and the support for them in older
 releases wasn't so good...
 
 Isn't POWER/PowerPC potentially affected by this? 
 
 I wouldn't consider it a major architecture... And I am not sure how
 much out of order a CPU has to be to be affected by this, the read side
 uses spinlocks, which in most of the spinlock implementations implies a
 full memory barrier which in many cache coherency designs will cause
 other CPUs to flush writes. And I think the control dependency in the
 PGSemaphoreUnlock() loop will actually cause a flush on ppc...

I guess it's sufficient for the store to lwWaitLink to be delayed.
As long as the CPU on which that store is executing hasn't managed to gain
exclusive access to the relevant cache line, memory barriers on the read
side won't help because the store won't be visible to other CPUs.

 Also, there is still the alternative of not resetting lwWaitLink in 
 LWLockRelease.
 I can understand why you oppose that - it's certainly nicer to not have stray
 pointers lying around. But since (at least as far as we know)
 
 a) resetting lwWaitLink is not strictly necessary
 
 I don't want to rely on that in the backbranches, that's an entirely new
 assumption. I think anything more than minimal changes are hard to
 justify for a theoretical issue that hasn't been observed in the field.
 
 I think the biggest danger here is that writes are reordered by the
 compiler, that we definitely need to protect against. Making a variable
 volatile or introducing a memory barrier is pretty simple to analyze.

Well, the assumption isn't all that new. We already have the situation that
a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
Currently, the process who took it off the queue is responsible for rectifying
that eventually, with the changed proposed below it'll be the backend who
owns the PGPROC. From the point of view of other backends, nothing really
changes.

 
 b) resetting lwWaitLink introduces a race condition (however unlikely)
 
 we'll trade correctness for cleanliness if we continue to reset lwWaitLink
 without protecting against the race. That's a bit of a weird trade-off to 
 make.
 
 It's not just cleanliness, it's being able to actually debug crashes.

We could still arrange for the stray lwWaitLink to be visible only
momentarily. If a backend is woken up and detects that lwWaiting is false,
it knows that it cannot be on any wait queue - it was just removed, and
hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
back to NULL. The stray value would thus only be visible while a backend
executes the PGSemaphoreLock() loop, and whether or not this is the case can
be deduced from a stack trace. So I'm not convinced that this makes debugging
any harder.

(knizhnik has just suggested the same)

It's interesting, BTW, that updating lwWaitLink after lwWaiting is OK here -
the crucial property is not that it's updated before lwWaiting, but rather that
it is reset before enqueuing the backend again. Which is something that backend
itself can guarantee far more easily than whoever happens to de-queue it due to
spurious wakeups.

best regards,
Florian Pflug





Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 19:21 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 18:49:33 +0100, Florian Pflug wrote:
 Well, the assumption isn't all that new. We already have the situation that
 a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
 Currently, the process who took it off the queue is responsible for 
 rectifying
 that eventually, with the changed proposed below it'll be the backend who
 owns the PGPROC. From the point of view of other backends, nothing really
 changes.
 
 That window is absolutely tiny tho.

True, but it's there, so if anything counts on that never being the case, it's
still already broken.

 b) resetting lwWaitLink introduces a race condition (however unlikely)
 
 we'll trade correctness for cleanliness if we continue to reset lwWaitLink
 without protecting against the race. That's a bit of a weird trade-off to 
 make.
 
 It's not just cleanliness, it's being able to actually debug crashes.
 
 We could still arrange for the stray lwWaitLink from being visible only
 momentarily. If a backend is woken up and detects that lwWaiting is false,
 it knows that it cannot be on any wait queue - it was just removed, and
 hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
 back to NULL. The stray value would thus only be visible while a backend 
 executes
 the PGSemaphoreLock() loop, and whether or not this is the case can be 
 deduced
 from a stack trace. So I'm not convinced that this makes debugging any 
 harder.
 
 That's not actually safe on an out of order architecture afaics. Without
 barriers the store to lwWaitLink in the woken up backend can preempt the
 read for the next element in the PGSemaphoreUnlock() loop.

Hm, true. I guess we could prevent that by introducing an artificial dependency
between the read and the write - something like

  proc = head;
  head = proc->lwWaitLink;
  /*
   * We don't ever expect to actually PANIC here, but the test forces the
   * load to execute before the store to lwWaiting. This prevents a race
   * between reading lwWaitLink here and resetting it back to zero in the
   * awoken backend (Note that backends can wake up spuriously, so just
   * reading it before doing PGSemaphoreUnlock is insufficient)
   */
  if (head != MyProc)
      proc->lwWaiting = false;
  else
      elog(PANIC, ...);
  PGSemaphoreUnlock(&proc->sem);

(This assumes that proc is a volatile pointer)

Another idea would be to do as you suggest and only mark the PGPROC pointers
volatile, but to additionally add a check for queue corruption somewhere. We
should be able to detect that - if we ever hit this issue, LWLockRelease should
find a PGPROC while traversing the queue whose lwWaitLink is NULL but which
isn't equal to lock->tail. If that ever happens, we'd simply PANIC.
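
Such a check could look roughly like this (a sketch - the surrounding
LWLockRelease code is paraphrased, not quoted):

  volatile PGPROC *proc;

  for (proc = lock->head; proc != NULL; proc = proc->lwWaitLink)
  {
      /* every queue element except lock->tail must have a successor */
      if (proc->lwWaitLink == NULL && proc != lock->tail)
          elog(PANIC, "LWLock wait queue is corrupted");
  }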

best regards,
Florian Pflug







Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-13 Thread Florian Pflug
On Feb10, 2014, at 17:38 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 So what we need to do is to acquire a write barrier between the
 assignments to lwWaitLink and lwWaiting, i.e.
    proc->lwWaitLink = NULL;
    pg_write_barrier();
    proc->lwWaiting = false;
 
 You didn't really explain why you think that ordering is necessary?
 Each proc being awoken will surely see both fields updated, and other
 procs won't be examining these fields at all, since we already delinked
 all these procs from the LWLock's queue.
 
 The problem is that one of the released backends could wake up concurrently
 because of a unrelated, or previous PGSemaphoreUnlock(). It could see
 lwWaiting = false, and thus wakeup and acquire the lock, even if the
 store for lwWaitLink hasn't arrived (or performed, there's no guaranteed
 ordering here) yet.
 Now, it may well be that there's no practical consequence of that, but I
 am not prepared to bet on it.

AFAICS there is a potential problem if three backends are involved, since
by the time the waiting backend's lwWaitLink is set to NULL after the
original lock holder released the lock, the waiting backend might already
have acquired the lock (due to a spurious wakeup) *and* a third backend
might have already enqueued behind it.

The exact sequence for backends A,B,C that corrupts the wait queue is:

A: Release lock, set B's lwWaiting to false
B: Wakes up spuriously, takes the lock
C: Enqueues behind B
A: Sets B's lwWaitLink back to NULL, thereby truncating the queue and
   causing C and anyone behind it to block indefinitely.

I wonder whether LWLockRelease really needs to update lwWaitLink at all.
We take the backends we awake off the queue by updating the queue's head and
tail, so the contents of lwWaitLink should only matter once the backend is
re-inserted into some wait queue. But when doing that, we reset lwWaitLink
back to NULL anyway.

best regards,
Florian Pflug








Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-12 Thread Florian Pflug
On Feb12, 2014, at 12:55 , MauMau maumau...@gmail.com wrote:
 From: Andres Freund and...@2ndquadrant.com
 It's x86, right? Then it's unlikely to be actual unordered memory
 accesses, but if the compiler reordered:
    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
 to
    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
    proc = head;
    proc->lwWaiting = false;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    PGSemaphoreUnlock(&proc->sem);
 which it is permitted to do, yes, that could cause symptoms like you
 describe.
 
 Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for 
 x86_64.
 The PostgreSQL was built with GCC.

The relevant part of the disassembled binary you attached seems to be

Dump of assembler code for function LWLockRelease:
...
0x00647f47 <LWLockRelease+519>: lea    0x10(%rcx),%rdi
0x00647f4b <LWLockRelease+523>: movq   $0x0,0x48(%rcx)
0x00647f53 <LWLockRelease+531>: movb   $0x0,0x41(%rcx)
0x00647f57 <LWLockRelease+535>: callq  0x606210 <PGSemaphoreUnlock>

I haven't checked the offsets, but since lwWaitLink is an 8-byte quantity
and lwWaiting a single-byte quantity, it's pretty much certain that the
first store updates lwWaitLink and the second lwWaiting. Thus, no reordering
seems to have taken place here...

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-29 Thread Florian Pflug
On Jan29, 2014, at 09:59 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 28 January 2014 20:16, Florian Pflug f...@phlo.org wrote:
 On Jan27, 2014, at 23:28 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 This case is explicitly forbidden, both in CREATE AGGREGATE and in the
 executor. To me, that seems overly restrictive --- if transfn is
 strict, then you know for sure that no NULL values are added to the
 aggregate state, so surely it doesn't matter whether or not
 inv_transfn is strict. It will never be asked to remove NULL values.
 
 I think there are definite use-cases where a user might want to use a
 pre-existing function as the inverse transition function, so it seems
 harsh to force them to wrap it in a strict function in this case.
 
 I'm not sure that the likelihood of someone wanting to combine a strict
 forward with a non-strict inverse function is very high, but neither
 
 
 Me neither, but the checks to forbid it aren't adding anything, and I
 think it's best to make it as flexible as possible.

Ok.

 Another idea would be to have an explicit nulls=ignore|process option
 for CREATE AGGREGATE. If nulls=process and either of the transition
 functions are strict, we could either error out, or simply do what
 normal function calls do and pretend they return NULL for NULL inputs.
 Not sure how the rule that forward transition functions may not return
 NULL if there's an inverse transition function would fit in if we do
 the latter, though.
 
 The question is - is it worth the effort to add that flag?
 
 One use-case I had in mind upthread was suppose you wanted to write a
 custom version of array_agg that only collected non-NULL values. With
 such a flag, that would be trivial, but with the current patch you'd
 have to (count-intuitively) wrap the inverse transition function in a
 strict function.

I'd be more convinced by that if doing so was actually possible for
non-superusers. But it isn't, because the aggregate uses internal as
its state type, and it's thus entirely up to the user to not crash the
database by mixing transition and final functions with incompatible
state data. Plus, instead of creating a custom aggregate, one can just
use a FILTER clause to get rid of the NULL values.

 Another use-case I can imagine is suppose you wanted a custom version
 of sum() that returned NULL if any of the input values were NULL. If
 you knew that most of your data was non-NULL, you might still wish to
 benefit from an inverse transition function. Right now the patch won't
 allow that because of the error in advance_windowaggregate(), but
 possibly that could be relaxed by forcing a restart in that case.

That's not really true - that patch only forbids that if you insist on
representing the state "I have seen a NULL input" with a NULL state value.
But if you instead just count the number of NULLS in your transition
functions, all you need to do is to have your final function return NULL
if that count is not zero.

 If I've understood correctly that should give similar to current
 performance if NULLs are present, and enhanced performance as the
 window moved over non-NULL data.

Exactly - and this makes defining a NULL-sensitive SUM() this way
rather silly - a simple counter has very nearly zero overhead, and avoids
all rescans.
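
To illustrate, a NULL-sensitive sum with such a counter might look like this
(a sketch against this patch's proposed CREATE AGGREGATE options - the
inverse-function option is assumed to be spelled INVFUNC here, which is still
under discussion):

  CREATE TYPE nsum_state AS (total bigint, nulls bigint);

  CREATE FUNCTION nsum_fwd(s nsum_state, v int4) RETURNS nsum_state
  LANGUAGE sql AS $$
      SELECT ROW(s.total + coalesce(v, 0),
                 s.nulls + (v IS NULL)::int)::nsum_state
  $$;

  CREATE FUNCTION nsum_inv(s nsum_state, v int4) RETURNS nsum_state
  LANGUAGE sql AS $$
      SELECT ROW(s.total - coalesce(v, 0),
                 s.nulls - (v IS NULL)::int)::nsum_state
  $$;

  CREATE FUNCTION nsum_final(s nsum_state) RETURNS bigint
  LANGUAGE sql AS $$
      SELECT CASE WHEN s.nulls > 0 THEN NULL ELSE s.total END
  $$;

  CREATE AGGREGATE nsum(int4) (
      sfunc = nsum_fwd, stype = nsum_state, finalfunc = nsum_final,
      invfunc = nsum_inv,
      initcond = '(0,0)'
  );

Note that the forward transition function never returns NULL, as required
when an inverse is present, and the final function hides the counter again.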

 In that second case, it would also be nice if you could simply re-use
 the existing sum forward and inverse transition functions, with a
 different null-handling flag.

Even if we had a nulls=process|ignore flag, SUM's transition functions
would still need to take that use-case into account explicitly to make
this work - at the very least, the forward transition function would
need to return NULL if the input is NULL instead of just skipping it as
it does now. But that would lead to completely unnecessary rescans, so
what we actually ought to do then is to make it track whether there have
been NULL inputs and make the finalfunc return NULL in this case. Which
would border on ironic, since the whole raison d'etre for this is to
*avoid* spreading NULL-counting logic around...

Plus, to make this actually useable, we'd have to document this, and tell
people how to define such a SUM aggregate. But I don't want to go there -
we really mustn't encourage people to mix-and-match built-in aggregates
with state type internal, since whether they work or crash depends
on factors outside the control of the user.

And finally, you can get that behaviour quite easily by doing

  CASE WHEN bool_and(input IS NOT NULL) THEN whatever_agg(input) ELSE NULL END

 So I think an ignore-nulls flag would have real benefits, as well as
 being a cleaner design than relying on a strict inverse transition
 function.

The more I think about this, the less convinced I am. In fact, I'm
currently leaning towards just forbidding non-strict forward transition
functions with strict inverses, and adding non-NULL counters to the
aggregates that then require them. It's really only the SUM() aggregates

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-28 Thread Florian Pflug
On Jan27, 2014, at 23:28 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 strict transfn vs non-strict inv_transfn
 
 
 This case is explicitly forbidden, both in CREATE AGGREGATE and in the
 executor. To me, that seems overly restrictive --- if transfn is
 strict, then you know for sure that no NULL values are added to the
 aggregate state, so surely it doesn't matter whether or not
 inv_transfn is strict. It will never be asked to remove NULL values.
 
 I think there are definite use-cases where a user might want to use a
 pre-existing function as the inverse transition function, so it seems
 harsh to force them to wrap it in a strict function in this case.
 
 AFAICS, the code in advance/retreat_windowaggregate should just work
 if those checks are removed.

True. It didn't use to in earlier versions of the patch because the advance
and retreat functions looked at the strict settings directly, but now that
there's an ignore_nulls flag in the executor node, the only requirement
should be that ignore_nulls is set if either of the transition functions
is strict.

I'm not sure that the likelihood of someone wanting to combine a strict
forward with a non-strict inverse function is very high, but neither

 non-strict transfn vs strict inv_transfn
 
 
 At first this seems as though it must be an error --- the forwards
 transition function allows NULL values into the aggregate state, and
 the inverse transition function is strict, so it cannot remove them.
 
 But actually what the patch is doing in this case is treating the
 forwards transition function as partially strict --- it won't be
 passed NULL values (at least in a window context), but it may be
 passed a NULL state in order to build the initial state when the first
 non-NULL value arrives.

Exactly.

 It looks like this behaviour is intended to support aggregates that
 ignore NULL values, but cannot actually be made to have a strict
 transition function. I think typically this is because the aggregate's
 initial state is NULL and it's state type differs from the type of the
 values being aggregated, and so can't be automatically created from
 the first non-NULL value.

Yes. I added this because the alternative would have been to count
non-NULL values in most of the existing SUM() aggregates.

 That all seems quite ugly though, because now you have a transition
 function that is not strict, which is passed NULL values when used in
 a normal aggregate context, and not passed NULL values when used in a
 window context (whether sliding or not), except for the NULL state for
 the first non-NULL value.

Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does
and skip NULL inputs if the aggregate has a strict inverse transition
function. The fact that we never actually *call* the inverse doesn't
matter. Will fix that once we decided on the various strictness issues.

 I'm not sure if there is a better way to do it though. If we disallow
 this case, these aggregates would have to use non-strict functions for
 both the forward and inverse transition functions, which means they
 would have to implement their own non-NULL value counting.
 Alternatively, allowing strict transition functions for these
 aggregates would require that we provide some other way to initialise
 the state from the first non-NULL input, such as a new initfn.

I'm not sure an initfn would really help. It would allow us to return
to the initial requirement that the strict settings match - but you
already deem the check that the forward transition function can only
be strict if the inverse is also overly harsh, so would that really be
an improvement? It'd also cost us an additional pg_proc entry per aggregate.

Another idea would be to have an explicit nulls=ignore|process option
for CREATE AGGREGATE. If nulls=process and either of the transition
functions are strict, we could either error out, or simply do what
normal function calls do and pretend they return NULL for NULL inputs.
Not sure how the rule that forward transition functions may not return
NULL if there's an inverse transition function would fit in if we do
the latter, though.

The question is - is it worth the effort to add that flag?

best regards,
Florian Pflug





Re: [HACKERS] Weird error messages from Windows upon client death

2014-01-28 Thread Florian Pflug
On Jan28, 2014, at 19:19 , Jeff Janes jeff.ja...@gmail.com wrote:
 On windows, if the client gets terminated while sending data to the server, 
 in a
 COPY for example, it results in some rather head-scratcher messages in the 
 server
 log, for example:
 
 LOG:  could not receive data from client: No connection could be made because
 the target machine actively refused it.
 
 Since the server was reading from the client and never tries to initiate a
 connection, the %m part of the message is a bit baffling.  The errno at this
 point is 10061.

My guess is that the server received a TCP RST, indicating that the client's
socket has gone away, and that the error message is the same for a RST received
during connection setup and a RST received later on.

During connection setup, it absolutely makes sense to say that the client has
actively refused the connection if it responds to a SYN packet with RST...

best regards,
Florian Pflug





Re: [HACKERS] INTERVAL overflow detection is terribly broken

2014-01-26 Thread Florian Pflug
On Jan26, 2014, at 03:50 , Bruce Momjian br...@momjian.us wrote:
 Patch attached.

 + if ((float) tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon > INT_MAX)
 +     return -1;

Is this bullet-proof? If float and int are both 32-bit, the float's mantissa
will be less than 32-bit (24 or so, I think), and thus won't be able to
represent INT_MAX accurately. This means there's a value of
tm_year * MONTHS_PER_YEAR which is less than INT_MAX, yet for which
tm_year * MONTHS_PER_YEAR + tm_mon will return tm_year * MONTHS_PER_YEAR
unmodified if tm_mon is small enough (e.g., 1). But if tm_year * MONTHS_PER_YEAR
was close enough to INT_MAX, the actual value of
tm_year * MONTHS_PER_YEAR + tm_mon might still be larger than INT_MAX,
the floating-point computation just won't detect it. In that case, the
check would succeed, yet the actual integer computation would still overflow.

It should be safe with double instead of float.
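
A small stand-alone demonstration of the hazard (the specific values are mine,
chosen to trigger the rounding):

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
      int big = INT_MAX - 5;  /* stands in for tm_year * MONTHS_PER_YEAR */
      int mon = 7;            /* big + mon overflows int */

      /* float's 24-bit mantissa rounds big up to 2^31, hiding the overflow */
      if ((float) big + mon > (float) INT_MAX)
          puts("float check: overflow detected");
      else
          puts("float check: overflow MISSED");

      /* double represents both values exactly, so the check works */
      if ((double) big + mon > (double) INT_MAX)
          puts("double check: overflow detected");

      return 0;
  }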

 + if (SAMESIGN(span1->month, span2->month) &&
 +     !SAMESIGN(result->month, span1->month))

This assumes wrapping semantics for signed integral types, which isn't
guaranteed by the C standard - it says overflows of signed integral types
produce undefined results. We currently depend on wrapping semantics for
these types in more places, and therefore need GCC's -fwrapv anyway, but
I still wonder if adding more of these kinds of checks is a good idea.
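
For completeness, an overflow test that doesn't rely on wrapping, and is
therefore well-defined even without -fwrapv (a sketch):

  #include <limits.h>
  #include <stdbool.h>

  static bool
  addition_overflows(int a, int b)
  {
      /* test before adding, so no signed overflow can occur */
      if (b > 0)
          return a > INT_MAX - b;
      else
          return a < INT_MIN - b;
  }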

best regards,
Florian Pflug





Re: [HACKERS] plpgsql.warn_shadow

2014-01-26 Thread Florian Pflug
On Jan26, 2014, at 10:19 , Simon Riggs si...@2ndquadrant.com wrote:
 Also, having
  plpgsql.warnings_as_errors = off (default) | on
 makes sense and should be included in 9.4

I still think this is a bad idea, for the same reasons I don't like
consistent_into (discussed in a separate thread).

But these objections would go away if we restricted this to function
creation time only. So even with warnings_as_errors=on, you
could still *call* a function that produces a warning, but not
*create* one.

We could then integrate this with check_function_bodies, i.e. add a
third possible value error_on_warnings to that GUC, instead of
having a separate GUC for this.

 Putting this and all future options as keywords seems like a poor
 choice. Indeed, the # syntax proposed isn't even fully described on
 list here, nor are examples given in tests. My feeling is that nobody
 even knows that is being proposed and would likely cause more
 discussion if they did. So I wish to push back the # syntax to a later
 release when it has had more discussion. It would be good if you could
 lead that discussion in later releases.

+1

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-26 Thread Florian Pflug
On Jan26, 2014, at 00:24 , David Rowley dgrowle...@gmail.com wrote:
 On Sat, Jan 25, 2014 at 3:21 PM, Florian Pflug f...@phlo.org wrote:
 On Jan24, 2014, at 08:47 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 The invtrans_minmax patch doesn't contain any patches yet - David, could
 you provide some for these functions, and also for bool_and and bool_or?
 We don't need to test every datatype, but a few would be nice.
 
 I've added a few regression tests for min, max and bool_or and bool_and.
 I've pushed these up to here:
 
 https://github.com/david-rowley/postgres/commits/invtrans_minmax

OK, I've pushed this to github. I haven't produced new patches yet - I
think it's probably better to wait for a few things to pile up, to make
this less of a moving target. But ultimately, this is up to Dean - Dean,
I'll do whatever makes your life as a reviewer easier.

 I did wonder if you'd want to see uses of FILTER in there as I'm thinking
 that should really be covered by the core patch and the tests here really
 should just be checking the inverse transition functions for min and max.

I don't mind the FILTER - when this gets committed, the distinction between
what was in which patch goes away anyway. What I did realize when merging
this is that we currently don't test the case of a window which lies
fully after the current row (i.e. BETWEEN n FOLLOWING AND m FOLLOWING). We
do test BETWEEN n PRECEDING AND m PRECEDING, because the array_agg and
string_agg tests do that. So maybe some of the MIN/MAX or arithmetic tests
could exercise that case? 
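
Something along these lines would exercise it, I think (not taken from the
actual regression tests):

  SELECT i,
         max(i) OVER (ORDER BY i ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING)
  FROM generate_series(1, 5) AS t(i);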

 As for the data types tested, I ended just adding tests for int and text
 for min and max. Let me know if you think that more should be tested.

That's sufficient for the regression tests, I think.

 As for bool_or and bool_and. I didn't think there was much extra that would
 need tested after I added 1 test. It's pretty simple code and adding anything
 extra seems like it would be testing something else.

Sounds fine to me.

best regards,
Florian Pflug





Re: [HACKERS] running make check with only specified tests

2014-01-26 Thread Florian Pflug
On Jan26, 2014, at 17:47 , Andrew Dunstan and...@dunslane.net wrote:
 I've often wanted to be able to run make check and just have it run the
 small number of tests I am interested in. Here's a tiny patch along those
 lines. It creates a new target which I have called check-with for want of a
 better name. And with it I can do:
 
   $ make check-with TESTS="json jsonb"
 
 and have it do the temp install etc and then run just those two tests.

+1 for the feature (+Inf, actually), but will this work if the tests
depend on stuff created by other tests?

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-25 Thread Florian Pflug
On Jan25, 2014, at 09:50 , David Rowley dgrowle...@gmail.com wrote:
 On Thu, Jan 23, 2014 at 1:57 PM, Florian Pflug f...@phlo.org wrote:
 On Jan23, 2014, at 01:17 , David Rowley dgrowle...@gmail.com wrote:
  On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug f...@phlo.org wrote:
  If you want to play with
  this, I think the first step has to be to find a set of guarantees that
  SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if 
  the
  summands all have the same sign, the error is bound by C * N, where C is 
  some
  (machine-specific?) constant (about 1e-15 or so), and N is the number of 
  input
  rows. Or at least so I think from looking at SUMs over floats in 
  descending
  order, which I guess is the worst case. You could, for example, try to 
  see if
  you can find an invertibility condition which guarantees the same, but
  allows
  C to be larger. That would put a bound on the number of digits lost by 
  the new
  SUM(float) compared to the old one.
 
  I guess if the case is that normal (non-window) sum(float) results are 
  undefined
  unless you add an order by to the aggregate then I guess there is no 
  possible
  logic to put in for inverse transitions that will make them behave the 
  same as
  an undefined behaviour.
 
 Actually, if sum(float) was really undefined, it'd be *very* easy to provide 
 an
 inverse transition function - just make it a no-op. Heck, you could then even
 make the forward transition function a no-op, since the very definition of
 undefined behaviour is "result can be anything", including setting your house
 on fire. The point is, it's *not* undefined - it's just imprecise. And the
 imprecision can even be quantified, it just depends on the number of
 input rows (the equal-sign requirement is mostly there to make imprecision
 equivalent to relative error).
 
 My apologies, I meant to use the term nondeterministic rather than undefined.
 There's really not any need that I can see to turn things silly here.

I wasn't trying to be absurd, I was trying to get a point across.

 My point was more that since sum(float) can give different results if it used
 an index scan rather than a seq scan, trying to get the inverse transition to
 match something that gives varying results sounds like a tricky task to take 
 on.

You don't have to match it digit-by-digit! In that I fully agree with Kevin -
floats are *always* an approximation, and thus so is SUM(float). Summarization
order is BTW not the only source of non-determinism for SUM(float) - the exact
result can vary between architectures, and for some architectures between
compilers. (Intel is one of these, due to their 80-bit extended precision format
that gets truncated to 64-bit when stored to main memory).

But in a large number of cases, they won't vary by *much*, which is *the* reason
why SUM(float) is *not* totally useless. And reasoning about SUM(float) which
ignores that by simply calling it non-deterministic, undefined or whatever,
without any quantification of the possible error, has thus zero chance of
leading to interesting conclusions.
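
A trivial stand-alone illustration of the order-dependence (my example, not
from this thread):

  #include <stdio.h>

  int main(void)
  {
      double  asc = 0.0, desc = 0.0;
      int     i;

      for (i = 1; i <= 1000000; i++)
          asc += 1.0 / i;
      for (i = 1000000; i >= 1; i--)
          desc += 1.0 / i;

      /* the two sums differ in the last few digits */
      printf("%.17g\n%.17g\n", asc, desc);
      return 0;
  }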


 Secondly, you'd really need a second aggregate definition - usually, the
 non-invertible aggregate will get away with a much smaller state 
 representation.
 Aggregates with state type internal could hack their way around that by
 simply not using the additional parts of their state structure, but e.g.
 plpgsql aggregates would still incur overhead due to repeated copying of
 an unnecessarily large state value. If it's possible to represent the
 forward-only but not the invertible state with a copy-by-value type,
 that overhead could easily be a large part of the total overhead you paid
 for having an inverse transition function.
 
 I had imagined they keep the same state type and don't use any extra variables
 that are defined for the forward transition function that is invertible.

Yeah, and the above explains that at least for non-C-language aggregates, 
passing around that unnecessarily large state may very well prove to be the
source of a large part, if not almost all, of the overhead. So having just
a separate forward transition function will buy you almost nothing in some
cases.

I just tried this. I defined two aggregates mymax(int4) and myfastmax(int4),
both with just a forward transition function, both SQL-language functions.
mymax uses a composite type for the state containing an int4 field holding
the current maximum, and a dummy int8 field. myfastmax uses a plain int4
for the state, holding the current maximum. Both forward transition
functions essentially do

  case when current_max > v then current_max else v end

On my laptop, computing the maximum of 1e6 rows takes about 4.5 seconds with
myfastmax and 7.8 seconds with mymax. If I make mymax's transition function
increment the dummy field on every transition, the time increases from 7.8
to 8.2 seconds. So here, using a composite type for the state accounts for
about 3.3 seconds, or 40% of the total runtime.
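
The exact definitions weren't included above; a reconstruction from that
description might look like this:

  CREATE TYPE mymax_state AS (current_max int4, dummy int8);

  CREATE FUNCTION mymax_sfunc(s mymax_state, v int4) RETURNS mymax_state
  LANGUAGE sql AS $$
      SELECT ROW(CASE WHEN s.current_max > v THEN s.current_max ELSE v END,
                 s.dummy)::mymax_state
  $$;

  CREATE AGGREGATE mymax(int4)
      (sfunc = mymax_sfunc, stype = mymax_state, initcond = '(,0)');

  CREATE FUNCTION myfastmax_sfunc(s int4, v int4) RETURNS int4
  LANGUAGE sql AS $$
      SELECT CASE WHEN s > v THEN s ELSE v END
  $$;

  CREATE AGGREGATE myfastmax(int4)
      (sfunc = myfastmax_sfunc, stype = int4);

  SELECT myfastmax(i) FROM generate_series(1, 1000000) AS t(i);
  SELECT mymax(i) FROM generate_series(1, 1000000) AS t(i);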

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-24 Thread Florian Pflug
On Jan24, 2014, at 08:47 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 I think it should probably be broken up. It might be overly ambitious
 to try to get all of this committed during this commitfest, and in any
 case, I suspect that the committer would probably choose to commit it
 in stages. Perhaps something like:
 
 Patch 1
 - Basic support for inverse transition functions, CREATE AGGREGATE
 support and doc updates. This should include test cases to validate
 that the underlying executor changes are correct, by defining custom
 aggregates such as sum(int) and array_agg() using inverse transition
 functions.
 
 Patch 2
 - Add built-in inverse transition functions for count, sum(int), and friends.
 
 Patch 3, 4...
 - Other related groups of built-in aggregates. By this point, it
 should be a fairly mechanical process.
 
 Splitting it up this way now should help to focus on getting patch 1
 correct, without being distracted by all the other aggregates that may
 or may not usefully be made to have inverse transition functions. I
 think the value of the feature has been proved, and it is good to see
 that it can be applied to so many aggregates, but let's not try to do
 it all at once.

Working on that now, will post individual patches later today.

best regards,
Florian Pflug





Re: [HACKERS] Standalone synchronous master

2014-01-24 Thread Florian Pflug
On Jan24, 2014, at 22:29 , Josh Berkus j...@agliodbs.com wrote:
 On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
 ISTM the consensus is that we need better monitoring/administration
 interfaces so that people can script the behavior they want in external
 tools. Also, a new synchronous apply replication mode would be handy,
 but that'd be a whole different patch. We don't have a patch on the
 table that we could consider committing any time soon, so I'm going to
 mark this as rejected in the commitfest app.
 
 I don't feel that "we'll never do auto-degrade" is determinative;
 several hackers were for auto-degrade, and they have a good use-case
 argument.  However, we do have consensus that we need more scaffolding
 than this patch supplies in order to make auto-degrade *safe*.
 
 I encourage the submitter to resubmit an improved version of this patch
 (one with more monitorability) for  9.5 CF1.  That'll give us a whole
 dev cycle to argue about it.

There seemed to be at least some support for having way to manually
degrade from sync rep to async rep via something like

  ALTER SYSTEM SET synchronous_commit='local';

Doing that seems unlikely to meet much resistance on grounds of principle,
so it seems to me that working on that would be the best way forward for
the submitter. I don't know how hard it would be to pull this off,
though.

best regards,
Florian Pflug





[HACKERS] Passing direct args of ordered-set aggs to the transition function

2014-01-23 Thread Florian Pflug
Hi,

Is there a particular reason why the direct arguments of ordered-set
aggregates are not passed to the transition function too? It seems that
evaluating some ordered-set aggregates would be much cheaper if we did
that.

For example, dense_rank() would then just need to count the number of rows
smaller than the hypothetical row, AFAICS.

Another example (that we don't currently provide, but still) would be a
histogram aggregate which receives an array of buckets as direct args and
returns a similarly shaped array of counters.

best regards,
Florian Pflug





Re: [HACKERS] Passing direct args of ordered-set aggs to the transition function

2014-01-23 Thread Florian Pflug
On Jan23, 2014, at 17:20 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 Is there a particular reason why the direct arguments of ordered-set
 aggregates are not passed to the transition function too?
 
 Because they have to be evaluated only once.
 
 I did consider evaluating them once at the start and saving the values,
 but that's a bit problematic from a memory management standpoint.

Yeah, that's what I had in mind. I'm not sure I understand the memory
management problems you mention though - couldn't we just copy them to
some appropriate memory context, say aggcontext?
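
I.e., something along these lines (a sketch; typByVal and typLen would come
from the argument's type metadata):

  /* evaluate the direct argument once, then keep a copy in aggcontext */
  MemoryContext oldcontext = MemoryContextSwitchTo(aggcontext);
  Datum         saved = datumCopy(value, typByVal, typLen);

  MemoryContextSwitchTo(oldcontext);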

What I'm more concerned about is whether it'd still be possible to have
ordered_set_transition and ordered_set_transition_multi work for all the
ordered-set aggregates we currently have. But I have yet to wrap my head
fully around the VARIADIC any ANY stuff that goes on there...

 Still, if you have a good solution to that and the cycles to produce a
 patch, it's not too late to reconsider.  I concur that it's not that
 hard to think of cases where it'd be useful.

I'll see what I can do.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-22 Thread Florian Pflug
On Jan23, 2014, at 01:17 , David Rowley dgrowle...@gmail.com wrote:
 On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug f...@phlo.org wrote:
 If you want to play with
 this, I think the first step has to be to find a set of guarantees that
 SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
 summands all have the same sign, the error is bound by C * N, where C is some
 (machine-specific?) constant (about 1e-15 or so), and N is the number of 
 input
 rows. Or at least so I think from looking at SUMs over floats in descending
 order, which I guess is the worst case. You could, for example, try to see if
 you can find an invertibility condition which guarantees the same, but allows
 C to be larger. That would put a bound on the number of digits lost by the 
 new
 SUM(float) compared to the old one.
 
 I guess if the case is that normal (non-window) sum(float) results are 
 undefined
 unless you add an order by to the aggregate then I guess there is no possible
 logic to put in for inverse transitions that will make them behave the same as
 an undefined behaviour.

Actually, if sum(float) was really undefined, it'd be *very* easy to provide an
inverse transition function - just make it a no-op. Heck, you could then even
make the forward transition function a no-op, since the very definition of
undefined behaviour is "result can be anything", including setting your house
on fire. The point is, it's *not* undefined - it's just imprecise. And the
imprecision can even be quantified, it just depends on the number of
input rows (the equal-sign requirement is mostly there to make imprecision
equivalent to relative error). 

  If it seems sound enough, then I may implement it in C to see how much
  overhead it adds to forward aggregation for floating point types, but even
  if it did add too much overhead to forward aggregation it might be worth
  allowing aggregates to have 2 forward transition functions and if the 2nd
  one exists then it could be used in windowing functions where the frame
  does not have unbounded following.
 
 I don't think adding yet another type of aggregation function is the
 solution here.
 
 Why not?
  
 I thought, if in the cases where we need to burden the forward transition
 functions with extra code to make inverse transitions possible, then why
 not have an extra transition function so that it does not slow down normal
 aggregation?
 
 My testing of sum(numeric) last night does not show any slow down by that
 extra dscale tracking code that was added, but if it did then I'd be thinking
 seriously about 2 forward transition functions to get around the problem.
 I don't understand what would be wrong with that. The core code could just
 make use of the 2nd function if it existed and inverse transitions were
 thought to be required. i.e in a window context and does not
 have UNBOUNDED PRECEDING.

First, every additional function increases the maintenance burden, and at
some point the expected benefit simply isn't worth it. IMHO at least.

Secondly, you'd really need a second aggregate definition - usually, the
non-invertible aggregate will get away with a much smaller state representation.
Aggregates with state type internal could hack their way around that by
simply not using the additional parts of their state structure, but e.g.
plpgsql aggregates would still incur overhead due to repeated copying of
an unnecessarily large state value. If it's possible to represent the
forward-only but not the invertible state with a copy-by-value type,
that overhead could easily be a large part of the total overhead you paid
for having an inverse transition function.

And lastly, we could achieve much of the benefit of a second transition
function by simply telling the forward transition function whether to
expect inverse transition function calls. We already expose things like
the aggregation memory context to C-language transition functions, so the
basic mechanism is already there. In fact, I pondered whether to do this -
but then figured I'd leave it to whoever needs that facility first. Though
it wouldn't be much code - basically, setting a flag in the WindowAggState,
and exporting a function to be used by aggregates to read it, similar
to what AggCheckCallContext does.
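
In other words, something like the following - entirely hypothetical, neither
the flag nor the function below exists anywhere:

  /* hypothetical counterpart to AggCheckCallContext() */
  bool
  AggExpectsInverseTransitions(FunctionCallInfo fcinfo)
  {
      if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
          return ((WindowAggState *) fcinfo->context)->expectInverse;
      return false;
  }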

best regards,
Florian Pflug








[HACKERS] Confusing documentation of ordered-set aggregates?

2014-01-22 Thread Florian Pflug
Hi

After reading through the relevant parts of syntax.sgml, create_aggregate.sgml
and xaggr.sgml, I think I understand how these work - they work exactly like
regular aggregates, except that some arguments are evaluated only once and
passed to the final function instead of the transition function. The whole
ORDER BY thing is just crazy syntax the standard mandates - a saner
alternative would have been

 ordered_set_agg(direct1,...,directN, WITHIN(arg1,...,argM))

or something like that, right?
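
For reference, the syntax the standard actually mandates looks like this
(table and column names made up):

  SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income)
  FROM households;

Here 0.5 is the direct argument and income the aggregated one.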

So whether ORDER BY implies any actual ordering is up to the ordered-set
aggregate's final function. Or at least that's what xaggr.sgml seems to say

 Unlike the case for normal aggregates, the sorting of input rows for an
 ordered-set aggregate is <emphasis>not</emphasis> done behind the scenes, but is
 the responsibility of the aggregate's support functions.

but that seems to contradict syntax.sgml which says

 The expressions in the <replaceable>order_by_clause</replaceable> are
 evaluated once per input row just like normal aggregate arguments, sorted
 as per the <replaceable>order_by_clause</replaceable>'s requirements, and
 fed to the aggregate function as input arguments.

Also, xaggr.sgml has the following to explain why the NULLs are passed for all
aggregated arguments to the final function, instead of simply not passing them
at all

 While the null values seem useless at first sight, they are important because
 they make it possible to include the data types of the aggregated input(s) in
 the final function's signature, which may be necessary to resolve the output
 type of a polymorphic aggregate.

Why do ordered-set aggregates required that, when plain aggregates are fine
without it? array_agg(), for example, also has a result type that is
determined by the argument type, yet its final function doesn't take an
argument of type anyelement, even though it returns anyarray.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-22 Thread Florian Pflug
On Jan23, 2014, at 01:07 , David Rowley dgrowle...@gmail.com wrote:
 On Tue, Jan 21, 2014 at 3:20 AM, Florian Pflug f...@phlo.org wrote:
 On Jan20, 2014, at 08:42 , David Rowley dgrowle...@gmail.com wrote:
  On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug f...@phlo.org wrote:
  * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, 
  and
it's the last commit, so if you object to that, then you can merge up to
eafa72330f23f7c970019156fcc26b18dd55be27 instead of
de3d9148be9732c4870b76af96c309eaf1d613d7.
 
 
  Seems like sfunc really should be tfunc then we could have invtfunc. I'd 
  probably
  understand this better if I knew what the 's' was for in sfunc. I've not 
  applied
  this just yet. Do you have a reason why you think it's better?
 
 My issue with just invfunc is mainly that it's too generic - it doesn't 
 tell
 you what it's supposed to be the inverse of.
 
 I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and 
 that
 the naming is inspired by control theory, where the function which acts on 
 the
 state space is often called S.
 
 Ok, that makes more sense now and it seems like a reasonable idea. I'm not
 quite sure yet as when someone said upthread that these negative
 sure yet as when someone said upthread that these negative transition 
 functions as
 I was calling them at the time should really be called inverse transition 
 functions,
 I then posted that I was going to call the create aggregate option invfunc 
 which
 nobody seemed to object to. I just don't want to go and change that now. It 
 is very
 possible this will come up again when the committer is looking at the patch. 
 It would
 be a waste if it ended up back at invfunc after we changed it to invsfunc.

Since we already settled on inverse transition function, I kinda doubt that
calling the parameter invsfunc is going to meet a lot of resistance. But we
can put that off a little longer still...

I've pushed a few additional things to 
https://github.com/fgp/postgres/tree/invtrans.

* I updated the CREATE AGGREGATE documentation, trying to include the
  description of the various modes of inverse transition functions into the
  paragraphs which already explained about STRICT for transition functions
  and such.

* I've also updated the list of window functions to include a list of those
  aggregates which potentially need to restart the computation, i.e. MIN/MAX
  and the like.

* I've changed nodeWindowAgg.c to use per-aggregate aggregation contexts for the
  invertible aggregates. Without that, the aggregate context is potentially
  never reset, because that previously required *all* the aggregates to restart
  at the same time. That would be OK if we were sure not to leak unbounded
  amounts of stuff stored in that context, but unfortunately we sometimes do.
  For example, whenever a strict, invertible aggregate ends up with only NULL
  inputs, we re-initialize the aggregation, which leaks the old state value. We
  could pfree() that of course, but that state value might reference other
  stuff that we don't know about and thus cannot free. Separating the
  aggregation contexts is the only solution I came up with, so I did that.

* I've also tweaked an if to flag aggregates as invertible only if the frame
  head can actually move, i.e. if the frame start clause is something other
  than UNBOUNDED PRECEDING. Since the choice of whether to use a private
  aggregation context is driven by that flag, that also makes the above apply
  only to aggregates where the inverse transition function is actually used.

I hope to find some time tomorrow or so to complete my pass through the
documentation - what's still missing is an explanation of the EXPLAIN VERBOSE
ANALYZE field and maybe some cleanup of xaggr.sgml.

Do you have any additional things pending?

best regards,
Florian Pflug









Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-21 Thread Florian Pflug
On Jan21, 2014, at 10:53 , David Rowley dgrowle...@gmail.com wrote:
 It came to me that it might be possible to implement inverse transitions
 for floating point aggregates by just detecting if precision has been
 lost during forward transitions.
 
 I've written the test to do this as:
 
 IF state.value + value = state.value AND value <> 0 THEN
     newstate.precision_lost := true;
     newstate.value := state.value;
 ELSE
     newstate.precision_lost := false;
     newstate.value := state.value + value;
 END IF;
 
 
 The inverse transition function checks the precision_lost and if it's true it
 returns NULL. The core code is now implemented (thanks to Florian) to
 re-aggregate when NULL is returned from the inverse transition function.

That's not sufficient, I fear. You can lose all significant digits of the value
and still have precision_lost = false afterwards. Try summing over 1e16, 1.01.
SELECT 1e16::float8 + 1.01::float8 = 1e16::float8 returns FALSE, yet
SELECT 1e16::float8 + 1.01::float8 - 1e16::float8 returns 2 where 1.01
would have been correct. That's still too much precision loss.

I'm quite certain that the general idea has merit, but the actual
invertibility condition is going to be more involved. If you want to play with
this, I think the first step has to be to find a set of guarantees that 
SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
summands all have the same sign, the error is bound by C * N, where C is some
(machine-specific?) constant (about 1e-15 or so), and N is the number of input
rows. Or at least so I think from looking at SUMs over floats in descending
order, which I guess is the worst case. You could, for example, try to see if
you can find an invertibility condition which guarantees the same, but allows
C to be larger. That would put a bound on the number of digits lost by the new
SUM(float) compared to the old one.

I don't have high hopes for this getting into 9.4, though.

 If it seems sound enough, then I may implement it in C to see how much
 overhead it adds to forward aggregation for floating point types, but even
 if it did add too much overhead to forward aggregation it might be worth
 allowing aggregates to have 2 forward transition functions and if the 2nd
 one exists then it could be used in windowing functions where the frame
 does not have unbounded following.

I don't think adding yet another type of aggregation function is the
solution here. 

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-21 Thread Florian Pflug
On Jan20, 2014, at 15:20 , Florian Pflug f...@phlo.org wrote:
 * In CREATE AGGREGATE, we should state the precise axioms we assume about 
 forward
 and inverse transition functions. The last time I read the text there, it was
 a bit ambiguous about whether inverse transition functions assume 
 commutativity,
 i.e. whether we assume that we can remove inputs other than the first one from
 the aggregation. Currently, all the inverse transition functions we have are,
 in fact, commutative, because all the forward transition function are also. 
 But
 we could e.g. add an inverse to array_agg and string_agg, and those would
 obviously need this ordering guarantee. I'd also like us to explain that the
 inverse in inverse transition function shouldn't be taken too literally. 
 For
 forward transition function F, inverse transition function G, and inputs 
 a,b,...
 we *don't* require that G(F(s,a),a) == s. What we do require is that if I is the
 initial state, then
 
  G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).
 
 (Well, actually we don't strictly require even that, the two states don't
 need to be identical, they just need to produce identical outputs when passed
 to the final function)
 
 * In CREATE AGGREGATE, we should also explain the NULL semantics you get
 with various combinations of strict and non-strict forward and inverse
 transition functions. I think a table listing all the combinations is in order
 there, with a column explaining the semantics you get. We should also clearly
 state that once you provide an inverse transition function, NULL isn't a valid
 state value anymore, except as an initial state, i.e. that the forward 
 transition
 function must never return NULL in this case.

I gave that a shot, the results are at 
https://github.com/fgp/postgres/tree/invtrans

 * The window function page should explain the performance hazards when
 a moving frame head is involved. Ideally, we'd list which aggregates never
 have to restart, and which sometimes might, and what you can do to avoid that.
 We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether
 restarts are occurring. This is where we'd tell people to cast their numeric
 operands to a type with a defined scale to avoid restarts, and where we'd state
 that SUM(float) *does* always restart due to precision loss issues.


 BTW, something that we haven't addressed at all is how inverse transition
 functions interact with what is called ordered-set aggregates. I haven't
 wrapped my head around these fully, I think, so I'm still not sure if there's
 anything to do there or not. Can those even be used as window functions?
 Should we forbid ordered-set aggregates from specifying an inverse transition
 function?

It seems that these aren't valid window functions anyway, so there's nothing
to do for now, I think.

best regards,
Florian Pflug





Re: [HACKERS] Add %z support to elog/ereport?

2014-01-21 Thread Florian Pflug
On Jan21, 2014, at 18:56 , Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Perhaps we should jettison entirely the idea of using the operating
 system's built-in sprintf and use one of our own that has all of the
 nice widgets we need, like a format code that's guaranteed to be right
 for uint64 and one that's guaranteed to be right for Size.  This could
 turn out to be a bad idea if the best sprintf we can write is much
 slower than the native sprintf on any common platforms ... and maybe
 it wouldn't play nice with GCC's desire to check format strings.
 
 That last is a deal-breaker.  It's not just whether gcc desires to check
 this --- we *need* that checking, because people get it wrong without it.

There's an attribute that enables this check for arbitrary functions AFAIR.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-20 Thread Florian Pflug
On Jan19, 2014, at 20:00 , David Rowley dgrowle...@gmail.com wrote:
 I've applied that patch again and put in the sort operators.

I've pushed a new version to https://github.com/fgp/postgres/tree/invtrans
which includes

* A bunch of missing declarations for *_inv functions

* An assert that the frame end doesn't move backwards - I realized that
  it is after all easy to do that, if it's done after the loop which adds
  the new values, not before.

* EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate
  transitions per row and aggregate. It's a bit imprecise, because it doesn't
  track the count per aggregate, but it's still a good metric for how well
  the inverse transition functions work. If the number is close to one, you
  know that very few rescans are happening.

* I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
  it's the last commit, so if you object to that, then you can merge up to
  eafa72330f23f7c970019156fcc26b18dd55be27 instead of
  de3d9148be9732c4870b76af96c309eaf1d613d7.

A few more things I noticed, all minor stuff

* do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if the
  inverse transition function returns NULL once, we never call it again, so the
  flag won't have any practical effect. And second, assume we ever called the
  forward transition function after the inverse failed, and then retried the 
inverse.
  In the case of do_numeric_discard(), that actually *could* allow the inverse
  to suddenly succeed - if the call to the forward function increased the dscale
  beyond that of the element we tried to remove, removal would suddenly be
  possible again. We never do that, of course, and it seems unlikely we ever
  will. But it's still weird to have code which serves no other purpose than to
  pessimize a case which would otherwise just work fine.

* The state == NULL checks in all the strict inverse transition functions are
  redundant.

I haven't taken a close look at the documentation yet, I hope to be able to
do that tomorrow. Otherwise, things look good as far as I'm concerned.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-20 Thread Florian Pflug
On Jan20, 2014, at 08:42 , David Rowley dgrowle...@gmail.com wrote:
 On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug f...@phlo.org wrote:
 * An assert that the frame end doesn't move backwards - I realized that
   it is after all easy to do that, if it's done after the loop which adds
   the new values, not before.
 
 I've applied this too, but I'm wondering why an elog if the head moves
 back, but an assert if the tail moves back? 

When I put the frame head check in, I was concerned that the code might crash
or loop endlessly if aggregatedbase was ever larger than frameheadpos, so
I made it elog(), just for safety.

The frame end check, OTOH, is done at the very end, so it doesn't protect
against much, it just documents that it's not supposed to happen. 

But yeah, it's a bit weird. Feel free to turn the frame end check into an
elog() too if you want to.

 * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
   it's the last commit, so if you object to that, then you can merge up to
   eafa72330f23f7c970019156fcc26b18dd55be27 instead of
   de3d9148be9732c4870b76af96c309eaf1d613d7.
 
 
 Seems like sfunc really should be tfunc, then we could have invtfunc. I'd 
 probably
 understand this better if I knew what the 's' was for in sfunc. I've not 
 applied
 this just yet. Do you have a reason why you think it's better?

My issue with just invfunc is mainly that it's too generic - it doesn't tell
you what it's supposed to be the inverse of.

I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that
the naming is inspired by control theory, where the function which acts on the
state space is often called S.

 Thanks, yeah those really do need a review. I've lost a bit of direction with
 them and I'm not quite sure just how much detail to go in to with it. I'd like
 to explain a bit that users who need to use their numeric columns in window
 aggregates might want to think about having a defined scale to the numeric 
 rather
 than an undefined scale and explain that this is because the inverse 
 transition
 function for numeric bails out if it loses the maximum seen dscale. Though it
 does seem generally a good idea to have a defined scale, but then I guess 
 you've
 got to have a bit of knowledge about the numbers you're storing in that case.
 I'm not quite sure how to put that into words friendly enough for the 
 documents
 just yet and or exactly where to put the words. So any ideas or patches you 
 have
 around that would be great.

Here's what I image the docs should look like, very roughly

* In CREATE AGGREGATE, we should state the precise axioms we assume about 
forward
and inverse transition functions. The last time I read the text there, it was
a bit ambiguous about whether inverse transition functions assume commutativity,
i.e. whether we assume that we can remove inputs other than the first one from
the aggregation. Currently, all the inverse transition functions we have are,
in fact, commutative, because all the forward transition function are also. But
we could e.g. add an inverse to array_agg and string_agg, and those would
obviously need this ordering guarantee. I'd also like us to explain that the
inverse in inverse transition function shouldn't be taken too literally. For
forward transition function F, inverse transition function G, and inputs a,b,...
we *don't* require that G(F(s,a),a) == s. What we do require is that if I is the
initial state, then

  G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).

(Well, actually we don't strictly require even that, the two states don't
need to be identical, they just need to produce identical outputs when passed
to the final function)

* In CREATE AGGREGATE, we should also explain the NULL semantics you get
with various combinations of strict and non-strict forward and inverse
transition functions. I think a table listing all the combinations is in order
there, with a column explaining the semantics you get. We should also clearly
state that once you provide an inverse transition function, NULL isn't a valid
state value anymore, except as an initial state, i.e. that the forward 
transition
function must never return NULL in this case.

* The window function page should explain the performance hazards when
a moving frame head is involved. Ideally, we'd list which aggregates never
have to restart, and which sometimes might, and what you can do to avoid that.
We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether
restarts are occurring. This is where we'd tell people to cast their numeric
operands to a type with a defined scale to avoid restarts, and where we'd state
that SUM(float) *does* always restart due to precision loss issues.

BTW, something that we haven't addressed at all is how inverse transition
functions interact with what is called ordered-set aggregates. I haven't
wrapped my head around these fully, I think, so I'm still not sure if there's
anything to do there or not. Can those even be used as window functions?
Should we forbid ordered-set aggregates from specifying an inverse transition
function?

Re: [HACKERS] plpgsql.warn_shadow

2014-01-20 Thread Florian Pflug
On Jan20, 2014, at 14:05 , Marko Tiikkaja ma...@joh.to wrote:
 On 1/20/14 1:55 PM, Robert Haas wrote:
 On Mon, Jan 20, 2014 at 7:16 AM, Marko Tiikkaja ma...@joh.to wrote:
 What's so hard about plpgsql.warnings='all'?  Or if the fact that it's a
 list is your concern, I'm not going to oppose to making it a boolean.
 
 Sure, that'd be fine.  What I don't want is to have to start each function 
 with:
 
 #option warn_this
 #option warn_that
 #option warn_theotherthing
 #option warn_somethingelse
 #option warn_yetanotherthing
 #option warn_whatdoesthisdoagain
 
 Right.  Completely agreed.  The only reason I had them in the patch is to 
 have the
 ability to turn *off* a specific warning for a particular function.  But even
 that's of somewhat dubious value.

I think as long as there's an easy way to avoid a warning - in the case of
warn_shadow e.g. by renaming the shadowing variable - there's no real 
requirement
to be able to disable the warning on a per-function basis.

And if there isn't a simple way to avoid a particular warning then we
ought not warn about it anyway, I guess, because that would indicate that there
are genuine reasons for doing whatever it is the warning complains about.

best regards,
Florian Pflug






Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-19 Thread Florian Pflug
On Jan19, 2014, at 05:27 , David Rowley dgrowle...@gmail.com wrote:
 I just finished implementing the inverse transition functions for bool_and
 and bool_or, these aggregates had a sort operator which I assume would have
 allowed an index scan to be performed, but since I had to change the first
 argument of these aggregates to internal and that meant I had to get rid of
 the sort operator...

Why does having transition type internal prevent you from specifying a
sort operator? The sort operator's argument types must match the *input*
type of the aggregate, not the transition type.

Here's a pure SQL implementation of an optimized bool_and called myand_agg
that uses state type bigint[] and specifies a sort operator.

  create or replace function myboolagg_fwd(counts bigint[], value bool)
  returns bigint[] as $$
    -- counts[1] tracks the number of false inputs seen, counts[2] the true ones
    select array[
      counts[1] + case value when true then 0 else 1 end,
      counts[2] + case value when true then 1 else 0 end
    ]
  $$ language sql strict immutable;

  create or replace function myboolagg_inv(counts bigint[], value bool)
  returns bigint[] as $$
    -- inverse transition: simply un-count the removed input
    select array[
      counts[1] - case value when true then 0 else 1 end,
      counts[2] - case value when true then 1 else 0 end
    ]
  $$ language sql strict immutable;

  create or replace function myboolagg_and(counts bigint[])
  returns bool as $$
    -- AND is true iff no false input was ever seen
    select case counts[1] when 0 then true else false end
  $$ language sql strict immutable;

  create aggregate myand_agg (bool) (
    stype = bigint[],
    sfunc = myboolagg_fwd,
    invfunc = myboolagg_inv,
    finalfunc = myboolagg_and,
    sortop = <,
    initcond = '{0,0}'
  );

With this, doing

  create table boolvals as
    select i, random() > 0.5 as v from generate_series(1,10000) i;
  create index on boolvals(v);

  explain analyze select myand_agg(v) from boolvals;

yields

 Result  (cost=0.33..0.34 rows=1 width=0) (actual time=0.067..0.067 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.29..0.33 rows=1 width=1) (actual time=0.061..0.061 rows=1 loops=1)
           ->  Index Only Scan using boolvals_v_idx on boolvals  (cost=0.29..474.41 rows=9950 width=1) (actual time=0.061..0.061 rows=1 loops=1)
                 Index Cond: (v IS NOT NULL)
                 Heap Fetches: 1
 Total runtime: 0.100 ms

which looks fine, no?

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-18 Thread Florian Pflug
On Jan18, 2014, at 06:15 , David Rowley dgrowle...@gmail.com wrote:
 On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug f...@phlo.org wrote:
 On Jan17, 2014, at 23:34 , David Rowley dgrowle...@gmail.com wrote:
 The test turned out to become:
  if (state->expectedScale == -1)
      state->expectedScale = X.dscale;
  else if (state->expectedScale != X.dscale)
      state->expectedScale = -2;
 
 In do_numeric_accum, then when we do the inverse I just check if
 expectedScale < 0 then return NULL.
 
 Ok, so this will rescan unless the dscales of all inputs match.
 I still think that's overly pessimistic - we've only got a problem when we
 removed the input with the largest dscale, no? So my suggestion would be
 
 <snipped>
 
 I'd think that this avoids more restarts with about the same effort,
 but I haven't tried this, so maybe I'm missing something.
 
 This is not quite right as it means if all the values are the same then
 we reject inverse transitions since state->maxScale will always be equal
 to X.dscale.
 But you are right about the overly strict code I've put in, we should allow
 values with a dscale less than the maximum to be unaggregated without
 complaint. To implement this I needed a maxScaleCounter variable too so we
 only reject when the maxScaleCounter gets back to 0 again. 

Ups, sorry, yeah. Sounds sensible.
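
The dscale hazard itself is easy to demonstrate in plain SQL, by the way - a
hedged aside, not taken from the patch:

  -- the dscale of a numeric sum is the maximum dscale of its inputs:
  select 1.0 + 2.000;  -- 3.000
  -- after removing 2.000 again the correct result 1.0 has dscale 1, but the
  -- state 3.000 alone doesn't tell us that - hence the need to track the
  -- maximum dscale, and how often it occurs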

BTW, this made me realize that MIN and MAX currently have the same issue -
they'll rescan if the inputs are all equal. We could avoid that by doing what
you did with dscale - track the number of times we've seen the maximum. I
wonder if that would be worth it - it would, unfortunately, require us to
use state type internal there too, and hence to add final functions for all
the MIN/MAX aggregates. But that seems excessive. So for now, let's just
live with that.

If we really *do* want to optimize this case, we could
come to it from a completely different angle. Aggregates already special-case
MIN and MAX to be able to use an index to evalutate SELECT MAX(c) FROM t.
If we provided a way for the transition function to call the sort operator
specified by SORTOP in CREATE AGGREGATE, one generic triple of forward and
inverse transition function and final function would work for all the
MIN and MAX aggregates. But that's material for a separate patch for 9.5

 Note that after this fix the results for my quick benchmark now look like:
 
 create table num (num numeric(10,2) not null);
 insert into num (num) select * from generate_series(1,2);
 select sum(num) over(order by num rows between current row and unbounded 
 following) from num; -- 113 ms
 select sum(num / 10) over(order by num rows between current row and unbounded 
 following) from num; -- 156ms
 select sum(num / 1) over(order by num rows between current row and unbounded 
 following) from num; -- 144 ms
 
 So it seems to be much less prone to falling back to brute force forward
 transitions. 
 It also seems the / 10 version must have had to previously do 1 brute
 force rescan but now it looks like it can do it in 1 scan.
 
 I'm not set on it, and I'm willing to try the lazy zeroing of the scale
 tracker array, but I'm just not quite sure what extra real use cases it
 covers that the one above does not. Perhaps my way won't perform inverse
 transitions if the query did sum(numericCol / 10) or so.
 
 Dunno how many people SUM over numerics with different dscales. Easily
 possible that it's few enough to not be worth fussing over.
 
 Going by Tom's comments in the post above this is possible just by having an
 unconstrained numeric column, but I guess there's still a good chance that
 even those unconstrained numbers have the same scale or at least the scale
 will likely not vary wildly enough to make us have to perform brute force
 forward transitions for each row.

Yeah, I'm convinced by now that your approach is the right trade-off there.
Those who do have values with wildly different dscales in their columns can
always add a cast to normalize them, if they experience a lot of restarts.

So let's just add a sentence or two to the SUM(numeric) documentation about
this, and be done.

 * build_aggregate_fnexprs() should allow NULL to be passed for 
 invtransfn_oid,
  I think. I don't quite like that we construct that even for plain 
 aggregates,
  and avoiding that requires just an additional if.
 
 I'm not quite sure what you mean on this. It passes InvalidOid in normal
 aggregate calls (search for: InvalidOid, /* invtrans is not needed here */)
 and only looks up the function in build_aggregate_fnexprs if
 (OidIsValid(invtransfn_oid)) is true. I'm not sure how this can be improved
 since that function is used for window aggregates and normal aggregates.

I was thinking about checking for invtransfnexpr == NULL, and not assigning
if it is. But on second thought, you're right - the additional variable doesn't
really hurt. So let's leave it as it is.

 * Don't we need to check for volatile function in the filter

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
On Jan17, 2014, at 15:14 , David Rowley dgrowle...@gmail.com wrote:
 If you make any more changes would it be possible for you to base them on the 
 attached patch instead of the last one you sent?

Sure! The only reason I didn't rebase my last patch onto yours was that having 
the numeric stuff in there meant potentially better test coverage. Thanks for 
doing the merge!

 Perhaps if there's much more hacking to do we could start pushing to a github 
 repo to make it easier to work together on this.

Yeah, that seems like a good idea. Easier to keep track of than a bunch of 
patches floating around. Could you push your latest version?

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
On Jan17, 2014, at 20:34 , David Rowley dgrowle...@gmail.com wrote:
 On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug f...@phlo.org wrote:
 
 I've now shuffled things around so that we can use inverse transition 
 functions
 even if only some aggregates provide them, and to allow inverse transition
 functions to force a restart by returning NULL. The rules regarding NULL 
 handling
 are now the following
 
 Maybe this is me thinking out loud, but I'm just thinking about the numeric 
 case again.
 Since the patch can now handle inverse transition functions returning NULL 
 when they
 fail to perform inverse transitions, I'm wondering if we could add an 
 expectedscale
 to NumericAggState, set it to -1 initially, when we get the first value set 
 the
 expectedscale to the dscale of that numeric, then if we get anything but that 
 value
 we'll set the expectedscale back to -1 again, if we are asked to perform an 
 inverse
 transition with a expectedscale as -1 we'll return null, otherwise we can 
 perform
 the inverse transition...

You could do better than that - the numeric problem amounts to tracking the 
maximum
scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale 
equals
the current maximum. But if we do SUM(numeric) at all, I think we should do so
without requiring restarts - I still think the overhead of tracking the maximum 
scale
within the aggregated values isn't that bad. If we zero the dscale counters 
lazily,
the number of entries we have to zero is bounded by the maximum dscale we 
encounter.
Since actually summing N digits is probably more expensive than zeroing them, 
and since
we pay the zeroing overhead only once per aggregation but save potentially many
summations, I'm pretty sure we come out ahead by quite some margin.

It'd be interesting to do SUM(float) similar to the way you describe, though. We 
might
not be able to guarantee that we yield exactly the same result as without 
inverse
aggregation, but we might be able to bound the error. That might make it 
acceptable
to do this - as Kevin pointed out, float is always an approximation anyway. I 
haven't
really thought that through, though...

Anyway, with time running out fast if we want to get this into 9.4, I think we 
should
focus on getting this into a committable state right now.

I've started to look over what you've pushed to github, and it looks mostly 
fine.
I have a few comments - mostly cosmetic stuff - that I'll post once I've finished 
reading
through it. I also plan to do some basic performance testing to verify that my
reshuffling of eval_windowaggregates() doesn't hurt aggregates without an 
inverse
transition function.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
 the core machinery performs the expected calls of the forward and
  reverse transition function. I was thinking about creating an aggregate in the
  regression tests which simply concatenates all the calls into a string, e.g.
  you might get F:1 F:2 F:3 I:1 if we aggregated 1,2,3 and then removed 1.
  I think that should be possible with an SQL-language forward and inverse
  transfer function, but I haven't tried. I can try, if you want.
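
Roughly along these lines, I'd imagine - an untested sketch, the names are made
up, and it assumes non-NULL inputs; the state doubles as the call log:

  create function logagg_fwd(state text, x int) returns text as $$
    select coalesce(state || ' ', '') || 'F:' || x
  $$ language sql;

  create function logagg_inv(state text, x int) returns text as $$
    select state || ' I:' || x
  $$ language sql;

  create aggregate logagg (int) (
    stype = text,
    sfunc = logagg_fwd,
    invfunc = logagg_inv
  );

Running logagg(i) over a window with a moving frame head should then show
exactly which forward and inverse calls the machinery performed.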

best regards,
Florian Pflug


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Florian Pflug
On Jan16, 2014, at 09:07 , David Rowley dgrowle...@gmail.com wrote:
 On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug f...@phlo.org wrote:
 The notnullcount machinery seems to apply to both STRICT and non-STRICT 
 transfer function pairs. Shouldn't that be constrained to STRICT transfer 
 function pairs? For non-STRICT pairs, it's up to the transfer functions to 
 deal with NULL inputs however they please, no?
 
 The reason I had to track the notnullcount for non-strict was because:
 
 select sum(n) over (order by i rows between current row and unbounded 
 following) from (values(1,1),(2,NULL)) v(i,n);
 
 would otherwise return
 1
 0
 
 sum is not strict, so I guess we need to track notnullcount for non-strict 
 functions.
 See the code around if (peraggstate->notnullcount == 0) in 
 retreat_windowaggregate(). 

Yeah, I figured that was the reason, but you can't fix it that way. See 
http://www.postgresql.org/message-id/8e857d95-cba4-4974-a238-9dd7f61be...@phlo.org
 for a detailed explanation why. My 2.2 patch allows pairs of non-strict 
forward and strict inverse transfer functions exactly because of this - i.e., 
basically it decrees that if there's an inverse transfer function, the strict 
setting of the *inverse* function determines the aggregate's NULL behaviour. The 
forward transfer function is then never called
for NULL inputs, but it *is* called with the NULL state for the first non-NULL 
input, and *must* then return a non-NULL state (hence it's technically not 
strict, it's strict only in the inputs, not in the state).

BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did 
that in 2.2. That doesn't matter for the built-in aggregates since these aren't 
created with CREATE AGGREGATE, but it's obviously wrong. I'll post a fixed 
patch.

 The logic around movedaggbase in eval_windowaggregates() seems a bit 
 convoluted. Couldn't the if be moved before the code that pulls 
 aggregatedbase up to frameheadpos using the inverse aggregation function?

 I had a look at this and even tried moving the code to before the inverse 
 transitions, but it looks like that would only work if I added more tests to 
 the if condition to ensure that we're not about to perform inverse 
 transitions. To me it just seemed more bullet proof the way it is rather than 
 duplicating logic and having to ensure that it stays duplicated. But saying 
 that I don't think I've fully got my head around why the original code is 
 valid in the first place. I would have imagined that it should contain a 
 check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that sounds like 
 complete rubbish then I'll put it down to my head still being fried from work.

Ok, I get this now. That code really just is asking "is the previous row's 
frame the same as the current row's frame" in that if where you added 
movedagg. What confused me was the rather weird way that condition is 
expressed, which turns out is due to the fact that at the point of the if, we 
don't know the new frame's end. Now, we could use update_frametailpos() to find 
that, but that's potentially costly, especially if the tuple store was spilled 
to disk. So instead, the code relies on the fact that only if the frame end is 
n FOLLOWING/PRECEDING can the current row lie within the previous row's frame 
without the two frame's ends being necessarily the same.

For added confusion, that if never explicitly checks whether the frame's 
*heads* coincide - the previous code seems to have relied on the impossibility 
of having aggregatedbase <= current < aggregatedupto after re-initializing 
the aggregation, because then aggregatedbase = aggregatedupto = 0. That's why 
you can't just move the if before the frameheadpos == aggregatedbase check. 
But you can *if* you also check whether aggregatedbase == frameheadpos in the 
if - which is clearer than relying on that rather subtle assumption anyway.

BTW, your patch will also fail badly if the frame head ever moves backwards 
- the invariant that aggregatedbase == frameheadpos after the inverse 
transition loop will then be violated. Now, this should never happen, because 
we require that the offset in n PRECEDING/FOLLOWING is constant, but we 
should probably still check for this and elog().

That check was implicit in old code, because it advanced the tuplestore mark, 
so if the frame head moved backwards, re-scanning the frame would have failed. 
That brings me to another todo - as it stands, that mark gets never advanced if 
we're doing inverse aggregation. To fix that, we need a call to 
WinSetMarkPosition() somewhere in eval_windowaggregates().

After doing these things, eval_windowaggregates ended up calling 
initalize_windowframeaggregates at a single place again, so I inlined it back 
into eval_windowaggregates().  I also made it use temp_slot_1 in the inverse 
aggregation loop, since that seemed safe - the code assumes some invariants 
about aggregatedbase and agg_row_slot.

Updated patch

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Florian Pflug
I had some more fun with this, the result is v2.5 of the patch (attached). 
Changes are explained below.

On Jan16, 2014, at 19:10 , Florian Pflug f...@phlo.org wrote:
 On Jan16, 2014, at 09:07 , David Rowley dgrowle...@gmail.com wrote:
 On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug f...@phlo.org wrote:
 The notnullcount machinery seems to apply to both STRICT and non-STRICT 
 transfer function pairs. Shouldn't that be constrained to STRICT transfer 
 function pairs? For non-STRICT pairs, it's up to the transfer functions to 
 deal with NULL inputs however they please, no?
 
  The reason I had to track the notnullcount for non-strict was because:
 
 select sum(n) over (order by i rows between current row and unbounded 
 following) from (values(1,1),(2,NULL)) v(i,n);
 
 would otherwise return
 1
 0
 
 sum is not strict, so I guess we need to track notnullcount for non-strict 
 functions.
  See the code around if (peraggstate->notnullcount == 0) in 
 retreat_windowaggregate(). 
 
 Yeah, I figured that was the reason, but you can't fix it that way. See 
 http://www.postgresql.org/message-id/8e857d95-cba4-4974-a238-9dd7f61be...@phlo.org
  for a detailed explanation why. My 2.2 patch allows pairs of non-strict 
 forward and strict inverse transfer functions exactly because of this - i.e., 
 basically it decrees that if there's an inverse transfer function, the strict 
 setting of the *inverse* function determines the aggregate's NULL behaviour. 
 The forward transfer function is then never called
 for NULL inputs, but it *is* called with the NULL state for the first 
 non-NULL input, and *must* then return a non-NULL state (hence it's 
 technically not strict, it's strict only in the inputs, not in the state).
 
 BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did 
 that in 2.2. That doesn't matter for the built-in aggregates since these 
 aren't created with CREATE AGGREGATE, but it's obviously wrong. I'll post a 
 fixed patch.

Fixed now.

 The logic around movedaggbase in eval_windowaggregates() seems a bit 
 convoluted. Couldn't the if be moved before the code that pulls 
 aggregatedbase up to frameheadpos using the inverse aggregation function?
 
 I had a look at this and even tried moving the code to before the inverse 
 transitions, but it looks like that would only work if I added more tests to 
 the if condition to ensure that we're not about to perform inverse 
 transitions. To me it just seemed more bullet proof the way it is rather 
 than duplicating logic and having to ensure that it stays duplicated. But 
 saying that I don't think I've fully got my head around why the original 
 code is valid in the first place. I would have imagined that it should 
 contain a check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that 
 sounds like complete rubbish then I'll put it down to my head still being 
 fried from work.
 
 Ok, I get this now. That code really just is asking "is the previous row's 
 frame the same as the current row's frame" in that if where you added 
 movedagg. What confused me was the rather weird way that condition is 
 expressed, which turns out is due to the fact that at the point of the if, we 
 don't know the new frame's end. Now, we could use update_frametailpos() to 
 find that, but that's potentially costly, especially if the tuple store was 
 spilled to disk. So instead, the code relies on the fact that only if the 
 frame end is n FOLLOWING/PRECEDING can the current row lie within the 
 previous row's frame without the two frame's ends being necessarily the same.
 
 For added confusion, that if never explicitly checks whether the frame's 
 *heads* coincide - the previous code seems to have relied on the 
 impossibility of having aggregatedbase <= current < aggregatedupto after 
 re-initializing the aggregation, because then aggregatedbase = aggregatedupto 
 = 0. That's why you can't just move the if before the frameheadpos == 
 aggregatedbase check. But you can *if* you also check whether 
 aggregatedbase == frameheadpos in the if - which is clearer than relying on 
 that rather subtle assumption anyway.
 
 BTW, your patch will also fail badly if the frame head ever moves 
 backwards - the invariant that aggregatedbase == frameheadpos after the 
 inverse transition loop will then be violated. Now, this should never happen, 
 because we require that the offset in n PRECEDING/FOLLOWING is constant, 
 but we should probably still check for this and elog().
 
 That check was implicit in old code, because it advanced the tuplestore mark, 
 so if the frame head moved backwards, re-scanning the frame would have 
 failed. That brings me to another todo - as it stands, that mark gets never 
 advanced if we're doing inverse aggregation. To fix that, we need a call to 
 WinSetMarkPosition() somewhere in eval_windowaggregates().
 
 After doing these things, eval_windowaggregates ended up calling 
 initalize_windowframeaggregates at a single place again, so I inlined it back 
 into eval_windowaggregates().

Re: [HACKERS] plpgsql.warn_shadow

2014-01-15 Thread Florian Pflug
On Jan15, 2014, at 10:08 , Marko Tiikkaja ma...@joh.to wrote:
 On 1/15/14 7:07 AM, Florian Pflug wrote:
 On Jan15, 2014, at 01:34 , Marko Tiikkaja ma...@joh.to wrote:
 It's me again, trying to find a solution to the most common mistakes I 
 make.  This time it's accidental shadowing of variables, especially input 
 variables.  I've wasted several hours banging my head against the wall 
 while shouting HOW CAN THIS VARIABLE ALWAYS BE NULL?.  I can't believe 
 I'm the only one.  To give you a rough idea on how it works:
 
 I like this, but think that the option should be just called 
 plpgsql.warnings or plpgsql.warn_on and accept a list of warnings to enable.
 
 Hmm.  How about:
 
  plpgsql.warnings = 'all' # enable all warnings, defaults to the empty list, 
 i.e. no warnings
  plpgsql.warnings = 'shadow, unused' # enable just shadow and unused 
 warnings

Looks good. For the #-directive, I think what we'd actually want there is to 
*disable* certain warnings for certain functions, i.e. #silence_warning 
shadow would disable the shadow warning. Enabling on a per-function basis 
doesn't seem all that useful - usually you'd develop with all warnings globally 
enabled anyway.

  plpgsql.warnings_as_errors = on # defaults to off?

This I object to, for the same reasons I object to consistent_into. 

best regards,
Florian Pflug





Re: [HACKERS] plpgsql.warn_shadow

2014-01-15 Thread Florian Pflug
On Jan15, 2014, at 11:20 , Pavel Stehule pavel.steh...@gmail.com wrote:
 2014/1/15 Marko Tiikkaja ma...@joh.to
 On 1/15/14 7:07 AM, Florian Pflug wrote:
 On Jan15, 2014, at 01:34 , Marko Tiikkaja ma...@joh.to wrote:
 It's me again, trying to find a solution to the most common mistakes I make.  
 This time it's accidental shadowing of variables, especially input variables. 
  I've wasted several hours banging my head against the wall while shouting 
 HOW CAN THIS VARIABLE ALWAYS BE NULL?.  I can't believe I'm the only one.  
 To give you a rough idea on how it works:
 
 I like this, but think that the option should be just called plpgsql.warnings 
 or plpgsql.warn_on and accept a list of warnings to enable.
 
 Hmm.  How about:
 
   plpgsql.warnings = 'all' # enable all warnings, defaults to the empty list, 
 i.e. no warnings
   plpgsql.warnings = 'shadow, unused' # enable just shadow and unused 
 warnings
   plpgsql.warnings_as_errors = on # defaults to off?
 
 This interface is a lot more flexible and should address Jim's concerns as 
 well.
 
 In this context it is not clear whether this option is related to plpgsql compile 
 warnings, plpgsql executor warnings or general warnings.
 
 plpgsql.compile_warnings = disabled, enabled, fatal

This makes no sense to me - warnings can just as well be emitted during 
execution. Why would we distinguish the two? What would that accomplish?

best regards,
Florian Pflug





Re: [HACKERS] plpgsql.warn_shadow

2014-01-15 Thread Florian Pflug
On Jan15, 2014, at 13:08 , Pavel Stehule pavel.steh...@gmail.com wrote:
 2014/1/15 Florian Pflug f...@phlo.org
 On Jan15, 2014, at 11:20 , Pavel Stehule pavel.steh...@gmail.com wrote:
  2014/1/15 Marko Tiikkaja ma...@joh.to
  plpgsql.warnings = 'all' # enable all warnings, defaults to the empty 
  list, i.e. no warnings
plpgsql.warnings = 'shadow, unused' # enable just shadow and unused 
  warnings
plpgsql.warnings_as_errors = on # defaults to off?
 
  This interface is a lot more flexible and should address Jim's concerns as 
  well.
 
  In this context it is not clear whether this option is related to plpgsql compile 
  warnings, plpgsql executor warnings or general warnings.
 
  plpgsql.compile_warnings = disabled, enabled, fatal
 
 This makes no sense to me - warnings can just as well be emitted during 
 execution. Why would we distinguish the two? What would that accomplish?
 
 When we talked about plpgsql compiler warnings, we talked about relatively 
 important warnings that in almost all cases indicate some design issue, where 
 it is better to stop early.
 
 Postgres warnings are an absolutely different kind - they usually have an 
 informative character - and usually you wouldn't want to increase their severity.
 
 Moreover, we are talking about warnings produced by the plpgsql environment - 
 and as far as I know, that makes sense only for the compiler.

The fact that it's named plpgsql.warnings already clearly documents that this 
only affects plpgsql. But whether a particular warning is emitted during 
compilation or during execution is largely irrelevant, I think. For example, if 
we called this compiler_warning, we couldn't add a warning which triggers 
when SELECT .. INTO ignores excess rows.

best regards,
Florian Pflug





Re: [HACKERS] plpgsql.warn_shadow

2014-01-15 Thread Florian Pflug
On Jan15, 2014, at 13:32 , Marko Tiikkaja ma...@joh.to wrote:
 On 1/15/14 1:23 PM, Florian Pflug wrote:
 The fact that it's named plpgsql.warnings already clearly documents that 
 this only affects plpgsql. But whether a particular warning is emitted 
 during compilation or during execution is largely irrelevant, I think. For 
 example, if we called this compiler_warning, we couldn't add a warning 
 which triggers when SELECT .. INTO ignores excess rows.
 
 There is the fact that something being a compiler warning gives you an idea 
 of its effects on performance.  But maybe that would be better described in 
 the documentation (perhaps even more accurately).
 
 I like the idea of warning about SELECT .. INTO, though, but that one could 
 have a non-negligible performance penalty during execution.

I'm not overly concerned about that. I imagine people would usually enable 
warnings during development, not production.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-15 Thread Florian Pflug
On Jan15, 2014, at 19:56 , Robert Haas robertmh...@gmail.com wrote:
 It strikes me that for numeric what you really need is to just tell
 the sum operation, whether through a parameter or otherwise, how many
 decimal places to show in the output.  Obviously that's not a
 practical change for sum() itself, but if we're inventing new stuff it
 can be done.

You can already do that, just cast the result of SUM(numeric) to
an appropriately constrained NUMERIC type, i.e. to NUMERIC(prec, scale).
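
For example - a trivial sketch, with t and x as placeholders:

  select sum(x)::numeric(20,2) from t;  -- result is always shown with scale 2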

BTW, AVG() and STDDEV() have the same issue. The problem is just partially
masked by the division by N (or N-1) at the end, because we always emit as
least 16 fractional digits when dividing. So you have to have an input 
value with a larger scale than that to trigger it.

For the following query

  select avg(x) over (order by i rows between current row and 1 following)
  from (values
(1,1), (2,1), (3,0.1), (4,1), (5,1)
  ) t(i,x);

9.3 returns
 avg 
-
  1.
 0.50001
 0.50001
  1.
  1.

but HEAD+patch returns
 avg 
-
  1.
 0.50001
 0.50001
 1.0
 1.0

I have to admit that I'm *very* tempted to suggest we simply ignore this -
but that *would* mean accepting that windowed aggregates are non-
deterministic in the sense that their result (even if only in the number
of trailing zeros) depends on values outside of the frame. Which, I guess,
is a box that best stays closed...

I'm currently thinking the best way forward is to get a basic patch
without any NUMERIC stuff committed, and to revisit this after that's done.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-15 Thread Florian Pflug
On Jan14, 2014, at 17:39 , Florian Pflug f...@phlo.org wrote:
 On Jan14, 2014, at 11:06 , David Rowley dgrowle...@gmail.com wrote:
 Here's a patch which removes sum(numeric) and changes the documents a little 
 to remove a reference to using sum(numeric) to workaround the fact that 
 there's no inverse transitions for sum(float). I also made a small change in 
 the aggregates.sql tests to check that the aggfnoid = .
 
 I've looked over the patch, here a few comments.
 
 For STRICT pairs of transfer and inverse transfer functions we should 
 complain if any of them ever return NULL. That can never be correct anyway, 
 since a STRICT function cannot undo a non-NULL -> NULL transition.
 
 The same goes for non-STRICT, at least if we ever want to allow an inverse 
 transfer function to indicate "Sorry, cannot undo in this case, please 
 rescan". If we allowed NULL as just another state value now, we couldn't 
 easily undo that later, so we'd rob ourselves of the obvious way for the 
 inverse transfer function to indicate this condition to its caller.
 
 The notnullcount machinery seems to apply to both STRICT and non-STRICT 
 transfer function pairs. Shouldn't that be constrained to STRICT transfer 
 function pairs? For non-STRICT pairs, it's up to the transfer functions to 
 deal with NULL inputs however they please, no?

I modified the patch to do this, and ran into a problem. Currently, aggregates 
with state type internal cannot have a strict transfer function, even if they 
behave like one, i.e. ignore NULL inputs. This is because the only 
possible initial value for state type internal is NULL, and it's up to the 
transfer function to create the state - usually upon seeing the first non-NULL 
input. Now, currently that isn't a huge problem - the transfer function simply 
has to check for NULL input values itself, and if it's indeed conceptually 
strict, it just returns in this case.

With inverse transfer functions, however, each such pair of forward and inverse 
transfer functions would also need to duplicate the NULL-counting logic from 
nodeWindowAgg.c if it wants to be conceptually strict, i.e. ignore NULL 
inputs, but return NULL if there are *only* NULL inputs (or no inputs at all). 
That seems like a lot of duplicated code in the long run.

In essence, what we'd want for strict pairs of forward and inverse transfer 
functions is for the forward transfer function to be strict in all arguments 
except the state, and the inverse to be strict according to the usual 
definition. We can't express that property of the forward transfer function 
within pg_proc, but we don't really have to - a local hack in nodeWindowAgg.c 
suffices. So what I'm proposing is:

We allow the forward transfer function to be non-strict even if the inverse is 
strict, but only if the initial value is NULL. In that case we behave as if the 
forward transfer function was strict, except that upon seeing the first 
non-NULL input we call it with a NULL state. The return value must still be 
non-NULL in all cases.
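
A conceptually strict SUM(int4)-like pair would then look roughly like this -
a sketch only, with made-up names:

  -- forward: non-strict, handles only the NULL *state*, never returns NULL
  create function mysum_fwd(state int8, x int4) returns int8 as $$
    select coalesce(state, 0) + x
  $$ language sql;

  -- inverse: strict in the usual sense
  create function mysum_inv(state int8, x int4) returns int8 as $$
    select state - x
  $$ language sql strict;

with nodeWindowAgg.c skipping NULL inputs entirely, and calling mysum_fwd with
a NULL state only for the first non-NULL input.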

Thoughts?

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-15 Thread Florian Pflug
On Jan16, 2014, at 02:32 , Florian Pflug f...@phlo.org wrote:
 On Jan14, 2014, at 17:39 , Florian Pflug f...@phlo.org wrote:
 On Jan14, 2014, at 11:06 , David Rowley dgrowle...@gmail.com wrote:
 Here's a patch which removes sum(numeric) and changes the documents a 
 little to remove a reference to using sum(numeric) to workaround the fact 
 that there's no inverse transitions for sum(float). I also made a small 
 change in the aggregates.sql tests to check that the aggfnoid = .
 
 I've looked over the patch, here a few comments.
 
 For STRICT pairs of transfer and inverse transfer functions we should 
 complain if any of them ever return NULL. That can never be correct anyway, 
  since a STRICT function cannot undo a non-NULL -> NULL transition.
 
 The same goes for non-STRICT, at least if we ever want to allow an inverse 
  transfer function to indicate "Sorry, cannot undo in this case, please 
  rescan". If we allowed NULL as just another state value now, we couldn't 
 easily undo that later, so we'd rob ourselves of the obvious way for the 
 inverse transfer function to indicate this condition to its caller.
 
 The notnullcount machinery seems to apply to both STRICT and non-STRICT 
 transfer function pairs. Shouldn't that be constrained to STRICT transfer 
 function pairs? For non-STRICT pairs, it's up to the transfer functions to 
 deal with NULL inputs however they please, no?
 
 I modified the patch to do this, and ran into a problem. Currently, 
 aggregates with state type internal cannot have a strict transfer function, 
 even if they behave like one, i.e. ignore NULL inputs. This is 
 because the only possible initial value for state type internal is NULL, 
 and it's up to the transfer function to create the state - usually upon 
 seeing the first non-NULL input. Now, currently that isn't a huge problem - 
 the transfer function simply has to check for NULL input values itself, and 
 if it's indeed conceptually strict, it just returns in this case.
 
 With inverse transfer functions, however, each such pair of forward and 
 inverse transfer functions would also need to duplicate the NULL-counting 
 logic from nodeWindowAgg.c if it wants to be conceptually strict, i.e. 
 ignore NULL inputs, but return NULL if there are *only* NULL inputs (or no 
 inputs at all). That seems like a lot of duplicated code in the long run.
 
 In essence, what we'd want for strict pairs of forward and inverse transfer 
 functions is for the forward transfer function to be strict in all arguments 
 except the state, and the inverse to be strict according to the usual 
 definition. We can't express that property of the forward transfer function 
 within pg_proc, but we don't really have to - a local hack in nodeWindowAgg.c 
 suffices. So what I'm proposing is:
 
 We allow the forward transfer function to be non-strict even if the inverse 
 is strict, but only if the initial value is NULL. In that case we behave as 
 if the forward transfer function was strict, except that upon seeing the 
 first non-NULL input we call it with a NULL state. The return value must 
 still be non-NULL in all cases.

Ok, I tried this and it worked out quite OK. 

Updated patch is attached. It passes regression tests, but those currently 
don't seem to include any aggregates which *don't* ignore NULL values, so that 
case is probably untested. Also, it allows non-strict forward transfer 
functions together with strict inverse transfer functions even for non-NULL 
initial states now. It seemed rather pedantic to forbid this.

BTW, as it stands, the patch currently uses the inverse transition function 
only when *all* the aggregates belonging to one window clause provide one. 
After working with the code a bit, I think that a bit of reshuffling would 
allow that distinction to be made on a per-aggregate basis. The question is, is 
it worth it?

best regards,
Florian Pflug


inverse_transition_functions_v2.2_fgp.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-14 Thread Florian Pflug
On Jan14, 2014, at 11:06 , David Rowley dgrowle...@gmail.com wrote:
 Here's a patch which removes sum(numeric) and changes the documents a little 
 to remove a reference to using sum(numeric) to workaround the fact that 
 there's no inverse transitions for sum(float). I also made a small change in 
 the aggregates.sql tests to check that the aggfnoid = .

I've looked over the patch, here a few comments.

For STRICT pairs of transfer and inverse transfer functions we should complain 
if any of them ever return NULL. That can never be correct anyway, since a 
STRICT function cannot undo a non-NULL -> NULL transition.

The same goes for non-STRICT, at least if we ever want to allow an inverse 
transfer function to indicate "Sorry, cannot undo in this case, please rescan". 
If we allowed NULL as just another state value now, we couldn't easily undo 
that later, so we'd rob ourselves of the obvious way for the inverse transfer 
function to indicate this condition to its caller.

The notnullcount machinery seems to apply to both STRICT and non-STRICT 
transfer function pairs. Shouldn't that be constrained to STRICT transfer 
function pairs? For non-STRICT pairs, it's up to the transfer functions to deal 
with NULL inputs however they please, no?

The logic around movedaggbase in eval_windowaggregates() seems a bit 
convoluted. Couldn't the if be moved before the code that pulls aggregatedbase 
up to frameheadpos using the inverse aggregation function? 

Also, could we easily check whether the frames corresponding to the individual 
rows are all either identical or disjoint, and don't use the inverse transfer 
function then? Currently, for a frame which contains either just the current 
row, or all the current row's peers, I think we'd use the inverse transfer 
function to fully un-add the old frame, and then add back the new frame.

best regards,
Florian Pflug





Re: [HACKERS] plpgsql.warn_shadow

2014-01-14 Thread Florian Pflug
On Jan15, 2014, at 01:34 , Marko Tiikkaja ma...@joh.to wrote:
 It's me again, trying to find a solution to the most common mistakes I make.  
 This time it's accidental shadowing of variables, especially input variables. 
  I've wasted several hours banging my head against the wall while shouting 
 HOW CAN THIS VARIABLE ALWAYS BE NULL?.  I can't believe I'm the only one.  
 To give you a rough idea on how it works:

I like this, but think that the option should be just called plpgsql.warnings 
or plpgsql.warn_on and accept a list of warnings to enable. I'm sure you'll 
come up with more unsafe coding practices to warn about in the future - for 
example, consistent_into immediately comes to mind ;-)

best regards,
Florian Pflug





Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Florian Pflug
On Jan12, 2014, at 04:18 , Josh Berkus j...@agliodbs.com wrote:
 Thing is, when we talk about auto-degrade, we need to determine things
 like Is the replica down or is this just a network blip? and take
 action according to the user's desired configuration.  This is not
 something, realistically, that we can do on a single request.  Whereas
 it would be fairly simple for an external monitoring utility to do:
 
 1. decide replica is offline for the duration (several poll attempts
 have failed)
 
 2. Send ALTER SYSTEM SET to the master and change/disable the
 synch_replicas.
 
 In other words, if we're going to have auto-degrade, the most
 intelligent place for it is in
 RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
 place.  Anything we do *inside* Postgres is going to have a really,
 really hard time determining when to degrade.

+1

This is also how 2PC works, btw - the database provides the building
blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
to deal with issues that require a whole-cluster perspective.
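
The degrade step in such a tool could then be as simple as this - assuming
9.4's ALTER SYSTEM, and that one simply empties the list of sync standbys:

  alter system set synchronous_standby_names = '';
  select pg_reload_conf();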

best regards,
Florian Pflug





Re: [HACKERS] Standalone synchronous master

2014-01-13 Thread Florian Pflug
On Jan13, 2014, at 22:30 , Joshua D. Drake j...@commandprompt.com wrote:
 On 01/13/2014 01:14 PM, Jim Nasby wrote:
 
 On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
 
 On 01/13/2014 10:12 AM, Hannu Krosing wrote:
 In other words, if we're going to have auto-degrade, the most
 intelligent place for it is in
 RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
 place.  Anything we do *inside* Postgres is going to have a really,
 really hard time determining when to degrade.
 +1
 
 This is also how 2PC works, btw - the database provides the building
 blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
 to deal with issues that require a whole-cluster perspective.
 
 
 ++1
 
 +1
 
 Josh, what do you think of the upthread idea of being able to recover
 in-progress transactions that are waiting when we turn off sync rep? I'm
 thinking that would be a very good feature to have... and it's not
 something you can easily do externally.
 
 I think it is extremely valuable, else we have lost those transactions which
 is exactly what we don't want.

We *have* to "recover" waiting transactions upon switching off sync rep.

A transaction that waits for a sync standby to respond has already committed
locally (i.e., updated the clog), it just hasn't updated the proc array yet,
and thus is still seen as in-progress by the rest of the system. But rolling
back the transaction is nevertheless *impossible* at that point (except by
PITR, hence the quotes around "recover"). So the only alternative to
recovering them, i.e. having them abort their waiting, is to let them linger
indefinitely, still holding their locks, preventing xmin from advancing, etc.,
until either the client disconnects or the server is restarted.
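
Roughly, the order of operations at commit time is (paraphrasing
CommitTransaction() in xact.c from memory - not the actual code):

  latestXid = RecordTransactionCommit();  /* WAL commit record + clog update;
                                           * with sync rep on, this ends by
                                           * blocking in SyncRepWaitForLSN()
                                           * until the standby has acked */
  ProcArrayEndTransaction(MyProc, latestXid);  /* only now does the rest of
                                                * the system see the xact as
                                                * committed */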

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql.consistent_into

2014-01-13 Thread Florian Pflug
On Jan13, 2014, at 22:49 , Jim Nasby j...@nasby.net wrote:
 ISTM that in this case, it should be safe to make the new default behavior 
 STRICT;
 if you forget to set the GUC to disable than you'll get an error that points 
 directly
 at the problem, at which point you'll go Oh, yeah... I forgot to set X...

What do you mean by STRICT? STRICT (which we already support) complains if the
query doesn't return *exactly* one row. What Marko wants is to raise an error
for a plain SELECT ... INTO if more than one row is returned, but to still
convert zero rows to NULL.
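
To illustrate the current behaviour (inside a PL/pgSQL function):

  SELECT id INTO v_id        FROM t WHERE c = v_c;  -- zero rows: v_id is set
                                                    -- to NULL; more than one
                                                    -- row: first one wins
  SELECT id INTO STRICT v_id FROM t WHERE c = v_c;  -- zero rows or more than
                                                    -- one row: error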

 Outside of the GUC, I believe the default should definitely be STRICT. If 
 your app is
 relying on non-strict then you need to be made aware of that. We should be 
 able to
 provide a DO block that will change this setting for every function you've 
 got if
 someone isn't happy with STRICT mode.

If you mean that we should change SELECT ... INTO to always behave as if STRICT
had been specified - why on earth would we want to do that? That would break
perfectly fine code for no good reason whatsoever.

In fact, after reading the documentation on SELECT ... INTO, I'm convinced
that the whole consistent_into thing is a bad idea. The documentation states
clearly that

  For INSERT/UPDATE/DELETE with RETURNING, PL/pgSQL reports an error for more
  than one returned row, even when STRICT is not specified. This is because
  there is no option such as ORDER BY with which to determine which affected
  row should be returned.

It therefore isn't an oversight that SELECT ... INTO allows multiple result
rows while INSERT/UPDATE/DELETE forbids them - it's been done that way on
purpose and for a reason. We shouldn't be second-guessing ourselves by
changing that later - not, at least, unless we have a *very* good reason to.
Which, AFAICS, we don't.

(And yeah, personally I'd prefer if we'd complain about multiple rows. But it's
IMHO just too late for that)

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql.consistent_into

2014-01-13 Thread Florian Pflug
On Jan14, 2014, at 00:52 , Marko Tiikkaja ma...@joh.to wrote:
 When I've worked with PL/PgSQL, this has been a source of a few bugs that
 would have been noticed during testing if the behaviour of INTO wasn't as
 dangerous as it is right now.

The question is, how many bugs stemmed from wrong SQL queries, and what
percentage of those would have been caught by this? The way I see it, there
are thousands of ways to screw up a query, and having it return multiple 
rows instead of one is just one of them.

 Yes, it breaks backwards compatibility, but that's why there's a nice GUC.

Which doesn't help, because the GUC isn't tied to the code. This *adds* an
error case rather than removing one - now, instead of just getting your code
correct, you *also* have to get the GUC correct. If you even *know* that such
a GUC exists.

 If we're not going to scrap PL/PgSQL and
 start over again, we are going to have to do changes like this to make the
 language better.  Also I think that out of all the things we could do to
 break backwards compatibility, this is closer to harmless than a pain
 in the butt.

I very strongly believe that languages don't get better by adding a thousand
little knobs which subtly change semantics. Look at the mess that is PHP -
we absolutely, certainly don't want to go there. The most important rule in
language design is, in my opinion, "stick with your choices". C, C++ and Java
all seem to follow this, and it's one of the reasons these languages are
popular for big projects, I think.

The way I see it, the only OK way to change existing behaviour is to have
the concept of a language version, and force code to indicate the language
version it expects. The important thing is that the language version is an
attribute of code, not some global setting that you can change without ever
looking at the code it'd affect.

So if we really want to change this, I think we need to have a
LANGUAGE_VERSION attribute on functions. Each time a major postgres release
changes the behaviour of one of the procedural languages, we'd increment
that language's version, and enable the old behaviour for all functions
tagged with an older one.
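
Purely hypothetical syntax, just to make the idea concrete (no such
attribute exists today):

  CREATE FUNCTION f()
  RETURNS integer
  LANGUAGE plpgsql
  LANGUAGE_VERSION '9.4'  -- made-up attribute
  AS $$
  BEGIN
    RETURN 1;
  END
  $$;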

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql.consistent_into

2014-01-13 Thread Florian Pflug
(Responding to both of your mails here)

On Jan14, 2014, at 01:20 , Jim Nasby j...@nasby.net wrote:
 On 1/13/14, 5:57 PM, Josh Berkus wrote:
 On 01/13/2014 03:41 PM, Florian Pflug wrote:
 It therefor isn't an oversight that SELECT ... INTO allows multiple result 
 rows
 but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose 
 and
 for a reason. We shouldn't be second-guessing ourselves by changing that 
 later -
 not, at least, unless we have a *very* good reason for it. Which, AFAICS, 
 we don't.
 
 (And yeah, personally I'd prefer if we'd complain about multiple rows. But 
 it's
 IMHO just too late for that)
 
 I *really* don't want to go through all my old code to find places where
 I used SELECT ... INTO just to pop off the first row, and ignored the
 rest.  I doubt anyone else does, either.
 
 Do you regularly have use cases where you actually want just one RANDOM row?
 I suspect the far more likely scenario is that people write code assuming 
 they'll
 get only one row and they'll end up with extremely hard to trace bugs if that
 assumption is ever wrong.

One case that immediately comes to mind is a JOIN which sometimes returns
multiple rows, combined with a projection clause that only uses columns from
one of the tables involved in the join.
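
A contrived example (names made up): several matching order_lines rows just
produce several identical result rows here, because only o.status is
projected:

  SELECT o.status
    INTO v_status
    FROM orders o
    JOIN order_lines l ON l.order_id = o.id
   WHERE o.id = in_order_id;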

Another is queries including an ORDER BY - I don't think the patch makes an
exception for those, and even if it did, it probably wouldn't catch all
cases, like e.g. an SRF which returns its rows in a deterministic order.

Or maybe you're picking a row to process next, and don't really care about
the order in which you work through them.

 The question is, how many bugs stemmed from wrong SQL queries, and what
 percentage of those would have been caught by this? The way I see it, there
 are thousands of ways to screw up a query, and having it return multiple
 rows instead of one is just one of them.
 
 A query that's simply wrong is more likely to fail consistently. Non-strict
 use of INTO is going to fail in very subtle ways (unless you actually DO want
 just the first row, in which case you should explicitly use LIMIT 1).

How so? Say you expect SELECT * FROM view WHERE c=n to only ever return
one row. Then SELECT sum(f) FROM table JOIN view ON table.c = view.c is
just as subtly wrong as the first query is.

 And if we've always had it, why on earth didn't we make STRICT the default
 behavior?

Dunno, but AFAIK pl/pgsql mimics Oracle's PL/SQL, at least in some aspects,
so maybe this is one of the areas where we just do what Oracle does.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-13 Thread Florian Pflug
On Jan10, 2014, at 22:27 , David Rowley dgrowle...@gmail.com wrote:
 As the patch stands at the moment, I currently have a regression test
 which currently fails due to these extra zeros after the decimal point:
 
 -- This test currently fails due extra trailing 0 digits.
 SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 
 UNBOUNDED FOLLOWING)
   FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);
 
 Patched produces:
  6.01
  5.00
  3.00
 Unpatched produces:
  6.01
  5
  3

Hm, that's annoying. I checked the standard to see if it offers any guidance
here, but it doesn't look like it does - the standard just says that SUM() ought
to return a type with the same scale as the input type has. That's fine if
every NUMERIC type has a fixed scale, but that's not the case in postgres -
we allow values of unconstrained NUMERIC types (i.e., numeric types without an
explicitly specified precision or scale) to each define their own scale.

To fix this, we'd have to track the maximum scale within the current frame.
That's easier than the general problem of providing an inverse transition
function for MAX, because AFAIK we limit the scale to at most 1000. So it'd
be sufficient to track the number of times we saw each scale, and also the
current maximum. Once we reduce the current maximum's count back to zero
in the inverse transition function, we'd scan from that value to the left to
find the next non-zero entry.
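
A rough sketch of that bookkeeping (invented names and layout, for
illustration only - not actual backend code):

  #include <stdint.h>

  #define MAX_SCALE 1000          /* postgres caps numeric scale at 1000 */

  typedef struct ScaleTracker
  {
      int64_t count[MAX_SCALE + 1];   /* occurrences of each scale */
      int     max_scale;              /* current maximum, -1 when empty */
  } ScaleTracker;

  static void
  tracker_add(ScaleTracker *t, int scale)
  {
      t->count[scale]++;
      if (scale > t->max_scale)
          t->max_scale = scale;
  }

  static void
  tracker_remove(ScaleTracker *t, int scale)
  {
      t->count[scale]--;
      /* if the maximum's count dropped to zero, scan left for the next */
      while (t->max_scale >= 0 && t->count[t->max_scale] == 0)
          t->max_scale--;
  }

The scan in tracker_remove is bounded by the scale cap, so even the worst
case stays cheap.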

We could also choose to ignore this, although I'm not sure I really like
that. It seems entirely OK at first sight - after all, the values all stay
the same, we just emit a different number of trailing zeros. But it still
causes results to be affected by values which lie outside the frame, even if
only in the number of trailing zeros. That seems like more non-determinism
than a database should show.

 With inverse transitions this query still produces correct results, it just 
 does
 not produces the numeric in the same format as it does without performing 
 inverse
 transitions. Personally I'd rather focus on trying to get SUM(numeric) in 
 there
 for 9.4

I think it'd be worthwhile to get this into 9.4, if that's still an option,
even if we only support COUNT.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Disallow arrays with non-standard lower bounds

2014-01-13 Thread Florian Pflug
On Jan14, 2014, at 00:33 , Craig Ringer cr...@2ndquadrant.com wrote:
 So I guess the question is: Is it worth all that hassle to remove a
 misfeature you have to go out of your way to use? Is support for non-1
 lower bounds stopping us from doing something useful and important? Or
 is it just an irritation that it exists?

I don't think it's worth it - as you say, the actual risk of bugs is low,
because you have to go out of your way to end up with a lower bound other
than one.
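
For illustration, this is what "going out of your way" looks like - the
bounds have to be spelled out explicitly in the array literal:

  SELECT '[0:2]={10,20,30}'::int[];                  -- lower bound of 0
  SELECT array_lower('[0:2]={10,20,30}'::int[], 1);  -- returns 0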

Also, at least from my POV, the fact that we use one type to represent
arrays with an arbitrary number of dimensions is actually worse than
the lower-bound problem. So *if* we ever remove support for arbitrary
lower bounds, we should also add distinct types for different dimensions.
That'd probably require some extension of the type system, though...

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

