[HACKERS] Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

2014-10-08 Thread Noah Misch
On Mon, Oct 06, 2014 at 08:57:54PM -0400, Tom Lane wrote:
 I eventually realized that the critical difference was you'd added
 CFLAGS= to the configure call.  On this platform that has the net
 effect of removing -O2 from the compiler flags, and apparently that
 shifts around the stack layout enough to expose the clobber.
 
 The fix is simple enough: ecpg's version of ParseDateTime is failing
 to check for overrun of the field[] array until *after* it's already
 clobbered the stack:

Thanks for tracking that down.  Oops.




Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-08 Thread Heikki Linnakangas

On 10/08/2014 07:23 AM, furu...@pm.nttdata.co.jp wrote:

What set of options would you pass if you want to use it as a synchronous
standby? And if you don't? Could we just have a single --synchronous
flag, instead of -F and --reply-fsync?


If you set synchronous_commit to remote_write, the options would be
different.
The set of options in each case is as follows.


  Synchronous standby(synchronous_commit=on)
  --fsync-interval=-1
  --reply-fsync
  --slot=slotname

  Synchronous standby(synchronous_commit=remote_write)
  --fsync-interval=-1
  --reply-fsync

  Asynchronous
  There are no related options.


Well, if the response time delay (the value of --status-interval=interval) is acceptable,
--reply-fsync is unnecessary.
Instead of --reply-fsync, using --synchronous (which combines --reply-fsync and
--fsync-interval=-1) might be easier to understand. Although, in that case, --fsync-interval=interval
would be a fixed value. Isn't that a problem?


I think we should remove --fsync-interval and --reply-fsync, and just 
have a --synchronous option, which would imply the same behavior you get 
with --fsync-interval=-1 --reply-fsync.


That leaves the question of whether pg_receivexlog should do fsyncs when 
it's not acting as a synchronous standby. There isn't any real need to 
do so. In asynchronous mode, there are no guarantees anyway, and even 
without an fsync, the OS will eventually flush the data to disk.


But we could do even better than that. It would be best if you didn't 
need even the --synchronous flag. The server knows whether the client is 
a synchronous standby or not. It could tell the client.


- Heikki





Re: [HACKERS] pg_receivexlog always handles -d option argument as connstr

2014-10-08 Thread Sawada Masahiko
Amit Kapila amit.kapil...@gmail.com wrote:

 On Tue, Oct 7, 2014 at 8:13 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 
  On Tue, Oct 7, 2014 at 12:58 PM, Amit Kapila amit.kapil...@gmail.com wrote:
   On Mon, Oct 6, 2014 at 10:23 PM, Sawada Masahiko sawada.m...@gmail.com
   wrote:
  
   Hi all,
  
    pg_receivexlog always handles the argument of the -d option as a
    connstr-formatted value, so host name and port number can be specified
    twice.
    The other client tools handle the -d option as a connstr value only if
    the argument contains an = character.
  
    pg_basebackup also seems to behave the same as pg_receivexlog.
    psql also treats it in a similar way.  The behaviour of psql is as
    below:
    psql.exe -d "=host=localhost port=5432 dbname=postgres"
    psql: invalid connection option ""
   
    psql.exe -d "host=localhost port=5432 dbname=postgres"
    psql (9.5devel)
    WARNING: Console code page (437) differs from Windows code page (1252)
             8-bit characters might not work correctly. See psql reference
             page "Notes for Windows users" for details.
    Type "help" for help.
   
    postgres=#
  
    The document says that pg_receivexlog ignores the database name, and
    that this option is accepted for consistency with other client
    applications.
    But if we specify a database name the way the other client tools allow,
    e.g. '-d hoge', then we definitely get an error.
  
    What I understand from the document is that it ignores the database
    name when it is given in a connection string.
  
 
  Yep, but we can use the -d option like 'psql -d postgres'.
  pg_receivexlog and pg_basebackup don't seem to work like that;
  they always handle the argument as a connstr-formatted value.

 psql needs to connect to a particular database for its functioning,
 whereas pg_basebackup/pg_receivexlog work differently and don't
 need to connect to a particular database, so I am not
 sure why you want them to behave similarly for the -d switch.
 Moreover, the same is clarified in the docs, so what's the issue?


Yes, I got your point.

I thought it's a little confusing that, although the document says the
-d option of pg_receivexlog exists for consistency, the behavior of psql
(and the other tools) and pg_receivexlog isn't exactly the same.

If a user uses the same -d option, like '-d hoge_db', on both
pg_receivexlog and psql for easier setup, it leads to an error on the
pg_receivexlog side. I thought pg_receivexlog should ignore the database
name even when the -d option value is not a connstr-formatted value, the
same as psql does. But there may not be any such case.

Regards,

Sawada Masahiko


-- 
Regards,

---
Sawada Masahiko


Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Anssi Kääriäinen
On Tue, 2014-10-07 at 13:33 +0100, Simon Riggs wrote:
 Is there a way of detecting that we are updating a unique constraint
 column and then applying the HW locking only in that case? Or can we
 only apply locking when we have multiple unique constraints on a
 table?

What is the use case of doing an UPSERT into a table with multiple
unique constraints?

Consider table user with unique columns name and email and a non-unique
column age. If it has data

Jack | j...@example.com | 33
Tom  | t...@example.com | 35

And the user does UPSERT VALUES ('Jack', 't...@example.com', 34). The
proposed specification will pick a random unique index (either the name or
the email index) and update that row.

First, this will cause a unique violation; second, doing the UPSERT on a
random index seems confusing.
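
To make this concrete in SQL (using the proposed ON CONFLICT UPDATE and
conflicting() spelling that appears elsewhere in this thread; "user" is
quoted only because it is a reserved word):

    CREATE TABLE "user" (
        name  text UNIQUE,
        email text UNIQUE,
        age   int
    );
    INSERT INTO "user" VALUES ('Jack', 'j...@example.com', 33),
                              ('Tom',  't...@example.com', 35);

    -- Conflicts with Jack on name and with Tom on email: whichever
    -- unique index is picked to arbitrate, updating that one row still
    -- violates the other unique index.
    INSERT INTO "user" VALUES ('Jack', 't...@example.com', 34)
    ON CONFLICT UPDATE SET age = conflicting(age);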

The MySQL documentation says that you should try to avoid using an ON
DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1].
The proposed feature's documentation has the same suggestion[2]. Still,
the feature defaults to this behavior. Why is the default something the
documentation says you shouldn't do?

Going a bit further, I am wondering what the use case is for doing an
UPSERT against multiple unique indexes. If multiple-unique-index UPSERT
could be dropped, that might allow for faster or cleaner index locking
techniques.

 - Anssi

1: http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html
2: 
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html





Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-08 Thread Michael Paquier
On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais 
j...@dalibo.com wrote:

 We kept the WAL files and log files for further analysis. How can we help
 regarding this issue?


Commit c2f79ba added an assumption that the WAL receiver should always
enforce the creation of .done files when WAL files are done being streamed
(XLogWalRcvWrite and WalReceiverMain) or archived
(KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed
RemoveOldXlogFiles a bit, removing a check on whether the node is in
recovery. Now, based on the information given here, yes, it happens that
there are still cases where .done file creation is not correctly done,
leading to those extra files. Even by looking at the code, I am not
directly seeing any code paths where an extra call to XLogArchiveForceDone
would be needed on the WAL receiver side, but... Something like the patch
attached (which is clearly a band-aid) may help though, as it would make
files be removed even if they are not marked as .done for a node in
recovery. And this is consistent with the pre-1bd42cd behavior.

Comments welcome.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..39701a3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3771,7 +3771,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 			strspn(xlde->d_name, "0123456789ABCDEF") == 24 &&
 			strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
 		{
-			if (XLogArchiveCheckDone(xlde->d_name))
+			if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
 			{
 				snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
 



Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 12:41 AM, Anssi Kääriäinen
anssi.kaariai...@thl.fi wrote:
 The MySQL documentation says that you should try to avoid using an ON
 DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1].
 The proposed feature's documentation has the same suggestion[2]. Still,
 the feature defaults to this behavior. Why is the default something the
 documentation says you shouldn't do?

MySQL started saying that when they realized it broke their
statement-based replication. They have their own weird reasons for
disliking it. Now, that's not to minimize the risks.

The reasoning behind making the unique index specification optional is:

We cannot easily cover corner cases with another syntax - unique
indexes must be named directly to cover every case, and make the
user's intent absolutely clear. That's not obviously the case, but
consider partial unique indexes, for example. Or consider uniquely
constrained columns, with an almost equivalent uniquely constrained
expression on those same columns. On the one hand I am not comfortable
failing to support those, but on the other hand it could get very
messy to do it another way.

As we all know, naming a unique index in DML is ugly, and has poor
support in ORMs. It seems likely that we're better off making it
optional - it hasn't been much of a problem with the existing subxact
looping pattern. A lot of the time, there will only be a single unique
index anyway, or there will be a trivial serial PK unique index and
the unique index we always want to treat would-be conflicts within as
triggering the alternative UPDATE/IGNORE path.

 Going a bit further, I am wondering what is the use case of doing an
 UPSERT against multiple unique indexes? If multiple unique indexes
 UPSERT could be dropped that might allow for faster or cleaner index
 locking techniques.

I see no reason to think that. I don't think it buys us much at all.

-- 
Peter Geoghegan




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Heikki Linnakangas

On 10/08/2014 11:10 AM, Peter Geoghegan wrote:

The reasoning behind making the unique index specification optional is:

We cannot easily cover corner cases with another syntax - unique
indexes must be named directly to cover every case, and make the
user's intent absolutely clear. That's not obviously the case, but
consider partial unique indexes, for example. Or consider uniquely
constrained columns, with an almost equivalent uniquely constrained
expression on those same columns. On the one hand I am not comfortable
failing to support those, but on the other hand it could get very
messy to do it another way.

As we all know, naming a unique index in DML is ugly, and has poor
support in ORMs.


I vehemently object to naming indexes in the UPSERT statement. That 
mixes logical and physical database design, which is a bad idea. This is 
not ISAM.


Instead of naming the index, you should name the columns, and the system 
can look up the index or indexes that match those columns.
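
For illustration only, with no particular syntax being proposed here, a
column-based specification might look something like:

    INSERT INTO author VALUES ('Jack', 'j...@example.com', 34)
    ON CONFLICT (name) UPDATE SET age = conflicting(age);

and the system would resolve (name) to whatever unique index or
constraint covers that column.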


(Remind me again, where did this need to name an index come from in the 
first place?)


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Marti Raudsepp
On Wed, Oct 8, 2014 at 3:47 AM, Peter Geoghegan p...@heroku.com wrote:
 It seems like what you're talking about here is just changing the
 spelling of what I already have.

I think there's a subtle difference in expectations too. The current
BEFORE INSERT trigger behavior is somewhat defensible with an
INSERT-driven syntax (though I don't like it even now [1]). But the
MERGE syntax, to me, strongly implies that insertion doesn't begin
before determining whether a conflict exists or not.

[1] 
http://www.postgresql.org/message-id/CABRT9RD6zriK+t6mnqQOqaozZ5z1bUaKh+kNY=o9zqbzfoa...@mail.gmail.com

Regards,
Marti




[HACKERS] Context lenses to set/get values in json values.

2014-10-08 Thread Paweł Cesar Sanjuan Szklarz
Hello.

I am interested in the json type in postgresql. I would like to implement
additional operations on the json structure that may extract/insert
table-like information from the json tree structure.
I have an implementation in javascript that shows this type of operation.
You can see examples on this page
https://github.com/paweld2/eelnss/wiki

Following the examples on the previous page, it may be possible to
implement a function similar to json_populate_record to extract multiple
records from a single json value, for example:
select * from json_populate_records_with_clen(null::myrowtype_users,
'app.users.{:uID}.(email,data.name,isActive)', '... nested json value ...')

may return
 uID | email              | name          | isActive
-----+--------------------+---------------+----------
 u1  | ad...@pmsoft.eu    | administrator | true
 u2  | nor...@pmsoft.eu   | user          | true
 u3  | testu...@pmsoft.eu | testUser      | false


Also, assuming that we have a table User as above (uID, email, name,
isActive), with context lenses it is very simple to map the table to a json
object. I assume that an API similar to table_to_xml/query_to_xml could be
provided:

table_to_json( Person, 'app.users.{:uID}.(email,data.name,isActive)');
query_to_json( 'select * from Person where ... ', 'app.users.{:uID}.(email,
data.name,isActive)');


I don't know the details of integrating functions/operators into sql
queries, but because context lenses map between tables and tree objects, it
may be possible to use a json column value as a separate table in the
queries. Assume the table

create table Person (
  pID integer,
  address json
);

then it may be possible to query:

select * from Person as P left join (
  select * from json_populate_records_with_clen(null::addressType,
    'address.(street.number,street.local,city.code,city.name)', P.address)
) as A on true;

A final API for such functions needs to be defined. If such functions
would be useful, I can try to prepare an implementation in the postgres
base code.

Regards.
Pawel Cesar Sanjuan Szklarz.


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-08 Thread furuyao
 On 10/08/2014 07:23 AM, furu...@pm.nttdata.co.jp wrote:
  What set of options would you pass if you want to use it as a
  synchronous standby? And if you don't? Could we just have a single
 --synchronous
  flag, instead of -F and --reply-fsync?
 
  If you set synchronous_commit to remote_write, the options would
 be different.
  The set of options in each case is as follows.
 
 
Synchronous standby(synchronous_commit=on)
--fsync-interval=-1
--reply-fsync
--slot=slotname
 
Synchronous standby(synchronous_commit=remote_write)
--fsync-interval=-1
--reply-fsync
 
Asynchronous
  There are no related options.
 
 
  Well, if the response time delay (the value of
 --status-interval=interval) is acceptable, --reply-fsync is
 unnecessary.
  Instead of --reply-fsync, using --synchronous (which combines
 --reply-fsync and --fsync-interval=-1) might be easier to
 understand. Although, in that case, --fsync-interval=interval would
 be a fixed value. Isn't that a problem?
 
 I think we should remove --fsync-interval and --reply-fsync, and just
 have a --synchronous option, which would imply the same behavior you get
 with --fsync-interval=-1 --reply-fsync.
 
 That leaves the question of whether pg_receivexlog should do fsyncs when
 it's not acting as a synchronous standby. There isn't any real need to
 do so. In asynchronous mode, there are no guarantees anyway, and even
 without an fsync, the OS will eventually flush the data to disk.
 
 But we could do even better than that. It would be best if you didn't
 need even the --synchronous flag. The server knows whether the client
 is a synchronous standby or not. It could tell the client.

Thanks for the reply.

If we remove --fsync-interval, the response time to the user will not be
delayed. However, fsync will be executed multiple times in a short period.
And there is no way to solve that problem without --fsync-interval, so what
should we do about it?

In asynchronous mode, I think there is no problem, since the specification
is the same as in the released version.

 The server knows whether the client is a synchronous standby or not. 
 It could tell the client.

When notifying the client of synchronous/asynchronous mode from the
server, do we need to change the format of the message?

Regards,

--
Furuya Osamu



Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Instead of naming the index, you should name the columns, and the system can
 look up the index or indexes that match those columns.

It's not totally clear that we need *any* WITHIN clause, BTW. I'm not
dead set on it. It was something I mainly added at Kevin's request. I
do see the risks, though.

 (Remind me again, where did this need to name an index come from in the
 first place?)

I agree that naming indexes is ugly, and best avoided. It's tricky,
though. The first thing you might try is to look up the index during
parse analysis and/or planning. That could work in simple cases (which
are probably the vast majority), but I'm worried about stuff like
before row triggers that change the values being inserted out from
under us, in a way that interferes with partial unique indexes. Maybe
you could choose the wrong one if you didn't leave it until the last
minute. But it's not very convenient to leave it until the last
minute.
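
A sketch of the hazard, with hypothetical names; the point is that which
partial unique index covers the inserted row depends on the values as
they stand after BEFORE ROW triggers have run:

    CREATE TABLE account (
        email     text,
        is_active boolean
    );
    CREATE UNIQUE INDEX account_active_email
        ON account (email) WHERE is_active;
    CREATE UNIQUE INDEX account_inactive_email
        ON account (email) WHERE NOT is_active;

    -- A BEFORE ROW trigger that flips is_active changes which of the two
    -- partial indexes the row falls under, so choosing the arbitrating
    -- index during parse analysis or planning could pick the wrong one.
    CREATE FUNCTION deactivate() RETURNS trigger AS $$
    BEGIN
        NEW.is_active := false;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    CREATE TRIGGER account_deactivate BEFORE INSERT ON account
        FOR EACH ROW EXECUTE PROCEDURE deactivate();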

If you're willing to live with the command conservatively failing when
there isn't a clean specification (although not in a way that can make
the command fail when the user innocently adds a unique index later),
then I think I can do it. Otherwise, it could be surprisingly hard to
cover all the cases non-surprisingly. I freely admit that putting a
unique index in a DML statement is weird, but it is sort of the most
direct way of expressing what we want. Oracle actually has an
INSERT IGNORE-like hint that names a unique index (yes, really - see
the UPSERT wiki page). That's really bizarre, but the point is that
they may have felt that there was no reasonable alternative.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote:
 I think there's a subtle difference in expectations too. The current
 BEFORE INSERT trigger behavior is somewhat defensible with an
 INSERT-driven syntax (though I don't like it even now [1]).

There is no way around it. We need to fire before row triggers to know
what to insert on the one hand, but on the other hand (in general) we
have zero ability to nullify the effects (or side-effects) of before
triggers, since they may execute arbitrary user-defined code. I think
there is a good case to be made for severely restricting what before
row triggers can do, but it's too late for that.

 But the
 MERGE syntax, to me, strongly implies that insertion doesn't begin
 before determining whether a conflict exists or not.

I think you're right. Another strike against the MERGE syntax, then,
since as I said we cannot even know what to check prior to having
before row insert triggers fire.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Marti Raudsepp
On Tue, Oct 7, 2014 at 2:27 PM, Marti Raudsepp ma...@juffo.org wrote:
 but the new approach seems
 surprising: changes from BEFORE INSERT get persisted in the database,
 but AFTER INSERT is not fired.

I am sorry, I realize now that I misunderstood the current proposed
trigger behavior, I thought what Simon Riggs wrote here already
happens:
https://groups.google.com/forum/#!msg/django-developers/hdzkoLYVjBY/bnXyBVqx95EJ

But the point still stands: firing INSERT triggers when the UPDATE
path is taken is counterintuitive.  If we prevent changes to upsert
key columns in BEFORE triggers, then we get the benefits, including
more straightforward trigger behavior, and we avoid problems with
serial columns.

Regards,
Marti




Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-08 Thread Heikki Linnakangas

On 10/08/2014 10:44 AM, Michael Paquier wrote:

On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais 
j...@dalibo.com wrote:


We kept the WAL files and log files for further analysis. How can we help
regarding this issue?



Commit c2f79ba added an assumption that the WAL receiver should always
enforce the creation of .done files when WAL files are done being streamed
(XLogWalRcvWrite and WalReceiverMain) or archived
(KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed
RemoveOldXlogFiles a bit, removing a check on whether the node is in
recovery. Now, based on the information given here, yes, it happens that
there are still cases where .done file creation is not correctly done,
leading to those extra files. Even by looking at the code, I am not
directly seeing any code paths where an extra call to XLogArchiveForceDone
would be needed on the WAL receiver side, but... Something like the patch
attached (which is clearly a band-aid) may help though, as it would make
files be removed even if they are not marked as .done for a node in
recovery. And this is consistent with the pre-1bd42cd behavior.



There are two mysteries here:

1. Where do the FF files come from? In 9.2, FF segments are not supposed 
to be created, ever.


Since this only happens with streaming replication, the FF segments are 
probably being created by walreceiver. XLogWalRcvWrite is the function 
that opens the file. I don't see anything obviously wrong there. 
XLogWalRcvWrite opens the file corresponding to the start position in the 
message received from the master. There is no check that the start 
position is valid, though; if the master sends a start position in the 
FF segment, walreceiver will merrily write it. So the problem could be 
in the walsender side. However, I don't see anything wrong there either.


I think we should add a check in walreceiver, to throw an error if the 
master sends an invalid WAL pointer, pointing to an FF segment.



2. Why are the .done files sometimes not being created?

I may have an explanation for that. Walreceiver creates a .done file 
when it closes an old segment and opens a new one. However, it does this 
only when it's about to start writing to the new segment, and still has 
the old segment open. If you stream the FE segment fully, but drop the 
replication connection at exactly that point, the .done file is not 
created. That might sound unlikely, but it's actually pretty easy to 
trigger. Just do "select pg_switch_xlog()" in the master, followed by 
"pg_ctl stop -m i" and a restart.


The creation of the .done files seems quite unreliable anyway. If only a 
portion of a segment is streamed, we don't write a .done file for it, so 
we still have the original problem that we will try to archive the 
segment after failover, even though the master might already have 
archived it.


I looked again at the thread where this was discussed: 
http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com. 
I believe the idea was that the server that generates a WAL segment is 
always responsible for archiving it. A standby should never attempt to 
archive a WAL segment that was restored from the archive, or streamed 
from the master.


In that thread, it was not discussed what should happen to WAL files 
that an admin manually copies into pg_xlog of the standby. Should the 
standby archive them? I don't think so - the admin should copy them 
manually to the archive too, if he wants them archived. It's a good and 
simple rule that the server that generates the WAL, archives the WAL.


Instead of creating any .done files during recovery, we could scan 
pg_xlog at promotion, and create a .done file for every WAL segment 
that's present at that point. That would be more robust. And then apply 
your patch, to recycle old segments during archive recovery, ignoring 
.done files.


- Heikki





Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Anssi Kääriäinen
On Wed, 2014-10-08 at 02:22 -0700, Peter Geoghegan wrote:
 On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  Instead of naming the index, you should name the columns, and the system can
  look up the index or indexes that match those columns.
 
 It's not totally clear that we need *any* WITHIN clause, BTW. I'm not
 dead set on it. It was something I mainly added at Kevin's request. I
 do see the risks, though.

To be usable in the Django ORM's .save() method there must be an option to
use the primary key index, and only the primary key index, for conflict
resolution.

Assume an author table with id SERIAL PRIMARY KEY, email TEXT UNIQUE, and
age INTEGER. When saving an object, Django must update the row with the
matching primary key value, or otherwise do an insert. Updating a row
matched on the email column isn't allowed. So Django can't use the query:

   INSERT INTO author values(1, 't...@example.com', 35)
   ON CONFLICT UPDATE
   SET email = conflicting(email), age = conflicting(age);

If the table contains the row (id=2, email='t...@example.com', age=34),
the query would update Tom's age to 35 when it should have resulted in a
unique constraint violation.
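
With something like the proposed WITHIN clause, the ORM use case could
be spelled roughly as follows (the index name is illustrative):

    -- arbitrate conflicts on the primary key index only, so a duplicate
    -- email still raises a unique violation instead of updating
    INSERT INTO author VALUES (1, 't...@example.com', 35)
    ON CONFLICT WITHIN author_pkey UPDATE
    SET email = conflicting(email), age = conflicting(age);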

Other ORMs have similar save/persist implementations; that is, objects
are persisted on primary key identity alone.

 - Anssi





Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Anssi Kääriäinen
On Wed, 2014-10-08 at 01:10 -0700, Peter Geoghegan wrote:
 On Wed, Oct 8, 2014 at 12:41 AM, Anssi Kääriäinen
 anssi.kaariai...@thl.fi wrote:
  The MySQL documentation says that you should try to avoid using an ON
  DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1].
  The proposed feature's documentation has the same suggestion[2]. Still,
  the feature defaults to this behavior. Why is the default something the
  documentation says you shouldn't do?

 As we all know, naming a unique index in DML is ugly, and has poor
 support in ORMs. It seems likely that we're better off making it
 optional - it hasn't been much of a problem with the existing subxact
 looping pattern.

The subxact approach is a bit different from the proposed UPSERT
command. It loops:

   try:
       INSERT INTO author VALUES ('Jack', 't...@example.com', 34)
   except UniqueConstraintViolation:
       UPDATE author SET ... WHERE name = 'Jack'

while the UPSERT command does something like:

   try:
       INSERT INTO author VALUES ('Jack', 't...@example.com', 34)
   except UniqueConstraintViolation:
       UPDATE author SET ... WHERE name = 'Jack' OR email = 't...@example.com'
       LIMIT 1;
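
For reference, the looping pattern as actual plpgsql, modeled on the
example in the PostgreSQL documentation (names adapted to the author
table above):

    CREATE FUNCTION upsert_author(k text, e text, a int) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update an existing row by its key
            UPDATE author SET email = e, age = a WHERE name = k;
            IF found THEN
                RETURN;
            END IF;
            -- not there: try to insert; if somebody else inserts the same
            -- key concurrently, the unique violation is absorbed by the
            -- implicit subtransaction and we loop back to the UPDATE
            BEGIN
                INSERT INTO author VALUES (k, e, a);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing, retry
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;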

 - Anssi





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Marti Raudsepp
On Wed, Oct 8, 2014 at 12:28 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote:
 I think there's a subtle difference in expectations too. The current
 BEFORE INSERT trigger behavior is somewhat defensible with an
 INSERT-driven syntax (though I don't like it even now [1]).

 There is no way around it. We need to fire before row triggers to know
 what to insert on the one hand, but on the other hand (in general) we
 have zero ability to nullify the effects (or side-effects) of before
 triggers, since they may execute arbitrary user-defined code.

With my proposal this problem disappears: if we prevent BEFORE
triggers from changing key attributes of NEW in the case of upsert,
then we can acquire value locks before firing any triggers (before
even constructing the whole tuple), and have a guarantee that the
value locks are still valid by the time we proceed with the actual
insert/update.

Other than changing NEW, the side effects of triggers are not relevant.
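
For concreteness, this is the kind of trigger (hypothetical names) that
would be rejected for upserts under my proposal, because it rewrites a
key attribute of NEW:

    CREATE FUNCTION munge_name() RETURNS trigger AS $$
    BEGIN
        -- rewrites the conflict-arbitrating column
        NEW.name := lower(NEW.name);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER author_munge_name BEFORE INSERT ON author
        FOR EACH ROW EXECUTE PROCEDURE munge_name();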

Now, there may very well be reasons why this is tricky to implement,
but I haven't heard any. Can you see any concrete reasons why this
won't work? I can take a shot at implementing this, if you're willing
to consider it.

 I think
 there is a good case to be made for severely restricting what before
 row triggers can do, but it's too late for that.

There are no users of the new upsert syntax out there yet, so it's not
too late to rehash its semantics. This in no way affects users of the
old INSERT/UPDATE syntax.

Regards,
Marti




Re: [HACKERS] Patch to support SEMI and ANTI join removal

2014-10-08 Thread David Rowley
On Tue, Oct 7, 2014 at 3:46 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Oct 6, 2014 at 5:57 AM, David Rowley dgrowle...@gmail.com wrote:
  Can anyone shed any light on how I might determine where the scan rel is
 in
  the tree? I need to find it so I can check if the RangeTblEntry is
 marked as
  skip-able.

 I think you're probably going to need logic that knows about
 particular node types and does the right thing for each one.  I think
 - but maybe I'm misunderstanding - what you're looking for is a
 function of the form Oid ScansOnePlainTableWithoutQuals().  The
 algorithm could be something like:

 switch (node type)
 {
     case T_SeqScanState:
         if (no quals)
             return the_appropriate_oid;
         return InvalidOid;
     case T_HashJoin:
         decide whether we can ignore one side of the join
         fish out the node from the other side of the join (including
         reaching through the Hash node if necessary)
         return ScansOnePlainTableWithoutQuals(the node we fished out);
     ...other specific cases...
     default:
         return InvalidOid;
 }


Thanks Robert.

Ok, so I've been hacking away at this for a couple of evenings and I think
I have a working prototype finally!
My earlier thoughts about having to descend down until I find a seqscan
were wrong. It looks like I just need to look at the next node down: if
it's a seqscan and it's marked as not needed, then we can skip that side of
the join; or, if the child node is a HashJoinState, check the skip status
of that node, and if both sides are marked as not needed, then skip that
side of the join.

I've just completed some simple performance tests:

create table t3 (id int primary key);
create table t2 (id int primary key, t3_id int not null references t3);
create table t1 (id int primary key, t2_id int not null references t2);

I've loaded these tables with 4 million rows each. The performance is as
follows:

test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join
t3 on t2.t3_id=t3.id;
  count
---------
 4000000
(1 row)
Time: 1022.492 ms

test=# select count(*) from t1;
  count
---------
 4000000
(1 row)
Time: 693.642 ms

test=# alter table t2 drop constraint t2_t3_id_fkey;
ALTER TABLE
Time: 2.141 ms
test=# alter table t1 drop constraint t1_t2_id_fkey;
ALTER TABLE
Time: 1.678 ms
test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join
t3 on t2.t3_id=t3.id;
  count
---------
 4000000
(1 row)
Time: 11682.525 ms

So it seems it's not quite as efficient as join removal at planning time,
but still a big win when it's possible to perform the join skipping.

As yet, I've only added support for hash joins, and I've not really looked
into detail on what's needed for nested loop joins or merge joins.

For hash join I just added some code under case HJ_BUILD_HASHTABLE: in
ExecHashJoin(). The code just checks if either side can be skipped; if so,
the node never moves out of the HJ_BUILD_HASHTABLE state, so that each time
ExecHashJoin() is called, it just returns the next tuple from the non-skip
side of the join until we run out of tuples... Or, if both sides can be
skipped, NULL is returned, as any nodes above this shouldn't attempt to
scan; perhaps this should just be an Assert(), as I don't think any parent
node should ever bother executing this node if it's not required. The fact
that I've put this code into the switch under HJ_BUILD_HASHTABLE makes me
think it should add close to zero overhead when the join cannot be
skipped.

One thing that we've lost with these execution-time join removal checks is
the ability to still remove joins where the join column is NULLable. The
previous patch added IS NOT NULL quals to ensure the query was equivalent
when a join was removed; of course, this meant that subsequent joins might
not be removed due to those IS NOT NULL quals existing (which could
restrict the rows and remove the foreign key's guarantee of exactly one row
matching the join condition). I've not yet come up with a nice way to
reimplement these null checks at execution time, so I've thought that
perhaps I won't bother, at least not for this patch. I'd just disable join
skipping at planning time if any of the join columns can have NULLs.
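
To make that concrete against the schema above: if t1.t2_id were
nullable, the previous patch effectively rewrote, at planning time,

    SELECT count(*) FROM t1 INNER JOIN t2 ON t1.t2_id = t2.id;

into

    SELECT count(*) FROM t1 WHERE t1.t2_id IS NOT NULL;

and that added IS NOT NULL qual could then prevent the removal of joins
further up the tree.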

Anyway this is just a progress report, and also to say thanks to Andres,
you might just have saved this patch with the execution time checking idea.

I'll post a patch soon, hopefully once I have merge and nestloop join types
working.

Regards

David Rowley


Re: [HACKERS] Patch to support SEMI and ANTI join removal

2014-10-08 Thread Andres Freund
On 2014-10-09 00:21:44 +1300, David Rowley wrote:
 Ok, so I've been hacking away at this for a couple of evenings and I think
 I have a working prototype finally!

Cool!

 So it seems it's not quite as efficient as join removal at planning time,
 but still a big win when it's possible to perform the join skipping.

Have you checked where the overhead is? Is it really just the additional
node that the tuples are passed through?

Have you measured the overhead of the plan/execution time checks over
master?

 One thing that we've lost out of this execution time join removal checks
 method is the ability to still remove joins where the join column is
 NULLable. The previous patch added IS NOT NULL to ensure the query was
 equivalent when a join was removed, of course this meant that any
 subsequent joins may later not have been removed due to the IS NOT NULL
 quals existing (which could restrict the rows and remove the ability that
 the foreign key could guarantee the existence of exactly 1 row matching the
 join condition). I've not yet come up with a nice way to reimplement these
 null checks at execution time, so I've thought that perhaps I won't bother,
 at least not for this patch. I'd just disable join skipping at planning
 time if any of the join columns can have NULLs.

Sounds fair enough for the first round.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-08 Thread Heikki Linnakangas

On 10/08/2014 11:47 AM, furu...@pm.nttdata.co.jp wrote:

On 10/08/2014 07:23 AM, furu...@pm.nttdata.co.jp wrote:

What set of options would you pass if you want to use it as a
synchronous standby? And if you don't? Could we just have a single

--synchronous

flag, instead of -F and --reply-fsync?


If you set synchronous_commit to remote_write, the options would
be different.
The set of options in each case is as follows.


   Synchronous standby(synchronous_commit=on)
   --fsync-interval=-1
   --reply-fsync
   --slot=slotname

   Synchronous standby(synchronous_commit=remote_write)
   --fsync-interval=-1
   --reply-fsync

   Asynchronous
   There are no related options.


Well, if the response time delay (the value of
--status-interval=interval) is acceptable, --reply-fsync is
unnecessary.

Instead of --reply-fsync, using --synchronous (which combines
--reply-fsync and --fsync-interval=-1) might be easier to
understand. Although, in that case, --fsync-interval=interval would
be a fixed value. Isn't that a problem?

I think we should remove --fsync-interval and --reply-fsync, and just
have a --synchronous option, which would imply the same behavior you get
with --fsync-interval=-1 --reply-fsync.

That leaves the question of whether pg_receivexlog should do fsyncs when
it's not acting as a synchronous standby. There isn't any real need to
do so. In asynchronous mode, there are no guarantees anyway, and even
without an fsync, the OS will eventually flush the data to disk.

But we could do even better than that. It would be best if you didn't
need even the --synchronous flag. The server knows whether the client
is a synchronous standby or not. It could tell the client.


If we remove --fsync-interval, the response time to the user will not be
delayed. However, fsync will be executed multiple times in a short period.
And there is no way to solve that problem without --fsync-interval, so what
should we do about it?


I'm sorry, I didn't understand that.


In asynchronous mode, I think there is no problem, since the specification
is the same as in the released version.


The server knows whether the client is a synchronous standby or not.
It could tell the client.


When notifying the client of synchronous/asynchronous mode from the
server, do we need to change the format of the message?


Yeah. Or rather, add a new message type, to indicate the 
synchronous/asynchronous status.


- Heikki





Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Andres Freund
On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
 2.
 LWLockWakeup()
 {
 ..
 #ifdef LWLOCK_STATS
 lwstats-spin_delay_count += SpinLockAcquire(lock-mutex);
 #else
 SpinLockAcquire(lock-mutex);
 #endif
 ..
 }
 
 Earlier while releasing lock, we don't count it towards LWLock stats
 spin_delay_count.  I think if we see other places in lwlock.c, it only gets
 counted when we try to acquire it in a loop.

I think the previous situation was clearly suboptimal. I've now modified
things so all spinlock acquisitions are counted.

 3.
 LWLockRelease()
 {
 ..
 /* grant permission to run, even if a spurious share lock increases
 lockcount */
 else if (mode == LW_EXCLUSIVE && have_waiters)
 check_waiters = true;
 /* nobody has this locked anymore, potential exclusive lockers get a chance
 */
 else if (lockcount == 0 && have_waiters)
 check_waiters = true;
 ..
 }
 
 It seems comments have been reversed in above code.

No, they look right. But I've expanded them in the version I'm going to
post in a couple minutes.
 
 5.
 LWLockWakeup()
 {
 ..
 dlist_foreach_modify(iter, (dlist_head *) &wakeup)
 {
 PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
 LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
 dlist_delete(&waiter->lwWaitLink);
 pg_write_barrier();
 waiter->lwWaiting = false;
 PGSemaphoreUnlock(&waiter->sem);
 }
 ..
 }
 
 Why can't we decrement the nwaiters after waking up? I don't think
 there is any major problem even if callers do that themselves, but
 in some rare cases LWLockRelease() might spuriously assume that
 there are some waiters and tries to call LWLockWakeup().  Although
 this doesn't create any problem, keeping the value sane is good unless
 there is some problem in doing so.

That won't work because then LWLockWakeup() wouldn't be called when
necessary - precisely because nwaiters is 0.

 6.
 LWLockWakeup()
 {
 ..
 dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
 {
 ..
 if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
 continue;
 ..
 if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
 {
 ..
 wokeup_somebody = true;
 }
 ..
 }
 ..
 }
 
 a.
 IIUC above logic, if the waiter queue is as follows:
 (S-Shared; X-Exclusive) S X S S S X S S
 
 it can skip the exclusive waiters and release shared waiter.
 
 If my understanding is right, then I think instead of continue, there
 should be *break* in above logic.

No, it looks correct to me. What happened is that the first S was woken
up. So there's no point in waking up an exclusive locker, but further
non-exclusive lockers can be woken up.

 b.
 Consider below sequence of waiters:
 (S-Shared; X-Exclusive) S S X S S
 
 I think as per the un-patched code, it will wake up waiters up to (and
 including) the first Exclusive, but the patch will wake up waiters up to
 (*excluding*) the first Exclusive.

I don't think the current code does that. And it'd be a pretty pointless
behaviour, leading to useless increased contention. The only time it'd
make sense for X to be woken up is when it gets run faster than the S
processes.

 7.
 LWLockWakeup()
 {
 ..
 dlist_foreach_modify(iter, (dlist_head *) l-waiters)
 {
 ..
 dlist_delete(&waiter->lwWaitLink);
 dlist_push_tail(&wakeup, &waiter->lwWaitLink);
 ..
 }
 ..
 }
 
 Use of dlist has simplified the code, but I think there might be a slight
 overhead of maintaining the wakeup queue as compared to the un-patched
 mechanism, especially when there is a long waiter queue.

I don't see that as being relevant. The difference is an instruction or
two - in the slow path we'll enter the kernel and sleep. This doesn't
matter in comparison.
And the code is *so* much more readable.

 8.
 LWLockConditionalAcquire()
 {
 ..
 /*
  * We ran into an exclusive lock and might have blocked another
  * exclusive lock from taking a shot because it took a time to back
  * off. Retry till we are either sure we didn't block somebody (because
  * somebody else certainly has the lock) or till we got it.
  *
  * We cannot rely on the two-step lock-acquisition protocol as in
  * LWLockAcquire because we're not using it.
  */
 if (potentially_spurious)
 {
 SPIN_DELAY();
 goto retry;
 }
 ..
 }
 
 Due to the above logic, I think it can keep on retrying for a long time
 before it actually concludes whether it got the lock or not, in case other
 backends take the Exclusive lock after *double_check* and release it
 before the unconditional increment of the shared lock in function
 LWLockAttemptLock. I understand that it might be difficult to have such a
 practical scenario; however, there is still a theoretical possibility of
 the same.

I'm not particularly concerned. We could optimize it a bit, but I really
don't think it's necessary.

 Is there any advantage of retrying in LWLockConditionalAcquire()?

It's required for correctness. We only retry if we potentially blocked
an exclusive acquirer (by spuriously incrementing/decrementing lockcount
with 1). We need to be sure to either get the lock (in which case we can
wake up the waiter on release), or be sure that we didn't disturb
anyone.

 9.
 LWLockAcquireOrWait()
 

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Andres Freund
On 2014-10-08 14:47:44 +0200, Andres Freund wrote:
 On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
  5.
  LWLockWakeup()
  {
  ..
  dlist_foreach_modify(iter, (dlist_head *) &wakeup)
  {
  PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
  LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
  dlist_delete(&waiter->lwWaitLink);
  pg_write_barrier();
  waiter->lwWaiting = false;
  PGSemaphoreUnlock(&waiter->sem);
  }
  ..
  }
  
  Why can't we decrement the nwaiters after waking up? I don't think
  there is any major problem even if callers do that themselves, but
  in some rare cases LWLockRelease() might spuriously assume that
  there are some waiters and tries to call LWLockWakeup().  Although
  this doesn't create any problem, keeping the value sane is good unless
  there is some problem in doing so.
 
 That won't work because then LWLockWakeup() wouldn't be called when
 necessary - precisely because nwaiters is 0.

Err, this is bogus. Memory fail.

The reason I've done so is that it's otherwise much harder to debug
issues where there are backend that have been woken up already, but
haven't rerun yet. Without this there's simply no evidence of that
state. I can't see this being relevant for performance, so I'd rather
have it stay that way.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-08 Thread Michael Paquier
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 1. Where do the FF files come from? In 9.2, FF segments are not supposed
 to be created, ever.

Since this only happens with streaming replication, the FF segments are
 probably being created by walreceiver. XLogWalRcvWrite is the function that
 opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite
 opens the file corresponding to the start position in the message received
 from the master. There is no check that the start position is valid,
 though; if the master sends a start position in the FF segment, walreceiver
 will merrily write it. So the problem could be in the walsender side.
 However, I don't see anything wrong there either.

Good to hear that. By looking at the wal receiver and sender code paths, I
found nothing really weird.

I think we should add a check in walreceiver, to throw an error if the
 master sends an invalid WAL pointer, pointing to an FF segment.

Then we're good for a check in ProcessWalSndrMessage for walEnd, I guess.
Seems like a straightforward patch.

2. Why are the .done files sometimes not being created?

 I may have an explanation for that. Walreceiver creates a .done file when
 it closes an old segment and opens a new one. However, it does this only
 when it's about to start writing to the new segment, and still has the old
 segment open. If you stream the FE segment fully, but drop replication
 connection at exactly that point, the .done file is not created. That might
 sound unlikely, but it's actually pretty easy to trigger. Just do "select
 pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a
 restart.


That's exactly the test I had been doing a couple of times to trigger this
behavior before sending my previous email, but without success on the
standby with master: all the WAL files were marked as .done. Now, I have
just retried it, with far more tries on REL9_3_STABLE and on HEAD, and I
have been able to actually trigger the problem a couple of times. Simply
run a long transaction generating a lot of WAL, like this:
=# create table aa as select generate_series(1,1000000);
And then run that:
$ psql -c 'select pg_switch_xlog()'; pg_ctl stop -m immediate; pg_ctl start
And with enough luck, .ready files may appear. It may take a dozen
tries before seeing at least one. And I noticed that generally multiple
.ready files appear at the same time.


 The creation of the .done files seems quite unreliable anyway. If only a
 portion of a segment is streamed, we don't write a .done file for it, so we
 still have the original problem that we will try to archive the segment
 after failover, even though the master might already have archived it.

Yep. Agreed.


 I looked again at the thread where this was discussed:
 http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com.
 I believe the idea was that the server that generates a WAL segment is
 always responsible for archiving it. A standby should never attempt to
 archive a WAL segment that was restored from the archive, or streamed from
 the master.

In that thread, it was not discussed what should happen to WAL files that
 an admin manually copies into pg_xlog of the standby. Should the standby
 archive them? I don't think so - the admin should copy them manually to the
 archive too, if he wants them archived. It's a good and simple rule that
 the server that generates the WAL, archives the WAL.


Question time: why was the check based on the recovery state of the node
removed in 1bd42cd? Just assuming, but did you have in mind that relying on
XLogArchiveForceDone and XLogArchiveCheckDone was enough and more robust at
this point?

Instead of creating any .done files during recovery, we could scan pg_xlog
 at promotion, and create a .done file for every WAL segment that's present
 at that point. That would be more robust. And then apply your patch, to
 recycle old segments during archive recovery, ignoring .done files.


The additional process at promotion sounds like a good idea; I'll try to
get a patch done tomorrow. This would result as well in removing the
XLogArchiveForceDone stuff. Either way, now that I have been able to
reproduce the problem manually, things can be clearly solved.
Regards,
-- 
Michael


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Simon Riggs
On 8 October 2014 01:47, Peter Geoghegan p...@heroku.com wrote:

 It seems like what you're talking about here is just changing the
 spelling of what I already have. I think that would be confusing to
 users when the time comes to actually implement a fully-generalized
 MERGE, even with the clearly distinct MERGE CONCURRENTLY variant
 outlined here (which, of course, lacks an outer join, unlike MERGE
 proper).

I've changed my view on this, after some more thought. (Hope that helps.)

If we implement MERGE, I can see we may also wish to implement MERGE
CONCURRENTLY one day. That would be different to UPSERT.

So in the future I think we will need 3 commands

1. MERGE
2. MERGE CONCURRENTLY
3. UPSERT

So we no longer need to have the command start with the MERGE keyword.

 However, unlike the idea of trying to square the circle of producing a
 general purpose MERGE command that also supports the UPSERT use-case,
 my objection to this much more limited proposal is made purely on
 aesthetic grounds. I think that it is not very user-friendly; I do not
 think that it's a total disaster, which is what trying to solve both
 problems at once (MERGE bulkloading and UPSERTing) would result in. So
 FWIW, if the community is really set on something that includes the
 keyword MERGE, which is really all you outline here, then I can live
 with that.

We will one day have MERGE according to the SQL Standard.

My opinion is that syntax for this should be similar to MERGE in the
*body* of the command, rather than some completely different syntax.
e.g.

 WHEN NOT MATCHED THEN
   INSERT
 WHEN MATCHED THEN
  UPDATE

I'm happy that we put that to a vote on what the syntax should be, as
long as we bear in mind that we will one day have MERGE as well.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Simon Riggs
On 8 October 2014 00:34, Peter Geoghegan p...@heroku.com wrote:

 INSERTs see #2 win, and by a wider margin than #1 beat #2 with
 UPDATEs. However, insert.sh is by design very unsympathetic towards
 #1. It uses a serial primary key, so every INSERT uselessly obtains a
 HW lock on the same leaf page for the duration of heap insertion.
 Anyway, the median INSERT TPS numbers is 17,759.53 for #1, and
 20,441.57 TPS for #2. So you're pretty much seeing the full brunt of
 page heavyweight locking, and it isn't all that bad.

Let's see the results of running a COPY please.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-08 Thread Andres Freund
Hi,

Attached you can find the next version of my LW_SHARED patchset. Now
that atomics are committed, it seems like a good idea to also add their
raison d'être.

Since the last public version I have:
* Addressed lots of Amit's comments. Thanks!
* Performed a fair amount of testing.
* Rebased the code. The volatile removal made that not entirely
  trivial...
* Significantly cleaned up and simplified the code.
* Updated comments and such
* Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case)

The feature currently consists out of two patches:
1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and
   verbose. This also does:
* changes the logic in LWLockRelease() to release all shared lockers
  when waking up any. This can yield some significant performance
  improvements - and the fairness isn't really much worse than
  before,
  as we always allowed new shared lockers to jump the queue.

* adds a memory pg_write_barrier() in the wakeup paths between
  dequeuing and unsetting ->lwWaiting. That was always required on
  weakly ordered machines, but f4077cda2 made it more urgent. I can
  reproduce crashes without it.
2) Implement the wait free LW_SHARED algorithm.

Personally I'm quite happy with the new state. I think it needs more
review, but I personally don't know of anything that needs
changing. There's lots of further improvements that could be done, but
let's get this in first.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 7 Oct 2014 15:32:34 +0200
Subject: [PATCH 1/2] Convert the PGPROC->lwWaitLink list into a dlist instead
 of open coding it.

Besides being shorter and much easier to read it:

* changes the logic in LWLockRelease() to release all shared lockers
  when waking up any. This can yield some significant performance
  improvements - and the fairness isn't really much worse than before,
  as we always allowed new shared lockers to jump the queue.

* adds a memory pg_write_barrier() in the wakeup paths between
  dequeuing and unsetting ->lwWaiting. That was always required on
  weakly ordered machines, but f4077cda2 made it more urgent.

Author: Andres Freund
---
 src/backend/access/transam/twophase.c |   1 -
 src/backend/storage/lmgr/lwlock.c | 151 +-
 src/backend/storage/lmgr/proc.c   |   2 -
 src/include/storage/lwlock.h  |   5 +-
 src/include/storage/proc.h|   3 +-
 5 files changed, 60 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d5409a6..6401943 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	proc->roleId = owner;
 	proc->lwWaiting = false;
 	proc->lwWaitMode = 0;
-	proc->lwWaitLink = NULL;
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 9fe6855..e6f9158 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -35,6 +35,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/slot.h"
+#include "storage/barrier.h"
 #include "storage/ipc.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
@@ -115,9 +116,9 @@ inline static void
 PRINT_LWDEBUG(const char *where, const LWLock *lock)
 {
 	if (Trace_lwlocks)
-		elog(LOG, "%s(%s %d): excl %d shared %d head %p rOK %d",
+		elog(LOG, "%s(%s %d): excl %d shared %d rOK %d",
 			 where, T_NAME(lock), T_ID(lock),
-			 (int) lock->exclusive, lock->shared, lock->head,
+			 (int) lock->exclusive, lock->shared,
 			 (int) lock->releaseOK);
 }
 
@@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
 	lock->exclusive = 0;
 	lock->shared = 0;
 	lock->tranche = tranche_id;
-	lock->head = NULL;
-	lock->tail = NULL;
+	dlist_init(&lock->waiters);
 }
 
 
@@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
 
 		proc->lwWaiting = true;
 		proc->lwWaitMode = mode;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 
 		proc->lwWaiting = true;
 		proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);

Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-10-08 Thread Fujii Masao
On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 (2014/09/13 2:42), Fujii Masao wrote:

 On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

 Fujii Masao wrote:

 On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:


 PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
 Wouldn't it be easy-to-use to have only one parameter,
 PENDING_LIST_CLEANUP_SIZE?  How about setting PENDING_LIST_CLEANUP_SIZE
 to
 work_mem as the default value when running the CREATE INDEX command?


 That's an idea. But there might be some users who want to change
 the cleanup size per session like they can do by setting work_mem,
 and your idea would prevent them from doing that...

 So what about introducing pending_list_cleanup_size also as a GUC?
 That is, users can set the threshold by using either the reloption or
 GUC.


 Yes, I think having both a GUC and a reloption makes sense -- the GUC
 applies to all indexes, and can be tweaked for individual indexes using
 the reloption.


 Agreed.

 I'm not sure about the idea of being able to change it per session,
 though.  Do you mean that you would like insert processes use a very
 large value so that they can just append new values to the pending list,
 and have vacuum use a small value so that it cleans up as soon as it
 runs?  Two things: 1. we could have an autovacuum_ reloption which
 only changes what autovacuum does; 2. we could have autovacuum run
 index cleanup actions separately from actual vacuuming.


 Yes, I was thinking something like that. But if autovacuum
 has already been able to handle that, it's nice. Anyway,
 as you pointed out, it's better to have both GUC and reloption
 for the cleanup size of pending list.


 OK, I'd vote for your idea of having both the GUC and the reloption. So, I
 think the patch needs to be updated.  Fujii-san, what plan do you have about
 the patch?

Please see the attached patch. In this patch, I introduced the GUC parameter,
pending_list_cleanup_size. I chose 4MB as the default value of the parameter.
But do you have any better idea about that default value?
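
(For illustration, with the patch applied the threshold could be set at
either level. The GUC spelling follows the patch; the reloption units are
an assumption here, not committed behavior:)

-- session-wide, via the proposed GUC:
SET pending_list_cleanup_size = '4MB';

-- per-index, via the proposed reloption (value assumed to be in kB):
CREATE INDEX t_gin_idx ON t USING gin (doc)
    WITH (pending_list_cleanup_size = 4096);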

BTW, I moved the CommitFest entry of this patch to next CF 2014-10.

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** SET XML OPTION { DOCUMENT | CONTENT };
*** 5907,5912 ****
--- 5907,5933 ----
      </listitem>
     </varlistentry>
  
+  <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size">
+   <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>)
+   <indexterm>
+    <primary><varname>pending_list_cleanup_size</> configuration parameter</primary>
+   </indexterm>
+   </term>
+   <listitem>
+    <para>
+     Sets the maximum size of the GIN pending list which is used
+     when <literal>FASTUPDATE</> is enabled. If the list grows
+     larger than this maximum size, it is cleaned up by moving
+     the entries in it to the main GIN data structure in bulk.
+     The default is four megabytes (<literal>4MB</>). This setting
+     can be overridden for individual GIN indexes by changing
+     storage parameters.
+     See <xref linkend="gin-fast-update"> and <xref linkend="gin-tips">
+     for more information.
+    </para>
+   </listitem>
+  </varlistentry>
+ 
   </variablelist>
  </sect2>
   <sect2 id="runtime-config-client-format">
*** a/doc/src/sgml/gin.sgml
--- b/doc/src/sgml/gin.sgml
***************
*** 728,735 ****
     from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
     <acronym>GIN</> is capable of postponing much of this work by inserting
     new tuples into a temporary, unsorted list of pending entries.
!    When the table is vacuumed, or if the pending list becomes too large
!    (larger than <xref linkend="guc-work-mem">), the entries are moved to the
     main <acronym>GIN</acronym> data structure using the same bulk insert
     techniques used during initial index creation.  This greatly improves
     <acronym>GIN</acronym> index update speed, even counting the additional
--- 728,735 ----
     from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
     <acronym>GIN</> is capable of postponing much of this work by inserting
     new tuples into a temporary, unsorted list of pending entries.
!    When the table is vacuumed, or if the pending list becomes larger than
!    <xref linkend="guc-pending-list-cleanup-size">, the entries are moved to the
     main <acronym>GIN</acronym> data structure using the same bulk insert
     techniques used during initial index creation.  This greatly improves
     <acronym>GIN</acronym> index update speed, even counting the additional
***************
*** 812,829 ****
    </varlistentry>
  
    <varlistentry>
!    <term><xref linkend="guc-work-mem"></term>
     <listitem>
      <para>
       During a series of insertions into an existing <acronym>GIN</acronym>
       index that has <literal>FASTUPDATE</> enabled, the

Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-08 Thread Fujii Masao
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 10/08/2014 10:44 AM, Michael Paquier wrote:

 On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais 
 j...@dalibo.com wrote:

 We kept the WAL files and log files for further analysis. How can we help
 regarding this issue?


 Commit c2f79ba has added an assumption that the WAL receiver should always
 enforce the creation of .done files when WAL files are done being streamed
 (XLogWalRcvWrite and WalReceiverMain) or archived
 (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd
 changed RemoveOldXlogFiles a bit, removing a check looking at whether the
 node is in recovery. Now, based on the information given here, yes, it happens that
 there are still cases where .done file creation is not correctly done,
 leading to those extra files. Even by looking at the code, I am not
 directly seeing any code paths where an extra call to XLogArchiveForceDone
 would be needed on the WAL receiver side but... Something like the patch
 attached (which is clearly a band-aid) may help though as it would make
 files to be removed even if they are not marked as .done for a node in
 recovery. And this is consistent with the pre-1bd42cd behavior.



 There are two mysteries here:

 1. Where do the FF files come from? In 9.2, FF-segments are not supposed to be
 created, ever.

 Since this only happens with streaming replication, the FF segments are
 probably being created by walreceiver. XLogWalRcvWrite is the function that
 opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite
 opens the file corresponding the start position in the message received from
 the master. There is no check that the start position is valid, though; if
 the master sends a start position in the FF segment, walreceiver will
 merrily write it. So the problem could be in the walsender side. However, I
 don't see anything wrong there either.

 I think we should add a check in walreceiver, to throw an error if the
 master sends an invalid WAL pointer, pointing to an FF segment.


 2. Why are the .done files sometimes not being created?

 I may have an explanation for that. Walreceiver creates a .done file when it
 closes an old segment and opens a new one. However, it does this only when
 it's about to start writing to the new segment, and still has the old
 segment open. If you stream the FE segment fully, but drop replication
 connection at exactly that point, the .done file is not created. That might
 sound unlikely, but it's actually pretty easy to trigger. Just do select
 pg_switch_xlog() in the master, followed by pg_ctl stop -m i and a
 restart.

 The creation of the .done files seems quite unreliable anyway. If only a
 portion of a segment is streamed, we don't write a .done file for it, so we
 still have the original problem that we will try to archive the segment
 after failover, even though the master might already have archived it.

 I looked again at the thread where this was discussed:
 http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com.
 I believe the idea was that the server that generates a WAL segment is
 always responsible for archiving it. A standby should never attempt to
 archive a WAL segment that was restored from the archive, or streamed from
 the master.

 In that thread, it was not discussed what should happen to WAL files that an
 admin manually copies into pg_xlog of the standby. Should the standby
 archive them? I don't think so - the admin should copy them manually to the
 archive too, if he wants them archived. It's a good and simple rule that the
 server that generates the WAL, archives the WAL.

 Instead of creating any .done files during recovery, we could scan pg_xlog
 at promotion, and create a .done file for every WAL segment that's present
 at that point. That would be more robust. And then apply your patch, to
 recycle old segments during archive recovery, ignoring .done files.

What happens if a user shuts down the standby, removes recovery.conf and
starts the server as the master? In this case, no WAL files have .done status
files, so the server will create .ready and archive all of them. Probably this
is problematic. So even if we adopt your idea, ISTM that it's better to create
a .done file whenever a WAL file is fully streamed and restored.
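
(Spelled out, that scenario is easy to hit; paths are illustrative:)

$ pg_ctl -D /path/to/standby stop
$ rm /path/to/standby/recovery.conf
$ pg_ctl -D /path/to/standby start
# with no .done files present, old segments get .ready markers and
# archive_command re-archives all of them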

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-08 Thread Andres Freund
On 2014-09-29 13:38:37 -0400, Robert Haas wrote:
 On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost sfr...@snowman.net wrote:
  Lastly, I will say that I feel it'd be good to support bi-directional
  communication as I think it'll be needed eventually, but I'm not sure
  that has to happen now.
 
 I agree we need bidirectional communication; I just don't agree that
 the other direction should use the libpq format.  The data going from
 the worker to the process that launched it is stuff like errors and
 tuples, for which we already have a wire format.  The data going in
 the other direction is going to be things like plan trees to be
 executed, for which we don't.  But if we can defer the issue, so much
 the better.  Things will become clearer as we get closer to being
 done.

I think that might be true for your usecase, but not for others. It's
perfectly conceivable that one might want to ship tuples to a couple of
bgworkers using the COPY protocol or such.

I don't think it needs to be fully implemented, but I think we should
design it in a way that adding this later is unlikely to require larger
changes to the code added here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-08 Thread Andres Freund
On 2014-09-10 16:53:12 -0400, Robert Haas wrote:
 diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
 index 5da9d8d..0b8db42 100644
 --- a/src/include/libpq/libpq.h
 +++ b/src/include/libpq/libpq.h
 @@ -37,6 +37,31 @@ typedef struct
   }   u;
  } PQArgBlock;
  
 +typedef struct
 +{
 + void (*comm_reset)(void);
 + int (*flush)(void);
 + int (*flush_if_writable)(void);
 + bool (*is_send_pending)(void);
 + int (*putmessage)(char msgtype, const char *s, size_t len);
 + void (*putmessage_noblock)(char msgtype, const char *s, size_t len);
 + void (*startcopyout)(void);
 + void (*endcopyout)(bool errorAbort);
 +} PQsendMethods;
 +
 +PQsendMethods *PqSendMethods;

WRT my complaint in the other subthread, I think PQsendMethods should
just be renamed to PQcommMethods or similar. That, in combination with
a /* at some point we might want to add methods for receiving data here
*/ comment, looks like it'd already satisfy my complaint ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Context lenses to set/get values in json values.

2014-10-08 Thread Andrew Dunstan


On 10/08/2014 04:38 AM, Paweł Cesar Sanjuan Szklarz wrote:

Hello.

I am interested in the json type on postgresql. I would like to 
implement additional operations on the json structure that may 
extract/insert table like information from the json tree structure.
I have an implementation in javascript that shows this type of 
operation. You can see examples on this page

https://github.com/paweld2/eelnss/wiki

Following the examples in the previous page, it may be possible to 
implement a function similar to json_populate_record to extract 
multiple records from a single json value, for example:
select * from json_populate_records_with_clen(null::myrowtype_users, 
'app.users.{:uID}.(email,data.name,isActive)', '... 
nested json value ...')


may return
uID | email              | name          | isActive
----+--------------------+---------------+---------
u1  | ad...@pmsoft.eu    | administrator | true
u2  | nor...@pmsoft.eu   | user          | true
u3  | testu...@pmsoft.eu | testUser      | false



Also, assuming that we have a table User as above (uID, email, name, 
isActive), with context lenses it is very simple to map the table to a 
json object. I assume that a similar api to table_to_xml,query_to_xml 
may be provided:


table_to_json( Person, 'app.users.{:uID}.(email,data.name,isActive)');
query_to_json( 'select * from Person where ... ', 
'app.users.{:uID}.(email,data.name,isActive)');



I don't know the details about the integration of functions/operators 
to sql queries, but because context lenses map between tables and 
tree objects, it may be possible to use a column json value as a 
separate table in the queries. Assume the table

create table Person {
  pID  Integer
  address Json
}
then it  may be possible to query:
select * from Person as P left join ( select * from 
json_populate_records_with_clen(null::addressType, 
'addres.(street.number, street.local,city.code,city.name)', P.address);


A final api for such functions needs to be defined. If such functions 
may be usefull, I can try to prepare a implementation in postgres base 
code.






I don't think we need to import Mongo type notation here. But there is 
probably a good case for some functions like:


   json_table_agg(anyrecord) -> json

which would work like json_agg() but would return an array of arrays 
instead of an array of objects. The caller would be assumed to know 
which field is which in the array. That should take care of both the 
table_to_json and query_to_json suggestions above.


In the other direction, we could have something like:

json_populate_recordset_from_table(base anyrecord, fields text[], 
jsontable json) -> setof record


where jsontable is an array of arrays of values  and fields is a 
corresponding array of field names.
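
For concreteness, hypothetical invocations of the two functions sketched
above might look like the following (neither function exists yet; the
person table and its columns are invented for the example):

   SELECT json_table_agg(p) FROM person p;
   -- might yield: [[1,"ann",true],[2,"bob",false]]

   SELECT *
   FROM json_populate_recordset_from_table(
            NULL::person,
            ARRAY['id','name','active'],
            '[[1,"ann",true],[2,"bob",false]]'::json);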


I'm not sure how mainstream any of this is. Maybe an extension would be 
more appropriate?


cheers

andrew





Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-08 Thread Heikki Linnakangas

On 10/08/2014 04:59 PM, Fujii Masao wrote:

On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Instead of creating any .done files during recovery, we could scan pg_xlog
at promotion, and create a .done file for every WAL segment that's present
at that point. That would be more robust. And then apply your patch, to
recycle old segments during archive recovery, ignoring .done files.


What happens if a user shutdowns the standby, removes recovery.conf and
starts the server as the master?


Um, that's not a safe thing to do anyway, is it?

- Heikki





Re: [HACKERS] Context lenses to set/get values in json values.

2014-10-08 Thread Paweł Cesar Sanjuan Szklarz
On Wed, Oct 8, 2014 at 4:25 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 10/08/2014 04:38 AM, Paweł Cesar Sanjuan Szklarz wrote:

 Hello.

 I am interested in the json type on postgresql. I would like to implement
 additional operations on the json structure that may extract/insert table
 like information from the json tree structure.
 I have an implementation in javascript that shows this type of operation.
 You can see examples on this page
 https://github.com/paweld2/eelnss/wiki

 Following the examples in the previous page, it may be possible to
 implement a function similar to json_populate_record to extract multiple
 records from a single json value, for example:
 select * from json_populate_records_with_clen(null::myrowtype_users,
 'app.users.{:uID}.(email,data.name,isActive)', '...
 nested json value ...')

 may return
 uID | email              | name          | isActive
 ----+--------------------+---------------+---------
 u1  | ad...@pmsoft.eu    | administrator | true
 u2  | nor...@pmsoft.eu   | user          | true
 u3  | testu...@pmsoft.eu | testUser      | false


 Also, assuming that we have a table User as above (uID, email, name,
 isActive), with context lenses it is very simple to map the table to a json
 object. I assume that a similar api to table_to_xml,query_to_xml may be
 provided:

 table_to_json( Person, 'app.users.{:uID}.(email,data.name,isActive)');
 query_to_json( 'select * from Person where ... ',
 'app.users.{:uID}.(email,data.name,isActive)');


 I don't know the details about the integration of functions/operators to
 sql queries, but because context lenses maps between tables and tree
 objects, it may be possible to use a column json value as a separate table
 in the queries. Assume the table
 create table Person {
   pID  Integer
   address Json
 }
 then it  may be possible to query:
 select * from Person as P left join ( select * from
 json_populate_records_with_clen(null::addressType,
 'addres.(street.number, street.local,city.code,city.name)', P.address);

 A final api for such functions needs to be defined. If such functions may
 be usefull, I can try to prepare a implementation in postgres base code.




 I don't think we need to import Mongo type notation here. But there is
 probably a good case for some functions like:

 json_table_agg(anyrecord) -> json

 which would work like json_agg() but would return an array of arrays
 instead of an array of objects. The caller would be assumed to know which
 field is which in the array. That should take care of both the
 table_to_json and query_to_json suggestions above.

 In the other direction, we could have something like:

 json_populate_recordset_from_table(base anyrecord, fields text[],
 jsontable json) -> setof record

 where jsontable is an array of arrays of values  and fields is a
 corresponding array of field names.

 I'm not sure how mainstream any of this is. Maybe an extension would be
 more appropriate?

 cheers

 andrew


Hello.

My personal interest is to send updates to a single json value on the
server. What is the best way to make an update to a json value in Postgres
without a full update of the already stored value? The -> operator
extracts an internal value, but for updating the value I don't see any
operator.

I was not familiar with extensions, but it looks like the best way to
start is to create an extension with possible implementations of new
functions. I will do so.

In my project I considered using Mongo, but in my case the core part of
the model matches a relational schema perfectly. I have some leaf concepts
that will change frequently, and to avoid migrations I store that
information in a json value. To make changes in such leaf values I would
like to have a context-lenses-like API on the server. I will start with
some toy extension and try to get a feel for whether this makes sense.

Regards.
Pawel.


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Robert Haas
On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote:
 I don't see that as being relevant. The difference is an instruction or
 two - in the slow path we'll enter the kernel and sleep. This doesn't
 matter in comparison.
 And the code is *so* much more readable.

I find the slist/dlist stuff actually quite difficult to get right
compared to a hand-rolled linked list.  But the really big problem is
that the debugger can't do anything useful with it.  You have to work
out the structure-member offset in order to walk the list and manually
cast to char *, adjust the pointer, and cast back.  That sucks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Andres Freund
On 2014-10-08 13:13:33 -0400, Robert Haas wrote:
 On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't see that as being relevant. The difference is an instruction or
  two - in the slow path we'll enter the kernel and sleep. This doesn't
  matter in comparison.
  And the code is *so* much more readable.
 
 I find the slist/dlist stuff actually quite difficult to get right
 compared to a hand-rolled linked list.

Really? I've spent more than a day debugging things with the current
code. And Heikki introduced a bug in it. If you look at how the code
looks before/after I find the difference pretty clear.

 But the really big problem is
 that the debugger can't do anything useful with it.  You have to work
 out the structure-member offset in order to walk the list and manually
 cast to char *, adjust the pointer, and cast back.  That sucks.

Hm. I can just do that with the debugger here. Not sure if that's
because I added the right thing to my .gdbinit or because I use the
correct compiler flags.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Alvaro Herrera
Robert Haas wrote:
 On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't see that as being relevant. The difference is an instruction or
  two - in the slow path we'll enter the kernel and sleep. This doesn't
  matter in comparison.
  And the code is *so* much more readable.
 
 I find the slist/dlist stuff actually quite difficult to get right
 compared to a hand-rolled linked list.  But the really big problem is
 that the debugger can't do anything useful with it.  You have to work
 out the structure-member offset in order to walk the list and manually
 cast to char *, adjust the pointer, and cast back.  That sucks.

As far as I recall you can get gdb to understand those pointer games
by defining some structs or macros.  Maybe we can improve by documenting
this.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Context lenses to set/get values in json values.

2014-10-08 Thread Andrew Dunstan


On 10/08/2014 12:13 PM, Paweł Cesar Sanjuan Szklarz wrote:




I don't think we need to import Mongo type notation here. But
there is probably a good case for some functions like:

   json_table_agg(anyrecord) -> json

which would work like json_agg() but would return an array of
arrays instead of an array of objects. The caller would be assumed
to know which field is which in the array. That should take care
of both the table_to_json and query_to_json suggestions above.

In the other direction, we could have something like:

json_populate_recordset_from_table(base anyrecord, fields
text[], jsontable json) -> setof record

where jsontable is an array of arrays of values  and fields is a
corresponding array of field names.

I'm not sure how mainstream any of this is. Maybe an extension
would be more appropriate?




Hello.

My personal interest is to send updates to a single json value on the 
server. What is the best way to make an update to a json value in 
Postgres without a full update of the already stored value? The -> 
operator extracts an internal value, but for updating the value I don't 
see any operator.


I was not familiar with extensions, but it looks like the best way 
to start is to create an extension with possible implementations of new 
functions. I will do so.


In my project I considered using Mongo, but in my case the core part 
of the model matches a relational schema perfectly. I have some leaf 
concepts that will change frequently, and to avoid migrations I store 
that information in a json value. To make changes in such leaf values 
I would like to have a context-lenses-like API on the server. I will 
start with some toy extension and try to get a feel for whether this makes sense.





There is work already being done on providing update operations.

cheers

andrew





Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-08 Thread Michael Banck
Hi,

Am Samstag, den 04.10.2014, 15:05 -0500 schrieb Jim Nasby:
 On 10/4/14, 1:21 PM, Jeff Janes wrote:
  On Thu, Oct 2, 2014 at 6:21 AM, Michael Banck wrote:
  we have seen repeatedly that users can be confused about why PostgreSQL
  is not shutting down even though they requested it. Usually, this is
  because `log_checkpoints' is not enabled and the final checkpoint is
  being written, delaying shutdown. As no message besides shutting down
  is written to the server log in this case, we even had users believing
  the server was hanging and pondering killing it manually.
 
 
 Wouldn't a better place to write this message be the terminal from 
 which pg_ctl stop was invoked, rather than the server log file?

Looking at it from a DBA perspective, this would indeed be better, yes.

However, I see a few issues with that:

1. If you are using an init script (or another wrapper around pg_ctl),
you don't get any of its output it seems.

2. Having taken a quick look at pg_ctl, it seems to just kill the
postmaster on shutdown and wait for its PID file to disappear.  I don't
see how it could figure out that PostgreSQL is waiting for a checkpoint
to finish.

 Or do both. I suspect elog( INFO, ... ) might do that.

That would imply that pg_ctl receives and writes out log messages
directed at clients, which I don't think it does.  Even if it did,
client_min_messages does not include an INFO level, and LOG is not being
logged to clients by default. So the first common level above the
default of both client_min_messages and log_min_messages would be
WARNING, which seems excessive to me.

As I said, I only took a quick look at pg_ctl though, so I might well be
missing something.


Michael

-- 
Michael Banck
Projektleiter / Berater
Tel.: +49 (2161) 4643-171
Fax:  +49 (2161) 4643-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer





Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Andres Freund
On 2014-10-08 14:23:44 -0300, Alvaro Herrera wrote:
 Robert Haas wrote:
  On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
   I don't see that as being relevant. The difference is an instruction or
   two - in the slow path we'll enter the kernel and sleep. This doesn't
   matter in comparison.
   And the code is *so* much more readable.
  
  I find the slist/dlist stuff actually quite difficult to get right
  compared to a hand-rolled linked list.  But the really big problem is
  that the debugger can't do anything useful with it.  You have to work
  out the structure-member offset in order to walk the list and manually
  cast to char *, adjust the pointer, and cast back.  That sucks.
 
 As far as I recall you can get gdb to understand those pointer games
 by defining some structs or macros.  Maybe we can improve by documenting
 this.

So, what makes it work for me (among other unrelated stuff) seems to be
the following in .gdbinit, defining away some things that gdb doesn't
handle:
macro define __builtin_offsetof(T, F) ((int) &((T *) 0)->F)
macro define __extension__
macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)

Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about
postgres' macros. At least if you're in the right scope.

As an example, the following works:
(gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem,
&BackendList)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Peter Geoghegan
On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Having said that, it would be much nicer to have a mode that allows
 you to just say the word UPDATE and have it copy the data into the
 correct columns, like MySQL does. That is very intuitive, even if it
 isn't very flexible.

I thought about this, and at first I agreed, but now I'm not so sure.

Consider the case where you write an INSERT ... ON CONFLICT UPDATE ALL
query, or however we might spell this idea.

1. Developer writes the query, and it works fine.

2. Some time later, the DBA adds an inserted_at column (those are
common). The DBA is not aware of the existence of this particular
query. The new column has a default value of now(), say.

Should we UPDATE the inserted_at column to be NULL? Or (more
plausibly) the default value filled in by the INSERT? Or leave it be?
I think there is a case to be made for all of these behaviors, and
that tension makes me prefer to not do this at all. It's like
encouraging SELECT * queries in production, only worse.
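
To make the hazard concrete (ON CONFLICT UPDATE ALL is a hypothetical
spelling, and the table is invented for the example):

CREATE TABLE accounts (
    id          int PRIMARY KEY,
    balance     int,
    inserted_at timestamptz DEFAULT now()  -- added later by the DBA
);

INSERT INTO accounts (id, balance) VALUES (1, 100)
    ON CONFLICT UPDATE ALL;
-- on conflict, is inserted_at set to NULL, to the INSERT's default of
-- now(), or left alone? All three readings are defensible.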

-- 
Peter Geoghegan




Re: [HACKERS] Context lenses to set/get values in json values.

2014-10-08 Thread Andrew Dunstan


On 10/08/2014 02:04 PM, Thom Brown wrote:


There is work already being done on providing update operations.

I've been looking out for that.  Has there been a discussion on how
that would look yet that you could point me to?




https://github.com/erthalion/jsonbx

Note that a) it's an extension, and b) it's jsonb only.

cheers

andrew




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-08 Thread Andres Freund
On 2014-10-08 15:23:22 -0400, Robert Haas wrote:
 On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote:
  1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and
 verbose. This also does:
  * changes the logic in LWLockRelease() to release all shared lockers
when waking up any. This can yield some significant performance
improvements - and the fairness isn't really much worse than
before,
as we always allowed new shared lockers to jump the queue.
 
  * adds a memory pg_write_barrier() in the wakeup paths between
    dequeuing and unsetting ->lwWaiting. That was always required on
weakly ordered machines, but f4077cda2 made it more urgent. I can
reproduce crashes without it.

 I think it's a really bad idea to mix a refactoring change (like
 converting PGPROC-lwWaitLink into a dlist) with an attempted
 performance enhancement (like changing the rules for jumping the lock
 queue) and a bug fix (like adding pg_write_barrier where needed).  I'd
 suggest that the last of those be done first, and perhaps
 back-patched.

I think it makes sense to separate out the write barrier one. I don't
really see the point of separating the other two changes.

I've indeed previously started a thread
(http://archives.postgresql.org/message-id/20140210134625.GA15246%40awork2.anarazel.de)
about the barrier issue. IIRC you argued that that might be too
expensive.

 The current coding, using a hand-rolled list, touches shared memory
 fewer times.  When many waiters are awoken at once, we clip them all
 out of the list at one go.  Your revision moves them to a
 backend-private list one at a time, and then pops them off one at a
 time.  The backend-private memory accesses don't seem like they matter
 much, but the shared memory accesses would be nice to avoid.

I can't imagine this mattering.  We're entering the kernel for each PROC
for the PGSemaphoreUnlock() and we're dirtying the cacheline for
proc->lwWaiting = false anyway. This really is the slow path.

 Does LWLockUpdateVar's wake-up loop need a write barrier per
 iteration, or just one before the loop starts?  How about commenting
 the pg_write_barrier() with the read-fence to which it pairs?

Hm. Are you picking out LWLockUpdateVar for a reason or just as an
example? Because I don't see a difference between the different wakeup
loops?
It needs to be a barrier per iteration.

Currently the loop looks like
while (head != NULL)
{
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}

Consider what happens if either the compiler or the cpu reorders this
to:
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);

as soon as lwWaiting = false, 'proc' can wake up and acquire a new
lock. Backends can wake up prematurely because proc->sem is used for
other purposes than this (that's why the loops around PGSemaphoreLock
exist). Then it could reset lwWaitLink while acquiring a new lock. And
some processes wouldn't be woken up anymore.

The barrier it pairs with is the spinlock acquisition before
requeuing. To be more obviously correct we could add a read barrier
before
if (!proc->lwWaiting)
break;
but I don't think it's needed.
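
(In other words, the fix is a write barrier between the queue
manipulation and the store that publishes the wakeup. A sketch of the
corrected loop:)

while (head != NULL)
{
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    /* make the stores above visible before the waiter can run again */
    pg_write_barrier();
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}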

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-08 Thread Robert Haas
On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote:
 2) Implement the wait free LW_SHARED algorithm.

+ * too high for workloads/locks that were locked in shared mode very

s/locked/taken/?

+ * frequently. Often we were spinning in the (obviously exlusive) spinlock,

exclusive.

+ * acquiration for locks that aren't exclusively locked.

acquisition.

+ * For exlusive lock acquisition we use an atomic compare-and-exchange on the

exclusive.

+ * lockcount variable swapping in EXCLUSIVE_LOCK/1<<31-1/0x7FFFFFFF if and only

Add comma after variable.  Find some way of describing the special
value (maybe a sentinel value, EXCLUSIVE_LOCK) just once, instead of
three times.

+ * if the current value of lockcount is 0. If the swap was not successfull, we

successful.

+ * by 1 again. If so, we have to wait for the exlusive locker to release the

exclusive.

+ * The attentive reader probably might have noticed that naively doing the

probably might is redundant.  Delete probably.

+ * notice that we have to wait. Unfortunately until we have finished queuing,

until -> by the time

+ *   Phase 2: Add us too the waitqueue of the lock

too -> to.  And maybe us -> ourselves.

+ *get queued in Phase 2 and we can wake them up if neccessary or they will

necessary.

+ * When acquiring shared locks it's possible that we disturb an exclusive
+ * waiter. If that's a problem for the specific user, pass in a valid pointer
+ * for 'potentially_spurious'. Its value will be set to true if we possibly
+ * did so. The caller then has to handle that scenario.

"disturb" is not clear enough.

+/* yipeyyahee */

Although this will be clear to individuals with a good command of
English, I suggest avoiding such usages.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 6:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I change my view on this, after some more thought. (Hope that helps)

Great.

 If we implement MERGE, I can see we may also wish to implement MERGE
 CONCURRENTLY one day. That would be different to UPSERT.

 So in the future I think we will need 3 commands

 1. MERGE
 2. MERGE CONCURRENTLY
 3. UPSERT

 So we no longer need to have the command start with the MERGE keyword.

As I've outlined, I don't see how MERGE CONCURRENTLY could ever work,
but I'm glad that you agree that it should not block this effort (or
indeed, some later effort to implement a MERGE that is comparable to
the implementations of other database systems).

 We will one day have MERGE according to the SQL Standard.

Agreed.

 My opinion is that syntax for this should be similar to MERGE in the
 *body* of the command, rather than some completely different syntax.
 e.g.

 WHEN NOT MATCHED THEN
   INSERT
 WHEN MATCHED THEN
  UPDATE

 I'm happy that we put that to a vote on what the syntax should be, as
 long as we bear in mind that we will one day have MERGE as well.

While I am also happy with taking a vote, if we do so I vote against
even this much less MERGE-like syntax. It's verbose, and makes much
less sense when the mechanism is driven by would-be duplicate key
violations rather than an outer join. I also like that when you UPSERT
with the proposed ON CONFLICT UPDATE syntax, you get all the
flexibility of an INSERT - you can use data-modifying CTEs, and nest
the statement in a data-modifying CTE, and INSERT ... SELECT... ON
CONFLICT UPDATE ... . And to be honest, it's much simpler to
implement this whole feature as an adjunct to how INSERT statements
are currently processed (during parse analysis, planning and
execution); I don't want to make the syntax work against that. For
example, consider how little I had to change the grammar to make all
of this work:

$ git diff master --stat | grep gram
 src/backend/parser/gram.y  |  72 ++-

The code footprint of this patch is relatively small, and I think we
can all agree that that's a good thing.
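
To illustrate the composability point, a sketch under the proposed
spelling (CONFLICTING() and the exact clause names are still being
debated, and both tables are invented):

WITH moved AS (
    DELETE FROM pending_upserts RETURNING id, quantity
)
INSERT INTO totals (id, quantity)
SELECT id, quantity FROM moved
ON CONFLICT UPDATE
    SET quantity = quantity + CONFLICTING(quantity);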

-- 
Peter Geoghegan




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-08 Thread Petr Jelinek

On 08/10/14 21:03, Robert Haas wrote:

On Wed, Oct 8, 2014 at 10:09 AM, Andres Freund and...@2ndquadrant.com wrote:


WRT my complaint in the other subthread, I think PQsendMethods should
just be renamed to PQcommMethods or similar. That'd, in combination with
a /* at some point we might want to add methods for receiving data here
*/ looks like it'd already satisfy my complaint ;)


Well, I'm happy enough to go ahead and change that, if that's all it
takes to make you happy.  Is it?

So far the status of these patches AIUI is:

#1 - Just committed, per discussion last week.
#2 - No comments, possibly because it's pretty boring.


Yeah there is nothing much to say there, it can just go in.


#4 - No comments.


I think I said I like it which I meant as I think it's good as it is, no 
objections to committing that one either.



#6 - Updated per Amit's feedback.  Not sure if anyone else wants to
review.  Still needs docs.



I'd like to see this one when it's updated, since I lost track a little 
of how exactly the changes look.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote:
 I think the problem is that it's not possible to respect the usual
 guarantees even at READ COMMITTED level when performing an INSERT OR
 UPDATE operation (however spelled).  You may find that there's a tuple
 with the same PK which is committed but not visible to the snapshot
 you took at the beginning of the statement.

Can you please comment on this, Kevin? It would be nice to converge on
an agreement on syntax here (without necessarily working out the exact
details right away, including for example the exact spelling of what
I'm calling WITHIN). Since Simon has changed his mind on MERGE (while
still wanting to retain a superficial flavor of MERGE,  a flavor that
does not include the MERGE keyword), I think that leaves you as the
only advocate for the MERGE syntax.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Petr Jelinek

On 08/10/14 11:28, Peter Geoghegan wrote:

On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote:

But the
MERGE syntax, to me, strongly implies that insertion doesn't begin
before determining whether a conflict exists or not.


I think you're right. Another strike against the MERGE syntax, then,
since as I said we cannot even know what to check prior to having
before row insert triggers fire.



True, but to me it also seems to be a strike against using INSERT for this 
as I don't really see how you can make triggers work in a sane way if 
the UPSERT is implemented as part of INSERT (at least I haven't seen any 
proposal that I would consider sane from the user point of view).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Tue, Oct 7, 2014 at 5:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
 IIRC it wasn't agreed that we needed to identify which indexes in the
 upsert SQL statement itself, since this would be possible in other
 ways and would require programmers to know which unique constraints
 are declared.

 Kevin seemed quite concerned about that. That is something that seems
 hard to reconcile with supporting the MERGE syntax. Perhaps Kevin can
 comment on that, since he was in favor of both being able to specify
 user intent by accepting a unique index, while also being in favor of
 the MERGE syntax.

Well, I mostly wanted to make sure we properly considered what the
implications were of using the standard syntax without other
keywords or decorations before deciding to go the non-standard
route.  In spite of an alarming tendency for people to assume that
meant that I didn't understand the desired semantics, I feel enough
people have understood the question and weighed in in favor of an
explicit choice between semantics, rather than inferring
concurrency handling based on the availability of the index
necessary for the slicker behavior.  I'm willing to concede that
overall consensus is leaning toward the view that UPSERT semantics
should be conditioned on explicit syntax; I'll drop that much going
forward.

Granting that, I will say that I lean toward either the MERGE
syntax with CONCURRENTLY being the flag to use UPSERT semantics, or
a separate UPSERT command which is as close to identical to the
MERGE syntax (other than the opening verb) as possible.  I see that
as still needing the ON clause so that you can specify which values
match which columns from the target table.  I'm fine with starting
with the syntax in the standard, which has no DELETE or IGNORE
options (as of the latest version I've seen).  So the syntax I'm
suggesting is close to what Simon is suggesting, but a more
compliant form would be:

MERGE CONCURRENTLY INTO foo
  USING (VALUES (valuelist) aliases)
  ON (conditions)
  WHEN NOT MATCHED
INSERT [ (columnlist) ] VALUES (valuelist)
  WHEN MATCHED
UPDATE SET colname = expression [, ...]

Rather than pseudo-randomly picking a unique index or using a
constraint or index name, the ON condition would need to allow
matching based on equality to all columns of a unique index which
only referenced NOT NULL columns; we would pick an index which
matched those conditions.  In any event, the unique index would be
required if CONCURRENTLY was specified.  Using column matching to
pick the index (like we do when specifying a FOREIGN KEY
constraint) is more in keeping with other SQL statements, and seems
generally safer to me.  It would also make it fairly painless for
people to switch concurrency techniques for what is, after all,
exactly the same operation except for differences in handling of
concurrent conflicting DML.
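
(Illustrating that matching rule with an invented table: the ON clause
equality-matches every column of a unique index over NOT NULL columns,
so the index can be chosen implicitly:)

CREATE TABLE foo (
    id       int NOT NULL,
    quantity int NOT NULL
);
CREATE UNIQUE INDEX foo_id_idx ON foo (id);

-- ON (t.id = x.id) covers every column of foo_id_idx, and those columns
-- are all NOT NULL, so CONCURRENTLY could rely on that index.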

As I said, I'm also OK with using UPSERT in place of MERGE
CONCURRENTLY.

I also feel that if we could allow:

  USING (VALUES (valuelist) [, ...])

that would be great.  In fact, I don't see why that can't be pretty
much any relation, but it doesn't have to be for a first cut.  A
relation would allow a temporary table to be loaded with a batch of
rows where the intent is to UPSERT every row in the batch, without
needing to write a loop to do it.
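
(A sketch of that batch pattern under this proposed, unimplemented
spelling:)

CREATE TEMP TABLE batch (id int NOT NULL, quantity int NOT NULL);
COPY batch FROM STDIN;

MERGE CONCURRENTLY INTO foo t
  USING batch x
  ON (t.id = x.id)
  WHEN NOT MATCHED
    INSERT (id, quantity) VALUES (x.id, x.quantity)
  WHEN MATCHED
    UPDATE SET quantity = t.quantity + x.quantity;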

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Simon Riggs
On 8 October 2014 21:16, Peter Geoghegan p...@heroku.com wrote:

 My opinion is that syntax for this should be similar to MERGE in the
 *body* of the command, rather than some completely different syntax.
 e.g.

 WHEN NOT MATCHED THEN
   INSERT
 WHEN MATCHED THEN
  UPDATE

 I'm happy that we put that to a vote on what the syntax should be, as
 long as we bear in mind that we will one day have MERGE as well.

 While I am also happy with taking a vote, if we do so I vote against
 even this much less MERGE-like syntax. It's verbose, and makes much
 less sense when the mechanism is driven by would-be duplicate key
 violations rather than an outer join.

It wouldn't be driven by an outer join; not sure where that comes from.

MERGE is verbose, I agree. I don't always like the SQL Standard, I
just think we should follow it as much as possible. You can't change
the fact that MERGE exists, so I don't see a reason to have two
variants of syntax that do roughly the same thing.

MERGE syntax would allow many things, such as this...
WHEN NOT MATCHED AND x  7 THEN
  INSERT
WHEN NOT MATCHED THEN
  INSERT
WHEN MATCHED AND y = 5 THEN
  DO NOTHING
WHEN MATCHED THEN
 UPDATE

etc

 I also like that when you UPSERT
 with the proposed ON CONFLICT UPDATE syntax, you get all the
 flexibility of an INSERT - you can use data-modifying CTEs, and nest
 the statement in a data-modifying CTE, and INSERT ... SELECT... ON
 CONFLICT UPDATE ... . And to be honest, it's much simpler to
 implement this whole feature as an adjunct to how INSERT statements
 are currently processed (during parse analysis, planning and
 execution); I don't want to make the syntax work against that.

I spoke to someone today who preferred a new command keyword, like
UPSERT, because the semantics of triggers are weird. Having a before
insert trigger fire when there is no insert is quite strange. Properly
documenting that on hackers would help; have the comments made on the
Django list been replayed here in some form?

I'm very scared by your comments about data-modifying CTEs etc. You
have no definition of how they will work, nor tests of that. That part
isn't looking like a benefit as things currently stand.

I'm still waiting for some more docs to describe your intentions so
they can be reviewed.

Also, it would be useful to hear that you're going to do something
about the references to rows using conflicting(), since nobody has
agreed with you there. Or hopefully even that you've listened and
implemented something differently already. (We need that, whatever we
do with other elements of syntax.)

Overall, I'm not seeing too many comments that indicate you are
accepting review comments and acting upon them. If you want acceptance
from others, you need to begin with some yourself.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote:

 I think the problem is that it's not possible to respect the usual
 guarantees even at READ COMMITTED level when performing an INSERT OR
 UPDATE operation (however spelled).  You may find that there's a tuple
 with the same PK which is committed but not visible to the snapshot
 you took at the beginning of the statement.

 Can you please comment on this, Kevin? It would be nice to converge on
 an agreement on syntax here

Robert said however spelled -- which I take to mean that he at
least sees that the MERGE-like UPSERT syntax can be turned into the
desired semantics.  I have no idea why anyone would think otherwise.

Although the last go-around does suggest that there is at least one
point of difference on the semantics.  You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE.  That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.  That doesn't
seem as clean to me as saying (for example):

UPSERT targettable t
  USING (VALUES (123, 100)) x(id, quantity)
  ON t.id = x.id
  WHEN NOT MATCHED
INSERT (id, quantity, inserted_at) VALUES (x.id, x.quantity, now())
  WHEN MATCHED
UPDATE SET quantity = t.quantity + x.quantity, updated_at = now();

As I understand it you are proposing that would be:

INSERT INTO targettable(key, quantity, inserted_at)
  VALUES(123, quantity, now())
  ON CONFLICT WITHIN targettable_pkey
UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now();

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] What exactly is our CRC algorithm?

2014-10-08 Thread Andres Freund
On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:
 Hmm. So the simple, non-table driven, calculation gives the same result as
 using the lookup table using the reflected lookup code. That's expected; the
 lookup method is supposed to be the same, just faster. However, using the
 normal lookup code, but with a reflected lookup table, produces the same
 result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But
 AFAICS, that's an incorrect combination. You're supposed to use the
 non-reflected lookup table with the non-reflected lookup code; you can't mix
 and match.
 
 As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't
 correspond to any bit-by-bit CRC variant and polynomial. My math skills are
 not strong enough to determine what the consequences of that are. It might
 still be a decent checksum. Or not. I couldn't tell if the good error
 detection properties of the normal CRC-32 polynomial apply to our algorithm
 or not.

Additional interesting datapoints are that hstore and ltree contain the
same tables - but properly use the reflected computation.
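
(For reference, the two bit-by-bit variants being mixed up: a reflected
update shifts right with the reflected polynomial, a normal one shifts
left with the normal polynomial. This is textbook CRC-32, not Postgres
code:)

#include <stdint.h>

/* reflected (LSB-first): polynomial 0xEDB88320 */
static uint32_t crc32_reflected(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320 : 0);
    return crc;
}

/* normal (MSB-first): polynomial 0x04C11DB7 */
static uint32_t crc32_normal(uint32_t crc, uint8_t b)
{
    crc ^= ((uint32_t) b) << 24;
    for (int i = 0; i < 8; i++)
        crc = (crc << 1) ^ ((crc & 0x80000000) ? 0x04C11DB7 : 0);
    return crc;
}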

 Thoughts?

It clearly seems like a bad idea to continue with this - I don't think
anybody here knows which guarantees this gives us.

The question is how can we move away from this. There's unfortunately
two places that embed PGC32 that are likely to prove problematic when
fixing the algorithm: pg_trgm and tsgist both seem to include crc's in
their logic in a persistent way.  I think we should provide
INIT/COMP/FIN_PG32 using the current algorithm for these.

If we're switching to a saner computation, we should imo also switch to
a better polynomial - CRC-32C has better error detection capabilities than
CRC32 and is available in hardware. As we're paying the price of
breaking compat anyway...

Arguably we could also say that given that there's been little evident
problems with the borked computation we could also switch to a much
faster hash instead of continuing to use crc...
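
To make the mix-and-match point concrete, here is a minimal sketch of the
two table-driven update steps (illustrative code, not PostgreSQL source;
crc_table[] stands in for a 256-entry table generated for the chosen
polynomial). The borked computation is, in effect, the normal step run
against a reflected table:

#include <stdint.h>

/* normal (MSB-first) step: only a true CRC with a non-reflected table */
static inline uint32_t
crc32_step_normal(const uint32_t crc_table[256], uint32_t crc, uint8_t b)
{
	return (crc << 8) ^ crc_table[((crc >> 24) ^ b) & 0xFF];
}

/* reflected (LSB-first) step: only a true CRC with a reflected table */
static inline uint32_t
crc32_step_reflected(const uint32_t crc_table[256], uint32_t crc, uint8_t b)
{
	return (crc >> 8) ^ crc_table[(crc ^ b) & 0xFF];
}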

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-10-08 Thread Alvaro Herrera
About patch 1, which I just pushed, there were two reviews:

Andres Freund wrote:
 On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote:
  diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h
  new file mode 100644
  index 000..520b066
  --- /dev/null
  +++ b/src/include/utils/ruleutils.h

 I wondered for a minute whether any of these are likely to cause
 problems for code just including builtins.h - but I think there will be
 sufficiently few callers for them to make that not much of a concern.

Great.

Michael Paquier wrote:

 Patch 1: I still like this patch as it gives a clear separation of the
 built-in functions and the sub-functions of ruleutils.c that are
 completely independent. Have you considered adding the external
 declaration of quote_all_identifiers as well? It is true that this
 impacts extensions (some of my stuff as well), but my point is to bite
 the bullet and make the separation cleaner between builtins.h and
 ruleutils.h. Attached is a patch that can be applied on top of patch 1
 doing so... Feel free to discard for the potential breakage this would
 create though.

Yes, I did consider moving the quoting function definitions and the
global out of builtins.h.  It is much more disruptive, however, because
it affects a lot of external code, not only our own; and given the looks
I got after people had to mess with adding the htup_details.h header to
external projects due to the refactoring I did to htup.h in an earlier
release, I'm not sure about it.

However, if I were to do it, I would instead create a quote.h file and
would also add the quote_literal_cstr() prototype to it, perhaps even
move the implementations from their current ruleutils.c location to
quote.c.  (Why half the stuff is in ruleutils.c rather than quote.c is
beyond me.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com 
wrote:

 Instead of naming the index, you should name the columns, and
 the system can look up the index or indexes that match those
 columns.

+1

That is what I have been consistently requesting instead of index
names, every time I notice it mentioned.

 It's not totally clear that we need *any* WITHIN clause, BTW.
 I'm not dead set on it. It was something I mainly added at
 Kevin's request. I do see the risks, though.

What I said was in response to your assertion that your technique
would *never* generate a duplicate key error.  As others have again
been pointing out, when there is more than one unique index you can
have rows to apply which cannot be applied accurately without
causing such an error.  What I requested was that the behavior in
those cases be clear and documented.  I didn't take a position on
whether you pick an index or ignore the input row (with a
warning?), but we need to decide how it is handled.  I have made
the same point Heikki is making, though -- we have no business
referencing an index name in the statement.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-08 Thread Stephen Frost
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
 On Tue, Oct 7, 2014 at 1:24 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I hope we can get pgAudit in as a module for 9.5. I also hope that it
  will stimulate the requirements/funding of further work in this area,
  rather than squash it. My feeling is we have more examples of feature
  sets that grow over time (replication, view handling, hstore/JSONB
  etc) than we have examples of things languishing in need of attention
  (partitioning).
 
 +1

To this point, specifically, I'll volunteer to find time in November to
review pgAudit for inclusion as a contrib module.  I feel a bit
responsible for it not being properly reviewed earlier due to my upgrade
and similar concerns.

Perhaps the latest version should be posted and added to the commitfest
for 2014-10 and I'll put myself down as a reviewer..?  I don't see it
there now.  I don't mean to be dismissive by suggesting it be added to
the commitfest- I honestly don't see myself having time before November
given the other things I'm involved in right now and pgConf.eu happening
in a few weeks.

Of course, if someone else has time to review, please do.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Promise index tuples for UPSERT

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 2:50 PM, Kevin Grittner kgri...@ymail.com wrote:
 What I said was in response to your assertion that your technique
 would *never* generate a duplicate key error.

That is strictly true: the INSERT cannot raise a unique violation
error, because to do so would cause it to take the UPDATE path
(assuming there is no WITHIN clause). The UPDATE may get one, though.
And it doesn't have to get one for your statement to have surprising
effects.

 As others have again
 been pointing out, when there is more than one unique index you can
 have rows to apply which cannot be applied accurately without
 causing such an error.

It's simpler than that. You want to make sure that the right condition
- the right unique index having a would-be duplicate violation - is
the one taken *in practice*, the condition on which the UPDATE path is
taken. What happens after that is not that interesting (e.g. whether
or not there is an UPDATE-path duplicate violation), because either
way it's broken.
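
To make that concrete, consider a sketch (hypothetical table; ON CONFLICT
UPDATE and CONFLICTING() as proposed in this patch series):

CREATE TABLE accounts (
    id    int  PRIMARY KEY,
    email text UNIQUE
);

-- Suppose (1, 'a@example.com') would conflict with one existing row on id
-- and with a *different* existing row on email.  Whichever conflict is
-- taken in practice decides which row the UPDATE path targets -- and
-- whether the UPDATE then hits a duplicate violation on the other index.
INSERT INTO accounts (id, email)
VALUES (1, 'a@example.com')
ON CONFLICT UPDATE SET email = CONFLICTING(email);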

 What I requested was that the behavior in
 those cases be clear and documented.  I didn't take a position on
 whether you pick an index or ignore the input row (with a
 warning?), but we need to decide how it is handled.  I have made
 the same point Heikki is making, though -- we have no business
 referencing an index name in the statement.

I think that's fair enough. That's all fine - but actually doing that
is quite tricky. Look at what I've said in relation to doing that.
Consider the corner-cases I name. They're certainly only a small
minority of cases in practice, but as an implementer I still need to
worry about them (maybe even just as much). Yes, I would like to just
name columns, but it's hard to make that fully general.

If you want me to do what you say, fine. But in order to do that, I
need support for having the corner cases make parse analysis throw up
its hands and error. Is that a price you're willing to pay? If so,
then I'll implement it. Or, alternatively, we could do WITHIN PRIMARY
KEY/NOT WITHIN PRIMARY KEY.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What exactly is our CRC algorithm?

2014-10-08 Thread Gavin Flower

On 09/10/14 10:13, Andres Freund wrote:

On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:

Hmm. So the simple, non-table driven, calculation gives the same result as
using the lookup table using the reflected lookup code. That's expected; the
lookup method is supposed to be the same, just faster. However, using the
normal lookup code, but with a reflected lookup table, produces the same
result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But
AFAICS, that's an incorrect combination. You're supposed to use the
non-reflected lookup table with the non-reflected lookup code; you can't mix
and match.

As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't
correspond to any bit-by-bit CRC variant and polynomial. My math skills are
not strong enough to determine what the consequences of that are. It might
still be a decent checksum. Or not. I couldn't tell if the good error
detection properties of the normal CRC-32 polynomial apply to our algorithm
or not.

Additional interesting datapoints are that hstore and ltree contain the
same tables - but properly use the reflected computation.


Thoughts?

It clearly seems like a bad idea to continue with this - I don't think
anybody here knows which guarantees this gives us.

The question is how can we move away from this. There's unfortunately
two places that embed PGC32 that are likely to prove problematic when
fixing the algorithm: pg_trgm and tsgist both seem to include crc's in
their logic in a persistent way.  I think we should provide
INIT/COMP/FIN_PG32 using the current algorithm for these.

If we're switching to a saner computation, we should imo also switch to
a better polynomial - CRC-32C has better error detection capabilities than
CRC32 and is available in hardware. As we're paying the price of
breaking compat anyway...

Arguably we could also say that given that there's been little evident
problems with the borked computation we could also switch to a much
faster hash instead of continuing to use crc...

Greetings,

Andres Freund

Could a 64-bit variant of some kind be useful as an option - in addition
to a correct 32-bit one?

Most people have 64-bit processors, storage is less constrained
nowadays, and we tend to store much larger chunks of data.



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote:
 While I am also happy with taking a vote, if we do so I vote against
 even this much less MERGE-like syntax. It's verbose, and makes much
 less sense when the mechanism is driven by would-be duplicate key
 violations rather than an outer join.

 It wouldn't be driven by an outer join, not sure where that comes from.

Right, I understood that it wouldn't be - which is the point. So with
an UPSERT that retains influence from MERGE, NOT MATCHED means no
conflict, MATCHED means conflict. That just seems like an odd way
to spell the concept given, as you say, that we're not talking about
an outer join.

 MERGE is verbose, I agree. I don't always like the SQL Standard, I
 just think we should follow it as much as possible. You can't change
 the fact that MERGE exists, so I don't see a reason to have two
 variants of syntax that do roughly the same thing.

 MERGE syntax would allow many things, such as this...
 WHEN NOT MATCHED AND x > 7 THEN
   INSERT
 WHEN NOT MATCHED THEN
   INSERT
 WHEN MATCHED AND y = 5 THEN
   DO NOTHING
 WHEN MATCHED THEN
  UPDATE

 etc

But then you can have before row insert triggers fire, which as you
acknowledge is more surprising with this syntax.

 I spoke to someone today that preferred a new command keyword, like
 UPSERT, because the semantics of triggers are weird. Having a before
 insert trigger fire when there is no insert is quite strange. Properly
 documenting that on hackers would help; have the comments made on the
 Django list been relayed here in some form?

Yes. It's also mentioned in the commit message of CONFLICTING() (patch
0003-*). And the documentation (both the proposed INSERT
documentation, and the trigger documentation). There is a large
comment on it in the code. So I've said it many times.

 I'm very scared by your comments about data modifying CTEs etc.. You
 have no definition of how they will work, nor tests of that. That part
 isn't looking like a benefit as things currently stand.

Actually, I have a few smoke tests for that. But I don't see any need
for special handling. When you have a data-modifying CTE, it can
contain an INSERT, and there are no special restrictions on that
INSERT (other than that it may not itself have a CTE, but that's true
more generally). You can have data-modifying CTEs containing INSERTs,
and data-modifying CTEs containing UPDATEs - what I've done is have
data-modifying CTEs contain INSERTs that also happen to have an ON
CONFLICT UPDATE clause.

This new clause of INSERTs is in no more need of special documentation
regarding interactions with data-modifying CTEs than UPDATE ... WHERE
CURRENT OF is. The only possible exception I can think of would be
cardinality violations where a vanilla INSERT in one part of a command
(one data-modifying CTE) gives problems to the UPSERT part of the
same command (because we give a special cardinality violation message
when we try to update the same tuple twice in the same command). But
that's a pretty imaginative complaint, and I doubt it would really
surprise someone.

Why would you be surprised by the fact that a new clause for INSERT
plays nicely with existing clauses? It's nothing special - there is no
special handling.
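
For illustration, one of those smoke tests looks roughly like this
(hypothetical tables; syntax as proposed in this patch series):

WITH moved AS (
    DELETE FROM staging_counts
    RETURNING key, quantity
)
INSERT INTO total_counts (key, quantity)
SELECT key, quantity FROM moved
ON CONFLICT UPDATE SET quantity = quantity + CONFLICTING(quantity);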

 I'm still waiting for some more docs to describe your intentions so
 they can be reviewed.

I think it would be useful to add several more isolation tests,
highlighting some of the cases you talked about. I'll work on that.
While the way forward for WITHIN isn't clear, I think a WITHIN PRIMARY
KEY variant would certainly be useful. Maybe it would be okay to
forget about naming a specific unique index, while supporting an
(optional) WITHIN PRIMARY KEY/NOT WITHIN PRIMARY KEY. It doesn't
totally solve the problems, but may be a good compromise that mostly
satisfies people that want to be able to clearly indicate user intent
(Kevin, in particular), and satisfies other people that don't want to
name a unique index (Heikki, in particular). Certainly, the Django
people would like that, since they said as much.

 Also, it would be useful to hear that your're going to do something
 about the references to rows using conflicting(), since nobody has
 agreed with you there. Or hopefully even that you've listened and
 implemented something differently already. (We need that, whatever we
 do with other elements of syntax).

Do you really expect me to do major work on some aspect of the syntax
like that, given, as you say, that nobody explicitly agreed with me
(and only you disagreed with me)? The only remark I heard on that was
from you (you'd prefer to use NEW.* and OLD.*). But you spent much
more time talking about getting something MERGE-like, which
NEW.*/OLD.* clearly isn't.

CONFLICTING() is very close (identical?) to MySQL's use of ON
DUPLICATE KEY UPDATE val = VALUES(val). I'm happy to discuss that,
but it's news to me that people take particular issue with it.
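
Side by side, on a hypothetical table t (the first form is MySQL's actual
syntax; the second is the syntax proposed here):

-- MySQL:
INSERT INTO t (id, val) VALUES (1, 10)
ON DUPLICATE KEY UPDATE val = VALUES(val);

-- proposed here:
INSERT INTO t (id, val) VALUES (1, 10)
ON CONFLICT UPDATE SET val = CONFLICTING(val);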

 Overall, I'm not seeing too many comments 

Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-08 Thread Andres Freund
  /*
 + * Arrange to remove a dynamic shared memory mapping at cleanup time.
 + *
 + * dsm_keep_mapping() can be used to preserve a mapping for the entire
 + * lifetime of a process; this function reverses that decision, making
 + * the segment owned by the current resource owner.  This may be useful
 + * just before performing some operation that will invalidate the segment
 + * for future use by this backend.
 + */
 +void
 +dsm_unkeep_mapping(dsm_segment *seg)
 +{
 + Assert(seg->resowner == NULL);
 + ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
 + seg->resowner = CurrentResourceOwner;
 + ResourceOwnerRememberDSM(seg->resowner, seg);
 +}

Hm, I dislike the name unkeep. I guess you want to be symmetric to
dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup()
dsm_remember_mapping()?
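
For context, here is how the pairing would look from a caller's perspective
(a sketch; dsm_unkeep_mapping() is the name under discussion, the other
calls are the existing dsm.c API):

#include "postgres.h"
#include "storage/dsm.h"

static void
use_segment_across_transactions(dsm_handle handle)
{
	dsm_segment *seg = dsm_attach(handle);

	/* detach from the resource owner; mapping now survives until exit */
	dsm_keep_mapping(seg);

	/* ... use the segment across transaction boundaries ... */

	/* hand the mapping back to CurrentResourceOwner, then clean up */
	dsm_unkeep_mapping(seg);
	dsm_detach(seg);
}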

 From c835a06f20792556d35a0eee4c2fa21f5f23e8a3 Mon Sep 17 00:00:00 2001
 From: Robert Haas rh...@postgresql.org
 Date: Fri, 11 Jul 2014 09:53:40 -0400
 Subject: [PATCH 6/6] pg_background: Run commands in a background worker, and
  get the results.
 
 The currently-active GUC values from the user session will be copied
 to the background worker.  If the command returns a result set, you
 can retrieve the result set; if not, you can retrieve the command
 tags.  If the command fails with an error, the same error will be
 thrown in the launching process when the results are retrieved.
 Warnings and other messages generated by the background worker, and
 notifications received by it, are also propagated to the foreground
 process.

I've got to ask: why is it helpful to have this in contrib? I have a
good share of blame to bear for that, but I think we need to stop
diluting contrib evermore with test programs. These have a place, but I
don't think it should be contrib.

 +/* Table-of-contents constants for our dynamic shared memory segment. */
 +#define PG_BACKGROUND_MAGIC  0x50674267
 +#define PG_BACKGROUND_KEY_FIXED_DATA 0
 +#define PG_BACKGROUND_KEY_SQL  1
 +#define PG_BACKGROUND_KEY_GUC  2
 +#define PG_BACKGROUND_KEY_QUEUE  3
 +#define PG_BACKGROUND_NKEYS  4
 +
 +/* Fixed-size data passed via our dynamic shared memory segment. */
 +typedef struct pg_background_fixed_data
 +{
 + Oid database_id;
 + Oid authenticated_user_id;
 + Oid current_user_id;
 + int sec_context;
 + char database[NAMEDATALEN];
 + char authenticated_user[NAMEDATALEN];
 +} pg_background_fixed_data;

Why not NameData?

 +/* Private state maintained by the launching backend for IPC. */
 +typedef struct pg_background_worker_info
 +{
 + pid_t   pid;
 + dsm_segment *seg;
 + BackgroundWorkerHandle *handle; 
 + shm_mq_handle *responseq;   
 + boolconsumed;
 +} pg_background_worker_info;

whitespace damage.

 +static HTAB *worker_hash;

Hm. So the lifetime of this hash's contents is managed via
on_dsm_detach(), do I understand that correctly?

 +PG_FUNCTION_INFO_V1(pg_background_launch);
 +PG_FUNCTION_INFO_V1(pg_background_result);
 +PG_FUNCTION_INFO_V1(pg_background_detach);


 +void pg_background_worker_main(Datum);
 +
 +/*
 + * Start a dynamic background worker to run a user-specified SQL command.
 + */
 +Datum
 +pg_background_launch(PG_FUNCTION_ARGS)
 +{
 + text   *sql = PG_GETARG_TEXT_PP(0);
 + int32   queue_size = PG_GETARG_INT64(1);
 + int32   sql_len = VARSIZE_ANY_EXHDR(sql);
 + Sizeguc_len;
 + Sizesegsize;
 + dsm_segment *seg;
 + shm_toc_estimator e;
 + shm_toc*toc;
 +char*sqlp;

wrong indentation.

 + char   *gucstate;
 + shm_mq *mq;
 + BackgroundWorker worker;
 + BackgroundWorkerHandle *worker_handle;
 + pg_background_fixed_data *fdata;
 + pid_t   pid;
 + shm_mq_handle *responseq;
 + MemoryContext   oldcontext;
 +
 + /* Ensure a valid queue size. */
 + if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
 + ereport(ERROR,
 + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 +  errmsg("queue size must be at least %zu bytes",
 + shm_mq_minimum_size)));

Hm. So every user can do this once the extension is created as the
functions are most likely to be PUBLIC. Is this a good idea?


 + /* Wait for the worker to start. */
 + switch (WaitForBackgroundWorkerStartup(worker_handle, pid))
 + {
 + case BGWH_STARTED:
 + /* Success. */
 + break;
 + case BGWH_STOPPED:
 + pfree(worker_handle);
 + ereport(ERROR,
 + 
 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
 +  errmsg(could not start background 
 

Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 Although the last go-around does suggest that there is at least one
 point of difference on the semantics.  You seem to want to fire the
 BEFORE INSERT triggers before determining whether this will be an
 INSERT or an UPDATE.  That seems like a bad idea to me, but if the
 consensus is that we want to do that, it does argue for your plan
 of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.

 As I understand it you are proposing that would be:

 INSERT INTO targettable(key, quantity, inserted_at)
   VALUES(123, 100, now())
   ON CONFLICT WITHIN targettable_pkey
 UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = 
 now();

That's right, yes.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-08 Thread Michael Paquier
On Wed, Oct 8, 2014 at 11:38 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 10/08/2014 04:59 PM, Fujii Masao wrote:

 On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 Instead of creating any .done files during recovery, we could scan
 pg_xlog
 at promotion, and create a .done file for every WAL segment that's
 present
 at that point. That would be more robust. And then apply your patch, to
 recycle old segments during archive recovery, ignoring .done files.


 What happens if a user shutdowns the standby, removes recovery.conf and
 starts the server as the master?


 Um, that's not a safe thing to do anyway, is it?

That's not safe as it bypasses all the consistency checks of promotion.
Now, it is also something that repmgr, for example, does as far as I recall
to do a node promotion. What if we simply document the problem properly
then? The appearance of those phantom WAL files is scarier than a user
or a utility that does a promotion with a server restart. Not to mention
that users are free to add files to pg_xlog themselves.
-- 
Michael


Re: [HACKERS] Add CREATE support to event triggers

2014-10-08 Thread Michael Paquier
On Thu, Oct 9, 2014 at 6:26 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 About patch 1, which I just pushed, there were two reviews:

 Andres Freund wrote:
  On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote:
   diff --git a/src/include/utils/ruleutils.h
 b/src/include/utils/ruleutils.h
   new file mode 100644
   index 000..520b066
   --- /dev/null
   +++ b/src/include/utils/ruleutils.h

  I wondered for a minute whether any of these are likely to cause
  problems for code just including builtins.h - but I think there will be
  sufficiently few callers for them to make that not much of a concern.

 Great.

 Michael Paquier wrote:

  Patch 1: I still like this patch as it gives a clear separation of the
  built-in functions and the sub-functions of ruleutils.c that are
  completely independent. Have you considered adding the external
  declaration of quote_all_identifiers as well? It is true that this
  impacts extensions (some of my stuff as well), but my point is to bite
  the bullet and make the separation cleaner between builtins.h and
  ruleutils.h. Attached is a patch that can be applied on top of patch 1
  doing so... Feel free to discard for the potential breakage this would
  create though.

 Yes, I did consider moving the quoting function definitions and the
 global out of builtins.h.  It is much more disruptive, however, because
 it affects a lot of external code not only our own; and given the looks
 I got after people had to mess with adding the htup_details.h header to
 external projects due to the refactoring I did to htup.h in an earlier
 release, I'm not sure about it.

 However, if I were to do it, I would instead create a quote.h file and
 would also add the quote_literal_cstr() prototype to it, perhaps even
 move the implementations from their current ruleutils.c location to
  quote.c.  (Why half the stuff is in ruleutils.c rather than quote.c is
 beyond me.)

Yes, an extra header file would be cleaner. Now if we are worried about
creating many incompatibilities with existing extensions (IMO that's not
something core should worry much about), then it is better not to do it.
Now, if I can vote on it, I think it is worth doing in order to have
builtins.h contain only Datum foo(PG_FUNCTION_ARGS), and move all the
quote-related things into quote.c.
-- 
Michael


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Marti Raudsepp
On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 Although the last go-around does suggest that there is at least one
 point of difference on the semantics.  You seem to want to fire the
 BEFORE INSERT triggers before determining whether this will be an
 INSERT or an UPDATE.  That seems like a bad idea to me

Indeed, the current behavior breaks even the canonical "keep track of
how many posts are in a thread" trigger example, because INSERT
triggers are fired for an effective row UPDATE. I can't see how that's
acceptable.
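
The sort of trigger I mean, as a sketch (table and function names are
hypothetical):

CREATE FUNCTION bump_post_count() RETURNS trigger AS $$
BEGIN
    UPDATE threads SET n_posts = n_posts + 1 WHERE id = NEW.thread_id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_count AFTER INSERT ON posts
    FOR EACH ROW EXECUTE PROCEDURE bump_post_count();

-- if the upsert takes the UPDATE path but still fires INSERT triggers,
-- n_posts is incremented even though no new post was added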

 Well, it isn't that I'm doing it because I think that it is a great
 idea, with everything to recommend it. It's more like I don't see any
 practical alternative.

I proposed an alternative that avoids this surprise and might allow
some other benefits. Can you please look into that? Even a "clarify
this", "this couldn't possibly work" or "you're not a major
contributor so your arguments are invalid" response. ;)

I admit I initially misunderstood some details about the current
behavior. I could dig into the code and write a clearer and/or more
detailed spec if I had even *some* feedback.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-08 Thread Ian Barwick
On 14/10/09 7:06, Stephen Frost wrote:
 * Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
 On Tue, Oct 7, 2014 at 1:24 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I hope we can get pgAudit in as a module for 9.5. I also hope that it
 will stimulate the requirements/funding of further work in this area,
 rather than squash it. My feeling is we have more examples of feature
 sets that grow over time (replication, view handling, hstore/JSONB
 etc) than we have examples of things languishing in need of attention
 (partitioning).

 +1
 
 To this point, specifically, I'll volunteer to find time in Novemeber to
 review pgAudit for inclusion as a contrib module.  I feel a bit
 responsible for it not being properly reviewed earlier due to my upgrade
 and similar concerns.
 
 Perhaps the latest version should be posted and added to the commitfest
 for 2014-10 and I'll put myself down as a reviewer..?  I don't see it
 there now.  I don't mean to be dismissive by suggesting it be added to
 the commitfest- I honestly don't see myself having time before November
 given the other things I'm involved in right now and pgConf.eu happening
 in a few weeks.

Thanks :) We're updating pgAudit for submission to the upcoming commitfest;
it will be added within the next few days.

Regards


Ian Barwick


-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 4:30 PM, Marti Raudsepp ma...@juffo.org wrote:
 On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 Although the last go-around does suggest that there is at least one
 point of difference on the semantics.  You seem to want to fire the
 BEFORE INSERT triggers before determining whether this will be an
 INSERT or an UPDATE.  That seems like a bad idea to me

 Indeed, the current behavior breaks even the canonical "keep track of
 how many posts are in a thread" trigger example, because INSERT
 triggers are fired for an effective row UPDATE. I can't see how that's
 acceptable.

DB2 had the foresight to add the following restrictions to BEFORE ROW
triggers - they cannot:


- Contain any INSERT, DELETE, or UPDATE operations, nor invoke any
  routine defined with MODIFIES SQL DATA, if it is not a compound SQL.
- Contain any DELETE or UPDATE operations on the trigger subject table,
  nor invoke any routine containing such operations, if it is a compound
  SQL.
- Reference a materialized query table defined with REFRESH IMMEDIATE.
- Reference a generated column other than the identity column in the NEW
  transition variable.


To get a sense of how doing fancy things in before triggers leads to
trouble, consider the hardening Kevin added in commit 6868ed74. In
short, use an AFTER trigger for this kind of thing, and all of these
issues go away.

 Well, it isn't that I'm doing it because I think that it is a great
 idea, with everything to recommend it. It's more like I don't see any
 practical alternative.

 I proposed an alternative that avoids this surprise and might allow
 some other benefits. Can you please look into that?

I looked at that. You're talking about making the case where before
row triggers clearly are appropriate (when you just want to change the
tuple being inserted) fail, in order to make an arguably inappropriate
case for before row triggers work (when you don't change the tuple to
be inserted, but you do want to do some other thing). That seems
backwards. My proposed behavior seems far less surprising.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Robert Haas
On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote:
 So, what makes it work for me (among other unrelated stuff) seems to be
 the following in .gdbinit, defineing away some things that gdb doesn't
 handle:
 macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
 macro define __extension__
 macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)

 Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about
 postgres' macros. At least if you're in the right scope.

 As an example, the following works:
 (gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend,
 elem, &BackendList)

Ah, cool.  I'll try that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-08 Thread Fabrízio de Royes Mello
On Wed, Oct 8, 2014 at 2:55 AM, David Fetter da...@fetter.org wrote:

 On Wed, Oct 08, 2014 at 12:41:46AM -0300, Fabrízio de Royes Mello wrote:
  On Wed, Oct 8, 2014 at 12:36 AM, Josh Berkus j...@agliodbs.com wrote:
  
   On 10/07/2014 09:44 AM, Fabrízio de Royes Mello wrote:
    We can think of a mechanism to create properties / options and assign
    them to objects (table, index, column, schema, ...) like COMMENT does.
   
A quickly thought:
   
CREATE OPTION [ IF NOT EXISTS ] name
VALIDATOR valfunction
[ DEFAULT value ];
   
ALTER TABLE name
SET OPTION optname { TO | = } { value | 'value' | DEFAULT };
   
It's just a simple thought of course. We must think better about the
  syntax
and purposes.
  
   If we're going to do this (and it seems like a good idea), we really,
   really, really need to include some general system views which expose
   the options in a user-friendly format (like columns, JSON or an
array).
It's already very hard for users to get information about what
   reloptions have been set.
  
 
  Maybe into information_schema ??

 Not there.  The information schema is defined pretty precisely in the
 SQL standard and contains only some of this kind of information.


You're absolutely right.. thanks...


 pg_catalog seems like a much more appropriate space for
 PostgreSQL-specific settings.


We can add views and some functions to get this info too.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Marti Raudsepp
On Thu, Oct 9, 2014 at 2:46 AM, Peter Geoghegan p...@heroku.com wrote:
 Indeed, the current behavior breaks even the canonical keep track of
 how many posts are in a thread trigger example
 use an AFTER trigger for this kind of thing, and all of these
 issues go away.

In the latest patches from the CommitFest, the UPDATE case invokes
BEFORE/AFTER INSERT triggers, not AFTER UPDATE. Is that going to be
changed? If so, I concede that.

 I proposed an alternative that avoids this surprise and might allow
 some other benefits. Can you please look into that?

 I looked at that.

Thanks!

 You're talking about making the case where before
 row triggers clearly are appropriate (when you just want to change the
 tuple being inserted) fail

Only in case the trigger changes *key* columns necessary for atomicity
(i.e. from the WITHIN index). Other columns are fair game. The
restriction seems justifiable to me: it's unreasonable to be atomic
with respect to values that change mid-way.

* It can distinguish between BEFORE INSERT and BEFORE UPDATE triggers,
which your approach fundamentally cannot.
* It can probably avoid evaluating column defaults in the UPDATE case,
like nextval() for unrelated serial columns = reduced overhead
* The semantics match better with MERGE-like syntax that seems to come
up again and again.
* And it appears to solve (part?) of the problem you had with naming
key *columns* in the WITHIN clause, rather than using the index name.

If you don't see any reasons why it can't be done, these benefits seem
clear to me. I think the tradeoffs at least warrant wider discussion.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 6:12 PM, Marti Raudsepp ma...@juffo.org wrote:
 Oh, one more consideration: I believe you will run into the same issue
 if you want to implement BEFORE UPDATE triggers in any form. Skipping
 BEFORE UPDATE entirely seems to violate POLA.

Good thing that the patch doesn't do that, then. I clearly documented
this in a few places, including:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html

 It's common for applications to e.g. use triggers to keep track of
 latest modified time for a row. With your proposal, every query needs
 to include logic for that to work.

Wrong.

 If you don't see any reasons why it can't be done, these benefits seem
 clear to me. I think the tradeoffs at least warrant wider discussion.

 I don't.  That's very surprising. One day, it will fail unexpectedly.
 As proposed, the way BEFORE INSERT triggers fire almost forces users
 to consider the issues up-front.

 Not necessarily up-front, as proposed it causes existing triggers to
 change behavior when users adopt the upsert feature. And that adoption
 may even be transparent to the user due to ORM magic.

 There are potential surprises with both approaches.

When you make the slightest effort to understand what my approach is,
I might take your remarks seriously.

 Note that the CONFLICTING() behavior with respect to BEFORE INSERT
 triggers works the same as MySQL's INSERT ON DUPLICATE KEY UPDATE
 foo = VALUES(foo) thing. There was agreement that that was the right
 behavior, it seemed.

 MySQL gets away with lots of things, they have several other caveats
 with triggers. I don't think it's a good example to follow wrt trigger
 behavior.

No true Scotsman.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Marti Raudsepp
On Thu, Oct 9, 2014 at 4:56 AM, Marti Raudsepp ma...@juffo.org wrote:
 create trigger ev1 before insert on evt_type execute procedure ins();
 create trigger ev2 before update on evt_type execute procedure upd();
 create trigger ev3 after insert on evt_type execute procedure ins();
 create trigger ev4 after update on evt_type execute procedure upd();

Oops, I missed for each row here.
Sorry for wasting your time.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 7:04 PM, Marti Raudsepp ma...@juffo.org wrote:
 Oops, I missed for each row here.

Note that I have an open item for what to do about statement level
triggers here: https://wiki.postgresql.org/wiki/UPSERT

I'm not satisfied with the current behavior. It needs more thought.
But I think row level triggers are fine.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-10-08 Thread Etsuro Fujita

(2014/10/08 22:51), Fujii Masao wrote:

On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:



On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:



Users would have to tune two parameters,
PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easier to have only one parameter,
PENDING_LIST_CLEANUP_SIZE?  How about setting PENDING_LIST_CLEANUP_SIZE
to work_mem as the default value when running the CREATE INDEX command?



So what about introducing pending_list_cleanup_size also as a GUC?
That is, users can set the threshold by using either the reloption or
GUC.



Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes using
the reloption.



OK, I'd vote for your idea of having both the GUC and the reloption. So, I
think the patch needs to be updated.  Fujii-san, what plan do you have about
the patch?



Please see the attached patch. In this patch, I introduced the GUC parameter,
pending_list_cleanup_size. I chose 4MB as the default value of the parameter.
But do you have any better idea about that default value?


It seems reasonable to me that the GUC has the same default value as 
work_mem.  So, +1 for the default value of 4MB.
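
As a sketch of how the two would interact (parameter name as proposed in
this thread, not committed syntax; value units assumed to work like
work_mem's):

-- the GUC sets the threshold for all GIN indexes...
SET pending_list_cleanup_size = '4MB';

-- ...while the reloption overrides it for a single index
CREATE INDEX docs_gin ON documents USING gin (to_tsvector('english', body))
    WITH (pending_list_cleanup_size = '8MB');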



BTW, I moved the CommitFest entry of this patch to next CF 2014-10.


OK, I'll review the patch in the CF.

Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deferring some AtStart* allocations?

2014-10-08 Thread Amit Kapila
On Wed, Oct 8, 2014 at 11:22 PM, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Meh.  Even "SELECT 1" is going to be doing *far* more pallocs than that
  to get through raw parsing, parse analysis, planning, and execution
  startup.  If you can find a few hundred pallocs we can avoid in trivial
  queries, it would get interesting; but I'll be astonished if saving 4 is
  measurable.

 I got nerd-sniped by this problem today, probably after staring at the
 profiling data very similar to what led Andres to ask the question in
 the first place.  The gains are indeed not measurable on a
 macrobenchmark, but I had to write the patch to figure that out, so
 here it is.

 Although the gain isn't a measurable percentage of total runtime, it
 does appear to be a significant percentage of the palloc traffic on
 trivial queries. I did 30-second SELECT-only pgbench runs at scale
 factor 10, using prepared queries.  According to perf, on unpatched
 master, StartTransactionCommand accounts for 11.46% of the calls to
 AllocSetAlloc.  (I imagine this is by runtime, not by call count, but
 it probably doesn't matter much either way.)  But with this patch,
 StartTransactionCommand's share drops to 4.43%.  Most of that is
 coming from AtStart_Inval(), which wouldn't be hard to fix either.

 I'm inclined to think that tamping down palloc traffic is worthwhile
 even if we can't directly measure the savings on a macrobenchmark.
 AllocSetAlloc is often at or near the top of profiling results, but
 there's rarely a single caller responsible for enough of those calls
to make it worth optimizing. But that means that if we refuse to
 take patches that save just a few pallocs, we're basically giving up
 on ever improving anything in this area, and that doesn't seem like a
 good idea either; it's only going to get slowly worse over time as we
 add more features.

Yeah, this makes sense. Even otherwise, if somebody comes
up with a much larger patch which reduces a few dozen pallocs and shows a
noticeable gain, it will be difficult to quantify the patch based on
which particular pallocs give the improvement, as reducing a few pallocs
might not give any noticeable gain on their own.

 BTW, if anyone's curious what the biggest source of AllocSetAlloc
 calls is on this workload, the answer is ExecInitIndexScan(), which
 looks to be indirectly responsible for almost half the total (or just
 over half, with the patch applied).  That apparently calls a lot of
 different things that allocate small amounts of memory, and it
 accounts for nearly all of the AllocSetAlloc traffic that originates
 from PortalStart().

One way to dig into this problem is to look at each individual
palloc in the PortalStart() path and try to see which ones can be
optimized; another way is to introduce a concept of
"Prepared Portals", something similar to Prepared Statements, but for
portals. This would not only avoid the pallocs in the executor startup
phase, but could improve performance by avoiding many other calls related
to executor startup.  I understand that this would be a much bigger
project; however, the gains could also be substantial for prepared
statements when all/most of the data fits in memory.  It seems other
databases also have similar concepts for execution (for example, Oracle
uses Cursor_sharing/Shared cursors to achieve the same).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-08 Thread Amit Kapila
On Thu, Oct 9, 2014 at 4:02 AM, Andres Freund and...@2ndquadrant.com
wrote:

   /*
  + * Arrange to remove a dynamic shared memory mapping at cleanup time.
  + *
  + * dsm_keep_mapping() can be used to preserve a mapping for the entire
  + * lifetime of a process; this function reverses that decision, making
  + * the segment owned by the current resource owner.  This may be useful
  + * just before performing some operation that will invalidate the
segment
  + * for future use by this backend.
  + */
  +void
  +dsm_unkeep_mapping(dsm_segment *seg)
  +{
  + Assert(seg->resowner == NULL);
  + ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
  + seg->resowner = CurrentResourceOwner;
  + ResourceOwnerRememberDSM(seg->resowner, seg);
  +}

 Hm, I dislike the name unkeep.

I also think function name is not appropriate as per functionality.

 I guess you want to be symmetric to
 dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup()
 dm_remember_mapping()?

Another could be dsm_change_mapping().  Yet another idea could
be to use a single function (dsm_manage_mapping() with an
additional parameter to indicate the scope of the segment) instead of
two different functions dsm_keep_mapping() and
dsm_unkeep_mapping().


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-08 Thread furuyao
  What set of options would you pass if you want to use it as a
  synchronous standby? And if you don't? Could we just have a single
  --synchronous
  flag, instead of -F and --reply-fsync?
 
  If you set synchronous_commit as remote_write, the options would
  be different .
  The set of options in each case, see the following.
 
 
 Synchronous standby(synchronous_commit=on)
 --fsync-interval=-1
 --reply-fsync
 --slot=slotname
 
 Synchronous standby(synchronous_commit=remote_write)
 --fsync-interval=-1
 --reply-fsync
 
 Asynchronous
 There are no relative options.
 
 
  Well, if the response time delay(value of
  --status-interval=interval) is acceptable, --reply-fsync is
  unnecessary.
  Instead of --reply-fsync, using --synchronous(which summarizes
  the --reply-fsync and fsync-interval = -1) might be easy to
  understand. Although, in that case,  --fsync-interval=interval
  would be fixed value. Isn't there any problem ?
 
  I think we should remove --fsync-interval and --reply-fsync, and just
  have a --synchronous option, which would imply the same behavior you
  get with --fsync-interval=-1 --reply--fsync.
 
  That leaves the question of whether pg_receivexlog should do fsyncs
  when it's not acting as a synchronous standby. There isn't any real
  need to do so. In asynchronous mode, there are no guarantees anyway,
  and even without an fsync, the OS will eventually flush the data to
 disk.
 
  But we could do even better than that. It would be best if you didn't
  need even the --synchronous flag. The server knows whether the client
  is a synchronous standby or not. It could tell the client.
 
  If we remove --fsync-interval, response time to the user will not be delayed.
  Although, fsync will be executed multiple times in a short period.
  And there is no way to solve that problem without --fsync-interval; what
 should we do about it?
 
 I'm sorry, I didn't understand that.
 
  In asynchronous mode, I think there is no problem since the
 specification is the same as in the released version.
 
  The server knows whether the client is a synchronous standby or not.
  It could tell the client.
 
  When notifying the synchronous/asynchronous mode to the client from
  the server, do we need to change the format of the Message ?
 
 Yeah. Or rather, add a new message type, to indicate the
 synchronous/asynchronous status.

Thanks for the reply.

  If we remove --fsync-interval, response time to the user will not be delayed.
  Although, fsync will be executed multiple times in a short period.
  And there is no way to solve that problem without --fsync-interval; what
 should we do about it?
 
 I'm sorry, I didn't understand that.

Here is an example.
When WAL is sent at 100ms intervals, fsync() is executed 10 times per second.
If --fsync-interval is set to 1 second, we have to wait up to 1 second for the
SQL response (for the kind of statement that makes a WAL record), though
fsync() won't be executed several times within that second.
I think --fsync-interval meets the demands of people who want to keep
fsync() from happening several times in a short period, but what do you think?
Is it OK to delete --fsync-interval?

  The server knows whether the client is a synchronous standby or not.
  It could tell the client.
 
  When notifying the synchronous/asynchronous mode to the client from
  the server, do we need to change the format of the Message ?
 
 Yeah. Or rather, add a new message type, to indicate the
 synchronous/asynchronous status.

What kind of 'message type' do we have to add?

Do we need to separate 'w' into two types, synchronous and asynchronous?

OR

Add a new message type, kind of a 'notify synchronous',
and inform pg_receivexlog of the synchronous status when it connects to the server.

Regards,

--
Furuya Osamu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-08 Thread Kyotaro HORIGUCHI
Hello. Simply inhibiting setting of the retry flag when ProcDiePending
in my_sock_write seems to be enough.

But it returns with SSL_ERROR_SYSCALL, not SSL_ERROR_WANT_WRITE, so
I modified patch 4 as attached.

Finally, the attached patch works as expected with Andres's patch
1-3.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6fc6903..2288fe2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
-		if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
+		if (!ProcDiePending &&
+			(errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
 		{
 			BIO_set_retry_write(h);
 		}
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 7b5b30f..ab9e122 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -199,6 +199,7 @@ secure_write(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 
+retry:
 #ifdef USE_SSL
 	if (port->ssl_in_use)
 	{
@@ -210,7 +211,26 @@ secure_write(Port *port, void *ptr, size_t len)
 		n = secure_raw_write(port, ptr, len);
 	}
 
-	/* XXX: We likely will want to process some interrupts here */
+	/*
+	 * We only want to process the interrupt here if socket writes are
+	 * blocking to increase the chance to get an error message to the
+	 * client. If we're not blocked there'll soon be a
+	 * CHECK_FOR_INTERRUPTS(). But if we're blocked we'll never get out of
+	 * that situation if the client has died.
+	 */
+	if (ProcDiePending && !port->noblock && n < 0)
+	{
+		/*
+		 * We're dying. It's safe (and sane) to handle that now.
+		 */
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* retry after processing interrupts */
+	if (n < 0 && errno == EINTR)
+	{
+		goto retry;
+	}
 
 	return n;
 }
@@ -236,10 +256,19 @@ wloop:
 		 * don't do anything while (possibly) inside a ssl library.
 		 */
 		w = WaitLatchOrSocket(&MyProc->procLatch,
-			  WL_SOCKET_WRITEABLE,
+			  WL_LATCH_SET | WL_SOCKET_WRITEABLE,
 			  port->sock, 0);
 
-		if (w & WL_SOCKET_WRITEABLE)
+		if (w & WL_LATCH_SET)
+		{
+			ResetLatch(&MyProc->procLatch);
+			/*
+			 * Force a return, so interrupts can be processed when not
+			 * (possibly) underneath a ssl library.
+			 */
+			errno = EINTR;
+		}
+		else if (w & WL_SOCKET_WRITEABLE)
 		{
 			goto wloop;
 		}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3a6aa1c..61390aa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -520,6 +520,13 @@ ProcessClientReadInterrupt(void)
 
 		errno = save_errno;
 	}
+	else if (ProcDiePending)
+	{
+		/*
+		 * We're dying. It's safe (and sane) to handle that now.
+		 */
+		CHECK_FOR_INTERRUPTS();
+	}
 }
 
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-08 Thread Simon Riggs
On 8 October 2014 23:24, Peter Geoghegan p...@heroku.com wrote:

 Also, it would be useful to hear that your're going to do something
 about the references to rows using conflicting(), since nobody has
 agreed with you there. Or hopefully even that you've listened and
 implemented something differently already. (We need that, whatever we
 do with other elements of syntax).

 Do you really expect me to do major work on some aspect of the syntax
 like that, given, as you say, that nobody explicitly agreed with me
 (and only you disagreed with me)? The only remark I heard on that was
 from you (you'd prefer to use NEW.* and OLD.*).
 But you spent much
 more time talking about getting something MERGE-like, which
 NEW.*/OLD.* clearly isn't.

Yes, it is. Look at the AS clause.

 CONFLICTING() is very close (identical?) to MySQL's use of ON
 DUPLICATE KEY UPDATE val = VALUES(val). I'm happy to discuss that,
 but it's news to me that people take particular issue with it.

3 people have asked you questions or commented about the use of
CONFLICTING() while I've been watching. It's clearly a non-standard
mechanism and not in line with other Postgres usage. Nobody actually
says "I object to this" - do they need to use that phrase before you
take note?

I'm beginning to feel that giving you review comments is being seen as
some kind of negative action. Needing to repeat myself makes it clear
that you aren't taking note.

Yes, I expect you to do these things
* collect other people's input, even if you personally disagree
* if there is disagreement amongst reviewers, seek to resolve that in
a fair and reasonable manner
* publish a summary of changes requested
* do major work to address them

So yes, I really expect that.

It doesn't matter that it is only Simon or only Kevin. **One**
comment is enough for you to take note.

If there is disagreement, publishing the summary of changes you plan
to make in your next version will help highlight that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers