[HACKERS] Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX
On Mon, Oct 06, 2014 at 08:57:54PM -0400, Tom Lane wrote:
> I eventually realized that the critical difference was you'd added CFLAGS= to the configure call. On this platform that has the net effect of removing -O2 from the compiler flags, and apparently that shifts around the stack layout enough to expose the clobber. The fix is simple enough: ecpg's version of ParseDateTime is failing to check for overrun of the field[] array until *after* it's already clobbered the stack:

Thanks for tracking that down. Oops.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
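The fix pattern Tom describes — checking the field count before the store rather than after — looks roughly like the following sketch. Only MAXDATEFIELDS, the field[] array, and the overrun itself are taken from the message; the surrounding parser code is omitted and the helper is illustrative:

    #define MAXDATEFIELDS 25            /* matches the backend's limit */

    /* Append str as the next date field, refusing before the write so an
     * over-long input can no longer clobber the caller's stack. */
    static int
    add_field(const char **field, int nf, const char *str)
    {
        if (nf >= MAXDATEFIELDS)        /* bounds check BEFORE the store */
            return -1;
        field[nf] = str;
        return nf + 1;
    }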
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On 10/08/2014 07:23 AM, furu...@pm.nttdata.co.jp wrote:
>> What set of options would you pass if you want to use it as a synchronous standby? And if you don't? Could we just have a single --synchronous flag, instead of -F and --reply-fsync?
>
> If you set synchronous_commit to remote_write, the options are different. The set of options in each case is as follows.
>
> Synchronous standby (synchronous_commit=on):
>     --fsync-interval=-1 --reply-fsync --slot=slotname
>
> Synchronous standby (synchronous_commit=remote_write):
>     --fsync-interval=-1 --reply-fsync
>
> Asynchronous:
>     no related options
>
> Well, if the response-time delay (the value of --status-interval) is acceptable, --reply-fsync is unnecessary. Instead of --reply-fsync, using --synchronous (which combines --reply-fsync and --fsync-interval=-1) might be easier to understand. Although, in that case, --fsync-interval would be a fixed value. Isn't that a problem?

I think we should remove --fsync-interval and --reply-fsync, and just have a --synchronous option, which would imply the same behavior you get with --fsync-interval=-1 --reply-fsync.

That leaves the question of whether pg_receivexlog should do fsyncs when it's not acting as a synchronous standby. There isn't any real need to do so. In asynchronous mode, there are no guarantees anyway, and even without an fsync, the OS will eventually flush the data to disk.

But we could do even better than that. It would be best if you didn't need even the --synchronous flag. The server knows whether the client is a synchronous standby or not. It could tell the client.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
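For concreteness, invocations under the simplification Heikki proposes might look like this (the --synchronous flag was only a proposal at this point; the other options are existing pg_receivexlog flags):

    # acting as a synchronous standby: fsync on every flush, report it at once
    pg_receivexlog -D /archive/wal --slot=rx1 --synchronous

    # asynchronous use: no fsync guarantees; the OS flushes eventually
    pg_receivexlog -D /archive/wal --slot=rx1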
Re: [HACKERS] pg_receivexlog always handles -d option argument as connstr
Amit Kapila amit.kapil...@gmail.com wrote:
> On Tue, Oct 7, 2014 at 8:13 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
>> On Tue, Oct 7, 2014 at 12:58 PM, Amit Kapila amit.kapil...@gmail.com wrote:
>>> On Mon, Oct 6, 2014 at 10:23 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
>>>> Hi all,
>>>>
>>>> pg_receivexlog always handles the argument of the -d option as a connstr-formatted value, so we can doubly specify host name and port number. The other client tools handle the -d option as a connstr value only if the argument contains an = character. pg_basebackup also seems to behave the same as pg_receivexlog.
>>>
>>> psql also treats it in a similar way. The behaviour of psql is as below:
>>>
>>>     psql.exe -d=host=localhost port=5432 dbname=postgres
>>>     psql: invalid connection option
>>>
>>>     psql.exe -d host=localhost port=5432 dbname=postgres
>>>     psql (9.5devel)
>>>     WARNING: Console code page (437) differs from Windows code page (1252)
>>>     8-bit characters might not work correctly. See psql reference page
>>>     Notes for Windows users for details.
>>>     Type help for help.
>>>     postgres=#
>>>
>>>> The document says that pg_receivexlog ignores the database name, and that this option is called for consistency with other client applications. But if we specify a database name the way other client tools allow, e.g. '-d hoge', then we definitely get an error.
>>>
>>> What I understand from the document is that it ignores the database name when given in a connection string.
>>
>> Yep, but we can use the -d option like 'psql -d postgres'. pg_receivexlog and pg_basebackup don't seem to work like that; they always handle the argument as a connstr-formatted value.
>
> psql needs to connect to a particular database for its functioning, whereas pg_basebackup/pg_receivexlog work differently and don't need to connect to a particular database, so I am not sure why you want them to behave similarly for the -d switch. Moreover the same is clarified in the docs, so what's the issue?

Yes, I got your point. I thought it a little confusing that, although the document says the -d option of pg_receivexlog is called for consistency, the behavior of psql (and the other tools) and pg_receivexlog isn't exactly the same. If a user passes the same '-d hoge_db' option to both pg_receivexlog and psql for easier setup, it leads to an error on the pg_receivexlog side. I thought pg_receivexlog should ignore the database name even when the -d option value is not a connstr-formatted value, the same as psql does. But perhaps there is no such case in practice.

Regards,

--
Sawada Masahiko
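The disagreement comes down to whether a tool applies a "looks like a connection string" test before interpreting -d. A sketch of that heuristic as the psql-style tools apply it (the helper name is made up; pg_basebackup and pg_receivexlog simply skip the test and treat the value as a connstr unconditionally):

    #include <stdbool.h>
    #include <string.h>

    /* Treat the -d argument as a connection string only if it contains '='
     * or a URI prefix; otherwise it is a plain database name. */
    static bool
    looks_like_connstring(const char *arg)
    {
        return strchr(arg, '=') != NULL ||
               strncmp(arg, "postgresql://", 13) == 0 ||
               strncmp(arg, "postgres://", 11) == 0;
    }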
Re: [HACKERS] Promise index tuples for UPSERT
On Tue, 2014-10-07 at 13:33 +0100, Simon Riggs wrote:
> Is there a way of detecting that we are updating a unique constraint column and then applying the HW locking only in that case? Or can we only apply locking when we have multiple unique constraints on a table?

What is the use case of doing an UPSERT into a table with multiple unique constraints?

Consider a table "user" with unique columns name and email and a non-unique column age. If it has the data

    Jack | j...@example.com | 33
    Tom  | t...@example.com | 35

and the user does UPSERT VALUES ('Jack', 't...@example.com', 34), the proposed specification will pick a random unique index (either the name or the email index) and do an update of that row. First, this will cause a unique violation; second, doing the UPSERT on a random index seems confusing.

The MySQL documentation says that you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1]. The proposed feature's documentation has the same suggestion[2]. Still, the feature defaults to this behavior. Why is the default something the documentation says you shouldn't do?

Going a bit further, I am wondering what the use case is of doing an UPSERT against multiple unique indexes. If multiple-unique-index UPSERT could be dropped, that might allow for faster or cleaner index locking techniques.

- Anssi

1: http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html
2: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
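Anssi's scenario, spelled out against the syntax proposed in [2] (a sketch — the spelling was still under discussion, and the sample values here are illustrative):

    CREATE TABLE author (name text UNIQUE, email text UNIQUE, age int);
    INSERT INTO author VALUES ('Jack', 'jack@example.com', 33),
                              ('Tom',  'tom@example.com',  35);

    -- Conflicts with Jack on name and with Tom on email: with no index
    -- specified, which row should this update?
    INSERT INTO author VALUES ('Jack', 'tom@example.com', 34)
        ON CONFLICT UPDATE SET age = 34;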
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote:
> We kept the WAL files and log files for further analysis. How can we help regarding this issue?

Commit c2f79ba has added an assumption that the WAL receiver should always enforce the creation of .done files when WAL files are done being streamed (XLogWalRcvWrite and WalReceiverMain) or archived (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed RemoveOldXlogFiles a bit, removing a check looking at whether the node is in recovery. Now, based on the information given here, yes it happens that there are still cases where .done file creation is not correctly done, leading to those extra files. Even by looking at the code, I am not directly seeing any code paths where an extra call to XLogArchiveForceDone would be needed on the WAL receiver side but... Something like the patch attached (which is clearly a band-aid) may help though, as it would make files be removed even if they are not marked as .done for a node in recovery. And this is consistent with the pre-1bd42cd behavior. Comments welcome.
--
Michael

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..39701a3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3771,7 +3771,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 			strspn(xlde->d_name, "0123456789ABCDEF") == 24 &&
 			strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
 		{
-			if (XLogArchiveCheckDone(xlde->d_name))
+			if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
 			{
 				snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, Oct 8, 2014 at 12:41 AM, Anssi Kääriäinen anssi.kaariai...@thl.fi wrote:
> The MySQL documentation says that you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1]. The proposed feature's documentation has the same suggestion[2]. Still, the feature defaults to this behavior. Why is the default something the documentation says you shouldn't do?

MySQL started saying that when they realized it broke their statement-based replication. They have their own weird reasons for disliking it. Now, that's not to minimize the risks.

The reasoning behind making the unique index specification optional is: we cannot easily cover corner cases with another syntax - unique indexes must be named directly to cover every case, and make the user's intent absolutely clear. That's not obviously the case, but consider partial unique indexes, for example. Or consider uniquely constrained columns, with an almost equivalent uniquely constrained expression on those same columns. On the one hand I am not comfortable failing to support those, but on the other hand it could get very messy to do it another way. As we all know, naming a unique index in DML is ugly, and has poor support in ORMs. It seems likely that we're better off making it optional - it hasn't been much of a problem with the existing subxact looping pattern. A lot of the time, there will only be a single unique index anyway, or there will be a trivial serial PK unique index plus the unique index we always want to treat would-be conflicts within as triggering the alternative UPDATE/IGNORE path.

> Going a bit further, I am wondering what is the use case of doing an UPSERT against multiple unique indexes? If multiple unique indexes UPSERT could be dropped that might allow for faster or cleaner index locking techniques.

I see no reason to think that. I don't think it buys us much at all.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
On 10/08/2014 11:10 AM, Peter Geoghegan wrote:
> The reasoning behind making the unique index specification optional is: we cannot easily cover corner cases with another syntax - unique indexes must be named directly to cover every case, and make the user's intent absolutely clear. That's not obviously the case, but consider partial unique indexes, for example. Or consider uniquely constrained columns, with an almost equivalent uniquely constrained expression on those same columns. On the one hand I am not comfortable failing to support those, but on the other hand it could get very messy to do it another way. As we all know, naming a unique index in DML is ugly, and has poor support in ORMs.

I vehemently object to naming indexes in the UPSERT statement. That mixes logical and physical database design, which is a bad idea. This is not ISAM. Instead of naming the index, you should name the columns, and the system can look up the index or indexes that match those columns.

(Remind me again, where did this need to name an index come from in the first place?)

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
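One way Heikki's column-based spelling could look (hypothetical syntax, resolved to a matching unique index at plan time rather than naming the index itself):

    -- resolvable to the unique index on (name); would fail if no unique
    -- index covers exactly these columns
    INSERT INTO author VALUES ('Jack', 'tom@example.com', 34)
        ON CONFLICT (name) UPDATE SET age = 34;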
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 3:47 AM, Peter Geoghegan p...@heroku.com wrote:
> It seems like what you're talking about here is just changing the spelling of what I already have.

I think there's a subtle difference in expectations too. The current BEFORE INSERT trigger behavior is somewhat defensible with an INSERT-driven syntax (though I don't like it even now [1]). But the MERGE syntax, to me, strongly implies that insertion doesn't begin before determining whether a conflict exists or not.

[1] http://www.postgresql.org/message-id/CABRT9RD6zriK+t6mnqQOqaozZ5z1bUaKh+kNY=o9zqbzfoa...@mail.gmail.com

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Context lenses to set/get values in json values.
Hello.

I am interested in the json type in PostgreSQL. I would like to implement additional operations on the json structure that could extract/insert table-like information from the json tree structure. I have an implementation in JavaScript that shows this type of operation; you can see examples on this page: https://github.com/paweld2/eelnss/wiki

Following the examples on that page, it may be possible to implement a function similar to json_populate_record to extract multiple records from a single json value, for example:

    select * from json_populate_records_with_clen(null::myrowtype_users,
        'app.users.{:uID}.(email,data.name,isActive)',
        '... nested json value ...')

may return

    uID | email              | name          | isActive
    ----+--------------------+---------------+---------
    u1  | ad...@pmsoft.eu    | administrator | true
    u2  | nor...@pmsoft.eu   | user          | true
    u3  | testu...@pmsoft.eu | testUser      | false

Also, assuming that we have a table User as above (uID, email, name, isActive), with context lenses it is very simple to map the table to a json object. I assume that an API similar to table_to_xml/query_to_xml could be provided:

    table_to_json( Person, 'app.users.{:uID}.(email,data.name,isActive)');
    query_to_json( 'select * from Person where ... ', 'app.users.{:uID}.(email, data.name,isActive)');

I don't know the details of integrating functions/operators into sql queries, but because context lenses map between tables and tree objects, it may be possible to use a json column value as a separate table in queries. Assume the table

    create table Person (
        pID integer,
        address json
    );

Then it may be possible to query:

    select * from Person as P
    left join (
        select * from json_populate_records_with_clen(null::addressType,
            'addres.(street.number, street.local,city.code,city.name)',
            P.address)
    ) ...;

A final API for such functions needs to be defined. If such functions would be useful, I can try to prepare an implementation in the Postgres code base.

Regards.
Pawel Cesar Sanjuan Szklarz.
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
> On 10/08/2014 07:23 AM, furu...@pm.nttdata.co.jp wrote:
>>> What set of options would you pass if you want to use it as a synchronous standby? And if you don't? Could we just have a single --synchronous flag, instead of -F and --reply-fsync?
>>
>> If you set synchronous_commit to remote_write, the options are different. The set of options in each case is as follows.
>>
>> Synchronous standby (synchronous_commit=on):
>>     --fsync-interval=-1 --reply-fsync --slot=slotname
>>
>> Synchronous standby (synchronous_commit=remote_write):
>>     --fsync-interval=-1 --reply-fsync
>>
>> Asynchronous:
>>     no related options
>>
>> Well, if the response-time delay (the value of --status-interval) is acceptable, --reply-fsync is unnecessary. Instead of --reply-fsync, using --synchronous (which combines --reply-fsync and --fsync-interval=-1) might be easier to understand. Although, in that case, --fsync-interval would be a fixed value. Isn't that a problem?
>
> I think we should remove --fsync-interval and --reply-fsync, and just have a --synchronous option, which would imply the same behavior you get with --fsync-interval=-1 --reply-fsync.
>
> That leaves the question of whether pg_receivexlog should do fsyncs when it's not acting as a synchronous standby. There isn't any real need to do so. In asynchronous mode, there are no guarantees anyway, and even without an fsync, the OS will eventually flush the data to disk.
>
> But we could do even better than that. It would be best if you didn't need even the --synchronous flag. The server knows whether the client is a synchronous standby or not. It could tell the client.

Thanks for the reply.

If we remove --fsync-interval, the response time to the user will not be delayed, although fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it?

In asynchronous mode, I think there is no problem, since the specification is the same as the released version.

> The server knows whether the client is a synchronous standby or not. It could tell the client.

When notifying the client of synchronous/asynchronous mode from the server, do we need to change the format of the message?

Regards,

--
Furuya Osamu

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Instead of naming the index, you should name the columns, and the system can look up the index or indexes that match those columns.

It's not totally clear that we need *any* WITHIN clause, BTW. I'm not dead set on it. It was something I mainly added at Kevin's request. I do see the risks, though.

> (Remind me again, where did this need to name an index come from in the first place?)

I agree that naming indexes is ugly, and best avoided. It's tricky, though. The first thing you might try is to look up the index during parse analysis and/or planning. That could work in simple cases (which are probably the vast majority), but I'm worried about stuff like before row triggers that change the values being inserted out from under us, in a way that interferes with partial unique indexes. Maybe you could choose the wrong one if you didn't leave it until the last minute. But it's not very convenient to leave it until the last minute.

If you're willing to live with the command conservatively failing when there isn't a clean specification (although not in a way that can make the command fail when the user innocently adds a unique index later), then I think I can do it. Otherwise, it could be surprisingly hard to cover all the cases non-surprisingly. I freely admit that putting a unique index in a DML statement is weird, but it is sort of the most direct way of expressing what we want. Oracle actually has an INSERT-IGNORE-like hint that names a unique index (yes, really - see the UPSERT wiki page). That's really bizarre, but the point is that they may have felt that there was no reasonable alternative.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote:
> I think there's a subtle difference in expectations too. The current BEFORE INSERT trigger behavior is somewhat defensible with an INSERT-driven syntax (though I don't like it even now [1]).

There is no way around it. We need to fire before row triggers to know what to insert on the one hand, but on the other hand (in general) we have zero ability to nullify the effects (or side-effects) of before triggers, since they may execute arbitrary user-defined code. I think there is a good case to be made for severely restricting what before row triggers can do, but it's too late for that.

> But the MERGE syntax, to me, strongly implies that insertion doesn't begin before determining whether a conflict exists or not.

I think you're right. Another strike against the MERGE syntax, then, since as I said we cannot even know what to check prior to having before row insert triggers fire.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
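An illustration of the problem Peter describes — a BEFORE ROW trigger that rewrites a uniquely indexed value, so the conflict target cannot be known until the trigger has run (schema and names are illustrative):

    CREATE FUNCTION normalize_email() RETURNS trigger AS $$
    BEGIN
        NEW.email := lower(NEW.email);   -- changes the uniquely indexed value
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER author_normalize_email BEFORE INSERT ON author
        FOR EACH ROW EXECUTE PROCEDURE normalize_email();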
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Tue, Oct 7, 2014 at 2:27 PM, Marti Raudsepp ma...@juffo.org wrote:
> but the new approach seems surprising: changes from BEFORE INSERT get persisted in the database, but AFTER INSERT is not fired.

I am sorry, I realize now that I misunderstood the currently proposed trigger behavior; I thought what Simon Riggs wrote here already happens: https://groups.google.com/forum/#!msg/django-developers/hdzkoLYVjBY/bnXyBVqx95EJ

But the point still stands: firing INSERT triggers when the UPDATE path is taken is counterintuitive. If we prevent changes to upsert key columns in BEFORE triggers, then we get the benefits, including more straightforward trigger behavior, and avoid problems with serial columns.

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On 10/08/2014 10:44 AM, Michael Paquier wrote:
> On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote:
>> We kept the WAL files and log files for further analysis. How can we help regarding this issue?
>
> Commit c2f79ba has added an assumption that the WAL receiver should always enforce the creation of .done files when WAL files are done being streamed (XLogWalRcvWrite and WalReceiverMain) or archived (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed RemoveOldXlogFiles a bit, removing a check looking at whether the node is in recovery. Now, based on the information given here, yes it happens that there are still cases where .done file creation is not correctly done, leading to those extra files. Even by looking at the code, I am not directly seeing any code paths where an extra call to XLogArchiveForceDone would be needed on the WAL receiver side but... Something like the patch attached (which is clearly a band-aid) may help though, as it would make files be removed even if they are not marked as .done for a node in recovery. And this is consistent with the pre-1bd42cd behavior.

There are two mysteries here:

1. Where do the FF files come from?

In 9.2, FF-segments are not supposed to be created, ever. Since this only happens with streaming replication, the FF segments are probably being created by walreceiver. XLogWalRcvWrite is the function that opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite opens the file corresponding to the start position in the message received from the master. There is no check that the start position is valid, though; if the master sends a start position in the FF segment, walreceiver will merrily write it. So the problem could be on the walsender side. However, I don't see anything wrong there either.

I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.

2. Why are the .done files sometimes not being created?

I may have an explanation for that. Walreceiver creates a .done file when it closes an old segment and opens a new one. However, it does this only when it's about to start writing to the new segment, and still has the old segment open. If you stream the FE segment fully, but drop the replication connection at exactly that point, the .done file is not created. That might sound unlikely, but it's actually pretty easy to trigger. Just do "select pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a restart.

The creation of the .done files seems quite unreliable anyway. If only a portion of a segment is streamed, we don't write a .done file for it, so we still have the original problem that we will try to archive the segment after failover, even though the master might already have archived it.

I looked again at the thread where this was discussed: http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com. I believe the idea was that the server that generates a WAL segment is always responsible for archiving it. A standby should never attempt to archive a WAL segment that was restored from the archive, or streamed from the master.

In that thread, it was not discussed what should happen to WAL files that an admin manually copies into pg_xlog of the standby. Should the standby archive them? I don't think so - the admin should copy them manually to the archive too, if he wants them archived. It's a good and simple rule that the server that generates the WAL, archives the WAL.
Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
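A sketch of the promotion-time scan Heikki suggests, not an actual patch: the directory helpers and XLogArchiveForceDone are real backend functions, but this function and its call site are assumptions:

    /* At promotion, mark every WAL segment already in pg_xlog as .done so
     * the standby never archives WAL it did not itself generate. */
    static void
    XLogMarkAllSegmentsDoneAtPromotion(void)
    {
        DIR        *xldir = AllocateDir(XLOGDIR);
        struct dirent *xlde;

        while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
        {
            /* plain segment file names are exactly 24 hex digits */
            if (strlen(xlde->d_name) == 24 &&
                strspn(xlde->d_name, "0123456789ABCDEF") == 24)
                XLogArchiveForceDone(xlde->d_name);
        }
        FreeDir(xldir);
    }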
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, 2014-10-08 at 02:22 -0700, Peter Geoghegan wrote:
> On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> Instead of naming the index, you should name the columns, and the system can look up the index or indexes that match those columns.
>
> It's not totally clear that we need *any* WITHIN clause, BTW. I'm not dead set on it. It was something I mainly added at Kevin's request. I do see the risks, though.

To be usable in the Django ORM's .save() method there must be an option to use the primary key index, and only the primary key index, for conflict resolution. Assume an author table with id SERIAL PRIMARY KEY, email TEXT UNIQUE, age INTEGER. When saving an object, Django must update the row with the matching primary key value, or otherwise do an insert. Updating a row that matches on the email column isn't allowed. So Django can't use the query:

    INSERT INTO author VALUES (1, 't...@example.com', 35)
        ON CONFLICT UPDATE SET email = conflicting(email), age = conflicting(age);

If the table contains the data (id=2, email='t...@example.com', age=34), the query would update Tom's age to 35 when it should have resulted in a unique constraint violation.

Other ORMs have similar save/persist implementations; that is, objects are persisted on primary key identity alone.

- Anssi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
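Under a column-qualified variant of the syntax (a hypothetical spelling, in line with the suggestion made elsewhere in this thread), the ORM requirement could be expressed directly:

    -- conflict handling keyed to the primary key alone; the duplicate email
    -- then raises a unique violation instead of updating Tom's row
    INSERT INTO author VALUES (1, 'tom@example.com', 35)
        ON CONFLICT (id) UPDATE SET email = conflicting(email),
                                    age   = conflicting(age);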
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, 2014-10-08 at 01:10 -0700, Peter Geoghegan wrote:
> On Wed, Oct 8, 2014 at 12:41 AM, Anssi Kääriäinen anssi.kaariai...@thl.fi wrote:
>> The MySQL documentation says that you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1]. The proposed feature's documentation has the same suggestion[2]. Still, the feature defaults to this behavior. Why is the default something the documentation says you shouldn't do?
>
> As we all know, naming a unique index in DML is ugly, and has poor support in ORMs. It seems likely that we're better off making it optional - it hasn't been much of a problem with the existing subxact looping pattern.

The subxact approach is a bit different from the proposed UPSERT command. It loops:

    try:
        INSERT INTO author VALUES ('Jack', 't...@example.com', 34)
    except UniqueConstraintViolation:
        UPDATE author SET ... WHERE name = 'Jack'

while the UPSERT command does something like:

    try:
        INSERT INTO author VALUES ('Jack', 't...@example.com', 34)
    except UniqueConstraintViolation:
        UPDATE author SET ...
        WHERE name = 'Jack' OR email = 't...@example.com'
        LIMIT 1

- Anssi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 12:28 PM, Peter Geoghegan p...@heroku.com wrote:
> On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote:
>> I think there's a subtle difference in expectations too. The current BEFORE INSERT trigger behavior is somewhat defensible with an INSERT-driven syntax (though I don't like it even now [1]).
>
> There is no way around it. We need to fire before row triggers to know what to insert on the one hand, but on the other hand (in general) we have zero ability to nullify the effects (or side-effects) of before triggers, since they may execute arbitrary user-defined code.

With my proposal this problem disappears: if we prevent BEFORE triggers from changing key attributes of NEW in the case of upsert, then we can acquire value locks before firing any triggers (before even constructing the whole tuple), and have a guarantee that the value locks are still valid by the time we proceed with the actual insert/update. Other than changing NEW, the side effects of triggers are not relevant.

Now, there may very well be reasons why this is tricky to implement, but I haven't heard any. Can you see any concrete reasons why this won't work? I can take a shot at implementing this, if you're willing to consider it.

> I think there is a good case to be made for severely restricting what before row triggers can do, but it's too late for that.

There are no users of the new upsert syntax out there yet, so it's not too late to rehash the semantics of that. This in no way affects users of the old INSERT/UPDATE syntax.

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to support SEMI and ANTI join removal
On Tue, Oct 7, 2014 at 3:46 AM, Robert Haas robertmh...@gmail.com wrote:
> On Mon, Oct 6, 2014 at 5:57 AM, David Rowley dgrowle...@gmail.com wrote:
>> Can anyone shed any light on how I might determine where the scan rel is in the tree? I need to find it so I can check if the RangeTblEntry is marked as skip-able.
>
> I think you're probably going to need logic that knows about particular node types and does the right thing for each one. I think - but maybe I'm misunderstanding - what you're looking for is a function of the form Oid ScansOnePlainTableWithoutQuals(). The algorithm could be something like:
>
>     switch (node type)
>     {
>         case T_SeqScanState:
>             if (no quals)
>                 return the_appropriate_oid;
>             return false;
>         case T_HashJoin:
>             decide whether we can ignore one side of the join
>             fish out the node from the other side of the join
>             (including reaching through the Hash node if necessary)
>             return ScansOnePlainTableWithoutQuals(the node we fished out);
>         ...other specific cases...
>         default:
>             return false;
>     }

Thanks Robert.

Ok, so I've been hacking away at this for a couple of evenings and I think I have a working prototype finally! My earlier thoughts about having to descend down until I find a seqscan were wrong. It looks like I just need to look at the next node down: if it's a seqscan and it's marked as not needed, then we can skip that side of the join; or if the child node is a HashJoinState, then check the skip status of that node, and if both sides are marked as not needed, skip that side of the join.

I've just completed some simple performance tests:

    create table t3 (id int primary key);
    create table t2 (id int primary key, t3_id int not null references t3);
    create table t1 (id int primary key, t2_id int not null references t2);

I've loaded these tables with 4 million rows each. The performance is as follows:

    test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join t3 on t2.t3_id=t3.id;
      count
    ---------
     4000000
    (1 row)

    Time: 1022.492 ms

    test=# select count(*) from t1;
      count
    ---------
     4000000
    (1 row)

    Time: 693.642 ms

    test=# alter table t2 drop constraint t2_t3_id_fkey;
    ALTER TABLE
    Time: 2.141 ms

    test=# alter table t1 drop constraint t1_t2_id_fkey;
    ALTER TABLE
    Time: 1.678 ms

    test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join t3 on t2.t3_id=t3.id;
      count
    ---------
     4000000
    (1 row)

    Time: 11682.525 ms

So it seems it's not quite as efficient as join removal at planning time, but still a big win when it's possible to perform the join skipping.

As yet, I've only added support for hash joins, and I've not really looked into detail on what's needed for nested loop joins or merge joins. For hash join I just added some code into the case HJ_BUILD_HASHTABLE: in ExecHashJoin(). The code just checks if either side can be skipped; if so, the node never moves out of the HJ_BUILD_HASHTABLE state, so that each time ExecHashJoin() is called, it just returns the next tuple from the non-skip side of the join until we run out of tuples. Or, if both sides can be skipped, NULL is returned, as any nodes above this shouldn't attempt to scan; perhaps this should just be an Assert(), as I don't think any parent node should ever bother executing this node if it's not required. The fact that I've put this code into the switch under HJ_BUILD_HASHTABLE makes me think it should add close to zero overhead when the join cannot be skipped. A sketch of the skip test appears after this message.

One thing that we've lost with this execution-time join removal method is the ability to still remove joins where the join column is NULLable.
The previous patch added IS NOT NULL to ensure the query was equivalent when a join was removed, of course this meant that any subsequent joins may later not have been removed due to the IS NOT NULL quals existing (which could restrict the rows and remove the ability that the foreign key could guarantee the existence of exactly 1 row matching the join condition). I've not yet come up with a nice way to reimplement these null checks at execution time, so I've thought that perhaps I won't bother, at least not for this patch. I'd just disable join skipping at planning time if any of the join columns can have NULLs. Anyway this is just a progress report, and also to say thanks to Andres, you might just have saved this patch with the execution time checking idea. I'll post a patch soon, hopefully once I have merge and nestloop join types working. Regards David Rowley
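A sketch of the recursive skip test David describes for the hash-join case (the helper and the "not needed" flag are assumptions, not the actual patch):

    /* Can this side of the join be skipped entirely?  True for a seq scan
     * whose rel is marked skippable, or a hash join both of whose inputs
     * are skippable; reach through Hash nodes to their input. */
    static bool
    join_side_skippable(PlanState *node)
    {
        if (IsA(node, SeqScanState))
            return scan_rel_marked_not_needed(node);    /* hypothetical */
        if (IsA(node, HashState))
            return join_side_skippable(outerPlanState(node));
        if (IsA(node, HashJoinState))
            return join_side_skippable(outerPlanState(node)) &&
                   join_side_skippable(innerPlanState(node));
        return false;
    }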
Re: [HACKERS] Patch to support SEMI and ANTI join removal
On 2014-10-09 00:21:44 +1300, David Rowley wrote:
> Ok, so I've been hacking away at this for a couple of evenings and I think I have a working prototype finally!

Cool!

> So it seems it's not quite as efficient as join removal at planning time, but still a big win when it's possible to perform the join skipping.

Have you checked where the overhead is? Is it really just the additional node that the tuples are passed through? Have you measured the overhead of the plan/execution time checks over master?

> One thing that we've lost with this execution-time join removal method is the ability to still remove joins where the join column is NULLable. The previous patch added IS NOT NULL to ensure the query was equivalent when a join was removed; of course this meant that any subsequent joins may later not have been removed due to the IS NOT NULL quals existing (which could restrict the rows and remove the ability that the foreign key could guarantee the existence of exactly 1 row matching the join condition). I've not yet come up with a nice way to reimplement these null checks at execution time, so I've thought that perhaps I won't bother, at least not for this patch. I'd just disable join skipping at planning time if any of the join columns can have NULLs.

Sounds fair enough for the first round.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On 10/08/2014 11:47 AM, furu...@pm.nttdata.co.jp wrote:
>> I think we should remove --fsync-interval and --reply-fsync, and just have a --synchronous option, which would imply the same behavior you get with --fsync-interval=-1 --reply-fsync.
>>
>> That leaves the question of whether pg_receivexlog should do fsyncs when it's not acting as a synchronous standby. There isn't any real need to do so. In asynchronous mode, there are no guarantees anyway, and even without an fsync, the OS will eventually flush the data to disk.
>>
>> But we could do even better than that. It would be best if you didn't need even the --synchronous flag. The server knows whether the client is a synchronous standby or not. It could tell the client.
>
> Thanks for the reply.
>
> If we remove --fsync-interval, the response time to the user will not be delayed, although fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it?

I'm sorry, I didn't understand that.

> In asynchronous mode, I think there is no problem, since the specification is the same as the released version.
>
>> The server knows whether the client is a synchronous standby or not. It could tell the client.
>
> When notifying the client of synchronous/asynchronous mode from the server, do we need to change the format of the message?

Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> 2.
> LWLockWakeup()
> {
> ..
> #ifdef LWLOCK_STATS
> 	lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> #else
> 	SpinLockAcquire(&lock->mutex);
> #endif
> ..
> }
>
> Earlier while releasing lock, we don't count it towards LWLock stats spin_delay_count. I think if we see other places in lwlock.c, it only gets counted when we try to acquire it in a loop.

I think the previous situation was clearly suboptimal. I've now modified things so all spinlock acquisitions are counted.

> 3.
> LWLockRelease()
> {
> ..
> 	/* grant permission to run, even if a spurious share lock increases lockcount */
> 	else if (mode == LW_EXCLUSIVE && have_waiters)
> 		check_waiters = true;
> 	/* nobody has this locked anymore, potential exclusive lockers get a chance */
> 	else if (lockcount == 0 && have_waiters)
> 		check_waiters = true;
> ..
> }
>
> It seems comments have been reversed in the above code.

No, they look right. But I've expanded them in the version I'm going to post in a couple minutes.

> 5.
> LWLockWakeup()
> {
> ..
> 	dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> 	{
> 		PGPROC	   *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
>
> 		LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> 		dlist_delete(&waiter->lwWaitLink);
> 		pg_write_barrier();
> 		waiter->lwWaiting = false;
> 		PGSemaphoreUnlock(&waiter->sem);
> 	}
> ..
> }
>
> Why can't we decrement nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and try to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so.

That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0.

> 6.
> LWLockWakeup()
> {
> ..
> 	dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> 	{
> ..
> 		if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
> 			continue;
> ..
> 		if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
> 		{
> ..
> 			wokeup_somebody = true;
> 		}
> ..
> 	}
> ..
> }
>
> a. IIUC the above logic, if the waiter queue is as follows (S - Shared; X - Exclusive):
>
>     S X S S S X S S
>
> it can skip the exclusive waiters and release a shared waiter. If my understanding is right, then I think instead of continue, there should be a *break* in the above logic.

No, it looks correct to me. What happened is that the first S was woken up. So there's no point in waking up an exclusive locker, but further non-exclusive lockers can be woken up.

> b. Consider the below sequence of waiters (S - Shared; X - Exclusive):
>
>     S S X S S
>
> I think as per the un-patched code, it will wake up waiters up till (including) the first Exclusive, but the patch will wake up waiters up till (*excluding*) the first Exclusive.

I don't think the current code does that. And it'd be a pretty pointless behaviour, leading to useless increased contention. The only time it'd make sense for X to be woken up is when it gets run faster than the S processes.

> 7.
> LWLockWakeup()
> {
> ..
> 	dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> 	{
> ..
> 		dlist_delete(&waiter->lwWaitLink);
> 		dlist_push_tail(&wakeup, &waiter->lwWaitLink);
> ..
> 	}
> ..
> }
>
> Use of dlist has simplified the code, but I think there might be a slight overhead of maintaining the wakeup queue as compared to the un-patched mechanism, especially when there is a long waiter queue.

I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable.

> 8.
> LWLockConditionalAcquire()
> {
> ..
> 	/*
> 	 * We ran into an exclusive lock and might have blocked another
> 	 * exclusive lock from taking a shot because it took a time to back
> 	 * off. Retry till we are either sure we didn't block somebody (because
> 	 * somebody else certainly has the lock) or till we got it.
> 	 *
> 	 * We cannot rely on the two-step lock-acquisition protocol as in
> 	 * LWLockAcquire because we're not using it.
> 	 */
> 	if (potentially_spurious)
> 	{
> 		SPIN_DELAY();
> 		goto retry;
> 	}
> ..
> }
>
> Due to the above logic, I think it can keep on retrying for a long time before it actually concludes whether it got the lock or not, in case other backends take the exclusive lock after *double_check* and release it before the unconditional increment of the shared lock in LWLockAttemptLock. I understand that it might be difficult to have such a practical scenario, but still there is a theoretical possibility of the same.

I'm not particularly concerned. We could optimize it a bit, but I really don't think it's necessary.

> Is there any advantage of retrying in LWLockConditionalAcquire()?

It's required for correctness. We only retry if we potentially blocked an exclusive acquirer (by spuriously incrementing/decrementing lockcount with 1). We need to be sure to either get the lock (in which case we can wake up the waiter on release), or be sure that we didn't disturb anyone.

> 9.
> LWLockAcquireOrWait()
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 14:47:44 +0200, Andres Freund wrote:
> On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
>> 5.
>> LWLockWakeup()
>> {
>> ..
>> 	dlist_foreach_modify(iter, (dlist_head *) &wakeup)
>> 	{
>> 		PGPROC	   *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
>>
>> 		LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
>> 		dlist_delete(&waiter->lwWaitLink);
>> 		pg_write_barrier();
>> 		waiter->lwWaiting = false;
>> 		PGSemaphoreUnlock(&waiter->sem);
>> 	}
>> ..
>> }
>>
>> Why can't we decrement nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and try to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so.
>
> That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0.

Err, this is bogus. Memory fail.

The reason I've done it that way is that it's otherwise much harder to debug issues where there are backends that have been woken up already, but haven't re-run yet. Without this there's simply no evidence of that state. I can't see this being relevant for performance, so I'd rather have it stay that way.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
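For readers following along, the quoted wakeup loop with the ordering requirement spelled out — a paraphrase of the patch, not the committed code:

    dlist_mutable_iter iter;

    dlist_foreach_modify(iter, &wakeup)
    {
        PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);

        dlist_delete(&waiter->lwWaitLink);

        /* The unlink must be globally visible before lwWaiting is cleared,
         * or the woken backend could requeue itself while still linked. */
        pg_write_barrier();
        waiter->lwWaiting = false;
        PGSemaphoreUnlock(&waiter->sem);
    }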
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> 1. Where do the FF files come from?
>
> In 9.2, FF-segments are not supposed to be created, ever. Since this only happens with streaming replication, the FF segments are probably being created by walreceiver. XLogWalRcvWrite is the function that opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite opens the file corresponding to the start position in the message received from the master. There is no check that the start position is valid, though; if the master sends a start position in the FF segment, walreceiver will merrily write it. So the problem could be on the walsender side. However, I don't see anything wrong there either.

Good to hear. By looking at the wal receiver and sender code paths, I found nothing really weird either.

> I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.

Then we're good for a check in ProcessWalSndrMessage for walEnd, I guess. Seems like a straightforward patch.

> 2. Why are the .done files sometimes not being created?
>
> I may have an explanation for that. Walreceiver creates a .done file when it closes an old segment and opens a new one. However, it does this only when it's about to start writing to the new segment, and still has the old segment open. If you stream the FE segment fully, but drop the replication connection at exactly that point, the .done file is not created. That might sound unlikely, but it's actually pretty easy to trigger. Just do "select pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a restart.

That's exactly the test I ran a couple of times to trigger this behavior before sending my previous email, but without success on the standby with master: all the WAL files were marked as .done. Now I have just retried it, with far more tries on REL9_3_STABLE and on HEAD, and I have been able to actually trigger the problem a couple of times. Simply run a long transaction generating a lot of WAL like this:

    =# create table aa as select generate_series(1,10);

And then run this:

    $ psql -c 'select pg_switch_xlog()'; pg_ctl stop -m immediate; pg_ctl start

And with enough luck, .ready files may appear. It may take a dozen tries before seeing at least one. And I noticed that generally multiple .ready files appear at the same time.

> The creation of the .done files seems quite unreliable anyway. If only a portion of a segment is streamed, we don't write a .done file for it, so we still have the original problem that we will try to archive the segment after failover, even though the master might already have archived it.

Yep. Agreed.

> I looked again at the thread where this was discussed: http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com. I believe the idea was that the server that generates a WAL segment is always responsible for archiving it. A standby should never attempt to archive a WAL segment that was restored from the archive, or streamed from the master.
>
> In that thread, it was not discussed what should happen to WAL files that an admin manually copies into pg_xlog of the standby. Should the standby archive them? I don't think so - the admin should copy them manually to the archive too, if he wants them archived. It's a good and simple rule that the server that generates the WAL, archives the WAL.

Question time: why was the check based on the recovery state of the node removed in 1bd42cd?
Just guessing, but did you have in mind that relying on XLogArchiveForceDone and XLogArchiveCheckDone was enough and more robust at this point?

> Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files.

The additional process at promotion sounds like a good idea; I'll try to get a patch done tomorrow. This would also result in removing the XLogArchiveForceDone stuff. Either way, now that I have been able to reproduce the problem manually, things can clearly be solved.

Regards,
--
Michael
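The walEnd validity check Michael mentions could look roughly like this in 9.2-era WAL addressing (the placement in ProcessWalSndrMessage and the message wording are assumptions, not a committed patch):

    /* Reject a WAL pointer inside the FF segment, which the 9.2 on-disk
     * layout never uses (XLogSegsPerFile segments per logical file). */
    if (walEnd.xrecoff / XLogSegSize >= XLogSegsPerFile)
        ereport(ERROR,
                (errmsg("invalid WAL pointer %X/%X received from primary",
                        walEnd.xlogid, walEnd.xrecoff)));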
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 8 October 2014 01:47, Peter Geoghegan p...@heroku.com wrote:
> It seems like what you're talking about here is just changing the spelling of what I already have. I think that would be confusing to users when the time comes to actually implement a fully-generalized MERGE, even with the clearly distinct MERGE CONCURRENTLY variant outlined here (which, of course, lacks an outer join, unlike MERGE proper).

I changed my view on this after some more thought. (Hope that helps.)

If we implement MERGE, I can see we may also wish to implement MERGE CONCURRENTLY one day. That would be different from UPSERT. So in the future I think we will need 3 commands:

1. MERGE
2. MERGE CONCURRENTLY
3. UPSERT

So we no longer need to have the command start with the MERGE keyword.

> However, unlike the idea of trying to square the circle of producing a general purpose MERGE command that also supports the UPSERT use-case, my objection to this much more limited proposal is made purely on aesthetic grounds. I think that it is not very user-friendly; I do not think that it's a total disaster, which is what trying to solve both problems at once (MERGE bulk loading and UPSERTing) would result in. So FWIW, if the community is really set on something that includes the keyword MERGE, which is really all you outline here, then I can live with that.

We will one day have MERGE according to the SQL Standard. My opinion is that the syntax for this should be similar to MERGE in the *body* of the command, rather than some completely different syntax, e.g.

    WHEN NOT MATCHED THEN
        INSERT
    WHEN MATCHED THEN
        UPDATE

I'm happy for us to put the syntax to a vote, as long as we bear in mind that we will one day have MERGE as well.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
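The body shape Simon is advocating, written out in standard-SQL form (PostgreSQL did not implement MERGE at the time; this only shows the spelling):

    MERGE INTO author a
    USING (VALUES ('Jack', 'tom@example.com', 34)) AS v(name, email, age)
       ON a.name = v.name
    WHEN MATCHED THEN
        UPDATE SET age = v.age
    WHEN NOT MATCHED THEN
        INSERT VALUES (v.name, v.email, v.age);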
Re: [HACKERS] Promise index tuples for UPSERT
On 8 October 2014 00:34, Peter Geoghegan p...@heroku.com wrote:
> INSERTs see #2 win, and by a wider margin than #1 beat #2 with UPDATEs. However, insert.sh is by design very unsympathetic towards #1. It uses a serial primary key, so every INSERT uselessly obtains a HW lock on the same leaf page for the duration of heap insertion. Anyway, the median INSERT TPS number is 17,759.53 for #1, and 20,441.57 TPS for #2. So you're pretty much seeing the full brunt of page heavyweight locking, and it isn't all that bad.

Let's see the results of running a COPY please.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi,

Attached you can find the next version of my LW_SHARED patchset. Now that atomics are committed, it seems like a good idea to also add their raison d'être.

Since the last public version I have:
* Addressed lots of Amit's comments. Thanks!
* Performed a fair amount of testing.
* Rebased the code. The volatile removal made that not entirely trivial...
* Significantly cleaned up and simplified the code.
* Updated comments and such.
* Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case).

The feature currently consists of two patches:

1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and verbose. This also:
   * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue.
   * adds a pg_write_barrier() in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. I can reproduce crashes without it.

2) Implement the wait-free LW_SHARED algorithm.

Personally I'm quite happy with the new state. I think it needs more review, but I personally don't know of anything that needs changing. There are lots of further improvements that could be done, but let's get this in first.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 7 Oct 2014 15:32:34 +0200
Subject: [PATCH 1/2] Convert the PGPROC->lwWaitLink list into a dlist instead
 of open coding it.

Besides being shorter and much easier to read it:
* changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue.
* adds a pg_write_barrier() in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent.
Author: Andres Freund
---
 src/backend/access/transam/twophase.c |   1 -
 src/backend/storage/lmgr/lwlock.c     | 151 +-
 src/backend/storage/lmgr/proc.c       |   2 -
 src/include/storage/lwlock.h          |   5 +-
 src/include/storage/proc.h            |   3 +-
 5 files changed, 60 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d5409a6..6401943 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	proc->roleId = owner;
 	proc->lwWaiting = false;
 	proc->lwWaitMode = 0;
-	proc->lwWaitLink = NULL;
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 9fe6855..e6f9158 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -35,6 +35,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/slot.h"
+#include "storage/barrier.h"
 #include "storage/ipc.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
@@ -115,9 +116,9 @@ inline static void
 PRINT_LWDEBUG(const char *where, const LWLock *lock)
 {
 	if (Trace_lwlocks)
-		elog(LOG, "%s(%s %d): excl %d shared %d head %p rOK %d",
+		elog(LOG, "%s(%s %d): excl %d shared %d rOK %d",
 			 where, T_NAME(lock), T_ID(lock),
-			 (int) lock->exclusive, lock->shared, lock->head,
+			 (int) lock->exclusive, lock->shared,
 			 (int) lock->releaseOK);
 }
@@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
 	lock->exclusive = 0;
 	lock->shared = 0;
 	lock->tranche = tranche_id;
-	lock->head = NULL;
-	lock->tail = NULL;
+	dlist_init(&lock->waiters);
 }
@@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
 		proc->lwWaiting = true;
 		proc->lwWaitMode = mode;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);

 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		proc->lwWaiting = true;
 		proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);

 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
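The core of patch 2 is, per the thread's description, roughly the following optimistic protocol (state layout and the flag name are assumptions based on the discussion, not the patch itself):

    uint32 old_state;

    /* optimistically register as a shared locker with a single atomic add */
    old_state = pg_atomic_fetch_add_u32(&lock->state, 1);

    if (old_state & LW_VAL_EXCLUSIVE)   /* assumed exclusive-holder bit */
    {
        /* potentially spurious: undo the increment and fall back to
         * queuing on lock->waiters */
        pg_atomic_fetch_sub_u32(&lock->state, 1);
    }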
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/09/13 2:42), Fujii Masao wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easier to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change the cleanup size per session, like they can do by setting work_mem, and your idea would prevent them from doing that... So what about introducing pending_list_cleanup_size also as a GUC? That is, users can set the threshold by using either the reloption or the GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption.

Agreed.

I'm not sure about the idea of being able to change it per session, though. Do you mean that you would like insert processes to use a very large value so that they can just append new values to the pending list, and have vacuum use a small value so that it cleans up as soon as it runs? Two things: 1. we could have an autovacuum_ reloption which only changes what autovacuum does; 2. we could have autovacuum run index cleanup actions separately from actual vacuuming.

Yes, I was thinking something like that. But if autovacuum can already handle that, that's nice. Anyway, as you pointed out, it's better to have both a GUC and a reloption for the cleanup size of the pending list.

OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have about the patch?

Please see the attached patch. In this patch, I introduced the GUC parameter, pending_list_cleanup_size. I chose 4MB as the default value of the parameter. But do you have any better idea about that default value?

BTW, I moved the CommitFest entry of this patch to next CF 2014-10.

Regards,

-- Fujii Masao

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 5907,5912 **** SET XML OPTION { DOCUMENT | CONTENT };
--- 5907,5933 ----
       </listitem>
      </varlistentry>

+     <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size">
+      <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>pending_list_cleanup_size</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum size of the GIN pending list which is used
+        when <literal>FASTUPDATE</> is enabled. If the list grows
+        larger than this maximum size, it is cleaned up by moving
+        the entries in it to the main GIN data structure in bulk.
+        The default is four megabytes (<literal>4MB</>). This setting
+        can be overridden for individual GIN indexes by changing
+        storage parameters.
+        See <xref linkend="gin-fast-update"> and <xref linkend="gin-tips">
+        for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
    </sect2>
    <sect2 id="runtime-config-client-format">

*** a/doc/src/sgml/gin.sgml
--- b/doc/src/sgml/gin.sgml
***
*** 728,735 ****
      from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
      <acronym>GIN</> is capable of postponing much of this work by inserting
      new tuples into a temporary, unsorted list of pending entries.
!     When the table is vacuumed, or if the pending list becomes too large
!     (larger than <xref linkend="guc-work-mem">), the entries are moved to the
      main <acronym>GIN</acronym> data structure using the same bulk insert
      techniques used during initial index creation. This greatly improves
      <acronym>GIN</acronym> index update speed, even counting the additional
--- 728,735 ----
      from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
      <acronym>GIN</> is capable of postponing much of this work by inserting
      new tuples into a temporary, unsorted list of pending entries.
!     When the table is vacuumed, or if the pending list becomes larger than
!     <xref linkend="guc-pending-list-cleanup-size">, the entries are moved to the
      main <acronym>GIN</acronym> data structure using the same bulk insert
      techniques used during initial index creation. This greatly improves
      <acronym>GIN</acronym> index update speed, even counting the additional
***
*** 812,829 ****
     </varlistentry>

     <varlistentry>
!    <term><xref linkend="guc-work-mem"></term>
      <listitem>
       <para>
        During a series of insertions into an existing <acronym>GIN</acronym>
        index that has <literal>FASTUPDATE</> enabled, the
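For concreteness, this is how the two knobs proposed in the (truncated) patch above would be used - a hedged sketch only, since the patch is still under review and the reloption spelling and unit handling here are assumptions (the table and index names are made up):

-- cluster- or session-wide threshold, as a GUC
SET pending_list_cleanup_size = '8MB';

-- per-index override via the proposed storage parameter
CREATE INDEX docs_body_gin ON docs USING gin (to_tsvector('english', body))
  WITH (fastupdate = on, pending_list_cleanup_size = 4096);  -- assumed kB units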
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/08/2014 10:44 AM, Michael Paquier wrote: On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote: We kept the WAL files and log files for further analysis. How can we help regarding this issue?

Commit c2f79ba has added an assumption that the WAL receiver should always enforce the creation of .done files when WAL files are done being streamed (XLogWalRcvWrite and WalReceiverMain) or archived (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed RemoveOldXlogFiles a bit, removing a check of whether the node is in recovery. Now, based on the information given here, yes, it happens that there are still cases where .done file creation is not correctly done, leading to those extra files. Even by looking at the code, I am not directly seeing any code paths where an extra call to XLogArchiveForceDone would be needed on the WAL receiver side, but... Something like the patch attached (which is clearly a band-aid) may help though, as it would allow files to be removed even if they are not marked as .done on a node in recovery. And this is consistent with the pre-1bd42cd behavior.

There are two mysteries here:

1. Where do the FF files come from? In 9.2, FF segments are not supposed to be created, ever. Since this only happens with streaming replication, the FF segments are probably being created by walreceiver. XLogWalRcvWrite is the function that opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite opens the file corresponding to the start position in the message received from the master. There is no check that the start position is valid, though; if the master sends a start position in the FF segment, walreceiver will merrily write it. So the problem could be on the walsender side. However, I don't see anything wrong there either. I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.

2. Why are the .done files sometimes not being created? I may have an explanation for that. Walreceiver creates a .done file when it closes an old segment and opens a new one. However, it does this only when it's about to start writing to the new segment, and still has the old segment open. If you stream the FE segment fully, but drop the replication connection at exactly that point, the .done file is not created. That might sound unlikely, but it's actually pretty easy to trigger. Just do "select pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a restart.

The creation of the .done files seems quite unreliable anyway. If only a portion of a segment is streamed, we don't write a .done file for it, so we still have the original problem that we will try to archive the segment after failover, even though the master might already have archived it. I looked again at the thread where this was discussed: http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com. I believe the idea was that the server that generates a WAL segment is always responsible for archiving it. A standby should never attempt to archive a WAL segment that was restored from the archive, or streamed from the master.

In that thread, it was not discussed what should happen to WAL files that an admin manually copies into pg_xlog of the standby. Should the standby archive them? I don't think so - the admin should copy them manually to the archive too, if he wants them archived.
It's a good and simple rule that the server that generates the WAL, archives the WAL. Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files.

What happens if a user shuts down the standby, removes recovery.conf and starts the server as the master? In this case, no WAL files have .done status files, so the server will create .ready files and archive all of them. Probably this is problematic. So even if we adopt your idea, ISTM that it's better to create a .done file whenever a WAL file is fully streamed or restored.

Regards,

-- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
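Heikki's reproduction recipe from above, spelled out as commands (a sketch only; connection settings and the data directory are assumed to be configured, and the timing relative to the segment boundary still has to line up):

$ psql -c "SELECT pg_switch_xlog();"    # on the master: finish out the current segment
$ pg_ctl -D $PGDATA stop -m immediate   # drop the replication connection at exactly that point
$ pg_ctl -D $PGDATA start               # after restart, the fully-streamed segment on the standby has no .done file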
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 2014-09-29 13:38:37 -0400, Robert Haas wrote: On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost sfr...@snowman.net wrote: Lastly, I will say that I feel it'd be good to support bi-directional communication as I think it'll be needed eventually, but I'm not sure that has to happen now.

I agree we need bidirectional communication; I just don't agree that the other direction should use the libpq format. The data going from the worker to the process that launched it is stuff like errors and tuples, for which we already have a wire format. The data going in the other direction is going to be things like plan trees to be executed, for which we don't. But if we can defer the issue, so much the better. Things will become clearer as we get closer to being done.

I think that might be true for your use case, but not for others. It's perfectly conceivable that one might want to ship tuples to a couple of bgworkers using the COPY protocol or such. I don't think it needs to be fully implemented, but I think we should design it in a way that makes it unlikely to require larger changes to the code added here when we do add it.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 2014-09-10 16:53:12 -0400, Robert Haas wrote:

diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..0b8db42 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -37,6 +37,31 @@ typedef struct
 	} u;
 } PQArgBlock;

+typedef struct
+{
+	void (*comm_reset)(void);
+	int  (*flush)(void);
+	int  (*flush_if_writable)(void);
+	bool (*is_send_pending)(void);
+	int  (*putmessage)(char msgtype, const char *s, size_t len);
+	void (*putmessage_noblock)(char msgtype, const char *s, size_t len);
+	void (*startcopyout)(void);
+	void (*endcopyout)(bool errorAbort);
+} PQsendMethods;
+
+PQsendMethods *PqSendMethods;

WRT my complaint in the other subthread, I think PQsendMethods should just be renamed to PQcommMethods or similar. That, in combination with a /* at some point we might want to add methods for receiving data here */ comment, looks like it'd already satisfy my complaint ;)

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Context lenses to set/get values in json values.
On 10/08/2014 04:38 AM, Paweł Cesar Sanjuan Szklarz wrote: Hello. I am interested in the json type on postgresql. I would like to implement additional operations on the json structure that may extract/insert table-like information from the json tree structure. I have an implementation in javascript that shows this type of operations. You can see examples on this page: https://github.com/paweld2/eelnss/wiki

Following the examples on the previous page, it may be possible to implement a function similar to json_populate_record to extract multiple records from a single json value, for example:

select * from json_populate_records_with_clen(null::myrowtype_users, 'app.users.{:uID}.(email,data.name,isActive)', '... nested json value ...')

may return:

 uID | email              | name          | isActive
-----+--------------------+---------------+----------
 u1  | ad...@pmsoft.eu    | administrator | true
 u2  | nor...@pmsoft.eu   | user          | true
 u3  | testu...@pmsoft.eu | testUser      | false

Also, assuming that we have a table User as above (uID, email, name, isActive), with context lenses it is very simple to map the table to a json object. I assume that an api similar to table_to_xml/query_to_xml may be provided:

table_to_json( Person, 'app.users.{:uID}.(email,data.name,isActive)');
query_to_json( 'select * from Person where ... ', 'app.users.{:uID}.(email,data.name,isActive)');

I don't know the details about the integration of functions/operators into sql queries, but because context lenses map between tables and tree objects, it may be possible to use a json column value as a separate table in queries. Assume the table:

create table Person (
  pID integer,
  address json
);

Then it may be possible to query:

select * from Person as P
  left join (
    select * from json_populate_records_with_clen(
      null::addressType,
      'address.(street.number,street.local,city.code,city.name)',
      P.address)
  ) as A on true;

A final api for such functions needs to be defined. If such functions would be useful, I can try to prepare an implementation in the postgres base code.

I don't think we need to import Mongo-type notation here. But there is probably a good case for some functions like:

json_table_agg(anyrecord) -> json

which would work like json_agg() but would return an array of arrays instead of an array of objects. The caller would be assumed to know which field is which in the array. That should take care of both the table_to_json and query_to_json suggestions above.

In the other direction, we could have something like:

json_populate_recordset_from_table(base anyrecord, fields text[], jsontable json) -> setof record

where jsontable is an array of arrays of values and fields is a corresponding array of field names.

I'm not sure how mainstream any of this is. Maybe an extension would be more appropriate?

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
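To make the shape of those two suggestions concrete, hypothetical invocations might look like the following (neither function exists; the names, signatures, and the users table are illustrative only):

-- array-of-arrays output, one inner array per row
SELECT json_table_agg(u) FROM users u;
-- e.g. [["u1","admin@example.com",true],["u2","user@example.com",false]]

-- and back again, with fields naming what each array position means
SELECT *
FROM json_populate_recordset_from_table(
       NULL::users,
       ARRAY['uid', 'email', 'isactive'],
       '[["u1","admin@example.com",true],["u2","user@example.com",false]]');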
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On 10/08/2014 04:59 PM, Fujii Masao wrote: On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files. What happens if a user shutdowns the standby, removes recovery.conf and starts the server as the master? Um, that's not a safe thing to do anyway, is it? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Context lenses to set/get values in json values.
On Wed, Oct 8, 2014 at 4:25 PM, Andrew Dunstan and...@dunslane.net wrote: [...] I'm not sure how mainstream any of this is. Maybe an extension would be more appropriate?

Hello. My personal interest is to send updates to a single json value in the server. What is the best way to make an update to a json value in postgres without a full rewrite of the already stored value? The -> operator extracts an internal value, but to update the value I don't see any operator.

I was not familiar with the extensions, but it looks like the best way to start is to create an extension with possible implementations of new functions. I will do so. In my project I considered using mongo, but in my case the core part of the model matches a relational schema perfectly.
I have some leaf concepts that will change frequently, and to avoid migrations I store that information in a json value. To make changes in such leaf values I would like to have a context-lenses-like api in the server. I will start with some toy extension and try to get a feel for whether this makes sense.

Regards,
Pawel.
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 13:13:33 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. Really? I've spent more than a day debugging things with the current code. And Heikki introduced a bug in it. If you look at how the code looks before/after I find the difference pretty clear. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks. Hm. I can just do that with the debugger here. Not sure if that's because I added the right thing to my .gdbinit or because I use the correct compiler flags. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
Robert Haas wrote: On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable.

I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks.

As far as I recall you can get gdb to understand those pointer games by defining some structs or macros. Maybe we can improve by documenting this.

-- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Context lenses to set/get values in json values.
On 10/08/2014 12:13 PM, Paweł Cesar Sanjuan Szklarz wrote: [...] My personal interest is to send updates to a single json value in the server. What is the best way to make an update to a json value in postgres without a full rewrite of the already stored value? [...] I will start with some toy extension and try to get a feel for whether this makes sense.

There is work already being done on providing update operations.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
Hi,

On Saturday, 04.10.2014 at 15:05 -0500, Jim Nasby wrote: On 10/4/14, 1:21 PM, Jeff Janes wrote: On Thu, Oct 2, 2014 at 6:21 AM, Michael Banck wrote: we have seen repeatedly that users can be confused about why PostgreSQL is not shutting down even though they requested it. Usually, this is because `log_checkpoints' is not enabled and the final checkpoint is being written, delaying shutdown. As no message besides "shutting down" is written to the server log in this case, we even had users believing the server was hanging and pondering killing it manually.

Wouldn't a better place to write this message be the terminal from which pg_ctl stop was invoked, rather than the server log file?

Looking at it from a DBA perspective, this would indeed be better, yes. However, I see a few issues with that:

1. If you are using an init script (or another wrapper around pg_ctl), you don't get any of its output, it seems.

2. Having taken a quick look at pg_ctl, it seems to just kill the postmaster on shutdown and wait for its PID file to disappear. I don't see how it could figure out that PostgreSQL is waiting for a checkpoint to finish.

Or do both. I suspect elog(INFO, ...) might do that.

That would imply that pg_ctl receives and writes out log messages directed at clients, which I don't think is true. Even if it did, client_min_messages does not include an INFO level, and LOG is not logged to clients by default. So the first common level above the defaults of both client_min_messages and log_min_messages would be WARNING, which seems excessive to me. As I said, I only took a quick look at pg_ctl though, so I might well be missing something.

Michael

-- Michael Banck Projektleiter / Berater Tel.: +49 (2161) 4643-171 Fax: +49 (2161) 4643-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 14:23:44 -0300, Alvaro Herrera wrote: As far as I recall you can get gdb to understand those pointer games by defining some structs or macros. Maybe we can improve by documenting this.

So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defining away some things that gdb doesn't handle:

macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
macro define __extension__
macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)

Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope.

As an example, the following works:

(gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, &BackendList)

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs si...@2ndquadrant.com wrote: Having said that, it would be much nicer to have a mode that allows you to just say the word UPDATE and have it copy the data into the correct columns, like MySQL does. That is very intuitive, even if it isn't very flexible. I thought about this, and at first I agreed, but now I'm not so sure. Consider the case where you write an INSERT ... ON CONFLICT UPDATE ALL query, or however we might spell this idea. 1. Developer writes the query, and it works fine. 2. Some time later, the DBA adds an inserted_at column (those are common). The DBA is not aware of the existence of this particular query. The new column has a default value of now(), say. Should we UPDATE the inserted_at column to be NULL? Or (more plausibly) the default value filled in by the INSERT? Or leave it be? I think there is a case to be made for all of these behaviors, and that tension makes me prefer to not do this at all. It's like encouraging SELECT * queries in production, only worse. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
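Peter's scenario, sketched with a hypothetical "UPDATE ALL" spelling (no such syntax exists in any patch; it is only here to make the ambiguity visible):

CREATE TABLE t (id int PRIMARY KEY, val text);
-- later, the DBA adds a column the query author never saw:
ALTER TABLE t ADD COLUMN inserted_at timestamptz DEFAULT now();

INSERT INTO t (id, val) VALUES (1, 'x')
  ON CONFLICT UPDATE ALL;  -- hypothetical spelling
-- If id = 1 already exists: should inserted_at become NULL, take the
-- INSERT's default now(), or keep the existing row's value? Each answer
-- is defensible, which is exactly the tension described above.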
Re: [HACKERS] Context lenses to set/get values in json values.
On 10/08/2014 02:04 PM, Thom Brown wrote: There is work already being done on providing update operations. I've been looking out for that. Has there been a discussion on how that would look yet that you could point me to? https://github.com/erthalion/jsonbx Note that a) it's an extension, and b) it's jsonb only. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-08 15:23:22 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: 1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and verbose. This also does: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a memory pg_write_barrier() in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. I can reproduce crashes without it.

I think it's a really bad idea to mix a refactoring change (like converting PGPROC->lwWaitLink into a dlist) with an attempted performance enhancement (like changing the rules for jumping the lock queue) and a bug fix (like adding pg_write_barrier where needed). I'd suggest that the last of those be done first, and perhaps back-patched.

I think it makes sense to separate out the write barrier one. I don't really see the point of separating the other two changes. I've indeed previously started a thread (http://archives.postgresql.org/message-id/20140210134625.GA15246%40awork2.anarazel.de) about the barrier issue. IIRC you argued that it might be too expensive.

The current coding, using a hand-rolled list, touches shared memory fewer times. When many waiters are awoken at once, we clip them all out of the list at one go. Your revision moves them to a backend-private list one at a time, and then pops them off one at a time. The backend-private memory accesses don't seem like they matter much, but the shared memory accesses would be nice to avoid.

I can't imagine this to matter. We're entering the kernel for each PROC for the PGSemaphoreUnlock() and we're dirtying the cacheline for proc->lwWaiting = false anyway. This really is the slow path.

Does LWLockUpdateVar's wake-up loop need a write barrier per iteration, or just one before the loop starts? How about commenting the pg_write_barrier() with the read fence to which it pairs?

Hm. Are you picking out LWLockUpdateVar for a reason or just as an example? Because I don't see a difference between the different wakeup loops. It needs to be a barrier per iteration. Currently the loop looks like:

while (head != NULL)
{
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}

Consider what happens if either the compiler or the cpu reorders this to:

    proc->lwWaiting = false;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    PGSemaphoreUnlock(&proc->sem);

As soon as lwWaiting = false, 'proc' can wake up and acquire a new lock. Backends can wake up prematurely because proc->sem is used for other purposes than this (that's why the loops around PGSemaphoreLock exist). Then it could reset lwWaitLink while acquiring a new lock, and some processes wouldn't be woken up anymore.

The barrier it pairs with is the spinlock acquisition before requeuing. To be more obviously correct we could add a read barrier before

    if (!proc->lwWaiting)
        break;

but I don't think it's needed.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
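To make the required ordering concrete, here is the shape of the fixed loop - a sketch based on the pre-dlist code quoted above, with pg_write_barrier() placed where Andres says it is needed:

while (head != NULL)
{
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;

    /*
     * Ensure the list unlink above is visible to all CPUs before the
     * waiter can observe lwWaiting == false and start acquiring (and
     * requeuing on) other locks again.
     */
    pg_write_barrier();

    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}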
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: 2) Implement the wait free LW_SHARED algorithm.

+ * too high for workloads/locks that were locked in shared mode very

s/locked/taken/?

+ * frequently. Often we were spinning in the (obviously exlusive) spinlock,

exclusive.

+ * acquiration for locks that aren't exclusively locked.

acquisition.

+ * For exlusive lock acquisition we use an atomic compare-and-exchange on the

exclusive.

+ * lockcount variable swapping in EXCLUSIVE_LOCK/1<<31-1/0x7FFF if and only

Add comma after "variable". Find some way of describing the special value (maybe "a sentinel value, EXCLUSIVE_LOCK") just once, instead of three times.

+ * if the current value of lockcount is 0. If the swap was not successfull, we

successful.

+ * by 1 again. If so, we have to wait for the exlusive locker to release the

exclusive.

+ * The attentive reader probably might have noticed that naively doing the

"probably might" is redundant. Delete "probably".

+ * notice that we have to wait. Unfortunately until we have finished queuing,

until -> by the time

+ * Phase 2: Add us too the waitqueue of the lock

too -> to. And maybe us -> ourselves.

+ *get queued in Phase 2 and we can wake them up if neccessary or they will

necessary.

+ * When acquiring shared locks it's possible that we disturb an exclusive
+ * waiter. If that's a problem for the specific user, pass in a valid pointer
+ * for 'potentially_spurious'. Its value will be set to true if we possibly
+ * did so. The caller then has to handle that scenario.

"disturb" is not clear enough.

+/* yipeyyahee */

Although this will be clear to individuals with a good command of English, I suggest avoiding such usages.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 6:25 AM, Simon Riggs si...@2ndquadrant.com wrote: I change my view on this, after some more thought. (Hope that helps)

Great.

If we implement MERGE, I can see we may also wish to implement MERGE CONCURRENTLY one day. That would be different to UPSERT. So in the future I think we will need 3 commands: 1. MERGE 2. MERGE CONCURRENTLY 3. UPSERT So we no longer need to have the command start with the MERGE keyword.

As I've outlined, I don't see how MERGE CONCURRENTLY could ever work, but I'm glad that you agree that it should not block this effort (or indeed, some later effort to implement a MERGE that is comparable to the implementations of other database systems).

We will one day have MERGE according to the SQL Standard.

Agreed.

My opinion is that syntax for this should be similar to MERGE in the *body* of the command, rather than some completely different syntax. e.g. WHEN NOT MATCHED THEN INSERT WHEN MATCHED THEN UPDATE I'm happy that we put that to a vote on what the syntax should be, as long as we bear in mind that we will one day have MERGE as well.

While I am also happy with taking a vote, if we do so I vote against even this much less MERGE-like syntax. It's verbose, and makes much less sense when the mechanism is driven by would-be duplicate key violations rather than an outer join. I also like that when you UPSERT with the proposed ON CONFLICT UPDATE syntax, you get all the flexibility of an INSERT - you can use data-modifying CTEs, nest the statement in a data-modifying CTE, and write INSERT ... SELECT ... ON CONFLICT UPDATE .... And to be honest, it's much simpler to implement this whole feature as an adjunct to how INSERT statements are currently processed (during parse analysis, planning and execution); I don't want to make the syntax work against that. For example, consider how little I had to change the grammar to make all of this work:

$ git diff master --stat | grep gram
 src/backend/parser/gram.y | 72 ++-

The code footprint of this patch is relatively small, and I think we can all agree that that's a good thing.

-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
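As an illustration of that composability, a data-modifying CTE feeding the proposed syntax might look like this (a sketch of the proposed, not committed, ON CONFLICT UPDATE and CONFLICTING() spellings; the tables are made up):

WITH moved AS (
    DELETE FROM staging_orders
    RETURNING id, quantity
)
INSERT INTO orders (id, quantity)
SELECT id, quantity FROM moved
ON CONFLICT UPDATE
SET quantity = quantity + CONFLICTING(quantity);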
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 08/10/14 21:03, Robert Haas wrote: On Wed, Oct 8, 2014 at 10:09 AM, Andres Freund and...@2ndquadrant.com wrote: WRT my complaint in the other subthread, I think PQsendMethods should just be renamed to PQcommMethods or similar. That, in combination with a /* at some point we might want to add methods for receiving data here */ comment, looks like it'd already satisfy my complaint ;)

Well, I'm happy enough to go ahead and change that, if that's all it takes to make you happy. Is it? So far the status of these patches AIUI is:

#1 - Just committed, per discussion last week.
#2 - No comments, possibly because it's pretty boring.

Yeah, there is nothing much to say there; it can just go in.

#4 - No comments.

I think I said I like it, by which I meant it's good as it is - no objections to committing that one either.

#6 - Updated per Amit's feedback. Not sure if anyone else wants to review. Still needs docs.

I'd like to see this one when it's updated, since I lost track a little of how exactly the changes look.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote: I think the problem is that it's not possible to respect the usual guarantees even at READ COMMITTED level when performing an INSERT OR UPDATE operation (however spelled). You may find that there's a tuple with the same PK which is committed but not visible to the snapshot you took at the beginning of the statement. Can you please comment on this, Kevin? It would be nice to converge on an agreement on syntax here (without necessarily working out the exact details right away, including for example the exact spelling of what I'm calling WITHIN). Since Simon has changed his mind on MERGE (while still wanting to retain a superficial flavor of MERGE, a flavor that does not include the MERGE keyword), I think that leaves you as the only advocate for the MERGE syntax. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 08/10/14 11:28, Peter Geoghegan wrote: On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote: But the MERGE syntax, to me, strongly implies that insertion doesn't begin before determining whether a conflict exists or not. I think you're right. Another strike against the MERGE syntax, then, since as I said we cannot even know what to check prior to having before row insert triggers fire.

True, but to me it also seems to be a strike against using INSERT for this, as I don't really see how you can make triggers work in a sane way if the UPSERT is implemented as part of INSERT (at least I haven't seen any proposal that I would consider sane from the user point of view).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 7, 2014 at 5:23 AM, Simon Riggs si...@2ndquadrant.com wrote: IIRC it wasn't agreed that we needed to identify which indexes in the upsert SQL statement itself, since this would be possible in other ways and would require programmers to know which unique constraints are declared.

Kevin seemed quite concerned about that. That is something that seems hard to reconcile with supporting the MERGE syntax. Perhaps Kevin can comment on that, since he was in favor of both being able to specify user intent by accepting a unique index, while also being in favor of the MERGE syntax.

Well, I mostly wanted to make sure we properly considered what the implications were of using the standard syntax without other keywords or decorations before deciding to go the non-standard route. In spite of an alarming tendency for people to assume that meant that I didn't understand the desired semantics, I feel enough people have understood the question and weighed in in favor of an explicit choice between semantics, rather than inferring concurrency handling based on the availability of the index necessary for the slicker behavior. I'm willing to concede that overall consensus is leaning toward the view that UPSERT semantics should be conditioned on explicit syntax; I'll drop that much going forward.

Granting that, I will say that I lean toward either the MERGE syntax with CONCURRENTLY being the flag to use UPSERT semantics, or a separate UPSERT command which is as close to identical to the MERGE syntax (other than the opening verb) as possible. I see that as still needing the ON clause so that you can specify which values match which columns from the target table. I'm fine with starting with the syntax in the standard, which has no DELETE or IGNORE options (as of the latest version I've seen). So the syntax I'm suggesting is close to what Simon is suggesting, but a more compliant form would be:

MERGE CONCURRENTLY INTO foo
  USING (VALUES (valuelist) aliases)
  ON (conditions)
  WHEN NOT MATCHED
    INSERT [ (columnlist) ] VALUES (valuelist)
  WHEN MATCHED
    UPDATE SET colname = expression [, ...]

Rather than pseudo-randomly picking a unique index or using a constraint or index name, the ON condition would need to allow matching based on equality to all columns of a unique index which only referenced NOT NULL columns; we would pick an index which matched those conditions. In any event, the unique index would be required if CONCURRENTLY was specified. Using column matching to pick the index (like we do when specifying a FOREIGN KEY constraint) is more in keeping with other SQL statements, and seems generally safer to me. It would also make it fairly painless for people to switch concurrency techniques for what is, after all, exactly the same operation except for differences in handling of concurrent conflicting DML. As I said, I'm also OK with using UPSERT in place of MERGE CONCURRENTLY.

I also feel that if we could allow:

  USING (VALUES (valuelist) [, ...])

that would be great. In fact, I don't see why that can't be pretty much any relation, but it doesn't have to be for a first cut. A relation would allow a temporary table to be loaded with a batch of rows where the intent is to UPSERT every row in the batch, without needing to write a loop to do it.
-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
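A concrete rendering of that batch case, in Kevin's proposed (entirely hypothetical) syntax - neither the command nor the tables exist:

CREATE TEMP TABLE incoming (id int, quantity int);
INSERT INTO incoming VALUES (123, 100), (124, 50);  -- load the batch

MERGE CONCURRENTLY INTO targettable t
  USING incoming x
  ON (t.id = x.id)
  WHEN NOT MATCHED
    INSERT (id, quantity) VALUES (x.id, x.quantity)
  WHEN MATCHED
    UPDATE SET quantity = t.quantity + x.quantity;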
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 8 October 2014 21:16, Peter Geoghegan p...@heroku.com wrote: My opinion is that syntax for this should be similar to MERGE in the *body* of the command, rather than some completely different syntax. e.g. WHEN NOT MATCHED THEN INSERT WHEN MATCHED THEN UPDATE I'm happy that we put that to a vote on what the syntax should be, as long as we bear in mind that we will one day have MERGE as well.

While I am also happy with taking a vote, if we do so I vote against even this much less MERGE-like syntax. It's verbose, and makes much less sense when the mechanism is driven by would-be duplicate key violations rather than an outer join.

It wouldn't be driven by an outer join; I'm not sure where that comes from. MERGE is verbose, I agree. I don't always like the SQL Standard, I just think we should follow it as much as possible. You can't change the fact that MERGE exists, so I don't see a reason to have two variants of syntax that do roughly the same thing. MERGE syntax would allow many things, such as this:

WHEN NOT MATCHED AND x > 7 THEN INSERT
WHEN NOT MATCHED THEN INSERT
WHEN MATCHED AND y = 5 THEN DO NOTHING
WHEN MATCHED THEN UPDATE

etc.

I also like that when you UPSERT with the proposed ON CONFLICT UPDATE syntax, you get all the flexibility of an INSERT - you can use data-modifying CTEs, and nest the statement in a data-modifying CTE, and INSERT ... SELECT ... ON CONFLICT UPDATE .... And to be honest, it's much simpler to implement this whole feature as an adjunct to how INSERT statements are currently processed (during parse analysis, planning and execution); I don't want to make the syntax work against that.

I spoke to someone today who preferred a new command keyword, like UPSERT, because the semantics of triggers are weird. Having a before insert trigger fire when there is no insert is quite strange. Properly documenting that on hackers would help; have the comments made on the Django list been replayed here in some form?

I'm very scared by your comments about data-modifying CTEs etc. You have no definition of how they will work, nor tests of that. That part isn't looking like a benefit as things currently stand. I'm still waiting for some more docs to describe your intentions so they can be reviewed.

Also, it would be useful to hear that you're going to do something about the references to rows using conflicting(), since nobody has agreed with you there. Or, hopefully, even that you've listened and implemented something differently already. (We need that, whatever we do with other elements of syntax.)

Overall, I'm not seeing too many comments that indicate you are accepting review comments and acting upon them. If you want acceptance from others, you need to begin with some yourself.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote: I think the problem is that it's not possible to respect the usual guarantees even at READ COMMITTED level when performing an INSERT OR UPDATE operation (however spelled). You may find that there's a tuple with the same PK which is committed but not visible to the snapshot you took at the beginning of the statement.

Can you please comment on this, Kevin? It would be nice to converge on an agreement on syntax here

Robert said "however spelled" -- which I take to mean that he at least sees that the MERGE-like UPSERT syntax can be turned into the desired semantics. I have no idea why anyone would think otherwise.

Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me, but if the consensus is that we want to do that, it does argue for your plan of making UPSERT a variant of the INSERT command. That doesn't seem as clean to me as saying (for example):

UPSERT targettable t
  USING (VALUES (123, 100)) x(id, quantity)
  ON t.id = x.id
  WHEN NOT MATCHED
    INSERT (id, quantity, inserted_at) VALUES (x.id, x.quantity, now())
  WHEN MATCHED
    UPDATE SET quantity = t.quantity + x.quantity, updated_at = now();

As I understand it, you are proposing that would be:

INSERT INTO targettable(key, quantity, inserted_at)
  VALUES(123, quantity, now())
  ON CONFLICT WITHIN targettable_pkey
  UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now();

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What exactly is our CRC algorithm?
On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote: Hmm. So the simple, non-table-driven calculation gives the same result as using the lookup table with the reflected lookup code. That's expected; the lookup method is supposed to be the same, just faster. However, using the normal lookup code, but with a reflected lookup table, produces the same result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But AFAICS, that's an incorrect combination. You're supposed to use the non-reflected lookup table with the non-reflected lookup code; you can't mix and match.

As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't correspond to any bit-by-bit CRC variant and polynomial. My math skills are not strong enough to determine what the consequences of that are. It might still be a decent checksum. Or not. I couldn't tell if the good error detection properties of the normal CRC-32 polynomial apply to our algorithm or not.

Additional interesting datapoints are that hstore and ltree contain the same tables - but properly use the reflected computation.

Thoughts? It clearly seems like a bad idea to continue with this - I don't think anybody here knows which guarantees this gives us.

The question is how we can move away from this. There are unfortunately two places that embed PGC32 that are likely to prove problematic when fixing the algorithm: pg_trgm and tsgist both seem to include CRCs in their logic in a persistent way. I think we should provide INIT/COMP/FIN_PG32 using the current algorithm for these.

If we're switching to a saner computation, we should imo also switch to a better polynomial - CRC-32C has better error detection capabilities than CRC32 and is available in hardware. As we're paying the price of breaking compat anyway...

Arguably we could also say that, given that there have been few evident problems with the borked computation, we could switch to a much faster hash instead of continuing to use a CRC...

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
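To make the mix-and-match point concrete, here is a self-contained sketch (call build_reflected_table() once before use). Both update steps below consume the same reflected lookup table; only the first pairs it with the matching reflected shift direction. The second pairs the reflected table with the normal, non-reflected step - the shape of the combination Heikki describes. This is an illustration, not a copy of the backend's macros.

#include <stdint.h>
#include <stddef.h>

/* Reflected CRC-32 lookup table (reversed polynomial 0xEDB88320). */
static uint32_t crc_table[256];

static void
build_reflected_table(void)
{
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = i;
        for (int j = 0; j < 8; j++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[i] = c;
    }
}

/* Correct pairing: reflected table with the reflected update step. */
static uint32_t
crc32_reflected(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = (crc >> 8) ^ crc_table[(crc ^ *buf++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}

/* Mismatched pairing: reflected table fed to the normal (non-reflected)
 * update step, which no longer corresponds to any bit-by-bit CRC. */
static uint32_t
crc32_mismatched(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = (crc << 8) ^ crc_table[((crc >> 24) ^ *buf++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}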
Re: [HACKERS] Add CREATE support to event triggers
About patch 1, which I just pushed, there were two reviews:

Andres Freund wrote: On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote:

diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h
new file mode 100644
index 000..520b066
--- /dev/null
+++ b/src/include/utils/ruleutils.h

I wondered for a minute whether any of these are likely to cause problems for code just including builtins.h - but I think there will be sufficiently few callers for them to make that not much of a concern.

Great.

Michael Paquier wrote: Patch 1: I still like this patch as it gives a clear separation of the built-in functions and the sub-functions of ruleutils.c that are completely independent. Have you considered adding the external declaration of quote_all_identifiers as well? It is true that this impacts extensions (some of my stuff as well), but my point is to bite the bullet and make the separation cleaner between builtins.h and ruleutils.h. Attached is a patch that can be applied on top of patch 1 doing so... Feel free to discard it for the potential breakage this would create, though.

Yes, I did consider moving the quoting function definitions and the global out of builtins.h. It is much more disruptive, however, because it affects a lot of external code, not only our own; and given the looks I got after people had to mess with adding the htup_details.h header to external projects due to the refactoring I did to htup.h in an earlier release, I'm not sure about it.

However, if I were to do it, I would instead create a quote.h file and would also add the quote_literal_cstr() prototype to it, perhaps even moving the implementations from their current ruleutils.c location to quote.c. (Why half the stuff is in ruleutils.c rather than quote.c is beyond me.)

-- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
Peter Geoghegan p...@heroku.com wrote: On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Instead of naming the index, you should name the columns, and the system can look up the index or indexes that match those columns. +1 That is what I have been consistently requesting instead of index names, every time I notice it mentioned. It's not totally clear that we need *any* WITHIN clause, BTW. I'm not dead set on it. It was something I mainly added at Kevin's request. I do see the risks, though. What I said was in response to your assertion that your technique would *never* generate a duplicate key error. As others have again been pointing out, when there is more than one unique index you can have rows to apply which cannot be applied accurately without causing such an error. What I requested was that the behavior in those cases be clear and documented. I didn't take a position on whether you pick an index or ignore the input row (with a warning?), but we need to decide how it is handled. I have made the same point Heikki is making, though -- we have no business referencing an index name in the statement. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: On Tue, Oct 7, 2014 at 1:24 PM, Simon Riggs si...@2ndquadrant.com wrote: I hope we can get pgAudit in as a module for 9.5. I also hope that it will stimulate the requirements/funding of further work in this area, rather than squash it. My feeling is we have more examples of feature sets that grow over time (replication, view handling, hstore/JSONB etc) than we have examples of things languishing in need of attention (partitioning).

+1

To this point, specifically, I'll volunteer to find time in November to review pgAudit for inclusion as a contrib module. I feel a bit responsible for it not being properly reviewed earlier due to my upgrade and similar concerns. Perhaps the latest version should be posted and added to the commitfest for 2014-10, and I'll put myself down as a reviewer? I don't see it there now.

I don't mean to be dismissive by suggesting it be added to the commitfest - I honestly don't see myself having time before November given the other things I'm involved in right now and pgConf.eu happening in a few weeks. Of course, if someone else has time to review, please do.

Thanks,

Stephen
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, Oct 8, 2014 at 2:50 PM, Kevin Grittner kgri...@ymail.com wrote: What I said was in response to your assertion that your technique would *never* generate a duplicate key error. That is strictly true: the INSERT cannot raise a unique violation error, because to do so would cause it to take the UPDATE path (assuming there is no WITHIN clause). The UPDATE may get one, though. It doesn't have to get one for your statement to have effects that are surprising, though. As others have again been pointing out, when there is more than one unique index you can have rows to apply which cannot be applied accurately without causing such an error. It's simpler than that. You want to make sure that the right condition - the right unique index having a would-be duplicate violation - is the one taken *in practice*, the condition on which the UPDATE path is taken. What happens after that is not that interesting (e.g. whether or not there is an UPDATE-path duplicate violation), because either way it's broken. What I requested was that the behavior in those cases be clear and documented. I didn't take a position on whether you pick an index or ignore the input row (with a warning?), but we need to decide how it is handled. I have made the same point Heikki is making, though -- we have no business referencing an index name in the statement. I think that's fair enough. That's all fine - but actually doing that is quite tricky. Look at what I've said in relation to doing that. Consider the corner cases I name. They're certainly only a small minority of cases in practice, but as an implementer I still need to worry about them (maybe even just as much). Yes, I would like to just name columns, but it's hard to make that work fully generally. If you want me to do what you say, fine. But in order to do that, I need support for having the corner cases make parse analysis throw up its hands and error out. Is that a price you're willing to pay? If so, then I'll implement it. Or, alternatively, we could do WITHIN PRIMARY KEY/NOT WITHIN PRIMARY KEY. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What exactly is our CRC algorithm?
On 09/10/14 10:13, Andres Freund wrote: On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote: Hmm. So the simple, non-table-driven calculation gives the same result as using the lookup table with the reflected lookup code. That's expected; the lookup method is supposed to be the same, just faster. However, using the normal lookup code, but with a reflected lookup table, produces the same result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But AFAICS, that's an incorrect combination. You're supposed to use the non-reflected lookup table with the non-reflected lookup code; you can't mix and match. As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't correspond to any bit-by-bit CRC variant and polynomial. My math skills are not strong enough to determine what the consequences of that are. It might still be a decent checksum. Or not. I couldn't tell if the good error detection properties of the normal CRC-32 polynomial apply to our algorithm or not. Additional interesting datapoints are that hstore and ltree contain the same tables - but properly use the reflected computation. Thoughts? It clearly seems like a bad idea to continue with this - I don't think anybody here knows which guarantees this gives us. The question is how we can move away from this. There are unfortunately two places that embed PGC32 that are likely to prove problematic when fixing the algorithm: pg_trgm and tsgist both seem to include CRCs in their logic in a persistent way. I think we should provide INIT/COMP/FIN_PG32 using the current algorithm for these. If we're switching to a saner computation, we should imo also switch to a better polynomial - CRC-32C has better error detection capabilities than CRC32 and is available in hardware. As we're paying the price of breaking compat anyway... Arguably we could also say that, given that there have been few evident problems with the borked computation, we could switch to a much faster hash instead of continuing to use a CRC... Greetings, Andres Freund Could a 64-bit variant of some kind be useful as an option, in addition to a correct 32-bit one? Most people have 64-bit processors, storage is less constrained nowadays, and we tend to store much larger chunks of data. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
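To make the reflected-versus-normal distinction concrete, here is a minimal standalone sketch (not PostgreSQL source; the function and table names are invented, and it assumes the standard CRC-32 polynomial, 0xEDB88320 in reflected form and 0x04C11DB7 in normal form):

    #include <stdint.h>
    #include <stddef.h>

    /* Build the table for the reflected (LSB-first) algorithm. */
    static void
    build_reflected_table(uint32_t table[256])
    {
        for (uint32_t i = 0; i < 256; i++)
        {
            uint32_t    r = i;

            for (int bit = 0; bit < 8; bit++)
                r = (r & 1) ? (r >> 1) ^ 0xEDB88320 : (r >> 1);
            table[i] = r;
        }
    }

    /* Build the table for the normal (MSB-first) algorithm. */
    static void
    build_normal_table(uint32_t table[256])
    {
        for (uint32_t i = 0; i < 256; i++)
        {
            uint32_t    r = i << 24;

            for (int bit = 0; bit < 8; bit++)
                r = (r & 0x80000000) ? (r << 1) ^ 0x04C11DB7 : (r << 1);
            table[i] = r;
        }
    }

    /* Reflected lookup code: only correct with the reflected table. */
    static uint32_t
    crc32_update_reflected(uint32_t crc, const unsigned char *p, size_t len,
                           const uint32_t table[256])
    {
        while (len-- > 0)
            crc = table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
        return crc;
    }

    /* Normal lookup code: only correct with the normal table. */
    static uint32_t
    crc32_update_normal(uint32_t crc, const unsigned char *p, size_t len,
                        const uint32_t table[256])
    {
        while (len-- > 0)
            crc = table[((crc >> 24) ^ *p++) & 0xFF] ^ (crc << 8);
        return crc;
    }

The combination described above amounts to crc32_update_normal() fed with the *reflected* table: every input bit still perturbs the state, but the result is no longer the remainder of any polynomial division, which is why the usual CRC error-detection guarantees cannot simply be assumed to hold.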
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote: While I am also happy with taking a vote, if we do so I vote against even this much less MERGE-like syntax. It's verbose, and makes much less sense when the mechanism is driven by would-be duplicate key violations rather than an outer join. It wouldn't be driven by an outer join, not sure where that comes from. Right, I understood that it wouldn't be - which is the point. So with an UPSERT that retains influence from MERGE, NOT MATCHED means no conflict, MATCHED means conflict. That just seems like an odd way to spell the concept given, as you say, that we're not talking about an outer join. MERGE is verbose, I agree. I don't always like the SQL Standard, I just think we should follow it as much as possible. You can't change the fact that MERGE exists, so I don't see a reason to have two variants of syntax that do roughly the same thing. MERGE syntax would allow many things, such as this... WHEN NOT MATCHED AND x > 7 THEN INSERT WHEN NOT MATCHED THEN INSERT WHEN MATCHED AND y = 5 THEN DO NOTHING WHEN MATCHED THEN UPDATE etc. But then you can have before row insert triggers fire, which as you acknowledge is more surprising with this syntax. I spoke to someone today who preferred a new command keyword, like UPSERT, because the semantics of triggers are weird. Having a before insert trigger fire when there is no insert is quite strange. Properly documenting that on hackers would help; have the comments made on the Django list been replayed here in some form? Yes. It's also mentioned in the commit message of CONFLICTING() (patch 0003-*). And the documentation (both the proposed INSERT documentation, and the trigger documentation). There is a large comment on it in the code. So I've said it many times. I'm very scared by your comments about data-modifying CTEs etc. You have no definition of how they will work, nor tests of that. That part isn't looking like a benefit as things currently stand. Actually, I have a few smoke tests for that. But I don't see any need for special handling. When you have a data-modifying CTE, it can contain an INSERT, and there are no special restrictions on that INSERT (other than that it may not itself have a CTE, but that's true more generally). You can have data-modifying CTEs containing INSERTs, and data-modifying CTEs containing UPDATEs; what I've done is have data-modifying CTEs contain INSERTs that also happen to have an ON CONFLICT UPDATE clause. This new clause of INSERTs is in no more need of special documentation regarding interactions with data-modifying CTEs than UPDATE WHERE CURRENT OF is. The only possible exception I can think of would be cardinality violations, where a vanilla INSERT in one part of a command (one data-modifying CTE) gives problems to the UPSERT part of the same command (because we give a special cardinality violation message when we try to update the same tuple twice in the same command). But that's a pretty imaginative complaint, and I doubt it would really surprise someone. Why would you be surprised by the fact that a new clause for INSERT plays nicely with existing clauses? It's nothing special - there is no special handling. I'm still waiting for some more docs to describe your intentions so they can be reviewed. I think it would be useful to add several more isolation tests, highlighting some of the cases you talked about. I'll work on that. While the way forward for WITHIN isn't clear, I think a WITHIN PRIMARY KEY variant would certainly be useful. 
Maybe it would be okay to forget about naming a specific unique index, while supporting an (optional) WITHIN PRIMARY KEY/NOT WITHIN PRIMARY KEY. It doesn't totally solve the problems, but may be a good compromise that mostly satisfies people that want to be able to clearly indicate user intent (Kevin, in particular), and satisfies other people that don't want to name a unique index (Heikki, in particular). Certainly, the Django people would like that, since they said as much. Also, it would be useful to hear that you're going to do something about the references to rows using conflicting(), since nobody has agreed with you there. Or hopefully even that you've listened and implemented something differently already. (We need that, whatever we do with other elements of syntax). Do you really expect me to do major work on some aspect of the syntax like that, given, as you say, that nobody explicitly agreed with me (and only you disagreed with me)? The only remark I heard on that was from you (you'd prefer to use NEW.* and OLD.*). But you spent much more time talking about getting something MERGE-like, which NEW.*/OLD.* clearly isn't. CONFLICTING() is very close (identical?) to MySQL's use of ON DUPLICATE KEY UPDATE val = VALUES(val). I'm happy to discuss that, but it's news to me that people take particular issue with it. Overall, I'm not seeing too many comments
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
+/*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner. This may be useful
+ * just before performing some operation that will invalidate the segment
+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+	Assert(seg->resowner == NULL);
+	ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+	seg->resowner = CurrentResourceOwner;
+	ResourceOwnerRememberDSM(seg->resowner, seg);
+}
Hm, I dislike the name unkeep. I guess you want to be symmetric to dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup(), dsm_remember_mapping()?
From c835a06f20792556d35a0eee4c2fa21f5f23e8a3 Mon Sep 17 00:00:00 2001 From: Robert Haas rh...@postgresql.org Date: Fri, 11 Jul 2014 09:53:40 -0400 Subject: [PATCH 6/6] pg_background: Run commands in a background worker, and get the results. The currently-active GUC values from the user session will be copied to the background worker. If the command returns a result set, you can retrieve the result set; if not, you can retrieve the command tags. If the command fails with an error, the same error will be thrown in the launching process when the results are retrieved. Warnings and other messages generated by the background worker, and notifications received by it, are also propagated to the foreground process.
I got to ask: Why is it helpful that we have this in contrib? I have a good share of blame to bear for that, but I think we need to stop diluting contrib evermore with test programs. These have a place, but I don't think it should be contrib.
+/* Table-of-contents constants for our dynamic shared memory segment. */
+#define PG_BACKGROUND_MAGIC				0x50674267
+#define PG_BACKGROUND_KEY_FIXED_DATA	0
+#define PG_BACKGROUND_KEY_SQL			1
+#define PG_BACKGROUND_KEY_GUC			2
+#define PG_BACKGROUND_KEY_QUEUE			3
+#define PG_BACKGROUND_NKEYS				4
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+	Oid			database_id;
+	Oid			authenticated_user_id;
+	Oid			current_user_id;
+	int			sec_context;
+	char		database[NAMEDATALEN];
+	char		authenticated_user[NAMEDATALEN];
+} pg_background_fixed_data;
Why not NameData?
+/* Private state maintained by the launching backend for IPC. */
+typedef struct pg_background_worker_info
+{
+	pid_t		pid;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;
+	shm_mq_handle *responseq;
+	bool		consumed;
+} pg_background_worker_info;
whitespace damage.
+static HTAB *worker_hash;
Hm. So the lifetime of this hash's contents is managed via on_dsm_detach(), do I understand that correctly?
+PG_FUNCTION_INFO_V1(pg_background_launch);
+PG_FUNCTION_INFO_V1(pg_background_result);
+PG_FUNCTION_INFO_V1(pg_background_detach);
+void		pg_background_worker_main(Datum);
+
+/*
+ * Start a dynamic background worker to run a user-specified SQL command.
+ */
+Datum
+pg_background_launch(PG_FUNCTION_ARGS)
+{
+	text	   *sql = PG_GETARG_TEXT_PP(0);
+	int32		queue_size = PG_GETARG_INT64(1);
+	int32		sql_len = VARSIZE_ANY_EXHDR(sql);
+	Size		guc_len;
+	Size		segsize;
+	dsm_segment *seg;
+	shm_toc_estimator e;
+	shm_toc    *toc;
+char	   *sqlp;
wrong indentation.
+	char	   *gucstate;
+	shm_mq	   *mq;
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *worker_handle;
+	pg_background_fixed_data *fdata;
+	pid_t		pid;
+	shm_mq_handle *responseq;
+	MemoryContext oldcontext;
+
+	/* Ensure a valid queue size. */
+	if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("queue size must be at least %zu bytes",
+						shm_mq_minimum_size)));
Hm. So every user can do this once the extension is created, as the functions are most likely to be PUBLIC. Is this a good idea?
+	/* Wait for the worker to start. */
+	switch (WaitForBackgroundWorkerStartup(worker_handle, &pid))
+	{
+		case BGWH_STARTED:
+			/* Success. */
+			break;
+		case BGWH_STOPPED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("could not start background
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me, but if the consensus is that we want to do that, it does argue for your plan of making UPSERT a variant of the INSERT command. Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more like I don't see any practical alternative. We need the before row insert triggers to fire to figure out what to insert (or if we should update instead). No way around that. At the same time, those triggers are at liberty to do almost anything, and so in general we have no way of totally nullifying their effects (or side effects). Surely you see the dilemma. As I understand it you are proposing that would be: INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON CONFLICT WITHIN targettable_pkey UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now(); That's right, yes. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Wed, Oct 8, 2014 at 11:38 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/08/2014 04:59 PM, Fujii Masao wrote: On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files. What happens if a user shuts down the standby, removes recovery.conf and starts the server as the master? Um, that's not a safe thing to do anyway, is it? That's not safe, as it bypasses all the consistency checks of promotion. Now, it is also something that repmgr, for example, does as far as I recall to perform a node promotion. What if we simply document the problem properly then? The appearance of those phantom WAL files is scarier than a user or a utility that does a promotion with a server restart. Not to mention as well that users are free to add files themselves to pg_xlog. -- Michael
Re: [HACKERS] Add CREATE support to event triggers
On Thu, Oct 9, 2014 at 6:26 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: About patch 1, which I just pushed, there were two reviews: Andres Freund wrote: On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote: diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h new file mode 100644 index 000..520b066 --- /dev/null +++ b/src/include/utils/ruleutils.h I wondered for a minute whether any of these are likely to cause problems for code just including builtins.h - but I think there will be sufficiently few callers for them to make that not much of a concern. Great. Michael Paquier wrote: Patch 1: I still like this patch as it gives a clear separation of the built-in functions and the sub-functions of ruleutils.c that are completely independent. Have you considered adding the external declaration of quote_all_identifiers as well? It is true that this impacts extensions (some of my stuff as well), but my point is to bite the bullet and make the separation cleaner between builtins.h and ruleutils.h. Attached is a patch that can be applied on top of patch 1 doing so... Feel free to discard it for the potential breakage this would create, though. Yes, I did consider moving the quoting function definitions and the global out of builtins.h. It is much more disruptive, however, because it affects a lot of external code, not only our own; and given the looks I got after people had to mess with adding the htup_details.h header to external projects due to the refactoring I did to htup.h in an earlier release, I'm not sure about it. However, if I were to do it, I would instead create a quote.h file and would also add the quote_literal_cstr() prototype to it, perhaps even move the implementations from their current ruleutils.c location to quote.c. (Why half the stuff is in ruleutils.c rather than quote.c is beyond me.) Yes, an extra header file would be cleaner. Now if we are worried about creating many incompatibilities with existing extensions (IMO that's not something core should worry much about), then it is better not to do it. Now, if I can vote on it, I think it is worth doing in order to have builtins.h contain only Datum foo(PG_FUNCTION_ARGS) declarations, and move all the quote-related things into quote.c. -- Michael
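For concreteness, the quote.h being proposed might look like this; a sketch only, based on the declarations currently living in builtins.h and ruleutils.c (no such file exists in the tree at this point):

    /*
     * quote.h
     *	  Declarations for identifier and literal quoting, split out of
     *	  builtins.h as discussed above.
     */
    #ifndef QUOTE_H
    #define QUOTE_H

    /* GUC: when true, quote_identifier() quotes all identifiers */
    extern bool quote_all_identifiers;

    extern const char *quote_identifier(const char *ident);
    extern char *quote_qualified_identifier(const char *qualifier,
                                            const char *ident);
    extern char *quote_literal_cstr(const char *rawstr);

    #endif   /* QUOTE_H */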
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me Indeed, the current behavior breaks even the canonical "keep track of how many posts are in a thread" trigger example, because INSERT triggers are fired for an effective row UPDATE. I can't see how that's acceptable. Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more like I don't see any practical alternative. I proposed an alternative that avoids this surprise and might allow some other benefits. Can you please look into that? Even a "clarify this, this couldn't possibly work" or "you're not a major contributor so your arguments are invalid" response. ;) I admit I initially misunderstood some details about the current behavior. I can dig into the code and write a clearer and/or more detailed spec if I get even *some* feedback. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 14/10/09 7:06, Stephen Frost wrote: * Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: On Tue, Oct 7, 2014 at 1:24 PM, Simon Riggs si...@2ndquadrant.com wrote: I hope we can get pgAudit in as a module for 9.5. I also hope that it will stimulate the requirements/funding of further work in this area, rather than squash it. My feeling is we have more examples of feature sets that grow over time (replication, view handling, hstore/JSONB etc) than we have examples of things languishing in need of attention (partitioning). +1 To this point, specifically, I'll volunteer to find time in November to review pgAudit for inclusion as a contrib module. I feel a bit responsible for it not being properly reviewed earlier due to my upgrade and similar concerns. Perhaps the latest version should be posted and added to the commitfest for 2014-10 and I'll put myself down as a reviewer..? I don't see it there now. I don't mean to be dismissive by suggesting it be added to the commitfest - I honestly don't see myself having time before November given the other things I'm involved in right now and pgConf.eu happening in a few weeks. Thanks :) We're updating pgAudit for submission to the upcoming commitfest; it will be added within the next few days. Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 4:30 PM, Marti Raudsepp ma...@juffo.org wrote: On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me Indeed, the current behavior breaks even the canonical "keep track of how many posts are in a thread" trigger example, because INSERT triggers are fired for an effective row UPDATE. I can't see how that's acceptable. DB2 had the foresight to add the following restrictions to BEFORE ROW triggers - they cannot:
* Contain any INSERT, DELETE, or UPDATE operations, nor invoke any routine defined with MODIFIES SQL DATA, if it is not a compound SQL.
* Contain any DELETE or UPDATE operations on the trigger subject table, nor invoke any routine containing such operations, if it is a compound SQL.
* Reference a materialized query table defined with REFRESH IMMEDIATE.
* Reference a generated column other than the identity column in the NEW transition variable.
To get a sense of how doing fancy things in before triggers leads to trouble, consider the hardening Kevin added in commit 6868ed74. In short, use an AFTER trigger for this kind of thing, and all of these issues go away. Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more like I don't see any practical alternative. I proposed an alternative that avoids this surprise and might allow some other benefits. Can you please look into that? I looked at that. You're talking about making the case where before row triggers clearly are appropriate (when you just want to change the tuple being inserted) fail, in order to make an arguably inappropriate case for before row triggers work (when you don't change the tuple to be inserted, but you do want to do some other thing). That seems backwards. My proposed behavior seems far less surprising. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote: So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defining away some things that gdb doesn't handle:
macro define __builtin_offsetof(T, F) ((int) (((T *) 0)->F))
macro define __extension__
macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)
Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope. As an example, the following works:
(gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, &BackendList)
Ah, cool. I'll try that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Wed, Oct 8, 2014 at 2:55 AM, David Fetter da...@fetter.org wrote: On Wed, Oct 08, 2014 at 12:41:46AM -0300, Fabrízio de Royes Mello wrote: On Wed, Oct 8, 2014 at 12:36 AM, Josh Berkus j...@agliodbs.com wrote: On 10/07/2014 09:44 AM, Fabrízio de Royes Mello wrote: We can think in a mechanism to create properties / options and assign it to objects (table, index, column, schema, ...) like COMMENT does. A quickly thought: CREATE OPTION [ IF NOT EXISTS ] name VALIDATOR valfunction [ DEFAULT value ]; ALTER TABLE name SET OPTION optname { TO | = } { value | 'value' | DEFAULT }; It's just a simple thought of course. We must think better about the syntax and purposes. If we're going to do this (and it seems like a good idea), we really, really, really need to include some general system views which expose the options in a user-friendly format (like columns, JSON or an array). It's already very hard for users to get information about what reloptions have been set. Maybe into information_schema ?? Not there. The information schema is defined pretty precisely in the SQL standard and contains only some of this kind of information. You're absolutely right.. thanks... pg_catalog seems like a much more appropriate space for PostgreSQL-specific settings. We can add views and some functions to get this info too. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Thu, Oct 9, 2014 at 2:46 AM, Peter Geoghegan p...@heroku.com wrote: Indeed, the current behavior breaks even the canonical "keep track of how many posts are in a thread" trigger example use an AFTER trigger for this kind of thing, and all of these issues go away. In the latest patches from the CommitFest, the UPDATE case invokes BEFORE & AFTER INSERT triggers, not AFTER UPDATE. Is that going to be changed? If so, I concede that. I proposed an alternative that avoids this surprise and might allow some other benefits. Can you please look into that? I looked at that. Thanks! You're talking about making the case where before row triggers clearly are appropriate (when you just want to change the tuple being inserted) fail Only in case the trigger changes *key* columns necessary for atomicity (i.e. from the WITHIN index). Other columns are fair game. The restriction seems justifiable to me: it's unreasonable to be atomic with respect to values that change mid-way.
* It can distinguish between BEFORE INSERT and BEFORE UPDATE triggers, which your approach fundamentally cannot.
* It can probably avoid evaluating column defaults in the UPDATE case, like nextval() for unrelated serial columns = reduced overhead.
* The semantics match better with the MERGE-like syntax that seems to come up again and again.
* And it appears to solve (part of?) the problem you had with naming key *columns* in the WITHIN clause, rather than using an index name.
If you don't see any reasons why it can't be done, these benefits seem clear to me. I think the tradeoffs at least warrant wider discussion. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 6:12 PM, Marti Raudsepp ma...@juffo.org wrote: Oh, one more consideration: I believe you will run into the same issue if you want to implement BEFORE UPDATE triggers in any form. Skipping BEFORE UPDATE entirely seems to violate POLA. Good thing that the patch doesn't do that, then. I clearly documented this in a few places, including: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html It's common for applications to e.g. use triggers to keep track of the latest modified time for a row. With your proposal, every query needs to include logic for that to work. Wrong. If you don't see any reasons why it can't be done, these benefits seem clear to me. I think the tradeoffs at least warrant wider discussion. I don't. That's very surprising. One day, it will fail unexpectedly. As proposed, the way BEFORE INSERT triggers fire almost forces users to consider the issues up-front. Not necessarily up-front; as proposed, it causes existing triggers to change behavior when users adopt the upsert feature. And that adoption may even be transparent to the user due to ORM magic. There are potential surprises with both approaches. When you make the slightest effort to understand what my approach is, I might take your remarks seriously. Note that the CONFLICTING() behavior with respect to BEFORE INSERT triggers works the same as MySQL's INSERT ON DUPLICATE KEY UPDATE foo = VALUES(foo) thing. There was agreement that that was the right behavior, it seemed. MySQL gets away with lots of things; they have several other caveats with triggers. I don't think it's a good example to follow wrt trigger behavior. No true Scotsman. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Thu, Oct 9, 2014 at 4:56 AM, Marti Raudsepp ma...@juffo.org wrote:
create trigger ev1 before insert on evt_type execute procedure ins();
create trigger ev2 before update on evt_type execute procedure upd();
create trigger ev3 after insert on evt_type execute procedure ins();
create trigger ev4 after update on evt_type execute procedure upd();
Oops, I missed "for each row" here. Sorry for wasting your time. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 7:04 PM, Marti Raudsepp ma...@juffo.org wrote: Oops, I missed "for each row" here. Note that I have an open item for what to do about statement-level triggers here: https://wiki.postgresql.org/wiki/UPSERT I'm not satisfied with the current behavior. It needs more thought. But I think row-level triggers are fine. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/10/08 22:51), Fujii Masao wrote: On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easier to use with only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? So what about introducing pending_list_cleanup_size also as a GUC? That is, users can set the threshold by using either the reloption or the GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have for the patch? Please see the attached patch. In this patch, I introduced the GUC parameter, pending_list_cleanup_size. I chose 4MB as the default value of the parameter. But do you have any better idea about that default value? It seems reasonable to me that the GUC has the same default value as work_mem. So, +1 for the default value of 4MB. BTW, I moved the CommitFest entry of this patch to the next CF, 2014-10. OK, I'll review the patch in the CF. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deferring some AtStart* allocations?
On Wed, Oct 8, 2014 at 11:22 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Meh. Even SELECT 1 is going to be doing *far* more pallocs than that to get through raw parsing, parse analysis, planning, and execution startup. If you can find a few hundred pallocs we can avoid in trivial queries, it would get interesting; but I'll be astonished if saving 4 is measurable. I got nerd-sniped by this problem today, probably after staring at profiling data very similar to what led Andres to ask the question in the first place. The gains are indeed not measurable on a macrobenchmark, but I had to write the patch to figure that out, so here it is. Although the gain isn't a measurable percentage of total runtime, it does appear to be a significant percentage of the palloc traffic on trivial queries. I did 30-second SELECT-only pgbench runs at scale factor 10, using prepared queries. According to perf, on unpatched master, StartTransactionCommand accounts for 11.46% of the calls to AllocSetAlloc. (I imagine this is by runtime, not by call count, but it probably doesn't matter much either way.) But with this patch, StartTransactionCommand's share drops to 4.43%. Most of that is coming from AtStart_Inval(), which wouldn't be hard to fix either. I'm inclined to think that tamping down palloc traffic is worthwhile even if we can't directly measure the savings on a macrobenchmark. AllocSetAlloc is often at or near the top of profiling results, but there's rarely a single caller responsible for enough of those calls to make it worth optimizing. But that means that if we refuse to take patches that save just a few pallocs, we're basically giving up on ever improving anything in this area, and that doesn't seem like a good idea either; it's only going to get slowly worse over time as we add more features. Yeah, this makes sense; even otherwise, if somebody comes up with a much larger patch which reduces a few dozen pallocs and shows a noticeable gain, it will be difficult to quantify the patch based on which particular pallocs give the improvement, as reducing a few pallocs might not give any noticeable gain. BTW, if anyone's curious what the biggest source of AllocSetAlloc calls is on this workload, the answer is ExecInitIndexScan(), which looks to be indirectly responsible for almost half the total (or just over half, with the patch applied). That apparently calls a lot of different things that allocate small amounts of memory, and it accounts for nearly all of the AllocSetAlloc traffic that originates from PortalStart(). One way to dig into this problem is to look at each individual palloc in the PortalStart() path and try to see which ones can be optimized; another way is to introduce a concept of Prepared Portals, something similar to Prepared Statements, but for Portals. This will not only avoid the pallocs in the executor startup phase, but can improve performance by avoiding many other calls related to executor startup. I understand that this would be a much bigger project; however, the gains can also be substantial for prepared statements when all or most of the data fits in memory. It seems other databases also have similar concepts for execution (for example, Oracle uses CURSOR_SHARING/shared cursors to achieve the same). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
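To illustrate the deferral pattern being discussed (all names here are invented for the sketch, not taken from the committed patch): instead of paying for bookkeeping arrays in every StartTransactionCommand(), allocate them lazily on first use, so trivial transactions never touch them.

    #include "postgres.h"
    #include "utils/memutils.h"

    typedef struct XactTodoItem		/* hypothetical per-transaction item */
    {
        int			kind;
    } XactTodoItem;

    #define XACT_TODO_INIT_SIZE 16

    static XactTodoItem *xact_todo = NULL;
    static int	xact_todo_count = 0;
    static int	xact_todo_capacity = 0;

    /*
     * Nothing is allocated at transaction start; the palloc happens only
     * if and when the first item is actually recorded.
     */
    static void
    xact_todo_add(XactTodoItem item)
    {
        if (xact_todo == NULL)
        {
            xact_todo_capacity = XACT_TODO_INIT_SIZE;
            xact_todo = (XactTodoItem *)
                MemoryContextAlloc(TopTransactionContext,
                                   xact_todo_capacity * sizeof(XactTodoItem));
        }
        else if (xact_todo_count >= xact_todo_capacity)
        {
            xact_todo_capacity *= 2;
            xact_todo = (XactTodoItem *)
                repalloc(xact_todo, xact_todo_capacity * sizeof(XactTodoItem));
        }
        xact_todo[xact_todo_count++] = item;
    }

    /*
     * At end of transaction, just reset the pointers; the memory itself
     * goes away when TopTransactionContext is reset, so the common case
     * where nothing was ever allocated costs nothing here either.
     */
    static void
    AtEOXact_XactTodo(void)
    {
        xact_todo = NULL;
        xact_todo_count = 0;
        xact_todo_capacity = 0;
    }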
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Thu, Oct 9, 2014 at 4:02 AM, Andres Freund and...@2ndquadrant.com wrote:
+/*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner. This may be useful
+ * just before performing some operation that will invalidate the segment
+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+	Assert(seg->resowner == NULL);
+	ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+	seg->resowner = CurrentResourceOwner;
+	ResourceOwnerRememberDSM(seg->resowner, seg);
+}
Hm, I dislike the name unkeep. I also think the function name is not appropriate for its functionality. I guess you want to be symmetric to dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup(), dsm_remember_mapping()? Another could be dsm_change_mapping(). Yet another idea could be that we use a single function (dsm_manage_mapping() with an additional parameter to indicate the scope of the segment) instead of two different functions, dsm_keep_mapping() and dsm_unkeep_mapping(). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
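As I read the patch, the intended pairing is something like the following sketch (the helper name is mine, and dsm_unkeep_mapping() is the function under discussion, not yet in the tree):

    #include "postgres.h"
    #include "storage/dsm.h"

    /* Hypothetical helper showing the keep/unkeep pairing. */
    static void
    use_segment_beyond_resowner(dsm_segment *seg)
    {
        /* Pin the mapping for the lifetime of the process. */
        dsm_keep_mapping(seg);

        /* ... the segment now survives resource-owner cleanup ... */

        /*
         * Just before an operation that invalidates the segment for this
         * backend, hand it back to the current resource owner, so that
         * even on error it gets cleaned up.
         */
        dsm_unkeep_mapping(seg);
        dsm_detach(seg);
    }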
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
What set of options would you pass if you want to use it as a synchronous standby? And if you don't? Could we just have a single --synchronous flag, instead of -F and --reply-fsync? If you set synchronous_commit as remote_write, the options would be different. For the set of options in each case, see the following. Synchronous standby (synchronous_commit=on): --fsync-interval=-1 --reply-fsync --slot=slotname Synchronous standby (synchronous_commit=remote_write): --fsync-interval=-1 --reply-fsync Asynchronous: there are no relevant options. Well, if the response time delay (the value of --status-interval=interval) is acceptable, --reply-fsync is unnecessary. Instead of --reply-fsync, using --synchronous (which summarizes --reply-fsync and fsync-interval=-1) might be easier to understand. Although, in that case, --fsync-interval=interval would be a fixed value. Isn't there any problem? I think we should remove --fsync-interval and --reply-fsync, and just have a --synchronous option, which would imply the same behavior you get with --fsync-interval=-1 --reply-fsync. That leaves the question of whether pg_receivexlog should do fsyncs when it's not acting as a synchronous standby. There isn't any real need to do so. In asynchronous mode, there are no guarantees anyway, and even without an fsync, the OS will eventually flush the data to disk. But we could do even better than that. It would be best if you didn't need even the --synchronous flag. The server knows whether the client is a synchronous standby or not. It could tell the client. If we remove --fsync-interval, response time to the user will not be delayed. Although, fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it? I'm sorry, I didn't understand that. In asynchronous mode, I think there is no problem since the specification is the same as the released version. The server knows whether the client is a synchronous standby or not. It could tell the client. When notifying the synchronous/asynchronous mode to the client from the server, do we need to change the format of the Message? Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status. Thanks for the reply. If we remove --fsync-interval, response time to the user will not be delayed. Although, fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it? I'm sorry, I didn't understand that. Here is an example. When WAL is sent at 100ms intervals, fsync() is executed 10 times per second. If --fsync-interval is set to 1 second, we have to wait for the SQL response (for statements that generate WAL) for 1 second, though fsync() won't be executed several times in 1 second. I think --fsync-interval meets the demands of people who want to keep fsync() from happening several times in a short period, but what do you think? Is it OK to delete --fsync-interval? The server knows whether the client is a synchronous standby or not. It could tell the client. When notifying the synchronous/asynchronous mode to the client from the server, do we need to change the format of the Message? Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status. What kind of 'message type' do we have to add? Do we need to separate 'w' into two types, synchronous and asynchronous? 
Or add a new message type, kind of a 'notify synchronous', and inform pg_receivexlog of the synchronous status when it connects to the server. Regards, -- Furuya Osamu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Escaping from blocked send() reprised.
Hello, simply inhibiting setting the retry flag when ProcDiePending in my_sock_write seems enough. But it returns with SSL_ERROR_SYSCALL, not SSL_ERROR_WANT_WRITE, so I modified patch 4 as in the attached patch. Finally, the attached patch works as expected with Andres's patches 1-3. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6fc6903..2288fe2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
-		if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
+		if (!ProcDiePending &&
+			(errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
 		{
 			BIO_set_retry_write(h);
 		}
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 7b5b30f..ab9e122 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -199,6 +199,7 @@ secure_write(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 
+retry:
 #ifdef USE_SSL
 	if (port->ssl_in_use)
 	{
@@ -210,7 +211,26 @@ secure_write(Port *port, void *ptr, size_t len)
 		n = secure_raw_write(port, ptr, len);
 	}
 
-	/* XXX: We likely will want to process some interrupts here */
+	/*
+	 * We only want to process the interrupt here if socket writes are
+	 * blocking to increase the chance to get an error message to the
+	 * client. If we're not blocked there'll soon be a
+	 * CHECK_FOR_INTERRUPTS(). But if we're blocked we'll never get out of
+	 * that situation if the client has died.
+	 */
+	if (ProcDiePending && !port->noblock && n < 0)
+	{
+		/*
+		 * We're dying. It's safe (and sane) to handle that now.
+		 */
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* retry after processing interrupts */
+	if (n < 0 && errno == EINTR)
+	{
+		goto retry;
+	}
 
 	return n;
 }
@@ -236,10 +256,19 @@ wloop:
 		 * don't do anything while (possibly) inside a ssl library.
 		 */
 		w = WaitLatchOrSocket(&MyProc->procLatch,
-							  WL_SOCKET_WRITEABLE,
+							  WL_LATCH_SET | WL_SOCKET_WRITEABLE,
 							  port->sock, 0);
 
-		if (w & WL_SOCKET_WRITEABLE)
+		if (w & WL_LATCH_SET)
+		{
+			ResetLatch(&MyProc->procLatch);
+
+			/*
+			 * Force a return, so interrupts can be processed when not
+			 * (possibly) underneath a ssl library.
+			 */
+			errno = EINTR;
+		}
+		else if (w & WL_SOCKET_WRITEABLE)
 		{
 			goto wloop;
 		}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3a6aa1c..61390aa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -520,6 +520,13 @@ ProcessClientReadInterrupt(void)
 
 		errno = save_errno;
 	}
+	else if (ProcDiePending)
+	{
+		/*
+		 * We're dying. It's safe (and sane) to handle that now.
+		 */
+		CHECK_FOR_INTERRUPTS();
+	}
 }
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 8 October 2014 23:24, Peter Geoghegan p...@heroku.com wrote: Also, it would be useful to hear that you're going to do something about the references to rows using conflicting(), since nobody has agreed with you there. Or hopefully even that you've listened and implemented something differently already. (We need that, whatever we do with other elements of syntax). Do you really expect me to do major work on some aspect of the syntax like that, given, as you say, that nobody explicitly agreed with me (and only you disagreed with me)? The only remark I heard on that was from you (you'd prefer to use NEW.* and OLD.*). But you spent much more time talking about getting something MERGE-like, which NEW.*/OLD.* clearly isn't. Yes, it is. Look at the AS clause. CONFLICTING() is very close (identical?) to MySQL's use of ON DUPLICATE KEY UPDATE val = VALUES(val). I'm happy to discuss that, but it's news to me that people take particular issue with it. 3 people have asked you questions or commented about the use of CONFLICTING() while I've been watching. It's clearly a non-standard mechanism and not in line with other Postgres usage. Nobody actually says "I object to this" - do they need to use that phrase before you take note? I'm beginning to feel that giving you review comments is being seen as some kind of negative action. Needing to repeat myself makes it clear that you aren't taking note. Yes, I expect you to do these things:
* collect other people's input, even if you personally disagree
* if there is disagreement amongst reviewers, seek to resolve that in a fair and reasonable manner
* publish a summary of changes requested
* do major work to address them
So yes, I really expect that. It doesn't matter that it is only Simon or only Kevin. **One** comment is enough for you to take note. If there is disagreement, publishing the summary of changes you plan to make in your next version will help highlight that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers