Re: [HACKERS] postgresql latency bgwriter not doing its job
Hello Aidan,

If all you want is to avoid the write storms when fsyncs start happening on slow storage, can you not just adjust the kernel vm.dirty* tunables to start making the kernel write out dirty buffers much sooner, instead of letting them accumulate until fsyncs force them out all at once?

I tried that by setting:

    vm.dirty_expire_centisecs = 100
    vm.dirty_writeback_centisecs = 100

So it should start writing dirtied buffers at most 2s after they are dirtied, if I understood the documentation correctly, instead of at most 35s. The result is that with a 5000s, 25 tps, pretty small load (the system can do 300 tps with the default configuration), I lost 2091 transactions (1.7%), that is, they were beyond the 200 ms schedule limit.

Another change is that overall the lost transactions are more spread out than without this setting, although there are still stretches of unresponsiveness. So although the situation is significantly better, it is still far from good even with the much reduced values I tried.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers
On Wed, Aug 27, 2014 at 2:46 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
> I have done some more review, below are my comments:

Thanks!

> 1. There are currently two loops on *num_sync. Can we simplify the function
> SyncRepGetSynchronousNodes by moving the priority calculation inside the
> upper loop?
>
>     if (*num_sync == allowed_sync_nodes)
>     {
>         for (j = 0; j < *num_sync; j++)
>         {
>
> Anyway we require the priority only if the *num_sync == allowed_sync_nodes
> condition matches, so in this loop itself we can calculate the priority as
> well as the assignment of new standbys with lower priority. Let me know if
> you see any issue with this.

OK, I see. Yes, this can minimize the processing a bit, so I refactored the code by integrating the second loop into the first. This required removing the break, as we need to find the highest priority value among the nodes already determined as synchronous.

> 2. Comment inside the function SyncRepReleaseWaiters:
>
>     /*
>      * We should have found ourselves at least, except if it is not expected
>      * to find any synchronous nodes.
>      */
>     Assert(num_sync > 0);
>
> I think "except if it is not expected to find any synchronous nodes" is not
> required, as reaching this point means at least this node is synchronous.

Yes, removed.

> 3. The documentation says that s_s_num should be less than max_wal_senders,
> but code-wise there is no protection for the same. IMHO, s_s_num should be
> less than or equal to max_wal_senders, otherwise COMMIT will never return
> to the console, without the user knowing why. I see that some discussion
> has happened regarding this, but I think just adding documentation for
> this is not enough. I am not sure what issue was observed with adding a
> check during GUC initialization, but if there is an unavoidable issue
> there, can't we try to add the check at a later point?

The trick here is that you cannot really return a warning at GUC loading level to the user, as a warning could easily be triggered if, for example, s_s_num is present before max_wal_senders in postgresql.conf. I am open to any solutions if there are any (like an error when initializing WAL senders?). Documentation seems enough to me to warn the user.

> 4. Similarly, the interaction between the parameters s_s_names and s_s_num.
> I see some discussion has happened regarding this, and it is acceptable to
> have s_s_num more than s_s_names. But I was thinking we should at least
> give some notice message to the user for such a case, along with some
> documentation.

Done. I added the following to the paragraph "Server will wait": Hence it is recommended to not set <varname>synchronous_standby_num</> to a value higher than the number of elements in <varname>synchronous_standby_names</>.

> 5. "At any one time there will be at a number of active synchronous
> standbys": this sentence is not proper.

What about that: "At any one time there can be a number of active synchronous standbys up to the number defined by <xref linkend="guc-synchronous-standby-num">".

> 6. "When this parameter is set to <literal>0</>, all the standby nodes will
> be considered as asynchronous." Can we make this: "When this parameter is
> set to <literal>0</>, all the standby nodes will be considered as
> asynchronous irrespective of the value of synchronous_standby_names."

Done. This seems proper for the user, as we do not care at all about s_s_names if _num = 0.

> 7. "Are considered as synchronous the first elements of
> <xref linkend="guc-synchronous-standby-names"> in number of
> <xref linkend="guc-synchronous-standby-num"> that are connected." The
> start of this sentence looks incomplete.

OK, I reworked this part as well. I hope it is clearer.

> 8. "Standbys listed after this will take over the role of synchronous
> standby if the first one should fail." Should we not make it: "Standbys
> listed after this will take over the role of synchronous standby if any of
> the first synchronous-standby-num standbys fails."

Fixed as proposed. At the same time I found a bug with pg_stat_get_wal_senders caused by a NULL pointer that was freed when s_s_num = 0.

An updated patch addressing the comments is attached. On top of that, the documentation has been reworked a bit by replacing the too-high number of xref blocks with varname, a link to a given variable being specified only once.

Regards,
-- Michael

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 2585,2597 **** include_dir 'conf.d'
        <para>
         Specifies a comma-separated list of standby names that can support
         <firstterm>synchronous replication</>, as described in
!        <xref linkend="synchronous-replication">.
!        At any one time there will be at most one
Re: [HACKERS] Specifying the unit in storage parameter
On Thu, Aug 28, 2014 at 12:55 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> On Wed, Aug 27, 2014 at 10:59 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
>> Not necessarily, because it's harmless. It's there for purely aesthetical
>> reasons, so it's your choice whether to add it or not. Having it there is
>> slightly easier on somebody reading the code, perhaps.
> Agreed. On my side, that's up to you, Fujii-san. The patch does what it
> states; I only think that this extra 0 should be added either everywhere
> or nowhere.

Yep. I added the extra 0 everywhere. Ok, I just applied the patch. Thanks for the review!

> Not mandatory either: drop test_param_unit in the regression tests after
> running the test queries.

I don't have a strong opinion about this. There are many tables which the regression tests create but don't drop. But if you strongly think that the table must be dropped, I'm OK with that.

Regards,

-- Fujii Masao
Re: [HACKERS] Specifying the unit in storage parameter
On Thu, Aug 28, 2014 at 4:20 PM, Fujii Masao masao.fu...@gmail.com wrote:
> I don't have a strong opinion about this. There are many tables which the
> regression tests create but don't drop. But if you strongly think that
> the table must be dropped, I'm OK with that.

This remark is just to limit the amount of trash in the database used for the regression tests. But then, if we removed everything, we would lack handy material for tests on utilities like database-wide thingies of the type VACUUM, REINDEX, pg_dump, etc. And we can just drop the database used for regressions to clean up everything, so that's not mandatory at all. I tend to always clean up objects in my patches touching the regression tests, to limit interactions with other tests, but I guess that's up to the person who wrote the code to decide.

-- Michael
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
On 28 August 2014 00:25, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Thomas Munro wrote:
>> Thanks, I hadn't seen this, I should have checked the archives better.
>> I have actually already updated my patch to handle EvalPlanQualFetch
>> with NOWAIT and SKIP LOCKED with isolation specs, see attached. I will
>> compare with Craig's and see if I screwed anything up... of course I am
>> happy to merge and submit a new patch on top of Craig's if it's going to
>> be committed.
>
> Thanks, please rebase.

Thanks -- new patch attached.

Best regards,
Thomas Munro

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 231dc6a..0469705 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -45,7 +45,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
     [ LIMIT { <replaceable class="parameter">count</replaceable> | ALL } ]
     [ OFFSET <replaceable class="parameter">start</replaceable> [ ROW | ROWS ] ]
     [ FETCH { FIRST | NEXT } [ <replaceable class="parameter">count</replaceable> ] { ROW | ROWS } ONLY ]
-    [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT ] [...] ]
+    [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]

 <phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>

@@ -1283,7 +1283,7 @@ FETCH { FIRST | NEXT } [ <replaceable class="parameter">count</replaceable> ] {
    The locking clause has the general form
 <synopsis>
-FOR <replaceable>lock_strength</> [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT ]
+FOR <replaceable>lock_strength</> [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT | SKIP LOCKED ]
 </synopsis>
    where <replaceable>lock_strength</> can be one of

@@ -1359,11 +1359,17 @@ KEY SHARE
    <para>
     To prevent the operation from waiting for other transactions to commit,
-    use the <literal>NOWAIT</> option. With <literal>NOWAIT</>, the statement
-    reports an error, rather than waiting, if a selected row
-    cannot be locked immediately. Note that <literal>NOWAIT</> applies only
-    to the row-level lock(s) &mdash; the required <literal>ROW SHARE</literal>
-    table-level lock is still taken in the ordinary way (see
+    use either the <literal>NOWAIT</> or <literal>SKIP LOCKED</literal>
+    option. With <literal>NOWAIT</>, the statement reports an error, rather
+    than waiting, if a selected row cannot be locked immediately.
+    With <literal>SKIP LOCKED</literal>, any selected rows that cannot be
+    immediately locked are skipped. Skipping locked rows provides an
+    inconsistent view of the data, so this is not suitable for general purpose
+    work, but can be used to avoid lock contention with multiple consumers
+    accessing a queue-like table. Note that <literal>NOWAIT</>
+    and <literal>SKIP LOCKED</literal> apply only to the row-level lock(s)
+    &mdash; the required <literal>ROW SHARE</literal> table-level lock is
+    still taken in the ordinary way (see
     <xref linkend="mvcc">). You can use
     <xref linkend="sql-lock">
     with the <literal>NOWAIT</> option first,

@@ -1386,14 +1392,14 @@ KEY SHARE
    </para>

    <para>
-    Multiple locking
-    clauses can be written if it is necessary to specify different locking
-    behavior for different tables. If the same table is mentioned (or
-    implicitly affected) by more than one locking clause,
-    then it is processed as if it was only specified by the strongest one.
-    Similarly, a table is processed
-    as <literal>NOWAIT</> if that is specified in any of the clauses
-    affecting it.
+    Multiple locking clauses can be written if it is necessary to specify
+    different locking behavior for different tables. If the same table is
+    mentioned (or implicitly affected) by more than one locking clause, then
+    it is processed as if it was only specified by the strongest one.
+    Similarly, a table is processed as <literal>NOWAIT</> if that is specified
+    in any of the clauses affecting it. Otherwise, it is processed
+    as <literal>SKIP LOCKED</literal> if that is specified in any of the
+    clauses affecting it.
    </para>

    <para>
@@ -1930,9 +1936,9 @@ SELECT distributors.* WHERE distributors.name = 'Westward';
    <productname>PostgreSQL</productname> allows it in any <command>SELECT</>
    query as well as in sub-<command>SELECT</>s, but this is an extension.
    The <literal>FOR NO KEY UPDATE</>, <literal>FOR SHARE</> and
-   <literal>FOR KEY SHARE</> variants,
-   as well as the <literal>NOWAIT</> option,
-   do not appear in the standard.
+   <literal>FOR KEY SHARE</> variants, as well as the <literal>NOWAIT</>
+   and <literal>SKIP LOCKED</literal> options, do not appear in the
+   standard.
    </para>
   </refsect2>
diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml
index ba92607..57396d7 100644
@@ -863,7
Re: [HACKERS] Function to know last log write timestamp
On Thu, Aug 28, 2014 at 2:44 AM, Jim Nasby j...@nasby.net wrote:
> On 8/27/14, 7:33 AM, Fujii Masao wrote:
>> On Tue, Aug 19, 2014 at 1:07 AM, Robert Haas robertmh...@gmail.com wrote:
>>> On Fri, Aug 15, 2014 at 7:17 AM, Fujii Masao masao.fu...@gmail.com wrote:
>>>> On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund and...@2ndquadrant.com wrote:
>>>>> On 2014-08-14 14:37:22 -0400, Robert Haas wrote:
>>>>>> On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund and...@2ndquadrant.com wrote:
>>>>>>> On 2014-08-14 14:19:13 -0400, Robert Haas wrote:
>>>>>>>> That's about the idea. However, what you've got there is actually
>>>>>>>> unsafe, because shmem->counter++ is not an atomic operation. It
>>>>>>>> reads the counter (possibly even as two separate 4-byte loads if
>>>>>>>> the counter is an 8-byte value), increments it inside the CPU, and
>>>>>>>> then writes the resulting value back to memory. If two backends do
>>>>>>>> this concurrently, one of the updates might be lost.
>>>>>>> All these are only written by one backend, so it should be safe.
>>>>>>> Note that that coding pattern, just without memory barriers, is all
>>>>>>> over pgstat.c
>>>>>> Ah, OK. If there's a separate slot for each backend, I agree that
>>>>>> it's safe. We should probably add barriers to pgstat.c, too.
>>>>> Yea, definitely. I think this is rather borked on weaker
>>>>> architectures. It's just that the consequences of an out of date/torn
>>>>> value are rather low, so it's unlikely to be noticed. Imo we should
>>>>> encapsulate the changecount modifications/checks somehow instead of
>>>>> repeating the barriers, Asserts, comments et al everywhere.
>>>> So what about applying the attached patch first, which adds the macros
>>>> to load and store the changecount with the memory barriers, and
>>>> changes pgstat.c to use them. Maybe this patch needs to be
>>>> back-patched to at least 9.4? After applying the patch, I will rebase
>>>> the pg_last_xact_insert_timestamp patch and post it again.
>>> That looks OK to me on a relatively-quick read-through. I was initially
>>> a bit worried about this part:
>>>
>>>     do
>>>     {
>>>         pgstat_increment_changecount_before(beentry);
>>>     } while ((beentry->st_changecount & 1) == 0);
>>>
>>> pgstat_increment_changecount_before is an increment followed by a write
>>> barrier. This seemed like funny coding to me at first because the
>>> while-test isn't protected by any sort of barrier. But now I think it's
>>> correct, because there's only one process that can possibly write to
>>> that data, and that's the one that is making the test, and it had
>>> certainly better see its own modifications in program order no matter
>>> what.
>>>
>>> I wouldn't object to back-patching this to 9.4 if we were earlier in
>>> the beta cycle, but at this point I'm more inclined to just put it in
>>> 9.5. If we get an actual bug report about any of this, we can always
>>> back-patch the fix at that time. But so far that seems mostly
>>> hypothetical, so I think the less-risky course of action is to give
>>> this a longer time to bake before it hits an official release.
>> Sounds reasonable. So, barring any objection, I will apply the patch
>> only to the master branch.
> It's probably worth adding a comment explaining why it's safe to do this
> without a barrier...

s/without/with ? Theoretically it's not safe without a barrier on a machine with weak memory ordering. No?

Regards,

-- Fujii Masao
Re: [HACKERS] replication commands and log_statements
On Thu, Jun 19, 2014 at 5:29 PM, Ian Barwick i...@2ndquadrant.com wrote:
> On 12/06/14 20:37, Fujii Masao wrote:
>> On Wed, Jun 11, 2014 at 11:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Andres Freund and...@2ndquadrant.com writes:
>>>> Your wish just seems like a separate feature to me. Including
>>>> replication commands in 'all' seems correct independent of the desire
>>>> for more granular control.
>>> No, I think I've got to vote with the other side on that. The reason we
>>> can have log_statement as a scalar progression none < ddl < mod < all
>>> is that there's little visible use-case for logging DML but not DDL,
>>> nor for logging SELECTs but not INSERT/UPDATE/DELETE. However, logging
>>> replication commands seems like something people would reasonably want
>>> orthogonal control for. There's no nice way to squeeze such a behavior
>>> into log_statement. I guess you could say that log_statement treats
>>> replication commands as if they were DDL, but is that really going to
>>> satisfy users? I think we should consider log_statement to control
>>> logging of SQL only, and invent a separate GUC (or, in the future,
>>> likely more than one GUC?) for logging of replication activity.
>> Seems reasonable. OK. The attached patch adds the
>> log_replication_command parameter which causes replication commands to
>> be logged. I added this to the next CF.
> A quick review:

Sorry, I've noticed this email just now. Thanks for reviewing the patch!

> - Minor rewording for the description, mentioning that statements will
>   still be logged as DEBUG1 even if the parameter is set to 'off' (might
>   prevent reports of the kind "I set it to 'off', why am I still seeing
>   log entries?"):
>
>     <para>
>      Causes each replication command to be logged in the server log.
>      See <xref linkend="protocol-replication"> for more information about
>      replication commands. The default value is <literal>off</>. When set
>      to <literal>off</>, commands will be logged at log level
>      <literal>DEBUG1</literal>. Only superusers can change this setting.
>     </para>

Yep, fixed. Attached is the updated version of the patch.

> - I feel it would be more consistent to use the plural form for this
>   parameter, i.e. log_replication_commands, in line with log_lock_waits,
>   log_temp_files, log_disconnections etc.

But log_statement is in the singular form, so I just used log_replication_command. For consistency, maybe we need to change both parameters to the plural form? I don't have a strong opinion about this.

> - It might be an idea to add a cross-reference to this parameter from the
>   Streaming Replication Protocol page:
>   http://www.postgresql.org/docs/devel/static/protocol-replication.html

Yes. I added that.

Regards,

-- Fujii Masao

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 4558,4563 **** FROM pg_stat_activity;
--- 4558,4581 ----
       </listitem>
      </varlistentry>

+     <varlistentry id="guc-log-replication-command" xreflabel="log_replication_command">
+      <term><varname>log_replication_command</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>log_replication_command</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Causes each replication command to be logged in the server log.
+        See <xref linkend="protocol-replication"> for more information about
+        replication command. The default value is <literal>off</>. When
+        set to <literal>off</>, commands will be logged at log level
+        <literal>DEBUG1</> (or lower).
+        Only superusers can change this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-log-temp-files" xreflabel="log_temp_files">
       <term><varname>log_temp_files</varname> (<type>integer</type>)
       <indexterm>
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
***************
*** 1306,1311 **** To initiate streaming replication, the frontend sends the
--- 1306,1314 ----
     of <literal>true</> tells the backend to go into walsender mode, wherein a
     small set of replication commands can be issued instead of SQL statements.
     Only the simple query protocol can be used in walsender mode.
+    Replication commands are logged in the server log at log level
+    <literal>DEBUG1</> (or lower) or when
+    <xref linkend="guc-log-replication-command"> is enabled.
     Passing <literal>database</> as the value instructs walsender to connect
     to the database specified in the <literal>dbname</> parameter, which will
     allow the connection to be used for logical replication from that
     database.
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 108,113 **** bool am_db_walsender = false; /* Connected to a database? */
--- 108,114 ----
  int max_wal_senders = 0;    /* the maximum number of concurrent walsenders */
  int wal_sender_timeout = 60 * 1000; /* maximum time to send one
                                       * WAL data message */
+ bool log_replication_command = false;
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Aug 28, 2014 at 12:03 AM, Kevin Grittner kgri...@ymail.com wrote:
>> In essence, make the relations work like PL/pgSQL variables do. If you
>> squint a little, the new/old relation is a variable from the function's
>> point of view, and a parameter from the planner/executor's point of
>> view. It's just a variable/parameter that holds a set of tuples, instead
>> of a single Datum.
> I will be surprised if someone doesn't latch onto this to create a new
> declared temporary table that only exists within the scope of a compound
> statement (i.e., a BEGIN/END block). You would DECLARE them just like you
> would a scalar variable in a PL, and they would have the same scope. I'll
> take a look at doing this in the next couple days, and see whether doing
> it that way is as easy as it seems on the face of it.

We already have all this with refcursor variables. Admittedly cursors have some weird semantics (mutable position), and currently they're cumbersome to combine into queries, but would a new relation variable be sufficiently better to warrant a new type? Why not extend refcursors and make them easier to use instead?

Regards,
Marti
[HACKERS] Why data of timestamptz does not store value of timezone passed to it?
Hi all,

I have a question about the data type timestamp with time zone. Why does timestamptz not store the time zone value passed to it? Consider the following example:

postgres=# select '2014-08-28 14:30:30.423602+02'::timestamp with time zone;
          timestamptz
-------------------------------
 2014-08-28 20:30:30.423602+08
(1 row)

The time zone of the output (+08) differs from the original input value (+02). That does not seem to be good behavior. But the behavior of the data type time with time zone is correct:

postgres=# select '14:30:30.423602+02'::time with time zone;
       timetz
--------------------
 14:30:30.423602+02
(1 row)

If the current behavior of timestamptz is not suitable, is there any plan to correct it, or to create a new data type which can store a timestamp together with its time zone?

*) manual -- 8.5.1.3. Time Stamps

    For timestamp with time zone, the internally stored value is always in
    UTC (Universal Coordinated Time, traditionally known as Greenwich Mean
    Time, GMT). An input value that has an explicit time zone specified is
    converted to UTC using the appropriate offset for that time zone. If no
    time zone is stated in the input string, then it is assumed to be in the
    time zone indicated by the system's TimeZone parameter, and is converted
    to UTC using the offset for the timezone zone.

Best regards,
rohto
Re: [HACKERS] inherit support for foreign tables
(2014/08/22 11:51), Noah Misch wrote:
> On Wed, Aug 20, 2014 at 08:11:01PM +0900, Etsuro Fujita wrote:
>> (2014/07/02 11:23), Noah Misch wrote:
>>> Your chosen ANALYZE behavior is fair, but the messaging from a
>>> database-wide ANALYZE VERBOSE needs work:
>>>
>>>   INFO: analyzing test_foreign_inherit.parent
>>>   INFO: parent: scanned 1 of 1 pages, containing 1 live rows and 0 dead
>>>         rows; 1 rows in sample, 1 estimated total rows
>>>   INFO: analyzing test_foreign_inherit.parent inheritance tree
>>>
>>> Please arrange to omit the 'analyzing tablename inheritance tree'
>>> message, since this ANALYZE actually skipped that task.
>> I think it would be better to handle this in the same way as an
>> inheritance tree that turns out to be a single table that doesn't have
>> any descendants, in acquire_inherited_sample_rows(). That would still
>> output the message as shown below, but I think that would be more
>> consistent with the existing code. The patch works so.
> Today's ANALYZE VERBOSE messaging for former inheritance parents (tables
> with relhassubclass = true but no pg_inherits.inhparent links) is
> deceptive, and I welcome a fix to omit the spurious message. As defects
> go, this is quite minor. There's fundamentally no value in collecting
> inheritance tree statistics for such a parent, and no PostgreSQL command
> will do so.
>
> Your proposed behavior for inheritance parents having at least one
> foreign table child is more likely to mislead DBAs in practice. An
> inheritance tree genuinely exists, and a different ANALYZE command is
> quite capable of collecting statistics on that inheritance tree.
> Messaging should reflect the difference between ANALYZE invocations that
> do such work and ANALYZE invocations that skip it. I'm anticipating a bug
> report along these lines: "I saw poor estimates involving a child foreign
> table, so I ran ANALYZE VERBOSE, which reported 'INFO: analyzing
> public.parent inheritance tree'. Estimates remained poor, so I scratched
> my head for awhile before noticing that pg_stats didn't report such
> statistics. I then ran ANALYZE VERBOSE parent, saw the same 'INFO:
> analyzing public.parent inheritance tree', but this time saw relevant
> rows in pg_stats. The message doesn't reflect the actual behavior." I'll
> sympathize with that complaint. It's a minor point overall, but the code
> impact is, I predict, small enough that we may as well get it right. A
> credible alternative is to emit a second message indicating that we
> skipped the inheritance tree statistics after all, and why we skipped
> them.

I'd like to address this by emitting the second message as shown below:

    INFO: analyzing public.parent
    INFO: parent: scanned 0 of 0 pages, containing 0 live rows and 0 dead
          rows; 0 rows in sample, 0 estimated total rows
    INFO: analyzing public.parent inheritance tree
    INFO: skipping analyze of public.parent inheritance tree --- this
          inheritance tree contains foreign tables

Attached is the updated version of the patch. Based on the result of the discussions about attr_needed upthread, I've changed the patch so that create_foreignscan_plan makes reference to reltargetlist, not to attr_needed. (So, the patch in [1] isn't required anymore.) Other changes:

* Revise code/comments/docs a bit
* Add more regression tests

A separate patch (analyze.patch) handles the former case in a similar way.

[1] http://www.postgresql.org/message-id/53f4707c.8030...@lab.ntt.co.jp

Thanks,

Best regards,
Etsuro Fujita

*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 115,120 **** static void fileGetForeignRelSize(PlannerInfo *root,
--- 115,125 ----
  static void fileGetForeignPaths(PlannerInfo *root,
  					RelOptInfo *baserel,
  					Oid foreigntableid);
+ static ForeignPath *fileReparameterizeForeignPath(PlannerInfo *root,
+ 					RelOptInfo *baserel,
+ 					Oid foreigntableid,
+ 					ForeignPath *path,
+ 					Relids required_outer);
  static ForeignScan *fileGetForeignPlan(PlannerInfo *root,
  					RelOptInfo *baserel,
  					Oid foreigntableid,
***************
*** 143,149 **** static bool check_selective_binary_conversion(RelOptInfo *baserel,
  static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
  					FileFdwPlanState *fdw_private);
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
! 					FileFdwPlanState *fdw_private,
  					Cost *startup_cost, Cost *total_cost);
  static int file_acquire_sample_rows(Relation onerel, int elevel,
  					HeapTuple *rows, int targrows,
--- 148,154 ----
  static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
  					FileFdwPlanState *fdw_private);
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
! 					FileFdwPlanState *fdw_private, List *join_conds,
  					Cost *startup_cost, Cost *total_cost);
  static int file_acquire_sample_rows(Relation onerel, int elevel,
  					HeapTuple *rows, int targrows,
*** 161,166
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
(2014/08/27 22:56), Robert Haas wrote:
> On Mon, Aug 25, 2014 at 8:58 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
>> Reading the code, I noticed that the pushed-down UPDATE or DELETE
>> statement is executed during postgresBeginForeignScan rather than during
>> postgresIterateForeignScan. It probably does not matter, but is there a
>> reason to do it differently from a normal scan?
> Hmm, I'm worried that may be an API contract violation.

Will fix.

Thanks,

Best regards,
Etsuro Fujita
[HACKERS] v4 protocol TODO item - Lazy fetch/stream of TOASTed values?
Hi all,

Per the protocol TODO (https://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes), do you think it's reasonable to allow for delayed sending of big varlena values and arrays in the protocol? The JDBC spec already has support for this in arrays, XML, and binary blobs, but AFAIK the backend can't support lazy fetching.

Of course anything like this would require an open transaction, but the JDBC spec already sets that as a requirement for SQLXML etc., specifying that resources must be retained for at least the duration of the transaction unless explicitly freed by the application.

As far as I can tell, PostgreSQL is in a good position to support lazy fetching of big values. TOASTed values could be returned as an opaque placeholder value identifying the TOAST table oid and ctid, then fetched by sending a further protocol-level request to the server, which could send the values as a result set or enter COPY mode to stream them. The transaction's xmin would already prevent the values from being overwritten - at least at SERIALIZABLE isolation.

For non-TOASTed values, like big in-memory values, it probably makes more sense to just force them to be sent directly to the client as they already are, rather than trying to stash them in server memory and address them for returning later. Though a tuplestore could always be used to spill to disk, I guess.

Anyway. Random idea for the day.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Add .NOTPARALLEL to contrib/Makefile
On 2014-08-26 23:56:10 -0400, Peter Eisentraut wrote:
> On Tue, 2014-08-26 at 02:05 +0200, Andres Freund wrote:
>> Currently running "make -j16 all check" in contrib/ results in a mess
>> because all pg_regress invocations fight over the same port. Adding a
>> simple ".NOTPARALLEL: check-%-recurse" to contrib/Makefile fixes that.
>> Do we want that?
> But that causes also the "all" to be run in not-parallel, because the
> meaning of .NOTPARALLEL is:
>
>     If `.NOTPARALLEL' is mentioned as a target, then this invocation of
>     `make' will be run serially, even if the `-j' option is given.

Argh. There goes that.

> It does not mean, as you appear to imagine, to run only the listed
> prerequisites in not-parallel.

That would be nice! Yea.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
Hi,

On 2014-08-27 22:48:54 +0200, Pavel Stehule wrote:
> Hi
>
> I chose "\? xxx" because it is related to psql features. I wrote the
> commands:
>
>     \? options
>     \? variables
>
> comments?

Generally I like it. Some stuff I changed:

* I rephrased the sgml changes.
* s/Printing options/Display options/. Or maybe "Display influencing variables"? That makes it clearer why they're listed under --help-variables.
* I added "\? commands" as an alias for a plain \?. That way the scheme can sensibly be expanded.
* I renamed help_variables() to be in line with the surrounding functions.

Stuff I wondered about:

* How about making it --help=variables instead of --help-variables? That
* Do we really need the 'Report bugs to' blurb for --help-variables?
* Since we're not exhaustive about documenting environment variables, do we really need to document TMPDIR and SHELL?
* Shouldn't we document that PROMPT1 is the standard prompt, PROMPT2 is for already started statements, and PROMPT3 is for COPY FROM STDIN?

Looks like it's committable next round. I've attached an incremental patch.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

0001-help-variables-11.patch
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
(2014/08/27 23:05), Tom Lane wrote: I wrote: Robert Haas robertmh...@gmail.com writes: Hmm, I'm worried that may be an API contract violation. Actually, there's another problem there. What of UPDATE or DELETE with a LIMIT clause, which is something that seems to be coming down the pike: https://commitfest.postgresql.org/action/patch_view?id=1550 I'd like to try to extend the functionality so as to push UPDATE/DELETE-with-LIMIT down into the remote server if it's safe. But I don't yet know if it's possible, because I started looking into the UPDATE/DELETE-with-LIMIT patch just now ... Thanks, Best regards, Etsuro Fujita
Re: [HACKERS] psql \watch versus \timing
On Tue, Aug 26, 2014 at 4:55 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 08/25/2014 10:48 PM, Heikki Linnakangas wrote: On 08/25/2014 09:22 PM, Fujii Masao wrote: On Tue, Aug 26, 2014 at 1:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I agree that refactoring this would be nice in the long-term, and I also agree that it's probably OK as it is in the short-term. I don't like the name PSQLexecInternal, though. PSQLexec is used for internal commands anyway. In fact it's backwards, because PSQLexecInternal is used for non-internal queries, given by \watch, while PSQLexec is used for internal commands.

Agreed. So what about PSQLexecCommon (inspired by the relation between LWLockAcquireCommon and LWLockAcquire)? Or any better name?

Actually, perhaps it would be better to just copy-paste PSQLexec, and modify the copy to suit \watch's needs. (PSQLexecWatch? SendWatchQuery?). PSQLexec doesn't do much, and there isn't very much overlap between what \watch wants and what other PSQLexec callers want. \watch wants timing output, others don't. \watch doesn't want transaction handling.

Agreed. Attached is the revised version of the patch. I implemented PSQLexecWatch() which sends the query, prints the results and outputs the query execution time (if \timing is enabled). This patch was marked as ready for committer, but since I revised the code very much, I marked this as needs review again.

Do we want --echo-hidden to print the \watch'd query? Not sure..

Per the documentation, --echo-hidden prints the actual queries generated by a backslash command. But \watch doesn't handle backslash commands. So I think that PSQLexecWatch doesn't need to handle --echo-hidden.

BTW, I just noticed that none of the callers of PSQLexec pass start_xact=true. So that part of the function is dead code. We might want to remove it, and replace with a comment noting that PSQLexec never starts a new transaction block, even in autocommit-off mode.
(I know you're hacking on this, so I didn't want to joggle your elbow by doing it right now)

Good catch. So I will remove the start_xact code later.

Regards,

--
Fujii Masao

*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*************** do_watch(PQExpBuffer query_buf, long sleep)
*** 2687,2693 ****

  	for (;;)
  	{
! 		PGresult   *res;
  		time_t		timer;
  		long		i;
--- 2687,2693 ----

  	for (;;)
  	{
! 		int			res;
  		time_t		timer;
  		long		i;
*************** do_watch(PQExpBuffer query_buf, long sleep)
*** 2701,2759 ****
  		myopt.title = title;

  		/*
! 		 * Run the query.  We use PSQLexec, which is kind of cheating, but
! 		 * SendQuery doesn't let us suppress autocommit behavior.
  		 */
! 		res = PSQLexec(query_buf->data, false);
!
! 		/* PSQLexec handles failure results and returns NULL */
! 		if (res == NULL)
! 			break;

  		/*
! 		 * If SIGINT is sent while the query is processing, PSQLexec will
! 		 * consume the interrupt.  The user's intention, though, is to cancel
! 		 * the entire watch process, so detect a sent cancellation request and
! 		 * exit in this case.
  		 */
! 		if (cancel_pressed)
! 		{
! 			PQclear(res);
  			break;
! 		}
!
! 		switch (PQresultStatus(res))
! 		{
! 			case PGRES_TUPLES_OK:
! 				printQuery(res, &myopt, pset.queryFout, pset.logfile);
! 				break;
!
! 			case PGRES_COMMAND_OK:
! 				fprintf(pset.queryFout, "%s\n%s\n\n", title, PQcmdStatus(res));
! 				break;
!
! 			case PGRES_EMPTY_QUERY:
! 				psql_error(_("\\watch cannot be used with an empty query\n"));
! 				PQclear(res);
! 				return false;
!
! 			case PGRES_COPY_OUT:
! 			case PGRES_COPY_IN:
! 			case PGRES_COPY_BOTH:
! 				psql_error(_("\\watch cannot be used with COPY\n"));
! 				PQclear(res);
! 				return false;
!
! 			default:
! 				/* other cases should have been handled by PSQLexec */
! 				psql_error(_("unexpected result status for \\watch\n"));
! 				PQclear(res);
! 				return false;
! 		}
!
! 		PQclear(res);
!
! 		fflush(pset.queryFout);

  		/*
  		 * Set up cancellation of 'watch' via SIGINT.  We redo this each time
--- 2701,2720 ----
  		myopt.title = title;

  		/*
! 		 * Run the query and print out the results. We use PSQLexecWatch,
! 		 * which is kind of cheating, but SendQuery doesn't let us suppress
! 		 * autocommit behavior.
  		 */
! 		res = PSQLexecWatch(query_buf->data, &myopt);

  		/*
! 		 * PSQLexecWatch handles the case where we can no longer
! 		 * repeat the query, and returns 0 or -1.
  		 */
! 		if (res == 0)
  			break;
! 		if (res == -1)
! 			return false;

  		/*
  		 * Set up cancellation of 'watch' via SIGINT.  We redo this each time
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
*************** PSQLexec(const char *query, bool start_xact)
*** 497,502 ****
--- 497,598 ----
  }

+ /*
+  * PSQLexecWatch
+  *
+  * This function is used for \watch command to send the query to
+  * the server and print out the results.
+  *
+  * Returns 1 if the query executed successfully, 0
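The return-value convention the new PSQLexecWatch() comment describes (1 = query ran and was printed, 0 = stop repeating quietly, -1 = hard error) is what drives the simplified do_watch() loop. A minimal, compilable sketch of that dispatch — with run_query as a hypothetical stub standing in for the real PSQLexecWatch, so the control flow can be exercised on its own — might look like this:

```c
#include <assert.h>

/* Sketch (not PostgreSQL source) of the contract stated in the patch:
 * the executor returns 1 when the query ran and was printed, 0 when
 * watching should stop quietly, and -1 on an error that makes \watch
 * fail.  run_query is a hypothetical stub for PSQLexecWatch. */

static int	remaining = 2;		/* stub: succeed twice, then stop */

static int
run_query(void)
{
	return remaining-- > 0 ? 1 : 0;
}

/* Returns 1 on a clean stop, 0 on error -- mirroring do_watch()'s
 * "break" vs "return false" paths. */
static int
watch_loop(int (*exec_query) (void))
{
	for (;;)
	{
		int			res = exec_query();

		if (res == 0)
			return 1;			/* stop repeating; \watch succeeded */
		if (res == -1)
			return 0;			/* hard failure; \watch reports an error */
		/* res == 1: success; the real loop now sleeps and repeats */
	}
}
```

Collapsing the old PGresult switch statement into this three-way integer makes the cancellation and error handling live in one place (PSQLexecWatch) instead of being duplicated in the loop.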
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
On Thu, Aug 28, 2014 at 8:20 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-08-27 22:48:54 +0200, Pavel Stehule wrote: Hi I chose \? xxx, because it is related to psql features. I wrote commands: \? options \? variables comments?

Generally I like it. Some stuff I changed:
* I rephrased the sgml changes
* s/Printing options/Display options/. Or maybe Display influencing variables? That makes it clearer why they're listed under --help-variables.
* I added \? commands as an alias for a plain \? That way the scheme can sensibly be expanded.
* I renamed help_variables() to be in line with the surrounding functions.

Stuff I wondered about:
* How about making it --help=variables instead of --help-variables? That
* Do we really need the 'Report bugs to' blurb for --help-variables?
* Since we're not exhaustive about documenting environment variables, do we really need to document TMPDIR and SHELL?
* Shouldn't we document that PROMPT1 is the standard prompt, PROMPT2 is for already started statements, and PROMPT3 is for COPY FROM STDIN?

Looks like it's committable next round. I've attached an incremental patch.

ISTM that you failed to attach the right patch. help-variables-10--11.diff contains only one line.

Regards,

--
Fujii Masao
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
On 2014-08-28 13:20:07 +0200, Andres Freund wrote: I've attached an incremental patch.

Apparently I didn't attach the patch so much as a file containing the name of the patchfile...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 6d3189d..b14db45 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2581,12 +2581,15 @@ testdb=&gt; <userinput>\setenv LESS -imx4F</userinput>

       <varlistentry>
-        <term><literal>\? [ options | variables ]</literal></term>
+        <term><literal>\? [ <replaceable class="parameter">topic</replaceable> ]</literal></term>
         <listitem>
         <para>
-        Shows help information about the backslash commands. This command can have a
-        option variables or options to take help for psql configuration variables
-        or psql command line options.
+        Shows help information. The parameter
+        <replaceable class="parameter">topic</replaceable> can be
+        <literal>commands</> (the default) to show help about backslash
+        commands, <literal>options</> to show information about commandline
+        arguments, or <literal>variables</> to show information about
+        variables influencing <application>psql</>'s behaviour or display.
         </para>
         </listitem>
       </varlistentry>
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 12cbb20..7e4626c 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1507,12 +1507,12 @@ exec_command(const char *cmd,
 		char	   *opt0 = psql_scan_slash_option(scan_state,
 												  OT_NORMAL, NULL, false);

-		if (!opt0)
+		if (!opt0 || strcmp(opt0, "commands") == 0)
 			slashUsage(pset.popt.topt.pager);
-		else if (strcmp(opt0, "variables") == 0)
-			help_variables(pset.popt.topt.pager);
 		else if (strcmp(opt0, "options") == 0)
 			usage(pset.popt.topt.pager);
+		else if (strcmp(opt0, "variables") == 0)
+			helpVariables(pset.popt.topt.pager);
 		else
 			slashUsage(pset.popt.topt.pager);
 	}
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 5e7953d..e0d1d13 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -171,7 +171,7 @@ slashUsage(unsigned short int pager)

 	fprintf(output, _("Help\n"));

-	fprintf(output, _("  \\?             description of all psql commands\n"));
+	fprintf(output, _("  \\? [commands]  description of all psql commands\n"));
 	fprintf(output, _("  \\? options     description of psql options\n"));
 	fprintf(output, _("  \\? variables   description of all psql configuration variables\n"));
 	fprintf(output, _("  \\h [NAME]      help on syntax of SQL commands, * for all commands\n"));
@@ -294,10 +294,12 @@ slashUsage(unsigned short int pager)

 /*
- * show list of available variables (options) from command line
+ * helpVariables
+ *
+ * Show list of available variables (options).
  */
 void
-help_variables(unsigned short int pager)
+helpVariables(unsigned short int pager)
 {
 	FILE	   *output;

@@ -335,7 +337,7 @@ help_variables(unsigned short int pager)
 	fprintf(output, _("  USER               the currently connected database user\n"));
 	fprintf(output, _("  VERBOSITY          control verbosity of error reports [default, verbose, terse]\n"));

-	fprintf(output, _("\nPrinting options:\n"));
+	fprintf(output, _("\nDisplay influencing variables:\n"));
 	fprintf(output, _("Usage:\n"));
 	fprintf(output, _("  psql --pset=NAME[=VALUE]\n  or \\pset NAME [VALUE] in interactive mode\n\n"));
diff --git a/src/bin/psql/help.h b/src/bin/psql/help.h
index bab360d..3ad374a 100644
--- a/src/bin/psql/help.h
+++ b/src/bin/psql/help.h
@@ -12,7 +12,7 @@ void usage(unsigned short int pager);

 void		slashUsage(unsigned short int pager);

-void		help_variables(unsigned short int pager);
+void		helpVariables(unsigned short int pager);

 void		helpSQL(const char *topic, unsigned short int pager);
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index af68e13..0651588 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -574,7 +574,7 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options)
 					}
 					break;
 				case 1:
-					help_variables(NOPAGER);
+					helpVariables(NOPAGER);
 					exit(EXIT_SUCCESS);
 				default:
 					fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 7a94fa8..da1b984 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3516,7 +3516,7 @@ psql_completion(const char *text, int start, int end)
 	else if (strcmp(prev_wd, "\\?") == 0)
 	{
 		static const char *const my_list[] =
-		{"options", "variables", NULL};
+		{"commands", "options", "variables", NULL};

 		COMPLETE_WITH_LIST_CS(my_list);
 	}
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
2014-08-28 13:20 GMT+02:00 Andres Freund and...@2ndquadrant.com: Hi, On 2014-08-27 22:48:54 +0200, Pavel Stehule wrote: Hi I chose \? xxx, because it is related to psql features. I wrote commands: \? options \? variables comments?

Generally I like it. Some stuff I changed:
* I rephrased the sgml changes
* s/Printing options/Display options/. Or maybe Display influencing variables? That makes it clearer why they're listed under --help-variables.
* I added \? commands as an alias for a plain \? That way the scheme can sensibly be expanded.
* I renamed help_variables() to be in line with the surrounding functions.

Stuff I wondered about:
* How about making it --help=variables instead of --help-variables? That

I am sorry, I don't like this idea; --help-variables is much more usual for a command line

* Do we really need the 'Report bugs to' blurb for --help-variables?
* Since we're not exhaustive about documenting environment variables, do we really need to document TMPDIR and SHELL?
* Shouldn't we document that PROMPT1 is the standard prompt, PROMPT2 is for already started statements, and PROMPT3 is for COPY FROM STDIN?

I don't see TMPDIR, SHELL as really important in this position

Pavel

Looks like it's committable next round. I've attached an incremental patch.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
On 28/08/14 13:20, Andres Freund wrote: Hi, Stuff I wondered about:
* How about making it --help=variables instead of --help-variables?

-1, help is not a variable to be assigned imho

* Do we really need the 'Report bugs to' blurb for --help-variables?

Probably not.

* Since we're not exhaustive about documenting environment variables, do we really need to document TMPDIR and SHELL?

They are a bit more important than most of the others, but probably not really necessary to be there as they are not psql specific.

* Shouldn't we document that PROMPT1 is the standard prompt, PROMPT2 is for already started statements, and PROMPT3 is for COPY FROM STDIN?

Yes please!

--
Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
On Thu, Aug 28, 2014 at 5:48 AM, Pavel Stehule pavel.steh...@gmail.com wrote: comments?

+fprintf(output, _("  ECHO           control what input is written to standard output [all, queries]\n"));

The valid values in the help messages should be consistent with the values that the tab-completion displays. So in the case of ECHO, errors and none also should be added in the message. Thought?

In the help messages of some psql variables like ECHO_HIDDEN, valid values are not explained. Why not?

Regards,

--
Fujii Masao
Re: [HACKERS] Is this code safe?
On Thu, Aug 28, 2014 at 11:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: Pavan Deolasee pavan.deola...@gmail.com writes: Can some kind of compiler optimisation reorder things such that the else if expression is evaluated using the old, uninitialised value of optval? Any such behavior is patently illegal per the C standard. Thanks a lot for explaining that. Not that that's always stopped compiler writers; but I think you might be looking at a compiler bug. I'd requested the reporter to try moving the getsockopt() call outside if expression. That hasn't helped. So I think its a case of getsockopt() neither returning an error nor setting optval to any sane value. Will try to get more details about the platform etc. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
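A defensive pattern relevant to the suspicion above is to initialize optval to a known value before calling getsockopt(), so that a platform which reports success without actually writing the option can never hand back stack garbage. A small sketch of that idea (get_socket_error is a hypothetical helper, not PostgreSQL code):

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fetch the pending error on a socket defensively.
 * optval starts at a known value, so even a getsockopt() that returns
 * success without touching the buffer cannot leave us reading an
 * uninitialized (possibly garbage) value. */
static int
get_socket_error(int sock)
{
	int			optval = 0;		/* known value, never uninitialized */
	socklen_t	optlen = sizeof(optval);

	if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &optval, &optlen) < 0)
		return errno;			/* the getsockopt() call itself failed */
	return optval;				/* 0 means no pending socket error */
}
```

This doesn't explain why the reporter's getsockopt() misbehaves, but it turns "success without a store" from undefined behaviour into a benign, predictable result.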
Re: [HACKERS] re-reading SSL certificates during server reload
On Thu, Aug 28, 2014 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Aug 27, 2014 at 6:40 AM, Magnus Hagander mag...@hagander.net wrote: On Wed, Aug 27, 2014 at 11:56 AM, Alexey Klyukin al...@hintbits.com wrote: Greetings, Is there a strong reason to disallow reloading server key and cert files during the PostgreSQL reload? Key and cert files are loaded in the postmaster. We'd need to change that. Why? Hmm. That's actually a good point. Not sure I have an excuse. They could certainly be made BACKEND without that, and there's no way to change it within a running backend *anyway*, since we cannot turn on/off SSL once a connection has been made. So yeah, it can actually still be loaded in postmaster, and I withdraw that argument :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
2014-08-28 14:22 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Thu, Aug 28, 2014 at 5:48 AM, Pavel Stehule pavel.steh...@gmail.com wrote: comments?

+fprintf(output, _("  ECHO           control what input is written to standard output [all, queries]\n"));

The valid values in the help messages should be consistent with the values that the tab-completion displays. So in the case of ECHO, errors and none also should be added in the message. Thought? In the help messages of some psql variables like ECHO_HIDDEN, valid values are not explained. Why not?

It is based on http://www.postgresql.org/docs/9.4/static/app-psql.html

ECHO_HIDDEN When this variable is set and a backslash command queries the database, the query is first shown. This way you can study the PostgreSQL internals and provide similar functionality in your own programs. (To select this behavior on program start-up, use the switch -E.) If you set the variable to the value noexec, the queries are just shown but are not actually sent to the server and executed.

There is no clear set of valid values :(. When I found known fields in the doc, I used them.

Regards

Pavel

Regards,

--
Fujii Masao
Re: [HACKERS] Escaping from blocked send() reprised.
Hello, sorry for the dazed reply in the previous mail. I made a revised patch for this issue. The attached patches are the following:

- 0001_Revise_socket_emulation_for_win32_backend.patch
  Revises socket emulation on the win32 backend so that each socket can have its own blocking-mode state.
- 0002_Allow_backend_termination_during_write_blocking.patch
  The patch to solve the issue. This patch depends on the 0001_ patch.

==
I'm marking this as Waiting on Author in the commitfest app, because: 1. the protocol violation needs to be avoided one way or another, and 2. the behavior needs to be consistent so that a single pg_terminate_backend() is enough to always kill the connection.

- Preventing protocol violation. To prevent protocol violation, secure_write sets ClientConnectionLost when SIGTERM is detected, then internal_flush() and ProcessInterrupts() follow that instruction.

- A single pg_terminate_backend reliably kills the backend. secure_raw_write() uses a non-blocking socket and a loop of select() with a timeout to reliably detect a received signal (SIGTERM). To avoid frequent switching of blocking mode, the bare socket for Port is put into non-blocking mode from the start in StreamConnection(), and blocking mode is controlled only by Port->noblock in secure_raw_read/write().

To make the code mentioned above (patch 0002) tidy, I rewrote the socket emulation code for win32 backends so that each socket can have its own non-blocking state (patch 0001).

Some concerns about this patch:

- This patch limits the number of non-blocking sockets on the win32 backend to below 64 (FD_SETSIZE), but that seems to be sufficient.
- This patch introduces redundant socket emulation for the win32 backend, but the win32 bare socket for Port is already non-blocking as described, so it doesn't seem to be a serious problem for performance. In addition, since I don't know the reason why win32/socket.c provides the blocking-mode socket emulation, I decided to preserve the blocking socket emulation in win32/socket.c.
Possibly it can be removed. Any suggestions?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..c92851e 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -795,10 +795,6 @@ pq_set_nonblocking(bool nonblocking)
 	if (MyProcPort->noblock == nonblocking)
 		return;

-#ifdef WIN32
-	pgwin32_noblock = nonblocking ? 1 : 0;
-#else
-
 	/*
 	 * Use COMMERROR on failure, because ERROR would try to send the error to
 	 * the client, which might require changing the mode again, leading to
@@ -816,7 +812,7 @@ pq_set_nonblocking(bool nonblocking)
 			ereport(COMMERROR,
 					(errmsg("could not set socket to blocking mode: %m")));
 	}
-#endif
+
 	MyProcPort->noblock = nonblocking;
 }
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index c981169..f0ff3e7 100644
--- a/src/backend/port/win32/socket.c
+++ b/src/backend/port/win32/socket.c
@@ -21,11 +21,8 @@
  * non-blocking mode in order to be able to deliver signals, we must
  * specify this in a separate flag if we actually need non-blocking
  * operation.
- *
- * This flag changes the behaviour *globally* for all socket operations,
- * so it should only be set for very short periods of time.
  */
-int			pgwin32_noblock = 0;
+static fd_set nonblockset;

 #undef socket
 #undef accept
@@ -33,6 +30,7 @@ int pgwin32_noblock = 0;
 #undef select
 #undef recv
 #undef send
+#undef closesocket

 /*
  * Blocking socket functions implemented so they listen on both
@@ -40,6 +38,34 @@ int pgwin32_noblock = 0;
  */

 /*
+ * Set blocking mode for each socket
+ */
+void
+pgwin32_set_socket_nonblock(SOCKET s, int nonblock)
+{
+	if (nonblock)
+		FD_SET(s, &nonblockset);
+	else
+		FD_CLR(s, &nonblockset);
+
+	/*
+	 * fd_set cannot have more than FD_SETSIZE entries. It's not likely to
+	 * come close to this limit, but if it goes above the limit, the
+	 * non-blocking state of some existing sockets will be discarded.
+	 */
+	if (nonblockset.fd_count >= FD_SETSIZE)
+		elog(FATAL, "Too many sockets requested to be nonblocking mode.");
+}
+
+void
+pgwin32_nonblockset_init()
+{
+	FD_ZERO(&nonblockset);
+}
+
+#define socket_is_nonblocking(s) FD_ISSET((s), &nonblockset)
+
+/*
  * Convert the last socket error code into errno
  */
 static void
@@ -256,6 +282,10 @@ pgwin32_socket(int af, int type, int protocol)
 		TranslateSocketError();
 		return INVALID_SOCKET;
 	}
+
+	/* newly created socket should be in blocking mode */
+	pgwin32_set_socket_nonblock(s, false);
+
 	errno = 0;

 	return s;
@@ -334,7 +364,7 @@ pgwin32_recv(SOCKET s, char *buf, int len, int f)
 		return -1;
 	}

-	if (pgwin32_noblock)
+	if (socket_is_nonblocking(s))
 	{
 		/*
 		 * No data received, and we are in emulated non-blocking mode, so
@@ -420,7 +450,7 @@ pgwin32_send(SOCKET s, const void *buf, int len, int flags)
 		return -1;
 	}

-	if
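For readers without a Windows build environment, the per-socket bookkeeping that patch 0001 introduces can be sketched in portable C: an fd_set records which sockets are in emulated non-blocking mode, replacing the old process-global pgwin32_noblock flag. The names loosely mirror the patch; everything else here is simplified illustration, not the patch itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>

/* One set, tracked per socket instead of one global int for the whole
 * process.  fd_set is a convenient fixed-size bitmap for this. */
static fd_set nonblockset;

/* Forget all per-socket state; the patch calls this once at startup. */
static void
nonblockset_init(void)
{
	FD_ZERO(&nonblockset);
}

/* Record whether socket s is in emulated non-blocking mode. */
static void
set_socket_nonblock(int s, bool nonblock)
{
	if (nonblock)
		FD_SET(s, &nonblockset);
	else
		FD_CLR(s, &nonblockset);
}

/* Query the per-socket flag instead of a process-global one. */
static bool
socket_is_nonblocking(int s)
{
	return FD_ISSET(s, &nonblockset) != 0;
}
```

The key property is that setting one socket non-blocking no longer changes the behaviour of every other socket in the process, which is what made the old global flag safe only "for very short periods of time".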
Re: [HACKERS] Selectivity estimation for inet operators
On 08/26/2014 12:44 PM, Emre Hasegeli wrote: I agree with you that we can support other join type and anti join later, If others don’t have any objection in doing other parts later I will mark as Ready For Committer. I updated the patch to cover semi and anti joins with eqjoinsel_semi(). I think it is better than returning a constant. The new version attached with the new version of the test script. Can you please look at it again and mark it as ready for committer if it seems okay to you?

I took a quick look at this. Some questions:

* Isn't X Y equivalent to network_scan_first(X) Y AND network_scan_last(X) Y? Or at least close enough for selectivity estimation purposes? Pardon my ignorance - I'm not too familiar with the inet datatype - but how about just calling scalarineqsel for both bounds?

* inet_mcv_join_selec() is O(n^2) where n is the number of entries in the MCV lists. With the max statistics target of 1, a worst case query on my laptop took about 15 seconds to plan. Maybe that's acceptable, but you went through some trouble to make planning of MCV vs histogram faster, by the log2 method to compare only some values, so I wonder why you didn't do the same for the MCV vs MCV case?

* A few typos: lenght -> length.

- Heikki
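To make the O(n^2) point concrete: joining two MCV lists means testing every cross pair of common values with the join operator, so with statistics targets of n1 and n2 the planner performs n1 * n2 operator invocations. A simplified stand-in for the patch's inet_mcv_join_selec() (the real function works on Datums and an operator's FmgrInfo; doubles and a callback are used here purely for illustration):

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

typedef bool (*match_fn) (double a, double b);

/* Sum the joint frequencies of every matching pair of common values.
 * The nested loops are the O(n1 * n2) planning cost under discussion. */
static double
mcv_join_selec(const double *vals1, const double *freqs1, int n1,
			   const double *vals2, const double *freqs2, int n2,
			   match_fn op)
{
	double		selec = 0.0;
	int			i,
				j;

	for (i = 0; i < n1; i++)
		for (j = 0; j < n2; j++)	/* n1 * n2 operator calls */
			if (op(vals1[i], vals2[j]))
				selec += freqs1[i] * freqs2[j];
	return selec;
}

/* Illustrative operator: plain equality. */
static bool
match_eq(double a, double b)
{
	return a == b;
}
```

With both statistics targets at the maximum of 10000 this is on the order of 10^8 operator calls per join clause, which is consistent with planning times in the seconds; sorting one list and binary-searching it (the log2 trick used for the MCV-vs-histogram case) would cut this to O(n log n) for operators that respect the sort order.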
Re: [HACKERS] alter user set local_preload_libraries.
Hello, On Thu, 2014-07-03 at 13:05 +0900, Kyotaro HORIGUCHI wrote: For the earlier than 9.4, no one seems to have considered seriously to use local_preload_library via ALTER statements so far. Only document fix would be enough for them. I think local_preload_libraries never actually worked sensibly for any specific purpose. It was designed to work with the pldebugger, but pldebugger doesn't use it anymore. Hmm, I see although I don't know pldebugger. So before we start bending the GUC system into new directions, let's ask ourselves what the current practical application of local_preload_libraries is and whether the proposed changes support that use at all. I suspect there aren't any, and we should leave it alone. I found this issue when trying per-pg_user (role) loading of auto_analyze and some tweaking tool. It is not necessarily set by the user by own, but the function to decide whether to load some module by the session-user would be usable, at least, as for me:) I agree that we should fix the documentation in 9.3 and earlier. It seems rather hard work if local_preload_library itself is not removed or fixed as expressed.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
[HACKERS] Re: Why data of timestamptz does not store value of timezone passed to it?
rohtodeveloper wrote I have a question about data type timestamp with time zone. Why data of timestamptz does not store value of timezone passed to it? The timezone of output(+08) is different with the original input value(+02). It seems not to be good behavior.

It's good for the innumerable people who use it every day without difficulty... The why is that the goal of timestamptz is to represent a single point-in-time. For all practical purposes the introduction of timezones simply allows for multiple equivalent representations of said point. Postgres has simply chosen UTC as the canonical representation for storage purposes and uses client-provided timezone information to transform the stored value into the equivalent representation that is thought to be most useful to the user.

But the behavior of data type time with time zone is correct. postgres=# select '14:30:30.423602+02'::time with time zone; timetz 14:30:30.423602+02 (1 row)

Inconsistent (wrt timestamptz), and possibly buggy (though doubtful, consistency is not mandatory), but the documentation itself says that time with time zone has problematic properties mandated by the SQL standard. The issue is that without knowing the date within a given timezone one does not know the adjustment value to use. TimeZones are inherently date dependent - so timetz is fundamentally flawed even if it can be used to good effect in limited situations. If this does what you need then create a composite type (date, timetz). Once you start doing modifications to your custom type you will likely find the timestamptz behavior to be more useful and accurate.

If the current behavior of timestamptz is not suitable, is there any plan to correct the behavior of timestamptz or create a new data type which can store timestamp with timezone?

Timestamptz will never be changed from its current behavior. The bar to introduce another timestamptz-like data type with different behavior is extremely high.
It would probably be worthwhile for everyone if you share what you are actually trying to accomplish instead of just throwing out the claim that the data type is broken.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Why-data-of-timestamptz-does-not-store-value-of-timezone-passed-to-it-tp5816703p5816737.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
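The normalization described above can be shown with plain arithmetic: a timestamptz denotes one instant, and the input's zone offset is consumed during normalization rather than stored. A toy model of that idea (to_instant is an illustrative helper; this is not how PostgreSQL is implemented internally, where the stored value is microseconds since 2000-01-01 UTC):

```c
#include <assert.h>

/* Toy model of timestamptz normalization: an instant is wall-clock
 * seconds minus the zone offset, i.e. the UTC-equivalent value.  The
 * offset itself is not part of the stored value -- by design. */
typedef long long pg_instant;

static pg_instant
to_instant(long long wall_clock_seconds, int tz_offset_seconds)
{
	return wall_clock_seconds - tz_offset_seconds;
}
```

Since '14:30:30+02' and '12:30:30+00' normalize to the same instant, the original '+02' cannot be recovered on output; the server simply re-renders the instant in whatever zone the client session has configured.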
Re: [HACKERS] Specifying the unit in storage parameter
Michael Paquier wrote: This remark is just to limit the amount of trash in the database used for regression tests. But then if we'd remove everything we would lack handy material for tests on utilities like database-wide thingies of the type VACUUM, REINDEX, pg_dump, etc. And we can just drop the database used for regressions to clean up everything. So that's not mandatory at all. I tend to always clean up objects in my patches touching regressions to limit interactions with other tests, but I guess that's up to the person who wrote the code to decide.

Leaving lingering objects is not a bad thing, particularly if they have unusual properties; they enable somebody pg_dump'ing the database, which can be a good test for pg_dump.

--
Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] re-reading SSL certificates during server reload
Magnus Hagander mag...@hagander.net writes: On Thu, Aug 28, 2014 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Aug 27, 2014 at 6:40 AM, Magnus Hagander mag...@hagander.net wrote: Key and cert files are loaded in the postmaster. We'd need to change that. Why? Hmm. That's actually a good point. Not sure I have an excuse. They could certainly be made BACKEND without that, and there's no way to change it within a running backend *anyway*, since we cannot turn on/off SSL once a connection has been made. So yeah, it can actually still be loaded in postmaster, and I withdraw that argument :) Why would they need to be BACKEND, as opposed to just PGC_SIGHUP? The only reason they're PGC_POSTMASTER is the lack of any code for loading updated values, which I assume is something that's possible with OpenSSL. We could in fact wait to load them until after a backend has forked off from the postmaster, but (1) that'd slow down session startup, and (2) it would mean that you don't hear about broken settings at postmaster startup. (BTW, what happens on Windows? I imagine we have to reload them anyway after fork/exec on that platform ...) regards, tom lane
Re: [HACKERS] re-reading SSL certificates during server reload
On Thu, Aug 28, 2014 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Aug 28, 2014 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Aug 27, 2014 at 6:40 AM, Magnus Hagander mag...@hagander.net wrote: Key and cert files are loaded in the postmaster. We'd need to change that. Why? Hmm. That's actually a good point. Not sure I have an excuse. They could certainly be made BACKEND without that, and there's no way to change it within a running backend *anyway*, since we cannot turn on/off SSL once a connection has been made. So yeah, it can actually still be loaded in postmaster, and I withdraw that argument :) Why would they need to be BACKEND, as opposed to just PGC_SIGHUP? The only reason they're PGC_POSTMASTER is the lack of any code for loading updated values, which I assume is something that's possible with OpenSSL. I just thought semantically - because they do not change in a running backend. Any running backend will continue with encryption set up based on the old certificate. We could in fact wait to load them until after a backend has forked off from the postmaster, but (1) that'd slow down session startup, and (2) it would mean that you don't hear about broken settings at postmaster startup. (BTW, what happens on Windows? I imagine we have to reload them anyway after fork/exec on that platform ...) Yes, we already do that - secure_initialize() is called in SubPostmasterMain(). But I think reloading them in the postmaster on Unix is the better choice, yes. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] re-reading SSL certificates during server reload
Magnus Hagander mag...@hagander.net writes: On Thu, Aug 28, 2014 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Why would they need to be BACKEND, as opposed to just PGC_SIGHUP? I just thought semantically - because they do not change in a running backend. Any running backend will continue with encryption set up based on the old certificate. Hm. Yeah, I guess there is some use in holding onto the values that were actually used to initialize the current session, or at least there would be if we exposed the cert contents in any fashion. regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Allow units to be specified in relation option setting value.
Michael Paquier michael.paqu...@gmail.com writes: The patch attached fixes pg_upgrade by putting quotes when generating reloptions and it passes check-world. I imagine that having quotes by default in the value of reloptions in pg_class is the price to pay for supporting units. If this is not considered as a correct approach, then reverting the patch would be wiser I guess. Ugh. I'm not sure what the best solution is, but I don't think I like that one. Given that this patch broke both pg_dump and pg_upgrade, I think it should be reverted for the time being, rather than applying a hasty hack. Obviously more thought and testing is needed. regards, tom lane
Re: [HACKERS] re-reading SSL certificates during server reload
On 2014-08-28 10:12:19 -0400, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Aug 28, 2014 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Why would they need to be BACKEND, as opposed to just PGC_SIGHUP? I just thought semantically - because they do not change in a running backend. Any running backend will continue with encryption set up based on the old certificate. Hm. Yeah, I guess there is some use in holding onto the values that were actually used to initialize the current session, or at least there would be if we exposed the cert contents in any fashion. Won't that allow the option to be specified at connection start by mere mortal users? That sounds odd to me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] re-reading SSL certificates during server reload
Andres Freund and...@2ndquadrant.com writes: On 2014-08-28 10:12:19 -0400, Tom Lane wrote: Hm. Yeah, I guess there is some use in holding onto the values that were actually used to initialize the current session, or at least there would be if we exposed the cert contents in any fashion. Won't that allow the option to be specified at connection start by mere mortal users? That sounds odd to me. Well, no, because SSL would be established (or not) before we ever process the contents of the connection request packet. You might be able to change the value that SHOW reports, but not the value actually governing your session. Having said that, there's a nearby thread about inventing a SUBACKEND GUC category, and that's likely what we'd really want to use here, just on the grounds that superusers would know better. regards, tom lane
Re: [HACKERS] re-reading SSL certificates during server reload
On Thu, Aug 28, 2014 at 4:14 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-28 10:12:19 -0400, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Aug 28, 2014 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Why would they need to be BACKEND, as opposed to just PGC_SIGHUP? I just thought semantically - because they do not change in a running backend. Any running backend will continue with encryption set up based on the old certificate. Hm. Yeah, I guess there is some use in holding onto the values that were actually used to initialize the current session, or at least there would be if we exposed the cert contents in any fashion. Won't that allow the option to be specified at connection start by mere mortal users? That sounds odd to me. The cert is (and has to be) loaded before we even read the startup packet, so there is no way for them to actually send the value over early enough I believe. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Audit of logout
On Wed, Aug 27, 2014 at 5:19 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Aug 23, 2014 at 3:44 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Aug 5, 2014 at 8:04 PM, Fujii Masao masao.fu...@gmail.com wrote: Changing PGC_SU_BACKEND parameter (log_connections) is visible even with a non-super user client due to above code. Shouldn't it be only visible for super-user logins? Simple steps to reproduce the problem: a. start Server (default configuration) b. connect with superuser c. change in log_connections to on in postgresql.conf d. perform select pg_reload_conf(); e. connect with non-super-user f. show log_connections; --This step shows the value as on, --whereas I think it should have been off In this case, log_connections is changed in postgresql.conf and it's reloaded, so ISTM that it's natural that even non-superuser sees the changed value. No? Maybe I'm missing something. Yeah, you are right. With the latest patch, I am getting one regression failure on windows. Attached is the regression diff. Can we improve this line a bit? ! * BACKEND options are the same as SU_BACKEND ones, but they can BACKEND options can be set same as SU_BACKEND ones, .. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com regression.diffs Description: Binary data
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 08/28/2014 04:43 AM, Peter Geoghegan wrote: -- Nesting within wCTE: WITH t AS ( INSERT INTO z SELECT i, 'insert' FROM generate_series(0, 16) i ON CONFLICT UPDATE SET v = v || 'update' -- use of operators/functions in targetlist RETURNING * -- only projects inserted tuples, never updated tuples ) SELECT * FROM t JOIN y ON t.k = y.a ORDER BY a, k; Personally I would find it surprising if RETURNING did not also return the updated tuples. In many use cases for upsert the user does not care if the row was new or not. What I think would be useful is if all tuples were returned but there was some way to filter out only the inserted ones. -- Andreas Karlsson
Re: [HACKERS] re-reading SSL certificates during server reload
On 2014-08-28 10:20:08 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-28 10:12:19 -0400, Tom Lane wrote: Hm. Yeah, I guess there is some use in holding onto the values that were actually used to initialize the current session, or at least there would be if we exposed the cert contents in any fashion. Won't that allow the option to be specified at connection start by mere mortal users? That sounds odd to me. Well, no, because SSL would be established (or not) before we ever process the contents of the connection request packet. You might be able to change the value that SHOW reports, but not the value actually governing your session. Sure. There's probably nothing happening with ssl 'that late' right now. But it seems like an invitation for trouble later to me. Even if it's just that other code copies the logic, thinking it'd also be safe. Having said that, there's a nearby thread about inventing a SUBACKEND GUC category, and that's likely what we'd really want to use here, just on the grounds that superusers would know better. What we really want is PGC_SIGHUP | PGC_BACKEND, right? I wonder if it's not better to allow for some combinations of GucContexts by bitmask, instead of adding SUBACKEND and HUPBACKEND and whatever else might make sense. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] re-reading SSL certificates during server reload
Andres Freund and...@2ndquadrant.com writes: On 2014-08-28 10:20:08 -0400, Tom Lane wrote: Having said that, there's a nearby thread about inventing a SUBACKEND GUC category, and that's likely what we'd really want to use here, just on the grounds that superusers would know better. What we really want is PGC_SIGHUP | PGC_BACKEND, right? How you figure that? You *don't* want it to change at SIGHUP, at least not in existing sessions. And the postmaster already thinks it should reload PGC_BACKEND variables on SIGHUP. regards, tom lane
Re: [HACKERS] re-reading SSL certificates during server reload
On 2014-08-28 10:30:30 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-28 10:20:08 -0400, Tom Lane wrote: Having said that, there's a nearby thread about inventing a SUBACKEND GUC category, and that's likely what we'd really want to use here, just on the grounds that superusers would know better. What we really want is PGC_SIGHUP | PGC_BACKEND, right? How you figure that? You *don't* want it to change at SIGHUP, at least not in existing sessions. And the postmaster already thinks it should reload PGC_BACKEND variables on SIGHUP. Well, it shouldn't be settable at connection start but changed for new connections (expressed by _SIGHUP). But it shouldn't change for existing connections (therefore PGC_BACKEND). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
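Andres's bitmask idea can be sketched in miniature like this (hypothetical names and semantics only; PostgreSQL's actual GucContext is a plain enum, not a flag set):

```python
from enum import IntFlag

class GucFlags(IntFlag):
    """Hypothetical combinable context bits, per Andres's suggestion."""
    SIGHUP = 1 << 0   # postmaster reloads the value on SIGHUP for new connections
    BACKEND = 1 << 1  # value is frozen for the lifetime of an existing backend

# What the SSL cert/key settings would want: reloadable for future
# sessions, immutable within a session that already did its handshake.
SSL_CERT_CONTEXT = GucFlags.SIGHUP | GucFlags.BACKEND

def applies_to_running_backend(ctx: GucFlags) -> bool:
    # A BACKEND-flagged setting never changes after session start.
    return not (ctx & GucFlags.BACKEND)

def reloaded_on_sighup(ctx: GucFlags) -> bool:
    return bool(ctx & GucFlags.SIGHUP)
```

This captures the behavior both participants describe: a SIGHUP picks up a new value for future connections, while established sessions keep the value they started with.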
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
On 2014-08-28 14:04:27 +0200, Petr Jelinek wrote: On 28/08/14 13:20, Andres Freund wrote: Hi, Stuff I wondered about: * How about making it --help=variables instead of --help-variables? -1, help is not a variable to be assigned imho I don't think variable assignment is a good mental model for long commandline arguments. And it's not like I'm the first to come up with an extensible --help. Check e.g. gcc. But anyway, I guess I've lost that argument. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] postgresql latency bgwriter not doing its job
On Thu, Aug 28, 2014 at 3:27 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Aidan, If all you want is to avoid the write storms when fsyncs start happening on slow storage, can you not just adjust the kernel vm.dirty* tunables to start making the kernel write out dirty buffers much sooner instead of letting them accumulate until fsyncs force them out all at once? I tried that by setting: vm.dirty_expire_centisecs = 100 vm.dirty_writeback_centisecs = 100 So it should start writing returned buffers at most 2s after they are returned, if I understood the doc correctly, instead of at most 35s. The result is that with a 5000s 25tps pretty small load (the system can do 300tps with the default configuration), I lost 2091 (1.7%) of transactions, that is they were beyond the 200ms schedule limit. Another change is that overall the lost transactions are more spread than without this setting, although there still is stretches of unresponsiveness. So although the situation is significantly better, it is still far from good with the much reduced value I tried. What do the checkpoint logs look like now? (especially interested in sync times)
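Fabien's 2s vs. 35s arithmetic follows from a simple worst-case model: a freshly dirtied page waits up to dirty_expire_centisecs to become eligible for writeback, plus up to dirty_writeback_centisecs for the next flusher wakeup. A sketch of that model (simplified; the kernel also writes out earlier when dirty_ratio/dirty_background_ratio thresholds are hit):

```python
def worst_case_writeback_delay_s(expire_centisecs: int, writeback_centisecs: int) -> float:
    """Upper bound, in seconds, on how long a dirty page may linger
    before the flusher threads write it out (1 centisec = 1/100 s)."""
    return (expire_centisecs + writeback_centisecs) / 100.0

# Typical kernel defaults (3000 + 500 centisecs) give the ~35s figure;
# Fabien's 100 + 100 setting gives the 2s figure from the email.
```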
Re: [HACKERS] Need Multixact Freezing Docs
Josh Berkus wrote: On 04/16/2014 01:30 PM, Alvaro Herrera wrote: Josh Berkus wrote: You can see the current multixact value in pg_controldata output. Keep timestamped values of that somewhere (a table?) so that you can measure consumption rate. I don't think we provide SQL-level access to those values. Bleh. Do we provide SQL-level access in 9.4? If not, I think that's a requirement before release. Yeah, good idea. Want to propose a patch? Yeah, lemme dig into this. I really think we need it for 9.4, feature frozen or not. Ping? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Tue, Aug 26, 2014 at 12:19 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: I could think of 2 ways to change this: a. if user has specified cost_limit value for table, then it just uses it rather than rebalancing based on value of system-wide guc variable autovacuum_vacuum_cost_limit b. another could be to restrict setting per-table value to be lesser than system-wide value? The former is used for auto vacuum parameters like scale_factor and the latter is used for parameters like freeze_max_age. I've been giving some thought to this. Really, there is no way to handle this sensibly while at the same time keeping the documented behavior -- or in other words, what we have documented is not useful behavior. Your option (b) above is an easy solution to the problem, however it means that the user will have serious trouble configuring the system in scenarios such as volatile tables, as Mark says -- essentially that will foreclose the option of using autovacuum for them. I'm not sure I like your (a) proposal much better. One problem there is that if you set the values for a table to be exactly the same values as in postgresql.conf, it will behave completely differently because it will not participate in balancing. To me this seems to violate POLA. So my proposal is a bit more complicated. First we introduce the notion of a single number, to enable sorting and computations: the delay equivalent, ... I favor option (a). There's something to be said for your proposal in terms of logical consistency with what we have now, but to be honest I'm not sure it's the behavior anyone wants (I would welcome more feedback on what people actually want). I think we should view an attempt to set a limit for a particular table as a way to control the rate at which that table is vacuumed - period. 
At least in situations that I've encountered, it's typical to be able to determine the frequency with which a given table needs to be vacuumed to avoid runaway bloat, and from that you can work backwards to figure out how fast you must process it in MB/s, and from there you can work backwards to figure out what cost delay will achieve that effect. But if the system tinkers with the cost delay under the hood, then you're vacuuming at a different (slower) rate and, of course, the table bloats. Now, in the case where you are setting an overall limit, there is at least an argument to be made that you can determine the overall rate of autovacuum-induced I/O activity that the system can tolerate, and set your limits to stay within that budget, and then let the system decide how to divide that I/O up between workers. But if you're overriding a per-table limit, I don't really see how that holds any water. The system I/O budget doesn't go up just because one particular table is being vacuumed rather than any other. The only plausible use case for setting a per-table rate that I can see is when you actually want the system to use that exact rate for that particular table. I might be missing something, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
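Robert's reading of option (a) — a per-table cost limit is used verbatim, and only the tables without one share the system-wide budget — could be sketched like this (an illustrative model only, not the actual autovacuum.c balancing code; names are hypothetical):

```python
def assign_cost_limits(tables, system_limit):
    """tables: mapping of table name -> explicit per-table limit, or None.
    Tables with an explicit limit keep it untouched (option (a));
    the remaining workers split the system-wide budget evenly."""
    unlimited = [t for t, lim in tables.items() if lim is None]
    share = system_limit // len(unlimited) if unlimited else 0
    return {t: (lim if lim is not None else share)
            for t, lim in tables.items()}
```

Under this model, the "hot" table's explicit limit is honored exactly, while the other concurrently vacuumed tables are balanced against each other within the global budget.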
[HACKERS] Re: [COMMITTERS] pgsql: Allow units to be specified in relation option setting value.
On Thu, Aug 28, 2014 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: The patch attached fixes pg_upgrade by putting quotes when generating reloptions and it passes check-world. I imagine that having quotes by default in the value of reloptions in pg_class is the price to pay for supporting units. If this is not considered as a correct approach, then reverting the patch would be wiser I guess. Ugh. I'm not sure what the best solution is, but I don't think I like that one. Another approach is to change pg_dump so that it encloses the relopt values with single quotes. This is the same approach as what pg_dumpall does for database or role-specific settings. Obvious drawback of this approach is that it prevents pg_dump with 9.4 or before from outputting the restorable dump. Maybe we can live with this because there is no guarantee that older version of pg_dump can work properly with newer major version of server. But maybe someone cannot live with that. Not sure. Yet another approach is to change the reloptions code so that it always stores the plain value without the units (i.e., 1000 is stored if 1s is specified in autovacuum_vacuum_cost_delay) in pg_class. This approach doesn't prevent older version of pg_dump from taking the dump from v9.5 server. One drawback of this approach is that reloption values are always stored without the units, which might make it a bit hard for a user to see the reloption values from pg_class. Regards, -- Fujii Masao
Re: [HACKERS] implement subject alternative names support for SSL connections
On Mon, Aug 25, 2014 at 12:02 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 08/24/2014 03:11 PM, Alexey Klyukin wrote: On Wed, Aug 20, 2014 at 11:53 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The patch doesn't seem to support wildcards in alternative names. Is that on purpose? Not really, we just did not have any use case for them, but it seems that RFC 5280 does not disallow them: Finally, the semantics of subject alternative names that include wildcard characters (e.g., as a placeholder for a set of names) are not addressed by this specification. Applications with specific requirements MAY use such names, but they must define the semantics. I've added support for them in the next iteration of the patch attached to this email. It would be good to add a little helper function that does the NULL-check, straight comparison, and wildcard check, for a single name. And then use that for the Common Name and all the Alternatives. That'll ensure that all the same rules apply whether the name is the Common Name or an Alternative (assuming that the rules are supposed to be the same; I don't know if that's true). Thanks, common code has been moved into a separate new function. Another question is how should we treat the certificates with no CN and non-empty SAN? Current code just bails out right after finding no CN section present, but the RFC (5280) says: Further, if the only subject identity included in the certificate is an alternative name form (e.g., an electronic mail address), then the subject distinguished name MUST be empty (an empty sequence), and the subjectAltName extension MUST be present. which to me sounds like the possibility of coming across such certificates in the wild, although I personally see little use in them. 
Regards,
-- Alexey Klyukin

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
new file mode 100644
index f950fc3..394a66f
*** a/src/interfaces/libpq/fe-secure-openssl.c
--- b/src/interfaces/libpq/fe-secure-openssl.c
***************
*** 60,68 ****
--- 60,73 ----
  #ifdef USE_SSL_ENGINE
  #include <openssl/engine.h>
  #endif
+ #include <openssl/x509v3.h>

  static bool verify_peer_name_matches_certificate(PGconn *);
  static int	verify_cb(int ok, X509_STORE_CTX *ctx);
+ static int	certificate_name_entry_validate_match(PGconn *conn,
+ 					char *name,
+ 					unsigned int len,
+ 					bool *match);
  static void destroy_ssl_system(void);
  static int	initialize_SSL(PGconn *conn);
  static PostgresPollingStatusType open_client_SSL(PGconn *);

*************** wildcard_certificate_match(const char *p
*** 471,479 ****
  	return 1;
  }

  /*
! *	Verify that common name resolves to peer.
  */
  static bool
  verify_peer_name_matches_certificate(PGconn *conn)
--- 476,515 ----
  	return 1;
  }

+ /*
+  * Validate a single certificate name entry and match it against the pghost.
+  * Returns 0 if the certificate name is invalid (contains embedded NULLs), 1 otherwise.
+  */
+ static int
+ certificate_name_entry_validate_match(PGconn *conn, char *name,
+ 					unsigned int len, bool *match)
+ {
+ 	/* There is no guarantee the string returned from the certificate is NULL-terminated */
+ 	name[len] = '\0';
+ 	*match = false;
+ 	/*
+ 	 * Reject embedded NULLs in certificate common or alternative name to prevent attacks
+ 	 * like CVE-2009-4034.
+ 	 */
+ 	if (len != strlen(name))
+ 	{
+ 		printfPQExpBuffer(&conn->errorMessage,
+ 			libpq_gettext("SSL certificate's common name contains embedded null\n"));
+ 		return 0;
+ 	}
+ 	if (pg_strcasecmp(name, conn->pghost) == 0)
+ 		/* Exact name match */
+ 		*match = true;
+ 	else if (wildcard_certificate_match(name, conn->pghost))
+ 		/* Matched wildcard certificate */
+ 		*match = true;
+ 	else
+ 		*match = false;
+ 	return 1;
+ }
+
  /*
! *	Verify that common name or any of the alternative dNSNames resolves to peer.
  */
  static bool
  verify_peer_name_matches_certificate(PGconn *conn)

*************** verify_peer_name_matches_certificate(PGc
*** 492,499 ****
  	/*
  	 * Extract the common name from the certificate.
- 	 *
- 	 * XXX: Should support alternate names here
  	 */
  	/* First find out the name's length and allocate a buffer for it. */
  	len = X509_NAME_get_text_by_NID(X509_get_subject_name(conn->peer),
--- 528,533 ----

*************** verify_peer_name_matches_certificate(PGc
*** 522,540
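The checks the patch performs — reject embedded NULs (CVE-2009-4034), then try an exact match, then a wildcard match — can be modeled compactly. This is an illustrative sketch only, not libpq's exact matching rules (libpq's wildcard semantics differ in details):

```python
def wildcard_match(pattern: str, host: str) -> bool:
    # Accept only a leading "*." wildcard covering exactly one label:
    # "*.example.com" matches "db1.example.com" but not "example.com"
    # or "a.b.example.com".
    if not pattern.startswith('*.'):
        return False
    suffix = pattern[1:].lower()          # ".example.com"
    host = host.lower()
    if not host.endswith(suffix) or len(host) <= len(suffix):
        return False
    return '.' not in host[:len(host) - len(suffix)]

def name_entry_validate_match(name: bytes, host: str):
    """Returns (valid, match); valid is False on embedded NULs."""
    if b'\x00' in name:
        return False, False
    text = name.decode('ascii', errors='replace')
    if text.lower() == host.lower():
        return True, True
    return True, wildcard_match(text, host)
```

The same helper is applied to the Common Name and to every dNSName alternative, which is exactly the uniformity Heikki asked for.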
[HACKERS] Re: [COMMITTERS] pgsql: Allow units to be specified in relation option setting value.
On Thu, Aug 28, 2014 at 12:22 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Aug 28, 2014 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: The patch attached fixes pg_upgrade by putting quotes when generating reloptions and it passes check-world. I imagine that having quotes by default in the value of reloptions in pg_class is the price to pay for supporting units. If this is not considered as a correct approach, then reverting the patch would be wiser I guess. Ugh. I'm not sure what the best solution is, but I don't think I like that one. Another approach is to change pg_dump so that it encloses the relopt values with single quotes. This is the same approach as what pg_dumpall does for database or role-specific settings. Obvious drawback of this approach is that it prevents pg_dump with 9.4 or before from outputting the restorable dump. Maybe we can live with this because there is no guarantee that older version of pg_dump can work properly with newer major version of server. But maybe someone cannot live with that. Not sure. To me, this doesn't seem nearly important enough to justify breaking pg_dump compatibility. AIUI, this is just a cosmetic improvement, so we shouldn't break functional things for that. Yet another approach is to change the reloptions code so that it always stores the plain value without the units (i.e., 1000 is stored if 1s is specified in autovacuum_vacuum_cost_delay) in pg_class. This approach doesn't prevent older version of pg_dump from taking the dump from v9.5 server. One drawback of this approach is that reloption values are always stored without the units, which might make it a bit hard for a user to see the reloption values from pg_class. This seems like the way to go. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
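Fujii's second approach amounts to canonicalizing at parse time and storing the bare number in the option's base unit (so "1s" becomes 1000 for a millisecond-based option). A toy sketch of that conversion (the unit table here is hypothetical; the real parsing lives in the server's reloptions/GUC code):

```python
import re

# Hypothetical unit table; base unit is milliseconds.
TIME_UNITS_MS = {'ms': 1, 's': 1000, 'min': 60_000, 'h': 3_600_000}

def canonicalize_ms(value: str) -> int:
    """'1s' -> 1000, '200ms' -> 200; bare numbers are taken as ms,
    mimicking storage of the unadorned number in pg_class.reloptions."""
    m = re.fullmatch(r'\s*(\d+)\s*([a-z]*)\s*', value)
    if not m or (m.group(2) and m.group(2) not in TIME_UNITS_MS):
        raise ValueError(f'invalid time value: {value!r}')
    return int(m.group(1)) * TIME_UNITS_MS[m.group(2) or 'ms']
```

Storing the canonical number keeps older pg_dump output restorable, at the cost of losing the human-readable unit in the catalog, exactly the trade-off described above.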
Re: [HACKERS] Need Multixact Freezing Docs
On 08/28/2014 09:09 AM, Alvaro Herrera wrote: Josh Berkus wrote: On 04/16/2014 01:30 PM, Alvaro Herrera wrote: Josh Berkus wrote: You can see the current multixact value in pg_controldata output. Keep timestamped values of that somewhere (a table?) so that you can measure consumption rate. I don't think we provide SQL-level access to those values. Bleh. Do we provide SQL-level access in 9.4? If not, I think that's a requirement before release. Yeah, good idea. Want to propose a patch? Yeah, lemme dig into this. I really think we need it for 9.4, feature frozen or not. Got sidetracked by JSONB. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] implement subject alternative names support for SSL connections
On Mon, Aug 25, 2014 at 12:33 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 08/25/2014 01:07 PM, Andres Freund wrote: On 2014-08-25 13:02:50 +0300, Heikki Linnakangas wrote: But actually, I wonder if we should delegate the whole hostname matching to OpenSSL? There's a function called X509_check_host for that, although it's new in OpenSSL 1.1.0 so we'd need to add a configure test for that and keep the current code to handle older versions. Given that we're about to add support for other SSL implementations I'm not sure that that's a good idea. IIRC there exist quite a bit of different interpretations about what denotes a valid cert between the libraries. As long as just this patch is concerned, I agree it's easier to just implement it ourselves, but if we want to start implementing more complicated rules, then I'd rather not get into that business at all, and let the SSL library vendor deal with the bugs and CVEs. Sounds reasonable. I guess we'll go ahead with this patch for now, but keep this in mind if someone wants to complicate the rules further in the future. +1 -- Regards, Alexey Klyukin
Re: [HACKERS] Switch pg_basebackup to use -X stream instead of -X fetch by default?
On Wed, Aug 27, 2014 at 2:55 AM, Magnus Hagander mag...@hagander.net wrote: On Wed, Aug 27, 2014 at 5:16 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Aug 27, 2014 at 6:16 AM, Magnus Hagander mag...@hagander.net wrote: On Tue, Aug 26, 2014 at 10:46 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-26 16:41:44 -0400, Peter Eisentraut wrote: On 8/26/14 12:40 PM, Magnus Hagander wrote: I think the first reason is gone now, and the risk/damage of the two connections is probably smaller than running out of WAL. -x is a good default for smaller systems, but -X is a safer one for bigger ones. So I agree that changing the default mode would make sense. I would seriously consider just removing one of the modes. Having two modes is complex enough, and then having different defaults in different versions, and fuzzy recommendations like, it's better for smaller systems, it's quite confusing. Happy with removing the option and just accepting -X for backward compat. Works for me - this is really the cleaner way of doing it... We cannot use -X stream with tar output format mode. So I'm afraid that removing -X fetch would make people using tar output format feel disappointed. Or we should make -X stream work with tar mode. Ah, yes, I've actually had that on my TODO for some time. I think the easy way of doing that is to just create an xlog.tar file. Since we already create base.tar and possibly n*tablespace.tar, adding one more file shouldn't be a big problem, and would make such an implementation much easier. Would be trivial to do .tar.gz for it as well, just like for the others. Still, that seems like a pretty good reason not to rip the old mode out completely. Actually, I'd favor that anyway: changing the default in one release and removing the deprecated feature in a later release usually provides a smoother upgrade path. But if there are features that aren't even present in the newer mode yet, that's an even better reason. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On 08/28/2014 01:51 AM, rohtodeveloper wrote: Hi, all. I have a question about the data type timestamp with time zone. Why does timestamptz not store the value of the timezone passed to it? Consider the following example. postgres=# select '2014-08-28 14:30:30.423602+02'::timestamp with time zone; timestamptz --- 2014-08-28 20:30:30.423602+08 (1 row) The timezone of the output (+08) is different from the original input value (+02). It seems not to be good behavior. But the behavior of the data type time with time zone is correct. postgres=# select '14:30:30.423602+02'::time with time zone; timetz 14:30:30.423602+02 (1 row) If the current behavior of timestamptz is not suitable, is there any plan to correct the behavior of timestamptz or create a new data type which can store a timestamp with its timezone? *) manual, 8.5.1.3. Time Stamps: For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone. - This is actually more appropriate for the General mailing list. But... I have always considered timestamp with time zone to be a bad description of that data type but it appears to be a carryover from the specs. It is really a point in time with 2014-08-28 14:30:30.423602+02 and 2014-08-28 20:30:30.423602+08 merely being different representations of that same point in time. Time with time zone is a similarly bad name as it is really a time with offset from GMT. It should be noted that -08, +02 etc. are actually *offsets* from GMT and are not, technically, time-zones. 
A time-zone includes additional information about the dates on which that offset changes due to daylight saving schedules and politically imposed changes thereto. As the manual states, The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness. From the above, you can infer that one of those issues is that the offset changes based on the date but there is no date in a time with time zone field. Among the things you will discover is that '12:34:56-04' is legal input for time with time zone but '12:34:56 America/New_York' is not because you can't determine the offset without a date. Adding a date like '2014-08-28 12:34:56 America/New_York' will give you a time with offset or what the spec calls time with time zone (12:45:31.899075-04) though it really doesn't have any information about America/New_York. That the internal representation is in GMT is a curiosity but ultimately irrelevant as is it up to PostgreSQL to appropriately convert/display whatever it stores internally to the input and output format specified by the user. The varying values of things like day, month and year combined with constantly shifting definitions of time-zones make date and time handling, *um* interesting. Is the interval 1-day shorthand for 24-hours or the same time of day the following day (i.e. when crossing DST boundaries). What is the appropriate value of March 31 minus one month? February 29 plus one year? Read and experiment to understand the quirks and the design-decisions implemented in PostgreSQL (or other program). Cheers, Steve
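Steve's point that the '+02' and '+08' renderings denote the same instant can be demonstrated with Python's datetime, mirroring (in a simplified way) how timestamptz normalizes its input to UTC:

```python
from datetime import datetime, timedelta, timezone

# The two representations from the example query.
t_plus02 = datetime(2014, 8, 28, 14, 30, 30, 423602,
                    tzinfo=timezone(timedelta(hours=2)))
t_plus08 = datetime(2014, 8, 28, 20, 30, 30, 423602,
                    tzinfo=timezone(timedelta(hours=8)))

# Same point in time: both map to 12:30:30.423602 UTC, which is
# what timestamptz stores internally; only the display differs.
utc = t_plus02.astimezone(timezone.utc)
assert t_plus02 == t_plus08
assert utc.hour == 12 and utc.minute == 30
```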
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
Robert Haas wrote: Now, in the case where you are setting an overall limit, there is at least an argument to be made that you can determine the overall rate of autovacuum-induced I/O activity that the system can tolerate, and set your limits to stay within that budget, and then let the system decide how to divide that I/O up between workers. But if you're overriding a per-table limit, I don't really see how that holds any water. The system I/O budget doesn't go up just because one particular table is being vacuumed rather than any other. The only plausible use case for setting a per-table rate that I can see is when you actually want the system to use that exact rate for that particular table. Yeah, this makes sense to me too -- at least as long as you only have one such table. But if you happen to have more than one, and due to some bad luck they happen to be vacuumed concurrently, they will eat a larger share of your I/O bandwidth budget than you anticipated, which you might not like. Thus what I am saying is that those should be scaled down too to avoid peaks. Now, my proposal above mentioned subtracting the speed of tables under the limit, from the speed of those above the limit; maybe we can just rip that part out. Then we end up with the behavior you want, that is to have the fast table vacuum as fast as it is configured when it's the only fast table being vacuumed; and also with what I say, which is that if you have two of them, the two balance the I/O consumption (but only among themselves, not with the slow ones.) Since figuring out this subtraction is the only thing missing from the patch I posted, ISTM we could have something committable with very little extra effort if we agree on this. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing plpgsql.o Symbols on OS X
On Aug 27, 2014, at 9:53 PM, Ashesh Vashi ashesh.va...@enterprisedb.com wrote: Please add -arch x86_64 to your LD_FLAGS and CFLAGS in your make file. This made no difference: LDFLAGS = -arch x86_64 CFLAGS = -arch x86_64 Best, David
Re: [HACKERS] Missing plpgsql.o Symbols on OS X
On Aug 27, 2014, at 4:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, but plpgsql.so is mentioned nowhere on your command line. I'm not too sure about the dynamic-linking rules on OS X, but I'd not be surprised if you need to provide a reference to plpgsql.so in its final installed location (ie, a reference to it in the build tree may appear to link and then fail at runtime). Ah. Is there a recommended way to do that in a PGXS-powered Makefile? Thanks, David
Re: [HACKERS] postgresql latency bgwriter not doing its job
I tried that by setting:

vm.dirty_expire_centisecs = 100
vm.dirty_writeback_centisecs = 100

So it should start writing dirty buffers at most 2s after they are dirtied, if I understood the doc correctly, instead of at most 35s. The result is that with a 5000s 25tps pretty small load (the system can do 300tps with the default configuration), I lost 2091 (1.7%) of transactions, that is, they were beyond the 200ms schedule limit. Another change is that overall the lost transactions are more spread out than without this setting, although there are still stretches of unresponsiveness. So although the situation is significantly better, it is still far from good with the much reduced value I tried. What do the checkpoint logs look like now? (especially interested in sync times) Here is an extract:

LOG: checkpoint starting: xlog
LOG: checkpoint complete: wrote 4893 buffers (4.8%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=128.430 s, sync=0.378 s, total=128.921 s; sync files=11, longest=0.375 s, average=0.034 s
LOG: checkpoint starting: xlog
LOG: checkpoint complete: wrote 6200 buffers (6.1%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=128.934 s, sync=0.240 s, total=129.280 s; sync files=7, longest=0.132 s, average=0.034 s
LOG: checkpoint starting: xlog
LOG: checkpoint complete: wrote 6177 buffers (6.0%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=130.146 s, sync=0.322 s, total=130.592 s; sync files=7, longest=0.185 s, average=0.046 s
LOG: checkpoint starting: xlog
LOG: checkpoint complete: wrote 6185 buffers (6.0%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=132.673 s, sync=0.143 s, total=132.909 s; sync files=5, longest=0.078 s, average=0.028 s
LOG: checkpoint starting: xlog
LOG: checkpoint complete: wrote 6188 buffers (6.0%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=130.415 s, sync=0.170 s, total=130.676 s; sync files=5, longest=0.132 s, average=0.034 s
LOG: checkpoint starting: xlog
LOG: checkpoint complete: wrote 6203 buffers (6.1%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=131.726 s, sync=0.444 s, total=132.256 s; sync files=5, longest=0.441 s, average=0.088 s
...

Basically sync is between 0.1-0.5 seconds. I had one particularly bad stretch of unresponsiveness, which is not clearly related to a slow checkpoint sync:

progress: 4065.0 s, 28.7 tps, lat 11.886 ms stddev 48.470, lag 18.851 ms, 0 skipped
progress: 4066.2 s, 1.6 tps, lat 952.830 ms stddev 280.673, lag 155.310 ms, 25 skipped
progress: 4067.0 s, 2.7 tps, lat 1067.730 ms stddev 316.321, lag 171.766 ms, 14 skipped
progress: 4068.4 s, 1.4 tps, lat 1147.392 ms stddev 240.337, lag 132.297 ms, 35 skipped
progress: 4069.7 s, 2.3 tps, lat 1034.543 ms stddev 213.786, lag 154.453 ms, 35 skipped
progress: 4070.2 s, 1.8 tps, lat 562.715 ms stddev 0.000, lag 182.833 ms, 12 skipped
progress: 4071.3 s, 3.6 tps, lat 600.929 ms stddev 108.232, lag 162.723 ms, 25 skipped
progress: 4072.5 s, 1.7 tps, lat 1077.187 ms stddev 77.654, lag 168.849 ms, 31 skipped
progress: 4073.3 s, 1.3 tps, lat 1298.093 ms stddev 0.000, lag 168.882 ms, 21 skipped
progress: 4074.0 s, 2.7 tps, lat 920.704 ms stddev 183.587, lag 165.333 ms, 24 skipped
progress: 4075.3 s, 2.4 tps, lat 756.051 ms stddev 118.241, lag 176.863 ms, 29 skipped
progress: 4076.1 s, 1.3 tps, lat 1424.548 ms stddev 0.000, lag 0.000 ms, 17 skipped
progress: 4077.3 s, 2.4 tps, lat 791.090 ms stddev 338.729, lag 155.403 ms, 26 skipped
progress: 4078.1 s, 27.4 tps, lat 46.834 ms stddev 198.812, lag 3.915 ms, 0 skipped

-- Fabien
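When comparing many such extracts, the sync figures can be pulled out mechanically rather than by eye. A hypothetical helper (plain Python regex over the "checkpoint complete" line format shown above; the log wording is the only assumption):

```python
import re

# Matches the sync and longest fields of a 9.x "checkpoint complete" log line.
SYNC_RE = re.compile(r"sync=([\d.]+) s.*?longest=([\d.]+) s", re.S)

def sync_stats(log_text):
    """Return (sync, longest) pairs, in seconds, one per checkpoint."""
    return [(float(s), float(l)) for s, l in SYNC_RE.findall(log_text)]

sample = ("LOG: checkpoint complete: wrote 4893 buffers (4.8%); "
          "write=128.430 s, sync=0.378 s, total=128.921 s; "
          "sync files=11, longest=0.375 s, average=0.034 s")
stats = sync_stats(sample)
```

Fed the extract above, this confirms Fabien's summary: sync stays in the 0.1-0.5s range, so the stall is not obviously a checkpoint-sync storm.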
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
Steve Crawford scrawf...@pinpointresearch.com wrote: I have always considered timestamp with time zone to be a bad description of that data type but it appears to be a carryover from the specs. It is really a point in time I agree. While what timestamptz implements is a very useful data type, I think it was a very unfortunate decision to implement that for the standard type name, instead of something more consistent with the spec. It seems very unlikely to change, though, because so much existing production code would break. :-( Understandably, people do tend to expect that saving something into a column defined as TIMESTAMP WITH TIME ZONE will save a time zone with the timestamp, and in PostgreSQL it does not. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
2014-08-28 20:26 GMT+02:00 Kevin Grittner kgri...@ymail.com: Steve Crawford scrawf...@pinpointresearch.com wrote: I have always considered timestamp with time zone to be a bad description of that data type but it appears to be a carryover from the specs. It is really a point in time I agree. While what timestamptz implements is a very useful data type, I think it was a very unfortunate decision to implement that for the standard type name, instead of something more consistent with the spec. It seems very unlikely to change, though, because so much existing production code would break. :-( Understandably, people do tend to expect that saving something into a column defined as TIMESTAMP WITH TIME ZONE will save a time zone with the timestamp, and in PostgreSQL it does not. Yes, it is strange at first, and it is difficult for beginners - but it works well once you switch to a different mode of thinking. But could we implement a time zone as a special type? That, together with examples and documentation, could better explain what it does. Pavel -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Similar to csvlog but not really, json logs?
On 08/27/2014 09:53 AM, Andres Freund wrote: Perhaps instead of doing this in-core it would be better to make log handling more extensible? I'm thinking add a specific binary format and an external tool that can parse that and do whatever the user wants with it. That means we don't have to keep adding more complexity to the internal log handling (which already has the risk of being a bottleneck), while allowing maximum user flexibility. There's a logging hook. Most of this should be doable from there. Are there any docs at all on the logging hooks? I couldn't find any. Maybe that's why people keep trying to reinvent them. The main reason I personally would like to have JSON logs has more to do with formatting and extracting data. For example, if you have a prepared statement and log_min_duration_statement turned on, you get something like this in the message and details fields: duration: 8.253 ms execute unnamed: SELECT login FROM users WHERE id = $1, parameters: $1 = '90700' ... and then for various analytics, like pgBadger and troubleshooting, you have to break out the parts of those fields by regex and split, which is error-prone; in our query replay tool, I have at least a dozen commits tweaking the regexes because of some special case I didn't account for (like an array parameter). It would be vastly easier to work with the above as JSON: ... "message": { "duration": 8.253, "command": "execute", "statement_name": "unnamed", "statement": "SELECT login FROM users WHERE id = $1" }, "details": { "parameters": { "$1": "90700" } } ... This would allow me, or Dalibo, to remove literally dozens of lines of error-prone regexing code. That fix would, IMHO, make it worth enabling JSON logging as a logging hook or something similar. If we're just going to convert the CSV to JSON, with the existing fields? Waste of time, I can do that with a 5-line script in at least 3 different languages. -- Josh Berkus PostgreSQL Experts Inc.
http://pgexperts.com
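The regex gymnastics Josh describes look roughly like this; a hypothetical Python sketch (not pgBadger's or anyone's actual code - the only inputs assumed are the message and detail strings quoted above):

```python
import json
import re

# Fragile by design: every special case (array parameters, embedded quotes,
# COPY commands, ...) needs yet another tweak to these patterns.
MSG_RE = re.compile(r"duration: ([\d.]+) ms\s+(\w+) (\S+): (.*)", re.S)
PARAM_RE = re.compile(r"(\$\d+) = '([^']*)'")

def to_json(message, detail):
    """Split a log_min_duration_statement message/detail pair into JSON."""
    m = MSG_RE.match(message)
    if not m:
        return None
    duration, command, name, statement = m.groups()
    return json.dumps({
        "message": {
            "duration": float(duration),
            "command": command,
            "statement_name": name,
            "statement": statement,
        },
        "details": {"parameters": dict(PARAM_RE.findall(detail))},
    })

doc = to_json("duration: 8.253 ms execute unnamed: SELECT login FROM users WHERE id = $1",
              "parameters: $1 = '90700'")
```

If the server emitted the JSON directly, all of the regex layer - and the commits maintaining it - would disappear.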
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Aug 28, 2014 at 7:29 AM, Andreas Karlsson andr...@proxel.se wrote: Personally I would find it surprising if RETURNING did not also return the updated tuples. In many use cases for upsert the user does not care if the row was new or not. I'm not attached to that particular behavior, but it does seem kind of similar to the behavior of BEFORE triggers, where a NULL return value (do nothing) will also cause RETURNING to not project the tuple. -- Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 08/28/2014 09:05 PM, Peter Geoghegan wrote: On Thu, Aug 28, 2014 at 7:29 AM, Andreas Karlsson andr...@proxel.se wrote: Personally I would find it surprising if RETURNING did not also return the updated tuples. In many use cases for upsert the user does not care if the row was new or not. I'm not attached to that particular behavior, but it does seem kind of similar to the behavior of BEFORE triggers, where a NULL return value (do nothing) will also cause RETURNING to not project the tuple. I see. So we have three cases where we may or may not want to project a tuple.

1) The tuple was inserted
2) We got a conflict and updated the tuple
3) We got a conflict but skipped updating the tuple

My personal intuition was that (1) and (2) would be returned but not (3). But I am not sure if that is the most useful behavior. -- Andreas Karlsson
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Thu, Aug 28, 2014 at 1:36 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Robert Haas wrote: Now, in the case where you are setting an overall limit, there is at least an argument to be made that you can determine the overall rate of autovacuum-induced I/O activity that the system can tolerate, and set your limits to stay within that budget, and then let the system decide how to divide that I/O up between workers. But if you're overriding a per-table limit, I don't really see how that holds any water. The system I/O budget doesn't go up just because one particular table is being vacuumed rather than any other. The only plausible use case for setting a per-table rate that I can see is when you actually want the system to use that exact rate for that particular table. Yeah, this makes sense to me too -- at least as long as you only have one such table. But if you happen to have more than one, and due to some bad luck they happen to be vacuumed concurrently, they will eat a larger share of your I/O bandwidth budget than you anticipated, which you might not like. I agree that you might not like that. But you might not like having the table vacuumed slower than the configured rate, either. My impression is that the time between vacuums isn't really all that negotiable for some people. I had one customer who had horrible bloat issues on a table that was vacuumed every minute; when we changed the configuration so that it was vacuumed every 15 seconds, those problems went away. Now that is obviously more a matter for autovacuum_naptime than this option, but the point seems general to me: if you need the table vacuumed every N seconds, minutes, or hours, and it only gets vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there are other autovacuum workers running, the table is going to bloat. That *might* be better than busting your I/O budget, but it might also be (and I think frequently is) much worse. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On Thu, Aug 28, 2014 at 11:26:53AM -0700, Kevin Grittner wrote: Steve Crawford scrawf...@pinpointresearch.com wrote: I have always considered timestamp with time zone to be a bad description of that data type but it appears to be a carryover from the specs. It is really a point in time I agree. While what timestamptz implements is a very useful data type, I think it was a very unfortunate decision to implement that for the standard type name, instead of something more consistent with the spec. It seems very unlikely to change, though, because so much existing production code would break. :-( Understandably, people do tend to expect that saving something into a column defined as TIMESTAMP WITH TIME ZONE will save a time zone with the timestamp, and in PostgreSQL it does not. So the standard requires storing the original timezone in the data type? I was not aware of that. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On Thu, Aug 28, 2014 at 03:33:56PM -0400, Bruce Momjian wrote: On Thu, Aug 28, 2014 at 11:26:53AM -0700, Kevin Grittner wrote: Steve Crawford scrawf...@pinpointresearch.com wrote: I have always considered timestamp with time zone to be a bad description of that data type but it appears to be a carryover from the specs. It is really a point in time I agree. While what timestamptz implements is a very useful data type, I think it was a very unfortunate decision to implement that for the standard type name, instead of something more consistent with the spec. It seems very unlikely to change, though, because so much existing production code would break. :-( Understandably, people do tend to expect that saving something into a column defined as TIMESTAMP WITH TIME ZONE will save a time zone with the timestamp, and in PostgreSQL it does not. So the standard requires storing of original timezone in the data type? I was not aware of that. I do not have a copy of the SQL 92 spec, but several references to the spec mention that it defined the time zone as a format SHH:MM where S represents the sign (+ or -), which seems to be what PostgreSQL uses. Regards, Ken
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
k...@rice.edu k...@rice.edu writes: On Thu, Aug 28, 2014 at 03:33:56PM -0400, Bruce Momjian wrote: So the standard requires storing of original timezone in the data type? I was not aware of that. I do not have a copy of the SQL 92 spec, but several references to the spec mention that it defined the time zone as a format SHH:MM where S represents the sign (+ or -), which seems to be what PostgreSQL uses. Yeah, the spec envisions timezone as being a separate numeric field (ie, a numeric GMT offset) within a timestamp with time zone. One of the ways in which the spec's design is rather broken is that there's no concept of real-world time zones with varying DST rules. Anyway, I agree with the upthread comments that it'd have been better if we'd used some other name for this datatype, and also that it's at least ten years too late to revisit the choice :-(. regards, tom lane
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
Robert Haas wrote: I agree that you might not like that. But you might not like having the table vacuumed slower than the configured rate, either. My impression is that the time between vacuums isn't really all that negotiable for some people. I had one customer who had horrible bloat issues on a table that was vacuumed every minute; when we changed the configuration so that it was vacuumed every 15 seconds, those problems went away. Wow, that's extreme. For that case you can set autovacuum_vacuum_cost_limit to 0, which disables the whole thing and lets vacuum run at full speed -- no throttling at all. Would that satisfy the concern? Now that is obviously more a matter for autovacuum_naptime than this option, but the point seems general to me: if you need the table vacuumed every N seconds, minutes, or hours, and it only gets vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there are other autovacuum workers running, the table is going to bloat. There might be another problem here which is that if you have all your workers busy because they are vacuuming large tables that don't have churn high enough to warrant disrupting the whole environment (thus low cost_limit), then the table will bloat no matter what you set its cost limit to. So there's not only a matter of a low enough naptime (which is a bad thing for the rest of the system, also) but also one of something similar to priority inversion; should you speed up the vacuuming of those large tables so that one worker is freed soon enough to get to the high-churn table? Was the solution for that customer to set an external tool running vacuum on that table? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
k...@rice.edu k...@rice.edu wrote: On Thu, Aug 28, 2014 at 03:33:56PM -0400, Bruce Momjian wrote: So the standard requires storing of original timezone in the data type? I was not aware of that. I do not have a copy of the SQL 92 spec, but several references to the spec mention that it defined the time zone as a format SHH:MM where S represents the sign (+ or -), which seems to be what PostgreSQL uses. I just took a quick look at the spec to refresh my memory, and it seems to require that the WITH TIME ZONE types store UTC (I suppose for fast comparisons), it requires the time zone in the form of a hour:minute offset to be stored with it, so you can determine the local time from which it was derived. I concede that this is not usually useful, and am glad we have a type that behaves as timestamptz does; but occasionally a type that behaves in conformance with the spec would be useful, and it would certainly be less confusing for people who are used to the standard behavior. Basically, both store a moment in time in UTC, and display it with offset in hours and minutes; but the standard says it should show you that moment from the perspective of whoever saved it unless you ask for it in a different time zone, while PostgreSQL always shows it to you from the perspective of your client connection's time zone. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
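The spec's storage model Kevin describes - normalize to UTC but keep the originating hour:minute offset alongside - is easy to sketch. A hypothetical Python illustration of the semantics only (none of this is PostgreSQL code):

```python
from datetime import datetime, timedelta, timezone

def store(ts):
    """Keep the instant in UTC plus the input's offset, in minutes,
    as the SQL-spec WITH TIME ZONE model requires."""
    return ts.astimezone(timezone.utc), int(ts.utcoffset().total_seconds() // 60)

def display(utc_ts, offset_min):
    """Show the instant from the perspective of whoever saved it."""
    return utc_ts.astimezone(timezone(timedelta(minutes=offset_min)))

# The example from upthread: '2014-08-28 14:30:30+02'
saved = datetime(2014, 8, 28, 14, 30, 30, tzinfo=timezone(timedelta(hours=2)))
utc, off = store(saved)    # stored as 12:30:30 UTC with offset +120 minutes
shown = display(utc, off)  # redisplayed as 14:30:30+02, as originally entered
```

Both models store the same UTC instant; the spec model merely remembers which offset to use at display time, whereas PostgreSQL substitutes the client's TimeZone setting.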
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
Kevin Grittner wrote: I just took a quick look at the spec to refresh my memory, and it seems to require that the WITH TIME ZONE types store UTC (I suppose for fast comparisons), it requires the time zone in the form of a hour:minute offset to be stored with it, so you can determine the local time from which it was derived. I concede that this is not usually useful, and am glad we have a type that behaves as timestamptz does; but occasionally a type that behaves in conformance with the spec would be useful, and it would certainly be less confusing for people who are used to the standard behavior. I remember we tried to implement this some years ago (IIRC alongside Alexey Klyukin who might remember more details). I couldn't find the thread, but one of the first problems we encountered was that we wanted to avoid storing the text name of the timezone on each datum; we had the idea of creating a catalog to attach an OID to each timezone, but that turned very quickly into a horrid mess and we discarded the idea. (For instance: if a new timezone is added in a new tzdata release, it needs to be added to the catalog, but how do you do that in minor releases?) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
Alvaro Herrera alvhe...@2ndquadrant.com wrote: Kevin Grittner wrote: I just took a quick look at the spec to refresh my memory, and it seems to require that the WITH TIME ZONE types store UTC (I suppose for fast comparisons), it requires the time zone in the form of a hour:minute offset to be stored with it, so you can determine the local time from which it was derived. I concede that this is not usually useful, and am glad we have a type that behaves as timestamptz does; but occasionally a type that behaves in conformance with the spec would be useful, and it would certainly be less confusing for people who are used to the standard behavior. I remember we tried to implement this some years ago (IIRC alongside Alexey Klyukin who might remember more details). I couldn't find the thread, but one of the first problems we encountered was that we wanted to avoid storing the text name of the timezone on each datum; we had the idea of creating a catalog to attach an OID to each timezone, but that turned very quickly into a horrid mess and we discarded the idea. (For instance: if a new timezone is added in a new tzdata release, it needs to be added to the catalog, but how do you do that in minor releases?) But the standard doesn't say anything about storing a time zone *name* or *abbreviation* -- it requires that it be stored as UTC with the *offset* (in hours and minutes). That makes it pretty close to what we have -- it's all about a difference in presentation. And as far as I can see it completely dodges the kinds of problems you're talking about. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
Kevin Grittner wrote: But the standard doesn't say anything about storing a time zone *name* or *abbreviation* -- it requires that it be stored as UTC with the *offset* (in hours and minutes). That makes it pretty close to what we have -- it's all about a difference in presentation. And as far as I can see it completely dodges the kinds of problems you're talking about. Yeah, it does, but is it useful? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Multithreaded SIGPIPE race in libpq on Solaris
Hello, A while back someone showed me a program blocking in libpq 9.2 on Solaris 11, inside sigwait called by pq_reset_sigpipe. It had happened a couple of times before during a period of instability/crashing with a particular DB (a commercial PostgreSQL derivative, but the client was using regular libpq). This was a very busy multithreaded client where each thread had its own connection. My theory is that if two connections accessed by different threads get shut down around the same time, there is a race scenario where each of them fails to write to its socket, sees errno == EPIPE and then sees a pending SIGPIPE with sigpending(), but only one thread returns from sigwait() due to signal merging. We never saw the problem again after we made the following change:

--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -450,7 +450,6 @@ void
 pq_reset_sigpipe(sigset_t *osigset, bool sigpipe_pending, bool got_epipe)
 {
 	int			save_errno = SOCK_ERRNO;
-	int			signo;
 	sigset_t	sigset;
 
 	/* Clear SIGPIPE only if none was pending */
@@ -460,11 +459,13 @@ pq_reset_sigpipe(sigset_t *osigset, bool sigpipe_pending, bool got_epipe)
 		sigismember(&sigset, SIGPIPE))
 	{
 		sigset_t	sigpipe_sigset;
+		siginfo_t	siginfo;
+		struct timespec timeout = { 0, 0 };
 
 		sigemptyset(&sigpipe_sigset);
 		sigaddset(&sigpipe_sigset, SIGPIPE);
 
-		sigwait(&sigpipe_sigset, &signo);
+		sigtimedwait(&sigpipe_sigset, &siginfo, &timeout);
 	}
 }

Does this make any sense? Best regards, Thomas Munro
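The key property of the fix is that sigtimedwait with a zero timeout can consume a pending SIGPIPE but can never block when the signal has already been merged away by another thread. A small Linux-only demonstration of that syscall behavior via Python's signal module (this only illustrates the semantics; it is not the libpq code):

```python
import os
import signal

# Restore default disposition and block SIGPIPE so a generated signal
# stays pending instead of being delivered.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGPIPE})

os.kill(os.getpid(), signal.SIGPIPE)  # queue exactly one pending SIGPIPE

# First call consumes the pending signal without blocking...
info = signal.sigtimedwait({signal.SIGPIPE}, 0)
# ...and a second call returns None immediately instead of hanging,
# which is where a bare sigwait() would get stuck forever.
again = signal.sigtimedwait({signal.SIGPIPE}, 0)
```

In the race Thomas describes, the losing thread is in exactly the position of the second call: sigpending() said a SIGPIPE was there, but the other thread's sigwait() consumed the merged signal first.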
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Thu, Aug 28, 2014 at 4:56 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Robert Haas wrote: I agree that you might not like that. But you might not like having the table vacuumed slower than the configured rate, either. My impression is that the time between vacuums isn't really all that negotiable for some people. I had one customer who had horrible bloat issues on a table that was vacuumed every minute; when we changed the configuration so that it was vacuumed every 15 seconds, those problems went away. Wow, that's extreme. For that case you can set autovacuum_vacuum_cost_limit to 0, which disables the whole thing and lets vacuum run at full speed -- no throttling at all. Would that satisfy the concern? Well, maybe, if you want to run completely unthrottled. But I have no evidence that's a common desire. Now that is obviously more a matter for autovacuum_naptime than this option, but the point seems general to me: if you need the table vacuumed every N seconds, minutes, or hours, and it only gets vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there are other autovacuum workers running, the table is going to bloat. There might be another problem here which is that if you have all your workers busy because they are vacuuming large tables that don't have churn high enough to warrant disrupting the whole environment (thus low cost_limit), then the table will bloat no matter what you set its cost limit to. So there's not only a matter of a low enough naptime (which is a bad thing for the rest of the system, also) but also one of something similar to priority inversion; should you speed up the vacuuming of those large tables so that one worker is freed soon enough to get to the high-churn table? I don't think so. I continue to believe that the we need to provide the user with the tools to be certain that table X will get vacuumed at least every Y seconds/minutes/hours. 
To me, allowing the user to set a rate that the system will not adjust or manipulate in any way makes this a lot easier than anything else we might do. Was the solution for that customer to set an external tool running vacuum on that table? Nope, we just changed settings. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.5: Memory-bounded HashAgg
On 26.8.2014 21:38, Jeff Davis wrote: On Tue, 2014-08-26 at 12:39 +0300, Heikki Linnakangas wrote: I think this is enough for this commitfest - we have consensus on the design. For the next one, please address those open items, and resubmit. Agreed, return with feedback. I need to get the accounting patch in first, which needs to address some performance issues, but there's a chance of wrapping those up quickly. Sounds good to me. I'd like to coordinate our efforts on this a bit, if you're interested. I've been working on the hashjoin-like batching approach PoC (because I proposed it, so it's fair I do the work), and I came to the conclusion that it's pretty much impossible to implement on top of dynahash. I ended up replacing it with a hashtable (similar to the one in the hashjoin patch, unsurprisingly), which supports the batching approach well, and is more memory efficient and actually faster (I see ~25% speedup in most cases, although YMMV). I plan to address this in 4 patches: (1) replacement of dynahash by the custom hash table (done) (2) memory accounting (not sure what's your plan, I've used the approach I proposed on 23/8 for now, with a few bugfixes/cleanups) (3) applying your HashWork patch on top of this (I have this mostly completed, but need to do more testing over the weekend) (4) extending this with the batching I proposed, initially only for aggregates with states that we can serialize/deserialize easily (e.g. types passed by value) - I'd like to hack on this next week So at this point I have (1) and (2) pretty much ready, (3) is almost complete and I plan to start hacking on (4). Also, this does not address the open items listed in your message. But I agree this is more complex than the patch you proposed. So if you choose to pursue your patch, I have no problem with that - I'll then rebase my changes on top of your patch and submit them separately. 
regards Tomas
Re: [HACKERS] Function to know last log write timestamp
On Thu, Aug 28, 2014 at 3:34 AM, Fujii Masao masao.fu...@gmail.com wrote: Theoretically it's not safe without a barrier on a machine with weak memory ordering. No? Why not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
Alvaro Herrera alvhe...@2ndquadrant.com wrote: Kevin Grittner wrote: But the standard doesn't say anything about storing a time zone *name* or *abbreviation* -- it requires that it be stored as UTC with the *offset* (in hours and minutes). That makes it pretty close to what we have -- it's all about a difference in presentation. And as far as I can see it completely dodges the kinds of problems you're talking about. Yeah, it does, but is it useful? More so than CHAR(n). It would have been beneficial to support it for the same reason. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On 08/28/2014 02:25 PM, Kevin Grittner wrote: But the standard doesn't say anything about storing a time zone *name* or *abbreviation* -- it requires that it be stored as UTC with the *offset* (in hours and minutes). That makes it pretty close to what we have -- it's all about a difference in presentation. And as far as I can see it completely dodges the kinds of problems you're talking about. Except that an offset is not a timezone. This is why the spec behavior was always academic crippleware, and why we abandoned it back in ~7.2. It does me no good at all to know that a timestamp is offset -07:00: that could be Mountain Time, Arizona Time, or Navajo Nation time, all of which will behave differently when I add 2 months to them. Unless the only goal is to be compatible with some other DBMS, in which case ... build an extension. On the other hand, I take partial responsibility for the mess which is our data type naming. What we call timestamptz should just be timestamp, and whether or not it converts to local timezone on retrieval should be a GUC setting. And the type we call timestamp shouldn't exist. Hindsight is 20/20. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
Kevin Grittner kgri...@ymail.com writes: But the standard doesn't say anything about storing a time zone *name* or *abbreviation* -- it requires that it be stored as UTC with the *offset* (in hours and minutes). That makes it pretty close to what we have -- it's all about a difference in presentation. And as far as I can see it completely dodges the kinds of problems you're talking about. However, the added field creates its own semantic problems. As an example, is 2014-08-28 18:00:00-04 the same as or different from 2014-08-28 17:00:00-05? If they're different, which one is less? If they're the same, what's the point of storing the extra field? And do you really like equal values that behave differently, not only for I/O but for operations such as EXTRACT()? (I imagine the SQL spec gives a ruling on this issue, which I'm too lazy to look up; my point is that whatever they did, it will be the wrong thing for some use-cases.) regards, tom lane
Re: [HACKERS] Multithreaded SIGPIPE race in libpq on Solaris
Thomas Munro mu...@ip9.org writes: My theory is that if two connections accessed by different threads get shut down around the same time, there is a race scenario where each of them fails to write to its socket, sees errno == EPIPE and then sees a pending SIGPIPE with sigpending(), but only one thread returns from sigwait() due to signal merging. Hm, that does sound like it could be a problem, if the platform fails to track pending SIGPIPE on a per-thread basis. We never saw the problem again after we made the following change: ... Does this make any sense? I don't think that patch would fix the problem if it's real. It would prevent libpq from hanging up when it's trying to throw away a pending SIGPIPE, but the fundamental issue is that that action could cause a SIGPIPE that's meant for some other thread to get lost; and that other thread isn't necessarily doing anything with libpq. I don't claim to be an expert on this stuff, but I had the idea that multithreaded environments were supposed to track signal state per-thread not just per-process, precisely because of issues like this. regards, tom lane
Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.
Vik Fearing wrote: Here are two patches for this. The first one, reindex_user_tables.v1.patch, implements the variant that only hits user tables, as suggested by you. The second one, reindex_no_dbname.v1.patch, allows the three database-wide variants to omit the database name (voted for by Daniel Migowski, Bruce, and myself; voted against by you). This patch is to be applied on top of the first one. Not a fan. Here's a revised version that provides REINDEX USER TABLES, which can only be used without a database name; other modes are not affected i.e. they continue to require a database name. I also renamed your proposed reindexdb's --usertables to --user-tables. Oh, I just noticed that if you say reindexdb --all --user-tables, the latter is not honored. Must fix before commit. Makes sense? Note: I don't like the reindexdb UI; if you just run reindexdb -d foobar it will reindex everything, including system catalogs. I think USER TABLES should be the default operation mode for reindex. If you want plain old REINDEX DATABASE foobar which also hits the catalogs, you should request that separately (how?). This patch doesn't change this. Also note: if you say user tables, information_schema is reindexed too, which kinda sucks. Further note: this command is probably pointless in the majority of cases. Somebody should spend some serious time with REINDEX CONCURRENTLY .. 
-- Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index cabae19..d05e1ac 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
 <refsynopsisdiv>
 <synopsis>
 REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX USER TABLES
 </synopsis>
 </refsynopsisdiv>
@@ -126,14 +127,26 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
   </varlistentry>

   <varlistentry>
+   <term><literal>USER TABLES</literal></term>
+   <listitem>
+    <para>
+     Recreate all indexes on user tables within the current database.
+     Indexes on system catalogs are not processed.
+     This form of <command>REINDEX</command> cannot be executed inside a
+     transaction block.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
    <term><replaceable class="PARAMETER">name</replaceable></term>
    <listitem>
     <para>
      The name of the specific index, table, or database to be reindexed.
      Index and table names can be schema-qualified.
-     Presently, <command>REINDEX DATABASE</> and <command>REINDEX SYSTEM</>
-     can only reindex the current database, so their parameter must match
-     the current database's name.
+     Presently, <command>REINDEX DATABASE</>, <command>REINDEX SYSTEM</>,
+     and <command>REINDEX USER TABLES</> can only reindex the current
+     database, so their parameter must match the current database's name.
    </para>
   </listitem>
  </varlistentry>
diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index 486f5c9..f69d84b 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -65,6 +65,15 @@ PostgreSQL documentation
   </group>
   <arg choice="opt"><replaceable>dbname</replaceable></arg>
  </cmdsynopsis>
+
+ <cmdsynopsis>
+  <command>reindexdb</command>
+  <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+  <group choice="plain">
+   <arg choice="plain"><option>--user-tables</option></arg>
+   <arg choice="plain"><option>-u</option></arg>
+  </group>
+ </cmdsynopsis>
 </refsynopsisdiv>
@@ -173,6 +182,16 @@ PostgreSQL documentation
    </listitem>
   </varlistentry>

+  <varlistentry>
+   <term><option>-u</></term>
+   <term><option>--user-tables</></term>
+   <listitem>
+    <para>
+     Reindex database's user tables.
+    </para>
+   </listitem>
+  </varlistentry>
+
   <varlistentry>
    <term><option>-V</></term>
    <term><option>--version</></term>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index fdfa6ca..23e13f0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1772,6 +1772,9 @@ ReindexTable(RangeVar *relation)
  * To reduce the probability of deadlocks, each table is reindexed in a
  * separate transaction, so we can release the lock on it right away.
  * That means this must not be called within a user transaction block!
+ *
+ * databaseName can be NULL when do_user is set and do_system isn't; this
+ * is the REINDEX USER TABLES case.
  */
 Oid
 ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
@@ -1784,9 +1787,10 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
 	List	   *relids = NIL;
 	ListCell   *l;

-	AssertArg(databaseName);
+	AssertArg(databaseName || (do_user && !do_system));

-	if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
+	if (databaseName &&
+
Re: [HACKERS] Multithreaded SIGPIPE race in libpq on Solaris
On 28 August 2014 23:45, Tom Lane t...@sss.pgh.pa.us wrote: I don't claim to be an expert on this stuff, but I had the idea that multithreaded environments were supposed to track signal state per-thread not just per-process, precisely because of issues like this. After some googling, I found reply #3 in https://community.oracle.com/thread/1950900?start=0&tstart=0 and various other sources which say that in Solaris versions >= 10 they changed SIGPIPE delivery from per process (as specified by UNIX98) to per thread (as specified by POSIX:2001). But we are on version 11, so my theory doesn't look great. (Though 9 is probably still in use out there somewhere...) I also found this article: http://krokisplace.blogspot.co.uk/2010/02/suppressing-sigpipe-in-library.html The author recommends an approach nearly identical to the PostgreSQL approach, except s/he says: to do this we use sigtimedwait() with zero timeout; this is to avoid blocking in a scenario where malicious user sent SIGPIPE manually to a whole process: in this case we will see it pending, but other thread may handle it before we had a [chance] to wait for it. Maybe we have malicious users sending signals to processes. It does seem more likely the crashing database triggered this somehow though, perhaps in combination with something else the client app was doing, though I can't think what it could be that would eat another thread's SIGPIPE in between the sigpending and sigwait syscalls. Best regards, Thomas Munro
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On Thu, Aug 28, 2014 at 03:25:49PM -0700, Josh Berkus wrote: On 08/28/2014 02:25 PM, Kevin Grittner wrote: But the standard doesn't say anything about storing a time zone *name* or *abbreviation* -- it requires that it be stored as UTC with the *offset* (in hours and minutes). That makes it pretty close to what we have -- it's all about a difference in presentation. And as far as I can see it completely dodges the kinds of problems you're talking about. Except that an offset is not a timezone. This is why the spec behavior was always academic crippleware, and why we abandoned it back in ~7.2. It does me no good at all to know that a timestamp is offset -07:00: that could be Mountain Time, Arizona Time, or Navajo Nation time, all of which will behave differently when I add 2 months to them. Unless the only goal is to be compatible with some other DBMS, in which case ... build an extension. On the other hand, I take partial responsibility for the mess which is our data type naming. What we call timestamptz should just be timestamp, and whether or not it converts to local timezone on retrieval should be a GUC setting. And the type we call timestamp shouldn't exist. Hindsight is 20/20. Well, the standard TIMESTAMP requires WITHOUT TIME ZONE, so I don't know how you would be standards-compliant without it. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
[HACKERS] PATCH: Allow distdir to be overridden on make command line
Not just a one line patch, a one character patch. Use ?= instead of = in the distdir assignment, so it can be overridden on the command line when building dist tarballs with patches. Yes, you can just modify GNUmakefile.in, but that's extra noise in a diff, adds merge conflicts, etc. Please apply. Surely this is harmless? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

From 2a115f7b62dbe3f98ef2c011f3283a41af86fd15 Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Fri, 29 Aug 2014 10:00:40 +0800
Subject: [PATCH] Allow distdir to be overridden on the make command line

This is useful for preparing dist tarballs for patches with useful
suffixes, such as:

    distdir='postgresql-$(VERSION)'-git$(git rev-parse --short HEAD)

---
 GNUmakefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 69e0824..b469e3a 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -76,7 +76,7 @@ GNUmakefile: GNUmakefile.in $(top_builddir)/config.status
 ##
 
-distdir = postgresql-$(VERSION)
+distdir ?= postgresql-$(VERSION)
 dummy = =install=
 garbage = =*  #*  .#*  *~*  *.orig  *.rej  core  postgresql-*
-- 
1.9.3
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On 08/29/2014 04:59 AM, Kevin Grittner wrote: I just took a quick look at the spec to refresh my memory, and it seems to require that the WITH TIME ZONE types store UTC (I suppose for fast comparisons), it requires the time zone in the form of an hour:minute offset to be stored with it, so you can determine the local time from which it was derived. I concede that this is not usually useful, and am glad we have a type that behaves as timestamptz does; but occasionally a type that behaves in conformance with the spec would be useful, and it would certainly be less confusing for people who are used to the standard behavior. FWIW, MS SQL's DateTimeOffset data type: http://msdn.microsoft.com/en-AU/library/bb630289.aspx is much more like what I, when I was getting started, expected TIMESTAMP WITH TIME ZONE to be. We don't really have anything equivalent in PostgreSQL. The PostgreSQL implementation merits some highlighted clear explanation in the documentation, explaining the concept of a point in absolute time (the first person to mention relativity gets smacked ... oh, darn) vs a wall-clock value in local time. It should also discuss the approach of storing an (instant timestamptz, timezone text) or (instant timestamptz, tzoffset smallint) tuple for when unambiguous representation is required. (I guess I just volunteered myself to write a draft of that). BTW, it might be interesting to have a validated 'timezone' data type that can store time zone names or offsets, for use in conjunction with timestamptz to store a (timestamptz, timezone) tuple. Though also complicated - whether 'EST' is Australian or USA Eastern time is GUC-dependent, and it can't just be expanded into Australia/Sydney at input time because EST is always +1000 while Australia/Sydney could also be EDT +1100. I hate time zones. It'd probably have to expand abbrevs to their UTC offsets at input time.
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] possible optimization: push down aggregates
On 08/28/2014 03:46 AM, Claudio Freire wrote: You can't with mean and stddev, only with associative aggregates. That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count. You could with a new helper function to merge the temporary states for each scan though. In the case of mean, for example, it'd just mean adding the counts and sums. However, I'm not sure how interesting that is without the ability to execute the subplans in parallel. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
Haribabu Kommi kommi.harib...@gmail.com writes: Thanks for your review. Please find the rebased patch to latest HEAD. Committed with minor (mostly cosmetic) alterations. regards, tom lane
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan p...@heroku.com wrote: There are some restrictions on what this auxiliary update may do, but FWIW there are considerably fewer than those that the equivalent MySQL or SQLite feature imposes on their users. I realized that I missed a few cases here. For one thing, the posted patch fails to arrange for the UPDATE post-parse-analysis tree representation to go through the rewriter stage (on the theory that user-defined rules shouldn't be able to separately affect the auxiliary UPDATE query tree), but rewriting is at least necessary so that rewriteTargetListIU() can expand a SET val = DEFAULT targetlist, as well as normalize the ordering of the UPDATE's tlist. Separately, the patch fails to defend against certain queries that ought to be disallowed, where a subselect is specified with a subquery expression in the auxiliary UPDATE's WHERE clause. These are garden-variety bugs that aren't likely to affect the kind of high-level design discussion that I'm looking for here. I'll post a fixed version in a few days time. -- Peter Geoghegan
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
(2014/08/13 12:40), Etsuro Fujita wrote: (2014/08/12 18:34), Shigeru Hanada wrote: Issues addressed by Eitoku-san were fixed properly, but he found a bug and a possible enhancement in the v2 patch. * push-down check misses delete triggers update_is_pushdown_safe() seems to have a bug that it misses the existence of row-level delete trigger. DELETE statement executed against a foreign table which has row-level delete trigger is pushed down to remote, and consequently no row-level delete trigger is fired. Ah, I noticed that the current code for that is not correct. Will fix. Done. * further optimization Is there any chance to consider further optimization by passing the operation type (UPDATE|DELETE) of undergoing statement to update_is_pushdown_safe()? It seems safe to push down UPDATE statement when the target foreign table has no update trigger even it has a delete trigger (of course the opposite combination would be also fine). Good idea! Will improve that too. Done. * Documentation The requirement of pushing down UPDATE/DELETE statements would not be easy to understand for non-expert users, so it seems that there is a room to enhance documentation. An idea is to define which expression is safe to send to remote first (it might need to mention the difference of semantics), and refer the definition from the place describing the requirement of pushing-down for SELECT, UPDATE and DELETE. Yeah, I also think that it would not necessarily easy for the users to understand which expression is safe to send. So I agree with that enhancement, but ISTM that it would be better to do that as a separate patch. As above, I'd like to leave this as another patch. Please find attached the updated version of the patch. 
Thanks, Best regards, Etsuro Fujita

*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 188,197 ****
  is_foreign_expr(PlannerInfo *root,
  	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
  		return false;

- 	/* Expressions examined here should be boolean, ie noncollatable */
- 	Assert(loc_cxt.collation == InvalidOid);
- 	Assert(loc_cxt.state == FDW_COLLATE_NONE);
-
  	/*
  	 * An expression which includes any mutable functions can't be sent over
  	 * because its result is not stable.  For example, sending now() remote
--- 188,193 ----
***************
*** 927,932 ****
  deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 923,981 ----
  }

  /*
+  * deparse remote UPDATE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
+ 					   Index rtindex, Relation rel,
+ 					   List *remote_conds,
+ 					   List *targetlist,
+ 					   List *targetAttrs, List *returningList,
+ 					   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	List	   *params_list = NIL;
+ 	deparse_expr_cxt context;
+ 	bool		first;
+ 	ListCell   *lc;
+
+ 	/* Set up context struct for recursion */
+ 	context.root = root;
+ 	context.foreignrel = baserel;
+ 	context.buf = buf;
+ 	context.params_list = NULL;
+
+ 	appendStringInfoString(buf, "UPDATE ");
+ 	deparseRelation(buf, rel);
+ 	appendStringInfoString(buf, " SET ");
+
+ 	first = true;
+ 	foreach(lc, targetAttrs)
+ 	{
+ 		int			attnum = lfirst_int(lc);
+ 		TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+
+ 		if (!first)
+ 			appendStringInfoString(buf, ", ");
+ 		first = false;
+
+ 		deparseColumnRef(buf, rtindex, attnum, root);
+ 		appendStringInfo(buf, " = ");
+ 		deparseExpr((Expr *) tle->expr, &context);
+ 	}
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 						  true, &params_list);
+
+ 	deparseReturningList(buf, root, rtindex, rel, false,
+ 						 returningList, retrieved_attrs);
+ }
+
+ /*
  * deparse remote DELETE statement
  *
  * The statement text is appended to buf, and we also create an integer List
***************
*** 949,954 ****
  deparseDeleteSql(StringInfo buf, PlannerInfo *root,
--- 998,1030 ----
  }

  /*
+  * deparse remote DELETE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
+ 					   Index rtindex, Relation rel,
+ 					   List *remote_conds,
+ 					   List *returningList,
+ 					   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	List	   *params_list = NIL;
+
+ 	appendStringInfoString(buf, "DELETE FROM ");
+ 	deparseRelation(buf, rel);
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 						  true, &params_list);
+
+ 	deparseReturningList(buf, root, rtindex, rel, false,
+ 						 returningList,
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
(2014/08/26 12:20), Etsuro Fujita wrote: (2014/08/25 21:58), Albe Laurenz wrote: I played with it, and apart from Hanada's comments I have found the following:

test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
                              QUERY PLAN
--------------------------------------------------------------
 Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)
   ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002 rows=27 loops=1)
         Output: id, val, ctid
         Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id > 3))
 Planning time: 0.179 ms
 Execution time: 3706.919 ms
(6 rows)
Time: 3708.272 ms

The actual time readings are surprising. Shouldn't these be similar to the actual execution time, since most of the time is spent in the foreign scan node? I was also thinking that this is confusing to the users. I think this is because the patch executes the UPDATE/DELETE statement during postgresBeginForeignScan, not postgresIterateForeignScan, as you mentioned below: Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed during postgresBeginForeignScan rather than during postgresIterateForeignScan. I'll modify the patch so as to execute the statement during postgresIterateForeignScan. Done. It is not expected that postgresReScanForeignScan is called when the UPDATE/DELETE is pushed down, right? Maybe it would make sense to add an assertion for that. IIUC, that is right. As ModifyTable doesn't support rescan currently, postgresReScanForeignScan needn't be called in the update pushdown case. The assertion is a good idea. I'll add it. Done. You can find the updated version of the patch at http://www.postgresql.org/message-id/53fffa50.6020...@lab.ntt.co.jp Thanks, Best regards, Etsuro Fujita
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On 29/08/14 08:56, Alvaro Herrera wrote: Robert Haas wrote: I agree that you might not like that. But you might not like having the table vacuumed slower than the configured rate, either. My impression is that the time between vacuums isn't really all that negotiable for some people. I had one customer who had horrible bloat issues on a table that was vacuumed every minute; when we changed the configuration so that it was vacuumed every 15 seconds, those problems went away. Wow, that's extreme. For that case you can set autovacuum_vacuum_cost_limit to 0, which disables the whole thing and lets vacuum run at full speed -- no throttling at all. Would that satisfy the concern? Well no - you might have a whole lot of big tables that you want vacuum to not get too aggressive on, but a few small tables that are highly volatile. So you want *them* vacuumed really fast to prevent them becoming huge tables with only a few rows therein, but your system might not be able to handle *all* your tables being vacuumed at full speed. This is a fairly common scenario for (several) web CMS systems that tend to want to have session and/or cache tables that are small and extremely volatile, plus the rest of the (real) data that is bigger and vastly less volatile. While there is a valid objection along the lines of don't use a database instead of memcache, it does seem reasonable that Postgres should be able to cope with this type of workload. Cheers Mark