Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-28 Thread Fabien COELHO


Hello Aidan,


If all you want is to avoid the write storms when fsyncs start happening on
slow storage, can you not just adjust the kernel vm.dirty* tunables to
start making the kernel write out dirty buffers much sooner instead of
letting them accumulate until fsyncs force them out all at once?


I tried that by setting:
  vm.dirty_expire_centisecs = 100
  vm.dirty_writeback_centisecs = 100

So it should start writing out dirty buffers at most 2s after they are
dirtied, if I understood the documentation correctly, instead of at most 35s.


The result is that with a 5000s run under a pretty small 25tps load (the
system can do 300tps with the default configuration), I lost 2091
transactions (1.7%), that is, they were beyond the 200ms schedule limit.
Another change is that overall the lost transactions are spread out more
than without this setting, although there are still stretches of
unresponsiveness.


So although the situation is significantly better, it is still far from 
good with the much reduced value I tried.


--
Fabien.




Re: [HACKERS] Support for N synchronous standby servers

2014-08-28 Thread Michael Paquier
On Wed, Aug 27, 2014 at 2:46 PM, Rajeev rastogi
rajeev.rast...@huawei.com wrote:
 I have done some more review, below are my comments:
Thanks!

 1. There are currently two loops on *num_sync, Can we simplify the function 
 SyncRepGetSynchronousNodes by moving the priority calculation inside the 
 upper loop
 if (*num_sync == allowed_sync_nodes)
 {
 for (j = 0; j < *num_sync; j++)
 {
 Anyway we require priority only if *num_sync == allowed_sync_nodes 
 condition matches.
 So in this loop itself, we can calculate the priority as well as the 
 assignment of new standbys with lower priority.
 Let me know if you see any issue with this.

OK, I see; yes, this can minimize processing a bit, so I refactored the
code by integrating the second loop into the first. This required
removing the break, as we need to find the highest priority
value among the nodes already determined to be synchronous.

 2.  Comment inside the function SyncRepReleaseWaiters,
 /*
  * We should have found ourselves at least, except if it is not 
 expected
  * to find any synchronous nodes.
  */
 Assert(num_sync > 0);

 I think except if it is not expected to find any synchronous nodes 
 is not required.
 As if it has come till this point means at least this node is 
 synchronous.
Yes, removed.

 3.  The document says that s_s_num should be less than max_wal_senders, but 
 code-wise there is no protection for the same.
 IMHO, s_s_num should be less than or equal to max_wal_senders, 
 otherwise COMMIT will never return to the console without
 the user knowing why.
 I see that some discussion has happened regarding this, but I think 
 just adding documentation for this is not enough.
 I am not sure what issue was observed in adding a check during GUC 
 initialization, but if there is an unavoidable issue during GUC 
 initialization, then can't we try to add the check at later points?

The trick here is that you cannot really return a warning at GUC
loading level to the user as a warning could be easily triggered if
for example s_s_num is present before max_wal_senders in
postgresql.conf. I am open to any solutions if there are any (like an
error when initializing WAL senders?!). Documentation seems enough for
me to warn the user.

 4.  Similarly, the interaction between the parameters s_s_names and s_s_num: I see some 
 discussion has happened regarding this, and it is acceptable
 to have s_s_num more than s_s_names. But I was thinking: should we not 
 give at least some notice message to the user for such a case, along with
 some documentation?

Done. I added the following in the paragraph "Server will wait" (see also
the configuration sketch after this list of comments):
Hence it is recommended to not set <varname>synchronous_standby_num</>
to a value higher than the number of elements in
<varname>synchronous_standby_names</>.

 5. At any one time there will be at a number of active synchronous 
 standbys: this sentence is not proper.
What about this:
At any one time there can be a number of active synchronous standbys
up to the number defined by <xref
linkend="guc-synchronous-standby-num">.

 6.  When this parameter is set to <literal>0</>, all the standby
 nodes will be considered as asynchronous.

 Can we make this as:
 When this parameter is set to <literal>0</>, all the standby
 nodes will be considered as asynchronous, irrespective of the value of 
 synchronous_standby_names.

Done. This seems proper for the user, as we do not care at all about
s_s_names if s_s_num = 0.

 7.  Are considered as synchronous the first elements of
 <xref linkend="guc-synchronous-standby-names"> in number of
 <xref linkend="guc-synchronous-standby-num"> that are
 connected.

 The start of this sentence looks to be incomplete.
OK, I reworked this part as well. I hope it is clearer.

 8.  Standbys listed after this will take over the role
 of synchronous standby if the first one should fail.

 Should not we make it as:

 Standbys listed after this will take over the role
 of synchronous standby if any of the first synchronous-standby-num 
 standbys fails.
Fixed as proposed.
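
To make the interplay of these settings concrete, here is a configuration
sketch against this patch (synchronous_standby_num is the GUC proposed here,
not an existing parameter; node names are hypothetical):

ALTER SYSTEM SET synchronous_standby_names = 'node_a, node_b, node_c';
ALTER SYSTEM SET synchronous_standby_num = 2;  -- proposed in this patch
SELECT pg_reload_conf();
-- node_a and node_b would act as synchronous standbys while connected;
-- node_c would take over if one of them fails.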

At the same time, I found a bug with pg_stat_get_wal_senders caused by a
NULL pointer that was freed when s_s_num = 0. An updated patch addressing the
comments is attached. On top of that, the documentation has been
reworked a bit by replacing the too-high number of xref blocks with
varname tags, having a link to a given variable specified only once.
Regards,
-- 
Michael
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2585,2597  include_dir 'conf.d'
 <para>
  Specifies a comma-separated list of standby names that can support
  <firstterm>synchronous replication</>, as described in
! <xref linkend="synchronous-replication">.
! At any one time there will be at most one 

Re: [HACKERS] Specifying the unit in storage parameter

2014-08-28 Thread Fujii Masao
On Thu, Aug 28, 2014 at 12:55 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Aug 27, 2014 at 10:59 PM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:

 Not necessarily, because it's harmless.  It's there for purely
 aesthetical reasons, so it's your choice whether to add it or not.
 Having it there is slightly easier on somebody reading the code,
 perhaps.

Agreed.

 On my side, that's up to you Fujii-san. The patch does what it states, I
 only think that this extra 0 should be added either everywhere or nowhere.

Yep. I added extra 0 everywhere.

Ok, I just applied the patch. Thanks for the review!
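
As an aside, what the applied patch allows is specifying a unit in a storage
parameter value, along these lines (a sketch; the parameter shown is just an
illustration):

ALTER TABLE t SET (autovacuum_vacuum_cost_delay = '10ms');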

 Not mandatory either: drop test_param_unit in the regression tests after
 running the test queries.

I don't have strong opinion about this. There are many tables which
regression test creates but doesn't drop. But if you strongly think that
the table must be dropped, I'm OK with that.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Specifying the unit in storage parameter

2014-08-28 Thread Michael Paquier
On Thu, Aug 28, 2014 at 4:20 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I don't have strong opinion about this. There are many tables which
 regression test creates but doesn't drop. But if you strongly think that
 the table must be dropped, I'm OK with that.
This remark is just to limit the amount of trash in the database used
for regression tests. But then if we removed everything we would lack
handy material for tests on utilities like database-wide thingies of
the type VACUUM, REINDEX, pg_dump, etc. And we can just drop the
database used for regressions to clean up everything. So that's not
mandatory at all. I tend to always clean up objects in my patches
touching regressions to limit interactions with other tests, but I
guess that's up to the person who wrote the code to decide.
-- 
Michael




Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-08-28 Thread Thomas Munro
On 28 August 2014 00:25, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Thomas Munro wrote:

 Thanks, I hadn't seen this, I should have checked the archives better.
 I have actually already updated my patch to handle EvalPlanQualFetch
 with NOWAIT and SKIP LOCKED with isolation specs, see attached.  I
 will compare with Craig's and see if I screwed anything up... of
 course I am happy to merge and submit a new patch on top of Craig's if
 it's going to be committed.

 Thanks, please rebase.

Thanks -- new patch attached.

Best regards,
Thomas Munro
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 231dc6a..0469705 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -45,7 +45,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
 [ LIMIT { <replaceable class="parameter">count</replaceable> | ALL } ]
 [ OFFSET <replaceable class="parameter">start</replaceable> [ ROW | ROWS ] ]
 [ FETCH { FIRST | NEXT } [ <replaceable class="parameter">count</replaceable> ] { ROW | ROWS } ONLY ]
-[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT ] [...] ]
+[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 <phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
 
@@ -1283,7 +1283,7 @@ FETCH { FIRST | NEXT } [ <replaceable class="parameter">count</replaceable> ] {
 The locking clause has the general form
 
 <synopsis>
-FOR <replaceable>lock_strength</> [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT ]
+FOR <replaceable>lock_strength</> [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT | SKIP LOCKED ]
 </synopsis>
 
 where <replaceable>lock_strength</> can be one of
@@ -1359,11 +1359,17 @@ KEY SHARE
 
 <para>
 To prevent the operation from waiting for other transactions to commit,
-use the <literal>NOWAIT</> option.  With <literal>NOWAIT</>, the statement
-reports an error, rather than waiting, if a selected row
-cannot be locked immediately.  Note that <literal>NOWAIT</> applies only
-to the row-level lock(s) &mdash; the required <literal>ROW SHARE</literal>
-table-level lock is still taken in the ordinary way (see
+use either the <literal>NOWAIT</> or <literal>SKIP LOCKED</literal>
+option.  With <literal>NOWAIT</>, the statement reports an error, rather
+than waiting, if a selected row cannot be locked immediately.
+With <literal>SKIP LOCKED</literal>, any selected rows that cannot be
+immediately locked are skipped.  Skipping locked rows provides an
+inconsistent view of the data, so this is not suitable for general purpose
+work, but can be used to avoid lock contention with multiple consumers
+accessing a queue-like table.  Note that <literal>NOWAIT</>
+and <literal>SKIP LOCKED</literal> apply only to the row-level lock(s)
+&mdash; the required <literal>ROW SHARE</literal> table-level lock is
+still taken in the ordinary way (see
 <xref linkend="mvcc">).  You can use
 <xref linkend="sql-lock">
 with the <literal>NOWAIT</> option first,
@@ -1386,14 +1392,14 @@ KEY SHARE
 </para>
 
 <para>
-Multiple locking
-clauses can be written if it is necessary to specify different locking
-behavior for different tables.  If the same table is mentioned (or
-implicitly affected) by more than one locking clause,
-then it is processed as if it was only specified by the strongest one.
-Similarly, a table is processed
-as <literal>NOWAIT</> if that is specified in any of the clauses
-affecting it.
+Multiple locking clauses can be written if it is necessary to specify
+different locking behavior for different tables.  If the same table is
+mentioned (or implicitly affected) by more than one locking clause, then
+it is processed as if it was only specified by the strongest one.
+Similarly, a table is processed as <literal>NOWAIT</> if that is specified
+in any of the clauses affecting it.  Otherwise, it is processed
+as <literal>SKIP LOCKED</literal> if that is specified in any of the
+clauses affecting it.
 </para>
 
 <para>
@@ -1930,9 +1936,9 @@ SELECT distributors.* WHERE distributors.name = 'Westward';
 <productname>PostgreSQL</productname> allows it in any <command>SELECT</>
 query as well as in sub-<command>SELECT</>s, but this is an extension.
 The <literal>FOR NO KEY UPDATE</>, <literal>FOR SHARE</> and
-<literal>FOR KEY SHARE</> variants,
-as well as the <literal>NOWAIT</> option,
-do not appear in the standard.
+<literal>FOR KEY SHARE</> variants, as well as the <literal>NOWAIT</>
+and <literal>SKIP LOCKED</literal> options, do not appear in the
+standard.
 </para>
   </refsect2>
 
diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml
index ba92607..57396d7 100644
--- a/doc/src/sgml/sql.sgml
+++ b/doc/src/sgml/sql.sgml
@@ -863,7 
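
For readers following along, the queue-consumer pattern the new clause is
aimed at looks like this (a sketch; job_queue is a hypothetical table):

BEGIN;
SELECT id, payload
  FROM job_queue
 ORDER BY id
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
-- process the returned job, then DELETE it and COMMIT; concurrent
-- consumers skip each other's locked rows instead of blocking.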

Re: [HACKERS] Function to know last log write timestamp

2014-08-28 Thread Fujii Masao
On Thu, Aug 28, 2014 at 2:44 AM, Jim Nasby j...@nasby.net wrote:
 On 8/27/14, 7:33 AM, Fujii Masao wrote:

 On Tue, Aug 19, 2014 at 1:07 AM, Robert Haas robertmh...@gmail.com
 wrote:

 On Fri, Aug 15, 2014 at 7:17 AM, Fujii Masao masao.fu...@gmail.com
 wrote:

 On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund and...@2ndquadrant.com
 wrote:

 On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

 On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund
 and...@2ndquadrant.com wrote:

 On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

 That's about the idea. However, what you've got there is actually
 unsafe, because shmem-counter++ is not an atomic operation.  It
 reads
 the counter (possibly even as two separate 4-byte loads if the
 counter
 is an 8-byte value), increments it inside the CPU, and then writes
 the
 resulting value back to memory.  If two backends do this
 concurrently,
 one of the updates might be lost.


 All these are only written by one backend, so it should be safe. Note
 that that coding pattern, just without memory barriers, is all over
 pgstat.c


 Ah, OK.  If there's a separate slot for each backend, I agree that
 it's safe.

 We should probably add barriers to pgstat.c, too.


 Yea, definitely. I think this is rather borked on weaker
 architectures. It's just that the consequences of an out of date/torn
 value are rather low, so it's unlikely to be noticed.

 Imo we should encapsulate the changecount modifications/checks somehow
 instead of repeating the barriers, Asserts, comments et al everywhere.


 So what about applying the attached patch first, which adds the macros
 to load and store the changecount with the memory barriers, and changes
 pgstat.c to use them. Maybe this patch needs to be back-patched to at least
 9.4?

 After applying the patch, I will rebase the
 pg_last_xact_insert_timestamp
 patch and post it again.


 That looks OK to me on a relatively-quick read-through.  I was
 initially a bit worried about this part:

do
{
 ! pgstat_increment_changecount_before(beentry);
} while ((beentry->st_changecount & 1) == 0);

 pgstat_increment_changecount_before is an increment followed by a
 write barrier.  This seemed like funny coding to me at first because the
 while-test isn't protected by any sort of barrier.  But now I think
 it's correct, because there's only one process that can possibly write
 to that data, and that's the one that is making the test, and it had
 certainly better see its own modifications in program order no matter
 what.

 I wouldn't object to back-patching this to 9.4 if we were earlier in
 the beta cycle, but at this point I'm more inclined to just put it in
 9.5.  If we get an actual bug report about any of this, we can always
 back-patch the fix at that time.  But so far that seems mostly
 hypothetical, so I think the less-risky course of action is to give
 this a longer time to bake before it hits an official release.


 Sounds reasonable. So, barring any objection, I will apply the patch
 only to the master branch.


 It's probably worth adding a comment explaining why it's safe to do this
 without a barrier...

s/without/with ?

Theoretically it's not safe without a barrier on a machine with weak
memory ordering. No?
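
For reference, the user-visible piece of the rebased patch would be a single
function on the master side (name as used in this thread; a sketch):

SELECT pg_last_xact_insert_timestamp();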

Regards,

-- 
Fujii Masao




Re: [HACKERS] replication commands and log_statements

2014-08-28 Thread Fujii Masao
On Thu, Jun 19, 2014 at 5:29 PM, Ian Barwick i...@2ndquadrant.com wrote:
 On 12/06/14 20:37, Fujii Masao wrote:

 On Wed, Jun 11, 2014 at 11:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Andres Freund and...@2ndquadrant.com writes:

 Your wish just seems like a separate feature to me. Including
 replication commands in 'all' seems correct independent of the desire
 for a more granular control.


 No, I think I've got to vote with the other side on that.

 The reason we can have log_statement as a scalar progression
 none < ddl < mod < all is that there's little visible use-case
 for logging DML but not DDL, nor for logging SELECTs but not
 INSERT/UPDATE/DELETE.  However, logging replication commands seems
 like something people would reasonably want an orthogonal control for.
 There's no nice way to squeeze such a behavior into log_statement.

 I guess you could say that log_statement treats replication commands
 as if they were DDL, but is that really going to satisfy users?

 I think we should consider log_statement to control logging of
 SQL only, and invent a separate GUC (or, in the future, likely
 more than one GUC?) for logging of replication activity.


 Seems reasonable. OK. The attached patch adds the log_replication_command
 parameter, which causes replication commands to be logged. I added this to
 the next CF.


 A quick review:

Sorry, I've noticed this email right now. Thanks for reviewing the patch!

 - minor rewording for the description, mentioning that statements will
   still be logged as DEBUG1 even if the parameter is set to 'off' (might
   prevent reports of the kind "I set it to 'off', why am I still seeing
   log entries?").

<para>
 Causes each replication command to be logged in the server log.
 See <xref linkend="protocol-replication"> for more information about
 replication commands. The default value is <literal>off</>. When set
 to
 <literal>off</>, commands will be logged at log level
 <literal>DEBUG1</literal>.
 Only superusers can change this setting.
</para>

Yep, fixed. Attached is the updated version of the patch.


 - I feel it would be more consistent to use the plural form
   for this parameter, i.e. log_replication_commands, in line with
   log_lock_waits, log_temp_files, log_disconnections etc.

But log_statement is in the singular form. So I just used
log_replication_command. For consistency, maybe we need to
change both parameters to the plural form? I don't have a strong
opinion about this.

 - It might be an idea to add a cross-reference to this parameter from
   the Streaming Replication Protocol page:
   http://www.postgresql.org/docs/devel/static/protocol-replication.html

Yes. I added that.
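
With the patch applied, turning the logging on would look like this (a
sketch; log_replication_command is the parameter proposed here, and only
superusers may change it):

ALTER SYSTEM SET log_replication_command = on;
SELECT pg_reload_conf();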

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 4558,4563  FROM pg_stat_activity;
--- 4558,4581 
/listitem
   /varlistentry
  
+  <varlistentry id="guc-log-replication-command" xreflabel="log_replication_command">
+   <term><varname>log_replication_command</varname> (<type>boolean</type>)
+   <indexterm>
+<primary><varname>log_replication_command</> configuration parameter</primary>
+   </indexterm>
+   </term>
+   <listitem>
+<para>
+ Causes each replication command to be logged in the server log.
+ See <xref linkend="protocol-replication"> for more information about
+ replication command. The default value is <literal>off</>. When
+ set to <literal>off</>, commands will be logged at log level
+ <literal>DEBUG1</> (or lower).
+ Only superusers can change this setting.
+</para>
+   </listitem>
+  </varlistentry>
+ 
   <varlistentry id="guc-log-temp-files" xreflabel="log_temp_files">
    <term><varname>log_temp_files</varname> (<type>integer</type>)
    <indexterm>
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
***
*** 1306,1311  To initiate streaming replication, the frontend sends the
--- 1306,1314 
  of <literal>true</> tells the backend to go into walsender mode, wherein a
  small set of replication commands can be issued instead of SQL statements. Only
  the simple query protocol can be used in walsender mode.
+ Replication commands are logged in the server log at log level
+ <literal>DEBUG1</> (or lower) or when
+ <xref linkend="guc-log-replication-command"> is enabled.
  Passing <literal>database</> as the value instructs walsender to connect to
  the database specified in the <literal>dbname</> parameter, which will allow
  the connection to be used for logical replication from that database.
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***
*** 108,113  bool		am_db_walsender = false;	/* Connected to a database? */
--- 108,114 
  int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
  int			wal_sender_timeout = 60 * 1000;		/* maximum time to send one
   * WAL data message */
+ bool		log_replication_command = false;
  

Re: [HACKERS] delta relations in AFTER triggers

2014-08-28 Thread Marti Raudsepp
On Thu, Aug 28, 2014 at 12:03 AM, Kevin Grittner kgri...@ymail.com wrote:
 In essence, make the relations work like PL/pgSQL
 variables do. If you squint a little, the new/old relation is a variable
 from the function's point of view, and a parameter from the
 planner/executor's point of view. It's just a variable/parameter that
 holds a set of tuples, instead of a single Datum.

 I will be surprised if someone doesn't latch onto this to create a
 new declared temporary table that only exists within the scope of
 a compound statement (i.e., a BEGIN/END block).  You would DECLARE
 them just like you would a scalar variable in a PL, and they would
 have the same scope.

 I'll take a look at doing this in the next couple days, and see
 whether doing it that way is as easy as it seems on the face of it.

We already have all this with refcursor variables. Admittedly cursors
have some weird semantics (mutable position) and currently they're
cumbersome to combine into queries, but would a new relation
variable be sufficiently better to warrant a new type?

Why not extend refcursors and make them easier to use instead?
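
For context, the refcursor route looks roughly like this in PL/pgSQL today
(a sketch; function and table names are hypothetical):

CREATE FUNCTION open_rows() RETURNS refcursor LANGUAGE plpgsql AS $$
DECLARE
    c refcursor;
BEGIN
    OPEN c FOR SELECT * FROM some_table;  -- some_table is hypothetical
    RETURN c;  -- the caller can FETCH from it within the same transaction
END;
$$;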

Regards,
Marti




[HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread rohtodeveloper
Hi, all

 

I have a question about the data type timestamp with time zone.
Why does a timestamptz value not store the timezone passed to it?


Considering the following example.

 

postgres=# select '2014-08-28 14:30:30.423602+02'::timestamp with time zone;
          timestamptz          
-------------------------------
 2014-08-28 20:30:30.423602+08
(1 row)

 

The timezone of the output (+08) is different from the original input value (+02).
It seems not to be good behavior. But the behavior of the data type time with
time zone is correct.

 

postgres=# select '14:30:30.423602+02'::time with time zone;
       timetz       
--------------------
 14:30:30.423602+02
(1 row)

 

If the current behavior of timestamptz is not suitable, is there any plan to
correct the behavior of timestamptz, or to create a new data type which can
store a timestamp with its timezone?

 


*)manual--8.5.1.3. Time Stamps
-
For timestamp with time zone, the internally stored value is always in UTC 
(Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). 
An input value that has an explicit time zone specified is converted to UTC 
using the appropriate offset for that time zone. If no time zone is stated in 
the input string, then it is assumed to be in the time zone indicated by the 
system's TimeZone parameter, and is converted to UTC using the offset for the 
timezone zone.
-
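
A session sketch demonstrating the quoted paragraph (the output offset
follows the session's TimeZone setting, so the +08 above implies a +08 zone):

postgres=# SET timezone = 'UTC';
SET
postgres=# select '2014-08-28 14:30:30.423602+02'::timestamp with time zone;
          timestamptz          
-------------------------------
 2014-08-28 12:30:30.423602+00
(1 row)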

 

 

 

Best regards,

rohto

Re: [HACKERS] inherit support for foreign tables

2014-08-28 Thread Etsuro Fujita

(2014/08/22 11:51), Noah Misch wrote:

On Wed, Aug 20, 2014 at 08:11:01PM +0900, Etsuro Fujita wrote:

(2014/07/02 11:23), Noah Misch wrote:

Your chosen ANALYZE behavior is fair, but the messaging from a database-wide
ANALYZE VERBOSE needs work:

INFO:  analyzing test_foreign_inherit.parent
INFO:  parent: scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 
1 rows in sample, 1 estimated total rows
INFO:  analyzing test_foreign_inherit.parent inheritance tree



Please arrange to omit the 'analyzing tablename inheritance tree' message,
since this ANALYZE actually skipped that task.



I think it would be better that this is handled in the same way as
an inheritance tree that turns out to be a single table that doesn't
have any descendants in acquire_inherited_sample_rows().  That would
still output the message as shown below, but I think that that would
be more consistent with the existing code.  The patch works that way.



Today's ANALYZE VERBOSE messaging for former inheritance parents (tables with
relhassubclass = true but no pg_inherits.inhparent links) is deceptive, and I
welcome a fix to omit the spurious message.  As defects go, this is quite
minor.  There's fundamentally no value in collecting inheritance tree
statistics for such a parent, and no PostgreSQL command will do so.

Your proposed behavior for inheritance parents having at least one foreign
table child is more likely to mislead DBAs in practice.  An inheritance tree
genuinely exists, and a different ANALYZE command is quite capable of
collecting statistics on that inheritance tree.  Messaging should reflect the
difference between ANALYZE invocations that do such work and ANALYZE
invocations that skip it.  I'm anticipating a bug report along these lines:

   I saw poor estimates involving a child foreign table, so I ran ANALYZE
   VERBOSE, which reported 'INFO:  analyzing public.parent inheritance
   tree'.  Estimates remained poor, so I scratched my head for awhile before
   noticing that pg_stats didn't report such statistics.  I then ran ANALYZE
   VERBOSE parent, saw the same 'INFO:  analyzing public.parent inheritance
   tree', but this time saw relevant rows in pg_stats.  The message doesn't
   reflect the actual behavior.

I'll sympathize with that complaint.  It's a minor point overall, but the code
impact is, I predict, small enough that we may as well get it right.  A
credible alternative is to emit a second message indicating that we skipped
the inheritance tree statistics after all, and why we skipped them.


I'd like to address this by emitting the second message as shown below:

INFO:  analyzing public.parent
INFO:  parent: scanned 0 of 0 pages, containing 0 live rows and 0 dead 
rows; 0 rows in sample, 0 estimated total rows

INFO:  analyzing public.parent inheritance tree
INFO:  skipping analyze of public.parent inheritance tree --- this 
inheritance tree contains foreign tables
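
In SQL terms, the distinction the two messages draw is (a sketch of the
proposed behavior):

ANALYZE VERBOSE;         -- database-wide: skips inheritance-tree statistics
                         -- for trees that contain foreign-table children
ANALYZE VERBOSE parent;  -- explicit target: does collect those statistics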


Attached is the updated version of the patch.  Based on the result of 
the discussions about attr_needed upthread, I've changed the patch so that 
create_foreignscan_plan makes reference to reltargetlist, not to 
attr_needed.  (So, the patch in [1] isn't required anymore.)


Other changes:
* Revise code/comments/docs a bit
* Add more regression tests

A separate patch (analyze.patch) handles the former case in a similar way.

[1] http://www.postgresql.org/message-id/53f4707c.8030...@lab.ntt.co.jp

Thanks,

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***
*** 115,120  static void fileGetForeignRelSize(PlannerInfo *root,
--- 115,125 
  static void fileGetForeignPaths(PlannerInfo *root,
  	RelOptInfo *baserel,
  	Oid foreigntableid);
+ static ForeignPath *fileReparameterizeForeignPath(PlannerInfo *root,
+   RelOptInfo *baserel,
+   Oid foreigntableid,
+   ForeignPath *path,
+   Relids required_outer);
  static ForeignScan *fileGetForeignPlan(PlannerInfo *root,
     RelOptInfo *baserel,
     Oid foreigntableid,
***
*** 143,149  static bool check_selective_binary_conversion(RelOptInfo *baserel,
  static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
  			  FileFdwPlanState *fdw_private);
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
! 			   FileFdwPlanState *fdw_private,
  			   Cost *startup_cost, Cost *total_cost);
  static int file_acquire_sample_rows(Relation onerel, int elevel,
  		 HeapTuple *rows, int targrows,
--- 148,154 
  static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
  			  FileFdwPlanState *fdw_private);
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
! 			   FileFdwPlanState *fdw_private, List *join_conds,
  			   Cost *startup_cost, Cost *total_cost);
  static int file_acquire_sample_rows(Relation onerel, int elevel,
  		 HeapTuple *rows, int targrows,
***
*** 161,166  

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-28 Thread Etsuro Fujita

(2014/08/27 22:56), Robert Haas wrote:

On Mon, Aug 25, 2014 at 8:58 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:

Reading the code, I noticed that the pushed down UPDATE or DELETE statement is 
executed
during postgresBeginForeignScan rather than during postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it differently from
the normal scan?


Hmm, I'm worried that may be an API contract violation.


Will fix.

Thanks,

Best regards,
Etsuro Fujita




[HACKERS] v4 protocol TODO item - Lazy fetch/stream of TOASTed values?

2014-08-28 Thread Craig Ringer
Hi all

Per the protocol todo:

 https://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes

do you think it's reasonable to allow for delayed sending of big varlena
values and arrays in the protocol?

The JDBC spec already has support for this in arrays, XML, and binary
blobs, but AFAIK the backend can't support lazy fetching.

Of course anything like this would require an open transaction, but the
JDBC spec already sets that as a requirement for SQLXML etc, specifying
that resources must be retained for at least the duration of the
transaction unless explicitly free'd by the application.

As far as I can tell PostgreSQL is in a good position to support lazy
fetch of big values. TOASTed values could be returned as an opaque
placeholder value identifying the toast table oid and ctid, then fetched
by sending a further protocol-level request to the server that could
send values as a result set or by entering COPY mode to stream them.

The transaction's xmin would already prevent the values being
overwritten - at least on SERIALIZABLE isolation.

For non-TOASTed values like big in-memory values it probably makes more
sense to just force them to be sent direct to the client like they
already are, rather than trying to stash them in server memory and
address them for returning later. Though a tuplestore could always be
used to spill to disk, I guess.

Anyway. Random idea for the day.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add .NOTPARALLEL to contrib/Makefile

2014-08-28 Thread Andres Freund
On 2014-08-26 23:56:10 -0400, Peter Eisentraut wrote:
 On Tue, 2014-08-26 at 02:05 +0200, Andres Freund wrote:
  Currently running make -j16 all check in contrib/ results in a mess
  because
  all pg_regress invocations fight over the same port. Adding a simple
  .NOTPARALLEL: check-%-recurse
  into contrib/Makefile fixes that. Do we want that?
 
 But that causes also the all to be run in not-parallel, because the
 meaning of .NOTPARALLEL is:
 
  If `.NOTPARALLEL' is mentioned as a target, then this invocation of
  `make' will be run serially, even if the `-j' option is given.

Argh. There goes that.

 It does not mean, as you appear to imagine, to run only the listed
 prerequisites in not-parallel.  That would be nice!

Yea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-28 Thread Andres Freund
Hi,

On 2014-08-27 22:48:54 +0200, Pavel Stehule wrote:
 Hi
 
 I chose \? xxx, because it is related to psql features. I wrote commands:
 
 \? options
 \? variables
 
 comments?

Generally I like it.

Some stuff I changed:
* I rephrased the sgml changes
* s/Printing options/Display options/. Or maybe Display influencing
  variables? That makes it clearer why they're listed under
  --help-variables.
* I added \? commands as an alias for a plain \?
  That way the scheme can sensibly be expanded.
* I renamed help_variables() to be inline with the surrounding functions.

Stuff I wondered about:
* How about making it --help=variables instead of --help-variables? That
* Do we really need the 'Report bugs to' blurb for --help-variables?
* Since we're not exhaustive about documenting environment variales, do
  we really need to document TMPDIR and SHELL?
* Shouldn't we document that PROMPT1 is the standard prompt, PROMPT2 is
  for already started statements, and PROMPT3 is for COPY FROM STDIN?

Looks like it's committable next round.

I've attached an incremental patch.
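
With the patch, the help entry points look like this in a session (plain \?
still lists the backslash commands):

postgres=# \? commands
postgres=# \? options
postgres=# \? variables

On the command line, psql --help-variables prints the variable help without
starting a session.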

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
0001-help-variables-11.patch



Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-28 Thread Etsuro Fujita
(2014/08/27 23:05), Tom Lane wrote:
 I wrote:
 Robert Haas robertmh...@gmail.com writes:
 Hmm, I'm worried that may be an API contract violation.

 Actually, there's another problem there.  What of UPDATE or DELETE with a
 LIMIT clause, which is something that seems to be coming down the pike:
 https://commitfest.postgresql.org/action/patch_view?id=1550

I'd like to try to extend the functionality so as to push
UPDATE/DELETE-with-LIMIT down into the remote server if it's safe.  But
I don't yet know if it's possible, because I started looking into the
UPDATE/DELETE-with-LIMIT patch just now ...

Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] psql \watch versus \timing

2014-08-28 Thread Fujii Masao
On Tue, Aug 26, 2014 at 4:55 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 08/25/2014 10:48 PM, Heikki Linnakangas wrote:

 On 08/25/2014 09:22 PM, Fujii Masao wrote:

 On Tue, Aug 26, 2014 at 1:34 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 I agree that refactoring this would be nice in the long-term, and I also
 agree that it's probably OK as it is in the short-term. I don't like the
 name PSQLexecInternal, though. PSQLexec is used for internal commands
 anyway. In fact it's backwards, because PSQLexecInternal is used for
 non-internal queries, given by \watch, while PSQLexec is used for
 internal
 commands.


 Agreed. So what about PSQLexecCommon (inspired by
 the relation between LWLockAcquireCommon and LWLockAcquire)?
 Or any better name?


 Actually, perhaps it would be better to just copy-paste PSQLexec, and
 modify the copy to suite \watch's needs. (PSQLexecWatch?
 SendWatchQuery?). PSQLexec doesn't do much, and there isn't very much
 overlap between what \watch wants and what other PSQLexec callers want.
 \watch wants timing output, others don't. \watch doesn't want
 transaction handling.

Agreed. Attached is the revised version of the patch. I implemented
PSQLexecWatch() which sends the query, prints the results and outputs
the query execution time (if \timing is enabled).

This patch was marked as ready for committer, but since I revised
the code very much, I marked this as needs review again.

 Do we want --echo-hidden to print the \watch'd
 query? Not sure..

Per the documentation, --echo-hidden prints the actual queries generated by
backslash commands. But \watch doesn't handle backslash commands.
So I think that PSQLexecWatch doesn't need to handle --echo-hidden.

 BTW, I just noticed that none of the callers of PSQLexec pass
 start_xact=true. So that part of the function is dead code. We might want
 to remove it, and replace with a comment noting that PSQLexec never starts a
 new transaction block, even in autocommit-off mode. (I know you're hacking
 on this, so I didn't want to joggle your elbow by doing it right now)

Good catch. So I will remove start_xact code later.
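
Once committed, a session sketch of the behavior under discussion (with
\timing enabled, each \watch iteration would also report the execution time):

postgres=# \timing on
postgres=# SELECT count(*) FROM pg_stat_activity;
postgres=# \watch 2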

Regards,

-- 
Fujii Masao
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
***
*** 2687,2693  do_watch(PQExpBuffer query_buf, long sleep)
  
  	for (;;)
  	{
! 		PGresult   *res;
  		time_t		timer;
  		long		i;
  
--- 2687,2693 
  
  	for (;;)
  	{
! 		int	res;
  		time_t		timer;
  		long		i;
  
***
*** 2701,2759  do_watch(PQExpBuffer query_buf, long sleep)
  		myopt.title = title;
  
  		/*
! 		 * Run the query.  We use PSQLexec, which is kind of cheating, but
! 		 * SendQuery doesn't let us suppress autocommit behavior.
  		 */
! 		res = PSQLexec(query_buf->data, false);
! 
! 		/* PSQLexec handles failure results and returns NULL */
! 		if (res == NULL)
! 			break;
  
  		/*
! 		 * If SIGINT is sent while the query is processing, PSQLexec will
! 		 * consume the interrupt.  The user's intention, though, is to cancel
! 		 * the entire watch process, so detect a sent cancellation request and
! 		 * exit in this case.
  		 */
! 		if (cancel_pressed)
! 		{
! 			PQclear(res);
  			break;
! 		}
! 
! 		switch (PQresultStatus(res))
! 		{
! 			case PGRES_TUPLES_OK:
! printQuery(res, &myopt, pset.queryFout, pset.logfile);
! break;
! 
! 			case PGRES_COMMAND_OK:
! fprintf(pset.queryFout, "%s\n%s\n\n", title, PQcmdStatus(res));
! break;
! 
! 			case PGRES_EMPTY_QUERY:
! psql_error(_("\\watch cannot be used with an empty query\n"));
! PQclear(res);
! return false;
! 
! 			case PGRES_COPY_OUT:
! 			case PGRES_COPY_IN:
! 			case PGRES_COPY_BOTH:
! psql_error(_("\\watch cannot be used with COPY\n"));
! PQclear(res);
! return false;
! 
! 			default:
! /* other cases should have been handled by PSQLexec */
! psql_error(_("unexpected result status for \\watch\n"));
! PQclear(res);
! return false;
! 		}
! 
! 		PQclear(res);
! 
! 		fflush(pset.queryFout);
  
  		/*
  		 * Set up cancellation of 'watch' via SIGINT.  We redo this each time
--- 2701,2720 
  		myopt.title = title;
  
  		/*
! 		 * Run the query and print out the results. We use PSQLexecWatch,
! 		 * which is kind of cheating, but SendQuery doesn't let us suppress
! 		 * autocommit behavior.
  		 */
! 		res = PSQLexecWatch(query_buf->data, &myopt);
  
  		/*
! 		 * PSQLexecWatch handles the case where we can no longer
! 		 * repeat the query, and returns 0 or -1.
  		 */
! 		if (res == 0)
  			break;
! 		if (res == -1)
! 			return false;
  
  		/*
  		 * Set up cancellation of 'watch' via SIGINT.  We redo this each time
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
***
*** 497,502  PSQLexec(const char *query, bool start_xact)
--- 497,598 
  }
  
  
+ /*
+  * PSQLexecWatch
+  *
+  * This function is used for \watch command to send the query to
+  * the server and print out the results.
+  *
+  * Returns 1 if the query executed successfully, 0 

Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-28 Thread Fujii Masao
On Thu, Aug 28, 2014 at 8:20 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2014-08-27 22:48:54 +0200, Pavel Stehule wrote:
 Hi

 I chose \? xxx, because it is related to psql features. I wrote commands:

 \? options
 \? variables

 comments?

 Generally I like it.

 Some stuff I changed:
 * I rephrased the sgml changes
 * s/Printing options/Display options/. Or maybe Display influencing
   variables? That makes it clearer why they're listed under
   --help-variables.
 * I added \? commands as an alias for a plain \?
   That way the scheme can sensibly be expanded.
 * I renamed help_variables() to be inline with the surrounding functions.

 Stuff I wondered about:
 * How about making it --help=variables instead of --help-variables? That
 * Do we really need the 'Report bugs to' blurb for --help-variables?
 * Since we're not exhaustive about documenting environment variales, do
   we really need to document TMPDIR and SHELL?
 * Shouldn't we document that PROMPT1 is the standard prompt, PROMPT2 is
   for already started statements, and PROMPT3 is for COPY FROM STDIN?

 Looks like it's committable next round.

 I've attached an incremental patch.

ISTM that you failed to attach the right patch.
help-variables-10--11.diff contains only one line.

Regards,

-- 
Fujii Masao




Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-28 Thread Andres Freund
On 2014-08-28 13:20:07 +0200, Andres Freund wrote:
 I've attached an incremental patch.

Apparently I didn't attach the patch so much as a file containing the
name of the patchfile...


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 6d3189d..b14db45 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2581,12 +2581,15 @@ testdb=&gt; <userinput>\setenv LESS -imx4F</userinput>
 
 
   <varlistentry>
-<term><literal>\? [ options | variables ]</literal></term>
+<term><literal>\? [ <replaceable class="parameter">topic</replaceable> ]</literal></term>
 <listitem>
 <para>
-Shows help information about the backslash commands.  This command can have a
-option variables or options to take help for psql configuration variables
-or psql command line options.
+Shows help information. The parameter
+<replaceable class="parameter">topic</replaceable> can be
+<literal>commands</> (the default) to show help about backslash
+commands, <literal>options</> to show information about commandline
+arguments, or <literal>variables</> to show information about
+variables influencing <application>psql</>'s behaviour or display.
 </para>
 </listitem>
   </varlistentry>
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 12cbb20..7e4626c 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1507,12 +1507,12 @@ exec_command(const char *cmd,
 		char	   *opt0 = psql_scan_slash_option(scan_state,
 	OT_NORMAL, NULL, false);
 
-		if (!opt0)
+		if (!opt0 || strcmp(opt0, "commands") == 0)
			slashUsage(pset.popt.topt.pager);
-		else if (strcmp(opt0, "variables") == 0)
-			help_variables(pset.popt.topt.pager);
		else if (strcmp(opt0, "options") == 0)
			usage(pset.popt.topt.pager);
+		else if (strcmp(opt0, "variables") == 0)
+			helpVariables(pset.popt.topt.pager);
 		else
 			slashUsage(pset.popt.topt.pager);
 	}
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 5e7953d..e0d1d13 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -171,7 +171,7 @@ slashUsage(unsigned short int pager)
 
 	fprintf(output, _(Help\n));
 
-	fprintf(output, _("  \\? description of all psql commands\n"));
+	fprintf(output, _("  \\? [commands]  description of all psql commands\n"));
	fprintf(output, _("  \\? options description of psql options\n"));
	fprintf(output, _("  \\? variables   description of all psql configuration variables\n"));
	fprintf(output, _("  \\h [NAME]  help on syntax of SQL commands, * for all commands\n"));
@@ -294,10 +294,12 @@ slashUsage(unsigned short int pager)
 
 
 /*
- * show list of available variables (options) from command line
+ * helpVariables
+ *
+ * Show list of available variables (options).
  */
 void
-help_variables(unsigned short int pager)
+helpVariables(unsigned short int pager)
 {
 	FILE	   *output;
 
@@ -335,7 +337,7 @@ help_variables(unsigned short int pager)
 	fprintf(output, _("  USER   the currently connected database user\n"));
 	fprintf(output, _("  VERBOSITY  control verbosity of error reports [default, verbose, terse]\n"));
 
-	fprintf(output, _("\nPrinting options:\n"));
+	fprintf(output, _("\nDisplay influencing variables:\n"));
 	fprintf(output, _("Usage:\n"));
 	fprintf(output, _("  psql --pset=NAME[=VALUE]\n  or \\pset NAME [VALUE] in interactive mode\n\n"));
 
diff --git a/src/bin/psql/help.h b/src/bin/psql/help.h
index bab360d..3ad374a 100644
--- a/src/bin/psql/help.h
+++ b/src/bin/psql/help.h
@@ -12,7 +12,7 @@ void		usage(unsigned short int pager);
 
 void		slashUsage(unsigned short int pager);
 
-void		help_variables(unsigned short int pager);
+void		helpVariables(unsigned short int pager);
 
 void		helpSQL(const char *topic, unsigned short int pager);
 
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index af68e13..0651588 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -574,7 +574,7 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options)
 }
 break;
 			case 1:
-help_variables(NOPAGER);
+helpVariables(NOPAGER);
 exit(EXIT_SUCCESS);
 			default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 7a94fa8..da1b984 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3516,7 +3516,7 @@ psql_completion(const char *text, int start, int end)
	else if (strcmp(prev_wd, "\\?") == 0)
	{
		static const char *const my_list[] =
-		{"options", "variables", NULL};
+		{"commands", "options", "variables", NULL};
 
 		COMPLETE_WITH_LIST_CS(my_list);
 	}


Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-28 Thread Pavel Stehule
2014-08-28 13:20 GMT+02:00 Andres Freund and...@2ndquadrant.com:

 Hi,

 On 2014-08-27 22:48:54 +0200, Pavel Stehule wrote:
  Hi
 
  I chose \? xxx, because it is related to psql features. I wrote commands:
 
  \? options
  \? variables
 
  comments?

 Generally I like it.

 Some stuff I changed:
 * I rephrased the sgml changes
 * s/Printing options/Display options/. Or maybe Display influencing
   variables? That makes it clearer why they're listed under
   --help-variables.
 * I added \? commands as an alias for a plain \?
   That way the scheme can sensibly be expanded.
 * I renamed help_variables() to be inline with the surrounding functions.

 Stuff I wondered about:
 * How about making it --help=variables instead of --help-variables? That


I am sorry, I don't like this idea; --help-variables is much more usual for
a command line.


 * Do we really need the 'Report bugs to' blurb for --help-variables?



 * Since we're not exhaustive about documenting environment variales, do
   we really need to document TMPDIR and SHELL?
 * Shouldn't we document that PROMPT1 is the standard prompt, PROMPT2 is
   for already started statements, and PROMPT3 is for COPY FROM STDIN?


I don't see TMPDIR or SHELL as really important in this position.

Pavel



 Looks like it's committable next round.

 I've attached an incremental patch.

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-28 Thread Petr Jelinek

On 28/08/14 13:20, Andres Freund wrote:

Hi,

Stuff I wondered about:
* How about making it --help=variables instead of --help-variables?


-1, help is not a variable to be assigned imho


* Do we really need the 'Report bugs to' blurb for --help-variables?


Probably not.


* Since we're not exhaustive about documenting environment variales, do
   we really need to document TMPDIR and SHELL?


They are a bit more important than most of the others, but probably not
really necessary to be there, as they are not psql specific.



* Shouldn't we document that PROMPT1 is the standard prompt, PROMPT2 is
   for already started statements, and PROMPT3 is for COPY FROM STDIN?


Yes please!

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-28 Thread Fujii Masao
On Thu, Aug 28, 2014 at 5:48 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 comments?

+fprintf(output, _("  ECHO   control what input is
written to standard output [all, queries]\n"));

The valid values in the help messages should be consistent with
the values that the tab-completion displays. So in the case of ECHO,
errors and none also should be added in the message. Thought?

In the help messages of some psql variables like ECHO_HIDDEN, valid
values are not explained. Why not?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Is this code safe?

2014-08-28 Thread Pavan Deolasee
On Thu, Aug 28, 2014 at 11:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Pavan Deolasee pavan.deola...@gmail.com writes:
  Can some kind of compiler optimisation reorder things such that the else
  if expression is evaluated using the old, uninitialised value of optval?

 Any such behavior is patently illegal per the C standard.


Thanks a lot for explaining that.


   Not that that's
 always stopped compiler writers; but I think you might be looking at a
 compiler bug.


I'd requested the reporter to try moving the getsockopt() call outside the if
expression. That hasn't helped. So I think it's a case of getsockopt()
neither returning an error nor setting optval to any sane value. Will try
to get more details about the platform etc.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Magnus Hagander
On Thu, Aug 28, 2014 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 27, 2014 at 6:40 AM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Aug 27, 2014 at 11:56 AM, Alexey Klyukin al...@hintbits.com wrote:
 Greetings,

 Is there a strong reason to disallow reloading server key and cert files
 during the PostgreSQL reload?

 Key and cert files are loaded in the postmaster. We'd need to change
 that.

 Why?

Hmm. That's actually a good point. Not sure I have an excuse. They
could certainly be made BACKEND without that, and there's no way to
change it within a running backend *anyway*, since we cannot turn
on/off SSL once a connection has been made. So yeah, it can actually
still be loaded in postmaster, and I withdraw that argument :)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-28 Thread Pavel Stehule
2014-08-28 14:22 GMT+02:00 Fujii Masao masao.fu...@gmail.com:

 On Thu, Aug 28, 2014 at 5:48 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  comments?

 +fprintf(output, _(  ECHO   control what input is
 written to standard output [all, queries]\n));

 The valid values in the help messages should be consistent with
 the values that the tab-completion displays. So in the case of ECHO,
 errors and none also should be added in the message. Thought?

 In the help messages of some psql variables like ECHO_HIDDEN, valid
 values are not explained. Why not?


it is based on http://www.postgresql.org/docs/9.4/static/app-psql.html

ECHO_HIDDEN

When this variable is set and a backslash command queries the database, the
query is first shown. This way you can study the PostgreSQL internals and
provide similar functionality in your own programs. (To select this
behavior on program start-up, use the switch -E.) If you set the variable
to the value noexec, the queries are just shown but are not actually sent
to the server and executed.
There is no clear set of valid values :( ... When I found a known set of
fields in the docs, I used it.

Regards

Pavel



 Regards,

 --
 Fujii Masao



Re: [HACKERS] Escaping from blocked send() reprised.

2014-08-28 Thread Kyotaro HORIGUCHI
Hello, sorry for the dazed reply in the previous mail.

I have made a revised patch for this issue.

Attached patches are following,

- 0001_Revise_socket_emulation_for_win32_backend.patch

  Revises socket emulation on win32 backend so that each socket
  can have its own blocking mode state.

- 0002_Allow_backend_termination_during_write_blocking.patch

  The patch to solve the issue. This patch depends on the 0001_
  patch.

==

  I'm marking this as Waiting on Author in the commitfest app, because:
  1. the protocol violation needs to be avoided one way or another, and
  2. the behavior needs to be consistent so that a single
  pg_terminate_backend() is enough to always kill the connection.

- Preventing protocol violation.

  To prevent a protocol violation, secure_write sets
  ClientConnectionLost when a SIGTERM is detected; then
  internal_flush() and ProcessInterrupts() follow the
  instruction.

- Single pg_terminate_backend surely kills the backend.

  secure_raw_write() uses a non-blocking socket and a loop of
  select() with a timeout so that a received signal (SIGTERM)
  is reliably detected.

  To avoid frequent switching of the blocking mode, the bare socket
  for Port is put into non-blocking mode from the start in
  StreamConnection(), and the blocking mode is controlled only by
  Port->noblock in secure_raw_read/write().

  To make the code mentioned above (Patch 0002) tidy, rewrite the
  socket emulation code for win32 backends so that each socket
  can have its own non-blocking state. (patch 0001)

Some concern about this patch,

- This patch limits the number of non-blocking sockets to below
  64 (FD_SETSIZE) on the win32 backend, but that seems to be sufficient.

- This patch introduces redundant socket emulation for the win32
  backend, but the win32 bare socket for Port is already non-blocking
  as described, so it doesn't seem to be a serious problem for
  performance. In addition, since I don't know the reason why
  win32/socket.c provides the blocking-mode socket emulation, I
  decided to preserve the blocking socket emulation in win32/socket.c.
  Possibly it can be removed.

Any suggestions?
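
For reference, the user-level goal is that a single call from another session
now suffices even when the backend is blocked in send() (a sketch; 12345
stands in for the stuck backend's pid):

SELECT pg_terminate_backend(12345);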

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..c92851e 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -795,10 +795,6 @@ pq_set_nonblocking(bool nonblocking)
	if (MyProcPort->noblock == nonblocking)
 		return;
 
-#ifdef WIN32
-	pgwin32_noblock = nonblocking ? 1 : 0;
-#else
-
 	/*
 	 * Use COMMERROR on failure, because ERROR would try to send the error to
 	 * the client, which might require changing the mode again, leading to
@@ -816,7 +812,7 @@ pq_set_nonblocking(bool nonblocking)
 			ereport(COMMERROR,
	(errmsg("could not set socket to blocking mode: %m")));
 	}
-#endif
+
	MyProcPort->noblock = nonblocking;
 }
 
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index c981169..f0ff3e7 100644
--- a/src/backend/port/win32/socket.c
+++ b/src/backend/port/win32/socket.c
@@ -21,11 +21,8 @@
  * non-blocking mode in order to be able to deliver signals, we must
  * specify this in a separate flag if we actually need non-blocking
  * operation.
- *
- * This flag changes the behaviour *globally* for all socket operations,
- * so it should only be set for very short periods of time.
  */
-int			pgwin32_noblock = 0;
+static fd_set		nonblockset;
 
 #undef socket
 #undef accept
@@ -33,6 +30,7 @@ int			pgwin32_noblock = 0;
 #undef select
 #undef recv
 #undef send
+#undef closesocket
 
 /*
  * Blocking socket functions implemented so they listen on both
@@ -40,6 +38,34 @@ int			pgwin32_noblock = 0;
  */
 
 /*
+ * Set blocking mode for each socket
+ */
+void
+pgwin32_set_socket_nonblock(SOCKET s, int nonblock)
+{
+	if (nonblock)
+		FD_SET(s, &nonblockset);
+	else
+		FD_CLR(s, &nonblockset);
+
+	/*
+	 * fd_set cannot have more than FD_SETSIZE entries. It's not likely to come
+	 * close to this limit but if it goes above the limit, non blocking state
+	 * of some existing sockets will be discarded.
+	 */
+	if (nonblockset.fd_count >= FD_SETSIZE)
+		elog(FATAL, "Too many sockets requested to be nonblocking mode.");
+}
+
+void
+pgwin32_nonblockset_init()
+{
+	FD_ZERO(&nonblockset);
+}
+
+ #define socket_is_nonblocking(s) FD_ISSET((s), &nonblockset)
+
+/*
  * Convert the last socket error code into errno
  */
 static void
@@ -256,6 +282,10 @@ pgwin32_socket(int af, int type, int protocol)
 		TranslateSocketError();
 		return INVALID_SOCKET;
 	}
+
+	/* newly created socket should be in blocking mode */
+	pgwin32_set_socket_nonblock(s, false);
+
 	errno = 0;
 
 	return s;
@@ -334,7 +364,7 @@ pgwin32_recv(SOCKET s, char *buf, int len, int f)
 		return -1;
 	}
 
-	if (pgwin32_noblock)
+	if (socket_is_nonblocking(s))
 	{
 		/*
 		 * No data received, and we are in emulated non-blocking mode, so
@@ -420,7 +450,7 @@ pgwin32_send(SOCKET s, const void *buf, int len, int flags)
 			return -1;
 		}
 
-		if 

Re: [HACKERS] Selectivity estimation for inet operators

2014-08-28 Thread Heikki Linnakangas

On 08/26/2014 12:44 PM, Emre Hasegeli wrote:

I agree with you that we can support other join type and anti join later,
If others don’t have any objection in doing other parts later I will mark as Ready 
For Committer.


I updated the patch to cover semi and anti joins with eqjoinsel_semi().
I think it is better than returning a constant.  The new version
attached with the new version of the test script.  Can you please
look at it again and mark it as ready for committer if it seems okay
to you?


I took a quick look at this. Some questions:

* Isn't X  Y equivalent to network_scan_first(X)  Y AND 
network_scan_last(X)  Y? Or at least close enough for selectivity 
estimation purposes? Pardon my ignorance - I'm not too familiar with the 
inet datatype - but how about just calling scalarineqsel for both bounds?


* inet_mcv_join_selec() is O(n^2) where n is the number of entries in 
the MCV lists. With the max statistics target of 10000, a worst case 
query on my laptop took about 15 seconds to plan. Maybe that's 
acceptable, but you went through some trouble to make planning of MCV vs 
histogram faster, by the log2 method to compare only some values, so I 
wonder why you didn't do the same for the MCV vs MCV case?
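
For what it's worth, a rough sketch of the log2-capping idea mentioned
above - hypothetical names and a deliberately simplified structure,
not the patch's actual code:

#include <math.h>

typedef struct
{
	double		value;			/* stand-in for an inet MCV value */
	double		freq;
} McvEntry;

/*
 * Compare each outer MCV entry against only about log2(nb) inner
 * entries, turning the O(na * nb) estimate into O(na * log nb) at
 * the price of some precision.
 */
static double
mcv_join_selec_capped(const McvEntry *a, int na,
					  const McvEntry *b, int nb,
					  int (*match) (double, double))
{
	double		selec = 0.0;
	int			max_b = (int) ceil(log2((double) nb + 1.0));

	if (max_b > nb)
		max_b = nb;

	for (int i = 0; i < na; i++)
		for (int j = 0; j < max_b; j++)
			if (match(a[i].value, b[j].value))
				selec += a[i].freq * b[j].freq;

	return selec;
}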


* A few typos: lenght -> length.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] alter user set local_preload_libraries.

2014-08-28 Thread Kyotaro HORIGUCHI
Hello,

 On Thu, 2014-07-03 at 13:05 +0900, Kyotaro HORIGUCHI wrote:
  For the earlier than 9.4, no one seems to have considered
  seriously to use local_preload_library via ALTER statements so
  far. Only document fix would be enough for them.
 
 I think local_preload_libraries never actually worked sensibly for any
 specific purpose.  It was designed to work with the pldebugger, but
 pldebugger doesn't use it anymore.

Hmm, I see although I don't know pldebugger.

 So before we start bending the GUC system into new directions, let's ask
 ourselves what the current practical application of
 local_preload_libraries is and whether the proposed changes support that
 use at all.  I suspect there aren't any, and we should leave it alone.

I found this issue when trying per-pg_user (role) loading of
auto_analyze and some tweaking tool. It is not necessarily set by
the user by own, but the function to decide whether to load some
module by the session-user would be usable, at least, as for me:)

 I agree that we should fix the documentation in 9.3 and earlier.

It seems rather hard work if local_preload_library itself is not
removed or fixed as expressed..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread David G Johnston
rohtodeveloper wrote
 I have a question about data type timestamp with time zone.
 Why data of timestamptz does not store value of timezone passed to it?
 
 The timezone of output(+08) is different with the original input
 value(+02).
 It seems not to be good behavior.

It's good for the innumerable people who use it every day without
difficulty...

The why is that the goal of timestamptz is to represent a single
point-in-time.  For all practical purposes the introduction of timezones
simply allows for multiple equivalent representations of said point. 
Postgres has simply chosen UTC as the canonical representation for storage
purposes and uses client-provided timezone information to transform the
stored value into the equivalent representation that is thought to be most
useful to the user.
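
As a concrete illustration that "+02" and "+08" here are just two
renderings of one instant, here is a small standalone C sketch -
hand-rolled, not PostgreSQL code; timegm() is a common BSD/GNU
extension - normalizing both to the same UTC epoch value:

#include <stdio.h>
#include <time.h>

/* local wall-clock time plus a UTC offset (hours) -> seconds since epoch */
static long long
to_epoch(int y, int mo, int d, int h, int mi, int s, int off_hours)
{
	struct tm	tm = {0};

	tm.tm_year = y - 1900;
	tm.tm_mon = mo - 1;
	tm.tm_mday = d;
	tm.tm_hour = h;
	tm.tm_min = mi;
	tm.tm_sec = s;
	return (long long) timegm(&tm) - (long long) off_hours * 3600;
}

int
main(void)
{
	long long	a = to_epoch(2014, 8, 28, 14, 30, 30, +2);	/* ...+02 */
	long long	b = to_epoch(2014, 8, 28, 20, 30, 30, +8);	/* ...+08 */

	printf("same instant: %d\n", a == b);	/* prints 1 */
	return 0;
}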


 But the behavior of date type time with time zone is correct.
 
 postgres=# select '14:30:30.423602+02'::time with time zone;
        timetz
 --------------------
  14:30:30.423602+02
 (1 row)

Inconsistent (wrt timestamptz), and possibly buggy (though doubtful,
consistency is not mandatory), but the documentation itself says that time
with time zone has problematic properties mandated by the SQL standard.

The issue is that without knowing the date within a given timezone one does
not know the adjustment value to use.  TimeZones are inherently date
dependent - so timetz is fundamentally flawed even if it can be used to good
effect in limited situations.

If this does what you need then create a composite type (date, timetz). 
Once you start doing modifications to your custom type you will likely
find the timestamptz behavior to be more useful and accurate.
 

 If the current behavior of timestamptz is not suitable, is there any plan
 to correct the behavior of timestamptz or create a new data type which can
 store timestamp with timezone?

Timestamptz will never be changed from its current behavior.

The bar to introduce another timestamptz-like data type with different
behavior is extremely high.


It would probably be worthwhile for everyone if you share what you are
actually trying to accomplish instead of just throwing out the claim that
the data type is broken.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Why-data-of-timestamptz-does-not-store-value-of-timezone-passed-to-it-tp5816703p5816737.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Specifying the unit in storage parameter

2014-08-28 Thread Alvaro Herrera
Michael Paquier wrote:

 This remark is just to limit the amount of trash in the database used
 for regression tests. But then if we'd remove everything we would lack
 handy material for tests on utilities like database-wide thingies of
 the type VACUUM, REINDEX, pg_dump, etc. And we can just drop the
 database used for regressions to clean up everything. So that's not
 mandatory at all. I tend to always clean up objects in my patches
 touching regressions to limit interactions with other tests, but I
 guess that's up to the person who wrote the code to decide.

Leaving lingering objects is not a bad thing, particularly if they have
unusual properties; they enable somebody pg_dump'ing the database which
can be a good test for pg_dump.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Aug 28, 2014 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 27, 2014 at 6:40 AM, Magnus Hagander mag...@hagander.net wrote:
 Key and cert files are loaded in the postmaster. We'd need to change
 that.

 Why?

 Hmm. That's actually a good point. Not sure I have an excuse. They
 could certainly be made BACKEND without that, and there's no way to
 change it within a running backend *anyway*, since we cannot turn
 on/off SSL once a connection has been made. So yeah, it can actually
 still be loaded in postmaster, and I withdraw that argument :)

Why would they need to be BACKEND, as opposed to just PGC_SIGHUP?
The only reason they're PGC_POSTMASTER is the lack of any code
for loading updated values, which I assume is something that's
possible with OpenSSL.

We could in fact wait to load them until after a backend has forked off
from the postmaster, but (1) that'd slow down session startup, and (2)
it would mean that you don't hear about broken settings at postmaster
startup.

(BTW, what happens on Windows?  I imagine we have to reload them anyway
after fork/exec on that platform ...)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Magnus Hagander
On Thu, Aug 28, 2014 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Thu, Aug 28, 2014 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 27, 2014 at 6:40 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 Key and cert files are loaded in the postmaster. We'd need to change
 that.

 Why?

 Hmm. That's actually a good point. Not sure I have an excuse. They
 could certainly be made BACKEND without that, and there's no way to
 change it within a running backend *anyway*, since we cannot turn
 on/off SSL once a connection has been made. So yeah, it can actually
 still be loaded in postmaster, and I withdraw that argument :)

 Why would they need to be BACKEND, as opposed to just PGC_SIGHUP?
 The only reason they're PGC_POSTMASTER is the lack of any code
 for loading updated values, which I assume is something that's
 possible with OpenSSL.

I just thought semantically - because they do not change in a running
backend. Any running backend will continue with encryption set up
based on the old certificate.


 We could in fact wait to load them until after a backend has forked off
 from the postmaster, but (1) that'd slow down session startup, and (2)
 it would mean that you don't hear about broken settings at postmaster
 startup.

 (BTW, what happens on Windows?  I imagine we have to reload them anyway
 after fork/exec on that platform ...)

Yes, we already do that - secure_initialize() is called in SubPostmasterMain().

But I think reloading them in the postmaster on Unix is the better choice, yes.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Aug 28, 2014 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Why would they need to be BACKEND, as opposed to just PGC_SIGHUP?

 I just thought semantically - because they do not change in a running
 backend. Any running backend will continue with encryption set up
 based on the old certificate.

Hm.  Yeah, I guess there is some use in holding onto the values that were
actually used to initialize the current session, or at least there would
be if we exposed the cert contents in any fashion.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Allow units to be specified in relation option setting value.

2014-08-28 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 The patch attached fixes pg_upgrade by putting quotes when generating
 reloptions and it passes check-world. I imagine that having quotes by
 default in the value of reloptions in pg_class is the price to pay for
 supporting units. If this is not considered as a correct approach,
 then reverting the patch would be wiser I guess.

Ugh.  I'm not sure what the best solution is, but I don't think I like
that one.

Given that this patch broke both pg_dump and pg_upgrade, I think it should
be reverted for the time being, rather than applying a hasty hack.
Obviously more thought and testing is needed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Andres Freund
On 2014-08-28 10:12:19 -0400, Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  On Thu, Aug 28, 2014 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Why would they need to be BACKEND, as opposed to just PGC_SIGHUP?
 
  I just thought semantically - because they do not change in a running
  backend. Any running backend will continue with encryption set up
  based on the old certificate.
 
 Hm.  Yeah, I guess there is some use in holding onto the values that were
 actually used to initialize the current session, or at least there would
 be if we exposed the cert contents in any fashion.

Won't that allow the option to be specified at connection start by mere
mortal users? That sounds odd to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-28 10:12:19 -0400, Tom Lane wrote:
 Hm.  Yeah, I guess there is some use in holding onto the values that were
 actually used to initialize the current session, or at least there would
 be if we exposed the cert contents in any fashion.

 Won't that allow the option to be specified at connection start by mere
 mortal users? That sounds odd to me.

Well, no, because SSL would be established (or not) before we ever process
the contents of the connection request packet.  You might be able to
change the value that SHOW reports, but not the value actually governing
your session.

Having said that, there's a nearby thread about inventing a SU_BACKEND
GUC category, and that's likely what we'd really want to use here, just
on the grounds that superusers would know better.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Magnus Hagander
On Thu, Aug 28, 2014 at 4:14 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-28 10:12:19 -0400, Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  On Thu, Aug 28, 2014 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Why would they need to be BACKEND, as opposed to just PGC_SIGHUP?

  I just thought semantically - because they do not change in a running
  backend. Any running backend will continue with encryption set up
  based on the old certificate.

 Hm.  Yeah, I guess there is some use in holding onto the values that were
 actually used to initialize the current session, or at least there would
 be if we exposed the cert contents in any fashion.

 Won't that allow the option to be specified at connection start by mere
 mortal users? That sounds odd to me.

The cert is (and has to be) loaded before we even read the startup
packet, so there is no way for them to actually send the value over
early enough I believe.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Audit of logout

2014-08-28 Thread Amit Kapila
On Wed, Aug 27, 2014 at 5:19 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Sat, Aug 23, 2014 at 3:44 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
  On Tue, Aug 5, 2014 at 8:04 PM, Fujii Masao masao.fu...@gmail.com
wrote:
  Changing PGC_SU_BACKEND parameter (log_connections) is
  visible even with a non-super user client due to above code.
  Shouldn't it be only visible for super-user logins?
 
  Simple steps to reproduce the problem:
  a. start Server (default configuration)
  b. connect with superuser
  c. change in log_connections to on in postgresql.conf
  d. perform select pg_reload_conf();
  e. connect with non-super-user
 f.  show log_connections;  --This step shows the value as on,
 --whereas I think it should have been off

 In this case, log_connections is changed in postgresql.conf and it's
 reloaded, so ISTM that it's natural that even non-superuser sees the
 changed value. No? Maybe I'm missing something.

Yeah, you are right.

With the latest patch, I am getting one regression failure on windows.
Attached is the regression diff.

Can we improve this line a bit?
!  * BACKEND options are the same as SU_BACKEND ones, but they can
BACKEND options can be set same as SU_BACKEND ones, ..


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-08-28 Thread Andreas Karlsson

On 08/28/2014 04:43 AM, Peter Geoghegan wrote:

-- Nesting within wCTE:
WITH t AS (
 INSERT INTO z SELECT i, 'insert'
 FROM generate_series(0, 16) i
 ON CONFLICT UPDATE SET v = v || 'update' -- use of operators/functions in targetlist
 RETURNING * -- only projects inserted tuples, never updated tuples
)
SELECT * FROM t JOIN y ON t.k = y.a ORDER BY a, k;


Personally I would find it surprising if RETURNING did not also return 
the updated tuples. In many use cases for upsert the user does not care 
if the row was new or not.


What I think would be useful is if all tuples were returned but there 
was some way to filter out only the inserted ones.


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Andres Freund
On 2014-08-28 10:20:08 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-28 10:12:19 -0400, Tom Lane wrote:
  Hm.  Yeah, I guess there is some use in holding onto the values that were
  actually used to initialize the current session, or at least there would
  be if we exposed the cert contents in any fashion.
 
  Won't that allow the option to be specified at connection start by mere
  mortal users? That sounds odd to me.
 
 Well, no, because SSL would be established (or not) before we ever process
 the contents of the connection request packet.  You might be able to
 change the value that SHOW reports, but not the value actually governing
 your session.

Sure. There's probably nothing happening with ssl 'that late' right
now. But and it seems like a invitation for trouble later to me. Even if
it's just that other code copies the logic, thinking it'd also be safe.

  Having said that, there's a nearby thread about inventing a SU_BACKEND
 GUC category, and that's likely what we'd really want to use here, just
 on the grounds that superusers would know better.

What we really want is PGC_SIGHUP | PGC_BACKEND, right? I wonder if it's
not better to allow for some cominations of GucContext's by bitmask,
instead of adding SUBACKEND and HUPBACKEND and whatever else might make
sense.
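
A hypothetical sketch of what such a combinable scheme might look like
- all names here are invented for illustration, not existing GUC code:

typedef enum
{
	GUC_SET_POSTMASTER = 1 << 0,	/* only at server start */
	GUC_SET_SIGHUP = 1 << 1,		/* reloadable on SIGHUP */
	GUC_SET_BACKEND = 1 << 2,		/* frozen at backend start */
	GUC_SET_SUSET = 1 << 3			/* only superusers may set */
} GucSetFlags;

/* "reload for new sessions, fixed within a session" would then be: */
#define GUC_SSL_FILES_CONTEXT	(GUC_SET_SIGHUP | GUC_SET_BACKEND)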

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-28 10:20:08 -0400, Tom Lane wrote:
 Having said that, there's a nearby thread about inventing a SU_BACKEND
 GUC category, and that's likely what we'd really want to use here, just
 on the grounds that superusers would know better.

 What we really want is PGC_SIGHUP | PGC_BACKEND, right?

How you figure that?  You *don't* want it to change at SIGHUP, at least
not in existing sessions.  And the postmaster already thinks it should
reload PGC_BACKEND variables on SIGHUP.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] re-reading SSL certificates during server reload

2014-08-28 Thread Andres Freund
On 2014-08-28 10:30:30 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-28 10:20:08 -0400, Tom Lane wrote:
  Having said that, there's a nearby thread about inventing a SU_BACKEND
  GUC category, and that's likely what we'd really want to use here, just
  on the grounds that superusers would know better.
 
  What we really want is PGC_SIGHUP | PGC_BACKEND, right?
 
 How you figure that?  You *don't* want it to change at SIGHUP, at least
 not in existing sessions.  And the postmaster already thinks it should
 reload PGC_BACKEND variables on SIGHUP.

Well, it shouldn't be settable at connection start but should change for new
connections (expressed by _SIGHUP). But it shouldn't change for existing
connections (therefore PGC_BACKEND).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-28 Thread Andres Freund
On 2014-08-28 14:04:27 +0200, Petr Jelinek wrote:
 On 28/08/14 13:20, Andres Freund wrote:
 Hi,
 
 Stuff I wondered about:
 * How about making it --help=variables instead of --help-variables?
 
 -1, help is not a variable to be assigned imho

I don't think variable assignment is a good mental model for long
commandline arguments. And it's not like I'm the first to come up with
an extensible --help. Check e.g. gcc.

But anyway, I guess I've lost that argument.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-28 Thread Claudio Freire
On Thu, Aug 28, 2014 at 3:27 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
 Hello Aidan,


 If all you want is to avoid the write storms when fsyncs start happening
 on
 slow storage, can you not just adjust the kernel vm.dirty* tunables to
 start making the kernel write out dirty buffers much sooner instead of
 letting them accumulate until fsyncs force them out all at once?


 I tried that by setting:
   vm.dirty_expire_centisecs = 100
   vm.dirty_writeback_centisecs = 100

 So it should start writing returned buffers at most 2s after they are
 returned, if I understood the doc correctly, instead of at most 35s.

 The result is that with a 5000s 25tps pretty small load (the system can do
 300tps with the default configuration), I lost 2091 (1.7%) of transactions,
 that is they were beyond the 200ms schedule limit. Another change is that
 overall the lost transactions are more spread than without this setting,
 although there still is stretches of unresponsiveness.

 So although the situation is significantly better, it is still far from good
 with the much reduced value I tried.


What do the checkpoint logs look like now? (especially interested in sync times)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need Multixact Freezing Docs

2014-08-28 Thread Alvaro Herrera
Josh Berkus wrote:
 On 04/16/2014 01:30 PM, Alvaro Herrera wrote:
  Josh Berkus wrote:
 
  You can see the current multixact value in pg_controldata output.  Keep
  timestamped values of that somewhere (a table?) so that you can measure
  consumption rate.  I don't think we provide SQL-level access to those
  values.
 
  Bleh.  Do we provide SQL-level access in 9.4?  If not, I think that's a
  requirement before release.
  
  Yeah, good idea.  Want to propose a patch?
 
 Yeah, lemme dig into this.  I really think we need it for 9.4, feature
 frozen or not.

Ping?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-08-28 Thread Robert Haas
On Tue, Aug 26, 2014 at 12:19 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood
 mark.kirkw...@catalyst.net.nz wrote:
 I could think of 2 ways to change this:
 
 a. if user has specified cost_limit value for table, then it just uses it
  rather than rebalancing based on value of system-wide guc variable
  autovacuum_vacuum_cost_limit
 b. another could be to restrict setting per-table value to be lesser than
  system-wide value?
 
 The former is used for auto vacuum parameters like scale_factor and
 later is used for parameters like freeze_max_age.
 
 I've been giving some thought to this.  Really, there is no way to
 handle this sensibly while at the same time keeping the documented
 behavior -- or in other words, what we have documented is not useful
 behavior.  Your option (b) above is an easy solution to the problem,
 however it means that the user will have serious trouble configuring the
 system in scenarios such as volatile tables, as Mark says -- essentially
 that will foreclose the option of using autovacuum for them.

 I'm not sure I like your (a) proposal much better.  One problem there is
 that if you set the values for a table to be exactly the same values as
 in postgresql.conf, it will behave completely different because it will
 not participate in balancing.  To me this seems to violate POLA.

 So my proposal is a bit more complicated.  First we introduce the notion
 of a single number, to enable sorting and computations: the delay
 equivalent, ...

I favor option (a).   There's something to be said for your proposal
in terms of logical consistency with what we have now, but to be
honest I'm not sure it's the behavior anyone wants (I would welcome
more feedback on what people actually want).  I think we should view
an attempt to set a limit for a particular table as a way to control
the rate at which that table is vacuumed - period.

At least in situations that I've encountered, it's typical to be able
to determine the frequency with which a given table needs to be
vacuumed to avoid runaway bloat, and from that you can work backwards
to figure out how fast you must process it in MB/s, and from there you
can work backwards to figure out what cost delay will achieve that
effect.  But if the system tinkers with the cost delay under the hood,
then you're vacuuming at a different (slower) rate and, of course, the
table bloats.
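
To make that "work backwards" arithmetic concrete, here is a
back-of-envelope sketch using PostgreSQL's default cost constants; the
4 MB/s target rate is an illustrative assumption:

#include <stdio.h>

int
main(void)
{
	const double page_kb = 8.0;			/* block size in KB */
	const double cost_per_page = 10.0;	/* vacuum_cost_page_miss default */
	const double cost_limit = 200.0;	/* vacuum_cost_limit default */
	double		target_mb_per_s = 4.0;	/* assumed required rate */

	double		pages_per_s = target_mb_per_s * 1024.0 / page_kb;
	double		rounds_per_s = pages_per_s / (cost_limit / cost_per_page);
	double		delay_ms = 1000.0 / rounds_per_s;

	printf("need %.0f pages/s -> vacuum_cost_delay ~ %.1f ms\n",
		   pages_per_s, delay_ms);		/* 512 pages/s -> ~39.1 ms */
	return 0;
}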

Now, in the case where you are setting an overall limit, there is at
least an argument to be made that you can determine the overall rate
of autovacuum-induced I/O activity that the system can tolerate, and
set your limits to stay within that budget, and then let the system
decide how to divide that I/O up between workers.  But if you're
overriding a per-table limit, I don't really see how that holds any
water.  The system I/O budget doesn't go up just because one
particular table is being vacuumed rather than any other.  The only
plausible use case for setting a per-table rate that I can see is when
you actually want the system to use that exact rate for that
particular table.

I might be missing something, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Allow units to be specified in relation option setting value.

2014-08-28 Thread Fujii Masao
On Thu, Aug 28, 2014 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 The patch attached fixes pg_upgrade by putting quotes when generating
 reloptions and it passes check-world. I imagine that having quotes by
 default in the value of reloptions in pg_class is the price to pay for
 supporting units. If this is not considered as a correct approach,
 then reverting the patch would be wiser I guess.

 Ugh.  I'm not sure what the best solution is, but I don't think I like
 that one.

Another approach is to change pg_dump so that it encloses the relopt
values with single quotes. This is the same approach as what
pg_dumpall does for database or role-specific settings. Obvious
drawback of this approach is that it prevents pg_dump with 9.4 or
before from outputting the restorable dump. Maybe we can live with
this because there is no guarantee that older version of pg_dump can
work properly with newer major version of server. But maybe
someone cannot live with that. Not sure.

Further other approach is to change the reloptions code so that it
always stores the plain value without the units (i.e., 1000 is stored
if 1s is specified in autovacuum_vacuum_cost_delay) in pg_class.
This approach doesn't prevent older version of pg_dump from taking
the dump from v9.5 server. One drawback of this approach is that
reloption values are always stored without the units, which might
make it a bit hard for a user to see the reloption values from pg_class.
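
As a rough sketch of that "store the plain value" idea - illustrative
standalone C, not the reloptions code - the unit is parsed at SET time
and only the canonical integer is kept, so what lands in pg_class is
unitless:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* parse "N", "Nms", "Ns" or "Nmin" into milliseconds; returns 1 on success */
static int
parse_delay_ms(const char *val, long *result)
{
	char	   *end;
	long		v = strtol(val, &end, 10);

	if (end == val)
		return 0;				/* no digits at all */
	if (*end == '\0' || strcmp(end, "ms") == 0)
		*result = v;
	else if (strcmp(end, "s") == 0)
		*result = v * 1000L;	/* "1s" would be stored as 1000 */
	else if (strcmp(end, "min") == 0)
		*result = v * 60L * 1000L;
	else
		return 0;				/* unknown unit */
	return 1;
}

int
main(void)
{
	long		ms;

	if (parse_delay_ms("1s", &ms))
		printf("%ld\n", ms);	/* prints 1000 */
	return 0;
}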

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] implement subject alternative names support for SSL connections

2014-08-28 Thread Alexey Klyukin
On Mon, Aug 25, 2014 at 12:02 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 08/24/2014 03:11 PM, Alexey Klyukin wrote:

 On Wed, Aug 20, 2014 at 11:53 AM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:


  The patch doesn't seem to support wildcards in alternative names. Is
 that on purpose?


Not really, we just did not have any use case for them, but it seems that
RFC 5280 does not disallow them:

  Finally, the semantics of subject alternative names that include
   wildcard characters (e.g., as a placeholder for a set of names) are
   not addressed by this specification.  Applications with specific
   requirements MAY use such names, but they must define the semantics.

I've added support for them in the next iteration of the patch attached to
this email.



 It would be good to add a little helper function that does the NULL-check,
 straight comparison, and wildcard check, for a single name. And then use
 that for the Common Name and all the Alternatives. That'll ensure that all
 the same rules apply whether the name is the Common Name or an Alternative
 (assuming that the rules are supposed to be the same; I don't know if
 that's true).


Thanks, common code has been moved into a separate new function.

Another question is how we should treat certificates with no CN and a
non-empty SAN.

Current code just bails out right after finding no CN section present, but
the RFC (5280) says:

   Further, if the only subject identity included in the certificate is
   an alternative name form (e.g., an electronic mail address), then the
   subject distinguished name MUST be empty (an empty sequence), and the
   subjectAltName extension MUST be present.

which to me sounds like the possibility of coming across such certificates
in the wild, although I personally see little use in them.
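
For readers skimming the (truncated) patch below: iterating the
dNSName SAN entries with OpenSSL looks roughly like this sketch, where
check_name() is a hypothetical stand-in for the
certificate_name_entry_validate_match() helper:

#include <openssl/x509v3.h>

static int
scan_alt_names(X509 *cert, int (*check_name) (const char *, int))
{
	int			found = 0;
	GENERAL_NAMES *names = X509_get_ext_d2i(cert, NID_subject_alt_name,
											NULL, NULL);

	if (names == NULL)
		return 0;				/* no SAN extension present */

	for (int i = 0; i < sk_GENERAL_NAME_num(names); i++)
	{
		const GENERAL_NAME *gn = sk_GENERAL_NAME_value(names, i);

		if (gn->type != GEN_DNS)
			continue;			/* only dNSName entries matter here */
		found |= check_name((const char *) ASN1_STRING_data(gn->d.dNSName),
							ASN1_STRING_length(gn->d.dNSName));
	}
	sk_GENERAL_NAME_pop_free(names, GENERAL_NAME_free);
	return found;
}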

Regards,
-- 
Alexey Klyukin
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
new file mode 100644
index f950fc3..394a66f
*** a/src/interfaces/libpq/fe-secure-openssl.c
--- b/src/interfaces/libpq/fe-secure-openssl.c
***************
*** 60,68 ****
--- 60,73 ----
  #ifdef USE_SSL_ENGINE
  #include <openssl/engine.h>
  #endif
+ #include <openssl/x509v3.h>
  
  static bool verify_peer_name_matches_certificate(PGconn *);
  static int	verify_cb(int ok, X509_STORE_CTX *ctx);
+ static int	certificate_name_entry_validate_match(PGconn *conn,
+ 									  char *name,
+ 									  unsigned int len,
+ 									  bool *match);
  static void destroy_ssl_system(void);
  static int	initialize_SSL(PGconn *conn);
  static PostgresPollingStatusType open_client_SSL(PGconn *);
*** wildcard_certificate_match(const char *p
*** 471,479 ****
  	return 1;
  }
  
  
  /*
!  *	Verify that common name resolves to peer.
   */
  static bool
  verify_peer_name_matches_certificate(PGconn *conn)
--- 476,515 ----
  	return 1;
  }
  
+ /*
+  * Validate a single certificate name entry and match it against the pghost.
+  * Returns 0 if the certificate name is invalid (contains embedded NULLs), 1 otherwise.
+  */
+ static int
+ certificate_name_entry_validate_match(PGconn *conn, char *name, unsigned int len, bool *match)
+ {
+ 	/* There is no guarantee the string returned from the certificate is NULL-terminated */
+ 	name[len] = '\0';
+ 	*match = false;
+ 	/*
+ 	 * Reject embedded NULLs in certificate common or alternative name to prevent attacks
+ 	 * like CVE-2009-4034.
+ 	 */
+ 	if (len != strlen(name))
+ 	{
+ 		printfPQExpBuffer(&conn->errorMessage,
+ 						  libpq_gettext("SSL certificate's common name contains embedded null\n"));
+ 		return 0;
+ 	}
+ 	if (pg_strcasecmp(name, conn->pghost) == 0)
+ 		/* Exact name match */
+ 		*match = true;
+ 	else if (wildcard_certificate_match(name, conn->pghost))
+ 		/* Matched wildcard certificate */
+ 		*match = true;
+ 	else
+ 		*match = false;
+ 	return 1;
+ }
+ 
  
  /*
!  *	Verify that common name or any of the alternative dNSNames resolves to peer.
   */
  static bool
  verify_peer_name_matches_certificate(PGconn *conn)
*** verify_peer_name_matches_certificate(PGc
*** 492,499 ****
  
  	/*
  	 * Extract the common name from the certificate.
- 	 *
- 	 * XXX: Should support alternate names here
  	 */
  	/* First find out the name's length and allocate a buffer for it. */
  	len = X509_NAME_get_text_by_NID(X509_get_subject_name(conn->peer),
--- 528,533 ----
*** verify_peer_name_matches_certificate(PGc
*** 522,540 ****
 

[HACKERS] Re: [COMMITTERS] pgsql: Allow units to be specified in relation option setting value.

2014-08-28 Thread Robert Haas
On Thu, Aug 28, 2014 at 12:22 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Aug 28, 2014 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 The patch attached fixes pg_upgrade by putting quotes when generating
 reloptions and it passes check-world. I imagine that having quotes by
 default in the value of reloptions in pg_class is the price to pay for
 supporting units. If this is not considered as a correct approach,
 then reverting the patch would be wiser I guess.

 Ugh.  I'm not sure what the best solution is, but I don't think I like
 that one.

 Another approach is to change pg_dump so that it encloses the relopt
 values with single quotes. This is the same approach as what
 pg_dumpall does for database or role-specific settings. Obvious
 drawback of this approach is that it prevents pg_dump with 9.4 or
 before from outputting the restorable dump. Maybe we can live with
 this because there is no guarantee that older version of pg_dump can
 work properly with newer major version of server. But maybe
 someone cannot live with that. Not sure.

To me, this doesn't seem nearly important enough to justify breaking
pg_dump compatibility.  AIUI, this is just a cosmetic improvement, so
we shouldn't break functional things for that.

 Further other approach is to change the reloptions code so that it
 always stores the plain value without the units (i.e., 1000 is stored
  if 1s is specified in autovacuum_vacuum_cost_delay) in pg_class.
 This approach doesn't prevent older version of pg_dump from taking
 the dump from v9.5 server. One drawback of this approach is that
 reloption values are always stored without the units, which might
 make it a bit hard for a user to see the reloption values from pg_class.

This seems like the way to go.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need Multixact Freezing Docs

2014-08-28 Thread Josh Berkus
On 08/28/2014 09:09 AM, Alvaro Herrera wrote:
 Josh Berkus wrote:
 On 04/16/2014 01:30 PM, Alvaro Herrera wrote:
 Josh Berkus wrote:

 You can see the current multixact value in pg_controldata output.  Keep
 timestamped values of that somewhere (a table?) so that you can measure
 consumption rate.  I don't think we provide SQL-level access to those
 values.

 Bleh.  Do we provide SQL-level access in 9.4?  If not, I think that's a
 requirement before release.

 Yeah, good idea.  Want to propose a patch?

 Yeah, lemme dig into this.  I really think we need it for 9.4, feature
 frozen or not.

Got sidetracked by JSONB.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] implement subject alternative names support for SSL connections

2014-08-28 Thread Alexey Klyukin
On Mon, Aug 25, 2014 at 12:33 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 08/25/2014 01:07 PM, Andres Freund wrote:

 On 2014-08-25 13:02:50 +0300, Heikki Linnakangas wrote:

 But actually, I wonder if we should delegate the whole hostname matching
 to
 OpenSSL? There's a function called X509_check_host for that, although
 it's
 new in OpenSSL 1.1.0 so we'd need to add a configure test for that and
 keep
 the current code to handle older versions.


 Given that we're about to add support for other SSL implementations I'm
 not sure that that's a good idea. IIRC there exist quite a bit of
 different interpretations about what denotes a valid cert between the
 libraries.



 As long as just this patch is concerned, I agree it's easier to just
 implement it ourselves, but if we want to start implementing more
 complicated rules, then I'd rather not get into that business at all, and
 let the SSL library vendor deal with the bugs and CVEs.



Sounds reasonable.



 I guess we'll go ahead with this patch for now, but keep this in mind if
 someone wants to complicate the rules further in the future.


+1

-- 
Regards,
Alexey Klyukin


Re: [HACKERS] Switch pg_basebackup to use -X stream instead of -X fetch by default?

2014-08-28 Thread Robert Haas
On Wed, Aug 27, 2014 at 2:55 AM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Aug 27, 2014 at 5:16 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Aug 27, 2014 at 6:16 AM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Aug 26, 2014 at 10:46 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
 On 2014-08-26 16:41:44 -0400, Peter Eisentraut wrote:
 On 8/26/14 12:40 PM, Magnus Hagander wrote:
  I think the first reason is gone now, and the risk/damage of the two
  connections is probably smaller than running out of WAL. -x is a good
  default for smaller systems, but -X is a safer one for bigger ones. So
  I agree that changing the default mode would make sense.

 I would seriously consider just removing one of the modes.  Having two
 modes is complex enough, and then having different defaults in different
 versions, and fuzzy recommendations like, it's better for smaller
 systems, it's quite confusing.

 Happy with removing the option and just accepting -X for backward
 compat.

 Works for me - this is really the cleaner way of doing it...

 We cannot use -X stream with tar output format mode. So I'm afraid that
 removing -X fetch would make people using tar output format feel 
 disappointed.
 Or we should make -X stream work with tar mode.

 Ah, yes, I've actually had that on my TODO for some time.

 I think the easy way of doing that is to just create an xlog.tar file.
 Since we already create base.tar and possibly n*tablespace.tar,
 adding one more file shouldn't be a big problem, and would make such
 an implementation much easier. Would be trivial to do .tar.gz for it
 as well, just like for the others.

Still, that seems like a pretty good reason not to rip the old mode
out completely.  Actually, I'd favor that anyway: changing the default
in one release and removing the deprecated feature in a later release
usually provides a smoother upgrade path.  But if there are features
that aren't even present in the newer mode yet, that's an even better
reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Steve Crawford

On 08/28/2014 01:51 AM, rohtodeveloper wrote:

Hi,all

I have a question about data type timestamp with time zone.
Why data of timestamptz does not store value of timezone passed to it?

Considering the following example.

postgres=# select '2014-08-28 14:30:30.423602+02'::timestamp with time 
zone;

          timestamptz
-------------------------------
 2014-08-28 20:30:30.423602+08
(1 row)

The timezone of the output (+08) is different from the original input 
value (+02).
It seems not to be good behavior. But the behavior of date type time 
with time zone is correct.


postgres=# select '14:30:30.423602+02'::time with time zone;
       timetz
--------------------
 14:30:30.423602+02
(1 row)

If the corrent behavior of timestamptz is not suitable,is there any 
plan to correct the behavior of timestamptz or create a new data type 
which can store timestamp with timezone?



*) manual -- 8.5.1.3. Time Stamps
-
For timestamp with time zone, the internally stored value is always in 
UTC (Universal Coordinated Time, traditionally known as Greenwich Mean 
Time, GMT). An input value that has an explicit time zone specified is 
converted to UTC using the appropriate offset for that time zone. If 
no time zone is stated in the input string, then it is assumed to be 
in the time zone indicated by the system's TimeZone parameter, and is 
converted to UTC using the offset for the timezone zone.

-


This is actually more appropriate for the General mailing list. But...

I have always considered timestamp with time zone to be a bad 
description of that data type but it appears to be a carryover from the 
specs. It is really a point in time with 2014-08-28 
14:30:30.423602+02 and 2014-08-28 20:30:30.423602+08 merely being 
different representations of that same point in time. Time with time 
zone is a similarly bad name as it is really a time with offset from GMT.


It should be noted that -08, +02 etc. are actually *offsets* from GMT 
and are not, technically, time-zones. A time-zone includes additional 
information about the dates on which that offset changes due to daylight 
saving schedules and politically imposed changes thereto.


As the manual states, The type time with time zone is defined by the 
SQL standard, but the definition exhibits properties which lead to 
questionable usefulness. From the above, you can infer that one of 
those issues is that the offset changes based on the date but there is 
no date in a time with time zone field. Among the things you will 
discover is that '12:34:56-04' is legal input for time with time zone 
but '12:34:56 America/New_York' is not because you can't determine the 
offset without a date. Adding a date like '2014-08-28 12:34:56 
America/New_York' will give you a time with offset or what the spec 
calls time with time zone (12:45:31.899075-04) though it really 
doesn't have any information about America/New_York.
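
As a small standalone illustration of why named zones need a date,
here is a hand-rolled sketch using POSIX TZ handling - not PostgreSQL
code, and the offset computation is crude, valid only when the local
and UTC dates coincide:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* UTC offset, in whole hours, of $TZ at local noon on a given date */
static int
offset_on(int y, int mo, int d)
{
	struct tm	tm = {0};
	struct tm	utc;
	time_t		t;

	tm.tm_year = y - 1900;
	tm.tm_mon = mo - 1;
	tm.tm_mday = d;
	tm.tm_hour = 12;
	tm.tm_isdst = -1;			/* let the library decide DST */
	t = mktime(&tm);			/* interpreted in $TZ */
	gmtime_r(&t, &utc);
	return tm.tm_hour - utc.tm_hour;	/* crude: same-day cases only */
}

int
main(void)
{
	setenv("TZ", "America/New_York", 1);
	tzset();
	printf("August offset: %d\n", offset_on(2014, 8, 28));	/* -4 (EDT) */
	printf("January offset: %d\n", offset_on(2014, 1, 28));	/* -5 (EST) */
	return 0;
}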


That the internal representation is in GMT is a curiosity but ultimately 
irrelevant as is it up to PostgreSQL to appropriately convert/display 
whatever it stores internally to the input and output format specified 
by the user.


The varying values of things like day, month and year combined with 
constantly shifting definitions of time-zones make date and time 
handling, *um*, interesting. Is the interval 1-day shorthand for 
24-hours or the same time of day the following day (i.e. when crossing 
DST boundaries)? What is the appropriate value of March 31 minus one 
month? February 29 plus one year?


Read and experiment to understand the quirks and the design-decisions 
implemented in PostgreSQL (or other program).


Cheers,
Steve


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-08-28 Thread Alvaro Herrera
Robert Haas wrote:

 Now, in the case where you are setting an overall limit, there is at
 least an argument to be made that you can determine the overall rate
 of autovacuum-induced I/O activity that the system can tolerate, and
 set your limits to stay within that budget, and then let the system
 decide how to divide that I/O up between workers.  But if you're
 overriding a per-table limit, I don't really see how that holds any
 water.  The system I/O budget doesn't go up just because one
 particular table is being vacuumed rather than any other.  The only
 plausible use case for setting a per-table rate that I can see is when
 you actually want the system to use that exact rate for that
 particular table.

Yeah, this makes sense to me too -- at least as long as you only have
one such table.  But if you happen to have more than one, and due to
some bad luck they happen to be vacuumed concurrently, they will eat a
larger share of your I/O bandwidth budget than you anticipated, which
you might not like.

Thus what I am saying is that those should be scaled down too to avoid
peaks.  Now, my proposal above mentioned subtracting the speed of tables
under the limit, from the speed of those above the limit; maybe we can
just rip that part out.  Then we end up with the behavior you want, that
is to have the fast table vacuum as fast as it is configured when it's
the only fast table being vacuumed; and also with what I say, which is
that if you have two of them, the two balance the I/O consumption (but
only among themselves, not with the slow ones.)
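
As a rough sketch of that arithmetic - hypothetical structures and
illustrative scaling only, not the autovacuum code: tables with an
explicit per-table limit are scaled against their own group's maximum,
so a lone fast table runs at its configured speed, two fast tables
balance between themselves, and the rest keep splitting the global
budget as today.

#include <stdio.h>

typedef struct
{
	int			configured_limit;	/* per-table limit; 0 = use global */
	int			effective_limit;	/* what the worker actually uses */
} Worker;

static void
rebalance(Worker *w, int n, int global_limit)
{
	int			fast_sum = 0,
				fast_max = 0,
				slow_count = 0;

	for (int i = 0; i < n; i++)
	{
		if (w[i].configured_limit > 0)
		{
			fast_sum += w[i].configured_limit;
			if (w[i].configured_limit > fast_max)
				fast_max = w[i].configured_limit;
		}
		else
			slow_count++;
	}

	for (int i = 0; i < n; i++)
	{
		if (w[i].configured_limit > 0)
			/* scale "fast" tables against their own group's maximum */
			w[i].effective_limit = (int) ((double) w[i].configured_limit *
										  fast_max / fast_sum);
		else
			/* everyone else splits the global budget, as today */
			w[i].effective_limit = global_limit / (slow_count ? slow_count : 1);
	}
}

int
main(void)
{
	Worker		w[3] = {{500, 0}, {500, 0}, {0, 0}};

	rebalance(w, 3, 200);
	printf("%d %d %d\n", w[0].effective_limit,
		   w[1].effective_limit, w[2].effective_limit);	/* 250 250 200 */
	return 0;
}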

Since figuring out this subtraction is the only thing missing from the
patch I posted, ISTM we could have something committable with very
little extra effort if we agree on this.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Missing plpgsql.o Symbols on OS X

2014-08-28 Thread David E. Wheeler
On Aug 27, 2014, at 9:53 PM, Ashesh Vashi ashesh.va...@enterprisedb.com wrote:

 Please add -arch x86_64 to your LD_FLAGS and CFLAGS in your make file.

This made no difference:

LDFLAGS = -arch x86_64
CFLAGS = -arch x86_64

Best,

David



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] Missing plpgsql.o Symbols on OS X

2014-08-28 Thread David E. Wheeler
On Aug 27, 2014, at 4:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Yeah, but plpgsql.so is mentioned nowhere on your command line.
 
 I'm not too sure about the dynamic-linking rules on OS X, but I'd not be
 surprised if you need to provide a reference to plpgsql.so in its final
 installed location (ie, a reference to it in the build tree may appear to
 link and then fail at runtime).

Ah. Is there a recommended way to do that in a PGXS-powered Makefile?

Thanks,

David



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-28 Thread Fabien COELHO



I tried that by setting:
  vm.dirty_expire_centisecs = 100
  vm.dirty_writeback_centisecs = 100

So it should start writing returned buffers at most 2s after they are
returned, if I understood the doc correctly, instead of at most 35s.

The result is that with a 5000s 25tps pretty small load (the system can do
300tps with the default configuration), I lost 2091 (1.7%) of transactions,
that is they were beyond the 200ms schedule limit. Another change is that
overall the lost transactions are more spread than without this setting,
although there still is stretches of unresponsiveness.

So although the situation is significantly better, it is still far from good
with the much reduced value I tried.


What do the checkpoint logs look like now? (especially interested in sync times)


Here is an extract:

  LOG:  checkpoint starting: xlog
  LOG:  checkpoint complete: wrote 4893 buffers (4.8%);
0 transaction log file(s) added, 0 removed, 0 recycled;
write=128.430 s, sync=0.378 s, total=128.921 s;
sync files=11, longest=0.375 s, average=0.034 s
  LOG:  checkpoint starting: xlog
  LOG:  checkpoint complete: wrote 6200 buffers (6.1%);
0 transaction log file(s) added, 3 removed, 0 recycled;
write=128.934 s, sync=0.240 s, total=129.280 s;
sync files=7, longest=0.132 s, average=0.034 s
  LOG:  checkpoint starting: xlog
  LOG:  checkpoint complete: wrote 6177 buffers (6.0%);
0 transaction log file(s) added, 3 removed, 0 recycled;
write=130.146 s, sync=0.322 s, total=130.592 s;
sync files=7, longest=0.185 s, average=0.046 s
  LOG:  checkpoint starting: xlog
  LOG:  checkpoint complete: wrote 6185 buffers (6.0%);
0 transaction log file(s) added, 3 removed, 0 recycled;
write=132.673 s, sync=0.143 s, total=132.909 s;
sync files=5, longest=0.078 s, average=0.028 s
  LOG:  checkpoint starting: xlog
  LOG:  checkpoint complete: wrote 6188 buffers (6.0%);
0 transaction log file(s) added, 3 removed, 0 recycled;
write=130.415 s, sync=0.170 s, total=130.676 s;
sync files=5, longest=0.132 s, average=0.034 s
  LOG:  checkpoint starting: xlog
  LOG:  checkpoint complete: wrote 6203 buffers (6.1%);
0 transaction log file(s) added, 3 removed, 0 recycled;
   write=131.726 s, sync=0.444 s, total=132.256 s;
   sync files=5, longest=0.441 s, average=0.088 s
  ...

Basically sync is between 0.1-0.5 seconds.

I had one particularly bad stretch of unresponsiveness, which is not 
clearly related to a slow checkpoint sync:


  progress: 4065.0 s, 28.7 tps, lat 11.886 ms stddev 48.470, lag 18.851 ms, 0 skipped
  progress: 4066.2 s, 1.6 tps, lat 952.830 ms stddev 280.673, lag 155.310 ms, 25 skipped
  progress: 4067.0 s, 2.7 tps, lat 1067.730 ms stddev 316.321, lag 171.766 ms, 14 skipped
  progress: 4068.4 s, 1.4 tps, lat 1147.392 ms stddev 240.337, lag 132.297 ms, 35 skipped
  progress: 4069.7 s, 2.3 tps, lat 1034.543 ms stddev 213.786, lag 154.453 ms, 35 skipped
  progress: 4070.2 s, 1.8 tps, lat 562.715 ms stddev 0.000, lag 182.833 ms, 12 skipped
  progress: 4071.3 s, 3.6 tps, lat 600.929 ms stddev 108.232, lag 162.723 ms, 25 skipped
  progress: 4072.5 s, 1.7 tps, lat 1077.187 ms stddev 77.654, lag 168.849 ms, 31 skipped
  progress: 4073.3 s, 1.3 tps, lat 1298.093 ms stddev 0.000, lag 168.882 ms, 21 skipped
  progress: 4074.0 s, 2.7 tps, lat 920.704 ms stddev 183.587, lag 165.333 ms, 24 skipped
  progress: 4075.3 s, 2.4 tps, lat 756.051 ms stddev 118.241, lag 176.863 ms, 29 skipped
  progress: 4076.1 s, 1.3 tps, lat 1424.548 ms stddev 0.000, lag 0.000 ms, 17 skipped
  progress: 4077.3 s, 2.4 tps, lat 791.090 ms stddev 338.729, lag 155.403 ms, 26 skipped
  progress: 4078.1 s, 27.4 tps, lat 46.834 ms stddev 198.812, lag 3.915 ms, 0 skipped



--
Fabien


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Kevin Grittner
Steve Crawford scrawf...@pinpointresearch.com wrote:

 I have always considered timestamp with time zone to be a bad
 description of that data type but it appears to be a carryover
 from the specs. It is really a point in time

I agree.  While what timestamptz implements is a very useful data
type, I think it was a very unfortunate decision to implement that
for the standard type name, instead of something more consistent
with the spec.  It seems very unlikely to change, though, because
so much existing production code would break.  :-(

Understandably, people do tend to expect that saving something into 
a column defined as TIMESTAMP WITH TIME ZONE will save a time zone 
with the timestamp, and in PostgreSQL it does not.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Pavel Stehule
2014-08-28 20:26 GMT+02:00 Kevin Grittner kgri...@ymail.com:

 Steve Crawford scrawf...@pinpointresearch.com wrote:

  I have always considered timestamp with time zone to be a bad
  description of that data type but it appears to be a carryover
  from the specs. It is really a point in time

 I agree.  While what timestamptz implements is a very useful data
 type, I think it was a very unfortunate decision to implement that
 for the standard type name, instead of something more consistent
 with the spec.  It seems very unlikely to change, though, because
 so much existing production code would break.  :-(

 Understandably, people do tend to expect that saving something into
 a column defined as TIMESTAMP WITH TIME ZONE will save a time zone
 with the timestamp, and in PostgreSQL it does not.


Yes, it is strange at first, and it is difficult for beginners - but
it works well ... after you switch to a different mode.

But can we implement a Time Zone as a special type? That, with examples
and documentation, could better explain what it does.

Pavel



 --
 Kevin Grittner
 EDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Similar to csvlog but not really, json logs?

2014-08-28 Thread Josh Berkus
On 08/27/2014 09:53 AM, Andres Freund wrote:
  Perhaps instead of doing this in-core it would be better to make log 
  handling more extensible? I'm thinking add a specific binary format and 
  an external tool that can parse that and do whatever the user wants with 
  it. That means we don't have to keep adding more complexity to the 
  internal log handling (which already has the risk of being a bottleneck), 
  while allowing maximum user flexibility.
 There's a logging hook. Most of this should be doable from there.

Is there any docs at all on the logging hooks?  I couldn't find any.
Maybe that's why people keep trying to reinvent them.

The main reason I personally would like to have JSON logs has more to do
with formatting and extracting data.  For example, if you have a
prepared statement and log_min_duration_statement turned on, you get
something like this in the message and details fields:

duration: 8.253 ms  execute <unnamed>: SELECT login FROM users WHERE id
= $1, parameters: $1 = '90700'

... and then for various analytics, like pgBadger and troubleshooting,
you have to break out the parts of those fields by regex and split,
which is error-prone; in our query replay tool, I have at least a dozen
commits tweaking the regexes because of some special case I didn't
account for (like an array parameter).

It would be vastly easier to work with the above as JSON:

...
"message" : { "duration" : 8.253, "command" : "execute",
"statement_name" : "unnamed", "statement" : "SELECT login FROM users
WHERE id = $1" }, "details" : { "parameters" : { "$1" : "90700" } }
...

This would allow me, or Dalibo, to remove literally dozens of lines of
error-prone regexing code.

That fix would, IMHO, make it worth enabling JSON logging as a logging
hook or something similar.  If we're just going to convert the CSV to
JSON, with the existing fields?  Waste of time, I can do that with a
5-line script in at least 3 different languages.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-08-28 Thread Peter Geoghegan
On Thu, Aug 28, 2014 at 7:29 AM, Andreas Karlsson andr...@proxel.se wrote:
 Personally I would find it surprising if RETURNING did not also return the
 updated tuples. In many use cases for upsert the user does not care if the
 row was new or not.

I'm not attached to that particular behavior, but it does seem kind of
similar to the behavior of BEFORE triggers, where a NULL return value
(do nothing) will also cause RETURNING to not project the tuple.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-08-28 Thread Andreas Karlsson

On 08/28/2014 09:05 PM, Peter Geoghegan wrote:

On Thu, Aug 28, 2014 at 7:29 AM, Andreas Karlsson andr...@proxel.se wrote:

Personally I would find it surprising if RETURNING did not also return the
updated tuples. In many use cases for upsert the user does not care if the
row was new or not.


I'm not attached to that particular behavior, but it does seem kind of
similar to the behavior of BEFORE triggers, where a NULL return value
(do nothing) will also cause RETURNING to not project the tuple.


I see. So we have three cases where we may or may not want to project a 
tuple.


1) The tuple was inserted
2) We got a conflict and updated the tuple
3) We got a conflict but skipped updating the tuple

My personal intuition was that (1) and (2) would be returned but not 
(3). But I am not sure if that is the most useful behavior.
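
To make the three cases concrete -- sketching with the syntax proposed
in this thread, which is of course still in flux:

    INSERT INTO users (id, login) VALUES (1, 'bob')
    ON CONFLICT UPDATE SET login = 'bob'
    RETURNING *;

With my intuition above, RETURNING projects the row in cases (1) and
(2); under ON CONFLICT IGNORE, case (3), it projects nothing.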


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-08-28 Thread Robert Haas
On Thu, Aug 28, 2014 at 1:36 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 Now, in the case where you are setting an overall limit, there is at
 least an argument to be made that you can determine the overall rate
 of autovacuum-induced I/O activity that the system can tolerate, and
 set your limits to stay within that budget, and then let the system
 decide how to divide that I/O up between workers.  But if you're
 overriding a per-table limit, I don't really see how that holds any
 water.  The system I/O budget doesn't go up just because one
 particular table is being vacuumed rather than any other.  The only
 plausible use case for setting a per-table rate that I can see is when
 you actually want the system to use that exact rate for that
 particular table.

 Yeah, this makes sense to me too -- at least as long as you only have
 one such table.  But if you happen to have more than one, and due to
 some bad luck they happen to be vacuumed concurrently, they will eat a
 larger share of your I/O bandwidth budget than you anticipated, which
 you might not like.

I agree that you might not like that.  But you might not like having
the table vacuumed slower than the configured rate, either.  My
impression is that the time between vacuums isn't really all that
negotiable for some people.  I had one customer who had horrible bloat
issues on a table that was vacuumed every minute; when we changed the
configuration so that it was vacuumed every 15 seconds, those problems
went away.  Now that is obviously more a matter for autovacuum_naptime
than this option, but the point seems general to me: if you need the
table vacuumed every N seconds, minutes, or hours, and it only gets
vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there
are other autovacuum workers running, the table is going to bloat.
That *might* be better than busting your I/O budget, but it might also
be (and I think frequently is) much worse.
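
For concreteness, the per-table override in question is the storage
parameter form; the table name and numbers here are invented for the
example:

    ALTER TABLE hot_table
        SET (autovacuum_vacuum_cost_limit = 2000,
             autovacuum_vacuum_cost_delay = 10);

The argument is that this should mean exactly that rate, always.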

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Bruce Momjian
On Thu, Aug 28, 2014 at 11:26:53AM -0700, Kevin Grittner wrote:
 Steve Crawford scrawf...@pinpointresearch.com wrote:
 
  I have always considered timestamp with time zone to be a bad
  description of that data type but it appears to be a carryover
  from the specs. It is really a point in time
 
 I agree.  While what timestamptz implements is a very useful data
 type, I think it was a very unfortunate decision to implement that
 for the standard type name, instead of something more consistent
 with the spec.  It seems very unlikely to change, though, because
 so much existing production code would break.  :-(
 
 Understandably, people do tend to expect that saving something into 
 a column defined as TIMESTAMP WITH TIME ZONE will save a time zone 
 with the timestamp, and in PostgreSQL it does not.

So the standard requires storing of original timezone in the data type? 
I was not aware of that.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread k...@rice.edu
On Thu, Aug 28, 2014 at 03:33:56PM -0400, Bruce Momjian wrote:
 On Thu, Aug 28, 2014 at 11:26:53AM -0700, Kevin Grittner wrote:
  Steve Crawford scrawf...@pinpointresearch.com wrote:
  
   I have always considered timestamp with time zone to be a bad
   description of that data type but it appears to be a carryover
   from the specs. It is really a point in time
  
  I agree.  While what timestamptz implements is a very useful data
  type, I think it was a very unfortunate decision to implement that
  for the standard type name, instead of something more consistent
  with the spec.  It seems very unlikely to change, though, because
  so much existing production code would break.  :-(
  
  Understandably, people do tend to expect that saving something into 
  a column defined as TIMESTAMP WITH TIME ZONE will save a time zone 
  with the timestamp, and in PostgreSQL it does not.
 
 So the standard requires storing of original timezone in the data type? 
 I was not aware of that.
 

I do not have a copy of the SQL 92 spec, but several references to the
spec mention that it defined the time zone as a format SHH:MM where
S represents the sign (+ or -), which seems to be what PostgreSQL uses.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Tom Lane
k...@rice.edu k...@rice.edu writes:
 On Thu, Aug 28, 2014 at 03:33:56PM -0400, Bruce Momjian wrote:
 So the standard requires storing of original timezone in the data type? 
 I was not aware of that.

 I do not have a copy of the SQL 92 spec, but several references to the
 spec mention that it defined the time zone as a format SHH:MM where
 S represents the sign (+ or -), which seems to be what PostgreSQL uses.

Yeah, the spec envisions timezone as being a separate numeric field
(ie, a numeric GMT offset) within a timestamp with time zone.  One of
the ways in which the spec's design is rather broken is that there's
no concept of real-world time zones with varying DST rules.

Anyway, I agree with the upthread comments that it'd have been better
if we'd used some other name for this datatype, and also that it's
at least ten years too late to revisit the choice :-(.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-08-28 Thread Alvaro Herrera
Robert Haas wrote:

 I agree that you might not like that.  But you might not like having
 the table vacuumed slower than the configured rate, either.  My
 impression is that the time between vacuums isn't really all that
 negotiable for some people.  I had one customer who had horrible bloat
 issues on a table that was vacuumed every minute; when we changed the
 configuration so that it was vacuumed every 15 seconds, those problems
 went away.

Wow, that's extreme.  For that case you can set
autovacuum_vacuum_cost_limit to 0, which disables the whole thing and
lets vacuum run at full speed -- no throttling at all.  Would that
satisfy the concern?
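
Per table, that would be spelled with the storage parameters -- the
delay knob is the documented way to disable throttling entirely (table
name invented for the example):

    ALTER TABLE extreme_churn SET (autovacuum_vacuum_cost_delay = 0);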

 Now that is obviously more a matter for autovacuum_naptime
 than this option, but the point seems general to me: if you need the
 table vacuumed every N seconds, minutes, or hours, and it only gets
 vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there
 are other autovacuum workers running, the table is going to bloat.

There might be another problem here which is that if you have all your
workers busy because they are vacuuming large tables that don't have
churn high enough to warrant disrupting the whole environment (thus low
cost_limit), then the table will bloat no matter what you set its cost
limit to.  So it's not only a matter of a low enough naptime (which
is a bad thing for the rest of the system, also) but also one of
something similar to priority inversion; should you speed up the
vacuuming of those large tables so that one worker is freed soon enough
to get to the high-churn table?

Was the solution for that customer to set an external tool running
vacuum on that table?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Kevin Grittner
k...@rice.edu k...@rice.edu wrote:

 On Thu, Aug 28, 2014 at 03:33:56PM -0400, Bruce Momjian wrote:

 So the standard requires storing of original timezone in the
 data type?  I was not aware of that.

 I do not have a copy of the SQL 92 spec, but several references
 to the spec mention that it defined the time zone as a format
 SHH:MM where S represents the sign (+ or -), which seems to be
 what PostgreSQL uses.

I just took a quick look at the spec to refresh my memory, and it
seems to require that the WITH TIME ZONE types store UTC (I suppose
for fast comparisons); it requires the time zone, in the form of an
hour:minute offset, to be stored with it, so you can determine the
local time from which it was derived.  I concede that this is not
usually useful, and am glad we have a type that behaves as
timestamptz does; but occasionally a type that behaves in
conformance with the spec would be useful, and it would certainly
be less confusing for people who are used to the standard behavior.

Basically, both store a moment in time in UTC, and display it with
offset in hours and minutes; but the standard says it should show
you that moment from the perspective of whoever saved it unless you
ask for it in a different time zone, while PostgreSQL always shows
it to you from the perspective of your client connection's time
zone.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Alvaro Herrera
Kevin Grittner wrote:

 I just took a quick look at the spec to refresh my memory, and it
 seems to require that the WITH TIME ZONE types store UTC (I suppose
 for fast comparisons); it requires the time zone, in the form of an
 hour:minute offset, to be stored with it, so you can determine the
 local time from which it was derived.  I concede that this is not
 usually useful, and am glad we have a type that behaves as
 timestamptz does; but occasionally a type that behaves in
 conformance with the spec would be useful, and it would certainly
 be less confusing for people who are used to the standard behavior.

I remember we tried to implement this some years ago (IIRC alongside
Alexey Klyukin who might remember more details).  I couldn't find the
thread, but one of the first problems we encountered was that we wanted
to avoid storing the text name of the timezone on each datum; we had the
idea of creating a catalog to attach an OID to each timezone, but that
turned very quickly into a horrid mess and we discarded the idea.

(For instance: if a new timezone is added in a new tzdata release, it
needs to be added to the catalog, but how do you do that in minor
releases?)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Kevin Grittner wrote:

 I just took a quick look at the spec to refresh my memory, and it
 seems to require that the WITH TIME ZONE types store UTC (I suppose
 for fast comparisons); it requires the time zone, in the form of an
 hour:minute offset, to be stored with it, so you can determine the
 local time from which it was derived.  I concede that this is not
 usually useful, and am glad we have a type that behaves as
 timestamptz does; but occasionally a type that behaves in
 conformance with the spec would be useful, and it would certainly
 be less confusing for people who are used to the standard behavior.

 I remember we tried to implement this some years ago (IIRC alongside
 Alexey Klyukin who might remember more details).  I couldn't find the
 thread, but one of the first problems we encountered was that we wanted
 to avoid storing the text name of the timezone on each datum; we had the
 idea of creating a catalog to attach an OID to each timezone, but that
 turned very quickly into a horrid mess and we discarded the idea.

 (For instance: if a new timezone is added in a new tzdata release, it
 needs to be added to the catalog, but how do you do that in minor
 releases?)

But the standard doesn't say anything about storing a time zone
*name* or *abbreviation* -- it requires that it be stored as UTC
with the *offset* (in hours and minutes).  That makes it pretty
close to what we have -- it's all about a difference in
presentation.  And as far as I can see it completely dodges the
kinds of problems you're talking about.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Alvaro Herrera
Kevin Grittner wrote:

 But the standard doesn't say anything about storing a time zone
 *name* or *abbreviation* -- it requires that it be stored as UTC
 with the *offset* (in hours and minutes).  That makes it pretty
 close to what we have -- it's all about a difference in
 presentation.  And as far as I can see it completely dodges the
 kinds of problems you're talking about.

Yeah, it does, but is it useful?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Multithreaded SIGPIPE race in libpq on Solaris

2014-08-28 Thread Thomas Munro
Hello,

A while back someone showed me a program blocking in libpq 9.2 on
Solaris 11, inside sigwait called by pq_reset_sigpipe.  It had
happened a couple of times before during a period of
instability/crashing with a particular DB (a commercial PostgreSQL
derivative, but the client was using regular libpq).  This was a very
busy multithreaded client where each thread had its own connection.
My theory is that if two connections accessed by different threads get
shut down around the same time, there is a race scenario where each of
them fails to write to its socket, sees errno == EPIPE and then sees a
pending SIGPIPE with sigpending(), but only one thread returns from
sigwait() due to signal merging.

We never saw the problem again after we made the following change:

--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -450,7 +450,6 @@ void
 pq_reset_sigpipe(sigset_t *osigset, bool sigpipe_pending, bool got_epipe)
 {
 	int			save_errno = SOCK_ERRNO;
-	int			signo;
 	sigset_t	sigset;
 
 	/* Clear SIGPIPE only if none was pending */
@@ -460,11 +459,13 @@ pq_reset_sigpipe(sigset_t *osigset, bool sigpipe_pending, bool got_epipe)
 		sigismember(&sigset, SIGPIPE))
 	{
 		sigset_t	sigpipe_sigset;
+		siginfo_t	siginfo;
+		struct timespec timeout = { 0, 0 };
 
 		sigemptyset(&sigpipe_sigset);
 		sigaddset(&sigpipe_sigset, SIGPIPE);
 
-		sigwait(&sigpipe_sigset, &signo);
+		sigtimedwait(&sigpipe_sigset, &siginfo, &timeout);
 	}
 }

Does this make any sense?

Best regards,
Thomas Munro


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-08-28 Thread Robert Haas
On Thu, Aug 28, 2014 at 4:56 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 I agree that you might not like that.  But you might not like having
 the table vacuumed slower than the configured rate, either.  My
 impression is that the time between vacuums isn't really all that
 negotiable for some people.  I had one customer who had horrible bloat
 issues on a table that was vacuumed every minute; when we changed the
 configuration so that it was vacuumed every 15 seconds, those problems
 went away.

 Wow, that's extreme.  For that case you can set
 autovacuum_vacuum_cost_limit to 0, which disables the whole thing and
 lets vacuum run at full speed -- no throttling at all.  Would that
 satisfy the concern?

Well, maybe, if you want to run completely unthrottled.  But I have no
evidence that's a common desire.

 Now that is obviously more a matter for autovacuum_naptime
 than this option, but the point seems general to me: if you need the
 table vacuumed every N seconds, minutes, or hours, and it only gets
 vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there
 are other autovacuum workers running, the table is going to bloat.

 There might be another problem here which is that if you have all your
 workers busy because they are vacuuming large tables that don't have
 churn high enough to warrant disrupting the whole environment (thus low
 cost_limit), then the table will bloat no matter what you set its cost
 limit to.  So there's not only a matter of a low enough naptime (which
 is a bad thing for the rest of the system, also) but also one of
 something similar to priority inversion; should you speed up the
 vacuuming of those large tables so that one worker is freed soon enough
 to get to the high-churn table?

I don't think so.  I continue to believe that the we need to provide
the user with the tools to be certain that table X will get vacuumed
at least every Y seconds/minutes/hours.  To me, allowing the user to
set a rate that the system will not adjust or manipulate in any way
makes this a lot easier than anything else we might do.

 Was the solution for that customer to set an external tool running
 vacuum on that table?

Nope, we just changed settings.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-28 Thread Tomas Vondra
On 26.8.2014 21:38, Jeff Davis wrote:
 On Tue, 2014-08-26 at 12:39 +0300, Heikki Linnakangas wrote:
 I think this is enough for this commitfest - we have consensus on
 the design. For the next one, please address those open items, and
 resubmit.
 
 Agreed, return with feedback.
 
 I need to get the accounting patch in first, which needs to address 
 some performance issues, but there's a chance of wrapping those up 
 quickly.

Sounds good to me.

I'd like to coordinate our efforts on this a bit, if you're interested.

I've been working on the hashjoin-like batching approach PoC (because I
proposed it, so it's fair I do the work), and I came to the conclusion
that it's pretty much impossible to implement on top of dynahash. I
ended up replacing it with a hashtable (similar to the one in the
hashjoin patch, unsurprisingly), which supports the batching approach
well, and is more memory efficient and actually faster (I see ~25%
speedup in most cases, although YMMV).

I plan to address this in 4 patches:

(1) replacement of dynahash by the custom hash table (done)

(2) memory accounting (not sure what your plan is; I've used the
approach I proposed on 23/8 for now, with a few bugfixes/cleanups)

(3) applying your HashWork patch on top of this (I have this mostly
completed, but need to do more testing over the weekend)

(4) extending this with the batching I proposed, initially only for
aggregates with states that we can serialize/deserialize easily
(e.g. types passed by value) - I'd like to hack on this next week

So at this point I have (1) and (2) pretty much ready, (3) is almost
complete and I plan to start hacking on (4). Also, this does not address
the open items listed in your message.


But I agree this is more complex than the patch you proposed. So if you
choose to pursue your patch, I have no problem with that - I'll then
rebase my changes on top of your patch and submit them separately.


regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function to know last log write timestamp

2014-08-28 Thread Robert Haas
On Thu, Aug 28, 2014 at 3:34 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Theoretically it's not safe without a barrier on a machine with weak
 memory ordering. No?

Why not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Kevin Grittner wrote:

 But the standard doesn't say anything about storing a time zone
 *name* or *abbreviation* -- it requires that it be stored as UTC
 with the *offset* (in hours and minutes).  That makes it pretty
 close to what we have -- it's all about a difference in
 presentation.  And as far as I can see it completely dodges the
 kinds of problems you're talking about.

 Yeah, it does, but is it useful?

More so than CHAR(n).  It would have been beneficial to support it 
for the same reason.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Josh Berkus
On 08/28/2014 02:25 PM, Kevin Grittner wrote:
 But the standard doesn't say anything about storing a time zone
 *name* or *abbreviation* -- it requires that it be stored as UTC
 with the *offset* (in hours and minutes).  That makes it pretty
 close to what we have -- it's all about a difference in
 presentation.  And as far as I can see it completely dodges the
 kinds of problems you're talking about.

Except that an offset is not a timezone.  This is why the spec behavior
was always academic crippleware, and why we abandoned it back in ~~7.2.
 It does me no good at all to know that a timestamp is offset -07:00:
that could be Mountain Time, Arizona Time, or Navajo Nation time, all of
which will behave differently when I add 2 months to them.
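
To make that concrete -- invented values, run under two different
session timezones:

    SET timezone = 'America/Denver';   -- Mountain Time, observes DST
    SELECT '2014-09-28 18:00:00-07'::timestamptz + interval '2 months';
    -- 2014-11-28 19:00:00-07

    SET timezone = 'America/Phoenix';  -- Arizona, no DST
    SELECT '2014-09-28 18:00:00-07'::timestamptz + interval '2 months';
    -- 2014-11-28 18:00:00-07

Same stored instant, same input offset, two different results -- the
answer depends on the real zone, which -07:00 alone doesn't tell you.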

Unless the only goal is to be compatible with some other DBMS, in which
case ... build an extension.

On the other hand, I take partial responsibility for the mess which is
our data type naming.  What we call timestamptz should just be
timestamp, and whether or not it converts to local timezone on
retrieval should be a GUC setting.  And the type we call timestamp
shouldn't exist.  Hindsight is 20/20.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 But the standard doesn't say anything about storing a time zone
 *name* or *abbreviation* -- it requires that it be stored as UTC
 with the *offset* (in hours and minutes).  That makes it pretty
 close to what we have -- it's all about a difference in
 presentation.  And as far as I can see it completely dodges the
 kinds of problems you're talking about.

However, the added field creates its own semantic problems.
As an example, is 2014-08-28 18:00:00-04 the same as or different from
2014-08-28 17:00:00-05?  If they're different, which one is less?
If they're the same, what's the point of storing the extra field?
And do you really like equal values that behave differently,
not only for I/O but for operations such as EXTRACT()?
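
(For reference, in our implementation today the first question has a
definite answer -- both literals denote 22:00 UTC:

    SELECT '2014-08-28 18:00:00-04'::timestamptz
         = '2014-08-28 17:00:00-05'::timestamptz;
    -- true

It's exactly the extra stored field that would reopen it.)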

(I imagine the SQL spec gives a ruling on this issue, which
I'm too lazy to look up; my point is that whatever they did, it
will be the wrong thing for some use-cases.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multithreaded SIGPIPE race in libpq on Solaris

2014-08-28 Thread Tom Lane
Thomas Munro mu...@ip9.org writes:
 My theory is that if two connections accessed by different threads get
 shut down around the same time, there is a race scenario where each of
 them fails to write to its socket, sees errno == EPIPE and then sees a
 pending SIGPIPE with sigpending(), but only one thread returns from
 sigwait() due to signal merging.

Hm, that does sound like it could be a problem, if the platform fails
to track pending SIGPIPE on a per-thread basis.

 We never saw the problem again after we made the following change:
 ...
 Does this make any sense?

I don't think that patch would fix the problem if it's real.  It would
prevent libpq from hanging up when it's trying to throw away a pending
SIGPIPE, but the fundamental issue is that that action could cause a
SIGPIPE that's meant for some other thread to get lost; and that other
thread isn't necessarily doing anything with libpq.

I don't claim to be an expert on this stuff, but I had the idea that
multithreaded environments were supposed to track signal state per-thread
not just per-process, precisely because of issues like this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-08-28 Thread Alvaro Herrera
Vik Fearing wrote:

 Here are two patches for this.
 
 The first one, reindex_user_tables.v1.patch, implements the variant that
 only hits user tables, as suggested by you.
 
 The second one, reindex_no_dbname.v1.patch, allows the three
 database-wide variants to omit the database name (voted for by Daniel
 Migowski, Bruce, and myself; voted against by you).  This patch is to be
 applied on top of the first one.

Not a fan.  Here's a revised version that provides REINDEX USER TABLES,
which can only be used without a database name; other modes are not
affected i.e. they continue to require a database name.  I also renamed
your proposed reindexdb's --usertables to --user-tables.

Oh, I just noticed that if you say reindexdb --all --user-tables, the
latter is not honored.  Must fix before commit.

Makes sense?

Note: I don't like the reindexdb UI; if you just run "reindexdb -d
foobar" it will reindex everything, including system catalogs.  I think
USER TABLES should be the default operation mode for reindex.   If you
want plain old "REINDEX DATABASE foobar" which also hits the catalogs,
you should request that separately (how?).  This patch doesn't change
this.

Also note: if you say user tables, information_schema is reindexed too,
which kinda sucks.

Further note: this command is probably pointless in the majority of
cases.  Somebody should spend some serious time with REINDEX
CONCURRENTLY ..

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index cabae19..d05e1ac 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
 <refsynopsisdiv>
<synopsis>
REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX USER TABLES
</synopsis>
 </refsynopsisdiv>
 
@@ -126,14 +127,26 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
   </varlistentry>
 
   <varlistentry>
+    <term><literal>USER TABLES</literal></term>
+    <listitem>
+     <para>
+      Recreate all indexes on user tables within the current database.
+      Indexes on system catalogs are not processed.
+      This form of <command>REINDEX</command> cannot be executed inside a
+      transaction block.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
    <term><replaceable class="PARAMETER">name</replaceable></term>
   <listitem>
    <para>
     The name of the specific index, table, or database to be
     reindexed.  Index and table names can be schema-qualified.
-      Presently, <command>REINDEX DATABASE</> and <command>REINDEX SYSTEM</>
-      can only reindex the current database, so their parameter must match
-      the current database's name.
+      Presently, <command>REINDEX DATABASE</>, <command>REINDEX SYSTEM</>,
+      and <command>REINDEX USER TABLES</> can only reindex the current
+      database, so their parameter must match the current database's name.
    </para>
   </listitem>
  </varlistentry>
diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index 486f5c9..f69d84b 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -65,6 +65,15 @@ PostgreSQL documentation
   </group>
   <arg choice="opt"><replaceable>dbname</replaceable></arg>
  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>reindexdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <group choice="plain">
+    <arg choice="plain"><option>--user-tables</option></arg>
+    <arg choice="plain"><option>-u</option></arg>
+   </group>
+  </cmdsynopsis>
 </refsynopsisdiv>
 
 
@@ -173,6 +182,16 @@ PostgreSQL documentation
   </listitem>
  </varlistentry>
 
+ <varlistentry>
+  <term><option>-u</></term>
+  <term><option>--user-tables</></term>
+  <listitem>
+   <para>
+    Reindex database's user tables.
+   </para>
+  </listitem>
+ </varlistentry>
+
 <varlistentry>
   <term><option>-V</></term>
   <term><option>--version</></term>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index fdfa6ca..23e13f0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1772,6 +1772,9 @@ ReindexTable(RangeVar *relation)
  * To reduce the probability of deadlocks, each table is reindexed in a
  * separate transaction, so we can release the lock on it right away.
  * That means this must not be called within a user transaction block!
+ *
+ * databaseName can be NULL when do_user is set and do_system isn't; this
+ * is the REINDEX USER TABLES case.
  */
 Oid
 ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
@@ -1784,9 +1787,10 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
 	List	   *relids = NIL;
 	ListCell   *l;
 
-	AssertArg(databaseName);
+	AssertArg(databaseName || (do_user && !do_system));
 
-	if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
+	if (databaseName &&
+		

Re: [HACKERS] Multithreaded SIGPIPE race in libpq on Solaris

2014-08-28 Thread Thomas Munro
On 28 August 2014 23:45, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't claim to be an expert on this stuff, but I had the idea that
 multithreaded environments were supposed to track signal state per-thread
 not just per-process, precisely because of issues like this.

After some googling, I found reply #3 in
https://community.oracle.com/thread/1950900?start=0&tstart=0 and
various other sources which say that in Solaris versions >= 10 they
changed SIGPIPE delivery from per process (as specified by UNIX98) to
per thread (as specified by POSIX:2001).  But we are on version 11, so
my theory doesn't look great.  (Though 9 is probably still in use out
there somewhere...)

I also found this article:
http://krokisplace.blogspot.co.uk/2010/02/suppressing-sigpipe-in-library.html

The author recommends an approach nearly identical to the PostgreSQL
approach, except s/he says:  to do this we use sigtimedwait() with
zero timeout; this is to avoid blocking in a scenario where malicious
user sent SIGPIPE manually to a whole process: in this case we will
see it pending, but other thread may handle it before we had a
[chance] to wait for it.

Maybe we have malicious users sending signals to processes.  It does
seem more likely the crashing database triggered this somehow though,
perhaps in combination with something else the client app was doing,
though I can't think what it could be that would eat another thread's
SIGPIPE in between the sigpending and sigwait syscalls.

Best regards,
Thomas Munro


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Bruce Momjian
On Thu, Aug 28, 2014 at 03:25:49PM -0700, Josh Berkus wrote:
 On 08/28/2014 02:25 PM, Kevin Grittner wrote:
  But the standard doesn't say anything about storing a time zone
  *name* or *abbreviation* -- it requires that it be stored as UTC
  with the *offset* (in hours and minutes).  That makes it pretty
  close to what we have -- it's all about a difference in
  presentation.  And as far as I can see it completely dodges the
  kinds of problems you're talking about.
 
 Except that an offset is not a timezone.  This is why the spec behavior
 was always academic crippleware, and why we abandoned it back in ~~7.2.
  It does me no good at all to know that a timestamp is offset -07:00:
 that could be Mountain Time, Arizona Time, or Navajo Nation time, all of
 which will behave differently when I add 2 months to them.
 
 Unless the only goal is to be compatible with some other DBMS, in which
 case ... build an extension.
 
 On the other hand, I take partial responsibility for the mess which is
 our data type naming.  What we call timestamptz should just be
 timestamp, and whether or not it converts to local timezone on
 retrieval should be a GUC setting.  And the type we call timestamp
 shouldn't exist.  Hindsight is 20/20.

Well, the standard TIMESTAMP requires WITHOUT TIME ZONE, so I don't know
how you would be standards-compliant without it.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PATCH: Allow distdir to be overridden on make command line

2014-08-28 Thread Craig Ringer
Not just a one line patch, a one character patch.

Use ?= instead of = in distdir assignment, so it can be overridden on
the command line when building dist tarballs with patches.

Yes, you can just modify GNUmakefile.in, but that's extra noise in a
diff, adds merge conflicts, etc.

Please apply. Surely this is harmless?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 2a115f7b62dbe3f98ef2c011f3283a41af86fd15 Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Fri, 29 Aug 2014 10:00:40 +0800
Subject: [PATCH] Allow distdir to be overridden on the make command line

This is useful for preparing dist tarballs for patches with
useful suffixes, such as:

distdir='postgresql-$(VERSION)'-git$(git rev-parse --short HEAD)
---
 GNUmakefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 69e0824..b469e3a 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -76,7 +76,7 @@ GNUmakefile: GNUmakefile.in $(top_builddir)/config.status
 
 ##
 
-distdir	= postgresql-$(VERSION)
+distdir	?= postgresql-$(VERSION)
 dummy	= =install=
garbage = =*  "#"*  ."#"*  *~*  *.orig  *.rej  core  postgresql-*
 
-- 
1.9.3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?

2014-08-28 Thread Craig Ringer
On 08/29/2014 04:59 AM, Kevin Grittner wrote:
 I just took a quick look at the spec to refresh my memory, and it
 seems to require that the WITH TIME ZONE types store UTC (I suppose
 for fast comparisons); it requires the time zone, in the form of an
 hour:minute offset, to be stored with it, so you can determine the
 local time from which it was derived.  I concede that this is not
 usually useful, and am glad we have a type that behaves as
 timestamptz does; but occasionally a type that behaves in
 conformance with the spec would be useful, and it would certainly
 be less confusing for people who are used to the standard behavior.

FWIW, MS SQL's DateTimeOffset data type:

http://msdn.microsoft.com/en-AU/library/bb630289.aspx

is much more like what I, when I was getting started, expected TIMESTAMP
WITH TIME ZONE to be. We don't really have anything equivalent in
PostgreSQL.


The PostgreSQL implementation merits some highlighted clear explanation
in the documentation, explaining the concept of a point in absolute time
(the first person to mention relativity gets smacked ... oh, darn) vs a
wall-clock value in local time. It should also discuss the approach of
storing an (instant timestamptz, timezone text) or (instant timestamptz,
tzoffset smallint) tuple for when unambiguous representation is required.

(I guess I just volunteered myself to write a draft of that).
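
A minimal sketch of the tuple approach, with invented names:

    CREATE TABLE event (
        instant timestamptz NOT NULL,  -- the absolute point in time
        tz      text NOT NULL          -- zone it was recorded in
    );

    -- the wall-clock time as originally observed:
    SELECT instant AT TIME ZONE tz AS local_time FROM event;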


BTW, it might be interesting to have a validated 'timezone' data type
that can store time zone names or offsets, for use in conjunction with
timestamptz to store a (timestamptz, timezone) tuple. Though also
complicated - whether 'EST' is Australian or USA Eastern time is
GUC-dependent, and it can't just be expanded into Australia/Sydney at
input time because EST is always +1000 while Australia/Sydney could
also be EDT +1100. I hate time zones. It'd probably have to expand
abbrevs to their UTC offsets at input time.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] possible optimization: push down aggregates

2014-08-28 Thread Craig Ringer
On 08/28/2014 03:46 AM, Claudio Freire wrote:
 You can't with mean and stddev, only with associative aggregates.
 
 That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.

You could with a new helper function to merge the temporary states for
each scan though.

In the case of mean, for example, it'd just mean adding the counts and sums.
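
Something like this, assuming an invented partials(s, n) relation
holding each scan's sum and count:

    SELECT sum(s)::float8 / sum(n) AS mean FROM partials;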

However, I'm not sure how interesting that is without the ability to
execute the subplans in parallel.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max

2014-08-28 Thread Tom Lane
Haribabu Kommi kommi.harib...@gmail.com writes:
 Thanks for your review. Please find the rebased patch to latest HEAD.

Committed with minor (mostly cosmetic) alterations.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-08-28 Thread Peter Geoghegan
On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan p...@heroku.com wrote:
 There are some restrictions on what this auxiliary update may do, but
 FWIW there are considerably fewer than those that the equivalent MySQL
 or SQLite feature imposes on their users.


I realized that I missed a few cases here. For one thing, the posted
patch fails to arrange for the UPDATE post-parse-analysis tree
representation to go through the rewriter stage (on the theory that
user-defined rules shouldn't be able to separately affect the
auxiliary UPDATE query tree), but rewriting is at least necessary so
that rewriteTargetListIU() can expand a "SET val = DEFAULT"
targetlist, as well as normalize the ordering of the UPDATE's tlist.
Separately, the patch fails to defend against certain queries that
ought to be disallowed, where a subselect is specified with a subquery
expression in the auxiliary UPDATE's WHERE clause.

These are garden-variety bugs that aren't likely to affect the kind of
high-level design discussion that I'm looking for here. I'll post a
fixed version in a few days time.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-28 Thread Etsuro Fujita

(2014/08/13 12:40), Etsuro Fujita wrote:

(2014/08/12 18:34), Shigeru Hanada wrote:

Issues addressed by Eitoku-san were fixed properly, but he found a bug
and a possible enhancement  in the v2 patch.



* push-down check misses delete triggers
update_is_pushdown_safe() seems to have a bug that it misses the
existence of row-level delete trigger.  DELETE statement executed
against a foreign table which has row-level delete trigger is pushed
down to remote, and consequently no row-level delete trigger is fired.


Ah, I noticed that the current code for that is not correct.  Will fix.


Done.


* further optimization
Is there any chance to consider further optimization by passing the
operation type (UPDATE|DELETE) of undergoing statement to
update_is_pushdown_safe()?  It seems safe to push down UPDATE
statement when the target foreign table has no update trigger even it
has a delete trigger (of course the opposite combination would be also
fine).


Good idea!  Will improve that too.


Done.


* Documentation
The requirement of pushing down UPDATE/DELETE statements would not be
easy to understand for non-expert users, so it seems that there is
room to enhance documentation.  An idea is to define which expression
is safe to send to remote first (it might need to mention the
difference of semantics), and refer the definition from the place
describing the requirement of pushing-down for SELECT, UPDATE and
DELETE.


Yeah, I also think that it would not necessarily easy for the users to
understand which expression is safe to send.  So I agree with that
enhancement, but ISTM that it would be better to do that as a separate
patch.


As above, I'd like to leave this as another patch.

Please find attached the updated version of the patch.

Thanks,

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 188,197  is_foreign_expr(PlannerInfo *root,
	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
  		return false;
  
- 	/* Expressions examined here should be boolean, ie noncollatable */
- 	Assert(loc_cxt.collation == InvalidOid);
- 	Assert(loc_cxt.state == FDW_COLLATE_NONE);
- 
  	/*
  	 * An expression which includes any mutable functions can't be sent over
  	 * because its result is not stable.  For example, sending now() remote
--- 188,193 
***
*** 927,932  deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 923,981 
  }
  
  /*
+  * deparse remote UPDATE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
+ 	   Index rtindex, Relation rel,
+ 	   List	*remote_conds,
+ 	   List	*targetlist,
+ 	   List *targetAttrs, List *returningList,
+ 	   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	List	   *params_list = NIL;
+ 	deparse_expr_cxt context;
+ 	bool		first;
+ 	ListCell   *lc;
+ 
+ 	/* Set up context struct for recursion */
+ 	context.root = root;
+ 	context.foreignrel = baserel;
+ 	context.buf = buf;
+ 	context.params_list = NULL;
+ 
+ 	appendStringInfoString(buf, UPDATE );
+ 	deparseRelation(buf, rel);
+ 	appendStringInfoString(buf,  SET );
+ 
+ 	first = true;
+ 	foreach(lc, targetAttrs)
+ 	{
+ 		int			attnum = lfirst_int(lc);
+ 		TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+ 
+ 		if (!first)
+ 			appendStringInfoString(buf, , );
+ 		first = false;
+ 
+ 		deparseColumnRef(buf, rtindex, attnum, root);
+ 		appendStringInfo(buf,  = );
+ 		deparseExpr((Expr *) tle->expr, &context);
+ 	}
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 		  true, &params_list);
+ 
+ 	deparseReturningList(buf, root, rtindex, rel, false,
+ 		 returningList, retrieved_attrs);
+ }
+ 
+ /*
   * deparse remote DELETE statement
   *
   * The statement text is appended to buf, and we also create an integer List
***
*** 949,954  deparseDeleteSql(StringInfo buf, PlannerInfo *root,
--- 998,1030 
  }
  
  /*
+  * deparse remote DELETE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
+ 	   Index rtindex, Relation rel,
+ 	   List	*remote_conds,
+ 	   List *returningList,
+ 	   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	List	   *params_list = NIL;
+ 
+ 	appendStringInfoString(buf, DELETE FROM );
+ 	deparseRelation(buf, rel);
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 		  true, &params_list);
+ 
+ 	deparseReturningList(buf, root, rtindex, rel, false,
+ 		 returningList, 

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-28 Thread Etsuro Fujita

(2014/08/26 12:20), Etsuro Fujita wrote:

(2014/08/25 21:58), Albe Laurenz wrote:

I played with it, and apart from Hanada's comments I have found the
following:

test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id < 3;
 QUERY PLAN
--

  Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970
width=10) (actual time=0.005..0.005 rows=0 loops=1)
   ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40
rows=299970 width=10) (actual time=0.002..0.002 rows=27 loops=1)
  Output: id, val, ctid
  Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE
((id < 3))
  Planning time: 0.179 ms
  Execution time: 3706.919 ms
(6 rows)

Time: 3708.272 ms

The actual time readings are surprising.
Shouldn't these be similar to the actual execution time, since most of
the time is spent
in the foreign scan node?


I was also thinking that this is confusing to the users.  I think this is
because the patch executes the UPDATE/DELETE statement during
postgresBeginForeignScan, not postgresIterateForeignScan, as you
mentioned below:


Reading the code, I noticed that the pushed down UPDATE or DELETE
statement is executed
during postgresBeginForeignScan rather than during
postgresIterateForeignScan.



I'll modify the patch so as to execute the statement during
postgresIterateForeignScan.


Done.


It is not expected that postgresReScanForeignScan is called when the
UPDATE/DELETE
is pushed down, right?  Maybe it would make sense to add an assertion
for that.


IIUC, that is right.  As ModifyTable doesn't support rescan currently,
postgresReScanForeignScan needn't be called in the update pushdown
case.  The assertion is a good idea.  I'll add it.


Done.

You can find the updated version of the patch at

http://www.postgresql.org/message-id/53fffa50.6020...@lab.ntt.co.jp

Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-08-28 Thread Mark Kirkwood

On 29/08/14 08:56, Alvaro Herrera wrote:

Robert Haas wrote:


I agree that you might not like that.  But you might not like having
the table vacuumed slower than the configured rate, either.  My
impression is that the time between vacuums isn't really all that
negotiable for some people.  I had one customer who had horrible bloat
issues on a table that was vacuumed every minute; when we changed the
configuration so that it was vacuumed every 15 seconds, those problems
went away.


Wow, that's extreme.  For that case you can set
autovacuum_vacuum_cost_limit to 0, which disables the whole thing and
lets vacuum run at full speed -- no throttling at all.  Would that
satisfy the concern?



Well no - you might have a whole lot of big tables that you want vacuum 
to not get too aggressive on, but a few small tables that are highly 
volatile. So you want *them* vacuumed really fast to prevent them 
becoming huge tables with only a few rows therein, but your system might 
not be able to handle *all* your tables being vacuumed at full speed.


This is a fairly common scenario for (several) web CMS systems that tend 
to want to have session and/or cache tables that are small and extremely 
volatile, plus the rest of the (real) data that is bigger and vastly 
less volatile.


While there is a valid objection along the lines of "don't use a 
database instead of memcache", it does seem reasonable that Postgres 
should be able to cope with this type of workload.
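
The per-table tuning that workload wants looks something like this 
(values invented for the example):

    ALTER TABLE session SET (
        autovacuum_vacuum_scale_factor = 0.0,
        autovacuum_vacuum_threshold    = 50,
        autovacuum_vacuum_cost_delay   = 0);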


Cheers

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers