Re: [HACKERS] New functions in sslinfo module

2014-07-18 Thread Воронин Дмитрий
Hello, Andreas and others!

I make a new version of patch. I corrected your notes for my previous version
of patch. Could you test it? Thank you.

03.07.2014, 01:54, "Andreas Karlsson" andr...@proxel.se:
 On 07/02/2014 02:17 PM, Воронин Дмитрий wrote:
  I apologize, that I am writing this message today. Thank you for testing
  my patch!

 You are welcome!

  I will modify functions ssl_extensions(), that it returns a set (key,
  value). Could you get me an example of code those function?

 You can look at hstore_each() in hstore_op.c.

  - Why are X509_NAME_field_to_text(), X509_NAME_to_text(),
  ASN1_STRING_to_text() and get_extension() not static? None of these are
  a symbol which should be exported.

  Why do you use pg_do_encoding_conversion() over pg_any_to_server()?
  pg_any_to_server() is implemented using pg_do_encoding_conversion().

  I don't write a code of those functions and I can't answer on your
  question.

 Hm, I thought I saw them changed from static to not in the diff after
 applying your patch. Maybe I just misread the patch.

 --
 Andreas Karlsson

--
Best regards, Dmitry Voronin

*** a/contrib/sslinfo/sslinfo.c
--- b/contrib/sslinfo/sslinfo.c
***
*** 5,10 
--- 5,12 
   * This file is distributed under BSD-style license.
   *
   * contrib/sslinfo/sslinfo.c
+  * 
+  * Extension functions written by Dmitry Voronin carriingfat...@yandex.ru, CNIIEISU.
   */
  
  #include "postgres.h"
***
*** 14,29 
  #include "miscadmin.h"
  #include "utils/builtins.h"
  #include "mb/pg_wchar.h"
  
  #include <openssl/x509.h>
  #include <openssl/asn1.h>
  
  PG_MODULE_MAGIC;
  
- static Datum X509_NAME_field_to_text(X509_NAME *name, text *fieldName);
- static Datum X509_NAME_to_text(X509_NAME *name);
- static Datum ASN1_STRING_to_text(ASN1_STRING *str);
  
  
  /*
   * Indicates whether current session uses SSL
--- 16,49 
  #include "miscadmin.h"
  #include "utils/builtins.h"
  #include "mb/pg_wchar.h"
+ #include "funcapi.h"
  
  #include <openssl/x509.h>
  #include <openssl/asn1.h>
+ #include <openssl/x509v3.h>
+ 
  
  PG_MODULE_MAGIC;
  
  
+ Datum		ssl_is_used(PG_FUNCTION_ARGS);
+ Datum		ssl_version(PG_FUNCTION_ARGS);
+ Datum		ssl_cipher(PG_FUNCTION_ARGS);
+ Datum		ssl_client_cert_present(PG_FUNCTION_ARGS);
+ Datum		ssl_client_serial(PG_FUNCTION_ARGS);
+ Datum		ssl_client_dn_field(PG_FUNCTION_ARGS);
+ Datum		ssl_issuer_field(PG_FUNCTION_ARGS);
+ Datum		ssl_client_dn(PG_FUNCTION_ARGS);
+ Datum		ssl_issuer_dn(PG_FUNCTION_ARGS);
+ Datum		X509_NAME_field_to_text(X509_NAME *name, text *fieldName);
+ Datum		X509_NAME_to_text(X509_NAME *name);
+ Datum		ASN1_STRING_to_text(ASN1_STRING *str);
+ 
+ X509_EXTENSION	*get_extension(X509* certificate, char *name);
+ Datum 		ssl_get_extension_value(PG_FUNCTION_ARGS);
+ Datum		ssl_is_critical_extension(PG_FUNCTION_ARGS);
+ Datum 		ssl_get_count_of_extensions(PG_FUNCTION_ARGS);
+ Datum		ssl_get_extension_names(PG_FUNCTION_ARGS);
  
  /*
   * Indicates whether current session uses SSL
***
*** 40,46  ssl_is_used(PG_FUNCTION_ARGS)
  
  
  /*
!  * Returns SSL version currently in use.
   */
  PG_FUNCTION_INFO_V1(ssl_version);
  Datum
--- 60,66 
  
  
  /*
!  * Returns SSL cipher currently in use.
   */
  PG_FUNCTION_INFO_V1(ssl_version);
  Datum
***
*** 66,72  ssl_cipher(PG_FUNCTION_ARGS)
  
  
  /*
!  * Indicates whether current client provided a certificate
   *
   * Function has no arguments.  Returns bool.  True if current session
   * is SSL session and client certificate is verified, otherwise false.
--- 86,92 
  
  
  /*
!  * Indicates whether current client has provided a certificate
   *
   * Function has no arguments.  Returns bool.  True if current session
   * is SSL session and client certificate is verified, otherwise false.
***
*** 121,133  ssl_client_serial(PG_FUNCTION_ARGS)
   * current database encoding if possible.  Any invalid characters are
   * replaced by question marks.
   *
!  * Parameter: str - OpenSSL ASN1_STRING structure.  Memory management
   * of this structure is responsibility of caller.
   *
   * Returns Datum, which can be directly returned from a C language SQL
   * function.
   */
! static Datum
  ASN1_STRING_to_text(ASN1_STRING *str)
  {
  	BIO		   *membuf;
--- 141,153 
   * current database encoding if possible.  Any invalid characters are
   * replaced by question marks.
   *
!  * Parameter: str - OpenSSL ASN1_STRING structure.	Memory management
   * of this structure is responsibility of caller.
   *
   * Returns Datum, which can be directly returned from a C language SQL
   * function.
   */
! Datum
  ASN1_STRING_to_text(ASN1_STRING *str)
  {
  	BIO		   *membuf;
***
*** 146,152  ASN1_STRING_to_text(ASN1_STRING *str)
  	nullterm = '\0';
  	BIO_write(membuf, &nullterm, 1);
  	size = BIO_get_mem_data(membuf, &sp);
! 	dp = pg_any_to_server(sp, size - 1, PG_UTF8);
  	result = cstring_to_text(dp);
  	if (dp != sp)
  		pfree(dp);
--- 166,175 
  	nullterm = '\0';
  	

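Andreas's pointer to hstore_each() refers to PostgreSQL's multi-call set-returning-function pattern. As a rough, hypothetical sketch (not the patch's actual code, with the extension-walking logic elided and untested against any particular release), a (key, value) SRF built on the funcapi.h machinery looks roughly like this:

```c
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "access/htup_details.h"
#include "utils/builtins.h"

/* Hypothetical sketch of a (key, value) set-returning function in the
 * style of hstore_each(); real extension-fetching code is elided. */
PG_FUNCTION_INFO_V1(ssl_extensions_sketch);

Datum
ssl_extensions_sketch(PG_FUNCTION_ARGS)
{
	FuncCallContext *funcctx;

	if (SRF_IS_FIRSTCALL())
	{
		MemoryContext oldcontext;
		TupleDesc	tupdesc;

		funcctx = SRF_FIRSTCALL_INIT();
		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

		/* the SQL declaration must describe a (key text, value text) row */
		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
			elog(ERROR, "return type must be a row type");
		funcctx->tuple_desc = BlessTupleDesc(tupdesc);

		/* here the real code would count the certificate's extensions */
		funcctx->max_calls = 0;

		MemoryContextSwitchTo(oldcontext);
	}

	funcctx = SRF_PERCALL_SETUP();

	if (funcctx->call_cntr < funcctx->max_calls)
	{
		Datum		values[2];
		bool		nulls[2] = {false, false};
		HeapTuple	tuple;

		/* here the real code would fetch the call_cntr'th extension */
		values[0] = CStringGetTextDatum("extension-name");
		values[1] = CStringGetTextDatum("extension-value");
		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
	}

	SRF_RETURN_DONE(funcctx);
}
```

The executor calls the function once per result row; funcapi.h keeps the per-set state in the FuncCallContext until SRF_RETURN_DONE.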
Re: [HACKERS] Portability issues in TAP tests

2014-07-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Thu, Jul 17, 2014 at 03:31:19PM -0400, Tom Lane wrote:
 4. IPC::Run isn't installed by default on RHEL, and probably not on other
 distros either.  If there's a reasonably painless way to remove this
 dependency, it'd improve the portability of the tests.  This is lower
 priority than the previous items, for sure.

 Perl 5.10 and later incorporate IPC::Cmd, a weaker counterpart to IPC::Run.
 It looked promising as a replacement when I examined the matter briefly.  Why
 do you estimate the priority of (4) well below the priority of (3)?

Well, yum install perl-IPC-Run fixes that problem on RHEL6.  Installing
a more modern version of Test::More is at least one order of magnitude
harder technically; and it might involve political/managerial hurdles as
well, for people who face policies against installing not-supported-by-vendor
packages.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gaussian distribution pgbench

2014-07-18 Thread Mitsumasa KONDO
2014-07-18 5:13 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:


  However, ISTM that it is not the purpose of pgbench documentation to be a
 primer about what is an exponential or gaussian distribution, so the idea
 would yet be to have a relatively compact explanation, and that the
 interested but clueless reader would document h..self from wikipedia or a
 text book or a friend or a math teacher (who could be a friend as
 well:-).


 Well, I think it's a balance.  I agree that the pgbench documentation
 shouldn't try to substitute for a text book or a math teacher, but I
 also think that you shouldn't necessarily need to refer to a text book
 or a math teacher in order to figure out how to use pgbench.  Saying
 it's complicated, so we don't have to explain it would be a cop out;
 we need to *make* it simple.  And if there's no way to do that, then
 IMHO we should reject the patch in favor of some future patch that
 implements something that will be easy for users to understand.

   [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
 starting vacuum...end.
 transaction type: Exponential distribution TPC-B (sort of)
 scaling factor: 1
 exponential threshold: 10.0

 decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
 highest/lowest percent of the range: 9.5% 0.0%


 I don't have a clue what that means.  None.


 Maybe we could add in front of the decile/percent

 distribution of increasing account key values selected by pgbench:


 I still wouldn't know what that meant.  And it misses the point
 anyway: if the documentation is good, this will be unnecessary.  If
 the documentation is bad, a printout that tries to illustrate it by
 example is not an acceptable substitute.


 The decile description is quite classic when discussing statistics.

Yeah, maybe, I and Fabien-san don't believe that he doesn't know the decile
percentage.
However, I think more description about decile is needed.

For example, when we set the number of transactions to 10,000 (-t 10000),
the range of aid is 100,000,
and --exponential is 10, the decile percents are as follows:

decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

They mean that,
#number of accesses in range of aid (from decile percents):
  1 to 10,000 = 6,320 times
  10,001 to 20,000 = 2,330 times
  20,001 to 30,000 = 860 times
  ...
  90,001 to 100,000 = 0 times

#number of accesses in range of aid (from highest/lowest percent of the
range):
 1 to 1,000 = 950 times
 ...
 99,001 to 100,000 = 0 times

that's all.

This information makes it easy to understand the distribution of access
probability, doesn't it?
Maybe Fabien-san and I have a background in mathematics, so we think decile
percentages are common sense.
But if they aren't common sense, I agree with adding an explanation of them
to the documentation.

Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-07-18 Thread Magnus Hagander
On Fri, Jul 18, 2014 at 5:33 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jul 17, 2014 at 4:51 PM, Magnus Hagander mag...@hagander.net
 wrote:

 Did anyone actually test this patch? :)

 I admit I did not build it on Windows specifically because I assumed
 that was done as part of the development and review. And the changes
 to pg_event.c can never have built, since the file does not include
 the required header.

 I have tested it on Windows and in fact on Linux as well to see if there
 is any side impact before marking it as Ready For Committer.

 It seems to me that the required header was removed in the last version
 (pg_ctl_eventsrc_v11) where the MessageBox() related changes were
 removed from the patch as per recent discussion.  Sorry for not being able
 to check the last version posted.

Gotcha. Thanks for clarifying, and I apologize if I came across a bit
harsh even with the smiley.


 I have reverted that part of the patch for now, hopefully that'll
 unbreak the buildfarm.

 Do you want me to write a patch to use DEFAULT_EVENT_SOURCE in
 pgevent?

I'm not sure it's worth it. I do like the property that pg_event
doesn't have to pull in the full set of headers. But I guess it's
quite confusing to have *one* place that doesn't use the #define. So I
guess if it does work to just pull in the required header then yes, we
should do it - but if it turns out that header causes any conflicts
with anything else we're doing in the file, then let's just skip it.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-07-18 Thread Petr Jelinek

On 18/07/14 00:41, Andres Freund wrote:

On 2014-06-27 00:51:02 +0200, Petr Jelinek wrote:

{
switch (c)
{
@@ -227,6 +229,33 @@ main(int argc, char *argv[])
XLogFromFileName(optarg, minXlogTli, 
minXlogSegNo);
break;

+   case 's':
+   if (optarg)
+   {
+#ifdef HAVE_STRTOULL
+   set_sysid = strtoull(optarg, &endptr, 10);
+#else
+   set_sysid = strtoul(optarg, &endptr, 10);
+#endif


Wouldn't it be better to use sscanf()? That should work with large
inputs across platforms.



Well, sscanf does not do much validation and silently truncates too big 
input (which is why I replaced it with strtoull, original patch did have 
sscanf), also I think the portability of sscanf might not be as good, 
see 9d7ded0f4277f5c0063eca8e871a34e2355a8371 commit.




+   /*
+    * Validate input, we use strspn because strtoul(l) accepts
+    * negative numbers and silently converts them to unsigned
+    */
+   if (set_sysid == 0 || errno != 0 ||
+       endptr == optarg || *endptr != '\0' ||
+       strspn(optarg, "0123456789") != strlen(optarg))
+   {
+       fprintf(stderr, _("%s: invalid argument for option -s\n"), progname);
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+       exit(1);
+   }


Maybe: 'invalid argument \"%s\" ...'?



Ok


@@ -1087,6 +1147,7 @@ usage(void)
 	printf(_("  -o OID           set next OID\n"));
 	printf(_("  -O OFFSET        set next multitransaction offset\n"));
 	printf(_("  -V, --version    output version information, then exit\n"));
+	printf(_("  -s [SYSID]       set system identifier (or generate one)\n"));
 	printf(_("  -x XID           set next transaction ID\n"));
 	printf(_("  -?, --help       show this help, then exit\n"));
 	printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n"));


I think we usually try to keep the options roughly alphabetically
ordered. Same in the getopt() above.



Ok, attached v4 with those changes.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 34b0606..39ae574 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -30,6 +30,7 @@ PostgreSQL documentation
    <arg choice="opt"><option>-m</option> <replaceable class="parameter">mxid</replaceable>,<replaceable class="parameter">mxid</replaceable></arg>
    <arg choice="opt"><option>-O</option> <replaceable class="parameter">mxoff</replaceable></arg>
    <arg choice="opt"><option>-l</option> <replaceable class="parameter">xlogfile</replaceable></arg>
+   <arg choice="opt"><option>-s</option> [<replaceable class="parameter">sysid</replaceable>]</arg>
    <arg choice="plain"><replaceable>datadir</replaceable></arg>
   </cmdsynopsis>
  </refsynopsisdiv>
@@ -184,6 +185,13 @@ PostgreSQL documentation
   </para>
 
   <para>
+   The <option>-s</option> option instructs <command>pg_resetxlog</command> to
+   set the system identifier of the cluster to the specified <replaceable>sysid</replaceable>.
+   If <replaceable>sysid</replaceable> is not provided, a new one will be generated
+   using the same algorithm <command>initdb</command> uses.
+  </para>
+
+  <para>
    The <option>-V</option> and <option>--version</option> options print
    the <application>pg_resetxlog</application> version and exit.  The
    options <option>-?</option> and <option>--help</option> show supported arguments,
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 915a1ed..c94ccda 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -67,7 +67,9 @@ static MultiXactId set_mxid = 0;
 static MultiXactOffset set_mxoff = (MultiXactOffset) -1;
 static uint32 minXlogTli = 0;
 static XLogSegNo minXlogSegNo = 0;
+static uint64 set_sysid = 0;
 
+static uint64 GenerateSystemIdentifier(void);
 static bool ReadControlFile(void);
 static void GuessControlValues(void);
 static void PrintControlValues(bool guessed);
@@ -111,7 +113,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+	while ((c = getopt(argc, argv, "fl:m:no:O:s::x:e:")) != -1)
 	{
 		switch (c)
 		{
@@ -227,6 +229,33 @@ main(int argc, char *argv[])
 XLogFromFileName(optarg, minXlogTli, minXlogSegNo);
 break;
 
+			case 's':
+if (optarg)
+{
+#ifdef HAVE_STRTOULL
+	set_sysid = strtoull(optarg, 

Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-07-18 Thread Andres Freund
On 2014-07-18 13:08:24 +0200, Petr Jelinek wrote:
 On 18/07/14 00:41, Andres Freund wrote:
 On 2014-06-27 00:51:02 +0200, Petr Jelinek wrote:
 {
 switch (c)
 {
 @@ -227,6 +229,33 @@ main(int argc, char *argv[])
 XLogFromFileName(optarg, minXlogTli, 
  minXlogSegNo);
 break;
 
 +   case 's':
 +   if (optarg)
 +   {
 +#ifdef HAVE_STRTOULL
 +   set_sysid = strtoull(optarg, &endptr, 10);
 +#else
 +   set_sysid = strtoul(optarg, &endptr, 10);
 +#endif
 
 Wouldn't it be better to use sscanf()? That should work with large
 inputs across platforms.
 
 
 Well, sscanf does not do much validation and silently truncates too big
 input (which is why I replaced it with strtoull, original patch did have
 sscanf)

Well, the checks on length you've added should catch that when adapted
properly.

, also I think the portability of sscanf might not be as good, see
 9d7ded0f4277f5c0063eca8e871a34e2355a8371 commit.

Fair point. But I don't think the arguments why using strtoul instead of
strtoull is safe hold much water here. In the pg_stat_statement case the
worst that could happen is that the rowcount isn't accurate. Here it's
setting a wrong system identifier. Not really the same.

Maybe it's time to pursue
http://archives.postgresql.org/message-id/20140603144654.GM24145%40awork2.anarazel.de
further :(

Greetings,

Andres Freund




Re: [HACKERS] New functions in sslinfo module

2014-07-18 Thread Michael Paquier
On Fri, Jul 18, 2014 at 3:12 PM, Воронин Дмитрий
carriingfat...@yandex.ru wrote:
 I make a new version of patch. I corrected your notes for my previous
 version of patch. Could you test it? Thank you.

I just had a look at the new version of this patch, and this is
lacking a couple of things.

First, this patch has been developed using a tarball of 9.3.4. sslinfo
is not a module that changes frequently, so you are lucky that this is not
generating diffs with HEAD, but you should get yourself familiar
with git and submit patches that are based on HEAD to facilitate their
review and integration. You can have a look here to begin with:
https://wiki.postgresql.org/wiki/Working_with_GIT

Then, here are more comments about the patch:
- Update sslinfo to 1.1. Here are all the details about how to do it:
http://www.postgresql.org/docs/devel/static/extend-extensions.html#AEN57147
Well, basically this is only creating sslinfo--1.0--1.1.sql to be able
to do an upgrade, copying sslinfo--1.0.sql to sslinfo--1.1.sql with the
new objects defined, and bumping the version in sslinfo.control.
- In your previous patches I saw that you defined the new functions in
sslinfo--1.0.sql but in the new version of your patch it is not the
case.
- Please remove all the diff noise in sslinfo.sgml, like this one:
*** 157,167 ****
  <varlistentry>
   <term>
    <function>ssl_client_dn_field(fieldname text) returns text</function>
-   <indexterm>
-    <primary>ssl_client_dn_field</primary>
-   </indexterm>
  </term>
  <listitem>
  <para>
--- 157,167 ----
 </varlistentry>

 <varlistentry>
+ <indexterm>
+  <primary>ssl_client_dn_field</primary>
+ </indexterm>
Your patch should only include documentation for the new functions.
- Please remove trailing whitespace; there is quite a lot of it.
- In 9.5, PG_FUNCTION_INFO_V1 does the function declaration for you,
making the following block useless:
[...]
+Datum  ssl_is_used(PG_FUNCTION_ARGS);
+Datum  ssl_version(PG_FUNCTION_ARGS);
+Datum  ssl_cipher(PG_FUNCTION_ARGS);
[...]
- Please avoid if blocks of this type, this is not consistent with the
project style:
if (SRF_IS_FIRSTCALL()) {
[...]
}
One newline for each bracket. Here is the manual page referencing code
conventions:
http://www.postgresql.org/docs/current/static/source.html

Most of those comments have been already mentioned by Andreas in one
of his emails upthread but you have obviously not solved the issues he
has reported.

This looks like a useful feature, but at this point of the commit fest
and considering the number of structural changes still needed I will
mark this patch as Returned with feedback. Feel free to resubmit to
the next commit fest in August with an updated patch of course!

Regards,
-- 
Michael


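
Concretely, the extension-versioning steps Michael lists boil down to a few small files. The sketch below is hypothetical: the function names come from the patch's declarations, but the argument and return types are assumed, not taken from the submission.

```sql
-- sslinfo.control: bump the default version
--     default_version = '1.1'

-- sslinfo--1.1.sql: a copy of sslinfo--1.0.sql plus the new objects.

-- sslinfo--1.0--1.1.sql: the new objects only, applied by
--     ALTER EXTENSION sslinfo UPDATE TO '1.1';
CREATE FUNCTION ssl_get_extension_value(text) RETURNS text
AS 'MODULE_PATHNAME', 'ssl_get_extension_value'
LANGUAGE C STRICT;

CREATE FUNCTION ssl_is_critical_extension(text) RETURNS boolean
AS 'MODULE_PATHNAME', 'ssl_is_critical_extension'
LANGUAGE C STRICT;

CREATE FUNCTION ssl_get_count_of_extensions() RETURNS integer
AS 'MODULE_PATHNAME', 'ssl_get_count_of_extensions'
LANGUAGE C STRICT;

CREATE FUNCTION ssl_get_extension_names() RETURNS SETOF text
AS 'MODULE_PATHNAME', 'ssl_get_extension_names'
LANGUAGE C STRICT;
```

With both sslinfo--1.1.sql and the 1.0-to-1.1 script in place, a fresh CREATE EXTENSION installs 1.1 directly while existing installations can upgrade in place.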


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-07-18 Thread Petr Jelinek

On 18/07/14 13:18, Andres Freund wrote:

On 2014-07-18 13:08:24 +0200, Petr Jelinek wrote:

On 18/07/14 00:41, Andres Freund wrote:

Wouldn't it be better to use sscanf()? That should work with large
inputs across platforms.



Well, sscanf does not do much validation and silently truncates too big
input (which is why I replaced it with strtoull, original patch did have
sscanf)


Well, the checks on length you've added should catch that when adapted
properly.


I don't see a portable way to validate too-big input values when 
using sscanf.




Maybe it's time to pursue
http://archives.postgresql.org/message-id/20140603144654.GM24145%40awork2.anarazel.de
further :(



Yes, I've seen that patch and was quite sad that it didn't make it in core.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] gaussian distribution pgbench

2014-07-18 Thread Fabien COELHO



For example, when we set the number of transactions to 10,000 (-t 10000),
the range of aid is 100,000,
and --exponential is 10, the decile percents are as follows:

decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

They mean that,
#number of accesses in range of aid (from decile percents):
 1 to 10,000 = 6,320 times
 10,001 to 20,000 = 2,330 times
 20,001 to 30,000 = 860 times
 ...
 90,001 to 100,000 = 0 times

#number of accesses in range of aid (from highest/lowest percent of the
range):
1 to 1,000 = 950 times
...
99,001 to 100,000 = 0 times

that's all.

This information makes it easy to understand the distribution of access
probability, doesn't it?
Maybe Fabien-san and I have a background in mathematics, so we think decile
percentages are common sense.
But if they aren't common sense, I agree with adding an explanation of them
to the documentation.


What we are talking about is the summary at the end of the run, which is 
expected to be compact, hence the terse few lines.


I'm not sure how to make it explicit without extending the summary too 
much, so it would not be a summary anymore:-)


My initial assumption is that anyone interested enough in changing the 
default uniform distribution for a test would know about deciles, but that 
seems to be optimistic.


Maybe it would be okay to keep a terse summary but to expand the 
documentation to explain what it means, as you suggested above...


--
Fabien.




Re: [HACKERS] Built-in binning functions

2014-07-18 Thread Petr Jelinek

On 16/07/14 21:35, Pavel Stehule wrote:

The performance difference is about 20% (+/- few depending on the
array size), I don't know if that's bad enough to warrant
type-specific implementation. I personally don't know how to make
the generic implementation much faster than it is now, except maybe
by turning it into aggregate which would cache the deconstructed
version of the array, but that change semantics quite a bit and is
probably not all that desirable.


I am not sure if our API is enough to do it - there is no simple
support for immutable parameters.


Just to clarify, the ~20% performance difference is with the separate 
generic implementation for fixed-width types (most of the time seems to 
be spent in the FunctionCallInvoke part; I even tried to use sortsupport 
instead but it does not seem to help much).



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-07-18 Thread MauMau

From: Magnus Hagander mag...@hagander.net
On Fri, Jul 18, 2014 at 5:33 AM, Amit Kapila amit.kapil...@gmail.com 
wrote:

On Thu, Jul 17, 2014 at 4:51 PM, Magnus Hagander mag...@hagander.net
wrote:


Did anyone actually test this patch? :)

I admit I did not build it on Windows specifically because I assumed
that was done as part of the development and review. And the changes
to pg_event.c can never have built, since the file does not include
the required header.


I have tested it on Windows and in fact on Linux as well to see if there
is any side impact before marking it as Ready For Committer.

It seems to me that the required header was removed in the last version
(pg_ctl_eventsrc_v11) where the MessageBox() related changes were
removed from the patch as per recent discussion.  Sorry for not being able
to check the last version posted.


Gotcha. Thanks for clarifying, and I apologize if I came across a bit
harsh even with the smiley.


I'm sorry to have caused both of you trouble.  I have to admit that I didn't 
compile the source when I removed the MessageBox()-related changes.  The 
attached patch fixes that.  I confirmed a successful build this time.


Thank you for committing, Magnus-san, and thank you very much for kind and 
repeated review and help, Amit-san.



Regards
MauMau


pgevent.patch
Description: Binary data



Re: [HACKERS] gaussian distribution pgbench

2014-07-18 Thread Fabien COELHO


Please find attached 2 patches, which are a split of the patch discussed 
in this thread.


(A) add gaussian & exponential options to pgbench \setrandom
the patch includes sql test files.

There is no change in the *code* from previous already reviewed 
submissions, so I do not think that it needs another review on that 
account.


However I have (yet again) reworked the *documentation* (for Andres Freund 
& Robert Haas), in particular both descriptions now follow the same 
structure (introduction, formula, intuition, rule of thumb and 
constraint). I have differentiated the concept and the option by putting 
the later in literal tags, and added a link to the corresponding 
wikipedia pages.



Please bear in mind that:
 1. English is not my native language.
 2. this is not easy reading... this is maths, to read slowly:-)
 3. word smithing contributions are welcome.

I assume somehow that a user interested in gaussian & exponential 
distributions must know a little bit about probabilities...




(B) add pgbench test variants with gauss & exponential.

I have reworked the patch so as to avoid copy-pasting the 3 test cases, as 
requested by Andres Freund, thus this is new, although quite simple, code. 
I have also added explanations in the documentation about how to interpret 
the decile outputs, so as to hopefully address Robert Haas's comments.


--
Fabien.

diff --git a/contrib/pgbench/README b/contrib/pgbench/README
new file mode 100644
index 000..6881256
--- /dev/null
+++ b/contrib/pgbench/README
@@ -0,0 +1,5 @@
+# gaussian and exponential tests
+# with XXX as expo or gauss
+psql test < test-init.sql
+./pgbench -M prepared -f test-XXX-run.sql -t 100 -P 1 -n test
+psql test < test-XXX-check.sql
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 4aa8a50..a80c0a5 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -41,6 +41,7 @@
 #include <math.h>
 #include <signal.h>
 #include <sys/time.h>
+#include <assert.h>
 #ifdef HAVE_SYS_SELECT_H
 #include <sys/select.h>
 #endif
@@ -98,6 +99,8 @@ static int	pthread_join(pthread_t th, void **thread_return);
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
+#define MIN_GAUSSIAN_THRESHOLD		2.0	/* minimum threshold for gauss */
+
 int			nxacts = 0;			/* number of transactions per client */
 int			duration = 0;		/* duration in seconds */
 
@@ -471,6 +474,76 @@ getrand(TState *thread, int64 min, int64 max)
 	return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
 }
 
+/*
+ * random number generator: exponential distribution from min to max inclusive.
+ * the threshold is so that the density of probability for the last cut-off max
+ * value is exp(-exp_threshold).
+ */
+static int64
+getExponentialrand(TState *thread, int64 min, int64 max, double exp_threshold)
+{
+	double cut, uniform, rand;
+	assert(exp_threshold > 0.0);
+	cut = exp(-exp_threshold);
+	/* erand in [0, 1), uniform in (0, 1] */
+	uniform = 1.0 - pg_erand48(thread->random_state);
+	/*
+	 * inner expression in (cut, 1] (if exp_threshold > 0),
+	 * rand in [0, 1)
+	 */
+	assert((1.0 - cut) != 0.0);
+	rand = - log(cut + (1.0 - cut) * uniform) / exp_threshold;
+	/* return int64 random number within between min and max */
+	return min + (int64)((max - min + 1) * rand);
+}
+
+/* random number generator: gaussian distribution from min to max inclusive */
+static int64
+getGaussianrand(TState *thread, int64 min, int64 max, double stdev_threshold)
+{
+	double		stdev;
+	double		rand;
+
+	/*
+	 * Get user specified random number from this loop, with
+	 * -stdev_threshold <= stdev < stdev_threshold
+	 *
+	 * This loop is executed until the number is in the expected range.
+	 *
+	 * As the minimum threshold is 2.0, the probability of looping is low:
+	 * sqrt(-2 ln(r)) >= 2 => r <= e^{-2} ~ 0.135, then when taking the average
+	 * sinus multiplier as 2/pi, we have a 8.6% looping probability in the
+	 * worst case. For a 5.0 threshold value, the looping probability
+	 * is about e^{-5} * 2 / pi ~ 0.43%.
+	 */
+	do
+	{
+		/*
+		 * pg_erand48 generates [0,1), but for the basic version of the
+		 * Box-Muller transform the two uniformly distributed random numbers
+		 * are expected in (0, 1] (see http://en.wikipedia.org/wiki/Box_muller)
+		 */
+		double rand1 = 1.0 - pg_erand48(thread->random_state);
+		double rand2 = 1.0 - pg_erand48(thread->random_state);
+
+		/* Box-Muller basic form transform */
+		double var_sqrt = sqrt(-2.0 * log(rand1));
+		stdev = var_sqrt * sin(2.0 * M_PI * rand2);
+
+		/*
+ 		 * we may try with cos, but there may be a bias induced if the previous
+		 * value fails the test? To be on the safe side, let us try over.
+		 */
+	}
+	while (stdev < -stdev_threshold || stdev >= stdev_threshold);
+
+	/* stdev is in [-threshold, threshold), normalization to [0,1) */
+	rand = (stdev + stdev_threshold) / (stdev_threshold * 2.0);
+
+	/* return int64 random number within between min and max */
+	return 

[HACKERS] Proposal for updating src/timezone

2014-07-18 Thread John Cochran
Given my recent examination of the src/timezone subtree and the current
code at IANA for timezone information and tools, I am proposing the following
changes and would like to first get group consensus on the change prior to
investing any major effort.

My proposal is to have the following directory structure:

.../src/timezone - would have only PostgreSQL-specific code
.../src/timezone/tznames - would be unaltered from current (optionally this
could be removed)
.../src/timezone/iana - would contain unaltered code and data from IANA

The .../src/timezone/data directory would be removed.

In doing this, any future updates to the IANA code should be trivial to
incorporate into the postgres tree.

Reasons for change.
1. The current timezone code is based upon, according to the comments,
version 2010c from IANA. IANA's latest code is version 2014e and there are
substantial changes between those versions (as well as 35 intermediate
versions ... Yes, IANA changes and updates the code regularly and
frequently)

2. The current timezone data has a few entries that are not properly
handled by any pre-2013 code so an update of some sort is needed anyway.

3. The latest code from IANA compiles with GCC using a kitchen sink array
of warning options without any generated warnings. This should alleviate
any concerns about not accepting any code unless it's warning-free. In
case anyone is interested, the GCC options are: -Dlint -g3 -O3 -fno-common
-fstrict-aliasing  -Wall -Wextra -Wbad-function-cast -Wcast-align
-Wcast-qual -Wformat=2 -Winit-self -Wmissing-declarations
-Wmissing-noreturn -Wmissing-prototypes -Wnested-externs -Wno-address
-Wno-cast-qual -Wno-format-nonliteral -Wno-sign-compare
-Wno-sign-conversion -Wno-type-limits -Wno-unused-parameter
-Woverlength-strings -Wpointer-arith -Wshadow -Wstrict-prototypes
-Wsuggest-attribute=const -Wsuggest-attribute=noreturn
-Wsuggest-attribute=pure -Wtrampolines -Wwrite-strings

Aspects of the change that I'm not happy with:
1. I would have liked to recommend 2 sub-directories underneath
.../src/timezone/iana named code and data, but unfortunately have to
suggest untarring both the code and data files directly into the iana
directory. The reason for this is that the IANA code does not compile using
the IANA Makefile unless the script yearistype.sh is present and that
script is currently present in the data tar file, not the code tar file.
And that unfortunately means that splitting the IANA code and data into
different directories would violate the objective of being able to deposit
the files unaltered into postgres.

2. Depositing the IANA code unaltered would also deposit some junk files
into the directory. The files would mainly consist of man pages and html
files containing documentation for the timezone code. The extra files would
consume approximately 500 kilobytes above what's actually needed, but
otherwise wouldn't have any adverse effects.

Thank you for reading this proposal,
John Cochran
-- 

There are two ways of constructing a software design: One way is to make it
so simple that there are obviously no deficiencies and the other way is to
make it so complicated that there are no obvious deficiencies. — C.A.R.
Hoare


Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message

2014-07-18 Thread Andres Freund
Hi,

On 2014-07-18 23:38:09 +0900, MauMau wrote:
 My customer reported a problem that the following message is output too
 often.
 
 LOG:  autovacuum: found orphan temp table "pg_temp_838"."some_table" in
 database "some_db"
 LOG:  autovacuum: found orphan temp table "pg_temp_902"."some_table" in
 database "some_db"

So they had server crashes of some form before - otherwise they
shouldn't see this because during ordinary shutdown the schema will have
been dropped. C.f. RemoveTempRelationsCallback().

 1. Why and when are these messages output?  Do we have to do
 something?

Yes, you should investigate how the situation came to be.

 2. Won't they use up disk space?

Yes.
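
For anyone needing to reclaim that space by hand, a rough sketch (the schema
name is illustrative; first confirm via pg_stat_activity that no live backend
still owns the schema before dropping anything):

```sql
-- List temp schemas; orphans are ones whose owning backend no longer exists.
SELECT nspname FROM pg_namespace WHERE nspname LIKE 'pg_temp_%';

-- After verifying the owning backend is gone:
DROP SCHEMA pg_temp_838 CASCADE;
```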

 3. Doesn't the output processing of these messages or its cause affect
 performance?  We happen to be facing a performance problem, the cause of
 which we haven't found yet.

Meh. If that's the bottleneck you've bigger problems.

 So, I propose a simple fix to change the LOG level to DEBUG1.  I don't know
 which of DEBUG1-DEBUG5 is appropriate, and any level is OK.  Could you
 include this in 9.2.9?

Surely that's the wrong end to tackle this from. Hiding actual problems
is a seriously bad idea.

It'd be nice if we had infrastructure to do this at startup, but we
don't...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for updating src/timezone

2014-07-18 Thread Tom Lane
John Cochran j69coch...@gmail.com writes:
 My proposal is to have the following directory structure

 .../src/timezone - Would have only PostgreSQL specific code
 .../src/timezone/tznames - would be unaltered from current (optionally this
 could be removed)
 .../src/timezone/iana - would contain unaltered code and data from IANA

 The .../src/timezone/data directory would be removed.

I am not in favor of doing anything to the data/ directory.  If we can
have unaltered tzcode source files in their own subdirectory, that'd be
great, but I see no advantage to smashing the IANA code and data files
together.  Furthermore, doing so would break the existing ability to just
cherry-pick tzdata update commits into back branches.

Part of my thought here is that we do generally just dump tzdata updates
into our tree without much thought, but I don't think that will ever be
the case for tzcode updates; we'll want to look at those with more care,
and probably we'll not back-patch them.

 1. I would have liked to recommend 2 sub-directories underneath
 .../src/timezone/iana named code and data, but unfortunately have to
 suggest untarring both the code and data files directly into the iana
 directory. The reason for this is that the IANA code does not compile using
 the IANA Makefile unless the script yearistype.sh is present and that
 script is currently present in the data tar file, not the code tar file.

I have exactly zero expectation of using their Makefile, so this is not a
concern.

 2. Depositing the IANA code unaltered would also deposit some junk files
 into the directory. The files would mainly consist of man pages and html
 files containing documentation for the timezone code. The extra files would
 consume approximately 500 kilobytes above what's actually needed, but
 otherwise wouldn't have any adverse effects.

We're not doing that either, IMO.  Why should we invest 500K of tarball
space on that?

regards, tom lane




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Andres Freund
Hi,

On 2014-07-17 18:18:41 -0700, Peter Geoghegan wrote:
 I'm working on UPSERT again. I think that in order to make useful
 progress, I'll have to find a better way of overcoming the visibility
 issues (i.e. the problem of what to do about a
 still-in-progress-to-our-snapshot row being locked at READ COMMITTED
 isolation level [1][2]).

Agreed.

 I've made some tentative additions to my earlier patch to change the syntax:
 
 postgres=# explain analyze
 with c as (
   insert into ints(key, val) values (1, 'a'), (2, 'b')
   on conflict
   select * for update)
   update ints set val = c.val from c conflicts;

I still don't agree that this is a sane syntax for upsert. Avoiding it
would reduce the scope of the problems you have measureably. We should
provide a syntax that does the UPSERT itself, instead of delegating that
to the user as you're proposing above.  Afaics nearly none of the
problems you're debating in your email would exist if upsert were
entirely done internally.

I think the things that use weird visibility semantics are pretty much
all doing that internally (things being EvalPlanQual stuff for
INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
see sufficient reason why we should break away from that here. Yes,
there's stuff that cannot be done when it's done internally, but we can
live with those not being possible.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 10:46 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think the things that use weird visibility semantics are pretty much
 all doing that internally (things being EvalPlanQual stuff for
 INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
 see sufficient reason why we should break away from that here. Yes,
 there's stuff that cannot be done when it's done internally, but we can
 live with those not being possible.

The whole point of what I was proposing was that those semantics would
only apply to a special tid scan node. Perhaps I missed something, but
I'm not sure why you'd consider that I was breaking away from that
here at all.

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Andres Freund
On 2014-07-18 10:53:36 -0700, Peter Geoghegan wrote:
 On Fri, Jul 18, 2014 at 10:46 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I think the things that use weird visibility semantics are pretty much
  all doing that internally (things being EvalPlanQual stuff for
  INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
  see sufficient reason why we should break away from that here. Yes,
  there's stuff that cannot be done when it's done internally, but we can
  live with those not being possible.
 
 The whole point of what I was proposing was that those semantics would
 only apply to a special tid scan node. Perhaps I missed something, but
 I'm not sure why you'd consider that I was breaking away from that
 here at all.

I don't see why you'd need such a node at all if we had a fully builtin
UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
barely anybody will understand, let alone use correctly in queries they
write themselves.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 I don't see why you'd need such a node at all if we had a fully builtin
 UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
 UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
 barely anybody will understand, let alone use correctly in queries they
 write themselves.

I accept that there will be a need for certain restrictions. Most
obviously, if you update the target table referencing a CTE like this,
not using the special CONFLICTS clause in the UPDATE (or DELETE) is an
error. And as I mentioned, you may only join the projected duplicates
to the UPDATE ModifyTable - an attempt to join any more relations is
an error. In short, this *is* a fully built-in upsert.


-- 
Peter Geoghegan




Re: [HACKERS] Built-in binning functions

2014-07-18 Thread Petr Jelinek

On 08/07/14 02:14, Tom Lane wrote:

Also, the set of functions provided still needs more thought, because the
resolution of overloaded functions doesn't really work as nicely as all
that.


I am honestly considering just removing the special case for int8 for now;
the SQL standard width_bucket only has support for float8 and numeric, and
maybe that's enough for this one also...


What's your opinion on that?
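
For context, this is the standard-mandated behavior under discussion; the
example values below just exercise the float8 variant:

```sql
-- width_bucket(operand, low, high, nbuckets) assigns operand to one of
-- nbuckets equal-width buckets spanning [low, high).
SELECT width_bucket(5.35, 0.024, 10.06, 5);  -- returns 3
```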

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Andres Freund
On 2014-07-18 11:14:34 -0700, Peter Geoghegan wrote:
 On Fri, Jul 18, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I don't see why you'd need such a node at all if we had a fully builtin
  UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
  UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
  barely anybody will understand, let alone use correctly in queries they
  write themselves.
 
 I accept that there will be a need for certain restrictions. Most
 obviously, if you update the target table referencing a CTE like this,
 not using the special CONFLICTS clause in the UPDATE (or DELETE) is an
 error. And as I mentioned, you may only join the projected duplicates
 to the UPDATE ModifyTable - an attempt to join any more relations is
 an error. In short, this *is* a fully built-in upsert.

Meh. An understandable syntax wouldn't require the pullups with a special
scan node and such. I think you're attempting a sort of genericity
that's making your (important!) goal much harder to reach.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote:
 Meh. An understandable syntax wouldn't require the pullups with a special
 scan node and such. I think you're attempting a sort of genericity
 that's making your (important!) goal much harder to reach.

Well, maybe. If the genericity of this syntax isn't what people want,
I may go with something else. At the suggestion of Robert I did start
a dedicated thread on that question months back (explicitly putting
aside the nitty-gritty details of the implementation), but that didn't
go anywhere.

-- 
Peter Geoghegan




Re: [HACKERS] RLS Design

2014-07-18 Thread Robert Haas
On Wed, Jul 16, 2014 at 10:04 PM, Brightwell, Adam
adam.brightw...@crunchydatasolutions.com wrote:

 Yes, I just tested it and the following would work from a grammar
 perspective:

 ALTER TABLE table_name POLICY ADD policy_name (policy_quals)
 ALTER TABLE table_name POLICY DROP policy_name

 Though, it would obviously require the addition of POLICY to the list of
 unreserved keywords.  I don't suspect that would be a concern, as it is not
 reserved, but thought I would point it out just in case.

 Another thought I had was, would we also want the following, so that
 policies could be modified?

 ALTER TABLE table_name POLICY ALTER policy_name (policy_quals)

I think we do want a way to modify policies.  However, we tend to
avoid syntax that involves unnatural word order, as this certainly
does.  Maybe it's better to follow the example of CREATE RULE and
CREATE TRIGGER and do something like this instead:

CREATE POLICY policy_name ON table_name USING quals;
ALTER POLICY policy_name ON table_name USING quals;
DROP POLICY policy_name ON table_name;

The advantage of this is that you can regard "policy_name ON
table_name" as the identifier for the policy throughout the system.
You need some kind of identifier of that sort anyway to support
COMMENT ON, SECURITY LABEL, and ALTER EXTENSION ADD/DROP for policies.
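
To illustrate, the proposed syntax might be used like this (purely a sketch
of syntax under discussion; the table, policy name, and is_admin() helper
are invented for the example):

```sql
CREATE POLICY account_owner ON accounts USING (owner = current_user);
ALTER POLICY account_owner ON accounts
    USING (owner = current_user OR is_admin(current_user));
DROP POLICY account_owner ON accounts;
-- the "name ON table" identifier would also serve for COMMENT ON etc.
COMMENT ON POLICY account_owner ON accounts IS 'restrict rows to their owner';
```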

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 11:32 AM, Peter Geoghegan p...@heroku.com wrote:
 Well, maybe. If the genericity of this syntax isn't what people want,
 I may go with something else.

By the way, I didn't mention that there is now also the optional
ability to specify a merge on unique index directly in DML. It would
be much nicer to specify a sort key instead, but that might be tricky
in the general case. I imagine that other systems solve the problem by
being very restrictive in what will be (implicitly) accepted as a
merge-on index. Seemingly there are problems with all major SQL MERGE
implementations, so I don't necessarily expect to be able to draw
lessons from them in any way here.

-- 
Peter Geoghegan




Re: [HACKERS] Proposal for updating src/timezone

2014-07-18 Thread John Cochran
On Fri, Jul 18, 2014 at 1:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 John Cochran j69coch...@gmail.com writes:
  My proposal is to have the following directory structure



 ...
  1. I would have liked to recommend 2 sub-directories underneath

...



I have exactly zero expectation of using their Makefile, so this is not a
 concern.

  2. Depositing the IANA code unaltered would also deposit some junk
 files

...

 We're not doing that either, IMO.  Why should we invest 500K of tarball
 space on that?



Understandable, and both issues are in the category of "Things I didn't
like".

Did a diff between the 2010c version of localtime.c and the postgres
version and saw a lot more differences than what could have been expected
from simple reformatting and adaptation. So I installed gitk and took a
look at the change history of localtime.c.

Well, looking at the results, it pretty much put paid to the concept of
ever importing the IANA code unaltered into the postgres tree. In fact, it
looks like the original version of localtime.c was based upon one of the
2004a thru 2004c versions of the IANA code and since then has been
independently maintained.

In any case, I think my best path forward is to look at the change history
of the source files under src/timezone and determine the actual origin of
each file based upon the git history and the IANA archive. Then see what's
needed to bring the code up to date based upon the IANA code. Due to the
configure option --with-system-tzdata, I assume that postgres needs to
remain compatible with the output from the IANA zic program. That combined
with the warning messages about some of the entries in the current timezone
data not being compatible with pre-2013 code worries me a bit since the
2010c code from IANA does not support the output from the 2014e zic.
-- 

There are two ways of constructing a software design: One way is to make it
so simple that there are obviously no deficiencies and the other way is to
make it so complicated that there are no obvious deficiencies. — C.A.R.
Hoare


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-07-18 Thread Jeff Janes
On Wed, Jul 16, 2014 at 5:30 AM, Dilip kumar dilip.ku...@huawei.com wrote:

  On 16 July 2014 12:13 Magnus Hagander Wrote,

 Yeah, those are exactly my points. I think it would be significantly
 simpler to do it that way, rather than forking and threading. And also
 easier to make portable...

 (and as an optimization on Alvaro's suggestion, you can of course reuse
 the initial connection as one of the workers as long as you got the full
 list of tasks from it up front, which I think you do anyway in order to do
 sorting of tasks...)

 Oh, I got your point. I will update my patch and send it.

 Now we can completely remove the vac_parallel.h file, and there is no need
 for refactoring either :)

 Thanks & Regards,

 Dilip Kumar


Should we push the refactoring through anyway?  I have a hard time
believing that pg_dump is going to be the only client program we ever have
that will need process-level parallelism, even if this feature itself does
not need it.  Why make the next person who comes along re-invent that
re-factoring of this wheel?

Cheers,

Jeff


Re: [HACKERS] Proposal for updating src/timezone

2014-07-18 Thread Tom Lane
John Cochran j69coch...@gmail.com writes:
 Did a diff between the 2010c version of localtime.c and the postgres
 version and saw a lot more differences than what could have been expected
 from simple reformatting and adaptation. So I installed gitk and took a
 look at the change history of localtime.c

 Well, looking at the results, it pretty much put paid to the concept of
 ever importing the IANA code unaltered into the postgres tree. In fact, it
 looks like the original version of localtime.c was based upon one of the
 2004a thru 2004c versions of the IANA code and since then has been
 independently maintained.

That's certainly true, but I'm not sure that we should give up all that
easily on getting closer to the IANA code again.  For one thing, some of
the thrashing had to do with the fact that we went over to 64-bit
timestamps sooner than the IANA crew did.  I'm assuming that they insist
on 64-bit arithmetic now; we do too, so for instance some of the typedef
hacking might no longer be necessary.

AFAICT from a quick scan of the git logs, the main thing we've done to the
API that's not in IANA is invent pg_next_dst_boundary().  Perhaps that
could be pushed into a separate source file.

Even if we only got to a point where the sources were the same except for
a few narrow patches, it would make tracking their updates much easier.

regards, tom lane




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-07-18 Thread Alvaro Herrera
Jeff Janes wrote:

 Should we push the refactoring through anyway?  I have a hard time
 believing that pg_dump is going to be the only client program we ever have
 that will need process-level parallelism, even if this feature itself does
 not need it.  Why make the next person who comes along re-invent that
 re-factoring of this wheel?

I gave the refactoring patch a look some days ago, and my conclusion was
that it is reasonably sound but it needed quite some cleanup in order
for it to be committable.  Without any immediate use case, it's hard to
justify going through all that effort.  Maybe we can add a TODO item and
have it point to the posted patch, so that if in the future we see a
need for another parallel client program we can easily rebase the
current patch.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] how to reach D5 in tuplesort.c 's polyphase merge algorithm?

2014-07-18 Thread 土卜皿
hi, all
  for studying the polyphase merge algorithm of tuplesort.c, I use ddd and
append a table, which has a schema as follows:

CREATE TABLE Towns (
   id SERIAL UNIQUE NOT NULL,
   code VARCHAR(10) NOT NULL, -- Only unique inside a department
   article TEXT,
   name TEXT NOT NULL, -- Names are not really unique, for instance
'Sainte-Croix'
   department VARCHAR(4) NOT NULL REFERENCES Departments (code),
   UNIQUE (code, department)
   -- UNIQUE (name, department) -- Not perfectly unique but almost
  );

and has 36684 records, and every record is like:
id  code  article  name  department
31800266\NMachault77

and for getting into external sort, I type the following command:

select * from towns order by name desc;

but I found it does not reach D5 and D6 during sorting. I thought the
reason is that the number of runs before merging is 3 (too small), so to
generate more runs I shuffled the records when inputting them, but got the
same result.

So I want some help: what schema and records do I at least need to reach
D5 and D6?  Any advice will be appreciated!

BEST REGARDS
Dillon


Re: [HACKERS] how to reach D5 in tuplesort.c 's polyphase merge algorithm?

2014-07-18 Thread Tom Lane
土卜皿 pengcz.n...@gmail.com writes:
   for studying polyphase merge algorithm of tuplesort.c, I use ddd and
 append a table, which has a schema as follows:
 ...
 and has 36684 records, and every record is like:
 id  code  article  name  department
 31800266\NMachault77

 and for getting into external sort, I type the following command:

 select * from towns order by name desc;

  but I found it does not reach D5 and D6 during sorting,

That doesn't sound like enough data to force it to spill to disk at all;
at least not unless you turn down work_mem to some very small value.

regards, tom lane
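
A sketch of forcing the spill with the table as-is (64kB is the minimum
allowed work_mem; trace_sort is available when PostgreSQL is built with
TRACE_SORT, which is the default):

```sql
SET work_mem = '64kB';           -- minimum value; makes even ~36k rows spill to tapes
SET trace_sort = on;             -- logs run creation and each merge pass
SET client_min_messages = log;   -- show those log lines in the client
SELECT * FROM towns ORDER BY name DESC;
```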




Re: [HACKERS] how to reach D5 in tuplesort.c 's polyphase merge algorithm?

2014-07-18 Thread 土卜皿
2014-07-19 6:26 GMT+08:00 Tom Lane t...@sss.pgh.pa.us:

 土卜皿 pengcz.n...@gmail.com writes:
for studying polyphase merge algorithm of tuplesort.c, I use ddd and
  append a table, which has a schema as follows:
  ...
  and has 36684 records, and every record is like:
  id  code  article  name  department
  31800266\NMachault77

  and for getting into external sort, I type the following command:

  select * from towns order by name desc;

   but I found it does not reach D5 and D6 during sorting,

 That doesn't sound like enough data to force it to spill to disk at all;
 at least not unless you turn down work_mem to some very small value.



hi, Tom
  thanks a lot!



The work_mem you mentioned reminds me of one more thing I tried: I changed
BLCKSZ to 8192/2, which compiled successfully, but I got an error when
executing initdb.

Dillon


Re: [HACKERS] RLS Design

2014-07-18 Thread Brightwell, Adam

 I think we do want a way to modify policies.  However, we tend to
 avoid syntax that involves unnatural word order, as this certainly
 does.  Maybe it's better to follow the example of CREATE RULE and
 CREATE TRIGGER and do something like this instead:

 CREATE POLICY policy_name ON table_name USING quals;
 ALTER POLICY policy_name ON table_name USING quals;
 DROP POLICY policy_name ON table_name;

 The advantage of this is that you can regard policy_name ON
 table_name as the identifier for the policy throughout the system.
 You need some kind of identifier of that sort anyway to support
 COMMENT ON, SECURITY LABEL, and ALTER EXTENSION ADD/DROP for policies.


Sounds good.  I certainly think it makes a lot of sense to include the
ALTER functionality, if for no other reason than ease of use.

Another item to consider, though I believe it can come later, is per-action
policies.  Following the above suggested syntax, perhaps that might look
like the following?

CREATE POLICY policy_name ON table_name FOR action USING quals;
ALTER POLICY policy_name ON table_name FOR action USING quals;
DROP POLICY policy_name ON table_name FOR action;

I was also giving some thought to the use of POLICY, perhaps I am wrong,
but it does seem it could be at risk of becoming ambiguous down the road.
 I can't think of any specific examples at the moment, but my concern is
what happens if we wanted to add another type of policy, whatever that
might be, later?  Would it make more sense to go ahead and qualify this a
little more with ROW SECURITY POLICY?

Thanks,
Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


[HACKERS] Issues with dropped columns in views depending on functions

2014-07-18 Thread Tom Lane
I looked into the problem described here:
http://www.postgresql.org/message-id/53c93986.80...@clickware.de

The core issue is that the ruleutils.c code isn't thinking about dropped
columns in the output of a function-in-FROM that returns a composite type.
Pre-9.3, there had been some code there to cope with the case by means of
doing a get_rte_attribute_is_dropped() test on each column as the alias
list was printed, but I'd lost that in the rewrite that improved handling
of column aliases in RTEs with dropped columns.  I looked at putting that
back, but (a) it's slow, and (b) it didn't really work right even when it
was there: it only accidentally failed to crash in EXPLAIN's usage,
because we were applying exprType() to a null funcexpr field.

The next possibility that I looked at was teaching the code to ignore
empty-string items in a function RTE's alias list, since that's what
the parser inserts to correspond to dropped columns at parse time.
This solves the reported problem, almost, except that it exposed an
ancient bug in outfuncs/readfuncs: the code to print a T_String node is
wrong for the case of an empty string.  What it actually prints is "<>",
which of course reads back as <>, not empty.  After repairing that, I had
a patch that takes care of the reported problem, at least for newly
created views (see attached).  It won't do anything to retroactively fix
the problem in existing views, but I think that might be okay given the
small number of complaints.

However, this only fixes the issue for columns that were dropped before
the view was created.  The system will let you drop columns *after* the
view is created, and then we have a problem, as illustrated by the
rather dubious results in the regression test in the attached patch.
I'm of the opinion that this is a bug and we should disallow dropping
a composite type's column if it's explicitly referenced in any view, much
as would happen if there were a direct reference to the table.  However,
that will take some nontrivial surgery on the dependency code, and I
rather doubt that we want to back-patch such a thing --- especially since
the needed pg_depend entries wouldn't be there anyway for existing views
in existing databases.

What I propose to do is apply the attached back through 9.3, so that at
least users are no worse off than they were pre-9.3.  I will look at
fixing the problem of a post-view-creation column drop later, but I
suspect we'll end up not back-patching those changes.

regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9573a9b..0631bc2 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outValue(StringInfo str, const Value *v
*** 2506,2512 
  			break;
  		case T_String:
  			appendStringInfoChar(str, '"');
! 			_outToken(str, value->val.str);
  			appendStringInfoChar(str, '"');
  			break;
  		case T_BitString:
--- 2506,2513 
  			break;
  		case T_String:
  			appendStringInfoChar(str, '"');
! 			if (value->val.str[0] != '\0')
! 				_outToken(str, value->val.str);
  			appendStringInfoChar(str, '"');
  			break;
  		case T_BitString:
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 0781ac8..7237e5d 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*** set_relation_column_names(deparse_namesp
*** 3086,3092 
  		i = 0;
  		foreach(lc, rte->eref->colnames)
  		{
! 			real_colnames[i] = strVal(lfirst(lc));
  			i++;
  		}
  	}
--- 3086,3101 
  		i = 0;
  		foreach(lc, rte->eref->colnames)
  		{
! 			/*
! 			 * If the column name shown in eref is an empty string, then it's
! 			 * a column that was dropped at the time of parsing the query, so
! 			 * treat it as dropped.
! 			 */
! 			char	   *cname = strVal(lfirst(lc));
! 
! 			if (cname[0] == '\0')
! 				cname = NULL;
! 			real_colnames[i] = cname;
  			i++;
  		}
  	}
diff --git a/src/test/regress/expected/create_view.out b/src/test/regress/expected/create_view.out
index 06b2037..e2d4276 100644
*** a/src/test/regress/expected/create_view.out
--- b/src/test/regress/expected/create_view.out
*** select pg_get_viewdef('vv6', true);
*** 1332,1337 
--- 1332,1389 
JOIN tt13 USING (z);
  (1 row)
  
+ --
+ -- Check some cases involving dropped columns in a function's rowtype result
+ --
+ create table tt14t (f1 text, f2 text, f3 text, f4 text);
+ insert into tt14t values('foo', 'bar', 'baz', 'quux');
+ alter table tt14t drop column f2;
+ create function tt14f() returns setof tt14t as
+ $$
+ declare
+ rec1 record;
+ begin
+ for rec1 in select * from tt14t
+ loop
+ return next rec1;
+ end loop;
+ end;
+ $$
+ language plpgsql;
+ create view tt14v as select t.* from tt14f() t;
+ select pg_get_viewdef('tt14v', true);
+  pg_get_viewdef 
+ ----------------
+   SELECT t.f1, +
+      t.f3,

Re: [HACKERS] WAL format and API changes (9.5)

2014-07-18 Thread Michael Paquier
On Wed, Jul 2, 2014 at 4:23 PM, Michael Paquier
michael.paqu...@gmail.com wrote:



 On Wed, Jul 2, 2014 at 4:09 PM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 3) I noticed a bug in gin redo code path when trying to use the WAL replay
 facility.
This patch has been in "Waiting on author" status for a little while,
so I am marking it now as "Returned with feedback", as there are still
issues with the gin record construction and redo code paths.

Feel free to disagree if you don't think this is correct.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] subquery in CHECK constraint

2014-07-18 Thread Tatsuo Ishii
Hi,

Has anybody tried to implement subqueries in CHECK constraints? If so,
what are the issues with implementing them? Or is the feature not worth
the effort? Comments and/or opinions are welcome.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] subquery in CHECK constraint

2014-07-18 Thread Peter Geoghegan
Hi,

On Fri, Jul 18, 2014 at 7:31 PM, Tatsuo Ishii is...@postgresql.org wrote:
 Has anybody tried to implement subqueries in CHECK constraints? If so,
 what are the issues with implementing them? Or is the feature not worth
 the effort? Comments and/or opinions are welcome.

I think the basic problem would be what the check constraint subquery
meant to the user, and how useful that is expected to be in general. A
subquery in a check constraint would presumably involve checking the
subquery using an existing snapshot of the command that required the
constraint to be verified (say, an INSERT). But why should that
snapshot be so special? In any case the result of the subquery may not
be immutable (even in some limited, practical sense), and we expect
check constraints to be on immutable conditions on constrained columns
only. In general it would be practically impossible to determine that
something else had changed the state of the database in such a way as
to make the check constraint no longer verify successfully on each
row, so we would not be able to prevent that from happening later on.
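As a concrete (and entirely hypothetical) illustration of the problem,
with invented table and column names, the kind of constraint under
discussion would look something like this; PostgreSQL rejects it at
parse time:

```sql
-- Hypothetical sketch: PostgreSQL does not accept subqueries in CHECK
-- constraints, so this DDL is rejected outright.
CREATE TABLE orders (
    customer_id integer,
    amount      numeric,
    CHECK (customer_id IN (SELECT id FROM customers WHERE active))
);
```

Even if the system accepted it, a later UPDATE setting active = false
on a referenced customers row would silently invalidate rows that had
already passed the check, which is exactly the immutability problem
described above.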

I imagine that you have a very specific case in mind, though. Perhaps
you can share the details.

-- 
Peter Geoghegan




Re: [HACKERS] subquery in CHECK constraint

2014-07-18 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Fri, Jul 18, 2014 at 7:31 PM, Tatsuo Ishii is...@postgresql.org wrote:
 Has anybody tried to implement subqueries in CHECK constraints? If so,
 what are the issues with implementing them? Or is the feature not worth
 the effort? Comments and/or opinions are welcome.

 I think the basic problem would be what the check constraint subquery
 meant to the user, and how useful that is expected to be in general.

Yeah.  Check constraints are only well-defined to the extent that they
constrain the contents of the current row independent of anything else.
It's hard to conceive of a use-case for a subquery that wouldn't violate
that in some fashion.

I can certainly conceive of cases in which you want to constrain the
contents of one table in terms of another's contents, sort of like foreign
keys, but let's suppose that the particular invariant you have in mind
isn't expressible as a foreign key.  But you can write a CHECK subquery
that captures what you want.  Now what?  There's a *lot* of complicated
infrastructure needed to implement foreign keys, because they constrain
both tables not just one.  How would you invert a CHECK subquery to figure
out what changes are allowed in the referenced table?
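To make that scenario concrete (names invented for illustration), such
a foreign-key-like invariant might be a credit-limit check:

```sql
-- Hypothetical sketch: an invariant that resembles a foreign key but
-- is not expressible as one.  Verifying it on INSERT/UPDATE of orders
-- is the easy half; the hard half is deciding what should happen when
-- someone later lowers customers.credit_limit.
ALTER TABLE orders ADD CONSTRAINT amount_within_limit CHECK (
    amount <= (SELECT c.credit_limit
               FROM customers c
               WHERE c.id = customer_id)
);
```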

Maybe you're willing to accept the special case in which you don't intend
ever to change the referenced table, or are willing to take responsibility
for not changing it in a way that violates the CHECK constraint for any
existing row in the referencing table.  So fine; all the system is
supposed to do is check the constraint on every insert/update in the
referencing table.  I think the implementation issues would be

(1) there's no support for doing any planning of subqueries in standalone
expressions.  This is probably just a small matter of programming, but
still a hurdle to be jumped.

(2) how would pg_dump deal with check constraints like these?  At minimum
it'd have to understand, or guess at, the dump ordering restrictions
needed to allow data to be reloaded with such a constraint.

I'm not sure this is much easier to solve than the general case of
SQL assertions (which we have not got either).

regards, tom lane




Re: [HACKERS] subquery in CHECK constraint

2014-07-18 Thread Tatsuo Ishii
 I think the basic problem would be what the check constraint subquery
 meant to the user, and how useful that is expected to be in general. A
 subquery in a check constraint would presumably involve checking the
 subquery using an existing snapshot of the command that required the
 constraint to be verified (say, an INSERT). But why should that
 snapshot be so special? In any case the result of the subquery may not
 be immutable (even in some limited, practical sense), and we expect
 check constraints to be on immutable conditions on constrained columns
 only. In general it would be practically impossible to determine that
 something else had changed the state of the database in such a way as
 to make the check constraint no longer verify successfully on each
 row, so we would not be able to prevent that from happening later on.
 
 I imagine that you have a very specific case in mind, though. Perhaps
 you can share the details.

No, I don't have a specific case in mind. I am just wondering because
it's defined in the standard.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

