[HACKERS] pgbench more operators & functions

2016-04-02 Thread Fabien COELHO


Hello,

Here is a simple patch which adds a bunch of operators (bitwise: & | ^ ~,
comparisons: =/== <>/!= < <= > >=, logical: and/&& or/|| xor/^^ not/!) and
functions (exp ln if) to pgbench.  I've tried to be compatible with pg's SQL
where appropriate.
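
A note on how the grammar below implements the unary operators: ~x is
rewritten to -1 ^ x, and a > b is handled by swapping it to b < a.  A quick
standalone check of the bitwise identity behind the first rewrite, written
in plain C rather than pgbench code:

#include <assert.h>

int
main(void)
{
	long	x = 5432;

	/* bitwise NOT is equivalent to XOR with an all-ones word */
	assert(~x == (-1L ^ x));
	return 0;
}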


Also attached is a simple test script.

Some kind of continuations in \ commands would be a good thing.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 06cd5db..def6761 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -821,9 +821,17 @@ pgbench  options  dbname
   The expression may contain integer constants such as 5432,
   double constants such as 3.14159,
   references to variables :variablename,
-  unary operators (+, -) and binary operators
-  (+, -, *, /,
-  %) with their usual precedence and associativity,
+  arithmetic operators (unary: +, -; binary:
+  +, -, *, /, %),
+  bitwise operators (unary: ~; binary: &,
+  |, #/^, <<,
+  >>),
+  comparisons (=/==, <>/!=,
+  <, <=, >, >=),
+  logical operators (unary: not/!;
+  binary: and/&&,
+  or/||, xor/^^),
+  with their usual precedence and associativity,
   function calls, and
   parentheses.
  
@@ -954,6 +962,20 @@ pgbench  options  dbname
5432.0
   
   
+   exp(x)
+   double
+   exponential
+   exp(1.0)
+   2.718281828459045
+  
+  
+   if(c,e1,e2)
+   same as e*
+   if c is not zero then e1 else e2
+   if(0,1.0,2.0)
+   2.0
+  
+  
int(x)
integer
cast to int
@@ -961,6 +983,13 @@ pgbench  options  dbname
9
   
   
+   ln(x)
+   double
+   natural logarithm
+   ln(2.718281828459045)
+   1.0
+  
+  
max(i [, ... ] )
integer
maximum value
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 64c29dc..f9047e1 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -52,8 +52,22 @@ static PgBenchExpr *make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *
 %type  VARIABLE FUNCTION
 
 %token INTEGER_CONST DOUBLE_CONST VARIABLE FUNCTION
+%token AND_OP OR_OP XOR_OP NE_OP LE_OP GE_OP LS_OP RS_OP
 
 /* Precedence: lowest to highest */
+
+/* logical */
+%left	XOR_OP
+%left	OR_OP
+%left	AND_OP
+%right  UNOT
+/* comparison */
+%left	'=' NE_OP
+%left	'<' '>' LE_OP GE_OP
+/* bitwise */
+%left	'^' '|' '&' LS_OP RS_OP
+%right	UINV
+/* arithmetic */
 %left	'+' '-'
 %left	'*' '/' '%'
 %right	UMINUS
@@ -71,11 +85,29 @@ expr: '(' expr ')'			{ $$ = $2; }
 	| '+' expr %prec UMINUS	{ $$ = $2; }
 	| '-' expr %prec UMINUS	{ $$ = make_op(yyscanner, "-",
 		   make_integer_constant(0), $2); }
+	| '~' expr %prec UINV	{ $$ = make_op(yyscanner, "^",
+		   make_integer_constant(-1), $2); }
+	| '!' expr %prec UNOT	{ $$ = make_op(yyscanner, "^^",
+		   make_integer_constant(1), $2); }
 	| expr '+' expr			{ $$ = make_op(yyscanner, "+", $1, $3); }
 	| expr '-' expr			{ $$ = make_op(yyscanner, "-", $1, $3); }
 	| expr '*' expr			{ $$ = make_op(yyscanner, "*", $1, $3); }
 	| expr '/' expr			{ $$ = make_op(yyscanner, "/", $1, $3); }
 	| expr '%' expr			{ $$ = make_op(yyscanner, "%", $1, $3); }
+	| expr '<' expr			{ $$ = make_op(yyscanner, "<", $1, $3); }
+	| expr LE_OP expr		{ $$ = make_op(yyscanner, "<=", $1, $3); }
+	| expr '>' expr			{ $$ = make_op(yyscanner, "<", $3, $1); }
+	| expr GE_OP expr		{ $$ = make_op(yyscanner, "<=", $3, $1); }
+	| expr '=' expr			{ $$ = make_op(yyscanner, "=", $1, $3); }
+	| expr NE_OP expr		{ $$ = make_op(yyscanner, "<>", $1, $3); }
+	| expr '&' expr			{ $$ = make_op(yyscanner, "&", $1, $3); }
+	| expr '|' expr			{ $$ = make_op(yyscanner, "|", $1, $3); }
+	| expr '^' expr			{ $$ = make_op(yyscanner, "^", $1, $3); }
+	| expr LS_OP expr		{ $$ = make_op(yyscanner, "<<", $1, $3); }
+	| expr RS_OP expr		{ $$ = make_op(yyscanner, ">>", $1, $3); }
+	| expr AND_OP expr		{ $$ = make_op(yyscanner, "&&", $1, $3); }
+	| expr OR_OP expr		{ $$ = make_op(yyscanner, "||", $1, $3); }
+	| expr XOR_OP expr		{ $$ = make_op(yyscanner, "^^", $1, $3); }
 	| INTEGER_CONST			{ $$ = make_integer_constant($1); }
 	| DOUBLE_CONST			{ $$ = make_double_constant($1); }
 	| VARIABLE { $$ = make_variable($1); }
@@ -177,6 +209,12 @@ static const struct
 		"sqrt", 1, PGBENCH_SQRT
 	},
 	{
+		"ln", 1, PGBENCH_LN
+	},
+	{
+		"exp", 1, PGBENCH_EXP
+	},
+	{
 		"int", 1, PGBENCH_INT
 	},
 	{
@@ -191,6 +229,45 @@ static const struct
 	{
 		"random_exponential", 3, PGBENCH_RANDOM_EXPONENTIAL
 	},
+	{
+		"&&", 2, PGBENCH_AND
+	},
+	{
+		"||", 2, PGBENCH_OR
+	},
+	{
+		"^^", 2, PGBENCH_XOR
+	},
+	{
+		"&", 2, PGBENCH_BITAND
+	},
+	{
+		"|", 2, PGBENCH_BITOR
+	},
+	{
+		"^", 2, PGBENCH_BITXOR
+	},
+	{
+		"<<", 2, PGBENCH_LSHIFT
+	},
+	{
+		">>", 2, PGBENCH_RSHIFT
+	},
+	{
+		"=", 2, PGBENCH_EQ
+	},
+	{
+		"<>", 2, PGBENCH_NE
+	},
+	{
+		"<=", 2, PGBENCH_LE
+	},
+	{
+		"<", 

Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Tom Lane
Alex Shulgin  writes:
> On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane  wrote:
>> Well, we have to do *something* with the last (possibly only) value.
>> Neither "include always" nor "omit always" seem sane to me.  What other
>> decision rule do you want there?

> Well, what implies that the last value is somehow special?  I would think
> we should just do with it whatever we do with the rest of the candidate
> MCVs.

Sure, but both of the proposed decision rules break down when there are no
values after the one under consideration.  We need to do something sane
there.

> For "the only value" case: we cannot build a histogram out of a single
> value, so omitting it from MCVs is not a good strategy, ISTM.
> From my point of view that amounts to "include always".

If there is only one value, it will have 100% of the samples, so it would
get included under just about any decision rule (other than "more than
100% of this value plus following values").  I don't think making sure
this case works is sufficient to get us to a reasonable rule --- it's
a necessary case, but not a sufficient case.

regards, tom lane




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane  wrote:

> Alex Shulgin  writes:
> > On Sun, Apr 3, 2016 at 3:43 AM, Alex Shulgin 
> wrote:
> >> I'm not sure yet about the 1% rule for the last value, but would also
> love
> >> to see if we can avoid the arbitrary limit here.  What happens with a
> last
> >> value which is less than 1% popular in the current code anyway?
>
> > Now that I think about it, I don't really believe this arbitrary
> heuristic
> > is any good either, sorry.
>
> Yeah, it was just a placeholder to produce a working patch.
>
> Maybe we could base this cutoff on the stats target for the column?
> That is, "1%" would be the right number if stats target is 100,
> otherwise scale appropriately.
>
> > What was your motivation to introduce some limit at the bottom anyway?
>
> Well, we have to do *something* with the last (possibly only) value.
> Neither "include always" nor "omit always" seem sane to me.  What other
> decision rule do you want there?
>

Well, what implies that the last value is somehow special?  I would think
we should just do with it whatever we do with the rest of the candidate
MCVs.

For "the only value" case: we cannot build a histogram out of a single
value, so omitting it from MCVs is not a good strategy, ISTM.

From my point of view that amounts to "include always".  What problems do
you see with this approach exactly?

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Tom Lane
Alex Shulgin  writes:
> On Sun, Apr 3, 2016 at 3:43 AM, Alex Shulgin  wrote:
>> I'm not sure yet about the 1% rule for the last value, but would also love
>> to see if we can avoid the arbitrary limit here.  What happens with a last
>> value which is less than 1% popular in the current code anyway?

> Now that I think about it, I don't really believe this arbitrary heuristic
> is any good either, sorry.

Yeah, it was just a placeholder to produce a working patch.

Maybe we could base this cutoff on the stats target for the column?
That is, "1%" would be the right number if stats target is 100,
otherwise scale appropriately.

> What was your motivation to introduce some limit at the bottom anyway?

Well, we have to do *something* with the last (possibly only) value.
Neither "include always" nor "omit always" seem sane to me.  What other
decision rule do you want there?

regards, tom lane




Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-02 Thread Tom Lane
Alex Shulgin  writes:
> What about regression tests?  My assumption was that we won't be able to
> add them with the usual expected file approach, but that we also don't need
> it that hard.  Everyone's in favor?

It'd be nice to have a regression test, but I concur that the LOCATION
information is too unstable for that to be practical.

We could imagine testing some variant behavior that omits printing
LOCATION, but that doesn't really seem worth the trouble; I think
we'd be inventing the variant behavior only for testing purposes.

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Refer to a TOKEN_USER payload as a "token user, " not as a "user

2016-04-02 Thread Tom Lane
Stephen Frost  writes:
> * Noah Misch (n...@leadboat.com) wrote:
>> The wording in GetTokenUser() and AddUserToTokenDacl() seems fine; let's
>> standardize on that.  Also, every GetTokenUser() failure has been yielding
>> two messages, the second contributing no further detail.  I'll reduce that
>> to the usual one message per failure.

> This approach works for me.

OK by me, too.

regards, tom lane




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 3:43 AM, Alex Shulgin  wrote:

>
> I'm not sure yet about the 1% rule for the last value, but would also love
> to see if we can avoid the arbitrary limit here.  What happens with a last
> value which is less than 1% popular in the current code anyway?
>

Tom,

Now that I think about it, I don't really believe this arbitrary heuristic
is any good either, sorry.  What if you have a value that is just a bit
under 1% popular, but is being used in 50% of your queries in WHERE clause
with equality comparison?  Without this value in the MCV list the planner
will likely use SeqScan instead of an IndexScan that might be more
appropriate here.  I think we are much better off if we don't touch this
aspect of the current code.

What was your motivation to introduce some limit at the bottom anyway?  If
that was to prevent accidental division by zero, then an explicit check on
denominator not being 0 seems to me like a better safeguard than this.

Regards.
--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-02 Thread Alex Shulgin
On Sat, Apr 2, 2016 at 11:41 PM, Tom Lane  wrote:

> "Shulgin, Oleksandr"  writes:
> > On Mon, Mar 14, 2016 at 7:55 PM, Tom Lane  wrote:
> >> Yeah, I don't much like that either.  But I don't think we can avoid
> >> some refactoring there; as designed, conversion of an error message into
> >> user-visible form is too tightly tied to receipt of the message.
>
> > True.  Attached is a v2 which addresses all of the points raised earlier
> I
> > believe.
>
> I took a closer look at what's going on here and realized that actually
> it's not that hard to decouple the message-building routine from the
> PGconn state, because mostly it works with fields it's extracting out
> of the PGresult anyway.  The only piece of information that's lacking
> is conn->last_query.  I propose therefore that instead of doing it like
> this, we copy last_query into error PGresults.  This is strictly less
> added storage requirement than storing the whole verbose message would be,
> and it saves time compared to the v2 patch in the typical case where
> the application never does ask for an alternately-formatted error message.
> Plus we can actually support requests for any variant format, not only
> VERBOSE.
>
> Attached is a libpq-portion-only version of a patch doing it this way.
> I've not yet looked at the psql part of your patch.
>
> Comments?
>

Ah, neat, that's even better. :-)

What about regression tests?  My assumption was that we won't be able to
add them with the usual expected file approach, but that we also don't need
it that hard.  Everyone's in favor?

--
Alex


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-02 Thread Dilip Kumar
On Fri, Apr 1, 2016 at 2:09 PM, Andres Freund  wrote:

> One interesting thing to do would be to use -P1 during the test and see
> how much the performance varies over time.
>

I have run with the -P option: I ran for 1200 seconds with -P set to 30
seconds, and what I observed is that when throughput is low it stays low
throughout the run, and when it is high, it is high for the complete run.

Non-default parameters:

shared_buffers 8GB
max_connections 150

./pgbench -c $threads -j $threads -T 1200 -M prepared -S -P 30 postgres

Test results are attached in result.tar file.

File details:
--
   1. head_run1.txt           --> 20 mins run on head, reading 1
   2. head_run2.txt           --> 20 mins run on head, reading 2
   3. head_patch_run1.txt     --> 20 mins run on head + patch, reading 1
   4. head_patch_run2.txt     --> 20 mins run on head + patch, reading 2
   5. head_pinunpin.txt       --> 20 mins run on head + pinunpin-cas-8.patch
   6. head_pinunpin_patch.txt --> 20 mins run on head + pinunpin-cas-8.patch + patch, reading

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


result.tar
Description: Unix tar archive



Re: [HACKERS] PostgreSQL 9.6 Feature Freeze - April 8, 2016

2016-04-02 Thread Noah Misch
On Tue, Mar 15, 2016 at 11:35:31AM -0400, Robert Haas wrote:
> At the recent PostgreSQL developer meeting in Brussels, a consensus
> was reached that an early beta, leading to an on-time release, would
> be very desirable.  In particular, it was suggested that we should
> attempt to release PostgreSQL 9.6beta1 in May.  The release management
> team has determined that this will require a feature freeze in early
> April.  Accordingly, the release management has decided that all
> feature patches destined for PostgreSQL 9.6 must be committed no later
> than April 8, 2016.  Any patch not committed prior to 2016-04-09
> 00:00:00 GMT may not be committed to PostgreSQL 9.6 unless (a) it is a
> bug fix, (b) it represents essential cleanup of a previously-committed
> patch, or (c) the release management team has approved an extension to
> the deadline for that particular patch.
> 
> As of the time when feature freeze goes into effect, all patches
> remaining in the CommitFest will be either moved to the next
> CommitFest, if they are as of that time in a state of Needs Review or
> Ready for Committer; or marked as Returned with Feedback, if they are
> in a state of Waiting on Author.  The release management team
> encourages patch authors and reviewers, the CommitFest manager (David
> Steele), and all other community members to keep the status of each
> patch in the CommitFest accurate throughout the CommitFest, and
> particularly as the feature freeze deadline approaches.
> 
> The release management team anticipates approving exceptions to the
> feature freeze deadline only if the following criteria are met: (1)
> the extension is requested by a committer who actually intends to
> commit the patch before the extension lapses; (2) the release
> management team believes that the patch has sufficient consensus to
> proceed with it; (3) the release management team believes that
> accepting the patch will not destabilize the tree or otherwise
> compromise the PostgreSQL community's ability to get a beta out the
> door on schedule; and (4) the proposed extension does not extend
> beyond April 15, 2016.

Reminder: the PostgreSQL 9.6 feature freeze takes effect in under six days, at
2016-04-09 00:00:00 GMT.




Re: [HACKERS] Re: [COMMITTERS] pgsql: Refer to a TOKEN_USER payload as a "token user, " not as a "user

2016-04-02 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Fri, Apr 01, 2016 at 11:07:01PM -0400, Tom Lane wrote:
> > Stephen Frost  writes:
> > > * Noah Misch (n...@leadboat.com) wrote:
> > >> I see some advantages of writing "TokenUser", as you propose.  However,
> > >> our error style guide says "Avoid mentioning called function names,
> > >> either; instead say what the code was trying to do."  Mentioning an
> > >> enumerator name is morally similar to mentioning a function name.
> > 
> > > That's a fair point, but it doesn't mean we should use a different
> > > spelling for the enumerator name to avoid that piece of the policy.  I
> > > certainly don't see "token user" as saying "what the code was trying to
> > > do" in this case.
> > 
> > FWIW, "token user" conveys entirely inappropriate, politically incorrect
> > connotations to me ;-).  I don't have any great suggestions on what to use
> > instead, but I share Stephen's unhappiness with the wording as-committed.
> 
> The wording in GetTokenUser() and AddUserToTokenDacl() seems fine; let's
> standardize on that.  Also, every GetTokenUser() failure has been yielding two
> messages, the second contributing no further detail.  I'll reduce that to the
> usual one message per failure.

This approach works for me.

Thanks!

Stephen




Re: [HACKERS] Re: [COMMITTERS] pgsql: Refer to a TOKEN_USER payload as a "token user, " not as a "user

2016-04-02 Thread Noah Misch
On Fri, Apr 01, 2016 at 11:07:01PM -0400, Tom Lane wrote:
> Stephen Frost  writes:
> > * Noah Misch (n...@leadboat.com) wrote:
> >> I see some advantages of writing "TokenUser", as you propose.  However, our
> >> error style guide says "Avoid mentioning called function names, either;
> >> instead say what the code was trying to do."  Mentioning an enumerator
> >> name is morally similar to mentioning a function name.
> 
> > That's a fair point, but it doesn't mean we should use a different
> > spelling for the enumerator name to avoid that piece of the policy.  I
> certainly don't see "token user" as saying "what the code was trying to
> > do" in this case.
> 
> FWIW, "token user" conveys entirely inappropriate, politically incorrect
> connotations to me ;-).  I don't have any great suggestions on what to use
> instead, but I share Stephen's unhappiness with the wording as-committed.

The wording in GetTokenUser() and AddUserToTokenDacl() seems fine; let's
standardize on that.  Also, every GetTokenUser() failure has been yielding two
messages, the second contributing no further detail.  I'll reduce that to the
usual one message per failure.

nm
diff --git a/src/common/exec.c b/src/common/exec.c
index ec8c655..d736b02 100644
--- a/src/common/exec.c
+++ b/src/common/exec.c
@@ -674,10 +674,7 @@ AddUserToTokenDacl(HANDLE hToken)
 
/* Get the current user SID */
if (!GetTokenUser(hToken, ))
-   {
-   log_error("could not get token user: error code %lu", 
GetLastError());
-   goto cleanup;
-   }
+   goto cleanup;   /* callee printed a message */
 
/* Figure out the size of the new ACL */
dwNewAclSize = asi.AclBytesInUse + sizeof(ACCESS_ALLOWED_ACE) +
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 2751183..f21056e 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -1242,8 +1242,8 @@ pg_SSPI_recvauth(Port *port)
 
if (!GetTokenInformation(token, TokenUser, NULL, 0, ) && 
GetLastError() != 122)
ereport(ERROR,
-   (errmsg_internal("could not get token user size: error 
code %lu",
-GetLastError(;
+   (errmsg_internal("could not get token 
information buffer size: error code %lu",
+
GetLastError(;
 
tokenuser = malloc(retlen);
if (tokenuser == NULL)
@@ -1252,8 +1252,8 @@ pg_SSPI_recvauth(Port *port)
 
if (!GetTokenInformation(token, TokenUser, tokenuser, retlen, ))
ereport(ERROR,
-   (errmsg_internal("could not get token user: 
error code %lu",
-
GetLastError(;
+ (errmsg_internal("could not get token information: error code 
%lu",
+  GetLastError(;
 
CloseHandle(token);
 
diff --git a/src/port/win32security.c b/src/port/win32security.c
index d3eba9a..ab9cd67 100644
--- a/src/port/win32security.c
+++ b/src/port/win32security.c
@@ -248,14 +248,14 @@ pgwin32_get_dynamic_tokeninfo(HANDLE token, 
TOKEN_INFORMATION_CLASS class,
if (GetTokenInformation(token, class, NULL, 0, ))
{
snprintf(errbuf, errsize,
-"could not get token information: got zero 
size\n");
+"could not get token information buffer size: got zero 
size\n");
return FALSE;
}
 
if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
{
snprintf(errbuf, errsize,
-"could not get token information: error code 
%lu\n",
+"could not get token information buffer size: error 
code %lu\n",
 GetLastError());
return FALSE;
}
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 416829d..dcafcf1 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -904,7 +904,7 @@ current_windows_user(const char **acct, const char **dom)
if (!GetTokenInformation(token, TokenUser, NULL, 0, ) && 
GetLastError() != 122)
{
fprintf(stderr,
-   _("%s: could not get token user size: error 
code %lu\n"),
+   _("%s: could not get token information buffer 
size: error code %lu\n"),
progname, GetLastError());
exit(2);
}
@@ -912,7 +912,7 @@ current_windows_user(const char **acct, const char **dom)
if (!GetTokenInformation(token, TokenUser, tokenuser, retlen, ))
{
fprintf(stderr,
-   _("%s: could not get token user: error code 
%lu\n"),

Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Alex Shulgin
On Sat, Apr 2, 2016 at 8:57 PM, Shulgin, Oleksandr <oleksandr.shul...@zalando.de> wrote:
> On Apr 2, 2016 18:38, "Tom Lane"  wrote:
>
>> I did not like the fact that the compute_scalar_stats logic
>> would allow absolutely anything into the MCV list once num_hist falls
>> below 2. I think it's important that we continue to reject values that
>> are only seen once in the sample, because there's no very good reason to
>> think that they are MCVs and not just infrequent values that by luck
>> appeared in the sample.
>
> In my understanding we only put a value in the track list if we've seen it
> at least twice, no?

This is actually the case for compute_scalar_stats, but not for
compute_distinct_stats.  In the latter case we can still have
track[i].count == 1, but we can also break out of the loop if we see the
first tracked item like that.

>> Before I noticed the regression failure, I'd been thinking that maybe it'd
>> be better if the decision rule were not "at least 100+x% of the average
>> frequency of this value and later ones", but "at least 100+x% of the
>> average frequency of values after this one".
>
> Hm, sounds pretty similar to what I wanted to achieve, but better
> formalized.
>
>> With that formulation, we're
>> not constrained as to the range of x.  Now, if there are *no* values after
>> this one, then this way needs an explicit special case in order not to
>> compute 0/0; but the preceding para shows that we need a special case for
>> the last value anyway.
>>
>> So, attached is a patch rewritten along these lines.  I used 50% rather
>> than 25% as the new cutoff percentage --- obviously it should be higher
>> in this formulation than before, but I have no idea if that particular
>> number is good or we should use something else.  Also, the rule for the
>> last value is "at least 1% of the non-null samples".  That's a pure guess
>> as well.
>>
>> I do not have any good corpuses of data to try this on.  Can folks who
>> have been following this thread try it on their data and see how it
>> does?  Also please try some other multipliers besides 1.5, so we can
>> get a feeling for where that cutoff should be placed.
>
> Expect me to run it on my pet db early next week. :-)

I was trying to come up with some examples where 50% could be a good or a
bad choice, and then I noticed that we might be able to turn it the other
way round: instead of inventing an arbitrary limit on the MCV's frequency we
could use the histogram as the criterion for a candidate MCV to be
considered "common enough".  If we can prove that the value would produce
duplicates in the histogram, we should rather put it in the MCV list
(unless it's already fully occupied, in which case we can't do anything).

A value is guaranteed to produce a duplicate if it has appeared at least
2*hfreq+1 times in the sample (hfreq from your version of the patch, which
is recalculated on every loop iteration).  I could produce an updated patch
on Monday or anyone else following this discussion should be able to do
that.
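
To make the two rules being discussed concrete, here is a minimal C sketch
(illustrative only; not the actual analyze.c code, and all names and
signatures are invented):

#include <stdbool.h>

/*
 * Rule from the patch under discussion: keep candidate MCV i if its count
 * is at least 1.5x the average count of the candidates after it; the last
 * (possibly only) candidate instead needs at least 1% of the non-null rows.
 */
static bool
keep_by_multiplier(const int *counts, int ntracked, int i, int nonnull_rows)
{
	if (i == ntracked - 1)
		return counts[i] >= 0.01 * nonnull_rows;
	else
	{
		int		rest = 0;

		for (int j = i + 1; j < ntracked; j++)
			rest += counts[j];
		return counts[i] >= 1.5 * ((double) rest / (ntracked - i - 1));
	}
}

/*
 * Alternative suggested above: keep the candidate if it is guaranteed to
 * appear twice among the histogram bin boundaries, i.e. it was seen at
 * least 2*hfreq+1 times (hfreq = sample rows per histogram bin).
 */
static bool
keep_by_histogram_duplicate(int value_count, int hfreq)
{
	return value_count >= 2 * hfreq + 1;
}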

This approach would be a huge win in my opinion, because this way we can
avoid all the arbitrariness of that .25 / .50 multiplier.  Otherwise there
might be (valid) complaints that for some data .40 (or .60) is a better
fit, but we have already hard-coded something and there would be no easy
way to improve the situation for some users while avoiding breaking it for
the rest (unless we introduce a per-attribute configurable parameter like
statistics_target for this multiplier, which I'd like to avoid even
thinking about ;-)).

While we don't (well, can't) build a histogram in the
compute_distinct_stats variant, we could also apply the above mind trick
there for the same reason and to make the output of both functions more
consistent (and to have less maintenance burden between the variants).  And
anyway it would be rather surprising to see that depending on the presence
of an order operator for the type, the resulting MCV lists after the
ANALYZE would be different (I mean not only due to the random nature of the
sample).

I'm not sure yet about the 1% rule for the last value, but would also love
to see if we can avoid the arbitrary limit here.  What happens with a last
value which is less than 1% popular in the current code anyway?

Cheers!
--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-02 Thread Peter Geoghegan
On Thu, Feb 11, 2016 at 9:50 AM, Robert Haas  wrote:
>> Actually, what'd be really handy IMO is something to regurgitate the
>> most recent error in verbose mode, without making a permanent session
>> state change.  Something like
>>
>> regression=# insert into bar values(1);
>> ERROR:  insert or update on table "bar" violates foreign key constraint 
>> "bar_f1_fkey"
>> DETAIL:  Key (f1)=(1) is not present in table "foo".
>> regression=# \saywhat
>> ERROR:  23503: insert or update on table "bar" violates foreign key 
>> constraint "bar_f1_fkey"
>> DETAIL:  Key (f1)=(1) is not present in table "foo".
>> SCHEMA NAME:  public
>> TABLE NAME:  bar
>> CONSTRAINT NAME:  bar_f1_fkey
>> LOCATION:  ri_ReportViolation, ri_triggers.c:3326
>> regression=#
>
> Wow, that's a fabulous idea.  I see Oleksandr has tried to implement
> it, although I haven't looked at the patch.  But I think this would be
> REALLY helpful.

+1


-- 
Peter Geoghegan




Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Peter Geoghegan
On Sat, Apr 2, 2016 at 3:22 PM, Peter Geoghegan  wrote:
> On Sat, Apr 2, 2016 at 3:20 PM, Greg Stark  wrote:
>> There are also some weird cases in this list where there's a
>> significant regression at 32MB but not at 8MB. I would like to see
>> 16MB and perhaps 12MB and 24MB. They would help understand if these
>> are just quirks or there's a consistent pattern.
>
> I'll need to drill down to trace_sort output to see what happened there.

I looked into this.

I too noticed that queries like "SELECT a FROM int_test UNION SELECT a
FROM int_test_padding" looked strangely faster for 128MB +
high_cardinality_almost_asc + i5 for master branch. This made the
patch look relatively bad for the test with those exact properties
only; the patch was faster with both lower and higher work_mem
settings than 128MB. There was a weird spike in performance for the
master branch only.

Having drilled down to trace_sort output, I think I know roughly why.
I see output like this:

1459308434.753 2016-03-30 05:27:14 CEST STATEMENT:  SELECT * FROM
(SELECT a FROM int_test UNION SELECT a FROM int_test_padding OFFSET
1e10) ff;

I think that this is invalid, because the query was intended as this:

SELECT * FROM (SELECT * FROM (SELECT a FROM int_test UNION SELECT a
FROM int_test_padding) gg OFFSET 1e10) ff;

This would have controlled for client overhead, per my request to
Tomas, without altering the "underlying query" that you see in the
final spreadsheet. I don't have an exact explanation for why you'd see
this spike at 128MB for the master branch but not the other at the
moment, but it seems like that one test is basically invalid, and
should be discarded. I suspect that the patch didn't see its own
similar spike due to my changes to cost_sort(), which reflected that
sorts don't need to do so much expensive random I/O.

This is the only case that I saw that was not more or less consistent
with my expectations, which is good.

-- 
Peter Geoghegan




Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Peter Geoghegan
On Sat, Apr 2, 2016 at 7:31 AM, Tomas Vondra
 wrote:
> So, I do have the results from both machines - I've attached the basic
> comparison spreadsheets, the complete summary is available here:
>
>https://github.com/tvondra/sort-benchmark
>
> The database log also includes the logs for trace_sort=on for each query
> (use the timestamp logged for each query in the spreadsheet to locate the
> right section of the log).

Thanks!

Each row in these spreadsheets shows what looks like a multimodal
distribution for the patch (if you focus on the actual run times, not
the ratios). IOW, you can clearly see the regressions are only where
master has its best case, and the patch its worst case; as the
work_mem increases for each benchmark case for the patch, by far the
largest improvement is usually seen as we cross the CPU cache
threshold. Master gets noticeably slower as work_mem goes from 8MB to
32MB, but the patch gets far far faster. Things continue to improve
for patched in absolute terms and especially relative to master
following further increases in work_mem, but not nearly as
dramatically as that first increment (unless we have lots of padding,
which makes the memtuples heap itself much smaller, so it happens one
step later). Master shows a slow decline at and past 32MB of work_mem.
If the test hardware had a larger L3 cache, we might expect to notice
a second big drop, but this hardware doesn't have the enormous L3
cache sizes of new Xeon processors (e.g. 32MB, 45MB).

> While it might look like I'm somehow opposed to this patch series, that's
> mostly because we tend to look only at the few cases that behave poorly.
>
> So let me be clear: I do think the patch seems to be a significant
> performance improvement for most of the queries, and I'm OK with accepting a
> few regressions (particularly if we agree those are pathological cases,
> unlikely to happen in real-world workloads).
>
> It's quite rare that a patch is a universal win without regressions, so it's
> important to consider how likely those regressions are and what's the net
> effect of the patch - and the patch seems to be a significant improvement in
> most cases (and regressions limited to pathological or rare corner cases).
>
> I don't think those are reasons not to push this into 9.6.

I didn't think that you opposed the patch. In fact, you did the right
thing by focussing on the low-end regressions, as I've said. I was
probably too concerned about Robert failing to consider that they were
not representative, particularly with regard to how small the
memtuples heap could be relative to the CPU cache; blame it on how
close I've become to this problem. I'm pretty confident that Robert
can be convinced that these do not matter enough to not commit the
patch. In any case, I'm pretty confident that I cannot fix any
remaining regressions.

> I haven't done any thorough investigation of the results yet, but in general
> it seems the results from both machines are quite similar - the numbers are
> different, but the speedup/slowdown patterns are mostly the same (with some
> exceptions that I'd guess are due to HW differences).

I agree. What we clearly see is the advantages of quicksort being
cache oblivious, especially relative to master's use of a heap. That
advantage becomes pronounced at slightly different points in each
case, but the overall picture is the same. This pattern demonstrates
why a cache oblivious algorithm is so useful in general -- we don't
have to care about tuning for that. As important as this is for serial
sorts, it's even more important for parallel sorts, where parallel
workers compete for memory bandwidth, and where it's practically
impossible to build a cost model for CPU cache size + memory use +
nworkers.

> The slowdown/speedup patterns (red/green cells in the spreadheets) are also
> similar to those collected originally. Some timings are much lower,
> presumably thanks to using the "OFFSET 1e10" pattern, but the patterns are
> the same.

I think it's notable that this made things more predictable, and made
the benefits clearer.

> The one thing that surprised me a bit is that
>
> replacement_sort_mem=64
>
> actually often made the results considerably worse in many cases. A common
> pattern is that the slowdown "spreads" to nearby cells - the are many
> queries where the 8MB case is 1:1 with master and 32MB is 1.5:1 (i.e. takes
> 1.5x more time), and setting replacement_sort_mem=64 just slows down the 8MB
> case.
>
> In general, replacement_sort_mem=64 seems to only affect the 8MB case, and
> in most cases it results in 100% slowdown (so 2x as long queries).

To be clear, for the benefit of other people: replacement_sort_mem=64
makes the patch never use a replacement selection heap, even at the
lowest tested work_mem setting of 8MB.

This is exactly what I expected. When replacement_sort_mem is the
proposed default of 16MB, it literally has zero impact on how the
patch behaves 

Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-02 Thread Tom Lane
I wrote:
> Attached is a libpq-portion-only version of a patch doing it this way.
> I've not yet looked at the psql part of your patch.

Here's an update for the psql side.

regards, tom lane

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 385cb59..330127a 100644
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** Tue Oct 26 21:40:57 CEST 1999
*** 1718,1723 
--- 1718,1737 
  
  

+ \errverbose
+ 
+ 
+ 
+ Repeats the most recent server error or notice message at maximum
+ verbosity, as though VERBOSITY were set
+ to verbose and SHOW_CONTEXT were
+ set to always.
+ 
+ 
+   
+ 
+ 
+   
  \f [ string ]
  
  
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 50dc43b..3401b51 100644
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** exec_command(const char *cmd,
*** 822,827 
--- 822,849 
  		}
  	}
  
+ 	/* \errverbose -- display verbose message from last failed query */
+ 	else if (strcmp(cmd, "errverbose") == 0)
+ 	{
+ 		if (pset.last_error_result)
+ 		{
+ 			char	   *msg;
+ 
+ 			msg = PQresultVerboseErrorMessage(pset.last_error_result,
+ 			  PQERRORS_VERBOSE,
+ 			  PQSHOW_CONTEXT_ALWAYS);
+ 			if (msg)
+ 			{
+ psql_error("%s", msg);
+ PQfreemem(msg);
+ 			}
+ 			else
+ puts(_("out of memory"));
+ 		}
+ 		else
+ 			puts(_("There was no previous error."));
+ 	}
+ 
  	/* \f -- change field separator */
  	else if (strcmp(cmd, "f") == 0)
  	{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 892058e..a2a07fb 100644
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
*** AcceptResult(const PGresult *result)
*** 497,502 
--- 497,529 
  }
  
  
+ /*
+  * ClearOrSaveResult
+  *
+  * If the result represents an error, remember it for possible display by
+  * \errverbose.  Otherwise, just PQclear() it.
+  */
+ static void
+ ClearOrSaveResult(PGresult *result)
+ {
+ 	if (result)
+ 	{
+ 		switch (PQresultStatus(result))
+ 		{
+ 			case PGRES_NONFATAL_ERROR:
+ 			case PGRES_FATAL_ERROR:
+ if (pset.last_error_result)
+ 	PQclear(pset.last_error_result);
+ pset.last_error_result = result;
+ break;
+ 
+ 			default:
+ PQclear(result);
+ break;
+ 		}
+ 	}
+ }
+ 
  
  /*
   * PSQLexec
*** PSQLexec(const char *query)
*** 548,554 
  
  	if (!AcceptResult(res))
  	{
! 		PQclear(res);
  		res = NULL;
  	}
  
--- 575,581 
  
  	if (!AcceptResult(res))
  	{
! 		ClearOrSaveResult(res);
  		res = NULL;
  	}
  
*** PSQLexecWatch(const char *query, const p
*** 590,596 
  
  	if (!AcceptResult(res))
  	{
! 		PQclear(res);
  		return 0;
  	}
  
--- 617,623 
  
  	if (!AcceptResult(res))
  	{
! 		ClearOrSaveResult(res);
  		return 0;
  	}
  
*** SendQuery(const char *query)
*** 1077,1087 
  		if (PQresultStatus(results) != PGRES_COMMAND_OK)
  		{
  			psql_error("%s", PQerrorMessage(pset.db));
! 			PQclear(results);
  			ResetCancelConn();
  			goto sendquery_cleanup;
  		}
! 		PQclear(results);
  		transaction_status = PQtransactionStatus(pset.db);
  	}
  
--- 1104,1114 
  		if (PQresultStatus(results) != PGRES_COMMAND_OK)
  		{
  			psql_error("%s", PQerrorMessage(pset.db));
! 			ClearOrSaveResult(results);
  			ResetCancelConn();
  			goto sendquery_cleanup;
  		}
! 		ClearOrSaveResult(results);
  		transaction_status = PQtransactionStatus(pset.db);
  	}
  
*** SendQuery(const char *query)
*** 1102,1112 
  			if (PQresultStatus(results) != PGRES_COMMAND_OK)
  			{
  psql_error("%s", PQerrorMessage(pset.db));
! PQclear(results);
  ResetCancelConn();
  goto sendquery_cleanup;
  			}
! 			PQclear(results);
  			on_error_rollback_savepoint = true;
  		}
  	}
--- 1129,1139 
  			if (PQresultStatus(results) != PGRES_COMMAND_OK)
  			{
  psql_error("%s", PQerrorMessage(pset.db));
! ClearOrSaveResult(results);
  ResetCancelConn();
  goto sendquery_cleanup;
  			}
! 			ClearOrSaveResult(results);
  			on_error_rollback_savepoint = true;
  		}
  	}
*** SendQuery(const char *query)
*** 1202,1208 
  			if (PQresultStatus(svptres) != PGRES_COMMAND_OK)
  			{
  psql_error("%s", PQerrorMessage(pset.db));
! PQclear(svptres);
  OK = false;
  
  PQclear(results);
--- 1229,1235 
  			if (PQresultStatus(svptres) != PGRES_COMMAND_OK)
  			{
  psql_error("%s", PQerrorMessage(pset.db));
! ClearOrSaveResult(svptres);
  OK = false;
  
  PQclear(results);
*** SendQuery(const char *query)
*** 1213,1219 
  		}
  	}
  
! 	PQclear(results);
  
  	/* Possible microtiming output */
  	if (pset.timing)
--- 1240,1246 
  		}
  	}
  
! 	ClearOrSaveResult(results);
  
  	/* Possible microtiming output */
  	if (pset.timing)
*** 

Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Peter Geoghegan
On Sat, Apr 2, 2016 at 3:20 PM, Greg Stark  wrote:
> These are the averages across all queries across all data sets for the
> run-time for the patch versus master (not patched 64 which I think is
> the replacement_sort_mem=64MB which appears to not be a win). So even
> in the less successful cases on average quicksort is faster than
> replacement selection.

It's actually replacement_sort_mem=64 (64KB -- effectively disabled).
So where that case does better or worse, which can only be when
work_mem=8MB in practice, that's respectively good or bad for
replacement selection. So, typically RS does better when there are
presorted inputs with a positive (not inverse/DESC) correlation, and
there is little work_mem. As I've said, this is where the CPU cache is
large enough to fit the entire memtuples heap.

"Padded" cases are mostly bad because they make the memtuples heap
relatively small in each case. So with work_mem=32MB, you get a
memtuples heap structure similar to work_mem=8MB. The padding pushes
things out a bit further, which favors master.

-- 
Peter Geoghegan




Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Peter Geoghegan
On Sat, Apr 2, 2016 at 3:20 PM, Greg Stark  wrote:
> There are also some weird cases in this list where there's a
> significant regression at 32MB but not at 8MB. I would like to see
> 16MB and perhaps 12MB and 24MB. They would help understand if these
> are just quirks or there's a consistent pattern.

I'll need to drill down to trace_sort output to see what happened there.

-- 
Peter Geoghegan




Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Greg Stark
On Sat, Apr 2, 2016 at 3:31 PM, Tomas Vondra
 wrote:

> So let me be clear: I do think the patch seems to be a significant
> performance improvement for most of the queries, and I'm OK with accepting a
> few regressions (particularly if we agree those are pathological cases,
> unlikely to happen in real-world workloads).

The ultra-short version of this is:

8MB:   0.98
32MB:  0.79
128MB: 0.63
512MB: 0.51
1GB:   0.42

These are the averages across all queries across all data sets for the
run-time for the patch versus master (not patched 64 which I think is
the replacement_sort_mem=64MB which appears to not be a win). So even
in the less successful cases on average quicksort is faster than
replacement selection.

But selecting just the cases where 8MB is significantly slower than
master it does look like the "padding" data sets are endemic.

On the one hand that's a very realistic use-case where I think a lot
of users find themselves. I know in my days as a web developer I
typically threw a lot of columns into my queries, along with a lot of
joins and ORDER BYs, and then left it to the application to pick through
the recordsets that were returned for the columns that were of
interest. The tuples being sorted were probably huge.

On the other hand perhaps this is something better tackled by the
planner. If the planner can arrange sorts to happen when the rows are
narrower, that would be a bigger win than trying to move a lot of
data around like this. (In the extreme if it were possible to replace
unnecessary columns by the tid and then refetching them later though
that's obviously more than a little tricky to do effectively.)

There are also some weird cases in this list where there's a
significant regression at 32MB but not at 8MB. I would like to see
16MB and perhaps 12MB and 24MB. They would help understand if these
are just quirks or there's a consistent pattern.




Re: [HACKERS] standalone backend PANICs during recovery

2016-04-02 Thread Alvaro Herrera
Bernd Helmle wrote:
> While investigating a problem on a streaming hot standby instance at a
> customer site, i got the following when using a standalone backend:
> 
> PANIC:  btree_xlog_delete_get_latestRemovedXid: cannot operate with
> inconsistent data
> CONTEXT:  xlog redo delete: index 1663/65588/65625; iblk 11, heap
> 1663/65588/65613;
> 
> The responsible PANIC is triggered here:
> 
> src/backend/access/nbtree/nbtxlog.c:555
> 
> btree_xlog_delete_get_latestRemovedXid(XLogReaderState *record)

This PANIC was introduced in the commit below.  AFAICS your proposed
patch and rationale are correct.


commit f786e91a75b2f64527dcf321e754b6448fcad7fe
Author: Tom Lane 
AuthorDate: Fri Aug 3 15:41:18 2012 -0400
CommitDate: Fri Aug 3 15:41:18 2012 -0400

Improve underdocumented btree_xlog_delete_get_latestRemovedXid() code.

As noted by Noah Misch, btree_xlog_delete_get_latestRemovedXid is
critically dependent on the assumption that it's examining a consistent
state of the database.  This was undocumented though, so the
seemingly-unrelated check for no active HS sessions might be thought to be
merely an optional optimization.  Improve comments, and add an explicit
check of reachedConsistency just to be sure.

This function returns InvalidTransactionId (thereby killing all HS
transactions) in several cases that are not nearly unlikely enough for my
taste.  This commit doesn't attempt to fix those deficiencies, just
document them.

Back-patch to 9.2, not from any real functional need but just to keep the
branches more closely synced to simplify possible future back-patching.



-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-02 Thread Tom Lane
"Shulgin, Oleksandr"  writes:
> On Mon, Mar 14, 2016 at 7:55 PM, Tom Lane  wrote:
>> Yeah, I don't much like that either.  But I don't think we can avoid
>> some refactoring there; as designed, conversion of an error message into
>> user-visible form is too tightly tied to receipt of the message.

> True.  Attached is a v2 which addresses all of the points raised earlier I
> believe.

I took a closer look at what's going on here and realized that actually
it's not that hard to decouple the message-building routine from the
PGconn state, because mostly it works with fields it's extracting out
of the PGresult anyway.  The only piece of information that's lacking
is conn->last_query.  I propose therefore that instead of doing it like
this, we copy last_query into error PGresults.  This is strictly less
added storage requirement than storing the whole verbose message would be,
and it saves time compared to the v2 patch in the typical case where
the application never does ask for an alternately-formatted error message.
Plus we can actually support requests for any variant format, not only
VERBOSE.

Attached is a libpq-portion-only version of a patch doing it this way.
I've not yet looked at the psql part of your patch.

Comments?

regards, tom lane
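
For illustration, a minimal client sketch against the proposed API (the
connection string and query are placeholders, and connection error checking
is omitted for brevity):

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=regression");
	PGresult   *res = PQexec(conn, "insert into bar values(1)");

	if (PQresultStatus(res) == PGRES_FATAL_ERROR)
	{
		/* the terse message, per the connection's current verbosity */
		fputs(PQresultErrorMessage(res), stderr);

		/*
		 * Re-render the same error at maximum verbosity without changing
		 * the connection settings; the result must be freed with PQfreemem.
		 */
		char	   *msg = PQresultVerboseErrorMessage(res,
													  PQERRORS_VERBOSE,
													  PQSHOW_CONTEXT_ALWAYS);

		if (msg)
		{
			fputs(msg, stderr);
			PQfreemem(msg);
		}
	}

	PQclear(res);
	PQfinish(conn);
	return 0;
}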

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 2328d8f..80f7014 100644
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** char *PQresultErrorMessage(const PGresul
*** 2691,2696 
--- 2691,2738 

   
  
+  
+   
+PQresultVerboseErrorMessage
+
+ PQresultVerboseErrorMessage
+
+   
+ 
+   
+
+ Returns a reformatted version of the error message associated with
+ a PGresult object.
+ 
+ char *PQresultVerboseErrorMessage(const PGresult *res,
+   PGVerbosity verbosity,
+   PGContextVisibility show_context);
+ 
+ In some situations a client might wish to obtain a more detailed
+ version of a previously-reported error.
+ PQresultVerboseErrorMessage addresses this need
+ by computing the message that would have been produced
+ by PQresultErrorMessage if the specified
+ verbosity settings had been in effect for the connection when the
+ given PGresult was generated.  If
+ the PGresult is not an error result,
+ PGresult is not an error result is reported instead.
+ The returned string includes a trailing newline.
+
+ 
+
+ Unlike most other functions for extracting data from
+ a PGresult, the result of this function is a freshly
+ allocated string.  The caller must free it
+ using PQfreemem() when the string is no longer needed.
+
+ 
+
+ A NULL return is possible if there is insufficient memory.
+
+   
+  
+ 
   
PQresultErrorFieldPQresultErrorField

*** PGVerbosity PQsetErrorVerbosity(PGconn *
*** 5582,5587 
--- 5624,5631 
mode includes all available fields.  Changing the verbosity does not
affect the messages available from already-existing
PGresult objects, only subsequently-created ones.
+   (But see PQresultVerboseErrorMessage if you
+   want to print a previous error with a different verbosity.)
   
  
 
*** PGContextVisibility PQsetErrorContextVis
*** 5622,5627 
--- 5666,5673 
affect the messages available from
already-existing PGresult objects, only
subsequently-created ones.
+   (But see PQresultVerboseErrorMessage if you
+   want to print a previous error with a different display mode.)
   
  
 
*** PQsetNoticeProcessor(PGconn *conn,
*** 6089,6096 
 receiver function is called.  It is passed the message in the form of
 a PGRES_NONFATAL_ERROR
 PGresult.  (This allows the receiver to extract
!individual fields using PQresultErrorField, or the complete
!preformatted message using PQresultErrorMessage.) The same
 void pointer passed to PQsetNoticeReceiver is also
 passed.  (This pointer can be used to access application-specific state
 if needed.)
--- 6135,6143 
 receiver function is called.  It is passed the message in the form of
 a PGRES_NONFATAL_ERROR
 PGresult.  (This allows the receiver to extract
!individual fields using PQresultErrorField, or complete
!preformatted messages using PQresultErrorMessage or
!PQresultVerboseErrorMessage.) The same
 void pointer passed to PQsetNoticeReceiver is also
 passed.  (This pointer can be used to access application-specific state
 if needed.)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index c69a4d5..21dd772 100644
*** 

Re: [HACKERS] Batch update of indexes

2016-04-02 Thread Konstantin Knizhnik

On 04/02/2016 09:57 PM, Tom Lane wrote:

> Konstantin Knizhnik  writes:
> > Attached please find patch for "ALTER INDEX ... WHERE ..." clause.
> > It is now able to handle all three possible situations:
> > 1. Making index partial (add WHERE condition to the ordinary index)
> > 2. Extend partial index range (less restricted index predicate)
> > 3. Arbitrary change of partial index predicate
>
> I've not been following this thread previously, but this proposal
> scares me quite a lot.  I am certain there are places in our code
> that assume that the properties of an index don't change after it's
> been created.  One area that this almost certainly breaks is HOT updates:
> adding a previously-unindexed column to an index predicate might break
> existing HOT chains, and I see nothing in this patch that could deal
> with that.  I seem to recall there are other places that would be
> broken by changing an index's DDL definition after creation, but can't
> recall specifics right now.
>
> I am also, frankly, not seeing a use-case for this functionality that
> would justify trying to find and remove those assumptions.
>
> There's a lot of things I don't care for about the way the patch is
> written, in particular its willingness to use SPI (which opens a lot of
> potential for search-path problems, failure to see uncommitted tuples,
> etc).  But we need not get to that if we don't believe the functionality
> can work.


Thank you for the review, Tom.

I completely agree with all your arguments against this patch.
I proposed it mostly as a proof of concept.
Yes, I have not taken HOT updates into account, and maybe there are other
possible issues which I have not considered.

The main question is whether the proposed way of batch-updating indexes is
viable, or whether it is a conceptually wrong approach (because it breaks
the assumption that index properties can't be changed, or because it is not
convenient to use...).

I hope everybody agrees that maintaining indexes is the main limiting factor
for insert speed.  If a table has no indexes, then insert speed can be as
high as the disk write speed (100 MB/sec, or 1000 for an SSD).  So if the
record size is about 10 bytes, we can get about 10 million TPS.  But the
presence of indexes dramatically changes this picture: if the database is
large enough that even an index cannot fit in memory, and records are
inserted in random key order, then each index insert will require reading
3-4 pages from random locations on the disk.  With an average HDD
positioning time of 10 msec, we get 100 reads per second and ... 20-30 TPS.
That is with just one index.  If we have 10 indexes, then TPS can be less
than the fingers on one hand.
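
A back-of-the-envelope check of that estimate, as a tiny C program under the
same assumptions (10 msec per seek, 3-4 random page reads per index insert):

#include <stdio.h>

int
main(void)
{
	double		seek_time = 0.010;	/* average HDD positioning time, seconds */
	double		reads_per_insert = 3.5;		/* 3-4 random pages per index insert */
	double		tps = 1.0 / (seek_time * reads_per_insert);

	printf("about %.0f inserts/sec with one cold index\n", tps);	/* ~29 */
	return 0;
}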

Certainly this is a very pessimistic estimate.  But it is still true that we
cannot provide good insert speed if we have to update indexes immediately,
and without indexes we cannot efficiently execute most queries.

I do not see any way to solve this problem in Postgres now.  The hack with
creating materialized views requires a lot of extra time and space, and it
will not work for a really large table.

So we need some way to postpone insertion of new records into the index.
Then we can do such insertion in the background or in idle time (at night),
and try to use bulk insert if the index implementation supports it (for
example, sorting records by key before inserting can significantly increase
locality and so improve the speed of index inserts).  But the crucial point
here is that such a delayed index update violates the basic RDBMS rule that
query results with and without indexes should be the same.  The trick with
partial indexes eliminates this contradiction, but it requires more actions
from the user.  So are users ready to do some extra work just for
"ideological" reasons?  Because if a user wants delayed index updates, he
effectively accepts that query results may not include some of the most
recent updates.


Another aspect is which database objects are allowed to be altered and which
are not.  Right now we can alter almost everything about tables, and almost
nothing about indexes.  It is assumed that an index can always be rebuilt,
but for a very big table rebuilding indexes from scratch takes an
unacceptable amount of time.  So should we make it possible to alter index
characteristics that do not require rebuilding the index from scratch (which
is definitely true for a partial index predicate)?  Or is the price of
supporting this so high that it cannot be compensated by the benefits?


So what do you think?
1. Should I continue working in this direction and fix all possible issues
with HOT updates, etc., to make it possible to alter partial index
predicates and support batch inserts this way?
2. Or is it better to just add an extra option to the index, allowing it to
be slightly out of sync?  That would allow, for example, eliminating the
pending list for GIN, which can cause very significant degradation of query
speed, while for most full-text search engines it is

Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Shulgin, Oleksandr
On Apr 2, 2016 18:38, "Tom Lane"  wrote:
>
> "Shulgin, Oleksandr"  writes:
> > On Apr 1, 2016 23:14, "Tom Lane"  wrote:
> >> Haven't looked at 0002 yet.
>
> > [crosses fingers] hope you'll have a chance to do that before feature
> > freeze for 9.6
>
> I studied this patch for awhile after rebasing it onto yesterday's
> commits.

Fantastic! I could not hope for a better reply :-)

> I did not like the fact that the compute_scalar_stats logic
> would allow absolutely anything into the MCV list once num_hist falls
> below 2. I think it's important that we continue to reject values that
> are only seen once in the sample, because there's no very good reason to
> think that they are MCVs and not just infrequent values that by luck
> appeared in the sample.

In my understanding we only put a value in the track list if we've seen it
at least twice, no?

> However, after I rearranged the tests there so
> that "if (num_hist >= 2)" only controlled whether to apply the 1/K limit,
> one of the regression tests started to fail:

Uh-oh.

> there's a place in
> rowsecurity.sql that expects that if a column contains nothing but several
> instances of a single value, that value will be recorded as a lone MCV.
> Now this isn't a particularly essential thing for that test, but it still
> seems like a good property for ANALYZE to have.

No objection here.

> The reason it's failing,
> of course, is that the test as written cannot possibly accept the last
> (or only) value.

Yeah, this I would expect from such a change.

> Before I noticed the regression failure, I'd been thinking that maybe it'd
> be better if the decision rule were not "at least 100+x% of the average
> frequency of this value and later ones", but "at least 100+x% of the
> average frequency of values after this one".

Hm, sounds pretty similar to what I wanted to achieve, but better
formalized.

> With that formulation, we're
> not constrained as to the range of x.  Now, if there are *no* values after
> this one, then this way needs an explicit special case in order not to
> compute 0/0; but the preceding para shows that we need a special case for
> the last value anyway.
>
> So, attached is a patch rewritten along these lines.  I used 50% rather
> than 25% as the new cutoff percentage --- obviously it should be higher
> in this formulation than before, but I have no idea if that particular
> number is good or we should use something else.  Also, the rule for the
> last value is "at least 1% of the non-null samples".  That's a pure guess
> as well.
>
> I do not have any good corpuses of data to try this on.  Can folks who
> have been following this thread try it on their data and see how it
> does?  Also please try some other multipliers besides 1.5, so we can
> get a feeling for where that cutoff should be placed.

Expect me to run it on my pet db early next week. :-)

Many thanks!
--
Alex


Re: [HACKERS] Batch update of indexes

2016-04-02 Thread Tom Lane
Konstantin Knizhnik  writes:
> Attached please find patch for "ALTER INDEX ... WHERE ..." clause.
> It is now able to handle all three possible situations:
> 1. Making index partial (add WHERE condition to the ordinary index)
> 2. Extend partial index range (less restricted index predicate)
> 3. Arbitrary change of partial index predicate

I've not been following this thread previously, but this proposal
scares me quite a lot.  I am certain there are places in our code
that assume that the properties of an index don't change after it's
been created.  One area that this almost certainly breaks is HOT updates:
adding a previously-unindexed column to an index predicate might break
existing HOT chains, and I see nothing in this patch that could deal
with that.  I seem to recall there are other places that would be
broken by changing an index's DDL definition after creation, but can't
recall specifics right now.

I am also, frankly, not seeing a use-case for this functionality that
would justify trying to find and remove those assumptions.

There's a lot of things I don't care for about the way the patch is
written, in particular its willingness to use SPI (which opens a lot of
potential for search-path problems, failure to see uncommitted tuples,
etc).  But we need not get to that if we don't believe the functionality
can work.

> This patch includes src/bin/insbench utility for testing insert 
> performance. It can be easily excluded from the patch to reduce it size.

C++ is not likely to get accepted into our tree ...

regards, tom lane




Re: [HACKERS] raw output from copy

2016-04-02 Thread Andrew Dunstan



On 04/01/2016 11:42 AM, Daniel Verite wrote:

Andrew Dunstan wrote:


If someone can make a good case that this is going to be of
general use I'll happily go along, but I haven't seen one so far.

About COPY FROM with a raw format, for instance just yesterday
there was this user question on stackoverflow:
http://stackoverflow.com/questions/36317237

which essentially is: how to import contents from a file without any
particular interpretation of any character?



There is so much wrong with this it's hard to know where to start.

Inserting the whole contents of a text file unchanged is insanely easy 
in psql.


\set file `cat /path/to/file`
insert into mytable(contents) values(:'file');

What is more, everyone on SO missed the fact that CSV mode gives you very 
considerable control over the quote, delimiter and null settings.


See for example 
 which 
has this example for handling files consisting of 1 json document per line:


copy the_table(jsonfield)
from '/path/to/jsondata'
csv quote e'\x01' delimiter e'\x02';

psql's \copy will work just the same way
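
For example (the same options, just spelled as a psql meta-command):

   \copy the_table(jsonfield) from '/path/to/jsondata' csv quote e'\x01' delimiter e'\x02'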

(I noticed with amusement this week that CitusData is using pretty much 
exactly this in one of their examples.)




With the patch discussed in this thread, a user can do
\copy table(textcol) from /path/to/file (format raw)
or the equivalent COPY.
If it's a binary column, that works just the same.



It would be fairly simple to invent a binary mechanism that did the 
equivalent of the above insert. All without any change to SQL or the 
backend at all.





Without this, it's not obvious at all how this result can be
achieved without resorting to external preprocessing,
and assuming the availability of such preprocessing tools
in the environment. Notwithstanding the fact that the
solution proposed on SO (doubling backslashes with sed)
doesn't even work if the file contains tabs, as they would be
interpreted as field separators, even if the copy target has only
one column. You can change the delimiter with COPY but AFAIK
you can't tell that there is none.



There is arguably a good case for allowing a null delimiter. But that SO 
page is just a terrible piece of misinformation, as far too often 
happens in my experience.


And I am still waiting for a non-psql use case. But I don't expect to 
see one, precisely because most clients have no difficulty at all in 
handling binary data.


cheers

andrew










Re: Transactional enum additions - was Re: [HACKERS] Alter or rename enum value

2016-04-02 Thread Tom Lane
Andrew Dunstan  writes:
> Looking at this briefly. It looks like the check should be called from 
> enum_in() and enum_recv(). What error should be raised if the enum row's 
> xmin isn't committed? ERRCODE_FEATURE_NOT_SUPPORTED? or maybe 
> ERRCODE_DATA_EXCEPTION? I don't see anything that fits very well.

ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is something we use in some
other places where the meaning is "just wait awhile, dude".  Or you
could invent a new ERRCODE.

regards, tom lane




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Tom Lane
"Shulgin, Oleksandr"  writes:
> On Apr 1, 2016 23:14, "Tom Lane"  wrote:
>> Haven't looked at 0002 yet.

> [crosses fingers] hope you'll have a chance to do that before feature
> freeze for 9.6

I studied this patch for awhile after rebasing it onto yesterday's
commits.  I did not like the fact that the compute_scalar_stats logic
would allow absolutely anything into the MCV list once num_hist falls
below 2.  I think it's important that we continue to reject values that
are only seen once in the sample, because there's no very good reason to
think that they are MCVs and not just infrequent values that by luck
appeared in the sample.  However, after I rearranged the tests there so
that "if (num_hist >= 2)" only controlled whether to apply the 1/K limit,
one of the regression tests started to fail: there's a place in
rowsecurity.sql that expects that if a column contains nothing but several
instances of a single value, that value will be recorded as a lone MCV.
Now this isn't a particularly essential thing for that test, but it still
seems like a good property for ANALYZE to have.  The reason it's failing,
of course, is that the test as written cannot possibly accept the last
(or only) value.
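
(To be concrete, the property at stake is essentially this, using a toy
single-valued table rather than the rowsecurity one:

   create table t as select 1 as x from generate_series(1, 100);
   analyze t;
   select most_common_vals from pg_stats
    where tablename = 't' and attname = 'x';

and we'd like the single value to come back as a lone MCV.)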

Before I noticed the regression failure, I'd been thinking that maybe it'd
be better if the decision rule were not "at least 100+x% of the average
frequency of this value and later ones", but "at least 100+x% of the
average frequency of values after this one".  With that formulation, we're
not constrained as to the range of x.  Now, if there are *no* values after
this one, then this way needs an explicit special case in order not to
compute 0/0; but the preceding para shows that we need a special case for
the last value anyway.

So, attached is a patch rewritten along these lines.  I used 50% rather
than 25% as the new cutoff percentage --- obviously it should be higher
in this formulation than before, but I have no idea if that particular
number is good or we should use something else.  Also, the rule for the
last value is "at least 1% of the non-null samples".  That's a pure guess
as well.
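
To make that concrete, suppose the track list holds made-up counts of
1000, 400, 90, 60 and 50 out of 10000 non-null samples.  Walking down the
list, the new rule does this:

  1000: avg of later counts = (400+90+60+50)/4 = 150.0, cutoff 225   => keep
   400: avg of later counts = (90+60+50)/3     =  66.7, cutoff 100   => keep
    90: avg of later counts = (60+50)/2        =  55.0, cutoff  82.5 => keep
    60: avg of later counts = 50/1             =  50.0, cutoff  75   => stop

If the scan did reach the final entry, that one would instead be judged by
the special rule, i.e. kept only if its count is above 1 and is at least
1% of the 10000 non-null samples (100 here).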

I do not have any good corpuses of data to try this on.  Can folks who
have been following this thread try it on their data and see how it
does?  Also please try some other multipliers besides 1.5, so we can
get a feeling for where that cutoff should be placed.

regards, tom lane

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 44a4b3f..a2c606b 100644
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
*** compute_distinct_stats(VacAttrStatsP sta
*** 2120,2128 
  		 * we are able to generate a complete MCV list (all the values in the
  		 * sample will fit, and we think these are all the ones in the table),
  		 * then do so.  Otherwise, store only those values that are
! 		 * significantly more common than the (estimated) average. We set the
! 		 * threshold rather arbitrarily at 25% more than average, with at
! 		 * least 2 instances in the sample.
  		 */
  		if (track_cnt < track_max && toowide_cnt == 0 &&
  			stats->stadistinct > 0 &&
--- 2120,2138 
  		 * we are able to generate a complete MCV list (all the values in the
  		 * sample will fit, and we think these are all the ones in the table),
  		 * then do so.  Otherwise, store only those values that are
! 		 * significantly more common than the ones we omit.  We determine that
! 		 * by considering the values in frequency order, and accepting each
! 		 * one if it is at least 50% more common than the average among the
! 		 * values after it.  The 50% threshold is somewhat arbitrary.
! 		 *
! 		 * Note that the 50% rule will never accept a value with count 1,
! 		 * since all the values have count at least 1; this is a property we
! 		 * desire, since there's no very good reason to assume that a
! 		 * single-occurrence value is an MCV and not just a random non-MCV.
! 		 *
! 		 * We need a special rule for the very last value.  If we get to it,
! 		 * we'll accept it if it's at least 1% of the non-null samples and has
! 		 * count more than 1.
  		 */
  		if (track_cnt < track_max && toowide_cnt == 0 &&
  			stats->stadistinct > 0 &&
*** compute_distinct_stats(VacAttrStatsP sta
*** 2133,2153 
  		}
  		else
  		{
! 			/* d here is the same as d in the Haas-Stokes formula */
  			int			d = nonnull_cnt - summultiple + nmultiple;
! 			double		avgcount,
! 		mincount;
  
! 			/* estimate # occurrences in sample of a typical nonnull value */
! 			avgcount = (double) nonnull_cnt / (double) d;
! 			/* set minimum threshold count to store a value */
! 			mincount = avgcount * 1.25;
! 			if (mincount < 2)
! mincount = 2;
  			if (num_mcv > track_cnt)
  num_mcv = track_cnt;
  			for (i = 0; i < num_mcv; i++)
  			{
  if (track[i].count < mincount)
  {
  	num_mcv = i;
--- 2143,2181 
  		}
  	

Transactional enum additions - was Re: [HACKERS] Alter or rename enum value

2016-04-02 Thread Andrew Dunstan



On 03/29/2016 04:56 PM, Andrew Dunstan wrote:



On 03/27/2016 10:20 AM, Tom Lane wrote:

Andrew Dunstan  writes:

The more I think about this the more I bump up against the fact that
almost anything we might want to do to ameliorate the situation is
going to be rolled back. The only approach I can think of that doesn't
suffer from this is to abort if an insert/update will affect an index on
a modified enum, i.e. we prevent the possible corruption from happening
in the first place, as we do now, but in a much more fine-grained way.

Perhaps, instead of forbidding ALTER ENUM ADD in a transaction, we could
allow that, but not allow the new value to be *used* until it's 
committed?
This could be checked cheaply during enum value lookup (ie, is xmin of the
pg_enum row committed).

What you really need is to prevent the new value from being inserted
into any indexes, but checking that directly seems far more difficult,
ugly, and expensive than the above.

I do not know whether this would be a meaningful improvement for
common use-cases, though.  (It'd help if people were more specific
about the use-cases they need to work.)





I think this is a pretty promising approach, certainly well worth 
putting some resources into investigating. One thing I like about it 
is that it gives a nice cheap negative test, so we know if the xmin is 
committed we are safe. So we could start by rejecting anything where 
it's not, but later might adopt more refined but expensive tests for 
cases where it isn't committed, without imposing a penalty on anything 
else.






Looking at this briefly. It looks like the check should be called from 
enum_in() and enum_recv(). What error should be raised if the enum row's 
xmin isn't committed? ERRCODE_FEATURE_NOT_SUPPORTED? or maybe 
ERRCODE_DATA_EXCEPTION? I don't see anything that fits very well.
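
For reference, the user-visible behaviour we would be aiming for is roughly
this (a sketch only; type and value names invented):

   begin;
   alter type rainbow add value 'mauve';
   -- using the new value here, e.g. 'mauve'::rainbow, would be rejected,
   -- since enum_in() would see the pg_enum row's xmin as not yet committed
   commit;
   select 'mauve'::rainbow;  -- accepted once the adding transaction commits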


cheers

andrew







Re: [HACKERS] Small typo in a comment in pg_regress.c

2016-04-02 Thread Stephen Frost
Andreas,

* Andreas 'ads' Scherbaum (adsm...@wars-nicht.de) wrote:
> stumbled over this while looking into the source. Patch attached.

Fix pushed, thanks!

Stephen




[HACKERS] Small typo in a comment in pg_regress.c

2016-04-02 Thread Andreas 'ads' Scherbaum


Hi,

stumbled over this while looking into the source. Patch attached.


Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 416829d..343fd19 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1329,7 +1329,7 @@ results_differ(const char *testname, const char *resultsfile, const char *defaul
 	if (platform_expectfile)
 	{
 		/*
-		 * Replace everything afer the last slash in expectfile with what the
+		 * Replace everything after the last slash in expectfile with what the
 		 * platform_expectfile contains.
 		 */
 		char	   *p = strrchr(expectfile, '/');



Re: [HACKERS] improving GROUP BY estimation

2016-04-02 Thread Tom Lane
Dean Rasheed  writes:
> Here's an updated patch with references to both papers, and a more
> detailed justification for the formula, along with the other changes
> discussed. Note that although equation (2) in the Dell'Era paper looks
> different from the Yao formula, it's actually the same.

Looks good to me.

regards, tom lane




Re: [HACKERS] snapshot too old, configured by time

2016-04-02 Thread Kevin Grittner
On Sat, Apr 2, 2016 at 7:12 AM, Michael Paquier
 wrote:
> On Fri, Apr 1, 2016 at 11:45 PM, Alvaro Herrera  
> wrote:
>> Kevin Grittner wrote:
>>
>>> Attached is what I think you're talking about for the first patch.
>>> AFAICS this should generate identical executable code to unpatched.
>>> Then the patch to actually implement the feature would, instead
>>> of adding 30-some lines with TestForOldSnapshot() would implement
>>> that as the behavior for the other enum value, and alter those
>>> 30-some BufferGetPage() calls.
>>>
>>> Álvaro and Michael, is this what you were looking for?
>>
>> Yes, this is what I was thinking, thanks.
>
> A small thing:
> $ git diff master --check
> src/include/storage/bufmgr.h:181: trailing whitespace.
> +#define BufferGetPage(buffer, snapshot, relation, agetest)
> ((Page)BufferGetBlock(buffer))
>
> -   Pagepage = BufferGetPage(buf);
> +   Pagepage = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
> Having a BufferGetPageExtended() with some flags and a default
> corresponding to NO_SNAPSHOT_TEST would reduce the diff impact. And as
> long as the check is integrated with BufferGetPage[Extended]() I would
> not complain, the patch as proposed being 174kB...

If you are saying that the 450 places that don't need the check
would remain unchanged, and the only difference would be to use
BufferGetPageExtended() instead of BufferGetPage() followed by
TestForOldSnapshot() in the 30-some places that need the check, I
don't see the point.  That would eliminate the "forced choice"
aspect of what Álvaro is asking for, and it seems to me that it
would do next to nothing to prevent the errors of omission that are
the concern here.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH v11] GSSAPI encryption support

2016-04-02 Thread Michael Paquier
On Sat, Apr 2, 2016 at 7:34 AM, Robbie Harwood  wrote:
> - Attempt to address a crash Michael is observing by switching to using
>   the StringInfo/pqExpBuffer management functions over my own code as
>   much as possible.  Michael, if this doesn't fix it, I'm out of ideas.

Nope, it doesn't.

>   Since I still can't reproduce this locally (left a client machine and
>   a process on the same machine retrying for over an hour on your test
>   case and didn't see it), could you provide me with some more
>   information on why repalloc is complaining?
>   Is this a low memory situation where alloc might have failed?

No, this is an assertion failure, and it seems that you are compiling
this code without --enable-cassert; without that switch the code
actually works.

>   What's your setup look like?

Just a simple Linux VM running krb5kdc with 386MB of memory, with
Postgres running locally as well.

>   That pointer looks like it's on the heap, is that correct?

appendBinaryStringInfo depends on palloc calls that allocate memory
depending on the memory context used. It looks like what's just
missing in your logic is a private memory context that be_gssapi_write
and be_gssapi_read can use to handle the allocation of the
communication buffers.
-- 
Michael




Re: [HACKERS] improving GROUP BY estimation

2016-04-02 Thread Dean Rasheed
On 31 March 2016 at 22:02, Tom Lane  wrote:
> I'm just concerned about what happens when the Dellera paper stops being
> available.  I don't mind including that URL as a backup to the written-out
> argument I just suggested.
>

Here's an updated patch with references to both papers, and a more
detailed justification for the formula, along with the other changes
discussed. Note that although equation (2) in the Dell'Era paper looks
different from the Yao formula, it's actually the same.
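
As a quick sanity check with made-up numbers: for a table with N = 10000
tuples and n = 100 distinct values, selecting p = 1000 rows gives

  n * (1 - ((N-p)/N)^(N/n)) = 100 * (1 - 0.9^100) ~= 100

expected distinct values, whereas the old linear scaling (n * p/N) would
have estimated only 10.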

Regards,
Dean
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
new file mode 100644
index a6555e9..99f5f7c
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3439,9 +3439,51 @@ estimate_num_groups(PlannerInfo *root, L
 reldistinct = clamp;
 
 			/*
-			 * Multiply by restriction selectivity.
+			 * Update the estimate based on the restriction selectivity,
+			 * guarding against division by zero when reldistinct is zero.
+			 * Also skip this if we know that we are returning all rows.
 			 */
-			reldistinct *= rel->rows / rel->tuples;
+			if (reldistinct > 0 && rel->rows < rel->tuples)
+			{
+/*
+ * Given a table containing N rows with n distinct values in a
+ * uniform distribution, if we select p rows at random then
+ * the expected number of distinct values selected is
+ *
+ * n * (1 - product((N-N/n-i)/(N-i), i=0..p-1))
+ *
+ * = n * (1 - (N-N/n)! / (N-N/n-p)! * (N-p)! / N!)
+ *
+ * See "Approximating block accesses in database
+ * organizations", S. B. Yao, Communications of the ACM,
+ * Volume 20 Issue 4, April 1977 Pages 260-261.
+ *
+ * Alternatively, re-arranging the terms from the factorials,
+ * this may be written as
+ *
+ * n * (1 - product((N-p-i)/(N-i), i=0..N/n-1))
+ *
+ * This form of the formula is more efficient to compute in
+ * the common case where p is larger than N/n.  Additionally,
+ * as pointed out by Dell'Era, if i << N for all terms in the
+ * product, it can be approximated by
+ *
+ * n * (1 - ((N-p)/N)^(N/n))
+ *
+ * See "Expected distinct values when selecting from a bag
+ * without replacement", Alberto Dell'Era,
+ * http://www.adellera.it/investigations/distinct_balls/.
+ *
+ * The condition i << N is equivalent to n >> 1, so this is a
+ * good approximation when the number of distinct values in
+ * the table is large.  It turns out that this formula also
+ * works well even when n is small.
+ */
+reldistinct *=
+	(1 - pow((rel->tuples - rel->rows) / rel->tuples,
+			 rel->tuples / reldistinct));
+			}
+			reldistinct = clamp_row_est(reldistinct);
 
 			/*
 			 * Update estimate of total distinct groups.
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
new file mode 100644
index de64ca7..0fc93d9
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -807,27 +807,24 @@ select * from int4_tbl where
 explain (verbose, costs off)
 select * from int4_tbl o where (f1, f1) in
   (select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
-  QUERY PLAN  
---
- Hash Join
+   QUERY PLAN   
+
+ Hash Semi Join
Output: o.f1
Hash Cond: (o.f1 = "ANY_subquery".f1)
->  Seq Scan on public.int4_tbl o
  Output: o.f1
->  Hash
  Output: "ANY_subquery".f1, "ANY_subquery".g
- ->  HashAggregate
+ ->  Subquery Scan on "ANY_subquery"
Output: "ANY_subquery".f1, "ANY_subquery".g
-   Group Key: "ANY_subquery".f1, "ANY_subquery".g
-   ->  Subquery Scan on "ANY_subquery"
- Output: "ANY_subquery".f1, "ANY_subquery".g
- Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
- ->  HashAggregate
-   Output: i.f1, (generate_series(1, 2) / 10)
-   Group Key: i.f1
-   ->  Seq Scan on public.int4_tbl i
- Output: i.f1
-(18 rows)
+   Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+   ->  HashAggregate
+ Output: i.f1, (generate_series(1, 2) / 10)
+ Group Key: i.f1
+ ->  Seq Scan on public.int4_tbl i
+   Output: i.f1
+(15 rows)
 
 select * from int4_tbl o where (f1, f1) in
   (select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);



Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-04-02 Thread David Rowley
On 2 April 2016 at 23:26, David Rowley  wrote:
> I worked on this today to try and get it into shape.

In the last patch I failed to notice that there's an alternative
expected results file for one of the regression tests.

The attached patch includes the fix to update that file to match the
new expected EXPLAIN output.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_57837dc_2016-04-03.patch
Description: Binary data



Re: [HACKERS] snapshot too old, configured by time

2016-04-02 Thread Michael Paquier
On Fri, Apr 1, 2016 at 11:45 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>
>> Attached is what I think you're talking about for the first patch.
>> AFAICS this should generate identical executable code to unpatched.
>> Then the patch to actually implement the feature would, instead
>> of adding 30-some lines with TestForOldSnapshot() would implement
>> that as the behavior for the other enum value, and alter those
>> 30-some BufferGetPage() calls.
>>
>> Álvaro and Michael, is this what you were looking for?
>
> Yes, this is what I was thinking, thanks.

A small thing:
$ git diff master --check
src/include/storage/bufmgr.h:181: trailing whitespace.
+#define BufferGetPage(buffer, snapshot, relation, agetest)
((Page)BufferGetBlock(buffer))

-   Pagepage = BufferGetPage(buf);
+   Pagepage = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
Having a BufferGetPageExtended() with some flags and a default
corresponding to NO_SNAPSHOT_TEST would reduce the diff impact. And as
long as the check is integrated with BufferGetPage[Extended]() I would
not complain, the patch as proposed being 174kB...
-- 
Michael




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-04-02 Thread Amit Kapila
On Thu, Mar 31, 2016 at 3:48 PM, Andres Freund  wrote:
>
> On 2016-03-31 15:07:22 +0530, Amit Kapila wrote:
> > On Thu, Mar 31, 2016 at 4:39 AM, Andres Freund 
wrote:
> > >
> > > On 2016-03-28 22:50:49 +0530, Amit Kapila wrote:
> > > > On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila <
amit.kapil...@gmail.com>
> > > > wrote:
> > > > >
> > >
> > > Amit, could you run benchmarks on your bigger hardware? Both with
> > > USE_CONTENT_LOCK commented out and in?
> > >
> >
> > Yes.
>
> Cool.
>

Here is the performance data (configuration of machine used to perform this
test is mentioned at end of mail):

Non-default parameters

max_connections = 300
shared_buffers=8GB
min_wal_size=10GB
max_wal_size=15GB
checkpoint_timeout=35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB

median of 3, 20-min pgbench tpc-b results for --unlogged-tables


Client Count/No. Of Runs (tps)     2      64     128
HEAD+clog_buf_128               4930   66754   68818
group_clog_v8                   5753   69002   78843
content_lock                    5668   70134   70501
nocontent_lock                  4787   69531   70663


I am not exactly sure why using the content lock patch (USE_CONTENT_LOCK
defined in 0003-Use-a-much-more-granular-locking-model-for-the-clog-) or the
no-content-lock patch (USE_CONTENT_LOCK not defined) gives poor performance
at 128 clients; maybe it is due to some bug in the patch, or due to the
reason mentioned by Robert [1] (usage of two locks instead of one).  On
running it many times with the content lock and no-content-lock patches, it
sometimes gives 80 ~ 81K TPS at 128 clients, which is approximately 3%
higher than the group_clog_v8 patch; this indicates that the group clog
approach is able to address most of the remaining contention (after
increasing clog buffers) around CLOGControlLock.  There is one small
regression observed with the no-content-lock patch at the lower client
count (2), which might be due to run-to-run variation, or maybe it is due to
an increased number of instructions from the atomic ops; this needs to be
investigated if we want to follow the no-content-lock approach.

Note that I have not posted TPS numbers for HEAD, as I have already shown
above that increasing the clog buffers increased TPS from ~36K to ~68K at
128 clients.


M/c details
-
Power m/c config (lscpu)
-
Architecture:  ppc64le
Byte Order:Little Endian
CPU(s):192
On-line CPU(s) list:   0-191
Thread(s) per core:8
Core(s) per socket:1
Socket(s): 24
NUMA node(s):  4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache:  512K
L3 cache:  8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191


[1] -
http://www.postgresql.org/message-id/CA+TgmoYjpNKdHDFUtJLAMna-O5LGuTDnanHFAOT5=hn_vau...@mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-04-02 Thread David Rowley
On 2 April 2016 at 05:52, Tom Lane  wrote:
>
> David Rowley  writes:
> > On 12 March 2016 at 11:43, Tom Lane  wrote:
> >> It seems like the major intellectual complexity here is to figure out
> >> how to detect inner-side-unique at reasonable cost.  I see that for
> >> LEFT joins you're caching that in the SpecialJoinInfos, which is probably
> >> fine.  But for INNER joins it looks like you're just doing it over again
> >> for every candidate join, and that seems mighty expensive.
>
> > I have a rough idea for this, but I need to think of it a bit more to
> > make sure it's bulletproof.
>
> Where are we on this?  I had left it alone for awhile because I supposed
> that you were going to post a new patch, but it's been a couple weeks...

Apologies for the delay. I was fully booked.

I worked on this today to try and get it into shape.

Changes:

* Added unique_rels and non_unique_rels cache to PlannerInfo. These
are both arrays of Lists which are indexed by the relid, which is
either proved to be unique by, or proved not to be unique by each
listed set of relations. Each List item is a Bitmapset containing the
relids of the relation which proved the uniqueness, or proved no
possibility of uniqueness for the non_unique_rels case.  The
non_unique_rels cache seems far less useful than the unique one, as
with the standard join search, we start at level one, comparing
singleton relations and works our way up, so it's only towards the end
of the search that we'll have more rels on each side of the join
search. Many of the unique cases are found early on in the search, but
the non-unique cases must be rechecked as more relations are added to
the search, as that introduces a new possibility of uniqueness. In
fact, the only cache hit of the non-unique case in the regression
tests is a test that's using the GEQO, so I'm not quite sure if the
standard join search will ever have cache hit here. Never-the-less I
kept the cache, as it's likely going to be most useful to have it when
the GEQO is running the show, as that's when we're going to see the
largest number of relations to join.

There's also a small quirk which could lead to false negatives for
uniqueness which might still need ironed out: I don't yet attempt to
get the minimum set of relations which proved the uniqueness. This,
perhaps, does not matter much for the standard join search, but likely
will, more so for GEQO cases. Fixing this will require changes to
relation_has_unique_index_for() to get it to record and return the
relids of the clauses which matched the index.

* Renamed JOIN_LEFT_UNIQUE to JOIN_SEMI_LEFT. It seems better to
maintain the SEMI part. I thought about renaming JOIN_SEMI to
JOIN_SEMI_INNER, but thought I'd better not touch that here.

* Made a pass to update comments in the areas where I've added
handling for JOIN_SEMI_LEFT.

* I also updated comments in the regression tests to remove reference
to unique joins. "Join conversion" seems to be a better term now.

There was also a couple of places where I wasn't quite sure how to
handle JOIN_SEMI_LEFT. I've marked these with an XXX comment. Perhaps
how to handle them is more clear to you.

I also was not quite sure the best place to allocate memory for these
caches. I've ended up doing that in setup_simple_rel_arrays(), which
is likely the wrong spot. I originally did this in
standard_join_search(), after join_rel_level is allocated memory, but
I soon realised that it can't happen here as the GEQO requires the
caches too, and naturally it does not come through
standard_join_search().

I was not quite sure if I should update any docs to mention that Inner
Joins can become Semi Joins in some cases.

I was also a bit unsure if I should move the two new functions I added
to joinpath.c into analyzejoins.c.

Thanks for taking an interest in this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_6f34aa1_2016-04-02.patch
Description: Binary data



Re: [HACKERS] fd.c: flush data problems on osx

2016-04-02 Thread Michael Paquier
On Mon, Mar 21, 2016 at 9:09 PM, Stas Kelvich  wrote:
> On 21 Mar 2016, at 14:53, Andres Freund  wrote:
>> Hm. I think we should rather just skip calling pg_flush_data in the
>> directory case, that very likely isn't beneficial on any OS.
>
> Seems reasonable, changed.
>
>> I think we still need to fix the mmap() implementation to support the
>> offset = 0, nbytes = 0 case (via fseek(SEEK_END).
>
It is already in this diff. I’ve added this a few messages ago.

A similar change seems to be needed in initdb.c's pre_sync_fname.
-- 
Michael




Re: [HACKERS] pgbench - remove unused clientDone parameter

2016-04-02 Thread Fabien COELHO



Remove pgbench clientDone unused "ok" parameter.


Seems useless, yeah, removed.


Good!


Actually it was introduced as an unused argument in 3da0dfb4b146.


Strange indeed. I did not think to dig into the history.


I switched the corresponding commitfest entry to committed.

Thanks,

--
Fabien.




Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-04-02 Thread Masahiko Sawada
On Fri, Apr 1, 2016 at 9:10 AM, Noah Misch  wrote:
> On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
>> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch  wrote:
>> > On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>> As a result of looking into the code around recovery, ISTM that the
>> cause is related to relation cache clearing.
>> In heap_xlog_visible, if the standby server receives the WAL record then
>> the relation cache is eventually cleared in vm_extend, but if the standby
>> server receives an FPI then the relation cache would not be cleared.
>> For example, after I applied the attached patch to HEAD (it might not be
>> the right way, but) this problem seems to be resolved.
>> >>
>> >> Is this a bug? or not?
>> >
>> > It's a bug.  I don't expect it causes queries to return wrong answers, 
>> > because
>> > visibilitymap.c says "it's always safe to clear a bit in the map from
>> > correctness point of view."  (The bug makes a visibility map bit 
>> > temporarily
>> > appear to have been cleared.)  I still call it a bug, because recovery
>> > behavior becomes too difficult to verify when xlog replay produces 
>> > conditions
>> > that don't happen outside of recovery.  Even if there's no way to get a 
>> > wrong
>> > query answer today, this would be too easy to break later.  I wonder if we
>> > make the same omission in other xlog replay functions.  Similar omissions 
>> > may
>> > cause wrong query answers, even if this particular one does not.
>> >
>> > Would you like to bisect for the commit, or at least the major release, at
>> > which the bug first appeared?
>> >
>> > I wonder if your discovery has any relationship to this recently-reported 
>> > case
>> > of insufficient smgr invalidation:
>> > http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=nh2o1fxkkj6yvobtvy...@mail.gmail.com
>> >
>>
>> I'm not sure this bug has relationship to another issue you mentioned
>> but after further investigation, this bug seems to be reproduced even
>> on more older version.
>> At least I reproduced it at 9.0.0.
>
> Would you try PostgreSQL 9.2.16?  The visibility map was not crash safe and
> had no correctness implications until 9.2.  If 9.2 behaves this way, it's
> almost certainly not a recent regression.

Yeah, I reproduced it on 9.2.0 and 9.2.16, so it's not a recent regression.
The commit is 503c7305a1e379f95649eef1a694d0c1dbdc674a, which
introduced the crash-safe visibility map.

Regards,

--
Masahiko Sawada

