Re: [HACKERS] extend pgbench expressions with functions

2016-03-28 Thread Fabien COELHO



I guess the question here is whether we want the part-c patch, which
removes the historical \setrandom syntax.  That's technically no
longer needed since we now can use random() as a function directly
inside \set commands, but we might want it anyway for backward
compatibility.


This patch is indeed a proposal.


Anybody have an opinion on that?


+1 for nuking it. That's not worth the trouble maintaining it.


I share Michaël's opinion.

Some arguments for removing it:

 - \setrandom is syntactically inhomogeneous in the overall syntax,
   and is now redundant

 - switching to the \set syntax is pretty easy, see attached script

 - custom scripts are short, and they are used by few but
   advanced users, for whom updating would not be an issue

 - the parsing & execution code is lengthy and repetitive...

--
Fabien.

#! /usr/bin/perl -wp
s/^\\setrandom\s+(\S+)\s+(\S+)\s+(\S+)\s+(exponential|gaussian)\s+(\S+)/\\set $1 random_$4($2, $3, $5)/;
s/^\\setrandom\s+(\S+)\s+(\S+)\s+(\S+)(\s+uniform)?/\\set $1 random($2, $3)/;
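
For illustration, these two substitutions rewrite lines such as the
following (the first is from the built-in TPC-B script, the second is an
invented gaussian variant):

  \setrandom delta -5000 5000               =>  \set delta random(-5000, 5000)
  \setrandom aid 1 :naccounts gaussian 2.5  =>  \set aid random_gaussian(1, :naccounts, 2.5)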



Re: [HACKERS] Re: [COMMITTERS] pgsql: Sync tzload() and tzparse() APIs with IANA release tzcode2016c.

2016-03-28 Thread Tom Lane
Christian Ullrich  writes:
> * Tom Lane wrote:
>> Yeah.  I've been staring at that for awhile, but it's not clear where
>> the problem is.  There are a bunch of other SET TIME ZONE commands in
>> the regression tests, how is it that this trivial case fails on the
>> Windows critters?

> zic aborts somewhere between writing Etc/UTC and UTC.

Huh ... I would not have guessed that.  Can you track down exactly
where it's failing?

We absorbed some new code in zic.c for creating subdirectories of the
target timezone directory, and said code seemed a bit odd to me,
but I supposed that the IANA guys had debugged it already.  Maybe not.
Or maybe the problem is with some specific input file that gets
reached somewhere in that range?

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Sync tzload() and tzparse() APIs with IANA release tzcode2016c.

2016-03-28 Thread Michael Paquier
On Tue, Mar 29, 2016 at 1:36 PM, Christian Ullrich  wrote:
> * Tom Lane wrote:
>
>> Michael Paquier  writes:
>>>
>>> Buildfarm-not-being-happy-status: woodlouse, mastodon, thrips, jacana.
>>>
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=woodlouse&dt=2016-03-29%2000%3A42%3A08
>>> The origin of the problem is this, which causes all the subsequent
>>> queries to fail:
>>>SET TimeZone to 'UTC';
>>> + ERROR:  invalid value for parameter "TimeZone": "UTC"
>>
>>
>> Yeah.  I've been staring at that for awhile, but it's not clear where
>> the problem is.  There are a bunch of other SET TIME ZONE commands in
>> the regression tests, how is it that this trivial case fails on the
>> Windows critters?
>
>
> I think this is the reason, from the check log on woodlouse (jacana says the
> same in make style):
>
> Generating timezone files...release\zic\zic: Can't create
> C:/buildfarm/buildenv/HEAD/pgsql.build/tmp_install/share/timezone/US/Pacific-New:
> No such file or directory

Yes, I have bumped into that when running the build. I think that the
error is in zic.c, in dolink(): when performing a link operation, the
parent directory does not get created, because of this:
    if (link_errno == ENOENT || link_errno == ENOTSUP)
    {
        if (!mkdirs(toname))
            exit(EXIT_FAILURE);
        retry_if_link_supported = true;
    }
I think that we'd want to check here for EISDIR or EACCES as well...
I haven't checked yet, though. I'll update this thread, hopefully with
a patch.
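
A minimal sketch of what I have in mind (untested; whether EISDIR and
EACCES are the right additional triggers is exactly what remains to be
verified):

    if (link_errno == ENOENT || link_errno == ENOTSUP ||
        link_errno == EISDIR || link_errno == EACCES)
    {
        /* create any missing parent directories, then retry the link */
        if (!mkdirs(toname))
            exit(EXIT_FAILURE);
        retry_if_link_supported = true;
    }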
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: pgbench: Support double constants and functions.

2016-03-28 Thread Michael Paquier
On Tue, Mar 29, 2016 at 1:56 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Tue, Mar 29, 2016 at 9:52 AM, Robert Haas  wrote:
>>> pgbench: Support double constants and functions.
>
>> Instead of INT64_MIN and INT64_MAX, I think that it would be better to
>> use PG_INT64_MIN and PG_INT64_MAX.
>
> Indeed.
>
>> Note as well that DOUBLE is an
>> existing variable type in VS (while INTEGER is not),
>
> Ooops; that one was harder to foresee.

Thanks for the push.

>> A way to fix compilation here is to rename those tokens to something
>> that will never conflict, like in the attached to DOUBLE_VAR and
>> INTEGER_VAR, both exprparse.y and exprparse.l need an update.
>
> Agreed, but these symbols represent constants not variables, so
> I used DOUBLE_CONST and INTEGER_CONST instead.  Pushed with those
> corrections, so that we can get back to looking at my own bugs :-(

I think I know what's going on here...
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: pgbench: Support double constants and functions.

2016-03-28 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Mar 29, 2016 at 9:52 AM, Robert Haas  wrote:
>> pgbench: Support double constants and functions.

> Instead of INT64_MIN and INT64_MAX, I think that it would be better to
> use PG_INT64_MIN and PG_INT64_MAX.

Indeed.

> Note as well that DOUBLE is an
> existing variable type in VS (while INTEGER is not),

Ooops; that one was harder to foresee.

> A way to fix compilation here is to rename those tokens to something
> that will never conflict, like in the attached to DOUBLE_VAR and
> INTEGER_VAR, both exprparse.y and exprparse.l need an update.

Agreed, but these symbols represent constants not variables, so
I used DOUBLE_CONST and INTEGER_CONST instead.  Pushed with those
corrections, so that we can get back to looking at my own bugs :-(

regards, tom lane




Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Amit Kapila
On Tue, Mar 29, 2016 at 3:21 AM, Petr Jelinek  wrote:

> On 28/03/16 14:46, Dilip Kumar wrote:
>
>>
>> Conclusion:
>> ---
>> 1. I think v15 solves the problem that existed with v13, and performance
>> is significantly higher compared to base; relation size is also
>> stable. So IMHO v15 is the winner over the other solutions. What do others think?
>>
>>
> It seems so. Do you have the ability to reasonably test with 64 clients? I am
> mostly wondering whether we see the performance going further down or just
> plateauing.
>
>
Yes, that makes sense.  One more point: if the reason for v13 giving
better performance is extra blocks (which we believe in certain cases can
leak until the time Vacuum updates the FSM tree), do you think it makes
sense to test once with the lockWaiters * 20 limit increased to perhaps
lockWaiters * 25 or lockWaiters * 30?
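
For readers following along, the limit in question is the bulk-extension
heuristic, roughly as follows (a sketch based on the multiplier discussed
in this thread, not the exact patch code):

	/*
	 * Bulk-extend the relation in proportion to the number of waiters on
	 * the extension lock; the multiplier is the knob being tuned here.
	 */
	extraBlocks = lockWaiters * 20;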


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] Re: [COMMITTERS] pgsql: Sync tzload() and tzparse() APIs with IANA release tzcode2016c.

2016-03-28 Thread Christian Ullrich

* Tom Lane wrote:


Michael Paquier  writes:

Buildfarm-not-being-happy-status: woodlouse, mastodon, thrips, jacana.
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=woodlouse&dt=2016-03-29%2000%3A42%3A08
The origin of the problem is this, which causes all the subsequent
queries to fail:
   SET TimeZone to 'UTC';
+ ERROR:  invalid value for parameter "TimeZone": "UTC"


Yeah.  I've been staring at that for awhile, but it's not clear where
the problem is.  There are a bunch of other SET TIME ZONE commands in
the regression tests, how is it that this trivial case fails on the
Windows critters?


I think this is the reason, from the check log on woodlouse (jacana says 
the same in make style):


Generating timezone files...release\zic\zic: Can't create 
C:/buildfarm/buildenv/HEAD/pgsql.build/tmp_install/share/timezone/US/Pacific-New: 
No such file or directory


zic aborts somewhere between writing Etc/UTC and UTC. This is how the 
toplevel directory ends up after the error:


  Africa
  America
  Antarctica
  Arctic
  Asia
  Atlantic
  Australia
 2.102 CET
 2.294 CST6CDT
 1.876 EET
   127 EST
 2.294 EST5EDT
  Etc
  Europe
   264 Factory
   128 HST
  Indian
 2.102 MET
   127 MST
 2.294 MST7MDT
  Pacific
 2.294 PST8PDT
 1.873 WET

After I manually created the "US" directory, zic had no further trouble 
putting files into it.


--
Christian





Re: [HACKERS] [COMMITTERS] pgsql: pgbench: Support double constants and functions.

2016-03-28 Thread Michael Paquier
On Tue, Mar 29, 2016 at 9:52 AM, Robert Haas  wrote:
> pgbench: Support double constants and functions.
>
> The new functions are pi(), random(), random_exponential(),
> random_gaussian(), and sqrt().  I was worried that this would be
> slower than before, but, if anything, it actually turns out to be
> slightly faster, because we now express the built-in pgbench scripts
> using fewer lines; each \setrandom can be merged into a subsequent
> \set.

Buildfarm is complaining about a couple of things here:
  src/bin/pgbench/exprparse.c(139): error C2365: 'DOUBLE' :
redefinition; previous definition was 'typedef'
[C:\buildfarm\buildenv\HEAD\pgsql.build\pgbench.vcxproj]
  src/bin/pgbench/pgbench.c(1046): error C2065: 'INT64_MIN' :
undeclared identifier
[C:\buildfarm\buildenv\HEAD\pgsql.build\pgbench.vcxproj]
  src/bin/pgbench/exprparse.c(139): error C2086: 'yytokentype DOUBLE'
: redefinition [C:\buildfarm\buildenv\HEAD\pgsql.build\pgbench.vcxproj]
  src/bin/pgbench/pgbench.c(1046): error C2065: 'INT64_MAX' :
undeclared identifier
[C:\buildfarm\buildenv\HEAD\pgsql.build\pgbench.vcxproj]
  src/bin/pgbench/exprscan.l(131): error C2275: 'DOUBLE' : illegal use
of this type as an expression (src/bin/pgbench/exprparse.c)
[C:\buildfarm\buildenv\HEAD\pgsql.build\pgbench.vcxproj]

Instead of INT64_MIN and INT64_MAX, I think that it would be better to
use PG_INT64_MIN and PG_INT64_MAX. Note as well that DOUBLE is an
existing type in VS (while INTEGER is not), though it is not directly
documented:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms221627%28v=vs.85%29.aspx
A way to fix compilation here is to rename those tokens to something
that will never conflict, as in the attached patch, to DOUBLE_VAR and
INTEGER_VAR; both exprparse.y and exprscan.l need an update.
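
The collision reduces to something like this (a hypothetical two-line
reduction, not actual SDK or PostgreSQL code): bison emits the token names
as enumerators into exprparse.c, where they meet type names the Windows
headers already define.

typedef double DOUBLE;              /* a name the Windows headers already use */
enum yytokentype { DOUBLE = 258 };  /* bison-emitted token enum: error C2365 */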
-- 
Michael


pgbench-fix-vs.patch
Description: binary/octet-stream



[HACKERS] psql stops completion for the only matching schema name.

2016-03-28 Thread Kyotaro HORIGUCHI
Hello. I found that COMPLETE_WITH_SCHEMA_QUERY doesn't complete a schema
name when it is the only match.

=# alter foreign table 
information_schema.  pg_temp_1.   pg_toast_temp_1.
pg_catalog.  pg_toast.  public.
=# alter foreign table i


or 

=# alter foreign table pu


This makes for even more perplexing behavior with an addon query:

=# create schema alpha;
=# alter table 
ALL IN TABLESPACE    pg_catalog.  pg_toast_temp_1.
alpha.   pg_temp_1.   public.
information_schema.  pg_toast.
=# alter table a
=# alter table ALL IN TABLESPACE _

Where has my alpha gone?

_complete_from_query adds schema names only if more than one schema name
matches the current input. The 'i' matches only one schema name. The
rationale for that behavior seems to be not to add a schema name when the
qualified names in that schema are added. The attached patch does exactly
that, and psql behaves as follows with it:

=# create schema alpha;
=# create table alpha.a (a int);
=# create table alpha.b (a int);
=# alter table 
ALL IN TABLESPACE    pg_catalog.  pg_toast_temp_1.
alpha.   pg_temp_1.   public.
information_schema.  pg_toast.
=# alter table al
ALL IN TABLESPACE  alpha.a  alpha.b
=# alter table alp
=# alter table alpha.
alpha.a  alpha.b  

This seems to me to be the intended behavior here.

Any comments?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From c3a1220ecec07af660fc184e5a1d24baa1fe913b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 29 Mar 2016 12:13:03 +0900
Subject: [PATCH] Fix the condition to decide whether to add schema names.

Currently, COMPLETE_WITH_SCHEMA_QUERY stops completion when only one
schema name matches the current input. Schema names are added only when
there is more than one match for the current input, but the correct
condition is that there are no qualified names to match. Qualified names
are listed only when there is exactly one matching schema name.
---
 src/bin/psql/tab-complete.c | 52 -
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 6a81416..631d0d7 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3206,7 +3206,7 @@ _complete_from_query(int is_schema_query, const char *text, int state)
 	 */
 	if (state == 0)
 	{
-		PQExpBufferData query_buffer;
+		PQExpBufferData query_buffer, tmp_buffer;
 		char	   *e_text;
 		char	   *e_info_charp;
 		char	   *e_info_charp2;
@@ -3279,26 +3279,12 @@ _complete_from_query(int is_schema_query, const char *text, int state)
 			}
 
 			/*
-			 * Add in matching schema names, but only if there is more than
-			 * one potential match among schema names.
+			 * Make a query to collect matching qualified names, but only if
+			 * there is exactly one schema matching the input-so-far. This
+			 * query is built separately since it is used twice.
 			 */
-			appendPQExpBuffer(&query_buffer, "\nUNION\n"
-		   "SELECT pg_catalog.quote_ident(n.nspname) || '.' "
-			  "FROM pg_catalog.pg_namespace n "
-			  "WHERE substring(pg_catalog.quote_ident(n.nspname) || '.',1,%d)='%s'",
-			  char_length, e_text);
-			appendPQExpBuffer(&query_buffer,
-			  " AND (SELECT pg_catalog.count(*)"
-			  " FROM pg_catalog.pg_namespace"
-			" WHERE substring(pg_catalog.quote_ident(nspname) || '.',1,%d) ="
-			  " substring('%s',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1)) > 1",
-			  char_length, e_text);
-
-			/*
-			 * Add in matching qualified names, but only if there is exactly
-			 * one schema matching the input-so-far.
-			 */
-			appendPQExpBuffer(&query_buffer, "\nUNION\n"
+			initPQExpBuffer(&tmp_buffer);
+			appendPQExpBuffer(&tmp_buffer,
 	 "SELECT pg_catalog.quote_ident(n.nspname) || '.' || %s "
 			  "FROM %s, pg_catalog.pg_namespace n "
 			  "WHERE %s = n.oid AND ",
@@ -3306,9 +3292,9 @@ _complete_from_query(int is_schema_query, const char *text, int state)
 			  completion_squery->catname,
 			  completion_squery->namespace);
 			if (completion_squery->selcondition)
-				appendPQExpBuffer(&query_buffer, "%s AND ",
+				appendPQExpBuffer(&tmp_buffer, "%s AND ",
   completion_squery->selcondition);
-			appendPQExpBuffer(&query_buffer, "substring(pg_catalog.quote_ident(n.nspname) || '.' || %s,1,%d)='%s'",
+			appendPQExpBuffer(&tmp_buffer, "substring(pg_catalog.quote_ident(n.nspname) || '.' || %s,1,%d)='%s'",
 			  qualresult,
 			  char_length, e_text);
 
@@ -3316,17 +3302,35 @@ _complete_from_query(int is_schema_query, const char *text, int state)
 			 * This condition exploits the single-matching-schema rule to
 			 * speed up the query
 			 */
-			appendPQExpBuffer(&query_buffer,
+			appendPQExpBuffer(&tmp_buffer,
 			" AND substring(pg_catalog.quote_ident(n.nspname) || '.',1,%d) ="
 			  " 

Re: [HACKERS] raw output from copy

2016-03-28 Thread Pavel Stehule
Hi


>>> Anyway this is certainly not committable as-is, so I'm setting it back
>>> to Waiting on Author.  But the fact that both libpq and ecpg would need
>>> updates makes me question whether we can safely pretend that this isn't
>>> a protocol break.
>>>
>>>
>>>
>>
>>
>> In that case I humbly submit that there is a case for reviving the psql
>> patch I posted that kicked off this whole thing and lets you export a piece
>> of binary data from psql quite easily. It should certainly not involve any
>> protocol break.
>>
>
> The psql-only solution can work only for output. It doesn't help with input.
>

In this case, I think the COPY statement is a perfect fit for this feature.
The way from the content to the file is direct. In psql you have to turn
off the tuple separator and the record separator, and you have to set the
output file. You can get the same effect, but with more work. In previous
versions it was relatively hard to use from the command line - now, with
the multi-command -c, it is much simpler, but COPY is still the ideal.
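
For illustration, the psql incantation I mean is roughly this (a sketch;
the output file and query are invented):

\pset format unaligned
\pset tuples_only on
\pset recordsep ''
\o /tmp/datum.out
SELECT onecol FROM onetable WHERE id = 1;
\o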

I agree that the output formats of psql are a nice feature, and it would be
pretty nice if we supported more common formats - like CSV, simple XML,
simple JSON. I believe that some redundancy is acceptable if the cost is
not too high.

Sorry for the off-topic remark - I would like to see some output formats on
the client side, but the formatting possibilities are on the server side. So
there is a natural idea: define a server-side output format, which the psql
output format can simply wrap.

Regards

Pavel


> Regards
>
> Pavel
>
>
>>
>> cheers
>>
>> andrew
>>
>
>


Re: [HACKERS] raw output from copy

2016-03-28 Thread Pavel Stehule
Hi

2016-03-29 0:26 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > [ copy-raw-format-20160227-03.patch ]
>
> I looked at this patch.  I'm having a hard time accepting that it has
> a use-case large enough to justify it, and here's the reason: it's
> a protocol break.  Conveniently omitting to update protocol.sgml
> doesn't make it not a protocol break.  (libpq.sgml also contains
> assorted statements that are falsified by this patch.)
>

The answer to this question depends on how strict we want to be. This doesn't
change the format of the data in the stream for existing types, but it creates
a new enum value. A correctly written client should raise an exception when it
processes an unknown enum value.

I'll do tests against old libpq.


>
> You could argue that it's the user's own fault if he tries to use
> COPY RAW with client-side code that hasn't been updated to support it.
> Maybe that's okay, but I wonder if we're opening ourselves up to
> problems.  Maybe even security-grade problems.
>
> In terms of specific code that hasn't been updated, ecpg is broken
> by this patch, and I'm not very sure what libpq's PQbinaryTuples()
> ought to do but probably something other than what it does today.
>
> There's also a definitional question of what we think PQfformat() ought
> to do; should it return "2" for the per-field format?  Or maybe the
> per-field format is still "1", since it's after all the same binary data
> format as for COPY BINARY, and only the overall copy format reported by
> PQbinaryTuples() should change to "2".
>
>
I'll recheck it.


> BTW, I'm not really sure why the patch is trying to enforce single
> row and column for the COPY OUT case.  I thought the idea for that
> was that we'd just shove out the data without any delimiters, and
> if it's more than one datum it's the user's problem whether he can
> identify the boundaries.  On the input side we would have to insist
> on one column since we're not going to attempt to identify boundaries
> (and one row would fall out of the fact that we slurp the entire input
> and treat it as one datum).
>

It should not be a problem. I thought about it. The COPY statement is
extensible with options, so we can support more fields and more rows via
additional options if that is ever required. But for now, that looks like
premature optimization.
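
As a sketch of the intended one-row, one-column usage (syntax as proposed
in the patch under discussion; table and column names are invented):

COPY (SELECT img FROM pictures WHERE id = 1) TO '/tmp/img.png' (FORMAT raw);
COPY pictures (img) FROM '/tmp/img.png' (FORMAT raw);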


>
> Anyway this is certainly not committable as-is, so I'm setting it back
> to Waiting on Author.  But the fact that both libpq and ecpg would need
> updates makes me question whether we can safely pretend that this isn't
> a protocol break.
>

I'll test against some clients.

Regards

Pavel


>
> regards, tom lane
>


Re: [HACKERS] [COMMITTERS] pgsql: Sync tzload() and tzparse() APIs with IANA release tzcode2016c.

2016-03-28 Thread Tom Lane
Michael Paquier  writes:
> Buildfarm-not-being-happy-status: woodlouse, mastodon, thrips, jacana.
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=woodlouse&dt=2016-03-29%2000%3A42%3A08
> The origin of the problem is this, which causes all the subsequent
> queries to fail:
>   SET TimeZone to 'UTC';
> + ERROR:  invalid value for parameter "TimeZone": "UTC"

Yeah.  I've been staring at that for awhile, but it's not clear where
the problem is.  There are a bunch of other SET TIME ZONE commands in
the regression tests, how is it that this trivial case fails on the
Windows critters?

Probably the first thing to eliminate is whether it's zic that is
messing up (and writing a bad zone file) or the backend.  Can someone
compare the $INSTALLDIR/share/timezone/UTC files between a working
build and a failing one?  They should be bitwise identical (but note
I'm not expecting them to be identical to files from before the tzcode
merge commit).

regards, tom lane




Re: [HACKERS] raw output from copy

2016-03-28 Thread Pavel Stehule
2016-03-29 5:12 GMT+02:00 Andrew Dunstan :

>
>
> On 03/28/2016 06:26 PM, Tom Lane wrote:
>
>> Pavel Stehule  writes:
>>
>>> [ copy-raw-format-20160227-03.patch ]
>>>
>> I looked at this patch.  I'm having a hard time accepting that it has
>> a use-case large enough to justify it, and here's the reason: it's
>> a protocol break.  Conveniently omitting to update protocol.sgml
>> doesn't make it not a protocol break.  (libpq.sgml also contains
>> assorted statements that are falsified by this patch.)
>>
>> You could argue that it's the user's own fault if he tries to use
>> COPY RAW with client-side code that hasn't been updated to support it.
>> Maybe that's okay, but I wonder if we're opening ourselves up to
>> problems.  Maybe even security-grade problems.
>>
>> In terms of specific code that hasn't been updated, ecpg is broken
>> by this patch, and I'm not very sure what libpq's PQbinaryTuples()
>> ought to do but probably something other than what it does today.
>>
>> There's also a definitional question of what we think PQfformat() ought
>> to do; should it return "2" for the per-field format?  Or maybe the
>> per-field format is still "1", since it's after all the same binary data
>> format as for COPY BINARY, and only the overall copy format reported by
>> PQbinaryTuples() should change to "2".
>>
>> BTW, I'm not really sure why the patch is trying to enforce single
>> row and column for the COPY OUT case.  I thought the idea for that
>> was that we'd just shove out the data without any delimiters, and
>> if it's more than one datum it's the user's problem whether he can
>> identify the boundaries.  On the input side we would have to insist
>> on one column since we're not going to attempt to identify boundaries
>> (and one row would fall out of the fact that we slurp the entire input
>> and treat it as one datum).
>>
>> Anyway this is certainly not committable as-is, so I'm setting it back
>> to Waiting on Author.  But the fact that both libpq and ecpg would need
>> updates makes me question whether we can safely pretend that this isn't
>> a protocol break.
>>
>>
>>
>
>
> In that case I humbly submit that there is a case for reviving the psql
> patch I posted that kicked off this whole thing and lets you export a piece
> of binary data from psql quite easily. It should certainly not involve any
> protocol break.
>

The psql-only solution can work only for output. It doesn't help with input.

Regards

Pavel


>
> cheers
>
> andrew
>


Re: [HACKERS] Using quicksort for every external sort run

2016-03-28 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 6:54 PM, Peter Geoghegan  wrote:
> I've used amcheck [2] to test this latest revision -- the tool ought
> to not see any problems with any index created with the patch applied.
> Reviewers might find it helpful to use amcheck, too. As 9.6 is
> stabilized, I anticipate that amcheck will give us a fighting chance
> at early detection of any bugs that might have slipped into tuplesort,
> or a B-Tree operator class. Since we still don't even have one single
> test of the external sort code [3], it's just as well. If we wanted to
> test external sorting, maybe we'd do that by adding tests to amcheck,
> that are not run by default, much like test_decoding, which tests
> logical decoding but is not targeted by "make installcheck"; that
> would allow the tests to be fairly comprehensive without being
> annoying. Using amcheck neatly side-steps issues with the portability
> of "expected" pg_regress output when collatable type sorting is
> tested.

Note that amcheck V2, which I posted just now, features tests for
external sorting. The way these work requires discussion. The tests
are motivated in part by the recent strxfrm() debacle, as well as by
the need to have at least some test coverage for this patch. It's bad
that external sorting currently has no test coverage. We should try
and do better there as part of this overhaul to tuplesort.c.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-28 Thread Peter Geoghegan
On Tue, Mar 22, 2016 at 10:57 AM, Peter Geoghegan  wrote:
> That's right - I have a small number of feedback items to work
> through. I also determined myself that there could be a very low
> probability race condition when checking the key space across sibling
> pages, and will work to address that.

I attach a V2 revision.

Behavioral changes:

* Fixes very low probability race condition when verifying ordering
across sibling pages with bt_index_check()/AccessShareLock.

Notably, the fix does not involve using the high key from the next
page instead of the first item, despite my recent suggestion that it
would; I changed my mind. In this revision,
bt_right_page_check_scankey() continues to return the first item on
the next/right page (as a scankey), just as in the first version. The
locking considerations here are more complicated than before, and
hopefully correct. I abandoned the high key idea when I realized that
a concurrent page split could happen in the right sibling, making the
high key generally no more safe in the face of concurrent target page
deletion than continuing to use the first data item. Using the first
data item from the right page is better than using the high key item
from the right page verification-wise anyway, so that's good. I found
a new way of locking down and recovering from any race conditions (the
low probability bug in V1 for bt_index_check()/AccessShareLock calls
to the C function bt_right_page_check_scankey()). I won't go into the
gory details here, but I will mention that my new approach is inspired
by how backwards scans work already.

* Adds DEBUG1 traces that indicate the level and page currently being checked.

Advanced users will probably find these traces useful on occasion.
It's nice to be able to see how amcheck walks the B-Tree in practice.

Non-behavioral changes:

* Explains why an ExclusiveLock is needed on the target index by
bt_index_parent_check() (and a ShareLock on its heap relation).

This new explanation talks about why an ExclusiveLock *is* necessary
when checking downlinks against child pages, per Tomas Vondra's
request. (Note that nothing changes about child/downlink verification
itself). Before now, I focused on why they were not necessary, but
Tomas was right to point out that that isn't enough information.

* No longer uses the word "comport" in an error message, per Amit
Langote. Similarly, minor typos fixed, per Tomas.

* Adds a variety of pg_regress-based tests for external sorting. (Via
various CREATE INDEX statements involving a variety of datatypes. Data
is generated pseudo-randomly.)

Notably, the regression tests *never* test external sorting today.
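
The shape of such a test is roughly the following (a sketch, not the
patch's actual test; bt_index_check() is the verification function this
patch provides):

-- force an external sort, then verify the resulting index
SET maintenance_work_mem = '1MB';
CREATE TABLE sort_test AS
    SELECT md5(i::text) AS t FROM generate_series(1, 100000) i;
CREATE INDEX sort_test_idx ON sort_test (t COLLATE "C");
SELECT bt_index_check('sort_test_idx'::regclass);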

ISTM that amcheck is a convenient, reliable, and portable way of
testing the sorting of collatable types like text, so this is the best
way of testing external sorting. External sorting uses abbreviated
keys for runs, but discards them as runs are written out, and so the
merging of runs does not use abbreviated keys. Abbreviated comparisons
and authoritative comparisons are therefore reliably used within the
same sort. One particular goal here is that something like the recent
strxfrm() debacle might be caught by this testing on some platforms
(if we ever get back to trusting strxfrm() at all, that is). The style
of these new tests is unorthodox, because a variety of collations are
tested, which varies from system to system; we should discuss all
this. The tests can take a minute or two to run on my laptop, but that
could vary significantly. And so, whether or not that needs to be
targeted by something other than "check" and "installcheck" is a
discussion that needs to happen. I imagine a lot of buildfarm animals
have slow storage, but on the other hand I don't think hackers run the
contrib regression tests so often. I prefer to not follow the example
of test_decoding if possible; test_decoding's Makefile only runs tests
when wal_level=logical is expected (that's how the relevant Makefile
targets are scoped), but it's a complicated special case.

Review
--

As already noted after the first bullet point, there are some tricky
considerations on fixing the race when the user calls
bt_index_check(). Reviewers should pay close attention to this; could
I be wrong about that being race-safe even now? This seems to me to be
by far the best place for reviewers to look for problems in this
patch, as it's the only bt_index_check()/AccessShareLock case that
compares anything *across* pages. The measures taken to prevent false
positives described within bt_target_page_check() require expert
review, especially because this is the one case where a faulty
rationale may result in under-protection from false positives, and not
just harmless over-protection.

The rationale for why we can reliably recover from a race described
within the C function bt_right_page_check_scankey() is *complicated*.
I may have failed to make it as simple as possible despite significant
effort. It's hard to 

Re: [HACKERS] raw output from copy

2016-03-28 Thread Andrew Dunstan



On 03/28/2016 06:26 PM, Tom Lane wrote:

Pavel Stehule  writes:

[ copy-raw-format-20160227-03.patch ]

I looked at this patch.  I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break.  Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break.  (libpq.sgml also contains
assorted statements that are falsified by this patch.)

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems.  Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format?  Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case.  I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries.  On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author.  But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.





In that case I humbly submit that there is a case for reviving the psql 
patch I posted that kicked off this whole thing and lets you export a 
piece of binary data from psql quite easily. It should certainly not 
involve any protocol break.


cheers

andrew




Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-28 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Tuesday, March 29, 2016 10:54 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Petr Jelinek; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Reworks of CustomScan serialization/deserialization
> 
> On Mon, Mar 28, 2016 at 9:36 PM, Kouhei Kaigai  wrote:
> I don't have a strong reason to keep this stuff in separate files.
> Both cover similar features and the amount of code is small enough.
> So, the attached v4 just merges the custom-node.[ch] stuff into extensible.
>
> Once we put similar routines close together, it may be better to
> consolidate them.
> As long as EXTNODENAME_MAX_LEN == CUSTOM_NAME_MAX_LEN, both features
> have an identical structure layout, so it is easy to call an internal
> common function to register or look up a table of callbacks according
> to the function actually called by other modules.
>
> I'm inclined to replace EXTNODENAME_MAX_LEN and
> CUSTOM_NAME_MAX_LEN with NAMEDATALEN again, to fit the structure layout.
> 
> I don't think that we need both EXTNODENAME_MAX_LEN and
> CUSTOM_NAME_MAX_LEN; we can use EXTNODENAME_MAX_LEN for both.  I'm
> opposed to using NAMEDATALEN for anything unrelated to the size of a
> Name.  If it's not being stored in a catalog, it doesn't need to care.
>
OK, I adjusted the v4 patch to use EXTNODENAME_MAX_LEN for both.

The hash entry structure was revised as follows. Entries are registered
via an internal common function, RegisterExtensibleNodeEntry, and looked
up via another internal common function, GetExtensibleNodeEntry.

typedef struct
{
	char		extnodename[EXTNODENAME_MAX_LEN];
	const void *extnodemethods;
} ExtensibleNodeEntry;

ExtensibleNodeMethods and CustomScanMethods are stored separately, in
'extensible_node_methods' and 'custom_scan_methods' respectively.
The entry-point functions call the internal common functions above with
the appropriate HTAB variable.

This will be reusable if we add further extensible node types in
future versions.
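
For reference, the common registrar looks roughly like this (a sketch with
an assumed signature, not the exact patch code); both entry points funnel
into it with their own HTAB:

static void
RegisterExtensibleNodeEntry(HTAB *htab, const char *extnodename,
                            const void *extnodemethods)
{
    ExtensibleNodeEntry *entry;
    bool        found;

    if (strlen(extnodename) >= EXTNODENAME_MAX_LEN)
        elog(ERROR, "extensible node name is too long");

    entry = (ExtensibleNodeEntry *) hash_search(htab, extnodename,
                                                HASH_ENTER, &found);
    if (found)
        elog(ERROR, "extensible node type \"%s\" already exists",
             extnodename);

    entry->extnodemethods = extnodemethods;
}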

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


pgsql-v9.6-custom-scan-serialization-reworks.5.patch
Description: pgsql-v9.6-custom-scan-serialization-reworks.5.patch



Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Robert Haas
On Mon, Mar 28, 2016 at 8:46 AM, Dilip Kumar  wrote:
> Found one problem with V15, so sending the new version.
> In V15 I was taking prev_blkno as targetBlock; instead it should be the last
> block of the relation at that time. Attaching new patch.

 	BlockNumber	targetBlock,
-				otherBlock;
+				otherBlock,
+				prev_blkno = RelationGetNumberOfBlocks(relation);

Absolutely not.  There is no way it's acceptable to introduce an
unconditional call to RelationGetNumberOfBlocks() into every call to
RelationGetBufferForTuple().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Robert Haas
On Sun, Mar 27, 2016 at 9:51 PM, Dilip Kumar  wrote:
> I think this is the better option. Since we will search the last two pages of the FSM
> tree, there is no need to update the upper level of the FSM tree. Right?

Well, it's less important in that case, but I think it's still worth
doing.  Some people are going to do just plain GetPageWithFreeSpace().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-28 Thread Robert Haas
On Mon, Mar 28, 2016 at 9:36 PM, Kouhei Kaigai  wrote:
> I don't have a strong reason to keep this stuff in separate files.
> Both cover similar features and the amount of code is small enough.
> So, the attached v4 just merges the custom-node.[ch] stuff into extensible.
>
> Once we put similar routines close together, it may be better to
> consolidate them.
> As long as EXTNODENAME_MAX_LEN == CUSTOM_NAME_MAX_LEN, both features
> have an identical structure layout, so it is easy to call an internal
> common function to register or look up a table of callbacks according
> to the function actually called by other modules.
>
> I'm inclined to replace EXTNODENAME_MAX_LEN and
> CUSTOM_NAME_MAX_LEN with NAMEDATALEN again, to fit the structure layout.

I don't think that we need both EXTNODENAME_MAX_LEN and
CUSTOM_NAME_MAX_LEN; we can use EXTNODENAME_MAX_LEN for both.  I'm
opposed to using NAMEDATALEN for anything unrelated to the size of a
Name.  If it's not being stored in a catalog, it doesn't need to care.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-28 Thread Robert Haas
On Thu, Mar 17, 2016 at 2:26 AM, Ashutosh Bapat
 wrote:
> On Wed, Mar 16, 2016 at 10:22 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>> > On Wed, Mar 16, 2016 at 4:10 AM, Ashutosh Bapat <
>> > ashutosh.ba...@enterprisedb.com> wrote:
>> >> In 9.5, postgres_fdw allowed preparing statements involving foreign
>> >> tables without an associated user mapping, as long as planning did not
>> >> require the user mapping. Remember, the planner would require a user
>> >> mapping in case use_remote_estimate is ON for that foreign table. The
>> >> user mapping would certainly be required at the time of execution, so
>> >> executing such a prepared statement would throw an error. If somebody
>> >> created a user mapping between prepare and execute, execute would not
>> >> throw an error. A join can be pushed down only when the user mappings
>> >> associated with the joining relations are known and they match. But
>> >> given the behavior of 9.5 we should let the prepare succeed, even if
>> >> the join cannot be pushed down because of an unknown user mapping.
>> >> Before this fix, the test was not letting the prepare succeed, which
>> >> is not compliant with 9.5.
>>
>> > If a query against a single table with no user mapping is legal, I don't
>> > see why pushing down a join between two tables neither of which has a
>> > user
>> > mapping shouldn't also be legal.
>>
>> The key point here is that Ashutosh is arguing on the basis of the
>> behavior of postgres_fdw, which is not representative of all FDWs.
>> The core code has no business assuming that all FDWs require user
>> mappings; file_fdw is a counterexample.
>>
>
> Here's what the patch is doing.
>
> The "core" code FDW is given chance to add paths for a join between foreign
> relations if they are from the same (valid) server and have same user
> mappings, even if the user mapping is invalid. This is code in
> build_join_rel(). So, from core code's perspective it's perfectly fine to
> push a join down when both sides do not have valid user mapping associated
> with them. So, file_fdw is capable of implementing join pushdown even
> without having user mapping.
>
> The postgres_fdw code, however, is different. The rest of the discussion only
> pertains to postgres_fdw. The comment in the postgres_fdw.sql test case, to
> which Robert objected, is relevant only for postgres_fdw.
>
> postgres_fdw requires a user mapping to execute the query. For base foreign
> tables, it's fine not to have a user mapping at the time of planning as long
> as the planner doesn't need to query the foreign server. But it certainly
> requires a user mapping at the time of execution, or else it throws an error at
> the time of execution. So, Robert's statement "If a query against a single
> table with no user mapping is legal" is not entirely correct in the context
> of postgres_fdw; postgresGetForeignRelSize(), which is called during
> planning, can throw an error if it doesn't find a valid user mapping. If a join
> between two postgres_fdw foreign relations without valid user mappings is
> decided to be pushed down at the time of planning, it will require a user
> mapping at the time of execution. This user mapping is derived from the
> joining relations recursively. If all of them have the same user mapping (even
> if invalid) it's fine. If it's invalid, we will throw an error at the time of
> execution. But if they acquire different user mappings, postgres_fdw cannot
> fire the join query. It cannot use any of the user mappings, since that would
> compromise security. So, we have landed with a plan which cannot be
> executed. To be on the safer side, postgres_fdw does not push a join down if
> joining sides do not have a valid user mapping. A base foreign relation with
> postgres_fdw will require a user mapping, for all serious uses. So, it's not
> likely that we will end up with joins between postgres_fdw foreign relations
> with invalid user mapping.

Makes sense to me.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-28 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Friday, March 25, 2016 12:27 AM
> To: Petr Jelinek
> Cc: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Reworks of CustomScan serialization/deserialization
> 
> On Wed, Mar 23, 2016 at 1:36 PM, Petr Jelinek  wrote:
> > Ok, I am happy with it, marked it as ready for committer (it was marked as
> > committed although it wasn't committed).
> 
> Thanks for fixing the status.   I had forgotten about this thread.
> 
> I can't really endorse the naming conventions here.  I mean, we've got
> the main extensible nodes stuff in extensible.h, and then we've got
> this stuff in custom_node.h (BTW, there is a leftover reference to
> custom-node.h).  There's no hint in the naming that this relates to
> scans, and why is it extensible in one place and custom in another?
> 
> I'm not quite sure how to clean this up.  At a minimum, I think we
> should standardize on "custom_scan.h" instead of "custom_node.h".  I
> think that would be clearer.  But I'm wondering if we should bite the
> bullet and rename everything from "custom" to "extensible" and declare
> it all in "extensible.h".
>
I don't have a strong reason to keep this stuff in separate files.
Both cover similar features and the amount of code is small enough.
So, the attached v4 just merges the custom-node.[ch] stuff into extensible.

Once we put similar routines close together, it may be better to
consolidate them.
As long as EXTNODENAME_MAX_LEN == CUSTOM_NAME_MAX_LEN, both features
have an identical structure layout, so it is easy to call an internal
common function to register or look up a table of callbacks according
to the function actually called by other modules.

I'm inclined to replace EXTNODENAME_MAX_LEN and
CUSTOM_NAME_MAX_LEN with NAMEDATALEN again, to fit the structure layout.

> src/backend/nodes/custom_node.c:45: indent with spaces.
> +}
> 
Oops, thanks,

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



pgsql-v9.6-custom-scan-serialization-reworks.4.patch
Description: pgsql-v9.6-custom-scan-serialization-reworks.4.patch



Re: [HACKERS] extend pgbench expressions with functions

2016-03-28 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Tue, Mar 29, 2016 at 9:54 AM, Robert Haas  wrote:
> > On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO  wrote:
> >> v40 is yet another rebase.
> >
> > Thanks.  Committed after removing an unnecessary parameter from the
> > coerceToXXX() functions.
> >
> > I guess the question here is whether we want the part-c patch, which
> > removes the historical \setrandom syntax.  That's technically no
> > longer needed since we now can use random() as a function directly
> > inside \set commands, but we might want it anyway for backward
> > compatibility.
> >
> > Anybody have an opinion on that?
> 
> +1 for nuking it. That's not worth the trouble maintaining it.

If we don't nuke it, it'll never die.

See also: pg_shadow

Thanks!

Stephen




Re: [HACKERS] extend pgbench expressions with functions

2016-03-28 Thread Michael Paquier
On Tue, Mar 29, 2016 at 9:54 AM, Robert Haas  wrote:
> On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO  wrote:
>> v40 is yet another rebase.
>
> Thanks.  Committed after removing an unnecessary parameter from the
> coerceToXXX() functions.
>
> I guess the question here is whether we want the part-c patch, which
> removes the historical \setrandom syntax.  That's technically no
> longer needed since we now can use random() as a function directly
> inside \set commands, but we might want it anyway for backward
> compatibility.
>
> Anybody have an opinion on that?

+1 for nuking it. That's not worth the trouble maintaining it.
-- 
Michael




Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-28 Thread Michael Paquier
On Tue, Mar 29, 2016 at 10:06 AM, Robert Haas  wrote:
> On Sun, Mar 27, 2016 at 10:35 PM, Michael Paquier
>  wrote:
>> On Fri, Mar 25, 2016 at 9:48 PM, Andrew Dunstan  wrote:
>>> OK, sounds good.
>>
>> Just a side-note. Andres has pushed the fix for the GinIs* macros as
>> af4472bc, making patch 0003 from the last series useless now.
>
> I'm not prepared to get very involved in this patch series since I am
> not a Windows guy, but I went ahead and pushed 0001 anyway.  I'm not
> sure that we want to commit 0002 without a BF machine.  Can anybody
> help with that?

I'll try to get a machine up and running, I guess Win7 with VS2015 at
this stage.
-- 
Michael




[HACKERS] Re: [COMMITTERS] pgsql: Sync tzload() and tzparse() APIs with IANA release tzcode2016c.

2016-03-28 Thread Michael Paquier
On Tue, Mar 29, 2016 at 6:19 AM, Tom Lane  wrote:
> Sync tzload() and tzparse() APIs with IANA release tzcode2016c.
>
> This brings us a bit closer to matching upstream, but since it affects
> files outside src/timezone/, we might choose not to back-patch it.
> Hence keep it separate from the main update patch.

Buildfarm-not-being-happy-status: woodlouse, mastodon, thrips, jacana.
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=woodlouse&dt=2016-03-29%2000%3A42%3A08

The origin of the problem is this, which causes all the subsequent
queries to fail:
  SET TimeZone to 'UTC';
+ ERROR:  invalid value for parameter "TimeZone": "UTC"
-- 
Michael




Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-28 Thread Robert Haas
On Sun, Mar 27, 2016 at 10:35 PM, Michael Paquier
 wrote:
> On Fri, Mar 25, 2016 at 9:48 PM, Andrew Dunstan  wrote:
>> OK, sounds good.
>
> Just a side-note. Andres has pushed the fix for the GinIs* macros as
> af4472bc, making patch 0003 from the last series useless now.

I'm not prepared to get very involved in this patch series since I am
not a Windows guy, but I went ahead and pushed 0001 anyway.  I'm not
sure that we want to commit 0002 without a BF machine.  Can anybody
help with that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Typo in comment

2016-03-28 Thread Robert Haas
On Sat, Mar 26, 2016 at 4:21 PM, Thomas Munro
 wrote:
> Here are a couple of patches to fix a typo in a comment in latch.c:
>
> - * The memory barrier has be to be placed here to ensure that any flag
> + * The memory barrier has to be placed here to ensure that any flag

Committed, but it hardly seems worth back-patching.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] extend pgbench expressions with functions

2016-03-28 Thread Robert Haas
On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO  wrote:
> v40 is yet another rebase.

Thanks.  Committed after removing an unnecessary parameter from the
coerceToXXX() functions.

I guess the question here is whether we want the part-c patch, which
removes the historical \setrandom syntax.  That's technically no
longer needed since we now can use random() as a function directly
inside \set commands, but we might want it anyway for backward
compatibility.

Anybody have an opinion on that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Upper planner pathification

2016-03-28 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Saturday, March 19, 2016 8:57 AM
> To: Tom Lane
> Cc: Robert Haas; Petr Jelinek; David Rowley; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP: Upper planner pathification
> 
> > -Original Message-
> > From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> > Sent: Friday, March 18, 2016 11:44 PM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: Robert Haas; Petr Jelinek; David Rowley; pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] WIP: Upper planner pathification
> >
> > Kouhei Kaigai  writes:
> > > On Wed, Mar 16, 2016 at 2:47 PM, Tom Lane  wrote:
> > >> I do not, however, like the proposal to expose wflists and so forth.
> > >> Those are internal data structures in grouping_planner and I absolutely
> > >> refuse to promise that they're going to stay stable.
> >
> > > Hmm... It's not easy to imagine a case where an extension wants its own
> > > way to extract window functions from the target list and select active
> > > windows, even if the extension wants to have its own executor and its
> > > own cost estimation logic.
> > > In the case when an extension tries to add WindowPath + CustomPath(Sort),
> > > the extension is interested in the alternative sort task, but not the
> > > window function itself. It is natural to follow the built-in
> > > implementation; thus, it motivates extension authors to copy & paste
> > > the code. select_active_windows() is static, so an extension needs to
> > > have the same routine on its side.
> >
> > Well, to be perfectly blunt about it, I have said from day one that this
> > notion that a CustomScan extension will be able to cause arbitrary planner
> > behavior changes is loony.  We are simply not going to drop a hook into
> > every tenth line of the planner for you, nor de-static-ify every internal
> > function, nor (almost equivalently) expose the data structures those
> > functions produce, because it would cripple core planner development to
> > try to keep the implied APIs stable.  And I continue to maintain that any
> > actually-generally-useful ideas would be better handled by submitting them
> > as patches to the core planner, rather than trying to implement them in an
> > arms-length extension.
> >
> > In the case at hand, I notice that the WindowFuncLists struct is
> > actually from find_window_functions in clauses.c, so an extension
> > that needed to get hold of that would be unlikely to do any copying
> > and pasting anyhow -- it'd just call find_window_functions again.
> > (That only needs to search the targetlist, so it's not that expensive.)
> > The other lists you mention are all tightly tied to specific, and not
> > terribly well-designed, implementation strategies for grouping sets and
> > window functions.  Those are *very* likely to change in the near future;
> > and even if they don't, I'm skeptical that an external implementor of
> > grouping sets or window functions would want to use exactly those same
> > implementation strategies.  Maybe the only reason you're there at all
> > is you want to be smarter about the order of doing window functions,
> > for example.
> >
> > So I really don't want to export any of that stuff.
> >
Hmm. I can understand that we have active development around this area,
so miscellaneous internal data structures may not be stable enough to
expose to extensions.
Would you object to revisiting this discussion once the implementation
has calmed down, in a future development cycle?
> 
> > As for other details of the proposed patch, I was intending to put
> > all the hook calls into grouping_planner for consistency, rather than
> > scattering them between grouping_planner and its subroutines.  So that
> > would probably mean calling the hook for a given step *before* we
> > generate the core paths for that step, not after.  Did you have a
> > reason to want the other order?  (If you say "so the hook can look
> > at the core-made paths", I'm going to say "that's a bad idea".  It'd
> > further increase the coupling between a CustomScan extension and core.)
> >
No deep reason. I just followed the manner of the scan/join path hook,
which calls the extension once the core has added its built-in path nodes.
>
Ah, I overlooked a deep reason.
ForeignScan/CustomScan may have an alternative execution path if the
extension supports corner-case handling for cases when these nodes cannot
execute, or can execute only at an unreasonable cost at execution time,
even though the planner thought the path was the most reasonable at
optimization time.

One example at the scan/join level was EPQ rechecks. Remote join invocation
was not reasonable for each EPQ recheck; thus, ForeignPath got the
fdw_outerpath field to pick up a core-made path.

I can expect some other similar cases. For example, a custom scan that
supports multi-node aggregation wants to switch to 

Re: [HACKERS] Some messages of pg_rewind --debug not getting translated

2016-03-28 Thread Michael Paquier
On Tue, Mar 29, 2016 at 2:45 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> On Wed, Mar 23, 2016 at 12:24 AM, Alvaro Herrera
>>  wrote:
>> > Seems reasonable.  For the last hunk in your patch, though, I would add
>> > a /* translator: */ comment explaining what each of the values is;
>> > otherwise it's just incomprehensible percent-sign-salad for the poor
>> > translator.  Note that if the line is too long you have to use dirty
>> > tricks to avoid pgindent from messing it up (see 673b52753489 for
>> > example)
>>
>> OK, done this way.
>>
>> > I'm not sure if we should keep "blk" in the original or make it a
>> > complete word.
>>
>> OK, a complete word makes more sense.
>
> Thanks, pushed.

Thanks.

> If you're interested in improving translatability of this program
> further, I suggest that messages of this sort
>   msgid "BKPBLOCK_HAS_DATA set, but no data included at %X/%X"
> should have a %s instead of the flag name.

I'll think about it, though I am afraid this would reduce the code
readability in xlogreader.c.
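For illustration, the suggested change would amount to something like
this (a sketch only; report_invalid_record() is the xlogreader.c helper,
and recptr stands in for whatever position variable the real call site
uses):

/* Before: the flag name is baked into the translatable string. */
report_invalid_record(state,
                      "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
                      (uint32) (recptr >> 32), (uint32) recptr);

/*
 * After: one translatable message covers several flags, and the
 * translator comment explains the %s placeholder.
 */
/* translator: %s is the name of a WAL record flag bit */
report_invalid_record(state,
                      "%s set, but no data included at %X/%X",
                      "BKPBLOCK_HAS_DATA",
                      (uint32) (recptr >> 32), (uint32) recptr);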
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP / recovery-test fs-level backups, psql enhancements etc

2016-03-28 Thread Alvaro Herrera
Craig Ringer wrote:

> I found a minor issue with the new psql method while writing tests for
> failover slots. Patch attached.

Thanks, pushed.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pthread portability

2016-03-28 Thread Alvaro Herrera
Michael McConville wrote:
> The below diff fixes one problem: you can't compare pthread_t values
> directly. Only the function pthread_equal(3) is defined. Direct
> comparison usually works because most implementations define pthread_t
> as an integer type.

So is there a platform where this assumption doesn't hold?


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pthread portability

2016-03-28 Thread Michael McConville
The below diff fixes one problem: you can't compare pthread_t values
directly. Only the function pthread_equal(3) is defined. Direct
comparison usually works because most implementations define pthread_t
as an integer type.

Relatedly, INVALID_THREAD is defined as (pthread_t)0. I don't think this
is a portable way of checking whether a thread is valid, and I don't
know if it's actually possible to get an "invalid" thread out of
pthread_create(3) if it succeeds (returns 0). In the cases where you're
setting a pthread_t to INVALID_THREAD, maybe using a struct that
includes a pthread_t and a 'valid' bool would be preferable.
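A minimal sketch of that wrapper-struct idea (all names here are
illustrative, not taken from the pgbench source):

#include <pthread.h>
#include <stdbool.h>

/* Portable "maybe a thread" handle; avoids the (pthread_t) 0 sentinel. */
typedef struct MaybeThread
{
    pthread_t   handle;
    bool        valid;          /* true only after pthread_create() succeeds */
} MaybeThread;

static int
maybe_thread_create(MaybeThread *t, void *(*fn) (void *), void *arg)
{
    int         err = pthread_create(&t->handle, NULL, fn, arg);

    t->valid = (err == 0);
    return err;
}

/*
 * Callers then branch on t->valid instead of comparing against
 * INVALID_THREAD, e.g. running the work inline in the main thread
 * when no worker thread was started.
 */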

Thanks for your time,
Michael


diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4196b0e..f2e5aed 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3791,7 +3791,7 @@ main(int argc, char **argv)
{
int err = pthread_create(&thread->thread, NULL, threadRun, thread);
 
-   if (err != 0 || thread->thread == INVALID_THREAD)
+   if (err != 0 || pthread_equal(thread->thread, INVALID_THREAD))
{
fprintf(stderr, "could not create thread: %s\n", strerror(err));
exit(1);
@@ -3819,7 +3819,7 @@ main(int argc, char **argv)
TState *thread = [i];
 
 #ifdef ENABLE_THREAD_SAFETY
-   if (threads[i].thread == INVALID_THREAD)
+   if (pthread_equal(threads[i].thread, INVALID_THREAD))
/* actually run this thread directly in the main thread */
(void) threadRun(thread);
else


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] help for old extension, first_last_agg bug in install with pg9.5

2016-03-28 Thread Peter Krauss
The C implementation is simple; the problem is with the makefile:
   https://github.com/wulczer/first_last_agg/issues/2

*Any clues?*

- - -

make
Makefile:25: /usr/lib/postgresql/9.5/lib/pgxs/src/makefiles/pgxs.mk:
  No such file or directory
make: *** No rule to make target
  `/usr/lib/postgresql/9.5/lib/pgxs/src/makefiles/pgxs.mk'.
  Stop.

Using UBUNTU 14 LTS

See also https://github.com/wulczer/first_last_agg/blob/master/Makefile


Re: [HACKERS] raw output from copy

2016-03-28 Thread Tom Lane
Pavel Stehule  writes:
> [ copy-raw-format-20160227-03.patch ]

I looked at this patch.  I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break.  Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break.  (libpq.sgml also contains
assorted statements that are falsified by this patch.)

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems.  Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format?  Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case.  I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries.  On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author.  But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN is missing in multicolumn indexes documentation

2016-03-28 Thread Alvaro Herrera
Petr Jediný wrote:
> Comments? This is my first patch to postgresql project, so comments
> are very much welcomed.

I think it's fine, so I pushed it.  Thanks.

I hope you already read this
https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Petr Jelinek

On 28/03/16 14:46, Dilip Kumar wrote:


Conclusion:
---
1. I think v15 solves the problem that existed with v13; performance
is significantly higher compared to base, and the relation size is also
stable. So IMHO v15 is the winner over the other solutions. What do
others think?



It seems so. Do you have the ability to reasonably test with 64 clients? I
am mostly wondering whether we see performance going further down or just
plateauing.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dealing with collation and strcoll/strxfrm/etc

2016-03-28 Thread Peter Geoghegan
On Mon, Mar 28, 2016 at 12:36 PM, Stephen Frost  wrote:
> Having to figure out how each and every stdlib does versioning doesn't
> sound fun, I certainly agree with you there, but it hardly seems
> impossible.  What we need, even if we look to move to ICU, is a place to
> remember that version information and a way to do something when we
> discover that we're now using a different version.

I think that the versioning situation is all over the place. It isn't
in the C standard. And there are many different versions of many
different stdlibs to support. Most importantly, where support
nominally exists, a strong incentive to get it exactly right may not exist.
We've seen that already.

> I'm not quite sure what the best way to do that is, but I imagine it
> involves changes to existing catalogs or perhaps even a new one.  I
> don't have any particularly great ideas for existing releases (maybe
> stash information in the index somewhere when it's rebuilt and then
> check it and throw an ERROR if they don't match?)

I think we'd need to introduce an abstraction like a "collation
provider", of which ICU would theoretically be just one. The OS would
be a baked-in collation provider. Everything that works today would
continue to work. We'd then largely just be grandfathering out systems
that rely on OS locales across major version upgrades, since the vast
majority of users are happy with Unicode, and have no cultural or
technical reason to prefer the OS locales that I can think of.

I am unconvinced with the idea that it especially matters that sort(1)
might not be in agreement with Postgres. Neither is any Java app, or
any .Net app, or the user's web browser in the case of Safari or
Google Chrome (maybe others). I want Postgres to be consistent with
Postgres, across different nodes on the network, in environments where
I may have little knowledge of the underlying OS. Think "sort pushdown
in postgres_fdw".

Users from certain East Asian user communities might prefer to stick
with regional encodings, perhaps due to specific concerns about the
Han Unification controversy. But I'm pretty sure that these users have
very low expectations about collations in Postgres today. I was
recently told that collating Japanese is starting to get a bit better,
due to various new initiatives, but that most experienced Japanese
Postgres DBAs tend to use the "C" collation.

I don't want to impose a Unicode monoculture on anyone. But I do think
there are clear benefits for the large majority of users that always
use Unicode. Nothing needs to break that works today to make this
happen. Abbreviated keys provide an immediate incentive for users to
adopt ICU; users that might otherwise be on the fence about it.

>> The question is only how we deal with this when it happens. One thing
>> that's attractive about ICU is that it makes this explicit, both for
>> the logical behavior of a collation, as well as the stability of
>> binary sort keys (Glibc's versioning seemingly just does the former).
>> So the equivalent of strxfrm() output has license to change for
>> technical reasons that are orthogonal to the practical concerns of
>> end-users about how text sorts in their locale. ICU is clear on what
>> it takes to make binary sort keys in indexes work. And various major
>> database systems rely on this being right.
>
> There seems to be some disagreement about if ICU provides the
> information we'd need to make a decision or not.  It seems like it
> would, given its usage in other database systems, but if so, we need to
> very clearly understand exactly how it works and how we can depend on
> it.

It seems likely that it exposes the information required to make what
we need to do practical.

Certainly, adopting ICU is a big project that we should proceed
cautiously with, but there is a reason why every other major database
system uses either ICU, or a library based on UCA [1] that allows the
system to centrally control versioned collations (SQLite just makes
this optional).

I think that ICU *could* still tie us to the available collations on
an OS (those collations that are available with their ICU packages).
What I haven't figured out yet is if it's practical to install
versions that are available from some central location, like the CLDR
[2]. I don't think we'd want to have Postgres ship "supported
collations" in each major version, in roughly the style of the IANA
timezone stuff, but it's far too early to rule that out. It would have
upsides.

[1] https://en.wikipedia.org/wiki/Unicode_collation_algorithm
[2] http://cldr.unicode.org/
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SQL access to access method details

2016-03-28 Thread Jim Nasby
While working on a tool to capture catalog/stats info and looking at 
cross version compatibility, I noticed that the pg_am changes removed 
SQL access to a bunch of AM info. [1] indicates that's part of the 
purpose of the patch; are we sure no tools are using this info? Unlike 
previous catalog compatibility breaks, this one completely removes that 
information, so if someone was using it they're now completely hosed.


[1] http://www.postgresql.org/message-id/55fec1ab.8010...@2ndquadrant.com
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dealing with collation and strcoll/strxfrm/etc

2016-03-28 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
> On Mon, Mar 28, 2016 at 7:57 AM, Stephen Frost  wrote:
> > If we're going to talk about minimum requirements, I'd like to argue
> > that we require whatever system we're using to have versioning (which
> > glibc currently lacks, as I understand it...) to avoid the risk that
> > indexes will become corrupt when whatever we're using for collation
> > changes.  I'm pretty sure that's already bitten us on at least some
> > RHEL6 -> RHEL7 migrations in some locales, even forgetting the issues
> > with strcoll vs. strxfrm.
> 
> I totally agree that anything we should adopt should support
> versioning. Glibc does have a non-standard versioning scheme, but we
> don't use it. Other stdlibs may do versioning another way, or not at
> all. In a world in which ICU is the de facto standard for Postgres (i.e.
> the actual standard on all major platforms), we mostly just have one
> thing to target, which seems like something to aim for.

Having to figure out how each and every stdlib does versioning doesn't
sound fun, I certainly agree with you there, but it hardly seems
impossible.  What we need, even if we look to move to ICU, is a place to
remember that version information and a way to do something when we
discover that we're now using a different version.

I'm not quite sure what the best way to do that is, but I imagine it
involves changes to existing catalogs or perhaps even a new one.  I
don't have any particularly great ideas for existing releases (maybe
stash information in the index somewhere when it's rebuilt and then
check it and throw an ERROR if they don't match?)

> The question is only how we deal with this when it happens. One thing
> that's attractive about ICU is that it makes this explicit, both for
> the logical behavior of a collation, as well as the stability of
> binary sort keys (Glibc's versioning seemingly just does the former).
> So the equivalent of strxfrm() output has license to change for
> technical reasons that are orthogonal to the practical concerns of
> end-users about how text sorts in their locale. ICU is clear on what
> it takes to make binary sort keys in indexes work. And various major
> database systems rely on this being right.

There seems to be some disagreement about if ICU provides the
information we'd need to make a decision or not.  It seems like it
would, given its usage in other database systems, but if so, we need to
very clearly understand exactly how it works and how we can depend on
it.

> > Regarding key abbreviation and performance, if we are confident that
> > strcoll and strxfrm are at least independently internally consistent
> > then we could consider offering an option to choose between them.
> 
> I think they just need to match, per the standard. After all,
> abbreviation will sometimes require strcoll() tie-breakers.

Ok, I didn't see that in the man-pages.  If that's the case then it
seems like there isn't much hope of just using strxfrm().
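For reference, the POSIX contract under discussion is that comparing
strxfrm() results with strcmp() must order strings the same way
strcoll() does. A small self-contained check, with an example locale
and buffers assumed large enough for the test strings:

#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if strcoll() and strxfrm()+strcmp() agree on the ordering. */
static int
orders_agree(const char *a, const char *b)
{
    char        xa[256],
                xb[256];
    int         coll = strcoll(a, b);
    int         xfrm;

    strxfrm(xa, a, sizeof(xa));
    strxfrm(xb, b, sizeof(xb));
    xfrm = strcmp(xa, xb);

    /* Only the signs have to match, per POSIX. */
    return (coll < 0) == (xfrm < 0) && (coll > 0) == (xfrm > 0);
}

int
main(void)
{
    if (setlocale(LC_COLLATE, "hu_HU.UTF-8") == NULL)   /* example locale */
        return 1;
    printf("consistent: %d\n", orders_agree("some string", "another"));
    return 0;
}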

Thanks!

Stephen




Re: [HACKERS] Sequence Access Method WIP

2016-03-28 Thread Fabrízio de Royes Mello
On Thu, Mar 24, 2016 at 6:12 PM, Petr Jelinek  wrote:
>
> Hi,
>
> I rebased this on top of the recently committed CREATE ACCESS METHOD.
>

Hi,

I got the following error trying to apply it to the current master:

$ git apply /home/fabrizio/Downloads/0001-seqam-2016-03-24.patch
error: patch failed: src/backend/commands/amcmds.c:29
error: src/backend/commands/amcmds.c: patch does not apply


There is a conflicting definition at the beginning of amcmds.c:

<<<<<<< ours
static Oid  lookup_index_am_handler_func(List *handler_name, char amtype);
static const char *get_am_type_string(char amtype);
=======
static Oid  lookup_am_handler_func(List *handler_name, char amtype);
static char *get_am_type_string(char amtype);
>>>>>>> theirs


After this small fix I could build and run the regression tests without errors.

But running "check-world" I got the error:

make[1]: Leaving directory `/data/postgresql/src/test/regress'
make: Leaving directory `/data/postgresql'
+ pg_dumpall -f /data/postgresql/src/bin/pg_upgrade/tmp_check/dump1.sql
ok 9 - dropuser foobar1 exit code 0
ok 10 - SQL DROP ROLE run: SQL found in server log
ok 11 - fails with nonexistent user
ok
t/080_pg_isready.pl ...
1..10
ok 1 - pg_isready --help exit code 0
ok 2 - pg_isready --help goes to stdout
ok 3 - pg_isready --help nothing to stderr
ok 4 - pg_isready --version exit code 0
ok 5 - pg_isready --version goes to stdout
ok 6 - pg_isready --version nothing to stderr
ok 7 - pg_isready with invalid option nonzero exit code
ok 8 - pg_isready with invalid option prints error message
ok 9 - fails with no server running
pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name" does
not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
   ^
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
increment_by, CASE WHEN increment_by > 0 AND max_value =
9223372036854775807 THEN NULL  WHEN increment_by < 0 AND max_value = -1
THEN NULL  ELSE max_value END AS max_value, CASE WHEN increment_by > 0
AND min_value = 1 THEN NULL  WHEN increment_by < 0 AND min_value =
-9223372036854775807 THEN NULL  ELSE min_value END AS min_value,
cache_value, is_cycled FROM check_seq
pg_dumpall: pg_dump failed on database "regression", exiting
+ pg_dumpall1_status=1
+ [ /data/postgresql != /data/postgresql ]
+
/data/postgresql/src/bin/pg_upgrade/tmp_check/install//home/fabrizio/pgsql/bin/pg_ctl
-m fast stop
waiting for server to shut down done
server stopped
+ [ -n  ]
+ [ -n  ]
+ [ -n 1 ]
+ echo pg_dumpall of pre-upgrade database cluster failed
pg_dumpall of pre-upgrade database cluster failed
+ exit 1
+ rm -rf /tmp/pg_upgrade_check-3NUa0X
make[2]: *** [check] Error 1
make[2]: Leaving directory `/data/postgresql/src/bin/pg_upgrade'
make[1]: *** [check-pg_upgrade-recurse] Error 2
make[1]: *** Waiting for unfinished jobs



And testing pg_dump itself I got the same error trying to dump a database
that contains a sequence.

fabrizio=# create sequence x;
CREATE SEQUENCE
fabrizio=# \ds
        List of relations
 Schema | Name |   Type   |  Owner
--------+------+----------+----------
 public | x    | sequence | fabrizio
(1 row)

fabrizio=# \d x
  Sequence "public.x"
Column|   Type|Value
--+---+-
 start_value  | bigint| 1
 increment_by | bigint| 1
 max_value| bigint| 9223372036854775807
 min_value| bigint| 1
 cache_value  | bigint| 1
 is_cycled| boolean   | f
 amstate  | seqam_local_state | (1,f,0)
Access Method: local

fabrizio=# \q

fabrizio@bagual:~/pgsql
$ bin/pg_dump > /tmp/fabrizio.sql
pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name" does
not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
   ^
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
increment_by, CASE WHEN increment_by > 0 AND max_value =
9223372036854775807 THEN NULL  WHEN increment_by < 0 AND max_value = -1
THEN NULL  ELSE max_value END AS max_value, CASE WHEN increment_by > 0
AND min_value = 1 THEN NULL  WHEN increment_by < 0 AND min_value =
-9223372036854775807 THEN NULL  ELSE min_value END AS min_value,
cache_value, is_cycled FROM x


Tonight I'll review some parts of the code.

Regards,


Att,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Dealing with collation and strcoll/strxfrm/etc

2016-03-28 Thread Peter Geoghegan
On Mon, Mar 28, 2016 at 7:57 AM, Stephen Frost  wrote:
> If we're going to talk about minimum requirements, I'd like to argue
> that we require whatever system we're using to have versioning (which
> glibc currently lacks, as I understand it...) to avoid the risk that
> indexes will become corrupt when whatever we're using for collation
> changes.  I'm pretty sure that's already bitten us on at least some
> RHEL6 -> RHEL7 migrations in some locales, even forgetting the issues
> with strcoll vs. strxfrm.

I totally agree that anything we should adopt should support
versioning. Glibc does have a non-standard versioning scheme, but we
don't use it. Other stdlibs may do versioning another way, or not at
all. In a world in which ICU is the de facto standard for Postgres (i.e.
the actual standard on all major platforms), we mostly just have one
thing to target, which seems like something to aim for.

Collations change from time to time, legitimately. Read from
"Collation order is not fixed", here:

http://unicode.org/reports/tr10/#Stability

The question is only how we deal with this when it happens. One thing
that's attractive about ICU is that it makes this explicit, both for
the logical behavior of a collation, as well as the stability of
binary sort keys (Glibc's versioning seemingly just does the former).
So the equivalent of strxfrm() output has license to change for
technical reasons that are orthogonal to the practical concerns of
end-users about how text sorts in their locale. ICU is clear on what
it takes to make binary sort keys in indexes work. And various major
database systems rely on this being right.

> Regarding key abbreviation and performance, if we are confident that
> strcoll and strxfrm are at least independently internally consistent
> then we could consider offering an option to choose between them.

I think they just need to match, per the standard. After all,
abbreviation will sometimes require strcoll() tie-breakers.

Clearly it would be very naive to imagine that ICU is bug-free.
However, I surmise that there is a large difference how ICU and glibc
think about things like strxfrm() or strcoll() stability and
consistency. Tom was able to demonstrate that strxfrm() and strcoll()
behaved inconsistently without too much effort, contrary to POSIX, and
in many common cases. I doubt that the Glibc maintainers are all that
concerned about it. Certainly, less concerned than they are about the
latest security bug. Whereas if this happened in ICU, it would be a
total failure of the project to fulfill its most basic goals. Our
disaster would also be a disaster for several other major database
systems. ICU carefully and explicitly considers multiple forms of
stability, "deterministic" sort ordering, etc. That *is* a big
difference, and it makes me optimistic that there'd be far fewer
problems.

I also think that ICU could be a reasonable basis for case-insensitive
collations, which would let us kill citext, a module that I consider
to be a total kludge. And, we might also be able to lock down WAL
compatibility, which would be generally useful.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-03-28 Thread Tom Lane
David Steele  writes:
> On 3/10/16 11:00 AM, Petr Jelinek wrote:
>> The comment above errhidefromclient says "Only log levels below ERROR
>> can be hidden from the client." but use of the errhidefromclient(true)
>> actually does hide the error message from client, client just gets
>> failed query without any message when used with ERROR level.

> Right you are.  The v3 patch adds this check.

> I also added the documentation to sources.sgml that Tom pointed out was
> missing.

I set to work on committing this, but was soon rather dissatisfied with
it, because as-implemented there is no way to short-circuit elog.c's
processing if the message is not to be sent to either the client or the
postmaster log.  Ideally this would be taken into account when errstart()
figures the output_to_client setting to begin with --- but of course we
can't do that with this API, because errhidefromclient() hasn't run yet.

The patch is buggy even without that consideration, because it would
potentially disable client output of messages even after they've been
promoted to FATAL by pg_re_throw.  We could fix that by clearing the flag
in pg_re_throw, but I think that's just another symptom of the shutoff
being done in the wrong place.

I wonder just what the expected usage scenario is for this function, and
whether it would not be better addressed by inventing some other API
rather than modeling it on errhidestmt(), which is by no means the same
kind of thing.

One idea is to invent a new elevel which is never sent to the client ---
analogously to COMMERROR, though probably much higher priority than that,
maybe the same priority as LOG.  If there actually is a use for a range of
elevels on errhidefromclient'd messages, that wouldn't work very well of
course.  Or we could consider having a flag bit that is OR'd into the
elevel, along the lines of

ereport(LOG | ERR_HIDE_FROM_CLIENT, (errmsg()));

so that the information is available at errstart time.
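A minimal sketch of that last idea (the flag value is hypothetical and
the real errstart() body is elided):

/* Hypothetical flag bit, chosen above the range of valid elevels. */
#define ERR_HIDE_FROM_CLIENT    0x10000000

bool
errstart(int elevel, const char *filename, int lineno,
         const char *funcname, const char *domain)
{
    bool        hide_from_client = (elevel & ERR_HIDE_FROM_CLIENT) != 0;

    elevel &= ~ERR_HIDE_FROM_CLIENT;    /* strip flag before any comparisons */

    /*
     * ... existing logic; output_to_client can now be computed up front,
     * short-circuiting all client-output work for hidden messages ...
     */
    return true;                /* placeholder for the existing logic */
}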

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some messages of pg_rewind --debug not getting translated

2016-03-28 Thread Alvaro Herrera
Michael Paquier wrote:
> On Wed, Mar 23, 2016 at 12:24 AM, Alvaro Herrera
>  wrote:
> > Seems reasonable.  For the last hunk in your patch, though, I would add
> > a /* translator: */ comment explaining what each of the values is;
> > otherwise it's just incomprehensible percent-sign-salad for the poor
> > translator.  Note that if the line is too long you have to use dirty
> > tricks to avoid pgindent from messing it up (see 673b52753489 for
> > example)
> 
> OK, done this way.
> 
> > I'm not sure if we should keep "blk" in the original or make it a
> > complete word.
> 
> OK, a complete word makes more sense.

Thanks, pushed.

If you're interested in improving translatability of this program
further, I suggest that messages of this sort
  msgid "BKPBLOCK_HAS_DATA set, but no data included at %X/%X"
should have a %s instead of the flag name.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-28 Thread Claudio Freire
On Thu, Mar 24, 2016 at 7:12 PM, Peter Geoghegan  wrote:
> On Thu, Mar 24, 2016 at 7:17 AM, Robert Haas  wrote:
>> I really like this idea, and the performance results seem impressive,
>> but I think we should push this out to 9.7.  A btree patch that didn't
>> have WAL support until two and a half weeks into the final CommitFest
>> just doesn't seem to me like a good candidate.  First, as a general
>> matter, if a patch isn't code-complete at the start of a CommitFest,
>> it's reasonable to say that it should be reviewed but not necessarily
>> committed in that CommitFest.  This patch has had some review, but I'm
>> not sure how deep that review is, and I think it's had no code review
>> at all of the WAL logging changes, which were submitted only a week
>> ago, well after the CF deadline.  Second, the btree AM is a
>> particularly poor place to introduce possibly destabilizing changes.
>> Everybody depends on it, all the time, for everything.  And despite
>> new tools like amcheck, it's not a particularly easy thing to debug.
>
> Regrettably, I must agree. I don't see a plausible path to commit for
> this patch in the ongoing CF.
>
> I think that Anastasia did an excellent job here, and I wish I could
> have been of greater help sooner. Nevertheless, it would be unwise to
> commit this given the maturity of the code. There have been very few
> instances of performance improvements to the B-Tree code for as long
> as I've been interested, because it's so hard, and the standard is so
> high. The only example I can think of from the last few years is
> Kevin's commit 2ed5b87f96 and Tom's commit 1a77f8b63d both of which
> were far less invasive, and Simon's commit c7111d11b1, which we just
> outright reverted from 9.5 due to subtle bugs (and even that was
> significantly less invasive than this patch). Improving nbtree is
> something that requires several rounds of expert review, and that's
> something that's in short supply for the B-Tree code in particular. I
> think that a new testing strategy is needed to make this easier, and I
> hope to get that going with amcheck. I need help with formalizing a
> "testing first" approach for improving the B-Tree code, because I
> think it's the only way that we can move forward with projects like
> this. It's *incredibly* hard to push forward patches like this given
> our current, limited testing strategy.

I've been toying (having gotten nowhere concrete, really) with prefix
compression myself, and I agree that messing with the btree code is
quite a bit harder than it ought to be.

Perhaps trying experimental format changes in a separate experimental
am wouldn't be all that bad (say, nxbtree?). People could opt in to
those by creating the indexes with nxbtree instead of plain btree
(say, in development environments) and get some testing going without
risking much.

Normally the same effect should be achievable with mere flags, but
since format changes to btree tend to be rather invasive, ensuring the
patch doesn't change behavior with the flag off is hard as well, hence
the wholly separate am idea.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Access method extendability

2016-03-28 Thread Alvaro Herrera
Alexander Korotkov wrote:
> On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera 
> wrote:
> 
> > .. Oh crap.  I just noticed I forgot to update a comment in pg_dump's
> > getAccessMethods.  And we're missing psql tab-complete support for the
> > new commands.
> 
> Attached patches fix both these issues.

I see Teodor just pushed these two fixes.  Thanks!

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-28 Thread Amit Kapila
On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila 
wrote:
>
> On Thu, Sep 3, 2015 at 5:11 PM, Andres Freund  wrote:
> >
>
> Updated comments and the patch (increate_clog_bufs_v2.patch)
> containing the same is attached.
>

Andres mentioned to me in an off-list discussion that he thinks we should
first try to fix the clog buffers problem, as he sees in his tests that clog
buffer replacement is one of the bottlenecks. He also suggested a test
to see if the increase in buffers could lead to a regression.  The basic idea
of the test was to ensure that every CLOG access is a disk access.  Based
on his suggestion, I have written a SQL statement which forces every
CLOG access to be a disk access; the query used for this is as below:
With ins AS (INSERT INTO test_clog_access values(default) RETURNING c1)
Select * from test_clog_access where c1 = (Select c1 from ins) - 32768 *
:client_id;

Test Results
-
HEAD - commit d12e5bb7 Clog Buffers - 32
Patch-1 - Clog Buffers - 64
Patch-2 - Clog Buffers - 128


 Patch_Ver/Client_Count |     1 |    64
 -----------------------+-------+------
 HEAD                   | 12677 | 57470
 Patch-1                | 12305 | 58079
 Patch-2                | 12761 | 58637

The above data is the median of three 10-minute runs, and it indicates that
there is no substantial dip from increasing the clog buffers.

The test scripts used are attached to this mail.  In
perf_clog_access.sh, you need to change the data_directory path for your
machine; you might also want to change the binary name if you create
postgres binaries with different names.

Andres, is this test in line with what you have in mind?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


access_clog_prep.sql
Description: Binary data


access_clog.sql
Description: Binary data


perf_clog_access.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN is missing in multicolumn indexes documentation

2016-03-28 Thread Petr Jediný
Comments? This is my first patch to postgresql project, so comments
are very much welcomed.

PJ

On Wed, Mar 23, 2016 at 8:40 PM, Petr Jediný  wrote:
>>>
>>> Multicolumn BRIN is like GIN.  Every column is indexed separately.
>>> The order of the columns doesn't matter.
>>
>> Right.  You can use one index to cover all columns; the position of the
>> column in the index won't matter for a query that uses one column.  The
>> only reason to have multiple BRIN indexes on a single table is to have
>> a different pages_per_range.
>
> Thanks for the information, I've attached the patch updated to v2.
>
>
> PJ


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Queries and PostGIS

2016-03-28 Thread Paul Ramsey
On Mon, Mar 28, 2016 at 9:45 AM, Stephen Frost  wrote:
> Paul,
>
> * Paul Ramsey (pram...@cleverelephant.ca) wrote:
>> I spent some time over the weekend trying out the different modes of
>> parallel query (seq scan, aggregate, join) in combination with PostGIS
>> and have written up the results here:
>>
>> http://blog.cleverelephant.ca/2016/03/parallel-postgis.html
>
> Neat!
>
> Regarding aggregate parallelism and the cascaded union approach, though
> I imagine in other cases as well, it seems like having a
> "final-per-worker" function for aggregates would be useful.
>
> Without actually looking at the code at all, it seems like that wouldn't
> be terribly difficult to add.
>
> Would you agree that it'd be helpful to have for making the st_union()
> work better in parallel?

For our particular situation w/ ST_Union, yes, it would be ideal to be
able to run a worker-side combine function as well as the master-side
one. Although the cascaded union would be less effective spread out
over N nodes, doing it only once per worker, rather than every N
records, would minimize the loss of effectiveness.

P


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Queries and PostGIS

2016-03-28 Thread Stephen Frost
Paul,

* Paul Ramsey (pram...@cleverelephant.ca) wrote:
> I spent some time over the weekend trying out the different modes of
> parallel query (seq scan, aggregate, join) in combination with PostGIS
> and have written up the results here:
> 
> http://blog.cleverelephant.ca/2016/03/parallel-postgis.html

Neat!

Regarding aggregate parallelism and the cascaded union approach, though
I imagine in other cases as well, it seems like having a
"final-per-worker" function for aggregates would be useful.

Without actually looking at the code at all, it seems like that wouldn't
be terribly difficult to add.

Would you agree that it'd be helpful to have for making the st_union()
work better in parallel?

Though I do wonder if you would end up wanting to have a different
final() function in that case..

Thanks!

Stephen




[HACKERS] Parallel Queries and PostGIS

2016-03-28 Thread Paul Ramsey
I spent some time over the weekend trying out the different modes of
parallel query (seq scan, aggregate, join) in combination with PostGIS
and have written up the results here:

http://blog.cleverelephant.ca/2016/03/parallel-postgis.html

The TL;DR is basically

* With some adjustments to function COST both parallel sequence scan
and parallel aggregation deliver very good parallel performance
results.
* The cost adjustments for sequence scan and aggregate scan are not
consistent in magnitude.
* Parallel join does not seem to work for PostGIS indexes yet, but
perhaps there is some magic to learn from PostgreSQL core on that.

The two findings at the end are ones that need input from parallel
query masters...

We recognize we'll have to adjust costs so that our particular use
case (very CPU-intensive calculation per function) is planned better,
but it seems like different query modes are interpreting costs in
order-of-magnitude different ways when building plans.

Parallel join would be a huge win, so some help/pointers on figuring
out why it's not coming into play when our gist operators are in
effect would be helpful.

Happy Easter to you all,
P


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] backup tools ought to ensure created backups are durable

2016-03-28 Thread Magnus Hagander
On Mon, Mar 28, 2016 at 3:12 PM, Andres Freund  wrote:

> On 2016-03-28 11:35:57 +0200, Magnus Hagander wrote:
> > On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier <michael.paqu...@gmail.com>
> > wrote:
> >
> > > On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund  wrote:
> > > > As pointed out in
> > > > http://www.postgresql.org/message-id/20160327232509.v5wgac5vskuse...@awork2.anarazel.de
> > > > our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
> > > > make any efforts to ensure their output is durable.
> > > >
> > > > I think for backup tools of possibly critical data, that's pretty much
> > > > unacceptable.
> > >
> > > Definitely agreed, once a backup/dump has been taken and those
> > > utilities exit, we had better ensure that they are durably on disk.
> > > For pg_basebackup and pg_dump, as everything except pg_dump/plain
> > > require a target directory for the location of the output result, we
> > > really can make things better.
> > >
> > >
> > Definitely agreed on fixing it. But I don't think your summary is right.
> >
> > pg_basebackup in tar mode can be sent to stdout, does not require a
> > directory. And the same for pg_dump in any mode except for directory. So we
> > can't just drive it off the mode, some more detailed checks are required.
>
> if (!isatty(fileno(stdout))) ought to do the trick, afaics? And maybe add a
> warning somewhere in the docs about the tools not fsyncing if you pipe
> their output data somewhere?
>

That should work yeah. And given that we already use that check in other
places, it seems it should be perfectly safe. And as long as we only do a
WARNING and not abort if the fsync fails, we should be OK if people
intentionally store their backups on an fs that doesn't speak fsync (if
that exists), in which case I don't really think we even need a switch to
turn it off.
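A sketch of the behavior being discussed (the helper name is
hypothetical): fsync the finished backup file but degrade failures to a
warning rather than an abort, and skip the step entirely when output is
piped:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: make a finished backup file durable, but never
 * abort on failure, so filesystems without a working fsync still work.
 */
static void
fsync_backup_file(int fd, const char *fname)
{
    if (fsync(fd) != 0)
        fprintf(stderr, "WARNING: could not fsync file \"%s\": %s\n",
                fname, strerror(errno));
}

/*
 * Callers writing to stdout would skip this entirely when the output is
 * not a regular file, e.g. after the isatty(fileno(stdout)) check above
 * or, more precisely, an fstat()/S_ISREG() test.
 */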

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Alvaro Herrera
Tomas Vondra wrote:

> I'm not sure about the prototypes though. It was a bit weird because
> prototypes in the same header file were formatted very differently.

Yeah, it is very odd.  What happens is that the BSD indent binary does
one thing (return type on one line and function name on the following
line; subsequent argument lines aligned to the opening paren), then the
pgindent perl script changes it (moves the function name to the same line as the
return type, but does not reindent the subsequent argument lines).

You can imitate the effect by adding an extra newline just before the
function name, reflowing the arguments to align to the (, then deleting
the extra newline.  Rather annoying.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Alvaro Herrera
Andres Freund wrote:
> On 2016-03-28 15:46:43 +0300, Alexander Korotkov wrote:

> > @@ -932,8 +936,13 @@ ReadBuffer_common(SMgrRelation smgr, cha
> >  
> > if (isLocalBuf)
> > {
> > -   /* Only need to adjust flags */
> > -   bufHdr->flags |= BM_VALID;
> > +   /*
> > +* Only need to adjust flags.  Since it's local buffer, there is no
> > +* concurrency.  We assume read/write pair to be cheaper than atomic OR.
> > +*/
> > +   uint32 state = pg_atomic_read_u32(&bufHdr->state);
> > +   state |= BM_VALID;
> > +   pg_atomic_write_u32(&bufHdr->state, state);
> 
> I'm not a fan of repeating that comment in multiple places. Hm.

Perhaps a set of macros should be offered that do "read, set some flag,
write".  Then the comments can be attached to the macro instead of to
each callsite.
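Something along these lines, say (a hypothetical macro, just to show the
shape; it matches the local-buffer case quoted above, where no
concurrency is possible):

/*
 * Hypothetical helper for single-backend (local) buffer headers: since
 * there is no concurrency, a plain read/write pair is cheaper than an
 * atomic OR.  The explanatory comment now lives here, exactly once.
 */
#define LocalBufHdrSetFlags(bufHdr, add_flags) \
    do { \
        uint32 state_ = pg_atomic_read_u32(&(bufHdr)->state); \
        state_ |= (add_flags); \
        pg_atomic_write_u32(&(bufHdr)->state, state_); \
    } while (0)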

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Robert Haas
On Mon, Mar 28, 2016 at 10:24 AM, Tom Lane  wrote:
> I'm also not exactly convinced by your implicit assumption that ICU is
> bug-free.

Noah spent some time looking at ICU back when he was at EnterpriseDB, and
his conclusion was that ICU collations weren't stable across releases,
which is pretty much the same problem we're running into with glibc
collations.  Now it might still be true that they have the equivalent
of strxfrm() and strcoll() and that those things behave consistently
with each other, and that would be very good.  Everybody seems to
agree it's faster, and that's good, too.  But I wonder what we do
about the fact that, as with glibc, an ICU upgrade involves a REINDEX
of every potentially affected index.  It seems like ICU has some
facilities built into it that might be useful for detecting and
handling such situations, but I don't understand them well enough to
know whether they'd solve our versioning problems or how effectively
they would do so, and I think there are packaging issues that tie into
it, too.  http://userguide.icu-project.org/design mentions building
with specific configure flags if you need to link with multiple server
versions, and I don't know what operating system packagers typically
do about that stuff.

In any case, I agree that we'd be very unwise to think that ICU is
necessarily going to be bug-free without testing it carefully.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Dealing with collation and strcoll/strxfrm/etc

2016-03-28 Thread Stephen Frost
All,

Changed the thread name (we're no longer talking about release
notes...).

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Oleg Bartunov  writes:
> > Should we start thinking about ICU ?
> 
> Isn't it still true that ICU fails to meet our minimum requirements?
> That would include (a) working with the full Unicode character range
> (not only UTF16) and (b) working with non-Unicode encodings.  No doubt
> we could deal with (b) by inserting a conversion, but that would take
> a lot of shine off the performance numbers you mention.
> 
> I'm also not exactly convinced by your implicit assumption that ICU is
> bug-free.

We have a wiki page about ICU.  I'm not sure that it's current, but if
it isn't and people are interested then perhaps we should update it:

https://wiki.postgresql.org/wiki/Todo:ICU

If we're going to talk about minimum requirements, I'd like to argue
that we require whatever system we're using to have versioning (which
glibc currently lacks, as I understand it...) to avoid the risk that
indexes will become corrupt when whatever we're using for collation
changes.  I'm pretty sure that's already bitten us on at least some
RHEL6 -> RHEL7 migrations in some locales, even forgetting the issues
with strcoll vs. strxfrm.

Regarding key abbreviation and performance, if we are confident that
strcoll and strxfrm are at least independently internally consistent
then we could consider offering an option to choose between them.
We'd need to identify what each index was built with to do so, however,
as they would need to be rebuilt if the choice changes, at least
until/unless they're made to reliably agree.  Even using only one or the
other doesn't address the versioning problem though, which is a problem
for all currently released versions of PG and is just going to continue
to be an issue.

Thanks!

Stephen




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-28 Thread Anastasia Lubennikova

25.03.2016 01:12, Peter Geoghegan:

On Thu, Mar 24, 2016 at 7:17 AM, Robert Haas  wrote:

I really like this idea, and the performance results seem impressive,
but I think we should push this out to 9.7.  A btree patch that didn't
have WAL support until two and a half weeks into the final CommitFest
just doesn't seem to me like a good candidate.  First, as a general
matter, if a patch isn't code-complete at the start of a CommitFest,
it's reasonable to say that it should be reviewed but not necessarily
committed in that CommitFest.

You're right.
Frankly, I thought that someone would help me with the patch, but I had to
finish it myself.

*off-topic*
I wonder if we could add a new flag to the commitfest, something like "Needs
assistance", which would be used to mark big and complicated patches still in
progress, while "Needs review" would mean that the patch is almost ready and
only requires a final review.



   This patch has had some review, but I'm
not sure how deep that review is, and I think it's had no code review
at all of the WAL logging changes, which were submitted only a week
ago, well after the CF deadline.  Second, the btree AM is a
particularly poor place to introduce possibly destabilizing changes.
Everybody depends on it, all the time, for everything.  And despite
new tools like amcheck, it's not a particularly easy thing to debug.

Regrettably, I must agree. I don't see a plausible path to commit for
this patch in the ongoing CF.

I think that Anastasia did an excellent job here, and I wish I could
have been of greater help sooner. Nevertheless, it would be unwise to
commit this given the maturity of the code. There have been very few
instances of performance improvements to the B-Tree code for as long
as I've been interested, because it's so hard, and the standard is so
high. The only example I can think of from the last few years is
Kevin's commit 2ed5b87f96 and Tom's commit 1a77f8b63d both of which
were far less invasive, and Simon's commit c7111d11b1, which we just
outright reverted from 9.5 due to subtle bugs (and even that was
significantly less invasive than this patch). Improving nbtree is
something that requires several rounds of expert review, and that's
something that's in short supply for the B-Tree code in particular. I
think that a new testing strategy is needed to make this easier, and I
hope to get that going with amcheck. I need help with formalizing a
"testing first" approach for improving the B-Tree code, because I
think it's the only way that we can move forward with projects like
this. It's *incredibly* hard to push forward patches like this given
our current, limited testing strategy.


Unfortunately, I must agree. This patch will not reach a final
version before the feature freeze.

I'll move it to the next commitfest.

Anyway, it means that now we have more time to improve the patch.
If you have any ideas related to this patch, like prefix/suffix
compression, I'll be glad to discuss them.

Same for any other ideas for B-tree optimization.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Tom Lane
Oleg Bartunov  writes:
> Should we start thinking about ICU ?

Isn't it still true that ICU fails to meet our minimum requirements?
That would include (a) working with the full Unicode character range
(not only UTF16) and (b) working with non-Unicode encodings.  No doubt
we could deal with (b) by inserting a conversion, but that would take
a lot of shine off the performance numbers you mention.

I'm also not exactly convinced by your implicit assumption that ICU is
bug-free.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combinations of pg_strdup/free in pg_dump code

2016-03-28 Thread Tom Lane
Michael Paquier  writes:
> While reading some code of pg_dump, I noticed that the following
> pattern is heavily present:
> lanname = pg_strdup(stuff)
> free(lanname);

> When pg_strdup or any pg-related allocation routines are called, I
> think that we should use pg_free() and not free(). It does not matter
> much in practice because pg_free() calls actually free() and the
> latter per the POSIX spec should do nothing if the input pointer is
> NULL (some version of SunOS that crash on that actually :p), but we
> really had better be consistent in the calls done. Thoughts?

I do not think this is worth troubling over, really.  If there are
places that are relying on free(NULL) to work, it might be worth
ensuring they go through pg_free; but the pattern you show here
is perfectly safe.  We have other things to do besides create
code churn for this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Andres Freund
On 2016-03-28 15:46:43 +0300, Alexander Korotkov wrote:
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 6dd7c6e..fe6fb9c 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -52,7 +52,6 @@
>  #include "utils/resowner_private.h"
>  #include "utils/timestamp.h"
>  
> -
>  /* Note: these two macros only work on shared buffers, not local ones! */
>  #define BufHdrGetBlock(bufHdr)   ((Block) (BufferBlocks + ((Size) 
> (bufHdr)->buf_id) * BLCKSZ))
>  #define BufferGetLSN(bufHdr) (PageGetLSN(BufHdrGetBlock(bufHdr)))

spurious

> @@ -848,7 +852,7 @@ ReadBuffer_common(SMgrRelation smgr, cha
>* it's not been recycled) but come right back here to try smgrextend
>* again.
>*/
> - Assert(!(bufHdr->flags & BM_VALID));/* spinlock not needed */
> + Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID)); /* header lock not needed */
>  
>   bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : 
> BufHdrGetBlock(bufHdr);
>  
> @@ -932,8 +936,13 @@ ReadBuffer_common(SMgrRelation smgr, cha
>  
>   if (isLocalBuf)
>   {
> - /* Only need to adjust flags */
> - bufHdr->flags |= BM_VALID;
> + /*
> +  * Only need to adjust flags.  Since it's local buffer, there is no
> +  * concurrency.  We assume read/write pair to be cheaper than atomic OR.
> +  */
> + uint32 state = pg_atomic_read_u32(&bufHdr->state);
> + state |= BM_VALID;
> + pg_atomic_write_u32(&bufHdr->state, state);

I'm not a fan of repeating that comment in multiple places. Hm.


>   }
>   else
>   {
> @@ -987,10 +996,11 @@ BufferAlloc(SMgrRelation smgr, char relp
>   BufferTag   oldTag; /* previous identity of 
> selected buffer */
>   uint32  oldHash;/* hash value for oldTag */
>   LWLock *oldPartitionLock;   /* buffer partition lock for it 
> */
> - BufFlagsoldFlags;
> + uint32  oldFlags;
>   int buf_id;
>   BufferDesc *buf;
>   boolvalid;
> + uint32  state;
>  
>   /* create a tag so we can lookup the buffer */
>   INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
> @@ -1050,23 +1060,23 @@ BufferAlloc(SMgrRelation smgr, char relp
>   for (;;)
>   {
>   /*
> -  * Ensure, while the spinlock's not yet held, that there's a 
> free
> +  * Ensure, while the header lock isn't yet held, that there's a 
> free
>* refcount entry.
>*/
>   ReservePrivateRefCountEntry();
>  
>   /*
>* Select a victim buffer.  The buffer is returned with its 
> header
> -  * spinlock still held!
> +  * lock still held!
>*/
> - buf = StrategyGetBuffer(strategy);
> + buf = StrategyGetBuffer(strategy, );

The new thing really still is a spinlock, not sure if it's worth
changing the comments that way.


> @@ -1319,7 +1330,7 @@ BufferAlloc(SMgrRelation smgr, char relp
>   * InvalidateBuffer -- mark a shared buffer invalid and return it to the
>   * freelist.
>   *
> - * The buffer header spinlock must be held at entry.  We drop it before
> + * The buffer header lock must be held at entry.  We drop it before
>   * returning.  (This is sane because the caller must have locked the
>   * buffer in order to be sure it should be dropped.)
>   *

Ok, by now I'm pretty sure that I don't want this to be changed
everywhere; it just makes reviewing harder.



> @@ -1433,6 +1443,7 @@ void
>  MarkBufferDirty(Buffer buffer)
>  {
>   BufferDesc *bufHdr;
> + uint32  state;
>  
>   if (!BufferIsValid(buffer))
>   elog(ERROR, "bad buffer ID: %d", buffer);
> @@ -1449,14 +1460,14 @@ MarkBufferDirty(Buffer buffer)
>   /* unfortunately we can't check if the lock is held exclusively */
>   Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));
>  
> - LockBufHdr(bufHdr);
> + state = LockBufHdr(bufHdr);
>  
> - Assert(bufHdr->refcount > 0);
> + Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
>  
>   /*
>* If the buffer was not dirty already, do vacuum accounting.
>*/
> - if (!(bufHdr->flags & BM_DIRTY))
> + if (!(state & BM_DIRTY))
>   {
>   VacuumPageDirty++;
>   pgBufferUsage.shared_blks_dirtied++;
> @@ -1464,9 +1475,9 @@ MarkBufferDirty(Buffer buffer)
>   VacuumCostBalance += VacuumCostPageDirty;
>   }
>  
> - bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
> -
> - UnlockBufHdr(bufHdr);
> + state |= BM_DIRTY | BM_JUST_DIRTIED;
> + state &= ~BM_LOCKED;
> + pg_atomic_write_u32(>state, state);
>  }

Hm, this is a routine I think we should avoid taking the lock in; it's
often called quite 

Re: [HACKERS] WIP: Covering + unique indexes.

2016-03-28 Thread Anastasia Lubennikova

21.03.2016 19:53, Anastasia Lubennikova:

19.03.2016 08:00, Peter Geoghegan:
On Fri, Mar 18, 2016 at 5:15 AM, David Steele  
wrote:
It looks like this patch should be marked "needs review" and I have 
done so.

Uh, no it shouldn't. I've posted an extensive review on the original
design thread. See CF entry:

https://commitfest.postgresql.org/9/433/

Marked "Waiting on Author".

Thanks to David; I'd missed these letters at first.
I'll answer here.


* You truncate (remove suffix attributes -- the "included" attributes)
within _bt_insertonpg():

-   right_item = CopyIndexTuple(item);
+   indnatts = IndexRelationGetNumberOfAttributes(rel);
+   indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
+   if (indnatts != indnkeyatts)
+   {
+   right_item = index_reform_tuple(rel, item, indnatts, indnkeyatts);

+   right_item_sz = IndexTupleDSize(*right_item);
+   right_item_sz = MAXALIGN(right_item_sz);
+   }
+   else
+   right_item = CopyIndexTuple(item);
 ItemPointerSet(&(right_item->t_tid), rbkno, P_HIKEY);

I suggest that you do this within _bt_insert_parent(), instead, iff
the original target page is know to be a leaf page. That's where it
needs to happen for conventional suffix truncation, which has special
considerations when determining which attributes are safe to truncate
(or even which byte in the first distinguishing attribute it is okay
to truncate past)


I agree that _bt_insertonpg() is not the right place for truncation.
Furthermore, I've noticed that all internal keys are solely copies 
of the "high keys" from the leaf pages, which is pretty logical.
Therefore, if we have already truncated the tuple when it became a 
high key, we do not need to repeat the truncation within 
_bt_insert_parent() or any other function.
So the only thing to worry about is the high key truncation. I rewrote 
the code. Now only _bt_split cares about truncation.
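
For illustration, a hedged sketch of what that looks like in _bt_split
(index_reform_tuple is the patch's helper; the call site and variable
names below are my paraphrase, not the literal patch):

	/*
	 * Build the left page's new high key from the first item that goes
	 * to the right page, keeping only the indnkeyatts key attributes.
	 */
	if (indnatts != indnkeyatts)
		lefthikey = index_reform_tuple(rel, item, indnatts, indnkeyatts);
	else
		lefthikey = CopyIndexTuple(item);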


It's a bit more complicated to add it to the index creation algorithm.
There's a trick with the "high key".
    /*
     * We copy the last item on the page into the new page, and then
     * rearrange the old page so that the 'last item' becomes its high key
     * rather than a true data item.  There had better be at least two
     * items on the page already, else the page would be empty of useful
     * data.
     */
    /*
     * Move 'last' into the high key position on opage
     */

To be consistent with the other steps of the algorithm (all high keys 
must be truncated tuples), I had to update this high key in place: 
delete the old one, and insert a truncated high key.
I use the very same logic to truncate the posting list of a compressed 
tuple in the "btree_compression" patch. [1]
I hope both patches will be accepted, and then I'll thoroughly merge 
them.



* I think the comparison logic may have a bug.

Does this work with amcheck? Maybe it works with bt_index_check(), but
not bt_index_parent_check()? I think that you need to make sure that
_bt_compare() knows about this, too. That's because it isn't good
enough to let a truncated internal IndexTuple compare equal to a
scankey when non-truncated attributes are equal.


It is a very important issue. But I don't think it's a bug there.
I've read the amcheck sources thoroughly and found that the problem 
appears in invariant_key_less_than_equal_nontarget_offset():


static bool
invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
                                               Page nontarget, ScanKey key,
                                               OffsetNumber upperbound)
{
    int16       natts = state->rel->rd_rel->relnatts;
    int32       cmp;

    cmp = _bt_compare(state->rel, natts, key, nontarget, upperbound);

    return cmp <= 0;
}

It uses a scankey made with _bt_mkscankey(), which uses only key 
attributes, but it calls _bt_compare with the wrong keysz.
If we use nkeyatts = state->rel->rd_index->relnatts; instead of 
natts, all the checks pass successfully.


Same for invariant_key_greater_than_equal_offset() and 
invariant_key_less_than_equal_nontarget_offset().
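
A hedged sketch of that fix (using the patch's
IndexRelationGetNumberOfKeyAttributes macro here is my assumption, not
amcheck's current code):

	static bool
	invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
	                                               Page nontarget, ScanKey key,
	                                               OffsetNumber upperbound)
	{
	    /* compare only as many attributes as the scankey actually has */
	    int16       nkeyatts = IndexRelationGetNumberOfKeyAttributes(state->rel);
	    int32       cmp;

	    cmp = _bt_compare(state->rel, nkeyatts, key, nontarget, upperbound);

	    return cmp <= 0;
	}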


In my view, that's the correct way to fix this problem, because the 
caller is responsible for passing proper arguments to the function.
Of course I will add a check to _bt_compare, but I'd rather make it 
an assertion (see the attached patch).


I'll add a flag to distinguish regular and truncated tuples, but it 
will not be used in this patch. Please comment if I've missed 
something.
As you've already mentioned, neither high keys nor tuples on internal 
pages use "itup->t_tid.ip_posid", so I'll take one bit of it.


It will definitely require changes in future work on suffix 
truncation or something like that, but IMHO for now it's enough.


Do you have any objections or comments?

[1] https://commitfest.postgresql.org/9/494/



One more version of the patch is attached. I did more testing and fixed 
a couple of bugs.

[HACKERS] Combinations of pg_strdup/free in pg_dump code

2016-03-28 Thread Michael Paquier
Hi all,

While reading some code of pg_dump, I noticed that the following
pattern is heavily present:
lanname = pg_strdup(stuff)
free(lanname);

One example is this:
lanname = get_language_name(fout, transforminfo[i].trflang);
if (typeInfo && lanname)
appendPQExpBuffer(&namebuf, "%s %s",
  typeInfo->dobj.name, lanname);
transforminfo[i].dobj.name = namebuf.data;
free(lanname);
And get_language_name() uses pg_strdup() to allocate the string freed here.

When pg_strdup or any pg-related allocation routine is called, I
think that we should use pg_free() and not free(). It does not matter
much in practice, because pg_free() actually calls free(), and the
latter per the POSIX spec should do nothing if the input pointer is
NULL (some versions of SunOS actually crash on that :p), but we
really had better be consistent in the calls done. Thoughts?
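
To make the intended pairing concrete, a minimal sketch (simplified from
the snippet above; namebuf is assumed to be an initialized
PQExpBufferData):

	#include "common/fe_memutils.h"		/* pg_strdup(), pg_free() */

	char	   *lanname = get_language_name(fout, transforminfo[i].trflang);

	if (typeInfo && lanname)
		appendPQExpBuffer(&namebuf, "%s %s",
						  typeInfo->dobj.name, lanname);
	transforminfo[i].dobj.name = namebuf.data;

	/* pairs with the pg_strdup() inside get_language_name(); NULL-safe */
	pg_free(lanname);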
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-28 Thread Michael Paquier
On Mon, Mar 28, 2016 at 10:08 PM, Thomas Munro
 wrote:
> On Tue, Mar 29, 2016 at 1:56 AM, Thomas Munro
>  wrote:
>> On Mon, Mar 28, 2016 at 8:54 PM, Michael Paquier
>>  wrote:
>>> I have been also thinking a lot about this patch, and the fact that
>>> the WAL receiver latch is being used within the internals of
>>> libpqwalreceiver has been bugging me a lot, because this makes the
>>> wait phase happening within the libpqwalreceiver depend on something
>>> that only the WAL receiver had sole control of up to now (among the
>>> things thought: having a second latch for libpqwalreceiver, having an
>>> event interface for libpqwalreceiver, switch libpq_receive into being
>>> asynchronous...).
>>
>> Yeah, it bugs me too.  Do you prefer this?
>>
>> int walrcv_receive(char **buffer, int *wait_fd);
>>
>> Return value -1 means end-of-copy as before, return value 0 means "no
>> data available now, please call me again when *wait_fd is ready to
>> read".  Then walreceiver.c can look after the WaitLatchOrSocket call
>> and deal with socket readiness, postmaster death, timeout and latch,
>> and libpqwalreceiver.c doesn't know anything about all that stuff
>> anymore, but it is now part of the interface that it must expose a
>> file descriptor for readiness testing when it doesn't have data
>> available.
>>
>> Please find attached a new patch series which does it that way.
>
> Oops, there is a bug in the primary disconnection case when len == 1
> and it breaks out of the loop and wait_fd is invalid.  I'll follow up
> on that tomorrow, but I'm interested to hear your thoughts (and anyone
> else's!) on that interface change and general approach.

I definitely prefer that, that's neater! libpq_select could be
simplified because a timeout does not matter much.
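
For concreteness, a hedged sketch of how walreceiver.c could drive the
proposed interface (the loop below is my illustration of the description
quoted above, not the actual patch; XLogWalRcvProcessMsg,
XLogWalRcvSendReply and NAPTIME_PER_CYCLE are existing walreceiver.c
internals):

	for (;;)
	{
		char	   *buf;
		int			wait_fd = PGINVALID_SOCKET;
		int			len = walrcv_receive(&buf, &wait_fd);

		if (len > 0)
			XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
		else if (len == WALRCV_RECEIVE_COPY_ENDED)
			break;				/* disconnection / end of copy */
		else					/* no data now: wait until wait_fd is ready */
		{
			int			rc = WaitLatchOrSocket(&WalRcv->latch,
											   WL_SOCKET_READABLE | WL_LATCH_SET |
											   WL_POSTMASTER_DEATH | WL_TIMEOUT,
											   wait_fd, NAPTIME_PER_CYCLE);

			if (rc & WL_POSTMASTER_DEATH)
				proc_exit(1);
			if (rc & WL_LATCH_SET)
			{
				ResetLatch(&WalRcv->latch);
				XLogWalRcvSendReply(true, false);	/* e.g. apply feedback */
			}
		}
	}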
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

2016-03-28 Thread Andres Freund
On 2016-03-28 13:21:41 +0530, Pavan Deolasee wrote:
> On Wed, Mar 23, 2016 at 6:16 PM, Michael Paquier 
> wrote:
> 
> >
> >
> > I'd just add dots at the end of the sentences in the comment blocks
> > because that's project style, but I'm being picky, except that the
> > logic looks quite good.
> >
> 
> Since this is a bug affecting all stable branches, IMHO it will be a good
> idea to fix this before the upcoming minor releases.

It's definitely too late for that; they're going to be wrapped in a
couple hours.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] backup tools ought to ensure created backups are durable

2016-03-28 Thread Andres Freund
On 2016-03-28 11:35:57 +0200, Magnus Hagander wrote:
> On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier 
> wrote:
> 
> > On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund  wrote:
> > > As pointed out in
> > >
> > http://www.postgresql.org/message-id/20160327232509.v5wgac5vskuse...@awork2.anarazel.de
> > > our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
> > > make any efforts to ensure their output is durable.
> > >
> > > I think for backup tools of possibly critical data, that's pretty much
> > unacceptable.
> >
> > Definitely agreed, once a backup/dump has been taken and those
> > utilities exit, we had better ensure that they are durably on disk.
> > For pg_basebackup and pg_dump, as everything except pg_dump/plain
> > requires a target directory for the location of the output result, we
> > really can make things better.
> >
> >
> Definitely agreed on fixing it. But I don't think your summary is right.
> 
> pg_basebackup in tar mode can be sent to stdout, does not require a
> directory. And the same for pg_dump in any mode except for directory. So we
> can't just drive it off the mode, some more detailed checks are required.

if (!isatty(stdout)) ought to do the trick, afaics? And maybe add a
warning somewhere in the docs about the tools not fsyncing if you pipe
their output data somewhere?
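
A hedged sketch of that check (isatty() takes a file descriptor, hence
fileno(); the extra S_ISREG test is my own addition, since fsync() on a
pipe fails with EINVAL anyway):

	#include <stdio.h>
	#include <sys/stat.h>
	#include <unistd.h>

	static void
	fsync_stdout_if_possible(void)
	{
		int			fd = fileno(stdout);
		struct stat st;

		if (!isatty(fd) && fstat(fd, &st) == 0 && S_ISREG(st.st_mode))
			(void) fsync(fd);	/* output is a regular file: make it durable */
		/* else: tty or pipe, nothing we can usefully fsync -- warn in docs */
	}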

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Andres Freund
On 2016-03-28 11:48:46 +0530, Dilip Kumar wrote:
> On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:
>
> >
> > What's sizeof(BufferDesc) after applying these patches? It should better
> > be <= 64...
> >
>
> It is 72.

Ah yes, I miscalculated the required alignment.  Hm. So we've got to get
this smaller. I see three approaches:

1) Reduce the spinlock size on ppc. That actually might just work by
   replacing "unsigned int" by "unsigned char"
2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
   embedding the spinlock, and actually might allow to avoid one atomic
   op in a number of cases.
3) Shrink the size of BufferDesc by removing buf_id; that'd bring it to
   64byte.

I'm a bit hesitant to go for 3), because it'd likely end up adding a bit
of arithmetic to a number of places in bufmgr.c.  Robert, what do you
think?
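
For context on why 64 is the magic number: buffer descriptors are laid
out in an array padded to cache-line size, roughly like this (names
follow buf_internals.h; treating 64 as the line size is the usual
assumption there):

	/* pad each descriptor so it never straddles two cache lines */
	#define BUFFERDESC_PAD_TO_SIZE	(SIZEOF_VOID_P == 8 ? 64 : 1)

	typedef union BufferDescPadded
	{
		BufferDesc	bufferdesc;
		char		pad[BUFFERDESC_PAD_TO_SIZE];
	} BufferDescPadded;

	/*
	 * With sizeof(BufferDesc) == 72 a descriptor no longer fits in one
	 * 64-byte line, so the pad would have to grow to 128 bytes, doubling
	 * the size of the descriptor array.
	 */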

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-28 Thread Thomas Munro
On Tue, Mar 29, 2016 at 1:56 AM, Thomas Munro
 wrote:
> On Mon, Mar 28, 2016 at 8:54 PM, Michael Paquier
>  wrote:
>> I have been also thinking a lot about this patch, and the fact that
>> the WAL receiver latch is being used within the internals of
>> libpqwalreceiver has been bugging me a lot, because this makes the
>> wait phase happening within the libpqwalreceiver depend on something
>> that only the WAL receiver had a only control on up to now (among the
>> things thought: having a second latch for libpqwalreceiver, having an
>> event interface for libpqwalreceiver, switch libpq_receive into being
>> asynchronous...).
>
> Yeah, it bugs me too.  Do you prefer this?
>
> int walrcv_receive(char **buffer, int *wait_fd);
>
> Return value -1 means end-of-copy as before, return value 0 means "no
> data available now, please call me again when *wait_fd is ready to
> read".  Then walreceiver.c can look after the WaitLatchOrSocket call
> and deal with socket readiness, postmaster death, timeout and latch,
> and libpqwalreceiver.c doesn't know anything about all that stuff
> anymore, but it is now part of the interface that it must expose a
> file descriptor for readiness testing when it doesn't have data
> available.
>
> Please find attached a new patch series which does it that way.

Oops, there is a bug in the primary disconnection case when len == 1
and it breaks out of the loop and wait_fd is invalid.  I'll follow up
on that tomorrow, but I'm interested to hear your thoughts (and anyone
else's!) on that interface change and general approach.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Alexander Korotkov
On Sun, Mar 27, 2016 at 4:31 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund  wrote:
>
>> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
>> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
>> > a.korot...@postgrespro.ru> wrote:
>> >
>> > > Thank you very much for testing!
>> > > I also got access to 4 x 18 Intel server with 144 threads. I'm going
>> to
>> > > post results of tests on this server next Monday.
>> > >
>> >
>> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
>> 100
>> > -M prepared -T 300.
>> > See results in the table and chart.
>> >
>> > clients master  v3  v5
>> > 1   11671   12507   12679
>> > 2   24650   26005   25010
>> > 4   49631   48863   49811
>> > 8   96790   96441   99946
>> > 10  121275  119928  124100
>> > 20  243066  243365  246432
>> > 30  359616  342241  357310
>> > 40  431375  415310  441619
>> > 50  489991  489896  500590
>> > 60  538057  636473  554069
>> > 70  588659  714426  738535
>> > 80  405008  923039  902632
>> > 90  295443  1181247 1155918
>> > 100 258695  1323125 1325019
>> > 110 238842  1393767 1410274
>> > 120 226018  1432504 1474982
>> > 130 215102  1465459 1503241
>> > 140 206415  1470454 1505380
>> > 150 197850  1475479 1519908
>> > 160 190935  1420915 1484868
>> > 170 185835  1438965 1453128
>> > 180 182519  1416252 1453098
>> >
>> > My conclusions are the following:
>> > 1) We don't observe any regression in v5 in comparison to master.
>> > 2) v5 in most of cases slightly outperforms v3.
>>
>> What commit did you base these tests on? I guess something recent, after
>> 98a64d0bd?
>>
>
> Yes, more recent than 98a64d0bd. It was based on 676265eb7b.
>
>
>> > I'm going to do some code cleanup of v5 on Monday
>>
>> Ok, I'll try to do a review and possibly commit after that.
>>
>
> Sounds good.
>

Here is the next revision of the patch.  It contains mostly cosmetic changes.
Comments are adjusted to reflect the changes of behaviour.
I also changed the atomic AND/OR for local buffers to a read/write pair,
which I suppose would be cheaper.  However, I don't insist on it, and it
could be reverted.
The patch is ready for your review.  I'm especially interested in what you
think about the way I abstracted the exponential back-off of the spinlock.
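
To make that review question concrete, a rough sketch of the back-off
pattern being discussed (the SpinDelayStatus helpers come with this patch
series; the LockBufHdr body below is my paraphrase, not the patch text):

	uint32
	LockBufHdr(BufferDesc *desc)
	{
		SpinDelayStatus delayStatus;
		uint32		old_state;

		init_local_spin_delay(&delayStatus);

		while (true)
		{
			/* set BM_LOCKED flag */
			old_state = pg_atomic_fetch_or_u32(&desc->state, BM_LOCKED);
			/* if it wasn't set before, we now hold the lock */
			if (!(old_state & BM_LOCKED))
				break;
			/* spin a while, then sleep for increasing intervals */
			perform_spin_delay(&delayStatus);
		}
		finish_spin_delay(&delayStatus);
		return old_state | BM_LOCKED;
	}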

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 3:02 PM, Dilip Kumar  wrote:

> 1. Relation Size: no change in size, it's the same as base and v13
>
> 2. INSERT 1028 Byte 1000 tuple performance
> ---
> Client   base   v13   v15
>      1    117   124   122
>      2    111   126   123
>      4     51   128   125
>      8     43   149   135
>     16     40   217   147
>     32     35   263   141
>
> 3. COPY 1 Tuple performance.
> --
> Client   base   v13   v15
>      1    118   147   155
>      2    217   276   273
>      4    210   421   457
>      8    166   630   643
>     16    145   813   595
>     32    124   985   598
>
> Conclusion:
> ---
> 1. I think v15 solves the problem that exists with v13, performance is
> significantly higher compared to base, and the relation size is also
> stable. So IMHO v15 is the winner over the other solutions; what do
> others think?
>
> 2. And there is no point in thinking that v13 is better than v15, because
> v13 has a bug of sometimes extending more pages than expected, which is
> uncontrolled, and the same can also be the reason why v13 performs better.
>

Found one problem with v15, so sending a new version.
In v15 I am taking prev_blkno as targetBlock; instead it should be the last
block of the relation at that time. Attaching the new patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v16.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c

2016-03-28 Thread Piotr Stefaniak

On 2016-03-28 11:33, Aleksander Alekseev wrote:

Hello, Piotr.

Thanks for the report. But I'm having some difficulties reproducing the
issues you described.


Oh, if you want backtraces then either set a conditional breakpoint or 
add your own Assert like I did.



Also it would be a good idea to include these steps in the regression tests.


I agree, and I generally think that the more test cases touch previously 
uncovered code paths the better, even if they had to be run as a 
different make(1) target. Although it seems that at least some people 
would agree (see [1]), the "make check" split somehow isn't happening.


[1] ca+tgmoymofe94+3wg3spg9sqajkv4sixjey+rdrmamye-vq...@mail.gmail.com



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Access method extendability

2016-03-28 Thread Alexander Korotkov
Hi, Petr!

On Thu, Mar 17, 2016 at 10:56 AM, Petr Jelinek  wrote:

> On 16/03/16 15:31, Teodor Sigaev wrote:
>
>> Good catch, thanks! Tests were added.
>>>
>>
>> I don't see any objection, is consensus reached? I'm close to committing
>> that...
>>
>>
> I did only a cursory review of the bloom contrib module and don't really
> have complaints there, but I know you can review that one. I'd like the
> English of the generic_xlog.c description improved, but I won't get to it
> before the weekend.


What are your plans for the English of the generic_xlog.c description?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Autovacuum to prevent wraparound tries to consume xid

2016-03-28 Thread Alexander Korotkov
Hackers,

one of our customers met a near xid wraparound situation.  The xid counter
reached the xidStopLimit value, so no transactions could be executed in
normal mode.  But what I noticed is strange behaviour of autovacuum to
prevent wraparound.  It vacuums tables, updates pg_class and pg_database,
but then fails with the "database is not accepting commands to avoid
wraparound data loss in database" message.  We ended up with a situation
where, according to pg_database, the maximum age of a database was less
than 200 million, but transactions couldn't be executed, because
ShmemVariableCache wasn't updated (checked by gdb).

I've reproduced this situation on my laptop as following:

1) Connect gdb, do "set ShmemVariableCache->nextXid =
ShmemVariableCache->xidStopLimit"
2) Stop postgres
3) Make some fake clog: "dd bs=1m if=/dev/zero
of=/usr/local/pgsql/data/pg_clog/07FF count=1024"
4) Start postgres

Then I found the same situation as in the customer database.  Autovacuum to
prevent wraparound regularly produced the following messages in the log:

ERROR:  database is not accepting commands to avoid wraparound data loss in
database "template1"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.

Finally all databases were frozen

# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname  │   age
───┼──
 template1 │0
 template0 │0
 postgres  │ 5000
(3 rows)

but no transactions could be executed (ShmemVariableCache wasn't updated).

After some debugging I found that vac_truncate_clog consumes an xid just to
produce a warning.  I wrote a simple patch which replaces
GetCurrentTransactionId() with ShmemVariableCache->nextXid.  That
completely fixes this situation for me: ShmemVariableCache was successfully
updated.
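
For reference, a hedged sketch of the change (whether to read the field
directly or via ReadNewTransactionId(), which reads nextXid under
XidGenLock, is a detail; the attached patch is the authoritative text):

	static void
	vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti)
	{
	-	TransactionId myXID = GetCurrentTransactionId();	/* consumes an xid! */
	+	TransactionId myXID = ReadNewTransactionId();		/* only reads nextXid */
		...
	}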

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix_vac_truncate_clog_xid_consume.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Thomas Kellerer
Oleg Bartunov wrote
> But still, icu provides us abbreviated keys and collation stability,

Does including ICU mean that collation handling is identical across platforms?
E.g. would a query on Linux involving string comparison yield the same
result on MacOS and Windows?

If that is the case I'm all for it.

Currently the different behaviour in handling collation-aware string
comparisons is a bug in my eyes from a user's perspective. I do understand
and can accept the technical reasons for it, but it still feels odd that a
query yields different results (with identical data) just because it runs on
a different platform.




--
View this message in context: 
http://postgresql.nabble.com/Draft-release-notes-for-next-week-s-releases-tp5895357p5895484.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

2016-03-28 Thread Etsuro Fujita

On 2016/03/28 18:17, Rajkumar Raghuwanshi wrote:

I am testing the postgres_fdw join pushdown feature for PostgreSQL 9.6,
and I observed the issue below.

Observation: An inner join and full outer join combination on a table
generates a wrong result.

SELECT * FROM lt;
  c1
----
   1
   2
(2 rows)

SELECT * FROM ft;
  c1
----
   1
   2
(2 rows)

\d+ ft
  Foreign table "public.ft"
  Column |  Type   | Modifiers | FDW Options | Storage | Stats target | Description
 --------+---------+-----------+-------------+---------+--------------+-------------
  c1     | integer |           |             | plain   |              |
Server: link_server
FDW Options: (table_name 'lt')

--inner join and full outer join on local tables
SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
FULL JOIN lt t3 ON (t2.c1 = t3.c1);
  c1 | c1 | c1
----+----+----
   1 |  1 |  1
   2 |  2 |  2
(2 rows)

--inner join and full outer join on corresponding foreign tables
SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
FULL JOIN ft t3 ON (t2.c1 = t3.c1);
  c1 | c1 | c1
----+----+----
   1 |  1 |  1
   1 |  2 |
   2 |  1 |
   2 |  2 |  2
(4 rows)


I think the reason for that is in foreign_join_ok.  This code in that function:

    switch (jointype)
    {
        case JOIN_INNER:
            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                       list_copy(fpinfo_i->remote_conds));
            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                       list_copy(fpinfo_o->remote_conds));
            break;

        case JOIN_LEFT:
            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                      list_copy(fpinfo_i->remote_conds));
            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                       list_copy(fpinfo_o->remote_conds));
            break;

        case JOIN_RIGHT:
            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                      list_copy(fpinfo_o->remote_conds));
            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                       list_copy(fpinfo_i->remote_conds));
            break;

        case JOIN_FULL:
            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                      list_copy(fpinfo_i->remote_conds));
            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                      list_copy(fpinfo_o->remote_conds));
            break;

        default:
            /* Should not happen, we have just checked this above */
            elog(ERROR, "unsupported join type %d", jointype);
    }

wrongly pulls up remote_conds from joining relations in the FULL JOIN 
case.  I think we should not pull up such conditions in the FULL JOIN case.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A question on systable_beginscan()

2016-03-28 Thread Amit Langote

Hi,

On 2016/03/25 23:49, Onder Kalaci wrote:
> Hi hackers,
> 
> As it's documented in the source code, systable_beginscan() could be used
> on non-system tables as well. My question is: is it possible to
> write a C code with systable_beginscan(), systable_getnext() and ScanKeys
> which is equivalent to the following query: (Assume that the qual_column is
> a text column)
> 
> SELECT id FROM table WHERE qual_column::int = 15;
> 
> In other words, can we cast a column type to another type via these
> low-level APIs? Are there any examples in the Postgres source?

Don't think the API advertises any support for that, or at least I couldn't
find any. However, I am not sure if it's outright impossible to achieve
that with some prep - initialize the scan key (sk_func) with the
FmgrInfo of a specialized function that performs the cast before
comparing.  Such a function would be written to work with a fixed pair of
source and target types, to be able to invoke, say, textout() -> int4in(),
before comparing the value with the argument (sk_argument). The argument
would have to be of the target type, needless to say.  Most
likely, this would be an unrecommended course of action in the long run,
so you would be better off considering other ways (like SPI).
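
To illustrate (everything below is hypothetical: the function, the
attribute number and the hand-built FmgrInfo are illustrations of the
idea above, not an advertised API):

	/* compare a text column, cast to int4, against an int4 argument */
	static Datum
	text_as_int4_eq(PG_FUNCTION_ARGS)
	{
		char	   *str = text_to_cstring(PG_GETARG_TEXT_PP(0));
		int32		arg = PG_GETARG_INT32(1);
		int32		val = pg_atoi(str, sizeof(int32), '\0');

		pfree(str);
		PG_RETURN_BOOL(val == arg);
	}

	/* ... then point a scankey at it by hand, bypassing pg_proc ... */
	ScanKeyData skey;
	FmgrInfo	flinfo;

	MemSet(&flinfo, 0, sizeof(flinfo));
	flinfo.fn_addr = text_as_int4_eq;
	flinfo.fn_nargs = 2;
	flinfo.fn_strict = true;
	flinfo.fn_mcxt = CurrentMemoryContext;

	ScanKeyEntryInitializeWithInfo(&skey, 0,
								   1 /* attno of qual_column, hypothetical */ ,
								   InvalidStrategy, InvalidOid, InvalidOid,
								   &flinfo, Int32GetDatum(15));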

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alter or rename enum value

2016-03-28 Thread Emre Hasegeli
> I do not know whether this would be a meaningful improvement for
> common use-cases, though.  (It'd help if people were more specific
> about the use-cases they need to work.)

For what it's worth, in the company I am working for, InnoGames GmbH,
not being able to alter enums is the number one pain point with
PostgreSQL.  We have hundreds of small databases on the backend of the
game worlds.  They are heavily using enums.  New enum values need to be
added or removed for new features.  We are relying on transactions for
the database migrations, so we couldn't make use of ALTER TYPE ADD VALUE.

Some functions found on the Internet [1] which change the values in
the catalog had been used, until they corrupted some indexes.  (I
believe those functions are still quite common.)  Now we are using a
function to replace an enum type on all tables with another one, but
we are not at all happy with this solution.  It requires all objects
which were using the enum to be dropped and recreated, and it rewrites
the tables, so it greatly increases the migration time and effort.

[1] http://en.dklab.ru/lib/dklab_postgresql_enum/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-28 Thread Masahiko Sawada
On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
 wrote:
> Thank you for the new patch. Sorry to have overlooked some
> versions. I'm looking at the v19 patch now.
>
> make complains for an unused variable.
>
> | syncrep.c: In function ‘SyncRepGetSyncStandbys’:
> | syncrep.c:601:13: warning: variable ‘next’ set but not used 
> [-Wunused-but-set-variable]
> |ListCell *next;
>
>
> At Thu, 24 Mar 2016 22:29:01 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] backup tools ought to ensure created backups are durable

2016-03-28 Thread Magnus Hagander
On Mon, Mar 28, 2016 at 3:11 AM, Michael Paquier 
wrote:

> On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund  wrote:
> > As pointed out in
> >
> http://www.postgresql.org/message-id/20160327232509.v5wgac5vskuse...@awork2.anarazel.de
> > our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
> > make any efforts to ensure their output is durable.
> >
> > I think for backup tools of possibly critical data, that's pretty much
> > unacceptable.
>
> Definitely agreed, once a backup/dump has been taken and those
> utilities exit, we had better ensure that they are durably on disk.
> For pg_basebackup and pg_dump, as everything except pg_dump/plain
> requires a target directory for the location of the output result, we
> really can make things better.
>
>
Definitely agreed on fixing it. But I don't think your summary is right.

pg_basebackup in tar mode can be sent to stdout, does not require a
directory. And the same for pg_dump in any mode except for directory. So we
can't just drive it off the mode, some more detailed checks are required.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c

2016-03-28 Thread Aleksander Alekseev
Hello, Piotr.

Thanks for the report. But I'm having some difficulties reproducing the
issues you described.

I compiled PostgreSQL from master branch on FreeBSD 10.2 using this
command:

```
CC=/usr/local/bin/gcc49 CFLAGS="-O0 -g" \
  ./configure --enable-cassert --enable-debug \
  --prefix=/home/eax/postgresql-install \
  && gmake clean && gmake -j2 -s
```

Then I run reinit.sh:

```
#!/usr/bin/env bash

P=~/postgresql-install

pkill -9 postgres
make install

rm -rf $P/data
$P/bin/initdb -D $P/data

echo "max_prepared_transactions = 100" >> $P/data/postgresql.conf
echo "wal_level = hot_standby" >> $P/data/postgresql.conf
echo "wal_keep_segments = 128" >> $P/data/postgresql.conf
echo "max_connections = 10" >> $P/data/postgresql.conf
echo "listen_addresses = '*'" >> $P/data/postgresql.conf

echo '' > $P/data/logfile

echo "host all all 0.0.0.0/0 trust" >> $P/data/pg_hba.conf
echo "host replication all 0.0.0.0/0 trust" >> $P/data/pg_hba.conf
echo "local replication all trust" >> $P/data/pg_hba.conf

$P/bin/pg_ctl -w -D $P/data -l $P/data/logfile start
$P/bin/createdb `whoami`
$P/bin/psql -c "create table test(k int primary key, v text);"
```

..., connected to PostgreSQL using psql; in a second terminal I attached
to the backend process using gdb710 and entered `c`. Now in psql:

```
eax=# create table tt5(x int);
CREATE TABLE
eax=# create table b_star(x int);
CREATE TABLE
eax=# insert into b_star values (1), (2), (3);
INSERT 0 3
eax=# insert into tt5 values (2), (3), (4), (5);
INSERT 0 4
eax=# select 1
eax-# from public.tt5 as subq_0
eax-# where EXISTS (
eax(#select 1
eax(#from public.b_star as ref_0
eax(#where false
eax(# );
 ?column? 
----------
(0 rows)

eax=# select 1
eax-# from unnest('{}'::boolean[]) a (x)
eax-# left join (
eax(#select *
eax(#from unnest('{}'::boolean[])
eax(#where false
eax(# ) b (x) on a.x = b.x;
 ?column? 
----------
(0 rows)

```

Everything seems to work, no stacktraces in gdb.

Could you please provide more concrete steps to reproduce these
issues, i.e. OS and compiler version, compilation flags (or package
version), cluster config, database schema, etc.? These steps are required
at least to make sure that the fixed code really fixes the problem. Also it
would be a good idea to include these steps in the regression tests.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 7:21 AM, Dilip Kumar  wrote:

> I agree with that conclusion.  I'm not quite sure where that leaves
>> us, though.  We can go back to v13, but why isn't that producing extra
>> pages?  It seems like it should: whenever a bulk extend rolls over to
>> a new FSM page, concurrent backends will search either the old or the
>> new one but not both.
>>
>
Our open question was why v13 is not producing extra pages. I tried to
print some logs and debug. It seems to me that when blocks spill over to
the next FSM page, the backends waiting on the lock will not get a block,
because they are searching the old FSM page. But the backend which is
extending the pages will set RelationSetTargetBlock to the latest block,
and that will make the new FSM page available for searching by new
requesters.

1. So this is why v13 (in normal cases *1) is not producing unused pages.
2. But it will produce extra pages (which will be consumed by new
requesters), because all waiters will come one by one and extend 512 pages.


*1: Above I mentioned the normal case; there are some cases where v13 can
leave unused pages. For example: one by one, each waiter gets the lock and
extends pages, but none of them gets down to RelationSetTargetBlock, so the
new pages are not yet available to new requesters, and a time comes when
blocks spill over to a third FSM page. Now one by one all backends get down
to RelationSetTargetBlock, and suppose the last one sets it to a block
which is in the 3rd FSM page; in this case, the pages in the second FSM
page are left unused.


Maybe we could do this - not sure if it's what you were suggesting above:
>>
>> 1. Add the pages one at a time, and do RecordPageWithFreeSpace after each
>> one.
>> 2. After inserting them all, go back and update the upper levels of
>> the FSM tree up the root.
>>
> I think this is the better option. Since we will search the last two pages
> of the FSM tree, there is no need to update the upper levels of the FSM
> tree. Right?
>
> I will test and post the result with this option.
>
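
For reference, the two-step scheme quoted above could look roughly like
this (a hedged sketch: RecordPageWithFreeSpace and FreeSpaceMapVacuum
exist in freespace.c, but using the latter as the "update the upper
levels" pass, and the surrounding loop, are my illustration, not the
patch; relation, bistate and extraBlocks come from the enclosing
extension code):

	int			i;
	BlockNumber blockNum = InvalidBlockNumber;

	/* Step 1: extend block by block, recording only leaf-level FSM info. */
	for (i = 0; i < extraBlocks; i++)
	{
		Buffer		buffer = ReadBufferBI(relation, P_NEW, bistate);
		Page		page = BufferGetPage(buffer);
		Size		freespace;

		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
		PageInit(page, BufferGetPageSize(buffer), 0);
		freespace = PageGetHeapFreeSpace(page);
		blockNum = BufferGetBlockNumber(buffer);
		MarkBufferDirty(buffer);
		UnlockReleaseBuffer(buffer);

		RecordPageWithFreeSpace(relation, blockNum, freespace);
	}

	/* Step 2: one pass to propagate the new leaf values up the FSM tree. */
	FreeSpaceMapVacuum(relation);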

I have created this patch and the results are as below.

* All test scripts are the same as those attached upthread

1. Relation Size: no change in size, it's the same as base and v13

2. INSERT 1028 Byte 1000 tuple performance
---
Client   base   v13   v15
     1    117   124   122
     2    111   126   123
     4     51   128   125
     8     43   149   135
    16     40   217   147
    32     35   263   141

3. COPY 1 Tuple performance.
--
Client   base   v13   v15
     1    118   147   155
     2    217   276   273
     4    210   421   457
     8    166   630   643
    16    145   813   595
    32    124   985   598

Conclusion:
---
1. I think v15 solves the problem that exists with v13, performance is
significantly higher compared to base, and the relation size is also
stable. So IMHO v15 is the winner over the other solutions; what do
others think?

2. And there is no point in thinking that v13 is better than v15, because
v13 has a bug of sometimes extending more pages than expected, which is
uncontrolled, and the same can also be the reason why v13 performs better.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v15.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Oleg Bartunov
On Mon, Mar 28, 2016 at 2:06 PM, Peter Geoghegan  wrote:

> On Mon, Mar 28, 2016 at 12:55 AM, Oleg Bartunov 
> wrote:
> >  We'll post the patch.
>
> Cool.
>
> > Teodor made something to get abbreviated keys working, as
> > I remember. I should say that the 27x improvement I got was on my
> > macbook. I will
> > check on Linux.
>
> I think that Linux will be much faster. The strxfrm() blob produced by
> Mac OSX will have a horribly low concentration of entropy. For an 8
> byte Datum, you get only 2 distinguishing bytes. It's really, really
> bad. Mac OSX probably makes very little use of strxfrm() in practice;
> there are proprietary APIs that do something similar, but all using
> UTF-16 only.
>

Yes, Linux is much, much faster; I see no difference in performance using
the latest ICU 57_1.
I tested on Ubuntu 14.04.4.  But still, ICU provides us abbreviated keys
and collation stability, so let's add --with-icu.


>
> --
> Peter Geoghegan
>


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-28 Thread Amit Langote
On 2016/03/28 17:50, Kyotaro HORIGUCHI wrote:
> 
> # LISPers don't hesitate to dive into Sea of Parens.

Sorry in advance to be off-topic: https://xkcd.com/297 :)

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

2016-03-28 Thread Rajkumar Raghuwanshi
Hi,

I am testing the postgres_fdw join pushdown feature for PostgreSQL 9.6, and
I observed the issue below.

*Observation:* An inner join and full outer join combination on a table
generates a wrong result.

SELECT * FROM lt;
 c1
----
  1
  2
(2 rows)

SELECT * FROM ft;
 c1
----
  1
  2
(2 rows)

\d+ ft
 Foreign table "public.ft"
 Column |  Type   | Modifiers | FDW Options | Storage | Stats target | Description
--------+---------+-----------+-------------+---------+--------------+-------------
 c1     | integer |           |             | plain   |              |
Server: link_server
FDW Options: (table_name 'lt')

--inner join and full outer join on local tables
SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
FULL JOIN lt t3 ON (t2.c1 = t3.c1);
 c1 | c1 | c1
----+----+----
  1 |  1 |  1
  2 |  2 |  2
(2 rows)

--inner join and full outer join on corresponding foreign tables
SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
FULL JOIN ft t3 ON (t2.c1 = t3.c1);
 c1 | c1 | c1
----+----+----
  1 |  1 |  1
  1 |  2 |
  2 |  1 |
  2 |  2 |  2
(4 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation



Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-28 Thread Alexander Korotkov
Hi!

On Sat, Mar 26, 2016 at 12:02 AM, David Steele  wrote:

> On 3/25/16 3:54 PM, Artur Zakirov wrote:
> > On 25.03.2016 21:42, Dmitry Ivanov wrote:
> >> Sorry for the delay, I desperately needed some time to finish a bunch of
> >> dangling tasks.
> >>
> >> I've added some new comments and clarified the ones that were obscure.
> >> Moreover, I felt an urge to recheck most parts of the code since
> >> apparently
> >> nobody (besides myself) has gone so far yet.
> >>
> >> On 25.03.16 18:42 MSK, David Steele wrote:
> >>> Time is short and it's not encouraging that you say there is "still
> much
> >> work to be done".
> >>
> >> Indeed, I was inaccurate. I am more than interested in the positive
> >> outcome.
> >>
> >
> > I tested the patch and took a look at it. All regression tests passed.
> > The code looks good and the patch introduces great functionality.
> >
> > I think the patch can be marked as "Ready for Committer". But I do not
> > feel the force to do that myself.
> >
> > Also I agree with you about tsvector_setweight(). There is not a problem
> > with it, because these weights are immutable and so there are no benefits
> > from a new function.
>
> Ideally Alexander can also look at it.  If not, then you should mark it
> "ready for committer" since I doubt any more reviewers will sign on this
> late in the CF.
>

I've checked the last version of the patch. Patch applies cleanly on
master, builds without warnings, regression tests pass.
The issue I reported about tsquery textual representation is fixed.
I'm going to mark this "Ready for Committer".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-28 Thread Kyotaro HORIGUCHI
Thank you for the new patch. Sorry to have overlooked some
versions. I'm looking at the v19 patch now.

make complains for an unused variable.

| syncrep.c: In function ‘SyncRepGetSyncStandbys’:
| syncrep.c:601:13: warning: variable ‘next’ set but not used 
[-Wunused-but-set-variable]
|ListCell *next;


At Thu, 24 Mar 2016 22:29:01 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Tomas Vondra

On 03/26/2016 08:09 PM, Alvaro Herrera wrote:

Tomas Vondra wrote:


There are a few places where I reverted the pgindent formatting, because it
seemed a bit too weird - the first is the lists of function prototypes
in common.h/mvstat.h, the second is the function calls to the
_greedy/_exhaustive methods.


Function prototypes being weird is something that we've learned to
accept.  There's no point in undoing pgindent decisions there, because
the next run will re-apply them anyway.  Best not to fight it.

What you should definitely look into fixing is the formatting of
comments, if the result is too horrible.  You can prevent it from
messing those by adding dashes /*- at the beginning of the comment.



Yep, formatting of some of the comments got slightly broken, but it 
wasn't difficult to fix that without the /*--- trick.


I'm not sure about the prototypes though. It was a bit weird because 
prototypes in the same header file were formatted very differently.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Tomas Vondra

Hi,

On 03/26/2016 10:18 AM, Tatsuo Ishii wrote:

Fair point. Attached is v18 of the patch, after pgindent cleanup.


Here are some feedbacks to v18 patch.

1) regarding examples in create_statistics manual

Here are the numbers I got. "with statistics" refers to the case where
multivariate statistics are used.  "without statistics" refers to the
case where multivariate statistics are not used. The numbers denote
estimated_rows/actual_rows, thus closer to 1.0 is better. Some numbers
are shown as a fraction to avoid division by zero. In my understanding,
cases 1, 3 and 4 showed that multivariate statistics are superior.

        with statistics      without statistics
case1   0.98                 0.01
case2   98/0                 1/0


Case 2 shows that functional dependencies assume that the conditions 
used in queries won't be incompatible - that's something this type of 
statistics can't fix.



case3   1.05                 0.01
case4   1/0                  103/0
case5   18.50                18.33
case6   23/0                 123/0


The last two lines (case5 + case6) seem a bit suspicious. I believe 
those are for the histogram data, and I do get these numbers:


case5   0.93 (5517 / 5949)   42.0 (249943 / 5949)
case6   100/0                100/0

Perhaps you've been using the version before the bugfix, with ANALYZE on 
the wrong table?




2) The following comments of mine are not addressed in the v18 patch.


- There are no docs for pg_mv_statistic (they should be added to "49. System
  Catalogs")

- The word "multivariate statistics" or something like that should
  appear in the index.

- Some explanation of how to deal with multivariate statistics should be
  added to the "14.1 Using Explain" and "14.2 Statistics used by the
  Planner" sections.


Yes, those are valid omissions. I plan to address them, and I'd also 
consider adding a section to 65.1 (How the Planner Uses Statistics), 
explaining more thoroughly how the planner uses multivariate stats.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Peter Geoghegan
On Mon, Mar 28, 2016 at 12:55 AM, Oleg Bartunov  wrote:
>  We'll post the patch.

Cool.

> Teodor made something to get abbreviated keys working, as
> I remember. I should say that the 27x improvement I got was on my macbook.
> I will check on Linux.

I think that Linux will be much faster. The strxfrm() blob produced by
Mac OSX will have a horribly low concentration of entropy. For an 8
byte Datum, you get only 2 distinguishing bytes. It's really, really
bad. Mac OSX probably makes very little use of strxfrm() in practice;
there are proprietary APIs that do something similar, but all using
UTF-16 only.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Oleg Bartunov
On Mon, Mar 28, 2016 at 1:21 PM, Peter Geoghegan  wrote:

> On Mon, Mar 28, 2016 at 12:08 AM, Oleg Bartunov 
> wrote:
> > Should we start thinking about ICU? I compared Postgres with ICU and
> > without and found a 27x improvement in btree index creation for Russian
> > strings. This includes the effect of abbreviated keys and ICU itself.
> > Also, we'll get a system-independent locale.
>
> I think we should. I want to develop a detailed proposal before
> talking about it more, though, because the idea is controversial.
>
> Did you use the FreeBSD ports patch? Do you have your own patch that
> you could share?
>

 We'll post the patch. Teodor made something to get abbreviated keys
working, as I remember. I should say that the 27x improvement I got was on
my macbook. I will check on Linux.

>
> I'm not surprised that ICU is so much faster, especially now that
> UTF-8 is not a second class citizen (it's been possible to build ICU
> to specialize all its routines to handle UTF-8 for years now). As you
> may know, ICU supports partial sort keys, and sort key compression,
> which may have also helped:
> http://userguide.icu-project.org/collation/architecture
>

>
> That page also describes how binary sort keys are versioned, which
> allows them to be stored on disk. It says "A common example is the use
> of keys to build indexes in databases". We'd be crazy to trust Glibc
> strxfrm() to be stable *on disk*, but ICU already cares deeply about
> the things we need to care about, because it's used by other database
> systems like DB2, Firebird, and in some configurations SQLite [1].
>
> Glibc strxfrm() is not great with codepoints from the Cyrillic
> alphabet -- it seems to store 2 bytes per code-point in the primary
> weight level. So ICU might also do better in your test case for that
> reason.
>

Yes, I see on this page that ICU is ~3 times faster for Russian text.
http://site.icu-project.org/charts/collation-icu4c48-glibc


>
> [1]
> https://www.sqlite.org/src/artifact?ci=trunk=ext/icu/README.txt
> --
> Peter Geoghegan
>


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-28 Thread Michael Paquier
On Sun, Mar 27, 2016 at 7:30 AM, Thomas Munro
 wrote:
> On Sat, Mar 26, 2016 at 2:48 AM, Michael Paquier
>  wrote:
>> Should we worried about potential backward-incompatibilities with the
>> new return values of walrcv_receive?
>
> There are three changes to the walrcv_receive interface:
>
> 1.  It now takes a latch pointer, which may be NULL.
>
> 2.  If the latch pointer is non-NULL, the existing function might
> return a new sentinel value WALRCV_RECEIVE_LATCH_SET.  (The
> pre-existing sentinel value -1 is still in use and has the same value
> and meaning as before, but it now has a name:
> WALRCV_RECEIVE_COPY_ENDED.)
>
> 3.  It will no longer return when the process is signalled (a latch
> should be used to ask it to return instead).
>
> Effectively, any client code would need to change at least to add NULL
> or possibly a latch if it needs to ask it to return, and any
> alternative implementation of the WAL receiver interface would need to
> use WaitEventSet (or WaitLatchOrSocket) as its event loop instead of
> whatever it might be using now so that it can detect a latch's state.
> But in any case, any such code would fail to compile against 9.6 due
> to the new argument, and then you'd only be able to get the new return
> value if you asked for it by passing in a latch.  What affected code
> are we aware of -- either users of libpqwalreceiver.so or other WAL
> receiver implementations?

Any negative value returned by walrcv_receive would have stopped the
replication stream, not only -1. And as you say, it's actually a good
thing that the interface of walrcv_receive is changing with this
patch; this way compilation would just fail, and failures would not
happen silently. That's too low-level to be mentioned in the release
notes either way, so just a compilation failure is acceptable to me.

>> Do you have numbers to share regarding how the latch-based approach
>> performs compared to the approach that used SIGUSR2 when
>> remote_apply is used?
>
> I couldn't measure any significant change (Linux, all on localhost, 128 
> cores):
>
> pgbench -c 1 -N bench2 -T 600
>
> 0001-remote-apply-v5.patch (signals), remote_apply -> 449 TPS
> 0001-remote-apply-v6.patch (latches), remote_apply -> 452 TPS
>
> pgbench -c 64 -j 32 -N bench2 -T 600
>
> 0001-remote-apply-v5.patch (signals), remote_apply -> 8536 TPS
> 0001-remote-apply-v6.patch (latches), remote_apply -> 8534 TPS

Which shows that both give the same kind of performance. We are
likely seeing noise.

> Incidentally, I also did some testing on what happens when you signal
> a process that is busily writing and fsyncing.  I tested a few
> different kernels, write sizes and disk latencies and saw that things
> were fine on all of them up to 10k signals/sec but after that some
> systems' fsync performance started to reduce.  Only Linux on Power was
> still happily fsyncing at around 3/4 of full speed when signalled with
> a 2MHz+ tight kill loop (!), while FreeBSD and Linux on Intel weren't
> able to make much progress at all under such adversity.  So I suppose
> that if you could get the commit rate up into 5 figures you might be
> able to measure an improvement for the latch version due to
> latch-collapsing, though I noticed a large amount of signal-collapsing
> going on at the OS level on all systems I tested anyway, so maybe it
> wouldn't make a difference.  I attach that test program for interest.

Interesting results.

> Also, I updated the comment for the declaration of the latch in
> walreceiver.h to say something about the new usage.
>
> New patch series attached.

 static void WalRcvQuickDieHandler(SIGNAL_ARGS);

-
 static void
 ProcessWalRcvInterrupts(void)
Noise here.

+   ret = WaitLatchOrSocket(latch, events, PQsocket(streamConn), timeout_ms);
+
+   if (ret & WL_POSTMASTER_DEATH)
+   exit(0);
Exiting within libpqwalreceiver.so is no good. I think that even in
the case of a postmaster cleanup we should allow things to be cleaned
up.

 /*
+ * Wake up the walreceiver if it happens to be blocked in walrcv_receive,
+ * and tell it that a commit record has been applied.
+ *
+ * This is called by the startup process whenever interesting xlog records
+ * are applied, so that walreceiver can check if it needs to send an apply
+ * notification back to the master which may be waiting in a COMMIT with
+ * synchronous_commit = remote_apply.
+ */
+void
+WalRcvWakeup(void)
+{
+   SetLatch(>latch);
+}
I think it would be good here to add an assertion stating that this can
only be called by the startup process while in recovery; WalRcv->latch is
not protected by a mutex lock.

+maximum of 'timeout' ms. If a message was successfully read, returns
+its length. Otherwise returns 0 for timeout, WALRCV_RECEIVE_COPY_ENDED
+for disconnection or WALRCV_RECEIVE_LATCH_SET. On success, a pointer
Having an assigned constant name for timeout would be good for
consistency with the rest.

I have been also 

Re: [HACKERS] pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

2016-03-28 Thread Pavan Deolasee
On Wed, Mar 23, 2016 at 6:16 PM, Michael Paquier 
wrote:

>
>
> I'd just add dots at the end of the sentences in the comment blocks
> because that's project style, but I'm being picky, except that the
> logic looks quite good.
>

Since this is a bug affecting all stable branches, IMHO it will be a good
idea to fix this before the upcoming minor releases.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] pgbench - remove unused clientDone parameter

2016-03-28 Thread Fabien


Remove pgbench clientDone unused "ok" parameter.

I cannot see the point of keeping a useless parameter, which is probably 
there because at some point in the past it was used. If it is needed some 
day it can always be reinserted.


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4196b0e..1146e16 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1314,10 +1314,8 @@ preparedStatementName(char *buffer, int file, int state)
 }
 
 static bool
-clientDone(CState *st, bool ok)
+clientDone(CState *st)
 {
-	(void) ok;	/* unused */
-
 	if (st->con != NULL)
 	{
 		PQfinish(st->con);
@@ -1388,7 +1386,7 @@ top:
 
 		/* stop client if next transaction is beyond pgbench end of execution */
 		if (duration > 0 && st->txn_scheduled > end_time)
-			return clientDone(st, true);
+			return clientDone(st);
 
 		/*
 		 * If this --latency-limit is used, and this slot is already late so
@@ -1441,7 +1439,7 @@ top:
 			if (!PQconsumeInput(st->con))
 			{	/* there's something wrong */
 fprintf(stderr, "client %d aborted in state %d; perhaps the backend died while processing\n", st->id, st->state);
-return clientDone(st, false);
+return clientDone(st);
 			}
 			if (PQisBusy(st->con))
 return true;	/* don't have the whole result yet */
@@ -1488,7 +1486,7 @@ top:
 	fprintf(stderr, "client %d aborted in state %d: %s",
 			st->id, st->state, PQerrorMessage(st->con));
 	PQclear(res);
-	return clientDone(st, false);
+	return clientDone(st);
 			}
 			PQclear(res);
 			discard_response(st);
@@ -1504,7 +1502,7 @@ top:
 
 			++st->cnt;
 			if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
-return clientDone(st, true);	/* exit success */
+return clientDone(st);	/* exit success */
 		}
 
 		/* increment state counter */
@@ -1541,7 +1539,7 @@ top:
 		{
 			fprintf(stderr, "client %d aborted while establishing connection\n",
 	st->id);
-			return clientDone(st, false);
+			return clientDone(st);
 		}
 		INSTR_TIME_SET_CURRENT(end);
 		INSTR_TIME_ACCUM_DIFF(thread->conn_time, end, start);
@@ -1873,7 +1871,7 @@ top:
 			bool		ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
 
 			if (timer_exceeded) /* timeout */
-return clientDone(st, true);
+return clientDone(st);
 			else if (!ret)		/* on error */
 			{
 st->ecnt++;
@@ -1887,7 +1885,7 @@ top:
 			bool		ret = runShellCommand(st, NULL, argv + 1, argc - 1);
 
 			if (timer_exceeded) /* timeout */
-return clientDone(st, true);
+return clientDone(st);
 			else if (!ret)		/* on error */
 			{
 st->ecnt++;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgbench - show weight percent

2016-03-28 Thread Fabien


This minor patch shows the expected drawing percentage in multi-script 
reports, next to the relative weight.


--
Fabien

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4196b0e..3b63d69 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3080,10 +3080,10 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 		{
 			if (num_scripts > 1)
 printf("SQL script %d: %s\n"
-	   " - weight = %d\n"
+	   " - weight = %d (targets %.1f%% of total)\n"
 	   " - " INT64_FORMAT " transactions (%.1f%% of total, tps = %f)\n",
 	   i + 1, sql_script[i].desc,
-	   sql_script[i].weight,
+	   sql_script[i].weight, 100.0 * sql_script[i].weight / total_weight,
 	   sql_script[i].stats.cnt,
 	   100.0 * sql_script[i].stats.cnt / total->cnt,
 	   sql_script[i].stats.cnt / time_include);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2016-03-28 Thread Fabien COELHO



- that it does work :-)  I'm not sure what happens in the script selection
  process; it should be checked carefully, because it was not designed
  to allow a zero weight, and it may depend on its/their positions.
  It may already work, but it really needs checking.


Hmmm, it seems ok.

Attached is an updated patch, which:


- I would suggest that a warning is shown when a weight is zero,
  something like "warning, script #%d weight is zero, will be ignored".


includes such a warning.


- the documentation should be updated:-)


adds a line about 0 weight in the documentation.

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c6d1454..e31d5ee 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -698,6 +698,7 @@ pgbench  options  dbname
Each script may be given a relative weight specified after a
@ so as to change its drawing probability.
The default weight is 1.
+   Weight 0 scripts are ignored.
  
 
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4196b0e..0c1a0ee 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2953,10 +2953,10 @@ parseScriptWeight(const char *option, char **script)
 			fprintf(stderr, "invalid weight specification: %s\n", sep);
 			exit(1);
 		}
-		if (wtmp > INT_MAX || wtmp <= 0)
+		if (wtmp > INT_MAX || wtmp < 0)
 		{
 			fprintf(stderr,
-			"weight specification out of range (1 .. %u): " INT64_FORMAT "\n",
+			"weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
 	INT_MAX, (int64) wtmp);
 			exit(1);
 		}
@@ -2987,6 +2987,13 @@ addScript(ParsedScript script)
 		exit(1);
 	}
 
+	if (script.weight == 0)
+	{
+		fprintf(stderr,
+"warning, script \"%s\" (%d) weight is zero, will be ignored\n",
+script.desc, num_scripts);
+	}
+
 	sql_script[num_scripts] = script;
 	num_scripts++;
 }
@@ -3527,6 +3534,12 @@ main(int argc, char **argv)
 		/* cannot overflow: weight is 32b, total_weight 64b */
 		total_weight += sql_script[i].weight;
 
+	if (total_weight == 0)
+	{
+		fprintf(stderr, "total weight for scripts (-b and -f options) cannot be zero\n");
+		exit(1);
+	}
+
 	/* show per script stats if several scripts are used */
 	if (num_scripts > 1)
 		per_script_stats = true;



Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Peter Geoghegan
On Mon, Mar 28, 2016 at 12:08 AM, Oleg Bartunov  wrote:
> Should we start thinking about ICU? I compared Postgres with and without
> ICU and found a 27x improvement in btree index creation for Russian
> strings. This includes the effect of abbreviated keys and of ICU itself.
> Also, we'll get a system-independent locale.

I think we should. I want to develop a detailed proposal before
talking about it more, though, because the idea is controversial.

Did you use the FreeBSD ports patch? Do you have your own patch that
you could share?

I'm not surprised that ICU is so much faster, especially now that
UTF-8 is not a second-class citizen (it's been possible to build ICU
to specialize all its routines to handle UTF-8 for years now). As you
may know, ICU supports partial sort keys and sort key compression,
which may also have helped:
http://userguide.icu-project.org/collation/architecture

That page also describes how binary sort keys are versioned, which
allows them to be stored on disk. It says "A common example is the use
of keys to build indexes in databases". We'd be crazy to trust Glibc
strxfrm() to be stable *on disk*, but ICU already cares deeply about
the things we need to care about, because it's used by other database
systems like DB2, Firebird, and in some configurations SQLite [1].

Glibc strxfrm() is not great with codepoints from the Cyrillic
alphabet -- it seems to store 2 bytes per code-point in the primary
weight level. So ICU might also do better in your test case for that
reason.
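
To make the versioning point concrete, here is a minimal standalone C 
sketch against the ICU C API (hypothetical test code, not something from 
PostgreSQL): it derives a memcmp()-comparable sort key plus the collator 
version that would have to be stored alongside any on-disk keys, so that 
the keys can be invalidated whenever ICU's collation data changes:

#include <stdio.h>
#include <unicode/ucol.h>
#include <unicode/ustring.h>

int
main(void)
{
	UErrorCode	status = U_ZERO_ERROR;
	UCollator  *coll = ucol_open("ru_RU", &status);
	UChar		text[64];
	uint8_t		key[256];
	int32_t		keylen;
	UVersionInfo vinfo;
	char		vstr[U_MAX_VERSION_STRING_LENGTH];

	if (U_FAILURE(status))
	{
		fprintf(stderr, "ucol_open: %s\n", u_errorName(status));
		return 1;
	}
	u_uastrcpy(text, "test");	/* ASCII-only sample string */

	/* binary sort key: memcmp()-comparable, analogous to strxfrm() */
	keylen = ucol_getSortKey(coll, text, -1, key, sizeof(key));

	/* version stamp that makes stored keys safely invalidatable */
	ucol_getVersion(coll, vinfo);
	u_versionToString(vinfo, vstr);

	printf("collator version %s, sort key is %d bytes\n", vstr, keylen);
	ucol_close(coll);
	return 0;
}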

[1] https://www.sqlite.org/src/artifact?ci=trunk=ext/icu/README.txt
-- 
Peter Geoghegan




Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Oleg Bartunov
On Mar 28, 2016 09:44, "Peter Geoghegan"  wrote:
>
> On Sat, Mar 26, 2016 at 4:34 PM, Tom Lane  wrote:
> > Probably the most discussion-worthy item is whether we can say
> > anything more about the strxfrm mess.  Should we make a wiki
> > page about that and have the release note item link to it?
>
> I think that there is an argument against doing so, which is that
> right now, all we have to offer on that are weasel words. However, I'm
> still in favor of a Wiki page, because I would not be at all surprised
> if our understanding of this problem evolved, and we were able to
> offer better answers in several weeks. Realistically, it will probably
> take at least that long before affected users even start to think
> about this.

Should we start thinking about ICU? I compared Postgres with and without
ICU and found a 27x improvement in btree index creation for Russian
strings. This includes the effect of abbreviated keys and of ICU itself.
Also, we'll get a system-independent locale.


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-28 Thread Masahiko Sawada
On Fri, Mar 25, 2016 at 9:20 PM, Robert Haas  wrote:
> On Thu, Mar 24, 2016 at 9:29 AM, Masahiko Sawada  
> wrote:
>> Also I felt a sense of discomfort about using [ and ] as special
>> characters for the priority method.
>> Because ( ) and [ ] look similar to each other, it would be easy
>> to make syntax errors once the nested style is supported.
>> And the synopsis of that in the documentation is odd;
>> synchronous_standby_names = 'N [ node_name [, ...] ]'
>>
>> This topic has already been discussed before, but we might want to
>> change it to other characters such as < and >?
>
> I personally would recommend against <>.  Those should mean less-than
> and greater-than, not grouping.  I think you could use parentheses,
> ().  There's nothing saying that has to mean any particular thing, so
> you may as well use it for the first thing implemented, perhaps.  Or
> you could use [] or {}.  It *is* important that you don't create
> confusing syntax summaries, but I don't think that's a reason to pick
> a nonstandard syntax for grouping.
>

I agree with you.
I've changed it to use parentheses.
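
To make that concrete, under the proposed syntax a hypothetical setting 
asking for two synchronous standbys out of three candidates would read:

  synchronous_standby_names = '2 (node_a, node_b, node_c)'

while the plain comma-separated list keeps its historical meaning of a 
single synchronous standby chosen by priority:

  synchronous_standby_names = 'node_a, node_b'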

Regards,

--
Masahiko Sawada
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d48a13f..1650b6d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2902,20 +2902,18 @@ include_dir 'conf.d'
   
   

-Specifies a comma-separated list of standby names that can support
-synchronous replication, as described in
-.
-At any one time there will be at most one active synchronous standby;
-transactions waiting for commit will be allowed to proceed after
-this standby server confirms receipt of their data.
-The synchronous standby will be the first standby named in this list
-that is both currently connected and streaming data in real-time
-(as shown by a state of streaming in the
+Specifies the standby names that can support synchronous replication
+using either of two syntaxes: a comma-separated list, or a more flexible syntax
+described in .
+Transactions waiting for commit will be allowed to proceed after a
+configurable subset of standby servers confirms receipt of their data.
+For the simple comma-separated list syntax, it is one server.
+The synchronous standbys will be those named in this parameter that are both
+currently connected and streaming data in real-time (as shown by a state
+of streaming in the
 
 pg_stat_replication view).
-Other standby servers appearing later in this list represent potential
-synchronous standbys.
-If the current synchronous standby disconnects for whatever reason,
+If any of the current synchronous standbys disconnects for whatever reason,
 it will be replaced immediately with the next-highest-priority standby.
 Specifying more than one standby name can allow very high availability.

@@ -2923,9 +2921,10 @@ include_dir 'conf.d'
 The name of a standby server for this purpose is the
 application_name setting of the standby, as set in the
 primary_conninfo of the standby's WAL receiver.  There is
-no mechanism to enforce uniqueness. In case of duplicates one of the
-matching standbys will be chosen to be the synchronous standby, though
-exactly which one is indeterminate.
+no mechanism to enforce uniqueness. For each specified standby name,
+only the specified count of standbys will be chosen to be synchronous
+standbys, though exactly which ones are chosen is indeterminate.  The rest will
+represent potential synchronous standbys.
 The special entry * matches any
 application_name, including the default application name
 of walreceiver.
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 19d613e..5dd9fab 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1027,24 +1027,27 @@ primary_slot_name = 'node_a_slot'
 

 Synchronous replication offers the ability to confirm that all changes
-made by a transaction have been transferred to one synchronous standby
-server. This extends the standard level of durability
+made by a transaction have been transferred to one or more synchronous
+standby servers. This extends the standard level of durability
 offered by a transaction commit. This level of protection is referred
-to as 2-safe replication in computer science theory.
+to as 2-safe replication in computer science theory, and group-1-safe
+(group-safe and 1-safe) when synchronous_commit is set to
+more than remote_write.

 

 When requesting synchronous replication, each commit of a
 write transaction will wait until confirmation is
 

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Dilip Kumar
On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:

>
> What's sizeof(BufferDesc) after applying these patches? It had better
> be <= 64...
>

It is 72.
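
That is over the 64-byte target. One fallback worth noting (a sketch of 
the usual cache-line padding trick, not necessarily what these patches 
do) is to pad each descriptor to a cache-line multiple so that two 
descriptors never share a line, trading memory for it:

typedef union BufferDescPadded
{
	BufferDesc	bufferdesc;
	char		pad[128];		/* 2 x assumed 64-byte cache lines,
								 * since sizeof(BufferDesc) is 72 */
} BufferDescPadded;

#define GetBufferDescriptor(id) (&BufferDescriptors[(id)].bufferdesc)

(The descriptor array itself must also start on a cache-line boundary 
for the guarantee to hold.)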


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 11:00 AM, Amit Kapila 
wrote:

> I have not debugged the flow, but by looking at v13 code, it looks like it
> will search both old and new.   In
> function 
> GetPageWithFreeSpaceExtended()->fsm_search_from_addr()->fsm_search_avail(),
> the basic idea of search is: Start the search from the target slot.  At
> every step, move one
> node to the right, then climb up to the parent.  Stop when we reach a node
> with enough free space (as we must, since the root has enough space).
> So shouldn't it be able to find the new FSM page where the bulk extend
> rolls over?
>

This is actually a multi-level tree, so each FSM page contains one slot tree.

fsm_search_avail() therefore searches only the slot tree inside one FSM page,
but we want to move on to the next FSM page.
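
For reference, here is a simplified paraphrase of the descent loop in 
fsm_search() (src/backend/storage/freespace/freespace.c), with buffer 
locking/release and the retry logic omitted. It is a sketch, not the 
patch itself, but it shows why a failed within-page search has to 
re-descend from an upper-level FSM page to reach a sibling page:

static BlockNumber
fsm_search_sketch(Relation rel, uint8 min_cat)
{
	FSMAddress	addr = FSM_ROOT_ADDRESS;	/* start at the root FSM page */

	for (;;)
	{
		Buffer		buf = fsm_readbuf(rel, addr, false);
		int			slot = fsm_search_avail(buf, min_cat,
											addr.level == FSM_BOTTOM_LEVEL,
											false);

		if (slot == -1)
			return InvalidBlockNumber;	/* nothing free recorded below here */
		if (addr.level == FSM_BOTTOM_LEVEL)
			return fsm_get_heap_blk(addr, slot);	/* leaf: a heap block */
		addr = fsm_get_child(addr, slot);	/* descend into the child page */
	}
}

So after a bulk extend rolls over to a new FSM page, the search can only 
find it by coming back down through a parent page; nothing inside 
fsm_search_avail() will cross the page boundary.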


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


  1   2   >