Re: [HACKERS] Parallel Append implementation

2017-04-04 Thread Andres Freund
On 2017-04-04 08:01:32 -0400, Robert Haas wrote:
> On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund  wrote:
> > I don't think the parallel seqscan is comparable in complexity with the
> > parallel append case.  Each worker there does the same kind of work, and
> > if one of them is behind, it'll just do less.  But correct sizing will
> > be more important with parallel-append, because with non-partial
> > subplans the work is absolutely *not* uniform.
>
> Sure, that's a problem, but I think it's still absolutely necessary to
> ramp up the maximum "effort" (in terms of number of workers)
> logarithmically.  If you just do it by costing, the winning number of
> workers will always be the largest number that we think we'll be able
> to put to use - e.g. with 100 branches of relatively equal cost we'll
> pick 100 workers.  That's not remotely sane.

I'm quite unconvinced that just throwing a log() in there is the best
way to combat that.  Modeling the cost of starting more workers (tuple
transfer, locking, startup overhead) in the costing seems better to me.

If the goal is to compute the results of the query as fast as possible,
without using more than max_parallel_per_XXX workers, and it's actually
beneficial to use more workers, then we should use them.  Otherwise you
really can't make use of the resources available.
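
For reference, a minimal sketch of the logarithmic ramp-up being debated
(the function and its name are hypothetical, not code from the patch):

/*
 * Hypothetical sketch, not from the Parallel Append patch: ramp the
 * worker count up logarithmically with the number of subplans, so that
 * e.g. 100 roughly equal-cost branches do not request 100 workers.
 */
static int
clamp_append_workers(int num_subplans, int max_workers)
{
	int			nworkers = 1;

	/* add one more worker each time the subplan count doubles */
	while ((1 << nworkers) <= num_subplans && nworkers < max_workers)
		nworkers++;

	return nworkers;
}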

- Andres




Re: [HACKERS] psql - add special variable to reflect the last query status

2017-04-04 Thread Pavel Stehule
2017-04-04 22:05 GMT+02:00 Fabien COELHO :

>
> After some discussions about what could be useful now that psql scripts
> accept tests, this patch sets a few variables which can be used by psql
> after a "front door" (i.e. actually typed by the user) query:
>
>  - RESULT_STATUS: the status of the query
>  - ERROR: whether the query failed
>  - ERROR_MESSAGE: ...
>  - ROW_COUNT: #rows affected
>
>  SELECT * FROM ;
>  \if :ERROR
>\echo oops
>\q
>  \endif
>
> I'm not sure that the names are right. Maybe STATUS would be better than
> RESULT_STATUS.


good ideas

Regards

Pavel

>
>
> --
> Fabien.
>


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-04-04 Thread Andrew Dunstan


On 04/03/2017 05:17 PM, Andres Freund wrote:
> On 2017-03-21 14:31:08 -0400, Andrew Dunstan wrote:
>>
>> On 03/21/2017 01:37 PM, David Steele wrote:
>>> On 3/16/17 11:54 AM, David Steele wrote:
 On 2/1/17 12:53 AM, Michael Paquier wrote:
> On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane  wrote:
>> Nikita Glukhov  writes:
>>> On 25.01.2017 23:58, Tom Lane wrote:
 I think you need to take a second look at the code you're producing
 and realize that it's not so clean either.  This extract from
 populate_record_field, for example, is pretty horrid:
>>> But what if we introduce some helper macros like this:
>>> #define JsValueIsNull(jsv) \
>>>  ((jsv)->is_json ? !(jsv)->val.json.str \
>>>  : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)
>>> #define JsValueIsString(jsv) \
>>>  ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>>>  : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)
>> Yeah, I was wondering about that too.  I'm not sure that you can make
>> a reasonable set of helper macros that will fix this, but if you want
>> to try, go for it.
>>
>> BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have
>> to go back to the manual every darn time to convince myself whether
>> that means (a?b:c)||d or a?b:(c||d).  It's better not to rely on
>> the reader (... or the author) having memorized C's precedence rules
>> in quite that much detail.  Extra parens help.
> Moved to CF 2017-03 as discussion is going on and more review is
> needed on the last set of patches.
>
 The current patches do not apply cleanly at cccbdde:

 $ git apply ../other/0001-introduce-JsonContainerXxx-macros-v04.patch
 error: patch failed: src/backend/utils/adt/jsonb_util.c:328
 error: src/backend/utils/adt/jsonb_util.c: patch does not apply
 error: patch failed: src/backend/utils/adt/jsonfuncs.c:1266
 error: src/backend/utils/adt/jsonfuncs.c: patch does not apply
 error: patch failed: src/include/utils/jsonb.h:218
 error: src/include/utils/jsonb.h: patch does not apply

 In addition, it appears a number of suggestions have been made by Tom
 that warrant new patches.  Marked "Waiting on Author".
>>> This thread has been idle for months since Tom's review.
>>>
>>> The submission has been marked "Returned with Feedback".  Please feel
>>> free to resubmit to a future commitfest.
>>>
>>>
>> Please revive. I am planning to look at this later this week.
> Since that was 10 days ago - you're still planning to get this in before
> the freeze?
>


Hoping to; other demands have intervened a bit, but I might be able to.

cheers

andrew

-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread David Steele
On 4/4/17 12:55 PM, Ashutosh Sharma wrote:
> 
As I have not seen any response from Tomas for the last 2-3 days, and since
the commit-fest is coming towards its end, I have planned to work on the
review comments that I had given a few days back and submit the updated
patch. PFA the new version of the patch that takes care of all the review
comments given by me. I have also run pgindent on the btreefuncs.c file
and have run some basic test cases. All looks fine to me now!

Awesome, Ashutosh!

-- 
-David
da...@pgmasters.net




Re: [HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED

2017-04-04 Thread Tomas Vondra

On 04/04/2017 10:42 PM, Tomas Vondra wrote:

Hi,

Andres nagged me about valgrind runs taking much longer since
9fab40ad introduced the SlabContext into reorderbuffer.c. And by
"longer" I mean hours instead of minutes.

After a bit of investigation I stumbled on this line in slab.c:

  VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk));

which is wrong, because the second parameter should be a size and not
another pointer. This essentially marks a lot of memory as defined,
which results in the extreme test runtime.

Instead, the line should be:

  VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));

Patch attached.



Turns out SlabCheck() was missing VALGRIND_MAKE_MEM_DEFINED, resulting 
in a valgrind failure with --enable-assert. Updated patch version 
attached, passing all tests in test_decoding.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-valgrind-fix-v2.patch
Description: binary/octet-stream



Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread Tomas Vondra

Thanks. I planned to look into this today, but you've been faster ;-)

regards
Tomas

On 04/04/2017 06:55 PM, Ashutosh Sharma wrote:

Hi,

As I have not seen any response from Tomas for the last 2-3 days, and since
the commit-fest is coming towards its end, I have planned to work on the
review comments that I had given a few days back and submit the updated
patch. PFA the new version of the patch that takes care of all the review
comments given by me. I have also run pgindent on the btreefuncs.c file
and have run some basic test cases. All looks fine to me now!

Please note that this patch still belongs to Tomas, not me; I still
remain the reviewer of this patch, and that is clearly noted in the
attached patch. The only intention behind Ashutosh (the reviewer)
working on this patch is to ensure that we do not miss things that
could easily get committed in this commit-fest.
Thanks.



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Covering + unique indexes.

2017-04-04 Thread Peter Geoghegan
On Tue, Apr 4, 2017 at 3:07 AM, Anastasia Lubennikova
 wrote:
>> * I think that we should store this (the number of attributes), and
>> use it directly when comparing, per my remarks to Tom over on that
>> other thread. We should also use the free bit within
>> IndexTupleData.t_info, to indicate that the IndexTuple was truncated,
>> just to make it clear to everyone that might care that that's how
>> these truncated IndexTuples need to be represented.
>>
>> Doing this would have no real impact on your patch, because for you
>> this will be 100% redundant. It will help external tools, and perhaps
>> another, more general suffix truncation patch that comes in the
>> future. We should try very hard to have a future-proof on-disk
>> representation. I think that this is quite possible.
>
> To be honest, I think that it'll make the patch overcomplicated, because this
> exact patch has nothing to do with suffix truncation.
> Although, we can add any necessary flags if this work is continued in
> the future.

Yes, doing things that way would mean adding a bit more complexity to
your patch, but IMV would be worth it to have the on-disk format be
compatible with what a full suffix truncation patch will eventually
require.

Obviously I disagree with what you say here -- I think that your patch
*does* have plenty in common with suffix truncation. But, you don't
have to even agree with me on that to see why what I propose is still
a good idea. Tom Lane had a specific objection to this patch --
catalog metadata is currently necessary to interpret internal page
IndexTuples [1]. However, by storing the true number of columns in the
case of truncated tuples, we can make the situation with IndexTuples
similar enough to the existing situation with heap tuples, where the
number of attributes is available right in the header as "natts". We
don't have to rely on something like catalog metadata from a great
distance, where some caller may forget to pass through the metadata to
a lower level.
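
As a rough illustration of that representation (hypothetical names and an
assumed-free bit value; this is not code from the patch), assuming the one
spare bit in IndexTupleData.t_info is used as the truncation flag:

/*
 * Hypothetical sketch, not from the patch: use the spare bit in
 * IndexTupleData.t_info (see access/itup.h; the low 13 bits hold the
 * tuple size) to mark a truncated tuple, with the true attribute count
 * stored in the tuple so no catalog lookup is needed to interpret it.
 */
#define INDEX_TUPLE_TRUNCATED	0x2000	/* assumed-free t_info bit */

static inline bool
index_tuple_is_truncated(IndexTuple itup)
{
	return (itup->t_info & INDEX_TUPLE_TRUNCATED) != 0;
}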

So, presumably doing things this way addresses Tom's exact objection
to the truncation aspect of this patch [2]. We have the capacity to
store something like natts "for free" -- let's use it. The lack of any
direct source of metadata was called "dangerous". As much as anything
else, I want to remove any danger.

> There is already an assertion.
> Assert(IndexTupleSize(newitup) <= IndexTupleSize(olditup));
> Do you think it is not enough?

I think that a "can't happen" check will work better in the future,
when user defined code could be involved in truncation. Any extra
overhead will be paid relatively infrequently, and will be very low.

>> * I see a small bug. You forgot to teach _bt_findsplitloc() about
>> truncation. It does this currently, which you did not update:
>>
>>  /*
>>   * The first item on the right page becomes the high key of the left
>> page;
>>   * therefore it counts against left space as well as right space.
>>   */
>>  leftfree -= firstrightitemsz;
>>
>> I think that this accounting needs to be fixed.
>
> Could you explain what's wrong with this accounting? We may expect to take
> more space on the left page than will be taken after high-key truncation.
> But I don't see any problem here.

Obviously it would at least be slightly better to have the actual
truncated high key size where that's expected -- not the would-be
untruncated high key size. The code as it stands might lead to a bad
choice of split point in edge-cases.

At the very least, you should change comments to note the issue. I
think it's highly unlikely that this could ever result in a failure to
find a split point, which there are many defenses against already, but
I think I would find that difficult to prove. The intent of the code
is almost as important as the code, at least in my opinion.

[1] 
postgr.es/m/CAH2-Wz=vmdh8pfazx9wah9bn5ast5vrna0xsz+gsfrs12bp...@mail.gmail.com
[2] postgr.es/m/11895.1490983884%40sss.pgh.pa.us
-- 
Peter Geoghegan




[HACKERS] psql - add special variable to reflect the last query status

2017-04-04 Thread Fabien COELHO


After some discussions about what could be useful now that psql scripts
accept tests, this patch sets a few variables which can be used by psql
after a "front door" (i.e. actually typed by the user) query:


 - RESULT_STATUS: the status of the query
 - ERROR: whether the query failed
 - ERROR_MESSAGE: ...
 - ROW_COUNT: #rows affected

 SELECT * FROM ;
 \if :ERROR
   \echo oops
   \q
 \endif

I'm not sure that the names are right. Maybe STATUS would be better than 
RESULT_STATUS.


--
Fabien.

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 3b86612..7006f23 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3449,6 +3449,24 @@ bar
   
 
   
+   ERROR
+   
+
+ Whether the last query failed.
+
+   
+  
+
+  
+   ERROR_MESSAGE
+   
+
+ If the last query failed, the associated error message.
+
+   
+  
+
+  
 FETCH_COUNT
 
 
@@ -3653,6 +3671,25 @@ bar
   
 
   
+   RESULT_STATUS
+   
+
+ Text status of the last query.
+
+   
+  
+
+  
+   ROW_COUNT
+   
+
+ When appropriate, how many rows were returned or affected by the
+ last query.
+
+   
+  
+
+  
 QUIET
 
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a2f1259..74d22fb 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1213,6 +1213,57 @@ PrintQueryResults(PGresult *results)
 	return success;
 }
 
+/*
+ * Set special variables for "front door" queries
+ * - RESULT_STATUS: last query status
+ * - ERROR: TRUE/FALSE, whether an error occurred
+ * - ERROR_MESSAGE: message if an error occurred
+ * - ROW_COUNT: how many rows were returned or affected, if appropriate
+ */
+static void
+SetResultVariables(PGresult *results)
+{
+	bool			success;
+	ExecStatusType	restat = PQresultStatus(results);
+
+	SetVariable(pset.vars, "RESULT_STATUS", PQresStatus(restat));
+
+	switch (restat)
+	{
+		case PGRES_EMPTY_QUERY:
+		case PGRES_TUPLES_OK:
+		case PGRES_COMMAND_OK:
+		case PGRES_COPY_OUT:
+		case PGRES_COPY_IN:
+		case PGRES_COPY_BOTH:
+		case PGRES_SINGLE_TUPLE:
+			success = true;
+			SetVariable(pset.vars, "ERROR", "FALSE");
+			SetVariable(pset.vars, "ERROR_MESSAGE", NULL);
+			break;
+		case PGRES_BAD_RESPONSE:
+		case PGRES_NONFATAL_ERROR:
+		case PGRES_FATAL_ERROR:
+			success = false;
+			SetVariable(pset.vars, "ERROR", "TRUE");
+			SetVariable(pset.vars, "ERROR_MESSAGE", PQerrorMessage(pset.db));
+			break;
+		default:
+			/* dead code */
+			success = false;
+			psql_error("unexpected PQresultStatus: %d\n", restat);
+			break;
+	}
+
+	if (success)
+	{
+		/* set variable if non empty */
+		char   *ntuples = PQcmdTuples(results);
+		SetVariable(pset.vars, "ROW_COUNT", *ntuples ? ntuples : NULL);
+	}
+	else
+		SetVariable(pset.vars, "ROW_COUNT", NULL);
+}
 
 /*
  * SendQuery: send the query string to the backend
@@ -1346,6 +1397,9 @@ SendQuery(const char *query)
 			elapsed_msec = INSTR_TIME_GET_MILLISEC(after);
 		}
 
+		/* set special variables to reflect the result status */
+		SetResultVariables(results);
+
 		/* but printing results isn't: */
 		if (OK && results)
 			OK = PrintQueryResults(results);
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index d602aee..3ee96b5 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -2904,6 +2904,36 @@ bar 'bar' "bar"
 	\echo 'should print #8-1'
 should print #8-1
 \endif
+-- special result variables
+SELECT 1 UNION SELECT 2;
+ ?column? 
+--
+1
+2
+(2 rows)
+
+\echo 'result status: ' :RESULT_STATUS
+result status:  PGRES_TUPLES_OK
+\echo 'number of rows: ' :ROW_COUNT
+number of rows:  2
+SELECT 1 UNION;
+ERROR:  syntax error at or near ";"
+LINE 1: SELECT 1 UNION;
+  ^
+\echo 'result status: ' :RESULT_STATUS
+result status:  PGRES_FATAL_ERROR
+\if :ERROR
+  \echo 'Oops, an error occured...'
+Oops, an error occured...
+  \echo 'error message: ' :ERROR_MESSAGE
+error message:  ERROR:  syntax error at or near ";"
+LINE 1: SELECT 1 UNION;
+  ^
+
+\endif
+;
+\echo 'result status: ' :RESULT_STATUS
+result status:  PGRES_EMPTY_QUERY
 -- SHOW_CONTEXT
 \set SHOW_CONTEXT never
 do $$
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index b56a05f..716fe06 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -526,6 +526,21 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two;
 	\echo 'should print #8-1'
 \endif
 
+-- special result variables
+SELECT 1 UNION SELECT 2;
+\echo 'result status: ' :RESULT_STATUS
+\echo 'number of rows: ' :ROW_COUNT
+
+SELECT 1 UNION;
+\echo 'result status: ' :RESULT_STATUS
+\if :ERROR
+  \echo 'Oops, an error occured...'
+  \echo 'error message: ' 

Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-04 Thread Keith Fiske
On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed  wrote:

> Hello,
>
> Please find attached an updated patch.
> Following has been accomplished in this update:
>
> 1. A new partition can be added after default partition if there are no
> conflicting rows in default partition.
> 2. Solved the crash reported earlier.
>
> Thank you,
> Rahila Syed
>
>
Installed and compiled against commit
60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
-0400) without any issue.

However, running your original example, I'm getting this error:

keith@keith=# CREATE TABLE list_partitioned (
keith(# a int
keith(# ) PARTITION BY LIST (a);
CREATE TABLE
Time: 4.933 ms
keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
VALUES IN (DEFAULT);
CREATE TABLE
Time: 3.492 ms
keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
IN (4,5);
ERROR:  unrecognized node type: 216
Time: 0.979 ms

Also, I'm still of the opinion that denying new partitions for values that
already exist in the default partition would be a cause of confusion. In
order to move the data out of the default and into a proper child, you
would first have to remove that data from the default, store it elsewhere,
create the child, then move it back. If it's only a small amount of data it
may not be a big deal, but if it's a large amount, that could cause quite a
lot of contention if done in a single transaction. Either that, or the user
would have to deal with data existing in the table, disappearing, then
reappearing again.

This also makes it harder to migrate an existing table. Essentially, no
child tables for a large existing data set could ever be created without
going through one of the two situations above.

However, thinking through this, I'm not sure I can see a solution without
global index support. If this restriction is removed, there's still an
issue of data duplication after the necessary child table is added. So I
guess it's a matter of deciding which user experience is better for the
moment?

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com


Re: [HACKERS] Logical decoding on standby

2017-04-04 Thread Andres Freund
On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
> I'm much happier with this. I'm still fixing some issues in the tests
> for 03 and tidying them up, but 03 should allow 01 and 02 to be
> reviewed in their proper context now.

To me this very clearly is too late for v10, and now should be moved to
the next CF.

- Andres




Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-04 Thread Andres Freund
On 2017-03-13 18:14:07 -0700, Peter Geoghegan wrote:
> From 5351b5db257cb39832647d9117465c0217e6268b Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan 
> Date: Thu, 13 Oct 2016 10:54:31 -0700
> Subject: [PATCH 1/2] Avoid copying within tuplesort_gettupleslot().

s/Avoid/Allow to avoid/

> Add a "copy" argument to make it optional to receive a copy of caller
> tuple that is safe to use following a subsequent manipulating of
> tuplesort's state.  This is a performance optimization.  Most existing
> tuplesort_gettupleslot() callers are made to opt out of copying.
> Existing callers that happen to rely on the validity of tuple memory
> beyond subsequent manipulations of the tuplesort request their own copy.
> 
> This brings tuplesort_gettupleslot() in line with
> tuplestore_gettupleslot().  In the future, a "copy" tuplesort_getdatum()
> argument may be added, that similarly allows callers to opt out of
> receiving their own copy of tuple.
> 
> In passing, clarify assumptions that callers of other tuplesort fetch
> routines may make about tuple memory validity, per gripe from Tom Lane.
> ---
>  src/backend/executor/nodeAgg.c | 10 +++---
>  src/backend/executor/nodeSort.c|  5 +++--
>  src/backend/utils/adt/orderedsetaggs.c |  5 +++--
>  src/backend/utils/sort/tuplesort.c | 28 +---
>  src/include/utils/tuplesort.h  |  2 +-
>  5 files changed, 31 insertions(+), 19 deletions(-)
> 
> diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
> index aa08152..b650f71 100644
> --- a/src/backend/executor/nodeAgg.c
> +++ b/src/backend/executor/nodeAgg.c
> @@ -570,6 +570,10 @@ initialize_phase(AggState *aggstate, int newphase)
>   * Fetch a tuple from either the outer plan (for phase 0) or from the sorter
>   * populated by the previous phase.  Copy it to the sorter for the next phase
>   * if any.
> + *
> + * Callers cannot rely on memory for tuple in returned slot remaining valid
> + * past any subsequent manipulation of the sorter, such as another fetch of
> + * input from sorter.  (The sorter may recycle memory.)
>   */

It's not just the sorter, right? I'd just rephrase it as: the caller
cannot rely on the tuple staying valid beyond a single call.



>  static bool
>  tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
> @@ -2091,12 +2092,15 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool 
> forward,
>   * NULL value in leading attribute will set abbreviated value to zeroed
>   * representation, which caller may rely on in abbreviated inequality check.
>   *
> - * The slot receives a copied tuple (sometimes allocated in caller memory
> - * context) that will stay valid regardless of future manipulations of the
> - * tuplesort's state.
> + * If copy is TRUE, the slot receives a copied tuple that will stay valid
> + * regardless of future manipulations of the tuplesort's state.  Memory is
> + * owned by the caller.  If copy is FALSE, the slot will just receive a
> + * pointer to a tuple held within the tuplesort, which is more efficient,
> + * but only safe for callers that are prepared to have any subsequent
> + * manipulation of the tuplesort's state invalidate slot contents.
>   */

Hm. Do we need to note anything about how long caller memory needs to
live? I think we'd get into trouble if the caller does a memory context
reset, without also clearing the slot?
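
A hypothetical caller fragment illustrating that hazard (not from the
patch; the consumer function and context name are made up): with
copy=true the tuple copy is allocated in the caller's memory context, so
the slot has to be cleared before that context is reset.

TupleTableSlot *slot = MakeSingleTupleTableSlot(tupdesc);

/* copy=true: each returned tuple is caller-owned memory */
while (tuplesort_gettupleslot(state, true /* forward */, true /* copy */,
							  slot, NULL))
	process_tuple(slot);		/* hypothetical consumer */

/* drop the last copy before resetting the context it lives in */
ExecClearTuple(slot);
MemoryContextReset(caller_context);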


>  bool
> -tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
> +tuplesort_gettupleslot(Tuplesortstate *state, bool forward, bool copy,
>  TupleTableSlot *slot, Datum *abbrev)
>  {
>   MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
> @@ -2113,8 +2117,10 @@ tuplesort_gettupleslot(Tuplesortstate *state, bool 
> forward,
>   if (state->sortKeys->abbrev_converter && abbrev)
>   *abbrev = stup.datum1;
>  
> - stup.tuple = heap_copy_minimal_tuple((MinimalTuple) stup.tuple);
> - ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, true);
> + if (copy)
> + stup.tuple = heap_copy_minimal_tuple((MinimalTuple) 
> stup.tuple);
> +
> + ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, copy);
>   return true;


Other than these minimal adjustments, this looks good to go to me.

- Andres




[HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED

2017-04-04 Thread Tomas Vondra

Hi,

Andres nagged me about valgrind runs taking much longer since 
9fab40ad introduced the SlabContext into reorderbuffer.c. And by 
"longer" I mean hours instead of minutes.


After a bit of investigation I stumbled on this line in slab.c:

  VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk));

which is wrong, because the second parameter should be a size and not 
another pointer. This essentially marks a lot of memory as defined, 
which results in the extreme test runtime.


Instead, the line should be:

  VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));

Patch attached.
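
For reference, the memcheck client request is
VALGRIND_MAKE_MEM_DEFINED(addr, len) (from valgrind/memcheck.h), i.e. the
second argument is a byte count; the two calls annotated for clarity:

/* buggy: a pointer value is passed where a length is expected, so an
 * address-sized number of bytes gets marked as defined */
VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk));

/* fixed: mark just the intended bytes at the chunk's user pointer */
VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));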

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-valgrind-fix.patch
Description: binary/octet-stream



Re: [HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED

2017-04-04 Thread Andres Freund
On 2017-04-04 23:23:30 +0200, Tomas Vondra wrote:
> On 04/04/2017 10:42 PM, Tomas Vondra wrote:
> > Hi,
> > 
> > Andres nagged to me about valgrind runs taking much longer since
> > 9fab40ad introduced the SlabContext into reorderbuffer.c. And by
> > "longer" I mean hours instead of minutes.
> > 
> > After a bit of investigation I stumbled on this line in slab.c:
> > 
> >   VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk));
> > 
> > which is wrong, because the second parameter should be size and not
> > another pointer. This essentially marks a lot of memory as defined,
> > which results in the extreme test runtime.
> > 
> > Instead, the line should be:
> > 
> >   VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));
> > 
> > Patch attached.
> > 
> 
> Turns out SlabCheck() was missing VALGRIND_MAKE_MEM_DEFINED, resulting in a
> valgrind failure with --enable-assert. Updated patch version attached,
> passing all tests in test_decoding.

Pushed, and re-enabled TestDecoding on skink.




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tsunakawa, Takayuki
From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
> It's too late. Someone has already moved the patch to the next CF (for
> PostgreSQL 11).

Yes, but this patch will be necessary by the final release of PG 10 if the 
libpq batch/pipelining is committed in PG 10.

I marked this as ready for committer in the next CF, so that some committer can 
pick up this patch and consider putting it in PG 10.  If you decide to modify 
the patch, please change the status.

Regards
Takayuki Tsunakawa







Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Andres Freund
On 2017-01-05 03:12:09 +, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Magnus Hagander
> > For the pg_ctl changes, we're going from removing all privilieges from the
> > token, to removing none. Are there any other privileges that we should be
> > worried about? I think you may be correct in that it's overkill to do it,
> > but I think we need some more specifics to decide that.
> 
> This page lists the privileges.  Is there anyhing you are concerned about?
> 
> https://msdn.microsoft.com/ja-jp/library/windows/desktop/bb530716(v=vs.85).aspx

Aren't nearly all of them a concern?  We've gone from having some
containment (not being able to create users, shut the system down, ...)
to none.  I do however think there's a fair argument to be made that
other platforms do not have similar containment (no root, but sudo etc.
is still possible), and that the containment isn't super strong.

Can't we, to reduce the size of the behavioural change, just use
AdjustTokenPrivileges() to re-add the privileges we want?

Regards,

Andres




[HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-04 Thread Michael Paquier
Hi all,

There is still one open item pending for SCRAM that has not been
treated which is mentioned here:
https://www.postgresql.org/message-id/b081887e-1712-3aa4-7dbe-e012333d5...@iki.fi

When doing authentication with SASL, the server decides which
mechanism the client has to use. As SCRAM-SHA-256 is only one such
mechanism, it would be nice to have something more generic and have
the server return to the client a list of mechanisms that the client
can choose from, and also have the server acknowledge which mechanism
is going to be used.

Note that RFC4422 has some content on the matter
https://tools.ietf.org/html/rfc4422#section-3.1:
   Mechanism negotiation is protocol specific.

   Commonly, a protocol will specify that the server advertises
   supported and available mechanisms to the client via some facility
   provided by the protocol, and the client will then select the "best"
   mechanism from this list that it supports and finds suitable.

So once the server sends back the list of mechanisms that are
supported, the client is free to use what it wants.

On HEAD, an 'R' message with AUTH_REQ_SASL followed by
SCRAM_SHA256_NAME is sent to let the client know which mechanism to
use for the SASL exchange. In the future, this should be extended so
that a list of names is sent, for example a comma-separated list, but
we are free to choose the format we want here. With this list at hand,
the client can then choose the mechanism it thinks is best. Still,
there is a gap with our current implementation, because the server
expects the first message from the client to have a SCRAM format, but
that's true only if SCRAM-SHA-256 is used as the mechanism.
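
To illustrate the client side of that negotiation, a hypothetical sketch
(not existing libpq code; names are made up) of picking a mechanism from
a server-advertised, comma-separated list:

#include <string.h>

/* client's supported mechanisms, in order of preference */
static const char *client_mechs[] = {"SCRAM-SHA-256", NULL};

static const char *
choose_sasl_mechanism(const char *server_list)
{
	int			i;

	for (i = 0; client_mechs[i] != NULL; i++)
	{
		/* naive substring match, good enough for a sketch */
		if (strstr(server_list, client_mechs[i]) != NULL)
			return client_mechs[i];
	}
	return NULL;				/* no mechanism in common */
}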

In order to cover this gap, it seems to me that we need an
intermediate state before the server is switched to FE_SCRAM_INIT, so
that the mechanism to use is negotiated between the two parties. Once
the mechanism negotiation is done, the server can then move on with
the chosen mechanism. This would be important in the future to allow
more SASL mechanisms to work. I am adding an open item for that.

For extensibility, we may also want to revisit the choice of defining
'scram' in pg_hba.conf instead of 'sasl'...

Thoughts?
-- 
Michael




Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andres Freund
> I don't think the errdetail is quite right - OpenProcessToken isn't really
> a syscall, is it? But then it's a common pattern already in win32_shmem.c...

Yes, "Win32 API function" would be correct, but I followed the existing code...


> > +errdetail("Failed system call was %s.",
> > +"LookupPrivilegeValue")));
> 
> Other places in the file actually log the arguments to the function...

The only place is CreateFileMapping.  Other places (DuplicateHandle and 
MapViewOfFileEx) don't log arguments.  I guess the original developer thought 
that size and name arguments to CreateFileMapping() might be useful for 
troubleshooting.


> Wonder if we should quote "Lock Pages in Memory" or add dashes, to make
> sure it's clear that that's the right?

I saw several Microsoft pages, including a page someone pointed me to here, and
they didn't quote the user right.  I'm comfortable with the current code, but I
don't mind if the committer adds some quotation.


> > +   flProtect = PAGE_READWRITE | SEC_COMMIT |
> SEC_LARGE_PAGES;
> 
> Why don't we just add the relevant flag, instead of overwriting the previous
> contents?

I don't have strong opinion on this...

> Uh - isn't that a behavioural change?  Was this discussed?

Yes, I described this in the first mail.  Magnus asked about this later and I 
replied.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-04 16:10:32 +0900, Tatsuo Ishii wrote:
> >> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute 
> >> starts and stops the timer), then it's a concern and the patch should not 
> >> be ready for committer.  However, the current patch is not like that -- it 
> >> seems to do what others in this thread are expecting.
> > 
> > Oh, interesting - I kind of took the author's statement as, uh,
> > authoritative ;).  A quick look over the patch confirms your
> > understanding.
> 
> Yes, Tsunakawa-san is correct. Sorry for confusion.
> 
> > I think the code needs a few clarifying comments around this, but
> > otherwise seems good.  Not restarting the timeout in those cases
> > obviously isn't entirely "perfect"/"correct", but a tradeoff - the
> > comments should note that.
> > 
> > Tatsuo-san, do you want to change those, and push?  I can otherwise.
> 
> Andres,
> 
> If you don't mind, could you please fix the comments and push it.

Hm. I started to edit it, but I'm halfway coming back to my previous
view that this isn't necessarily ready.

If a client were to prepare a large number of prepared statements
(i.e. a lot of parse messages), this'll only start the timeout once, at
the first statement sent.  It's not an issue for libpq which sends a
sync message after each PQprepare, nor does it look to be an issue for
pgjdbc based on a quick look.
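
For illustration, a minimal libpq sketch (connection string and statement
are made up) of why that pattern is safe for libpq: each PQprepare() puts
a Parse immediately followed by a Sync on the wire, so under the patch
being discussed the timeout window covers one statement at a time.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}

	/* Parse + Sync on the wire; the statement timeout is armed and
	 * disarmed around this single statement */
	res = PQprepare(conn, "s1", "SELECT pg_sleep($1)", 1, NULL);
	PQclear(res);

	PQfinish(conn);
	return 0;
}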

Does anybody think there might be a driver out there that sends a bunch
of 'parse' messages, without syncs?

- Andres




Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-04 Thread Craig Ringer
On 6 March 2017 at 05:09, Jim Nasby  wrote:
> On 2/28/17 9:42 PM, Jim Nasby wrote:
>>>
>>>
>>> I'll post a plpython patch that doesn't add the output format control.
>>
>>
>> I've attached the results of that. Unfortunately the speed improvement
>> is only 27% at this point (with 999 tuples). Presumably that's
>> because it's constructing a brand new dictionary from scratch for each
>> tuple.

Taking a look at this now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pgbench - allow to store select results into variables

2017-04-04 Thread Tatsuo Ishii
>> Please find attached a v8 which hopefully fixes these two issues.
> Looks good to me, marking as ready for committer.

I have looked into this a little bit.

It seems the new feature \gset doesn't work with tables having
non-ASCII column names:

$ src/bin/pgbench/pgbench -t 1 -f /tmp/f test
starting vacuum...end.
gset: invalid variable name: "数字"
client 0 file 0 command 0 compound 0: error storing into var 数字
transaction type: /tmp/f
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 1
number of transactions actually processed: 0/1

This is because pgbench variable names are limited to ASCII
characters. IMO the limitation is unnecessary and should be removed.
(I know that the limitation was brought in by someone a long time ago
and the patch author is not responsible for that.)

Anyway, now that the feature reveals the undocumented limitation, we
should document the limitation of \gset at least.
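
A hypothetical sketch (not from the \gset patch, names made up) of what a
relaxed variable-name check might look like, accepting any byte with the
high bit set so that multi-byte names such as 数字 pass:

#include <ctype.h>
#include <stdbool.h>

static bool
is_valid_variable_name(const char *name)
{
	const unsigned char *p = (const unsigned char *) name;

	if (*p == '\0')
		return false;
	for (; *p != '\0'; p++)
	{
		/* allow ASCII letters, digits, '_' and any high-bit-set byte */
		if (!(isalnum(*p) || *p == '_' || *p >= 0x80))
			return false;
	}
	return true;
}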

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
> Hm. I started to edit it, but I'm halfway coming back to my previous
> view that this isn't necessarily ready.
> 
> If a client were to to prepare a large number of prepared statements
> (i.e. a lot of parse messages), this'll only start the timeout once, at
> the first statement sent.  It's not an issue for libpq which sends a
> sync message after each PQprepare, nor does it look to be an issue for
> pgjdbc based on a quick look.
> 
> Does anybody think there might be a driver out there that sends a bunch
> of 'parse' messages, without syncs?

What's the point of your question? What kind of problem do you expect
if the timeout starts only once, at the first parse message out of a
bunch of parse messages?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




[HACKERS] Outdated comments around HandleFunctionRequest

2017-04-04 Thread Andres Freund
Hi,

PostgresMain() has the following blurb for fastpath functions:

/*
 * Note: we may at this point be inside an aborted
 * transaction.  We can't throw error for that until we've
 * finished reading the function-call message, so
 * HandleFunctionRequest() must check for it after doing so.
 * Be careful not to do anything that assumes we're inside a
 * valid transaction here.
 */
and in HandleFunctionRequest() there's:

 * INPUT:
 *  In protocol version 3, postgres.c has already read the message body
 *  and will pass it in msgBuf.
 *  In old protocol, the passed msgBuf is empty and we must read the
 *  message here.

which is not true anymore.  Followed by:

/*
 * Now that we've eaten the input message, check to see if we actually
 * want to do the function call or not.  It's now safe to ereport(); we
 * won't lose sync with the frontend.
 */

which is also not really meaningful, because there's no previous code in
the function.

This largely seems to be damage from


commit 2b3a8b20c2da9f39ffecae25ab7c66974fbc0d3b
Author: Heikki Linnakangas 
Date:   2015-02-02 17:08:45 +0200

Be more careful to not lose sync in the FE/BE protocol.

Heikki?


- Andres




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tatsuo Ishii
> Hmm. IMO, that could happen even with the current statement timeout
> implementation as well.
> 
> Or we could start/stop the timeout in exec_execute_message() only. This
> could avoid the problem above. Also this is more consistent with
> log_duration/log_min_duration_statement behavior than now.

I think it's better to include Parse and Bind, as now.  Parse or Bind could take
a long time when the table has many partitions, the query is complex and/or very
long, some DDL statement is running against a target table, or the system load
is high.

Firing the statement timeout during or after many Parses is not a problem,
because the first Parse started running some statement and it has not finished.
Plus, Andres confirmed that major client drivers don't use such a pattern.
There may be odd behavior purely from the perspective of the extended query
protocol, but that's a compromise, which is what Andres meant by "not perfect
but."

Regards
Takayuki Tsunakawa





Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Tsunakawa, Takayuki
From: Craig Ringer [mailto:craig.rin...@2ndquadrant.com]
> TBH, anyone who cares about security and runs Win7 or Win2k8 or newer should
> be using virtual service accounts and managed service accounts.
> 
> https://technet.microsoft.com/en-us/library/dd548356
> 
> 
> Those are more like Unix service accounts. Notably they don't need a password,
> getting rid of some of the management pain that led us to abandon the
> 'postgres' system user on Windows.
> 
> Now that older platforms are EoL and even the oldest that added this feature
> are also near EoL or in extended maintenance, I think installers should
> switch to these by default instead of using NETWORK SERVICE.
> 
> Then the issue of priv dropping would be a lesser concern anyway.

Good point!  And as I said earlier in this thread, I think managing privileges
(adding/revoking privileges from the user account) is the DBA's or sysadmin's
duty, and PG's removing all privileges feels like overkill.

OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as-is and add Lock Pages
in Memory, using the attached pg_ctl.c.  Please see EnableLockPagesPrivilege()
and its call site.  But pg_ctl -w start fails, emitting the following message:

error code 1300
waiting for server to startFATAL:  could not enable "Lock pages in memory" 
privilege
HINT:  Assign "Lock pages in memory" privilege to the Windows user account 
which runs PostgreSQL.
LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

Error code 1300 is ERROR_NOT_ALL_ASSIGNED, which means AdjustTokenPrivileges()
could not enable the Lock Pages in Memory privilege.  It seems that the
privilege cannot be enabled once it has been removed with
CreateRestrictedToken(DISABLE_MAX_PRIVILEGE).
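
For reference, a minimal sketch of what such an EnableLockPagesPrivilege()
might look like (the attached pg_ctl.c is cut off above, so this is an
assumption, not the actual code from the attachment):

#include <windows.h>

static BOOL
EnableLockPagesPrivilege(void)
{
	HANDLE		hToken;
	TOKEN_PRIVILEGES tp;
	LUID		luid;

	if (!OpenProcessToken(GetCurrentProcess(),
						  TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
		return FALSE;

	/* SE_LOCK_MEMORY_NAME is the "Lock pages in memory" user right */
	if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &luid))
	{
		CloseHandle(hToken);
		return FALSE;
	}

	tp.PrivilegeCount = 1;
	tp.Privileges[0].Luid = luid;
	tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

	/* ERROR_NOT_ALL_ASSIGNED (1300) is what a restricted token yields */
	if (!AdjustTokenPrivileges(hToken, FALSE, &tp, 0, NULL, NULL) ||
		GetLastError() == ERROR_NOT_ALL_ASSIGNED)
	{
		CloseHandle(hToken);
		return FALSE;
	}

	CloseHandle(hToken);
	return TRUE;
}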

Regards
Takayuki Tsunakawa


/*-
 *
 * pg_ctl --- start/stops/restarts the PostgreSQL server
 *
 * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
 *
 * src/bin/pg_ctl/pg_ctl.c
 *
 *-
 */

#ifdef WIN32
/*
 * Need this to get defines for restricted tokens and jobs. And it
 * has to be set before any header from the Win32 API is loaded.
 */
#define _WIN32_WINNT 0x0501
#endif

#include "postgres_fe.h"

#include "libpq-fe.h"
#include "pqexpbuffer.h"

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#ifdef HAVE_SYS_RESOURCE_H
#include 
#include 
#endif

#include "getopt_long.h"
#include "miscadmin.h"

/* PID can be negative for standalone backend */
typedef long pgpid_t;


typedef enum
{
SMART_MODE,
FAST_MODE,
IMMEDIATE_MODE
} ShutdownMode;


typedef enum
{
NO_COMMAND = 0,
INIT_COMMAND,
START_COMMAND,
STOP_COMMAND,
RESTART_COMMAND,
RELOAD_COMMAND,
STATUS_COMMAND,
PROMOTE_COMMAND,
KILL_COMMAND,
REGISTER_COMMAND,
UNREGISTER_COMMAND,
RUN_AS_SERVICE_COMMAND
} CtlCommand;

#define DEFAULT_WAIT	60

static bool do_wait = false;
static bool del_service_rid = false;
static bool wait_set = false;
static int  wait_seconds = DEFAULT_WAIT;
static bool wait_seconds_arg = false;
static bool silent_mode = false;
static ShutdownMode shutdown_mode = FAST_MODE;
static int  sig = SIGINT;   /* default */
static CtlCommand ctl_command = NO_COMMAND;
static char *pg_data = NULL;
static char *pg_config = NULL;
static char *pgdata_opt = NULL;
static char *post_opts = NULL;
static const char *progname;
static char *log_file = NULL;
static char *exec_path = NULL;
static char *event_source = NULL;
static char *register_servicename = "PostgreSQL";   /* FIXME: + 
version ID? */
static char *register_username = NULL;
static char *register_password = NULL;
static char *argv0 = NULL;
static bool allow_core_files = false;
static time_t start_time;

static char postopts_file[MAXPGPATH];
static char version_file[MAXPGPATH];
static char pid_file[MAXPGPATH];
static char backup_file[MAXPGPATH];
static char recovery_file[MAXPGPATH];
static char promote_file[MAXPGPATH];

#ifdef WIN32
static DWORD pgctl_start_type = SERVICE_AUTO_START;
static SERVICE_STATUS status;
static SERVICE_STATUS_HANDLE hStatus = (SERVICE_STATUS_HANDLE) 0;
static HANDLE shutdownHandles[2];
static pid_t postmasterPID = -1;

#define shutdownEvent shutdownHandles[0]
#define postmasterProcess shutdownHandles[1]
#endif


static void write_stderr(const char *fmt,...) pg_attribute_printf(1, 2);
static void do_advice(void);
static void do_help(void);
static void set_mode(char *modeopt);
static void set_sig(char *signame);
static void do_init(void);
static void do_start(void);
static void do_stop(void);
static void do_restart(void);
static void do_reload(void);
static void do_status(void);
static void do_promote(void);
static void do_kill(pgpid_t pid);
static void print_msg(const char *msg);
static void 

Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tsunakawa, Takayuki
From: Andres Freund [mailto:and...@anarazel.de]
> Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbs and
> npgsql doing it seems like some evidence that it's ok.

And psqlODBC now always uses libpq.

Now time for final decision?

Regards
Takayuki Tsunakawa






Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 6:56 PM, Joe Conway  wrote:
> On 04/04/2017 10:02 AM, Joe Conway wrote:
>> On 04/04/2017 09:55 AM, Mike Palmiotto wrote:
>>> After some discussion off-list, I've rebased and udpated the patches.
>>> Please see attached for further review.
>>
>> Thanks -- will have another look and test on a machine with selinux
>> setup. Robert, did you want me to take responsibility to commit on this
>> or just provide review/feedback?
>
> I did some editorializing on these.
>
> In particular I did not like the approach to fixing "warning: ‘tclass’
> may be used uninitialized" and ended up just doing it the same as was
> done elsewhere in relation.c already (set tclass = 0 in the variable
> declaration). Along the way I also changed one instance of tclass from
> uint16 to uint16_t for the sake of consistency.
>
> Interestingly we figured out that the warning was present with -Og, but
> not present with -O0, -O2, or -O3.
>
> If you want to test, apply 0001a and 0001b before 0002.
>
> Any objections?

I'm guessing Tom's going to have a strong feeling about whether 0001a
is the right way to address the stdbool issue, but I don't personally
know what the right thing to do is so I will avoid opining on that
topic.

So, nope, no objections here to you committing those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
> On 2017-04-04 16:56:26 -0700, 'Andres Freund' wrote:
>> On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote:
>> > From: Andres Freund [mailto:and...@anarazel.de]
>> > > Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbs 
>> > > and
>> > > npgsql doing it seems like some evidence that it's ok.
>> > 
>> > And psqlODBC now uses always libpq.
>> > 
>> > Now time for final decision?
>> 
>> I'll send an updated patch in a bit, and then will wait till tomorrow to
>> push. Giving others a chance to chime in seems fair.
> 
> Attached.  I did not like that the previous patch had the timeout
> handling duplicated in the individual functions, I instead centralized
> it into start_xact_command().  Given that it already activated the
> timeout in the most common cases, that seems to make more sense to
> me. In your version we'd have called enable_statement_timeout() twice
> consecutively (which behaviourally is harmless).
> 
> What do you think?  I've not really tested this with the extended
> protocol, so I'd appreciate if you could rerun your test from the
> older thread.

Ok, I will look into your patch and test it out using pgproto.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Joe Conway
On 04/04/2017 10:02 AM, Joe Conway wrote:
> On 04/04/2017 09:55 AM, Mike Palmiotto wrote:
>> After some discussion off-list, I've rebased and udpated the patches.
>> Please see attached for further review.
> 
> Thanks -- will have another look and test on a machine with selinux
> setup. Robert, did you want me to take responsibility to commit on this
> or just provide review/feedback?

I did some editorializing on these.

In particular I did not like the approach to fixing "warning: ‘tclass’
may be used uninitialized" and ended up just doing it the same as was
done elsewhere in relation.c already (set tclass = 0 in the variable
declaration). Along the way I also changed one instance of tclass from
uint16 to uint16_t for the sake of consistency.

Interestingly we figured out that the warning was present with -Og, but
not present with -O0, -O2, or -O3.

If you want to test, apply 0001a and 0001b before 0002.

Any objections?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 1a8f884..320c098 100644
*** a/contrib/sepgsql/label.c
--- b/contrib/sepgsql/label.c
***
*** 10,15 
--- 10,25 
   */
  #include "postgres.h"
  
+ #include 
+ /*
+  * selinux/label.h includes stdbool.h, which redefines bool, so
+  * revert to the postgres definition of bool from c.h
+  */
+ #ifdef bool
+ #undef bool
+ typedef char bool;
+ #endif
+ 
  #include "access/heapam.h"
  #include "access/htup_details.h"
  #include "access/genam.h"
***
*** 37,44 
  
  #include "sepgsql.h"
  
- #include 
- 
  /*
   * Saved hook entries (if stacked)
   */
--- 47,52 
diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c
index ab98a9b..2ea6bfb 100644
*** a/contrib/sepgsql/relation.c
--- b/contrib/sepgsql/relation.c
*** sepgsql_relation_post_create(Oid relOid)
*** 243,249 
  	HeapTuple	tuple;
  	Form_pg_class classForm;
  	ObjectAddress object;
! 	uint16		tclass;
  	char	   *scontext;		/* subject */
  	char	   *tcontext;		/* schema */
  	char	   *rcontext;		/* relation */
--- 243,249 
  	HeapTuple	tuple;
  	Form_pg_class classForm;
  	ObjectAddress object;
! 	uint16_t	tclass;
  	char	   *scontext;		/* subject */
  	char	   *tcontext;		/* schema */
  	char	   *rcontext;		/* relation */
*** sepgsql_relation_drop(Oid relOid)
*** 413,419 
  {
  	ObjectAddress object;
  	char	   *audit_name;
! 	uint16_t	tclass;
  	char		relkind;
  
  	relkind = get_rel_relkind(relOid);
--- 413,419 
  {
  	ObjectAddress object;
  	char	   *audit_name;
! 	uint16_t	tclass = 0;
  	char		relkind;
  
  	relkind = get_rel_relkind(relOid);
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 1a8f884..4dda82a 100644
*** a/contrib/sepgsql/label.c
--- b/contrib/sepgsql/label.c
*** exec_object_restorecon(struct selabel_ha
*** 779,785 
  			case RelationRelationId:
  relForm = (Form_pg_class) GETSTRUCT(tuple);
  
! if (relForm->relkind == RELKIND_RELATION)
  	objtype = SELABEL_DB_TABLE;
  else if (relForm->relkind == RELKIND_SEQUENCE)
  	objtype = SELABEL_DB_SEQUENCE;
--- 779,786 
  			case RelationRelationId:
  relForm = (Form_pg_class) GETSTRUCT(tuple);
  
! if (relForm->relkind == RELKIND_RELATION ||
! 	relForm->relkind == RELKIND_PARTITIONED_TABLE)
  	objtype = SELABEL_DB_TABLE;
  else if (relForm->relkind == RELKIND_SEQUENCE)
  	objtype = SELABEL_DB_SEQUENCE;
diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c
index ab98a9b..f8689c0 100644
*** a/contrib/sepgsql/relation.c
--- b/contrib/sepgsql/relation.c
*** sepgsql_attribute_post_create(Oid relOid
*** 54,65 
  	ObjectAddress object;
  	Form_pg_attribute attForm;
  	StringInfoData audit_name;
  
  	/*
! 	 * Only attributes within regular relation have individual security
! 	 * labels.
  	 */
! 	if (get_rel_relkind(relOid) != RELKIND_RELATION)
  		return;
  
  	/*
--- 54,66 
  	ObjectAddress object;
  	Form_pg_attribute attForm;
  	StringInfoData audit_name;
+ 	char		relkind = get_rel_relkind(relOid);
  
  	/*
! 	 * Only attributes within regular relations or partition relations have
! 	 * individual security labels.
  	 */
! 	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
  		return;
  
  	/*
*** sepgsql_attribute_drop(Oid relOid, AttrN
*** 135,142 
  {
  	ObjectAddress object;
  	char	   *audit_name;
  
! 	if (get_rel_relkind(relOid) != RELKIND_RELATION)
  		return;
  
  	/*
--- 136,144 
  {
  	ObjectAddress object;
  	char	   *audit_name;
+ 	char		relkind = get_rel_relkind(relOid);
  
! 	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
  		return;
  
  	/*
*** sepgsql_attribute_relabel(Oid relOid, At
*** 167,174 
  {
  	ObjectAddress object;
  	char	   *audit_name;
  
! 	if 

Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-05 10:05:19 +0900, Tatsuo Ishii wrote:
> > Hm. I started to edit it, but I'm halfway coming back to my previous
> > view that this isn't necessarily ready.
> > 
> > If a client were to to prepare a large number of prepared statements
> > (i.e. a lot of parse messages), this'll only start the timeout once, at
> > the first statement sent.  It's not an issue for libpq which sends a
> > sync message after each PQprepare, nor does it look to be an issue for
> > pgjdbc based on a quick look.
> > 
> > Does anybody think there might be a driver out there that sends a bunch
> > of 'parse' messages, without syncs?
> 
> What's your point of the question? What kind of problem do you expect
> if the timeout starts only once at the first parse meesage out of
> bunch of parse messages?

It's perfectly valid to send a lot of Parse messages without
interspersed Sync or Bind/Execute messages.  There'll be one timeout
covering all of those Parse messages, which can thus lead to a timeout
even though nothing individually takes long.

Greetings,

Andres Freund




Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Masahiko Sawada
On Wed, Apr 5, 2017 at 1:49 AM, Robert Haas  wrote:
> On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote
>  wrote:
>> Hmm, you're right.  It could be counted with a separate variable
>> initialized to 0 and incremented every time we decide to add a row to the
>> final set of sampled rows, although different implementations of
>> AcquireSampleRowsFunc have different ways of deciding if a given row will
>> be part of the final set of sampled rows.
>>
>> On the other hand, if we decide to count progress in terms of blocks as
>> you suggested, I'm afraid that FDWs won't be able to report the
>> progress.
>
> I think it may be time to push this patch out to v11.  It was
> submitted one day before the start of the last CommitFest, the design
> wasn't really right, and it's not clear even now that we know what the
> right design is.  And we're pretty much out of time.
>

+1
We're encountering design issues, and it will take more time to find the
right design, including FDW support.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-04-04 Thread Michael Paquier
On Tue, Apr 4, 2017 at 9:30 PM, Stephen Frost  wrote:
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Tue, Apr 4, 2017 at 7:38 AM, Stephen Frost  wrote:
>> The patch presented here does lower the coverage we have now.
>
> I assume (perhaps mistakenly) that this statement was based on an
> analysis of before-and-after 'make coverage' runs.  Here you are saying
> that you're *not* lowering the coverage.

My apologies here: when I used the word "coverage" in my previous
emails, it apparently implied that I had run the coverage tooling.
But I did not, because the proposed TAP test does exactly what
test.sh is doing:
1) Initialize the old cluster and start it.
2) Create a bunch of databases with the full range of ASCII characters.
3) Run the regression tests.
4) Take a dump of the old cluster.
5) Stop the old cluster.
6) Initialize the new cluster.
7) Run pg_upgrade.
8) Start the new cluster.
9) Take a dump from it.
10) Run the deletion script (oops, forgot this part!)

> I understand how the current pg_upgrade tests work.  I don't see
> off-hand why the TAP tests would reduce the code coverage of pg_upgrade,
> but if they do, we should be able to figure out why and correct it.

Good news is that this patch at least does not lower the bar.

>> So in short I don't think that this lack of infrastructure should be a
>> barrier for what is basically a cleanup but... I just work here.
>
> I didn't mean to imply that this patch needs to address the
> cross-version testing challenge, was merely mentioning that there's been
> some work in this area already by Andrew and that if you're interested
> in working on that problem that you should probably coordinate with him.

Sure.

> What I do think is a barrier to this patch moving forward is if it
> reduces our current code coverage testing (with the same-version
> pg_upgrade that's run in the regular regression tests).  If it doesn't,
> then great, but if it does, then the patch should be updated to fix
> that.

I did not do a coverage test first, but surely this patch needs
numbers, so here you go.

Without the patch, here is the coverage of src/bin/pg_upgrade:
  lines..: 57.7% (1311 of 2273 lines)
  functions..: 85.3% (87 of 102 functions)

And with the patch:
  lines..: 58.8% (1349 of 2294 lines)
  functions..: 85.6% (89 of 104 functions)
The coverage gets a bit higher as a couple of basic code paths like
pg_upgrade --help get looked at.
-- 
Michael


pgupgrade-tap-test-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-04 Thread Simon Riggs
On 27 March 2017 at 15:36, Beena Emerson  wrote:

> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to mb.

Committed first part to allow internal representation change (only).

No commitment yet to increasing wal-segsize in the way this patch has it.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
Andres,

>> I think the code needs a few clarifying comments around this, but
>> otherwise seems good.  Not restarting the timeout in those cases
>> obviously isn't entirely "perfect"/"correct", but a tradeoff - the
>> comments should note that.
>> 
>> Tatsuo-san, do you want to change those, and push?  I can otherwise.
> 
> Andres,
> 
> If you don't mind, could you please fix the comments and push it.

I have changed the comments as you suggested. If it's ok, I can push
the patch myself (today I have time to work on this).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2282058..1fd93359 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -149,6 +149,11 @@ static bool doing_extended_query_message = false;
 static bool ignore_till_sync = false;
 
 /*
+ * Flag to keep track of whether we have started statement timeout timer.
+ */
+static bool stmt_timer_started = false;
+
+/*
  * If an unnamed prepared statement exists, it's stored here.
  * We keep it separate from the hashtable kept by commands/prepare.c
  * in order to reduce overhead for short-lived queries.
@@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *pstmts);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static void enable_statement_timeout(void);
+static void disable_statement_timeout(void);
 
 
 /* 
@@ -1239,6 +1246,15 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	start_xact_command();
 
 	/*
+	 * Set statement timeout running if it's not already set. The timer will
+	 * be reset in a subsequent execute message. Ideally the timer should be
+	 * reset in this function so that each parse/bind/execute message catches
+	 * the timeout, but it needs gettimeofday() call for each, which is too
+	 * expensive.
+	 */
+	enable_statement_timeout();
+
+	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 *
 	 * We have two strategies depending on whether the prepared statement is
@@ -1526,6 +1542,15 @@ exec_bind_message(StringInfo input_message)
 	 */
 	start_xact_command();
 
+	/*
+	 * Set statement timeout running if it's not already set. The timer will
+	 * be reset in a subsequent execute message. Ideally the timer should be
+	 * reset in this function so that each parse/bind/execute message catches
+	 * the timeout, but it needs gettimeofday() call for each, which is too
+	 * expensive.
+	 */
+	enable_statement_timeout();
+
 	/* Switch back to message context */
 	MemoryContextSwitchTo(MessageContext);
 
@@ -1942,6 +1967,11 @@ exec_execute_message(const char *portal_name, long max_rows)
 	start_xact_command();
 
 	/*
+	 * Set statement timeout running if it's not already set.
+	 */
+	enable_statement_timeout();
+
+	/*
 	 * If we re-issue an Execute protocol request against an existing portal,
 	 * then we are only fetching more rows rather than completely re-executing
 	 * the query from the start. atStart is never reset for a v3 portal, so we
@@ -2014,6 +2044,11 @@ exec_execute_message(const char *portal_name, long max_rows)
 			 * those that start or end a transaction block.
 			 */
 			CommandCounterIncrement();
+
+			/*
+			 * We need to reset statement timeout if already set.
+			 */
+			disable_statement_timeout();
 		}
 
 		/* Send appropriate CommandComplete to client */
@@ -2443,14 +2478,10 @@ start_xact_command(void)
 	{
 		StartTransactionCommand();
 
-		/* Set statement timeout running, if any */
-		/* NB: this mustn't be enabled until we are within an xact */
-		if (StatementTimeout > 0)
-			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
-		else
-			disable_timeout(STATEMENT_TIMEOUT, false);
-
 		xact_started = true;
+
+		/* Set statement timeout running, if any */
+		enable_statement_timeout();
 	}
 }
 
@@ -2460,7 +2491,7 @@ finish_xact_command(void)
 	if (xact_started)
 	{
 		/* Cancel any active statement timeout before committing */
-		disable_timeout(STATEMENT_TIMEOUT, false);
+		disable_statement_timeout();
 
 		CommitTransactionCommand();
 
@@ -4511,3 +4542,53 @@ log_disconnections(int code, Datum arg)
 	port->user_name, port->database_name, port->remote_host,
   port->remote_port[0] ? " port=" : "", port->remote_port)));
 }
+
+/*
+ * Set statement timeout if any.
+ */
+static void
+enable_statement_timeout(void)
+{
+	if (!stmt_timer_started)
+	{
+		if (StatementTimeout > 0)
+		{
+
+			/*
+			 * Sanity check
+			 */
+			if (!xact_started)
+			{
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("Transaction must have been already started to set statement timeout")));
+			}
+
+			ereport(DEBUG3,
+	(errmsg_internal("Set statement timeout")));
+
+			enable_timeout_after(STATEMENT_TIMEOUT, 

Re: [HACKERS] delta relations in AFTER triggers

2017-04-04 Thread Kevin Grittner
On Mon, Apr 3, 2017 at 7:16 PM, Thomas Munro
 wrote:
> On Tue, Apr 4, 2017 at 3:41 AM, Kevin Grittner  wrote:
>> On Mon, Apr 3, 2017 at 8:59 AM, Tom Lane  wrote:
>>> Thomas Munro  writes:
 Or perhaps the code to inject trigger data transition tables into SPI
 (a near identical code block these three patches) should be somewhere
 common so that each PLs would only need to call a function.  If so,
 where should that go?
>>>
>>> spi.c?
>>
>> Until now, trigger.c didn't know about SPI, and spi.c didn't know
>> about triggers.  The intersection was left to referencing code, like
>> PLs.  Is there any other common code among the PLs dealing with this
>> intersection?  If so, maybe a new triggerspi.c file (or
>> spitrigger.c?) would make sense.  Possibly it could make sense from
>> a code structure PoV even for a single function, but it seems kinda
>> iffy for just this function.  As far as I can see it comes down to
>> adding it to spi.c or creating a new file -- or just duplicating
>> these 30-some lines of code to every PL.
>
> Ok, how about SPI_register_trigger_data(TriggerData *)?  Or any name
> you prefer...  I didn't suggest something as specific as
> SPI_register_transition_tables because think it's plausible that
> someone might want to implement SQL standard REFERENCING OLD/NEW ROW
> AS and make that work in all PLs too one day, and this would be the
> place to do that.
>
> See attached, which add adds the call to all four built-in PLs.  Thoughts?

Worked on the docs some more and then pushed it.

Nice job cutting the number of *.[ch] lines by 30 while adding support for
the other three core PLs.  :-)

-- 
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread 'Andres Freund'
On 2017-04-04 16:56:26 -0700, 'Andres Freund' wrote:
> On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote:
> > From: Andres Freund [mailto:and...@anarazel.de]
> > > Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbc and
> > > npgsql doing it seems like some evidence that it's ok.
> > 
> > And psqlODBC now uses always libpq.
> > 
> > Now time for final decision?
> 
> I'll send an updated patch in a bit, and then will wait till tomorrow to
> push. Giving others a chance to chime in seems fair.

Attached.  I did not like that the previous patch had the timeout
handling duplicated in the individual functions, so I instead centralized
it into start_xact_command().  Given that it already activated the
timeout in the most common cases, that seems to make more sense to
me. In your version we'd have called enable_statement_timeout() twice
consecutively (which behaviourally is harmless).

What do you think?  I've not really tested this with the extended
protocol, so I'd appreciate if you could rerun your test from the
older thread.

- Andres
>From c47d904725f3ef0272a4540b6b5bcf0be5c1dea6 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 4 Apr 2017 18:44:42 -0700
Subject: [PATCH] Rearm statement_timeout after each executed query.

Previously statement_timeout, in the extended protocol, affected all
messages till a Sync message.  For clients that pipeline/batch
query execution that's problematic.

Instead disable timeout after each Execute message, and enable, if
necessary, the timer in start_xact_command().

Author: Tatsuo Ishii, editorialized by Andres Freund
Reviewed-By: Takayuki Tsunakawa, Andres Freund
Discussion: https://postgr.es/m/20170222.115044.1665674502985097185.t-is...@sraoss.co.jp
---
 src/backend/tcop/postgres.c | 77 ++---
 1 file changed, 65 insertions(+), 12 deletions(-)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2282058c0..4ab3bd7f07 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -149,6 +149,11 @@ static bool doing_extended_query_message = false;
 static bool ignore_till_sync = false;
 
 /*
+ * Flag to keep track of whether statement timeout timer is active.
+ */
+static bool stmt_timeout_active = false;
+
+/*
  * If an unnamed prepared statement exists, it's stored here.
  * We keep it separate from the hashtable kept by commands/prepare.c
  * in order to reduce overhead for short-lived queries.
@@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *pstmts);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static void enable_statement_timeout(void);
+static void disable_statement_timeout(void);
 
 
 /* 
@@ -1234,7 +1241,8 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	/*
 	 * Start up a transaction command so we can run parse analysis etc. (Note
 	 * that this will normally change current memory context.) Nothing happens
-	 * if we are already in one.
+	 * if we are already in one.  This also arms the statement timeout if
+	 * necessary.
 	 */
 	start_xact_command();
 
@@ -1522,7 +1530,8 @@ exec_bind_message(StringInfo input_message)
 	/*
 	 * Start up a transaction command so we can call functions etc. (Note that
 	 * this will normally change current memory context.) Nothing happens if
-	 * we are already in one.
+	 * we are already in one.  This also arms the statement timeout if
+	 * necessary.
 	 */
 	start_xact_command();
 
@@ -2014,6 +2023,9 @@ exec_execute_message(const char *portal_name, long max_rows)
 			 * those that start or end a transaction block.
 			 */
 			CommandCounterIncrement();
+
+			/* full command has been executed, reset timeout */
+			disable_statement_timeout();
 		}
 
 		/* Send appropriate CommandComplete to client */
@@ -2443,25 +2455,27 @@ start_xact_command(void)
 	{
 		StartTransactionCommand();
 
-		/* Set statement timeout running, if any */
-		/* NB: this mustn't be enabled until we are within an xact */
-		if (StatementTimeout > 0)
-			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
-		else
-			disable_timeout(STATEMENT_TIMEOUT, false);
-
 		xact_started = true;
 	}
+
+	/*
+	 * Start statement timeout if necessary.  Note that this'll intentionally
+	 * not reset the clock on an already started timeout, to avoid the timing
+	 * overhead when start_xact_command() is invoked repeatedly, without an
+	 * interceding finish_xact_command() (e.g. parse/bind/execute).  If that's
+	 * not desired, the timeout has to be disabled explicitly.
+	 */
+	enable_statement_timeout();
 }
 
 static void
 finish_xact_command(void)
 {
+	/* cancel active statement timeout after each command */
+	disable_statement_timeout();
+
 	if (xact_started)
 	{
-		/* Cancel any active statement timeout before committing */
-		

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas  wrote:

>  but
> try to access the TOAST table would be fatal; that probably would have
> deadlock hazards among other problems.


Hmm. I think you're right. We could make a copy of the heap tuple, drop the
lock and then access TOAST to handle that. Would that work?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-04-04 Thread Heikki Linnakangas

On 04/04/2017 01:52 PM, Heikki Linnakangas wrote:

On 03/31/2017 10:10 AM, Michael Paquier wrote:

On Wed, Mar 8, 2017 at 10:39 PM, Robert Haas  wrote:

On Tue, Mar 7, 2017 at 10:01 PM, Michael Paquier
 wrote:
I kinda hope Heikki is going to step up to the plate here, because I
think he understands most of it already.


The second person who knows a good deal about NFKC is Ishii-san.

Attached is a rebased patch.


Thanks, I'm looking at this now.


I will continue tomorrow, but I wanted to report on what I've done so 
far. Attached is a new patch version, quite heavily modified. Notable 
changes so far:


* Use Unicode codepoints, rather than UTF-8 bytes packed in a 32-bit 
ints. IMHO this makes the tables easier to read (to a human), and they 
are also packed slightly more tightly (see next two points), as you can 
fit more codepoints in a 16-bit integer.


* Reorganize the decomposition table. Instead of a separate 
binary-searched table for each "size", have one table that's ordered by 
codepoint, which contains a length and offset into another array, where 
all the decomposed sequences are. This makes the recomposition step 
somewhat slower, as it now has to grovel through an array that contains 
all decompositions, rather than just the array that contains 
decompositions of two characters. But this isn't performance-critical, 
readability and tight packing of the tables matter more.


* In the lookup table, store singleton decompositions that decompose to 
a single character, and the character fits in a uint16, directly in the 
main lookup table, instead of storing an index into the other table. 
This makes the tables somewhat smaller, as there are a lot of such 
decompositions.


* Use uint32 instead of pg_wchar to represent unicode codepoints. 
pg_wchar suggests something that holds multi-byte characters in the OS 
locale, used by functions like wcscmp, so avoid that. I added a "typedef 
uint32 Codepoint" alias, though, to make it more clear which variables 
hold Unicode codepoints rather than e.g. indexes into arrays.


* Unicode consortium publishes a comprehensive normalization test suite, 
as a text file, NormalizationTest.txt. I wrote a small perl and C 
program to run all the tests from that test suite, with the 
normalization function. That uncovered a number of bugs in the 
recomposition algorithm, which I then fixed:


 * decompositions that map to ascii characters cannot be excluded.
 * non-canonical and non-starter decompositions need to be excluded. 
They are now flagged in the lookup table, so that we only use them 
during the decomposition phase, not when recomposing.



TODOs / Discussion points:

* I'm not sure I like the way the code is organized, I feel that we're 
littering src/common with these. Perhaps we should add a 
src/common/unicode_normalization directory for all this.


* The list of characters excluded from recomposition is currently 
hard-coded in utf_norm_generate.pl. However, that list is available in 
machine-readable format, in file CompositionExclusions.txt. Since we're 
reading most of the data from UnicodeData.txt, would be good to read the 
exclusion table from a file, too.


* SASLPrep specifies normalization form KC, but it also specifies that 
some characters are mapped to space or nothing. Should do those 
mappings, too.
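
To make the reorganized decomposition table described above a bit more
concrete, here is a rough sketch of that kind of layout (the type, field
and array names are invented for illustration; they are not the generated
tables in the attached patch):

/* One entry per codepoint that has a decomposition; sorted by codepoint. */
typedef struct
{
	uint32		codepoint;		/* Unicode codepoint */
	uint8		dec_size;		/* number of codepoints it decomposes into */
	uint16		dec_index;		/* offset into UnicodeDecompSequences[] */
} UnicodeDecompEntry;

/* All decomposed sequences, concatenated into one flat array. */
static const uint32 UnicodeDecompSequences[] =
{
	0x0041, 0x030A,				/* e.g. U+00C5 -> A + COMBINING RING ABOVE */
	/* ... generated from UnicodeData.txt ... */
};

static const UnicodeDecompEntry UnicodeDecompTable[] =
{
	{0x00C5, 2, 0},				/* LATIN CAPITAL LETTER A WITH RING ABOVE */
	/* ... generated from UnicodeData.txt ... */
};

/* Binary search over the single table, regardless of decomposition length. */
static const UnicodeDecompEntry *
get_decomposition(uint32 cp)
{
	int			lo = 0;
	int			hi = lengthof(UnicodeDecompTable) - 1;

	while (lo <= hi)
	{
		int			mid = (lo + hi) / 2;

		if (UnicodeDecompTable[mid].codepoint == cp)
			return &UnicodeDecompTable[mid];
		else if (UnicodeDecompTable[mid].codepoint < cp)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return NULL;
}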


- Heikki


implement-SASLprep-2.patch.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
>> What's your point of the question? What kind of problem do you expect
>> if the timeout starts only once at the first parse message out of a
>> bunch of parse messages?
> 
> It's perfectly valid to send a lot of Parse messages without
> interspersed Sync or Bind/Execute message.  There'll be one timeout
> covering all of those Parse messages, which can thus lead to a timeout,
> even though nothing actually takes long individually.

Hmm. IMO, that could happen with the current statement timeout
implementation as well.

Or we could start/stop the timeout in exec_execute_message()
only. This would avoid the problem above. Also, this is more consistent
with the log_duration/log_min_duration_statement behavior than the
current code.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread 'Andres Freund'
On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote:
> From: Andres Freund [mailto:and...@anarazel.de]
> > Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbs and
> > npgsql doing it seems like some evidence that it's ok.
> 
> And psqlODBC now uses always libpq.
> 
> Now time for final decision?

I'll send an updated patch in a bit, and then will wait till tomorrow to
push. Giving others a chance to chime in seems fair.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2017-04-04 Thread Thomas Munro
On Wed, Apr 5, 2017 at 11:49 AM, Kevin Grittner  wrote:
> Worked on the docs some more and then pushed it.
>
> Nice job cutting the number of *.[ch] lines by 30 while adding support for
> the other three core PLs.  :-)

Great.  Thanks.  I wonder if there is some way we can automatically
include code fragments in the documentation without keeping them in
sync manually.

Now it looks like I have no more excuses for putting off reading the
papers you've shared[1][2] about incremental matview algorithms!
Looking forward to helping out with the next steps in that project if
I can.

[1] 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&rep=rep1&type=pdf
[2] 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.3208&rep=rep1&type=pdf

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2017-04-04 Thread Michael Paquier
On Tue, Mar 28, 2017 at 3:10 PM, Michael Paquier
 wrote:
> OK, done. I have just noticed that Simon has marked himself as a
> committer of this patch 24 hours ago.

For the archive's sake, this has been committed as 728bd991. Thanks Simon!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-04-04 Thread Michael Paquier
On Wed, Apr 5, 2017 at 12:39 AM, Stephen Frost  wrote:
> Committed, with those additions.

Thanks for the commit. The final result is nice.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-04-04 Thread Ashutosh Bapat
On Wed, Apr 5, 2017 at 1:43 AM, Andres Freund  wrote:

> On 2017-04-04 08:01:32 -0400, Robert Haas wrote:
> > On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund 
> wrote:
> > > I don't think the parallel seqscan is comparable in complexity with the
> > > parallel append case.  Each worker there does the same kind of work,
> and
> > > if one of them is behind, it'll just do less.  But correct sizing will
> > > be more important with parallel-append, because with non-partial
> > > subplans the work is absolutely *not* uniform.
> >
> > Sure, that's a problem, but I think it's still absolutely necessary to
> > ramp up the maximum "effort" (in terms of number of workers)
> > logarithmically.  If you just do it by costing, the winning number of
> > workers will always be the largest number that we think we'll be able
> > to put to use - e.g. with 100 branches of relatively equal cost we'll
> > pick 100 workers.  That's not remotely sane.
>
> I'm quite unconvinced that just throwing a log() in there is the best
> way to combat that.  Modeling the issue of starting more workers through
> tuple transfer, locking, startup overhead costing seems a better to me.
>
> If the goal is to compute the results of the query as fast as possible,
> and to not use more than max_parallel_per_XXX, and it's actually
> beneficial to use more workers, then we should.  Because otherwise you
> really can't use the resources available.
>

+1. I had expressed similar opinion earlier, but yours is better
articulated. Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-04-04 Thread Michael Paquier
On Tue, Apr 4, 2017 at 10:52 PM, Peter Eisentraut
 wrote:
> On 4/3/17 11:32, Andres Freund wrote:
>> That doesn't strike as particularly future proof.  We intentionally
>> leave objects behind pg_regress runs, but that only works if we actually
>> run them...
>
> I generally agree with the sentiments expressed later in this thread.
> But just to clarify what I meant here:  We don't need to run a, say,
> 1-minute serial test to load a few "left behind" objects for the
> pg_upgrade test, if we can load the same set of objects using dedicated
> scripting in say 2 seconds.  This would make both the pg_upgrade tests
> faster and would reduce the hidden dependencies in the main tests about
> which kinds of objects need to be left behind.

Making the tests run shorter while maintaining the current code
coverage is nice. But it makes test suite maintenance more complicated,
as it needs either a dedicated regression schedule or an extra test
suite where objects are created just for the sake of pg_upgrade. That
increases the risk of the test suite rotting over time if patch authors
and reviewers are not careful.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-04 Thread Peter Geoghegan
On Tue, Apr 4, 2017 at 1:32 PM, Andres Freund  wrote:
> s/Avoid/Allow to avoid/

WFM.

>> + *
>> + * Callers cannot rely on memory for tuple in returned slot remaining valid
>> + * past any subsequent manipulation of the sorter, such as another fetch of
>> + * input from sorter.  (The sorter may recycle memory.)
>>   */
>
> It's not just the sorter, right? I'd just rephrase that the caller
> cannot rely on tuple to stay valid beyond a single call.

It started that way, but changed on the basis of Tom's feedback. I
would be equally happy with either wording.

>>  static bool
>>  tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
>> @@ -2091,12 +2092,15 @@ tuplesort_gettuple_common(Tuplesortstate *state, 
>> bool forward,
>>   * NULL value in leading attribute will set abbreviated value to zeroed
>>   * representation, which caller may rely on in abbreviated inequality check.
>>   *
>> - * The slot receives a copied tuple (sometimes allocated in caller memory
>> - * context) that will stay valid regardless of future manipulations of the
>> - * tuplesort's state.
>> + * If copy is TRUE, the slot receives a copied tuple that will stay valid
>> + * regardless of future manipulations of the tuplesort's state.  Memory is
>> + * owned by the caller.  If copy is FALSE, the slot will just receive a
>> + * pointer to a tuple held within the tuplesort, which is more efficient,
>> + * but only safe for callers that are prepared to have any subsequent
>> + * manipulation of the tuplesort's state invalidate slot contents.
>>   */
>
> Hm. Do we need to note anything about how long caller memory needs to
> live? I think we'd get into trouble if the caller does a memory context
> reset, without also clearing the slot?

Since we arrange to have caller explicitly own memory (it's always in
their own memory context (current context), which is not the case with
other similar routines), it's 100% the caller's problem. As I assume
you know, convention in executor around resource management of memory
like this is pretty centralized within ExecStoreTuple(), and nearby
routines. In general, the executor is more or less used to having this
be its problem alone, and is pessimistic about memory lifetime unless
otherwise noted.
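
For illustration, a minimal sketch of how a caller might use the
copy=false mode being discussed (assuming the tuplesort_gettupleslot
signature with the new copy flag; process_one_tuple is a hypothetical
consumer):

static void
drain_sorted_output(Tuplesortstate *sortstate, TupleTableSlot *slot)
{
	while (tuplesort_gettupleslot(sortstate, true /* forward */ ,
								  false /* copy */ , slot, NULL))
	{
		/*
		 * The slot's memory may be recycled by the next fetch from the
		 * sorter, so fully consume (or explicitly copy) the tuple here.
		 */
		process_one_tuple(slot);
	}

	ExecClearTuple(slot);
}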

> Other than these minimal adjustments, this looks good to go to me.

Cool.

I'll try to get out a revision soon, maybe later today, including an
updated 0002-* (Valgrind suppression), which I have not forgotten
about.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-05 08:34:43 +0900, Tatsuo Ishii wrote:
> Andres,
> 
> >> I think the code needs a few clarifying comments around this, but
> >> otherwise seems good.  Not restarting the timeout in those cases
> >> obviously isn't entirely "perfect"/"correct", but a tradeoff - the
> >> comments should note that.
> >> 
> >> Tatsuo-san, do you want to change those, and push?  I can otherwise.
> > 
> > Andres,
> > 
> > If you don't mind, could you please fix the comments and push it.
> 
> I have changed the comments as you suggested. If it's ok, I can push
> the patch myself (today I have time to work on this).

I'm working on the patch, and I've edited it more heavily, so please
hold off.

Changes:
I don't think the debugging statements are a good idea; the
!xact_started check should be an assert, and disable_timeout should be
called from within enable_statement_timeout independent of
stmt_timer_started.
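
For concreteness, a sketch of what the reshaped helper could end up
looking like (not necessarily the exact form that will get pushed):

static void
enable_statement_timeout(void)
{
	/* must be within a transaction command */
	Assert(xact_started);

	if (StatementTimeout > 0)
	{
		if (!stmt_timeout_active)
		{
			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
			stmt_timeout_active = true;
		}
	}
	else
		disable_timeout(STATEMENT_TIMEOUT, false);
}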


But more importantly I had just sent a question that I think merits
discussion.


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-04 16:38:53 -0700, Andres Freund wrote:
> On 2017-04-04 16:10:32 +0900, Tatsuo Ishii wrote:
> > >> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute 
> > >> starts and stops the timer), then it's a concern and the patch should 
> > >> not be ready for committer.  However, the current patch is not like that 
> > >> -- it seems to do what others in this thread are expecting.
> > > 
> > > Oh, interesting - I kind of took the author's statement as, uh,
> > > authoritative ;).  A quick look over the patch confirms your
> > > understanding.
> > 
> > Yes, Tsunakawa-san is correct. Sorry for confusion.
> > 
> > > I think the code needs a few clarifying comments around this, but
> > > otherwise seems good.  Not restarting the timeout in those cases
> > > obviously isn't entirely "perfect"/"correct", but a tradeoff - the
> > > comments should note that.
> > > 
> > > Tatsuo-san, do you want to change those, and push?  I can otherwise.
> > 
> > Andres,
> > 
> > If you don't mind, could you please fix the comments and push it.
> 
> Hm. I started to edit it, but I'm halfway coming back to my previous
> view that this isn't necessarily ready.
> 
> If a client were to to prepare a large number of prepared statements
> (i.e. a lot of parse messages), this'll only start the timeout once, at
> the first statement sent.  It's not an issue for libpq which sends a
> sync message after each PQprepare, nor does it look to be an issue for
> pgjdbc based on a quick look.
> 
> Does anybody think there might be a driver out there that sends a bunch
> of 'parse' messages, without syncs?

Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbc
and npgsql doing it seems like some evidence that it's ok.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-04 Thread Craig Ringer
On 5 April 2017 at 08:00, Craig Ringer  wrote:

> Taking a look at this now.

Rebased to current master with conflicts and whitespace errors fixed.
Review pending.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 71b6163734934db3160de81fd064e7d113b9122f Mon Sep 17 00:00:00 2001
From: Jim Nasby 
Date: Wed, 1 Mar 2017 15:45:51 -0600
Subject: [PATCH 1/2] Add SPI_execute_callback() and callback-based
 DestReceiver.

Instead of placing results in a tuplestore, this method of execution
uses the supplied callback when creating the Portal for a query.
---
 src/backend/executor/spi.c | 79 --
 src/backend/tcop/dest.c| 11 +++
 src/include/executor/spi.h |  4 +++
 src/include/tcop/dest.h|  1 +
 4 files changed, 85 insertions(+), 10 deletions(-)

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3a89ccd..a0af31a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -55,7 +55,8 @@ static void _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan);
 
 static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
   Snapshot snapshot, Snapshot crosscheck_snapshot,
-  bool read_only, bool fire_triggers, uint64 tcount);
+  bool read_only, bool fire_triggers, uint64 tcount,
+  DestReceiver *callback);
 
 static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
 	Datum *Values, const char *Nulls);
@@ -321,7 +322,34 @@ SPI_execute(const char *src, bool read_only, long tcount)
 
 	res = _SPI_execute_plan(&plan, NULL,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount);
+			read_only, true, tcount, NULL);
+
+	_SPI_end_call(true);
+	return res;
+}
+int
+SPI_execute_callback(const char *src, bool read_only, long tcount,
+		DestReceiver *callback)
+{
+	_SPI_plan	plan;
+	int			res;
+
+	if (src == NULL || tcount < 0)
+		return SPI_ERROR_ARGUMENT;
+
+	res = _SPI_begin_call(true);
+	if (res < 0)
+		return res;
+
+	memset(&plan, 0, sizeof(_SPI_plan));
+	plan.magic = _SPI_PLAN_MAGIC;
+	plan.cursor_options = 0;
+
+	_SPI_prepare_oneshot_plan(src, &plan);
+
+	res = _SPI_execute_plan(&plan, NULL,
+			InvalidSnapshot, InvalidSnapshot,
+			read_only, true, tcount, callback);
 
 	_SPI_end_call(true);
 	return res;
@@ -355,7 +383,34 @@ SPI_execute_plan(SPIPlanPtr plan, Datum *Values, const char *Nulls,
 			_SPI_convert_params(plan->nargs, plan->argtypes,
 Values, Nulls),
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount);
+			read_only, true, tcount, NULL);
+
+	_SPI_end_call(true);
+	return res;
+}
+
+/* Execute a previously prepared plan with a callback Destination */
+int
+SPI_execute_plan_callback(SPIPlanPtr plan, Datum *Values, const char *Nulls,
+ bool read_only, long tcount, DestReceiver *callback)
+{
+	int			res;
+
+	if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC || tcount < 0)
+		return SPI_ERROR_ARGUMENT;
+
+	if (plan->nargs > 0 && Values == NULL)
+		return SPI_ERROR_PARAM;
+
+	res = _SPI_begin_call(true);
+	if (res < 0)
+		return res;
+
+	res = _SPI_execute_plan(plan,
+			_SPI_convert_params(plan->nargs, plan->argtypes,
+Values, Nulls),
+			InvalidSnapshot, InvalidSnapshot,
+			read_only, true, tcount, callback);
 
 	_SPI_end_call(true);
 	return res;
@@ -384,7 +439,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
 
 	res = _SPI_execute_plan(plan, params,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount);
+			read_only, true, tcount, NULL);
 
 	_SPI_end_call(true);
 	return res;
@@ -425,7 +480,7 @@ SPI_execute_snapshot(SPIPlanPtr plan,
 			_SPI_convert_params(plan->nargs, plan->argtypes,
 Values, Nulls),
 			snapshot, crosscheck_snapshot,
-			read_only, fire_triggers, tcount);
+			read_only, fire_triggers, tcount, NULL);
 
 	_SPI_end_call(true);
 	return res;
@@ -472,7 +527,7 @@ SPI_execute_with_args(const char *src,
 
 	res = _SPI_execute_plan(, paramLI,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount);
+			read_only, true, tcount, NULL);
 
 	_SPI_end_call(true);
 	return res;
@@ -1907,7 +1962,8 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
 static int
 _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
   Snapshot snapshot, Snapshot crosscheck_snapshot,
-  bool read_only, bool fire_triggers, uint64 tcount)
+  bool read_only, bool fire_triggers, uint64 tcount,
+  DestReceiver *callback)
 {
 	int			my_res = 0;
 	uint64		my_processed = 0;
@@ -1918,6 +1974,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 	ErrorContextCallback spierrcontext;
 	CachedPlan *cplan = NULL;
 	ListCell   *lc1;
+	DestReceiver *dest = callback;
 
 	/*
 	 * Setup error traceback support for ereport()
@@ -2037,7 +2094,6 @@ 

Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-04 Thread Craig Ringer
On 5 April 2017 at 08:23, Craig Ringer  wrote:
> On 5 April 2017 at 08:00, Craig Ringer  wrote:
>
>> Taking a look at this now.
>
> Rebased to current master with conflicts and whitespace errors fixed.
> Review pending.

This patch fails to update the documentation at all.

https://www.postgresql.org/docs/devel/static/spi.html



The patch crashes in initdb with --enable-cassert builds:

performing post-bootstrap initialization ... TRAP:
FailedAssertion("!(myState->pub.mydest == DestSQLFunction)", File:
"functions.c", Line: 800)
sh: line 1: 30777 Aborted (core dumped)
"/home/craig/projects/2Q/postgres/tmp_install/home/craig/pg/10/bin/postgres"
--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true
template1 > /dev/null
child process exited with exit code 134

Backtrace attached.




Details on patch 1:

missing newline

+}
+int
+SPI_execute_callback(const char *src, bool read_only, long tcount,


+/* Execute a previously prepared plan with a callback Destination */


caps "Destination"



+// XXX throw error if callback is set

^^



+static DestReceiver spi_callbackDR = {
+donothingReceive, donothingStartup, donothingCleanup, donothingCleanup,
+DestSPICallback
+};


Is the callback destreceiver supposed to be a blackhole? Why? Its
names, spi_callbackDR and DestSPICallback, don't seem to indicate
that it drops its input.

Presumably that's a default destination you're then supposed to modify
with your own callbacks? There aren't any comments to indicate what's
going on here.
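
For reference, a rough sketch of how a caller would presumably use this,
assuming the DestReceiver hooks as they exist in master and the
SPI_execute_callback() signature from this patch (everything else here,
including the names, is invented):

typedef struct MyReceiverState
{
	DestReceiver pub;			/* must be the first field */
	uint64		nrows;
} MyReceiverState;

static bool
my_receive(TupleTableSlot *slot, DestReceiver *self)
{
	MyReceiverState *state = (MyReceiverState *) self;

	/* inspect the slot here instead of having SPI fill a tuplestore */
	state->nrows++;
	return true;
}

static void
my_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
}

static void
my_shutdown(DestReceiver *self)
{
}

static void
my_destroy(DestReceiver *self)
{
}

static void
run_query_with_callback(const char *sql)
{
	MyReceiverState receiver;

	receiver.pub.receiveSlot = my_receive;
	receiver.pub.rStartup = my_startup;
	receiver.pub.rShutdown = my_shutdown;
	receiver.pub.rDestroy = my_destroy;
	receiver.pub.mydest = DestSPICallback;
	receiver.nrows = 0;

	SPI_execute_callback(sql, true /* read_only */ , 0 /* tcount */ ,
						 (DestReceiver *) &receiver);

	elog(INFO, "received " UINT64_FORMAT " rows", receiver.nrows);
}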



Comments on patch 2:

If this is the "bare minimum" patch, what is pending? Does it impose
any downsides or limits?

+/* Get switch execution contexts */
+extern PLyExecutionContext
*PLy_switch_execution_context(PLyExecutionContext *new);

The comment makes no sense to me. This seems to be something like a
memory context switch, where you supply the new one and return the old.
But the comment doesn't explain that at all.

+void PLy_CSStartup(DestReceiver *self, int operation, TupleDesc typeinfo);
+void PLy_CSDestroy(DestReceiver *self);

These are declared in the plpy_spi.c. Why aren't these static?

+volatile MemoryContext oldcontext;
+volatile ResourceOwner oldowner;
 int rv;
-volatile MemoryContext oldcontext;
-volatile ResourceOwner oldowner;

Unnecessary code movement.

In PLy_Callback_New, I think your use of a sub-context is sensible. Is
it necessary to palloc0 though?

+static CallbackState *
+PLy_Callback_Free(CallbackState *callback)

The code here looks like it could be part of spi.c's callback support,
rather than plpy_spi specific, since you provide a destroy callback in
the SPI callback struct.

+/* We need to store this because the TupleDesc the receive
function gets has no names. */
+myState->desc = typeinfo;

Is it safe to just store the pointer to the TupleDesc here? What's the lifetime?

+ * will clean it up when the time is right. XXX This might result in a leak
+ * if an error happens and the result doesn't get dereferenced.
+ */
+MemoryContextSwitchTo(TopMemoryContext);
+result->tupdesc = CreateTupleDescCopy(typeinfo);

especially given this XXX comment...


Patch needs bug fix, docs updates, fixes for issues marked in
comments. But overall approach looks sensible enough.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
#0  0x7f9d5d64ea28 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f9d5d65062a in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0083a521 in ExceptionalCondition 
(conditionName=conditionName@entry=0x9b68d0 "!(myState->pub.mydest == 
DestSQLFunction)", errorType=errorType@entry=0x882d29 "FailedAssertion", 
fileName=fileName@entry=0x9b6920 "functions.c", 
lineNumber=lineNumber@entry=800) at assert.c:54
No locals.
#3  0x006166e8 in postquel_start (fcache=0x27772c0, es=0x284eb58) at 
functions.c:800
myState = 0x284eeb8
dest = 0x284eeb8
#4  fmgr_sql (fcinfo=0x27a7a28) at functions.c:1149
fcache = 0x27772c0
sqlerrcontext = {previous = 0x0, callback = 0x614910 
, arg = 0x27a79d0}
randomAccess = 0 '\000'
lazyEvalOK = 
is_first = 
pushed_snapshot = 0 '\000'
es = 0x284eb58
slot = 
result = 
eslist = 
eslc = 0x284eb90
__func__ = "fmgr_sql"
#5  0x0060810b in ExecInterpExpr (state=0x27a7528, econtext=0x27a4ce0, 
isnull=) at execExprInterp.c:670
fcinfo = 0x27a7a28
argnull = 0x27a7d68 ""
argno = 
op = 0x27a76f8
resultslot = 0x27a7130
innerslot = 
outerslot = 0x27a5930
scanslot = 0x0
dispatch_table = {0x608960 , 0x607e00 
, 0x607e20 , 0x607e38 
, 0x607c90 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Pavan Deolasee
On Wed, Apr 5, 2017 at 8:42 AM, Robert Haas  wrote:

> On Tue, Apr 4, 2017 at 10:21 PM, Pavan Deolasee
>  wrote:
> > On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas 
> wrote:
> >>  but
> >> try to access the TOAST table would be fatal; that probably would have
> >> deadlock hazards among other problems.
> >
> > Hmm. I think you're right. We could make a copy of the heap tuple, drop
> the
> > lock and then access TOAST to handle that. Would that work?
>
> Yeah, but it might suck.  :-)


Well, better than causing a deadlock ;-)

Let's see if we want to go down the path of blocking WARM when tuples have
toasted attributes. I submitted a patch yesterday, but having slept on
it, I think I made mistakes there. It might not be enough to look at the
caller-supplied new tuple, because that may not have any toasted values, yet
the final tuple that gets written to the heap may be toasted. We could look
at the new tuple's attributes to find out whether any indexed attributes are
toasted, but that might suck as well. Or we can simply block WARM if the
old or the new tuple has external attributes, i.e. HeapTupleHasExternal()
returns true. That could be overly restrictive, because irrespective of
whether the indexed attributes or just some other attribute is
toasted, we will block WARM on such updates. Maybe that's not a problem.
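
In code terms, the crude version of that check inside heap_update() would
be roughly the following (a sketch only; use_warm_update stands in for
whatever flag the patch uses to decide between a WARM and a regular
update):

	/*
	 * Be conservative: fall back to a regular (non-WARM) update whenever
	 * either tuple carries any out-of-line (external/toasted) attributes,
	 * so recheck never has to detoast.
	 */
	if (HeapTupleHasExternal(&oldtup) || HeapTupleHasExternal(newtup))
		use_warm_update = false;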

We will also need to handle the case where some older tuple in the chain
has a toasted value and that tuple is presented to recheck (I think we can
handle that case fairly easily, but it's not done in the code yet) because
of a subsequent WARM update, even though the tuples updated by WARM did not
have any toasted values (and hence the update was allowed).

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] BRIN cost estimate

2017-04-04 Thread David Rowley
On 3 April 2017 at 03:05, Emre Hasegeli  wrote:
> Unfortunately, I am on vacation for two weeks without my computer.  I can
> post another version after 18th.  I know we are under time pressure for
> release.  I wouldn't mind if you or Alvaro would change it anyway you like.

I've made some changes. Actually, I completely changed how the
estimates work. I find this method more self-explanatory.

Basically, we work out the total index ranges, then work out how many
of those we'd touch in a perfectly correlated scenario. We then work
out how many ranges we'll probably visit based on the correlation
estimates from the stats, and assume the selectivity is probableRanges
/ totalIndexRanges.
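
In pseudo-C, the shape of the estimate is roughly the following (a
simplified sketch rather than the exact code in the attached patch; the
variable names are illustrative):

static double
brin_qual_selectivity(double heapPages, double pagesPerRange,
					  double qualSelectivity, double indexCorrelation)
{
	double		indexRanges;
	double		minimalRanges;
	double		estimatedRanges;

	/* total number of ranges the BRIN index divides the heap into */
	indexRanges = Max(ceil(heapPages / pagesPerRange), 1.0);

	/* ranges we would need to visit if the heap were perfectly correlated */
	minimalRanges = ceil(indexRanges * qualSelectivity);

	/*
	 * Interpolate towards the worst case (visiting every range) as the
	 * correlation estimate from the statistics drops towards zero.
	 */
	if (fabs(indexCorrelation) < 1.0e-10)
		estimatedRanges = indexRanges;
	else
		estimatedRanges = Min(minimalRanges / fabs(indexCorrelation),
							  indexRanges);

	/* assume selectivity = probable ranges / total ranges */
	return estimatedRanges / indexRanges;
}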

I've attached a spreadsheet that compares Emre's method to mine. Mine
seems to favour the BRIN index less when the table is small. I think
this is pretty natural since if there is only 1 range, and we narrow
the result to one of them, then we might as well have performed a
seqscan.

My method seems to favour BRIN a bit longer when the correlation is
between about 1% and 100%, but penalises BRIN much more when the
correlation is less than around 1%. This may be better; my way is
certainly smarter than the unpatched version, but it still holds on a bit
longer, which may be more favourable if a BRIN index actually exists.
It might be more annoying for a user to have added a BRIN index and
never have the planner choose it.

My method also never suffers from estimating > 100% of the table.

I was a bit worried that Emre's method would penalise BRIN too much
when the correlation is not so high.

Interested to hear comments on this.

Please feel free to play with the spreadsheet by changing rows 1-3 in column B.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


brin-correlation-drowley_v1.patch
Description: Binary data


BRIN_estimates2.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compiler warning in costsize.c

2017-04-04 Thread Michael Paquier
On Wed, Apr 5, 2017 at 2:54 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> In builds where USE_ASSERT_CHECKING is not enabled, costsize.c can
>> generate warnings. Here is for example with MSVC:
>>   src/backend/optimizer/path/costsize.c(4520): warning C4101: 'rte' : 
>> unreferenced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>>   src/backend/optimizer/path/costsize.c(4640): warning C4101: 'rte' : 
>> unreferenced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>
>> a9c074ba7 has done an effort, but a bit more is needed as the attached.
>
> That doesn't look right at all:
>
> +#ifdef USE_ASSERT_CHECKING
> RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY;
> +#endif
>
> The entire point of the PG_USED_FOR_ASSERTS_ONLY annotation is to suppress
> this type of warning without a need for an explicit #ifdef like that one.
>
> We should either fix PG_USED_FOR_ASSERTS_ONLY to work on MSVC as well,
> or decide that we don't care about suppressing such warnings on MSVC,
> or give up on PG_USED_FOR_ASSERTS_ONLY and remove it everywhere in
> favor of plain #ifdefs.

Visual Studio does not have any equivalent of __attribute__((unused)),
at least not that I could find after looking around. A trick that
I can think of would be to change PG_USED_FOR_ASSERTS_ONLY to be
roughly a macro like this:

#if defined(__GNUC__)
#define PG_ASSERT_ONLY(x) ASSERT_ONLY_ ## x __attribute__((unused))
#else
#define PG_ASSERT_ONLY(x) ((void) x)
#endif

But that's ugly...

> (I'm personally not that much in love with PG_USED_FOR_ASSERTS_ONLY,
> because it tends to confuse pgindent.)

I would be inclined to just do that; any other solution I can think of
is uglier.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 4, 2017 at 6:56 PM, Joe Conway  wrote:
>> Any objections?

> I'm guessing Tom's going to have a strong feeling about whether 0001a
> is the right way to address the stdbool issue,

I will?  [ looks ... ]  Yup, you're right.

I doubt that works at all, TBH.  What I'd expect to happen with a
typical compiler is a complaint about redefinition of typedef bool,
because c.h already declared it and here this fragment is doing
so again.  It'd make sense to me to do

+ #ifdef bool
+ #undef bool
+ #endif

to get rid of the macro definition of bool that stdbool.h is
supposed to provide.  But there should be no reason to declare
our typedef a second time.

Another issue is whether you won't get compiler complaints about
redefinition of the "true" and "false" macros.  But those would
likely only be warnings, not flat-out errors.
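
One way to spell the idea out, covering the true/false macros as well,
might be something like this (a sketch of the idea only, not a tested
patch):

#include <stdbool.h>

/*
 * stdbool.h may define bool, true and false as macros; get rid of them
 * so that c.h's existing typedef and macros remain the only definitions
 * in play.
 */
#ifdef bool
#undef bool
#endif
#ifdef true
#undef true
#endif
#ifdef false
#undef false
#endif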

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
>> What do you think?  I've not really tested this with the extended protocol,
>> so I'd appreciate if you could rerun your test from the older thread.
> 
> The patch looks good and cleaner.  It looks like the code works as expected.  
> As before, I ran one INSERT statement with PgJDBC, with gdb's breakpoints set 
> on enable_timeout() and disable_timeout().  I confirmed that enable_timeout() 
> is called just once at Parse message, and disable_timeout() is called just 
> once at Execute message.
> 
> I'd like to wait for Tatsuo-san's thorough testing with pgproto.

I have done tests using pgproto, and noticed one strange behavior.
Below is the output of pgproto. The test first sets the timeout to 3
seconds and parses/binds "SELECT pg_sleep(2)", then sets the timeout
to 1 second using the extended query protocol. The subsequent Execute
emits a statement timeout error as expected, but the next "SELECT
pg_sleep(2)" call using the extended query protocol does not emit a
statement timeout error. The test for this is "007-timeout-twice".
Attached are the test cases, including this one.

FE=> Query(query="SET statement_timeout = '3s'")
<= BE CommandComplete(SET)
<= BE ReadyForQuery(I)
FE=> Parse(stmt="S1", query="SELECT pg_sleep(2)")
FE=> Bind(stmt="S1", portal="S2")
FE=> Parse(stmt="", query="SET statement_timeout = '1s'")
FE=> Bind(stmt="", portal="")
FE=> Execute(portal="")
FE=> Execute(portal="S2")
FE=> Sync
<= BE ParseComplete
<= BE BindComplete
<= BE ParseComplete
<= BE BindComplete
<= BE CommandComplete(SET)
<= BE ErrorResponse(S ERROR V ERROR C 57014 M canceling statement due to 
statement timeout F postgres.c L 2968 R ProcessInterrupts )
<= BE ReadyForQuery(I)
FE=> Parse(stmt="S3", query="SELECT pg_sleep(2)")
FE=> Bind(stmt="S3", portal="S2")
FE=> Execute(portal="S2")
FE=> Sync
<= BE ParseComplete
<= BE BindComplete
<= BE DataRow
<= BE CommandComplete(SELECT 1)
<= BE ReadyForQuery(I)
FE=> Terminate



tests.tar.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v25)

2017-04-04 Thread Kyotaro HORIGUCHI
At Tue, 4 Apr 2017 20:19:39 +0200, Tomas Vondra  
wrote in <56f40b20-c464-fad2-ff39-06b668fac...@2ndquadrant.com>
> On 04/04/2017 09:55 AM, David Rowley wrote:
> > On 1 April 2017 at 04:25, David Rowley 
> > wrote:
> >> I've attached an updated patch.
> >
> > I've made another pass at this and ended up removing the tryextstats
> > variable. We now only try to use extended statistics when
> > clauselist_selectivity() is given a valid RelOptInfo with rtekind ==
> > RTE_RELATION, and of course, it must also have some extended stats
> > defined too.
> >
> > I've also cleaned up a few more comments, many of which I managed to
> > omit updating when I refactored how the selectivity estimates ties
> > into clauselist_selectivity()
> >
> > I'm quite happy with all of this now, and would also be happy for
> > other people to take a look and comment.
> >
> > As a reviewer, I'd be marking this ready for committer, but I've moved
> > a little way from just reviewing this now, having spent two weeks
> > hacking at it.
> >
> > The latest patch is attached.
> >
> 
> Thanks David, I agree the reworked patch is much cleaner that the last
> version I posted. Thanks for spending your time on it.
> 
> Two minor comments:
> 
> 1) DEPENDENCY_MIN_GROUP_SIZE
> 
> I'm not sure we still need the min_group_size, when evaluating
> dependencies. It was meant to deal with 'noisy' data, but I think it
> after switching to the 'degree' it might actually be a bad idea.
> 
> Consider this:
> 
> create table t (a int, b int);
> insert into t select 1, 1 from generate_series(1, 1) s(i);
> insert into t select i, i from generate_series(2, 2) s(i);
> create statistics s with (dependencies) on (a,b) from t;
> analyze t;
> 
> select stadependencies from pg_statistic_ext ;
>   stadependencies
> 
>  [{1 => 2 : 0.44}, {2 => 1 : 0.44}]
> (1 row)
> 
> So the degree of the dependency is just ~0.333 although it's obviously
> a perfect dependency, i.e. a knowledge of 'a' determines 'b'. The
> reason is that we discard 2/3 of rows, because those groups are only a
> single row each, except for the one large group (1/3 of rows).
> 
> Without the mininum group size limitation, the dependencies are:
> 
> test=# select stadependencies from pg_statistic_ext ;
>   stadependencies
> 
>  [{1 => 2 : 1.00}, {2 => 1 : 1.00}]
> (1 row)
> 
> which seems way more reasonable, I think.

I think the same. In reality, quite a large part of functional
dependencies are of this kind.

> 2) A minor detail is that instead of this
> 
> if (estimatedclauses != NULL &&
> bms_is_member(listidx, estimatedclauses))
> continue;
> 
> perhaps we should do just this:
> 
> if (bms_is_member(listidx, estimatedclauses))
> continue;
> 
> bms_is_member does the same NULL check right at the beginning, so I
> don't think this might make a measurable difference.


I have some other comments.

==
- The comment for clauselist_selectivity,
| + * When 'rel' is not null and rtekind = RTE_RELATION, we'll try to apply
| + * selectivity estimates using any extended statistcs on 'rel'.

'rel' is actually a parameter, but rtekind here means rel->rtekind,
so this might better be something like the following.

| When a relation of RTE_RELATION is given as 'rel', we try
| extended statistcs on the relation.

Then the following line doesn't seem to be required.

| + * If we identify such extended statistics exist, we try to apply them.


=
The following comment in the same function,

| +if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
| +{
| +/*
| + * Try to estimate with multivariate functional dependency 
statistics.
| + *
| + * The function will supply an estimate for the clauses which it
| + * estimated for. Any clauses which were unsuitible were ignored.
| + * Clauses which were estimated will have their 0-based list index 
set
| + * in estimatedclauses.  We must ignore these clauses when processing
| + * the remaining clauses later.
| + */

(Notice that I'm not a good writer.) This might better be the
following.

|  dependencies_clauselist_selectivity gives the selectivity of
|  clauses to which functional dependencies on the given relation are
|  applicable. The 0-based index numbers of consumed clauses are
|  returned in the bitmapset estimatedclauses so that the estimation
|  hereafter can ignore them.

=
| +s1 *= dependencies_clauselist_selectivity(root, clauses, varRelid,
| +   jointype, sjinfo, rel, );

The name prefix "dependency_" means "functional_dependency" here
and omitting "functional" is confusing to me. On the other hand
"functional_dependency" is quite long as prefix. Could we 

Re: [HACKERS] identity columns

2017-04-04 Thread Vitaly Burovoy
On 4/3/17, Peter Eisentraut  wrote:
> On 3/30/17 22:57, Vitaly Burovoy wrote:
>> Why do you still want to leave "ADD IF NOT EXISTS" instead of using
>> "SET IF NOT EXISTS"?
>> If someone wants to follow the standard he can simply not to use "IF
>> NOT EXISTS" at all. Without it error should be raised.
>
> As I tried to mention earlier, it is very difficult to implement the IF
> NOT EXISTS behavior here, because we need to run the commands that create
> the sequence before we know whether we will need it.

I understand; you did not mention the reason before.
But now you have the "get_attidentity" function to check whether the
column is already an identity column or not.
If it is not, create the sequence; otherwise skip the creation.
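Roughly like this (only a sketch; it assumes get_attidentity() returns '\0'
for a non-identity column, and the helper name is made up):

  char    identity = get_attidentity(RelationGetRelid(rel), attnum);

  if (identity == '\0')
  {
      /* not an identity column yet: create the backing sequence */
      create_identity_sequence(rel, attnum, seqoptions);  /* hypothetical helper */
  }
  else if (if_not_exists)
  {
      /* IF NOT EXISTS: the identity already exists, silently skip creation */
  }
  else
      ereport(ERROR, ...);    /* without IF NOT EXISTS, raise an error */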

I'm afraid that after the feature freeze it is impossible to do "major
reworking of things", but then after the release we will have to keep
supporting the "ALTER COLUMN col ADD" grammar.

> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Here are my next nitpicks on the last patch. I tried to find places
where variables lack initialization.
All other things seem good to me now. I'll continue to review the
patch while you're fixing the current notes.
Unfortunately I don't know PG internals deeply enough to assess the
correctness of the changes in the executor (which Tom and Andres are
discussing).


0. There is a small conflict when applying to the current master
because of 60a0b2e.

1.
First of all, I think the previous usage of the constant
"ATTRIBUTE_IDENTITY_NONE" (for setting a value) is better than just
'\0'.
It is OK to omit the constant in comparisons.

2.
Lack of "SET IDENTITY" and "RESTART" in the "Compatibility" chapter in
"alter_table.sgml":
"The forms ADD (without USING INDEX), DROP, SET DEFAULT, and SET DATA
TYPE (without USING) conform with the SQL standard."
Also ADD IDENTITY is an extension (I hope temporary), but it looks
like a standard feature (from the above sentance).

3.
It would be better if tab-completion had the ability to complete the
"RESTART" keyword, like:
ALTER TABLE x1 ALTER COLUMN i RESTART
tab-complete.c:8898
- COMPLETE_WITH_LIST5("TYPE", "SET", "RESET", "ADD", "DROP");
+ COMPLETE_WITH_LIST6("TYPE", "SET", "RESET", "RESTART", "ADD", "DROP");

4.
The block in "gram.y":
/* ALTER TABLE  ALTER [COLUMN]  DROP IDENTITY */
does not contain "n->missing_ok = false;". I think it should be.

5.
In the file "pg_dump.c" in the "getTableAttrs" function at row 7959
the "tbinfo->needs_override" is used (in the "||" operator) but it is
initially nowhere set to "false".

6.
And finally... a segfault when pg_dump is run against pre-9.6 databases:
the SQL query in the file "pg_dump.c" in the function "getTables" has the
"is_identity_sequence" column only in the "if (fout->remoteVersion >=
90600)" block (row 5536), whereas 9.6 does not support it (but OK,
for 9.6 it is always FALSE, no sense in creating a new "if" block), yet
the other blocks do not add ", FALSE as is_identity_sequence".

The same mistake is in the "getTableAttrs" function. Moreover, is
"ORDER BY a.attrelid, a.attnum" really necessary ("else if
(fout->remoteVersion >= 90200)" has only "ORDER BY a.attnum")?


-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Craig Ringer
On 5 April 2017 at 10:37, Tsunakawa, Takayuki
 wrote:

> Good point!  And as I said earlier in this thread, I think managing privileges 
> (adding/revoking privileges from the user account) is the DBA's or sysadmin's 
> duty, and PG's removing all privileges feels like overkill.

I think it's a sensible alternative to refusing to run as a highly
privileged role, which is what we used to do IIRC.

> OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock 
> Pages in Memory, using the attached pg_ctl.c.  Please see 
> EnableLockPagesPrivilege() and its call site.  But pg_ctl -w start fails 
> emitting the following message:

That won't work. You'd have to pass 0 to the flags of
CreateRestrictedToken and instead supply a PrivilegesToDelete array.
You'd probably GetTokenInformation and AND with a mask of ones you
wanted to retain.
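
Untested sketch of what I mean (the retained privilege, helper name, and lack
of error handling are only illustrative):

  static HANDLE
  CreateTokenKeepingLockPages(HANDLE origToken)
  {
      DWORD       len = 0;
      PTOKEN_PRIVILEGES tp;
      LUID        keep;
      LUID_AND_ATTRIBUTES *del;
      DWORD       ndel = 0;
      DWORD       i;
      HANDLE      newToken = NULL;

      /* the one privilege we want to keep */
      LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &keep);

      /* enumerate everything the token currently holds */
      GetTokenInformation(origToken, TokenPrivileges, NULL, 0, &len);
      tp = (PTOKEN_PRIVILEGES) malloc(len);
      GetTokenInformation(origToken, TokenPrivileges, tp, len, &len);

      /* build PrivilegesToDelete = all privileges except the kept one */
      del = (LUID_AND_ATTRIBUTES *) malloc(sizeof(*del) * tp->PrivilegeCount);
      for (i = 0; i < tp->PrivilegeCount; i++)
      {
          if (tp->Privileges[i].Luid.LowPart == keep.LowPart &&
              tp->Privileges[i].Luid.HighPart == keep.HighPart)
              continue;           /* retain this one */
          del[ndel++] = tp->Privileges[i];
      }

      /* Flags = 0, i.e. not DISABLE_MAX_PRIVILEGE */
      CreateRestrictedToken(origToken, 0, 0, NULL, ndel, del, 0, NULL, &newToken);

      free(del);
      free(tp);
      return newToken;
  }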

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-04-04 Thread Emre Hasegeli
> Interested to hear comments on this.


I don't have a chance to test it right now, but I am sure it would be an
improvement over what we have now.  There is no single correct
equation with so many unknowns.

>   *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);

Have you preserved this line intentionally?  This would make the BRIN index
scan cost even higher, but the primary property of BRIN is to be cheap to
scan.  Shouldn't the bitmap heap scan node account for the cost of checking
the tuples instead?


Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Tsunakawa, Takayuki
From: Craig Ringer [mailto:craig.rin...@2ndquadrant.com]
> On 5 April 2017 at 10:37, Tsunakawa, Takayuki
>  wrote:
> 
> OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock
> Pages in Memory, using the attached pg_ctl.c.  Please see
> EnableLockPagesPrivilege() and its call site.  But pg_ctl -w start fails
> emitting the following message:
> 
> That won't work. You'd have to pass 0 to the flags of CreateRestrictedToken
> and instead supply a PrivilegesToDelete array.
> You'd probably GetTokenInformation and AND with a mask of ones you wanted
> to retain.

Uh, that's inconvenient.  We can't easily determine which privileges to delete, and we 
would have to be aware of new privileges added in future versions of Windows.

Then, I have to say the last patch (v12) is the final answer.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-04-04 Thread Michael Paquier

On Wed, Apr 5, 2017 at 7:05 AM, Heikki Linnakangas  wrote:
> I will continue tomorrow, but I wanted to report on what I've done so far.
> Attached is a new patch version, quite heavily modified. Notable changes so
> far:

Great, thanks!

> * Use Unicode codepoints, rather than UTF-8 bytes packed in a 32-bit ints.
> IMHO this makes the tables easier to read (to a human), and they are also
> packed slightly more tightly (see next two points), as you can fit more
> codepoints in a 16-bit integer.

Using codepoints directly is not very consistent with the existing
backend, but for the sake of packing things more tightly, OK.

> * Reorganize the decomposition table. Instead of a separate binary-searched
> table for each "size", have one table that's ordered by codepoint, which
> contains a length and offset into another array, where all the decomposed
> sequences are. This makes the recomposition step somewhat slower, as it now
> has to grovel through an array that contains all decompositions, rather than
> just the array that contains decompositions of two characters. But this
> isn't performance-critical, readability and tight packing of the tables
> matter more.

Okay, no objection to that.

> * In the lookup table, store singleton decompositions that decompose to a
> single character, and the character fits in a uint16, directly in the main
> lookup table, instead of storing an index into the other table. This makes
> the tables somewhat smaller, as there are a lot of such decompositions.

Indeed.
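
Just to be sure I read the new layout right, I picture the main table roughly
as follows (a sketch only; the field names are mine):

  /* one entry per codepoint that has a decomposition, sorted by codepoint
   * so it can be binary-searched */
  typedef struct
  {
      uint32      codepoint;      /* Unicode codepoint */
      uint8       dec_size_flags; /* number of decomposed codepoints, plus
                                   * flag bits (e.g. compatibility-only) */
      uint16      dec_index;      /* offset into UnicodeDecomp_codepoints[],
                                   * or, for a singleton decomposition that
                                   * fits in 16 bits, the codepoint itself */
  } pg_unicode_decomposition;

  /* all decomposed sequences, concatenated */
  static const uint32 UnicodeDecomp_codepoints[] = { /* ... */ };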

> * Use uint32 instead of pg_wchar to represent unicode codepoints. pg_wchar
> suggests something that holds multi-byte characters in the OS locale, used
> by functions like wcscmp, so avoid that. I added a "typedef uint32
> Codepoint" alias, though, to make it more clear which variables hold Unicode
> codepoints rather than e.g. indexes into arrays.
>
> * Unicode consortium publishes a comprehensive normalization test suite, as
> a text file, NormalizationTest.txt. I wrote a small perl and C program to
> run all the tests from that test suite, with the normalization function.
> That uncovered a number of bugs in the recomposition algorithm, which I then
> fixed:

I was looking for something like that for some time, thanks! That's
here actually:
http://www.unicode.org/Public/8.0.0/ucd/NormalizationTest.txt

>  * decompositions that map to ascii characters cannot be excluded.
>  * non-canonical and non-starter decompositions need to be excluded. They
> are now flagged in the lookup table, so that we only use them during the
> decomposition phase, not when recomposing.

Okay on that.

> TODOs / Discussion points:
>
> * I'm not sure I like the way the code is organized, I feel that we're
> littering src/common with these. Perhaps we should add a
> src/common/unicode_normalization directory for all this.

pg_utf8_islegal() and pg_utf_mblen() should also be moved into their
own file, I think, and wchar.c can use that.

> * The list of characters excluded from recomposition is currently hard-coded
> in utf_norm_generate.pl. However, that list is available in machine-readable
> format, in file CompositionExclusions.txt. Since we're reading most of the
> data from UnicodeData.txt, would be good to read the exclusion table from a
> file, too.

Ouch. Those are present here...
http://www.unicode.org/reports/tr41/tr41-19.html#Exclusions
Definitely it makes more sense to read them from a file.

> * SASLPrep specifies normalization form KC, but it also specifies that some
> characters are mapped to space or nothing. Should do those mappings, too.

Ah, right. Those ones are here:
https://tools.ietf.org/html/rfc3454#appendix-B.1
The conversion tables need some extra tweaking...

+   else if ((*utf & 0xf0) == 0xe0)
+   {
+   if (utf[1] == 0 || utf[2] == 0)
+   goto invalid;
+   cp = (*utf++) & 0x0F;
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   }
+   else if ((*utf & 0xf8) == 0xf0)
+   {
+   if (utf[1] == 0 || utf[2] == 0|| utf[3] == 0)
+   goto invalid;
+   cp = (*utf++) & 0x07;
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   }
I find something like pg_utf2wchar_with_len() more readable, but well...

Some typos:
s/fore/for/
s/reprsented/represented/

You seem to be fully on it... I can help out with any of those items as needed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-05 10:05:19 +0900, Tatsuo Ishii wrote:
>> What's your point of the question? What kind of problem do you expect
>> if the timeout starts only once at the first parse meesage out of
>> bunch of parse messages?

> It's perfectly valid to send a lot of Parse messages without
> interspersed Sync or Bind/Execute message.  There'll be one timeout
> covering all of those Parse messages, which can thus lead to a timeout,
> even though nothing actually takes long individually.

It might well be reasonable to redefine statement_timeout as limiting the
total time from the first client input to the response to Sync ... but
if that's what we're doing, let's make sure we do it consistently.

I haven't read the patch, but the comments in this thread make me fear
that it's introducing some ad-hoc, inconsistent behavior.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 4/5/17 00:58, Tom Lane wrote:
>> Another issue is whether you won't get compiler complaints about
>> redefinition of the "true" and "false" macros.  But those would
>> likely only be warnings, not flat-out errors.

> The complaint about bool is also just a warning.

Really?

$ cat test.c
typedef char bool;
typedef char bool;
$ gcc -c test.c
test.c:2: error: redefinition of typedef 'bool'
test.c:1: note: previous declaration of 'bool' was here

This is with gcc 4.4.7.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-04 Thread Amit Langote
On 2017/04/05 6:22, Keith Fiske wrote:
> On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
>> Please find attached an updated patch.
>> Following has been accomplished in this update:
>>
>> 1. A new partition can be added after default partition if there are no
>> conflicting rows in default partition.
>> 2. Solved the crash reported earlier.
>
> Installed and compiled against commit
> 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> -0400) without any issue
> 
> However, running your original example, I'm getting this error
> 
> keith@keith=# CREATE TABLE list_partitioned (
> keith(# a int
> keith(# ) PARTITION BY LIST (a);
> CREATE TABLE
> Time: 4.933 ms
> keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
> Time: 3.492 ms
> keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> IN (4,5);
> ERROR:  unrecognized node type: 216

It seems like the new ExecPrepareCheck should be used in the place of
ExecPrepareExpr in the code added in check_new_partition_bound().

> Also, I'm still of the opinion that denying future partitions of values in
> the default would be a cause of confusion. In order to move the data out of
> the default and into a proper child it would require first removing that
> data from the default, storing it elsewhere, creating the child, then
> moving it back. If it's only a small amount of data it may not be a big
> deal, but if it's a large amount, that could cause quite a lot of
> contention if done in a single transaction. Either that or the user would
> have to deal with data existing in the table, disappearing, then
> reappearing again.
>
> This also makes it harder to migrate an existing table easily. Essentially
> no child tables for a large, existing data set could ever be created
> without going through one of the two situations above.

I thought of the following possible way to allow future partitions when
the default partition exists which might contain rows that belong to the
newly created partition (unfortunately, nothing that we could implement at
this point for v10):

Suppose you want to add a new partition which will accept key=3 rows.

1. If no default partition exists, we're done; no key=3 rows would have
   been accepted by any of the table's existing partitions, so no need to
   move any rows

2. Default partition exists which might contain key=3 rows, which we'll
   need to move.  If we do this in the same transaction, as you say, it
   might result in unnecessary unavailability of table's data.  So, it's
   better to delegate that responsibility to a background process.  The
   current transaction will only add the new key=3 partition, so any key=3
   rows will be routed to the new partition from this point on.  But we
   haven't updated the default partition's constraint yet to say that it
   no longer contains key=3 rows (constraint that the planner consumes),
   so it will continue to be scanned for queries that request key=3 rows
   (there should be some metadata to indicate that the default partition's
   constraint is invalid), along with the new partition.

3. A background process receives a "work item" requesting it to move all
   key=3 rows from the default partition heap to the new partition's heap.
   The movement occurs without causing the table to become unavailable;
   once all rows have been moved, we momentarily lock the table to update
   the default partition's constraint to mark it valid, so that it will
   no longer be accessed by queries that want to see key=3 rows.

Regarding 2, there is a question of whether it should not be possible for
the row movement to occur in the same transaction.  Somebody may want that
to happen because they chose to run the command during a maintenance
window, when the table's becoming unavailable is not an issue.  In that
case, we need to think of the interface more carefully.

Regarding 3, I think the new autovacuum work items infrastructure added by
the following commit looks very promising:

* BRIN auto-summarization *
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4

> However, thinking through this, I'm not sure I can see a solution without
> the global index support. If this restriction is removed, there's still an
> issue of data duplication after the necessary child table is added. So
> guess it's a matter of deciding which user experience is better for the
> moment?

I'm not sure about the fate of this particular patch for v10, but until we
implement a solution to move rows and design appropriate interface for the
same, we could error out if moving rows is required at all, like the patch
does.

Could you briefly elaborate why you think the lack of global index support
would be a problem in this regard?

I agree that some design is required here to implement a solution for
redistribution of rows; not only in the context of supporting the notion
of 

Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-04 Thread Rushabh Lathia
On Wed, Apr 5, 2017 at 10:59 AM, Amit Langote  wrote:

> On 2017/04/05 6:22, Keith Fiske wrote:
> > On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
> >> Please find attached an updated patch.
> >> Following has been accomplished in this update:
> >>
> >> 1. A new partition can be added after default partition if there are no
> >> conflicting rows in default partition.
> >> 2. Solved the crash reported earlier.
> >
> > Installed and compiled against commit
> > 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> > -0400) without any issue
> >
> > However, running your original example, I'm getting this error
> >
> > keith@keith=# CREATE TABLE list_partitioned (
> > keith(# a int
> > keith(# ) PARTITION BY LIST (a);
> > CREATE TABLE
> > Time: 4.933 ms
> > keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned
> FOR
> > VALUES IN (DEFAULT);
> > CREATE TABLE
> > Time: 3.492 ms
> > keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR
> VALUES
> > IN (4,5);
> > ERROR:  unrecognized node type: 216
>
> It seems like the new ExecPrepareCheck should be used in the place of
> ExecPrepareExpr in the code added in check_new_partition_bound().
>
> > Also, I'm still of the opinion that denying future partitions of values
> in
> > the default would be a cause of confusion. In order to move the data out
> of
> > the default and into a proper child it would require first removing that
> > data from the default, storing it elsewhere, creating the child, then
> > moving it back. If it's only a small amount of data it may not be a big
> > deal, but if it's a large amount, that could cause quite a lot of
> > contention if done in a single transaction. Either that or the user would
> > have to deal with data existing in the table, disappearing, then
> > reappearing again.
> >
> > This also makes it harder to migrate an existing table easily.
> Essentially
> > no child tables for a large, existing data set could ever be created
> > without going through one of the two situations above.
>
> I thought of the following possible way to allow future partitions when
> the default partition exists which might contain rows that belong to the
> newly created partition (unfortunately, nothing that we could implement at
> this point for v10):
>
> Suppose you want to add a new partition which will accept key=3 rows.
>
> 1. If no default partition exists, we're done; no key=3 rows would have
>been accepted by any of the table's existing partitions, so no need to
>move any rows
>
> 2. Default partition exists which might contain key=3 rows, which we'll
>need to move.  If we do this in the same transaction, as you say, it
>might result in unnecessary unavailability of table's data.  So, it's
>better to delegate that responsibility to a background process.  The
>current transaction will only add the new key=3 partition, so any key=3
>rows will be routed to the new partition from this point on.  But we
>haven't updated the default partition's constraint yet to say that it
>no longer contains key=3 rows (constraint that the planner consumes),
>so it will continue to be scanned for queries that request key=3 rows
>(there should be some metadata to indicate that the default partition's
>constraint is invalid), along with the new partition.
>
> 3. A background process receives a "work item" requesting it to move all
>key=3 rows from the default partition heap to the new partition's heap.
>The movement occurs without causing the table to become unavailable;
>once all rows have been moved, we momentarily lock the table to update
>the default partition's constraint to mark it valid, so that it will
>no longer be accessed by queries that want to see key=3 rows.
>
> Regarding 2, there is a question of whether it should not be possible for
> the row movement to occur in the same transaction.  Somebody may want that
> to happen because they chose to run the command during a maintenance
> window, when the table's becoming unavailable is not an issue.  In that
> case, we need to think of the interface more carefully.
>
> Regarding 3, I think the new autovacuum work items infrastructure added by
> the following commit looks very promising:
>
> * BRIN auto-summarization *
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=
> 7526e10224f0792201e99631567bbe44492bbde4
>
> > However, thinking through this, I'm not sure I can see a solution without
> > the global index support. If this restriction is removed, there's still
> an
> > issue of data duplication after the necessary child table is added. So
> > guess it's a matter of deciding which user experience is better for the
> > moment?
>
> I'm not sure about the fate of this particular patch for v10, but until we
> implement a solution to move rows and design appropriate interface for the
> same, we could error out if moving rows is required at 

Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Masahiko Sawada
On Tue, Apr 4, 2017 at 2:12 PM, Amit Langote
 wrote:
> On 2017/03/30 17:39, Masahiko Sawada wrote:
>> On Wed, Mar 29, 2017 at 5:38 PM, vinayak wrote:
> I have updated the patch.
>>
>> I reviewed v7 patch.
>>
>> When I ran ANALYZE command to the table having 5 millions rows with 3
>> child tables, the progress report I got shows the strange result. The
>> following result was output after sampled all target rows from child
>> tables (i.g, after finished acquire_inherited_sample_rows). I think
>> the progress information should report 100% complete at this point.
>>
>
> [...]
>
>>
>> I guess the cause of this is that you always count the number of
>> sampled rows in acquire_sample_rows starting from 0 for each child
>> table. num_rows_sampled is reset whenever starting collecting sample
>> rows.
>> Also, even if table has child table, the total number of sampling row
>> is not changed. I think that main differences between analyzing on
>> normal table and on partitioned table is where we fetch the sample row
>> from. So I'm not sure if adding "computing inherited statistics" phase
>> is desirable. If you want to report progress of collecting sample rows
>> on child table I think that you might want to add the information
>> showing which child table is being analyzed.
>
> Two kinds of statistics are collected if the table is an inheritance parent.
>
> First kind considers the table by itself and calculates regular
> single-table statistics using rows sampled from the only available heap
> (by calling do_analyze_rel() with inh=false).
>
> The second kind are called inheritance statistics, which consider rows
> sampled from the whole inheritance hierarchy (by calling do_analyze_rel()
> with inh=true).  Note that we are still collecting statistics for the
> parent table, so we are not really "analyzing" child tables; we just collect
> sample rows from them to calculate whole-tree statistics for the root
> parent table.

Agreed.

>  It might still be possible to show the child table being
> sampled though.
>
>> --
>> pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
>> the number of rows the target table actually has. So if the table has
>> fewer rows than the target number of rows, num_rows_sampled never reaches
>> num_target_sample_rows.
>>
>> --
>> @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
>> }
>>
>> samplerows += 1;
>> +
>> +   /* Report number of rows sampled so far */
>> +
>> pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
>> numrows);
>> }
>> }
>>
>> I think that it's wrong to use the numrows variable to report the progress
>> of the number of rows sampled. As the following comment mentions, we
>> first save rows into rows[] and increment numrows while numrows <
>> targrows. After that we may replace a stored sample row with a newly
>> sampled row. That is, after collecting "numrows" rows, analyze can still
>> take a long time to fetch and replace sample rows. To compute the progress
>> of collecting sample rows, IMO reporting the number of blocks we have
>> scanned so far is better than the number of sample rows. Thoughts?
>
> We can report progress in terms of individual blocks only inside
> acquire_sample_rows(), which seems undesirable when one thinks that we
> will be resetting the target for every child table.  We should have a
> global target that considers all child tables in the inheritance
> hierarchy, which maybe is possible if we count them beforehand in
> acquire_inheritance_sample_rows().  But why not use target sample rows,
> which remains the same for both when we're collecting sample rows from one
> table and from the whole inheritance hierarchy.  We can keep the count of
> already collected rows in a struct that is used across calls for all the
> child tables and increment upward from that count when we start collecting
> from a new child table.

Another option I came up with is to support a new pgstat progress
function, say pgstat_progress_incr_param, which increments the index'th
member in st_progress_param[]. That way we just need to report a delta
using that function.
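Roughly like this, mirroring pgstat_progress_update_param() (only a sketch):

  void
  pgstat_progress_incr_param(int index, int64 incr)
  {
      volatile PgBackendStatus *beentry = MyBEEntry;

      Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);

      if (!beentry || !pgstat_track_activities)
          return;

      pgstat_increment_changecount_before(beentry);
      beentry->st_progress_param[index] += incr;   /* add the delta */
      pgstat_increment_changecount_after(beentry);
  }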

>
>>> /*
>>>  * The first targrows sample rows are simply copied into the
>>>  * reservoir. Then we start replacing tuples in the sample
>>>  * until we reach the end of the relation.  This algorithm is
>>>  * from Jeff Vitter's paper (see full citation below). It
>>>  * works by repeatedly computing the number of tuples to skip
>>>  * before selecting a tuple, which replaces a randomly chosen
>>>  * element of the reservoir (current set of tuples).  At all
>>>  * times the reservoir is a true random sample of the tuples
>>>  * we've passed over so far, so when we fall off the end of
>>>  * the relation we're done.
>>>  */
>
> It seems that we could use samplerows instead 

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Neha Khatri
Looking further in this context, the number of active parallel workers is:
parallel_register_count - parallel_terminate_count

Can active workers ever be greater than max_parallel_workers? I think not.
Then why should there be a greater-than check in the following condition:

if (parallel && (BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) >= max_parallel_workers)

I feel there should be an assert if
(BackgroundWorkerData->parallel_register_count
- BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers

And the check could be
if (parallel && (active_parallel_workers == max_parallel_workers))
then do not register new parallel workers and return.

There should be no tolerance for the case when active_parallel_workers >
max_parallel_workers. After all, that is the purpose of max_parallel_workers.
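
In other words, something along these lines inside
RegisterDynamicBackgroundWorker() (sketch only; the variable name is just
for illustration):

  uint32  active_parallel_workers;

  active_parallel_workers = BackgroundWorkerData->parallel_register_count -
      BackgroundWorkerData->parallel_terminate_count;

  /* the active count should never be able to exceed the limit */
  Assert(active_parallel_workers <= max_parallel_workers);

  if (parallel && active_parallel_workers == max_parallel_workers)
      return false;           /* all parallel worker slots are taken */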

Or is it that multiple backends may try to register parallel workers at the
same time, and that is why the greater-than check is present?

Thoughts?

Regards,
Neha


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2017-04-04 Thread Daniel Gustafsson
> On 04 Apr 2017, at 05:52, Alvaro Herrera  wrote:
> 
> Daniel Gustafsson wrote:
>> Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
>> mixed.  Since the option defnames are all lowercased, either via IDENT, 
>> keyword
>> rules or “by hand” with makeString(), using strcmp() is safe (and assumed to 
>> be
>> so in quite a lot of places).
>> 
>> While it’s not incorrect per se to use pg_strcasecmp(), it has the potential 
>> to
>> hide a DefElem created with a mixed-case defname where it in other places is
>> expected to be in lowercase, which may lead to subtle bugs.
>> 
>> The attached patch refactors to use strcmp() consistently for option 
>> processing
>> in the command code as a pre-emptive belts+suspenders move against such 
>> subtle
>> bugs and to make the code more consistent.  Also reorders a few checks to 
>> have
>> all in the same “format” and removes a comment related to the above.
>> 
>> Tested with randomizing case on options in make check (not included in 
>> patch).
> 
> Does it work correctly in the Turkish locale?

Yes, running make check with random case defnames under tr_TR works fine.

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Variable substitution in psql backtick expansion

2017-04-04 Thread Fabien COELHO


Hello Pavel,


The expression evaluation is interesting question, but there is a
workaround - we can use \gset already.


Yes, that is a good point. It is a little bit inconvenient because it 
requires a dummy variable name each time for testing.


  SELECT whatever AS somename \gset
  \if :somename

But this is already a functional solution for using server-side expressions,
so there is no hurry.



What is more important, because there is not any workaround, is detection
if some variable exists or not.

So possibilities

1. \if defined varname


Yep, and as Tom pointed it should handle NOT as well.

My issue with this version is that Tom Lane convinced me some time ago
that client-side expressions should look like SQL expressions, so that
everything in the script is somehow in the same language. I think that he
made a good point.


However "defined varname" is definitely not an SQL expression, so I do not 
find that "intuitive", for a given subjective idea of intuitive.


Then there is the question of simple comparisons, which would also make 
sense client-side, eg simple things like:


  \if :VERSION_NUM >= 11

Should not need to be executed on the server.


2. \ifdefined or \ifdef varname


I think that we want to avoid that if possible, but it is a cpp-like
possibility. This approach does not allow supporting comparisons.



3. \if :?varname


Tom suggested that there is a special namespace problem with this option.
I did not understand what the actual issue is.



I like first two, and I can live with @3 - although it is not intuitive


For me @3 is neither worse nor better than the already existing :'varname'
and :"varname" hacks; it is consistent with them, so it is just an
extension of the existing approach.


It seems easy to implement because the substitution would be handled by 
the lexer, so there is no need for anything special like looking at the 
first or second word, rewinding, whatever.


Basically I agree with everything Tom suggested (indeed, some client side 
definition & comparison tests are really needed), but not with the 
proposed prefix syntax because it does not look clean and SQL.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: Introduction a commontype as new polymorphic type

2017-04-04 Thread Pavel Stehule
Hi

I am still a little bit unhappy with missing functionality in our generic
types.

If I write function fx(anyelement, anyelement) returns anyelement

postgres=# create or replace function fx(anyelement, anyelement) returns
anyelement
as $$ select greatest($1,$2) $$ language sql;
CREATE FUNCTION
postgres=# select fx(1,1.1);
ERROR:  function fx(integer, numeric) does not exist
LINE 1: select fx(1,1.1);
   ^
It fails on basic example.

What do you think about introducing a new, similar polymorphic type that
will use the common type of the actual parameters?

something like

create or replace function fx(anyvalue, anyvalue) returns anyvalue
create or replace function fx(anyvalue[]) returns anyvalue

Using "any" and casting inside function has significant negative impact on
performance

Regards

Pavel


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-04 06:18:04 +, Tsunakawa, Takayuki wrote:
> From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
> > It's too late. Someone has already moved the patch to the next CF (for
> > PostgreSQL 11).
> 
> Yes, but this patch will be necessary by the final release of PG 10 if the 
> libpq batch/pipelining is committed in PG 10.
> 
> I marked this as ready for committer in the next CF, so that some committer 
> can pick up this patch and consider putting it in PG 10.  If you decide to 
> modify the patch, please change the status.

Given the concern raised in 
http://archives.postgresql.org/message-id/12207.1491228316%40sss.pgh.pa.us
I don't see this being ready for committer.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tsunakawa, Takayuki
From: Andres Freund [mailto:and...@anarazel.de]
Given the concern raised in
> http://archives.postgresql.org/message-id/12207.1491228316%40sss.pgh.p
> a.us
> I don't see this being ready for committer.

If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute starts 
and stops the timer), then it's a concern and the patch should not be ready for 
committer.  However, the current patch is not like that -- it seems to do what 
others in this thread are expecting.

Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
>> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute 
>> starts and stops the timer), then it's a concern and the patch should not be 
>> ready for committer.  However, the current patch is not like that -- it 
>> seems to do what others in this thread are expecting.
> 
> Oh, interesting - I kind of took the author's statement as, uh,
> authoritative ;).  A quick look over the patch confirms your
> understanding.

Yes, Tsunakawa-san is correct. Sorry for confusion.

> I think the code needs a few clarifying comments around this, but
> otherwise seems good.  Not restarting the timeout in those cases
> obviously isn't entirely "perfect"/"correct", but a tradeoff - the
> comments should note that.
> 
> Tatsuo-san, do you want to change those, and push?  I can otherwise.

Andres,

If you don't mind, could you please fix the comments and push it.

> Unfortunately I can't move the patch back to the current CF, but I guess
> we can just mark it as committed in the next.

That will be great.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v25)

2017-04-04 Thread David Rowley
On 1 April 2017 at 04:25, David Rowley  wrote:
> I've attached an updated patch.

I've made another pass at this and ended up removing the tryextstats
variable. We now only try to use extended statistics when
clauselist_selectivity() is given a valid RelOptInfo with rtekind ==
RTE_RELATION, and of course, it must also have some extended stats
defined too.

I've also cleaned up a few more comments, many of which I had neglected
to update when I refactored how the selectivity estimation ties
into clauselist_selectivity().

I'm quite happy with all of this now, and would also be happy for
other people to take a look and comment.

As a reviewer, I'd be marking this ready for committer, but I've moved
a little way from just reviewing this now, having spent two weeks
hacking at it.

The latest patch is attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mv_functional-deps_2017-04-04.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread 'Andres Freund'
On 2017-04-04 06:35:00 +, Tsunakawa, Takayuki wrote:
> From: Andres Freund [mailto:and...@anarazel.de]
> Given the concern raised in
> > http://archives.postgresql.org/message-id/12207.1491228316%40sss.pgh.p
> > a.us
> > I don't see this being ready for committer.
> 
> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute 
> starts and stops the timer), then it's a concern and the patch should not be 
> ready for committer.  However, the current patch is not like that -- it seems 
> to do what others in this thread are expecting.

Oh, interesting - I kind of took the author's statement as, uh,
authoritative ;).  A quick look over the patch confirms your
understanding.

I think the code needs a few clarifying comments around this, but
otherwise seems good.  Not restarting the timeout in those cases
obviously isn't entirely "perfect"/"correct", but a tradeoff - the
comments should note that.

Tatsuo-san, do you want to change those, and push?  I can otherwise.

Unfortunately I can't move the patch back to the current CF, but I guess
we can just mark it as committed in the next.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some never executed code regarding the table sync worker

2017-04-04 Thread Masahiko Sawada
On Tue, Apr 4, 2017 at 6:26 PM, Kyotaro HORIGUCHI
 wrote:
> Hi,
>
> At Sat, 1 Apr 2017 02:35:00 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Parallel Append implementation

2017-04-04 Thread Amit Khandekar
On 4 April 2017 at 01:47, Andres Freund  wrote:
>> +typedef struct ParallelAppendDescData
>> +{
>> + LWLock  pa_lock;/* mutual exclusion to choose 
>> next subplan */
>> + int pa_first_plan;  /* plan to choose while 
>> wrapping around plans */
>> + int pa_next_plan;   /* next plan to choose by any 
>> worker */
>> +
>> + /*
>> +  * pa_finished : workers currently executing the subplan. A worker 
>> which
>> +  * finishes a subplan should set pa_finished to true, so that no new
>> +  * worker picks this subplan. For non-partial subplan, a worker which 
>> picks
>> +  * up that subplan should immediately set to true, so as to make sure
>> +  * there are no more than 1 worker assigned to this subplan.
>> +  */
>> + boolpa_finished[FLEXIBLE_ARRAY_MEMBER];
>> +} ParallelAppendDescData;
>
>
>> +typedef ParallelAppendDescData *ParallelAppendDesc;
>
> Pointer hiding typedefs make this Andres sad.

Yeah .. I was trying to be consistent with other parts of the code where we
have typedefs for both a structure and a pointer to that structure.

>
>
>
>> @@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node)
>>   if (subnode->chgParam == NULL)
>>   ExecReScan(subnode);
>>   }
>> +
>> + if (padesc)
>> + {
>> + padesc->pa_first_plan = padesc->pa_next_plan = 0;
>> + memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
>> + }
>> +
>
> Is it actually guaranteed that none of the parallel workers are doing
> something at that point?

ExecReScanAppend() would be called by ExecReScanGather().
ExecReScanGather() shuts down all the parallel workers before calling
its child node (i.e. ExecReScanAppend).


>> +static bool
>> +exec_append_parallel_next(AppendState *state)
>> +{
>> + ParallelAppendDesc padesc = state->as_padesc;
>> + int whichplan;
>> + int initial_plan;
>> + int first_partial_plan = ((Append 
>> *)state->ps.plan)->first_partial_plan;
>> + boolfound;
>> +
>> + Assert(padesc != NULL);
>> +
>> + /* Backward scan is not supported by parallel-aware plans */
>> + Assert(ScanDirectionIsForward(state->ps.state->es_direction));
>> +
>> + /* The parallel leader chooses its next subplan differently */
>> + if (!IsParallelWorker())
>> + return exec_append_leader_next(state);
>
> It's a bit weird that the leader's case does is so separate, and does
> it's own lock acquisition.

Since we wanted to prevent the leader from taking the most expensive
non-partial plans first, I thought it would be better to keep its logic
simple and separate, so I could not merge it into the main logic.

>
>
>> + found = false;
>> + for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;)
>> + {
>> + /*
>> +  * Ignore plans that are already done processing. These also 
>> include
>> +  * non-partial subplans which have already been taken by a 
>> worker.
>> +  */
>> + if (!padesc->pa_finished[whichplan])
>> + {
>> + found = true;
>> + break;
>> + }
>> +
>> + /*
>> +  * Note: There is a chance that just after the child plan node 
>> is
>> +  * chosen above, some other worker finishes this node and sets
>> +  * pa_finished to true. In that case, this worker will go 
>> ahead and
>> +  * call ExecProcNode(child_node), which will return NULL tuple 
>> since it
>> +  * is already finished, and then once again this worker will 
>> try to
>> +  * choose next subplan; but this is ok : it's just an extra
>> +  * "choose_next_subplan" operation.
>> +  */
>
> IIRC not all node types are safe against being executed again when
> they've previously returned NULL.  That's why e.g. nodeMaterial.c
> contains the following blurb:
> /*
>  * If necessary, try to fetch another row from the subplan.
>  *
>  * Note: the eof_underlying state variable exists to short-circuit 
> further
>  * subplan calls.  It's not optional, unfortunately, because some plan
>  * node types are not robust about being called again when they've 
> already
>  * returned NULL.
>  */

This scenario is different from the parallel append scenario described
by my comment. A worker sets pa_finished to true only when it itself
gets a NULL tuple for a given subplan. So in
exec_append_parallel_next(), suppose a worker W1 finds a subplan with
pa_finished=false. So it chooses it. Now a different worker W2 sets
this subplan's pa_finished=true because W2 has got a NULL tuple. But
W1 hasn't yet got a NULL tuple. If it had got a NULL tuple earlier, it
would have itself set pa_finished to true, and then it 

Re: [HACKERS] Variable substitution in psql backtick expansion

2017-04-04 Thread Pavel Stehule
2017-04-04 9:53 GMT+02:00 Fabien COELHO :

>
> Hello Pavel,
>
> The expression evaluation is interesting question, but there is a
>> workaround - we can use \gset already.
>>
>
> Yes, that is a good point. It is a little bit inconvenient because it
> requires a dummy variable name each time for testing.
>
>   SELECT whatever AS somename \gset
>   \if :somename
>
> But this is an already functional solution to use server-side expressions,
> so there is no hurry.
>
> What is more important, because there is not any workaround, is detection
>> if some variable exists or not.
>>
>> So possibilities
>>
>> 1. \if defined varname
>>
>
> Yep, and as Tom pointed it should handle NOT as well.
>
> My issue with this version is that Tom Lane convinced me some time ago
> that client side expressions should look like SQL expressions, so that
> everything in the script is somehow in the same language. I think that he
> made a good point.
>
> However "defined varname" is definitely not an SQL expression, so I do not
> find that "intuitive", for a given subjective idea of intuitive.
>
> Then there is the question of simple comparisons, which would also make
> sense client-side, eg simple things like:
>
>   \if :VERSION_NUM >= 11
>
> Should not need to be executed on the server.
>
> 2. \ifdefined or \ifdef varname
>>
>
> I think that we want to avoid that if possible, but it is a cpp-like
> possibility. This approach does not allow to support comparisons.
>
> 3. \if :?varname
>>
>
> Tom suggested that there is a special namespace problem with this option.
> I did not understand what is the actual issue.
>
> I like first two, and I can live with @3 - although it is not intuitive
>>
>
> For me @3 is neither worse nor better than the already existing :'varname'
> and :"varname" hacks, it is consistent with them, so it is just an
> extension of the existing approach.
>
> It seems easy to implement because the substitution would be handled by
> the lexer, so there is no need for anything special like looking at the
> first or second word, rewinding, whatever.
>
> Basically I agree with everything Tom suggested (indeed, some client side
> definition & comparison tests are really needed), but not with the proposed
> prefix syntax because it does not look clean and SQL.


I don't ever need a full SQL expression in \if commands. I would prefer some
simple functional language here, implemented only on the client side - maybe
the code from pgbench could be reused

\if fx( variable | constant [, ... ] )

The built-in functions could be just the basics:

defined, undefined, equal, greater, less

\if defined(varname)
\if geq(VERSION_NUM, 11000)

But this question is important - there is no workaround:

postgres=# select :xxx
postgres-# ;
ERROR:  syntax error at or near ":"
LINE 1: select :xxx
   ^
postgres=# \if :xxx
unrecognized value ":xxx" for "\if expression": boolean expected
postgres@#






>
>
> --
> Fabien.
>


Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Amit Langote
On 2017/04/04 15:30, Masahiko Sawada wrote:
>> We can report progress in terms of individual blocks only inside
>> acquire_sample_rows(), which seems undesirable when one thinks that we
>> will be resetting the target for every child table.  We should have a
>> global target that considers all child tables in the inheritance
>> hierarchy, which maybe is possible if we count them beforehand in
>> acquire_inheritance_sample_rows().  But why not use target sample rows,
>> which remains the same for both when we're collecting sample rows from one
>> table and from the whole inheritance hierarchy.  We can keep the count of
>> already collected rows in a struct that is used across calls for all the
>> child tables and increment upward from that count when we start collecting
>> from a new child table.
> 
> Another option I came up with is to support a new pgstat progress
> function, say pgstat_progress_incr_param, which increments the index'th
> member in st_progress_param[]. That way we just need to report a delta
> using that function.

That's an interesting idea.  It could be made to work and would not
require changing the interface of AcquireSampleRowsFunc, which seems very
desirable.

 /*
  * The first targrows sample rows are simply copied into the
  * reservoir. Then we start replacing tuples in the sample
  * until we reach the end of the relation.  This algorithm is
  * from Jeff Vitter's paper (see full citation below). It
  * works by repeatedly computing the number of tuples to skip
  * before selecting a tuple, which replaces a randomly chosen
  * element of the reservoir (current set of tuples).  At all
  * times the reservoir is a true random sample of the tuples
  * we've passed over so far, so when we fall off the end of
  * the relation we're done.
  */
>>
>> It seems that we could use samplerows instead of numrows to count the
>> progress (if we choose to count progress in terms of sample rows collected).
>>
> 
> I guess it's hard to count progress in terms of sample rows collected
> even if we use samplerows instead, because samplerows can be
> incremented independently of the target number of sampling rows. The
> samplerows can be incremented up to the total number of rows of
> relation.

Hmm, you're right.  It could be counted with a separate variable
initialized to 0 and incremented every time we decide to add a row to the
final set of sampled rows, although different implementations of
AcquireSampleRowsFunc have different ways of deciding if a given row will
be part of the final set of sampled rows.
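
For the heap implementation, I imagine something like the following fragment
inside acquire_sample_rows() (sketch only; nrows_selected is a made-up name,
the other variables are the ones already used there):

  if (numrows < targrows)
  {
      rows[numrows++] = ExecCopySlotTuple(slot);
      nrows_selected++;               /* the sample definitely grew */
  }
  else
  {
      /* Vitter's algorithm: possibly replace a random reservoir element */
      if (rowstoskip <= 0)
      {
          int     k = (int) (targrows * sampler_random_fract(rstate.randstate));

          heap_freetuple(rows[k]);
          rows[k] = ExecCopySlotTuple(slot);
          /* a replacement does not grow the sample; whether to count it
           * as "progress" is exactly the ambiguity mentioned above */
      }
      rowstoskip -= 1;
  }
  pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED, nrows_selected);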

On the other hand, if we decide to count progress in terms of blocks as
you suggested, I'm afraid that FDWs won't be able to report the
progress.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-04 Thread Etsuro Fujita

On 2017/04/04 14:38, Ashutosh Bapat wrote:

Probably we should use "could not be created" instead of "was not
created" in "... a local path suitable for EPQ checks was not created".


Done.


"outer_path should not require relations from inner_path" may be
reworded as "outer paths should not be parameterized by the inner
relations".

"neither path should require relations from the other path" may be
reworded as "neither path should be parameterized by the the other
joining relation".


Done.  I used "input" instead of "joining" in the latter, though.


Why is JOIN_RIGHT being treated differently from JOIN_LEFT? We
should be able
to create a nested loop join for JOIN_RIGHT?
+case JOIN_RIGHT:
+case JOIN_FULL:


I don't think so, because nestloop joins aren't supported for
JOIN_RIGHT.  See ExecInitNestLoop().


Hmm, I see in match_unsorted_outer()
1254 case JOIN_RIGHT:
1255 case JOIN_FULL:
1256 nestjoinOK = false;
1257 useallclauses = true;
1258 break;


Yeah, I should have pointed that out as well.

I rebased the patch also.  Please find attached an updated version of 
the patch.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1629,1650  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1629,1644 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1673,1694  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS 

Re: [HACKERS] Some never executed code regarding the table sync worker

2017-04-04 Thread Kyotaro HORIGUCHI
Hi,

At Sat, 1 Apr 2017 02:35:00 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Compiler warning in costsize.c

2017-04-04 Thread David Rowley
On 4 April 2017 at 16:22, Michael Paquier  wrote:
> Hi all,
>
> In builds where USE_ASSERT_CHECKING is not enabled, costsize.c can
> generate warnings. Here is for example with MSVC:
>   src/backend/optimizer/path/costsize.c(4520): warning C4101: 'rte' : 
> unreferen
> ced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>   src/backend/optimizer/path/costsize.c(4640): warning C4101: 'rte' : 
> unreferen
> ced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>
> a9c074ba7 has done an effort, but a bit more is needed as the attached.

Thanks for writing the patch.

I'm wondering if it would be worth simplifying it a little to get rid of
the double #ifdefs, as per attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


costsize-fix-warning_drowley.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2017-04-04 Thread Anastasia Lubennikova

01.04.2017 02:31, Peter Geoghegan:


* index_truncate_tuple() should have as an argument the number of
attributes. No need to "#include utils/rel.h" that way.

Will fix.
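So the signature would become something like this (just a sketch):

  /* pass the number of attributes to keep explicitly, so the function
   * no longer needs the Relation (and utils/rel.h) at all */
  extern IndexTuple index_truncate_tuple(TupleDesc tupleDescriptor,
                                         IndexTuple olditup,
                                         int natts);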


* I think that we should store this (the number of attributes), and
use it directly when comparing, per my remarks to Tom over on that
other thread. We should also use the free bit within
IndexTupleData.t_info, to indicate that the IndexTuple was truncated,
just to make it clear to everyone that might care that that's how
these truncated IndexTuples need to be represented.

Doing this would have no real impact on your patch, because for you
this will be 100% redundant. It will help external tools, and perhaps
another, more general suffix truncation patch that comes in the
future. We should try very hard to have a future-proof on-disk
representation. I think that this is quite possible.
To be honest, I think that it would make the patch overcomplicated, because 
this exact patch has nothing to do with suffix truncation.
Although, we can add any necessary flags if this work is continued 
in the future.

* I suggest adding a "can't happen" defensive check + error that
checks that the tuple returned by index_truncate_tuple() is sized <=
the original. This cannot be allowed to ever happen. (Not that I think
it will.)

There is already an assertion.
Assert(IndexTupleSize(newitup) <= IndexTupleSize(olditup));
Do you think it is not enough?

* I see a small bug. You forgot to teach _bt_findsplitloc() about
truncation. It does this currently, which you did not update:

 /*
  * The first item on the right page becomes the high key of the left page;
  * therefore it counts against left space as well as right space.
  */
 leftfree -= firstrightitemsz;

I think that this accounting needs to be fixed.
Could you explain what's wrong with this accounting? We may expect to 
take more space on the left page than will actually be taken after high-key 
truncation, but I don't see any problem here.



* Not sure about one thing. What's the reason for this change?


-   /* Log left page */
-   if (!isleaf)
-   {
-   /*
-* We must also log the left page's high key, because the right
-* page's leftmost key is suppressed on non-leaf levels.  Show it
-* as belonging to the left page buffer, so that it is not stored
-* if XLogInsert decides it needs a full-page image of the left
-* page.
-*/
-   itemid = PageGetItemId(origpage, P_HIKEY);
-   item = (IndexTuple) PageGetItem(origpage, itemid);
-   XLogRegisterBufData(0, (char *) item, 
MAXALIGN(IndexTupleSize(item)));
-   }
+   /*
+* We must also log the left page's high key, because the right
+* page's leftmost key is suppressed on non-leaf levels.  Show it
+* as belonging to the left page buffer, so that it is not stored
+* if XLogInsert decides it needs a full-page image of the left
+* page.
+*/
+   itemid = PageGetItemId(origpage, P_HIKEY);
+   item = (IndexTuple) PageGetItem(origpage, itemid);
+   XLogRegisterBufData(0, (char *) item, MAXALIGN(IndexTupleSize(item)));

Is this related to the problem that you mentioned to me that you'd
fixed when we spoke in person earlier today? You said something about
WAL logging, but I don't recall any details. I don't remember seeing
this change in prior versions.

Anyway, whatever the reason for doing this on the leaf level now, the
comments should be updated to explain it.

This change is related to the bug described in this message:
https://www.postgresql.org/message-id/20170330192706.GA2565%40e733.localdomain
After the fix it is not reproducible. I will update the comments in the next patch.

* Speaking of WAL-logging, I think that there is another bug in
btree_xlog_split(). You didn't change this existing code at all:

 /*
  * On leaf level, the high key of the left page is equal to the first key
  * on the right page.
  */
 if (isleaf)
 {
 ItemId  hiItemId = PageGetItemId(rpage, P_FIRSTDATAKEY(ropaque));

 left_hikey = PageGetItem(rpage, hiItemId);
 left_hikeysz = ItemIdGetLength(hiItemId);
 }

It seems like this was missed when you changed WAL-logging, since you
do something for this on the logging side, but not here, on the replay
side. No?


I changed it. Now we always use the high key saved in the xlog.
This code isn't needed anymore and can be deleted. Thank you for the 
notice. I will send an updated patch today.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-04-04 Thread Stas Kelvich

> On 4 Apr 2017, at 04:23, Masahiko Sawada  wrote:
> 
> 
> I reviewed this patch but when I tried to build contrib/test_decoding
> I got the following error.
> 

Thanks!

Yes, it seems that 18ce3a4a changed the ProcessUtility_hook signature.
Updated.

> There is still some unnecessary code in the v5 patch.

Actually, the second diff isn't intended to be part of the patch. I've just shared
the way I ran the regression test suite through 2PC decoding, changing
all commits to prepare/commit pairs where the commit happens only after decoding
of the prepare is finished (more details in my previous message in this thread).

That is just an argument against Andres' concern that a prepared transaction
can deadlock with the decoding process; at least there are no such cases in
the regression tests.

And that concern is the main thing blocking this patch. Apart from explicit catalog
locks in a prepared transaction, nobody has yet found such cases, and it is hard to
address or argue about.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



logical_twophase_v6.diff
Description: Binary data


logical_twophase_regresstest.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Parallel Bitmap Heap Scan - Prefetch pages are not updated properly

2017-04-04 Thread Dilip Kumar
While analyzing the coverage of the prefetching part, I found an
issue where prefetch_pages was not updated properly while executing in
parallel mode.

Attached patch fixes the same.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


parallel_bitmap_prefetch_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-04-04 Thread Amit Kapila
On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma  wrote:
>
> Please note that these patches needs to be applied on top of  [1].
>

Few more review comments:

1.
- page = BufferGetPage(so->hashso_curbuf);
+ blkno = so->currPos.currPage;
+ if (so->hashso_bucket_buf == so->currPos.buf)
+ {
+ buf = so->currPos.buf;
+ LockBuffer(buf, BUFFER_LOCK_SHARE);
+ page = BufferGetPage(buf);
+ }

Here, you are assuming that only the bucket page can be pinned, but I
think that assumption is not right.  When _hash_kill_items() is called
before moving to the next page, there could be a pin on an overflow page.
You need some logic to check whether the buffer is pinned and, if so, just lock
it.  I think once you do that, it might not be convenient to release
the pin at the end of this function.
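Something along these lines is what I have in mind (only a rough, untested
sketch, not the patch itself), assuming the scan keeps a pin on
so->currPos.buf whether that is the bucket page or an overflow page:

    buf = so->currPos.buf;
    Assert(BufferIsValid(buf));
    LockBuffer(buf, BUFFER_LOCK_SHARE);
    page = BufferGetPage(buf);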

Add some comments on top of _hash_kill_items to explain the new
changes, or say something like "See _bt_killitems for details".

2.
+ /*
+ * We save the LSN of the page as we read it, so that we know whether it
+ * safe to apply LP_DEAD hints to the page later.  This allows us to drop
+ * the pin for MVCC scans, which allows vacuum to avoid blocking.
+ */
+ so->currPos.lsn = PageGetLSN(page);
+

The second part of the above comment doesn't make sense, because vacuum
will still be blocked given that we hold the pin on the primary bucket page.

3.
+ {
+ /*
+ * No more matching tuples were found. return FALSE
+ * indicating the same. Also, remember the prev and
+ * next block number so that if fetching tuples using
+ * cursor we remember the page from where to start the
+ * scan.
+ */
+ so->currPos.prevPage = (opaque)->hasho_prevblkno;
+ so->currPos.nextPage = (opaque)->hasho_nextblkno;

You can't read opaque without holding the lock, and by this time it has
already been released.  Also, I think if you want to save the position for
cursor movement, then you need to save the position of the last bucket page
when the scan completes the overflow chain; however, as you have written it,
it will always be an invalid block number. I think there is a similar problem
with the previous block number.

4.
+_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
+   OffsetNumber maxoff, ScanDirection dir)
+{
+ HashScanOpaque so = (HashScanOpaque) scan->opaque;
+ IndexTuple  itup;
+ int itemIndex;
+
+ if (ScanDirectionIsForward(dir))
+ {
+ /* load items[] in ascending order */
+ itemIndex = 0;
+
+ /* new page, relocate starting position by binary search */
+ offnum = _hash_binsearch(page, so->hashso_sk_hash);

What is the need to find the offset number when this function already
receives it as an input parameter?

5.
+_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
+   OffsetNumber maxoff, ScanDirection dir)

I think maxoff is not required to be passed as a parameter to this
function, as you need it only for the forward scan. You can compute it
locally.

6.
+_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
+   OffsetNumber maxoff, ScanDirection dir)
{
..
+ if (ScanDirectionIsForward(dir))
+ {
..
+ while (offnum <= maxoff)
{
..
+ if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
+ _hash_checkqual(scan, itup))
+ {
+ /* tuple is qualified, so remember it */
+ _hash_saveitem(so, itemIndex, offnum, itup);
+ itemIndex++;
+ }
+
+ offnum = OffsetNumberNext(offnum);
..

Why are you traversing the whole page even when there is no match?
There is a similar problem in the backward scan. I think this is a blunder.
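
Just to illustrate (a rough, untested sketch, not the actual fix): given that
items on the page are kept in hash-value order, which is what lets
_hash_binsearch position the scan in the first place, the loop can bail out
as soon as the hash key stops matching:

    if (so->hashso_sk_hash != _hash_get_indextuple_hashkey(itup))
        break;              /* no later item on this page can match */

    if (_hash_checkqual(scan, itup))
    {
        /* tuple is qualified, so remember it */
        _hash_saveitem(so, itemIndex, offnum, itup);
        itemIndex++;
    }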

7.
+ if (so->currPos.buf == so->hashso_bucket_buf ||
+ so->currPos.buf == so->hashso_split_bucket_buf)
+ {
+ so->currPos.prevPage = InvalidBlockNumber;
+ LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+ }
+ else
+ {
+ so->currPos.prevPage = (opaque)->hasho_prevblkno;
+ _hash_relbuf(rel, so->currPos.buf);
+ }
+
+ so->currPos.nextPage = (opaque)->hasho_nextblkno;

What makes you think it is safe to read opaque after releasing the lock?
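
In other words, something like this ordering is needed (sketch only, untested):
read the block numbers while the page is still locked, and only then give up
the lock or the buffer:

    so->currPos.nextPage = (opaque)->hasho_nextblkno;

    if (so->currPos.buf == so->hashso_bucket_buf ||
        so->currPos.buf == so->hashso_split_bucket_buf)
    {
        so->currPos.prevPage = InvalidBlockNumber;
        LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
    }
    else
    {
        so->currPos.prevPage = (opaque)->hasho_prevblkno;
        _hash_relbuf(rel, so->currPos.buf);
    }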


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2017-04-04 Thread Kyotaro HORIGUCHI
Hello,

At Sun, 2 Apr 2017 12:21:14 -0400, Corey Huinker  
wrote in 

Re: [HACKERS] Page Scan Mode in Hash Index

2017-04-04 Thread Amit Kapila
On Mon, Mar 27, 2017 at 7:04 PM, Ashutosh Sharma  wrote:
>
> My guess (which could be wrong) is that so->hashso_bucket_buf =
>> InvalidBuffer should be moved back up higher in the function where it
>> was before, just after the first if statement, and that the new
>> condition so->hashso_bucket_buf == so->currPos.buf on the last
>> _hash_dropbuf() should be removed.  If that's not right, then I think
>> I need somebody to explain why not.
>
> Okay, as I mentioned above, in case of page scan mode we only keep pin
> on a bucket buf. There won't be any case where we will be having pin
> on overflow buf at the end of scan.
>

What if we mark the buffer as invalid after releasing the pin?  We are
already doing it that way in the btree code; refer to
_bt_drop_lock_and_maybe_pin().  I think if we do it that way, then we can
do what Robert is suggesting.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW and parallel execution

2017-04-04 Thread Kyotaro HORIGUCHI
Hi,

At Sun, 02 Apr 2017 16:30:24 +0300, Konstantin Knizhnik 
 wrote in <58e0fcf0.2070...@postgrespro.ru>
> Hi hackers and personally Robert (you are the best expert in both
> areas).
> I want to ask one more question concerning parallel execution and FDW.
> Below are two plans for the same query (TPC-H Q5): one for normal
> tables, another for FDW to vertical representation of the same data.
> FDW supports analyze function and is expected to produce the similar
> statistic as for original tables.

> The plans look very similar, but the first one is parallel and the second
> is not.
> My FDW provides implementation for IsForeignScanParallelSafe which
> returns true.
> I wonder what can prevent optimizer from using parallel plan in this
> case?

Parallel execution requires partial paths. Creating them is the job of
your FDW's GetForeignPaths callback.
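
Very roughly, and purely as a sketch (the cost and row figures, and the two
workers, are placeholders, not values from any real FDW), GetForeignPaths
would need to add something like this in addition to its regular path:

    ForeignPath *ppath;

    ppath = create_foreignscan_path(root, baserel,
                                    NULL,           /* default pathtarget */
                                    baserel->rows,
                                    startup_cost,   /* placeholder */
                                    total_cost,     /* placeholder */
                                    NIL,            /* no pathkeys */
                                    NULL,           /* no required outer rels */
                                    NULL,           /* no fdw_outer_path */
                                    NIL);           /* no fdw_private */
    ppath->path.parallel_safe = true;
    ppath->path.parallel_workers = 2;               /* placeholder */
    add_partial_path(baserel, (Path *) ppath);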

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-04-04 Thread Amit Khandekar
On 3 April 2017 at 17:13, Amit Langote  wrote:
> Hi Amit,
>
> Thanks for updating the patch.  Since ddl.sgml got updated on Saturday,
> patch needs a rebase.

Rebased now.

>
>> On 31 March 2017 at 16:54, Amit Khandekar  wrote:
>>> On 31 March 2017 at 14:04, Amit Langote  
>>> wrote:
 On 2017/03/28 19:12, Amit Khandekar wrote:
> In the section 5.11 "Partitioning" => subsection 5 "Caveats", I have
> removed the caveat of not being able to update partition key. And it
> is now replaced by the caveat where an update/delete operations can
> silently miss a row when there is a concurrent UPDATE of partition-key
> happening.

 Hmm, how about just removing the "partition-changing updates are
 disallowed" caveat from the list on the 5.11 Partitioning page and explain
 the concurrency-related caveats on the UPDATE reference page?
>>>
>>> IMHO this caveat is better placed in Partitioning chapter to emphasize
>>> that it is a drawback specifically in presence of partitioning.
>
> I mean we fixed things for declarative partitioning so it's no longer a
> caveat like it is for partitioning implemented using inheritance (in that
> the former doesn't require user-defined triggers to implement
> row-movement).  Seeing the first sentence, that is:
>
> An UPDATE causes a row to move from one partition to another
> if the new value of the row fails to satisfy the implicit partition
> constraint of the original partition but there is another partition which
> can fit this row.
>
> which clearly seems to suggest that row-movement, if required, is handled
> by the system.  So it's not clear why it's in this list.  If we want to
> describe the limitations of the current implementation, we'll need to
> rephrase it a bit.

Yes I agree.

> How about something like:
> For an UPDATE that causes a row to move from one partition to
> another due to the partition key being updated, the following caveats exist:
>  presence of concurrent manipulation of the row in question>

Now, with the slightly changed doc structure for partitioning in the
latest master, I have added this note at the end of section "5.10.2.
Declarative Partitioning":

---

"Updating the partition key of a row might cause it to be moved into a
different partition where this row satisfies its partition
constraint."

---

And then in the Limitations section, I have replaced the earlier
can't-update-partition-key limitation with this new limitation, as
below:

"When an UPDATE causes a row to move from one partition to another,
there is a chance that another concurrent UPDATE or DELETE misses this
row. Suppose, during the row movement, the row is still visible for
the concurrent session, and it is about to do an UPDATE or DELETE
operation on the same row. This DML operation can silently miss this
row if the row now gets deleted from the partition by the first
session as part of its UPDATE row movement. In such case, the
concurrent UPDATE/DELETE, being unaware of the row movement,
interprets that the row has just been deleted so there is nothing to
be done for this row. Whereas, in the usual case where the table is
not partitioned, or where there is no row movement, the second session
would have identified the newly updated row and carried UPDATE/DELETE
on this new row version."

---

Further, in the Notes section of update.sgml, I have kept a link to
the above limitations section, like this:

"In the case of a partitioned table, updating a row might cause it to
no longer satisfy the partition constraint of the containing
partition. In that case, if there is some other partition in the
partition tree for which this row satisfies its partition constraint,
then the row is moved to that partition. If there isn't such a
partition, an error will occur. The error will also occur when
updating a partition directly. Behind the scenes, the row movement is
actually a DELETE and INSERT operation. However, there is a
possibility that a concurrent UPDATE or DELETE on the same row may
miss this row. For details, see Section 5.10.2.3."

>
 +If an UPDATE on a partitioned table causes a row to
 +move to another partition, it is possible that all row-level
 +BEFORE UPDATE triggers and all row-level
 +BEFORE DELETE/INSERT
 +triggers are applied on the respective partitions in a way that is apparent
 +from the final state of the updated row.

 How about dropping "it is possible that" from this sentence?
>>>
>>> What the statement means is : "It is true that all triggers are
>>> applied on the respective partitions; but it is possible that they are
>>> applied in a way that is apparent from final state of the updated
>>> row". So "possible" applies to "in a way that is apparent..". It
>>> means, the user should be aware that all the triggers can change the
>>> row and so the final row will be 

Re: [HACKERS] wait event documentation

2017-04-04 Thread Robert Haas
On Mon, Apr 3, 2017 at 11:57 PM, Amit Langote
 wrote:
> By the way, wonder if it wouldn't make sense to take the whole Table 28.1.
> Dynamic Statistics Views into a new section (perhaps before 28.2 Viewing
> Locks or after), since those views display information different from what
> the statistics collector component collects and publishes (those in the
> Table 28.2. Collected Statistics Views).

It seems a little short for that, doesn't it?  I mean, I'm not against
it on principle, but we don't want to hoist a 5-line table into a
section by itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-04-04 Thread Heikki Linnakangas

On 03/31/2017 10:10 AM, Michael Paquier wrote:

On Wed, Mar 8, 2017 at 10:39 PM, Robert Haas  wrote:

On Tue, Mar 7, 2017 at 10:01 PM, Michael Paquier
 wrote:
I kinda hope Heikki is going to step up to the plate here, because I
think he understands most of it already.


The second person who knows a good deal about NFKC is Ishii-san.

Attached is a rebased patch.


Thanks, I'm looking at this now.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making clausesel.c Smarter

2017-04-04 Thread David Rowley
On 4 April 2017 at 13:35, Claudio Freire  wrote:
> On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire  wrote:
>> On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
>>  wrote:
 One last observation:

 +/*
 + * An IS NOT NULL test is a no-op if there's any other strict 
 quals,
 + * so if that's the case, then we'll only apply this, otherwise 
 we'll
 + * ignore it.
 + */
 +else if (cslist->selmask == CACHESEL_NOTNULLTEST)
 +s1 *= cslist->notnullsel;

 In some paths, the range selectivity estimation code punts and returns
 DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
 present, it should still be applied. It could make a big difference if
 the selectivity for the nulltest is high.
>>>
>>> I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
>>> should apply the nullfrac. Seems wrong to rely on an IS NOT NULL test
>>> existing to estimate that properly. I don't think that needs to be done as
>>> part of this patch. I'd rather limit the scope since "returned with
>>> feedback" is already knocking at the door of this patch.
>>
>> I agree, but this patch regresses for those cases if the user
>> compensated with a not null test, leaving little recourse, so I'd fix
>> it in this patch to avoid the regression.
>>
>> Maybe you're right that the null fraction should be applied regardless
>> of whether there's an explicit null check, though.
>
> The more I read that code, the more I'm convinced there's no way to
> get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL
> &&
>  rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
> clauses, a case the current code handles very poorly, returning
> DEFAULT_RANGE_INEQ_SEL and ignoring nullfrac, something that could
> give way-off estimates if the table had lots of nulls.
>
> Say, consider if "value" was mostly null and you write:
>
> SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN
> 50 AND 60;
>
> All other cases I can think of would end with either hibound or
> lobound being equal to DEFAULT_INEQ_SEL.
>
> It seems to me, doing a git blame, that the checks in the else branch
> were left only as a precaution, one that seems unnecessary.
>
> If you ask me, I'd just leave:
>
> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
> DEFAULT_INEQ_SEL)
> + {
> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
> implies we have no stats */
> + s2 = DEFAULT_RANGE_INEQ_SEL;
> + }
> + else
> + {
> ...
> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
> +   notnullsel = 1.0 - nullsel;
> +
> +   /* Adjust for double-exclusion of NULLs */
> +   s2 += nullsel;
> +   if (s2 <= 0.0) {
> +  if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
> +  {
> +   /* Most likely contradictory clauses found */
> +   s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
> +  }
> +  else
> +  {
> +   /* Could be a rounding error */
> +   s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
> +  }
> +   }
> + }
>
> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
> cautious) estimation of the amount of rounding error that could be
> there with 32-bit floats.
>
> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
> an estimation based on stats, maybe approximate, but one that makes
> sense (ie: preserves the monotonicity of the CPF), and as such
> negative values are either sign of a contradiction or rounding error.
> The previous code left the possibility of negatives coming out of
> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
> but that doesn't seem possible coming out of scalarineqsel.
>
> But I'd like it if Tom could comment on that, since he's the one that
> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
> in 2004).
>
> Barring that, I'd just change the
>
> s2 = DEFAULT_RANGE_INEQ_SEL;
>
> To
>
> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>
> Which makes a lot more sense.

I think to fix this properly the selfuncs would have to return the
estimate and the nullfrac separately, so that the nullfrac could just be
applied once per expression. There are already hacks in there to get rid
of the double null filtering. I'm not proposing that as something for
this patch; it would be pretty invasive to change.
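
Just to put a number on the effect being discussed (figures picked purely for
illustration): DEFAULT_RANGE_INEQ_SEL is 0.005, so on a column whose stats say
90% of the values are NULL, returning it unscaled estimates 0.5% of all rows,
whereas scaling by the non-null fraction gives 0.005 * 0.1 = 0.0005, a ten-fold
difference in the row estimate.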


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compiler warning in costsize.c

2017-04-04 Thread Michael Paquier
On Tue, Apr 4, 2017 at 7:03 PM, David Rowley
 wrote:
> On 4 April 2017 at 16:22, Michael Paquier  wrote:
>> Hi all,
>>
>> In builds where USE_ASSERT_CHECKING is not enabled, costsize.c can
>> generate warnings. Here is for example with MSVC:
>>   src/backend/optimizer/path/costsize.c(4520): warning C4101: 'rte' : 
>> unreferen
>> ced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>>   src/backend/optimizer/path/costsize.c(4640): warning C4101: 'rte' : 
>> unreferen
>> ced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>>
>> a9c074ba7 has done an effort, but a bit more is needed as the attached.
>
> Thanks for writing the patch.
>
> I wondering if it would be worth simplifying it a little to get rid of
> the double #ifdefs, as per attached.

Yes, that would be fine as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Craig Ringer
On 4 Apr. 2017 14:22, "Andres Freund"  wrote:

On 2017-01-05 03:12:09 +, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Magnus Hagander
> > For the pg_ctl changes, we're going from removing all privilieges from
the
> > token, to removing none. Are there any other privileges that we should
be
> > worried about? I think you may be correct in that it's overkill to do
it,
> > but I think we need some more specifics to decide that.
>
> This page lists the privileges.  Is there anything you are concerned about?
>
> https://msdn.microsoft.com/ja-jp/library/windows/desktop/bb530716(v=vs.85).aspx

Aren't nearly all of them a concern?  We've gone from having some
containment (not being able to create users, shut the system down, ...) to
none.  I do however think there's a fair argument to be made that other
platforms do not have similar containment (no root, but sudo etc. is
still possible), and that the containment isn't super strong.


TBH, anyone who cares about security and runs Win7 or Win2k8 or newer
should be using virtual service accounts and managed service accounts.

https://technet.microsoft.com/en-us/library/dd548356

Those are more like Unix service accounts. Notably they don't need a
password, getting rid of some of the management pain that led us to abandon
the 'postgres' system user on Windows.

Now that older platforms are EoL and even the oldest that added this
feature are also near EoL or in extended maintenance, I think installers
should switch to these by default instead of using NETWORK SERVICE.

Then the issue of priv dropping would be a lesser concern anyway.


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-04-04 Thread Mithun Cy
On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas  wrote:
> Committed.

Thanks Robert,

And also, sorry: one unfortunate thing happened in the last patch. While
fixing one of the review comments, a variable disappeared from the
equation in _hash_spareindex().

splitpoint_phases +=
-   (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) &
+   (((num_bucket - 1) >>
+ (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) &
 HASH_SPLITPOINT_PHASE_MASK);   /* to 0-based value. */

I wanted the three most significant bits, and while fixing comments in patch 11
I unknowingly removed splitpoint_group from the equation.
Extremely sorry for the mistake. Thanks to Ashutosh Sharma for
pointing it out.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


_hash_spareindex_defect.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 6:33 AM, Mithun Cy  wrote:
> On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas  wrote:
>> Committed.
>
> Thanks Robert,
>
> And also sorry, one unfortunate thing happened in the last patch while
> fixing one of the review comments a variable disappeared from the
> equation
> @_hash_spareindex.
>
> splitpoint_phases +=
> -   (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) &
> +   (((num_bucket - 1) >>
> + (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) &
>  HASH_SPLITPOINT_PHASE_MASK);   /* to 0-based value. */
>
> I wanted most significant 3 bits. And while fixing comments in patch11
> I unknowingly somehow removed splitpoint_group from the equation.
> Extremely sorry for the mistake. Thanks to Ashutosh Sharma for
> pointing the mistake.

Ugh, OK, committed that also.  Please try to be more careful in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund  wrote:
> I don't think the parallel seqscan is comparable in complexity with the
> parallel append case.  Each worker there does the same kind of work, and
> if one of them is behind, it'll just do less.  But correct sizing will
> be more important with parallel-append, because with non-partial
> subplans the work is absolutely *not* uniform.

Sure, that's a problem, but I think it's still absolutely necessary to
ramp up the maximum "effort" (in terms of number of workers)
logarithmically.  If you just do it by costing, the winning number of
workers will always be the largest number that we think we'll be able
to put to use - e.g. with 100 branches of relatively equal cost we'll
pick 100 workers.  That's not remotely sane.
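
Purely for illustration (this is not the patch's code, and num_subplans is a
made-up name), a log2-style cap in the spirit of compute_parallel_worker()
would look something like this, which with 100 roughly equal-cost branches
tops out at six workers rather than 100:

    int parallel_workers = 1;

    /* hypothetical: grow the worker count logarithmically in the subplan count */
    while (parallel_workers < max_parallel_workers_per_gather &&
           (1 << (parallel_workers + 1)) <= num_subplans)
        parallel_workers++;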

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] show "aggressive" or not in autovacuum logs

2017-04-04 Thread Masahiko Sawada
On Tue, Apr 4, 2017 at 10:09 AM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Fri, 31 Mar 2017 18:20:23 +0900, Masahiko Sawada  
> wrote in 
>> On Wed, Mar 29, 2017 at 12:46 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello, it would be too late but I'd like to propose this because
>> > this cannot be back-patched.
>> >
>> >
>> > In autovacuum logs, "%u skipped frozen" shows the number of pages
>> > skipped by ALL_FROZEN only in aggressive vacuum.
>> >
>> > So users cannot tell whether '0 skipped-frozen' means a
>> > non-aggressive vacuum or no frozen-pages in an aggressive vacuum.
>> >
>> > I think it is nice to have an indication whether the scan was
>> > "agressive" or not in log output.
>>
>> Good idea. I also was thinking about this.
>
> Thanks. Currently we cannot use "skipped-frozen" to see the
> effect of ALL_FROZEN.
>
>> > Like this,
>> >
>> >> LOG:  automatic aggressive vacuum of table 
>> >> "template1.pg_catalog.pg_statistic": index scans: 0
>> >
>> > "0 skipped frozen" is uesless in non-aggressive vacuum but
>> > removing it would be too-much.  Inserting "aggressive" reduces
>> > machine-readability so it might be better in another place. The
>> > attached patch does the following.
>> >
>> >>  LOG:  automatic vacuum of table "postgres.public.pgbench_branches": 
>> >> mode: normal, index scans: 0
>> >>  LOG:  automatic vacuum of table "postgres.public.pgbench_branches": 
>> >> mode: aggressive, index scans: 0
>> >
>>
>> Should we add this even to the manual vacuum verbose message?
>
> I forgot that. The patch adds the mode indication in the first
> message of VACUUM VERBOSE.
>
> | =# vacuum freeze verbose it;
> | INFO:  vacuuming "public.it" in aggressive mode
> | INFO:  "it": found 0 removable, 0 nonremovable row versions in 0 out of 0 
> pages
> ...
> | Skipped 0 pages due to buffer pins, 0 frozen pages.
>
> I still feel a bit uneasy about the word "aggressive" here.

I think we can use the word "aggressive" here since we already use the
word "aggressive vacuum" in docs[1], but it might be easily
misunderstood.

[1] https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html

>Is it better to be "freezing" or something?

Another idea could be something like "prevent wraparound". The
autovacuum process doing an aggressive vacuum appears in pg_stat_activity
with the phrase "(to prevent wraparound)". This wording might be more
user-friendly IMO.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 6:29 AM, Amit Kapila  wrote:
> On Mon, Mar 27, 2017 at 7:04 PM, Ashutosh Sharma  
> wrote:
>> My guess (which could be wrong) is that so->hashso_bucket_buf =
>>> InvalidBuffer should be moved back up higher in the function where it
>>> was before, just after the first if statement, and that the new
>>> condition so->hashso_bucket_buf == so->currPos.buf on the last
>>> _hash_dropbuf() should be removed.  If that's not right, then I think
>>> I need somebody to explain why not.
>>
>> Okay, as I mentioned above, in case of page scan mode we only keep pin
>> on a bucket buf. There won't be any case where we will be having pin
>> on overflow buf at the end of scan.
>>
>
> What if we mark the buffer as invalid after releasing the pin?  We are
> already doing it that way in the btree code; refer to
> _bt_drop_lock_and_maybe_pin().  I think if we do it that way, then we can
> do what Robert is suggesting.

Please continue reviewing, but I think we're out of time to get this
patch into v10.  This patch seems to be still under fairly heavy
revision, and we're only a couple of days from feature freeze, and the
patch upon which it depends (page-at-a-time vacuum) has had no fewer
than four follow-up commits repairing various problems with the logic,
with no guarantee that we've found all the bugs yet.  In view of those
facts, I don't think it would be wise for me to commit this to v10, so
I'm instead going to move it to the next CommitFest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wait event documentation

2017-04-04 Thread Amit Langote
On Tue, Apr 4, 2017 at 9:05 PM, Robert Haas  wrote:
> On Mon, Apr 3, 2017 at 11:57 PM, Amit Langote
>  wrote:
>> By the way, wonder if it wouldn't make sense to take the whole Table 28.1.
>> Dynamic Statistics Views into a new section (perhaps before 28.2 Viewing
>> Locks or after), since those views display information different from what
>> the statistics collector component collects and publishes (those in the
>> Table 28.2. Collected Statistics Views).
>
> It seems a little short for that, doesn't it?  I mean, I'm not against
> it on principle, but we don't want to hoist a 5-line table into a
> section by itself.

I was thinking that if we move Table 28.1 to a new section, Tables
28.3 (pg_stat_activity) to 28.8 (pg_stat_ssl) will go too.

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

