Re: [HACKERS] pg_waldump's inclusion of backend headers is a mess

2017-02-13 Thread Fabien COELHO


You may want to name the new headers dedicated to WAL records with 
_xlog.h as suffix though, like gin_xlog.h instead of ginxlog.h.


Would it not be more consistent to use "*_wal.h", after all these efforts 
to move "xlog" to "wal" everywhere?


--
Fabien.




Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Fabien COELHO


Hello Corey,


If I can find some simple mnemonic for "," vs "@" for being executed vs
ignored, I could live with that, but nothing obvious comes to my mind.


@in't gonna execute it?


Hmmm... This is too much of an Americanism, IMHO.


I'm here all week, try the veal.


Sorry, syntax error, you have lost me. Some googling suggests a reference 
to post WW2 "lounge entertainers", probably in the USA. I also do not 
understand why this would mean "yes".



I'd be fine with either of these on aesthetic grounds. On technical
grounds, 'z' is harder to show.


I'm not sure that this valid technical point is a good reason for 
deciding what feedback should be provided to the user, but it does make 
it simpler to choose two states :-)


For three states with more culturally neutral mnemonics, I thought of:
  ? for f (waiting for a true answer...)
  . for z (waiting for the end of the sentence, i.e. endif)
  & for t (no real mnemonic)

For two states:
  * for being executed (beware, it is ***important***)
  / for not (under the hood, and it is opposed to *)

Otherwise I still like "?[tfz]", but it is two characters long.

--
Fabien.




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-02-13 Thread Iwata, Aya
Hi,



On 18 November 2016 at 08:18, Craig Ringer wrote:

>At this point I doubt I'll be able to

>get update it again in time for v10, so anyone who wants to adopt it

>is welcome.

I am interested in pipeline/batch support for ECPG and found this thread.

I have updated Craig's patch [1] so that it applies to HEAD, and also fixed a 
trivial typo.



First, I'm changing PQqueriesInBatch() to work as documented.

After that, I plan to incorporate the review feedback into the patch.



On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite wrote:

> Wouldn't pgbench benefit from it?

> It was mentioned some time ago [1], in relationship to the

> \into construct, how client-server latency was important enough to

> justify the use of a "\;" separator between statements, to send them

> as a group.

>

> But with the libpq batch API, maybe this could be modernized

> with meta-commands like this:

>   \startbatch

>   ...

>   \endbatch

I'm planning to work on such meta-commands to support batch mode once this 
patch has been committed.





[1] https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch

Regards,

Aya Iwata

FUJITSU


0001-Pipelining-batch-support-for-libpq-v3.patch
Description: 0001-Pipelining-batch-support-for-libpq-v3.patch



Re: [HACKERS] Documentation improvements for partitioning

2017-02-13 Thread Amit Langote
On 2017/02/13 14:21, Amit Langote wrote:
> On 2017/02/10 19:19, Simon Riggs wrote:
>> On 10 February 2017 at 08:18, Amit Langote
>>  wrote:
>>
>>> I agree that getting the proposed documentation changes committed would be
>>> a step ahead.
>>
>> Committed. I tested how it works and added documentation that matched
>> my experiences, correcting what you'd said and adding more information
>> for clarity as I went.
>
> Thanks for making the improvements to and committing the patch!

Since I had added this to CF 2017-03, I have marked it as committed.

https://commitfest.postgresql.org/13/983/

Thanks,
Amit






Re: [HACKERS] Proposal : Parallel Merge Join

2017-02-13 Thread Amit Kapila
On Wed, Jan 4, 2017 at 9:27 PM, Dilip Kumar  wrote:
> On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapila  wrote:

>> 3.
>> + /*
>> + * See comments in try_partial_nestloop_path().
>> + */
>>
>> This same code exists in try_partial_nestloop_path() and
>> try_partial_hashjoin_path() with minor difference of code in
>> try_partial_hashjoin_path().  Can't we write a separate inline
>> function for this code and call from all the three places.
>
> It's a small check; is it OK to make it a separate function? I have
> also observed a similar kind of duplicate code in try_mergejoin_path
> and try_hashjoin_path. However, if you think that it will be better to
> move that check to an inline function, I can submit a separate
> refactoring patch in this thread.
>

Let's keep that check unchanged.

A few review comments on the latest version of the patch:
1.
- if (joinrel->consider_parallel && nestjoinOK &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- bms_is_empty(joinrel->lateral_relids))
+ if (outerrel->partial_pathlist == NIL ||
+ !joinrel->consider_parallel ||
+ save_jointype == JOIN_UNIQUE_OUTER ||
+ !bms_is_empty(joinrel->lateral_relids) ||
+ jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT)
+ return;


For consistency, I think the above checks should be in the same order
as they are in hash_inner_and_outer().  I also wonder why you are using
jointype in the above check instead of save_jointype; it seems to me we
can use save_jointype in all cases.
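For illustration, here is a minimal sketch of how that guard might look after
reordering (the exact ordering used by hash_inner_and_outer() should be
double-checked against the current source; this is not the patch itself):

    /* sketch: same conditions as the hunk above, reordered, using save_jointype */
    if (!joinrel->consider_parallel ||
        save_jointype == JOIN_UNIQUE_OUTER ||
        save_jointype == JOIN_FULL ||
        save_jointype == JOIN_RIGHT ||
        outerrel->partial_pathlist == NIL ||
        !bms_is_empty(joinrel->lateral_relids))
        return;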

2.
+ generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
+                          jointype, extra, false,
+                          inner_cheapest_total, merge_pathkeys,
+                          true);

IIUC, the reason for passing the 7th parameter (useallclauses) as false
is that it can be true only for right and full joins, and for both of
those we don't generate partial merge join paths.  If so, then I think
it is better to add a comment about that assumption, so that we don't
forget to update this code in the future if we need useallclauses for
something else.  Another way to deal with this is to pass the value of
useallclauses to consider_parallel_mergejoin() and then on to
generate_mergejoin_paths().  I feel the second way is better, but if
for some reason you don't find it appropriate then at least add a
comment.

3.
generate_mergejoin_paths()
{
..
..
- try_mergejoin_path(root,
-   joinrel,
-   outerpath,
-   innerpath,
-   merge_pathkeys,
-   newclauses,
-   NIL,
-   NIL,
-   jointype,
-   extra);

+ if (!is_partial)
+ try_mergejoin_path(root,
+   joinrel,
+   outerpath,
+   innerpath,
+   merge_pathkeys,
+   newclauses,
+   NIL,
+   NIL,
+   jointype,
+   extra);

+ /* Generate partial path only if innerpath is parallel safe. */
+ else if (innerpath->parallel_safe)
+ try_partial_mergejoin_path(root,
+   joinrel,
+   outerpath,
+   innerpath,
+   merge_pathkeys,
+   newclauses,
+   NIL,
+   NIL,
+   jointype,
+   extra);

..
}

The above and similar changes in generate_mergejoin_paths() don't look
good, and in some cases when innerpath is *parallel-unsafe* you end up
performing some extra computation that is not required.  How about
writing a separate function generate_partial_mergejoin_paths()?  I
think you can save some extra computation and it will look neater.  I
understand that there will be some code duplication, but I am not sure
there is any better way.


How about adding one simple test case, as I have done for other
parallel patches like parallel index scan, parallel subplan, etc.?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 12:56 PM, Michael Paquier
 wrote:
> This way you can monitor the time it takes to build
> completely each index, including its .

You can ignore the terms "including its" here.
-- 
Michael




Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 3:19 PM, Seki, Eiji  wrote:
> This change will be useful for features that only read rows that are visible 
> to all transactions and cannot wait for specific processes (VACUUM, ANALYZE, 
> etc.) for performance reasons. Our company (Fujitsu) is developing such an 
> extension. In our benchmark, we found that waiting for an ANALYZE process 
> created by the autovacuum daemon often has a significant impact on 
> performance, even though the waited-for process only reads the table. So I 
> hope to ignore it using GetOldestXminExtended as below. The second argument 
> represents the flags to ignore.
>
>   GetOldestXminExtended(rel, PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING | 
> PROC_IN_ANALYZE);
>
> Note: We can ignore VACUUM processes, or VACUUM and ANALYZE processes, using 
> the second option of GetOldestXmin, "ignoreVacuum". However, we cannot ignore 
> only ANALYZE processes because "ignoreVacuum" = true is the same as the 
> following.

 GetOldestXmin(Relation rel, bool ignoreVacuum)
 {
+   uint8 ignoreFlags = PROC_IN_LOGICAL_DECODING;
+
+   if (ignoreVacuum)
+   ignoreFlags |= PROC_IN_VACUUM;
+
+   return GetOldestXminExtended(rel, ignoreFlags);
+}

It seems to me that it would be far less confusing to just replace the
boolean argument of GetOldestXmin with a uint8 and declare those flags
in procarray.h, no? There are a couple of code paths calling
GetOldestXmin() that do not include proc.h, so it would be better not to
add any extra header dependencies in those files.
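As a rough sketch of that suggestion (the flag names and values here are only
illustrative, not the actual patch):

    /* procarray.h: flags describing backends to ignore for the horizon */
    #define PROCARRAY_VACUUM_FLAG           0x01
    #define PROCARRAY_ANALYZE_FLAG          0x02
    #define PROCARRAY_LOGICAL_DECODING_FLAG 0x04

    extern TransactionId GetOldestXmin(Relation rel, uint8 flags);

    /* a caller that previously passed ignoreVacuum = true would then write */
    xmin = GetOldestXmin(rel, PROCARRAY_VACUUM_FLAG | PROCARRAY_LOGICAL_DECODING_FLAG);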
-- 
Michael




Re: [HACKERS] contrib modules and relkind check

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 3:18 PM, Amit Langote
 wrote:
> On 2017/02/13 14:59, Michael Paquier wrote:
> I see, thanks for explaining.  Implemented that in the attached, also
> fixing the errcode.  Please see if that's what you had in mind.

Yes. That's it, the overall patch footprint is reduced.

> I was tempted for a moment to name the functions
> check_relation_has_storage() with the message "relation %s has no storage"
> and check_relation_has_visibility_info() with a similar message, but soon
> got over it. ;)

I like the format of your patch: simple and to the point.

>>>>> No new regression tests. I think we should create a dummy partitioned table
>>>>> for each contrib and show that it fails.
>>>>
>>>> Yep, good point. That's easy enough to add.
>>>
>>> I added tests with a partitioned table to pageinspect's page.sql and
>>> pgstattuple.sql.  I don't see any tests for pg_visibility module, maybe I
>>> should add?
>>
>> Checking those error code paths would be nice. It would be nice as
>> well that the relation types that are expected to be supported and
>> unsupported are checked. For pageinspect the check range is correct.
>> There are some relkinds missing in pgstatindex though.
>
> Added more tests in pgstattuple and the new ones for pg_visibility,
> although I may have overdone the latter.

A bonus idea would be to also add tests for relkinds that do work: for
example, create a table, insert some data into it, vacuum it, and look at
"SELECT count(*) > 0 FROM pg_visibility('foo'::regclass)".

> In certain contexts where a subset of relkinds are allowed and others are
> not or vice versa, partitioned tables are still referred to as "tables".
> That's because we still use CREATE/DROP TABLE to create/drop them and
> perhaps more to the point, their being partitioned is irrelevant.
>
> Examples of where partitioned tables are referred to as tables:
>
> [...]
>
> In other contexts, where a table's being partitioned is relevant, the
> message is shown as "relation is/is not partitioned table".  Examples:
>
> [...]

Hm... It may be a good idea to be consistent across the whole system and
refer to a "partitioned table" as a table without storage that is used as an
entry point for partitions. The docs use this term in CREATE TABLE, and we
would end up with messages like "not a table or a partitioned table". Extra
thoughts are welcome here; the current inconsistencies would be confusing
for users.
-- 
Michael




[HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-02-13 Thread Seki, Eiji
Hi all, 

I propose a patch that adds a function GetOldestXminExtended, which is like 
GetOldestXmin but can ignore arbitrary vacuum flags, and then rewrites 
GetOldestXmin to use it. Note that this is done so as not to change the 
behavior of GetOldestXmin.

This change will be useful for features that only read rows that are visible 
to all transactions and cannot wait for specific processes (VACUUM, ANALYZE, 
etc.) for performance reasons. Our company (Fujitsu) is developing such an 
extension. In our benchmark, we found that waiting for an ANALYZE process 
created by the autovacuum daemon often has a significant impact on performance, 
even though the waited-for process only reads the table. So I hope to ignore it 
using GetOldestXminExtended as below. The second argument represents the flags 
to ignore.

  GetOldestXminExtended(rel, PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING | 
PROC_IN_ANALYZE);

Note: We can ignore VACUUM processes, or VACUUM and ANALYZE processes, using the 
second option of GetOldestXmin, "ignoreVacuum". However, we cannot ignore only 
ANALYZE processes because "ignoreVacuum" = true is the same as the following.

  GetOldestXminExtended(rel, PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING)

This change should have no impact on the existing GetOldestXmin, apart from the 
slight overhead of an extra function call.

I'm not sure that this feature is useful in general.
Please let me know your opinion if you are interested.

Regards,
Eiji Seki
Fujitsu.


get_oldest_xmin_extended.patch
Description: get_oldest_xmin_extended.patch



Re: [HACKERS] contrib modules and relkind check

2017-02-13 Thread Amit Langote
On 2017/02/13 14:59, Michael Paquier wrote:
> On Fri, Feb 10, 2017 at 4:29 PM, Amit Langote wrote:
>> On 2017/02/10 14:32, Michael Paquier wrote:
>>> The visibility checks are localized in pg_visibility.c and the storage
>>> checks in pgstatindex.c, so yes we could have macros in those files.
>>> Or even better: just have a sanity check routine as the error messages
>>> are the same everywhere.
>>
>> If I understand correctly what's being proposed here, tablecmds.c has
>> something called ATWrongRelkindError() which sounds (kind of) similar.
>> It's called by ATSimplePermissions() whenever it finds out that relkind of
>> the table specified in a given ALTER TABLE command is not in the "allowed
>> relkind targets" for the command.  Different ALTER TABLE commands call
>> ATSimplePermissions() with an argument that specifies the "allowed relkind
>> targets" for each command.   I'm not saying that it should be used
>> elsewhere, but if we do invent a new centralized routine for relkind
>> checks, it would look similar.
> 
> You did not get completely what I wrote upthread. This check is
> repeated 3 times in pg_visibility.c:
> +   /* Other relkinds don't have visibility info */
> +   if (rel->rd_rel->relkind != RELKIND_RELATION &&
> +   rel->rd_rel->relkind != RELKIND_MATVIEW &&
> +   rel->rd_rel->relkind != RELKIND_TOASTVALUE)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("\"%s\" is not a table, materialized view, or
> TOAST table",
> +   RelationGetRelationName(rel))));
> 
> And this one is present 4 times in pgstatindex.c:
> +   /* only the following relkinds have storage */
> +   if (rel->rd_rel->relkind != RELKIND_RELATION &&
> +   rel->rd_rel->relkind != RELKIND_INDEX &&
> +   rel->rd_rel->relkind != RELKIND_MATVIEW &&
> +   rel->rd_rel->relkind != RELKIND_SEQUENCE &&
> +   rel->rd_rel->relkind != RELKIND_TOASTVALUE)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("\"%s\" is not a table, index, materialized
> view, sequence, or TOAST table",
> +   RelationGetRelationName(rel))));
> 
> So the suggestion is simply to encapsulate such blocks into their own
> function like check_relation_relkind, each one locally in their
> respective contrib's file. And each function includes the error
> message, which should use ERRCODE_WRONG_OBJECT_TYPE as errcode by the
> way.

I see, thanks for explaining.  Implemented that in the attached, also
fixing the errcode.  Please see if that's what you had in mind.

I was tempted for a moment to name the functions
check_relation_has_storage() with the message "relation %s has no storage"
and check_relation_has_visibility_info() with a similar message, but soon
got over it. ;)
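For reference, a minimal sketch of what such a local helper could look like in
pg_visibility.c, based on the checks quoted above (the attached patch may of
course differ in detail):

    /*
     * Sketch of a local helper encapsulating the repeated relkind check,
     * raising ERRCODE_WRONG_OBJECT_TYPE as discussed upthread.
     */
    static void
    check_relation_relkind(Relation rel)
    {
        if (rel->rd_rel->relkind != RELKIND_RELATION &&
            rel->rd_rel->relkind != RELKIND_MATVIEW &&
            rel->rd_rel->relkind != RELKIND_TOASTVALUE)
            ereport(ERROR,
                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                     errmsg("\"%s\" is not a table, materialized view, or TOAST table",
                            RelationGetRelationName(rel))));
    }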

>>>> No new regression tests. I think we should create a dummy partitioned table
>>>> for each contrib and show that it fails.
>>>
>>> Yep, good point. That's easy enough to add.
>>
>> I added tests with a partitioned table to pageinspect's page.sql and
>> pgstattuple.sql.  I don't see any tests for pg_visibility module, maybe I
>> should add?
> 
> Checking those error code paths would be nice. It would be nice as
> well that the relation types that are expected to be supported and
> unsupported are checked. For pageinspect the check range is correct.
> There are some relkinds missing in pgstatindex though.

Added more tests in pgstattuple and the new ones for pg_visibility,
although I may have overdone the latter.

>> Although, I felt a bit uncomfortable the way the error message looks, for
>> example:
>>
>> + -- check that using any of these functions with a partitioned table
>> would fail
>> + create table test_partitioned (a int) partition by range (a);
>> + select pg_relpages('test_partitioned');
>> + ERROR:  "test_partitioned" is not a table, index, materialized view,
>> sequence, or TOAST table
> 
> Would it be a problem to mention this relkind as just "partitioned
> table" in the error message.

In certain contexts where a subset of relkinds are allowed and others are
not or vice versa, partitioned tables are still referred to as "tables".
That's because we still use CREATE/DROP TABLE to create/drop them and
perhaps more to the point, their being partitioned is irrelevant.

Examples of where partitioned tables are referred to as tables:

1. The following check in get_relation_by_qualified_name():

case OBJECT_TABLE:
if (relation->rd_rel->relkind != RELKIND_RELATION &&
relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("\"%s\" is not a table",
RelationGetRelationName(relation;

2. The following in DefineQueryRewrite() (from the rewriter's point of view):

/*
 * Verify relation is of a type that rules can sensibly be applied to.
 * Internal callers can target 

Re: [HACKERS] pg_waldump's inclusion of backend headers is a mess

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 12:42 PM, Robert Haas  wrote:
> Therefore, I proposed the attached patch, which splits spgxlog.h out
> of spgist_private.h, nbtxlog.h out of nbtree.h, gistxlog.h out of
> gist_private.h, and ginxlog.h and ginblock.h out of gin_private.h.
> These new header files are included by pg_waldump in lieu of the
> "private" versions.  This solves the immediate problem and I suspect
> it will head off future problems as well.
>
> Thoughts, comments, objections, better ideas?

+1 for the overall move. You may want to name the new headers
dedicated to WAL records with an _xlog.h suffix though, like
gin_xlog.h instead of ginxlog.h. All the existing RMGRs that do this
separation (heap, brin, hash and generic) use this format.

The patch looks rather sane to me.

--- /dev/null
+++ b/src/include/access/nbtxlog.h
@@ -0,0 +1,255 @@
+/*-
+ *
+ * nbtree.h
+ *  header file for postgres btree xlog routines
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/nbxlog.h
+ *
+ *-
+ */
This file name is incorrect, as well as the description.

+/*--
+ * ginxlog.h
+ *   header file for postgres inverted index xlog implementation.
+ *
+ * Copyright (c) 2006-2017, PostgreSQL Global Development Group
+ *
+ * src/include/access/gin_private.h
+ *--
+ */
Bip. Error.
-- 
Michael




Re: [HACKERS] Parallel bitmap heap scan

2017-02-13 Thread Dilip Kumar
On Tue, Feb 14, 2017 at 6:51 AM, Haribabu Kommi
 wrote:
> +#if SIZEOF_DSA_POINTER == 4
> +typedef uint32 dsa_pointer;
> +#else
> +typedef uint64 dsa_pointer;
> +#endif
>
> I feel the declaration of the above typedef can be moved into the
> section above if we are going with the current move into the
> postgres.h file.

Robert has already posted a patch [1] for this, so nothing more needs
to be done here. Therefore, I now include dsa.h in tidbitmap.h.

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoZ%3DF%3DGkxV0YEv-A8tb%2BAEGy_Qa7GSiJ8deBKFATnzfEug%40mail.gmail.com
>
> +/*
> + * tbm_alloc_shared
> + *
> + * Callback function for allocating the memory for hashtable elements.
> + * It allocates memory from DSA if tbm holds a reference to a dsa.
> + */
> +static inline void *
> +pagetable_allocate(pagetable_hash *pagetable, Size size)
>
> Function name and comments mismatch?

Fixed.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0001-tidbitmap-support-shared-v2.patch
Description: Binary data


0002-parallel-bitmap-heapscan-v2.patch
Description: Binary data



Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-13 Thread Michael Paquier
On Thu, Feb 2, 2017 at 3:01 PM, Higuchi, Daisuke
 wrote:
> From: Michael Paquier [mailto:michael.paqu...@gmail.com]
>>This has not been added yet to the next CF. As Ashutosh mentioned
>>things tend to be easily forgotten. So I have added it here:
>>https://commitfest.postgresql.org/13/982/
> Thank you for adding this problem to CF.

I have added this thread to the list of open items for PG10:
https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items
-- 
Michael




Re: [HACKERS] possibility to specify template database for pg_regress

2017-02-13 Thread Pavel Stehule
2017-02-14 3:50 GMT+01:00 Andres Freund :

> On 2017-02-14 11:46:52 +0900, Michael Paquier wrote:
> > > I still fail to see why --use-existing as suggested in
> > > https://www.postgresql.org/message-id/20170208002900.
> vkldujzfkwbvq...@alap3.anarazel.de
> > > isn't sufficient.
> >
> > Some tests create objects without removing them, meaning that
> > continuous runs would fail with only --use-existing. This patch brings
> > value in such cases.
>
> You can trivially script the CREATE/DROP DB outside with
> --use-existing. Which seems a lot more flexible than adding more and
> more options to pg_regress.
>

Using a template is a natural and very simple solution - moreover, it doesn't
need any outer scripts - so the test infrastructure can be pretty simple in
this case.

Regards

Pavel


Re: [HACKERS] parallelize queries containing subplans

2017-02-13 Thread Amit Kapila
On Thu, Feb 2, 2017 at 9:23 AM, Amit Kapila  wrote:
> On Wed, Feb 1, 2017 at 10:04 PM, Robert Haas  wrote:
>> On Tue, Jan 31, 2017 at 6:01 AM, Amit Kapila  wrote:
>>> Moved this patch to next CF.
>>
>> So here is what seems to be the key hunk from this patch:
>>
>>  /*
>> - * Since we don't have the ability to push subplans down to workers at
>> - * present, we treat subplan references as parallel-restricted.  We need
>> - * not worry about examining their contents; if they are unsafe, we 
>> would
>> - * have found that out while examining the whole tree before reduction 
>> of
>> - * sublinks to subplans.  (Really we should not see SubLink during a
>> - * max_interesting == restricted scan, but if we do, return true.)
>> + * We can push the subplans only if they don't contain any 
>> parallel-aware
>> + * node as we don't support multi-level parallelism (parallel workers
>> + * invoking another set of parallel workers).
>>   */
>> -else if (IsA(node, SubLink) ||
>> - IsA(node, SubPlan) ||
>> - IsA(node, AlternativeSubPlan))
>> +else if (IsA(node, SubPlan))
>> +return !((SubPlan *) node)->parallel_safe;
>> +else if (IsA(node, AlternativeSubPlan))
>>  {
>> -if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
>> -return true;
>> +AlternativeSubPlan *asplan = (AlternativeSubPlan *) node;
>> +ListCell   *lc;
>> +
>> +foreach(lc, asplan->subplans)
>> +{
>> +SubPlan*splan = (SubPlan *) lfirst(lc);
>> +
>> +Assert(IsA(splan, SubPlan));
>> +
>> +if (max_parallel_hazard_walker((Node *) splan, context))
>> +return true;
>> +}
>> +
>> +return false;
>>  }
>>
>> I don't see the reason for having this new code that makes
>> AlternativeSubPlan walk over the subplans; expression_tree_walker
>> already does that.
>>

I have removed the check of AlternativeSubPlan so that it can be
handled by expression_tree_walker.

>
> It is done this way mainly to avoid recursion/nested calls, but I
> think you prefer to handle it via expression_tree_walker so that code
> looks clean.  Another probable reason could be that there is always a
> chance that we miss handling of some expression of a particular node
> (in this case AlternativeSubPlan), but if that is the case then there
> are other places in the code which do operate on individual subplan/'s
> in AlternativeSubPlan list.
>
>>  On the flip side I don't see the reason for
>> removing the max_parallel_hazard_test() call for AlternativeSubPlan or
>> for SubLink.
>
> If we don't remove the current test of max_parallel_hazard_test() for
> AlternativeSubPlan, then AlternativeSubPlan node will be considered
> parallel restricted, so why do we want that after this patch.  For
> Sublinks, I think they would have converted to Subplans by the time we
> reach this function for the parallel restricted check.  Can you
> elaborate what you have in mind for the handling of AlternativeSubPlan
> and SubLink?
>

I have removed the changes for SubLink node.

>> +* CTE scans are not consider for parallelism (cf
>>
>>
>> considered
>>

Fixed.

>> +   select count(*)from tenk1 where (two, four) not in
>>
>> whitespace

Fixed.


The attached patch fixes all the comments raised so far.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_subplan_v5.patch
Description: Binary data



Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-13 Thread Robert Haas
On Mon, Feb 13, 2017 at 5:38 PM, Stephen Frost  wrote:
> I really do not think the PG core project should be held hostage by an
> external and apparently not-really-maintained project.  What if we
> introduce some other difference in PG10 that breaks pgAdmin3?  Are we
> going to roll that change back?  Are we sure that none exists already?

As a general rule, I don't agree that taking account of what will
break external projects constitutes being "held hostage".  I think
that kind of hyperbole is unhelpful.  How about we call it "trying not
to gratuitously break popular third-party tools"?

But in this case, I conceded the exact point that you are making here
later on in the exact same email to which you are replying, so I'm a
little mystified by the way you wrote this response.

> In short, my
> recollection is that we added them because it was easy to do at the time
> and we didn't have the foresight to realize just how hard they would
> become to get rid of and how much time they would suck up from people
> arguing about them.

I'm pretty sure we've spent more time arguing about them than it would
have taken to maintain them from now until 2030, and I don't really
understand why you're on the warpath to get rid of them.  But I don't
really care about it enough to keep arguing now that I've realized
pgAdmin3 isn't going to work with PG 10 either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 11:32 AM, Andreas Karlsson  wrote:
> On 02/13/2017 06:31 AM, Michael Paquier wrote:
>> Er, something like that as well, no?
>> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
>
> REINDEX (VERBOSE) currently prints one such line per index, which does not
> really work for REINDEX (VERBOSE) CONCURRENTLY since it handles all indexes
> on a relation at the same time. It is not immediately obvious how this
> should work. Maybe one such detail line per table?

Hard to recall the details of this after some time, given that a
relation is reindexed by processing all the indexes once at each step.
Hm... What if ReindexRelationConcurrently() were actually refactored in
such a way that it processes all the steps for each index
individually? This way you can monitor the time it takes to build
completely each index, including its . This operation would consume
more transactions, but in the event of a failure the amount of things
to clean up is much reduced, particularly for relations with many
indexes. This would as well reduce VERBOSE to printing one line per
index rebuilt.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-13 Thread Robert Haas
On Mon, Feb 13, 2017 at 5:29 PM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Thu, Feb 9, 2017 at 8:54 PM, Stephen Frost  wrote:
>> > Note that these views have not been consistently maintained and have
>> > ended up including some role attributes from recent versions
>>
>> That's not a bug.  According to the documentation, these views exist
>> for compatibility with PostgreSQL versions before 8.1, so there's no
>> need to update them with newer fields.  Clients who are expecting to
>> talk with a pre-8.1 PostgreSQL won't expect those fields to be present
>> anyway.
>
> Yet we added bypassrls to them, after a similar discussion of how
> they're for backwards compat and we can't get rid of them, but we should
> update them with new things, blah, blah.

My recollection of that conversation is a little fuzzy and I can't
immediately find the email in the archives, but I don't think I argued
for that; I think I may have argued against it; I don't think that I
understood the point of it then and I don't now.  In short, if you're
feeling under some kind of obligation to update those views, don't
feel obliged on my account.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel bitmap heap scan

2017-02-13 Thread Robert Haas
On Mon, Feb 13, 2017 at 8:48 AM, Dilip Kumar  wrote:
> On Mon, Feb 13, 2017 at 6:24 PM, Robert Haas  wrote:
>> I don't think it's acceptable (or necessary) to move the DSA
>> definitions into postgres.h.  Why do you think you need to do that,
>> vs. just including dsa.h in a few more places?
>
> I need to access dsa_pointer in tidbitmap.h, which is included from
> FRONTEND as well. Now, the problem is that dsa.h includes
> "port/atomics.h", but atomics.h cannot be included if FRONTEND is
> defined.
>
> #ifndef ATOMICS_H
> #define ATOMICS_H
> #ifdef FRONTEND
> #error "atomics.h may not be included from frontend code"
> #endif
>
> Is there any other solution to this ?

Well, any problem like this generally has a bunch of solutions, so
I'll say yes.  I spent a good chunk of today studying the issue and
started a new thread devoted specifically to it:

https://www.postgresql.org/message-id/CA%2BTgmoZ%3DF%3DGkxV0YEv-A8tb%2BAEGy_Qa7GSiJ8deBKFATnzfEug%40mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] log_autovacuum_min_duration doesn't log VACUUMs

2017-02-13 Thread Robert Haas
On Mon, Feb 13, 2017 at 5:19 PM, Simon Riggs  wrote:
> On 13 February 2017 at 17:12, Robert Haas  wrote:
>> On Sun, Feb 12, 2017 at 9:20 PM, Jim Nasby  wrote:
>>> On 2/10/17 2:33 PM, Robert Haas wrote:
 That having been said, I think it could certainly be useful to have
 more control over what DDL gets logged in foreground processes.
>>>
>>> FWIW, this is a significant problem outside of DDL. Once you're past 1-2
>>> levels of nesting SET client_min_messages = DEBUG becomes completely
>>> useless.
>>>
>>> I think the ability to filter logging based on context would be very
>>> valuable. AFAIK you could actually do that for manual logging with existing
>>> plpgsql support, but obviously that won't help for anything else.
>>
>> Well, that's moving the goalposts a lot further and in an unrelated
>> direction.  I don't think that it's a good idea to change the
>> semantics of log_autovacuum_min_duration in the way Simon is proposing
>> for the reasons I noted, but I think that an acceptable patch could be
>> 100 lines of pretty straightforward code and documentation, like a new
>> GUC that controls this output for the vac-non-autovac case.
>
> If my idea would not log manual ANALYZE, well, we can add that in
> easily. There is no reason to block the patch for such a minor foible.
>
> This is a short patch to address a specific minor issue, not a blue
> sky redesign of logging.
>
> If someone else wants to add more, they can, later. Incremental
> change, just as happens all the time everywhere else.

Please don't ignore the other points in my original response.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] pg_waldump's inclusion of backend headers is a mess

2017-02-13 Thread Robert Haas
Dilip complained on another thread about the apparent difficulty of
getting the system to compile after adding a reference to dsa_pointer
to tidbitmap.c:

https://www.postgresql.org/message-id/CAFiTN-u%3DBH_b0QE9FG%2B%2B_Ht6Q%3DF84am%3DJ-rt7fNbQkq3%2BqX3VQ%40mail.gmail.com

I dug into the problem and discovered that pg_waldump is slurping up a
tremendous crapload of backend headers.  It includes gin.h,
gist_private.h, hash_xlog.h, nbtree.h, and spgist.h, which all end up
including amapi.h, which includes genam.h, which includes tidbitmap.h.
When you make tidbitmap.h include utils/dsa.h, it in turn includes
port/atomics.h, which has an #error preventing frontend includes.

There are a number of ways to fix this problem; probably the cheapest
available hack is to stick #ifndef FRONTEND around the additional
stuff getting added to tidbitmap.h.  But that seems to be attacking
the problem from the wrong end.  The problem isn't really that having
tidbitmap.h refer to backend-only stuff is unreasonable; it's that
pg_waldump is skating on thin ice by importing so many backend-only
headers.  It's only a matter of time before some other one of the many
files that it directly or indirectly includes develops a similar
problem.
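(For completeness, the rejected hack would amount to something like this in
tidbitmap.h, with any dsa_pointer-using declarations kept inside the same
guard:)

    #ifndef FRONTEND
    #include "utils/dsa.h"      /* pulls in port/atomics.h, backend-only */
    #endif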

Therefore, I proposed the attached patch, which splits spgxlog.h out
of spgist_private.h, nbtxlog.h out of nbtree.h, gistxlog.h out of
gist_private.h, and ginxlog.h and ginblock.h out of gin_private.h.
These new header files are included by pg_waldump in lieu of the
"private" versions.  This solves the immediate problem and I suspect
it will head off future problems as well.

Thoughts, comments, objections, better ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


disentangle-am-xlog.patch
Description: Binary data



Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 2:07 AM, Fujii Masao  wrote:
> No, the tab-completions for ALTER/DROP PUBLICATION should show the local
> publications because those commands drop and alter the local ones. OTOH,
> CREATE/ALTER SUBSCRIPTION ... PUBLICATION should show nothing because
> the remote publications in the publisher side should be specified there.

Doh, yes. You are right about that.

>> - Addition of a view pg_subscriptions with all the non-sensitive data.
>> (- Or really extend pg_stat_subscriptions with the database ID and use
>> it for tab completion?)
>
> Probably I failed to get Peter's point... Anyway, IMO we can expose all the
> columns except the sensitive information (i.e., the subconninfo field)
> in pg_subscription to even non-superusers. Then we can use pg_subscription
> for the tab-completion for ALTER/DROP SUBSCRIPTION.

To be honest, I find subconninfo quite useful to check where a
subscription is getting its changes from, so I'd rather not touch it.
It also looks a bit overkill to create a new view just for an object
type that non-superusers cannot even use... There are already one view
and one system catalog related to it, so I'd be of the opinion to let
it fail silently with a permission error and keep the completion list
empty for them.
-- 
Michael




Re: [HACKERS] Possible TODO: allow arbitrary expressions in event trigger WHEN

2017-02-13 Thread Tom Lane
Jim Nasby  writes:
> Is there a reason not to allow $SUBJECT? Specifically, it'd be nice to 
> be able to do something like WHEN tag LIKE 'ALTER%'.

Seems like it would be a seriously bad idea for such an expression to be
able to invoke arbitrary SQL code.  What if it calls a user-defined
function that tries to do DDL?

regards, tom lane




Re: [HACKERS] possibility to specify template database for pg_regress

2017-02-13 Thread Andres Freund
On 2017-02-14 11:46:52 +0900, Michael Paquier wrote:
> > I still fail to see why --use-existing as suggested in
> > https://www.postgresql.org/message-id/20170208002900.vkldujzfkwbvq...@alap3.anarazel.de
> > isn't sufficient.
> 
> Some tests create objects without removing them, meaning that
> continuous runs would fail with only --use-existing. This patch brings
> value in such cases.

You can trivially script the CREATE/DROP DB outside with
--use-existing. Which seems a lot more flexible than adding more and
more options to pg_regress.




Re: [HACKERS] possibility to specify template database for pg_regress

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 11:36 AM, Andres Freund  wrote:
> On 2017-02-13 20:59:43 +0100, Pavel Stehule wrote:
>> Hi
>>
>> 2017-02-13 6:46 GMT+01:00 Michael Paquier :
>>
>> > On Sat, Feb 11, 2017 at 3:03 PM, Pavel Stehule 
>> > wrote:
>> > > here is new update - check is done before any creating
>> >
>> > It may be better to do any checks before dropping existing databases
>> > as well... It would be as well just simpler to complain with a single
>> > error message like "database and template list lengths do not match".
>> >
>>
>> next step

This looks fine to me.

> I still fail to see why --use-existing as suggested in
> https://www.postgresql.org/message-id/20170208002900.vkldujzfkwbvq...@alap3.anarazel.de
> isn't sufficient.

Some tests create objects without removing them, meaning that
continuous runs would fail with only --use-existing. This patch brings
value in such cases.
-- 
Michael




Re: [HACKERS] possibility to specify template database for pg_regress

2017-02-13 Thread Andres Freund
On 2017-02-13 20:59:43 +0100, Pavel Stehule wrote:
> Hi
> 
> 2017-02-13 6:46 GMT+01:00 Michael Paquier :
> 
> > On Sat, Feb 11, 2017 at 3:03 PM, Pavel Stehule 
> > wrote:
> > > here is new update - check is done before any creating
> >
> > It may be better to do any checks before dropping existing databases
> > as well... It would be as well just simpler to complain with a single
> > error message like "database and template list lengths do not match".
> >
> 
> next step

I still fail to see why --use-existing as suggested in
https://www.postgresql.org/message-id/20170208002900.vkldujzfkwbvq...@alap3.anarazel.de
isn't sufficient.

- Andres




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-02-13 Thread Andreas Karlsson

On 02/13/2017 06:31 AM, Michael Paquier wrote:

>> - What should we do with REINDEX DATABASE CONCURRENTLY and the system
>> catalog? I do not think we can reindex the system catalog concurrently
>> safely, so what should REINDEX DATABASE do with the catalog indexes? Skip
>> them, reindex them while taking locks, or just error out?
>
> System indexes cannot have their OIDs changed as they are used in
> syscache lookups. So just logging a warning looks fine to me, and that is
> the price to pay to avoid taking an exclusive lock even for a short amount
> of time.

Good idea, I think I will add one line of warning if it finds any system
index in the schema.

>> - What level of information should be output in VERBOSE mode?
>
> Er, something like that as well, no?
> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

REINDEX (VERBOSE) currently prints one such line per index, which does
not really work for REINDEX (VERBOSE) CONCURRENTLY since it handles all
indexes on a relation at the same time. It is not immediately obvious
how this should work. Maybe one such detail line per table?

> This is a crasher:
> create table aa (a int primary key);
> reindex (verbose) schema concurrently public ;
>
> For invalid indexes sometimes snapshots are still active (after
> issuing the previous crash for example):
> =# reindex (verbose) table concurrently aa;
> WARNING:  XX002: cannot reindex concurrently invalid index
> "public.aa_pkey_cct", skipping
> LOCATION:  ReindexRelationConcurrently, indexcmds.c:2119
> WARNING:  01000: snapshot 0x7fde12003038 still active

Thanks for testing the patch! The crash was caused by things being
allocated in the wrong memory context when reindexing multiple tables
and therefore being freed on the first intermediate commit. I have
created a new memory context to handle this, in which I only allocate
the lists that need to survive between transactions.

Hm, when writing the above I just realized why ReindexTable/ReindexIndex
did not suffer from the same bug. It is because the first transaction
there allocates in the PortalHeapMemory context, which survives commit.
I really need to look at whether there is a clean way to handle memory
contexts in my patch.

I also found the "snapshot still active" bug; it seems to have been
caused by REINDEX TABLE CONCURRENTLY leaving an open snapshot which
cannot be popped by PortalRunUtility().


Thanks again!
Andreas




Re: [HACKERS] Logical Replication and Character encoding

2017-02-13 Thread Kyotaro HORIGUCHI
At Sat, 4 Feb 2017 21:27:32 +0100, Petr Jelinek  
wrote in 
> Hmm I wonder if we should just make the subscriber send the
> client_encoding always (based on server encoding of the subscriber).
> That should solve the issue in combination with your patch no?

Yeah, right. I considered that a subscriber might want to set its
own value for that but that is useless.

The attached patch does the following things to simply prevent
making a logical replication connection between databases with
inconsistent encodings.

- Added client_encoding, set to the subscriber's (or standby's)
  encoding, as the last option in libpqrcv_connect.

- CheckLogicalDecodingRequirements now refuses a connection
  request with inconsistent encodings.

> ERROR:  logical replication requires consistent encodings on both sides 
> (publisher = UTF8, subscriber = EUC_JP)


We could check this earlier, in a place that also covers physical
replication, but I think this is purely a matter for logical replication.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From e8233c47d174261a331718e9434d5fc825523305 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 14 Feb 2017 11:18:20 +0900
Subject: [PATCH] Refuse logical replication with inconsistent encodings

Logical replication requires encoding conversion of characters if the
publisher and subscriber are on different encodings. We could add
character conversion on-the-fly, but as a first step we just inhibit
connections in that case.
---
 .../replication/libpqwalreceiver/libpqwalreceiver.c| 14 ++
 src/backend/replication/logical/logical.c  | 13 +
 2 files changed, 27 insertions(+)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 44a89c7..550a76d 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -24,6 +24,7 @@
 #include "access/xlog.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "replication/walreceiver.h"
 #include "storage/proc.h"
@@ -134,9 +135,22 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	}
 	keys[++i] = "fallback_application_name";
 	vals[i] = appname;
+
+	/*
+	 * Override client_encoding with the database encoding for logical
+	 * replication.
+	 */
+	if (logical)
+	{
+		keys[++i] = "client_encoding";
+		vals[i] = GetDatabaseEncodingName();
+	}
+
 	keys[++i] = NULL;
 	vals[i] = NULL;
 
+	Assert(i < 5); /* size of keys/vals */
+
 	conn = palloc0(sizeof(WalReceiverConn));
 	conn->streamConn = PQconnectdbParams(keys, vals, /* expand_dbname = */ true);
 	if (PQstatus(conn->streamConn) != CONNECTION_OK)
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 5529ac8..fc74ff1 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -33,6 +33,8 @@
 #include "access/xact.h"
 #include "access/xlog_internal.h"
 
+#include "mb/pg_wchar.h"
+
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
@@ -87,6 +89,17 @@ CheckLogicalDecodingRequirements(void)
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  errmsg("logical decoding requires a database connection")));
 
+	/*
+	 * Currently logical replication refuses subscriptions that require
+	 * character conversion.
+	 */
+	if (pg_get_client_encoding() != GetDatabaseEncoding())
+		ereport(ERROR,
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication requires consistent encodings on both sides (publisher = %s, subscriber = %s)",
+		GetDatabaseEncodingName(),
+		pg_get_client_encoding_name())));
+
 	/* 
 	 * TODO: We got to change that someday soon...
 	 *
-- 
2.9.2




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-13 Thread Andres Freund
Hi,

On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote:
> On 02/11/2017 02:33 AM, Andres Freund wrote:
> > > I have a hard time believing that the cache efficiency of linked lists
> > > (which may or may not be real in this case) outweighs this, but if you
> > > want to try, be my guest.
> > 
> > I'm not following - why would there be overhead in anything for
> > allocations bigger than 4 (or maybe 8) bytes? You can store the list
> > (via chunk ids, not pointers) inside the chunks itself, where data
> > otherwise would be.  And I don't see why you'd need a doubly linked
> > list, as the only operations that are needed are to push to the front of
> > the list, and to pop from the front of the list - and both operations
> > are simple to do with a singly linked list?
> > 
> 
> Oh! I have not considered storing the chunk indexes (for linked lists) in
> the chunk itself, which obviously eliminates the overhead concerns, and
> you're right a singly-linked list should be enough.
>
> There will be some minimum-chunk-size requirement, depending on the block
> size/chunk size. I wonder whether it makes sense to try to be smart and make
> it dynamic, so that we only require 1B or 2B for cases when only that many
> chunks fit into a block, or just say that it's 4B and be done with it.

I doubt it's worth it - it seems likely that the added branch is more
noticeable overall than the possible savings of 3 bytes. Also, won't the
space be lost due to alignment *anyway*?
+   /* chunk, including SLAB header (both addresses nicely aligned) */
+   fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and
use a plain slist - no point in being more careful than that.


> I mean 2^32 chunks ought to be enough for anyone, right?

Yea, that seems enough; but given the alignment thing pointed out above,
I think we can just use plain pointers - and that definitely should be
enough :P
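To make that concrete, here is a rough sketch (names invented here, not the
actual patch) of an intrusive singly-linked free list whose link lives in the
free chunk's own data area, so no extra per-chunk memory is needed:

    typedef struct SlabFreeChunk
    {
        struct SlabFreeChunk *next; /* overlaps user data while chunk is free */
    } SlabFreeChunk;

    /* pfree path: push a freed chunk onto the block's free list */
    static inline void
    slab_push_free(SlabFreeChunk **head, void *chunk)
    {
        SlabFreeChunk *fc = (SlabFreeChunk *) chunk;

        fc->next = *head;
        *head = fc;
    }

    /* palloc path: pop a chunk for reuse, or NULL if none is free */
    static inline void *
    slab_pop_free(SlabFreeChunk **head)
    {
        SlabFreeChunk *fc = *head;

        if (fc != NULL)
            *head = fc->next;
        return fc;
    }

This is also why the minimum chunk size has to be at least pointer-sized
(which the MAXALIGN'd chunk size above already guarantees).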


> I'm still not buying the cache efficiency argument, though. One of the
> reasons is that the implementation prefers blocks with fewer free chunks
> when handling palloc(), so pfree() is making the block less likely to be
> chosen by the next palloc().

That'll possibly de-optimize L1, but for L2 usage the higher density
seems like it'll be a win.  All this memory is only accessed by a single
backend, so packing as densely as possible is good.


> > If so, if you have e.g. 8 byte allocations and 64kb sized blocks, you
> > end up with an array of 1024 doubly linked lists, which'll take up 64kb
> > on its own.  And there a certainly scenarios where even bigger block
> > sizes could make sense.   That's both memory overhead, and runtime
> > overhead, because at reset-time we'll have to check the whole array
> > (which'll presumably largely be empty lists).  Resetting is a pretty
> > common path...
> > 
> 
> True, but it's not entirely clear if resetting is common for the paths where
> we use those new allocators.

That's fair enough.  There's still the memory overhead, but I guess we
can also live with that.


> Also, if we accept that it might be a problem, what other solution you
> propose? I don't think just merging everything into a single list is a good
> idea, for the reasons I explained before (it might make the resets somewhat
> less expensive, but it'll make pfree() more expensive).

Now that I think about it, a binary heap, as suggested elsewhere, isn't
entirely trivial to use for this - it's more or less trivial to "fix"
the heap after changing an element's value, but it's harder to find that
element first.

But a two-level list approach seems like it could work quite well -
basically a simplified skip-list.  A top-level list has the property
that all its elements have a distinct #free, and hanging off each of
those are sub-lists for all the other blocks with the same number of
free chunks.

I think that'd be a better implementation, but I can understand if you
don't immediately want to go there.
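A possible shape for that two-level structure, purely as a sketch of the idea
(type and field names invented here):

    /* one bucket per distinct number of free chunks */
    typedef struct SlabFreeBucket
    {
        int         nfree;      /* free-chunk count shared by the blocks below */
        dlist_head  blocks;     /* blocks currently having exactly nfree free chunks */
        dlist_node  node;       /* link in the (ordered) top-level bucket list */
    } SlabFreeBucket;

After a palloc or pfree a block only ever moves to the bucket for nfree-1 or
nfree+1, i.e. an adjacent element of the top-level list, so the move is cheap.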


> What might work is replacing the array with a list, though. So we'd have a
> list of lists, which would eliminate the array overhead.

That seems likely to be significantly worse, because a) iteration is
more expensive b) accessing the relevant list to move between two
different "freecount" lists would be O(n).

- Andres




Re: [HACKERS] renaming pg_resetxlog to pg_resetwal has broken pg_upgrade.

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 9:09 AM, Jeff Janes  wrote:
> check for "/usr/local/pgsql9_6/bin/pg_resetwal" failed: No such file or
> directory
>
> This looks somewhat complicated to fix.  Should check_bin_dir test the old
> cluster version, and make a deterministic check based on that?  Or just
> check for either spelling, and stash the successful result somewhere?

The fix does not seem that complicated to me. get_bin_version() just
needs pg_ctl to be present, so we could move it into check_bin_dir()
after checking that pg_ctl is in a valid state, and reuse bin_version
to see whether the binary version is post-10 or not. The decision
making then just depends on this value. Please see the attached patch;
it passes a 9.6->10 upgrade and check-world.
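The check could then boil down to something like this (a sketch only; the
actual field and helper names in pg_upgrade should be verified against the
attached patch):

    /* after check_bin_dir() has validated pg_ctl and filled in bin_version */
    if (cluster->bin_version >= 100000)
        validate_exec(cluster->bindir, "pg_resetwal");
    else
        validate_exec(cluster->bindir, "pg_resetxlog");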

I have added as well an open item on the wiki.
-- 
Michael


pgupgrade-rename-fix.patch
Description: Binary data



Re: [HACKERS] Parallel Index Scans

2017-02-13 Thread Amit Kapila
On Mon, Feb 13, 2017 at 5:47 PM, Robert Haas  wrote:
> On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila  wrote:
> Why can't we rely on _bt_walk_left?
 The reason is mentioned in comments, but let me try to explain with
 some example.  When you reach that point of code, it means that either
 the current page (assume page number is 10) doesn't contain any
 matching items or it is a half-dead page, both of which indicates that
 we have to move to the previous page.   Now, before checking if the
 current page contains matching items, we signal parallel machinery
 (via _bt_parallel_release) to allow workers to read the previous page
 (assume previous page number is 9).  So it is quite possible that
 after deciding that current page (page number 10) doesn't contain any
 matching tuples if we directly move to the previous page (in this case
 it will be 9) by using _bt_walk_left, some other worker would have
 read page 9.  In short, if we directly use _bt_walk_left(), then we
 are prone to returning some of the values twice as multiple workers
 can read the same page.
>>> But ... the entire point of the seize-and-release stuff is to avoid
>>> this problem.  You're suppose to seize the scan, read the current
>>> page, walk left, store the page you find in the scan, and then release
>>> the scan.
>> Exactly and that is what is done in the patch.  Basically, if we found
>> that the current page is half-dead or it doesn't contain any matching
>> items, then release the current buffer, seize the scan, read the
>> current page, walk left and so on.  I am slightly confused here
>> because it seems both of us agree on what is the right thing to do and
>> according to me that is how it is implemented.  Are you just ensuring
>> about whether I have implemented as discussed or do you see a problem
>> with the way it is implemented?
>
> Well, before, I thought you said that relying entirely on
> _bt_walk_left couldn't work because then two people might end up
> running it at the same time, and that would cause problems.  But if
> you can only run _bt_walk_left while you've got the scan seized, then
> that can't happen.  Evidently I'm missing something here.
>

I think the comment at that place is not as clear as it should be.  So
how about changing it as below:

Existing comment:
--
/*
* For parallel scans, get the last page scanned as it is quite
* possible that by the time we try to fetch previous page, other
* worker has also decided to scan that previous page.  So we
* can't rely on _bt_walk_left call.
*/

Modified comment:
--
/*
 * For parallel scans, it is quite possible that by the time we try to fetch
 * the previous page, another worker has also decided to scan that
 * previous page.  So to avoid that we need to get the last page scanned
 * from shared scan descriptor before calling _bt_walk_left.
 */
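As a side note for readers following the seize/release pattern quoted above,
here is a toy model in plain pthreads (this is not the actual
_bt_parallel_seize/_bt_parallel_release code, and the real scan finds the
left sibling by following page links rather than subtracting one): the point
is simply that the next page to scan must be taken from shared state while
holding the lock, so no two workers claim the same page.

/* build: cc -pthread toy_scan.c */
#include <pthread.h>
#include <stdio.h>

typedef struct SharedScan
{
    pthread_mutex_t lock;
    int         last_page;          /* last page handed out; 0 means done */
} SharedScan;

static SharedScan scan = { PTHREAD_MUTEX_INITIALIZER, 10 };

/* "seize" the scan, take the next page to the left, "release" it */
static int
claim_prev_page(void)
{
    int         page;

    pthread_mutex_lock(&scan.lock);         /* seize */
    page = scan.last_page;
    if (page > 0)
        scan.last_page = page - 1;          /* store page for the next worker */
    pthread_mutex_unlock(&scan.lock);       /* release */
    return page;                            /* 0 means no pages left */
}

static void *
worker(void *arg)
{
    int         page;

    while ((page = claim_prev_page()) > 0)
        printf("worker %ld scans page %d\n", (long) arg, page);
    return NULL;
}

int
main(void)
{
    pthread_t   w1, w2;

    pthread_create(&w1, NULL, worker, (void *) 1L);
    pthread_create(&w2, NULL, worker, (void *) 2L);
    pthread_join(w1, NULL);
    pthread_join(w2, NULL);
    return 0;
}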


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-02-13 Thread Haribabu Kommi
On Tue, Feb 14, 2017 at 2:57 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Hi,
>
> I wonder if it is possible to somehow benchmark your clustered index
> implementation.
> I tried to create VCI index for lineitem table from TPC and run Q6 query.
> After index creation Postgres is not using parallel execution plan any
> more but speed of sequential plan is not changed
> and nothing in query execution plan indicates that VCI index is used:
>
>
> postgres=# explain select
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>
>  QUERY
> PLAN
>
>
> 
> 
> ---
> 
> -
>  Finalize Aggregate  (cost=608333.85..608333.86 rows=1 width=4)
>->  Gather  (cost=608333.23..608333.84 rows=6 width=4)
>  Workers Planned: 6
>  ->  Partial Aggregate  (cost=607333.23..607333.24 rows=1 width=4)
>->  Parallel Seq Scan on lineitem_projection
> (cost=0.00..607024.83 rows=61680 width=8)
>  Filter: ((l_shipdate >= '1996-01-01'::date) AND
> (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double
> precision) AN
> D (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double
> precision))
> (6 rows)
>
> postgres=# select
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>revenue
> -
>  6.2e+08
> (1 row)
>
> Time: 1171.324 ms (00:01.171)
>
> postgres=# create index vci_idx on lineitem_projection using
> vci(l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,
> l_returnflag,l_linestatus);
> CREATE INDEX
> Time: 4.705 ms
>
>
> postgres=# explain select
> * from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>
> QUERY
> PLAN
>
> 
> 
> ---
> ---
>  Seq Scan on lineitem_projection  (cost=0.00..382077.00 rows=1 width=22)
>Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <=
> '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND
> (l_discount <= '
> 0.1'::double precision) AND (l_quantity < '24'::double precision))
> (2 rows)
>
> postgres=# select
>
>
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>   revenue
> 
>  6.2112e+08
> (1 row)
>
> Time: 4304.355 ms (00:04.304)
>
>
> I wonder if there is any query which can demonstrate advantages of using
> VCI index?
>

The current patch that I shared doesn't contain the plan and executor
changes needed to show the performance benefit of the clustered index. We
used a custom plan to generate the plan for the clustered index. Currently
I am working on rebasing it to current master along with the other
necessary changes.

In the current state of the patch, I cannot run any performance tests, as
it needs some major changes to work with the latest PostgreSQL version. I
have attached an old performance report, taken on 9.5, for your reference.

The current patch that is shared is meant to find out the best approach
for developing columnar storage in PostgreSQL: index access methods plus
additional hooks, or pluggable storage access methods?

The only problem I can think of with pluggable storage methods is that, to
get the full benefit of columnar storage, the planner and executor need to
be changed to support vector processing. In the current model we
implemented that with a custom plan and additional hooks; the same may be
possible with pluggable storage methods as well.


Regards,
Hari Babu
Fujitsu Australia


VCI_DBT3_Query_Performance.xlsx
Description: MS-Excel 2007 spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-02-13 Thread Haribabu Kommi
On Tue, Feb 14, 2017 at 12:48 AM, Dilip Kumar  wrote:

> On Mon, Feb 13, 2017 at 6:24 PM, Robert Haas 
> wrote:
> > I don't think it's acceptable (or necessary) to move the DSA
> > definitions into postgres.h.  Why do you think you need to do that,
> > vs. just including dsa.h in a few more places?
>
> I need to access dsa_pointer in tidbitmap.h, which is included from
> FRONTEND as well. Now, problem is that dsa.h is including #include
> "port/atomics.h", but atomic.h can not be included if FRONTEND is
> defined.
>
> #ifndef ATOMICS_H
> #define ATOMICS_H
> #ifdef FRONTEND
> #error "atomics.h may not be included from frontend code"
> #endif
>
> Is there any other solution to this ?


How about creating another header file with the parallel changes
and including it only in the necessary places?

Following are my observations, while going through the patch.

+#if SIZEOF_DSA_POINTER == 4
+typedef uint32 dsa_pointer;
+#else
+typedef uint64 dsa_pointer;
+#endif

I feel the declaration of the above typedef can be moved into the
section above if we are going ahead with the current move into
postgres.h.

+/*
+ * tbm_alloc_shared
+ *
+ * Callback function for allocating the memory for hashtable elements.
+ * It allocates memory from DSA if tbm holds a reference to a dsa.
+ */
+static inline void *
+pagetable_allocate(pagetable_hash *pagetable, Size size)

Function name and comments mismatch?


Regards,
Hari Babu
Fujitsu Australia


[HACKERS] Set of fixes for WAL consistency check facility

2017-02-13 Thread Michael Paquier
Hi all,

Beginning a new thread to raise awareness... As already reported here,
I had a look at what has been committed in a507b869:
https://www.postgresql.org/message-id/CAB7nPqRzCQb=vdfhvmtp0hmlbhu6z1agdo4gjsup-hp8jx+...@mail.gmail.com

Here are a couple of things I have noticed while looking at the code:

+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group
s/2016/2017/ in bufmask.c and bufmask.h.

+   if (ItemIdIsNormal(iid))
+   {
+
+   HeapTupleHeader page_htup = (HeapTupleHeader) page_item;
Unnecessary newline here.

+* Read the contents from the backup copy, stored in WAL record and
+* store it in a temporary page. There is not need to allocate a new
+* page here, a local buffer is fine to hold its contents and a mask
+* can be directly applied on it.
s/not need/no need/.

In checkXLogConsistency(), FPWs that have the flag BKPIMAGE_APPLY set
will still be checked, resulting in a FPW being compared to itself. I
think that those had better be bypassed.

Please find attached a patch with those fixes. I am attaching as well
this patch to next CF.
Regards,
-- 
Michael


consistency-checks-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] renaming pg_resetxlog to pg_resetwal has broken pg_upgrade.

2017-02-13 Thread Jeff Janes
Upgrading from 9.6 to dev, I now get:

$ rm bisectdata -r ; bisect/bin/pg_ctl initdb -D bisectdata;
bisect/bin/pg_upgrade -b /usr/local/pgsql9_6/bin/ -B bisect/bin/ -d 96 -D
bisectdata/


check for "/usr/local/pgsql9_6/bin/pg_resetwal" failed: No such file or
directory

This looks somewhat complicated to fix.  Should check_bin_dir test the old
cluster version, and make a deterministic check based on that?  Or just
check for either spelling, and stash the successful result somewhere?


Culprit is here:

commit 85c11324cabaddcfaf3347df78555b30d27c5b5a
Author: Robert Haas 
Date:   Thu Feb 9 16:23:46 2017 -0500

Rename user-facing tools with "xlog" in the name to say "wal".

This means pg_receivexlog becomes pg_receivewal, pg_resetxlog
becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.



Cheers,

Jeff


Re: [HACKERS] libpq Alternate Row Processor

2017-02-13 Thread Kyle Gearhart
On Mon, Feb 13, 2017 Merlin Moncure wrote:
>A barebones callback mode ISTM is a complete departure from the classic 
>PGresult interface.  This code is pretty unpleasant IMO:
>acct->abalance = *((int*)PQgetvalue(res, 0, i));
>acct->abalance = __bswap_32(acct->abalance);

> Your code is faster but foists a lot of the work on the user, so it's kind of 
> cheating in a way (although very carefully written applications might be able 
> to benefit).

The bit you call out above is for single row mode.  Binary mode is a slippery 
slope, with or without the proposed callback.

Let's remember that one of the biggest, often overlooked, gains when using an 
ORM is that it abstracts all this mess away.  The goal here is to prevent all 
the ORM/framework folks from having to implement protocol.  Otherwise they get 
to wait on libpq to copy from the socket to the PGconn buffer to the PGresult 
structure to their buffers.  The callback keeps the slowest guy on the 
team...on the bench. 


Kyle Gearhart


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Possible TODO: allow arbitrary expressions in event trigger WHEN

2017-02-13 Thread Jim Nasby
Is there a reason not to allow $SUBJECT? Specifically, it'd be nice to 
be able to do something like WHEN tag LIKE 'ALTER%'.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-13 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Feb 13, 2017 at 2:42 PM, Alvaro Herrera
>  wrote:
> > Well, we can remove them from PG10 and pgAdmin3 (and others) be adjusted
> > to use the new ones, conditionally on server version.  Surely pgAdmin3
> > is going to receive further updates, given that it's still widely used?
> 
> According to the pgAdmin web site, no.  (Yeah, that does seem odd.)

I really do not think the PG core project should be held hostage by an
external and apparently not-really-maintained project.  What if we
introduce some other difference in PG10 that breaks pgAdmin3?  Are we
going to roll that change back?  Are we sure that none exists already?

And, as I understand it, pgAdmin3 hasn't got support for features
introduced as far back as 9.5 either, surely it's not going to have
support added to it for the publication/subscription things or
declarative partitioning, should we rip those out to accommodate
pgAdmin3?

> >> IMHO, these views aren't costing us much.  It'd be nice to get rid of
> >> them eventually but a view definition doesn't really need much
> >> maintenance.
> >
> > Maybe not, but the fact that they convey misleading information is bad.
> 
> Has anyone actually been confused by them?

This isn't something we can prove.  Nor can we prove that no one has
ever been confused.  What we can show is that they're clearly misleading
and inconsistent.  Even if no one is ever confused by them, having them
is bad.

> I'm a bit skeptical about the idea that these are misleading people,
> because the information is no more or less misleading now than it was
> in PostgreSQL 8.1 when the views were introduced.  And evidently it
> was not so misleading at that time as to make us think that a hard
> compatibility break was warranted.

No, that's not how I recall the discussion going down when I introduced
them in 8.1.  It was more along the lines of "Here's a patch, oh, and I
added these views too."  The archives might prove me wrong, as memory
does fade after years, but I certainly don't recall any lengthy
discussion about if we should add these views or not.  In short, my
recollection is that we added them because it was easy to do at the time
and we didn't have the foresight to realize just how hard they would
become to get rid of and how much time they would suck up from people
arguing about them.

This is part of the bigger picture that we need to consider whenever we
think about backwards-compat mis-features.

> On the other hand, I suppose that the last version of pgAdmin 3 isn't
> likely to work with future major versions of PostgreSQL anyway unless
> somebody updates it, and if somebody decides to update it for the
> other changes in v10 then updating it for the removal of these views
> won't be much extra work.  So maybe it doesn't matter.

Indeed.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-13 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Feb 9, 2017 at 8:54 PM, Stephen Frost  wrote:
> > Note that these views have not been consistently maintained and have
> > ended up including some role attributes from recent versions
> 
> That's not a bug.  According to the documentation, these views exist
> for compatibility with PostgreSQL versions before 8.1, so there's no
> need to update them with newer fields.  Clients who are expecting to
> talk with a pre-8.1 PostgreSQL won't expect those fields to be present
> anyway.

Yet we added bypassrls to them, after a similar discussion of how
they're for backwards compat and we can't get rid of them, but we should
update them with new things, blah, blah.

> My big objection to removing these views is that it will break pgAdmin
> 3, which uses all three of these views.  I understand that the pgAdmin
> community is now moving away from pgAdmin 3 and toward pgAdmin 4, but
> I bet that pgAdmin 3 still has significant usage and will continue to
> have significant usage for at least a year or three.  When it's
> thoroughly dead, then I think we can go ahead and do this, unless
> there are other really important tools still depending on those views,
> but it's only been 3 months since the final pgAdmin 3 release.

IMHO, if it's dead enough to not get updated for such changes then we
shouldn't care enough about it to maintain backwards compat views in our
code-base for it.

> IMHO, these views aren't costing us much.  It'd be nice to get rid of
> them eventually but a view definition doesn't really need much
> maintenance.  (A contrib module doesn't either, but more than a view
> definition.)

Clearly, it does need some form of maintenance and consideration or it
ends up in a confusing and inconsistent state, as evidenced by the fact
that that's exactly where we are.  I don't really expect to actually win
this argument, as I've had the experience of trying to fight this fight
before, but I certainly don't agree that we should try to continue to
maintain them for pgAdmin3's sake.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-02-13 Thread Thomas Munro
On Thu, Feb 9, 2017 at 2:03 AM, Ashutosh Bapat
 wrote:
>>
>> 0004-hj-refactor-batch-increases-v4.patch:
>>
>> Modify the existing hash join code to detect work_mem exhaustion at
>> the point where chunks are allocated, instead of checking after every
>> tuple insertion.  This matches the logic used for estimating, and more
>> importantly allows for some parallelism in later patches.
>
> The patch has three changes
> 1. change dense_alloc() to accept respect_workmem argument and use it
> within the function.
> 2. Move call to ExecHashIncreaseNumBatches() into dense_alloc() from
> ExecHashTableInsert() to account for memory before inserting new tuple
> 3. Check growEnabled before calling ExecHashIncreaseNumBatches().

Thanks for the review!

> I think checking growEnabled within ExecHashIncreaseNumBatches() is
> easier to maintain than checking at every caller. If someone is to
> add a caller tomorrow, s/he has to remember to add the check.

Hmm.  Yeah.  In the later 0010 patch ExecHashIncreaseNumBatches will
be used in a slightly different way -- not for making decisions or
performing the hash table shrink, but only for reallocating the batch
arrays.  I will see if putting the growEnabled check back in there in
the 0004 patch and then refactoring in a later patch makes more sense
to someone reviewing the patches independently, for the next version.

> It might be better to add some comments in
> ExecHashRemoveNextSkewBucket() explaining why dense_alloc() should be
> called with respect_work_mem = false? ExecHashSkewTableInsert() does
> call ExecHashIncreaseNumBatches() after calling
> ExecHashRemoveNextSkewBucket() multiple times, so it looks like we do
> expect increase in space used and thus go beyond work_mem for a short
> while. Is there a way we can handle this case in dense_alloc()?

Right, that needs some explanation, which I'll add for the next
version.  The explanation is that while 'shrinking' the hash table, we
may need to go over the work_mem limit by one chunk for a short time.
That is already true in master, but by moving the work_mem checks into
dense_alloc I ran into the problem that dense_alloc might decide to
shrink the hash table, which itself needs to call dense_alloc.  Shrinking
works by spinning through all the chunks copying only the tuples we
want to keep into new chunks and freeing the old chunks as we go.  We
will temporarily go one chunk over work_mem when we allocate the first
new chunk but before we've freed the first old one.  We don't want
shrink operations to trigger recursive shrink operations, so we
disable respect for work_mem when calling it from
ExecHashIncreaseNumBatches.  In the course of regular hash table
loading, we want to respect work_mem.
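To illustrate just that control-flow point, here is an ordinary,
self-contained C toy (all names are made up; this is not the executor
code): allocation normally honours the budget, while the shrink path
re-enters the allocator with the check disabled so it can go one chunk over
temporarily without triggering itself recursively.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE  1024            /* stand-in for a dense_alloc chunk */

static size_t used_mem = 0;
static size_t mem_budget = 4 * CHUNK_SIZE;   /* stand-in for work_mem */

static void *toy_dense_alloc(size_t size, bool respect_mem);

/* Copy surviving tuples into a fresh chunk, then "free" two old ones. */
static void
toy_shrink(void)
{
    /* respect_mem = false: we may briefly exceed the budget by one chunk */
    void       *new_chunk = toy_dense_alloc(CHUNK_SIZE, false);

    printf("shrink: new chunk %p, briefly %zu bytes over budget\n",
           new_chunk, used_mem > mem_budget ? used_mem - mem_budget : 0);
    used_mem -= 2 * CHUNK_SIZE;     /* pretend two old chunks were freed */
}

static void *
toy_dense_alloc(size_t size, bool respect_mem)
{
    if (respect_mem && used_mem + size > mem_budget)
        toy_shrink();               /* never re-entered from within itself */
    used_mem += size;
    return malloc(size);            /* chunks are leaked; it is only a toy */
}

int
main(void)
{
    for (int i = 0; i < 8; i++)
        (void) toy_dense_alloc(CHUNK_SIZE, true);
    printf("final: used %zu of %zu\n", used_mem, mem_budget);
    return 0;
}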

Looking at the v5 patch series I posted yesterday, I see that in fact
ExecHashIncreaseNumBatches calls dense_alloc with respect_work_mem =
true in the 0004 patch, and then I corrected that mistake in the 0008
patch; I'll move the correction back to the 0004 patch in the next
version.

To reach ExecHashIncreaseNumBatches, see the "ugly" query in
hj-test-queries.sql (posted with v5).

In ExecHashRemoveNextSkewBucket I preserved the existing behaviour of
not caring about work_mem when performing the rare operation of
copying a tuple from the skew bucket into a dense_alloc memory chunk
so it can be inserted into a regular (non-skew) bucket.

> Is it possible that increasing the number of batches changes the
> bucket number of the tuple being inserted? If so, should we
> recalculate the bucket and batch of the tuple being inserted?

No -- see the function documentation for ExecHashGetBucketAndBatch.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_autovacuum_min_duration doesn't log VACUUMs

2017-02-13 Thread Simon Riggs
On 13 February 2017 at 17:12, Robert Haas  wrote:
> On Sun, Feb 12, 2017 at 9:20 PM, Jim Nasby  wrote:
>> On 2/10/17 2:33 PM, Robert Haas wrote:
>>> That having been said, I think it could certainly be useful to have
>>> more control over what DDL gets logged in foreground processes.
>>
>> FWIW, this is a significant problem outside of DDL. Once you're past 1-2
>> levels of nesting SET client_min_messages = DEBUG becomes completely
>> useless.
>>
>> I think the ability to filter logging based on context would be very
>> valuable. AFAIK you could actually do that for manual logging with existing
>> plpgsql support, but obviously that won't help for anything else.
>
> Well, that's moving the goalposts a lot further and in an unrelated
> direction.  I don't think that it's a good idea to change the
> semantics of log_autovacuum_min_duration in the way Simon is proposing
> for the reasons I noted, but I think that an acceptable patch could be
> 100 lines of pretty straightforward code and documentation, like a new
> GUC that controls this output for the vac-non-autovac case.

If my idea would not log manual ANALYZE, well, we can add that in
easily. There is no reason to block the patch for such a minor foible.

This is a short patch to address a specific minor issue, not a blue
sky redesign of logging.

If someone else wants to add more, they can, later. Incremental
change, just as happens all the time everywhere else.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread David Rowley
On 14 February 2017 at 10:56, Brad DeJong  wrote:
> David Rowley wrote:
>> I propose we just remove the whole paragraph, and mention about
>> the planning and estimated number of groups stuff in another new paragraph.
>>
>> I've attached a patch to this effect ...
>
> s/In a worse case scenario/In the worst case scenario,/
>
> Other than that, the phrasing in the new patch reads very smoothly.

Thanks. Updated patch attached.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_doc_fixes_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p

2017-02-13 Thread Michael Paquier
On Tue, Feb 14, 2017 at 2:31 AM, Fujii Masao  wrote:
> On Tue, Feb 14, 2017 at 2:06 AM, Robert Haas  wrote:
>> On Mon, Feb 13, 2017 at 11:43 AM, Fujii Masao  wrote:
>>> On Fri, Feb 10, 2017 at 5:36 AM, Robert Haas  wrote:
 Remove all references to "xlog" from SQL-callable functions in pg_proc.

 Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog
 directory to pg_wal.  To make things consistent, and because "xlog" is
 terrible terminology for either "transaction log" or "write-ahead log"
 rename all SQL-callable functions that contain "xlog" in the name to
 instead contain "wal".  (Note that this may pose an upgrade hazard for
 some users.)
>>>
>>> There are still some mentions to "xlog" in the doc. Maybe we should replace
>>> them with "wal" as the attached patch does.
>>
>> +1.  Patch looks good to me.
>
> Thanks! Pushed.

It seems like the previous review I provided for the set of renaming
patches has been ignored:
https://www.postgresql.org/message-id/CAB7nPqSm=PNSe3EfvnEResdFzpQMcXXgapFBcF=EFdxVW4=f...@mail.gmail.com
Good that somebody else checked...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread Brad DeJong
David Rowley wrote:
> I propose we just remove the whole paragraph, and mention about
> the planning and estimated number of groups stuff in another new paragraph.
> 
> I've attached a patch to this effect ...

s/In a worse case scenario/In the worst case scenario,/

Other than that, the phrasing in the new patch reads very smoothly.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread David Rowley
On 14 February 2017 at 10:10, Brad DeJong  wrote:
> Robert Haas wrote:
>
>> +COUNT(*), each worker must compute subtotals which later 
>> must
>> +be combined to produce an overall total in order to produce the final
>> +answer.  If the query involves a GROUP BY clause,
>> +separate subtotals must be computed for each group seen by each parallel
>> +worker. Each of these subtotals must then be combined into an overall
>> +total for each group once the parallel aggregate portion of the plan is
>> +complete.  This means that queries which produce a low number of groups
>> +relative to the number of input rows are often far more attractive to 
>> the
>> +query planner, whereas queries which don't collect many rows into each
>> +group are less attractive, due to the overhead of having to combine the
>> +subtotals into totals, of which cannot run in parallel.
>
>> I don't think "of which cannot run in parallel" is good grammar.  I'm 
>> somewhat unsure whether the rest is an improvement or not.  Other opinions?
>
> Does this read any more clearly?
>
> +COUNT(*), each worker must compute subtotals which are later
> +combined in order to produce an overall total for the final answer.  If
> +the query involves a GROUP BY clause, separate subtotals
> +must be computed for each group seen by each parallel worker.  After the
> +parallel aggregate portion of the plan is complete, there is a serial 
> step
> +where the group subtotals from all of the parallel workers are combined
> +into an overall total for each group.  Because of the overhead of 
> combining
> +the subtotals into totals, plans which produce few groups relative to the
> +number of input rows are often more attractive to the query planner
> +than plans which produce many groups relative to the number of input 
> rows.

Actually looking over this again I think it's getting into too much
detail which is already described in the next paragraph (of which I
think is very clear). I propose we just remove the whole paragraph,
and mention about the planning and estimated number of groups stuff in
another new paragraph.

I've attached a patch to this effect, which also just removes the text
about why we don't support Merge Join. I felt something needed to be
written in its place, so I mentioned that identical hash tables are created
in each worker. This is perhaps not required, but the paragraph seemed a
bit empty without it.  I also noticed a mistake in "based on a column
taken from the inner table": this "inner" I assume should be "outer",
since it surely must be talking about a parameterised index scan, in
which case the parameter comes from the outer side, not the inner.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_doc_fixes_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY IN/BOTH vs. extended query mode

2017-02-13 Thread Craig Ringer
On 14 Feb. 2017 06:15, "Robert Haas"  wrote:

On Mon, Jan 23, 2017 at 9:12 PM, Robert Haas  wrote:
> According to the documentation for COPY IN mode, "If the COPY command
> was issued via an extended-query message, the backend will now discard
> frontend messages until a Sync message is received, then it will issue
> ReadyForQuery and return to normal processing."  I added a similar
> note to the documentation for COPY BOTH mode in
> 91fa8532f4053468acc08534a6aac516ccde47b7, and the documentation
> accurately describes the behavior of the server.  However, this seems
> to make fully correct error handling for clients using libpq almost
> impossible, because PQsendQueryGuts() sends
> Parse-Bind-Describe-Execute-Sync in one shot without regard to whether
> the command that was just sent invoked COPY mode (cf. the note in
> CopyGetData about why we ignore Flush and Sync in that function).
>
> So imagine that the client uses libpq to send (via the extended query
> protocol) a COPY IN command (or some hypothetical command that starts
> COPY BOTH mode to begin).  If the server throws an error before the
> Sync message is consumed, it will bounce back to PostgresMain which
> will set doing_extended_query_message = true after which it will
> consume messages, find the Sync, reset that flag, and send
> ReadyForQuery.  On the other hand, if the server enters CopyBoth mode,
> consumes the Sync message in CopyGetData (or a similar function), and
> *then* throws an ERROR, the server will wait for a second Sync message
> from the client before issuing ReadyForQuery.  There is no sensible
> way of coping with this problem in libpq, because there is no way for
> the client to know which part of the server code consumed the Sync
> message that it already sent.  In short, from the client's point of
> view, if it enters COPY IN or COPY BOTH mode via the extend query
> protocol, and an error occurs on the server, the server MAY OR MAY NOT
> expect a further Sync message before issuing ReadyForQuery, and the
> client has no way of knowing -- except maybe waiting for a while to
> see what happens.
>
> It does not appear to me that there is any good solution to this
> problem.  Fixing it on the server side would require a wire protocol
> change - e.g. one kind of Sync message that is used in a
> Parse-Bind-Describe-Execute-Sync sequence that only terminates
> non-COPY commands and another kind that is used to signal the end even
> of COPY.  Fixing it on the client side would require all clients to
> know prior to initiating an extended-query-protocol sequence whether
> or not the command was going to initiate COPY, which is an awful API
> even if didn't constitute an impossible-to-contemplate backward
> compatibility break.  Perhaps we will have to be content to document
> the fact that this part of the protocol is depressingly broken...
>
> ...unless of course somebody can see something that I'm missing here
> and the situation isn't as bad as it currently appears to me to be.

Anybody have any thoughts on this?


I've been thinking on it a bit, but don't really have anything that can be
done without a protocol version bump.

We can't really disallow extended query protocol COPY, too much is likely
to break. And we can't fix it without a protocol change.

A warning in the docs for COPY would be appropriate, noting that clients
should use the simple query protocol to issue COPY. It's kind of mixing
layers, since many users won't see the protocol level or have any idea if
their client driver uses ext or simple query, but we can at least advise
libpq users.

Also in the protocol docs, noting that clients sending COPY should prefer
the simple query protocol due to error recovery issues with COPY and the
extended query protocol.


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Corey Huinker
On Mon, Feb 13, 2017 at 3:40 PM, Fabien COELHO  wrote:

>
> Hello Robert,
>
> [...] I think we should try to make this REALLY simple.  We don't really
>> want to have everybody have to change their PROMPT1 and PROMPT2 strings for
>> this one feature.
>>
>
> Ok. I think that we agree that the stack was too much details.
>
> How about just introducing a new value for %R?
>>
>
> Yes. That is indeed one of the idea being discussed.
>
> [...] , or @ if commands are currently being ignored because of the result
>> of an \if test.
>>
>
,-or-@ has one advantage over t/f/z: we cannot infer the 'z' state purely
from pset.active_state, and the if-stack itself is sequestered in
scan_state, which is not visible to the get_prompt() function.

I suppose if somebody wanted it, a separate slash command that does a
verbose printing of the current if-stack would be nice, but mostly just to
explain to people how the if-stack works.


> If I can find some simple mnemonic for "," vs "@" for being executed vs
> ignored, I could live with that, but nothing obvious comes to my mind.
>

@in't gonna execute it?

I'm here all week, try the veal.

To sum up your points: just update %R (ok), keep it simple/short (ok... but
> how simple [2 vs 3 states] and short [1 or 2 chars]), and no real need to
> be too nice with the user beyond the vital (ok, that significantly
> simplifies things).


I'd be fine with either of these on aesthetic grounds. On technical
grounds, 'z' is harder to show.


Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread Brad DeJong
Robert Haas wrote:

> +COUNT(*), each worker must compute subtotals which later must
> +be combined to produce an overall total in order to produce the final
> +answer.  If the query involves a GROUP BY clause,
> +separate subtotals must be computed for each group seen by each parallel
> +worker. Each of these subtotals must then be combined into an overall
> +total for each group once the parallel aggregate portion of the plan is
> +complete.  This means that queries which produce a low number of groups
> +relative to the number of input rows are often far more attractive to the
> +query planner, whereas queries which don't collect many rows into each
> +group are less attractive, due to the overhead of having to combine the
> +subtotals into totals, of which cannot run in parallel.

> I don't think "of which cannot run in parallel" is good grammar.  I'm 
> somewhat unsure whether the rest is an improvement or not.  Other opinions?

Does this read any more clearly?

+COUNT(*), each worker must compute subtotals which are later
+combined in order to produce an overall total for the final answer.  If
+the query involves a GROUP BY clause, separate subtotals
+must be computed for each group seen by each parallel worker.  After the
+parallel aggregate portion of the plan is complete, there is a serial step
+where the group subtotals from all of the parallel workers are combined
+into an overall total for each group.  Because of the overhead of combining
+the subtotals into totals, plans which produce few groups relative to the
+number of input rows are often more attractive to the query planner
+than plans which produce many groups relative to the number of input rows.


I got rid of the ", of which cannot run in parallel" entirely. It was
already stated that the subtotals->totals step runs "once the parallel
aggregate portion of the plan is complete." which implies that it is serial.
I made that explicit with "there is a serial step". Also, the purpose of the 
", of which cannot run in parallel" sentence is to communicate why the
planner prefers one plan over another and, if I'm reading this correctly,
the subtotals->totals step is serial for both plans so that is not a reason
to prefer one over the other.

I think that the planner prefers plans rather than queries, so I changed that 
as well.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq Alternate Row Processor

2017-02-13 Thread Merlin Moncure
On Mon, Feb 13, 2017 at 8:46 AM, Kyle Gearhart
 wrote:
> On 2/9/17 7:15 PM, Jim Nasby wrote:
>> Can you run a trace to see where all the time is going in the single row 
>> case? I don't see an obvious time-suck with a quick look through the code. 
>> It'd be interesting to see how things change if you eliminate the filler 
>> column from the SELECT.
>
> Traces are attached, these are with callgrind.
>
> profile_nofiller.txt: single row without filler column
> profile_filler.txt: single row with filler column
> profile_filler_callback.txt: callback with filler column
>
> pqResultAlloc looks to hit malloc pretty hard.  The callback reduces all of 
> that to a single malloc for each row.

Couldn't that be optimized, say, by preserving malloc'd memory when in
single row mode and recycling it?  (IIRC during the single row mode
discussion this optimization was voted down).

A barebones callback mode ISTM is a complete departure from the
classic PGresult interface.  This code is pretty unpleasant IMO:
acct->abalance = *((int*)PQgetvalue(res, 0, i));
acct->abalance = __bswap_32(acct->abalance);

Your code is faster but foists a lot of the work on the user, so it's
kind of cheating in a way (although very carefully written
applications might be able to benefit).
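For what it's worth, here is a minimal self-contained sketch of the kind of
per-row handling under discussion, using the existing single-row mode with
binary results; it assumes a pgbench_accounts table with an int4 abalance
column and connection settings taken from the environment, and uses ntohl()
rather than the glibc-specific __bswap_32:

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");   /* params come from the environment */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* resultFormat = 1 asks for binary output */
    if (!PQsendQueryParams(conn, "SELECT abalance FROM pgbench_accounts",
                           0, NULL, NULL, NULL, NULL, 1) ||
        !PQsetSingleRowMode(conn))
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            uint32_t    raw;
            int32_t     abalance;

            /* binary int4 arrives in network byte order */
            memcpy(&raw, PQgetvalue(res, 0, 0), sizeof(raw));
            abalance = (int32_t) ntohl(raw);
            printf("%d\n", abalance);
        }
        else if (PQresultStatus(res) == PGRES_FATAL_ERROR)
            fprintf(stderr, "%s", PQresultErrorMessage(res));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}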

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Fabien COELHO


Hello Corey,


I went back to master and re-applied v11, something must have gotten lost
in translation.


Probably you need "git add" for added files?

About v14: patch applies, make check ok, psql tap tests ok.

All seems fine to me. Test coverage is better than a lot of other 
features. Code & comments seem fine. Doc and example are clear enough to 
me.


The level of messages in interactive mode is terse but informative when needed, 
on errors and when commands are ignored. The only missing point is about 
doing something to the prompt, but given the current messages ISTM that 
this can wait for a follow-up patch. Robert Haas's advice is to keep it 
simple and short and in %R. There was also some suggestion to have a "show 
the stack" command for debug, I think that this can wait as well.


I've turned again the CF entry to "ready for committers", to see what 
committers think about this new and simplified version.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Corey Huinker
On Mon, Feb 13, 2017 at 11:29 AM, Robert Haas  wrote:

> possible states here.  Just when you've figured out what tfzffft


I agree with what you've said, but wanted to point out that any condition
that follows a 'z' would itself be 'z'. Not that tfz is much better.


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Fabien COELHO


Hello Robert,

[...] I think we should try to make this REALLY simple.  We don't really 
want to have everybody have to change their PROMPT1 and PROMPT2 strings 
for this one feature.


Ok. I think that we agree that the stack was too much details.


How about just introducing a new value for %R?


Yes. That is indeed one of the idea being discussed.

[...] , or @ if commands are currently being ignored because of the 
result of an \if test.


Currently I find the %R logic quite good, with "=" for "give me 
something", "^" as the start-of-line regular-expression anchor for 
single-line mode, "!" for "beware, something is amiss", and in prompt2 "-" 
for continuation, '"' for in double quotes, "(" for in parentheses, and so on.


What would be the mnemonic for "," and "@"?

By shortening one of the suggestions down to two characters, we may have 
three cases:


  "?t" for "in condition, in true block"
  "?f" for "in condition, in false block (but true yet to come)"
  "?z" for "in condition, waiting for the end (true has been executed)".

So there is no indication of the stack depth or contents. tfz for true, 
false, and sleeping seems quite easy to infer and understand. "?" is also 
needed as a separator from the previous field, which is sometimes the 
database name:


  calvin=> \if false
  calvin?f=> \echo 1
  calvin?f=> \elif true
  calvin?t=> \echo 2
2
  calvin?t=> \else
  calvin?z=> \echo 3
  calvin?z=> \endif
  calvin=>

With the suggested , and @:

  calvin=> \if false
  calvin,=> \echo 1
  calvin,=> \elif true
  calvin@=> \echo 2
2
  calvin@=> \else
  calvin,=> \echo 3
  calvin,=> \endif
  calvin=>

If I can find some simple mnemonic for "," vs "@" for being executed vs 
ignored, I could live with that, but nothing obvious comes to my mind.


The "?" for condition and Corey's [tfz] looked quite intuitive/mnemonic to 
me. The drawback is that it is 2 chars vs one char in the above.


[...] I think that's all you need here: a way to alert users as to 
whether commands are being ignored, or not.


Yep.


[...]


To sum up your points: just update %R (ok), keep it simple/short (ok... 
but how simple [2 vs 3 states] and short [1 or 2 chars]), and no real need 
to be too nice with the user beyond the vital (ok, that significantly 
simplifies things).


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread Robert Haas
On Mon, Feb 13, 2017 at 3:29 PM, David Rowley
 wrote:
> On 14 February 2017 at 09:21, Robert Haas  wrote:
>> On Sun, Feb 12, 2017 at 7:16 PM, David Rowley
>> -table. Each worker will execute the outer side of the plan in full, 
>> which
>> -is why merge joins are not supported here. The outer side of a merge 
>> join
>> -will often involve sorting the entire inner table; even if it involves 
>> an
>> -index, it is unlikely to be productive to have multiple processes each
>> -conduct a full index scan of the inner table.
>> +relation. Each worker will execute the inner side of the join in full,
>> +which is why merge joins are not supported here. The inner side of a 
>> merge
>> +join will often involve sorting the entire inner relation; even if it
>> +involves an index, it is unlikely to be productive to have multiple
>> +processes each conduct a full index scan of the inner side of the join.
>>
>> Why s/table/relation/?  I don't think that's useful, especially
>> because the first part of that very same paragraph would still say
>> "table".
>
> Perhaps it's not correct to do that. I did mean relation is the true
> relational theory sense, rather than where relkind = 'r'. I didn't
> really like the way it assumed the inner side was a table. Perhaps its
> better to just say "involve sorting the entire inner side of the join"

Yeah, that seems fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread David Rowley
On 14 February 2017 at 09:21, Robert Haas  wrote:
> On Sun, Feb 12, 2017 at 7:16 PM, David Rowley
> -table. Each worker will execute the outer side of the plan in full, which
> -is why merge joins are not supported here. The outer side of a merge join
> -will often involve sorting the entire inner table; even if it involves an
> -index, it is unlikely to be productive to have multiple processes each
> -conduct a full index scan of the inner table.
> +relation. Each worker will execute the inner side of the join in full,
> +which is why merge joins are not supported here. The inner side of a 
> merge
> +join will often involve sorting the entire inner relation; even if it
> +involves an index, it is unlikely to be productive to have multiple
> +processes each conduct a full index scan of the inner side of the join.
>
> Why s/table/relation/?  I don't think that's useful, especially
> because the first part of that very same paragraph would still say
> "table".

Perhaps it's not correct to do that. I did mean relation in the true
relational theory sense, rather than where relkind = 'r'. I didn't
really like the way it assumed the inner side was a table. Perhaps it's
better to just say "involve sorting the entire inner side of the join"



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to parallel query docs

2017-02-13 Thread Robert Haas
On Sun, Feb 12, 2017 at 7:16 PM, David Rowley
 wrote:
> Tomas Vondra pointed out to me that there's an error in parallel.sgml
> which confuses the inner and outer sides of the join.
>
> I've attached a patch which fixes this, although I think I'm still
> missing the point to text's explanation of why Merge Join is not
> included due to it having to sort the inner side in each worker. Hash
> join must build a hash table for each worker, so why is that OK by
> sorting is not?
>
> Anyway, I've attached a patch which fixes the outer/inner confusion
> and cleans up a couple of other grammar mistakes and an omissions
> regarding DISTINCT and ORDER BY not being supported in parallel
> aggregate. I ended up rewriting a section too which was explaining
> parallel aggregate, which I personally believe is a bit more clear to
> read, but someone else may think otherwise.

-loops or hash joins.  The outer side of the join may be any kind of
+loops or hash joins.  The inner side of the join may be any kind of

Oops.  That's clearly a mistake.

-table. Each worker will execute the outer side of the plan in full, which
-is why merge joins are not supported here. The outer side of a merge join
-will often involve sorting the entire inner table; even if it involves an
-index, it is unlikely to be productive to have multiple processes each
-conduct a full index scan of the inner table.
+relation. Each worker will execute the inner side of the join in full,
+which is why merge joins are not supported here. The inner side of a merge
+join will often involve sorting the entire inner relation; even if it
+involves an index, it is unlikely to be productive to have multiple
+processes each conduct a full index scan of the inner side of the join.

Why s/table/relation/?  I don't think that's useful, especially
because the first part of that very same paragraph would still say
"table".

Anyway, events have shown the bit about merge joins was wrong thinking
on my part.  See Dilip's emails about parallel merge join.  Maybe we
should just change to say that it isn't supported without giving a
reason.  I hope to commit Dilip's patch to add support for exactly
that thing soon.  Then we can remove the language altogether and say
that it is supported.

-COUNT(*), each worker could compute a total, but those totals
-would need to combined in order to produce a final answer.  If the query
-involved a GROUP BY clause, a separate total would need to
-be computed for each group.  Even though aggregation can't be done entirely
-in parallel, queries involving aggregation are often excellent candidates
-for parallel query, because they typically read many rows but return only
-a few rows to the client.  Queries that return many rows to the client
-are often limited by the speed at which the client can read the data,
-in which case parallel query cannot help very much.
+COUNT(*), each worker must compute subtotals which later must
+be combined to produce an overall total in order to produce the final
+answer.  If the query involves a GROUP BY clause,
+separate subtotals must be computed for each group seen by each parallel
+worker. Each of these subtotals must then be combined into an overall
+total for each group once the parallel aggregate portion of the plan is
+complete.  This means that queries which produce a low number of groups
+relative to the number of input rows are often far more attractive to the
+query planner, whereas queries which don't collect many rows into each
+group are less attractive, due to the overhead of having to combine the
+subtotals into totals, of which cannot run in parallel.

I don't think "of which cannot run in parallel" is good grammar.  I'm
somewhat unsure whether the rest is an improvement or not.  Other
opinions?

-a PartialAggregate node.  Second, the partial results are
+a Partial Aggregate node.  Second, the partial results are

Oops, that's clearly a mistake.

-FinalizeAggregate node.
+Finalize Aggregate node.

That one, too.

-Parallel aggregation is not supported for ordered set aggregates or when
-the query involves GROUPING SETS.  It can only be used when
-all joins involved in the query are also part of the parallel portion
-of the plan.
+Parallel aggregation is not supported if any aggregate function call
+contains DISTINCT or ORDER BY clause and is also
+not supported for ordered set aggregates or when  the query involves
+GROUPING SETS.  It can only be used when all joins involved in
+the query are also part of the parallel portion of the plan.

That chunk seems like an improvement to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Corey Huinker
On Mon, Feb 13, 2017 at 3:04 PM, Fabien COELHO  wrote:

>
> Hello Corey,
>
> That effort was creating as many headaches as it solved. Rolled back v12
>> except for doc paragraph, added more descriptive tap test names. Enjoy.
>>
>
> The TAP tests are missing from the patch, not sure why...
>

I went back to master and re-applied v11, something must have gotten lost
in translation.


> The stack cleanup is nicer with pop calls.
>

Yeah, dunno why I didn't do that earlier.


> Two debatable suggestions about the example:
>  - put the on_error_stop as the very first thing in the example?
>  - add a comment explaining what the SELECT ... \gset does?


Both are reasonable and easy. Added.
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ae58708..7e59192 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2035,6 +2035,81 @@ hello 10
 
   
 
+  
+\if expr
+\elif expr
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks, like
+this:
+
+
+-- stop the script if the condition variables are invalid boolean expressions
+\set ON_ERROR_STOP on
+-- check for the existence of two separate records in the database and store
+-- the results in separate psql variables
+SELECT
+EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer,
+EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee
+\gset
+\if :is_customer
+SELECT * FROM customer WHERE customer_id = 123;
+\elif :is_employee
+\echo 'is not a customer but is an employee'
+SELECT * FROM employee WHERE employee_id = 456;
+\else
+\if yes
+\echo 'not a customer or employee'
+\else
+\echo 'this should never print'
+\endif
+\endif
+
+
+Conditional blocks must begin with a \if and end
+with an \endif, and the pairs must be found in
+the same source file. If an EOF is reached on the main file or an
+\include-ed file before all local
+\if-\endif are matched, then
+psql will raise an error.
+
+
+A conditional block can have any number of 
+\elif clauses, which may optionally be followed by a
+single \else clause.
+
+
+The \if and \elif commands
+read the rest of the line and evaluate it as a boolean expression.
+Currently, expressions are limited to a single unquoted string
+which are evaluated like other on/off options, so the valid values
+are any unambiguous case insensitive matches for one of:
+true, false, 
1,
+0, on, off,
+yes, no.  So 
+t, T, and tR
+will all match true.
+
+
+Expressions that do not properly evaluate to true or false will
+generate an error and cause the \if or
+\elif command to fail.  Because that behavior may
+change branching context in undesirable ways (executing code which
+was intended to be skipped, causing \elif,
+\else, and \endif commands to
+pair with the wrong \if, etc), it is
+recommended that scripts which use conditionals also set
+ON_ERROR_STOP.
+
+Lines within false branches are not evaluated in any way: queries are
+not sent to the server, non-conditional commands are not evaluated but
+bluntly ignored, nested if-expressions in such branches are also not
+evaluated but are tallied to check for proper nesting.
+
+
+  
 
   
 \ir or \include_relative 
filename
diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index c53733f..7a418c6 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -61,8 +61,16 @@ uninstall:
 
 clean distclean:
rm -f psql$(X) $(OBJS) lex.backup
+   rm -rf tmp_check
 
 # files removed here are supposed to be in the distribution tarball,
 # so do not clean them in the clean/distclean rules
 maintainer-clean: distclean
rm -f sql_help.h sql_help.c psqlscanslash.c
+
+
+check:
+   $(prove_check)
+
+installcheck:
+   $(prove_installcheck)
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index f17f610..390bccd 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -49,6 +49,7 @@
 #include "psqlscanslash.h"
 #include "settings.h"
 #include "variables.h"
+#include "fe_utils/psqlscan_int.h"
 
 /*
  * Editable database object types.
@@ -132,7 +133,7 @@ HandleSlashCmds(PsqlScanState scan_state,
status = PSQL_CMD_ERROR;
}
 
-   if (status != PSQL_CMD_ERROR)
+   if (status != PSQL_CMD_ERROR && pset.active_branch)
{
/* eat any remaining arguments after a valid command */
/* note we suppress evaluation of backticks here */
@@ -194,6 +195,30 @@ read_connect_arg(PsqlScanState 

Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-13 Thread Robert Haas
On Mon, Feb 13, 2017 at 2:42 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> My big objection to removing these views is that it will break pgAdmin
>> 3, which uses all three of these views.  I understand that the pgAdmin
>> community is now moving away from pgAdmin 3 and toward pgAdmin 4, but
>> I bet that pgAdmin 3 still has significant usage and will continue to
>> have significant usage for at least a year or three.  When it's
>> thoroughly dead, then I think we can go ahead and do this, unless
>> there are other really important tools still depending on those views,
>> but it's only been 3 months since the final pgAdmin 3 release.
>
> Well, we can remove them from PG10 and pgAdmin3 (and others) be adjusted
> to use the new ones, conditionally on server version.  Surely pgAdmin3
> is going to receive further updates, given that it's still widely used?

According to the pgAdmin web site, no.  (Yeah, that does seem odd.)

>> IMHO, these views aren't costing us much.  It'd be nice to get rid of
>> them eventually but a view definition doesn't really need much
>> maintenance.
>
> Maybe not, but the fact that they convey misleading information is bad.

Has anyone actually been confused by them?

I'm a bit skeptical about the idea that these are misleading people,
because the information is no more or less misleading now than it was
in PostgreSQL 8.1 when the views were introduced.  And evidently it
was not so misleading at that time as to make us think that a hard
compatibility break was warranted.

On the other hand, I suppose that the last version of pgAdmin 3 isn't
likely to work with future major versions of PostgreSQL anyway unless
somebody updates it, and if somebody decides to update it for the
other changes in v10 then updating it for the removal of these views
won't be much extra work.  So maybe it doesn't matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Fabien COELHO


Hello Corey,

That effort was creating as many headaches as it solved. Rolled back v12 
except for doc paragraph, added more descriptive tap test names. Enjoy.


The TAP tests are missing from the patch, not sure why...

The stack cleanup is nicer with pop calls.

Two debatable suggestions about the example:
 - put the on_error_stop as the very first thing in the example?
 - add a comment explaining what the SELECT ... \gset does?

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] possibility to specify template database for pg_regress

2017-02-13 Thread Pavel Stehule
Hi

2017-02-13 6:46 GMT+01:00 Michael Paquier :

> On Sat, Feb 11, 2017 at 3:03 PM, Pavel Stehule 
> wrote:
> > here is new update - check is done before any creating
>
> It may be better to do any checks before dropping existing databases
> as well... It would also be simpler to complain with a single
> error message like "database and template list lengths do not match".
>

next step

Regards

Pavel


> --
> Michael
>
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d4d00d9c66..ef0542ad0c 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -68,6 +68,7 @@ const char *pretty_diff_opts = "-w -C3";
 
 /* options settable from command line */
 _stringlist *dblist = NULL;
+_stringlist *templatelist = NULL;
 bool		debug = false;
 char	   *inputdir = ".";
 char	   *outputdir = ".";
@@ -1907,7 +1908,7 @@ drop_database_if_exists(const char *dbname)
 }
 
 static void
-create_database(const char *dbname)
+create_database(const char *dbname, const char *template)
 {
 	_stringlist *sl;
 
@@ -1917,10 +1918,12 @@ create_database(const char *dbname)
 	 */
 	header(_("creating database \"%s\""), dbname);
 	if (encoding)
-		psql_command("postgres", "CREATE DATABASE \"%s\" TEMPLATE=template0 ENCODING='%s'%s", dbname, encoding,
+		psql_command("postgres", "CREATE DATABASE \"%s\" TEMPLATE=\"%s\" ENCODING='%s'%s",
+	 dbname, template, encoding,
 	 (nolocale) ? " LC_COLLATE='C' LC_CTYPE='C'" : "");
 	else
-		psql_command("postgres", "CREATE DATABASE \"%s\" TEMPLATE=template0%s", dbname,
+		psql_command("postgres", "CREATE DATABASE \"%s\" TEMPLATE=\"%s\"%s",
+	 dbname, template,
 	 (nolocale) ? " LC_COLLATE='C' LC_CTYPE='C'" : "");
 	psql_command(dbname,
  "ALTER DATABASE \"%s\" SET lc_messages TO 'C';"
@@ -1995,6 +1998,7 @@ help(void)
 	printf(_("  --outputdir=DIR   place output files in DIR (default \".\")\n"));
 	printf(_("  --schedule=FILE   use test ordering schedule from FILE\n"));
 	printf(_("(can be used multiple times to concatenate)\n"));
+	printf(_("  --template=DB use template DB (default \"template0\")\n"));
 	printf(_("  --temp-instance=DIR   create a temporary instance in DIR\n"));
 	printf(_("  --use-existinguse an existing installation\n"));
 	printf(_("\n"));
@@ -2041,10 +2045,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		{"launcher", required_argument, NULL, 21},
 		{"load-extension", required_argument, NULL, 22},
 		{"config-auth", required_argument, NULL, 24},
+		{"template", required_argument, NULL, 25},
 		{NULL, 0, NULL, 0}
 	};
 
 	_stringlist *sl;
+	_stringlist *tl;
 	int			c;
 	int			i;
 	int			option_index;
@@ -2154,6 +2160,16 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 			case 24:
 config_auth_datadir = pg_strdup(optarg);
 break;
+			case 25:
+
+/*
+ * If a default template was specified, we need to remove it
+ * before we add the specified one.
+ */
+free_stringlist(&templatelist);
+split_to_stringlist(optarg, ",", &templatelist);
+break;
+
 			default:
 /* getopt_long already emitted a complaint */
 fprintf(stderr, _("\nTry \"%s -h\" for more information.\n"),
@@ -2205,6 +2221,18 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 	unlimit_core_size();
 #endif
 
+	/* The template list, if given, must match the length of the db list */
+	if (templatelist != NULL)
+	{
+		for (sl = dblist, tl = templatelist; sl && tl; sl = sl->next, tl = tl->next);
+		if (sl || tl)
+		{
+			fprintf(stderr, _("%s: database and template list lengths do not match\n"),
+	progname);
+			exit(2);
+		}
+	}
+
 	if (temp_instance)
 	{
 		FILE	   *pg_conf;
@@ -2454,8 +2482,17 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 	 */
 	if (!use_existing)
 	{
-		for (sl = dblist; sl; sl = sl->next)
-			create_database(sl->str);
+		if (templatelist != NULL)
+		{
+			for (sl = dblist, tl = templatelist; sl; sl = sl->next, tl = tl->next)
+create_database(sl->str, tl->str);
+		}
+		else
+		{
+			for (sl = dblist; sl; sl = sl->next)
+create_database(sl->str, "template0");
+		}
+
 		for (sl = extraroles; sl; sl = sl->next)
 			create_role(sl->str, dblist);
 	}
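
As an illustrative sketch (not taken from the patch or its documentation), the
new option is meant to be used together with --dbname, one template per
database; the names below are made up:

    pg_regress --dbname=regression1,regression2 \
               --template=mytemplate1,mytemplate2 \
               --schedule=parallel_schedule

When --template is omitted, every database is created from template0 as before.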

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-13 Thread Alvaro Herrera
Robert Haas wrote:

> My big objection to removing these views is that it will break pgAdmin
> 3, which uses all three of these views.  I understand that the pgAdmin
> community is now moving away from pgAdmin 3 and toward pgAdmin 4, but
> I bet that pgAdmin 3 still has significant usage and will continue to
> have significant usage for at least a year or three.  When it's
> thoroughly dead, then I think we can go ahead and do this, unless
> there are other really important tools still depending on those views,
> but it's only been 3 months since the final pgAdmin 3 release.

Well, we can remove them from PG10 and pgAdmin3 (and others) be adjusted
to use the new ones, conditionally on server version.  Surely pgAdmin3
is going to receive further updates, given that it's still widely used?

> IMHO, these views aren't costing us much.  It'd be nice to get rid of
> them eventually but a view definition doesn't really need much
> maintenance.

Maybe not, but the fact that they convey misleading information is bad.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-13 Thread Josh Berkus
On 02/13/2017 11:00 AM, Robert Haas wrote:
> My big objection to removing these views is that it will break pgAdmin
> 3, which uses all three of these views.  I understand that the pgAdmin
> community is now moving away from pgAdmin 3 and toward pgAdmin 4, but
> I bet that pgAdmin 3 still has significant usage and will continue to
> have significant usage for at least a year or three.  When it's
> thoroughly dead, then I think we can go ahead and do this, unless
> there are other really important tools still depending on those views,
> but it's only been 3 months since the final pgAdmin 3 release.

How long would you suggest is appropriate?  Postgres 11? 12?  Let's set
a target date; that way we can communicate it more than once.

-- 
Josh Berkus
Containers & Databases Oh My!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Corey Huinker
>
> I wasn't too happy with the extra param, either. Having moved the guts to
>> conditional.c, I'm happy with that change and can move the stack head back
>> to scan_state with a clear conscience.
>>
>
>
That effort was creating as many headaches as it solved. Rolled back v12
except for doc paragraph, added more descriptive tap test names. Enjoy.
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ae58708..e86b66b 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2035,6 +2035,79 @@ hello 10
 
   
 
+  
+\if expr
+\elif expr
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks, like
+this:
+
+
+SELECT
+EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer,
+EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee
+\gset
+-- set ON_ERROR_STOP in case the variables are not valid boolean expressions
+\set ON_ERROR_STOP on
+\if :is_customer
+SELECT * FROM customer WHERE customer_id = 123;
+\elif :is_employee
+\echo 'is not a customer but is an employee'
+SELECT * FROM employee WHERE employee_id = 456;
+\else
+\if yes
+\echo 'not a customer or employee'
+\else
+\echo 'this should never print'
+\endif
+\endif
+
+
+Conditional blocks must begin with a \if and end
+with an \endif, and the pairs must be found in
+the same source file. If an EOF is reached on the main file or an
+\include-ed file before all local
+\if-\endif are matched, then
+psql will raise an error.
+
+
+A conditional block can have any number of 
+\elif clauses, which may optionally be followed by a
+single \else clause.
+
+
+The \if and \elif commands
+read the rest of the line and evaluate it as a boolean expression.
+Currently, expressions are limited to a single unquoted string
+which are evaluated like other on/off options, so the valid values
+are any unambiguous case insensitive matches for one of:
+true, false, 1,
+0, on, off,
+yes, no.  So 
+t, T, and tR
+will all match true.
+
+
+Expressions that do not properly evaluate to true or false will
+generate an error and cause the \if or
+\elif command to fail.  Because that behavior may
+change branching context in undesirable ways (executing code which
+was intended to be skipped, causing \elif,
+\else, and \endif commands to
+pair with the wrong \if, etc), it is
+recommended that scripts which use conditionals also set
+ON_ERROR_STOP.
+
+Lines within false branches are not evaluated in any way: queries are
+not sent to the server, non-conditional commands are not evaluated but
+bluntly ignored, nested if-expressions in such branches are also not
+evaluated but are tallied to check for proper nesting.
+
+
+  
 
   
 \ir or \include_relative 
filename
diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index c53733f..7a418c6 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -61,8 +61,16 @@ uninstall:
 
 clean distclean:
rm -f psql$(X) $(OBJS) lex.backup
+   rm -rf tmp_check
 
 # files removed here are supposed to be in the distribution tarball,
 # so do not clean them in the clean/distclean rules
 maintainer-clean: distclean
rm -f sql_help.h sql_help.c psqlscanslash.c
+
+
+check:
+   $(prove_check)
+
+installcheck:
+   $(prove_installcheck)
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index f17f610..390bccd 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -49,6 +49,7 @@
 #include "psqlscanslash.h"
 #include "settings.h"
 #include "variables.h"
+#include "fe_utils/psqlscan_int.h"
 
 /*
  * Editable database object types.
@@ -132,7 +133,7 @@ HandleSlashCmds(PsqlScanState scan_state,
status = PSQL_CMD_ERROR;
}
 
-   if (status != PSQL_CMD_ERROR)
+   if (status != PSQL_CMD_ERROR && pset.active_branch)
{
/* eat any remaining arguments after a valid command */
/* note we suppress evaluation of backticks here */
@@ -194,6 +195,30 @@ read_connect_arg(PsqlScanState scan_state)
return result;
 }
 
+/*
+ * Read and interpret argument as a boolean expression.
+ * Return true if a boolean value was successfully parsed.
+ */
+static bool
+read_boolean_expression(PsqlScanState scan_state, char *action,
+   bool *result)
+{
+   char*value = psql_scan_slash_option(scan_state,
+   OT_NORMAL, NULL, false);
+   

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-13 Thread Tomas Vondra

On 02/13/2017 03:16 PM, Bernd Helmle wrote:

Am Samstag, den 11.02.2017, 15:42 +0300 schrieb Alexander Korotkov:

Thus, I see reasons why in your tests absolute results are lower than
in my
previous tests.
1.  You use 28 physical cores while I was using 32 physical cores.
2.  You run tests in PowerVM while I was running test on bare metal.
PowerVM could have some overhead.
3.  I guess you run pgbench on the same machine.  While in my tests
pgbench
was running on another node of IBM E880.



Yeah, pgbench was running locally. Maybe we can get some resources to
run them remotely. Interesting side note: If you run a second postgres
instance with the same pgbench in parallel, you get nearly the same
transaction throughput as a single instance.

Short side note:

If you run two Postgres instances concurrently with the same pgbench
parameters, you get nearly the same transaction throughput for both
instances each as when running against a single instance, e.g.



That strongly suggests you're hitting some kind of lock. It'd be good to 
know which one. I see you're doing "pgbench -S" which also updates 
branches and other tiny tables - it's possible the sessions are trying 
to update the same row in those tiny tables. You're running with scale 
1000, but with 100 it's still possible thanks to the birthday paradox.


Otherwise it might be interesting to look at sampling wait events, which 
might tell us more about the locks.
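
A minimal sketch of such sampling on 9.6+ (just repeat the query, or drive it
from a script; the columns are the standard pg_stat_activity ones):

SELECT wait_event_type, wait_event, count(*)
  FROM pg_stat_activity
 WHERE wait_event IS NOT NULL
 GROUP BY 1, 2
 ORDER BY 3 DESC;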


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-13 Thread Robert Haas
On Thu, Feb 9, 2017 at 8:54 PM, Stephen Frost  wrote:
> Note that these views have not been consistently maintained and have
> ended up including some role attributes from recent versions

That's not a bug.  According to the documentation, these views exist
for compatibility with PostgreSQL versions before 8.1, so there's no
need to update them with newer fields.  Clients who are expecting to
talk with a pre-8.1 PostgreSQL won't expect those fields to be present
anyway.

My big objection to removing these views is that it will break pgAdmin
3, which uses all three of these views.  I understand that the pgAdmin
community is now moving away from pgAdmin 3 and toward pgAdmin 4, but
I bet that pgAdmin 3 still has significant usage and will continue to
have significant usage for at least a year or three.  When it's
thoroughly dead, then I think we can go ahead and do this, unless
there are other really important tools still depending on those views,
but it's only been 3 months since the final pgAdmin 3 release.

IMHO, these views aren't costing us much.  It'd be nice to get rid of
them eventually but a view definition doesn't really need much
maintenance.  (A contrib module doesn't either, but more than a view
definition.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: About CMake v2

2017-02-13 Thread Robert Haas
On Mon, Feb 13, 2017 at 10:52 AM, Peter Eisentraut
 wrote:
> People have all kinds of expectations on how the build system behaves.
> It's not just ./configure; make; make install.  All kinds of niche and
> edge cases have evolved over the years.  Variables you can set,
> overrides, targets in some subdirectories, building in subdirectories
> and having it build another subdirectory first, testing this way and
> testing that way, testing with a custom locale or a custom database
> name, building before testing or not, testing without reinstalling, and
> so on and so on.  You'd have to make sure at least some of that
> continues to work or be able to explain it away.  And most of it is not
> really documented.

...which is another argument for just not changing anything.  I mean,
again, the current Windows build system is unbeautiful and
occasionally takes some work, but if we switch to cmake, that has to
get enough better to make up for all of the things that get worse.
And it's far from obvious that this will be so, especially when you
consider the fact that many people have detailed knowledge of how to
tweak the current system to do what they want.  As you point out here,
a lot of that stuff may stop working if we replace the system
wholesale.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Corey Huinker
On Mon, Feb 13, 2017 at 11:26 AM, Corey Huinker 
wrote:

> ISTM that it is kind of a regression, because logically this is about the
>> scan state so it should be in the corresponding structure, and having two
>> structures to pass the scan state is not an improvement...
>
>
> I wasn't too happy with the extra param, either. Having moved the guts to
> conditional.c, I'm happy with that change and can move the stack head back
> to scan_state with a clear conscience.
>

So moving the conditional stack back into PsqlScanState has some side
effects: conditional.[ch] have to move to the fe_utils/ dirs, and now
pgbench, which does not use conditionals, would have to link to them. Is
that a small price to pay for modularity and easier-to-find code? Or should
I just tuck it back into psqlscan_int.[ch]?


[HACKERS] Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p

2017-02-13 Thread Fujii Masao
On Tue, Feb 14, 2017 at 2:06 AM, Robert Haas  wrote:
> On Mon, Feb 13, 2017 at 11:43 AM, Fujii Masao  wrote:
>> On Fri, Feb 10, 2017 at 5:36 AM, Robert Haas  wrote:
>>> Remove all references to "xlog" from SQL-callable functions in pg_proc.
>>>
>>> Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog
>>> directory to pg_wal.  To make things consistent, and because "xlog" is
>>> terrible terminology for either "transaction log" or "write-ahead log"
>>> rename all SQL-callable functions that contain "xlog" in the name to
>>> instead contain "wal".  (Note that this may pose an upgrade hazard for
>>> some users.)
>>
>> There are still some mentions of "xlog" in the doc. Maybe we should replace
>> them with "wal" as the attached patch does.
>
> +1.  Patch looks good to me.

Thanks! Pushed.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY IN/BOTH vs. extended query mode

2017-02-13 Thread Robert Haas
On Mon, Jan 23, 2017 at 9:12 PM, Robert Haas  wrote:
> According to the documentation for COPY IN mode, "If the COPY command
> was issued via an extended-query message, the backend will now discard
> frontend messages until a Sync message is received, then it will issue
> ReadyForQuery and return to normal processing."  I added a similar
> note to the documentation for COPY BOTH mode in
> 91fa8532f4053468acc08534a6aac516ccde47b7, and the documentation
> accurately describes the behavior of the server.  However, this seems
> to make fully correct error handling for clients using libpq almost
> impossible, because PQsendQueryGuts() sends
> Parse-Bind-Describe-Execute-Sync in one shot without regard to whether
> the command that was just sent invoked COPY mode (cf. the note in
> CopyGetData about why we ignore Flush and Sync in that function).
>
> So imagine that the client uses libpq to send (via the extended query
> protocol) a COPY IN command (or some hypothetical command that starts
> COPY BOTH mode to begin).  If the server throws an error before the
> Sync message is consumed, it will bounce back to PostgresMain which
> will set doing_extended_query_message = true after which it will
> consume messages, find the Sync, reset that flag, and send
> ReadyForQuery.  On the other hand, if the server enters CopyBoth mode,
> consumes the Sync message in CopyGetData (or a similar function), and
> *then* throws an ERROR, the server will wait for a second Sync message
> from the client before issuing ReadyForQuery.  There is no sensible
> way of coping with this problem in libpq, because there is no way for
> the client to know which part of the server code consumed the Sync
> message that it already sent.  In short, from the client's point of
> view, if it enters COPY IN or COPY BOTH mode via the extend query
> protocol, and an error occurs on the server, the server MAY OR MAY NOT
> expect a further Sync message before issuing ReadyForQuery, and the
> client has no way of knowing -- except maybe waiting for a while to
> see what happens.
>
> It does not appear to me that there is any good solution to this
> problem.  Fixing it on the server side would require a wire protocol
> change - e.g. one kind of Sync message that is used in a
> Parse-Bind-Describe-Execute-Sync sequence that only terminates
> non-COPY commands and another kind that is used to signal the end even
> of COPY.  Fixing it on the client side would require all clients to
> know prior to initiating an extended-query-protocol sequence whether
> or not the command was going to initiate COPY, which is an awful API
> even if didn't constitute an impossible-to-contemplate backward
> compatibility break.  Perhaps we will have to be content to document
> the fact that this part of the protocol is depressingly broken...
>
> ...unless of course somebody can see something that I'm missing here
> and the situation isn't as bad as it currently appears to me to be.

Anybody have any thoughts on this?
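
To make the failure mode concrete, here is a minimal libpq sketch of the
sequence described above (illustrative only; the table name and data are made
up, and error handling is omitted):

#include <libpq-fe.h>

/* COPY IN initiated via the extended query protocol.  PQsendQueryParams()
 * emits Parse/Bind/Describe/Execute/Sync in one shot. */
static void
copy_via_extended_protocol(PGconn *conn)
{
    PGresult   *res;

    PQsendQueryParams(conn, "COPY t FROM STDIN", 0, NULL, NULL, NULL, NULL, 0);
    res = PQgetResult(conn);
    if (PQresultStatus(res) == PGRES_COPY_IN)
    {
        PQclear(res);
        PQputCopyData(conn, "1\tfoo\n", 6);
        PQputCopyEnd(conn, NULL);
        /* If the server errored out, it may or may not still be waiting for
         * a further Sync before it sends ReadyForQuery, depending on whether
         * CopyGetData() already consumed the Sync -- and the client cannot
         * tell which case it is in. */
    }
    else
        PQclear(res);

    while ((res = PQgetResult(conn)) != NULL)
        PQclear(res);
}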

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p

2017-02-13 Thread Robert Haas
On Mon, Feb 13, 2017 at 11:43 AM, Fujii Masao  wrote:
> On Fri, Feb 10, 2017 at 5:36 AM, Robert Haas  wrote:
>> Remove all references to "xlog" from SQL-callable functions in pg_proc.
>>
>> Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog
>> directory to pg_wal.  To make things consistent, and because "xlog" is
>> terrible terminology for either "transaction log" or "write-ahead log"
>> rename all SQL-callable functions that contain "xlog" in the name to
>> instead contain "wal".  (Note that this may pose an upgrade hazard for
>> some users.)
>
> There are still some mentions of "xlog" in the doc. Maybe we should replace
> them with "wal" as the attached patch does.

+1.  Patch looks good to me.
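
(For readers following along, two illustrative examples of the renaming the
commit message describes -- the new names, with the 9.6 spellings in the
comments; this is not an exhaustive list:)

SELECT pg_switch_wal();          -- formerly pg_switch_xlog()
SELECT pg_wal_replay_pause();    -- formerly pg_xlog_replay_pause()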

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_autovacuum_min_duration doesn't log VACUUMs

2017-02-13 Thread Robert Haas
On Sun, Feb 12, 2017 at 9:20 PM, Jim Nasby  wrote:
> On 2/10/17 2:33 PM, Robert Haas wrote:
>> That having been said, I think it could certainly be useful to have
>> more control over what DDL gets logged in foreground processes.
>
> FWIW, this is a significant problem outside of DDL. Once you're past 1-2
> levels of nesting SET client_min_messages = DEBUG becomes completely
> useless.
>
> I think the ability to filter logging based on context would be very
> valuable. AFAIK you could actually do that for manual logging with existing
> plpgsql support, but obviously that won't help for anything else.

Well, that's moving the goalposts a lot further and in an unrelated
direction.  I don't think that it's a good idea to change the
semantics of log_autovacuum_min_duration in the way Simon is proposing
for the reasons I noted, but I think that an acceptable patch could be
100 lines of pretty straightforward code and documentation, like a new
GUC that controls this output for the vac-non-autovac case.
Fine-grained control over which commands get logged in general is
harder, but it's still only a moderately complex patch (I think).
Filtering based on the context in which the logging is happening
sounds extremely complicated; I'm not sure what sort of design for
that would even be reasonable and it seems like there might be
innumerable requests for additional frammishes culminating in some
sort of mini-language and getting committed right around the time
Zefram Cochrane makes his inaugural flight.
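
A sketch of what such a knob might look like in postgresql.conf -- the name
and semantics here are purely hypothetical, nothing has been proposed or
agreed:

    # hypothetical GUC mirroring log_autovacuum_min_duration for manual VACUUMs
    log_vacuum_min_duration = 250   # ms; -1 disables, 0 logs every VACUUM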

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-13 Thread Fujii Masao
On Fri, Feb 10, 2017 at 1:52 PM, Michael Paquier
 wrote:
> On Tue, Feb 7, 2017 at 6:35 AM, Peter Eisentraut
>  wrote:
>> On 2/6/17 10:54 AM, Fujii Masao wrote:
>>> Yes, that's an option. And, if we add dbid to pg_stat_subscription,
>>> I'm tempted to add all the pg_subscription's columns except subconninfo
>>> into pg_stat_subscription. Since subconninfo may contain security-sensitive
>>> information, it should be excluded. But it seems useful to expose other
>>> columns. Then, if we expose all the columns except subconninfo, maybe
>>> it's better to just revoke subconninfo column on pg_subscription instead of
>>> all columns. Thought?
>>
>> I think previous practice is to provide a view such as pg_subscriptions
>> that contains all the nonprivileged information.
>
> OK, I think that I see the point you are coming at:
> pg_stat_get_subscription (or stat tables) should not be used for
> psql's tab completion. So gathering all things discussed, we have:
> - No tab completion for publications.

No, the tab-completions for ALTER/DROP PUBLICATION should show the local
publications because those commands drop and alter the local ones. OTOH,
CREATE/ALTER SUBSCRIPTION ... PUBLICATION should show nothing because
the remote publications in the publisher side should be specified there.

> - Fix the meta-commands.

Yes.

> - Addition of a view pg_subscriptions with all the non-sensitive data.
> (- Or really extend pg_stat_subscriptions with the database ID and use
> it for tab completion?)

Probably I failed to get Peter's point... Anyway, IMO we can expose all the
columns except the sensitive information (i.e., the subconninfo field)
in pg_subscription even to non-superusers. Then we can use pg_subscription
for the tab-completion of ALTER/DROP SUBSCRIPTION.
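
As a rough sketch of the idea (not actual catalog documentation), the
non-sensitive part that could be made readable -- and that tab completion
could then query -- would be something like:

SELECT subdbid, subname, subowner, subenabled, subslotname, subpublications
  FROM pg_catalog.pg_subscription;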

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-13 Thread Robert Haas
On Fri, Feb 10, 2017 at 8:39 PM, Peter Geoghegan  wrote:
> On Thu, Feb 9, 2017 at 2:32 PM, Andres Freund  wrote:
>> Except that the proposed names aren't remotely like that... ;).
>
> Revision attached -- V5. We now REVOKE ALL on both functions, as
> Robert suggested, instead of the previous approach of having a
> hard-coded superuser check with enforcement.
>
>> And I proposed documenting named parameters, and
>> check_btree(performing_check_requiring_exclusive_locks => true) is just
>> about as expressive.
>
> I have not done this, nor have I renamed the functions. I still think
> that this is something that we can fix by adding a boolean argument to
> each function in the future, or something along those lines. I
> *really* hate the idea of having one function with non-obvious,
> variable requirements on locking, with locking implications that are
> not knowable when we PREPARE an SQL statement calling the function. It
> also removes a useful way of have superusers discriminate against the
> stronger locking variant bt_index_parent_check() by not granting
> execute on it (as an anti-footgun measure).

I think Andres is more or less correct that
"performing_check_requiring_exclusive_locks => true" is just about as
expressive as calling a different function, but I think that your
point that the superuser might want to grant access to one function
but not the other is a good one.  On the other hand, I think Andres
has a concern that we might have more modes in the future and we don't
want to end up with 2^n entrypoints.  That also seems valid.  Hmm.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-13 Thread Ashutosh Sharma
Hi All,

I too have performed benchmarking of this patch on a large machine
(with 128 CPU(s), 520GB RAM, intel x86-64 architecture) and would like
to share my observations for the same (Please note that, as I had to
reverify readings on few client counts, it did take some time for me
to share these test-results.)

Case1: Data fits in shared buffer, Read Only workload:
---
For data fits in shared buffer, I have taken readings at 300 SF. The
result sheet 'results-readonly-300-1000-SF' containing the median of 3
runs is attached with this mail. In this case, I could see very good
performance improvement with the patch basically at high client counts
(156 - 256).

Case2: Data doesn't fit in shared buffer, Read Only workload:
--
For data doesn't fit in shared buffer, I have taken readings at 1000
SF. The result sheet 'results-readonly-300-1000-SF' is attached with
this mail. In this case, the performance improvement is not as large as in
Case 1; in fact it varies only in the range of 2-7%, but the good thing
is that there is no regression.

Case3: Data fits in shared buffer, Read-write workload:
-
In this case, I could see that the tps on head and patch are very
close to each other, with a small variation of (+/-)3-4% which I assume
is run-to-run variation. PFA result sheet
'results-readwrite-300-1000-SF' containing the test-results.

Case4: Data doesn't fit in shared buffer, Read-write workload:

In this case as well, tps on head and patch are very close to each
other, with a small variation of 1-2%, which again is run-to-run
variation. PFA result sheet 'results-readwrite-300-1000-SF' containing
the test-results.

Please note that details on the non-default GUC params and pgbench are
all provided in the result sheets attached with this mail. Also, I
have not used pg_prewarm in my test script. Thank you.
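
(For reference, the runs were of the usual pgbench form sketched below; the
exact client counts and settings are in the attached sheets, so treat this
only as an illustration:)

    pgbench -i -s 300 postgres                   # or -s 1000 for the larger set
    pgbench -S -c 256 -j 256 -T 300 postgres     # read-only
    pgbench -c 256 -j 256 -T 300 postgres        # read-write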

On Mon, Feb 13, 2017 at 9:43 PM, Bernd Helmle  wrote:
> Am Montag, den 13.02.2017, 16:55 +0300 schrieb Alexander Korotkov:
>>
>> Thank you for testing.
>>
>> Yes, influence seems to be low.  But nevertheless it's important to
>> insure
>> that there is no regression here.
>> Despite pg_prewarm'ing and running tests 300s, there is still
>> significant
>> variation.
>> For instance, with clients count = 80:
>>  * pgxact-result-2.txt – 474704
>>  * pgxact-results.txt – 574844
>
>> Could some background processes influence the tests?  Or could it be
>> another virtual machine?
>> Also, I wonder why I can't see this variation on the graphs.
>> Another issue with graphs is that we can't see details of read and
>> write
>
> Whoops, good catch. I've mistakenly copied the wrong y-axis for these
> results in the gnuplot script, shame on me. New plots attached.
>
> You're right, the 2nd run with the pgxact alignment patch is notable.
> I've realized that there was a pgbouncer instance running from a
> previous test, but not sure if that could explain the difference.
>
>> TPS variation on the same scale, because write TPS values are too
>> low.  I
>> think you should draw write benchmark on the separate graph.
>>
>
> The Linux LPAR is the only one used atm. We got some more time for
> Linux now and i'm going to prepare Tomas' script to run. Not sure i can
> get to it today, though.
>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


results-readonly-300-1000-SF.xlsx
Description: MS-Excel 2007 spreadsheet


results-readwrite-300-1000-SF.xlsx
Description: MS-Excel 2007 spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-13 Thread Robert Haas
On Thu, Feb 9, 2017 at 5:32 PM, Andres Freund  wrote:
>> > Again, some parts of the code doing something bad isn't a good argument
>> > for doing it again. Releasing locks early is a bad pattern, because backend
>> > code isn't generally safe against it, we have special code-paths for
>> > catalog tables to support it for those.
>>
>> I don't agree with that.  The reason we keep relation locks until the
>> end of the transaction is to avoid surprising application code, not
>> because the system can't tolerate it.  But here it's more likely that
>> retaining the lock will surprise the user.
>
> It's both.  A bunch of code paths rely on early release ony happening on
> catalogs. E.g., and that's just the first thing that comes to mind,
> cache invalidations which really only are guaranteed to be delivered and
> visible for other cache accesses, if the lock is only released after
> the end of the transaction.  Object accesses like relation_open() accept
> invalidations *after* acquiring the lock. Combined with lock releases
> only happening *after* transaction commits that guarantees an up2date
> view of the cache; but if others release the lock earlier and have cache
> invalidations, that doesn't work anymore.  Now you can argue that it's
> possibly harmless here because there's presumably no way this can ever
> trigger cache invals - and you're probably right - but this isn't the
> only place with such assumption and you'd definitely have to argue *why*
> it's right.

I agree that code which issues cache invalidations needs to hold a
blocking lock until commit IF it's critical for other backends to see
the update.  That's not always the case; for example, if an ALTER
TABLE statement only changes the fillfactor the worst thing that
happens if some other backend fails to see the invalidations is that
some transactions continue to use the old value for a while.  That's
why we now allow fillfactor and some other things using only
ShareUpdateExclusiveLock.  That's not quite the same thing but the
effect is the same: we issue invalidations that other backends aren't
guaranteed to see in a timely fashion, and it works OK.  Yeah, some
other session might not see the change for a while, but we don't view
that as a big problem.

But I think in this case that argument is basically a straw man.  A
utility that's called amcheck is obviously not issuing any cache
invalidations.  It's probably not even changing anything at all; if it
is, maybe it should be called amrepair.  So the fact that CLUSTER has
to hold AEL until commit time really doesn't have anything to do with
whether amcheck needs to hold AES until commit time.

I don't really understand your demand for "an argument why it's
right".  My argument is simple: I can't think of a single thing that
it would break, and I don't believe there is any such thing.
Furthermore, as Peter points out, we do this kind of thing all the
time in other places and it indeed does not break things.  I think
you're trying to invent a coding rule that doesn't exist, but I can't
prove a negative.  Your argument so far is that "yeah, well, maybe
inval doesn't emit sinval traffic (duh?) but it might do something
else that breaks, prove to me that there is no such thing".  But
that's just like asking me to prove that it's impossible to exceed the
speed of light.  On information and belief, nobody currently knows how
to do that and there is good scientific evidence that it cannot be
done, but there's no way to prove that someone won't in the future
discover a refinement in the physical laws of the universe as we
understand them today which sheds a different light on the situation.

From my point of view, it's likely to be common to want to run amcheck
in series on a bunch of indexes, like by writing a SELECT query
against pg_class or pg_index and passing the OIDs to amcheck.  With
the retain-locks design, that query accumulates locks on a large
number of objects, possibly blowing out the lock table or blocking
DDL.  I think that readily-forseeable problem ought to trump concerns
about purely hypothetical issues caused by releasing locks early.  If
you can come up with an actual issue with releasing locks early that
applies to this particular case, then that's different, of course.
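
A sketch of the kind of bulk usage described above (illustrative only, using
amcheck's bt_index_check()):

SELECT c.relname, bt_index_check(c.oid)
  FROM pg_index i
  JOIN pg_class c ON c.oid = i.indexrelid
  JOIN pg_am am ON am.oid = c.relam
 WHERE am.amname = 'btree';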

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sum aggregate calculation for single precsion real

2017-02-13 Thread Konstantin Knizhnik



On 13.02.2017 19:20, Tom Lane wrote:

Konstantin Knizhnik  writes:

I wonder why SUM aggregate is calculated for real (float4) type using
floating point accumulator?

If you can't deal with the vagaries of floating-point arithmetic, you
shouldn't be storing your data in float format.  Use numeric.


4-byte floats are widely used, for example in trading applications, simply
because float4 is half the size of double and the range of the stored data is
relatively small (not many significant digits are needed). At the same time
the volume of stored data is very large, and switching from float4 to float8
would almost double it: it requires twice as much storage and almost doubles
query execution time.

So this is not an acceptable answer.




Are there are reasons of using float4pl function for SUM aggregate instead of 
float4_accum?

The latter is probably a good two orders of magnitude slower, and it
wouldn't really do much to solve the inherent accuracy problems of
adding float4 values that have a wide dynamic range.


That is not true - please compare the execution times of these two queries:

postgres=# select sum(l_quantity)  from lineitem where l_shipdate <= 
'1998-12-01';

 sum
-
 1.52688e+09
(1 row)

Time: 2858.852 ms
postgres=# select sum(l_quantity+0.0)  from lineitem where l_shipdate <= 
'1998-12-01';

sum

 1529738036
(1 row)

Time: 3174.529 ms


It looks like the aggregate calculation itself is no longer a bottleneck
in Postgres now, compared with the tuple-deforming cost.





The expectation for SUM(float4) is that you want speed and are
prepared to cope with the consequences.  It's easy enough to cast your
input to float8 if you want a wider accumulator, or to numeric if
you'd like more stable (not necessarily more accurate :-() results.
I do not think it's the database's job to make those choices for you.


From my point of view this is a strange and wrong expectation.
I am choosing the "float4" type for a column just because it is enough to
represent the range of data I have and I need to minimize the record size.
But when I calculate a sum, I expect to receive a more or less precise
result. Certainly I realize that even when using double it is possible to
lose precision during the calculation, and the result may depend on the
summation order (if we add very small and very large values). But in real
use cases (for example in trading data) such a large difference in attribute
values is very rare. If you have, for example, stock prices, then it is very
unlikely that one company has the value 0.01 and another 1000.0.
At least in the TPC-H example (which admittedly deals with dummy generated
data), the double type produces an almost precise result.




regards, tom lane


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



[HACKERS] Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p

2017-02-13 Thread Fujii Masao
On Fri, Feb 10, 2017 at 5:36 AM, Robert Haas  wrote:
> Remove all references to "xlog" from SQL-callable functions in pg_proc.
>
> Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog
> directory to pg_wal.  To make things consistent, and because "xlog" is
> terrible terminology for either "transaction log" or "write-ahead log"
> rename all SQL-callable functions that contain "xlog" in the name to
> instead contain "wal".  (Note that this may pose an upgrade hazard for
> some users.)

There are still some mentions of "xlog" in the doc. Maybe we should replace
them with "wal" as the attached patch does.

Regards,

-- 
Fujii Masao


xlog2wal.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements

2017-02-13 Thread Andrew Dunstan


On 01/13/2017 08:36 AM, Anastasia Lubennikova wrote:
> I implemented IF NOT EXISTS option for CREATE SERVER and CREATE USER
> MAPPING statements
> for one of our customers.
> I think other users can also find it useful for scripting and
> automated tasks.
> The patches themselves are small and simple. Documentation is updated
> as well.
>



This looks good and useful. Please add some regression tests.
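
For illustration, the syntax the patches add looks roughly like this (the
server and FDW names are made up):

CREATE SERVER IF NOT EXISTS myserver FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER SERVER myserver;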

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-13 Thread Robert Haas
On Thu, Feb 9, 2017 at 8:15 PM, Peter Geoghegan  wrote:
> BTW, aren't there cases where a PARALLEL SAFE function that needs to
> acquire locks on some arbitrary relation not referenced in the query
> can get locks on its own, which may only last as long as the parallel
> worker's lifetime? This could happen *despite* the fact that the
> author of the function may have imagined that callers could not
> release relation level locks early (i.e., by not releasing them
> directly, and so correctly following this "badly documented
> assumption").

Yes.

> It seems like the existence of PARALLEL RESTRICTED is primarily
> motivated by making stuff that definitely needs the locks to last
> until xact end being sure that that happens -- the docs say so. This
> seems wholly inconsistent with the idea that you're not supposed to
> let that happen under any circumstances. I find all this highly
> confusing. Have I missed some further subtlety that applies with
> PARALLEL RESTRICTED?

That's by no means the only thing that could cause you to mark
something PARALLEL RESTRICTED.  I think you might have missed this bit
of the documentation, a few paragraphs up:

Similarly, functions must be marked PARALLEL
RESTRICTED if they access temporary tables, client connection state,
cursors, prepared statements, or miscellaneous backend-local state which
the system cannot synchronize across workers. For example,
setseed and random are parallel restricted for
this last reason.

That stuff is the main motivation.  The early lock release stuff could
come up for user-written functions, but the things listed in the
paragraph I've quoted here come up for plenty of built-in functions.
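
For example, a trivial sketch of such a function (not from the docs): a SQL
wrapper around random() must be parallel restricted because it depends on
backend-local PRNG state:

CREATE FUNCTION my_random_pct() RETURNS float8
    LANGUAGE sql PARALLEL RESTRICTED
    AS $$ SELECT random() * 100 $$;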

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Robert Haas
On Sat, Feb 11, 2017 at 2:43 AM, Fabien COELHO  wrote:
>> Ok, so that's not just PROMPT_READY, that's every prompt...which might be
>> ok. ? is a great optional cue, and you're thinking on 2 levels deep, 2nd
>> level always being '.'?
>
> Yep. The idea is to keep it short, but to still have something to say "there
> are more levels" in the stack, hence the one dot. Basically I just
> compressed your 4 level proposal, and added a separator to deal with the
> preceding stuff and suggest the conditional.

I think we should try to make this REALLY simple.  We don't really
want to have everybody have to change their PROMPT1 and PROMPT2
strings for this one feature. How about just introducing a new value
for %R?  The documentation currently says:

In prompt 1 normally =, but ^ if in single-line mode, or ! if the
session is disconnected from the database (which can happen if
\connect fails).

...and suppose we just extend that to add:

, or @ if commands are currently being ignored because of the result
of an \if test.

The latter would include being in the \if section when the conditional
was false as well as being in the \else section when the conditional
was true.  I think that's all you need here: a way to alert users as
to whether commands are being ignored, or not.  Putting details in
about precisely why they are being ignored seems like it's too
complicated; people won't remember how to decode some bizarre series
of glyphs that we output.  Telling them whether their next command is
set to be ignored or executed is good enough; if the answer isn't what
they expect, they can debug their script to figure out what they
screwed up.
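
To make that concrete with the default prompt (a sketch of the proposed
behavior, not of anything psql does today):

    \set PROMPT1 '%/%R%# '
    -- normally the prompt reads:            postgres=#
    -- while commands are being ignored:     postgres@#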

Also, keep in mind that people don't need to know everything from the
current prompt. They can try to debug things by looking back at
previous prompts. They'll understand that \if is going to introduce a
new nesting level and \endif is going to end one, and that \else and
\elseif may change things.  Aside from keeping the code simple so we
can maintain it and the output simple so that users can remember what
it means, I just don't believe that it's really going to be helpful to
convey much detail here.   People aren't going to paste in a gigaton
of commands and then look only at the last line of the output and try
to understand what it's telling them, or if they do that and are
confused, I think nobody will really feel bad about giving them the
advice "scroll up" or "try a simpler test case first".

Further keep in mind that eventually somebody's going to code \while
or \for or something, and then there are going to be even more
possible states here.  Just when you've figured out what tfzffft
means, they'll be the case of a \while loop which is getting skipped
because the condition at the top turned out to be false on the first
iteration, or where (half-joking) we're skipping commands until we
find the label that matches an executed \goto.  Writing maintainable
code includes leaving room open for other people to do stuff we can't
even foresee today, and that means we need not to use up a
disproportionate number of the glyphs that can reasonably be used in a
psql prompt just on this.  This is one small feature out of many that
psql has, and one small hint to the user about whether it's currently
causing commands to be skipped seems sufficient.

All IMHO, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-13 Thread Corey Huinker
>
> ISTM that it is kind of a regression, because logically this is about the
> scan state so it should be in the corresponding structure, and having two
> structures to pass the scan state is not an improvement...


I wasn't too happy with the extra param, either. Having moved the guts to
conditional.c, I'm happy with that change and can move the stack head back
to scan_state with a clear conscience.


> I would suggest to also apply the advice to the example shown, including a
> comment about why the variable is set on.
>

+1

>
> Also, the last element of the tap tests should be distinct: I suggest to
> use 'if syntax error' and 'elif syntax error' in place of 'syntax error'
> for the two first tests.


+1, shouldn't take too long.


Re: [HACKERS] Sum aggregate calculation for single precsion real

2017-02-13 Thread Tom Lane
Konstantin Knizhnik  writes:
> I wonder why SUM aggregate is calculated for real (float4) type using 
> floating point accumulator?

If you can't deal with the vagaries of floating-point arithmetic, you
shouldn't be storing your data in float format.  Use numeric.

> Are there are reasons of using float4pl function for SUM aggregate instead of 
> float4_accum?

The latter is probably a good two orders of magnitude slower, and it
wouldn't really do much to solve the inherent accuracy problems of
adding float4 values that have a wide dynamic range.

The expectation for SUM(float4) is that you want speed and are
prepared to cope with the consequences.  It's easy enough to cast your
input to float8 if you want a wider accumulator, or to numeric if
you'd like more stable (not necessarily more accurate :-() results.
I do not think it's the database's job to make those choices for you.
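
For instance, using the lineitem column from upthread:

SELECT sum(l_quantity::float8)  FROM lineitem;   -- wider float accumulator
SELECT sum(l_quantity::numeric) FROM lineitem;   -- exact accumulation, slower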

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-13 Thread Bernd Helmle
Am Montag, den 13.02.2017, 16:55 +0300 schrieb Alexander Korotkov:
> 
> Thank you for testing.
> 
> Yes, influence seems to be low.  But nevertheless it's important to
> insure
> that there is no regression here.
> Despite pg_prewarm'ing and running tests 300s, there is still
> significant
> variation.
> For instance, with clients count = 80:
>  * pgxact-result-2.txt – 474704
>  * pgxact-results.txt – 574844

> Could some background processes influence the tests?  Or could it be
> another virtual machine?
> Also, I wonder why I can't see this variation on the graphs.
> Another issue with graphs is that we can't see details of read and
> write

Whoops, good catch. I've mistakenly copied the wrong y-axis for these
results in the gnuplot script, shame on me. New plots attached.

You're right, the 2nd run with the pgxact alignment patch is notable.
I've realized that there was a pgbouncer instance running from a
previous test, but not sure if that could explain the difference.

> TPS variation on the same scale, because write TPS values are too
> low.  I
> think you should draw write benchmark on the separate graph.
> 

The Linux LPAR is the only one used atm. We got some more time for
Linux now and I'm going to prepare Tomas' script to run. Not sure I can
get to it today, though.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK

2017-02-13 Thread Fujii Masao
On Mon, Feb 13, 2017 at 4:05 PM, Masahiko Sawada  wrote:
> On Sat, Feb 11, 2017 at 8:18 PM, Petr Jelinek
>  wrote:
>> On 10/02/17 19:55, Masahiko Sawada wrote:
>>> On Thu, Feb 9, 2017 at 12:44 AM, Petr Jelinek
>>>  wrote:
 On 08/02/17 07:40, Masahiko Sawada wrote:
> On Wed, Feb 8, 2017 at 9:01 AM, Michael Paquier
>  wrote:
>> On Wed, Feb 8, 2017 at 1:30 AM, Fujii Masao  
>> wrote:
>>> On Wed, Feb 8, 2017 at 12:26 AM, Petr Jelinek
>>>  wrote:
 For example what happens if apply crashes during the DROP
 SUBSCRIPTION/COMMIT and is not started because the delete from catalog
 is now visible so the subscription is no longer there?
>>>
>>> Another idea is to treat DROP SUBSCRIPTION in the same way as VACUUM, 
>>> i.e.,
>>> make it emit an error if it's executed within user's transaction block.
>>
>> It seems to me that this is exactly Petr's point: using
>> PreventTransactionChain() to prevent things to happen.
>
> Agreed. It's better to prevent to be executed inside user transaction
> block. And I understood there is too many failure scenarios we need to
> handle.
>
>>> Also DROP SUBSCRIPTION should call CommitTransactionCommand() just
>>> after removing the entry from pg_subscription, then connect to the 
>>> publisher
>>> and remove the replication slot.
>>
>> For consistency that may be important.
>
> Agreed.
>
> Attached patch, please give me feedback.
>

 This looks good (and similar to what initial patch had btw). Works fine
 for me as well.

 Remaining issue is, what to do about CREATE SUBSCRIPTION then, there are
 similar failure scenarios there, should we prevent it from running
 inside transaction as well?

>>>
>>> Hmm,  after thought I suspect current discussing approach. For
>>> example, please image the case where CRAETE SUBSCRIPTION creates
>>> subscription successfully but fails to create replication slot for
>>> whatever reason, and then DROP SUBSCRIPTION drops the subscription but
>>> dropping replication slot is failed. In such case, CREAET SUBSCRIPTION
>>> and DROP SUBSCRIPTION return ERROR but the subscription is created and
>>> dropped successfully. I think that this behaviour confuse the user.
>>>
>>> I think we should just prevent calling DROP SUBSCRIPTION in user's
>>> transaction block. Or I guess that it could be better to separate the
>>> starting/stopping logical replication from subscription management.
>>>
>>
>> We need to stop the replication worker(s) in order to be able to drop
>> the slot. There is no such issue with startup of the worker as that one
>> is launched by launcher after the transaction has committed.
>>
>> IMO best option is to just don't allow DROP/CREATE SUBSCRIPTION inside a
>> transaction block and don't do any commits inside of those (so that
>> there are no rollbacks, which solves your initial issue I believe). That
>> way failure to create/drop slot will result in subscription not being
>> created/dropped which is what we want.

On second thought, +1.

> I basically agree this option, but why do we need to change CREATE
> SUBSCRIPTION as well?

Because the window between the creation of the replication slot and the transaction
commit of CREATE SUBSCRIPTION should be short. Otherwise, if any error happens
during that window, the replication slot unexpectedly remains while there is no
corresponding subscription. Of course, even if we prevent CREATE SUBSCRIPTION
from being executed within a user's transaction block, there is still such a
window, but we can reduce the likelihood of that problem.
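
Under that proposal the behavior would be analogous to VACUUM, roughly (a
sketch only; the subscription name is made up):

BEGIN;
DROP SUBSCRIPTION mysub;
-- ERROR:  DROP SUBSCRIPTION cannot run inside a transaction block
ROLLBACK;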

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-02-13 Thread Konstantin Knizhnik

Hi,

I wonder if it is possible to somehow benchmark your clustered index 
implementation.

I tried to create a VCI index for the lineitem table from TPC-H and run the Q6 query.
After index creation Postgres no longer uses a parallel execution plan,
but the speed of the sequential plan is unchanged,
and nothing in the query execution plan indicates that the VCI index is used:


postgres=# explain select
sum(l_extendedprice*l_discount) as revenue
from
lineitem_projection
where
l_shipdate between '1996-01-01' and '1997-01-01'
and l_discount between 0.08 and 0.1
and l_quantity < 24;
                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=608333.85..608333.86 rows=1 width=4)
   ->  Gather  (cost=608333.23..608333.84 rows=6 width=4)
         Workers Planned: 6
         ->  Partial Aggregate  (cost=607333.23..607333.24 rows=1 width=4)
               ->  Parallel Seq Scan on lineitem_projection  (cost=0.00..607024.83 rows=61680 width=8)
                     Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double precision))
(6 rows)

postgres=# select
sum(l_extendedprice*l_discount) as revenue
from
lineitem_projection
where
l_shipdate between '1996-01-01' and '1997-01-01'
and l_discount between 0.08 and 0.1
and l_quantity < 24;
   revenue
-
 6.2e+08
(1 row)

Time: 1171.324 ms (00:01.171)

postgres=# create index vci_idx on lineitem_projection using 
vci(l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus);

CREATE INDEX
Time: 4.705 ms


postgres=# explain select
* from
lineitem_projection
where
l_shipdate between '1996-01-01' and '1997-01-01'
and l_discount between 0.08 and 0.1
and l_quantity < 24;
                                     QUERY PLAN
------------------------------------------------------------------------------------------
 Seq Scan on lineitem_projection  (cost=0.00..382077.00 rows=1 width=22)
   Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double precision))
(2 rows)

postgres=# select
sum(l_extendedprice*l_discount) as revenue
from
lineitem_projection
where
l_shipdate between '1996-01-01' and '1997-01-01'
and l_discount between 0.08 and 0.1
and l_quantity < 24;
  revenue

 6.2112e+08
(1 row)

Time: 4304.355 ms (00:04.304)


I wonder if there is any query which can demonstrate the advantages of using
the VCI index?


On 06.02.2017 04:26, Haribabu Kommi wrote:



On Fri, Feb 3, 2017 at 8:28 PM, Konstantin Knizhnik wrote:


On 30.12.2016 06:55, Haribabu Kommi wrote:


Hi All,

Fujitsu was interested in developing a columnar storage extension
with minimal changes to the server backend.


We in PostgresPRO are also very interested in developing vertical
storage (VS) for Postgres.
After considering many alternatives, we came to the conclusion that an
approach based on representing the columnar store as an access method (index)
is the most promising one.

It allows us to:
1. Implement VS as an extension without affecting the Postgres core.
2. Have both ROS and WOS.
3. Create multiple projections (as in Vertica).
4. Optimize insert speed by supporting batch inserts and using a flexible
recovery model for VS.

So it is very similar to your approach. But there are a few differences:

1. Our intention is to completely eliminate changes in Postgres core.

You wrote:

Yes, it is a mix of both index and table access methods. The current design
of the vertical clustered index needs both access methods; for this reason
we used both access methods.

But I still do not completely understand why it is not possible to use VS
in index-only scans with the standard Postgres executor and without any
changes. Why is it not possible to rely on the optimizer's standard rules
for applying indexes, based on the costs provided by our AM implementation?


In our storage design, we used a TID-CRID map to associate a record in the
heap with the columnar storage. Because of HOT updates, the new data would not
be inserted into the indexes, but that is a problem for the columnar storage,
so we added a hook to insert index data even if the update is HOT.

We also added another hook for initializing the parameters during
execution.

Most of the other added hooks 

Re: [HACKERS] removing tsearch2

2017-02-13 Thread Robert Haas
On Fri, Feb 10, 2017 at 1:41 PM, Robert Haas  wrote:
> Anyway, it seems like the consensus here is unanimous.  Unless there
> are a LARGE number of contrary votes in the meanwhile, I'll go make
> this happen sometime next week.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-13 Thread Alvaro Herrera
Alexander Korotkov wrote:

> Yes, the influence seems to be low.  But nevertheless it's important to ensure
> that there is no regression here.
> Despite pg_prewarm'ing and running tests 300s, there is still significant
> variation.
> For instance, with clients count = 80:
>  * pgxact-result-2.txt – 474704
>  * pgxact-results.txt – 574844
> Could some background processes influence the tests?  Or could it be
> another virtual machine?
> Also, I wonder why I can't see this variation on the graphs.
> Another issue with graphs is that we can't see details of read and write
> TPS variation on the same scale, because write TPS values are too low.  I
> think you should draw write benchmark on the separate graph.

So, I'm reading that on PPC64 there is no effect, and on the "lesser"
machine Tomas tested on, there is no effect either; this patch only
seems to benefit Alexander's 72 core x86_64 machine.

It seems to me that Andres comments here were largely ignored:
https://www.postgresql.org/message-id/20160822021747.u5bqx2xwwjzac...@alap3.anarazel.de
He was suggesting to increase the struct size to 16 bytes rather than
going all the way up to 128.  Did anybody test this?

Re the coding of the padding computation, seems it'd be better to use
our standard "offsetof(last-struct-member) + sizeof(last-struct-member)"
rather than adding each of the members' sizes individually.
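
A minimal illustration of that idiom (the struct is made up for this sketch,
not the actual PGXACT definition; PG_CACHE_LINE_SIZE is the existing constant
from pg_config_manual.h):

#include <stddef.h>

typedef struct Foo
{
	uint32		xid;
	uint32		xmin;
	uint8		flags;
	uint8		nxids;		/* last member */
} Foo;

/* payload size derived from the last member, not by summing each field */
#define FOO_PAYLOAD_SIZE \
	(offsetof(Foo, nxids) + sizeof(((Foo *) 0)->nxids))

typedef struct PaddedFoo
{
	Foo			foo;
	char		pad[PG_CACHE_LINE_SIZE - FOO_PAYLOAD_SIZE];
} PaddedFoo;

This stays correct if members are added or reordered, whereas a hand-summed
list of member sizes silently drifts out of date.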

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: About CMake v2

2017-02-13 Thread Peter Eisentraut
On 2/8/17 4:44 PM, Andres Freund wrote:
>> Well, I find it a bit scary.  If you do the big switch all at once, then
>> you will have to dedicate the following 3 months to fixing complaints
>> from developers and build farmers.
> 
> That'll be the case just as well if we spread it out over two cycles,
> except that we'll have it in multiple phases, and we run into the danger
> of a half-done conversion.

The key term is "dedicated".  I don't think anyone wants to spend 3
months doing nothing but fixing the build system.  If we spread it over
several releases and bring more people up to speed along the way, that
looks more manageable.

>> That would work if there were a single entry point into the build
>> system.  But in practice there are many, and every one of them is
>> someone's favorite.  It's unlikely that we will be able to enumerate all
>> of them during patch review.
> 
> Not sure what you mean with "entry point"?

People have all kinds of expectations on how the build system behaves.
It's not just ./configure; make; make install.  All kinds of niche and
edge cases have evolved over the years.  Variables you can set,
overrides, targets in some subdirectories, building in subdirectories
and having it build another subdirectory first, testing this way and
testing that way, testing with a custom locale or a custom database
name, building before testing or not, testing without reinstalling, and
so on and so on.  You'd have to make sure at least some of that
continues to work or be able to explain it away.  And most of it is not
really documented.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters

2017-02-13 Thread Robert Haas
On Sat, Feb 11, 2017 at 9:56 AM, Andrea Urbani  wrote:
> I'm a beginner here... anyway I try to share my ideas.
>
> My situation has changed for the worse: I'm no longer able to make a pg_dump,
> either with my custom fetch value (I have tried "1" as the value = one row at
> a time) or without the "--column-inserts":
>
> pg_dump: Dumping the contents of table "tDocumentsFiles" failed: 
> PQgetResult() failed.
> pg_dump: Error message from server: ERROR:  out of memory
> DETAIL:  Failed on request of size 1073741823.
> pg_dump: The command was: COPY public."tDocumentsFiles" ("ID_Document", 
> "ID_File", "Name", "FileName", "Link", "Note", "Picture", "Content", 
> "FileSize", "FileDateTime", "DrugBox", "DrugPicture", "DrugInstructions") TO 
> stdout;
>
> I don't know if the Kyotaro Horiguchi patch will solve this, because, again,
> I'm not able to fetch even a single row.

Yeah, if you can't fetch even one row, limiting the fetch size won't
help.  But why is that failing?  A single 1GB allocation should be
fine on most modern servers.  I guess the fact that you're using a
32-bit build of PostgreSQL is probably a big part of the problem;
there is probably only 2GB of available address space and you're
trying to find a single, contiguous 1GB chunk.  If you switch to using
a 64-bit PostgreSQL things will probably get a lot better for you,
unless the server's actual memory is also very small.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq Alternate Row Processor

2017-02-13 Thread Kyle Gearhart
On 2/9/17 7:15 PM, Jim Nasby wrote:
> Can you run a trace to see where all the time is going in the single row 
> case? I don't see an obvious time-suck with a quick look through the code. 
> It'd be interesting to see how things change if you eliminate the filler 
> column from the SELECT.

Traces are attached; these are with callgrind.

profile_nofiller.txt: single row without filler column
profile_filler.txt: single row with filler column
profile_filler_callback.txt: callback with filler column

pqResultAlloc looks to hit malloc pretty hard.  The callback reduces all of 
that to a single malloc for each row.

Without the filler, here is the average over 11 runs:

              Real     user     sys
Callback      .133     .033     .035
Single Row    .170     .112     .029

For the callback case, it's slightly higher than the prior results with the 
filler column.

Profile data file 'callgrind.out.14930' (creator: callgrind-3.11.0)

I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 74120972
Trigger: Program termination
Profiled target:  ./test -m row (PID 14930, part 1)
Events recorded:  Ir
Events shown: Ir
Event sort order: Ir
Thresholds:   99
Include dirs: 
User annotated:   
Auto-annotation:  off


 Ir 

313,455,690  PROGRAM TOTALS


Ir  file:function

61,410,828  ???:_int_malloc [/usr/lib64/libc-2.17.so]
38,321,887  ???:_int_free [/usr/lib64/libc-2.17.so]
25,800,115  ???:pqResultAlloc [/usr/local/pgsql/lib/libpq.so.5.10]
20,611,330  ???:pqParseInput3 [/usr/local/pgsql/lib/libpq.so.5.10]
16,002,817  ???:malloc [/usr/lib64/libc-2.17.so]
14,800,004  ???:pqRowProcessor [/usr/local/pgsql/lib/libpq.so.5.10]
12,604,893  ???:pqGetInt [/usr/local/pgsql/lib/libpq.so.5.10]
10,400,004  ???:PQsetResultAttrs [/usr/local/pgsql/lib/libpq.so.5.10]
10,200,316  main.c:main [/usr/local/src/postgresql-perf/test]
 9,600,000  ???:check_tuple_field_number [/usr/local/pgsql/lib/libpq.so.5.10]
 8,300,631  ???:__strcpy_sse2_unaligned [/usr/lib64/libc-2.17.so]
 7,500,075  ???:pqResultStrdup [/usr/local/pgsql/lib/libpq.so.5.10]
 7,500,000  ???:pqSkipnchar [/usr/local/pgsql/lib/libpq.so.5.10]
 7,017,368  ???:__memcpy_ssse3_back [/usr/lib64/libc-2.17.so]
 6,900,000  ???:PQgetisnull [/usr/local/pgsql/lib/libpq.so.5.10]
 6,401,100  ???:free [/usr/lib64/libc-2.17.so]
 6,200,004  ???:PQcopyResult [/usr/local/pgsql/lib/libpq.so.5.10]
 6,100,959  ???:__strlen_sse2_pminub [/usr/lib64/libc-2.17.so]
 5,700,000  ???:PQgetvalue [/usr/local/pgsql/lib/libpq.so.5.10]
 4,700,045  ???:PQclear [/usr/local/pgsql/lib/libpq.so.5.10]
 4,200,057  ???:PQmakeEmptyPGresult [/usr/local/pgsql/lib/libpq.so.5.10]
 4,103,903  ???:PQgetResult [/usr/local/pgsql/lib/libpq.so.5.10]
 3,400,000  ???:pqAddTuple [/usr/local/pgsql/lib/libpq.so.5.10]
 3,203,437  ???:pqGetc [/usr/local/pgsql/lib/libpq.so.5.10]
 2,600,034  ???:pqPrepareAsyncResult [/usr/local/pgsql/lib/libpq.so.5.10]
 2,500,679  ???:appendBinaryPQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]
 2,300,621  ???:enlargePQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]
 1,600,016  ???:appendPQExpBufferStr [/usr/local/pgsql/lib/libpq.so.5.10]
   900,270  ???:resetPQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]


Profile data file 'callgrind.out.15062' (creator: callgrind-3.11.0)

I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 84068364
Trigger: Program termination
Profiled target:  ./test -m row (PID 15062, part 1)
Events recorded:  Ir
Events shown: Ir
Event sort order: Ir
Thresholds:   99
Include dirs: 
User annotated:   
Auto-annotation:  off


 Ir 

358,525,458  PROGRAM TOTALS


Ir  file:function

61,410,901  ???:_int_malloc [/usr/lib64/libc-2.17.so]
38,321,887  ???:_int_free [/usr/lib64/libc-2.17.so]
31,400,139  ???:pqResultAlloc [/usr/local/pgsql/lib/libpq.so.5.10]
22,839,505  ???:pqParseInput3 [/usr/local/pgsql/lib/libpq.so.5.10]
17,600,004  ???:pqRowProcessor [/usr/local/pgsql/lib/libpq.so.5.10]
16,002,817  ???:malloc [/usr/lib64/libc-2.17.so]
14,716,359  ???:pqGetInt 

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-13 Thread Bernd Helmle
Am Samstag, den 11.02.2017, 15:42 +0300 schrieb Alexander Korotkov:
> Thus, I see reasons why in your tests absolute results are lower than
> in my
> previous tests.
> 1.  You use 28 physical cores while I was using 32 physical cores.
> 2.  You run tests in PowerVM while I was running test on bare metal.
> PowerVM could have some overhead.
> 3.  I guess you run pgbench on the same machine.  While in my tests
> pgbench
> was running on another node of IBM E880.
> 

Yeah, pgbench was running locally. Maybe we can get some resources to
run them remotely.

Short side note:

If you run two Postgres instances concurrently with the same pgbench
parameters, each instance gets nearly the same transaction throughput
as when running against a single instance, e.g.

- single

transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 112
number of threads: 112
duration: 300 s
number of transactions actually processed: 121523797
latency average = 0.276 ms
latency stddev = 0.096 ms
tps = 405075.282309 (including connections establishing)
tps = 405114.299174 (excluding connections establishing)

instance-1/instance-2 concurrently run:

transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 112
number of threads: 56
duration: 300 s
number of transactions actually processed: 120645351
latency average = 0.278 ms
latency stddev = 0.158 ms
tps = 402148.536087 (including connections establishing)
tps = 402199.952824 (excluding connections establishing)

transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 112
number of threads: 56
duration: 300 s
number of transactions actually processed: 121959772
latency average = 0.275 ms
latency stddev = 0.110 ms
tps = 406530.139080 (including connections establishing)
tps = 406556.658638 (excluding connections establishing)

So it looks like the machine has plenty of power, but PostgreSQL is
limiting somewhere.

> Therefore, having lower absolute numbers in your tests, win of LWLock
> optimization is also lower.  That is understandable.  But win of
> LWLock
> optimization is clearly visible definitely exceeds variation.
> 
> I think it would make sense to run more kinds of tests.  Could you
> try set
> of tests provided by Tomas Vondra?
> If even we wouldn't see win some of the tests, it would be still
> valuable
> to see that there is no regression there.

Unfortunately some tests for AIX are scheduled, which will assign the
resources to that LPAR... I've just talked to the people responsible for
the machine and we can get more time for the Linux tests ;)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] VOPS: vectorized executor for Postgres: how to speedup OLAP queries more than 10 times without changing anything in Postgres executor

2017-02-13 Thread Konstantin Knizhnik

Hello hackers,

There have been many discussions concerning possible ways of speeding up
Postgres. Different approaches were suggested:

- JIT (now we have three different prototype implementations based on LLVM)
- Chunked (vectorized) executor
- Replacing pull with push
- Columnar store (cstore_fdw, IMCS)
- Optimizing and improving the current executor (reducing tuple deform
overhead, function call overhead, ...)

Obviously the best result can be achieved by combining all of these
approaches. But actually they are more or less interchangeable: vectorized
execution does not eliminate interpretation overhead, but the overhead is
divided by the vector size and becomes less critical.


I decided to write a small prototype to estimate the possible speed
improvement of a vectorized executor. I created special types representing a
"tile" and implemented standard SQL operators for them. So neither the
Postgres planner, nor the Postgres executor, nor the Postgres heap manager
is changed. But I was able to reach a more than 10x speed improvement on the
TPC-H Q1/Q6 queries!


Please find more information here: 
https://cdn.rawgit.com/postgrespro/vops/ddcbfbe6/vops.html
The sources of the project can be found here: 
https://github.com/postgrespro/vops.git
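
To give a flavour of the approach: a projection table uses tile types for its
columns, so each stored row carries a fixed-size vector of values and a single
operator invocation processes the whole vector. A rough sketch of such a table
(the type names follow the documentation linked above, but treat this purely
as an illustration and see those links for the real API):

create extension vops;

create table vops_lineitem(
    l_shipdate      vops_date,      -- each value holds a tile of dates
    l_quantity      vops_float4,
    l_extendedprice vops_float4,
    l_discount      vops_float4
);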


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-13 Thread Alexander Korotkov
On Mon, Feb 13, 2017 at 3:34 PM, Bernd Helmle  wrote:

> Am Samstag, den 11.02.2017, 00:28 +0100 schrieb Tomas Vondra:
> > Comparing averages of tps, measured on 5 runs (each 5 minutes long),
> > the
> > difference between master and patched master is usually within 2%,
> > which
> > is pretty much within noise.
> >
> > I'm attaching spreadsheets with summary of the results, so that we
> > have
> > it in the archives. As usual, the scripts and much more detailed
> > results
> > are available here:
>
> I've done some benchmarking of this patch against the E850/ppc64el
> Ubuntu LPAR we currently have access to and got the attached results.
> pg_prewarm as recommended by Alexander was used; the tests ran for 300
> seconds at scale 1000, each with a test run before. The SELECT-only pgbench
> was run twice each, the write tests only once.
>
> Looks like the influence of this patch isn't that big, at least on this
> machine.
>

Thank you for testing.

Yes, the influence seems to be low.  But nevertheless it's important to ensure
that there is no regression here.
Despite pg_prewarm'ing and running tests 300s, there is still significant
variation.
For instance, with clients count = 80:
 * pgxact-result-2.txt – 474704
 * pgxact-results.txt – 574844
Could some background processes influence the tests?  Or could it be
another virtual machine?
Also, I wonder why I can't see this variation on the graphs.
Another issue with graphs is that we can't see details of read and write
TPS variation on the same scale, because write TPS values are too low.  I
think you should draw write benchmark on the separate graph.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Logical replication existing data copy

2017-02-13 Thread Erik Rijkers

On 2017-02-11 11:16, Erik Rijkers wrote:

On 2017-02-08 23:25, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


This often works but it also fails far too often (in my hands).  I
test whether the tables are identical by comparing an md5 of an
ordered resultset from both replica and master.  I estimate that 1 in
5 tries fails; 'fail' being a somewhat different table on the replica
(compared to the master), most often pgbench_accounts (typically there are
10-30 differing rows).  No errors or warnings in either logfile.  I'm
not sure, but testing on faster machines seems to be doing
somewhat better ('better' being fewer replication errors).
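
For reference, a per-table check of that kind can be as simple as running
something like the following on both master and replica and comparing the
output (the column name is assumed from pgbench's schema; the exact query used
in these tests is not shown):

select md5(string_agg(t::text, ',' order by aid)) from pgbench_accounts t;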



I have noticed that when I insert a wait state of a few seconds after the
CREATE SUBSCRIPTION (or actually: the enabling of the subscription),
the problem does not occur.  Apparently (I assume) the initial snapshot
is taken when the subsequent pgbench run has already started, so
that the logical replication also starts somewhere 'into' that
pgbench run. Does that make sense?


I don't know what to make of it.  Now that I think I understand
what happens, I hesitate to call it a bug. But I'd say it's still a
usability problem that the subscription is only 'valid' after some
time, even if it's only a few seconds.


(the other problem I mentioned (drop subscription hangs) still happens 
every now and then)



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-02-13 Thread Dilip Kumar
On Mon, Feb 13, 2017 at 6:24 PM, Robert Haas  wrote:
> I don't think it's acceptable (or necessary) to move the DSA
> definitions into postgres.h.  Why do you think you need to do that,
> vs. just including dsa.h in a few more places?

I need to access dsa_pointer in tidbitmap.h, which is included from
FRONTEND code as well. Now, the problem is that dsa.h includes
"port/atomics.h", but atomics.h cannot be included if FRONTEND is
defined.

#ifndef ATOMICS_H
#define ATOMICS_H
#ifdef FRONTEND
#error "atomics.h may not be included from frontend code"
#endif

Is there any other solution to this ?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-02-13 Thread Ashutosh Bapat
On Mon, Feb 6, 2017 at 11:06 AM, Amit Khandekar  wrote:
> Ashutosh Bapat  wrote:
>>> We may want to think about a third goal: preventing too many workers
>>> from executing the same plan. As per comment in get_parallel_divisor()
>>> we do not see any benefit if more than 4 workers execute the same
>>> node. So, an append node can distribute more than 4 worker nodes
>>> equally among the available subplans. It might be better to do that as
>>> a separate patch.
>>
>> I think that comment is for calculating leader contribution. It does
>> not say that 4 workers is too many workers in general.
>>
>> But yes, I agree, and I have it in mind as the next improvement.
>> Basically, it does not make sense to give more than 3 workers to a
>> subplan when parallel_workers for that subplan are 3. For e.g., if
>> gather max workers is 10, and we have 2 Append subplans s1 and s2 with
>> parallel workers 3 and 5 respectively. Then, with the current patch,
>> it will distribute 4 workers to each of these workers. What we should
>> do is : once both of the subplans get 3 workers each, we should give
>> the 7th and 8th worker to s2.
>>
>> Now that I think of that, I think for implementing above, we need to
>> keep track of per-subplan max_workers in the Append path; and with
>> that, the bitmap will be redundant. Instead, it can be replaced with
>> max_workers. Let me check if it is easy to do that. We don't want to
>> have the bitmap if we are sure it would be replaced by some other data
>> structure.
>
> Attached is v2 patch, which implements above. Now Append plan node
> stores a list of per-subplan max worker count, rather than the
> Bitmapset. But still Bitmapset turned out to be necessary for
> AppendPath. More details are in the subsequent comments.
>
>
>>> Goal A : Allow non-partial subpaths in Partial Append.
>>> Goal B : Distribute workers across the Append subplans.
>>> Both of these require some kind of synchronization while choosing the
>>> next subplan. So, goal B is achieved by doing all the synchronization
>>> stuff. And implementation of goal A requires that goal B is
>>> implemented. So there is a dependency between these two goals. While
>>> implementing goal B, we should keep in mind that it should also work
>>> for goal A; it does not make sense later changing the synchronization
>>> logic in goal A.
>>>
>>> I am ok with splitting the patch into 2 patches :
>>> a) changes required for goal A
>>> b) changes required for goal B.
>>> But I think we should split it only when we are ready to commit them
>>> (commit for B, immediately followed by commit for A). Until then, we
>>> should consider both of these together because they are interconnected
>>> as explained above.
>>
>> For B, we need to know, how much gain that brings and in which cases.
>> If that gain is not worth the complexity added, we may have to defer
>> Goal B. Goal A would certainly be useful since it will improve
>> performance of the targetted cases. The synchronization required for
>> Goal A is simpler than that of B and thus if we choose to implement
>> only A, we can live with a simpler synchronization.
>
> For Goal A, the logic for a worker synchronously choosing a subplan will be:
> Go to the next subplan. If that subplan has not already been assigned its max
> workers, choose this subplan; otherwise, go to the next subplan, and so
> on.

Right, at a given time, we have to remember only the next plan to
assign worker to. That's simpler than remembering the number of
workers for each subplan and updating those concurrently. That's why I
am saying synchronization for A is simpler than that of B.
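
To illustrate the difference in shared state each approach needs (a sketch
only, not the structures from the actual patch):

/* Goal A: workers only need to agree on which subplan is handed out next. */
typedef struct ParallelAppendStateA
{
	int			next_plan;		/* protected by a spinlock or LWLock */
} ParallelAppendStateA;

/* Goal B: additionally track how many workers each subplan currently has. */
typedef struct ParallelAppendStateB
{
	int			nplans;
	int			nworkers[FLEXIBLE_ARRAY_MEMBER];	/* one counter per subplan */
} ParallelAppendStateB;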

> For Goal B, the logic will be:
> Among the subplans which are yet to reach their max workers, choose the
> subplan with the minimum number of workers currently assigned.
>
> I don't think there is a significant difference between the complexity
> of the above two algorithms. So I think here the complexity does not
> look like a factor based on which we can choose the particular logic.
> We should choose the logic which has more potential for benefits. The
> logic for goal B will work for goal A as well. And secondly, if the
> subplans are using their own different system resources, the resource
> contention might be less. One case is : all subplans using different
> disks. Second case is : some of the subplans may be using a foreign
> scan, so it would start using foreign server resources sooner. These
> benefits apply when the Gather max workers count is not sufficient for
> running all the subplans in their full capacity. If they are
> sufficient, then the workers will be distributed over the subplans
> using both the logics. Just the order of assignments of workers to
> subplans will be different.
>
> Also, I don't see a disadvantage if we follow the logic of Goal B.

Do we have any performance measurements where we see that Goal B
performs better than Goal A, in such a situation? Do we have any

Re: [HACKERS] 'text' instead of 'unknown' in Postgres 10

2017-02-13 Thread Daniele Varrazzo
On Mon, Feb 13, 2017 at 3:09 AM, Jim Nasby  wrote:
> On 2/7/17 9:16 AM, Daniele Varrazzo wrote:
>>
>> Thank you for the clarification: I will assume the behaviour cannot be
>> maintained on PG 10 and think whether the treatment of '{}' is too
>> magical and drop it instead.
>
>
> BTW, I would hope that passing '{}' into a defined array field still works,
> since an empty array isn't treated the same as NULL, which means you need
> some way to create an empty array.

Yes, that didn't change. The issue was only reading data from Postgres
into Python.

There is a beta of the next psycopg version available on testpypi which
implements the new behaviour, if you want to take a look at it. You
can use

pip install -i https://testpypi.python.org/pypi psycopg2==2.7b1

to install it.


-- Daniele


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-02-13 Thread Robert Haas
On Sat, Feb 11, 2017 at 1:41 AM, Dilip Kumar  wrote:
> I tried my best, please check the latest patch (0001).

I don't think it's acceptable (or necessary) to move the DSA
definitions into postgres.h.  Why do you think you need to do that,
vs. just including dsa.h in a few more places?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-02-13 Thread Bernd Helmle
Am Samstag, den 11.02.2017, 00:28 +0100 schrieb Tomas Vondra:
> Comparing averages of tps, measured on 5 runs (each 5 minutes long),
> the 
> difference between master and patched master is usually within 2%,
> which 
> is pretty much within noise.
> 
> I'm attaching spreadsheets with summary of the results, so that we
> have 
> it in the archives. As usual, the scripts and much more detailed
> results 
> are available here:

I've done some benchmarking of this patch against the E850/ppc64el
Ubuntu LPAR we currently have access to and got the attached results.
pg_prewarm as recommended by Alexander was used; the tests ran for 300
seconds at scale 1000, each with a test run before. The SELECT-only pgbench
was run twice each, the write tests only once.

Looks like the influence of this patch isn't that big, at least on this
machine.

We're going to reassign the resources to an AIX LPAR soon, which
doesn't give me enough time to test with Tomas' test scripts again.
>> clients 10 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 10
number of threads: 10
duration: 300 s
number of transactions actually processed: 39931683
latency average = 0.075 ms
tps = 133105.478637 (including connections establishing)
tps = 133109.428803 (excluding connections establishing)
>> clients 20 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 20
number of threads: 20
duration: 300 s
number of transactions actually processed: 78852558
latency average = 0.076 ms
tps = 262841.340316 (including connections establishing)
tps = 262855.469408 (excluding connections establishing)
>> clients 40 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 40
number of threads: 40
duration: 300 s
number of transactions actually processed: 118612968
latency average = 0.101 ms
tps = 395375.595262 (including connections establishing)
tps = 395432.166071 (excluding connections establishing)
>> clients 80 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 80
number of threads: 80
duration: 300 s
number of transactions actually processed: 170171627
latency average = 0.141 ms
tps = 567235.428660 (including connections establishing)
tps = 567322.181229 (excluding connections establishing)
>> clients 100 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 168267642
latency average = 0.178 ms
tps = 560888.338862 (including connections establishing)
tps = 561076.995776 (excluding connections establishing)
>> clients 160 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 160
number of threads: 160
duration: 300 s
number of transactions actually processed: 146675068
latency average = 0.327 ms
tps = 488911.857866 (including connections establishing)
tps = 489155.251547 (excluding connections establishing)
>> clients 240 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 240
number of threads: 240
duration: 300 s
number of transactions actually processed: 139558941
latency average = 0.516 ms
tps = 465184.240782 (including connections establishing)
tps = 466046.696168 (excluding connections establishing)
>> clients 280 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 280
number of threads: 280
duration: 300 s
number of transactions actually processed: 137419242
latency average = 0.611 ms
tps = 458052.555736 (including connections establishing)
tps = 459795.543770 (excluding connections establishing)
>> clients 10 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 10
number of threads: 10
duration: 300 s
number of transactions actually processed: 40994263
latency average = 0.073 ms
tps = 136647.383534 (including connections establishing)
tps = 136649.850380 (excluding connections establishing)
>> clients 20 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 20
number of threads: 20
duration: 300 s
number of transactions actually processed: 71023846
latency average = 0.084 ms
tps = 236745.708077 (including connections establishing)
tps = 236752.743230 (excluding connections establishing)
>> clients 40 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 40
number of threads: 40
duration: 300 s
number of transactions actually processed: 120060357
latency average = 0.100 ms
tps = 400200.095159 (including connections establishing)
tps = 400238.563079 (excluding connections establishing)
>> clients 80 <
transaction type: 
scaling factor: 1000
query mode: prepared
number of clients: 80
number of threads: 80
duration: 300 s
number of transactions actually processed: 170353955
latency average = 0.141 ms
tps = 567843.510912 (including connections establishing)
tps = 

Re: [HACKERS] Parallel Index Scans

2017-02-13 Thread Robert Haas
On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila  wrote:
 Why can't we rely on _bt_walk_left?
>>> The reason is mentioned in comments, but let me try to explain with
>>> some example.  When you reach that point of code, it means that either
>>> the current page (assume page number is 10) doesn't contain any
>>> matching items or it is a half-dead page, both of which indicates that
>>> we have to move to the previous page.   Now, before checking if the
>>> current page contains matching items, we signal parallel machinery
>>> (via _bt_parallel_release) to allow workers to read the previous page
>>> (assume previous page number is 9).  So it is quite possible that
>>> after deciding that current page (page number 10) doesn't contain any
>>> matching tuples if we directly move to the previous page (in this case
>>> it will be 9) by using _bt_walk_left, some other worker would have
>>> read page 9.  In short, if we directly use _bt_walk_left(), then we
>>> are prone to returning some of the values twice as multiple workers
>>> can read the same page.
>> But ... the entire point of the seize-and-release stuff is to avoid
>> this problem.  You're supposed to seize the scan, read the current
>> page, walk left, store the page you find in the scan, and then release
>> the scan.
> Exactly and that is what is done in the patch.  Basically, if we found
> that the current page is half-dead or it doesn't contain any matching
> items, then release the current buffer, seize the scan, read the
> current page, walk left and so on.  I am slightly confused here
> because it seems both of us agree on what is the right thing to do and
> according to me that is how it is implemented.  Are you just ensuring
> about whether I have implemented as discussed or do you see a problem
> with the way it is implemented?

Well, before, I thought you said that relying entirely on
_bt_walk_left couldn't work because then two people might end up
running it at the same time, and that would cause problems.  But if
you can only run _bt_walk_left while you've got the scan seized, then
that can't happen.  Evidently I'm missing something here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] UPDATE of partition key

2017-02-13 Thread Amit Khandekar
Currently, an update of the partition key of a partition is not allowed,
since it requires moving the row(s) into the applicable partition.

Attached is a WIP patch (update-partition-key.patch) that removes this
restriction. When an UPDATE causes a row of a partition to violate its
partition constraint, the subtree is searched for a partition that can
accommodate the row; if one is found, the row is deleted from the old
partition and inserted into the new partition. If none is found, an
error is reported.
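
To make the intended behaviour concrete, a minimal example (tables and values
are illustrative, not taken from the patch's tests):

CREATE TABLE meas (logdate date, val int) PARTITION BY RANGE (logdate);
CREATE TABLE meas_y2016 PARTITION OF meas
    FOR VALUES FROM ('2016-01-01') TO ('2017-01-01');
CREATE TABLE meas_y2017 PARTITION OF meas
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

INSERT INTO meas VALUES ('2016-06-01', 1);

-- Without the patch this fails with a partition constraint violation; with
-- the patch, the row is deleted from meas_y2016 and inserted into meas_y2017.
UPDATE meas SET logdate = '2017-06-01' WHERE val = 1;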

There are a few things that can be discussed:

1. We can run an UPDATE using a child partition at any level in a
nested partition tree. In such a case, we should move the row only
within that child's subtree.

For e.g. , in a tree such as :
tab ->
   t1 ->
  t1_1
  t1_2
   t2 ->
  t2_1
  t2_2

For "UPDATE t2 set col1 = 'AAA' " , if the modified tuple does not fit
in t2_1 but can fit in t1_1, it should not be moved to t1_1, because
the UPDATE is fired using t2.

2. In the patch, as part of the row movement, ExecDelete() is called
followed by ExecInsert(). This is done that way because we want the ROW
triggers on that (sub)partition to be executed. If a user has explicitly
created DELETE and INSERT BR triggers for this partition, I think we should
run those. At the same time, another question is: what about the UPDATE
trigger on the same table? Here again, one can argue that because this
UPDATE has been transformed into a DELETE-INSERT, we should not run the
UPDATE trigger for row movement. But there is a counter-argument: for
example, if a user needs to log updates of particular columns of a row, he
will expect the logging to happen even when that row is transparently
moved. In the patch, I have retained the firing of the UPDATE BR trigger.

3. In the case of a concurrent update/delete, suppose session A has locked
the row for deleting it. Now session B has decided to update this row, and
that is going to cause row movement, which means it will delete it first.
But when session A has finished deleting it, session B finds that it is
already deleted. In such a case, it should not go ahead with inserting a new
row as part of the row movement. For that, I have added a new parameter
'already_delete' for ExecDelete().

Of course, this still won't completely solve the concurrency anomaly.
In the above case, the UPDATE of session B gets lost. Maybe, for a
user who does not tolerate this, we can have a table-level option
that disallows row movement, or causes an error to be thrown for
one of the concurrent sessions.

4. ExecSetupPartitionTupleRouting() is re-used for routing the row
that is to be moved. So in ExecInitModifyTable(), we call
ExecSetupPartitionTupleRouting() even for UPDATE. We could instead do this
only at execution time, the very first time we find that we need to do a
row movement. I will think that over, but I suspect it might complicate
things compared to always doing the setup for UPDATE. Will check on that.


5. Regarding performance testing, I have compared the results of
row movement with partitioning versus row movement with an inheritance tree
using triggers.  Below are the details:

Schema :

CREATE TABLE ptab (a date, b int, c int);

CREATE TABLE ptab (a date, b int, c int) PARTITION BY RANGE (a, b);

CREATE TABLE ptab_1_1 PARTITION OF ptab
for values from ('1900-01-01', 1) to ('1900-01-01', 101)
PARTITION BY range (c);

CREATE TABLE ptab_1_1_1 PARTITION OF ptab_1_1
for values from (1) to (51);
CREATE TABLE ptab_1_1_2 PARTITION OF ptab_1_1
for values from (51) to (101);
.
.
CREATE TABLE ptab_1_1_n PARTITION OF ptab_1_1
for values from (n) to (n+m);

..
..

CREATE TABLE ptab_5_n PARTITION OF ptab
for values from ('1905-01-01', 101) to ('1905-01-01', 201)
PARTITION BY range (c);

CREATE TABLE ptab_1_2_1 PARTITION OF ptab_1_2
for values from (1) to (51);
CREATE TABLE ptab_1_2_2 PARTITION OF ptab_1_2
for values from (51) to (101);
.
.
CREATE TABLE ptab_1_2_n PARTITION OF ptab_1_2
for values from (n) to (n+m);
.
.

Similarly for inheritance :

CREATE TABLE ptab_1_1
(constraint check_ptab_1_1 check (a = '1900-01-01' and b >= 1 and b <
8)) inherits (ptab);
create trigger brutrig_ptab_1_1 before update on ptab_1_1 for each row
execute procedure ptab_upd_trig();
CREATE TABLE ptab_1_1_1
(constraint check_ptab_1_1_1 check (c >= 1 and c < 51))
inherits (ptab_1_1);
create trigger brutrig_ptab_1_1_1 before update on ptab_1_1_1 for each
row execute procedure ptab_upd_trig();
CREATE TABLE ptab_1_1_2
(constraint check_ptab_1_1_2 check (c >= 51 and c < 101))
inherits (ptab_1_1);

create trigger brutrig_ptab_1_1_2 before update on ptab_1_1_2 for each
row execute procedure ptab_upd_trig();

I had to have a BR UPDATE trigger on each of the leaf tables.

Attached is the BR trigger function update_trigger.sql. There it
generates the table 
