Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-14 Thread Kuntal Ghosh
On Wed, Mar 15, 2017 at 12:32 AM, Robert Haas  wrote:
> On Mon, Mar 13, 2017 at 10:36 AM, Ashutosh Sharma  
> wrote:
>> Couple of review comments,
>>
>> You may also need to update the documentation as now we are also going
>> to support wal consistency check for hash index. The current
>> documentation does not include hash index.
>>
>> +only records originating from those resource managers.  Currently,
>> +the supported resource managers are heap,
>> +heap2, btree, gin,
>> +gist, sequence, spgist,
>> +brin, and generic. Only
>
> Did that, committed this.  Also ran pgindent over hash_mask() and
> fixed an instance of dubious capitalization.
Thanks Robert for the commit.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] PATCH: Configurable file mode mask

2017-03-14 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of David Steele
> Sure, but having the private key may allow them to get new data from the
> server as well as the data from the backup.

You are right.  My rough reasoning was that the data has been stolen anyway, so I 
thought it might not be so bad to expect to be able to back up the SSL key file 
in $PGDATA together with the database.  If that is bad, then the default value of 
ssl_key_file (=$PGDATA/ssl.key) should be disallowed.


> > https://www.postgresql.org/docs/devel/static/ssl-tcp.html
> >
> > "On Unix systems, the permissions on server.key must disallow any access
> to world or group; achieve this by the command chmod 0600 server.key.
> Alternatively, the file can be owned by root and have group read access
> (that is, 0640 permissions). That setup is intended for installations where
> certificate and key files are managed by the operating system. The user
> under which the PostgreSQL server runs should then be made a member of the
> group that has access to those certificate and key files."
> >
> > In the latter case, the file owner is root and the permission is 0640.
> At first I was a bit confused and misunderstood that the PostgreSQL user
> account and the backup OS user needs to belong to the same OS group.  But
> that's not the case.  The group of the key file can be, for example,
> "ssl_cert", the PostgreSQL user account belongs to the OS group "ssl_cert"
> and "dba", and the backup OS user only belongs to "backup."  This can prevent
> the backup OS user from reading the key file.  I think it would be better
> to have some explanation with examples in the above section.
> 
> If the backup user is in the same group as the postgres user and in the
> ssl_cert group then backups of the certs would be possible using group reads.
> Restores will be a little tricky, though, because of the need to set
> ownership to root.  The restore would need to be run as root or the
> permissions fixed up after the restore completes.

Yes, but I understood from the following message that you do not recommend that 
the backup user be able to read the SSL key file.  So I proposed describing an 
example configuration to achieve that -- the postgres user in the dba and ssl_cert 
groups, and a separate backup user in only the dba group.

> >> It seems to me that it would be best to advise in the docs that these
> >> files should be relocated if they won't be readable by the backup user.
> >> In any event, I'm not convinced that backing up server private keys
> >> is a good idea.


> To be clear, the default for this patch is to leave permissions exactly
> as they are now.  It also provides alternatives that may or not be useful
> in all cases.

So you think some configurations may be useful and others not, don't you?  
Adding a new parameter could complicate what users have to consider, and 
maximal flexibility could invite insecure misuse.  So I think it would be 
better to describe secure and practical configuration examples in the SSL 
section and/or the backup chapter.  The configuration factors include whether 
the backup user is different from the postgres user, where the SSL key file is 
placed, who owns the key file, and whether the backup user can read it.
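
For illustration, the documented 0600/0640 policy quoted above amounts to a
check along these lines (a rough sketch only, not the actual backend code;
the function name is assumed):

#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: does an SSL key file satisfy the documented ownership/mode policy? */
static bool
key_file_permissions_ok(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;

    if (st.st_uid == geteuid())
    {
        /* owned by the server user: require 0600 or stricter */
        return (st.st_mode & (S_IRWXG | S_IRWXO)) == 0;
    }
    if (st.st_uid == 0)
    {
        /* owned by root: group read (0640) is allowed, world access is not */
        return (st.st_mode & (S_IWGRP | S_IXGRP | S_IRWXO)) == 0;
    }
    return false;
}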

Regards
Takayuki Tsunakawa








Re: [HACKERS] pgbench - allow to store select results into variables

2017-03-14 Thread Rafia Sabih
On Tue, Jan 31, 2017 at 11:54 AM, Fabien COELHO  wrote:

>
> Bonjour Michaël,
>
> Attached are the patch, a test script for the feature, and various test
>>> scripts to trigger error cases.
>>>
>>
>> I have moved this patch to next CF
>>
>
> Ok.
>
> as the last status is a new patch set with no further reviews.
>>
>
> Indeed.
>
> I did not check if the comments have been applied though, this is a bit
>> too much for me now...
>>
>
> Sure.


I was reviewing v7 of this patch. To start with, I found the following
whitespace errors when applying it with git apply:
 /home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing
whitespace.
char   *line; /* first line for short display */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:67: trailing
whitespace.
char   *lines; /* full multi-line text of command */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:72: trailing
whitespace.
int compound;   /* last compound command (number of \;) */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:73: trailing
whitespace.
char  **gset;   /* per-compound command prefix */
/home/edb/Desktop/patches/others/pgbench-into-7.patch:81: trailing
whitespace.
/* read all responses from backend */
error: patch failed: doc/src/sgml/ref/pgbench.sgml:815
error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
error: patch failed: src/bin/pgbench/pgbench.c:375
error: src/bin/pgbench/pgbench.c: patch does not apply
error: patch failed: src/bin/pgbench/pgbench.h:11
error: src/bin/pgbench/pgbench.h: patch does not apply
error: patch failed: src/fe_utils/psqlscan.l:678
error: src/fe_utils/psqlscan.l: patch does not apply
error: patch failed: src/include/fe_utils/psqlscan_int.h:112
error: src/include/fe_utils/psqlscan_int.h: patch does not apply

Apart from that, executing SELECT 1 AS a \gset \set i debug(:a) SELECT 2
AS a \gcset SELECT 3; from your provided script gset-1.sql gives the error
"invalid command \gcset". What is the intention of this script? Also, instead
of so many different files for the error cases, why don't you combine them
into one?

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-14 Thread Dilip Kumar
On Mon, Mar 13, 2017 at 5:48 PM, Rafia Sabih
 wrote:
> Fixed. The attached patch is over execute_once.patch [1].
> I haven't addressed the issue regarding the confusion I raised upthread
> about incorrect value of !es->lazyeval that is restricting parallelism for
> simple queries coming from sql functions, as I am not sure if it is the
> concern of this patch or not.

I have reviewed the patch, I have some questions.

@@ -3031,7 +3031,7 @@ exec_stmt_return_query(PLpgSQL_execstate *estate,
  Assert(stmt->dynquery != NULL);
  portal = exec_dynquery_with_params(estate, stmt->dynquery,
stmt->params, NULL,
-   CURSOR_OPT_PARALLEL_OK);
+   0);

After applying this patch, I have noticed that in exec_stmt_return_query we
allow a parallel query if it's a static query, but not if it's a dynamic
query.  Is there any specific reason for handling these two cases
differently?


@@ -314,7 +314,7 @@ SPI_execute(const char *src, bool read_only, long tcount)

  memset(&plan, 0, sizeof(_SPI_plan));
  plan.magic = _SPI_PLAN_MAGIC;
- plan.cursor_options = 0;
+ plan.cursor_options = CURSOR_OPT_PARALLEL_OK;

In SPI_execute, we are setting cursor_options to CURSOR_OPT_PARALLEL_OK. I
have analysed the calls to this function from PL and they seem fine to me.
But I have a question: have you analyzed all the callers of this function,
e.g. build_tuplestore_recursively and get_crosstab_tuplestore?


On Fri, Mar 10, 2017 at 5:38 PM, Rafia Sabih
 wrote:
> I wanted to clarify a few things here, I noticed that call of ExecutorRun in
> postquel_getnext() uses !es->lazyEval as execute_once, this is confusing, as
> it is true even in cases when a simple query like "select count(*) from t"
> is used in a sql function. Hence, restricting parallelism for cases when it
> shouldn't. It seems to me that es->lazyEval is not set properly or it should
> not be true for simple select statements. I found that in the definition of
> execution_state
> bool lazyEval; /* true if should fetch one row at a time

Hmm, it seems that es->lazyEval is not set properly.  Ideally, it should be
set to true only for lazy evaluation, but in this case it's used for
identifying the tupcount as well.  IMHO, it should be fixed.

Any other opinions on this?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partitioned tables and relfilenode

2017-03-14 Thread Ashutosh Bapat
On Wed, Mar 15, 2017 at 3:39 AM, Robert Haas  wrote:
> On Mon, Mar 13, 2017 at 6:24 AM, Amit Langote
>  wrote:
>> The previous proposal was for expand_inherited_rtentry to not create RT
>> entries and AppendRelInfo's for the non-leaf tables, but I think that
>> doesn't work, as I tried to explain above.  We need RTEs because that
>> seems to be the only way currently for informing the executor of the
>> non-leaf tables. We need AppendRelInfo's to store the RT indexes of those
>> RTEs for the latter planning steps to collect them in partitioned_rels
>> mentioned above. So with the latest patch, we do create the RT entry and
>> AppendRelInfo for non-leaf tables.  AppendRelInfo created in this case is
>> a minimal one; only parent_relid and child_relid are valid.  To make the
>> latter planning steps ignore these minimal AppendRelInfo's, every
>> AppendRelInfo is now marked with child_relkind.  Only
>> set_append_rel_pathlist() and inheritance_planner() process them to
>> collect the child_relid into the partitioned_rels list to be stored in
>> AppendPath/MergeAppendPath and ModifyTablePath, respectively.
>
> I see your point, but I still think this kind of stinks.  You've got
> all kinds of logic that is now conditional on child_is_partitioned,
> and that seems like a recipe for bugs and future maintenance
> difficulties.  It would be much nicer if we could come up with a
> design that doesn't create the AppendRelInfo in the first place,
> because then all of that stuff could just work.  Can we get away with
> creating an RTE for each partitioned table (other than the parent,
> perhaps; for that one it would be nice to use the inh-flagged RTE we
> already have) but NOT creating an AppendRelInfo to go with it?  If
> we've got the list of RTIs for the new RTEs associated with the append
> path in some other form, can't we get by without also having an
> AppendRelInfo to hold onto that translation?
>

Would it help to retain the partition hierarchy as an inheritance
hierarchy and then collapse it while creating append paths?  That will be
needed by partition-wise join, will be helpful for partition pruning
without using constraints, and so on.  So maybe we could use that
infrastructure to simplify the logic here.  The patch is available as
0013 in [1].

[1] cafjfprfqotrr6cm3soobhmhevdkffaz6pyyg4grzsomuw08...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] allow referring to functions without arguments when unique

2017-03-14 Thread Peter Eisentraut
On 3/14/17 03:03, Michael Paquier wrote:
> This looks good to me, so switched as ready for committer.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-14 19:34:12 -0400, Tom Lane wrote:
>> It seems bizarre that you chose to spell the new configure symbol as
>> HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO

> I went back-and-forth about this a number of times.  We have a bunch of
> symbols defined with HAVE__ as a prefix (and some with HAVE_GCC__) - and
> more of the nearby code seems to use __ rather than _.  I don't really
> know why we started doing that, but it's far from new..

> Any idea why we introduce __ stuff?

The nearby stuff is describing features that have a specific name that
includes a leading underscore or two, like __builtin_unreachable().
That doesn't seem to apply here, so I wouldn't add extra underscores.

regards, tom lane




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Andres Freund
On 2017-03-14 23:10:25 -0400, Peter Eisentraut wrote:
> On 3/14/17 19:40, Andres Freund wrote:
> > Any idea why we introduce __ stuff?
> 
> Because the symbols start with an underscore:
> 
> /* Define to 1 if your compiler understands _Static_assert. */
> #undef HAVE__STATIC_ASSERT

Oh, I guess that makes some sense.  But we do so very inconsistently,
given we only keep the leading underscores not the trailing ones (see
HAVE__VA_ARGS, HAVE_FUNCNAME__FUNC).


> But we don't need to introduce underscores that are not there.

Indeed.  Don't want to repost for just this, but consider it adapted
(and moved as requested by Tom).

- Andres




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Peter Eisentraut
On 3/14/17 19:40, Andres Freund wrote:
> Any idea why we introduce __ stuff?

Because the symbols start with an underscore:

/* Define to 1 if your compiler understands _Static_assert. */
#undef HAVE__STATIC_ASSERT

There is apparently some inconsistency when symbols start with more than
one underscore.

But we don't need to introduce underscores that are not there.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-03-14 Thread Michael Paquier
On Tue, Mar 14, 2017 at 11:34 PM, David Steele  wrote:
> This thread is stalled and it looks like the patch may not be workable,
> at least in the current form.
>
> I will mark this a "Returned with Feedback" on 2017-03-17 unless there
> are arguments to the contrary.

Or even rejected would be fine. I am also not sure that changing the
number of tarballs in fetch mode is worth the breakage just to get an
earlier look at the backup_label file; still, the patch as it stands is no
good.
-- 
Michael




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Amit Kapila
On Wed, Mar 15, 2017 at 12:53 AM, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Stephen Frost  writes:
>> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> >> It's true that as soon as we need another overflow page, that's going to
>> >> get dropped beyond the 2^{N+1}-1 point, and the *apparent* size of the
>> >> index will grow quite a lot.  But any modern filesystem should handle
>> >> that without much difficulty by treating the index as a sparse file.
>>
>> > Uh, last I heard we didn't allow or want sparse files in the backend
>> > because then we have to handle a possible out-of-disk-space failure on
>> > every write.
>>
>> For a hash index, this would happen during a bucket split, which would
>> need to be resilient against out-of-disk-space anyway.
>
> We wouldn't attempt to use the area of the file which is not yet
> allocated except when doing a bucket split?
>

That's right.

>  If that's the case then
> this does seem to at least be less of an issue, though I hope we put in
> appropriate comments about it.
>

I think we have sufficient comments in the code, especially on top of the
function _hash_alloc_buckets().


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-14 Thread Michael Paquier
On Wed, Mar 15, 2017 at 4:52 AM, Robert Haas  wrote:
> On Tue, Mar 14, 2017 at 7:22 AM, Kuntal Ghosh
>  wrote:
>> I do have extended localBackendStatusTable with slots for non-backend
>> processes. But, I've renamed it as localProcStatusTable since it
>> includes all processes. I'll keep the variable name as
>> localBackendStatusTable in the updated patch to avoid any confusion.
>> I've extended BackendStatusArray to store auxiliary processes.
>> Backends use slots indexed in the range from 1 to MaxBackends
>> (inclusive), so we use MaxBackends + AuxProcType + 1 as the index of
>> the slot for an auxiliary process.
>
> I think the subject of this thread, for which I'm probably to
> blame, uses bad terminology.  The processes we're talking about exposing
> in pg_stat_activity here are really backends, too, I think.  They're
> just ... special backends.  So I would tend to avoid any backend ->
> process type of renaming.

FWIW, my impression on the matter matches what is written in this paragraph.
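
For illustration, the slot layout Kuntal describes above works out to roughly
the following (a sketch only; the helper name is assumed, not taken from the
patch):

/* Sketch of the described BackendStatusArray slot assignment. */
static int
status_array_slot(bool is_aux_process, int backend_id, AuxProcType aux_type)
{
    if (!is_aux_process)
        return backend_id;              /* regular backends: 1 .. MaxBackends */

    return MaxBackends + aux_type + 1;  /* one fixed slot per auxiliary process type */
}
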
-- 
Michael




Re: [HACKERS] New procedural language

2017-03-14 Thread Amit Langote
Hi,

On 2017/03/15 11:08, Massimo Fidanza wrote:
> Hello,
> 
> what does it mean to add a new procedural language such as Go or Rust?

I think you're looking for how to write a "PL function call handler", most
likely in C.  See for example how plpython_call_handler() is defined in
src/pl/plpython/plpy_main.c.  Also read:

https://www.postgresql.org/docs/devel/static/plhandler.html

Besides the call handler, there are a couple of other auxiliary functions
that must be provided, viz. a validator and an inline handler (again, see
plpython_validator and plpython_inline_handler).
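
For orientation, a call handler skeleton looks roughly like this (a sketch
only; "mylang" is a placeholder, and real handlers such as
plpython_call_handler do much more):

#include "postgres.h"
#include "fmgr.h"
#include "utils/syscache.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(mylang_call_handler);

Datum
mylang_call_handler(PG_FUNCTION_ARGS)
{
    Oid         funcoid = fcinfo->flinfo->fn_oid;
    HeapTuple   proctup;

    /* Fetch the pg_proc row to get at the function's source text (prosrc). */
    proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcoid));
    if (!HeapTupleIsValid(proctup))
        elog(ERROR, "cache lookup failed for function %u", funcoid);

    /* ... hand prosrc to the language's runtime and build a result Datum ... */

    ReleaseSysCache(proctup);
    PG_RETURN_NULL();
}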

Thanks,
Amit






Re: [HACKERS] New procedural language

2017-03-14 Thread Michael Paquier
On Wed, Mar 15, 2017 at 11:08 AM, Massimo Fidanza  wrote:
> what does it mean to add a new procedural language such as Go or Rust?

If you want to allow functions defined in Postgres to be written in such
languages, you will likely find some answers in the documentation on how
to implement that:
https://www.postgresql.org/docs/9.6/static/plhandler.html
-- 
Michael




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Amit Kapila
On Tue, Mar 14, 2017 at 10:59 PM, Robert Haas  wrote:
> On Mon, Mar 13, 2017 at 11:48 PM, Amit Kapila  wrote:
>> We didn't find any issues with the above testing.
>
> Great!  I've committed the latest version of the patch, with some
> cosmetic changes.
>

Thanks a lot.  I have noticed that the test case patch for "snapshot
too old" is not committed.  Do you want to leave that out due to timing
requirements, or do you want me to submit it separately so that we can
deal with it on its own and perhaps get input from Kevin?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings

2017-03-14 Thread Noah Misch
On Mon, Mar 13, 2017 at 06:35:53PM +0300, Aleksander Alekseev wrote:
> --- a/src/include/port.h
> +++ b/src/include/port.h
> @@ -395,11 +395,22 @@ extern double rint(double x);
>  extern int   inet_aton(const char *cp, struct in_addr * addr);
>  #endif
>  
> -#if !HAVE_DECL_STRLCAT
> +/*
> + * Unfortunately in case of strlcat and strlcpy we can't trust tests
> + * executed by Autotools if Clang > 3.6 is used. Clang manages to compile
> + * a program that shouldn't compile which causes wrong values of
> + * HAVE_DECL_STRLCAT and HAVE_DECL_STRLCPY. More details could be found here:
> + *
> + * http://lists.llvm.org/pipermail/cfe-dev/2016-March/048126.html
> + *
> + * This is no doubt a dirty hack but apparently alternative solutions are
> + * not much better.
> + */
> +#if !HAVE_DECL_STRLCAT || defined(__clang__)
>  extern size_t strlcat(char *dst, const char *src, size_t siz);
>  #endif
>  
> -#if !HAVE_DECL_STRLCPY
> +#if !HAVE_DECL_STRLCPY || defined(__clang__)
>  extern size_t strlcpy(char *dst, const char *src, size_t siz);
>  #endif

This is wrong on platforms that do have strlcpy() in libc.

If I recall correctly, you can suppress the warnings in your own build by
adding "ac_cv_func_strlcpy=no ac_cv_have_decl_strlcpy=no ac_cv_func_strlcat=no
ac_cv_have_decl_strlcat=no" to the "configure" command line.




[HACKERS] Remove obsolete text from hash/README

2017-03-14 Thread Amit Kapila
As pointed out by Tom [1], attached is a patch to remove obsolete text
from src/backend/access/hash/README


[1] - https://www.postgresql.org/message-id/5515.1489514099%40sss.pgh.pa.us
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_hash_index_readme_v1.patch
Description: Binary data



Re: [HACKERS] ICU integration

2017-03-14 Thread Andreas Karlsson

On 03/09/2017 10:13 PM, Peter Eisentraut wrote:

> - Naming of collations:  Are we happy with the "de%icu" naming?  I might
> have come up with that while reviewing the IPv6 zone index patch. ;-)
> An alternative might be "de$icu" for more Oracle vibe and avoiding the
> need for double quotes in some cases.  (But we have mixed-case names
> like "de_AT%icu", so further changes would be necessary to fully get rid
> of the need for quoting.)  A more radical alternative would be to
> install ICU locales in a different schema and use the search_path, or
> even have a separate search path setting for collations only.  Which
> leads to ...

I do not like the schema-based solution since search_path already gives 
us enough headaches. As for the naming, I am fine with the current scheme.
It would be nice to have something we did not have to quote, but I do not 
like using dollar signs since they are already used for other things.

> - Selecting default collation provider:  Maybe we want a setting, say in
> initdb, to determine which provider's collations get the "good" names?
> Maybe not necessary for this release, but something to think about.

This does not seem necessary for the initial release.

> - Currently (in this patch), we check a collation's version when it is
> first used.  But, say, after pg_upgrade, you might want to check all of
> them right away.  What might be a good interface for that?  (Possibly,
> we only have to check the ones actually in use, and we have dependency
> information for that.)

How about adding a SQL-level function for checking this, which could be 
called by pg_upgrade?


= Review

Here is an initial review. I will try to find some time to do more 
testing later this week.


This is a really useful feature given the poor quality of collation 
support in libc. The fact that ICU versions its collations is huge by 
itself, as are the larger range of available collations and their high 
configurability. Additionally, being able to use abbreviated keys again 
would be huge.


ICU also supports writing your own collations and modifying existing 
ones, something which might be possible to expose in the future. In 
general ICU offers a lot for users who care about the details of text 
collation.


== Functional

- Generally things seem to work fine and as expected.

- I get test failures in the default test suite due to not having the 
tr_TR locale installed. I would assume that this would be pretty common 
among hackers.


***************
*** 428,443 ****

  -- to_char
  SET lc_time TO 'tr_TR';
  SELECT to_char(date '2010-04-01', 'DD TMMON YYYY');
     to_char
  -------------
!  01 NIS 2010
  (1 row)

  SELECT to_char(date '2010-04-01', 'DD TMMON YYYY' COLLATE "tr%icu");
     to_char
  -------------
!  01 NİS 2010
  (1 row)

  -- backwards parsing
--- 428,444 ----

  -- to_char
  SET lc_time TO 'tr_TR';
+ ERROR:  invalid value for parameter "lc_time": "tr_TR"
  SELECT to_char(date '2010-04-01', 'DD TMMON YYYY');
     to_char
  -------------
!  01 APR 2010
  (1 row)

  SELECT to_char(date '2010-04-01', 'DD TMMON YYYY' COLLATE "tr%icu");
     to_char
  -------------
!  01 APR 2010
  (1 row)

  -- backwards parsing

==

- The code no longer compiles without HAVE_LOCALE_T.

- I do not like how it is not obvious which is the default version of 
every locale. E.g. I believe that "sv%icu" is "sv_reformed%icu" and not 
"sv_standard%icu" as one might expect. Is this inherent to ICU or 
something we can work around?


- ICU talks about a new syntax for locale extensions (old: 
de_DE@collation=phonebook, new: de_DE_u_co_phonebk) on this page: 
http://userguide.icu-project.org/collation/api. Is this something we 
need to care about? It looks like we currently use the old format, and 
while I personally prefer it, I am not sure we should rely on an old syntax.


- I get an error when creating a new locale.

#CREATE COLLATION sv2 (LOCALE = 'sv');
ERROR:  could not create locale "sv": Success

# CREATE COLLATION sv2 (LOCALE = 'sv');
ERROR:  could not create locale "sv": Resource temporarily unavailable
Time: 1.109 ms

== Code review

- For collprovider, is it really correct to say that 'd' is the 
default value, as catalogs.sgml does?


- I do not know what the policy for formatting the documentation is, but 
some of the paragraphs are in need of re-wrapping.


- Add a hint to "ERROR:  conflicting or redundant options". The error 
message is pretty unclear.


- I am not a fan of this patch comparing the encoding with a -1 literal. 
How about adding -1 as a value to the enum? See the example below for a 
place where the patch compares with -1.


ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
-errmsg("collation \"%s\" for encoding \"%s\" already 
exists, skipping",

-   collname, pg_encoding_to_char(collencoding;
+collencoding == -1
+? errmsg("collation \"%s\" already exists, skipping",
+   

[HACKERS] New procedural language

2017-03-14 Thread Massimo Fidanza
Hello,

what does it mean to add a new procedural language such as Go or Rust?

Thanks
Massimo


Re: [HACKERS] multivariate statistics (v25)

2017-03-14 Thread David Rowley
On 15 March 2017 at 12:18, David Fetter  wrote:

>
> Is the plan to convert completely from "multivariate" to "extended?"
> I ask because I found a "multivariate" in there.
>

I get the idea that Tomas would like to keep "multivariate" where it's
actually referring to multivariate stats. The idea of the rename was to
allow future expansion of the code, perhaps to allow creation of stats on
expressions, which are not multivariate. If you've found a "multivariate"
reference in an area that should be generic to extended statistics, then
that's a bug and should be fixed.

I found a few of these and listed them during my review.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] GSOC - TOAST'ing in slices

2017-03-14 Thread George Papadrosou
Hello!

Thank you for your message. I was just about to send this email when I got 
yours. 

> I don't recall seeing an email from you about this yet?  My apologies if
> I missed it

My apologies for the inconvenience. I wish I could have started earlier on this, but 
there was so much coursework reaching its deadline.

I have prepared a very basic proposal for the TOAST project which I am 
attaching below.  You will notice that the proposal is too basic. I would 
appreciate some guidance on how we could dive more into the project’s details 
so I can elaborate more in the proposal.

Also, I hadn't considered the PostGIS project when thinking of TOAST'able data 
types, so I will study it a bit in the meantime. 

Please find the proposal draft below. 
Thanks!
George

Abstract

In PostgreSQL, a field value is compressed and then stored in chunks in a 
separate table called TOAST table [1]. Currently there is no indication of 
which piece of the original data made it to which chunk in the TOAST table. If 
a subset of the value is needed, all of the chunks have to be re-formed and 
decompressed to the original value.

The project's idea is to implement different slicing approaches according to the 
value's datatype. For example, a text field could be split on character 
boundaries, while a JSON document would be split in a way that allows fast 
access to its keys or values.

Benefits to the PostgreSQL Community

Knowing about the data that each chunk holds, we could keep important chunks 
closer to computations as well as store them in indices.

Project details

?

Deliverables 

- Implement “semantic” slicing for datatypes that support slicing into TOAST 
tables. These datatypes will be the Text, Array, JSON/JSONb  and XML data types.

- Include the important chunks in the indices? (Not really sure about the data 
that indices contain at this time)

Timeline

- Until May 30: Study about Postgres internals, on-disk data structures, review 
relevant code and algorithms used, define slicing approaches and agree on 
implementation details .

- Until June 26: Implement the slicing approaches for the Text, Array, 
JSON/JSONb, XML

- June 26 - 30: Student/Mentor evaluations period and safety buffer 

- Until July 24: Make indices take advantage of the new slicing approaches

- July  24 - 28: Student/Mentor evaluations period and safety buffer  

- Until August 21: Improve testing and documentation

- August  21 - 29: Submit code and final evaluations

Bio 

Contact 
Name, email, phone etc




> On 15 Mar 2017, at 03:39, Stephen Frost  wrote:
> 
> George,
> 
> * George Papadrosou (gpapadro...@gmail.com) wrote:
>> I understand your efforts and I am willing to back down. This is not the 
>> only project that appeals to me :)
> 
> Thank you very much for your willingness to adapt. :)
> 
>> Mr. Frost, Mr. Munro,  thank you for your suggestions. I am now between the 
>> TOAST’ing slices and the predicate locking project. I am keen on the fact 
>> the “toasting” project is related to on-disk data structures so I will 
>> probably send you an email about that later today.
> 
> I have added Alexander Korotkov to the CC list as he was
> also listed as a possible mentor for TOAST'ing in slices.
> 
> As it relates to TOAST'ing in slices, it would be good to think through
> how we would represent and store the information about how a particular
> object has been split up.  Note that PostgreSQL is very extensible in
> its type system and therefore we would need a way for new data types
> which are added to the system to be able to define how data of that data
> type is to be split and a way to store the information they need to
> regarding such a split.
> 
> In particular, the PostGIS project adds multiple data types which are
> variable in length and often end up TOAST'd because they are large
> geospatial objects, anything we come up with for TOAST'ing in slices
> will need to be something that the PostGIS project could leverage.
> 
>> In general, I would like to undertake a project interesting enough and 
>> important for Postgres. Also, I could take into account if you favor one 
>> over another, so please let me know. I understand that these projects should 
>> be strictly defined to fit in the GSOC period, however the potential for 
>> future improvements or challenges is what drives and motivates me.
> 
> We are certainly very interested in having you continue on and work with
> the PostgreSQL community moving forward, though we do need to be sure to
> scope the project goals within the GSOC requirements.
> 
> Thanks!
> 
> Stephen



Re: [HACKERS] ANALYZE command progress checker

2017-03-14 Thread Haribabu Kommi
On Fri, Mar 10, 2017 at 6:46 PM, vinayak 
wrote:

>
> + /* Report total number of heap blocks and collectinf sample row phase*/
> + initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS;
> + initprog_val[1] = totalblocks;
> + pgstat_progress_update_multi_param(2, initprog_index, initprog_val);
>
> acquire_sample_rows function is called from acquire_inherited_sample_rows
> function, so adding the phase in that function will provide wrong info.
>
> I agree with you.
>
>
> +#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS 2
>
> why there is no code added for the phase, any specific reason?
>
> I am thinking how to report this phase. Do you have any suggestion?
>

Thanks for the updated patch.

Actually, the analyze operation is done in two stages for a relation that
contains child relations:
1. For the parent relation
2. For all child relations

So the progress will start over for each of the above two stages. This needs
to be clearly mentioned in the docs.

The acquire_sample_rows function collects statistics for the relation that
is provided, so updating the analyze details, like the number of rows, works
fine for a single relation; but if the relation contains many child
relations, the data will be misleading.

Apart from the above, some more comments.


+ /* Report compute index stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

The above phase should be set when it is actually doing
compute_index_stats.
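
In other words, a sketch of the suggested placement (using the constants from
the patch under review):

/* Report this phase immediately before the work it describes ... */
pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
                             PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);
/* ... i.e. right where compute_index_stats() is about to be called. */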

+ /* Report that we are now doing index cleanup */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);

The above code block should be inside the if (!(options & VACOPT_VACUUM))
block.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-14 Thread Stephen Frost
George,

* George Papadrosou (gpapadro...@gmail.com) wrote:
> I understand your efforts and I am willing to back down. This is not the only 
> project that appeals to me :)

Thank you very much for your willingness to adapt. :)

> Mr. Frost, Mr. Munro,  thank you for your suggestions. I am now between the 
> TOAST’ing slices and the predicate locking project. I am keen on the fact the 
> “toasting” project is related to on-disk data structures so I will probably 
> send you an email about that later today.

I don't recall seeing an email from you about this yet?  My apologies if
I missed it.  I have added Alexander Korotkov to the CC list as he was
also listed as a possible mentor for TOAST'ing in slices.

As it relates to TOAST'ing in slices, it would be good to think through
how we would represent and store the information about how a particular
object has been split up.  Note that PostgreSQL is very extensible in
its type system and therefore we would need a way for new data types
which are added to the system to be able to define how data of that data
type is to be split and a way to store the information they need to
regarding such a split.
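
Purely as a thought experiment (none of these names exist in PostgreSQL
today), the kind of per-type hook being described might look something like
this:

/* Hypothetical illustration only -- no such API exists. */
typedef struct ToastSliceInfo
{
    int32   chunk_seq;      /* which TOAST chunk holds this slice */
    int32   start_offset;   /* logical offset of the slice within the datum */
    int32   length;         /* logical length covered by the chunk */
} ToastSliceInfo;

/*
 * A data type would register a function of this shape to decide how a large
 * value is cut into semantically meaningful slices before being chunked.
 */
typedef List *(*toast_slice_hook) (Datum value, int32 max_chunk_size);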

In particular, the PostGIS project adds multiple data types which are
variable in length and often end up TOAST'd because they are large
geospatial objects, anything we come up with for TOAST'ing in slices
will need to be something that the PostGIS project could leverage.

> In general, I would like to undertake a project interesting enough and 
> important for Postgres. Also, I could take into account if you favor one over 
> another, so please let me know. I understand that these projects should be 
> strictly defined to fit in the GSOC period, however the potential for future 
> improvements or challenges is what drives and motivates me.

We are certainly very interested in having you continue on and work with
the PostgreSQL community moving forward, though we do need to be sure to
scope the project goals within the GSOC requirements.

Thanks!

Stephen




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 8:33 PM, Robert Haas  wrote:
> Of course, that supposes that 0009 can manage to postpone creating
> non-sampled child joinrels until create_partition_join_plan(), which
> it currently doesn't.  In fact, unless I'm missing something, 0009
> hasn't been even slightly adapted to take advantage of the
> infrastructure in 0001; it doesn't seem to reset the path_cxt or
> anything.  That seems like a fairly major omission.

Some other comments on 0009:

Documentation changes for the new GUCs are missing.

+between the partition keys of the joining tables. The equi-join between
+partition keys implies that for a given row in a given partition of a given
+partitioned table, its joining row, if exists, should exist only in the
+matching partition of the other partitioned table; no row from non-matching

There could be more than one.   I'd write: The equi-join between
partition keys implies that all join partners for a given row in one
partitioned table must be in the corresponding partition of the other
partitioned table.

+#include "miscadmin.h"
 #include 
 #include 

Added in wrong place.

+        * System attributes do not need translation. In such a case,
+        * the attribute numbers of the parent and the child should
+        * start from the same minimum attribute.

I would delete the second sentence and add an Assert() to that effect instead.

+   /* Pass top parent's relids down the inheritance hierarchy. */

Why?

+       for (attno = rel->min_attr; attno <= rel->max_attr; attno++)

Add a comment explaining why we need to do this.

-   add_paths_to_append_rel(root, rel, live_childrels);
+   add_paths_to_append_rel(root, rel, live_childrels, false);
 }

-

No need to remove blank line.

+ * When called on partitioned join relation with partition_join_path = true, it
+ * adds PartitionJoinPath instead of Merge/Append path. This path is costed
+ * based on the costs of sampled child-join and is expanded later into
+ * Merge/Append plan.

I'm not a big fan of the Merge/Append terminology here.  If somebody
adds another kind of append-path someday, then all of these comments
will have to be updated.  I think this can be phrased more
generically.

/*
+        * While creating PartitionJoinPath, we sample paths from only a few child
+        * relations. Even if all of sampled children have partial paths, it's not
+        * guaranteed that all the unsampled children will have partial paths.
+        * Hence we do not create partial PartitionJoinPaths.
+        */

Very sad.  I guess if we had parallel append available, we could maybe
dodge this problem, but for now I suppose we're stuck with it.

+   /*
+* Partitioning scheme in join relation indicates a possibility that the
+* join may be partitioned, but it's not necessary that every pair of
+* joining relations can use partition-wise join technique. If one of
+* joining relations turns out to be unpartitioned, this pair of joining
+* relations can not use partition-wise join technique.
+*/
+   if (!rel1->part_scheme || !rel2->part_scheme)
+   return;

How can this happen?  If rel->part_scheme != NULL, doesn't that imply
that every rel covered by the joinrel is partitioned that way, and
therefore this condition must necessarily hold?

In general, I think it's better style to write explicit tests against
NULL or NIL than to just write if (blahptr).
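
For instance, the quoted test could read:

    if (rel1->part_scheme == NULL || rel2->part_scheme == NULL)
        return;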

+   partitioned_join->sjinfo = copyObject(parent_sjinfo);

Why do we need to copy it?

+   /*
+* Remove the relabel decoration. We can assume that there is
at most one
+* RelabelType node; eval_const_expressions() simplifies multiple
+* RelabelType nodes into one.
+*/
+   if (IsA(expr, RelabelType))
+   expr = (Expr *) ((RelabelType *) expr)->arg;

Still, instead of assuming this, you could just s/if/while/, and then
you wouldn't need the assumption any more.  Also, consider castNode().
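
That is, something along the lines of:

    /* strip any number of RelabelType nodes; no single-node assumption needed */
    while (IsA(expr, RelabelType))
        expr = ((RelabelType *) expr)->arg;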

partition_wise_plan_weight may be useful for testing, but I don't
think it should be present in the final patch.

This is not a full review; I ran out of mental energy before I got to
the end.  (Sorry.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 8:04 AM, Ashutosh Bapat
 wrote:
>> In 0001, you've removed a comment about how GEQO needs special
>> handling, but it doesn't look as if you've made any compensating
>> change elsewhere.  That seems unlikely to be correct.  If GEQO needs
>> some paths to survive longer than others, how can it be right for this
>> code to create them all in the same context?
>
> Thanks for pointing that out. I have replaced the code and the
> comments back. There was another issue that the temporary paths
> created by geqo will not be freed when geqo moves from one genetic string
> to the next (or what it calls a tour: a list of relations
> to be joined in a given order). To fix this, we need to set the path
> context to the temporary context of geqo inside geqo_eval() before
> calling gimme_tree() and reset it later. That way the temporary paths
> are also created in the temporary memory context of geqo. Fixed in the
> patch.

Yeah, that looks better.

>> Incidentally,
>> geqo_eval() seems to be an existing precedent for the idea of throwing
>> away paths and RelOptInfos, so we might want to use similar code for
>> partitionwise join.
>
> There are some differences in what geqo does and what partition-wise
> needs to do. geqo tries many joining orders each one in a separate
> temporary context. The way geqo slices the work, every slice produces
> a full plan. For partition-wise join I do not see a way to slice the
> work such that the whole path and corresponding RelOptInfos come from
> the same slice. So, we can't use the same method as GEQO.

What I was thinking about was the use of this technique for getting
rid of joinrels:

    root->join_rel_list = list_truncate(root->join_rel_list,
                                        savelength);
    root->join_rel_hash = savehash;

makePathNode() serves to segregate paths into a separate memory
context that can then be destroyed, but as you point out, the path
lists are still hanging around, and so are the RelOptInfo nodes.  It
seems to me we could do a lot better using this technique.  Suppose we
jigger things so that the List objects created by add_path go into
path_cxt, and so that RelOptInfo nodes also go into path_cxt.  Then
when we blow up path_cxt we won't have dangling pointers in the
RelOptInfo objects any more because the RelOptInfos themselves will be
gone.  The only problem is that the join_rel_list (and join_rel_hash
if it exists) will be corrupt, but we can fix that using the technique
demonstrated above.
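
For reference, the surrounding pattern in geqo_eval() is roughly the
following (paraphrased, not a verbatim quote):

    MemoryContext mycontext;
    MemoryContext oldcxt;
    RelOptInfo *joinrel;
    int         savelength;
    struct HTAB *savehash;

    /* evaluate the trial join order in a throwaway memory context */
    mycontext = AllocSetContextCreate(CurrentMemoryContext, "GEQO",
                                      ALLOCSET_DEFAULT_SIZES);
    oldcxt = MemoryContextSwitchTo(mycontext);

    /* remember join_rel state so the trial joinrels can be forgotten later */
    savelength = list_length(root->join_rel_list);
    savehash = root->join_rel_hash;
    root->join_rel_hash = NULL;

    joinrel = gimme_tree(root, tour, num_gene);     /* builds paths and joinrels */

    /* discard the trial joinrels and paths */
    root->join_rel_list = list_truncate(root->join_rel_list, savelength);
    root->join_rel_hash = savehash;

    MemoryContextSwitchTo(oldcxt);
    MemoryContextDelete(mycontext);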

Of course, that supposes that 0009 can manage to postpone creating
non-sampled child joinrels until create_partition_join_plan(), which
it currently doesn't.  In fact, unless I'm missing something, 0009
hasn't been even slightly adapted to take advantage of the
infrastructure in 0001; it doesn't seem to reset the path_cxt or
anything.  That seems like a fairly major omission.

Incidentally, I committed 0002, 0003, and 0005 as a single commit with
a few tweaks; I think you may need to do a bit of rebasing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-14 Thread Andrew Dunstan


On 03/03/2017 11:11 PM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 3/3/17 19:16, Tom Lane wrote:
>>> Peter Eisentraut  writes:
 Use asynchronous connect API in libpqwalreceiver
>>> Buildfarm member bowerbird has been failing in the pg_rewind test since
>>> this patch went in.  It looks like it's failing to complete connections
>>> from the standby; which suggests that something platform-specific is
>>> missing from this commit, but I dunno what.
>> Hmm, I wonder how widely tested the async connection API is on Windows
>> at all.  I only see bowerbird and jacana running bin-check on Windows.
> Yeah, I was wondering if this is just exposing a pre-existing bug.
> However, the "normal" path operates by repeatedly invoking PQconnectPoll
> (cf. connectDBComplete) so it's not immediately obvious how such a bug
> would've escaped detection.
>
>   

(After a long period of fruitless empirical testing I turned to the code)


Maybe I'm missing something, but connectDBComplete() handles a return of
PGRES_POLLING_OK as a success while connectDBStart() seems not to. I
can't find anything in our code other than libpqwalreceiver that
actually uses that interface, so it's not surprising if it's now
failing. So my bet is that it is indeed a long-standing bug.
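
For reference, the documented way to drive that interface is a poll loop of
roughly this shape (a sketch; the wait on the socket is elided into a
comment):

#include <stdbool.h>
#include "libpq-fe.h"

static bool
connect_async(const char *conninfo, PGconn **connp)
{
    PGconn *conn = PQconnectStart(conninfo);

    if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
        return false;                   /* out of memory or immediate failure */

    for (;;)
    {
        PostgresPollingStatusType st = PQconnectPoll(conn);

        if (st == PGRES_POLLING_OK)
            break;                      /* connection established */
        if (st == PGRES_POLLING_FAILED)
            return false;               /* connection attempt failed */

        /*
         * Otherwise st is PGRES_POLLING_READING or PGRES_POLLING_WRITING:
         * wait for PQsocket(conn) to become readable or writable, then loop.
         */
    }

    *connp = conn;
    return true;
}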

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services






Re: [HACKERS] improve comments of snapbuild.c

2017-03-14 Thread Andres Freund
On 2017-03-14 22:03:21 +0100, Erik Rijkers wrote:
> Improvements (grammar/typos) in the comments in snapbuild.c
> 
> To be applied to master.

Thanks, pushed.

Andres




Re: [HACKERS] background sessions

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 4:54 PM, Pavel Stehule  wrote:
>> I don't understand.  The only way you'd need a server restart is if a
>> background process wasn't responding to SIGTERM, and that's a bug
>> independent of anything this patch does.  It would be cause by the
>> background process not doing CHECK_FOR_INTERRUPTS() or the moral
>> equivalent regularly.
>
> It is a bug, and I don't know if it is a bug in this extension or a general bug.
>
> There is no adequate cleanup after killing.
>
> How can pg_cancel_backend be implemented on a background process if there are
> no CHECK_FOR_INTERRUPTS() calls?

You can't.  But what does that have to do with this patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Andres Freund
Hi,

On 2017-03-14 19:34:12 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > [ new patch versions ]
> 
> About to leave, but I had time to read 0003:
> 
> It seems bizarre that you chose to spell the new configure symbol as
> HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO

I went back-and-forth about this a number of times.  We have a bunch of
symbols defined with HAVE__ as a prefix (and some with HAVE_GCC__) - and
more of the nearby code seems to use __ rather than _.  I don't really
know why we started doing that, but it's far from new..

Any idea why we introduce __ stuff?


> Also, being a neatnik, I would insert both the definition and the
> call of that macro between PGAC_C_BUILTIN_UNREACHABLE and
> PGAC_C_VA_ARGS, keeping it more or less in alphabetical order.

That works for me.

Thanks,

Andres




Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag

2017-03-14 Thread Michael Paquier
On Wed, Mar 15, 2017 at 1:13 AM, Robert Haas  wrote:
> On Mon, Mar 13, 2017 at 8:51 PM, Michael Paquier
>  wrote:
>>> I agree with that, but I propose the attached version instead.  It
>>> seems cleaner to have the entire test for setting BM_PERMANENT in one
>>> place rather than splitting it up as you did.
>>
>> Fine for me. You may want to update the comment of BM_PERMANENT in
>> buf_internals.h as Artur has mentioned upthread. For example by just
>> adding "and init forks".
>
> OK, done, and back-patched all the way.

Thanks.
-- 
Michael




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Tom Lane
Andres Freund  writes:
> [ new patch versions ]

About to leave, but I had time to read 0003:

It seems bizarre that you chose to spell the new configure symbol as
HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO, especially so
when the comment for PGAC_C_COMPUTED_GOTO alleges it's the latter.
Also, being a neatnik, I would insert both the definition and the
call of that macro between PGAC_C_BUILTIN_UNREACHABLE and
PGAC_C_VA_ARGS, keeping it more or less in alphabetical order.

regards, tom lane




Re: [HACKERS] multivariate statistics (v25)

2017-03-14 Thread David Fetter
On Tue, Mar 14, 2017 at 07:10:49PM -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > I tried patch 0002 today and again there are conflicts, so I rebased and
> > fixed the merge problems.
> 
> ... and attached the patch.

Is the plan to convert completely from "multivariate" to "extended?"
I ask because I found a "multivariate" in there.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] [PATCH] Generic type subscripting

2017-03-14 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
> [ generic_type_subscription_v7.patch ]

I looked through this a bit.

I think that the basic design of having a type-specific parse analysis
function that returns a constructed SubscriptingRef node is fine.

I'm not totally excited about the naming you've chosen though,
particularly the function names "array_subscripting()" and
"jsonb_subscripting()" --- those are too generic, and a person coming to
them for the first time would probably expect that they actually execute
subscripting, when they do no such thing.  Names like
"array_subscript_parse()" might be better.  Likewise the name of the
new pg_type column doesn't really convey what it does, though I suppose
"typsubscriptparse" is too much of a mouthful.  "typsubparse" seems short
enough but might be confusing too.

I wonder also if we should try to provide some helper functions rather
than expecting every data type to know all there is to know about parsing
and execution of subscripting.  Not sure what would be helpful, however.

One thing that needs more thought for sure is the nested-assignment case
(the logic around isAssignmentIndirectionExpr) --- the design you've got
here implies that *every* container-like datatype would need to duplicate
that logic, and I don't think we want to go there.

The documentation needs a lot of work of course, and I do not think
you're doing either yourself or anybody else any favors with the proposed
additions to src/tutorial/.  You'll never be sure that that stuff even
compiles let alone accurately represents what people need to do.  Actual
running code is much better.  It may be that jsonb_subscript is enough
of an extension example, but perhaps adding a subscripting feature to some
contrib module would be better.

Aren't SBS_VALIDATION and SBS_EXEC just hangovers from the previous
design?  They're still in parse_node.h, and they're still mentioned in
the docs, but I don't see them used in actual code anywhere.
get_slice_arguments seems to be a hangover as well, which is good
because it's mighty ugly and undocumented.

It seems rather silly for ExecEvalSubscriptingRef to be palloc'ing some
per-subscript arrays each time through when it's got other arrays that are
still of fixed size MAXDIM.  I can believe that it might be a good idea
to increase or remove the MAXDIM limit, but this doesn't do it.  In any
case, you don't want to add the overhead of a couple of pallocs per
execution.  Using OidFunctionCall2 is even worse: that's adding a system
catalog lookup per execution.  You need to be caching the function address
as is done for regular function and operator calls.  (I take it you've not
done performance testing yet.)
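
For comparison, the usual pattern is to resolve the function once when the
ExprState is built and call through the cached FmgrInfo afterwards (a sketch;
the field names here are assumed, not taken from the patch):

    /* once, at expression-initialization time */
    fmgr_info(sbsref->refevalfunc, &state->sbs_finfo);      /* names assumed */

    /* per execution: no catalog lookup, just a call through the cached FmgrInfo */
    result = FunctionCall2(&state->sbs_finfo, containerDatum, subscriptDatum);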

I'm not really finding this to be an improvement:
- errmsg("array subscript in assignment must 
not be null")));
+ errmsg("container subscript in assignment 
must not be null")));
"Container" is going to seem like jargon to users.  Maybe it'd be okay to
drop the word altogether and just say "subscript in assignment must not be
null".  (Another question here is whether every datatype will be on board
with the current rules about null subscripts, or whether we need to
delegate null-handling to the datatype-specific execution function.
I'm not sure; it would complicate the API significantly for what might be
useless flexibility.)

I'm tempted to propose that it'd be a good idea to separate the regular
(varlena) array code paths from the fixed-length-array code paths
altogether, which you could do in this implementation by having separate
execution functions for them.  That would buy back some fraction of
whatever overhead we're adding with the additional level of function call.
Maybe even separate execution functions for the slice and not-slice
cases, though that might be overkill.

I'm not on board with these APPLY_OPERATOR_TO_TYPE macros.  If you
think you have a cute idea for improving the notation in the node support
files, great; submit a patch that changes all of the support functions
at once.  Patches that introduce one or two support functions that look
radically different from all the rest are not acceptable.

Likewise, what you did in places like JumbleExpr is too cute by half.
Just make two separate cases for the two new node types.  You're not
saving enough code to justify the additional intellectual complexity
and maintenance burden of doing it like that.

I do not think that the extra SubscriptingBase data structure is paying
for itself either; you're taking a hit in readability from the extra level
of struct, and as far as I can find it's not buying you one single thing,
because there's no code that operates on a SubscriptingBase argument.
I'd just drop that idea and make two independent struct types, or else
stick with the original ArrayRef design that made one struct serve both
purposes.  (IOW, my feeling that a separate assign struct would be a
readability improvement 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-14 Thread Peter Geoghegan
On Tue, Mar 14, 2017 at 3:10 PM, Peter Geoghegan  wrote:
> We already have BTPageOpaqueData.btpo, a union whose contained type
> varies based on the page being dead. We could just do the same with
> some other field in that struct, and then store epoch there. Clearly
> nobody really cares about most data that remains on the page. Index
> scans just need to be able to land on it to determine that it's dead,
> and VACUUM needs to be able to determine whether or not there could
> possibly be such an index scan at the time it considers recycling.

ISTM that we need all of the fields within BTPageOpaqueData even for
dead pages, actually. The left links and right links still need to be
sane, and the flag bits are needed. Plus, the field that stores an XID
already is clearly necessary. Even if they weren't needed, it would
probably still be a good idea to keep them around for forensic
purposes. However, the page header field pd_prune_xid is currently
unused for indexes, and is the same width as CheckPoint.nextXidEpoch
(the extra thing we might want to store -- the epoch).

Maybe you could store the epoch within that field when B-Tree VACUUM
deletes a page, and then compare that within _bt_page_recyclable(). It
would come before the existing XID comparison in that function. One
nice thing about this idea is that pd_prune_xid will be all-zero for
index pages from the current format, so there is no need to take
special care to make sure that databases that have undergone
pg_upgrade don't break.
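
As a sketch of that idea (not from any posted patch), the recyclability test
could look roughly like this, assuming page deletion stored the then-current
epoch in pd_prune_xid:

    #include "postgres.h"
    #include "access/nbtree.h"
    #include "access/transam.h"
    #include "storage/bufpage.h"

    /*
     * Consult the epoch recorded at deletion time (hypothetically kept in
     * pd_prune_xid) before falling back to the existing XID comparison
     * against the oldest running xmin.
     */
    static bool
    bt_page_recyclable_with_epoch(Page page, uint32 current_epoch,
                                  TransactionId oldest_xmin)
    {
        BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
        uint32      deletion_epoch = ((PageHeader) page)->pd_prune_xid;

        if (!P_ISDELETED(opaque))
            return false;

        /*
         * Anything deleted two or more epochs ago is more than 2^32
         * transactions in the past, so it is certainly safe to recycle (and
         * the modulo-2^32 comparison below could not be trusted there).  The
         * one-epoch-ago boundary case is glossed over in this sketch.
         */
        if (current_epoch - deletion_epoch > 1)
            return true;

        return TransactionIdPrecedes(opaque->btpo.xact, oldest_xmin);
    }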

-- 
Peter Geoghegan




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Andres Freund
On 2017-03-14 14:19:18 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote:
> >> It would be good to have someone at least read it before pushing, but
> >> I don't think anyone other than you has done so.
> 
> > I'd love for somebody else
> > to look through it, I tried asking multiple times...  Seems like
> > threatening a commit is the best way to get it :P
> 
> I'm willing to look

Cool.


> but the last few messages make it sound like you're not all that close
> to a finished version.

Hm, doesn't seem that far off to me - the latest few rounds of changes
have been largely polishing (naming, fighting w/ pgindent, language
polishing in comments, etc), otherwise I've basically just added a few
lines of optimization after doing more profiling.

Greetings,

Andres Freund




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Douglas Doole
On Tue, Mar 14, 2017 at 3:16 PM Andres Freund  wrote:

> Hm.  Right now ExprState's are allocated in several places - but we
> could easily move that to a central place.  Have a bit of a hard time
> seeing that that branch during *initialization* time is that expensive,
> especially given that previously we allocated a lot of memory separately
> too.
>

I didn't make any comparisons of the cost of the new init against the old
init with this change in particular - I just saw that it made the new init
faster. I also didn't play around to determine if the savings was found in
removing the branch misprediction or inlining or both.

I certainly wouldn't hold up your commit for this, but it's something that
might be worth a second look once the dust has settled.


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Andres Freund
Hi,

On 2017-03-14 22:03:45 +, Douglas Doole wrote:
> I do have one observation based on my experiments with your first version
> of the code. In my tests, I found that expression init becomes a lot more
> expensive in this new model. (That's neither a surprise, nor a
> concern.)

I suspect that's to a good degree because back then it'd often end up
trying the "new" stuff, but then fall back to the old machinery due to
unsupported operations. Which isn't a concern anymore, because now
there's full coverage.

Otherwise the costs aren't, according to my measurements, higher than
before, excepting that some stuff that happened at execution time is now
happening during initialization.


> In particular, the function ExprEvalPushStep() is quite hot. In my
> code I made the following changes:
> 
>   * Declare ExprEvalPushStep() "inline".
>   * Remove the "if (es->steps_alloc == 0)" condition
> from ExprEvalPushStep().
>   * In ExecInitExpr(), add:
>state->steps_alloc = 16;
>state->steps = palloc(sizeof(ExprEvalStep) * state->steps_alloc);
> 
> I found that this cut the cost of initializing the expression by about 20%.
> (Of course, that was on version 1 of your code, so the benefit may well be
> different now.)


Hm.  Right now ExprState's are allocated in several places - but we
could easily move that to a central place.  Have a bit of a hard time
seeing that that branch during *initialization* time is that expensive,
especially given that previously we allocated a lot of memory separately
too.

> On Tue, Mar 14, 2017 at 11:51 AM Andres Freund  wrote:
> 
> > > Hmm. Could we make the instructions variable size? It would allow packing
> > > the small instructions even more tight, and we wouldn't need to obsess
> > over
> > > a particular maximum size for more complicated instructions.
> >
> > That makes jumps a lot more complicated.  I'd experimented with it and
> > given it up as "not worth it".

> Back when I was at IBM, I spent a lot of time doing stuff like this. If you
> want to commit with the fixed size arrays, I'm happy to volunteer to look
> at packing it tighter as a follow-on piece of work. (It was already on my
> list of things to try anyhow.)

Yea, I think experimenting with that is a good idea. I just think this
is complicated enough, and I don't want to add a whole lot more
complexity for very debatable benefit.


> > If we were to try to do so, we'd also
> > not like storing the pointer and enum variants both, since it'd again
> > reduce the density.

> From my experience, it's worth the small loss in density to carry around
> both the pointer and the enum - it makes debugging so much easier.

I found it annoying enough to print the instruction itself, due to the
union - so printing the opcode using a function isn't too bad...

Greetings,

Andres Freund




Re: [HACKERS] [PATCH] Generic type subscripting

2017-03-14 Thread Tom Lane
Peter Eisentraut  writes:
> I see a possible problem here:  This design only allows one subscripting
> function.  But what you'd really want in this case is at least two: one
> taking an integer type for selecting by array index, and one taking text
> for selecting by field name.

No, I think you're missing the point: there is just one function per
datatype for *parse analysis* of a subscripting operation applied to
that datatype.  What it chooses to allow as subscript type, and what
function it determines will be used at execution, is up to it.

I agree that the given jsonb_subscript is failing to handle the
subscript-an-array-with-an-integer case, but that's a datatype-specific
shortcoming not a failure of the overall design.

I would guess that what we really want for jsonb is the ability to
intermix integer and text subscripts, so that
 jsonbcol['foo'][42]['bar']
would extract the "bar" field of an object in position 42 of an
array in field "foo" of the given jsonb value.  So you probably
end up still having one jsonb execution function, not two, and
it would have different code paths depending on whether it sees
the type of the next subscript expression to be integer or text.
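
Purely to illustrate that dispatch -- this is not code from the posted patch
-- a sketch using the existing jsonb container API; the function name and
signature are made up:

    #include "postgres.h"
    #include "catalog/pg_type.h"
    #include "fmgr.h"
    #include "utils/jsonb.h"

    /*
     * One execution routine that inspects the type of each subscript and
     * picks object-key or array-index traversal accordingly.  Either call
     * returns NULL for "no such field/element", which is not an error.
     * Negative array indexes are ignored for brevity.
     */
    static JsonbValue *
    jsonb_fetch_one_subscript(Jsonb *jb, Oid subscripttype, Datum subscript)
    {
        if (subscripttype == TEXTOID)
        {
            text       *key = DatumGetTextPP(subscript);
            JsonbValue  k;

            k.type = jbvString;
            k.val.string.val = VARDATA_ANY(key);
            k.val.string.len = VARSIZE_ANY_EXHDR(key);

            return findJsonbValueFromContainer(&jb->root, JB_FOBJECT, &k);
        }
        else
        {
            int32       idx = DatumGetInt32(subscript);

            return getIthJsonbValueFromContainer(&jb->root, (uint32) idx);
        }
    }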

> It looks like your jsonb subscripting function just returns null if it
> can't find a field, which is also a bit dubious.

Nah, that seems fine.  Our precedent for standard array subscripting is to
return NULL for out-of-range subscripts, and the jsonb -> operator also
returns NULL if there's no such field.  It would be rather surprising if
jsonb subscripting threw an error instead; and I do not think it would be
more useful.

regards, tom lane




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-14 Thread Peter Geoghegan
On Tue, Mar 14, 2017 at 2:48 PM, Peter Geoghegan  wrote:
> I think that that's safe, but it is a little disappointing that it
> does not allow us to skip work in the case that you really had in mind
> when writing the patch. Better than nothing, though, and perhaps still
> a good start. I would like to hear other people's opinions.

Hmm. So, the SQL-callable function txid_current() exports a 64-bit
XID, which includes an "epoch". If PostgreSQL used these 64-bit XIDs
natively, we'd never need to freeze. Obviously we don't do that
because the per-tuple overhead of visibility information is already
high, and that would make it much worse. But, we can easily afford the
extra overhead in a completely dead page. Maybe we can overcome the
_bt_page_recyclable() limitation by being cleverer about how it
determines if recycling is safe -- it can examine epoch, too. This
would also be required in the similar function
vacuumRedirectAndPlaceholder(), which is a part of SP-GiST.

We already have BTPageOpaqueData.btpo, a union whose contained type
varies based on the page being dead. We could just do the same with
some other field in that struct, and then store epoch there. Clearly
nobody really cares about most data that remains on the page. Index
scans just need to be able to land on it to determine that it's dead,
and VACUUM needs to be able to determine whether or not there could
possibly be such an index scan at the time it considers recycling.

Does anyone see a problem with that?

-- 
Peter Geoghegan




Re: [HACKERS] multivariate statistics (v25)

2017-03-14 Thread Alvaro Herrera
Alvaro Herrera wrote:
> I tried patch 0002 today and again there are conflicts, so I rebased and
> fixed the merge problems.

... and attached the patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2c2da2a..b5c4129 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -296,6 +296,11 @@
  
 
  
+  pg_statistic_ext
+  extended planner statistics
+ 
+
+ 
   pg_subscription
   logical replication subscriptions
  
@@ -4223,6 +4228,98 @@
   
  
 
+ 
+  pg_statistic_ext
+
+  
+   pg_statistic_ext
+  
+
+  
+   The catalog pg_statistic_ext
+   holds extended planner statistics.
+  
+
+  
+   pg_statistic_ext Columns
+
+   
+
+ 
+  Name
+  Type
+  References
+  Description
+ 
+
+
+
+
+ 
+  starelid
+  oid
+  pg_class.oid
+  The table that the described columns belongs to
+ 
+
+ 
+  staname
+  name
+  
+  Name of the statistic.
+ 
+
+ 
+  stanamespace
+  oid
+  pg_namespace.oid
+  
+   The OID of the namespace that contains this statistic
+  
+ 
+
+ 
+  staowner
+  oid
+  pg_authid.oid
+  Owner of the statistic
+ 
+
+ 
+  staenabled
+  char[]
+  
+  
+An array with the modes of the enabled statistic types, encoded as
+d for ndistinct coefficients.
+  
+ 
+
+ 
+  stakeys
+  int2vector
+  pg_attribute.attnum
+  
+   This is an array of values that indicate which table columns this
+   statistic covers. For example a value of 1 3 would
+   mean that the first and the third table columns make up the statistic key.
+  
+ 
+
+ 
+  standistinct
+  pg_ndistinct
+  
+  
+   N-distinct coefficients, serialized as pg_ndistinct type.
+  
+ 
+
+
+   
+  
+ 
+
  
   pg_namespace
 
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index b73c66b..76955e5 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
rows = (outer_cardinality * inner_cardinality) * selectivity
 
  
 
+ 
+  Extended Statistics
+
+  
+   extended statistics
+   planner
+  
+
+  
+   The examples presented in  used
+   statistics about individual columns to compute selectivity estimates.
+   When estimating conditions on multiple columns, the planner assumes
+   independence of the conditions and multiplies the selectivities. When the
+   columns are correlated, the independence assumption is violated, and the
+   estimates may be off by several orders of magnitude, resulting in poor
+   plan choices.
+  
+
+  
+   The examples presented below demonstrate such estimation errors on simple
+   data sets, and also how to resolve them by creating extended statistics
+   using CREATE STATISTICS command.
+  
+
+  
+   Let's start with a very simple data set - a table with two columns,
+   containing exactly the same values:
+
+
+CREATE TABLE t (a INT, b INT);
+INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);
+ANALYZE t;
+
+
+   As explained in , the planner can determine
+   cardinality of t using the number of pages and
+   rows that is looked up in pg_class:
+
+
+SELECT relpages, reltuples FROM pg_class WHERE relname = 't';
+
+ relpages | reltuples
+----------+-----------
+       45 |     10000
+
+
+   The data distribution is very simple - there are only 100 distinct values
+   in each column, uniformly distributed.
+  
+
+  
+   The following example shows the result of estimating a WHERE
+   condition on the a column:
+
+
+EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1;
+                                            QUERY PLAN
+--------------------------------------------------------------------------------------------------
+ Seq Scan on t  (cost=0.00..170.00 rows=100 width=8) (actual time=0.031..2.870 rows=100 loops=1)
+   Filter: (a = 1)
+   Rows Removed by Filter: 9900
+ Planning time: 0.092 ms
+ Execution time: 3.103 ms
+(5 rows)
+
+
+   The planner examines the condition and computes the estimate using
+   eqsel, the selectivity function for =, and
+   statistics stored in the pg_stats table. In this case
+   the planner estimates the condition matches 1% rows, and by comparing
+   the estimated and actual number of rows, we see that the estimate is
+   very accurate (in fact exact, as the table is very small).
+ 
+
+  
+   Adding a condition on the second column results in the following plan:
+
+
+EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 1;
+  QUERY PLAN   

+---
+ Seq Scan on t  

Re: [HACKERS] Partitioned tables and relfilenode

2017-03-14 Thread Robert Haas
On Mon, Mar 13, 2017 at 6:24 AM, Amit Langote
 wrote:
> The previous proposal was for expand_inherited_rtentry to not create RT
> entries and AppendRelInfo's for the non-leaf tables, but I think that
> doesn't work, as I tried to explain above.  We need RTEs because that
> seems to be the only way currently for informing the executor of the
> non-leaf tables. We need AppendRelInfo's to store the RT indexes of those
> RTEs for the latter planning steps to collect them in partitioned_rels
> mentioned above. So with the latest patch, we do create the RT entry and
> AppendRelInfo for non-leaf tables.  AppendRelInfo created in this case is
> a minimal one; only parent_relid and child_relid are valid.  To make the
> latter planning steps ignore these minimal AppendRelInfo's, every
> AppendRelInfo is now marked with child_relkind.  Only
> set_append_rel_pathlist() and inheritance_planner() process them to
> collect the child_relid into the partitioned_rels list to be stored in
> AppendPath/MergeAppendPath and ModifyTablePath, respectively.

I see your point, but I still think this kind of stinks.  You've got
all kinds of logic that is now conditional on child_is_partitioned,
and that seems like a recipe for bugs and future maintenance
difficulties.  It would be much nicer if we could come up with a
design that doesn't create the AppendRelInfo in the first place,
because then all of that stuff could just work.  Can we get away with
creating an RTE for each partitioned table (other than the parent,
perhaps; for that one it would be nice to use the inh-flagged RTE we
already have) but NOT creating an AppendRelInfo to go with it?  If
we've got the list of RTIs for the new RTEs associated with the append
path in some other form, can't we get by without also having an
AppendRelInfo to hold onto that translation?

The comments in InitPlan() explain why locks on result relations are
taken in that function directly rather than during the ExecInitNode
pass over the tree; it's because we need to make sure we take the
strongest lock on any given relation first.  But the changes in
ExecInitAppend and ExecInitMergeAppend are problematic in that regard;
some AccessShareLocks may already have been taken, and now we're
taking locks that in some case may be RowShareLock, which could cause
a lock upgrade.  Remember that there's no reason that you couldn't
join a table to one of its own partitions, or something like that.  I
think you need to try to jigger things so that InitPlan() takes all
locks stronger than AccessShareLock that are required anywhere in the
query, and then other nodes can take anything at AccessShareLock that
they need.

I think that eliding the Append node when there's only one child may
be unsafe in the case where the child's attribute numbers are
different from the parent's attribute numbers.  I remember Tom making
some comment about this when I was working on MergeAppend, although I
no longer remember the specific details.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] multivariate statistics (v25)

2017-03-14 Thread Alvaro Herrera
I tried patch 0002 today and again there are conflicts, so I rebased and
fixed the merge problems.  I also changed a number of minor things, all
AFAICS cosmetic in nature:

* moved src/backend/statistics/common.h to
  src/include/statistics/common.h, as previously commented.  I also took
  out postgres.h and most of the includes; instead, put all these into
  each .c source file.  That aligns with our established practice.
  I also removed two prototypes that should actually be in stats.h.
  I think statistics/common.h should be further renamed to
  statistics/stats_ext_internal.h, and statistics/stats.h to something
  different though I don't know what ATM.

* Moved src/include/utils/stats.h to src/include/statistics, clean it up
  a bit.

* Moved some structs from analyze.c into statistics/common.h, removing
  some duplication; have analyze.c include that file.

* renamed src/test/regress/sql/mv_ndistinct.sql to stats_ext.sql, to
  collect all ext.stats. related tests in a single file, instead of
  having a large number of them.  I also added one test that drops a
  column, per David Rowley's reported failure, but I didn't actually fix
  the problem nor add it to the expected file.  (I'll follow up with
  that tomorrow, if Tomas doesn't beat me to it).  Also, put the test in
  an earlier parallel test group, 'cause I see no reason to put it last.

* A bunch of stylistic changes.

The added tests pass (or they passed before I added the drop column
tests; not a surprise really that they pass, since I didn't touch
anything functionally), but they aren't terribly exhaustive at the stage
of the first patch in the series.

I didn't get around to addressing all of David Rowley's input.  Also I
didn't try to rebase the remaining patches in the series on top of this
one.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Douglas Doole
Andres, sorry I haven't had a chance to look at this great stuff you've
been doing. I've wanted to get to it, but work keeps getting in the way. ;-)

I do have one observation based on my experiments with your first version
of the code. In my tests, I found that expression init becomes a lot more
expensive in this new model. (That's neither a surprise, nor a concern.) In
particular, the function ExprEvalPushStep() is quite hot. In my code I made
the following changes:

  * Declare ExprEvalPushStep() "inline".
  * Remove the "if (es->steps_alloc == 0)" condition
from ExprEvalPushStep().
  * In ExecInitExpr(), add:
   state->steps_alloc = 16;
   state->steps = palloc(sizeof(ExprEvalStep) * state->steps_alloc);

I found that this cut the cost of initializing the expression by about 20%.
(Of course, that was on version 1 of your code, so the benefit may well be
different now.)
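
Put together, the suggestion amounts to something like the sketch below
(assuming the ExprState/ExprEvalStep definitions from the patch under
review; the field names follow those mentioned above):

    /* Pre-size the step array where the ExprState is built, so the push
     * routine loses its empty-array branch and can be inlined. */
    static inline void
    ExprEvalPushStep(ExprState *es, const ExprEvalStep *s)
    {
        if (es->steps_len == es->steps_alloc)
        {
            es->steps_alloc *= 2;
            es->steps = repalloc(es->steps,
                                 sizeof(ExprEvalStep) * es->steps_alloc);
        }
        memcpy(&es->steps[es->steps_len++], s, sizeof(ExprEvalStep));
    }

    /* ... and in ExecInitExpr(), right after the ExprState node is created: */
    state->steps_alloc = 16;
    state->steps = palloc(sizeof(ExprEvalStep) * state->steps_alloc);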

On Tue, Mar 14, 2017 at 11:51 AM Andres Freund  wrote:

> > Hmm. Could we make the instructions variable size? It would allow packing
> > the small instructions even more tight, and we wouldn't need to obsess
> over
> > a particular maximum size for more complicated instructions.
>
> That makes jumps a lot more complicated.  I'd experimented with it and
> given it up as "not worth it".


Back when I was at IBM, I spent a lot of time doing stuff like this. If you
want to commit with the fixed size arrays, I'm happy to volunteer to look
at packing it tighter as a follow-on piece of work. (It was already on my
list of things to try anyhow.)


> If we were to try to do so, we'd also
> not like storing the pointer and enum variants both, since it'd again
> reduce the density.
>

From my experience, it's worth the small loss in density to carry around
both the pointer and the enum - it makes debugging so much easier.

- Doug
Salesforce


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-14 Thread Peter Geoghegan
On Thu, Mar 9, 2017 at 8:51 AM, Masahiko Sawada  wrote:
>> pg_class.relfrozenxid is InvalidTransactionId for indexes because
>> indexes generally don't store XIDs. This is the one exception that I'm
>> aware of, presumably justified by the fact that it's only for
>> recyclable pages anyway, and those are currently *guaranteed* to get
>> recycled quickly. In particular, they're guaranteed to get recycled by
>> the next VACUUM. They may be recycled in the course of anti-wraparound
>> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
>> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
>> case that this patch proposes to have us skip touching indexes for.
>>
>
> To prevent this, I think we need to not skip the lazy_cleanup_index
> when an anti-wraparound vacuum (aggressive = true) runs, even if the number
> of scanned pages is less than the threshold. This can ensure that
> pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
> minimal pain. Even if the btree dead page is left behind, the subsequent
> modification makes garbage on the heap and then autovacuum recycles that
> page while index vacuuming (lazy_vacuum_index).

Sorry for the delay in getting back to you.

I think that that's safe, but it is a little disappointing that it
does not allow us to skip work in the case that you really had in mind
when writing the patch. Better than nothing, though, and perhaps still
a good start. I would like to hear other people's opinions.
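
In code form, the rule being discussed would amount to something like this
sketch near the end of the heap scan in vacuumlazy.c; the threshold and the
skip flag are hypothetical names, while lazy_cleanup_index() and the
LVRelStats fields are the existing ones:

    /* Skip the per-index cleanup pass only for non-aggressive vacuums
     * that scanned fewer pages than the proposed threshold allows. */
    bool        skip_index_cleanup =
        !aggressive &&
        vacrelstats->scanned_pages < vacuum_cleanup_index_threshold;

    if (!skip_index_cleanup)
    {
        for (i = 0; i < nindexes; i++)
            lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
    }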

-- 
Peter Geoghegan




Re: [HACKERS] Logical replication existing data copy

2017-03-14 Thread Peter Eisentraut
On 3/14/17 12:12, Petr Jelinek wrote:
>> Committing this in chunks makes sense anyway.
>>
>> I've attached a slightly updated version that makes pg_recvlogical skip slot
>> export. The second patch is unchanged, use the copy from the
>> immediately prior mail.
>>
> 
> Okay, I merged your changes in.
> 
> Here is that merge, plus the updated main patch that replaces all the
> SQL interfaces in libpqwalreceiver.c with a single exec one which uses
> tuplestore to send any tuple results back.
> 
> For correct working in every situation, it still needs to snapbuild
> fixes and also the logical replication origin tracking fix.

Committed 0001.  Will look at 0002 next.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] scram and \password

2017-03-14 Thread Tom Lane
Jeff Janes  writes:
> On Tue, Mar 14, 2017 at 8:40 AM, Tom Lane  wrote:
>> Why exactly would anyone want "md5 only"?  I should think that "scram
>> only" is a sensible pg_hba setting, if the DBA feels that md5 is too
>> insecure, but I do not see the point of "md5 only" in 2017.  I think
>> we should just start interpreting that as "md5 or better".

> Without md5-only, a user who uses \password to change their password from a
> newer client would lock themselves out of connecting again from older
> clients.  As a conscious decision (either of the DBA or the user) that
> would be OK, but to have it happen by default would be unfortunate.

That's a point, but what it implies is that \password needs some input
from the user about whether to generate a SCRAM or MD5-hashed password.
It would be a fatal error to try to drive that off the auth method
that had been used for the current connection, even if \password had a
way to find that out.  By definition, your concern is about clients
other than the current one, which might well be coming in from other
addresses and getting challenges based on other pg_hba entries.  So
you can't say that "I came in on a SCRAM connection" is sufficient
reason to generate a SCRAM password.

In short, I don't think that argument refutes my position that "md5"
in pg_hba.conf should be understood as allowing SCRAM passwords too.

regards, tom lane




[HACKERS] improve comments of snapbuild.c

2017-03-14 Thread Erik Rijkers

Improvements (grammar/typos) in the comments in snapbuild.c

To be applied to master.

thanks,

Erik Rijkers

--- src/backend/replication/logical/snapbuild.c.orig	2017-03-14 21:53:42.590196415 +0100
+++ src/backend/replication/logical/snapbuild.c	2017-03-14 21:57:57.906539208 +0100
@@ -34,7 +34,7 @@
  * xid. That is we keep a list of transactions between snapshot->(xmin, xmax)
  * that we consider committed, everything else is considered aborted/in
  * progress. That also allows us not to care about subtransactions before they
- * have committed which means this modules, in contrast to HS, doesn't have to
+ * have committed which means this module, in contrast to HS, doesn't have to
  * care about suboverflowed subtransactions and similar.
  *
  * One complexity of doing this is that to e.g. handle mixed DDL/DML
@@ -82,7 +82,7 @@
  * Initially the machinery is in the START stage. When an xl_running_xacts
  * record is read that is sufficiently new (above the safe xmin horizon),
  * there's a state transition. If there were no running xacts when the
- * runnign_xacts record was generated, we'll directly go into CONSISTENT
+ * running_xacts record was generated, we'll directly go into CONSISTENT
  * state, otherwise we'll switch to the FULL_SNAPSHOT state. Having a full
  * snapshot means that all transactions that start henceforth can be decoded
  * in their entirety, but transactions that started previously can't. In
@@ -273,7 +273,7 @@
 /*
  * Allocate a new snapshot builder.
  *
- * xmin_horizon is the xid >=which we can be sure no catalog rows have been
+ * xmin_horizon is the xid >= which we can be sure no catalog rows have been
  * removed, start_lsn is the LSN >= we want to replay commits.
  */
 SnapBuild *
@@ -1840,7 +1840,7 @@
 	char		path[MAXPGPATH];
 
 	/*
-	 * We start of with a minimum of the last redo pointer. No new replication
+	 * We start off with a minimum of the last redo pointer. No new replication
 	 * slot will start before that, so that's a safe upper bound for removal.
 	 */
 	redo = GetRedoRecPtr();
@@ -1898,7 +1898,7 @@
 			/*
 			 * It's not particularly harmful, though strange, if we can't
 			 * remove the file here. Don't prevent the checkpoint from
-			 * completing, that'd be cure worse than the disease.
+			 * completing, that'd be a cure worse than the disease.
 			 */
 			if (unlink(path) < 0)
 			{



Re: [HACKERS] background sessions

2017-03-14 Thread Pavel Stehule
2017-03-14 19:08 GMT+01:00 Robert Haas :

> On Tue, Mar 14, 2017 at 3:31 AM, Pavel Stehule 
> wrote:
> > Very often a strategy can be to recheck the parent process in some waiting
> > cycles. It should not impact performance.
>
> I think that's going to be hard to arrange, and I think it isn't
> necessary.  If the leader wants to arrange for the worker to die when
> it exits, it can use TerminateBackgroundWorker() from a
> PG_ENSURE_ERROR_CLEANUP block or on_shmem_exit hook.
>
> > I'm afraid that long waits in a bg process are quite probable with this
> > patch - and then it is probable that somebody will use
> > pg_terminate_backend. This situation should not have to end with a server
> > restart.
>
> I don't understand.  The only way you'd need a server restart is if a
> background process wasn't responding to SIGTERM, and that's a bug
> independent of anything this patch does.  It would be cause by the
> background process not doing CHECK_FOR_INTERRUPTS() or the moral
> equivalent regularly.
>

It is a bug, and I don't know if it is this extension's bug or a general bug.

There is no adequate cleanup after killing.

How can pg_cancel_backend be implemented on a background process if there are
no CHECK_FOR_INTERRUPTS() calls?

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

2017-03-14 Thread Tom Lane
Pritam Baral  writes:
> The topic has been previously discussed[0] on the -performance mailing list,
> about four years ago.
> In that thread, Tom suggested[0] the planner could be made to "expand
> "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something
> similar to the index LIKE optimization (ie, the "special operator" stuff in
> indxpath.c)".

> This patch tries to do exactly that.

I took a quick look through this, and have some thoughts ---

* In match_special_index_operator, there are two switch statements and
you've added a clause to only one of them.  In the second one, you need to
add a check that you're working with a btree index.  I'd expect the patch
as-is to fall over if an "indexkey <@ range" clause were matched to a hash
index.

* You're failing to account for the case of "range @> indexkey", which
really this ought to work for.  That would take a bit of fooling around
with the structure of match_special_index_operator to allow indexkey on
the right, but we've foreseen since the beginning that that would someday
be necessary.  Looks like today is that day.

* The first bit in range_elem_contained_quals will fall over for an
indexkey that is an expression rather than a simple Var.  Probably
you should just be using exprType() instead.  (Not sure about whether
that's sufficient in domain cases, though.)  Or actually, why look at
that at all?  Seems like what you want is to look at the RHS input,
ie the range value, and get the relevant element datatype from it.
That would correspond to what will happen if the <@ operator executes
normally: elem_contained_by_range does not consult the type of its LHS.

* The "return NIL" for an empty range looks pretty dubious.  Even if it
fails to fail altogether, it's not doing what we really want, which is to
signal that the condition cannot succeed so we needn't search the index.
Maybe what we should do in that case is generate an "indexkey = NULL"
qual.

* Likewise, if the range is infinite, you're just returning NIL and that's
leaving something on the table.  Probably worth generating "indexkey IS
NOT NULL" in that case; it's not going to help much in typical usage, but
it would prevent scanning nulls if there are a lot of them in the index.

* elog(ERROR) doesn't return, so stuff like this is not idiomatic:

+if (opr2oid == InvalidOid)
+{
+elog(ERROR, "no <= operator for opfamily %u", opfamily);
+return NIL;
+}

It'd be sufficient to do

+if (opr2oid == InvalidOid)
+elog(ERROR, "no <= operator for opfamily %u", opfamily);

* You're not bothering to insert any inputcollid into the generated
comparison operator nodes.  I'm not sure why that fails to fall over
for text comparisons (if indeed it does fail ...) but it's wrong.
Use the range type's collation there.

* It's sort of annoying that the whole thing only works for a Const range
value.  A different approach you might think about is to make this work
more like ScalarArrayOp, ie we pass the qual through to btree as-is and
teach relevant functions in access/nbtree/ how to extract index bound
conditions from the range datum at runtime.  That would likely end up
being a significantly larger patch, though, so you might reasonably
conclude it's not worth the effort.

regards, tom lane




Re: [HACKERS] scram and \password

2017-03-14 Thread Jeff Janes
On Tue, Mar 14, 2017 at 8:40 AM, Tom Lane  wrote:

> Joe Conway  writes:
>
> > I was also thinking about that. Basically a primary method and a
> > fallback. If that were the case, a gradual transition could happen, and
> > if we want \password to enforce best practice it would be ok.
>
> Why exactly would anyone want "md5 only"?  I should think that "scram
> only" is a sensible pg_hba setting, if the DBA feels that md5 is too
> insecure, but I do not see the point of "md5 only" in 2017.  I think
> we should just start interpreting that as "md5 or better".
>

Without md5-only, a user who uses \password to change their password from a
newer client would lock themselves out of connecting again from older
clients.  As a conscious decision (either of the DBA or the user) that
would be OK, but to have it happen by default would be unfortunate.

Cheers,

Jeff


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-14 Thread Kevin Grittner
On Tue, Mar 14, 2017 at 6:00 AM, DEV_OPS  wrote:
> On 3/14/17 17:34, Mengxing Liu wrote:

> The worst problems have been
> seen with 32 or more cores on 4 or more sockets with a large number
> of active connections.  I don't know whether you have access to a
> machine capable of putting this kind of stress on it (perhaps at
> your university?), but if not, the community has access to various
> resources we should be able to schedule time on.
 There is a NUMA machine ( 120 cores, 8 sockets) in my lab.
>>> Fantastic!  Can you say a bit more about the architecture and OS?
>>
>> Intel(R) Xeon(R) CPU at 2.3GHz, with 1TB physical DRAM and 1.5 TB
>> SSD, running Ubuntu 14.04, Kernel 3.19.
>> I guess NUMA is disabled in BIOS, but I will check that.

I'm not sure what it would mean to "disable" NUMA -- if the CPU
chips are each functioning as memory controller for a subset of the
RAM you will have non-uniform memory access speeds from any core to
different RAM locations.  You can switch it to "interleaved" access,
so the speed of access from a core to any logical memory address
will be rather random, rather than letting the OS try to arrange
things so that processes do most of their access to memory that is
faster for them.  It is generally better to tune NUMA in the OS than
to randomize things at the BIOS level.

>> However, there is only one NIC, so network could be the
>> bottleneck if we use too many cores?

Well, if we run the pgbench client on the database server box, the
NIC won't matter at all.  If we move the client side to another box,
I still think that when we hit this problem, it will dwarf any
impact of the NIC throughput.

> The configuration is really cool. For the SSD, is it a SATA interface, an
> NVMe interface, or PCIe flash? Different SSD interfaces have different
> performance characteristics.

Yeah, knowing model of SSD, as well as which particular Xeon we're
using, would be good.

> As for NUMA, the affinity issue will impact PostgreSQL
> performance, so please disable it if possible.

I disagree.  (see above)

>> There are several alternative benchmarks. Tony suggested that we
>> should use TPC-E and TPC-DS.

More benchmarks is better, all other things being equal.  Keep in
mind that good benchmarking practice with PostgreSQL generally
requires a lot of setup time (so that we're starting from the exact
same conditions for every run), a lot of run time (so that the
effects of vacuuming, bloat, and page splitting all comes into play,
like it would in the real world), and a lot of repetitions of each
run (to account for variation).  In particular, on a NUMA machine it
is not at all unusual to see bifurcated

>> Personally, I am more familiar with TPC-C.

Unfortunately, the TPC-C benchmark does not create any cycles in the
transaction dependencies, meaning that it is not a great tool for
benchmarking serializable transactions.  I know there are variations
on TPC-C floating around that add a transaction type to do so, but
on a quick search right now, I was unable to find one.

>> And pgbench seems only have TPC-B built-in benchmark.

You can feed it your own custom queries, instead.

>> Well, I think we can easily find the implementations of these
>> benchmarks for PostgreSQL.

Reportedly, some of the implementations of TPC-C are not very good.
There is DBT-2, but I've known a couple of people to look at that
and find that it needed work before they could use it.  Based on the
PostgreSQL versions mentioned on the Wiki page, it has been
neglected for a while:

https://wiki.postgresql.org/wiki/DBT-2

>> The paper you recommended to me used a special benchmark defined
>> by themselves. But it is quite simple and easy to implement.

It also has the distinct advantage that we *know* they created a
scenario where the code we want to tune was using most of the CPU on
the machine.

>> For me, the challenge is profiling the execution. Is there any
>> tools in PostgreSQL to analysis where is the CPU cycles consumed?
>> Or I have to instrument and time by myself?

Generally oprofile or perf is used if you want to know where the
time is going.  This creates a slight dilemma -- if you configure
your build with --enable-cassert, you get the best stack traces and
you can more easily break down execution profiles.  That's especially
true if you disable optimization and don't omit frame pointers.  But
all of those things distort the benchmarks -- adding a lot of time,
and not necessarily in proportion to where the time goes with an
optimized build.

--
Kevin Grittner




Re: [HACKERS] scram and \password

2017-03-14 Thread Jeff Janes
On Tue, Mar 14, 2017 at 3:15 AM, Heikki Linnakangas  wrote:

> On 03/14/2017 04:47 AM, Tom Lane wrote:
>
>> Robert Haas  writes:
>>
>>> I'm not talking about changing the default, just having it be possible
>>> to use \password with the new system as it was with the old, whatever
>>> exactly we think that means.
>>>
>>
>> Seems to me the intended behavior of \password is to use the best
>> available practice.  So my guess is that it ought to use SCRAM when
>> talking to a >= 10.0 server.  What the previous password was ought
>> to be irrelevant, even if it could find that out which it shouldn't
>> be able to IMO.
>>
>
> If the server isn't set up to do SCRAM authentication, i.e. there are no
> "scram" entries in pg_hba.conf,


The method is not part of the pg_hba matching algorithm, so either the
first match is scram, or it isn't.  There is no fallback to later entries,
as I understand it.


> and you set yourself a SCRAM verifier, you have just locked yourself out
> of the system. I think that's a non-starter. There needs to be some more
> intelligence in the decision.
>

Right.  And you can lock yourself out of the system going the other way, as
well, setting a md5 password when scram is in pg_hba.  Which is how I ended
up starting this thread.


>
> It would be a lot more sensible, if there was a way to specify in
> pg_hba.conf, "scram-or-md5". We punted on that for PostgreSQL 10, but
> perhaps we should try to cram that in, after all.


So the backend for the incipient connection would consult the catalog
pg_authid before responding back to the client with an authentication
method, as opposed to merely pulling it out of pg_hba?

Cheers,

Jeff


Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-14 Thread Andres Freund
On 2017-03-13 00:35:06 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-11 22:14:07 -0500, Tom Lane wrote:
> >> This looks generally sane to me, although I'm not very happy about folding
> >> the "$(MKDIR_P) output_iso" call into pg_isolation_regress_check --- that
> >> seems weird and unlike the way it's done for the regular regression test
> >> case.
> 
> > Yea, not super happy about that either - alternatively we could fold it
> > into pg_regress.
> 
> Yeah, teaching pg_regress to auto-create the --temp-instance directory
> seems perfectly sane from here.

I was thinking about outputdir, not temp-instance.  The latter is
already created:

/* make the temp instance top directory */
make_directory(temp_instance);

Attached is an updated patch that creates outputdir if necessary.  This
is possibly going to trigger a time-of-check-to-time-of-use coverity
warning, but the rest of pg_regress does if(!exists) mkdir() type logic,
so I did the same.
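
The pg_regress.c side of that is presumably just a couple of lines next to
the existing directory handling, along these lines (a sketch; the real hunk
is in the attached patch):

    /* create the output directory if it does not exist yet */
    if (!directory_exists(outputdir))
        make_directory(outputdir);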

Besides the pg_regress change, the only thing I've changed is to remove
the in-line "$(MKDIR_P) output_iso && \".

- Andres
From a63c61ca828d186a374d05db264f95b5b07187f9 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 11 Mar 2017 15:03:18 -0800
Subject: [PATCH] Improve isolation regression tests infrastructure.

Previously if a directory had both isolationtester and plain
regression tests, they couldn't be run in parallel, because they'd
access the same files/directories.  That, so far, only affected
contrib/test_decoding.

Rather than fix that locally in contrib/test_decoding improve
pg_regress_isolation_[install]check to use separate resources from
plain regression tests.

That requires a minor change in pg_regress, namely that the
--outputdir is created if not already existing, that seems like good
idea anyway.

Use the improved helpers even where previously not used.

Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20170311194831.vm5ikpczq52c2...@alap3.anarazel.de
---
 contrib/test_decoding/.gitignore |  5 +++--
 contrib/test_decoding/Makefile   |  7 +--
 src/Makefile.global.in   | 29 ++--
 src/test/isolation/.gitignore|  5 ++---
 src/test/isolation/Makefile  |  8 
 src/test/modules/snapshot_too_old/.gitignore |  2 +-
 src/test/modules/snapshot_too_old/Makefile   |  6 +++---
 src/test/regress/pg_regress.c|  6 +-
 8 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/contrib/test_decoding/.gitignore b/contrib/test_decoding/.gitignore
index 1f95503494..b4903eba65 100644
--- a/contrib/test_decoding/.gitignore
+++ b/contrib/test_decoding/.gitignore
@@ -1,5 +1,6 @@
 # Generated subdirectories
 /log/
-/isolation_output/
-/regression_output/
+/results/
+/output_iso/
 /tmp_check/
+/tmp_check_iso/
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index d2bc8b8350..6c18189d9d 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -5,7 +5,7 @@ PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
 
 # Note: because we don't tell the Makefile there are any regression tests,
 # we have to clean those result files explicitly
-EXTRA_CLEAN = $(pg_regress_clean_files) ./regression_output ./isolation_output
+EXTRA_CLEAN = $(pg_regress_clean_files)
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -42,11 +42,8 @@ REGRESSCHECKS=ddl xact rewrite toast permissions decoding_in_xact \
 	spill slot
 
 regresscheck: | submake-regress submake-test_decoding temp-install
-	$(MKDIR_P) regression_output
 	$(pg_regress_check) \
 	--temp-config $(top_srcdir)/contrib/test_decoding/logical.conf \
-	--temp-instance=./tmp_check \
-	--outputdir=./regression_output \
 	$(REGRESSCHECKS)
 
 regresscheck-install-force: | submake-regress submake-test_decoding temp-install
@@ -56,10 +53,8 @@ regresscheck-install-force: | submake-regress submake-test_decoding temp-install
 ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml
 
 isolationcheck: | submake-isolation submake-test_decoding temp-install
-	$(MKDIR_P) isolation_output
 	$(pg_isolation_regress_check) \
 	--temp-config $(top_srcdir)/contrib/test_decoding/logical.conf \
-	--outputdir=./isolation_output \
 	$(ISOLATIONCHECKS)
 
 isolationcheck-install-force: all | submake-isolation submake-test_decoding temp-install
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 831d39a9d1..e7862016aa 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -545,14 +545,31 @@ TEMP_CONF += --temp-config=$(TEMP_CONFIG)
 endif
 
 pg_regress_locale_flags = $(if $(ENCODING),--encoding=$(ENCODING)) $(NOLOCALE)
+pg_regress_clean_files = results/ regression.diffs regression.out tmp_check/ tmp_check_iso/ log/ output_iso/
 
-pg_regress_check = $(with_temp_install) 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-14 Thread Peter Geoghegan
On Tue, Mar 14, 2017 at 12:19 PM, Alvaro Herrera
 wrote:
> Impressive results.

Agreed.

It seems like an important invariant for WARM is that any duplicate
index values ought to have different TIDs (actually, it's a bit
stricter than that, since btrecheck() cares about simple binary
equality). ISTM that it would be fairly easy to modify amcheck such
that the "items in logical order" check, as well as the similar
"cross-page order" check (the one that detects transposed pages) also
check that this new WARM invariant holds. Obviously this would only
make sense on the leaf level of the index.

You wouldn't have to teach amcheck about the heap, because a TID that
points to the heap can only be duplicated within a B-Tree index
because of WARM. So, if we find that two adjacent tuples are equal,
check if the TIDs are equal. If they are also equal, check for strict
binary equality. If strict binary equality is indicated, throw an
error due to invariant failing.

IIUC, the design of WARM makes this simple enough to implement, and
cheap enough that the additional runtime overhead is well worthwhile.
You could just add this check to the existing checks without changing
the user-visible interface. It seems pretty complementary to what is
already there.
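
A rough sketch of that extra leaf-level check, assuming the two adjacent
index tuples have already been found equal by the scankey comparison; the
function name is made up, and real amcheck code would report the failure
through ereport() with much more detail:

    #include "postgres.h"
    #include "access/itup.h"
    #include "storage/itemptr.h"

    static void
    check_warm_duplicate(IndexTuple cur, IndexTuple next)
    {
        /* Duplicate keys with different heap TIDs are always fine. */
        if (!ItemPointerEquals(&cur->t_tid, &next->t_tid))
            return;

        /*
         * Same key and same TID: under WARM that is only legitimate when the
         * two tuples are not byte-for-byte identical.
         */
        if (IndexTupleSize(cur) == IndexTupleSize(next) &&
            memcmp(cur, next, IndexTupleSize(cur)) == 0)
            elog(ERROR, "duplicate index tuples with equal heap TID are binary identical");
    }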

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Add amcheck extension to contrib.

2017-03-14 Thread Andres Freund
On 2017-03-13 14:09:39 -0700, Andres Freund wrote:
> Hi,
> 
> On 2017-03-13 15:45:01 -0400, Tom Lane wrote:
> > I could be wrong, but the most obvious explanation for this failure is
> > that autovacuum had a lock on the table or index when we looked.
> > Even if that isn't why axolotl failed in this particular case, I think
> > it's dead certain that we will see such failures from time to time
> > if this test script isn't tightened up.  IIUC what the test is trying
> > to look for, I think adding "AND pid = pg_backend_pid()" to this query
> > would be an appropriate fix.
> 
> Yes, that sounds reasonable.  Will do in a bit.  Thanks for noticing.

Pushed.

- Andres




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 7:22 AM, Kuntal Ghosh
 wrote:
> I do have extended localBackendStatusTable with slots for non-backend
> processes. But, I've renamed it as localProcStatusTable since it
> includes all processes. I'll keep the variable name as
> localBackendStatusTable in the updated patch to avoid any confusion.
> I've extended BackendStatusArray to store auxiliary processes.
> Backends use slots indexed in the range from 1 to MaxBackends
> (inclusive), so we use MaxBackends + AuxProcType + 1 as the index of
> the slot for an auxiliary process.

I think the subject of this thread, for which I'm probably to
blame, is bad terminology.  The processes we're talking about exposing
in pg_stat_activity here are really backends, too, I think.  They're
just ... special backends.  So I would tend to avoid any backend ->
process type of renaming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 3:37 PM, Petr Jelinek
 wrote:
> On 14/03/17 20:09, Robert Haas wrote:
>> On Tue, Mar 14, 2017 at 2:56 PM, Petr Jelinek
>>  wrote:
>>> Note that I am not necessarily saying it's better though, just trying to
>>> explain. It definitely has drawbacks, as in order to grant publish on
>>> one table you might be granting lots of privileges on various objects by
>>> granting the role. So for granularity purposes Peter's PUBLISH privilege
>>> for tables sounds better to me.
>>
>> I get that.  If, without the patch, letting user X do operation Y will
>> require either giving user X membership in a role that has many
>> privileges, and with the patch, will require only granting a specific
>> privilege on a specific object, then the latter is obviously far
>> better from a security point of view.
>>
>> However, what I'm not clear about is whether this is a situation
>> that's likely to come up much in practice.  I would have thought that
>> publications and subscriptions would typically be configured by roles
>> with quite high levels of privilege anyway, in which case the separate
>> PUBLISH privilege would rarely be used in practice, and might
>> therefore fail to be worth using up a bit.  I might be missing a
>> plausible scenario in which that's not the case, though.
>
> Yeah that's rather hard to say up front. Maybe the safest action would be to
> give the permission to owners in 10 and revisit a special privilege in 11
> based on feedback?

I think that would be entirely sensible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 8:02 AM, Ashutosh Sharma  wrote:
> Attached is the v6 patch for microvacuum in hash index rebased on top
> of 'v10 patch for WAL in hash index - [1]' and 'v1 patch for WAL
> consistency check for hash index - [2]'.
>
> [1] - 
> https://www.postgresql.org/message-id/CAA4eK1%2Bk5wR4-kAjPqLoKemuHayQd6RkQQT9gheTfpn%2B72o1UA%40mail.gmail.com
> [2] - 
> https://www.postgresql.org/message-id/flat/cagz5qcjlerun_zoo0edv6_y_d0o4tntmper7ivtlbg4rurj...@mail.gmail.com#cagz5qcjlerun_zoo0edv6_y_d0o4tntmper7ivtlbg4rurj...@mail.gmail.com
>
> Also, the patch (mask_hint_bit_LH_PAGE_HAS_DEAD_TUPLES.patch) to mask
> 'LH_PAGE_HAS_DEAD_TUPLES' flag which got added as a part of
> Microvacuum patch is attached with this mail.

Generally, this patch looks like a pretty straightforward adaptation
of the similar btree mechanism to hash indexes, so if it works for
btree it ought to work here, too.  But I noticed a few things while
reading through it.

+/* Get RelfileNode from relation OID */
+rel = relation_open(htup->t_tableOid, NoLock);
+rnode = rel->rd_node;
+relation_close(rel, NoLock);
 itup->t_tid = htup->t_self;
-_hash_doinsert(index, itup);
+_hash_doinsert(index, itup, rnode);

This is an awfully low-level place to be doing something like this.
I'm not sure exactly where this should be happening, but not in the
per-tuple callback.

+/*
+ * If there's nothing running on the standby we don't need to derive a
+ * full latestRemovedXid value, so use a fast path out of here.  This
+ * returns InvalidTransactionId, and so will conflict with all HS
+ * transactions; but since we just worked out that that's zero people,
+ * it's OK.
+ */
+if (CountDBBackends(InvalidOid) == 0)
+return latestRemovedXid;

I see that this comment (and most of what surrounds it) was just
copied from the existing btree example, but isn't there a discrepancy
between the comment and the code?  It says it returns
InvalidTransactionId, but it doesn't.  Also, you dropped the XXX from
the btree original, and the following reachedConsistency check.

+ * Hash Index delete records can conflict with standby queries. You might
+ * think that vacuum records would conflict as well, but we've handled

But they're not called delete records in a hash index.  The function
is called hash_xlog_vacuum_one_page.  The corresponding btree function
is btree_xlog_delete.  So this comment needs a little more updating.

+if (IsBufferCleanupOK(bucket_buf))
+{
+_hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
+  (buf == bucket_buf) ? true : false,
+  hnode);
+if (bucket_buf != buf)
+LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);
+
+if (PageGetFreeSpace(page) >= itemsz)
+break;  /* OK, now we have enough space */
+}

I might be missing something, but I don't quite see why this needs a
cleanup lock on the primary bucket page.  I would think a cleanup lock
on the page we're actually modifying would suffice, and I think if
that's correct it would be a better way to go.  If that's not correct,
then I think the comments need some work.
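
If that's workable, the block above could presumably be simplified roughly
like this (a sketch only; the reduced argument list of
_hash_vacuum_one_page() is hypothetical):

    /* demand the cleanup lock only on the page actually being pruned */
    if (IsBufferCleanupOK(buf))
    {
        _hash_vacuum_one_page(rel, metabuf, buf, hnode);

        if (PageGetFreeSpace(page) >= itemsz)
            break;              /* OK, now we have enough space */
    }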

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Mark Kirkwood

On 15/03/17 06:29, Robert Haas wrote:



Great!  I've committed the latest version of the patch, with some
cosmetic changes.

It would be astonishing if there weren't a bug or two left, but I
think overall this is very solid work, and I think it's time to put
this out there and see how things go.



Awesome, great work!

Cheers

Mark




Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Petr Jelinek
On 14/03/17 20:09, Robert Haas wrote:
> On Tue, Mar 14, 2017 at 2:56 PM, Petr Jelinek
>  wrote:
>> Note that I am not necessarily saying it's better though, just trying to
>> explain. It definitely has drawbacks, as in order to grant publish on
>> one table you might be granting lots of privileges on various objects by
>> granting the role. So for granularity purposes Peter's PUBLISH privilege
>> for tables sounds better to me.
> 
> I get that.  If, without the patch, letting user X do operation Y will
> require either giving user X membership in a role that has many
> privileges, and with the patch, will require only granting a specific
> privilege on a specific object, then the latter is obviously far
> better from a security point of view.
> 
> However, what I'm not clear about is whether this is a situation
> that's likely to come up much in practice.  I would have thought that
> publications and subscriptions would typically be configured by roles
> with quite high levels of privilege anyway, in which case the separate
> PUBLISH privilege would rarely be used in practice, and might
> therefore not be worth using up a privilege bit.  I might be missing a
> plausible scenario in which that's not the case, though.
> 

Yeah, that's rather hard to say up front. Maybe the safest action would be to
give the permission to owners in 10 and revisit a special privilege in 11
based on feedback?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Authentication tests, and plain 'password' authentication with a SCRAM verifier

2017-03-14 Thread Heikki Linnakangas

On 03/14/2017 09:02 PM, Jeff Janes wrote:

It is somewhat disconcerting that the client will send a plain-text
password to a mis-configured (or mal-configured) server, but I don't
think there is anything this patch series can reasonably do about
that.


Yeah. That's one pretty glaring hole with libpq: there's no option to 
disallow insecure authentication mechanisms. That's not new, but now 
that we have SCRAM that's otherwise more secure, it looks worse in 
comparison.


SCRAM also has a nice feature: it provides proof to the client that the
server knew the password (or rather, had a verifier for that password). In
other words, the client knows that it connected to the correct server, not a
fake one. But currently nothing prevents a fake server from simply not doing
SCRAM authentication.


We clearly need something similar to sslmode=require in libpq, to 
tighten that up.



The message returned to the client for the wrong password differs between
pg_hba-set scram and pg_hba-set md5/password methods.  Is that OK?

psql: error received from server in SASL exchange: invalid-proof

psql: FATAL:  password authentication failed for user "test"


Ah yeah, I was on the fence on that one. Currently, the server returns 
the invalid-proof error to the client, as defined in RFC5802. That 
results in that error message from libpq. Alternatively, the server 
could elog(FATAL), like the other authentication mechanisms do, with the 
same message. The RFC allows that behavior too but returning the 
invalid-proof error code is potentially more friendly to 3rd party SCRAM 
implementations.


One option would be to recognize the "invalid-proof" message in libpq and
construct a more informative error message there. We could use the same
wording, "password authentication failed", but it would behave differently
with respect to translation, at least.
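
As a rough sketch of that option (the variable name below is illustrative,
not the actual fe-auth code), libpq could special-case the server-error value
it already receives in the SASL exchange:

/* After extracting the server-error value ("e=...") from the SASL
 * message, map RFC 5802's "invalid-proof" to the familiar wording
 * instead of reporting it verbatim. */
if (strcmp(server_error_value, "invalid-proof") == 0)
    printfPQExpBuffer(&conn->errorMessage,
                      libpq_gettext("password authentication failed for user \"%s\"\n"),
                      conn->pguser);
else
    printfPQExpBuffer(&conn->errorMessage,
                      libpq_gettext("error received from server in SASL exchange: %s\n"),
                      server_error_value);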


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-14 Thread Pavan Deolasee
On Tue, Mar 14, 2017 at 7:19 PM, Alvaro Herrera 
wrote:

> Pavan Deolasee wrote:
>
> > BTW I wanted to share some more numbers from a recent performance test. I
> > thought it's important because the latest patch has fully functional chain
> > conversion code as well as all WAL-logging related pieces are in place
> > too. I ran these tests on a box borrowed from Tomas (thanks!).  This has
> > 64GB RAM and 350GB SSD with 1GB on-board RAM. I used the same test setup
> > that I used for the first test results reported on this thread i.e. a
> > modified pgbench_accounts table with additional columns and additional
> > indexes (one index on abalance so that every UPDATE is a potential WARM
> > update).
> >
> > In a test where table + indexes exceeds RAM, running for 8hrs and
> > auto-vacuum parameters set such that we get 2-3 autovacuums on the table
> > during the test, we see WARM delivering more than 100% TPS as compared to
> > master. In this graph, I've plotted a moving average of TPS and the spikes
> > that we see coincides with the checkpoints (checkpoint_timeout is set to
> > 20mins and max_wal_size large enough to avoid any xlog-based checkpoints).
> > The spikes are more prominent on WARM but I guess that's purely because it
> > delivers much higher TPS. I haven't shown here but I see WARM updates close
> > to 65-70% of the total updates. Also there is significant reduction in WAL
> > generated per txn.
>
> Impressive results.  Labels on axes would improve readability of the chart
> :-)
>
>
Sorry about that. I was desperately searching for an Undo button after hitting
"send" for the very same reason :-) It looks like it's been a few years since I
last used gnuplot.

Just to make it clear, the X-axis is the test duration in seconds and the
Y-axis is a 450s moving average of TPS. BTW, 450 is no magic figure: I
collected stats every 15s and took a moving average of the last 30 samples.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> It's true that as soon as we need another overflow page, that's going to
> >> get dropped beyond the 2^{N+1}-1 point, and the *apparent* size of the
> >> index will grow quite a lot.  But any modern filesystem should handle
> >> that without much difficulty by treating the index as a sparse file.
> 
> > Uh, last I heard we didn't allow or want sparse files in the backend
> > because then we have to handle a possible out-of-disk-space failure on
> > every write.
> 
> For a hash index, this would happen during a bucket split, which would
> need to be resilient against out-of-disk-space anyway.

We wouldn't attempt to use the area of the file which is not yet
allocated except when doing a bucket split?  If that's the case then
this does seem to at least be less of an issue, though I hope we put in
appropriate comments about it.

> >> There may be some work to be done in places like pg_basebackup to
> >> recognize and deal with sparse files, but it doesn't seem like a
> >> reason to panic.
> 
> > Well, and every file-based backup tool out there..
> 
> Weren't you the one leading the charge to deprecate use of file-based
> backup?

No, nor do I see how we would ever be able to deprecate file-based
backups.  If anything, I'd like to see us improve our support for
them.  I'm certainly curious where the notion that I was ever in favor
of deprecating them came from, particularly given all of the effort that
David and I have been pouring into our favorite file-based backup tool
over the past few years.

Thanks!

Stephen




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-14 Thread Alvaro Herrera
Pavan Deolasee wrote:

> BTW I wanted to share some more numbers from a recent performance test. I
> thought it's important because the latest patch has fully functional chain
> conversion code as well as all WAL-logging related pieces are in place
> too. I ran these tests on a box borrowed from Tomas (thanks!).  This has
> 64GB RAM and 350GB SSD with 1GB on-board RAM. I used the same test setup
> that I used for the first test results reported on this thread i.e. a
> modified pgbench_accounts table with additional columns and additional
> indexes (one index on abalance so that every UPDATE is a potential WARM
> update).
> 
> In a test where table + indexes exceeds RAM, running for 8hrs and
> auto-vacuum parameters set such that we get 2-3 autovacuums on the table
> during the test, we see WARM delivering more than 100% TPS as compared to
> master. In this graph, I've plotted a moving average of TPS and the spikes
> that we see coincides with the checkpoints (checkpoint_timeout is set to
> 20mins and max_wal_size large enough to avoid any xlog-based checkpoints).
> The spikes are more prominent on WARM but I guess that's purely because it
> delivers much higher TPS. I haven't shown here but I see WARM updates close
> to 65-70% of the total updates. Also there is significant reduction in WAL
> generated per txn.

Impressive results.  Labels on axes would improve readability of the chart :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-14 Thread Alvaro Herrera
Pavan Deolasee wrote:
> On Tue, Mar 14, 2017 at 7:17 AM, Alvaro Herrera 
> wrote:

> > I have already commented about the executor involvement in btrecheck();
> > that doesn't seem good.  I previously suggested to pass the EState down
> > from caller, but that's not a great idea either since you still need to
> > do the actual FormIndexDatum.  I now think that a workable option would
> > be to compute the values/isnulls arrays so that btrecheck gets them
> > already computed.
> 
> I agree with your complaint about modularity violation. What I am unclear
> is how passing values/isnulls array will fix that. The way code is
> structured currently, recheck routines are called by index_fetch_heap(). So
> if we try to compute values/isnulls in that function, we'll still need
> access EState, which AFAIU will lead to similar violation. Or am I
> mis-reading your idea?

You're right, it's still a problem.  (Honestly, I think the whole idea
of trying to compute a fake index tuple starting from a just-read heap
tuple is a problem in itself; I just wonder if there's a way to do the
recheck that doesn't involve such a thing.)

> I wonder if we should instead invent something similar to IndexRecheck(),
> but instead of running ExecQual(), this new routine will compare the index
> values by the given HeapTuple against given IndexTuple. ISTM that for this
> to work we'll need to modify all callers of index_getnext() and teach them
> to invoke the AM specific recheck method if xs_tuple_recheck flag is set to
> true by index_getnext().

Yeah, grumble, that idea does sound intrusive, but perhaps it's
workable.  What about bitmap indexscans?  AFAICS we already have a
recheck there natively, so we only need to mark the page as lossy, which
we're already doing anyway.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> However, what I'm not clear about is whether this is a situation
> that's likely to come up much in practice.  I would have thought that
> publications and subscriptions would typically be configured by roles
> with quite high levels of privilege anyway, in which case the separate
> PUBLISH privilege would rarely be used in practice, and might
> therefore not be worth using up a privilege bit.  I might be missing a
> plausible scenario in which that's not the case, though.

Right, this is part of my concern also.

Further, PUBLISH, as I understand it, is something of a one-time or at
least reasonably rarely done operation.  This is quite different from a
SELECT privilege which is used on every query against the table and
which may be GRANT'd to user X today and user Y tomorrow and perhaps
REVOKE'd from user X the next day.

What happens when the PUBLISH right is REVOKE'd from the user who did
the PUBLISH in the first place, for example..?

Thanks!

Stephen




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> It's true that as soon as we need another overflow page, that's going to
>> get dropped beyond the 2^{N+1}-1 point, and the *apparent* size of the
>> index will grow quite a lot.  But any modern filesystem should handle
>> that without much difficulty by treating the index as a sparse file.

> Uh, last I heard we didn't allow or want sparse files in the backend
> because then we have to handle a possible out-of-disk-space failure on
> every write.

For a hash index, this would happen during a bucket split, which would
need to be resilient against out-of-disk-space anyway.

>> There may be some work to be done in places like pg_basebackup to
>> recognize and deal with sparse files, but it doesn't seem like a
>> reason to panic.

> Well, and every file-based backup tool out there..

Weren't you the one leading the charge to deprecate use of file-based
backup?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 2:56 PM, Petr Jelinek
 wrote:
> Note that I am not necessarily saying it's better though, just trying to
> explain. It definitely has drawbacks, as in order to grant publish on
> one table you might be granting lots of privileges on various objects by
> granting the role. So for granularity purposes Peter's PUBLISH privilege
> for tables sounds better to me.

I get that.  If, without the patch, letting user X do operation Y will
require either giving user X membership in a role that has many
privileges, and with the patch, will require only granting a specific
privilege on a specific object, then the latter is obviously far
better from a security point of view.

However, what I'm not clear about is whether this is a situation
that's likely to come up much in practice.  I would have thought that
publications and subscriptions would typically be configured by roles
with quite high levels of privilege anyway, in which case the separate
PUBLISH privilege would rarely be used in practice, and might
therefore not be worth using up a privilege bit.  I might be missing a
plausible scenario in which that's not the case, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Stephen Frost
Greetings,

* Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote:
> On 14/03/17 19:47, Robert Haas wrote:
> > On Tue, Mar 14, 2017 at 2:41 PM, Petr Jelinek
> >  wrote:
> >> My understanding of what Stephen is proposing is, you have "ownerA" of
> >> tableA and "ownerB" of tableB, then you want role "publisher" to be able
> >> to publish those, so you simply grant it the "ownerA" and "ownerB"
> >> roles. Obviously that might in many situations mean that the "publisher"
> >> role potentially also gets sweeping privileges to other tables which may
> >> not be desirable.
> > 
> > I didn't hear Stephen propose that "publish" should be a
> > role-attribute, and I don't understand why that would be a good idea.
> > Presumably, we don't want unprivileged users to be able to fire up
> > logical replication because that involves making connections to other
> > systems from the PostgreSQL operating system user's account, and that
> > should be a privileged operation.  But that's the subscriber side, not
> > the publisher side.
> > 
> > I don't otherwise follow Stephen's argument.  It seems like he's
> > complaining that PUBLISH might give more access to the relation than
> > SELECT, but, uh, that's what granting additional privileges does in
> > general, by definition.  Mostly we consider that a feature, not a bug.
> 
> Not what I mean - the owner should be able to publish the table. If you are
> granted the role of the owner, you can do what the owner can, no? That's how I
> understand Stephen's proposal.

Exactly correct, yes.  I was not suggesting it be a role attribute.

If we move forward with making PUBLISH a GRANT'able option then I do
worry that people will be surprised that PUBLISH gives you more access
to the table than a straight SELECT does.  We have a good deal of
granularity in what a user can be GRANT'd to see, and PUBLISH completely
ignores all of that.

The way I see PUBLISH, it's another command which is able to read from a
table, not unlike the way COPY works, but we don't have an independent
COPY privilege and the COPY command does actually respect the SELECT
privileges on the table.

Another approach to solving my concern would be to only allow the
publishing of tables by non-owner users who have table-level SELECT
rights (meaning they can see all columns of the table), and only for
tables which don't have RLS enabled.

Frankly, though, I really don't buy the argument that there's a serious
use-case for non-owners to be GRANT'd the PUBLISH capability, which
means we would end up burning one of the few remaining privilege bits
for something that isn't going to be used and I don't care for that
either.

Thanks!

Stephen




Re: [HACKERS] Authentication tests, and plain 'password' authentication with a SCRAM verifier

2017-03-14 Thread Jeff Janes
On Tue, Mar 14, 2017 at 5:36 AM, Heikki Linnakangas  wrote:

> Hi,
>
> I didn't include the authentication TAP tests that Michael wrote in the
> main SCRAM commit last week. The main issue was that the new test was
> tacked on the src/test/recovery test suite, for lack of a better place. I
> propose that we add a whole new src/test/authentication directory for it.
> It would also be logical to merge src/test/ssl into it, but the SSL test
> suite has some complicated setup steps, to create the certificates, and it
> cannot be safely run on a multi-user system. So probably best to keep it
> separate, after all.
>
> While looking at the test, I noticed that the SCRAM patch didn't include
> support for logging in with plain 'password' authentication, when the user
> has a SCRAM verifier stored in pg_authid. That was an oversight. If the
> client gives the server the plain password, it's easy for the server to
> verify that it matches the SCRAM verifier.
>

I noticed the asymmetry over plain-text passwords, and didn't know if it
was intentional or not.  It is somewhat disconcerting that the client will
send a plain-text password to a mis-configured (or mal-configured) server,
but I don't think there is anything this patch series can reasonably do
about that.


>
> Attached patches add the TAP test suite, and implement plain 'password'
> authentication for users with SCRAM verifier. Any comments?


Does what it says, says what it does.  There is no installcheck target,
which makes sense because it inherently has to muck around with
pg_hba.conf.  The test should be updated to test the syntax for
0001-Add-clause-PASSWORD-val-USING-protocol-to-CREATE-ALT.patch if that
gets committed.

The message returned to the client for the wrong password differs between
pg_hba-set scram and pg_hba-set md5/password methods.  Is that OK?

psql: error received from server in SASL exchange: invalid-proof

psql: FATAL:  password authentication failed for user "test"

Cheers,

Jeff


Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-14 Thread Robert Haas
On Mon, Mar 13, 2017 at 10:36 AM, Ashutosh Sharma  wrote:
> Couple of review comments,,
>
> You may also need to update the documentation as now we are also going
> to support wal consistency check for hash index. The current
> documentation does not include hash index.
>
> +only records originating from those resource managers.  Currently,
> +the supported resource managers are heap,
> +heap2, btree, gin,
> +gist, sequence, spgist,
> +brin, and generic. Only

Did that, committed this.  Also ran pgindent over hash_mask() and
fixed an instance of dubious capitalization.

> Following comment in hash_mask() may require changes if patch for
> 'Microvacuum support for Hash Index - [1]' gets committed.
>
> +   /*
> +* In hash bucket and overflow pages, it is possible to modify the
> +* LP_FLAGS without emitting any WAL record. Hence, mask the line
> +* pointer flags.
> +* See hashgettuple() for details.
> +*/

If that's so, then that patch is responsible for updating these
comments (and the code, if necessary).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > On Tue, Mar 14, 2017 at 2:14 PM, Tom Lane  wrote:
> >> Robert Haas  writes:
> >>> It's become pretty clear to me that there are a bunch of other things
> >>> about hash indexes which are not exactly great, the worst of which is
> >>> the way they grow by DOUBLING IN SIZE.
> 
> >> Uh, what?  Growth should happen one bucket-split at a time.
> 
> > Technically, the buckets are created one at a time, but because of the
> > way hashm_spares works, the primary bucket pages for all bucket from
> > 2^N to 2^{N+1}-1 must be physically consecutive.  See
> > _hash_alloc_buckets.
> 
> Right, but we only fill those pages one at a time.
> 
> It's true that as soon as we need another overflow page, that's going to
> get dropped beyond the 2^{N+1}-1 point, and the *apparent* size of the
> index will grow quite a lot.  But any modern filesystem should handle
> that without much difficulty by treating the index as a sparse file.

Uh, last I heard we didn't allow or want sparse files in the backend
because then we have to handle a possible out-of-disk-space failure on
every write.

If we think they're ok to do, it'd be awfully nice to figure out a way for
VACUUM to turn an entirely-empty 1G chunk into a sparse file...
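
For what it's worth, the OS-level facility for that exists on Linux; a purely
illustrative sketch (not something VACUUM does today):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>

/* Turn an entirely-empty 1GB segment of an already-open relation file
 * into a hole, keeping the apparent file size unchanged. */
static int
punch_empty_gigabyte(int fd, off_t offset)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, (off_t) 1024 * 1024 * 1024);
}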

> There may be some work to be done in places like pg_basebackup to
> recognize and deal with sparse files, but it doesn't seem like a
> reason to panic.

Well, and every file-based backup tool out there..

Thanks!

Stephen




Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Petr Jelinek
On 14/03/17 19:49, Petr Jelinek wrote:
> On 14/03/17 19:47, Robert Haas wrote:
>> On Tue, Mar 14, 2017 at 2:41 PM, Petr Jelinek
>>  wrote:
>>> My understanding of what Stephen is proposing is, you have "ownerA" of
>>> tableA and "ownerB" of tableB, then you want role "publisher" to be able
>>> to publish those, so you simply grant it the "ownerA" and "ownerB"
>>> roles. Obviously that might in many situations mean that the "publisher"
>>> role potentially also gets sweeping privileges to other tables which may
>>> not be desirable.
>>
>> I didn't hear Stephen propose that "publish" should be a
>> role-attribute, and I don't understand why that would be a good idea.
>> Presumably, we don't want unprivileged users to be able to fire up
>> logical replication because that involves making connections to other
>> systems from the PostgreSQL operating system user's account, and that
>> should be a privileged operation.  But that's the subscriber side, not
>> the publisher side.
>>
>> I don't otherwise follow Stephen's argument.  It seems like he's
>> complaining that PUBLISH might give more access to the relation than
>> SELECT, but, uh, that's what granting additional privileges does in
>> general, by definition.  Mostly we consider that a feature, not a bug.
>>
> 
> Not what I mean - the owner should be able to publish the table. If you are
> granted the role of the owner, you can do what the owner can, no? That's how I
> understand Stephen's proposal.
> 

Note that I am not necessarily saying it's better though, just trying to
explain. It definitely has drawbacks, as in order to grant publish on
one table you might be granting lots of privileges on various objects by
granting the role. So for granularity purposes Peter's PUBLISH privilege
for tables sounds better to me.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Andres Freund
Hi,

On 2017-03-14 20:28:51 +0200, Heikki Linnakangas wrote:
> On 03/14/2017 07:31 PM, Andres Freund wrote:
> > On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote:
> > > * How tight are we on space in the ExprEvalStep union? Currently, the
> > > jump-threading preparation replaces the opcodes with the goto labels, but it
> > > would be really nice to keep the original opcodes, for debugging purposes if
> > > nothing else.
> > 
> > There's nothing left to spare :(.  For debugging I've just used
> > ExecEvalStepOp() - it's inconvenient to directly print the struct
> > anyway, since gdb prints the whole union...
> 
> This comment above the union is not accurate:
> 
> > /*
> >  * Data for an operation. Inline stored data is faster to access, but also
> >  * bloats the size of all instructions. The union should be kept below 48
> >  * bytes (so the entire struct is below 64bytes, a single cacheline on
> >  * common systems).
> >  */
> 
> On my x86-64 system, the whole struct is 64 bytes, but the union is only 40
> bytes. The fields before the union occupy 24 bytes. IOW, the union should be
> kept below 40 bytes, not 48.

Point.  Will update.


> Hmm. Could we make the instructions variable size? It would allow packing
> the small instructions even more tight, and we wouldn't need to obsess over
> a particular maximum size for more complicated instructions.

That makes jumps a lot more complicated.  I'd experimented with it and
given it up as "not worth it".  If we were to try to do so, we'd also
not want to store both the pointer and enum variants, since that would
again reduce the density.


> A compromise might be to have the fixed size, but have "continuation"
> opcodes for the more complicated instructions. An XmlExpr instruction, for
> example, could have an extra instruction after the primary one, just to hold
> more fields. When evaluating it, we would just increment the Program Counter
> by two instead of one, to skip over the extra instruction.

I think for cases where 40 bytes becomes the limit, it's usually better
to have out-of-line state.  We could do this, but we'd also waste a
bunch of space in the array (namely the 24 byte Op "header").
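
To make the tradeoff concrete, here is a toy, self-contained illustration of
the continuation idea (nothing here is from the patch; the opcode names are
invented):

#include <stdio.h>

typedef enum { OP_ADD2, OP_ADD3, OP_DONE } OpCode;

typedef struct
{
    OpCode  opcode;
    int     a;
    int     b;
} Step;

/*
 * Toy interpreter: OP_ADD3 is a "wide" instruction whose third operand
 * lives in the following continuation slot, so the program counter
 * advances by two and the continuation's opcode field is never inspected.
 */
static int
run(const Step *pc)
{
    int     acc = 0;

    for (;;)
    {
        switch (pc->opcode)
        {
            case OP_ADD2:
                acc += pc->a + pc->b;
                pc += 1;
                break;
            case OP_ADD3:
                acc += pc->a + pc->b + pc[1].a;     /* spill-over operand */
                pc += 2;                            /* skip continuation */
                break;
            case OP_DONE:
                return acc;
        }
    }
}

int
main(void)
{
    const Step  prog[] = {
        {OP_ADD2, 1, 2},
        {OP_ADD3, 3, 4},
        {OP_DONE, 5, 0},        /* continuation slot: opcode ignored */
        {OP_DONE, 0, 0},
    };

    printf("%d\n", run(prog));  /* prints 15 */
    return 0;
}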

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Petr Jelinek
On 14/03/17 19:47, Robert Haas wrote:
> On Tue, Mar 14, 2017 at 2:41 PM, Petr Jelinek
>  wrote:
>> My understanding of what Stephen is proposing is, you have "ownerA" of
>> tableA and "ownerB" of tableB, then you want role "publisher" to be able
>> to publish those, so you simply grant it the "ownerA" and "ownerB"
>> roles. Obviously that might in many situations mean that the "publisher"
>> role potentially also gets sweeping privileges to other tables which may
>> not be desirable.
> 
> I didn't hear Stephen propose that "publish" should be a
> role-attribute, and I don't understand why that would be a good idea.
> Presumably, we don't want unprivileged users to be able to fire up
> logical replication because that involves making connections to other
> systems from the PostgreSQL operating system user's account, and that
> should be a privileged operation.  But that's the subscriber side, not
> the publisher side.
> 
> I don't otherwise follow Stephen's argument.  It seems like he's
> complaining that PUBLISH might give more access to the relation than
> SELECT, but, uh, that's what granting additional privileges does in
> general, by definition.  Mostly we consider that a feature, not a bug.
> 

Not what I mean - the owner should be able to publish the table. If you are
granted the role of the owner, you can do what the owner can, no? That's how I
understand Stephen's proposal.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 2:41 PM, Petr Jelinek
 wrote:
> My understanding of what Stephen is proposing is, you have "ownerA" of
> tableA and "ownerB" of tableB, then you want role "publisher" to be able
> to publish those, so you simply grant it the "ownerA" and "ownerB"
> roles. Obviously that might in many situations mean that the "publisher"
> role potentially also gets sweeping privileges to other tables which may
> not be desirable.

I didn't hear Stephen propose that "publish" should be a
role-attribute, and I don't understand why that would be a good idea.
Presumably, we don't want unprivileged users to be able to fire up
logical replication because that involves making connections to other
systems from the PostgreSQL operating system user's account, and that
should be a privileged operation.  But that's the subscriber side, not
the publisher side.

I don't otherwise follow Stephen's argument.  It seems like he's
complaining that PUBLISH might give more access to the relation than
SELECT, but, uh, that's what granting additional privileges does in
general, by definition.  Mostly we consider that a feature, not a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel seq. plan is not coming against inheritance or partition table

2017-03-14 Thread Robert Haas
On Mon, Mar 13, 2017 at 10:18 AM, Amit Kapila  wrote:
> Agreed, so I have rebased your patch and passed heap_pages as -1 for
> index_only scans as discussed.   Also, Rafia has tested with attached
> patch that parallel index and parallel index only scans are picked for
> TPC-H queries.  I have also reviewed and tested your changes with
> respect to problem reported and found that it works as expected.  So,
> I think we can go ahead with attached patch unless you see some
> problem with the changes I have made.

OK, committed with a little more wordsmithing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-14 Thread Petr Jelinek
On 10/03/17 20:02, Peter Eisentraut wrote:
> On 2/27/17 22:10, Stephen Frost wrote:
>> Peter,
>>
>> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>>> On 2/18/17 18:06, Stephen Frost wrote:
 I'm not convinced that it really makes sense to have PUBLICATION of a
 table be independent from the rights an owner of a table has.  We don't
 allow other ALTER commands on objects based on GRANT'able rights, in
 general, so I'm not really sure that it makes sense to do so here.
>>>
>>> The REFERENCES and TRIGGER privileges are very similar in principle.
>>
>> TRIGGER is a great example of an, ultimately, poorly chosen privilege to
>> GRANT away, with a good history of discussion about it:
>>
>> https://www.postgresql.org/message-id/21827.1166115978%40sss.pgh.pa.us
>> https://www.postgresql.org/message-id/7389.1405554356%40sss.pgh.pa.us
> 
> Those discussions about the trigger privileges are valid, but the actual
> reason why this is a problem is because our trigger implementation is
> broken by default.  In the SQL standard, triggers are executed as the
> table owner, so the trigger procedure does not have full account access
> to the role that is causing the trigger to fire.  This is an independent
> problem that deserves consideration, but it does not invalidate the kind
> of privilege that can be granted.
> 
>>> Then you couldn't set up a replication structure involving tables owned
>>> by different users without resorting to blunt instruments like having
>>> everything owned by the same user or using superusers.
>>
>> That's not correct- you would simply need a user who is considered an
>> owner for all of the objects which you want to replicate, that doesn't
>> have to be a *direct* owner or a superuser.
>>
>> The tables can have individual roles, with some admin role who is a
>> member of those other roles, or another mechanism (as Simon has proposed
>> not too long ago...) to have a given role be considered equivilant to an
>> owner for all of the relations in a particular database.
> 
> I'm not really following what you are saying here.  It seems to me that
> you are describing a new kind of facility that gives a role a given
> capability with respect to a table.
> 
> If so, we already have that, namely privileges.  If not, please elaborate.
> 

My understanding of what Stephen is proposing is, you have "ownerA" of
tableA and "ownerB" of tableB, then you want role "publisher" to be able
to publish those, so you simply grant it the "ownerA" and "ownerB"
roles. Obviously that might in many situations mean that the "publisher"
role potentially also gets sweeping privileges to other tables which may
not be desirable.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-14 Thread Pavan Deolasee
On Tue, Mar 14, 2017 at 7:17 AM, Alvaro Herrera 
wrote:

> > @@ -234,6 +236,21 @@ index_beginscan(Relation heapRelation,
> >   scan->heapRelation = heapRelation;
> >   scan->xs_snapshot = snapshot;
> >
> > + /*
> > +  * If the index supports recheck, make sure that index tuple is saved
> > +  * during index scans.
> > +  *
> > +  * XXX Ideally, we should look at all indexes on the table and check if
> > +  * WARM is at all supported on the base table. If WARM is not supported
> > +  * then we don't need to do any recheck. RelationGetIndexAttrBitmap() does
> > +  * do that and sets rd_supportswarm after looking at all indexes. But we
> > +  * don't know if the function was called earlier in the session when we're
> > +  * here. We can't call it now because there exists a risk of causing
> > +  * deadlock.
> > +  */
> > + if (indexRelation->rd_amroutine->amrecheck)
> > + scan->xs_want_itup = true;
> > +
> >   return scan;
> >  }
>
> I didn't like this comment very much.  But it's not necessary: you have
> already given relcache responsibility for setting rd_supportswarm.  The
> only problem seems to be that you set it in RelationGetIndexAttrBitmap
> instead of RelationGetIndexList, but it's not clear to me why.


Hmm. I think you're right. Will fix that way and test.


>
> I noticed that nbtinsert.c and nbtree.c have a bunch of new includes
> that they don't actually need.  Let's remove those.  nbtutils.c does
> need them because of btrecheck().


Right. It's probably a left over from the way I wrote the first version.
Will fix.

Speaking of which:
>
> I have already commented about the executor involvement in btrecheck();
> that doesn't seem good.  I previously suggested to pass the EState down
> from caller, but that's not a great idea either since you still need to
> do the actual FormIndexDatum.  I now think that a workable option would
> be to compute the values/isnulls arrays so that btrecheck gets them
> already computed.


I agree with your complaint about the modularity violation. What I am unclear
about is how passing the values/isnulls arrays will fix that. The way the code
is structured currently, recheck routines are called by index_fetch_heap(). So
if we try to compute values/isnulls in that function, we'll still need to
access the EState, which AFAIU will lead to a similar violation. Or am I
misreading your idea?

I wonder if we should instead invent something similar to IndexRecheck(),
but instead of running ExecQual(), this new routine will compare the index
values by the given HeapTuple against given IndexTuple. ISTM that for this
to work we'll need to modify all callers of index_getnext() and teach them
to invoke the AM specific recheck method if xs_tuple_recheck flag is set to
true by index_getnext().
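
For illustration, a sketch of what such a caller-side hook might look like;
xs_tuple_recheck, xs_itup and amrecheck are names from this patch, and the
amrecheck signature shown here is invented for the sketch:

while ((tuple = index_getnext(scan, ForwardScanDirection)) != NULL)
{
    if (scan->xs_tuple_recheck &&
        !scan->indexRelation->rd_amroutine->amrecheck(scan->indexRelation,
                                                      scan->xs_itup,
                                                      tuple))
        continue;           /* heap tuple no longer matches the index key
                             * we followed; skip it */

    /* ... process tuple as before ... */
}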

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 14, 2017 at 2:14 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> It's become pretty clear to me that there are a bunch of other things
>>> about hash indexes which are not exactly great, the worst of which is
>>> the way they grow by DOUBLING IN SIZE.

>> Uh, what?  Growth should happen one bucket-split at a time.

> Technically, the buckets are created one at a time, but because of the
> way hashm_spares works, the primary bucket pages for all bucket from
> 2^N to 2^{N+1}-1 must be physically consecutive.  See
> _hash_alloc_buckets.

Right, but we only fill those pages one at a time.

It's true that as soon as we need another overflow page, that's going to
get dropped beyond the 2^{N+1}-1 point, and the *apparent* size of the
index will grow quite a lot.  But any modern filesystem should handle
that without much difficulty by treating the index as a sparse file.

There may be some work to be done in places like pg_basebackup to
recognize and deal with sparse files, but it doesn't seem like a
reason to panic.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 2:14 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> It's become pretty clear to me that there are a bunch of other things
>> about hash indexes which are not exactly great, the worst of which is
>> the way they grow by DOUBLING IN SIZE.
>
> Uh, what?  Growth should happen one bucket-split at a time.

Technically, the buckets are created one at a time, but because of the
way hashm_spares works, the primary bucket pages for all buckets from
2^N to 2^{N+1}-1 must be physically consecutive.  See
_hash_alloc_buckets.
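
To spell out what that consecutiveness costs, a small worked example
(assuming 8kB pages and counting only primary bucket pages):

    round N reserves pages for buckets 2^N .. 2^{N+1}-1, i.e. 2^N pages

    2^22 buckets * 8kB                      = 32GB  current apparent size
    next round reserves another 2^22 pages -> 64GB  (the doubling)
    a quarter-round of 2^20 pages          -> 40GB  (the four-increment idea
                                                     discussed elsewhere in
                                                     the thread)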

>> Other things that are not so great:
>
>> - no multi-column support
>> - no amcanunique support
>> - every insert dirties the metapage
>> - splitting is generally too aggressive; very few overflow pages are
>> ever created unless you have piles of duplicates
>
> Yeah.  It's a bit hard to see how to add multi-column support unless you
> give up the property of allowing queries on a subset of the index columns.
> Lack of amcanunique seems like mostly a round-tuit shortage.  The other
> two are implementation deficiencies that maybe we can remedy someday.
>
> Another thing I'd like to see is support for 64-bit hash values.
>
> But all of these were mainly blocked by people not wanting to sink effort
> into hash indexes as long as they were unusable for production due to lack
> of WAL support.  So this is a huge step forward.

Agreed, on all points.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Heikki Linnakangas

On 03/14/2017 07:31 PM, Andres Freund wrote:

On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote:

* How tight are we on space in the ExprEvalStep union? Currently, the
jump-threading preparation replaces the opcodes with the goto labels, but it
would be really nice to keep the original opcodes, for debugging purposes if
nothing else.


There's nothing left to spare :(.  For debugging I've just used
ExecEvalStepOp() - it's inconvenient to directly print the struct
anyway, since gdb prints the whole union...


This comment above the union is not accurate:


/*
 * Data for an operation. Inline stored data is faster to access, but also
 * bloats the size of all instructions. The union should be kept below 48
 * bytes (so the entire struct is below 64bytes, a single cacheline on
 * common systems).
 */


On my x86-64 system, the whole struct is 64 bytes, but the union is only 
40 bytes. The fields before the union occupy 24 bytes. IOW, the union 
should be kept below 40 bytes, not 48.


Hmm. Could we make the instructions variable size? It would allow
packing the small instructions even more tightly, and we wouldn't need to
obsess over a particular maximum size for more complicated instructions.


A compromise might be to have the fixed size, but have "continuation" 
opcodes for the more complicated instructions. An XmlExpr instruction, 
for example, could have an extra instruction after the primary one, just 
to hold more fields. When evaluating it, we would just increment the 
Program Counter by two instead of one, to skip over the extra instruction.


For reference, here are the sizes of all the structs in the union:

40: xmlexpr
40: scalararrayop
40: minmax
40: iocoerce
40: convert_rowtype
32: wholerow
32: rowcompare_step
32: row
32: func
32: fieldstore
32: domaincheck
32: boolexpr
32: arrayexpr
32: arraycoerce
24: casewhen
24: casethen
24: arrayref_checksubscript
16: grouping_func
16: fieldselect
16: constval
16: casetest
16: assign_var
16: arrayref
8: window_func
8: subplan
8: sqlvaluefunction
8: param
8: nulltest_row
8: assign_tmp
8: alternative_subplan
8: aggref
4: var
4: rowcompare_final
4: qualexpr
4: fetch
4: coalesce

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote:
>> It would be good to have someone at least read it before pushing, but
>> I don't think anyone other than you has done so.

> I'd love for somebody else
> to look through it, I tried asking multiple times...  Seems like
> threatening a commit is the best way to get it :P

I'm willing to look, but the last few messages make it sound like you're
not all that close to a finished version.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Tom Lane
Robert Haas  writes:
> It's become pretty clear to me that there are a bunch of other things
> about hash indexes which are not exactly great, the worst of which is
> the way they grow by DOUBLING IN SIZE.

Uh, what?  Growth should happen one bucket-split at a time.

> Other things that are not so great:

> - no multi-column support
> - no amcanunique support
> - every insert dirties the metapage
> - splitting is generally too aggressive; very few overflow pages are
> ever created unless you have piles of duplicates

Yeah.  It's a bit hard to see how to add multi-column support unless you
give up the property of allowing queries on a subset of the index columns.
Lack of amcanunique seems like mostly a round-tuit shortage.  The other
two are implementation deficiencies that maybe we can remedy someday.

Another thing I'd like to see is support for 64-bit hash values.

But all of these were mainly blocked by people not wanting to sink effort
into hash indexes as long as they were unusable for production due to lack
of WAL support.  So this is a huge step forward.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background sessions

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 3:31 AM, Pavel Stehule  wrote:
> Very often the strategy can be to recheck the parent process in some waiting
> cycles. It should not impact performance.

I think that's going to be hard to arrange, and I think it isn't
necessary.  If the leader wants to arrange for the worker to die when
it exits, it can use TerminateBackgroundWorker() from a
PG_ENSURE_ERROR_CLEANUP block or on_shmem_exit hook.
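
For what it's worth, a minimal sketch of the on_shmem_exit variant of that
pattern, assuming the leader still holds the BackgroundWorkerHandle it got at
registration time:

#include "postgres.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"

/* Cleanup callback: make sure the worker dies whenever the leader exits. */
static void
terminate_worker_on_exit(int code, Datum arg)
{
    TerminateBackgroundWorker((BackgroundWorkerHandle *) DatumGetPointer(arg));
}

/* To be called once RegisterDynamicBackgroundWorker(..., &handle) succeeds. */
static void
arrange_worker_cleanup(BackgroundWorkerHandle *handle)
{
    on_shmem_exit(terminate_worker_on_exit, PointerGetDatum(handle));
}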

> I'm afraid that long waits in the bg process are quite probable with this
> patch - and then it's probable that somebody will use pg_terminate_backend. This
> situation should not have to end in a server restart.

I don't understand.  The only way you'd need a server restart is if a
background process wasn't responding to SIGTERM, and that's a bug
independent of anything this patch does.  It would be caused by the
background process not doing CHECK_FOR_INTERRUPTS() or the moral
equivalent regularly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 1:40 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Great!  I've committed the latest version of the patch, with some
>> cosmetic changes.
>
> Woo hoo!  That's been a bee in the bonnet for, um, decades.

Yeah.  I'm pretty happy to see that go in.

It's become pretty clear to me that there are a bunch of other things
about hash indexes which are not exactly great, the worst of which is
the way they grow by DOUBLING IN SIZE.  Mithun has submitted a patch
which - if it's acceptable - would alleviate that to some degree by
splitting up each doubling into four separate increments.  But that
still could lead to very large growth increments on big indexes, like
your 32GB index suddenly growing - in one fell swoop - to 40GB.
That's admittedly a lot better than what will happen right now, which
is instantaneous growth to 64GB, but it's not great, and it's not
altogether clear how it could be fixed in a really satisfying way.

Other things that are not so great:

- no multi-column support
- no amcanunique support
- every insert dirties the metapage
- splitting is generally too aggressive; very few overflow pages are
ever created unless you have piles of duplicates

Still, this is a big step forward.  I think it will be useful for
people with long text strings or other very wide data that they want
to index, and hopefully we'll get some feedback from users about where
this works well and where it doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Andres Freund
On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote:
> Patch 0003 is huge.

I suspect you mean 0004?  If so - yes :(.  I unfortunately don't think
there's a useful way to split it in smaller chunks - I originally moved
ops over one-by-one, but that required a lot of duplicated structs and
such...


> It would be good to have someone at least read it before pushing, but
> I don't think anyone other than you has done so.

I'd love for somebody else
to look through it, I tried asking multiple times...  Seems like
threatening a commit is the best way to get it :P


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Tom Lane
Robert Haas  writes:
> Great!  I've committed the latest version of the patch, with some
> cosmetic changes.

Woo hoo!  That's been a bee in the bonnet for, um, decades.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Andres Freund
On 2017-03-14 08:28:02 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > EEO_SWITCH(op->opcode)
> > {
> > EEO_CASE(EEO_DONE):
> > goto out;
> 
> Oh my.
> 
> > which is a bit annoying.  (the EEO_CASE is either a jump label or a case
> > statement, depending on computed goto availability).
> > 
> > It seems we could either:
> > 1) live with the damage
> > 2) disable pgindent
> > 3) move the : inside EEO_CASE's definition, and only use {} blocks.
> 
> I think 3) is nasty because the result doesn't look like normal C.

We have a bunch of such constructs already however. foreach(),
PG_TRY().  That means 3) has the advantage that our editors and pgindent
can already deal with it.


> EEO_CASE() are potentially jump labels, then indentation becomes
> correct.  Why not accept it?  It looks odd, but that gives a better hint
> to the reader about what's going on.

Seems likely that every editor would indent them differently :(


I'm leaning towards 3) for the reasons above and after trying out the
different indentations, but I don't have a strong opinion.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Andres Freund
Hi,


On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote:
> On 03/14/2017 08:53 AM, Andres Freund wrote:
> > Besides that, this version has:
> > - pgindented most of the affected pieces (i.e. all significant new code
> >   has been reindent, not all touched one)
> 
> I think you'll need to add all the inner structs of ExprEvalStep to
> typedefs.list to indent them right.

Yea, I saw that.  Since they're not named atm, that'd probably not work
well.  I'd be inclined to just live with it :/


> I looked at patch 0004.

Thanks!


> * EEO_DISPATCH_DIRECT(op) takes 'op' as an argument, but it really assumes
> that 'op' has already been set to point to the jump target. I find that a
> bit weird. I guess the idea is that you always pass the Program Counter
> variable as 'op' argument. For consistency, would be good if EEO_SWITCH()
> also took just 'op' as the argument, rather than op->opcode. But I think it
> would be more clear if they should both just assumed that there's a variable
> called 'op' that points to the current instruction.

Hm.  I dislike magic variable names. I think I prefer your idea of
passing the op entirely to EEO_SWITCH.


> * All the callers of EEO_DISPATCH_DIRECT(op) set 'op' just prior to calling
> EEO_DISPATCH_DIRECT(op). How about having a macro EEO_JUMP(),
> to encapsulate setting 'op' and jumping to it in one operation?

Ok.
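
For concreteness, a sketch of what such a macro could look like, assuming the
step array is reachable as state->steps and that jump-threading has already
replaced opcodes with labels (this is not the patch's actual definition):

#define EEO_JUMP(stepno) \
    do { \
        op = &state->steps[stepno]; \
        EEO_DISPATCH_DIRECT(op); \
    } while (0)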


> * ExecEvalStepOp() seems relatively expensive, with the linear scan over all
> the opcodes, if called on an ExprState that already has
> EEO_FLAG_JUMP_THREADED set. All the callers use it to check if the opcode is
> a particular one, so you could check if the opcode matches that particular
> one, instead of scanning the dispatch table to find what it is.

It should be fairly cheap at that place, unless the same expression is
instantiated twice for some reason.  I have been wondering about adding
a second table ptr->op that can be binary searched for the operation to
make it cheaper.
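
A sketch of what that second table might look like, assuming jump-threaded
steps store the computed-goto label in their opcode field (struct and function
names here are illustrative):

#include <stdlib.h>             /* bsearch */

typedef struct OpLookupEntry
{
    const void *label;          /* jump target stored in the step's opcode */
    ExprEvalOp  op;             /* the original opcode */
} OpLookupEntry;

static int
op_lookup_cmp(const void *a, const void *b)
{
    const void *la = ((const OpLookupEntry *) a)->label;
    const void *lb = ((const OpLookupEntry *) b)->label;

    return (la < lb) ? -1 : ((la > lb) ? 1 : 0);
}

/* The table is filled and qsort()ed once, when the dispatch table is built. */
static ExprEvalOp
lookup_step_op(const OpLookupEntry *table, size_t nentries, const void *label)
{
    OpLookupEntry   key;
    OpLookupEntry  *hit;

    key.label = label;
    hit = bsearch(&key, table, nentries, sizeof(OpLookupEntry), op_lookup_cmp);

    return hit ? hit->op : EEO_DONE;    /* not-found case is arbitrary here */
}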


> * But is ExecEvalStepOp() ever actually get called on an expression with
> EEO_FLAG_JUMP_THREADED already set? It's only used in
> ExecInstantiateInterpretedExpr(), and it's called only once on each
> ExprState. How about just adding an Assert that EEO_FLAG_JUMP_THREADED is
> not yet set? Or drop the EEO_FLAG_JUMP_THREADED flag altogether, and assert
> that evalfunc != ExecInterpExpr.

The primary use of ExecEvalStepOp() is really debugging, so I'd like to
avoid adding that restriction.  I guess we can add a second function for
this if needed.


> * How tight are we on space in the ExprEvalStep union? Currently, the
> jump-threading preparation replaces the opcodes with the goto labels, but it
> would be really nice to keep the original opcodes, for debugging purposes if
> nothing else.

There's nothing left to spare :(.  For debugging I've just used
ExecEvalStepOp() - it's inconvenient to directly print the struct
anyway, since gdb prints the whole union...


> * execInterpExpr.c is quite a mouthful. How about exprInterpreter.c?



> Attached is a patch, on top of your
> 0004-Faster-expression-evaluation-and-targetlist-projecti.patch, with some
> minor tweaks and comments here and there (search for HEIKKI). It's mostly
> the same stuff I listed above, implemented in a quick & dirty way. I hope
> it's helpful.

Thanks!



>  typedef enum ExprEvalOp
>  {
> + /*
> +  * HEIKKI: How about renaming these to EEOP_* ? I think it'd be a
> +  * good mnemonic to remember that these are opcodes, and just one
> +  * extra letter.
> +  */

I don't mind the change.


> + /* HEIKKI: Is it safe to assume that sizeof(size_t) >= sizeof(void *) ? */

Yes, seems very likely - it's supposed to hold any potential sizeof()
return value.  You could end up on platforms where that's not true, if
it used tagged pointers or such.  I guess we could instead use a union,
but that doesn't seem that much of a benefit.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-14 Thread Robert Haas
On Mon, Mar 13, 2017 at 11:48 PM, Amit Kapila  wrote:
> We didn't found any issue with the above testing.

Great!  I've committed the latest version of the patch, with some
cosmetic changes.

It would be astonishing if there weren't a bug or two left, but I
think overall this is very solid work, and I think it's time to put
this out there and see how things go.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-14 Thread Alvaro Herrera
After looking at how index_fetch_heap and heap_hot_search_buffer
interact, I can't say I'm in love with the idea.  I started thinking
that we should not have index_fetch_heap release the buffer lock only to
re-acquire it five lines later; it should instead keep the buffer lock, do
the recheck and only release it afterwards (I realize that this means
there'd need to be two additional "else release buffer lock" branches).
But then this got me thinking that perhaps it would be better to have
another routine that both calls heap_hot_search_buffer and then does the
recheck -- it occurs to me that what we're doing here is essentially
heap_warm_search_buffer.

Does that make sense?

Another thing is BuildIndexInfo being called over and over for each
recheck().  Surely we need to cache the indexinfo for each indexscan.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Upgrading postmaster's log messages about bind/listen errors

2017-03-14 Thread Tom Lane
I wrote:
>> I think that what would actually be of some use nowadays is a LOG-level
>> message emitted if the wraparound *isn't* activated immediately at start.
>> But otherwise, we should follow the rule that silence is golden.

> Concretely, how about the attached?  It preserves the original
> "protections are now enabled" message at LOG level, but emits it only
> when oldestOffsetKnown becomes true *after* startup.  Meanwhile, if
> oldestOffsetKnown is still not true at the conclusion of TrimMultiXact,
> then it emits a new LOG message about "protections are not active".

I realized that the second of these is not necessary because it's
redundant with the message about "MultiXact member wraparound protections
are disabled because oldest checkpointed MultiXact %u does not exist on
disk".  Pushed without that.

regards, tom lane




Re: [HACKERS] Gather Merge

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 9:56 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Cool, thanks for the review.  I'm not quite confident that we've found
>> all of the bugs here yet, but I think we're moving in the right
>> direction.
>
> I guess the real question here is why isn't Gather Merge more like
> Append and MergeAppend?  That is, why did you complicate matters
> by making it projection capable?  That seems like a pretty random
> decision from here.

Well, then it would be inconsistent with plain old Gather.  I thought
about going that route - ripping out whatever projection logic Gather
has and teaching the system that it's not projection-capable - but I
don't see what that buys us.  It's pretty useful to be able to project
on top of Gather-type nodes, because they will often be at the top of
the plan, just before returning the results to the user.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix cardinality estimates for parallel joins.

2017-03-14 Thread Robert Haas
On Tue, Jan 17, 2017 at 11:49 AM, Robert Haas  wrote:
> On Mon, Jan 16, 2017 at 7:23 AM, Amit Kapila  wrote:
>> On Sat, Jan 14, 2017 at 12:07 AM, Robert Haas  wrote:
>>> Fix cardinality estimates for parallel joins.
>>>
>>
>> +   /*
>> +* In the case of a parallel plan, the row count needs to represent
>> +* the number of tuples processed per worker.
>> +*/
>> +   path->rows = clamp_row_est(path->rows / parallel_divisor);
>> }
>>
>> path->startup_cost = startup_cost;
>> @@ -2014,6 +1996,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
>> else
>> path->path.rows = path->path.parent->rows;
>>
>> +   /* For partial paths, scale row estimate. */
>> +   if (path->path.parallel_workers > 0)
>> +   path->path.rows /= get_parallel_divisor(&path->path);
>>
>>
>> Isn't it better to call clamp_row_est in join costing functions as we
>> are doing in cost_seqscan()?  Is there a reason to keep those
>> different?
>
> No, those should probably be changed to match.

So I guess that'd look something like this?
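
In essence -- the attached patch is what counts, this is just the shape
of the change, mirroring what cost_seqscan() already does -- the join
costing functions would end up with something like:

	/* For partial paths, scale row estimate. */
	if (path->path.parallel_workers > 0)
	{
		double		parallel_divisor = get_parallel_divisor(&path->path);

		/*
		 * In the case of a parallel plan, the row count needs to
		 * represent the number of tuples processed per worker.
		 */
		path->path.rows =
			clamp_row_est(path->path.rows / parallel_divisor);
	}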

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


clamp-parallel-join-est.patch
Description: Binary data



Re: [HACKERS] Re: [COMMITTERS] pgsql: Add test case for two phase commit. Also by Masahiko Sawada.

2017-03-14 Thread Robert Haas
On Tue, Mar 14, 2017 at 2:18 AM, Michael Paquier
 wrote:
> On Tue, Mar 14, 2017 at 5:04 AM, Michael Meskes  wrote:
>> Add test case for two phase commit. Also by Masahiko Sawada.
>>
>> Branch
>> --
>> master
>>
>> Details
>> ---
>> http://git.postgresql.org/pg/commitdiff/42fcad68a9c0e0ebecf6842888723cad1f9d5be2
>>
>> Modified Files
>> --
>> src/interfaces/ecpg/test/ecpg_schedule |   1 +
>> src/interfaces/ecpg/test/expected/sql-twophase.c   | 114 +
>> .../ecpg/test/expected/sql-twophase.stderr |  34 ++
>> .../ecpg/test/expected/sql-twophase.stdout |   0
>> src/interfaces/ecpg/test/sql/Makefile  |   1 +
>> src/interfaces/ecpg/test/sql/twophase.pgc  |  44 
>> 6 files changed, 194 insertions(+)
>
> This patch has forgotten to update
> src/interfaces/ecpg/test/sql/.gitignore with the new files generated.
> Please see the patch attached.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag

2017-03-14 Thread Robert Haas
On Mon, Mar 13, 2017 at 8:51 PM, Michael Paquier
 wrote:
>> I agree with that, but I propose the attached version instead.  It
>> seems cleaner to have the entire test for setting BM_PERMANENT in one
>> place rather than splitting it up as you did.
>
> Fine by me. You may want to update the comment on BM_PERMANENT in
> buf_internals.h, as Artur mentioned upthread -- for example by just
> adding "and init forks".

OK, done, and back-patched all the way.
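
For the archives, the consolidated test has roughly this shape
(paraphrased rather than quoted from the committed hunk, so treat it
as illustrative):

	/*
	 * Buffers for permanent relations -- and for init forks, which must
	 * survive crashes regardless of the owning relation's persistence --
	 * are marked BM_PERMANENT.
	 */
	if (relpersistence == RELPERSISTENCE_PERMANENT ||
		forkNum == INIT_FORKNUM)
		buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
	else
		buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;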

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] scram and \password

2017-03-14 Thread Joe Conway
On 03/14/2017 08:40 AM, Tom Lane wrote:
> Joe Conway  writes:
>> On 03/14/2017 03:15 AM, Heikki Linnakangas wrote:
>>> It would be a lot more sensible, if there was a way to specify in
>>> pg_hba.conf, "scram-or-md5". We punted on that for PostgreSQL 10, but
>>> perhaps we should try to cram that in, after all.
> 
>> I was also thinking about that. Basically a primary method and a
>> fallback. If that were the case, a gradual transition could happen, and
>> if we want \password to enforce best practice it would be ok.
> 
> Why exactly would anyone want "md5 only"?  I should think that "scram
> only" is a sensible pg_hba setting, if the DBA feels that md5 is too
> insecure, but I do not see the point of "md5 only" in 2017.  I think
> we should just start interpreting that as "md5 or better".

That certainly would work for me.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] PATCH: Configurable file mode mask

2017-03-14 Thread David Steele
On 3/13/17 3:03 PM, Tom Lane wrote:
> David Steele  writes:
>> On 3/13/17 2:16 PM, Tom Lane wrote:
>>> I also don't especially want to have to analyze cases like "what
>>> happens if user initdb'd with mask X but then changes the GUC and
>>> restarts the postmaster?".  Maybe the right thing is to not expose
>>> this as a GUC at all, but drive it off the permissions observed on
>>> the data directory at postmaster start.  Viz:
>>>
>>> * $PGDATA has permissions 0700: adopt umask 077
>>>
>>> * $PGDATA has permissions 0750: adopt umask 027
>>>
>>> * anything else: fail
> 
>> How about a GUC, allow_group_access, that when set will enforce
>> permissions and set the umask accordingly, and when not set will follow
>> $PGDATA as proposed above?
> 
> Seems overcomplicated ...

Yeah.  It wouldn't be a lot of fun to document, that's for sure.

>>> That way, a "chmod -R" would be the necessary and sufficient procedure
>>> for switching from one case to the other.
> 
>> I'm OK with that if you think it's the best course, but perhaps the GUC
>> would be better because it can detect accidental permission changes.
> 
> If we're only checking file permissions at postmaster start, I think it's
> illusory to suppose that we're offering very much protection against
> accidental changes.  A chmod applied while the postmaster is running
> could still break things, and we'd not notice till the next restart.

Fair enough.

> But it might be worth thinking about whether we want to encourage people
> to do manual chmod's at all; that's fairly easy to get wrong, particularly
> given the difference in X bits that should be applied to files and
> directories.  Another approach that could be worth considering is
> a PGC_POSTMASTER GUC with just two states (group access or not) and
> make it the postmaster's responsibility to do the equivalent of chmod -R
> to make the file tree match the GUC.  I think we do a tree scan anyway
> for other purposes, so correcting any wrong file permissions might not
> be much added work in the normal case.

The majority of that scanning is done in recovery (to reset unlogged
relations) and I'm not sure we would want to add that overhead to
normal startup.
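
By contrast, the permissions-based approach discussed upthread is
cheap: at postmaster start it would amount to something like this
sketch (the function name and error handling are mine, purely for
illustration):

#include "postgres.h"

#include <sys/stat.h>

/* Sketch only: derive the process umask from the data directory's mode. */
static void
set_umask_from_datadir(const char *datadir)
{
	struct stat st;

	if (stat(datadir, &st) != 0)
		elog(FATAL, "could not stat data directory \"%s\": %m", datadir);

	switch (st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO))
	{
		case 0700:				/* owner-only: keep everything private */
			umask(0077);
			break;
		case 0750:				/* group may read */
			umask(0027);
			break;
		default:
			elog(FATAL, "data directory \"%s\" has invalid permissions",
				 datadir);
	}
}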

-- 
-David
da...@pgmasters.net




Re: [HACKERS] scram and \password

2017-03-14 Thread Tom Lane
Joe Conway  writes:
> On 03/14/2017 03:15 AM, Heikki Linnakangas wrote:
>> If the server isn't set up to do SCRAM authentication, i.e. there are no
>> "scram" entries in pg_hba.conf, and you set yourself a SCRAM verifier,
>> you have just locked yourself out of the system. I think that's a
>> non-starter. There needs to be some more intelligence in the decision.

> Yes, this was exactly my concern.

This seems like a serious usability fail.


>> It would be a lot more sensible, if there was a way to specify in
>> pg_hba.conf, "scram-or-md5". We punted on that for PostgreSQL 10, but
>> perhaps we should try to cram that in, after all.

> I was also thinking about that. Basically a primary method and a
> fallback. If that were the case, a gradual transition could happen, and
> if we want \password to enforce best practice it would be ok.

Why exactly would anyone want "md5 only"?  I should think that "scram
only" is a sensible pg_hba setting, if the DBA feels that md5 is too
insecure, but I do not see the point of "md5 only" in 2017.  I think
we should just start interpreting that as "md5 or better".

regards, tom lane



