Re: [HACKERS] proposal: function parse_ident

2016-02-08 Thread Pavel Stehule
2016-02-08 16:55 GMT+01:00 Teodor Sigaev :

> rebased, messages changes per Tom's proposal
>>
> Cool feature; sometimes I have needed it a lot.
>
> But it seems there are some bugs in error processing.
>

I am looking into it

Regards

Pavel


Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Pavel Stehule
2016-02-08 16:45 GMT+01:00 jflack :

> On 02/08/2016 03:16 AM, Pavel Stehule wrote:
>
> > Only a owner of schema can edit functions inside schema
>
> Can't anyone granted CREATE on the schema do that? Would
> that be changed by this proposal?
>

Yes, anybody with the necessary rights can do it.

regards

Pavel


>
> -Chap
>
>


[HACKERS] proposal: schema PL session variables

2016-02-08 Thread Pavel Stehule
Hi

At the Russian PgConf I talked with Oleg about missing features in PLpgSQL
that can complicate migrations from Oracle to PostgreSQL. Currently I
see only one blocker: missing protected session variables. PL/SQL has
package variables with package-only scope and a session life cycle.
Currently we cannot ensure/enforce schema-scope visibility, and we
cannot implement this functionality in PL languages other than C.

I propose really basic functionality that can be enhanced in the future, step
by step. This proposal doesn't contain any controversial feature or syntax,
I hope. It is related to PLpgSQL only, but the described feature can be used
from any PL language that implements the interface.

Proposal
===
I propose the possibility to declare variables at the schema level. These
variables can be accessed from any function inside the schema, and cannot be
accessed directly from functions in other schemas. Schema variables can
be accessed only from functions (at this moment). In PLpgSQL, schema
variables have the same behavior as local variables.

Syntax
=
New statement

CREATE SCHEMA VARIABLE varname AS type DEFAULT expr.

This statement creates a new in-memory variable visible only from PL functions
created inside the related schema. The life cycle of this variable is limited
to the session. The variable is initialized to the default expr (or NULL) when
it is first used in the session.

Usage
=

DROP SCHEMA IF EXISTS test_schema CASCADE;
CREATE SCHEMA test_schema;
SET SCHEMA 'test_schema';

CREATE SCHEMA VARIABLE local_counter AS int DEFAULT 0;

CREATE OR REPLACE FUNCTION increment_counter()
RETURNS void AS $$
BEGIN
  local_counter := local_counter + 1;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION get_counter()
RETURNS int AS $$
BEGIN
  RETURN local_counter;
END;
$$ LANGUAGE plpgsql;

Security
==
Only the owner of the schema can edit functions inside the schema, so only
the owner of the schema has access to a schema variable. If desired, schema
variables can be exposed to the outside by explicitly created auxiliary
functions.
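
As a sketch of that last point, under the proposed (uncommitted) syntax; the
variable and wrapper function names are my own illustration:

```sql
-- Proposed syntax sketch: expose a schema variable to outside callers
-- through an explicitly created accessor, while direct access stays
-- limited to functions created inside the schema.
CREATE SCHEMA VARIABLE hit_count AS int DEFAULT 0;

-- Readable from anywhere, but the variable itself stays write-protected:
CREATE OR REPLACE FUNCTION public_hit_count()
RETURNS int AS $$
BEGIN
  RETURN hit_count;
END;
$$ LANGUAGE plpgsql;
```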

Possible future enhancements
===
* global life cycle (not only session)
* access and usage outside PL (from SQL)


Comments, notes??

Regards

Pavel


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Fujii Masao
On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
 wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
>  wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>>  wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  wrote:
>>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>>>  wrote:
>>>>> Yes, please let's use the custom language, and let's not care of not
>>>>> more than 1 level of nesting so as it is possible to represent
>>>>> pg_stat_replication in a simple way for the user.

>>>> "not" is used twice in this sentence in a way that renders me not able
>>>> to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations up to one level of nested objects, like
>>> that:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so as we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, a quite common case for this feature being something like that:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited from node1 and (node2 or node3).
>>
>> Flattening groups of nodes with a new catalog will be necessary to
>> ease the view of this data to users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0. So, I would think
> that we will need to have a new catalog, say
> pg_stat_replication_groups with the following things:
> - One line of this catalog represents the status of a group or of a single 
> node.
> - The status of a node/group is either sync or potential. If a
> node/group is specified more than once, it may be possible that it
> would be sync and potential depending on where it is defined, in which
> case setting its status to 'sync' makes the most sense, I guess.
> - Move sync_priority and sync_state, actually an equivalent from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - group name, and by that I think that we had perhaps better make
> mandatory the need to append a name with a quorum or priority group.
> The group at the highest level is forcibly named as 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use the application_name.
> - Type of group, quorum or priority
> - Elements in this group; an element can be a group name or a node
> name, aka application_name. If the group is of type priority, the elements
> are listed in increasing order, so the elements with lower priority values
> come first, etc. We could have one column listing explicitly the list of
> integers that maps to the elements of a group, but it does not seem
> worth it; what users would like to know is which nodes are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.
>
> Thoughts?

I agree that we would need something like such a new view in the future;
however, it seems too late to work on that for 9.6, unfortunately.
There is only one CommitFest left. Let's focus on the very simple case, i.e.,
a 1-level priority list, for now; then we can extend it to cover other cases.

If we can commit the simple version early enough and there is enough
time before the date of feature freeze, of course I'm happy to review
the extended version like the one you proposed, for 9.6.
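
For reference, the kinds of synchronous_standby_names values under discussion
look like this (taken from the syntax sketches above; none of this grammar
was committed at this point):

```
# Hypothetical postgresql.conf entries illustrating the proposed grammar.

# 1-level priority list (the simple case targeted first):
# wait for sync confirmation from 2 standbys, picked in priority order
# among node1, node2, node3
synchronous_standby_names = '2[node1, node2, node3]'

# Nested form from the discussion: wait for node1 AND one of (node2, node3)
synchronous_standby_names = '2{node1, [node2, node3]}'
```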

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Pavel Stehule
Hi


>> I propose really basic functionality, that can be enhanced in future -
>> step by step. This proposal doesn't contain any controversial feature or
>> syntax, I hope. It is related to PLpgSQL only, but described feature can be
>> used from any PL languages with implemented interface.
>>
>
>
> I think it would make sense to implement the interface in at least one of
> our other supported PLs. I'm not entirely clear how well this will match up
> with, say, plperl, but I'd be interested to see.
>

A minimalistic interface can be based on get/set functions. We can do the
necessary transformations there.
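
As a sketch of what that could look like from another PL, assuming
hypothetical schema_variable_get()/schema_variable_set() entry points (these
functions do not exist; the names are purely illustrative):

```sql
-- Purely illustrative: schema_variable_get/schema_variable_set are
-- invented names for the minimal get/set interface mentioned above.
-- A PL/Perl function could then reach a schema variable through SPI:
CREATE OR REPLACE FUNCTION perl_increment_counter()
RETURNS void AS $$
  my $rv = spi_exec_query(
      "SELECT schema_variable_get('local_counter')::int AS v");
  my $v = $rv->{rows}[0]->{v} + 1;
  spi_exec_query("SELECT schema_variable_set('local_counter', $v)");
$$ LANGUAGE plperl;
```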

Regards

Pavel


Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Chapman Flack
[resending because thunderbird helpfully defaulted my sender
address to the one that -isn't- subscribed to -hackers, sorry]


On 02/08/2016 03:16 AM, Pavel Stehule wrote:

> Only a owner of schema can edit functions inside schema

Can't anyone granted CREATE on the schema do that? Would
that be changed by this proposal?

-Chap





Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-02-08 Thread Joe Conway
On 02/08/2016 06:24 AM, Andres Freund wrote:
> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
>> Now that there is a setting to give a cluster a "name", it would be nice to
>> have an escape sequence in the log_line_prefix setting that could reference
>> the cluster_name.
> 
> I argued[1][2] for this when cluster_name was introduced, but back
> then I seemed to be the only one arguing for it. Josh later
> jumped on that train.
> 
> Given that we now have a number of people wishing for this, can we maybe
> reconsider?

Seems like a reasonable idea to me. But if we add a log_line_prefix escape
for it, shouldn't we add it to the csvlog output as well?
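
For illustration, assuming a hypothetical %C escape for cluster_name (no such
escape exists at this point; the letter is invented here):

```
# postgresql.conf sketch -- %C is an invented escape for cluster_name
cluster_name    = 'orders_primary'
log_line_prefix = '%m [%p] %C '
# log lines would then carry the cluster name:
# 2016-02-08 16:00:00 CET [12345] orders_primary LOG:  ...
```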

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Tom Lane
Etsuro Fujita  writes:
> Here is a patch to use %u not %d to print umid and userid.

Pushed, thanks.

regards, tom lane




Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-02-08 Thread Daniel Verite
Shulgin, Oleksandr wrote:

> Added to the Open commitfest: https://commitfest.postgresql.org/9/475/

Here's a review. Note that the patch tested and submitted
is not the initial one in the thread, so it doesn't exactly
match $subject now.
What's tested here is a client-side approach, suggested by Tom
upthread as an alternative, and implemented by Oleksandr in
0001-POC-errverbose-in-psql.patch

The patch has two parts: one part in libpq exposing a new function
named PQresultBuildErrorMessage, and a second part implementing an
\errverbose command in psql, essentially displaying the result of the
function.
The separation makes sense if we consider that other clients might benefit
from the function, or that libpq is a better place than psql to maintain
this in the future, as the list of error fields available in a PGresult
might grow.
OTOH maybe adding a new libpq function just for that is overkill, but this
is subjective.

psql part
=
Compiles and works as intended.
For instance with \set VERBOSITY default, an FK violation produces:

  # insert into table2 values(10);
  ERROR:  insert or update on table "table2" violates foreign key constraint
"table2_id_tbl1_fkey"
  DETAIL:  Key (id_tbl1)=(10) is not present in table "table1".

Then \errverbose just displays the verbose form of the error:
  # \errverbose
ERROR:  23503: insert or update on table "table2" violates foreign
  key constraint "table2_id_tbl1_fkey"
DETAIL:  Key (id_tbl1)=(10) is not present in table "table1".
SCHEMA NAME:  public
TABLE NAME:  table2
CONSTRAINT NAME:  table2_id_tbl1_fkey
LOCATION:  ri_ReportViolation, ri_triggers.c:3326

- make check passes
- valgrind test is OK (no obvious leak after using the command).

Missing bits:
- the command is not mentioned in the help (\? command, see help.c)
- it should be added to tab completions (see tab-complete.c)
- it needs to be described in the SGML documentation

libpq part
==
The patch implements and exports a new PQresultBuildErrorMessage()
function.

AFAICS its purpose is to produce a result similar to what
PQresultErrorMessage() would have produced, if PQERRORS_VERBOSE
was the verbosity in effect at execution time.

My comments:

- the name of the function does not really hint at what it does.
Maybe something like PQresultVerboseErrorMessage() would be more
explicit?

I would expect the new function to have the same interface as
PQresultErrorMessage(), but that's not the case.

- it takes a PGconn* and a PGresult* as input parameters, but
PQresultErrorMessage() takes only a PGresult* as input.
It's not clear why the PGconn* is necessary.

- it returns a pointer to a strdup'ed() buffer, which
has to be freed separately by the caller. That differs from
PQresultErrorMessage() which keeps this managed inside the
PGresult struct.

- if protocol version < 3, an error message is returned. It's not
clear to the caller that this error is emitted by the function itself
rather than the query. Besides, are we sure it's necessary?
Maybe the function could just produce an output with whatever
error fields it has, even minimally with older protocol versions,
rather than failing?

- if the fixed error message is kept, it should pass through
libpq_gettext() for translation.

- a description of the function should be added to the SGML doc.
There's a sentence in PQsetErrorVerbosity() that says, currently:

  "Changing the verbosity does not affect the messages available from
   already-existing PGresult objects, only subsequently-created ones."

As it's precisely the point of that new function, that bit could
be altered to refer to it.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] proposal: function parse_ident

2016-02-08 Thread Teodor Sigaev

> rebased, messages changes per Tom's proposal

Cool feature; sometimes I have needed it a lot.

But it seems there are some bugs in error processing.

1
The following query is okay:
# select * from parse_ident(E'"Some \r Schema".someTable');
 parse_ident
--
 {"Some \r Schema",sometable}
but:
% select * from parse_ident(E'"Some \r Schema".9someTable');
 Schema".9someTable"tifier after "." symbol: ""Some

The carriage return is not escaped in the error message. I suppose any other
special characters will not be escaped either.

2
# select * from parse_ident('.someTable');
ERROR:  missing identifier after "." symbol: ".someTable"
Why AFTER instead of BEFORE?

3
The function successfully truncates a long identifier, but not in the case
of a quoted identifier.
select length(a[1]), length(a[2]) from 
parse_ident('"xx".y') 
as a ;

 length | length
--------+--------
    414 |     63

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 5:45 AM, Etsuro Fujita
 wrote:
> Maybe my explanation was not correct, but I'm saying that the targetlist of
> the above outer_plan should be set to the fdw_scan_tlist, to avoid
> misbehavior.

Yeah, I think you're right.  So in this hunk:

+   if (foreignrel->reloptkind == RELOPT_JOINREL)
+   {
+   /* For a join relation, get the conditions from
fdw_private structure */
+   remote_conds = fpinfo->remote_conds;
+   local_exprs = fpinfo->local_conds;
+
+   /* Build the list of columns to be fetched from the
foreign server. */
+   fdw_scan_tlist = build_tlist_to_deparse(foreignrel);
+   }

I think we should also be doing outer_plan->targetlist =
fdw_scan_tlist in this block, with a comment like "Ensure that the
outer plan produces a tuple whose descriptor matches our scan
tuple slot.  This is safe because all scans and joins support
projection, so we never need to insert a Result node."   It would
probably be good to Assert(outer_plan != NULL) before doing the
assignment, too, just as a guard against future bugs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Marko Tiikkaja

On 08/02/16 09:16, Pavel Stehule wrote:

> Usage
> =
>
> DROP SCHEMA IF EXISTS test_schema CASCADE;
> SET SCHEMA test_schema;
>
> CREATE SCHEMA VARIABLE local_counter AS int DEFAULT 0;
>
> CREATE OR REPLACE FUNCTION increment_counter()
> RETURNS void AS $$
> BEGIN
>    local_counter := local_counter + 1;
> END;
> $$ LANGUAGE plpgsql;

How does this function know which schema variables are visible?


.m




[HACKERS] backpatch for REL9_4_STABLE of commit 40482e606733675eb9e5b2f7221186cf81352da1

2016-02-08 Thread Huong Dangminh
Hi,

I think this fix is also required for REL9_4_STABLE.
Please confirm the attached patch.

Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/



ecpg.9.4.patch
Description: ecpg.9.4.patch



Re: [HACKERS] Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011

2016-02-08 Thread Vitaly Burovoy
On 2/7/16, Vitaly Burovoy  wrote:
> Hello, Hackers!
>
> TODO list has an entry "Move NOT NULL constraint information to
> pg_constraint" with four links and without two with the newest
> work[1][2].
>
> I rebased the patch from [2] (in attachment). At least it applies
> cleanly on top of c477e84fe2471cb675234fce75cd6bb4bc2cf481 and does
> not generate a core dump during "make check". There are no tests for
> it, and it fails "make check" (by difference), which makes it impossible
> to run "make check-world".

It seems the file I attached has more than necessary changes.

Please, find a correct patch attached.

> ===
> [1]http://www.postgresql.org/message-id/flat/1343682669-sup-2...@alvh.no-ip.org
> [2]http://www.postgresql.org/message-id/20160109030002.GA671800@alvherre.pgsql

-- 
Best regards,
Vitaly Burovoy


catalog-notnull-2-c477e84_cleaned.patch
Description: Binary data



[HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Etsuro Fujita
Here is a patch to use %u not %d to print umid and userid.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/connection.c
--- b/contrib/postgres_fdw/connection.c
***
*** 159,165  GetConnection(UserMapping *user, bool will_prep_stmt)
  		entry->have_error = false;
  		entry->conn = connect_pg_server(server, user);
  
! 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %d, userid %d)",
  			 entry->conn, server->servername, user->umid, user->userid);
  	}
  
--- 159,165 
  		entry->have_error = false;
  		entry->conn = connect_pg_server(server, user);
  
! 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
  			 entry->conn, server->servername, user->umid, user->userid);
  	}
  



Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Marko Tiikkaja

On 08/02/16 13:17, Pavel Stehule wrote:

> 2016-02-08 13:03 GMT+01:00 Marko Tiikkaja :
>
>> How does this function know which schema variables are visible?
>
> A function sees all schema variables from the same schema as the
> function's schema.

Personally I find that undesirable.  I don't know what Oracle does, but
variables being visible without schema-qualifying them can introduce
variable conflicts in PL/PgSQL.  I'd prefer if you could only refer to
them by prefixing them with the schema name (or maybe allow search_path
to be used).



.m




Re: [HACKERS] remove wal_level archive

2016-02-08 Thread Michael Paquier
On Mon, Feb 8, 2016 at 6:47 AM, Peter Eisentraut  wrote:
> On 1/26/16 10:56 AM, Simon Riggs wrote:
>> Removing one of "archive" or "hot standby" will just cause confusion and
>> breakage, so neither is a good choice for removal.
>>
>> What we should do is
>> 1. Map "archive" and "hot_standby" to one level with a new name that
>> indicates that it can be used for both/either backup or replication.
>>   (My suggested name for the new level is "replica"...)
>> 2. Deprecate "archive" and "hot_standby" so that those will be removed
>> in a later release.
>
> Updated patch to reflect these suggestions.

Shouldn't backup.sgml be updated as well? Here is the portion that I
am referring to:
To enable WAL archiving, set the wal_level
configuration parameter to archive or higher,
archive_mode to on,

 But minimal WAL does not contain enough information to reconstruct the
-data from a base backup and the WAL logs, so archive or
+data from a base backup and the WAL logs, so replica or
 higher must be used to enable WAL archiving
 () and streaming replication.


-In hot_standby level, the same information is logged as
-with archive, plus information needed to reconstruct
-the status of running transactions from the WAL. To enable read-only
As the paragraph about the difference between hot_standby and archive
is removed, I think that it would be better to mention that setting
wal_level to replica allows reconstructing data from a base backup and
the WAL logs, *and* running read-only queries when hot_standby is
enabled.

-   if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
+   if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
Upthread it was mentioned that switching to an approach where enum
values are directly listed would be better. The target of an extra
patch on top of this one?

-   if (wal_level < WAL_LEVEL_ARCHIVE)
-   ereport(ERROR,
-
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-errmsg("replication slots can only be
used if wal_level >= archive")));
We should still forbid the creation of replication slots if wal_level = minimal.
-- 
Michael




Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Marko Tiikkaja

On 08/02/16 13:41, Pavel Stehule wrote:

> 2016-02-08 13:22 GMT+01:00 Marko Tiikkaja :
>
>> Personally I find that undesirable.  I don't know what Oracle does, but
>> variables being visible without schema-qualifying them can introduce
>> variable conflicts in PL/PgSQL.  I'd prefer if you could only refer to them
>> by prefixing them with the schema name (or maybe allow search_path to be
>> used).
>
> I hope there are no new conflicts - a schema variable is not directly
> visible from SQL (in this iteration) - they are visible only from functions
> - and the behavior is the same as a global plpgsql variable. So a schema
> variable can conflict with a SQL identifier only in exactly the same way
> as a plpgsql variable.

Yeah, and that's exactly what I don't want, because that means that
CREATE SCHEMA VARIABLE suddenly breaks existing code.

> But a prefix can be used.

Sure, but I don't see the point.  Is there a reason not to require such
variable references to be prefixed with the schema name?  Or explicitly
bring them into scope in the DECLARE section somehow.



.m




Re: [HACKERS] Parallel Aggregate

2016-02-08 Thread Haribabu Kommi
On Mon, Feb 8, 2016 at 9:01 AM, Robert Haas  wrote:
>  On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi
>  wrote:
>>  [ new patch ]
>
> This patch contains a number of irrelevant hunks that really ought not
> to be here and make the patch harder to understand, like this:
>
> -* Generate appropriate target list for
> scan/join subplan; may be
> -* different from tlist if grouping or
> aggregation is needed.
> +* Generate appropriate target list for
> subplan; may be different from
> +* tlist if grouping or aggregation is needed.
>
> Please make a habit of getting rid of that sort of thing before submitting.

Sure. I will take care of such things in the future.

> Generally, I'm not quite sure I understand the code here.  It seems to
> me that what we ought to be doing is that grouping_planner, right
> after considering using a presorted path (that is, just after the if
> (sorted_path) block between lines 1822-1850), ought to then consider
> using a partial path.  For the moment, it need not consider the
> possibility that there may be a presorted partial path, because we
> don't have any way to generate those yet.  (I have plans to fix that,
> but not in time for 9.6.)  So it can just consider doing a Partial
> Aggregate on the cheapest partial path using an explicit sort, or
> hashing; then, above the Gather, it can finalize either by hashing or
> by sorting and grouping.
>
> The trick is that there's no path representation of an aggregate, and
> there won't be until Tom finishes his upper planner path-ification
> work.  But it seems to me we can work around that.  Set best_path to
> the cheapest partial path, add a partial aggregate rather than a
> regular one around where it says "Insert AGG or GROUP node if needed,
> plus an explicit sort step if necessary", and then push a Gather node
> and a Finalize Aggregate onto the result.

Thanks, I will update the patch accordingly. Along with those changes,
I will try to calculate the cost involved in a normal aggregate without
generating the plan, and compare it against the parallel plan cost before
generating the actual plan, because with a smaller number of groups the
normal aggregate performs better than the parallel aggregate in tests.


Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] backpatch for REL9_4_STABLE of commit 40482e606733675eb9e5b2f7221186cf81352da1

2016-02-08 Thread Andres Freund
On 2016-02-08 20:59:25 +1100, Haribabu Kommi wrote:
> On Mon, Feb 8, 2016 at 8:20 PM, Huong Dangminh
>  wrote:
> > Hi,
> >
> > I think this fix is also required for REL9_4_STABLE.
> > Please confirm the attached patch.
> 
> Yes, this fix was missed for the 9.4 stable branch during back-patching,
> and it is available on all other supported branches.

Since this evidently was an omission, I backpatched the relevant commit
to 9.4. That seems prudent, given that the issue is documented to be
fixed there...

Thanks for the report.

Andres




Re: [HACKERS] backpatch for REL9_4_STABLE of commit 40482e606733675eb9e5b2f7221186cf81352da1

2016-02-08 Thread Michael Meskes
On Mon, Feb 08, 2016 at 09:20:46AM +, Huong Dangminh wrote:
> I think this fix is also required for REL9_4_STABLE.
> Please confirm the attached patch.

Sure thing, no idea where that cherry-pick got lost.

Thanks.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Pavel Stehule
2016-02-08 13:03 GMT+01:00 Marko Tiikkaja :

> On 08/02/16 09:16, Pavel Stehule wrote:
>
>> Usage
>> =
>>
>> DROP SCHEMA IF EXISTS test_schema CASCADE;
>> SET SCHEMA test_schema;
>>
>> CREATE SCHEMA VARIABLE local_counter AS int DEFAULT 0;
>>
>> CREATE OR REPLACE FUNCTION increment_counter()
>> RETURNS void AS $$
>> BEGIN
>>local_counter := local_counter + 1;
>> END;
>> $$ LANGUAGE plpgsql;
>>
>
> How does this function know which schema variables are visible?
>

A function sees all schema variables from the same schema as the function's
schema.

Pavel


>
>
> .m
>


Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Pavel Stehule
2016-02-08 13:22 GMT+01:00 Marko Tiikkaja :

> On 08/02/16 13:17, Pavel Stehule wrote:
>
>> 2016-02-08 13:03 GMT+01:00 Marko Tiikkaja :
>>
>>> How does this function know which schema variables are visible?
>>>
>>
>> A function sees all schema variables from the same schema as the
>> function's schema.
>>
>
> Personally I find that undesirable.  I don't know what oracle does, but
> variables being visible without schema-qualifying them can introduce
> variable conflicts in PL/PgSQL.  I'd prefer if you could only refer to them
> by prefixing them with the schema name (or maybe allow search_path to be
> used).
>

I hope there are no new conflicts - a schema variable is not directly
visible from SQL (in this iteration) - they are visible only from functions
- and the behavior is the same as a global plpgsql variable. So a schema
variable can conflict with a SQL identifier only in exactly the same way as
a plpgsql variable, and it cannot conflict with a PLpgSQL variable, because
any plpgsql variable can shadow it. But a prefix can be used.

example:

CREATE OR REPLACE FUNCTION increment_counter()
RETURNS void AS $$
BEGIN
  test_schema.local_counter := test_schema.local_counter + 1;
END;
$$ LANGUAGE plpgsql;

I would not allow a dependency on SEARCH_PATH, because then a change of
SEARCH_PATH could require replanning and possibly change the result type. So
using SEARCH_PATH is the way to hell. Moreover, I would like to "protect" the
content of a variable - and the schema scope can work as a good guard. If you
need publicly visible variables, then you can use trivial functions that do
it - and publish the content via functions.

Regards

Pavel




>
>
> .m
>


Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-02-08 Thread Craig Ringer
On 8 February 2016 at 09:37, Filip Rembiałkowski <
filip.rembialkow...@gmail.com> wrote:

> On Sun, Feb 7, 2016 at 4:37 PM, Vik Fearing  wrote:
>
> >>> There is also no mention in the documentation about what happens if I
> do:
> >>>
> >>> NOTIFY ALL chan, 'msg';
> >>> NOTIFY ALL chan, 'msg';
> >>> NOTIFY DISTINCT chan, 'msg';
> >>> NOTIFY ALL chan, 'msg';
> >>>
> >>> Without testing, I'd say I'd get two messages, but it should be
> >>> explicitly mentioned somewhere.
> >>
> >> If it's four separate transactions, LISTEN'er should get four.
> >
> > The question was for one transaction, I should have been clearer about
> that.
> >
> >> If it's in one transaction,  LISTEN'er should get three.
> >
> > This is surprising to me, I would think it would get only two.  What is
> > your rationale for three?
> >
>
> It is a single transaction, but four separate commands.
>
> >>> NOTIFY ALL chan, 'msg';
> -- send the message, save in the list/hash
> >>> NOTIFY ALL chan, 'msg';
> -- ALL was specified, send the message even if it is on the list/hash
> >>> NOTIFY DISTINCT chan, 'msg';
> -- default mode, skip message because it's in the list/hash
> >>> NOTIFY ALL chan, 'msg';
> -- ALL was specified, send the message even if it is hashed/saved
>

So in total three messages are sent?

Would it be correct to say that if ALL is specified then a message is
queued no matter what, and if DISTINCT is specified then it is only queued
if no message with the same channel and argument is already queued for
delivery? Using DISTINCT can never decrease the total number of messages
already queued to be sent.

Right?

If so, I think that's the right behaviour and the docs just need to be
explicit - an example like the above would be good, translated to be
friendlier to those who don't know the internal mechanics.
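For illustration, the queueing rule described here could be sketched like
this (an illustrative Python sketch only, not the actual NOTIFY
implementation; the real code keeps pending notifications in a
backend-local list):

```python
# Illustrative sketch of the proposed queueing rule: ALL always queues;
# DISTINCT queues only when an identical (channel, payload) pair is not
# already pending in the current transaction.
def notify(pending, channel, payload, mode="DISTINCT"):
    key = (channel, payload)
    if mode == "ALL" or key not in pending:
        pending.append(key)

pending = []
notify(pending, "chan", "msg", "ALL")       # queued
notify(pending, "chan", "msg", "ALL")       # queued again: ALL ignores dups
notify(pending, "chan", "msg", "DISTINCT")  # skipped: identical msg pending
notify(pending, "chan", "msg", "ALL")       # queued again
print(len(pending))  # → 3
```

So the example transaction above would deliver three messages, matching the
walk-through upthread.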

I've found the deduplication functionality of NOTIFY very frustrating in
the past and I see this as a significant improvement. Sometimes the *number
of times* something happened is significant too...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Development with Eclipse - Wrong error messages in IDE

2016-02-08 Thread Peter Moser

On 05.02.2016 at 18:40, Jason Petersen wrote:

On Feb 3, 2016, at 2:38 AM, Peter Moser  wrote:

Does anyone had similar problems? Do I have to configure Eclipse to understand 
the PG_RMGR macro or is there another possibility to teach Eclipse these macros?


Hi,



I just built 9.6 under Eclipse CDT to try this out and was able to open e.g. 
heapam.c without any error markers.

I added PostgreSQL as a “Makefile Project with Existing Code” after running 
./configure from the command-line. After that, I built the project from within 
Eclipse by adding the ‘all’ make target and running it.


I imported PG the same way, configured from terminal with

export CFLAGS="-g0"

./configure \
--prefix="/home/p/pg/build" \
--enable-debug \
--enable-depend \
--enable-cassert

I built the project from the command line, not from within Eclipse. First I 
thought that this may have caused all these problems, but no...




One setting I usually change: right-click the project, pick Properties, then drill 
down through C/C++ General -> Preprocessor Include Paths. In the Provider pane, 
there is an entry for “CDT GCC Build Output Parser”. I’m not sure if this is 
strictly necessary, but I set my “Container to keep discovered entries” setting to 
“File”.

Basically, Eclipse scans the make output for -I flags, then notes all the 
includes used to build each file, so the static analyzer, etc. can have more 
accurate information (it is crucial that the “Compiler command pattern” in this 
window be a regex that will match the compiler binary you use, so if you have 
/usr/local/bin/gcc, and “gcc” is the pattern, you are in for trouble).

After running the build, Eclipse should now know what includes are used for each file 
and stop whining. If it ever seems to have problems, you can kick it by running a 
clean target, then all, then picking “Project -> C/C++ Index -> Rebuild” (I 
think).


Thanks for all your suggestions; I tried all of them, but they made no 
difference. What finally solved my problems was to delete the BUILD and 
cluster DATA folders that I had created and re-configure the system. I 
deleted the project in Eclipse and imported it again after configure, and 
all the wrong markers disappeared. Maybe Eclipse's C-file index was 
corrupted or something; even rebuilding it didn't help. The only difference 
from my previous attempt was the CFLAGS setting (see above), but I do not 
know whether that changes anything related to markers.



Peter



--
Jason Petersen
Software Engineer | Citus Data
303.736.9255
ja...@citusdata.com




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit ac1d794

2016-02-08 Thread Kyotaro HORIGUCHI
Hello, I am one of those who want to wait on many sockets at once.

At Thu, 14 Jan 2016 18:55:51 +0100, Andres Freund  wrote in 
<20160114175551.gm10...@awork2.anarazel.de>
> On 2016-01-14 18:14:21 +0100, Andres Freund wrote:
> > On 2016-01-14 12:07:23 -0500, Robert Haas wrote:
> > > > Do we want to provide a backward compatible API for all this? I'm fine
> > > > either way.
> > >
> > > How would that work?
> > 
> > I'm thinking of something like;
> > 
> > int WaitOnLatchSet(LatchEventSet *set, int wakeEvents, long timeout);
> > 
> > int
> > WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,long 
> > timeout)
> > {
> > LatchEventSet set;
> > 
> > LatchEventSetInit(&set, latch);
> > 
> > if (sock != PGINVALID_SOCKET)
> >    LatchEventSetAddSock(&set, sock);
> > 
> > return WaitOnLatchSet(&set, wakeEvents, timeout);
> > }
> > 
> > I think we'll need to continue having wakeEvents and timeout parameters
> > for WaitOnLatchSet, we quite frequently want to wait socket
> > readability/writability, not wait on the socket, or have/not have
> > timeouts.
> 
> Hm. If we really want to support multiple sockets at some point the
> above WaitOnLatchSet signature isn't going to fly, because it won't
> support figuring out which fd the event this triggered on.
> 
> So it seems we'd need to return something like
> struct LatchEvent
> {
> enum LatchEventType {WL_LATCH_EVENT;WL_TIMEOUT;WL_SOCKET_EVENT; 
> WL_POSTMASTER_EVENT;...} event_type;
> int mask;
> pgsocket event_sock;
> };
> 
> that'd also allow to extend this to return multiple events if we want
> that at some point. Alternatively we could add a pgsocket* argument, but
> that doesn't really seem much better.
> 
> Not super happy about the above proposal.

How about allowing registration of a callback for every waiting
socket? The signature of the callback function would be something like

enum LATCH_CALLBACK_STATE
 LatchWaitCallback(pgsocket event_sock,
   enum LatchEventType, int mask?, void *bogus);

It could return, for instance, LCB_CONTINUE, LCB_BREAK or
LCB_IMMEDBREAK, and if any callback returns LCB_BREAK, the wait breaks
after calling all remaining callbacks for fired events.

We could have a predefined callback for every event type that merely
sets a corresponding flag and returns LCB_BREAK.

/* Waiting set has been constructed so far */
if (!WaitOnLatchSet((?)))
   error();

if (is_sock_readable[sockid]) {}  /* is_... would be global */
if (is_sock_writable[sockid]) {}  /* is_... would be global */
/* Any other types of trigger would be processed elsewhere */
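That dispatch loop could be sketched like this (an illustrative Python
sketch, not C; only the LCB_* names come from the proposal above, the rest
is made up for illustration):

```python
# Illustrative sketch of the callback dispatch idea: each registered
# callback decides whether the wait loop should continue or break.
LCB_CONTINUE, LCB_BREAK, LCB_IMMEDBREAK = 0, 1, 2

def dispatch(fired, callbacks):
    """Call the callback of each fired event; report whether to stop waiting."""
    stop = False
    for ev in fired:
        rc = callbacks[ev](ev)
        if rc == LCB_IMMEDBREAK:
            return True          # break immediately, skip remaining callbacks
        if rc == LCB_BREAK:
            stop = True          # finish calling callbacks, then break
    return stop

is_sock_readable = {}

def mark_readable(ev):
    # predefined callback: just set the corresponding flag and break
    is_sock_readable[ev] = True
    return LCB_BREAK

print(dispatch(["sock42"], {"sock42": mark_readable}))  # → True
```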


Although it might be slow if we have an enormous number of
sockets firing at once, I suppose it would return for only a few
sockets in most cases.

I don't see the big picture of the whole latch/signalling
mechanism with this, but callbacks might be usable for signalling
on many sockets.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Andres Freund
On 2016-02-08 15:58:49 +0900, Michael Paquier wrote:
> On Sun, Feb 7, 2016 at 2:49 AM, Andres Freund  wrote:
> > On 2016-02-06 22:03:15 +0900, Michael Paquier wrote:
> >> The patch attached will apply on master, on 9.5 there is one minor
> >> conflict. For older versions we will need another reworked patch.
> >
> > FWIW, I don't think we should backpatch this. It'd look noticeably
> > different in the back branches, and this isn't really a critical
> > issue. I think it makes sense to see this as an optimization.
> 
> The original bug report of this thread is based on 9.4, which is the
> first version where the standby snapshot in the bgwriter was introduced.
> Most of the systems in the world are actually left idle; if those are
> running Postgres, I'd like to think that the extra WAL is a waste. Now it
> is true that this is not a critical issue.

Unconvinced. For one, the equivalent behaviour for checkpoints has
existed since at least 9.0. Secondly, the problem only really appears if
you use archiving, configure a nondefault archive timeout, and that
timeout is significantly smaller than checkpoint_timeout. And then the
cost is a partially filled segment every now and then.  That just doesn't
justify adding nontrivial code to the back branches.

> >> + if (holdingAllLocks)
> >> + {
> >> + int i;
> >> +
> >> + for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
> >> + WALInsertLocks[i].l.progressAt = StartPos;
> >
> > Why update all?
> 
> For consistency. I agree that updating one, say the first one is
> enough. I have added a comment explaining that as well.

We don't set valid values in WALInsertLockAcquireExclusive for all locks
either. I don't see what this kind of consistency is going to buy us.

>  /*
> + * XLogInsert
> + *
> + * A shorthand for XLogInsertExtended, to update the progress of WAL
> + * activity by default.
> + */
> +XLogRecPtr
> +XLogInsert(RmgrId rmid, uint8 info)
> +{
> + return XLogInsertExtended(rmid, info, true);
> +}
> +
> +/*
> + * XLogInsertExtended

I'm not really a fan. I'd rather change existing callers to add a
'flags' bitmask argument. Right now there can't really be XLogInserts()
in extension code, so that's pretty ok to change.

Greetings,

Andres Freund




Re: [HACKERS] backpatch for REL9_4_STABLE of commit 40482e606733675eb9e5b2f7221186cf81352da1

2016-02-08 Thread Haribabu Kommi
On Mon, Feb 8, 2016 at 8:20 PM, Huong Dangminh
 wrote:
> Hi,
>
> I think this fixed is also required for REL9_4_STABLE.
> Please confirm the attached patch.

Yes, this fix was missed for the 9.4 stable branch during the back-patch;
it is present on all other supported branches.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-08 Thread Etsuro Fujita

On 2016/02/05 17:50, Ashutosh Bapat wrote:


Btw, IIUC, I think the patch fails to adjust the targetlist of the
top plan created that way, to output the fdw_scan_tlist, as
discussed in [1] (ie, I think the attached patch is needed, which is
created on top of your patch pg_fdw_join_v8.patch).



fdw_scan_tlist represents the output fetched from the foreign server and
is not necessarily the output of the ForeignScan. The ForeignScan node's
output is represented by the tlist argument to make_foreignscan():

1119 return make_foreignscan(tlist,
1120 local_exprs,
1121 scan_relid,
1122 params_list,
1123 fdw_private,
1124 fdw_scan_tlist,
1125 remote_exprs,
1126 outer_plan);

This tlist is built using build_path_tlist() for all join plans. IIUC,
all of them output the same targetlist. We don't need to make sure that
the targetlists match as long as we are using the targetlist passed in by
create_scan_plan(). Do you have a counterexample?


Maybe my explanation was not correct, but I'm saying that the 
targetlist of the above outer_plan should be set to the fdw_scan_tlist, 
to avoid misbehavior.  Here is such an example (add() in the example is 
a user-defined function that simply adds its two arguments, defined by: 
create function add(integer, integer) returns integer as 
'/path/to/func', 'add' language c strict):


postgres=# create foreign table foo (a int) server myserver options 
(table_name 'foo');
postgres=# create foreign table bar (a int) server myserver options 
(table_name 'bar');

postgres=# create table tab (a int, b int);
postgres=# insert into foo select a from generate_series(1, 1000) a;
postgres=# insert into bar select a from generate_series(1, 1000) a;
postgres=# insert into tab values (1, 1);
postgres=# analyze foo;
postgres=# analyze bar;
postgres=# analyze tab;

[Terminal 1]
postgres=# begin;
BEGIN
postgres=# update tab set b = b + 1 where a = 1;
UPDATE 1

[Terminal 2]
postgres=# explain verbose select tab.* from tab, foo, bar where foo.a = 
bar.a and add(foo.a, bar.a) > 0 limit 1 for update;


QUERY PLAN


-
 Limit  (cost=100.00..107.70 rows=1 width=70)
   Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
   ->  LockRows  (cost=100.00..2663.48 rows=333 width=70)
 Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
 ->  Nested Loop  (cost=100.00..2660.15 rows=333 width=70)
   Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
   ->  Foreign Scan  (cost=100.00..2654.97 rows=333 width=56)
 Output: foo.*, bar.*
 Filter: (add(foo.a, bar.a) > 0)
 Relations: (public.foo) INNER JOIN (public.bar)
                 Remote SQL: SELECT ROW(r2.a), ROW(r3.a), r2.a, r3.a 
FROM (public.foo r2 INNER JOIN public.bar r3 ON (TRUE)) WHERE 
((r2.a = r3.a)) FOR UPDATE OF r2 FOR UPDATE OF r3
 ->  Hash Join  (cost=247.50..301.25 rows=333 width=56)
   Output: foo.*, bar.*
   Hash Cond: (foo.a = bar.a)
   Join Filter: (add(foo.a, bar.a) > 0)
   ->  Foreign Scan on public.foo 
(cost=100.00..135.00 rows=1000 width=32)

 Output: foo.*, foo.a
 Remote SQL: SELECT a FROM public.foo 
FOR UPDATE
   ->  Hash  (cost=135.00..135.00 rows=1000 
width=32)

 Output: bar.*, bar.a
 ->  Foreign Scan on public.bar 
(cost=100.00..135.00 rows=1000 width=32)

   Output: bar.*, bar.a
   Remote SQL: SELECT a FROM 
public.bar FOR UPDATE

   ->  Materialize  (cost=0.00..1.01 rows=1 width=14)
 Output: tab.a, tab.b, tab.ctid
 ->  Seq Scan on public.tab  (cost=0.00..1.01 
rows=1 width=14)

   Output: tab.a, tab.b, tab.ctid
(27 rows)

postgres=# select tab.* from tab, foo, bar where foo.a = bar.a and 
add(foo.a, bar.a) > 0 limit 1 for update;


[Terminal 1]
postgres=# commit;
COMMIT

[Terminal 2] (After the commit in Terminal 1, Terminal 2 will show the 
following.)

 a | b
---+---
(0 rows)

This is wrong.  (Note that since the SELECT FOR UPDATE doesn't impose 
any condition on a tuple from the local table tab, the EvalPlanQual 
recheck executed should succeed.)  The reason for that is that the 
targetlist of the local join plan is the same as that of the ForeignScan, 
which outputs neither foo.a nor bar.a, both required as arguments of the 
function add().


Best regards,
Etsuro Fujita





Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-02-08 Thread Aleksander Alekseev
Hello, Robert.

> So: do we have clear evidence that we need 128 partitions here, or
> might, say, 16 work just fine?

Yes, this number of partitions was chosen based on this benchmark (see
"spinlock array" column):

http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu

In fact we can choose 16, 32, 64 or 128 partitions - any of these numbers
works better than the current implementation with a single spinlock. But
according to this benchmark, 128 partitions give twice as much TPS as
16 partitions, almost as if we didn't have any locks at all (see the "no
locks" column).

Still I'm a bit concerned that 128 partitions require (128-16)*
( sizeof(slock_t) + sizeof(void*) + sizeof(long) ) bytes of additional
memory per hash table which is:

(gdb) p (128-16)*(sizeof(slock_t) + sizeof(long) + sizeof(void*))
$1 = 1904

... bytes of shared memory on amd64.

I'm not sure if this is considered OK. If it is, I suggest leaving the
number at 128. If it's not, perhaps we could agree on, say, 64 partitions
as a compromise. The described benchmark doesn't represent any realistic
workload anyway. I personally believe that we don't have so many small
hash tables that ~1.9 kB of additional shared memory per hash table would
be an issue.
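The overhead arithmetic quoted above can be sketched as follows (an
illustrative Python sketch; the sizes are the amd64 values assumed in the
gdb output, slock_t = 1, long = 8, pointer = 8 bytes):

```python
# Sketch of the per-hash-table shared-memory overhead of adding free-list
# partitions beyond the existing 16, using assumed amd64 type sizes.
SIZEOF_SLOCK_T, SIZEOF_LONG, SIZEOF_PTR = 1, 8, 8

def extra_shmem_bytes(partitions, baseline=16):
    per_partition = SIZEOF_SLOCK_T + SIZEOF_LONG + SIZEOF_PTR
    return (partitions - baseline) * per_partition

print(extra_shmem_bytes(128))  # → 1904
print(extra_shmem_bytes(64))   # → 816
```

So the proposed 64-partition compromise would cost well under half the
memory of the 128-partition variant.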

> I don't really want to increase the number of lock manager partition
> locks to 128 just for the convenience of this patch, and even less do
> I want to impose the requirement that every future shared hash table
> must use an equal number of partitions.

May I ask why? Does it create a real problem?

> I don't see any reason why the number of free list partitions needs
> to be a constant, or why it needs to be equal to the number of hash
> bucket partitions.  That just seems like a completely unnecessary
> constraint. You can take the hash value, or whatever else you use to
> pick the partition, modulo any number you want. 

True, the number of spinlocks / free lists could be a variable. I believe
it's the good old simplicity vs. flexibility question.

Let's say we allow the user (i.e. the calling side) to choose the number
of spinlocks and free lists. Then we have to use the formula
freeList[hash % items_number], which means a division instead of a binary
shift. Do we really need such an expensive operation just to calculate the
index of a free list? Probably not. So let's limit the possible
items_number values to powers of two. Now there are only four values the
user would be likely to choose (16, 32, 64, 128), which is not as flexible
anymore.
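The shift-vs-division tradeoff can be illustrated like this (a Python
sketch of the idea, not the dynahash code):

```python
# Sketch: when the free-list count n is a power of two, the expensive
# `hash % n` can be computed with the cheaper bitmask `hash & (n - 1)`.
def freelist_index(hashval, n):
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    return hashval & (n - 1)

# The mask and the modulo agree for every power-of-two partition count.
for h in (0, 1, 12345, 0xDEADBEEF):
    for n in (16, 32, 64, 128):
        assert freelist_index(h, n) == h % n
print("mask matches modulo")  # → mask matches modulo
```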

What about memory? If we allocate mutex, nentries and freeList
dynamically depending on the items_number value, HASHHDR has to store
pointers to the allocated memory, which means an additional indirection
for almost every access to the mutex, nentries and freeList fields.
Probably a bad idea. Otherwise we still waste shared memory, probably
even more of it, since we don't want the maximum items_number value to be
too low.
Etc.

I believe such a change would only add code complexity (the code in
dynahash.c is already far from trivial), so next time it will be much
harder to change anything, we will probably miss some bugs, and it will
not solve any real problem. Why not just choose a constant that seems to
be good enough instead?




Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Michael Paquier
On Mon, Feb 8, 2016 at 6:18 PM, Andres Freund  wrote:
> On 2016-02-08 15:58:49 +0900, Michael Paquier wrote:
>> On Sun, Feb 7, 2016 at 2:49 AM, Andres Freund  wrote:
>>  /*
>> + * XLogInsert
>> + *
>> + * A shorthand for XLogInsertExtended, to update the progress of WAL
>> + * activity by default.
>> + */
>> +XLogRecPtr
>> +XLogInsert(RmgrId rmid, uint8 info)
>> +{
>> + return XLogInsertExtended(rmid, info, true);
>> +}
>> +
>> +/*
>> + * XLogInsertExtended
>
> I'm not really a fan. I'd rather change existing callers to add a
> 'flags' bitmask argument. Right now there can't really be XLogInserts()
> in extension code, so that's pretty ok to change.

So, you mean to extend XLogInsert() directly with an int flag that has
only one possible value for now, and update *all* the calls of
XLogInsert(). Well, I am counting 218 calls of XLogInsert() in the
code, so that's not a small change. Honestly, I'd really rather have
the Extended() routine, with the default of 0 for the integer flag
meaning, in the case of the progress, that it gets updated. We have
similar routines for RangeVar, some in bufmgr.c and some other places.
So, which way do you want to take? I don't expect to win this argument
if it's 1-vs-1 against a committer; I would just like to be sure
that I am not doing useless things. Time matters.
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 1:16 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao  wrote 
> in 
>> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
>>  wrote:
>> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
>> >  wrote:
>> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>> >>  wrote:
>> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  
>> >>> wrote:
>>  On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>   wrote:
>> > Yes, please let's use the custom language, and let's not care of not
>> > more than 1 level of nesting so as it is possible to represent
>> > pg_stat_replication in a simple way for the user.
>> 
>>  "not" is used twice in this sentence in a way that renders me not able
>>  to be sure that I'm not understanding it not properly.
>> >>>
>> >>> 4 times here. Score beaten.
>> >>>
>> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>> >>> to only support configurations up to one level of nested objects, like
>> >>> that:
>> >>> 2[node1, node2, node3]
>> >>> node1, 2[node2, node3], node3
>> >>> In short, we could restrict things so as we cannot define a group of
>> >>> nodes within an existing group.
>> >>
>> >> No, actually, that's stupid. Having up to two nested levels makes more
>> >> sense, a quite common case for this feature being something like that:
>> >> 2{node1,[node2,node3]}
>> >> In short, sync confirmation is waited from node1 and (node2 or node3).
>> >>
>> >> Flattening groups of nodes with a new catalog will be necessary to
>> >> ease the view of this data to users:
>> >> - group name?
>> >> - array of members with nodes/groups
>> >> - group type: quorum or priority
>> >> - number of items to wait for in this group
>> >
>> > So, here are some thoughts to make that more user-friendly. I think
>> > that the critical issue here is to properly flatten the meta data in
>> > the custom language and represent it properly in a new catalog,
>> > without messing up too much with the existing pg_stat_replication that
>> > people are now used to for 5 releases since 9.0. So, I would think
>> > that we will need to have a new catalog, say
>> > pg_stat_replication_groups with the following things:
>> > - One line of this catalog represents the status of a group or of a single 
>> > node.
>> > - The status of a node/group is either sync or potential, if a
>> > node/group is specified more than once, it may be possible that it
>> > would be sync and potential depending on where it is defined, in which
>> > case setting its status to 'sync' has the most sense. If it is in sync
>> > state I guess.
>> > - Move sync_priority and sync_state, actually an equivalent from
>> > pg_stat_replication into this new catalog, because those represent the
>> > status of a node or group of nodes.
>> > - group name, and by that I think that we had perhaps better make
>> > mandatory the need to append a name with a quorum or priority group.
>> > The group at the highest level is forcibly named as 'top', 'main', or
>> > whatever if not directly specified by the user. If the entry is
>> > directly a node, use the application_name.
>> > - Type of group, quorum or priority
>> > - Elements in this group, an element can be a group name or a node
>> > name, aka application_name. If group is of type priority, the elements
>> > are listed in increasing order. So the elements with lower priority
>> > get first, etc. We could have one column listing explicitly a list of
>> > integers that map with the elements of a group but it does not seem
>> > worth it, what users would like to know is what are the nodes that are
>> > prioritized. This covers the former 'priority' field of
>> > pg_stat_replication.
>> >
>> > We may have a good idea of how to define a custom language, still we
>> > are going to need to design a clean interface at catalog level more or
>> > less close to what is written here. If we can get a clean interface,
>> > the custom language implemented, and TAP tests that take advantage of
>> > this user interface to check the node/group statuses, I guess that we
>> > would be in good shape for this patch.
>> >
>> > Anyway that's not a small project, and perhaps I am over-complicating
>> > the whole thing.
>> >
>> > Thoughts?
>>
>> I agree that we would need something like such new view in the future,
>> however it seems too late to work on that for 9.6 unfortunately.
>> There is only one CommitFest left. Let's focus on very simple case, i.e.,
>> 1-level priority list, now, then we can extend it to cover other cases.
>>
>> If we can commit the simple version too early and there is enough
>> time before the date of feature freeze, of course I'm happy to review
>> the 

[HACKERS] [Proposal] Improvement of GiST page layout

2016-02-08 Thread Andrew Borodin
Hi, hackers!

I want to propose improvement of GiST page layout.

GiST is optimized to reduce disk operations on search queries, for
example, window search queries in the case of an R-tree.

I expect that a more complicated page layout will help to trade some
CPU efficiency for disk efficiency.

A GiST tree is a balanced tree structure with two kinds of pages:
internal pages and leaf pages. Each tree page contains a bunch of tuples
with the same structure. Tuples of an internal page reference other
pages of the same tree, while leaf page tuples hold heap TIDs.

During execution of a search query, GiST invokes the key comparison
algorithm for each tuple on a page, with two possible outcomes: 'no' and
'maybe'. A 'maybe' answer recursively descends the search into the
referenced page (for an internal page) or yields the tuple (for a
leaf page).

The expected tuple count on a page is around tens to hundreds. This
count is big enough that it is worth trying to save some cache lines
from being loaded into the CPU during enumeration.

For B-trees, during inspection of a page we can effectively apply binary
search, which is not possible in a GiST tree.

Let's consider an R-tree with arbitrary fan-out f. If a given query finds
exactly one data tuple, it is easy to show that the key comparison
count is minimal when f->e /* round_to_optimal(2.718) == 3 */ (the tree
has to review f*h keys, h=log_f(N), and f*log_f(N) is minimal when f->e).
A smaller key comparison count means fewer cache lines are touched. So
fan-out reduction means cache pressure reduction (except an average
fan-out of 2, which seems to be too small) and less time waiting for RAM.
I suppose all that reasoning holds true in cases where more than one
tuple will be found.
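A quick numeric check of that claim (an illustrative sketch, not
PostgreSQL code):

```python
import math

# Sketch of the fan-out argument: a root-to-leaf search reviews about
# f * log_f(N) keys, which is proportional to f / ln(f). That ratio is
# minimized at f = e ≈ 2.718, which makes f = 3 the best integer fan-out.
def keys_reviewed(f, n):
    return f * math.log(n) / math.log(f)   # f * log_f(n)

N = 10**6
costs = {f: keys_reviewed(f, N) for f in (2, 3, 4, 8, 16, 100)}
print(min(costs, key=costs.get))  # → 3
```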

How do we reduce tree fan-out? Obviously, we can’t fill a page with just
3 tuples. But we can install a small tree-like structure inside one
page. A general GiST index has a root page; similarly, a page tree should
have a “root” layer of tuples. Private (or internal, intermediate,
auxiliary - I can’t pick a precise word) tuples have only keys and a
fixed-size (F) array of offsets of the underlying records. Each layer is
a linked list. Right after a page has been allocated there is only the
“ground” level of regular tuples. Eventually the record count reaches F-1
and we create a new root layer with two tuples, each referencing half of
the preexisting records. Placement of new “ground” tuples on the page
will eventually cause an internal tuple to split. If there is not enough
space to split an internal tuple, we mark the page for a whole page-split
during the next iteration of the insertion algorithm of the owning tree.
That is why a tuple split happens at F-1 tuples, not at F: if we have no
space for splitting, we just add a reference to the last slot. In this
algorithm, a page split causes major page defragmentation: we take the
root layer, halve it and place the halves on different pages. When half
of the data has gone to the other page, the restructuring should tend to
place tuples that are accessed together next to each other. I think
placing a whole level together is a good strategy.

Let’s look at how a page grows with fan-out factor F=5. RLS – root layer
start, G – ground tuple, Ix – internal tuple of level x.

When we have added 3 ground tuples, it’s just a ground layer:
RLS=0|G G G
Then we place one more tuple and the layer splits:
RLS=4|G G G G I0 I0
Each I0 tuple now references two G tuples. We keep placing G tuples:
RLS=4|G G G G I0 I0 G G
And then one of the I0 tuples is split:
RLS=4|G G G G I0 I0 G G G I0
And right after, one more I0 split causes a new layer:
RLS=12|G G G G I0 I0 G G G I0 G I0 I1 I1

And so on, until we run out of space on the page. In a regular GiST we
run out of space on a page before we insert a tuple on it; now we can run
out of space during insertion. But this is not fatal: we will still be
able to place the ground-level tuple. The inner index structure will use
one extra slot for reference allocation, and the next insertion on the
page will deal with it. On pages marked for split we have to find exactly
which index tuple ran out of its extra slot during the split and fix it.

Several years ago I made an unsuccessful attempt to implement a similar
algorithm in the database engine of a proprietary system. I got stuck in
debugging the deletion algorithm and tree condensation, which is in use
in that system. I suppose it was a mistake to defragment the page with a
creeping heuristic; eventually I dropped the idea and just moved on to
actually important tasks - there is always a deadline rush in business.
I’m a newbie in Postgres internals, but as I see it there is no deletion
on a GiST page, so I feel the itch to try this feature one more time.

I expect that implementation of this proposal could speed up insertions
into the index by 20% and query performance by 30% when the whole index
fits in shared buffers. In case of buffer starvation, when the index is
accessed through disk, this feature will cause a 15% performance
degradation since the effective page capacity will be smaller. Should
this feature be configurable? Maybe it should be a separate access
method?

I need help to 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Amit Kapila
On Tue, Feb 9, 2016 at 5:37 AM, Michael Paquier 
wrote:
>
> On Mon, Feb 8, 2016 at 11:24 PM, Amit Kapila 
wrote:
> > On Mon, Feb 8, 2016 at 12:28 PM, Michael Paquier <
michael.paqu...@gmail.com>
> > wrote:
> >>
> >>
> >> >>   /*
> >> >> +  * Fetch the progress position before taking any WAL insert
lock.
> >> >> This
> >> >> +  * is normally an operation that does not take long, but
leaving
> >> >> this
> >> >> +  * lookup out of the section taken an exclusive lock saves a
> >> >> couple
> >> >> +  * of instructions.
> >> >> +  */
> >> >> + progress_lsn = GetProgressRecPtr();
> >> >
> >> > too long for my taste. How about:
> >> > /* get progress, before acuiring insert locks to shorten locked
section
> >> > */
> >>
> >> Check.
> >>
> >
> > What is the need of holding locks one-by-one during checkpoint when
> > we anyway are going to take lock on all the insertion slots.
>
> A couple of records can slip in while scanning the progress LSN
> through all the locks.
>

Do you see any benefit in allowing checkpoints for such cases,
considering that the bgwriter will anyway take care of logging the
standby snapshot for them?
I don't think there is any reasonable benefit in improving the
idle-system check for checkpoints beyond just accounting for the
standby-snapshot WAL record.  OTOH, as a checkpoint is not such a rare
operation, allowing such a change should be okay, but I don't see
the need for it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 12:16 PM, kharagesuraj 
wrote:

> Hello,
>
>
>
>
>
> >> I agree with first version, and attached the updated *patch* which are
> >> modified so that it supports simple multiple sync replication you
> >>suggested.
> >> (but test cases are not included yet.)
>
>
>
> I have tried writing some basic built-in test cases for multi-sync rep.
>
> I have created one patch on top of Michael's patch
> (http://www.postgresql.org/message-id/CAB7nPqTEqou=[hidden email]).
>
> Still it is in progress.
>
> Please have a look, correct me if I am wrong, and suggest remaining test
> cases.
>

So the interesting part of this patch is 006_sync_rep.pl. I think that you
had better build something on top of my patch as a separate patch. This
would make things clearer.

+my $result = $node_master->psql('postgres', "select application_name,
sync_state from pg_stat_replication;");
+print "$result \n";
+is($result, "standby_1|sync\nstandby_2|sync\nstandby_3|potential",
'checked for sync standbys state initially');
Now regarding the test: you clearly got the idea, though I think that we'd
want to update the postgresql.conf parameters a bit and re-run those
queries a couple of times; that's cheaper than having to re-create new
cluster nodes all the time. So just create a base, then switch s_s_names a
bit, and query pg_stat_replication, and you are already doing the latter.

Also, please attach patches directly to your emails. When loading something
on nabble this is located only there and not within postgresql.org which
would be annoying if nabble disappears at some point. You would also want
to use directly an email client and interact with the community mailing
lists this way instead of going through the nabble's forum-like interface
(never used it, not really willing to use it, but I guess that it is
similar to that).

I am attaching what you posted on this email for the archive's sake.
-- 
Michael


recovery_test_suite_with_multisync.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Re-Writes

2016-02-08 Thread Amit Kapila
On Mon, Feb 8, 2016 at 8:16 PM, Andres Freund  wrote:
>
> On 2016-02-08 10:38:55 +0530, Amit Kapila wrote:
> > I think deciding it automatically without user require to configure it,
> > certainly has merits, but what about some cases where user can get
> > benefits by configuring themselves like the cases where we use
> > PG_O_DIRECT flag for WAL (with o_direct, it will by bypass OS
> > buffers and won't cause misaligned writes even for smaller chunk sizes
> > like 512 bytes or so).  Some googling [1] reveals that other databases
> > also provides user with option to configure wal block/chunk size (as
> > BLOCKSIZE), although they seem to decide chunk size based on
> > disk-sector size.
>
> FWIW, you usually can't do that small writes with O_DIRECT. Usually it
> has to be 4KB (pagesize) sized, aligned (4kb again) writes. And on
> filesystems that do support doing such writes, they essentially fall
> back to doing buffered IO.
>

I have not observed this during the tests (the observation being that
whenever the OS buffer cache is used, writing in chunks smaller than 4KB
leads to reads and in turn decreases performance). I don't see such
an implication in the documentation either.
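
For the archive, the alignment constraint Andres mentions can be sketched
as follows (illustrative C only, not PostgreSQL code; the path and sizes
are made up): an O_DIRECT write generally needs an aligned buffer and an
aligned length, and on filesystems that reject O_DIRECT (tmpfs being the
classic case) the open fails with EINVAL and one falls back to buffered
I/O, which is essentially the behavior Andres describes.

```c
/* Sketch only: illustrates the O_DIRECT | O_SYNC alignment constraints
 * discussed above.  Not PostgreSQL code; path and sizes are hypothetical. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096              /* typical page-size alignment for O_DIRECT */

/* Write one ALIGN-sized chunk; returns bytes written or -1. */
ssize_t
write_wal_chunk(const char *path)
{
    /* Try direct + synchronous I/O first, as for WAL with O_DIRECT|O_SYNC. */
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0600);

    if (fd < 0)                 /* e.g. EINVAL on tmpfs: fall back to buffered */
        fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* O_DIRECT requires the buffer itself to be suitably aligned. */
    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0)
    {
        close(fd);
        return -1;
    }
    memset(buf, 0, ALIGN);

    ssize_t n = write(fd, buf, ALIGN);  /* length is a multiple of ALIGN too */

    free(buf);
    close(fd);
    return n;
}
```

A 512-byte write through the direct path would have to meet the same
alignment rules, which is why sub-page chunks are hard to do portably.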

> > An additional thought, which is not necessarily related to this patch
is,
> > if user chooses and or we decide to write in 512 bytes sized chunks,
> > which is usually a disk sector size, then can't we think of avoiding
> > CRC for each record for such cases, because each WAL write in
> > it-self will be atomic.  While reading, if we process in wal-chunk-sized
> > units, then I think it should be possible to detect end-of-wal based
> > on data read.
>
> O_DIRECT doesn't give any useful guarantees to do something like the
> above. It doesn't have any ordering or durability implications. You
> still need to do fdatasyncs and such.
>

It doesn't need to, if we use the O_SYNC flag, which we always set
whenever we use O_DIRECT mode for WAL writes.


> Besides, with the new CRC implications, that doesn't really seem like
> such a large win anyway.
>

I haven't checked yet how big a win we can get if we avoid CRCs while
still providing the same reliability, but I think it can certainly save
CPU instructions both during writes and during replay, and performance
should be better than it is currently.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Amit Langote

Hi Suraj,

On 2016/02/09 12:16, kharagesuraj wrote:
> Hello,
> 
> 
>>> I agree with first version, and attached the updated patch which are
>>> modified so that it supports simple multiple sync replication you
>>> suggested.
>>> (but test cases are not included yet.)
> 
> I have tried for some basic in-built test cases for multisync rep.
> I have created one patch over Michael's patch
> (http://www.postgresql.org/message-id/CAB7nPqTEqou=xryrgsga13qw1xxssd6tfhz-sm_j3egdvso...@mail.gmail.com).
> Still it is in progress.
> Please have look and correct me if i am wrong and suggest remaining test 
> cases.
> 
> recovery_test_suite_with_multisync.patch (36K) 
> 

Thanks for creating the patch. Sorry to nitpick but as has been brought up
before, it's better to send patches as email attachments (that is, not as
links to external sites).

Also, it would be helpful if your patch is submitted as a diff over
applying Michael's patch. That is, only the stuff specific to testing the
multiple sync feature and let the rest be taken care of by Michael's base
patch.

Thanks,
Amit






Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao  wrote in 

> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
>  wrote:
> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> >  wrote:
> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
> >>  wrote:
> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  
> >>> wrote:
>  On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>   wrote:
> > Yes, please let's use the custom language, and let's not care of not
> > more than 1 level of nesting so as it is possible to represent
> > pg_stat_replication in a simple way for the user.
> 
>  "not" is used twice in this sentence in a way that renders me not able
>  to be sure that I'm not understanding it not properly.
> >>>
> >>> 4 times here. Score beaten.
> >>>
> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> >>> to only support configurations up to one level of nested objects, like
> >>> that:
> >>> 2[node1, node2, node3]
> >>> node1, 2[node2, node3], node3
> >>> In short, we could restrict things so as we cannot define a group of
> >>> nodes within an existing group.
> >>
> >> No, actually, that's stupid. Having up to two nested levels makes more
> >> sense, a quite common case for this feature being something like that:
> >> 2{node1,[node2,node3]}
> >> In short, sync confirmation is waited from node1 and (node2 or node3).
> >>
> >> Flattening groups of nodes with a new catalog will be necessary to
> >> ease the view of this data to users:
> >> - group name?
> >> - array of members with nodes/groups
> >> - group type: quorum or priority
> >> - number of items to wait for in this group
> >
> > So, here are some thoughts to make that more user-friendly. I think
> > that the critical issue here is to properly flatten the meta data in
> > the custom language and represent it properly in a new catalog,
> > without messing up too much with the existing pg_stat_replication that
> > people are now used to for 5 releases since 9.0. So, I would think
> > that we will need to have a new catalog, say
> > pg_stat_replication_groups with the following things:
> > - One line of this catalog represents the status of a group or of a single 
> > node.
> > - The status of a node/group is either sync or potential, if a
> > node/group is specified more than once, it may be possible that it
> > would be sync and potential depending on where it is defined, in which
> > case setting its status to 'sync' has the most sense. If it is in sync
> > state I guess.
> > - Move sync_priority and sync_state, actually an equivalent from
> > pg_stat_replication into this new catalog, because those represent the
> > status of a node or group of nodes.
> > - group name, and by that I think that we had perhaps better make
> > mandatory the need to append a name with a quorum or priority group.
> > The group at the highest level is forcibly named as 'top', 'main', or
> > whatever if not directly specified by the user. If the entry is
> > directly a node, use the application_name.
> > - Type of group, quorum or priority
> > - Elements in this group, an element can be a group name or a node
> > name, aka application_name. If group is of type priority, the elements
> > are listed in increasing order. So the elements with lower priority
> > get first, etc. We could have one column listing explicitly a list of
> > integers that map with the elements of a group but it does not seem
> > worth it, what users would like to know is what are the nodes that are
> > prioritized. This covers the former 'priority' field of
> > pg_stat_replication.
> >
> > We may have a good idea of how to define a custom language, still we
> > are going to need to design a clean interface at catalog level more or
> > less close to what is written here. If we can get a clean interface,
> > the custom language implemented, and TAP tests that take advantage of
> > this user interface to check the node/group statuses, I guess that we
> > would be in good shape for this patch.
> >
> > Anyway that's not a small project, and perhaps I am over-complicating
> > the whole thing.
> >
> > Thoughts?
> 
> I agree that we would need something like such new view in the future,
> however it seems too late to work on that for 9.6 unfortunately.
> There is only one CommitFest left. Let's focus on very simple case, i.e.,
> 1-level priority list, now, then we can extend it to cover other cases.
> 
> If we can commit the simple version too early and there is enough
> time before the date of feature freeze, of course I'm happy to review
> the extended version like you proposed, for 9.6.

I agree with Fujii-san. There would be many convenient gadgets
around this and they are completely welcome, but having
fundamental 

Re: [HACKERS] pgbench stats per script & other stuff

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 4:22 AM, Fabien COELHO  wrote:
>> +   /* compute total_weight */
>> +   for (i = 0; i < num_scripts; i++)
>> +   {
>> +   total_weight += sql_script[i].weight;
>> +
>> +   /* detect overflow... */
>> If let as int64, you may want to remove this overflow check, or keep
>> it as int32.
>
>
> I'd rather keep int64, and remove the check.

OK, and you did so. Thanks.
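
For the record, the reason the check can simply go away (a sketch, not
the actual pgbench code): with an int64 accumulator, summing int-sized
per-script weights cannot realistically overflow, since even billions of
scripts each weighted at INT_MAX stay well within the int64 range.

```c
/* Sketch: summing int (int32-range) script weights into an int64 total,
 * as discussed above.  With a 64-bit accumulator no overflow check is
 * needed for any realistic number of scripts. */
#include <stdint.h>

int64_t
total_weight(const int *weights, int nscripts)
{
    int64_t total = 0;

    for (int i = 0; i < nscripts; i++)
        total += weights[i];    /* each addend is promoted to int64 */
    return total;
}
```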

>>> [JSON/YAML]
>>> If you want something else, it would help to provide a sample of what you
>>> expect.
>>
>> You could do that with an additional option here as well:
>> --output-format=normal|yamljson. The tastes of each user is different.
>
> I think that json/yaml-ifying pgbench output is beyond the scope of this
> patch, so it should be kept out?

Yeah, that's just a free idea that this set of patches does not need
to address. If someone thinks that's worth it, feel free to submit a
patch, perhaps we could add a TODO item on the wiki. Regarding the
output generated by your patch, I think that's fine. Perhaps Alvaro
has other thoughts on the matter. I don't know this part.

>>> Find attached a 18-d which addresses these concerns, and a actualized
>>> 18-e
>>> for the prefix.
>>
>>
>> (I could live without that personally)
>
> Hmmm, I type them and I'm not so good with a keyboard, so "se" is better
> than:
>
> "selct-onlyect-only".

I can understand that feeling.

>> -/* return builtin script "name", or fail if not found */
>> +/* return commands for selected builtin script, if unambiguous */
>> static script_t *
>> findBuiltin(const char *name)
>> This comment needs a refresh. This does not return a set of commands,
>> but the script itself.
>
> Indeed.
>
> Attached 19-d and 19-e.

+/* return builtin script "name", or fail if not found */
builtin does not sound like correct English to me, but built-in is.
-- 
Michael




Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Tom Lane
Ashutosh Bapat  writes:
> Sorry to come to this late.
> The userid being printed is from UserMapping. The new API
> GetUserMappingById() allows an FDW to get user mapping by its OID. This API
> is intended to be used by FDWs to fetch the user mapping inferred by the
> core for a given join between foreign relations. In such a user mapping
> object, userid may be -1 for a public user mapping.

If that is actually how it works, it's broken and I'm going to insist
on a redesign.  There is nothing anywhere that says that 0xFFFFFFFF
is not a valid OID.

regards, tom lane




Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Ashutosh Bapat
Sorry, I was wrong. For a public user mapping, userid is 0 (InvalidOid),
which is returned as-is in the UserMapping object. I confused InvalidOid
with -1. Sorry for the confusion.

On Tue, Feb 9, 2016 at 10:21 AM, Tom Lane  wrote:

> Ashutosh Bapat  writes:
> > Sorry to come to this late.
> > The userid being printed is from UserMapping. The new API
> > GetUserMappingById() allows an FDW to get user mapping by its OID. This
> API
> > is intended to be used by FDWs to fetch the user mapping inferred by the
> > core for given join between foreign relations. In such user mapping
> object
> > , userid may be -1 for a public user mapping.
>
> If that is actually how it works, it's broken and I'm going to insist
> on a redesign.  There is nothing anywhere that says that 0xFFFFFFFF
> is not a valid OID.
>
> regards, tom lane
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] [Proposal] Improvement of GiST page layout

2016-02-08 Thread Heikki Linnakangas

On 09/02/16 07:15, Andrew Borodin wrote:

Hi, hackers!

I want to propose improvement of GiST page layout.

GiST is optimized to reduce disk operations on search queries, for
example, window search queries in the case of an R-tree.

I expect that a more complicated page layout will help to trade some
CPU efficiency for disk efficiency.

A GiST tree is a balanced tree structure with two kinds of pages:
internal pages and leaf pages. Each tree page contains a bunch of tuples
with the same structure. Tuples of an internal page reference other
pages of the same tree, while leaf page tuples hold heap TIDs.

During execution of a search query, GiST invokes the key comparison
algorithm for each tuple on a page, with two possible outcomes: 'no' and
'may be'. A 'may be' answer recursively descends the search to the
referenced page (in the case of an internal page) or yields the tuple
(in the case of a leaf page).

The expected tuple count on a page is on the order of tens to hundreds.
This count is big enough that it is worth trying to save some cache
lines from being loaded into the CPU during enumeration.

For B-trees, during inspection of a page we effectively apply a binary
search algorithm, which is not possible in a GiST tree.

Let's consider an R-tree with arbitrary fan-out f. If a given query will
find exactly one data tuple, it is easy to show that the key comparison
count is minimal when f->e /*round_to_optimal(2.78) == 3*/ (the tree has
to examine f*h keys, h=log_f(N), and f*log_f(N) is minimal when f->e).
A smaller key comparison count means fewer cache lines are touched. So
fan-out reduction means cache pressure reduction (except an average
fan-out of 2, which seems to be too small) and less time waiting for
RAM. I suppose all that reasoning holds true in cases where more than
one tuple will be found.


GiST comparison operators tend to be quite expensive, so I suspect much 
of the gain is simply from reducing the number of comparisons, rather 
than cache efficiency. The conclusion is the same though, so it doesn't 
matter.
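
The quoted f->e argument is easy to sanity-check with a toy model (a
sketch only, nothing GiST-specific; the height is computed with integer
arithmetic rather than logarithms): with N = 1,000,000 keys, a fan-out
of 3 examines fewer keys on a root-to-leaf descent than fan-outs of 2,
4, or 10.

```c
/* Toy model of the quoted argument: a descent through a tree of nkeys
 * entries with the given fan-out examines roughly fanout * height keys,
 * where height = ceil(log_fanout(nkeys)).  Integers only, no libm. */
long
descent_comparisons(long fanout, long nkeys)
{
    long height = 0;
    long reach = 1;             /* keys reachable at the current height */

    while (reach < nkeys)
    {
        reach *= fanout;
        height++;
    }
    return fanout * height;
}
```

With ceiling effects the margin is small at this N, but the minimum near
f close to e (so 3 in integers) matches the quoted reasoning.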



How do we reduce tree fan-out? Obviously, we can't fill a page with just
3 tuples. But we can install a small tree-like structure inside one
page. A general GiST index has a root page, and analogously a page tree
should have a "root" layer of tuples. Private (or internal,
intermediate, auxiliary, I can't pick a precise word) tuples have only
keys and a fixed-size (F) array of offsets of underlying records. Each
layer is a linked list. Right after a page has been allocated there is
only the "ground" level of regular tuples. Eventually the record count
reaches F-1 and we create a new root layer with two tuples, each
referencing half of the preexisting records. Placement of new "ground"
tuples on the page will eventually cause an internal tuple to split. If
there is not enough space to split an internal tuple, we mark the page
for a whole page-split during the next iteration of the owning tree's
insertion algorithm. That is why the tuple-split happens at F-1 tuples,
not at F: if we have no space for splitting, we just add a reference to
the last slot. In this algorithm, a page split will cause major page
defragmentation: we take the root layer, halve it and place the halves
on different pages. When half of the data has gone to the other page,
the restructuring should tend to place tuples that are accessed together
next to each other. I think placing a whole level together is a good
strategy.


Yeah, something like that would make a lot of sense. I've actually been 
thinking of this same idea myself over the years, but never got around 
to implement it.


I'm not wedded to that particular layout, it seems a bit complicated. A 
two-level setup within each page might be good enough in practice: there 
would regular tuples, like today, and also "container" tuples, which 
represent a group of regular tuples. Also note that you can freely move 
tuples around in a page, so instead of storing an actual array of 
offsets with the container tuples, you could store just a range of line 
pointers, and keep all the child tuples packed together.
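
To make the line-pointer-range idea concrete, here is a hypothetical
sketch (none of these names or layouts exist in the actual GiST code;
the fields are purely illustrative): a container tuple would carry its
union key plus the inclusive range of line pointers of its packed
children, so a scan either expands the range or skips the whole group
at once.

```c
/* Hypothetical sketch of a two-level GiST page layout as described above.
 * None of these names exist in PostgreSQL; field choices are illustrative. */
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;  /* mirrors PostgreSQL's line-pointer offsets */

typedef struct ContainerTuple
{
    OffsetNumber first_child;   /* first line pointer of the packed group */
    OffsetNumber last_child;    /* last line pointer, inclusive */
    /* union key (the "may be" predicate covering all children) follows */
} ContainerTuple;

/* During a scan, a container whose key says "may be" expands to its
 * child range; otherwise the whole group is skipped in one step. */
bool
container_covers(const ContainerTuple *ct, OffsetNumber off)
{
    return off >= ct->first_child && off <= ct->last_child;
}
```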


I wonder if it's really a good idea to perform page split by simply 
using the "container" tuples. It certainly saves CPU time when 
splitting, but I'm worried that the split might not be as good as today. 
The mini-groups within the page are grown "organically" as tuples are 
inserted, and the page split algorithm might come up with a better 
grouping if it sees all the tuples at once.



I expect that implementation of this proposal could speed up insertions
into the index by 20% and the performance of queries by 30% when the
whole index fits in shared buffers. In the case of buffer starvation,
when the index is accessed through disk, this feature will cause a 15%
performance degradation, since the effective page capacity will be
smaller.


Not sure how you got those numbers, but the general direction sounds right.


Should this feature be configurable? Maybe it should even be a separate
access method?


I think this should be included in GiST, and always enabled. Or at least 
enabled by default. While the index will be slightly larger because of 

Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Pavel Stehule
Hi


> I just rechecked the thread.  In my reading, lots of people argued
> whether it should be called \rotate or \pivot or \crosstab; it seems the
> \crosstabview proposal was determined to be best.  I can support that
> decision.  But once the details were discussed, it was only you and
> Daniel left in the thread; nobody else participated.  While I understand
> that you may think that "silence is consent", what I am afraid of is
> that some committer will look at this two months from now and say "I
> hate this Hcol+ stuff, -1 from me" and send the patch back for syntax
> rework.  IMO it's better to have more people chime in here so that the
> patch that we discuss during the next commitfest is really the best one
> we can think of.
>

I don't have the feeling that Daniel and I did any of this privately. All
the work was public (I checked my mailbox) - though unfortunately it was
spread across several mailing list threads (I am not sure how that is
possible, because the subjects look the same). The discussion about the
design was public, I am sure. It was a relatively long process, with good
progress (from my perspective), because Daniel accepted and fixed all my
objections. The proposed syntax is fully consistent with other psql
commands - it is hard to create something new there, because the psql
parser is pretty limited. Although I think the syntax is good, clean
and useful, I am open to discussing it. Please try the last design and
the last patch - I spent a lot of hours (and I am sure Daniel spent many
more) thinking about how this could be designed better.


> Also, what about the business of putting "x" if there's no third column?
> Three months from now some Czech psql hacker will say "we should use
> Unicode chars for this" and we will be forever stuck with \pset
> unicode_crosstab_marker to change the character to a ☑ BALLOT BOX WITH
> CZECH.  Maybe we should think that a bit harder -- for example, what
> about just rejecting the case with no third column and forcing the user
> to add a third column with the character they choose?  That way you
> avoid that mess.
>

These features are in the "nice to have" category. There is no problem
doing them in the last commitfest or in the next release cycle. That is
not a reason to block committing this feature, and I am sure a lot of
users will be pretty happy with the "basic" version of this patch. All
the necessary functionality is there and working. We can continue
development in other cycles, but for this cycle, I am sure, all the
necessary work is done.


>
> > This feature has only small relation to SQL PIVOTING feature - it is just
> > form of view and I am agree with Daniel about sense of this feature.
>
> Yes, I don't disagree there.  Robert Haas and David Fetter also
> expressed their support for psql-side processing, so I think we're good
> there.
>
>
great. Personally, I have no objection to the current state. If anybody
has one, please raise it early. We can move forward. This is a nice
feature and a good patch, and there is no reason to stop here.

Regards

Pavel


> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] pgbench stats per script & other stuff

2016-02-08 Thread Fabien COELHO


Hi Michaël,


Attached 19-d and 19-e.


+/* return builtin script "name", or fail if not found */
builtin does not sound like correct English to me, but built-in is.


I notice that "man bash" uses "builtin" extensively, so I think it is okay 
like that, but I would be fine as well with "built-in".


I suggest leaving it as is unless some native speaker really requires 
"built-in", in which case there would be many places to update, so that 
would be another orthography-oriented patch:-)


--
Fabien.


Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Ashutosh Bapat
Sorry to come to this late.

The userid being printed is from UserMapping. The new API
GetUserMappingById() allows an FDW to get a user mapping by its OID. This
API is intended to be used by FDWs to fetch the user mapping inferred by
the core for a given join between foreign relations. In such a user
mapping object, userid may be -1 for a public user mapping. I think using
%u for -1 will print it as the largest unsigned integer. Would that
create confusion for users?

On Mon, Feb 8, 2016 at 9:37 PM, Tom Lane  wrote:

> Etsuro Fujita  writes:
> > Here is a patch to use %u not %d to print umid and userid.
>
> Pushed, thanks.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 1:22 PM, Ashutosh Bapat
 wrote:
> The userid being printed is from UserMapping. The new API
> GetUserMappingById() allows an FDW to get user mapping by its OID. This API
> is intended to be used by FDWs to fetch the user mapping inferred by the
> core for given join between foreign relations. In such user mapping object ,
> userid may be -1 for a public user mapping.

I am a bit surprised by this sentence, UserMapping->userid is an Oid,
and those are unsigned. Could you clarify?
-- 
Michael




Re: [HACKERS] Existence check for suitable index in advance when concurrently refreshing.

2016-02-08 Thread Fujii Masao
On Thu, Jan 28, 2016 at 1:01 AM, Masahiko Sawada  wrote:
> On Wed, Jan 27, 2016 at 4:42 PM, Fujii Masao  wrote:
>> On Tue, Jan 26, 2016 at 9:33 PM, Masahiko Sawada  
>> wrote:
>>> Hi all,
>>>
>>> In concurrently refreshing materialized view, we check whether that
>>> materialized view has suitable index(unique and not having WHERE
>>> condition), after filling data to new snapshot
>>> (refresh_matview_datafill()).
>>> This logic leads to taking a lot of time until postgres returns the
>>> ERROR if that table doesn't have a suitable index and the table is
>>> large. It wastes time.
>>> I think we should check whether that materialized view can use
>>> concurrently refreshing or not in advance.
>>
>> +1
>>
>>> The patch is attached.
>>>
>>> Please give me feedbacks.
>
> Thank you for having look at this patch.
>
>> +indexRel = index_open(indexoid, RowExclusiveLock);
>>
>> Can we use AccessShareLock here, instead?
>
> Yeah, I think we can use it. Fixed.
>
>> +if (indexStruct->indisunique &&
>> +IndexIsValid(indexStruct) &&
>> +RelationGetIndexExpressions(indexRel) == NIL &&
>> +RelationGetIndexPredicate(indexRel) == NIL)
>> +hasUniqueIndex = true;
>> +
>> +index_close(indexRel, RowExclusiveLock);
>>
>> In the case where hasUniqueIndex = true, ISTM that we can get out of
>> the loop immediately just after calling index_close(). No?
>
> Fixed.
>
>> +/* Must have at least one unique index */
>> +Assert(foundUniqueIndex);
>>
>> Can we guarantee that there is at least one valid unique index here?
>> If yes, it's better to write the comment about that.
>>
>
> Added.
>
> Attached latest patch. Please review it.

Thanks for updating the patch!
Attached is the updated version of the patch.
I removed the unnecessary assertion check and the source-code change
that you added, and improved the source comment.
Barring objection, I'll commit this patch.

Regards,

-- 
Fujii Masao
*** a/src/backend/commands/matview.c
--- b/src/backend/commands/matview.c
***
*** 217,222  ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
--- 217,267 
  			 RelationGetRelationName(matviewRel));
  
  	/*
+ 	 * Check that there is a unique index with no WHERE clause on
+ 	 * one or more columns of the materialized view if CONCURRENTLY
+ 	 * is specified.
+ 	 */
+ 	if (concurrent)
+ 	{
+ 		List		*indexoidlist = RelationGetIndexList(matviewRel);
+ 		ListCell 	*indexoidscan;
+ 		bool		hasUniqueIndex = false;
+ 
+ 		foreach(indexoidscan, indexoidlist)
+ 		{
+ 			Oid			indexoid = lfirst_oid(indexoidscan);
+ 			Relation	indexRel;
+ 			Form_pg_index	indexStruct;
+ 
+ 			indexRel = index_open(indexoid, AccessShareLock);
+ 			indexStruct = indexRel->rd_index;
+ 
+ 			if (indexStruct->indisunique &&
+ IndexIsValid(indexStruct) &&
+ RelationGetIndexExpressions(indexRel) == NIL &&
+ RelationGetIndexPredicate(indexRel) == NIL &&
+ indexStruct->indnatts > 0)
+ 			{
+ hasUniqueIndex = true;
+ index_close(indexRel, AccessShareLock);
+ break;
+ 			}
+ 
+ 			index_close(indexRel, AccessShareLock);
+ 		}
+ 
+ 		list_free(indexoidlist);
+ 
+ 		if (!hasUniqueIndex)
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 	 errmsg("cannot refresh materialized view \"%s\" concurrently",
+ 			quote_qualified_identifier(get_namespace_name(RelationGetNamespace(matviewRel)),
+ 	   RelationGetRelationName(matviewRel))),
+ 	 errhint("Create a unique index with no WHERE clause on one or more columns of the materialized view.")));
+ 	}
+ 
+ 	/*
  	 * The stored query was rewritten at the time of the MV definition, but
  	 * has not been scribbled on by the planner.
  	 */

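
For the archive, the behavior the patch produces can be pictured with a
session like this (the table and index names are made up; the error and
hint text are the ones added by the patch):

```sql
-- Hypothetical session illustrating the early check added above.
CREATE MATERIALIZED VIEW mv AS SELECT id, val FROM base_table;

REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
-- ERROR:  cannot refresh materialized view "public.mv" concurrently
-- HINT:  Create a unique index with no WHERE clause on one or more
--        columns of the materialized view.
-- (with the patch, this fails before filling the new snapshot)

CREATE UNIQUE INDEX mv_id_idx ON mv (id);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;  -- succeeds
```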


Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Noah Misch
On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote:
> Of late, by far the majority of the random-noise failures we see in the
> buildfarm have come from failure to shut down the postmaster in a
> reasonable timeframe.

> We've seen variants
> on this theme on half a dozen machines just in the past week --- and it
> seems to mostly happen in 9.5 and HEAD, which is fishy.

It has been affecting only the four AIX animals, which do share hardware.
(Back in 2015 and once in 2016-01, it did affect axolotl and shearwater.)  I
agree the concentration on 9.5 and HEAD is suspicious; while those branches
get the most buildfarm runs, that factor by itself doesn't explain the
distribution of failures among versions.

> What I'd like to do to investigate this is put in a temporary HEAD-only
> patch that makes ShutdownXLOG() and its subroutines much chattier about
> how far they've gotten and what time it is, and also makes pg_ctl print
> out the current time if it gives up waiting.  A few failed runs with
> that in place will at least allow us to confirm or deny whether it's
> just that the shutdown checkpoint is sometimes really slow, or whether
> there's a bug lurking.
> 
> Any objections?  Anybody have another idea for data to collect?

That's reasonable.  If you would like higher-fidelity data, I can run loops of
"pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop", and I could run
that for HEAD and 9.2 simultaneously.  A day of logs from that should show
clearly if HEAD is systematically worse than 9.2.  By the way, you would
almost surely qualify for an account on this machine.


I had drafted the following message and patch last week, and I suppose it
belongs in this thread:

On Mon, Oct 12, 2015 at 06:41:06PM -0400, Tom Lane wrote:
> I'm not sure if this will completely fix our problems with "pg_ctl start"
> related buildfarm failures on very slow critters.  It does get rid of the
> hard wired 5-second timeout, but the 60-second timeout could still be an
> issue.  I think Noah was considering a patch to allow that number to be
> raised.  I'd be in favor of letting pg_ctl accept a default timeout length
> from an environment variable, and then the slower critters could be fixed
> by adjusting their buildfarm configurations.

Your commit 6bcce25 made src/bin test suites stop failing due to pg_ctl
startup timeouts, but other suites have been failing on the AIX buildfarm zoo
due to slow shutdown.  Example taking 72s to even reach ShutdownXLOG():
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sungazer=2016-01-15%2012%3A43%3A00

So, I wish to raise the timeout for those animals.  Using an environment
variable was a good idea; it's one less thing for test authors to remember.
Since the variable affects a performance-related fudge factor rather than
change behavior per se, I'm less skittish than usual about unintended
consequences of dynamic scope.  (With said unintended consequences in mind, I
made "pg_ctl register" ignore PGCTLTIMEOUT rather than embed its value into
the service created.)

Thanks,
nm
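
The lookup the patch implements can be mirrored in shell to show the
intended precedence (a sketch of the semantics only; effective_timeout
is a made-up helper, not part of pg_ctl):

```shell
# Sketch of the precedence the patch gives pg_ctl's wait timeout
# (shell mirror of the C logic, not pg_ctl itself): an explicit -t
# argument beats PGCTLTIMEOUT, which beats the built-in 60s default.

effective_timeout() {
    # $1 is the -t value if one was given, empty otherwise
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "${PGCTLTIMEOUT:-60}"
    fi
}

# What a slow buildfarm animal's configuration would export:
PGCTLTIMEOUT=900
effective_timeout ""      # prints 900
effective_timeout 120     # prints 120 (explicit -t still wins)
```

Registering a Windows service deliberately ignores the environment
value, so only an explicit -t ends up embedded in the service command
line, per the "Don't propagate a value from an environment variable"
comment in the patch.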
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index eaa0cc8..6ceb781 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -362,7 +362,9 @@ PostgreSQL documentation
   

 The maximum number of seconds to wait when waiting for startup or
-shutdown to complete.  The default is 60 seconds.
+shutdown to complete.  Defaults to the value of the
+PGCTLTIMEOUT environment variable or, if not set, to 60
+seconds.

   
  
@@ -487,6 +489,17 @@ PostgreSQL documentation
 
   

+PGCTLTIMEOUT
+
+
+ 
+  Default limit on the number of seconds to wait when waiting for startup
+  or shutdown to complete.  If not set, the default is 60 seconds.
+ 
+
+   
+
+   
 PGDATA
 
 
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 9da38c4..bae6c22 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -72,6 +72,7 @@ typedef enum
 static bool do_wait = false;
 static bool wait_set = false;
 static int wait_seconds = DEFAULT_WAIT;
+static bool wait_seconds_arg = false;
 static bool silent_mode = false;
 static ShutdownMode shutdown_mode = FAST_MODE;
 static int sig = SIGINT;   /* default */
@@ -1431,7 +1432,8 @@ pgwin32_CommandLine(bool registration)
if (registration && do_wait)
appendPQExpBuffer(cmdLine, " -w");
 
-   if (registration && wait_seconds != DEFAULT_WAIT)
+   /* Don't propagate a value from an environment variable. */
+   if (registration && wait_seconds_arg && wait_seconds != DEFAULT_WAIT)
appendPQExpBuffer(cmdLine, " -t %d", wait_seconds);
 
if (registration && silent_mode)
@@ -2128,6 +2130,7 @@ main(int argc, char **argv)
{NULL, 0, NULL, 0}
};
 
+   char   *env_wait;
int 

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Tom Lane
Noah Misch  writes:
> On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote:
>> We've seen variants
>> on this theme on half a dozen machines just in the past week --- and it
>> seems to mostly happen in 9.5 and HEAD, which is fishy.

> It has been affecting only the four AIX animals, which do share hardware.
> (Back in 2015 and once in 2016-01, it did affect axolotl and shearwater.)

Certainly your AIX critters have shown this a bunch, but here's another
current example:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2016-02-08%2014%3A49%3A23

> That's reasonable.  If you would like higher-fidelity data, I can run loops of
> "pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop", and I could run
> that for HEAD and 9.2 simultaneously.  A day of logs from that should show
> clearly if HEAD is systematically worse than 9.2.

That sounds like a fine plan, please do it.

> So, I wish to raise the timeout for those animals.  Using an environment
> variable was a good idea; it's one less thing for test authors to remember.
> Since the variable affects a performance-related fudge factor rather than
> change behavior per se, I'm less skittish than usual about unintended
> consequences of dynamic scope.  (With said unintended consequences in mind, I
> made "pg_ctl register" ignore PGCTLTIMEOUT rather than embed its value into
> the service created.)

While this isn't necessarily a bad idea in isolation, the current
buildfarm scripts explicitly specify a -t value to pg_ctl stop, which
I would not expect an environment variable to override.  So we need
to fix the buildfarm script to allow the timeout to be configurable.
I'm not sure if there would be any value-add in having pg_ctl answer
to an environment variable once we've done that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread kharagesuraj
Hello,


>> I agree with the first version, and attached the updated patch which is
>> modified so that it supports the simple multiple sync replication you
>> suggested.
>> (but test cases are not included yet.)

I have tried writing some basic built-in test cases for multi-sync replication.
I have created one patch over Michael's patch
(http://www.postgresql.org/message-id/CAB7nPqTEqou=xryrgsga13qw1xxssd6tfhz-sm_j3egdvso...@mail.gmail.com).
It is still in progress.
Please have a look, correct me if I am wrong, and suggest remaining test cases.

Regards
Suraj Kharage




recovery_test_suite_with_multisync.patch (36K) 






Re: [HACKERS] WAL Re-Writes

2016-02-08 Thread Amit Kapila
On Mon, Feb 8, 2016 at 8:11 PM, Robert Haas  wrote:
>
> On Mon, Feb 8, 2016 at 12:08 AM, Amit Kapila 
wrote:
> > I think deciding it automatically without user require to configure it,
> > certainly has merits, but what about some cases where user can get
> > benefits by configuring themselves like the cases where we use
> > PG_O_DIRECT flag for WAL (with o_direct, it will by bypass OS
> > buffers and won't cause misaligned writes even for smaller chunk sizes
> > like 512 bytes or so).  Some googling [1] reveals that other databases
> > also provides user with option to configure wal block/chunk size (as
> > BLOCKSIZE), although they seem to decide chunk size based on
> > disk-sector size.
>
> Well, if you can prove that we need that flexibility, then we should
> have a GUC.  Where's the benchmarking data to support that conclusion?
>

It is not posted yet because some more work is needed to complete the
benchmark results when PG_O_DIRECT is used (mainly with open_sync and
open_datasync).  I will do so.  But I think the main thing to take care
of is that, since smaller-chunk-sized writes are useful only in some
cases, users should not be baffled by them.  There are multiple ways to
provide this:

a) at startup, ensure that if the user has set a smaller chunk size
(other than the 4KB default decided at configure time in the way you
described upthread) and PG_O_DIRECT can be used as decided in
get_sync_bit(), then allow it; otherwise either return an error or just
fall back to the 4KB default.

b) mention in the docs that it is better not to tinker with the
wal_chunk_size GUC unless the other relevant settings match
(wal_sync_method = open_sync or open_datasync, and the default wal_level).

c) yet another option is to stay with 4KB-sized chunks for now, since
the benefit of smaller chunks applies only to a subset of the cases we
can support.

The reason I think it is beneficial to provide an option of writing in
smaller chunks is that it could reduce the amount of re-writes by a
higher percentage where it can be used.  For example, at 4KB there is a
~35% reduction; with smaller chunks the saving could reach 50% or 70%,
depending on the chunk size.


>
> > An additional thought, which is not necessarily related to this patch
is,
> > if user chooses and or we decide to write in 512 bytes sized chunks,
> > which is usually a disk sector size, then can't we think of avoiding
> > CRC for each record for such cases, because each WAL write in
> > it-self will be atomic.  While reading, if we process in wal-chunk-sized
> > units, then I think it should be possible to detect end-of-wal based
> > on data read.
>
> Gosh, taking CRCs off of WAL records sounds like a terrible idea.  I'm
> not sure why you think that writing in sector-sized chunks would make
> that any more safe, because to me it seems like it wouldn't.  But even
> if it does, it's hard to believe that we don't derive some reliability
> from CRCs that we would lose without them.
>

I think the point here is not about more safety; rather it is about
whether writing in disk-sector sizes gives reliability equal to CRCs,
because if it does, then skipping the CRC calculation for each record,
both while writing and during replay, can save CPU and should in turn
lead to better performance.  The reason I thought it could give equal
reliability is that disk-sector writes are atomic.
I admit that much more analysis/research is required before doing that,
and we can do it later if it proves valuable in terms of performance
and reliability.  Here, I mentioned it to show that writing in smaller
chunks has other potential benefits.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pgbench stats per script & other stuff

2016-02-08 Thread Michael Paquier
On Fri, Feb 5, 2016 at 12:53 AM, Fabien COELHO  wrote:
>> Something is wrong with patch d.  I noticed two things,
>> 1. the total_weight stuff can overflow,
>
> It can generate an error on overflow by checking the total_weight while it
> is being computed. I've switched total_weight to int64 so it is now really
> impossible to overflow with the 32 bit int_max limit on weight.

+   /* compute total_weight */
+   for (i = 0; i < num_scripts; i++)
+   {
+   total_weight += sql_script[i].weight;
+
+   /* detect overflow... */
+   if (total_weight < 0)
+   {
+   fprintf(stderr, "script weight overflow at script %d\n", i+1);
+   exit(1);
+   }
+   }
If left as int64, you may want to remove this overflow check, or else
keep total_weight as int32.

>> 2. the chooseScript stuff is broken, or something.
>
> Sorry, probably a <=/< error. I think I've fixed it and I've simplified the
> code a little bit.

+   w = getrand(thread, 0, total_weight - 1);
+   do
+   {
+   w -= sql_script[i++].weight;
+   } while (w >= 0);
+
+   return i - 1;
This portion looks fine to me.

>> Another thing is that the "transaction type" output really deserves some
>> more work.  I think "multiple scripts" really doesn't cut it; we should
>> have some YAML-like as in the latency reports, which lists all scripts
>> in use and their weights.
>
> For me the current output is clear for the reader, which does not mean that
> it cannot be improved, but how is rather a matter of taste.
>
> I've tried to improve it further, see attached patch.
>
> If you want something else, it would help to provide a sample of what you
> expect.

You could do that with an additional option here as well:
--output-format=normal|yaml|json. The tastes of users differ.

>> Attached is my version of the patch.  While you're messing with it, it'd
>> be nice if you added comments on top of your recently added functions
>> such as findBuiltin, process_builtin, chooseScript.
> Why not.

const char *name;
+   int weight;
Command   **commands;
-   StatsData stats;
+   StatsData   stats;
Noise here?

> Find attached a 18-d which addresses these concerns, and a actualized 18-e
> for the prefix.

(I could live without that personally)

-/* return builtin script "name", or fail if not found */
+/* return commands for selected builtin script, if unambiguous */
 static script_t *
 findBuiltin(const char *name)
This comment needs a refresh. This does not return a set of commands,
but the script itself.
-- 
Michael




Re: [HACKERS] remove wal_level archive

2016-02-08 Thread David Steele
On 2/7/16 4:47 PM, Peter Eisentraut wrote:
> On 1/26/16 10:56 AM, Simon Riggs wrote:
>> Removing one of "archive" or "hot standby" will just cause confusion and
>> breakage, so neither is a good choice for removal.
>>
>> What we should do is 
>> 1. Map "archive" and "hot_standby" to one level with a new name that
>> indicates that it can be used for both/either backup or replication.
>>   (My suggested name for the new level is "replica"...)
>> 2. Deprecate "archive" and "hot_standby" so that those will be removed
>> in a later release.
> 
> Updated patch to reflect these suggestions.

-#define XLogIsNeeded() (wal_level >= WAL_LEVEL_ARCHIVE)
+#define XLogIsNeeded() (wal_level >= WAL_LEVEL_REPLICA)
<...>
-#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY)
+#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)

Since these are identical now shouldn't one be removed?  I searched the
code and I couldn't find anything that looked dead (i.e. XLogIsNeeded()
&& !XLogStandbyInfoActive()) but it still seems like having both could
cause confusion.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 5:39 AM, Aleksander Alekseev wrote:
>> So: do we have clear evidence that we need 128 partitions here, or
>> might, say, 16 work just fine?
>
> Yes, this number of partitions was chosen based on this benchmark (see
> "spinlock array" column):
>
> http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu

It's very hard to understand what that email is trying to say.  It
uses terms like PN and AN without defining them.  There's certainly
not enough detail for somebody else to replicate what you did.

> In fact we can choose 16, 32, 64 or 128 partitions - any number works
> better than current implementation with a single spinlock. But
> according to this benchmark, 128 partitions give twice as much TPS as
> 16 partitions, almost as if we didn't have any locks at all (see "no
> locks" column).

OK.

> Still I'm a bit concerned that 128 partitions require (128-16)*
> ( sizeof(slock_t) + sizeof(void*) + sizeof(long) ) bytes of additional
> memory per hash table which is:
>
> (gdb) p (128-16)*(sizeof(slock_t) + sizeof(long) + sizeof(void*))
> $1 = 1904
>
> ... bytes of shared memory on amd64.
>
> I'm not sure if this is considered OK. If it is I suggest to leave
> number 128. If it's not, perhaps we could agree on say 64 partitions as
> a compromise. Described benchmark doesn't represent any possible
> workload anyway. I personally believe that we don't have so many small
> hash tables that 1.8 Kb of additional shared memory per hash table would
> be an issue.

I don't particularly care about using slightly more shared memory per
hash table.

>> I don't really want to increase the number of lock manager partition
>> locks to 128 just for the convenience of this patch, and even less do
>> I want to impose the requirement that every future shared hash table
>> must use an equal number of partitions.
>
> May I ask why? Does it create a real problem?

It's hard to be 100% sure about the future, but suppose that in
another 10 years we discover that the buffer mapping hash table now
needs to have 1024 partitions, but the lock manager partition hash
table still only needs 16.  Do we really want to drag the lock manager
and any future shared hash tables up to 1024 just because the buffer
manager needs it?  That sounds like a bad idea from here.  For
example, one advantage of having fewer partitions is that it takes
less time to lock them all.  That matters to - for example - the cost
of running the deadlock detector.  Is the cost enough to matter in
that particular case with the code as it stands?  I don't know.  Could
having more partitions ever be worse than having fewer?  I'm pretty
sure that the answer is yes.

>> I don't see any reason why the number of free list partitions needs
>> to be a constant, or why it needs to be equal to the number of hash
>> bucket partitions.  That just seems like a completely unnecessary
>> constraint. You can take the hash value, or whatever else you use to
>> pick the partition, modulo any number you want.
>
> True, number of spinlocks / freeLists could be a variable. I believe
> it's an old-good simplicity vs flexibility question.
>
> Lets say we allow user (i.e. calling side) to choose spinlocks and free
> lists number. So now we should use freeList[hash % items_number] formula
> which means division instead of binary shift. Do we really need such an
> expensive operation just to calculate index of a free list? Probably
> not. Let limit possible items_number values so it could be only power
> of two. Now there is only four possible values user would more likely to
> choose (16, 32, 64, 128), which is not as flexible anymore.

Sure, I don't see anything wrong with requiring the number of
partitions to be a power of two.  On the other hand, I also suspect
that on modern hardware the cost of a shift versus a modulo operator
is almost irrelevant.  What's really going to make this fast or slow
is the degree to which it has, or does not have, cache line
contention.

> What about memory? If we allocate mutex, nentries and freeList
> dynamically depending on items_number value, HASHHDR should store
> pointers to allocated memory which means additional indirection for
> almost every access to mutex, nentries and freeList fields. Probably a
> bad idea. Otherwise we still waste shared memory, probably even more
> memory since we don't want maximum items_number value to be too low.

These seem like problems that are admit of more than one good
solution, and which you are certainly capable of solving.  For
example, you could always include space for 128 freelists but not tie
the number of freelists to the number of partition locks, or you could
always include space for 128 freelists and have the system only use
them all if there are 128 or more partitions.  If there are fewer than
128 partitions, it uses only as many freelists as there are
partitions.  Neither of these things sounds very hard to implement.

Basically, the burden 

Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Teodor Sigaev

Hi!

Interesting feature, but it's not very obvious how to use it. I'd like to see
some example(s) in the documentation.


And I see an implementation of an AVL tree in the psql source code
(src/bin/psql/crosstabview.c). Postgres already has a red-black tree
implementation in src/include/lib/rbtree.h and src/backend/lib/rbtree.c. Is
there any advantage to using an AVL tree here? I have some doubts about that,
and this implementation, obviously, will not be as well tested. But I see in
the comments that the implementation is reduced to insert-only and does not
use the tree's ordering, so even a hash table could be used.


Daniel Verite wrote:

Pavel Stehule wrote:


1. maybe we can decrease name to shorter "crossview" ?? I am happy with
crosstabview too, just crossview is correct too, and shorter


I'm in favor of keeping crosstabview. It's more explicit, only
3 characters longer and we have tab completion anyway.


2. Columns used for ordering should not be displayed by default. I can live
with current behave, but hiding ordering columns is much more practical for
me


I can see why, but I'm concerned about a consequence:
say we have 4 columns A,B,C,D and user does \crosstabview +A:B +C:D
If B and D are excluded by default, then there's nothing
left to display inside the grid.
It doesn't feel quite right. There's something counter-intuitive
in the fact that values in the grid would disappear depending on
whether and how headers are sorted.
With the 3rd argument, we let the user decide what they want
to see.


3. This code is longer, so some regress tests are recommended - attached
simple test case


I've added a few regression tests to the psql testsuite
based on your sample data. New patch with these tests
included is attached, make check passes.

Best regards,






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] WIP: Make timestamptz_out less slow.

2016-02-08 Thread David Rowley
On 7/02/2016 4:14 am, "Tom Lane"  wrote:
>
> David Rowley  writes:
> [ timestamp_out_speedup_2015-11-05.patch ]
>
> Pushed with a bunch of cosmetic tweaks.

Great. Thanks for pushing this.

David


Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-02-08 Thread Andres Freund
Hi,

(x-posting to -hackers, more relevant audience)

On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
> Now that there is a setting to give a cluster a "name", it would be nice to
> have an escape sequence in the log_line_prefix setting that could reference
> the cluster_name.

I've argued[1][2] for this when cluster_name was introduced, but back
then I seemed to have been the only one arguing for it. Josh later
jumped that train.

Given that we now had a number of people wishing for this, can we maybe
reconsider?

Greetings,

Andres Freund

[1] 
http://archives.postgresql.org/message-id/CADLWmXUm%3D7Y3UORZnGMdSC6p1eymO0k0JkH4NKr4KstdWk0P7g%40mail.gmail.com
[2] 
http://archives.postgresql.org/message-id/20140505101007.gu12...@awork2.anarazel.de




Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Amit Kapila
On Mon, Feb 8, 2016 at 12:28 PM, Michael Paquier wrote:
>
>
> >>   /*
> >> +  * Fetch the progress position before taking any WAL insert
lock. This
> >> +  * is normally an operation that does not take long, but leaving
this
> >> +  * lookup out of the section taken an exclusive lock saves a
couple
> >> +  * of instructions.
> >> +  */
> >> + progress_lsn = GetProgressRecPtr();
> >
> > too long for my taste. How about:
> > /* get progress, before acuiring insert locks to shorten locked section
*/
>
> Check.
>

What is the need to acquire the locks one by one during a checkpoint when
we are anyway going to take locks on all the insertion slots?

+ * to not rely on taking an exclusive lock an all the WAL insertion locks,

/an all/on all

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Andrew Dunstan



On 02/08/2016 03:16 AM, Pavel Stehule wrote:

Hi

On Russian PgConf I had a talk with Oleg about missing features in
PLpgSQL that complicate migrations from Oracle to PostgreSQL.
Currently I see only one blocker - missing protected session
variables. PL/SQL has package variables with package-only scope
and session life cycle. Currently we cannot ensure/enforce
schema-scope visibility - and we cannot implement this
functionality in PL languages other than C.


I propose really basic functionality that can be enhanced in the future,
step by step. This proposal doesn't contain any controversial feature
or syntax, I hope. It is related to PLpgSQL only, but the described
feature can be used from any PL language that implements the interface.



I think it would make sense to implement the interface in at least one 
of our other supported PLs. I'm not entirely clear how well this will 
match up with, say, plperl, but I'd be interested to see.



cheers

andrew




Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-02-08 Thread Robert Haas
On Sun, Feb 7, 2016 at 7:28 PM, Kouhei Kaigai  wrote:
> The new callbacks of T_ExtensibleNode will replace the necessity to
> form and deform process of private values, like as:
>   https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L114

Yeah.

> It transforms a bunch of internal data of CustomScan (similar to the
> extended fields in T_ExtensibleNode) to/from the node functions
> understandable forms for copy, input and output support.
> I think it implies you proposition is workable.
>
> I'd like to follow this proposition basically.
> On the other hands, I have two points I want to pay attention.
>
> 1. At this moment, it is allowed to define a larger structure that
> embeds CustomPath and CustomScanState by extension. How do we treat
> this coding manner in this case? Especially, CustomScanState has no
> private pointer dereference because it assumes an extension define
> a larger structure. Of course, we don't need to care about node
> operations on Path and PlanState nodes, but right now.

I see no real advantage in letting a CustomPath be larger.  If
custom_private can include extension-defined node types, that seems
good enough.  On the other hand, if CustomScanState can be larger,
that seems fine.   We don't really need any special support for that,
do we?

> 2. I intended to replace LibraryName and SymbolName fields from the
> CustomScanMethods structure by integration of extensible node type.
> We had to give a pair of these identifiers because custom scan provider
> has no registration points at this moment. A little concern is extension
> has to assume a particular filename of itself.
> But, probably, it shall be a separated discussion. My preference is
> preliminary registration of custom scan provider by its name, as well
> as extensible node.

Seems like we could just leave the CustomScan stuff alone and worry
about this as a separate facility.

> Towards the last question; whether *_private shall be void * or List *,
> I want to keep fdw_private and custom_private as List * pointer, because
> a new node type definition is a bit overdone job if this FDW or CSP will
> take only a few private fields with primitive data types.
> It is a preferable features when extension defines ten or more private
> fields.

Well, I suggested Node *, not void *.  A Node can be a List, but not
every Node is a List.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Pavel Stehule
2016-02-08 13:53 GMT+01:00 Marko Tiikkaja :

> On 08/02/16 13:41, Pavel Stehule wrote:
>
>> 2016-02-08 13:22 GMT+01:00 Marko Tiikkaja :
>>
>>> Personally I find that undesirable.  I don't know what oracle does, but
>>> variables being visible without schema-qualifying them can introduce
>>> variable conflicts in PL/PgSQL.  I'd prefer if you could only refer to
>>> them
>>> by prefixing them with the schema name (or maybe allow search_path to be
>>> used).
>>>
>>
>> I hope so there are not new conflicts - schema variable is not directly
>> visible from SQL (in this iteration) - they are visible only from
>> functions
>> - and the behave is same like global plpgsql variable. So schema variable
>> can be in conflict with SQL identifier only exactly identically as plpgsql
>> variable
>>
>
> Yeah, and that's exactly what I don't want, because that means that CREATE
> SCHEMA VARIABLE suddenly breaks existing code.
>

theoretically yes, but this conflict can be 100% detected - so no quiet bug
is possible, and plpgsql_check can find this issue well. If you don't use
schema variables, your code will be correct. You have to explicitly
create the variable, and if there is any problem, it will be confined
to the PL functions in one schema. And you can identify it by
static analysis.


>
> But prefix can be used.
>>
>
> Sure, but I don't see the point.  Is there a reason not to require such
> variable references to be prefixed with the schema name?  Or explicitly
> bring them into scope in the DECLARE section somehow.
>

we can define any rules, but I prefer to stay consistent with the current
variable design. I don't want any mandatory prefixes when they are not
necessary.

Developers can use explicit safe prefixing: "_" for local variables, "__"
for schema variables.

I thought about a DECLARE section too. But more declarations mean more
copy/paste and different bugs, and those bugs are harder to detect by
static analysis.


>
>
> .m
>


Re: [HACKERS] WAL Re-Writes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 12:08 AM, Amit Kapila  wrote:
> I think deciding it automatically without user require to configure it,
> certainly has merits, but what about some cases where user can get
> benefits by configuring themselves like the cases where we use
> PG_O_DIRECT flag for WAL (with o_direct, it will by bypass OS
> buffers and won't cause misaligned writes even for smaller chunk sizes
> like 512 bytes or so).  Some googling [1] reveals that other databases
> also provides user with option to configure wal block/chunk size (as
> BLOCKSIZE), although they seem to decide chunk size based on
> disk-sector size.

Well, if you can prove that we need that flexibility, then we should
have a GUC.  Where's the benchmarking data to support that conclusion?

> An additional thought, which is not necessarily related to this patch is,
> if user chooses and or we decide to write in 512 bytes sized chunks,
> which is usually a disk sector size, then can't we think of avoiding
> CRC for each record for such cases, because each WAL write in
> it-self will be atomic.  While reading, if we process in wal-chunk-sized
> units, then I think it should be possible to detect end-of-wal based
> on data read.

Gosh, taking CRCs off of WAL records sounds like a terrible idea.  I'm
not sure why you think that writing in sector-sized chunks would make
that any more safe, because to me it seems like it wouldn't.  But even
if it does, it's hard to believe that we don't derive some reliability
from CRCs that we would lose without them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL Re-Writes

2016-02-08 Thread Andres Freund
On 2016-02-08 10:38:55 +0530, Amit Kapila wrote:
> I think deciding it automatically without user require to configure it,
> certainly has merits, but what about some cases where user can get
> benefits by configuring themselves like the cases where we use
> PG_O_DIRECT flag for WAL (with o_direct, it will by bypass OS
> buffers and won't cause misaligned writes even for smaller chunk sizes
> like 512 bytes or so).  Some googling [1] reveals that other databases
> also provides user with option to configure wal block/chunk size (as
> BLOCKSIZE), although they seem to decide chunk size based on
> disk-sector size.

FWIW, you usually can't do such small writes with O_DIRECT. Usually they
have to be 4KB (page-size) sized, 4KB-aligned writes. And on
filesystems that do support doing such writes, they essentially fall
back to doing buffered IO.

> An additional thought, which is not necessarily related to this patch is,
> if user chooses and or we decide to write in 512 bytes sized chunks,
> which is usually a disk sector size, then can't we think of avoiding
> CRC for each record for such cases, because each WAL write in
> it-self will be atomic.  While reading, if we process in wal-chunk-sized
> units, then I think it should be possible to detect end-of-wal based
> on data read.

O_DIRECT doesn't give any useful guarantees to do something like the
above. It doesn't have any ordering or durability implications. You
still need to do fdatasyncs and such.

Besides, with the new CRC implementation, that doesn't really seem like
such a large win anyway.

Greetings,

Andres Freund




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-08 Thread Andres Freund
Hi Fabien,

On 2016-02-04 16:54:58 +0100, Andres Freund wrote:
> I don't want to post a full series right now, but my working state is
> available on
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/checkpoint-flush
> git://git.postgresql.org/git/users/andresfreund/postgres.git checkpoint-flush
> 
> The main changes are that:
> 1) the significant performance regressions I saw are addressed by
>changing the wal writer flushing logic
> 2) The flushing API moved up a couple layers, and now deals with buffer
>tags, rather than the physical files
> 3) Writes from checkpoints, bgwriter and files are flushed, configurable
>    by individual GUCs. Without that I still saw the spikes in a lot of
> circumstances.
> 
> There's also a more experimental reimplementation of bgwriter, but I'm
> not sure it's realistic to polish that up within the constraints of 9.6.

Any comments before I spend more time polishing this? I'm currently
updating docs and comments to actually describe the current state...

Andres




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-02-08 Thread Shulgin, Oleksandr
On Mon, Jan 25, 2016 at 5:11 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:
>
> On Sat, Jan 23, 2016 at 11:22 AM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>>
>> Overall, I think this is really about deciding when to cut off the MCV,
>> so that it does not grow needlessly large - as Robert pointed out, the
>> larger the list, the more expensive the estimation (and thus planning).
>>
>> So even if we could fit the whole sample into the MCV list (i.e. we
>> believe we've seen all the values and we can fit them into the MCV list),
>> it may not make sense to do so. The ultimate goal is to estimate
>> conditions, and if we can do that reasonably even after cutting off the
>> least frequent values from the MCV list, then why not?
>>
>> From this point of view, the analysis deals just with the
>> ANALYZE part and does not discuss the estimation counter-part at all.
>
> True, this aspect still needs verification.  As stated, my primary
> motivation was to improve the plan stability for relatively short MCV lists.
>
> Longer MCV lists might be a different story, but see "Increasing stats
> target" section of the original mail: increasing the target doesn't give
> quite the expected results with unpatched code either.

To address this concern I've run my queries again on the same dataset, now
focusing on how the number of MCV items changes with the patched code
(using the CTEs from my original mail):

WITH ...

SELECT count(1),
   min(num_mcv)::real,
   avg(num_mcv)::real,
   max(num_mcv)::real,
   stddev(num_mcv)::real

  FROM stats2

 WHERE num_mcv IS NOT NULL;

(ORIGINAL)
count  | 27452
min| 1
avg| 32.7115
max| 100
stddev | 40.6927

(PATCHED)
count  | 27527
min| 1
avg| 38.4341
max| 100
stddev | 43.3596

A significant portion of the MCV lists is occupying all 100 slots available
with the default statistics target, so it is also interesting to look at the
stats that have "underfilled" MCV lists (by changing the condition of the
WHERE clause to read "num_mcv < 100"):

(<100 ORIGINAL)
count  | 20980
min| 1
avg| 11.9541
max| 99
stddev | 18.4132

(<100 PATCHED)
count  | 19329
min| 1
avg| 12.3222
max| 99
stddev | 19.6959

As one can see, with the patched code the average length of MCV lists
doesn't change all that dramatically, while at the same time exposing all
the improvements described in the original mail.

>> After fixing the estimator to consider fraction of NULLs, the estimates
>> look like this:
>>
>> statistics target |  master  |  patched
>> ------------------+----------+----------
>>               100 |   1302   |   5356
>>              1000 |   6022   |   6791
>>
>> So this seems to significantly improve the ndistinct estimate (patch
>> attached).
>
> Hm... this looks correct.  And compute_distinct_stats() needs the same
> treatment, obviously.

I've incorporated this fix into the v2 of my patch, I think it is related
closely enough.  Also, added corresponding changes to
compute_distinct_stats(), which doesn't produce a histogram.

I'm adding this to the next CommitFest.  Further reviews are very much
appreciated!

--
Alex
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 070df29..cbf3538 100644
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***
*** 2079,2095 
  		denom,
  		stadistinct;
  
! 			numer = (double) samplerows *(double) d;
  
! 			denom = (double) (samplerows - f1) +
! (double) f1 *(double) samplerows / totalrows;
  
  			stadistinct = numer / denom;
  			/* Clamp to sane range in case of roundoff error */
  			if (stadistinct < (double) d)
  stadistinct = (double) d;
! 			if (stadistinct > totalrows)
! stadistinct = totalrows;
  			stats->stadistinct = floor(stadistinct + 0.5);
  		}
  
--- 2079,2099 
  		denom,
  		stadistinct;
  
! 			double		samplerows_nonnull = samplerows - null_cnt;
! 			double		totalrows_nonnull
! 			= totalrows * (1.0 - stats->stanullfrac);
  
! 			numer = samplerows_nonnull * (double) d;
! 
! 			denom = (samplerows_nonnull - f1) +
! (double) f1 * samplerows_nonnull / totalrows_nonnull;
  
  			stadistinct = numer / denom;
  			/* Clamp to sane range in case of roundoff error */
  			if (stadistinct < (double) d)
  stadistinct = (double) d;
! 			if (stadistinct > totalrows_nonnull)
! stadistinct = totalrows_nonnull;
  			stats->stadistinct = floor(stadistinct + 0.5);
  		}
  
***
*** 2120,2146 
  		}
  		else
  		{
  			double		ndistinct = stats->stadistinct;
- 			double		avgcount,
- 		mincount;
  
  			if (ndistinct < 0)
! ndistinct = -ndistinct * totalrows;
! 			/* estimate # of occurrences in sample of a typical value */
! 			avgcount = (double) samplerows / ndistinct;
! 			/* set minimum threshold count to store a value */
! 			mincount = avgcount * 1.25;
! 			if (mincount < 2)
! mincount = 2;
  			if (num_mcv > track_cnt)

[HACKERS] process type escape for log_line_prefix

2016-02-08 Thread Andres Freund
Hi,

Frequently when reading postgres logs to do some post mortem analysis
I'm left wondering what process emitted an error/log message. After the
fact it's often hard to know whether an error message was emitted by a
user backend or by something internal, say the checkpointer or
autovacuum.  Logging all process starts is often impractical given the
log volume that causes.

So I'm proposing adding an escape displaying the process title (say 'k'
for kind?). So %k would emit something like "autovacuum worker process",
"wal sender process" etc.

I'm thinking it might make sense to give normal connections "" as the
name, they're usually already discernible.

Greetings,

Andres Freund




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
> group-locking-v1.patch is a vastly improved version of the group
> locking patch that we discussed, uh, extensively last year.  I realize
> that there was a lot of doubt about this approach, but I still believe
> it's the right approach, I have put a lot of work into making it work
> correctly, I don't think anyone has come up with a really plausible
> alternative approach (except one other approach I tried which turned
> out to work but with significantly more restrictions), and I'm
> committed to fixing it in whatever way is necessary if it turns out to
> be broken, even if that amounts to a full rewrite.  Review is welcome,
> but I honestly believe it's a good idea to get this into the tree
> sooner rather than later at this point, because automated regression
> testing falls to pieces without these changes, and I believe that
> automated regression testing is a really good idea to shake out
> whatever bugs we may have in the parallel query stuff.  The code in
> this patch is all mine, but Amit Kapila deserves credit as co-author
> for doing a lot of prototyping (that ended up getting tossed) and
> testing.  This patch includes comments and an addition to
> src/backend/storage/lmgr/README which explain in more detail what this
> patch does, how it does it, and why that's OK.

I see you pushed group locking support. I do wonder if somebody has
actually reviewed this? On a quick scrollthrough it seems fairly
invasive, touching some parts where bugs are really hard to find.

I realize that this stuff has all been brewing long, and that there's
still a lot to do. So you gotta keep moving. And I'm not sure that
there's anything wrong or if there's any actually better approach. But
pushing an unreviewed, complex patch, that originated in a thread
originally about different relatively small/mundane items, for a
contentious issue, a few days after the initial post. Hm. Not sure how
you'd react if you weren't the author.

Greetings,

Andres Freund




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Daniel Verite
Teodor Sigaev wrote:

> Interesting feature, but it's not very obvious how to use it. I'd like to
> see  some example(s) in documentation.

I'm thinking of making a wiki page, because examples pretty much
require showing resultsets, and I'm not sure this would fly
with our current psql documentation, which is quite compact.

The current bit of doc I've produced is 53 lines long in manpage
format already. The text has not been proofread by a native
English speaker yet, so part of the problem might be that
it's just me struggling with English :)

> And I see an implementation of AVL tree in psql source code
> (src/bin/psql/crosstabview.c). Postgres already has a Red-Black tree
> implementation in src/include/lib/rbtree.h and
> src/backend/lib/rbtree.c. Is any advantage of using AVL tree here? I
> have some doubt about that and this implementation, obviously, will
> not be well tested. But I see in comments that implementation is
> reduced to insert only and it doesn't use the fact of ordering tree,
> so, even hash could be used.

Yes. I expect too that a RB tree or a hash-based algorithm would do
the job and perform well.

The AVL implementation in crosstabview is purposely simplified
and specialized for this job, resulting in ~185 lines of code
versus ~850 lines for rbtree.c
But I understand the argument that the existing rb-tree has been
battle-tested, whereas this code hasn't.

I'm looking at rb-tree.c and thinking what it would take to
incorporate it:
1. duplicating or linking from backend/lib/rbtree.c into psql/
2. replacing the elog() calls with something else in the case of psql
3. updating the crosstabview data structures and call sites.

While I'm OK with #3, #1 and #2 seem wrong.
I could adapt rb-tree.c so that the same code can be used
both client-side and server-side, but touching server-side
code for this feature and creating links in the source tree
feels invasive and overkill.

Another approach is to replace AVL with an hash-based algorithm,
but that raises the same sort of question. If crosstabview comes
with its specific implementation, why use that rather than existing
server-side code? But at a glance, utils/hash/dynahash.c seems quite
hard to convert for client-side use.

I'm open to ideas on this. In particular, if we have a hash table
implementation that is already blessed by the project and small enough
to make sense in psql, I'd be happy to consider it.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund  wrote:
> On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
>> group-locking-v1.patch is a vastly improved version of the group
>> locking patch that we discussed, uh, extensively last year.  I realize
>> that there was a lot of doubt about this approach, but I still believe
>> it's the right approach, I have put a lot of work into making it work
>> correctly, I don't think anyone has come up with a really plausible
>> alternative approach (except one other approach I tried which turned
>> out to work but with significantly more restrictions), and I'm
>> committed to fixing it in whatever way is necessary if it turns out to
>> be broken, even if that amounts to a full rewrite.  Review is welcome,
>> but I honestly believe it's a good idea to get this into the tree
>> sooner rather than later at this point, because automated regression
>> testing falls to pieces without these changes, and I believe that
>> automated regression testing is a really good idea to shake out
>> whatever bugs we may have in the parallel query stuff.  The code in
>> this patch is all mine, but Amit Kapila deserves credit as co-author
>> for doing a lot of prototyping (that ended up getting tossed) and
>> testing.  This patch includes comments and an addition to
>> src/backend/storage/lmgr/README which explain in more detail what this
>> patch does, how it does it, and why that's OK.
>
> I see you pushed group locking support. I do wonder if somebody has
> actually reviewed this? On a quick scrollthrough it seems fairly
> invasive, touching some parts where bugs are really hard to find.
>
> I realize that this stuff has all been brewing long, and that there's
> still a lot to do. So you gotta keep moving. And I'm not sure that
> there's anything wrong or if there's any actually better approach. But
> pushing an unreviewed, complex patch, that originated in a thread
> originally about different relatively small/mundane items, for a
> contentious issue, a few days after the initial post. Hm. Not sure how
> you'd react if you weren't the author.

Probably not very well.  Do you want me to revert it?

I mean, look.  Without that patch, parallel query is definitely
broken.  Just revert the patch and try running the regression tests
with force_parallel_mode=regress and max_parallel_degree>0.  It hangs
all over the place.  With the patch, every regression test suite we
have runs cleanly with those settings.  Without the patch, it's
trivial to construct a test case where parallel query experiences an
undetected deadlock.  With the patch, it appears to work reliably.
Could there be bugs someplace?  Yes, there absolutely could.  Do I really
think anybody was going to spend the time to understand deadlock.c
well enough to verify my changes?  No, I don't.  What I think would
have happened is that the patch would have sat around like an
albatross around my neck - totally ignored by everyone - until the end
of the last CF, and then the discussion would have gone one of three
ways:

1. Boy, this patch is complicated and I don't understand it.  Let's
reject it, even though without it parallel query is trivially broken!
Uh, we'll just let parallel query be broken.
2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
3. Oh well, Robert says we need this, I guess we'd better let him commit it.

I don't find any of those options to be better than the status quo.
If the patch is broken, another two months of having it in the tree gives
us a better chance of finding the bugs, especially because, combined
with the other patch which I also pushed, it enables *actual automated
regression testing* of the parallelism code, which I personally think
is a really good thing - and I'd like to see the buildfarm doing that
as soon as possible, so that we can find some of those bugs before
we're deep in beta.  Not just bugs in group locking but all sorts of
parallelism bugs that might be revealed by end-to-end testing.  The
*entire stack of patches* that began this thread was a response to
problems that were found by the automated testing that you can't do
without this patch.  Those bug fixes resulted in a huge increase in
the robustness of parallel query, and that would not have happened
without this code.  Every single one of those problems, some of them
in commits dating back years, was detected by the same method: run the
regression tests with parallel mode and parallel workers used for
every query for which that seems to be safe.

And, by the way, the patch, aside from the deadlock.c portion, was
posted back in October, admittedly without much fanfare, but nobody
reviewed that or any other patch on this thread.  If I'd waited for
those reviews to come in, parallel query would not be committed now,
nor probably in 9.6, nor probably in 9.7 or 9.8 either.  The whole
project would just be impossible on its face.  It would be impossible
in the first instance if I did not have a commit 

Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-08 Thread Fabien COELHO


Hello Andres,


> Any comments before I spend more time polishing this?

I'm running tests on various settings, I'll send a report when it is done.
Up to now the performance seems as good as with the previous version.

> I'm currently updating docs and comments to actually describe the
> current state...

I did notice the mismatched documentation.

I think I would appreciate comments to understand why/how the ringbuffer
is used, and more comments in general, so it is fine if you improve this
part.


Minor details:

"typedefs.list" should be updated to WritebackContext.

"WritebackContext" is a typedef, "struct" is not needed.


I'll look at the code more deeply probably over next weekend.

--
Fabien.




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 10:45 AM, Robert Haas wrote:

> On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund  wrote:
>> On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
>>
>> I realize that this stuff has all been brewing long, and that there's
>> still a lot to do. So you gotta keep moving. And I'm not sure that
>> there's anything wrong or if there's any actually better approach. But
>> pushing an unreviewed, complex patch, that originated in a thread
>> originally about different relatively small/mundane items, for a
>> contentious issue, a few days after the initial post. Hm. Not sure how
>> you'd react if you weren't the author.
>
> Probably not very well.  Do you want me to revert it?

If I am off base, please feel free to yell Latin at me again, but isn't
this exactly what different trees are for in Git? Would it be possible
to say:

Robert says, "Hey, pull XYZ, run ABC tests. They are what the parallelism
fixes do"?

I can't review this patch, but I can run a test suite on a number of
platforms and see if it behaves as expected.

> albatross around my neck - totally ignored by everyone - until the end
> of the last CF, and then the discussion would have gone one of three
> ways:
>
> 1. Boy, this patch is complicated and I don't understand it.  Let's
> reject it, even though without it parallel query is trivially broken!
> Uh, we'll just let parallel query be broken.
> 2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
> 3. Oh well, Robert says we need this, I guess we'd better let him commit it.

4. We need to push the release so we can test this.

> I don't find any of those options to be better than the status quo.
> If the patch is broken, another two months of having it in the tree gives
> us a better chance of finding the bugs, especially because, combined

I think this further points to the need for more reviewers and fewer
feature pushes. There are fundamental features that we could use, and this
is one of them. It is certainly more important than, say, pgLogical or BDR
(not that those aren't useful, but we do have external solutions for
that problem).

> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.  Right now, the deadlock.c part of this patch isn't
> tested at all by any of our regression test suites, because NOTHING in
> deadlock.c is tested by any of our regression test suites.  You can
> blow it up with dynamite and the regression tests are perfectly happy,
> and that's pretty scary.

Test test test. Please commit.

Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Proposal: SET ROLE hook

2016-02-08 Thread Alvaro Herrera
Joe Conway wrote:
> On 01/06/2016 10:36 AM, Tom Lane wrote:
> > I think a design that was actually somewhat robust would require two
> > hooks, one at check_role and one at assign_role, wherein the first one
> > would do any potentially-failing work and package all required info into
> > a blob that could be passed through to the assign hook.
> 
> Fair enough -- will rework the patch and example code.

Returned with feedback.  Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-02-08 Thread Vik Fearing
On 02/08/2016 09:33 PM, Filip Rembiałkowski wrote:
> Here is my next try, after suggestions from -perf and -hackers list:
> 
> * no GUC
> 
> * small addition to NOTIFY grammar: NOTIFY ALL/DISTINCT
> 
> * corresponding, 3-argument version of pg_notify(text,text,bool)
> 
> * updated the docs to include new syntax and clarify behavior
> 
> * no hashtable in AsyncExistsPendingNotify
>  (I don't see much sense in that part; and it can be well done
> separately from this)

Please add this to the next commitfest:
https://commitfest.postgresql.org/9/new/
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] proposal: function parse_ident

2016-02-08 Thread Alvaro Herrera
Pavel Stehule wrote:

> I am looking on it

Thanks, closing as returned-with-feedback.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 01:07 PM, Robert Haas wrote:

> Hi,
>
> One of the questions I have about parallel query is whether it should
> be enabled by default.  That is, should we set the default value of
> max_parallel_degree to a value higher than 0?  Perhaps 1, say?

O.k. after some googling where I found your fantastic blog on the
subject, max_parallel_degree looks like it should be
max_parallel_workers. Am I correct in assuming that, given the
opportunity to use parallel workers, postgres will launch N
workers up to the value of max_parallel_degree?

If so, then I think 1 or 2 would be reasonable. By far the majority of
servers are going to have at least two cores.

> There are some good reasons why this might be a bad idea, such as:
>
> - As discussed on a nearby thread, there is every possibility of nasty bugs.

Which we won't find if it doesn't get turned on.

> - Parallel query uses substantially more resources than a regular
> query, which might overload your system.

How much of a reality is that? Isn't this something we could just cover
in the release notes?

> On the other hand:
>
> - Features that are on by default get more testing and thus might get
> less buggy more quickly.

Correct.

> - A lot of people don't change the default configuration and thus
> wouldn't get any benefit from the feature if it's not on by default.

Correct.

> Thoughts?

+1 on enabling by default.

Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] count_nulls(VARIADIC "any")

2016-02-08 Thread Thomas Munro
On Tue, Feb 9, 2016 at 9:26 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> Would num_values be a better name than num_nonnulls?
>
> If "value" is a term that excludes null values, it's news to me.

Ah, right, I was thinking of null as the absence of a value.  But in
fact it is a special value that indicates the absence of a "data
value".  And num_data_values doesn't sound great.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 4:11 PM, Peter Geoghegan  wrote:
> All that I wanted to do was look at EXPLAIN ANALYZE output that showed
> a parallel seq scan on my laptop, simply because I wanted to see a
> cool thing happen. I had to complain about it [1] to get clarification
> from you [2].
>
> I accept that this might have been a somewhat isolated incident (that
> I couldn't easily get *at least* a little instant gratification), but
> it still should be avoided. You've accused me of burying the lead
> plenty of times. Don't tell me that it was too hard to prominently
> place those details somewhere where I or any other contributor could
> reasonably expect to find them, like the CF app page, or a wiki page
> that is maintained on an ongoing basis (and linked to at the start of
> each thread). If I said that that was too much to you, you'd probably
> shout at me. If I persisted, you wouldn't commit my patch, and for me
> that probably means it's DOA.
>
> I don't think I'm asking for much here.

I don't think you are asking for too much; what I think is that Amit
and I were trying to do exactly the thing you asked for, and mostly
did.  On March 20th, Amit posted version 11 of the sequential scan
patch, and included directions about the order in which to apply the
patches:

http://www.postgresql.org/message-id/CAA4eK1JSSonzKSN=l-dwucewdlqkbmujvfpe3fgw2tn2zpo...@mail.gmail.com

On March 25th, Amit posted version 12 of the sequential scan patch,
and again included directions about which patches to apply:

http://www.postgresql.org/message-id/caa4ek1l50y0y1ogt_dh2eouyq-rqcnpvjboon2pcgjq+1by...@mail.gmail.com

On March 27th, Amit posted version 13 of the sequential scan patch,
which did not include those directions:

http://www.postgresql.org/message-id/caa4ek1lfr8sr9viuplpmkrquvcrhefdjsz1019rpwgjyftr...@mail.gmail.com

While perhaps Amit might have included directions again, I think it's
pretty reasonable that he felt that it might not be entirely necessary
to do so given that he had already done it twice in the last week.
This was still the state of affairs when you asked your question on
April 20th.  Two days after you asked that question, Amit posted
version 14 of the patch, and again included directions about what
patches to apply:

http://www.postgresql.org/message-id/caa4ek1jlv+2y1awjhsqpfiskhbf7jwf_nzirmzyno9upbrc...@mail.gmail.com

Far from the negligence that you seem to be implying, I think Amit was
remarkably diligent about providing these kinds of updates.  I
admittedly didn't duplicate those same updates on the parallel
mode/contexts thread to which you replied, but that's partly because I
would often whack around that patch first and then Amit would adjust
his patch to cope with my changes after the fact.  That doesn't seem
to have been the case in this particular example, but if this is the
closest thing you can come up with to a process failure during the
development of parallel query, I'm not going to be sad about it: I'm
going to have a beer.  Seriously: we worked really hard at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Andrew Dunstan



On 02/08/2016 02:15 PM, Tom Lane wrote:

> Of late, by far the majority of the random-noise failures we see in the
> buildfarm have come from failure to shut down the postmaster in a
> reasonable timeframe.  An example is this current failure on hornet:
>
> http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55
>
> waiting for server to shut down... failed
> pg_ctl: server does not shut down
> === db log file ==
> 2016-02-08 15:14:35.783 UTC [56b8b00d.9e00e2:2] LOG:  received fast shutdown request
> 2016-02-08 15:14:35.783 UTC [56b8b00d.9e00e2:3] LOG:  aborting any active transactions
> 2016-02-08 15:14:35.783 UTC [56b8b00d.7e0028:2] LOG:  autovacuum launcher shutting down
> 2016-02-08 15:14:35.865 UTC [56b8b00d.2e5000a:1] LOG:  shutting down
>
> The buildfarm script runs pg_ctl stop with -t 120, and counting the dots
> shows that pg_ctl was honoring that, so apparently the postmaster took
> more than 2 minutes to shut down.
>
> Looking at other recent successful runs, stopdb-C-1 usually takes 30 to
> 40 or so seconds on this machine.  So it's possible that it was just so
> overloaded that it took three times that much time on this run, but I'm
> starting to think there may be more to it than that.  We've seen variants
> on this theme on half a dozen machines just in the past week --- and it
> seems to mostly happen in 9.5 and HEAD, which is fishy.
>
> The fact that we got "shutting down" but not "database system is shut
> down" indicates that the server was in ShutdownXLOG().  Normally the
> only component of that that takes much time is buffer flushing, but
> could something else be happening?  Or perhaps the checkpoint code
> has somehow not gotten the word to do its flushing at full speed?
>
> What I'd like to do to investigate this is put in a temporary HEAD-only
> patch that makes ShutdownXLOG() and its subroutines much chattier about
> how far they've gotten and what time it is, and also makes pg_ctl print
> out the current time if it gives up waiting.  A few failed runs with
> that in place will at least allow us to confirm or deny whether it's
> just that the shutdown checkpoint is sometimes really slow, or whether
> there's a bug lurking.
>
> Any objections?  Anybody have another idea for data to collect?

I think that's an excellent idea. This has been a minor running sore for
ages on slow machines.


cheers

andrew





Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 12:18 PM, Robert Haas  wrote:
> So, there may be a person who knows how to do all of that
> work and get it done in a reasonable time frame and also knows how to
> make sure that everybody has the opportunity to be as involved in the
> process as they want to be and that there are no bugs or controversial
> design decisions, but I am not that person.  I am doing my best.
>
>> To be more specific, I thought it was really hard to test parallel
>> sequential scan a few months ago, because there was so many threads
>> and so many dependencies. I appreciate that we now use git
>> format-patch patch series for complicated stuff these days, but it's
>> important to make it clear how everything fits together. That's
>> actually what I was thinking about when I said we need to be clear on
>> how things fit together from the CF app patch page, because there
>> doesn't seem to be a culture of being particular about that, having
>> good "annotations", etc.
>
> I agree that you had to be pretty deeply involved in that thread to
> follow everything that was going on.  But it's not entirely fair to
> say that it was impossible for anyone else to get involved.

All that I wanted to do was look at EXPLAIN ANALYZE output that showed
a parallel seq scan on my laptop, simply because I wanted to see a
cool thing happen. I had to complain about it [1] to get clarification
from you [2].

I accept that this might have been a somewhat isolated incident (that
I couldn't easily get *at least* a little instant gratification), but
it still should be avoided. You've accused me of burying the lead
plenty of times. Don't tell me that it was too hard to prominently
place those details somewhere where I or any other contributor could
reasonably expect to find them, like the CF app page, or a wiki page
that is maintained on an ongoing basis (and linked to at the start of
each thread). If I said that that was too much to you, you'd probably
shout at me. If I persisted, you wouldn't commit my patch, and for me
that probably means it's DOA.

I don't think I'm asking for much here.

[1] 
http://www.postgresql.org/message-id/CAM3SWZSefE4uQk3r_3gwpfDWWtT3P51SceVsL4=g8v_me2a...@mail.gmail.com
[2] 
http://www.postgresql.org/message-id/ca+tgmoarttf8eptbhinwxukfkctsfc7wtzfhgegqywe8e2v...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Pavel Stehule
Hi



> FWIW I think the general idea of this feature (client-side resultset
> "pivoting") is a good one, but I don't really have an opinion regarding
> your specific proposal.  I think you should first seek some more
> consensus about the proposed design; in your original thread [1] several
> guys defended the idea of having this be a psql feature, and the idea of
> this being a parallel to \x seems a very sensible one, but there's
> really been no discussion on whether your proposed "+/-" syntax to
> change sort order makes sense, for one thing.
>

I am sorry, but I disagree - the discussion about the implementation ran for
more than two months, and I believe anybody who wanted to discuss it had
enough time to do so. The feature and its design changed significantly, and
nobody raised a design objection.

This feature has only a small relation to the SQL PIVOTING feature - it is
just a form of view, and I agree with Daniel about the sense of this feature.

Regards

Pavel


>
> So please can we have that wiki page so that the syntax can be hammered
> out a bit more.
>
> I'm closing this as returned-with-feedback for now.
>
> [1] It's a good idea to add links to previous threads where things were
> discussed.  I had to search for
> www.postgresql.org/message-id/flat/78543039-c708-4f5d-a66f-0c0fbcda1f76@mm
> because you didn't provide a link to it when you started the second
> thread.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Andres Freund
Hi,

On 2016-02-08 16:07:05 -0500, Robert Haas wrote:
> One of the questions I have about parallel query is whether it should
> be enabled by default.  That is, should we make the default value of
> max_parallel_degree to a value higher than 0?  Perhaps 1, say?
> 
> There are some good reasons why this might be a bad idea, such as:
> 
> - As discussed on a nearby thread, there is every possibility of nasty
> bugs.

I think that's an argument to enable it till at least beta1. Let's
change the default, and add an item to the open items list to reconsider
then.

Andres




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Tom Lane
Robert Haas  writes:
> One of the questions I have about parallel query is whether it should
> be enabled by default.  That is, should we make the default value of
> max_parallel_degree to a value higher than 0?  Perhaps 1, say?

I'm not sure I'm on board with that as a releaseable default, but there
certainly would be an argument for turning it on from now to say mid
beta, so as to improve test coverage.  I think we've done similar things
in the past.

I don't understand however how that doesn't break the regression tests?
Surely we've got lots of EXPLAIN queries that would change.

regards, tom lane




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 1:24 PM, Andres Freund  wrote:
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.

+1.

Reminds me of what happened with the num_xloginsert_locks GUC (it was
eventually replaced with a #define before release, though).



-- 
Peter Geoghegan




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Alvaro Herrera
Pavel Stehule wrote:

> > FWIW I think the general idea of this feature (client-side resultset
> > "pivoting") is a good one, but I don't really have an opinion regarding
> > your specific proposal.  I think you should first seek some more
> > consensus about the proposed design; in your original thread [1] several
> > guys defended the idea of having this be a psql feature, and the idea of
> > this being a parallel to \x seems a very sensible one,

Sorry, I meant \q here, not \x.

> > but there's really been no discussion on whether your proposed "+/-"
> > syntax to change sort order makes sense, for one thing.
> 
> I am sorry, but I disagree - the discussion about the implementation ran for
> more than two months, and I believe anybody who wanted to discuss it had
> enough time to do so. The feature and its design changed significantly, and
> nobody raised a design objection.

I just rechecked the thread.  In my reading, lots of people argued
whether it should be called \rotate or \pivot or \crosstab; it seems the
\crosstabview proposal was determined to be best.  I can support that
decision.  But once the details were discussed, it was only you and
Daniel left in the thread; nobody else participated.  While I understand
that you may think that "silence is consent", what I am afraid of is
that some committer will look at this two months from now and say "I
hate this Hcol+ stuff, -1 from me" and send the patch back for syntax
rework.  IMO it's better to have more people chime in here so that the
patch that we discuss during the next commitfest is really the best one
we can think of.

Also, what about the business of putting "x" if there's no third column?
Three months from now some Czech psql hacker will say "we should use
Unicode chars for this" and we will be forever stuck with \pset
unicode_crosstab_marker to change the character to a ☑ BALLOT BOX WITH
CZECH.  Maybe we should think that a bit harder -- for example, what
about just rejecting the case with no third column and forcing the user
to add a third column with the character they choose?  That way you
avoid that mess.

> This feature has only a small relation to the SQL PIVOTING feature - it is
> just a form of view, and I agree with Daniel about the sense of this feature.

Yes, I don't disagree there.  Robert Haas and David Fetter also
expressed their support for psql-side processing, so I think we're good
there.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] 2016-01 Commitfest

2016-02-08 Thread Magnus Hagander
On Mon, Feb 8, 2016 at 10:19 PM, Alvaro Herrera 
wrote:

> Hi everybody,
>
> I just closed the last few remaining items in the commitfest.  This is
> the final summary:
>
>  Committed: 32.
>  Moved to next CF: 32.
>  Rejected: 2.
>  Returned with Feedback: 33.
> Total: 99.
>
> I think we did a fairly decent job this time around: we only passed a
> third of the patches to the upcoming commitfest, which seems pretty
> decent.  The number of patches that we moved unreviewed was very small,
> which is even better -- hopefully we're not letting too many people
> down.
>
> I'm closing this commitfest now.  We even have three weeks before the
> next one starts, so everybody can take a break for once!  Yay!


Thanks for taking on the thankless task of pushing things along!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-08 Thread Alvaro Herrera
Since things are clearly still moving here, I closed it as
returned-with-feedback.  Please submit to the next CF so that we don't
lose it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench stats per script & other stuff

2016-02-08 Thread Fabien COELHO


Hello Michaël,


+   /* compute total_weight */
+   for (i = 0; i < num_scripts; i++)
+   {
+   total_weight += sql_script[i].weight;
+
+   /* detect overflow... */
If let as int64, you may want to remove this overflow check, or keep
it as int32.


I'd rather keep int64, and remove the check.


[JSON/YAML]
If you want something else, it would help to provide a sample of what you
expect.


You could do that with an additional option here as well:
--output-format=normal|yaml|json. The tastes of each user are different.


I think that json/yaml-ifying pgbench output is beyond the object of this
patch, so should be kept out?


   const char *name;
+   int weight;
   Command   **commands;
-   StatsData stats;
+   StatsData   stats;
Noise here?


Indeed.


Find attached a 18-d which addresses these concerns, and a actualized 18-e
for the prefix.


(I could live without that personally)


Hmmm, I type them and I'm not so good with a keyboard, so "se" is better 
than:


"selct-onlyect-only".


-/* return builtin script "name", or fail if not found */
+/* return commands for selected builtin script, if unambiguous */
static script_t *
findBuiltin(const char *name)
This comment needs a refresh. This does not return a set of commands,
but the script itself.


Indeed.

Attached 19-d and 19-e.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ade1b53..780a520 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin=scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the script.  If not specified, it is set to 1.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 With special name list, show the list of builtin scripts
@@ -321,12 +323,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.

   
@@ -686,9 +690,13 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   Pgbench executes test scripts chosen randomly from a specified list.
+   pgbench executes test scripts chosen randomly
+   from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1135,12 +1143,11 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+latency average = 15.844 ms
+latency stddev = 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: builtin: TPC-B (sort of)
- - 1 transactions (100.0% of total, tps = 618.764555)
- - latency average = 15.844 ms
- - latency stddev = 2.715 ms
+script statistics:
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7eb6a2d..a73e289 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -38,6 +38,7 @@
 #include "portability/instr_time.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,6 +180,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@'/* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -299,23 +302,26 @@ typedef struct
 static struct
 {
 	const char *name;
+	int			weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int64 total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
-#define N_BUILTIN 3
-static struct
+typedef struct script_t
 {
 	char	   *name;			/* very short name for -b ... */
 	char	   *desc;			/* short description */
 	char	   *commands;		/* actual pgbench script */
-}
+} script_t;
 
-			builtin_script[] =
+#define N_BUILTIN 3
+static script_t builtin_script[] =
 {
 	{
 		"tpcb-like",
@@ 

Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-02-08 Thread Alvaro Herrera
I've closed this as returned-with-feedback.  Please resubmit once you
have found answers to the questions posed; from the submitted benchmark
numbers this looks very exciting, but it needs a bit more work.  You
don't necessarily have to agree with everything Robert says, but you
need to have well reasoned explanations for the points where you
disagree.

Thanks,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] extend pgbench expressions with functions

2016-02-08 Thread Fabien COELHO


Hello,


v23 attached, which does not change the message but does the other fixes.


This doesn't apply anymore


Indeed, but the latest version was really v25.


-- please rebase and submit to the next CF.


I already provided it as v25 on Feb 1st.


I closed it here as returned with feedback.


I turned it to "moved to next CF" as the patch is already on the queue.

--
Fabien.




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-08 Thread Andres Freund
On 2016-02-08 19:52:30 +0100, Fabien COELHO wrote:
> I think I would appreciate comments to understand why/how the ringbuffer is
> used, and more comments in general, so it is fine if you improve this part.

I'd suggest to leave out the ringbuffer/new bgwriter parts. I think
they'd be committed separately, and probably not in 9.6.

Thanks,

Andres




Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-02-08 Thread Thomas Munro
On Tue, Feb 9, 2016 at 5:30 AM, Joe Conway  wrote:
> On 02/08/2016 06:24 AM, Andres Freund wrote:
>> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
>>> Now that there is a setting to give a cluster a "name", it would be nice to
>>> have an escape sequence in the log_line_prefix setting that could reference
>>> the cluster_name.
>>
>> I've argued[1][2] for this when cluster_name was introduced, but back
>> then I seemed to have been the only one arguing for it. Josh later
>> jumped that train.
>>
>> Given that we now had a number of people wishing for this, can we maybe
>> reconsider?
>
> Seems like a reasonable idea to me. But if we add a log_line_prefix
> setting, shouldn't we also add it to csvlog output too?

Here's a tiny patch adding support for %C to log_line_prefix (this was
part of the cluster_name patch that didn't go in).

Given that csvlog's output format is hardcoded in write_csvlog, how is
it supposed to evolve without upsetting consumers of this data?
Wouldn't we first need to add a GUC that lets you control the columns
it outputs?

-- 
Thomas Munro
http://www.enterprisedb.com


cluster_name_log_line_prefix.patch
Description: Binary data



Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Alvaro Herrera
Tom Lane wrote:
> Of late, by far the majority of the random-noise failures we see in the
> buildfarm have come from failure to shut down the postmaster in a
> reasonable timeframe.

I noticed that.

> An example is this current failure on hornet:
> 
> http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55

FWIW you seem to have edited this URL -- it returns a blank page.
The right one is
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55&stg=stopdb-C-1

> What I'd like to do to investigate this is put in a temporary HEAD-only
> patch that makes ShutdownXLOG() and its subroutines much chattier about
> how far they've gotten and what time it is, and also makes pg_ctl print
> out the current time if it gives up waiting.  A few failed runs with
> that in place will at least allow us to confirm or deny whether it's
> just that the shutdown checkpoint is sometimes really slow, or whether
> there's a bug lurking.
> 
> Any objections?  Anybody have another idea for data to collect?

Seems like a reasonable place to start.  I assume you'll be installing
some debug-elog calls, enabled by default, and then once the problem is
fixed, we'll change the default to disabled but keep the actual calls?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-02-08 Thread Stephen Frost
Thomas,

* Thomas Munro (thomas.mu...@enterprisedb.com) wrote:
> On Tue, Feb 9, 2016 at 5:30 AM, Joe Conway  wrote:
> > On 02/08/2016 06:24 AM, Andres Freund wrote:
> >> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
> >>> Now that there is a setting to give a cluster a "name", it would be nice 
> >>> to
> >>> have an escape sequence in the log_line_prefix setting that could 
> >>> reference
> >>> the cluster_name.
> >>
> >> I've argued[1][2] for this when cluster_name was introduced, but back
> >> then I seemed to have been the only one arguing for it. Josh later
> >> jumped that train.
> >>
> >> Given that we now had a number of people wishing for this, can we maybe
> >> reconsider?
> >
> > Seems like a reasonable idea to me. But if we add a log_line_prefix
> > setting, shouldn't we also add it to csvlog output too?
> 
> Here's a tiny patch adding support for %C to log_line_prefix (this was
> part of the cluster_name patch that didn't go in).
> 
> Given that csvlog's output format is hardcoded in write_csvlog, how is
> it supposed to evolve without upsetting consumers of this data?
> Wouldn't we first need to add a GUC that lets you control the columns
> it outputs?

Not sure if you really want to go there, but I do agree with you and
there's a thread from a few years back about something similar:

http://www.postgresql.org/message-id/flat/20110112142345.ga4...@tamriel.snowman.net

Included in that thread is a patch, which likely requires some dusting
off, to add exactly that ability.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> What I'd like to do to investigate this is put in a temporary HEAD-only
>> patch that makes ShutdownXLOG() and its subroutines much chattier about
>> how far they've gotten and what time it is, and also makes pg_ctl print
>> out the current time if it gives up waiting.

> Seems like a reasonable place to start.  I assume you'll be installing
> some debug-elog calls, enabled by default, and then once the problem is
> fixed, we'll change the default to disabled but keep the actual calls?

I had in mind to just "git revert" the patch when we're done with it.
There might be some parts we want to keep (for instance, I'm thinking of
logging "postmaster is done" after we've unlinked the pidfile, which might
be useful permanently).  But I plan to err on the side of noisiness for
the moment, rather than make something I think has long-term value.

regards, tom lane




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 01:11 PM, Peter Geoghegan wrote:

On Mon, Feb 8, 2016 at 12:18 PM, Robert Haas  wrote:



I accept that this might have been a somewhat isolated incident (that
I couldn't easily get *at least* a little instant gratification), but
it still should be avoided. You've accused me of burying the lead
plenty of times. Don't tell me that it was too hard to prominently
place those details somewhere where I or any other contributor could
reasonably expect to find them, like the CF app page, or a wiki page
that is maintained on an ongoing basis (and linked to at the start of
each thread). If I said that that was too much to you, you'd probably
shout at me. If I persisted, you wouldn't commit my patch, and for me
that probably means it's DOA.

I don't think I'm asking for much here.

[1] 
http://www.postgresql.org/message-id/CAM3SWZSefE4uQk3r_3gwpfDWWtT3P51SceVsL4=g8v_me2a...@mail.gmail.com
[2] 
http://www.postgresql.org/message-id/ca+tgmoarttf8eptbhinwxukfkctsfc7wtzfhgegqywe8e2v...@mail.gmail.com


This part of the thread seems like something that should be a new thread
about how to write patches. I agree that it would be awesome for large
feature patches to be discussed in depth on a maintained wiki page.
Creating that knowledge base without having to troll through the code
would be priceless.


Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-02-08 Thread Filip Rembiałkowski
On Mon, Feb 8, 2016 at 1:52 PM, Craig Ringer  wrote:

> Would it be correct to say that if ALL is specified then a message is queued
> no matter what. If DISTINCT is specified then it is only queued if no
> message with the same channel and argument is already queued for delivery.
Yes, exactly.

> Using DISTINCT can never decrease the total number of messages to be sent.
This sentence does not sound true. DISTINCT is the default, old
behaviour. It *can* decrease total number of messages (by
deduplication)

> I've found the deduplication functionality of NOTIFY very frustrating in the 
> past
> and I see this as a significant improvement. Sometimes the *number of times*
> something happened is significant too...
yep, same idea here.




Here is my next try, after suggestions from -perf and -hackers list:

* no GUC

* small addition to NOTIFY grammar: NOTIFY ALL/DISTINCT

* corresponding, 3-argument version of pg_notify(text,text,bool)

* updated the docs to include new syntax and clarify behavior

* no hashtable in AsyncExistsPendingNotify
 (I don't see much sense in that part; and it can be well done
separately from this)
diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml
index 4dd5608..933c76c 100644
--- a/doc/src/sgml/ref/notify.sgml
+++ b/doc/src/sgml/ref/notify.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-NOTIFY channel [ , payload ]
+NOTIFY [ ALL | DISTINCT ] channel [ , payload ]
 
  
 
@@ -105,6 +105,10 @@ NOTIFY channel [ , ALL is specified (contrary to DISTINCT, the
+   default), the server will deliver all notifications, including duplicates.
+   Removal of duplicate notifications takes place within transaction block,
+   finished with COMMIT, END or SAVEPOINT.
   
 
   
@@ -190,6 +194,12 @@ NOTIFY channel [ , NOTIFY command if you need to work with
 non-constant channel names and payloads.

+   
+There is a three-argument version, pg_notify(text,
+text, boolean). The third argument acts like
+the ALL keyword when set to true, and
+DISTINCT when set to false. 
+   
   
  
 
@@ -210,6 +220,21 @@ Asynchronous notification "virtual" with payload "This is the payload" received
 LISTEN foo;
 SELECT pg_notify('fo' || 'o', 'pay' || 'load');
 Asynchronous notification "foo" with payload "payload" received from server process with PID 14728.
+
+/* Identical messages from same (sub-) transaction can be eliminated - unless you use the ALL keyword */
+LISTEN bar;
+BEGIN;
+NOTIFY bar, 'Coffee please';
+NOTIFY bar, 'Coffee please';
+NOTIFY bar, 'Milk please';
+NOTIFY ALL bar, 'Milk please';
+SAVEPOINT s;
+NOTIFY bar, 'Coffee please';
+COMMIT;
+Asynchronous notification "bar" with payload "Coffee please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Milk please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Milk please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Coffee please" received from server process with PID 31517.
 
  
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index c39ac3a..38a8246 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -510,6 +510,7 @@ pg_notify(PG_FUNCTION_ARGS)
 {
 	const char *channel;
 	const char *payload;
+	bool use_all;
 
 	if (PG_ARGISNULL(0))
 		channel = "";
@@ -521,10 +522,15 @@ pg_notify(PG_FUNCTION_ARGS)
 	else
 		payload = text_to_cstring(PG_GETARG_TEXT_PP(1));
 
+	if (PG_NARGS() > 2 && ! PG_ARGISNULL(2))
+		use_all = PG_GETARG_BOOL(2);
+	else
+		use_all = false;
+
 	/* For NOTIFY as a statement, this is checked in ProcessUtility */
 	PreventCommandDuringRecovery("NOTIFY");
 
-	Async_Notify(channel, payload);
+	Async_Notify(channel, payload, use_all);
 
 	PG_RETURN_VOID();
 }
@@ -540,7 +546,7 @@ pg_notify(PG_FUNCTION_ARGS)
  *		^^
  */
 void
-Async_Notify(const char *channel, const char *payload)
+Async_Notify(const char *channel, const char *payload, bool use_all)
 {
 	Notification *n;
 	MemoryContext oldcontext;
@@ -570,9 +576,10 @@ Async_Notify(const char *channel, const char *payload)
 	 errmsg("payload string too long")));
 	}
 
-	/* no point in making duplicate entries in the list ... */
-	if (AsyncExistsPendingNotify(channel, payload))
-		return;
+	if (!use_all)
+		/* remove duplicate entries in the list */
+		if (AsyncExistsPendingNotify(channel, payload))
+			return;
 
 	/*
 	 * The notification list needs to live until end of transaction, so store
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b307b48..7203f4a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -8528,11 +8528,12 @@ DropRuleStmt:
  *
  */
 
-NotifyStmt: NOTIFY ColId notify_payload
+NotifyStmt: NOTIFY all_or_distinct ColId notify_payload
 {
 		

Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Alvaro Herrera
Daniel Verite wrote:
>   Teodor Sigaev wrote:
> 
> > Interesting feature, but it's not very obvious how to use it. I'd like to
> > see  some example(s) in documentation.
> 
> I'm thinking of making a wiki page, because examples pretty much
> require showing resultsets, and I'm not sure this would fly
> with our current psql documentation, which is quite compact.

Yeah, we need to keep in mind that the psql doc is processed as a
manpage also, so it may not be a great idea to add too many things
there.  But I also agree that some good examples would be useful.

FWIW I think the general idea of this feature (client-side resultset
"pivoting") is a good one, but I don't really have an opinion regarding
your specific proposal.  I think you should first seek some more
consensus about the proposed design; in your original thread [1] several
guys defended the idea of having this be a psql feature, and the idea of
this being a parallel to \x seems a very sensible one, but there's
really been no discussion on whether your proposed "+/-" syntax to
change sort order makes sense, for one thing.

So please can we have that wiki page so that the syntax can be hammered
out a bit more.

I'm closing this as returned-with-feedback for now.

[1] It's a good idea to add links to previous threads where things were
discussed.  I had to search for
www.postgresql.org/message-id/flat/78543039-c708-4f5d-a66f-0c0fbcda1f76@mm
because you didn't provide a link to it when you started the second
thread.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] enable parallel query by default?

2016-02-08 Thread Robert Haas
Hi,

One of the questions I have about parallel query is whether it should
be enabled by default.  That is, should we set the default value of
max_parallel_degree to something higher than 0?  Perhaps 1, say?

There are some good reasons why this might be a bad idea, such as:

- As discussed on a nearby thread, there is every possibility of nasty bugs.
- Parallel query uses substantially more resources than a regular
query, which might overload your system.

On the other hand:

- Features that are on by default get more testing and thus might get
less buggy more quickly.
- A lot of people don't change the default configuration and thus
wouldn't get any benefit from the feature if it's not on by default.

Thoughts?
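
For anyone who wants to experiment before any default changes, the GUC
can be flipped per session or cluster-wide; no recompile is needed.
(max_parallel_degree was the name at this point in the 9.6 cycle; it was
later renamed max_parallel_workers_per_gather before release.  Values
below are illustrative only.)

```sql
-- Session-level override; 0 disables parallel query for this session.
SET max_parallel_degree = 1;

-- Or change the cluster-wide default and reload:
ALTER SYSTEM SET max_parallel_degree = 1;
SELECT pg_reload_conf();
```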

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 2016-01 Commitfest

2016-02-08 Thread Alvaro Herrera
Hi everybody,

I just closed the last few remaining items in the commitfest.  This is
the final summary:

 Committed: 32.
 Moved to next CF: 32.
 Rejected: 2.
 Returned with Feedback: 33.
Total: 99. 

I think we did a fairly decent job this time around: we only passed a
third of the patches to the upcoming commitfest, which seems pretty
reasonable.  The number of patches that we moved unreviewed was very small,
which is even better -- hopefully we're not letting too many people
down.

I'm closing this commitfest now.  We even have three weeks before the
next one starts, so everybody can take a break for once!  Yay!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 2:00 PM, Joshua D. Drake  wrote:
> If I am off base, please feel free to yell Latin at me again but isn't this
> exactly what different trees are for in Git? Would it be possible to say:
>
> Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism
> fixes do"?
>
> I can't review this patch but I can run a test suite on a number of
> platforms and see if it behaves as expected.

Sure, I'd love to have the ability to push a branch into the buildfarm
and have the tests get run on all the buildfarm machines and let that
bake for a while before putting it into the main tree.  The problem
here is that the complicated part of this patch is something that's
only going to be tested in very rare cases.  The simple part of the
patch, which handles the simple-deadlock case, is easy to hit,
although apparently zero people other than Amit and I have found it in
the few months since parallel sequential scan was committed, which
makes me think people haven't tried very hard to break any part of
parallel query, which is a shame.  The really hairy part is in
deadlock.c, and it's actually very hard to hit that case.  It won't be
hit in real life except in pretty rare circumstances.  So testing is
good, but you not only need to know what you are testing, you probably
also need an automated tool that can run the test a gazillion times in
a loop, or be really clever and find a test case that Amit and I didn't
foresee.  And the reality is that getting anybody independent of the
parallel query effort to take an interest in deep testing has not
gone anywhere at all up until now.  I'd be happy for that to change,
whether because of this commit or for any other reason.
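
The "run it a gazillion times in a loop" part needs no special tooling;
a tiny harness like the one below would do.  (The commented-out command
is a hypothetical example of pointing it at a regression suite; nothing
here is part of the patch itself.)

```python
import subprocess
import sys

def stress(cmd, iterations=1000):
    """Run cmd repeatedly; stop and report the first failing iteration.

    Returns the 1-based iteration number of the first failure, or None
    if every run succeeded.
    """
    for i in range(1, iterations + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print("iteration %d failed:" % i)
            print(result.stdout)
            print(result.stderr, file=sys.stderr)
            return i
    return None

# Hypothetical usage against a deadlock-prone test:
# stress(["make", "-C", "src/test/regress", "installcheck"], iterations=1000)
```

Finding the clever test case that actually exercises the deadlock.c
path is, of course, the hard part that a loop can't do for you.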

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



