date:20161212

On 13/12/16 03:26, Petr Jelinek wrote:
> On 13/12/16 02:41, Andres Freund wrote:
>> On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote: 
>>
>>> +static List *
>>> +OpenTableList(List *tables)
>>> +{
>>> +   List   *relids = NIL;
>>> +   List   *rels = NIL;
>>> +   ListCell   *lc;
>>> +
>>> +   /*
>>> +* Open, share-lock, and check all the explicitly-specified relations
>>> +*/
>>> +   foreach(lc, tables)
>>> +   {
>>> +   RangeVar   *rv = lfirst(lc);
>>> +   Relationrel;
>>> +   boolrecurse = interpretInhOption(rv->inhOpt);
>>> +   Oid myrelid;
>>> +
>>> +   rel = heap_openrv(rv, ShareUpdateExclusiveLock);
>>> +   myrelid = RelationGetRelid(rel);
>>> +   /* filter out duplicates when user specifies "foo, foo" */
>>> +   if (list_member_oid(relids, myrelid))
>>> +   {
>>> +   heap_close(rel, ShareUpdateExclusiveLock);
>>> +   continue;
>>> +   }
>>
>> This is a quadratic algorithm - that could bite us... Not sure if we
>> need to care.  If we want to fix it, one approach owuld be to use
>> RangeVarGetRelid() instead, and then do a qsort/deduplicate before
>> actually opening the relations.
>>
> 
> I guess it could get really slow only with big inheritance tree, I'll
> look into how much work is the other way of doing things (this is not
> exactly hot code path).
> 

Actually looking at it, it only processes user input so I don't think
it's very problematic in terms of performance. You'd have to pass many
thousands of tables in single DDL to notice.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

On Tue, Dec 13, 2016 at 10:43 AM, Michael Paquier
 wrote:
> On Mon, Dec 12, 2016 at 11:39 PM, Heikki Linnakangas  wrote:
>> A few couple more things that caught my eye while hacking on this:

Looking at what we have now, in the branch...

>> * Use SASLPrep for passwords.

SASLPrep is defined here:
https://tools.ietf.org/html/rfc4013
And stringprep is here:
https://tools.ietf.org/html/rfc3454
So that's roughly applying a conversion from the mapping table, taking
into account prohibited, bi-directional, mapping characters, etc. The
spec says that the password should be in unicode. But we cannot be
sure of that, right? Those mapping tables should be likely a separated
thing.. (perl has Unicode::Stringprep::Mapping for example).

>> * Check nonces, etc. to not contain invalid characters.

Fixed this one.

>> * Derive mock SCRAM verifier for non-existent users deterministically from
>> username.

You have put in place the facility to allow that. The only thing that
comes in mind to generate something per-cluster is to have
BootStrapXLOG() generate an "authentication secret identifier" with a
uint64 and add that in the control file. Using pg_backend_random()
would be a good idea here.

>> * Allow plain 'password' authentication for users with a SCRAM verifier in
>> rolpassword.

Done.

>> * Throw an error if an "authorization identity" is given. ATM, we just
>> ignore it, but seems better to reject the attempt than do something that
>> might not be what the client expects.

Done.

>> * Add "scram-sha-256" prefix to SCRAM verifiers stored in
>> pg_authid.rolpassword.

You did it.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Fixing matching of boolean index columns to sort ordering

The attached patch addresses the complaint raised in
https://www.postgresql.org/message-id/CAHt_Luuao4gd6De61GryK=2ff-mtghzjqffdjz02usdvqym...@mail.gmail.com

namely, that if you have an index on, say, integer columns i and j,
then the planner will figure out that it can use an indexscan with
no additional sort for a query like
select * from tab where i = 42 order by j;
but the same sort of thing does not work when the first column is
a boolean.  You would think that this query is entirely isomorphic
to the one above:
select * from tab where b = true order by j;
but it isn't, because in expression preprocessing we simplify that to
select * from tab where b order by j;
Now, there's no equality condition so no EquivalenceClass is created
containing "b", and it's the presence of the EquivalenceClass that
drives the code that recognizes that the first index column can be
ignored while deciding what sort order the index produces.

The patch fixes that through the expedient of matching boolean index
columns to the restriction clauses for "tab", and when it finds a
match, acting as though we'd found a match to an EquivalenceClass
containing a constant.  This is pretty ugly, but no more so than
several other existing special cases for boolean index columns.

Those special cases would largely go away if we were to canonicalize
"WHERE b" into "WHERE b = true" rather than the reverse, so you might
reasonably ask why we don't do that.  I've asked myself that every time
I had to add another one of these special cases :-(, but the answer is
the same every time: where would you stop?  Every WHERE clause is a
boolean expression, so there's no principled reason why such a rule
wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true",
wreaking havoc on every other optimization we have.  Restricting it
to only apply to simple boolean columns is no answer because (a) indexes
can be on boolean expressions not just simple columns, and (b) part
of the point of the canonicalization is to make logically-equivalent
expressions look alike, so declining to apply it in some cases would
ruin that.

So, for better or worse, our approach is to simplify out "= true"
and then do whatever pushups we have to do at lower levels to keep
useful cases working nicely.  This is another such pushup.

I'll add this to the next commitfest --- it could use some review
to see if I missed anything.

regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2952bfb..7a1a498 100644
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
*** relation_has_unique_index_for(PlannerInf
*** 3025,3030 
--- 3025,3076 
  	return false;
  }
  
+ /*
+  * indexcol_is_bool_constant_for_query
+  *
+  * If an index column is constrained to have a constant value by the query's
+  * WHERE conditions, then it's irrelevant for sort-order considerations.
+  * Usually that means we have a restriction clause WHERE indexcol = constant,
+  * which gets turned into an EquivalenceClass containing a constant, which
+  * is recognized as redundant by build_index_pathkeys().  But if the index
+  * column is a boolean variable (or expression), then we are not going to
+  * see WHERE indexcol = constant, because expression preprocessing will have
+  * simplified that to "WHERE indexcol" or "WHERE NOT indexcol".  So we are not
+  * going to have a matching EquivalenceClass (unless the query also contains
+  * "ORDER BY indexcol").  To allow such cases to work the same as they would
+  * for non-boolean values, this function is provided to detect whether the
+  * specified index column matches a boolean restriction clause.
+  */
+ bool
+ indexcol_is_bool_constant_for_query(IndexOptInfo *index, int indexcol)
+ {
+ 	ListCell   *lc;
+ 
+ 	/* If the index isn't boolean, we can't possibly get a match */
+ 	if (!IsBooleanOpfamily(index->opfamily[indexcol]))
+ 		return false;
+ 
+ 	/* Check each restriction clause for the index's rel */
+ 	foreach(lc, index->rel->baserestrictinfo)
+ 	{
+ 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ 
+ 		/*
+ 		 * As in match_clause_to_indexcol, never match pseudoconstants to
+ 		 * indexes.  (It might be semantically okay to do so here, but the
+ 		 * odds of getting a match are negligible, so don't waste the cycles.)
+ 		 */
+ 		if (rinfo->pseudoconstant)
+ 			continue;
+ 
+ 		/* See if we can match the clause's expression to the index column */
+ 		if (match_boolean_index_clause((Node *) rinfo->clause, indexcol, index))
+ 			return true;
+ 	}
+ 
+ 	return false;
+ }
+ 
  
  /
   *  ROUTINES TO CHECK OPERANDS  
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 4436ac1..459498c 100644
*** a/src/backend/optimizer/path/pathkeys.c
--- b/src/backend/opti

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-12 Thread Fujii Masao

On Tue, Nov 8, 2016 at 11:18 AM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Sat, 5 Nov 2016 21:18:42 +0900, Michael Paquier 
>  wrote in 
> 
>> > I don't see any problem on the state-transition of
>> > exclusiveBackupState. For the following part
>> >
>> > @@ -10217,7 +10255,7 @@ do_pg_start_backup(const char *backupidstr, bool 
>> > fast, TimeLineID *starttli_p,
>> >  {
>> >/*
>> > * Check for existing backup label --- implies a back>
>> up is already
>> > -   * running.  (XXX given that we checked exclusiveBackup above,
>> > +   * running.  (XXX given that we checked exclusiveBackupState above,
>> > * maybe it would be OK to just unlink any such label file?)
>> >
>> > The issue in the XXX comment should be settled by this
>> > chance. PostgreSQL no longer (or should not) places the file by
>> > mistake of itself. It is only possible by someone evil crafting
>> > it, or crash-then-restarted. For the former it can be safely
>> > removed with some notice. For the latter we should remove it
>> > since the backup taken through the restarting is not
>> > reliable. Addition to that, issueing pg_start_backup indicates
>> > that the operator already have forgotten that he/she issued it
>> > previously. So it seems to me that it can be removed with some
>> > WARNING.
>> >
>> > The error checking on exclusiveBackupState looks somewhat
>> > redandunt but I don't come up with a better coding.
>>
>> Yes, that's on purpose to make the error messages more verbose for the user.
>>
>> > So, everything other than the XXX comment looks fine for me, and
>> > this is rather straghtforward patch.
>>
>> I agree that we could do something better, but that would be an
>> optimization, not a bug fix which is what this patch is aiming at.
>
> Ok, I understand that.
>
>> > So after deciding what to do
>> > for the issue and anyone opposed to this patch, I'll make this
>> > 'Ready for commiter'.
>>
>> Thanks for the input.
>>
>> By the way, as long as I look at that, the patch applies cleanly on
>> master and 9.6 but not further down. If a committer shows up to push
>> this patch, I can prepare versions for back branches as needed.
>
> Ok, the attached is the version just rebased to the current
> master. It is clieanly appliable without whitespace errors on
> master and 9.6 with some displacements but fails on 9.5.
>
> I will mark this as "Ready for Committer".

Thanks for updating the patch! Here are the review comments;

+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is exclusive backup in progress.

"there is exclusive" should be "there is no exclusive"?
ISTM that the description following "to be more" doesn't seem to be necessary.

+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and
+ * that is is not done yet.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is running and
+ * that is is not done yet.

Typo: "is is"

Isn't it better to mention "an exclusive backup" here? What about

EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an exclusive
backup.
EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive
backup.

I think that it's worth explaining why ExclusiveBackupState is necessary,
in the comment.

- * exclusiveBackup is true if a backup started with pg_start_backup() is
- * in progress, and nonExclusiveBackups is a counter indicating the number
+ * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
+ * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
+ * it is done, and nonExclusiveBackups is a counter indicating the number

Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE?
Basically it's better to explain fully how the state changes.

+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("a backup is already starting")));
+}
+if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
+{
+WALInsertLockRelease();
+ereport(ERROR,
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("a backup is currently stopping")));

Isn't it better to use "an exclusive backup" explicitly rather than
"a backup" here?

You changed the code so that pg_stop_backup() removes backup_label before
it marks the exclusive-backup-state as not-in-progress. Could you tell me
why you did this change? It's better to explain it in the comment.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-12 Thread Amit Kapila

On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada  wrote:
> On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao  wrote:
>> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada  
>> wrote:
>>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila  
>>> wrote:

 Few comments:
>>>
>>> Thank you for reviewing.
>>>
 + * In 10.0 we support two synchronization methods, priority and
 + * quorum. The number of synchronous standbys that transactions
 + * must wait for replies from and synchronization method are specified
 + * in synchronous_standby_names. This parameter also specifies a list
 + * of standby names, which determines the priority of each standby for
 + * being chosen as a synchronous standby. In priority method, the standbys
 + * whose names appear earlier in the list are given higher priority
 + * and will be considered as synchronous. Other standby servers appearing
 + * later in this list represent potential synchronous standbys. If any of
 + * the current synchronous standbys disconnects for whatever reason,
 + * it will be replaced immediately with the next-highest-priority standby.
 + * In quorum method, the all standbys appearing in the list are
 + * considered as a candidate for quorum commit.

 In the above description, is priority method represented by FIRST and
 quorum method by ANY in the synchronous_standby_names syntax?  If so,
 it might be better to write about it explicitly.
>>>
>>> Added description.
>>>

+ * specified in synchronous_standby_names. The priority method is
+ * represented by FIRST, and the quorum method is represented by ANY

Full stop is missing after "ANY".

>>>
 6.
 + int sync_method; /* synchronization method */
   /* member_names contains nmembers consecutive nul-terminated C strings */
   char member_names[FLEXIBLE_ARRAY_MEMBER];
  } SyncRepConfigData;

 Can't we use 1 or 2 bytes to store sync_method information?
>>>
>>> I changed it to uint8.
>>>

+ int8 sync_method; /* synchronization method */

> I changed it to uint8.

In mail, you have mentioned uint8, but in the code it is int8.  I
think you want to update code to use uint8.


>
>> +standby_list{ $$ =
>> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
>> +| NUM '(' standby_list ')'{ $$ =
>> create_syncrep_config($1, $3, SYNC_REP_QUORUM); }
>> +| ANY NUM '(' standby_list ')'{ $$ =
>> create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
>> +| FIRST NUM '(' standby_list ')'{ $$ =
>> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
>>
>> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works
>> differently from curent version while "list" works in the same way as
>> current one) very confusing?
>>
>> I prefer to either of
>>
>> 1. break the backward-compatibility, i.e., treat the first syntax of
>> "standby_list" as quorum commit
>> 2. keep the backward-compatibility, i.e., treat the second syntax of
>> "NUM (standby_list)" as sync rep with the priority
>

+1.

> There were some comments when I proposed the quorum commit. If we do
> #1 it breaks the backward-compatibility with 9.5 or before as well. I
> don't think it's a good idea. On the other hand, if we do #2 then the
> behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM
> (standby_list)''. But it would not what most of user will want and
> would confuse the users of future version who will want to use the
> quorum commit. Since many hackers thought that the sensible default
> behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the
> current patch chose to changes the behaviour of s_s_names and document
> that changes thoroughly.
>

Your arguments are sensible, but I think we should address the point
of confusion raised by Fujii-san.  As a developer, I feel breaking
backward compatibility (go with Option-1 mentioned above) here is a
good move as it can avoid confusions in future.  However, I know many
a time users are so accustomed to the current usage that they feel
irritated with the change in behavior even it is for their betterment,
so it is advisable to do so only if it is necessary or we have
feedback from a couple of users.  So in this case, if we don't want to
go with Option-1, then I think we should go with Option-2.  If we go
with Option-2, then we can anyway comeback to change the behavior
which is more sensible for future after feedback from users.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] background sessions

On 12/12/16 16:29, Robert Haas wrote:
> On Mon, Dec 12, 2016 at 10:02 AM, Craig Ringer
>  wrote:
>> On 12 Dec. 2016 21:55, "Robert Haas"  wrote:
>> On Sun, Dec 11, 2016 at 5:38 AM, Andrew Borodin 
>> wrote:
>>> 1. As far as I can see, we connot use COPY FROM STDIN in bg session?
>>> Since one of purposes is to orchestrate transactions, may be that
>>> would be valuable.
>>
>> A background worker has no client connection, so what would COPY FROM STDIN
>> do?
>>
>> It doesn't make sense. But a bgworker may well want to supply input to COPY.
>> A COPY FROM CALLBACK of COPY FROM FILEDESC or whatever.
> 
> That's kinda weird, though.  I mean, you don't need to go through all
> of the COPY code just to call heap_multi_insert() or whatever, do you?
>  You can hand-roll whatever you need there.
> 

You do if source of your data is already in COPY (or csv) format. I do
have similar usecase in logical replication followup patch that I plan
to submit to Jan CF, so maybe that will be interesting for this as well.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)

2016-12-12 Thread CK Tan

On Mon, Dec 12, 2016 at 6:14 PM, Andres Freund  wrote:
>
>
> For Q1 I think the bigger win is JITing the transition function
> invocation in advance_aggregates/transition_function - that's IIRC where
> the biggest bottleneck lies.
>

Yeah, we bundle the agg core into our expr work... no point otherwise since
we do
it for OLAP.

As for experience, I think you have found out for yourself. There is a lot
that
can be done and heuristics are involved in many places to decide whether
to jit fully, partially, or not at all.  But it looks like you have a solid
basis now
to proceed and explore the beyond :-)

Send me private email if you have a particular question.

Regards,
-cktan

Re: [HACKERS] Logical Replication WIP

On 13/12/16 02:41, Andres Freund wrote:
> On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote:
> 
>> diff --git a/src/backend/catalog/pg_publication.c 
>> b/src/backend/catalog/pg_publication.c
>> new file mode 100644
>> index 000..e3560b7
>> --- /dev/null
>> +++ b/src/backend/catalog/pg_publication.c
>> +
>> +Datum pg_get_publication_tables(PG_FUNCTION_ARGS);
> 
> Don't we usually put these in a header?
>

We put these to rather random places, I don't mind either way.

> 
>> +/*
>> + * Gets list of relation oids for a publication.
>> + *
>> + * This should only be used for normal publications, the FOR ALL TABLES
>> + * should use GetAllTablesPublicationRelations().
>> + */
>> +List *
>> +GetPublicationRelations(Oid pubid)
>> +{
>> +List   *result;
>> +Relationpubrelsrel;
>> +ScanKeyData scankey;
>> +SysScanDesc scan;
>> +HeapTuple   tup;
>> +
>> +/* Find all publications associated with the relation. */
>> +pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock);
>> +
>> +ScanKeyInit(&scankey,
>> +Anum_pg_publication_rel_prpubid,
>> +BTEqualStrategyNumber, F_OIDEQ,
>> +ObjectIdGetDatum(pubid));
>> +
>> +scan = systable_beginscan(pubrelsrel, PublicationRelMapIndexId, true,
>> +  NULL, 1, &scankey);
>> +
>> +result = NIL;
>> +while (HeapTupleIsValid(tup = systable_getnext(scan)))
>> +{
>> +Form_pg_publication_rel pubrel;
>> +
>> +pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
>> +
>> +result = lappend_oid(result, pubrel->prrelid);
>> +}
>> +
>> +systable_endscan(scan);
>> +heap_close(pubrelsrel, NoLock);
> 
> In other parts of this you drop the lock, but not here?
> 
> 
>> +heap_close(rel, NoLock);
>> +
>> +return result;
>> +}
> 
> and here.
> 

Meh, ignore, that's some pglogical legacy.


> 
> Btw, why are matviews not publishable?
> 

Because standard way of updating them is REFRESH MATERIALIZED VIEW which
is decoded as inserts into pg_temp_ table. I think we'll have to
rethink how we do this before we can sanely support them.

>> +/*
>> + * Get Publication using name.
>> + */
>> +Publication *
>> +GetPublicationByName(const char *pubname, bool missing_ok)
>> +{
>> +Oid oid;
>> +
>> +oid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname));
>> +if (!OidIsValid(oid))
>> +{
>> +if (missing_ok)
>> +return NULL;
>> +
>> +ereport(ERROR,
>> +(errcode(ERRCODE_UNDEFINED_OBJECT),
>> + errmsg("publication \"%s\" does not exist", 
>> pubname)));
>> +}
>> +
>> +return GetPublication(oid);
>> +}
> 
> That's racy... Also, shouldn't we specify for how to deal with the
> returned memory for Publication * returning methods?
> 

So are most of the other existing functions with similar purpose. The
worst case is that with enough concurrency around same publication name
DDL you'll get cache lookup failure.

I added comment to GetPublication saying that memory is palloced.

> 
>> diff --git a/src/backend/commands/publicationcmds.c 
>> b/src/backend/commands/publicationcmds.c
>> new file mode 100644
>> index 000..954b2bd
>> --- /dev/null
>> +++ b/src/backend/commands/publicationcmds.c
>> @@ -0,0 +1,613 @@
> 
>> +/*
>> + * Create new publication.
>> + */
>> +ObjectAddress
>> +CreatePublication(CreatePublicationStmt *stmt)
>> +{
>> +Relationrel;
> 
>> +
>> +values[Anum_pg_publication_puballtables - 1] =
>> +BoolGetDatum(stmt->for_all_tables);
>> +values[Anum_pg_publication_pubinsert - 1] =
>> +BoolGetDatum(publish_insert);
>> +values[Anum_pg_publication_pubupdate - 1] =
>> +BoolGetDatum(publish_update);
>> +values[Anum_pg_publication_pubdelete - 1] =
>> +BoolGetDatum(publish_delete);
> 
> I remain convinced that a different representation would be
> better. There'll be more options over time (truncate, DDL at least).
> 

So? It's boolean properties, it's not like we store bitmaps in catalogs
much. I very much expect DDL to be much more complex than boolean btw.

> 
>> +static void
>> +AlterPublicationOptions(AlterPublicationStmt *stmt, Relation rel,
>> +   HeapTuple tup)
>> +{
>> +boolpublish_insert_given;
>> +boolpublish_update_given;
>> +boolpublish_delete_given;
>> +boolpublish_insert;
>> +boolpublish_update;
>> +boolpublish_delete;
>> +ObjectAddress   obj;
>> +
>> +parse_publication_options(stmt->options,
>> +  
>> &publish_insert_given, &publish_insert,
>> +

Re: [HACKERS] Hash Indexes

2016-12-12 Thread Amit Kapila

On Tue, Dec 13, 2016 at 2:51 AM, Robert Haas  wrote:
> On Sun, Dec 11, 2016 at 1:24 AM, Amit Kapila  wrote:
>>
>> With above fixes, the test ran successfully for more than a day.
>
> Instead of doing this:
>
> +_hash_chgbufaccess(rel, bucket_buf, HASH_WRITE, HASH_NOLOCK);
> +_hash_chgbufaccess(rel, bucket_buf, HASH_NOLOCK, HASH_WRITE);
>
> ...wouldn't it be better to just do MarkBufferDirty()?  There's no
> real reason to release the lock only to reacquire it again, is there?
>

The reason is to make the operations consistent in master and standby.
In WAL patch, for clearing the SPLIT_CLEANUP flag, we write a WAL and
if we don't release the lock after writing a WAL, the operation can
appear on standby even before on master.   Currently, without WAL,
there is no benefit of doing so and we can fix by using
MarkBufferDirty, however, I thought it might be simpler to keep it
this way as this is required for enabling WAL.  OTOH, we can leave
that for WAL patch.  Let me know which way you prefer?

> I don't think we should be afraid to call MarkBufferDirty() instead of
> going through these (fairly stupid) hasham-specific APIs.
>

Right and anyway we need to use it at many more call sites when we
enable WAL for hash index.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)

Hi,

On 2016-12-12 18:11:13 -0800, CK Tan wrote:
> Andres,

> > dev (no jiting):
> > Time: 30343.532 ms
> 
> > dev (jiting):
> > SET jit_tuple_deforming = on;
> > SET jit_expressions = true;
> >
> > Time: 24439.803 ms
> 
> FYI, ~20% improvement for TPCH Q1 is consistent with what we find when we
> only jit expression.

For Q1 I think the bigger win is JITing the transition function
invocation in advance_aggregates/transition_function - that's IIRC where
the biggest bottleneck lies.

If you have any details about your JITing experience that you're willing
to talk about ...

Regards,

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)

2016-12-12 Thread CK Tan

Andres,


> dev (no jiting):
> Time: 30343.532 ms

> dev (jiting):
> SET jit_tuple_deforming = on;
> SET jit_expressions = true;
>
> Time: 24439.803 ms

FYI, ~20% improvement for TPCH Q1 is consistent with what we find when we
only jit expression.

Cheers,
-cktan

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

On 13 December 2016 at 09:13, Michael Paquier  wrote:
> On Tue, Dec 13, 2016 at 10:05 AM, Craig Ringer  wrote:
>> We should probably expose a proc_type or something, with types:
>>
>> * client_backend
>> * bgworker
>> * walsender
>> * autovacuum
>> * checkpointer
>> * bgwriter
>
> A text field is adapted then, more than a single character.
>
>> for simpler filtering.
>>
>> I don't think existing user code is likely to get upset by more
>> processes appearing in pg_stat_activity, and it'll be very handy.
>
> Indeed, for WAL senders now abusing of the query field is definitely
> not consistent. Even if having this information is useful, adding such
> a column would make sense. Still, one thing that is important to keep
> with pg_stat_activity is the ability to count the number of
> connections that are part of max_connections for monitoring purposes.
> The docs definitely would need an example of such a query counting
> only client_backend and WAL senders and tell users that this can be
> used to count how many active connections there are.

Good point.

No need for a new field, since a non-null client_port should be
sufficient.  But definitely documented.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

On Mon, Dec 12, 2016 at 11:39 PM, Heikki Linnakangas  wrote:
> A few couple more things that caught my eye while hacking on this:
>
> 1. We don't use SASLPrep to scrub username's and passwords. That's by
> choice, for usernames, because historically in PostgreSQL usernames can be
> stored in any encoding, but SASLPrep assumes UTF-8. We dodge that by passing
> an empty username in the authentication exchange anyway, because we always
> use the username we got from the startup packet. But for passwords, I think
> we need to fix that. The spec is very clear on that:
>
>> Note that implementations MUST either implement SASLprep or disallow
>> use of non US-ASCII Unicode codepoints in "str".
>
> 2. I think we should check nonces, etc. more carefully, to not contain
> invalid characters. For example, in the server, we use the read_attr_value()
> function to read the client's nonce. Per the spec, the nonce should consist
> of ASCII printable characters, but we will accept anything except the comma.
> That's no trouble to the server, but let's be strict.
>
> To summarize, here's the overall TODO list so far:
>
> * Use SASLPrep for passwords.
>
> * Check nonces, etc. to not contain invalid characters.
>
> * Derive mock SCRAM verifier for non-existent users deterministically from
> username.
>
> * Allow plain 'password' authentication for users with a SCRAM verifier in
> rolpassword.
>
> * Throw an error if an "authorization identity" is given. ATM, we just
> ignore it, but seems better to reject the attempt than do something that
> might not be what the client expects.
>
> * Add "scram-sha-256" prefix to SCRAM verifiers stored in
> pg_authid.rolpassword.
>
> Anything else I'm missing?
>
> I've created a wiki page, mostly to host that TODO list, while we hack this
> to completion: https://wiki.postgresql.org/wiki/SCRAM_authentication. Feel
> free to add stuff that comes to mind, and remove stuff as you push patches
> to the branch on github.

Based on the current code, I think you have the whole list. I'll try
to look once again at the code to see I have anything else in mind.
Improving the TAP regression tests is also an item, with SCRAM
authentication support when a plain password is stored.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical Replication WIP

On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote:

> diff --git a/src/backend/catalog/pg_publication.c 
> b/src/backend/catalog/pg_publication.c
> new file mode 100644
> index 000..e3560b7
> --- /dev/null
> +++ b/src/backend/catalog/pg_publication.c
> +
> +Datum pg_get_publication_tables(PG_FUNCTION_ARGS);

Don't we usually put these in a header?

> +/*
> + * Insert new publication / relation mapping.
> + */
> +ObjectAddress
> +publication_add_relation(Oid pubid, Relation targetrel,
> +  bool if_not_exists)
> +{
> + Relationrel;
> + HeapTuple   tup;
> + Datum   values[Natts_pg_publication_rel];
> + boolnulls[Natts_pg_publication_rel];
> + Oid relid = RelationGetRelid(targetrel);
> + Oid prrelid;
> + Publication *pub = GetPublication(pubid);
> + ObjectAddress   myself,
> + referenced;
> +
> + rel = heap_open(PublicationRelRelationId, RowExclusiveLock);
> +
> + /* Check for duplicates */

Maybe mention that that check is racy, but a unique index protects
against the race?


> + /* Insert tuple into catalog. */
> + prrelid = simple_heap_insert(rel, tup);
> + CatalogUpdateIndexes(rel, tup);
> + heap_freetuple(tup);
> +
> + ObjectAddressSet(myself, PublicationRelRelationId, prrelid);
> +
> + /* Add dependency on the publication */
> + ObjectAddressSet(referenced, PublicationRelationId, pubid);
> + recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
> +
> + /* Add dependency on the relation */
> + ObjectAddressSet(referenced, RelationRelationId, relid);
> + recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
> +
> + /* Close the table. */
> + heap_close(rel, RowExclusiveLock);

I'm not quite sure abou the policy, but shouldn't we invoke
InvokeObjectPostCreateHook etc here?


> +/*
> + * Gets list of relation oids for a publication.
> + *
> + * This should only be used for normal publications, the FOR ALL TABLES
> + * should use GetAllTablesPublicationRelations().
> + */
> +List *
> +GetPublicationRelations(Oid pubid)
> +{
> + List   *result;
> + Relationpubrelsrel;
> + ScanKeyData scankey;
> + SysScanDesc scan;
> + HeapTuple   tup;
> +
> + /* Find all publications associated with the relation. */
> + pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock);
> +
> + ScanKeyInit(&scankey,
> + Anum_pg_publication_rel_prpubid,
> + BTEqualStrategyNumber, F_OIDEQ,
> + ObjectIdGetDatum(pubid));
> +
> + scan = systable_beginscan(pubrelsrel, PublicationRelMapIndexId, true,
> +   NULL, 1, &scankey);
> +
> + result = NIL;
> + while (HeapTupleIsValid(tup = systable_getnext(scan)))
> + {
> + Form_pg_publication_rel pubrel;
> +
> + pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
> +
> + result = lappend_oid(result, pubrel->prrelid);
> + }
> +
> + systable_endscan(scan);
> + heap_close(pubrelsrel, NoLock);

In other parts of this you drop the lock, but not here?


> + heap_close(rel, NoLock);
> +
> + return result;
> +}

and here.


> +/*
> + * Gets list of all relation published by FOR ALL TABLES publication(s).
> + */
> +List *
> +GetAllTablesPublicationRelations(void)
> +{
> + RelationclassRel;
> + ScanKeyData key[1];
> + HeapScanDesc scan;
> + HeapTuple   tuple;
> + List   *result = NIL;
> +
> + classRel = heap_open(RelationRelationId, AccessShareLock);

> + heap_endscan(scan);
> + heap_close(classRel, AccessShareLock);
> +
> + return result;
> +}

but here.


Btw, why are matviews not publishable?

> +/*
> + * Get Publication using name.
> + */
> +Publication *
> +GetPublicationByName(const char *pubname, bool missing_ok)
> +{
> + Oid oid;
> +
> + oid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname));
> + if (!OidIsValid(oid))
> + {
> + if (missing_ok)
> + return NULL;
> +
> + ereport(ERROR,
> + (errcode(ERRCODE_UNDEFINED_OBJECT),
> +  errmsg("publication \"%s\" does not exist", 
> pubname)));
> + }
> +
> + return GetPublication(oid);
> +}

That's racy... Also, shouldn't we specify for how to deal with the
returned memory for Publication * returning methods?



> diff --git a/src/backend/commands/publicationcmds.c 
> b/src/backend/commands/publicationcmds.c
> new file mode 100644
> index 000..954b2bd
> --- /dev/null
> +++ b/src/backend/commands/publicationcmds.c
> @@ -0,0 +1,613 @@

> +/*
> + * Create new publication.
> + */
> +ObjectAddres

Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

On 12 December 2016 at 22:39, Heikki Linnakangas  wrote:

> * Throw an error if an "authorization identity" is given. ATM, we just
> ignore it, but seems better to reject the attempt than do something that
> might not be what the client expects.

Yeah. That might be an opportunity to make admins' and connection
poolers' lives much happier down the track, but first we'd need a way
of specifying a mapping for the other users a given user is permitted
to masquerade as (like we have for roles and role membership). We have
SET SESSION AUTHORIZATION already, which has all the same benefits and
security problems as allowing connect-time selection of authorization
identity without such a framework. And we have SET ROLE.

ERRORing is the right thing to do here, so we can safely use this
protocol functionality later if we want to allow user masquerading.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: two slab-like memory allocators

On 13/12/16 01:45, Tomas Vondra wrote:
> On 12/12/2016 11:39 PM, Tomas Vondra wrote:
>> On 12/12/2016 05:05 AM, Petr Jelinek wrote:
>>>
>>> I'd be happy with this patch now (as in committer ready) except that it
>>> does have some merge conflicts after the recent commits, so rebase is
>>> needed.
>>>
>>
>> Attached is a rebased version of the patch, resolving the Makefile merge
>> conflicts.
>>
> 
> Meh, managed to rebase a wrong branch, missing fix to the off-by-one
> error (fixed v6). Attached is v8, hopefully the correct one.
> 

Okay, this version looks good to me, marked as RfC.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] background sessions

On 12 December 2016 at 23:29, Robert Haas  wrote:
> On Mon, Dec 12, 2016 at 10:02 AM, Craig Ringer
>  wrote:
>> On 12 Dec. 2016 21:55, "Robert Haas"  wrote:
>> On Sun, Dec 11, 2016 at 5:38 AM, Andrew Borodin 
>> wrote:
>>> 1. As far as I can see, we connot use COPY FROM STDIN in bg session?
>>> Since one of purposes is to orchestrate transactions, may be that
>>> would be valuable.
>>
>> A background worker has no client connection, so what would COPY FROM STDIN
>> do?
>>
>> It doesn't make sense. But a bgworker may well want to supply input to COPY.
>> A COPY FROM CALLBACK of COPY FROM FILEDESC or whatever.
>
> That's kinda weird, though.  I mean, you don't need to go through all
> of the COPY code just to call heap_multi_insert() or whatever, do you?
>  You can hand-roll whatever you need there.

And fire triggers and constraint checks if necessary, update indexes,
etc. But yeah.

The original idea with logical rep was to get COPY-format data from
the upstream when initializing a table, and apply it via COPY in a
bgworker. I think that's changed in favour of another approach in
logical rep now, but thought it was worth mentioning as something it
might make sense for someone to want to do.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb problematic operators

On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark  wrote:
> On 12 December 2016 at 04:59, Craig Ringer  wrote:
>> I didn't realise Pg's use of ? was that old, so thanks. That makes
>> offering alternatives much less appealing.
>
> One option might be for Postgres to define duplicate operator names
> using ¿ or something else. I think ¿ is a good choice because it's a
> common punctuation mark in spanish so it's probably not hard to find
> on a lot of keyboards or hard to find instructions on how to type one.
>
> There is always a risk in allowing redundant syntaxes though. For
> example people running grep to find all uses of an operator will miss
> the alternate spelling. There may even be security implications for
> that though to be honest that seems unlikely in this case.

Are you sure that using a non-ASCII character is a good idea for an
in-core operator? I would think no.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_background contrib module proposal

On 13 December 2016 at 01:17, Andrew Borodin  wrote:

> 6. Cancelation: a way to signal to background query that it's time to
> quit gracefully.

That at least should be fuss-free. SIGTERM it, and make sure the
worker does CHECK_FOR_INTERRUPTS() in regularly-hit places and loops.
Ensure the worker waits on latches rather than pg_usleep()ing.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-12-12 Thread Amit Langote

On 2016/12/12 23:14, Peter Eisentraut wrote:
> On 12/7/16 1:20 PM, Robert Haas wrote:
>> I've committed 0001 - 0006 with that correction and a few other
>> adjustments.  There's plenty of room for improvement here, and almost
>> certainly some straight-up bugs too, but I think we're at a point
>> where it will be easier and less error-prone to commit follow on
>> changes incrementally rather than by continuously re-reviewing a very
>> large patch set for increasingly smaller changes.
> 
> This page
> ,
> which is found via the index entry for "Partitioning", should be updated
> for the new functionality.

A patch was included to do that, but it didn't do more than replacing the
old DDL listing by the new partitioning commands.  As Robert said in the
email you quoted, that's not enough.  So I'm trying to come up with a
patch updating the page explaining the new functionality better.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

On Tue, Dec 13, 2016 at 10:05 AM, Craig Ringer  wrote:
> We should probably expose a proc_type or something, with types:
>
> * client_backend
> * bgworker
> * walsender
> * autovacuum
> * checkpointer
> * bgwriter

A text field is adapted then, more than a single character.

> for simpler filtering.
>
> I don't think existing user code is likely to get upset by more
> processes appearing in pg_stat_activity, and it'll be very handy.

Indeed, for WAL senders now abusing of the query field is definitely
not consistent. Even if having this information is useful, adding such
a column would make sense. Still, one thing that is important to keep
with pg_stat_activity is the ability to count the number of
connections that are part of max_connections for monitoring purposes.
The docs definitely would need an example of such a query counting
only client_backend and WAL senders and tell users that this can be
used to count how many active connections there are.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal : Parallel Merge Join

2016-12-12 Thread Dilip Kumar

On Tue, Dec 13, 2016 at 12:11 AM, Robert Haas  wrote:
>
> Hmm, so it seems my initial guess that we didn't need to bother
> generating such paths was wrong.  Oops.
>
> This patch is hard to read because it is reorganizing a bunch of code
> as well as adding new functionality.  Perhaps you could separate it
> into two patches, one with the refactoring and then the other with the
> new functionality.

Okay, I can do that.

>  Actually, though, I don't understand why you need
> so much rearrangement

Actually match_unsorted_outer is quite complex function and
considering merge join path for many cases, And In my opinion those
all paths should be considered for partial outer as well.

So one option was to duplicate that part of code. But I chose to move
all that code from match_unsorted_outer (which is related to
generating various merge join path) to new function called
generate_mergejoin_path. And, use it for normal outer path as well as
for partial outer path.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: recursive json_populate_record()

On Tue, Dec 13, 2016 at 9:38 AM, Nikita Glukhov  wrote:
> It also fixes the following errors/inconsistencies caused by lost quoting of
> string json values:
>
> [master]=# select * from json_to_record('{"js": "a"}') as rec(js json);
> ERROR:  invalid input syntax for type json
> DETAIL:  Token "a" is invalid.
> CONTEXT:  JSON data, line 1: a

The docs mention that this is based on a best effort now. It may be
interesting to improve that.

> [master]=# select * from json_to_record('{"js": "true"}') as rec(js json);
>   js
> --
> true
>
> [patched]=# select * from json_to_record('{"js": "a"}') as rec(js json);
>   js
> -
>  "a"

That's indeed valid JSON.

> [patched]=# select * from json_to_record('{"js": "true"}') as rec(js json);
>js
> 
>  "true"

And so is that.

> The second patch adds assignment of correct ndims to array fields of RECORD
> function result types.
> Without this patch, attndims in tuple descriptors of RECORD types is always
> 0 and the corresponding assertion fails in the next test:
>
> [patched]=# select json_to_record('{"a": [1, 2, 3]}') as rec(a int[]);
>
>
> Should I submit these patches to commitfest?

It seems to me that it would be a good idea to add them.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

On 13 December 2016 at 01:45, Kevin Grittner  wrote:
> On Mon, Dec 12, 2016 at 10:33 AM, Andres Freund  wrote:
>> On 2016-12-12 13:26:32 -0300, Alvaro Herrera wrote:
>>> Robert Haas wrote:
>
 1. Show all processes that have a PGPROC in pg_stat_activity,
>
 2. Add a second view, say pg_stat_system_activity,
>
>>> I vote 1.
>>
>> +1
>
> +1

I've long wanted the ability to see auxillary process state in
pg_stat_activity, so +1.

Right now pg_stat_replication is a join over pg_stat_get_activity()
and pg_stat_get_wal_senders() filtered for walsenders, and
pg_stat_activity is a view over pg_stat_get_activity() filtered for
processes with a user id. I'd really like to see walsenders in
pg_stat_activity now that logical decoding makes them more than dumb
I/O channels, as well as other auxillary processes.

We should probably expose a proc_type or something, with types:

* client_backend
* bgworker
* walsender
* autovacuum
* checkpointer
* bgwriter

for simpler filtering.

I don't think existing user code is likely to get upset by more
processes appearing in pg_stat_activity, and it'll be very handy.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical Replication WIP

On 13/12/16 01:33, Andres Freund wrote:
> HJi,
> 
> On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
>> On 12/8/16 4:10 PM, Petr Jelinek wrote:
>>> On 08/12/16 20:16, Peter Eisentraut wrote:
 On 12/6/16 11:58 AM, Peter Eisentraut wrote:
> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>> I think that the removal of changes to ReplicationSlotAcquire() that you
>> did will result in making it impossible to reacquire temporary slot once
>> you switched to different one in the session as the if (active_pid != 0)
>> will always be true for temp slot.
>
> I see.  I suppose it's difficult to get a test case for this.

 I created a test case, saw the error of my ways, and added your code
 back in.  Patch attached.

>>>
>>> Hi,
>>>
>>> I am happy with this version, thanks for moving it forward.
>>
>> committed
> 
> Hm.
> 
>  /*
> + * Cleanup all temporary slots created in current session.
> + */
> +void
> +ReplicationSlotCleanup()
> 
> I'd rather see a (void) there. The prototype has it, but still.
> 
> 
> +
> + /*
> +  * No need for locking as we are only interested in slots active in
> +  * current process and those are not touched by other processes.
> 
> I'm a bit suspicious of this claim.  Without a memory barrier you could
> actually look at outdated versions of active_pid. In practice there's
> enough full memory barriers in the slot creation code that it's
> guaranteed to not be the same pid from before a wraparound though.
> 
> I think that doing iterations of slots without
> ReplicationSlotControlLock makes things more fragile, because suddenly
> assumptions that previously held aren't true anymore.   E.g. factually
>   /*
>* The slot is definitely gone.  Lock out concurrent scans of the array
>* long enough to kill it.  It's OK to clear the active flag here 
> without
>* grabbing the mutex because nobody else can be scanning the array 
> here,
>* and nobody can be attached to this slot and thus access it without
>* scanning the array.
>*/
> is now simply not true anymore.  It's probably not harmfully broken, but
> at least you've changed the locking protocol without adapting comments.
> 

Well it's protected by being called only by ReplicationSlotCleanup() and
ReplicationSlotDropAcquired(). The comment could be improved though, yes.

Holding the ReplicationSlotControlLock in the scan is somewhat
problematic because ReplicationSlotDropPtr tryes to use it as well (and
in exclusive mode), so we'd have to do exclusive lock in
ReplicationSlotCleanup() which I don't really like much.

> 
>  /*
> - * Permanently drop the currently acquired replication slot which will be
> - * released by the point this function returns.
> + * Permanently drop the currently acquired replication slot.
>   */
>  static void
>  ReplicationSlotDropAcquired(void)
> 
> Isn't that actually removing interesting information? Yes, the comment's
> been moved to ReplicationSlotDropPtr(), but that routine is an internal
> one...
> 

ReplicationSlotDropAcquired() is internal one as well.

> 
> @@ -810,6 +810,9 @@ ProcKill(int code, Datum arg)
>   if (MyReplicationSlot != NULL)
>   ReplicationSlotRelease();
> 
> + /* Also cleanup all the temporary slots. */
> + ReplicationSlotCleanup();
> +
> 
> So we now have exactly this code in several places. Why does a
> generically named Cleanup routine not also deal with a currently
> acquired slot? Right now it'd be more appropriately named
> ReplicationSlotDropTemporary() or such.
> 

It definitely could release MyReplicationSlot as well.

> 
> @@ -1427,13 +1427,14 @@ pg_replication_slots| SELECT l.slot_name,
>  l.slot_type,
>  l.datoid,
>  d.datname AS database,
> +l.temporary,
>  l.active,
>  l.active_pid,
>  l.xmin,
>  l.catalog_xmin,
>  l.restart_lsn,
>  l.confirmed_flush_lsn
> -   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, 
> active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
> +   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, 
> temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, 
> confirmed_flush_lsn)
>   LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
>  pg_roles| SELECT pg_authid.rolname,
>  pg_authid.rolsuper,
> 
> If we start to expose this, shouldn't we expose the persistency instead
> (i.e. persistent/ephemeral/temporary)?
> 

Not sure how much is that useful given that ephemeral is transient state
only present during slot creation.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: two slab-like memory allocators

2016-12-12 Thread Tomas Vondra


On 12/12/2016 11:39 PM, Tomas Vondra wrote:

On 12/12/2016 05:05 AM, Petr Jelinek wrote:


I'd be happy with this patch now (as in committer ready) except that it
does have some merge conflicts after the recent commits, so rebase is
needed.



Attached is a rebased version of the patch, resolving the Makefile merge
conflicts.



Meh, managed to rebase a wrong branch, missing fix to the off-by-one 
error (fixed v6). Attached is v8, hopefully the correct one.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-move-common-bits-to-memdebug-v8.patch
Description: binary/octet-stream


0002-slab-allocator-v8.patch
Description: binary/octet-stream


0003-generational-context-v8.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] PATCH: recursive json_populate_record()

2016-12-12 Thread Nikita Glukhov


Hi.

The first attached patch implements recursive processing of nested 
objects and arrays in json[b]_populate_record[set](), 
json[b]_to_record[set](). See regression tests for examples.


It also fixes the following errors/inconsistencies caused by lost 
quoting of string json values:


[master]=# select * from json_to_record('{"js": "a"}') as rec(js json);
ERROR:  invalid input syntax for type json
DETAIL:  Token "a" is invalid.
CONTEXT:  JSON data, line 1: a

[master]=# select * from json_to_record('{"js": "true"}') as rec(js json);
  js
--
true

[patched]=# select * from json_to_record('{"js": "a"}') as rec(js json);
  js
-
 "a"

[patched]=# select * from json_to_record('{"js": "true"}') as rec(js json);
   js

 "true"


The second patch adds assignment of correct ndims to array fields of 
RECORD function result types.
Without this patch, attndims in tuple descriptors of RECORD types is 
always 0 and the corresponding assertion fails in the next test:


[patched]=# select json_to_record('{"a": [1, 2, 3]}') as rec(a int[]);


Should I submit these patches to commitfest?

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index eca98df..12049a4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -11249,12 +11249,12 @@ table2-mapping
  whose columns match the record type defined by base
  (see note below).

-   select * from json_populate_record(null::myrowtype, '{"a":1,"b":2}')
+   select * from json_populate_record(null::myrowtype, '{"a": 1, "b": ["2", "a b"], "c": {"d": 4, "e": "a  b c"}}')

 
- a | b
+---
- 1 | 2
+ a |   b   |  c
+---+---+-
+ 1 | {2,"a b"} | (4,"a b c")
 

   
@@ -11343,12 +11343,12 @@ table2-mapping
  explicitly define the structure of the record with an AS
  clause.

-   select * from json_to_record('{"a":1,"b":[1,2,3],"c":"bar"}') as x(a int, b text, d text) 
+   select * from json_to_record('{"a":1,"b":[1,2,3],"c":[1,2,3],"e":"bar","r": {"a": 123, "b": "a b c"}}') as x(a int, b text, c int[], d text, r myrowtype) 

 
- a |b| d
+-+---
- 1 | [1,2,3] |
+ a |b|c| d |   r
+---+-+-+---+---
+ 1 | [1,2,3] | {1,2,3} |   | (123,"a b c")
 

   
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 17ee4e4..729826c 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -93,7 +93,7 @@ static void elements_array_element_end(void *state, bool isnull);
 static void elements_scalar(void *state, char *token, JsonTokenType tokentype);
 
 /* turn a json object into a hash table */
-static HTAB *get_json_object_as_hash(text *json, const char *funcname);
+static HTAB *get_json_object_as_hash(char *json, int len, const char *funcname);
 
 /* common worker for populate_record and to_record */
 static Datum populate_record_worker(FunctionCallInfo fcinfo, const char *funcname,
@@ -149,6 +149,33 @@ static void setPathArray(JsonbIterator **it, Datum *path_elems,
 			 int level, Jsonb *newval, uint32 nelems, int op_type);
 static void addJsonbToParseState(JsonbParseState **jbps, Jsonb *jb);
 
+/* helper functions for populate_record[set] */
+typedef struct ColumnIOData ColumnIOData;
+typedef struct RecordIOData RecordIOData;
+
+static HeapTupleHeader
+populate_record(TupleDesc		tupdesc,
+RecordIOData  **record_info,
+HeapTupleHeader	template,
+MemoryContext	mcxt,
+Oidjtype,
+HTAB	 	   *json_hash,
+JsonbContainer *cont);
+
+static Datum
+populate_record_field(ColumnIOData *col,
+	  Oid			type,
+	  int32			typmod,
+	  int32			ndims,
+	  const char   *colname,
+	  MemoryContext	mcxt,
+	  Datum			defaultval,
+	  Oid			jtype,
+	  char		   *json,
+	  bool			json_is_string,
+	  JsonbValue   *jval,
+	  bool		   *isnull);
+
 /* state for json_object_keys */
 typedef struct OkeysState
 {
@@ -216,6 +243,7 @@ typedef struct JhashState
 	HTAB	   *hash;
 	char	   *saved_scalar;
 	char	   *save_json_start;
+	bool		saved_scalar_is_string;
 } JHashState;
 
 /* hashtable element */
@@ -223,26 +251,55 @@ typedef struct JsonHashEntry
 {
 	char		fname[NAMEDATALEN];		/* hash key (MUST BE FIRST) */
 	char	   *val;
-	char	   *json;
 	bool		isnull;
+	bool		isstring;
 } JsonHashEntry;
 
-/* these two are stolen from hstore / record_out, used in populate_record* */
-typedef struct ColumnIOData
+typedef struct ScalarIOData
 {
-	Oid			column_type;
 	Oid			typiofunc;
 	Oid			typioparam;
 	FmgrInfo	proc;
-} ColumnIOData;
+} ScalarIOData;
 
-typedef struct RecordIOData
+typedef struct ArrayIOData
+{
+	Oidelement_type;
+	int32			element_typmod;
+	ColumnIOData   *element_info;
+	intndims;
+} ArrayIOData;
+
+/*
+ * CompositeIOData is a pointer to a RecordIO

Re: [HACKERS] Logical Replication WIP

HJi,

On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
> On 12/8/16 4:10 PM, Petr Jelinek wrote:
> > On 08/12/16 20:16, Peter Eisentraut wrote:
> >> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
> >>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>  I think that the removal of changes to ReplicationSlotAcquire() that you
>  did will result in making it impossible to reacquire temporary slot once
>  you switched to different one in the session as the if (active_pid != 0)
>  will always be true for temp slot.
> >>>
> >>> I see.  I suppose it's difficult to get a test case for this.
> >>
> >> I created a test case, saw the error of my ways, and added your code
> >> back in.  Patch attached.
> >>
> >
> > Hi,
> >
> > I am happy with this version, thanks for moving it forward.
>
> committed

Hm.

 /*
+ * Cleanup all temporary slots created in current session.
+ */
+void
+ReplicationSlotCleanup()

I'd rather see a (void) there. The prototype has it, but still.

+
+   /*
+* No need for locking as we are only interested in slots active in
+* current process and those are not touched by other processes.

I'm a bit suspicious of this claim.  Without a memory barrier you could
actually look at outdated versions of active_pid. In practice there's
enough full memory barriers in the slot creation code that it's
guaranteed to not be the same pid from before a wraparound though.

I think that doing iterations of slots without
ReplicationSlotControlLock makes things more fragile, because suddenly
assumptions that previously held aren't true anymore.   E.g. factually
/*
 * The slot is definitely gone.  Lock out concurrent scans of the array
 * long enough to kill it.  It's OK to clear the active flag here 
without
 * grabbing the mutex because nobody else can be scanning the array 
here,
 * and nobody can be attached to this slot and thus access it without
 * scanning the array.
 */
is now simply not true anymore.  It's probably not harmfully broken, but
at least you've changed the locking protocol without adapting comments.

 /*
- * Permanently drop the currently acquired replication slot which will be
- * released by the point this function returns.
+ * Permanently drop the currently acquired replication slot.
  */
 static void
 ReplicationSlotDropAcquired(void)

Isn't that actually removing interesting information? Yes, the comment's
been moved to ReplicationSlotDropPtr(), but that routine is an internal
one...

@@ -810,6 +810,9 @@ ProcKill(int code, Datum arg)
if (MyReplicationSlot != NULL)
ReplicationSlotRelease();

+   /* Also cleanup all the temporary slots. */
+   ReplicationSlotCleanup();
+

So we now have exactly this code in several places. Why does a
generically named Cleanup routine not also deal with a currently
acquired slot? Right now it'd be more appropriately named
ReplicationSlotDropTemporary() or such.

@@ -1427,13 +1427,14 @@ pg_replication_slots| SELECT l.slot_name,
 l.slot_type,
 l.datoid,
 d.datname AS database,
+l.temporary,
 l.active,
 l.active_pid,
 l.xmin,
 l.catalog_xmin,
 l.restart_lsn,
 l.confirmed_flush_lsn
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, 
active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, 
temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, 
confirmed_flush_lsn)
  LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
 pg_authid.rolsuper,

If we start to expose this, shouldn't we expose the persistency instead
(i.e. persistent/ephemeral/temporary)?

new file   contrib/test_decoding/sql/slot.sql
@@ -0,0 +1,20 @@
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 
'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t', 
'test_decoding', true);
+
+SELECT pg_drop_replication_slot('regression_slot_p');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 
'test_decoding', false);
+
+-- reconnect to clean temp slots
+\c

Can we add multiple slots to clean up here? Can we also add a test for
the cleanup on error for temporary slots? E.g. something like in
ddl.sql (maybe we should actually move some of the relevant tests from
there to here).

It'd also be good to test this with physical slots?

+-- test switching between slots in a session
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 
'test_decoding', true);
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 
'test_decoding', true);
+SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
+SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);

Can we actually output something? Right now this doesn't test that
much..

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

On Tue, Dec 13, 2016 at 1:45 AM, Robert Haas  wrote:
> And now I'm noticing that Michael Paquier previously started a thread
> on this problem which I failed to note before starting this one:
>
> http://postgr.es/m/CAB7nPqSYN05rGsYCTahxTz+2hBikh7=m+hr2JTXaZv_Ei=q...@mail.gmail.com

Yes. I already had a look at what could be done to expose the
auxiliary processes in a system view with a set of proposals, though I
am not sure what would be better. It would be better to move the
discussion on this thread in my opinion, we are tracking down a
solution for a new problem.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

On 12/12/16 23:33, Andres Freund wrote:
> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>> On 12/12/16 22:42, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
 Hi,
 First one is outright bug, which has to do with how we track running
 transactions. What snapbuild basically does while doing initial snapshot
 is read the xl_running_xacts record, store the list of running txes and
 then wait until they all finish. The problem with this is that
 xl_running_xacts does not ensure that it only logs transactions that are
 actually still running (to avoid locking PGPROC) so there might be xids
 in xl_running_xacts that already committed before it was logged.
>>>
>>> I don't think that's actually true?  Notice how LogStandbySnapshot()
>>> only releases the lock *after* the LogCurrentRunningXacts() iff
>>> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
>>> observed must actually be a bit more complex :(
>>>
>>
>> Hmm, interesting, I did see the transaction commit in the WAL before the
>> xl_running_xacts that contained the xid as running. I only seen it on
>> production system though, didn't really manage to easily reproduce it
>> locally.
> 
> I suspect the reason for that is that RecordTransactionCommit() doesn't
> conflict with ProcArrayLock in the first place - only
> ProcArrayEndTransaction() does.  So they're still running in the PGPROC
> sense, just not the crash-recovery sense...
> 

That looks like reasonable explanation. BTW I realized my patch needs
bit more work, currently it will break the actual snapshot as it behaves
same as if the xl_running_xacts was empty which is not correct AFAICS.

Also if we did the approach suggested by my patch (ie using this
xmin/xmax comparison) I guess we wouldn't need to hold the lock for
extra time in wal_level logical anymore.

That is of course unless you think it should be approached from the
other side of the stream and try log correct xl_running_xacts.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

2016-12-12 Thread Thomas Munro

On Tue, Dec 13, 2016 at 10:47 AM, Kevin Grittner  wrote:
> On Mon, Dec 12, 2016 at 1:06 PM, Kevin Grittner  wrote:
>> On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner  wrote:
>>
>>> As you can see, this generated a serialization failure.
>>
>> That was on 9.6.  On earlier versions it does indeed allow the
>> transaction on connection 2 to commit, yielding a non-serializable
>> result.  This makes a pretty strong case for back-patching this
>> commit:
>>
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fcff8a575198478023ada8a48e13b50f70054766
>
> I have confirmed that this patch applies cleanly to all supported
> branches (with some offsets), that the bug is manifest without the
> patch, and that it is fixed with this patch in all supported
> branches.  This patch has been in the development code base for
> about 8 months and in production with the 9.6 release, so it has
> been in active production for 3 months with no sign of trouble.  If
> you ignore the code comment, doc changes, and new regression tests
> it consists of adding one line of code to nbtinsert.c.  What it
> does is that when a "unique_violation" error is about to fire, it
> adds a check to see whether the conditions for a serialization
> failure also exist; if so, it fires that error instead.
>
> This was not initially back-patched, in spite of numerous requests
> to do so, because it was a behavior change and not clearly a bug --
> it has a least some minimal chance of changing behavior someone
> might be relying on; however, Ian has constructed a use-case where
> without this patch we clearly allow a serialization anomaly which
> is not allowed with the patch, so this patch should, IMO, be
> considered a bug fix on that basis.
>
> Barring objections I will back-patch to 9.2 through 9.5 tomorrow.
> (9.1 is out of support and the fix is already in 9.6 and forward.)

+1

Ian's test case uses an exception handler to convert a difference in
error code into a difference in committed effect, thereby converting a
matter of programmer convenience into a bug.

For the record, read-write-unique-4.spec's permutation r2 w1 w2 c1 c2
remains an open question for further work.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgres_fdw bug in 9.6

Jeff Janes  writes:
> I have a test case where I made the fdw connect back to itself, and
> stripped out all the objects that I could and still reproduce the case.  It
> is large, 21MB compressed, 163MB uncompressed, so I am linking it here:
> https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc

Thanks for the test case.  I believe I understand the fundamental problem,
or one of the fundamental problems --- I'm not exactly convinced there
aren't several here.

The key issue is that GetExistingLocalJoinPath believes it can return
any random join path as what to use for the fdw_outerpath of a foreign
join.  You can get away with that, perhaps, as long as you only consider
two-way joins.  But in a nest of foreign joins, this fails to consider
the interrelationships of join paths and their children.  In particular,
what we have happening in this example is that GetExistingLocalJoinPath
seizes randomly on a MergePath for an upper join relation, clones it,
sees that the outer child is a join ForeignPath, and replaces that outer
child with its fdw_outerpath ... which was chosen equally cavalierly by
some previous execution of GetExistingLocalJoinPath, and does not have
the sort ordering expected by the MergePath.  So we'd generate an invalid
EPQ plan, except that the debug cross-check in create_mergejoin_plan
notices the discrepancy.

I believe there are probably more problems here, or at least if there
aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins.  If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?

Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

 * Just skip anything else. We don't know if corresponding
 * plan would build the output row from whole-row references
 * of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.

We've probably managed to not notice this so far because foreign joins
generally ought to dominate any local join method, so that there wouldn't
often be cases where the surviving paths use local joins for input
sub-joins.  But Jeff's test case proves it can happen.

I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins.  However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill.  We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply.  At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join.  So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top".  This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.

(Hm, that does sound a lot like "throw it away and start again", doesn't
it.  But what we've got here is busted enough that I'm not sure there
are good alternatives.  Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)

BTW, what's "existing" about the result of GetExistingLocalJoinPath?
And for that matter, what's "outer" about the content of fdw_outerpath?
Good luck figuring that out from the documentation of the field, all
zero words of it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: two slab-like memory allocators

2016-12-12 Thread Tomas Vondra


On 12/12/2016 05:05 AM, Petr Jelinek wrote:


I'd be happy with this patch now (as in committer ready) except that it
does have some merge conflicts after the recent commits, so rebase is
needed.



Attached is a rebased version of the patch, resolving the Makefile merge 
conflicts.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-allocators-v7.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
> On 12/12/16 22:42, Andres Freund wrote:
> > Hi,
> > 
> > On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
> >> Hi,
> >> First one is outright bug, which has to do with how we track running
> >> transactions. What snapbuild basically does while doing initial snapshot
> >> is read the xl_running_xacts record, store the list of running txes and
> >> then wait until they all finish. The problem with this is that
> >> xl_running_xacts does not ensure that it only logs transactions that are
> >> actually still running (to avoid locking PGPROC) so there might be xids
> >> in xl_running_xacts that already committed before it was logged.
> > 
> > I don't think that's actually true?  Notice how LogStandbySnapshot()
> > only releases the lock *after* the LogCurrentRunningXacts() iff
> > wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
> > observed must actually be a bit more complex :(
> > 
> 
> Hmm, interesting, I did see the transaction commit in the WAL before the
> xl_running_xacts that contained the xid as running. I only seen it on
> production system though, didn't really manage to easily reproduce it
> locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does.  So they're still running in the PGPROC
sense, just not the crash-recovery sense...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

On 12/12/16 22:42, Andres Freund wrote:
> Hi,
> 
> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
>> Hi,
>> First one is outright bug, which has to do with how we track running
>> transactions. What snapbuild basically does while doing initial snapshot
>> is read the xl_running_xacts record, store the list of running txes and
>> then wait until they all finish. The problem with this is that
>> xl_running_xacts does not ensure that it only logs transactions that are
>> actually still running (to avoid locking PGPROC) so there might be xids
>> in xl_running_xacts that already committed before it was logged.
> 
> I don't think that's actually true?  Notice how LogStandbySnapshot()
> only releases the lock *after* the LogCurrentRunningXacts() iff
> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
> observed must actually be a bit more complex :(
> 

Hmm, interesting, I did see the transaction commit in the WAL before the
xl_running_xacts that contained the xid as running. I only seen it on
production system though, didn't really manage to easily reproduce it
locally.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Mon, Dec 12, 2016 at 1:06 PM, Kevin Grittner  wrote:
> On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner  wrote:
>
>> As you can see, this generated a serialization failure.
>
> That was on 9.6.  On earlier versions it does indeed allow the
> transaction on connection 2 to commit, yielding a non-serializable
> result.  This makes a pretty strong case for back-patching this
> commit:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fcff8a575198478023ada8a48e13b50f70054766

I have confirmed that this patch applies cleanly to all supported
branches (with some offsets), that the bug is manifest without the
patch, and that it is fixed with this patch in all supported
branches.  This patch has been in the development code base for
about 8 months and in production with the 9.6 release, so it has
been in active production for 3 months with no sign of trouble.  If
you ignore the code comment, doc changes, and new regression tests
it consists of adding one line of code to nbtinsert.c.  What it
does is that when a "unique_violation" error is about to fire, it
adds a check to see whether the conditions for a serialization
failure also exist; if so, it fires that error instead.

This was not initially back-patched, in spite of numerous requests
to do so, because it was a behavior change and not clearly a bug --
it has a least some minimal chance of changing behavior someone
might be relying on; however, Ian has constructed a use-case where
without this patch we clearly allow a serialization anomaly which
is not allowed with the patch, so this patch should, IMO, be
considered a bug fix on that basis.

Barring objections I will back-patch to 9.2 through 9.5 tomorrow.
(9.1 is out of support and the fix is already in 9.6 and forward.)

Release notes for the next minor release set should probably
include a note that for those using serializable transactions a
serialization error can now occur in some situations where a unique
violation would previously have been reported.  I expect that this
will mostly be a welcome change, but it seems like something which
merits a compatibility warning.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
> Hi,
> First one is outright bug, which has to do with how we track running
> transactions. What snapbuild basically does while doing initial snapshot
> is read the xl_running_xacts record, store the list of running txes and
> then wait until they all finish. The problem with this is that
> xl_running_xacts does not ensure that it only logs transactions that are
> actually still running (to avoid locking PGPROC) so there might be xids
> in xl_running_xacts that already committed before it was logged.

I don't think that's actually true?  Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
observed must actually be a bit more complex :(

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2016-12-12 Thread Peter Geoghegan

On Mon, Dec 12, 2016 at 12:59 PM, Robert Haas  wrote:
> Committed 0001.

Thanks.

I should have already specifically pointed out that the original
discussion on what became 0002-* is here:
postgr.es/m/7256.1476711...@sss.pgh.pa.us

As I said already, the general idea seems uncontroversial.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hash Indexes

On Sun, Dec 11, 2016 at 1:24 AM, Amit Kapila  wrote:
> The reason for this and the similar error in vacuum was that in one of
> the corner cases after freeing the overflow page and updating the link
> for the previous bucket, we were not marking the buffer as dirty.  So,
> due to concurrent activity, the buffer containing the updated links
> got evicted and then later when we tried to access the same buffer, it
> brought back the old copy which contains a link to freed overflow
> page.
>
> Apart from above issue, Kuntal has noticed that there is assertion
> failure (Assert(bucket == new_bucket);) in hashbucketcleanup with the
> same test as provided by you. The reason for that problem was that
> after deleting tuples in hashbucketcleanup, we were not marking the
> buffer as dirty due to which the old copy of the overflow page was
> re-appearing and causing that failure.
>
> After fixing the above problem,  it has been noticed that there is
> another assertion failure (Assert(bucket == obucket);) in
> _hash_splitbucket_guts.  The reason for this problem was that after
> the split, vacuum failed to remove tuples from the old bucket that are
> moved due to split. Now, during next split from the same old bucket,
> we don't expect old bucket to contain tuples from the previous split.
> To fix this whenever vacuum needs to perform split cleanup, it should
> update the metapage values (masks required to calculate bucket
> number), so that it shouldn't miss cleaning the tuples.
> I believe this is the same assertion what Andreas has reported in
> another thread [1].
>
> The next problem we encountered is that after running the same test
> for somewhat longer, inserts were failing with error "unexpected zero
> page at block ..".  After some analysis, I have found that the lock
> chain (lock next overflow bucket page before releasing the previous
> bucket page) was broken in one corner case in _hash_freeovflpage due
> to which insert went ahead than squeeze bucket operation and accessed
> the freed overflow page before the link for the same has been updated.
>
> With above fixes, the test ran successfully for more than a day.

Instead of doing this:

+_hash_chgbufaccess(rel, bucket_buf, HASH_WRITE, HASH_NOLOCK);
+_hash_chgbufaccess(rel, bucket_buf, HASH_NOLOCK, HASH_WRITE);

...wouldn't it be better to just do MarkBufferDirty()?  There's no
real reason to release the lock only to reacquire it again, is there?
I don't think we should be afraid to call MarkBufferDirty() instead of
going through these (fairly stupid) hasham-specific APIs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jacana hung after failing to acquire random number

2016-12-12 Thread Andrew Dunstan




On 12/12/2016 09:02 AM, Heikki Linnakangas wrote:

On 12/12/2016 03:40 PM, Andrew Dunstan wrote:

Or should we at least get pg_regress to try to shut down the
postmaster if it can't connect after 120 seconds?


I think that makes a lot of sense, independently of this random stuff.





I will add it to my long TODO list.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

On Mon, Dec 12, 2016 at 2:30 PM, Peter Geoghegan  wrote:
> On Mon, Dec 12, 2016 at 9:31 AM, Robert Haas  wrote:
>> I think this patch might have a bug.  In the existing code,
>> tuplesort_gettupleslot sets should_free = true if it isn't already
>> just before calling ExecStoreMinimalTuple((MinimalTuple) stup.tuple,
>> slot, should_free), so it seems that ExecStoreMinimalTuple() will
>> always get "true" as the fourth argument. However the patch changes
>> that line of code like this:
>>
>> +ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, false);
>>
>> So the patch seems to have the effect of changing the fourth argument
>> to this call to ExecStoreMinimalTuple() from always-true to
>> always-false.  I might be missing something, but my guess is that's
>> not right.
>
> There was a memory leak added by 0001-*, but then fixed by 0002-*. I
> should have done more testing of 0001-* alone. Oops.
>
> Attached revision of 0001-* fixes this. A revised 0002-* is also
> attached, just as a convenience for reviewers (they won't need to
> resolve the conflict themselves).

Committed 0001.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Nested Wait Events?

On Mon, Dec 12, 2016 at 2:42 PM, Simon Riggs  wrote:
> There's too many "I"s in that para. I've not presented this as a
> defect, nor is there any reason to believe this post is aimed at you
> personally.

Well, actually, there is.  You said in your original post that
something was "not correct" and something else was "not handled".
That sounds like a description of a defect to me.  If that wasn't how
you meant it, fine.

> I'm letting Hackers know that I've come across two problems and I see
> more. I'm good with accepting reduced scope in return for performance,
> but we should be allowed to discuss what limitations that imposes
> without rancour.

I'm not mad.  I thought you were.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Add support for temporary replication slots

Fujii Masao  writes:
> On Mon, Dec 12, 2016 at 11:16 PM, Peter Eisentraut  wrote:
>> Add support for temporary replication slots

> Doesn't this need catversion bump?

Yes, absolutely, because of the ABI break for the affected functions.
Pushed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Nested Wait Events?

2016-12-12 Thread Simon Riggs

On 12 December 2016 at 18:05, Robert Haas  wrote:
> On Mon, Dec 12, 2016 at 12:16 PM, Simon Riggs  wrote:
>> On 12 December 2016 at 16:52, Robert Haas  wrote:
>>> On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs  wrote:
 Last week I noticed that the Wait Event/Locks system doesn't correctly
 describe waits for tuple locks because in some cases that happens in
 two stages.
>>>
>>> Well, I replied to that email to say that I didn't agree with your
>>> analysis.  I think if something happens in two stages, those wait
>>> events should be distinguished.  The whole point here is to get
>>> clarity on what the system is waiting for, and we lose that if we
>>> start trying to merge together things which are at the code level
>>> separate.
>>
>> Clarity is what we are both looking for then.
>
> Granted.
>
>> I know I am waiting for a tuple lock. You want information about all
>> the lower levels. I'm good with that as long as the lower information
>> is somehow recorded against the higher level task, which it wouldn't
>> be in either of the cases I mention, hence why I bring it up again.
>
> So, I think that this may be a case where I built an apple and you are
> complaining that it's not an orange.  I had very clearly in mind from
> the beginning of the wait event work that we were trying to expose
> low-level information about what the system was doing, and I advocated
> for this design as a way of doing that, I think, reasonably well.  The
> statement that you want information about what is going on at a higher
> level is fair, but IMHO it's NOT fair to present that as a defect in
> what's been committed.  It was never intended to do that, at least not
> by me, and I committed all of the relevant patches and had a fair
> amount of involvement with the design.  You may think I should have
> been trying to solve a different problem and you may even be right,
> but that is a separate issue from how well I did at solving the
> problem I was attempting to solve.

There's too many "I"s in that para. I've not presented this as a
defect, nor is there any reason to believe this post is aimed at you
personally.

I'm letting Hackers know that I've come across two problems and I see
more. I'm good with accepting reduced scope in return for performance,
but we should be allowed to discuss what limitations that imposes
without rancour.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2016-12-12 Thread Peter Geoghegan

On Mon, Dec 12, 2016 at 9:31 AM, Robert Haas  wrote:
> I think this patch might have a bug.  In the existing code,
> tuplesort_gettupleslot sets should_free = true if it isn't already
> just before calling ExecStoreMinimalTuple((MinimalTuple) stup.tuple,
> slot, should_free), so it seems that ExecStoreMinimalTuple() will
> always get "true" as the fourth argument. However the patch changes
> that line of code like this:
>
> +ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, false);
>
> So the patch seems to have the effect of changing the fourth argument
> to this call to ExecStoreMinimalTuple() from always-true to
> always-false.  I might be missing something, but my guess is that's
> not right.

There was a memory leak added by 0001-*, but then fixed by 0002-*. I
should have done more testing of 0001-* alone. Oops.

Attached revision of 0001-* fixes this. A revised 0002-* is also
attached, just as a convenience for reviewers (they won't need to
resolve the conflict themselves).

-- 
Peter Geoghegan
From 541a7f0ae6060763cd4448359159c1a2c5980a68 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 13 Oct 2016 10:54:31 -0700
Subject: [PATCH 2/3] Avoid copying within tuplesort_gettupleslot()

Add a "copy" argument to make it optional to receive a copy of caller
tuple that is safe to use across tuplesort_gettupleslot() calls, as a
performance optimization.  Existing tuplesort_gettupleslot() callers are
made to opt out of copying where that has been determined to be safe.
This brings tuplesort_gettupleslot() in line with
tuplestore_gettupleslot().

In the future, a "copy" tuplesort_getdatum() argument may be added, that
similarly allows callers to opt out of receiving a new copy of tuple
under their direct ownership.
---
 src/backend/executor/nodeAgg.c |  9 ++---
 src/backend/executor/nodeSort.c|  5 +++--
 src/backend/utils/adt/orderedsetaggs.c |  5 +++--
 src/backend/utils/sort/tuplesort.c | 17 +++--
 src/include/utils/tuplesort.h  |  2 +-
 5 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index eefb3d6..16a1263 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -570,6 +570,9 @@ initialize_phase(AggState *aggstate, int newphase)
  * Fetch a tuple from either the outer plan (for phase 0) or from the sorter
  * populated by the previous phase.  Copy it to the sorter for the next phase
  * if any.
+ *
+ * Callers cannot rely on memory for tuple in returned slot remaining allocated
+ * past any subsequent call here.  (The sorter may recycle the memory.)
  */
 static TupleTableSlot *
 fetch_input_tuple(AggState *aggstate)
@@ -578,8 +581,8 @@ fetch_input_tuple(AggState *aggstate)
 
 	if (aggstate->sort_in)
 	{
-		if (!tuplesort_gettupleslot(aggstate->sort_in, true, aggstate->sort_slot,
-	NULL))
+		if (!tuplesort_gettupleslot(aggstate->sort_in, true, false,
+	aggstate->sort_slot, NULL))
 			return NULL;
 		slot = aggstate->sort_slot;
 	}
@@ -1273,7 +1276,7 @@ process_ordered_aggregate_multi(AggState *aggstate,
 		ExecClearTuple(slot2);
 
 	while (tuplesort_gettupleslot(pertrans->sortstates[aggstate->current_set],
-  true, slot1, &newAbbrevVal))
+  true, true, slot1, &newAbbrevVal))
 	{
 		/*
 		 * Extract the first numTransInputs columns as datums to pass to the
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index a34dcc5..8fc7106 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -132,12 +132,13 @@ ExecSort(SortState *node)
 
 	/*
 	 * Get the first or next tuple from tuplesort. Returns NULL if no more
-	 * tuples.
+	 * tuples. Note that we rely on memory for tuple in slot remaining
+	 * allocated only until subsequent call here.
 	 */
 	slot = node->ss.ps.ps_ResultTupleSlot;
 	(void) tuplesort_gettupleslot(tuplesortstate,
   ScanDirectionIsForward(dir),
-  slot, NULL);
+  false, slot, NULL);
 	return slot;
 }
 
diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index fe44d56..3c23329 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -1189,7 +1189,7 @@ hypothetical_rank_common(FunctionCallInfo fcinfo, int flag,
 	tuplesort_performsort(osastate->sortstate);
 
 	/* iterate till we find the hypothetical row */
-	while (tuplesort_gettupleslot(osastate->sortstate, true, slot, NULL))
+	while (tuplesort_gettupleslot(osastate->sortstate, true, true, slot, NULL))
 	{
 		bool		isnull;
 		Datum		d = slot_getattr(slot, nargs + 1, &isnull);
@@ -1352,7 +1352,8 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 	slot2 = extraslot;
 
 	/* iterate till we find the hypothetical row */
-	while (tuplesort_gettupleslot(osastate->sortstate, true, slot, &abbrevVal))
+	while (tuplesort_gettupleslot(osastate->sortstate, true, true, slot,
+  &abbrev

Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner  wrote:

> As you can see, this generated a serialization failure.

That was on 9.6.  On earlier versions it does indeed allow the
transaction on connection 2 to commit, yielding a non-serializable
result.  This makes a pretty strong case for back-patching this
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fcff8a575198478023ada8a48e13b50f70054766

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typo in doc/src/sgml/catalogs.sgml

On Sun, Dec 11, 2016 at 5:13 AM, Joel Jacobson  wrote:
> Hi,
>
> I found a minor typo at
> https://www.postgresql.org/docs/9.6/static/catalog-pg-am.html.
>
> pg_catalog.pg_am. amhandler is of type "oid" according to the
> documentation, but it's actually of type "regproc" in reality.
>
> Compare with e.g. pg_aggregate where the columns of type "regproc" is
> both "regproc" in the documentation and in reality.

Thanks, committed and back-patched to 9.6.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal : Parallel Merge Join

On Sat, Dec 10, 2016 at 7:59 AM, Dilip Kumar  wrote:
> I would like to propose a patch for parallelizing merge join path.
> This idea is derived by analyzing TPCH results.
>
> I have done this analysis along with my colleagues Rafia sabih and Amit 
> kaplia.
>
> Currently we already have infrastructure for executing parallel join,
> so we don't need any change at executor level. Only in path
> generation, we need to add partial paths for merge join, like we do
> for nest loop and hash join.

Hmm, so it seems my initial guess that we didn't need to bother
generating such paths was wrong.  Oops.

This patch is hard to read because it is reorganizing a bunch of code
as well as adding new functionality.  Perhaps you could separate it
into two patches, one with the refactoring and then the other with the
new functionality.  Actually, though, I don't understand why you need
so much rearrangement

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Mon, Dec 12, 2016 at 8:45 AM, Ian Jackson  wrote:

> AIUI the documented behavour is that "every set of successful
> transactions is serialisable".

Well, in context that is referring to serializable transactions.
No such guarantee is provided for other isolation levels.

By the way, existing behavior should be sufficient to prevent
serialization anomalies from becoming manifest in the database;
where it is less than ideal is that it is hard to tell from the
SQLSTATE on a failure whether a retry is sensible.  It would be
nice to provide the additional functionality, but database is
performing as intended and (as far as I know) as documented.  If
the documentation on this is not clear, I'd be happy to help get it
fixed, but barring any deficiency there, this is a feature request,
not a bug report.

> But, consider the following scenario.
>
> [example]

> I have just tried this and got this result:
>
> [nonsensical results]

I didn't.  First, I got this when I tried to start the concurrent
transactions using the example as provided:

test=#   SELECT count(*) FROM t WHERE k=1;   -- save value
ERROR:  operator does not exist: text = integer
LINE 1: SELECT count(*) FROM t WHERE k=1;
  ^
HINT:  No operator matches the given name and argument type(s). You
might need to add explicit type casts.

That is as it should be.  There is no equality comparison operator
supported for text on one side and integer on the other.  There
would be no principled way to determine the correct result of
comparing '2' and '15' or of comparing '01' and '1'.  It kinda
raises a question of what you are running that did *not* generate
this error.  What version with what modifications are you running?

So, I modified it so that it *should* run, set
default_transaction_isolation = 'serializable' on both connections,
and got this:

*** CONNECTION 1 ***
test=# CREATE OR REPLACE FUNCTION demo(nk TEXT, c INTEGER) RETURNS INTEGER AS $$
test$# BEGIN
test$#   BEGIN
test$# INSERT INTO t (k,v) VALUES (nk, -1);
test$#   EXCEPTION WHEN unique_violation THEN
test$# INSERT INTO c (k,v) VALUES (nk, c);
test$#   END;
test$#   RETURN 0;
test$# END;
test$# $$ LANGUAGE plpgsql;
CREATE FUNCTION
test=#
test=# DROP TABLE IF EXISTS t;
DROP TABLE
test=# DROP TABLE IF EXISTS c;
DROP TABLE
test=#
test=# CREATE TABLE t (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
CREATE TABLE
test=# CREATE TABLE c (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
CREATE TABLE
test=#
test=# BEGIN;
BEGIN
test=# SELECT count(*) FROM t WHERE k = '1';  -- save value
 count
---
 0
(1 row)

test=#-- sleep to ensure conflict

*** CONNECTION 2 ***
test=# BEGIN;
BEGIN
test=# SELECT count(*) FROM t WHERE k = '1';  -- save value
 count
---
 0
(1 row)

test=#-- sleep to ensure conflict

*** CONNECTION 1 ***
test=# SELECT demo('1', 0);   -- using value from SELECT
 demo
--
0
(1 row)

test=#

*** CONNECTION 2 ***
test=# SELECT demo('1', 0);   -- using value from SELECT
*** CONNECTION 2 blocks ***

*** CONNECTION 1 ***
test=# COMMIT;
COMMIT
test=#

*** CONNECTION 2 unblocks and outputs ***
ERROR:  could not serialize access due to read/write dependencies
among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.
CONTEXT:  SQL statement "INSERT INTO t (k,v) VALUES (nk, -1)"
PL/pgSQL function demo(text,integer) line 4 at SQL statement
test=#

As you can see, this generated a serialization failure.  I decided
to do what an application should, and retry the transaction.

*** CONNECTION 2 ***
test=# ROLLBACK;
ROLLBACK
test=# BEGIN;
BEGIN
test=# SELECT count(*) FROM t WHERE k = '1';  -- save value
 count
---
 1
(1 row)

test=# SELECT demo('1', 1);   -- using value from SELECT
 demo
--
0
(1 row)

test=# COMMIT;
COMMIT
test=# SELECT * FROM t;
 k | v
---+
 1 | -1
(1 row)

test=# SELECT * FROM c;
 k | v
---+---
 1 | 1
(1 row)

test=#

If you have some way to cause a set of concurrent serializable
transactions to generate results from those transactions which
commit which is not consistent with some one-at-a-time order of
execution, I would be very interested in seeing the test case.
The above, however, is not it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Nested Wait Events?

On Mon, Dec 12, 2016 at 12:16 PM, Simon Riggs  wrote:
> On 12 December 2016 at 16:52, Robert Haas  wrote:
>> On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs  wrote:
>>> Last week I noticed that the Wait Event/Locks system doesn't correctly
>>> describe waits for tuple locks because in some cases that happens in
>>> two stages.
>>
>> Well, I replied to that email to say that I didn't agree with your
>> analysis.  I think if something happens in two stages, those wait
>> events should be distinguished.  The whole point here is to get
>> clarity on what the system is waiting for, and we lose that if we
>> start trying to merge together things which are at the code level
>> separate.
>
> Clarity is what we are both looking for then.

Granted.

> I know I am waiting for a tuple lock. You want information about all
> the lower levels. I'm good with that as long as the lower information
> is somehow recorded against the higher level task, which it wouldn't
> be in either of the cases I mention, hence why I bring it up again.

So, I think that this may be a case where I built an apple and you are
complaining that it's not an orange.  I had very clearly in mind from
the beginning of the wait event work that we were trying to expose
low-level information about what the system was doing, and I advocated
for this design as a way of doing that, I think, reasonably well.  The
statement that you want information about what is going on at a higher
level is fair, but IMHO it's NOT fair to present that as a defect in
what's been committed.  It was never intended to do that, at least not
by me, and I committed all of the relevant patches and had a fair
amount of involvement with the design.  You may think I should have
been trying to solve a different problem and you may even be right,
but that is a separate issue from how well I did at solving the
problem I was attempting to solve.

There was quite a lot of discussion 9-12 months ago (IIRC) about
wanting additional detail to be associated with wait events.  From
what I understand, Oracle will not only report that it waited for a
block to be read but also tells you for which block it was waiting,
and some of the folks at Postgres Pro were advocating for the wait
event facility to do something similar.  I strongly resisted that kind
of additional detail, because what makes the current system fast and
low-impact, and therefore able to be on by default, is that all it
does is one unsynchronized 4-byte write into shared memory.  If we do
anything more than that -- say 8 bytes, let alone the extra 20 bytes
we'd need to store a relfilenode -- we're going to need to insert
memory barriers in the path that updates the data in order to make
sure that it can be read without tearing, and I'm afraid that's going
to have a noticeable performance impact.  Certainly, we'd need to
check into that very carefully before doing it.  Operations like
reading a block or blocking on an LWLock are heavier than a couple of
memory barriers, but they're not necessarily so much heavier that we
can afford to throw extra memory barriers in those paths without any
impact.

Now, some of what you want to do here may be able to be done without
making wait_event_info any wider than uint32, and to the extent that's
possible without too much contortion I am fine with it.  If you want
to know that a tuple lock was being sought for an update rather than a
delete, that could probably be exposed.  But if you want to know WHICH
tuple or even WHICH relation was affected, this mechanism isn't
well-suited to that task.  I think we may well want to add some new
mechanism that reports those sorts of things, but THIS mechanism
doesn't have the bit-space for it and isn't designed to do it.  It's
designed to give basic information and be so cheap that we can use it
practically everywhere.  For more detailed reporting, we should
probably have facilities that are not turned on by default, or else
facilities that are limited to cases where the volume can never be
very high.  You don't have to add a lot of overhead to cause a problem
in a code path that executes tens of thousands of times per second per
backend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-12-12 Thread Dmitry Ivanov

Huh, this code is broken as well. We have to ignore partitions that don't 
have any subpartitions. Patch is attached below (v2).



--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Companydiff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 219d380..6555c7c 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -950,7 +950,8 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode,
 			   *parted_rels;
 	ListCell   *lc;
 	int			i,
-k;
+k,
+partitioned_children_so_far = 0;
 
 	/*
 	 * Lock partitions and make a list of the partitioned ones to prepare
@@ -1001,7 +1002,7 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode,
 		PartitionKey partkey = RelationGetPartitionKey(partrel);
 		PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
 		int			j,
-	m;
+	my_partitioned_children;
 
 		pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
 		pd[i]->reldesc = partrel;
@@ -1010,7 +1011,7 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode,
 		pd[i]->partdesc = partdesc;
 		pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
 
-		m = 0;
+		my_partitioned_children = 0;
 		for (j = 0; j < partdesc->nparts; j++)
 		{
 			Oid			partrelid = partdesc->oids[j];
@@ -1026,11 +1027,21 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode,
  * We can assign indexes this way because of the way
  * parted_rels has been generated.
  */
-pd[i]->indexes[j] = -(i + 1 + m);
-m++;
+pd[i]->indexes[j] = -(1 +
+	  my_partitioned_children +
+	  partitioned_children_so_far);
+
+my_partitioned_children++;
 			}
 		}
 		i++;
+
+		/*
+		 * Children of this parent should be placed after all
+		 * partitioned children of all previous parents, so we
+		 * have to take this into account.
+		 */
+		partitioned_children_so_far += my_partitioned_children;
 	}
 
 	return pd;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

On Mon, Dec 12, 2016 at 10:33 AM, Andres Freund  wrote:
> On 2016-12-12 13:26:32 -0300, Alvaro Herrera wrote:
>> Robert Haas wrote:

>>> 1. Show all processes that have a PGPROC in pg_stat_activity,

>>> 2. Add a second view, say pg_stat_system_activity,

>> I vote 1.
>
> +1

+1

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

On Fri, Dec 9, 2016 at 5:59 PM, Peter Geoghegan  wrote:
> Attached patch 0001-* removes all should_free arguments. To reiterate,
> this is purely a refactoring patch.

I think this patch might have a bug.  In the existing code,
tuplesort_gettupleslot sets should_free = true if it isn't already
just before calling ExecStoreMinimalTuple((MinimalTuple) stup.tuple,
slot, should_free), so it seems that ExecStoreMinimalTuple() will
always get "true" as the fourth argument. However the patch changes
that line of code like this:

+ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, false);

So the patch seems to have the effect of changing the fourth argument
to this call to ExecStoreMinimalTuple() from always-true to
always-false.  I might be missing something, but my guess is that's
not right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_background contrib module proposal

2016-12-12 Thread Andrew Borodin

Hi!

Just in case you'd like to include sleepsort as a test, here it is
wrapped as a regression test(see attachment). But it has serious
downside: it runs no less than 5 seconds.

Also I'll list here every parallelism feature I managed to imagine. It
is not to say that I suggest having some of these features at v1. We
certainly have to start somewhere, this list is just for information
purposes.
1. Poolable workers. Just to be sure you can reliably queue your task
somewhere without having "increase max_connections" hint.
2. Inside one pool, you can have task chaining. After competition of
task A (query A) start task B, in case of failure start task C. Task C
is followed by task D.
3. Reliably read task status: running\awaiting\completed\errored\in a
lock. Get a query plan of a task? (I know, there are reasons why it is
not possible now)
4. Query as a future (actually it is implemented already by
pg_background_result)
5. Promised result. Declare that you are waiting for data of specific
format, execute a query, someone (from another backend) will
eventually place a data to promised result and your query continues
execution.
6. Cancelation: a way to signal to background query that it's time to
quit gracefully.

Best regards, Andrey Borodin.
diff --git a/contrib/pg_background/Makefile b/contrib/pg_background/Makefile
index c4e717d..085fbff 100644
--- a/contrib/pg_background/Makefile
+++ b/contrib/pg_background/Makefile
@@ -6,6 +6,8 @@ OBJS = pg_background.o
 EXTENSION = pg_background
 DATA = pg_background--1.0.sql
 
+REGRESS = pg_background
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/pg_background/expected/pg_background.out 
b/contrib/pg_background/expected/pg_background.out
new file mode 100644
index 000..dbc344e
--- /dev/null
+++ b/contrib/pg_background/expected/pg_background.out
@@ -0,0 +1,26 @@
+CREATE EXTENSION pg_background;
+--run 5 workers which wait about 1 second
+CREATE TABLE input as SELECT x FROM generate_series(1,5,1) x ORDER BY x DESC;
+CREATE TABLE output(place int,value int);
+CREATE sequence s start 1;
+CREATE TABLE handles as SELECT pg_background_launch('select pg_sleep('||x||'); 
insert into output values (nextval(''s''),'||x||');') h FROM input;
+SELECT (SELECT * FROM pg_background_result(h) as (x text) limit 1) FROM 
handles;
+x 
+--
+ SELECT 1
+ SELECT 1
+ SELECT 1
+ SELECT 1
+ SELECT 1
+(5 rows)
+
+SELECT * FROM output ORDER BY place;
+ place | value 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+ 4 | 4
+ 5 | 5
+(5 rows)
+
diff --git a/contrib/pg_background/sql/pg_background.sql 
b/contrib/pg_background/sql/pg_background.sql
new file mode 100644
index 000..d7cbd44
--- /dev/null
+++ b/contrib/pg_background/sql/pg_background.sql
@@ -0,0 +1,9 @@
+CREATE EXTENSION pg_background;
+
+--run 5 workers which wait about 1 second
+CREATE TABLE input as SELECT x FROM generate_series(1,5,1) x ORDER BY x DESC;
+CREATE TABLE output(place int,value int);
+CREATE sequence s start 1;
+CREATE TABLE handles as SELECT pg_background_launch('select pg_sleep('||x||'); 
insert into output values (nextval(''s''),'||x||');') h FROM input;
+SELECT (SELECT * FROM pg_background_result(h) as (x text) limit 1) FROM 
handles;
+SELECT * FROM output ORDER BY place;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-12-12 Thread Dmitry Ivanov


Hi guys,

Looks like I've just encountered a bug. Please excuse me for the messy 
email, I don't have much time at the moment.



Here's the test case:

create table test(val int) partition by range (val);
create table test_1 partition of test for values from (1) to (1000) 
partition by range(val);
create table test_2 partition of test for values from (1000) to (2000) 
partition by range(val);
create table test_1_1 partition of test_1 for values from (1) to (500) 
partition by range(val);
create table test_1_2 partition of test_1 for values from (500) to (1000) 
partition by range(val);

create table test_1_1_1 partition of test_1_1 for values from (1) to (500);
create table test_1_2_1 partition of test_1_2 for values from (500) to 
(1000);



/* insert a row into "test_1_2_1" */
insert into test values(600);


/* what we EXPECT to see */
select *, tableoid::regclass from test;
val |  tableoid  
-+

600 | test_1_2_1
(1 row)


/* what we ACTUALLY see */
insert into test values(600);
ERROR:  no partition of relation "test_1_1" found for row
DETAIL:  Failing row contains (600).


How does this happen? This is how "PartitionDispatch" array looks like:

test | test_1 | test_2 | test_1_1 | test_1_2

which means that this code (partition.c : 1025):


/*
* We can assign indexes this way because of the way
* parted_rels has been generated.
*/
pd[i]->indexes[j] = -(i + 1 + m);


doesn't work, since partitions are not always placed right after the parent 
(implied by index "m").


We have to take into account the total amount of partitions we've 
encountered so far (right before index "i").


I've attached a patch with a hotfix, but the code looks so-so and has a 
smell. I think it must be rewritten. This bug hunt surely took a while: I 
had to recheck all of the steps several times.



--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Companydiff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 219d380..119a41d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -950,7 +950,8 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode,
 			   *parted_rels;
 	ListCell   *lc;
 	int			i,
-k;
+k,
+children_so_far = 0;
 
 	/*
 	 * Lock partitions and make a list of the partitioned ones to prepare
@@ -1026,11 +1027,18 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode,
  * We can assign indexes this way because of the way
  * parted_rels has been generated.
  */
-pd[i]->indexes[j] = -(i + 1 + m);
+pd[i]->indexes[j] = -(1 + m + children_so_far);
 m++;
 			}
 		}
 		i++;
+
+		/*
+		 * Children of this parent are placed
+		 * after all children of all previous parents,
+		 * so we have to take this into account.
+		 */
+		children_so_far += partdesc->nparts;
 	}
 
 	return pd;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Nested Wait Events?

2016-12-12 Thread Simon Riggs

On 12 December 2016 at 16:52, Robert Haas  wrote:
> On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs  wrote:
>> Last week I noticed that the Wait Event/Locks system doesn't correctly
>> describe waits for tuple locks because in some cases that happens in
>> two stages.
>
> Well, I replied to that email to say that I didn't agree with your
> analysis.  I think if something happens in two stages, those wait
> events should be distinguished.  The whole point here is to get
> clarity on what the system is waiting for, and we lose that if we
> start trying to merge together things which are at the code level
> separate.

Clarity is what we are both looking for then.

I know I am waiting for a tuple lock. You want information about all
the lower levels. I'm good with that as long as the lower information
is somehow recorded against the higher level task, which it wouldn't
be in either of the cases I mention, hence why I bring it up again.

Same thing occurs in any case where we wait for multiple lwlocks.

"I had to buy a mop so I could clean the toilets" is potentially
important information, but I would prefer to start at the intention
side. So that "cleaning the toilets" shows up as the intent, which
might consist of multiple sub-tasks. We can then investigate why
sometimes cleaning the toilet takes one flush and other times it
involves a shopping trip to get a mop. If "mop purchase" is not
correctly associated with cleaning then we don't notice what is going
on and cannot do anything useful with the info.

Regrettably, it's an accounting problem not a database problem and we
need a chart of accounts hierarchy to solve it. (e.g. bill of
materials).

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] new table partitioning breaks \d table to older versions

On Fri, Dec 9, 2016 at 1:18 PM, Fabrízio de Royes Mello
 wrote:
> On Fri, Dec 9, 2016 at 3:59 PM, Jeff Janes  wrote:
>> Since:
>> commit f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63
>> Author: Robert Haas 
>> Date:   Wed Dec 7 13:17:43 2016 -0500
>>
>> Implement table partitioning.
>>
>> If I use psql compiled from 10devel to connect to a 9.6.1 server, then \d
>> fails:
>>
>> psql (10devel-f0e4475, server 9.6.1-16e7c02)
>> Type "help" for help.
>>
>>
>> # \d pgbench_accounts
>> ERROR:  column c.relpartbound does not exist
>> LINE 1: ...ELECT inhparent::pg_catalog.regclass, pg_get_expr(c.relpartb...
>>
>
> Looks like they forgot to adjust version check number in describe.c code.
>
> Attched patch fix it.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Nested Wait Events?

On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs  wrote:
> Last week I noticed that the Wait Event/Locks system doesn't correctly
> describe waits for tuple locks because in some cases that happens in
> two stages.

Well, I replied to that email to say that I didn't agree with your
analysis.  I think if something happens in two stages, those wait
events should be distinguished.  The whole point here is to get
clarity on what the system is waiting for, and we lose that if we
start trying to merge together things which are at the code level
separate.

> Now I notice that the Wait Event system doesn't handle waiting for
> recovery conflicts at all, though it does access ProcArrayLock
> multiple times.

This isn't a very clear statement.  Every place in the system that can
provoke a wait on a latch or a process semaphore display some kind of
wait event in pg_stat_activity.  Some of those displays may not be as
clear or detailed as you would like and that's fine, but saying they
are not handled is not exactly true.

> Don't have a concrete proposal, but I think we need a more complex
> model for how we record wait event data. Something that separates
> intention (e.g. "Travelling to St.Ives") from current event (e.g.
> "Waiting for LWLock")

That's not a bad thought.  We need to be careful to keep this very
lightweight so that it doesn't affect performance, but the general
concept of separating intention from current event might have some
legs.  We just need to be careful that it doesn't involve into
something that involves a lot of complicated bookkeeping, because
these wait events can occur very frequently and in hot code-paths.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

On Mon, Dec 12, 2016 at 11:19 AM, Robert Haas  wrote:
> So, one of the problems in this patch as committed is that for any
> process that doesn't show up in pg_stat_activity, there's no way to
> see the wait event information.  That sucks.  I think there are
> basically two ways to fix this:
>
> 1. Show all processes that have a PGPROC in pg_stat_activity,
> including auxiliary processes and whatnot, and use some new field in
> pg_stat_activity to indicate the process type.
>
> 2. Add a second view, say pg_stat_system_activity, to show the
> processes that don't appear in pg_stat_activity.  A bunch of columns
> could likely be omitted, but there would be some duplication, too.
>
> Preferences?

And now I'm noticing that Michael Paquier previously started a thread
on this problem which I failed to note before starting this one:

http://postgr.es/m/CAB7nPqSYN05rGsYCTahxTz+2hBikh7=m+hr2JTXaZv_Ei=q...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2016-12-12 Thread David G. Johnston

On Mon, Dec 12, 2016 at 9:19 AM, Robert Haas  wrote:

> So, one of the problems in this patch as committed is that for any
> process that doesn't show up in pg_stat_activity, there's no way to
> see the wait event information.  That sucks.  I think there are
> basically two ways to fix this:
>
> 1. Show all processes that have a PGPROC in pg_stat_activity,
> including auxiliary processes and whatnot, and use some new field in
> pg_stat_activity to indicate the process type.
>
> 2. Add a second view, say pg_stat_system_activity, to show the
> processes that don't appear in pg_stat_activity.  A bunch of columns
> could likely be omitted, but there would be some duplication, too.
>
> Preferences?
>
>
I'm inclined toward option 2.

A view over both that involves just the shared columns would give you the
benefits from option 1.

Question: is there a parent-child relationship involved here?  Given a
record in today's pg_stat_activity is it useful/possible to link in all of
the pg_stat_system_activity process that are working to fulfill the
client-initiated task?

Even with "system" we'd probably want to distinguish between background
workers and true system maintenance processes.  Or am I mis-interpreting
the scope of this feature?

David J.

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

On 2016-12-12 13:26:32 -0300, Alvaro Herrera wrote:
> Robert Haas wrote:
> 
> > So, one of the problems in this patch as committed is that for any
> > process that doesn't show up in pg_stat_activity, there's no way to
> > see the wait event information.  That sucks.  I think there are
> > basically two ways to fix this:
> > 
> > 1. Show all processes that have a PGPROC in pg_stat_activity,
> > including auxiliary processes and whatnot, and use some new field in
> > pg_stat_activity to indicate the process type.
> > 
> > 2. Add a second view, say pg_stat_system_activity, to show the
> > processes that don't appear in pg_stat_activity.  A bunch of columns
> > could likely be omitted, but there would be some duplication, too.
> > 
> > Preferences?
> 
> I vote 1.

+1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Nested Wait Events?

2016-12-12 Thread Simon Riggs

Last week I noticed that the Wait Event/Locks system doesn't correctly
describe waits for tuple locks because in some cases that happens in
two stages.

Now I notice that the Wait Event system doesn't handle waiting for
recovery conflicts at all, though it does access ProcArrayLock
multiple times.

In both cases I tried to fix the problem before mentioning it here.

We can't add waits for either of those in a simple way because the
current system doesn't allow us to report multiple levels of wait. In
both these cases there is a single "top level wait" i.e. tuple locking
or recovery conflicts, even if there are other waits that form part of
the total wait.

I'm guessing that there are other situations like this also.

Don't have a concrete proposal, but I think we need a more complex
model for how we record wait event data. Something that separates
intention (e.g. "Travelling to St.Ives") from current event (e.g.
"Waiting for LWLock")

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

Alvaro Herrera  writes:
> Robert Haas wrote:
>> So, one of the problems in this patch as committed is that for any
>> process that doesn't show up in pg_stat_activity, there's no way to
>> see the wait event information.  That sucks.  I think there are
>> basically two ways to fix this:
>> 
>> 1. Show all processes that have a PGPROC in pg_stat_activity,
>> including auxiliary processes and whatnot, and use some new field in
>> pg_stat_activity to indicate the process type.
>> 
>> 2. Add a second view, say pg_stat_system_activity, to show the
>> processes that don't appear in pg_stat_activity.  A bunch of columns
>> could likely be omitted, but there would be some duplication, too.
>> 
>> Preferences?

> I vote 1.

If we go with #2, there would immediately be a need for a union view,
which would end up being exactly the same thing as the expanded display
proposed in #1.  Seems like the hard way, so I agree with Alvaro.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2016-12-12 Thread Alvaro Herrera

Robert Haas wrote:

> So, one of the problems in this patch as committed is that for any
> process that doesn't show up in pg_stat_activity, there's no way to
> see the wait event information.  That sucks.  I think there are
> basically two ways to fix this:
> 
> 1. Show all processes that have a PGPROC in pg_stat_activity,
> including auxiliary processes and whatnot, and use some new field in
> pg_stat_activity to indicate the process type.
> 
> 2. Add a second view, say pg_stat_system_activity, to show the
> processes that don't appear in pg_stat_activity.  A bunch of columns
> could likely be omitted, but there would be some duplication, too.
> 
> Preferences?

I vote 1.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-12 Thread Masahiko Sawada

On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao  wrote:
> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada  
> wrote:
>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila  wrote:
>>> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada  
>>> wrote:
 On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
  wrote:
> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas  wrote:
>> You could do that, but first I would code up the simplest, cleanest
>> algorithm you can think of and see if it even shows up in a 'perf'
>> profile.  Microbenchmarking is probably overkill here unless a problem
>> is visible on macrobenchmarks.
>
> This is what I would go for! The current code is doing a simple thing:
> select the Nth element using qsort() after scanning each WAL sender's
> values. And I think that Sawada-san got it right. Even running on my
> laptop a pgbench run with 10 sync standbys using a data set that fits
> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
> using perf top on a non-assert, non-debug build. Hash tables and
> allocations get a far larger share. Using the patch,
> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
> nodes. Let's kick the ball for now. An extra patch could make things
> better later on if that's worth it.

 Yeah, since the both K and N could be not large these algorithm takes
 almost the same time. And current patch does simple thing. When we
 need over 100 or 1000 replication node the optimization could be
 required.
 Attached latest v9 patch.

>>>
>>> Few comments:
>>
>> Thank you for reviewing.
>>
>>> + * In 10.0 we support two synchronization methods, priority and
>>> + * quorum. The number of synchronous standbys that transactions
>>> + * must wait for replies from and synchronization method are specified
>>> + * in synchronous_standby_names. This parameter also specifies a list
>>> + * of standby names, which determines the priority of each standby for
>>> + * being chosen as a synchronous standby. In priority method, the standbys
>>> + * whose names appear earlier in the list are given higher priority
>>> + * and will be considered as synchronous. Other standby servers appearing
>>> + * later in this list represent potential synchronous standbys. If any of
>>> + * the current synchronous standbys disconnects for whatever reason,
>>> + * it will be replaced immediately with the next-highest-priority standby.
>>> + * In quorum method, the all standbys appearing in the list are
>>> + * considered as a candidate for quorum commit.
>>>
>>> In the above description, is priority method represented by FIRST and
>>> quorum method by ANY in the synchronous_standby_names syntax?  If so,
>>> it might be better to write about it explicitly.
>>
>> Added description.
>>
>>>
>>> 2.
>>>  --- a/src/backend/utils/sort/tuplesort.c
>>> +++ b/src/backend/utils/sort/tuplesort.c
>>> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state)
>>>   }
>>>
>>>   /*
>>> - * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
>>> - * from each input tape.
>>> - */
>>> - state->memtupsize = numInputTapes;
>>> - state->memtuples = (SortTuple *) palloc(numInputTapes * 
>>> sizeof(SortTuple));
>>> - USEMEM(state, GetMemoryChunkSpace(state->memtuples));
>>> -
>>> - /*
>>> - * Use all the remaining memory we have available for read buffers among
>>> - * the input tapes.
>>> + * Use all the spare memory we have available for read buffers among the
>>> + * input tapes.
>>>
>>> This doesn't belong to this patch.
>>
>> Oops, fixed.
>>
>>> 3.
>>> + * Return the list of sync standbys using according to synchronous method,
>>>
>>> In above sentence, "using according" seems to either incomplete or wrong 
>>> usage.
>>
>> Fixed.
>>
>>> 4.
>>> + else
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> + "invalid synchronization method is specified \"%d\"",
>>> + SyncRepConfig->sync_method));
>>>
>>> Here, the error message doesn't seem to aligned and you might want to
>>> use errmsg for the same.
>>>
>>
>> Fixed.
>>
>>> 5.
>>> + * In priority method, we need the oldest these positions among sync
>>> + * standbys. In quorum method, we need the newest these positions
>>> + * specified by SyncRepConfig->num_sync.
>>>
>>> /oldest these/oldest of these
>>> /newest these positions specified/newest of these positions as specified
>>
>> Fixed.
>>
>>> Instead of newest, can we consider to use latest?
>>
>> Yeah, I changed it so.
>>
>>
>>> 6.
>>> + int sync_method; /* synchronization method */
>>>   /* member_names contains nmembers consecutive nul-terminated C strings */
>>>   char member_names[FLEXIBLE_ARRAY_MEMBER];
>>>  } SyncRepConfigData;
>>>
>>> Can't we use 1 or 2 bytes to store sync_method information?
>>
>> I changed it to uint8.
>>
>> Attached latest v10 patch incorporated the review comments so far.
>> Please review it.
>
> Thanks for updating the patch!
>
> Do we need to

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

On Tue, Oct 4, 2016 at 11:59 AM, Robert Haas  wrote:
> On Mon, Oct 3, 2016 at 8:43 PM, Michael Paquier
>  wrote:
>> The rest looks good to me. Thanks for the feedback and the time!
>
> Thanks for the fixes.  I committed this with an additional compile
> fix, but the buildfarm turned up a few more problems that my 'make
> check-world' didn't find.  Hopefully those are fixed now, but we'll
> see.

So, one of the problems in this patch as committed is that for any
process that doesn't show up in pg_stat_activity, there's no way to
see the wait event information.  That sucks.  I think there are
basically two ways to fix this:

1. Show all processes that have a PGPROC in pg_stat_activity,
including auxiliary processes and whatnot, and use some new field in
pg_stat_activity to indicate the process type.

2. Add a second view, say pg_stat_system_activity, to show the
processes that don't appear in pg_stat_activity.  A bunch of columns
could likely be omitted, but there would be some duplication, too.

Preferences?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jacana hung after failing to acquire random number

Heikki Linnakangas  writes:
> On 12/12/2016 03:40 PM, Andrew Dunstan wrote:
>> Should one or more of these errors be fatal? Or should we at least get
>> pg_regress to try to shut down the postmaster if it can't connect after
>> 120 seconds?

> Making it fatal, i.e. bringing down the server, doesn't seem like an 
> improvement. If the failure is transient, you don't want to kill the 
> whole server, when one connection attempt fails.

> It would be nice to fail earlier if it's permanently failing, though. 
> Like, if someone does "rm /dev/urandom". Perhaps we should perform one 
> pg_strong_random() call at postmaster startup, and if that fails, refuse 
> to start up.

That's sort of contradictory.  If you're worried about transient failures,
allowing a single failed try to cause postmaster startup failure isn't the
way to make things more robust.  Giving up after a bunch of failed tries
over a very short interval isn't much better.

I'm not sure how hard we need to work here.  The case at hand seems
to be one of simply not having gotten the bugs out of the initial
implementation, so maybe we shouldn't read too much into it.

I do agree that the buildfarm needs to be more robust against broken
postmasters, because finding bugs is its raison d' etre.  But I'm not
convinced that it's a good idea to have the postmaster itself conclude
that there's something wrong with its configured random-number source.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] background sessions

On Mon, Dec 12, 2016 at 10:02 AM, Craig Ringer
 wrote:
> On 12 Dec. 2016 21:55, "Robert Haas"  wrote:
> On Sun, Dec 11, 2016 at 5:38 AM, Andrew Borodin 
> wrote:
>> 1. As far as I can see, we connot use COPY FROM STDIN in bg session?
>> Since one of purposes is to orchestrate transactions, may be that
>> would be valuable.
>
> A background worker has no client connection, so what would COPY FROM STDIN
> do?
>
> It doesn't make sense. But a bgworker may well want to supply input to COPY.
> A COPY FROM CALLBACK of COPY FROM FILEDESC or whatever.

That's kinda weird, though.  I mean, you don't need to go through all
of the COPY code just to call heap_multi_insert() or whatever, do you?
 You can hand-roll whatever you need there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-12-12 Thread Tomas Vondra


On 12/12/2016 07:37 AM, Amit Langote wrote:


Hi Tomas,

On 2016/12/12 10:02, Tomas Vondra wrote:


2) I'm wondering whether having 'table' in the catalog name (and also in
the new relkind) is too limiting. I assume we'll have partitioned indexes
one day, for example - do we expect to use the same catalogs?


I am not sure I understand your idea of partitioned indexes, but I doubt
it would require entries in the catalog under consideration.  Could you
perhaps elaborate more?



OK, let me elaborate. Let's say we have a partitioned table, and I want 
to create an index. The index may be either "global" i.e. creating a 
single relation for data from all the partitions, or "local" (i.e. 
partitioned the same way as the table).


Local indexes are easier to implement (it's essentially what we have 
now, except that we need to create the indexes manually for each 
partition), and don't work particularly well for some use cases (e.g. 
unique constraints). This is what I mean by "partitioned indexes".


If the index is partitioned just like the table, we probably won't need 
to copy the partition key info (so, nothing in pg_partitioned_table).
I'm not sure it makes sense to partition the index differently than the 
table - I don't see a case where that would be useful.


The global indexes would work better for the unique constraint use case, 
but it clearly contradicts our idea of TID (no information about which 
partition that references).


So maybe the catalog really only needs to track info about tables? Not 
sure. I'm just saying it'd be unfortunate to have _table in the name, 
and end up using it for indexes too.





4) I see GetIndexOpClass() got renamed to ResolveOpClass(). I find the new
name rather strange - all other similar functions start with "Get", so I
guess "GetOpClass()" would be better. But I wonder if the rename was even
necessary, considering that it still deals with index operator classes
(although now also in context of partition keys). If the rename really is
needed, isn't that a sign that the function does not belong to indexcmds.c
anymore?


The fact that both index keys and partition keys have very similar
specification syntax means there would be some common code handling the
same, of which this is one (maybe, only) example.  I didn't see much point
in finding a new place for the function, although I can see why it may be
a bit confusing to future readers of the code.

About the new name - seeing the top line in the old header comment, what
the function does is resolve a possibly explicitly specified operator
class name to an operator class OID.  I hadn't changed the name in my
original patch, but Robert suggested the rename, which I thought made sense.



OK, I don't particularly like the name but I can live with that.


5) Half the error messages use 'child table' while the other half uses
'partition'. I think we should be using 'partition' as child tables really
have little meaning outside inheritance (which is kinda hidden behind the
new partitioning stuff).


One way to go about it may be to check all sites that can possibly report
an error involving child tables (aka "partitions") whether they know from
the context which name to use.  I think it's possible, because we have
access to the parent relation in all such sites and looking at the relkind
can tell whether to call child tables "partitions".



Clearly, this is a consequence of building the partitioning on top of 
inheritance (not objecting to that approach, merely stating a fact).


I'm fine with whatever makes the error messages more consistent, if it 
does not make the code significantly more complex. It's a bit confusing 
when some use 'child tables' and others 'partitions'. I suspect even a 
single DML command may return a mix of those, depending on where exactly 
it fails (old vs. new code).



6) The psql tab-completion seems a bit broken, because it only offers the
partitions, not the parent table. Which is usually exactly the opposite of
what the user wants.


Actually, I have not implemented any support for tab-completion for the
new DDL.  I will work on that.



Aha! So the problem probably is that psql does not recognize the new 
'partitioned table' relkind, and only looks at regular tables.



testing
---



2) When doing a SELECT COUNT(*) from the partitioned table, I get a plan
like this:

  QUERY PLAN
---
 Finalize Aggregate  (cost=124523.64..124523.65 rows=1 width=8)
   ->  Gather  (cost=124523.53..124523.64 rows=1 width=8)
 Workers Planned: 1
 ->  Partial Aggregate  (cost=123523.53..123523.54 rows=1 ...)
   ->  Append  (cost=0.00..108823.53 rows=5880001 width=0)
 ->  Parallel Seq Scan on parent ...
 ->  Parallel Seq Scan on partition_1 ...
 ->  Parallel Seq Scan on partition_2 ...
 ->  Parallel Seq

Re: [HACKERS] background sessions

On 12 Dec. 2016 21:55, "Robert Haas"  wrote:

On Sun, Dec 11, 2016 at 5:38 AM, Andrew Borodin 
wrote:
> 1. As far as I can see, we connot use COPY FROM STDIN in bg session?
> Since one of purposes is to orchestrate transactions, may be that
> would be valuable.

A background worker has no client connection, so what would COPY FROM STDIN
do?

It doesn't make sense. But a bgworker may well want to supply input to
COPY. A COPY FROM CALLBACK of COPY FROM FILEDESC or whatever.

I have the feeling something like this came up on the logical replication
thread. Logical rep needs to efficiently load data via bgworker.

Re: [HACKERS] jsonb problematic operators

On 12 Dec. 2016 22:22, "Merlin Moncure"  wrote:

 If we really wanted to fix this, maybe the right way
to think about the problem is a highly reduced character set and a
pre-processor or an extension.


I'm pretty OK with expecting client drivers not to be stupid and offer
escape syntax for their placeholders. I was kind of astonished the jdbc
spec doesn't given how hard it works to be compatible. I'd assumed the ??
was jdbc standard not a postgres extension until I looked into it.

Certainly PDO needs a patch for its postgres driver if not (preferably) to
support escaping at a higher level.

Users can work around it by defining their own operator aliases + opclasses
or using function forms.

Doesn't seem worth too much drama though it sucks for usability.

Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

2016-12-12 Thread Ian Jackson

Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry 
on constraint violation"):
> If I recall correctly, the constraints for which there can be
> errors appearing due to concurrent transactions are primary key,
> unique, and foreign key constraints.  I don't remember seeing it
> happen, but it would not surprise me if an exclusion constraint can
> also cause an error due to a concurrent transaction's interaction
> with the transaction receiving the error.

Is it not in principle also possible to contrive a situation where
some set of (suitably weird) transactions will generate almost any
error, from the outer transaction ?

This is at the very least possible using pgsql's in-sql
exception-trapping facilities.  Such a construction might, in
principle, generate any error which can be conditionally generated at
query runtime.

ISTM that depending on the implementation arrangements (which I
frankly don't understand at all) there may be other constructions
which would give "impossible" answers.

Actually, now I come to think of it, the fact that pgsql has an in-sql
exception trapping facility means that the current situation is
clearly an actual bug, in the sense that the behaviour is contrary to
the documentation.  (And contrary to any reasonable thing that could
be written in the documentation.)

AIUI the documented behavour is that "every set of successful
transactions is serialisable".


But, consider the following scenario.

Context:

  CREATE OR REPLACE FUNCTION demo(nk TEXT, c INTEGER) RETURNS INTEGER AS $$
  BEGIN
BEGIN
  INSERT INTO t (k,v) VALUES (nk, -1);
EXCEPTION WHEN unique_violation THEN
  INSERT INTO c (k,v) VALUES (nk, c);
END;
RETURN 0;
  END;
  $$ LANGUAGE plpgsql;

  DROP TABLE IF EXISTS t;
  DROP TABLE IF EXISTS c;

  CREATE TABLE t (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
  CREATE TABLE c (k TEXT PRIMARY KEY, v INTEGER NOT NULL);

Two identical transactions:

  BEGIN;
  SELECT count(*) FROM t WHERE k=1;   -- save value
  -- sleep to ensure conflict
  SELECT demo(1, ?);  -- using value from SELECT
  COMMIT;

I have just tried this and got this result:

  osstestdb_test_iwj=> select * from t;
   k | v  
  ---+
   1 | -1
  (1 row)

  osstestdb_test_iwj=> select * from c;
   k | v 
  ---+---
   1 | 0
  (1 row)

  osstestdb_test_iwj=> 

The row ('1',0) in table c is clearly wrong.  No rows with v=0 could
ever be inserted into c by this SQL code, unless the other
transaction is somehow interfering in the middle.

The perl program I used is below.  `csreadconfig' does nothing of
particular interest except obtaining the db connnection as $dbh_tests.

Ian.


#!/usr/bin/perl -w

use strict qw(vars);
use Osstest;
use Data::Dumper;
use Osstest::Executive;

csreadconfig();

if (!@ARGV) {
$dbh_tests->do(<<'END');
  CREATE OR REPLACE FUNCTION demo(nk TEXT, c INTEGER) RETURNS INTEGER AS $$
  BEGIN
BEGIN
  INSERT INTO t (k,v) VALUES (nk, -1);
EXCEPTION WHEN unique_violation THEN
  INSERT INTO c (k,v) VALUES (nk, c);
END;
RETURN 0;
  END;
  $$ LANGUAGE plpgsql;

  DROP TABLE IF EXISTS t;
  DROP TABLE IF EXISTS c;

  CREATE TABLE t (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
  CREATE TABLE c (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
END
exit 0;
}

our ($k,$v) = @ARGV;

$dbh_tests->begin_work;
$dbh_tests->do("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");

our ($conflictors)= $dbh_tests->selectrow_array(

Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol


A few couple more things that caught my eye while hacking on this:

1. We don't use SASLPrep to scrub username's and passwords. That's by 
choice, for usernames, because historically in PostgreSQL usernames can 
be stored in any encoding, but SASLPrep assumes UTF-8. We dodge that by 
passing an empty username in the authentication exchange anyway, because 
we always use the username we got from the startup packet. But for 
passwords, I think we need to fix that. The spec is very clear on that:



Note that implementations MUST either implement SASLprep or disallow
use of non US-ASCII Unicode codepoints in "str".



2. I think we should check nonces, etc. more carefully, to not contain 
invalid characters. For example, in the server, we use the 
read_attr_value() function to read the client's nonce. Per the spec, the 
nonce should consist of ASCII printable characters, but we will accept 
anything except the comma. That's no trouble to the server, but let's be 
strict.



To summarize, here's the overall TODO list so far:

* Use SASLPrep for passwords.

* Check nonces, etc. to not contain invalid characters.

* Derive mock SCRAM verifier for non-existent users deterministically 
from username.


* Allow plain 'password' authentication for users with a SCRAM verifier 
in rolpassword.


* Throw an error if an "authorization identity" is given. ATM, we just 
ignore it, but seems better to reject the attempt than do something that 
might not be what the client expects.


* Add "scram-sha-256" prefix to SCRAM verifiers stored in 
pg_authid.rolpassword.


Anything else I'm missing?

I've created a wiki page, mostly to host that TODO list, while we hack 
this to completion: 
https://wiki.postgresql.org/wiki/SCRAM_authentication. Feel free to add 
stuff that comes to mind, and remove stuff as you push patches to the 
branch on github.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Add support for temporary replication slots

2016-12-12 Thread Fujii Masao

On Mon, Dec 12, 2016 at 11:16 PM, Peter Eisentraut  wrote:
> Add support for temporary replication slots
>
> This allows creating temporary replication slots that are removed
> automatically at the end of the session or on error.
>
> From: Petr Jelinek 
>
> Branch
> --
> master
>
> Details
> ---
> http://git.postgresql.org/pg/commitdiff/a924c327e2793d2025b19e18de7917110dc8afd8
>
> Modified Files
> --
> contrib/test_decoding/Makefile  |  2 +-
> contrib/test_decoding/expected/ddl.out  |  4 +-
> contrib/test_decoding/expected/slot.out | 58 +++
> contrib/test_decoding/sql/slot.sql  | 20 ++
> doc/src/sgml/func.sgml  | 16 ++--
> doc/src/sgml/protocol.sgml  | 13 ++-
> src/backend/catalog/system_views.sql| 11 ++
> src/backend/replication/repl_gram.y | 22 +++
> src/backend/replication/repl_scanner.l  |  1 +
> src/backend/replication/slot.c  | 69 ++---
> src/backend/replication/slotfuncs.c | 24 
> src/backend/replication/walsender.c | 28 +++--
> src/backend/storage/lmgr/proc.c |  3 ++
> src/backend/tcop/postgres.c |  3 ++
> src/include/catalog/pg_proc.h   |  6 +--
> src/include/nodes/replnodes.h   |  1 +
> src/include/replication/slot.h  |  4 +-
> src/test/regress/expected/rules.out |  3 +-
> 18 files changed, 237 insertions(+), 51 deletions(-)

Doesn't this need catversion bump?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb problematic operators

2016-12-12 Thread Merlin Moncure

On Sun, Dec 11, 2016 at 10:59 PM, Craig Ringer  wrote:
> PgJDBC allows you to write ??, which is ugly, but tolerable, since the
> JDBC spec doesn't have an escape syntax for it.

This is the core problem; *JDBC* is busted.  SQL reserves words but
not punctuation marks so any assumption by client side code that
characters are not going to be interpreted by the server are going to
cause problems.  As noted earlier ":" is equally problematic as that
is hibernate's parameter marker and hibernate is probably in even
greater usage than naked JDBC in the java community.

Imagine trying to embed, say, perl, in java and reserving the very
same punctuation marks and then complaining to the perl community that
their language is broken due to usage overlap...that's what's
happening here.  If we really wanted to fix this, maybe the right way
to think about the problem is a highly reduced character set and a
pre-processor or an extension.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical Replication WIP

2016-12-12 Thread Peter Eisentraut

On 12/8/16 4:10 PM, Petr Jelinek wrote:
> On 08/12/16 20:16, Peter Eisentraut wrote:
>> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
 I think that the removal of changes to ReplicationSlotAcquire() that you
 did will result in making it impossible to reacquire temporary slot once
 you switched to different one in the session as the if (active_pid != 0)
 will always be true for temp slot.
>>>
>>> I see.  I suppose it's difficult to get a test case for this.
>>
>> I created a test case, saw the error of my ways, and added your code
>> back in.  Patch attached.
>>
> 
> Hi,
> 
> I am happy with this version, thanks for moving it forward.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

2016-12-12 Thread Peter Eisentraut

On 12/7/16 1:20 PM, Robert Haas wrote:
> I've committed 0001 - 0006 with that correction and a few other
> adjustments.  There's plenty of room for improvement here, and almost
> certainly some straight-up bugs too, but I think we're at a point
> where it will be easier and less error-prone to commit follow on
> changes incrementally rather than by continuously re-reviewing a very
> large patch set for increasingly smaller changes.

This page
,
which is found via the index entry for "Partitioning", should be updated
for the new functionality.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jacana hung after failing to acquire random number


On 12/12/2016 03:40 PM, Andrew Dunstan wrote:

Or should we at least get pg_regress to try to shut down the
postmaster if it can't connect after 120 seconds?


I think that makes a lot of sense, independently of this random stuff.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jacana hung after failing to acquire random number

On 12/12/2016 03:40 PM, Andrew Dunstan wrote:

On 12/12/2016 02:32 AM, Heikki Linnakangas wrote:

On 12/12/2016 05:58 AM, Michael Paquier wrote:

On Sun, Dec 11, 2016 at 9:06 AM, Andrew Dunstan
wrote:

jascana (mingw, 64 bit compiler, no openssl) is currently hung on "make
check". After starting the autovacuum launcher there are 120
messages on its
log about "Could not acquire random number". Then nothing.

So I suspect the problem here is commit
fe0a0b5993dfe24e4b3bcf52fa64ff41a444b8f1, although I haven't looked in
detail.

Shouldn't we want the postmaster to shut down if it's not going to go
further? Note that frogmouth, also mingw, which builds with openssl,
doesn't
have this issue.

Did you unlock it in some way at the end? Here is the shape of the
report for others:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-12-10%2022%3A00%3A15

And here is of course the interesting bit:
2016-12-10 17:25:38.822 EST [584c80e2.ddc:2] LOG: could not acquire
random number
2016-12-10 17:25:39.869 EST [584c80e2.ddc:3] LOG: could not acquire
random number
2016-12-10 17:25:40.916 EST [584c80e2.ddc:4] LOG: could not acquire
random number

I am not seeing any problems with MSVC without openssl, so that's a
problem proper to MinGW. I am getting to wonder if it is actually a
good idea to cache the crypt context and then re-use it. Using a new
context all the time is definitely not performance-wise though.

Actually, looking at the config.log on jacana, it's trying to use
/dev/urandom:

configure:15028: checking for /dev/urandom
configure:15041: result: yes
configure:15054: checking which random number source to use
configure:15073: result: /dev/urandom

And looking closer at configure.in, I can see why:

elif test "$PORTNAME" = x"win32" ; then
USE_WIN32_RANDOM=1

That test is broken. It looks like the x"$VAR" = x"constant" idiom,
but the left side of the comparison doesn't have the 'x'. Oops.

Fixed that, let's see if it made jacana happy again.

This makes me wonder if we should work a bit harder to get a good
error message, if acquiring a random number fails for any reason. This
needs to work in the frontend as well backend, but we could still have
an elog(LOG, ...) there, inside an #ifndef FRONTEND block.

I see you have now improved the messages in postmaster.c, which is good.

Well, I only wordsmithed them a bit, it still doesn't give much clue on
why it failed. We should add more details to it.

However, the bigger problem (ISTM) is that when this failed I had a
system which was running but where every connection immediately failed:

== creating temporary instance==
== initializing database system ==
== starting postmaster==

pg_regress: postmaster did not respond within 120 seconds
Examine
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/log/postmaster.log
for the reason
make: *** [check] Error 2

Should one or more of these errors be fatal? Or should we at least get
pg_regress to try to shut down the postmaster if it can't connect after
120 seconds?

Making it fatal, i.e. bringing down the server, doesn't seem like an
improvement. If the failure is transient, you don't want to kill the
whole server, when one connection attempt fails.

It would be nice to fail earlier if it's permanently failing, though.
Like, if someone does "rm /dev/urandom". Perhaps we should perform one
pg_strong_random() call at postmaster startup, and if that fails, refuse
to start up.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix for segfault in plpython's exception handling

2016-12-12 Thread Rafa de la Torre

For the record: I tested the patch by Tom Lane in our setup (python 
2.7.3-0ubuntu3.8) and works like a charm.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9cda81f0056ca488dbd6cded64db1238aed816b2

Also in 9.5 and 9.6 series. My request in commitfest queue can be closed. 
Cheers!
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] background sessions

On Sun, Dec 11, 2016 at 5:38 AM, Andrew Borodin  wrote:
> 1. As far as I can see, we connot use COPY FROM STDIN in bg session?
> Since one of purposes is to orchestrate transactions, may be that
> would be valuable.

A background worker has no client connection, so what would COPY FROM STDIN do?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] Fix for documentation of timestamp type

2016-12-12 Thread Aleksander Alekseev

Hello.

Currently doc/src/sgml/datatype.sgml states:

```
When timestamp values are stored as eight-byte integers
(currently the default), microsecond precision is available over 
the full range of values. When timestamp values are
stored as double precision floating-point numbers instead (a
deprecated compile-time option), the effective limit of precision
might be less than 6. timestamp values are stored as
seconds before or after midnight 2000-01-01. [...]
```

It gives a wrong impression that by default timestamp is stored as a
number of seconds after midnight 2000-01-01 in a eight-byte integer. In
fact timestamp is stored in MICROseconds, not seconds. For instance,
2016-12-12 16:03:14.643886 is represented as number 534873794643886:

```
$ echo "select relfilenode from pg_class where relname = 'tst';" | psql
 relfilenode 
-
   16431
(1 row)

$ find /path/to/data/dir -type f -name 16431
[...]

$ hexdump -C path/to/found/table/segment
  00 00 00 00 08 13 10 03  00 00 00 00 1c 00 e0 1f
0010  00 20 04 20 00 00 00 00  e0 9f 40 00 00 00 00 00
0020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
1fe0  3c 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00
1ff0  01 00 01 00 00 09 18 00  ae 87 87 02 77 e6 01 00

$ python
>>> "{:x}".format(534873794643886)
'1e677028787ae'
```

'ae 87 87 02 77 e6 01 00' is exactly what is physically stored on disk.
You can calculate current year from number 534873794643886 like this:

```
>>> int(2000 + 534873794643886 / 1000 / 1000 / 60 / 60 / 24 / 365.2425)
2016
```

I suggest to rewrite the documentation a bit to make it more clear that
by default timestamp is stored in microseconds. Corresponding patch is
attached.

-- 
Best regards,
Aleksander Alekseev
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 67d0c34..01a8492 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1654,10 +1654,11 @@ SELECT E'\\xDEADBEEF';
 the full range of values. When timestamp values are
 stored as double precision floating-point numbers instead (a
 deprecated compile-time option), the effective limit of precision
-might be less than 6. timestamp values are stored as
-seconds before or after midnight 2000-01-01.  When
+might be less than 6. By default timestamp values are
+storead as microseconds before or after midnight 2000-01-01.  When
 timestamp values are implemented using floating-point
-numbers, microsecond precision is achieved for dates within a few
+numbers, values are storead as number of seconds. In this case
+microsecond precision is achieved for dates within a few
 years of 2000-01-01, but the precision degrades for dates further
 away. Note that using floating-point datetimes allows a larger
 range of timestamp values to be represented than


signature.asc
Description: PGP signature

Re: [HACKERS] jacana hung after failing to acquire random number

2016-12-12 Thread Andrew Dunstan

On 12/12/2016 02:32 AM, Heikki Linnakangas wrote:

On 12/12/2016 05:58 AM, Michael Paquier wrote:
On Sun, Dec 11, 2016 at 9:06 AM, Andrew Dunstan
wrote:

jascana (mingw, 64 bit compiler, no openssl) is currently hung on "make
check". After starting the autovacuum launcher there are 120
messages on its

log about "Could not acquire random number". Then nothing.

So I suspect the problem here is commit
fe0a0b5993dfe24e4b3bcf52fa64ff41a444b8f1, although I haven't looked in
detail.

Shouldn't we want the postmaster to shut down if it's not going to go
further? Note that frogmouth, also mingw, which builds with openssl,
doesn't

have this issue.

Did you unlock it in some way at the end? Here is the shape of the
report for others:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-12-10%2022%3A00%3A15

Actually, looking at the config.log on jacana, it's trying to use
/dev/urandom:

configure:15028: checking for /dev/urandom
configure:15041: result: yes
configure:15054: checking which random number source to use
configure:15073: result: /dev/urandom

And looking closer at configure.in, I can see why:

elif test "$PORTNAME" = x"win32" ; then
USE_WIN32_RANDOM=1

That test is broken. It looks like the x"$VAR" = x"constant" idiom,
but the left side of the comparison doesn't have the 'x'. Oops.

Fixed that, let's see if it made jacana happy again.

I see you have now improved the messages in postmaster.c, which is good.

However, the bigger problem (ISTM) is that when this failed I had a
system which was running but where every connection immediately failed:

== creating temporary instance==
== initializing database system ==
== starting postmaster==

pg_regress: postmaster did not respond within 120 seconds
Examine
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/log/postmaster.log
for the reason
make: *** [check] Error 2

Should one or more of these errors be fatal? Or should we at least get
pg_regress to try to shut down the postmaster if it can't connect after
120 seconds?

[In answer to Michael's question above, I forcibly shut down the
postmaster by hand. Otherwise it would still be running, and we would
not have got the report on the buildfarm server.]

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb problematic operators

2016-12-12 Thread Greg Stark

On 12 December 2016 at 04:59, Craig Ringer  wrote:
> I didn't realise Pg's use of ? was that old, so thanks. That makes
> offering alternatives much less appealing.

One option might be for Postgres to define duplicate operator names
using ¿ or something else. I think ¿ is a good choice because it's a
common punctuation mark in spanish so it's probably not hard to find
on a lot of keyboards or hard to find instructions on how to type one.

There is always a risk in allowing redundant syntaxes though. For
example people running grep to find all uses of an operator will miss
the alternate spelling. There may even be security implications for
that though to be honest that seems unlikely in this case.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgcrypto compilation error due to stack-allocated EVP_CIPHER_CTX

On Mon, Dec 12, 2016 at 6:17 PM, Heikki Linnakangas  wrote:
> Removed that, did some further cosmetic changes, and pushed. I renamed a
> bunch variables and structs, so that they are more consistent with the
> similar digest stuff.

That definitely makes sense this way, thanks for the commit.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-12 Thread Fujii Masao

On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada  wrote:
> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila  wrote:
>> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada  
>> wrote:
>>> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
>>>  wrote:
 On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas  wrote:
> You could do that, but first I would code up the simplest, cleanest
> algorithm you can think of and see if it even shows up in a 'perf'
> profile.  Microbenchmarking is probably overkill here unless a problem
> is visible on macrobenchmarks.

 This is what I would go for! The current code is doing a simple thing:
 select the Nth element using qsort() after scanning each WAL sender's
 values. And I think that Sawada-san got it right. Even running on my
 laptop a pgbench run with 10 sync standbys using a data set that fits
 into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
 using perf top on a non-assert, non-debug build. Hash tables and
 allocations get a far larger share. Using the patch,
 SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
 nodes. Let's kick the ball for now. An extra patch could make things
 better later on if that's worth it.
>>>
>>> Yeah, since the both K and N could be not large these algorithm takes
>>> almost the same time. And current patch does simple thing. When we
>>> need over 100 or 1000 replication node the optimization could be
>>> required.
>>> Attached latest v9 patch.
>>>
>>
>> Few comments:
>
> Thank you for reviewing.
>
>> + * In 10.0 we support two synchronization methods, priority and
>> + * quorum. The number of synchronous standbys that transactions
>> + * must wait for replies from and synchronization method are specified
>> + * in synchronous_standby_names. This parameter also specifies a list
>> + * of standby names, which determines the priority of each standby for
>> + * being chosen as a synchronous standby. In priority method, the standbys
>> + * whose names appear earlier in the list are given higher priority
>> + * and will be considered as synchronous. Other standby servers appearing
>> + * later in this list represent potential synchronous standbys. If any of
>> + * the current synchronous standbys disconnects for whatever reason,
>> + * it will be replaced immediately with the next-highest-priority standby.
>> + * In quorum method, the all standbys appearing in the list are
>> + * considered as a candidate for quorum commit.
>>
>> In the above description, is priority method represented by FIRST and
>> quorum method by ANY in the synchronous_standby_names syntax?  If so,
>> it might be better to write about it explicitly.
>
> Added description.
>
>>
>> 2.
>>  --- a/src/backend/utils/sort/tuplesort.c
>> +++ b/src/backend/utils/sort/tuplesort.c
>> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state)
>>   }
>>
>>   /*
>> - * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
>> - * from each input tape.
>> - */
>> - state->memtupsize = numInputTapes;
>> - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
>> - USEMEM(state, GetMemoryChunkSpace(state->memtuples));
>> -
>> - /*
>> - * Use all the remaining memory we have available for read buffers among
>> - * the input tapes.
>> + * Use all the spare memory we have available for read buffers among the
>> + * input tapes.
>>
>> This doesn't belong to this patch.
>
> Oops, fixed.
>
>> 3.
>> + * Return the list of sync standbys using according to synchronous method,
>>
>> In above sentence, "using according" seems to either incomplete or wrong 
>> usage.
>
> Fixed.
>
>> 4.
>> + else
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + "invalid synchronization method is specified \"%d\"",
>> + SyncRepConfig->sync_method));
>>
>> Here, the error message doesn't seem to aligned and you might want to
>> use errmsg for the same.
>>
>
> Fixed.
>
>> 5.
>> + * In priority method, we need the oldest these positions among sync
>> + * standbys. In quorum method, we need the newest these positions
>> + * specified by SyncRepConfig->num_sync.
>>
>> /oldest these/oldest of these
>> /newest these positions specified/newest of these positions as specified
>
> Fixed.
>
>> Instead of newest, can we consider to use latest?
>
> Yeah, I changed it so.
>
>
>> 6.
>> + int sync_method; /* synchronization method */
>>   /* member_names contains nmembers consecutive nul-terminated C strings */
>>   char member_names[FLEXIBLE_ARRAY_MEMBER];
>>  } SyncRepConfigData;
>>
>> Can't we use 1 or 2 bytes to store sync_method information?
>
> I changed it to uint8.
>
> Attached latest v10 patch incorporated the review comments so far.
> Please review it.

Thanks for updating the patch!

Do we need to update postgresql.conf.sample?

+{any_ident}{
+yylval.str = pstrdup(yytext);
+return ANY;
+}
+{first_ident}{
+yylval.str = pstrdup(

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-12 Thread Masahiko Sawada

On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila  wrote:
> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada  wrote:
>> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
>>  wrote:
>>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas  wrote:
 You could do that, but first I would code up the simplest, cleanest
 algorithm you can think of and see if it even shows up in a 'perf'
 profile.  Microbenchmarking is probably overkill here unless a problem
 is visible on macrobenchmarks.
>>>
>>> This is what I would go for! The current code is doing a simple thing:
>>> select the Nth element using qsort() after scanning each WAL sender's
>>> values. And I think that Sawada-san got it right. Even running on my
>>> laptop a pgbench run with 10 sync standbys using a data set that fits
>>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
>>> using perf top on a non-assert, non-debug build. Hash tables and
>>> allocations get a far larger share. Using the patch,
>>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
>>> nodes. Let's kick the ball for now. An extra patch could make things
>>> better later on if that's worth it.
>>
>> Yeah, since the both K and N could be not large these algorithm takes
>> almost the same time. And current patch does simple thing. When we
>> need over 100 or 1000 replication node the optimization could be
>> required.
>> Attached latest v9 patch.
>>
>
> Few comments:

Thank you for reviewing.

> + * In 10.0 we support two synchronization methods, priority and
> + * quorum. The number of synchronous standbys that transactions
> + * must wait for replies from and synchronization method are specified
> + * in synchronous_standby_names. This parameter also specifies a list
> + * of standby names, which determines the priority of each standby for
> + * being chosen as a synchronous standby. In priority method, the standbys
> + * whose names appear earlier in the list are given higher priority
> + * and will be considered as synchronous. Other standby servers appearing
> + * later in this list represent potential synchronous standbys. If any of
> + * the current synchronous standbys disconnects for whatever reason,
> + * it will be replaced immediately with the next-highest-priority standby.
> + * In quorum method, the all standbys appearing in the list are
> + * considered as a candidate for quorum commit.
>
> In the above description, is priority method represented by FIRST and
> quorum method by ANY in the synchronous_standby_names syntax?  If so,
> it might be better to write about it explicitly.

Added description.

>
> 2.
>  --- a/src/backend/utils/sort/tuplesort.c
> +++ b/src/backend/utils/sort/tuplesort.c
> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state)
>   }
>
>   /*
> - * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
> - * from each input tape.
> - */
> - state->memtupsize = numInputTapes;
> - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
> - USEMEM(state, GetMemoryChunkSpace(state->memtuples));
> -
> - /*
> - * Use all the remaining memory we have available for read buffers among
> - * the input tapes.
> + * Use all the spare memory we have available for read buffers among the
> + * input tapes.
>
> This doesn't belong to this patch.

Oops, fixed.

> 3.
> + * Return the list of sync standbys using according to synchronous method,
>
> In above sentence, "using according" seems to either incomplete or wrong 
> usage.

Fixed.

> 4.
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + "invalid synchronization method is specified \"%d\"",
> + SyncRepConfig->sync_method));
>
> Here, the error message doesn't seem to aligned and you might want to
> use errmsg for the same.
>

Fixed.

> 5.
> + * In priority method, we need the oldest these positions among sync
> + * standbys. In quorum method, we need the newest these positions
> + * specified by SyncRepConfig->num_sync.
>
> /oldest these/oldest of these
> /newest these positions specified/newest of these positions as specified

Fixed.

> Instead of newest, can we consider to use latest?

Yeah, I changed it so.


> 6.
> + int sync_method; /* synchronization method */
>   /* member_names contains nmembers consecutive nul-terminated C strings */
>   char member_names[FLEXIBLE_ARRAY_MEMBER];
>  } SyncRepConfigData;
>
> Can't we use 1 or 2 bytes to store sync_method information?

I changed it to uint8.

Attached latest v10 patch incorporated the review comments so far.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0fc4e57..bc67a99 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3054,42 +3054,75 @@ include_dir 'conf.d'
 transactions waiting for commit will be allowed to proceed after
 these standby servers confirm receipt of their da

Re: [HACKERS] multivariate statistics (v19)

2016-12-12 Thread Amit Langote


Hi Tomas,

On 2016/10/30 4:23, Tomas Vondra wrote:
> Hi,
> 
> Attached is v20 of the multivariate statistics patch series, doing mostly
> the changes outlined in the preceding e-mail from October 11.
> 
> The patch series currently has these parts:
> 
> * 0001 : (FIX) teach pull_varno about RestrictInfo
> * 0002 : (PATCH) shared infrastructure and ndistinct coefficients
> * 0003 : (PATCH) functional dependencies (only the ANALYZE part)
> * 0004 : (PATCH) selectivity estimation using functional dependencies
> * 0005 : (PATCH) multivariate MCV lists
> * 0006 : (PATCH) multivariate histograms
> * 0007 : (WIP) selectivity estimation using ndistinct coefficients
> * 0008 : (WIP) use multiple statistics for estimation
> * 0009 : (WIP) psql tab completion basics

Unfortunately, this failed to compile because of the duplicate_oids error.
Partitioning patch consumed same OIDs as used in this patch.

I will try to read the patches in some more detail, but in the meantime,
here are some comments/nitpicks on the documentation:

No updates to doc/src/sgml/catalogs.sgml?

+  
+   The examples presented in  used
+   statistics about individual columns to compute selectivity estimates.
+   When estimating conditions on multiple columns, the planner assumes
+   independence and multiplies the selectivities. When the columns are
+   correlated, the independence assumption is violated, and the estimates
+   may be seriously off, resulting in poor plan choices.
+  

The term independence is used in isolation - independence of what?
Independence of the distributions of values in separate columns?  Also,
the phrase "seriously off" could perhaps be replaced by more rigorous
terminology; it might be unclear to some readers.  Perhaps: wildly
inaccurate, :)

+
+EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1;
+   QUERY PLAN
+-
+ Seq Scan on t  (cost=0.00..170.00 rows=100 width=8) (actual
time=0.031..2.870 rows=100 loops=1)
+   Filter: (a = 1)
+   Rows Removed by Filter: 9900
+ Planning time: 0.092 ms
+ Execution time: 3.103 ms

Is there a reason why examples in "67.2. Multivariate Statistics" (like
the one above) use EXPLAIN ANALYZE, whereas those in "67.1. Row Estimation
Examples" (also, other relevant chapters) uses just EXPLAIN.

+   the final 0.01% estimate. The plan however shows that this results in
+   a significant under-estimate, as the actual number of rows matching the

s/under-estimate/underestimate/g

+  
+   For additional details about multivariate statistics, see
+   src/backend/utils/mvstats/README.statsc. There are additional
+   README for each type of statistics, mentioned in the following
+   sections.
+  

Referring to source tree READMEs seems novel around this portion of the
documentation, but I think not too far away, there are some references.
This is under the VII. Internals chapter anyway, so that might be OK.

In any case, s/README.statsc/README.stats/g

Also, s/additional README/additional READMEs/g  (tags omitted for brevity)

+used in definitions of database normal forms. When simplified, saying
that
+b is functionally dependent on a means that

Maybe, s/When simplified/In simple terms/g

+In normalized databases, only functional dependencies on primary keys
+and super keys are allowed. In practice however many data sets are not
+fully normalized, for example thanks to intentional denormalization for
+performance reasons. The table t is an example of a data
+with functional dependencies. As a=b for all rows in the
+table, a is functionally dependent on b and
+b is functionally dependent on a.

"super keys" sounds like a new term.

s/for example thanks to/for example, thanks to/g  (or due to instead of
thanks to)

How about: s/an example of a data with/an example of a schema with/g

Perhaps, s/a=b/a = b/g  (additional white space)

+Similarly to per-column statistics, multivariate statistics are stored in

I notice that "similar to" is used more often than "similarly to".  But
that might be OK.

+ This shows that the statistics is defined on table t,

Perhaps: the statistics is -> the statistics are or the statistic is

+ lists attnums of the columns (references
+ pg_attribute).

While this text may be OK on the catalog description page, it might be
better to expand attnums here as "attribute numbers" dropping the
parenthesized phrase altogether.

+
+SELECT pg_mv_stats_dependencies_show(stadeps)
+  FROM pg_mv_statistic WHERE staname = 's1';
+
+ pg_mv_stats_dependencies_show
+---
+ (1) => 2, (2) => 1
+(1 row)
+

Couldn't this somehow show actual column names, instead of attribute numbers?

Will read more later.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol


On 12/09/2016 01:10 PM, Michael Paquier wrote:

On Fri, Dec 09, 2016 at 11:51:45AM +0200, Heikki Linnakangas wrote:

On 12/09/2016 05:58 AM, Michael Paquier wrote:


One thing is: when do we look up at pg_authid? After receiving the
first message from client or before beginning the exchange? As the
first message from client has the user name, it would make sense to do
the lookup after receiving it, but from PG prospective it would just
make sense to use the data already present in the startup packet. The
current patch does the latter. What do you think?


While hacking on this, I came up with the attached refactoring, against
current master. I think it makes the current code more readable, anyway, and
it provides a get_role_password() function that SCRAM can use, to look up
the stored password. (This is essentially the same refactoring that was
included in the SCRAM patch set, that introduced the get_role_details()
function.)

Barring objections, I'll go ahead and commit this first.


Ok, committed.


-   shadow_pass = TextDatumGetCString(datum);
+   *shadow_pass = TextDatumGetCString(datum);

datum = SysCacheGetAttr(AUTHNAME, roleTup,

Anum_pg_authid_rolvaliduntil, &isnull);
@@ -83,100 +83,146 @@ md5_crypt_verify(const char *role, char *client_pass,
{
*logdetail = psprintf(_("User \"%s\" has an empty password."),
  role);
+   *shadow_pass = NULL;
return STATUS_ERROR;/* empty password */
}


Here the password is allocated by text_to_cstring(), that's only 1 byte
but it should be free()'d.


Fixed. Thanks, good catch! It doesn't matter in practice as we'll 
disconnect shortly afterwards anyway, but given that the callers pfree() 
other things on error, let's be tidy.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-12-12 Thread Gilles Darold

Le 11/12/2016 à 04:38, Karl O. Pinc a écrit :
> On Sat, 10 Dec 2016 19:41:21 -0600
> "Karl O. Pinc"  wrote:
>
>> On Fri, 9 Dec 2016 23:36:12 -0600
>> "Karl O. Pinc"  wrote:
>>
>>> Instead I propose (code I have not actually executed):
>>> ...
>>> charlbuffer[MAXPGPATH];
>>> char*log_format = lbuffer;
>>> ...
>>>
>>> /* extract log format and log file path from the line */
>>> log_filepath = strchr(lbuffer, ' ');  /* lbuffer == log_format
>>> */ *log_filepath = '\0'; /* terminate log_format */
>>> log_filepath++;   /* start of file path */
>>> log_filepath[strcspn(log_filepath, "\n")] = '\0';  
>> Er, I guess I prefer the more paranoid, just because who knows
>> what might have manged to somehow write the file that's read
>> into lbuffer:
>>
>> ...
>> charlbuffer[MAXPGPATH];
>> char*log_format = lbuffer;
>> ...
>>
>> /* extract log format and log file path from the line */
>> if (log_filepath = strchr(lbuffer, ' ')) /* lbuffer == log_format
>> */ *log_filepath = '\0';/* terminate log_format */
>> log_filepath++;  /* start of file path */
>> log_filepath[strcspn(log_filepath, "\n")] = '\0';
> *sigh*
>
>
> ...
> charlbuffer[MAXPGPATH];
> char*log_format = lbuffer;
> ...
>
> /* extract log format and log file path from the line */
> /* lbuffer == log_format, they share storage */
> if (log_filepath = strchr(lbuffer, ' '))
> *log_filepath = '\0';   /* terminate log_format */
> else
> {
> /* Unknown format, no space. Return NULL to caller. */
> lbuffer[0] = '\0';
> break;
> }
> log_filepath++;  /* start of file path   */
> log_filepath[strcspn(log_filepath, "\n")] = '\0';
>

Applied in last version of the patch v18 as well as removing of an
unused variable.


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0fc4e57..871efec 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4169,6 +4169,14 @@ SELECT * FROM parent WHERE key = 2400;
  server log
 
 
+
+When logs are written to the file-system their paths, names, and
+types are recorded in
+the  file.  This provides
+a convenient way to find and access log content without establishing a
+database connection.
+
+
 
  Where To Log
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index eca98df..20de903 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15444,6 +15444,19 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
   
 
   
+   pg_current_logfile()
+   text
+   primary log file name in use by the logging collector
+  
+
+  
+   pg_current_logfile(text)
+   text
+   log file name, of log in the requested format, in use by the
+   logging collector
+  
+
+  
pg_my_temp_schema()
oid
OID of session's temporary schema, or 0 if none
@@ -15661,6 +15674,39 @@ SET search_path TO schema , schema, ..
 the time when the postmaster process re-read the configuration files.)

 
+
+pg_current_logile
+   
+
+   
+Logging
+pg_current_logfile function
+   
+
+   
+pg_current_logfile returns, as text,
+the path of either the csv or stderr log file currently in use by the
+logging collector.  This is a path including the
+ directory and the log file name.
+Log collection must be active or the return value
+is NULL.  When multiple logfiles exist, each in a
+different format, pg_current_logfile called
+without arguments returns the path of the file having the first format
+found in the ordered
+list: stderr, csvlog.
+NULL is returned when no log file has any of these
+formats.  To request a specific file format supply,
+as text, either csvlog
+or stderr as the value of the optional parameter. The
+return value is NULL when the log format requested
+is not a configured .
+
+pg_current_logfiles reflects the content of the
+ file.  All caveats
+regards current_logfiles content are applicable
+to pg_current_logfiles' return value.
+   
+

 pg_my_temp_schema

diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 5c52824..3b003f5 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -60,6 +60,81 @@ Item
  Subdirectory containing per-database subdirectories
 
 
+
+ 
+  
+   current_logfiles
+  
+  
+   Logging
+   current_logfiles file
+  
+  current_logfiles
+ 
+ 
+  
+  A file recording the log file(s) currently written to by the syslogger
+  and the file's log formats, stderr
+  or csvlog.  Each line of the file is a space separated list of
+  two elements: the log format and the full path to the log file including the
+  value of .  The log format must

Re: [HACKERS] pgcrypto compilation error due to stack-allocated EVP_CIPHER_CTX