Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-08-15 Thread Ashutosh Bapat
On Wed, Aug 16, 2017 at 8:44 AM, Alvaro Herrera
 wrote:
> Christoph Berg wrote:
>> Re: Thomas Munro 2017-08-10 
>> 

Re: [HACKERS] expanding inheritance in partition bound order

2017-08-15 Thread Amit Langote
Thanks for the review.

On 2017/08/16 2:27, Robert Haas wrote:
> On Wed, Aug 9, 2017 at 10:11 PM, Amit Langote
>  wrote:
>>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>>> is a very good idea.
>>
>> I put this patch ahead in the list and so it's now 0001.
> 
> I think what you've currently got as
> 0003-Relieve-RelationGetPartitionDispatchInfo-of-doing-an.patch is a
> bug fix that probably needs to be back-patched into v10, so it should
> come first.

That makes sense.  That patch is now 0001.  Checked that it can be
back-patched to REL_10_STABLE.

> I think 0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch and
> 0005-Store-in-pg_inherits-if-a-child-is-a-partitioned-tab.patch should
> be merged into one patch and that should come next,

Merged the two into one: attached 0002.

> followed by
> 0004-Teach-expand_inherited_rtentry-to-use-partition-boun.patch and

This one is now 0003.

> finally what you now have as
> 0001-Decouple-RelationGetPartitionDispatchInfo-from-execu.patch.

And 0004.

> This patch series is blocking a bunch of other things, so it would be
> nice if you could press forward with this quickly.

Attached updated patches.

Thanks,
Amit
From 23a3e291001394ffa2b79b34b32c582cb4898e87 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 16 Aug 2017 11:36:14 +0900
Subject: [PATCH 1/4] Relieve RelationGetPartitionDispatchInfo() of doing any
 locking

Anyone who wants to call RelationGetPartitionDispatchInfo() must first
acquire locks using find_all_inheritors.

Doing it this way gets rid of the possibility of a deadlock when partitions
are concurrently locked, because RelationGetPartitionDispatchInfo would lock
the partitions in one order and find_all_inheritors would in another.

Reported-by: Amit Khandekar, Robert Haas
Reports: 
https://postgr.es/m/CAJ3gD9fdjk2O8aPMXidCeYeB-mFB%3DwY9ZLfe8cQOfG4bTqVGyQ%40mail.gmail.com
 
https://postgr.es/m/CA%2BTgmobwbh12OJerqAGyPEjb_%2B2y7T0nqRKTcjed6L4NTET6Fg%40mail.gmail.com
---
 src/backend/catalog/partition.c | 55 ++---
 src/backend/executor/execMain.c | 18 +-
 src/include/catalog/partition.h |  3 +--
 3 files changed, 42 insertions(+), 34 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index c1a307c8d3..96a64ce6b2 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -999,12 +999,16 @@ get_partition_qual_relid(Oid relid)
  * RelationGetPartitionDispatchInfo
  * Returns information necessary to route tuples down a partition 
tree
  *
- * All the partitions will be locked with lockmode, unless it is NoLock.
- * A list of the OIDs of all the leaf partitions of rel is returned in
- * *leaf_part_oids.
+ * The number of elements in the returned array (that is, the number of
+ * PartitionDispatch objects for the partitioned tables in the partition tree)
+ * is returned in *num_parted and a list of the OIDs of all the leaf
+ * partitions of rel is returned in *leaf_part_oids.
+ *
+ * All the relations in the partition tree (including 'rel') must have been
+ * locked (using at least the AccessShareLock) by the caller.
  */
 PartitionDispatch *
-RelationGetPartitionDispatchInfo(Relation rel, int lockmode,
+RelationGetPartitionDispatchInfo(Relation rel,
 int 
*num_parted, List **leaf_part_oids)
 {
PartitionDispatchData **pd;
@@ -1019,14 +1023,18 @@ RelationGetPartitionDispatchInfo(Relation rel, int 
lockmode,
offset;
 
/*
-* Lock partitions and make a list of the partitioned ones to prepare
-* their PartitionDispatch objects below.
+* We rely on the relcache to traverse the partition tree to build both
+* the leaf partition OIDs list and the array of PartitionDispatch 
objects
+* for the partitioned tables in the tree.  That means every partitioned
+* table in the tree must be locked, which is fine since we require the
+* caller to lock all the partitions anyway.
 *
-* Cannot use find_all_inheritors() here, because then the order of OIDs
-* in parted_rels list would be unknown, which does not help, because we
-* assign indexes within individual PartitionDispatch in an order that 
is
-* predetermined (determined by the order of OIDs in individual 
partition
-* descriptors).
+* For every partitioned table in the tree, starting with the root
+* partitioned table, add its relcache entry to parted_rels, while also
+* queuing its partitions (in the order in which they appear in the
+* partition descriptor) to be looked at later in the same loop.  This 
is
+* a bit tricky but works because the foreach() macro doesn't fetch 

Re: [HACKERS] proposal: psql: check env variable PSQL_PAGER

2017-08-15 Thread Thomas Munro
On Thu, Jul 27, 2017 at 4:19 AM, Pavel Stehule  wrote:
> here is a patch - it is trivial

The feature makes sense, follows an existing example (PSQL_EDITOR),
and works nicely.

   is platform-dependent.  Use of the pager can be disabled by setting
   PAGER to empty, or by using pager-related options of

Maybe this should now say "... can be disabled by specifying an empty
string", since "... by setting PAGER to empty" isn't quite the full
story.

-  the \pset command.
+  the \pset command. These variables are examined
+  in the order listed; the first that is set is used.
  
+

Stray blank line.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixup some misusage of appendStringInfo and friends

2017-08-15 Thread David Rowley
On 16 August 2017 at 15:38, Peter Eisentraut
 wrote:
> committed

Thanks!

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixup some misusage of appendStringInfo and friends

2017-08-15 Thread Peter Eisentraut
On 4/26/17 20:14, David Rowley wrote:
> On 27 April 2017 at 06:41, Peter Eisentraut
>  wrote:
>> On 4/19/17 08:42, Ashutosh Bapat wrote:
>>> I reviewed the patch. It compiles clean, make check-world passes. I do
>>> not see any issue with it.
>>
>> Looks reasonable.  Let's keep it for the next commit fest.
> 
> Thank you to both of you for looking. I'd thought that maybe the new
> stuff in PG10 should be fixed before the release. If we waited, and
> fix in PG11 then backpatching is more of a pain.
> 
> However, I wasn't careful in the patch to touch only new to PG10 code.
> 
> I'll defer to your better judgment and add to the next 'fest.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-08-15 Thread Alvaro Herrera
Christoph Berg wrote:
> Re: Thomas Munro 2017-08-10 
> 

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-08-15 Thread Masahiko Sawada
On Thu, Jun 22, 2017 at 12:03 AM, Masahiko Sawada  wrote:
> On Fri, May 19, 2017 at 11:12 AM, Masahiko Sawada  
> wrote:
>> On Wed, May 17, 2017 at 1:30 AM, Robert Haas  wrote:
>>> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila  
>>> wrote:
 On Fri, May 12, 2017 at 9:14 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada  
>> wrote:
>>> ... I'd like to propose to change relation
>>> extension lock management so that it works using LWLock instead.
>
>> That's not a good idea because it'll make the code that executes while
>> holding that lock noninterruptible.
>
> Is that really a problem?  We typically only hold it over one kernel call,
> which ought to be noninterruptible anyway.

 During parallel bulk load operations, I think we hold it over multiple
 kernel calls.
>>>
>>> We do.  Also, RelationGetNumberOfBlocks() is not necessarily only one
>>> kernel call, no?  Nor is vm_extend.
>>
>> Yeah, these functions could call more than one kernel calls while
>> holding extension lock.
>>
>>> Also, it's not just the backend doing the filesystem operation that's
>>> non-interruptible, but also any waiters, right?
>>>
>>> Maybe this isn't a big problem, but it does seem to be that it would
>>> be better to avoid it if we can.
>>>
>>
>> I agree to change it to be interruptible for more safety.
>>
>
> Attached updated version patch. To use the lock mechanism similar to
> LWLock but interrupt-able, I introduced new lock manager for extension
> lock. A lot of code especially locking and unlocking, is inspired by
> LWLock but it uses the condition variables to wait for acquiring lock.
> Other part is not changed from previous patch. This is still a PoC
> patch, lacks documentation. The following is the measurement result
> with test script same as I used before.
>
> * Copy test script
>  HEADPatched
> 4436.6   436.1
> 8561.8   561.8
> 16   580.7   579.4
> 32   588.5   597.0
> 64   596.1   599.0
>
> * Insert test script
>  HEADPatched
> 4156.5   156.0
> 8167.0   167.9
> 16   176.2   175.6
> 32   181.1   181.0
> 64   181.5   183.0
>
> Since I replaced heavyweight lock with lightweight lock I expected the
> performance slightly improves from HEAD but it was almost same result.
> I'll continue to look at more detail.
>

The previous patch conflicts with current HEAD, I rebased the patch to
current HEAD.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Moving_extension_lock_out_of_heavyweight_lock_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Craig Ringer
On 16 August 2017 at 03:42, Tomas Vondra 
wrote:

>
>
> On 08/15/2017 07:54 PM, Robert Haas wrote:
>
>> On Tue, Aug 15, 2017 at 9:59 AM, Tomas Vondra
>>  wrote:
>>
>>> I don't think so -- the "committed" and "invalid" meanings are
 effectively canceled when the "frozen" mask is present.

 I mean, "committed" and "invalid" contradict each other...

>>>
>>> FWIW I agree with Craig - the functions should output the masks raw,
>>> without
>>> any filtering. The reason is that when you're investigating data
>>> corruption
>>> or unexpected behavior, all this is very useful when reasoning about what
>>> might (not) have happened.
>>>
>>> Or at least make the filtering optional.
>>>
>>
>> I don't think "filtering" is the right way to think about it.  It's
>> just labeling each combination of bits with the meaning appropriate to
>> that combination of bits.
>>
>> I mean, if you were displaying the contents of a CLOG entry, would you
>> want the value 3 to be displayed as COMMITTED ABORTED SUBCOMMITTED
>> because TRANSACTION_STATUS_COMMITTED|TRANSACTION_STATUS_ABORTED ==
>> TRANSACTION_STATUS_SUB_COMMITTED?
>>
>> I realize that you may be used to thinking of the HEAP_XMIN_COMMITTED
>> and HEAP_XMAX_COMMITTED bits as two separate bits, but that's not
>> really true any more.  They're a 2-bit field that can have one of four
>> values: committed, aborted, frozen, or none of the above.
>>
>>
> All I'm saying is that having the complete information (knowing which bits
> are actually set in the bitmask) is valuable when reasoning about how you
> might have gotten to the current state. Which I think is what Craig is
> after.
>
> What I think we should not do is interpret the bitmasks (omitting some of
> the information) assuming all the bits were set correctly.


I agree, and the patch already does half of this: it can output just the
raw bit flags, or it can interpret them to show HEAP_XMIN_FROZEN etc.

So the required change, which seems to have broad agreement, is to have the
"interpret the bits" mode show only HEAP_XMIN_FROZEN when it sees
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID, etc. We can retain raw-flags output
as-is for when seriously bogus state is suspected.

Any takers?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Moon Insung
I checked for code related to infomask.
(add flag state -- HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMIN_FROZEN)

first i'm still beginner level about postgresql, so my opinion may be wrong.

if the "HEAP_XMIN_COMMITTED" flag is added, check the function of 
"HeapTupleHeaderXminInvalid"
if the "HEAP_XMIN_INVALID" flag is added, check the function of 
"HeapTupleHeaderXminCommitted"
if the "HEAP_XMIN_FROZEN" flag is added, use the "HeapTupleHeaderSetXminFrozen" 
function or
use the code as 
--
xid = HeapTupleHeaderGetXmin(tuple);
if (TransactionIdIsNormal(xid))
{
if (TransactionIdPrecedes(xid, cutoff_xid))
{
frz->t_infomask |= HEAP_XMIN_FROZEN;
changed = true;
}
else
totally_frozen = false;
} 
--
to add the flag.

so as a result, HEAP_XMIN_INVALID and HEAP_XMIN_COMMITTED is cannot coexist.
unfortunately, i don't know if HEAP_XMIN_COMMITTED and HEAP_XMIN_FROZEN flags 
can coexist.

so i think it's also a good idea to output the raw masks, without any filtering.
however, i think the information that is presented to the user should inform us 
which flags was entered.

Regards.
Moon

> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tomas Vondra
> Sent: Wednesday, August 16, 2017 5:36 AM
> To: Robert Haas
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [PATCH] pageinspect function to decode infomasks
> 
> 
> 
> On 08/15/2017 09:55 PM, Robert Haas wrote:
> > On Tue, Aug 15, 2017 at 3:42 PM, Tomas Vondra
> >  wrote:
> >> What I think we should not do is interpret the bitmasks (omitting
> >> some of the information) assuming all the bits were set correctly.
> >
> > I'm still confused. HEAP_XMIN_COMMITTED|HEAP_XMIN_ABORTED ==
> > HEAP_XMIN_FROZEN. Nobody is proposing to omit anything; to the
> > contrary, what's being proposed is not to display the same thing twice
> > (and in a misleading fashion, to boot).
> >
> 
> I understand your point. Assume you're looking at this bit of code:
> 
>  if (HeapTupleHeaderXminCommitted(enumval_tup->t_data))
>  return;
> 
> which is essentially
> 
>  if (enumval_tup->t_data & HEAP_XMIN_COMMITTED)
>  return;
> 
> If the function only gives you HEAP_XMIN_FROZEN, how likely is it you miss
> this actually evaluates as true?
> 
> You might say that people investigating issues in this area of code should
> be aware of how HEAP_XMIN_FROZEN is defined, and perhaps you're right ...
> 
> regards
> 
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expanding inheritance in partition bound order

2017-08-15 Thread Amit Langote
On 2017/08/10 18:52, Beena Emerson wrote:
> Hi Amit,
> 
> On Thu, Aug 10, 2017 at 7:41 AM, Amit Langote
>  wrote:
>> On 2017/08/05 2:25, Robert Haas wrote:
>>> Concretely, my proposal is:
>>>
>>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>>> is a very good idea.
>>
>> I put this patch ahead in the list and so it's now 0001.
>>
> 
> FYI, 0001 patch throws the warning:
> 
> execMain.c: In function ‘ExecSetupPartitionTupleRouting’:
> execMain.c:3342:16: warning: ‘next_ptinfo’ may be used uninitialized
> in this function [-Wmaybe-uninitialized]
>  next_ptinfo->parentid != ptinfo->parentid)

Thanks for the review.  Will fix in the updated version of the patch I
will post sometime later today.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replication_slot_catalog_xmin not explicitly initialized when creating procArray

2017-08-15 Thread Peter Eisentraut
On 7/9/17 21:23, Masahiko Sawada wrote:
> On Sat, Jul 8, 2017 at 2:19 AM, Wong, Yi Wen  wrote:
>> replication_slot_catalog_xmin is not explictly initialized to
>> InvalidTransactionId.

> Thank you for the patch. This change makes sense to me.

Committed and backpatched

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-15 Thread Masahiko Sawada
On Tue, Aug 15, 2017 at 2:43 AM, Fabien COELHO  wrote:
>
> Hello,
>
>> I think we can use it like --custom-initialize="create_table, vacuum"
>> which is similar to what we specify a connection option to psql for
>> example.
>
>
> Even if it is allowed, do not advertise it. Or use space as a separator and
> not comma. ISTM that with psql connections space is the higher level
> separator, not an optional thing, and comma is used for lower level
> splitting: "host=foo port=5432,5433 ..."
>
>> But it will be unnecessary if we have the one letter version.
>
>
> Sure.
>
>>> I'm also wondering whether using a list is a good option, because it
>>> implies
>>> a large parse function, list management and so on. With the one letter
>>> version, it could be just a string to be scanned char by char for
>>> operations.
>>
>>
>> I basically agree with the one letter version. But I'm concerned that
>> it'll confuse users if we have more initialization steps for the
>> pgbench initialization. If we add more various initialization steps it
>> makes pgbench command hard to read and the users might have to
>> remember these options.
>
>
> I think that if we get to the point where so many initialization steps that
> people get confused, then adding long names will not be an issue:-)
>
> In the mean time it only needs 5 values.

Agreed.

>
>>> Maybe there could be short-hands for usual setups, eg "default" for
>>> "tdpv"
>>> or maybe "ct,ld,pk,va", "full" for "tdpfv" or maybe "ct,ld,pk,fk,va"...
>>
>>
>> If --custom-initialize option requires for i option to be set,
>> "pgbench -i" means the initialization with full steps and "pgbench -i
>> --custom-initialize=..." means the initialization with custom
>> operation steps.
>
>
> Sure. It does not preclude the default to have a name.
>
>>> Remove the "no-primary-keys" from the long option array as it has
>>> disappeared. You might consider make "custom-initialize" take the 'I'
>>> short
>>> option.
>>>
>>> Ranting unrelated to this patch: the automatic aid type switching based
>>> on
>>> scale is a bad idea (tm), because when trying to benchmark it means that
>>> changing the scale also changes the schema, and you really do not need
>>> that.
>>> ISTM that it should always use INT8.
>>
>>
>> Hmm, I think it's a valid point. Should we allow users to specify like
>> the above thing in the custom initialization feature as well?
>
>
> I would be in favor of having an option to do a tpc-b conforming schema
> which would include that, but which would also change the default balance
> type which is not large enough per spec. Maybe it could be "T".
>

Yeah, once custom initialization patch get committed we can extend it.

Attached updated patch. I've incorporated the all comments from Fabien
to it, and changed it to single letter version.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-15 Thread Thomas Munro
Will respond to the actionable code review points separately with a
new patch set, but first:

On Wed, Aug 16, 2017 at 10:06 AM, Andres Freund  wrote:
> On 2017-08-15 17:44:55 +1200, Thomas Munro wrote:
>> > @@ -99,12 +72,9 @@ CreateTemplateTupleDesc(int natts, bool hasoid)
>> >
>> >  /*
>> >   * CreateTupleDesc
>> > - * This function allocates a new TupleDesc pointing to a given
>> > + * This function allocates a new TupleDesc by copying a given
>> >   * Form_pg_attribute array.
>> >   *
>> > - * Note: if the TupleDesc is ever freed, the Form_pg_attribute array
>> > - * will not be freed thereby.
>> > - *
>> >
>> > I'm leaning towards no, but you could argue that we should just change
>> > that remark to be about constr?
>>
>> I don't see why.
>
> Because for that the freeing bit is still true, ie. it's still
> separately allocated.

It's true of struct tupleDesc in general but not true of objects
returned by this function in respect of the arguments to the function.
In master, that comment is a useful warning that the object will hold
onto but never free the attrs array you pass in.  The same doesn't
apply to constr so I don't think we need to say anything.

>> > Review of 0003:
>> >
>> > I'm not doing a too detailed review, given I think there's some changes
>> > in the pipeline.
>>
>> Yep.  In the new patch set the hash table formerly known as DHT is now
>> in patch 0004 and I made the following changes based on your feedback:
>>
>> 1.  Renamed it to "dshash".  The files are named dshash.{c,h}, and the
>> prefix on identifiers is dshash_.  You suggested dsmhash, but the "m"
>> didn't seem to make much sense.  I considered dsahash, but dshash
>> seemed better.  Thoughts?
>
> WFM.  Just curious, why didn't m make sense? I was referring to dynamic
> shared memory hash - seems right. Whether there's an intermediary dsa
> layer or not...

I think of DSA as a defining characteristic that dshash exists to work
with (it's baked into dshash's API), but DSM as an implementation
detail which dshash doesn't directly depend on.  Therefore I don't
like the "m".

I speculate that in future we might have build modes where DSA doesn't
use DSM anyway: it could use native pointers and maybe even a
different allocator in a build that either uses threads or
non-portable tricks to carve out a huge amount of virtual address
space so that it can map memory in at the same location in each
backend.  In that universe DSA would still be providing the service of
grouping allocations together into a scope for "rip cord" cleanup
(possibly by forwarding to MemoryContext stuff) but otherwise compile
away to nearly nothing.

>> > +static int32
>> > +find_or_allocate_shared_record_typmod(TupleDesc tupdesc)
>> > +{
>> >
>> > +   /*
>> > +* While we still hold the atts_index entry locked, add this to
>> > +* typmod_index.  That's important because we don't want anyone to 
>> > be able
>> > +* to find a typmod via the former that can't yet be looked up in 
>> > the
>> > +* latter.
>> > +*/
>> > +   PG_TRY();
>> > +   {
>> > +   typmod_index_entry =
>> > +   
>> > dht_find_or_insert(CurrentSharedRecordTypmodRegistry.typmod_index,
>> > +  , 
>> > );
>> > +   if (found)
>> > +   elog(ERROR, "cannot create duplicate shared record 
>> > typmod");
>> > +   }
>> > +   PG_CATCH();
>> > +   {
>> > +   /*
>> > +* If we failed to allocate or elog()ed, we have to be 
>> > careful not to
>> > +* leak the shared memory.  Note that we might have 
>> > created a new
>> > +* atts_index entry above, but we haven't put anything in 
>> > it yet.
>> > +*/
>> > +   dsa_free(CurrentSharedRecordTypmodRegistry.area, 
>> > shared_dp);
>> > +   PG_RE_THROW();
>> > +   }
>> >
>> > Not entirely related, but I do wonder if we don't need abetter solution
>> > to this. Something like dsa pointers that register appropriate memory
>> > context callbacks to get deleted in case of errors?
>>
>> Huh, scope guards.  I have had some ideas about some kind of
>> destructor mechanism that might replace what we're doing with DSM
>> detach hooks in various places and also work in containers like hash
>> tables (ie entries could have destructors), but doing it with the
>> stack is another level...
>
> Not sure what you mean with 'stack'?

I probably read too much into your words.  I was imagining something
conceptually like the following, since the "appropriate memory
context" in the code above is actually a stack frame:

  dsa_pointer p = ...;

  ON_ERROR_SCOPE_EXIT(dsa_free, area, p); /* yeah, I know, no variadic macros */

  elog(ERROR, "boo"); /* this causes p to be freed */

The point being that if the caller of this function catches the 

Re: [HACKERS] The error message "sorry, too many clients already" is imprecise

2017-08-15 Thread Peter Eisentraut
On 8/8/17 06:39, Piotr Stefaniak wrote:
> I've made a hack for myself (attached diff against 9.4) which adds a
> DETAIL-level message telling me which proc list was saturated. It's not
> committable in its current form because of a C99 feature and perhaps for
> other reasons.

There are other places that also emit the "sorry" message that would
also need to be looked at.  Also, the patch would need to be against master.

This kind of patch to make error messages more precise is usually
welcome, so keep working on it.  But this needs a bit more refinement as
it stands.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivewal and messages printed in non-verbose mode

2017-08-15 Thread Michael Paquier
On Wed, Aug 16, 2017 at 9:29 AM, Peter Eisentraut
 wrote:
> I have committed that version.  I think the exit message can be useful,
> because pg_receivewal will usually run as some kind of background
> process where the exit status might be not be visible.

Thanks.

> I have also committed a small documentation patch to describe the exit
> status and behavior better.

That looks fine to me.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-15 Thread Andres Freund
On 2017-08-15 20:30:16 -0400, Robert Haas wrote:
> On Tue, Aug 15, 2017 at 6:06 PM, Andres Freund  wrote:
> > Interesting. I was apparently thinking slightly differently. I'd have
> > thought we'd have Session struct in statically allocated shared
> > memory. Which'd then have dsa_handle, dshash_table_handle, ... members.
> 
> Sounds an awful lot like what we're already doing with PGPROC.

Except it'd be shared between leader and workers. So no, not really.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 6:06 PM, Andres Freund  wrote:
> Interesting. I was apparently thinking slightly differently. I'd have
> thought we'd have Session struct in statically allocated shared
> memory. Which'd then have dsa_handle, dshash_table_handle, ... members.

Sounds an awful lot like what we're already doing with PGPROC.

I am not sure that inventing a Session thing that should have 500
things in it but actually has the 3 that are relevant to this patch is
really a step forward.  In fact, it sounds like something that will
just create confusion down the road.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivewal and messages printed in non-verbose mode

2017-08-15 Thread Peter Eisentraut
On 7/9/17 22:08, Michael Paquier wrote:
> While the version on my laptop does that:
> if (time_to_abort)
> {
> -   fprintf(stderr, _("%s: received interrupt signal, exiting\n"),
> -   progname);
> +   if (verbose)
> +   fprintf(stderr, _("%s: received interrupt
> signal, exiting\n"),
> +   progname);
> return true;
> }
> return false;
> Not sure how that feel into the cracks.

I have committed that version.  I think the exit message can be useful,
because pg_receivewal will usually run as some kind of background
process where the exit status might be not be visible.

I have also committed a small documentation patch to describe the exit
status and behavior better.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 6:40 PM, Tom Lane  wrote:
> No.  The case that I'm concerned about is where the initial estimate
> of "parallelModeOK" is true, but the planner nevertheless selects
> a parallel-unsafe plan --- unsafe for some other reason than that it
> already has a Gather in it.  That would prevent the code further down
> in standard_planner from plastering a Gather on top, but we still end up
> labeling the output plan with parallelModeNeeded = true.
>
> Now, you might argue that there should be no case where that initial
> estimate has parallelModeOK = true and yet we end up with a
> parallel-unsafe plan.  I do not think I believe that; that estimate
> is supposed to be a conservative estimate, not ironclad exactness.
> (In fact, a quick look shows a counterexample: if we pick a MinMaxAgg
> path, that's parallel unsafe, but the original query might've been
> completely safe.)

I'm quite confused here.  What's parallel-unsafe about a MinMaxAgg?
There might be some reason why it's parallel-restricted, but it
shouldn't be parallel-unsafe.

More generally, there is no way for parallelModeOK to go from true to
false after it's initially set.  If there were, it would be a bug,
because we might plan some subquery thinking that parallelModeOK is
true, use a Gather node, and then later plan some other subquery that
changes to parallelModeOK from true to false, making the plan that's
already written in stone no longer valid.  This is exactly why we have
to have max_parallel_hazard() walk the ENTIRE query tree, including
all subqueries, before we get started.

Planning can obviously introduce elements into the query that prevent
parallelism from being used for that part of the query, and the only
thing there is to make sure that such things never make it into a
partial path.  But it can't just decide that parallelism is no longer
allowed *anywhere* in the query.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication message type 'Y' is missing in docs

2017-08-15 Thread Masahiko Sawada
On Wed, Aug 16, 2017 at 4:38 AM, Peter Eisentraut
 wrote:
> On 8/9/17 20:22, Masahiko Sawada wrote:
>> There is a type of logical replication message 'Y' for data types, but
>> it's not documented in section 52.9. Logical Replication Message
>> Formats. Attached patch fixes this. I think it can be PG10 item.
>
> Committed with some tweaking.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign tables privileges not shown in information_schema.table_privileges

2017-08-15 Thread Peter Eisentraut
On 8/10/17 09:00, Nicolas Thauvin wrote:
> The information_schema.table_privileges view filters on regular tables
> and views. Foreign tables are not shown in this view but they are in
> other views of the information_schema like tables or column_privileges.
> 
> Is it intentional? A patch is attached if not.

Fix committed to all branches.  Thanks!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
I wrote:
> As I said, I think this code is based on fuzzy thinking.

... or, more charitably, it may just date from before standard_planner's
plaster-a-Gather-on-top logic worked the way it does today.  But in
any case I think it's wrong now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tab complete for psql pset pager values

2017-08-15 Thread Peter Eisentraut
On 7/26/17 11:02, Pavel Stehule wrote:
> attached trivial patch for missing tab complete for \pset pager setting

committed (back to 9.6)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 15, 2017 at 5:15 PM, Tom Lane  wrote:
>> The idea of the existing
>> code seems to be "let's exercise what happens if the executor does
>> EnterParallelMode/ExitParallelMode around any plan whatsoever, even a
>> parallel-unsafe one"; which seems to me to be bogus as heck.

> I think that is not what the current code is doing.  If the plan is
> diagnosed as parallel-unsafe, then parallelModeOK will be false and
> nothing will happen.

No.  The case that I'm concerned about is where the initial estimate
of "parallelModeOK" is true, but the planner nevertheless selects
a parallel-unsafe plan --- unsafe for some other reason than that it
already has a Gather in it.  That would prevent the code further down
in standard_planner from plastering a Gather on top, but we still end up
labeling the output plan with parallelModeNeeded = true.

Now, you might argue that there should be no case where that initial
estimate has parallelModeOK = true and yet we end up with a
parallel-unsafe plan.  I do not think I believe that; that estimate
is supposed to be a conservative estimate, not ironclad exactness.
(In fact, a quick look shows a counterexample: if we pick a MinMaxAgg
path, that's parallel unsafe, but the original query might've been
completely safe.)

> If the plan is actually parallel-unsafe but the
> planner doesn't *think* it's parallel-unsafe, then what you are
> talking about will happen, but that seems to me to be a good thing.

No, you have that backwards.  If the planner incorrectly thinks the plan
is parallel-safe then it will forcibly put a Gather on top, and we'll mark
parallelModeNeeded = true due to the existing assignment where that is
done, and then we will detect the unsafety at runtime.  The case I'm
worried about is where the planner knows (correctly) that the selected
plan is parallel-unsafe, due to some choice made after the initial
parallelModeOK = true estimate.

> It lets you find planner bugs (or functions that a user has labelled
> incorrectly).

Don't believe this argument either.  Certainly we want to be able to
detect incorrectly-labeled-safe functions by turning on
force_parallel_mode; but that will happen anyway, since both the initial
parallelModeOK estimate and the final top_plan->parallel_safe flag will
reflect the false safety labeling.  (If there's something else in the plan
that makes it parallel-unsafe overall, then the test cannot reveal the
false function labeling anyway.)

As I said, I think this code is based on fuzzy thinking.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-08-15 Thread Michael Paquier
On Tue, Aug 15, 2017 at 10:35 PM, Robert Haas  wrote:
> +1 for 0001 and 0002 in general, but I can't help noticing that they
> lead to a noticeable worsening of the error messages in the regression
> tests.

As the access restriction gets handled by GRANT in this patch, that's
a price to pay. The verbosity of this message could be kept by
introducing a default role dedicated to LOs. Personally, I am of the
opinion that a default role in this case is not justified for only
those functions. Other opinions are welcome.

I have noticed that 0002 should update lobj.sgml as well, something I
missed. Now the docs say "Therefore, their use is restricted to
superusers." for lo_import and lo_export, but I think that "by
default" should be appended.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 5:15 PM, Tom Lane  wrote:
> The idea of the existing
> code seems to be "let's exercise what happens if the executor does
> EnterParallelMode/ExitParallelMode around any plan whatsoever, even a
> parallel-unsafe one"; which seems to me to be bogus as heck.

I think that is not what the current code is doing.  If the plan is
diagnosed as parallel-unsafe, then parallelModeOK will be false and
nothing will happen.  If the plan is actually parallel-unsafe but the
planner doesn't *think* it's parallel-unsafe, then what you are
talking about will happen, but that seems to me to be a good thing.
It lets you find planner bugs (or functions that a user has labelled
incorrectly).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-15 Thread Andres Freund
On 2017-08-15 17:44:55 +1200, Thomas Munro wrote:
> > @@ -99,12 +72,9 @@ CreateTemplateTupleDesc(int natts, bool hasoid)
> >
> >  /*
> >   * CreateTupleDesc
> > - * This function allocates a new TupleDesc pointing to a given
> > + * This function allocates a new TupleDesc by copying a given
> >   * Form_pg_attribute array.
> >   *
> > - * Note: if the TupleDesc is ever freed, the Form_pg_attribute array
> > - * will not be freed thereby.
> > - *
> >
> > I'm leaning towards no, but you could argue that we should just change
> > that remark to be about constr?
>
> I don't see why.

Because for that the freeing bit is still true, ie. it's still
separately allocated.


> > Review of 0003:
> >
> > I'm not doing a too detailed review, given I think there's some changes
> > in the pipeline.
>
> Yep.  In the new patch set the hash table formerly known as DHT is now
> in patch 0004 and I made the following changes based on your feedback:
>
> 1.  Renamed it to "dshash".  The files are named dshash.{c,h}, and the
> prefix on identifiers is dshash_.  You suggested dsmhash, but the "m"
> didn't seem to make much sense.  I considered dsahash, but dshash
> seemed better.  Thoughts?

WFM.  Just curious, why didn't m make sense? I was referring to dynamic
shared memory hash - seems right. Whether there's an intermediary dsa
layer or not...


> 2.  Ripped out the incremental resizing and iterator support for now,
> as discussed.  I want to post patches to add those features when we
> have a use case but I can see that it's no slam dunk so I want to keep
> that stuff out of the dependency graph for parallel hash.

Cool.


> 3.  Added support for hash and compare functions with an extra
> argument for user data, a bit like qsort_arg_comparator.  This is
> necessary for functions that need to be able to dereference a
> dsa_pointer stored in the entry, since they need the dsa_area.  (I
> would normally call such an argument 'user_data' or 'context' or
> something but 'arg' seemed to be establish by qsort_arg.)

Good.


> > +/*
> > + * The set of parameters needed to create or attach to a hash table.  The
> > + * members tranche_id and tranche_name do not need to be initialized when
> > + * attaching to an existing hash table.  The functions do need to be 
> > supplied
> > + * even when attaching because we can't safely share function pointers 
> > between
> > + * backends in general.
> > + */
> > +typedef struct
> > +{
> > +   size_t key_size;/* Size of the key (initial 
> > bytes of entry) */
> > +   size_t entry_size;  /* Total size of entry */
> > +   dht_compare_function compare_function;  /* Compare function */
> > +   dht_hash_function hash_function;/* Hash function */
> > +   int tranche_id; /* The tranche ID to use 
> > for locks. */
> > +} dht_parameters;
> >
> > Wonder if it'd make sense to say that key/entry sizes to be only
> > minimums? That means we could increase them to be the proper aligned
> > size?
>
> I don't understand.  You mean explicitly saying that there are
> overheads?  Doesn't that go without saying?

I was thinking that we could do the MAXALIGN style calculations once
instead of repeatedly, by including them in the key and entry sizes.


> > Ignoring aspects related to REC_HASH_KEYS and related discussion, since
> > we're already discussing that in another email.
>
> This version includes new refactoring patches 0003, 0004 to get rid of
> REC_HASH_KEYS by teaching the hash table how to use a TupleDesc as a
> key directly.  Then the shared version does approximately the same
> thing, with a couple of extra hoops to jump thought.  Thoughts?

Will look.


> > +static int32
> > +find_or_allocate_shared_record_typmod(TupleDesc tupdesc)
> > +{
> >
> > +   /*
> > +* While we still hold the atts_index entry locked, add this to
> > +* typmod_index.  That's important because we don't want anyone to 
> > be able
> > +* to find a typmod via the former that can't yet be looked up in 
> > the
> > +* latter.
> > +*/
> > +   PG_TRY();
> > +   {
> > +   typmod_index_entry =
> > +   
> > dht_find_or_insert(CurrentSharedRecordTypmodRegistry.typmod_index,
> > +  , );
> > +   if (found)
> > +   elog(ERROR, "cannot create duplicate shared record 
> > typmod");
> > +   }
> > +   PG_CATCH();
> > +   {
> > +   /*
> > +* If we failed to allocate or elog()ed, we have to be 
> > careful not to
> > +* leak the shared memory.  Note that we might have created 
> > a new
> > +* atts_index entry above, but we haven't put anything in 
> > it yet.
> > +*/
> > +   dsa_free(CurrentSharedRecordTypmodRegistry.area, shared_dp);
> > + 

Re: [HACKERS] Thoughts on unit testing?

2017-08-15 Thread Thomas Munro
On Mon, Aug 14, 2017 at 2:31 PM, Craig Ringer  wrote:
> On 14 August 2017 at 03:19, Peter Geoghegan  wrote:
>> It is my understanding that much of the benefit of unit testing comes
>> from maintainability. It's something that goes well with design by
>> contract. Writing unit tests is a forcing function. It requires
>> testable units, making the units more composable. The programmer must
>> be very deliberate about state, and must delineate code as testable
>> units. Unit tests are often supposed to be minimal, to serve as a kind
>> of design document.
>
>
> This is why I personally only find true unit tests useful as part of large
> software packages like Pg when they cover functional units that can be
> fairly well isolated.
>
> However, I'm not sure that Thomas meant unit tests in the formal sense
> isolated and mocked, but rather in the sense of test procedures written in C
> to excercise portions of Pg's code that are not easily/conveniently tested
> from SQL.

I'm interested in both.  The former would be marvellous but difficult
to get to from here, and the latter seems quite achievable and very
useful.  I'm hoping we can start figuring out a good way to do the
latter and make use of it initially for isolated new code, and then
see where that leads.  It's notoriously difficult to retrofit unit
tests to large old code bases and I have a few scars from doing that
(or trying) with big proprietary C and C++ systems.  But I'm
interested in gradual changes that cutting down on global state that
makes it difficult.  Here you can see a tiny example of that:

https://www.postgresql.org/message-id/CAEepm%3D1vUNNBUvTfP%2BJ7wgSqtEbb5NAg01VoZ2hPVyKG2qo8Qw%40mail.gmail.com

Things like this, even after the massive effort of moving enough
things into it to make it useful, wouldn't necessarily allow for
so-called mock session creation (ie a different implementation --
which I guess would involve interacting with the session via a vtable
of function pointers, or some compile time trickery?), but it's a
small step, and might eventually help you set up and tear down test
environments.

> Pg lacks the strict module delineation needed to make formal unit testing
> practical, as you have noted. That doesn't mean test frameworks couldn't be
> useful. I routinely write tests for reasonably isolated functionality in my
> code by writing throwaway SQL-callable functions to exercise it and
> pg_regress tests to run them. It's not strict unit testing as the rest of
> Pg's APIs aren't mocked away, but it's very practical small-unit integration
> testing that helps catch issues.
>
> I wouldn't mind having an easier and nicer way to do that built in to Pg,
> but don't have many ideas about practical, low-maintenance ways to achieve
> it.

I see now that the idea I mentioned earlier amounts to doing what
you've been doing, except with more tooling and conventions to lower
the barrier to use it.  A set of standard test macros, automatic
discovery of your tests, collocation of tests alongside the module
under test in the source tree, good test result reporting etc.  I'm
planning to come back to this in a while and try out some ideas for
the minimum useful scaffolding, but I'd love to hear any suggestions.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-15 Thread Thomas Munro
On Tue, Aug 15, 2017 at 5:44 PM, Thomas Munro
 wrote:
> On Mon, Aug 14, 2017 at 12:32 PM, Andres Freund  wrote:
>> But architecturally I'm still not sure I quite like the a bit ad-hoc
>> manner session state is defined here. I think we much more should go
>> towards a PGPROC like PGSESSION array, that PGPROCs reference. That'd
>> also be preallocated in "normal" shmem. From there things like the
>> handle for a dht typmod table could be referenced.  I think we should
>> slowly go towards a world where session state isn't in a lot of file
>> local static variables.  I don't know if this is the right moment to
>> start doing so, but I think it's quite soon.
>
> No argument from me about that general idea.  All our global state is
> an obstacle for testability, multi-threading, new CPU scheduling
> architectures etc.  I had been trying to avoid getting too adventurous
> here, but here goes nothing... In this version there is an honest
> Session struct.  There is still a single global variable --
> CurrentSession -- which would I guess could be a candidate to become a
> thread-local variable from the future (or alternatively an argument to
> every function that needs session access).  Is this better?  Haven't
> tested this much yet but seems like better code layout to me.

> 0006-Introduce-a-shared-memory-record-typmod-registry.patch

+/*
+ * A struct encapsulating some elements of a user's session.  For now this
+ * manages state that applies to parallel query, but it principle it could
+ * include other things that are currently global variables.
+ */
+typedef struct Session
+{
+   dsm_segment*segment;/* The session-scoped
DSM segment. */
+   dsa_area   *area;   /* The
session-scoped DSA area. */
+
+   /* State managed by typcache.c. */
+   SharedRecordTypmodRegistry *typmod_registry;
+   dshash_table   *record_table;   /* Typmods indexed by tuple
descriptor */
+   dshash_table   *typmod_table;   /* Tuple descriptors indexed
by typmod */
+} Session;

Upon reflection, these members should probably be called
shared_record_table etc.  Presumably later refactoring would introduce
(for example) local_record_table, which would replace the following
variable in typcache.c:

static HTAB *RecordCacheHash = NULL;

... and likewise for NextRecordTypmod and RecordCacheArray which
together embody this session's local typmod registry and ability to
make more.

The idea here is eventually to move all state that is tried to a
session into this structure, though I'm not proposing to do any more
of that than is necessary as part of *this* patchset.  For now I'm
just looking for a decent place to put the minimal shared session
state, but in a way that allows us "slowly [to] go towards a world
where session state isn't in a lot of file local static variables" as
you put it.

There's a separate discussion to be had about whether things like
assign_record_type_typmod() should take a Session pointer or access
the global variable (and perhaps in future thread-local)
CurrentSession, but the path of least resistance for now is, I think,
as I have it.

On another topic, I probably need to study and test some failure paths better.

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
I wrote:
> I like your plan (2).  It's not much code and it lends itself to having a
> run-time check, rather than just an Assert, that we found a Result node.
> That seems like a good idea now that we've found the assumption isn't
> bulletproof.  However, do we need to worry about the planner putting some
> nontrivial computation into the Gather's tlist instead of the Result?

I concluded that it'd probably be enough to have an assertion that the
Gather's tlist is trivial, so I made it work that way.

However, while investigating the behavior of force_parallel_mode along
the way to this, I found that standard_planner() contains some fuzzy
thinking about how to set parallelModeNeeded.  It's not necessary or
(IMO) approriate to force that on just because of force_parallel_mode,
so I propose the attached patch, which deletes that stanza in favor of
initializing glob->parallelModeNeeded to just plain false.  The effect of
this will be that parallelModeNeeded is only set true if there's actually
a Gather (or GatherMerge) node somewhere in the plan.  The only case where
that differs from the existing behavior is if the initial checks conclude
that parallelModeOK can be turned on, but then we end up with a
parallel-unsafe plan for some reason or other.  The idea of the existing
code seems to be "let's exercise what happens if the executor does
EnterParallelMode/ExitParallelMode around any plan whatsoever, even a
parallel-unsafe one"; which seems to me to be bogus as heck.  If it failed,
we would not conclude that that was an executor bug.

So I think we should apply and back-patch the below.

regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 407df9a..e407b34 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** standard_planner(Query *parse, int curso
*** 249,254 
--- 249,255 
  	glob->lastPlanNodeId = 0;
  	glob->transientPlan = false;
  	glob->dependsOnRole = false;
+ 	glob->parallelModeNeeded = false;
  
  	/*
  	 * Assess whether it's feasible to use parallel mode for this query. We
*** standard_planner(Query *parse, int curso
*** 290,307 
  		glob->parallelModeOK = false;
  	}
  
- 	/*
- 	 * glob->parallelModeNeeded should tell us whether it's necessary to
- 	 * impose the parallel mode restrictions, but we don't actually want to
- 	 * impose them unless we choose a parallel plan, so it is normally set
- 	 * only if a parallel plan is chosen (see create_gather_plan).  That way,
- 	 * people who mislabel their functions but don't use parallelism anyway
- 	 * aren't harmed.  But when force_parallel_mode is set, we enable the
- 	 * restrictions whenever possible for testing purposes.
- 	 */
- 	glob->parallelModeNeeded = glob->parallelModeOK &&
- 		(force_parallel_mode != FORCE_PARALLEL_OFF);
- 
  	/* Determine what fraction of the plan is likely to be scanned */
  	if (cursorOptions & CURSOR_OPT_FAST_PLAN)
  	{
--- 291,296 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-08-15 Thread Jeff Janes
On Sun, Apr 9, 2017 at 4:22 AM, Michael Banck 
wrote:

> Hi,
>
> Am Freitag, den 24.03.2017, 19:32 +0100 schrieb Michael Banck:
> > On Thu, Mar 23, 2017 at 12:41:54PM +0100, Magnus Hagander wrote:
> > > On Tue, Mar 21, 2017 at 8:34 PM, Robert Haas 
> wrote:
> > > > So I tend to think that there should always be some explicit user
> > > > action to cause the creation of a slot, like --create-slot-if-needed
> > > > or --create-slot=name.  That still won't prevent careless use of that
> > > > option but it's less dangerous than assuming that a user who refers
> to
> > > > a nonexistent slot intended to create it when, perhaps, they just
> > > > typo'd it.
> > >
> > > Well, the explicit user action would be "--slot". But sure, I can
> > > definitely live with a --create-if-not-exists.
> >
> > Can we just make that option create slot and don't worry if one exists
> > already? CreateReplicationSlot() can be told to not mind about already
> > existing slots, and I don't see a huge point in erroring out if the slot
> > exists already, unless somebody can show that this leads to data loss
> > somehow.
> >
> > > The important thing, to me, is that you can run it as a single
> > > command, which makes it a lot easier to work with. (And not like we
> > > currently have for pg_receivewal which requires a separate command to
> > > create the slot)
> >
> > Oh, that is how it works with pg_receivewal, I have to admit I've never
> > used it so was a bit confused about this when I read its code.
> >
> > So in that case I think we don't necessarily need to have the same user
> > interface at all. I first thought about just adding "-C, --create" (as
> > in "--create --slot=foo"), but this on second thought this looked a bit
> > shortsighted - who knows what flashy thing pg_basebackup might create in
> > 5 years... So I settled on --create-slot, which is only slightly more to
> > type (albeit repetive, "--create-slot --slot=foo"), but adding a short
> > option "-C" would be fine I thinkg "-C -S foo".
> >
> > So attached is a patch with adds that option. If people really think it
> > should be --create-slot-if-not-exists instead I can update the patch, of
> > course.
> >
> > I again added a second patch with some further refactoring which makes
> > it print a message on temporary slot creation in verbose mode.
>
> Rebased, squashed and slighly edited version attached. I've added this
> to the 2017-07 commitfest now as well:
>
> https://commitfest.postgresql.org/14/1112/



Can you rebase this past some conflicting changes?

Thanks,

Jeff


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-08-15 Thread Robert Haas
On Fri, Jul 14, 2017 at 8:17 AM, Mithun Cy  wrote:
> [ new patch ]

I spent some time going over this patch.  I initially thought it only
needed minor cosmetic tweaking but the more I poked at it the more
things I found that seemed like they should be changed, so the
attached version looks pretty significantly different from what was
last posted.  It's not actually as different as it looks because a lot
of the changes are purely cosmetic.

Changes:

- Rewrote the documentation, many of the comments, and some of the
other messages significantly.
- Renamed private functions so they all start with apw_ instead of
having what seemed to be a mix of naming conventions.
- Reorganized the file so that the important functions are at the top.
- Added prototypes for the static functions that lacked them.
- Got rid of AutoPrewarmTask.
- Got rid of skip_prewarm_on_restart.
- Added LWLockAcquire/LWLockRelease calls in many places where they
were left out.  This may make no difference but it seems safer.
- Refactored the worker-starting code into two separate functions, one
for the main worker and one for the per-database worker.
- Inlined some functions that were only called from one place.
- Rewrote the delay loop.  Previously this used a struct timeval but
tv_usec was always 0 and the actual struct was never passed to any
system function, so I think this loop couldn't have been accurate to
more than the nearest second and depending unnecessarily on the
operating system structure seems pointless.  I changed also changed it
to be more explicit about the autoprewarm_interval == 0 case and to
bump the last dump time before, rather than after, dumping.
Otherwise, the time between dumps will be increased by the amount of
time the dump itself takes, which is not what the user will expect.
- Used the correct PG_RETURN macro -- the return type of
autoprewarm_dump_now is int8, so PG_RETURN_INT64 must be used.
- Updated various other places to use int64 for consistency.
- Possibly a few other things I'm forgetting about right now.

It's quite possible that in making all of these changes I've
introduced some bugs, so I think this needs some testing and review.
It's also possible that some of the changes that I made are actually
not improvements and should be reverted, but it's always hard to tell
that about your own code.  Anyway, please see the attached version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


autoprewarm-rmh.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Tomas Vondra



On 08/15/2017 09:55 PM, Robert Haas wrote:

On Tue, Aug 15, 2017 at 3:42 PM, Tomas Vondra
 wrote:

What I think we should not do is interpret the bitmasks (omitting some of
the information) assuming all the bits were set correctly.


I'm still confused. HEAP_XMIN_COMMITTED|HEAP_XMIN_ABORTED == 
HEAP_XMIN_FROZEN. Nobody is proposing to omit anything; to the 
contrary, what's being proposed is not to display the same thing

twice (and in a misleading fashion, to boot).



I understand your point. Assume you're looking at this bit of code:

if (HeapTupleHeaderXminCommitted(enumval_tup->t_data))
return;

which is essentially

if (enumval_tup->t_data & HEAP_XMIN_COMMITTED)
return;

If the function only gives you HEAP_XMIN_FROZEN, how likely is it you 
miss this actually evaluates as true?


You might say that people investigating issues in this area of code 
should be aware of how HEAP_XMIN_FROZEN is defined, and perhaps you're 
right ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-08-15 Thread Andres Freund
Hi,

Personally I find the split between 03 and 04 and their naming a bit
confusing. I'd rather just merge them.  These patches need a rebase,
they don't apply anymore.


On 2017-07-06 12:04:12 +0530, Beena Emerson wrote:
> @@ -4813,6 +4836,18 @@ XLOGShmemSize(void)
>   {
>   charbuf[32];
>  
> + /*
> +  * The calculation of XLOGbuffers requires the run-time 
> parameter
> +  * XLogSegSize which is set from the control file. This value is
> +  * required to create the shared memory segment. Hence, 
> temporarily
> +  * allocate space for reading the control file.
> +  */

This makes me uncomfortable.  Having to choose the control file multiple
times seems wrong.  We're effectively treating the control file as part
of the configuration now, and that means we should move it's parsing to
an earlier part of startup.


> + if (!IsBootstrapProcessingMode())
> + {
> + ControlFile = palloc(sizeof(ControlFileData));
> + ReadControlFile();
> + pfree(ControlFile);

General plea: Please reset to NULL in cases like this, otherwise the
pointer will [temporarily] point into a freed memory location, which
makes debugging harder.



> @@ -8146,6 +8181,9 @@ InitXLOGAccess(void)
>   ThisTimeLineID = XLogCtl->ThisTimeLineID;
>   Assert(ThisTimeLineID != 0 || IsBootstrapProcessingMode());
>  
> + /* set XLogSegSize */
> + XLogSegSize = ControlFile->xlog_seg_size;
> +

Hm, why do we have two variables keeping track of the segment size?
wal_segment_size and XLogSegSize? That's bound to lead to confusion.


>   /* Use GetRedoRecPtr to copy the RedoRecPtr safely */
>   (void) GetRedoRecPtr();
>   /* Also update our copy of doPageWrites. */
> diff --git a/src/backend/bootstrap/bootstrap.c 
> b/src/backend/bootstrap/bootstrap.c
> index b3f0b3c..d2c524b 100644
> --- a/src/backend/bootstrap/bootstrap.c
> +++ b/src/backend/bootstrap/bootstrap.c
> @@ -19,6 +19,7 @@
>  
>  #include "access/htup_details.h"
>  #include "access/xact.h"
> +#include "access/xlog_internal.h"
>  #include "bootstrap/bootstrap.h"
>  #include "catalog/index.h"
>  #include "catalog/pg_collation.h"
> @@ -47,6 +48,7 @@
>  #include "utils/tqual.h"
>  
>  uint32   bootstrap_data_checksum_version = 0;/* No checksum 
> */
> +uint32   XLogSegSize;

Se we define the same variable declared in a header in multiple files
(once for each applicationq)?  I'm pretty strongly against that. Why's
that needed/a good idea?  Neither is it clear to me why the definition
was moved from xlog.c to bootstrap.c? That doesn't sound like a natural
place.


>  /*
> + * Calculate the default wal_size in proper unit.
> + */
> +static char *
> +pretty_wal_size(int segment_count)
> +{
> + double  val = wal_segment_size / (1024 * 1024) * segment_count;
> + double  temp_val;
> + char   *result = malloc(10);
> +
> + /*
> +  * Wal segment size ranges from 1MB to 1GB and the default
> +  * segment_count is 5 for min_wal_size and 64 for max_wal_size, so the
> +  * final values can range from 5MB to 64GB.
> +  */

Referencing the defaults here seems unnecessary. And nearly a guarantee
that the values in the comment will go out of date soon-ish.


> + /* set default max_wal_size and min_wal_size */
> + snprintf(repltok, sizeof(repltok), "min_wal_size = %s",
> +  pretty_wal_size(DEFAULT_MIN_WAL_SEGS));
> + conflines = replace_token(conflines, "#min_wal_size = 80MB", repltok);
> +
> + snprintf(repltok, sizeof(repltok), "max_wal_size = %s",
> +  pretty_wal_size(DEFAULT_MAX_WAL_SEGS));
> + conflines = replace_token(conflines, "#max_wal_size = 1GB", repltok);
> +

Hm. So postgresql.conf.sample values are now going to contain misleading
information for clusters with non-default segment sizes.

Have we discussed instead defaulting min_wal_size/max_wal_size to a
constant amount of megabytes and rounding up when it doesn't work for
a particular segment size?


> diff --git a/src/include/access/xlog_internal.h 
> b/src/include/access/xlog_internal.h
> index 9c0039c..c805f12 100644
> --- a/src/include/access/xlog_internal.h
> +++ b/src/include/access/xlog_internal.h
> @@ -91,6 +91,11 @@ typedef XLogLongPageHeaderData *XLogLongPageHeader;
>   */
>  
>  extern uint32 XLogSegSize;
> +#define XLOG_SEG_SIZE XLogSegSize

I don't think this is a good idea, we should rather rip the bandaid
of and remove this macro. If people are assuming it's a macro they'll
just run into more confusing errors/problems.


> diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
> index f3b3529..f31c30e 100644
> --- a/src/include/pg_config_manual.h
> +++ b/src/include/pg_config_manual.h
> @@ -14,6 +14,12 @@
>   */
>  
>  /*
> + * This is default value for WAL_segment_size 

Re: [HACKERS] One-shot expanded output in psql using \gx

2017-08-15 Thread Tobias Bussmann
I've tested the new \gx against 10beta and current git HEAD. Actually one of my 
favourite features of PostgreSQL 10! However in my environment it was behaving 
strangely. After some debugging I found that \gx does not work if you have \set 
FETCH_COUNT n before. Please find attached a patch that fixes this incl. new 
regression test.

Best regards,
Tobias


 

psql_gx_fetch_count_v1.patch
Description: Binary data




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A suspicious code in pgoutput_startup().

2017-08-15 Thread Peter Eisentraut
On 7/27/17 20:52, Yugo Nagata wrote:
> 175 /* Check if we support requested protocol */
> 176 if (data->protocol_version != LOGICALREP_PROTO_VERSION_NUM)
> 177 ereport(ERROR,
> 178 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> 179  errmsg("client sent proto_version=%d but we only 
> support protocol %d or lower",
> 180 data->protocol_version, 
> LOGICALREP_PROTO_VERSION_NUM)));
> 
> Although the if condition is not-equal, the error message says 
> "we only support protocol %d or lower".  Is this intentional?
> Or should this be fixed as below? 
> 
> 176 if (data->protocol_version > LOGICALREP_PROTO_VERSION_NUM)
> 
> Attached is a simple patch in case of fixing.

Fixed, thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 15, 2017 at 2:34 PM, Tom Lane  wrote:
>> I assume (haven't looked) that I could hack the plpgsql code to prevent
>> generating a parallel plan when it's decided the command is a simple
>> SELECT.  But I wonder whether that's the right place to fix it.  Does
>> it ever make sense to parallelize a plan that can't possibly benefit?
>> IOW I am not sure that we should carry force_parallel_mode this far.

> Well, I think the point of force_parallel_mode is to try running
> things in workers when it's legal but not necessarily beneficial, so
> I'd be disinclined to nerf this too much, but it might be OK to nerf
> it a little bit.

Fair enough.

> Off-hand, I'd say the obvious options are:
> (1) teach exec_simple_check_plan() to consider everything non-simple
> when force_parallel_mode is not off, or
> (2) teach exec_save_simple_expr() to see through a Gather node to the
> Result node underneath

I had been looking into (3) get plpgsql to drop the CURSOR_OPT_PARALLEL_OK
bit when it decides the query is simple.  But because (a) that bit is set
before we make the test, in current behavior, and (b) there are various
SPI interfaces in the way of changing it later, that approach is looking
pretty messy.

I like your plan (2).  It's not much code and it lends itself to having a
run-time check, rather than just an Assert, that we found a Result node.
That seems like a good idea now that we've found the assumption isn't
bulletproof.  However, do we need to worry about the planner putting some
nontrivial computation into the Gather's tlist instead of the Result?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 3:42 PM, Tomas Vondra
 wrote:
> What I think we should not do is interpret the bitmasks (omitting some of
> the information) assuming all the bits were set correctly.

I'm still confused.  HEAP_XMIN_COMMITTED|HEAP_XMIN_ABORTED ==
HEAP_XMIN_FROZEN.  Nobody is proposing to omit anything; to the
contrary, what's being proposed is not to display the same thing twice
(and in a misleading fashion, to boot).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Tomas Vondra



On 08/15/2017 07:54 PM, Robert Haas wrote:

On Tue, Aug 15, 2017 at 9:59 AM, Tomas Vondra
 wrote:

I don't think so -- the "committed" and "invalid" meanings are
effectively canceled when the "frozen" mask is present.

I mean, "committed" and "invalid" contradict each other...


FWIW I agree with Craig - the functions should output the masks raw, without
any filtering. The reason is that when you're investigating data corruption
or unexpected behavior, all this is very useful when reasoning about what
might (not) have happened.

Or at least make the filtering optional.


I don't think "filtering" is the right way to think about it.  It's
just labeling each combination of bits with the meaning appropriate to
that combination of bits.

I mean, if you were displaying the contents of a CLOG entry, would you
want the value 3 to be displayed as COMMITTED ABORTED SUBCOMMITTED
because TRANSACTION_STATUS_COMMITTED|TRANSACTION_STATUS_ABORTED ==
TRANSACTION_STATUS_SUB_COMMITTED?

I realize that you may be used to thinking of the HEAP_XMIN_COMMITTED
and HEAP_XMAX_COMMITTED bits as two separate bits, but that's not
really true any more.  They're a 2-bit field that can have one of four
values: committed, aborted, frozen, or none of the above.



All I'm saying is that having the complete information (knowing which 
bits are actually set in the bitmask) is valuable when reasoning about 
how you might have gotten to the current state. Which I think is what 
Craig is after.


What I think we should not do is interpret the bitmasks (omitting some 
of the information) assuming all the bits were set correctly.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication message type 'Y' is missing in docs

2017-08-15 Thread Peter Eisentraut
On 8/9/17 20:22, Masahiko Sawada wrote:
> There is a type of logical replication message 'Y' for data types, but
> it's not documented in section 52.9. Logical Replication Message
> Formats. Attached patch fixes this. I think it can be PG10 item.

Committed with some tweaking.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A little improvementof ApplyLauncherMain loop code

2017-08-15 Thread Peter Eisentraut
On 8/1/17 02:28, Yugo Nagata wrote:
> When reading the logical replication code, I found that the following
> part could be improved a bit. In the foreach, LWLockAcquire and
> logicalrep_worker_find are called for each loop, but they are needed
> only when sub->enabled is true.

Fixed, thanks!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 2:34 PM, Tom Lane  wrote:
> Tom Lane  writes:
>> Simplify plpgsql's check for simple expressions.
>> ...
>> https://git.postgresql.org/pg/commitdiff/00418c61244138bd8ac2de58076a1d0dd4f539f3
>
> The buildfarm members that are running force_parallel_mode = regress
> are not happy with this.  Apparently, even a trivial SELECT 
> can be turned into a Gather plan if force_parallel_mode says so.
>
> I assume (haven't looked) that I could hack the plpgsql code to prevent
> generating a parallel plan when it's decided the command is a simple
> SELECT.  But I wonder whether that's the right place to fix it.  Does
> it ever make sense to parallelize a plan that can't possibly benefit?
> IOW I am not sure that we should carry force_parallel_mode this far.

Well, I think the point of force_parallel_mode is to try running
things in workers when it's legal but not necessarily beneficial, so
I'd be disinclined to nerf this too much, but it might be OK to nerf
it a little bit.  Off-hand, I'd say the obvious options are:

(1) teach exec_simple_check_plan() to consider everything non-simple
when force_parallel_mode is not off, or
(2) teach exec_save_simple_expr() to see through a Gather node to the
Result node underneath

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-15 Thread Peter Geoghegan
On Tue, Aug 15, 2017 at 11:33 AM, Peter Eisentraut
 wrote:
> On 8/9/17 18:49, Peter Geoghegan wrote:
>> I'd like to give a demo on what is already possible, but not currently
>> documented. I didn't see anyone else comment on this, including Peter
>> E (maybe I missed that?). We should improve the documentation in this
>> area, to get this into the hands of users.
>
> Here is a small piece of documentation.  Thoughts?

This looks pretty good, but I do have some feedback:

* "23.2.2.3. Copying Collations" suggests that the only use of CREATE
COLLATION is copying collations, which is far from true with ICU. We
should change that at the same time as this change is made. I think
that just changing the title would improve the overall flow of the
page.

* Maybe add an example of numeric ordering -- the "alphanumeric
invoice" case, where you want text containing numbers to have the
numbers sort as numbers iff the comparison is to be resolved when
comparing numbers. I think that that's really useful, and worth
specifically calling out. I definitely would have used that had it
been available ten years ago.

* Let's use "en-u-kr-others-digit" instead of "en-u-kr-latn-digit' in
the example. It makes no real difference to us English speakers, but
means that the example works the same for those that use a different
alphabet. It's more culturally neutral.

* If we end up having initdb put all locales rather than all
collations in pg_collation, which I think is very likely, then we can
put in a link to ICU's locale explorer web resource:

https://ssl.icu-project.org/icu-bin/locexp?d_=en&_=en_HK

This lets the user see exactly what they'll get from a base locale
using an intuitive interface (assuming it matches their CLDR version).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PDF content lemma subdivision

2017-08-15 Thread Peter Eisentraut
On 7/8/17 10:24, Erik Rijkers wrote:
> The PDF-version of the documentation has content-'frame' displayed on 
> the left-hand side (I'm viewing with okular; I assmume it will be 
> similar in most viewers).
> 
> That content displays a treeview down to the main entries/lemmata, like 
> 'CREATE TABLE'.  It doesn't go any deeper anymore.

I have committed a fix for this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-15 Thread Peter Geoghegan
On Tue, Aug 15, 2017 at 11:19 AM, Peter Eisentraut
 wrote:
> On 8/14/17 12:15, Peter Eisentraut wrote:
>> Given that we cannot reasonably preload all these new variants that you
>> demonstrated, I think it would make sense to drop all the keyword
>> variants from the preloaded set.
>
> After playing with this a bit, I'm having some doubts.  While the "k"
> keys from TR 35 are algorithmic parameters that apply to all locales and
> can be looked up in the respective documents, I don't find any way a
> user can discover what collation types ("co") are available for a
> locale.  Any ideas?  If there isn't one, I think we need to provide one.

I wanted to do that too, but Tom didn't seem sold on it yesterday. He
called it v11 material over on the ICU bug thread.

All of the unicode "u" extensions are documented per-CLDR version as
an XML file. For example:

http://www.unicode.org/repos/cldr/tags/release-31/common/bcp47/collation.xml

This isn't ideal, because only some of the "co" variants change things
for all possible base collations. But, there isn't that many "co"
options to choose from, and I think that for the most part it's
reasonably obvious which one is desirable. For example, Chinese people
are probably well aware of what Pinyin is, and what stroke is. Things
like EOR and search are much more esoteric, but also much less useful.
So, I wouldn't hate it if this was the only way that users could
discover the variants in v10.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
Tom Lane  writes:
> Simplify plpgsql's check for simple expressions.
> ...
> https://git.postgresql.org/pg/commitdiff/00418c61244138bd8ac2de58076a1d0dd4f539f3

The buildfarm members that are running force_parallel_mode = regress
are not happy with this.  Apparently, even a trivial SELECT 
can be turned into a Gather plan if force_parallel_mode says so.

I assume (haven't looked) that I could hack the plpgsql code to prevent
generating a parallel plan when it's decided the command is a simple
SELECT.  But I wonder whether that's the right place to fix it.  Does
it ever make sense to parallelize a plan that can't possibly benefit?
IOW I am not sure that we should carry force_parallel_mode this far.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-15 Thread Peter Eisentraut
On 8/9/17 18:49, Peter Geoghegan wrote:
> I'd like to give a demo on what is already possible, but not currently
> documented. I didn't see anyone else comment on this, including Peter
> E (maybe I missed that?). We should improve the documentation in this
> area, to get this into the hands of users.

Here is a small piece of documentation.  Thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From a9d5926b68eb6e0e726b7c9838f6ea8b3b22a157 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 15 Aug 2017 14:31:39 -0400
Subject: [PATCH] doc: Document TR 35 collation options for ICU

---
 doc/src/sgml/charset.sgml | 52 +++
 1 file changed, 52 insertions(+)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 48ecfc5f48..7bb645a39f 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -709,6 +709,58 @@ ICU collations
 will draw an error along the lines of collation "de-x-icu" for
 encoding "WIN874" does not exist.

+
+   
+ICU allows collations to be customized beyond the basic
+language/country/type set that is preloaded by initdb.
+Users are encouraged to define their own collation objects that make use
+of these facilities to suit the sorting behavior to their requirements.
+Here are some examples:
+
+ 
+  CREATE COLLATION digitslast (provider = icu, locale = 
'en-u-kr-latn-digit')
+  
+   
+Sort digits after letters.  (The default is digits before letters.)
+   
+  
+ 
+
+ 
+  CREATE COLLATION upperfirst (provider = icu, locale = 
'en-u-kf-upper')
+  
+   
+Sort upper-case letters before lower-case letters.  (The default is
+lower-case letters first.)
+   
+  
+ 
+
+ 
+  CREATE COLLATION special (provider = icu, locale = 
'en-u-kf-upper-kr-latn-digit')
+  
+   
+Combines both of the above options.
+   
+  
+ 
+
+
+
+See http://unicode.org/reports/tr35/tr35-collation.html;>Unicode
+Technical Standard #35
+and https://tools.ietf.org/html/bcp47;>BCP 47 for
+details.
+   
+
+   
+Note that while this system allows creating collations that ignore
+case or ignore accents or similar (using
+the ks key), PostgreSQL does not at the moment allow
+such collations to act in a truly case- or accent-insensitive manner.  Any
+strings that compare equal according to the collation but are not
+byte-wise equal will be sorted according to their byte values.
+   


 
-- 
2.14.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-15 Thread Peter Eisentraut
On 8/14/17 12:15, Peter Eisentraut wrote:
> Given that we cannot reasonably preload all these new variants that you
> demonstrated, I think it would make sense to drop all the keyword
> variants from the preloaded set.

After playing with this a bit, I'm having some doubts.  While the "k"
keys from TR 35 are algorithmic parameters that apply to all locales and
can be looked up in the respective documents, I don't find any way a
user can discover what collation types ("co") are available for a
locale.  Any ideas?  If there isn't one, I think we need to provide one.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 9:59 AM, Tomas Vondra
 wrote:
>> I don't think so -- the "committed" and "invalid" meanings are
>> effectively canceled when the "frozen" mask is present.
>>
>> I mean, "committed" and "invalid" contradict each other...
>
> FWIW I agree with Craig - the functions should output the masks raw, without
> any filtering. The reason is that when you're investigating data corruption
> or unexpected behavior, all this is very useful when reasoning about what
> might (not) have happened.
>
> Or at least make the filtering optional.

I don't think "filtering" is the right way to think about it.  It's
just labeling each combination of bits with the meaning appropriate to
that combination of bits.

I mean, if you were displaying the contents of a CLOG entry, would you
want the value 3 to be displayed as COMMITTED ABORTED SUBCOMMITTED
because TRANSACTION_STATUS_COMMITTED|TRANSACTION_STATUS_ABORTED ==
TRANSACTION_STATUS_SUB_COMMITTED?

I realize that you may be used to thinking of the HEAP_XMIN_COMMITTED
and HEAP_XMAX_COMMITTED bits as two separate bits, but that's not
really true any more.  They're a 2-bit field that can have one of four
values: committed, aborted, frozen, or none of the above.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expanding inheritance in partition bound order

2017-08-15 Thread Robert Haas
On Wed, Aug 9, 2017 at 10:11 PM, Amit Langote
 wrote:
>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>> is a very good idea.
>
> I put this patch ahead in the list and so it's now 0001.

I think what you've currently got as
0003-Relieve-RelationGetPartitionDispatchInfo-of-doing-an.patch is a
bug fix that probably needs to be back-patched into v10, so it should
come first.

I think 0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch and
0005-Store-in-pg_inherits-if-a-child-is-a-partitioned-tab.patch should
be merged into one patch and that should come next, followed by
0004-Teach-expand_inherited_rtentry-to-use-partition-boun.patch and
finally what you now have as
0001-Decouple-RelationGetPartitionDispatchInfo-from-execu.patch.

This patch series is blocking a bunch of other things, so it would be
nice if you could press forward with this quickly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-08-15 Thread Robert Haas
On Thu, Aug 10, 2017 at 8:00 AM, Ashutosh Bapat
 wrote:
> Attached patches with the comments addressed.

I have committed 0001-0003 as 480f1f4329f1bf8bfbbcda8ed233851e1b110ad4
and e139f1953f29db245f60a7acb72fccb1e05d2442.

0004 doesn't apply any more, probably due to commit
d57929afc7063431f80a0ac4c510fc39aacd22e6.  I think something along
these lines could be separately committed prior to the main patch, and
I think that would be a good idea just to flush out any bugs in this
part independently of the rest.  However, I also think that we
probably ought to try to get Amit Langote's changes to this function
to repair the locking order and expand in bound order committed before
proceeding with these changes.

In fact, I think there's a certain amount of conflict between what's
being discussed over there and what you're trying to do here.  In that
thread, we propose to move partitioned tables at any level to the
front of the inheritance expansion.  Here, however, you want to expand
level by level.  I think partitioned-tables-first is the right
approach for the reasons discussed on the other thread; namely, we
want to be able to prune leaf partitions before expanding them, but
that requires us to expand all the non-leaf tables first to maintain a
consistent locking order in all scenarios.  So the approach you've
taken in this patch may need to be re-thought somewhat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-08-15 Thread Christoph Berg
Re: Thomas Munro 2017-08-10 

Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-15 Thread Tom Lane
Andres Freund  writes:
> I don't see any relevant uses of sockets in older branches, did I miss 
> something?

No, I think we don't need to go back further than v10.  Even if it turns
out we do, I'd just as soon let this get a bit of field testing first.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-15 Thread Andres Freund


On August 15, 2017 8:07:43 AM PDT, Andres Freund  wrote:
>
>
>On August 15, 2017 7:04:59 AM PDT, Tom Lane  wrote:
>>Peter Eisentraut  writes:
>>> On 8/14/17 10:57, Tom Lane wrote:
 I think we could commit add-connected-event-2.patch and call this
 issue resolved.
>>
>>> Would you like to commit your patch then?
>>
>>It's really Andres' patch, but I can push it.
>
>I'll in a bit, unless you prefer to do it. You seem busy enough ;)

Hah, synchronicity. That works too ;)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-15 Thread Andres Freund


On August 15, 2017 7:04:59 AM PDT, Tom Lane  wrote:
>Peter Eisentraut  writes:
>> On 8/14/17 10:57, Tom Lane wrote:
>>> I think we could commit add-connected-event-2.patch and call this
>>> issue resolved.
>
>> Would you like to commit your patch then?
>
>It's really Andres' patch, but I can push it.

I'll in a bit, unless you prefer to do it. You seem busy enough ;)

I don't see any relevant uses of sockets in older branches, did I miss 
something?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-15 Thread Alvaro Herrera
Tom Lane wrote:
> Robert Haas  writes:
> > I really, really strongly encourage you to rip the use of DSA out here
> > entirely.  It is reducing the reliability of a critical part of the
> > system for no actual benefit other than speculation that this is going
> > to be better in the future, and it adds a bunch of failure cases that
> > we could just as well live without.
> 
> FWIW, I vote with Robert on this.  When and if you actually want to make
> that array resizable, it'd be time to introduce use of a DSA.  But right
> now we need to be looking for simple and reliable solutions for v10.

Okay, I'll make it so.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] recovery_target_time = 'now' is not an error but still impractical setting

2017-08-15 Thread Piotr Stefaniak
First I'll describe my setup just to give you some context. If anyone
would like to discuss my ideas or propose their own ideas for
discussion, let's do so on -ADMIN or -GENERAL.

I have multiple production database clusters which I want to make
backups of. Restoring from plain dumps takes too long, so I made an
almost typical continuous archiving setup. The unusual assumption in
this case is that the standbys are all on a single machine and they are
not always running. There are multiple $PGDATA directories on the
backups machine, but only one postmaster running in standby mode,
replaying archived WAL files from each master. When it's finished
replaying them for one $PGDATA, it'll move to another. That way they all
will be sufficiently up to date while not requiring resources needed for
N replicas running all the time on a single machine. This of course
requires that the standbys are never promoted, never change the
timeline, etc. - they need to be able to keep replaying WAL files from
the masters.

I've achieved what I wanted essentially by setting standby_mode = on and
restore_command = 'cp /archivedir/%f "%p" || { pg_ctl -D . stop && false
; }', but I was looking for a more elegant solution. Which brings us to
the topic.

One thing I tried was a combination of recovery_target_action =
'shutdown' and recovery_target_time = 'now'. The result is surprising,
because then the standby tries to set the target to year 2000. That's
because recovery_target_time depends on timestamptz_in(), the result of
which can depend on GetCurrentTransactionStartTimestamp(). But at that
point there isn't any transaction yet. Which is why I'm getting
"starting point-in-time recovery to 2000-01-01 01:00:00+01".

At the very least, I think timestamptz_in() should either complain about
being called outside of transaction or return the expected value,
because returning year 2000 is unuseful at best. I would also like to
become able to do what I'm doing in a less hacky way (assuming there
isn't one already but I may be wrong), perhaps once there is a new
'furthest' setting for recovery_target or when recovery_target_time =
'now' works as I expected.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Masahiko Sawada
On Tue, Aug 15, 2017 at 10:59 PM, Tomas Vondra
 wrote:
>
>
> On 08/15/2017 03:24 PM, Robert Haas wrote:
>>
>> On Mon, Aug 14, 2017 at 9:59 PM, Craig Ringer 
>> wrote:
>>>
>>> The bits are set, those macros just test to exclude the special meaning
>>> of
>>> both bits being set at once to mean "frozen".
>>>
>>> I was reluctant to filter out  HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID
>>> when we detect that it's frozen, because that could well be misleading
>>> when
>>> debugging.
>>
>>
>> I don't think so -- the "committed" and "invalid" meanings are
>> effectively canceled when the "frozen" mask is present.
>>
>> I mean, "committed" and "invalid" contradict each other...
>>
>
> FWIW I agree with Craig - the functions should output the masks raw, without
> any filtering. The reason is that when you're investigating data corruption
> or unexpected behavior, all this is very useful when reasoning about what
> might (not) have happened.
>
> Or at least make the filtering optional.
>

I'd vote for having both and making one optional (perhaps filtering?).
Both are useful to me for the debugging and study purpose.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-15 Thread Tom Lane
Peter Eisentraut  writes:
> On 8/14/17 10:57, Tom Lane wrote:
>> I think we could commit add-connected-event-2.patch and call this
>> issue resolved.

> Would you like to commit your patch then?

It's really Andres' patch, but I can push it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-15 Thread Chris Travers
On Tue, Aug 15, 2017 at 3:32 PM, Tom Lane  wrote:

> Chris Travers  writes:
> > I wonder about a different solution.  Would it be possible to special
> case
> > vacuum to check for and remove (or just move to where they can be
> removed)
> > files when vacuuming pg_class?  At the point we are vacuuming pg_class,
> we
> > ought to be able to know that a relfilenode shouldn't be used anymore,
> > right?
>
> I don't think so.  It's not clear to me whether you have in mind "scan
> pg_class, collect relfilenodes from all live tuples, then zap all files
> not in that set" or "when removing a dead tuple, zap the relfilenode
> it mentions".  But neither one works.  The first case has a race condition
> against new pg_class entries.  As for the second, the existence of a dead
> tuple bearing relfilenode N isn't evidence that some other live tuple
> can't have relfilenode N.
>

Ah because if the file never made it on to disk the number could be
re-used.

>
> Another problem for the second solution is that in the case you're worried
> about (ie, PANIC due to out-of-WAL-space during relation's creating
> transaction), there's no very good reason to expect that the relation's
> pg_class tuple ever made it to disk at all.
>
> A traditional low-tech answer to this has been to keep the WAL on a
> separate volume from the main data store, so that it's protected from
> out-of-space conditions in the main store and temp areas.  The space
> needs for WAL proper are generally much more predictable than the main
> store, so it's easier to keep the dedicated space from overflowing.
> (Stalled replication/archiving processes can be hazardous to your
> health in this scenario, though, if they prevent prompt recycling of
> WAL files.)
>

Yeah, most of our dbs here have wal on a separate volume but not this
system.  This system is also unusual in that disk usage varies wildly (and
I am not 100% sure that this is the only case which causes it though I can
reproduce it consistently in the case of the wal writer running out of disk
space with symptoms exactly what I found).

So for now that leaves my fallback approach as a way to fix it when I see
it.

I have written a shell script which does as follows:
1.  starts Postgres in single user mode with a data directory or dies
(won't run if Postgres seems to be already running)
2.  gets the old of the current database
3.  lists all files consisting of only digits in the  base/[dboid] directory
4. asks Postgres (In single user mode again) for all relfilenodes and oids
of tables (In my testing both were required because there were some cases
where relfilenodes were not set in some system
5.  Loops through the file nodes gathered, checks against the relfilenode
entries, and zaps $f, $f_*, and $f.*.  Currently for testing "zaps" has
been to move to a lostnfound folder for inspection following the script.
The logic here is not perfect and is very slightly under inclusive, but
better that than the other way.

Then we can start Postgres again.  I cannot find a better way to avoid race
conditions, I guess. At any rate it sounds like preventing the problem more
generally may be something beyond what I would feel comfortable trying to
do as a patch at my current level of familiarity with he source code.

The full script is included inline below my signature in case it is of
interest to anyone on the list.


> regards, tom lane
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com

Saarbrücker Straße 37a, 10405 Berlin

---

#!/bin/bash

datadir=$1
database=$2

pg_ctl -D $datadir stop

dboidfile="$PWD/cleanupdb.oid"
reloidfile="$PWD/refilenodes.list"

echo "COPY (select oid from pg_database where datname = current_database())
TO '$dboidfile'" | postgres --single -D $datadir $database > /dev/null


if (($?))
then
   echo "FATAL: Could not start Postgres in single user mode"
   exit 1
fi

dboid=`cat $dboidfile`
filenodes=`(cd test/base/$dboid; ls [0-9]*[0-9] | grep -v '\.' | sort -n)`
#echo $filenodes

echo "COPY (select relfilenode from pg_class union select oid as
relfilenode from pg_class) TO '$reloidfile'" | postgres --single -D
$datadir $database > /dev/null
relfilenodes=`cat $reloidfile`
#echo $relfilenodes
if [[ -z relfilenodes ]]
then
   echo "FATAL: did not get any relfilenodes"
   exit 2
fi

mkdir lostnfound;
for f in $filenodes
do
  if [[ -z `echo $relfilenodes | grep -w $f` ]]
  then
  echo moving $f to lostnfound
  mv $datadir/base/$dboid/$f lostnfound
  mv $datadir/base/$dboid/${f}_* lostnfound 2> /dev/null
  mv $datadir/base/$dboid/${f}.* lostnfound 2> /dev/null
  fi
done
rm $dboidfile
rm $reloidfile


Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-15 Thread Peter Eisentraut
On 8/14/17 10:57, Tom Lane wrote:
> I think we could commit add-connected-event-2.patch and call this
> issue resolved.

Would you like to commit your patch then?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Tomas Vondra



On 08/15/2017 03:24 PM, Robert Haas wrote:

On Mon, Aug 14, 2017 at 9:59 PM, Craig Ringer  wrote:

The bits are set, those macros just test to exclude the special meaning of
both bits being set at once to mean "frozen".

I was reluctant to filter out  HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID
when we detect that it's frozen, because that could well be misleading when
debugging.


I don't think so -- the "committed" and "invalid" meanings are
effectively canceled when the "frozen" mask is present.

I mean, "committed" and "invalid" contradict each other...



FWIW I agree with Craig - the functions should output the masks raw, 
without any filtering. The reason is that when you're investigating data 
corruption or unexpected behavior, all this is very useful when 
reasoning about what might (not) have happened.


Or at least make the filtering optional.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-15 Thread Tom Lane
Robert Haas  writes:
> I really, really strongly encourage you to rip the use of DSA out here
> entirely.  It is reducing the reliability of a critical part of the
> system for no actual benefit other than speculation that this is going
> to be better in the future, and it adds a bunch of failure cases that
> we could just as well live without.

FWIW, I vote with Robert on this.  When and if you actually want to make
that array resizable, it'd be time to introduce use of a DSA.  But right
now we need to be looking for simple and reliable solutions for v10.

In particular, since right now dsm_type = NONE is still considered
supported, we can't really have autovacuum facilities that are dependent
on being able to use DSM.  That might change in future, but not today.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-08-15 Thread Robert Haas
On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
 wrote:
> I have rebased the patches on the latest commit.

This needs another rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-15 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 15, 2017 at 7:31 AM, Amit Kapila  wrote:
>> Attached patch fixes the issue for me.  I have locally verified that
>> the gather merge gets executed in rescan path.  I haven't added a test
>> case for the same as having gather or gather merge on the inner side
>> of join can be time-consuming.  However, if you or others feel that it
>> is important to have a test to cover this code path, then I can try to
>> produce one.

> Committed.

> I believe that between this commit and the test-coverage commit from
> Andres, this open item is reasonably well addressed.  If someone
> thinks more needs to be done, please specify.  Thanks.

How big a deal do we think test coverage is?  It looks like
ExecReScanGatherMerge is identical logic to ExecReScanGather,
which *is* covered according to coverage.postgresql.org, but
it wouldn't be too surprising if they diverge in future.

I should think it wouldn't be that expensive to create a test
case, if you already have test cases that invoke GatherMerge.
Adding a right join against a VALUES clause with a small number of
entries, and a non-mergeable/hashable join clause, ought to do it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-08-15 Thread Robert Haas
On Sat, Aug 12, 2017 at 9:18 AM, Amit Kapila  wrote:
> I think skipping a generation of gather paths for scan node or top
> level join node generated via standard_join_search seems straight
> forward, but skipping for paths generated via geqo seems to be tricky
> (See use of generate_gather_paths in merge_clump).  Assuming, we find
> some way to skip it for top level scan/join node, I don't think that
> will be sufficient, we have some special way to push target list below
> Gather node in apply_projection_to_path, we need to move that part as
> well in generate_gather_paths.

I don't think that can work, because at that point we don't know what
target list the upper node wants to impose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] scan on inheritance parent with no children in current session

2017-08-15 Thread Robert Haas
On Mon, Aug 14, 2017 at 1:49 AM, Ashutosh Bapat
 wrote:
> I have modified the comments that way.

Committed with some cleanup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-08-15 Thread Robert Haas
On Sun, Aug 13, 2017 at 11:20 PM, Michael Paquier
 wrote:
> Recent commit 8d98819 has added a missing permissoin check to lo_put()
> to make sure that the write permissions of the object are properly set
> before writing to a large object. When studying the problem, we bumped
> into the fact that it is possible to actually simplify those ACL
> checks and replace them by checks when opening the large object. This
> makes all the checks now in be-fsstubs.c happen in inv_api.c, which is
> where all the large object handling happens at a lower level. This
> way, it is also possible to remove the extra logic in place to have
> the permission checks happen only once.
>
> At the same time, we have bumped into two things:
> - ALLOW_DANGEROUS_LO_FUNCTIONS has been introduced in 1999, so it
> could be time to let it go. I have known of no place where this is
> actively used.
> - lo_import and lo_export on the backend have superuser checks. We
> could remove them and replace them with proper REVOKE permissions to
> allow administrators to give access to those functions.
>
> Attached is a set of 3 patches:
> - 0001 removes ALLOW_DANGEROUS_LO_FUNCTIONS
> - 0002 replaces the superuser checks with GRANT permissions

+1 for 0001 and 0002 in general, but I can't help noticing that they
lead to a noticeable worsening of the error messages in the regression
tests.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-15 Thread Robert Haas
On Mon, Aug 14, 2017 at 7:16 PM, Alvaro Herrera
 wrote:
>> Yeah, and the other question -- which Thomas asked before you
>> originally committed originally, and which I just now asked again is
>> "Why in the world are you using DSA for this at all?".  There are
>> serious problems with that which both he and I have pointed out, and
>> you haven't explained why it's a good idea at any point, AFAICT.
>
> The main reason is that I envision that the workitem stuff will be used
> for other things in the future than just brin summarization, and it
> seemed a lame idea to just use a fixed-size memory area in the standard
> autovacuum shared memory area.  I think unbounded growth is of course
> going to be bad.  The current coding doesn't allow for any growth beyond
> the initial fixed size, but it's easier to extend the system from the
> current point rather than wholly changing shared memory usage pattern
> while at it.

I don't accept that argument.  All the current code does is allocate
one fixed-size chunk of memory in DSA, so the whole pattern would have
to be changed *anyway* if you wanted to grow the array.   It's not
like you are allocating the items one-by-one.

I really, really strongly encourage you to rip the use of DSA out here
entirely.  It is reducing the reliability of a critical part of the
system for no actual benefit other than speculation that this is going
to be better in the future, and it adds a bunch of failure cases that
we could just as well live without.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-15 Thread Tom Lane
Chris Travers  writes:
> I wonder about a different solution.  Would it be possible to special case
> vacuum to check for and remove (or just move to where they can be removed)
> files when vacuuming pg_class?  At the point we are vacuuming pg_class, we
> ought to be able to know that a relfilenode shouldn't be used anymore,
> right?

I don't think so.  It's not clear to me whether you have in mind "scan
pg_class, collect relfilenodes from all live tuples, then zap all files
not in that set" or "when removing a dead tuple, zap the relfilenode
it mentions".  But neither one works.  The first case has a race condition
against new pg_class entries.  As for the second, the existence of a dead
tuple bearing relfilenode N isn't evidence that some other live tuple
can't have relfilenode N.

Another problem for the second solution is that in the case you're worried
about (ie, PANIC due to out-of-WAL-space during relation's creating
transaction), there's no very good reason to expect that the relation's
pg_class tuple ever made it to disk at all.

A traditional low-tech answer to this has been to keep the WAL on a
separate volume from the main data store, so that it's protected from
out-of-space conditions in the main store and temp areas.  The space
needs for WAL proper are generally much more predictable than the main
store, so it's easier to keep the dedicated space from overflowing.
(Stalled replication/archiving processes can be hazardous to your
health in this scenario, though, if they prevent prompt recycling of
WAL files.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Robert Haas
On Mon, Aug 14, 2017 at 9:59 PM, Craig Ringer  wrote:
> The bits are set, those macros just test to exclude the special meaning of
> both bits being set at once to mean "frozen".
>
> I was reluctant to filter out  HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID
> when we detect that it's frozen, because that could well be misleading when
> debugging.

I don't think so -- the "committed" and "invalid" meanings are
effectively canceled when the "frozen" mask is present.

I mean, "committed" and "invalid" contradict each other...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 7:31 AM, Amit Kapila  wrote:
> On Tue, Aug 15, 2017 at 8:22 AM, Noah Misch  wrote:
>> On Fri, Aug 11, 2017 at 11:59:14AM -0400, Tom Lane wrote:
>>> "Adam, Etienne (Nokia-TECH/Issy Les Moulineaux)"  
>>> writes:
>>> > ERROR:  XX000: unrecognized node type: 90
>>> > LOCATION:  ExecReScan, execAmi.c:284
>>>
>>> (gdb) p (NodeTag) 90
>>> $1 = T_GatherMergeState
>>>
>>> So, apparently somebody wrote ExecReScanGatherMerge, but never bothered
>>> to plug it into ExecReScan.
>
> Attached patch fixes the issue for me.  I have locally verified that
> the gather merge gets executed in rescan path.  I haven't added a test
> case for the same as having gather or gather merge on the inner side
> of join can be time-consuming.  However, if you or others feel that it
> is important to have a test to cover this code path, then I can try to
> produce one.

Committed.

I believe that between this commit and the test-coverage commit from
Andres, this open item is reasonably well addressed.  If someone
thinks more needs to be done, please specify.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Stats for triggers on partitioned tables not shown in EXPLAIN ANALYZE

2017-08-15 Thread Etsuro Fujita

Hi hackers,

I noticed that runtime stats for BEFORE ROW INSERT triggers on leaf 
partitions of partitioned tables aren't reported in EXPLAIN ANALYZE. 
Here is an example:


postgres=# create table trigger_test (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table trigger_test1 partition of trigger_test for 
values in (1);

CREATE TABLE
postgres=# create trigger before_ins_row_trig BEFORE INSERT ON 
trigger_test1 FOR EACH ROW EXECUTE PROCEDURE trigger_data();

CREATE TRIGGER
postgres=# create trigger after_ins_row_trig AFTER INSERT ON 
trigger_test1 FOR EACH ROW EXECUTE PROCEDURE trigger_data();

CREATE TRIGGER
postgres=# explain analyze insert into trigger_test values (1, 'foo');
NOTICE:  before_ins_row_trig() BEFORE ROW INSERT ON trigger_test1
NOTICE:  NEW: (1,foo)
NOTICE:  after_ins_row_trig() AFTER ROW INSERT ON trigger_test1
NOTICE:  NEW: (1,foo)
 QUERY PLAN
-
 Insert on trigger_test  (cost=0.00..0.01 rows=1 width=36) (actual 
time=0.193..0.193 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=36) (actual 
time=0.002..0.003 rows=1 loops=1)

 Planning time: 0.027 ms
 Trigger after_ins_row_trig on trigger_test1: time=0.075 calls=1
 Execution time: 0.310 ms
(5 rows)

where trig_data() is borrowed from the regression test in postgres_fdw. 
The stats for the AFTER ROW INSERT trigger after_ins_row_trig are well 
shown in the output, but the stats for the BEFORE ROW INSERT trigger 
before_ins_row_trig aren't at all.  I think we should show the latter as 
well.


Another thing I noticed is: runtime stats for BEFORE STATEMENT 
UPDATE/DELETE triggers on partitioned table roots aren't reported in 
EXPLAIN ANALYZE, either, as shown in a below example:


postgres=# create trigger before_upd_stmt_trig BEFORE UPDATE ON 
trigger_test FOR EACH STATEMENT EXECUTE PROCEDURE trigger_func();

CREATE TRIGGER
postgres=# create trigger after_upd_stmt_trig AFTER UPDATE ON 
trigger_test FOR EACH STATEMENT EXECUTE PROCEDURE trigger_func();

CREATE TRIGGER
postgres=# explain analyze update trigger_test set b = 'bar' where a = 1;
NOTICE:  trigger_func() called: action = UPDATE, when = BEFORE, 
level = STATEMENT
NOTICE:  trigger_func() called: action = UPDATE, when = AFTER, 
level = STATEMENT

  QUERY PLAN

--
-
 Update on trigger_test  (cost=0.00..25.88 rows=6 width=42) (actual 
time=0.296..0.296 rows=0 loops=1)

   Update on trigger_test1
   ->  Seq Scan on trigger_test1  (cost=0.00..25.88 rows=6 width=42) 
(actual time=0.010..0.011 rows=1 loops=1)

 Filter: (a = 1)
 Planning time: 0.152 ms
 Trigger after_upd_stmt_trig on trigger_test: time=0.141 calls=1
 Execution time: 0.476 ms
(7 rows)

where trigger_func() is borrowed from the regression test, too.  The 
stats for the BEFORE STATEMENT UPDATE trigger before_upd_stmt_trig 
aren't shown at all in the output.  I think this should also be fixed. 
So here is a patch for fixing both issues.  Changes I made are:


* To fix the former, I added a new List member es_leaf_result_relations 
to EState, modified ExecSetupPartitionTupleRouting so that it creates 
ResultRelInfos with the EState's es_instrument and then saves them in 
that list, and modified ExplainPrintTriggers to show stats for BEFORE 
ROW INSERT triggers on leaf partitions (if any) by looking at that list. 
 I also modified copy.c so that ExecSetupPartitionTupleRouting and 
related things are performed in CopyFrom after its EState creation.


* To fix the latter, I modified ExplainPrintTriggers to show stats for 
BEFORE STATEMENT UPDATE/DELETE triggers on partitioned table roots (if 
any) by looking at the es_root_result_relations array.


* While fixing these, I noticed that AFTER ROW INSERT triggers on leaf 
partitions and BEFORE STATEMENT UPDATE/DELETE triggers on partitioned 
table roots re-open relations and re-create ResultRelInfos (trigger-only 
ResultRelInfos!) in ExecGetTriggerResultRel when executing triggers (and 
that in the above examples, the stats for AFTER ROW INSERT trigger/AFTER 
STATEMENT UPDATE trigger are shown the result for 
es_trig_target_relations in ExplainPrintTriggers).  But that wouldn't be 
efficient (and EXPLAIN ANALYZE might produce odd outputs), so I modified 
ExecGetTriggerResultRel so that it searches es_leaf_result_relations and 
es_root_result_relations in addition to es_result_relations.


Best regards,
Etsuro Fujita
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 1415,1473  BeginCopy(ParseState *pstate,
(errcode(ERRCODE_UNDEFINED_COLUMN),
 errmsg("table \"%s\" does not have 
OIDs",
   

Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-15 Thread Amit Kapila
On Tue, Aug 15, 2017 at 8:22 AM, Noah Misch  wrote:
> On Fri, Aug 11, 2017 at 11:59:14AM -0400, Tom Lane wrote:
>> "Adam, Etienne (Nokia-TECH/Issy Les Moulineaux)"  
>> writes:
>> > ERROR:  XX000: unrecognized node type: 90
>> > LOCATION:  ExecReScan, execAmi.c:284
>>
>> (gdb) p (NodeTag) 90
>> $1 = T_GatherMergeState
>>
>> So, apparently somebody wrote ExecReScanGatherMerge, but never bothered
>> to plug it into ExecReScan.

Attached patch fixes the issue for me.  I have locally verified that
the gather merge gets executed in rescan path.  I haven't added a test
case for the same as having gather or gather merge on the inner side
of join can be time-consuming.  However, if you or others feel that it
is important to have a test to cover this code path, then I can try to
produce one.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


execrescan_gathermerge_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-15 Thread Masahiko Sawada
On Tue, Aug 15, 2017 at 11:14 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> On Tue, Aug 15, 2017 at 10:27 AM, Masahiko Sawada  
>> wrote:
>> > Currently vacuum verbose outputs vacuum logs as follows. The first log
>> > message INFO: vacuuming "public.hoge" writes the relation name with
>> > schema name but subsequent vacuum logs output only relation name
>> > without schema name. I've encountered a situation where there are some
>> > same name tables in different schemas and the concurrent vacuum logs
>> > made me hard to distinguish tables. Is there any reasons why we don't
>> > write an explicit name in vacuum verbose logs? If not, can we add
>> > schema names to be more clearly?
>>
>> That's definitely a good idea. lazy_vacuum_rel() uses in one place
>> dbname.schname.relname for autovacuum. This is an inconsistent bit,
>> but that's not really worth changing and there is always
>> log_line_prefix = '%d'.
>
> Worth keeping in mind that INFO messages do not normally go to the
> server log, but rather only to the client.  If it were a problem at the
> server side, you could also suggest adding %p to the log line prefix to
> disambiguate.  Maybe the scenario where this is a real problem is
> vacuumdb -j ...

Yeah, the situation I encountered is that. Invoke vaucuumdb -j and
save its logs to check later.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]

2017-08-15 Thread Marko Tiikkaja
On Tue, Aug 15, 2017 at 7:43 AM, Peter Geoghegan  wrote:

> On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja  wrote:
> > Attached is a patch for $SUBJECT.  It might still be a bit rough around
> the
> > edges and probably light on docs and testing, but I thought I'd post it
> > anyway so I won't forget.
>
> Is it possible for ON CONFLICT DO SELECT to raise a cardinality
> violation error? Why or why not?


No.  I don't see when that would need to happen.  But I'm guessing you have
a case in mind?


.m


Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-08-15 Thread Michael Paquier
On Tue, Aug 15, 2017 at 8:28 AM, Bossart, Nathan  wrote:
> I’ve rebased this patch with master to create v7, which is attached.

Thanks for the rebased patch. I am switching into review mode actively
now, so I'll look at it soon.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-08-15 Thread Fabien COELHO



Here is a rebase.


Argh, sorry, missing attachement... Here it is really.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 03e1212..520daae 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -825,14 +825,31 @@ pgbench  options  dbname
  
   Sets variable varname to a value calculated
   from expression.
-  The expression may contain integer constants such as 5432,
+  The expression may contain the NULL constant,
+  boolean constants TRUE and FALSE,
+  integer constants such as 5432,
   double constants such as 3.14159,
   references to variables :variablename,
-  unary operators (+, -) and binary operators
-  (+, -, *, /,
-  %) with their usual precedence and associativity,
-  function calls, and
-  parentheses.
+  operators
+  with their usual SQL precedence and associativity,
+  function calls,
+  SQL CASE generic conditional
+  expressions and parentheses.
+ 
+
+ 
+  Functions and most operators return NULL on
+  NULL input.
+ 
+
+ 
+  For conditional purposes, non zero numerical values are TRUE,
+  zero numerical values and NULL are FALSE.
+ 
+
+ 
+  When no final ELSE clause is provided to a CASE,
+  the default value is NULL.
  
 
  
@@ -841,6 +858,7 @@ pgbench  options  dbname
 \set ntellers 10 * :scale
 \set aid (1021 * random(1, 10 * :scale)) % \
(10 * :scale) + 1
+\set divx CASE WHEN :x <> 0 THEN :y/:x ELSE NULL END 
 
 

@@ -917,6 +935,177 @@ pgbench  options  dbname
   
  
 
+ 
+  Built-In Operators
+
+  
+   The arithmetic, bitwise, comparison and logical operators listed in
+are built into pgbench
+   and may be used in expressions appearing in
+   \set.
+  
+
+  
+   pgbench Operators by increasing precedence
+   
+
+ 
+  Operator
+  Description
+  Example
+  Result
+ 
+
+
+ 
+  OR
+  logical or
+  5 or 0
+  TRUE
+ 
+ 
+  AND
+  logical and
+  3 and 0
+  FALSE
+ 
+ 
+  NOT
+  logical not
+  not false
+  TRUE
+ 
+ 
+  IS [NOT] (NULL|TRUE|FALSE)
+  value tests
+  1 is null
+  FALSE
+ 
+ 
+  ISNULL|NOTNULL
+  null tests
+  1 notnull
+  TRUE
+ 
+ 
+  =
+  is equal
+  5 = 4
+  FALSE
+ 
+ 
+  
+  is not equal
+  5  4
+  TRUE
+ 
+ 
+  !=
+  is not equal
+  5 != 5
+  FALSE
+ 
+ 
+  
+  lower than
+  5  4
+  FALSE
+ 
+ 
+  =
+  lower or equal
+  5 = 4
+  FALSE
+ 
+ 
+  
+  greater than
+  5  4
+  TRUE
+ 
+ 
+  =
+  greater or equal
+  5 = 4
+  TRUE
+ 
+ 
+  |
+  integer bitwise OR
+  1 | 2
+  3
+ 
+ 
+  #
+  integer bitwise XOR
+  1 # 3
+  2
+ 
+ 
+  
+  integer bitwise AND
+  1  3
+  1
+ 
+ 
+  ~
+  integer bitwise NOT
+  ~ 1
+  -2
+ 
+ 
+  
+  integer bitwise shift left
+  1  2
+  4
+ 
+ 
+  
+  integer bitwise shift right
+  8  2
+  2
+ 
+ 
+  +
+  addition
+  5 + 4
+  9
+ 
+ 
+  -
+  substraction
+  3 - 2.0
+  1.0
+ 
+ 
+  *
+  multiplication
+  5 * 4
+  20
+ 
+ 
+  /
+  division (integer truncates the results)
+  5 / 3
+  1
+ 
+ 
+  %
+  modulo
+  3 % 2
+  1
+ 
+ 
+  -
+  opposite
+  - 2.0
+  -2.0
+ 
+
+   
+  
+ 
+
  
   Built-In Functions
 
@@ -963,6 +1152,13 @@ pgbench  options  dbname
5432.0
   
   
+   exp(x)
+   double
+   exponential
+   exp(1.0)
+   2.718281828459045
+  
+  
greatest(a [, ... ] )
double if any a is double, else integer
largest value among arguments
@@ -984,6 +1180,20 @@ pgbench  options  dbname
2.1
   
   
+   ln(x)
+   double
+   natural logarithm
+   ln(2.718281828459045)
+   1.0
+  
+  
+   mod(i, j)
+   inteter
+   modulo
+   mod(54, 32)
+   22
+  
+  
pi()
double
value of the constant PI
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index b3a2d9b..770be98 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -19,13 +19,17 @@
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
+static PgBenchExpr *make_null_constant(void);
+static PgBenchExpr *make_boolean_constant(bool bval);
 static PgBenchExpr *make_integer_constant(int64 ival);
 static PgBenchExpr *make_double_constant(double dval);
 static PgBenchExpr *make_variable(char *varname);
 static PgBenchExpr 

Re: [HACKERS] pgbench more operators & functions

2017-08-15 Thread Fabien COELHO


Hello Peter,


On 5/24/17 03:14, Fabien COELHO wrote:

I've improved it in attached v11:
  - add a link to the CASE full documentation
  - add an example expression with CASE ...


This patch needs (at least) a rebase for the upcoming commit fest.


Here is a rebase.

It still passes my tests.

From my point of view this patch is mostly ok, after 16 months in the 

pipe.

ISTM that it is not tagged "ready" because there should be tap tests 
attached. However the current tap tests in pgbench are very poor, 
basically nothing at all is tested. There is a patch 
(https://commitfest.postgresql.org/14/1118/) also submitted for adding a 
significant tap test infrastructure, and if it gets through the idea is to 
update this operator patch to cover the different new functions and 
operators. It could be done in reverse order also, i.e. if the operator 
patch get through the tap test patch would be updated to also test the new 
functions and operators. The status of the tap-test patch is that it 
really needs to be tested on Windows.


Note that someone might object that they do not need these operators and 
functions in pgbench so they are useless, hence the patch should be 
rejected. My answer is that standard benchmarks, eg TPC-B, actually use 
these operator classes (bitwise, logical, ln...) so they are needed if 
pgbench is to be used to implement said benchmarks. And if this is not 
desirable, then what is the point of pgbench?


A renew interest is that "\if" commands have recently been added to 
"psql", an "\if" calls for logical expressions, so a next step would be to 
move the expression capability as a shared tool in front-end utils so that 
it may be used both by "pgbench" and "psql". Also, I'll probably submit an 
"\if" extension to pgbench backslash command language, as it is also 
definitely a useful tool in a benchmark script.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Generate wait event list and docs from text file

2017-08-15 Thread Michael Paquier
Hi all,

As $subject has been touched on two threads recently
(https://www.postgresql.org/message-id/CAB7nPqTbHLcHFn6m11tfpwAdgz8BmnBza2jjN9AK=sdx_kb...@mail.gmail.com
and 
https://www.postgresql.org/message-id/20170808213537.wkmmagf2a6i3hjyi@alvherre.pgsql),
the list of wait event and their category are becoming harder to
maintain because of their number and the fact that:
1) It is easy to add a wait event and...
2) It is easy to not update its documentation.
3) the documentation tables get easily broken.

As I have participated in the situation present now, I propose to
write a patch to make the maintenance of the whole thing easier,
because wait events are important, and because I see more of them
added in the future. So here is a thread dedicated to this problem, so
let's discuss a solution to make things more maintainable.

So, like errcodes.txt, let's use two small perl scripts that generate
the set of SGML summarizing wait events and an include file with all
the wait events listed (enum list and sections).

One wait event is associated with the following set of data:
- A class, like Activity, IPC, BufferPin, LWLock, etc.
- A user-visible event name, like AutoVacuumMain, etc.
- A enum-like name, like WAIT_EVENT_AUTOVACUUM_MAIN.
- A description, like "Waiting in main loop of autovacuum launcher process."
Note as well that a class has a user-visible name, and a symbol name.

Using this set of meta-data, it is possible to generate the SGML
tables and the set of include tables.

While thinking about that, I have been wondering about what could be
the shape of the text file holding this meta-data, which needs to be
readable. So here is an idea:
# Comment, which gets ignored when processing.
# First set a section, followed by all the events in it
# A section is name with a class name, and an object name. Perhaps the
object name could be dynamically generated?
Section: ClassName PG_CLASS_NAME

# Each event is listed: user-visible event name, object name,
description within quotes
EventName WAIT_EVENT_NAME "Description of this event name"
EventName2 WAIT_EVENT_NAME_2 "Description of this event name 2"

There is a link between the class name and its variable name, as well
as between the event name and its variable name, so both could be
generated dynamically:
- the class name is upper-cased, and prefixed with PG_.
- the event name is upper-cased, and prefixed with WAIT_EVENT.

Maintaining the description of the object within quotes makes parsing
with a regex easier.

So, thoughts?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2017-08-15 Thread Andres Freund
Hi,


On 2017-06-13 11:50:27 +1000, Haribabu Kommi wrote:
> Here I attached WIP patches to support pluggable storage. The patch series
> are may not work individually. Still so many things are under development.
> These patches are just to share the approach of the current development.

Making a pass through the patchset to get a feel where this at, and
where this is headed.  I previously skimmed the thread to get a rough
sense on what's discused, but not in a very detailed manner.


General:

- I think one important discussion we need to have is what kind of
  performance impact we're going to accept introducing this. It seems
  very likely that this'll cause some slowdown.  We can kind of
  alleviate that by doing some optimizations at the same time, but
  nevertheless, this abstraction is going to cost.

- I don't think we should introduce this without a user besides
  heapam. The likelihood that API will be usable by anything else
  without a testcase seems fairly remote.  I think some src/test/modules
  type implementation of a per-session, in-memory storage - relatively
  easy to implement - or such is necessary.

- I think, and detailed some of that, we're going to need some cleanups
  that go in before this, to decrease the size / increase the quality of
  the new APIs.  It's going to get more painful to change APIs
  subsequently.

- We'll have to document clearly that these APIs are going to change for
  a while, even after the release introducing them.


StorageAm - Scan stuff:

- I think API isn't quite right. There's a lot of granular callback
  functionality like scan_begin_function / scan_begin_catalog /
  scan_begin_bm - these largely are convenience wrappers around the same
  function, and I don't think that would, or rather should, change in
  any abstracted storage layer.  So I think there needs to be some
  unification first (pretty close w/ beginscan_internal already, but
  perhaps we should get rid of a few of these wrappers).

- Some of the exposed functionality, e.g. scan_getpage,
  scan_update_snapshot, scan_rescan_set_params looks like it should just
  be excised, i.e. there's no reason for it to exist.

- Minor: don't think the _function suffix for Storis necessary, just
  makes things long, and every member has it. Besides that, it's also
  easy to misunderstand - for a second I understood
  scan_getnext_function to be about getting the next function...

- Scans are still represented as HeapScanDesc - I don't think that's
  going to fly. Either this needs to be an opaque type (i.e. a struct
  that's not defined, just forward declared), or it needs to be a base
  struct that individual AMs embed in their own structs.  Individual AMs
  definitely are going to need different pieces of data.


Storage AM - tuple stuff:

- tuple_get_{xmin, updated_xid, cmin, itempointer, ctid, heaponly} are
  each individual functions, that seems pretty painful to maintain, and
  v/ likely to just grow and grow. Not sure what the solution is, but
  this seems like a hard sell.

- The three *speculative* functions don't quite seem right to me, nor do
  I understand:
+*
+* Setting a tuple's speculative token is a slot-only operation, so no 
need
+* for a storage AM method, but after inserting a tuple containing a
+* speculative token, the insertion must be completed by these routines:
+*/
  I don't see anything related to slots, here?


Storage AM - slot stuff:


- I think there's a number of wrapper functions (slot_getattr,
  slot_getallattrs, getsomeattrs, attisnull) around the same
  functionality - that bloats the API and causes slowdowns. Instead we
  need something like slot_virtualize_tuple(int16 upto), and the rest
  should just be wrappers.

- I think it's wrong to have the slot functionality defined on the
  StorageAm level.  That'll cause even more indirect function calls (=>
  slowness), and besides that the TupleTableSlot struct will need
  members for each and every Am.

  I think the solution is to instead have an Am level "create slot"
  function, and the returned slot is allocated by the Am, with a base
  member of TupleTableSlot with basically just tts_nvalid, tts_values,
  tts_isnull as members.  Those are the only members that can be
  accessed without functions.

  Access to the individual functions (say store_tuple) would then be
  direct members of the TupleTableSlot interface. While that costs a bit
  of memory, it removes one indirection from an already performance
  critical path.

- MinimalTuples should be one type of slot for the above, except it's
  not created by an StorageAm but by a function returning a
  TupleTableSlot.

  This should remove the need for the slot_copy_min_tuple,
  slot_is_physical_tuple functions.

- Right now TupleTableSlots are an executor datastructure, but
  these patches (rightly!) make it much more widely used. So I think it
  needs to be moved outside of executor/, and probably renamed to
  something like 

Re: [HACKERS] Orphaned files in base/[oid]

2017-08-15 Thread Chris Travers
There's another side to this and that I am not sure it is a backend crash.

Here is what I did to reproduce:

2 virtual disk images:  100mb for main data, 40 MB for WAL.  work_mem set
to 256MB. The idea is to test different out of space conditions.

Create table as ...; drop table; select
pg_size_pretty(pg_current_xlog_location() - '0/0');

I played around with parameters to determine how different kinds of out of
space errors were handled.

1.  running out of temp space was cleaned up without a server restart
needed.
2.  A relation running out of disk space *seemed* to get cleaned up.
3.  Running out of WAL space left *both* temp and non-temp files.

I wonder about a different solution.  Would it be possible to special case
vacuum to check for and remove (or just move to where they can be removed)
files when vacuuming pg_class?  At the point we are vacuuming pg_class, we
ought to be able to know that a relfilenode shouldn't be used anymore,
right?

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin