Re: [BDR] Question on compatibility of Postgres with Open JDK

2019-01-29 Thread Pratheej Chandrika
Craig,
Thanks a lot for the prompt reply.

From what you mentioned, I assume that, other than the client-side JDBC
driver, there is no Java dependency on the server side. That is, I assume
there is no JDK/Java dependency for the Postgres server to work. Request
your input.

Thanks
Pratheej

On Wed, Jan 30, 2019 at 12:04 PM Craig Ringer  wrote:

> I don't understand the question.
>
> If you want to use PgJDBC, then yes, PgJDBC works on OpenJDK. You do not
> need to use pgjdbc 9.4 with PostgreSQL 9.4, you may use the latest version.
>
> If you want to use pl/java, then I don't know if it's been tested with
> postgres-bdr 9.4. But it should work fine if it works with normal community
> postgres 9.4.
>
>
> On Wed, 30 Jan 2019 at 14:28, Pratheej Chandrika <
> pratheej.chandr...@motorolasolutions.com> wrote:
>
>> Hello,
>> We are using Postgresql (postgresql-bdr94).
>>
>> Please let us know whether Postgresql supports Open JDK?
>>
>> Thanks
>> Pratheej
>>
>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
> 
>  2ndQuadrant - PostgreSQL Solutions for the Enterprise
>
>


Question on compatibility of Postgres with Open JDK

2019-01-29 Thread Pratheej Chandrika
Hello,
We are using Postgresql (postgresql-bdr94).

Please let us know whether Postgresql supports Open JDK?

Thanks
Pratheej


Unused parameters & co in code

2019-01-29 Thread Michael Paquier
Hi all,

I just got a run with CFLAGS with the following options:
-Wunused-function -Wunused-variable -Wunused-const-variable
 -Wno-unused-result -Wunused-parameter -Wunused-macros

While this generates a lot of noise for callbacks, and the results depend
on the environment used, it also points to things which could be
simplified:

Unused arguments in functions:
heap_getsysattr() => tupleDesc
brin_start_evacuating_page() => idxRel
UpdateIndexRelation() => parentIndexOid
SerializeReindexState() => maxsize
ginRedoInsertEntry() => isLeaf
ginReadTuple() => ginstate, attnum
startScanKey() => ginstate
resolve_unique_index_expr() => heapRel
gistGetNodeBuffer() => giststate
heap_tuple_needs_freeze() => buf
log_heap_visible() => rnode
transformSQLValueFunction() => pstate
HeapTupleSatisfiesSelf() => snapshot, buffer
HeapTupleSatisfiesAny() (Style enforced by HeapTupleSatisfiesVisibility)
_hash_get_newbucket_from_oldbucket => rel
RelationPutHeapTuple => relation
checkNameSpaceConflicts => pstate
toast_delete_datum => rel
visibilitymap_clear => rel
_bt_relbuf => rel (Quite some cleanup here for the btree code)
ReindexPartitionedIndex => parentIdx
assignOperTypes => typeinfo
storeProcedures => amoid
dropOperators => amoid
dropProcedures => amoid
ATExecColumnDefault => lockmode
rename_constraint_internal => recursing
ATExecDropIdentity => recursing
ATPrepSetStatistics => colNum, newValue, lockmode
ATExecAlterColumnGenericOptions => lockmode
ATPostAlterTypeParse => lockmode
ATExecEnableDisableRule => lockmode
sendAuthRequest => port (Quite some simplification)
get_qual_from_partbound => rel
alloc_edge_table, free_edge_table, gimme_edge, remove_gene => root
set_tablefunc_pathlist => rte
cost_bitmap_and_node
cost_gather_merge
cost_gather
eclass_useful_for_merging => root
initial_cost_hashjoin
clear_external_function_hash => filehandle (could be extended)
pg_SASL_init => payloadlen
reached_end_position => timeline, segment_finished
syncTargetDirectory => argv0
parseAclItem => name
describeAggregates => verbose
blackholeNoticeProcessor => arg

There are also some unused constants and macros:
PREFER in regcomp.c
STATHDRSIZE in tsvector_op.c
NAMESPACE_SQLXML in xml.c (here for documentation)
messageEquals in parallel.c

For some of these functions the signature is determined by the context;
take HeapTupleSatisfiesSelf or some planner routines.  For others, the
unused argument usually comes from deleted code whose removal did not
actually adjust the related function.  Simplifying all of that usually
means back-patching pain.  However, in some cases, for example _bt_relbuf,
ReindexPartitionedIndex or sendAuthRequest, removing the argument can
simplify the code, as there is no need to pass extra arguments down from
an upper layer.  Fixing all of these makes little sense, but the cases
where we get extra simplifications in the code paths calling the functions
seem worth fixing.
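
To illustrate the kind of simplification I mean, take _bt_relbuf: the whole
function is roughly the following (a sketch from memory, not a patch), and
the Relation argument is never looked at:

/* src/backend/access/nbtree/nbtpage.c, roughly */
void
_bt_relbuf(Relation rel, Buffer buf)
{
    ReleaseBuffer(buf);
}

Dropping the unused rel would let callers that only keep a Relation around
so as to be able to call _bt_relbuf() stop carrying it, which is the kind
of extra simplification in the calling code paths I am referring to.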

Thoughts?
--
Michael




Re: jsonpath

2019-01-29 Thread Andres Freund
Hi,

On 2019-01-30 07:34:00 +0300, Alexander Korotkov wrote:
> Thank you for your review!  Let me answer some points of your review
> while others will be answered later.
> 
> On Wed, Jan 30, 2019 at 5:28 AM Andres Freund  wrote:
> > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > > +/*INPUT/OUTPUT*/
> >
> > Why do we need this much code to serialize data that we initially got in
> > serialized manner? Couldn't we just keep the original around?
> 
> As I understand it, your concern relates to the fact that we have jsonpath
> in text form (serialized) and convert it to a binary form (also serialized).
> Yes, we have to do so.  Otherwise, we would have to parse the jsonpath for
> each evaluation, basically for the same reason we have a binary
> representation for jsonb.

But why can't we keep the text around, instead of having all of that
code for converting back?


> > > +++ b/src/backend/utils/adt/jsonpath_exec.c
> >
> > Gotta say, I'm unenthusiastic about yet another execution engine in some
> > PG subsystem.
> 
> Better ideas?  I can imagine we could avoid introducing the jsonpath
> datatype and turn jsonpath into executor nodes.  But I can hardly
> imagine you would be more enthusiastic about that :)

Not executor nodes, I think it could be possible to put it into the
expression evaluation codepaths, but it's probably too different to fit
well (would get you JIT compilation of the whole thing tho).

> > > @@ -206,6 +206,22 @@ Section: Class 22 - Data Exception
> > >  2200N    E    ERRCODE_INVALID_XML_CONTENT                          invalid_xml_content
> > >  2200S    E    ERRCODE_INVALID_XML_COMMENT                          invalid_xml_comment
> > >  2200T    E    ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION           invalid_xml_processing_instruction
> > > +22030    E    ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE              duplicate_json_object_key_value
> > > +22031    E    ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION  invalid_argument_for_json_datetime_function
> > > +22032    E    ERRCODE_INVALID_JSON_TEXT                            invalid_json_text
> > > +22033    E    ERRCODE_INVALID_JSON_SUBSCRIPT                       invalid_json_subscript
> > > +22034    E    ERRCODE_MORE_THAN_ONE_JSON_ITEM                      more_than_one_json_item
> > > +22035    E    ERRCODE_NO_JSON_ITEM                                 no_json_item
> > > +22036    E    ERRCODE_NON_NUMERIC_JSON_ITEM                        non_numeric_json_item
> > > +22037    E    ERRCODE_NON_UNIQUE_KEYS_IN_JSON_OBJECT               non_unique_keys_in_json_object
> > > +22038    E    ERRCODE_SINGLETON_JSON_ITEM_REQUIRED                 singleton_json_item_required
> > > +22039    E    ERRCODE_JSON_ARRAY_NOT_FOUND                         json_array_not_found
> > > +2203A    E    ERRCODE_JSON_MEMBER_NOT_FOUND                        json_member_not_found
> > > +2203B    E    ERRCODE_JSON_NUMBER_NOT_FOUND                        json_number_not_found
> > > +2203C    E    ERRCODE_JSON_OBJECT_NOT_FOUND                        object_not_found
> > > +2203F    E    ERRCODE_JSON_SCALAR_REQUIRED                         json_scalar_required
> > > +2203D    E    ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS                 too_many_json_array_elements
> > > +2203E    E    ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS                 too_many_json_object_members
> >
> > Are all of these invented by us?
> 
> None of them are invented by us.  All of them are part of the SQL:2016 standard.

Cool.

Greetings,

Andres Freund



Re: [BDR] Question on compatibility of Postgres with Open JDK

2019-01-29 Thread Craig Ringer
I don't understand the question.

If you want to use PgJDBC, then yes, PgJDBC works on OpenJDK. You do not
need to use pgjdbc 9.4 with PostgreSQL 9.4, you may use the latest version.

If you want to use pl/java, then I don't know if it's been tested with
postgres-bdr 9.4. But it should work fine if it works with normal community
postgres 9.4.


On Wed, 30 Jan 2019 at 14:28, Pratheej Chandrika <
pratheej.chandr...@motorolasolutions.com> wrote:

> Hello,
> We are using Postgresql (postgresql-bdr94).
>
> Please let us know whether Postgresql supports Open JDK?
>
> Thanks
> Pratheej
>
>


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2019-01-29 Thread Takashi Menjo
Hi,

Sorry, but I found that patchset v2 had a bug in managing the WAL segment
file offset.  I fixed it and updated the patchset as v3 (attached).

Regards,
Takashi

-- 
Takashi Menjo - NTT Software Innovation Center




0001-Add-configure-option-for-PMDK-v3.patch
Description: Binary data


0002-Read-write-WAL-files-using-PMDK-v3.patch
Description: Binary data


0003-Walreceiver-WAL-IO-using-PMDK-v3.patch
Description: Binary data


Re: FETCH FIRST clause PERCENT option

2019-01-29 Thread Surafel Temesgen
On Mon, Jan 28, 2019 at 1:28 AM Tomas Vondra 
wrote:

>
>
> OK. Does that mean you agree the incremental approach is reasonable?
>

There is no noticeable performance difference, but I prefer the previous
approach with regard to cursor operations: it fetches tuples forward and
backward from the tuplestore only, whereas in the incremental approach we
have to re-execute the outer node on every forward and backward fetch.

regards
Surafel


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2019-01-29 Thread Michael Paquier
On Tue, Jan 29, 2019 at 09:51:35PM +0100, Peter Eisentraut wrote:
> On 16/01/2019 09:27, Michael Paquier wrote:
>> index_create does not actually need its extra argument with the tuple
>> descriptor.  I think that we had better grab the column name list from
>> indexInfo and just pass that down to index_create() (patched on my
>> local branch), so it is an overkill to take a full copy of the index's
>> TupleDesc.
> 
> Please send a fixup patch.

Sure.  Attached is a patch which can be applied on top of what you
sent last, based on what I noticed at review, here and there.  You
also forgot to switch two heap_open() to table_open() in pg_depend.c.

>> The patch, standing as-is, is close to 2k lines long, so let's cut
>> that first into more pieces refactoring the concurrent build code.
>> Here are some preliminary notes:
>> - WaitForOlderSnapshots() could be in its own patch.
>> - index_concurrently_build() and index_concurrently_set_dead() can be
>> in an independent patch.  set_dead() had better be a wrapper on top of
>> index_set_state_flags actually which is able to set any kind of
>> flags.
>> - A couple of pieces in index_create() could be cut as well.
> 
> I'm not a fan of that.  I had already considered all the ways in which
> subparts of this patch could get committed, and some of it was
> committed, so what's left now is what I thought should stay together.
> The patch isn't really that big and most of it is moving code around.  I
> would also avoid chopping around in this patch now and focus on getting
> it finished instead.  The functionality seems solid, so if it's good,
> let's commit it, if it's not, let's get it fixed up.

Well, the feature looks solid to me, and not much of its code has
actually changed over the years, FWIW.

Committing large and complex patches is something you have more
experience with than I do, and I find the exercise very difficult.
So if you feel it's appropriate to keep things grouped together, that is
not an issue for me.  I'll follow your lead.
--
Michael
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index bedd9a008d..9b7ef8bf09 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -926,6 +926,7 @@ ERROR:  could not serialize access due to read/write dependencies among transact
 
  Acquired by VACUUM (without FULL),
  ANALYZE, CREATE INDEX CONCURRENTLY,
+ REINDEX CONCURRENTLY,
  CREATE STATISTICS, and certain ALTER
  INDEX and ALTER TABLE variants (for full
  details see  and 
 
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
 
  
 
@@ -67,10 +67,7 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 
   An index build with the CONCURRENTLY option failed, leaving
   an invalid index. Such indexes are useless but it can be
-  convenient to use REINDEX to rebuild them. Note that
-  REINDEX will not perform a concurrent build. To build the
-  index without interfering with production you should drop the index and
-  reissue the CREATE INDEX CONCURRENTLY command.
+  convenient to use REINDEX to rebuild them.
  
 
 
@@ -151,6 +148,21 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 

 
+   
+CONCURRENTLY
+
+ 
+  When this option is used, PostgreSQL will rebuild the
+  index without taking any locks that prevent concurrent inserts,
+  updates, or deletes on the table; whereas a standard reindex build
+  locks out writes (but not reads) on the table until it's done.
+  There are several caveats to be aware of when using this option
+   see .
+ 
+
+   
+

 VERBOSE
 
@@ -241,6 +253,161 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 
 
+  
+   Rebuilding Indexes Concurrently
+
+   
+index
+rebuilding concurrently
+   
+
+   
+Rebuilding an index can interfere with regular operation of a database.
+Normally PostgreSQL locks the table whose index is rebuilt
+against writes and performs the entire index build with a single scan of the
+table. Other transactions can still read the table, but if they try to
+insert, update, or delete rows in the table they will block until the
+index rebuild is finished. This could have a severe effect if the system is
+a live production database. Very large tables can take many hours to be
+indexed, and even for smaller tables, an index rebuild can lock out writers
+for periods that are unacceptably long for a production system.
+   
+
+   
+PostgreSQL supports rebuilding indexes with minimum locking
+of writes.  This method is invoked by specifying the
+CONCURRENTLY option of REINDEX. When this option
+is used, PostgreSQL must perform two scans of the table
+for each index that needs to be rebuild and in addition it 

RE: Protect syscache from bloating with negative cache entries

2019-01-29 Thread Ideriha, Takeshi
Hi

>> > I suggest you go with just syscache_prune_min_age, get that into PG
>> > 12, and we can then reevaluate what we need.  If you want to
>> > hard-code a minimum cache size where no pruning will happen, maybe
>> > based on the system catalogs or typical load, that is fine.
>>
>> Please forgive me if I say something silly (I might have got lost.)
>>
>> Are you suggesting to make the cache size limit system-defined and 
>> uncontrollable
>by the user?  I think it's necessary for the DBA to be able to control the 
>cache memory
>amount.  Otherwise, if many concurrent connections access many partitions 
>within a
>not-so-long duration, then the cache eviction can't catch up and ends up in 
>OOM.
>How about the following questions I asked in my previous mail?
>
>cache_memory_target does the opposite of limiting memory usage. It keeps some
>amount of syscache entries unpruned. It is intended for sessions where
>cache-effective queries run intermittently.
>syscache_prune_min_age also doesn't directly limit the size. It just eventually
>prevents infinite memory consumption.
>
>The knobs are not no-brainer at all and don't need tuning in most cases.
>
>> --
>> This is a pure question.  How can we answer these questions from users?
>>
>> * What value can I set to cache_memory_target when I can use 10 GB for the
>caches and max_connections = 100?
>> * How much RAM do I need to have for the caches when I set 
>> cache_memory_target
>= 1M?
>>
>> The user tends to estimate memory to avoid OOM.
>> --
>
>You don't have direct control over syscache memory usage. When you find a
>query slowed by the default cache expiration, you can set cache_memory_target
>to keep its entries for intermittent execution of the query, or you can
>increase syscache_prune_min_age to allow cache entries to live for a longer
>time.
>


In the current v8 patch there is a stats view representing the age class
distribution.
https://www.postgresql.org/message-id/20181019.173457.68080786.horiguchi.kyotaro%40lab.ntt.co.jp
Does it help the DBA with tuning cache_prune_age and/or cache_prune_target?
If the number of cache entries in the older age classes is large, are people
supposed to lower prune_age and leave cache_prune_target unchanged?
(I am a little confused.)
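
For my own understanding, this is how I read the intended interaction of the
two knobs (a self-contained sketch only; the names mirror the GUCs under
discussion and this is not the patch's actual code):

#include <stdbool.h>
#include <stddef.h>

/* stand-ins for the GUCs being discussed */
static size_t cache_memory_target = 1024 * 1024;    /* bytes kept unpruned */
static double syscache_prune_min_age = 600.0;       /* seconds */

/*
 * An entry becomes a pruning candidate only once the cache has grown past
 * cache_memory_target, and even then only if it has been idle for at least
 * syscache_prune_min_age.  Neither knob is a hard limit on total memory.
 */
static bool
entry_is_prunable(size_t total_cache_bytes, double entry_idle_seconds)
{
    if (total_cache_bytes <= cache_memory_target)
        return false;           /* keep this much cached unconditionally */
    return entry_idle_seconds >= syscache_prune_min_age;
}

If that reading is right, the age-class view mainly helps with choosing
syscache_prune_min_age, while cache_memory_target only sets the floor below
which nothing is pruned.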

Regards,
Takeshi Ideriha




Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Merlin Moncure
On Tuesday, January 29, 2019, Tom Lane  wrote:

> Merlin Moncure  writes:
> > On Tue, Jan 29, 2019 at 1:53 PM Tom Lane  wrote:
> >> I propose that we implement and document this as
> >> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )
>
> > I hate to bikeshed here, but I think it's better English using that
> > style of syntax to say,
> >  WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query )
>
> Hm.  Doesn't really seem to fit with our style for options elsewhere;
> for instance, EXPLAIN's options are things like ANALYZE ON/OFF and
> VERBOSE ON/OFF, not ANALYSIS and VERBOSITY.
>
> regards, tom lane
>

Yep...I'll concede the point.

merlin


Re: jsonpath

2019-01-29 Thread Alexander Korotkov
Hi!

Thank you for your review!  Let me answer some points of your review
while others will be answered later.

On Wed, Jan 30, 2019 at 5:28 AM Andres Freund  wrote:
> On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > +/*INPUT/OUTPUT*/
>
> Why do we need this much code to serialize data that we initially got in
> serialized manner? Couldn't we just keep the original around?

As I understand it, your concern relates to the fact that we have jsonpath
in text form (serialized) and convert it to a binary form (also serialized).
Yes, we have to do so.  Otherwise, we would have to parse the jsonpath for
each evaluation, basically for the same reason we have a binary
representation for jsonb.
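
As a generic illustration of that trade-off (nothing here is the patch's
actual data structure), the difference is between reparsing the path text on
every evaluation and walking a form that was parsed once at input time:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical pre-parsed path: just an array of key names */
typedef struct ParsedPath
{
    int         nsteps;
    char      **steps;
} ParsedPath;

/* done once, at input time (the role jsonpath_in plays) */
static ParsedPath *
parse_path(const char *text)
{
    ParsedPath *p = malloc(sizeof(ParsedPath));
    char       *copy = strdup(text);
    char       *tok;

    p->nsteps = 0;
    p->steps = NULL;
    for (tok = strtok(copy, "."); tok != NULL; tok = strtok(NULL, "."))
    {
        p->steps = realloc(p->steps, (p->nsteps + 1) * sizeof(char *));
        p->steps[p->nsteps++] = strdup(tok);
    }
    free(copy);
    return p;
}

/* done per evaluation: no reparsing, just walk the steps */
static void
evaluate(const ParsedPath *p)
{
    int         i;

    for (i = 0; i < p->nsteps; i++)
        printf("descend into key \"%s\"\n", p->steps[i]);
}

int
main(void)
{
    ParsedPath *p = parse_path("a.b.c");

    evaluate(p);                /* can be repeated cheaply */
    return 0;
}

Keeping only the original text around would push the parse into every
evaluation, which is exactly what the binary form is meant to avoid.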

> > +++ b/src/backend/utils/adt/jsonpath_exec.c
>
> Gotta say, I'm unenthusiastic about yet another execution engine in some
> PG subsystem.

Better ideas?  I can imagine we could avoid introducing the jsonpath
datatype and turn jsonpath into executor nodes.  But I can hardly
imagine you would be more enthusiastic about that :)

> > @@ -206,6 +206,22 @@ Section: Class 22 - Data Exception
> >  2200N    E    ERRCODE_INVALID_XML_CONTENT                          invalid_xml_content
> >  2200S    E    ERRCODE_INVALID_XML_COMMENT                          invalid_xml_comment
> >  2200T    E    ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION           invalid_xml_processing_instruction
> > +22030    E    ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE              duplicate_json_object_key_value
> > +22031    E    ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION  invalid_argument_for_json_datetime_function
> > +22032    E    ERRCODE_INVALID_JSON_TEXT                            invalid_json_text
> > +22033    E    ERRCODE_INVALID_JSON_SUBSCRIPT                       invalid_json_subscript
> > +22034    E    ERRCODE_MORE_THAN_ONE_JSON_ITEM                      more_than_one_json_item
> > +22035    E    ERRCODE_NO_JSON_ITEM                                 no_json_item
> > +22036    E    ERRCODE_NON_NUMERIC_JSON_ITEM                        non_numeric_json_item
> > +22037    E    ERRCODE_NON_UNIQUE_KEYS_IN_JSON_OBJECT               non_unique_keys_in_json_object
> > +22038    E    ERRCODE_SINGLETON_JSON_ITEM_REQUIRED                 singleton_json_item_required
> > +22039    E    ERRCODE_JSON_ARRAY_NOT_FOUND                         json_array_not_found
> > +2203A    E    ERRCODE_JSON_MEMBER_NOT_FOUND                        json_member_not_found
> > +2203B    E    ERRCODE_JSON_NUMBER_NOT_FOUND                        json_number_not_found
> > +2203C    E    ERRCODE_JSON_OBJECT_NOT_FOUND                        object_not_found
> > +2203F    E    ERRCODE_JSON_SCALAR_REQUIRED                         json_scalar_required
> > +2203D    E    ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS                 too_many_json_array_elements
> > +2203E    E    ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS                 too_many_json_object_members
>
> Are all of these invented by us?

None of them are invented by us.  All of them are part of the SQL:2016 standard.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



RE: speeding up planning with partitions

2019-01-29 Thread Imai, Yoshikazu
On Wed, Jan 30, 2019 at 3:26 PM, Alvaro Herrera wrote:
> On 2019-Jan-30, Imai, Yoshikazu wrote:
> 
> > The reason I did these tests is that I wanted to confirm that even if we
> > apply each patch one by one, there's no performance problem. Because the
> > patches are quite large, I felt it might be difficult to commit them all
> > at once, and I thought committing them one by one would be another
> > option. I don't know whether there is a rule in the community for how
> > patches should be committed; if there is, my thoughts above may be off
> > the mark.
> 
> There are no absolute rules, but if I was committing it, I would certainly
> commit each separately, mostly because reviewing the whole series at once
> looks daunting ... and given the proposed commit messages, I'd guess that
> writing a combined commit message would also be very difficult.

Ah, I see.
 
> So thanks for doing these tests.

I'm glad to hear that!


Thanks
--
Yoshikazu Imai



Re: ALTER SESSION

2019-01-29 Thread Kyotaro HORIGUCHI
At Tue, 29 Jan 2019 21:46:28 -0500, Stephen Frost  wrote in 
<20190130024628.ge5...@tamriel.snowman.net>
> Greetings,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-01-29 21:09:22 -0500, Stephen Frost wrote:
> > > * Andres Freund (and...@anarazel.de) wrote:
> > > > On 2019-01-29 20:52:08 -0500, Stephen Frost wrote:
> > > > > * Andres Freund (and...@anarazel.de) wrote:
> > > > > > Leaving the desirability of the feature aside, isn't this racy as 
> > > > > > hell?
> > > > > > I.e. it seems entirely possible that backends stop/start between
> > > > > > determining the PID, and the ALTER SESSION creating the file, and it
> > > > > > actually being processed. By the time that happens an entirely 
> > > > > > different
> > > > > > session might be using that pid.
> > > > > 
> > > > > That seems like something that could possibly be fixed, by adding in
> > > > > other things to make it more likely to be the 'right' backend, but my
> > > > > complaint here is that we are, again, using files to pass data between
> > > > > backend processes and that seems like a pretty terrible direction to 
> > > > > be
> > > > > going in.
> > > > 
> > > > I think pid would be wholly unsuitable for this, and if so we'd have to
> > > > use something entirely independent.
> > > 
> > > I would think you'd use pid + other stuff (user OID, backend proc entry
> > > number, other things).  Basically, if you see a file there with your pid
> > > on it, then you look and see if the other things match- if so, act on
> > > it, if not, discard the file.  I still don't like this approach though,
> > 
> > What do we gain by including the pid here? Seems much more reasonable to
> > use a session id that's just unique over the life of a cluster.
> 
> Are you suggesting we have one of those already, or is the idea that
> we'd add a cluster-lifetime session id for this?

Just a 32-bit counter would suffice for such a purpose. But with the
attached patch, the worst thing that can happen is that a new session
reads the single config line written by the last command, which doesn't
seem harmful.. (Of course not the best, though.)
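
To be concrete about the counter idea, something like the following is what
I have in mind (hypothetical names only; none of this is in the attached
patch):

#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Identity a command like ALTER SESSION could target instead of a bare PID:
 * a PID can be recycled, but (pid, session_id) cannot be if session_id is a
 * counter bumped in shared memory for every backend start over the life of
 * the cluster.
 */
typedef struct SessionKey
{
    pid_t       pid;
    uint32_t    session_id;     /* taken from a shared 32-bit counter */
} SessionKey;

/* the target backend applies the per-session file only if both fields match */
static bool
session_key_matches(const SessionKey *mine, const SessionKey *target)
{
    return mine->pid == target->pid && mine->session_id == target->session_id;
}

With that, a stale file written for a previous owner of the same PID would
simply be ignored.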

> > > I really don't think files are the right way to be going about this.
> > 
> > Why? They persist and can be removed, they are introspectable, they
> > automatically are removed from memory when there's no demand...
> 
> Well, we don't actually want these to persist, and it's because they do
> that we have to deal with removing them, and I don't see a whole lot of
> gain from them being introspectable; indeed, that seems like more of a
> drawback than anything since it will invite people to whack those files
> around and abuse them as if they were some externally documented
> interface.

.auto.conf is already a kind of such interface.. My first version
signaled the change via shared memory (in a largely improvable way) and
added to the GUC system the complex "nontransactional SET" feature,
which lets a change persist beyond transaction end, if any. Holding
changes until the session becomes idle also seems complex.

https://www.postgresql.org/message-id/20181127.193622.252197705.horiguchi.kyot...@lab.ntt.co.jp

The most significant reason for passing by file is its affinity with
the current GUC system.

> They also cost disk space, they require inodes, they have to be cleaned
> up and managed on shutdown/restart, backup tools need to understand what
> to do with them, potentially, we have to consider if we should have a
> checksum for them, we have to handle out-of-disk space cases with them,
> they could cause us to run out of disk space...

The files are too small to cause such problems easily, but I agree
that file handling is bothersome and fragile. A weak point of
signalling via shared memory is its incompatibility with the current
GUC system, as I mentioned above.

> These same arguments could have been made about how we could have
> implemented parallel query too.  I agree that the use-case is somewhat
> different there but there's also a lot of similarity when it comes to
> managing this passing of information to that use-case.

Parallel query passes data via DSM?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From ceabf24d061a15ed39db544a0972ba199ce2c4bf Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 29 Jan 2019 17:03:57 +0900
Subject: [PATCH] ALTER SESSION

Some tracking features are controlled by GUCs, so they cannot be
activated from another session. The ALTER SESSION command allows a
session to change some of the GUC settings of another session.
---
 doc/src/sgml/config.sgml |  26 ++-
 doc/src/sgml/ref/allfiles.sgml   |   1 +
 doc/src/sgml/ref/alter_session.sgml  | 160 
 doc/src/sgml/reference.sgml  |   1 +
 src/backend/nodes/copyfuncs.c|  14 ++
 src/backend/nodes/equalfuncs.c   |  12 ++
 src/backend/parser/gram.y|  51 -
 src/backend/postmaster/postmaster.c  |   6 +
 src/backend/replication/basebackup.c |   3 +
 src/backend/tcop/utility.c

Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Tom Lane
David Fetter  writes:
> On Tue, Jan 29, 2019 at 02:52:44PM -0500, Tom Lane wrote:
>> I propose that we implement and document this as
>> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )

> I think this would be better with parentheses like this: 
> WITH ctename [ ( MATERIALIZE { ON | OFF } ) ] AS ( query ) [, ... ]

I take it you haven't actually been reading this thread.

regards, tom lane



Re: WIP: Avoid creation of the free space map for small tables

2019-01-29 Thread Amit Kapila
On Tue, Jan 29, 2019 at 8:12 PM John Naylor  wrote:
>
> On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  wrote:
>
> > You can find this change in attached patch.  Then, I ran the make
> > check in src/bin/pgbench multiple times using test_conc_insert.sh.
> > You can vary the number of times the test should run, if you are not
> > able to reproduce it with this.
> >
> > The attached patch (clear_local_map_if_exists_1.patch) atop the main
> > patch fixes the issue for me.  Kindly verify the same.
>
> I got one failure in 50 runs. With the new patch, I didn't get any
> failures in 300 runs.
>

Thanks for the verification.  I have included it in the attached patch,
and I have also modified the page.sql test to have enough pages in the
relation so that the FSM will get created irrespective of alignment
boundaries.  Masahiko-san, can you verify whether this now works for you?

There are two more failures which we need to do something about.
1. Make fsm.sql independent of vacuum without losing much coverage of
the newly added code.  John, I guess you have an idea; see if you can
take care of it, otherwise I will see what I can do for it.
2. I still could not figure out how to verify whether the failure on
Jacana will be fixed.  I have posted a theory above, and the attached
patch has a solution for it, but I think it would be better if we find
some way to verify it.

Note - you might see some cosmetic changes in freespace.c due to pgindent.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v19-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: speeding up planning with partitions

2019-01-29 Thread Alvaro Herrera
On 2019-Jan-30, Imai, Yoshikazu wrote:

> The reason I did these tests is that I wanted to confirm that even if we
> apply each patch one by one, there's no performance problem. Because the
> patches are quite large, I felt it might be difficult to commit them all
> at once, and I thought committing them one by one would be another
> option. I don't know whether there is a rule in the community for how
> patches should be committed; if there is, my thoughts above may be off
> the mark.

There are no absolute rules, but if I was committing it, I would
certainly commit each separately, mostly because reviewing the whole
series at once looks daunting ... and given the proposed commit
messages, I'd guess that writing a combined commit message would also be
very difficult.

So thanks for doing these tests.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: speeding up planning with partitions

2019-01-29 Thread Imai, Yoshikazu
Amit-san,

On Tue, Jan 29, 2019 at 10:11 AM, Amit Langote wrote:
> On 2019/01/28 10:44, Imai, Yoshikazu wrote:
> > On Thu, Jan 24, 2019 at 6:10 AM, Imai, Yoshikazu wrote:
> >> updating partkey case:
> >>
> >> part-num  master   0001     0002     0003     0004
> >> 1        8215.34  7924.99  7931.15  8407.40  8475.65
> >> 2        7137.49  7026.45  7128.84  7583.08  7593.73
> >> 4        5880.54  5896.47  6014.82  6405.33  6398.71
> >> 8        4222.96  4446.40  4518.54  4802.43  4785.82
> >> 16       2634.91  2891.51  2946.99  3085.81  3087.91
> >> 32        935.12  1125.28  1169.17  1199.44  1202.04
> >> 64        352.37   405.27   417.09   425.78   424.53
> >> 128       236.26   310.01   307.70   315.29   312.81
> >> 256        65.36    86.84    87.67    84.39    89.27
> >> 512        18.34    24.84    23.55    23.91    23.91
> >> 1024        4.83     6.93     6.51     6.45     6.49
> >
> > I also tested the non-partitioned table case.
> >
> > updating partkey case:
> >
> > part-num  master   0001     0002     0003     0004
> > 0        10956.7  10370.5  10472.6  10571.0  10581.5
> > 1        8215.34  7924.99  7931.15  8407.40  8475.65
> > ...
> > 1024        4.83     6.93     6.51     6.45     6.49
> >
> >
> > In my performance results, it seems update performance degrades in the
> > non-partitioned case with the v17 patch applied.
> > But it seems this degradation did not happen with the v2 patch.
> >
> > On Thu, Aug 30, 2018 at 1:45 AM, Amit, Langote wrote:
> >> UPDATE:
> >>
> >> nparts  master  0001  0002  0003
> >> ======  ======  ====  ====  ====
> >> 0         2856  2893  2862  2816
> >
> > Does this degradation only occur in my tests? Or, if this result is
> > correct, what may cause the degradation?
> 
> I re-ran tests with v18 using the following setup [1]:
> 
> create table ht (a int, b int) partition by hash (b); create table ht_0
> partition of ht for values with (modulus N, remainder 0); ...
> $ cat update-noprune.sql
> update ht set a = 0;
> 
> pgbench -n -T 60 -f update-noprune.sql
> 
> TPS:
> 
> nparts  master  0001  0002  0003  0004
> ======  ======  ====  ====  ====  ====
> 0         4408  4335  4423  4379  4314
> 1         3883  3873  3679  3856  4007
> 2         3495  3476  3477  3500  3627
> 
> I can see some degradation for a small number of partitions, but maybe it's
> just noise?  At least, I don't yet have a systematic explanation for that
> happening.

Thanks for testing.

I also re-ran the tests with v18 using almost the same settings as before,
but this time I ran pgbench for 60 seconds instead of the 30 seconds used in
the previous test.

TPS:

 nparts  master  0001   0002   0003   0004
 ======  ======  =====  =====  =====  =====
 0        10794  11018  10761  10552  11066
 1         7574   7625   7558   8071   8219
 2         6745   6778   6746   7281   7344

I can see no degradation, so I also think that the performance degradation in
my previous test and your test was just noise.


The reason I did these tests is that I wanted to confirm that even if we apply
each patch one by one, there's no performance problem. Because the patches are
quite large, I felt it might be difficult to commit them all at once, and I
thought committing them one by one would be another option. I don't know
whether there is a rule in the community for how patches should be committed;
if there is, my thoughts above may be off the mark.

Anyway, I'll restart code reviewing :)

--
Yoshikazu Imai


RE: [HACKERS] Cached plans and statement generalization

2019-01-29 Thread Nagaura, Ryohei
Hi,

On Mon, Jan 28, 2019 at 10:46 PM, Konstantin Knizhnik wrote:
> Rebased version of the patch is attached.
Thank you for your quick rebase.

> These files are just the output of test execution, and it is not possible to
> expect an absence of trailing spaces in the output of test script execution.
I understand this is because regression tests use exact matching.
I learned a lot. Thanks.

Best regards,
-
Ryohei Nagaura



Re: ALTER SESSION

2019-01-29 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-01-29 21:09:22 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2019-01-29 20:52:08 -0500, Stephen Frost wrote:
> > > > * Andres Freund (and...@anarazel.de) wrote:
> > > > > Leaving the desirability of the feature aside, isn't this racy as 
> > > > > hell?
> > > > > I.e. it seems entirely possible that backends stop/start between
> > > > > determining the PID, and the ALTER SESSION creating the file, and it
> > > > > actually being processed. By the time that happens an entirely 
> > > > > different
> > > > > session might be using that pid.
> > > > 
> > > > That seems like something that could possibly be fixed, by adding in
> > > > other things to make it more likely to be the 'right' backend, but my
> > > > complaint here is that we are, again, using files to pass data between
> > > > backend processes and that seems like a pretty terrible direction to be
> > > > going in.
> > > 
> > > I think pid would be wholly unsuitable for this, and if so we'd have to
> > > use something entirely independent.
> > 
> > I would think you'd use pid + other stuff (user OID, backend proc entry
> > number, other things).  Basically, if you see a file there with your pid
> > on it, then you look and see if the other things match- if so, act on
> > it, if not, discard the file.  I still don't like this approach though,
> 
> What do we gain by including the pid here? Seems much more reasonable to
> use a session id that's just unique over the life of a cluster.

Are you suggesting we have one of those already, or is the idea that
we'd add a cluster-lifetime session id for this?

> > I really don't think files are the right way to be going about this.
> 
> Why? They persist and can be removed, they are introspectable, they
> automatically are removed from memory when there's no demand...

Well, we don't actually want these to persist, and it's because they do
that we have to deal with removing them, and I don't see a whole lot of
gain from them being introspectable; indeed, that seems like more of a
drawback than anything since it will invite people to whack those files
around and abuse them as if they were some externally documented
interface.

They also cost disk space, they require inodes, they have to be cleaned
up and managed on shutdown/restart, backup tools need to understand what
to do with them, potentially, we have to consider if we should have a
checksum for them, we have to handle out-of-disk space cases with them,
they could cause us to run out of disk space...

These same arguments could have been made about how we could have
implemented parallel query too.  I agree that the use-case is somewhat
different there but there's also a lot of similarity when it comes to
managing this passing of information to that use-case.

Thanks!

Stephen




Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread David Fetter
On Tue, Jan 29, 2019 at 02:52:44PM -0500, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote:
> >> Yeah, I thought about that too, but it doesn't seem like an improvement.
> >> If the query is very long (which isn't unlikely) I think people would
> >> prefer to see the option(s) up front.
> 
> > Having these options at the front of the WITH clause looks more
> > natural to me.
> 
> Well, we've managed to get agreement on the semantics of this thing,
> let's not get hung up on the syntax details.
> 
> I propose that we implement and document this as
> 
>   WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )

I think this would be better with parentheses like this: 

WITH ctename [ ( MATERIALIZE { ON | OFF } ) ] AS ( query ) [, ... ]

and it's a lot easier to add more query hints later.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: jsonpath

2019-01-29 Thread Andres Freund
Hi,

On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> +/*
> + * Make datetime type from 'date_txt' which is formated at argument 'fmt'.
> + * Actual datatype (returned in 'typid', 'typmod') is determined by
> + * presence of date/time/zone components in the format string.
> + */
> +Datum
> +to_datetime(text *date_txt, text *fmt, char *tzname, bool strict, Oid *typid,
> + int32 *typmod, int *tz)

Given other to_ functions being SQL callable, I'm not a fan of the
naming here.

> +{
> +    struct pg_tm tm;
> +    fsec_t      fsec;
> +    int         fprec = 0;
> +    int         flags;
> +
> +    do_to_timestamp(date_txt, fmt, strict, &tm, &fsec, &fprec, &flags);
> +
> +    *typmod = fprec ? fprec : -1;   /* fractional part precision */
> +    *tz = 0;
> +
> +    if (flags & DCH_DATED)
> +    {
> +        if (flags & DCH_TIMED)
> +        {
> +            if (flags & DCH_ZONED)
> +            {
> +                TimestampTz result;
> +
> +                if (tm.tm_zone)
> +                    tzname = (char *) tm.tm_zone;
> +
> +                if (tzname)
> +                {
> +                    int         dterr = DecodeTimezone(tzname, tz);
> +
> +                    if (dterr)
> +                        DateTimeParseError(dterr, tzname, "timestamptz");


Do we really need 6/7 indentation levels in new code?




> +jsonpath_scan.c: FLEXFLAGS = -CF -p -p

Why -CF, and why is -p repeated?


> - JsonEncodeDateTime(buf, val, DATEOID);
> + JsonEncodeDateTime(buf, val, DATEOID, NULL);

ISTM API changes like this ought to be done in a separate patch, to ease
review.


>  }
>  
> +
>  /*
>   * Compare two jbvString JsonbValue values, a and b.
>   *

Random WS change.


> +/*INPUT/OUTPUT*/

Why do we need this much code to serialize data that we initially got in
serialized manner? Couldn't we just keep the original around?


> +Datum
> +jsonpath_in(PG_FUNCTION_ARGS)
> +{

No binary input support? I'd suggest adding that, but keeping the
representation the same. Otherwise it just adds unnecessary complexity for
driver authors wishing to use the binary protocol.



> +/Support functions for 
> JsonPath**/
> +
> +/*
> + * Support macroses to read stored values
> + */

s/macroses/macros/


> +++ b/src/backend/utils/adt/jsonpath_exec.c

Gotta say, I'm unenthusiastic about yet another execution engine in some
PG subsystem.


> @@ -0,0 +1,2776 @@
> +/*-
> + *
> + * jsonpath_exec.c
> + *Routines for SQL/JSON path execution.
> + *
> + * Copyright (c) 2019, PostgreSQL Global Development Group
> + *
> + * IDENTIFICATION
> + *   src/backend/utils/adt/jsonpath_exec.c
> + *
> + *-
> + */

Somewhere there needs to be higher level documentation explaining how
this stuff is supposed to work on a code level.

> +
> +/* strict/lax flags is decomposed into four [un]wrap/error flags */
> +#define jspStrictAbsenseOfErrors(cxt)(!(cxt)->laxMode)
> +#define jspAutoUnwrap(cxt)   ((cxt)->laxMode)
> +#define jspAutoWrap(cxt) ((cxt)->laxMode)
> +#define jspIgnoreStructuralErrors(cxt)   ((cxt)->ignoreStructuralErrors)
> +#define jspThrowErrors(cxt)  ((cxt)->throwErrors)

What's the point?


> +#define ThrowJsonPathError(code, detail) \
> + ereport(ERROR, \
> +  (errcode(ERRCODE_ ## code), \
> +   errmsg(ERRMSG_ ## code), \
> +   errdetail detail))
> +
> +#define ThrowJsonPathErrorHint(code, detail, hint) \
> + ereport(ERROR, \
> +  (errcode(ERRCODE_ ## code), \
> +   errmsg(ERRMSG_ ## code), \
> +   errdetail detail, \
> +   errhint hint))

These seem ill-advised, just making it harder to understand the code,
and grepping for it, without actually meaningfully simplifying anything.


> +/*
> + * Find value of jsonpath variable in a list of passing params
> + */

What is that comment supposed to mean?


> +/*
> + * Get the type name of a SQL/JSON item.
> + */
> +static const char *
> +JsonbTypeName(JsonbValue *jb)
> +{

Wasn't there pretty similar infrastructure elsewhere?



> +/*
> + * Cross-type comparison of two datetime SQL/JSON items.  If items are
> + * uncomparable, 'error' flag is set.
> + */

Sounds like it'd not raise an error, but it does in a bunch of scenarios.


> @@ -206,6 +206,22 @@ Section: Class 22 - Data Exception
>  2200NEERRCODE_INVALID_XML_CONTENT 

Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2019-01-29 Thread Hubert Zhang
Hi Michael, Robert
For your question about the hook positions, I want to explain more about the
background of why we want to introduce these hooks.
We wrote a diskquota extension for PostgreSQL (inspired by Heikki's
pg_quota). The diskquota extension is used to control disk usage in
PostgreSQL in a fine-grained way, which means:
1. You can set a disk quota limit at the schema level or role level.
2. A background worker gathers the current disk usage for each
schema/role in real time.
3. A background worker generates a blacklist of schemas/roles whose
quota limit is exceeded.
4. New transactions that want to insert data into a schema/role on the
blacklist will be cancelled.

In step 2, gathering the current disk usage for each schema requires summing
the disk size of all the tables in that schema, which is a time-consuming
operation. We want to use hooks in smgr to detect Active Tables, and then
only recalculate the disk size of those Active Tables.
For example, the smgrextend hook indicates that a new block is being
allocated and that the table needs to be treated as an Active Table.
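
As a rough sketch of what our hook would do (assuming a hook point whose
signature mirrors smgrextend(); the hook type and variable below are
illustrative, not the exact API in our patch):

#include "postgres.h"
#include "storage/smgr.h"

/* illustrative only: a hook taking smgrextend()'s argument list */
typedef void (*smgrextend_hook_type) (SMgrRelation reln, ForkNumber forknum,
                                      BlockNumber blocknum, char *buffer,
                                      bool skipFsync);

static smgrextend_hook_type prev_smgrextend_hook = NULL;

/*
 * Remember that this relation grew, so the diskquota background worker only
 * re-sums the sizes of relations that actually changed instead of every
 * table in the schema.
 */
static void
diskquota_smgrextend_hook(SMgrRelation reln, ForkNumber forknum,
                          BlockNumber blocknum, char *buffer, bool skipFsync)
{
    if (prev_smgrextend_hook)
        prev_smgrextend_hook(reln, forknum, blocknum, buffer, skipFsync);

    /* record reln->smgr_rnode.node as an Active Table (details elided) */
}

In _PG_init() the extension would install this in the usual way: save the
previous hook value and put its own function in place.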

Do you have better hook positions to recommend for the above use case?
Thanks in advance.

Hubert





On Tue, Jan 22, 2019 at 12:08 PM Hubert Zhang  wrote:

> > For this particular purpose, I don't immediately see why you need a
>> > hook in both places.  If ReadBuffer is called with P_NEW, aren't we
>> > guaranteed to end up in smgrextend()?
>> Yes, that's a bit awkward.
>
>
>  Hi Michael, we revisited the ReadBuffer hook and removed it in the latest
> patch.
> The ReadBuffer hook was originally used to do enforcement (e.g. out of
> diskquota limit) while a query is loading data.
> We plan to move the enforcement work for running queries to a separate
> diskquota worker process:
> the worker process detects the backends to be cancelled and sends SIGINT
> to those backends.
> So there is no need for the ReadBuffer hook anymore.
>
> Our patch currently only contains smgr-related hooks to catch file
> changes and get the Active Table list for the diskquota extension.
>
> Thanks Hubert.
>
>
> On Mon, Jan 7, 2019 at 6:56 PM Haozhou Wang  wrote:
>
>> Thanks very much for your comments.
>>
>> To the best of my knowledge, smgr is a layer that abstracts the storage
>> operations. Therefore, it is a good place to control, or collect information
>> about, the storage operations without touching the physical storage layer.
>> Moreover, smgr comes with the actual disk IO operations (not considering the
>> OS cache) for Postgres, so we do not need to worry about buffer
>> management in Postgres.
>> That makes the purpose of the hook pure: a hook for actual disk IO.
>>
>> Regards,
>> Haozhou
>>
>> On Wed, Dec 26, 2018 at 1:56 PM Michael Paquier 
>> wrote:
>>
>>> On Wed, Nov 21, 2018 at 09:47:44AM -0500, Robert Haas wrote:
>>> > +1 for adding some hooks to support this kind of thing, but I think
>>> > the names you've chosen are not very good.  The hook name should
>>> > describe the place from which it is called, not the purpose for which
>>> > one imagines that it will be used, because somebody else might imagine
>>> > another use.  Both BufferExtendCheckPerms_hook_type and
>>> > SmgrStat_hook_type are imagining that they know what the hook does -
>>> > CheckPerms in the first case and Stat in the second case.
>>>
>>> I personally don't mind making Postgres more pluggable, but I don't
>>> think that we actually need the extra ones proposed here at the layer
>>> of smgr, as smgr is already a layer designed to call an underlying set
>>> of APIs able to extend, unlink, etc. depending on the storage type.
>>>
>>> > For this particular purpose, I don't immediately see why you need a
>>> > hook in both places.  If ReadBuffer is called with P_NEW, aren't we
>>> > guaranteed to end up in smgrextend()?
>>>
>>> Yes, that's a bit awkward.
>>> --
>>> Michael
>>
>>
>
> --
> Thanks
>
> Hubert Zhang
>


-- 
Thanks

Hubert Zhang


Re: ALTER SESSION

2019-01-29 Thread Andres Freund
On 2019-01-29 21:09:22 -0500, Stephen Frost wrote:
> Greetings,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-01-29 20:52:08 -0500, Stephen Frost wrote:
> > > * Andres Freund (and...@anarazel.de) wrote:
> > > > Leaving the desirability of the feature aside, isn't this racy as hell?
> > > > I.e. it seems entirely possible that backends stop/start between
> > > > determining the PID, and the ALTER SESSION creating the file, and it
> > > > actually being processed. By the time that happens an entirely different
> > > > session might be using that pid.
> > > 
> > > That seems like something that could possibly be fixed, by adding in
> > > other things to make it more likely to be the 'right' backend, but my
> > > complaint here is that we are, again, using files to pass data between
> > > backend processes and that seems like a pretty terrible direction to be
> > > going in.
> > 
> > I think pid would be wholly unsuitable for this, and if so we'd have to
> > use something entirely independent.
> 
> I would think you'd use pid + other stuff (user OID, backend proc entry
> number, other things).  Basically, if you see a file there with your pid
> on it, then you look and see if the other things match- if so, act on
> it, if not, discard the file.  I still don't like this approach though,

What do we gain by including the pid here? Seems much more reasonable to
use a session id that's just unique over the life of a cluster.


> I really don't think files are the right way to be going about this.

Why? They persist and can be removed, they are introspectable, they
automatically are removed from memory when there's no demand...

Greetings,

Andres Freund



Re: speeding up planning with partitions

2019-01-29 Thread David Rowley
On Tue, 29 Jan 2019 at 22:32, Amit Langote
 wrote:
>
> On 2019/01/29 11:23, David Rowley wrote:
> > 7. In set_inherit_target_rel_sizes() I still don't really like the way
> > you're adjusting the EquivalenceClasses. Would it not be better to
> > invent a function similar to adjust_appendrel_attrs(), or just use
> > that function?
>
> OK, I added a function copy_eq_classes_for_child_root() that simply makes
> a copy of eq_classes from the source root (deep-copying where applicable)
> and returns the list.

Hmm, but you've added a case for SpecialJoinInfo. Is there a good
reason not to do the translation by adding an EquivalenceClass case to
adjust_appendrel_attrs_mutator() and then getting rid of the new
"replace" flag in add_child_rel_equivalences()? That way you'd also
remove the churn in the couple of places where you've had to modify the
existing calls to add_child_rel_equivalences().

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: ALTER SESSION

2019-01-29 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-01-29 20:52:08 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > Leaving the desirability of the feature aside, isn't this racy as hell?
> > > I.e. it seems entirely possible that backends stop/start between
> > > determining the PID, and the ALTER SESSION creating the file, and it
> > > actually being processed. By the time that happens an entirely different
> > > session might be using that pid.
> > 
> > That seems like something that could possibly be fixed, by adding in
> > other things to make it more likely to be the 'right' backend, but my
> > complaint here is that we are, again, using files to pass data between
> > backend processes and that seems like a pretty terrible direction to be
> > going in.
> 
> I think pid would be wholly unsuitable for this, and if so we'd have to
> use something entirely independent.

I would think you'd use pid + other stuff (user OID, backend proc entry
number, other things).  Basically, if you see a file there with your pid
on it, then you look and see if the other things match- if so, act on
it, if not, discard the file.  I still don't like this approach though,

> > Isn't there a whole system for passing information between different
> > backend processes that we could and probably should be using here
> > instead..?  I get that it wasn't quite intended for this originally, but
> > hopefully it would be possible to make it work...
> 
> I'm not sure which system you're referring to? Procsignals? Those rely
> on the fact that it's harmless to send such signals even when the pid
> has been recycled, so that doesn't really address the issue.  And
> realistically, you're going to need something to persist such settings
> to - they're not fixed size, and using DSM here would complicate things
> to a significant degree. I don't think files would necessarily be wrong
> here, if we actually want this; alternatively we could go with some
> magic catalog, but that'd be a lot of infrastructure for not
> particularly much gain.

I would think we'd use proc signals to say "hey, go check this when you
get a chance" or similar, but, no, I was thinking for actually passing
the data we'd use a DSM.  I can see how that would complicate things but
that seems like something we might be able to solve by making it easier
to use them for this simplified use-case.

I really don't think files are the right way to be going about this.

A magic catalog sounds like an interesting idea.  Another thought I had
was something around pipes but it seems like that would require we have
pipes between every pair of backends..  Instead, I'd think we'd have a
way for any backend to plop a message into some other backend's message
queue and then that backend processes it when it gets to it.

I don't think this is going to be the last time we want to do something
like this and so having a bunch of individually built file-based systems
for passing information between backends seems really grotty.

Thanks!

Stephen




Re: ALTER SESSION

2019-01-29 Thread Andres Freund
On 2019-01-29 20:52:08 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > Leaving the desirability of the feature aside, isn't this racy as hell?
> > I.e. it seems entirely possible that backends stop/start between
> > determining the PID, and the ALTER SESSION creating the file, and it
> > actually being processed. By the time that happens an entirely different
> > session might be using that pid.
> 
> That seems like something that could possibly be fixed, by adding in
> other things to make it more likely to be the 'right' backend, but my
> complaint here is that we are, again, using files to pass data between
> backend processes and that seems like a pretty terrible direction to be
> going in.

I think pid would be wholly unsuitable for this, and if so we'd have to
use something entirely independent.


> Isn't there a whole system for passing information between different
> backend processes that we could and probably should be using here
> instead..?  I get that it wasn't quite intended for this originally, but
> hopefully it would be possible to make it work...

I'm not sure which system you're referring to? Procsignals? Those rely
on the fact that it's harmless to send such signals even when the pid
has been recycled, so that doesn't really address the issue.  And
realistically, you're going to need something to persist such settings
to - they're not fixed size, and using DSM here would complicate things
to a significant degree. I don't think files would necessarily be wrong
here, if we actually want this; alternatively we could go with some
magic catalog, but that'd be a lot of infrastructure for not
particularly much gain.

Greetings,

Andres Freund



Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-29 Thread Andres Freund
Hi,

On 2019-01-29 16:18:46 +0100, Christoph Berg wrote:
> Re: Michael Paquier 2019-01-23 <20190123004722.ge3...@paquier.xyz>
> > >> Largely because I think it's an independent patch from the CXXOPT need
> > >> from Christopher / Debian packaging.  It's a larger patch, that needs
> > >> more docs etc.  If whoever applies that wants to backpatch it - I'm not
> > >> going to protest, I just wouldn't myself, unless somebody pipes up that
> > >> it'd help them.
> > > 
> > > Ah, I see.  No arguments against.
> 
> Fwiw I'm not attached to using COPT and friends, I just happened to
> pick these because that was one way that worked. With what I've
> learned now, PG_*FLAGS is the better approach.

I'm confused - that doesn't allow injecting flags into all in-core built
files? So how does that fully fix your problem? Say,
e.g., llvmjit_inline.cpp won't get the flag, and thus won't be
reproducible?

Greetings,

Andres Freund



Re: ALTER SESSION

2019-01-29 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-01-29 20:32:54 +0900, Kyotaro HORIGUCHI wrote:
> > Hello.
> > 
> > https://www.postgresql.org/message-id/20190128.133143.115303042.horiguchi.kyot...@lab.ntt.co.jp
> > 
> > > C. Provide session-specific GUC variable (that overides the global one)
> > >- Add new configuration file "postgresql.conf." and
> > >  pg_reload_conf() let the session with the PID loads it as if
> > >  it is the last included file. All such files are removed at
> > >  startup or at the end of the coressponding session.
> > > 
> > >- Add a new syntax like this:
> > >  ALTER SESSION WITH (pid=)
> > > SET configuration_parameter {TO | =} {value | 'value' | DEFAULT}
> > > RESET configuration_parameter
> > > RESET ALL
> > > 
> > >- Target variables are marked with GUC_REMOTE.
> > > 
> > > I'll consider the last choice and will come up with a patch.
> > 
> > This is that, with a small change in design.
> > 
> > ALTER SESSION WITH (pid ) SET param {TO|=} value [ IMMEDIATE ]
> > ALTER SESSION WITH (pid ) RESET param [ IMMEDIATE ]
> > ALTER SESSION WITH (pid ) RESET ALL
> > 
> > The first form create an entry in
> > $PGDATA/pg_session_conf/postgresql..conf.
> > The second form removes the entry.
> > The third form removes the file itself.
> > 
> > IMMEDIATE specifies that the change is applied immediately by
> > sending SIGHUP to the process. pg_reload_conf() works as well.
> > 
> > The session configuration is removed at session-end and the
> > directory is cleaned up at startup.
> > 
> > It can change variables of PGC_USERSET by non-superuser or
> > PGC_SUSET by superuser. The local session user should have the
> > same privileges with pg_signal_backend() on the target session.
> > 
> > This patch contains documentation but doesn't contain tests yet.
> > 
> > I would appreciate any comments or suggestions on this.
> 
> Leaving the desirability of the feature aside, isn't this racy as hell?
> I.e. it seems entirely possible that backends stop/start between
> determining the PID, and the ALTER SESSION creating the file, and it
> actually being processed. By the time that happens an entirely different
> session might be using that pid.

That seems like something that could possibly be fixed, by adding in
other things to make it more likely to be the 'right' backend, but my
complaint here is that we are, again, using files to pass data between
backend processes and that seems like a pretty terrible direction to be
going in.

Isn't there a whole system for passing information between different
backend processes that we could and probably should be using here
instead..?  I get that it wasn't quite intended for this originally, but
hopefully it would be possible to make it work...

> And IMMEDIATE wouldn't be very immediate, considering e.g. longrunning
> queries / VACUUM etc, which'll only process new config in the mainloop.

That's certainly a good point.

Thanks!

Stephen




Re: [HACKERS] Restricting maximum keep segments by repslots

2019-01-29 Thread Kyotaro HORIGUCHI
At Thu, 20 Dec 2018 16:24:38 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20181220.162438.121484007.horiguchi.kyot...@lab.ntt.co.jp>
> Thank you for picking this up, and sorry for being late.
> 
> At Mon, 19 Nov 2018 13:39:58 +0900, Michael Paquier  
> wrote in <20181119043958.ge4...@paquier.xyz>
> > ereport should not be called within xlogreader.c as a base rule:
> 
> Ouch! I forgot that. Fixed to use report_invalid_record slightly
> changing the message. The code is not required (or cannot be
> used) on frontend so #ifndef FRONTENDed the code.
> 
> At Tue, 20 Nov 2018 14:07:44 +0900, Michael Paquier  
> wrote in <20181120050744.gj4...@paquier.xyz>
> > On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> > > I was just coming by to look at bit at the patch series, and bumped
> > > into that:
> > 
> > So I have been looking at the last patch series 0001-0004 posted on this
> > thread, and coming from here:
> > https://postgr.es/m/20181025.215518.189844649.horiguchi.kyot...@lab.ntt.co.jp
> > 
> > /* check that the slot is gone */
> > SELECT * FROM pg_replication_slots
> > It could be an idea to switch to the expanded mode here, not that it
> > matters much still..
> 
> No problem doing that. Done.
> 
> TAP test complains that it still uses recovery.conf. Fixed. On
> the way doing that I added parameter primary_slot_name to
> init_from_backup in PostgresNode.pm
> 
> > +IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
> > You mean Available here, not Avaiable.  This function is only used when
> > scanning for slot information with pg_replication_slots, so wouldn't it
> > be better to just return the status string in this case?
> 
> Mmm. Sure. Auto-completion hid it from my eyes. Fixed the name.
> The fix sounds reasonable. The function was created as returning
> boolean and the name doesn't fit the current function. I renamed
> it to GetLsnAvailability(), which returns a string.
> 
> > Not sure I see the point of the "remain" field, which can be found with
> > a simple calculation using the current insertion LSN, the segment size
> > and the amount of WAL that the slot is retaining.  It may be interesting
> > to document a query to do that though.
> 
> It's not that simple. wal_segment_size, max_slot_wal_keep_size,
> wal_keep_segments and the current LSN are
> involved in the calculation, which includes several conditional
> branches, maybe as you see upthread. We could show "the largest
> current LSN until WAL is lost" but the "current LSN" is not shown
> there. So it is showing the "remain".
> 
> > GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
> > parallel, no?  How is it safe to scan pg_wal on a process querying
> > pg_replication_slots while another process may manipulate its contents
> > (aka the checkpointer or just the startup process with an
> > end-of-recovery checkpoint.).  This routine relies on unsafe
> > assumptions as this is not concurrent-safe.  You can avoid problems by
> > making sure instead that lastRemovedSegNo is initialized correctly at
> > startup, which would be normally one segment older than what's in
> > pg_wal, which feels a bit hacky to rely on to track the oldest segment.
> 
> Concurrent recycling makes the function's result vary between the
> segment numbers before and after it. It is unstable but doesn't
> matter so much. The reason for the timing is to avoid extra
> startup time by a scan over pg_wal that is unnecessary in most
> cases.
> 
> Anyway the attached patch initializes lastRemovedSegNo in
> StartupXLOG().
> 
> > It seems to me that GetOldestXLogFileSegNo() should also check for
> > segments matching the current timeline, no?
> 
> RemoveOldXlogFiles() ignores timeline and the function is made to
> behave the same way (in different manner). I added a comment for
> the behavior in the function.
> 
> > +   if (prev_lost_segs != lost_segs)
> > +   ereport(WARNING,
> > +   (errmsg ("some replication slots have lost
> > required WAL segments"),
> > +errdetail_plural(
> > +"The mostly affected slot has lost %ld
> > segment.",
> > +"The mostly affected slot has lost %ld
> > segments.",
> > +lost_segs, lost_segs)));
> > This can become very noisy with the time, and it would be actually
> > useful to know which replication slot is impacted by that.
> 
> One message per segment doesn't seem so noisy. The reason for
> not showing the slot identifier individually is just to avoid
> the complexity that comes from involving slot details. DBAs will see the
> details in pg_stat_replication.
> 
> Anyway I did that in the attached patch. ReplicationSlotsBehind
> returns the list of the slot names that are behind the specified
> LSN. With this patch the messages look like the following:
> 
> WARNING:  some replication slots have lost required WAL segments
> DETAIL:  Slot s1 lost 8 

Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-29 Thread Michael Paquier
On Tue, Jan 29, 2019 at 04:18:46PM +0100, Christoph Berg wrote:
> Re backpatching, I would at least need them in PG11 because that's
> what is going to be released with Debian buster. An official backpatch
> to all supported versions would be nice, but I could also sneak in
> that change into the Debian packages without breaking anything.

The change is non-invasive, so if that helps to make packaging easier
and more consistent as a whole, then I could do the back-patch.
Except if somebody has an objection?
--
Michael




Re: ALTER SESSION

2019-01-29 Thread Andres Freund
Hi,

On 2019-01-29 20:32:54 +0900, Kyotaro HORIGUCHI wrote:
> Hello.
> 
> https://www.postgresql.org/message-id/20190128.133143.115303042.horiguchi.kyot...@lab.ntt.co.jp
> 
> > C. Provide session-specific GUC variable (that overrides the global one)
> >- Add new configuration file "postgresql.conf." and
> >  pg_reload_conf() lets the session with the PID load it as if
> >  it is the last included file. All such files are removed at
> >  startup or at the end of the corresponding session.
> > 
> >- Add a new syntax like this:
> >  ALTER SESSION WITH (pid=)
> > SET configuration_parameter {TO | =} {value | 'value' | DEFAULT}
> > RESET configuration_parameter
> > RESET ALL
> > 
> >- Target variables are marked with GUC_REMOTE.
> > 
> > I'll consider the last choice and will come up with a patch.
> 
> This is that, with a small change in design.
> 
> ALTER SESSION WITH (pid ) SET param {TO|=} value [ IMMEDIATE ]
> ALTER SESSION WITH (pid ) RESET param [ IMMEDIATE ]
> ALTER SESSION WITH (pid ) RESET ALL
> 
> The first form creates an entry in
> $PGDATA/pg_session_conf/postgresql..conf.
> The second form removes the entry.
> The third form removes the file itself.
> 
> IMMEDIATE specifies that the change is applied immediately by
> sending SIGHUP to the process. pg_reload_conf() works as well.
> 
> The session configuration is removed at session-end and the
> directory is cleaned up at startup.
> 
> It can change variables of PGC_USERSET by non-superuser or
> PGC_SUSET by superuser. The local session user should have the
> same privileges with pg_signal_backend() on the target session.
> 
> This patch contains documentation but doesn't contain tests yet.
> 
> I would appreciate any comments or suggestions on this.

Leaving the desirability of the feature aside, isn't this racy as hell?
I.e. it seems entirely possible that backends stop/start between
determining the PID, and the ALTER SESSION creating the file, and it
actually being processed. By the time that happens an entirely different
session might be using that pid.

And IMMEDIATE wouldn't be very immediate, considering e.g. longrunning
queries / VACUUM etc, which'll only process new config in the mainloop.

- Andres



Re: ALTER SESSION

2019-01-29 Thread Kyotaro HORIGUCHI
At Tue, 29 Jan 2019 20:32:54 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190129.203254.115361483.horiguchi.kyot...@lab.ntt.co.jp>
> Hello.
> 
> https://www.postgresql.org/message-id/20190128.133143.115303042.horiguchi.kyot...@lab.ntt.co.jp
> 
> > C. Provide session-specific GUC variable (that overrides the global one)
> >- Add new configuration file "postgresql.conf." and
> >  pg_reload_conf() lets the session with the PID load it as if
> >  it is the last included file. All such files are removed at
> >  startup or at the end of the corresponding session.
> > 
> >- Add a new syntax like this:
> >  ALTER SESSION WITH (pid=)
> > SET configuration_parameter {TO | =} {value | 'value' | DEFAULT}
> > RESET configuration_parameter
> > RESET ALL
> > 
> >- Target variables are marked with GUC_REMOTE.
> > 
> > I'll consider the last choice and will come up with a patch.
> 
> This is that, with a small change in design.
> 
> ALTER SESSION WITH (pid ) SET param {TO|=} value [ IMMEDIATE ]
> ALTER SESSION WITH (pid ) RESET param [ IMMEDIATE ]
> ALTER SESSION WITH (pid ) RESET ALL
> 
> The first form creates an entry in
> $PGDATA/pg_session_conf/postgresql..conf.
> The second form removes the entry.
> The third form removes the file itself.
> 
> IMMEDIATE specifies that the change is applied immediately by
> sending SIGHUP to the process. pg_reload_conf() works as well.
> 
> The session configuration is removed at session-end and the
> directory is cleaned up at startup.
> 
> It can change variables of PGC_USERSET by non-superuser or
> PGC_SUSET by superuser. The local session user should have the
> same privileges with pg_signal_backend() on the target session.
> 
> This patch contains documentation but doesn't contain tests yet.
> 
> I would appreciate any comments or suggestions on this.

Minor updates and rebased.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From f4cbeae278a31ea91b4c82f3e8fa8cd4d45ad580 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 29 Jan 2019 17:03:57 +0900
Subject: [PATCH] ALTER SESSION

Some tracking features are controlled by GUCs, so they cannot be
activated from another session. The ALTER SESSION command allows
sessions to change some GUC settings of another session.
---
 doc/src/sgml/config.sgml |  26 ++-
 doc/src/sgml/ref/allfiles.sgml   |   1 +
 doc/src/sgml/reference.sgml  |   1 +
 src/backend/nodes/copyfuncs.c|  14 ++
 src/backend/nodes/equalfuncs.c   |  12 ++
 src/backend/parser/gram.y|  47 -
 src/backend/postmaster/postmaster.c  |   6 +
 src/backend/replication/basebackup.c |   3 +
 src/backend/tcop/utility.c   |  13 ++
 src/backend/utils/init/postinit.c|   3 +
 src/backend/utils/misc/guc-file.l|  23 +++
 src/backend/utils/misc/guc.c | 323 +++
 src/bin/initdb/initdb.c  |   3 +-
 src/include/nodes/nodes.h|   1 +
 src/include/nodes/parsenodes.h   |  12 ++
 src/include/utils/guc.h  |  13 ++
 16 files changed, 422 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b6f5822b84..a52989342a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -175,6 +175,14 @@ shared_buffers = 128MB
  read whenever postgresql.conf is, and its settings take
  effect in the same way.  Settings in postgresql.auto.conf
  override those in postgresql.conf.
+
+
+
+ Furthermore, a directory pg_session_conf in the
+ PostgreSQL data directory contains per-session configuration files. Every
+ file holds settings provided through the
+  command. These files are read on reload
+ after the two files above and removed at session end.
 
 
 
@@ -195,8 +203,8 @@ shared_buffers = 128MB
   The already-mentioned  command
   provides a SQL-accessible means of changing global defaults; it is
   functionally equivalent to editing postgresql.conf.
-  In addition, there are two commands that allow setting of defaults
-  on a per-database or per-role basis:
+  In addition, there are three commands that allow setting of defaults
+  on a per-database, per-role or per-session basis:
  
 
  
@@ -213,6 +221,13 @@ shared_buffers = 128MB
per-database settings to be overridden with user-specific values.
   
  
+
+ 
+  
+   The  command allows other sessions to
+override session-local settings with user-specific values.
+  
+ 
 
 
  
@@ -223,6 +238,13 @@ shared_buffers = 128MB
   Note that some settings cannot be changed after server start, and
   so cannot be set with these commands (or the ones listed below).
 
+ 
+  Values set with ALTER SESSION are applied only when
+  reloading.  They override values obtained from the configuration files
+  or server command 

Re: [HACKERS] WAL logging problem in 9.4.3?

2019-01-29 Thread Kyotaro HORIGUCHI
Rebased.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From d5f2b47b6ba191d0ad1673f9bd9c5851d91a1b59 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 11 Oct 2018 10:03:21 +0900
Subject: [PATCH 1/4] TAP test for copy-truncation optimization.

---
 src/test/recovery/t/016_wal_optimize.pl | 192 
 1 file changed, 192 insertions(+)
 create mode 100644 src/test/recovery/t/016_wal_optimize.pl

diff --git a/src/test/recovery/t/016_wal_optimize.pl b/src/test/recovery/t/016_wal_optimize.pl
new file mode 100644
index 00..310772a2b3
--- /dev/null
+++ b/src/test/recovery/t/016_wal_optimize.pl
@@ -0,0 +1,192 @@
+# Test WAL replay for optimized TRUNCATE and COPY records
+#
+# WAL truncation is optimized in some cases with TRUNCATE and COPY queries
+# which sometimes interact badly with the other optimizations in line with
+# several setting values of wal_level, particularly when using "minimal" or
+# "replica".  The optimization may be enabled or disabled depending on the
+# scenarios dealt with here, and should never result in any type of failure or
+# data loss.
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Wrapper routine tunable for wal_level.
+sub run_wal_optimize
+{
+	my $wal_level = shift;
+
+	# Primary needs to have wal_level = minimal here
+	my $node = get_new_node("node_$wal_level");
+	$node->init;
+	$node->append_conf('postgresql.conf', qq(
+wal_level = $wal_level
+));
+	$node->start;
+
+	# Test direct truncation optimization.  No tuples
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test1 (id serial PRIMARY KEY);
+		TRUNCATE test1;
+		COMMIT;");
+
+	$node->stop('immediate');
+	$node->start;
+
+	my $result = $node->safe_psql('postgres', "SELECT count(*) FROM test1;");
+	is($result, qq(0),
+	   "wal_level = $wal_level, optimized truncation with empty table");
+
+	# Test truncation with inserted tuples within the same transaction.
+	# Tuples inserted after the truncation should be seen.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test2 (id serial PRIMARY KEY);
+		INSERT INTO test2 VALUES (DEFAULT);
+		TRUNCATE test2;
+		INSERT INTO test2 VALUES (DEFAULT);
+		COMMIT;");
+
+	$node->stop('immediate');
+	$node->start;
+
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test2;");
+	is($result, qq(1),
+	   "wal_level = $wal_level, optimized truncation with inserted table");
+
+	# Data file for COPY query in follow-up tests.
+	my $basedir = $node->basedir;
+	my $copy_file = "$basedir/copy_data.txt";
+	TestLib::append_to_file($copy_file, qq(2,3
+20001,30001
+20002,30002));
+
+	# Test truncation with inserted tuples using COPY.  Tuples copied after the
+	# truncation should be seen.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test3 (id serial PRIMARY KEY, id2 int);
+		INSERT INTO test3 (id, id2) VALUES (DEFAULT, generate_series(1,1));
+		TRUNCATE test3;
+		COPY test3 FROM '$copy_file' DELIMITER ',';
+		COMMIT;");
+	$node->stop('immediate');
+	$node->start;
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test3;");
+	is($result, qq(3),
+	   "wal_level = $wal_level, optimized truncation with copied table");
+
+	# Test truncation with inserted tuples using both INSERT and COPY. Tuples
+	# inserted after the truncation should be seen.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test4 (id serial PRIMARY KEY, id2 int);
+		INSERT INTO test4 (id, id2) VALUES (DEFAULT, generate_series(1,1));
+		TRUNCATE test4;
+		INSERT INTO test4 (id, id2) VALUES (DEFAULT, 1);
+		COPY test4 FROM '$copy_file' DELIMITER ',';
+		INSERT INTO test4 (id, id2) VALUES (DEFAULT, 1);
+		COMMIT;");
+
+	$node->stop('immediate');
+	$node->start;
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test4;");
+	is($result, qq(5),
+	   "wal_level = $wal_level, optimized truncation with inserted/copied table");
+
+	# Test consistency of COPY with INSERT for table created in the same
+	# transaction.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test5 (id serial PRIMARY KEY, id2 int);
+		INSERT INTO test5 VALUES (DEFAULT, 1);
+		COPY test5 FROM '$copy_file' DELIMITER ',';
+		COMMIT;");
+	$node->stop('immediate');
+	$node->start;
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test5;");
+	is($result, qq(4),
+	   "wal_level = $wal_level, replay of optimized copy with inserted table");
+
+	# Test consistency of COPY that inserts more to the same table using
+	# triggers.  If the INSERTS from the trigger go to the same block data
+	# is copied to, and the INSERTs are WAL-logged, WAL replay will fail when
+	# it tries to replay the WAL record but the "before" image doesn't match,
+	# because not all changes were WAL-logged.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test6 (id serial PRIMARY KEY, id2 text);
+		CREATE FUNCTION test6_before_row_trig() RETURNS trigger
+		  LANGUAGE plpgsql as 

Re: Covering GiST indexes

2019-01-29 Thread Andreas Karlsson

On 29/01/2019 19.58, Andrey Borodin wrote:

PFA patch without redundant checks.


I hope it is ok that I fixed a typo and some wording in the comment, 
plus re-added the btree query which I accidentally removed in my last mail.


I think the current state of the patch is fine, assuming the solution 
with truncTupdesc is deemed to be good enough by the committer, so I 
will mark this as ready for committer.


Thanks for the patch and the speedy fixes!

Andreas
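
For readers skimming the patch below, a covering GiST index is declared
along these lines (illustrative only; the table, column, and index names
are invented):

-- Illustrative sketch of the feature added by this patch.
CREATE TABLE places (pos point, label text);
CREATE INDEX places_pos_gist ON places USING gist (pos) INCLUDE (label);
-- label has no GiST operator class, yet an index-only scan on
-- places_pos_gist can return it.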
>From f1a85940d01e208a6502e689616a8806146c9f3d Mon Sep 17 00:00:00 2001
From: Andreas Karlsson 
Date: Wed, 30 Jan 2019 01:59:14 +0100
Subject: [PATCH] Covering GiST v10

This patch allows adding INCLUDE columns to GiST indexes.
These columns do not participate in the GiST tree build and
are not used for choosing the subtree for inserts or splits,
but they can be fetched during an IndexOnlyScan.
---
 doc/src/sgml/ref/create_index.sgml |   4 +-
 doc/src/sgml/textsearch.sgml   |   6 +
 src/backend/access/gist/gist.c |  33 +++-
 src/backend/access/gist/gistget.c  |   4 +-
 src/backend/access/gist/gistscan.c |  12 +-
 src/backend/access/gist/gistsplit.c|  12 +-
 src/backend/access/gist/gistutil.c |  42 --
 src/include/access/gist_private.h  |   2 +
 src/test/regress/expected/amutils.out  |   4 +-
 src/test/regress/expected/index_including.out  |   8 +-
 src/test/regress/expected/index_including_gist.out | 166 +
 src/test/regress/parallel_schedule |   2 +-
 src/test/regress/serial_schedule   |   1 +
 src/test/regress/sql/index_including.sql   |   6 +-
 src/test/regress/sql/index_including_gist.sql  |  90 +++
 15 files changed, 358 insertions(+), 34 deletions(-)
 create mode 100644 src/test/regress/expected/index_including_gist.out
 create mode 100644 src/test/regress/sql/index_including_gist.sql

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ad619cdcfe..f8657c6ae2 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -181,8 +181,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 

-Currently, only the B-tree index access method supports this feature.
-In B-tree indexes, the values of columns listed in the
+Currently, the B-tree and the GiST index access methods support this
+feature. In B-tree indexes, the values of columns listed in the
 INCLUDE clause are included in leaf tuples which
 correspond to heap tuples, but are not included in upper-level
 index entries used for tree navigation.
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index ecebade767..e4c75ebf6f 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -3675,6 +3675,12 @@ SELECT plainto_tsquery('supernovae stars');
   
 
   
+   A GiST index can be covering, i.e. use the INCLUDE
+   clause. Included columns can have data types without any GiST operator
+   class. Included attributes will be stored uncompressed.
+  
+
+  
Lossiness causes performance degradation due to unnecessary fetches of table
records that turn out to be false matches.  Since random access to table
records is slow, this limits the usefulness of GiST indexes.  The
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index b75b3a8dac..2bbf7a7f49 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -75,7 +75,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = true;
 	amroutine->ampredlocks = true;
 	amroutine->amcanparallel = false;
-	amroutine->amcaninclude = false;
+	amroutine->amcaninclude = true;
 	amroutine->amkeytype = InvalidOid;
 
 	amroutine->ambuild = gistbuild;
@@ -1382,8 +1382,8 @@ gistSplit(Relation r,
 		IndexTupleSize(itup[0]), GiSTPageSize,
 		RelationGetRelationName(r;
 
-	memset(v.spl_lisnull, true, sizeof(bool) * giststate->tupdesc->natts);
-	memset(v.spl_risnull, true, sizeof(bool) * giststate->tupdesc->natts);
+	memset(v.spl_lisnull, true, sizeof(bool) * giststate->truncTupdesc->natts);
+	memset(v.spl_risnull, true, sizeof(bool) * giststate->truncTupdesc->natts);
 	gistSplitByKey(r, page, itup, len, giststate, , 0);
 
 	/* form left and right vector */
@@ -1462,8 +1462,19 @@ initGISTstate(Relation index)
 	giststate->scanCxt = scanCxt;
 	giststate->tempCxt = scanCxt;	/* caller must change this if needed */
 	giststate->tupdesc = index->rd_att;
+	/*
+	 * The truncated tupdesc does not contain the INCLUDE attributes.
+	 *
+	 * It is used to form tuples during tuple adjustment and page split.
+	 * B-tree creates a shortened tuple descriptor for every truncated tuple,
+	 * because it is doing this less often: it does not have to form
+	 * truncated tuples during page split. Also, B-tree is not 

Re: [HACKERS] Block level parallel vacuum

2019-01-29 Thread Haribabu Kommi
On Thu, Jan 24, 2019 at 1:16 PM Masahiko Sawada 
wrote:

>
> Attached the latest patches.
>

Thanks for the updated patches.
Some more code review comments.

+ started by a single utility command.  Currently, the parallel
+ utility commands that support the use of parallel workers are
+ CREATE INDEX and VACUUM
+ without FULL option, and only when building
+ a B-tree index.  Parallel workers are taken from the pool of


I feel the above sentence may not give the proper picture; how about the
following modification?

CREATE INDEX only when building a B-tree index
and VACUUM without FULL option.



+ * parallel vacuum, we perform both index vacuum and index cleanup in
parallel.
+ * Individual indexes is processed by one vacuum process. At beginning of

How about vacuum index and cleanup index, similar to other places?


+ * memory space for dead tuples. When starting either index vacuum or
cleanup
+ * vacuum, we launch parallel worker processes. Once all indexes are
processed

same here as well?


+ * Before starting parallel index vacuum and parallel cleanup index we
launch
+ * parallel workers. All parallel workers will exit after processed all
indexes

parallel vacuum index and parallel cleanup index?


+ /*
+ * If there is already-updated result in the shared memory we
+ * use it. Otherwise we pass NULL to index AMs and copy the
+ * result to the shared memory segment.
+ */
+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);

I didn't really find a need for the flag to differentiate the stats pointer
between the first run and the second run. I don't see any problem in passing
the stats directly, and the same stats are updated on the worker side and the
leader side. Anyway, no two processes will do the index vacuum at the same
time. Am I missing something?

Even if this flag is to identify whether the stats are updated or not before
writing them, I don't see a need for it compared to normal vacuum.


+ * Enter the parallel mode, allocate and initialize a DSM segment. Return
+ * the memory space for storing dead tuples or NULL if no workers are
prepared.
+ */

+ pcxt = CreateParallelContext("postgres", "heap_parallel_vacuum_main",
+ request, true);

But we are passing the serializable_okay flag as true, which means it doesn't
return NULL. Is that expected?


+ initStringInfo();
+ appendStringInfo(,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+   "launched %d parallel vacuum workers %s (planned: %d",
+   lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(, ", requested %d", lvstate->options.nworkers);

What is the difference between planned workers and requested workers?
Aren't both the same?


- COMPARE_SCALAR_FIELD(options);
- COMPARE_NODE_FIELD(rels);
+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;

Options is changed from a SCALAR comparison to explicit checks, but why is the
rels check removed? The options field is changed from int to a structure, so
using SCALAR may not work in other functions like _copyVacuumStmt, etc.?


+typedef struct VacuumOptions
+{
+ VacuumFlag flags; /* OR of VacuumFlag */
+ int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;


Do we need to add a NodeTag for the above structure? Because this structure is
part of the VacuumStmt structure.


+vacuumdb will require background
workers,
+so make sure your 
+setting is more than one.

How about removing vacuumdb and changing it to "This option will ..."?

I will continue the testing of this patch and share the details.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: A few new options for vacuumdb

2019-01-29 Thread Michael Paquier
On Tue, Jan 29, 2019 at 09:48:18PM +, Bossart, Nathan wrote:
> On 1/28/19, 6:35 PM, "Michael Paquier"  wrote:
>> -" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n");
>> +" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
>> +" LEFT JOIN pg_catalog.pg_class t"
>> +" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
>> Why do need this part?
> 
> This is modeled after the query provided in the docs for preventing
> transaction ID wraparound [0].  I think the idea is to combine the
> relation with its TOAST table so that it does not need to be
> considered separately.  The VACUUM commands generated in vacuumdb will
> also process the corresponding TOAST table for the relation, anyway.

Oh, OK.  This makes sense.  It would be nice to add a comment in the
patch and to document this calculation method in the docs of
vacuumdb.

> I noticed a behavior change from the catalog query patch that we
> probably ought to fix.  The "WHERE c.relkind IN ('r', 'm')" clause
> seems sufficient to collect all vacuumable relations (TOAST tables are
> handled when vacuuming the main relation, and partitioned tables are
> handled by vacuuming the partitions individually), but it is not
> sufficient to match the previous behavior when --table is used.
> Previously, we did not filter by relkind at all when --table is used.
> Instead, we let the server emit a WARNING when a relation that
> couldn't be processed was specified.

Indeed, the WARNING can be useful for some users when trying to work
on an incorrect relation kind, especially when not using --verbose.
Fixed after adding a test with command_checks_all.

> Unfortunately, this complicates the --min-xid-age and --min-mxid-age
> patch a bit, as some of the relation types that can be vacuumed and/or
> analyzed do not really have a transaction ID age.  AFAICT the simplest
> way to handle this case is to filter out relations with a relfrozenxid
> or relminmxid of 0.

We should be able to live with that.
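
For reference, the documented wraparound query this is modeled on looks
roughly like the following (a sketch only, not the exact SQL generated by
vacuumdb; the additional behavior under discussion is to also skip
relations whose relfrozenxid or relminmxid is 0):

-- Sketch of the per-relation XID age calculation being discussed.
SELECT c.oid::regclass AS relation,
       greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS xid_age
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm')
ORDER BY 2 DESC;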
--
Michael




Re: COPY FROM WHEN condition

2019-01-29 Thread Andres Freund
Hi,

On 2019-01-30 10:33:30 +1300, David Rowley wrote:
> On Wed, 30 Jan 2019 at 10:12, Andres Freund  wrote:
> >
> > On 2019-01-30 10:05:35 +1300, David Rowley wrote:
> > > On Wed, 30 Jan 2019 at 04:22, Andres Freund  wrote:
> > > > I think I might have a patch addressing the problem incidentally. For 
> > > > pluggable storage I slotified copy.c, which also removes the first 
> > > > heap_form_tuple. Quite possible that nothing more is needed. I've 
> > > > removed the batch context altogether in yesterday's rebase, there was 
> > > > no need anymore.
> > >
> > > In your patch, where do the batched tuples get stored before the heap
> > > insert is done?
> >
> > There's one slot for each batched tuple (they are reused). Before
> > materialization the tuples solely exist in tts_isnull/values into which
> > NextCopyFrom() directly parses the values.  Tuples never get extracted
> > from the slot in copy.c itself anymore, table_multi_insert() accepts
> > slots.  Not quite sure whether I've answered your question?
> 
> I think so.  I imagine that should also speed up COPY WHERE too as
> it'll no longer form a tuple before possibly discarding it.

Right.

I found some issues in my patch (stupid implementation of copying from
one slot to the other), but after fixing that I get:

master:
Time: 16013.509 ms (00:16.014)
Time: 16836.110 ms (00:16.836)
Time: 16636.796 ms (00:16.637)

pluggable storage:
Time: 15974.243 ms (00:15.974)
Time: 16183.442 ms (00:16.183)
Time: 16055.192 ms (00:16.055)

(with a truncate between each run)

So that seems a bit better, albeit at the cost of having a few empty
slots, created on demand, for each encountered partition.

I'm pretty sure we can optimize that further...


Greetings,

Andres Freund



Re: COPY FROM WHEN condition

2019-01-29 Thread David Rowley
On Wed, 30 Jan 2019 at 10:12, Andres Freund  wrote:
>
> On 2019-01-30 10:05:35 +1300, David Rowley wrote:
> > On Wed, 30 Jan 2019 at 04:22, Andres Freund  wrote:
> > > I think I might have a patch addressing the problem incidentally. For 
> > > pluggable storage I slotified copy.c, which also removes the first 
> > > heap_form_tuple. Quite possible that nothing more is needed. I've removed 
> > > the batch context altogether in yesterday's rebase, there was no need 
> > > anymore.
> >
> > In your patch, where do the batched tuples get stored before the heap
> > insert is done?
>
> There's one slot for each batched tuple (they are reused). Before
> materialization the tuples solely exist in tts_isnull/values into which
> NextCopyFrom() directly parses the values.  Tuples never get extracted
> from the slot in copy.c itself anymore, table_multi_insert() accepts
> slots.  Not quite sure whether I've answered your question?

I think so.  I imagine that should also speed up COPY WHERE too as
it'll no longer form a tuple before possibly discarding it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: COPY FROM WHEN condition

2019-01-29 Thread Andres Freund
Hi,

On 2019-01-30 10:05:35 +1300, David Rowley wrote:
> On Wed, 30 Jan 2019 at 04:22, Andres Freund  wrote:
> > I think I might have a patch addressing the problem incidentally. For 
> > pluggable storage I slotified copy.c, which also removes the first 
> > heap_form_tuple. Quite possible that nothing more is needed. I've removed 
> > the batch context altogether in yesterday's rebase, there was no need 
> > anymore.
> 
> In your patch, where do the batched tuples get stored before the heap
> insert is done?

There's one slot for each batched tuple (they are reused). Before
materialization the tuples solely exist in tts_isnull/values into which
NextCopyFrom() directly parses the values.  Tuples never get extracted
from the slot in copy.c itself anymore, table_multi_insert() accepts
slots.  Not quite sure whether I've answered your question?

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

2019-01-29 Thread David Rowley
On Tue, 29 Jan 2019 at 22:51, Tomas Vondra  wrote:
>
> On 1/29/19 8:18 AM, David Rowley wrote:
> > So looks like your change slows this code down 4% for this test case.
> > That's about twice as bad as the 2.2% regression that I had to solve
> > for the multi-insert partition patch (of which you've removed much of
> > the code from)
> >
> > I'd really like to see the alternating batch context code being put
> > back in to fix this. Of course, if you have a better idea, then we can
> > look into that, but just removing code that was carefully written and
> > benchmarked without any new benchmarks to prove that it's okay to do
> > so seems wrong to me.
> >
>
> Yeah, that's not very nice. Do you suggest to revert 36a1281f86c0f, or
> shall we try to massage it a bit first? ISTM we could simply create two
> batch memory contexts and alternate those, just like with the expression
> contexts before.

I don't think a revert should be done. I didn't check in detail, but
it seems what you did fixed the memory leak issue for when many tuples
are filtered out.  I'd like to have more detail on what Andres is
proposing, but otherwise, I'd imagine we can just have two batch
contexts and alternate between them, as before, but also keep the
per-tuple context too.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: COPY FROM WHEN condition

2019-01-29 Thread David Rowley
On Wed, 30 Jan 2019 at 04:22, Andres Freund  wrote:
> I think I might have a patch addressing the problem incidentally. For 
> pluggable storage I slotified copy.c, which also removes the first 
> heap_form_tuple. Quite possible that nothing more is needed. I've removed the 
> batch context altogether in yesterday's rebase, there was no need anymore.

In your patch, where do the batched tuples get stored before the heap
insert is done?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: COPY FROM WHEN condition

2019-01-29 Thread David Rowley
On Wed, 30 Jan 2019 at 03:26, Tomas Vondra  wrote:
>
> On 1/29/19 8:18 AM, David Rowley wrote:
> > (details about performance regression)
>
> How do I reproduce this? I don't see this test in the thread you linked,
> so I'm not sure how many partitions you were using, what's the structure
> of the table etc.

The setup is:

create table listp (a int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
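
A rough way to drive the timing test against that setup could be the
following (one possible load; not necessarily the exact data set behind
the numbers quoted upthread):

-- Generate a data file hitting both partitions, then time the COPY
-- (e.g. with \timing in psql).  Server-side COPY TO/FROM a file needs
-- superuser, or use \copy instead.
COPY (SELECT (i % 2) + 1 FROM generate_series(1, 10000000) i) TO '/tmp/listp.dat';
COPY listp FROM '/tmp/listp.dat';
TRUNCATE listp;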


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Chapman Flack
On 1/29/19 3:36 PM, Merlin Moncure wrote:
> I hate to bikeshed here, but I think it's better English using that
> style of syntax to say,
>  WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query )

I had been just about to also engage in bikeshedding, on grounds
that (to me) the MATERIALIZED/NOT MATERIALIZED form seemed more
natural:

FROM GROCER OBTAIN WAXED CUCUMBERS. (this seems downright natural)
FROM GROCER OBTAIN NOT WAXED CUCUMBERS. (nearly natural, s/NOT /UN/)

FROM GROCER OBTAIN WAX ON CUCUMBERS. (these read oddly to me)
FROM GROCER OBTAIN WAX OFF CUCUMBERS.

I do understand Tom's point that the wax-on/wax-off form generalizes
more easily to non-boolean future options. It would really read
better as a parenthetical, so too bad parentheses are already taken
to go around the query.

While gawking at the bikeshed, one more thing came to mind:

I like to hold out hope [1] that, one day, the WITH grammar could
be extended to handle lexically-scoped option settings like those in
the ISO standard.

It doesn't seem to me that any of these current proposals would get
in the way of that. Just another thing to have in mind.

Regards,
-Chap


[1]
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XMLBINARY_and_XMLNAMESPACES



Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2019-01-29 Thread Peter Eisentraut
Here is an updated patch, which addresses some of your issues below as
well as the earlier reported issue that comments were lost during
REINDEX CONCURRENTLY.

On 16/01/2019 09:27, Michael Paquier wrote:
> On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:
>> NOTICE seems unnecessary here.
>>
>> Unfortunally concurrently reindex loses comments, reproducer:
> 
> Yes, the NOTICE message makes little sense.

This is the existing behavior of reindex-not-concurrently.

> I am getting back in touch with this stuff.  It has been some time but
> the core of the patch has not actually changed in its base concept, so
> I am still very familiar with it as the original author.  There are
> even typos I may have introduced a couple of years back, like
> "contraint".  I have not yet spent much time on that, but there are at
> quick glance a bunch of things that could be retouched to get pieces
> of that committable.
> 
> +The concurrent index created during the processing has a name ending in
> +the suffix ccnew, or ccold if it is an old index definiton which we 
> failed
> +to drop. Invalid indexes can be dropped using DROP 
> INDEX,
> +including invalid toast indexes.
> This needs  markups for "ccnew" and "ccold".  "definiton" is
> not correct.

Fixed those.

> index_create does not actually need its extra argument with the tuple
> descriptor.  I think that we had better grab the column name list from
> indexInfo and just pass that down to index_create() (patched on my
> local branch), so it is an overkill to take a full copy of the index's
> TupleDesc.

Please send a fixup patch.

> The patch, standing as-is, is close to 2k lines long, so let's cut
> that first into more pieces refactoring the concurrent build code.
> Here are some preliminary notes:
> - WaitForOlderSnapshots() could be in its own patch.
> - index_concurrently_build() and index_concurrently_set_dead() can be
> in an independent patch.  set_dead() had better be a wrapper on top of
> index_set_state_flags actually which is able to set any kind of
> flags.
> - A couple of pieces in index_create() could be cut as well.

I'm not a fan of that.  I had already considered all the ways in which
subparts of this patch could get committed, and some of it was
committed, so what's left now is what I thought should stay together.
The patch isn't really that big and most of it is moving code around.  I
would also avoid chopping around in this patch now and focus on getting
it finished instead.  The functionality seems solid, so if it's good,
let's commit it, if it's not, let's get it fixed up.
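
For anyone trying the patch, the new command forms look like this
(illustrative; the relation names are invented):

REINDEX INDEX CONCURRENTLY my_table_pkey;
REINDEX TABLE CONCURRENTLY my_table;
REINDEX (VERBOSE) TABLE CONCURRENTLY my_table;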

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e9600073108c9fbfe64087932f4bb2ea12f58418 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 29 Jan 2019 21:45:29 +0100
Subject: [PATCH v7] REINDEX CONCURRENTLY

---
 doc/src/sgml/mvcc.sgml|   1 +
 doc/src/sgml/ref/reindex.sgml | 184 +++-
 src/backend/catalog/index.c   | 547 ++-
 src/backend/catalog/pg_depend.c   | 143 +++
 src/backend/catalog/toasting.c|   2 +-
 src/backend/commands/indexcmds.c  | 882 +++---
 src/backend/commands/tablecmds.c  |  32 +-
 src/backend/nodes/copyfuncs.c |   1 +
 src/backend/nodes/equalfuncs.c|   1 +
 src/backend/parser/gram.y |  22 +-
 src/backend/tcop/utility.c|  10 +-
 src/bin/psql/common.c |  16 +
 src/bin/psql/tab-complete.c   |  18 +-
 src/include/catalog/dependency.h  |   5 +
 src/include/catalog/index.h   |  17 +
 src/include/commands/defrem.h |   6 +-
 src/include/nodes/parsenodes.h|   1 +
 .../expected/reindex-concurrently.out |  78 ++
 src/test/isolation/isolation_schedule |   1 +
 .../isolation/specs/reindex-concurrently.spec |  40 +
 src/test/regress/expected/create_index.out|  95 ++
 src/test/regress/sql/create_index.sql |  61 ++
 22 files changed, 1989 insertions(+), 174 deletions(-)
 create mode 100644 src/test/isolation/expected/reindex-concurrently.out
 create mode 100644 src/test/isolation/specs/reindex-concurrently.spec

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index bedd9a008d..9b7ef8bf09 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -926,6 +926,7 @@ Table-level Lock Modes
 
  Acquired by VACUUM (without FULL),
  ANALYZE, CREATE INDEX 
CONCURRENTLY,
+ REINDEX CONCURRENTLY,
  CREATE STATISTICS, and certain ALTER
  INDEX and ALTER TABLE variants (for full
  details see  and 
 
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 
name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ 
CONCURRENTLY ] 

Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Tom Lane
Merlin Moncure  writes:
> On Tue, Jan 29, 2019 at 1:53 PM Tom Lane  wrote:
>> I propose that we implement and document this as
>> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )

> I hate to bikeshed here, but I think it's better English using that
> style of syntax to say,
>  WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query )

Hm.  Doesn't really seem to fit with our style for options elsewhere;
for instance, EXPLAIN's options are things like ANALYZE ON/OFF and
VERBOSE ON/OFF, not ANALYSIS and VERBOSITY.

regards, tom lane



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-29 Thread Andres Freund
On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> While chatting with Robert about this issue I came across the following
> section of code:
> 
>   /*
>* If the FSM knows nothing of the rel, try the last page 
> before we
>* give up and extend.  This avoids one-tuple-per-page syndrome 
> during
>* bootstrapping or in a recently-started system.
>*/
>   if (targetBlock == InvalidBlockNumber)
>   {
>   BlockNumber nblocks = 
> RelationGetNumberOfBlocks(relation);
> 
>   if (nblocks > 0)
>   targetBlock = nblocks - 1;
>   }
> 
> 
> I think that explains the issue (albeit not why it is much more frequent
> on BSDs).  Because we're not going through the FSM, it's perfectly
> possible to find a page that is uninitialized, *and* is not yet in the
> FSM. The only reason this wasn't previously actively broken, I think, is
> that while we previously *also* looked at that page (before the extending
> backend acquired a lock!), when looking at the page
> PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> space because it just interprets the zeroes in pd_upper - pd_lower as no
> free space.

FWIW, after commenting out that block and adapting a few regression
tests to changed plans, I could not reproduce the issue on a FreeBSD
machine in 31 runs, where it previously triggered in roughly 1/3 cases.

Still don't quite understand why so much more likely on BSD...

Greetings,

Andres Freund



Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Merlin Moncure
On Tue, Jan 29, 2019 at 1:53 PM Tom Lane  wrote:
>
> Michael Paquier  writes:
> > On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote:
> >> Yeah, I thought about that too, but it doesn't seem like an improvement.
> >> If the query is very long (which isn't unlikely) I think people would
> >> prefer to see the option(s) up front.
>
> > Having these options at the front of the WITH clause looks more
> > natural to me.
>
> Well, we've managed to get agreement on the semantics of this thing,
> let's not get hung up on the syntax details.
>
> I propose that we implement and document this as
>
> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )
>
> which is maybe a bit clunky but not awful, and it would leave room
> to generalize it to "AS [ optionname optionvalue [ , ... ] ]" if we
> ever need to.  Looking at the precedent of e.g. EXPLAIN, we could
> probably allow just "MATERIALIZE" as well, with the boolean value
> defaulting to true.

I hate to bikeshed here, but I think it's better English using that
style of syntax to say,
 WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query )

merlin



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-29 Thread Andres Freund
On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > runs. Looking at the buildfarm it looks like the failures were,
> > > excluding handfish which failed without recognizable symptoms before and
> > > after, on BSD derived platforms (netbsd, freebsd, OX), which certainly
> > > is interesting.
> > 
> > Isn't it now.  Something about the BSD scheduler perhaps?  But we've
> > got four or five different BSD-ish platforms that reported failures,
> > and it's hard to believe they've all got identical schedulers.
> > 
> > That second handfish failure does match the symptoms elsewhere:
> > 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish=2019-01-29%2000%3A20%3A22
> > 
> > --- 
> > /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr
> > 2018-10-30 20:11:45.551967381 -0300
> > +++ 
> > /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr
> >  2019-01-28 22:38:20.614211568 -0200
> > @@ -0,0 +1,20 @@
> > +SQL error: page 0 of relation "test_thread" should be empty but is not on 
> > line 125
> > 
> > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > way higher than elsewhere.  Puzzling.
> 
> Interesting.
> 
> While chatting with Robert about this issue I came across the following
> section of code:
> 
>   /*
>* If the FSM knows nothing of the rel, try the last page 
> before we
>* give up and extend.  This avoids one-tuple-per-page syndrome 
> during
>* bootstrapping or in a recently-started system.
>*/
>   if (targetBlock == InvalidBlockNumber)
>   {
>   BlockNumber nblocks = 
> RelationGetNumberOfBlocks(relation);
> 
>   if (nblocks > 0)
>   targetBlock = nblocks - 1;
>   }
> 
> 
> I think that explains the issue (albeit not why it is much more frequent
> on BSDs).  Because we're not going through the FSM, it's perfectly
> possible to find a page that is uninitialized, *and* is not yet in the
> FSM. The only reason this wasn't previously actively broken, I think, is
> that while we previously *also* looked at that page (before the extending
> backend acquired a lock!), when looking at the page
> PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> space because it just interprets the zeroes in pd_upper - pd_lower as no
> free space.
> 
> Hm, thinking about what a good solution here could be.

I wonder if we should just expand the logic we have for
RBM_ZERO_AND_LOCK so it can be used in hio.c (we probably could
just use it without any changes, but the name seems a bit confusing) -
because that'd prevent the current weirdness that it's possible for the
buffer to be locked by somebody between the ReadBufferBI(P_NEW) and
the LockBuffer(BUFFER_LOCK_EXCLUSIVE).  I think that'd allow us to
altogether drop the cleanup lock logic we currently have, and also
protect us against the FSM issue I'd outlined upthread?

Greetings,

Andres Freund



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-29 Thread Jesper Pedersen

Hi,

On 1/29/19 12:08 AM, Michael Paquier wrote:

On Tue, Jan 29, 2019 at 12:35:30AM +, Jamison, Kirk wrote:

I just checked the patch.
As per advice, you removed the versioning and specified --jobs.
The patch still applies, builds and passed the tests successfully.


I would document the optional VACUUM_OPTS on the page of pg_upgrade.
If Peter thinks it is fine to not do so, that's fine for me as well.



I added most of the documentation back, as requested by Kirk.


It seems to me that the latest patch sent is incorrect for multiple
reasons:
1) You still enforce -j to use the number of jobs that the caller of
pg_upgrade provides, and we agreed that both things are separate
concepts upthread, no?  What has been suggested by Alvaro is to add a
comment so as one can use VACUUM_OPTS with -j optionally, instead of
suggesting a full-fledged vacuumdb command which depends on what
pg_upgrade uses.  So there is no actual need for the if/else
complication business.


I think it is ok for the echo command to highlight to the user that 
running --analyze-only using the same number of jobs will give a faster
result.



2) Perhaps we need to worry about the second vacuumdb --all command,
which may want custom options, which are not necessarily the same as
the options of the first command?  I don't think we need to care as it
applies only to an upgraded cluster using something older 8.4, just
wondering..


I think that --all --analyze-in-stages is what most people want. And 
with the $VACUUMDB_OPTS variable people have an option to put in more, 
such as -j X.


Best regards,
 Jesper


>From b02e1f414e309361595ab3fe84fb6f66bcf63265 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb by default.

Author: Jesper Pedersen 
---
 doc/src/sgml/ref/pgupgrade.sgml |  6 +-
 src/bin/pg_upgrade/check.c  | 28 ++--
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..84a7ccb6f1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-
  in parallel;  a good place to start is the maximum of the number of
  CPU cores and tablespaces.  This option can dramatically reduce the
  time to upgrade a multi-database server running on a multiprocessor
- machine.
+ machine. Note that this option isn't passed to the
+ vacuumdb application by default.
+ The system environment variable VACUUMDB_OPTS can be
+ used to pass extra options to the vacuumdb
+ application during the script execution.
 
 
 
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..847e604088 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1)
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--jobs %d --all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data, user_opts.jobs,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s$VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s%%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2



Re: Early WIP/PoC for inlining CTEs

2019-01-29 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote:
>> Yeah, I thought about that too, but it doesn't seem like an improvement.
>> If the query is very long (which isn't unlikely) I think people would
>> prefer to see the option(s) up front.

> Having these options at the front of the WITH clause looks more
> natural to me.

Well, we've managed to get agreement on the semantics of this thing,
let's not get hung up on the syntax details.

I propose that we implement and document this as

WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query )

which is maybe a bit clunky but not awful, and it would leave room
to generalize it to "AS [ optionname optionvalue [ , ... ] ]" if we
ever need to.  Looking at the precedent of e.g. EXPLAIN, we could
probably allow just "MATERIALIZE" as well, with the boolean value
defaulting to true.
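
For illustration, a query under that proposal would read something like
the following (a sketch of the proposed syntax only, not committed
grammar; big_table is an invented name):

WITH w AS MATERIALIZE ON (
    SELECT key, count(*) AS cnt FROM big_table GROUP BY key
)
SELECT * FROM w WHERE key = 42;

and, with the boolean defaulting to true, just "WITH w AS MATERIALIZE (...)".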

regards, tom lane



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-01-29 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jan-18, Peter Geoghegan wrote:
>> I thought that your ALTER OBJECT DEPENDS ON EXTENSION example made the
>> case for fixing that directly inarguable. I'm slightly surprised that
>> you're not fully convinced of this already. Have I missed some
>> subtlety?

> I agree that it needs fixed, but I don't think we know what to change it
> *to*.  The suggestion to use one AUTO and one INTERNAL seems to me to
> break some use cases.  Maybe one INTERNAL and one INTERNAL_AUTO works
> well, not sure.

I've been chewing on this problem off-list, and frankly it's a mess.

The closest I could get to solving it along the original lines
went like this:

 * In addition, we support INTERNAL_AUTO dependencies, which alter the
 * rules for this.  If the target has both INTERNAL and INTERNAL_AUTO
 * dependencies, then it can be dropped if any one of those objects is
 * dropped, but not unless at least one of them is.  In this situation we
 * mustn't automatically transform a random deletion request into a
 * parent-object deletion.  Instead, we proceed with deletion if we are
 * recursing from any owning object, and otherwise set the object aside to
 * recheck at the end.  If we don't later find a reason to delete one of
 * the owning objects, we'll throw an error.

I think that's ugly as sin: INTERNAL means something different if there's
another dependency beside it?  It also turns out to have some
implementation problems we need not get into here, but which would create
a performance penalty for deletions.

A theoretically purer answer is to split INTERNAL_AUTO into two new
dependency types, one linking to the real "owning" object (the parent
index or trigger) and the other to the secondary owner (the partition
table), while leaving regular INTERNAL as it stands.  I don't especially
want to go that way, though: it seems overcomplicated from the standpoint
of outside inspections of pg_depend, and it'd be very risky to back-patch
into v11.  (For instance, if someone went backwards on their postmaster
minor release, the older v11 version would not know what to do with the
new dependency type.)

It strikes me however that we can stick with the existing catalog contents
by making this definition: of the INTERNAL_AUTO dependencies of a given
object, the "true owner" to be reported in errors is the dependency
that is of the same classId as that object.  If this doesn't uniquely
identify one dependency, the report will mention a random one.  This is
ugly as well, but it's certainly a lot more practical to cram into v11,
and it seems like it would cover the current and likely future use-cases
for partition-related objects.  When and if we find a use-case for which
this doesn't work, we can think about extending the catalog representation
to identify a unique owner in a less ad-hoc way.
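
For concreteness, here is a sketch of how the INTERNAL_AUTO entries in
question can be inspected (the object names are made up, and the
partition index name pt1_a_idx is assumed from the default naming):

CREATE TABLE pt (a int) PARTITION BY LIST (a);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES IN (1);
CREATE INDEX pt_a_idx ON pt (a);   -- also creates an index on pt1

SELECT objid::regclass     AS dependent,
       refobjid::regclass  AS referenced,
       deptype
FROM pg_depend
WHERE classid = 'pg_class'::regclass
  AND objid = 'pt1_a_idx'::regclass;

On a v11-era server this should show one such entry referencing the
parent index and one referencing the partition table, which is the pair
of rows the "true owner" rule above has to disambiguate.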

With either of these, the implementation I envision is to let the first
scan loop in findDependentObjects examine all the ref dependencies before
deciding what to do.  If any one of the INTERNAL_AUTO dependencies is
being recursed from, we can allow the deletion to proceed.  If not, do not
continue with the deletion, but save the target object's ID into a side
array, along with the ID of the object we determined to be its "true
owner".  After completing the recursive dependency-tree walk, check each
object listed in the side array to see if it got deleted (by looking in
the main array of deletable objects that findDependentObjects reports).
If not, complain, citing the also-saved owning object ID as the object to
delete instead.  This closes the hole I identified that the existing code
just assumes it can skip deleting the dependent object.

There are still some subtleties here: even if we didn't delete the
dependent object, someone else might've done so concurrently, which'd
result in getObjectDescription() failing if we just blindly try to
print an error message.  We could fix that by re-locking the object
and seeing if it still exists before complaining; but I wonder if that
raises the risks of deadlocks materially.

regards, tom lane



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-29 Thread Andres Freund
On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I did that now. I couldn't reproduce it locally, despite a lot of
> > runs. Looking at the buildfarm it looks like the failures were,
> > excluding handfish which failed without recognizable symptoms before and
> > after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> > is interesting.
> 
> Isn't it now.  Something about the BSD scheduler perhaps?  But we've
> got four or five different BSD-ish platforms that reported failures,
> and it's hard to believe they've all got identical schedulers.
> 
> That second handfish failure does match the symptoms elsewhere:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> 
> --- 
> /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr
>   2018-10-30 20:11:45.551967381 -0300
> +++ 
> /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr
>2019-01-28 22:38:20.614211568 -0200
> @@ -0,0 +1,20 @@
> +SQL error: page 0 of relation "test_thread" should be empty but is not on 
> line 125
> 
> so it's not quite 100% BSD, but certainly the failure rate on BSD is
> way higher than elsewhere.  Puzzling.

Interesting.

While chatting with Robert about this issue I came across the following
section of code:

	/*
	 * If the FSM knows nothing of the rel, try the last page before we
	 * give up and extend.  This avoids one-tuple-per-page syndrome during
	 * bootstrapping or in a recently-started system.
	 */
	if (targetBlock == InvalidBlockNumber)
	{
		BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

		if (nblocks > 0)
			targetBlock = nblocks - 1;
	}


I think that explains the issue (albeit not why it is much more frequent
on BSDs).  Because we're not going through the FSM, it's perfectly
possible to find a page that is uninitialized, *and* is not yet in the
FSM. The only reason this wasn't previously actively broken, I think, is
that while we previously *also* looked at that page (before the extending
backend acquired a lock!), when looking at the page
PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
space because it just interprets the zeroes in pd_upper - pd_lower as no
free space.

Hm, thinking about what a good solution here could be.

Greetings,

Andres Freund



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-01-29 Thread Robert Haas
On Fri, Jan 25, 2019 at 4:18 PM Alvaro Herrera  wrote:
> > I wrote a little patch that stores the relation OIDs of the partitions
> > into the PartitionedPruneRelInfo and then, at execution time, does an
> > Assert() that what it gets matches what existed at plan time.  I
> > figured that a good start would be to find a test case where this
> > fails with concurrent DDL allowed, but I haven't so far succeeded in
> > devising one.  To make the Assert() fail, I need to come up with a
> > case where concurrent DDL has caused the PartitionDesc to be rebuilt
> > but without causing an update to the plan.  If I use prepared queries
> > inside of a transaction block, [...]
>
> > I also had the idea of trying to use a cursor, because if I could
> > start execution of a query, [...]
>
> Those are the ways I thought of, and the reason for the shape of some of
> those .spec tests.  I wasn't able to hit the situation.

I've managed to come up with a test case that seems to hit this case.

Preparation:

create table foo (a int, b text, primary key (a)) partition by range (a);
create table foo1 partition of foo for values from (0) to (1000);
create table foo2 partition of foo for values from (1000) to (2000);
insert into foo1 values (1, 'one');
insert into foo2 values (1001, 'two');
alter system set plan_cache_mode = force_generic_plan;
select pg_reload_conf();

$ cat >x
alter table foo detach partition foo2;
alter table foo attach partition foo2 for values from (1000) to (2000);
^D

Window #1:

prepare foo as select * from foo where a = $1;
explain execute foo(1500);
\watch 0.01

Window #2:

$ pgbench -n -f x -T 60

Boom:

TRAP: FailedAssertion("!(partdesc->nparts == pinfo->nparts)", File:
"execPartition.c", Line: 1631)

I don't know how to reduce this to something reliable enough to
include it in the regression tests, and maybe we don't really need
that, but it's good to know that this is not a purely theoretical
problem.  I think next I'll try to write some code to make
execPartition.c able to cope with the situation when it arises.

(My draft/WIP patches attached, if you're interested.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0008-Drop-lock-level.patch
Description: Binary data


0006-Adapt-the-executor-to-use-a-PartitionDirectory.patch
Description: Binary data


0005-Adapt-the-optimizer-to-use-a-PartitionDirectory.patch
Description: Binary data


0007-relid_map-crosschecks.patch
Description: Binary data


0004-Postpone-old-context-removal.patch
Description: Binary data


0003-Initial-cut-at-PartitionDirectory.patch
Description: Binary data


0002-Ensure-that-RelationBuildPartitionDesc-sees-a-consis.patch
Description: Binary data


0001-Move-code-for-managing-PartitionDescs-into-a-new-fil.patch
Description: Binary data


Re: Covering GiST indexes

2019-01-29 Thread Andrey Borodin


> On 29 Jan 2019, at 23:43, Andreas Karlsson  wrote:
> 
> On 29/01/2019 18.45, Andrey Borodin wrote:
>> Also, I've unified gist and r-tree syntax tests for INCLUDE.
> 
> Why not just keep it simple? The purpose is not to test the GiST indexes, for 
> that we have index_including_gist.sql.
> 
> I propose the following.

Indeed, it's quite strange to check that a GiST index cannot be built without a 
GiST opclass.

PFA patch without redundant checks.

Best regards, Andrey Borodin.



0001-Covering-GiST-v9.patch
Description: Binary data




Re: Covering GiST indexes

2019-01-29 Thread Andreas Karlsson

On 29/01/2019 18.45, Andrey Borodin wrote:

Also, I've unified gist and r-tree syntax tests for INCLUDE.


Why not just keep it simple? The purpose is not to test the GiST 
indexes, for that we have index_including_gist.sql.


I propose the following.

/*
 * 7. Check various AMs. All but btree and gist must fail.
 */
CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
CREATE INDEX on tbl USING rtree(c3) INCLUDE (c1, c4);
DROP TABLE tbl;

and

/*
 * 7. Check various AMs. All but btree and gist must fail.
 */
CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR:  access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
ERROR:  access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR:  access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
ERROR:  access method "hash" does not support included columns
CREATE INDEX on tbl USING rtree(c3) INCLUDE (c1, c4);
NOTICE:  substituting access method "gist" for obsolete method "rtree"
DROP TABLE tbl;



Re: Covering GiST indexes

2019-01-29 Thread Andrey Borodin
Thanks! Here's new version 8

> On 29 Jan 2019, at 22:00, Andreas Karlsson  wrote:
> 
> * I think it is worth writing a short comment when you create truncTupdesc 
> about why this is done.
It resulted in a not-so-short comment, but I think it is OK.
> 
> * Very minor thing: the diff below is pointless churn on a line not touched 
> by the patch.
> 
> -values, isnull, true /* size is currently bogus */ );
> +values, isnull, true /* size is currently bogus */);
Fixed.
> 
> * Another very minor thing: The diff below from gistFormTuple() should 
> probably be consistent about brackets.
> 
> +   if (isnull[i])
> +   compatt[i] = (Datum) 0;
> +   else
> +   {
> +   compatt[i] = attdata[i];
> +   }
Fixed.


Also, I've unified gist and r-tree syntax tests for INCLUDE.

Best regards, Andrey Borodin.



0001-Covering-GiST-v8.patch
Description: Binary data


Ltree syntax improvement

2019-01-29 Thread Dmitry Belyavsky
Dear all,

Please find attached a patch extending the set of symbols allowed in
ltree labels. The patch introduces two ways of escaping symbols, via
backslash-escaping individual symbols and via quoting labels in whole, for
the ltree, lquery and ltxtquery datatypes.

-- 
SY, Dmitry Belyavsky
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index 8226930905..5f45726229 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION ltree;
+SET standard_conforming_strings=on;
 -- Check whether any of our opclasses fail amvalidate
 SELECT amname, opcname
 FROM pg_opclass opc LEFT JOIN pg_am am ON am.oid = opcmethod
@@ -7679,3 +7680,1587 @@ SELECT count(*) FROM _ltreetest WHERE t ? '{23.*.1,23.*.2}' ;
 15
 (1 row)
 
+-- Extended syntax, escaping, quoting etc
+-- success
+SELECT E'\\.'::ltree;
+ ltree 
+---
+ "."
+(1 row)
+
+SELECT E'\\ '::ltree;
+ ltree 
+---
+ " "
+(1 row)
+
+SELECT E''::ltree;
+ ltree 
+---
+ "\"
+(1 row)
+
+SELECT E'\\a'::ltree;
+ ltree 
+---
+ a
+(1 row)
+
+SELECT E'\\n'::ltree;
+ ltree 
+---
+ n
+(1 row)
+
+SELECT E'x'::ltree;
+ ltree 
+---
+ "x\"
+(1 row)
+
+SELECT E'x\\ '::ltree;
+ ltree 
+---
+ "x "
+(1 row)
+
+SELECT E'x\\.'::ltree;
+ ltree 
+---
+ "x."
+(1 row)
+
+SELECT E'x\\a'::ltree;
+ ltree 
+---
+ xa
+(1 row)
+
+SELECT E'x\\n'::ltree;
+ ltree 
+---
+ xn
+(1 row)
+
+SELECT 'a b.с d'::ltree;
+ltree
+-
+ "a b"."с d"
+(1 row)
+
+SELECT ' e . f '::ltree;
+ ltree 
+---
+ e.f
+(1 row)
+
+SELECT ' '::ltree;
+ ltree 
+---
+ 
+(1 row)
+
+SELECT E'\\ g  . h\\ '::ltree;
+   ltree   
+---
+ " g"."h "
+(1 row)
+
+SELECT E'\\ g'::ltree;
+ ltree 
+---
+ " g"
+(1 row)
+
+SELECT E' h\\ '::ltree;
+ ltree 
+---
+ "h "
+(1 row)
+
+SELECT '" g  "." h "'::ltree;
+ltree 
+--
+ " g  "." h "
+(1 row)
+
+SELECT '" g  " '::ltree;
+ ltree  
+
+ " g  "
+(1 row)
+
+SELECT '" g  "   ." h "  '::ltree;
+ltree 
+--
+ " g  "." h "
+(1 row)
+
+SELECT nlevel(E'Bottom\\.Test'::ltree);
+ nlevel 
+
+  1
+(1 row)
+
+SELECT subpath(E'Bottom\\.'::ltree, 0, 1);
+  subpath  
+---
+ "Bottom."
+(1 row)
+
+SELECT subpath(E'a\\.b', 0, 1);
+ subpath 
+-
+ "a.b"
+(1 row)
+
+SELECT subpath(E'a\\..b', 1, 1);
+ subpath 
+-
+ b
+(1 row)
+
+SELECT subpath(E'a\\..\\b', 1, 1);
+ subpath 
+-
+ b
+(1 row)
+
+SELECT subpath(E'a b.с d'::ltree, 1, 1);
+ subpath 
+-
+ "с d"
+(1 row)
+
+SELECT(
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'\z\z\z\z\z')::ltree;
+  ltree  
+-
+ 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789z
+(1 row)
+
+SELECT('   ' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'\a\b\c\d\e   ')::ltree;
+  ltree  
+-
+ 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abcde
+(1 row)
+
+SELECT 'abc\|d'::lquery;
+ lquery  
+-
+ "abc|d"
+(1 row)
+
+SELECT 'abc\|d'::ltree ~ 'abc\|d'::lquery;
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT 'abc|d'::ltree ~ 'abc*'::lquery; --true
+ ?column? 

Re: Covering GiST indexes

2019-01-29 Thread Andreas Karlsson

On 29/01/2019 18.00, Andreas Karlsson wrote:
Thanks for the new version of the patch. Based on my knowledge of PG 
this is starting to look good, and I have only three small comments below.


Sorry, I missed rtree in the tests. Can you fix the below to match the 
gist example?


 CREATE INDEX on tbl USING rtree(c1, c2) INCLUDE (c3, c4);
 NOTICE:  substituting access method "gist" for obsolete method "rtree"
-ERROR:  access method "gist" does not support included columns
+ERROR:  data type integer has no default operator class for access 
method "gist"
+HINT:  You must specify an operator class for the index or define a 
default operator class for the data type.

+CREATE INDEX on tbl USING rtree(c4) INCLUDE (c1, c4);

Andreas



Re: Covering GiST indexes

2019-01-29 Thread Andreas Karlsson
Thanks for the new version of the patch. Based on my knowledge of PG 
this is starting to look good, and I have only three small comments below.


I am not 100% a fan of truncTupdesc, but as long as it is well commented 
I think that it is fine.


= Review

* I think it is worth writing a short comment when you create 
truncTupdesc about why this is done.


* Very minor thing: the diff below is pointless churn on a line not 
touched by the patch.


-values, isnull, true /* size is currently bogus 
*/ );
+values, isnull, true /* size is currently bogus 
*/);


* Another very minor thing: The diff below from gistFormTuple() should 
probably be consistent about brackets.


+   if (isnull[i])
+   compatt[i] = (Datum) 0;
+   else
+   {
+   compatt[i] = attdata[i];
+   }

Andreas



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-01-29 Thread Robert Haas
On Tue, Jan 29, 2019 at 12:29 AM Simon Riggs  wrote:
>> But I kind of wonder whether we're really gaining as much as you think
>> by trying to support concurrent DETACH in the first place.  If there
>> are queries running against the table, there's probably at least
>> AccessShareLock on the partition itself, not just the parent.  And
>> that means that reducing the necessary lock on the parent from
>> AccessExclusiveLock to something less doesn't really help that much,
>> because we're still going to block trying to acquire
>> AccessExclusiveLock on the child.  Now you might say: OK, well, just
>> reduce that lock level, too.
>
> The whole point of CONCURRENT detach is that you're not removing it whilst 
> people are still using it, you're just marking it for later disuse.

Well, I don't think that's the way any patch so far proposed actually works.

>> But that seems to me to be opening a giant can of worms which we are
>> unlikely to get resolved in time for this release.  The worst of those
>> problem is that if you also reduce the lock level on the partition
>> when attaching it, then you are adding a constraint while somebody
>> might at the exact same time be inserting a tuple that violates that
>> constraint.
>
> Spurious.
>
> This would only be true if we were adding a constraint that affected existing 
> partitions.
>
> The constraint being added affects the newly added partition, not existing 
> ones.

I agree that it affects the newly added partition, not existing ones.
But if you don't hold an AccessExclusiveLock on that partition while
you are adding that constraint to it, then somebody could be
concurrently inserting a tuple that violates that constraint.  This
would be an INSERT targeting the partition directly, not somebody
operating on the partitioning hierarchy to which it is being attached.
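
To sketch the race being described (table names made up): if the lock
taken on the partition during ATTACH were weaker than
AccessExclusiveLock, nothing would stop the second session's insert
from committing a row outside the bound being added:

-- Session 1: attach the table as a partition, adding an implicit
-- partition constraint (a >= 0 AND a < 1000)
ALTER TABLE parent ATTACH PARTITION part1
    FOR VALUES FROM (0) TO (1000);

-- Session 2, concurrently, inserting into the partition directly:
INSERT INTO part1 VALUES (5000);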

>> Those two things have to be synchronized somehow.  You
>> could avoid that by reducing the lock level on the partition when
>> detaching and not when attaching.  But even then, detaching a
>> partition can involve performing a whole bunch of operations for which
>> we currently require AccessExclusiveLock. AlterTableGetLockLevel says:
>>
>> /*
>>  * Removing constraints can affect SELECTs that have been
>>  * optimised assuming the constraint holds true.
>>  */
>> case AT_DropConstraint: /* as DROP INDEX */
>> case AT_DropNotNull:/* may change some SQL plans */
>> cmd_lockmode = AccessExclusiveLock;
>> break;
>>
>> Dropping a partition implicitly involves dropping a constraint.  We
>> could gamble that the above has no consequences that are really
>
> It's not a gamble if you know that the constraints being dropped constrain 
> only the object being dropped.

That's not true, but I can't refute your argument any more than that
because you haven't made one.

>  I've not read every argument on this thread, but many of the later points 
> made here are spurious, by which I mean they sound like they could apply but 
> in fact do not.

I think they do apply, and until somebody explains convincingly why
they don't, I'm going to keep thinking that they do.   Telling me that
my points are wrong without making any kind of argument about why they
are wrong is not constructive.  I've put a lot of energy into
analyzing this topic, both recently and in previous release cycles,
and I'm not inclined to just say "OK, well, Simon says I'm wrong, so
that's the end of it."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rename nodes/relation.h => nodes/pathnodes.h ?

2019-01-29 Thread Andres Freund
Hi,

On 2019-01-29 10:31:39 -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2019-Jan-28, Tom Lane wrote:
> > I do like planner/pathnodes.h as a name, FWIW.
> 
> Yeah, I think I'll go with pathnodes.h.  We'd probably keep using that
> for the Path node typedefs themselves, even if somebody comes up with
> a design for splitting out other things.

+1

FWIW, I can live with the #ifndef, #define, typedef .. #endif thing, I
just don't think it's pretty.

Greetings,

Andres Freund



Re: Follow-up on INSERT INTO ... SET ...

2019-01-29 Thread Marko Tiikkaja
On Tue, Jan 29, 2019 at 6:26 PM Sven Berkvens-Matthijsse <
s...@postgresql.berkvens.net> wrote:

> On 29/01/2019 07.20, Tom Lane wrote:
> > Looking at the thread, it seems like Marko lost interest for some
> > reason, and never submitted a revised patch.
>
> That was my conclusion too, but I didn't know whether there had been
> some off-list discussion that eventually led to the abandonment of the
> patch and proposal.
>

There has not.  I just don't have the energy or interest to pursue this at
the moment.


.m


Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

2019-01-29 Thread Ildar Musin
Hello,

The patch needs a rebase as it doesn't apply to the current master. I applied
it to an older commit to test it. It worked fine so far.

I found one bug, though, which would cause the resolver to finish by timeout
even though there are unresolved foreign transactions in the list. The
`fdw_xact_exists()` function expects the database id as the first argument and
the xid as the second, but everywhere it is called the arguments are specified
in the opposite order (xid first, then dbid).  Also, the function declaration
in the header doesn't match its definition.
There are some other things I found.
* In `FdwXactResolveAllDanglingTransactions()` variable `n_resolved` is
  declared as bool but used as integer.
* In fdwxact.c's module comment there are
`FdwXactRegisterForeignTransaction()`
  and `FdwXactMarkForeignTransactionModified()` functions mentioned that are
  not there anymore.
* In documentation (storage.sgml) there is no mention of `pg_fdw_xact`
  directory.

A couple of stylistic notes.
* In the `FdwXactCtlData` struct both camel-case and snake-case naming are
  used.
* In `get_fdw_xacts()`, `xid != InvalidTransactionId` can be replaced with
  `TransactionIdIsValid(xid)`.
* In `generate_fdw_xact_identifier()` the `fx` prefix could be part of the
  format string instead of being passed to `sprintf` as an extra argument.

I'll continue looking into the patch. Thanks!



On Tue, Nov 20, 2018 at 12:18 PM Masahiko Sawada 
wrote:

> On Thu, Nov 15, 2018 at 7:36 PM Masahiko Sawada 
> wrote:
> >
> > On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada 
> wrote:
> > >
> > > On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <
> sawada.m...@gmail.com> wrote:
> > > >
> > > > On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <
> sawada.m...@gmail.com> wrote:
> > > > >
> > > > > On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI
> > > > >  wrote:
> > > > > >
> > > > > > Hello.
> > > > > >
> > > > > > # It took a long time to come here..
> > > > > >
> > > > > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <
> sawada.m...@gmail.com> wrote in
> 
> > > > > > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <
> sawada.m...@gmail.com> wrote:
> > > > > > ...
> > > > > > > * Updated docs, added the new section "Distributed
> Transaction" at
> > > > > > > Chapter 33 to explain the concept to users
> > > > > > >
> > > > > > > * Moved atomic commit codes into src/backend/access/fdwxact
> directory.
> > > > > > >
> > > > > > > * Some bug fixes.
> > > > > > >
> > > > > > > Please reivew them.
> > > > > >
> > > > > > I have some comments, with apologize in advance for possible
> > > > > > duplicate or conflict with others' comments so far.
> > > > >
> > > > > Thank you so much for reviewing this patch!
> > > > >
> > > > > >
> > > > > > 0001:
> > > > > >
> > > > > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT
> > > > > > relation is modified. Isn't it needed when UNLOGGED tables are
> > > > > > modified? It may be better that we have dedicated classification
> > > > > > macro or function.
> > > > >
> > > > > I think even if we do atomic commit for modifying an UNLOGGED
> > > > > table and a remote table the data will get inconsistent if the
> local
> > > > > server crashes. For example, if the local server crashes after
> > > > > prepared the transaction on foreign server but before the local
> commit
> > > > > and, we will lose the all data of the local UNLOGGED table whereas
> the
> > > > > modification of remote table is rollbacked. In case of persistent
> > > > > tables, the data consistency is left. So I think the keeping data
> > > > > consistency between remote data and local unlogged table is
> difficult
> > > > > and want to leave it as a restriction for now. Am I missing
> something?
> > > > >
> > > > > >
> > > > > > The flag is handled in heapam.c. I suppose that it should be done
> > > > > > in the upper layer considering coming pluggable storage.
> > > > > > (X_F_ACCESSEDTEMPREL is set in heapam, but..)
> > > > > >
> > > > >
> > > > > Yeah, or we can set the flag after heap_insert in ExecInsert.
> > > > >
> > > > > >
> > > > > > 0002:
> > > > > >
> > > > > > The name FdwXactParticipantsForAC doesn't sound good for me. How
> > > > > > about FdwXactAtomicCommitPartitcipants?
> > > > >
> > > > > +1, will fix it.
> > > > >
> > > > > >
> > > > > > Well, as the file comment of fdwxact.c,
> > > > > > FdwXactRegisterTransaction is called from FDW driver and
> > > > > > F_X_MarkForeignTransactionModified is called from executor. I
> > > > > > think that we should clarify who is responsible to the whole
> > > > > > sequence. Since the state of local tables affects, I suppose
> > > > > > executor is that. Couldn't we do the whole thing within executor
> > > > > > side?  I'm not sure but I feel that
> > > > > > F_X_RegisterForeignTransaction can be a part of
> > > > > > F_X_MarkForeignTransactionModified.  The callers of
> > > > > > MarkForeignTransactionModified can find whether the table is
> > > > > > involved in 2pc by 

Re: "could not reattach to shared memory" on buildfarm member dory

2019-01-29 Thread Heath Lord
On Thu, Jan 17, 2019 at 3:27 AM Noah Misch  wrote:

> On Sun, Dec 02, 2018 at 09:35:06PM -0800, Noah Misch wrote:
> > On Tue, Sep 25, 2018 at 08:05:12AM -0700, Noah Misch wrote:
> > > On Mon, Sep 24, 2018 at 01:53:05PM -0400, Tom Lane wrote:
> > > > Overall, I agree that neither of these approaches are exactly
> attractive.
> > > > We're paying a heck of a lot of performance or complexity to solve a
> > > > problem that shouldn't even be there, and that we don't understand
> well.
> > > > In particular, the theory that some privileged code is injecting a
> thread
> > > > into every new process doesn't square with my results at
> > > >
> https://www.postgresql.org/message-id/15345.1525145612%40sss.pgh.pa.us
> > > >
> > > > I think our best course of action at this point is to do nothing
> until
> > > > we have a clearer understanding of what's actually happening on dory.
> > > > Perhaps such understanding will yield an idea for a less painful fix.
> > >
> > > I see.
> >
> > Could one of you having a dory login use
> > https://live.sysinternals.com/Procmon.exe to capture process events
> during
> > backend startup?
>
> Ping.
>
> > The ideal would be one capture where startup failed reattach
> > and another where it succeeded, but having the successful run alone
> would be a
> > good start.  The procedure is roughly this:
> >
> > - Install PostgreSQL w/ debug symbols.
> > - Start a postmaster.
> > - procmon /nomonitor
> > - procmon "Filter" menu -> Enable Advanced Output
> > - Ctrl-l, add filter for "Process Name" is "postgres.exe"
> > - Ctrl-e (starts collecting data)
> > - psql (leave it running)
> > - After ~60s, Ctrl-e again in procmon (stops collecting data)
> > - File -> Save -> PML
> > - File -> Save -> XML, include stack traces, resolve stack symbols
> > - Compress the PML and XML files, and mail them here
>

Noah,
   I apologize for not responding earlier, but we have been attempting to capture
the information you requested.  However, where would you like us to pull the
PostgreSQL install with debug symbols from?  We are aware of the BigSQL and EDB
installers, but are unsure whether these contain the debug symbols.  Currently
this machine has nothing on it other than the dependencies necessary to build
Postgres for the community.  If you could give us some more information on what
you would like us to install in order to gather the data needed to debug the
issue we are seeing, we would really appreciate it.  Also, if you need any
additional information about what we have installed on the server or how it is
configured, please just let us know and we will get that to you as soon as
possible.  Thank you again for your interest in this issue.

-Heath


Re: Follow-up on INSERT INTO ... SET ...

2019-01-29 Thread Sven Berkvens-Matthijsse

Hi Tom,

On 29/01/2019 07.20, Tom Lane wrote:

Sven Berkvens-Matthijsse  writes:

In 2016, a thread was started about implementing INSERT INTO ... SET ...
that included a patch and was basically ready for inclusion in
PostgreSQL. However, it seems as though it stagnated for some reason.
Does anybody remember this and is there perhaps someone who knows what
the current status is? If nobody is working on this any longer, I'd be
willing to try to revive the patch for the current code base.
The thread that I'm talking about can be found at:
https://www.postgresql.org/message-id/flat/709e06c0-59c9-ccec-d216-21e38cb5e...@joh.to


Looking at the thread, it seems like Marko lost interest for some
reason, and never submitted a revised patch.


That was my conclusion too, but I didn't know whether there had been 
some off-list discussion that eventually led to the abandonment of the 
patch and proposal.



I'm not really sure whether we'd want to support a nonstandard
syntax for this.  I can see that it'd have some usefulness for wide
tables, but is that enough of an argument to risk incompatibility
with future SQL-spec extensions?


I've seen multiple concerns about this in some messages that I found while 
Googling. But this is something that always plays a role when one 
decides to deviate from the SQL standard, isn't it?


PostgreSQL would not be the first database system to support the INSERT 
INTO ... SET ... syntax; MySQL has had it for a very long time (not that 
I've ever used MySQL, but I gathered this from what I've Googled). I 
have no idea whether the SQL standard folks take that sort of thing into 
account when proposing new features for the SQL standard. But if they 
do, there is less risk of running into problems here because the syntax 
has already been available in the field for a very long time.



Looking at the patch itself, I'd raise two major complaints:

* It looks like the only supported syntax is "INSERT ... SET
set_clause_list", which I guess is meant to be functionally
the same as INSERT ... VALUES with a single values list.
That's not terribly compelling.  I'd expect to be able to
use this syntax for multiple inserted rows.  Maybe allow
something like INSERT ... SET ... FROM ..., where the set-list
entries can use variables emitted by the FROM clause?


Yes, I've thought about this myself. What I ended up thinking about was 
allowing both the syntax


INSERT INTO whatever SET a = 1, b = 2, c = 3;

and

INSERT INTO whatever SET (a = 1, b = 2, c = 3), (a = 2, b = 1, d = 5);

Then I decided that that isn't all that great. And I dropped the thought.

Thinking more about your proposal for INSERT INTO ... SET ... FROM ... 
something like the following comes to mind. It looks like a nice idea, 
but specifically for wide tables, for which this proposal would make the 
most sense, you end up writing things like:


INSERT INTO whatever SET a = a, b = b, c = c, d = d, e = e
   FROM (SELECT 1 AS a, 2 AS b, 3 AS c, 4 AS d, 5 AS e UNION ALL
 SELECT 2 AS a, 1 AS b, 6 AS c, 8 AS d, 0 AS e);

Which does not look very nice in my opinion. The SELECT ... UNION ALL 
SELECT is not really the problem here because the rows could've come 
from some other table or a function call for example. The mostly silly 
SET list is what bugs me personally here.


I would already be very happy if the INSERT INTO syntax would support 
something like this:


INSERT INTO whatever NATURAL SELECT 1 AS c, 2 AS a, 3 AS b, 4 AS d;

Where the NATURAL (or some other keyword) would mean: look at the 
returned columns from the query (or VALUES) and map the values in the 
resulting rows to the correct columns in the target table, so that it 
doesn't matter in which order you select them. Produced columns that 
don't exist in the target table would produce an error. Missing columns 
would use defaults in the target table as usual.
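
For comparison, a sketch of what one has to write today to get the same
effect, spelling out the column list explicitly and keeping it in the
same order as the produced values:

INSERT INTO whatever (c, a, b, d) VALUES (1, 2, 3, 4);
-- or, with a query as the source:
INSERT INTO whatever (c, a, b, d) SELECT 1, 2, 3, 4;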


Anybody with any thoughts, ideas and/or concerns about this last 
proposal with the NATURAL keyword?


The only thing it would not support is explicit DEFAULT values, which 
VALUES does allow in an INSERT INTO statement. Not much of a concern 
though, INSERT INTO ... SELECT ... doesn't allow it either.



* If I'm reading it right, it blows off multiple-assignment
syntax -- that is, "SET (a,b,c) = row-valued-expr" -- with
the comment

+ * This is different from set_clause_list used in UPDATE because the SelectStmt
+ * syntax already does everything you might want to do in an in INSERT.

I'm unimpressed with that reasoning, because the SQL-standard
syntax already does everything you might want to do with this.


Yes, I agree: why specifically disallow some behavior because something 
else already supplies it, when you're proposing something that doesn't 
really supply any new functionality itself?



Since this patch was originally submitted, we sweated pretty
hard to upgrade our support of UPDATE's multiple-assignment
syntax so that it handles all interesting cases; so I'd want

Re: Why does execReplication.c lock tuples?

2019-01-29 Thread Petr Jelinek
On 29/01/2019 17:14, Andres Freund wrote:
> Hi
> 
> On January 29, 2019 8:04:55 AM PST, Petr Jelinek 
>  wrote:
>> On 29/01/2019 16:28, Andres Freund wrote:
>>> Hi,
>>>
>>> On January 29, 2019 4:19:52 AM PST, Petr Jelinek
>>  wrote:
 Hi,

 On 20/01/2019 21:03, Andres Freund wrote:
> Hi,
>
> Currently RelationFindReplTupleByIndex(),
>> RelationFindReplTupleSeq()
> lock the found tuple. I don't quite get what that achieves - why
 isn't
> dealing with concurrency in the table_update/delete calls at the
> callsites sufficient? As far as I can tell there's no meaningful
> concurrency handling in the heap_lock_tuple() paths, so it's not
>> like
 we
> follow update chains or anything. 
>

 Yeah that's leftover from the conflict detection/handling code that
>> I
 stripped away to keep the patched manageable size-wise. As things
>> stand
 now we could remove that and use normal heap_update instead of
>> simple
 variant. It'll be likely be needed again if we add conflict handling
>> in
 the future, but perhaps we could be smarter about it then (i.e. I
>> can
 imagine that it will be per table anyway, not necessarily default
 behavior).
>>>
>>> Why does conflict handling need the unconditional lock? Wouldn't just
>> doing that after an initial heap_update returned HeapTupleUpdated make
>> more sense?  And wouldn't it need to reckeck the row afterwards as
>> well?
>>>
>>
>> To prevent tuple changing from under the conflict resolution logic - we
>> need to make sure that whatever tuple the conflict resolution logic
>> gets
>> is the current one. If the tuple is locked you won't get the
>> HeapTupleUpdated in heap_update anymore (which is why we can use
>> simple_heap_update).
>>
>> In any case we don't have conflict resolution at the moment so it's
>> probably moot right now.
> 
> Not sure why that's relevant - what you'd do is to attempt the update, if it 
> succeeds: great. If not, then you'd do the lock tuple and after that do the 
> conflict resolution. Less WAL and cpu for the common case, same WAL as now 
> for the conflict, with just a small bit of additional CPU costs for the 
> latter.
> 

But the conflict is not about local-node concurrency, but rather a multi-node
one (i.e., merging data from multiple nodes), so you can't depend on the
heap_update return value alone. You usually need to figure out if there is a
conflict before you even do the update (based on tuple metadata/data),
which is why the lock is necessary.

> Either way, right now it's definitely superfluous...
> 

Agreed.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Why does execReplication.c lock tuples?

2019-01-29 Thread Andres Freund
Hi

On January 29, 2019 8:04:55 AM PST, Petr Jelinek  
wrote:
>On 29/01/2019 16:28, Andres Freund wrote:
>> Hi,
>> 
>> On January 29, 2019 4:19:52 AM PST, Petr Jelinek
> wrote:
>>> Hi,
>>>
>>> On 20/01/2019 21:03, Andres Freund wrote:
 Hi,

 Currently RelationFindReplTupleByIndex(),
>RelationFindReplTupleSeq()
 lock the found tuple. I don't quite get what that achieves - why
>>> isn't
 dealing with concurrency in the table_update/delete calls at the
 callsites sufficient? As far as I can tell there's no meaningful
 concurrency handling in the heap_lock_tuple() paths, so it's not
>like
>>> we
 follow update chains or anything. 

>>>
>>> Yeah that's leftover from the conflict detection/handling code that
>I
>>> stripped away to keep the patched manageable size-wise. As things
>stand
>>> now we could remove that and use normal heap_update instead of
>simple
>>> variant. It'll be likely be needed again if we add conflict handling
>in
>>> the future, but perhaps we could be smarter about it then (i.e. I
>can
>>> imagine that it will be per table anyway, not necessarily default
>>> behavior).
>> 
>> Why does conflict handling need the unconditional lock? Wouldn't just
>doing that after an initial heap_update returned HeapTupleUpdated make
>more sense?  And wouldn't it need to reckeck the row afterwards as
>well?
>> 
>
>To prevent tuple changing from under the conflict resolution logic - we
>need to make sure that whatever tuple the conflict resolution logic
>gets
>is the current one. If the tuple is locked you won't get the
>HeapTupleUpdated in heap_update anymore (which is why we can use
>simple_heap_update).
>
>In any case we don't have conflict resolution at the moment so it's
>probably moot right now.

Not sure why that's relevant - what you'd do is to attempt the update, if it 
succeeds: great. If not, then you'd do the lock tuple and after that do the 
conflict resolution. Less WAL and cpu for the common case, same WAL as now for 
the conflict, with just a small bit of additional CPU costs for the latter.

Either way, right now it's definitely superfluous...

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: pg_basebackup, walreceiver and wal_sender_timeout

2019-01-29 Thread Nick B
Greetings,
I also would like to thank everyone for looking into this.

On Sat, Jan 26, 2019 at 01:45:46PM +0100, Magnus Hagander wrote:
> One workaround you could perhaps look at here is to run pg_basebackup
> with --no-sync. That way there will be no fsyncs issued while running. You
> will then of course have to take care of syncing all the files to disk
> after it's done, but a network filesystem might be happier in dealing with
> a large "batch-sync" like that rather than piece-by-piece sync.

Thanks for the pointer. I actually was not aware of the existence of this
flag. I've run two rounds of tests with --no-sync and the backup failed at a
much later point in time, which suggests that the bottleneck is in fact the
metadata server of ceph. We're now looking into ways of improving this.
(This is a 15TB cluster with a few hundred thousand tables which on
average generates 4 WAL segments per second, so throttling the transfer rate
is not a good option either.)

On Sat, Jan 26, 2019 at 4:23 AM Michael Paquier
 wrote:
> The docs could be improved to describe that better..

I had an off-list discussion of a possible documentation update with
Stephen Frost and he voiced an opinion that the behaviour I was trying to
describe sounds a lot like a bug and documenting that is not a good
practice.

Upon further examination of WalSndKeepaliveIfNecessary I found out that the
implementation of "requesting an immediate reply" is done by setting the
socket into non-blocking mode and issuing a flush. I find it hard to
believe there is a scenario where the client can react to that keep-alive in
time (unless of course I misunderstood something). So the question is, will
we ever wait the full wal_sender_timeout before terminating the
connection?

Regards,
Nick.


Re: Why does execReplication.c lock tuples?

2019-01-29 Thread Petr Jelinek
On 29/01/2019 16:28, Andres Freund wrote:
> Hi,
> 
> On January 29, 2019 4:19:52 AM PST, Petr Jelinek 
>  wrote:
>> Hi,
>>
>> On 20/01/2019 21:03, Andres Freund wrote:
>>> Hi,
>>>
>>> Currently RelationFindReplTupleByIndex(), RelationFindReplTupleSeq()
>>> lock the found tuple. I don't quite get what that achieves - why
>> isn't
>>> dealing with concurrency in the table_update/delete calls at the
>>> callsites sufficient? As far as I can tell there's no meaningful
>>> concurrency handling in the heap_lock_tuple() paths, so it's not like
>> we
>>> follow update chains or anything. 
>>>
>>
>> Yeah that's leftover from the conflict detection/handling code that I
>> stripped away to keep the patched manageable size-wise. As things stand
>> now we could remove that and use normal heap_update instead of simple
>> variant. It'll be likely be needed again if we add conflict handling in
>> the future, but perhaps we could be smarter about it then (i.e. I can
>> imagine that it will be per table anyway, not necessarily default
>> behavior).
> 
> Why does conflict handling need the unconditional lock? Wouldn't just doing 
> that after an initial heap_update returned HeapTupleUpdated make more sense?  
> And wouldn't it need to reckeck the row afterwards as well?
> 

To prevent the tuple from changing under the conflict resolution logic - we
need to make sure that whatever tuple the conflict resolution logic gets
is the current one. If the tuple is locked you won't get the
HeapTupleUpdated in heap_update anymore (which is why we can use
simple_heap_update).

In any case we don't have conflict resolution at the moment so it's
probably moot right now.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Allowing extensions to supply operator-/function-specific info

2019-01-29 Thread Tom Lane
Simon Riggs  writes:
> On Tue, 29 Jan 2019 at 09:55, Tom Lane  wrote:
>> I have no particular
>> interest in working on that right now, because it doesn't respond to
>> what I understand PostGIS' need to be, and there are only so many
>> hours in the day.  But maybe it could be made workable in the future.

> I thought the whole exercise was about adding generic tools for everybody
> to use.

Well, I'm building infrastructure plus a small subset of what might
someday sit atop that infrastructure.  I'm not prepared to commit
right now to building stuff I can't finish for v12.

> You said PostGIS was looking to
> "automatically convert WHERE clauses into lossy index quals." which sounds
> very similar to what I outlined.

As I understand it, what they have is complex WHERE clauses from which
they want to extract clauses usable with simple (non-expression) indexes.
The case you seem to be worried about is the reverse: complicated index
definition and simple WHERE clause.  I think we're agreed that these two
cases can't be solved with the very same facility.  The support-function
mechanism probably can be used to provide extensibility for logic that
tries to attack the complicated-index case, but its mere existence won't
cause that logic to spring into being.

regards, tom lane



Re: pgsql: Build src/port files as a library with -fPIC, and use that in li

2019-01-29 Thread Christoph Berg
Re: Tom Lane 2019-01-29 <27653.1548774...@sss.pgh.pa.us>
> > It took a while to notice, but this change does break 8.2's initdb if
> > libpq5 from PG12 is installed:
> > $ /usr/lib/postgresql/8.2/bin/initdb
> > /usr/lib/postgresql/8.2/bin/initdb: symbol lookup error: 
> > /usr/lib/postgresql/8.2/bin/initdb: undefined symbol: pqsignal
> 
> Well, 8.2 is *very* long out of support, and there are plenty of
> nasty bugs you're at risk of if you keep using it.  So I don't
> find this to be a good argument for contorting what we do in v12.

I try to keep the old versions alive on apt.pg.o for Debian unstable (only)
so I have something to grab when customers ask questions about the old
versions. We are probably lucky that 8.2 broke only now, and leaving
it broken is a fair thing. I was just mentioning it for completeness; I
didn't mean to insist on fixing it.

> If you really want to keep using 8.2 (and even make new installations
> with it!?), you could back-patch the 8.3 patch that made sure that
> initdb didn't absorb pqsignal, pg_encoding_to_char, etc from libpq.
> It looks like what you'd need is a portion of the Makefile changes
> from 8468146b03c87f86162cee62e0de25f6e2d24b56.

I might give that a shot, but if it's too hard, dropping 8.2 "support"
is not a problem. Thanks for digging up the commit hash.

Christoph



Re: [PATCH] Log PostgreSQL version number on startup

2019-01-29 Thread Christoph Berg
Re: Peter Eisentraut 2019-01-16 
<92bfdfdf-4164-aec5-4e32-c26e67821...@2ndquadrant.com>
> > Why don't we start the logging collector before opening the sockets?
> 
> Specifically, something like the attached.
> 
> This keeps the dynamic module loading before the logging collector
> start, so we see those error messages on stderr, but then the setting up
> of the sockets would get logged.

This works nicely, so +1.

I'm attaching your patch as 0001 and my rebased one on top of it as
0002.

Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
>From d876be48c3f5beca5da6cb19e804ab0307409f80 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Tue, 29 Jan 2019 16:26:04 +0100
Subject: [PATCH 1/2] postmaster: Start syslogger earlier

---
 src/backend/postmaster/postmaster.c | 120 ++--
 1 file changed, 60 insertions(+), 60 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3052bbbc21..119c01d745 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -992,6 +992,66 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	InitializeMaxBackends();
 
+	/*
+	 * Initialize pipe (or process handle on Windows) that allows children to
+	 * wake up from sleep on postmaster death.
+	 */
+	InitPostmasterDeathWatchHandle();
+
+	/*
+	 * Forcibly remove the files signaling a standby promotion request.
+	 * Otherwise, the existence of those files triggers a promotion too early,
+	 * whether a user wants that or not.
+	 *
+	 * This removal of files is usually unnecessary because they can exist
+	 * only during a few moments during a standby promotion. However there is
+	 * a race condition: if pg_ctl promote is executed and creates the files
+	 * during a promotion, the files can stay around even after the server is
+	 * brought up to new master. Then, if new standby starts by using the
+	 * backup taken from that master, the files can exist at the server
+	 * startup and should be removed in order to avoid an unexpected
+	 * promotion.
+	 *
+	 * Note that promotion signal files need to be removed before the startup
+	 * process is invoked. Because, after that, they can be used by
+	 * postmaster's SIGUSR1 signal handler.
+	 */
+	RemovePromoteSignalFiles();
+
+	/* Do the same for logrotate signal file */
+	RemoveLogrotateSignalFiles();
+
+	/* Remove any outdated file holding the current log filenames. */
+	if (unlink(LOG_METAINFO_DATAFILE) < 0 && errno != ENOENT)
+		ereport(LOG,
+(errcode_for_file_access(),
+ errmsg("could not remove file \"%s\": %m",
+		LOG_METAINFO_DATAFILE)));
+
+	/*
+	 * If enabled, start up syslogger collection subprocess
+	 */
+	SysLoggerPID = SysLogger_Start();
+
+	/*
+	 * Reset whereToSendOutput from DestDebug (its starting state) to
+	 * DestNone. This stops ereport from sending log messages to stderr unless
+	 * Log_destination permits.  We don't do this until the postmaster is
+	 * fully launched, since startup failures may as well be reported to
+	 * stderr.
+	 *
+	 * If we are in fact disabling logging to stderr, first emit a log message
+	 * saying so, to provide a breadcrumb trail for users who may not remember
+	 * that their logging is configured to go somewhere else.
+	 */
+	if (!(Log_destination & LOG_DESTINATION_STDERR))
+		ereport(LOG,
+(errmsg("ending log output to stderr"),
+ errhint("Future log output will go to log destination \"%s\".",
+		 Log_destination_string)));
+
+	whereToSendOutput = DestNone;
+
 	/*
 	 * Establish input sockets.
 	 *
@@ -1183,12 +1243,6 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	set_stack_base();
 
-	/*
-	 * Initialize pipe (or process handle on Windows) that allows children to
-	 * wake up from sleep on postmaster death.
-	 */
-	InitPostmasterDeathWatchHandle();
-
 #ifdef WIN32
 
 	/*
@@ -1242,60 +1296,6 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	RemovePgTempFiles();
 
-	/*
-	 * Forcibly remove the files signaling a standby promotion request.
-	 * Otherwise, the existence of those files triggers a promotion too early,
-	 * whether a user wants that or not.
-	 *
-	 * This removal of files is usually unnecessary because they can exist
-	 * only during a few moments during a standby promotion. However there is
-	 * a race condition: if pg_ctl promote is executed and creates the files
-	 * during a promotion, the files can stay around even after the server is
-	 * brought up to new master. Then, if new standby starts by using the
-	 * backup taken from that master, the files can exist at the server
-	 * startup and should be removed in order to avoid an unexpected
-	 * promotion.
-	 *
-	 * Note that promotion signal files need to be 

Re: Rename nodes/relation.h => nodes/pathnodes.h ?

2019-01-29 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jan-28, Tom Lane wrote:
>> (There was some mention of trying to split relation.h into multiple
>> files, but I fail to see any advantage in that.)

> Hmm, nodes/relation.h includes lots of other files and is widely
> included.

Yup, that's why I'm trying to reduce the number of files that include it,
over in the other thread.

> If we can split it usefully, I vote for that.  However, I
> failed to find any concrete proposal for doing that.  I don't have one
> ATM but I'd like to keep the door open for it happening at some point.

The door's always open, of course, but I don't see any point in waiting
around for a hypothetical redesign.

> I do like planner/pathnodes.h as a name, FWIW.

Yeah, I think I'll go with pathnodes.h.  We'd probably keep using that
for the Path node typedefs themselves, even if somebody comes up with
a design for splitting out other things.

regards, tom lane



Re: Why does execReplication.c lock tuples?

2019-01-29 Thread Andres Freund
Hi,

On January 29, 2019 4:19:52 AM PST, Petr Jelinek  
wrote:
>Hi,
>
>On 20/01/2019 21:03, Andres Freund wrote:
>> Hi,
>> 
>> Currently RelationFindReplTupleByIndex(), RelationFindReplTupleSeq()
>> lock the found tuple. I don't quite get what that achieves - why
>isn't
>> dealing with concurrency in the table_update/delete calls at the
>> callsites sufficient? As far as I can tell there's no meaningful
>> concurrency handling in the heap_lock_tuple() paths, so it's not like
>we
>> follow update chains or anything. 
>> 
>
>Yeah that's leftover from the conflict detection/handling code that I
>stripped away to keep the patched manageable size-wise. As things stand
>now we could remove that and use normal heap_update instead of simple
>variant. It'll be likely be needed again if we add conflict handling in
>the future, but perhaps we could be smarter about it then (i.e. I can
>imagine that it will be per table anyway, not necessarily default
>behavior).

Why does conflict handling need the unconditional lock? Wouldn't just doing 
that after an initial heap_update returned HeapTupleUpdated make more sense?  
And wouldn't it need to reckeck the row afterwards as well?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: COPY FROM WHEN condition

2019-01-29 Thread Andres Freund
Hi,

On January 29, 2019 6:25:59 AM PST, Tomas Vondra  
wrote:
>On 1/29/19 8:18 AM, David Rowley wrote:
>> ...
>> Here are my performance tests of with and without your change to the
>> memory contexts (I missed where you posted your results).
>> 
>> $ cat bench.pl
>> for (my $i=0; $i < 8912891; $i++) {
>> print "1\n1\n2\n2\n";
>> }
>> 36a1281f86c: (with your change)
>> 
>> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
>> COPY 35651564
>> Time: 26825.142 ms (00:26.825)
>> Time: 27202.117 ms (00:27.202)
>> Time: 26266.705 ms (00:26.267)
>> 
>> 4be058fe9ec: (before your change)
>> 
>> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
>> COPY 35651564
>> Time: 25645.460 ms (00:25.645)
>> Time: 25698.193 ms (00:25.698)
>> Time: 25737.251 ms (00:25.737)
>> 
>
>How do I reproduce this? I don't see this test in the thread you
>linked,
>so I'm not sure how many partitions you were using, what's the
>structure
>of the table etc.

I think I might have a patch addressing the problem incidentally. For pluggable 
storage I slotified copy.c, which also removes the first heap_form_tuple. Quite 
possible that nothing more is needed. I've removed the batch context altogether 
in yesterday's rebase, there was no need anymore.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-29 Thread Christoph Berg
Re: Michael Paquier 2019-01-23 <20190123004722.ge3...@paquier.xyz>
> >> Largely because I think it's an independent patch from the CXXOPT need
> >> from Christopher / Debian packaging.  It's a larger patch, that needs
> >> more docs etc.  If whoever applies that wants to backpatch it - I'm not
> >> going to protest, I just wouldn't myself, unless somebody pipes up that
> >> it'd help them.
> > 
> > Ah, I see.  No arguments against.

Fwiw I'm not attached to using COPT and friends, I just happened to
pick these because that was one way that worked. With what I've
learned now, PG_*FLAGS is the better approach.

> The new PGXS flags would be I think useful to make sure
> that things for CFLAGS and LDFLAGS get propagated without having to
> hijack the original flags, so I can handle that part.  Which one would
> be wanted though?
> - PG_CXXFLAGS
> - PG_LDFLAGS
> - PG_CFLAGS
> 
> I'd see value in all of them, still everybody has likely a different
> opinion, so I would not mind discarding the ones are not thought as
> that much useful.  New PGXS infrastructure usually finds only its way
> on HEAD, so I'd rather not back-patch that part.  No issues with the
> back-patch portion for CXXOPT from me as that helps Debian.

The attached patch adds these three.

Re backpatching, I would at least need them in PG11 because that's
what is going to be released with Debian buster. An official backpatch
to all supported versions would be nice, but I could also sneak that
change into the Debian packages without breaking anything.

Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
>From 04bc55089a46697fec721aca6225a7ca1db6c66f Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Tue, 13 Nov 2018 11:35:26 +0100
Subject: [PATCH] Add PG_CFLAGS, PG_CXXFLAGS, and PG_LDFLAGS variables

Add PG_CFLAGS, PG_CXXFLAGS, and PG_LDFLAGS variables to pgxs.mk which
will append to the corresponding make variables. Notably, there was
previously no way to pass custom CXXFLAGS to 3rd party extension module
builds. (COPT and PROFILE support only CFLAGS and LDFLAGS)
---
 doc/src/sgml/extend.sgml | 27 +++
 src/makefiles/pgxs.mk| 12 
 2 files changed, 39 insertions(+)

diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index a6b77c1cfe..a94b216e77 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -1358,6 +1358,33 @@ include $(PGXS)
   
  
 
+ 
+  PG_CFLAGS
+  
+   
+will be added to CFLAGS
+   
+  
+ 
+
+ 
+  PG_CXXFLAGS
+  
+   
+will be added to CXXFLAGS
+   
+  
+ 
+
+ 
+  PG_LDFLAGS
+  
+   
+will be added to LDFLAGS
+   
+  
+ 
+
  
   PG_LIBS
   
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index d214cb9cf2..56a8b26183 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -53,6 +53,9 @@
 # tests require special configuration, or don't use pg_regress
 #   EXTRA_CLEAN -- extra files to remove in 'make clean'
 #   PG_CPPFLAGS -- will be added to CPPFLAGS
+#   PG_CFLAGS -- will be added to CFLAGS
+#   PG_CXXFLAGS -- will be added to CXXFLAGS
+#   PG_LDFLAGS -- will be added to LDFLAGS
 #   PG_LIBS -- will be added to PROGRAM link line
 #   PG_LIBS_INTERNAL -- same, for references to libraries within build tree
 #   SHLIB_LINK -- will be added to MODULE_big link line
@@ -119,6 +122,15 @@ endif
 ifdef PG_CPPFLAGS
 override CPPFLAGS := $(PG_CPPFLAGS) $(CPPFLAGS)
 endif
+ifdef PG_CFLAGS
+override CFLAGS := $(PG_CFLAGS) $(CFLAGS)
+endif
+ifdef PG_CXXFLAGS
+override CXXFLAGS := $(PG_CXXFLAGS) $(CXXFLAGS)
+endif
+ifdef PG_LDFLAGS
+override LDFLAGS := $(PG_LDFLAGS) $(LDFLAGS)
+endif
 
 # logic for HEADERS_* stuff
 
-- 
2.20.1



Re: pgsql: Build src/port files as a library with -fPIC, and use that in li

2019-01-29 Thread Tom Lane
Christoph Berg  writes:
> Re: Tom Lane 2018-09-28 <20671.1538142...@sss.pgh.pa.us>
>> I proposed in
>> https://www.postgresql.org/message-id/19581.1538077...@sss.pgh.pa.us
>> that we should remove pqsignal, as well as
>> pg_utf_mblen
>> pg_encoding_to_char
>> pg_char_to_encoding
>> pg_valid_server_encoding
>> pg_valid_server_encoding_id
>> 
>> from libpq's exports, on the grounds that (a) nobody should be using
>> those (they're undocumented as exports), and (b) anybody who is using
>> them should get them from libpgport/libpgcommon instead.  Thoughts?

> It took a while to notice, but this change does break 8.2's initdb if
> libpq5 from PG12 is installed:
> $ /usr/lib/postgresql/8.2/bin/initdb
> /usr/lib/postgresql/8.2/bin/initdb: symbol lookup error: 
> /usr/lib/postgresql/8.2/bin/initdb: undefined symbol: pqsignal

Well, 8.2 is *very* long out of support, and there are plenty of
nasty bugs you're at risk of if you keep using it.  So I don't
find this to be a good argument for contorting what we do in v12.

If you really want to keep using 8.2 (and even make new installations
with it!?), you could back-patch the 8.3 patch that made sure that
initdb didn't absorb pqsignal, pg_encoding_to_char, etc from libpq.
It looks like what you'd need is a portion of the Makefile changes
from 8468146b03c87f86162cee62e0de25f6e2d24b56.

BTW, I noticed this in that patch's commit message:

Going forward, we want to fix things so that encoding IDs can be changed
without an ABI break, and this commit includes the changes needed to allow
libpq's encoding IDs to be treated as fully independent of the backend's.
The main issue is that libpq clients should not include pg_wchar.h or
otherwise assume they know the specific values of libpq's encoding IDs,
since they might encounter version skew between pg_wchar.h and the libpq.so
they are using.  To fix, have libpq officially export functions needed for
encoding name<=>ID conversion and validity checking; it was doing this
anyway unofficially.

So it was wrong of me to propose moving pg_encoding_to_char() et al
into libpgcommon: doing so would've re-introduced the hazard of
client code misinterpreting the encoding IDs returned by
PQclientEncoding() (and how the heck did I miss that libpq.sgml
does document that function for exactly that purpose?).

Fortunately, I didn't do that ...

regards, tom lane



Re: WIP: Avoid creation of the free space map for small tables

2019-01-29 Thread John Naylor
On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  wrote:

> You can find this change in attached patch.  Then, I ran the make
> check in src/bin/pgbench multiple times using test_conc_insert.sh.
> You can vary the number of times the test should run, if you are not
> able to reproduce it with this.
>
> The attached patch (clear_local_map_if_exists_1.patch) atop the main
> patch fixes the issue for me.  Kindly verify the same.

I got one failure in 50 runs. With the new patch, I didn't get any
failures in 300 runs.


-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: COPY FROM WHEN condition

2019-01-29 Thread Tomas Vondra
On 1/29/19 8:18 AM, David Rowley wrote:
> ...
> Here are my performance tests of with and without your change to the
> memory contexts (I missed where you posted your results).
> 
> $ cat bench.pl
> for (my $i=0; $i < 8912891; $i++) {
> print "1\n1\n2\n2\n";
> }
> 36a1281f86c: (with your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 26825.142 ms (00:26.825)
> Time: 27202.117 ms (00:27.202)
> Time: 26266.705 ms (00:26.267)
> 
> 4be058fe9ec: (before your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 25645.460 ms (00:25.645)
> Time: 25698.193 ms (00:25.698)
> Time: 25737.251 ms (00:25.737)
> 

How do I reproduce this? I don't see this test in the thread you linked,
so I'm not sure how many partitions you were using, what the structure of
the table is, etc.
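
My best guess from bench.pl's output would be a single-column list-partitioned
table along these lines, but please correct me (this is just a guess on my part):

CREATE TABLE listp (a int) PARTITION BY LIST (a);
CREATE TABLE listp1 PARTITION OF listp FOR VALUES IN (1);
CREATE TABLE listp2 PARTITION OF listp FOR VALUES IN (2);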

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Build src/port files as a library with -fPIC, and use that in li

2019-01-29 Thread Christoph Berg
Re: Tom Lane 2018-09-28 <20671.1538142...@sss.pgh.pa.us>
> >> I proposed in
> >> https://www.postgresql.org/message-id/19581.1538077...@sss.pgh.pa.us
> >> 
> >> that we should remove pqsignal, as well as
> >> pg_utf_mblen
> >> pg_encoding_to_char
> >> pg_char_to_encoding
> >> pg_valid_server_encoding
> >> pg_valid_server_encoding_id
> >> 
> >> from libpq's exports, on the grounds that (a) nobody should be using
> >> those (they're undocumented as exports), and (b) anybody who is using
> >> them should get them from libpgport/libpgcommon instead.  Thoughts?
> 
> > I'm fine with that if we say (a) should be true, and even if it is
> > not, (b) is an easy workaround. Bumping the libpq SONAME just because
> > of this seems excessive.
> 
> Yeah, if anyone insists that this requires a soname bump, I'd probably
> look for another solution.  Making the makefiles a bit cleaner is not
> worth the churn that would cause, IMO.

It took a while to notice, but this change does break 8.2's initdb if
libpq5 from PG12 is installed:

$ /usr/lib/postgresql/8.2/bin/initdb
/usr/lib/postgresql/8.2/bin/initdb: symbol lookup error: 
/usr/lib/postgresql/8.2/bin/initdb: undefined symbol: pqsignal

postgres 8.2 itself seems to work fine.

Christoph



Re: pg_basebackup, walreceiver and wal_sender_timeout

2019-01-29 Thread Oleksii Kliukin


> On 29. Jan 2019, at 10:45, Magnus Hagander  wrote:
> 
> On Tue, Jan 29, 2019 at 6:19 AM Michael Paquier  > wrote:
> On Mon, Jan 28, 2019 at 02:00:59PM +0100, Alex Kliukin wrote:
> > While reading the doc page for the pg_basebackup, I've been confused
> > by the fact that it says WAL files will be written to .tarballs
> > (either base.tar or pg_wal.tar) when pg_basebackup is instructed to
> > stream WALs alongside the backup itself. I think it makes sense to
> > elaborate that it only happens when tar format is specified (doc
> > patch is attached).
> 
> Agreed.  The current wording can be confusing depending on the format,
> and your suggestion looks right.  Any opinions from others?
> 
> Agreed, definitely confusing.
> 
> Since you also agreed on it, I went ahead and pushed (with backpatch).

Thanks a lot (and to Michael as well for looking into it)!

Cheers,
Oleksii

Re: move hash_any to utils/hash/hashfn.c

2019-01-29 Thread Alvaro Herrera
Hello

On 2019-Jan-25, Andres Freund wrote:

> I hate the current split quite a bit, albeit for somewhat different
> reasons. We make things like tag_hash, uint32_hash unnecessarily more
> expensive by adding an entirely useless external function call. And
> some of these can be fairly hot (e.g. for syscache).  So yea, let's
> move this stuff together.

Here's a patch, but I haven't done anything that would change tag_hash
et al.  Looking at the new hashfn.s, it looks like this:

tag_hash:
.LFB42:
.loc 1 681 0
.cfi_startproc
.LVL310:
.loc 1 682 0
call    hash_any
.LVL311:
.loc 1 684 0
rep ret
.cfi_endproc

In master, it looks like this instead:

tag_hash:
.LFB96:
.loc 1 53 0
.cfi_startproc
.LVL5:
subq    $8, %rsp
.cfi_def_cfa_offset 16
.loc 1 54 0
call    hash_any@PLT
.LVL6:
.loc 1 56 0
addq    $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc


I don't know if the change is significant.  Obviously we lost the
subq/addq (probably nothing to be excited about) but there's also the
@PLT in the call ... I don't know what that means.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 8422f23e1e04400806f36672e14811c5fde95cb5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 29 Jan 2019 04:34:25 -0300
Subject: [PATCH 1/2] move hash_any

---
 contrib/citext/citext.c |   2 +-
 contrib/hstore/hstore_op.c  |   2 +-
 contrib/pg_stat_statements/pg_stat_statements.c |   2 +-
 contrib/sepgsql/uavc.c  |   2 +-
 src/backend/access/common/tupdesc.c |   1 -
 src/backend/access/hash/hashfunc.c  | 627 +--
 src/backend/access/tablesample/bernoulli.c  |   2 +-
 src/backend/access/tablesample/system.c |   2 +-
 src/backend/catalog/pg_publication.c|   1 -
 src/backend/executor/execGrouping.c |   1 -
 src/backend/executor/nodeSamplescan.c   |   2 +-
 src/backend/lib/bloomfilter.c   |   2 +-
 src/backend/nodes/bitmapset.c   |   3 +-
 src/backend/storage/file/sharedfileset.c|   4 +-
 src/backend/tsearch/ts_typanalyze.c |   2 +-
 src/backend/utils/adt/acl.c |   2 +-
 src/backend/utils/adt/arrayfuncs.c  |   1 -
 src/backend/utils/adt/date.c|   2 +-
 src/backend/utils/adt/jsonb_gin.c   |   2 +-
 src/backend/utils/adt/jsonb_util.c  |   2 +-
 src/backend/utils/adt/mac.c |   2 +-
 src/backend/utils/adt/mac8.c|   2 +-
 src/backend/utils/adt/network.c |   2 +-
 src/backend/utils/adt/numeric.c |   2 +-
 src/backend/utils/adt/pg_lsn.c  |   1 -
 src/backend/utils/adt/rangetypes.c  |   3 +-
 src/backend/utils/adt/tid.c |   2 +-
 src/backend/utils/adt/timestamp.c   |   1 -
 src/backend/utils/adt/uuid.c|   2 +-
 src/backend/utils/adt/varchar.c |   3 +-
 src/backend/utils/adt/varlena.c |   2 +-
 src/backend/utils/cache/catcache.c  |   1 -
 src/backend/utils/cache/relcache.c  |   1 -
 src/backend/utils/hash/hashfn.c | 632 +++-
 src/backend/utils/resowner/resowner.c   |   3 +-
 src/include/access/hash.h   |  17 -
 src/include/utils/hashutils.h   |  19 +
 37 files changed, 680 insertions(+), 679 deletions(-)

diff --git a/contrib/citext/citext.c b/contrib/citext/citext.c
index 24ceeb11fc..a4adafe895 100644
--- a/contrib/citext/citext.c
+++ b/contrib/citext/citext.c
@@ -3,10 +3,10 @@
  */
 #include "postgres.h"
 
-#include "access/hash.h"
 #include "catalog/pg_collation.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/hashutils.h"
 #include "utils/varlena.h"
 
 PG_MODULE_MAGIC;
diff --git a/contrib/hstore/hstore_op.c b/contrib/hstore/hstore_op.c
index a915215fb6..ec3de5e03f 100644
--- a/contrib/hstore/hstore_op.c
+++ b/contrib/hstore/hstore_op.c
@@ -3,11 +3,11 @@
  */
 #include "postgres.h"
 
-#include "access/hash.h"
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "utils/builtins.h"
+#include "utils/hashutils.h"
 #include "utils/memutils.h"
 
 #include "hstore.h"
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177ebaa2c..ef70475813 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -61,7 +61,6 @@
 #include 
 #include 
 
-#include "access/hash.h"
 #include "catalog/pg_authid.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
@@ -78,6 +77,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include 

Re: pg_dump multi VALUES INSERT

2019-01-29 Thread Alvaro Herrera
On 2019-Jan-23, David Rowley wrote:

> On Wed, 23 Jan 2019 at 04:08, Alvaro Herrera  wrote:
> > Is it possible to avoid the special case for 0 columns by using the
> > UNION ALL syntax I showed?
> 
> It would be possible, but my thoughts are that we're moving away from
> the SQL standard by doing so.

Ah, that's a good point that I missed -- I agree with your reasoning.
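
For anyone else reading the thread: the 0-column special case exists because
multi-row VALUES has no zero-column form, so such rows have to be emitted one
at a time (illustration only, table names invented):

CREATE TABLE zerocol ();
CREATE TABLE twocol (a int, b text);
INSERT INTO zerocol DEFAULT VALUES;            -- one statement per row
INSERT INTO twocol VALUES (1, 'a'), (2, 'b');  -- the multi-row form pg_dump can use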

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_basebackup, walreceiver and wal_sender_timeout

2019-01-29 Thread Michael Paquier
On Tue, Jan 29, 2019 at 10:45:34AM +0100, Magnus Hagander wrote:
> Since you also agreed on it, I went ahead and pushed (with backpatch).

Thanks for taking care of it, Magnus.
--
Michael




Re: pg_stat_ssl additions

2019-01-29 Thread Kyotaro HORIGUCHI
At Tue, 29 Jan 2019 13:50:17 +0900, Michael Paquier  wrote 
in <20190129045017.gc3...@paquier.xyz>
> On Tue, Jan 29, 2019 at 12:18:29PM +0900, Kyotaro HORIGUCHI wrote:
> > At Mon, 28 Jan 2019 14:53:43 +0100, Peter Eisentraut 
> >  wrote in 
> > <24783370-5acd-e0f3-8eb7-7f42ff2a0...@2ndquadrant.com>
> >> This is strange.  The tests work for me, and also on the cfbot.  The
> > 
> > Agreed. It seemed so also to me.
> 
> The tests look sane to me for what it's worth.
> 
> > When initializing SSL context, it picks up the root certificate
> > from my home directory, not in test installation and I had one
> > there. It is not based on $HOME but pwent so it is unchangeable
> > (and it is the right design for the purpose).
> 
> Oops.  I agree that the tests ought to be as much isolated as
> possible, without optionally fetching things depending on the user
> environment.
> 
> > +# ssl-related properties may defautly set to the files in the users'
> > +# environment. Explicitly provide them a value so that they don't
> > +# point a valid file accidentially. Some other common properties are
> > +# set here together.
> > +# Attach this at the head of $common_connstr.
> > +my $def_connstr = "user=ssltestuser dbname=trustdb sslcert=invalid 
> > sslkey=invalid sslrootcert=invalid sslcrl=invalid ";
> > +
> 
> Maybe it is better to just put that in test_connect_ok and
> test_connect_fails for only the SSL parameters?  I don't see much
> point in enforcing dbname and user.

Hmm. That was the first thing I did for the issue. Anyway...

Of course dbname and user are not necessary to fix it. OK, I
attached two patches. The first applies on current master and prevents
psql from loading SSL-related files from outside the testing
environment. The second applies on top of the 0002 patch and prevents it
from loading a wrong sslrootcert.

> >  # The server should not accept non-SSL connections.
> >  test_connect_fails(
> > @@ -185,7 +192,7 @@ test_connect_ok(
> >  # Check that connecting with verify-full fails, when the hostname doesn't
> >  # match the hostname in the server's certificate.
> >  $common_connstr =
> > -  "user=ssltestuser dbname=trustdb sslcert=invalid 
> > sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR";
> > +  $def_connstr."sslrootcert=ssl/root+server_ca.crt 
> > hostaddr=$SERVERHOSTADDR";
> 
> I think this is bad, inconsistent style.  Adding the variable within
> the quoted section is fine, as much as moving SERVERHOSTADDR out.  But
> mixing styles is not.

Understood. Thanks for the suggestion.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From a3cb52ba165074db1d2910faa8e58030d84fa7c9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 29 Jan 2019 21:06:58 +0900
Subject: [PATCH 1/4] Prevent from loading ssl files out of testing
 environment.

The SSL test script can let psql load SSL-related files from outside the
testing environment. Fix it by explicitly setting sslcert, sslrootcert and
sslcrl when they are not used. sslkey doesn't need the fix because it is
loaded only when sslcert is loaded.
---
 src/test/ssl/t/001_ssltests.pl | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index 2b875a3c95..114541590d 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -93,7 +93,7 @@ note "running client tests";
 switch_server_cert($node, 'server-cn-only');
 
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid hostaddr=$SERVERHOSTADDR host=common-name.pg-ssltest.test";
+  "user=ssltestuser dbname=trustdb sslcert=invalid sslcrl=invalid hostaddr=$SERVERHOSTADDR host=common-name.pg-ssltest.test";
 
 # The server should not accept non-SSL connections.
 test_connect_fails(
@@ -185,7 +185,7 @@ test_connect_ok(
 # Check that connecting with verify-full fails, when the hostname doesn't
 # match the hostname in the server's certificate.
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR";
+  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt sslcrl=invalid hostaddr=$SERVERHOSTADDR";
 
 test_connect_ok(
 	$common_connstr,
@@ -205,7 +205,7 @@ test_connect_fails(
 switch_server_cert($node, 'server-multiple-alt-names');
 
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR sslmode=verify-full";
+  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt sslcrl=invalid hostaddr=$SERVERHOSTADDR sslmode=verify-full";
 
 test_connect_ok(
 	$common_connstr,
@@ -236,7 +236,7 @@ test_connect_fails(
 switch_server_cert($node, 'server-single-alt-name');
 
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR sslmode=verify-full";
+  "user=ssltestuser dbname=trustdb 

Re: Why does execReplication.c lock tuples?

2019-01-29 Thread Petr Jelinek
Hi,

On 20/01/2019 21:03, Andres Freund wrote:
> Hi,
> 
> Currently RelationFindReplTupleByIndex(), RelationFindReplTupleSeq()
> lock the found tuple. I don't quite get what that achieves - why isn't
> dealing with concurrency in the table_update/delete calls at the
> callsites sufficient? As far as I can tell there's no meaningful
> concurrency handling in the heap_lock_tuple() paths, so it's not like we
> follow update chains or anything. 
> 

Yeah, that's a leftover from the conflict detection/handling code that I
stripped away to keep the patch manageable size-wise. As things stand
now we could remove that and use normal heap_update instead of the simple
variant. It'll likely be needed again if we add conflict handling in
the future, but perhaps we could be smarter about it then (i.e. I can
imagine that it will be per table anyway, not necessarily the default behavior).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-29 Thread Amit Kapila
On Tue, Jan 29, 2019 at 5:20 PM Masahiko Sawada  wrote:
>
> On Tue, Jan 29, 2019 at 9:29 AM Amit Kapila  wrote:
> >
>
> I'd suspect integer alignment. In my environment, the actual tuple size
> is 28 bytes but the aligned size is 32 bytes (= MAXALIGN(28)), so we can
> store 226 tuples in a single page. But if MAXALIGN(28) = 28 then we can
> store 255 tuples, and 1000 tuples fit within 4 pages. The MAXALIGN of the
> four buildfarm animals seems to be 4 according to the configure script, so
> MAXALIGN(28) might be 28 on those buildfarms.
>

Good finding.  I was also wondering along these lines and wanted to
verify.  Thanks a lot.  So, this clearly explains why we have the second
failure in my email above [1].  I think this means that for the fsm test,
too, we have to be careful when relying on the number of pages in the
test.  I think now we have found the reasons and solutions for the first
three problems mentioned in my email [1].  For problem 4 (the failure on
jacana), I have speculated about a theory, but I am not sure how we can
confirm it.  Can we try the patch on jacana before considering the
patch for commit?  Is there any other way we can replicate that error?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1L%3DqWp_bJ5aTc9%2Bfy4Ewx2LPaLWY-RbR4a60g_rupCKnQ%40mail.gmail.com
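
As a sanity check, those numbers line up with the usual page overheads
(24-byte page header, plus a 4-byte line pointer per tuple on an 8kB page):

SELECT (8192 - 24) / (32 + 4) AS tuples_per_page_maxalign_8,  -- 226
       (8192 - 24) / (28 + 4) AS tuples_per_page_maxalign_4;  -- 255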

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pg_stat_ssl additions

2019-01-29 Thread Peter Eisentraut
On 29/01/2019 04:18, Kyotaro HORIGUCHI wrote:
> Some further investigation told me that the file
> ~/.postgresql/root.cert was the culprit.

OK, I could reproduce the problem and found a fix for it.  Basically you
need to specify sslrootcert in each test, and these new tests didn't do
it.  All other tests were OK, so a local fix was enough.

I have committed 0001..0003 now.  Attached is the latest version of
0004, about which I did not get an explicit comment in your last
review message.
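
With 0004 applied, a quick way to check the renamed and new columns is the
following (query written against the column names in the patch):

SELECT pid, ssl, version, cipher, client_dn, client_serial, issuer_dn
  FROM pg_stat_ssl
 WHERE ssl;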

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5168e09ae3ee0c61cdd40ababeee7a6e45bdcb01 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 28 Jan 2019 14:40:04 +0100
Subject: [PATCH v5 4/4] Add more columns to pg_stat_ssl

Add columns client_serial and issuer_dn to pg_stat_ssl.  These allow
uniquely identifying the client certificate.

Rename the existing column clientdn to client_dn, to make the naming
more consistent and easier to read.

Reviewed-by: Kyotaro HORIGUCHI 
Discussion: 
https://www.postgresql.org/message-id/flat/398754d8-6bb5-c5cf-e7b8-22e5f0983...@2ndquadrant.com/
---
 doc/src/sgml/monitoring.sgml  | 20 --
 src/backend/catalog/system_views.sql  |  4 +++-
 src/backend/libpq/be-secure-openssl.c | 29 +++
 src/backend/postmaster/pgstat.c   |  4 +++-
 src/backend/utils/adt/pgstatfuncs.c   | 22 
 src/include/catalog/pg_proc.dat   |  6 +++---
 src/include/libpq/libpq-be.h  |  2 ++
 src/include/pgstat.h  | 16 ---
 src/test/regress/expected/rules.out   | 10 +
 src/test/ssl/t/001_ssltests.pl|  8 
 10 files changed, 99 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 60a85a7898..7a84f51340 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2201,15 +2201,31 @@ pg_stat_ssl View
   or NULL if SSL is not in use on this connection
 
 
- clientdn
+ client_dn
  text
  Distinguished Name (DN) field from the client certificate
   used, or NULL if no client certificate was supplied or if SSL
   is not in use on this connection. This field is truncated if the
   DN field is longer than NAMEDATALEN (64 characters
-  in a standard build)
+  in a standard build).
  
 
+
+ client_serial
+ numeric
+ Serial number of the client certificate, or NULL if no client
+ certificate was supplied or if SSL is not in use on this connection.  The
+ combination of certificate serial number and certificate issuer uniquely
+ identifies a certificate (unless the issuer erroneously reuses serial
+ numbers).
+
+
+ issuer_dn
+ text
+ DN of the issuer of the client certificate, or NULL if no client
+ certificate was supplied or if SSL is not in use on this connection.
+ This field is truncated like client_dn.
+


   
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index f4d9e9daf7..3e229c693c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -782,7 +782,9 @@ CREATE VIEW pg_stat_ssl AS
 S.sslcipher AS cipher,
 S.sslbits AS bits,
 S.sslcompression AS compression,
-S.sslclientdn AS clientdn
+S.ssl_client_dn AS client_dn,
+S.ssl_client_serial AS client_serial,
+S.ssl_issuer_dn AS issuer_dn
 FROM pg_stat_get_activity(NULL) AS S;
 
 CREATE VIEW pg_replication_slots AS
diff --git a/src/backend/libpq/be-secure-openssl.c 
b/src/backend/libpq/be-secure-openssl.c
index 789a975409..f619fe9639 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1117,6 +1117,35 @@ be_tls_get_peerdn_name(Port *port, char *ptr, size_t len)
ptr[0] = '\0';
 }
 
+void
+be_tls_get_issuer_name(Port *port, char *ptr, size_t len)
+{
+   if (port->peer)
+   strlcpy(ptr, 
X509_NAME_to_cstring(X509_get_issuer_name(port->peer)), len);
+   else
+   ptr[0] = '\0';
+}
+
+void
+be_tls_get_peer_serial(Port *port, char *ptr, size_t len)
+{
+   if (port->peer)
+   {
+   ASN1_INTEGER *serial;
+   BIGNUM *b;
+   char   *decimal;
+
+   serial = X509_get_serialNumber(port->peer);
+   b = ASN1_INTEGER_to_BN(serial, NULL);
+   decimal = BN_bn2dec(b);
+   BN_free(b);
+   strlcpy(ptr, decimal, len);
+   OPENSSL_free(decimal);
+   }
+   else
+   ptr[0] = '\0';
+}
+
 #ifdef HAVE_X509_GET_SIGNATURE_NID
 char *
 be_tls_get_certificate_hash(Port *port, size_t *len)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 

Re: Thread-unsafe coding in ecpg

2019-01-29 Thread Michael Meskes
> I like having a hard limit on the number of loop iterations;
> that should ensure that the test terminates no matter how confused
> ecpglib is.

I get your point and thus will only clean up the tests a little bit.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [WIP] CREATE SUBSCRIPTION with FOR TABLES clause (table filter)

2019-01-29 Thread Evgeniy Efimkin
Hi!
1. done
2. renamed to pg_user_subscriptions
3. It is handled by pg_dump; I checked an upgrade from 10 to 12devel, and it works fine
4. done
5. done
6. I took it from AlterSubscription_refresh; there is no free() in that function either
7. done
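
For clarity, the user-visible syntax after this patch looks like this (sketch
only; object names and the connection string are made up):

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=src'
    PUBLICATION mypub
    FOR TABLE t1, t2;

ALTER SUBSCRIPTION mysub ADD TABLE t3;
ALTER SUBSCRIPTION mysub DROP TABLE t2;
ALTER SUBSCRIPTION mysub SET TABLE t1, t3;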

 
Ефимкин Евгений
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 3f2f674a1a..ff8a65a3e4 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -522,12 +522,8 @@
   
 
   
-   To create a subscription, the user must be a superuser.
-  
-
-  
-   The subscription apply process will run in the local database with the
-   privileges of a superuser.
+   To add tables to a subscription, the user must have ownership rights on the
+   table.
   
 
   
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 6dfb2e4d3e..f0a368f90c 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -24,6 +24,9 @@ PostgreSQL documentation
 ALTER SUBSCRIPTION name CONNECTION 'conninfo'
 ALTER SUBSCRIPTION name SET PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
 ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( refresh_option [= value] [, ... ] ) ]
+ALTER SUBSCRIPTION name ADD TABLE table_name [, ...]
+ALTER SUBSCRIPTION name SET TABLE table_name [, ...]
+ALTER SUBSCRIPTION name DROP TABLE table_name [, ...]
 ALTER SUBSCRIPTION name ENABLE
 ALTER SUBSCRIPTION name DISABLE
 ALTER SUBSCRIPTION name SET ( subscription_parameter [= value] [, ... ] )
@@ -44,9 +47,7 @@ ALTER SUBSCRIPTION name RENAME TO <
   
You must own the subscription to use ALTER SUBSCRIPTION.
To alter the owner, you must also be a direct or indirect member of the
-   new owning role. The new owner has to be a superuser.
-   (Currently, all subscription owners must be superusers, so the owner checks
-   will be bypassed in practice.  But this might change in the future.)
+   new owning role.
   
  
 
@@ -137,6 +138,35 @@ ALTER SUBSCRIPTION name RENAME TO <
 

 
+   
+ADD TABLE table_name
+
+ 
+  The ADD TABLE clause will add a new table to the subscription; the table
+  must be present in the publication.
+ 
+
+   
+
+   
+SET TABLE table_name
+
+ 
+  The SET TABLE clause will replace the list of tables in
+  the subscription with the specified one.
+ 
+
+   
+
+   
+DROP TABLE table_name
+
+ 
+   The DROP TABLE clause will remove a table from the subscription.
+ 
+
+   
+

 ENABLE
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 1a90c244fb..04af4e27c7 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -24,6 +24,7 @@ PostgreSQL documentation
 CREATE SUBSCRIPTION subscription_name
 CONNECTION 'conninfo'
 PUBLICATION publication_name [, ...]
+[ FOR TABLE table_name [, ...]
 [ WITH ( subscription_parameter [= value] [, ... ] ) ]
 
  
@@ -88,6 +89,16 @@ CREATE SUBSCRIPTION subscription_name

 
+   
+FOR TABLE
+
+ 
+  Specifies a list of tables to add to the subscription. All tables listed in clause
+  must be present in publication.
+ 
+
+   
+

 WITH ( subscription_parameter [= value] [, ... ] )
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f4d9e9daf7..6ec6b24eb1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -904,6 +904,27 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 		LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_user_subscriptions AS
+SELECT
+S.oid,
+		S.subdbid,
+S.subnameAS subname,
+CASE WHEN S.subowner = 0 THEN
+'public'
+ELSE
+A.rolname
+END AS usename,
+		S.subenabled,
+CASE WHEN (S.subowner <> 0 AND A.rolname = current_user)
+OR (SELECT rolsuper FROM pg_authid WHERE rolname = current_user)
+THEN S.subconninfo
+ ELSE NULL END AS subconninfo,
+		S.subslotname,
+		S.subsynccommit,
+		S.subpublications
+FROM pg_subscription S
+LEFT JOIN pg_authid A ON (A.oid = S.subowner);
+
 CREATE VIEW pg_user_mappings AS
 SELECT
 U.oid   AS umid,
@@ -936,7 +957,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 
 -- All columns of pg_subscription except subconninfo are readable.
 REVOKE ALL ON pg_subscription FROM public;
-GRANT SELECT (subdbid, subname, subowner, subenabled, subslotname, subpublications)
+GRANT SELECT (tableoid, oid, subdbid, subname,
+	subowner, subenabled, subslotname, subpublications, subsynccommit)
 ON pg_subscription TO public;
 
 
diff --git a/src/backend/commands/subscriptioncmds.c 

ALTER SESSION

2019-01-29 Thread Kyotaro HORIGUCHI
Hello.

https://www.postgresql.org/message-id/20190128.133143.115303042.horiguchi.kyot...@lab.ntt.co.jp

> C. Provide session-specific GUC variable (that overrides the global one)
>- Add a new configuration file "postgresql.conf.<pid>", and have
>  pg_reload_conf() let the session with that PID load it as if
>  it were the last included file. All such files are removed at
>  startup or at the end of the corresponding session.
> 
>- Add a new syntax like this:
>  ALTER SESSION WITH (pid = <pid>)
> SET configuration_parameter {TO | =} {value | 'value' | DEFAULT}
> RESET configuration_parameter
> RESET ALL
> 
>- Target variables are marked with GUC_REMOTE.
> 
> I'll consider the last choice and will come up with a patch.

Here it is, with a small change in design.

ALTER SESSION WITH (pid <pid>) SET param {TO|=} value [ IMMEDIATE ]
ALTER SESSION WITH (pid <pid>) RESET param [ IMMEDIATE ]
ALTER SESSION WITH (pid <pid>) RESET ALL

The first form creates an entry in
$PGDATA/pg_session_conf/postgresql.<pid>.conf.
The second form removes the entry.
The third form removes the file itself.

IMMEDIATE specifies that the change is applied immediately by
sending SIGHUP to the process. pg_reload_conf() works as well.

The session configuration is removed at session-end and the
directory is cleaned up at startup.

Non-superusers can change PGC_USERSET variables, and superusers can also
change PGC_SUSET variables. The local session user must have the same
privileges on the target session as required by pg_signal_backend().

This patch contains documentation but doesn't contain tests yet.

I would appreciate any comments or suggestions on this.
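
For example, with the patch applied, a superuser could do the following from a
monitoring session (12345 is a made-up backend PID):

ALTER SESSION WITH (pid 12345) SET log_min_duration_statement = 0 IMMEDIATE;
-- and later undo it:
ALTER SESSION WITH (pid 12345) RESET log_min_duration_statement;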

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 03c53ca5259c2cc4b66c6339ea494de9016d9263 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 29 Jan 2019 17:03:57 +0900
Subject: [PATCH] ALTER SESSION

Some tracking features are controlled by GUCs, so they cannot be
activated from another session. The ALTER SESSION command allows
sessions to change some of the GUC settings of another session.
---
 doc/src/sgml/config.sgml |  26 ++-
 doc/src/sgml/ref/allfiles.sgml   |   1 +
 doc/src/sgml/reference.sgml  |   1 +
 src/backend/nodes/copyfuncs.c|  14 ++
 src/backend/nodes/equalfuncs.c   |  12 ++
 src/backend/parser/gram.y|  47 -
 src/backend/postmaster/postmaster.c  |   6 +
 src/backend/replication/basebackup.c |   3 +
 src/backend/tcop/utility.c   |  13 ++
 src/backend/utils/init/postinit.c|   3 +
 src/backend/utils/misc/guc-file.l|  24 +++
 src/backend/utils/misc/guc.c | 334 +++
 src/bin/initdb/initdb.c  |   3 +-
 src/include/nodes/nodes.h|   1 +
 src/include/nodes/parsenodes.h   |  12 ++
 src/include/utils/guc.h  |  11 ++
 16 files changed, 433 insertions(+), 78 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b6f5822b84..a52989342a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -175,6 +175,14 @@ shared_buffers = 128MB
  read whenever postgresql.conf is, and its settings take
  effect in the same way.  Settings in postgresql.auto.conf
  override those in postgresql.conf.
+
+
+
+ Furthermore a directory pg_session_conf in the
+ PostgreSQL data directory contains per-session configuration files. Every
+ file holds settings provided through the
+  command. This is read by reloading
+ after the two above files and removed at the session-end.
 
 
 
@@ -195,8 +203,8 @@ shared_buffers = 128MB
   The already-mentioned  command
   provides a SQL-accessible means of changing global defaults; it is
   functionally equivalent to editing postgresql.conf.
-  In addition, there are two commands that allow setting of defaults
-  on a per-database or per-role basis:
+  In addition, there are three commands that allow setting of defaults
+  on a per-database, per-role or per-session basis:
  
 
  
@@ -213,6 +221,13 @@ shared_buffers = 128MB
per-database settings to be overridden with user-specific values.
   
  
+
+ 
+  
+   The  command allows other sessions to
+override session-local settings with user-specific values.
+  
+ 
 
 
  
@@ -223,6 +238,13 @@ shared_buffers = 128MB
   Note that some settings cannot be changed after server start, and
   so cannot be set with these commands (or the ones listed below).
 
+ 
+  Values set with ALTER SESSION are applied only when
+  reloading.  They override values obtained from the configuration files
+  or server command line, and constitute defaults for the rest of the
+  session.  Note that it can change only settings that are changeable
+  on-session.
+
 
  
   Once a client is connected to the database, PostgreSQL
diff --git a/doc/src/sgml/ref/allfiles.sgml 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-29 Thread Amit Kapila
On Tue, Jan 29, 2019 at 5:59 AM Amit Kapila  wrote:
>
> On Tue, Jan 29, 2019 at 12:37 AM John Naylor
>  wrote:
> > > I think here you need to clear the map if it exists or clear it
> > > unconditionally, the earlier one would be better.
> >
> > Ok, maybe all callers should call it unconditonally, but within the
> > function, check "if (FSM_LOCAL_MAP_EXISTS)"?
> >
>
> Sounds sensible.  I think we should try to reproduce these failures,
> for ex. for pgbench failure, we can try the same test with more
> clients.
>

I am able to reproduce this by changing pgbench test as below:

--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -56,9 +56,9 @@ $node->safe_psql('postgres',
  'CREATE UNLOGGED TABLE insert_tbl (id serial primary key); ');

 pgbench(
- '--no-vacuum --client=5 --protocol=prepared --transactions=25',
+ '--no-vacuum --client=10 --protocol=prepared --transactions=25',
  0,
- [qr{processed: 125/125}],
+ [qr{processed: 250/250}],

You can find this change in the attached patch.  Then, I ran make
check in src/bin/pgbench multiple times using test_conc_insert.sh.
You can vary the number of times the test should run if you are not
able to reproduce it with this.

The attached patch (clear_local_map_if_exists_1.patch) atop the main
patch fixes the issue for me.  Kindly verify the same.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


change_pgbench_test_1.patch
Description: Binary data
for ((i = 1 ; i < 25; i++))
do
	make check
done;


clear_local_map_if_exists_1.patch
Description: Binary data


Re: [HACKERS] Cached plans and statement generalization

2019-01-29 Thread Konstantin Knizhnik



On 29.01.2019 4:38, Nagaura, Ryohei wrote:

Hi,

Although I became your reviewer, it seems difficult to give feedback in this
CF. I will continue to review, so would you please update your patch? Until
then, I am reviewing your current patch.

There is one question. date_1.out, which seems to be a copy of date.out,
includes trailing spaces and inconsistent indentation; e.g., lines 3368 and
3380 in your current patch have a space at the end of the line and different
indentation. This is also seen in date.out. I'm not sure whether it is OK to
add a new file with these issues just because an existing file has them.
Is it OK?

Best regards,
-
Ryohei Nagaura

A rebased version of the patch is attached.
Concerning the spaces in date.out and date_1.out - it is OK.
These files are just the output of test execution, and it is not possible to
expect the output of the test scripts to be free of trailing spaces.
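
For reviewers who want to try the feature quickly, a minimal session looks
like this (sketch; assuming the GUC is autoprepare_threshold as described in
the attached documentation):

SET autoprepare_threshold = 2;   -- generalize statements executed at least twice
SELECT relname FROM pg_class WHERE oid = 1259;
SELECT relname FROM pg_class WHERE oid = 1247;
-- inspect what this backend has autoprepared so far
SELECT statement, parameter_types FROM pg_autoprepared_statements;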




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/autoprepare.sgml b/doc/src/sgml/autoprepare.sgml
new file mode 100644
index 000..b3309bd
--- /dev/null
+++ b/doc/src/sgml/autoprepare.sgml
@@ -0,0 +1,62 @@
+
+
+ 
+  Autoprepared statements
+
+  
+   autoprepared statements
+  
+
+  
+PostgreSQL makes it possible prepare
+frequently used statements to eliminate cost of their compilation
+and optimization on each execution of the query. On simple queries
+(like ones in pgbench -S) using prepared statements
+increase performance more than two times.
+  
+
+  
+Unfortunately not all database applications are using prepared statements
+and, moreover, it is not always possible. For example, in case of using
+pgbouncer or any other session pooler,
+there is no session state (transactions of one client may be executed at different
+backends) and so prepared statements can not be used.
+  
+
+  
+Autoprepare mode allows to overcome this limitation.
+In this mode Postgres tries to generalize executed statements
+and build parameterized plan for them. Speed of execution of
+autoprepared statements is almost the same as of explicitly
+prepared statements.
+  
+
+  
+By default autoprepare mode is switched off. To enable it, assign non-zero
+value to the GUC variable autoprepare_threshold.
+This variable specifies the minimal number of times the statement should be
+executed before it is autoprepared. Please notice that, despite to the
+value of this parameter, Postgres makes a decision about using
+generalized plan vs. customized execution plans based on the results
+of comparison of average time of five customized plans with
+time of generalized plan.
+  
+
+  
+If number of different statements issued by application is large enough,
+then autopreparing all of them can cause memory overflow
+(especially if there are many active clients, because prepared statements cache
+is local to the backend). To prevent growth of backend's memory because of
+autoprepared cache, it is possible to limit number of autoprepared statements
+by setting autoprepare_limit GUC variable. LRU strategy will be used
+to keep in memory most frequently used queries.
+  
+
+  
+It is possible to inspect autoprepared queries in the backend using
+pg_autoprepared_statements view. It shows original text of the
+query, types of the extracted parameters (replacing literals) and
+query execution counter.
+  
+
+ 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index af4d062..def8dce 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8169,6 +8169,11 @@ SCRAM-SHA-256$iteration count:
  
 
  
+  pg_autoprepared_statements
+  autoprepared statements
+ 
+
+ 
   pg_prepared_xacts
   prepared transactions
  
@@ -9480,6 +9485,68 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
  
 
+
+ 
+  pg_autoprepared_statements
+
+  
+   pg_autoprepared_statements
+  
+
+  
+   The pg_autoprepared_statements view displays
+   all the autoprepared statements that are available in the current
+   session. See  for more information about autoprepared
+   statements.
+  
+
+  
+   pg_autoprepared_statements Columns
+
+   
+
+ 
+  Name
+  Type
+  Description
+ 
+
+
+ 
+  statement
+  text
+  
+The query string submitted by the client from which this prepared statement
+was created. Please notice that literals in this statement are not
+replaced with prepared statement placeholders.
+  
+ 
+ 
+  parameter_types
+  regtype[]
+  
+   The expected parameter types for the autoprepared statement in the
+   form of an array of regtype. The OID corresponding
+   to an element of this array can be obtained by casting the
+   regtype value to oid.
+  
+ 
+ 
+  

Re: speeding up planning with partitions

2019-01-29 Thread Amit Langote
Imai-san,

On 2019/01/28 10:44, Imai, Yoshikazu wrote:
> On Thu, Jan 24, 2019 at 6:10 AM, Imai, Yoshikazu wrote:
>> updating partkey case:
>>
>> part-num  master   0001     0002     0003     0004
>> 1        8215.34  7924.99  7931.15  8407.40  8475.65
>> 2        7137.49  7026.45  7128.84  7583.08  7593.73
>> 4        5880.54  5896.47  6014.82  6405.33  6398.71
>> 8        4222.96  4446.40  4518.54  4802.43  4785.82
>> 16       2634.91  2891.51  2946.99  3085.81  3087.91
>> 32        935.12  1125.28  1169.17  1199.44  1202.04
>> 64        352.37   405.27   417.09   425.78   424.53
>> 128       236.26   310.01   307.70   315.29   312.81
>> 256        65.36    86.84    87.67    84.39    89.27
>> 512        18.34    24.84    23.55    23.91    23.91
>> 1024        4.83     6.93     6.51     6.45     6.49
> 
> I also tested with non-partitioned table case.
> 
> updating partkey case:
> 
> part-num  master   0001     0002     0003     0004
> 0        10956.7  10370.5  10472.6  10571.0  10581.5
> 1        8215.34  7924.99  7931.15  8407.40  8475.65
> ...
> 1024        4.83     6.93     6.51     6.45     6.49
> 
> 
> In my performance results, it seems update performance degrades in the
> non-partitioned case with the v17 patch applied.
> But it seems this degradation did not happen with the v2 patch.
> 
> On Thu, Aug 30, 2018 at 1:45 AM, Amit, Langote wrote:
>> UPDATE:
>>
>> nparts  master  0001  0002  0003
>> ======  ======  ====  ====  ====
>> 0         2856  2893  2862  2816
> 
> Does this degradation only occur in my tests? Or, if this result is correct,
> what may cause the degradation?

I re-ran tests with v18 using the following setup [1]:

create table ht (a int, b int) partition by hash (b);
create table ht_0 partition of ht for values with (modulus N, remainder 0);
...
$ cat update-noprune.sql
update ht set a = 0;

pgbench -n -T 60 -f update-noprune.sql

TPS:

nparts  master  0001  0002  0003  0004
======  ======  ====  ====  ====  ====
0         4408  4335  4423  4379  4314
1         3883  3873  3679  3856  4007
2         3495  3476  3477  3500  3627

I can see some degradation for a small number of partitions, but maybe it's
just noise?  At least, I don't yet have a systematic explanation for it.

Thanks,
Amit

[1] I changed the compile flags in build scripts to drop
-DCATCACHE_FORCE_RELEASE, which would cause many syscache misses in my
test runs




Re: COPY FROM WHEN condition

2019-01-29 Thread Tomas Vondra



On 1/29/19 8:18 AM, David Rowley wrote:
> On Wed, 23 Jan 2019 at 06:35, Tomas Vondra  
> wrote:
>> It turned out to be a tad more complex due to partitioning, because when
>> we find the partitions do not match, the tuple is already allocated in
>> the "current" context (be it per-tuple or batch). So we can't just free
>> the whole context at that point. The old code worked around this by
>> alternating two contexts, but that seems a bit too cumbersome to me, so
>> the patch simply copies the tuple to the new context. That allows us to
>> reset the batch context always, right after emptying the buffer. I need
>> to do some benchmarking to see if the extra copy causes any regression.
> 
> I agree that fixing the problem mentioned by separating out tuple and
> batch contexts is a good idea, but I disagree with removing the
> alternating batch context idea.  FWIW the reason the alternating
> contexts went in wasn't just for fun, it was on performance grounds.
> There's a lengthy discussion in [1] about not adding any new
> performance regressions to COPY. To be more specific, Peter complains
> about a regression of 5% in [2].
> 
> It's pretty disheartening to see the work there being partially undone.
> 

Whoops :-(

> Here are my performance tests of with and without your change to the
> memory contexts (I missed where you posted your results).
> 
> $ cat bench.pl
> for (my $i=0; $i < 8912891; $i++) {
> print "1\n1\n2\n2\n";
> }
> 36a1281f86c: (with your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 26825.142 ms (00:26.825)
> Time: 27202.117 ms (00:27.202)
> Time: 26266.705 ms (00:26.267)
> 
> 4be058fe9ec: (before your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 25645.460 ms (00:25.645)
> Time: 25698.193 ms (00:25.698)
> Time: 25737.251 ms (00:25.737)
> 
> So looks like your change slows this code down 4% for this test case.
> That's about twice as bad as the 2.2% regression that I had to solve
> for the multi-insert partition patch (much of whose code you've removed)
> 
> I'd really like to see the alternating batch context code being put
> back in to fix this. Of course, if you have a better idea, then we can
> look into that, but just removing code that was carefully written and
> benchmarked without any new benchmarks to prove that it's okay to do
> so seems wrong to me.
> 

Yeah, that's not very nice. Do you suggest reverting 36a1281f86c0f, or
shall we try to massage it a bit first? ISTM we could simply create two
batch memory contexts and alternate between them, just like with the
expression contexts before.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_basebackup, walreceiver and wal_sender_timeout

2019-01-29 Thread Magnus Hagander
On Tue, Jan 29, 2019 at 6:19 AM Michael Paquier  wrote:

> On Mon, Jan 28, 2019 at 02:00:59PM +0100, Alex Kliukin wrote:
> > While reading the doc page for the pg_basebackup, I've been confused
> > by the fact that it says WAL files will be written to .tarballs
> > (either base.tar or pg_wal.tar) when pg_basebackup is instructed to
> > stream WALs alongside the backup itself. I think it makes sense to
> > elaborate that it only happens when tar format is specified (doc
> > patch is attached).
>
> Agreed.  The current wording can be confusing depending on the format,
> and your suggestion looks right.  Any opinions from others?
>

Agreed, definitely confusing.

Since you also agreed on it, I went ahead and pushed (with backpatch).

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Built-in connection pooler

2019-01-29 Thread Konstantin Knizhnik




On 29.01.2019 8:14, Michael Paquier wrote:

On Mon, Jan 28, 2019 at 10:33:06PM +0100, Dimitri Fontaine wrote:

In other cases, it's important to measure and accept the possible
performance cost of running a proxy server between the client connection
and the PostgreSQL backend process. I believe the numbers shown in the
previous email by Konstantin are about showing the kind of impact you
can see when using the patch in a use-case where it's not meant to be
helping much, if at all.

Have you looked at the possibility of having the proxy worker be
spawned as a background worker?


Yes, that was my first attempt.
The main reasons why I implemented it in the old way are:
1. I need to know the PID of the spawned worker. Strangely, it is possible to
get the PID of a dynamic bgworker, but not of a static one. Certainly it is
possible to find a way of passing this PID to the postmaster, but it
complicates the startup of the worker.
2. I need to pass the socket to the spawned proxy.  Once again, it can also be
implemented with a bgworker, but that requires more coding (especially taking
into account support for Win32 and FORKEXEC mode).



  I think that we should avoid spawning
any new processes on the backend from the ground as we have a lot more
infrastructures since 9.3.  The patch should actually be bigger, the
code is very raw and lacks comments in a lot of areas where the logic
is not so obvious, except perhaps to the patch author.


The main reason for publishing this patch was to receive feedback and find
places which should be rewritten.
I will add more comments, but I would be pleased if you could point out which
places are obscure now and require better explanation.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooler

2019-01-29 Thread Konstantin Knizhnik




On 29.01.2019 0:10, Bruce Momjian wrote:

On Thu, Jan 24, 2019 at 08:14:41PM +0300, Konstantin Knizhnik wrote:

The main differences with pgbouncer are that:

1. It is embedded and requires no extra steps for installation and
configurations.
2. It is not single threaded (no bottleneck)
3. It supports all clients (if client needs session semantic, then it will be
implicitly given dedicated backend)


Some performance results (pgbench -S -n):

┌┬┬─┬─┬─┐
│ #Connections   │ Proxy  │ Proxy/SSL   │ Direct  │ Direct/SSL   │
├┼┼─┼─┼──┤
│ 1  │  13752 │   12396 │   17443 │15762 │
├┼┼─┼─┼──┤
│ 10 │  53415 │   59615 │   68334 │85885 │
├┼┼─┼─┼──┤
│ 1000   │  60152 │   20445 │   60003 │24047 │
└┴┴─┴─┴──┘

It is nice it is a smaller patch.  Please remind me of the performance
advantages of this patch.

The primary purpose of a pooler is efficient support for a large number of
connections and minimizing system resource usage.
But since Postgres does not scale well on SMP systems with a large number of
CPU cores (due to many reasons discussed on -hackers), reducing the number of
concurrently working backends can also significantly increase performance.


I have not done such testing yet, but I am planning to do it, as well as a
comparison with pgbouncer and Odyssey.
But please note that this proxy approach is by design slower than my previous
implementation used in PgPRO-EE (based on socket redirection).
For some workloads, connections through the proxy cause up to a two-times
decrease in performance compared with dedicated backends.
There is no such problem with the old connection pooler implementation, which
was never worse than vanilla, but it doesn't support SSL connections and
requires many more changes in the Postgres core.







--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




RE: static global variable openLogOff in xlog.c seems no longer used

2019-01-29 Thread Takashi Menjo
Michael Paquier wrote:
> It seems to me that keeping openLogOff is still useful to get a report
> about the full chunk area being written if the data gets written in
> multiple chunks and fails afterwards.  Your patch would modify the
> report so as only the area with the partial write is reported.  For
> debugging, having a static reference is also useful in my opinion.

I agree with you on both error reporting and debugging.  Now that you
mention it, I see that my patch modifies the ereport...

When I wrote a patchset for xlog.c (in another email thread), I thought that
this could be fixed. But now I understand it is not such a simple thing.  Thank
you.


Regards,
Takashi

-- 
Takashi Menjo - NTT Software Innovation Center