Re: [HACKERS] Pluggable storage

2017-09-11 Thread Haribabu Kommi
On Sat, Sep 9, 2017 at 1:23 PM, Haribabu Kommi 
wrote:

>
> I rebased the patch to the latest master and also fixed the duplicate OID
> and some slot fixes. Updated patches are attached.
>


While analyzing the removal of HeapScanDesc usage outside the heap
modules, I found that the most frequently used member is "rs_cbuf", for
the purposes of locking the buffer, visibility checks, and marking it
dirty. The buffer is tightly integrated with visibility. A buffer may not
be required for some storage routines where the data is always in memory,
and so on.

I found references to its structure members in the following places.
Other than the following 4 members, I feel the rest need to be moved
into their corresponding storage routines.

Relation rs_rd; /* heap relation descriptor */
Snapshot rs_snapshot; /* snapshot to see */
int rs_nkeys; /* number of scan keys */
ScanKey rs_key; /* array of scan key descriptors */

But currently I am treating "rs_cbuf" as a needed member as well, and
expecting all storage routines to provide it. Otherwise we may need
another approach to mark the buffer as dirty.
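For illustration only (nothing in the attached patches defines these
structs; the names are hypothetical), the split being discussed could look
roughly like this, with the four common members in a storage-independent
part:

/*
 * Hypothetical sketch: a storage-independent scan descriptor holding the
 * four common members, embedded first in the heap-specific descriptor so
 * a pointer to one can be used as a pointer to the other.
 */
typedef struct TableScanDescData
{
	Relation	rs_rd;			/* relation descriptor */
	Snapshot	rs_snapshot;	/* snapshot to see */
	int			rs_nkeys;		/* number of scan keys */
	ScanKey		rs_key;			/* array of scan key descriptors */
} TableScanDescData;

typedef struct HeapScanDescData
{
	TableScanDescData rs_base;	/* AM-independent part, must be first */
	Buffer		rs_cbuf;		/* current buffer in scan, if any */
	/* ... remaining heap-only members (rs_nblocks, rs_ctup, etc.) ... */
} HeapScanDescData;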

Suggestions?


The following are the rest of the parameters that are used
outside the heap modules.


BlockNumber rs_nblocks; /* total number of blocks in rel */

pgstattuple.c, tsm_system_rows.c, tsm_system_time.c, system.c,
nodeBitmapHeapscan.c, nodesamplescan.c
Mostly for the purpose of checking the number of blocks in a rel.

BufferAccessStrategy rs_strategy; /* access strategy for reads */

pgstattuple.c

bool rs_pageatatime; /* verify visibility page-at-a-time? */
BlockNumber rs_startblock; /* block # to start at */
bool rs_syncscan; /* report location to syncscan logic? */
bool rs_inited; /* false = scan not init'd yet */

nodesamplescan.c

HeapTupleData rs_ctup; /* current tuple in scan, if any */

genam.c, nodeBitmapHeapscan.c, nodesamplescan.c
Used for retrieving the last scanned tuple.

BlockNumber rs_cblock; /* current block # in scan, if any */

index.c, nodesamplescan.c

Buffer rs_cbuf; /* current buffer in scan, if any */

pgrowlocks.c, pgstattuple.c, genam.c, index.c, cluster.c,
tablecmds.c, nodeBitmapHeapscan.c, nodesamplescan.c
Mostly used for locking the buffer.


ParallelHeapScanDesc rs_parallel; /* parallel scan information */

nodeseqscan.c

int rs_cindex; /* current tuple's index in vistuples */

nodeBitmapHeapscan.c

int rs_ntuples; /* number of visible tuples on page */
OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */

tsm_system_rows.c, nodeBitmapHeapscan.c, nodesamplescan.c
Used to retrieve the offsets, mainly in Bitmap and sample scans.

I think the usage of the remaining parameters outside the heap can be
changed once the Bitmap and Sample scans are modified to use the storage
routines for returning tuples instead of their own implementations. I feel
these scans are the major users of the remaining parameters. This approach
may need some more APIs to get rid of the Bitmap and Sample scans' own
implementations.

Suggestions?


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] UPDATE of partition key

2017-09-11 Thread Amit Khandekar
On 11 September 2017 at 21:12, Dilip Kumar  wrote:
> On Thu, Sep 7, 2017 at 11:41 AM, Amit Khandekar  
> wrote:
>> On 6 September 2017 at 21:47, Dilip Kumar  wrote:
>
>> Actually, since transition tables came in, the functions like
>> ExecARUpdateTriggers() or ExecARInsertTriggers() have this additional
>> purpose of capturing transition table rows, so that the images of the
>> tables are visible when statement triggers are fired that refer to
>> these transition tables. So in the above code, these functions only
>> capture rows, they do not add any event for firing any ROW triggers.
>> AfterTriggerSaveEvent() returns without adding any event if it's
>> called only for transition capture. So even if UPDATE row triggers are
>> defined, they won't get fired in case of row movement, although the
>> updated rows would be captured if transition tables are referenced in
>> these triggers or in the statement triggers.
>>
>
> Ok then I have one more question,
>
> With transition tables, we can only support statement-level triggers

Yes, we don't support row triggers with transition tables if the table
is a partition.

> and for an update statement, we are only going to execute the UPDATE
> statement-level trigger? So is there any point in making transition
> table entries for the DELETE/INSERT triggers, as those transition
> tables will never be used?

But the statement-level trigger function can refer to OLD TABLE and
NEW TABLE, which will contain all the OLD rows and NEW rows
respectively. So the updated rows of the partitions (including the
moved ones) need to be captured. So for OLD TABLE, we need to capture
the deleted row, and for NEW TABLE, we need to capture the inserted
row.

In the regression test update.sql, check how the statement trigger
trans_updatetrig prints all the updated rows, including the moved
ones.


>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-11 Thread Kuntal Ghosh
On Tue, Sep 12, 2017 at 9:06 AM, Haribabu Kommi
 wrote:
>
>
> On Tue, Sep 12, 2017 at 2:04 AM, Kuntal Ghosh 
> wrote:
>>
>> On Wed, Sep 6, 2017 at 9:16 AM, Haribabu Kommi 
>> wrote:
>> >
>> > Attached the latest patch and performance report.
>> >
>> While looking into the patch, I realized that a normal backend has to
>> check almost 10 if conditions in the worst case inside XLogWrite() (6 in
>> the am_background_process method, 1 for writes, 1 for blocks, 2 for
>> fsyncs), just to decide whether it is a normal backend or not. The
>> count is higher for the walwriter process. Although I've not done any
>> performance tests, IMHO, it might add further overhead on the
>> XLogWrite lock.
>>
>> I was thinking whether it is possible to collect all the stats in
>> XLogWrite() irrespective of the type of backend and update the shared
>> counters at once at the end of the function. Thoughts?
>
>
> Thanks for the review.
> Yes, I agree with you that the stats update can be done at the end
> of the function to avoid some overhead. An updated patch is attached.
>
Thanks for the patch.
+ * Check whether the current process is a normal backend or not.
+ * This function checks for the background processes that does
+ * some WAL write activity only and other background processes
+ * are not considered. It considers all the background workers
+ * as WAL write activity workers.
+ *
+ * Returns false - when the current process is a normal backend
+ *true - when the current process a background process/worker
+ */
+static bool
+am_background_process()
+{
+   /* check whether current process is a background process/worker? */
+   if (!AmBackgroundWriterProcess() ||
+   !AmCheckpointerProcess() ||
+   !AmStartupProcess() ||
+   !IsBackgroundWorker ||
+   !am_walsender ||
+   !am_autovacuum_worker)
+   {
+   return false;
+   }
+
+   return true;
+}
I think you have to do an AND operation here instead of OR, don't you?
Another point: the function description could start with
'Check whether the current process is a background process/worker.'
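For what it's worth, applying De Morgan's law here gives an equivalent
form with no negations at all; a sketch only, using the same identifiers
as the quoted patch:

/*
 * Equivalent rewrite (sketch only): the process is a background
 * process/worker if it is *any* of the listed kinds, so the positive
 * checks are simply ORed together.
 */
static bool
am_background_process(void)
{
	if (AmBackgroundWriterProcess() ||
		AmCheckpointerProcess() ||
		AmStartupProcess() ||
		IsBackgroundWorker ||
		am_walsender ||
		am_autovacuum_worker)
		return true;

	return false;
}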

> There is an overhead with the IO time calculation. Is the view good
> enough without the IO columns?
I'm not sure how the IO columns are useful for tuning WAL write/fsync
performance from a user's perspective. But they're definitely useful
for developing/improving things in XLogWrite.

>
> And also, during my tests I didn't observe any other background
> processes performing the XLogWrite operation; the values are always
> zero. Is it fine to merge them with the backend columns?
>
Apart from the walwriter process, I don't see any reason why you should
track other background processes separately from normal backends.
However, I may be missing some important point.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2017-09-11 Thread Jing Wang
Hi Surafel,

Sorry about that.

Yes, the test case file was forgotten and not added to the previous patch.

Please use the updated patch in the attached file.


Regards,
Jing Wang
Fujitsu Australia


comment_on_current_database_no_pgdump_v4.1.patch
Description: Binary data



Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2017-09-11 Thread Thomas Munro
On Tue, Sep 12, 2017 at 1:11 PM, Jing Wang  wrote:
> Please find the rebased patch based on latest version in the attached file.

Hi Jing

It looks like you created dbname.sql and dbname.out files for a
regression test but forgot to "git add" them to your branch before you
created the patch, so "make check" fails with the patch applied:

test identity ... ok
test event_trigger... ok
test stats... ok
test dbname   ... /bin/sh: 1: cannot open
/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/sql/dbname.sql:
No such file

+ printf("target = %s\n",target);

Looks like a stray debugging message?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] WAL logging problem in 9.4.3?

2017-09-11 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 08 Sep 2017 16:30:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170908.163001.53230385.horiguchi.kyot...@lab.ntt.co.jp>
> > >> 2017-04-13 12:11:27.065 JST [85441] t/102_vacuumdb_stages.pl
> > >> STATEMENT:  ANALYZE;
> > >> 2017-04-13 12:12:25.766 JST [85492] LOG:  BufferNeedsWAL: pendingSyncs
> > >> = 0x0, no_pending_sync = 0
> > >> 
> > >> -   lsn = XLogInsert(RM_SMGR_ID,
> > >> -XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
> > >> +   rel->no_pending_sync= false;
> > >> +   rel->pending_sync = pending;
> > >> +   }
> > >> 
> > >> It seems to me that those flags and the pending_sync data should be
> > >> kept in the context of backend process and not be part of the Relation
> > >> data...
> > > 
> > > I understand that the context of "backend process" means
> > > storage.c-local. I don't mind which context the data lives in,
> > > but that is the only place I found that can get rid of frequent
> > > hash searching. For pending deletions, just appending to a list is
> > > enough and costs almost nothing; on the other hand, pending syncs
> > > need to be referenced, sometimes very frequently.
> > > 
> > >> +void
> > >> +RecordPendingSync(Relation rel)
> > >> I don't think that I agree that this should be part of relcache.c. The
> > >> syncs should be tracked outside of the relation context.
> > > 
> > > Yeah.. It's in storage.c in the latest patch. (Sorry for the
> > > duplicate name). I think it is a kind of bond between smgr and
> > > relation.
> > > 
> > >> Seeing how invasive this change is, I would also advocate for this
> > >> patch as only being a HEAD-only change, not many people are
> > >> complaining about this optimization of TRUNCATE missing when wal_level
> > >> = minimal, and this needs a very careful review.
> > > 
> > > Agreed.
> > > 
> > >> Should I code something? Or Horiguchi-san, would you take care of it?
> > >> The previous crash I saw has been taken care of, but it's been really
> > >> some time since I looked at this patch...
> > > 
> > > My point is that a hash search on every tuple insertion should be
> > > avoided even if it happens rarely. At one point this diverged a bit
> > > from your original patch, but in the latest patch the significant
> > > part (the pending-sync hash) is revived from the original one.
> > 
> > This patch has followed along since CF 2016-03; do we think we can reach a
> > conclusion in this CF?  It was marked as "Waiting on Author"; based on
> > developments since then in this thread, I've changed it back to "Needs
> > Review" again.
> 
> I managed to reload its context into my head. It doesn't apply on
> the current master and needs some amendment. I'm going to work on
> this.

Rebased and slightly modified.

Michael's latest patch, on which this patch is piggybacking, seems to
work perfectly. The motive of my addition is to avoid the frequent
(specifically, per tuple modification) hash accesses that occur while
pending syncs exist. The hash contains at least 6 or more entries.

The attached patch emits extra log messages, to be removed in the final
shape, to show how much the addition reduces hash accesses. As a basis
for determining the worthiness of the additional mechanism, I'll show an
example set of queries below.

In the log messages, "r" is the relation oid, "b" is the buffer number,
and "hash" is the pointer to the backend-global hash table for pending
syncs. "ent" is the entry in the hash belonging to the relation, and
"neg" is a flag indicating that the existing pending-sync hash doesn't
have an entry for the relation.
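To make the fast path concrete, it could look roughly like the sketch
below. This is simplified and not taken verbatim from the patch;
PendingRelSync and its sync_above threshold are names assumed from this
thread, and pendingSyncs is the backend-global hash shown in the log
messages.

/*
 * Simplified sketch: consult the per-Relation cache first, touch the
 * backend-global hash at most once per relation, and remember both the
 * hit ("ent") and the miss ("neg") so later calls skip the hash.
 */
static bool
BufferNeedsWAL(Relation rel, Buffer buf)
{
	if (rel->no_pending_sync)
		return true;			/* cached miss: no pending sync, WAL is needed */

	if (rel->pending_sync == NULL)
	{
		/* first call for this relation: do the hash lookup once */
		rel->pending_sync = hash_search(pendingSyncs, &rel->rd_node,
										HASH_FIND, NULL);
		if (rel->pending_sync == NULL)
		{
			rel->no_pending_sync = true;	/* remember the miss */
			return true;
		}
	}

	/* WAL can be skipped only for blocks that will be synced at commit */
	return BufferGetBlockNumber(buf) < rel->pending_sync->sync_above;
}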

=# set log_min_messages to debug2;
=# begin;
=# create table test1(a text primary key);
> DEBUG:  BufferNeedsWAL(r 2608, b 55): hash = (nil), ent=(nil), neg = 0
# relid=2608 buf=55, hash has not been created

=# insert into test1 values ('inserted row');
> DEBUG:  BufferNeedsWAL(r 24807, b 0): hash = (nil), ent=(nil), neg = 0
# relid=24807, first buffer, hash has not been created

=# copy test1 from '//copy_data.txt';
> DEBUG:  BufferNeedsWAL(r 24807, b 0): hash = 0x171de00, ent=0x171f390, neg = 0
# hash created, pending sync entry linked, no longer needs hash access
# (repeats for the number of buffers)
COPY 200

=# create table test3(a text primary key);
> DEBUG:  BufferNeedsWAL(r 2608, b 55): hash = 0x171de00, ent=(nil), neg = 1
# no pending sync entry for this relation, no longer needs hash access.

=# insert into test3 (select a from generate_series(0, 99) a);
> DEBUG:  BufferNeedsWAL(r 24816, b 0): hash = 0x171de00, ent=(nil), neg = 0
> DEBUG:  BufferNeedsWAL: accessing hash : not found
> DEBUG:  BufferNeedsWAL(r 24816, b 0): hash = 0x171de00, ent=(nil), neg = 1
# This table no longer needs hash access (repeats for the number of tuples)

=#  truncate test3;
=#  insert into test3 (select a from generate_series(0, 99) a);
> DEBUG:  BufferNeedsWAL(r 24816, b 0): hash = 0x171de00, ent=(nil), neg = 0
> DEBUG:  BufferNeedsWAL: accessing hash : found
> 

Re: [HACKERS] More flexible LDAP auth search filters?

2017-09-11 Thread Thomas Munro
On Tue, Sep 12, 2017 at 7:21 AM, Peter Eisentraut
 wrote:
> On 9/8/17 13:24, Mark Cave-Ayland wrote:
>> My weapon of choice for LDAP deployments on POSIX-based systems is
>> Arthur De Jong's nss-pam-ldapd (https://arthurdejong.org/nss-pam-ldapd)
>> which is far more flexible than pam_ldap and fixes a large number of
>> bugs, including the tendency for pam_ldap to hang infinitely if it can't
>> contact its LDAP server.
>>
>> Take a look at nss-pam-ldapd's man page for nslcd.conf and in particular
>> pam_authz_search - this is exactly the type of filter I would end up
>> deploying onto servers. This happens a lot in large organisations,
>> where getting group memberships updated in the main directory can take
>> days/weeks, whereas someone with root access to the server itself can
>> hard-code an authentication list of users and/or groups in an LDAP
>> filter in just a few minutes.
>
> Thomas, would you consider using the placeholder syntax described at
>  under
> pam_authz_search?

Sounds good.  Here it is with $username.  It's nice not to have to
escape any characters in URLs.  I suppose more keywords could be added
in follow-up patches if someone thinks that would be useful
($hostname, $dbname, ...?).  I got sick of that buffer sizing code and
changed it to use StringInfo.  Here also are your test patches tweaked
slightly: 0002 just adds FreeBSD support as per previous fixup and
0003 changes to $username.
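For anyone following along without reading the patch, the substitution
itself is a small StringInfo loop; a rough sketch of what such filter
formatting can look like (not necessarily the exact code in 0001):

/*
 * Rough sketch: expand each occurrence of "$username" in the configured
 * filter pattern with the connecting user's name, building the result
 * with the standard StringInfo primitives.
 */
static char *
FormatSearchFilter(const char *pattern, const char *user_name)
{
	StringInfoData output;

	initStringInfo(&output);
	while (*pattern != '\0')
	{
		if (strncmp(pattern, "$username", strlen("$username")) == 0)
		{
			appendStringInfoString(&output, user_name);
			pattern += strlen("$username");
		}
		else
			appendStringInfoChar(&output, *pattern++);
	}

	return output.data;
}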

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Allow-custom-search-filters-to-be-configured-for-LDA.patch
Description: Binary data


0002-Add-LDAP-authentication-test-suite.patch
Description: Binary data


0003-Add-tests-for-ldapsearchfilter-functionality.patch
Description: Binary data



Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-11 Thread Haribabu Kommi
On Tue, Sep 12, 2017 at 2:04 AM, Kuntal Ghosh 
wrote:

> On Wed, Sep 6, 2017 at 9:16 AM, Haribabu Kommi 
> wrote:
> >
> > Attached the latest patch and performance report.
> >
> While looking into the patch, I realized that a normal backend has to
> check almost 10 if conditions in the worst case inside XLogWrite() (6 in
> the am_background_process method, 1 for writes, 1 for blocks, 2 for
> fsyncs), just to decide whether it is a normal backend or not. The
> count is higher for the walwriter process. Although I've not done any
> performance tests, IMHO, it might add further overhead on the
> XLogWrite lock.
>
> I was thinking whether it is possible to collect all the stats in
> XLogWrite() irrespective of the type of backend and update the shared
> counters at once at the end of the function. Thoughts?
>

Thanks for the review.
Yes, I agree with you that the stats update can be done at the end
of the function to avoid some overhead. An updated patch is attached.
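To illustrate the pattern (the struct and variable names below are
invented for the sketch, not taken from the patch): the counters stay in
plain locals while XLogWrite() runs, and the backend-type decision
happens exactly once when they are published.

/*
 * Illustrative sketch only: accumulate in locals during XLogWrite(), then
 * fold them into the shared counters in one step at the end, keeping the
 * hot write/fsync paths free of backend-type branches.
 */
typedef struct WalWriteCounters
{
	uint64		writes;			/* write() calls issued */
	uint64		write_blocks;	/* WAL blocks written */
	uint64		fsyncs;			/* flush calls issued */
} WalWriteCounters;

static void
publish_wal_write_stats(const WalWriteCounters *counters)
{
	/* WalWriteStatsShmem is a hypothetical shared-memory stats area */
	WalWriteCounters *dest = am_background_process()
		? &WalWriteStatsShmem->background
		: &WalWriteStatsShmem->backend;

	dest->writes += counters->writes;
	dest->write_blocks += counters->write_blocks;
	dest->fsyncs += counters->fsyncs;
}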

There is an overhead with the IO time calculation. Is the view good
enough without the IO columns?

And also, during my tests I didn't observe any other background
processes performing the XLogWrite operation; the values are always
zero. Is it fine to merge them with the backend columns?

Regards,
Hari Babu
Fujitsu Australia


0001-pg_stat_walwrites-statistics-view_v7.patch
Description: Binary data



Re: [HACKERS] Re: proposal - using names as primary names of plpgsql function parameters instead $ based names

2017-09-11 Thread Pavel Stehule
2017-09-11 22:28 GMT+02:00 Tom Lane :

> Jeevan Chalke  writes:
> [ psql-named-arguments-03-jeevan.patch ]
>
> Pushed with minor simplification of the test case.
>
> I'm not quite as convinced as Pavel that this is an improvement ---
> it will make error messages inconsistent between named and unnamed
> arguments.  Still, I follow the point that when there are a lot of
> arguments, $n is pretty unhelpful.  We can always revert this if
> we get complaints.
>

Thank you very much

Regards

Pavel


>
> regards, tom lane
>


Re: [HACKERS] GatherMerge misses to push target list

2017-09-11 Thread Amit Kapila
On Wed, Sep 6, 2017 at 10:04 AM, Amit Kapila  wrote:
> During my recent work on costing of parallel paths [1], I noticed that
> we are failing to push the target list below GatherMerge in some simple
> cases like below.
>

I think this should be considered a bug fix for 10.0, but it
doesn't block any functionality or give wrong results, so we might
consider it an optimization for GatherMerge.  In any case, I have
added this to the next CF.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-09-11 Thread Michael Paquier
On Mon, Sep 11, 2017 at 2:05 PM, Bossart, Nathan  wrote:
>> +   if (i == InvalidAttrNumber)
>> +   ereport(ERROR,
>> +   (errcode(ERRCODE_UNDEFINED_COLUMN),
>> +errmsg("column \"%s\" of relation \"%s\" does not 
>> exist",
>> +   col, RelationGetRelationName(rel;
>> This could use the schema name unconditionally as you hold a Relation
>> here, using RelationGetNamespace().
>
> This is added in v16 of the main patch.
>
>> So if the relation is analyzed but skipped, we would have no idea that
>> it actually got skipped because there are no reports about it. That's
>> not really user-friendly. I am wondering if we should not instead have
>> analyze_rel also enforce the presence of a RangeVar, and add an
>> assert at the beginning of the function to underline that, and also
>> do the same for vacuum(). It would also make things consistent with
>> vacuum(), which now implies on HEAD that a RangeVar *must* be
>> specified.
>
> I've made these changes in v16 of the main patch.

+   if (include_parts)
+   {
+   List *partition_oids = find_all_inheritors(relid, NoLock, NULL);
+   ListCell *part_lc;
+   foreach(part_lc, partition_oids)
+   {
+   VacuumRelation *tmp = copyObject(relinfo);
+   Oid part_oid = lfirst_oid(part_lc);
+   tmp->oid = part_oid;
+   vacrels_tmp = lappend(vacrels_tmp, tmp);
+   }
+   }
I thought that you would have changed that as well, but that's not
completely done... In my opinion, HEAD is wrong in using the same
RangeVar for error reporting for a parent table and its partitions, so
that's not completely the fault of your patch. But I think that as
this patch makes the vacuum routines smarter, you should create a new
RangeVar using makeRangeVar, as you hold the OID of the child partition
in this code path. This would allow error reports to actually use the
data of the partition saved here instead of the parent's data.
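Concretely, something along these lines inside the loop quoted above (a
sketch of the suggestion, not final code, and assuming the VacuumRelation
node carries its RangeVar in a field named relation; makeRangeVar,
get_rel_namespace, get_namespace_name and get_rel_name are existing
backend helpers):

foreach(part_lc, partition_oids)
{
	VacuumRelation *tmp = copyObject(relinfo);
	Oid			part_oid = lfirst_oid(part_lc);

	tmp->oid = part_oid;
	/* name the partition itself in error reports, not its parent */
	tmp->relation = makeRangeVar(get_namespace_name(get_rel_namespace(part_oid)),
								 get_rel_name(part_oid),
								 -1);
	vacrels_tmp = lappend(vacrels_tmp, tmp);
}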

The rest looks fine to me.
-- 
Michael




Re: [HACKERS] Still another race condition in recovery TAP tests

2017-09-11 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Sep 11, 2017 at 11:55 PM, Tom Lane  wrote:
>> Hm, I don't much like having it silently ignore files that are present;
>> that seems like a foot-gun in the long run.  What do you think of the
>> attached?

> That looks good to me. I have tried pretty hard to break it, but could not.

Pushed, thanks for testing!

regards, tom lane




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-11 Thread Amit Langote
On 2017/09/11 19:45, Ashutosh Bapat wrote:
> On Mon, Sep 11, 2017 at 12:16 PM, Amit Langote wrote:
>> IMHO, we should make it the responsibility of the future patch to set a
>> child PlanRowMark's prti to the direct parent's RT index, when we actually
>> know that it's needed for something.  We clearly know today why we need to
>> pass the other objects like child RT entry, RT index, and Relation, so we
>> should limit this patch to pass only those objects to the recursive call.
>> That makes this patch a relatively easy to understand change.
> 
> I think you are mixing two issues here: 1. setting parent RTI in child
> PlanRowMark and 2. passing immediate parent's PlanRowMark to
> expand_single_inheritance_child().
> 
> I have discussed 1 in my reply to Robert.
> 
> About 2 you haven't given any particular comments to my reply. To me
> it looks like it's this patch that introduces the notion of
> multi-level expansion, so it's natural for this patch to pass
> PlanRowMark in cascaded fashion similar to other structures.

Your patch does 2 to be able to do 1, doesn't it?  That is, to be able to
set the child PlanRowMark's prti to the direct parent's RT index, you pass
the immediate parent's PlanRowMark to the recursive call of
expand_single_inheritance_child().

All I am trying to say is that this patch's mission is to expand
inheritance step-wise to be able to do certain things in the *planner*
that weren't possible before.  The patch accomplishes that by creating
child AppendRelInfos such that its parent_relid field is set to the
immediate parent's RT index.  It's quite clear why we're doing so.  It's
not clear why we should do so for PlanRowMarks too.  Maybe it's fine as
long as nothing breaks.

>> If we go with your patch, partitioned tables won't get locked, for
>> example, in case of the following query (p is a partitioned table):
>>
>> select 1 from p union all select 2 from p;
>>
>> That's because the RelOptInfos for the two instances of p in the above
>> query are RELOPT_OTHER_MEMBER_REL, not RELOPT_BASEREL.  They are children
>> of the Append corresponding to the UNION ALL subquery RTE.  So,
>> partitioned_rels does not get set per your proposed code.
> 

[...]

> So, all partitioned partitions are getting locked correctly. Am I
> missing something?

Will reply to this separately to your other email.

> Actually, the original problem that caused this discussion started
> with an assertion failure in get_partitioned_child_rels() as
> Assert(list_length(result) >= 1);
> 
> This assertion fails if result is NIL when an intermediate partitioned
> table is passed. Maybe we should assert (result == NIL ||
> list_length(result) == 1) and allow that function to be called even
> for intermediate partitioned partitions, for which the function will
> return NIL. That will leave the code in add_paths_to_append_rel()
> simple. Thoughts?

Yeah, I guess that could work.  We'll just have to write comments to
describe why the Assert is written that way.

>> In create_lateral_join_info():
>>
>> +Assert(IS_SIMPLE_REL(brel));
>> +Assert(brte);
>>
>> The second Assert is either unnecessary or should be placed first.
> 
> simple_rte_array[] may have some NULL entries. Second assert makes
> sure that we aren't dealing with a NULL entry. Any particular reason
> to reorder the asserts?

Sorry, I missed that the 2nd Assert has "brte".  I thought it was "brel".

>> The following comment could be made a bit clearer.
>>
>> + * In the case of table inheritance, the parent RTE is directly linked
>> + * to every child table via an AppendRelInfo.  In the case of table
>> + * partitioning, the inheritance hierarchy is expanded one level at a
>> + * time rather than flattened.  Therefore, an other member rel that is
>> + * a partitioned table may have children of its own, and must
>> + * therefore be marked with the appropriate lateral info so that those
>> + * children eventually get marked also.
>>
>> How about: In the case of partitioned table inheritance, the original
>> parent RTE is linked, via AppendRelInfo, only to its immediate partitions.
>>  Partitions below the first level are accessible only via their immediate
>> parent's RelOptInfo, which would be of kind RELOPT_OTHER_MEMBER_REL, so
>> consider those as well.
> 
> I don't see much difference between those two. We usually do not use
> macros in comments, so usually comments mention "other member" rel.
> Let's leave this for the committer to judge.

Sure.

>> In expand_inherited_rtentry(), the following comment fragment is obsolete,
>> because we *do* now create AppendRelInfo's for partitioned children:
>>
>> +/*
>> + * We keep a list of objects in root, each of which maps a partitioned
>> + * parent RT index to the list of RT indexes of its partitioned child
>> + * tables which do not have AppendRelInfos associated with those.
> 
> Good catch. I have 

[HACKERS] Removing wal_keep_segments as default configuration in PostgresNode.pm

2017-09-11 Thread Michael Paquier
Hi all,

Right now, PostgresNode.pm uses this set of parameters when initializing a node:
print $conf "\n# Added by PostgresNode.pm\n";
print $conf "fsync = off\n";
print $conf "restart_after_crash = off\n";
print $conf "log_line_prefix = '%m [%p] %q%a '\n";
print $conf "log_statement = all\n";
print $conf "wal_retrieve_retry_interval = '500ms'\n";
print $conf "port = $port\n";

When streaming is enabled, the following set is used:
print $conf "wal_level = replica\n"; # replace by logical here
is you need ;)
print $conf "max_wal_senders = 5\n";
print $conf "max_replication_slots = 5\n";
print $conf "max_wal_size = 128MB\n";
print $conf "shared_buffers = 1MB\n";
print $conf "wal_log_hints = on\n";
print $conf "hot_standby = on\n";
print $conf "max_connections = 10\n";

While all those settings are good defaults in my opinion, I have been
trapped by wal_keep_segments being present when designing a test. The
TAP test in question was forcing WAL segments to be recycled with
two checkpoints and some forced segment switches to make a
disconnected standby sync back to a primary using some archives, but
then I was surprised that max_wal_size was not respected, until I
noticed that wal_keep_segments was in play.

I tend to think that while all the other parameters make sense for
deploying instances that need few resources, wal_keep_segments may cause
up to 350MB of WAL segments to be kept in each instance's pg_wal,
while max_wal_size is set to 128MB. The only test in the code tree that
needs wal_keep_segments is actually pg_rewind, which enforces
checkpoints after the rewind to update the source's control file.

So, any thoughts about the attached, which reworks this portion of PostgresNode.pm?
Thanks,
-- 
Michael


tap-minimize-default.patch
Description: Binary data



Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-11 Thread Fabien COELHO


Hello Jeff,


Shouldn't we use pg_usleep to ensure portability?  It is defined for
front-end code.  But it returns void, so the error check will have to be
changed.


Attached v3 with pg_usleep called instead.


I didn't see the problem before the commit I originally indicated, so I
don't think it has to be back-patched to before v10.


Hmmm, you've got a point, although I'm not sure how it could work
without sleeping explicitly. Maybe the path was calling select with an
empty wait list plus timeout, and select is kind enough to just sleep on
an empty list, or some other miracle. ISTM clearer to explicitly sleep in
that case.


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e37496c..524fcc6 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4582,20 +4582,30 @@ threadRun(void *arg)
 		 * or it's time to print a progress report.  Update input_mask to show
 		 * which client(s) received data.
 		 */
-		if (min_usec > 0 && maxsock != -1)
+		if (min_usec > 0)
 		{
-			int			nsocks; /* return from select(2) */
+			int			nsocks = 0; /* return from select(2) if called */
 
 			if (min_usec != PG_INT64_MAX)
 			{
-struct timeval timeout;
+if (maxsock != -1)
+{
+	struct timeval timeout;
 
-timeout.tv_sec = min_usec / 100;
-timeout.tv_usec = min_usec % 100;
-nsocks = select(maxsock + 1, _mask, NULL, NULL, );
+	timeout.tv_sec = min_usec / 100;
+	timeout.tv_usec = min_usec % 100;
+	nsocks = select(maxsock + 1, _mask, NULL, NULL, );
+}
+else /* nothing active, simple sleep */
+{
+	pg_usleep(min_usec);
+}
 			}
-			else
+			else /* no explicit delay, select without timeout */
+			{
 nsocks = select(maxsock + 1, _mask, NULL, NULL, NULL);
+			}
+
 			if (nsocks < 0)
 			{
 if (errno == EINTR)
@@ -4608,7 +4618,7 @@ threadRun(void *arg)
 goto done;
 			}
 		}
-		else
+		else /* min_usec == 0, i.e. something needs to be executed */
 		{
 			/* If we didn't call select(), don't try to read any data */
 			FD_ZERO(_mask);



Re: [HACKERS] Still another race condition in recovery TAP tests

2017-09-11 Thread Michael Paquier
On Mon, Sep 11, 2017 at 11:55 PM, Tom Lane  wrote:
> Hm, I don't much like having it silently ignore files that are present;
> that seems like a foot-gun in the long run.  What do you think of the
> attached?

That looks good to me. I have tried pretty hard to break it, but could not.
-- 
Michael




Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2017-09-11 Thread Jing Wang
Hi Surafel,

Please find the rebased patch based on latest version in the attached file.

Regards,
Jing Wang
Fujitsu Australia


comment_on_current_database_for_pgdump_v4.patch
Description: Binary data


comment_on_current_database_no_pgdump_v4.patch
Description: Binary data



Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-11 Thread Noah Misch
On Wed, May 10, 2017 at 10:50:51PM -0400, Bruce Momjian wrote:
> On Mon, May  1, 2017 at 08:12:51AM -0400, Robert Haas wrote:
> > On Tue, Apr 25, 2017 at 10:16 PM, Bruce Momjian  wrote:
> > > Well, we could add "MD5 users are encouraged to switch to
> > > SCRAM-SHA-256".  Now whether we want to list this as something on the
> > > SCRAM-SHA-256 description, or mention it as an incompatibility, or
> > > under Migration.  I am not clear that MD5 is in such terrible shape that
> > > this is warranted.
> > 
> > I think it's warranted.  The continuing use of MD5 has been a headache
> > for some EnterpriseDB customers who have compliance requirements which
> > they must meet.  It's not that they themselves necessarily know or
> > care whether MD5 is secure, although in some cases they do; it's that
> > if they use it, they will be breaking laws or regulations to which
> > their business or agency is subject.  I imagine customers of other
> > PostgreSQL companies have similar issues.  But leaving that aside, the
> > advantage of SCRAM isn't merely that it uses a better algorithm to
> > hash the password.  It has other advantages also, like not being
> > vulnerable to replay attacks.  If you're doing password
> > authentication, you should really be using SCRAM, and encouraging
> > people to move to SCRAM after upgrading is a good idea.
> > 
> > That having been said, SCRAM is a wire protocol break.  You will not
> > be able to upgrade to SCRAM unless and until the drivers you use to
> > connect to the database add support for it.  The only such driver
> > that's part of PostgreSQL itself is libpq; other drivers that have reimplemented the
> > PostgreSQL wire protocol will have to be updated with SCRAM support
> > before it will be possible to use SCRAM with those drivers.  I think
> > this should be mentioned in the release notes, too.  I also think it
> > would be great if somebody would put together a wiki page listing all
> > the popular drivers and (1) whether they use libpq or reimplement the
> > wire protocol, and (2) if the latter, the status of any efforts to
> > implement SCRAM, and (3) if those efforts have been completed, the
> > version from which they support SCRAM.  Then, I think we should reach
> > out to all of the maintainers of those driver authors who aren't
> > moving to support SCRAM and encourage them to do so.
> 
> I have added this as an open item because we will have to wait to see
> where we are with driver support as the release gets closer.

With the release near, I'm promoting this to the regular open issues section.


[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Heikki,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] PG 10 release notes

2017-09-11 Thread Thomas Munro
On Tue, Sep 12, 2017 at 11:57 AM, Bruce Momjian  wrote:
> On Tue, Aug  1, 2017 at 08:53:51AM +1200, Thomas Munro wrote:
>> "Add AFTER trigger transition tables to record changed rows (Kevin Grittner)"
>>
>> Any chance I could ask for a secondary author credit here?
>
> Sure, done.

Thanks.  Can you take it out again?  (Just kidding!)

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] PG 10 release notes

2017-09-11 Thread Bruce Momjian
On Fri, Sep  1, 2017 at 05:39:31PM +0900, Masahiko Sawada wrote:
> Hi all,
> 
> On Tue, Aug 1, 2017 at 5:53 AM, Thomas Munro
>  wrote:
> > On Tue, Apr 25, 2017 at 1:31 PM, Bruce Momjian  wrote:
> >> I have committed the first draft of the Postgres 10 release notes.  They
> >> are current as of two days ago, and I will keep them current.  Please
> >> give me any feedback you have.
> >
> > Hi Bruce,
> >
> > "Add AFTER trigger transition tables to record changed rows (Kevin 
> > Grittner)"
> >
> > Any chance I could ask for a secondary author credit here?  While I
> > started out as a reviewer and I understand that we don't list those, I
> > finished up writing quite a lot of lines of committed code for this
> > (see commits 1add0b15, 8c55244a, c46c0e52, 501ed02c, f32d57fd,
> > 9e6104c6, 29fd3d9d, 304007d9, 5ebeb579) and was already listed as a
> > co-author by Kevin in the original commits (59702716, 18ce3a4a).  Of
> > course I wish I'd identified and fixed all of those things *before*
> > the original commits, but that's how it played out...
> >
> 
> It might be too late, but I found that the following entry is
> categorized under Monitoring, while in the PostgreSQL 9.6 release notes
> the feature related to default roles is categorized under Permissions
> Management. I think the new default roles can be put in the same
> category so as not to confuse users; personally I find it more
> suitable.
> 
> "Add default monitoring roles (Dave Page)"

We don't have a "Permissions Management" category in PG 10 and unless we
have at least three items to add in there, I don't think it is worth
adding it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] PG 10 release notes

2017-09-11 Thread Bruce Momjian
On Tue, Aug  1, 2017 at 08:53:51AM +1200, Thomas Munro wrote:
> On Tue, Apr 25, 2017 at 1:31 PM, Bruce Momjian  wrote:
> > I have committed the first draft of the Postgres 10 release notes.  They
> > are current as of two days ago, and I will keep them current.  Please
> > give me any feedback you have.
> 
> Hi Bruce,
> 
> "Add AFTER trigger transition tables to record changed rows (Kevin Grittner)"
> 
> Any chance I could ask for a secondary author credit here?  While I
> started out as a reviewer and I understand that we don't list those, I
> finished up writing quite a lot of lines of committed code for this
> (see commits 1add0b15, 8c55244a, c46c0e52, 501ed02c, f32d57fd,
> 9e6104c6, 29fd3d9d, 304007d9, 5ebeb579) and was already listed as a
> co-author by Kevin in the original commits (59702716, 18ce3a4a).  Of
> course I wish I'd identified and fixed all of those things *before*
> the original commits, but that's how it played out...

Sure, done.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] PG 10 release notes

2017-09-11 Thread Bruce Momjian
On Sat, Sep  9, 2017 at 12:39:43PM +0200, Adrien Nayrat wrote:
> On 07/13/2017 04:36 PM, Adrien Nayrat wrote:
> > Hello hackers,
> > 
> > From: Peter Geoghegan 
> >> Date: Wed, 5 Jul 2017 15:19:57 -0700
> >> Subject: Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit 
> >> overflow
> >> On pgsql-b...@postgresql.org
> > 
> > On 07/06/2017 12:19 AM, Peter Geoghegan wrote:
> >> In Postgres 10, tuplesort external sort run merging became much faster
> >> following commit 24598337c8d. It might be noticeable if such a machine
> >> were using Postgres 10 [...]
> > 
> > Should we mention this improvement in the release notes?
> > 
> > Regards,
> > 
> 
> Hello,
> 
> After seeing these slides (especially slide 52):
> https://speakerdeck.com/peterg/sort-hash-pgconfus-2017
...
> 
> 
> 
> Should we mention it?

I don't know, but I suggest you read this email thread from April to get
an idea of how performance items are handled:


https://www.postgresql.org/message-id/flat/20170425013144.GA7513%40momjian.us#20170425013144.ga7...@momjian.us

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] PG 10 release notes

2017-09-11 Thread Bruce Momjian
On Mon, Sep 11, 2017 at 06:30:58PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, Jun  2, 2017 at 04:05:44PM -0500, Jim Nasby wrote:
> >> Can you change the attribution on
> >> Allow PL/Tcl functions to return composite types and sets
> >> to Karl Lehenbauer?
> 
> > Done and backpatched.  Sorry for the delay.
> 
> I don't see this pushed to the repo?

Sorry, pushed now.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-09-11 Thread Michael Paquier
On Mon, Sep 11, 2017 at 11:43 PM, Robert Haas  wrote:
> So I think this is just an excuse for turning --no-security-labels
> into --no-object-property=security-label.  To me, that's just plain
> worse.

It does not seem that my thoughts here have been correctly transmitted
to your brain. I do not mean to change the user-facing options, just
to refactor the code internally so that --no-foo switches can be more
easily generated, added, and handled as they are associated with an
object type. A portion of the complaints is caused by the fact that a
lot of similar code is duplicated.
-- 
Michael




Re: [HACKERS] PG 10 release notes

2017-09-11 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, Jun  2, 2017 at 04:05:44PM -0500, Jim Nasby wrote:
>> Can you change the attribution on
>> Allow PL/Tcl functions to return composite types and sets
>> to Karl Lehenbauer?

> Done and backpatched.  Sorry for the delay.

I don't see this pushed to the repo?

regards, tom lane




Re: [HACKERS] Coverage improvements of src/bin/pg_basebackup and pg_receivewal --endpos

2017-09-11 Thread Michael Paquier
On Tue, Sep 12, 2017 at 5:57 AM, Peter Eisentraut
 wrote:
> I have committed this patch, after a perltidy run, but the way the libz
> detection was implemented was a bit too hackish for me, so I have
> omitted that for now.

Thanks.

> I think a more robust way would be to parse
> Makefile.global, perhaps by a function in TestLib, so it can be reused
> in other tests.  (Even more robust would be to write out an actual Perl
> file with configuration information somewhere, but maybe we don't need
> to go there yet.)  Or maybe have the respective make variables exported
> when make check is called (could be included in the prove_check
> variable?).  Anyway, some more pondering could lead to a more general
> solution.

There is always room for improvement,
-- 
Michael




Re: [HACKERS] PG 10 release notes

2017-09-11 Thread Bruce Momjian
On Fri, Jun  2, 2017 at 04:05:44PM -0500, Jim Nasby wrote:
> On 4/24/17 8:31 PM, Bruce Momjian wrote:
> >I have committed the first draft of the Postgres 10 release notes.  They
> >are current as of two days ago, and I will keep them current.  Please
> >give me any feedback you have.
> >
> >The only unusual thing is that this release has ~180 items while most
> >recent release have had ~220.  The pattern I see that there are more
> >large features in this release than previous ones.
> 
> Can you change the attribution on
> 
> Allow PL/Tcl functions to return composite types and sets
> 
> to Karl Lehenbauer? He actually wrote the original patch; I just helped to
> get it through the community (something that FlightAware paid for). I didn't
> realize at the time that you could change the listed Author in the
> commitfest.

Done and backpatched.  Sorry for the delay.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] [PATCH] Generic type subscripting

2017-09-11 Thread Dmitry Dolgov
> On 11 September 2017 at 23:45, Tom Lane  wrote:
>
> Dmitry Dolgov <9erthali...@gmail.com> writes:
> >> On 11 September 2017 at 23:19, Tom Lane  wrote:
> >> Uh, what?  Sure you can.  Just because the existing code never has a
> >> reason to create such a dependency doesn't mean it wouldn't work.
>
> > Well, I thought that `pg_depend` was not intended to be used from
> > user-defined code and it's something "internal".
>
> Well, no, we're not expecting that SQL code will manually insert rows
> there.  This feature should have some sort of SQL command that will
> set up the relevant catalog entries, including the dependencies.
> If you don't want to do that, you're going to need the runtime tests.

Sure, an SQL command for that purpose is much better than a runtime check.
I'm going to add such a command to the patch. Thank you for the information!


Re: [HACKERS] [PATCH] Generic type subscripting

2017-09-11 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On 11 September 2017 at 23:19, Tom Lane  wrote:
>> Uh, what?  Sure you can.  Just because the existing code never has a
>> reason to create such a dependency doesn't mean it wouldn't work.

> Well, I thought that `pg_depend` was not intended to be used from
> user-defined code and it's something "internal".

Well, no, we're not expecting that SQL code will manually insert rows
there.  This feature should have some sort of SQL command that will
set up the relevant catalog entries, including the dependencies.
If you don't want to do that, you're going to need the runtime tests.
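For reference, the dependency-recording side of such a command can be
very small; a sketch, where parse_func_oid and extract_func_oid stand for
the function OIDs the command would have looked up:

/*
 * Sketch: record a normal pg_depend entry so that dropping the "extract"
 * function either fails or cascades to the subscripting setup using it.
 * ObjectAddressSet() and recordDependencyOn() are the standard backend
 * dependency primitives.
 */
ObjectAddress myself;
ObjectAddress referenced;

ObjectAddressSet(myself, ProcedureRelationId, parse_func_oid);
ObjectAddressSet(referenced, ProcedureRelationId, extract_func_oid);
recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);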

regards, tom lane




[HACKERS] Postponement of v10 release schedule

2017-09-11 Thread Tom Lane
Per the discussion in
https://www.postgresql.org/message-id/flat/20170909064853.25630.12825%40wrigleys.postgresql.org
it seems that our new trigger-transition-table feature isn't
really SQL-spec compliant.  Rather than releasing v10 that way,
and then causing compatibility problems when we fix it later,
the release team has agreed that a slippage in the v10 schedule
is appropriate.

We considered doing a wrap this week anyway and calling it beta5,
but there hasn't really been enough change from beta4 to justify
the overhead of an extra packaging cycle.  Accordingly, there will
be no release this week.

The current plan is to try to implement a fix for the trigger
semantics problem this week, and then put out 10rc1 next week.

I'm expecting that we will also slip 10.0 GA release one week,
although that could change depending on how messy the trigger fix
ends up being.

regards, tom lane




Re: [HACKERS] [PATCH] Generic type subscripting

2017-09-11 Thread Dmitry Dolgov
> On 11 September 2017 at 23:19, Tom Lane  wrote:
>
> Uh, what?  Sure you can.  Just because the existing code never has a
> reason to create such a dependency doesn't mean it wouldn't work.

Well, I thought that `pg_depend` was not intended to be used from
user-defined code and that it's something "internal". But if I'm wrong,
then maybe the problem Arthur raised is a valid reason for that.


Re: [HACKERS] [PATCH] Generic type subscripting

2017-09-11 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
> About dependencies between functions - as far as I understand one cannot
> create a `pg_depend` entry or any other kind of dependencies between
> custom user-defined functions.

Uh, what?  Sure you can.  Just because the existing code never has a
reason to create such a dependency doesn't mean it wouldn't work.

regards, tom lane




Re: [HACKERS] [PATCH] Generic type subscripting

2017-09-11 Thread Dmitry Dolgov
> On 9 September 2017 at 23:33, Arthur Zakirov 
wrote:
> PostgreSQL and the documentation compile with the patch without any
> errors. All regression tests passed.

Thank you!

> But honestly, I still cannot say that I agree with the way the
> *_extract() and *_assign() functions are created. For example, there
> is no entry in pg_depend for them
> ...
> =# drop function custom_subscripting_extract(internal);
> =# select data[0] from test_subscripting;
> ERROR:  function 0x55deb7911bfd returned NULL

Hm... I never thought about the feature in this way. When I was
experimenting I also tried another approach for this - saving in
`refevalfunc` a pointer to an appropriate function. For simple
situations it was ok, but there were questions about how it would work
with node-related functions from `outfuncs`/`copyfuncs` etc. Another
idea of mine was to find out the actual `refevalfunc` not at node
creation time but later on - this was also questionable, since in that
case we need to carry a lot of information with a node just for this
sole purpose. Maybe you can suggest something else?

About dependencies between functions - as far as I understand, one
cannot create a `pg_depend` entry or any other kind of dependency
between custom user-defined functions. So yes, it looks like with the
current approach the only solution would be to check in the `_parse`
function that the `_extract` and `_assign` functions exist (which is
inconvenient).

> For example, there is no entry in pg_depend

Are there any other disadvantages of this approach?


Re: [HACKERS] Coverage improvements of src/bin/pg_basebackup and pg_receivewal --endpos

2017-09-11 Thread Peter Eisentraut
On 9/6/17 00:54, Michael Paquier wrote:
>> - I think the tests will fail if libz support is not built.  Again,
>> pg_basebackup doesn't have tests for that.  So maybe omit that and
>> address it later.
> 
> Let's address it now. This can be countered by querying pg_config()
> before running the command of pg_receivexwal which uses --compress.

I have committed this patch, after a perltidy run, but the way the libz
detection was implemented was a bit too hackish for me, so I have
omitted that for now.  I think a more robust way would be to parse
Makefile.global, perhaps by a function in TestLib, so it can be reused
in other tests.  (Even more robust would be to write out an actual Perl
file with configuration information somewhere, but maybe we don't need
to go there yet.)  Or maybe have the respective make variables exported
when make check is called (could be included in the prove_check
variable?).  Anyway, some more pondering could lead to a more general
solution.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: proposal - using names as primary names of plpgsql function parameters instead $ based names

2017-09-11 Thread Tom Lane
Jeevan Chalke  writes:
[ psql-named-arguments-03-jeevan.patch ]

Pushed with minor simplification of the test case.

I'm not quite as convinced as Pavel that this is an improvement ---
it will make error messages inconsistent between named and unnamed
arguments.  Still, I follow the point that when there are a lot of
arguments, $n is pretty unhelpful.  We can always revert this if
we get complaints.

regards, tom lane




Re: [HACKERS] More flexible LDAP auth search filters?

2017-09-11 Thread Peter Eisentraut
On 9/8/17 21:31, Thomas Munro wrote:
> +if ($^O eq 'darwin')
> +{
> +   $slapd = '/usr/local/opt/openldap/libexec/slapd';
> +   $ldap_schema_dir = '/usr/local/etc/openldap/schema';
> +}
> 
> I'm guessing this is the MacPorts location, and someone from that
> other tribe that uses Brew can eventually post a patch to make this
> look in more places.

Or the other way around :)

> +my $ldap_port = int(rand() * 16384) + 49152;
> 
> Hmm.  I guess ldapi (Unix domain sockets) would be less roulette-like,
> but require client side support too.

Yeah, issue similar to the SSL tests.  The above formula is what
PostgresNode uses.

> Here's a change I needed to make to run this here.  It seems that to
> use "database mdb" I'd need to add a config line to tell it the path
> to load back_mdb.so from.  I could have done, but I noticed that if I
> tell it to use raw ldif files instead it's happy.  Does this still
> work for you on the systems you tested?

Yes, that seems like a better choice.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Wed, Sep 6, 2017 at 2:55 PM, Peter Geoghegan  wrote:
> On Wed, Sep 6, 2017 at 2:47 PM, Thomas Munro
>  wrote:
>>> I attach a patch to remove replacement selection, which I'll submit to CF 1.
>>
>> This breaks the documentation build, because
>> doc/src/sgml/release-9.6.sgml still contains > linkend="guc-replacement-sort-tuples"> but you removed that id.
>
> Thanks for looking into it.
>
> I suppose that the solution is to change the 9.6 release notes to no
> longer have a replacement_sort_tuples link. Anyone else have an
> opinion on that?

Attached is a revision of the patch that no longer breaks the
documentation build, by using a <literal> tag to refer to
replacement_sort_tuples within doc/src/sgml/release-9.6.sgml. The
patch is otherwise unchanged, and so reviewers shouldn't bother with
it (I just want to unbreak Thomas' continuous integration build, and
to save a committer the hassle of fixing the doc build themselves). I
verified that "make check-world" passes this time.

I also eyeballed the html generated by a "make STYLE=website html", to
ensure that it looked consistent with its surroundings. The resulting
9.6 release notes looked good to me.

-- 
Peter Geoghegan
From a22a77d09a5fb37713d8486db84e9619acc449d3 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Wed, 30 Aug 2017 16:14:36 -0700
Subject: [PATCH] Remove replacement selection sort.

This simplifies tuplesort.c, since heap maintenance routines need no
longer concern themselves with cases where two runs are represented
within a heap.

Replacement selection sort's traditional advantages no longer allow it
to outperform a simple sort-merge strategy, even in sympathetic cases.
This is due to long term trends in hardware performance characteristics,
and enhancements to the tuplesort.c merge code in the past couple of
years.

This will need to be called out as an incompatibility in the v11 release
notes, since it's possible that users set replacement_sort_tuples
themselves.

Discussion: https://postgr.es/m/cah2-wzmmnjg_k0r9nqywmq3zjyjjk+hcbizynghay-zyjs6...@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  39 ---
 doc/src/sgml/release-9.6.sgml |   2 +-
 src/backend/utils/init/globals.c  |   1 -
 src/backend/utils/misc/guc.c  |  10 -
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/backend/utils/sort/tuplesort.c| 415 +++---
 src/include/miscadmin.h   |   1 -
 src/test/regress/expected/cluster.out |  17 +-
 src/test/regress/sql/cluster.sql  |  14 +-
 9 files changed, 52 insertions(+), 448 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2b6255e..7c625e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1513,45 +1513,6 @@ include_dir 'conf.d'
   
  
 
- 
-  replacement_sort_tuples (integer)
-  
-   replacement_sort_tuples configuration parameter
-  
-  
-  
-   
-When the number of tuples to be sorted is smaller than this number,
-a sort will produce its first output run using replacement selection
-rather than quicksort.  This may be useful in memory-constrained
-environments where tuples that are input into larger sort operations
-have a strong physical-to-logical correlation.  Note that this does
-not include input tuples with an inverse
-correlation.  It is possible for the replacement selection algorithm
-to generate one long run that requires no merging, where use of the
-default strategy would result in many runs that must be merged
-to produce a final sorted output.  This may allow sort
-operations to complete sooner.
-   
-   
-The default is 150,000 tuples.  Note that higher values are typically
-not much more effective, and may be counter-productive, since the
-priority queue is sensitive to the size of available CPU cache, whereas
-the default strategy sorts runs using a cache
-oblivious algorithm.  This property allows the default sort
-strategy to automatically and transparently make effective use
-of available CPU cache.
-   
-   
-Setting maintenance_work_mem to its default
-value usually prevents utility command external sorts (e.g.,
-sorts used by CREATE INDEX to build B-Tree
-indexes) from ever using replacement selection sort, unless the
-input tuples are quite wide.
-   
-  
- 
-
  
   autovacuum_work_mem (integer)
   
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index fa5355f..603300a 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ -5110,7 +5110,7 @@ and many others in the same vein
 The new approach makes better use of the CPU 

Re: [HACKERS] More flexible LDAP auth search filters?

2017-09-11 Thread Peter Eisentraut
On 9/8/17 13:24, Mark Cave-Ayland wrote:
> My weapon of choice for LDAP deployments on POSIX-based systems is
> Arthur De Jong's nss-pam-ldapd (https://arthurdejong.org/nss-pam-ldapd)
> which is far more flexible than pam_ldap and fixes a large number of
> bugs, including the tendency for pam_ldap to hang infinitely if it can't
> contact its LDAP server.
> 
> Take a look at nss-pam-ldapd's man page for nslcd.conf and in particular
> pam_authz_search - this is exactly the type of filters I would end up
> deploying onto servers. This happens a lot in large organisations
> whereby getting group memberships updated in the main directory can take
> days/weeks whereas someone with root access to the server itself can
> hard-code an authentication list of users and/or groups in an LDAP
> filter in just a few minutes.

Thomas, would you consider using the placeholder syntax described at
 under
pam_authz_search?
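For reference, the placeholder style in question looks roughly like this in nslcd.conf (an illustrative example; see that man page for the authoritative list of supported variables):

    pam_authz_search (&(objectClass=posixAccount)(uid=$username)(host=$hostname))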

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] psql - add special variable to reflect the last query status

2017-09-11 Thread Pavel Stehule
2017-09-11 20:46 GMT+02:00 Fabien COELHO :

>>>> I think you're overly optimistic to believe that every failure will
>>>> have a SQLSTATE; I don't think that's true for libpq-reported errors,
>>>> such as connection loss.
>>>
>>> Yep, I thought I was optimistic:-) Can I add a special SQLSTATE for that
>>> situation where libpq did not report an error?
>>
>> Meh.  If we're going to do that I think it might be better to hack
>> libpq itself to do so, ie, force PQresultErrorField(..., PG_DIAG_SQLSTATE)
>> to always return something.  But it seems like a hack either way.
>
> I would not have taken the liberty to hack into libpq internals for such a
> small front-end feature. However, I agree that having libpq always return
> some diagnostic, even if it means "something unclear happened, sorry not to
> be very precise", would be better.


It's probably better not to do this before somebody implements it
correctly; a temporary solution can introduce compatibility issues.

If SQLSTATE has no known value, then it should be NULL or maybe an
empty string.

Pavel




Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-11 Thread Jeff Janes
On Mon, Sep 11, 2017 at 1:49 AM, Fabien COELHO  wrote:

>
> Hello Jeff,
>
> Ok, the problem was a little bit more trivial than I thought.
>
> The issue is that under a low rate there may be no transaction in
> progress, however the wait procedure was relying on select's timeout. If
> nothing is active there is nothing to wait for, thus it was an active loop
> in this case...
>
> I've introduced a usleep call in place of select for this particular case.
> Hopefully this is portable.
>

Shouldn't we use pg_usleep to ensure portability?  It is defined for
front-end code.  But it returns void, so the error check will have to be
changed.
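Perhaps something along these lines (a sketch only; min_usec and maxsock
are assumed from pgbench's threadRun context):

    /* nothing in progress and nothing to select() on: just nap */
    if (min_usec > 0 && maxsock == -1)
        pg_usleep(min_usec);    /* returns void, so no errno check as with select() */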

I didn't see the problem before the commit I originally indicated, so I
don't think it has to be back-patched to before v10.

Cheers,

Jeff


Re: [HACKERS] pgbench tap tests & minor fixes.

2017-09-11 Thread Andrew Dunstan


On 09/11/2017 01:58 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 09/08/2017 09:40 AM, Tom Lane wrote:
>>> Like you, I'm a bit worried about the code for extracting an exit
>>> status from IPC::Run::run.  We'll have to keep an eye on the buildfarm
>>> for a bit.  If there's any trouble, I'd be inclined to drop it down
>>> to just success/fail rather than checking the exact exit code.
>> bowerbird seems to have been made unhappy.
> I saw that failure, but it appears to be a server-side crash:
>
> 2017-09-10 19:39:03.395 EDT [1100] LOG:  server process (PID 11464) was 
> terminated by exception 0xC0000005
> 2017-09-10 19:39:03.395 EDT [1100] HINT:  See C include file "ntstatus.h" for 
> a description of the hexadecimal value.
>
> Given the lack of any log outputs from process 11464, it's hard to tell
> what it was doing, but it seems not to be any of the backends running
> pgbench queries.  So maybe an autovac worker?  I dunno.  Anyway, it's
> difficult to credit that this commit caused the failure, even if it did
> happen during the new test case.  I'm inclined to write it off as another
> one of the random crashes that bowerbird seems prone to.
>
> If the failure proves repeatable, then of course we'll need to look
> more closely.
>
>   


Hmm, it had several failures and now a success. Will keep an eye on it.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] assorted code cleanup

2017-09-11 Thread Peter Eisentraut
On 9/7/17 14:53, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 9/5/17 15:32, Tom Lane wrote:
>>> I do agree with the idea that we should use the * notation in cases where
>>> the reader might otherwise think that a plain function was being invoked,
>>> ie I don't like
>>> some_function_pointer(args);
>>> Even if the compiler isn't confused, readers might be.  But in the case of
>>> structname->pointerfield(args);
>>> it's impossible to read that as a plain function call, so I'm okay with
>>> dropping the extra punctuation there.
> 
>> Committed that way.  Thanks.
> 
> Is it worth memorializing this in the docs somewhere, perhaps
> "53.4. Miscellaneous Coding Conventions" ?

done
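For the archives, a minimal sketch of the two notations in question:

    (*some_function_pointer) (args);    /* bare pointer: keep the star */
    structname->pointerfield(args);     /* struct field: cannot be misread */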

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] psql - add special variable to reflect the last query status

2017-09-11 Thread Fabien COELHO



>>> I think you're overly optimistic to believe that every failure will
>>> have a SQLSTATE; I don't think that's true for libpq-reported errors,
>>> such as connection loss.
>>
>> Yep, I thought I was optimistic:-) Can I add a special SQLSTATE for that
>> situation where libpq did not report an error?
>
> Meh.  If we're going to do that I think it might be better to hack
> libpq itself to do so, ie, force PQresultErrorField(..., PG_DIAG_SQLSTATE)
> to always return something.  But it seems like a hack either way.

I would not have taken the liberty to hack into libpq internals for such a
small front-end feature. However, I agree that having libpq always return
some diagnostic, even if it means "something unclear happened, sorry not
to be very precise", would be better.


--
Fabien.




[HACKERS] Re: DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-11 Thread Peter Eisentraut
On 9/10/17 12:14, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I'm looking into this now and will report tomorrow.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] psql - add special variable to reflect the last query status

2017-09-11 Thread Tom Lane
Fabien COELHO  writes:
>> I think you're overly optimistic to believe that every failure will
>> have a SQLSTATE; I don't think that's true for libpq-reported errors,
>> such as connection loss.

> Yep, I thought I was optimistic:-) Can I add a special SQLSTATE for that 
> situation where libpq did not report an error?

Meh.  If we're going to do that I think it might be better to hack
libpq itself to do so, ie, force PQresultErrorField(..., PG_DIAG_SQLSTATE)
to always return something.  But it seems like a hack either way.

regards, tom lane




Re: [HACKERS] pgbench tap tests & minor fixes.

2017-09-11 Thread Tom Lane
Andrew Dunstan  writes:
> On 09/08/2017 09:40 AM, Tom Lane wrote:
>> Like you, I'm a bit worried about the code for extracting an exit
>> status from IPC::Run::run.  We'll have to keep an eye on the buildfarm
>> for a bit.  If there's any trouble, I'd be inclined to drop it down
>> to just success/fail rather than checking the exact exit code.

> bowerbird seems to have been made unhappy.

I saw that failure, but it appears to be a server-side crash:

2017-09-10 19:39:03.395 EDT [1100] LOG:  server process (PID 11464) was 
terminated by exception 0xC0000005
2017-09-10 19:39:03.395 EDT [1100] HINT:  See C include file "ntstatus.h" for a 
description of the hexadecimal value.

Given the lack of any log outputs from process 11464, it's hard to tell
what it was doing, but it seems not to be any of the backends running
pgbench queries.  So maybe an autovac worker?  I dunno.  Anyway, it's
difficult to credit that this commit caused the failure, even if it did
happen during the new test case.  I'm inclined to write it off as another
one of the random crashes that bowerbird seems prone to.

If the failure proves repeatable, then of course we'll need to look
more closely.

regards, tom lane




Re: [HACKERS] psql - add special variable to reflect the last query status

2017-09-11 Thread Fabien COELHO


Hello Tom,


> Hm.  Looking closer at this, I see that it doesn't work so well after all
> to put the variable-setting code in ProcessResult: that fails to cover
> the ExecQueryUsingCursor code path.

Ok, I'll investigate this path.

> And it also fails to cover DescribeQuery, which arguably should set
> these variables as well

And this one.

> -- certainly so if it gets a failure.  Maybe you could create a small
> subroutine along the lines of SetResultVariables(PGresult *result, bool
> success) for all three places to call.  (ProcessResult certainly has
> already decided whether it's got a success, and I think the other paths
> would know that as well, so no need to re-extract it from the PGresult.)

Ok.

> I think you're overly optimistic to believe that every failure will
> have a SQLSTATE; I don't think that's true for libpq-reported errors,
> such as connection loss.

Yep, I thought I was optimistic:-) Can I add a special SQLSTATE for that
situation where libpq did not report an error?

> Using upper-case TRUE/FALSE for the values of ERROR seems a bit
> ugly to me; we generally use lower case for other variable values,
> so I'd go with true/false.

Ok.  The choice is not aesthetic but systematic: I use upper case for all
SQL keywords, and lower case or capitalized for anything user-land.  I can
put lower case if you want.
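For instance, with lower-case values a session would look something like
this (a sketch; variable names as discussed in this thread):

    postgres=> SELECT 1/0;
    ERROR:  division by zero
    postgres=> \echo :ERROR :SQLSTATE
    true 22012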


--
Fabien.




Re: [HACKERS] pgbench tap tests & minor fixes.

2017-09-11 Thread Andrew Dunstan


On 09/08/2017 09:40 AM, Tom Lane wrote:
> Fabien COELHO  writes:
>> [ pgbench-tap-12.patch ]
> Pushed, with some minor fooling with comments and after running it
> through perltidy.  (I have no opinions about Perl code formatting,
> but perltidy does ...)
>
> The only substantive change I made was to drop the test that attempted
> to connect to no-such-host.postgresql.org.  That's (a) unnecessary,
> as this is a test of pgbench not libpq; (b) likely to provoke a wide
> range of platform-specific error messages, which we'd have to account
> for given that the test is looking for specific output; and (c) likely
> to draw complaints from buildfarm owners and packagers who do not like
> test scripts that try to make random external network connections.
>
> Like you, I'm a bit worried about the code for extracting an exit
> status from IPC::Run::run.  We'll have to keep an eye on the buildfarm
> for a bit.  If there's any trouble, I'd be inclined to drop it down
> to just success/fail rather than checking the exact exit code.
>
>   



bowerbird seems to have been made unhappy.



cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] CLUSTER command progress monitor

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 7:38 AM, Robert Haas  wrote:
> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
>  wrote:
>> Thanks for the comment.
>>
>> As you know, the CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan
>> method based on cost estimation. In the case of SEQ SCAN, these two phases
>> do not overlap. However, with INDEX SCAN, they overlap. Therefore I created
>> the phase "scan heap and write new heap" for when INDEX SCAN is selected.
>>
>> I agree that progress reporting for sort is difficult. So it only reports
>> the phase ("sorting tuples") in the current design of the progress monitor
>> for CLUSTER; it doesn't report a counter for the sort.
>
> Doesn't that make it almost useless?  I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.

It's definitely my experience that CLUSTER is incredibly I/O bound.
You're shoveling the tuples through tuplesort.c, but the actual
sorting component isn't where the real costs are. Profiling shows that
writing out the new heap (including moderately complicated
bookkeeping) is the bottleneck, IIRC. That's why parallel CLUSTER
didn't look attractive, even though it would be a fairly
straightforward matter to add that on top of the parallel CREATE INDEX
structure from the patch that I wrote to do that.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:50 AM, Robert Haas  wrote:
> On Mon, Sep 11, 2017 at 11:47 AM, Tomas Vondra
>  wrote:
>> The question is what is the optimal replacement_sort_tuples value? I
>> assume it's the number of tuples that effectively uses CPU caches, at
>> least that's what our docs say. So I think you're right that raising it
>> to 1B rows may break this assumption and make it perform worse.
>>
>> But perhaps the fact that we're testing with multiple work_mem values,
>> and with smaller data sets (100k or 1M rows) makes this a non-issue?
>
> I am not sure that's the case -- I think that before Peter's changes
> it was pretty easy to find cases where lowering work_mem made sorting
> ordered data go faster.

Before enhancements to merging by both Heikki and myself went into
Postgres 10, there were cases where replacement selection of the first
run did relatively well, and cases where it did badly despite managing
to avoid a merge. The former were cases where work_mem was low, and
the latter were cases where it was high. This was true because the
heap used is cache inefficient. When it was small/cache efficient, and
when there was ordering to exploit, replacement selection could win.
Your recollection there sounds accurate to me.

These days, even a small, cache-efficient heap is generally not good
enough to beat quicksorting, since merging attained the ability to
exploit presortedness itself within commit 24598337c, and memory
utilization for merging got better, too. Very roughly speaking,
merging attained the same advantage that replacement selection had all
along, and replacement selection lost all ability to compete.

-- 
Peter Geoghegan




Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread amul sul
On Mon, Sep 11, 2017 at 5:30 PM, Alvaro Herrera 
wrote:

> Robert Haas wrote:
> > On Mon, Sep 11, 2017 at 4:17 AM, Ashutosh Bapat
> >  wrote:
> > >> Rebased 0002 against this commit & renamed to 0001, PFA.
> > >
> > > Given that we have default partition support now, I am wondering
> > > whether hash partitioned tables also should have default partitions.
> > > The way we have structured hash partitioning syntax, there can be
> > > "holes" in partitions. Default partition would help plug those holes.
> >
> > Yeah, I was thinking about that, too.  On the one hand, it seems like
> > it's solving the problem the wrong way: if you've set up hash
> > partitioning properly, you shouldn't have any holes.  On the other
> > hand, supporting it probably wouldn't cost anything noticeable and
> > might make things seem more consistent.  I'm not sure which way to
> > jump on this one.
>
> How difficult/tedious/troublesome would be to install the missing
> partitions if you set hash partitioning with a default partition and
> only later on notice that some partitions are missing?  I think if the
> answer is that you need to exclusive-lock something for a long time and
> this causes a disruption in production systems, then it's better not to
> allow a default partition at all and just force all the hash partitions
> to be there from the start.
>
>
I am also leaning toward not supporting a default partition for a
hash-partitioned table.

The major drawback I can see is the constraint that gets created on the
default partition table.  IIUC, the constraint on the default partition
table is just the negation of the partition constraints on all its
sibling partitions.

Consider a hash-partitioned table having partitions with the hash bounds
(modulus 64, remainder 0), ..., (modulus 64, remainder 62), and partition
columns col1, col2, and so on; the constraint for the default partition
will then be:

NOT (satisfies_hash_partition(64, 0, hash_fn1(col1), hash_fn2(col2), ...) OR
     ... OR
     satisfies_hash_partition(64, 62, hash_fn1(col1), hash_fn2(col2), ...))

This will hurt performance much more than in any other partitioning
strategy, because it computes the hash of the same partitioning key
multiple times.  We could overcome that by having another SQL function
(e.g., satisfies_default_hash_partition) which calculates the hash value
once and checks each remainder, but that would be a different path from
the current default-partition framework.
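For concreteness, a minimal sketch of such a partition set with holes
(hypothetical table; syntax per the proposed patch):

    CREATE TABLE orders (id int, note text) PARTITION BY HASH (id);
    CREATE TABLE orders_p0 PARTITION OF orders
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE orders_p1 PARTITION OF orders
        FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    -- remainders 2 and 3 are "holes": a default partition would have to
    -- absorb every row whose hashed key maps to them, carrying the
    -- negated constraint shown above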

Regards,
Amul


Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:47 AM, Tomas Vondra
 wrote:
> The question is what is the optimal replacement_sort_tuples value?

See my remarks to Robert just now.

I think that it's incredibly hard to set replacement_sort_tuples
sensibly in 9.6. As of Postgres 10, it is much more likely to hurt
than to help, because the enhancements to merging that went into
Postgres 10 reduced the downside of not using replacement selection.
And so, for Postgres 11, replacement_sort_tuples deserves to be
removed.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:32 AM, Robert Haas  wrote:
> On Sun, Sep 10, 2017 at 9:39 PM, Peter Geoghegan  wrote:
>> To be clear, you'll still need to set replacement_sort_tuples high
>> when testing RS, to make sure that we really use it for at least the
>> first run when we're expected to. (There is no easy way to have
>> testing mechanically verify that we really do only have one run in the
>> end with RS, but I assume that such paranoia is unneeded.)
>
> I seem to recall that raising replacement_sort_tuples makes
> replacement selection look worse in some cases -- the optimization is
> more likely to apply, sure, but the heap is also bigger, which hurts.

But that was only because work_mem was set relatively high. If you're
going to test work_mem values that are slightly on the high side for
replacement selection (like, 8MB or 32MB), then increasing
replacement_sort_tuples ensures that replacement selection is actually
used for at least the first run, and you don't waste time comparing a
behavior (quicksorting runs) to itself.

All that matters is whether or not replacement_sort_tuples exceeds the
number of tuples that the first run would have if quicksorted
immediately. replacement_sort_tuples is only ever used to answer a
single yes/no question.
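For what it's worth, the kind of test setup being discussed here looks
roughly like this (a sketch; trace_sort exists in default builds, which
define TRACE_SORT):

    SET trace_sort = on;                        -- strategy is reported in the log
    SET work_mem = '8MB';
    SET replacement_sort_tuples = 1000000000;   -- make RS win the yes/no question
    CREATE TABLE presorted AS
        SELECT g AS a FROM generate_series(1, 10000000) g;
    EXPLAIN (ANALYZE, TIMING OFF) SELECT a FROM presorted ORDER BY a;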

-- 
Peter Geoghegan




Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-11 Thread Kuntal Ghosh
On Wed, Sep 6, 2017 at 9:16 AM, Haribabu Kommi  wrote:
>
> Attached the latest patch and performance report.
>
While looking into the patch, I realized that a normal backend has to
check almost 10 if-conditions in the worst case inside XLogWrite (6 in
the am_background_process method, 1 for writes, 1 for blocks, 2 for
fsyncs), just to decide whether it is a normal backend or not. The
count is higher for the walwriter process. Although I've not done any
performance tests, IMHO it might add further overhead while holding
WALWriteLock.

I was wondering whether it is possible to collect all the stats in
XLogWrite() irrespective of the type of backend, and update the shared
counters once at the end of the function.  Thoughts?
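A sketch of that idea (hypothetical names; the real patch would differ):

    typedef struct XLogWriteCounters
    {
        uint64      writes;         /* write() calls issued */
        uint64      write_blocks;   /* XLOG pages written */
        uint64      fsyncs;         /* flush requests issued */
    } XLogWriteCounters;

    /* inside the write loop of XLogWrite(): cheap local bookkeeping */
    counters.writes++;
    counters.write_blocks += npages;
    ...
    /* and a single backend-type branch at the end of XLogWrite() */
    XLogWritePublishCounters(&counters);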


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:17 AM, Tomas Vondra
 wrote:
> Overall I think the results show quite significant positive impact of
> the patch. There are a few cases of regression, but ISTM those may
> easily be noise as it's usually 0.03 vs 0.04 second, or something. I'll
> switch to the \timing (instead of /usr/bin/time) to get more accurate
> results, and rerun those tests.

I'm glad to hear it. While I'm not surprised, I still don't consider
the patch to be a performance enhancement. It is intended to lower the
complexity of tuplesort.c, and the overall performance improvement is
just a bonus IMV.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 11:47 AM, Tomas Vondra
 wrote:
> The question is what is the optimal replacement_sort_tuples value? I
> assume it's the number of tuples that effectively uses CPU caches, at
> least that's what our docs say. So I think you're right that raising it
> to 1B rows may break this assumption and make it perform worse.
>
> But perhaps the fact that we're testing with multiple work_mem values,
> and with smaller data sets (100k or 1M rows) makes this a non-issue?

I am not sure that's the case -- I think that before Peter's changes
it was pretty easy to find cases where lowering work_mem made sorting
ordered data go faster.

But I could easily be wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-11 Thread Erik Rijkers

On 2017-09-11 02:12, Thomas Munro wrote:

> On Mon, Sep 11, 2017 at 11:40 AM, Michael Paquier wrote:
>> Thomas Munro has hacked up a prototype of an application that
>> automatically tests whether submitted patches apply and build:
>> http://commitfest.cputube.org/
>
> I should add: this is a spare-time effort, a work-in-progress and
> building on top of a bunch of hairy web scraping, so it may take some
> time to perfect.

It would be great if one of the intermediary products of this effort
could be made available too, namely a list of the latest patches.

Or perhaps such a list should come out of the commitfest app.

For me, such a list would be even more useful than any subsequently
processed results.


thanks,

Erik Rijkers





Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Tomas Vondra
On 09/11/2017 05:32 PM, Robert Haas wrote:
> On Sun, Sep 10, 2017 at 9:39 PM, Peter Geoghegan  wrote:
>> To be clear, you'll still need to set replacement_sort_tuples high
>> when testing RS, to make sure that we really use it for at least the
>> first run when we're expected to. (There is no easy way to have
>> testing mechanically verify that we really do only have one run in the
>> end with RS, but I assume that such paranoia is unneeded.)
> 
> I seem to recall that raising replacement_sort_tuples makes
> replacement selection look worse in some cases -- the optimization is
> more likely to apply, sure, but the heap is also bigger, which hurts.
> 

The question is what is the optimal replacement_sort_tuples value? I
assume it's the number of tuples that effectively uses CPU caches, at
least that's what our docs say. So I think you're right that raising it
to 1B rows may break this assumption and make it perform worse.

But perhaps the fact that we're testing with multiple work_mem values,
and with smaller data sets (100k or 1M rows) makes this a non-issue?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] psql - add special variable to reflect the last query status

2017-09-11 Thread Tom Lane
Fabien COELHO  writes:
> Small v7 update, sorry for the noise.

Hm.  Looking closer at this, I see that it doesn't work so well after all
to put the variable-setting code in ProcessResult: that fails to cover the
ExecQueryUsingCursor code path.  And it also fails to cover DescribeQuery,
which arguably should set these variables as well -- certainly so if it
gets a failure.  Maybe you could create a small subroutine along the lines
of SetResultVariables(PGresult *result, bool success) for all three places
to call.  (ProcessResult certainly has already decided whether it's got a
success, and I think the other paths would know that as well, so no need
to re-extract it from the PGresult.)
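That helper might look something like this (a sketch; it relies on psql's
existing SetVariable machinery, and leaves open the no-SQLSTATE question
discussed below):

    static void
    SetResultVariables(PGresult *result, bool success)
    {
        if (success)
        {
            SetVariable(pset.vars, "ERROR", "false");
            SetVariable(pset.vars, "SQLSTATE", "00000");
        }
        else
        {
            const char *code = PQresultErrorField(result, PG_DIAG_SQLSTATE);

            SetVariable(pset.vars, "ERROR", "true");
            /* libpq-level failures may carry no SQLSTATE at all */
            SetVariable(pset.vars, "SQLSTATE", code ? code : "");
        }
    }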

I think you're overly optimistic to believe that every failure will
have a SQLSTATE; I don't think that's true for libpq-reported errors,
such as connection loss.

Using upper-case TRUE/FALSE for the values of ERROR seems a bit
ugly to me; we generally use lower case for other variable values,
so I'd go with true/false.

regards, tom lane




Re: [HACKERS] UPDATE of partition key

2017-09-11 Thread Dilip Kumar
On Thu, Sep 7, 2017 at 11:41 AM, Amit Khandekar  wrote:
> On 6 September 2017 at 21:47, Dilip Kumar  wrote:

> Actually, since transition tables came in, the functions like
> ExecARUpdateTriggers() or ExecARInsertTriggers() have this additional
> purpose of capturing transition table rows, so that the images of the
> tables are visible when statement triggers are fired that refer to
> these transition tables. So in the above code, these functions only
> capture rows, they do not add any event for firing any ROW triggers.
> AfterTriggerSaveEvent() returns without adding any event if it's
> called only for transition capture. So even if UPDATE row triggers are
> defined, they won't get fired in case of row movement, although the
> updated rows would be captured if transition tables are referenced in
> these triggers or in the statement triggers.
>

Ok, then I have one more question.

With transition tables, we can only support statement-level triggers, and
for an UPDATE statement we are only going to execute the UPDATE
statement-level trigger.  So is there any point in making transition
table entries for the DELETE/INSERT triggers, given that those transition
tables will never be used?  Or am I missing something?
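For reference, the kind of trigger being discussed (a minimal sketch):

    CREATE TABLE t (a int, b text) PARTITION BY RANGE (a);
    CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (1) TO (100);
    CREATE TABLE t2 PARTITION OF t FOR VALUES FROM (100) TO (200);

    CREATE FUNCTION report_updates() RETURNS trigger AS $$
    BEGIN
        RAISE NOTICE 'rows updated: %', (SELECT count(*) FROM new_rows);
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER t_upd AFTER UPDATE ON t
        REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
        FOR EACH STATEMENT EXECUTE PROCEDURE report_updates();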

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-11 Thread Tomas Vondra


On 09/11/2017 03:01 PM, Aleksander Alekseev wrote:
> Hi Tomas,
> 
>>> Unless there are any objections to give this idea a try I'm willing to
>>> write and host a corresponding script.
>>>
>> That won't work until (2) is reliable enough. There are patches
>> (for example my "multivariate MCV lists and histograms") which
>> fail to apply only because the tool picks the wrong patch.
>> Possibly because it does not recognize compressed patches, or
>> something.
> 
> Agree. However, we could simply add an "Enable autotests" checkbox to
> the commitfest application. In fact, we could even leave the
> commitfest application as is and provide corresponding checkboxes in
> the web interface of the "autoreviewer" application. Naturally, every
> automatically generated code review will include a link that
> disables autotests for this particular commitfest entry.
> 

I think we should try making it as reliable as possible first, and only
then consider adding some manual on/off switches. Otherwise the patches
may randomly flap between "OK" and "failed" after each new submission,
disrupting the CF process.

Making the testing reliable may even require establishing some sort of
policy (say, always send a complete patch, not just one piece).

>
> I hope this observation will change your mind :)
> 

Not sure. But I'm mostly just a passenger here, not the driver.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Robert Haas
On Sun, Sep 10, 2017 at 9:39 PM, Peter Geoghegan  wrote:
> To be clear, you'll still need to set replacement_sort_tuples high
> when testing RS, to make sure that we really use it for at least the
> first run when we're expected to. (There is no easy way to have
> testing mechanically verify that we really do only have one run in the
> end with RS, but I assume that such paranoia is unneeded.)

I seem to recall that raising replacement_sort_tuples makes
replacement selection look worse in some cases -- the optimization is
more likely to apply, sure, but the heap is also bigger, which hurts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA

2017-09-11 Thread Jesper Pedersen

Hi,

On 09/08/2017 03:35 PM, Sokolov Yura wrote:
>> I'm seeing
>>
>> -M prepared: Up to 11% improvement
>> -M prepared -S: No improvement, no regression ("noise")
>> -M prepared -N: Up to 12% improvement
>>
>> For all runs the improvement shows up the closer you get to the number
>> of CPU threads, or above. Although I'm not seeing the same
>> improvements as you on very large client counts, there are definitely
>> improvements :)
>
> It is expected:
> - the patch "fixes NUMA": for example, it doesn't give any improvement
>   on 1 socket at all (I've tested it using numactl to bind to 1 socket),
> - and certainly it gives less improvement on 2 sockets than on 4 sockets
>   (and 28 cores vs 72 cores also makes a difference),
> - one of the hot points was CLogControlLock, and it was fixed with
>   "Group mode for CLOG updates" [1]

I'm planning to re-test that patch.

>> +static inline bool
>> +LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode,
>> LWLockMode waitmode)
>>
>> I'll leave it to the Committer to decide if this method is too big to
>> be "inline".
>
> GCC 4.9 doesn't want to inline it without a directive, and the function
> call is then noticeable in the profile.
>
> The attachment contains a version with all suggestions applied except
> the removal of "inline".

Yes, ideally the method will be kept "inline".

>> Open questions:
>> ---
>> * spins_per_delay as extern
>> * Calculation of skip_wait_list
>
> Currently the calculation of skip_wait_list is mostly empirical (ie the
> best I measured).

Ok, good to know.

> I strongly think that instead of spins_per_delay something dependent
> on the concrete lock should be used. I tried to store it in the LWLock
> itself, but that was worse.

Yes, LWLock should be kept as small as possible, and cache-line aligned
due to the cache storms, as shown by perf c2c.

> Recently I realized it should be stored in an array indexed by tranche,
> but I haven't implemented that yet, and therefore haven't measured.

Different constants for the LWLock could have an impact, but the
constants would also be dependent on the machine setup and workload.

Thanks for working on this!

Best regards,
 Jesper




Re: [HACKERS] Still another race condition in recovery TAP tests

2017-09-11 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Sep 10, 2017 at 12:38 AM, Tom Lane  wrote:
>> The specific case we need to allow is "ENOENT on a file/dir that was
>> there a moment ago".  I think it still behooves us to complain about
>> anything else.  If you think it's a simple fix, have at it.  But
>> I see at least three ways for _copypath_recurse to fail depending on
>> exactly when the file disappears.

> With the check for -d and -f, this brings indeed the count to three
> code paths. With the patch attached, I have added some manual sleep
> calls in RecursiveCopy.pm and triggered manual deletions of the source
> repository. The copy is able to complete in any case, warning about
> missing directories or files. I have added warn messages in the patch
> when ENOENT is triggered, though I think that those should be removed
> in the final patch.

Hm, I don't much like having it silently ignore files that are present;
that seems like a foot-gun in the long run.  What do you think of the
attached?

> By the way, 010_logical_decoding_timelines.pl does not need to include
> RecursiveCopy.

Good catch.

regards, tom lane

diff --git a/src/test/perl/RecursiveCopy.pm b/src/test/perl/RecursiveCopy.pm
index 28ecaf6..19f7dd2 100644
*** a/src/test/perl/RecursiveCopy.pm
--- b/src/test/perl/RecursiveCopy.pm
*** use File::Copy;
*** 29,40 
  =head2 copypath($from, $to, %params)
  
  Recursively copy all files and directories from $from to $to.
  
  Only regular files and subdirectories are copied.  Trying to copy other types
  of directory entries raises an exception.
  
  Raises an exception if a file would be overwritten, the source directory can't
! be read, or any I/O operation fails. Always returns true.
  
  If the B parameter is given, it must be a subroutine reference.
  This subroutine will be called for each entry in the source directory with its
--- 29,45 
  =head2 copypath($from, $to, %params)
  
  Recursively copy all files and directories from $from to $to.
+ Does not preserve file metadata (e.g., permissions).
  
  Only regular files and subdirectories are copied.  Trying to copy other types
  of directory entries raises an exception.
  
  Raises an exception if a file would be overwritten, the source directory can't
! be read, or any I/O operation fails.  However, we silently ignore ENOENT on
! open, because when copying from a live database it's possible for a file/dir
! to be deleted after we see its directory entry but before we can open it.
! 
! Always returns true.
  
  If the B parameter is given, it must be a subroutine reference.
  This subroutine will be called for each entry in the source directory with its
*** sub copypath
*** 74,79 
--- 79,87 
  		$filterfn = sub { return 1; };
  	}
  
+ 	# Complain if original path is bogus, because _copypath_recurse won't.
+ 	die "\"$base_src_dir\" does not exist" if !-e $base_src_dir;
+ 
  	# Start recursive copy from current directory
  	return _copypath_recurse($base_src_dir, $base_dest_dir, "", $filterfn);
  }
*** sub _copypath_recurse
*** 89,100 
  	return 1 unless &$filterfn($curr_path);
  
  	# Check for symlink -- needed only on source dir
! 	die "Cannot operate on symlinks" if -l $srcpath;
! 
! 	# Can't handle symlinks or other weird things
! 	die "Source path \"$srcpath\" is not a regular file or directory"
! 	  unless -f $srcpath
! 		  or -d $srcpath;
  
  	# Abort if destination path already exists.  Should we allow directories
  	# to exist already?
--- 97,104 
  	return 1 unless &$filterfn($curr_path);
  
  	# Check for symlink -- needed only on source dir
! 	# (note: this will fall through quietly if file is already gone)
! 	die "Cannot operate on symlink \"$srcpath\"" if -l $srcpath;
  
  	# Abort if destination path already exists.  Should we allow directories
  	# to exist already?
*** sub _copypath_recurse
*** 104,128 
  	# same name and we're done.
  	if (-f $srcpath)
  	{
! 		copy($srcpath, $destpath)
  		  or die "copy $srcpath -> $destpath failed: $!";
  		return 1;
  	}
  
! 	# Otherwise this is directory: create it on dest and recurse onto it.
! 	mkdir($destpath) or die "mkdir($destpath) failed: $!";
! 
! 	opendir(my $directory, $srcpath) or die "could not opendir($srcpath): $!";
! 	while (my $entry = readdir($directory))
  	{
! 		next if ($entry eq '.' or $entry eq '..');
! 		_copypath_recurse($base_src_dir, $base_dest_dir,
! 			$curr_path eq '' ? $entry : "$curr_path/$entry", $filterfn)
! 		  or die "copypath $srcpath/$entry -> $destpath/$entry failed";
  	}
- 	closedir($directory);
  
! 	return 1;
  }
  
  1;
--- 108,154 
  	# same name and we're done.
  	if (-f $srcpath)
  	{
! 		my $fh;
! 		unless (open($fh, '<', $srcpath))
! 		{
! 			return 1 if ($!{ENOENT});
! 			die "open($srcpath) failed: $!";
! 		}
! 		copy($fh, $destpath)
  		  or die "copy $srcpath -> $destpath failed: $!";
+ 		

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-09-11 Thread Robert Haas
On Sun, Sep 10, 2017 at 6:25 PM, Stephen Frost  wrote:
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> As there begins to be many switches of this kind and much code
>> duplication, I think that some refactoring into a more generic switch
>> infrastructure would be nicer.
>
> I have been thinking about this also and agree that it would be nicer,
> but then these options would just become "shorthand" for the generic
> switch.

I don't really like the "generic switch infrastructure" concept.  I
think it will make specifying behaviors harder without any
corresponding benefit.  There are quite a few --no-xxx switches now
but the total number of them that we could conceivably end up with
doesn't seem to be a lot bigger than what we have already.

Now, if we want switches to exclude a certain kind of object (e.g.
table, function, text search configuration) from the backup
altogether, that should be done in some generic way, like
--exclude-object-type=table.  But that's not what this is about.  This
is about excluding a certain kind of property (comments, in this case)
from being backed up.  And an individual kind of object doesn't have
many more properties than what we already handle.

So I think this is just an excuse for turning --no-security-labels
into --no-object-property=security-label.  To me, that's just plain
worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CLUSTER command progress monitor

2017-09-11 Thread Robert Haas
On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
 wrote:
> Thanks for the comment.
>
> As you know, the CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan
> method based on cost estimation. In the case of SEQ SCAN, these two phases
> do not overlap. However, with INDEX SCAN, they overlap. Therefore I created
> the phase "scan heap and write new heap" for when INDEX SCAN is selected.
>
> I agree that progress reporting for sort is difficult. So it only reports
> the phase ("sorting tuples") in the current design of the progress monitor
> for CLUSTER; it doesn't report a counter for the sort.

Doesn't that make it almost useless?  I would guess that scanning the
heap and writing the new heap would ordinarily account for most of the
runtime, or at least enough that you're going to want something more
than just knowing that's the phase you're in.

>> The patch is getting the value reported as heap_tuples_total from
>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>> see that value anyway if they wish.  The point of the progress
>> counters is to expose things the user couldn't otherwise see.  It's
>> also not necessarily accurate: it's only an estimate in the best case,
>> and may be way off if the relation has recently be extended by a large
>> amount.  I think it's pretty important that we try hard to only report
>> values that are known to be accurate, because users hate (and mock)
>> inaccurate progress reports.
>
> Do you mean to use the number of rows from the calculation below instead
> of OldHeap->rd_rel->reltuples?
>
>  estimated rows = physical table size / average row length

No, I mean don't report it at all.  The caller can do that calculation
if they wish, without any help from the progress reporting machinery.
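The caller-side arithmetic would be something along these lines (a
sketch; it ignores per-tuple and per-page overhead):

    -- estimated rows ~= physical table size / average row length
    SELECT pg_relation_size('some_table') /
           NULLIF((SELECT sum(avg_width) FROM pg_stats
                   WHERE tablename = 'some_table'), 0) AS estimated_rows;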

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 8:07 AM, Ashutosh Bapat
 wrote:
> I see the same thing when I use prepare and execute

Hmm.  Well, that's good, but it doesn't prove there's no bug.  We have
to understand where and why it's getting locked to know whether the
behavior will be correct in all cases.  I haven't had time to look at
Amit's comments in detail yet so I don't know whether I agree with his
analysis or not, but we have to look at what's going on under the hood
to know whether the engine is working -- not just listen to the noise
it makes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 8:00 AM, Alvaro Herrera  wrote:
> How difficult/tedious/troublesome would be to install the missing
> partitions if you set hash partitioning with a default partition and
> only later on notice that some partitions are missing?  I think if the
> answer is that you need to exclusive-lock something for a long time and
> this causes a disruption in production systems, then it's better not to
> allow a default partition at all and just force all the hash partitions
> to be there from the start.
>
> On the other hand, if you can get tuples out of the default partition
> into their intended regular partitions without causing any disruption,
> then it seems okay to allow default partitions in hash partitioning
> setups.

I think there's no real use case for default partitioning, and yeah,
you do need exclusive locks to repartition things (whether hash
partitioning or otherwise).  It would be nice to fix that eventually,
but it's hard, because the executor has to cope with the floor moving
under it, and as of today, it really can't cope with that at all - not
because of partitioning specifically, but because of existing design
decisions that will require a lot of work (and probably arguing) to
revisit.

I think the way to get around the usability issues for hash
partitioning is to eventually add some syntax that does things like
(1) automatically create the table with N properly-configured
partitions, (2) automatically split an existing partition into N
pieces, and (3) automatically rewrite the whole table using a
different partition count.
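Purely as a sketch of that direction (none of this syntax exists or has
been proposed concretely):

    -- (1) create N properly-configured partitions up front
    CREATE TABLE t (a int) PARTITION BY HASH (a) PARTITIONS 8;
    -- (2) split an existing partition into N pieces
    ALTER TABLE t SPLIT PARTITION t_part3 INTO 2;
    -- (3) rewrite the whole table with a different partition count
    ALTER TABLE t REPARTITION INTO 16;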

People seem to find the hash partitioning stuff a little arcane.  I
don't want to discount that confusion with some sort of high-handed "I
know better" attitude, I think the interface that users will actually
see can end up being pretty straightforward.  The complexity that is
there in the syntax is to allow pg_upgrade and pg_dump/restore to work
properly.  But users don't necessarily have to use the same syntax
that pg_dump does, just as you can say CREATE INDEX ON a (b) and let
the system specify the index name, but at dump time the index name is
specified explicitly.

> (I, like many others, was unable to follow the default partition stuff
> as closely as I would have liked.)

Uh, sorry about that.  Would it help if I wrote a blog post on it or
something?  The general idea is simple: any tuples that don't route to
any other partition get routed to the default partition.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Append implementation

2017-09-11 Thread Amit Kapila
On Mon, Sep 11, 2017 at 4:49 PM, Amit Khandekar  wrote:
> On 8 September 2017 at 19:17, Amit Kapila  wrote:
>>
>> In that case, why can't we keep the workers also process in same
>> order, what is the harm in that?
>
> Because of the way the logic of queuing works, the workers finish
> earlier if they start with expensive plans first. For e.g. : with 3
> plans with costs 8, 4, 4 and with 2 workers w1 and w2, they will
> finish in 8 time units (w1 will finish plan 1 in 8, while in parallel
> w2 will finish the remaining 2 plans in 8 units.  Whereas if the plans
> are ordered like : 4, 4, 8, then the workers will finish in 12 time
> units (w1 and w2 will finish each of the 1st two plans in 4 units, and
> then w1 or w2 will take up plan 3 and finish in 8 units, while the
> other worker remains idle).
>

I think the patch stores only non-partial paths in decreasing order;
what if partial paths with higher costs follow those paths?

>
>>
>> Few more comments:
>>
>> 1.
>> + else if (IsA(subpath, MergeAppendPath))
>> + {
>> + MergeAppendPath *mpath = (MergeAppendPath *) subpath;
>> +
>> + /*
>> + * If at all MergeAppend is partial, all its child plans have to be
>> + * partial : we don't currently support a mix of partial and
>> + * non-partial MergeAppend subpaths.
>> + */
>> + if (is_partial)
>> + return list_concat(partial_subpaths, list_copy(mpath->subpaths));
>>
>> In which situation partial MergeAppendPath is generated?  Can you
>> provide one example of such path?
>
> Actually currently we don't support partial paths for MergeAppendPath.
> That code just has that if condition (is_partial) but currently that
> condition won't be true for MergeAppendPath.
>

I think it is then better to have an Assert for MergeAppend; it is
generally not a good idea to add code which we can never hit.
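I.e., something along these lines in place of the dead branch (a sketch;
the surrounding list variable is assumed):

    else if (IsA(subpath, MergeAppendPath))
    {
        MergeAppendPath *mpath = (MergeAppendPath *) subpath;

        /* we never build partial MergeAppend paths today */
        Assert(!is_partial);
        return list_concat(subpaths, list_copy(mpath->subpaths));
    }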

>>
>> 2.
>> add_paths_to_append_rel()
..
>>
>> I think it might be better to add a sentence why we choose a different
>> way to decide a number of workers in the second case
>> (non-parallel-aware append).
>
> Yes, I agree. Will do that after we conclude with your next point below ...
>
>> Do you think non-parallel-aware Append
>> will be better in any case when there is a parallel-aware append?  I
>> mean to say let's try to create non-parallel-aware append only when
>> parallel-aware append is not possible.
>
> By non-parallel-aware append, I am assuming you meant  partial
> non-parallel-aware Append. Yes, if the parallel-aware Append path has
> *all* partial subpaths chosen, then we do omit a partial non-parallel
> Append path, as seen in this code in the patch :
>
> /*
> * Consider non-parallel partial append path. But if the parallel append
> * path is made out of all partial subpaths, don't create another partial
> * path; we will keep only the parallel append path in that case.
> */
> if (partial_subpaths_valid && !pa_all_partial_subpaths)
> {
> ..
> }
>
> But if the parallel-Append path has a mix of partial and non-partial
> subpaths, then we can't really tell which of the two could be cheapest
> until we calculate the cost. It can be that the non-parallel-aware
> partial Append can be cheaper as well.
>

How?  See, if you have four partial subpaths and two non-partial
subpaths, then the parallel-aware append considers all six paths in the
parallel path, whereas the non-parallel-aware append will consider just
four paths, and with a sub-optimal strategy at that.  Can you please
give me an example to make this clear?

>>
>> 4.
>> -  select count(*) from a_star;
>> -select count(*) from a_star;
>> +  select round(avg(aa)), sum(aa) from a_star;
>> +select round(avg(aa)), sum(aa) from a_star;
>>
>> Why you have changed the existing test. It seems count(*) will also
>> give what you are expecting.
>
> Needed to do cover some data testing with Parallel Append execution.
>

Okay.

One more thing, I think you might want to update comment atop
add_paths_to_append_rel to reflect the new type of parallel-aware
append path being generated. In particular, I am referring to below
part of the comment:

 * Similarly it collects partial paths from
 * non-dummy children to create partial append paths.
 */
static void
add_paths_to_append_rel()


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-09-11 Thread Etsuro Fujita

On 2017/08/17 17:27, Etsuro Fujita wrote:
> On 2017/07/11 6:56, Robert Haas wrote:
>> I have to admit that I'm a little bit fuzzy about why foreign insert
>> routing requires all of these changes.  I think this patch would
>> benefit from being accompanied by several paragraphs of explanation
>> outlining the rationale for each part of the patch.
>
> Will do.


Here is an updated version of the patch.

* Query planning: for each foreign partition, the patch creates a copy of
the original Query/Plan with that partition as target, and invokes
PlanForeignModify with those copies, to allow the FDW to do query planning
for remote INSERT with the existing API.  To build such Queries in a
similar way to what inheritance_planner does, I modified
transformInsertStmt so that the inh flag for the partitioned table's RTE
is set to true, which allows (1) expand_inherited_rtentry to build an
RTE and AppendRelInfo for each partition in the partitioned table and
(2) make_modifytable to build such Queries using adjust_appendrel_attrs
and those AppendRelInfos.


* explain.c: I modified show_modifytable_info so that we can show remote 
queries for foreign partitions in EXPLAIN for INSERT into a partitioned 
table the same way as for inherited UPDATE/DELETE cases.  Here is an 
example:


postgres=# explain verbose insert into pt values (1), (2);
QUERY PLAN
---
 Insert on public.pt  (cost=0.00..0.03 rows=2 width=4)
   Foreign Insert on public.fp1
 Remote SQL: INSERT INTO public.t1(a) VALUES ($1)
   Foreign Insert on public.fp2
 Remote SQL: INSERT INTO public.t2(a) VALUES ($1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
 Output: "*VALUES*".column1
(7 rows)

I think I should add more explanation about the patch, but I don't have 
time today, so I'll write additional explanation in the next email. 
Sorry about that.


Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***
*** 315,321  SELECT tableoid::regclass, * FROM p2;
  (0 rows)
  
  COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter 
','); -- ERROR
! ERROR:  cannot route inserted tuples to a foreign table
  CONTEXT:  COPY pt, line 2: "1,qux"
  COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
  SELECT tableoid::regclass, * FROM pt;
--- 315,321 
  (0 rows)
  
  COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter 
','); -- ERROR
! ERROR:  cannot route copied tuples to a foreign table
  CONTEXT:  COPY pt, line 2: "1,qux"
  COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
  SELECT tableoid::regclass, * FROM pt;
***
*** 342,348  SELECT tableoid::regclass, * FROM p2;
  (2 rows)
  
  INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
! ERROR:  cannot route inserted tuples to a foreign table
  INSERT INTO pt VALUES (2, 'xyzzy');
  SELECT tableoid::regclass, * FROM pt;
   tableoid | a |   b   
--- 342,348 
  (2 rows)
  
  INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
! ERROR:  cannot route inserted tuples to foreign table "p1"
  INSERT INTO pt VALUES (2, 'xyzzy');
  SELECT tableoid::regclass, * FROM pt;
   tableoid | a |   b   
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 7029,7034  NOTICE:  drop cascades to foreign table bar2
--- 7029,7172 
  drop table loct1;
  drop table loct2;
  -- ===
+ -- test tuple routing to foreign-table partitions
+ -- ===
+ create table pt (a int, b int) partition by list (a);
+ create table locp partition of pt for values in (1);
+ create table locfoo (a int check (a in (2)), b int);
+ create foreign table remp partition of pt for values in (2) server loopback 
options (table_name 'locfoo');
+ explain (verbose, costs off)
+ insert into pt values (1, 1), (2, 1);
+QUERY PLAN
+ -
+  Insert on public.pt
+Insert on public.locp
+Foreign Insert on public.remp
+  Remote SQL: INSERT INTO public.locfoo(a, b) VALUES ($1, $2)
+->  Values Scan on "*VALUES*"
+  Output: "*VALUES*".column1, "*VALUES*".column2
+ (6 rows)
+ 
+ insert into pt values (1, 1), (2, 1);
+ select tableoid::regclass, * FROM pt;
+  tableoid | a | b 
+ --+---+---
+  locp | 1 | 1
+  remp | 2 | 1
+ (2 rows)
+ 
+ select tableoid::regclass, * FROM locp;
+  tableoid | a | b 
+ --+---+---
+  locp | 1 | 1
+ (1 row)
+ 
+ select tableoid::regclass, * FROM remp;
+  tableoid | a | b 
+ --+---+---
+  remp | 2 | 1
+ (1 row)
+ 
+ explain (verbose, costs off)
+ insert into pt values (1, 

Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-11 Thread Aleksander Alekseev
Hi Tomas,

> > Unless there are any objections to giving this idea a try, I'm willing to
> > write and host a corresponding script.
> > 
> That won't work until (2) is reliable enough. There are patches (for
> example my "multivariate MCV lists and histograms") which fail to apply
> only because the tool picks the wrong patch. Possibly because it does
> not recognize compressed patches, or something.

Agreed. However, we could simply add an "Enable autotests" checkbox to the
commitfest application. In fact, we could even leave the commitfest
application as is and provide corresponding checkboxes in the web
interface of the "autoreviewer" application. Naturally, every
automatically generated code review would include a link that disables
autotests for that particular commitfest entry.

I hope this observation will change your mind :)

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Remove 1MB size limit in tsvector

2017-09-11 Thread Tomas Vondra
On 09/11/2017 01:54 PM, Robert Haas wrote:
> On Mon, Sep 11, 2017 at 5:33 AM, Ildus Kurbangaliev
>  wrote:
>> Moreover, RUM index
>> stores positions + lexemes, so it doesn't need tsvectors for ranked
>> search. As a result, tsvector becomes a storage for
>> building indexes (indexable type), not something that should be used at
>> runtime. And the change of the format doesn't affect index creation
>> time.
> 
> RUM indexes, though, are not in core.
> 

Yeah, but I think Ildus has a point that this should not really matter
for indexed tsvectors. So the question is how realistic that benchmark
actually is. How likely are we to do queries on fts directly, not
through a GIN/GiST index? Particularly in performance-sensitive cases?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-11 Thread Tomas Vondra

On 09/11/2017 11:41 AM, Aleksander Alekseev wrote:
> Hi Thomas,
> 
> Great job!
> 

+1

> Here is a crazy idea. What if we write a script that would automatically
> return the patches that:
> 
> 1) Are not in "Waiting on Author" status
> 2) Don't apply OR don't pass `make installcheck-world`
> 
> ... to the "Waiting on Author" status and describe the problem through
> the "Add review" form on commitfest.postgresql.org? This would save
> reviewers a lot of time. Naturally, nobody wants to spam pgsql-hackers@
> with automatic messages too often. I believe that sending such messages
> once a day would be enough.
> 
> Unless there are any objections to giving this idea a try, I'm willing to
> write and host a corresponding script.
> 

That won't work until (2) is reliable enough. There are patches (for
example my "multivariate MCV lists and histograms") which fails to apply
only because the tool picks the wrong patch. Possibly because it does
not recognize compressed patches, or something.

In such cases switching it to "Waiting on Author" automatically would be
damaging, as (a) there's nothing wrong with the patch, and (b) it's not
clear what to do to fix the problem.

So -1 to this for now, until we make this part smart enough.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-11 Thread Tom Lane
Peter Eisentraut  writes:
> On 9/8/17 13:14, Simon Riggs wrote:
>> 2. Allow a SET to apply only for a single statement
>> SET guc1 = x, guc2 = y FOR stmt
>> e.g. SET max_parallel_workers = 4 FOR SELECT count(*) FROM bigtable
>> Internally a GUC setting already exists for a single use, via
>> GUC_ACTION_SAVE, so we just need to invoke it.

> This doesn't read well to me.  It indicates to me "make this setting for
> this query [in case I run it later]", but it does not indicate that the
> query will be run.

Robert's original LET ... IN ... syntax proposal might be better from that
standpoint.

regards, tom lane




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-11 Thread Peter Eisentraut
On 9/8/17 13:14, Simon Riggs wrote:
> 2. Allow a SET to apply only for a single statement
> SET guc1 = x, guc2 = y FOR stmt
> e.g. SET max_parallel_workers = 4 FOR SELECT count(*) FROM bigtable
> Internally a GUC setting already exists for a single use, via
> GUC_ACTION_SAVE, so we just need to invoke it.

This doesn't read well to me.  It indicates to me "make this setting for
this query [in case I run it later]", but it does not indicate that the
query will be run.
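
For comparison, the nearest thing available today is to wrap the statement
in a transaction and use SET LOCAL, which reverts the setting at commit or
abort (a minimal sketch, reusing the GUC and table from Simon's example):

BEGIN;
SET LOCAL max_parallel_workers = 4;
SELECT count(*) FROM bigtable;
COMMIT;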

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: proposal - using names as primary names of plpgsql function parameters instead $ based names

2017-09-11 Thread Pavel Stehule
2017-09-11 9:46 GMT+02:00 Jeevan Chalke :

> Hi Pavel,
>
>
> On Sat, Sep 9, 2017 at 11:42 AM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> 2017-09-08 9:36 GMT+02:00 Jeevan Chalke :
>>
>>> Hi Pavel,
>>> I like the idea of using parameter name instead of $n symbols.
>>>
>>> However, I am slightly worried that if, at execution time, we want to
>>> know the parameter position in the actual function signature, it
>>> will become difficult to get that from the corresponding datum
>>> variable. I don't have any use-case for that though. But apart from
>>> this concern, the idea looks good to me.
>>>
>>
>> Understood - but that was the reason why I implemented this function -
>> when I have to search for a parameter name via its offset, I cannot use
>> string searching. When you know the parameter name, you can use string
>> search in a text editor or pager.
>>
>> It is better supported now than with the current behavior.
>>
>
> Makes sense.
>
>
>>
>>
>>>
>>> BTW, instead of doing all these changes, I have done these changes this
>>> way:
>>>
>>> -   /* Build variable and add to datum list */
>>> -   argvariable = plpgsql_build_variable(buf, 0,
>>> -argdtype, false);
>>> +   /*
>>> +* Build variable and add to datum list.  If there's a
>>> name for
>>> +* the argument, then use that else use $n name.
>>> +*/
>>> +   argvariable = plpgsql_build_variable((argnames &&
>>> argnames[i][0] != '\0') ?
>>> +argnames[i] : buf,
>>> +0, argdtype,
>>> false);
>>>
>>> This requires no new variable and thus no more changes elsewhere.
>>>
>>> Attached patch with these changes. Please have a look.
>>>
>>
>> Looks great - I only added a NULL check
>>
>
> Looks good.
> I have not made those changes in my earlier patch as I did not want to
> update other code which is not touched by this patch.
>
> Anyway, your changes related to the NULL check seem reasonable.
> However, in the attached patch I have fixed the indentation.
>
> Passing it on to the committer.
>

Thank you very much

Regards

Pavel
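
To make the benefit concrete (my illustration of the behavior as I
understand the patch; the function is made up): with parameter names used
as the primary datum names, diagnostics refer to "b" rather than "$2", so
the offending parameter can be found in the function source with a plain
text search.

CREATE FUNCTION div_by(a int, b int) RETURNS int AS $$
BEGIN
  RETURN a / b;  -- with the patch, messages here would mention "b", not "$2"
END
$$ LANGUAGE plpgsql;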


>
> Thanks
> --
> Jeevan Chalke
> Principal Software Engineer, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
>


Re: [HACKERS] Constifying numeric.c's local vars

2017-09-11 Thread Tom Lane
Andres Freund  writes:
> One large user of unnecessary non-constant static variables is
> numeric.c.  More out of curiosity - numeric is slow enough in itself to
> make inlining not a huge win - I converted it to use consts.

LGTM.

> It's a bit ugly that some consts have to be casted away in the constant
> definitions, but aside from just inlining the values, I don't quite see
> a better solution?

No, I don't either.  I'm not sure that writing the constant inline would
produce the desired results - the compiler might well decide that it had
to be in read-write storage.

regards, tom lane




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-11 Thread Ashutosh Bapat
On Mon, Sep 11, 2017 at 5:19 PM, Robert Haas  wrote:
> On Mon, Sep 11, 2017 at 6:45 AM, Ashutosh Bapat
>  wrote:
>> So, all partitioned partitions are getting locked correctly. Am I
>> missing something?
>
> That's not a valid test.  In that scenario, you're going to hold all
> the locks acquired by the planner, all the locks acquired by the
> rewriter, and all the locks acquired by the executor, but when using
> prepared queries, it's possible to execute the plan after the planner
> and rewriter locks are no longer held.
>

I see the same thing when I use prepare and execute

Session 1
postgres=# prepare stmt as select 1 from t1 union all select 2 from t1;
PREPARE
postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
  50912
(1 row)

postgres=# begin;
BEGIN
postgres=# execute stmt;
 ?column?
--
(0 rows)

Session 2
postgres=# select locktype, relation::regclass, virtualxid,
virtualtransaction, pid, mode, granted, fastpath from pg_locks;
  locktype  | relation | virtualxid | virtualtransaction |  pid  |
 mode   | granted | fastpath
+--+++---+-+-+--
 relation   | pg_locks || 4/4| 50914 |
AccessShareLock | t   | t
 virtualxid |  | 4/4| 4/4| 50914 |
ExclusiveLock   | t   | t
 relation   | t1p1p1   || 3/12   | 50912 |
AccessShareLock | t   | t
 relation   | t1p1 || 3/12   | 50912 |
AccessShareLock | t   | t
 relation   | t1   || 3/12   | 50912 |
AccessShareLock | t   | t
 virtualxid |  | 3/12   | 3/12   | 50912 |
ExclusiveLock   | t   | t
(6 rows)

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Sep 11, 2017 at 4:17 AM, Ashutosh Bapat
>  wrote:
> >> Rebased 0002 against this commit & renamed to 0001, PFA.
> >
> > Given that we have default partition support now, I am wondering
> > whether hash partitioned tables also should have default partitions.
> > The way we have structured hash partitioning syntax, there can be
> > "holes" in partitions. Default partition would help plug those holes.
> 
> Yeah, I was thinking about that, too.  On the one hand, it seems like
> it's solving the problem the wrong way: if you've set up hash
> partitioning properly, you shouldn't have any holes.  On the other
> hand, supporting it probably wouldn't cost anything noticeable and
> might make things seem more consistent.  I'm not sure which way to
> jump on this one.

How difficult/tedious/troublesome would it be to install the missing
partitions if you set hash partitioning with a default partition and
only later on notice that some partitions are missing?  I think if the
answer is that you need to exclusive-lock something for a long time and
this causes a disruption in production systems, then it's better not to
allow a default partition at all and just force all the hash partitions
to be there from the start.

On the other hand, if you can get tuples out of the default partition
into their intended regular partitions without causing any disruption,
then it seems okay to allow default partitions in hash partitioning
setups.
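
To make the question concrete, the dance would presumably look something
like this sketch (hash-partition syntax per the proposed patch, the DEFAULT
clause for hash partitioning is hypothetical, and all names are
illustrative):

BEGIN;
ALTER TABLE t DETACH PARTITION t_default;    -- ACCESS EXCLUSIVE lock on t
CREATE TABLE t_p3 PARTITION OF t
  FOR VALUES WITH (MODULUS 4, REMAINDER 3);  -- plug the hole
INSERT INTO t SELECT * FROM t_default;       -- rows now route to t_p3
TRUNCATE t_default;
ALTER TABLE t ATTACH PARTITION t_default DEFAULT;
COMMIT;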

(I, like many others, was unable to follow the default partition stuff
as closely as I would have liked.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove 1MB size limit in tsvector

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 5:33 AM, Ildus Kurbangaliev
 wrote:
> Moreover, RUM index
> stores positions + lexemes, so it doesn't need tsvectors for ranked
> search. As a result, tsvector becomes a storage for
> building indexes (indexable type), not something that should be used at
> runtime. And the change of the format doesn't affect index creation
> time.

RUM indexes, though, are not in core.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 6:45 AM, Ashutosh Bapat
 wrote:
> So, all partitioned partitions are getting locked correctly. Am I
> missing something?

That's not a valid test.  In that scenario, you're going to hold all
the locks acquired by the planner, all the locks acquired by the
rewriter, and all the locks acquired by the executor, but when using
prepared queries, it's possible to execute the plan after the planner
and rewriter locks are no longer held.
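
A sketch of a test that does exercise that case (names illustrative; the
point is that the plan must be cached and then executed in a later
transaction):

-- Session 1
PREPARE stmt AS SELECT 1 FROM p;
EXECUTE stmt;    -- planner and rewriter locks are taken here
BEGIN;
EXECUTE stmt;    -- cached plan: only execution-time locks are taken now

-- Session 2, while session 1's transaction is still open
SELECT relation::regclass, mode, granted
FROM pg_locks WHERE pid = 50912;  -- substitute session 1's pid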

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 4:17 AM, Ashutosh Bapat
 wrote:
>> Rebased 0002 against this commit & renamed to 0001, PFA.
>
> Given that we have default partition support now, I am wondering
> whether hash partitioned tables also should have default partitions.
> The way we have structured hash partitioning syntax, there can be
> "holes" in partitions. Default partition would help plug those holes.

Yeah, I was thinking about that, too.  On the one hand, it seems like
it's solving the problem the wrong way: if you've set up hash
partitioning properly, you shouldn't have any holes.  On the other
hand, supporting it probably wouldn't cost anything noticeable and
might make things seem more consistent.  I'm not sure which way to
jump on this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] mysql_fdw + PG10: unrecognized node type: 217

2017-09-11 Thread Christoph Berg
Re: To Andres Freund 2017-09-11 <20170911095338.mqkiinkpk7gko...@msg.df7cb.de>
> Re: Andres Freund 2017-09-11 
> <20170911090306.s7sj4uyr4t72w...@alap3.anarazel.de>
> > Could you pprint() the expression that's being initialized?
> (gdb) p pprint(node)

Andres helped me produce a correct dump; my error was that the
breakpoint should have been one line earlier because of elog()
internals.

The outcome is that Andres diagnosed it as a bug in mysql_fdw;
ExecInitExpr() should never get toplevel lists anymore.

Bug filed: https://github.com/EnterpriseDB/mysql_fdw/issues/147

Thanks,
Christoph




Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-11 Thread Ashutosh Bapat
On Mon, Sep 11, 2017 at 3:11 PM, Aleksander Alekseev
 wrote:
> Hi Thomas,
>
> Great job!
>
> Here is a crazy idea. What if we write a script that would automatically
> return the patches that:
>
> 1) Are not in "Waiting on Author" status
> 2) Don't apply OR don't pass `make installcheck-world`
>
> ... to the "Waiting on Author" status and describe the problem through
> the "Add review" form on commitfest.postgresql.org? This would save
> reviewers a lot of time. Naturally, nobody wants to spam pgsql-hackers@
> with automatic messages too often. I believe that sending such messages
> once a day would be enough.
>
> Unless there are any objections to giving this idea a try, I'm willing to
> write and host a corresponding script.
>
+1 that would help.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Parallel Append implementation

2017-09-11 Thread Amit Khandekar
On 8 September 2017 at 19:17, Amit Kapila  wrote:
> On Fri, Sep 8, 2017 at 3:59 PM, Amit Khandekar  wrote:
>> On 7 September 2017 at 11:05, Amit Kapila  wrote:
>>> On Thu, Aug 31, 2017 at 12:47 PM, Amit Khandekar  
>>> wrote:
>>> 3.
>>> +/* 
>>> + * exec_append_leader_next
>>> + *
>>> + * To be used only if it's a parallel leader. The backend should scan
>>> + * backwards from the last plan. This is to prevent it from taking up
>>> + * the most expensive non-partial plan, i.e. the first subplan.
>>> + * 
>>> + */
>>> +static bool
>>> +exec_append_leader_next(AppendState *state)
>>>
>>> From the above explanation, it is clear that you don't want the backend
>>> to pick an expensive plan for the leader, but the reason for this
>>> different treatment is not clear.
>>
>> I explained it, saying that with more workers, the leader spends more
>> time processing the worker tuples, and less time contributing to
>> parallel processing. So it should not take up expensive plans; otherwise
>> it will affect the total time to finish the Append plan.
>>
>
> In that case, why can't we keep the workers also process in same
> order, what is the harm in that?

Because of the way the queuing logic works, the workers finish earlier
if they start with the expensive plans first. For example: with 3 plans
with costs 8, 4, 4 and 2 workers w1 and w2, they will finish in 8 time
units (w1 finishes plan 1 in 8 units, while in parallel w2 finishes the
remaining 2 plans in 8 units). Whereas if the plans are ordered 4, 4, 8,
the workers will finish in 12 time units (w1 and w2 each finish one of
the first two plans in 4 units, and then one of them takes up plan 3 and
finishes after 8 more units, while the other worker remains idle).
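
Stated compactly (my restatement of the arithmetic above, not from the
thread), with two workers and costs handed out greedily:

    T_{desc} = \max(8,\; 4+4) = 8  \qquad  T_{asc} = \max(4+8,\; 4) = 12

i.e. starting with the most expensive plan is essentially the classic
longest-processing-time-first scheduling heuristic.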

> Also, the leader will always scan
> the subplans from the last subplan even if all the subplans are partial
> plans.

Since we already need to have two different code paths, I think we can
use the same code paths for any subplans.

> I think this will be an unnecessary difference in the
> strategy of leader and worker especially when all paths are partial.
> I think the selection of next subplan might become simpler if we use
> the same strategy for worker and leader.

Yeah, if we had a common method for both it would have been better. But
anyway, we have different logic to maintain.

>
> Few more comments:
>
> 1.
> + else if (IsA(subpath, MergeAppendPath))
> + {
> + MergeAppendPath *mpath = (MergeAppendPath *) subpath;
> +
> + /*
> + * If at all MergeAppend is partial, all its child plans have to be
> + * partial : we don't currently support a mix of partial and
> + * non-partial MergeAppend subpaths.
> + */
> + if (is_partial)
> + return list_concat(partial_subpaths, list_copy(mpath->subpaths));
>
> In which situation partial MergeAppendPath is generated?  Can you
> provide an example of such a path?

Actually, we don't currently support partial paths for MergeAppendPath.
That code just has the if condition (is_partial), but currently that
condition won't be true for MergeAppendPath.

>
> 2.
> add_paths_to_append_rel()
> {
> ..
> + /* Consider parallel append path. */
> + if (pa_subpaths_valid)
> + {
> + AppendPath *appendpath;
> + int parallel_workers;
> +
> + parallel_workers = get_append_num_workers(pa_partial_subpaths,
> +  pa_nonpartial_subpaths);
> + appendpath = create_append_path(rel, pa_nonpartial_subpaths,
> + pa_partial_subpaths,
> + NULL, parallel_workers, true,
> + partitioned_rels);
> + add_partial_path(rel, (Path *) appendpath);
> + }
> +
>   /*
> - * Consider an append of partial unordered, unparameterized partial paths.
> + * Consider non-parallel partial append path. But if the parallel append
> + * path is made out of all partial subpaths, don't create another partial
> + * path; we will keep only the parallel append path in that case.
>   */
> - if (partial_subpaths_valid)
> + if (partial_subpaths_valid && !pa_all_partial_subpaths)
>   {
>   AppendPath *appendpath;
>   ListCell   *lc;
>   int parallel_workers = 0;
>
>   /*
> - * Decide on the number of workers to request for this append path.
> - * For now, we just use the maximum value from among the members.  It
> - * might be useful to use a higher number if the Append node were
> - * smart enough to spread out the workers, but it currently isn't.
> + * To decide the number of workers, just use the maximum value from
> + * among the children.
>   */
>   foreach(lc, partial_subpaths)
>   {
> @@ -1421,9 +1502,9 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
>   }
>   Assert(parallel_workers > 0);
>
> - /* Generate a partial append path. */
> - appendpath = create_append_path(rel, partial_subpaths, NULL,
> - parallel_workers, partitioned_rels);
> + appendpath = create_append_path(rel, NIL, partial_subpaths,
> + 

Re: [HACKERS] [POC] Faster processing at Gather node

2017-09-11 Thread Alexander Kuzmenkov
Thanks Rafia, Amit, now I understand the ideas behind the patch better. 
I'll see if I can look at the new one.


--

Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-11 Thread Ashutosh Bapat
On Mon, Sep 11, 2017 at 2:16 PM, Amit Langote
 wrote:
> On 2017/09/11 16:23, Ashutosh Bapat wrote:
>> On Sat, Sep 9, 2017 at 6:28 AM, Robert Haas  wrote:
>>> I'm a bit suspicious about the fact that there are now executor
>>> changes related to the PlanRowMarks.  If the rowmark's prti is now the
>>> intermediate parent's RT index rather than the top-parent's RT index,
>>> it'd seem like that'd matter somehow.  Maybe it doesn't, because the
>>> code that cares about prti seems to only care about whether it's
>>> different from rti.  But if that's true everywhere, then why even
>>> change this?  I think we might be well off not to tinker with things
>>> that don't need to be changed.
>>
>> In the definition of ExecRowMark, I see
>> Index   prti;   /* parent range table index, if child */
>> It just says parent, which I take to mean the direct parent. For
>> inheritance, which earlier flattened the inheritance hierarchy, the
>> direct parent was the top parent. So, probably nobody thought about
>> whether a parent is the direct parent or the top parent. But now that
>> we have introduced that concept we need to interpret this comment anew.
>> And I think interpreting it as the direct parent is non-lossy.
>
> But then we also don't have anything to say about why we're making that
> change.  If you could describe what non-lossy is in this context, then
> fine.

By setting prti to the topmost parent's RTI we lose the information
that this child may be an intermediate child, similar to what we did
earlier with AppendRelInfo. That's the lossiness in this context.

> But that we'd like to match with what we're going to do for
> AppendRelInfos does not seem to be a sufficient explanation for this change.

The purpose of this patch is to change the parent-child linkages for
partitioned tables, and prti is one of them. So, in fact, I am wondering
why not change that along with AppendRelInfo.

>
>> If we set top parent's
>> index, parent RTI in AppendRelInfo and PlanRowMark would not agree.
>> So, it looks quite natural that we set the direct parent's index in
>> PlanRowMark.
>
> They would not agree, yes, but aren't they unrelated?  If we have a reason
> for them to agree (for example, row-locking breaks in the inherited table
> case if we didn't), then we should definitely make them agree.
>
> Updating the comment for prti definition might be something that this
> patch could (should?) do, but I'm not quite sure about that either.
>

To me that looks backwards again for the reasons described above.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-11 Thread Ashutosh Bapat
On Mon, Sep 11, 2017 at 12:16 PM, Amit Langote
 wrote:
> On 2017/09/09 2:38, Ashutosh Bapat wrote:
>> On Fri, Sep 8, 2017 at 11:17 AM, Amit Langote wrote:
>>> I updated the patch to include just those changes.  I'm not sure about
>>> one of the Ashutosh's changes whereby the child PlanRowMark is also passed
>>> to expand_partitioned_rtentry() to use as the parent PlanRowMark.  I think
>>> the child RTE, child RT index and child Relation are fine, because they
>>> are necessary for creating AppendRelInfos in a desired way for later
>>> planning steps.  But PlanRowMarks are not processed within the planner
>>> afterwards and do not need to be marked with the immediate parent-child
>>> association in the same way that AppendRelInfos need to be.
>>
>> Passing top parent's row mark works today, since there is no
>> parent-child specific translation happening there. But if, in the future,
>> we introduce such a translation, row marks for indirect children in a
>> multi-level partitioned hierarchy won't be accurate. So, I think it's
>> better to pass row marks of the direct parent.
>
> IMHO, we should make it the responsibility of the future patch to set a
> child PlanRowMark's prti to the direct parent's RT index, when we actually
> know that it's needed for something.  We clearly know today why we need to
> pass the other objects like child RT entry, RT index, and Relation, so we
> should limit this patch to pass only those objects to the recursive call.
> That makes this patch a relatively easy to understand change.

I think you are mixing two issues here: 1. setting the parent RTI in the
child PlanRowMark, and 2. passing the immediate parent's PlanRowMark to
expand_single_inheritance_child().

I have discussed 1 in my reply to Robert.

About 2, you haven't given any particular comments on my reply. To me
it looks like it's this patch that introduces the notion of
multi-level expansion, so it's natural for this patch to pass
PlanRowMark in a cascaded fashion, similar to the other structures.

>
>>> I also included the changes to add_paths_to_append_rel() from my patch on
>>> the "path toward faster partition pruning" thread.  We'd need that change,
>>> because while add_paths_to_append_rel() is called for all partitioned
>>> table RTEs in a given partition tree, expand_inherited_rtentry() would
>>> have set up a PartitionedChildRelInfo only for the root parent, so
>>> get_partitioned_child_rels() would not find the same for non-root
>>> partitioned table rels and crash failing the Assert.  The changes I made
>>> are such that we call get_partitioned_child_rels() only for the parent
>>> rels that are known to correspond root partitioned tables (or as you
>>> pointed out on the thread, "the table named in the query" as opposed those
>>> added to the query as result of inheritance expansion).  In addition to
>>> the relkind check on the input RTE, it would seem that checking that the
>>> reloptkind is RELOPT_BASEREL would be enough.  But actually not, because
>>> if a partitioned table is accessed in a UNION ALL query, reloptkind even
>>> for the root partitioned table (the table named in the query) would be
>>> RELOPT_OTHER_MEMBER_REL.  The only way to confirm that the input rel is
>>> actually the root partitioned table is to check whether its parent rel is
>>> not RTE_RELATION, because the parent in case of UNION ALL Append is a
>>> RTE_SUBQUERY RT entry.
>>>
>>
>> There was a change in my 0003 patch, which fixed the crash. It checked
>> for RELOPT_BASEREL and RELKIND_PARTITIONED_TABLE. I have pulled it into
>> my 0001 patch. It no longer crashes. I tried various queries involving
>> set operations and bare multi-level partitioned table scans with my
>> patch, but none of them showed any anomaly. Do you have a testcase
>> which shows a problem with my patch? Maybe your suggestion is correct,
>> but the corresponding code implementation is slightly longer than I
>> would expect. So, we should go with it if there is a corresponding
>> testcase which shows why it's needed.
>
> If we go with your patch, partitioned tables won't get locked, for
> example, in case of the following query (p is a partitioned table):
>
> select 1 from p union all select 2 from p;
>
> That's because the RelOptInfos for the two instances of p in the above
> query are RELOPT_OTHER_MEMBER_REL, not RELOPT_BASEREL.  They are children
> of the Append corresponding to the UNION ALL subquery RTE.  So,
> partitioned_rels does not get set per your proposed code.

Session 1:
postgres=# begin;
BEGIN
postgres=# select 1 from t1 union all select 2 from t1;
 ?column?
--
(0 rows)

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
  28843
(1 row)


Session 2
postgres=# select locktype, relation::regclass, virtualxid,
virtualtransaction, pid, mode, granted, fastpath from pg_locks;
  locktype  | relation | virtualxid | virtualtransaction |  pid  |
 mode   | granted | fastpath

Re: [HACKERS] Polyphase merge is obsolete

2017-09-11 Thread Tomas Vondra
Hi,

I planned to do some benchmarking on this patch, but apparently the
patch no longer applies. Rebase please?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Allow GiST opclasses without compress\decompress functions

2017-09-11 Thread Andrey Borodin

> On 11 Sep 2017, at 12:57, Dmitriy Sarafannikov wrote:
>
> Hi Andrew! Thanks for the patch, but patch
> 0001-allow-uncompressed-Gist-2.patch no longer applies on the current
> master branch.
> Could you please rebase it?
Sure, see attachment. Thanks for looking into the patch!

Best regards, Andrey Borodin. 


0001-Allow-uncompressed-GiST-3.patch
Description: Binary data



Re: [HACKERS] expanding inheritance in partition bound order

2017-09-11 Thread Amit Langote
Hi Amit,

On 2017/09/11 16:16, Amit Khandekar wrote:
> Thanks Amit for the patch. I am still reviewing it, but meanwhile
> below are a few comments so far ...

Thanks for the review.

> +   next_parted_idx += (list_length(*pds) - next_parted_idx - 1);
> 
> I think this can be replaced just by:
> +   next_parted_idx = list_length(*pds) - 1;
> Or, how about removing the variable next_parted_idx altogether?
> Instead, we can just do this:
> pd->indexes[i] = -(1 + list_length(*pds));

That seems like the simplest possible way to do it.

> + next_leaf_idx += (list_length(*leaf_part_oids) - next_leaf_idx);
> 
> I didn't understand why next_leaf_idx needs to be updated in the case
> when the current partrelid is partitioned. I think it should be
> incremented only for leaf partitions, no? Or, for that matter, how about
> removing the variable 'next_leaf_idx' and doing this:
> *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
> pd->indexes[i] = list_length(*leaf_part_oids) - 1;

Yep.

The attached updated patch does it that way for both partitioned table indexes
and leaf partition indexes.  Thanks for pointing it out.


> ---
> 
> * For every partitioned table in the tree, starting with the root
> * partitioned table, add its relcache entry to parted_rels, while also
> * queuing its partitions (in the order in which they appear in the
> * partition descriptor) to be looked at later in the same loop.  This is
> * a bit tricky but works because the foreach() macro doesn't fetch the
> * next list element until the bottom of the loop.
> 
> I think the above comment needs to be modified with something
> explaining the relevant changed code. For example, there is no
> parted_rels anymore, and the "tricky" part existed earlier because the
> list was being iterated and appended to at the same time.
> 
> 

I think I forgot to update this comment.

> I couldn't see the existing comments like "Indexes corresponding to
> the internal partitions are multiplied by" anywhere in the patch. I
> think those comments are still valid, and important.

Again, I failed to keep this comment.  Anyway, I reworded the comments a
bit to describe what the code is doing more clearly.  I hope you find them
clearer too.

Thanks,
Amit
From 0e04cee14a5168e0652c2aa646c169789ae41e8e Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 30 Aug 2017 10:02:05 +0900
Subject: [PATCH 1/2] Decouple RelationGetPartitionDispatchInfo() from executor

Currently it and the structure it generates viz. PartitionDispatch
objects are too coupled with the executor's tuple-routing code.  In
particular, it's pretty undesirable that it makes it the responsibility
of the caller to release some resources, such as executor tuple table
slots, tuple-conversion maps, etc.  That makes it harder to use in
places other than where it's currently being used.

After this refactoring, ExecSetupPartitionTupleRouting() now needs to
do some of the work that was previously done in
RelationGetPartitionDispatchInfo().
---
 src/backend/catalog/partition.c| 53 +++-
 src/backend/commands/copy.c| 32 +++--
 src/backend/executor/execMain.c| 88 ++
 src/backend/executor/nodeModifyTable.c | 37 +++---
 src/include/catalog/partition.h| 20 +++-
 src/include/executor/executor.h|  4 +-
 src/include/nodes/execnodes.h  | 40 +++-
 7 files changed, 181 insertions(+), 93 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 73eff17202..555b7c21c7 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1292,7 +1292,6 @@ RelationGetPartitionDispatchInfo(Relation rel,
Relationpartrel = lfirst(lc1);
Relationparent = lfirst(lc2);
PartitionKey partkey = RelationGetPartitionKey(partrel);
-   TupleDesc   tupdesc = RelationGetDescr(partrel);
PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
int j,
m;
@@ -1300,29 +1299,12 @@ RelationGetPartitionDispatchInfo(Relation rel,
pd[i] = (PartitionDispatch) 
palloc(sizeof(PartitionDispatchData));
pd[i]->reldesc = partrel;
pd[i]->key = partkey;
-   pd[i]->keystate = NIL;
pd[i]->partdesc = partdesc;
if (parent != NULL)
-   {
-   /*
-* For every partitioned table other than root, we must 
store a
-* tuple table slot initialized with its tuple 
descriptor and a
-* tuple conversion map to convert a tuple from its 
parent's
-* rowtype to its own. That is to make sure that we are 
looking at
-* the correct row using the 

Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-11 Thread Amit Kapila
On Mon, Sep 11, 2017 at 12:43 PM, Michael Paquier
 wrote:
> On Mon, Sep 11, 2017 at 4:07 PM, Amit Kapila  wrote:
>> I have prepared separate patches for hash and btree index.  I think
>> for another type of indexes, it is better to first fix the pd_lower
>> issue.
>
> Just wondering (sorry I have not looked at your patch in details)...
> Have you tested the compressibility effects of this patch on FPWs with
> and without wal_compression?
>

I have debugged it and confirmed that it executes the code path that
eliminates the hole for the hash index.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] mysql_fdw + PG10: unrecognized node type: 217

2017-09-11 Thread Christoph Berg
Re: Andres Freund 2017-09-11 <20170911090306.s7sj4uyr4t72w...@alap3.anarazel.de>
> Could you pprint() the expression that's being initialized?

(gdb) f 4
#4  0x5604ecedd124 in ExecInitNode (node=node@entry=0x5604ee884f80, 
estate=estate@entry=0x5604ee8c78a0, 
eflags=eflags@entry=16) at 
./build/../src/backend/executor/execProcnode.c:164
164 ./build/../src/backend/executor/execProcnode.c: No such file or
directory.
(gdb) p pprint(node)
$1 = void


2017-09-11 11:27:53.268 CEST [31066] postgres@postgres STATEMENT:  SELECT 
test_param_where();
   {RESULT 
   :startup_cost 0.00 
   :total_cost 0.26 
   :plan_rows 1 
   :plan_width 4 
   :parallel_aware false 
   :parallel_safe false 
   :plan_node_id 0 
   :targetlist (
  {TARGETENTRY 
  :expr 
 {FUNCEXPR 
 :funcid 16402 
 :funcresulttype 2278 
 :funcretset false 
 :funcvariadic false 
 :funcformat 0 
 :funccollid 0 
 :inputcollid 0 
 :args <> 
 :location 7
 }
  :resno 1 
  :resname test_param_where 
  :ressortgroupref 0 
  :resorigtbl 0 
  :resorigcol 0 
  :resjunk false
  }
   )
   :qual <> 
   :lefttree <> 
   :righttree <> 
   :initPlan <> 
   :extParam (b)
   :allParam (b)
   :resconstantqual <>
   }


Christoph




Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-11 Thread Aleksander Alekseev
Hi Thomas,

Great job!

Here is a crazy idea. What if we write a script that would automatically
return the patches that:

1) Are not in "Waiting on Author" status
2) Don't apply OR don't pass `make installcheck-world`

... to the "Waiting on Author" status and describe the problem through
the "Add review" form on commitfest.postgresql.org? This would save
reviewers a lot of time. Naturally, nobody wants to spam pgsql-hackers@
with automatic messages too often. I believe that sending such messages
once a day would be enough.

Unless there are any objections to giving this idea a try, I'm willing to
write and host a corresponding script.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-11 Thread Alvaro Herrera
Tomas Vondra wrote:

> > Finally, as vertical scrolling is mandatory, I would be fine with
> > skipping lines with entries for readability, but it is just a matter of
> > taste and I expect there should be half a dozen different opinions on
> > the matter of formatting.
> 
> FWIW, +1 to extra lines from me - I find it way more readable, as it
> clearly separates the items.

+1

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove 1MB size limit in tsvector

2017-09-11 Thread Ildus Kurbangaliev
On Thu, 7 Sep 2017 23:08:14 +0200
Tomas Vondra  wrote:

> Hi,
> 
> On 08/17/2017 12:23 PM, Ildus Kurbangaliev wrote:
> > In my benchmarks when database fits into buffers (so it's
> > measurement of the time required for the tsvectors conversion) it
> > gives me these results:
> > 
> > Without conversion:
> > 
> > $ ./tsbench2 -database test1 -bench_time 300
> > 2017/08/17 12:04:44 Number of connections:  4
> > 2017/08/17 12:04:44 Database:  test1
> > 2017/08/17 12:09:44 Processed: 51419
> > 
> > With conversion:
> > 
> > $ ./tsbench2 -database test1 -bench_time 300
> > 2017/08/17 12:14:31 Number of connections:  4
> > 2017/08/17 12:14:31 Database:  test1
> > 2017/08/17 12:19:31 Processed: 43607
> > 
> > I ran a bunch of these tests, and these results are stable on my
> > machine. So in these specific tests performance regression about
> > 15%.
> > 
> > Same time I think this could be the worst case, because usually data
> > is on disk and conversion will not affect so much to performance.
> >   
> 
> That seems like a fairly significant regression, TBH. I don't quite
> agree we can simply assume in-memory workloads don't matter, plenty of
> databases have 99% cache hit ratio (particularly when considering not
> just shared buffers, but also page cache).

I think part of this regression is caused by the better compression of the
new format. I can't give an exact percentage here; I need to check with perf.

If you care about performance, you create indexes, which means that the
tsvector will no longer be used for text search (except for ORDER BY
rank). The index machinery will only peek into the tsquery. Moreover, a
RUM index stores positions + lexemes, so it doesn't need tsvectors for
ranked search. As a result, tsvector becomes storage for building indexes
(an indexable type), not something that should be used at runtime. And the
change of format doesn't affect index creation time.
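
To illustrate the indexed case (schema and names are mine, not from the
thread):

CREATE TABLE docs (id serial PRIMARY KEY, body text, tsv tsvector);
UPDATE docs SET tsv = to_tsvector('english', body);
CREATE INDEX docs_tsv_idx ON docs USING gin (tsv);

-- typically satisfied via the GIN index; the stored tsvector is only
-- re-read for ranking, e.g. ORDER BY ts_rank(tsv, query)
SELECT id FROM docs
WHERE tsv @@ to_tsquery('english', 'format & change');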

> 
> Can you share the benchmarks, so that others can retry running them?

Benchmarks are published at github:
https://github.com/ildus/tsbench . I'm not sure that they are easy to
use.

Best regards,
Ildus Kurbangaliev




Re: [HACKERS] Cached plans and statement generalization

2017-09-11 Thread Konstantin Knizhnik



On 09.09.2017 06:35, Thomas Munro wrote:

On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
 wrote:

Attached please find rebased version of the autoprepare patch based on Tom's
proposal (perform analyze for tree with constant literals and then replace
them with parameters).
Also I submitted this patch for the Autum commitfest.

The patch didn't survive the Summer bitrotfest.  Could you please rebase it?


Attached please find the rebased version of the patch.
Here are the updated performance results (pgbench -s 100 -c 1):

 protocol (-M)      | read-write | read-only (-S)
--------------------+------------+----------------
 simple             |       3327 |          19325
 extended           |       2256 |          16908
 prepared           |       6145 |          39326
 simple+autoprepare |       4950 |          34841



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e3eb0c5..17f3dfd 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3688,6 +3688,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to a pointer to the current node and can update this field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, doing in-place updates instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case T_A_Star:
+			/* primitive node types with no subnodes */
+			break;
+		case T_Alias:
+			/* we assume the colnames list isn't interesting */
+			break;
+		case T_RangeVar:
+			return mutator(&((RangeVar *) node)->alias, context);
+		case T_GroupingFunc:
+			return mutator(&((GroupingFunc *) node)->args, context);
+		case T_SubLink:
+			{
+				SubLink    *sublink = (SubLink *) node;
+
+				if (mutator(&sublink->testexpr, context))
+					return true;
+				/* we assume the operName is not interesting */
+				if (mutator(&sublink->subselect, context))
+					return true;
+			}
+			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *caseexpr = (CaseExpr *) node;
+
+				if (mutator(&caseexpr->arg, context))
+					return true;
+				/* we assume mutator doesn't care about CaseWhens, either */
+				foreach(temp, caseexpr->args)
+				{
+					CaseWhen   *when = (CaseWhen *) lfirst(temp);
+
+					Assert(IsA(when, CaseWhen));
+					if (mutator(&when->expr, context))
+						return true;
+					if (mutator(&when->result, context))
+						return true;
+				}
+				if (mutator(&caseexpr->defresult, context))
+					return true;
+			}
+			break;
+		case T_RowExpr:
+			/* Assume colnames isn't interesting */
+			return mutator(&((RowExpr *) node)->args, context);
+		case T_CoalesceExpr:
+			return mutator(&((CoalesceExpr *) node)->args, context);
+		case T_MinMaxExpr:
+			return mutator(&((MinMaxExpr *) node)->args, context);
+		case T_XmlExpr:
+			{
+				XmlExpr    *xexpr = (XmlExpr *) node;
+
+				if (mutator(&xexpr->named_args, context))
+					return true;
+				/* we assume mutator doesn't care about arg_names */
+				if (mutator(&xexpr->args, context))
+					return true;
+			}
+			break;
+		case T_NullTest:
+			return mutator(&((NullTest *) node)->arg, context);
+		case T_BooleanTest:
+			return mutator(&((BooleanTest *) node)->arg, context);
+		case T_JoinExpr:
+			{
+				JoinExpr   *join = (JoinExpr *) node;
+
+				if (mutator(&join->larg, context))
+					return true;
+				if (mutator(&join->rarg, context))
+					return true;
+				if (mutator(&join->quals, context))
+					return true;
+				if (mutator(&join->alias, context))
+					return true;
+				/* using list is deemed uninteresting */
+			}
+			break;
+		case T_IntoClause:
+			{
+				IntoClause *into = (IntoClause *) node;
+
+				if (mutator(&into->rel, context))
+		

Re: [HACKERS] Red-black trees: why would anyone want preorder or postorder traversal?

2017-09-11 Thread Aleksander Alekseev
Hi Tom,

> In short, therefore, I propose we rip out the DirectWalk and InvertedWalk
> options along with their support code, and then drop the portions of
> test_rbtree that are needed to exercise them.  Any objections?

Doesn't sound like something that will be used any time soon. When and
if that happens, nothing prevents us from adding this code back. So
it's +1.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-11 Thread Michael Paquier
On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote
 wrote:
> On 2017/09/10 15:22, Michael Paquier wrote:
>> On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila  wrote:
>>> On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane  wrote:
 Michael Paquier  writes:
> On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane  wrote:
>> In short, this patch needs a significant rewrite, and more analysis than
>> you've done so far on whether there's actually any benefit to be gained.
>> It might not be worth messing with.

> I did some measurements of the compressibility of the GIN meta page,
> looking at its FPWs with and without wal_compression and you are
> right: there is no direct compressibility effect when setting pd_lower
> on the meta page. However, it seems to me that there is an argument
> still pleading on favor of this patch for wal_consistency_checking.

 I think that would be true if we did both my point 1 and 2, so that
 the wal replay functions could trust pd_lower to be sane in all cases.
 But really, if you have to touch all the places that write these
 metapages, you might as well mark them REGBUF_STANDARD while at it.

> The same comment ought to be mentioned for btree.

 Yeah, I was wondering if we ought not clean up btree/hash while at it.
 At the very least, their existing comments saying that it's inessential
 to set pd_lower could use some more detail about why or why not.

>>>
>>> +1.  I think we can even use REGBUF_STANDARD in the hash for metapage
>>> where currently it is not used.  I can give a try to write a patch for
>>> hash/btree part if you want.
>>
>> Coordinating efforts here would be nice. If you, Amit K, are taking
>> care of a patch for btree and hash, would you, Amit L, write the part
>> for GIN, BRIN and SpGist? This needs a careful look, as many code
>> paths are involved, so it may take time. Please note that I don't mind
>> double-checking this part if you don't have enough room to do so.
>
> Sorry, I didn't have time today to carefully go through the recent
> discussion on this thread (starting with Tom's email wherein he said he
> set the status of the patch to Waiting on Author).  I will try tomorrow.

Thanks for the update! Once you get to this point, please let me know
if you would like to work on a more complete patch for brin, gin and
spgist. If you don't have enough room, I am fine with producing something.
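
For the wal_consistency_checking angle mentioned above, one way to exercise
the replay checks while testing (the GUC exists as of v10; the rmgr list is
only an illustration):

SET wal_consistency_checking = 'btree,hash,gin,brin,spgist';
-- then run index builds/inserts on a primary with a standby replaying;
-- replay compares full-page images against the reconstructed pages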
-- 
Michael




Re: [HACKERS] mysql_fdw + PG10: unrecognized node type: 217

2017-09-11 Thread Andres Freund
Hi,

On 2017-09-11 10:53:39 +0200, Christoph Berg wrote:
> Re: To Tom Lane 2017-09-11 <20170911083136.stdnc4w52wk3o...@msg.df7cb.de>
> > postgres=# select test_param_where();
> > ERROR:  XX000: unrecognized node type: 217
> > CONTEXT:  SQL statement "select b from numbers where a=x"
> > PL/pgSQL function test_param_where() line 6 at SQL statement
> > LOCATION:  ExecInitExprRec, execExpr.c:2031
> 
> The problem happens on the 6th iteration of this loop:
> 
> CREATE FOREIGN TABLE numbers(a int, b varchar(255)) SERVER mysql_svr OPTIONS 
> (dbname 'testdb', table_name 'numbers');
> 
> create or replace function test_param_where() returns void as $$
> DECLARE
>   n varchar;
> BEGIN
>   FOR x IN 1..9 LOOP
> select b into n from numbers where a=x;
> raise notice 'Found number %', n;
>   end loop;
>   return;
> END
> $$ LANGUAGE plpgsql;
> 
> SELECT test_param_where();
> 
> ***
> *** 345,368 
>   NOTICE:  Found number Three
>   NOTICE:  Found number Four
>   NOTICE:  Found number Five
> ! NOTICE:  Found number Six
> ! NOTICE:  Found number Seven
> ! NOTICE:  Found number Eight
> ! NOTICE:  Found number Nine
> !  test_param_where 
> ! --
> !  
> ! (1 row)
> ! 
>   DELETE FROM employee;
> ...
> --- 344,365 
>   NOTICE:  Found number Three
>   NOTICE:  Found number Four
>   NOTICE:  Found number Five
> ! ERROR:  unrecognized node type: 217
> ! CONTEXT:  SQL statement "select b from numbers where a=x"
> ! PL/pgSQL function test_param_where() line 6 at SQL statement
> ! /*

Could you pprint() the expression that's being initialized?

Greetings,

Andres Freund



