[HACKERS] locale problem of bgworker: logical replication launcher and worker process

2017-08-21 Thread Ioseph Kim

Hi,
I tried version 10 beta3.
I got the messages below.

-
$ pg_ctl start
서버를 시작하기 위해 기다리는 중완료
서버 시작됨
2017-08-22 14:06:21.248 KST [32765] 로그:  IPv6, 주소: "::1", 포트 5433 
번으로 접속을 허용합니다
2017-08-22 14:06:21.248 KST [32765] 로그:  IPv4, 주소: "127.0.0.1", 포트 
5433 번으로 접속을 허용합니다
2017-08-22 14:06:21.364 KST [32765] 로그:  "/tmp/.s.PGSQL.5433" 유닉스 
도메인 소켓으로 접속을 허용합니다
2017-08-22 14:06:21.570 KST [32766] 로그:  데이터베이스 시스템 마지막 가동 
중지 시각: 2017-08-22 14:04:52 KST
2017-08-22 14:06:21.570 KST [32766] 로그:  recovered replication state of 
node 1 to 0/0
2017-08-22 14:06:21.638 KST [32765] 로그:  이제 데이터베이스 서버로 접속할 
수 있습니다
2017-08-22 14:06:21.697 KST [306] 로그:  logical replication apply worker 
for subscription "replica_a" has started
2017-08-22 14:06:21.698 KST [306] 오류:  발행 서버에 연결 할 수 없음: ??? 
??? ? ??: ??? ???

"localhost" (::1)  ??? ?? ???,
5432 ??? TCP/IP ???  ??.
??? ??? ? ??: ??? ???
"localhost" (127.0.0.1)  ??? ?? ???,
5432 ??? TCP/IP ???  ??.
-

The main postmaster messages are printed correctly in Korean,
but the bgworker process messages are not.

This problem seems to occur because the server's locale
environment and the client's are different.


How can I resolve this?

Regards ioseph.



Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 4:48 PM, Peter Eisentraut
 wrote:
> On 8/21/17 12:33, Peter Geoghegan wrote:
>> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut
>>  wrote:
>>> Here are my patches to address this.
>>
>> These look good.
>
> Committed.  That closes this open item.

Thanks again.

-- 
Peter Geoghegan




Re: [HACKERS] A suspicious code in pgoutput_startup().

2017-08-21 Thread Yugo Nagata
On Tue, 15 Aug 2017 16:23:35 -0400
Peter Eisentraut  wrote:

> On 7/27/17 20:52, Yugo Nagata wrote:
> > 175 /* Check if we support requested protocol */
> > 176 if (data->protocol_version != LOGICALREP_PROTO_VERSION_NUM)
> > 177 ereport(ERROR,
> > 178 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > 179  errmsg("client sent proto_version=%d but we only 
> > support protocol %d or lower",
> > 180 data->protocol_version, 
> > LOGICALREP_PROTO_VERSION_NUM)));
> > 
> > Although the if condition is not-equal, the error message says 
> > "we only support protocol %d or lower".  Is this intentional?
> > Or should this be fixed as below? 
> > 
> > 176 if (data->protocol_version > LOGICALREP_PROTO_VERSION_NUM)
> > 
> > Attached is a simple patch in case we decide to fix it.
> 
> Fixed, thanks.

Thanks, too!
> 
> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Yugo Nagata 




Re: [HACKERS] A little improvement of ApplyLauncherMain loop code

2017-08-21 Thread Yugo Nagata
On Tue, 15 Aug 2017 15:17:06 -0400
Peter Eisentraut  wrote:

> On 8/1/17 02:28, Yugo Nagata wrote:
> > When reading the logical replication code, I found that the following
> > part could be improved a bit. In the foreach, LWLockAcquire and
> > logicalrep_worker_find are called for each loop, but they are needed
> > only when sub->enabled is true.
> 
> Fixed, thanks!

Thanks, too!
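
For the record, the restructuring being discussed looks roughly like this
(just a sketch; the surrounding loop variables and the launch call are my
assumptions based on the description above, not the committed code):

    foreach(lc, sublist)
    {
        Subscription *sub = (Subscription *) lfirst(lc);
        LogicalRepWorker *w;

        if (!sub->enabled)
            continue;    /* skip before LWLockAcquire/logicalrep_worker_find */

        LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
        w = logicalrep_worker_find(sub->oid, InvalidOid, false);
        LWLockRelease(LogicalRepWorkerLock);

        if (w == NULL)
            logicalrep_worker_launch(sub->dbid, sub->oid, sub->name,
                                     sub->owner, InvalidOid);
    }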

> 
> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Yugo Nagata 




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-21 Thread Masahiko Sawada
On Wed, Aug 16, 2017 at 4:55 PM, Fabien COELHO  wrote:
>
> Hello Masahiko-san,
>
>> Yeah, once the custom initialization patch gets committed we can extend it.
>>
>> Attached is an updated patch. I've incorporated all the comments from Fabien
>> and changed it to the single-letter version.
>
>
> Patch applies and works.
>
> A few comments and questions about the code and documentation:

Thank you for the comments!

>
> Why not allow -I as a short option for --custom-initialize?

Other options for a similar purpose, such as --foreign-keys, also have
only a long option. Since --custom-initialize is in the same category as
those options, I didn't add a short option for now. Because option names
are matched by prefix, --cu can be used as a short form for now.

> I do not think that the --custom-initialize option needs to appear as a
> specific synopsis in the documentation, as it is now a sub-option of -i.
>
> checkCustomCmds: I would suggest to simplify test code with strchr
> and to merge the two fprintf into one, something like:
>
>   if (strchr("tdpfv", *cmd) == NULL) {
>  fprintf(stderr,
> "\n"
> "\n",
> ...);
>  ...
>
> Moreover there is already an error message later if checkCustomCmds fails, I
> think
> it could be expanded and the two-line one in the function dropped entirely?
> It seems strange to have two level error messages for that.
>
> Help message looks strange. I suggest something regexp-like:
>
>  --custom-initialize=[tdvpf]+
>
> I would suggest to put the various init* functions in a more logical order:
> first create table, then load data, etc.
>
> In case 0: do not exchange unlogged_tables & foreign_keys gratuitously.
>
> After checking the initial code, I understand that the previous default was
> "tdvp", but you put "tdpv". I'm unsure whether vacuum would do something to
> the indexes and whether that would make more sense. When in doubt, I suggest
> keeping the previous default.
>
> Maybe --foreign-keys should really do "tdvpf"?
>
> I may be okay with disallowing --foreign-keys and --no-vacuum if
> --custom-init is used,
> but then there is no need to test it again in init... I think that in any
> case 'f' and 'v'
> should always trigger the corresponding initializations.
>
> On the other hand, I think that it could be more pragmatic with these
> options, i.e. --foreign-keys could just append 'f' to the current command if
> not already there, and '--no-vacuum' could remove 'v' if there, on the fly,
> so that nothing would be banned. This would require to always have a
> malloced custom_init string. It would allow to remove the "foreign_keys"
> global variable. "is_no_vacuum" is probably still needed for benchmarking.
> This way there would be no constraints and "is_custom_init" could be dropped
> as well.

I'm inclined to remove the restriction so that --foreign-keys, --no-vacuum
and --custom-initialize can be specified at the same time.
I think a list of chars would be better here than a single malloc'd
string, so that a particular initialization step can be removed easily.
Thoughts?
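
For reference, the strchr-based check suggested above might look roughly
like this (only a sketch; the function body and error wording are
illustrative, not taken from the actual patch):

    static bool
    checkCustomCmds(const char *commands)
    {
        const char *cmd;

        for (cmd = commands; *cmd != '\0'; cmd++)
        {
            if (strchr("tdpfv", *cmd) == NULL)
            {
                fprintf(stderr,
                        "invalid custom initialization command \"%c\"\n"
                        "possible commands are: t, d, p, f, v\n", *cmd);
                return false;
            }
        }
        return true;
    }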

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Default Partition for Range

2017-08-21 Thread Jeevan Ladhe
Hi Beena,

On Thu, Aug 17, 2017 at 4:59 PM, Beena Emerson 
wrote:

> PFA the patch rebased over v25 patches of default list partition [1]
>

Thanks for rebasing.

Range partition review:

1.
There are a lot of changes in RelationBuildPartitionDesc(). It was hard to
understand why these changes are needed for the default partition, and I did
not find any explanation for them; maybe I am missing some discussion? Only
when I looked into it carefully could I understand that these changes are
nothing but an optimization for identifying the distinct bounds.
I think it is better to keep the code optimization/simplification in a
separate patch, so I have split it out as patch 0001 in the attached patches.

2.
+ * For default partition, it returns the negation of the constraints of all
+ * the other partitions.
+ *
+ * If we end up with an empty result list, we return NULL.

We can rephrase the comment as below:

"For default partition, function returns the negation of the constraints of
all
the other partitions. If default is the only partition then returns NULL."

3.
@@ -2396,6 +2456,8 @@ make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower)
ListCell   *lc;
int i;

+   Assert(datums != NULL);
+
bound = (PartitionRangeBound *) palloc0(sizeof(PartitionRangeBound));
bound->index = index;
bound->datums = (Datum *) palloc0(key->partnatts * sizeof(Datum));

I am not really convinced why we should have the above Assert.

4.
 static List *
-get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
+get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
+  bool for_default)
 {

The addition of the flag ‘for_default’ and the way it is being used are very
confusing. At this moment I am not able to think of an alternative to the
way you have used this flag, but I will try to see whether I can come up with
another approach.

I still need to look at the tests and do some manual testing. I will
update here with any findings.

Patches:
0001: Refactoring/simplification of code in RelationBuildPartitionDesc(),
0002: implementation of default range partition by Beena.

Regards,
Jeevan


0001-Refactor-RelationBuildPartitionDesc.patch
Description: Binary data


0002-Add-support-for-default-partition-for-range.patch
Description: Binary data



Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-21 Thread Amit Langote
On 2017/08/22 1:08, Robert Haas wrote:
> On Mon, Aug 21, 2017 at 7:45 AM, Etsuro Fujita
>  wrote:
>> If there are no objections, I'll add this to the open item list for v10.
> 
> This seems fairly ad-hoc to me.  I mean, now you have
> CheckValidResultRel not being called in just this one case -- but that
> bypasses all the checks that function might do, not just this one.  It
> so happens that's OK at the moment because CheckCmdReplicaIdentity()
> doesn't do anything in the insert case.
>
> I'm somewhat inclined to just view this as a limitation of v10 and fix
> it in v11.  If you want to fix it in v10, I think we need a different
> approach -- just ripping the CheckValidResultRel checks out entirely
> doesn't seem like a good idea to me.

Before 389af951552f, the relkind check that is now performed by
CheckValidResultRel() used to be done in InitResultRelInfo().  ISTM it
was separated out so that certain ResultRelInfos could be initialized
without the explicit relkind check, either because it's taken care of
elsewhere or because the table in question is *known* to be a valid result
relation.  Maybe mostly just the former of the two reasons when that
commit went in.

IMO, the latter case applies when initializing ResultRelInfos for
partitions during tuple-routing, because the table types we allow to
become partitions are fairly restricted.

Also, it seems okay to show the error messages that CheckValidResultRel()
shows when the concerned table is *directly* addressed in a query, but the
same error does not seem so user-friendly when emitted for one of the
partitions while tuple-routing is being set up.  IMHO, there should be
"tuple routing" somewhere in the error message shown in that case, even if
it's for the total lack of support for inserts by a FDW.

Thanks,
Amit





Re: [HACKERS] Setting pd_lower in GIN metapage

2017-08-21 Thread Amit Langote
On 2017/08/22 9:39, Michael Paquier wrote:
> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote
>  wrote:
>> On 2017/06/27 10:22, Michael Paquier wrote:
>>> On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada  
>>> wrote:
 Thank you for the patches! I checked additional patches for brin and
 spgist. They look good to me.
>>>
>>> Last versions are still missing something: brin_mask() and spg_mask()
>>> can be updated so as mask_unused_space() is called for meta pages.
>>> Except that the patches look to be on the right track.
>>
>> Thanks for the review.
>>
>> I updated brin_mask() and spg_mask() in the attached updated patches so
>> that they consider meta pages as containing unused space.
> 
> Thanks for the new version. I had an extra look at those patches, and
> I am happy with its shape. I also have been doing more testing with
> pg_upgrade and wal_consistency_checking, and found no issues. So
> switched status as ready for committer. Everything could be put into a
> single commit.

Thanks for the review.  Agreed about committing these together.

Regards,
Amit





Re: [HACKERS] Setting pd_lower in GIN metapage

2017-08-21 Thread Michael Paquier
On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote
 wrote:
> On 2017/06/27 10:22, Michael Paquier wrote:
>> On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada  
>> wrote:
>>> Thank you for the patches! I checked additional patches for brin and
>>> spgist. They look good to me.
>>
>> Last versions are still missing something: brin_mask() and spg_mask()
>> can be updated so as mask_unused_space() is called for meta pages.
>> Except that the patches look to be on the right track.
>
> Thanks for the review.
>
> I updated brin_mask() and spg_mask() in the attached updated patches so
> that they consider meta pages as containing unused space.

Thanks for the new version. I had an extra look at those patches, and
I am happy with its shape. I also have been doing more testing with
pg_upgrade and wal_consistency_checking, and found no issues. So
switched status as ready for committer. Everything could be put into a
single commit.
-- 
Michael




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-08-21 Thread David G. Johnston
On Mon, Aug 21, 2017 at 2:30 PM, Simon Riggs  wrote:

>
> > The patch applies cleanly to current master and all tests run without
> > failures.
> >
> > I also test against all current supported versions (9.2 ... 9.6) and
> didn't
> > find any issue.
> >
> > Changed status to "ready for committer".
>
> I get the problem, but not this solution. It's too specific and of
> zero other value, yet not even exactly specific to the issue. We
> definitely don't want --no-extension-comments, but --no-comments
> removes ALL comments just to solve a weird problem. (Meta)Data loss,
> surely?
>

It was argued, successfully IMO, to have this capability independent of
any particular problem to be solved.  In that context I haven't given the
proposed implementation much thought.

I do agree that this is a very unappealing solution for the stated problem
and am hoping someone will either have an ingenious idea to solve it the
right way or be willing to hold their nose and put in a hack.

David J.


Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Eisentraut
On 8/21/17 12:33, Peter Geoghegan wrote:
> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut
>  wrote:
>> Here are my patches to address this.
> 
> These look good.

Committed.  That closes this open item.

> One small piece of feedback: I suggest naming the custom collation
> "numeric" something else instead: "natural". Apparently, the behavior
> it implements is sometimes called natural sorting. See
> https://en.wikipedia.org/wiki/Natural_sort_order.

I have added a note about that, but the official name in the Unicode
documents is "numeric ordering", so I kept that in there as well.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-21 Thread Michael Paquier
On Tue, Aug 22, 2017 at 3:43 AM, Robert Haas  wrote:
> Works for me.  While I'm sure this won't eclipse previous achievements
> in this area, it still seems worthwhile.

This one is intentional per what happens in the US today? ;)
-- 
Michael




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-08-21 Thread Simon Riggs
On 7 August 2017 at 16:14, Fabrízio de Royes Mello
 wrote:
>
> On Mon, Aug 7, 2017 at 10:43 AM, Robins Tharakan  wrote:
>>
>> On 20 July 2017 at 05:14, Robins Tharakan  wrote:
>>>
>>> On 20 July 2017 at 05:08, Michael Paquier 
>>> wrote:

 On Wed, Jul 19, 2017 at 8:59 PM,
 Fabrízio de Royes Mello
 > You should add the properly sgml docs for this pg_dumpall change also.

 Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
 extensions are in src/test/modules/test_pg_dump, but you just care
 about the former with this patch. And if you implement some new tests,
 look at the other tests and base your work on that.
>>>
>>>
>>> Thanks Michael / Fabrízio.
>>>
>>> Updated patch (attached) additionally adds SGML changes for pg_dumpall.
>>> (I'll try to work on the tests, but sending this nonetheless.)
>>>
>>
>> Attached is an updated patch (v4) with basic tests for pg_dump /
>> pg_dumpall.
>> (Have zipped it since patch size jumped to ~40kb).
>>
>
> The patch applies cleanly to current master and all tests run without
> failures.
>
> I also test against all current supported versions (9.2 ... 9.6) and didn't
> find any issue.
>
> Changed status to "ready for committer".

I get the problem, but not this solution. It's too specific and of
zero other value, yet not even exactly specific to the issue. We
definitely don't want --no-extension-comments, but --no-comments
removes ALL comments just to solve a weird problem. (Meta)Data loss,
surely?

Thinking ahead, are we going to add a new --no-objecttype switch every
time someone wants it?

It would make more sense to add something more general and extensible,
such as --exclude-objects=comment or a similar name.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-08-21 Thread Robert Haas
On Mon, Aug 21, 2017 at 2:42 AM, Mithun Cy  wrote:
> Thanks for the patch, I have tested the above fix now it works as
> described. From my test patch looks good, I did not find any other
> issues.

Considering the totality of the circumstances, it seemed appropriate
to me to commit this.  So I did.

Thanks for all your work on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-21 Thread Robert Haas
On Fri, Aug 18, 2017 at 4:08 PM, Douglas Doole  wrote:
>> 4. I am pretty doubtful that "Memory: 25kB" is going to be stable
>> enough for us to want that output memorialized in the regression ...
>
> Fair enough. I wanted to be a bit more sophisticated in my check than
> looking for a single value so I worked out something that distills the
> explain down to the key elements.

Works for me.  While I'm sure this won't eclipse previous achievements
in this area, it still seems worthwhile.  So committed with one minor
change to shorten a long-ish line.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 9:33 AM, Peter Geoghegan  wrote:
> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut
>  wrote:
>> Here are my patches to address this.
>
> These look good.

Also, I don't know why en-u-kr-others-digit wasn't accepted by CREATE
COLLATION, as you said on the other thread just now. That's directly
lifted from TR #35. Is it an ICU version issue? I guess it doesn't
matter that much, though.


-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut
 wrote:
> Here are my patches to address this.

These look good.

One small piece of feedback: I suggest naming the custom collation
"numeric" something else instead: "natural". Apparently, the behavior
it implements is sometimes called natural sorting. See
https://en.wikipedia.org/wiki/Natural_sort_order.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-21 Thread Robert Haas
On Mon, Aug 21, 2017 at 7:45 AM, Etsuro Fujita
 wrote:
> If there are no objections, I'll add this to the open item list for v10.

This seems fairly ad-hoc to me.  I mean, now you have
CheckValidResultRel not being called in just this one case -- but that
bypasses all the checks that function might do, not just this one.  It
so happens that's OK at the moment because CheckCmdReplicaIdentity()
doesn't do anything in the insert case.

I'm somewhat inclined to just view this as a limitation of v10 and fix
it in v11.  If you want to fix it in v10, I think we need a different
approach -- just ripping the CheckValidResultRel checks out entirely
doesn't seem like a good idea to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-21 Thread Peter Eisentraut
On 8/15/17 15:04, Peter Geoghegan wrote:
> * "23.2.2.3. Copying Collations" suggests that the only use of CREATE
> COLLATION is copying collations, which is far from true with ICU. We
> should change that at the same time as this change is made. I think
> that just changing the title would improve the overall flow of the
> page.

I don't understand why that has to be changed and how.

> * Maybe add an example of numeric ordering -- the "alphanumeric
> invoice" case, where you want text containing numbers to have the
> numbers sort as numbers iff the comparison is to be resolved when
> comparing numbers. I think that that's really useful, and worth
> specifically calling out. I definitely would have used that had it
> been available ten years ago.

done, quite useful

> * Let's use "en-u-kr-others-digit" instead of "en-u-kr-latn-digit' in
> the example. It makes no real difference to us English speakers, but
> means that the example works the same for those that use a different
> alphabet. It's more culturally neutral.

I follow what you are saying, but that locale string is not accepted by
CREATE COLLATION.

> * If we end up having initdb put all locales rather than all
> collations in pg_collation, which I think is very likely, then we can
> put in a link to ICU's locale explorer web resource:
> 
> https://ssl.icu-project.org/icu-bin/locexp?d_=en&_=en_HK
> 
> This lets the user see exactly what they'll get from a base locale
> using an intuitive interface (assuming it matches their CLDR version).

done

Patch has been posted to another thread.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Eisentraut
On 8/19/17 19:15, Peter Geoghegan wrote:
> Noah Misch  wrote:
>> I think you're contending that, as formulated, this is not a valid v10 open
>> item.  Are you?
> 
> As the person that came up with this formulation, I'd like to give a
> quick summary of my current understanding of the item's status:
> 
> * We're in agreement that we ought to have initdb create initial
>   collations based on ICU locales, not based on distinct ICU
>   collations [1].
> 
> * We're in agreement that variant keywords should not be
>   created for each base locale/collation [2].
> 
> Once these two changes are made, I think that everything will be in good
> shape as far as pg_collation name stability goes. It shouldn't take
> Peter E. long to write the patch. I'm happy to write the patch on his
> behalf if that saves time.
> 
> We're also going to work on the documentation, to make keyword variants
> like -emoji and -traditional at least somewhat discoverable, and to
> explain the capabilities of custom ICU collations more generally.

Here are my patches to address this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 5a70c7e97758bf06fd717b391b66f3cc0366f063 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 21 Aug 2017 09:17:06 -0400
Subject: [PATCH 1/2] Expand set of predefined ICU locales

Install language+region combinations even if they are not distinct from
the language's base locale.  This gives better long-term stability of
the set of predefined locales and makes the predefined locales less
implementation-dependent and more practical for users.
---
 doc/src/sgml/charset.sgml| 13 ++---
 src/backend/commands/collationcmds.c | 15 ---
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 48ecfc5f48..f2a4acc115 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -653,9 +653,8 @@ ICU collations
 string will be accepted as a locale name.)
See http://userguide.icu-project.org/locale for
 information on ICU locale naming.  initdb uses the ICU
-APIs to extract a set of locales with distinct collation rules to populate
-the initial set of collations.  Here are some example collations that
-might be created:
+APIs to extract a set of distinct locales to populate the initial set of
+collations.  Here are some example collations that might be created:
 
 
  
@@ -677,9 +676,9 @@ ICU collations
   
German collation for Austria, default variant

-(As of this writing, there is no,
-say, de-DE-x-icu or de-CH-x-icu,
-because those are equivalent to de-x-icu.)
+(There are also, say, de-DE-x-icu
+or de-CH-x-icu, but as of this writing, they are
+equivalent to de-x-icu.)

   
  
@@ -690,6 +689,7 @@ ICU collations
German collation for Austria, phone book variant
   
  
+
  
   und-x-icu (for undefined)
   
@@ -724,7 +724,6 @@ Copying Collations
 
 CREATE COLLATION german FROM "de_DE";
 CREATE COLLATION french FROM "fr-x-icu";
-CREATE COLLATION "de-DE-x-icu" FROM "de-x-icu";
 

 
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8572b2dedc..d36ce53560 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -667,7 +667,16 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
}
 #endif /* READ_LOCALE_A_OUTPUT 
*/
 
-   /* Load collations known to ICU */
+   /*
+* Load collations known to ICU
+*
+* We use uloc_countAvailable()/uloc_getAvailable() rather than
+* ucol_countAvailable()/ucol_getAvailable().  The former returns a full
+* set of language+region combinations, whereas the latter only returns
+* language+region combinations of they are distinct from the language's
+* base collation.  So there might not be a de-DE or en-GB, which would 
be
+* confusing.
+*/
 #ifdef USE_ICU
{
int i;
@@ -676,7 +685,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 * Start the loop at -1 to sneak in the root locale without too 
much
 * code duplication.
 */
-   for (i = -1; i < ucol_countAvailable(); i++)
+   for (i = -1; i < uloc_countAvailable(); i++)
{
/*
 * In ICU 4.2, ucol_getKeywordValuesForLocale() 
sometimes returns
@@ -706,7 +715,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
if (i == -1)
name = "";  /* ICU root locale */
else
-   

Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

2017-08-21 Thread Robert Haas
On Mon, Aug 21, 2017 at 10:07 AM, Craig Ringer  wrote:
> Makes sense, and I'm not especially concerned. If the expected solution to
> such usage is to use non-blocking calls, that's fine with me.
>
> I partly wanted to put this out there to help the next person looking into
> it. Or myself, when I've forgotten and go looking again ;) . But also, to
> ensure that this was in fact fully expected behaviour not an oversight re
> applying shm_mq to non-bgworker endpoints.

Yep, it's expected.  It's possible I should have designed it
differently, so if someone does feel concerned at some point we can
certainly debate how to change things, but what you're describing
matches my expectations and it seems OK to me, pretty much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans

2017-08-21 Thread Alexander Kumenkov

Hello everyone,

I made a new patch according to the previous comments. It is simpler
now, only adding a few checks to the bitmap heap scan node. When the
target list for the bitmap heap scan is empty, there is no filter,
the bitmap page generated by the index scan is exact, and the
corresponding heap page is visible to all transactions, we don't fetch it.


The performance is better than with the previous patch, because now it 
can leverage the parallel heap scan logic. A simple benchmark is 
attached: this patch is more than ten times faster on a frequent search 
term, and two times faster on an infrequent one.


Still, there is one thing that is bothering me. I use an empty targetlist
as the marker that I should not fetch tuples. Because of that, I have
to make sure that use_physical_tlist() doesn't touch empty tlists.
Consequently, if count(*) sits on top of a subquery, this subquery has
to project and cannot be deleted (see trivial_subqueryscan). There is
such a query in the regression test select_distinct: "select count(*)
from (select distinct two, four, two from tenk1);". For that particular
query it shouldn't matter much, so I changed the test, but the broader
implications of this escape me at the moment.


The cost estimation is very simplified now: I just halve the number of 
pages to be fetched. The most important missing part is checking whether 
we have any quals that are not checked by the index: if we do, we'll 
have to fetch all the tuples. Finding nonindex qualifications is 
somewhat convoluted for the bitmap index scan tree involving multiple 
indexes, so I didn't implement it for now. We could also consider 
estimating the number of lossy pages in the tid bitmap given current 
work_mem size.


I'll be glad to hear your thoughts on this.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 79f534e4e9..d7ea6f6929 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -39,6 +39,7 @@
 
 #include "access/relscan.h"
 #include "access/transam.h"
+#include "access/visibilitymap.h"
 #include "executor/execdebug.h"
 #include "executor/nodeBitmapHeapscan.h"
 #include "miscadmin.h"
@@ -225,9 +226,25 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			}
 
 			/*
-			 * Fetch the current heap page and identify candidate tuples.
+			 * If we don't need the tuple contents and are only counting them,
+			 * we can skip fetching the page if the bitmap doesn't need rechecking
+			 * and all tuples on the page are visible to our transaction
 			 */
-			bitgetpage(scan, tbmres);
+			node->bhs_nofetch = !tbmres->recheck
+&& node->ss.ps.qual == NULL
+&& node->ss.ps.plan->targetlist == NIL
+&& VM_ALL_VISIBLE(node->ss.ss_currentRelation, tbmres->blockno,
+  >bhs_vmbuffer);
+
+			if (node->bhs_nofetch)
+scan->rs_ntuples = tbmres->ntuples;
+			else
+			{
+/*
+ * Fetch the current heap page and identify candidate tuples.
+ */
+bitgetpage(scan, tbmres);
+			}
 
 			if (tbmres->ntuples >= 0)
 node->exact_pages++;
@@ -289,45 +306,58 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		 */
 		BitmapPrefetch(node, scan);
 
-		/*
-		 * Okay to fetch the tuple
-		 */
-		targoffset = scan->rs_vistuples[scan->rs_cindex];
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
-		lp = PageGetItemId(dp, targoffset);
-		Assert(ItemIdIsNormal(lp));
 
-		scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
-		scan->rs_ctup.t_len = ItemIdGetLength(lp);
-		scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
-		ItemPointerSet(>rs_ctup.t_self, tbmres->blockno, targoffset);
+		if (node->bhs_nofetch)
+		{
+			/*
+			 * If we don't have to fetch the tuple, just return a
+			 * bogus one 
+			 */
+			slot->tts_isempty = false;
+			slot->tts_nvalid = 0;
+		}
+		else
+		{
+			/*
+			 * Okay to fetch the tuple
+			 */
+			targoffset = scan->rs_vistuples[scan->rs_cindex];
+			dp = (Page) BufferGetPage(scan->rs_cbuf);
+			lp = PageGetItemId(dp, targoffset);
+			Assert(ItemIdIsNormal(lp));
 
-		pgstat_count_heap_fetch(scan->rs_rd);
+			scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
+			scan->rs_ctup.t_len = ItemIdGetLength(lp);
+			scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
+			ItemPointerSet(>rs_ctup.t_self, tbmres->blockno, targoffset);
 
-		/*
-		 * Set up the result slot to point to this tuple. Note that the slot
-		 * acquires a pin on the buffer.
-		 */
-		ExecStoreTuple(>rs_ctup,
-	   slot,
-	   scan->rs_cbuf,
-	   false);
+			pgstat_count_heap_fetch(scan->rs_rd);
 
-		/*
-		 * If we are using lossy info, we have to recheck the qual conditions
-		 * at every tuple.
-		 */
-		if (tbmres->recheck)
-		{
-			econtext->ecxt_scantuple = slot;
-			ResetExprContext(econtext);
+			/*
+			 * Set up the result slot to point to this tuple. Note that the slot
+			 * acquires a pin on the buffer.
+			 */
+			ExecStoreTuple(>rs_ctup,
+		   slot,
+			

Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

2017-08-21 Thread Craig Ringer
On 21 August 2017 at 21:44, Robert Haas  wrote:


> While this would work, I don't really see the need for it given the
> availability of nonblocking operations.  See mq_putmessage() for an
> example.
>

Makes sense, and I'm not especially concerned. If the expected solution to
such usage is to use non-blocking calls, that's fine with me.

I partly wanted to put this out there to help the next person looking into
it. Or myself, when I've forgotten and go looking again ;) . But also, to
ensure that this was in fact fully expected behaviour not an oversight re
applying shm_mq to non-bgworker endpoints.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

2017-08-21 Thread Robert Haas
On Sun, Aug 20, 2017 at 10:57 PM, Craig Ringer  wrote:
> I've noticed a possible bug / design limitation where shm_mq_wait_internal
> sleeps in a latch wait forever, and the postmaster gets stuck waiting for the
> bgworker the wait is running in to exit.
>
> This happens when the shm_mq does not have an associated bgworker handle
> registered because the other end is not known at mq creation time or is a
> normal backend not a bgworker. So a BGW handle cannot be passed.

Well, right.  The reason the API lets you pass a bgworker handle is to
avoid the problem that happens if you don't pass a bgworker handle.
:-)

> shm_mq_wait_internal() will CHECK_FOR_INTERRUPTS() when its latch wait is
> interrupted by a SIGTERM. But it doesn't actually respond to SIGTERM in any
> way; it just merrily resets its latch and keeps looping.
>
> It will bail out correctly on SIGQUIT.

There are a couple of ways to avoid this in the existing code
structure.  One is to not use blocking operations.  If you pass nowait
as true then you can put the latch wait loop in the caller and inject
whatever logic you want.  Another approach is to use die() or
something else that sets ProcDiePending as your SIGTERM handler.
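
For illustration, the receive side of that nowait pattern can be shaped
roughly like this (a sketch; mqh is assumed to be an already-attached
shm_mq_handle, and the wait-event constant is just an example):

    for (;;)
    {
        shm_mq_result res;
        Size        nbytes;
        void       *data;
        int         rc;

        res = shm_mq_receive(mqh, &nbytes, &data, true /* nowait */);
        if (res == SHM_MQ_SUCCESS)
            break;              /* got a message */
        if (res == SHM_MQ_DETACHED)
            ereport(ERROR,
                    (errmsg("other end of the queue has detached")));

        /* SHM_MQ_WOULD_BLOCK: wait, then let die()/ProcDiePending run */
        rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, 0,
                       PG_WAIT_EXTENSION);
        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }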

> The only ways I can see to fix this are:
>
> * Generalize SIGTERM handling across postgres, so there's a global
> "got_SIGTERM" global that shm_mq_wait_internal can test to break out of its
> loop, and every backend's signal handler must set it. Lots of churn.

Not a bad idea otherwise, though.

> * In a proc's signal handler, use globals set before entry and after exit
> from shm_mq operations to detect if we're currently in shm_mq and promote
> SIGTERM to SIGQUIT by sending a new signal to ourselves. Or set up state so
> CHECK_FOR_INTERRUPTS() will notice when the handler returns.

Sounds ugly.

> * Allow passing of a *bool that tests for SIGTERM, or a function pointer
> called on each iteration to test whether looping should continue, to be
> passed to shm_mq_attach. So if you can't supply a bgw handle, you supply
> that instead. Provide a shm_mq_set_handle equivalent for it too.

While this would work, I don't really see the need for it given the
availability of nonblocking operations.  See mq_putmessage() for an
example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] Text search configuration extension

2017-08-21 Thread Arthur Zakirov
Hello,

On Fri, Aug 18, 2017 at 03:30:38PM +0300, Aleksandr Parfenov wrote:
> Hello hackers!
> 
> I'm working on a new approach in text search configuration and want to
> share my thought with community in order to get some feedback and maybe
> some new ideas.
> 

There are several cases where the new syntax could be useful:

https://www.postgresql.org/message-id/4733b65a.9030...@students.mimuw.edu.pl
First check whether a lexeme is a stopword or not, and only then normalize it.

https://www.postgresql.org/message-id/c6851b7e-da25-3d8e-a5df-022c395a11b4%40postgrespro.ru
Support union of outputs of several dictionaries.

https://www.postgresql.org/message-id/46D57E6F.8020009%40enterprisedb.com
Support a chain of dictionaries using the MAP BY operator.

The basic idea of the approach is to give the user more control over text
search configurations without writing additional dictionaries or modifying
existing ones.

> ALTER TEXT SEARCH CONFIGURATION en_de_search ADD MAPPING FOR asciiword,
> word WITH
> CASE
>WHEN english_hunspell IS NOT NULL THEN english_hunspell
>WHEN german_hunspell IS NOT NULL THEN german_hunspell
>ELSE
>  -- stem dictionaries can't be used for language detection
>  english_stem UNION german_stem
> END;

For example, the configuration mentioned above gives the following results:

=# select d @@ q, d, q from to_tsvector('german_hunspell', 'Dieser Hund wollte
ihn jedoch nicht nach Hause begleiten') d, to_tsquery('en_de_search', 'hause') q;
 ?column? |                      d                       |    q
----------+----------------------------------------------+----------
 t        | 'begleiten':9 'hausen':8 'hund':2 'jedoch':5 | 'hausen'
(1 row)

This configuration is useful when the language of the query is unknown.

Best regards,

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-08-21 Thread Michael Paquier
On Tue, Jun 20, 2017 at 1:11 PM, Michael Paquier
 wrote:
> With the tests directly in the patch, things are easy to run. With
> PG10 stabilization work, of course I don't expect much feedback :)
> But this set of patches looks like the direction we want to go, so that
> JDBC and libpq users can take advantage of channel binding with SCRAM.

Attached is a new patch set, rebased as of c6293249.
-- 
Michael


0001-Refactor-routine-to-test-connection-to-SSL-server.patch
Description: Binary data


0002-Support-channel-binding-tls-unique-in-SCRAM.patch
Description: Binary data


0003-Add-connection-parameters-saslname-and-saslchannelbi.patch
Description: Binary data


0004-Implement-channel-binding-tls-server-end-point-for-S.patch
Description: Binary data



Re: [HACKERS] Regressions failures with libxml2 on ArchLinux

2017-08-21 Thread Michael Paquier
On Mon, Aug 14, 2017 at 7:36 PM, Michael Paquier
 wrote:
> Thanks for adding the details directly, downgrading the hard way is
> what I am doing now using the past packages of libxml2 in Arch's
> archives [1]. ArchLinux is a bit wrong in the fact of shipping a
> package with a behavior change. Let's wait also for libxml2 folks to
> see what they have to provide on the matter... The next release of
> libxml2 would hurt Postgres if it were to be released today.

This story has ended with a fix in libxml2 itself:
https://git.gnome.org/browse/libxml2/commit/?id=3aca7f31cb9901dc3af449e08dda647898bfc1fe
And ArchLinux has released a new libxml2 package with the fix two days ago:
https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/libxml2=06ece61e13f760ee5c4c1df19849807b887b744d
-- 
Michael




Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-21 Thread Etsuro Fujita

On 2017/08/07 15:45, Etsuro Fujita wrote:

On 2017/08/07 15:33, Amit Langote wrote:

On 2017/08/07 15:22, Etsuro Fujita wrote:

On 2017/08/07 13:11, Amit Langote wrote:> The patch looks good too.
Although, looking at the following hunk:


+ Assert(partrel->rd_rel->relkind == RELKIND_RELATION ||
+    partrel->rd_rel->relkind == RELKIND_FOREIGN_TABLE);
+
    /*
 * Verify result relation is a valid target for the current
operation.
 */
! if (partrel->rd_rel->relkind == RELKIND_RELATION)
! CheckValidResultRel(partrel, CMD_INSERT);

makes me now wonder if we need the CheckValidResultRel check at 
all.  The

only check currently in place for RELKIND_RELATION is
CheckCmdReplicaIdentity(), which is a noop (returns true) for inserts
anyway.


Good point!  I left the verification for a plain table because that is
harmless but as you mentioned, that is nothing but an overhead.  So, 
here

is a new version which removes the verification at all from
ExecSetupPartitionTupleRouting.


The updated patch looks good to me, thanks.


Thanks for the review!


If there are no objections, I'll add this to the open item list for v10.

Best regards,
Etsuro Fujita





Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-21 Thread Etsuro Fujita

On 2017/04/08 4:24, Robert Haas wrote:

Looking at the code itself, I find the changes to joinpath.c rather alarming.


I missed this mail.  Sorry about that, Robert.


+/* Save hashclauses for possible use by the FDW */
+if (extra->consider_foreignjoin && hashclauses)
+extra->hashclauses = hashclauses;

A minor consideration is that this is fairly far away from where
hashclauses actually gets populated, so if someone later added an
early "return" statement to this function -- after creating some paths
-- it could subtly break join pushdown.  But I also think there's no
real need for this.  The loop that extracts hash clauses is simple
enough that we could just refactor it into a separate function, or if
necessary duplicate the logic.


I refactored that into a new function so that we can call that function 
at the top of add_paths_to_joinrel and store the result in 
JoinPathExtraData.



+/* Save first mergejoin data for possible use by the FDW */
+if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+{
+extra->mergeclauses = cur_mergeclauses;
+extra->outersortkeys = outerkeys;
+extra->innersortkeys = innerkeys;
+}

Similarly here.  select_outer_pathkeys_for_merge(),
find_mergeclauses_for_pathkeys(), and make_inner_pathkeys_for_merge()
are all extern, so there's nothing to keep CreateLocalJoinPath() from
just doing that work itself instead of getting help from joinpath,
which I guess seems better to me.  I think it's just better if we
don't burden joinpath.c with keeping little bits of data around that
CreateLocalJoinPath() can easily get for itself.


Done that way.


There appears to be no regression test covering the case where we get
a Merge Full Join with a non-empty list of mergeclauses.  Hash Full
Join is tested, as is Merge Full Join without merge clauses, but
there's no test for Merge Full Join with mergeclauses, and since that
is a separate code path it seems like it should be tested.


Done.


-/*
- * If either inner or outer path is a ForeignPath corresponding to a
- * pushed down join, replace it with the fdw_outerpath, so that we
- * maintain path for EPQ checks built entirely of local join
- * strategies.
- */
-if (IsA(joinpath->outerjoinpath, ForeignPath))
-{
-ForeignPath *foreign_path;
-
-foreign_path = (ForeignPath *) joinpath->outerjoinpath;
-if (IS_JOIN_REL(foreign_path->path.parent))
-joinpath->outerjoinpath = foreign_path->fdw_outerpath;
-}
-
-if (IsA(joinpath->innerjoinpath, ForeignPath))
-{
-ForeignPath *foreign_path;
-
-foreign_path = (ForeignPath *) joinpath->innerjoinpath;
-if (IS_JOIN_REL(foreign_path->path.parent))
-joinpath->innerjoinpath = foreign_path->fdw_outerpath;
-}

This logic is removed and not replaced with anything, but I don't see
what keeps this code...

+Path   *outer_path = outerrel->cheapest_total_path;
+Path   *inner_path = innerrel->cheapest_total_path;

...from picking a ForeignPath?


CreateLocalJoinPath creates an alternative local join path for a foreign 
join from the cheapest total paths for the outer/inner relations.  The 
reason for the above is to pass these paths to that function.  On second 
thought, however, I think it would be convenient for the caller to just 
pass outerrel/innerrel to that function.  So, I modified that function's 
API as such.  Another change is: the previous version of that function 
allowed the caller to create a parameterized local-join path 
corresponding to a parameterized foreign join, but that is a feature, 
not a bug fix, so I dropped that.  (I'll propose that as part of the 
patch in [1].)



There's probably more to think about here, but those are my question
on an initial read-through.


Thanks for the review!

Attached is an updated version of the patch.

Best regards,
Etsuro Fujita

[1] https://commitfest.postgresql.org/14/1042/
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1701,1722  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = 
t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS 
NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) 
END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, 
r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!   

Re: [HACKERS] why not parallel seq scan for slow functions

2017-08-21 Thread Amit Kapila
On Mon, Aug 21, 2017 at 3:15 PM, Simon Riggs  wrote:
> On 21 August 2017 at 10:08, Amit Kapila  wrote:
>
>> Thoughts?
>
> This seems like a very basic problem for parallel queries.
>
> The problem seems to be that we are calculating the cost of the plan
> rather than the speed of the plan.
>
> Clearly, a parallel task has a higher overall cost but a lower time to
> complete if resources are available.
>
> We have the choice of 1) adding a new optimizable quantity,
>

I think this has the potential of making costing decisions difficult.
I mean that if we include any such new parameter, then we need to
consider it along with the cost, as we can't completely ignore the cost.

> or of 2)
> treating cost = speed, so we actually reduce the cost of a parallel
> plan rather than increasing it so it is more likely to be picked.
>

Yeah, this is what is currently done for the costing of parallel
plans, and this patch tries to follow the same approach.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] why not parallel seq scan for slow functions

2017-08-21 Thread Simon Riggs
On 21 August 2017 at 10:08, Amit Kapila  wrote:

> Thoughts?

This seems like a very basic problem for parallel queries.

The problem seems to be that we are calculating the cost of the plan
rather than the speed of the plan.

Clearly, a parallel task has a higher overall cost but a lower time to
complete if resources are available.

We have the choice of 1) adding a new optimizable quantity, or of 2)
treating cost = speed, so we actually reduce the cost of a parallel
plan rather than increasing it so it is more likely to be picked.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Pluggable storage

2017-08-21 Thread Amit Kapila
On Mon, Aug 21, 2017 at 12:58 PM, Haribabu Kommi
 wrote:
>
> On Sun, Aug 13, 2017 at 5:17 PM, Amit Kapila 
> wrote:
>>
>>
>> Also, it is quite possible that some of the storage Am's don't even
>> want to return bool as a parameter from HeapTupleSatisfies* API's.  I
>> guess what we need here is to provide a way so that different storage
>> am's can register their function pointer for an equivalent to
>> satisfies function.  So, we need to change
>> SnapshotData.SnapshotSatisfiesFunc in some way so that different
>> handlers can register their function instead of using that directly.
>> I think that should address the problem you are planning to solve by
>> omitting buffer parameter.
>
>
> Thanks for your suggestion. Yes, it is better to go in the direction of
> SnapshotSatisfiesFunc.
>
> I verified the above idea of implementing the Tuple visibility functions
> and assign them into the snapshotData structure based on the snapshot.
>
> The Tuple visibility functions that are specific to the relation are
> available
> with the RelationData structure and this structure may not be available,
>

Which functions are you referring to here?  I don't see anything in
tqual.h that uses RelationData.
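
To be clear, the kind of registration I was suggesting upthread would be
something like the below (purely a hypothetical sketch; only the
SnapshotSatisfiesFunc signature comes from snapshot.h, the AM-side
functions are made up):

    static bool
    my_am_satisfies(HeapTuple htup, Snapshot snapshot, Buffer buffer)
    {
        /* an AM whose tuples are always visible could simply say so */
        return true;
    }

    static void
    my_am_init_snapshot(Snapshot snapshot)
    {
        snapshot->satisfies = my_am_satisfies;  /* AM-specific tuple test */
    }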

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] parallelize queries containing initplans

2017-08-21 Thread Amit Kapila
On Mon, Aug 21, 2017 at 1:44 PM, Haribabu Kommi
 wrote:
>
>
> Thanks for adding more details. It is easy to understand.
>
> I marked the patch as ready for committer in the commitfest.
>

Thank you.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] why not parallel seq scan for slow functions

2017-08-21 Thread Amit Kapila
On Wed, Aug 16, 2017 at 5:04 PM, Robert Haas  wrote:
> On Wed, Aug 16, 2017 at 7:23 AM, Amit Kapila  wrote:
>> On Tue, Aug 15, 2017 at 7:15 PM, Robert Haas  wrote:
>>> On Sat, Aug 12, 2017 at 9:18 AM, Amit Kapila  
>>> wrote:
 I think skipping a generation of gather paths for scan node or top
 level join node generated via standard_join_search seems straight
 forward, but skipping for paths generated via geqo seems to be tricky
 (See use of generate_gather_paths in merge_clump).  Assuming, we find
 some way to skip it for top level scan/join node, I don't think that
 will be sufficient, we have some special way to push target list below
 Gather node in apply_projection_to_path, we need to move that part as
 well in generate_gather_paths.
>>>
>>> I don't think that can work, because at that point we don't know what
>>> target list the upper node wants to impose.
>>>
>>
>> I am suggesting to call generate_gather_paths just before we try to
>> apply projection on paths in grouping_planner (file:planner.c;
>> line:1787; commit:004a9702).  Won't the target list for upper nodes be
>> available at that point?
>
> Oh, yes.  Apparently I misunderstood your proposal.
>

Thanks for acknowledging the idea.  I have written a patch which
implements it.  At this stage, it is merely meant to move the
discussion forward. A few things I am not entirely happy about in
this patch are:

(a) To skip generating a gather path for the top-level scan node, I have
used the number of relations that have a RelOptInfo, basically
simple_rel_array_size. Is there any problem with that, or do you see a
better way?
(b) I have changed the costing of gather path for path target in
generate_gather_paths which I am not sure is the best way. Another
possibility could have been that I change the code in
apply_projection_to_path as done in the previous patch and just call
it from generate_gather_paths.  I have not done that because of your
comment above thread ("is probably unsafe, because it might confuse
code that reaches the modified-in-place path through some other
pointer (e.g. code which expects the RelOptInfo's paths to still be
sorted by cost).").  It is not clear to me what exactly is bothering
you if we directly change costing in apply_projection_to_path.

Thoughts?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_paths_include_tlist_cost_v1.patch
Description: Binary data



Re: [HACKERS] parallelize queries containing initplans

2017-08-21 Thread Haribabu Kommi
On Mon, Aug 14, 2017 at 8:41 PM, Amit Kapila 
wrote:

> On Sun, Aug 13, 2017 at 6:49 PM, Haribabu Kommi
>  wrote:
> > On Fri, Aug 11, 2017 at 1:18 AM, Amit Kapila 
> > wrote:
> >>
> >
> > Thanks for the updated patch. Patch looks fine. I just have some
> > minor comments.
> >
> > + * ExecEvalParamExecParams
> > + *
> > + * Execute the subplan stored in PARAM_EXEC initplans params, if not
> > executed
> > + * till now.
> > + */
> > +void
> > +ExecEvalParamExecParams(Bitmapset *params, EState *estate)
> >
> > I feel it is better to explain when this function executes the sub plans
> > that are
> > not executed till now? Means like in what scenario?
> >
>
> It just means that it will execute the same initplan (subplan) just
> once in master backend even if it used in multiple places.  This is
> the same kind of usage as we have in ExecEvalParamExec.  You can find
> its usage by using some query like
>
> explain analyse select sum(t1.i) from t1, t2 where t1.j=t2.j and t1.k
> = (select count(k) from t3) and t1.k=t2.k;
>
> Ensure you insert some rows in t1 and t2 that match the count from t3.
> If you are using the schema and data given in Kuntal's script in email
> above, then you need to insert something like
> t1(900,900,900);t2(900,900,900);
>
> It generates plan like below:
>
> postgres=# explain analyse select sum(t1.i) from t1, t2 where
> t1.j=t2.j and t1.k = (select count(k) from t3) and t1.k=t2.k;
> QUERY PLAN
> 
> ---
>  Aggregate  (cost=29.65..29.66 rows=1 width=8) (actual
> time=22572.521..22572.521 rows=1 loops=1)
>InitPlan 1 (returns $0)
>  ->  Finalize Aggregate  (cost=9.70..9.71 rows=1 width=8) (actual
> time=4345.110..4345.111 rows=1 loops=1)
>->  Gather  (cost=9.69..9.70 rows=2 width=8) (actual
> time=4285.019..4345.098 rows=3 loops=1)
>  Workers Planned: 2
>  Workers Launched: 2
>  ->  Partial Aggregate  (cost=9.69..9.70 rows=1
> width=8) (actual time=0.154..0.155 rows=1 loops=3)
>->  Parallel Seq Scan on t3  (cost=0.00..8.75
> rows=375 width=4) (actual time=0.011..0.090 rows=300 loops=3)
>->  Nested Loop  (cost=0.00..19.93 rows=1 width=4) (actual
> time=22499.918..22572.512 rows=1 loops=1)
>  Join Filter: (t1.j = t2.j)
>  ->  Gather  (cost=0.00..9.67 rows=2 width=12) (actual
> time=10521.356..10521.363 rows=1 loops=1)
>Workers Planned: 2
>Params Evaluated: $0
>Workers Launched: 2
>->  Parallel Seq Scan on t1  (cost=0.00..9.67 rows=1
> width=12) (actual time=0.506..0.507 rows=0 loops=3)
>  Filter: (k = $0)
>  Rows Removed by Filter: 299
>  ->  Materialize  (cost=0.00..10.21 rows=2 width=8) (actual
> time=11978.557..12051.142 rows=1 loops=1)
>->  Gather  (cost=0.00..10.20 rows=2 width=8) (actual
> time=11978.530..12051.113 rows=1 loops=1)
>  Workers Planned: 2
>  Params Evaluated: $0
>  Workers Launched: 2
>  ->  Parallel Seq Scan on t2  (cost=0.00..10.20
> rows=1 width=8) (actual time=0.067..0.067 rows=0 loops=3)
>Filter: (k = $0)
>Rows Removed by Filter: 333
>  Planning time: 15103.237 ms
>  Execution time: 22574.703 ms
> (27 rows)
>
> You can notice that initplan is used at multiple nodes, but it will be
> evaluated just once.  If you want, I can add a sentence in the
> comments, but I think this is somewhat obvious and the same use case
> already exists.  Let me know if you still think that comments need to
> be expanded?
>

Thanks for providing details. Yes it is clear to me.


>
> > + if (IsA(plan, Gather))
> > + ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
> > initSetParam);
> > + else
> > + ((GatherMerge *) plan)->initParam =
> > bms_intersect(plan->lefttree->extParam, initSetParam);
> >
> >
> > I think the above code is to find out the common parameters that are
> prsent
> > in the external
> > and out params.
> >
>
> Here, we want to save all the initplan params that can be used below
> the gather node.  extParam contains the set of all external PARAM_EXEC
> params that can be used below gather node and we just want initplan
> params out of those.
>
> > It may be better to explain the logic in the comments.
> >
>
> I have kept comments atop of the function set_param_references to
> explain the context, but now I have added few more in the code as
> suggested by you.  See if that suffices the need.


Thanks for adding more details. It is easy to understand.

I marked the patch as ready for committer in the commitfest.


Regards,

Re: [HACKERS] assorted code cleanup

2017-08-21 Thread Michael Meskes
> And there are many "(0)" "S_ANYTHING" in src/interfaces/ecpg/test/ and
> src/interfaces/ecpg/preproc/.

I might have missed something here, but where/why is S_ANYTHING a problem?

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL




[HACKERS] Document pgstattuple privileges without ambiguity

2017-08-21 Thread Feike Steenbergen
Hi,

When installing pgstattuple on 10, the documentation about its
privileges was unclear to me. (Does the pg_stat_scan_tables role get
EXECUTE privileges by default or not?).

By making the privilege paragraph less verbose and a duplicate of the
paragraph used for pg_freespacemap and pg_buffercache, we remove the
ambiguity and make the documentation more uniform.

The replacement paragraph is much less verbose and loses some detailed
pointers (to GRANT syntax), but in this instance I feel less is more.
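
For reference, the two kinds of grants involved look like this (the role
monitor_user is hypothetical; whether pg_stat_scan_tables already has
EXECUTE by default is exactly the ambiguity in question):

-- Either grant the predefined monitoring role to a user ...
GRANT pg_stat_scan_tables TO monitor_user;

-- ... or grant EXECUTE on an individual function.
GRANT EXECUTE ON FUNCTION pgstattuple(regclass) TO monitor_user;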

Regards,

Feike


pgstattuple_privilege_documentation_v1.patch
Description: Binary data



Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-21 Thread Chris Travers
On Sun, Aug 20, 2017 at 4:10 AM, MauMau  wrote:

> From: Chris Travers
> > Why cannot you do all this in a language handler and treat as a user
> defined function?
> > ...
> > If you have a language handler for cypher, why do you need in_region
> or cast_region?  Why not just have a graph_search() function which
> takes in a cypher query and returns a set of records?
>
> The language handler is for *stored* functions.  The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query.  And they both assume that they are executed in SQL
> commands.
>

Sure, but stored functions can take arguments, such as a query string
which gets handled by the language handler.  There's absolutely no reason
you cannot declare a function in C that takes in a Cypher query and
returns a set of tuples.  And you can do a whole lot with preloaded shared
libraries if you need to.
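
A minimal sketch of what I mean (all names here are made up; it assumes a
shared library, say graph_lib, that implements the function in C):

CREATE FUNCTION graph_search(cypher text)
    RETURNS SETOF record
    AS 'graph_lib', 'graph_search'
    LANGUAGE C STRICT;

-- The caller supplies the result shape via a column definition list.
SELECT *
  FROM graph_search('MATCH (n) RETURN n.name, n.age')
    AS t(name text, age int);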

The planning bit is more difficult, but see below as to where I see major
limits here.

>
> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.
>

That sounds like a foot gun.  I would probably think of those cases as
being ideal for a custom background worker, similar to Mongress.
Expecting to be able to switch query languages on the fly strikes me as
adding totally needless complexity everywhere, to be honest.  Having
different listeners on different ports simplifies this a lot, and
additionally having query languages for ad-hoc mixing via language
handlers might already get you most of what you want.

>
> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary.  To take the query example in my first mail, which joins a
> relational table and the result of a graph query.  The relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.
>

It seems like all you really need is a planner hook for user-defined
languages (i.e. "how many rows does this function return with these
parameters", right?).  Right now we allow hints, but they are static.  I
wonder how hard this would be using preloaded shared libraries.
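
By static hints I mean the per-function COST and ROWS estimates, which
the planner takes at face value regardless of the actual arguments
(reusing the hypothetical graph_search() sketched above):

ALTER FUNCTION graph_search(text) COST 1000 ROWS 500;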


>
> So in_region() and cast_region() are not functions to be executed
> during execution phase, but are syntax constructs that are converted,
> during analysis phase, into calls to another region's parser/analyzer
> and an inter-model cast routine.
>

So basically they work like immutable functions except that you cannot
index the output?

>
> 1. The relational parser finds in_region('cypher_graph', 'graph
> query') and produces a parse node InRegion(region_name, query) in the
> parse tree.
>
> 2. The relational analyzer looks up the system catalog to checks if
> the specified region exists, then calls its parser/analyzer to produce
> the query tree for the graph query fragment.  The relational analyser
> attaches the graph query tree to the InRegion node.
>
> 3. When the relational planner finds the graph query tree, it passes
> the graph query tree to the graph planner to produce the graph
> execution plan.
>
> 4. The relational planner produces a join plan node, based on the
> costs/statistics of the relational table scan and graph query.  The
> graph execution plan is attached to the join plan node.
>
> The parse/query/plan nodes have a label to denote a region, so that
> appropriate region's routines can be called.
>

It would be interesting to see how much of what you want you can get with
what we currently have and what pieces are really missing.

Am I right that if you wrote a function in C to take a Cypher query, plan
and analyse it, and execute it, the only thing really missing would be
feedback to the PostgreSQL planner regarding the number of rows expected?

>
> Regards
> MauMau
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] expanding inheritance in partition bound order

2017-08-21 Thread Amit Langote
On 2017/08/21 13:11, Ashutosh Bapat wrote:
> On Sat, Aug 19, 2017 at 1:21 AM, Robert Haas  wrote:
>> On Fri, Aug 18, 2017 at 1:17 AM, Ashutosh Bapat
>>  wrote:
>>> 0004 patch in partition-wise join patchset has code to expand
>>> partition hierarchy. That patch is expanding inheritance hierarchy in
>>> depth first manner. Robert commented that instead of depth first
>>> manner, it will be better if we expand it in partitioned tables first
>>> manner. With the latest changes in your patch-set I don't see the
>>> reason for expanding in partitioned tables first order. Can you please
>>> elaborate if we still need to expand in partitioned table first
>>> manner? May be we should just address the expansion issue in 0004
>>> instead of dividing it in two patches.
>>
>> Let me see if I can clarify.  I think there are three requirements here:
>>
>> A. Amit wants to be able to prune leaf partitions before opening and
>> locking those relations, so that pruning can be done earlier and,
>> therefore, more cheaply.
> 
> We could actually prune partitioned tables thus pruning whole
> partitioned tree. Do we want to then lock those partitioned tables but
> not the leaves in that tree?

I think it would be nice to keep the current approach of expanding the
whole partition tree in expand_inherited_rtentry(), at least to know how
many more entries a given partitioned table will add to the query's range
table.  That way, we don't have to worry *right away* about modifying the
planner to cope with some new behavior whereby range table entries get
added at some later point.

Then, as you might already know, if we want to use the partition bound
order when expanding the whole partition tree, we will depend on the
relcache entries of the partitioned tables in that tree, which will
require us to take locks on them.

It does sound odd that we may end up locking a child *partitioned* table
that is potentially prune-able, but maybe there is some way to relinquish
that lock once we find out that it is pruned after all.
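
(For concreteness, here is a two-level partition tree of the kind being
discussed, with a child that is itself partitioned; the names are made up:

CREATE TABLE tbl (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE tbl_p1 PARTITION OF tbl
    FOR VALUES FROM (0) TO (100) PARTITION BY RANGE (b);
CREATE TABLE tbl_p1_p1 PARTITION OF tbl_p1 FOR VALUES FROM (0) TO (50);
CREATE TABLE tbl_p2 PARTITION OF tbl FOR VALUES FROM (100) TO (200);

Here tbl_p1 is a partitioned child that might itself turn out to be
prune-able.)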

>> B. Partition-wise join wants to expand the inheritance hierarchy a
>> level at a time instead of all at once, ending up with rte->inh = true
>> entries for intermediate partitioned tables.
> 
> And create AppendRelInfos which pair children with their partitioned
> parent rather than the root.

There should be *some* way to preserve the parent-child RT index mapping
and to preserve the multi-level hierarchy, a way that doesn't map all the
child tables in a partition tree to the root table's RT index.

AppendRelInfo is one way of doing that mapping currently, but if we
continue to treat it as the only way (for the purpose of mapping), we will
be stuck with the way they are created and manipulated.  Especially, if we
are going to always depend on the fact that root->append_rel_list contains
all the required AppendRelInfos, then we will always have to fully expand
the inheritance in expand_inherited_rtentry() (by fully I mean, locking
and opening all the child tables, instead of just the partitioned tables).
In a world where we don't want to open the partition child tables in
expand_inherited_rtentry(), we cannot build the corresponding
AppendRelInfos there.  Note that this is not about completely dispensing
with AppendRelInfos for partition child tables, but about doing without
them being present in root->append_rel_list.  We would still need them to
be able to use adjust_appendrel_attrs(), etc., but we can create them at a
different time and store them in a place that's not root->append_rel_list;
for example, inside the RelOptInfo of the child table.  Or perhaps we can
still add them to root->append_rel_list, but we will need to be careful
about the places that depend on the timing of AppendRelInfos being present
there.

>> So in the end game I think
>> expand_inherited_rtentry looks approximately like this:
>>
>> 1. Calling find_all_inheritors with a new only-lock-the-partitions
>> flag.  This should result in locking all partitioned tables in the
>> inheritance hierarchy in breadth-first, low-OID-first order.  (When
>> the only-lock-the-partitions isn't specified, all partitioned tables
>> should still be locked before any unpartitioned tables, so that the
>> locking order in that case is consistent with what we do here.)
> 
> I am confused. When "only-lock-the-partitions" is true, do we expect
> intermediate partitioned tables to be locked? Why then "only" in the
> flag?

I guess Robert meant to say lock-only-"partitioned"-tables?

>> 2. Iterate over the partitioned tables identified in step 1 in the
>> order in which they were returned.  For each one:
>> - Decide which children can be pruned.
>> - Lock the unpruned, non-partitioned children in low-OID-first order.
>>
>> 3. Make another pass over the inheritance hierarchy, starting at the
>> root.  Traverse the whole hierarchy in breadth-first in *bound* order.
>> Add RTEs and 

Re: [HACKERS] Pluggable storage

2017-08-21 Thread Haribabu Kommi
On Sun, Aug 13, 2017 at 5:17 PM, Amit Kapila 
wrote:

> On Sat, Aug 12, 2017 at 10:31 AM, Haribabu Kommi
>  wrote:
> >>
> >> Why do we need to store handler function in TupleDesc?  As of now, the
> >> above patch series has it available in RelationData and
> >> TupleTableSlot, I am not sure if instead of that keeping it in
> >> TupleDesc is a good idea.  Which all kind of places require TupleDesc
> >> to contain handler?  If those are few places, can we think of passing
> >> it as a parameter?
> >
> >
> > Till now I am to able to proceed without adding any storage handler
> > functions to
> > TupleDesc structure. Sure, I will try the way of passing as a parameter
> when
> > there is a need of it.
> >
>
> Okay, I think it is better if you discuss such locations before
> directly modifying those.
>

Sure.  I will check with the community before making any such changes.



> > During the progress of the patch, I am facing problems in designing the
> > storage API
> > regarding the Buffer. For example To replace all the
> HeapTupleSatisfiesMVCC
> > and
> > related functions with function pointers, In HeapTuple format, the tuple
> may
> > belongs
> > to one buffer, so the buffer is passed to the HeapTupleSatifisifes***
> > functions along
> > with buffer, But in case of other storage formats, the single buffer may
> not
> > contains
> > the actual data.
> >
>
> Also, it is quite possible that some of the storage Am's don't even
> want to return bool as a parameter from HeapTupleSatisfies* API's.  I
> guess what we need here is to provide a way so that different storage
> am's can register their function pointer for an equivalent to
> satisfies function.  So, we need to change
> SnapshotData.SnapshotSatisfiesFunc in some way so that different
> handlers can register their function instead of using that directly.
> I think that should address the problem you are planning to solve by
> omitting buffer parameter.
>

Thanks for your suggestion. Yes, it is better to go in the direction of
SnapshotSatisfiesFunc.

I verified the above idea of implementing the tuple visibility functions
and assigning them to the SnapshotData structure based on the snapshot.

The tuple visibility functions that are specific to the relation are
available via the RelationData structure, and that structure may not
always be available, so I changed the SnapshotData structure to hold an
enum representing what type of snapshot it is, instead of storing the
pointer to the tuple visibility function.  Whenever there is a need to
check tuple visibility, the storage AM handler pointer corresponding to
the snapshot type is called and the result is obtained as before.


> This buffer is used to set the Hint bits and mark the
> > buffer as dirty.
> > In case if the buffer is not available, the performance may affect for
> the
> > following
> > queries if the hint bits are not set.
> >
>
> I don't think it is advisable to change that for the current heap.
>

I didn't change the prototypes of the existing functions.  Currently the
tuple visibility functions assume that the Buffer is always valid, but
that may not be correct depending on the storage.


>
> > And also the Buffer is used to get from heap_fetch, heap_lock_tuple and
> > related
> > functions to check the Tuple visibility, but currently returning a buffer
> > from the above
> > heap_** function is not possible for other formats.
> >
>
> Why not?  I mean if we consider that all the formats we are worried at
> this stage have TID (block number, tuple location), then we can get
> the buffer.  We might want to consider passing TID as a parameter to
> these API's if required to make that possible.  You also agreed above
> [1] that we can first design the API considering storage formats
> having TID.
>

The current approach is to support storage formats that have TIDs.
But what I mean here is that in some storage methods (for example, column
storage), the tuple is not present in a single buffer; the tuple data may
be assembled from many buffers and returned as a slot/storage tuple
(unless we change everything to use slots).

Any code downstream of the storage methods that expects a valid Buffer
may need changes to first check whether the buffer is valid and perform
its operations accordingly.


> > And also for the
> > HeapTuple data,
> > the tuple data is copied into palloced buffer instead of pointing
> directly
> > to the page.
> > So, returning a Buffer is a valid or not here?
> >
>
> Yeah, but I think for the sake of compatibility and not changing too
> much in the current API's signature, we should try to avoid it.
>

Currently I am trying to avoid changing the existing APIs' signatures.
Most of the signature changes are something like HeapTuple -> StorageTuple
and so on.



> > Currently I am proceeding to remove the Buffer as parameter in the API
> and
> > proceed
> > further, In case if it affects the performance, we need to find out a
> > 

[HACKERS] advanced partition matching algorithm for partition-wise join

2017-08-21 Thread Ashutosh Bapat
The patch-set in [1] supports partition-wise join when the partition bounds and
partition keys of the joining tables exactly match. The last two patches in the
last few patch-sets in that thread implement more
advanced partition matching code. In order to avoid mixing reviews for advanced
partition matching and the basic partition-wise join implementation, I am
starting a new thread to discuss the same. I am attaching the last two
patches from that patch set here.

The new partition matching algorithm handles the following cases, as long
as a given partition on one side has at most one matching partition on the
other side.

1. When the ranges of the joining tables do not match exactly. E.g. partition
table t1 has partitions t1p1 (0 - 100), t1p2 (150 - 200) and partition table t2
has partitions t2p1 (0 - 50), t2p2 (100 - 175). In this case (t1p1, t2p1) and
(t1p2, t2p2) form the matching partition pairs, which can be joined. While
matching the pairs, we also compute the partition bounds for the resulting
join. An INNER join between t1 and t2 will have the ranges (0 - 50), since no
row with 50 <= key < 100 from t1p1 is going to find a matching row in t2p1, and
(150 - 175), since no row with 100 <= key < 150 from t2p2 is going to find a
matching row in t1p2 and no row with 175 <= key < 200 in t1p2 is going to find
a matching row in t2p2. A t1 LEFT JOIN t2, on the other hand, will have the
same ranges as the outer relation t1, i.e. (0 - 100) and (150 - 200), since all
rows from t1 will be part of the join. Thus, depending upon the type of join,
the partition bounds of the resultant join relation change. (A concrete SQL
sketch of this range case follows the list below.) Similarly for list
partitioned tables, when the lists do not match exactly, the algorithm finds
matching pairs of partitions and the lists of the resultant join relation.
E.g. t1 has partitions t1p1 ('a', 'b', 'c'), t1p2 ('e', 'f') and t2 has
partitions t2p1 ('a', 'b'), t2p2 ('d', 'e', 'f'). In this case (t1p1, t2p1) and
(t1p2, t2p2) form the matching pairs which are joined. An inner join will have
bounds ('a','b'), ('e', 'f') and t1 LEFT JOIN t2 will have bounds same as t1.

2. When one or both sides have at least one partition that does not have a
matching partition on the other side. E.g. t1 has partitions t1p1 ('a','b'),
t1p2 ('c','d') and t2 has only one partition t2p1 ('a','b'); or t1 has
partitions t1p1 (0 - 100), t1p2 (100 - 200) and t2 has only one partition t2p1
(0 - 100). In this case as well, different types of joins will have different
partition bounds for the result, using rules similar to those described above.

3. A combination of 1 and 2 e.g. t1 has partitions t1p1('a','b','c'),
t1p2('d','e','f') and t2 has a single partition t2p1 ('a','b', 'z').
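
To make the range case in point 1 above concrete, here is the same example
in SQL (the column list is assumed; the bounds are as in the text):

CREATE TABLE t1 (key int, val text) PARTITION BY RANGE (key);
CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES FROM (0) TO (100);
CREATE TABLE t1p2 PARTITION OF t1 FOR VALUES FROM (150) TO (200);

CREATE TABLE t2 (key int, val text) PARTITION BY RANGE (key);
CREATE TABLE t2p1 PARTITION OF t2 FOR VALUES FROM (0) TO (50);
CREATE TABLE t2p2 PARTITION OF t2 FOR VALUES FROM (100) TO (175);

-- With the proposed patches, this join could be executed pair-wise as
-- (t1p1 JOIN t2p1) plus (t1p2 JOIN t2p2).
SELECT * FROM t1 JOIN t2 ON t1.key = t2.key;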

Algorithm
-
The pairs of matching partitions and the partition bounds of the join are
calculated by an algorithm similar to merge join.

In such a join, it can be observed that every partition on either side
contributes to at most one partition of the resultant join relation. Thus
for every partition on either side, we keep track of the partition of the
resultant join (if any) which it contributes to.  If multiple partitions
from either of the joining relations map to a single partition of the
resultant join, we need to gang those partitions together before joining
them with the partition/s from the other side. Since we do not have
infrastructure for ganging multiple arbitrary RelOptInfos together in a
parent RelOptInfo, we do not support such a partition-wise join right now.
We stop merging the bounds immediately when we detect such a case.

For list partitioned tables, we compare list values from both the sides,
starting with the lowest. If the two list values being compared match,
corresponding partitions from both sides form a pair of partitions to be
joined. We record this mapping and also include the list value in join bounds.
If the two list values do not match and the lower of those two comes from the
outer side of the join, we include it in the join bounds. We advance to
the next list value on the side with the lower value, continuing the
process of merging until the list values on at least one side are
exhausted. If the remaining
values are from the outer side, we include those in the join partition bounds.
Every list value included in the join bounds, and its originating partition/s
are associated with appropriate partition of the resultant join. For more
details please see partition_list_bounds_merge() in the attached patch.

In case of range partitioned tables, we compare the ranges of the partitions in
increasing order of their bounds. If two ranges being compared overlap,
corresponding partitions from both sides form a pair of partitions to be
joined. We record this mapping and also include the merged range in the bounds
of resultant join. The overlapping ranges are merged based on the type of join
as described above. If either of the ranges completely precedes the other, and
it's on the outer side, we include that range in the bounds of resultant join.
We advance to the next range on the 

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-08-21 Thread Mithun Cy
On Fri, Aug 18, 2017 at 9:43 PM, Robert Haas  wrote:
> On Fri, Aug 18, 2017 at 2:23 AM, Mithun Cy  wrote:
> Ah, good catch.  While I agree that there is probably no great harm
> from skipping the lock here, I think it would be better to just avoid
> throwing an error while we hold the lock.  I think apw_dump_now() is
> the only place where that could happen, and in the attached version,
> I've fixed it so it doesn't do that any more.  Independent of the
> correctness issue, I think the code is easier to read this way.
>
> I also realize that it's not formally sufficient to use
> PG_TRY()/PG_CATCH() here, because a FATAL would leave us in a bad
> state.  Changed to PG_ENSURE_ERROR_CLEANUP().
>
>> 2) I also found one issue which was my own mistake in my previous patch 19.
>> In "apw_dump_now" I missed calling FreeFile() on first write error,
>> whereas on othercases I am already calling the same.
>> ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
>> + if (ret < 0)
>> + {
>> + int save_errno = errno;
>> +
>> + unlink(transient_dump_file_path);
>
> Changed in the attached version.

Thanks for the patch. I have tested the above fix and it now works as
described.  From my testing the patch looks good; I did not find any
other issues.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [COMMITTERS] pgsql: Account for catalog snapshot in PGXACT->xmin updates.

2017-08-21 Thread Simon Riggs
On 19 August 2017 at 20:54, Noah Misch  wrote:
> On Tue, Dec 06, 2016 at 01:59:18PM -0500, Robert Haas wrote:
>> On Tue, Dec 6, 2016 at 1:36 PM, Tom Lane  wrote:
>> > Robert Haas  writes:
>> >> On Tue, Nov 15, 2016 at 3:55 PM, Tom Lane  wrote:
>> >>> Account for catalog snapshot in PGXACT->xmin updates.
>> >
>> >> It seems to me that this makes it possible for
>> >> ThereAreNoPriorRegisteredSnapshots() to fail spuriously.  The only
>> >> core code that calls that function is in copy.c, and apparently we
>> >> never reach that point with the catalog snapshot set.  But that seems
>> >> fragile.
>
> I recently noticed this by way of the copy2 regression test failing, in 9.4
> and later, under REPEATABLE READ and SERIALIZABLE.  That failure started with
> $SUBJECT.  Minimal example:
>
> CREATE TABLE vistest (a text);
> BEGIN ISOLATION LEVEL REPEATABLE READ;
> TRUNCATE vistest;
> COPY vistest FROM stdin CSV FREEZE;
> x
> \.
>
> Before $SUBJECT, the registered snapshot count was one.  Since $SUBJECT, the
> catalog snapshot raises the count to two.
>
>> > Hm.  If there were some documentation explaining what
>> > ThereAreNoPriorRegisteredSnapshots is supposed to do, it would be possible
>> > for us to have a considered argument as to whether this patch broke it or
>> > not.  As things stand, it is not obvious whether it ought to be ignoring
>> > the catalog snapshot or not; one could construct plausible arguments
>
> The rows COPY FREEZE writes will be visible (until deleted) to every possible
> catalog snapshot, so whether CatalogSnapshot==NULL doesn't matter.  It may be
> useful to error out if a catalog scan is in-flight, but the registration for
> CatalogSnapshot doesn't confirm or refute that.
>
>> > either way.  It's not even very obvious why both 0 and 1 registered
>> > snapshots should be considered okay; that looks a lot more like a hack
>> > than reasoned-out behavior.  So I'm satisfied if COPY FREEZE does what
>
> Fair summary.  ThereAreNoPriorRegisteredSnapshots() has functioned that way
> since an early submission[1], and I don't see that we ever discussed it
> explicitly.  Adding Simon for his recollection.

The intention, IIRC, was to allow the current snapshot (i.e. 1) but no
earlier (i.e. prior) snapshots (so not >= 2).

The name was chosen so it was (hopefully) clear what it did --
ThereAreNoPriorRegisteredSnapshots
...but that was before we had MVCC scans on catalogs, which changed
things in unforeseen ways.

The right fix is surely to take a more principled approach and rename
the function so that it reflects the current snapshot types.

As Noah mentions, there are potential anomalies in other transactions,
but this was understood, which is why we have a specific command
keyword to acknowledge user acceptance of these behaviours.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] path toward faster partition pruning

2017-08-21 Thread Amit Langote
I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one.  It's not fully cooked yet though.
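
As a concrete (and hypothetical) illustration of the goal: with plan-time
pruning, only the matching leaf partition should need to be opened, locked
and planned for a query like the last one below.

CREATE TABLE measurement (logdate date, value int)
    PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2017m07 PARTITION OF measurement
    FOR VALUES FROM ('2017-07-01') TO ('2017-08-01');
CREATE TABLE measurement_y2017m08 PARTITION OF measurement
    FOR VALUES FROM ('2017-08-01') TO ('2017-09-01');

-- Ideally only measurement_y2017m08 is considered here.
EXPLAIN SELECT * FROM measurement WHERE logdate = DATE '2017-08-21';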

Meanwhile, I thought I'd share a couple of patches that implement some
restructuring of the planner code related to partitioned table inheritance
planning that I think would be helpful.  They are to be applied on top of
the patches being discussed at [1].  Note that these patches themselves
don't implement the actual code that replaces constraint exclusion as a
method of performing partition pruning.  I will share that patch after
debugging it some more.

The main design goal of the patches I'm sharing here now is to defer the
locking and  opening of leaf partitions in a given partition tree to a
point after set_append_rel_size() is called on the root partitioned table.
 Currently, AFAICS, we need to lock and open the child tables in
expand_inherited_rtentry() only to set the translated_vars field in
AppendRelInfo that we create for the child.  ISTM, we can defer the
creation of a child AppendRelInfo to a point when it (its field
translated_vars in particular) will actually be used and so lock and open
the child tables only at such a time.  Although we don't lock and open the
partition child tables in expand_inherited_rtentry(), their RT entries are
still created and added to root->parse->rtable, so that
setup_simple_rel_arrays() knows the maximum number of entries
root->simple_rel_array will need to hold and allocates the memory for
that array accordingly.  Slots in simple_rel_array[] corresponding to
partition child tables will remain empty until they are filled in when
set_append_rel_size() is called on the root parent table and determines
the partitions that will actually be scanned.

The patch augments the existing PartitionedChildRelInfo node, which
currently holds only the partitioned child rel RT indexes, to carry some
more information about the partition tree, including the information
returned by RelationGetPartitionDispatchInfo() when it is called from
expand_inherited_rtentry() (per the proposed patch in [1], we call it to
be able to add partitions to the query tree in bound order).  Actually,
since PartitionedChildRelInfo now contains more information about the
partition tree than it used to, I thought the struct's name was no longer
apt, so I renamed it to PartitionRootInfo and renamed root->pcinfo_list
accordingly to prinfo_list.  That seems okay because we only use that
node internally.

Then during the add_base_rels_to_query() step, when build_simple_rel()
builds a RelOptInfo for the root partitioned table, it also initializes
some newly introduced fields in RelOptInfo from the information contained
in PartitionRootInfo of the table.  The aforementioned fields are only
initialized in RelOptInfos of root partitioned tables.  Note that the
add_base_rels_to_query() step won't add the partition "otherrel"
RelOptInfos yet (unlike the regular inheritance case, where they are,
after looking them up in root->append_rel_list).

When set_append_rel_size() is called on the root partitioned table, it
will call find_partitions_for_query(), which, using the partition tree
information, determines the partitions that will need to be scanned for
the query.  This processing happens recursively, that is, we first
determine the root-parent's partitions and then for each partition that's
partitioned, we will determine its partitions and so on.  As we determine
partitions in this per-partitioned-table manner, we maintain a pair
(parent_relid, list-of-partition-relids-to-scan) for each partitioned
table and also a single list of all leaf partitions determined so far.
Once all partitions have been determined, we turn to locking the leaf
partitions.  The locking happens in the order of OIDs as
find_all_inheritors would have returned in expand_inherited_rtentry(); the
list of OIDs in that original order is also stored in the table's
PartitionRootInfo node.  For each OID in that list, check if that OID is
in the set of leaf partition OIDs that was just computed, and if so, lock
it.  For all chosen partitions that are partitioned tables (including the
root), we create a PartitionAppendInfo node which stores the
aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and
append it to a list in the root table's RelOptInfo, with the root table's
PartitionAppendInfo at the head of the list.  Note that the list of
partitions in this pair contains only the immediate partitions, so that
the original parent-child relationship is reflected in the list of
PartitionAppendInfos thus collected.  The next patch that will implement
actual partition-pruning will add some more code that will run under
find_partitions_for_query().

set_append_rel_size() processing then continues for the root partitioned
table.  It is at this point that we will create the 

Re: [HACKERS] expanding inheritance in partition bound order

2017-08-21 Thread Amit Langote
Hi Amit,

On 2017/08/17 21:18, Amit Khandekar wrote:
> Anyways, some more comments :
> 
> In ExecSetupPartitionTupleRouting(), not sure why ptrinfos array is an
> array of pointers. Why can't it be an array of
> PartitionTupleRoutingInfo structure  rather than pointer to that
> structure ?

AFAIK, assigning pointers is less expensive than assigning structs, and we
end up doing a lot of assignments of that array's members to local
variables in get_partition_for_tuple(), for example.  Perhaps we could
avoid those assignments and implement it the way you suggest.

> diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
> + * Close all the leaf partitions and their indices.
> *
> Above comment needs to be shifted a bit down to the subsequent "for"
> loop where it's actually applicable.

That's right, done.

> * node->mt_partition_dispatch_info[0] corresponds to the root partitioned
> * table, for which we didn't create tupslot.
> Above : node->mt_partition_dispatch_info[0] => node->mt_ptrinfos[0]

Oops, fixed.

> /*
>  * XXX- do we need a pinning mechanism for partition descriptors
>  * so that there references can be managed independently of
>  * the parent relcache entry? Like PinPartitionDesc(partdesc)?
>  */
> pd->partdesc = partdesc;
> 
> Any idea if the above can be handled ? I am not too sure.

A similar mechanism exists for TupleDesc ref-counting (see the usage of
PinTupleDesc and ReleaseTupleDesc across the backend code.)  I too am
currently unsure if such an elaborate mechanism is actually *necessary*
for rd_partdesc.


Attached updated patches.

Thanks,
Amit
From 0cf8ab795fd3a8db462e8c692cfaa73f19e71ed6 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 24 Jul 2017 18:59:57 +0900
Subject: [PATCH 1/2] Decouple RelationGetPartitionDispatchInfo() from executor

Currently it and the structure it generates viz. PartitionDispatch
objects are too coupled with the executor's tuple-routing code.  In
particular, it's pretty undesirable that it makes it the responsibility
of the caller to release some resources, such as relcache references
and tuple table slots.  That makes it harder to use in places other
than where it's currently being used.

After this refactoring, ExecSetupPartitionTupleRouting() now needs to
do some of the work that was previously done in
RelationGetPartitionDispatchInfo() and expand_inherited_rtentry() no
longer needs to do some things that it used to.
---
 src/backend/catalog/partition.c| 309 +
 src/backend/commands/copy.c|  37 ++--
 src/backend/executor/execMain.c| 145 ++--
 src/backend/executor/nodeModifyTable.c |  35 ++--
 src/include/catalog/partition.h|  52 +++---
 src/include/executor/executor.h|   4 +-
 src/include/nodes/execnodes.h  |  53 +-
 7 files changed, 398 insertions(+), 237 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 96a64ce6b2..7618e4cb31 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -105,6 +105,24 @@ typedef struct PartitionRangeBound
 	bool		lower;			/* this is the lower (vs upper) bound */
 } PartitionRangeBound;
 
+/*---
+ * PartitionDispatchData - information of partitions of one partitioned table
+ *                         in a partition tree
+ *
+ * partkey      Partition key of the table
+ * partdesc     Partition descriptor of the table
+ * indexes      Array with partdesc->nparts members (for details on what the
+ *              individual value represents, see the comments in
+ *              RelationGetPartitionDispatchInfo())
+ *---
+ */
+typedef struct PartitionDispatchData
+{
+	PartitionKey	partkey;	/* Points into the table's relcache entry */
+	PartitionDesc	partdesc;	/* Ditto */
+	int			   *indexes;
+} PartitionDispatchData;
+
 static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
   void *arg);
 static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -981,181 +999,165 @@ get_partition_qual_relid(Oid relid)
 }
 
 /*
- * Append OIDs of rel's partitions to the list 'partoids' and for each OID,
- * append pointer rel to the list 'parents'.
- */
-#define APPEND_REL_PARTITION_OIDS(rel, partoids, parents) \
-   do\
-   {\
-   int i;\
-   for (i = 0; i < (rel)->rd_partdesc->nparts; i++)\
-   {\
-			(partoids) = lappend_oid((partoids), (rel)->rd_partdesc->oids[i]);\
-   (parents) = lappend((parents), (rel));\
-   }\
-   } while(0)
-
-/*
  * RelationGetPartitionDispatchInfo
- * Returns information necessary