Re: Loaded footgun open_datasync on Windows

2018-09-11 Thread Michael Paquier
On Mon, Sep 10, 2018 at 04:46:40PM +0200, Laurenz Albe wrote:
> I didn't get pg_upgrade to work without the log file hacks; I suspect
> that there is more than just log file locking going on, but my Windows
> skills are limited.
> 
> How shall I proceed?

I do like this patch, and we have an occasion to clean up a bunch of things
in pg_upgrade, so that argument is enough for me to put my hands in the
dirt and check by myself, so...

> I think that it is important to get pg_test_fsync to work correctly on
> Windows, and if my patch does not break the buildfarm, that's what it
> does.

This argument is sound, still... [ ... suspense ... ]

> I have attached a new version, the previous one was bit-rotted.

I really thought that this was not ambitious enough, so I have hacked on
top of your patch so that pg_upgrade's concurrency issues are removed, and I
have found one barrier in pg_ctl, which decides that it is smarter to
redirect the log file (here pg_upgrade_server.log) using CMD.  The
problem is that the lock taken by the process which does the redirection
does not play nicely with what pg_upgrade does in parallel.  So I think
that it is better to drop that part.

+#ifdef WIN32
+   if ((infile = fopen(path, "rt")) == NULL)
+#else
if ((infile = fopen(path, "r")) == NULL)
+#endif
This should have a comment saying roughly that, as this uses
win32_fopen, text mode needs to be enforced to get proper CRLF handling.
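For example, the added comment could read roughly like this (wording is only a
suggestion, not the final patch):

+#ifdef WIN32
+   /*
+    * This goes through win32_fopen, so text mode must be used here to get
+    * proper CRLF translation.
+    */
+   if ((infile = fopen(path, "rt")) == NULL)
+#else
    if ((infile = fopen(path, "r")) == NULL)
+#endif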

One spot for open() is missed in file_utils.c, please see
pre_sync_fname().

The patch fails to apply for pg_verify_checksums, with a conflict easy
enough to fix.

At the end I would be inclined to accept the patch proposed, knowing that
this would fix https://postgr.es/m/16922.1520722...@sss.pgh.pa.us
mentioned by Thomas upthread, as get_pgpid would do things in a shared
manner, putting an end to some of the random failures we've seen on the
buildfarm.

Laurenz, could you update your patch?  I am switching that as waiting on
author for now.
Thanks,
--
Michael


signature.asc
Description: PGP signature


RE: Protect syscache from bloating with negative cache entries

2018-09-11 Thread Ideriha, Takeshi
Hi, 

>Subject: Re: Protect syscache from bloating with negative cache entries
>
>Hello. The previous v4 patchset was just broken.

>Somehow the 0004 was merged into the 0003 and applying 0004 results in
>failure. I removed the 0004 part from the 0003 and rebased and reposted it.

I have some questions about syscache and relcache pruning
though they may be discussed at upper thread or out of point.

Can I confirm my understanding of catcache pruning?
syscache_memory_target is the maximum figure per CatCache
(every CatCache has the same maximum value), so the total maximum size of
the catalog caches is estimated at around, or slightly more than,
the number of SysCache arrays times syscache_memory_target.

If that is correct, I think writing the above estimation down in the
documentation would help database administrators estimate memory usage.
In my impression, the current description could lead to the misunderstanding
that syscache_memory_target is the total size of the catalog caches.

Related to the above, I was also thinking that changing syscache_memory_target
per CatCache would make memory usage more efficient.
Though I haven't checked whether there is a case where each system catalog
cache's memory usage varies largely, the pg_class cache might need more memory
than others, and others might need less.
But it would be difficult for users to check each CatCache's memory usage and
tune it, because right now PostgreSQL doesn't provide a handy way to check
them.
Another option is that users only specify the total memory target size and
Postgres dynamically changes each CatCache's memory target size according to a
certain metric (which still seems difficult and expensive to develop relative
to the benefit).
What do you think about this?

+   /*
+    * Set up pruning.
+    *
+    * We have two knobs to control pruning and a hash can share them of
+    * syscache.
+    *
+    */
+   if (flags & HASH_PRUNABLE)
+   {
+       hctl->prunable = true;
+       hctl->prune_cb = info->prune_cb;
+       if (info->memory_target)
+           hctl->memory_target = info->memory_target;
+       else
+           hctl->memory_target = _memory_target;
+       if (info->prune_min_age)
+           hctl->prune_min_age = info->prune_min_age;
+       else
+           hctl->prune_min_age = _prune_min_age;
+   }
+   else
+       hctl->prunable = false;

As you commented here, the GUC variables syscache_memory_target and
syscache_prune_min_age are used for both the syscache and the relcache
(HTAB), right?

Do the syscache and relcache have a similar amount of memory usage?
If not, I'm thinking that introducing a separate GUC variable would be fine.
The same goes for syscache_prune_min_age.

Regards,

Takeshi Ideriha
Fujitsu Limited





Re: [HACKERS] Secondary index access optimizations

2018-09-11 Thread David Rowley
On 12 September 2018 at 08:32, Konstantin Knizhnik
 wrote:
> Also the patch proposed by you is much simpler and does mostly the same. Yes,
> it is not covering CHECK constraints,
> but as far as partitioning now becomes the standard in Postgres, I do not think
> that many people will use the old inheritance mechanism and CHECK constraints.
> In any case, there are now many optimizations which work only for
> partitions, but not for inherited tables.

I've not had time to look at your updated patch yet, but one thing I
thought about after my initial review: imagine you have a setup like:

create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
create index listp_a_b_idx on listp (a,b);

and a query:

select * from listp where a = 1 order by b;

if we remove the "a = 1" qual, then listp_a_b_idx can't be used.

I didn't test this with your patch, but I guess that since the additional
quals are not applied to the children in set_append_rel_size(), by the
time set_append_rel_pathlist() is called and we go generating the paths,
the (a,b) index won't be any good.

Perhaps there's some workaround like inventing some sort of "no-op"
qual that exists in planning but never makes its way down to scans.
Although I admit to not having fully thought that idea through.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Restricting maximum keep segments by repslots

2018-09-11 Thread Masahiko Sawada
On Mon, Sep 10, 2018 at 7:19 PM, Kyotaro HORIGUCHI
 wrote:
> Hello.
>
> At Thu, 6 Sep 2018 19:55:39 +0900, Masahiko Sawada  
> wrote in 
>> On Thu, Sep 6, 2018 at 4:10 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Thank you for the comment.
>> >
>> > At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada  
>> > wrote in 
>> > 
>> >> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
>> >> Thank you for updating! Here is the review comment for v8 patch.
>> >>
>> >> +/*
>> >> + * This slot still has all required segments. Calculate how 
>> >> many
>> >> + * LSN bytes the slot has until it loses restart_lsn.
>> >> + */
>> >> +fragbytes = wal_segment_size - (currLSN % wal_segment_size);
>> >> +XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, fragbytes,
>> >> +wal_segment_size, *restBytes);
>> >>
>> >> For the calculation of fragbytes, I think we should calculate the
>> >> fragment bytes of restartLSN instead. The formula "restartSeg +
>> >> limitSegs - currSeg" means the # of segments between restartLSN and the
>> >> limit set by the new parameter. I don't think that the summation of it and
>> >> the fragment bytes of the current LSN is correct. As the following result
>> >> (max_slot_wal_keep_size is 128MB) shows, the remain column shows the
>> >> actual remains + 16MB (get_bytes function returns the value of
>> >> max_slot_wal_keep_size in bytes).
>> >
>> > Since the oldest segment is removed after the current LSN moves to
>> > the next segment, the current LSN naturally determines the fragment
>> > bytes. Maybe your concern is that the number of segments looks
>> > too large by one segment.
>> >
>> > One arguable point of the feature is how max_slot_wal_keep_size
>> > works exactly. I assume that even though the name is prefixed with
>> > "max_", we actually expect that "at least that many bytes are
>> > kept". So, for example, with 16MB of segment size and 50MB of
>> > max_s_w_k_s, I designed this so that the size of preserved WAL
>> > doesn't go below 50MB, actually (rounding 50MB up to a multiple of
>> > 16MB), and the oldest segment is lost when it reaches 64MB +
>> > 16MB = 80MB as you saw.
>> >
>> > # I believe that the difference is not so significant since we
>> > # have around a hundred or several hundred segments in common
>> > # cases.
>> >
>> > Do you mean that we should define the GUC parameter literally as
>> > "we won't have exactly that many bytes of WAL segments"? That is,
>> > we have at most 48MB of preserved WAL records for a 50MB
>> > max_s_w_k_s setting. This is the same as how max_wal_size is
>> > counted, but I don't think max_slot_wal_keep_size will be regarded
>> > in the same way.
>>
>> I might be missing something but what I'm expecting from this feature is
>> to restrict how much WAL we can keep at a maximum for replication
>> slots. In other words, the distance between the current LSN and the
>> minimum restart_lsn of replication slots doesn't go over the value of
>> max_slot_wal_keep_size.
>
> Yes, it's one possible design, the same as "we won't have more
> than exactly that many bytes of WAL segments" above ("more than"
> is added, which is what I meant). But anyway we cannot keep the limit
> strictly since WAL segments are removed only at checkpoint
> time.

Agreed. It should be something like a soft limit.

> So if we did that, we could reach the lost state before
> max_slot_wal_keep_size is filled up, while WAL can still exceed the
> size during a WAL flood. We can define it precisely at most as "WAL
> segments are preserved at most around the value".  So I chose
> the definition so that we can describe this as "we don't
> guarantee more than that many bytes".

Agreed.

>
> # Uuuu. sorry for the possibly hard-to-read sentence..
>
>>  It's similar to wal_keep_segments except for
>> that this feature affects only replication slots. And
>
> It defines the *extra* segments to be kept, that is, if we set it
> to 2, at least 3 segments are present. If we set
> max_slot_wal_keep_size to 32MB (= 2 segs here), we have at most 3
> segments, since a 32MB range before the current LSN almost always
> spans 3 segments. Doesn't this seem to work in a similar way to
> wal_keep_segments?

Yeah, that's fine with me. wal_keep_segments works regardless of the
existence of replication slots. If we have replication slots and set
both settings, we can reserve extra WAL up to
max(wal_keep_segments, max_slot_wal_keep_size).

>
> If the current LSN is at the very end of a segment and
> restart_lsn is catching up to the current LSN, the "remain" is
> equal to max_slot_wal_keep_size as the guaranteed size. If it is at
> the very beginning of a segment, it gets an extra 16MB.

Agreed.
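To make the 16MB/50MB example above concrete, here is a small stand-alone
illustration of the rounding being described (my own sketch, not code from
the patch):

#include <stdio.h>

int
main(void)
{
	long		wal_segment_size = 16L * 1024 * 1024;		/* 16MB segments */
	long		max_slot_wal_keep_size = 50L * 1024 * 1024; /* 50MB setting */
	long		limit_segs;

	/* round the setting up to whole segments: 50MB -> 4 segments = 64MB */
	limit_segs = (max_slot_wal_keep_size + wal_segment_size - 1) / wal_segment_size;

	/*
	 * The oldest segment is removed only after the current LSN has moved
	 * into the next segment, so removal kicks in around 64MB + 16MB = 80MB
	 * behind the current position, as described above.
	 */
	printf("kept at least: %ld MB\n",
		   limit_segs * wal_segment_size / (1024 * 1024));
	printf("oldest segment removed at about: %ld MB\n",
		   (limit_segs + 1) * wal_segment_size / (1024 * 1024));
	return 0;
}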

>
>> wal_keep_segments cannot restrict WAL that replication slots are
>> holding. For example, with 16MB of segment size and 50MB of
>> max_slot_wal_keep_size, we can keep at most 50MB WAL for replication
>> slots. However, once we consumed more 

Re: Retrieve memory size allocated by libpq

2018-09-11 Thread Pavel Stehule
2018-09-12 0:59 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > The implementation of PQresultMemsize is simple and correct.
>
> Hm, well, only for small values of "correct".  A quick look at PQclear
> to see just what it clears shows that PGEvent objects associated with the
> PGresult were overlooked.


My mistake. I thought the event data were independent.


> While that's not hard to fix (and I did so),
> there's a problem hiding behind that: we have no idea what sort of
> "instance data" the external event procedures may have associated with
> the PGresult.  In principle, any space occupied by such data needs to
> be accounted for in this number.
>

> This isn't an issue if the event procedure takes the lazy man's approach
> of using PQresultAlloc to allocate such data.  That's recommendable anyway
> since it saves having to write cleanup logic; and even if you don't do it
> like that, the space involved may be negligible.  Still, I can foresee an
> eventual need to let event procedures adjust the result-size number,
> along the lines of PQincrementMemorySize(PGresult *res, size_t delta).
> But that can probably wait till somebody actually requests it.
>
> I didn't like the name "PQresultMemsize" much either, and changed it to
> "PQresultMemorySize", which seems more in keeping with naming and
> capitalization conventions elsewhere in libpq.
>
> Pushed with those and some cosmetic changes.
>

Thank you

Pavel


> regards, tom lane
>


Re: stat() on Windows might cause error if target file is larger than 4GB

2018-09-11 Thread Michael Paquier
On Tue, Sep 11, 2018 at 11:27:08PM -0400, Tom Lane wrote:
> Michael Paquier  writes:
>> It is possible to get away with the error by using _stat64(), which
>> returns as result a _stat64 structure.  Still, it has one difference
>> with the native result returned by stat() (which maps to _stat64i32) as
>> st_size is a 32-bit integer in _stat64i32, and a 64-bit integer with
>> _stat64.  This mess is mixed also with the fact that pgwin32_safestat
>> relies on a result stored in _stat, so we'd lose the 32 high bits from
>> the size if we only do a direct mapping, which is bad.
> 
> Could we fix things so that Postgres thinks that struct stat is
> struct __stat64?  That is, act as though that variant is the native
> stat structure?

That's exactly what I would like to do and what I meant two paragraphs
above as that's the only solution I think would be clean, and this would
take care of st_size nicely.  I have not tested (yet), but some tweaks
in win32_port.h could be enough before mapping lstat to stat?
--
Michael


signature.asc
Description: PGP signature


Re: stat() on Windows might cause error if target file is larger than 4GB

2018-09-11 Thread Tom Lane
Michael Paquier  writes:
> At the end, I have finally been able to put my hands on a Windows VM
> which uses VS2015, and I am able to see the problem.  In short, Windows
> definition of stat() is an utter mess as this documentation page, which
> is the latest one available, nicely summarizes:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?view=vs-2017

Egad.

> It is possible to get away with the error by using _stat64(), which
> returns as result a _stat64 structure.  Still, it has one difference
> with the native result returned by stat() (which maps to _stat64i32) as
> st_size is a 32-bit integer in _stat64i32, and a 64-bit integer with
> _stat64.  This mess is mixed also with the fact that pgwin32_safestat
> relies on a result stored in _stat, so we'd lose the 32 high bits from
> the size if we only do a direct mapping, which is bad.

Could we fix things so that Postgres thinks that struct stat is
struct __stat64?  That is, act as though that variant is the native
stat structure?

regards, tom lane



Re: stat() on Windows might cause error if target file is larger than 4GB

2018-09-11 Thread Michael Paquier
On Tue, Sep 11, 2018 at 10:36:51AM +, Higuchi, Daisuke wrote:
> So, the attached patch helps me and the strange message disappeared,
> but I am ignoring the impact of this on others for now.

@@ -386,7 +386,6 @@ pgwin32_safestat(const char *path, struct stat *buf)
return -1;
}

-   return r;
}
Simply ignoring errors is not a solution, and makes things worse.

At the end, I have finally been able to put my hands on a Windows VM
which uses VS2015, and I am able to see the problem.  In short, Windows
definition of stat() is an utter mess as this documentation page, which
is the latest one available, nicely summarizes:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?view=vs-2017

It is possible to get away with the error by using _stat64(), which
returns as result a _stat64 structure.  Still, it has one difference
with the native result returned by stat() (which maps to _stat64i32) as
st_size is a 32-bit integer in _stat64i32, and a 64-bit integer with
_stat64.  This mess is mixed also with the fact that pgwin32_safestat
relies on a result stored in _stat, so we'd lose the 32 high bits from
the size if we only do a direct mapping, which is bad.

Getting full support for stat() with files larger than 4GB would be the
nicest solution, and requires roughly the following things I think:
- Use _stat64 in pgwin32_safestat.
- Enforce the returned struct stat to map to _stat64, so that st_size gets
the correct 64-bit size.

Postgres could live in a better world if Windows decided that stat() is
able to properly handle files bigger than 4GB on x64, but as
backward-compatibility matters a lot for Redmond's folks, it is hard to
believe that this is ever going to change.  I cannot blame the
compatibility argument either.
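For illustration, a rough sketch of the direction described above -- calling
_stat64() so that st_size is 64-bit -- could look like the following.  This is
only a sketch under the assumption that struct stat is eventually mapped to
struct __stat64 (for example via win32_port.h); it is not a proposed patch,
and the existing GetFileAttributesEx() workaround is only referenced in a
comment:

/* Windows/MSVC only; illustrative sketch, not the real pgwin32_safestat(). */
#include <sys/types.h>
#include <sys/stat.h>

static int
safestat_sketch(const char *path, struct __stat64 *buf)
{
	int			r;

	r = _stat64(path, buf);		/* st_size is 64-bit here */
	if (r < 0)
		return r;

	/*
	 * The existing pgwin32_safestat() workaround that re-fetches the file
	 * size with GetFileAttributesEx() when st_size reads as 0 would be kept
	 * here, combining nFileSizeHigh and nFileSizeLow into the 64-bit st_size.
	 */
	return 0;
}

int
main(void)
{
	struct __stat64 st;

	return (safestat_sketch("bigfile.dat", &st) == 0 && st.st_size >= 0) ? 0 : 1;
}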
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Can ICU be used for a database's default sort order?

2018-09-11 Thread Thomas Munro
On Sat, Jun 24, 2017 at 10:55 AM Peter Geoghegan  wrote:
> On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut
>  wrote:
> > It's something I hope to address soon.
>
> I hope you do. I think that we'd realize significant benefits by
> having ICU become the defacto standard collation provider, that most
> users get without even realizing it. As things stand, you have to make
> a point of specifying an ICU collation as your per-column collation
> within every CREATE TABLE. That's a significant barrier to adoption.
>
> > 1) Associate by name only.  That is, you can create a database with any
> > COLLATION "foo" that you want, and it's only checked when you first
> > connect to or do anything in the database.
> >
> > 2) Create shared collations.  Then we'd need a way to manage having a
> > mix of shared and non-shared collations around.
> >
> > There are significant pros and cons to all of these ideas.  Some people
> > I talked to appeared to prefer the shared collations approach.
>
> I strongly prefer the second approach. The only downside that occurs
> to me is that that approach requires more code. Is there something
> that I've missed?

Sorry to join this thread late.  I was redirected here from another one[1].

I like the shared catalog idea, but here's one objection I thought
about: it makes it a bit harder to track whether you've sorted out all
your indexes after a version change.  Say collation fr_CA's version
changes according to the provider, so that it no longer matches the
stored collversion.  Now you'll need to be careful to connect to every
database in the cluster and run REINDEX, before you run ALTER
COLLATION "fr_CA" REFRESH VERSION to update the single shared
pg_collation row's collversion.  With the non-shared pg_collation
scheme we have currently, you'd need to refresh the collation row in
each database after reindexing the whole database, which is IMHO a bit
nicer (you track which databases you've dealt with as you go through
them).

In other words, using a shared catalog moves the "scope" of the
version tracking even further away from the ideal scope, and requires
humans to actually get the cleanup right, and it's extra confusing
because you can only be connected to one database at a time so there
is no "REINDEX MY CLUSTER" and no possibility of making a command that
reindexes dependent indexes and then refreshes the collation version.

The ideal scope would be to track all referenced collation versions on
every index, and only update them at CREATE INDEX or REINDEX time
(also, as discussed in some other thread, CHECK constraints and
partition keys might be invalidated and should in theory also carry
versions that can only be updated by running a hypothetical RECHECK or
REPARTITION command).  Then a shared pg_collation catalog would make
perfect sense, and there would be no need for it to have a collversion
column at all, or an ALTER COLLATION ... REFRESH VERSION command, and
therefore there would be no way to screw it up by REFRESHing the
VERSION without having really fixed the problem.

[1] 
https://www.postgresql.org/message-id/242e081c-aec8-a20a-510c-f4d0f183cebd%402ndquadrant.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Retrieve memory size allocated by libpq

2018-09-11 Thread Tom Lane
Pavel Stehule  writes:
> The implementation of PQresultMemsize is simple and correct.

Hm, well, only for small values of "correct".  A quick look at PQclear
to see just what it clears shows that PGEvent objects associated with the
PGresult were overlooked.  While that's not hard to fix (and I did so),
there's a problem hiding behind that: we have no idea what sort of
"instance data" the external event procedures may have associated with
the PGresult.  In principle, any space occupied by such data needs to
be accounted for in this number.

This isn't an issue if the event procedure takes the lazy man's approach
of using PQresultAlloc to allocate such data.  That's recommendable anyway
since it saves having to write cleanup logic; and even if you don't do it
like that, the space involved may be negligible.  Still, I can foresee an
eventual need to let event procedures adjust the result-size number,
along the lines of PQincrementMemorySize(PGresult *res, size_t delta).
But that can probably wait till somebody actually requests it.

I didn't like the name "PQresultMemsize" much either, and changed it to
"PQresultMemorySize", which seems more in keeping with naming and
capitalization conventions elsewhere in libpq.

Pushed with those and some cosmetic changes.
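As a usage illustration of the function as committed (the connection string
is a placeholder):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");	/* placeholder conninfo */
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	res = PQexec(conn, "SELECT relname FROM pg_class");

	/* space allocated for the PGresult, including PQresultAlloc'd chunks */
	printf("result uses %zu bytes\n", PQresultMemorySize(res));

	PQclear(res);
	PQfinish(conn);
	return 0;
}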

regards, tom lane



Re: libpq stricter integer parsing

2018-09-11 Thread Michael Paquier
On Tue, Sep 11, 2018 at 07:03:41PM +0200, Fabien COELHO wrote:
> Attached Michael's simplified version updated with your remarks.

Okay, this version looks good to me so pushed.  Thanks Fabien and
Peter.
--
Michael


signature.asc
Description: PGP signature


Re: Stored procedures and out parameters

2018-09-11 Thread Peter Eisentraut
On 30/08/2018 21:35, Robert Haas wrote:
> The semantics you've chosen for procedures are more like Oracle than
> the existing function semantics, which, as I can attest from my work
> experience, can be very useful for users looking to migrate.  Worth
> noting, however, is Oracle also has those semantics for function
> calls.  So what you've ended up creating here is a situation where
> procedures behave more or less like they do in Oracle and the SQL
> standard, but functions behave the way they historically have in
> PostgreSQL.  That's kind of a weird incompatibility, and I think that
> incompatibility is a significant part of what people are complaining
> about.

> This probably should have been discussed in more detail before this
> got committed, but I guess that's water under the bridge at this
> point.

Note that OUT parameters in procedures are not implemented yet, so you
didn't miss any discussion, but it's a discussion I would like to have
in the future, and not in haste.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Collation versioning

2018-09-11 Thread Peter Eisentraut
On 07/09/2018 23:34, Thomas Munro wrote:
> On Thu, Sep 6, 2018 at 5:36 PM Thomas Munro
>  wrote:
>> On Thu, Sep 6, 2018 at 12:01 PM Peter Eisentraut
>>  wrote:
>>> Also, note that this mechanism only applies to collation objects, not to
>>> database-global locales.  So most users wouldn't be helped by this approach.
>>
>> Yeah, right, that would have to work for this to be useful.  I will
>> look into that.
> 
> We could perform a check up front in (say) CheckMyDatabase(), or maybe
> defer until the first string comparison.  The tricky question is where
> to store it.
> 
> 1.  We could add datcollversion to pg_database.
> 
> 2.  We could remove datcollate and datctype and instead store a
> collation OID.  I'm not sure what problems would come up, but for
> starters it seems a bit weird to have a shared catalog pointing to
> rows in a non-shared catalog.
> 
> The same question comes up if we want to support ICU as a database
> level default.  Add datcollprovider, or point to a pg_collation row?

This was previously discussed here:
https://www.postgresql.org/message-id/f689322a-4fc5-10cc-4a60-81f1ff016...@2ndquadrant.com
-- without a conclusion.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_ugprade test failure on data set with column with default value with type bit/varbit

2018-09-11 Thread Tom Lane
Paul Guo  writes:
> [ 0001-Fix-pg_upgrade-test-failure-caused-by-the-DDL-below.v2.patch ]

Actually, there's an even easier way to fix this, which is to discard
the special case for BITOID/VARBITOID altogether, and let the "default"
case handle it.  Fixing things by removing code is always great when
possible.

Also, it's fairly customary to add a test case that actually exhibits
the behavior you want to fix, so I added a regression test table
that has some bit/varbit columns with defaults.  I confirmed that that
makes the pg_upgrade test fail without this ruleutils change.

Pushed with those changes.

regards, tom lane



Re: [HACKERS] Secondary index access optimizations

2018-09-11 Thread Konstantin Knizhnik

Hi David,


On 11.09.2018 06:49, David Rowley wrote:

On 9 July 2018 at 13:26, David Rowley  wrote:

I started looking over this patch and have a few comments:

Hi Konstantin,

Wondering, are you going to be submitting an updated patch for this
commitfest?  If not then I think we can mark this as returned with
feedback as it's been waiting on author for quite a while now.



First of all, thank you for the review.
I am very sorry for the delay with my answer: I was on vacation in July and
just forgot about this mail.
I have to agree with you that it is better to split this patch into two 
and that using a range type for matching open and closed intervals is a good
idea.
Also the patch proposed by you is much simpler and does mostly the same.
Yes, it is not covering CHECK constraints,
but as far as partitioning now becomes the standard in Postgres, I do not
think that many people will use the old inheritance mechanism and CHECK
constraints. In any case, there are now many optimizations which work
only for partitions, but not for inherited tables.


I attach to this mail your patch combined with corrected test outputs.
Unfortunately the performance gain is not as large as I expected (even
without additional predicate_implied_by() smarts). On the following database:

create table bt (k integer, v integer) partition by range (k);
create table dt1 partition of bt for values from (1) to (10001);
create table dt2 partition of bt for values from (10001) to (20001);
create index dti1 on dt1(v);
create index dti2 on dt2(v);
insert into bt values (generate_series(1,2), generate_series(1,2));
analyze bt;

and pgbench script:

\set x random(1, 1)
select * from bt where k between 1 and 20001 and v=:x;

I got ~170k TPS with this patch and about ~160k TPS without it.
My hypothesis was that the redundant predicate has to be evaluated only once
(for the one selected record),
so it adds not that much overhead compared with the index search cost.
So I tried another version of the query which selects a large number of
records:


select sum(*) from bt where k between 1 and 20001 and v between :x and 
:x + 1000;


Now the patched version shows 23k TPS vs. 19k for vanilla.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5db1688..48359f4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -37,6 +37,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planner.h"
+#include "optimizer/predtest.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -1039,6 +1040,12 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 /* Restriction reduces to constant TRUE, so drop it */
 continue;
 			}
+
+			/* We can skip any quals that are implied by any partition bound. */
+			if (childrel->partition_qual != NIL &&
+predicate_implied_by(list_make1(rinfo->clause), childrel->partition_qual, false))
+continue;
+
 			/* might have gotten an AND clause, if so flatten it */
 			foreach(lc2, make_ands_implicit((Expr *) childqual))
 			{
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8369e3a..8cd9b06 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -450,7 +450,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	 */
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
-
+	else if (relation->rd_rel->relispartition)
+		rel->partition_qual = RelationGetPartitionQual(relation);
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 4f29d9f..91219fa 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1772,30 +1772,26 @@ explain (costs off) select * from list_parted where a is not null;
 -
  Append
->  Seq Scan on part_ab_cd
- Filter: (a IS NOT NULL)
->  Seq Scan on part_ef_gh
- Filter: (a IS NOT NULL)
->  Seq Scan on part_null_xy
  Filter: (a IS NOT NULL)
-(7 rows)
+(5 rows)
 
 explain (costs off) select * from list_parted where a in ('ab', 'cd', 'ef');
 QUERY PLAN
 --
  Append
->  Seq Scan on part_ab_cd
- Filter: ((a)::text = ANY ('{ab,cd,ef}'::text[]))
->  Seq Scan on part_ef_gh
  Filter: ((a)::text = ANY ('{ab,cd,ef}'::text[]))
-(5 rows)
+(4 rows)
 
 explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd');
-  QUERY PLAN   

+  QUERY PLAN  

Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-09-11 Thread Andres Freund
Hi,

On 2018-09-11 16:13:15 -0400, Andrew Dunstan wrote:
> I've pushed support for the latest MSVC compilers back to all live branches.

Thanks!

Greetings,

Andres Freund



Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-09-11 Thread Andrew Dunstan




On 08/24/2018 03:42 PM, Andrew Dunstan wrote:



On 08/24/2018 02:38 PM, Tom Lane wrote:

Andres Freund  writes:

On 2018-08-24 14:09:09 -0400, Andrew Dunstan wrote:
However, we only support VS2017 down to 9.6 and VS2015 down to 9.5.
Perhaps

we should consider backpatching support for those down to 9.3.

Hm, I have no strong objections to that.   I don't think it's strictly
necessary, given 2013 is supported across the board, but for the non-MSVC
world, we do fix compiler issues in older branches.  There's not that
much code for the newer versions afaict?

+1 for taking a look at how big a patch it would be.  But I kind of
thought we'd intentionally rejected back-patching some of those changes
to begin with, so I'm not sure the end decision will change.


The VS2017 patch applies cleanly to 9.5, so that seems easy. The 
VS2015 patch from 9.5 needs a very small amount of adjustment by the 
look of it for 9.3 and 9.4, after which I hope the VS2017 patch would 
again apply cleanly.


I'll try to put this together.

The trouble with not back-patching support to all live branches as new
versions come in is that it acts as a significant discouragement to
buildfarm owners to use the latest Visual Studio versions. I've never
argued strongly on this point before, but I think I'm going to be
inclined to in future.


Meanwhile, I will turn bowerbird back on but just for >= 9.6 for now.




I've pushed support for the latest MSVC compilers back to all live branches.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: libpq stricter integer parsing

2018-09-11 Thread Fabien COELHO



+   ...


I would leave this out.  We don't need to document every single
refinement of parsing rules.  This might better belong in the release notes.


Why not, I'd be fine with that. The fact that the libpq parser was quite
fuzzy was not documented; the behavior was really a side effect of the
implementation, which never suggested that it would work.



+   appendPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("invalid value for keyword \"%s\"\n"),
+ context);


Add the actual invalid value to the error message.


Indeed.

Attached Michael's simplified version updated with your remarks.
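As an illustration of the user-visible effect of the stricter parsing (the
connection string is a placeholder; the error text follows the message added
by the patch below):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* "2x" was previously read by atoi() as 2; it is now rejected */
	PGconn	   *conn = PQconnectdb("host=localhost connect_timeout=2x");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "%s", PQerrorMessage(conn));

	PQfinish(conn);
	return 0;
}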

--
Fabien.
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 42cdb971a3..d79f311241 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1587,6 +1587,34 @@ useKeepalives(PGconn *conn)
 	return val != 0 ? 1 : 0;
 }
 
+/*
+ * Parse and try to interpret "value" as an integer value, and if successful,
+ * store it in *result, complaining if there is any trailing garbage or an
+ * overflow.
+ */
+static bool
+parse_int_param(const char *value, int *result, PGconn *conn,
+const char *context)
+{
+	char   *end;
+	long	numval;
+
+	*result = 0;
+
+	errno = 0;
+	numval = strtol(value, &end, 10);
+	if (errno == 0 && *end == '\0' && numval == (int) numval)
+	{
+		*result = numval;
+		return true;
+	}
+
+	appendPQExpBuffer(&conn->errorMessage,
+	  libpq_gettext("invalid integer value \"%s\" for keyword \"%s\"\n"),
+	  value, context);
+	return false;
+}
+
 #ifndef WIN32
 /*
  * Set the keepalive idle timer.
@@ -1599,7 +1627,8 @@ setKeepalivesIdle(PGconn *conn)
 	if (conn->keepalives_idle == NULL)
 		return 1;
 
-	idle = atoi(conn->keepalives_idle);
+	if (!parse_int_param(conn->keepalives_idle, &idle, conn, "keepalives_idle"))
+		return 0;
 	if (idle < 0)
 		idle = 0;
 
@@ -1631,7 +1660,8 @@ setKeepalivesInterval(PGconn *conn)
 	if (conn->keepalives_interval == NULL)
 		return 1;
 
-	interval = atoi(conn->keepalives_interval);
+	if (!parse_int_param(conn->keepalives_interval, &interval, conn, "keepalives_interval"))
+		return 0;
 	if (interval < 0)
 		interval = 0;
 
@@ -1664,7 +1694,8 @@ setKeepalivesCount(PGconn *conn)
 	if (conn->keepalives_count == NULL)
 		return 1;
 
-	count = atoi(conn->keepalives_count);
+	if (!parse_int_param(conn->keepalives_count, &count, conn, "keepalives_count"))
+		return 0;
 	if (count < 0)
 		count = 0;
 
@@ -1698,13 +1729,15 @@ setKeepalivesWin32(PGconn *conn)
 	int			idle = 0;
 	int			interval = 0;
 
-	if (conn->keepalives_idle)
-		idle = atoi(conn->keepalives_idle);
+	if (conn->keepalives_idle &&
+		!parse_int_param(conn->keepalives_idle, &idle, conn, "keepalives_idle"))
+		return 0;
 	if (idle <= 0)
 		idle = 2 * 60 * 60;		/* 2 hours = default */
 
-	if (conn->keepalives_interval)
-		interval = atoi(conn->keepalives_interval);
+	if (conn->keepalives_interval &&
+		!parse_int_param(conn->keepalives_interval, &interval, conn, "keepalives_interval"))
+		return 0;
 	if (interval <= 0)
 		interval = 1;			/* 1 second = default */
 
@@ -1831,7 +1864,10 @@ connectDBComplete(PGconn *conn)
 	 */
 	if (conn->connect_timeout != NULL)
 	{
-		timeout = atoi(conn->connect_timeout);
+		if (!parse_int_param(conn->connect_timeout, &timeout, conn,
+			 "connect_timeout"))
+			return 0;
+
 		if (timeout > 0)
 		{
 			/*
@@ -1842,6 +1878,8 @@ connectDBComplete(PGconn *conn)
 			if (timeout < 2)
 timeout = 2;
 		}
+		else /* negative means 0 */
+			timeout = 0;
 	}
 
 	for (;;)
@@ -2108,7 +2146,9 @@ keep_going:		/* We will come back to here until there is
 			thisport = DEF_PGPORT;
 		else
 		{
-			thisport = atoi(ch->port);
+			if (!parse_int_param(ch->port, &thisport, conn, "port"))
+goto error_return;
+
 			if (thisport < 1 || thisport > 65535)
 			{
 appendPQExpBuffer(>errorMessage,


Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Andres Freund
On 2018-09-11 12:50:06 -0400, Tom Lane wrote:
> I am not sure which part of "I will not fix this" you didn't understand.

Maybe the "this is an open list, and we can discuss things" bit?



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-11 12:26:44 -0400, Tom Lane wrote:
>> Well, there remains the fact that we've seen no field reports that seem
>> to trace to failure-to-acquire-AEL since 9.6 came out.  So arguing that
>> this *could* be a probable scenario fails to comport with the available
>> evidence.

> It might even be that we've seen reports of this, but didn't attribute
> the errors correctly.

Yeah, that's certainly possible.  One good thing about the change I'm
recommending is that the symptom would be very clear (ie, "out of shared
memory" from the startup process).  If we do start getting reports of it,
we'd know where the problem is.

>> My inclination is to fix it as I've suggested and wait to see if there
>> are field complaints before expending a whole lot of effort to create
>> a better solution.  In any case, I am not willing to create that better
>> solution myself, and neither is Robert; are you?

> On master I'd be ok with that, but on the backbranches that seems like
> playing with user's setups.

I am not sure which part of "I will not fix this" you didn't understand.

*If* we get clear evidence that it happens for a non-negligible number
of users, I might be willing to reconsider.  Right now my reading of
the evidence is that it hasn't, and won't; so I judge that putting in
a complex mechanism to try to cope with the situation would be a net
loss for reliability.  Back-patching such a mechanism seems like it'd
be an even worse engineering decision.

regards, tom lane



Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-11 Thread Robert Haas
On Wed, Sep 5, 2018 at 1:05 AM, Tom Lane  wrote:
>> Would expanding this a bit further really be that noticeable?
>
> Frankly, I think it would be not so much "noticeable" as "disastrous".
>
> Making the overhead tolerable would require very large compromises
> in coverage, perhaps like "we'll only lock during DDL not DML".
> At which point I'd question why bother.  We've seen no field reports
> (that I can recall offhand, anyway) that trace to not locking these
> objects.

I think that would actually be a quite principled separation.  If your
DML fails with a strange error, that sucks, but you can retry it and
the problem will go away -- it will fail in some more sane way, or it
will work.  On the other hand, if you manage to create an object in a
no-longer-existing schema, you now have a database that can't be
backed up by pg_dump, and the only way to fix it is to run manual
DELETE commands against the PostgreSQL catalogs.  It's not even easy
to figure out what you need to DELETE, because there can be references
to pg_namespace.oid from zillions of other catalogs -- pg_catcheck
will tell you, but that's a third-party tool which many users won't
have and won't know that they should use.

I do agree with you that locking every schema referenced by any object
in a query would suck big time.  At a minimum, we'd need to extend
fast-path locking to cover AccessShareLock on schemas.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Slotification of partition tuple conversion

2018-09-11 Thread Amit Khandekar
On 3 September 2018 at 12:14, Amit Langote
 wrote:
> Hi Amit,
>
> Thanks for the updated patch and sorry I couldn't reply sooner.
>
> On 2018/08/21 16:18, Amit Khandekar wrote:
>> On 21 August 2018 at 08:12, Amit Langote  
>> wrote:
>>> Here are some comments on the patch:
>>
>> Thanks for the review.
>>
>>>
>>> +ConvertTupleSlot
>>>
>>> Might be a good idea to call this ConvertSlotTuple?
>>
>> I thought the name 'ConvertTupleSlot' emphasizes the fact that we are
>> operating rather on a slot without having to touch the tuple wherever
>> possible. Hence I chose the name. But I am open to suggestions if
>> there are more votes against this.
>
> Hmm, okay.
>
> The verb here is "to convert", which still applies to a tuple, the only
> difference is that the new interface accepts and returns TupleTableSlot,
> instead of HeapTuple.

Yeah, in that sense you are right. Let's see others' suggestions. For
now I haven't changed it.

> @@ -2691,12 +2861,14 @@ CopyFrom(CopyState cstate)
>
>   /*
>* We might need to convert from the parent rowtype to the
> -  * partition rowtype.
> +  * partition rowtype.  Don't free the already stored tuple as it
> +  * may still be required for a multi-insert batch.
>*/
>   tuple =
> ConvertPartitionTupleSlot(proute->parent_child_tupconv_maps[leaf_part_index],
>tuple,
>proute->partition_tuple_slot,
> -  );
> +  ,
> +  false);
>
> So the "already stored tuple" means a previous tuple that may have been
> stored in 'proute->partition_tuple_slot', which shouldn't be freed in
> ConvertPartitionTupleSlot (via ExecStoreTuple), because it might also have
> been stored in the batch insert tuple array.
>
> Now, because with ConvertTupleSlot, there is no worry about freeing an
> existing tuple, because we never perform ExecStoreTuple on
> proute->partition_tuple_slot (or one of the slots in it if we consider my
> patch at [1] which converts it to an array).  All I see applied to those
> slots is ExecStoreVirtualTuple() in ConvertTupleSlot(), and in some cases,
> ExecCopySlotTuple() in the caller of ConvertTupleSlot().
>
> So, I think that that sentence is obsolete with your patch.

Right. Removed the "Don't free the already stored tuple" part.


> I was saying, because we no longer use do_convert_tuple for
> "partitioning-related" tuple conversions, we could get rid of
> TupleConversionMaps in the partitioning-specific PartitionTupleRouting
> structure.

We use child_parent_tupconv_maps in AfterTriggerSaveEvent() where we
call do_convert_tuple() with the map.
We pass parent_child_tupconv_maps to adjust_partition_tlist(), which
requires the map->outdesc.
So there are these places where we still can't get rid of
TupleConversionMaps even for partition-related tuples.

Regarding adjust_partition_tlist(), we can perhaps pass the outdesc
from somewhere else rather than map->outdesc, making it possible to
use AttrNumber map rather than TupleConversionMap. But it needs some
investigation. Overall, I think both the above maps should be worked
on as a separate item, and not put in this patch that focusses on
ConvertTupleSlot().

I have changed the PartitionDispatch->tupmap type to AttrNumber[],
though. This was possible because we are using the tupmap->attrMap and
no other fields.

>
> Also, since ConvertTupleSlot itself just uses the attrMap field of
> TupleConversionMap, I was wondering if its signature should contain
> AttrNumber * instead of TupleConversionMap *?

Done.

>
> Maybe if we get around to getting rid of do_convert_tuple altogether, that
> would also mean getting rid of the TupleConversionMap struct, because its
> various tuple data related arrays would be redundant, because
> ConvertTupleSlot has input and output TupleTableSlots, which have space
> for that information.

Yeah agree. We will be working towards that.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


tup_convert_v4.patch
Description: Binary data


Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Andres Freund
On 2018-09-11 12:26:44 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-11 12:18:59 -0400, Tom Lane wrote:
> >> Doesn't matter: startup would hit a lock conflict and cancel the pg_dump
> >> to get out of it, long before approaching locktable full.
> 
> > Only if all that's happening in the same database, which is far from a
> > given.
> 
> Well, there remains the fact that we've seen no field reports that seem
> to trace to failure-to-acquire-AEL since 9.6 came out.  So arguing that
> this *could* be a probable scenario fails to comport with the available
> evidence.

True.  But it seems like it'd be really hard to actually encounter any
problems due to the failing lock, especially in a way that is visible
enough to be reported.  In most cases the missing AEL will let things
just continue to operate as normal, and in some cases you'll get errors
about not being able to access files.  I have *zero* trust that we'd
actually see this kind of error reported.

It might even be that we've seen reports of this, but didn't attribute
the errors correctly. There were a few reports about standbys having
corruptly replayed truncations - and a truncation that happens
concurrently with buffer accesses (due to the missing AEL) could,
e.g., explain that, by later re-enlarging the relation due to a
non-purged dirty block.


> My inclination is to fix it as I've suggested and wait to see if there
> are field complaints before expending a whole lot of effort to create
> a better solution.  In any case, I am not willing to create that better
> solution myself, and neither is Robert; are you?

On master I'd be ok with that, but on the backbranches that seems like
playing with user's setups.

Greetings,

Andres Freund



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-11 12:18:59 -0400, Tom Lane wrote:
>> Doesn't matter: startup would hit a lock conflict and cancel the pg_dump
>> to get out of it, long before approaching locktable full.

> Only if all that's happening in the same database, which is far from a
> given.

Well, there remains the fact that we've seen no field reports that seem
to trace to failure-to-acquire-AEL since 9.6 came out.  So arguing that
this *could* be a probable scenario fails to comport with the available
evidence.

My inclination is to fix it as I've suggested and wait to see if there
are field complaints before expending a whole lot of effort to create
a better solution.  In any case, I am not willing to create that better
solution myself, and neither is Robert; are you?

regards, tom lane



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Andres Freund
On 2018-09-11 12:18:59 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-11 12:03:44 -0400, Tom Lane wrote:
> >> If the startup process has acquired enough AELs to approach locktable
> >> full, any concurrent pg_dump has probably failed already, because it'd
> >> be trying to share-lock every table and so would have a huge conflict
> >> cross-section; it's hard to believe it wouldn't get cancelled pretty
> >> early in that process.  (Again, if you think this scenario is probable,
> >> you have to explain the lack of field complaints.)
> 
> > I was thinking of the other way round - there's a running pg_dump and
> > then somebody does a bit of DDL (say a DROP SCHEMA CASCADE in a
> > multi-tenant scenario).
> 
> Doesn't matter: startup would hit a lock conflict and cancel the pg_dump
> to get out of it, long before approaching locktable full.

Only if all that's happening in the same database, which is far from a
given.

Greetings,

Andres Freund



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-11 12:03:44 -0400, Tom Lane wrote:
>> If the startup process has acquired enough AELs to approach locktable
>> full, any concurrent pg_dump has probably failed already, because it'd
>> be trying to share-lock every table and so would have a huge conflict
>> cross-section; it's hard to believe it wouldn't get cancelled pretty
>> early in that process.  (Again, if you think this scenario is probable,
>> you have to explain the lack of field complaints.)

> I was thinking of the other way round - there's a running pg_dump and
> then somebody does a bit of DDL (say a DROP SCHEMA CASCADE in a
> multi-tenant scenario).

Doesn't matter: startup would hit a lock conflict and cancel the pg_dump
to get out of it, long before approaching locktable full.

regards, tom lane



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Andres Freund
Hi,

On 2018-09-11 12:03:44 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Isn't one of the most common ways to run into "out of shared memory"
> > "You might need to increase max_locks_per_transaction." to run pg_dump?
> > And that's pretty commonly done against standbys?
> 
> If the startup process has acquired enough AELs to approach locktable
> full, any concurrent pg_dump has probably failed already, because it'd
> be trying to share-lock every table and so would have a huge conflict
> cross-section; it's hard to believe it wouldn't get cancelled pretty
> early in that process.  (Again, if you think this scenario is probable,
> you have to explain the lack of field complaints.)

I was thinking of the other way round - there's a running pg_dump and
then somebody does a bit of DDL (say a DROP SCHEMA CASCADE in a
multi-tenant scenario).

Greetings,

Andres Freund



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-11 16:23:44 +0100, Simon Riggs wrote:
>> It's hard to see how any reasonable workload would affect the standby. And
>> if it did, you'd change the parameter and restart, just like you already
>> have to do if someone changes max_connections on master first.

> Isn't one of the most common ways to run into "out of shared memory"
> "You might need to increase max_locks_per_transaction." to run pg_dump?
> And that's pretty commonly done against standbys?

If the startup process has acquired enough AELs to approach locktable
full, any concurrent pg_dump has probably failed already, because it'd
be trying to share-lock every table and so would have a huge conflict
cross-section; it's hard to believe it wouldn't get cancelled pretty
early in that process.  (Again, if you think this scenario is probable,
you have to explain the lack of field complaints.)

The case where this behavior might really be of some use, IMO, is the
lots-of-small-transactions scenario --- but we lack the stop-new-locks
mechanism that would be needed to make the behavior actually effective
for that case.

regards, tom lane



Re: patch to allow disable of WAL recycling

2018-09-11 Thread Peter Eisentraut
On 10/09/2018 16:10, Jerry Jelinek wrote:
> Thank you again for running all of these tests on your various hardware
> configurations. I was not aware of the convention that the commented
> example in the config file is expected to match the default value, so I
> was actually trying to show what to use if you didn't want the default,
> but I am happy to update the patch so the comment matches the default.
> Beyond that, I am unsure what the next steps are for this proposal.

Could you organize the code so that the block below

/*
 * Initialize info about where to try to recycle to.
 */

isn't executed if recycling is off, since we don't need it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 11, 2018 at 10:25 AM, Tom Lane  wrote:
>> The point of the previous coding here was that perhaps there's some
>> range of number-of-locks-needed where kicking hot-standby queries
>> off of locks would allow recovery to proceed.  However, it is (as
>> far as I know) unproven that that actually works, let alone is
>> effective enough to justify maintaining very-hard-to-test code for.

> I think, though, that it is pretty obvious that the intermediate zone
> which you refer to as "perhaps" existing does indeed exist.  Queries
> running on the standby consume lock table slots, and killing them
> frees up those slots.  Q.E.D.

Sounds like lock whack-a-mole to me.  If there are enough standby queries
running to create the problem, there's probably a continuous inbound
query stream; you aren't going to be able to free up locktable space on
net without some way of suppressing new lock acquisitions altogether.
That's still more mechanism that'd have to be designed, written, and
tested.  And believe you me, I will insist on a test case, now that we
know this has been broken for at least two years with nobody noticing.

regards, tom lane



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Andres Freund
Hi,

On 2018-09-11 16:23:44 +0100, Simon Riggs wrote:
> It's hard to see how any reasonable workload would affect the standby. And
> if it did, you'd change the parameter and restart, just like you already
> have to do if someone changes max_connections on master first.

Isn't one of the most common ways to run into "out of shared memory"
"You might need to increase max_locks_per_transaction." to run pg_dump?
And that's pretty commonly done against standbys?

Greetings,

Andres Freund



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-11 Thread Fabien COELHO



Hello Marina,

Hmm, but we can say the same for serialization or deadlock errors that were 
not retried (the client test code itself could not run correctly or the SQL 
sent was somehow wrong, which is also the client's fault), can't we?


I think not.

If a client asks for something "legal", but some other client in parallel
happens to make an incompatible change which results in a serialization or
deadlock error, the clients are not responsible for the raised errors; it
is just that they happen to ask for something incompatible at the same
time. So there is no user error per se, but the server is reporting its
(temporary) inability to process what was asked for. For these errors,
retrying is fine. If the client were alone, there would be no such errors;
you cannot deadlock with yourself. This is really an isolation issue
linked to parallel execution.


Why not handle client errors that can occur (but they may also not 
occur) the same way? (For example, always abort the client, or 
conversely do not make aborts in these cases.) Here's an example of such 
error:



client 5 got an error in command 1 (SQL) of script 0; ERROR:  division by zero


This is an interesting case. For me we must stop the script because the
client is asking for something "stupid", and retrying the same won't
change the outcome: the division will still be by zero. It is the client's
responsibility not to ask for something stupid; the bench script is buggy,
it should not submit illegal SQL queries. This is quite different from
submitting something legal which happens to fail.


Maybe we can limit the number of failures in one statement, and abort the 
client if this limit is exceeded?...


I think this is quite debatable, and that the best option is to leave
this point out of the current patch, so that we can have retry on
serialization/deadlock errors.


Then you can submit another patch for a feature about other errors if you
feel that there is a use case for going on in some cases. I think that the
previous behavior made sense, and that changing it should only be
considered as an option. As it involves discussion and is not obvious,
later is better.


To get a clue about the actual issue you can use the options 
--failures-detailed (to find out out whether this is a serialization failure 
/ deadlock failure / other SQL failure / meta command failure) and/or 
--print-errors (to get the complete error message).


Yep, but for me it should have stopped immediately, as it did before.

If you use the option --latency-limit, the time of tries will be limited 
regardless of the use of the option -t. Therefore ISTM that an unlimited 
number of tries can be used only if the time of tries is limited by the 
options -T and/or -L.


Indeed, I'm ok with forbidding unlimited retries when under -t.


I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?


I'm not sure that the main debugging output will give a good clue of what's 
happened without full messages about errors, retries and failures...


I'm arguing more for letting the user decide what they want.


These lines are quite long - do you suggest to wrap them this way?


Sure, if it is too long, then wrap.

Function getTransactionStatus name does not seem to correspond fully to 
what the function does. There is a passthru case which should be either 
avoided or clearly commented.


I don't quite understand you - do you mean that in fact this function finds 
out whether we are in a (failed) transaction block or not? Or do you mean 
that the case of PQTRANS_INTRANS is also ok?...


The former: although the function is named "getTransactionStatus", it does 
not really return the "status" of the transaction (aka PQstatus()?).
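
(For readers following along: the libpq call in play here is
PQtransactionStatus(), which reports the transaction-block state, not
PQstatus(), which reports the connection state.  A minimal sketch of the
distinction -- the helper name below is hypothetical and not taken from the
patch:)

    #include <libpq-fe.h>
    #include <stdbool.h>

    /* Hypothetical helper: true when the connection sits in an aborted
     * transaction block, i.e. the only useful next command is ROLLBACK. */
    static bool
    in_failed_tx_block(PGconn *con)
    {
        switch (PQtransactionStatus(con))
        {
            case PQTRANS_INERROR:   /* in a failed transaction block */
                return true;
            case PQTRANS_IDLE:      /* not in a transaction block */
            case PQTRANS_INTRANS:   /* in a valid transaction block */
            case PQTRANS_ACTIVE:    /* a command is in progress */
            case PQTRANS_UNKNOWN:   /* bad connection */
            default:
                return false;
        }
    }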


I tried not to break the limit of 80 characters, but if you think that this 
is better, I'll change it.


Hmmm. 80 columns, indeed...


I'd insist in a comment that "cnt" does not include "skipped" transactions
(anymore).


If you mean CState.cnt I'm not sure if this is practically useful because the 
code uses only the sum of all client transactions including skipped and 
failed... Maybe we can rename this field to nxacts or total_cnt?


I'm fine with renaming the field if it makes things clearer. They are all 
counters, so naming them "cnt" or "total_cnt" does not help much. Maybe 
"succeeded" or "success" to show what is really counted?


--
Fabien.



Re: Flexible configuration for full-text search

2018-09-11 Thread Tom Lane
Aleksandr Parfenov  writes:
> As I wrote a few weeks ago, there is an issue with stopwords processing in
> the proposed syntax for full-text configurations. I want to separate word
> normalization and stopwords detection into two separate dictionaries. The
> problem is how to configure the stopword detection dictionary.

> The cause of the problem is counting stopwords, but not using any
> lexemes for them. However, do we have to count stopwords during word
> counting or can we ignore them like unknown words? The problem I see is
> backward compatibility, since we have to regenerate all queries and
> vectors. But is it a real problem, or can we change its behavior in this
> way?

I think there should be a pretty high bar for forcing people to regenerate
all that data when they haven't made any change of their own choice.

Also, I'm not very clear on exactly what you're proposing here, but it
sounds like it'd have the effect of changing whether stopwords count in
phrase distances ('a <-> b').  I think that's right out --- whether or not
you feel the current distance behavior is ideal, asking people to *both*
rebuild all their derived data *and* change their applications will cause
a revolt.  It's not sufficiently obviously broken that we can change it.

regards, tom lane



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Simon Riggs
On 11 September 2018 at 16:11, Robert Haas  wrote:

> On Tue, Sep 11, 2018 at 10:25 AM, Tom Lane  wrote:
> > The point of the previous coding here was that perhaps there's some
> > range of number-of-locks-needed where kicking hot-standby queries
> > off of locks would allow recovery to proceed.  However, it is (as
> > far as I know) unproven that that actually works, let alone is
> > effective enough to justify maintaining very-hard-to-test code for.
> > The field demand for such behavior can be measured by the number of
> > complaints we've had since breaking it in 9.6, namely zero.
>

Agreed.


> > So no, I do not want to re-implement and maintain that behavior on
> > the strength of just a guess that sometimes it might be useful.
> > If somebody else feels a burning passion for it, they can do the
> > work --- but I'd be inclined to argue that it'd be a HEAD-only
> > performance improvement, not a back-patchable bug fix.
>
> Mmph.  Well, I'm not in love with that position, because having the
> standby exit in such a way as to require manual intervention when an
> automated recovery strategy is possible is undesirable, but I'm not
> volunteering to do the work, either, so maybe we don't have many
> alternatives.
>
> I think, though, that it is pretty obvious that the intermediate zone
> which you refer to as "perhaps" existing does indeed exist.  Queries
> running on the standby consume lock table slots, and killing them
> frees up those slots.  Q.E.D.
>
> I suspect the reason why this hasn't come up much in practice is
> because (1) there are guards against setting various GUCs including
> max_connections and max_locks_per_transaction lower on the standby
> than they are set on the master (cf. CheckRequiredParameterValues) and
> (2) if those guards are not quite sufficient to ensure that the lock
> table on the standby is always as large there as it is on the master,
> it doesn't end up mattering because the number of AccessExclusiveLocks
> on relations is generally going to be a very small percentage of the
> total number of lock table slots.  But if somebody's interested in
> working on this, maybe we could construct a TAP test case that
> involves the master running "BEGIN; LOCK TABLE a1, a2, a3, a4, ...;"
> concurrently with some "select pg_sleep from empty1, empty2, ..."
> queries on the standby.


max_connections on the standby must be the same or higher than on the master.

standby users are not allowed to request strong locks, so the only strong
locks coming through are AccessExclusiveLocks from master.

max_locks_per_transaction has a minimum of 10 and it would be reasonable to
assume it is set the same or higher than on the master as well.

Workloads on master are also subject to memory errors, so excessive use of
locks on master would hit limits and that would naturally prune the
workload before it hit the standby.

It's hard to see how any reasonable workload would affect the standby. And
if it did, you'd change the parameter and restart, just like you already
have to do if someone changes max_connections on master first.

So I don't see any problem or anything abnormal in what Tom suggests.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bug report: Dramatic increase in conflict with recovery after upgrading 10.2->10.5

2018-09-11 Thread Justin Pryzby
On Mon, Sep 10, 2018 at 11:43:47AM +0200, Chris Travers wrote:
> At present I believe this to likely be a regression.  But if nobody else
> knows otherwise, I should know more in a couple days.

Do you have query logs or can you send details of the query ?

We're not using replication, but I can't help but think of this:
https://www.postgresql.org/message-id/flat/20180829140149.GO23024%40telsasoft.com
..since it's effectively a regression WRT reliability (at least if you reindex
pg_class).  Tom has a patch in HEAD to avoid the issue, but I don't know if
there's any plan to release 10.6 until November.

See related, earlier thread:
https://www.postgresql.org/message-id/flat/12259.1532117714%40sss.pgh.pa.us

You could compile your own binaries with Tom's patch applied (f868a81).
As you probably know, it's maybe not safe to install PG10.4 binaries on a data
dir where you've already upgraded to 10.5 (I believe because data file content
might be written which may not be handled correctly by an earlier minor release).

Justin



Re: [HACKERS] Bug in to_timestamp().

2018-09-11 Thread Arthur Zakirov
On Sun, Sep 09, 2018 at 09:52:46PM +0300, Alexander Korotkov wrote:
> So, pushed!  Thanks to every thread participant for review and feedback.

Great! Should we close the commitfest entry? There is the FX part of the
patch though. But it seems that nobody is happy with it. It could be
done with a separate patch anyway.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 11, 2018 at 5:54 AM, Simon Riggs  wrote:
>> Please explain why you think that would be with no restart.

> Because the startup process will die, and if that happens, IIRC,
> there's no crash-and-restart loop.  You're just done.

Unless we think that the startup process will never never ever throw
an error, that might be a behavior that needs discussion in itself.

Obviously an infinite crash-and-restart loop would be bad, but
perhaps the postmaster could have logic that would allow restarting
the startup process some small number of times.  I think the hard
part would be in deciding whether a previous restart had succeeded
(ie made progress beyond the prior crash point), so that it should
no longer count against the retry limit.
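
(A rough sketch of that bookkeeping, purely illustrative -- none of these
names exist in the tree, and where the "progress" marker would come from is
left open:)

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t RetryLSN;      /* stands in for XLogRecPtr in this sketch */

    typedef struct StartupRetryState
    {
        RetryLSN    last_crash_lsn; /* replay position at the previous crash */
        int         retries_left;
    } StartupRetryState;

    static bool
    allow_startup_restart(StartupRetryState *st, RetryLSN crash_lsn,
                          int max_retries)
    {
        /* Progress since the last crash resets the retry budget. */
        if (crash_lsn > st->last_crash_lsn)
            st->retries_left = max_retries;
        st->last_crash_lsn = crash_lsn;

        if (st->retries_left <= 0)
            return false;           /* give up, as the postmaster does today */
        st->retries_left--;
        return true;
    }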

regards, tom lane



Re: Hint to set owner for tablespace directory

2018-09-11 Thread Peter Eisentraut
On 07/09/2018 17:59, Maksim Milyutin wrote:
>> You are still proposing to hint that the permissions on the tablespace
>> directory might be correct but that the main database instance is
>> running under the wrong user.
> Not exactly this way. The case that motivated me to make this patch was 
> a novice user who, after installing postgres from a package (which sets up 
> the postgres user and initializes PGDATA with the right permissions), tried to 
> create the necessary tablespace directories himself (so the owner of 
> those directories was that user). The error message "could not set 
> permissions on directory ..." disoriented that user. The need to change 
> the owner of those directories only became clear after careful reading of 
> the documentation. I think it would be helpful to show the proposed hint to 
> resolve the problem more quickly.

I think it might be worth clarifying the documentation instead.  I'm
looking at the CREATE TABLESPACE reference page and it's not super clear
on first reading.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Robert Haas
On Tue, Sep 11, 2018 at 10:25 AM, Tom Lane  wrote:
> The point of the previous coding here was that perhaps there's some
> range of number-of-locks-needed where kicking hot-standby queries
> off of locks would allow recovery to proceed.  However, it is (as
> far as I know) unproven that that actually works, let alone is
> effective enough to justify maintaining very-hard-to-test code for.
> The field demand for such behavior can be measured by the number of
> complaints we've had since breaking it in 9.6, namely zero.
>
> So no, I do not want to re-implement and maintain that behavior on
> the strength of just a guess that sometimes it might be useful.
> If somebody else feels a burning passion for it, they can do the
> work --- but I'd be inclined to argue that it'd be a HEAD-only
> performance improvement, not a back-patchable bug fix.

Mmph.  Well, I'm not in love with that position, because having the
standby exit in such a way as to require manual intervention when an
automated recovery strategy is possible is undesirable, but I'm not
volunteering to do the work, either, so maybe we don't have many
alternatives.

I think, though, that it is pretty obvious that the intermediate zone
which you refer to as "perhaps" existing does indeed exist.  Queries
running on the standby consume lock table slots, and killing them
frees up those slots.  Q.E.D.

I suspect the reason why this hasn't come up much in practice is
because (1) there are guards against setting various GUCs including
max_connections and max_locks_per_transaction lower on the standby
than they are set on the master (cf. CheckRequiredParameterValues) and
(2) if those guards are not quite sufficient to ensure that the lock
table on the standby is always as large there as it is on the master,
it doesn't end up mattering because the number of AccessExclusiveLocks
on relations is generally going to be a very small percentage of the
total number of lock table slots.  But if somebody's interested in
working on this, maybe we could construct a TAP test case that
involves the master running "BEGIN; LOCK TABLE a1, a2, a3, a4, ...;"
concurrently with some "select pg_sleep from empty1, empty2, ..."
queries on the standby.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench - add pseudo-random permutation function

2018-09-11 Thread Fabien COELHO



Hello Hironobu-san,


I reviewed `pgbench-prp-func-1.patch` and found an incomplete implementation.


Indeed, thanks a lot for the debug, and the proposed fix!

I'm going to work a little bit more on the patch based on your proposed 
changes, and submit a v3, hopefully soon.


--
Fabien.



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Robert Haas
On Tue, Sep 11, 2018 at 5:54 AM, Simon Riggs  wrote:
> Please explain why you think that would be with no restart.

Because the startup process will die, and if that happens, IIRC,
there's no crash-and-restart loop.  You're just done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: CREATE ROUTINE MAPPING

2018-09-11 Thread David Fetter
On Mon, Sep 10, 2018 at 09:28:31AM +0200, Hannu Krosing wrote:
> Hi Corey
> 
> Have you looked at pl/proxy ?

DBI-Link pre-dated PL/proxy by some years, and was a good bit more
flexible as to what types of functions it could send where. Neither
has a capability fundamentally similar to this because neither had any
way to interact with the planner, other quals, etc.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: libpq stricter integer parsing

2018-09-11 Thread Peter Eisentraut
On 11/09/2018 11:00, Michael Paquier wrote:
> diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
> index 5e7931ba90..bc7836d103 100644
> --- a/doc/src/sgml/libpq.sgml
> +++ b/doc/src/sgml/libpq.sgml
> @@ -1591,6 +1591,15 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
>  
>  
> 
> +
> +   
> +Integer values expected for keywords port,
> +connect_timeout, keepalives_idle,
> +keepalives_interval and
> +keepalives_timeout are parsed more strictly as
> +of PostgreSQL 12, i.e. values including trailing 
> garbage
> +or overflowing are rejected.
> +   
>
>   

I would leave this out.  We don't need to document every single
refinement of parsing rules.  This might better belong in the release notes.

> + appendPQExpBuffer(&conn->errorMessage,
> +   libpq_gettext("invalid value for keyword \"%s\"\n"),
> +   context);

Add the actual invalid value to the error message.
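
(Presumably something along these lines -- just a sketch of the suggestion,
not a tested change:)

    appendPQExpBuffer(&conn->errorMessage,
                      libpq_gettext("invalid integer value \"%s\" for keyword \"%s\"\n"),
                      value, context);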

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Tom Lane
Simon Riggs  writes:
> On 10 September 2018 at 19:16, Robert Haas  wrote:
>> On Fri, Sep 7, 2018 at 6:37 PM, Tom Lane  wrote:
>>> So my inclination is to remove the reportMemoryError = false parameter,
>>> and just let an error happen in the unlikely situation that we hit OOM
>>> for the lock table.

>> Wouldn't that take down the entire cluster with no restart?

> Please explain why you think that would be with no restart.

More to the point, what are our alternatives, and how much can we
really improve it?  If the WAL stream requires more concurrent AELs
than the standby's lock table can hold, there isn't going to be any
solution short of manual intervention to increase the standby's
max_locks_per_transaction.
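
(For context, the capacity in question is bounded by the documented sizing
rule sketched below; the numbers plugged in are just the defaults, not
anything from this thread:)

    /* The shared lock table has room for roughly this many lockable objects. */
    static int
    lock_table_capacity(int max_locks_per_transaction, int max_connections,
                        int max_prepared_transactions)
    {
        return max_locks_per_transaction *
            (max_connections + max_prepared_transactions);
        /* with the defaults: 64 * (100 + 0) = 6400 objects */
    }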

The point of the previous coding here was that perhaps there's some
range of number-of-locks-needed where kicking hot-standby queries
off of locks would allow recovery to proceed.  However, it is (as
far as I know) unproven that that actually works, let alone is
effective enough to justify maintaining very-hard-to-test code for.
The field demand for such behavior can be measured by the number of
complaints we've had since breaking it in 9.6, namely zero.

So no, I do not want to re-implement and maintain that behavior on
the strength of just a guess that sometimes it might be useful.
If somebody else feels a burning passion for it, they can do the
work --- but I'd be inclined to argue that it'd be a HEAD-only
performance improvement, not a back-patchable bug fix.

regards, tom lane



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-11 Thread Marina Polyakova

On 11-09-2018 16:47, Marina Polyakova wrote:

On 08-09-2018 16:03, Fabien COELHO wrote:

Hello Marina,
I'd insist in a comment that "cnt" does not include "skipped" transactions
(anymore).


If you mean CState.cnt I'm not sure if this is practically useful
because the code uses only the sum of all client transactions
including skipped and failed... Maybe we can rename this field to
nxacts or total_cnt?


Sorry, I misread your proposal the first time. Ok!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-11 Thread Marina Polyakova

On 08-09-2018 16:03, Fabien COELHO wrote:

Hello Marina,


v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of 
transactions with serialization/deadlock failures (see the detailed 
description in the file).


About patch v11-3.

Patch applies cleanly on top of the other two. Compiles, global and local
"make check" are ok.


:-)


* Features

As far as the actual retry feature is concerned, I'd say we are nearly
there. However I have an issue with changing the behavior on meta
command and other SQL errors, which I find undesirable.

When a meta-command fails, before the patch the command is aborted and
there is a convenient error message:

  sh> pgbench -T 10 -f bad-meta.sql
  bad-meta.sql:1: unexpected function name (false) in command "set" [...]

  \set i false + 1 [...]

After the patch it is simply counted, pgbench loops on the same error
till the time is completed, and there is no clue about the actual
issue:

  sh> pgbench -T 10 -f bad-meta.sql
  starting vacuum...end.
  transaction type: bad-meta.sql
  duration: 10 s
  number of transactions actually processed: 0
  number of failures: 27993953 (100.000%)
  ...

Same thing about SQL errors, an immediate abort...

  sh> pgbench -T 10 -f bad-sql.sql
  starting vacuum...end.
  client 0 aborted in command 0 of script 0; ERROR:  syntax error at or near ";"

  LINE 1: SELECT 1 + ;

... is turned into counting without aborting or error messages, so
that there is no clue that the user was asking for something bad.

  sh> pgbench -T 10 -f bad-sql.sql
  starting vacuum...end.
  transaction type: bad-sql.sql
  scaling factor: 1
  query mode: simple
  number of clients: 1
  number of threads: 1
  duration: 10 s
  number of transactions actually processed: 0
  number of failures: 274617 (100.000%)
  # no clue that there was a syntax error in the script

I do not think that these changes of behavior are desirable. Meta command and
miscellaneous SQL errors should result in immediately aborting the whole run,
because the client test code itself could not run correctly or the SQL sent
was somehow wrong, which is also the client's fault, and the server
performance bench does not make much sense in such conditions.

ISTM that the focus of this patch should only be to handle some server
runtime errors that can be retried, but not to change pgbench behavior
on other kind of errors. If these are to be changed, ISTM that it
would be a distinct patch and would require some discussion, and
possibly an option to enable it or not if some use case emerges. As far as
this patch is concerned, I'd suggest leaving that out.

...
The following remarks are linked to the change of behavior discussed above:
makeVariableValue's error message is not for debug, but must be kept in all
cases, and the false return value must result in an immediate abort. Same
thing about lookupCreateVariable: an invalid name is a user error which
warrants an immediate abort. Same thing again about coerce* functions or
evalStandardFunc... Basically, most/all added "debug_level >= DEBUG_ERRORS"
are not desirable.


Hmm, but we can say the same for serialization or deadlock errors that 
were not retried (the client test code itself could not run correctly or 
the SQL sent was somehow wrong, which is also the client's fault), can't 
we? Why not handle client errors that can occur (but they may also not 
occur) the same way? (For example, always abort the client, or 
conversely do not make aborts in these cases.) Here's an example of such 
error:


starting vacuum...end.
transaction type: pgbench_rare_sql_error.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 250
number of transactions actually processed: 2500/2500
maximum number of tries: 1
latency average = 0.375 ms
tps = 26695.292848 (including connections establishing)
tps = 27489.678525 (excluding connections establishing)
statement latencies in milliseconds and failures:
 0.001   0  \set divider random(-1000, 1000)
 0.245   0  SELECT 1 / :divider;

starting vacuum...end.
client 5 got an error in command 1 (SQL) of script 0; ERROR:  division 
by zero


client 0 got an error in command 1 (SQL) of script 0; ERROR:  division 
by zero


client 7 got an error in command 1 (SQL) of script 0; ERROR:  division 
by zero


transaction type: pgbench_rare_sql_error.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 250
number of transactions actually processed: 2497/2500
number of failures: 3 (0.120%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 3 (0.120%)
maximum number of tries: 1
latency average = 0.579 ms (including failures)
tps = 17240.662547 (including connections establishing)
tps = 17862.090137 (excluding connections establishing)
statement latencies 

Re: Index Skip Scan

2018-09-11 Thread Jesper Pedersen

Hi Dmitry,

On 9/10/18 5:47 PM, Dmitry Dolgov wrote:

On Mon, 18 Jun 2018 at 17:26, Jesper Pedersen  
wrote:

I've looked through the patch more closely, and have a few questions:



Thanks for your review !


* Is there any reason why only the copy function for the IndexOnlyScan node
   includes an implementation for distinctPrefix?


Oversight -- thanks for catching that.


* What is the purpose of HeapFetches? I don't see any usage of this variable
   except assigning 0 to it once.



That can be removed.

New version WIP v2 attached.

Best regards,
 Jesper
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 6b2b9e3742..74ed15bfeb 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -129,6 +129,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bee4afbe4e..fd06549491 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3725,6 +3725,22 @@ ANY num_sync ( 
+  enable_indexskipscan (boolean)
+  
+   enable_indexskipscan configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's use of index-skip-scan plan
+types (see ). This parameter requires
+that enable_indexonlyscan is on.
+The default is on.
+   
+  
+ 
+
  
   enable_material (boolean)
   
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index beb99d1831..ccbb44288d 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -135,6 +135,7 @@ typedef struct IndexAmRoutine
 amendscan_function amendscan;
 ammarkpos_function ammarkpos;   /* can be NULL */
 amrestrpos_function amrestrpos; /* can be NULL */
+amskip_function amskip; /* can be NULL */
 
 /* interface functions to support parallel index scans */
 amestimateparallelscan_function amestimateparallelscan;/* can be NULL */
@@ -665,6 +666,14 @@ amrestrpos (IndexScanDesc scan);
 
   
 
+bool
+amskip (IndexScanDesc scan, ScanDirection direction, int prefix);
+
+   TODO
+  
+
+  
+
 Size
 amestimateparallelscan (void);
 
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index a57c5e2e1f..842db029fa 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1312,6 +1312,22 @@ SELECT target FROM tests WHERE subject = 'some-subject' AND success;
such cases and allow index-only scans to be generated, but older versions
will not.
   
+
+  
+Index Skip Scans
+
+
+  index
+  index-skip scans
+
+
+  index-skip scan
+
+
+
+  TODO
+
+  
  
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e95fbbcea7..85d6571c6d 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -106,6 +106,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = brincostestimate;
 	amroutine->amoptions = brinoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 0a32182dd7..162639090d 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -61,6 +61,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gincostestimate;
 	amroutine->amoptions = ginoptions;
 	amroutine->amproperty = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 8a42effdf7..ecd4af49d8 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -83,6 +83,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = gistcostestimate;
 	amroutine->amoptions = gistoptions;
 	amroutine->amproperty = gistproperty;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 0002df30c0..7120950868 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -79,6 +79,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
+	amroutine->amskip = NULL;
 	amroutine->amcostestimate = hashcostestimate;
 	amroutine->amoptions = hashoptions;
 	amroutine->amproperty = NULL;
diff --git 

RE: stat() on Windows might cause error if target file is larger than 4GB

2018-09-11 Thread Higuchi, Daisuke
Michael-san, 

From: Michael Paquier [mailto:mich...@paquier.xyz]
>Does something like the patch attached help?
>This makes sure that st_size is set correctly for files with a size larger 
>than 4GB.

Thank you for creating the patch, but this does not solve the current problem. 
Of course, setting the wrong size in st_size is a problem, but I think the 
cause of this problem is stat()'s return value (=-1). 

In pgwin32_safestat(), if stat() tries to deal with files with a size larger 
than 4GB, the return value is -1. So, pgwin32_safestat() exits before 
calculating buf->st_size. 


pgwin32_safestat(const char *path, struct stat *buf)
{
int r;
WIN32_FILE_ATTRIBUTE_DATA attr;

r = stat(path, buf);
if (r < 0)
{
...
return r;
}
...
buf->st_size = attr.nFileSizeLow;

return 0;
}


So, the attached patch helps me and the strange message disappeared, 
but I am not sure about the impact of this on other cases yet. 

Regards, 
Daisuke, Higuchi



win32-stat-remove-return.patch
Description: win32-stat-remove-return.patch


Re: CREATE ROUTINE MAPPING

2018-09-11 Thread Masahiko Sawada
Thank you for the comment.

On Mon, Sep 10, 2018 at 4:16 PM, Kyotaro HORIGUCHI
 wrote:
> Hello.
>
> At Tue, 4 Sep 2018 09:34:21 +0900, Masahiko Sawada  
> wrote in 
>> On Tue, Sep 4, 2018 at 5:48 AM, David Fetter  wrote:
>> > On Fri, Aug 31, 2018 at 05:18:26PM +0900, Masahiko Sawada wrote:
>> >> On Thu, Jan 25, 2018 at 2:13 PM, David Fetter  wrote:
>> >> > On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
>> >> >> >
>> >> >> >
>> >> >> > >
>> >> >> > > But other situations seem un-handle-able to me:
>> >> >> > >
>> >> >> > > SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
>> >> >> >
>> >> >> > Do we have any way, or any plan to make a way, to push the set 
>> >> >> > (SELECT
>> >> >> > x FROM local_table WHERE active = true) to the remote side for
>> >> >> > execution there?  Obviously, there are foreign DBs that couldn't
>> >> >> > support this, but I'm guessing they wouldn't have much by way of UDFs
>> >> >> > either.
>> >> >> >
>> >> >>
>> >> >> No. The remote query has to be generated at planning time, so it can't 
>> >> >> make
>> >> >> predicates out of anything that can't be resolved into constants by the
>> >> >> planner itself. The complexities of doing so would be excessive, far 
>> >> >> better
>> >> >> to let the application developer split the queries up because they know
>> >> >> better which parts have to resolve first.
>> >> >
>> >> > So Corey and I, with lots of inputs from Andrew Gierth and Matheus
>> >> > Oliveira, have come up with a sketch of how to do this, to wit:
>> >> >
>> >> > - Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
>> >> >   LANGUAGE as parameters, but not both. This seems simpler, at least
>> >> >   in a proof of concept, than creating SQL standard compliant grammar
>> >> >   out of whole cloth.  The SQL standard grammar could be layered in
>> >> >   later via the rewriter if this turns out to work.
>> >>
>> >> I'm also interested in this feature. While studying this feature, I
>> >> understood that this feature just pair a local function with a remote
>> >> function, not means that creates a kind of virtual function that can
>> >> be invoked on only foreign servers. For example, if we execute the
>> >> following SQL the local_func() is invoked in local because the col1
>> >> column of local_table is referenced by it.
>
> Do you mean that ISO/IEC 9075-9:2016 (right?) is defining that
> (and we must follow it)?  Or does it come by referring to
> something like [1]? As far as I see David's mail upthread,
> OPTIONS is not precisely defined.

Yeah, I read [1] and the final committee draft ISO/IEC 9075-9:2006, it
might be old though.

>
> [1] http://cs.unibo.it/~montesi/CBD/Articoli/SQL-MED.pdf
>
> Unfortunately I don't have access to the document nor concrete
> use cases. With a rough idea of "remote mapping", I can guess the
> followng four use cases.  Each example syntax is just a guess
> without any consideration on implementability or other
> restrictions. The patch looks currently covering B.

Thank you for summarizing.

>
> A. Just notify a function can be just pushed down.
>
>   ex. SELECT foo(1, 'bar');  Remote: SELECT foo(1, 'bar');
>
>CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem;
>(or same as B)
>
> B. Replace function name with the remote equivalent.
>
>   ex. SELECT foo(1, 'bar');  Remote: SELECT hoge(1, 'bar');
>
>CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
>OPTIONS (remote_func_name 'hoge'));
>
> C. Adjust function specification with remote.
>
>   ex. SELECT foo(1, 'bar');  Remote: SELECT hoge('bar', 1, true);
>
>CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
>OPTIONS (remote_expression 'hoge($2,$1,true)');
>
> D. Replace with an equivalent remote expression.
>
>   ex. SELECT foo(1, 'bar');  Remote: SELECT ('bar' || to_char(1 % 10));
>
>CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
>   OPTIONS (remote_expression '$2 || to_char($1 % 10)');
>
> I haven't looked the patch in depth, but the core side looks
> generic and the FDW side is extensible to A, C and D. I think B
> is enough as a starter.
> I don't mean that we should implement all of them. They are just 
> possibilities.

I agree that this feature covers A and B as the first step. But I'm
concerned that for D (and maybe for C?) the volatility of the mapped
function could be changed. That is, currently we allow pushing down
only immutable functions, but these cases might break that. Also, could
replacing a function with an arbitrary expression be a risk of SQL injection?

Also, according to the standard the routine mapping seems to work when
columns of the referenced foreign table are passed to the function that is
mapped to the remote function. The functions in the WHERE clause will
obviously be mapped, but I'm not sure about functions in target lists.

>
>
> I have some comments on the patch.
>
> It doesn't seem working. Am I missing something?
> 
> create server sv1 foreign data wrapper 

Re: Flexible configuration for full-text search

2018-09-11 Thread Aleksandr Parfenov
Hello hackers!

As I wrote a few weeks ago, there is an issue with stopwords processing in
the proposed syntax for full-text configurations. I want to separate word
normalization and stopwords detection into two separate dictionaries. The
problem is how to configure the stopword detection dictionary.

The cause of the problem is counting stopwords, but not using any
lexemes for them. However, do we have to count stopwords during word
counting or can we ignore them like unknown words? The problem I see is
backward compatibility, since we have to regenerate all queries and
vectors. But is it a real problem, or can we change its behavior in this
way?

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: StandbyAcquireAccessExclusiveLock doesn't necessarily

2018-09-11 Thread Simon Riggs
On 10 September 2018 at 19:16, Robert Haas  wrote:

> On Fri, Sep 7, 2018 at 6:37 PM, Tom Lane  wrote:
> > So my inclination is to remove the reportMemoryError = false parameter,
> > and just let an error happen in the unlikely situation that we hit OOM
> > for the lock table.
>
> Wouldn't that take down the entire cluster with no restart?
>

Please explain why you think that would be with no restart.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Query is over 2x slower with jit=on

2018-09-11 Thread Amit Khandekar
On 10 September 2018 at 21:39, Andres Freund  wrote:
> Hi,
>
> On 2018-09-10 15:42:55 +0530, Amit Khandekar wrote:
>> Attached is a patch that accumulates the per-worker jit counters into
>> the leader process.
>
> Thanks!
>
>
>> I think we better show per-worker jit info also. The current patch
>> does not show that. I think it would be easy to continue on the patch
>> to show per-worker info also. Under the Gather node, we can show
>> per-worker jit counters. I think this would be useful too, besides the
>> cumulative figures in the leader process. Comments ?
>
> Yes, I think that'd be good.
Ok. Will continue on the patch.

> I think we either should print the stats at
> the top level as $leader_value, $combined_worker_value, $total_value or
> just have the $combined_worker_value at the level where we print other
> stats from the worker, too.

Yes, I think we can follow and be consistent with whatever way in
which the other worker stats are printed. Will check.

Note: Since there can be multiple separate Gather plans under a plan
tree, I think we can show this info for each Gather plan.
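
(A minimal sketch of the kind of aggregation under discussion -- the struct
and field names below are made up for illustration, they are not the ones
used in the WIP patch:)

    /* Illustrative per-worker JIT counters gathered from shared memory. */
    typedef struct WorkerJitCounters
    {
        long    created_functions;
        double  generation_time;    /* seconds */
        double  optimization_time;  /* seconds */
        double  emission_time;      /* seconds */
    } WorkerJitCounters;

    /* Fold all workers into one combined value, so EXPLAIN could print
     * leader, combined-workers and total figures separately. */
    static void
    combine_worker_jit(WorkerJitCounters *combined,
                       const WorkerJitCounters *workers, int nworkers)
    {
        for (int i = 0; i < nworkers; i++)
        {
            combined->created_functions += workers[i].created_functions;
            combined->generation_time += workers[i].generation_time;
            combined->optimization_time += workers[i].optimization_time;
            combined->emission_time += workers[i].emission_time;
        }
    }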

>
>
>>  /*
>> + * Add up the workers' JIT instrumentation from dynamic shared memory.
>> + */
>> +static void
>> +ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
>> +
>> SharedJitInstrumentation *shared_jit)
>> +{
>> + int n;
>> + JitContext *jit = planstate->state->es_jit;
>> +
>> + /* If the leader hasn't yet created a jit context, allocate one now. */
>> + if (!jit)
>> + {
>> + planstate->state->es_jit = jit =
>> + jit_create_context(planstate->state->es_jit_flags);
>> + }
>
> Wouldn't it be better to move the jit instrumentation to outside of the
> context, to avoid having to do this?  Or just cope with not having
> instrumentation for the leader in this case?  We'd kinda need to deal
> with failure to create one anyway?

Yeah, I think taking the instrumentation out of the context looks
better. Will work on that.



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: libpq stricter integer parsing

2018-09-11 Thread Michael Paquier
On Sun, Sep 09, 2018 at 09:01:15AM +0200, Fabien COELHO wrote:
> Hmmm. This is what the sentence following the above tries to explain
> implicitely:
> 
>   Versions of libpq before
>   PostgreSQL 12 accepted trailing garbage or overflows.
> 
> Maybe I can rephrase it in one sentence, eg:
> 
> "From PostgreSQL 12, integer values for keywords ... are parsed strictly,
> i.e. trailing garbage and errors on overflows are not accepted
> anymore."

Okay, I am including that formulation.  I have not yet put much thought
into locating this in another place in the docs.  Or perhaps we could
just discard it from the final commit.

I have been reviewing your patch a bit more, and I have found an issue:
overflows are not correctly detected.  For example by specifying
something like port=50 I would have expected an error but the
parsing code failed to detect that.  Values like -1 need to be accepted,
though they are equivalent to an unknown state when it comes to keepalive_*.

In conclusion, I finish with the simplified patch attached.  Fabien, is
that acceptable to you?
--
Michael
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 5e7931ba90..bc7836d103 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1591,6 +1591,15 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 
 

+
+   
+Integer values expected for keywords port,
+connect_timeout, keepalives_idle,
+keepalives_interval and
+keepalives_timeout are parsed more strictly as
+of PostgreSQL 12, i.e. values including trailing garbage
+or overflowing are rejected.
+   
   
  
 
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 42cdb971a3..c7a4814e8e 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1587,6 +1587,34 @@ useKeepalives(PGconn *conn)
 	return val != 0 ? 1 : 0;
 }
 
+/*
+ * Parse and try to interpret "value" as an integer value, and if successful,
+ * store it in *result, complaining if there is any trailing garbage or an
+ * overflow.
+ */
+static bool
+parse_int_param(const char *value, int *result, PGconn *conn,
+const char *context)
+{
+	char   *end;
+	long	numval;
+
+	*result = 0;
+
+	errno = 0;
+	numval = strtol(value, &end, 10);
+	if (errno == 0 && *end == '\0' && numval == (int) numval)
+	{
+		*result = numval;
+		return true;
+	}
+
+	appendPQExpBuffer(&conn->errorMessage,
+	  libpq_gettext("invalid value for keyword \"%s\"\n"),
+	  context);
+	return false;
+}
+
 #ifndef WIN32
 /*
  * Set the keepalive idle timer.
@@ -1599,7 +1627,8 @@ setKeepalivesIdle(PGconn *conn)
 	if (conn->keepalives_idle == NULL)
 		return 1;
 
-	idle = atoi(conn->keepalives_idle);
+	if (!parse_int_param(conn->keepalives_idle, &idle, conn, "keepalives_idle"))
+		return 0;
 	if (idle < 0)
 		idle = 0;
 
@@ -1631,7 +1660,8 @@ setKeepalivesInterval(PGconn *conn)
 	if (conn->keepalives_interval == NULL)
 		return 1;
 
-	interval = atoi(conn->keepalives_interval);
+	if (!parse_int_param(conn->keepalives_interval, &interval, conn, "keepalives_interval"))
+		return 0;
 	if (interval < 0)
 		interval = 0;
 
@@ -1664,7 +1694,8 @@ setKeepalivesCount(PGconn *conn)
 	if (conn->keepalives_count == NULL)
 		return 1;
 
-	count = atoi(conn->keepalives_count);
+	if (!parse_int_param(conn->keepalives_count, &count, conn, "keepalives_count"))
+		return 0;
 	if (count < 0)
 		count = 0;
 
@@ -1698,13 +1729,15 @@ setKeepalivesWin32(PGconn *conn)
 	int			idle = 0;
 	int			interval = 0;
 
-	if (conn->keepalives_idle)
-		idle = atoi(conn->keepalives_idle);
+	if (conn->keepalives_idle &&
+		!parse_int_param(conn->keepalives_idle, &idle, conn, "keepalives_idle"))
+		return 0;
 	if (idle <= 0)
 		idle = 2 * 60 * 60;		/* 2 hours = default */
 
-	if (conn->keepalives_interval)
-		interval = atoi(conn->keepalives_interval);
+	if (conn->keepalives_interval &&
+		!parse_int_param(conn->keepalives_interval, &interval, conn, "keepalives_interval"))
+		return 0;
 	if (interval <= 0)
 		interval = 1;			/* 1 second = default */
 
@@ -1831,7 +1864,10 @@ connectDBComplete(PGconn *conn)
 	 */
 	if (conn->connect_timeout != NULL)
 	{
-		timeout = atoi(conn->connect_timeout);
+		if (!parse_int_param(conn->connect_timeout, &timeout, conn,
+			 "connect_timeout"))
+			return 0;
+
 		if (timeout > 0)
 		{
 			/*
@@ -1842,6 +1878,8 @@ connectDBComplete(PGconn *conn)
 			if (timeout < 2)
 timeout = 2;
 		}
+		else /* negative means 0 */
+			timeout = 0;
 	}
 
 	for (;;)
@@ -2108,7 +2146,9 @@ keep_going:		/* We will come back to here until there is
 			thisport = DEF_PGPORT;
 		else
 		{
-			thisport = atoi(ch->port);
+			if (!parse_int_param(ch->port, &thisport, conn, "port"))
+goto error_return;
+
 			if (thisport < 1 || thisport > 65535)
 			{
appendPQExpBuffer(&conn->errorMessage,


signature.asc
Description: PGP signature


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-11 Thread Marina Polyakova

On 08-09-2018 10:17, Fabien COELHO wrote:

Hello Marina,


Hello, Fabien!


About the two first preparatory patches.


v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a 
client's random seed during the repeating of transactions after 
serialization/deadlock failures).


Same version as the previous one, which was ok. Still applies,
compiles, passes tests. Fine with me.


v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client 
variables during the repeating of transactions after 
serialization/deadlock failures).


Simpler version, applies cleanly on top of previous patch, compiles
and global & local "make check" are ok. Fine with me as well.


Glad to hear it :)

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused

2018-09-11 Thread Masahiko Sawada
On Sat, Jul 28, 2018 at 4:12 AM, Alexander Kuzmenkov
 wrote:
> Daniel,
>
> Thanks for the update.
>
>
> On 07/25/2018 01:37 AM, Daniel Gustafsson wrote:
>>
>>
>>> Hmm, this is missing the eqop fields of SortGroupClause. I haven't
>>> tested yet but does the similar degradation happen if two
>>> SortGroupCaluses have different eqop and the same other values?
>>
>> I wasn’t able to construct a case showing this, but I also think you’re
>> right.
>> Do you have an idea of a query that can trigger a regression?  The
>> attached
>> patch adds a stab at this, but I’m not sure if it’s the right approach.
>
>
> To trigger that, in your test example you could order by empno::int8 for one
> window, and by empno::int2 for another. But I don't think you have to
> compare eqop here, because if eqop differs, sortop will differ too. I
> removed the comparison from the patch. I also clarified (I hope) the
> comments, and did the optimization I mentioned earlier: using array instead
> of list for active clauses. Please see the attached v6. Otherwise I think
> the patch is good.
>

Thank you! That makes sense and the patch looks good to me. FWIW maybe
it's a good idea to add a comment describing why we didn't do that.
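
Something along these lines, perhaps (wording is only a suggestion):

    /*
     * Note that we do not compare eqop here: if two SortGroupClauses have
     * different eqop, their sortop must differ as well, so checking sortop
     * and the other fields is enough to decide whether an existing sort
     * order can be reused.
     */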

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Pluggable Storage - Andres's take

2018-09-11 Thread Beena Emerson
Hello,

On Mon, 10 Sep 2018, 19:33 Amit Kapila,  wrote:

> On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi 
> wrote:
> >
> > On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi 
> wrote:
> >>
> > pg_stat_get_tuples_hot_updated and others:
> > /*
> > * Counter tuples_hot_updated stores number of hot updates for heap table
> > * and the number of inplace updates for zheap table.
> > */
> > if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
> > RelationStorageIsZHeap(rel))
> > result = 0;
> > else
> > result = (int64) (tabentry->tuples_hot_updated);
> >
> >
> > Is the special condition needed? The values should be 0 because of
> zheap right?
> >
>
> I also think so.  Beena/Mithun has worked on this part of the code, so
> it is better if they also confirm once.
>

We have used the hot_updated counter to count the number of inplace updates
for zheap to avoid introducing a new counter. Though, technically, hot
updates are 0 for zheap, the counter could hold a non-zero value indicating
inplace updates.

Thank you

>


Re: stat() on Windows might cause error if target file is larger than 4GB

2018-09-11 Thread Michael Paquier
Higuchi-san,

On Mon, Sep 10, 2018 at 03:44:54PM +0900, Michael Paquier wrote:
> Yes, the fix needs to happen there.  It seems to me that what we are
> looking for here is to complete the calculation of st_size with
> nFileSizeHigh, so as we are able to compile a correct 64-bit integer
> size, allowing support for larger files on Win32, which is something
> that fsync() support for pg_dump has actually changed by opening stat()
> to be called for files with a size larger than 4GB.  (I need badly to
> setup a Windows machine...)

Does something like the patch attached help?  This makes sure that
st_size is set correctly for files with a size larger than 4GB.
Unfortunately I have not been able to test it.  I have tried for some
time today to deploy a Win10 VM with VS 2017 (Community version), but I
cannot get past some errors because of a set of missing files when
trying to compile; the one the build complains about is Platform.targets,
which goes missing even if vc_redist is installed for all the C++
deliverables.  The user experience has gotten worse lately, at least for
me.
--
Michael
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 26611922db..3ee9d953c7 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -395,11 +395,8 @@ pgwin32_safestat(const char *path, struct stat *buf)
 		return -1;
 	}
 
-	/*
-	 * XXX no support for large files here, but we don't do that in general on
-	 * Win32 yet.
-	 */
-	buf->st_size = attr.nFileSizeLow;
+	/* note that this supports files larger than 4GB */
+	buf->st_size = ((off_t) attr.nFileSizeHigh) << 32 | attr.nFileSizeLow;
 
 	return 0;
 }


signature.asc
Description: PGP signature