Re: Issues while building PG in MS Windows, using MSYS2 and MinGW-w64

2018-05-02 Thread Alexander Lakhin

"insaf.k"  wrote:
I am trying to build PG from source, in MS Windows using MSYS2 and 
MinGW-w64. I've tried to build PG 10.0 as well as 10.3.

I've done configuring like this
./configure --prefix="/d/pg10/"
And when I do "make" or "make world", I'm getting compilation error. 
I've attached complete error report at the end of the mail.
Can you try the .cmd script (https://pastebin.com/LU9gdn27) that
compiles REL_10_STABLE on Windows with msys2 and mingw-w64 successfully?

In case of any issues with it, please let me know.

Best regards,
--
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Thomas Munro
On Thu, May 3, 2018 at 4:48 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Thu, May 3, 2018 at 4:04 PM, Tom Lane  wrote:
>>> It strikes me also that, at least for debugging purposes, it's seriously
>>> awful that you can't tell from outside what result this function got.
>
>> I don't think *broken* CPUs are something we need to handle, are they?
>
> I'm not worried so much about broken hardware as about scenarios like
> "Munro got the magic constant wrong and nobody ever noticed", or more
> likely "somebody broke it later and we didn't notice".  We absolutely
> do not expect the code path with function-returns-the-wrong-answer to be
> taken, and I think it would be appropriate to complain loudly if it is.

Ok.  Here is a patch that compares hw and sw results and calls
elog(ERROR) if they don't match.  It also does elog(DEBUG1) with its
result just before returning.
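
For illustration, a minimal sketch of that kind of check, assuming the
existing pg_comp_crc32c_armv8() and pg_comp_crc32c_sb8() functions (this is
not the attached patch itself, and the SIGILL handling around the first
hardware call is omitted):

static bool
pg_crc32c_armv8_available(void)
{
	uint64		data = UINT64CONST(0x4242424242424242);
	pg_crc32c	hw;
	pg_crc32c	sw;

	/*
	 * Probe the hardware instruction, then cross-check the result against
	 * the portable slicing-by-8 implementation.
	 */
	hw = pg_comp_crc32c_armv8(0, &data, sizeof(data));
	sw = pg_comp_crc32c_sb8(0, &data, sizeof(data));

	if (hw != sw)
		elog(ERROR, "crc32 hardware and software results disagree");

	elog(DEBUG1, "using armv8 crc32 hardware = 1");
	return true;
}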

Here's what I see at startup on my ARMv8 machine when I set
log_min_messages = debug1 in my .conf (it's the very first line
emitted):

2018-05-03 05:07:25.904 UTC [19677] DEBUG:  using armv8 crc2 hardware = 1

Here's what I see if I hack the _armv8() function to do kill(getpid(), SIGILL):

2018-05-03 05:09:47.012 UTC [21079] DEBUG:  using armv8 crc2 hardware = 0

Here's what I see if I hack the _armv8() function to add 1 to its result:

2018-05-03 05:11:07.366 UTC [22218] FATAL:  crc32 hardware and
software results disagree
2018-05-03 05:11:07.367 UTC [22218] LOG:  database system is shut down

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Fix-endianness-bug-in-ARMv8-CRC32-detection.patch
Description: Binary data


Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
Thomas Munro  writes:
> On Thu, May 3, 2018 at 4:04 PM, Tom Lane  wrote:
>> It strikes me also that, at least for debugging purposes, it's seriously
>> awful that you can't tell from outside what result this function got.

> I don't think *broken* CPUs are something we need to handle, are they?

I'm not worried so much about broken hardware as about scenarios like
"Munro got the magic constant wrong and nobody ever noticed", or more
likely "somebody broke it later and we didn't notice".  We absolutely
do not expect the code path with function-returns-the-wrong-answer to be
taken, and I think it would be appropriate to complain loudly if it is.

regards, tom lane



Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Thomas Munro
On Thu, May 3, 2018 at 4:04 PM, Tom Lane  wrote:
> I wrote:
>> Andrew Gierth  writes:
>>> Isn't there a hidden assumption about endianness there?

Right, thanks.

>> Yeah, I was wondering about that too, but does anyone actually run
>> ARMs big-endian?

I think all the distros I follow dropped big endian support in recent
years, but that's no excuse.

> After a bit more thought ... we could remove the magic constant,
> along with any assumptions about endianness, by doing it like this:
>
> result = (pg_comp_crc32c_armv8(0, &data, sizeof(data)) ==
>   pg_comp_crc32c_sb8(0, &data, sizeof(data)));
>
> Of course we'd eat a few more cycles during the test, but it's hard
> to see that mattering.

I was thinking of proposing this:

-   uint64  data = 42;
+   uint64  data = UINT64CONST(0x4242424242424242);

... and:

-   result = (pg_comp_crc32c_armv8(0, &data, sizeof(data))
== 0xdd439b0d);
+   result = (pg_comp_crc32c_armv8(0, &data, sizeof(data))
== 0x54eb145b);

No strong preference though.

> It strikes me also that, at least for debugging purposes, it's seriously
> awful that you can't tell from outside what result this function got.
> Certainly the outcome that "pg_comp_crc32c_armv8 executed but returned
> the wrong answer" is not something that ought to go unremarked.
> We could elog the results, but I'm not very sure what log level is
> appropriate --- also, I doubt we want another log entry from every
> launched process, so how to prevent that?

I don't think *broken* CPUs are something we need to handle, are they?
 Hmm, I suppose there might be a hypothetical operating system or ARM
chip that somehow doesn't raise SIGILL and returns garbage, and then
it's nice to fall back to the software version (unless you're 1/2^32
unlucky).

For debugging I was just putting a temporary elog in.  Leaving one in
there at one of the DEBUG levels seems reasonable though.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
I wrote:
> Andrew Gierth  writes:
>> Isn't there a hidden assumption about endianness there?

> Yeah, I was wondering about that too, but does anyone actually run
> ARMs big-endian?

After a bit more thought ... we could remove the magic constant,
along with any assumptions about endianness, by doing it like this:

result = (pg_comp_crc32c_armv8(0, &data, sizeof(data)) ==
  pg_comp_crc32c_sb8(0, &data, sizeof(data)));

Of course we'd eat a few more cycles during the test, but it's hard
to see that mattering.

It strikes me also that, at least for debugging purposes, it's seriously
awful that you can't tell from outside what result this function got.
Certainly the outcome that "pg_comp_crc32c_armv8 executed but returned
the wrong answer" is not something that ought to go unremarked.
We could elog the results, but I'm not very sure what log level is
appropriate --- also, I doubt we want another log entry from every
launched process, so how to prevent that?

regards, tom lane



Re: GSoC 2018: thrift encoding format

2018-05-02 Thread Charles Cui
Thanks for your confirmation, Aleksander!
Also, I am thinking about how to deal with complex
data structures like map, list, or set. I guess one possible
solution is to get the raw data bytes for these data structures?
Otherwise it could be hard to wrap them into a Datum.

2018-05-02 8:38 GMT-07:00 Aleksander Alekseev :

> Hello Charles,
>
> > Can I assume the data in thrift is always sent inside a struct?
>
> Sure!
>
> > I think this question is also valid for protobuf?
>
> Right, pg_protobuf assumes that data is always a message which is an
> equivalent of Thrift's struct.
>
> --
> Best regards,
> Aleksander Alekseev
>


Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
Andrew Gierth  writes:
> "Thomas" == Thomas Munro  writes:
> +   uint64  data = 42;

> Isn't there a hidden assumption about endianness there?

Yeah, I was wondering about that too, but does anyone actually run
ARMs big-endian?

regards, tom lane



Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

+   uint64  data = 42;

Isn't there a hidden assumption about endianness there?

-- 
Andrew (irc:RhodiumToad)



Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread David G. Johnston
On Wednesday, May 2, 2018, Alvaro Herrera  wrote:

> Robert Haas wrote:
> > On Wed, May 2, 2018 at 9:28 AM, Alvaro Herrera 
> wrote:
> > > I admit I am more concerned about the possibility of bugs than I am
> > > about providing a performance-related tool.
> >
> > I agree that if partition pruning has bugs, somebody might want to
> > turn it off.  On the other hand, when they do, there's a good chance
> > that they will lose so much performance that they'll still be pretty
> > sad.  Somebody certainly could have a workload where the pruning
> > helps, but by a small enough amount that shutting it off is
> > acceptable.  But I suspect that's a somewhat narrow target.
> >
> > I'm not going to go to war over this, though.  I'm just telling you
> > what I think.
>
> Well, we didn't have a GUC initially, evidently because none of us
> thought that this would be a huge problem.  So maybe you're both right
> and it's overkill to have it.  I'm not set on having it, either.  Does
> anybody else have an opinion?
>

I toss my +1 to removing it altogether.

David J.


Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, May 2, 2018 at 9:28 AM, Alvaro Herrera  
> wrote:
> > I admit I am more concerned about the possibility of bugs than I am
> > about providing a performance-related tool.
> 
> I agree that if partition pruning has bugs, somebody might want to
> turn it off.  On the other hand, when they do, there's a good chance
> that they will lose so much performance that they'll still be pretty
> sad.  Somebody certainly could have a workload where the pruning
> helps, but by a small enough amount that shutting it off is
> acceptable.  But I suspect that's a somewhat narrow target.
> 
> I'm not going to go to war over this, though.  I'm just telling you
> what I think.

Well, we didn't have a GUC initially, evidently because none of us
thought that this would be a huge problem.  So maybe you're both right
and it's overkill to have it.  I'm not set on having it, either.  Does
anybody else have an opinion?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Yuriy Zhuravlev
> Sorry, but comparing lines at that state is just bullshit.

I totally disagree; the proportions will be the same in any case.

> Most of the comments of converted tests are missing.

Add 100-500 lines? OK.

> You detect like a third of the things that the old configure
> detected.

I tried to use the CMake way where it exists, but for some other things I
ported the checks from the old autoconf system.

> The thread safety checks definitely aren't comparable. The int128 type
> checks aren't comparable.

> The atomics check doesn't guard against compilers that allow referencing
> undefined functions at compile time.

I am not sure about "comparable", but anyway you can make a PR with a fix,
or at least open an issue in my tracker and I will fix it.

> No LLVM detection.

Sure! Because my code base is still on Postgres 10.  After all the words
about a new build system and CMake here, I have no plan to support
unreleased versions.  I am not a masochist...

Regards,


Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Andres Freund
On 2018-05-03 09:42:49 +0900, Yuriy Zhuravlev wrote:
> 2018-05-03 9:32 GMT+09:00 Andres Freund :
> > Given that you don't have feature parity this just seems like trolling.
> >
> 
> I have. I have some lacks with .po generation and documentation but all!
> other features same, I even can run tap tests.
> Look into my task issue list
> https://github.com/stalkerg/postgres_cmake/issues it's can increase number
> of lines maximum on 10%.

You detect like a third of the things that the old configure
detected. Most of the comments of converted tests are missing. The
thread safety checks definitely aren't comparable. The int128 type checks
aren't comparable. No LLVM detection. The atomics check doesn't guard
against compilers that allow referencing undefined functions at compile
time.  That's like a 60s scan of what you have. Sorry, but comparing
lines at that state is just bullshit.

Greetings,

Andres Freund



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread Robert Haas
On Wed, May 2, 2018 at 9:28 AM, Alvaro Herrera  wrote:
> I admit I am more concerned about the possibility of bugs than I am
> about providing a performance-related tool.

I agree that if partition pruning has bugs, somebody might want to
turn it off.  On the other hand, when they do, there's a good chance
that they will lose so much performance that they'll still be pretty
sad.  Somebody certainly could have a workload where the pruning
helps, but by a small enough amount that shutting it off is
acceptable.  But I suspect that's a somewhat narrow target.

I'm not going to go to war over this, though.  I'm just telling you
what I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Yuriy Zhuravlev
2018-05-03 9:32 GMT+09:00 Andres Freund :

> On 2018-05-03 09:29:32 +0900, Yuriy Zhuravlev wrote:
> > >
> > > I don't think that unsubstantiated hyperbole is the right way to
> > > approach the task of convincing the community to adopt the approach
> > > you prefer.
> >
> >
> > It's not a hyperbole it's fact and I even talked about it on conference.
> > You should just compare all my cmake files with Makefile+.in+.m4 (and
> msvc
> > folder) it was significant reduce code to maintain.
> > Anyway all my intention in this field it's to reduce pain and reduce
> suppor
> > time for build system.
> > Curren state:
> >
> > cat `find ./ | grep '\.in\|\.m4\|Makefile\|\/msvc\/'` | wc
> >  22942   76111  702163
> >
> > cat `find ./ | grep 'CMakeLists\|\.cmake'` | wc
> >   9160   16604  278061
>
> Given that you don't have feature parity this just seems like trolling.
>

I have. I have some gaps with .po generation and documentation, but all
other features are the same; I can even run the TAP tests.
Look at my issue list,
https://github.com/stalkerg/postgres_cmake/issues; fixing those can increase
the number of lines by at most 10%.


Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Andres Freund
On 2018-05-03 09:29:32 +0900, Yuriy Zhuravlev wrote:
> >
> > I don't think that unsubstantiated hyperbole is the right way to
> > approach the task of convincing the community to adopt the approach
> > you prefer.
> 
> 
> It's not a hyperbole it's fact and I even talked about it on conference.
> You should just compare all my cmake files with Makefile+.in+.m4 (and msvc
> folder) it was significant reduce code to maintain.
> Anyway all my intention in this field it's to reduce pain and reduce suppor
> time for build system.
> Curren state:
> 
> cat `find ./ | grep '\.in\|\.m4\|Makefile\|\/msvc\/'` | wc
>  22942   76111  702163
> 
> cat `find ./ | grep 'CMakeLists\|\.cmake'` | wc
>   9160   16604  278061

Given that you don't have feature parity this just seems like trolling.


> and also, I use code style when a source file names every time on new
> line... it's serious increase numbers of line.
> If compare the same style as in Makefile it will be ~3000 (you can just
> compare words ;) )

Right, because m4 uses so few lines.



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Yuriy Zhuravlev
>
> I don't think that unsubstantiated hyperbole is the right way to
> approach the task of convincing the community to adopt the approach
> you prefer.


It's not hyperbole, it's a fact, and I even talked about it at a conference.
You should just compare all my cmake files with Makefile+.in+.m4 (and the msvc
folder); it was a significant reduction in the code to maintain.
Anyway, all my intention in this field is to reduce pain and reduce the
support time for the build system.
Current state:

cat `find ./ | grep '\.in\|\.m4\|Makefile\|\/msvc\/'` | wc
 22942   76111  702163

cat `find ./ | grep 'CMakeLists\|\.cmake'` | wc
  9160   16604  278061

Also, I use a code style where source file names each go on a new
line... it seriously increases the number of lines.
If compared in the same style as the Makefiles it would be ~3000 (you can just
compare words ;) )

Regards.


Re: Unportable code in autoprewarm.c

2018-05-02 Thread Robert Haas
On Wed, May 2, 2018 at 3:21 PM, Tom Lane  wrote:
> Is there a reason why this record count needs to be int64 rather than
> plain int, and if so what?  This code is not exactly well documented,
> but it looks to me like the number of records should be bounded by
> NBuffers, which is an int and is unlikely ever to not be an int.
> So I'm inclined to just flush autoprewarm.c's use of int64 counters
> altogether.

I don't know of a reason not to make that change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Is there a memory leak in commit 8561e48?

2018-05-02 Thread Michael Paquier
On Wed, May 02, 2018 at 07:03:21PM -0400, Tom Lane wrote:
> It's only ~100 bytes per stack level.  I think under normal loads
> nobody would notice.  If you're worried about cross-transaction
> memory consumption, our various caches tend to be a lot worse.

Perhaps; that's one reason why people drop connections to the server from
time to time even with a custom pooler.  I am wondering if we are going
to have complaints about "performance regressions" found after upgrading
to Postgres 11 for deployments which rely on complicated PL call stacks,
or complaints about the OOM killer, though.  Getting to review large
procedure stacks can be a pain for application developers.
--
Michael


signature.asc
Description: PGP signature


Re: A few warnings on Windows

2018-05-02 Thread Michael Paquier
On Thu, May 03, 2018 at 09:03:15AM +1200, Thomas Munro wrote:
> On Thu, May 3, 2018 at 8:01 AM, Tom Lane  wrote:
> Build succeeded.
> 0 Warning(s)
> 0 Error(s)
> 
> Huzzah!

Great work.  This has been some time...
--
Michael


signature.asc
Description: PGP signature


Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread David Rowley
On 3 May 2018 at 11:38, David G. Johnston  wrote:
> Maybe "Partition Filtering" (I'm disliking selection, I'm thinking we must
> always select partitions)

I don't see why "Filtering" is any different from pruning; they both
imply removing something that was once there. What I'm saying is that
it's backward to think of what we have now as pruning, so I don't
think renaming it to "partition filtering" addresses my concern.

FWIW, I'm not set on changing this. I just want to discuss this now so
that the chances of having regrets about this later are reduced.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread David G. Johnston
On Wed, May 2, 2018 at 4:06 PM, David Rowley 
wrote:

> On 1 May 2018 at 21:44, Amit Langote 
> wrote:
> > About the patch in general, it seems like the newly added documentation
> > talks about "Partition Pruning" as something that *replaces* constraint
> > exclusion.  But, I think "Partition Pruning" is not the thing that
> > replaces constraint exclusion.
>
> Just thinking about this a bit more. I've become a bit concerned that
> we've completely misnamed this feature. It's true that at the moment
> we build RelOptInfos for all partitions then eliminate what we can,
> but the new algorithm that we've been calling "partition pruning" is
> really not pruning anything at all, it's selecting the smallest set of
> matching partitions. It's only the current usage of the algorithm
> that's using it that way, and I kinda hope to change that for PG12.
>
> Isn't the whole thing better to be named "partition selection"?


The user-exposed name/GUC need not (and in some ways should not) reflect the
implementation.  Partitioning creates a tree, and during planning and
execution we prune those branches/paths from the tree that are not going to
yield fruit.  It's not like you can outright ignore their existence, so at
some point you choose to ignore them, which is a form of pruning.

Having written that, I can support partition_pruning on technical grounds,
but to what extent are we alienating the international community that we serve?

Maybe "Partition Filtering" (I'm disliking selection, I'm thinking we must
always select partitions)

Then again a Google search suggests we will be keeping good company by
sticking with "Partition Pruning" - any language dynamic is probably
overcome through extent of use.

On the whole I'd stick with what we've got.

David J.


Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Thomas Munro
On Thu, May 3, 2018 at 10:08 AM, Tom Lane  wrote:
> I don't have any way to test this, but it looks plausible, so pushed.
>
> (I note you forgot to run autoheader, btw.)

Thanks!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread David Rowley
On 1 May 2018 at 21:44, Amit Langote  wrote:
> About the patch in general, it seems like the newly added documentation
> talks about "Partition Pruning" as something that *replaces* constraint
> exclusion.  But, I think "Partition Pruning" is not the thing that
> replaces constraint exclusion.

Just thinking about this a bit more. I've become a bit concerned that
we've completely misnamed this feature. It's true that at the moment
we build RelOptInfos for all partitions then eliminate what we can,
but the new algorithm that we've been calling "partition pruning" is
really not pruning anything at all, it's selecting the smallest set of
matching partitions. It's only the current usage of the algorithm
that's using it that way, and I kinda hope to change that for PG12.

Isn't the whole thing better to be named "partition selection"?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Is there a memory leak in commit 8561e48?

2018-05-02 Thread Tom Lane
Michael Paquier  writes:
> With connection poolers letting the connections to the server be around
> for a long time, wouldn't it be an issue to let this much memory live
> longer than the transaction context?  The deeper the stack, the more
> memory consumed, hence the more OS cache that PostgreSQL cannot use.  So
> this could impact performance for some loads.  I would vote for cleaning
> up this memory instead of letting it live unused in TopMemoryContext.

It's only ~100 bytes per stack level.  I think under normal loads
nobody would notice.  If you're worried about cross-transaction
memory consumption, our various caches tend to be a lot worse.

regards, tom lane



Re: Is there a memory leak in commit 8561e48?

2018-05-02 Thread Michael Paquier
On Wed, May 02, 2018 at 05:20:37PM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > Yes, that was the idea.  Here is an adjusted patch.
> 
> Looks OK to me as far as the leak issue goes.  I have no opinion on
> whether this is adequate in respect to cleanup-after-error problems.

With connection poolers letting the connections to the server be around
for a long time, wouldn't it be an issue to let this much memory live
longer than the transaction context?  The deeper the stack, the more
memory consumed, hence the more OS cache that PostgreSQL cannot use.  So
this could impact performance for some loads.  I would vote for cleaning
up this memory instead of letting it live unused in TopMemoryContext.
--
Michael


signature.asc
Description: PGP signature


Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Catalin Iacob
On Wed, May 2, 2018 at 5:44 PM, Robert Haas  wrote:
> I don't think that unsubstantiated hyperbole is the right way to
> approach the task of convincing the community to adopt the approach
> you prefer.  I don't see that any compelling evidence has been
> presented that a cmake-based solution would really save thousands of
> lines of code.

Let me try to list the advantages as I see them.

* ability to use ninja
  ** meson's requirement to use ninja might be a disadvantage but
ability to use is definitely good
  ** faster than make - the difference is really noticeable
  ** good dependency story, parallel everything
  ** simply a very nice developer experience, for example the screen
is not filled with scrolling lines instead progress updates shown as
x/y files to go, currently at file z; try it and you'll see what I
mean
  ** I got interested in the ninja for PG and therefore CMake or meson
after trying clang-cl.exe for PG on Windows. clang-cl.exe is a drop in
open source replacement for Microsoft's cl.exe but using it does not
interact well with the fact that MSBuild runs only one cl.exe with
lots of .c files as input and expects cl.exe to handle the parallelism
while clang-cl.exe does not handle any parallelism taking the position
that the build system should handle that. Being able to invoke
clang-cl.exe from ninja instead of MSBuild would make fast compilation
with clang-cl.exe easy while now only slow serial compilation is
possible.

* IDE integration:
  ** cmake and meson can generate XCode and Visual Studio, granted
Visual Studio already works via the MSVC scripts
  ** CLion can consume cmake giving a good IDE story on Linux which PG
currently lacks

* get rid of the ad-hoc MSVC generation Perl scripts
  ** granted, I looked at those recently in the clang-cl context above
and they're reasonably understandable/approachable even without
knowing too much Perl

* appeal to new developers
  ** I think it's not a controversial statement that, as time passes,
autotools and make syntax are seen more and more as arcane things that
only old beards know how to handle and that the exciting stuff moved
elsewhere; in the long run this is a real problem
  ** on the other hand, as an autotools almost complete novice, after
reading some autotools docs, I was pleasantly surprised at how small
and easy to follow Andres' build patch adding LLVM and C++ support
was, especially as it's doing big, non conventional changes: add
support for another compiler but in a specific "emit LLVM bitcode"
mode, add support for C++ etc. So autoconf ugliness is not that big of
a deal but perception does matter.

* PGXS on Windows
  ** could be solvable without moving wholesale

From the above, I would rate ninja as a high nice-to-have; IDE integration,
PGXS on Windows, and new developers as medium-high nice-to-haves (but see
below for long-term concerns); and removing the MSVC Perl scripts as a low
nice-to-have.

I started the thread as it seemed to me energy was consumed to move to
another system (proof of concept and discussions) while it wasn't even
clarified whether a new system isn't a complete no go due to the old
platforms PG supports. I find Tom's and Robert's position of
"acceptable but we would need to see real benefits as there definitely
are real downsides" perfectly reasonable. The build system dictating
platform support would indeed be the tail wagging the dog. Personally,
with the current information I'd not vote for switching to another
system, mainly because I ultimately think developer convenience should
not trump end user benefits.

I do have a real concern about the long term attractiveness of the
project to new developers, especially younger ones as time passes.
It's not a secret that people will just avoid creaky old projects, and
for Postgres old out of fashion things do add up: autoconf, raw make,
Perl for tests, C89, old platform support. I have no doubt that the
project is already losing competent potential developers due to this.
One can say this is superficial and those developers should look at
the important things but that does not change reality that some will
just say pass because of dislike of the old technologies I mentioned.
Personally, I can say that if the project were still in CVS I would
probably not bother as I just don't have energy to learn an inferior
old version control system especially as I see version control as
fundamental to a developer. I don't feel the balance between
recruiting new developers and end user benefits tilted enough to
replace the build system but maybe in some years that will be the
case.



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Tom Lane
Andres Freund  writes:
> Now you could argue that we could just rewrite to non-recursive
> make. But that'd be nearly as much work as migrating to another
> buildsystem.

I'm sure it'd be a significant amount of work ... but it wouldn't require
redesigning any configuration or portability hacks, nor any change in
tool prerequisites, and at least in principle it wouldn't require changes
in users' build scripts.  So I think claiming it's as expensive as
migrating to cmake is almost certainly wrong.

(I don't know offhand if tricks like "build plpython only" would
still work unchanged, but that's a sufficiently niche usage that
I'd not be too concerned about making those people adapt their
scripts.)

regards, tom lane



Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
Thomas Munro  writes:
> On Thu, May 3, 2018 at 2:30 AM, Tom Lane  wrote:
>> Do you really need the pg_crc32c_armv8_choose_dummy global variable?

> True.  Of course I needed an interesting length too...

I don't have any way to test this, but it looks plausible, so pushed.

(I note you forgot to run autoheader, btw.)

regards, tom lane



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Andres Freund
On 2018-05-02 23:43:50 +0200, Hartmut Holzgraefe wrote:
> On 02.05.2018 17:44, Robert Haas wrote:
> > But having parallel make work better and more efficiently
> > and with fewer hard-to-diagnose failure modes would definitely be
> > nice.
> 
> that's especially a thing I haven't seen in "our" environment,
> this was an area where autotools and cmake didn't really differ,
> at least not for the Unix/Makefile side of things.

Recursive make like ours can't do full parallelism because dependencies
can't be fully expressed. With cmake that's not an issue. And its ninja
generator ends up being considerably faster than makefiles.

Now you could argue that we could just rewrite to non-recursive
make. But that'd be nearly as much work as migrating to another
buildsystem.

Greetings,

Andres Freund



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Hartmut Holzgraefe

On 02.05.2018 17:44, Robert Haas wrote:
> But having parallel make work better and more efficiently
> and with fewer hard-to-diagnose failure modes would definitely be
> nice.

That's especially something I haven't seen in "our" environment;
this was an area where autotools and cmake didn't really differ,
at least not for the Unix/Makefile side of things.

The only thing about parallelism I remember is that it sometimes
doesn't work well with the progress percentage output of cmake-generated
makefiles ... but that's purely cosmetic.

--
hartmut



Re: Is there a memory leak in commit 8561e48?

2018-05-02 Thread Tom Lane
Peter Eisentraut  writes:
> Yes, that was the idea.  Here is an adjusted patch.

Looks OK to me as far as the leak issue goes.  I have no opinion on
whether this is adequate in respect to cleanup-after-error problems.

regards, tom lane



Re: A few warnings on Windows

2018-05-02 Thread Tom Lane
Thomas Munro  writes:
> One more problem.  whelk builds against Python 3.6 and says:

> c:\users\pgbf\appdata\local\programs\python\python36-32\include\pyconfig.h(174):
> warning C4142: benign redefinition of type
> (src/pl/plpython/plpy_elog.c)
> [C:\buildfarm\buildenv\HEAD\pgsql.build\plpython3.vcxproj]

> Does anyone know what line 174 of pyconfig.h happens to say?

We'll have to ask the machine's owner I guess (cc'd here).
In the python 3.6 installation I have at hand, there's nothing
except #define's in pyconfig.h ... but the Windows version must
do things differently.

regards, tom lane



Re: A few warnings on Windows

2018-05-02 Thread Thomas Munro
On Thu, May 3, 2018 at 8:01 AM, Tom Lane  wrote:
> Seems reasonable to me.  Pushed, we'll see what the buildfarm thinks.

Build succeeded.
0 Warning(s)
0 Error(s)

Huzzah!

One more problem.  whelk builds against Python 3.6 and says:

c:\users\pgbf\appdata\local\programs\python\python36-32\include\pyconfig.h(174):
warning C4142: benign redefinition of type
(src/pl/plpython/plpy_elog.c)
[C:\buildfarm\buildenv\HEAD\pgsql.build\plpython3.vcxproj]

Does anyone know what line 174 of pyconfig.h happens to say?

thrips builds against Python 3.5 and doesn't show any warnings.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Is there a memory leak in commit 8561e48?

2018-05-02 Thread Peter Eisentraut
On 5/2/18 12:30, Tom Lane wrote:
> I'm not particularly.  It seems impossible that _SPI_stack could grow
> faster than the machine's actual stack, which would mean (on typical
> setups) that you couldn't get more than perhaps 10MB of _SPI_stack
> before hitting a stack-overflow error.  That's a lot by some measures,
> but I don't think it's enough to cripple anything if we don't free it.
> 
> Also, if someone is routinely using pretty deep SPI nesting, they'd
> probably be happier that we do keep the stack rather than repeatedly
> creating and enlarging it.

Yes, that was the idea.  Here is an adjusted patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From bcb8d9e495503b281fe5a606153a82d81fe0406a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 2 May 2018 16:50:03 -0400
Subject: [PATCH v2] Fix SPI error cleanup and memory leak

Since the SPI stack has been moved from TopTransactionContext to
TopMemoryContext, setting _SPI_stack to NULL in AtEOXact_SPI() leaks
memory.  In fact, we don't need to do that anymore: We just leave the
allocated stack around for the next SPI use.

Also, refactor the SPI cleanup so that it is run both at transaction end
and when returning to the main loop on an exception.  The latter is
necessary when a procedure calls a COMMIT or ROLLBACK command that
itself causes an error.
---
 src/backend/executor/spi.c  | 30 --
 src/backend/tcop/postgres.c |  2 ++
 src/include/executor/spi.h  |  1 +
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 08f6f67a15..22dd55c378 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -260,35 +260,37 @@ SPI_rollback(void)
_SPI_current->internal_xact = false;
 }
 
+/*
+ * Clean up SPI state.  Called on transaction end (of non-SPI-internal
+ * transactions) and when returning to the main loop on error.
+ */
+void
+SPICleanup(void)
+{
+   _SPI_current = NULL;
+   _SPI_connected = -1;
+   SPI_processed = 0;
+   SPI_lastoid = InvalidOid;
+   SPI_tuptable = NULL;
+}
+
 /*
  * Clean up SPI state at transaction commit or abort.
  */
 void
 AtEOXact_SPI(bool isCommit)
 {
-   /*
-* Do nothing if the transaction end was initiated by SPI.
-*/
+   /* Do nothing if the transaction end was initiated by SPI. */
if (_SPI_current && _SPI_current->internal_xact)
return;
 
-   /*
-* Note that memory contexts belonging to SPI stack entries will be freed
-* automatically, so we can ignore them here.  We just need to restore our
-* static variables to initial state.
-*/
if (isCommit && _SPI_connected != -1)
ereport(WARNING,
(errcode(ERRCODE_WARNING),
 errmsg("transaction left non-empty SPI stack"),
 errhint("Check for missing \"SPI_finish\" 
calls.")));
 
-   _SPI_current = _SPI_stack = NULL;
-   _SPI_stack_depth = 0;
-   _SPI_connected = -1;
-   SPI_processed = 0;
-   SPI_lastoid = InvalidOid;
-   SPI_tuptable = NULL;
+   SPICleanup();
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3828cae921..f4133953be 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_type.h"
 #include "commands/async.h"
 #include "commands/prepare.h"
+#include "executor/spi.h"
 #include "jit/jit.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -3941,6 +3942,7 @@ PostgresMain(int argc, char *argv[],
WalSndErrorCleanup();
 
PortalErrorCleanup();
+   SPICleanup();
 
/*
 * We can't release replication slots inside AbortTransaction() as we
diff --git a/src/include/executor/spi.h b/src/include/executor/spi.h
index e5bdaecc4e..143a89a16c 100644
--- a/src/include/executor/spi.h
+++ b/src/include/executor/spi.h
@@ -163,6 +163,7 @@ extern void SPI_start_transaction(void);
 extern void SPI_commit(void);
 extern void SPI_rollback(void);
 
+extern void SPICleanup(void);
 extern void AtEOXact_SPI(bool isCommit);
 extern void AtEOSubXact_SPI(bool isCommit, SubTransactionId mySubid);
 

base-commit: 40f52b16dd31aa9ddc3bd42daa78459562693567
-- 
2.17.0



Re: A few warnings on Windows

2018-05-02 Thread Tom Lane
Thomas Munro  writes:
> The only remaining warnings on those machines now come from the fact
> that our port_win32.h and Perl's XSUB.h both want to define macros for
> libc functions like mkdir etc.  plperl.h already
> seems to deal with other similar kinds of ugliness.  Perhaps if
> PG_NEED_PERL_XSUB_H is defined we should undefine port_win32.h's
> libc-stealing macros before including it?  Either way XSUB.h's
> definitions win.  That should get us to 0 warnings.  See attached (not
> tested).

Seems reasonable to me.  Pushed, we'll see what the buildfarm thinks.

regards, tom lane



Unportable code in autoprewarm.c

2018-05-02 Thread Tom Lane
I've been going through compiler warnings from the buildfarm, and
I notice jacana is unhappy about this:

/* First line of the file is a record count. */
if (fscanf(file, "<<" INT64_FORMAT ">>\n", &num_elements) != 1)

It's entirely correct to complain, because we only guarantee that
INT64_FORMAT works with snprintf, not with the scanf family of
functions.

Is there a reason why this record count needs to be int64 rather than
plain int, and if so what?  This code is not exactly well documented,
but it looks to me like the number of records should be bounded by
NBuffers, which is an int and is unlikely ever to not be an int.
So I'm inclined to just flush autoprewarm.c's use of int64 counters
altogether.
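
A minimal sketch of that direction, assuming the count simply becomes a
plain int (the variable name and error message here are illustrative, not
the committed change):

	int			num_elements;

	/* Plain %d is portable with fscanf, unlike INT64_FORMAT. */
	if (fscanf(file, "<<%d>>\n", &num_elements) != 1)
		elog(ERROR, "autoprewarm dump file has an invalid record count line");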

regards, tom lane



Re: stats_ext test fails with -DCATCACHE_FORCE_RELEASE

2018-05-02 Thread Andrew Dunstan
On Wed, May 2, 2018 at 11:48 AM, Tom Lane  wrote:
> tl;dr: we lack buildfarm coverage of CATCACHE_FORCE_RELEASE.
> This is probably bad.  It might be okay to just add that to
> prion's configuration;

Will do that pronto.


cheers

andre

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Clock with Adaptive Replacement

2018-05-02 Thread Vladimir Sitnikov
>If somebody produced a trace showing the buffer lookups in order

To get things moving, I've created a DTrace script that captures buffer
reads:
https://github.com/vlsi/pgsqlstat/blob/pgsqlio/pgsqlio

Is it something that can be used to capture live traces?

Sample output can be seen here:
https://github.com/vlsi/pgsqlstat/tree/pgsqlio#pgsqlio

For instance (time units are nano-seconds):

 Timestamp  Elapsed  ForkNum BlockNum  TableSp DataBase Relation Cached
 3271837060212 156301 166316385 1259  1
 32718388813748820500 16631638516604  0
 3271840973321 436800 16631638516604  1

If DTrace is acceptable, trace format might be adjusted for easier
consumption by https://github.com/ben-manes/caffeine/wiki/Simulator

Vladimir


Re: Sort performance cliff with small work_mem

2018-05-02 Thread Peter Geoghegan
On Wed, May 2, 2018 at 11:06 AM, Tom Lane  wrote:
>> -1 from me. What about the case where only some tuples are massive?
>
> Well, what about it?  If there are just a few wide tuples, then the peak
> memory consumption will depend on how many of those happen to be in memory
> at the same time ... but we have zero control over that in the merge
> phase, so why sweat about it here?  I think Heikki's got a good idea about
> setting a lower bound on the number of tuples we'll hold in memory during
> run creation.

We don't have control over it, but I'm not excited about specifically
going out of our way to always use more memory in dumptuples() because
it's no worse than the worst case for merging. I am supportive of the
idea of making sure that the amount of memory left over for tuples is
reasonably in line with memtupsize at the point that the sort starts,
though.

-- 
Peter Geoghegan



Re: Sort performance cliff with small work_mem

2018-05-02 Thread Peter Geoghegan
On Wed, May 2, 2018 at 10:43 AM, Heikki Linnakangas  wrote:
> I'm not sure what you could derive that from, to make it less arbitrary. At
> the moment, I'm thinking of just doing something like this:
>
> /*
>  * Minimum amount of memory reserved to hold the sorted tuples in
>  * TSS_BUILDRUNS phase.  This specifies a minimum size for the merge runs,
>  * when work_mem is very small.
>  */
> #define MIN_TUPLE_MEMORY		(32 * 1024)

If you end up doing something like this, I suggest that you also
change this code to simply assign 1024 (or maybe a new preprocessor
constant):

state->memtupsize = Max(1024, ALLOCSET_SEPARATE_THRESHOLD /
sizeof(SortTuple) + 1);

The ALLOCSET_SEPARATE_THRESHOLD part can become a static assertion.
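
A sketch of what that could look like, assuming the existing SortTuple and
ALLOCSET_SEPARATE_THRESHOLD definitions (illustrative only, not a committed
change):

	/* Compile-time guarantee replacing the runtime Max() calculation. */
	StaticAssertStmt(ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1 <= 1024,
					 "initial memtupsize must cover ALLOCSET_SEPARATE_THRESHOLD");

	state->memtupsize = 1024;
	state->memtuples = (SortTuple *) palloc(state->memtupsize * sizeof(SortTuple));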

-- 
Peter Geoghegan



Re: Sort performance cliff with small work_mem

2018-05-02 Thread Peter Geoghegan
On Wed, May 2, 2018 at 10:43 AM, Heikki Linnakangas  wrote:
> Independently of this, perhaps we should put in special case in
> dumptuples(), so that it would never create a run with fewer than maxTapes
> tuples. The rationale is that you'll need to hold that many tuples in memory
> during the merge phase anyway, so it seems silly to bail out before that
> while building the initial runs. You're going to exceed work_mem by the
> roughly same amount anyway, just in a different phase. That's not the case
> in this example, but it might happen when sorting extremely wide tuples.

-1 from me. What about the case where only some tuples are massive?

-- 
Peter Geoghegan



Re: Sort performance cliff with small work_mem

2018-05-02 Thread Heikki Linnakangas

On 02/05/18 19:41, Tom Lane wrote:
> Robert Haas  writes:
>> On Wed, May 2, 2018 at 11:38 AM, Heikki Linnakangas  wrote:
>>> To fix, I propose that we change the above so that we always subtract
>>> tapeSpace, but if there is less than e.g. 32 kB of memory left after that
>>> (including, if it went below 0), then we bump availMem back up to 32 kB. So
>>> we'd always reserve 32 kB to hold the tuples, even if that means that we
>>> exceed 'work_mem' slightly.
>
>> Sounds very reasonable.
>
> Agreed.  I think that was my code to start with, and the issue certainly
> didn't occur to me at the time.
>
> I don't like the idea of using hardwired "32kB" though; some multiple
> of TAPE_BUFFER_OVERHEAD seems more plausible.  Perhaps MINORDER times
> TAPE_BUFFER_OVERHEAD would be good?


I don't think the amount that we reserve to hold the tuples should 
depend on those things. The function is "allocating" memory for the tape 
buffers, yes, but now we're talking about keeping some memory for the 
tuples, while quicksorting the initial runs. That seems independent from 
the number of tapes or the tape buffer size.


I'm not sure what you could derive that from, to make it less arbitrary. 
At the moment, I'm thinking of just doing something like this:


/*
 * Minimum amount of memory reserved to hold the sorted tuples in
 * TSS_BUILDRUNS phase.  This specifies a minimum size for the merge runs,
 * when work_mem is very small.
 */
#define MIN_TUPLE_MEMORY		(32 * 1024)
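
A rough sketch of how that reservation could be applied where inittapes()
charges the tape buffers against availMem (illustrative only, not the final
patch):

	tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;

	/*
	 * Always charge the tape buffers, but keep at least MIN_TUPLE_MEMORY
	 * available for the tuples themselves, even if that means exceeding
	 * work_mem slightly.
	 */
	state->availMem -= tapeSpace;
	if (state->availMem < MIN_TUPLE_MEMORY)
		state->availMem = MIN_TUPLE_MEMORY;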

Independently of this, perhaps we should put in special case in 
dumptuples(), so that it would never create a run with fewer than 
maxTapes tuples. The rationale is that you'll need to hold that many 
tuples in memory during the merge phase anyway, so it seems silly to 
bail out before that while building the initial runs. You're going to 
exceed work_mem by the roughly same amount anyway, just in a different 
phase. That's not the case in this example, but it might happen when 
sorting extremely wide tuples.


- Heikki



GSOC 2018

2018-05-02 Thread Joshua D. Drake

-hackers,

Who is coordinating GSOC this year?


Thanks,


JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Sort performance cliff with small work_mem

2018-05-02 Thread Peter Geoghegan
On Wed, May 2, 2018 at 8:38 AM, Heikki Linnakangas  wrote:
> With a small work_mem values, maxTapes is always 6, so tapeSpace is 48 kB.
> With a small enough work_mem, 84 kB or below in this test case, there is not
> enough memory left at this point, so we don't subtract tapeSpace. However,
> with a suitably evil test case, you can get arbitrary close to the edge, so
> that we will subtract away almost all the remaining memory above, leaving
> only a few bytes for the tuples. In this example case, with work_mem='85
> kB', each quicksorted run consists of only 15 tuples on average.

This is an extreme example of the memtuples array and memory for
tuples becoming unbalanced. You'll have 1024 memtuples (24KiB), which
is rather a lot more than the 15 tuples that you can fit in this
example case.

I don't feel strongly about it, but arguably the minimum additional
amount of memory should be big enough that you have some chance of
using all 1024 SortTuples (the whole memtuples array).

-- 
Peter Geoghegan



Re: Transform for pl/perl

2018-05-02 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> These two items are now outstanding:
>
> On 4/10/18 07:33, Dagfinn Ilmari Mannsåker wrote:
>> 2) jsonb scalar values are passed to the plperl function wrapped in not
>>one, but _two_ layers of references
>
> I don't understand this one, or why it's a problem, or what to do about it.

It means that if you call a jsonb-transforming pl/perl function like

    select somefunc(jsonb '42');

it receives not the scalar 42, but a reference to a reference to the
scalar (**int instead of an int, in C terms).  This is not caught by the
current round-trip tests because the output transform automatically
dereferences any number of references on the way out again.

The fix is to reshuffle the newRV() calls in Jsonb_to_SV() and
jsonb_to_plperl().  I am working on a patch (and improved tests) for
this, but have not had time to finish it yet.  I hope to be able to in
the next week or so.

>> 3) jsonb numeric values are passed as perl's NV (floating point) type,
>>losing precision if they're integers that would fit in an IV or UV.
>
> This seems fixable, but perhaps we need to think through whether this
> will result in other strange behaviors.

Numbers > 2⁵³ are not "interoperable" in the sense of the JSON spec,
because JavaScript only has doubles, but it seems desirable to preserve
whatever precision one reasonably can, and I can't think of any
downsides.  We already support the full numeric range when processing
JSONB in SQL, it's just in the PL/Perl transform (and possibly
PL/Python, I didn't look) we're losing precision.

Perl can also be configured to use long double or __float128 (via
libquadmath) for its NV type, but I think preserving 64bit integers when
building against a Perl with a 64bit integer type would be sufficient.

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law



Re: Is there a memory leak in commit 8561e48?

2018-05-02 Thread Tom Lane
Peter Eisentraut  writes:
> On 5/1/18 11:42, Tom Lane wrote:
>> Peter Eisentraut  writes:
>> That seems like an odd way to approach this.  Why not just remove the
>> reset of _SPI_stack and _SPI_stack_depth, so as to subtract code rather
>> than adding it --- that is, make it actually work like you mistakenly
>> thought it did?  If we're going to keep the stack in TopMemoryContext,
>> there's no need to thrash it on every transaction.

> Yes, that seems attractive.

> Are we concerned about the case where someone runs a very deeply nested
> SPI setup once in a session, creating a large stack allocation, which
> then persists for the rest of the session?

I'm not particularly.  It seems impossible that _SPI_stack could grow
faster than the machine's actual stack, which would mean (on typical
setups) that you couldn't get more than perhaps 10MB of _SPI_stack
before hitting a stack-overflow error.  That's a lot by some measures,
but I don't think it's enough to cripple anything if we don't free it.

Also, if someone is routinely using pretty deep SPI nesting, they'd
probably be happier that we do keep the stack rather than repeatedly
creating and enlarging it.

regards, tom lane



Re: [HACKERS] Clock with Adaptive Replacement

2018-05-02 Thread Robert Haas
On Tue, May 1, 2018 at 6:37 PM, Peter Geoghegan  wrote:
> This seems to be an old idea.

I'm not too surprised ... this area has been well-studied.

> I've always had a problem with the 8GB/16GB upper limit on the size of
> shared_buffers. Not because it's wrong, but because it's not something
> that has ever been explained. I strongly suspect that it has something
> to do with usage_count saturation, since it isn't reproducible with
> any synthetic workload that I'm aware of. Quite possibly because there
> are few bursty benchmarks.

I've seen customer have very good luck going higher if it lets all the
data fit in shared_buffers, or at least all the data that is accessed
with any frequency.  I think it's useful to imagine a series of
concentric working sets - maybe you have 1GB of the hottest data, 3GB
of data that is at least fairly hot, 10GB of data that is at least
somewhat hot, and another 200GB of basically cold data.  Increasing
shared_buffers in a way that doesn't let the next "ring" fit in
shared_buffers isn't likely to help very much.  If you have 8GB of
shared_buffers on this workload, going to 12GB is probably going to
help -- that should be enough for the 10GB of somewhat-hot stuff and a
little extra so that the somewhat-hot stuff doesn't immediately start
getting evicted if some of the cold data is accessed.  Similarly,
going from 2GB to 4GB should be a big help, because now the fairly-hot
stuff should stay in cache.  But going from 4GB to 6GB or 12GB to 16GB
may not do very much.  It may even hurt, because the duplication
between shared_buffers and the OS page cache means an overall
reduction in available cache space.  If for example you've got 16GB of
memory and shared_buffers=2GB, you *may* be fitting all of the
somewhat-hot data into cache someplace; bumping shared_buffers=4GB
almost certainly means that will no longer happen, causing performance
to tank.

I don't really think that the 8GB rule of thumb is something that
originates in any technical limitation of PostgreSQL or Linux.  First
of all it's just a rule of thumb -- the best value in a given
installation can easily be something completely different.  Second, to
the extent that it's a useful rule of thumb, I think it's really a
guess about what people's working set looks like: that going from 4GB
to 8GB, say, significantly increases the chances of fitting the
next-larger, next-cooler working set entirely in shared_buffers, going
from 8GB to 16GB is less likely to accomplish this, and going from
16GB to 32GB probably won't.  To a lesser extent, it's reflective of
the point where scanning shared buffers to process relation drops gets
painful, and the point where an immediate checkpoint suddenly dumping
that much data out to the OS all at once starts to overwhelm the I/O
subsystem for a significant period of time.  But I think those really
are lesser effects.  My guess is that the big effect is balancing
increased hit ratio vs. increased double buffering.

> I agree that wall-clock time is a bad approach, actually. If we were
> to use wall-clock time, we'd only do so because it can be used to
> discriminate against a thing that we actually care about in an
> approximate, indirect way. If there is a more direct way of
> identifying correlated accesses, which I gather that there is, then we
> should probably use it.

For a start, I think it would be cool if somebody just gathered traces
for some simple cases.  For example, consider a pgbench transaction.
If somebody produced a trace showing the buffer lookups in order
annotated as heap, index leaf, index root, VM page, FSM root page, or
whatever.  Examining some other simple, common cases would probably
help us understand whether it's normal to bump the usage count more
than once per buffer for a single scan, and if so, exactly why that
happens.  If the code knows that it's accessing the same buffer a
second (or subsequent) time, it could pass down a flag saying not to
bump the usage count again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: doc fixes: vacuum_cleanup_index_scale_factor

2018-05-02 Thread Alexander Korotkov
Hi!

Thank you for your attention to this subject.  It's definitely right
that the documentation needs to be revised in these places.

On Wed, May 2, 2018 at 6:43 PM, Justin Pryzby  wrote:

> index eabe2a9..e305de9 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -1893,14 +1893,16 @@ include_dir 'conf.d'
>
>
> 
> -When no tuples were deleted from the heap, B-tree indexes might
> still
> -be scanned during VACUUM cleanup stage by two
> -reasons.  The first reason is that B-tree index contains deleted
> pages
> -which can be recycled during cleanup.  The second reason is that
> B-tree
> -index statistics is stalled.  The criterion of stalled index
> statistics
> -is number of inserted tuples since previous statistics collection
> -is greater than vacuum_cleanup_index_
> scale_factor
> -fraction of total number of heap tuples.
> +When no tuples were deleted from the heap, B-tree indexes are
> still
> +scanned during VACUUM cleanup stage unless two
> +conditions are met: the index contains no deleted pages which can
> be
> +recycled during cleanup; and, the index statistics are not stale.
> +Index statistics are considered stale unless
> +vacuum_cleanup_index_scale_factor
> +is set to a non-negative value, and the number of inserted tuples
> since
> +the previous statistics collection is less than that fraction of
> the
> +total number of heap tuples.  The default is -1, which means index
> +scans during VACUUM cleanup are not skipped.
> 
>
>   
>

The default value of the vacuum_cleanup_index_scale_factor GUC is 0.1,
which means that 10% of tuples need to be inserted in order to trigger
vacuum cleanup.  See guc.c:

{
{"vacuum_cleanup_index_scale_factor", PGC_SIGHUP, AUTOVACUUM,
gettext_noop("Number of tuple inserts prior to index cleanup as a fraction
of reltuples."),
NULL
},
&vacuum_cleanup_index_scale_factor,
0.1, 0.0, 100.0,
NULL, NULL, NULL
},


The default value of the vacuum_cleanup_index_scale_factor reloption is -1,
which means that the value of the vacuum_cleanup_index_scale_factor GUC is
used by default.  See the following piece of code in _bt_vacuum_needs_cleanup().

cleanup_scale_factor = (relopts &&
relopts->vacuum_cleanup_index_scale_factor >= 0)
? relopts->vacuum_cleanup_index_scale_factor
: vacuum_cleanup_index_scale_factor;

In order to have the vacuum cleanup scan happen every time, one should set
the vacuum_cleanup_index_scale_factor GUC to 0.  Assuming this,
we need to replace "cleanup_scale_factor < 0" with
"cleanup_scale_factor <= 0" in the following condition:

if (cleanup_scale_factor < 0 ||
metad->btm_last_cleanup_num_heap_tuples < 0 ||
info->num_heap_tuples > (1.0 + cleanup_scale_factor) *
metad->btm_last_cleanup_num_heap_tuples)
result = true;

Another issue is that by default we store -1 in
metad->btm_last_cleanup_num_heap_tuples in order to avoid the overhead
of a meta-page rewrite.  metad->btm_last_cleanup_num_heap_tuples is
set at the first btcleanup() call when no tuples were deleted.  Second and
subsequent btcleanup() calls may skip the index scan.  This aspect needs
to be properly documented.

I'm going to propose a patch for this subject in a couple of days.
That patch would incorporate some of your changes as well as contain
some changes from me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Sort performance cliff with small work_mem

2018-05-02 Thread Peter Geoghegan
On Wed, May 2, 2018 at 8:46 AM, Robert Haas  wrote:
> On Wed, May 2, 2018 at 11:38 AM, Heikki Linnakangas  wrote:
>> To fix, I propose that we change the above so that we always subtract
>> tapeSpace, but if there is less than e.g. 32 kB of memory left after that
>> (including, if it went below 0), then we bump availMem back up to 32 kB. So
>> we'd always reserve 32 kB to hold the tuples, even if that means that we
>> exceed 'work_mem' slightly.
>
> Sounds very reasonable.

+1

-- 
Peter Geoghegan



Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-05-02 Thread Peter Eisentraut
On 5/2/18 02:26, Amit Langote wrote:
> You're right.  I think that's what you were also saying on the other
> thread, in reply to which I directed you to this thread.  I very clearly
> missed that BEFORE ROW triggers are still disallowed.

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: stats_ext test fails with -DCATCACHE_FORCE_RELEASE

2018-05-02 Thread Tom Lane
Amit Langote  writes:
> On 2018/05/02 0:33, Tom Lane wrote:
>> Amit Langote  writes:
>>> While playing around with a -DCATCACHE_FORCE_RELEASE build, I noticed that
>>> stats_ext test failed with errors for multiple statements that looked like
>>> this:
>>> ERROR:  invalid ndistinct magic 7f7f7f7f (expected a352bfa4)

>> Interesting.  How come the buildfarm hasn't noticed this?  I should
>> think that the CLOBBER_CACHE_ALWAYS animals, as well as the one(s)
>> using -DCATCACHE_FORCE_RELEASE, would have shown failures.

> I too wondered why.  Fwiw, similar failure occurs in PG 10 branch.

Ah, after looking closer I understand that.  First, there isn't any
buildfarm member using CATCACHE_FORCE_RELEASE --- what I was thinking
of is Andrew's prion, which uses RELCACHE_FORCE_RELEASE.  Not the
same thing.

Second, the nature of the bug is that these functions are reading
from a catcache entry immediately after ReleaseSysCache, when they
should do that immediately before.  CLOBBER_CACHE_ALWAYS does not
trigger the problem because it clobbers cache only at invalidation
opportunities.  In the current implementation, ReleaseSysCache per
se is not an invalidation opportunity.

CATCACHE_FORCE_RELEASE does model a real-world hazard, which is
that if we get an invalidation signal for a catcache item *while
it's pinned*, it'd go away as soon as the last pin is released.
Evidently, these code paths do not contain any invalidation
opportunities occurring while the pin on the stats_ext catcache
entry is already held, so CCA can't trigger the problem.  I think
this means that there's no production hazard here, just a violation
of coding convention.  Nonetheless, we certainly should fix it,
since it's easy to imagine future changes that would create a live
hazard of the tuple going away during the ReleaseSysCache call.
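
To illustrate the coding convention in question (a minimal sketch only --
the cache and attribute names follow the stats_ext code, but the variable
names are made up and this is not the actual fix):

	HeapTuple	htup;
	Datum		ndist;
	bool		isnull;

	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
	if (!HeapTupleIsValid(htup))
		elog(ERROR, "cache lookup failed for statistics object %u", statOid);

	/* read (and copy, if necessary) everything while the tuple is pinned */
	ndist = SysCacheGetAttr(STATEXTOID, htup,
							Anum_pg_statistic_ext_stxndistinct, &isnull);
	/* ... deserialize ndist here, before releasing ... */

	ReleaseSysCache(htup);		/* only after we are done reading from it */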

tl;dr: we lack buildfarm coverage of CATCACHE_FORCE_RELEASE.
This is probably bad.  It might be okay to just add that to
prion's configuration; I'm not sure whether there's any value
in testing it separately from RELCACHE_FORCE_RELEASE.

regards, tom lane



Re: Sort performance cliff with small work_mem

2018-05-02 Thread Robert Haas
On Wed, May 2, 2018 at 11:38 AM, Heikki Linnakangas  wrote:
> To fix, I propose that we change the above so that we always subtract
> tapeSpace, but if there is less than e.g. 32 kB of memory left after that
> (including, if it went below 0), then we bump availMem back up to 32 kB. So
> we'd always reserve 32 kB to hold the tuples, even if that means that we
> exceed 'work_mem' slightly.

Sounds very reasonable.
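
As I read it, the change would amount to something like this in inittapes()
(just a sketch; the 32 kB floor is written inline here and the surrounding
bookkeeping is elided):

	USEMEM(state, tapeSpace);	/* always charge the tape buffers */
	if (state->availMem < 32 * 1024)
		state->availMem = 32 * 1024;	/* keep some room for tuples, even
										 * if work_mem is slightly exceeded */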

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: doc fixes: vacuum_cleanup_index_scale_factor

2018-05-02 Thread Robert Haas
On Wed, May 2, 2018 at 11:43 AM, Justin Pryzby  wrote:
> 2nd attempt

That looks like good English to me.  I can't vouch for whether it's
technically correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Robert Haas
On Tue, May 1, 2018 at 8:12 PM, Yuriy Zhuravlev  wrote:
>> That indeed would be an improvement, but maybe we could fix that specific
>> pain point without having to throw away twenty years worth of work?
>
> Indeed! Only a few thousands of lines of code can generate the whole you
> manually wrote, it's the perfect result!

I don't think that unsubstantiated hyperbole is the right way to
approach the task of convincing the community to adopt the approach
you prefer.  I don't see that any compelling evidence has been
presented that a cmake-based solution would really save thousands of
lines of code.  True, some Perl code that we have now to generate
project files and so forth would go away, but I bet we'll end up
adding new code someplace else because of something-or-other that
doesn't work the same under cmake as it does under the current build
system.  For example:

>> re-invention of portability hacks
> This is the main goal for migrating to cmake - most of hacks cmake takes on
> itself.

Whatever hacks cmake *doesn't* handle will have to be reimplemented in
the cmake framework, and frankly, if history is any indication, we'll
be very lucky indeed if whoever submits the cmake patches is willing
to follow up on the things that break over the days, weeks, months,
years that follow the original commit.  More likely, after the first
few commits, or perhaps the first few months, they'll move on to their
next project and leave it to the committers to sort out whatever stuff
turns out to be broken later.  And very likely, too, they'll not
handle all the changes that are needed on the buildfarm side of
things, and maybe the PGXN side of things if that needs changes, and
they certainly won't update every third-party module in existence to
use the cmake framework.  Accepting a patch to support cmake means
some amount of work and adaptation will need to be done by hundreds of
developers on both the core PostgreSQL code base and various other
code bases, open source and proprietary.  Now it's probably not a lot
of work for any individual person, but it's a lot of work and
disruption over all.  It has to be worth it.

Now, I grant that my ears perked up when Andres mentioned making
parallel make work better.  I don't build on Windows so that issue
doesn't affect me personally -- it's great if it can be made to work
better with or without cmake but I don't have a view on the best way
forward.  But having parallel make work better and more efficiently
and with fewer hard-to-diagnose failure modes would definitely be
nice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GSoC 2018: thrift encoding format

2018-05-02 Thread Aleksander Alekseev
Hello Charles,

> Can I assume the data in thrift is always send inside a struct?

Sure!

> I think this question also valid for protobuf?

Right, pg_protobuf assumes that data is always a message which is an
equivalent of Thrift's struct.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: Explain buffers wrong counter with parallel plans

2018-05-02 Thread Adrien Nayrat
Hi,

On 05/02/2018 05:22 PM, Robert Haas wrote:
> On Tue, May 1, 2018 at 4:57 PM, Andres Freund  wrote:
>> Robert, you added this as an open item. I don't think it's clear that
>> there's a bug here, and even less clear that it's something new for
>> v11. Am I understanding that right?  Should we close this open item?
> 
> Yeah, I added it in response to the original post, but it sounds like
> this is just another case of somebody being confused by the fact that
> EXPLAIN divides the row counts, startup time, total time, and number
> of rows filtered by the loop count.  So it's not an open item.

To be precise, I am not confused by these counters. They do not change
between 9.6 and 10.

I am confused by the buffers counters, which are different between 9.6 and 10.

In 9.6 the Gather node reports the sum of buffers for the main process +
workers. In 10, the Gather node only reports buffers from the main process.


Thanks,


-- 
Adrien




signature.asc
Description: OpenPGP digital signature


Re: Explain buffers wrong counter with parallel plans

2018-05-02 Thread Robert Haas
On Tue, May 1, 2018 at 4:57 PM, Andres Freund  wrote:
> Robert, you added this as an open item. I don't think it's clear that
> there's a bug here, and even less clear that it's something new for
> v11. Am I understanding that right?  Should we close this open item?

Yeah, I added it in response to the original post, but it sounds like
this is just another case of somebody being confused by the fact that
EXPLAIN divides the row counts, startup time, total time, and number
of rows filtered by the loop count.  So it's not an open item.

That having been said, what I think we ought to do about the confusion
that regularly results from that logic is rip it out, as in the
attached patch.  This would affect not only the parallel query case
but also the inner sides of nested loops, but that doesn't bother me a
bit because I think dividing is unhelpful in both cases.  Note that
explain.depesz.com, for example, goes and re-multiplies the times by
the loop count to get a figure for how much time was spent in that
node; I regularly do the same thing by hand when I read such EXPLAIN
output.  Also, if you did think those cases should be handled
differently, you'd have to explain what should happen when there's a
nested loop that gets run by multiple workers, possibly different
numbers of times in each one.
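
(To illustrate the by-hand arithmetic: a node reported with actual time
0.042 ms and loops=10000 really accounts for roughly 0.042 * 10000 = 420 ms
of execution, not 0.042 ms.)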

One problem with this patch is that tools like explain.depesz.com (or
my brain) would need to know if they were getting output from 11+ or
10-, so it might be good to perturb this output in some other way so
as to avoid confusion, like by replacing "loops=" with some other
text.  I don't have a specific proposal for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


no-loop-division.patch
Description: Binary data


Re: doc fixes: vacuum_cleanup_index_scale_factor

2018-05-02 Thread Robert Haas
On Tue, May 1, 2018 at 10:30 PM, Justin Pryzby  wrote:
> -When no tuples were deleted from the heap, B-tree indexes might still
> -be scanned during VACUUM cleanup stage by two
> -reasons.  The first reason is that B-tree index contains deleted pages
> -which can be recycled during cleanup.  The second reason is that B-tree
> -index statistics is stalled.  The criterion of stalled index statistics
> -is number of inserted tuples since previous statistics collection
> -is greater than vacuum_cleanup_index_scale_factor
> -fraction of total number of heap tuples.
> +When no tuples were deleted from the heap, B-tree indexes are still
> +scanned during VACUUM cleanup stage unless
> +two conditions are met.  First, if a B-tree index contains no deleted pages
> +which can be recycled during cleanup.  Second, if B-tree
> +index statistics are not stale.  Index statistics are considered stale unless
> +vacuum_cleanup_index_scale_factor is non-negative, and the
> +number of inserted tuples since the previous statistics collection is
> +less than that fraction of the total number of heap tuples.
> +The default is -1, meaning index scan during cleanup is not skipped.

I agree that this documentation needs to be rewritten but your rewrite
doesn't strike me as very good English either.  A sentence of the form
"First, if I like hamburgers." is not correct English.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: lingering references to V0 calling convention

2018-05-02 Thread Heikki Linnakangas

On 23/04/18 14:17, John Naylor wrote:

I noticed one in the SPI docs and tried to look for more. The attached
patch is an attempt at removing the remaining references. The original
SPI example returned a uint64 as a signed int8 SQL type, so I'm not
sure if I handled that correctly.


Committed, thanks!

- Heikki



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Hartmut Holzgraefe

On 02.05.2018 16:22, Tom Lane wrote:

  (-D? really?)


it's worse ... "cmake -L" only produces an alphabetically sorted list
of known -D settings, just listing the names without descriptions.

That's so far behind what

 ./configure --help

produces.

(And don't get me started on cmake-gui. One day I may even eventually
complete my "autotools-gui" ... https://github.com/hholzgra/autogui )

But at least on most Linux distributions TAB completion now works for
CMake -D options these days ...

--
hartmut



Re: power() function in Windows: "value out of range: underflow"

2018-05-02 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, May 1, 2018 at 12:08 PM, Robert Haas  wrote:
>> That's probably true, but making dev, test, and production boxes
>> similar is generally good practice and users can do as much or as
>> little of it as they find they need in order to avoid getting burned.
>> They can't do anything about behavior changes we inject into minor
>> releases.

> +1; this doesn't seem clear-cut and important enough to deviate from the
> (for me) preferable position of leaving well-enough-alone in the back
> branches.

Well, I seem to be in the minority here, so unless additional people
chime in, I'll revert this in the back branches shortly.

regards, tom lane



Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
Thomas Munro  writes:
> Ahh, OpenSSL's armcap.c shows how to do this.  You need to
> siglongjmp() out of there.  Here's a patch that does it that way.
> Isn't this better?

Do you really need the pg_crc32c_armv8_choose_dummy global variable?
That seems pretty ugly.  If you're concerned about the compiler
optimizing away the call to the crc function, you could write it like

result = (pg_comp_crc32c_armv8(0, 0, 0) == expected-value);

which'd provide a bit of extra checking that the code's not broken,
too.

regards, tom lane



Re: BufFileSize() doesn't work on a "shared" BufFiles

2018-05-02 Thread Heikki Linnakangas

On 30/04/18 22:00, Peter Geoghegan wrote:

On Mon, Apr 30, 2018 at 10:18 AM, Heikki Linnakangas  wrote:

Perhaps that would be OK, if it was properly commented. But it's not
actually hard to make BufFileSize() work on shared files, too, so I think we
should do that.


I agree that this could use a comment, but I don't see much point in
adding the extra FileSeek(). The fact that fd.c is unwilling to track
filesize for non-temp files (where "non-temp" means a
PathNameOpenTemporaryFile()-returned/exported file) is due to
vfdP->fileSize being involved in temp_file_limit enforcement. Maybe a
FD_TEMP_FILE_LIMIT assertion within FileGetSize() would help?


Seems simpler to just make it work. It's not like the extra lseek() is 
going to make any performance difference. Functions that work only under 
certain conditions are annoying.


Pushed, thanks.

- Heikki



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Tom Lane
Geoff Winkless  writes:
> Being blunt, unless I've missed the point all the arguments I've read so
> far for cmake seem to be advantages for the developers, not the users. As
> developers who put in your time you are of course entitled to make your
> lives easier but I'm just making the counterpoint that if you do so at the
> expense of your users you lose a certain amount of goodwill. It's up to you
> all how much that matters.

Yeah, one of the things that I find to be a very significant turn-off in
these proposals is that they'd break the "configure; make; make install"
ritual that so many people are accustomed to.  User-unfriendly decisions
like cmake's approach to configuration switches (-D? really?) are icing
on top of what's already an un-tasty cake.

What we do internally is our business, but these things are part of the
package's API in a real sense.  Changing them has a cost, one that's not
all borne by us.

regards, tom lane



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread David G. Johnston
On Wed, May 2, 2018 at 1:07 AM, Amit Langote 
wrote:

> Hi David.
>
> On 2018/05/02 8:18, David Rowley wrote:
> > On 1 May 2018 at 21:44, Amit Langote 
> wrote:
> >
> > I re-read the patch and it still looks fine to me. I'm sure it could
> > be made better, but I just don't currently see how. I think it would
> > be better if you commented on the specifics of what you think could be
> > improved rather than a general comment that it could be improved.
>
> Sorry, I may have been a bit vague.  I've read the patch one more time by
> considering the phrase "partition pruning" as the name of the new feature
> and that constraint exclusion is an optimization technique which doubled
> as partition pruning until now.  The new feature achieves results faster
> and can be used in more cases than constraint exclusion.  With that
> reading, I don't see much to complain about with your patch at a high
> level.
>
> Except some nitpicking:
>
> +   
> +Partition Pruning is also more powerful than constraint exclusion as
> +partition pruning is not something that is performed only during the
> +planning of a given query.
>
> Maybe, don't repeat "partition pruning" again in the same sentence.  How
> about:
>

good thought

>
> .. more powerful than constraint exclusion as *it* is not something..
>

technically "it" refers to "constraint exclusion" when written this way.

Better would be:

Partition pruning, unlike constraint exclusion, may be performed during
query execution.

Saying "not only planning" where there is only one other possible time it
happens is unnecessarily vague.

> +   If partition pruning can be
> +   performed here then there is the added benefit of not having to
> +   initialize partitions which are pruned.  Partitions which are pruned
> +   during this stage will not show up in the query's
> +   EXPLAIN or EXPLAIN ANALYZE.  It
> +   is possible to determine the number of partitions which were removed
> +   using this method by observing the Subplans Removed
> +   property in the EXPLAIN output.
>
> While it might be OK to keep the last two sentences, not sure about the
> 1st, which seems like it's spelling out an implementation detail -- that
> there is an initialization step for partitions.  It's a nice performance
> enhancement, sure, but might be irrelevant to the users reading this
> documentation.
>

I would concur with omitting the initialization implementation detail.


>
> +   nested loop joins.  Since the value of these parameters may change many
> +   times during the execution of the query, partition pruning is performed
> +   whenever one of the execution parameters which is being compared to a
> +   partition column or expression changes.
>
> How about writing the last part as: whenever one of the execution
> parameters relevant to pruning changes
>

Is it when the values change or for each different value?  The difference
matters if the values are not sorted and something like 1,2,3,2,3,4,1,2 were
to appear.


>
> +   
> +
> + Currently, partition pruning of partitions during the planning of an
> + UPDATE or DELETE command are
> + internally implemented using the constraint exclusion method.  Only
> + SELECT uses the faster partition pruning method.  Also
> + partition pruning performed during execution is only done so for the
> + Append node type.  Both of these limitations are likely to be removed
> + in a future release of PostgreSQL.
> +
> +   
>
> Do we need to write this given that we decided to decouple even the
> UPDATE/DELETE pruning from the constraint_exclusion configuration?  Also,
> noting that only Append nodes can use execution-time pruning seems
> unnecessary.  I don't see plan node names mentioned like this elsewhere in
> the documentation.  But more to the point, it seems like spilling out
> finer implementation details (and/or limitations thereof) in the
> user-facing documentation.
>

I suppose it would matter relative to what explain plans the user might
see.  I do think the distinction between UPDATE/DELETE and SELECT can be
removed here, though.  The execution limitation seems potentially worth
mentioning, though as written I am unable to convert the provided
information into something I can use.  Knowing when it cannot happen, even
if incomplete, would be more helpful to me.

David J.


Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Apr 24, 2018 at 5:59 PM, Alvaro Herrera  
> wrote:
> > Constraint
> > exclusion was pretty easy to get wrong, hence the need for a separate
> > section, and I suppose the new partition pruning may be prey to the same
> > problems, so it seems worth to document them specially.  But not sure
> > about the others, if they are mostly debugging tools.
> 
> Weighing in here late, but I have a hard time understanding why we
> want a GUC to control partition pruning at all. With constraint
> exclusion, the issue is whether you want to spend planner cycles to
> try to deduce things using CHECK constraints when, quite possibly,
> your CHECK constraints are unrelated to table inheritance and thus
> won't help.  But seems extremely unlikely that the same thing would
> happen with partition pruning.  Unlike your CHECK constraints, your
> partition bounds are, by definition, potentially useful for pruning.

I admit I am more concerned about the possibility of bugs than I am
about providing a performance-related tool.  If partition pruning can do
its thing with only 1.1% of overhead, that's a great result.  While
I'm sure that some real-world partitioning scenarios exist that have a
higher overhead than that, that's not what I am worried about the most.

In a couple of releases, once we know for sure that all this new code is
absolutely stable and that there are no bugs (keeping in mind that PG12
will boast additional pruning for MergeAppend as well as for UPDATE/
DELETE queries,) we can remove the GUC -- hoping that no user will bark
at us about they having to keep it disabled by default.  

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-05-02 Thread Ashutosh Bapat
On Wed, May 2, 2018 at 11:56 AM, Amit Langote
 wrote:
> But one could very well argue that BEFORE ROW triggers on the
> parent should run before performing the tuple routing and not be cloned to
> individual partitions, in which case the result would not have been the
> same.  Perhaps that's the motivation behind leaving this to be considered
> later.

+1 for that. We should associate before row triggers with the
partitioned table and not with the partitions. Those should be
executed before tuple routing (for that partition level) kicks in. But
then that would be asymmetric with AFTER row trigger behaviour. From
this POV, pushing AFTER row triggers to the individual partitions
doesn't look good.

Another question that comes to my mind is what happens when rows are
inserted into a partition directly. Do we execute the BEFORE trigger on
the partitioned table?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: inconsistency and inefficiency in setup_conversion()

2018-05-02 Thread John Naylor
On 4/28/18, Tom Lane  wrote:
> John Naylor  writes:
>> Solution #1 - As alluded to in [1], turn the conversions into
>> pg_proc.dat and pg_conversion.dat entries. Teach genbki.pl to parse
>> pg_wchar.h to map conversion names to numbers.
>> Pros:
>> -likely easy to do
>> -allows for the removal of an install target in the Makefile as well
>> as ad hoc logic in MSVC
>> -uses a format that developers need to use anyway
>> Cons:
>> -immediately burns up 88 hard-coded OIDs and one for each time a
>> conversion proc is created
>> -would require editing data in two catalogs every time a conversion
>> proc is created
>
> Given the rate at which new conversion procs have been created
> historically (ie, next door to zero, after the initial feature addition),
> I don't think that second "con" argument has any force.  Eating a batch
> of manually-assigned OIDs seems risky mainly just in that it might force
> adjustment of pending patches --- but we deal with that all the time.
> So I like this answer, I think.

Attached is a draft patch to do this, along with the conversion script
used to create the entries. In writing this, a few points came up that
are worth bringing up:

-In the original SQL file the functions were not declared with an
explicit volatility, so by default they are 'volatile'. That seems
wrong for this kind of function, so I changed it to 'immutable'. It
seems the CREATE CONVERSION facility was created shortly after the
volatility classes were created, and I couldn't find any discussion
about it.

-I have not done performance testing of initdb yet. I'll do so at a
later date unless someone is excited enough to beat me to it.

-I piggy-backed on the OID lookup machinery for the encoding lookup,
but haven't changed all the comments that refer only to catalogs and
OIDs.

-With the 88 pg_proc entries with prolang=13 along with the 50 or so
with prolang=14, it might be worth it to create a language lookup.
This patch does not do so, however.

-This actually uses up 220 OIDs (88 + 132), since the conversions need
them for their comments to be loaded.

> However, there is a "con" you didn't mention that perhaps ought to be
> accounted for.  The way things are done now, neither these C functions
> nor the pg_conversion entries are "pinned"; it's possible to drop and/or
> recreate them.  That perhaps had significant value during development
> of the conversions feature, but I'm doubtful that it's worth much
> anymore.  Still, it's worth pointing out in case somebody disagrees.

-For this draft, I let them get pinned, and changed the sanity test to
reflect that. It'd be easy enough to add exceptions to setup_depend(),
though. (one for pg_conversion, and one to change the pg_proc query to
exclude C language functions)

I'll create a commitfest entry soon.

-John Naylor
From 27605cd5ced2fd0a69598de93491cfb20a74836c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 2 May 2018 17:50:53 +0700
Subject: [PATCH v1] Replace ad hoc format for conversion functions

Convert info for conversion functions into entries in pg_proc.dat
and pg_conversion.dat. This fixes wrong comments on the functions
and removes cruft from utils/mb/conversion_procs/Makefile,
initdb.c, and msvc/Install.pm.

Functional changes:
1. Conversions are now pinned. This can be reverted, but it's not
clear there would be any benefit in doing so.
2. The functions are now declared IMMUTABLE.
---
 src/backend/catalog/Makefile   |   2 +-
 src/backend/catalog/genbki.pl  |  56 +++-
 src/backend/utils/mb/conversion_procs/Makefile | 177 +--
 src/bin/initdb/initdb.c|  26 --
 src/include/catalog/pg_conversion.dat  | 417 +
 src/include/catalog/pg_conversion.h|  42 +--
 src/include/catalog/pg_proc.dat| 408 
 src/test/regress/expected/misc_sanity.out  |   1 -
 src/tools/msvc/Install.pm  |  38 ---
 9 files changed, 906 insertions(+), 261 deletions(-)
 create mode 100644 src/include/catalog/pg_conversion.dat

diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a54197d..c987ca0 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -85,7 +85,7 @@ generated-header-symlinks: $(top_builddir)/src/include/catalog/header-stamp
 # instead is cheating a bit, but it will achieve the goal of updating the
 # version number when it changes.
 bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in
-	$(PERL) -I $(catalogdir) $< --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS)
+	$(PERL) -I $(catalogdir) $< -I $(top_srcdir)/src/include/ --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS)
 	touch $@
 
 # The generated headers must all be symlinked into builddir/src/include/,
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 

Re: FPW stats?

2018-05-02 Thread Dmitry Dolgov
> On 2 May 2018 at 13:10, Michael Paquier  wrote:
> On Wed, May 02, 2018 at 12:22:34PM +0200, Dmitry Dolgov wrote:
>> Recently I've heard people complaining that Postgres doesn't expose any
>> statistics about how many full page writes happened during some time
>> frame.
>
> pg_waldump --stats?

Yep, pg_waldump is the only option so far, but I thought about something that
will directly expose this info.

> The bar to add new fields into pgstat structures is usually quite high
> depending on the location where those are added.  For example not so
> long ago there was a patch discussed about adding more fields to
> PgStat_StatTabEntry, which has been rejected as pgstat can be a problem
> for users with many tables.  See here:
> https://www.postgresql.org/message-id/1323.1511624064%40sss.pgh.pa.us
>
> Your patch adds a new field to PgStat_StatDBEntry?  Wouldn't you
> increase the bottleneck of deployments with many databases?  What's
> actually your use case?

This was discussed mostly in the context of benchmarking and understanding IO
for different workloads. I actually never realized that adding a new stats
field can have a significant impact in cases where there are too many
databases, and yeah, I'm afraid it may not be justified in this context.



Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Thomas Munro
On Thu, Apr 5, 2018 at 12:00 AM, Thomas Munro
 wrote:
> ... trying to figure out how to detect the instruction portably using SIGILL 
> ...

Ahh, OpenSSL's armcap.c shows how to do this.  You need to
siglongjmp() out of there.  Here's a patch that does it that way.
Isn't this better?

I tested this on a Linux ARM system that has the instruction, and I
put a kill(getpid(), SIGILL) in there to test the negative case
because I don't have access to an ARM system without the instruction.
I don't have a FreeBSD/ARM system to test on either but I checked that
the flow control technique works fine on FreeBSD on another
architecture when it hits an instruction it doesn't support.
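
For anyone who wants to see the control-flow technique in isolation, here
is a minimal, self-contained sketch (not the patch itself): raise(SIGILL)
stands in for executing the CRC32C instruction on a CPU that lacks it, and
the commented-out pg_comp_crc32c_armv8() call is what the real probe would
do.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf illegal_instruction_jump;

static void
illegal_instruction_handler(int signo)
{
	siglongjmp(illegal_instruction_jump, 1);
}

int
main(void)
{
	struct sigaction sa,
				oldsa;
	int			supported;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = illegal_instruction_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGILL, &sa, &oldsa);

	/* the "1" makes sigsetjmp restore the signal mask after the longjmp */
	if (sigsetjmp(illegal_instruction_jump, 1) == 0)
	{
		/* real probe: (void) pg_comp_crc32c_armv8(0, NULL, 0); */
		raise(SIGILL);			/* simulate a CPU without the instruction */
		supported = 1;
	}
	else
		supported = 0;			/* we longjmp'd out of the SIGILL handler */

	sigaction(SIGILL, &oldsa, NULL);
	printf("crc32c instruction supported: %d\n", supported);
	return 0;
}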

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Use-a-portable-way-to-detect-ARMv8-CRC32-hardware.patch
Description: Binary data


Re: FPW stats?

2018-05-02 Thread Michael Paquier
On Wed, May 02, 2018 at 12:22:34PM +0200, Dmitry Dolgov wrote:
> Recently I've heard people complaining that Postgres doesn't expose any
> statistics about how many full page writes happened during some time
> frame.

pg_waldump --stats?

> I guess it can be implemented in a more effective and optimized way, but with
> what I have right now first naive pgbench tests show that slowdown is about 
> 3%.
> Before I'll dig into it more, it would be nice to hear your opinion about this
> idea -  does it make sense to have something like this?

The bar to add new fields into pgstat structures is usually quite high
depending on the location where those are added.  For example not so
long ago there was a patch discussed about adding more fields to
PgStat_StatTabEntry, which has been rejected as pgstat can be a problem
for users with many tables.  See here:
https://www.postgresql.org/message-id/1323.1511624064%40sss.pgh.pa.us

Your patch adds a new field to PgStat_StatDBEntry?  Wouldn't you
increase the bottleneck of deployments with many databases?  What's
actually your use case?
--
Michael


signature.asc
Description: PGP signature


Re: Oddity in tuple routing for foreign partitions

2018-05-02 Thread Etsuro Fujita

(2018/05/02 15:45), Etsuro Fujita wrote:

(2018/05/02 10:10), Amit Langote wrote:

On 2018/05/02 6:09, Andres Freund wrote:

On 2018-05-01 13:41:32 -0400, Robert Haas wrote:

Committed.


Here is a small patch to remove a no-longer-needed cast in 
postgresBeginForeignInsert().


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 2007,2013  postgresBeginForeignInsert(ModifyTableState *mtstate,
  	/* Check if we add the ON CONFLICT clause to the remote query. */
  	if (plan)
  	{
! 		OnConflictAction onConflictAction = ((ModifyTable *) plan)->onConflictAction;
  
  		/* We only support DO NOTHING without an inference specification. */
  		if (onConflictAction == ONCONFLICT_NOTHING)
--- 2007,2013 
  	/* Check if we add the ON CONFLICT clause to the remote query. */
  	if (plan)
  	{
! 		OnConflictAction onConflictAction = plan->onConflictAction;
  
  		/* We only support DO NOTHING without an inference specification. */
  		if (onConflictAction == ONCONFLICT_NOTHING)


Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Geoff Winkless
 On Wed, 2 May 2018 at 00:57, Yuriy Zhuravlev  wrote:

> Hello Geoff!
>
> About cmake:
> 1. You can still use the binary build for your system.
> 2. You can still build Postgres from source and with old gcc, you need
> only install cmake (it's very easy) Only most modern versions of CMake
> depend on modern gcc. I have good experience with old Solaris and AIX. (I
> mean build Postgres by current cmake branch).
> 3. You can try and put your impressions on issue tracker on github.
>

First, please don't top post.

Second, I can't "use the binary build", there isn't one for the systems I'm
talking about.

Third, as you said, newer cmake refuses to build on this system.
Admittedly v2 built fine, but how long until someone tells me something
like "oh well, we need to use bracket arguments otherwise our files are
terribly hard to read so you need v3. It shouldn't be that hard to build,
you only need to compile gcc 4, and that's at least 5 years old, so it's
time you upgraded".

Being blunt, unless I've missed the point all the arguments I've read so
far for cmake seem to be advantages for the developers, not the users. As
developers who put in your time you are of course entitled to make your
lives easier but I'm just making the counterpoint that if you do so at the
expense of your users you lose a certain amount of goodwill. It's up to you
all how much that matters.

Geoff


FPW stats?

2018-05-02 Thread Dmitry Dolgov
Hi,

Recently I've heard people complaining that Postgres doesn't expose any
statistics about how many full page writes happened during some time frame.
Indeed, I couldn't find any easy way to do so, and judging from my
understanding of xloginsert.c it actually can be done per database with the
attached poc patch.

I guess it can be implemented in a more effective and optimized way, but with
what I have right now first naive pgbench tests show that slowdown is about 3%.
Before I'll dig into it more, it would be nice to hear your opinion about this
idea -  does it make sense to have something like this?
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 5bea073..64e450c 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -31,6 +31,7 @@
 #include "storage/proc.h"
 #include "utils/memutils.h"
 #include "pg_trace.h"
+#include "pgstat.h"
 
 /* Buffer size required to store a compressed version of backup block image */
 #define PGLZ_MAX_BLCKSZ PGLZ_MAX_OUTPUT(BLCKSZ)
@@ -103,6 +104,15 @@ static int	max_rdatas;			/* allocated size */
 
 static bool begininsert_called = false;
 
+#define FPW_COUNTER_HASH_SIZE 100
+
+typedef struct {
+	Oid				db;
+	PgStat_Counter	counter;
+} FpwCounterEntry;
+
+static HTAB *fpwCounterStatHash = NULL;
+
 /* Memory context to hold the registered buffer and data references. */
 static MemoryContext xloginsert_cxt;
 
@@ -192,7 +202,9 @@ XLogEnsureRecordSpace(int max_block_id, int ndatas)
 void
 XLogResetInsertion(void)
 {
-	int			i;
+	int			i;
+	HASH_SEQ_STATUS fstat;
+	FpwCounterEntry *fpwEntry;
 
 	for (i = 0; i < max_registered_block_id; i++)
 		registered_buffers[i].in_use = false;
@@ -203,6 +215,14 @@ XLogResetInsertion(void)
 	mainrdata_last = (XLogRecData *) &mainrdata_head;
 	curinsert_flags = 0;
 	begininsert_called = false;
+
+
+	hash_seq_init(&fstat, fpwCounterStatHash);
+	while ((fpwEntry = (FpwCounterEntry *) hash_seq_search(&fstat)) != NULL)
+	{
+		pgstat_report_fpw(fpwEntry->db, fpwEntry->counter);
+		fpwEntry->counter = 0;
+	}
 }
 
 /*
@@ -584,7 +604,20 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 		if (include_image)
 		{
 			Page		page = regbuf->page;
+			bool		found = false;
 			uint16		compressed_len = 0;
+			FpwCounterEntry *fpwEntry;
+			Oid			dbOid = regbuf->rnode.dbNode;
+
+			fpwEntry = (FpwCounterEntry *) hash_search(fpwCounterStatHash,
+													   &dbOid, HASH_ENTER, &found);
+			if (!found)
+			{
+fpwEntry->counter = 0;
+fpwEntry->db = dbOid;
+			}
+
+			fpwEntry->counter++;
 
 			/*
 			 * The page needs to be backed up, so calculate its hole length
@@ -1055,4 +1088,16 @@ InitXLogInsert(void)
 	if (hdr_scratch == NULL)
 		hdr_scratch = MemoryContextAllocZero(xloginsert_cxt,
 			 HEADER_SCRATCH_SIZE);
+
+	if (fpwCounterStatHash == NULL)
+	{
+		HASHCTL hash_ctl;
+		memset(&hash_ctl, 0, sizeof(hash_ctl));
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(FpwCounterEntry);
+
+		fpwCounterStatHash = hash_create("Full page write counter hash",
+										 FPW_COUNTER_HASH_SIZE, &hash_ctl,
+										 HASH_ELEM | HASH_BLOBS | HASH_PREALLOC | HASH_FIXED_SIZE);
+	}
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8cd8bf4..6ee5692 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -822,7 +822,8 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
 pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
 pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
-pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
+pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset,
+pg_stat_get_db_fpw(D.oid) AS fpw
 FROM pg_database D;
 
 CREATE VIEW pg_stat_database_conflicts AS
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 084573e..9606bb7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -336,6 +336,7 @@ static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
+static void pgstat_recv_fpw(PgStat_MsgFpw *msg, int len);
 
 /* 
  * Public functions called from postmaster follow
@@ -3210,6 +3211,26 @@ pgstat_report_xact_timestamp(TimestampTz tstamp)
 	pgstat_increment_changecount_after(beentry);
 }
 
+/* 
+ * pgstat_report_fpw() -
+ *
+ *	Tell the collector about full page writes.
+ * 
+ */
+void
+pgstat_report_fpw(Oid dboid, PgStat_Counter fpwCounter)
+{
+	PgStat_MsgFpw msg;
+
+	if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+		return;
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_FPW);
+	msg.m_databaseid = 

Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-02 Thread Amit Langote
Hi David.

On 2018/05/02 8:18, David Rowley wrote:
> On 1 May 2018 at 21:44, Amit Langote  wrote:
>> About the patch in general, it seems like the newly added documentation
>> talks about "Partition Pruning" as something that *replaces* constraint
>> exclusion.  But, I think "Partition Pruning" is not the thing that
>> replaces constraint exclusion.
> 
> Not sure where you see the mention partition pruning replacing
> constraint exclusion.
> 
>> We used to do partition pruning even
>> before and used constraint exclusion as the algorithm.
> 
> That depends on if you think of partition pruning as the new feature
> or the act of removing unneeded partitions. We seem to have settled on
> partition pruning being the new feature given that we named the GUC
> this way. So I don't quite understand what you mean here.
> 
>>  What's new is the
>> algorithm that we now use to perform partition pruning for declaratively
>> partitioned tables.  Also, the characteristics of the new algorithm are
>> such that it can now be used in more situations, thus making it more
>> useful than the earlier method of partition pruning, so that new features
>> like runtime pruning could be realized.  I like that the patch adds
>> various details about the new pruning features, but think that the wording
>> and the flow could be improved a bit.
>>
>> What do you think?
> 
> I re-read the patch and it still looks fine to me. I'm sure it could
> be made better, but I just don't currently see how. I think it would
> be better if you commented on the specifics of what you think could be
> improved rather than a general comment that it could be improved.

Sorry, I may have been a bit vague.  I've read the patch one more time by
considering the phrase "partition pruning" as the name of the new feature
and that constraint exclusion is an optimization technique which doubled
as partition pruning until now.  The new feature achieves results faster
and can be used in more cases than constraint exclusion.  With that
reading, I don't see much to complain about with your patch at a high level.

Except some nitpicking:

+   
+Partition Pruning is also more powerful than constraint exclusion as
+partition pruning is not something that is performed only during the
+planning of a given query.

Maybe, don't repeat "partition pruning" again in the same sentence.  How
about:

.. more powerful than constraint exclusion as *it* is not something..

Or may suggest to rewrite it as:

Partition pruning is also more powerful than constraint exclusion as it
can be performed not only during the planning of a given query, but also
during its execution.

If you accept the above rewrite, the next sentences in the paragraph:

+In certain cases, partition pruning may also
+be performed during execution of the query as well.  This allows pruning
+to be performed using values which are unknown during query planning, for
+example, using parameters defined in a PREPARE
+statement, using a value obtained from a subquery or using parameters
from
+a parameterized nested loop join.

could be adjusted a bit to read as:

For example, this allows pruning to be performed using values which are
unknown during query planning but will be known during execution, such as
using parameters defined in a PREPARE statement (if a
generic plan is chosen), or using a value obtained from a subquery, or
using values from an outer row of a parameterized nested loop join.

+   
+The partition pruning which is performed during execution is done so at
+either one or both of the following times:

done so at -> done at

+   If partition pruning can be
+   performed here then there is the added benefit of not having to
+   initialize partitions which are pruned.  Partitions which are pruned
+   during this stage will not show up in the query's
+   EXPLAIN or EXPLAIN ANALYZE.  It
+   is possible to determine the number of partitions which were removed
+   using this method by observing the Subplans Removed
+   property in the EXPLAIN output.

While it might be OK to keep the last two sentences, not sure about the
1st, which seems like it's spelling out an implementation detail -- that
there is an initialization step for partitions.  It's a nice performance
enhancement, sure, but might be irrelevant to the users reading this
documentation.

+   nested loop joins.  Since the value of these parameters may change
many
+   times during the execution of the query, partition pruning is
performed
+   whenever one of the execution parameters which is being compared to a
+   partition column or expression changes.

How about writing the last part as: whenever one of the execution
parameters relevant to pruning changes

+   
+
+ Currently, partition pruning of partitions during the planning of an
+ UPDATE or DELETE command are
+ internally implemented using the constraint 

Re: Patch missing from back branches

2018-05-02 Thread Haroon
On 1 May 2018 at 21:04, Tom Lane  wrote:

> I've pushed this patch into the 9.5 branch, so it should be possible to
> spin dory up on that branch now.

Thanks Tom!

Thanks everyone else for your input on the issue!

Regards, Haroon

-- 
Haroonhttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Global snapshots

2018-05-02 Thread Stas Kelvich


> On 2 May 2018, at 05:58, Peter Eisentraut  
> wrote:
> 
> On 5/1/18 12:27, Stas Kelvich wrote:
>> Clock-SI is described in [5] and here I provide a small overview, which
>> supposedly should be enough to catch the idea. Assume that each node runs 
>> Commit
>> Sequence Number (CSN) based visibility: database tracks one counter for each
>> transaction start (xid) and another counter for each transaction commit 
>> (csn).
>> In such setting, a snapshot is just a single number -- a copy of current CSN 
>> at
>> the moment when the snapshot was taken. Visibility rules are boiled down to
>> checking whether current tuple's CSN is less than our snapshot's csn. Also it
>> worth of mentioning that for the last 5 years there is an active proposal to
>> switch Postgres to CSN-based visibility [6].
> 
> But that proposal has so far not succeeded.  How are you overcoming the
> reasons for that?

Well, the CSN proposal aims to switch all of postgres visibility over to CSN.
That proposal is far more ambitious; here, the original postgres visibility,
with snapshots being arrays of XIDs, is preserved. In this patch CSNs are
written to the SLRU during commit (much like commit_ts does) but will be read
in two cases:

1) When the local transaction encounters an XID that is in progress according
to the XIP-based snapshot, its CSN needs to be checked, as it may already be
InDoubt. XIDs that are viewed as committed don't need that check (in [6] they
also need to be checked through the SLRU).

2) If we are in a backend that imported a global snapshot, then only CSN-based
visibility can be used. But that happens only for global transactions.

So I hope that local transaction performance will be affected only by the
in-progress check, and there are ways to circumvent this check.

Also all this behaviour is optional and can be switched off by not enabling
track_global_snapshots.
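
To make the visibility rule from the overview above concrete, here is a
conceptual sketch (the type, the constants and the InDoubt handling are
simplified for illustration and are not taken from the patch):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t CSN;

#define InvalidCSN	((CSN) 0)	/* transaction aborted or still running */
#define InDoubtCSN	((CSN) 1)	/* commit is in progress somewhere */

/* A snapshot is just the CSN that was current at the moment it was taken. */
static bool
TupleCSNVisibleInSnapshot(CSN tuple_csn, CSN snapshot_csn)
{
	if (tuple_csn == InvalidCSN)
		return false;			/* not committed */
	if (tuple_csn == InDoubtCSN)
		return false;			/* the real code waits until the in-doubt
								 * transaction resolves, then rechecks */
	return tuple_csn < snapshot_csn;
}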


--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Explain buffers wrong counter with parallel plans

2018-05-02 Thread Adrien Nayrat
On 05/01/2018 10:57 PM, Andres Freund wrote:
> Robert, you added this as an open item. I don't think it's clear that
> there's a bug here, and even less clear that it's something new for
> v11. Am I understanding that right?  Should we close this open item?

Hi,

FYI, this change is related to 01edb5c7fc, which is not specific to v11;
01edb5c7fc is also in v10.

Regards,

-- 
Adrien



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Clock with Adaptive Replacement

2018-05-02 Thread Yura Sokolov
02.05.2018 01:37, Peter Geoghegan wrote:
> On Tue, May 1, 2018 at 11:58 AM, Robert Haas  wrote:
>> I agree that double-counting correlated accesses is a a problem, and I
>> agree that we probably want to do something about it.  I am skeptical
>> that using wall-clock time is the right way to solve that problem
>> because it leads to edge effects
> 
> I agree that wall-clock time is a bad approach, actually. If we were
> to use wall-clock time, we'd only do so because it can be used to
> discriminate against a thing that we actually care about in an
> approximate, indirect way. If there is a more direct way of
> identifying correlated accesses, which I gather that there is, then we
> should probably use it.

I suggest incrementing the usage count only once per 1/32 of the
replacements in shared_buffers.  I.e. if the size of shared_buffers is 1024,
and this page was put into shared_buffers as replacement number 21200, then
its usage count will next be incremented only after page 21232 has been put
into shared buffers, and a second time only after page 21264.
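
A rough sketch of that bookkeeping, with invented names (nothing here is
taken from bufmgr.c):

#include <stdint.h>

/* bumped every time a page is brought into shared_buffers */
static uint64_t replacement_counter = 0;

typedef struct
{
	uint16_t	usage_count;
	uint64_t	next_bump_at;	/* replacement_counter value at which the
								 * usage count may be incremented again */
} BufUsageState;

static void
maybe_bump_usage_count(BufUsageState *buf, uint64_t nbuffers)
{
	if (replacement_counter >= buf->next_bump_at)
	{
		if (buf->usage_count < 5)	/* 5 == BM_MAX_USAGE_COUNT */
			buf->usage_count++;
		/* allow the next bump only after NBuffers/32 more replacements */
		buf->next_bump_at = replacement_counter + nbuffers / 32;
	}
}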

regards,
Yura.



Re: Oddity in tuple routing for foreign partitions

2018-05-02 Thread Etsuro Fujita

(2018/05/02 10:10), Amit Langote wrote:

On 2018/05/02 6:09, Andres Freund wrote:

On 2018-05-01 13:41:32 -0400, Robert Haas wrote:

Committed.


Thank you.


Thanks for committing, Robert!


There's still an open items entry for this thread - has that been
resolved by this commit


Afaik, yes.  So moved that to resolved items.


Yeah, thanks for that, Amit!

Best regards,
Etsuro Fujita



Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-05-02 Thread Amit Langote
On 2018/05/02 14:17, Ashutosh Bapat wrote:
> On Tue, May 1, 2018 at 5:20 PM, Peter Eisentraut
>  wrote:
>> On 4/26/18 05:03, Amit Langote wrote:
>>> On 2018/04/26 13:01, David Rowley wrote:
 The attached small patch removes the mention that partitioned tables
 cannot have foreign keys defined on them.

 This has been supported since 3de241db
>>>
>>> I noticed also that the item regarding row triggers might be obsolete as
>>> of 86f575948c7, thanks again to Alvaro!  So, I updated your patch to take
>>> care of that.
>>
>> committed both
> 
> AFAIK we still don't support BEFORE ROW triggers, so removing that
> entry altogether is misleading.

You're right.  I think that's what you were also saying on the other
thread, in reply to which I directed you to this thread.  I very clearly
missed that BEFORE ROW triggers are still disallowed.

create trigger br_insert_trig before insert on p for each row execute
procedure br_insert_trigfunc();
ERROR:  "p" is a partitioned table
DETAIL:  Partitioned tables cannot have BEFORE / FOR EACH ROW triggers.

Here is a patch that adds back a line to state this particular limitation.
Sorry about the earlier misleading patch.

Fwiw, I am a bit surprised to see the error, but that's irrelevant now.  I
see that 86f575948c7 added the following comment in CreateTrigger() above
the check that causes above error.

/*
 * BEFORE triggers FOR EACH ROW are forbidden, because they would
 * allow the user to direct the row to another partition, which
 * isn't implemented in the executor.
 */

But if that's the only reason, I think it might be too restrictive.
Allowing them would not lead to something wrong happening, afaics.  To
illustrate, I temporarily removed the check to allow BR triggers to be
created on the parent and thus be cloned to each partition:

create table p (a int, b text) partition by list (a);
create table p1 partition of p for values in (1);

create or replace function br_insert_trigfunc() returns trigger as
$$ begin new.a := new.a + 1; return new; end $$
language plpgsql;

create trigger br_insert_trig before insert on p for each row execute
procedure br_insert_trigfunc();

insert into p values (1, 'a') returning *;
ERROR:  new row for relation "p1" violates partition constraint
DETAIL:  Failing row contains (2, a).

Since the same error would occur if the trigger were instead defined
directly on the partition (which *is* allowed), maybe users could live
with this.  But one could very well argue that BEFORE ROW triggers on the
parent should run before performing the tuple routing and not be cloned to
individual partitions, in which case the result would not have been the
same.  Perhaps that's the motivation behind leaving this to be considered
later.

Thanks,
Amit
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 0c8eb48a24..0b1948f069 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3340,6 +3340,12 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
version.
   
  
+ 
+  
+   BEFORE ROW triggers, if necessary, must be defined
+   on individual partitions, not the partitioned table.
+  
+ 
 
 
 


Re: [SPAM] Re: Local partitioned indexes and pageinspect

2018-05-02 Thread Michael Paquier
On Wed, May 02, 2018 at 01:38:22PM +0900, Amit Langote wrote:
> Perhaps, I'm just repeating what's already been said, but I think it might
> be better to have the word "partitioned" in the message.

That's what Peter is pointing to upthread and what the v1 patch upthread
was doing.  I would tend to just keep the code simple and not add those
extra checks, but I am fine with being overruled as well.
--
Michael


signature.asc
Description: PGP signature