Re: NOTIFY and pg_notify performance when deduplicating notifications

2018-10-10 Thread Catalin Iacob
On Wed, Oct 10, 2018 at 5:42 PM Catalin Iacob  wrote:
> One way could be to take inspiration from
> src/test/isolation/specs/async-notify.spec and check that
> pg_notification_queue_usage() does grow when repeating the same
> payload with collapse_mode='never' (while for always it would grow).

Sorry, the last part should be "(while for *maybe* it would *not* grow)".



Re: NOTIFY and pg_notify performance when deduplicating notifications

2018-10-10 Thread Catalin Iacob
On Tue, Oct 9, 2018 at 2:17 PM  wrote:
> I just caught an error in my patch, it's fixed in the attachment. The 'never' 
> and 'maybe' collapse modes were mixed up in one location.

Here's a partial review of this version; I did not read the doc part
very carefully.

First of all, I agree that this is a desirable feature as, for a large
number of notifications, the O(n^2) overhead quickly becomes very
noticeable.
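
To make the effect concrete, here's a minimal sketch (channel and
payload names made up) of the kind of workload that hits this: every
pg_notify() call scans the transaction's list of already-pending
notifications, so the block below is quadratic in the number of
notifications:

BEGIN;
-- 100k calls, each scanning the pending list built up so far
SELECT pg_notify('my_channel', 'row-' || i::text)
  FROM generate_series(1, 100000) AS s(i);
COMMIT;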

I would expect the collapse mode to be an enum which is created from
the string early on during parsing and used for the rest of the code.
Instead, the string is used all the way through, leading to string
comparisons in the notification dispatcher and to hardcoded special
strings in various places, including the contrib module.

This comment at the beginning of async.c should also be updated:
*   Duplicate notifications from the same transaction are sent out as one
*   notification only. This is done to save work when for example a trigger
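
For reference, the behavior that comment describes, and which the
patch makes optional, can be seen on any current server:

LISTEN foo;
BEGIN;
NOTIFY foo, 'hello';
NOTIFY foo, 'hello';
COMMIT;
-- the listening session receives a single "hello" notification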

pg_notify_3args duplicates pg_notify; I would expect a helper function
to be extracted and called from both.

There are braces placed on the same line as the if, for example
if (strlen(collapse_mode) != 0) {, which is not the project's style
(the project puts the opening brace on its own line).

>
> I can't find a reasonable way to build a regression test that checks whether 
> notifications are effectively deduplicated. The output of the LISTEN command 
> lists the PID of the notifying backend for each notification, e.g. : 
> 'Asynchronous notification "foobar" received from server process with PID 
> 24917'. I can't just add this to async.out. I did test manually for all eight 
> combinations : four collapse mode values (missing, empty string, 'maybe' and 
> 'never'), both with NOTIFY and pg_notify().

One way could be to take inspiration from
src/test/isolation/specs/async-notify.spec and check that
pg_notification_queue_usage() does grow when repeating the same
payload with collapse_mode='never' (while for always it would grow).
But I'm not sure it's worth the effort.
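
To sketch the idea anyway (the three-argument pg_notify() is the form
the patch proposes, an isolation spec would be needed to control the
timing, and some session must be listening so queue entries are
retained):

-- session 1: hold the queue tail by listening on any channel
LISTEN unrelated_channel;

-- session 2
SELECT pg_notification_queue_usage();  -- baseline
BEGIN;
SELECT pg_notify('c1', 'same payload', 'never')
  FROM generate_series(1, 10000);
COMMIT;
SELECT pg_notification_queue_usage();  -- should have grown for 'never',
                                       -- stayed flat for 'maybe'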



Re: NOTIFY and pg_notify performance when deduplicating notifications

2018-10-05 Thread Catalin Iacob
On Tue, Oct 2, 2018 at 7:20 PM  wrote:
> I have a feature built around LISTEN / NOTIFY that works perfectly well, 
> except for the enormous performance impact to transactions that emit large 
> numbers of notifications.

Indeed, I have the same problem and am very interested in this.

> I hope this patch can be reviewed and included in PostgreSQL.

I added this to the next Commitfest and added myself as a reviewer.
Will try to do a review at the beginning of next week.
https://commitfest.postgresql.org/20/1820/



Re: Is a modern build system acceptable for older platforms

2018-05-02 Thread Catalin Iacob
On Wed, May 2, 2018 at 5:44 PM, Robert Haas  wrote:
> I don't think that unsubstantiated hyperbole is the right way to
> approach the task of convincing the community to adopt the approach
> you prefer.  I don't see that any compelling evidence has been
> presented that a cmake-based solution would really save thousands of
> lines of code.

Let me try to list the advantages as I see them.

* ability to use ninja
  ** meson's requirement to use ninja might be a disadvantage, but the
ability to use it is definitely good
  ** faster than make - the difference is really noticeable
  ** good dependency story, parallel everything
  ** simply a very nice developer experience; for example, the screen
is not filled with scrolling lines, instead a progress update shows
x/y files to go and the file currently being compiled; try it and
you'll see what I mean
  ** I got interested in ninja for PG (and therefore CMake or meson)
after trying clang-cl.exe for PG on Windows. clang-cl.exe is a drop-in
open source replacement for Microsoft's cl.exe, but using it does not
interact well with the fact that MSBuild runs only one cl.exe with
lots of .c files as input and expects cl.exe to handle the
parallelism, while clang-cl.exe does not handle any parallelism,
taking the position that the build system should handle that. Being
able to invoke clang-cl.exe from ninja instead of MSBuild would make
fast compilation with clang-cl.exe easy, while now only slow serial
compilation is possible.

* IDE integration:
  ** cmake and meson can generate Xcode and Visual Studio projects;
granted, Visual Studio already works via the MSVC scripts
  ** CLion can consume cmake, giving a good IDE story on Linux, which
PG currently lacks

* get rid of the ad-hoc MSVC generation Perl scripts
  ** granted, I looked at those recently in the clang-cl context above
and they're reasonably understandable/approachable even without
knowing too much Perl

* appeal to new developers
  ** I think it's not a controversial statement that, as time passes,
autotools and make syntax are seen more and more as arcane things that
only old beards know how to handle and that the exciting stuff moved
elsewhere; in the long run this is a real problem
  ** on the other hand, as an almost complete autotools novice, after
reading some autotools docs, I was pleasantly surprised at how small
and easy to follow Andres' build patch adding LLVM and C++ support
was, especially as it's doing big, unconventional changes: add
support for another compiler but in a specific "emit LLVM bitcode"
mode, add support for C++ etc. So autoconf ugliness is not that big of
a deal, but perception does matter.

* PGXS on Windows
  ** could be solvable without moving wholesale

From the above, I would rate ninja as a high nice-to-have; IDE, PGXS
on Windows and new developers as medium-high nice-to-haves (but see
below for long term concerns); and no MSVC Perl scripts as a low
nice-to-have.

I started the thread because it seemed to me energy was being spent
on moving to another system (proofs of concept and discussions) while
it wasn't even clear whether a new system is a complete no-go due to
the old platforms PG supports. I find Tom's and Robert's position of
"acceptable but we would need to see real benefits as there definitely
are real downsides" perfectly reasonable. The build system dictating
platform support would indeed be the tail wagging the dog. Personally,
with the current information I'd not vote for switching to another
system, mainly because I ultimately think developer convenience should
not trump end user benefits.

I do have a real concern about the long term attractiveness of the
project to new developers, especially younger ones, as time passes.
It's not a secret that people will just avoid creaky old projects, and
for Postgres the old, out-of-fashion things do add up: autoconf, raw
make, Perl for tests, C89, old platform support. I have no doubt that
the project is already losing competent potential developers because
of this. One can say this is superficial and those developers should
look at the important things, but that does not change the reality
that some will just pass because they dislike the old technologies I
mentioned. Personally, I can say that if the project were still on CVS
I would probably not bother, as I just don't have the energy to learn
an inferior old version control system, especially as I see version
control as fundamental to a developer. I don't feel the balance
between recruiting new developers and end user benefits is tilted
enough to replace the build system, but maybe in some years that will
be the case.



Re: Postgres, fsync, and OSs (specifically linux)

2018-05-01 Thread Catalin Iacob
On Sat, Apr 28, 2018 at 12:28 AM, Andres Freund  wrote:
> Before linux v4.13 errors in kernel writeback would be reported at most
> once, without a guarantee that that'd happen (IIUC memory pressure could
> lead to the relevant information being evicted) - but it was pretty
> likely.  After v4.13 (see https://lwn.net/Articles/724307/) errors are
> reported exactly once to all open file descriptors for a file with an
> error - but never for files that have been opened after the error
> occurred.

snip

> == Proposed Linux Changes ==
>
> - Matthew Wilcox proposed (and posted a patch) that'd partially revert
>   behaviour to the pre v4.13 world, by *also* reporting errors to
>   "newer" file-descriptors if the error hasn't previously been
>   reported. That'd still not guarantee that the error is reported
>   (memory pressure could evict information without open fd), but in most
>   situations we'll again get the error in the checkpointer.
>
>   This seems largely be agreed upon. It's unclear whether it'll go into
>   the stable backports for still-maintained >= v4.13 kernels.

This is now merged; if it's not reverted it will appear in v4.17.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fff75eb2a08c2ac96404a2d79685668f3cf5a7a3

The commit is cc-ed to stable so it should get picked up in the near future.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b4678df184b314a2bd47d2329feca2c2534aa12b



Is a modern build system acceptable for older platforms

2018-04-18 Thread Catalin Iacob
There have been several discussions of replacing PG's autoconf +
src/tools/msvc system. The latest example is happening now at the bottom of
the "Setting rpath on llvmjit.so" thread.

I see potentially big advantages to moving, but also to PG's conservative
approach that keeps it running on edge and old platforms, so I set out to
look more carefully at what could be problematic or a showstopper for a
more modern build system. Here are my findings, hope they help.

Unlike autoconf, all newer alternatives that I know of (certainly CMake and
Meson, which were floated as alternatives so far) require themselves to be
present on the build machine when building. I know they have good reasons
to do this, but that means they impose new dependencies for building PG.
Let's see what those are for CMake and Meson, to get an idea whether that's
acceptable and a feeling for how much friction they would introduce.

CMake
=

* needs a C++11 compiler (since 3.10; before that it only needed C++98)
* needs libuv (since 3.10 apparently; I know that some years ago it had no
library dependencies besides the C++ standard library)
* has a make backend, so no new dependency (it may even work with non-GNU
make, so it might actually lower one dependency)
* can bootstrap on a number of Unix systems, see
https://gitlab.kitware.com/cmake/cmake/blob/master/bootstrap

For the platforms in "CMake's buildfarm" see
https://open.cdash.org/index.php?project=CMake

The C++11 requirement caused 3.10 and higher to no longer build on HP-UX:
https://gitlab.kitware.com/cmake/cmake/issues/17137

Meson
=

* needs Python >= 3.4
* needs ninja
** meson has no make backend; see
http://mesonbuild.com/FAQ.html#why-is-there-not-a-make-backend for the rationale
** as a small positive, this would mean not having to explain "you need
GNU make, BSD make is not enough"

Ninja:
* needs C++
** I think C++98 is enough but I'm not 100% sure; with a quick look at the
code I noticed no newer C++ features, and the bootstrap script does not pass
any -std arguments to the C++ compiler, so it should be 98
* https://github.com/ninja-build/ninja/pull/1007 talks about adding AIX
support and is in a release already
* https://github.com/ninja-build/ninja/blob/master/configure.py is the
bootstrap script which lists these as known platforms: 'linux', 'darwin',
'freebsd', 'openbsd', 'solaris', 'sunos5', 'mingw', 'msvc', 'gnukfreebsd',
'bitrig', 'netbsd', 'aix', 'dragonfly'

Python 3:
* points to ActivePython for HP-UX: https://www.python.org/download/other/
* some googling suggests Python > 3.2 works well on AIX and there are some
links to binaries

If I look at the requirements above versus what Postgres has in
src/template and in the build farm it seems like HP-UX and AIX could be the
more problematic or at least fiddly ones.

A related issue is that future versions of CMake or Meson could move their
baseline dependencies and desupport old platforms faster than PG might want,
but then one could make the case to just keep using an older meson or CMake.

So before the discussion of whether the gains from switching build systems
would offset the pain, I think the project needs to decide whether a newer
build system is acceptable in the first place, as it has a chance of
desupporting a platform altogether, or at least making things more painful
on some platforms by adding the bootstrap step for the build system with
potentially cascading dependencies (get Python 3 working, get ninja
bootstrapped, get PG built; or get libuv built, get CMake built, get PG
built).

The above is all about getting the build system to work at all. If that
isn't a showstopper there's a subsequent discussion to be had about older
platforms where one could get the build system to work but convenient
packages are missing. For example, not even RHEL7 has any Python 3 packages
in the base system (it does in Software Collections, though), which means
some extra hoops to get meson running there. And RHEL5 is in an even worse
spot: it has no Software Collections, and who knows if Python 3 builds on
it out of the box.


Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-03-29 Thread Catalin Iacob
On Thu, Mar 29, 2018 at 2:07 PM, Thomas Munro
 wrote:
> I found your discussion with kernel hacker Jeff Layton at
> https://lwn.net/Articles/718734/ in which he said: "The stackoverflow
> writeup seems to want a scheme where pages stay dirty after a
> writeback failure so that we can try to fsync them again. Note that
> that has never been the case in Linux after hard writeback failures,
> AFAIK, so programs should definitely not assume that behavior."

And a bit below in the same comments, to this question about PG: "So,
what are the options at this point? The assumption was that we can
repeat the fsync (which as you point out is not the case), or shut
down the database and perform recovery from WAL", the same Jeff Layton
seems to agree PANIC is the appropriate response:
"Replaying the WAL synchronously sounds like the simplest approach
when you get an error on fsync. These are uncommon occurrences for the
most part, so having to fall back to slow, synchronous error recovery
modes when this occurs is probably what you want to do."
And right after, he confirms the errseq_t patches are about always
detecting this, not more:
"The main thing I working on is to better guarantee is that you
actually get an error when this occurs rather than silently corrupting
your data. The circumstances where that can occur require some
corner-cases, but I think we need to make sure that it doesn't occur."

Jeff's comments in the pull request that merged errseq_t are worth
reading as well:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750

> The article above that says the same thing a couple of different ways,
> ie that writeback failure leaves you with pages that are neither
> written to disk successfully nor marked dirty.
>
> If I'm reading various articles correctly, the situation was even
> worse before his errseq_t stuff landed.  That fixed cases of
> completely unreported writeback failures due to sharing of PG_error
> for both writeback and read errors with certain filesystems, but it
> doesn't address the clean pages problem.

Indeed, that's exactly how I read it as well (opinion formed
independently before reading your sentence above). The errseq_t
patches landed in v4.13 by the way, so very recently.

> Yeah, I see why you want to PANIC.

Indeed. Even doing that leaves question marks: all the kernel versions
before v4.13, which at this point are pretty much everything out there,
don't even detect this reliably. This is messy.



Re: JIT compiling with LLVM v12.2

2018-03-21 Thread Catalin Iacob
On Wed, Mar 21, 2018 at 4:07 AM, Andres Freund  wrote:
> Indeed. I've pushed a rebased version now, that basically just fixes the
> issue Thomas observed.

Testing 2d6f2fba from your repository, configured --with-llvm, I noticed
some weird things in the configure output.

Without --enable-debug:
configure: using compiler=gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
configure: using CFLAGS=-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -O2
configure: using CPPFLAGS= -D_GNU_SOURCE
configure: using LDFLAGS= -L/opt/rh/llvm-toolset-7/root/usr/lib64
-Wl,--as-needed
configure: using CXX=g++
configure: using CXXFLAGS=-Wall -Wpointer-arith -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -O2
configure: using CLANG=/opt/rh/llvm-toolset-7/root/usr/bin/clang
configure: using BITCODE_CFLAGS= -fno-strict-aliasing -fwrapv -O2
configure: using BITCODE_CXXFLAGS= -fno-strict-aliasing -fwrapv -O2
BITCODE_CXXFLAGS

With --enable-debug:
configure: using compiler=gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
configure: using CFLAGS=-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2
configure: using CPPFLAGS= -D_GNU_SOURCE
configure: using LDFLAGS= -L/opt/rh/llvm-toolset-7/root/usr/lib64
-Wl,--as-needed
configure: using CXX=g++
configure: using CXXFLAGS=-Wall -Wpointer-arith -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -g -O2
configure: using CLANG=/opt/rh/llvm-toolset-7/root/usr/bin/clang
configure: using BITCODE_CFLAGS= -fno-strict-aliasing -fwrapv -O2
configure: using BITCODE_CXXFLAGS= -fno-strict-aliasing -fwrapv -O2
BITCODE_CXXFLAGS

So I unconditionally get one -g added to CXXFLAGS regardless of
whether I specify --enable-debug or not. And --enable-debug results in
-g -g in CXXFLAGS.

Didn't get to look at the code yet, maybe that comes from:
$ llvm-config --cxxflags
-I/opt/rh/llvm-toolset-7/root/usr/include -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic
-fPIC -fvisibility-inlines-hidden -Wall -W -Wno-unused-parameter
-Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic
-Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor
-Wno-comment -std=c++11 -ffunction-sections -fdata-sections -O2 -g
-DNDEBUG  -fno-exceptions -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS

But on the other hand there are lots of other flags in there that
don't end up in CXXFLAGS.

BTW, you should probably specify -std=c++11 (or whatever you need) as
various g++ and clang++ versions default to various things. Will the
required C++ standard be based on the requirements of the C++ code in
the PG tree, or will you take it from LLVM's CXXFLAGS? Can .o files
compiled with --std=c++11 and --std=c++14 be linked together? In other
words, if in the future LLVM starts requiring C++14 but the code in
the PG tree you wrote still builds with C++11, will PG upgrade its
requirement along with LLVM or stay with the older standard?

Also, my CXXFLAGS did not get -fexcess-precision=standard, and neither
did BITCODE_CFLAGS or BITCODE_CXXFLAGS.

In case it's interesting:
$ llvm-config --cflags
-I/opt/rh/llvm-toolset-7/root/usr/include -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic
-fPIC -Wall -W -Wno-unused-parameter -Wwrite-strings
-Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-comment
-ffunction-sections -fdata-sections -O2 -g -DNDEBUG -D_GNU_SOURCE
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS


A second issue, besides the -g one above: unlike all the other *FLAGS,
BITCODE_CXXFLAGS includes itself on the right-hand side of the equals sign:
configure: using BITCODE_CXXFLAGS= -fno-strict-aliasing -fwrapv -O2
BITCODE_CXXFLAGS



Re: JIT compiling with LLVM v11

2018-03-15 Thread Catalin Iacob
On Thu, Mar 15, 2018 at 6:19 PM, Andres Freund  wrote:
> What we were talking about in this subthread was about a depency on
> clang, not LLVM. And that's just needed at buildtime, to generate the
> bitcode files (including synchronizing types / function signatures).

I was actually thinking of both the buildtime and runtime dependency
because I did not realize the PGDG packages already depend on EPEL.

> For the yum.pg.o, which already depends on EPEL, there's a new enough
> LLVM version.  There's a new enough version in RHEL proper, but it
> appears to only be there for mesa (llvm-private).

Indeed RHEL 7 comes with llvm-private for mesa but that doesn't seem
kosher to use for other things.

When I said packagers I was only thinking of PGDG. I was thinking the
software collections would be the likely solution for the PGDG
packages for both buildtime and runtime. But it seems using clang from
software collections and LLVM from EPEL is also a possibility,
assuming that the newer clang generates IR that the older libraries
are guaranteed to be able to load.

For RHEL proper, I would guess that PG11 is too late for RHEL8 which,
according to history, should be coming soon.

For RHEL9 I would really expect RedHat to add llvm and clang to proper
RHEL and build/run against those, even if they add it only for
Postgres (like they did for mesa). I really don't see them shipping
without a major speedup for a major DB, also because in the meantime
the JIT in PG will have matured. That's also why I find it important
to support gcc and not restrict JIT to clang builds: I expect that
RedHat and all other Linux distros want to build everything with gcc,
and asking them to switch to clang or give up JIT would put them in a
hard spot. As far as I know clang does promise gcc compatibility in
the sense that one can link together .o files compiled with both, so I
expect the combination not to cause issues (assuming the other
compiler flags affecting binary compatibility are aligned).



Re: JIT compiling with LLVM v11

2018-03-15 Thread Catalin Iacob
On Thu, Mar 15, 2018 at 1:20 AM, Andres Freund  wrote:
> I don't really live in the RHEL world, but I wonder if
> https://developers.redhat.com/blog/2017/10/04/red-hat-adds-go-clangllvm-rust-compiler-toolsets-updates-gcc/
> is relevant?

Indeed. It might be a bit awkward for packagers to depend on something
from Software Collections, for example because they come as separate
trees in /opt that are by default not in your path or dynamic loader
path: one needs to run everything via the scl wrapper or source the
/opt/rh/llvm-toolset-7/enable file to get the appropriate PATH and
LD_LIBRARY_PATH settings. But it seems doable.

I just installed llvm-toolset-7 (the LLVM version is 4.0.1) on RHEL
7.4 and did a build of your tree at
475b4da439ae397345ab3df509e0e8eb26a8ff39. make installcheck passes for
both the default config and a server forced to jit everything (I
think) via:
jit_above_cost = '0'
jit_inline_above_cost = '0'
jit_optimize_above_cost = '0'

As a side note, this increases the runtime from approx 4 min to 18
min. Disabling jit completely with -1 in all of the above yields 3 min
48s, close to the default config's time, which raises the question of
how much coverage jit gets with the default config.
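
In case someone else wants to reproduce: a quick way to confirm that a
given query actually went through the JIT path, assuming the patch's
EXPLAIN instrumentation, is something like:

SET jit = on;           -- in case it's not the default in your build
SET jit_above_cost = 0;
EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM pg_class;

and then look for a JIT section at the bottom of the plan output.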

The build was with the newer gcc 7.2.1 from the aforementioned
collections; I'll try the system gcc as well. I run a buildfarm animal
(katydid) on this RHEL. When JIT gets committed I'll make it use
--with-llvm against this Software Collections LLVM.

> Appears to be available on centos too
> https://www.softwarecollections.org/en/scls/rhscl/devtoolset-7/

Indeed they are available for CentOS as well.



Re: prokind column (was Re: [HACKERS] SQL procedures)

2018-02-27 Thread Catalin Iacob
On Tue, Feb 27, 2018 at 4:03 AM, Michael Paquier  wrote:
> I would just recommend users to use a version of psql matching
> the one of the server instead of putting an extra load of maintenance
> into psql for years to come

Breaking tab completion in new psql against old servers might be
acceptable as it's a fringe feature, but I don't think your
recommendation of matching versions is practical. Lots of people
manage multiple server versions, and using the latest psql for all of
them is currently, as far as I know, a perfectly supported way of
doing that, getting new psql features and keeping compatibility. I
think it would be a pity to lose that.



Re: Doc tweak for huge_pages?

2018-01-23 Thread Catalin Iacob
On Tue, Jan 23, 2018 at 7:13 PM, Catalin Iacob  wrote:
> By the way, Fedora 27 does disable THP by default, they deviate from
> upstream in this regard:

> When I have some time I'll try to do some digging into history of the
> Fedora kernel package to see if they provide a rationale for changing
> the default. That might hint whether it's likely that future RHEL will
> change as well.

I see Peter assigned himself as committer; some more information below
for him to decide on the strength of the anti-THP message.

commit 9a031d5070d9f8f5916c48637bd0c237cd52eaf9
Author: Josh Boyer 
Date:   Thu Mar 27 18:31:16 2014 -0400

Switch to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE instead of always on

The benefit of THP has been somewhat questionable overall for a while,
and it's been known to cause performance issues with some workloads.
Upstream also considers it to be overly complicated and really not worth
it on machines with memory in the amounts found on typical desktops/SMB
servers.

Switch to using it via madvise, which most applications that care about
it should likely already be doing.

Debian 9 also seems to default to madvise instead of always.

Digging more into it, there were changes in the 4.6 kernel (released
May 2016) that should improve THP, more precisely:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=444eb2a449ef36fe115431ed7b71467c4563c7f1

This also led Debian to change their default in September 2017 (so
for the future Debian release) back to always, referencing the 444eb2a
improvements:
https://anonscm.debian.org/cgit/kernel/linux.git/commit/debian/changelog?id=611a8e67260e8b8190ab991206a3867681d6df91

Ben Hutchings 2017-09-29 14:32:09 (GMT)
thp: Enable TRANSPARENT_HUGEPAGE_ALWAYS instead of TRANSPARENT_HUGEPAGE_MADVISE
As advised by Andrea Arcangeli - since commit 444eb2a449ef "mm: thp:
set THP defrag by default to madvise and add a stall-free defrag
option" this will generally be best for performance.

So maybe we should weaken the language against THP. Maybe present the
known facts so far, even if the post-4.6 situation is vague/unknown:
before Linux 4.6 there were repeated reports of THP problems with
Postgres; Linux >= 4.6 might improve things but this isn't confirmed.
And it would be good if somebody could run benchmarks on pre-4.6 and
post-4.6 kernels. I would love to but have no access to big (or
medium) hardware.



Re: Doc tweak for huge_pages?

2018-01-23 Thread Catalin Iacob
On Mon, Jan 22, 2018 at 7:23 AM, Justin Pryzby  wrote:
> Consider this shorter, less-severe sounding alternative:
> "... (but note that this feature can degrade performance of some
> PostgreSQL workloads)."

I think the patch looks good now.

As Justin mentions, as far as I see the only arguable piece is how
strong the language should be against Linux THP.

On one hand it can be argued that warning about THP issues is not the
job of this patch. On the other hand this patch does say more about
THP and Googling does bring up a lot of trouble and advice to disable
THP, including:

https://www.postgresql.org/message-id/CANQNgOrD02f8mR3Y8Pi=zfsol14rqnqa8hwz1r4rsndlr1b...@mail.gmail.com
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge

The RedHat article above says "However, THP is not recommended for
database workloads."

I'll leave this to the committer and switch this patch to Ready for Committer.

By the way, Fedora 27 does disable THP by default; they deviate from
upstream in this regard:

[catalin@fedie scripts]$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
[catalin@fedie scripts]$ grep TRANSPARENT /boot/config-4.14.13-300.fc27.x86_64
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y

When I have some time I'll try to do some digging into history of the
Fedora kernel package to see if they provide a rationale for changing
the default. That might hint whether it's likely that future RHEL will
change as well.



Re: Doc tweak for huge_pages?

2018-01-08 Thread Catalin Iacob
On Fri, Dec 1, 2017 at 10:09 PM, Thomas Munro
 wrote:
>> On 11/30/17 23:35, Thomas Munro wrote:
>>> Hmm.  Yeah, it does, but apparently it's not so transparent.  So if we
>>> mention that we'd better indicate in the same paragraph that you
>>> probably don't actually want to use it.  How about the attached?

Here's a review for v3.

I find that the first paragraph is an improvement as it's more precise.

What I didn't like about the second paragraph is that it presented
Linux transparent huge pages too favorably, while they are actually
known to cause big (huge? pardon the pun) issues (as witnessed in
this thread as well). v3 basically says "in Linux it can be
transparent or explicit and explicit is faster than transparent".
Reading that, and seeing that explicit needs tweaking of kernel
parameters and so on, one might very well conclude "I'll use the
slightly-slower-but-still-better-than-nothing transparent version".

So I tried to redo the second paragraph and ended up with the
attached. Rationale for the changes:
* changed "this feature" to "explicitly requesting huge pages" to
contrast with the automatic one described below
* made the wording of Linux THP more negative (but still with some
wiggle room for future kernel versions which might improve THP),
contrasting with the positive explicit request from this GUC
* integrated your mention of other OSes with automatic huge pages
* moved the new text to the last paragraph to lower its importance

What do you think?
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e4a01699e4..b6b309a943 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1363,14 +1363,15 @@ include_dir 'conf.d'
   
   

-Enables/disables the use of huge memory pages. Valid values are
-try (the default), on,
-and off.
+Controls whether huge memory pages are requested for the main shared
+memory area. Valid values are try (the default),
+on, and off.

 

-At present, this feature is supported only on Linux. The setting is
-ignored on other systems when set to try.
+At present, explicitly requesting huge pages is supported only on
+Linux. The setting is ignored on other systems when set to
+try.

 

@@ -1386,6 +1387,18 @@ include_dir 'conf.d'
 to use huge pages will prevent the server from starting up. With
 off, huge pages will not be used.

+
+   
+Note that, besides explicitly requesting huge pages via
+huge_pages, operating systems including Linux,
+FreeBSD and Illumos can also use huge pages (sometimes known as "super"
+pages or "large" pages) automatically, without an explicit request from
+PostgreSQL. In Linux this automatic use is
+called "transparent huge pages" but, for some Linux kernel versions,
+transparent huge pages are known to cause performance degradation with
+PostgreSQL so, unlike
+huge_pages, their use is discouraged.
+