Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-07-02 Thread Christoph Berg
Re: Tom Lane 2014-07-01 20654.1404247...@sss.pgh.pa.us
 Yeah, I'm unexcited about this proposal.  In any case, given the two
 existing APIs we have to deal with, allowing PG_OOM_ADJUST_VALUE to
 default to 0 is sane in both APIs but a default for the file name
 can work for only one.

Nod.

 Fair enough.  I went for a minimum-change approach when hacking that
 script, but we could change it some more in the name of readability.
 Will do something about that.

Thanks, it's much nicer now. There's one uglyness left though: The
name PG_CHILD_OOM_SCORE_ADJ should match what actually gets passed to
the backends,

DAEMON_ENV=PG_OOM_ADJUST_FILE=$PG_OOM_ADJUST_FILE 
PG_OOM_ADJUST_VALUE=$PG_CHILD_OOM_SCORE_ADJ

would better be PG_OOM_ADJUST_VALUE=$PG_OOM_ADJUST_VALUE.

(Possibly the smart way to fix this would be to change
src/backend/postmaster/fork_process.c to use PG_CHILD_OOM_SCORE_ADJ
instead.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-07-02 Thread Jonathan Corbet
On Tue, 1 Jul 2014 15:57:25 -0400
Robert Haas robertmh...@gmail.com wrote:

 Of course, we have no guarantee that the Linux kernel guys won't
 change this again.  Apparently we don't break userspace is a
 somewhat selectively-enforced principle.

It's selectively enforced in that kernel developers don't (and can't)
scan the world for software that might break.  OTOH, if somebody were
to respond to a proposed change by saying this breaks PostgreSQL, I
would be amazed if the change were to be merged in that form.

In other words, it's important to scream.  There were concerns during
the last round of messing with the OOM logic, but nobody pointed to
something that actually broke.

I predict that somebody will certainly try to rewrite the OOM code
again in the future.  The nature of that code is such that it is never
going to work as well as people would like.  But if application
developers make it clear that they are dependent on the current
interface working, kernel developers will be constrained to keep it
working.

jon


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-07-02 Thread Tom Lane
Christoph Berg c...@df7cb.de writes:
 Re: Tom Lane 2014-07-01 20654.1404247...@sss.pgh.pa.us
 Fair enough.  I went for a minimum-change approach when hacking that
 script, but we could change it some more in the name of readability.
 Will do something about that.

 Thanks, it's much nicer now. There's one uglyness left though: The
 name PG_CHILD_OOM_SCORE_ADJ should match what actually gets passed to
 the backends,

Meh ... I don't find that to be important.  Making a parallel with
PG_MASTER_OOM_SCORE_ADJ seemed more helpful here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-07-02 Thread Robert Haas
On Wed, Jul 2, 2014 at 9:08 AM, Jonathan Corbet cor...@lwn.net wrote:
 On Tue, 1 Jul 2014 15:57:25 -0400
 Robert Haas robertmh...@gmail.com wrote:
 Of course, we have no guarantee that the Linux kernel guys won't
 change this again.  Apparently we don't break userspace is a
 somewhat selectively-enforced principle.

 It's selectively enforced in that kernel developers don't (and can't)
 scan the world for software that might break.  OTOH, if somebody were
 to respond to a proposed change by saying this breaks PostgreSQL, I
 would be amazed if the change were to be merged in that form.

 In other words, it's important to scream.  There were concerns during
 the last round of messing with the OOM logic, but nobody pointed to
 something that actually broke.

 I predict that somebody will certainly try to rewrite the OOM code
 again in the future.  The nature of that code is such that it is never
 going to work as well as people would like.  But if application
 developers make it clear that they are dependent on the current
 interface working, kernel developers will be constrained to keep it
 working.

Well, of course the reality is that if you don't break some things,
you can never change anything.  And clearly we want Linux to be able
to change things, to make them better.

On the flip side, interface changes do cause some pain, and it's not
realistic to expect every software project to monitor LKML.

This one is not really a big deal, though; I think we just need to
keep in mind, as you say, that it might change again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-07-01 Thread Christoph Berg
Re: Tom Lane 2014-06-23 17054.1403542...@sss.pgh.pa.us
  While I'd love to reduce the number of future installations without
  this fix in place, I respect the decision to honor project policy. At
  the same time, this change does not break anything. It introduces new
  environment variables which change the behaviour, but behaves the old
  way in the absence of those variables.
 
 Uh, no, it doesn't.  We removed the dependence on -DLINUX_OOM_SCORE_ADJ.
 If a packager is expecting that to still work in 9.4, he's going to be
 unpleasantly surprised, because the system will silently fail to do what
 he's expecting: it will run all the backend processes at no-OOM-kill
 priority, which is likely to be bad.

Ok I'm late to the party, but the reason I'm still joining is we have
proper unit tests which just told me the 9.5 packages have changed OOM
handling. So it wouldn't just slip through if you changed that in 9.4
as well, but would get fixed.

I have two comments on the patch:

The choice to make the behavior depend first on PG_OOM_ADJUST_FILE and
only secondly on PG_OOM_ADJUST_VALUE seems the wrong way round to me.
On every modestly new kernel oom_score_adj is present, so
PG_OOM_ADJUST_FILE should default to using it. On the other hand, what
people really want to achieve (or tune) with this feature is to set
the OOM adjustment back to 0 (or some other value), so to me, it would
be much more natural to set PG_OOM_ADJUST_VALUE=0 to activate the
feature, and only have to mess with PG_OOM_ADJUST_FILE if my kernel is
older. (Which it isn't on any kernel supported by Debian and Ubuntu
LTS.)

The other bit is the non-deprecation of OOM_ADJ in
contrib/start-scripts/linux. It took me a while to understand the
logic of the variables used there - the interface would be much
clearer if it just was like this:

... set default PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj

... and then this in the configurable part of the script:

PG_MASTER_OOM_SCORE_ADJ=-1000
PG_OOM_SCORE_ADJ=0
# Older Linux kernels may not have /proc/self/oom_score_adj, but instead
# /proc/self/oom_adj, which works similarly except the disable value is -17.
# For such a system, uncomment these three lines instead.
#PG_OOM_ADJUST_FILE=/proc/self/oom_adj
#PG_MASTER_OOM_SCORE_ADJ=-17
#PG_OOM_SCORE_ADJ=0

... and then use PG_OOM_ADJUST_FILE below instead of manually figuring
out which proc file to write to by looking at OOM.*_ADJ.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-07-01 Thread Robert Haas
On Tue, Jul 1, 2014 at 8:22 AM, Christoph Berg c...@df7cb.de wrote:
 Re: Tom Lane 2014-06-23 17054.1403542...@sss.pgh.pa.us
  While I'd love to reduce the number of future installations without
  this fix in place, I respect the decision to honor project policy. At
  the same time, this change does not break anything. It introduces new
  environment variables which change the behaviour, but behaves the old
  way in the absence of those variables.

 Uh, no, it doesn't.  We removed the dependence on -DLINUX_OOM_SCORE_ADJ.
 If a packager is expecting that to still work in 9.4, he's going to be
 unpleasantly surprised, because the system will silently fail to do what
 he's expecting: it will run all the backend processes at no-OOM-kill
 priority, which is likely to be bad.

 Ok I'm late to the party, but the reason I'm still joining is we have
 proper unit tests which just told me the 9.5 packages have changed OOM
 handling. So it wouldn't just slip through if you changed that in 9.4
 as well, but would get fixed.

 I have two comments on the patch:

 The choice to make the behavior depend first on PG_OOM_ADJUST_FILE and
 only secondly on PG_OOM_ADJUST_VALUE seems the wrong way round to me.
 On every modestly new kernel oom_score_adj is present, so
 PG_OOM_ADJUST_FILE should default to using it. On the other hand, what
 people really want to achieve (or tune) with this feature is to set
 the OOM adjustment back to 0 (or some other value), so to me, it would
 be much more natural to set PG_OOM_ADJUST_VALUE=0 to activate the
 feature, and only have to mess with PG_OOM_ADJUST_FILE if my kernel is
 older. (Which it isn't on any kernel supported by Debian and Ubuntu
 LTS.)

Of course, we have no guarantee that the Linux kernel guys won't
change this again.  Apparently we don't break userspace is a
somewhat selectively-enforced principle.

 The other bit is the non-deprecation of OOM_ADJ in
 contrib/start-scripts/linux. It took me a while to understand the
 logic of the variables used there - the interface would be much
 clearer if it just was like this:

 ... set default PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj

 ... and then this in the configurable part of the script:

 PG_MASTER_OOM_SCORE_ADJ=-1000
 PG_OOM_SCORE_ADJ=0
 # Older Linux kernels may not have /proc/self/oom_score_adj, but instead
 # /proc/self/oom_adj, which works similarly except the disable value is -17.
 # For such a system, uncomment these three lines instead.
 #PG_OOM_ADJUST_FILE=/proc/self/oom_adj
 #PG_MASTER_OOM_SCORE_ADJ=-17
 #PG_OOM_SCORE_ADJ=0

 ... and then use PG_OOM_ADJUST_FILE below instead of manually figuring
 out which proc file to write to by looking at OOM.*_ADJ.

I can't help but agree with this, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-07-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 1, 2014 at 8:22 AM, Christoph Berg c...@df7cb.de wrote:
 I have two comments on the patch:
 
 The choice to make the behavior depend first on PG_OOM_ADJUST_FILE and
 only secondly on PG_OOM_ADJUST_VALUE seems the wrong way round to me.

 Of course, we have no guarantee that the Linux kernel guys won't
 change this again.  Apparently we don't break userspace is a
 somewhat selectively-enforced principle.

Yeah, I'm unexcited about this proposal.  In any case, given the two
existing APIs we have to deal with, allowing PG_OOM_ADJUST_VALUE to
default to 0 is sane in both APIs but a default for the file name
can work for only one.

 The other bit is the non-deprecation of OOM_ADJ in
 contrib/start-scripts/linux. It took me a while to understand the
 logic of the variables used there - the interface would be much
 clearer if it just was like this:
 ... and then use PG_OOM_ADJUST_FILE below instead of manually figuring
 out which proc file to write to by looking at OOM.*_ADJ.

 I can't help but agree with this, though.

Fair enough.  I went for a minimum-change approach when hacking that
script, but we could change it some more in the name of readability.
Will do something about that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-23 Thread Gurjeet Singh
On Wed, Jun 18, 2014 at 8:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Gurjeet Singh gurj...@singh.im writes:

 Please find attached the patch. It includes the doc changes as well.

 Applied with some editorialization.

Thanks!

would it be possible to include this in 9.4 as well?

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-23 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 would it be possible to include this in 9.4 as well?

While this is clearly an improvement over what we had before, it's
impossible to argue that it's a bug fix, and we are way past the 9.4
feature freeze deadline.  In particular, packagers who've already done
their 9.4 development work might be blindsided by us slipping this into
9.4 release.  So while I wouldn't have a problem with putting this change
into 9.4 from a technical standpoint, it's hard to argue that it'd meet
project norms from a development-process standpoint.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-23 Thread Gurjeet Singh
On Mon, Jun 23, 2014 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Gurjeet Singh gurj...@singh.im writes:
 would it be possible to include this in 9.4 as well?

 While this is clearly an improvement over what we had before, it's
 impossible to argue that it's a bug fix, and we are way past the 9.4
 feature freeze deadline.  In particular, packagers who've already done
 their 9.4 development work might be blindsided by us slipping this into
 9.4 release.  So while I wouldn't have a problem with putting this change
 into 9.4 from a technical standpoint, it's hard to argue that it'd meet
 project norms from a development-process standpoint.

While I'd love to reduce the number of future installations without
this fix in place, I respect the decision to honor project policy. At
the same time, this change does not break anything. It introduces new
environment variables which change the behaviour, but behaves the old
way in the absence of those variables. So I guess it's not changing
the default behaviour in incompatible way.

BTW, does the project publish the feature-freeze deadlines and other
dates somewhere (apart from on this list). I was looking the other day
and didn't find any reference. Either the commitfest app or the
'Developer' section of the wiki [1] seem to be good candidates for
this kind of information.

[1]: https://wiki.postgresql.org/wiki/Development_information

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-23 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 On Mon, Jun 23, 2014 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 While this is clearly an improvement over what we had before, it's
 impossible to argue that it's a bug fix, and we are way past the 9.4
 feature freeze deadline.  In particular, packagers who've already done
 their 9.4 development work might be blindsided by us slipping this into
 9.4 release.  So while I wouldn't have a problem with putting this change
 into 9.4 from a technical standpoint, it's hard to argue that it'd meet
 project norms from a development-process standpoint.

 While I'd love to reduce the number of future installations without
 this fix in place, I respect the decision to honor project policy. At
 the same time, this change does not break anything. It introduces new
 environment variables which change the behaviour, but behaves the old
 way in the absence of those variables.

Uh, no, it doesn't.  We removed the dependence on -DLINUX_OOM_SCORE_ADJ.
If a packager is expecting that to still work in 9.4, he's going to be
unpleasantly surprised, because the system will silently fail to do what
he's expecting: it will run all the backend processes at no-OOM-kill
priority, which is likely to be bad.

It is possible to make packages that will work either way, along the lines
of the advice I sent to the Red Hat guys:
https://bugzilla.redhat.com/show_bug.cgi?id=1110969

but I think it's a bit late in the cycle to be telling packagers to do
that for 9.4.

 BTW, does the project publish the feature-freeze deadlines and other
 dates somewhere (apart from on this list).

It's usually in the dev meeting minutes
https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#9.5_Schedule

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-23 Thread Gurjeet Singh
On Mon, Jun 23, 2014 at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Gurjeet Singh gurj...@singh.im writes:
 While I'd love to reduce the number of future installations without
 this fix in place, I respect the decision to honor project policy. At
 the same time, this change does not break anything. It introduces new
 environment variables which change the behaviour, but behaves the old
 way in the absence of those variables.

 Uh, no, it doesn't.  We removed the dependence on -DLINUX_OOM_SCORE_ADJ.
 If a packager is expecting that to still work in 9.4, he's going to be
 unpleasantly surprised, because the system will silently fail to do what
 he's expecting: it will run all the backend processes at no-OOM-kill
 priority, which is likely to be bad.

True. I didn't think from a packager's perspective.

 It is possible to make packages that will work either way, along the lines
 of the advice I sent to the Red Hat guys:
 https://bugzilla.redhat.com/show_bug.cgi?id=1110969

 but I think it's a bit late in the cycle to be telling packagers to do
 that for 9.4.

Understandable.

 BTW, does the project publish the feature-freeze deadlines and other
 dates somewhere (apart from on this list).

 It's usually in the dev meeting minutes
 https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#9.5_Schedule

Thanks. But it doesn't spell out the specific dates, or even the month
of feature-freeze.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-18 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we're going to do this, I would say that we should also take the
 opportunity to get out from under the question of which kernel API
 we're dealing with.  So perhaps a design like this:
 
 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's
 the name of a file to write something into after forking.  The typical
 value would be /proc/self/oom_score_adj or /proc/self/oom_adj.
 If not set, nothing happens.
 
 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's
 the string we write, otherwise we write 0.

 Please find attached the patch. It includes the doc changes as well.

Applied with some editorialization.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-16 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 Please find attached the patch. It includes the doc changes as well.

What exactly is the point of the static state you added here?  There
is no situation where that could possibly be useful, because this
code is executed at most once per process, and not at all in the
postmaster.  AFAICS, that's just complication (and permanent waste
of a kilobyte of static data space) for no return.

Also, this seems to try to write the file whether or not the environment
variable was set, which wasn't the plan.

I also don't really see the point of parsing the value variable into an
int.  Why not just spit it out as the original string?  It's not ours to
legislate whether the kernel will take a value that's not an integer.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-12 Thread Gurjeet Singh
On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we're going to do this, I would say that we should also take the
 opportunity to get out from under the question of which kernel API
 we're dealing with.  So perhaps a design like this:

 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's
 the name of a file to write something into after forking.  The typical
 value would be /proc/self/oom_score_adj or /proc/self/oom_adj.
 If not set, nothing happens.

 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's
 the string we write, otherwise we write 0.

Please find attached the patch. It includes the doc changes as well.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 9fadef5..4492a1d 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1284,8 +1284,15 @@ echo -1000  /proc/self/oom_score_adj
 in the postmaster's startup script just before invoking the postmaster.
 Note that this action must be done as root, or it will have no effect;
 so a root-owned startup script is the easiest place to do it.  If you
-do this, you may also wish to build productnamePostgreSQL/
-with literal-DLINUX_OOM_SCORE_ADJ=0/ added to varnameCPPFLAGS/.
+do this, you may also wish to export these environment variables
+programlisting
+PG_OOM_ADJUST_FILE=oom_score_adj
+PG_OOM_ADJUST_VALUE=0
+
+export PG_OOM_ADJUST_FILE
+export PG_OOM_ADJUST_VALUE
+/programlisting
+in the startup script, before invoking the postmaster.
 That will cause postmaster child processes to run with the normal
 varnameoom_score_adj/ value of zero, so that the OOM killer can still
 target them at need.
@@ -1296,8 +1303,7 @@ echo -1000  /proc/self/oom_score_adj
 but may have a previous version of the same functionality called
 filename/proc/self/oom_adj/.  This works the same except the disable
 value is literal-17/ not literal-1000/.  The corresponding
-build flag for productnamePostgreSQL/ is
-literal-DLINUX_OOM_ADJ=0/.
+value for envarPG_OOM_ADJUST_FILE/ should be varnameoom_adj/.
/para
 
note
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index f6df2de..21559af 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -22,6 +22,13 @@
 #endif
 
 #ifndef WIN32
+
+#ifdef __linux__
+static bool	oom_env_checked = false;
+static char	oom_adj_file[MAXPGPATH];
+static int	oom_adj_value = 0;
+#endif	/* __linux__ */
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
@@ -78,13 +85,38 @@ fork_process(void)
 		 * LINUX_OOM_SCORE_ADJ #defined to 0, or to some other value that you
 		 * want child processes to adopt here.
 		 */
-#ifdef LINUX_OOM_SCORE_ADJ
+#ifdef __linux__
+		if (!oom_env_checked)
+		{
+			char *env;
+
+			oom_env_checked = true;
+
+			env = getenv(PG_OOM_ADJUST_FILE);
+
+			/* Don't allow a path separator in file name */
+			if (env  !strchr(env, '/')  !strchr(env, '\\'))
+			{
+snprintf(oom_adj_file, MAXPGPATH, /proc/self/%s, env);
+
+env = getenv(PG_OOM_ADJUST_VALUE);
+if (env)
+{
+	oom_adj_value = atoi(env);
+}
+else
+	oom_adj_value = 0;
+			}
+			else
+oom_adj_file[0] = '\0';
+		}
+
 		{
 			/*
 			 * Use open() not stdio, to ensure we control the open flags. Some
 			 * Linux security environments reject anything but O_WRONLY.
 			 */
-			int			fd = open(/proc/self/oom_score_adj, O_WRONLY, 0);
+			int			fd = open(oom_adj_file, O_WRONLY, 0);
 
 			/* We ignore all errors */
 			if (fd = 0)
@@ -92,41 +124,14 @@ fork_process(void)
 char		buf[16];
 int			rc;
 
-snprintf(buf, sizeof(buf), %d\n, LINUX_OOM_SCORE_ADJ);
+snprintf(buf, sizeof(buf), %d\n, oom_adj_value);
 rc = write(fd, buf, strlen(buf));
 (void) rc;
 close(fd);
 			}
 		}
-#endif   /* LINUX_OOM_SCORE_ADJ */
 
-		/*
-		 * Older Linux kernels have oom_adj not oom_score_adj.  This works
-		 * similarly except with a different scale of adjustment values. If
-		 * it's necessary to build Postgres to work with either API, you can
-		 * define both LINUX_OOM_SCORE_ADJ and LINUX_OOM_ADJ.
-		 */
-#ifdef LINUX_OOM_ADJ
-		{
-			/*
-			 * Use open() not stdio, to ensure we control the open flags. Some
-			 * Linux security environments reject anything but O_WRONLY.
-			 */
-			int			fd = open(/proc/self/oom_adj, O_WRONLY, 0);
-
-			/* We ignore all errors */
-			if (fd = 0)
-			{
-char		buf[16];
-int			rc;
-
-snprintf(buf, sizeof(buf), %d\n, LINUX_OOM_ADJ);
-rc = write(fd, buf, strlen(buf));
-(void) rc;
-close(fd);
-			}
-		}
-#endif   /* LINUX_OOM_ADJ */
+#endif   /* __linux__ */
 
 		/*
 		 * Make sure processes do not share OpenSSL randomness state.

-- 
Sent via pgsql-hackers mailing 

Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Gurjeet Singh
On Mon, Sep 19, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On sön, 2011-09-18 at 12:21 -0400, Tom Lane wrote:
 But having said that, it wouldn't be very hard to arrange things so that
 if you did have both symbols defined, the code would only attempt to
 write oom_adj if it had failed to write oom_score_adj; which is about as
 close as you're likely to get to a kernel version test for this.

 Why is this feature not a run-time configuration variable or at least a
 configure option?  It's awfully well hidden now.  I doubt a lot of
 people are using this even though they might wish to.

 See the thread in which the feature was designed originally:
 http://archives.postgresql.org/pgsql-hackers/2010-01/msg00170.php

 The key point is that to get useful behavior, you need cooperation
 between both a root-privileged startup script and the PG executable.
 That tends to throw the problem into the domain of packagers, more
 than end users, and definitely puts a big crimp in the idea that
 run-time configuration of just half of the behavior could be helpful.
 So far, no Linux packagers have complained that the design is inadequate
 (a position that I also hold when wearing my red fedora) so I do not
 feel a need to complicate it further.

Startup scripts are not solely in the domain of packagers. End users
can also be expected to develop/edit their own startup scripts.

Providing it as GUC would have given end users both the peices, but
with a compile-time option they have only one half of the solution;
except if they go compile their own binaries, which forces them into
being packagers.

I am not alone in feeling that if Postgres wishes to provide a control
over child backend's oom_score_adj, it should be a GUC parameter
rather than a compile-time option. Yesterday a customer wanted to
leverage this and couldn't because they refuse to maintain their own
fork of Postgres code.

Please find attached a patch to turn it into a GUC, #ifdef'd by __linux__ macro.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index f6df2de..7b9bc38 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -22,6 +22,11 @@
 #endif
 
 #ifndef WIN32
+
+#ifdef __linux__
+int linux_oom_score_adj = 0;
+#endif
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
@@ -78,7 +83,7 @@ fork_process(void)
 		 * LINUX_OOM_SCORE_ADJ #defined to 0, or to some other value that you
 		 * want child processes to adopt here.
 		 */
-#ifdef LINUX_OOM_SCORE_ADJ
+#ifdef __linux__
 		{
 			/*
 			 * Use open() not stdio, to ensure we control the open flags. Some
@@ -92,13 +97,12 @@ fork_process(void)
 char		buf[16];
 int			rc;
 
-snprintf(buf, sizeof(buf), %d\n, LINUX_OOM_SCORE_ADJ);
+snprintf(buf, sizeof(buf), %d\n, linux_oom_score_adj);
 rc = write(fd, buf, strlen(buf));
 (void) rc;
 close(fd);
 			}
 		}
-#endif   /* LINUX_OOM_SCORE_ADJ */
 
 		/*
 		 * Older Linux kernels have oom_adj not oom_score_adj.  This works
@@ -106,7 +110,6 @@ fork_process(void)
 		 * it's necessary to build Postgres to work with either API, you can
 		 * define both LINUX_OOM_SCORE_ADJ and LINUX_OOM_ADJ.
 		 */
-#ifdef LINUX_OOM_ADJ
 		{
 			/*
 			 * Use open() not stdio, to ensure we control the open flags. Some
@@ -120,13 +123,13 @@ fork_process(void)
 char		buf[16];
 int			rc;
 
-snprintf(buf, sizeof(buf), %d\n, LINUX_OOM_ADJ);
+snprintf(buf, sizeof(buf), %d\n, linux_oom_score_adj);
 rc = write(fd, buf, strlen(buf));
 (void) rc;
 close(fd);
 			}
 		}
-#endif   /* LINUX_OOM_ADJ */
+#endif   /* __linux__ */
 
 		/*
 		 * Make sure processes do not share OpenSSL randomness state.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1d094f0..8713abb 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -128,6 +128,7 @@ extern bool synchronize_seqscans;
 extern char *SSLCipherSuites;
 extern char *SSLECDHCurve;
 extern bool SSLPreferServerCiphers;
+extern int	linux_oom_score_adj;
 
 #ifdef TRACE_SORT
 extern bool trace_sort;
@@ -2554,6 +2555,18 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+#ifdef __linux__
+	{
+		{linux_oom_score_adj, PGC_POSTMASTER, RESOURCES_KERNEL,
+			gettext_noop(Sets the oom_score_adj parameter of postmaster's child processes.),
+			NULL,
+		},
+		linux_oom_score_adj,
+		0, -1000, 1000,
+		NULL, NULL, NULL
+	},
+#endif /* __linux__ */
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 Startup scripts are not solely in the domain of packagers. End users
 can also be expected to develop/edit their own startup scripts.

 Providing it as GUC would have given end users both the peices, but
 with a compile-time option they have only one half of the solution;
 except if they go compile their own binaries, which forces them into
 being packagers.

I don't find that this argument holds any water at all.  Anyone who's
developing their own start script can be expected to manage recompiling
Postgres.

Extra GUCs do not have zero cost, especially not ones that are as
complicated-to-explain as this would be.

I would also argue that there's a security issue with making it a GUC.
A non-root DBA should not have the authority to decide whether or not
postmaster child processes run with nonzero OOM adjustment; that decision
properly belongs to whoever has authorized the root-owned startup script
to change the adjustment in the first place.  So seeing this as two
independent pieces is not only wrong but dangerous.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Gurjeet Singh
On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Gurjeet Singh gurj...@singh.im writes:
 Startup scripts are not solely in the domain of packagers. End users
 can also be expected to develop/edit their own startup scripts.

 Providing it as GUC would have given end users both the peices, but
 with a compile-time option they have only one half of the solution;
 except if they go compile their own binaries, which forces them into
 being packagers.

 I don't find that this argument holds any water at all.  Anyone who's
 developing their own start script can be expected to manage recompiling
 Postgres.

I respectfully disagree. Writing and managing init/start scripts
requires much different set of expertise than compiling and managing
builds of a critical software like a database product.

I would consider adding `echo -1000  /proc/self/oom_score_adj` to a
start script as a deployment specific tweak, and not 'developing own
start script'.

 Extra GUCs do not have zero cost, especially not ones that are as
 complicated-to-explain as this would be.

 I would also argue that there's a security issue with making it a GUC.
 A non-root DBA should not have the authority to decide whether or not
 postmaster child processes run with nonzero OOM adjustment; that decision
 properly belongs to whoever has authorized the root-owned startup script
 to change the adjustment in the first place.  So seeing this as two
 independent pieces is not only wrong but dangerous.

From experiments last night, I see that child process' oom_score_adj
is capped by the parent process' setting that is forking. So I don't
think it's a security risk from that perspective.

I'll retest and provide the exact findings.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Gurjeet Singh gurj...@singh.im writes:
 Startup scripts are not solely in the domain of packagers. End users
 can also be expected to develop/edit their own startup scripts.

 Providing it as GUC would have given end users both the peices, but
 with a compile-time option they have only one half of the solution;
 except if they go compile their own binaries, which forces them into
 being packagers.

 I don't find that this argument holds any water at all.  Anyone who's
 developing their own start script can be expected to manage recompiling
 Postgres.

Huh?  Lots of people install PostgreSQL via, say, RPMs, but may still
want to change their startup script locally.  I don't think those
users should have to give up the benefits of RPM packaging because
they want a different oom_adj value.  Then they have to be responsible
for updating the packages every time there's a new minor release,
instead of just typing 'yum update'.  That's a LOT of extra work.

 Extra GUCs do not have zero cost, especially not ones that are as
 complicated-to-explain as this would be.

NOT having them isn't free either.

 I would also argue that there's a security issue with making it a GUC.
 A non-root DBA should not have the authority to decide whether or not
 postmaster child processes run with nonzero OOM adjustment; that decision
 properly belongs to whoever has authorized the root-owned startup script
 to change the adjustment in the first place.  So seeing this as two
 independent pieces is not only wrong but dangerous.

I think the only possible issue is if the DBA doesn't even have shell
access.  If he doesn't have root but does have shell access, he could
have recompiled anyway - it's just more work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Andres Freund
On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
 Providing it as GUC would have given end users both the peices, but
 with a compile-time option they have only one half of the solution;
 except if they go compile their own binaries, which forces them into
 being packagers.
 
 I am not alone in feeling that if Postgres wishes to provide a control
 over child backend's oom_score_adj, it should be a GUC parameter
 rather than a compile-time option. Yesterday a customer wanted to
 leverage this and couldn't because they refuse to maintain their own
 fork of Postgres code.

Independent of the rest of the discussion, I think there's one more
point: Trying to keep your system stable by *increasing* the priority of
normal backends is a bad idea. If you system gets into OOM land you need
to fix that, not whack who gets killed around.
The reason it makes sense to increase the priority of the postmaster is
that that *does* increase the stability by cleaning up resources and
restarting everything.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 10:32 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
 Providing it as GUC would have given end users both the peices, but
 with a compile-time option they have only one half of the solution;
 except if they go compile their own binaries, which forces them into
 being packagers.

 I am not alone in feeling that if Postgres wishes to provide a control
 over child backend's oom_score_adj, it should be a GUC parameter
 rather than a compile-time option. Yesterday a customer wanted to
 leverage this and couldn't because they refuse to maintain their own
 fork of Postgres code.

 Independent of the rest of the discussion, I think there's one more
 point: Trying to keep your system stable by *increasing* the priority of
 normal backends is a bad idea. If you system gets into OOM land you need
 to fix that, not whack who gets killed around.
 The reason it makes sense to increase the priority of the postmaster is
 that that *does* increase the stability by cleaning up resources and
 restarting everything.

But the priority is inherited by child processes, so to decrease the
priority of JUST the postmaster you need to decrease its priority and
then set the priority of each child back to the original value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Gurjeet Singh
On Tue, Jun 10, 2014 at 10:32 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
 Providing it as GUC would have given end users both the peices, but
 with a compile-time option they have only one half of the solution;
 except if they go compile their own binaries, which forces them into
 being packagers.

 I am not alone in feeling that if Postgres wishes to provide a control
 over child backend's oom_score_adj, it should be a GUC parameter
 rather than a compile-time option. Yesterday a customer wanted to
 leverage this and couldn't because they refuse to maintain their own
 fork of Postgres code.

 Independent of the rest of the discussion, I think there's one more
 point: Trying to keep your system stable by *increasing* the priority of
 normal backends is a bad idea. If you system gets into OOM land you need
 to fix that, not whack who gets killed around.
 The reason it makes sense to increase the priority of the postmaster is
 that that *does* increase the stability by cleaning up resources and
 restarting everything.

As it stands right now, a user can decrease the likelyhood of
Postmaster being killed by adjusting the start script, but that
decreases the likelyhood of al the child processes, too, making the
Postmaster just as likely as a kill-candidate, maybe even higher
because it's the parent, as any backend.

This patch gives the user a control to let the backend's likelyhood of
being killed be different/higher than that of the postmaster.

The users were already capable of doing that, but were required to
custom-compile Postgres to get the benefits. This patch just removes
that requirement.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Andres Freund
On 2014-06-10 10:42:41 -0400, Robert Haas wrote:
 On Tue, Jun 10, 2014 at 10:32 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
  Providing it as GUC would have given end users both the peices, but
  with a compile-time option they have only one half of the solution;
  except if they go compile their own binaries, which forces them into
  being packagers.
 
  I am not alone in feeling that if Postgres wishes to provide a control
  over child backend's oom_score_adj, it should be a GUC parameter
  rather than a compile-time option. Yesterday a customer wanted to
  leverage this and couldn't because they refuse to maintain their own
  fork of Postgres code.
 
  Independent of the rest of the discussion, I think there's one more
  point: Trying to keep your system stable by *increasing* the priority of
  normal backends is a bad idea. If you system gets into OOM land you need
  to fix that, not whack who gets killed around.
  The reason it makes sense to increase the priority of the postmaster is
  that that *does* increase the stability by cleaning up resources and
  restarting everything.
 
 But the priority is inherited by child processes, so to decrease the
 priority of JUST the postmaster you need to decrease its priority and
 then set the priority of each child back to the original value.

Gurjeet argues for making the absolute value of child processes priority
configurable though? My point is that attacking the problem from that
angle is wrong.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't find that this argument holds any water at all.  Anyone who's
 developing their own start script can be expected to manage recompiling
 Postgres.

 Huh?  Lots of people install PostgreSQL via, say, RPMs, but may still
 want to change their startup script locally.

So?  The RPM packager could probably be expected to have compiled with the
oom-adjust-reset option enabled.  If not, why aren't these people lobbying
the packager to meet their expectation?

I remain of the opinion that allowing nonprivileged people to decide
whether that code is active or not is unsafe from a system level.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 10:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't find that this argument holds any water at all.  Anyone who's
 developing their own start script can be expected to manage recompiling
 Postgres.

 Huh?  Lots of people install PostgreSQL via, say, RPMs, but may still
 want to change their startup script locally.

 So?  The RPM packager could probably be expected to have compiled with the
 oom-adjust-reset option enabled.  If not, why aren't these people lobbying
 the packager to meet their expectation?

Because that might take years to happen, or the packager might never
do it at all on the theory that what is good for customers in general
is different than what's good for one particular customer, or on the
theory that it's just not a high enough priority for that packager.

Are you seriously saying that you've never needed to customize a
startup script on a RHEL box somewhere?

 I remain of the opinion that allowing nonprivileged people to decide
 whether that code is active or not is unsafe from a system level.

On what factual basis?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Independent of the rest of the discussion, I think there's one more
 point: Trying to keep your system stable by *increasing* the priority of
 normal backends is a bad idea. If you system gets into OOM land you need
 to fix that, not whack who gets killed around.
 The reason it makes sense to increase the priority of the postmaster is
 that that *does* increase the stability by cleaning up resources and
 restarting everything.

That's half of the reason.  The other half is that, at least back when
we added this code, the Linux kernel's victim-selection code
disproportionately chose to kill the postmaster rather than any child
backend, an outcome definitely not to be preferred.  IIRC this was because
it blamed the postmaster for the sum of childrens' memory sizes *including
shared memory*, counting the shared memory over again for each child.

I don't know whether our switch away from SysV shmem has helped that, or
if recent kernel versions have less brain-dead OOM victim selection.
I'm not terribly hopeful though.

But anyway, yeah, the point of this feature is that the OOM priority
of the postmaster, and *only* the postmaster, should be raised.  Allowing
unprivileged people to break that is not attractive on any level.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 10, 2014 at 10:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So?  The RPM packager could probably be expected to have compiled with the
 oom-adjust-reset option enabled.  If not, why aren't these people lobbying
 the packager to meet their expectation?

 Because that might take years to happen,

... unlike adding a GUC?

 or the packager might never
 do it at all on the theory that what is good for customers in general
 is different than what's good for one particular customer, or on the
 theory that it's just not a high enough priority for that packager.

 Are you seriously saying that you've never needed to customize a
 startup script on a RHEL box somewhere?

Sure, but what's that have to do with this?  Any Red Hat or PGDG RPM will
come with this code already enabled in the build, so there is no need for
anyone to have a GUC to play around with the behavior.

 I remain of the opinion that allowing nonprivileged people to decide
 whether that code is active or not is unsafe from a system level.

 On what factual basis?

Because it would convert the intended behavior (postmaster and only
postmaster is exempt from OOM kill) into a situation where possibly
all of the database processes are exempt from OOM kill, at the whim
of somebody who should not have the privilege to decide that.

In my view, the root-owned startup script grants OOM exemption to
the postmaster because it *knows* that the postmaster's children
will drop the exemption.  If that trust can be violated because some
clueless DBA decided to frob a GUC setting, I'd be a lot more hesitant
about giving the postmaster the exemption in the first place.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Andres Freund
 Andres Freund and...@2ndquadrant.com writes:
  Independent of the rest of the discussion, I think there's one more
  point: Trying to keep your system stable by *increasing* the priority of
  normal backends is a bad idea. If you system gets into OOM land you need
  to fix that, not whack who gets killed around.
  The reason it makes sense to increase the priority of the postmaster is
  that that *does* increase the stability by cleaning up resources and
  restarting everything.
 
 That's half of the reason.  The other half is that, at least back when
 we added this code, the Linux kernel's victim-selection code
 disproportionately chose to kill the postmaster rather than any child
 backend, an outcome definitely not to be preferred.  IIRC this was because
 it blamed the postmaster for the sum of childrens' memory sizes *including
 shared memory*, counting the shared memory over again for each child.

That's essentially the same thing. We want the postmaster survive, not
the children...

 I don't know whether our switch away from SysV shmem has helped that, or
 if recent kernel versions have less brain-dead OOM victim selection.
 I'm not terribly hopeful though.

It has gotten much better in the latest few versions. But it'll be a
fair bit till those will be wildly used. And even then, from the
kernel's view, there's just no reason strongly prefer not to kill the
postmaster over other processes without the user providing the information.

On 2014-06-10 11:04:52 -0400, Tom Lane wrote:
 But anyway, yeah, the point of this feature is that the OOM priority
 of the postmaster, and *only* the postmaster, should be raised.  Allowing
 unprivileged people to break that is not attractive on any level.

A superuser could already write a function that echoes into /proc? I
fail to see how a GUC changes the landscape significantly here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-10 11:04:52 -0400, Tom Lane wrote:
 But anyway, yeah, the point of this feature is that the OOM priority
 of the postmaster, and *only* the postmaster, should be raised.  Allowing
 unprivileged people to break that is not attractive on any level.

 A superuser could already write a function that echoes into /proc?

Maybe I'm mistaken, but I thought once the fork_process code has reset our
process's setting to zero it's not possible to lower it again (without
privileges we'd not have).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Andres Freund
On 2014-06-10 11:14:43 -0400, Tom Lane wrote:
 Sure, but what's that have to do with this?  Any Red Hat or PGDG RPM will
 come with this code already enabled in the build, so there is no need for
 anyone to have a GUC to play around with the behavior.

That's imo a fair point. Unless I misunderstand things Gurjeet picked
the topic up again because he wants to increase the priority of the
children. Is that correct Gurjeet?

I do think a GUC might be a nicer interface for this than a
#define. Possibly even with a absolute reset value for the children. But
only because there's no way, afaics, to explicitly reset to the default
value.

  I remain of the opinion that allowing nonprivileged people to decide
  whether that code is active or not is unsafe from a system level.
 
  On what factual basis?
 
 Because it would convert the intended behavior (postmaster and only
 postmaster is exempt from OOM kill) into a situation where possibly
 all of the database processes are exempt from OOM kill, at the whim
 of somebody who should not have the privilege to decide that.

Meh^3. By that argument we need to forbid superusers to create any form
of untrusted functions. Forbid anything that does malloc(), system(),
fork(), whatever from a user's influence.
Superusers can execute code in the postmaster using
shared_preload_libraries. Feigning that that's not possible isn't buying
us anything.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 As it stands right now, a user can decrease the likelyhood of
 Postmaster being killed by adjusting the start script, but that
 decreases the likelyhood of al the child processes, too, making the
 Postmaster just as likely as a kill-candidate, maybe even higher
 because it's the parent, as any backend.

Exactly.

 This patch gives the user a control to let the backend's likelyhood of
 being killed be different/higher than that of the postmaster.

If you think your users might want to give the postmaster OOM-exemption,
why don't you just activate the existing code when you build?  Resetting
the OOM setting to zero is safe whether or not the startup script did
anything to the postmaster's setting.

In short, I don't see what a GUC adds here, except uncertainty, which
is not a good thing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Andres Freund
On 2014-06-10 11:20:28 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-10 11:04:52 -0400, Tom Lane wrote:
  But anyway, yeah, the point of this feature is that the OOM priority
  of the postmaster, and *only* the postmaster, should be raised.  Allowing
  unprivileged people to break that is not attractive on any level.
 
  A superuser could already write a function that echoes into /proc?
 
 Maybe I'm mistaken, but I thought once the fork_process code has reset our
 process's setting to zero it's not possible to lower it again (without
 privileges we'd not have).

No, doesn't look that way. It's possible to reset it to the value set at
process start. So unless we introduce double forks for every backend
start it can be reset by ordinary processes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 11:14 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 10, 2014 at 10:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So?  The RPM packager could probably be expected to have compiled with the
 oom-adjust-reset option enabled.  If not, why aren't these people lobbying
 the packager to meet their expectation?

 Because that might take years to happen,

 ... unlike adding a GUC?

/me shrugs.

Sure, it'll take a release cycle, but once we've made this
configurable, it'll always be configurable.  New packagers who don't
have this set up properly can show up at any time.

 or the packager might never
 do it at all on the theory that what is good for customers in general
 is different than what's good for one particular customer, or on the
 theory that it's just not a high enough priority for that packager.

 Are you seriously saying that you've never needed to customize a
 startup script on a RHEL box somewhere?

 Sure, but what's that have to do with this?  Any Red Hat or PGDG RPM will
 come with this code already enabled in the build, so there is no need for
 anyone to have a GUC to play around with the behavior.

Well, clearly, somebody hasn't got it right, or there wouldn't be this
complaint.  I'll grant you that somebody may be EnterpriseDB's own
packaging in this instance, but I wouldn't like to bet that no one
else has ever got this wrong nor ever will.  Peter asked upthread why
this wasn't a GUC with the comment that Why is this feature not a
run-time configuration variable or at least a configure option?  It's
awfully well hidden now.  I doubt a lot of people are using this even
though they might wish to.  I think that's quite right, and note that
Peter is in no way affiliated with EnterpriseDB and made that comment
(rather presciently) long before Gurjeet's recent report.

 I remain of the opinion that allowing nonprivileged people to decide
 whether that code is active or not is unsafe from a system level.

 On what factual basis?

 Because it would convert the intended behavior (postmaster and only
 postmaster is exempt from OOM kill) into a situation where possibly
 all of the database processes are exempt from OOM kill, at the whim
 of somebody who should not have the privilege to decide that.

Gurjeet already refused that argument.  Apparently, the child
processes can only increase their chance of being OOM-killed, not
decrease it, so you have this exactly backwards: right now, an
individual system administrator can exempt all of the database
processes from OOM kill, but cannot exempt the postmaster alone.

 In my view, the root-owned startup script grants OOM exemption to
 the postmaster because it *knows* that the postmaster's children
 will drop the exemption.  If that trust can be violated because some
 clueless DBA decided to frob a GUC setting, I'd be a lot more hesitant
 about giving the postmaster the exemption in the first place.

How about using an environment variable?  It seems to me that would
address the concern about DBAs without shell access.  They might be
able to frob GUCs, but presumably not the postmaster's starting
environment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 This patch gives the user a control to let the backend's likelyhood of
 being killed be different/higher than that of the postmaster.

 If you think your users might want to give the postmaster OOM-exemption,
 why don't you just activate the existing code when you build?  Resetting
 the OOM setting to zero is safe whether or not the startup script did
 anything to the postmaster's setting.

The whole scenario here is that the user *doesn't want to recompile*.
You seem to be trying to relitigate an argument that Gurjeet already
discussed in his original post and I already refuted once after that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-10 11:14:43 -0400, Tom Lane wrote:
 Because it would convert the intended behavior (postmaster and only
 postmaster is exempt from OOM kill) into a situation where possibly
 all of the database processes are exempt from OOM kill, at the whim
 of somebody who should not have the privilege to decide that.

 Meh^3. By that argument we need to forbid superusers to create any form
 of untrusted functions. Forbid anything that does malloc(), system(),
 fork(), whatever from a user's influence.

That's utter and complete nonsense.  We're discussing an operation that is
root-privileged (ie, lowering a process's OOM score), not random stuff
that unprivileged processes can do.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Andres Freund
On 2014-06-10 11:35:23 -0400, Robert Haas wrote:
  Because it would convert the intended behavior (postmaster and only
  postmaster is exempt from OOM kill) into a situation where possibly
  all of the database processes are exempt from OOM kill, at the whim
  of somebody who should not have the privilege to decide that.
 
 Gurjeet already refused that argument.

Only that he was wrong:
~# echo -500  /proc/$BASHPID/oom_score_adj
~# su andres
~$ echo -1000  /proc/$BASHPID/oom_score_adj
bash: echo: write error: Permission denied # so I can't decrease the likelihood
~$ echo -400  /proc/$BASHPID/oom_score_adj # but I can increase it
~$ echo -500  /proc/$BASHPID/oom_score_adj # but also *reset* it

Since we have to do the adjustment after the fork - otherwise the
postmaster would be vulnerable for a time - normal backends can reset
their score.

 Apparently, the child
 processes can only increase their chance of being OOM-killed, not
 decrease it, so you have this exactly backwards: right now, an
 individual system administrator can exempt all of the database
 processes from OOM kill, but cannot exempt the postmaster alone.

Well, if you compile postgres with the #define it'll reset the
likelihood after the fork? That's the reset? Or do you mean without
compiling postgres with the option?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Gurjeet Singh
On Tue, Jun 10, 2014 at 11:14 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In my view, the root-owned startup script grants OOM exemption to
 the postmaster because it *knows* that the postmaster's children
 will drop the exemption.  If that trust can be violated because some
 clueless DBA decided to frob a GUC setting, I'd be a lot more hesitant
 about giving the postmaster the exemption in the first place.

Even if the clueless DBA tries to set the oom_score_adj below that of
Postmaster, Linux kernel prevents that from being done. I demonstrate
that in the below screen share. I used GUC as well as plain command
line to try and set a child's badness (oom_score_adj) to be lower than
that of Postmaster's, and Linux disallows doing that, unless I use
root's powers.

So the argument that this GUC is a security concern, can be ignored.
Root user (one with control of start script) still controls the lowest
badness setting of all Postgres processes. If done at fork_process
time, the child process simply inherits parent's badness setting.

Best regards,

# Set postmaster badness: -1000
$ sudo echo -1000  /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-1000

# Set backend badness the same as postmaster: -1000
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj =
-1000/' $PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = -1000

$ pgstop; pgstart
pg_ctl: server is running (PID: 65355)
/home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres -D
/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
waiting for server to shut down done
server stopped
pg_ctl: no server running
waiting for server to start done
server started
Database cluster state:   in production

# Backends and the postmaster have the same badness; lower than default 0
$ for p in $(echo $(pgserverPIDList)| cut -d , --output-delimiter=' '
-f 1-); do echo $p $(cat /proc/$p/oom_score_adj) $(cat
/proc/$p/cmdline); done
537 -1000 
/home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
539 -1000 postgres: checkpointer process
540 -1000 postgres: writer process
541 -1000 postgres: wal writer process
542 -1000 postgres: autovacuum launcher process
543 -1000 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100  /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Set backend badness the lower than postmaster: -500 vs. -100
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = -500...
linux_oom_score_adj = -500

$ pgstop; pgstart
...

# Backends are capped to parent's badness.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
716 -100 
/home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
718 -100 postgres: checkpointer process
719 -100 postgres: writer process
720 -100 postgres: wal writer process
721 -100 postgres: autovacuum launcher process
722 -100 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100  /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Set backend badness the higher than postmaster: +500 vs. -100
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = 500/'
$PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = 500

$ pgstop; pgstart
...

# Backends have higher badness, hence higher likelyhood to be killed
than the Postmaster.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1602 -100 
/home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1604 500 postgres: checkpointer process
1605 500 postgres: writer process
1606 500 postgres: wal writer process
1607 500 postgres: autovacuum launcher process
1608 500 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100  /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Reset backend badness to default: 0
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = 0/'
$PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = 0

$ pgstop; pgstart
...

# Backends have higher badness, hence higher likelyhood to be killed
than the Postmaster.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 
/home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 0 postgres: checkpointer process
1838 0 postgres: writer process
1839 0 postgres: wal writer process
1840 0 postgres: autovacuum launcher process
1841 0 postgres: stats collector process

# Lower checkpointer's badness, but keep it higher than postmaster's
$ echo -1  /proc/1837/oom_score_adj

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 
/home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -1 postgres: checkpointer process
...

# Lower checkpointer's badness, but keep it same as postmaster's
$ echo -100  /proc/1837/oom_score_adj

$ for p in $(echo $(pgserverPIDList)| cut -d , ...

Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Well, clearly, somebody hasn't got it right, or there wouldn't be this
 complaint.  I'll grant you that somebody may be EnterpriseDB's own
 packaging in this instance, but I wouldn't like to bet that no one
 else has ever got this wrong nor ever will.  Peter asked upthread why
 this wasn't a GUC with the comment that Why is this feature not a
 run-time configuration variable or at least a configure option?  It's
 awfully well hidden now.  I doubt a lot of people are using this even
 though they might wish to.  I think that's quite right, and note that
 Peter is in no way affiliated with EnterpriseDB and made that comment
 (rather presciently) long before Gurjeet's recent report.

I'd be okay with a configure option, if you think that would make this
issue more visible to packagers.  It's delegating the responsibility to
the DBA level that I'm unhappy about.

 Because it would convert the intended behavior (postmaster and only
 postmaster is exempt from OOM kill) into a situation where possibly
 all of the database processes are exempt from OOM kill, at the whim
 of somebody who should not have the privilege to decide that.

 Gurjeet already refused that argument.

He can refuse it all he likes, but that doesn't make his opinion correct.

 How about using an environment variable?  It seems to me that would
 address the concern about DBAs without shell access.  They might be
 able to frob GUCs, but presumably not the postmaster's starting
 environment.

Hmmm ... yeah, that might work.  It would solve the problem I'm worried
about, which is making sure that the startup script has control of what
happens.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Andres Freund
On 2014-06-10 11:40:25 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-10 11:14:43 -0400, Tom Lane wrote:
  Because it would convert the intended behavior (postmaster and only
  postmaster is exempt from OOM kill) into a situation where possibly
  all of the database processes are exempt from OOM kill, at the whim
  of somebody who should not have the privilege to decide that.
 
  Meh^3. By that argument we need to forbid superusers to create any form
  of untrusted functions. Forbid anything that does malloc(), system(),
  fork(), whatever from a user's influence.
 
 That's utter and complete nonsense.  We're discussing an operation that is
 root-privileged (ie, lowering a process's OOM score), not random stuff
 that unprivileged processes can do.

Oh, comeon. Tom. You a) conveniently left of the part where I said that
the user can execute code from the postmaster. b) fork() can be used to
escape the oom killer. c) Lots of much worse things can be done to the
system with arbitrary system calls than adjusting oom_score_adj.

The postmaster can currently change oom_score_adj. Users can run code as
a postmaster. Simple as that.

Besides, as demonstrated in
http://www.postgresql.org/message-id/20140610154536.gn8...@alap3.anarazel.de
postmaster children can already reset their score.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 11:46 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'd be okay with a configure option, if you think that would make this
 issue more visible to packagers.  It's delegating the responsibility to
 the DBA level that I'm unhappy about.

[...]

 How about using an environment variable?  It seems to me that would
 address the concern about DBAs without shell access.  They might be
 able to frob GUCs, but presumably not the postmaster's starting
 environment.

 Hmmm ... yeah, that might work.  It would solve the problem I'm worried
 about, which is making sure that the startup script has control of what
 happens.

A configure option wouldn't address the issue from my perspective; you
still have to recompile if you don't like what your packager did, and
your packager might still be stupid.  However, an environment variable
seems like it would be just fine.  It might even be more convenient
for packagers that having to compile the desired value into the
binary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
and...@anarazel.de (Andres Freund) writes:
 On 2014-06-10 11:20:28 -0400, Tom Lane wrote:
 Maybe I'm mistaken, but I thought once the fork_process code has reset our
 process's setting to zero it's not possible to lower it again (without
 privileges we'd not have).

 No, doesn't look that way. It's possible to reset it to the value set at
 process start. So unless we introduce double forks for every backend
 start it can be reset by ordinary processes.

That's kind of annoying --- I wonder why they went to the trouble of doing
that?  But anyway, it's probably not worth the cost of double-forking to
prevent it.  I still say though that this is not a reason to make it as
easy as change-a-GUC to break the intended behavior.

Robert's idea of having the start script set an environment variable to
control the OOM adjustment reset seems like it would satisfy my concern.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Gurjeet Singh
On Tue, Jun 10, 2014 at 11:24 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-10 11:14:43 -0400, Tom Lane wrote:
 Sure, but what's that have to do with this?  Any Red Hat or PGDG RPM will
 come with this code already enabled in the build, so there is no need for
 anyone to have a GUC to play around with the behavior.

 That's imo a fair point. Unless I misunderstand things Gurjeet picked
 the topic up again because he wants to increase the priority of the
 children. Is that correct Gurjeet?

Yes. A DBA would like to prevent the postmaster from being killed by
OOM killer, but let the child processes be still subject to OOM
killer's whim.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 10, 2014 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 If you think your users might want to give the postmaster OOM-exemption,
 why don't you just activate the existing code when you build?  Resetting
 the OOM setting to zero is safe whether or not the startup script did
 anything to the postmaster's setting.

 The whole scenario here is that the user *doesn't want to recompile*.

Yeah, I understood that.  The question is why EDB isn't satisfied to just
add -DLINUX_OOM_ADJ=0 to their build options, but instead would like to
dump a bunch of uncertainty on other packagers who might not like the
implications of a GUC.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Andres Freund
On 2014-06-10 11:52:17 -0400, Tom Lane wrote:
 and...@anarazel.de (Andres Freund) writes:
  On 2014-06-10 11:20:28 -0400, Tom Lane wrote:
  Maybe I'm mistaken, but I thought once the fork_process code has reset our
  process's setting to zero it's not possible to lower it again (without
  privileges we'd not have).
 
  No, doesn't look that way. It's possible to reset it to the value set at
  process start. So unless we introduce double forks for every backend
  start it can be reset by ordinary processes.
 
 That's kind of annoying --- I wonder why they went to the trouble of doing
 that?

My guess is that they want to allow a process to only temporarily reduce
the likelihood of getting killed while it's doing something
important.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Gurjeet Singh
On Tue, Jun 10, 2014 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 and...@anarazel.de (Andres Freund) writes:
 On 2014-06-10 11:20:28 -0400, Tom Lane wrote:
 Maybe I'm mistaken, but I thought once the fork_process code has reset our
 process's setting to zero it's not possible to lower it again (without
 privileges we'd not have).

 No, doesn't look that way. It's possible to reset it to the value set at
 process start. So unless we introduce double forks for every backend
 start it can be reset by ordinary processes.

 That's kind of annoying --- I wonder why they went to the trouble of doing
 that?  But anyway, it's probably not worth the cost of double-forking to
 prevent it.  I still say though that this is not a reason to make it as
 easy as change-a-GUC to break the intended behavior.

 Robert's idea of having the start script set an environment variable to
 control the OOM adjustment reset seems like it would satisfy my concern.

I'm fine with this solution. Should this be a constant 0, or be
configurable based on env. variable's value?

-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 On Tue, Jun 10, 2014 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert's idea of having the start script set an environment variable to
 control the OOM adjustment reset seems like it would satisfy my concern.

 I'm fine with this solution. Should this be a constant 0, or be
 configurable based on env. variable's value?

If we're going to do this, I would say that we should also take the
opportunity to get out from under the question of which kernel API
we're dealing with.  So perhaps a design like this:

1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's
the name of a file to write something into after forking.  The typical
value would be /proc/self/oom_score_adj or /proc/self/oom_adj.
If not set, nothing happens.

2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's
the string we write, otherwise we write 0.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 11:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 10, 2014 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 If you think your users might want to give the postmaster OOM-exemption,
 why don't you just activate the existing code when you build?  Resetting
 the OOM setting to zero is safe whether or not the startup script did
 anything to the postmaster's setting.

 The whole scenario here is that the user *doesn't want to recompile*.

 Yeah, I understood that.  The question is why EDB isn't satisfied to just
 add -DLINUX_OOM_ADJ=0 to their build options, but instead would like to
 dump a bunch of uncertainty on other packagers who might not like the
 implications of a GUC.

Well, I think we should do that, too, and I'm going to propose it to
Dave  team.  If we're not already doing the right thing here (and I
don't know off-hand whether we are), then that is definitely our issue
to fix and there's no reason to change PostgreSQL on that account.

But I think the point is that this is a pretty subtle and
well-concealed thing which a PostgreSQL packager might fail to do
correctly in every instance - or where an individual user might want
to install a different policy than whatever the packager's default is.
Making that easy to do without recompiling seems to me to be a general
good.  Magnus once commented to me that every GUC which is
PGC_POSTMASTER is a bug - perhaps not an easy bug to fix, but a bug
all the same (his words), because requiring people to restart the
postmaster to change settings is often problematic.  I think this is
even more true of recompiling.  IMO, it should be an explicit goal of
the project to make reconfiguration of an already-installed system -
perhaps even already in production - as easy as possible. We will
doubtless fail to achieve perfection but that doesn't mean we
shouldn't try.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread David G Johnston
In short:

I can accept the idea that picking reasonable specific values is impossible
so let's just ensure that children are always killed before the parent
(basically the default behavior). If you then say that any system that is
capable of implementing that rule should have it set then leaving it
unexposed makes sense.  That line of thought does not require the abstract
cost of a GUC to factor into the equation.

However, in considering system configuration and concurrently running
processes


Tom Lane-2 wrote
 Extra GUCs do not have zero cost, especially not ones that are as
 complicated-to-explain as this would be.

The explain factor does little for me since if given a reasonable default
the GUC can be ignored until a problem manifests.  At that point not having
a GUC makes resolving the problem require someone to stop using packaging
and instead compile their own source.  This results in a poor user
experience.

So, if someone where to provide a complete patch that introduced a GUC to
enable/disable as well as set priorities - to include documentation and
reasonable postgresql.conf defaults - what specific ongoing GUC costs would
prevent applying said patch?

You mention a security hazard but that is not a cost of GUCs generally but
this one specifically; and one that seems to have been deemed no riskier
than other DBA capabilities.

Help me understand the cost side of the equation since the benefit side
seems clear-cut enough to me.  The OOM problem is real and making PostgreSQL
fit within the overall memory management architecture of a given server is
something that is the responsibility of the system admin in conjunction with
the DBA - not us or the packager.  I'll buy that because this is a linux
issue that the implementation could be packager selectable but given the
dynamics of Linux centralizing a reasonable default implementation into core
makes sense.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/proc-self-oom-adj-is-deprecated-in-newer-Linux-kernels-tp4810971p5806697.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Joshua D. Drake


On 06/10/2014 07:02 AM, Tom Lane wrote:

Gurjeet Singh gurj...@singh.im writes:

Startup scripts are not solely in the domain of packagers. End users
can also be expected to develop/edit their own startup scripts.



Providing it as GUC would have given end users both the peices, but
with a compile-time option they have only one half of the solution;
except if they go compile their own binaries, which forces them into
being packagers.


I don't find that this argument holds any water at all.  Anyone who's
developing their own start script can be expected to manage recompiling
Postgres.


This is certainly not true in any fashion. Production systems require a 
degree of flexibility and configuration that does not require the 
maintaining or compiling of source code.


Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread David G Johnston
Gurjeet Singh-4 wrote
 Even if the clueless DBA tries to set the oom_score_adj below that of
 Postmaster, Linux kernel prevents that from being done. I demonstrate
 that in the below screen share. I used GUC as well as plain command
 line to try and set a child's badness (oom_score_adj) to be lower than
 that of Postmaster's, and Linux disallows doing that, unless I use
 root's powers.
 
 So the argument that this GUC is a security concern, can be ignored.
 Root user (one with control of start script) still controls the lowest
 badness setting of all Postgres processes. If done at fork_process
 time, the child process simply inherits parent's badness setting.

The counter point here is that the postmaster can be set to no kill and
the = condition allows for children to achieve the same while it is our
explicit intent that the children be strictly  parent.

To that end, should the adjustment value be provided as an offset to the
postmasters instead of an absolute value - and disallow = zero offset
values in the process?

I get and generally agree with the environment variable proposal and it's
stated goal to restrict whom can makes changes. But how much less cost does
an environment variable have than a GUC if one GUC argument is still its
maintenance overhead?

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/proc-self-oom-adj-is-deprecated-in-newer-Linux-kernels-tp4810971p5806700.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Magnus Hagander
On Jun 10, 2014 7:05 PM, Joshua D. Drake j...@commandprompt.com wrote:


 On 06/10/2014 07:02 AM, Tom Lane wrote:

 Gurjeet Singh gurj...@singh.im writes:

 Startup scripts are not solely in the domain of packagers. End users
 can also be expected to develop/edit their own startup scripts.


 Providing it as GUC would have given end users both the peices, but
 with a compile-time option they have only one half of the solution;
 except if they go compile their own binaries, which forces them into
 being packagers.


 I don't find that this argument holds any water at all.  Anyone who's
 developing their own start script can be expected to manage recompiling
 Postgres.


 This is certainly not true in any fashion. Production systems require a
degree of flexibility and configuration that does not require the
maintaining or compiling of source code.


As JD surely understands, it pains me a lot but I have to agree with him
here.

There are also a non-trivial number of organizations that just don't allow
compilers on production boxes, adding another hurdle for those. We may
think that's ridiculous, and it may be, but it's reality.

That said, I agree that a guc is probably a bad idea. I like the idea of an
environment variable if we need it to be changeable. Or perhaps what we
need is just better documentation of what's there already. Perhaps the
documentation should have a section specifically aimed at packagers?

/Magnus


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-10 Thread Gurjeet Singh
On Tue, Jun 10, 2014 at 1:13 PM, David G Johnston
david.g.johns...@gmail.com wrote:
 Gurjeet Singh-4 wrote
 So the argument that this GUC is a security concern, can be ignored.
 Root user (one with control of start script) still controls the lowest
 badness setting of all Postgres processes. If done at fork_process
 time, the child process simply inherits parent's badness setting.

 The counter point here is that the postmaster can be set to no kill and

Only the root user can do that, since he/she has control over the
start script. All that the DBA could do with a GUC is to set backends'
badness worse than postmaster's, but bot better.

 the = condition allows for children to achieve the same while it is our
 explicit intent that the children be strictly  parent.

I don't think anyone argued for that behaviour.

 To that end, should the adjustment value be provided as an offset to the
 postmasters instead of an absolute value - and disallow = zero offset
 values in the process?

Seems unnecessary, given current knowledge.

 I get and generally agree with the environment variable proposal and it's
 stated goal to restrict whom can makes changes. But how much less cost does
 an environment variable have than a GUC if one GUC argument is still its
 maintenance overhead?

Having it as a GUC would have meant that two entities are required to
get the configuration right: one who controls start scripts, and the
other who controls GUC settings.

With the environment variable approach, a root user alone can control
the behaviour like so in start script:

echo -200  /proc/self/oom_score_adj
export PG_OOM_ADJUST_FILE=oom_score_adj
export PG_OOM_ADJUST_VALUE=-100

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2012-06-12 Thread Tom Lane
I was reminded today that we still haven't done anything about this:

Tom Lane t...@sss.pgh.pa.us writes:
 While testing 9.1 RPMs on Fedora 15 (2.6.40 kernel), I notice
 messages like these in the kernel log:
 Sep 11 13:38:56 rhl kernel: [  415.308092] postgres (18040): 
 /proc/18040/oom_adj is deprecated, please use /proc/18040/oom_score_adj 
 instead.

At this point there are no shipping Fedora versions that don't emit this
gripe, and F15 is even about to go EOL.

The previous discussion thread at
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00794.php
went off into the weeds of what was in my opinion over-design.
I still think it's sufficient to do what I suggested initially:

 ... The simplest, least risky change that I can think of is to
 copy-and-paste the relevant #ifdef code block in fork_process.c.
 If we do that, then it would be up to the packager whether to #define
 LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
 he wants.

and would like to squeeze that into 9.2 so that we're only a year late
and not two years late in responding to this issue :-(.

Objections?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2012-06-12 Thread Robert Haas
On Tue, Jun 12, 2012 at 12:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I was reminded today that we still haven't done anything about this:

 Tom Lane t...@sss.pgh.pa.us writes:
 While testing 9.1 RPMs on Fedora 15 (2.6.40 kernel), I notice
 messages like these in the kernel log:
 Sep 11 13:38:56 rhl kernel: [  415.308092] postgres (18040): 
 /proc/18040/oom_adj is deprecated, please use /proc/18040/oom_score_adj 
 instead.

 At this point there are no shipping Fedora versions that don't emit this
 gripe, and F15 is even about to go EOL.

 The previous discussion thread at
 http://archives.postgresql.org/pgsql-hackers/2011-09/msg00794.php
 went off into the weeds of what was in my opinion over-design.
 I still think it's sufficient to do what I suggested initially:

 ... The simplest, least risky change that I can think of is to
 copy-and-paste the relevant #ifdef code block in fork_process.c.
 If we do that, then it would be up to the packager whether to #define
 LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
 he wants.

 and would like to squeeze that into 9.2 so that we're only a year late
 and not two years late in responding to this issue :-(.

 Objections?

I think my position, and the position of the people who responded to
the original thread, was that it seems like you're architecting a
kludge when it wouldn't be that hard to architect a nicer solution.
That having been said, I don't think it's such a large kludge that we
should all have massive dry heaves at the thought of it living in our
repository.  And then too, this isn't the time to be architecting new
9.2 feature anyway.  So I say go for it.  If it makes your life
easier, back-patch it.  Code that requires a #define to enable it
won't break anything for anyone else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2012-06-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 12, 2012 at 12:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I still think it's sufficient to do what I suggested initially:
 ... The simplest, least risky change that I can think of is to
 copy-and-paste the relevant #ifdef code block in fork_process.c.

 I think my position, and the position of the people who responded to
 the original thread, was that it seems like you're architecting a
 kludge when it wouldn't be that hard to architect a nicer solution.

I'd be the first to agree it's a kluge.  But the end result of the
previous discussion was that it wasn't so obvious how to architect
a nicer solution.  Nor is there all that much room for people to use a
nicer solution, given that we need help from not just fork_process.c
but the root-privileged startup script (or lately in Fedora it's a
systemd unit script doing the privilege-increasing end of this).

In the short run, if we don't have consensus on this, I'll probably
just carry a Fedora patch like so:

-intfd = open(/proc/self/oom_adj, O_WRONLY, 0);
+intfd = open(/proc/self/oom_score_adj, O_WRONLY, 0);

But it seems a tad silly to be carrying such a patch for a block of
code that is only of interest to Linux packagers anyway, and nearly
all such packagers are facing this same issue either now or in the
very near future.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2012-06-12 Thread Robert Haas
On Tue, Jun 12, 2012 at 1:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 12, 2012 at 12:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I still think it's sufficient to do what I suggested initially:
 ... The simplest, least risky change that I can think of is to
 copy-and-paste the relevant #ifdef code block in fork_process.c.

 I think my position, and the position of the people who responded to
 the original thread, was that it seems like you're architecting a
 kludge when it wouldn't be that hard to architect a nicer solution.

 I'd be the first to agree it's a kluge.  But the end result of the
 previous discussion was that it wasn't so obvious how to architect
 a nicer solution.  Nor is there all that much room for people to use a
 nicer solution, given that we need help from not just fork_process.c
 but the root-privileged startup script (or lately in Fedora it's a
 systemd unit script doing the privilege-increasing end of this).

Well, I don't think a GUC or two would be such a bad way of doing it, but...

 In the short run, if we don't have consensus on this, I'll probably
 just carry a Fedora patch like so:

 -            int        fd = open(/proc/self/oom_adj, O_WRONLY, 0);
 +            int        fd = open(/proc/self/oom_score_adj, O_WRONLY, 0);

 But it seems a tad silly to be carrying such a patch for a block of
 code that is only of interest to Linux packagers anyway, and nearly
 all such packagers are facing this same issue either now or in the
 very near future.

...I also agree with this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-19 Thread Peter Eisentraut
On sön, 2011-09-18 at 12:21 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On fre, 2011-09-16 at 10:57 -0400, Tom Lane wrote:
  So it looks like it behooves us to cater for oom_score_adj in the
  future.  The simplest, least risky change that I can think of is to
  copy-and-paste the relevant #ifdef code block in fork_process.c.
  If we do that, then it would be up to the packager whether to #define
  LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
  he wants.
 
  There are lots of reasons why that won't work: backports, forward ports,
  derivatives, custom kernels, distribution upgrades, virtual hosting.
 
 [ shrug... ]  These are all hypothetical reasons.  A packager who
 foresaw needing that could just turn on both write attempts, or for that
 matter patch the code to do whatever else he saw fit.  In practice,
 we've not had requests for anything significantly smarter than what is
 there.
 
 But having said that, it wouldn't be very hard to arrange things so that
 if you did have both symbols defined, the code would only attempt to
 write oom_adj if it had failed to write oom_score_adj; which is about as
 close as you're likely to get to a kernel version test for this.

Why is this feature not a run-time configuration variable or at least a
configure option?  It's awfully well hidden now.  I doubt a lot of
people are using this even though they might wish to.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-19 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On sön, 2011-09-18 at 12:21 -0400, Tom Lane wrote:
 But having said that, it wouldn't be very hard to arrange things so that
 if you did have both symbols defined, the code would only attempt to
 write oom_adj if it had failed to write oom_score_adj; which is about as
 close as you're likely to get to a kernel version test for this.

 Why is this feature not a run-time configuration variable or at least a
 configure option?  It's awfully well hidden now.  I doubt a lot of
 people are using this even though they might wish to.

See the thread in which the feature was designed originally:
http://archives.postgresql.org/pgsql-hackers/2010-01/msg00170.php

The key point is that to get useful behavior, you need cooperation
between both a root-privileged startup script and the PG executable.
That tends to throw the problem into the domain of packagers, more
than end users, and definitely puts a big crimp in the idea that
run-time configuration of just half of the behavior could be helpful.
So far, no Linux packagers have complained that the design is inadequate
(a position that I also hold when wearing my red fedora) so I do not
feel a need to complicate it further.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-18 Thread Peter Eisentraut
On fre, 2011-09-16 at 10:57 -0400, Tom Lane wrote:
 So it looks like it behooves us to cater for oom_score_adj in the
 future.  The simplest, least risky change that I can think of is to
 copy-and-paste the relevant #ifdef code block in fork_process.c.
 If we do that, then it would be up to the packager whether to #define
 LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
 he wants.

There are lots of reasons why that won't work: backports, forward ports,
derivatives, custom kernels, distribution upgrades, virtual hosting.

ISTM that we could at least query the currently used kernel version.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-18 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On fre, 2011-09-16 at 10:57 -0400, Tom Lane wrote:
 So it looks like it behooves us to cater for oom_score_adj in the
 future.  The simplest, least risky change that I can think of is to
 copy-and-paste the relevant #ifdef code block in fork_process.c.
 If we do that, then it would be up to the packager whether to #define
 LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
 he wants.

 There are lots of reasons why that won't work: backports, forward ports,
 derivatives, custom kernels, distribution upgrades, virtual hosting.

[ shrug... ]  These are all hypothetical reasons.  A packager who
foresaw needing that could just turn on both write attempts, or for that
matter patch the code to do whatever else he saw fit.  In practice,
we've not had requests for anything significantly smarter than what is
there.

But having said that, it wouldn't be very hard to arrange things so that
if you did have both symbols defined, the code would only attempt to
write oom_adj if it had failed to write oom_score_adj; which is about as
close as you're likely to get to a kernel version test for this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-16 Thread Greg Stark
On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Does anyone want
 to argue for doing something more complicated, and if so what exactly?


Well there's no harm trying to write to oom_score_adj and if that
fails with EEXISTS trying to write to oom_adj.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-16 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Does anyone want
 to argue for doing something more complicated, and if so what exactly?

 Well there's no harm trying to write to oom_score_adj and if that
 fails with EEXISTS trying to write to oom_adj.

Well, (1) what value are you going to write (they need to be different
for the two files), and (2) the main point of the exercise, at present,
is to avoid kernel log messages.  I'm not sure that trying to create
random files under /proc isn't going to draw bleats in the kernel log.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-16 Thread Alvaro Herrera

Excerpts from Tom Lane's message of vie sep 16 13:37:46 -0300 2011:
 Greg Stark st...@mit.edu writes:
  On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Does anyone want
  to argue for doing something more complicated, and if so what exactly?
 
  Well there's no harm trying to write to oom_score_adj and if that
  fails with EEXISTS trying to write to oom_adj.
 
 Well, (1) what value are you going to write (they need to be different
 for the two files), and (2) the main point of the exercise, at present,
 is to avoid kernel log messages.  I'm not sure that trying to create
 random files under /proc isn't going to draw bleats in the kernel log.

I guess the question is what semantics the new code has.  In the old
badness() world, child processes inherited the oom_adj value of its
parent.  What the code in fork_process was used for was resetting the
value back to 0 (meaning kernel is allowed to kill this process on
OOM), so that you could set the oom_adj in the start script for
postmaster (to a value meaning never kill this one), and the backends
would see their values reset to zero.

The new oom_score_adj has a scale of -1000 to +1000, with zero being
neutral and -1000 being never kill.  So what we want to do here in
most cases, it seems, is set the value to zero whether it's oom_adj or
oom_score_adj -- assuming the new code is still causing children
processes to inherit the adj value from the parent.

Now the problem is that we have defined the LINUX_OOM_ADJ symbol as
meaning the value we're going to write.  Maybe this wasn't the best
choice.  I mean, it's very flexible, but it doesn't seem to offer any
benefit over a plain boolean choice.

Is your proposal to create a new LINUX_OOM_SCORE_ADJ cpp symbol with the
same semantics?

The most thorough documentation on this option seems to be this commit:
https://github.com/mirrors/linux-2.6/commit/a63d83f427fbce97a6cea0db2e64b0eb8435cd10#include/linux/oom.h

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-16 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Now the problem is that we have defined the LINUX_OOM_ADJ symbol as
 meaning the value we're going to write.  Maybe this wasn't the best
 choice.  I mean, it's very flexible, but it doesn't seem to offer any
 benefit over a plain boolean choice.

 Is your proposal to create a new LINUX_OOM_SCORE_ADJ cpp symbol with the
 same semantics?

Yes, that's what I was thinking.  We could avoid that if we were going
to hard-wire a decision that zero is the thing to write, but I see no
reason to place such a restriction on users.  Who's to say they might
not want backends to adopt some other value?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2011-09-16 Thread Alex Hunsaker
On Fri, Sep 16, 2011 at 10:37, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@mit.edu writes:
 On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Does anyone want
 to argue for doing something more complicated, and if so what exactly?

 Well there's no harm trying to write to oom_score_adj and if that
 fails with EEXISTS trying to write to oom_adj.

Yeah, I don't really like the idea of a compile time option that is
kernel version dependent... But i don't feel too strongly about it
either (all my kernels are new enough that they support
oom_score_adj).

I'll also note that on my system we are in the good company of ssd and chromium:
sshd (978): /proc/978/oom_adj is deprecated, please use
/proc/978/oom_score_adj instead.
chromium-sandbo (1377): /proc/1375/oom_adj is deprecated, please use
/proc/1375/oom_score_adj instead.

[ It quite annoying that soon after we decided to stick
-DLINUX_OOM_ADJ in they changed it.  Whatever happened to a stable
userspace API :-( ]

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers