Re: [HACKERS] proposal: hide application_name from other users

2014-01-29 Thread Josh Berkus
On 01/29/2014 10:19 AM, Simon Riggs wrote:
 No specific reason that I can recall but replication is heavily
 protected by layers of security.
 
 pg_stat_replication is a join with pg_stat_activity, so some of the
 info is open, some closed. It seems possible to relax that.

I'm all for the idea of "restrict, then open up."  That is, it made
sense to start with data restricted, and then unrestrict it as we learn
it's OK.  Going the other way generally isn't possible, as this patch
demonstrates.

 Presumably the current patch is returned with feedback? Or can we fix
 these problems by inventing a new user aspect called MONITOR (similar
 to REPLICATION)? We can grant application_name and replication details
 to that.

Yeah, except I don't see doing the MONITOR thing for 9.4.  We'd need a
spec for it first.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-29 Thread Josh Berkus
On 01/29/2014 11:54 AM, Robert Haas wrote:
 I agree.  I find it somewhat unlikely that pg_stat_statements is
 fragile enough that these few extra counters are going to make much of
 a difference.  At the same time, I find min and max a dubious value
 proposition.  It seems highly likely to me that stddev will pay its
 way, but I'm much less certain about the others.

What I really want is percentiles, but I'm pretty sure we already shot
that down. ;-)

I could use min/max -- think of performance test runs.  However, I agree
that they're less valuable than stddev.
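If those counters do land, the payoff query is simple.  A hypothetical
sketch -- min_time, max_time and stddev_time are my guesses at what the
patch would add; only calls and total_time exist today:

SELECT query, calls, total_time,
       min_time, max_time, stddev_time   -- proposed columns, names guessed
FROM pg_stat_statements
ORDER BY stddev_time DESC
LIMIT 10;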

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb and nested hstore

2014-01-29 Thread Josh Berkus
On 01/29/2014 12:46 PM, Merlin Moncure wrote:
 I think the opening paragraphs contrasting json/jsonb need
 refinement.  json is going to be slightly faster than jsonb for input
 *and* output.  For example, in one application I store fairly large
 json objects containing pre-compiled static polygon data that is
 simply flipped up to google maps.  This case will likely be pessimal
 for jsonb.  For the next paragraph, I'd like to expand a bit on
 'specialized needs' and boil it down to specific use cases.
 Basically, json will likely be more compact in most cases and slightly
 faster for input/output; jsonb would be preferred in any context
 where processing, or searching, or extensive server-side parsing is
 employed.
 
 If you agree, I'd be happy to do that...

Please take a stab at it, I'll be happy to revise it.

I was working on doing a two-column table comparison chart; I still
think that's the best way to go.
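For the chart, the core semantic contrast fits in a couple of lines.  A
minimal sketch, assuming jsonb normalizes its input as the patch
describes, while json stores the input text verbatim:

SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- {"b": 1, "a": 2, "a": 3}   (verbatim: whitespace, key order, duplicates kept)

SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- {"a": 3, "b": 1}           (normalized: keys reordered, last duplicate wins)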

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb and nested hstore

2014-01-29 Thread Josh Berkus
On 01/29/2014 02:37 PM, Merlin Moncure wrote:
 create table bar(a int, b int[]);
 postgres=# select jsonb_populate_record(null::bar, '{"a": 1, "b":
 [1,2]}'::jsonb, false);
 ERROR:  cannot populate with a nested object unless use_json_as_text is true

Hmmm.  What about just making any "impossibly complex" objects type JSON?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb and nested hstore

2014-01-28 Thread Josh Berkus
On 01/28/2014 09:58 AM, Merlin Moncure wrote:
 yeah. note: I think the json documentation needs *major* overhaul. too
 much is going on inside the function listings where there really
 should be a big breakout discussing the big picture of json/jsonb
 with examples of various use cases.  I want to give it a shot but
 unfortunately cannot commit to doing that by the end of the 'fest.

FWIW, I've promised Andrew that I'll overhaul this by the end of beta,
given that we have all of beta for doc refinements.

In addition to this, the JSON vs JSONB datatype page really needs
expansion and clarification.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] new json funcs

2014-01-28 Thread Josh Berkus
On 01/27/2014 01:06 PM, Alvaro Herrera wrote:
 Andrew Dunstan escribió:
 
 I'm not sure I understand the need. This is the difference between
 the _text variants and their parents. Why would you call
 json_object_field when you want the dequoted text?
 
 Because I first need to know its type.  Sometimes it's an array, or an
 object, or a boolean, and for those I won't call the _text version
 afterwards but just use the original.

It would make more sense to extract them as JSON, check the type, and
convert.
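A minimal sketch of that pattern, assuming the json_typeof() function
from the patchset under discussion here:

SELECT CASE json_typeof(j -> 'val')
         WHEN 'string' THEN j ->> 'val'   -- dequote scalar strings
         ELSE (j -> 'val')::text          -- keep arrays/objects/booleans as JSON
       END AS val
FROM (SELECT '{"val": [1, 2]}'::json) AS s(j);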

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [pgsql-advocacy] GSoC 2014 - mentors, students and admins

2014-01-28 Thread Josh Berkus
On 01/28/2014 09:46 AM, Atri Sharma wrote:
 I would like to bring up the addition to MADLIB algorithms again this year.
 
 Also, some work on the foreign table constraints could be helpful.

We can only take MADLIB this year if we have confirmed mentors who are
MADLIB committers before the end of the application period (Feb 15).  We
can't have a repeat of last year.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb and nested hstore

2014-01-28 Thread Josh Berkus
On 01/28/2014 10:29 AM, Merlin Moncure wrote:
 In addition to this, the JSON vs JSONB datatype page really needs
 expansion and clarification.
 
 right: exactly.  I'd be happy to help (such as I can) ...I wanted to
 see jsonb make it in on this 'fest (doc issues notwithstanding);
 it hasn't been formally reviewed yet AFAICT.  So my thinking here is
 to get the docs to minimum acceptable standards in the short term and
 focus on the structural code issues for the 'fest (if jsonb slips then
 it's moot, obviously).

Well, having reviewed the docs before Andrew sent them in, I felt they
already *were* minimum acceptable.  Certainly they're as complete as
the original JSON docs were.

Or is this just about whitespace and line breaks?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb and nested hstore

2014-01-28 Thread Josh Berkus
On 01/28/2014 10:56 AM, Alvaro Herrera wrote:
 Josh Berkus escribió:
 
 Or is this just about whitespace and line breaks?
 
 If the docs are going to be overhauled, please ignore my whitespace
 comments.

I'm sure you'll find plenty to criticize in my version.  ;-)


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proposal: hide application_name from other users

2014-01-28 Thread Josh Berkus
On 01/28/2014 07:27 AM, Greg Stark wrote:
 Why is application_name useful for users who aren't the DBA and aren't
 the user in question?  The sql_query would probably be more useful than
 application_name, but we hide that...

I have non-privileged monitoring scripts that count connections by
application name all the time, as a way of checking for runaway
applications, and would be quite put out by restricting this to superusers.
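For example, something as simple as this, run by an unprivileged
monitoring role, is enough to spot a connection-leaking app:

SELECT application_name, count(*)
FROM pg_stat_activity
GROUP BY application_name
ORDER BY count(*) DESC;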

Really the only way we're going to solve this is to make column
permissions on special system views fully configurable.

For example, I would really like to GRANT an unpriv user access to the
WAL columns in pg_stat_replication so that I can monitor replication
delay without granting superuser permissions.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proposal: hide application_name from other users

2014-01-28 Thread Josh Berkus
On 01/28/2014 12:10 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 For example, I would really like to GRANT an unpriv user access to the
 WAL columns in pg_stat_replication so that I can monitor replication
 delay without granting superuser permissions.
 
 Just out of curiosity, why is that superuser-only at all?  AFAICS the
 hidden columns are just some LSNs ... what is the security argument
 for hiding them in the first place?

Beats me, I can't find any discussion on it at all.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-27 Thread Josh Berkus
On 01/26/2014 07:56 PM, Rajeev rastogi wrote:
 I shall rework to improve this patch. Below is a summary of all the
 discussions, which will be used as input for improving the patch:
 
 1. Method of degrading the synchronous mode:
   a. Expose the configuration variable via a new SQL-callable function.
   b. Using ALTER SYSTEM SET.
   c. Auto-degrade using some sort of configuration parameter, as done in
 the current patch.
   d. Or maybe a combination of the above, which DBAs can use depending on
 their use-cases.
 
   We can discuss further to decide on one of these approaches.
 
 2. Synchronous mode should be upgraded/restored after at least one
 synchronous standby comes up and has caught up with the master.
 
 3. Better monitoring/administration interfaces, which can be even better
 if made into a generic trap system.
 
   I shall propose a better approach for this.
 
 4. Send committing clients a WARNING if they have committed a synchronous
 transaction and we are in degraded mode.
 
 5. Please add more if I am missing something.

I think we actually need two degrade modes:

A. "degrade once": if the sync standby connection is ever lost, degrade
and do not resync.

B. "reconnect": if the sync standby catches up again, return it to sync
status.

The reason you'd want "degrade once" is to avoid the flaky-network
issue where you're constantly degrading and then reattaching the sync
standby, resulting in horrible performance.

If we did offer "degrade once", though, we'd need some easy way to
determine that the master was in a state of permanent degrade, and a
command to make it resync.

Discuss?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why do we let autovacuum give up?

2014-01-24 Thread Josh Berkus
On 01/23/2014 07:22 PM, Alvaro Herrera wrote:
 If you ask me, I'd like autovac to know when not to run (or rather
  wait a bit, not forever), perhaps by checking load factors or some
  other tell-tale of an already-saturated I/O system.
 We had a proposed design to tell autovac when not to run (or rather,
 when to switch settings very high so that in practice it'd never run).
 At some point somebody said "but we can just change autovacuum=off in
 postgresql.conf via crontab when the high load period starts, and turn
 it back on afterwards" --- and that was the end of it.

Anything which depends on a timing-based feedback loop is going to be
hopeless.  "Autovac shouldn't run if load is high" sounds like a
simple statement, until you actually try to implement it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] new json funcs

2014-01-24 Thread Josh Berkus
On 01/24/2014 12:59 PM, Andrew Dunstan wrote:
 
 On 01/24/2014 03:40 PM, Laurence Rowe wrote:
 For consistency with the existing json functions (json_each,
 json_each_text, etc.) it might be better to add separate
 json_to_record_text and json_to_recordset_text functions in place of
 the nested_as_text parameter to json_to_record and json_to_recordset.


 
 It wouldn't be consistent with json_populate_record() and
 json_populate_recordset(), the two closest relatives, however.
 
 And yes, I appreciate that we have not been 100% consistent. Community
 design can be a bit messy that way.

FWIW, I prefer the parameter to having differently named functions.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-24 Thread Josh Berkus
On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
 ISTM the consensus is that we need better monitoring/administration
 interfaces so that people can script the behavior they want in external
 tools. Also, a new synchronous apply replication mode would be handy,
 but that'd be a whole different patch. We don't have a patch on the
 table that we could consider committing any time soon, so I'm going to
 mark this as rejected in the commitfest app.

I don't feel that "we'll never do auto-degrade" is determinative;
several hackers were for auto-degrade, and they have a good use-case
argument.  However, we do have consensus that we need more scaffolding
than this patch supplies in order to make auto-degrade *safe*.

I encourage the submitter to resubmit an improved version of this patch
(one with more monitorability) for 9.5 CF1.  That'll give us a whole
dev cycle to argue about it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Josh Berkus
On 01/23/2014 12:34 PM, Joshua D. Drake wrote:
 
 Hello,
 
 I have run into yet again another situation where there was an
 assumption that autovacuum was keeping up and it wasn't. It was caused
 by autovacuum quitting because another process requested a lock.
 
 In turn we received a ton of bloat on pg_attribute which caused all
 kinds of other issues (as can be expected).
 
 The more I run into it, the more it seems like autovacuum should behave
 like vacuum, in that it gets precedence when it is running. First come,
 first serve as they say.
 
 Thoughts?

If we let autovacuum block user activity, a lot more people would turn
it off.

Now, if you were to argue that we should have some way to monitor the
tables which autovac can never touch because of conflicts, I would agree
with you.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Josh Berkus
On 01/23/2014 02:17 PM, Magnus Hagander wrote:
 FWIW, I have a patch around somewhere that I never cleaned up properly for
 submissions that simply added a counter to pg_stat_user_tables indicating
 how many times vacuum had aborted on that specific table. If that's enough
 info  (it was for my case) to cover this case, I can try to dig it out
 again and clean it up...

It would be 100% more information than we currently have.  How much more
difficult would it be to count completed autovacuums as well?  It's
really the ratio of the two which matters ...
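To illustrate, a hypothetical query against such counters --
autovacuum_count is real today, autovacuum_abort_count is my stand-in
name for what your patch would add:

SELECT relname, autovacuum_count, autovacuum_abort_count,
       round(autovacuum_abort_count::numeric
             / nullif(autovacuum_count + autovacuum_abort_count, 0), 2)
         AS abort_ratio
FROM pg_stat_user_tables
ORDER BY abort_ratio DESC NULLS LAST;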

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Josh Berkus
On 01/23/2014 02:55 PM, Josh Berkus wrote:
 On 01/23/2014 02:17 PM, Magnus Hagander wrote:
 FWIW, I have a patch around somewhere that I never cleaned up properly for
 submissions that simply added a counter to pg_stat_user_tables indicating
 how many times vacuum had aborted on that specific table. If that's enough
 info  (it was for my case) to cover this case, I can try to dig it out
 again and clean it up...
 
 It would be 100% more information than we currently have.  How much more
 difficult would it be to count completed autovacuums as well?  It's
 really the ratio of the two which matters ...

Actually, now that I think about it, the ratio of the two doesn't matter
as much as whether the most recent autovacuum aborted or not.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proposal: hide application_name from other users

2014-01-22 Thread Josh Berkus
On 01/21/2014 05:21 PM, Andres Freund wrote:
 I think the only realistic thing is a monitoring capability, like we
 have for replication. GRANT/REVOKE doesn't even come close to being able
 to generically grant permissions of even the moderate
 complexity pg_stat_get_activity() has.

That would work for me, personally.   I don't know how it would work for
anyone else.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proposal: hide application_name from other users

2014-01-22 Thread Josh Berkus
On 01/21/2014 05:22 PM, Mark Kirkwood wrote:
 If said malicious attacker can log into postgres and issue its own
 queries, and connect to other databases, then you are in serious trouble
 already.
 
 I also wonder that if such an attacker knows the application name, that
 would suggest that they have access to the application server and are
 able to read its config files...which would probably also contain the
 host and db name too (and possibly the password in some unfortunate cases)!

Common case:

Multitenant shared hosting on a public cloud.

1. attacker writes a tool which exploits VulnerableApplication and takes
it over.

2. they exploit SiteX, running that web app.

3. using SiteX's database credentials, they check pg_stat_activity and
see what other hosts are running VulnerableApplication.

4. They then infect the other hosts running VulnerableApplication.

Alternately:

4. They use VulnerableApplication's predictable password-generating flaw
to log into the other databases, or to try the default password which
ships with the app.

However, thinking about the case above, there are a number of caveats
and workarounds which make the above not that interesting of an exploit
case:

A. It's easy to find VulnerableApplication simply by scanning the web.
Easier, in fact, than the above, if you have an IP block to start with,
and you would.

B. Most applications don't actually set application-specific application
names anyway (psycopg2, libpq).

C. It would be trivially easy for a DBA concerned about security to
obfuscate application names in a way which would not be easy for an
attacker to analyze.

D. Don't use default passwords.  Also, the attacker merely needs to try
each database in turn anyway.

Given the above, I think this specific patch falls into the broad class
of things we would like to have in a multi-tenant toolkit (and is far
from the most useful of those), which would include:

* hiding application_name, user_name, and database_names from users of
other databases

* a "local superuser" who can create per-database users and extensions
from an approved list

* ability to block users from changing some resource limits (work_mem,
for example).

* per-database logging (could be done with log hooks, just needs a contrib).

It seems to me that it's not terribly useful to fix one item on the
above list without having at least a *plan* to address the others. This
really needs to be part of a comprehensive system, not piecework, or
we'll end up with a bunch of little security options which don't work
together.

Probably Heroku has some more specific exploit case to be concerned
about here; if so, might I suggest taking it up with the -security list?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] pg_isready and older versions

2014-01-21 Thread Josh Berkus
All,

pg_isready works against older versions of PostgreSQL.  Does anyone know
if there's a limit to that?  v3 protocol change?  Something else?

Backwards compatibility ought to be in its docs, but to fix that I need
to know what version it's compatible *to*.
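For anyone who wants to test this against an old server, usage is just
(the host name here is a placeholder):

pg_isready -h oldbox.example.com -p 5432

Exit status is 0 (accepting connections), 1 (rejecting, e.g. during
startup), 2 (no response), or 3 (no attempt made, e.g. bad parameters).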

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Josh Berkus
On 01/21/2014 04:12 AM, Stephen Frost wrote:
 It also means that monitoring tools must run as superuser to see
  information they require, which to me is a total showstopper.
 We've already got *far* too much of that going on for my taste.  I'd
 love to see a comprehensive solution to this problem which allows
 monitoring systems to run w/o superuser privileges.
 

Agreed.  I just ran into the issue that getting data out of
pg_stat_replication on replication lag requires superuser privs, which
is annoying; without that, I can run Handyrep entirely as an
unprivileged user, but that one fact requires the handyrep database user
to be a superuser.

It would be really nice to be able to GRANT/REVOKE on some of these
special system views ...
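In the meantime, the standard workaround is a SECURITY DEFINER wrapper
owned by a superuser.  A minimal sketch -- the function and role names
here are made up:

CREATE FUNCTION replication_status()
RETURNS TABLE (application_name text, state text,
               sent_location text, replay_location text)
LANGUAGE sql SECURITY DEFINER AS $$
    SELECT application_name, state,
           sent_location::text, replay_location::text
    FROM pg_stat_replication;
$$;

REVOKE ALL ON FUNCTION replication_status() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION replication_status() TO handyrep;

That gets the monitoring role off superuser for this one case, but it's
clearly a workaround for what GRANT on the view ought to do.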

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Josh Berkus
Mel,

So we have a few interested parties.  What do we need to do to set up
the Collab session?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Josh Berkus
On 01/16/2014 10:46 AM, Tom Lane wrote:
 I'm fine if the proposal is that postgresql.conf include "include_dir
 conf.d" by default (where that's read as relative to postgresql.conf's own
 directory).  Even better if it's not terribly difficult for a packager to
 change that, because I think some will want to.  We could possibly reduce
 the need for packagers to change it if we made it be
 "include_dir postgresql.d", because "conf.d" is a damn generic name for
 something that might be in the same /etc directory as configs for other
 packages.

FWIW, this is what I was proposing.  We have an include_dir
postgresql.conf.d currently in the stock postgresql.conf, but it's
disabled (commented out) by default.  I'd just like it enabled by
default, and to pass a suggestion to the packagers that they pick an
appropriate directory and enable it by default.
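Concretely, the change would amount to shipping the sample file's
include line uncommented -- something like this, with the directory name
left to the packagers:

include_dir = 'conf.d'    # include files ending in '.conf' from this directory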

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Josh Berkus
On 01/16/2014 07:32 AM, Christian Kruse wrote:
 Hi Alvaro,
 
 On 16/01/14 10:21, Alvaro Herrera wrote:
 1. it is to be read automatically by the server without need for an
include_dir conf.d option in the main postgresql.conf.
 
 +1
 
 4. there is no point in disabling it, and thus we offer no mechanism
to do that.
 
 Not only is there „no point“ in disabling it, it makes this feature
 nearly useless. One can't rely on it if the distro may disable
 it. There are so many out there; it will never be a reliable feature
 if it can be disabled.

It would make *my* life vastly easier if we could mandate things like
the presence and relative directory of a conf.d.  However, if Apache
can't do it, we certainly can't.  Ultimately, we cannot impose things on
distributions which they are unwilling to support; Debian, for one, will
happily fork PostgreSQL rather than accept directory assignments which
don't meet their standards.

Also, enough people install PostgreSQL from source or using custom
packages to make for a high degree of variation anyway.

That's why I was just advocating changing the *defaults*, not mandating
anything.  Actual directory locations and usage should be configurable
by distros, packagers and users.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] dblink performance regression

2014-01-16 Thread Josh Berkus
On 12/07/2013 05:50 PM, Joe Conway wrote:
 On 12/07/2013 05:41 PM, Fabrízio de Royes Mello wrote:
 
 On Sat, Dec 7, 2013 at 11:20 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 IMHO it is more elegant to create a procedure to encapsulate the code
 to avoid redundancy.
 Yep, perhaps something like PQsetClientEncodingIfDifferent or
 similar would make sense.
 
 Well, I think at this first moment we can just create a procedure
 inside the dblink contrib and not touch libpq.
 
 Maybe a libpq function could be done for 9.4, but not for back branches.
 
 I don't think it makes sense to create a new function in dblink either
 -- we're only talking about two lines of added redundancy, which is
 fewer lines of code than a new function would add. But if we create
 PQsetClientEncodingIfDifferent() (or whatever) we can remove those
 extra lines in 9.4 ;-)

Hey, since we're about to do 9.3.3: was this patch ever committed?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-15 Thread Josh Berkus
Hackers,

ALTER SYSTEM SET has been committed and recovery.conf GUCs are being
reviewed.  I'm going to make a last case for conf.d in relation to these
two patches before 9.4 goes out the door.

In 9.3, we added support for a config directory (conf.d), but have it
disabled by default.  For tool authors, this makes conf.d useless since
you never know, on any given installation, whether it will be
present/enabled or not.  While we don't want to prevent users from
disabling it, conf.d only becomes useful if it's present by default.
There's a simple reason why: if you want to write a tool which manages
postgresql.conf, you don't want the user to have to manually edit
postgresql.conf (and create a directory) in order to enable the tool.

I'm particularly thinking about this in relation to the merger of
recovery.conf and postgresql.conf.  There are several tools already
(RepMgr, OmniPITR, HandyRep, pgPool, etc.) which manage recovery.conf
separately from postgresql.conf.  These tools will want to continue
managing recovery.conf as a separate file, even if it's included in
postgresql.conf now.

If conf.d exists by default, and is enabled in postgresql.conf by
default, this is easy: the tool just drops a recovery.conf file into
conf.d.  Changing file locations and variable names is a fairly simple
exercise in backwards compatibility.

If conf.d does NOT exist by default, things become complicated, and
backwards compatibility for replication management tools becomes much
harder.

Yes, I'm also arguing that postgresql.auto.conf should go into conf.d.
I said I'd bring that up again after ALTER SYSTEM SET was committed, and
here it is.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-15 Thread Josh Berkus
/ vs /var), then postgresql.auto.conf should stay in the
 data directory.

Ah, I'd forgotten about that line of argument.  Where is auto.conf now?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] extension_control_path

2014-01-14 Thread Josh Berkus

 But you can have a single $SHAREDIR per set of executables, right?
 
 Please read the following email to know what they asked for and how they
 do operate OpenShift:
 
   
 http://www.postgresql.org/message-id/341087492.2585530.1376776393038.javamail.r...@redhat.com

FWIW, I'm talking with Amazon later this week and checking how they're
handling their tenant-loadable extensions.  I'd like to come up with one
solution here which covers all cloud providers.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Josh Berkus
Mel,

 I'm the chair for Linux Storage, Filesystem and Memory Management Summit 2014
 (LSF/MM). A CFP was sent out last month (https://lwn.net/Articles/575681/)
 that you may have seen already.
 
 In recent years we have had at least one topic that was shared between
 all three tracks and was led by a person outside of the usual kernel
 development community. I am checking if the PostgreSQL community
 would be willing to volunteer someone to lead a topic discussing
 PostgreSQL performance with recent kernels or to highlight regressions
 or future developments you feel are potentially a problem. With luck
 someone suitable is already travelling to the collaboration summit
 (http://events.linuxfoundation.org/events/collaboration-summit) and it
 would not be too inconvenient to drop in for LSF/MM as well.

We can definitely get someone there.  I'll certainly be there; I'm
hoping to get someone who has closer involvement with our kernel
interaction as well.

 There are two reasons why I'm suggesting this. First, PostgreSQL was the
 basis of a test used to highlight a scheduler problem around kernel 3.6
 but otherwise in my experience it is rare that PostgreSQL is part of a
 bug report.  I am skeptical this particular bug report was a typical use
 case for PostgreSQL (pgbench, read-only, many threads, very small in-memory
 database). I wonder why reports related to PostgreSQL are not more common.
 One assumption would be that PostgreSQL is perfectly happy with the current
 kernel behaviour in which case our discussion here is done.

To be frank, it's because most people are still running on 2.6.19, and
as a result are completely unaware of recent developments.  Second,
because there's no obvious place to complain to ... lkml doesn't welcome
bug reports, and where else do you go?

 Does the PostgreSQL community have a problem with recent kernels,
 particularly with respect to the storage, filesystem or memory management
 layers? If yes, do you have some data that can highlight this and can you
 volunteer someone to represent your interests to the kernel community? 

Yes, and yes.

 Are
 current developments in the IO layer counter to the PostgreSQL requirements?
 If so, what developments, why are they a problem, do you have a suggested
 alternative or some idea of what we should watch out for? 

Mostly the issue is changes to the IO scheduler which improve one use
case at the expense of others, or set defaults which emphasize desktop
hardware over server hardware.

What also came up with the recent change to LRU is that the Postgres
community apparently has more experience than the Linux community with
buffer-clearing algorithms, and we ought to share that.

 The track topic
 would be up to you but just as a hint, we'd need something a lot more
 concrete than "you should test more".

How about "don't add major IO behavior changes with no
backwards-compatibility switches"?  ;-)

Seriously, one thing I'd like to get out of Collab would be a reasonable
regimen for testing database performance on Linux kernels.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Josh Berkus
On 01/13/2014 10:51 AM, Kevin Grittner wrote:
 How about don't add major IO behavior changes with no
 backwards-compatibility switches?  ;-)
 
 I notice, Josh, that you didn't mention the problems many people
 have run into with Transparent Huge Page defrag and with NUMA
 access.  Is that because there *are* configuration options that
 allow people to get decent performance once the issue is diagnosed?
 It seems like maybe there could be a better way to give a heads-up
 on hazards in a new kernel to the database world, but I don't know
 quite what that would be.  For all I know, it is already available
 if you know where to look.

Well, it was the lack of sysctl options which takes the 2Q change from
annoyance to potential disaster.  We can't ever get away from the
possibility that the Postgres use-case might be the minority use-case,
and we might have to use non-default options.  It's when those options
aren't present *at all* that we're stuck.

However, I agree that a worthwhile thing to talk about is having some
better channel to notify the Postgres (and other DB) communities about
major changes to IO and Memory management.

Wanna go to Collab?

 Seriously, one thing I'd like to get out of Collab would be a
 reasonable regimen for testing database performance on Linux
 kernels.
 
 ... or perhaps you figure this is what would bring such issues to
 the community's attention before people are bitten in production
 environments?

That, too.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [Lsf-pc] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Josh Berkus
Everyone,

I am looking for one or more hackers to go to Collab with me to discuss
this.  If you think that might be you, please let me know and I'll look
for funding for your travel.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Josh Berkus
On 01/13/2014 02:26 PM, Mel Gorman wrote:
 Really?
 
 zone_reclaim_mode is often a complete disaster unless the workload is
 partitioned to fit within NUMA nodes. On older kernels enabling it would
 sometimes cause massive stalls. I'm actually very surprised to hear it
 fixes anything and would be interested in hearing more about what sort
 of circumstances would convince you to enable that thing.

So the problem with the default setting is that it pretty much isolates
all FS cache for PostgreSQL to whichever socket the postmaster is
running on, and makes the other FS cache unavailable.  This means that,
for example, if you have two memory banks, then only one of them is
available for PostgreSQL filesystem caching ... essentially cutting your
available cache in half.

And however slow moving cached pages between memory banks is, it's an
order of magnitude faster than moving them from disk.  But this isn't
how the NUMA stuff is configured; it seems to assume that it's less
expensive to get pages from disk than to move them between banks, so
whatever you've got cached on the other bank, it flushes it to disk as
fast as possible.  I understand the goal was to make memory usage local
to the processors stuff was running on, but that includes an implicit
assumption that no individual process will ever want more than one
memory bank worth of cache.

So disabling all of the NUMA optimizations is the way to go for any
workload I personally deal with.
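For reference, the workaround amounts to two settings rather than any
code.  A sketch (paths are machine-specific):

# in /etc/sysctl.conf: stop evicting cache just to stay node-local
vm.zone_reclaim_mode = 0

# start the postmaster with its memory interleaved across all banks
numactl --interleave=all pg_ctl start -D /path/to/data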

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] plpgsql.consistent_into

2014-01-13 Thread Josh Berkus
On 01/13/2014 03:41 PM, Florian Pflug wrote:
 It therefore isn't an oversight that SELECT ... INTO allows multiple
 result rows but INSERT/UPDATE/DELETE forbids them; it's been done that way
 on purpose and for a reason. We shouldn't be second-guessing ourselves by
 changing that later - not, at least, unless we have a *very* good reason
 for it. Which, AFAICS, we don't.
 
 (And yeah, personally I'd prefer if we'd complain about multiple rows. But
 it's IMHO just too late for that)

I *really* don't want to go through all my old code to find places where
I used SELECT ... INTO just to pop off the first row, and ignored the
rest.  I doubt anyone else does, either.
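For what it's worth, plpgsql already lets you opt into strictness per
statement, which is the granularity I'd want here.  A minimal sketch of
both behaviors:

DO $$
DECLARE r record;
BEGIN
    -- current behavior: quietly keeps the first row, discards the rest
    SELECT x INTO r FROM generate_series(1, 3) AS t(x);

    -- opt-in strictness: errors unless exactly one row is returned
    SELECT x INTO STRICT r FROM generate_series(1, 3) AS t(x);
END $$;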

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] plpgsql.consistent_into

2014-01-13 Thread Josh Berkus
On 01/13/2014 04:20 PM, Jim Nasby wrote:
 On 1/13/14, 5:57 PM, Josh Berkus wrote:
 I *really* don't want to go through all my old code to find places where
 I used SELECT ... INTO just to pop off the first row, and ignored the
 rest.  I doubt anyone else does, either.
 
 Do you regularly have use cases where you actually want just one RANDOM
 row? I suspect the far more likely scenario is that people write code
 assuming they'll get only one row and they'll end up with extremely hard
 to trace bugs if that assumption is ever wrong.

Regularly?  No.  But I've seen it, especially as part of a "does this
query return any rows?" test.  That's not the best way to test that, but
that doesn't stop a lot of people from doing it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] plpgsql.consistent_into

2014-01-13 Thread Josh Berkus
On 01/13/2014 05:10 PM, Jim Nasby wrote:
 On 1/13/14, 7:06 PM, Josh Berkus wrote:
 Regularly?  No.  But I've seen it, especially as part of a does this
 query return any rows? test.  That's not the best way to test that, but
 that doesn't stop a lot of people doing it.
 
 Right, and I certainly don't want to force anyone to rewrite all their
 code. But I'd certainly like a safer default so people don't mistakenly
 go the "multiple rows is OK" route without doing so very intentionally.

The problem is that if you change the default, you're creating an
unexpected barrier to upgrading.  I just don't think that it's worth
doing so in order to meet some standard of code neatness, especially in
plpgsql, the unwanted bastard child of SQL and Ada.

For people who want to enable this in order to prevent stupid query bugs
from creeping into their plpgsql, that's great, let's have an easy
option to turn on.  But it's hard enough to get people to upgrade as it
is.  If we're going to add an upgrade landmine, it better be for
something really important.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Where do we stand on 9.3 bugs?

2014-01-13 Thread Josh Berkus
On 01/13/2014 07:16 AM, Tom Lane wrote:
 What remaining issues are there blocking a 9.3.3 release?  I know that
 there were unresolved multixact issues when we put out 9.3.2 --- are
 those all dealt with now?  What else do people see as release-blockers?

I see this bug as a release-blocker.  It's a fairly crippling
replication bug for a bunch of users (5 reports to date):

http://archives.postgresql.org/message-id/20140106134815.GE28320%40alap2.anarazel.de

Despite actually dating back to 9.0, something in the 9.3.2/9.2.6
updates is causing users to hit it now.  And for those who do hit it,
replication is impossible and there's no workaround.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Josh Berkus
On 01/13/2014 05:30 PM, Dave Chinner wrote:
 On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote:
 No matter what default NUMA allocation policy we set, there will be
 an application for which that behaviour is wrong. As such, we've had
 tools for setting application specific NUMA policies for quite a few
 years now. e.g:

Yeah, that's why I personally regard the NUMA stuff as just an
information problem; there's an easy configuration variable, and you
can't please everyone (and our project would hardly be one to point
fingers about sub-optimal default configurations).  I was responding to
a question of "what's wrong with the default setting?"

Personally, I have my doubts that the NUMA memory isolation, as
currently implemented, accomplishes what it wants to do.  But that's a
completely different discussion.

The real issue there was that our users had never heard of this change
until suddenly half their RAM became unavailable.  So the solution is
for our project to somehow have these kinds of changes flagged for our
attention so that we can update our docs.  The kernel change list is
quite voluminous, and it's very easy to miss changes of significance in
it.  The easiest way to do this is going to be getting involved in
kernel-database performance testing.

Of course, we are annoyed that we finally removed the main reason to
modify sysctl.conf (SHMMAX), and here we are needing to advise users
about sysctl again.  :-(

I'm much more bothered by the introduction of 2Q logic, since that comes
without a configuration variable to modify its behavior.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Josh Berkus
On 01/13/2014 05:48 PM, Andres Freund wrote:
 On 2014-01-13 10:56:00 -0800, Josh Berkus wrote:
 Well, it was the lack of sysctl options which takes the 2Q change from
 annoyance to potential disaster.  We can't ever get away from the
 possibility that the Postgres use-case might be the minority use-case,
 and we might have to use non-default options.  It's when those options
 aren't present *at all* that we're stuck.
 
 Unless I am missing something the kernel's going further *away* from a
 simple 2q system, not the contrary.

Well, they implemented a 2Q system and deliberately offered no sysctl
variables to modify its behavior.  Now they're talking about
implementing an ARC system -- which we know the perils of -- again,
without any configuration variables in case the default behavior doesn't
work for everyone.  And it's highly unlikely that an ARC which is
designed for desktop and/or file server users -- let alone mobile users
-- is going to be optimal for PostgreSQL out of the box.

In fact, I'd assert that it's flat-out impossible to engineer an ARC
which will work for multiple different use cases without user-level
configuration.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Josh Berkus
All,

I'm leading this off with a review of the features offered by the actual
patch submitted.  My general discussion of the issues of Sync Degrade,
which justifies my specific suggestions below, follows that.  Rajeev,
please be aware that other hackers may have different opinions than me
on what needs to change about the patch, so you should collect all
opinions before changing code.

===

 Add a new parameter :

 synchronous_standalone_master = on | off

I think this is a TERRIBLE name for any such parameter.  What does
"synchronous standalone" even mean?  A better name for the parameter
would be auto_degrade_sync_replication, or synchronous_timeout_action
= error | degrade, or something similar.  It would be even better for
this to be a mode of synchronous_commit, except that synchronous_commit
is heavily overloaded already.

Some issues raised by this log script:

LOG:  standby tx0113 is now the synchronous standby with priority 1
LOG:  waiting for standby synchronization
  -- standby wal receiver on the standby is killed (SIGKILL)
LOG:  unexpected EOF on standby connection
LOG:  not waiting for standby synchronization
  -- restart standby so that it connects again
LOG:  standby tx0113 is now the synchronous standby with priority 1
LOG:  waiting for standby synchronization
  -- standby wal receiver is first stopped (SIGSTOP) to make sure

The "not waiting for standby synchronization" message should be marked
as something stronger than LOG.  I'd like ERROR.

Second, you have the master resuming sync rep when the standby
reconnects.  How do you determine when it's safe to do that?  You're
making the assumption that you have a failing sync standby instead of
one which simply can't keep up with the master, or a flakey network
connection (see discussion below).

 a.   Master_to_standalone_cmd: To be executed before master
switches to standalone mode.

 b.  Master_to_sync_cmd: To be executed before master switches from
sync mode to standalone mode.

I'm not at all clear what the difference between these two commands is.
When would one be executed, and when would the other be executed?  Also,
renaming ...

Missing features:

a) we should at least send committing clients a WARNING if they have
committed a synchronous transaction and we are in degraded mode.

I know others have dismissed this idea as too talky, but from my
perspective, the agreement with the client for each synchronous commit
is being violated, so each and every synchronous commit should report
failure to sync.  Also, having a warning on every commit would make it
easier to troubleshoot degraded mode for users who have ignored the
other warnings we give them.

b) pg_stat_replication needs to show degraded mode in some way, or we
need pg_sync_rep_degraded(), or (ideally) both.

I'm also wondering if we need a more sophisticated approach to
wal_sender_timeout to go with all this.

===

On 01/11/2014 08:33 PM, Bruce Momjian wrote:
 On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote:
 In other words, if we're going to have auto-degrade, the most
 intelligent place for it is in
 RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
 place.  Anything we do *inside* Postgres is going to have a really,
 really hard time determining when to degrade.
 
 Well, one goal I was considering is that if a commit is hung waiting for
 slave sync confirmation, and the timeout happens, then the mode is
 changed to degraded and the commit returns success.  I am not sure how
 you would do that in an external tool, meaning there is going to be
 period where commits fail, unless you think there is a way that when the
 external tool changes the mode to degrade that all hung commits
 complete.  That would be nice.

Realistically, though, that's pretty unavoidable.  Any technique which
waits a reasonable interval to determine that the replica isn't going to
respond is liable to go beyond the application's timeout threshold
anyway.  There are undoubtedly exceptions to that, but it will be the
case a lot of the time -- how many applications are willing to wait
*minutes* for a COMMIT?

I also don't see any way to allow the hung transactions to commit
without allowing the walsender to make a decision on degrading.  As I've
outlined elsewhere (and below), the walsender just doesn't have enough
information to make a good decision.

On 01/11/2014 08:52 PM, Amit Kapila wrote:
 It is better than async mode in a way such that in async mode it never
 waits for commits to be written to standby, but in this new mode it will
 do so unless it is not possible (all sync standbys go down).
 Can't we use existing wal_sender_timeout, or even if user expects a
 different timeout because for this new mode, he expects master to wait
 more before it start operating like standalone sync master, we can provide
 a new parameter.

One of the reasons that there's so much disagreement about this feature
is that most

Re: [HACKERS] Standalone synchronous master

2014-01-12 Thread Josh Berkus
On 01/12/2014 12:35 PM, Stephen Frost wrote:
 * Josh Berkus (j...@agliodbs.com) wrote:
 You don't want to handle all of those issues the same way as far as sync
 rep is concerned.  For example, if the standby is restarting, you
 probably want to wait instead of degrading.
 
 *What*?!  Certainly not in any kind of OLTP-type system; a system
 restart can easily take minutes.  Clearly, you want to resume once the
 standby is back up, which I feel like the people against an auto-degrade
 mode are missing, but holding up a commit until the standby finishes
 rebooting isn't practical.

Well, then that becomes a reason to want better/more configurability.
In the couple of sync rep sites I admin, I *would* want to wait.

 There's also the issue that this patch, and necessarily any
 walsender-level auto-degrade, has IMHO no safe way to resume sync
 replication.  This means that any user who has a network or storage blip
 once a day (again, think AWS) would be constantly in degraded mode, even
 though both the master and the replica are up and running -- and it will
 come as a complete surprise to them when they lose the master and
 discover that they've lost data.
 
 I don't follow this logic at all- why is there no safe way to resume?
 You wait til the slave is caught up fully and then go back to sync mode.
 If that turns out to be an extended problem then an alarm needs to be
 raised, of course.

So, if you have auto-resume, how do you handle the flaky-network case?
And how would an alarm be raised?

On 01/12/2014 12:51 PM, Kevin Grittner wrote:
 Josh Berkus j...@agliodbs.com wrote:
 I know others have dismissed this idea as too talky, but from my
 perspective, the agreement with the client for each synchronous
 commit is being violated, so each and every synchronous commit
 should report failure to sync.  Also, having a warning on every
 commit would make it easier to troubleshoot degraded mode for users
 who have ignored the other warnings we give them.

 I agree that every synchronous commit on a master which is configured
 for synchronous replication which returns without persisting the work
 of the transaction on both the (local) primary and a synchronous
 replica should issue a WARNING.  That said, the API for some
 connectors (like JDBC) puts the burden on the application or its
 framework to check for warnings each time and do something reasonable
 if found; I fear that a Venn diagram of those shops which would use
 this new feature and those shops that don't rigorously look for and
 reasonably deal with warnings would have significant overlap.

Oh, no question.  However, having such a WARNING would help with
interactive troubleshooting once a problem has been identified, and
that's my main reason for wanting it.

Imagine the case where you have auto-degrade and a flaky network.  The
user would experience this as a performance problem; that is, some
commits take minutes, on-again, off-again.  They wouldn't necessarily
even LOOK at the sync rep settings.  So next step is to try walking
through a sample transaction on the command line, and then the
DBA/consultant gets WARNING messages, which gives an idea where the real
problem lies.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-11 Thread Josh Berkus
On 01/10/2014 06:27 PM, Bruce Momjian wrote:
 How would that work?  Would it be a tool in contrib?  There already is a
 timeout, so if a tool checked more frequently than the timeout, it
 should work.  The durable notification of the admin would happen in the
 tool, right?

Well, you know what tool *I'm* planning to use.

Thing is, when we talk about auto-degrade, we need to determine things
like "Is the replica down or is this just a network blip?" and take
action according to the user's desired configuration.  This is not
something, realistically, that we can do on a single request.  Whereas
it would be fairly simple for an external monitoring utility to do:

1. decide replica is offline for the duration (several poll attempts
have failed)

2. Send ALTER SYSTEM SET to the master and change/disable the
synch_replicas.
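Step 2 is now trivial with ALTER SYSTEM.  A minimal sketch, assuming
the tool connects as a superuser:

ALTER SYSTEM SET synchronous_standby_names = '';   -- degrade to async
SELECT pg_reload_conf();                           -- takes effect on reload

... with the reverse (re-listing the standby names) once the tool has
verified that the replica is caught up.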

Such a tool would *also* be capable of detecting when the synchronous
replica was back up and operating, and switch back to sync mode,
something we simply can't do inside Postgres.  And it would be a lot
easier to configure an external tool with monitoring system integration
so that it can alert the DBA to degradation in a way which the DBA was
liable to actually see (which is NOT the Postgres log).

In other words, if we're going to have auto-degrade, the most
intelligent place for it is in
RepMgr/HandyRep/OmniPITR/pgPoolII/whatever.  It's also the *easiest*
place.  Anything we do *inside* Postgres is going to have a really,
really hard time determining when to degrade.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] units in postgresql.conf comments

2014-01-11 Thread Josh Berkus
On 01/11/2014 11:06 AM, Bruce Momjian wrote:
 On Wed, May 29, 2013 at 09:59:10PM -0400, Peter Eisentraut wrote:
 I think these sort of entries don't make much sense:

 #wal_sender_timeout = 60s  # in milliseconds; 0 disables

 I think we should remove units from the comments when it's clear from
 the name or the default value that time units are accepted.
 
 So, is anyone doing this?  Should it be a TODO item?

I don't agree, actually, unless we take the next step and actually clean
all the documentation garbage out of the file and leave it in the main
docs and pg_settings where it belongs.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to do our Triage for 9.4

2014-01-10 Thread Josh Berkus
All,

To make this easier for everyone to participate in, I've created a wiki
page:

https://wiki.postgresql.org/wiki/9.4CF4Triage

Please add the patches you know well to the appropriate list, thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Josh Berkus
On 01/10/2014 02:59 PM, Joshua D. Drake wrote:
 
 On 01/10/2014 02:47 PM, Andres Freund wrote:
 
 Really, the commits themselves are sent to the server at exactly the
 same speed independent of sync/async. The only thing that's delayed is
 the *notificiation* of the client that sent the commit. Not the commit
 itself.
 
 Which is irrelevant to the point that if the standby goes down, we are
 now out of business.
 
 Any continuous replication should not be a SPOF. The current behavior
 guarantees that a two node sync cluster is a SPOF. The proposed behavior
 removes that.

Again, if that's your goal, then use async replication.

I really don't understand the use-case here.

The purpose of sync rep is to know definitively whether or not you
have lost data when disaster strikes.  If knowing for certain isn't
important to you, then use async.

BTW, people are using RAID1 as an analogy to 2-node sync replication.
That's a very bad analogy, because in RAID1 you have a *single*
controller which is capable of determining if the disks are in a failed
state or not, and this is all happening on a single node where things
like network outages aren't a consideration.  It's really not the same
situation at all.

Also, frankly, I absolutely can't count the number of times I've had to
rescue a customer or family member who had RAID1 but wasn't monitoring
syslog, and so one of their disks had been down for months without them
knowing it.  Heck, I've done this myself.

So ... the Filesystem geeks have already been through this.  Filesystem
clustering started out with systems like DRBD, which includes an
auto-degrade option.  However, DRBD with auto-degrade is widely
considered untrustworthy and is a significant portion of why DRBD isn't
trusted today.

From here, clustered filesystems went in two directions: RHCS added
layers of monitoring and management to make auto-degrade a safer option
than it is with DRBD (and still not the default option).  Scalable
clustered filesystems added N(M) quorum commit in order to support more
than 2 nodes.  Either of these courses are reasonable for us to pursue.

What's a bad idea is adding an auto-degrade option without any tools to
manage and monitor it, which is what this patch does by my reading.  If
I'm wrong, then someone can point it out to me.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Josh Berkus
On 01/10/2014 01:49 PM, Andres Freund wrote:
 On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:

 On 01/10/2014 07:47 AM, Bruce Momjian wrote:

 I know there was a desire to remove this TODO item, but I think we have
 brought up enough new issues that we can keep it to see if we can come
 up with a solution.  I have added a link to this discussion on the TODO
 item.

 I think we will need at least four new GUC variables:

 *  timeout control for degraded mode
 *  command to run during switch to degraded mode
 *  command to run during switch from degraded mode
 *  read-only variable to report degraded mode

I would argue that we don't need the first.  We just want a command to
switch synchronous/degraded, and a variable (or function) to report on
degraded mode.  If we have those things, then it becomes completely
possible to have an external monitoring framework which is capable of
answering questions like "is the replica down or just slow?" and of
controlling degrade.

Oh, wait!  We DO have such a command.  It's called ALTER SYSTEM SET!
Recently committed.  So this is really a solvable issue if one is
willing to use an external utility.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to do our Triage for 9.4

2014-01-10 Thread Josh Berkus
On 01/10/2014 01:34 PM, David Rowley wrote:
 On Sat, Jan 11, 2014 at 8:28 AM, Josh Berkus j...@agliodbs.com wrote:
 
 All,

 To make this easier for everyone to participate in, I've created a wiki
 page:

 https://wiki.postgresql.org/wiki/9.4CF4Triage

 Please add the patches you know well to the appropriate list, thanks!


 I know my own patch pretty well and from my own point of view it's very
 close to being about ready to go, but a good review may change that. Should
 we be waiting for 2nd opinions or can patch authors decide for themselves?
 Or were you talking only to commiters?

Well, I'd prefer that someone other than the patch author assess the
patch state; the author is going to be characteristically optimistic.

However, it's a wiki.  If you put it under "Good to Go", someone else who
disagrees can move it.  IMHO, if the patch hasn't had at least one
review yet (in a prior CF), though, I'd put it under "Nearly Ready".


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-09 Thread Josh Berkus
Robert,

 I think the problem here is that we tend to have a limited view of
 the right way to use synch rep. If I have 5 nodes, and I set 1
 synchronous and the other 3 asynchronous, I've set up a known
 successor in the event that the leader fails. In this scenario
 though, if the successor fails, you actually probably want to keep
 accepting writes; since you weren't using synchronous for durability
 but for operational simplicity. I suspect there are probably other
 scenarios where users are willing to trade latency for improved and/or
 directed durability but not at the expense of availability, don't you?

That's a workaround for a completely different limitation though; the
inability to designate a specific async replica as first.  That is, if
there were some way to do so, you would be using that rather than sync
rep.  Extending the capabilities of that workaround is not something I
would gladly do until I had exhausted other options.

The other problem is that *many* users think they can get improved
availability, consistency AND durability on two nodes somehow, and to
heck with the CAP theorem (certain companies are happy to foster this
illusion).  Having a simple, easily-accessible auto-degrade without
treating degrade as a major monitoring event will feed this
self-deception.  I know I already have to explain the difference between
"synchronous" and "simultaneous" to practically every one of my clients
for whom I set up replication.

Realistically, degrade shouldn't be something that happens inside a
single PostgreSQL node, either the master or the replica.  It should be
controlled by some external controller which is capable of deciding on
degrade or not based on a more complex set of circumstances (e.g. "Is
the replica actually down or just slow?").  Certainly this is the case
with Cassandra, VoltDB, Riak, and the other serious multinode databases.

 This isn't to say there isn't a lot of confusion around the issue.
 Designing, implementing, and configuring different guarantees in the
 presence of node failures is a non-trivial problem. Still, I'd prefer
 to see Postgres head in the direction of providing more options in
 this area rather than drawing a firm line at being a CP-oriented
 system.

I'm not categorically opposed to having any form of auto-degrade at all;
what I'm opposed to is a patch which adds auto-degrade **without adding
any additional monitoring or management infrastructure at all**.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] nested hstore patch

2014-01-09 Thread Josh Berkus
On 01/09/2014 06:12 AM, Andrew Dunstan wrote:
 Oleg,
 
 Please merge in the jsonb work and resubmit. See
 https://github.com/feodor/postgres/commits/jsonb_and_hstore I note that
 this repo does not apparently contain any of your latest changes.

I'll go further and say that if the Hstore2 patch doesn't support JSONB
for 9.4, we should postpone it to 9.5.  We really don't want to get into
a situation where we need an Hstore3 because we accepted an Hstore2
which needs to be rev'd for JSON.

Especially since there's no good reason for the JSON changes not to be
merged already.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-01-09 Thread Josh Berkus
On 01/09/2014 12:05 PM, Heikki Linnakangas wrote:

 Actually, why is the partially-filled 00010002 file
 archived in the first place? Looking at the code, it's been like that
 forever, but it seems like a bad idea. If the original server is still
 up and running, and writing more data to that file, what will happen is
 that when the original server later tries to archive it, it will fail
 because the partial version of the file is already in the archive. Or
 worse, the partial version overwrites a previously archived more
 complete version.

Oh!  This explains some transient errors I've seen.

 Wouldn't it be better to not archive the old segment, and instead switch
 to a new segment after writing the end-of-recovery checkpoint, so that
 the segment on the new timeline is archived sooner?

It would be better to zero-fill and switch segments, yes.  We should
NEVER be in a position of archiving two different versions of the same
segment.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Josh Berkus
Hackers,

Per the Developer Meeting, we are scheduled to do a final triage of 9.4
patches the week before CF4 starts, which is *now*.  The goal of this
triage is to divide patches already in queue into 5 groups:

1) Good To Go: patches which are 100% ready for final testing and commit.

2) Needs a Little Work: smaller patches which can be included in 9.4 if
they get a few hours of love from a committer or major hacker.

3) Big Patches: big, important patches which will need a major time
commitment to commit even though they are 90% ready, just due to size.

4) Not Nearly Ready: Patches which need major work and/or spec
discussions before commitment.

5) WIP: any patch which is acknowledged just there for review, or any
brand-new patch which wasn't in CF3 and is non-trivial.

Obviously, any patches in groups 4 and 5 aren't going into 9.4.  The
idea is to handle patches in CF4 in this order:

1. do immediately

2. do after (1) is complete

3. assign 1 senior hacker reviewer to each patch

4. review as time permits after 1-3

5. review as time permits after 1-3

Let the triage begin!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Josh Berkus
On 01/08/2014 11:07 AM, David Fetter wrote:
 On Wed, Jan 08, 2014 at 10:45:37AM -0800, Josh Berkus wrote:
 Hackers,

 Per the Developer Meeting, we are scheduled to do a final triage of 9.4
 patches the week before CF4 starts, which is *now*.  The goal of this
 triage is to divide patches already in queue into 5 groups:
 
 With utmost respect, there are unsent patches which don't fit into the
 categories below.  Are you saying now that the cut-off is today, and
 not actually 1/15, as everyone understands it to be?

It has always been our policy that non-trivial patches which appear for
the first time in CF4 go to the bottom of the priority list.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to do our Triage for 9.4

2014-01-08 Thread Josh Berkus

 What about doing the triage the first two days or so of the CF? Then we
 know which patches have been submitted/updated and we haven't yet
 performed too many reviews of patches which aren't going to make it.

If we triage the patches we already have NOW, then it'll be possible to
triage the new/updated stuff which comes in in the first 48 hours of the
CF.  If we wait until the CF begins, we'll spend at least the first week
of the CF triaging.

That's why we set this schedule at the developer meeting.

And besides, we already know what category *your* patch belongs in ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 12:27 PM, Bruce Momjian wrote:
 I am glad Heikki and Simon agree, but I don't.  ;-)
 
 The way that I understand it is that you might want durability, but
 might not want to sacrifice availability.  Phrased that way, it makes
 sense, and notifying the administrator seems the appropriate action.

I think there's a valid argument to want things the other way, but I
find the argument not persuasive.  In general, people who want
auto-degrade for sync rep either:

a) don't understand what sync rep actually does (lots of folks confuse
"synchronous" with "simultaneous"), or

b) want more infrastructure than we actually have around managing sync
replicas

Now, the folks who want (b) have a legitimate need, and I'll point out
that we always planned to have more features around sync rep, it's just
that we never actually worked on any.  For example, quorum sync was
extensively discussed and originally projected for 9.2, only certain
hackers changed jobs and interests.

If we just did the minimal change, that is, added an auto-degrade GUC
and an alert to the logs each time the master server went into degraded
mode, as Heikki says we'd be loading a big foot-gun for a bunch of
ill-informed DBAs.  People who want that are really much better off with
async rep in the first place.

If we really want auto-degrading sync rep, then we'd (at a minimum) need
a way to determine *from the replica* whether or not it was in degraded
mode when the master died.  What good do messages to the master log do
you if the master no longer exists?

Mind you, being able to determine on the replica whether it was
synchronous or not when it lost communication with the master would be a
great feature to have for sync rep groups as well, and would make them
practical (right now, they're pretty useless).  However, I seriously
doubt that someone is going to code that up in the next 5 days.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] commit fest manager?

2014-01-08 Thread Josh Berkus
On 01/08/2014 02:04 PM, Peter Eisentraut wrote:
 Anyone else?
 
 Or you'll have to deal with me again?
 
 

I vote for Peter.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 01:49 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 If we really want auto-degrading sync rep, then we'd (at a minimum) need
 a way to determine *from the replica* whether or not it was in degraded
 mode when the master died.  What good do messages to the master log do
 you if the master no longer exists?
 
 How would it be possible for a replica to know whether the master had
 committed more transactions while communication was lost, if the master
 dies without ever restoring communication?  It sounds like pie in the
 sky from here ...

Oh, right.  Because the main reason for a sync replica degrading is that
it's down.  In which case it isn't going to record anything.  This would
still be useful for sync rep candidates, though, and I'll document why
below.  But first, lemme demolish the case for auto-degrade.

So here's the case that we can't possibly solve for auto-degrade.
Anyone who wants auto-degrade needs to come up with a solution for this
case as a first requirement:

1. A data center network/power event starts.

2. The sync replica goes down.

3. A short time later, the master goes down.

4. Data center power is restored.

5. The master is fried and is a permanent loss.  The replica is ok, though.

Question: how does the DBA know whether data has been lost or not?

With current sync rep, it's easy: no data was lost, because the master
stopped accepting writes once the replica went down.  If we support
auto-degrade, though, there's no way to know; the replica doesn't have
that information, and anything which was on the master is permanently
lost.  And the point several people have made is: if you can live with
indeterminacy, then you're better off with async rep in the first place.

Now, what we COULD definitely use is a single-command way of degrading
the master when the sync replica is down.  Something like ALTER SYSTEM
DEGRADE SYNC.  Right now you have to push a change to the conf file and
reload, and there's no way to salvage the transaction which triggered
the sync failure.  This would be a nice 9.5 feature.

HOWEVER, we've already kind of set up an indeterminate situation with
allowing sync rep groups and candidate sync rep servers.  Consider this:

1. Master server A is configured with sync replica B and candidate sync
replica C

2. A rolling power/network failure event occurs, which causes B and C to
go down sometime before A, and all of them to go down before the
application does.

3. On restore, only C is restorable; both A and B are a total loss.

Again, we have no way to know whether or not C was in sync replication
when it went down.  If C went down before B, then we've lost data; if B
went down before C, we haven't.  But we can't find out.  *This* is where
it would be useful to have C log whenever it went into (or out of)
synchronous mode.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
Stephen,


 I'm aware, my point was simply that we should state, up-front in
 25.2.7.3 *and* where we document synchronous_standby_names, that it
 requires at least three servers to be involved to be a workable
 solution.

It's a workable solution with 2 servers.  That's a low-availability,
high-integrity solution; the user has chosen to double their risk of
not accepting writes in exchange for never losing a write.  That's a
perfectly valid configuration, and I believe that NTT runs several
applications this way.

In fact, that can already be looked at as a kind of auto-degrade mode:
if there aren't two nodes, then the database goes read-only.

Might I also point out that transactions are synchronous or not
individually?  The sensible configuration is for only the important
writes to be synchronous -- in which case auto-degrade makes even less
sense.
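
That per-transaction control already exists today, e.g. (clickstream_log
being a stand-in for some unimportant table):

BEGIN;
SET LOCAL synchronous_commit = off;   -- this commit won't wait on the standby
INSERT INTO clickstream_log VALUES (now(), 'pageview');  -- hypothetical table
COMMIT;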

I really think that demand for auto-degrade is coming from users who
don't know what sync rep is for in the first place.  The fact that other
vendors are offering auto-degrade as a feature instead of the ginormous
foot-gun it is adds to the confusion, but we can't help that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 03:18 PM, Stephen Frost wrote:
 Do you really feel that a WARNING and increasing the docs to point
 out that three systems are necessary, particularly under the 'high
 availability' documentation and options, is a bad idea?  I fail to see
 how that does anything but clarify the use-case for our users.

I think the warning is dumb, and that the suggested documentation change
is insufficient.  If we're going to clarify things, then we need to have
a full-on several-page doc showing several examples of different sync
rep configurations and explaining their tradeoffs (including the
different sync modes and per-transaction sync).  Anything short of that
is just going to muddy the waters further.

Mind you, someone needs to take a machete to the HA section of the docs
anyway.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Josh Berkus
On 01/08/2014 03:27 PM, Tom Lane wrote:
 Good point, but C can't solve this for you just by logging.  If C was the
 first to go down, it has no way to know whether A and B committed more
 transactions before dying; and it's unlikely to have logged its own crash,
 either.

Sure.  But if we *knew* that C was not in synchronous mode when it went
down, then we'd expect some data loss.  As you point out, though, the
converse is not true; even if C was in sync mode, we don't know that
there's been no data loss, since B could come back up as a sync replica
before going down again.

 What we lack, and should work on, is a way for sync mode to have M larger
 than one.  AFAICS, right now we'll report commit as soon as there's one
 up-to-date replica, and some high-reliability cases are going to want
 more.

Yeah, we talked about having this when sync rep originally went in.  It
involves a LOT more bookkeeping on the master though, which is why nobody
has been willing to attempt it -- and why we went with the
single-replica solution in the first place.  Especially since most
people who want quorum sync really want multi-master replication anyway.

"Sync N times" is really just a guarantee against data loss as long as
you lose N-1 servers or fewer.  And it becomes an even
lower-availability solution if you don't have at least N+1 replicas.
For that reason, I'd like to see some realistic actual user demand
before we take the idea seriously.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about Lockhart's book

2013-12-27 Thread Josh Berkus
On 12/27/2013 08:14 AM, Christian Convey wrote:
 Hi Andrew,
 
 Thanks for your response.  Sometimes overall software architectures stay
 (mostly) unchanged for a long time, and so I figured that's possibly the
 case for Postgresql as well.  But I didn't know, which is why I asked.

Some things in that book will still be accurate and informative.  The
problem is that you, as a beginner, won't know which things are still
good and which are obsolete.

I'd suggest:

- Developer documentation in our primary docs
- Developer FAQ on the wiki
- Bruce's presentations on various internals
- Tom's presentations on how the query planner works
- Various other people's presentations on other aspects, such as foreign
data wrappers, event triggers, etc.

Unfortunately, there's no central index of presentations.

I'm a big fan of "learn by doing", and here's a program which would
bring you up on a LOT of PostgreSQL:

1. Write a few of your own C functions, including trigger functions and
an operator.

2. Write your own foreign data wrapper for something.

3. Write your own Type, including input/output functions, stats
estimation and custom indexing.
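
For a sense of scale on item 1, the SQL half of a C function is just a
declaration like this (my_funcs being a hypothetical shared library you
build against the server headers):

CREATE FUNCTION add_one(integer) RETURNS integer
    AS '$libdir/my_funcs', 'add_one'   -- my_funcs is hypothetical
    LANGUAGE C STRICT;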

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make_interval ??

2013-12-20 Thread Josh Berkus
Pavel,

 So constructor should to look like:
 
 CREATE OR REPLACE FUNCTION make_interval(years int DEFAULT 0, months int
 DEFAULT 0, ...)
 
 and usage:
 
 SELECT make_interval(years := 2)
 SELECT make_interval(days := 14)
 
 Is there a interest for this (or similar) function?

It would certainly make our Python users happy.

And for that matter would get rid of this kind of stupid thing in stored
procedure code:

time_ahead := ( interval '1 minute' * var_skip );
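
With the proposed constructor, and assuming it grows a mins argument
alongside years/months/days, that line collapses to:

time_ahead := make_interval(mins := var_skip);  -- mins is assumed here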

So, +1 for the feature.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make_interval ??

2013-12-20 Thread Josh Berkus
On 12/20/2013 03:09 PM, Gavin Flower wrote:
 What about leap years?

What about them?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make_interval ??

2013-12-20 Thread Josh Berkus
On 12/20/2013 04:44 PM, Gavin Flower wrote:
 On 21/12/13 13:40, Josh Berkus wrote:
 On 12/20/2013 03:09 PM, Gavin Flower wrote:
 What about leap years?
 What about them?

 some years have 365 days, others have 366, so how many days in an interval
 of 2 years? 4 years?

Your question isn't relevant to this patch.  It's not defining the
interval type, just creating an alternate constructor for it.

(the answer is, it depends on what timestamp you're adding it to ...)
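
For example:

SELECT '2012-01-01'::timestamp + interval '1 year';  -- 2013-01-01, 366 days later
SELECT '2013-01-01'::timestamp + interval '1 year';  -- 2014-01-01, 365 days later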

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL assertions prototype

2013-12-18 Thread Josh Berkus
On 12/18/2013 08:44 AM, Alvaro Herrera wrote:
 Another thought: at the initial run of the assertion, note which tables
 it locked, and record this as an OID array in the catalog row for the
 assertion; consider running the assertion only when those tables are
 touched.  This doesn't work if the assertion code locks some tables when
 run under certain conditions and other tables under different
 conditions.  But then this can be checked too: if an assertion lists in
 its catalog row that it involves tables A, B, C and then, under
 different conditions, it tries to acquire lock on table D, have the
 whole thing fail indicating that the assertion is misdeclared.

This sounds like you're re-inventing SSI.

SERIALIZABLE mode *exists* in order to be able to enforce constraints
which potentially involve more than one transaction.  Balance can never
go below 0, for example. The whole reason we have this really cool and
unique SSI mode is so that we can do such things without killing
performance.  These sorts of requirements are ideally suited to
Assertions, so it's logically consistent to require Serializable mode in
order to use Assertions.
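
To make that concrete (a sketch, assuming a hypothetical accounts table
and an assertion that no owner's total balance may go negative):

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(balance) FROM accounts WHERE owner = 'bob';   -- 100, so OK
UPDATE accounts SET balance = balance - 100
  WHERE id = 1 AND owner = 'bob';
COMMIT;

-- If a second session runs the same check-then-withdraw against bob's
-- other account concurrently, SSI aborts one of the two with a
-- serialization failure (SQLSTATE 40001) rather than letting the
-- combined balance go negative.  Under READ COMMITTED, both commit.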

I'm leaning towards the alternative that Assertions require SERIALIZABLE
mode, and throw a WARNING at the user and the log every time we create,
modify, or trigger an assertion while not in SERIALIZABLE mode.  And
beyond that, we don't guarantee the integrity of Assertions if people
choose to run in READ COMMITTED anyway.

This is consistent with how we treat the interaction of constraints and
triggers; under some circumstances, we allow triggers to violate CHECK
and FK constraints.

Alternately, we add a GUC assertion_serializable_mode, which can be
"off", "warn" or "error".  If it's set to "error", and the user triggers
an assertion while in READ COMMITTED mode, an exception occurs.  If it's
set to "off", then assertions are disabled, in order to deal with buggy
assertions.

Now, it would be even better if we could prevent users from switching
transaction mode, but that's a MUCH bigger and  more complicated patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL assertions prototype

2013-12-18 Thread Josh Berkus
On 12/18/2013 11:26 AM, Jim Nasby wrote:
 The flip-side is that now you can get serialization failures, and I
 think there's a ton of software that has no clue how to deal with that.
 So now you don't get to use assertions at all unless you re-engineer
 your application (but see below).

Well, the software will need to deal with an Assertion failure, which I
doubt it's prepared to do right now either.

 This is consistent with how we treat the interaction of constraints and
 triggers; under some circumstances, we allow triggers to violate CHECK
 and FK constraints.
 
 We do? Under what circumstances?

AFTER triggers are allowed to ignore constraints sometimes.  For
example, if you have a tree table with an FK to other rows in the same
table, and you have an AFTER trigger on it, the AFTER trigger is allowed
to violate the self-FK.  That's the one I ran across, but I vaguely
remember other cases, and there's some documentation on this in the
order of application of triggers in the main docs.

 Another possibility is to allow for two different types of assertions,
 one based on SSI and one based on locking.

The locking version would have to pretty much lock on a table basis (or
even a whole-database basis) every time an assertion executed, no?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why no INSTEAD OF triggers on tables?

2013-12-17 Thread Josh Berkus
On 12/16/2013 07:53 PM, Robert Haas wrote:
 So, put a BEFORE trigger, and make it return NULL.  Same effect,
 different notation.

NOT the same:

Master partition table with BEFORE trigger:

josh=# insert into a ( id, val ) values ( 23, 'test' ), ( 24, 'test'),
(25,'test');

INSERT 0 0
^^^

View with INSTEAD OF trigger:

josh=# insert into a_v ( id, val ) values ( 23, 'test' ), ( 24, 'test'),
(25,'test');

INSERT 0 3
^^^

The difference here is that the INSTEAD OF trigger returns a
rows-affected count, and the BEFORE trigger does not (it returns 0).
Some drivers and ORMs, most notably Hibernate, check this rows-returned
count, and error if they don't match the rows sent.
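
For reference, the view workaround is a sketch like this, with a_2013
standing in for whichever hypothetical partition of a the row belongs to:

CREATE VIEW a_v AS SELECT * FROM a;

CREATE FUNCTION a_v_insert() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- a_2013 is a hypothetical child partition of a
    INSERT INTO a_2013 (id, val) VALUES (NEW.id, NEW.val);
    RETURN NEW;   -- the non-NULL return is what makes the row count
END;
$$;

CREATE TRIGGER a_v_ins INSTEAD OF INSERT ON a_v
    FOR EACH ROW EXECUTE PROCEDURE a_v_insert();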

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-17 Thread Josh Berkus
On 12/16/2013 11:44 AM, Tom Lane wrote:
 Right.  I think a lot of the tension comes from people being unconvinced
 that the existing extension feature is an ideal model for this sort of
 use-case.  Extensions were mainly designed around the notion of a .so
 with some SQL datatype/function/etc declarations that have to match up
 with the C code.  So it made sense for them to be relatively static things
 that live in the filesystem.  Notably, if you're migrating to a new PG
 major version, you're at the very least gonna have to recompile the C code
 and quite likely will need to change it some.  So adapting the SQL code
 if needed goes along with that, and would never be automatic in any case.

I see what you mean.  On the other hand:

a) introducing a new concept would require a new reserved word

b) it would also require figuring out how it interacts with extensions

c) extensions already have versioning, which this feature needs

d) extensions already have dependencies, which this feature needs

While it splits Extensions into two slightly different concepts, I find
that on the whole less confusing than the alternative.

On 12/16/2013 05:17 PM, Jim Nasby wrote:
 Somewhat related to this, I really wish Postgres had the idea of a
 class, that was allowed to contain any type of object and could be
 instantiated when needed. For example, if we had an address class,
 we could instantiate it once for tracking our customer addresses, and a
 second time for tracking the addresses customers supply for their
 employers. Such a mechanism would probably be ideal for what we need,
 but of course you'd still have the question of how to load a class
 definition that someone else has published.

Well, the idea originally (POSTGRES) was for the Type, Domain, and
Inheritance system to do just what you propose.  Nobody ever worked out
all the practicalities and gotchas to make it really work in production,
though.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewarm status

2013-12-17 Thread Josh Berkus
On 12/17/2013 06:34 AM, Robert Haas wrote:
 On Tue, Dec 17, 2013 at 12:09 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I have used pg_prewarm during some of work related to Buffer Management and
 other performance related work. It is quite useful utility.
 +1 for reviving this patch for 9.4
 
 Any other votes?

I still support this patch (as I did originally), and don't think that
the overlap with pgFincore is of any consequence.  pgFincore does more
than pg_prewarm ever will, but it's also platform-specific, so it still
makes sense for both to exist.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL assertions prototype

2013-12-17 Thread Josh Berkus
On 11/15/2013 05:41 AM, Heikki Linnakangas wrote:
 A fundamental problem with this is that it needs to handle isolation
 reliable, so that the assertion cannot be violated when two concurrent
 backends do things. Consider the example from the manual, which checks
 that a table has at least one row. Now, if the table has two rows to
 begin with, and in one backend you delete one row, and concurrently in
 another backend you delete the other row, and then commit both
 transactions, the assertion is violated.
 
 In other words, the assertions need to be checked in serializable mode.
 Now that we have a real serializable mode, I think that's actually
 feasible.

Going back over this patch, I haven't seen any further discussion of the
point Heikki raises above, which seems like a bit of a showstopper.

Heikki, did you have specific ideas on how to solve this?  Right now my
mind boggles.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL assertions prototype

2013-12-17 Thread Josh Berkus
On 12/17/2013 01:42 PM, Kevin Grittner wrote:
 Josh Berkus j...@agliodbs.com wrote:
 Going back over this patch, I haven't seen any further discussion of the
 point Heikki raises above, which seems like a bit of a showstopper.

 Heikki, did you have specific ideas on how to solve this?  Right now my
 mind boggles.
 
 It works fine as long as you set default_transaction_isolation =
 'serializable' and never override that.  :-)  Of course, it sure
 would be nice to have a way to prohibit overrides, but that's
 another issue.
 
 Otherwise it is hard to see how to make it work in a general way
 without a mutually exclusive lock mode on the table for the
 duration of any transaction which modifies the table.

Serializable or not, *what* do we lock for assertions?  It's not rows.
Tables?  Which tables?  What if the assertion is an interpreted language
function?  Does the SSI reference counter really take care of all of this?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-16 Thread Josh Berkus
On 12/15/2013 10:47 PM, Jeff Davis wrote:
 The patch offers an alternative to dropping files on the filesystem
 before doing CREATE EXTENSION. Instead, if the extension has no C code,
 you can put it in the catalog using ordinary SQL access, and execute the
 same kind of CREATE EXTENSION. Aside from that, it's pretty much
 identical to existing extensions.

OK, so the idea is that for SQL-only extensions (i.e. ones without
.so's) the user shouldn't be required to create a file on disk.  That
seems straightforward and acceptable.

 Stephen doesn't like the idea that the SQL in an extension is a blob of
 text. 

I, personally, would prefer per-object line-items, but I don't think
that's a deal-breaker.  Having a single text blob does match up with
existing Extension design.

Note for Dimitri, though: I think that having line-item objects in
dependency order would make this feature vastly more useful for
schema-template maintenance.  Give it some thought.

 There are weird cases, like if you make local modifications to
 objects held in an extension, then dump/reload will lose those local
 modifications.

What does dump/reload do with regular Extensions currently in that case?

 Another issue, which I agree is dubious in many
 situations, is that the version of an extension is not preserved across
 dump/reload (this is actually a feature, which was designed with
 contrib-style extensions in mind, but can be surprising in other
 circumstances).

Well, this should work with a versioning system, in which dump/reload
can load older versions of the extension if they are present, the same
as external Extensions do now.  Is that built in?

 This isn't necessarily a dead-end, but there are a lot of unsettled
 issues, and it will take some soul-searching to answer them. Is an
 extension a blob of text with a version, that's maintained in some
 external repo?

Well, plus potentially binaries and library references, yes.  Although
you could describe all of Postgres as a bunch of text blobs and some
library references, when you get right down to it.

 Is it the job of postgres to ensure that dump/reload
 creates the same situation that you started with, including local
 modifications to objects that are part of an extension?

IMHO: No.  AFAIK, if a user modifies, say, information_schema views in
PostgreSQL, we don't respect that in dump/restore either.

Now, I can see adding to this mechanism a method for tracking such
modifications in a way that pgdump can support them.  But that can
easily be a version 2 feature.

 Should
 everything be an extension, or do we need to invent a new concept for
 some of the use cases? What role to external tools play in all of this?

So, the reason I was confused by this feature -- and the reason Stephen
hates it, I think -- is that I thought it was solving the "Extensions
don't follow replication, and they are complicated to install if your OS
doesn't have good packages" problem.  It's not, and it never will solve
that issue.

It's solving a completely different problem, to wit:

Some PostgreSQL shops with lots of servers have large internal libraries
of functions, views, and similar code that they've written to support
their applications, which don't comprise a complete database.  This
feature would allow them to package those libraries, and version,
upgrade and track them, without requiring a filesystem-based install.  I
myself have a couple clients who could benefit from this.

I think the name "Extension Templates" is horrible because it misleads
all of us on this list into thinking the proposed feature is completely
something other than what it is.  I don't have a better name offhand,
but that's got to change before it becomes a feature.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-16 Thread Josh Berkus
On 12/16/2013 10:53 AM, Josh Berkus wrote:
 Some PostgreSQL shops with lots of servers have large internal libraries
 of functions, views, and similar code that they've written to support
 their applications, which don't comprise a complete database.  This
 feature would allow them to package those libraries, and version,
 upgrade and track them, without requiring a filesystem-based install.  I
 myself have a couple clients who could benefit from this.

cc'ing Jim Nasby, since I think Enova is part of the target market for
this feature.  Jim, have you taken a look at this?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why no INSTEAD OF triggers on tables?

2013-12-16 Thread Josh Berkus
Hackers,

I've looked in the archives, but I can't find a reason why INSTEAD OF
triggers were never enabled for tables.  I'm interested in them in order
to return a rowcount to JDBC for INSERTs into partitioned tables.

Was there a technical obstacle, or is this just a TUIT issue?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why no INSTEAD OF triggers on tables?

2013-12-16 Thread Josh Berkus
On 12/16/2013 04:22 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 I've looked in the archives, but I can't find a reason why INSTEAD OF
 triggers were never enabled for tables.
 
 What would that mean exactly?  And how would you do the actual update
 when it came time to?

Well, I'm specifically thinking of master partition tables.  In that
case, we really want an INSTEAD OF trigger.

It seems a little silly that I need to create a separate view, and then
an INSTEAD OF trigger on the view, in order to get a rows-updated count
back from an INSERT which hits a partitioned table.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-14 Thread Josh Berkus
All:

Can someone summarize the issues with this patch for those of us who
haven't been following it closely?  I was just chatting with a couple of
other contributors, and at this point none of us knows what it
implements, what it doesn't implement, what the plans are for expanding
its feature set (if any), and why Frost doesn't like it.  I tried
reading through the thread on -hackers, and came away even more confused.

Is there maybe a wiki page for it?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread Josh Berkus
On 12/14/2013 05:00 PM, Tom Lane wrote:
 This consideration also makes me question whether we should apply the
 method for NUMERIC.  Although in principle numeric addition/subtraction
 is exact, such a sequence could leave us with a different dscale than
 is returned by the existing code.  I'm not sure if changing the number of
 trailing zeroes is a big enough behavior change to draw complaints.

If we're going to disqualify NUMERIC too, we might as well bounce the
feature.  Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think even the FLOAT case deserves some consideration.  What's the
worst-case drift?  In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-12 Thread Josh Berkus
On 12/12/2013 10:33 AM, Claudio Freire wrote:
 Well, why not take a supersample containing all visible tuples from N
 selected blocks, and do bootstrapping over it, with subsamples of M
 independent rows each?

Well, we still need to look at each individual block to determine
grouping correlation.  Let's take a worst case example: imagine a table
has *just* been created by:

CREATE TABLE newdata AS SELECT * FROM olddata ORDER BY category, item;

If category is fairly low cardinality, then grouping will be severe;
we can reasonably expect that if we sample 100 blocks, many of them will
have only one category value present.  The answer to this is to make our
block samples fairly widely spaced and compare them.

In this simplified example, if the table had 1000 blocks, we would take
blocks 1,101,201,301,401,etc.  Then we would compare the number and
content of values found on each block with the number and content found
on each other block.  For example, if we see that block 101 is entirely
the category cats, and block 701 is entirely the category shopping
and block 901 is split 60/40 between the categories transportation and
voting, then we can assume that the level of grouping is very high,
and the number of unknown values we haven't seen is also high.

Whereas if 101 is cats and 201 is cats and 301 through 501 are
cats with 2% other stuff, then we assume that the level of grouping is
moderate and it's just the case that most of the dataset is cats.
Which means that the number of unknown values we haven't seen is low.

Whereas if 101, 201, 501, and 901 have near-identical distributions of
values, we assume that the level of grouping is very low, and that there
are very few values we haven't seen.
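
To eyeball this kind of per-block grouping on a real table, a throwaway
diagnostic -- not the proposed sampler -- is to group on the block number
hidden in ctid:

SELECT (ctid::text::point)[0]::int AS blkno, category, count(*)
FROM newdata
WHERE (ctid::text::point)[0]::int IN (1, 101, 201, 301, 401)
GROUP BY 1, 2
ORDER BY 1, 2;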

As someone else pointed out, full-block (the proposal) vs. random-row
(our current style) doesn't have a very significant effect on estimates
of Histograms and nullfrac, as long as the sampled blocks are widely
spaced.  Well, nullfrac is affected in the extreme example of a totally
ordered table where the nulls are all in one block, but I'll point out
that we can (and do) also miss that using our current algo.

Estimated grouping should, however, affect MCVs.  In cases where we
estimate that grouping levels are high, the expected % of observed
values should be discounted somehow.  That is, with total random
distribution you have a 1:1 ratio between observed frequency of a value
and assumed frequency.  However, with highly grouped values, you might
have a 2:1 ratio.

Again, more math (backed by statistical analysis) is needed.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-11 Thread Josh Berkus
On 12/11/2013 08:48 AM, Tom Lane wrote:
 The fundamental problem IMO is that you want to complicate the definition
 of what these things mean as a substitute for DBAs learning something
 about Postgres.  That seems like a fool's errand from here.  They're going
 to have to learn what FATAL means sooner or later, and making it more
 complicated just raises the height of that barrier.

I don't think it works to change the NOTICE/ERROR/FATAL tags; for one
thing, I can hear the screaming about people's log scripts from here.

However, it would really be useful to have an extra tag (in addition to
the ERROR or FATAL) for "If you're seeing this message, something has
gone seriously wrong on the server."  Just stuff like corruption
messages, backend crashes, etc.

Otherwise we're requiring users to come up with an alphabet soup of
regexes to filter out the noise error messages from the really, really
important ones.  Speaking as someone who does trainings for new DBAs,
the part where I do "what to look for in the logs" requires over an hour
and still doesn't cover everything. And doesn't internationalize. That's
nasty.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Josh Berkus
On 12/11/2013 09:57 AM, Robert Haas wrote:
 I don't agree with that assessment.  Anything that involves changing
 the scheduling of autovacuum is a major project that will legitimately
 provoke much controversy.  Extensive testing will be needed to prove
 that the new algorithm doesn't perform worse than the current
 algorithm in any important cases.  I have my doubts about whether that
 can be accomplished in an entire release cycle, let alone 2-3 days.
 In contrast, the patch proposed does something that is easy to
 understand, clearly safe, and an improvement over what we have now.

+1

There is an inherent tuning and troubleshooting challenge in anything
involving a feedback loop.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Josh Berkus
On 12/11/2013 11:37 AM, Simon Riggs wrote:
 On 11 December 2013 17:57, Robert Haas robertmh...@gmail.com wrote:

 Extensive testing will be needed to prove
 that the new algorithm doesn't perform worse than the current
 algorithm in any important cases.

 Agreed, but the amount of testing seems equivalent in both cases,
 assuming we weren't going to skip it for this patch.

No performance testing is required for this patch.  The effect of memory
limits on vacuum are already well-known and well-understood.

 With considerable regret, I don't see how this solves the problem at
 hand. We can and should do better.

I strongly disagree.  The problem we are dealing with is that two
resource limits which should have *always* been independent of each
other are currently conflated into a single GUC variable.  This forces
users to remember to set maintenance_work_mem interactively every time
they want to run a manual VACUUM, because the setting in postgresql.conf
is needed to tune autovacuum.

In other words, we are having an issue with *non-atomic data*, and this
patch partially fixes that.
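
To spell out the annoyance (big_table being any table you vacuum by hand):

-- today: the postgresql.conf value is sized for autovacuum workers,
-- so every manual run needs an interactive override:
SET maintenance_work_mem = '2GB';
VACUUM VERBOSE big_table;   -- big_table is a stand-in

-- with this patch, autovacuum reads its own limit instead:
-- postgresql.conf:  autovacuum_work_mem = '256MB'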

Would it be better to have an admissions-control policy engine for
launching autovacuum which takes into account available RAM, estimated
costs of concurrent vacuums, current CPU activity, and which tables are
in cache?  Yes.  And if you started on that now, you might have it ready
for 9.5.

And, for that matter, accepting this patch by no means blocks doing
something more sophisticated in the future.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-11 Thread Josh Berkus
On 12/11/2013 12:40 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 And, for that matter, accepting this patch by no means blocks doing
 something more sophisticated in the future.
 
 Yeah.  I think the only real argument against it is do we really need
 yet another knob?.  Since Josh, who's usually the voicer of that
 argument, is for this one, I don't have a problem with it.

This passes the "is it a chronic problem not to have a knob for this?" test.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-11 Thread Josh Berkus
On 12/11/2013 02:39 PM, Martijn van Oosterhout wrote:
 In this discussion we've mostly used block = 1 postgresql block of 8k. 
 But when reading from a disk once you've read one block you can
 basically read the following ones practically for free.
 
 So I wonder if you could make your sampling read always 16 consecutive
 blocks, but then use 25-50% of the tuples.  That way you get many more
 tuples for the same amount of disk I/O seeks..

Yeah, that's what I meant by "tune this for the FS".  We'll probably
have to test a lot of different block sizes on different FSes before
we arrive at a reasonable size, and even then I'll bet we have to offer
a GUC.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Josh Berkus
Andres, all:

We've just run across a case of this exact issue on 9.2.4.  I thought it
was supposed to be 9.3-only?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Josh Berkus
On 12/10/2013 10:39 AM, Andres Freund wrote:
 Hi,
 
 On 2013-12-10 10:38:32 -0800, Josh Berkus wrote:
 We've just run across a case of this exact issue on 9.2.4.  I thought it
 was supposed to be 9.3-only?
 
 Could you please describe this exact issue?

Fatal errors due to missing pg_subtrans files on creating a new replica.

Sequence:

1. Failed over from master to replica #1

2. Remastered other replicas

3. Tried to create a new replica.

4. New replica started failing with errors similar to the original report.

If this is fixed in 9.2.6, great, but I didn't get that impression from
the commits ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] coredump of 9.3.2

2013-12-10 Thread Josh Berkus
On 12/10/2013 09:39 AM, Teodor Sigaev wrote:
 
 SELECT *
 FROM t
 WHERE (
     CASE
         WHEN a % 2 IN (SELECT c FROM tt) THEN a
     END IN (SELECT c FROM tt)
 );

Wow, it wouldn't have occurred to me that that was even supported syntax.
I'm not surprised that it doesn't work ...


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Josh Berkus
On 12/10/2013 10:48 AM, Andres Freund wrote:
 On 2013-12-10 10:44:30 -0800, Josh Berkus wrote:
 On 12/10/2013 10:39 AM, Andres Freund wrote:
 Hi,

 On 2013-12-10 10:38:32 -0800, Josh Berkus wrote:
 We've just run across a case of this exact issue on 9.2.4.  I thought it
 was supposed to be 9.3-only?

 Could you please describe this exact issue?

 Fatal errors due to missing pg_subtrans files on creating a new replica.
 
 If this is fixed in 9.2.6, great, but I didn't get that impression from
 the commits ...
 
 I am pretty sure this thread isn't about the bug you're hitting. You get
 errors during xid assignment, right? If so, upgrade to 9.2.6, that's
 fixed (it's the bug in whose wake the replication bug was introduced).

I thought that only affected 9.2.5?  This machine is 9.2.4, as was the
prior master.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Josh Berkus
On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
 On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs si...@2ndquadrant.com wrote: 
 I don't think that anyone believes that not doing block sampling is
 tenable, fwiw. Clearly some type of block sampling would be preferable
 for most or all purposes.

As discussed, we need math though.  Does anyone have an ACM subscription
and time to do a search?  Someone must.  We can buy one with community
funds, but no reason to do so if we don't have to.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-12-10 Thread Josh Berkus
On 12/10/2013 12:50 PM, Tom Lane wrote:
 One would hope that turning off check_function_bodies would be sufficient
 to disable any added checking, though, so I don't see this being a problem
 for pg_dump.  But there might be other scenarios where an additional knob
 would be useful.

I can't think of one, offhand.  And +1 for NOT adding a new knob.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Josh Berkus
On 12/10/2013 01:33 PM, Mark Kirkwood wrote:
 Yeah - and we seem to be back to Josh's point about needing 'some math'
 to cope with the rows within a block not being a purely random selection.

Well, sometimes they are effectively random.  But sometimes they are
not.  The Chaudhuri et al. paper had a formula for estimating randomness
based on the grouping of rows in each block, assuming that the sampled
blocks were widely spaced (if they aren't there's not much you can do).
 This is where you get up to needing a 5% sample; you need to take
enough blocks that you're confident that the blocks you sampled are
representative of the population.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] ANALYZE sampling is too good

2013-12-09 Thread Josh Berkus
Greg,

 I really don't believe the 5% thing. It's not enough for n_distinct
 and it's *far* too high a value for linear properties like histograms
 or nullfrac etc. 

Actually, it is enough for n_distinct, or more properly, 5% is as good
as you can get for n_distinct unless you're going to jump to scanning
50% or more.

It's also applicable for the other stats; histogram buckets constructed
from a 5% sample are more likely to be accurate than those constructed
from a 0.1% sample.   Same with nullfrac.  The degree of improved
accuracy would, of course, require some math to determine.

 From a computer point of view it's too high to be
 worth bothering. If we have to read 5% of the table we might as well
 do a full scan anyways, it'll be marginally slower but much better
 quality results.

Reading 5% of a 200GB table is going to be considerably faster than
reading the whole thing, if that 5% is being scanned in a way that the
FS understands.

Also, we can optimize this significantly by using the visibility map
(VM), as Robert (I think) suggested.

In the advanced approaches section, there's also the idea of collecting
analyze data from table pages while they're in memory anyway for other
reasons.

You do seem kind of hostile to the idea of full-page sampling, going
pretty far beyond "I'd need to see the math".  Why?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] ANALYZE sampling is too good

2013-12-09 Thread Josh Berkus
On 12/09/2013 02:37 PM, Robert Haas wrote:
 I've never seen an n_distinct value of more than 5 digits, regardless
 of reality.  Typically I've seen 20-50k, even if the real number is
 much higher.  But the n_distinct value is only for non-MCVs, so if we
 estimate the selectivity of column = 'rarevalue' to be
 (1-nullfrac-mcvfrac)/n_distinct, then making mcvfrac bigger reduces
 the estimate, and making the MCV list longer naturally makes mcvfrac
 bigger.  I'm not sure how important the
 less-frequent-than-the-least-common-MCV part is, but I'm very sure
 that raising the statistics target helps to solve the problem of
 overestimating the prevalence of uncommon values in a very big table.

I did an analysis of our ndistinct algorithm several years ago (~8.1),
and to sum up:

1. we take far too small of a sample to estimate ndistinct well for
tables larger than 100,000 rows.

2. the estimation algo we have chosen is one which tends to be wrong in
the downwards direction, rather strongly so (see the sketch after this
list).  That is, if we could potentially have an ndistinct of 1000 to
100,000 based on the sample, our algo estimates 1500 to 3000.

3. Other algos exist.  They tend to be wrong in other directions.

4. Nobody has done an analysis of whether it's worse, on average, to
estimate low vs. high for ndistinct.
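
For reference, the sketch mentioned in point 2: analyze.c uses the
Haas-Stokes estimator (their "Duj1").  In Python, with made-up sample
numbers, and tacking on the eqsel-style arithmetic Robert quoted above
(the nullfrac/mcvfrac values are hypothetical):

def duj1_ndistinct(n, N, d, f1):
    """Haas-Stokes Duj1, per the comments in analyze.c:
    n = sample rows, N = total rows, d = distinct values in the
    sample, f1 = values seen exactly once in the sample."""
    return n * d / (n - f1 + f1 * n / float(N))

# Made-up numbers: a 30,000-row sample of a 10M-row table.
ndistinct = duj1_ndistinct(30000, 10000000, 5000, 3000)

# Selectivity of col = 'rarevalue', per the quoted formula,
# with hypothetical nullfrac = 0.1 and mcvfrac = 0.2.
selectivity = (1 - 0.1 - 0.2) / ndistinct
print(ndistinct, selectivity)   # ~5554 distinct, sel ~1.3e-04

Note how the estimate hugs d even though f1 is large -- that's the
downward bias in point 2.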

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] ANALYZE sampling is too good

2013-12-08 Thread Josh Berkus
On 12/08/2013 10:14 AM, Greg Stark wrote:
 With rows_per_block=1   the MCV frequency list ranges from .0082 to .0123
 With rows_per_block=4   the MCV frequency list ranges from .0063 to .0125
 With rows_per_block=16 the MCV frequency list ranges from .0058 to .0164
 With rows_per_block=64 the MCV frequency list ranges from .0021 to .0213
 
 I'm not really sure if this is due to the blocky sample combined with
 the skewed pgbench run or not. It doesn't seem to be consistently
 biasing towards or against bid 1 which I believe are the only rows
 that would have been touched by pgbench. Still it's suspicious that
 they seem to be consistently getting less accurate as the blockiness
 increases.

They will certainly do so if you don't apply any statistical adjustments
for selecting more rows from the same pages.

So there's a set of math designed to correct for the skew introduced
by reading *all* of the rows in each block.  That's what I meant by
"block-based sampling"; you read, say, 400 pages, you compile statistics
on *all* of the rows on those pages, you apply some algorithms to adjust
for groupings of rows based on how grouped they are.  And you have a
pretty good estimate of how grouped they are, because you just looked at
complete sets of rows on a bunch of nonadjacent pages.
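
A crude sketch of the shape of that adjustment -- to be clear, the
discount below is invented for illustration, not the math from the
papers: measure how repetitive values are within each sampled page, and
shrink the effective sample size accordingly.

def effective_sample_size(blocks):
    """blocks: one list of column values per sampled page.

    If values repeat heavily within a page, its rows carry less
    independent information, so we interpolate between "pages are
    the sampling unit" and "rows are the sampling unit".
    """
    n_pages = len(blocks)
    n_rows = sum(len(b) for b in blocks)
    # 1.0 = every row on the page unique; near 0 = one big clump.
    ratios = [len(set(b)) / float(len(b)) for b in blocks if b]
    homogeneity = sum(ratios) / len(ratios)
    return n_pages + homogeneity * (n_rows - n_pages)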

Obviously, you need to look at more rows than you would with a
pure-random sample.  Like I said, the 80%+ accurate point in the papers
seemed to be at a 5% sample.  However, since those rows come from the
same pages, the cost of looking at more rows is quite small, compared to
the cost of looking at 64 times as many disk pages.

My ACM subscription has lapsed, though; someone with a current ACM
subscription could search for this; there are several published papers,
with math and pseudocode.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] ANALYZE sampling is too good

2013-12-07 Thread Josh Berkus
On 12/07/2013 11:46 AM, Robert Haas wrote:
 Maybe there's some highly-principled statistical approach which could
 be taken here, and if so that's fine, but I suspect not.  So what I
 think we should do is auto-tune the statistics target based on the
 table size.  If, say, we think that the generally useful range for the
 statistics target is something like 10 to 400, then let's come up with
 a formula based on table size that outputs 10 for small tables, 400
 for really big tables, and intermediate values for tables in the
 middle.

The only approach which makes sense is to base it on a % of the table.
In fact, pretty much every paper which has examined statistics
estimation for database tables has determined that any estimate based on
a less-than-5% sample is going to be wildly inaccurate.  Not that 5%
samples are 100% accurate, but at least they fit the 80/20 rule.

This is the reason why implementing block-based sampling is critical;
using our current "take one row out of every page" method, sampling 5%
of the table means scanning the whole thing in most tables.  We also
need to decouple the number of MCVs we keep from the sample size.
Certainly our existing sampling algo seems designed to maximize IO for
the sample size.

There are other qualitative improvements we could make, which Nathan Boley
has spoken on.   For example, our stats code has no way to recognize a
normal or exponential distribution -- it assumes that all columns are
randomly distributed.  If we could recognize common distribution
patterns, then not only could we have better query estimates, those
would require keeping *fewer* stats, since all you need for a normal
distribution are the end points and the variance.
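
As a sketch of what recognizing a distribution could look like --
purely illustrative, nothing like this exists in our stats code, and a
real version would want a proper goodness-of-fit test:

import statistics

def try_fit_normal(sample, skew_tol=0.2, kurt_tol=0.5):
    """Return (mean, variance) if the sample looks normal, else None."""
    n = len(sample)
    mean = statistics.mean(sample)
    var = statistics.pvariance(sample, mu=mean)
    sd = var ** 0.5
    if sd == 0:
        return None
    z = [(x - mean) / sd for x in sample]
    skew = sum(v ** 3 for v in z) / n
    excess_kurt = sum(v ** 4 for v in z) / n - 3.0
    if abs(skew) < skew_tol and abs(excess_kurt) < kurt_tol:
        return (mean, var)   # two numbers instead of a histogram
    return None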

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] dblink performance regression

2013-12-07 Thread Josh Berkus
All,

I tested out Joe's original patch, and it does eliminate the 8%
performance regression.

Will try the new one.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Josh Berkus
On 12/06/2013 01:30 PM, Andrew Gierth wrote:
 Since I don't particularly trust my own judgement on aesthetics, I used
 the feedback I got from others when deciding what to do. Frankly I think
 this one needs wider input than just you and me arguing over it.

Can someone paste examples of the two syntax alternatives we're talking
about here?  I've lost track.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-05 Thread Josh Berkus
On 12/05/2013 07:40 AM, Greg Stark wrote:
 On Thu, Dec 5, 2013 at 2:54 PM, Claudio Freire klaussfre...@gmail.com wrote:
 That's a bad idea in the current state of affairs. MM files haven't
 been designed for that usage, and getting stable performance out of
 that will be way too difficult.
 
 I'm talking about long-term goals here. Either of these two routes
 would require whole new kernel interfaces to work effectively. Without
 those new kernel interfaces our current approach is possibly the best
 we can get.

Well, in the long run we'll probably be using persistent RAM.  And the
geeks who manage that have already said that MMAP is a bad interface for
persistent RAM.  They haven't defined a good one, though.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-05 Thread Josh Berkus
On 12/05/2013 05:48 AM, Stephen Frost wrote:
 * Peter Geoghegan (p...@heroku.com) wrote:
 On Wed, Dec 4, 2013 at 11:07 AM, Josh Berkus j...@agliodbs.com wrote:
 But you know what?  2.6, overall, still performs better than any kernel
 in the 3.X series, at least for Postgres.

 What about the fseek() scalability issue?
 
 Not to mention that the 2.6 which I suspect you're referring to (RHEL)
 isn't exactly 2.6..

Actually, I've been able to do 35K TPS on commodity hardware on Ubuntu
10.04.   I have yet to go above 15K on any Ubuntu running a 3.X kernel.
 The CPU scheduling on 2.6 just seems to be far better tuned, aside from
the IO issues; at 35K TPS, the CPU workload is evenly distributed across
cores, whereas on 3.X it lurches from core to core like a drunk in a
cathedral.   However, the hardware is not identical, and this is on
proprietary, not benchmark, workloads, which is why I haven't published
anything.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-05 Thread Josh Berkus
On 12/05/2013 10:21 AM, Stephen Frost wrote:
 * David Johnston (pol...@yahoo.com) wrote:
 ISTM that instituting some level of categorization for messages would be
 helpful.  Then logging and reporting frameworks would be able to identify
 and segregate the logs in whatever way they and the configuration deems
 appropriate.
 
 I've wanted to do that and have even discussed it with folks in the
 past, the trick is finding enough toit's, which is difficult when you
 start to look at the size of the task...

But ... if we set a firm policy on this, then we could gradually clean
up the error messages piecemeal over the next couple of major versions.
 We could also make sure that any new features complied with the
categorization policy.

Right now, how to categorize errors is up to each individual patch
author, which means that things are all over the place, and get worse
with each new feature added.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-05 Thread Josh Berkus
On 12/05/2013 10:46 AM, Tom Lane wrote:
 Before we could get very far we'd need a better understanding than we have
 of what cases a DBA might be interested in.  To take the specific example
 that started this thread, there wouldn't be a lot of value IMO in a
 classification like connection failure messages.  I think the OP is
 probably right that those are often uninteresting --- but as I mentioned,
 too many clients might become interesting if he's wondering whether he
 needs to enlarge max_connections.  Or password failure cases might become
 interesting if he starts to suspect breakin attempts.  So I'd want to see
 a design that credibly covers those sorts of needs before we put any large
 effort into code changes.

Heck, I'd be happy just to have a class of messages which specifically
means "OMG, there's something wrong with the server", that is, a flag
for messages which only occur when PostgreSQL encounters a bug, data
corruption, or platform error.  Right now, I have to suss those out by
regex.
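
For what it's worth, that regex looks something like the sketch below;
the pattern list is ad hoc and certainly incomplete, but these are all
real message fragments:

import re

# PANICs, data-corruption symptoms, and low-level I/O failures.
SERVER_SCARY = re.compile(
    r"PANIC:"
    r"|invalid page header"
    r"|could not read block"
    r"|unexpected chunk number"
)

def scary_lines(logfile):
    with open(logfile) as f:
        return [line for line in f if SERVER_SCARY.search(line)]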

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-05 Thread Josh Berkus
On 12/05/2013 12:41 PM, Robert Haas wrote:
 On Thu, Dec 5, 2013 at 12:54 PM, Josh Berkus j...@agliodbs.com wrote:
 Actually, I've been able to do 35K TPS on commodity hardware on Ubuntu
 10.04.   I have yet to go above 15K on any Ubuntu running a 3.X kernel.
  The CPU scheduling on 2.6 just seems to be far better tuned, aside from
 the IO issues; at 35K TPS, the CPU workload is evenly distributed across
 cores, whereas on 3.X it lurches from core to core like a drunk in a
 cathedral.
 
 Do drunks lurch differently in cathedrals than they do elsewhere?

Yeah, because they lurch from one column to another.  It's a visual
metaphor.  ;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



