Re: [HACKERS] proposal: hide application_name from other users
On 01/29/2014 10:19 AM, Simon Riggs wrote:
> No specific reason that I can recall, but replication is heavily protected by layers of security. pg_stat_replication is a join with pg_stat_activity, so some of the info is open, some closed. It seems possible to relax that. I'm all for the idea of "restrict, then open up". That is, it made sense to start with data restricted, but then unrestrict it once we know it's OK. Going the other way generally isn't possible, as this patch demonstrates. Presumably the current patch is returned with feedback? Or can we fix these problems by inventing a new user aspect called MONITOR (similar to REPLICATION)? We can grant application_name and replication details to that.

Yeah, except I don't see doing the MONITOR thing for 9.4. We'd need a spec for it first.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 11:54 AM, Robert Haas wrote:
> I agree. I find it somewhat unlikely that pg_stat_statements is fragile enough that these few extra counters are going to make much of a difference. At the same time, I find min and max a dubious value proposition. It seems highly likely to me that stddev will pay its way, but I'm much less certain about the others. What I really want is percentiles, but I'm pretty sure we already shot that down. ;-)

I could use min/max -- think of performance test runs. However, I agree that they're less valuable than stddev.
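For what it's worth, the report I'd want from such counters is easy to sketch. This assumes the patch adds min_time, max_time, and a sum-of-squares accumulator; those column names are invented here, not the committed ones:

```sql
-- min_time, max_time, sum_time_sq are assumed column names from the
-- proposed patch; stddev falls out of count, sum, and sum of squares.
SELECT query,
       calls,
       min_time,
       max_time,
       total_time / calls AS mean_time,
       sqrt(sum_time_sq / calls - (total_time / calls) ^ 2) AS stddev_time
FROM pg_stat_statements
ORDER BY stddev_time DESC
LIMIT 10;
```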
Re: [HACKERS] jsonb and nested hstore
On 01/29/2014 12:46 PM, Merlin Moncure wrote:
> I think the opening paragraphs contrasting json/jsonb need refinement. json is going to be slightly faster than jsonb for input *and* output. For example, in one application I store fairly large json objects containing pre-compiled static polygon data that is simply flipped up to Google Maps. This case will likely be pessimal for jsonb. For the next paragraph, I'd like to expand a bit on 'specialized needs' and boil it down to specific use cases. Basically, json will likely be more compact in most cases and slightly faster for input/output; jsonb would be preferred in any context where processing, or searching, or extensive server-side parsing is employed. If you agree, I'd be happy to do that...

Please take a stab at it, and I'll be happy to revise it. I was working on a two-column table comparison chart; I still think that's the best way to go.
Re: [HACKERS] jsonb and nested hstore
On 01/29/2014 02:37 PM, Merlin Moncure wrote:
> create table bar(a int, b int[]);
> postgres=# select jsonb_populate_record(null::bar, '{a: 1, b: [1,2]}'::jsonb, false);
> ERROR: cannot populate with a nested object unless use_json_as_text is true

Hmmm. What about just making any impossibly complex objects type JSON?
Re: [HACKERS] jsonb and nested hstore
On 01/28/2014 09:58 AM, Merlin Moncure wrote:
> yeah. note: I think the json documentation needs *major* overhaul. too much is going on inside the function listings where there really should be a big breakout discussing the big picture of json/jsonb with examples of various use cases. I want to give it a shot but unfortunately can not commit to do that by the end of the 'fest.

FWIW, I've promised Andrew that I'll overhaul this by the end of beta, given that we have all of beta for doc refinements. In addition to this, the JSON vs JSONB datatype page really needs expansion and clarification.
Re: [HACKERS] new json funcs
On 01/27/2014 01:06 PM, Alvaro Herrera wrote:
> Andrew Dunstan escribió:
>> I'm not sure I understand the need. This is the difference between the _text variants and their parents. Why would you call json_object_field when you want the dequoted text?
> Because I first need to know its type. Sometimes it's an array, or an object, or a boolean, and for those I won't call the _text version afterwards but just use the original.

It would make more sense to extract them as JSON, check the type, and convert.
Re: [HACKERS] [pgsql-advocacy] GSoC 2014 - mentors, students and admins
On 01/28/2014 09:46 AM, Atri Sharma wrote:
> I would like to bring up the addition of MADlib algorithms again this year. Also, some work on the foreign table constraints could be helpful.

We can only take MADlib this year if we have confirmed mentors who are MADlib committers before the end of the application period (Feb 15). We can't have a repeat of last year.
Re: [HACKERS] jsonb and nested hstore
On 01/28/2014 10:29 AM, Merlin Moncure wrote:
>> In addition to this, the JSON vs JSONB datatype page really needs expansion and clarification.
> right: exactly. I'd be happy to help (such as I can) ... I wanted to see if jsonb can make it in on this 'fest (doc issues notwithstanding); it hasn't been formally reviewed yet AFAICT. So my thinking here is to get docs to minimum acceptable standards in the short term and focus on the structural code issues for the 'fest (if jsonb slips then it's moot, obviously).

Well, having reviewed the docs before Andrew sent them in, I felt they already *were* minimally acceptable. Certainly they're as complete as the original JSON docs were. Or is this just about whitespace and line breaks?
Re: [HACKERS] jsonb and nested hstore
On 01/28/2014 10:56 AM, Alvaro Herrera wrote:
> Josh Berkus escribió:
>> Or is this just about whitespace and line breaks?
> If the docs are going to be rehauled, please ignore my whitespace comments.

I'm sure you'll find plenty to criticize in my version. ;-)
Re: [HACKERS] proposal: hide application_name from other users
On 01/28/2014 07:27 AM, Greg Stark wrote:
> Why is application_name useful for users who aren't the DBA and aren't the user in question? The sql_query would probably be more useful than application_name, but we hide that...

I have non-privileged monitoring scripts that do counts of connections by application name all the time, as a way of checking for runaway applications, and I would be quite put out by restricting this to superusers. Really, the only way we're going to solve this is to make column permissions on special system views fully configurable. For example, I would really like to GRANT an unprivileged user access to the WAL columns in pg_stat_replication so that I can monitor replication delay without granting superuser permissions.
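The sort of monitoring query I mean is trivial, and works today from any unprivileged role -- which is exactly what the proposed restriction would break:

```sql
-- Count connections per application, run as an unprivileged role.
SELECT application_name, count(*) AS connections
FROM pg_stat_activity
GROUP BY application_name
ORDER BY connections DESC;
```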
Re: [HACKERS] proposal: hide application_name from other users
On 01/28/2014 12:10 PM, Tom Lane wrote:
> Josh Berkus <j...@agliodbs.com> writes:
>> For example, I would really like to GRANT an unpriv user access to the WAL columns in pg_stat_replication so that I can monitor replication delay without granting superuser permissions.
> Just out of curiosity, why is that superuser-only at all? AFAICS the hidden columns are just some LSNs ... what is the security argument for hiding them in the first place?

Beats me; I can't find any discussion on it at all.
Re: [HACKERS] Standalone synchronous master
On 01/26/2014 07:56 PM, Rajeev Rastogi wrote:
> I shall rework to improve this patch. Below is a summary of all discussions, which will be used as input for improving the patch:
> 1. Method of degrading the synchronous mode:
>    a. Expose the configuration variable to new SQL-callable functions.
>    b. Using ALTER SYSTEM SET.
>    c. Auto-degrade using some sort of configuration parameter, as done in the current patch.
>    d. Or maybe a combination of the above, which DBAs can use depending on their use cases.
>    We can discuss further to decide on one of these approaches.
> 2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.
> 3. A better monitoring/administration interface, which can be even better if it is made a generic trap system. I shall propose a better approach for this.
> 4. Send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode.
> 5. Please add more if I am missing something.

I think we actually need two degrade modes:

A. Degrade once: if the sync standby connection is ever lost, degrade and do not resync.

B. Reconnect: if the sync standby catches up again, return it to sync status.

The reason you'd want "degrade once" is to avoid the flaky-network issue where you're constantly degrading and then reattaching the sync standby, resulting in horrible performance. If we did offer "degrade once", though, we'd need some easy way to determine that the master was in a state of permanent degrade, and a command to make it resync. Discuss?
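As a sketch of option 1(b): an external monitor could implement "degrade once" itself on 9.4 by polling for attached sync standbys and, if none remain, clearing synchronous_standby_names until an operator intervenes. This is an illustration of the approach, not the patch's behavior:

```sql
-- Poll: is any synchronous standby still attached?
SELECT count(*) AS sync_standbys
FROM pg_stat_replication
WHERE sync_state = 'sync';

-- If the count is zero, the monitor degrades the master (ALTER SYSTEM
-- requires 9.4). Re-setting the GUC later is the manual "resync".
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();
```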
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 07:22 PM, Alvaro Herrera wrote:
> If you ask me, I'd like autovac to know when not to run (or rather, to wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system. We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said "but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards" --- and that was the end of it.

Anything which depends on a timing-based feedback loop is going to be hopeless. Saying "autovac shouldn't run if load is high" sounds like a simple statement, until you actually try to implement it.
Re: [HACKERS] new json funcs
On 01/24/2014 12:59 PM, Andrew Dunstan wrote:
> On 01/24/2014 03:40 PM, Laurence Rowe wrote:
>> For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset.
> It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way.

FWIW, I prefer the parameter to having differently named functions.
Re: [HACKERS] Standalone synchronous master
On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
> ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app.

I don't feel that "we'll never do auto-degrade" is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*. I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it.
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 12:34 PM, Joshua D. Drake wrote:
> Hello, I have run into yet another situation where there was an assumption that autovacuum was keeping up, and it wasn't. It was caused by autovacuum quitting because another process requested a lock. In turn we received a ton of bloat on pg_attribute, which caused all kinds of other issues (as can be expected). The more I run into it, the more it seems like autovacuum should behave like vacuum, in that it gets precedence when it is running. First come, first served, as they say. Thoughts?

If we let autovacuum block user activity, a lot more people would turn it off. Now, if you were to argue that we should have some way to monitor the tables which autovac can never touch because of conflicts, I would agree with you.
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 02:17 PM, Magnus Hagander wrote:
> FWIW, I have a patch around somewhere, never properly cleaned up for submission, that simply added a counter to pg_stat_user_tables indicating how many times vacuum had aborted on that specific table. If that's enough info (it was for my case) to cover this case, I can try to dig it out again and clean it up...

It would be 100% more information than we currently have. How much more difficult would it be to count completed autovacuums as well? It's really the ratio of the two that matters ...
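The kind of report I have in mind, assuming Magnus's patch adds an aborted-vacuum counter (autovacuum_aborted_count is an invented name; autovacuum_count is a real pg_stat_user_tables column):

```sql
-- autovacuum_aborted_count is hypothetical, per the proposed patch;
-- autovacuum_count already exists in pg_stat_user_tables.
SELECT relname,
       autovacuum_count,
       autovacuum_aborted_count,
       round(autovacuum_aborted_count::numeric
             / nullif(autovacuum_count + autovacuum_aborted_count, 0), 2)
         AS abort_ratio
FROM pg_stat_user_tables
ORDER BY abort_ratio DESC NULLS LAST;
```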
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 02:55 PM, Josh Berkus wrote:
> On 01/23/2014 02:17 PM, Magnus Hagander wrote:
>> FWIW, I have a patch around somewhere that I never cleaned up properly for submissions that simply added a counter to pg_stat_user_tables indicating how many times vacuum had aborted on that specific table. If that's enough info (it was for my case) to cover this case, I can try to dig it out again and clean it up...
> It would be 100% more information than we currently have. How much more difficult would it be to count completed autovacuums as well? It's really the ratio of the two which matters ...

Actually, now that I think about it, the ratio of the two doesn't matter as much as whether the most recent autovacuum aborted or not.
Re: [HACKERS] proposal: hide application_name from other users
On 01/21/2014 05:21 PM, Andres Freund wrote:
> I think the only realistic thing is a monitoring capability, like we have for replication. GRANT/REVOKE doesn't even come close to being able to generically allow granting permissions of even the moderate complexity pg_stat_get_activity() has.

That would work for me, personally. I don't know how it would work for anyone else.
Re: [HACKERS] proposal: hide application_name from other users
On 01/21/2014 05:22 PM, Mark Kirkwood wrote:
> If said malicious attacker can log into postgres and issue its own queries, and connect to other databases, then you are in serious trouble already. I also wonder that if such an attacker knows the application name, that would suggest that they have access to the application server and are able to read its config files ... which would probably also contain the host and db name too (and possibly the password in some unfortunate cases)!

Common case: multitenant shared hosting on a public cloud.

1. Attacker writes a tool which exploits VulnerableApplication and takes it over.
2. They exploit SiteX, running that web app.
3. Using SiteX's database credentials, they check pg_stat_activity and see what other hosts are running VulnerableApplication.
4. They then infect the other hosts running VulnerableApplication.

Alternately:

4. They use VulnerableApplication's predictable password-generating flaw to log into the other databases, or try the default password which ships with the app.

However, thinking about the case above, there are a number of caveats and workarounds which make it not that interesting an exploit:

A. It's easy to find VulnerableApplication simply by scanning the web. Easier, in fact, than the above, if you have an IP block to start with, and you would.
B. Most applications don't actually set application-specific application names anyway (psycopg2, libpq).
C. It would be trivially easy for a DBA concerned about security to obfuscate application names in a way which would not be easy for an attacker to analyze.
D. Don't use default passwords. Also, the attacker merely needs to try each database in turn anyway.

Given the above, I think this specific patch falls into the broad class of things we would like to have in a multi-tenant toolkit (and is far from the most useful of those), which would include:

* hiding application_name, user_name, and database_names from users of other databases
* a local superuser who can create per-database users and extensions from an approved list
* the ability to block users from changing some resource limits (work_mem, for example)
* per-database logging (could be done with log hooks; just needs a contrib)

It seems to me that it's not terribly useful to fix one item on the above list without having at least a *plan* to address the others. This really needs to be part of a comprehensive system, not piecework, or we'll end up with a bunch of little security options which don't work together. Probably Heroku has some more specific exploit case to be concerned about here; if so, might I suggest taking it up with the -security list?
[HACKERS] pg_isready and older versions
All,

pg_isready works against older versions of PostgreSQL. Does anyone know if there's a limit to that? The v3 protocol change? Something else? Backwards compatibility ought to be in its docs, but to fix that I need to know what version it's compatible *to*.
Re: [HACKERS] proposal: hide application_name from other users
On 01/21/2014 04:12 AM, Stephen Frost wrote:
> It also means that monitoring tools must run as superuser to see information they require, which to me is a total showstopper. We've already got *far* too much of that going on for my taste. I'd love to see a comprehensive solution to this problem which allows monitoring systems to run w/o superuser privileges.

Agreed. I just ran into the issue that getting replication-lag data out of pg_stat_replication requires superuser privileges, which is annoying; without that, I could run HandyRep entirely as an unprivileged user, but that one fact requires the handyrep database user to be a superuser. It would be really nice to be able to GRANT/REVOKE on some of these special system views ...
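In the meantime, the usual workaround is a SECURITY DEFINER wrapper owned by a superuser. A sketch -- the "monitor" role name is an assumption, and the column list is trimmed for brevity:

```sql
-- Run as superuser. "monitor" is an assumed unprivileged role name.
CREATE OR REPLACE FUNCTION replication_status()
RETURNS TABLE (client_addr inet, state text,
               sent_location text, replay_location text)
LANGUAGE sql SECURITY DEFINER AS $$
    SELECT client_addr, state,
           sent_location::text, replay_location::text
    FROM pg_stat_replication;
$$;

REVOKE ALL ON FUNCTION replication_status() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION replication_status() TO monitor;
```

It works, but every monitoring tool has to ship and maintain its own copy of this boilerplate, which is the annoyance.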
Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)
Mel,

So we have a few interested parties. What do we need to do to set up the Collab session?
Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it
On 01/16/2014 10:46 AM, Tom Lane wrote:
> I'm fine if the proposal is that postgresql.conf include "include_dir conf.d" by default (where that's read as relative to postgresql.conf's own directory). Even better if it's not terribly difficult for a packager to change that, because I think some will want to. We could possibly reduce the need for packagers to change it if we made it be "include_dir postgresql.d", because conf.d is a damn generic name for something that might be in the same /etc directory as configs for other packages.

FWIW, this is what I was proposing. We have an "include_dir postgresql.conf.d" line currently in the stock postgresql.conf, but it's disabled (commented out) by default. I'd just like it enabled by default, and to pass a suggestion to the packagers that they pick an appropriate directory and enable it by default.
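Concretely, the whole change amounts to uncommenting one directive in the sample postgresql.conf (directory name as discussed in-thread; packagers could still override it):

```
# postgresql.conf -- ships commented out today; the proposal is simply
# to enable it by default.  Path is read relative to postgresql.conf.
include_dir = 'conf.d'          # include files ending in '.conf'
```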
Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it
On 01/16/2014 07:32 AM, Christian Kruse wrote:
> Hi Alvaro,
> On 16/01/14 10:21, Alvaro Herrera wrote:
>> 1. it is to be read automatically by the server without need for an "include_dir conf.d" option in the main postgresql.conf.
> +1
>> 4. there is no point in disabling it, and thus we offer no mechanism to do that.
> Not only is there "no point" in disabling it, disabling it makes this feature nearly useless. One can't rely on it if the distro may disable it. There are so many out there; it will never be a reliable feature if it can be disabled.

It would make *my* life vastly easier if we could mandate things like the presence and relative directory of a conf.d. However, if Apache can't do it, we certainly can't. Ultimately, we cannot impose things on distributions which they are unwilling to support; Debian, for one, will happily fork PostgreSQL rather than accept directory assignments which don't meet their standards. Also, enough people install PostgreSQL from source or using custom packages to make for a high degree of variation anyway. That's why I was advocating changing only the *defaults*, not mandating anything. Actual directory locations and usage should be configurable by distros, packagers, and users.
Re: [HACKERS] dblink performance regression
On 12/07/2013 05:50 PM, Joe Conway wrote:
> On 12/07/2013 05:41 PM, Fabrízio de Royes Mello wrote:
>> On Sat, Dec 7, 2013 at 11:20 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
>>> IMHO it is more elegant to create a procedure to encapsulate the code to avoid redundancy. Yep, perhaps something like PQsetClientEncodingIfDifferent or similar would make sense.
>> Well, I think at this first moment we can just create a procedure inside the dblink contrib and not touch libpq. Maybe a libpq function could be done for 9.4, but not for back branches.
> I don't think it makes sense to create a new function in dblink either -- we're only talking about two lines of added redundancy, which is fewer lines of code than a new function would add. But if we create PQsetClientEncodingIfDifferent() (or whatever) we can remove those extra lines in 9.4 ;-)

Hey, since we're about to do 9.3.3: was this patch ever committed?
[HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it
Hackers,

ALTER SYSTEM SET has been committed, and the recovery.conf GUCs are being reviewed. I'm going to make a last case for conf.d in relation to these two patches before 9.4 goes out the door.

In 9.3, we added support for a config directory (conf.d), but it is disabled by default. For tool authors, this makes conf.d useless, since you never know, on any given installation, whether it will be present/enabled or not. While we don't want to prevent users from disabling it, conf.d only becomes useful if it's present by default. There's a simple reason why: if you want to write a tool which manages postgresql.conf, you don't want the user to have to manually edit postgresql.conf (and create a directory) in order to enable the tool.

I'm particularly thinking about this in relation to the merger of recovery.conf and postgresql.conf. There are several tools already (RepMgr, OmniPITR, HandyRep, pgPool, etc.) which manage recovery.conf separately from postgresql.conf. These tools will want to continue managing recovery.conf as a separate file, even if it's included in postgresql.conf now. If conf.d exists by default, and is enabled in postgresql.conf by default, this is easy: the tool just drops a recovery.conf file into conf.d. Changing file locations and variable names is a fairly simple exercise in backwards compatibility. If conf.d does NOT exist by default, things become complicated, and backwards compatibility for replication management tools becomes much harder.

Yes, I'm also arguing that postgresql.auto.conf should go into conf.d. I said I'd bring that up again after ALTER SYSTEM SET was committed, and here it is.
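The drop-in I have in mind would look something like this. Both the filename and the idea that the old recovery.conf settings become plain GUCs are assumptions of the pending recovery.conf-merger patch, not committed behavior:

```
# conf.d/99-recovery.conf -- dropped in by a replication manager
# (hypothetical filename; assumes the recovery.conf GUC merger lands)
standby_mode = on
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
```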
Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it
> ... / vs /var), then postgresql.auto.conf should stay in the data directory.

Ah, I'd forgotten about that line of argument. Where is auto.conf now?
Re: [HACKERS] extension_control_path
>> But you can have a single $SHAREDIR per set of executables, right?
> Please read the following email to know what they asked for and how they operate OpenShift: http://www.postgresql.org/message-id/341087492.2585530.1376776393038.javamail.r...@redhat.com

FWIW, I'm talking with Amazon later this week and checking how they're handling their tenant-loadable extensions. I'd like to come up with one solution here which covers all cloud providers.
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
Mel,

> I'm the chair for Linux Storage, Filesystem and Memory Management Summit 2014 (LSF/MM). A CFP was sent out last month (https://lwn.net/Articles/575681/) that you may have seen already. In recent years we have had at least one topic that was shared between all three tracks that was led by a person outside of the usual kernel development community. I am checking if the PostgreSQL community would be willing to volunteer someone to lead a topic discussing PostgreSQL performance with recent kernels or to highlight regressions or future developments you feel are potentially a problem. With luck someone suitable is already travelling to the Collaboration Summit (http://events.linuxfoundation.org/events/collaboration-summit) and it would not be too inconvenient to drop in for LSF/MM as well.

We can definitely get someone there. I'll certainly be there; I'm hoping to get someone who has closer involvement with our kernel interaction as well.

> There are two reasons why I'm suggesting this. First, PostgreSQL was the basis of a test used to highlight a scheduler problem around kernel 3.6, but otherwise in my experience it is rare that PostgreSQL is part of a bug report. I am skeptical this particular bug report was a typical use case for PostgreSQL (pgbench, read-only, many threads, very small in-memory database). I wonder why reports related to PostgreSQL are not more common. One assumption would be that PostgreSQL is perfectly happy with the current kernel behaviour, in which case our discussion here is done.

To be frank, it's because most people are still running on 2.6.19, and as a result are completely unaware of recent developments. Second, because there's no obvious place to complain to ... lkml doesn't welcome bug reports, and where else do you go?

> Does the PostgreSQL community have a problem with recent kernels, particularly with respect to the storage, filesystem or memory management layers? If yes, do you have some data that can highlight this and can you volunteer someone to represent your interests to the kernel community?

Yes, and yes.

> Are current developments in the IO layer counter to the PostgreSQL requirements? If so, what developments, why are they a problem, do you have a suggested alternative or some idea of what we should watch out for?

Mostly the issue is changes to the IO scheduler which improve one use case at the expense of others, or set defaults which emphasize desktop hardware over server hardware. What also came up with the recent change to LRU is that the Postgres community apparently has more experience than the Linux community with buffer-clearing algorithms, and we ought to share that.

> The track topic would be up to you but just as a hint, we'd need something a lot more concrete than "you should test more".

How about "don't add major IO behavior changes with no backwards-compatibility switches"? ;-) Seriously, one thing I'd like to get out of Collab would be a reasonable regimen for testing database performance on Linux kernels.
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 01/13/2014 10:51 AM, Kevin Grittner wrote: How about "don't add major IO behavior changes with no backwards-compatibility switches"? ;-) I notice, Josh, that you didn't mention the problems many people have run into with Transparent Huge Page defrag and with NUMA access. Is that because there *are* configuration options that allow people to get decent performance once the issue is diagnosed? It seems like maybe there could be a better way to give a heads-up on hazards in a new kernel to the database world, but I don't know quite what that would be. For all I know, it is already available if you know where to look. Well, it was the lack of sysctl options which took the 2Q change from annoyance to potential disaster. We can't ever get away from the possibility that the Postgres use-case might be the minority use-case, and we might have to use non-default options. It's when those options aren't present *at all* that we're stuck. However, I agree that a worthwhile thing to talk about is having some better channel to notify the Postgres (and other DB) communities about major changes to IO and Memory management. Wanna go to Collab? Seriously, one thing I'd like to get out of Collab would be a reasonable regimen for testing database performance on Linux kernels. ... or perhaps you figure this is what would bring such issues to the community's attention before people are bitten in production environments? That, too. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
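[Editor's note: the Transparent Huge Page workaround alluded to above is usually applied as a boot-time setting. This is a hedged sketch of the commonly circulated form, not something spelled out in the thread; the exact /sys paths and accepted values vary by kernel version and distribution, so verify them on your system first.]

```
# e.g. run from /etc/rc.local -- disable THP defragmentation, which is the
# behavior associated with the stalls described above on database workloads:
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# some installations disable THP altogether:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```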
Re: [HACKERS] [Lsf-pc] Linux kernel impact on PostgreSQL performance
Everyone, I am looking for one or more hackers to go to Collab with me to discuss this. If you think that might be you, please let me know and I'll look for funding for your travel. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 01/13/2014 02:26 PM, Mel Gorman wrote: Really? zone_reclaim_mode is often a complete disaster unless the workload is partitioned to fit within NUMA nodes. On older kernels enabling it would sometimes cause massive stalls. I'm actually very surprised to hear it fixes anything and would be interested in hearing more about what sort of circumstances would convince you to enable that thing. So the problem with the default setting is that it pretty much isolates all FS cache for PostgreSQL to whichever socket the postmaster is running on, and makes the other FS cache unavailable. This means that, for example, if you have two memory banks, then only one of them is available for PostgreSQL filesystem caching ... essentially cutting your available cache in half. And however slow moving cached pages between memory banks is, it's an order of magnitude faster than moving them from disk. But this isn't how the NUMA stuff is configured; it seems to assume that it's less expensive to get pages from disk than to move them between banks, so whatever you've got cached on the other bank, it flushes it to disk as fast as possible. I understand the goal was to make memory usage local to the processors stuff was running on, but that includes an implicit assumption that no individual process will ever want more than one memory bank worth of cache. So disabling all of the NUMA optimizations is the way to go for any workload I personally deal with. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
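[Editor's note: the sysctl at issue here is vm.zone_reclaim_mode. A hedged sketch of the "disable the NUMA optimizations" configuration described above; the numactl interleaving line is a commonly suggested companion step, not something stated in this thread.]

```
# /etc/sysctl.conf -- don't reclaim node-local page cache just to keep
# allocations on the local NUMA node; lets PostgreSQL use FS cache on
# all memory banks (0 disables zone reclaim):
vm.zone_reclaim_mode = 0

# Optionally start the postmaster with its memory interleaved across nodes:
#   numactl --interleave=all pg_ctl -D /path/to/data start
```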
Re: [HACKERS] plpgsql.consistent_into
On 01/13/2014 03:41 PM, Florian Pflug wrote: It therefore isn't an oversight that SELECT ... INTO allows multiple result rows but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and for a reason. We shouldn't be second-guessing ourselves by changing that later - not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't. (And yeah, personally I'd prefer if we'd complain about multiple rows. But it's IMHO just too late for that) I *really* don't want to go through all my old code to find places where I used SELECT ... INTO just to pop off the first row, and ignored the rest. I doubt anyone else does, either. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
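[Editor's note: for readers following along, the existing semantics being defended look like this in standard PL/pgSQL. Plain INTO keeps the first returned row and silently discards the rest (and without ORDER BY that row is arbitrary), while INTO STRICT is the existing opt-in check.]

```sql
DO $$
DECLARE
  v_name text;
BEGIN
  -- Plain INTO: the first returned row wins, remaining rows are ignored.
  SELECT relname INTO v_name FROM pg_class;

  -- INTO STRICT: raises an error unless exactly one row is returned.
  SELECT relname INTO STRICT v_name
  FROM pg_class WHERE relname = 'pg_class';
END;
$$;
```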
Re: [HACKERS] plpgsql.consistent_into
On 01/13/2014 04:20 PM, Jim Nasby wrote: On 1/13/14, 5:57 PM, Josh Berkus wrote: I *really* don't want to go through all my old code to find places where I used SELECT ... INTO just to pop off the first row, and ignored the rest. I doubt anyone else does, either. Do you regularly have use cases where you actually want just one RANDOM row? I suspect the far more likely scenario is that people write code assuming they'll get only one row and they'll end up with extremely hard to trace bugs if that assumption is ever wrong. Regularly? No. But I've seen it, especially as part of a "does this query return any rows?" test. That's not the best way to test that, but that doesn't stop a lot of people doing it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] plpgsql.consistent_into
On 01/13/2014 05:10 PM, Jim Nasby wrote: On 1/13/14, 7:06 PM, Josh Berkus wrote: Regularly? No. But I've seen it, especially as part of a "does this query return any rows?" test. That's not the best way to test that, but that doesn't stop a lot of people doing it. Right, and I certainly don't want to force anyone to rewrite all their code. But I'd certainly like a safer default so people don't mistakenly go the "multiple rows is OK" route without doing so very intentionally. The problem is that if you change the default, you're creating an unexpected barrier to upgrading. I just don't think that it's worth doing so in order to meet some standard of code neatness, especially in plpgsql, the unwanted bastard child of SQL and Ada. For people who want to enable this in order to prevent stupid query bugs from creeping into their plpgsql, that's great, let's have an easy option to turn on. But it's hard enough to get people to upgrade as it is. If we're going to add an upgrade landmine, it better be for something really important. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Where do we stand on 9.3 bugs?
On 01/13/2014 07:16 AM, Tom Lane wrote: What remaining issues are there blocking a 9.3.3 release? I know that there were unresolved multixact issues when we put out 9.3.2 --- are those all dealt with now? What else do people see as release-blockers? I see this bug as a release-blocker. It's a fairly crippling replication bug for a bunch of users (5 reports to date): http://archives.postgresql.org/message-id/20140106134815.GE28320%40alap2.anarazel.de Despite actually dating back to 9.0, something in the 9.3.2/9.2.6 updates is causing users to hit it now. And for those who do hit it, replication is impossible and there's no workaround. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 01/13/2014 05:30 PM, Dave Chinner wrote: On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote: No matter what default NUMA allocation policy we set, there will be an application for which that behaviour is wrong. As such, we've had tools for setting application specific NUMA policies for quite a few years now. e.g: Yeah, that's why I personally regard the NUMA stuff as just an information problem; there's an easy configuration variable, and you can't please everyone (and our project would hardly be one to point fingers about sub-optimal default configurations). I was responding to a question of "what's wrong with the default setting?" Personally, I have my doubts that the NUMA memory isolation, as currently implemented, accomplishes what it wants to do. But that's a completely different discussion. The real issue there was that our users had never heard of this change until suddenly half their RAM became unavailable. So the solution is for our project to somehow have these kinds of changes flagged for our attention so that we can update our docs. The kernel change list is quite voluminous, and it's very easy to miss changes of significance in it. The easiest way to do this is going to be getting involved in kernel-database performance testing. Of course, we are annoyed that we finally removed the main reason to modify sysctl.conf (SHMMAX), and here we are needing to advise users about sysctl again. :-( I'm much more bothered by the introduction of 2Q logic, since that comes without a configuration variable to modify its behavior. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 01/13/2014 05:48 PM, Andres Freund wrote: On 2014-01-13 10:56:00 -0800, Josh Berkus wrote: Well, it was the lack of sysctl options which took the 2Q change from annoyance to potential disaster. We can't ever get away from the possibility that the Postgres use-case might be the minority use-case, and we might have to use non-default options. It's when those options aren't present *at all* that we're stuck. Unless I am missing something the kernel's going further *away* from a simple 2Q system, not the contrary. Well, they implemented a 2Q system and deliberately offered no sysctl variables to modify its behavior. Now they're talking about implementing an ARC system -- which we know the perils of -- again, without any configuration variables in case the default behavior doesn't work for everyone. And it's highly unlikely that an ARC which is designed for desktop and/or file server users -- let alone mobile users -- is going to be optimal for PostgreSQL out of the box. In fact, I'd assert that it's flat-out impossible to engineer an ARC which will work for multiple different use cases without user-level configuration. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
All, I'm leading this off with a review of the features offered by the actual patch submitted. My general discussion of the issues of Sync Degrade, which justifies my specific suggestions below, follows that. Rajeev, please be aware that other hackers may have different opinions than me on what needs to change about the patch, so you should collect all opinions before changing code. === Add a new parameter : synchronous_standalone_master = on | off I think this is a TERRIBLE name for any such parameter. What does "synchronous standalone" even mean? A better name for the parameter would be auto_degrade_sync_replication, or synchronous_timeout_action = error | degrade, or something similar. It would be even better for this to be a mode of synchronous_commit, except that synchronous_commit is heavily overloaded already. Some issues raised by this log script:
LOG: standby tx0113 is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
-- standby wal receiver on the standby is killed (SIGKILL)
LOG: unexpected EOF on standby connection
LOG: not waiting for standby synchronization
-- restart standby so that it connects again
LOG: standby tx0113 is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
-- standby wal receiver is first stopped (SIGSTOP) to make sure
The "not waiting for standby synchronization" message should be marked something stronger than LOG. I'd like ERROR. Second, you have the master resuming sync rep when the standby reconnects. How do you determine when it's safe to do that? You're making the assumption that you have a failing sync standby instead of one which simply can't keep up with the master, or a flakey network connection (see discussion below). a. Master_to_standalone_cmd: To be executed before master switches to standalone mode. b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.
I'm not at all clear what the difference between these two commands is. When would one be executed, and when would the other be executed? Also, renaming ... Missing features: a) we should at least send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode. I know others have dismissed this idea as too talky, but from my perspective, the agreement with the client for each synchronous commit is being violated, so each and every synchronous commit should report failure to sync. Also, having a warning on every commit would make it easier to troubleshoot degraded mode for users who have ignored the other warnings we give them. b) pg_stat_replication needs to show degraded mode in some way, or we need pg_sync_rep_degraded(), or (ideally) both. I'm also wondering if we need a more sophisticated approach to wal_sender_timeout to go with all this. === On 01/11/2014 08:33 PM, Bruce Momjian wrote: On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote: In other words, if we're going to have auto-degrade, the most intelligent place for it is in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we do *inside* Postgres is going to have a really, really hard time determining when to degrade. Well, one goal I was considering is that if a commit is hung waiting for slave sync confirmation, and the timeout happens, then the mode is changed to degraded and the commit returns success. I am not sure how you would do that in an external tool, meaning there is going to be a period where commits fail, unless you think there is a way that when the external tool changes the mode to degrade that all hung commits complete. That would be nice. Realistically, though, that's pretty unavoidable. Any technique which waits a reasonable interval to determine that the replica isn't going to respond is liable to go beyond the application's timeout threshold anyway.
There are undoubtedly exceptions to that, but it will be the case a lot of the time -- how many applications are willing to wait *minutes* for a COMMIT? I also don't see any way to allow the hung transactions to commit without allowing the walsender to make a decision on degrading. As I've outlined elsewhere (and below), the walsender just doesn't have enough information to make a good decision. On 01/11/2014 08:52 PM, Amit Kapila wrote: It is better than async mode in a way such that in async mode it never waits for commits to be written to standby, but in this new mode it will do so unless it is not possible (all sync standbys go down). Can't we use existing wal_sender_timeout, or even if user expects a different timeout because for this new mode, he expects master to wait more before it starts operating like standalone sync master, we can provide a new parameter. One of the reasons that there's so much disagreement about this feature is that most
Re: [HACKERS] Standalone synchronous master
On 01/12/2014 12:35 PM, Stephen Frost wrote: * Josh Berkus (j...@agliodbs.com) wrote: You don't want to handle all of those issues the same way as far as sync rep is concerned. For example, if the standby is restarting, you probably want to wait instead of degrading. *What*?! Certainly not in any kind of OLTP-type system; a system restart can easily take minutes. Clearly, you want to resume once the standby is back up, which I feel like the people against an auto-degrade mode are missing, but holding up a commit until the standby finishes rebooting isn't practical. Well, then that becomes a reason to want better/more configurability. In the couple of sync rep sites I admin, I *would* want to wait. There's also the issue that this patch, and necessarily any walsender-level auto-degrade, has IMHO no safe way to resume sync replication. This means that any user who has a network or storage blip once a day (again, think AWS) would be constantly in degraded mode, even though both the master and the replica are up and running -- and it will come as a complete surprise to them when they lose the master and discover that they've lost data. I don't follow this logic at all - why is there no safe way to resume? You wait til the slave is caught up fully and then go back to sync mode. If that turns out to be an extended problem then an alarm needs to be raised, of course. So, if you have auto-resume, how do you handle the flaky network case? And how would an alarm be raised? On 01/12/2014 12:51 PM, Kevin Grittner wrote: Josh Berkus j...@agliodbs.com wrote: I know others have dismissed this idea as too talky, but from my perspective, the agreement with the client for each synchronous commit is being violated, so each and every synchronous commit should report failure to sync. Also, having a warning on every commit would make it easier to troubleshoot degraded mode for users who have ignored the other warnings we give them.
I agree that every synchronous commit on a master which is configured for synchronous replication which returns without persisting the work of the transaction on both the (local) primary and a synchronous replica should issue a WARNING. That said, the API for some connectors (like JDBC) puts the burden on the application or its framework to check for warnings each time and do something reasonable if found; I fear that a Venn diagram of those shops which would use this new feature and those shops that don't rigorously look for and reasonably deal with warnings would have significant overlap. Oh, no question. However, having such a WARNING would help with interactive troubleshooting once a problem has been identified, and that's my main reason for wanting it. Imagine the case where you have auto-degrade and a flaky network. The user would experience problems as performance problems; that is, some commits take minutes on-again, off-again. They wouldn't necessarily even LOOK at the sync rep settings. So next step is to try walking through a sample transaction on the command line, and then the DBA/consultant gets WARNING messages, which gives an idea where the real problem lies. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 06:27 PM, Bruce Momjian wrote: How would that work? Would it be a tool in contrib? There already is a timeout, so if a tool checked more frequently than the timeout, it should work. The durable notification of the admin would happen in the tool, right? Well, you know what tool *I'm* planning to use. Thing is, when we talk about auto-degrade, we need to determine things like "Is the replica down or is this just a network blip?" and take action according to the user's desired configuration. This is not something, realistically, that we can do on a single request. Whereas it would be fairly simple for an external monitoring utility to do: 1. decide replica is offline for the duration (several poll attempts have failed) 2. Send ALTER SYSTEM SET to the master and change/disable the synch_replicas. Such a tool would *also* be capable of detecting when the synchronous replica was back up and operating, and switch back to sync mode, something we simply can't do inside Postgres. And it would be a lot easier to configure an external tool with monitoring system integration so that it can alert the DBA to degradation in a way which the DBA was liable to actually see (which is NOT the Postgres log). In other words, if we're going to have auto-degrade, the most intelligent place for it is in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we do *inside* Postgres is going to have a really, really hard time determining when to degrade. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
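[Editor's note: the two-step procedure described above is essentially a small state machine. Here is a hedged sketch in Python, purely as an illustration: the actual polling and the ALTER SYSTEM SET call are left to the surrounding tool, and the class name and thresholds are invented for the example.]

```python
class SyncRepMonitor:
    """Track consecutive replica polls; decide when to degrade and when
    it is safe to resume sync replication (decision logic only)."""

    def __init__(self, fail_threshold=3, ok_threshold=3):
        self.fail_threshold = fail_threshold  # failed polls before declaring the replica down
        self.ok_threshold = ok_threshold      # successful polls before trusting it again
        self.fails = 0
        self.oks = 0
        self.degraded = False

    def record_poll(self, replica_ok):
        """Return 'degrade', 'resume', or None (no action needed)."""
        if replica_ok:
            self.fails = 0
            self.oks += 1
            if self.degraded and self.oks >= self.ok_threshold:
                self.degraded = False
                return "resume"   # tool would re-enable the sync standby here
        else:
            self.oks = 0
            self.fails += 1
            if not self.degraded and self.fails >= self.fail_threshold:
                self.degraded = True
                return "degrade"  # tool would issue ALTER SYSTEM SET here
        return None
```

The point of the thresholds is exactly the "network blip" distinction above: a single failed poll does nothing, only a sustained outage triggers degrade, and resumption waits for the replica to look healthy repeatedly, not just once.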
Re: [HACKERS] units in postgresql.conf comments
On 01/11/2014 11:06 AM, Bruce Momjian wrote: On Wed, May 29, 2013 at 09:59:10PM -0400, Peter Eisentraut wrote: I think these sort of entries don't make much sense: #wal_sender_timeout = 60s # in milliseconds; 0 disables I think we should remove units from the comments when it's clear from the name or the default value that time units are accepted. So, is anyone doing this? Should it be a TODO item? I don't agree, actually, unless we take the next step and actually clean all the documentation garbage out of the file and leave it in the main docs and pg_settings where it belongs. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Time to do our Triage for 9.4
All, To make this easier for everyone to participate in, I've created a wiki page: https://wiki.postgresql.org/wiki/9.4CF4Triage Please add the patches you know well to the appropriate list, thanks! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 02:59 PM, Joshua D. Drake wrote: On 01/10/2014 02:47 PM, Andres Freund wrote: Really, the commits themselves are sent to the server at exactly the same speed independent of sync/async. The only thing that's delayed is the *notification* of the client that sent the commit. Not the commit itself. Which is irrelevant to the point that if the standby goes down, we are now out of business. Any continuous replication should not be a SPOF. The current behavior guarantees that a two node sync cluster is a SPOF. The proposed behavior removes that. Again, if that's your goal, then use async replication. I really don't understand the use-case here. The purpose of sync rep is to know definitively whether or not you have lost data when disaster strikes. If knowing for certain isn't important to you, then use async. BTW, people are using RAID1 as an analogy to 2-node sync replication. That's a very bad analogy, because in RAID1 you have a *single* controller which is capable of determining if the disks are in a failed state or not, and this is all happening on a single node where things like network outages aren't a consideration. It's really not the same situation at all. Also, frankly, I absolutely can't count the number of times I've had to rescue a customer or family member who had RAID1 but wasn't monitoring syslog, and so one of their disks had been down for months without them knowing it. Heck, I've done this myself. So ... the Filesystem geeks have already been through this. Filesystem clustering started out with systems like DRBD, which includes an auto-degrade option. However, DRBD with auto-degrade is widely considered untrustworthy, and that is a significant part of why DRBD isn't trusted today. From here, clustered filesystems went in two directions: RHCS added layers of monitoring and management to make auto-degrade a safer option than it is with DRBD (and still not the default option).
Scalable clustered filesystems added N-of-M quorum commit in order to support more than 2 nodes. Either of these courses is reasonable for us to pursue. What's a bad idea is adding an auto-degrade option without any tools to manage and monitor it, which is what this patch does by my reading. If I'm wrong, then someone can point it out to me. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
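[Editor's note: the N-of-M quorum idea reduces to a simple acknowledgement count. A hedged sketch of the decision rule, with names invented for illustration:]

```python
def quorum_commit(ack_flags, n_required):
    """Decide a commit's fate given one acknowledgement flag per replica.

    ack_flags:  iterable of booleans, one per replica (the M replicas).
    n_required: minimum confirmations required (the N in N-of-M).
    """
    acks = sum(1 for ok in ack_flags if ok)
    if acks >= n_required:
        return "commit"        # enough replicas hold the data; a single slow
                               # or dead replica cannot stall the cluster
    return "wait-or-degrade"   # policy decision: keep waiting, or degrade
```

The availability benefit over a strict 2-node setup is visible immediately: a 2-of-3 cluster keeps committing with one replica down, whereas 1-of-1 sync rep stalls.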
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 01:49 PM, Andres Freund wrote: On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: On 01/10/2014 07:47 AM, Bruce Momjian wrote: I know there was a desire to remove this TODO item, but I think we have brought up enough new issues that we can keep it to see if we can come up with a solution. I have added a link to this discussion on the TODO item. I think we will need at least four new GUC variables: * timeout control for degraded mode * command to run during switch to degraded mode * command to run during switch from degraded mode * read-only variable to report degraded mode I would argue that we don't need the first. We just want a command to switch synchronous/degraded, and a variable (or function) to report on degraded mode. If we have those things, then it becomes completely possible to have an external monitoring framework, which is capable of answering questions like "is the replica down or just slow?", control degrade. Oh, wait! We DO have such a command. It's called ALTER SYSTEM SET! Recently committed. So this is really a solvable issue if one is willing to use an external utility. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
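[Editor's note: a sketch of how an external utility could use that command to degrade and later restore sync rep. ALTER SYSTEM and pg_reload_conf() are real PostgreSQL features (the former new in 9.4, per the message above); the standby name is hypothetical.]

```sql
-- Degrade: stop requiring synchronous acknowledgement.
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();

-- Later, once the monitoring tool has verified the standby is caught up:
ALTER SYSTEM SET synchronous_standby_names = 'standby1';
SELECT pg_reload_conf();
```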
Re: [HACKERS] Time to do our Triage for 9.4
On 01/10/2014 01:34 PM, David Rowley wrote: On Sat, Jan 11, 2014 at 8:28 AM, Josh Berkus j...@agliodbs.com wrote: All, To make this easier for everyone to participate in, I've created a wiki page: https://wiki.postgresql.org/wiki/9.4CF4Triage Please add the patches you know well to the appropriate list, thanks! I know my own patch pretty well and from my own point of view it's very close to being about ready to go, but a good review may change that. Should we be waiting for 2nd opinions or can patch authors decide for themselves? Or were you talking only to committers? Well, I'd prefer that someone other than the patch author assess the patch state; the author is going to be characteristically optimistic. However, it's a wiki. If you put it under "good to go" someone else who disagrees can move it. IMHO, if the patch hasn't had at least one review yet (in a prior CF), though, I'd put it under "Nearly Ready". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
Robert, I think the problem here is that we tend to have a limited view of the right way to use synch rep. If I have 5 nodes, and I set 1 synchronous and the other 3 asynchronous, I've set up a known successor in the event that the leader fails. In this scenario though, if the successor fails, you actually probably want to keep accepting writes, since you weren't using synchronous for durability but for operational simplicity. I suspect there are probably other scenarios where users are willing to trade latency for improved and/or directed durability but not at the expense of availability, don't you? That's a workaround for a completely different limitation though: the inability to designate a specific async replica as "first". That is, if there were some way to do so, you would be using that rather than sync rep. Extending the capabilities of that workaround is not something I would gladly do until I had exhausted other options. The other problem is that *many* users think they can get improved availability, consistency AND durability on two nodes somehow, and to heck with the CAP theorem (certain companies are happy to foster this illusion). Having a simple, easily-accessible auto-degrade without treating degrade as a major monitoring event will feed this self-deception. I know I already have to explain the difference between "synchronous" and "simultaneous" to practically every one of my clients for whom I set up replication. Realistically, degrade shouldn't be something that happens inside a single PostgreSQL node, either the master or the replica. It should be controlled by some external controller which is capable of deciding on degrade or not based on a more complex set of circumstances (e.g. "Is the replica actually down or just slow?"). Certainly this is the case with Cassandra, VoltDB, Riak, and the other serious multinode databases. This isn't to say there isn't a lot of confusion around the issue.
Designing, implementing, and configuring different guarantees in the presence of node failures is a non-trivial problem. Still, I'd prefer to see Postgres head in the direction of providing more options in this area rather than drawing a firm line at being a CP-oriented system. I'm not categorically opposed to having any form of auto-degrade at all; what I'm opposed to is a patch which adds auto-degrade **without adding any additional monitoring or management infrastructure at all**. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] nested hstore patch
On 01/09/2014 06:12 AM, Andrew Dunstan wrote: Oleg, Please merge in the jsonb work and resubmit. See https://github.com/feodor/postgres/commits/jsonb_and_hstore I note that this repo does not apparently contain any of your latest changes. I'll go further and say that if the Hstore2 patch doesn't support JSONB for 9.4, we should postpone it to 9.5. We really don't want to get into a situation where we need an Hstore3 because we accepted an Hstore2 which needs to be rev'd for JSON. Especially since there's no good reason for the JSON changes not to be merged already. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 01/09/2014 12:05 PM, Heikki Linnakangas wrote: Actually, why is the partially-filled 00010002 file archived in the first place? Looking at the code, it's been like that forever, but it seems like a bad idea. If the original server is still up and running, and writing more data to that file, what will happen is that when the original server later tries to archive it, it will fail because the partial version of the file is already in the archive. Or worse, the partial version overwrites a previously archived more complete version. Oh! This explains some transient errors I've seen. Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner? It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] Time to do our Triage for 9.4
Hackers, Per the Developer Meeting, we are scheduled to do a final triage of 9.4 patches the week before CF4 starts, which is *now*. The goal of this triage is to divide patches already in queue into 5 groups: 1) Good To Go: patches which are 100% ready for final testing and commit. 2) Needs a Little Work: smaller patches which can be included in 9.4 if they get a few hours of love from a committer or major hacker. 3) Big Patches: big, important patches which will need a major time commitment to commit even though they are 90% ready, just due to size. 4) Not Nearly Ready: Patches which need major work and/or spec discussions before commitment. 5) WIP: any patch which is acknowledged just there for review, or any brand-new patch which wasn't in CF3 and is non-trivial. Obviously, any patches in groups 4 and 5 aren't going into 9.4. The idea is to handle patches in CF4 in this order: 1. do immediately 2. do after (1) is complete 3. assign 1 senior hacker reviewer to each patch 4. review as time permits after 1-3 5. review as time permits after 1-3 Let the triage begin! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Time to do our Triage for 9.4
On 01/08/2014 11:07 AM, David Fetter wrote: On Wed, Jan 08, 2014 at 10:45:37AM -0800, Josh Berkus wrote: Hackers, Per the Developer Meeting, we are scheduled to do a final triage of 9.4 patches the week before CF4 starts, which is *now*. The goal of this triage is to divide patches already in queue into 5 groups: With utmost respect, there are unsent patches which don't fit into the categories below. Are you saying now that the cut-off is today, and not actually 1/15, as everyone understands it to be? It has always been our policy that non-trivial patches which appear for the first time in CF4 go to the bottom of the priority list. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Time to do our Triage for 9.4
What about doing the triage the first two days or so of the CF? Then we know which patches have been submitted/updated and we haven't yet performed too many reviews of patches which aren't going to make it. If we triage the patches we already have NOW, then it'll be possible to triage the new/updated stuff which comes in in the first 48 hours of the CF. If we wait until the CF begins, we'll spend at least the first week of the CF triaging. That's why we set this schedule at the developer meeting. And besides, we already know what category *your* patch belongs in ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
On 01/08/2014 12:27 PM, Bruce Momjian wrote: I am glad Heikki and Simon agree, but I don't. ;-) The way that I understand it is that you might want durability, but might not want to sacrifice availability. Phrased that way, it makes sense, and notifying the administrator seems the appropriate action. I think there's a valid argument to want things the other way, but I find the argument not persuasive. In general, people who want auto-degrade for sync rep either: a) don't understand what sync rep actually does (lots of folks confuse synchronous with simultaneous), or b) want more infrastructure than we actually have around managing sync replicas Now, the folks who want (b) have a legitimate need, and I'll point out that we always planned to have more features around sync rep, it's just that we never actually worked on any. For example, quorum sync was extensively discussed and originally projected for 9.2, only certain hackers changed jobs and interests. If we just did the minimal change, that is, added an auto-degrade GUC and an alert to the logs each time the master server went into degraded mode, as Heikki says we'd be loading a big foot-gun for a bunch of ill-informed DBAs. People who want that are really much better off with async rep in the first place. If we really want auto-degrading sync rep, then we'd (at a minimum) need a way to determine *from the replica* whether or not it was in degraded mode when the master died. What good do messages to the master log do you if the master no longer exists? Mind you, being able to determine on the replica whether it was synchronous or not when it lost communication with the master would be a great feature to have for sync rep groups as well, and would make them practical (right now, they're pretty useless). However, I seriously doubt that someone is going to code that up in the next 5 days. -- Josh Berkus PostgreSQL Experts Inc. 
http://pgexperts.com
Re: [HACKERS] commit fest manager?
On 01/08/2014 02:04 PM, Peter Eisentraut wrote: Anyone else? Or you'll have to deal with me again? I vote for Peter. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
On 01/08/2014 01:49 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: If we really want auto-degrading sync rep, then we'd (at a minimum) need a way to determine *from the replica* whether or not it was in degraded mode when the master died. What good do messages to the master log do you if the master no longer exists? How would it be possible for a replica to know whether the master had committed more transactions while communication was lost, if the master dies without ever restoring communication? It sounds like pie in the sky from here ... Oh, right. Because the main reason for a sync replica degrading is that it's down. In which case it isn't going to record anything. This would still be useful for sync rep candidates, though, and I'll document why below. But first, lemme demolish the case for auto-degrade. So here's the case that we can't possibly solve for auto-degrade. Anyone who wants auto-degrade needs to come up with a solution for this case as a first requirement: 1. A data center network/power event starts. 2. The sync replica goes down. 3. A short time later, the master goes down. 4. Data center power is restored. 5. The master is fried and is a permanent loss. The replica is ok, though. Question: how does the DBA know whether data has been lost or not? With current sync rep, it's easy: no data was lost, because the master stopped accepting writes once the replica went down. If we support auto-degrade, though, there's no way to know; the replica doesn't have that information, and anything which was on the master is permanently lost. And the point several people have made is: if you can live with indeterminacy, then you're better off with async rep in the first place. Now, what we COULD definitely use is a single-command way of degrading the master when the sync replica is down. Something like ALTER SYSTEM DEGRADE SYNC. 
Right now you have to push a change to the conf file and reload, and there's no way to salvage the transaction which triggered the sync failure. This would be a nice 9.5 feature. HOWEVER, we've already kind of set up an indeterminate situation with allowing sync rep groups and candidate sync rep servers. Consider this: 1. Master server A is configured with sync replica B and candidate sync replica C 2. A rolling power/network failure event occurs, which causes B and C to go down sometime before A, and all of them to go down before the application does. 3. On restore, only C is restorable; both A and B are a total loss. Again, we have no way to know whether or not C was in sync replication when it went down. If C went down before B, then we've lost data; if B went down before C, we haven't. But we can't find out. *This* is where it would be useful to have C log whenever it went into (or out of) synchronous mode. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
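For reference, the conf-file push described above can, as of 9.4's ALTER SYSTEM, be reduced to two statements run as superuser on the master; the proposed DEGRADE SYNC syntax is hypothetical, but this sketch works today (setting synchronous_standby_names to the empty string makes no standby synchronous):

```sql
-- Manually degrade the master to async replication after losing the sync standby.
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();  -- after the reload, commits no longer wait for a standby
```

This still doesn't salvage a transaction already stuck waiting at the moment of failure in all cases, which is part of why a dedicated command would be nicer.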
Re: [HACKERS] Standalone synchronous master
Stephen, I'm aware, my point was simply that we should state, up-front in 25.2.7.3 *and* where we document synchronous_standby_names, that it requires at least three servers to be involved to be a workable solution. It's a workable solution with 2 servers. That's a low-availability, high-integrity solution; the user has chosen to accept a higher risk of refusing writes in exchange for never losing one. That's a perfectly valid configuration, and I believe that NTT runs several applications this way. In fact, that can already be looked at as a kind of auto-degrade mode: if there aren't two nodes, then the database goes read-only. Might I also point out that transactions are synchronous or not individually? The sensible configuration is for only the important writes being synchronous -- in which case auto-degrade makes even less sense. I really think that demand for auto-degrade is coming from users who don't know what sync rep is for in the first place. The fact that other vendors are offering auto-degrade as a feature instead of the ginormous foot-gun it is adds to the confusion, but we can't help that. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
On 01/08/2014 03:18 PM, Stephen Frost wrote: Do you really feel that a WARNING and increasing the docs to point out that three systems are necessary, particularly under the 'high availability' documentation and options, is a bad idea? I fail to see how that does anything but clarify the use-case for our users. I think the warning is dumb, and that the suggested documentation change is insufficient. If we're going to clarify things, then we need to have a full-on several-page doc showing several examples of different sync rep configurations and explaining their tradeoffs (including the different sync modes and per-transaction sync). Anything short of that is just going to muddy the waters further. Mind you, someone needs to take a machete to the HA section of the docs anyway. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
On 01/08/2014 03:27 PM, Tom Lane wrote: Good point, but C can't solve this for you just by logging. If C was the first to go down, it has no way to know whether A and B committed more transactions before dying; and it's unlikely to have logged its own crash, either. Sure. But if we *knew* that C was not in synchronous mode when it went down, then we'd expect some data loss. As you point out, though, the converse is not true; even if C was in sync mode, we don't know that there's been no data loss, since B could come back up as a sync replica before going down again. What we lack, and should work on, is a way for sync mode to have M larger than one. AFAICS, right now we'll report commit as soon as there's one up-to-date replica, and some high-reliability cases are going to want more. Yeah, we talked about having this when sync rep originally went in. It involves a LOT more bookkeeping on the master though, which is why nobody has been willing to attempt it -- and why we went with the single-replica solution in the first place. Especially since most people who want quorum sync really want multi-master replication anyway. Sync N times is really just a guarantee against data loss as long as you lose N-1 servers or fewer. And it becomes an even lower-availability solution if you don't have at least N+1 replicas. For that reason, I'd like to see some realistic actual user demand before we take the idea seriously. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Question about Lockhart's book
On 12/27/2013 08:14 AM, Christian Convey wrote: Hi Andrew, Thanks for your response. Sometimes overall software architectures stay (mostly) unchanged for a long time, and so I figured that's possibly the case for Postgresql as well. But I didn't know, which is why I asked. Some things in that book will still be accurate and informative. The problem is that you, as a beginner, won't know which things are still good and which are obsolete. I'd suggest: - Developer documentation in our primary docs - Developer FAQ on the wiki - Bruce's presentations on various internals - Tom's presentations on how the query planner works - Various other people's presentations on other aspects, such as foreign data wrappers, event triggers, etc. Unfortunately, there's no central index of presentations. I'm a big fan of learn by doing, and here's a program which would bring you up on a LOT of PostgreSQL: 1. Write a few of your own C functions, including trigger functions and an operator. 2. Write your own foreign data wrapper for something. 3. Write your own Type, including input/output functions, stats estimation and custom indexing. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
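As a taste of item (1) on that list, the classic first C function from the documentation looks like this (built against server headers with PGXS, then registered with CREATE FUNCTION ... LANGUAGE C; not compilable standalone):

```c
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(add_one);

/* add_one(int4) returns int4 -- minimal version-1 calling convention example */
Datum
add_one(PG_FUNCTION_ARGS)
{
    int32 arg = PG_GETARG_INT32(0);

    PG_RETURN_INT32(arg + 1);
}
```

Registered on the SQL side with something like: CREATE FUNCTION add_one(integer) RETURNS integer AS 'MODULE_PATHNAME', 'add_one' LANGUAGE C STRICT;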
Re: [HACKERS] make_interval ??
Pavel, So the constructor should look like: CREATE OR REPLACE FUNCTION make_interval(years int DEFAULT 0, months int DEFAULT 0, ...) and usage: SELECT make_interval(years := 2) SELECT make_interval(days := 14) Is there any interest in this (or similar) function? It would certainly make our Python users happy. And for that matter would get rid of this kind of stupid thing in stored procedure code: time_ahead := ( interval '1 minute' * var_skip ); So, +1 for the feature. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
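A minimal sketch of such a constructor in plain SQL, using default-valued named parameters (the committed version would presumably be in C, and the exact unit list here is illustrative):

```sql
CREATE OR REPLACE FUNCTION make_interval(
    years  int DEFAULT 0,
    months int DEFAULT 0,
    weeks  int DEFAULT 0,
    days   int DEFAULT 0,
    hours  int DEFAULT 0,
    mins   int DEFAULT 0,
    secs   double precision DEFAULT 0.0)
RETURNS interval LANGUAGE sql IMMUTABLE AS $$
    SELECT years  * interval '1 year'
         + months * interval '1 month'
         + weeks  * interval '1 week'
         + days   * interval '1 day'
         + hours  * interval '1 hour'
         + mins   * interval '1 minute'
         + secs   * interval '1 second';
$$;

-- SELECT make_interval(years := 2);
-- SELECT make_interval(days := 14);
```

The named-parameter calls then read exactly as in the proposal above.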
Re: [HACKERS] make_interval ??
On 12/20/2013 03:09 PM, Gavin Flower wrote: What about leap years? What about them? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] make_interval ??
On 12/20/2013 04:44 PM, Gavin Flower wrote: On 21/12/13 13:40, Josh Berkus wrote: On 12/20/2013 03:09 PM, Gavin Flower wrote: What about leap years? What about them? some years have 365 days, others have 366, so how many days are in an interval of 2 years? 4 years? Your question isn't relevant to this patch. It's not defining the interval type, just creating an alternate constructor for it. (the answer is, it depends on what timestamp you're adding it to ...) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 08:44 AM, Alvaro Herrera wrote: Another thought: at the initial run of the assertion, note which tables it locked, and record this as an OID array in the catalog row for the assertion; consider running the assertion only when those tables are touched. This doesn't work if the assertion code locks some tables when run under certain conditions and other tables under different conditions. But then this can be checked too: if an assertion lists in its catalog row that it involves tables A, B, C and then, under different conditions, it tries to acquire lock on table D, have the whole thing fail indicating that the assertion is misdeclared. This sounds like you're re-inventing SSI. SERIALIZABLE mode *exists* in order to be able to enforce constraints which potentially involve more than one transaction. Balance can never go below 0, for example. The whole reason we have this really cool and unique SSI mode is so that we can do such things without killing performance. These sorts of requirements are ideally suited to Assertions, so it's logically consistent to require Serializable mode in order to use Assertions. I'm leaning towards the alternative that Assertions require SERIALIZABLE mode, and throw a WARNING at the user and the log every time we create, modify, or trigger an assertion while not in SERIALIZABLE mode. And beyond, that, we don't guarantee the integrity of Assertions if people choose to run in READ COMMITTED anyway. This is consistent with how we treat the interaction of constraints and triggers; under some circumstances, we allow triggers to violate CHECK and FK constraints. Alternately, we add a GUC assertion_serializable_mode, which can be off, warn or error. If it's set to error, and the user triggers an assertion while in READ COMMITTED mode, an exception occurs. If it's set to off, then assertions are disabled, in order to deal with buggy assertions. 
Now, it would be even better if we could prevent users from switching transaction mode, but that's a MUCH bigger and more complicated patch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 11:26 AM, Jim Nasby wrote: The flip-side is that now you can get serialization failures, and I think there's a ton of software that has no clue how to deal with that. So now you don't get to use assertions at all unless you re-engineer your application (but see below). Well, the software will need to deal with an Assertion failure, which I doubt it's prepared to do right now either. This is consistent with how we treat the interaction of constraints and triggers; under some circumstances, we allow triggers to violate CHECK and FK constraints. We do? Under what circumstances? AFTER triggers are allowed to ignore constraints sometimes. For example, if you have a tree table with an FK to other rows in the same table, and you have an AFTER trigger on it, the AFTER trigger is allowed to violate the self-FK. That's the one I ran across, but I vaguely remember other cases, and there's some documentation on this in the order of application of triggers in the main docs. Another possibility is to allow for two different types of assertions, one based on SSI and one based on locking. The locking version would have to pretty much lock on a table basis (or even a whole-database basis) every time an assertion executed, no? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why no INSTEAD OF triggers on tables?
On 12/16/2013 07:53 PM, Robert Haas wrote: So, put a BEFORE trigger, and make it return NULL. Same effect, different notation. NOT the same: Master partition table with BEFORE trigger: josh=# insert into a ( id, val ) values ( 23, 'test' ), ( 24, 'test'), (25,'test'); INSERT 0 0 ^^^ View with INSTEAD OF trigger: josh=# insert into a_v ( id, val ) values ( 23, 'test' ), ( 24, 'test'), (25,'test'); INSERT 0 3 ^^^ The difference here is that the INSTEAD OF trigger returns a rows-affected count, and the BEFORE trigger does not (it returns 0). Some drivers and ORMs, most notably Hibernate, check this rows-affected count, and error if it doesn't match the number of rows sent. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
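For readers hitting the Hibernate problem, the view-plus-trigger workaround behind the a_v example looks roughly like this (table, view, and function names are illustrative):

```sql
-- Thin view over the partitioned master table.
CREATE VIEW a_v AS SELECT * FROM a;

CREATE FUNCTION a_v_insert() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Route into the master table; its own BEFORE trigger redirects the
    -- row to the correct partition and returns NULL there.
    INSERT INTO a VALUES (NEW.*);
    RETURN NEW;  -- returning NEW lets the row count toward "rows affected"
END;
$$;

CREATE TRIGGER a_v_insert_trg
    INSTEAD OF INSERT ON a_v
    FOR EACH ROW EXECUTE PROCEDURE a_v_insert();
```

Inserts aimed at a_v then report INSERT 0 N, keeping row-count-checking drivers happy.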
Re: [HACKERS] Extension Templates S03E11
On 12/16/2013 11:44 AM, Tom Lane wrote: Right. I think a lot of the tension comes from people being unconvinced that the existing extension feature is an ideal model for this sort of use-case. Extensions were mainly designed around the notion of a .so with some SQL datatype/function/etc declarations that have to match up with the C code. So it made sense for them to be relatively static things that live in the filesystem. Notably, if you're migrating to a new PG major version, you're at the very least gonna have to recompile the C code and quite likely will need to change it some. So adapting the SQL code if needed goes along with that, and would never be automatic in any case. I see what you mean. On the other hand: a) introducing a new concept would require a new reserved word b) it would also require figuring out how it interacts with extensions c) extensions already have versioning, which this feature needs d) extensions already have dependencies, which this feature needs While it splits Extensions into two slightly different concepts, I find that on the whole less confusing than the alternative. On 12/16/2013 05:17 PM, Jim Nasby wrote: Somewhat related to this, I really wish Postgres had the idea of a class, that was allowed to contain any type of object and could be instantiated when needed. For example, if we had an address class, we could instantiate it once for tracking our customer addresses, and a second time for tracking the addresses customers supply for their employers. Such a mechanism would probably be ideal for what we need, but of course you'd still have the question of how to load a class definition that someone else has published. Well, the idea originally (POSTGRES) was for the Type, Domain, and Inheritance system to do just what you propose. Nobody ever worked out all the practicalities and gotchas to make it really work in production, though. -- Josh Berkus PostgreSQL Experts Inc. 
http://pgexperts.com
Re: [HACKERS] pg_rewarm status
On 12/17/2013 06:34 AM, Robert Haas wrote: On Tue, Dec 17, 2013 at 12:09 AM, Amit Kapila amit.kapil...@gmail.com wrote: I have used pg_prewarm during some work related to Buffer Management and other performance related work. It is a quite useful utility. +1 for reviving this patch for 9.4 Any other votes? I still support this patch (as I did originally), and don't think that the overlap with pgFincore is of any consequence. pgFincore does more than pg_prewarm ever will, but it's also platform-specific, so it still makes sense for both to exist. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] [PATCH] SQL assertions prototype
On 11/15/2013 05:41 AM, Heikki Linnakangas wrote: A fundamental problem with this is that it needs to handle isolation reliably, so that the assertion cannot be violated when two concurrent backends do things. Consider the example from the manual, which checks that a table has at least one row. Now, if the table has two rows to begin with, and in one backend you delete one row, and concurrently in another backend you delete the other row, and then commit both transactions, the assertion is violated. In other words, the assertions need to be checked in serializable mode. Now that we have a real serializable mode, I think that's actually feasible. Going back over this patch, I haven't seen any further discussion of the point Heikki raises above, which seems like a bit of a showstopper. Heikki, did you have specific ideas on how to solve this? Right now my mind boggles. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
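To make Heikki's two-backend example concrete, here is the interleaving that silently breaks a "table has at least one row" assertion under READ COMMITTED; under SERIALIZABLE, SSI would be expected to abort one of the two commits with a serialization failure instead (table name is illustrative):

```sql
-- Setup: two rows; the invariant to protect is "at least one row remains".
CREATE TABLE t (id int PRIMARY KEY);
INSERT INTO t VALUES (1), (2);

-- Session A                           -- Session B
BEGIN;                                 -- BEGIN;
DELETE FROM t WHERE id = 1;            -- DELETE FROM t WHERE id = 2;
SELECT count(*) FROM t;  -- sees 1     -- SELECT count(*) FROM t;  -- sees 1
COMMIT;                                -- COMMIT;
-- Each transaction's assertion check saw one surviving row,
-- yet the table is now empty: the invariant is violated.
```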
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/17/2013 01:42 PM, Kevin Grittner wrote: Josh Berkus j...@agliodbs.com wrote: Going back over this patch, I haven't seen any further discussion of the point Heikki raises above, which seems like a bit of a showstopper. Heikki, did you have specific ideas on how to solve this? Right now my mind boggles. It works fine as long as you set default_transaction_isolation = 'serializable' and never override that. :-) Of course, it sure would be nice to have a way to prohibit overrides, but that's another issue. Otherwise it is hard to see how to make it work in a general way without a mutually exclusive lock mode on the table for the duration of any transaction which modifies the table. Serializable or not, *what* do we lock for assertions? It's not rows. Tables? Which tables? What if the assertion is an interpreted language function? Does the SSI reference counter really take care of all of this? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Extension Templates S03E11
On 12/15/2013 10:47 PM, Jeff Davis wrote: The patch offers an alternative to dropping files on the filesystem before doing CREATE EXTENSION. Instead, if the extension has no C code, you can put it in the catalog using ordinary SQL access, and execute the same kind of CREATE EXTENSION. Aside from that, it's pretty much identical to existing extensions. OK, so the idea is that for SQL-only extensions (i.e. ones without .so's) the user shouldn't be required to create a file on disk. That seems straightforward and acceptable. Stephen doesn't like the idea that the SQL in an extension is a blob of text. I, personally, would prefer per-object line-items, but I don't think that's a deal-breaker. Having a single text blob does match up with existing Extension design. Note for Dimitri, though: I think that having line-item objects in dependency order would make this feature vastly more useful for schema-template maintenance. Give it some thought. There are weird cases, like if you make local modifications to objects held in an extension, then dump/reload will lose those local modifications. What does dump/reload do with regular extensions currently in that case? Another issue, which I agree is dubious in many situations, is that the version of an extension is not preserved across dump/reload (this is actually a feature, which was designed with contrib-style extensions in mind, but can be surprising in other circumstances). Well, this should work with a versioning system, in which dump/reload can load older versions of the extension if they are present, the same as external Extensions do now. Is that built in? This isn't necessarily a dead-end, but there are a lot of unsettled issues, and it will take some soul-searching to answer them. Is an extension a blob of text with a version, that's maintained in some external repo? Well, plus potentially binaries and library references, yes. 
Although you could describe all of Postgres as a bunch of text blobs and some library references, when you get right down to it. Is it the job of postgres to ensure that dump/reload creates the same situation that you started with, including local modifications to objects that are part of an extension? IMHO: No. AFAIK, if a user modifies, say, information_schema views in PostgreSQL, we don't respect that in dump/restore either. Now, I can see adding to this mechanism a method for tracking such modifications in a way that pg_dump can support them. But that can easily be a version 2 feature. Should everything be an extension, or do we need to invent a new concept for some of the use cases? What role do external tools play in all of this? So, the reason I was confused by this feature -- and the reason Stephen hates it, I think -- is that I thought it was solving the "Extensions don't follow replication, and they are complicated to install if your OS doesn't have good packages" problem. It's not, and it never will solve that issue. It's solving a completely different problem, to wit: Some PostgreSQL shops with lots of servers have large internal libraries of functions, views, and similar code that they've written to support their applications, which don't comprise a complete database. This feature would allow them to package those libraries, and version, upgrade and track them, without requiring a filesystem-based install. I myself have a couple clients who could benefit from this. I think the name Extension Templates is horrible because it misleads all of us on this list into thinking the proposed feature is completely something other than what it is. I don't have a better name offhand, but that's got to change before it becomes a feature. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Extension Templates S03E11
On 12/16/2013 10:53 AM, Josh Berkus wrote: Some PostgreSQL shops with lots of servers have large internal libraries of functions, views, and similar code that they've written to support their applications, which don't comprise a complete database. This feature would allow them to package those libraries, and version, upgrade and track them, without requiring a filesystem-based install. I myself have a couple clients who could benefit from this. cc'ing Jim Nasby, since I think Enova is part of the target market for this feature. Jim, have you taken a look at this? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] Why no INSTEAD OF triggers on tables?
Hackers, I've looked in the archives, but I can't find a reason why INSTEAD OF triggers were never enabled for tables. I'm interested in them in order to return a rowcount to JDBC for INSERTs into partitioned tables. Was there a technical obstacle, or is this just a TUIT issue? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Why no INSTEAD OF triggers on tables?
On 12/16/2013 04:22 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: I've looked in the archives, but I can't find a reason why INSTEAD OF triggers were never enabled for tables. What would that mean exactly? And how would you do the actual update when it came time to? Well, I'm specifically thinking of master partition tables. In that case, we really want an INSTEAD OF trigger. It seems a little silly that I need to create a separate view, and then an INSTEAD OF trigger on the view, in order to get a rows-updated count back from an INSERT which hits a partitioned table. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Extension Templates S03E11
All: Can someone summarize the issues with this patch for those of us who haven't been following it closely? I was just chatting with a couple of other contributors, and at this point none of us knows what it implements, what it doesn't implement, what the plans are for expanding its feature set (if any), and why Frost doesn't like it. I tried reading through the thread on -hackers, and came away even more confused. Is there maybe a wiki page for it? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 12/14/2013 05:00 PM, Tom Lane wrote: This consideration also makes me question whether we should apply the method for NUMERIC. Although in principle numeric addition/subtraction is exact, such a sequence could leave us with a different dscale than is returned by the existing code. I'm not sure if changing the number of trailing zeroes is a big enough behavior change to draw complaints. If we're going to disqualify NUMERIC too, we might as well bounce the feature. Without a fast FLOAT or NUMERIC, you've lost most of the target audience. I think even the FLOAT case deserves some consideration. What's the worst-case drift? In general, folks who do aggregate operations on FLOATs aren't expecting an exact answer, or one which is consistent beyond a certain number of significant digits. And Dave is right: how many bug reports would we get along the lines of "NUMERIC is fast, but FLOAT is slow"? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
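The worst-case drift for FLOAT is easy to demonstrate. Below is a small Python sketch (not PostgreSQL code; the moving-window framing is purely illustrative) of what an inverse-transition approach does to a float running sum when a very large value passes through the frame:

```python
# A minimal sketch (not PostgreSQL code) of the worst-case drift:
# maintaining a float moving sum by adding the entering value and
# subtracting the leaving one, versus recomputing each frame.

def moving_sums_inverse(values, width):
    """One running sum, updated by add/subtract: O(n) total work."""
    sums, s = [], 0.0
    for i, v in enumerate(values):
        s += v
        if i >= width:
            s -= values[i - width]
        if i >= width - 1:
            sums.append(s)
    return sums

def moving_sums_recompute(values, width):
    """Recompute every frame from scratch: O(n * width) work."""
    return [sum(values[i:i + width]) for i in range(len(values) - width + 1)]

# A single huge value passing through the frame absorbs the low-order
# bits of its small neighbors; after it leaves, the maintained sum has
# lost them permanently, while recomputation recovers the exact answer.
data = [1.0, 1e16, 1.0, 1.0, 1.0, 1.0]
fast = moving_sums_inverse(data, 2)
slow = moving_sums_recompute(data, 2)
print(fast)  # [1e16, 1e16, 0.0, 0.0, 0.0]
print(slow)  # [1e16, 1e16, 2.0, 2.0, 2.0]
```

Once 1e16 enters and leaves the frame, the maintained sum reports 0.0 for frames whose true sum is 2.0. This is the kind of divergence a float inverse transition function would have to either tolerate or detect.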
Re: [HACKERS] ANALYZE sampling is too good
On 12/12/2013 10:33 AM, Claudio Freire wrote: Well, why not take a supersample containing all visible tuples from N selected blocks, and do bootstrapping over it, with subsamples of M independent rows each? Well, we still need to look at each individual block to determine grouping correlation. Let's take a worst-case example: imagine a table has *just* been created by: CREATE TABLE newdata AS SELECT * FROM olddata ORDER BY category, item; If category is fairly low-cardinality, then grouping will be severe; we can reasonably expect that if we sample 100 blocks, many of them will have only one category value present. The answer to this is to make our block samples fairly widely spaced and compare them. In this simplified example, if the table had 1000 blocks, we would take blocks 1, 101, 201, 301, 401, etc. Then we would compare the number and content of values found on each block with the number and content found on each other block. For example, if we see that block 101 is entirely the category 'cats', block 701 is entirely the category 'shopping', and block 901 is split 60/40 between the categories 'transportation' and 'voting', then we can assume that the level of grouping is very high, and the number of unknown values we haven't seen is also high. Whereas if 101 is 'cats' and 201 is 'cats' and 301 through 501 are 'cats' with 2% other stuff, then we assume that the level of grouping is moderate and it's just the case that most of the dataset is 'cats'. Which means that the number of unknown values we haven't seen is low. Whereas if 101, 201, 501, and 901 have near-identical distributions of values, we assume that the level of grouping is very low, and that there are very few values we haven't seen. As someone else pointed out, full-block (the proposal) vs. random-row (our current style) doesn't have a very significant effect on estimates of histograms and nullfrac, as long as the sampled blocks are widely spaced. 
Well, nullfrac is affected in the extreme example of a totally ordered table where the nulls are all in one block, but I'll point out that we can (and do) also miss that using our current algo. Estimated grouping should, however, affect MCVs. In cases where we estimate that grouping levels are high, the expected % of observed values should be discounted somehow. That is, with total random distribution you have a 1:1 ratio between observed frequency of a value and assumed frequency. However, with highly grouped values, you might have a 2:1 ratio. Again, more math (backed by statistical analysis) is needed. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
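The compare-widely-spaced-blocks idea above can be sketched in a few lines of Python. This is a toy model, not PostgreSQL's implementation, and the scoring heuristic is an assumption for illustration only:

```python
# Toy sketch of the idea above: sample widely spaced blocks, then compare
# per-block value distributions to guess how strongly the column is
# physically grouped. The scoring heuristic is illustrative, not "the math".

from collections import Counter

def sample_spaced_blocks(blocks, n_samples):
    """Pick n_samples blocks at even intervals across the table."""
    step = max(1, len(blocks) // n_samples)
    return [blocks[i] for i in range(0, len(blocks), step)][:n_samples]

def grouping_score(sampled_blocks):
    """0.0 = every block looks like the whole sample (no grouping);
    values near 1.0 = each block dominated by one value (heavy grouping)."""
    overall = Counter()
    per_block_topfrac = []
    for block in sampled_blocks:
        c = Counter(block)
        overall.update(c)
        per_block_topfrac.append(c.most_common(1)[0][1] / len(block))
    total = sum(overall.values())
    global_topfrac = overall.most_common(1)[0][1] / total
    # Excess of per-block concentration over global concentration.
    avg_block_topfrac = sum(per_block_topfrac) / len(per_block_topfrac)
    return max(0.0, avg_block_topfrac - global_topfrac)

# CREATE TABLE ... ORDER BY category: each block holds a single value.
clustered = [['cats'] * 100, ['dogs'] * 100, ['fish'] * 100, ['voting'] * 100]
# The same values shuffled evenly across blocks.
mixed = [['cats', 'dogs', 'fish', 'voting'] * 25 for _ in range(4)]

print(grouping_score(sample_spaced_blocks(clustered, 4)))  # high (0.75)
print(grouping_score(sample_spaced_blocks(mixed, 4)))      # 0.0
```

A high score would then be the signal to discount observed MCV frequencies and inflate the estimate of unseen values, per the ratios described above.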
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
On 12/11/2013 08:48 AM, Tom Lane wrote: The fundamental problem IMO is that you want to complicate the definition of what these things mean as a substitute for DBAs learning something about Postgres. That seems like a fool's errand from here. They're going to have to learn what FATAL means sooner or later, and making it more complicated just raises the height of that barrier. I don't think it works to change the NOTICE/ERROR/FATAL tags; for one thing, I can hear the screaming about people's log scripts from here. However, it would really be useful to have an extra tag (in addition to the ERROR or FATAL) for "if you're seeing this message, something has gone seriously wrong on the server". Just stuff like corruption messages, backend crashes, etc. Otherwise we're requiring users to come up with an alphabet soup of regexes to filter out the noise error messages from the really, really important ones. Speaking as someone who does trainings for new DBAs, the part where I cover "what to look for in the logs" requires over an hour and still doesn't cover everything. And it doesn't internationalize. That's nasty. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On 12/11/2013 09:57 AM, Robert Haas wrote: I don't agree with that assessment. Anything that involves changing the scheduling of autovacuum is a major project that will legitimately provoke much controversy. Extensive testing will be needed to prove that the new algorithm doesn't perform worse than the current algorithm in any important cases. I have my doubts about whether that can be accomplished in an entire release cycle, let alone 2-3 days. In contrast, the patch proposed does something that is easy to understand, clearly safe, and an improvement over what we have now. +1 There is an inherent tuning and troubleshooting challenge in anything involving a feedback loop. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On 12/11/2013 11:37 AM, Simon Riggs wrote: On 11 December 2013 17:57, Robert Haas robertmh...@gmail.com wrote: Extensive testing will be needed to prove that the new algorithm doesn't perform worse than the current algorithm in any important cases. Agreed, but the amount of testing seems equivalent in both cases, assuming we weren't going to skip it for this patch. No performance testing is required for this patch. The effect of memory limits on vacuum are already well-known and well-understood. With considerable regret, I don't see how this solves the problem at hand. We can and should do better. I strongly disagree. The problem we are dealing with currently is that two resource limits which should have *always* been independent of each other are currently conflated into a single GUC variable. This forces users to remember to set maintenance_work_mem interactively every time they want to run a manual VACUUM, because the setting in postgresql.conf is needed to tune autovacuum. In other words, we are having an issue with *non-atomic data*, and this patch partially fixes that. Would it be better to have an admissions-control policy engine for launching autovacuum which takes into account available RAM, estimated costs of concurrent vacuums, current CPU activity, and which tables are in cache? Yes. And if you started on that now, you might have it ready for 9.5. And, for that matter, accepting this patch by no means blocks doing something more sophisticated in the future. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On 12/11/2013 12:40 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: And, for that matter, accepting this patch by no means blocks doing something more sophisticated in the future. Yeah. I think the only real argument against it is "do we really need yet another knob?". Since Josh, who's usually the voicer of that argument, is for this one, I don't have a problem with it. This passes the "is it a chronic problem not to have a knob for this?" test. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/11/2013 02:39 PM, Martijn van Oosterhout wrote: In this discussion we've mostly used block = 1 postgresql block of 8k. But when reading from a disk once you've read one block you can basically read the following ones practically for free. So I wonder if you could make your sampling read always 16 consecutive blocks, but then use 25-50% of the tuples. That way you get many more tuples for the same amount of disk I/O seeks. Yeah, that's what I meant by "tune this for the FS". We'll probably have to test a lot of different block sizes on different FSes before we arrive at a reasonable size, and even then I'll bet we have to offer a GUC. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
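A quick back-of-the-envelope for Martijn's suggestion; the run length, keep fraction, and tuples-per-block figures below are illustrative assumptions, not tuned values:

```python
# Back-of-the-envelope sketch: sampling runs of consecutive blocks buys
# many more tuples per seek than single-block reads, even after
# subsampling the tuples within each run. All figures are illustrative.

def tuples_per_seek(n_seeks, run_len, keep_frac, tuples_per_block=60):
    """Tuples collected per disk seek for a given sampling shape."""
    blocks_read = n_seeks * run_len
    tuples_kept = int(blocks_read * tuples_per_block * keep_frac)
    return tuples_kept / n_seeks

# 100 seeks reading single blocks, keeping every tuple:
single = tuples_per_seek(100, run_len=1, keep_frac=1.0)
# 100 seeks each reading 16 consecutive blocks, keeping 25% of tuples:
runs = tuples_per_seek(100, run_len=16, keep_frac=0.25)
print(single, runs)  # 60.0 vs 240.0: 4x the tuples for the same seek count
```

The same arithmetic also shows why the right run length is filesystem-dependent: the break-even point moves with how much readahead the FS gives you for free, which is the GUC argument above.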
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
Andres, all: We've just run across a case of this exact issue on 9.2.4. I thought it was supposed to be 9.3-only? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
On 12/10/2013 10:39 AM, Andres Freund wrote: Hi, On 2013-12-10 10:38:32 -0800, Josh Berkus wrote: We've just run across a case of this exact issue on 9.2.4. I thought it was supposed to be 9.3-only? Could you please describe this exact issue? Fatal errors due to missing pg_subtrans files on creating a new replica. Sequence: 1. Failed over from master to replica #1 2. Remastered other replicas 3. Tried to create a new replica. 4. New replica started failing with errors similar to the original report. If this is fixed in 9.2.6, great, but I didn't get that impression from the commits ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] coredump of 9.3.2
On 12/10/2013 09:39 AM, Teodor Sigaev wrote: SELECT * FROM t WHERE ( CASE WHEN a%2 IN (SELECT c FROM tt) THEN a END IN (SELECT c FROM tt) ); Wow, it wouldn't have occurred to me that that was even supported syntax. I'm not surprised that it doesn't work ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
On 12/10/2013 10:48 AM, Andres Freund wrote: On 2013-12-10 10:44:30 -0800, Josh Berkus wrote: On 12/10/2013 10:39 AM, Andres Freund wrote: Hi, On 2013-12-10 10:38:32 -0800, Josh Berkus wrote: We've just run across a case of this exact issue on 9.2.4. I thought it was supposed to be 9.3-only? Could you please describe this exact issue? Fatal errors due to missing pg_subtrans files on creating a new replica. If this is fixed in 9.2.6, great, but I didn't get that impression from the commits ... I am pretty sure this thread isn't about the bug you're hitting. You get errors during xid assignment, right? If so, upgrade to 9.2.6, that's fixed (it's the bug in whose wake the replication bug was introduced). I thought that only affected 9.2.5? This machine is 9.2.4, as was the prior master. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/10/2013 11:49 AM, Peter Geoghegan wrote: On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs si...@2ndquadrant.com wrote: I don't think that anyone believes that not doing block sampling is tenable, fwiw. Clearly some type of block sampling would be preferable for most or all purposes. As discussed, we need math though. Does anyone have an ACM subscription and time to do a search? Someone must. We can buy one with community funds, but no reason to do so if we don't have to. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql_check_function - rebase for 9.3
On 12/10/2013 12:50 PM, Tom Lane wrote: One would hope that turning off check_function_bodies would be sufficient to disable any added checking, though, so I don't see this being a problem for pg_dump. But there might be other scenarios where an additional knob would be useful. I can't think of one, offhand. And +1 for NOT adding a new knob. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/10/2013 01:33 PM, Mark Kirkwood wrote: Yeah - and we seem to be back to Josh's point about needing 'some math' to cope with the rows within a block not being a purely random selection. Well, sometimes they are effectively random. But sometimes they are not. The Chaudhuri et al. paper had a formula for estimating randomness based on the grouping of rows in each block, assuming that the sampled blocks were widely spaced (if they aren't, there's not much you can do). This is where you get up to needing a 5% sample; you need to take enough blocks that you're confident that the blocks you sampled are representative of the population. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
Greg, I really don't believe the 5% thing. It's not enough for n_distinct and it's *far* too high a value for linear properties like histograms or nullfrac etc. Actually, it is enough for n_distinct, or more properly, 5% is as good as you can get for n_distinct unless you're going to jump to scanning 50% or more. It's also applicable for the other stats; histogram buckets constructed from a 5% sample are more likely to be accurate than those constructed from a 0.1% sample. Same with nullfrac. The degree of improved accuracy would, of course, require some math to determine. From a computer point of view it's too high to be worth bothering. If we have to read 5% of the table we might as well do a full scan anyways, it'll be marginally slower but much better quality results. Reading 5% of a 200GB table is going to be considerably faster than reading the whole thing, if that 5% is being scanned in a way that the FS understands. Also, we can optimize this significantly by using the VM, as Robert (I think) suggested. In the "advanced approaches" section, there's also the idea of collecting analyze data from table pages while they're in memory anyway for other reasons. You do seem kind of hostile to the idea of full-page sampling, going pretty far beyond "I'd need to see the math". Why? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/09/2013 02:37 PM, Robert Haas wrote: I've never seen an n_distinct value of more than 5 digits, regardless of reality. Typically I've seen 20-50k, even if the real number is much higher. But the n_distinct value is only for non-MCVs, so if we estimate the selectivity of column = 'rarevalue' to be (1-nullfrac-mcvfrac)/n_distinct, then making mcvfrac bigger reduces the estimate, and making the MCV list longer naturally makes mcvfrac bigger. I'm not sure how important the less-frequent-than-the-least-common-MCV part is, but I'm very sure that raising the statistics target helps to solve the problem of overestimating the prevalence of uncommon values in a very big table. I did an analysis of our ndistinct algorithm several years ago (~8.1), and to sum up: 1. We take far too small of a sample to estimate ndistinct well for tables larger than 100,000 rows. 2. The estimation algo we have chosen is one which tends to be wrong in the downwards direction, rather strongly so. That is, if we could potentially have an ndistinct of 1000 to 100,000 based on the sample, our algo estimates 1500 to 3000. 3. Other algos exist. They tend to be wrong in other directions. 4. Nobody has done an analysis of whether it's worse, on average, to estimate low vs. high for ndistinct. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
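For reference, the estimator ANALYZE applies here is (to my understanding) the Duj1 estimator of Haas and Stokes. The Python sketch below is illustrative only, with a made-up skewed dataset, and shows the downward bias described in point 2:

```python
# Illustrative sketch of the n_distinct estimator ANALYZE uses (the Duj1
# estimator from Haas & Stokes), run against made-up skewed data to show
# the downward bias described above. The data shape is an assumption.

import random
from collections import Counter

def duj1_ndistinct(sample, total_rows):
    """n*d / (n - f1 + f1*n/N), where d = distinct values in the sample,
    f1 = values seen exactly once, n = sample size, N = table rows."""
    counts = Counter(sample)
    n = len(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    if f1 == n:  # every sampled value unique; assume the column is unique
        return total_rows
    return n * d / (n - f1 + f1 * n / total_rows)

random.seed(42)
N = 1_000_000
# Half the rows drawn from 100 hot values, half from a 100,000-value tail.
table = [random.randrange(100) if random.random() < 0.5
         else random.randrange(100_000) for _ in range(N)]
true_d = len(set(table))
est = duj1_ndistinct(random.sample(table, 30_000), N)
print(true_d, int(est))  # the estimate lands far below the true count
```

With a 3% sample of a skewed column, the estimate comes in at roughly a quarter of the true distinct count, which matches the "wrong in the downwards direction, rather strongly so" observation.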
Re: [HACKERS] ANALYZE sampling is too good
On 12/08/2013 10:14 AM, Greg Stark wrote: With rows_per_block=1 the MCV frequency list ranges from .0082 to .0123 With rows_per_block=4 the MCV frequency list ranges from .0063 to .0125 With rows_per_block=16 the MCV frequency list ranges from .0058 to .0164 With rows_per_block=64 the MCV frequency list ranges from .0021 to .0213 I'm not really sure if this is due to the blocky sample combined with the skewed pgbench run or not. It doesn't seem to be consistently biasing towards or against bid 1 which I believe are the only rows that would have been touched by pgbench. Still it's suspicious that they seem to be consistently getting less accurate as the blockiness increases. They will certainly do so if you don't apply any statistical adjustments for selecting more rows from the same pages. So there's a set of math designed to compensate for the skew introduced by reading *all* of the rows in each block. That's what I meant by block-based sampling; you read, say, 400 pages, you compile statistics on *all* of the rows on those pages, you apply some algorithms to adjust for groupings of rows based on how grouped they are. And you have a pretty good estimate of how grouped they are, because you just looked at complete sets of rows on a bunch of nonadjacent pages. Obviously, you need to look at more rows than you would with a pure-random sample. Like I said, the 80%+ accurate point in the papers seemed to be at a 5% sample. However, since those rows come from the same pages, the cost of looking at more rows is quite small, compared to the cost of looking at 64 times as many disk pages. My ACM subscription has lapsed, though; someone with a current ACM subscription could search for this; there are several published papers, with math and pseudocode. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/07/2013 11:46 AM, Robert Haas wrote: Maybe there's some highly-principled statistical approach which could be taken here, and if so that's fine, but I suspect not. So what I think we should do is auto-tune the statistics target based on the table size. If, say, we think that the generally useful range for the statistics target is something like 10 to 400, then let's come up with a formula based on table size that outputs 10 for small tables, 400 for really big tables, and intermediate values for tables in the middle. The only approach which makes sense is to base it on a % of the table. In fact, pretty much every paper which has examined statistics estimation for database tables has determined that any estimate based on a less-than-5% sample is going to be wildly inaccurate. Not that 5% samples are 100% accurate, but at least they fit the 80/20 rule. This is the reason why implementing block-based sampling is critical; using our current "take one row out of every page" method, sampling 5% of the table means scanning the whole thing in most tables. We also need to decouple the number of MCVs we keep from the sample size. Certainly our existing sampling algo seems designed to maximize IO for the sample size. There are other qualitative improvements we could make, which Nathan Boley has spoken on. For example, our stats code has no way to recognize a normal or exponential distribution -- it assumes that all columns are randomly distributed. If we could recognize common distribution patterns, then not only could we have better query estimates, those would require keeping *fewer* stats, since all you need for a normal distribution are the end points and the variance. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
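One possible shape for the auto-tuned statistics target Robert describes. The 10-400 range comes from his message; the log-scale interpolation and the breakpoints are assumptions for illustration, not a vetted proposal:

```python
# Sketch of one possible auto-tuning formula (illustrative assumptions:
# the log scaling and the small/big table breakpoints are not from the
# thread): interpolate the statistics target on a log scale between a
# small-table floor of 10 and a big-table ceiling of 400.

import math

def auto_stats_target(n_rows, lo=10, hi=400,
                      small_table=10_000, big_table=100_000_000):
    """lo at or below small_table rows, hi at or above big_table rows,
    log-linear in between."""
    if n_rows <= small_table:
        return lo
    if n_rows >= big_table:
        return hi
    frac = (math.log(n_rows) - math.log(small_table)) / \
           (math.log(big_table) - math.log(small_table))
    return round(lo + frac * (hi - lo))

for rows in (1_000, 100_000, 10_000_000, 1_000_000_000):
    print(rows, auto_stats_target(rows))
```

The log scale is a deliberate choice here: a linear ramp would leave nearly all real-world tables pinned at the floor, since table sizes span many orders of magnitude.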
Re: [HACKERS] dblink performance regression
All, I tested out Joe's original patch, and it does eliminate the 8% performance regression. Will try the new one. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
On 12/06/2013 01:30 PM, Andrew Gierth wrote: Since I don't particularly trust my own judgement on aesthetics, I used the feedback I got from others when deciding what to do. Frankly I think this one needs wider input than just you and me arguing over it. Can someone paste examples of the two syntax alternatives we're talking about here? I've lost track. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/05/2013 07:40 AM, Greg Stark wrote: On Thu, Dec 5, 2013 at 2:54 PM, Claudio Freire klaussfre...@gmail.com wrote: That's a bad idea in the current state of affairs. MM files haven't been designed for that usage, and getting stable performance out of that will be way too difficult. I'm talking about long-term goals here. Either of these two routes would require whole new kernel interfaces to work effectively. Without those new kernel interfaces our current approach is possibly the best we can get. Well, in the long run we'll probably be using persistent RAM. And the geeks who manage that have already said that MMAP is a bad interface for persistent RAM. They haven't defined a good one, though. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/05/2013 05:48 AM, Stephen Frost wrote: * Peter Geoghegan (p...@heroku.com) wrote: On Wed, Dec 4, 2013 at 11:07 AM, Josh Berkus j...@agliodbs.com wrote: But you know what? 2.6, overall, still performs better than any kernel in the 3.X series, at least for Postgres. What about the fseek() scalability issue? Not to mention that the 2.6 which I suspect you're referring to (RHEL) isn't exactly 2.6.. Actually, I've been able to do 35K TPS on commodity hardware on Ubuntu 10.04. I have yet to get above 15K on any Ubuntu running a 3.X kernel. The CPU scheduling on 2.6 just seems to be far better tuned, aside from the IO issues; at 35K TPS, the CPU workload is evenly distributed across cores, whereas on 3.X it lurches from core to core like a drunk in a cathedral. However, the hardware is not identical, and this is on proprietary, not benchmark, workloads, which is why I haven't published anything. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?
On 12/05/2013 10:21 AM, Stephen Frost wrote: * David Johnston (pol...@yahoo.com) wrote: ISTM that instituting some level of categorization for messages would be helpful. Then logging and reporting frameworks would be able to identify and segregate the logs in whatever way they and the configuration deems appropriate. I've wanted to do that and have even discussed it with folks in the past, the trick is finding enough toit's, which is difficult when you start to look at the size of the task... But ... if we set a firm policy on this, then we could gradually clean up the error messages piecemeal over the next couple of major versions. We could also make sure that any new features complied with the categorization policy. Right now, how to categorize errors is up to each individual patch author, which means that things are all over the place, and get worse with each new feature added. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?
On 12/05/2013 10:46 AM, Tom Lane wrote: Before we could get very far we'd need a better understanding than we have of what cases a DBA might be interested in. To take the specific example that started this thread, there wouldn't be a lot of value IMO in a classification like connection failure messages. I think the OP is probably right that those are often uninteresting --- but as I mentioned, too many clients might become interesting if he's wondering whether he needs to enlarge max_connections. Or password failure cases might become interesting if he starts to suspect breakin attempts. So I'd want to see a design that credibly covers those sorts of needs before we put any large effort into code changes. Heck, I'd be happy just to have a class of messages which specifically means "OMG, there's something wrong with the server", that is, a flag for messages which only occur when PostgreSQL encounters a bug, data corruption, or platform error. Right now, I have to suss those out by regex. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/05/2013 12:41 PM, Robert Haas wrote: On Thu, Dec 5, 2013 at 12:54 PM, Josh Berkus j...@agliodbs.com wrote: Actually, I've been able to do 35K TPS on commodity hardware on Ubuntu 10.04. I have yet to go about 15K on any Ubuntu running a 3.X Kernel. The CPU scheduling on 2.6 just seems to be far better tuned, aside from the IO issues; at 35K TPS, the CPU workload is evenly distributed across cores, whereas on 3.X it lurches from core to core like a drunk in a cathedral. Do drunks lurch differently in cathedrals than they do elsewhere? Yeah, beacause they lurch from one column to another. It's a visual metaphor. ;-) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers