Re: [HACKERS] Storing the password in .pgpass file in an encrypted format
On Fri, Feb 21, 2014 at 10:42 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> I think this thread deserves more attention:
> http://www.postgresql.org/message-id/caazkufajufddfp1_vghbdfyru0sj6msovvkrp87acq53ov6...@mail.gmail.com

(I wrote that mail.)

I'm still interested in this idea and haven't found a good reason to rescind the general thinking there. Some of my colleagues are thinking along similar lines outside the Postgres context, and they seem happy with how that is shaping up. So, if there is some will for revival, that would be grand.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing the password in .pgpass file in an encrypted format
On Fri, Feb 21, 2014 at 6:15 PM, Greg Stark st...@mit.edu wrote:
> On Fri, Feb 21, 2014 at 10:18 PM, Daniel Farina dan...@heroku.com wrote:
>> I'm still interested in this idea and haven't found a good reason to
>> rescind the general thinking there.
>
> It's an interesting idea. I wonder if it would be possible to make it
> compatible with existing tools like ssh-agent instead of inventing our
> own?

I don't understand what you mean: the aesthetic of that proposal was to act as pure delegation, insomuch as possible, precisely so as to integrate with other programs, and the supplementary programs I provided, written just for the purposes of demonstration, are short. (https://github.com/fdr/pq-resolvers, if you want to read the program texts)
Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr
On Tue, Dec 10, 2013 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> So my objection to what Peter is suggesting is not that it's a bad idea
> in isolation, but that I don't see where he's going to stop, short of
> reinventing every query-normalization behavior that exists in the
> planner. If this particular case is worthy of fixing with a hack in the
> fingerprinter, aren't there going to be dozens more with just as good
> claims? (Perhaps not, but what's the basis for thinking this is far
> worse than any other normalization issue?)

Qualitatively, the dynamic-length VALUES list is the worst offender. There is no algebraic answer to where to stop with normalizations, but as Peter points out, that bridge has been crossed already: assumptions have already been made that toss some potentially useful information, and yet the program is undoubtedly practical.

Based on my own experience (which sounds similar to Andres's), I am completely convinced that the canonicalization he proposes here is a great deal more practical than the current definition, so I suggest the goal is worthwhile.
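For onlookers, the flavor of normalization under discussion can be sketched with a toy textual fingerprinter. This is purely illustrative and entirely my own construction: pg_stat_statements actually jumbles the post-parse-analysis tree, not the query text.

```python
import re

def fingerprint(sql: str) -> str:
    """Toy canonicalizer: replace literal constants with '?', then
    collapse variable-length lists of constants -- the "dynamic length
    values list" offender -- to a single placeholder, so that list
    arity no longer splits otherwise-identical queries apart."""
    s = re.sub(r"\b\d+\b|'[^']*'", "?", sql)             # constants -> ?
    s = re.sub(r"\(\s*\?(\s*,\s*\?)*\s*\)", "(?)", s)    # (?, ?, ?) -> (?)
    return s

a = fingerprint("SELECT * FROM t WHERE id IN (1, 2, 3)")
b = fingerprint("SELECT * FROM t WHERE id IN (42, 7)")
print(a)        # SELECT * FROM t WHERE id IN (?)
print(a == b)   # True: the arity of the constant list no longer matters
```

With plain constant replacement alone, the two queries above would fingerprint differently; collapsing the list is exactly the extra canonicalization step at issue.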
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Nov 14, 2013 at 7:25 PM, Peter Geoghegan p...@heroku.com wrote:
> On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote:
>> Hello,
>> Please find attached pg_stat_statements-identification-v9.patch.
>
> I took a quick look. Observations:
>
> + /* Making query ID dependent on PG version */
> + query->queryId |= PG_VERSION_NUM << 16;
>
> If you want to do something like this, make the value of
> PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something. Why
> are you doing this?

The destabilization of the query_id is to attempt to skirt the objection that instability in the query_id would otherwise be unpredictable and present a false contract, particularly in regard to point-release upgrades. Earliest drafts of mine destabilized it every session, but that kills off a nice use case of correlating query ids between primaries and standbys, if memory serves. This version has the semantics of destabilizing -- for sure -- on every point release.

> @@ -128,6 +146,7 @@ typedef struct pgssEntry
>   pgssHashKey key;       /* hash key of entry - MUST BE FIRST */
>   Counters    counters;  /* the statistics for this query */
>   int         query_len; /* # of valid bytes in query string */
> + uint32      query_id;  /* jumble value for this entry */
>
> query_id is already in key.
>
> Not sure I like the idea of the new enum at all, but in any case you
> shouldn't have a PGSS_TUP_LATEST constant - should someone go update
> all usage of that constant only when your version isn't the latest?
> Like here:
>
> + if (detected_version >= PGSS_TUP_LATEST)

It is roughly modeled after the column-count versioning of the same thing that pre-existed in pg_stat_statements. The only reason to have a LATEST is for some of the invariants, though; otherwise, most uses should use the version-stamped symbol. I did this wrongly a bunch of times, as spotted by Alvaro, I believe.
> I forget why Daniel originally altered the min value of
> pg_stat_statements.max to 1 (I just remember that he did), but I don't
> think it holds that you should keep it there. Have you considered the
> failure modes when it is actually set to 1?

Testing. It was very useful to set it to small numbers, like two or three. It's not crucial to the patch at all, but having to whack it back all the time to keep the patch submission devoid of it seemed impractically tedious during revisions.

> This is what I call a "can't happen" error, or a defensive one:
>
> + else
> + {
> +     /*
> +      * Couldn't identify the tuple format.  Raise error.
> +      *
> +      * This is an exceptional case that may only happen in bizarre
> +      * situations, since it is thought that every released version
> +      * of pg_stat_statements has a matching schema.
> +      */
> +     ereport(ERROR,
> +             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +              errmsg("pg_stat_statements schema is not supported "
> +                     "by its installed binary")));
> + }
>
> I'll generally make these simple elog()s, which are more terse. No one
> is going to find all that dressing useful.

That seems reasonable. Yes, it's basically a soft assert. I hit this one in development a few times; it was handy.

> It probably isn't useful to comment random code that isn't affected by
> your patch - I don't find this new refactoring useful, and am surprised
> to see it in your patch:
>
> +  /* Check header existence and magic number match. */
>    if (fread(&header, sizeof(uint32), 1, file) != 1 ||
> -      header != PGSS_FILE_HEADER ||
> -      fread(&num, sizeof(int32), 1, file) != 1)
> +      header != PGSS_FILE_HEADER)
> +      goto error;
> +
> +  /* Read how many table entries there are. */
> +  if (fread(&num, sizeof(int32), 1, file) != 1)
>        goto error;
>
> Did you mean to add all this, or is it left over from Daniel's patch?

The whitespace changes are not intentional, but I think the comments help a reader track what's going on, which becomes more important if one adds more fields.
It can be broken out into a separate patch, but it didn't seem like that bookkeeping was necessary.

> @@ -43,6 +43,7 @@
>   */
>  #include "postgres.h"
>
> +#include <time.h>
>  #include <unistd.h>
>
>  #include "access/hash.h"
> @@ -59,15 +60,18 @@
>  #include "storage/spin.h"
>  #include "tcop/utility.h"
>  #include "utils/builtins.h"
> +#include "utils/timestamp.h"
>
> Final thought: I think the order in the pg_stat_statements view is
> wrong. It ought to be like a composite primary key - (userid, dbid,
> query_id).

I made no design decisions here...no complaints from me.
Re: [HACKERS] Save Hash Indexes
On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> Hi,
>
> Here's an idea: when a user asks for a Hash Index, transparently build
> a BTree index over a hash function instead.
>
> Advantages:
>  - it works
>  - it's crash safe
>  - it's (much?) faster than a hash index anyway
>
> Drawbacks:
>  - root access concurrency
>  - we need a hash_any function stable against pg_upgrade
>
> After talking about it with Heikki, we don't seem to find ways in which
> the root access concurrency pattern would be visible enough to matter.
> Also, talking with Peter Geoghegan, it's unclear that there's a use
> case where a hash index would be faster than a btree index over the
> hash function.
>
> Comments?

I haven't met a single Heroku user who has stumbled into this landmine. Normally I am very wary of such landmine-laden features, but this one is there and doesn't seem to have detonated at any point. I guess the obscure nature of those indexes and the stern warning in the documentation have sufficed to discourage their use.

Given that experience, I think foreclosing on fixing hash indexes is premature: the status quo doesn't seem to be hurting inexperienced users of this stripe. Maybe there are yet other reasons to argue the subject, though...it's not like the current user base relies on the behavior as-is either, having seemingly steered clear just about altogether.
Re: [HACKERS] Save Hash Indexes
On Fri, Nov 1, 2013 at 8:52 AM, Daniel Farina dan...@heroku.com wrote:
> On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
>> Also, talking with Peter Geoghegan, it's unclear that there's a use
>> case where a hash index would be faster than a btree index over the
>> hash function.
>>
>> Comments?
>
> I haven't met a single Heroku user who has stumbled into this
> landmine. [...]

As an addendum, though: some users *have* done the hash-function-plus-btree trick to make keys smaller. They tend to resort to full-blown cryptographic hashes and assume/enforce non-collision, but it's somewhat awkward and finicky. So while perhaps commandeering the 'hash' index type is a bit over-aggressive, making that use case work better would probably carry positive impact.

I can say most of my indexes over text are *never* used for range queries, so hashing them down, one would think, would be great. The fiddliness of applying expression indexes over all that forecloses getting the benefits that might be possible there: only certain heavy workloads will receive that level of care.
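The hash-plus-recheck pattern described above can be sketched in a few lines. This toy Python example is my own construction (truncated SHA-256 standing in for whatever hash a user might index on); it shows why collisions cost only a wasted recheck, never a wrong answer:

```python
import hashlib

def short_key(text: str) -> int:
    # Hash a long text key down to 64 bits. An arbitrary construction
    # for illustration; Postgres's own hash functions differ.
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

class HashKeyedIndex:
    """A toy 'btree over a hash' lookup: the index is keyed only by the
    64-bit hash, and equality probes recheck the full key, so a hash
    collision costs a wasted comparison rather than a wrong answer."""

    def __init__(self):
        self.by_hash = {}  # short hash -> list of (full_key, value)

    def insert(self, key: str, value):
        self.by_hash.setdefault(short_key(key), []).append((key, value))

    def lookup(self, key: str):
        for full, value in self.by_hash.get(short_key(key), []):
            if full == key:  # recheck: tolerate collisions
                return value
        return None

idx = HashKeyedIndex()
idx.insert("https://example.com/some/very/long/url", 1)
print(idx.lookup("https://example.com/some/very/long/url"))  # 1
print(idx.lookup("https://example.com/other"))               # None
```

The "assume/enforce non-collision" shortcut mentioned above corresponds to skipping the recheck in lookup(), which is exactly the finicky part.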
[HACKERS] What hook would you recommend for one time, post authentication?
What hook would you recommend that matches these criteria:

* Runs post-authentication
* Runs once

I was putting together a little extension module[0] intended to do connection limits out-of-band with the catalog (so that hot standbys and primaries can have different imposed connection limits), but am stymied because I can't locate a hook matching this description.

My general approach has been to try to use GetUserNameFromId(GetSessionUserId()), but this requires InitializeSessionUserId to be called first, and that has been causing me some trouble.

ClientAuthentication_hook is too early (authentication has not yet happened, and InitializeSessionUserId has not been called). Many of the other hooks are run per query (like the Executor hooks). And postinit.c is not giving me a lot of clues; nothing with the lexeme 'hook', as seen in typedefs.list/grep, is giving me a lot of confidence.

I appreciate any advice one can supply. Thank you.

[0]: https://github.com/fdr/pg_connlimit
Re: [HACKERS] What hook would you recommend for one time, post authentication?
On Mon, Oct 28, 2013 at 6:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> roleid = get_role_oid(port->user_name, true);

Thank you for that; it appears to work very well for my purpose, as does ClientAuthentication_hook, now.
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Tue, Oct 22, 2013 at 2:56 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> Tom Lane t...@sss.pgh.pa.us writes:
>> Hm. It's been a long time since college statistics, but doesn't the
>> entire concept of standard deviation depend on the assumption that the
>> underlying distribution is more-or-less normal (Gaussian)? Is there a
>
> I just had a quick chat with a statistician friend of mine on that
> topic, and it seems that the only way to make sense of an average is if
> you already know the distribution.
>
> In our case, what I keep experiencing with tuning queries is that we
> have like 99% of them running under an acceptable threshold and 1% of
> them taking more and more time.

Agreed. In a lot of Heroku's performance work, the p99 and p95 have provided a lot more value than stddev, although stddev is a lot better than nothing and probably easier to implement. There are apparently high-quality statistical approximations of these that are inexpensive to compute and small in memory representation. That said, I'd take stddev over nothing, for sure.

Handily for stddev, by keeping snapshots of count(x), sum(x), and sum(x**2) (which I understand to be the components of stddev), one can compute stddevs across different time spans using auxiliary tools that sample this triplet on occasion. That's a handy property that I'm not sure the percentile approximations can match easily.
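A minimal sketch of the triplet idea, assuming cumulative counters that are only sampled (never reset) between the two snapshots: take the element-wise difference of the (count, sum, sum-of-squares) triplets for the interval and derive the population stddev from that.

```python
import math

def window_stddev(snap_a, snap_b):
    """Population stddev over the interval between two cumulative
    snapshots, each a (count, sum, sum_of_squares) triplet. Assumes
    snap_b was taken after snap_a with no counter reset in between."""
    n = snap_b[0] - snap_a[0]
    s = snap_b[1] - snap_a[1]
    q = snap_b[2] - snap_a[2]
    if n <= 0:
        return None
    mean = s / n
    # Var(X) = E[X^2] - E[X]^2; clamp tiny negative float error to 0.
    return math.sqrt(max(q / n - mean * mean, 0.0))

def snapshot(xs):
    # The cumulative counters a stats extension might keep per query.
    return (len(xs), sum(xs), sum(x * x for x in xs))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
a = snapshot(xs[:3])  # earlier sample of the counters
b = snapshot(xs)      # later sample
print(window_stddev((0, 0, 0), snapshot(xs)))  # 2.0, over all executions
print(window_stddev(a, b))  # ~1.7889, over just the interval a..b
```

This is the property claimed above: external tooling can recover the stddev of any sampled window without the server having to track windows itself.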
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur samthaku...@gmail.com wrote:
>> Please find attached a patch which adds documentation for the
>> session_start and introduced fields, and corrects the documentation
>> for queryid to be query_id. session_start remains in the view as
>> agreed.
>
> Thanks for updating the document! I'm not clear about the use case of
> 'session_start' and 'introduced' yet. Could you elaborate it? Maybe we
> should add that explanation into the document?

Probably. The idea is that without those fields it is, to wit, impossible to explain non-monotonic movement in the metrics of those queries, for precise use in tools that insist on monotonicity of the fields, which are all cumulative to date, I think.

pg_stat_statements_reset() or crashing loses the session, so one expects ncalls may decrease. In addition, a query that is bouncing in and out of the hash table will have its statistics lost, so its numbers can also decrease. This can cause unresolvable artifacts in analysis tools. The two fields allow a precise understanding of whether the statistics for a given query_id have been continuous since the last time a program inspected them.

> In my test, I found that pg_stat_statements.total_time always indicates
> zero. I guess that the patch might handle pg_stat_statements.total_time
> wrongly.
>
> +values[i++] = DatumGetTimestamp(
> +    instr_get_timestamptz(pgss->session_start));
> +values[i++] = DatumGetTimestamp(
> +    instr_get_timestamptz(entry->introduced));
>
> These should be executed only when detected_version >= PGSS_TUP_LATEST?

Yes. Oversight.

> + <entry><structfield>session_start</structfield></entry>
> + <entry><type>text</type></entry>
> + <entry></entry>
> + <entry>Start time of a statistics session.</entry>
> + </row>
> +
> + <row>
> + <entry><structfield>introduced</structfield></entry>
> + <entry><type>text</type></entry>
>
> The data type of these columns is not text.
Oops.

> +instr_time session_start;
> +uint64     private_stat_session_key;
>
> It's better to add comments explaining these fields.

Yeah.

> +microsec = INSTR_TIME_GET_MICROSEC(t) -
> +    ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);
>
> HAVE_INT64_TIMESTAMP should be considered here.

That's not a bad idea.

> +if (log_cannot_read)
> +    ereport(LOG,
> +            (errcode_for_file_access(),
> +             errmsg("could not read pg_stat_statement file \"%s\": %m",
> +                    PGSS_DUMP_FILE)));
>
> Is it better to use WARNING instead of LOG as the log level here?

Is this new code? Why did I add it again? It seems reasonable, though, to call it a WARNING.

> +/*
> + * The role calling this function is unable to see
> + * sensitive aspects of this tuple.
> + *
> + * Nullify everything except the "insufficient privilege"
> + * message for this entry
> + */
> +memset(nulls, 1, sizeof nulls);
> +
> +nulls[i] = 0;
> +values[i] = CStringGetTextDatum("<insufficient privilege>");
>
> Why do we need to hide *all* the fields in pg_stat_statements when it's
> accessed by an improper user? This is a big change of
> pg_stat_statements behavior, and it might break compatibility.

It seems like an information leak, one that grows more major if query_id is exposed and, at any point, one can determine the query_id for a query text.

> This is not directly related to the patch itself, but why does the
> queryid need to be calculated based on also the statistics session?
>
>> If we expose the hash value of the query tree without using the
>> statistics session, it is possible that users might make the wrong
>> assumption that this value remains stable across version upgrades,
>> when in reality it depends on whether the version has made changes to
>> query tree internals. So, to explicitly ensure that users do not make
>> this wrong assumption, hash value generation uses the statistics
>> session id, which is newly created under the situations described
>> above.
>
> So, ISTM that we can use, for example, the version number to calculate
> the query_id. Right?

That does seem like it may be a more reasonable stability vs. instability tradeoff than once per statistics session, because then the use case with standbys works somewhat better. That said, the general concern was accidental contracts being assumed by consuming code, so this approach is tuned to make the query_id as unstable as possible while still being useful: stable, but only within one statistics-gathering session.

I did not raise the objection about over-aggressive contracts being exposed, although I think the concern has merit...but given the use case involving standbys, I am for now charitable to increasing the stability to the level you indicate: a guaranteed break every point release.
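The epoch arithmetic in the microsec fragment quoted above rebases a Unix-epoch microsecond count onto the Postgres epoch (2000-01-01). A quick Python check of the constants involved (the names mirror the C macros, but this is just illustrative arithmetic, not the extension's code):

```python
from datetime import datetime, timezone

# Days between the Unix epoch (1970-01-01) and the Postgres epoch
# (2000-01-01); mirrors (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE).
EPOCH_SHIFT_DAYS = (datetime(2000, 1, 1, tzinfo=timezone.utc)
                    - datetime(1970, 1, 1, tzinfo=timezone.utc)).days
USECS_PER_DAY = 86400 * 1_000_000

def unix_usecs_to_pg_timestamp(us_since_unix_epoch: int) -> int:
    """Rebase microseconds-since-Unix-epoch onto the Postgres epoch,
    as the quoted fragment does in an integer-timestamp build (the
    HAVE_INT64_TIMESTAMP point is that float-timestamp builds need a
    different representation)."""
    return us_since_unix_epoch - EPOCH_SHIFT_DAYS * USECS_PER_DAY

print(EPOCH_SHIFT_DAYS)  # 10957
# 2000-01-01 00:00:00 UTC maps to 0 in Postgres-epoch microseconds:
print(unix_usecs_to_pg_timestamp(946_684_800 * 1_000_000))  # 0
```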
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina dan...@heroku.com wrote:
>> Probably. The idea is that without those fields it is impossible to
>> explain non-monotonic movement in the metrics of those queries, for
>> precise use in tools that insist on monotonicity of the fields, which
>> are all cumulative to date, I think.
>>
>> pg_stat_statements_reset() or crashing loses the session, so one
>> expects ncalls may decrease. In addition, a query that is bouncing in
>> and out of the hash table will have its statistics lost, so its
>> numbers can also decrease. This can cause unresolvable artifacts in
>> analysis tools. The two fields allow a precise understanding of
>> whether the statistics for a given query_id have been continuous
>> since the last time a program inspected them.
>
> Thanks for elaborating! Since 'introduced' is reset even when the
> statistics are reset, maybe we can do without 'session_start' for that
> purpose?

There is a small loss of precision. The original reason was that one wanted to know, given two samples of pg_stat_statements, whether the query_id would remain stable for a given query. For example: if the session changes on account of a reset, then it is known that all the query_ids one's external program is tracking are no longer going to be updated, and the program can account for the fact that the same query text is going to have a new query_id.

Given the new idea of mixing in the point-release number: if the code is changed to instead mix in the full version of Postgres, as you suggested recently, session_start can probably be removed. The caveat is that the client is then going to have to do something a bit weird, like ask for the point release (and perhaps even the compile options) of Postgres to know when the query_id is going to have a different value for a given query text. But maybe this is an acceptable compromise to remove one field.
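The compromise discussed here -- a query id stable within a release but guaranteed to change across point releases -- can be sketched as mixing the version into the raw tree hash. This is an illustrative sketch, not the patch's actual code (which XORs in a per-statistics-session key instead); PG_VERSION_NUM-style integers such as 90302 denote 9.3.2.

```python
MASK64 = (1 << 64) - 1

def exposed_query_id(raw_jumble: int, version_num: int) -> int:
    """Mix the server version into the internal query-tree hash so the
    exposed id is stable within one release but guaranteed to differ
    across releases, heading off false stability contracts."""
    return (raw_jumble ^ (version_num << 32)) & MASK64

jumble = 0x5EED_F00D_CAFE  # stand-in for a query-tree hash

# Stable within a release...
print(exposed_query_id(jumble, 90302) == exposed_query_id(jumble, 90302))  # True
# ...but guaranteed different across point releases (90302 vs. 90303):
print(exposed_query_id(jumble, 90302) == exposed_query_id(jumble, 90303))  # False
```

Because the mixed-in value is a function of the server version rather than a random per-session key, a primary and its standbys on the same minor release would still agree on ids, which is the standby use case argued for above.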
>>> +/*
>>> + * The role calling this function is unable to see
>>> + * sensitive aspects of this tuple.
>>> + *
>>> + * Nullify everything except the "insufficient privilege"
>>> + * message for this entry
>>> + */
>>> +memset(nulls, 1, sizeof nulls);
>>> +
>>> +nulls[i] = 0;
>>> +values[i] = CStringGetTextDatum("<insufficient privilege>");
>>>
>>> Why do we need to hide *all* the fields in pg_stat_statements when
>>> it's accessed by an improper user? This is a big change of
>>> pg_stat_statements behavior, and it might break compatibility.
>>
>> It seems like an information leak, one that grows more major if
>> query_id is exposed and, at any point, one can determine the query_id
>> for a query text.
>
> So hiding only query and query_id is enough?

Yeah, I think so. The other fields feel a bit weird to leave hanging around as well, so I thought I'd just fix it in one shot, but doing the minimum, or only applying this idea to new fields, is safer. It's shorter to hit all the fields, though, which is why I was tempted to do that. Perhaps not a good economy given potential subtle breaks in the next version.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 10:49 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Daniel Farina escribió:
>> On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
>>> In my test, I found that pg_stat_statements.total_time always
>>> indicates zero. I guess that the patch might handle
>>> pg_stat_statements.total_time wrongly.
>>>
>>> +values[i++] = DatumGetTimestamp(
>>> +    instr_get_timestamptz(pgss->session_start));
>>> +values[i++] = DatumGetTimestamp(
>>> +    instr_get_timestamptz(entry->introduced));
>>>
>>> These should be executed only when detected_version >= PGSS_TUP_LATEST?
>>
>> Yes. Oversight.
>
> Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I mean, if
> later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
> becomes latest, somebody running the current definition with the
> updated .so will no longer get these values. Or is the intention that
> PGSS_TUP_LATEST will never be updated again, and future versions will
> get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
> I mean, surely we can always come up with new symbols to use, but it
> seems hard to follow.
>
> There's one other use of PGSS_TUP_LATEST here which I think should also
> be changed to >= SOME_SPECIFIC_VERSION:
>
> + if (detected_version >= PGSS_TUP_LATEST)
> + {
> +     uint64 qid = pgss->private_stat_session_key;
> +
> +     qid ^= (uint64) entry->query_id;
> +     qid ^= ((uint64) entry->query_id) << 32;
> +
> +     values[i++] = Int64GetDatumFast(qid);
> + }

I made some confusing mistakes here in using the newer tuple versioning. Let me try to explain what the motivation was: I was adding new fields to pg_stat_statements and was afraid that it'd be hard to get a very clear view that all the set fields were in alignment and there were no accidental overruns, with the problem getting more convoluted as more versions are added.
The idea of PGSS_TUP_LATEST is useful for catching common programmer error by testing some invariants, and it'd be nice not to have to thrash the invariant-checking code every release, which would probably defeat the point of such oops-prevention code. But the fact that I went on to rampantly do questionable things with PGSS_TUP_LATEST is a bad sign.

By example, here are the two uses that have served me very well:

    /* Perform version detection */
    if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
        detected_version = PGSS_TUP_V1_0;
    else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
        detected_version = PGSS_TUP_V1_1;
    else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
        detected_version = PGSS_TUP_LATEST;
    else
    {
        /*
         * Couldn't identify the tuple format.  Raise error.
         *
         * This is an exceptional case that may only happen in bizarre
         * situations, since it is thought that every released version
         * of pg_stat_statements has a matching schema.
         */
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("pg_stat_statements schema is not supported "
                        "by its installed binary")));
    }

And:

    #ifdef USE_ASSERT_CHECKING
        /* Check that every column appears to be filled */
        switch (detected_version)
        {
            case PGSS_TUP_V1_0:
                Assert(i == PG_STAT_STATEMENTS_COLS_V1_0);
                break;
            case PGSS_TUP_V1_1:
                Assert(i == PG_STAT_STATEMENTS_COLS_V1_1);
                break;
            case PGSS_TUP_LATEST:
                Assert(i == PG_STAT_STATEMENTS_COLS);
                break;
            default:
                Assert(false);
        }
    #endif

Given that, perhaps a way to fix this is something like this patch fragment:

     {
         PGSS_TUP_V1_0 = 1,
         PGSS_TUP_V1_1,
    -    PGSS_TUP_LATEST
    +    PGSS_TUP_V1_2
     } pgssTupVersion;

    +#define PGSS_TUP_LATEST PGSS_TUP_V1_2

This way, when a programmer is making new tuple versions, they are much more likely to see the immediate need to teach those two sites about the new tuple size. But the fact that the invariants do not get updated in a completely compulsory way may push the value of this checking under water.
I'd be sad to see it go; it has saved me a lot of effort when returning to the code after a while. What do you think?

> The instr_time thingy being used for these times maps to
> QueryPerformanceCounter() on Windows, and I'm not sure how useful this
> is for long-term time tracking; see
> http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
> for instance. I think it'd be better to use TimestampTz and
> GetCurrentTimestamp() for this.

Ah. I was going to do that, but thought it'd be nice to merely push down the already-extant Instr struct in most cases, so as to get the 'start' time stored there for free. But if it can't
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 1:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Just noticed that you changed the timer to struct Instrumentation. Not
> really sure about that change. Since you seem to be using only the
> start time and counter, wouldn't it be better to store only those?
> Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

Yeah, I was unsure about that too. The motivation was that I need one more piece of information in pgss_store (the absolute start time). I was going to widen the argument list, but it was looking pretty long, so instead I thought it'd be more concise to push the entire, typically extant Instrumentation struct pointer down.

> Would it work to define your own struct to pass around?

Absolutely; I was just hoping to spare the code another abstraction if an existing one was a precise superset. Looks like that isn't going to happen, though, so a pgss-oriented struct is likely what it will have to be.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur samthaku...@gmail.com wrote:
>> On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote:
>>>> Looks pretty good. Do you want to package up the patch with your
>>>> change and do the honors and re-submit it? Thanks for helping out
>>>> so much!
>>> Sure, will do. Need to add a bit of documentation explaining
>>> statistics sessions as well.
>> I did some more basic testing around pg_stat_statements.max, now that
>> we have clarity from Peter about its value being legitimate below
>> 100. It seems to work fine: with pg_stat_statements.max = 4, the max
>> unique queries in the view are 4. On the 5th query, the view holds
>> just the latest unique query, discarding the previous 4. Fujii had
>> reported a segmentation fault in this scenario. Thank you for the
>> patch. Please find the patch attached.
>
> Thanks for the patch! Here are the review comments:
>
> +OUT session_start timestamptz,
> +OUT introduced timestamptz,
>
> The patch exposes these columns in the pg_stat_statements view. These
> should be documented.
>
> I don't think that session_start should be exposed in every row of
> pg_stat_statements, because it's updated only when all statistics are
> reset, i.e., session_start of all entries in pg_stat_statements
> indicates the same value.

Dunno. I agree it'd be less query traffic and noise. Maybe hidden behind a UDF? I thought stats_reset on pg_database might be prior art, but realized that the statistics there differ depending on per-database stats resets (maybe a name change of 'session' to 'stats_reset' would be useful to avoid too much incohesion, though). I didn't want to bloat the taxonomy of exposed API/symbols too much for pg_stat_statements, but perhaps in this instance it is reasonable. Also, isn't the interlock with the result set perhaps more precise/fine-grained with the current solution? Yet, that's awfully corner-casey.
I'm on the fence because the simplicity and precision of the current regime for aggregation tools is nice, but avoiding the noise for inspecting humans in the common case is also nice. I don't see a reason right now to go strongly either way, so if you feel moderately strongly that the repetitive column should be stripped then I am happy to relent and help out. Let me know your detailed thoughts (or modify the patch with your idea).

> +OUT query_id int8,
>
> query_id or queryid? I like the latter. Also, the document uses the
> latter.

Not much opinion. I prefer expending the '_' character, though, because most compound words in Postgres performance-statistics catalogs seem to use an underscore.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Tue, Oct 1, 2013 at 5:32 AM, Sameer Thakur samthaku...@gmail.com wrote:
> On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL]
> [hidden email] wrote:
>> On Sep 30, 2013 4:39 AM, Sameer Thakur [hidden email] wrote:
>>
>> Also, for onlookers, I have changed this patch around to do the
>> date-oriented stuff but want to look it over before stapling it up
>> and sending it.
>>
>> If one cannot wait, one can look at
>> https://github.com/fdr/postgres/tree/queryid. The squashed version of
>> that history contains a reasonable patch, I think, but a re-read
>> often finds something for me and I've only just completed it
>> yesterday.
>
> I did the following:
>
> 1. Forked from fdr/postgres
> 2. Cloned branch queryid
> 3. Squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 and
>    d813096e29049667151a49fc5e5cf3d6bbe55702; picked
>    be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
> 4. Usual make / make install / create extension pg_stat_statements
>    (pg_stat_statements.max = 100)
> 5. select * from pg_stat_statements_reset(); select * from
>    pgbench_tellers. Result below (remaining counter columns trimmed):
>
>  userid | dbid  |          session_start           |        introduced         |                   query                    |      query_id       | calls | rows
> --------+-------+----------------------------------+---------------------------+--------------------------------------------+---------------------+-------+------
>      10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset();  | 2531907647060518039 |     1 |    1
>      10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ;            | 7580333025384382649 |     1 |   10
> (2 rows)
>
> I understand session_start and verified that it changes with each
> database restart to reflect current time.

It should only restart when the statistics file cannot be loaded. This seems to work fine:

1. Started the instance
2. Executed pg_stat_statements_reset(), select * from pgbench_history,
   select * from pgbench_tellers. Got the following in the
   pg_stat_statements view (remaining counter columns trimmed):

 userid | dbid  |          session_start           |            introduced            |                   query                    |       query_id       | calls | rows
--------+-------+----------------------------------+----------------------------------+--------------------------------------------+----------------------+-------+------
     10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:43.724301+05:30 | select * from pgbench_history;             |  -165801328395488047 |     1 |    0
     10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:37.379785+05:30 | select * from pgbench_tellers;             |  8376871363863945311 |     1 |   10
     10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:26.667178+05:30 | select * from pg_stat_statements_reset();  | -1061018443194138344 |     1 |    1
(3 rows)

Then restarted the server
Re: [HACKERS] pluggable compression support
On Mon, Sep 30, 2013 at 1:49 PM, Huchev hugochevr...@gmail.com wrote: How come any compressor which could put up some competition to pglz is systematically pushed out of the field on the grounds of unverifiable legal risks?

Because pglz has been around for a while and has not caused patent trouble. The risks have been accepted and the downsides have not materialized. Were pglz being written and distributed starting today, perhaps your reasoning would be more compelling, but as-is the pglz ship has sailed for quite some time and empirically it has not been a problem.

That said, I hope the findings are in favor of lz4 or snappy integration. It does seem lz4 has picked up a slight edge. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, Sep 29, 2013 at 10:25 AM, Sameer Thakur samthaku...@gmail.com wrote: Yes I was. Just saw a warning when pg_stat_statements is loaded that valid values for pg_stat_statements.max are between 100 and 2147483647. Not sure why though.

I remember hacking that out for testing's sake. I can only justify the lower bound as a guard against a foot-gun, to prevent someone from being stuck restarting the database to get a reasonable number in there. Let's CC Peter; maybe he can remember some thoughts about that.

Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed version of that history contains a reasonable patch, I think, but a re-read often finds something for me and I've only just completed it yesterday.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sep 30, 2013 4:39 AM, Sameer Thakur samthaku...@gmail.com wrote:

Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed version of that history contains a reasonable patch, I think, but a re-read often finds something for me and I've only just completed it yesterday.

I did the following:
1. Forked from fdr/postgres
2. Cloned branch queryid
3. Squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 and d813096e29049667151a49fc5e5cf3d6bbe55702, picked be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. Usual make / make install / create extension pg_stat_statements (pg_stat_statements.max=100)
5. select * from pg_stat_statements_reset(); select * from pgbench_tellers;

Result below (total_time and the remaining counter columns, all zero, are elided for width):

 userid | dbid  |          session_start           |        introduced         |                   query                   |      query_id       | calls | rows | shared_blks_hit
--------+-------+----------------------------------+---------------------------+-------------------------------------------+---------------------+-------+------+-----------------
     10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset(); | 2531907647060518039 |     1 |    1 |               0
     10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ;           | 7580333025384382649 |     1 |   10 |               1
(2 rows)

I understand session_start and verified that it changes with each database restart to reflect the current time.

It should only restart when the statistics file cannot be loaded.

I am not sure why introduced keeps showing the same 1970-01-01 05:30:00+05:30 value. I thought it reflected the (most recent) time a query's statistics were added to the hashtable. Is this a bug? Will continue to test and try to understand the code.

Yes, a bug. There are a few calls to pgss_store and I must be submitting a zero value for the introduction time in one of those cases. Heh, I thought that was fixed, but maybe I broke something. Like I said: preliminary. At the earliest I can look at this Wednesday, but feel free to amend and resubmit including my changes if you feel inclined and get to it first.
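As an aside on that suspicious value: a zero (unset) introduction time rendered in the session's +05:30 zone comes out as exactly the constant seen in the view output. A quick illustration in plain Python (not the pg_stat_statements code, just the rendering arithmetic):

```python
from datetime import datetime, timezone, timedelta

# An "introduced" time of zero, displayed in IST (+05:30), renders as
# the suspicious constant seen in the view output above.
ist = timezone(timedelta(hours=5, minutes=30))
rendered = datetime.fromtimestamp(0, tz=ist)
print(rendered)  # 1970-01-01 05:30:00+05:30
```

So an epoch-looking "introduced" is a reliable tell that a zero timestamp was stored rather than the real entry time.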
Re: [HACKERS] Some interesting news about Linux 3.12 OOM
On Wed, Sep 25, 2013 at 8:00 AM, Greg Stark st...@mit.edu wrote: On Wed, Sep 25, 2013 at 12:15 AM, Daniel Farina dan...@heroku.com wrote: Enable the memcg OOM killer only for user faults, where it's really the only option available.

Is this really a big deal? I would expect most faults to be user faults. It's certainly a big deal that we need to ensure we can handle ENOMEM from syscalls and library functions we weren't expecting to return it. But I don't expect it to actually reduce the OOM killing sprees by much.

Hmm, I see what you mean. I have been reading through the mechanism: I got too excited about 'allocations by system calls', because I thought that might mean brk and friends, except that's not much of an allocation at all, just a reservation, I think.

There is some interesting stuff coming in along with these patches in bringing the user-space memcg OOM handlers up to snuff that may make it profitable to issue SIGTERM to backends when a safety margin is crossed (too bad the error messages will be confusing in that case). I was rather hoping that a regular ENOMEM could be injected by this mechanism the next time a syscall is touched (unknown), but I'm not confident whether this is made easier or not, one way or another. One could imagine the kernel injecting such a fault when the amount of memory being consumed starts to look hairy, but I surmise part of the impetus for userspace handling of that is to avoid getting into that particular heuristics game.

Anyway, I did do some extensive study of cgroups and memcg's implementation in particular and found it not really practical for Postgres use unless one was happy with lots and lots of database restarts, and this work still gives me some hope to try again, even if smaller modifications still seem necessary.
Re: [HACKERS] Some interesting news about Linux 3.12 OOM
On Sep 24, 2013 10:12 AM, Josh Berkus j...@agliodbs.com wrote: All, I've sent kernel.org a message that we're keen on seeing these changes become committed.

I thought it was merged already in 3.12. There are a few related patches, but here's one:

commit 519e52473ebe9db5cdef44670d5a97f1fd53d721
Author: Johannes Weiner han...@cmpxchg.org
Date: Thu Sep 12 15:13:42 2013 -0700

    mm: memcg: enable memcg OOM killer only for user faults

    System calls and kernel faults (uaccess, gup) can handle an out of
    memory situation gracefully and just return -ENOMEM. Enable the memcg
    OOM killer only for user faults, where it's really the only option
    available.

    Signed-off-by: Johannes Weiner han...@cmpxchg.org
    Acked-by: Michal Hocko mho...@suse.cz
    Cc: David Rientjes rient...@google.com
    Cc: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
    Cc: azurIt azu...@pobox.sk
    Cc: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
    Signed-off-by: Andrew Morton a...@linux-foundation.org
    Signed-off-by: Linus Torvalds torva...@linux-foundation.org

$ git tag --contains 519e52473ebe9db5cdef44670d5a97f1fd53d721
v3.12-rc1
v3.12-rc2

Searching for recent work by Johannes Weiner shows the pertinent stuff more exhaustively.

BTW, in the future if anyone sees kernel.org contemplating a patch which helps or hurts Postgres, don't hesitate to speak up to them. They don't get nearly enough feedback from DB developers.

I don't hesitate; most of the time I simply don't know.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina dan...@fdr.io wrote: I think the n-call underestimation propagation may not be quite precise for various detailed reasons (having to do with 'sticky' queries) and making it precise is probably more work than it's worth. And, on more reflection, I'm also having a hard time imagining people intuiting that value usefully. So, here's a version removing that.

I forgot about removal of the relevant SGML, amended here in v6.

pg_stat_statements-identification-v6.patch (Description: Binary data)
[HACKERS] Some interesting news about Linux 3.12 OOM
I'm not sure how many of you have been tracking this, but courtesy of lwn.net I have learned that the OOM killer behavior in Linux 3.12 will be significantly different, and by description it sounds like an improvement. I thought some people reading -hackers might be interested.

Based on the description at lwn, excerpted below, it sounds like the news might be that systems with overcommit on might see ENOMEM returned when a non-outlandish request for memory is made to the kernel:

"Johannes Weiner has posted a set of patches aimed at improving this situation. Following a bunch of cleanup work, these patches make two fundamental changes to how OOM conditions are handled in the kernel. The first of those is perhaps the most visible: it causes the kernel to avoid calling the OOM killer altogether for most memory allocation failures. In particular, if the allocation is being made in response to a system call, the kernel will just cause the system call to fail with an ENOMEM error rather than trying to find a process to kill. That may cause system call failures to happen more often and in different contexts than they used to. But, naturally, that will not be a problem since all user-space code diligently checks the return status of every system call and responds with well-tested error-handling code when things go wrong."

Subject to experiment, this may be some good news, as many programs, libraries, and runtime environments that may run parallel to Postgres on a machine are pretty lackadaisical about limiting the amount of virtual memory charged to them, and overcommit off is somewhat punishing in those situations if one really needed a large hash table from Postgres or whatever. I've seen some cases here where a good amount of VM has been reserved and caused apparent memory pressure that cut throughput short of what ought to be possible.
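The excerpt's sarcasm aside, the graceful path does exist for code that bothers to take it. A toy sketch in plain Python (the cache-building function is invented for illustration; the point is only that a failed allocation can be a recoverable error instead of an OOM kill):

```python
def load_cache(nbytes):
    """Try to build a large in-memory buffer, degrading on allocation failure."""
    try:
        return bytearray(nbytes)
    except MemoryError:
        # The "well-tested error-handling code" the excerpt jokes about:
        # fall back to a small buffer instead of being OOM-killed.
        return bytearray(4096)

# An outlandish request (1 EiB) fails to allocate; we degrade, not crash.
cache = load_cache(1 << 60)
print(len(cache))  # 4096
```

Whether the processes sharing a box with Postgres actually contain such handling is, of course, the experiment in question.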
[HACKERS] Unpacking scalar JSON values
Per report of Armin Ronacher, it's not clear how to take a scalar JSON string and unquote it into a regular Postgres text value, given what I can see here: http://www.postgresql.org/docs/9.3/static/functions-json.html

Example:

SELECT '"a json string"'::json;

(Although the same problem could play out with other scalar JSON types):

SELECT '4'::json;
SELECT '2.0'::json;

This use case arises from some of the extant unpacking operations, such as json_array_elements. It's not that strange to have a value something like this in a JSON document:

'{"tags": ["a \" string", "b", "c"]}'

Thoughts?
Re: [HACKERS] Unpacking scalar JSON values
On Sat, Aug 24, 2013 at 3:09 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 08/24/2013 11:36 PM, Daniel Farina wrote:

Per report of Armin Ronacher, it's not clear how to take a scalar JSON string and unquote it into a regular Postgres text value, given what I can see here: http://www.postgresql.org/docs/9.3/static/functions-json.html

Example: SELECT '"a json string"'::json; (Although the same problem could play out with other scalar JSON types): SELECT '4'::json; SELECT '2.0'::json; This use case arises from some of the extant unpacking operations, such as json_array_elements. It's not that strange to have a value something like this in a JSON document: '{"tags": ["a \" string", "b", "c"]}' Thoughts?

This was discussed to death at some point during development, and the prevailing consensus was that the json type is not representing the underlying structure/class instance/object but a string which encodes this object, so if you convert a restricted (must comply with the JSON spec) string to an unrestricted string you really just do a no-op cast.

This doesn't make a lot of sense to me.

select * from json_each_text('{"key": "va\"lue"}');

is handy and gives one the json value as text -- that is to say, dequoted. So it's not as though unquoting is not already an operation seen in some of the operators:

select * from json_each_text('{"key": "va\"lue"}');
 key | value
-----+--------
 key | va"lue
(1 row)

But there's no good way I can find from the documentation to do it with a scalar:

select ('"va\"lue"'::json)::text;
Re: [HACKERS] Unpacking scalar JSON values
On Sat, Aug 24, 2013 at 6:04 PM, Daniel Farina dan...@fdr.io wrote: But there's no good way I can find from the documentation to do it with a scalar: select ('"va\"lue"'::json)::text;

Triggered send by accident:

select ('"va\"lue"'::json)::text;
   text
-----------
 "va\"lue"
(1 row)

The JSON escaping is retained. That may be reasonable for a text cast, so I'm not suggesting its reinterpretation, but there is no operator I can identify immediately from the documentation that converts a JSON string value into a Postgres text value the way json_each_text does, only operating on a json value that contains a scalar JSON string.
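For what it's worth, this is precisely the scalar-unpacking operation that general-purpose JSON libraries expose. A quick illustration in Python (outside Postgres, just to pin down the semantics being asked for):

```python
import json

# Dequoting a scalar JSON string yields the underlying text value,
# with escapes interpreted -- the missing operation for json scalars.
assert json.loads('"a json string"') == "a json string"
assert json.loads('"va\\"lue"') == 'va"lue'

# Other JSON scalars unpack analogously.
assert json.loads("4") == 4
assert json.loads("2.0") == 2.0
print("ok")
```

The request amounts to exposing this dequoting for a json value that holds a lone scalar, the way json_each_text already does for object members.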
Re: [HACKERS] [9.4 CF] Free VMs for Reviewers Testers
On Tue, Jul 9, 2013 at 12:24 AM, Jesper Krogh jes...@krogh.cc wrote:

The really, really big ones are useful even for pushing limits, such as cr1.8xlarge, with 32 CPUs and 244GiB memory. Current spot instance price (the heavily discounted can-die-at-any-time one) is $0.343/hr. Otherwise, it's $3.500/hr.

Just to keep in mind, CPUs are similarly throttled: "One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. This is also the equivalent to an early-2006 1.7 GHz Xeon processor referenced in our original documentation."

This is only a statement of measurement (notably, it is also a metric that is SMP-processor-count-oblivious), for lack of a more sensible metric (certainly not clock cycles nor bogomips) in common use.

Who knows what that does to memory bandwidth / context switches etc.

Virtualization adds complexity, that is true, but so does a new version of Linux or comparing across microprocessors, motherboards, or operating systems. I don't see a good reason to be put off in the common cases, especially since I can't think of another way to produce such large machines on a short-term-obligation basis. The advantages are probably diminished (except for testing virtualization overhead on common platforms) for smaller machines that can be located sans virtualization more routinely.
Re: [HACKERS] [9.4 CF] Free VMs for Reviewers Testers
On Mon, Jul 8, 2013 at 7:25 PM, Craig Ringer cr...@2ndquadrant.com wrote: On 07/09/2013 08:35 AM, Josh Berkus wrote: Since these are cloud servers, they won't work well for performance testing.

I did some work on that a while ago, and found that I was able to get _astonishingly_ stable performance results out of EC2 EBS instances using provisioned IOPS volumes. Running them as EBS Optimized didn't seem to be required for the workloads I was testing on.

My colleague, Greg Burek, has done similar measurements and has reached an overall similar conclusion: the EBS PIOPS product delivers exactly what it says on the tin...even under random access. The volumes can be striped with software RAID.

These VMs aren't well suited to vertical-scaling performance tests and pushing extremes, but they're really, really good for "what impact does this patch have on regular real-world workloads?"

The really, really big ones are useful even for pushing limits, such as cr1.8xlarge, with 32 CPUs and 244GiB memory. Current spot instance price (the heavily discounted can-die-at-any-time one) is $0.343/hr. Otherwise, it's $3.500/hr.

Another instance offering that -- unlike the former -- I have yet to experience in any way at all is the high-storage one, hs1.8xlarge, with 16 CPUs, 117GB RAM, and 24 instance-local rotational disks of 2TB apiece. That is enough to deliver sequential reads measured in a couple of GB/s. I think this is a workhorse behind the AWS Redshift data warehousing offering. I don't think this one has spot pricing either: my guess is availability is low.
Re: [HACKERS] askpass program for libpq
On Fri, May 17, 2013 at 2:03 PM, Daniel Farina dan...@heroku.com wrote: On Wed, Jan 9, 2013 at 5:17 AM, Peter Eisentraut pete...@gmx.net wrote:

I would like to have something like ssh-askpass for libpq. The main reason is that I don't want to have passwords in plain text on disk, even if .pgpass is read-protected. By getting the password from an external program, I can integrate libpq tools with the host system's keychain or wallet thing, which stores passwords encrypted. I'm thinking about adding a new connection option askpass with environment variable PGASKPASS. One thing I haven't quite figured out is how to make this ask for passwords only if needed. Maybe it needs two connection options, one to say which program to use and one to say whether to use it. Ideas?

Okay, I have a patch that does something *like* (but not the same as) this, and whose implementation is totally unreasonable, but it's enough to get a sense of how the whole thing feels. Critically, it goes beyond askpass, instead allowing a shell-command-based hook for arbitrary interpretation and rewriting of connection info...such as the 'host' libpq keyword. I have called it, without much thought, a 'resolver'. In this way, it's closer to the libpq 'service' facility, except with the addition of complete control of the interpretation of user-provided notation.

Hello everyone, I'm sort of thinking of attacking this problem again; does anyone have an opinion or any words of (en/dis)couragement to continue? The implementation I posted is bogus but is reasonable to feel around with, and I'm curious, its obvious defects aside, what the temperature of opinion is. Most generally, I think the benefits are strongest in dealing with:

* Security: out-of-band secrets will just prevent people from pasting important stuff all over the place, as I see despairingly often today.
* Client-side proxies: pgbouncer comes to mind; a variation is being used on production applications right now that does full-blown preprocessing of the user environment (only possible in an environment with certain assumptions, like Heroku). https://github.com/gregburek/heroku-buildpack-pgbouncer seems very promising and effective, but it'd be nice to confer the same benefits to everyone else, too.

* HA: one of the most annoying problems in HA is naming things. Yes, this could be solved with other forms of common dynamic binding, DNS or Virtual IP (sometimes), but these are both pretty complicated and carry baggage and pitfalls; as long as there is dynamic binding of the credentials, I'm thinking it may make sense to have dynamic binding of net locations, too.

* Cross-server references: this is basically the issues seen in HA and Security, but on (horrible) steroids: the spate of features making Postgres work cross-server (older features like dblink, but now also new ones like FDWs and writable FDWs) makes complex interconnection between servers more likely and problematic, especially if one has standbys where there is a delay in catalog propagation from a primary to a standby with new connection info. So, an out-of-band way to adjust the dynamic binding seems useful there.

Knowing those, am I barking up the wrong tree? Can I do something else entirely? I've considered DNS and SSL certs, but these seem much, much harder, and limited, too.
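To make the delegation shape of the 'resolver' idea concrete, here is a toy sketch (entirely my own illustration, not the posted patch; the logical name, registry, and values are invented): a small program maps a user-facing name to libpq key=value conninfo, so the secret and the net location both live behind the name.

```python
import shlex

# Toy "resolver": map a user-facing logical name to libpq key=value
# conninfo. A real resolver might consult a keychain, a service
# registry, or an HA manager instead of a hard-coded table.
REGISTRY = {
    "billing-follower": {
        "host": "10.0.0.7",    # rebind here on failover
        "port": "6432",        # e.g. a local pgbouncer
        "user": "app",
        "password": "s3cret",  # out-of-band secret, never pasted around
    },
}

def resolve(name):
    settings = REGISTRY[name]
    return " ".join(f"{k}={shlex.quote(v)}" for k, v in sorted(settings.items()))

print(resolve("billing-follower"))
```

Clients keep saying "billing-follower"; failover, credential rotation, and proxy insertion all happen behind the resolver.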
Re: [HACKERS] askpass program for libpq
On Sat, Jun 15, 2013 at 8:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes:

Okay, I have a patch that does something *like* (but not the same as) this, and whose implementation is totally unreasonable, but it's enough to get a sense of how the whole thing feels. Critically, it goes beyond askpass, instead allowing a shell-command-based hook for arbitrary interpretation and rewriting of connection info...such as the 'host' libpq keyword. I have called it, without much thought, a 'resolver'. In this way, it's closer to the libpq 'service' facility, except with the addition of complete control of the interpretation of user-provided notation.

Hello everyone, I'm sort of thinking of attacking this problem again; does anyone have an opinion or any words of (en/dis)couragement to continue? The implementation I posted is bogus but is reasonable to feel around with, and I'm curious, its obvious defects aside, what the temperature of opinion is. Most generally, I think the benefits are strongest in dealing with:

* Security: out-of-band secrets will just prevent people from pasting important stuff all over the place, as I see despairingly often today.

* Client-side proxies: pgbouncer comes to mind; a variation is being used on production applications right now that does full-blown preprocessing of the user environment (only possible in an environment with certain assumptions, like Heroku). https://github.com/gregburek/heroku-buildpack-pgbouncer seems very promising and effective, but it'd be nice to confer the same benefits to everyone else, too.

* HA: one of the most annoying problems in HA is naming things. Yes, this could be solved with other forms of common dynamic binding, DNS or Virtual IP (sometimes), but these are both pretty complicated and carry baggage and pitfalls; as long as there is dynamic binding of the credentials, I'm thinking it may make sense to have dynamic binding of net locations, too.

* Cross-server references: this is basically the issues seen in HA and Security, but on (horrible) steroids: the spate of features making Postgres work cross-server (older features like dblink, but now also new ones like FDWs and writable FDWs) makes complex interconnection between servers more likely and problematic, especially if one has standbys where there is a delay in catalog propagation from a primary to a standby with new connection info. So, an out-of-band way to adjust the dynamic binding seems useful there.

TBH, I see no clear reason to think that a connection-string rewriter solves any of those problems. At best it would move them somewhere else.

Yes, that's exactly what I want to achieve: moving them somewhere else that can be held in common by client applications.

Nor is it clear that any of this should be libpq's business, as opposed to something an application might do before invoking libpq.

Yes, it's unclear. I have only arrived at seriously exploring this after trying my best to meditate on other options. In addition, sometimes 'the application' is Postgres, and with diverse access paths like FDWs and dblink, which may not be so easy to adjust, it would seem strange to adjust them in a way that can't be shared in common with regular non-backend-linked client applications.

Also, it seems like a very high bar to set for an application to make use of environment keychains or environment-specific high-availability retargeting. The general approach you mention is used in Greg Burek's heroku-pgbouncer buildpack, but it took significant work to iron out (and probably still needs more ironing, although it seems to work great) and only serves the Heroku-verse. Basically, the needs seem very similar, so abstracting seems profitable to me. This is not that different in principle from pgpass and the services file in that regard, except it takes the final step of delegating their function...and delivering full control over notation.

Although I don't know much about it, I seem to recall that VMWare felt inclined to instigate some kind of vaguely related solution to a similar problem in a custom libpq. After initial recoil and over a year of contemplation, I think the reasons are more well justified than I originally thought, and it'd be nice to de-weirdify such approaches.

Also, I think a facility dependent on invoking a shell command is (a) wide open for security problems, and (b) not likely to be portable to Windows.

Yeah, those things occurred to me; I think a dlopen-based mechanism is a more likely solution than the shell-command one. The latter just let me get started quickly to experiment. Would a rigorous proposal about how to do that help the matter? I mostly wanted to get the temperature before thinking about Real Mechanisms.
Re: [HACKERS] updated emacs configuration
On Thu, Jun 13, 2013 at 6:27 PM, Peter Eisentraut pete...@gmx.net wrote: First, I propose adding a .dir-locals.el file to the top-level directory with basic emacs settings. These get applied automatically. This especially covers the particular tab and indentation settings that PostgreSQL uses. With this, casual developers will not need to modify any of their emacs settings.

Yes please. I've had the pgsql stuff in my .emacs forever (ever since I was a student compelled to do homework on Postgres) and knew the magical rules about naming the directory, but it always felt so dirty and very much a 'you need to know the trick' level of intimacy.
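For readers who haven't seen one: a .dir-locals.el at the top of the tree is just an alist of per-mode variable settings that Emacs applies automatically to files opened beneath it. A sketch along the lines of the PostgreSQL C conventions (4-column hard tabs, BSD brace style; the exact file proposed may differ):

;; .dir-locals.el -- applied automatically, no per-user .emacs hacking
((c-mode . ((c-basic-offset . 4)
            (c-file-style . "bsd")
            (fill-column . 78)
            (indent-tabs-mode . t)
            (tab-width . 4)))
 (sgml-mode . ((fill-column . 78)
               (indent-tabs-mode . nil))))

This is what replaces the old "magical rules about naming the directory" trick: the settings travel with the checkout.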
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Mon, Jun 10, 2013 at 11:59 AM, Josh Berkus j...@agliodbs.com wrote: Anyway, what I'm pointing out is that this is a business decision, and there is no way that we can make a decision for the users what to do when we run out of WAL space. And that the "stop archiving" option needs to be there for users, as well as the "shut down" option, *without* requiring users to learn the internals of the archiving system to implement it, or to know the implied effects of non-obvious PostgreSQL settings.

I don't doubt this; that's why I do have a no-op fallback for emergencies. The discussion was about defaults, and I still think that drop-WAL-from-archiving-whenever is not a good one. You may have noticed I also wrote that a neater, common way to drop WAL when under pressure might be nice, to avoid having it ad hoc and all over, so it's not as though I wanted to suggest that a Postgres feature to this effect would be an anti-feature. And, as I wrote before, it's much easier to teach an external system to drop WAL than it is to teach Postgres to attenuate, hence the repeated correspondence from my fellows and myself about the attenuation side of the equation. Hope that clears things up about where I stand on the matter.
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Mon, Jun 10, 2013 at 4:42 PM, Josh Berkus j...@agliodbs.com wrote: Daniel, Jeff: I don't doubt this; that's why I do have a no-op fallback for emergencies. The discussion was about defaults, and I still think that drop-WAL-from-archiving-whenever is not a good one.

Yeah, we can argue defaults for a long time. What would be better is some way to actually determine what the user is trying to do, or wants to happen. That's why I'd be in favor of an explicit setting; if there's a setting which says: on_archive_failure=shutdown ... then it's a LOT clearer to the user what will happen if the archive runs out of space, even if we make no change to the defaults. And if that setting is changeable on reload, it even becomes a way for users to get out of tight spots.

I like your suggestion, save one thing: it's not a 'failure' of archiving if it cannot keep up, provided one subscribes to the view that archiving is not elective. I nitpick at this because one might think it has something to do with a non-zero return code from the archiving program, which already has a pretty alarmist message in the event of transient failures (I think someone brought this up on -hackers a few months ago...can't remember if that resulted in a change). I don't have a better suggestion that is less jargonrific, though, but I wanted to express my general appreciation as to the shape of the suggestion.
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote: Right now, what we're telling users is "You can have continuous backup with Postgres, but you'd better hire an expensive consultant to set it up for you, or use this external tool of dubious provenance which there are no packages for, or you might accidentally cause your database to shut down in the middle of the night."

Inverted and just as well supported: "if you want to not accidentally lose data, you had better hire an expensive consultant to check your systems for all sorts of default 'safety = off' features." This being but the hypothetical first one.

Furthermore, I see no reason why high-quality external archiving software cannot exist. Maybe some even exists already, and no doubt it can be improved and the contract with Postgres enriched for that purpose. Contrast: JSON, where the stable OID in the core distribution helps pragmatically punt on a particularly sticky problem (extension dependencies and non-system OIDs). I can't think of a reason an external archiver couldn't do its job well right now.

At which point most sensible users say "no thanks, I'll use something else."

"Oh, I lost some disks. Well, no big deal, I'll use the archives. Surprise!" Forensic analysis ensues. So, as it turns out, it had been dropping segments at times because of systemic backlog for months/years.

Alternative ending: "Hey, I restored the database." Later: "Why is the state so old? Why are customers getting warnings that their (thought paid) invoices are overdue? Oh crap, the restore was cut short by this stupid option and this database lives in the past!" Fin.

I have a clear bias in experience here, but I can't relate to someone who sets up archives but is totally okay losing a segment unceremoniously, because it only takes one of those once in a while to make a really, really bad day. Who is this person that lackadaisically archives, and are they just fooling themselves?
And where are these archivers that enjoy even a modicum of long-term success that are not reliable? If one wants to casually drop archives, how is someone going to find out and freak out a bit? Per experience, logs are pretty clearly hazardous to the purpose. Basically, I think a default that opts one into danger is not good, especially since the system is starting from a position of do too much stuff and you'll crash. Finally, it's not that hard to teach any archiver how to no-op at user-peril, or perhaps Postgres can learn a way to do this expressly, to standardize the procedure a bit and ease publicly shared recipes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Jun 5, 2013 at 10:27 PM, Joshua D. Drake j...@commandprompt.com wrote: On 6/5/2013 10:07 PM, Daniel Farina wrote: If I told you there were some of us who would prefer to attenuate the rate that things get written rather than cancel or delay archiving for a long period of time, would that explain the framing of the problem? I understand that based on what you said above. Or, is it that you understand that's what I want, but find the notion of such an operation hard to relate to? I think this is where I am at. To me, you don't attenuate the rate that things get written, you fix the problem of needing to do so. The problem is one of provisioning. Please note that I am not suggesting there aren't improvements to be made, there absolutely are. I just wonder if we are looking in the right place (outside of some obvious badness like the PANIC running out of disk space). Okay, well, I don't see the fact that the block device is faster than the archive command as a problem, it's just an artifact of the ratios of performance of stuff in the system. If one views archives as a must-have, there's not much other choice than to attenuate. An alternative is to buy a slower block device. That'd accomplish the same effect, but it's a pretty bizarre and heavy-handed way to go about it, and not easily adaptive to, say, if I made the archive command faster (in my case, I well could, with some work). So, I don't think it's all that unnatural to allow for the flexibility of a neat attenuation technique, and it's pretty important too. Methinks. Disagree? Final thought: I can't really tell users to knock off what they're doing on a large scale. It's better to not provide abrupt changes in service (like crashing or turning off everything for extended periods while the archive uploads). So, smoothness and predictability are desirable. 
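For what it's worth, the attenuation being argued for here is mechanically just rate limiting. A token-bucket sketch of the idea (a hypothetical illustration only, not code from any patch in this thread; all names are invented), in which writers stall in proportion to how far ahead of the archiver's sustained rate they have gotten:

```python
class TokenBucket:
    """Toy model of attenuating WAL writes to an archiver's sustained rate.

    rate is the budget refill in bytes/second; capacity is the burst
    allowance. The clock is passed in explicitly so behavior is
    deterministic and easy to reason about.
    """

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.last = now

    def delay_for(self, nbytes, now):
        """Seconds a writer of nbytes should stall before proceeding."""
        # Refill the budget for elapsed time, capped at the burst allowance.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate  # stall until the deficit is repaid
```

The point of this shape is exactly the smoothness described above: writers see many small stalls as backlog builds, rather than one abrupt outage when the disk fills.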
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Jun 5, 2013 at 11:05 PM, Joshua D. Drake j...@commandprompt.com wrote: On 6/5/2013 10:54 PM, Peter Geoghegan wrote: On Wed, Jun 5, 2013 at 10:27 PM, Joshua D. Drake j...@commandprompt.com wrote: I just wonder if we are looking in the right place (outside of some obvious badness like the PANIC running out of disk space). So you don't think we should PANIC on running out of disk space? If you don't think we should do that, and you don't think that WAL writing should be throttled, what's the alternative? As I mentioned in my previous email: Instead of a PANIC on running out of disk space, we should just write to an emergency location within PGDATA and log very loudly that the SA isn't paying attention. Perhaps if that area starts to get to an unhappy place we immediately bounce into read-only mode and log even more loudly that the SA should be fired. I would think read-only mode is safer and more polite than a PANIC crash. I do not think we should worry about filling up the hard disk except to protect against data loss in the event. It is not user unfriendly to assume that a user will pay attention to disk space. Really? Okay, then I will say it's user unfriendly, especially for a transient use of space, and particularly if there's no knob for said SA to attenuate what's going on. You appear to assume the SA can lean on the application to knock off whatever is going on or provision more disk in time, or that disk is reliable enough to meet one's goals. In my case, none of these precepts are true or desirable.
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Thu, Jun 6, 2013 at 9:30 PM, Jeff Janes jeff.ja...@gmail.com wrote: I would oppose that as the solution, either an unconditional one, or configurable with it as the default. Those segments are not unneeded. I need them. That is why I set up archiving in the first place. If you need to shut down the database rather than violate my established retention policy, then shut down the database. Same boat. My archives are the real storage. The disks are write-back caching. That's because the storage of my archives is probably three to five orders of magnitude more reliable.
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Jun 5, 2013 at 6:00 PM, Joshua D. Drake j...@commandprompt.com wrote: I didn't see that proposal, link? Because the idea of slowing down wal-writing sounds insane. It's not as insane as introducing an archiving gap, PANICing and crashing, or running this hunk o junk I wrote http://github.com/fdr/ratchet
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Jun 5, 2013 at 8:23 PM, Joshua D. Drake j...@commandprompt.com wrote: It's not as insane as introducing an archiving gap, PANICing and crashing, or running this hunk o junk I wrote http://github.com/fdr/ratchet Well certainly we shouldn't PANIC and crash but that is a simple fix. You have a backup write location and start logging really loudly that you are using it. If I told you there were some of us who would prefer to attenuate the rate that things get written rather than cancel or delay archiving for a long period of time, would that explain the framing of the problem? Or, is it that you understand that's what I want, but find the notion of such an operation hard to relate to? Or, am I misunderstanding your confusion? Or, none of the above?
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Mon, May 27, 2013 at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote: On 27 May 2013 15:36, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: On Mon, May 27, 2013 at 08:26:48AM -0400, Stephen Frost wrote: That said, many discussions and ideas do get shut down, perhaps too early, because of pg_upgrade considerations. If we had a plan to have an incompatible release in the future, those ideas and discussions might be able to progress to a point where we determine it's worth it to take the pain of a non-pg_upgrade-supported release. That's a bit of a stretch, in my view, but I suppose it's possible. Even so though, I would suggest that we put together a wiki page to list out those items and encourage people to add to such a list; perhaps having an item on that list would make discussion about it progress beyond "it breaks pg_upgrade". Yes, we should be collecting things we want to do for a pg_upgrade break so we can see the list all in one place. Precisely. We've said right along that we reserve the right to have a non-upgradable disk format change whenever sufficiently many reasons accumulate to do that. Here's one that's come up a few times: being able to tweak the out-of-line storage strategy, e.g. change the compression format used. I think some folks were lamenting the lack of a convenient byte in the right place for that one.
Re: [HACKERS] Better handling of archive_command problems
On Thu, May 16, 2013 at 9:13 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 16, 2013 at 10:06 PM, Daniel Farina dan...@heroku.com wrote: Do you have a sketch about mechanism to not encounter that problem? I didn't until just now, but see my email to Peter. That idea might be all wet, but off-hand it seems like it might work... However little it may matter, I would like to disagree with your opinion on this one: the current situation as I imagine encountered by *all* users of archiving is really unpleasant, 'my' shop or no. It would probably not be inaccurate to say that 99.% of archiving users have to live with a hazy control over the amount of data loss, only bounded by how long it takes for the system to fill up the WAL file system and then for PostgreSQL to PANIC and crash (hence, no more writes are processed, and no more data can be lost). I'm really not trying to pick a fight here. What I am saying is that the problem is not so bad that we should accept the first design proposed, despite the obvious problems with it, without trying to find a better one. The first post to -hackers on this was 4 days ago. Feature freeze for the first release that could conceivably contain this feature will most likely occur about 8 months from now. We can take a few more days or weeks to explore alternatives, and I believe it will be possible to find good ones. If I were trying to kill this proposal, I could find better ways to do it than clearly articulating my concerns at the outset. I know you are not trying to kill it or anything, and I didn't mean to pick a fight either: my point is, without rancor, that I think that as-is the state of affairs fails the sniff test on dimensions like availability and exposure to data loss, much worse than some of the negative effects of hobbling cancellations. And I completely agree with you that the timing means one can let this issue marinate for a while. 
And, I hope it goes without saying, I am grateful for your noodling on the matter, as always.
Re: [HACKERS] askpass program for libpq
On Wed, Jan 9, 2013 at 5:17 AM, Peter Eisentraut pete...@gmx.net wrote: I would like to have something like ssh-askpass for libpq. The main reason is that I don't want to have passwords in plain text on disk, even if .pgpass is read protected. By getting the password from an external program, I can integrate libpq tools with the host system's key chain or wallet thing, which stores passwords encrypted. I'm thinking about adding a new connection option askpass with environment variable PGASKPASS. One thing I haven't quite figured out is how to make this ask for passwords only if needed. Maybe it needs two connection options, one to say which program to use and one to say whether to use it. Ideas? Okay, I have a patch that does something *like* (but not the same) as this, and whose implementation is totally unreasonable, but it's enough to get a sense of how the whole thing feels. Critically, it goes beyond askpass, instead allowing a shell-command based hook for arbitrary interpretation and rewriting of connection info...such as the 'host' libpq keyword. I have called it, without much thought, a 'resolver'. In this way, it's closer to the libpq 'service' facility, except with addition of complete control of the interpretation of user-provided notation. I think it would be useful to have an in-process equivalent (e.g. loading hooks via dynamic linking, like the backend), if the functionality seems worthwhile, mostly for performance and a richer protocol between libpq and resolvers (error messages come to mind). I think with a bit of care this also would allow other drivers to more easily implement some of the features grown into libpq (by re-using the shared-object's protocol, or a generic wrapper that can load the shared object to provide a command line interface), like the .pgpass file and the services file, which are often missing from new driver implementations. With these disclaimers out of the way, here's a 'demo'. 
It's long, but hopefully not as dense as it first appears, since it's mostly a narration of walking through using this. Included alongside the patch is a short program that integrates with the freedesktop.org Secret Service standard, realized in the form of libsecret (I don't know exactly what the relationship is, but I used libsecret's documentation[0]). This resolver I wrote is not a solid piece of work either, much like the libpq patch I've attached...but I probably will try to actually make personal use of this for a time after fixing a handful of things, to feel it out over an extended period. On Debian, one needs gir1.2-secret-1 and Python 2.7. I run this on Ubuntu Raring, and have traced my steps using the GNOME Keyring used there, Seahorse.

# This tool has some help:
$ ./pq-secret-service -h
$ ./pq-secret-service store -h
$ ./pq-secret-service recall -h

# Store a secret:
$ printf 'dbname=regression host=localhost port=5432 user=user'\
' password=password' | ./pq-secret-service store

Now you can check your keyring manager and search for 'postgres' or 'postgresql'. You should see a stored PostgreSQL secret, probably labeled "PostgreSQL Database Information". After having compiled a libpq with my patch applied and making sure a 'psql' is using that libpq...

# Set the 'resolve' command:
$ export PGRESOLVE='/path/to/pq-secret-service recall'

# Attempt to connect to regression database. The credentials are
# probably not enabled on your database, and so you'll see the
# appropriate error message instead.
$ psql regression

What this does is search the keyrings in the most straightforward way given the Secret Service protocol, trying to 'complete' the input passed, based on any of the (non-secret) parts of the conninfo passed to 'pq-secret-service store'. 
Here's an example where the resolver opts not to load something from the keyring:

$ psql 'dbname=regression user=not-the-one-stored'

Now the resolver doesn't find a secret, so it just passes through the original connection info. Ditto if one passes a password. Here are more examples, this time explored by invoking the resolver command directly, without having to go through libpq. This makes testing and experimentation somewhat easier:

$ echo $(printf 'regression' | ./pq-secret-service recall)
$ echo $(printf 'dbname=regression' | ./pq-secret-service recall)
$ echo $(printf 'host=localhost port=5432' | ./pq-secret-service recall)

# Show failing to match the secret (the input provided will be
# returned unchanged):
$ echo $(printf 'host=elsewhere port=5432' | ./pq-secret-service recall)

In some aspects the loose matching is kind of neat, because usually some subset of database names, role names, and host names is sufficient to find the right thing, but this is beside the point: the resolver program can be made very strict, or lax, or can even try opening a database connection to feel things out. I do want to make sure that at least some useful baseline functionality can be implemented, though,
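The resolver contract exercised above — read a conninfo string on stdin, write a possibly-completed conninfo to stdout, and pass the input through unchanged when nothing matches — is small enough to sketch. This is a hypothetical stand-in, with a hardcoded list in place of a real keyring backend; in real use the function would be wired between stdin and stdout the way pq-secret-service is:

```python
# Hypothetical sketch of the resolver contract: complete a partial conninfo
# from stored secrets, or pass it through untouched. The SECRETS list stands
# in for a real keyring (e.g. a Secret Service backend).

SECRETS = [
    'dbname=regression host=localhost port=5432 user=user password=password',
]

def resolve(conninfo):
    words = set(conninfo.split())
    for full in SECRETS:
        # Loose matching, as in the demo: every word of the input must
        # appear among the stored conninfo's parts.
        if words and words.issubset(set(full.split())):
            return full
    return conninfo  # no match: pass through the original connection info
```

A stricter resolver would match on designated keys only, or refuse ambiguous inputs; the contract itself does not care.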
Re: [HACKERS] askpass program for libpq
On Fri, May 17, 2013 at 2:03 PM, Daniel Farina dan...@heroku.com wrote: Thanks for getting through all that text. Fin. And, thoughts? I have uploaded the resolvers, the last mail, and the patch to github: https://github.com/fdr/pq-resolvers So, if one prefers to use git to get this and track potential changes, or add contributions, please do feel encouraged to do so.
Re: [HACKERS] Better handling of archive_command problems
On Thu, May 16, 2013 at 5:43 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 16, 2013 at 2:42 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, May 16, 2013 at 11:16 AM, Robert Haas robertmh...@gmail.com wrote: Well, I think it IS a Postgres precept that interrupts should get a timely response. You don't have to agree, but I think that's important. Well, yes, but the fact of the matter is that it is taking high single digit numbers of seconds to get a response at times, so I don't think that there is any reasonable expectation that that be almost instantaneous. I don't want to make that worse, but then it might be worth it in order to ameliorate a particular pain point for users. At times, like when the system is under really heavy load? Or at times, like depending on what the backend is doing? We can't do a whole lot about the fact that it's possible to beat a system to death so that, at the OS level, it stops responding. Linux is unfriendly enough to put processes into non-interruptible kernel wait states when they're waiting on the disk, a decision that I suspect to have been made by a sadomasochist. But if there are times when a system that is not particularly heavily loaded fails to respond to cancels in under a second, I would consider that a bug, and we should fix it. There is a setting called zero_damaged_pages, and enabling it causes data loss. I've seen cases where it was enabled within postgresql.conf for years. That is both true and bad, but it is not a reason to do more bad things. I don't think it's bad. I think that we shouldn't be paternalistic towards our users. If anyone enables a setting like zero_damaged_pages (or, say, wal_write_throttle) within their postgresql.conf indefinitely for no good reason, then they're incompetent. End of story. That's a pretty user-hostile attitude. Configuration mistakes are a very common user error. 
If those configuration mistakes hose the system, users expect to be able to change them back, hit reload, and get things back on track. But you're proposing a GUC that, if set to a bad value, will very plausibly cause the entire system to freeze up in such a way that it won't respond to a reload request - or for that matter a fast shutdown request. I think that's 100% unacceptable. Despite what you seem to think, we've put a lot of work into ensuring interruptibility, and it does not make sense to abandon that principle for this or any other feature. The inability to shut down in such a situation is not happy at all, as you say, and the problems with whacking the GUC around due to non-interruptability are pretty bad too. Would you feel better about it if the setting had a time-out? Say, the user had to explicitly re-enable it after one hour at the most? No, but I'd feel better about it if you figured out a way to avoid creating a scenario where it might lock up the entire database cluster. I am convinced that it is possible to avoid that Do you have a sketch of a mechanism to not encounter that problem? and that without that this is not a feature worthy of being included in PostgreSQL. Yeah, it's more work that way. But that's the difference between a quick hack that is useful in our shop and a production-quality feature ready for a general audience. However little it may matter, I would like to disagree with your opinion on this one: the current situation as I imagine encountered by *all* users of archiving is really unpleasant, 'my' shop or no. It would probably not be inaccurate to say that 99.% of archiving users have to live with a hazy control over the amount of data loss, only bounded by how long it takes for the system to fill up the WAL file system and then for PostgreSQL to PANIC and crash (hence, no more writes are processed, and no more data can be lost). 
Once one factors in the human cost of having to deal with that down time or monitor it to circumvent this, I feel as though the bar for quality should be lowered. As you see, we've had to resort to horrific techniques to get around this problem. I think this is something serious enough that it is worth doing better, but the bind that people doing archiving find themselves in is much worse at the margins -- involving data loss and loss of availability -- and accordingly, I think the bar for some kind of solution should be lowered, so long as the interface is at least right enough to not be an albatross later (a bar this proposal may not meet). That said, there is probably a way to please everyone and do something better. Any ideas?
Re: [HACKERS] on existing update construct
On Wed, May 15, 2013 at 11:44 AM, Dev Kumkar devdas.kum...@gmail.com wrote: Hello, Is there an alternative of Sybase on existing update construct in pgsql. ON DUPLICATE KEY UPDATE doesn't work. Thanks in advance! No, you'll have to either handle this in the application or use a stored procedure at this time. The omission of such a construct from psql's \h command and the manual is not in error.
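The application-side approach alluded to is the familiar retry loop: try the update; if no row was touched, try the insert; if a concurrent session wins the insert race, loop and update. Sketched here against an in-memory stand-in for a uniquely keyed table, since the shape of the loop is the point; with a real driver, UniqueViolation below corresponds to the unique-violation error the server raises:

```python
class UniqueViolation(Exception):
    """Stands in for the server's unique-constraint error."""

class FakeTable:
    """In-memory stand-in for a table with a unique key."""
    def __init__(self):
        self.rows = {}

    def update(self, key, val):
        if key in self.rows:
            self.rows[key] = val
            return 1  # rows affected
        return 0

    def insert(self, key, val):
        if key in self.rows:
            raise UniqueViolation(key)
        self.rows[key] = val

def upsert(table, key, val):
    while True:
        if table.update(key, val):    # common case: the row already exists
            return 'updated'
        try:
            table.insert(key, val)    # no row yet: try to create it
            return 'inserted'
        except UniqueViolation:
            continue                  # a concurrent insert won the race; retry
```

The loop is necessary because another session can insert the key between the failed update and the insert; retrying converts that race into an eventual update.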
Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)
On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when I run pgbench -S -c64 -j64 -T60 -M prepared on a 32-core Linux machine:

- 64.09% postgres postgres [.] tas
   - tas
      - 99.83% s_lock
         - 53.22% LWLockAcquire
            + 99.87% GetSnapshotData
         - 46.78% LWLockRelease
              GetSnapshotData
            + GetTransactionSnapshot
+ 2.97% postgres postgres [.] tas
+ 1.53% postgres libc-2.13.so [.] 0x119873
+ 1.44% postgres postgres [.] GetSnapshotData
+ 1.29% postgres [kernel.kallsyms] [k] arch_local_irq_enable
+ 1.18% postgres postgres [.] AllocSetAlloc
...

So, on this test, a lot of time is wasted spinning on the mutex of ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a surprisingly steep drop in performance once you go beyond 29 clients (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory is that after that point all the cores are busy, and processes start to be sometimes context switched while holding the spinlock, which kills performance. I have, I also used Linux perf to come to this conclusion, and my determination was similar: a system was undergoing increasingly heavy load, in this case with processes far outnumbering processors. It was also a phase-change type of event: at one moment everything would be going great, but once a critical threshold was hit, s_lock would consume an enormous amount of CPU time. I figured preemption while in the spinlock was to blame at the time, given the extreme nature.
Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)
On Wed, May 15, 2013 at 3:08 PM, Daniel Farina dan...@heroku.com wrote: On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when I run pgbench -S -c64 -j64 -T60 -M prepared on a 32-core Linux machine:

- 64.09% postgres postgres [.] tas
   - tas
      - 99.83% s_lock
         - 53.22% LWLockAcquire
            + 99.87% GetSnapshotData
         - 46.78% LWLockRelease
              GetSnapshotData
            + GetTransactionSnapshot
+ 2.97% postgres postgres [.] tas
+ 1.53% postgres libc-2.13.so [.] 0x119873
+ 1.44% postgres postgres [.] GetSnapshotData
+ 1.29% postgres [kernel.kallsyms] [k] arch_local_irq_enable
+ 1.18% postgres postgres [.] AllocSetAlloc
...

So, on this test, a lot of time is wasted spinning on the mutex of ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a surprisingly steep drop in performance once you go beyond 29 clients (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory is that after that point all the cores are busy, and processes start to be sometimes context switched while holding the spinlock, which kills performance. I accidentally dropped some important words from the end of Heikki's mail, which makes my correspondence pretty bizarre to understand at the outset. Apologies. He wrote: [...] Has anyone else seen that pattern? I have, I also used Linux perf to come to this conclusion, and my determination was similar: a system was undergoing increasingly heavy load, in this case with processes far outnumbering processors. It was also a phase-change type of event: at one moment everything would be going great, but once a critical threshold was hit, s_lock would consume an enormous amount of CPU time. I figured preemption while in the spinlock was to blame at the time, given the extreme nature.
Re: [HACKERS] Better handling of archive_command problems
On Mon, May 13, 2013 at 3:02 PM, Peter Geoghegan p...@heroku.com wrote: Has anyone else thought about approaches to mitigating the problems that arise when an archive_command continually fails, and the DBA must manually clean up the mess? Notably, the most common problem in this vein suffered at Heroku has nothing to do with archive_command failing, and everything to do with the ratio of block device write performance (hence, backlog) versus the archiving performance. When CPU is uncontended it's not a huge deficit, but it is there and it causes quite a bit of stress. Archive commands failing are definitely a special case there, where it might be nice to bring write traffic to exactly zero for a time.
Re: [HACKERS] pg_controldata gobbledygook
On Thu, Apr 25, 2013 at 9:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: I think I've heard of scripts grepping the output of pg_controldata for this, that, or the other. Any rewording of the labels would break that. While I'm not opposed to improving the labels, I would vote against your second, abbreviated scheme because it would make things ambiguous for simple grep-based scripts. We could provide two alternative outputs, one for human consumption with the proposed format and something else that uses, say, shell assignment syntax. (I did propose this years ago and I might have an unfinished patch still lingering about somewhere.) And a script would use that how? pg_controldata --machine-friendly would fail outright on older versions. I think it's okay to ask script writers to write pg_controldata | grep -e 'old label|new label' but not okay to ask them to deal with anything as complicated as trying a switch to see if it works or not. From what I'm reading, it seems like the main benefit of the changes is to make things easier for humans to skim over. Automated programs that care about precise meanings of each field are awkwardly but otherwise well-served by the precise output as rendered right now. What about doing something similar but different from the --machine-readable proposal, such as adding an option for the *human*-readable variant that is guaranteed to mercilessly change as human readers/hackers see fit on a whim? It's a bit of a kludge that this is not the default, but it would prevent having to serve two quite different masters with the same output. Although I'm not seriously proposing explicitly -h (as seen in some GNU programs in rendering byte sizes and the like...yet could be confused for 'help'), something like that may serve as prior art.
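For the grep-based consumers under discussion, splitting each line on the first colon is about as much structure as the current output offers; a sketch of that parse (the sample labels in the test are illustrative, not a promise about pg_controldata's exact wording):

```python
def parse_controldata(text):
    """Turn 'Label: value' lines, as pg_controldata prints them, into a dict.

    Splits on the first colon only, since values may themselves contain
    colons (e.g. timestamps).
    """
    fields = {}
    for line in text.splitlines():
        if ':' in line:
            label, _, value = line.partition(':')
            fields[label.strip()] = value.strip()
    return fields
```

Note that scripts written this way still break on label rewording, which is exactly the hazard Tom describes; only a stable machine-oriented output mode would remove it.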
Re: [HACKERS] 9.3 release notes suggestions
On Wed, Apr 24, 2013 at 6:30 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 23, 2013 at 11:41 PM, Bruce Momjian br...@momjian.us wrote: Thanks for the many suggestions on improving the 9.3 release notes. There were many ideas I would have never thought of. Please keep the suggestions coming. Bruce, Thanks for writing them! Consider the sentiment duplicated. Thank you, Bruce.
Re: [HACKERS] confusing message about archive failures
On Wed, Apr 17, 2013 at 7:33 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Wednesday, April 17, 2013, Peter Eisentraut wrote: When archive_command fails three times, it prints this message into the logs: transaction log file \%s\ could not be archived: too many failures This leaves it open what happens next. What will actually happen is that it will usually try again after 60 seconds or so, but the message indicates something much more fatal than that. Could we rephrase this a little bit to make it less dramatic, like ... too many failures, will try again later ? +1 I've found the current message alarming/confusing as well. But I don't really understand the logic behind bursting the attempts, 3 of them one second apart, then sleeping 57 seconds, in the first place. Same. By now I am numb, but when I was first rolling out archives ages ago the message was cause for much more alarm than was indicated.
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 11:08 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-17 18:16:36 -0700, Daniel Farina wrote: The original paper is often shorthanded Castagnoli 93, but it exists in the IEEE's sphere of influence and is hard to find a copy of. Luckily, a pretty interesting survey paper discussing some of the issues was written by Koopman in 2002 and is available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a pedagogical note, it's a pretty interesting and accessible piece of writing (for me, as someone who knows little of error detection/correction) and explains some of the engineering reasons that provoke such exercises. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=231911&userType=inst There's also a Koopman paper from 2004 that's interesting. Having read the 2002 paper more, it seems that the current CRC32 doesn't have a whole lot going for it: CRC32C pretty much cleans its clock across the board (I don't understand detected Hamming Distances that seem greater than the information content of the message, e.g. HD 14 with 8-bit messages as seen in CRC32C: that's where CRC32 can win). CRC32C looks, all in all, the most flexible, because detection of Hamming Distance 4 spans from 5244-131072 bits (the upper range of which is a full 16KiB!) and there is superior Hamming Distance detection on shorter messages up until the point where it seems like the Hamming Distance able to be detected is larger than the message size itself (e.g. HD 13 on an 8-bit message). I'm not sure if this is an error in my understanding, or what. Also, larger runs (16KB) are better served by CRC32C: even the probably-best contender I can see (0xD419CC15) drops to Hamming Distance 2-detection right after 65505 bits. CRC32C has the biggest range at HD4, although Koopman 0xBA0DC66B comes close, gaining superior Hamming distance detection for 178-16360 bits (the upper end of this range is short of 2KiB by 3 bytes). 
All in all, there is no reason I can see to keep CRC32 at all versus CRC32C on the basis of error detection alone, so putting aside all the business about instruction set architecture, I think a software CRC32C in a vacuum can be seen as a robustness improvement. There may be polynomials that are not CRC32 or CRC32C that one might view as having slightly better tradeoffs as seen in Table 1 of Koopman 2002, but it's kind of a stretch: being able to handle 8KB and 16KB at HD4, as CRC32C does, is awfully compelling to me. Koopman 0xBA0DC66B can admirably reach HD6 on a much larger range, up to 16360 bits, which is ever so shy of 2KiB. Castagnoli 0xD419CC15 can detect HD 5 up to 31 bits short of 8KB. Corrections welcome on my interpretations of Table 1.
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 5:21 PM, Greg Smith g...@2ndquadrant.com wrote: Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. -Committed checksum feature uses truncated CRC-32. This has known good error detection properties, but is expensive to compute. There's reason to believe that particular computation will become cheaper on future platforms though. But taking full advantage of that will require adding CPU-specific code to the database. -The latest idea is using the Fowler–Noll–Vo hash function: https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash There's 20 years of research around when that is good or bad. The exact properties depend on magic FNV primes: http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that can vary based on both your target block size and how many bytes you'll process at a time. For PostgreSQL checksums, one of the common problems--getting an even distribution of the hashed values--isn't important the way it is for other types of hashes. Ants and Florian have now dug into how exactly that and specific CPU optimization concerns impact the best approach for 8K database pages. This is very clearly a 9.4 project that is just getting started. I was curious about the activity in this thread and wanted to understand the tradeoffs, and came to the same understanding as you when poking around. It seems the tough aspect of the equation is that the most well-studied option is slow (CRC-32C) unless you have special ISA support. Trying to find as much information and conclusive research on FNV was a lot more challenging. Fletcher is similar in that regard. 
Given my hasty attempt to understand each of the alternatives, my qualitative judgement is that, strangely enough, the most conservative choice of the three (in terms of being understood and treated in the literature more than ten times over) is CRC-32C, but it's also the one being cast as only suitable inside micro-optimization. To add another, theoretically-oriented dimension to the discussion, I'd like to suggest it's also the most thoroughly studied of all the alternatives. I really had a hard time finding follow-up papers about the two alternatives, but to be fair, I didn't try very hard...then again, I didn't try very hard for any of the three, it's just that CRC32C was by far the easiest to find materials on. The original paper is often shorthanded Castagnoli 93, but it exists in the IEEE's sphere of influence and is hard to find a copy of. Luckily, a pretty interesting survey paper discussing some of the issues was written by Koopman in 2002 and is available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a pedagogical note, it's a pretty interesting and accessible piece of writing (for me, as someone who knows little of error detection/correction) and explains some of the engineering reasons that provoke such exercises. Basically...if it comes down to understanding what the heck is going on and what the trade-offs are, it was a lot easier to brush up on CRC32-C in my meandering around the Internet. One might think this level of scrutiny would constitute a viable explanation of why CRC32C found its way into several standards and then finally into silicon. All in all, if the real-world costs of CRC32C on not-SSE4.2 are allowable, I think it's the most researched and conservative option, although perhaps some of the other polynomials seen in Koopman could also be desirable. It seems there's a tradeoff in CRC polynomials between long-message and short-message error detection, and the paper above may allow for a more informed selection. 
CRC32C is considered a good trade-off for both, but I haven't assessed the paper in enough detail to suggest whether there are specialized long-run polynomials that may be better still (although, then, there is also the microoptimization question, which postdates the literature I was looking at by a lot).
Re: [HACKERS] Interesting post-mortem on a near disaster with git
On Wed, Apr 3, 2013 at 6:18 PM, Jim Nasby j...@nasby.net wrote: What about rdiff-backup? I set it up for personal use years ago (via the handy open-source bash script backupninja) and it is a pretty nice no-frills point-in-time, self-expiring, file-based automatic backup program that works well with file synchronization like rsync (I rdiff-backup to one disk and rsync the entire rdiff-backup output to another disk). I've enjoyed using it quite a bit during my own personal-computer emergencies, and though the maintenance required from me has been zero, I have used it from time to time to restore, proving it even works. Hardlinks can be used to tag versions of a file-directory tree recursively and relatively compactly. It won't be as compact as a git-aware solution (since git tends to rewrite entire files, which will confuse file-based incremental differential backup), but the amount of data we are talking about is pretty small, and as far as a lowest-common-denominator tradeoff for use in emergencies, I have to give it a lot of praise. The main advantage it has here is it implements point-in-time recovery operations that are easy to use and actually seem to work. That said, I've mostly done targeted recoveries rather than trying to recover entire trees. I have the same set up, and same feedback. I had the same setup, but got tired of how rdiff-backup behaved when a backup was interrupted (very lengthy cleanup process). Since then I've switched to an rsync setup that does essentially the same thing as rdiff-backup (uses hardlinks between multiple copies). The only downside I'm aware of is that my rsync backups aren't guaranteed to be consistent (for however consistent a backup of an active FS would be...). I forgot to add one more thing to my first mail, although it's very important to my feeble recommendation: the problem is that blind synchronization is a great way to propagate destruction. 
rdiff-backup (but perhaps others, too) has a file/directory structure that is, as far as I know, additive, and the pruning can be done independently at different replicas that can have different retention...and if done just right (I'm not sure about the case of concurrent backups being taken) one can write a re-check that no files are to be modified or deleted by the synchronization as a safeguard. -- fdr
Re: [HACKERS] Getting to 9.3 beta
On Fri, Mar 29, 2013 at 8:22 AM, Andres Freund and...@2ndquadrant.com wrote: - pg_stat_statements: query, session, and eviction identification: Seems to need at least docs => wait for author, seems to be easy enough? I would have responded by now, but recent events have unfortunately made me put a lot of things that take longer than fifteen minutes on hold. Still, I am watching. Like you say, it's not terribly invasive. The under-estimation error propagation is perhaps a bit too weird to justify (even though it's a simple implementation). -- fdr
Re: [HACKERS] Default connection parameters for postgres_fdw and dblink
On Wed, Mar 27, 2013 at 8:22 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 27, 2013 at 10:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Use a service file maybe? But you can't have it both ways: either we like the behavior of libpq absorbing defaults from the postmaster environment, or we don't. You were just arguing this was a bug, and now you're calling it a feature. Maybe we could compromise and call it a beature. Or a fug. In all seriousness, it's in that grey area where most people would probably consider this a surprising and unwanted behavior, but there might be a few out there who had a problem and discovered that they could use this trick to solve it. In terms of a different solution, what about a GUC that can contain a connection string which is used to set connection defaults, but which can still be overridden? So it would mimic the semantics of setting an environment variable, but it would be better, because you could do all of the usual GUC things with it instead of having to hack on the postmaster startup environment. In my meta-experience, nobody really wants defaults in that configuration anyway, and if they do, most of them would appreciate being positively notified of the problem of relying on connection defaults (maybe with a warning at first, at most?). In this case, such an abrupt change in contrib is probably fine. From the vantage point I have: full steam ahead, with the sans-GUC solution...it's one less detailed GUC in the world, and its desired outcome can be achieved otherwise. I'd personally rather open myself up for dealing with the consequences of this narrow change in my own corner of the world. This anecdote carries a standard disclaimer: my meta-experience doesn't include all use cases for everyone everywhere. -- fdr
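For completeness, the service-file route Tom mentions keeps connection defaults out of both the postmaster environment and any GUC: libpq reads ~/.pg_service.conf (or the file named by PGSERVICEFILE), and clients select an entry with service=NAME in their conninfo string. A hypothetical entry (all names here are made up for illustration):

```ini
# ~/.pg_service.conf — libpq connection service file
[reporting]
host=db.example.com
port=5432
dbname=reports
user=report_reader
```

A client would then connect with a conninfo like service=reporting, and any parameter given explicitly in the conninfo still overrides the service-file value.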
Re: [HACKERS] GSoC project : K-medoids clustering in Madlib
On Tue, Mar 26, 2013 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Atri Sharma atri.j...@gmail.com writes: I suggested a couple of algorithms to be implemented in MADLib(apart from K Medoids). You could pick some(or all) of them, which would require 3 months to be completed. As for more information on index, you can refer http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1 along with the postgres wiki. The wiki is the standard for anything postgres. pg_trgm used KNN, but I believe it uses its own implementation of the algorithm. The idea I proposed aims at writing an implementation in the MADlib so that any client program can use the algorithm(s) in their code directly, using MADlib functions. I'm a bit confused as to why this is being proposed as a Postgres-related project. I don't even know what MADlib is, but I'm pretty darn sure that no part of Postgres uses it. KNNGist certainly doesn't. It's a reasonably well established extension for Postgres for statistical and machine learning methods. Rather neat, but as you indicate, it's not part of Postgres proper. http://madlib.net/ https://github.com/madlib/madlib/ -- fdr
Re: [HACKERS] Interesting post-mortem on a near disaster with git
On Mon, Mar 25, 2013 at 11:07 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: Back when we used CVS for quite a few years I kept 7 day rolling snapshots of the CVS repo, against just such a difficulty as this. But we seem to be much better organized with infrastructure these days so I haven't done that for a long time. well there is always room for improvement(and for learning from others) - but I agree that this proposal seems way overkill. If people think we should keep online delayed mirrors we certainly have the resources to do that on our own if we want though... What about rdiff-backup? I set it up for personal use years ago (via the handy open-source bash script backupninja) and it is a pretty nice no-frills point-in-time, self-expiring, file-based automatic backup program that works well with file synchronization like rsync (I rdiff-backup to one disk and rsync the entire rdiff-backup output to another disk). I've enjoyed using it quite a bit during my own personal-computer emergencies, and though the maintenance required from me has been zero, I have used it from time to time to restore, proving it even works. Hardlinks can be used to tag versions of a file-directory tree recursively and relatively compactly. It won't be as compact as a git-aware solution (since git tends to rewrite entire files, which will confuse file-based incremental differential backup), but the amount of data we are talking about is pretty small, and as far as a lowest-common-denominator tradeoff for use in emergencies, I have to give it a lot of praise. The main advantage it has here is it implements point-in-time recovery operations that are easy to use and actually seem to work. That said, I've mostly done targeted recoveries rather than trying to recover entire trees. -- fdr
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Fri, Mar 22, 2013 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: This contains some edits to comments that referred to the obsolete and bogus TupleDesc scanning. No mechanical alterations. Applied with some substantial revisions. I didn't like where you'd put the apply/restore calls, for one thing --- we need to wait to do the applies until we have the PGresult in hand, else we might be applying stale values of the remote's GUCs. Also, adding a call that could throw errors right before materializeResult() won't do, because that would result in leaking the PGresult on error. Good catches. The struct for state seemed a bit of a mess too, given that you couldn't always initialize it in one place. Yeah, I had to give that up when pushing things around, unless I wanted to push more state down. It used to be neater. (In hindsight I could have left that alone given where I ended up putting the calls, but it didn't seem to be providing any useful isolation.) I studied your commit. Yeah, the idea I had was to try to avoid pushing down as loaded a value as a PGconn into the lower-level helper functions, but perhaps that economy was a false one after the modifications. Earlier versions used to push down the RemoteGucs struct instead of a full-blown conn to hint at the restricted purpose of that reference. By conceding to this pushdown I think the struct could have remained, as you said, but the difference to clarity is likely marginal. I thought I found a way to not have to widen the parameter list at all, so I preferred that one, but clearly it is wrong w.r.t. leaks and the not-up-to-date protocol state. Sorry you had to root around so much in there to get something you liked, but thanks for going through it. -- fdr
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
This contains some edits to comments that referred to the obsolete and bogus TupleDesc scanning. No mechanical alterations. --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -2961,9 +2961,8 @@ initRemoteGucs(remoteGucs *rgs, PGconn *conn) } /* - * Scan a TupleDesc and, should it contain types that are sensitive to - * GUCs, acquire remote GUCs and set them in a new GUC nesting level. - * This is undone with restoreLocalGucs. + * Acquire remote GUCs that may affect type parsing and set them in a + * new GUC nesting level. */ static void applyRemoteGucs(remoteGucs *rgs) @@ -2974,11 +2973,8 @@ applyRemoteGucs(remoteGucs *rgs) int addedGucNesting = false; /* -* Affected types require local GUC manipulations. Create a new -* GUC NestLevel to overlay the remote settings. -* -* Also, this nesting is done exactly once per remoteGucInfo -* structure, so expect it to come with an invalid NestLevel. +* This nesting is done exactly once per remoteGucInfo structure, +* so expect it to come with an invalid NestLevel. */ Assert(rgs->localGUCNestLevel == -1); diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out index 3946485..579664e 100644 --- a/contrib/dblink/expected/dblink.out +++ b/contrib/dblink/expected/dblink.out @@ -930,9 +930,8 @@ SELECT dblink_exec('myconn', 'SET datestyle = GERMAN, DMY;'); SET (1 row) --- The following attempt test various paths at which TupleDescs are --- formed and inspected for containment of types requiring local GUC --- setting. +-- The following attempt test various paths at which tuples are formed +-- and inspected for containment of types requiring local GUC setting. 
-- single row synchronous case SELECT * FROM dblink('myconn', diff --git a/contrib/dblink/sql/dblink.sql b/contrib/dblink/sql/dblink.sql index de925eb..7ff43fd 100644 --- a/contrib/dblink/sql/dblink.sql +++ b/contrib/dblink/sql/dblink.sql @@ -435,9 +435,8 @@ SET timezone = UTC; SELECT dblink_connect('myconn','dbname=contrib_regression'); SELECT dblink_exec('myconn', 'SET datestyle = GERMAN, DMY;'); --- The following attempt test various paths at which TupleDescs are --- formed and inspected for containment of types requiring local GUC --- setting. +-- The following attempt test various paths at which tuples are formed +-- and inspected for containment of types requiring local GUC setting. -- single row synchronous case SELECT * -- fdr dblink-guc-sensitive-types-v7.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 10:37 PM, Daniel Farina dan...@heroku.com wrote: I added programming around various NULL returns reading GUCs in this revision, v4. Okay, one more of those fridge-logic bugs. Sorry for the noise. v5. A missing PG_RE_THROW to get the properly finally-esque semantics: --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -642,7 +642,10 @@ dblink_fetch(PG_FUNCTION_ARGS) } PG_CATCH(); { + /* Pop any set GUCs, if necessary */ restoreLocalGucs(rgs); + + PG_RE_THROW(); } PG_END_TRY(); This was easy to add a regression test to exercise, and so I did (not displayed here). -- fdr dblink-guc-sensitive-types-v5.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Wed, Mar 20, 2013 at 7:43 AM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: Okay, one more of those fridge-logic bugs. Sorry for the noise. v5. A missing PG_RE_THROW to get the properly finally-esque semantics: --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -642,7 +642,10 @@ dblink_fetch(PG_FUNCTION_ARGS) } PG_CATCH(); { + /* Pop any set GUCs, if necessary */ restoreLocalGucs(rgs); + + PG_RE_THROW(); } PG_END_TRY(); Um ... you shouldn't need a PG_TRY for that at all. guc.c will take care of popping the values on transaction abort --- that's really rather the whole point of having that mechanism. Hmm, well, merely raising the error doesn't reset the GUCs, so I was rather thinking that this was a good idea to compose more neatly in the case of nested exception processing, e.g.: PG_TRY(); { PG_TRY(); { elog(NOTICE, "pre: %s", GetConfigOption("DateStyle", false, true)); materializeResult(fcinfo, res); } PG_CATCH(); { elog(NOTICE, "inner catch: %s", GetConfigOption("DateStyle", false, true)); PG_RE_THROW(); } PG_END_TRY(); } PG_CATCH(); { elog(NOTICE, "outer catch: %s", GetConfigOption("DateStyle", false, true)); restoreLocalGucs(rgs); elog(NOTICE, "restored: %s", GetConfigOption("DateStyle", false, true)); PG_RE_THROW(); } PG_END_TRY(); I don't think this paranoia is taken at other call sites for AtEOXact, so included is a version that takes the same stance. This one shows up best with the whitespace-insensitive option for git: --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -634,21 +634,8 @@ dblink_fetch(PG_FUNCTION_ARGS) * affect parsing and then un-set them afterwards. 
*/ initRemoteGucs(rgs, conn); - - PG_TRY(); - { applyRemoteGucs(rgs); materializeResult(fcinfo, res); - } - PG_CATCH(); - { - /* Pop any set GUCs, if necessary */ - restoreLocalGucs(rgs); - - PG_RE_THROW(); - } - PG_END_TRY(); - restoreLocalGucs(rgs); return (Datum) 0; @@ -823,9 +810,6 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async) if (freeconn) PQfinish(conn); - /* Pop any set GUCs, if necessary */ - restoreLocalGucs(rgs); - PG_RE_THROW(); } PG_END_TRY(); -- fdr dblink-guc-sensitive-types-v6.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Thu, Mar 14, 2013 at 8:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Tue, Mar 12, 2013 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, watching the remote side's datestyle and intervalstyle and matching them (for both input and output) would probably work. Alright, so I've been whacking at this and there's one interesting thing to ask about: saving and restoring the local GUCs. There are a bunch of things about GUCs besides their value that are maintained, such as their 'source', so writing a little ad-hoc save/restore is not going to do the right thing. Right, you should use NewGUCNestLevel/AtEOXact_GUC. Look at the fixes I committed in postgres_fdw a day or two ago for an example. Thanks. Here's a patch. Here are the comments on top of the patch file inlined: Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but taking into account that a dblink caller may choose to cause arbitrary changes to DateStyle and IntervalStyle. To handle this, it is necessary to use PQparameterStatus before parsing any input, every time. This is avoided in cases that do not include the two problematic types treated -- interval and timestamptz -- by scanning the TupleDesc's types first. Although it has been suggested that extra_float_digits would need similar treatment as IntervalStyle and DateStyle (as it's seen in pg_dump), extra_float_digits is not a GUC_REPORT-ed GUC and is not able to be checked in PQparameterStatus, and furthermore, the float4 and float8 parsers are not sensitive to the GUC anyway: both accept as much precision as is provided in an unambiguous way. So, I can add one more such use of AtEOXact_GUC for the dblink fix, but would it also be appropriate to find some new names for the concepts (instead of AtEOXact_GUC and isCommit) here to more accurately express what's going on? Meh. I guess we could invent an EndGUCNestLevel that's a wrapper around AtEOXact_GUC, but I'm not that excited about it ... 
Per your sentiment, I won't pursue that then. Overall, the patch I think has reasonably thorough regression testing (I tried to hit the several code paths whereby one would have to scan TupleDescs, as well as a few other edge cases) and has some of the obvious optimizations in place: it doesn't call PQparameterStatus more than once per vulnerable type per TupleDesc scan, and avoids using the parameter status procedure at all if there are no affected types. The mechanisms may be overwrought though, since it was originally intended to generalize to three types before I realized that extra_float_digits is another kind of problem entirely, leaving just IntervalStyle and DateStyle remaining, which perhaps could have been treated even more specially to make the patch more terse. I'll add it to the commitfest. -- fdr dblink-guc-sensitive-types-v1.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but taking into account that a dblink caller may choose to cause arbitrary changes to DateStyle and IntervalStyle. To handle this, it is necessary to use PQparameterStatus before parsing any input, every time. This is avoided in cases that do not include the two problematic types treated -- interval and timestamptz -- by scanning the TupleDesc's types first. Hm. Is that really a win? Examining the tupdesc wouldn't be free either, and what's more, I think you're making false assumptions about which types have behavior dependent on the GUCs. You definitely missed out on date and plain timestamp, and what of domains over these types? Yes, I also forgot about arrays, and nested composites in arrays or nested composites. As soon as recursive types are introduced the scan is not sufficient for sure: it's necessary to copy every GUC unless one wants to recurse through the catalogs, which feels like a heavy loss. I presumed at the time that skimming over the tupdescs to compare a few Oids would be significantly cheaper than the guts of PQparameterStatus, which I believe to be a linked list traversal. I mostly wrote that optimization because it was easy coincidentally from how I chose to structure the code rather than belief in its absolute necessity. And, as you said, I forgot a few types even for the simple case, which is a bit of a relief because some of the machinery I wrote for the n = 3 case will come in useful for that. I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? 
Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. I'll take into account these problems and post a new version. -- fdr
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. Doesn't seem that hard to test: run a dblink query that pulls back a bunch of data under best-case conditions (ie, local not remote server), and time it with and without the proposed fix. If the difference is marginal then it's not worth working hard to optimize. Okay, will do, and here's the shorter and less mechanically intensive naive version that I think is the baseline: it doesn't try to optimize out any GUC settings and sets up the GUCs before the two materialization paths in dblink. Something I forgot to ask about is whether a strangely-minded type input function could whack around the GUC as records are being emitted, which would necessitate per-tuple polling of PQparameterStatus (e.g. in the inner loop of a materialization). That seemed pretty out there, but I'm broaching it for completeness. I'll see how much of a penalty it is vs. not applying any patch at all next. -- fdr dblink-guc-sensitive-types-v2.patch Description: Binary data
Re: [HACKERS] Enabling Checksums
On Tue, Mar 19, 2013 at 3:52 PM, Greg Smith g...@2ndquadrant.com wrote: On 3/19/13 6:08 PM, Ants Aasma wrote: My main worry is that there is a reasonably large population of users out there that don't have that acceleration capability and will have to settle for performance overhead 4x worse than what you currently measured for a shared buffer swapping workload. That would be very bad. I want to keep hammering on this part of the implementation. If the only style of checksum that's computationally feasible is the Fletcher one that's already been done--if that approach is basically the most expensive one that's practical to use--I'd still consider that a major win over doing nothing. While being a lazy researcher today instead of writing code, I discovered that the PNG file format includes a CRC-32 on its data chunks, and to support that there's a CRC32 function inside of zlib: http://www.zlib.net/zlib_tech.html Is there anywhere that compiles a PostgreSQL --without-zlib that matters? I'm confused. Postgres includes a CRC32 implementation for WAL, does it not? Are you referring to something else? I happen to remember this because I moved some things around to enable third party programs (like xlogdump) to be separately compiled: http://www.postgresql.org/message-id/e1s2xo0-0004uv...@gemulon.postgresql.org -- fdr
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina dan...@heroku.com wrote: On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. Doesn't seem that hard to test: run a dblink query that pulls back a bunch of data under best-case conditions (ie, local not remote server), and time it with and without the proposed fix. If the difference is marginal then it's not worth working hard to optimize. Okay, will do, and here's the shorter and less mechanically intensive naive version that I think is the baseline: it doesn't try to optimize out any GUC settings and sets up the GUCs before the two materialization paths in dblink. The results. Summary: seems like grabbing the GUC and strcmp-ing is worthwhile, but the amount of ping-ponging between processes adds some noise to the timing results: utilization is far short of 100% on either processor involved. Attached is a cumulative diff of the new version, and also reproduced below are the changes to v2 that make up v3. 
## Benchmark

SELECT dblink_connect('benchconn','dbname=contrib_regression');

CREATE FUNCTION bench() RETURNS integer AS $$
DECLARE
    iterations integer;
BEGIN
    iterations := 0;
    WHILE iterations < 30 LOOP
        PERFORM * FROM dblink('benchconn', 'SELECT 1') AS t(a int);
        iterations := iterations + 1;
    END LOOP;
    RETURN iterations;
END;
$$ LANGUAGE plpgsql;

SELECT clock_timestamp() INTO TEMP beginning;
SELECT bench();
SELECT clock_timestamp() INTO TEMP ending;

SELECT 'dblink-benchmark-lines';
SELECT ending.clock_timestamp - beginning.clock_timestamp
FROM beginning, ending;

## Data Setup

CREATE TEMP TABLE bench_results (version text, span interval);

COPY bench_results FROM STDIN WITH CSV;
no-patch,@ 41.308122 secs
no-patch,@ 36.63597 secs
no-patch,@ 34.264119 secs
no-patch,@ 34.760179 secs
no-patch,@ 32.991257 secs
no-patch,@ 34.538258 secs
no-patch,@ 42.576354 secs
no-patch,@ 39.335557 secs
no-patch,@ 37.493206 secs
no-patch,@ 37.812205 secs
v2-applied,@ 36.550553 secs
v2-applied,@ 38.608723 secs
v2-applied,@ 39.415744 secs
v2-applied,@ 46.091052 secs
v2-applied,@ 45.425438 secs
v2-applied,@ 48.219506 secs
v2-applied,@ 43.514878 secs
v2-applied,@ 45.892302 secs
v2-applied,@ 48.479335 secs
v2-applied,@ 47.632041 secs
v3-strcmp,@ 32.524385 secs
v3-strcmp,@ 34.982098 secs
v3-strcmp,@ 34.487222 secs
v3-strcmp,@ 44.394681 secs
v3-strcmp,@ 44.638309 secs
v3-strcmp,@ 44.113088 secs
v3-strcmp,@ 45.497769 secs
v3-strcmp,@ 33.530176 secs
v3-strcmp,@ 32.9704 secs
v3-strcmp,@ 40.84764 secs
\.
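[Editorial aside: the timings above can be cross-checked outside the database. A throwaway sketch, with the values transcribed from the COPY block above; `stdev` is the sample standard deviation, matching SQL `stddev()`:]

```python
from statistics import mean, stdev

# Seconds per 30-iteration run, transcribed from the bench_results data
no_patch = [41.308122, 36.63597, 34.264119, 34.760179, 32.991257,
            34.538258, 42.576354, 39.335557, 37.493206, 37.812205]
v2_applied = [36.550553, 38.608723, 39.415744, 46.091052, 45.425438,
              48.219506, 43.514878, 45.892302, 48.479335, 47.632041]
v3_strcmp = [32.524385, 34.982098, 34.487222, 44.394681, 44.638309,
             44.113088, 45.497769, 33.530176, 32.9704, 40.84764]

for label, runs in [("no-patch", no_patch), ("v2-applied", v2_applied),
                    ("v3-strcmp", v3_strcmp)]:
    print(f"{label:>10}: avg {mean(runs):.7f}  stddev {stdev(runs):.6f}")
```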
= SELECT version,
         avg(extract(epoch from span)),
         stddev(extract(epoch from span))
  FROM bench_results GROUP BY version;

  version   |    avg     |      stddev
------------+------------+------------------
 no-patch   | 37.1715227 | 3.17076487912923
 v2-applied | 43.9829572 | 4.30572672565711
 v3-strcmp  | 38.7985768 | 5.54760393720725

## Changes to v2:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2981,9 +2981,11 @@ initRemoteGucs(remoteGucs *rgs, PGconn *conn)
 static void
 applyRemoteGucs(remoteGucs *rgs)
 {
-	int			i;
 	const int	numGucs = sizeof parseAffectingGucs / sizeof *parseAffectingGucs;
+	int			i;
+	int			addedGucNesting = false;
+
 	/*
 	 * Affected types require local GUC manipulations.  Create a new
 	 * GUC NestLevel to overlay the remote settings.
@@ -2992,14 +2994,27 @@ applyRemoteGucs(remoteGucs *rgs)
 	 * structure, so expect it to come with an invalid NestLevel.
 	 */
 	Assert(rgs->localGUCNestLevel == -1);
-	rgs->localGUCNestLevel = NewGUCNestLevel();

 	for (i = 0; i < numGucs; i += 1)
 	{
+		int			gucApplyStatus;
+
 		const char *gucName = parseAffectingGucs[i];
 		const char *remoteVal = PQparameterStatus(rgs->conn, gucName);
+		const char *localVal = GetConfigOption(gucName, true, true);

-		int			gucApplyStatus;
+		/*
+		 * Attempt to avoid GUC setting if the remote and local GUCs
+		 * already have the same value.
+		 */
+		if (strcmp(remoteVal, localVal) == 0)
+			continue;
+
+		if (!addedGucNesting)
+		{
+			rgs->localGUCNestLevel = NewGUCNestLevel();
+			addedGucNesting = true;
+		}

 		gucApplyStatus = set_config_option(gucName, remoteVal,
 										   PGC_USERSET, PGC_S_SESSION,

-- fdr

dblink-guc-sensitive-types-v3.patch
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 6:10 PM, Daniel Farina dan...@heroku.com wrote: On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina dan...@heroku.com wrote: On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. Doesn't seem that hard to test: run a dblink query that pulls back a bunch of data under best-case conditions (ie, local not remote server), and time it with and without the proposed fix. If the difference is marginal then it's not worth working hard to optimize. Okay, will do, and here's the shorter and less mechanically intensive naive version that I think is the baseline: it doesn't try to optimize out any GUC settings and sets up the GUCs before the two materialization paths in dblink. The results. Summary: seems like grabbing the GUC and strcmp-ing is worthwhile, but the amount of ping-ponging between processes adds some noise to the timing results: utilization is far short of 100% on either processor involved. Attached is a cumulative diff of the new version, and also reproduced below are the changes to v2 that make up v3. I added programming around various NULL returns reading GUCs in this revision, v4. The non-cumulative changes:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -3005,8 +3005,22 @@ applyRemoteGucs(remoteGucs *rgs)
 		/*
 		 * Attempt to avoid GUC setting if the remote and local GUCs
 		 * already have the same value.
+		 *
+		 * NB: Must error if the GUC is not found.
 		 */
-		localVal = GetConfigOption(gucName, true, true);
+		localVal = GetConfigOption(gucName, false, true);
+
+		if (remoteVal == NULL)
+			ereport(ERROR,
+					(errmsg("could not load parameter status of \"%s\"",
+							gucName)));
+
+		/*
+		 * An error must have been raised by now if GUC values could
+		 * not be loaded for any reason.
+		 */
+		Assert(localVal != NULL);
+		Assert(remoteVal != NULL);

 		if (strcmp(remoteVal, localVal) == 0)
 			continue;

-- fdr

dblink-guc-sensitive-types-v4.patch
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Mon, Mar 18, 2013 at 1:31 PM, Greg Smith g...@2ndquadrant.com wrote: On 3/18/13 10:52 AM, Bruce Momjian wrote: With a potential 10-20% overhead, I am unclear who would enable this at initdb time. If you survey people who are running PostgreSQL on cloud hardware, be it Amazon's EC2 or similar options from other vendors, you will find a high percentage of them would pay quite a bit of performance to make their storage more reliable. To pick one common measurement for popularity, a Google search on ebs corruption returns 17 million hits. To quote one of those, Baron Schwartz of Percona talking about MySQL on EC2: BTW, I have seen data corruption on EBS volumes. It’s not clear whether it was InnoDB’s fault (extremely unlikely IMO), the operating system’s fault, EBS’s fault, or something else. Clarification, because I think this assessment as delivered feeds some unnecessary FUD about EBS: EBS is quite reliable. Presuming that all noticed corruptions are strictly EBS's problem (that's quite a stretch), I'd say the mean time between defects falls somewhere in the range of volume-centuries. I want to point this out because I think EBS gets an outsized amount of public flogging, and not all of it is deserved. My assessment of the situation at hand: I care about this feature not because EBS sucks more than anything else by a large degree, but because there's an ever-mounting number of EBS volumes whose defects are under the responsibility of comparatively few individuals. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Mon, Mar 18, 2013 at 7:13 PM, Greg Smith g...@2ndquadrant.com wrote: I wasn't trying to flog EBS as any more or less reliable than other types of storage. What I was trying to emphasize, similarly to your quite a stretch comment, was the uncertainty involved when such deployments fail. Failures happen due to many causes outside of just EBS itself. But people are so far removed from the physical objects that fail, it's harder now to point blame the right way when things fail. I didn't mean to imply you personally were going out of your way to flog EBS, but there is a sufficient vacuum in the narrative that someone could reasonably have interpreted it that way, so I want to set it straight. The problem is the quantity of databases per human. The Pythons said it best: 'A simple question of weight ratios.' A quick example will demonstrate what I mean. Let's say my server at home dies. There's some terrible log messages, it crashes, and when it comes back up it's broken. Troubleshooting and possibly replacement parts follow. I will normally expect an eventual resolution that includes data like the drive showed X SMART errors or I swapped the memory with a similar system and the problem followed the RAM. I'll learn something about what failed that I might use as feedback to adjust my practices. But an EC2+EBS failure doesn't let you get to the root cause effectively most of the time, and that makes people nervous. Yes, the layering makes it tougher to do vertical treatment of obscure issues. Redundancy has often been the preferred solution here: bugs come and go all the time, and everyone at each level tries to fix what they can without much coordination from the layer above or below. There are hopefully benefits in throughput of progress at each level from this abstraction, but predicting when any one particular issue will go understood top to bottom is even harder than it already was.
Also, I think the line of reasoning presented is biased towards a certain class of database: there are many, many databases with minimal funding and oversight being run in the traditional way, and the odds they'll get a vigorous root cause analysis in the event of an obscure issue are already close to nil. Although there are other considerations at play (like not just leaving those users with nothing more than a bad block message), checksums open some avenues to gradually benefit those use cases, too. I can already see "how do checksums alone help narrow the blame?" as the next question. I'll post something summarizing how I use them for that tomorrow, just out of juice for that tonight. Not from me. It seems pretty intuitive from here how database-maintained checksums assist in partitioning the problem. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing pglz compressor
On Wed, Mar 6, 2013 at 6:32 AM, Joachim Wieland j...@mcknight.de wrote: On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ? This one is for the archives, as I thought it surprising: the performance of these algorithms can differ by a surprisingly large factor depending on architecture. Here's a table reproduced from: http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

testdata/alice29.txt :
ZLIB:    [b 1M] bytes 152089 ->  54404  35.8% comp  0.8 MB/s uncomp   8.1 MB/s
LZO:     [b 1M] bytes 152089 ->  82721  54.4% comp 14.5 MB/s uncomp  43.0 MB/s
CSNAPPY: [b 1M] bytes 152089 ->  90965  59.8% comp  2.1 MB/s uncomp   4.4 MB/s
SNAPPY:  [b 4M] bytes 152089 ->  90965  59.8% comp  1.8 MB/s uncomp   2.8 MB/s

testdata/asyoulik.txt:
ZLIB:    [b 1M] bytes 125179 ->  48897  39.1% comp  0.8 MB/s uncomp   7.7 MB/s
LZO:     [b 1M] bytes 125179 ->  73224  58.5% comp 15.3 MB/s uncomp  42.4 MB/s
CSNAPPY: [b 1M] bytes 125179 ->  80207  64.1% comp  2.0 MB/s uncomp   4.2 MB/s
SNAPPY:  [b 4M] bytes 125179 ->  80207  64.1% comp  1.7 MB/s uncomp   2.7 MB/s

LZO was ~8x faster compressing and ~16x faster decompressing. Only on uncompressible data was Snappy faster:

testdata/house.jpg :
ZLIB:    [b 1M] bytes 126958 -> 126513  99.6% comp  1.2 MB/s uncomp   9.6 MB/s
LZO:     [b 1M] bytes 126958 -> 127173 100.2% comp  4.2 MB/s uncomp  74.9 MB/s
CSNAPPY: [b 1M] bytes 126958 -> 126803  99.9% comp 24.6 MB/s uncomp 381.2 MB/s
SNAPPY:  [b 4M] bytes 126958 -> 126803  99.9% comp 22.8 MB/s uncomp 354.4 MB/s

So that's one more gotcha to worry about, since I surmise most numbers are being taken on x86. Apparently this has something to do with alignment of accesses. Some of it may be fixable by tweaking the implementation rather than the compression encoding, although I am no expert in the matter.
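[Editorial aside: those figures come from someone else's hardware. A throwaway harness for sanity-checking ratios and timings locally might look like the sketch below — zlib only, since it ships with Python's stdlib; lz4 and snappy would need third-party bindings and are not shown.]

```python
import os
import time
import zlib

def bench(label: str, data: bytes) -> None:
    t0 = time.perf_counter()
    comp = zlib.compress(data, 6)          # default-ish compression level
    t1 = time.perf_counter()
    assert zlib.decompress(comp) == data   # round trip must be lossless
    t2 = time.perf_counter()
    print(f"{label}: bytes {len(data)} -> {len(comp)} "
          f"{len(comp) / len(data) * 100:.1f}% "
          f"comp {t1 - t0:.4f}s uncomp {t2 - t1:.4f}s")

# Compressible text vs. incompressible noise, mirroring the txt/jpg split
bench("text-like", b"the quick brown fox jumps over the lazy dog\n" * 3456)
bench("random   ", os.urandom(152089))
```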
-- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Sun, Mar 17, 2013 at 5:50 PM, Greg Smith g...@2ndquadrant.com wrote: On the testing front, we've seen on-list interest in this feature from companies like Heroku and Enova, who both have some resources and practice to help testing too. Heroku can spin up test instances with workloads any number of ways. Enova can make a Londiste standby with checksums turned on to hit it with a logically replicated workload, while the master stays un-checksummed. I was thinking about turning checksums on for all new databases as long as I am able to turn them off easily, per my earlier message: http://www.postgresql.org/message-id/caazkufzza+aw8zl4f_5c8t8zhrtjo3cm1ajqddglqcpez_3...@mail.gmail.com. An unstated assumption here was that I could apply the patch to 9.2 with some work. It seems the revitalized interest in the patch has raised a couple of issues on inspection that have yet to be resolved, so before moving I'd prefer to wait for a quiescence in the patch's evolution, as was the case for some time even after review. However, if we want to just hit 9.3dev with a bunch of synthetic traffic, that's probably doable also, and in some ways easier (or at least less risky). -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 12, 2013 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: Okay, I see. So inverting the thinking I wrote earlier: how about hearkening carefully to any ParameterStatus messages on the local side before entering the inner loop of dblink.c:materializeResult as to set the local GUC (and carefully dropping it back off after materializeResult) so that the _in functions can evaluate the input in the same relevant GUC context as the remote side? Yeah, watching the remote side's datestyle and intervalstyle and matching them (for both input and output) would probably work. Alright, so I've been whacking at this and there's one interesting thing to ask about: saving and restoring the local GUCs. There are a bunch of things about GUCs besides their value that are maintained, such as their 'source', so writing a little ad-hoc save/restore is not going to do the right thing. Luckily, something much closer to the right thing is done for SET LOCAL with transactions and subtransactions, to push and pop GUC contexts:

guc.h:
extern int NewGUCNestLevel(void);
extern void AtEOXact_GUC(bool isCommit, int nestLevel);

By and large, looking at the mechanics of these two functions, the latter in particular has very little to do with the transaction machinery in general. It's already being called from a bunch of places that don't, to me, seem to really indicate an intrinsic connection to transactions, e.g. do_analyze_rel, ri_triggers, and postgres_fdw. I think in those cases the justification for settings of 'isCommit' is informed by the mechanism more than the symbol name or its comments. I count about ten call sites that are like this. So, I can add one more such use of AtEOXact_GUC for the dblink fix, but would it also be appropriate to find some new names for the concepts (instead of AtEOXact_GUC and isCommit) here to more accurately express what's going on?
I'll perhaps do that after finishing up the dblink fix if I receive some positive response on the matter. However, if for some reason I *shouldn't* use that machinery in dblink, I'd appreciate the guidance. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
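[Editorial aside: the push/pop discipline discussed above — open a nest level, overlay values, then restore everything wholesale on exit, commit or abort alike — is an ordinary scoped-override pattern. Here it is sketched as a Python context manager, purely for intuition: a toy dict stands in for the GUC machinery, and none of this corresponds to actual guc.c code.]

```python
from contextlib import contextmanager

# Toy stand-in for the GUC table; real GUCs also carry source, scope,
# and reset state, which is why ad-hoc save/restore falls short.
settings = {"datestyle": "ISO, MDY", "intervalstyle": "postgres"}

@contextmanager
def guc_nest_level(overrides):
    """Overlay settings and restore them on exit -- the same shape as
    NewGUCNestLevel() paired with AtEOXact_GUC()."""
    saved = {name: settings[name] for name in overrides}
    settings.update(overrides)
    try:
        yield
    finally:
        settings.update(saved)   # pop the nest level even on error

with guc_nest_level({"datestyle": "German, DMY"}):
    # ...materialize the remote result under the remote's parse settings...
    assert settings["datestyle"] == "German, DMY"

assert settings["datestyle"] == "ISO, MDY"   # back to the local value
```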
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Mon, Mar 11, 2013 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: I will try to make time for this, although it seems like the general approach should match pgsql_fdw if possible. Is the current thinking to forward the settings and then use the GUC hooks to track updates? That's not what I had in mind for postgres_fdw --- rather the idea is to avoid needing on-the-fly changes in remote-side settings, because those are so expensive to make. However, postgres_fdw is fortunate in that the SQL it expects to execute on the remote side is very constrained. dblink might need a different solution that would leave room for user-driven changes of remote-side settings. Okay, I see. So inverting the thinking I wrote earlier: how about hearkening carefully to any ParameterStatus messages on the local side before entering the inner loop of dblink.c:materializeResult as to set the local GUC (and carefully dropping it back off after materializeResult) so that the the _in functions can evaluate the input in the same relevant GUC context as the remote side? That should handle SET actions executed remotely. Otherwise it seems like a solution would have to be ambitious enough to encompass reifying the GUCs from the afflicted parsers, which I surmise is not something that we want to treat right now. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Mon, Mar 11, 2013 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, it strikes me that dblink is probably subject to at least some of these same failure modes. I'm not personally volunteering to fix any of this in dblink, but maybe someone ought to look into that. I will try to make time for this, although it seems like the general approach should match pgsql_fdw if possible. Is the current thinking to forward the settings and then use the GUC hooks to track updates? -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Sun, Mar 10, 2013 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: There's a lot left to do here of course. One thing I was wondering about was why we don't allow DEFAULTs to be attached to foreign-table columns. There was no use in it before, but it seems sensible enough now. Hmm ... the buildfarm just rubbed my nose in a more immediate issue, which is that postgres_fdw is vulnerable to problems if the remote server is using different GUC settings than it is for things like timezone and datestyle. The failure seen here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rover_fireflydt=2013-03-10%2018%3A30%3A00 is basically just cosmetic, but it's not hard to imagine non-cosmetic problems coming up. For instance, suppose our instance is running in DMY datestyle and transmits an ambiguous date to a remote running in MDY datestyle. We could consider sending our settings to the remote at connection establishment, but that doesn't seem terribly bulletproof --- what if someone does a local SET later? What seems safer is to set the remote to ISO style always, but then we have to figure out how to get the local timestamptz_out to emit that style without touching our local GUC. Ugh. Forgive my naivety: why would a timestamptz value not be passed through the _in/_out function locally at least once (hence, respecting local GUCs) when using the FDW? Is the problem overhead in not wanting to parse and re-format the value on the local server? Although it seems that if GUCs affected *parsing* then the problem comes back again, since one may choose to canonicalize output from a remote server (e.g. via setting ISO time in all cases) but should the user have set up GUCs to interpret input in a particular way for a data type that is not enough. As-is this situation is already technically true for timestamptz in the case of time stamps without any explicit time offset or time zone, but since timestamptz_out doesn't ever render without the offset (right?) 
it's not a problem. Close shave, though, and one that an extension author could easily find themselves on the wrong side of. I suppose that means any non-immutable in/out function pair may have to be carefully considered, and the list is despairingly long...but finite:

SELECT proname
FROM pg_proc
WHERE EXISTS (SELECT 1
              FROM pg_type
              WHERE pg_proc.oid = pg_type.typinput
                 OR pg_proc.oid = pg_type.typoutput
                 OR pg_proc.oid = pg_type.typsend
                 OR pg_proc.oid = pg_type.typreceive)
  AND provolatile != 'i';

(One more reason why GUCs that affect application-visible semantics are dangerous.) :( -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Thu, Mar 7, 2013 at 7:31 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote: Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd like to express what benefit this feature will have to me: Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate in this year to date. The ability to quickly, accurately, and maybe at some later date proactively finding good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples...basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. It would be better yet if I could write tools to find the last-good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. I see Heroku has corruption experience, and I know Jim Nasby has struggled with corruption in the past. More than a little: it has entered the realm of the routine, and happens frequently enough that it has become worthwhile to start looking for patterns. Our methods so far rely heavily on our archives to deal with it: it's time consuming but the 'simple' case of replaying WAL from some earlier base backup resulting in a non-corrupt database is easily the most common. 
Interestingly, the WAL has never failed to recover halfway through because of CRC failures while treating corruption[0]. We know this fairly convincingly because we constantly sample txid and WAL positions while checking the database, as we typically do about every thirty seconds. I think the unreasonable effectiveness of this strategy of old backup and WAL replay might suggest that database checksums would prove useful. In my mind, the number of ways this formula could work so well if the bug were RAM- or CPU-based is slimmed considerably. [0] I have seen -- very rarely -- substantial periods of severe WAL corruption (files are not even remotely the correct size) propagated to the archives in the case of disaster recovery where the machine met its end because of the WAL disk being marked as dead. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Mar 6, 2013 at 8:17 PM, Greg Smith g...@2ndquadrant.com wrote: TL;DR summary: on a system I thought was a fair middle of the road server, pgbench tests are averaging about a 2% increase in WAL writes and a 2% slowdown when I turn on checksums. There are a small number of troublesome cases where that overhead rises to closer to 20%, an upper limit that's shown up in a few tests aiming to stress this feature now. I have only done some cursory research, but CPU time of around 20% seems to be expected for InnoDB's CRC computation[0]. Although it's a galling number, this comparison with other systems may be a way to see how much of that overhead is avoidable or just the price of entry. It's unclear how this 20% CPU time compares to your whole-system results above, but it's enough to suggest that nothing comes for (nearly) free. [0]: http://mysqlha.blogspot.com/2009/05/innodb-checksum-performance.html -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
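[Editorial aside: for a feel of why checksum choice matters for CPU cost, here is a textbook Fletcher-16 — the family mentioned earlier in the thread — next to zlib's CRC-32. Python, illustrative only: the checksum the patch actually uses differs, and a compiled C loop is orders of magnitude faster than this interpreted one, so the relative timings below say nothing about C implementations.]

```python
import time
import zlib

def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16: cheap per byte, needing only adds and an
    occasional modulo, no table lookups."""
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

page = bytes(range(256)) * 32            # 8 kB sample "page"

t0 = time.perf_counter()
fsum = fletcher16(page)
t1 = time.perf_counter()
csum = zlib.crc32(page) & 0xFFFFFFFF     # compiled code, unfairly fast here
t2 = time.perf_counter()

print(f"fletcher16={fsum:#06x} ({t1 - t0:.6f}s)  "
      f"crc32={csum:#010x} ({t2 - t1:.6f}s)")
```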
Re: [HACKERS] [GENERAL] Floating point error
On Mon, Mar 4, 2013 at 2:27 PM, Maciek Sakrejda m.sakre...@gmail.com wrote: On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: The real difficulty is that there may be more than one storable value that corresponds to 1.23456 to six decimal digits. To be certain that we can reproduce the stored value uniquely, we have to err in the other direction, and print *more* decimal digits than the underlying precision justifies, rather than a bit less. Some of those digits are going to look like garbage to the naked eye. I think part of the difficulty here is that psql (if I understand this correctly) conflates the wire-format text representations with what should be displayed to the user. E.g., a different driver might parse the wire representation into a native representation, and then format that native representation when it is to be displayed. That's what the JDBC driver does, so it doesn't care about how the wire format actually looks. pg_dump cares about reproducing values exactly, and not about whether things are nice-looking, so it cranks up extra_float_digits. The JDBC driver might be justified in doing likewise, to ensure that the identical binary float value is stored on both client and server --- but that isn't even a valid goal unless you assume that the server's float implementation is the same as Java's, which is a bit of a leap of faith, even if IEEE 754 is nigh universal these days. I would hope that any driver cares about reproducing values exactly (or at least as exactly as the semantics of the client and server representations of the data type allow). Once you start talking operations, sure, things get a lot more complicated and you're better off not relying on any particular semantics. But IEEE 754 unambiguously defines certain bit patterns to correspond to certain values, no? 
If both client and server talk IEEE 754 floating point, it should be possible to round-trip values with no fuss and end up with the same bits you started with (and as far as I can tell, it is, as long as extra_float_digits is set to the max), even if the implementations of actual operations on these numbers behave very differently on client and server. I think given that many ORMs can cause UPDATEs on tuple fields that have not changed as part of saving an object, stable round trips seem like a desirable feature. I also find the rationale for extra_float_digits quite mysterious for the same reason: why would most programs care about precision less than pg_dump does? If a client wants floating point numbers to look nice, I think the rendering should be on them (e.g. psql and pgadmin), and the default should be to expose whatever precision is available to clients that want an accurate representation of what is in the database. This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set extra_float_digits), but I can't quite grasp the rationale of "well, the only program that cares about the most precision available is pg_dump." It seems like most programs would care just as much. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
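[Editorial aside: the round-trip property under discussion is easy to demonstrate. In the Python sketch below, `float` is an IEEE 754 double, like PostgreSQL's float8: 17 significant digits always suffice to reproduce a double exactly, while 15 can collapse two distinct values into the same text.]

```python
import struct

x = 0.1 + 0.2        # 0.30000000000000004 as an IEEE 754 double
y = 0.3
assert x != y                                   # genuinely distinct doubles
assert struct.pack(">d", x) != struct.pack(">d", y)

# At 15 significant digits the two render identically: not round-trippable
assert f"{x:.15g}" == f"{y:.15g}"

# At 17 significant digits (roughly what a maxed-out extra_float_digits
# buys) each value round-trips to the exact same bits
assert float(f"{x:.17g}") == x
assert float(f"{y:.17g}") == y
```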
Re: [HACKERS] Enabling Checksums
On Mon, Mar 4, 2013 at 1:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 04.03.2013 23:00, Jeff Davis wrote: On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote: Yeah, fragmentation will certainly hurt some workloads. But how badly, and which workloads, and how does that compare with the work that PostgreSQL has to do to maintain the checksums? I'd like to see some data on those things. I think we all would. Btrfs will be a major filesystem in a few years, and we should be ready to support it. Perhaps we should just wait a few years? If we suspect that this becomes obsolete in a few years, it's probably better to just wait, than add a feature we'll have to keep maintaining. Assuming it gets committed today, it's going to take a year or two for 9.3 to get released and all the bugs ironed out, anyway. Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd like to express what benefit this feature will have to me: Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate in this year to date. The ability to quickly, accurately, and maybe at some later date proactively finding good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples...basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. 
It would be better yet if I could write tools to find the last-good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. You are among the most well-positioned to make assessments of the cost of the feature, but I thought you might appreciate a perspective of the benefits, too. I think they're large, and for me they are the highest pole in the tent for what makes Postgres stressful to operate as-is today. It's a testament to the quality of the programming in Postgres that Postgres programming error is not the largest problem. For sense of reference, I think the next largest operational problem is the disruption caused by logical backups, e.g. pg_dump, and in particular its long running transactions and sessions. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Sun, Feb 24, 2013 at 10:30 PM, Greg Smith g...@2ndquadrant.com wrote: Attached are some bit-rot updates to the checksums patches. The replace-tli one still works fine. I rather badly want this feature, and if the open issues with the patch have hit zero, I'm thinking about applying it, shipping it, and turning it on. Given that the heap format has not changed, the main affordance I may check for is whether I can work in backwards compatibility (while not maintaining the checksums, of course) in case of an emergency. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unarchived WALs deleted after crash
On Thu, Feb 21, 2013 at 12:39 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 21.02.2013 02:59, Daniel Farina wrote: On Fri, Feb 15, 2013 at 9:29 AM, Simon Riggs si...@2ndquadrant.com wrote: On 15 February 2013 17:07, Heikki Linnakangas hlinnakan...@vmware.com wrote: Unfortunately in HEAD, xxx.done file is not created when restoring archived file because of absence of the patch. We need to implement that first. Ah yeah, that thing again.. (http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm going to forward-port that patch now, before it's forgotten again. It's not clear to me what the holdup was on this, but whatever the bigger patch we've been waiting for is, it can just as well be done on top of the forward-port. Agreed. I wouldn't wait for a better version now. Related to this, how is this going to affect point releases, and are there any lingering doubts about the mechanism of the fix? Are you talking about the patch to avoid restored WAL segments from being re-archived (commit 6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d), or the bug that unarchived WALs were deleted after crash (commit b5ec56f664fa20d80fe752de494ec96362eff520)? The former was included in 9.2.0 already, and the latter will be included in the next point release. Unarchived WALs being deleted after a crash is the one that worries me. I actually presume re-archivals will happen anyway because I may lose connection to archive storage after the WAL has already been committed, hence b5ec56f664fa20d80fe752de494ec96362eff520. This is quite serious given my reliance on archiving, so unless the thinking for point releases is 'real soon' I must backpatch and release it of my own accord until then. I don't know what the release schedule is. I take that to be a request to put out a new minor release ASAP. Perhaps, but it's more of a concrete evaluation of how important archiving is to me and my affiliated operation.
An acceptable answer might be "yeah, backpatch if you feel it's that much of a rush." Clearly, my opinion is that a gap in the archives is pretty cringe-inducing. I hit it from an out-of-disk case, and you'd be surprised (or perhaps not?) how many people like to kill -9 processes on a whim. I already maintain other backpatches (not related to fixes), and this one is only temporary, so it's not too much trouble for me. -- fdr
Re: [HACKERS] Unarchived WALs deleted after crash
On Fri, Feb 15, 2013 at 9:29 AM, Simon Riggs si...@2ndquadrant.com wrote: On 15 February 2013 17:07, Heikki Linnakangas hlinnakan...@vmware.com wrote: Unfortunately in HEAD, xxx.done file is not created when restoring archived file because of absence of the patch. We need to implement that first. Ah yeah, that thing again.. (http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm going to forward-port that patch now, before it's forgotten again. It's not clear to me what the holdup was on this, but whatever the bigger patch we've been waiting for is, it can just as well be done on top of the forward-port. Agreed. I wouldn't wait for a better version now. Related to this, how is this going to affect point releases, and are there any lingering doubts about the mechanism of the fix? This is quite serious given my reliance on archiving, so unless the thinking for point releases is 'real soon' I must backpatch and release it of my own accord until then. Thanks for the attention paid to the bug report, as always. -- fdr
Re: [HACKERS] Unarchived WALs deleted after crash
On Thu, Feb 14, 2013 at 7:45 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote: Hi, I am facing an unexpected behavior on a 9.2.2 cluster that I can reproduce on current HEAD. On a cluster with archive enabled but failing, after a crash of postmaster, the checkpoint occurring before leaving the recovery mode deletes any additional WALs, even those waiting to be archived. I believe I have encountered this recently, but didn't get enough of a chance to work with it to correspond. For me, the cause was out-of-disk on the file system that exclusively contained WAL, backlogged because archiving fell behind writing. This caused the cluster to crash -- par for the course -- but it also created an archive gap. At the time I thought there was some kind of bug in dealing with out-of-space issues in the archiver (the .ready bookkeeping), but the symptoms I saw seem like they might be explained by your report, too. -- fdr
Re: [HACKERS] Fractal tree indexing
On Wed, Feb 13, 2013 at 8:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: The basic idea of a fractal tree index is to attach a buffer to every non-leaf page. On insertion, instead of descending all the way down to the correct leaf page, the new tuple is put on the buffer at the root page. When that buffer fills up, all the tuples in the buffer are cascaded down to the buffers on the next level pages. And recursively, whenever a buffer fills up at any level, it's flushed to the next level. [ scratches head... ] What's fractal about that? Or is that just a content-free marketing name for this technique? The name in the literature is Cache Oblivious Lookahead Array, aka COLA. The authors are also founders of TokuTek, and seem to have taken pains to ring-fence mentions of the algorithm with references to its patent. Well, at least nobody can blame them for submarine patent action. -- fdr
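For readers curious about the structure being named, here is a toy sketch of the COLA idea: sorted levels of geometrically increasing size, where an insert merge-cascades downward whenever a level is already occupied, analogous to the buffer flushing Heikki describes. This is an illustrative simplification under my own assumptions, not TokuTek's patented implementation.

```python
import bisect

class COLA:
    """Toy cache-oblivious lookahead array: level i holds a sorted run
    of 2**i keys, or is empty."""

    def __init__(self):
        self.levels = []  # list of sorted lists

    def insert(self, key):
        carry = [key]
        for i, level in enumerate(self.levels):
            if not level:
                # Empty slot at this level: the cascading run lands here.
                self.levels[i] = carry
                return
            # Occupied: merge and cascade the doubled run to the next level.
            carry = sorted(carry + level)
            self.levels[i] = []
        self.levels.append(carry)

    def search(self, key):
        # A lookup probes each level's sorted run (binary search per level).
        for level in self.levels:
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False
```

The amortized insert cost is low because each key participates in only O(log n) merges, which is the same intuition behind the buffered non-leaf pages in the fractal-tree description above.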
Re: [HACKERS] Considering Gerrit for CFs
On Fri, Feb 8, 2013 at 2:23 AM, Magnus Hagander mag...@hagander.net wrote: On Fri, Feb 8, 2013 at 1:32 AM, Josh Berkus j...@agliodbs.com wrote: This is a few too many steps, and certainly appears completely broken to any newcomer. I agree it's way too many steps. Several of those can certainly be made more efficient now that we have more sane archives, well within the scope of the current system. I have thought it'd be 'nice' if pre-generated binaries and git repos were made for each submitted patch. Is there a nice-and-tidy way to paginate over list traffic from an external program over The Internet? Is there/could there be a nice pagination marker? Is that a reasonable step zero to making other things that consume email? -- fdr
Re: [HACKERS] Considering Gerrit for CFs
On Wed, Feb 6, 2013 at 3:00 PM, Joshua D. Drake j...@commandprompt.com wrote: On 02/06/2013 01:53 PM, Tom Lane wrote: ... if it's going to try to coerce us out of our email-centric habits, then I for one am very much against it. To me, the problems with the existing CF app are precisely that it's not well enough integrated with the email discussions. The way to fix that is not to abandon email (or at least, it's not a way I wish to pursue). The email-centric habits are by far the biggest limiting factor we have. Email was never designed for integral collaboration. That said, I see no way around it. I have brought up this idea before, but Redmine has by far served the purposes (with a little effort) of CMD and it also integrates with GIT. It also does not break the email work flow. It does not currently allow commands via email but that could be easily (when I say easily, I mean it) added. Just another thought. I don't think controlling things by email is the issue. I have used the usual litany of bug trackers and find the correspondence-driven model that -hackers and -bugs use pretty pleasant. If nothing else, the uniform, well-developed, addressable, and indexed nature of the archives definitely provides me a lot of value that I haven't really seen duplicated in other projects that use structured bug or patch tracking. The individual communications tend to be of higher quality -- particularly for the purpose of later reference -- maybe a byproduct of the fact that prose is on the pedestal. There are obvious tooling gaps (aren't there always?), but I don't really see the model as broken, and I don't think I've been around pgsql-hackers exclusively or extensively enough to have developed Stockholm syndrome. I also happen to feel that the commitfest application works rather well for me in general.
Sure, I wish that it might slurp up all submitted patches automatically with default titles, or identify new versions when they appear, but I've always felt that skimming the commitfest detail page for a patch was useful. It was perhaps harder to know whether the commitfest page I was looking at was complete or up to date, although it often is. Here's what I find most gut-wrenching in the whole submit-review-commit process:

* When a reviewer shows up a couple of weeks later and says "this patch doesn't apply," "make check crashes," or "it fails to compile." It's especially sad because the reviewer has presumably cut out a chunk of time -- now probably aborted -- to review the patch. Machines can know these things automatically.

* When on occasion patches are submitted with non-C/sh/perl suites that may not really be something that anyone wants in the build/source tree, but seem like they might do a better job testing the patch. The inevitable conclusion is that the automated test harness is tossed, or never written, because it is known it will have no place to live after a possible commit. Somehow I wish those could live somewhere and run against all submitted patches.

I've toyed a very, very tiny bit with running build farm clients on Heroku with both of these in mind, but haven't revisited it in a while. -- fdr
Re: [HACKERS] [v9.3] writable foreign tables
On Fri, Feb 1, 2013 at 2:22 AM, Daniel Farina dan...@heroku.com wrote: On Tue, Jan 29, 2013 at 1:19 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: I noticed the v10 patch cannot be applied on the latest master branch cleanly. The attached ones were rebased. Anyway, I'm looking at the first patch, which applies neatly. Thanks. Hello, My review is incomplete, but to the benefit of pipelining I wanted to ask a couple of things and submit some changes for your consideration while continuing to look at this. So far, I've been trying to understand in very close detail how your changes affect non-FDW related paths first, before delving into the actual writable FDW functionality. There's not that much in this category, but there's one thing that gave me pause: the fact that the target list (by convention, tlist) has to be pushed down from planmain.c/query_planner all the way to an fdwroutine->GetForeignRelWidth callsite in plancat.c/get_relation_info (so even in the end, it is just pushed down -- never inspected or modified). In relative terms, it's a significant widening of currently fairly short parameter lists in these procedures:

add_base_rels_to_query(PlannerInfo *root, List *tlist, Node *jtnode)
build_simple_rel(PlannerInfo *root, int relid, List *tlist, RelOptKind reloptkind)
get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, List *tlist, RelOptInfo *rel)

In the case of all the other parameters except tlist, each is more intimately related with the inner workings and mechanisms of the procedure. I wonder if there's a nice way to train the reader's eye that the tlist is simply pushed down rather than actively managed at each level. However, I can't immediately produce a way to achieve that that doesn't seem overwrought. Perhaps a comment will suffice.
Related to this, I found that make_modifytable grew a dependency on PlannerInfo *root, and it seemed like a bunch of the arguments are functionally related to that, so the attached patch, which should apply on top of yours, tries to utilize that as often as possible. The weirdest thing in there is how make_modifytable has been taught to call SS_assign_special_param, which has a side effect on the PlannerInfo, but there exist exactly zero cases where one *doesn't* want to do that prior to the (exactly two) call sites of make_modifytable, so I pushed it in. The second weirdest thing is pushing in the handling of rowMarks (e.g. SELECT FOR UPDATE et al). Let me know your thoughts, otherwise I'll keep moving on... -- fdr wfdw-rely-on-plannerinfo.patch Description: Binary data
Re: [HACKERS] json api WIP patch
On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2013 10:47 AM, Robert Haas wrote: The SQL standards considerations seem worth thinking about, too. We've certainly gone through a lot of pain working toward eliminating => as an operator name, and if the SQL standard has commandeered -> for some purpose or other, I'd really rather not add to the headaches involved should we ever decide to reclaim it. OK, but I'd like to know what is going to be safe. There's no way to future-proof the language. I'm quite prepared to replace -> with something else, and if I do then ->> will need to be adjusted accordingly, I think. My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on the ground that some fonts elevate ~ rather than aligning it in the middle as most monospaced fonts do, but I'm tempted just to say then use a different font. Other possibilities that come to mind are +> and +>>, although I think they're less attractive. But I'll be guided by the consensus, assuming there is one ;-) I suspect both of those are pretty safe from an SQL standards point of view. Of course, as Tom is often wont to point out, the SQL standards committee sometimes does bizarre things, so nothing's perfect, but I'd be rather shocked if any of those got tapped to mean something else. That having been said, I still don't see value in adding operators at all. Good old function call notation seems perfectly adequate from where I sit. Sure, it's a little more verbose, but when you try too hard to make things concise then you end up having to explain to your users why \ditS is a sensible thing for them to type into psql, or why s@\W@sprintf%%%02x,ord($)@e in Perl. I recognize that I may lose this argument, but I've worked with a couple of languages where operators can be overloaded (C++) or defined (ML) and it's just never seemed to work out very well. YMMV, of course.
I also basically feel this way, although I know I tend more towards notational brutalism than many. I think we shouldn't kid ourselves that non-default operators will be used, and for current-implementation reasons (that maybe could be fixed by someone determined) it's not really at the pleasure of the author to use them via CREATE OPERATOR either. So, I basically subscribe to the view that we should investigate what total reliance on prefix syntax looks like. I guess it'll make nested navigation horribly ugly, though...positively lisp-esque. That's one consideration hstore doesn't have that may make use of infix notations considerably more useful for json than hstore. -- fdr
Re: [HACKERS] json api WIP patch
On Mon, Feb 4, 2013 at 12:37 PM, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2013 03:16 PM, Daniel Farina wrote: On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2013 10:47 AM, Robert Haas wrote: The SQL standards considerations seem worth thinking about, too. We've certainly gone through a lot of pain working toward eliminating => as an operator name, and if the SQL standard has commandeered -> for some purpose or other, I'd really rather not add to the headaches involved should we ever decide to reclaim it. OK, but I'd like to know what is going to be safe. There's no way to future-proof the language. I'm quite prepared to replace -> with something else, and if I do then ->> will need to be adjusted accordingly, I think. My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on the ground that some fonts elevate ~ rather than aligning it in the middle as most monospaced fonts do, but I'm tempted just to say then use a different font. Other possibilities that come to mind are +> and +>>, although I think they're less attractive. But I'll be guided by the consensus, assuming there is one ;-) I suspect both of those are pretty safe from an SQL standards point of view. Of course, as Tom is often wont to point out, the SQL standards committee sometimes does bizarre things, so nothing's perfect, but I'd be rather shocked if any of those got tapped to mean something else. That having been said, I still don't see value in adding operators at all. Good old function call notation seems perfectly adequate from where I sit. Sure, it's a little more verbose, but when you try too hard to make things concise then you end up having to explain to your users why \ditS is a sensible thing for them to type into psql, or why s@\W@sprintf%%%02x,ord($)@e in Perl.
I recognize that I may lose this argument, but I've worked with a couple of languages where operators can be overloaded (C++) or defined (ML) and it's just never seemed to work out very well. YMMV, of course. I also basically feel this way, although I know I tend more towards notational brutalism than many. I think we shouldn't kid ourselves that non-default operators will be used, and for current-implementation reasons (that maybe could be fixed by someone determined) it's not really at the pleasure of the author to use them via CREATE OPERATOR either. So, I basically subscribe to the view that we should investigate what total reliance on prefix syntax looks like. I guess it'll make nested navigation horribly ugly, though...positively lisp-esque. That's one consideration hstore doesn't have that may make use of infix notations considerably more useful for json than hstore. We don't have the luxury of designing things like this in or out from scratch. Creation of operators has been a part of PostgreSQL for a good while longer than my involvement, and a great many people expect to be able to use it. I can just imagine the outrage at any suggestion of removing it. I am only referring to the restriction that the planner can't understand that fetchval() and '->' mean the same thing for, say, hstore. Hence, use of non-default CREATE OPERATOR may become more useful some day, instead of basically being a pitfall when someone reasonably thinks they could use either spelling of the same functionality and the optimizer will figure it out. I'm not suggesting removal of any feature. My reference to total reliance on prefix syntax refers only to the JSON operators, since the previous correspondence from Robert was about how function call syntax alone may be sufficient. This phrase refers to the same idea he is proposing. I also included a weakness of that idea, which is that nesting in JSON makes the situation worse than the commonly compared case, hstore.
-- fdr
Re: [HACKERS] [v9.3] writable foreign tables
On Tue, Jan 29, 2013 at 1:19 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: I noticed the v10 patch cannot be applied on the latest master branch cleanly. The attached ones were rebased. Hello, I'm just getting started looking at this, but I notice that the second patch relies on contrib/postgres_fdw to apply, and it's not clear to me where to get the correct version of that. One way to solve this would be to tell me where to get a version of that, and another that may work well would be to share a Git repository with postgres_fdw and the writable FDW changes accumulated. I poked around on git.postgresql.org and github and wasn't able to find a recently pushed copy of this. Anyway, I'm looking at the first patch, which applies neatly. Thanks. -- fdr
Re: [HACKERS] json api WIP patch
On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 31, 2013 at 7:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2013 05:06 PM, Peter Eisentraut wrote: I would like to not create any -> operators, so that that syntax could be used in the future for method invocation or something similar (it's in the SQL standard). This is the first time I have heard that we should stay away from this. We have operators with this name in hstore, which is why I chose it. I'm not happy about this either. It's bad enough that we're thinking about taking away =>, but to disallow -> as well? My inclination is to just say no, we're not implementing that. Even if we remove the contrib operators named that way, it's insane to suppose that nobody has chosen these names for user-defined operators in their applications. I think it's smarter for us to ship functions, and let users wrap them in operators if they so choose. It's not difficult for people who want it to do it, and that way we dodge this whole mess. Normally I'd agree with you, but I think there's a complexity here that is very frown-inducing, although I'm not positively inclined to state that it's so great that your suggested solution is not the least of all evils: http://www.postgresql.org/message-id/8551.1331580...@sss.pgh.pa.us The problem being: even though pg_operator resolves to functions in pg_proc, they have distinct identities as far as the planner is concerned w.r.t. selectivity estimation and index selection. Already there is a slight hazard that some ORMs that want to grow hstore support will spell it fetchval and others ->. So far, infix syntax seems to be the common default, but a minor disagreement among ORMs or different user preferences will be destructive.
Another way to look at this is that by depriving people of multiple choices in the default install, that hazard goes away. But it also means that, practically, CREATE OPERATOR is going to be hazardous to use, because almost all software is probably not going to assume the existence of any non-default installed operators for JSON, and those that do will not receive the benefit of indexes from software using the other notation. So, if one takes the 'when in doubt, leave it out' approach you seem to be proposing, I think very little profitable use of CREATE OPERATOR will take place -- ORMs et al will choose the lowest common denominator (for good sensible reason) and flirting with CREATE OPERATOR will probably cause surprising plans, so I doubt it'll take hold. -- fdr
Re: [HACKERS] patch to add \watch to psql
I have adjusted this patch a little bit to take care of the review issues, along with just doing a bit of review myself. On Thu, Oct 25, 2012 at 2:25 AM, Will Leinweber w...@heroku.com wrote: Thanks for the reviews and comments. Responses inline: On Sat, Oct 20, 2012 at 9:19 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: Maybe you should call it \repeat or something. I'm sure people would get around to using \watch that way eventually. :-) While I agree that clearing the screen would be nicer, I feel that watching the bottom of the screen instead of the top gets you 95% of the value of unix watch(1), and having the same name will greatly enhance discoverability. Perhaps later on if ncurses is linked for some other reason, this could take advantage of it then. That said, I'm not that strongly attached to the name one way or the other. The name \repeat has grown on me, but I haven't bothered renaming it for the time being. I think sameness with the familiar 'watch' program may not be such a big deal as I thought originally, but 'repeat' sounds a lot more like a kind of flow control for scripts, whereas \watch is more clearly for humans, which is the idea. On Wed, Oct 24, 2012 at 2:55 PM, Peter Eisentraut pete...@gmx.net wrote: This doesn't handle multiline queries:

=> \watch select 1 +
ERROR:  42601: syntax error at end of input
LINE 1: select
               ^

I think to make it cooperate better with psql syntax, put the \watch at the end, as a replacement for \g, like I have implemented some kind of multi-line support. The rough part is in this part of the patch:

+		if (query_buf && query_buf->len > 0)
+		{
+			/*
+			 * Check that the query in query_buf has been terminated.  This
+			 * is mostly consistent with behavior from commands like \g.
+			 * The reason this is here is to prevent terrible things from
+			 * occurring from submitting incomplete input of statements
+			 * like:
+			 *
+			 *     DELETE FROM foo
+			 *     \watch
+			 *     WHERE
+			 *
+			 * Wherein \watch will go ahead and send whatever has been
+			 * submitted so far.  So instead, insist that the user
+			 * terminate the query with a semicolon to be safe.
+			 */
+			if (query_buf->data[query_buf->len - 1] == ';')

What I found myself reaching for when giving up and writing this hack was a way to thread through the last lexer state of query_buf, which seems like it could stand to accrue a bit more information than being just a byte buffer. But this is the simplest possible thing, so I'll let others comment... -- fdr psql-watch-v2.patch.gz Description: GNU Zip compressed data
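To make the guard's intent concrete, here is a toy model of the behavior in Python -- the function name and parameters are hypothetical, purely for illustration: refuse input that isn't semicolon-terminated, then re-run the completed query at a fixed interval, as \watch does.

```python
import time

def watch(execute, query_buf, interval=2.0, iterations=1):
    """Re-run a completed query every `interval` seconds, \\watch-style.

    Refuses unterminated input, mirroring the semicolon check in the
    patch fragment: otherwise an incomplete statement such as
    "DELETE FROM foo" (its WHERE clause still unentered) would be sent.
    """
    query = query_buf.rstrip()
    if not query.endswith(";"):
        raise ValueError("terminate the query with a semicolon")
    results = []
    for i in range(iterations):
        results.append(execute(query))
        if i + 1 < iterations:  # sleep between runs, not after the last
            time.sleep(interval)
    return results
```

A byte-buffer check like this is the "simplest possible thing"; tracking the lexer's end state, as the message suggests, would distinguish a genuinely complete statement from, say, a semicolon inside a string literal.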
Re: [HACKERS] patch to add \watch to psql
On Thu, Jan 17, 2013 at 5:07 PM, Daniel Farina dan...@heroku.com wrote: I have adjusted this patch a little bit to take care of the review issues, along with just doing a bit of review myself. I realized while making my adjustments that I pointlessly grew some input checking in the inner loop. I just hoisted it out in this version. -- fdr psql-watch-v3.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more...so looking for the red "Nobody" in the 'reviewers' column probably understates the shortage of review. I'm curious what the qualitative feelings are on patches or clusters thereof, and what kind of review would be helpful in clearing the field. -- fdr
Re: [HACKERS] json api WIP patch
On Tue, Jan 15, 2013 at 12:17 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/15/2013 02:47 PM, Merlin Moncure wrote: On Tue, Jan 15, 2013 at 1:04 PM, David Fetter da...@fetter.org wrote: On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote: On 01/14/2013 07:36 PM, Merlin Moncure wrote: While testing this I noticed that integer based 'get' routines are zero based -- was this intentional? Virtually all other aspects of SQL are 1 based:

postgres=# select json_get('[1,2,3]', 1);
 json_get
----------
 2
(1 row)

postgres=# select json_get('[1,2,3]', 0);
 json_get
----------
 1
(1 row)

Yes, it's intentional. SQL arrays might be 1-based by default, but JavaScript arrays are not. JsonPath and similar gadgets treat the arrays as zero-based. I suspect the Json-using community would not thank us for being overly SQL-centric on this - and I say that as someone who has always thought zero based arrays were a major design mistake, responsible for countless off-by-one errors. Perhaps we could compromise by making arrays 0.5-based. Well, I'm not prepared to argue with Andrew on this one. It was surprising behavior to me, but that's sample size one. I doubt I'm very representative either. People like David Wheeler, Taras Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than me. I'm quite prepared to change it if that's the consensus. Hello. I'm inclined to go with the same gut feeling you had (zero-based indexing). Here is the background for my reasoning: The downside of zero-based indexing is that people who want to use multiple sequential container types will inevitably have to deal with detailed and not easily type-checked integer coordinates that mean different things in each domain, which will, no doubt, lead to a number of off-by-one errors.
Nevertheless, this cost is already paid, because one of the first things many people will do in programs generating SQL queries is try to zero-index a SQL array, swear a bit after figuring things out (because a NULL will be generated, not an error), and then adjust all the offsets. So, this is not a new problem. On many occasions I'm sure this has caused off-by-one bugs, or the NULLs slipped through testing and delivered funny results, yet the world moves on. On the other hand, if one goes down the road of 1-based indexing and attempts to attain relative sameness to SQL arrays, it would also feel like one would be obliged to implement SQL array infelicities like 'out of bounds' being SQL NULL rather than an error, along with other spectres like non-rectangular nested arrays. SQL array semantics are complex and The Committee can change them or -- slightly more likely -- add interactions, so it seems like a general expectation that Postgres container types that happen to have any reasonable ordinal addressing will implement some level of same-ness with SQL arrays is a very messy one. As such, if it becomes customary to implement one-based indexing of containers, I think such customs are best carefully circumscribed so that attempts to be 'like' SQL arrays are only as superficial as that. What made me come down on the side of zero-based indexing in spite of the weaknesses are these two reasons:

* The number of people who use JSON and zero-based indexing is very large, and furthermore, within that set the number that know that SQL even defines array support -- much less that Postgres implements it -- is much smaller. Thus, one is targeting cohesion with a fairly alien concept that is not understood by the audience.

* Maintaining PL-integrated code that uses both 1-based indexing in PG functions and 0-based indexing in embedded languages that are likely to be combined with JSON doesn't sound very palatable, and the use of such PLs (e.g.
plv8) seems pretty likely, too. That can probably be a rich source of bugs and frustration. If one wants SQL array semantics, it seems like the right way to get them is coercion to a SQL array value. Then one will receive SQL array semantics exactly. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
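The two conventions under discussion can be put side by side. This is an illustrative sketch, not PostgreSQL source: `sql_get` and `json_get` here are hypothetical stand-ins mimicking, respectively, SQL-array semantics (1-based, out of bounds quietly yielding NULL rather than an error) and JavaScript semantics (0-based, as the proposed json function behaves):

```python
arr = [1, 2, 3]

def sql_get(a, i):
    """Mimic SQL arrays: 1-based; out of bounds quietly yields NULL (None)."""
    if i < 1 or i > len(a):
        return None
    return a[i - 1]

def json_get(a, i):
    """Mimic JavaScript/JSON: 0-based, as json_get is proposed to behave."""
    if i < 0 or i >= len(a):
        return None
    return a[i]

# The same index means different things in each domain -- the source of
# the off-by-one errors discussed above.
assert json_get(arr, 1) == 2    # like: select json_get('[1,2,3]', 1)
assert sql_get(arr, 1) == 1     # SQL arrays start at 1
assert sql_get(arr, 0) is None  # zero-indexing a SQL array gives NULL, not an error
```

The last assertion is the "swear a bit after figuring things out" moment: the out-of-bounds access produces a NULL that can slip through testing rather than an error that fails loudly.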
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 8, 2013 at 11:51 AM, Simon Riggs si...@2ndquadrant.com wrote: On 8 January 2013 18:46, Josh Berkus j...@agliodbs.com wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:

I'm sure it's possible; I don't *think* it's terribly easy.

I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify.

Impossible to justify, I'd say.

Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure?

Daniel already did object.

To briefly reiterate my objection: I observed that one may want to enter a state of cyclicality on a temporary basis -- to assist with some intermediate states in remastering -- and it'd be nice if Postgres didn't try to get in the way of that.

I would like to have enough reporting to be able to write tools that detect cyclicity and other configuration errors, and I think that may exist already in recovery.conf/its successor in postgresql.conf. A notable problem here is that UDFs, by their mechanical nature, don't quite cover all the use cases, since they require the server to be running with hot standby available. It seems like reading recovery.conf or its successor is probably the best option here.

-- fdr
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 8, 2013 at 5:51 PM, Josh Berkus j...@agliodbs.com wrote:

Daniel,

To briefly reiterate my objection, I observed that one may want to enter a case of cyclicality on a temporary basis -- to assist with some intermediate states in remastering, and it'd be nice if Postgres didn't try to get in the way of that.

I don't think it *should* fail. I think it should write a WARNING to the logs, to make it easy to debug if the cycle was created accidentally.

Well, in the conversation so long ago, outright failure was more openly considered -- which may not be true in the present era... I was just covering my old tracks exactly.

I would like to have enough reporting to be able to write tools that detect cyclicity and other configuration errors, and I think that may exist already in recovery.conf/its successor in postgresql.conf. A notable problem here is that UDFs, by their mechanical nature, don't quite cover all the use cases, since they require the server to be running with hot standby available. It seems like reading recovery.conf or its successor is probably the best option here.

Well, pg_conninfo will still be in postgresql.conf. But that doesn't help you if you're playing fast and loose with virtual IP addresses... and arguably, people using virtual IPs are more likely to accidentally create a cycle.

That's a good point. Even simpler than a virtual IP is DNS, wherein the resolution can be rebound but a pre-existing connection to an old IP will happily remain, and that will be hard to know from postgresql.conf and friends. I guess that means the hard case is when hot standby is not (yet) on but the server is actively recovering WAL: UDFs are out, and scanning postgresql.conf is not necessarily an accurate picture of the situation.

-- fdr
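The tool-side detection mentioned in this thread is straightforward once each node's upstream is known. A minimal sketch under that assumption -- a hypothetical dict mapping each node to the node it replicates from (gathered, say, from each standby's conninfo), with None marking a primary; this is monitoring-tool code, not anything in PostgreSQL itself:

```python
def find_cycle(upstream):
    """Return the cyclic portion of a replication topology, or None.

    upstream maps each node to the node it replicates from (None for a
    primary).  Walking each chain upstream either terminates at a primary
    or revisits a node, which reveals a cycle."""
    for start in upstream:
        path, node = [], start
        while node is not None and node not in path:
            path.append(node)
            node = upstream.get(node)
        if node is not None:
            # node was seen before: everything from its first visit onward
            # forms the cycle.
            return path[path.index(node):]
    return None

# A sane chain terminates at the primary ...
assert find_cycle({'standby1': 'primary', 'primary': None}) is None
# ... while a remastering mishap shows up as a cycle.
assert find_cycle({'a': 'b', 'b': 'c', 'c': 'a'}) == ['a', 'b', 'c']
```

Per the discussion, a tool built on this would merely warn rather than fail, since a temporary cycle can be a legitimate intermediate state during remastering; the hard part remains collecting accurate upstream information when DNS rebinding or virtual IPs make the configuration files lie.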
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Attached is a cumulative patch attempting to address the below. One can see the deltas to get there at https://github.com/fdr/postgres.git on the branch error-prop-pg_stat_statements-v2.

On Fri, Dec 28, 2012 at 9:58 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:

However, with this approach, calls_underest values might appear to the user in what might be considered an astonishing order. Now, I'm not suggesting that that's a real problem -- just that they may not be the semantics we want, particularly as we can reasonably defer assigning a calls_underest until a sticky entry is unstuck, and an entry becomes user-visible, within pgss_store().

Fix for this as I understand it:

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 1036,1041 **** pgss_store(const char *query, uint32 queryId,
--- 1036,1042 ----
  		e->counters.usage = USAGE_INIT;

  	e->counters.calls += 1;
+ 	e->counters.calls_underest = pgss->calls_max_underest;
  	e->counters.total_time += total_time;
  	e->counters.rows += rows;
  	e->counters.shared_blks_hit += bufusage->shared_blks_hit;

*** 1264,1272 **** entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
  	/* set the appropriate initial usage count */
  	entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;
- 	/* propagate calls under-estimation bound */
- 	entry->counters.calls_underest = pgss->calls_max_underest;
-
  	/* re-initialize the mutex each time ... we assume no one using it */
  	SpinLockInit(&entry->mutex);
  	/* ... and don't forget the query text */
--- 1265,1270 ----

Also, it seems like you should initialise pgss->calls_max_underest, within pgss_shmem_startup().

Easy enough. Somehow I wrongly thought zero-initialization was a given for the shmem functions.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 426,431 **** pgss_shmem_startup(void)
--- 426,432 ----
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
+ 		pgss->calls_max_underest = 0;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}

You should probably serialise the value to disk, and initialise it to 0 if there is no such value to begin with.

I prefer a different approach here: just compute it while loading the entries from disk, since calls plus under-estimation can be used to find a new pessimum under-estimation global value.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 419,424 **** pgss_shmem_startup(void)
--- 419,425 ----
  	int query_size;
  	int buffer_size;
  	char *buffer = NULL;
+ 	int64 calls_max_underest = 0;

  	if (prev_shmem_startup_hook)
  		prev_shmem_startup_hook();

*** 440,446 **** pgss_shmem_startup(void)
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
! 		pgss->calls_max_underest = 0;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}
--- 441,447 ----
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
! 		pgss->calls_max_underest = calls_max_underest;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}

*** 528,533 **** pgss_shmem_startup(void)
--- 529,545 ----
  			temp.query_len, query_size - 1);

+ 		/*
+ 		 * Compute maxima of under-estimation over the read entries
+ 		 * for reinitializing pgss->calls_max_underest.
+ 		 */
+ 		{
+ 			int64 cur_underest;
+
+ 			cur_underest = temp.calls + temp.calls_underest;
+ 			calls_max_underest = Max(calls_max_underest, cur_underest);
+ 		}
+
  		/* make the hashtable entry (discards old entries if too many) */
  		entry = entry_alloc(temp.key, buffer, temp.query_len, false);

*** 535,540 **** pgss_shmem_startup(void)
--- 547,559 ----
  		entry->counters = temp.counters;
  	}

+ 	/*
+ 	 * Reinitialize global under-estimation
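The compute-it-while-loading approach preferred here amounts to taking a running maximum as entries are read back from disk. An illustrative Python sketch of that logic (a dict stands in for the on-disk entry struct; this mirrors the patch only in spirit, not source):

```python
def recompute_max_underest(saved_entries):
    """Sketch of the reload-time recomputation: the new global
    under-estimation bound is the pessimum (maximum) of
    calls + calls_underest over every entry read from disk."""
    calls_max_underest = 0
    for temp in saved_entries:
        cur_underest = temp['calls'] + temp['calls_underest']
        calls_max_underest = max(calls_max_underest, cur_underest)
    return calls_max_underest

# An entry's true call count lies somewhere in the range
# [calls, calls + calls_underest]; the global bound must cover
# the worst case among all entries.
assert recompute_max_underest([
    {'calls': 10, 'calls_underest': 2},
    {'calls': 3, 'calls_underest': 20},
]) == 23
```

The payoff is that nothing extra needs to be serialised: the per-entry counters already on disk carry enough information to rebuild a sound global bound, at worst a slightly pessimistic one.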
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 29, 2012 at 4:21 AM, Daniel Farina drfar...@acm.org wrote:

These were not express goals of the patch, but so long as you are inviting features, attached is a bonus patch that exposes the queryid and also the notion of a statistics session that is re-rolled whenever the stats file could not be read or the stats are reset, able to fully explain all obvious causes of retrograde motion in statistics. It too is cumulative, so it includes the under-estimation field.

Notably, I also opted to nullify the extra pg_stat_statements fields in the cases where the query text would show insufficient privileges (that indication itself is spared from this censorship), because I feel as though a bit too much information leaks from pg_stat_statements's statistics to ignore, especially after adding the query id.

Since the common theme here is identifying queries, I have called it pg_stat_statements-identification, and it can be found in the git repo above under the same name (...-v1).

A small amendment that doesn't really change the spirit of the narrative is attached.

-- fdr

Attachment: pg_stat_statements-identification-v2.patch.gz (GNU zip compressed data)
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 29, 2012 at 6:37 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 29 December 2012 12:21, Daniel Farina drfar...@acm.org wrote:

These were not express goals of the patch, but so long as you are inviting features, attached is a bonus patch that exposes the queryid and also the notion of a statistics session that is re-rolled whenever the stats file could not be read or the stats are reset, able to fully explain all obvious causes of retrograde motion in statistics. It too is cumulative, so it includes the under-estimation field.

Cool.

I had a thought about Tom's objection to exposing the hash value. I would like to propose a compromise between exposing the hash value and not doing so at all. As I recall, the gist of his objection had to do with a false sense of stability of the hash value, and the desire to preserve the ability to alter it.

Here's an option: xor the hash value with the 'statistics session id', so it's *known* to be unstable between sessions. That gets you continuity in the common case and sound deprecation in the less-common cases (crashes, format upgrades, stat resetting). A change in hash function would then also necessitate changing the stat file header.

Another, more minor issue is the hard-wiring of the precision of the id. For that reason, I have thought it reasonable to expose it as a string also.

-- fdr
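The xor compromise is small enough to sketch directly. This is an illustration with made-up values, not the pg_stat_statements implementation: xoring the internal hash with the statistics session id keeps distinct queries distinguishable within a session while guaranteeing the exposed value changes whenever the session is re-rolled:

```python
def exposed_queryid(internal_hash, session_id):
    """Exposed id = internal hash xor statistics session id: stable within
    one statistics session, *known* to be unstable across sessions."""
    return internal_hash ^ session_id

h1, h2 = 0xDEADBEEF, 0xCAFEBABE   # two queries' internal hash values (made up)
s1, s2 = 101, 202                 # session ids before/after a stats reset (made up)

# Distinct queries stay distinguishable within one session ...
assert exposed_queryid(h1, s1) != exposed_queryid(h2, s1)
# ... but a query's exposed id changes when the session is re-rolled,
# soundly deprecating any values a user might have stashed away.
assert exposed_queryid(h1, s1) != exposed_queryid(h1, s2)
```

Because xor with a fixed session id is a bijection, no two queries ever collide in the exposed space unless their internal hashes already collided; the transformation costs nothing in discriminating power while making the instability contract explicit.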