Re: [HACKERS] Suspicion of a compiler bug in clang: using ternary operator in ereport()
Hi, On 28/01/14 22:35, Tom Lane wrote: Absolutely. Probably best to save errno into a local just before the ereport. I think just resetting to edata->saved_errno is better and sufficient? -1 --- that field is nobody's business except elog.c's. Ok, so I propose the attached patch as a fix. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 8705586..f40215a 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -715,6 +715,7 @@ errcode_for_socket_access(void) { \ char *fmtbuf; \ StringInfoData buf; \ + int saved_errno = errno; \ /* Internationalize the error format string */ \ if (translateit && !in_error_recursion_trouble()) \ fmt = dgettext((domain), fmt); \ @@ -744,6 +745,7 @@ errcode_for_socket_access(void) pfree(edata->targetfield); \ edata->targetfield = pstrdup(buf.data); \ pfree(buf.data); \ + errno = saved_errno; \ } /* @@ -756,6 +758,7 @@ errcode_for_socket_access(void) const char *fmt; \ char *fmtbuf; \ StringInfoData buf; \ + int saved_errno = errno; \ /* Internationalize the error format string */ \ if (!in_error_recursion_trouble()) \ fmt = dngettext((domain), fmt_singular, fmt_plural, n); \ @@ -787,6 +790,7 @@ errcode_for_socket_access(void) pfree(edata->targetfield); \ edata->targetfield = pstrdup(buf.data); \ pfree(buf.data); \ + errno = saved_errno; \ }
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 28th January, Mitsumasa KONDO wrote: 2014-01-26 Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com wrote: On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp mailto:kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013? Sorry, I was misunderstanding. The first name of Mr. Rajeev Rastogi is Kumar! I searched only by e-mail address and title for his name... I don't have a Windows compiler environment, but the attached patch might fix it. Could I ask Mr. Rajeev Rastogi to test my patch again? I tried to test this but I could not apply the patch on latest git HEAD. This may be because of the recent patch (related only to pg_stat_statements: "pg_stat_statements external query text storage"), which got committed on 27th January. Thank you for trying to test my patch. As you say, the recent commit changes pg_stat_statements.c a lot. So I have to revise my patch. Please wait for a while. By the way, the latest pg_stat_statements might affect performance on Windows systems, because it uses the fflush() system call every time a new entry is created in pg_stat_statements, and it calls many fread()s to warm the file cache. It works well on Linux systems, but I'm not sure about Windows. If you have time, could you test it on your Windows system? If it affects performance a lot, we can still change it. No issue; you can share the test cases with me and I will take the performance report. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Patch: regexp_matches variant returning an array of matching positions
I'll elaborate on the use case. I have OCR-scanned text for a large number of images, corresponding to one row per image. I want to match against words in another table. I need two result sets, one with all matched words and one with only the first matched word within the first 50 chars of the OCR-scanned text. Having the matched position in the first result set makes it easy to produce the second. I cannot find the position using substring because I use word boundaries in my regexp. Returning a SETOF named composite makes sense, so I could try to make such a function instead if there is interest. Perhaps a good name for such a function would be simply regexp_match or regexp_search (as in Python). /Björn 2014-01-29 David Johnston pol...@yahoo.com Alvaro Herrera-9 wrote Björn Harrtell wrote: I've written a variant of regexp_matches called regexp_matches_positions which instead of returning matching substrings will return matching positions. I found use of this when processing OCR-scanned text and wanted to prioritize matches based on their position. Interesting. I didn't read the patch but I wonder if it would be of more general applicability to return more info in one fell swoop: a function returning a set (position, length, text of match), rather than an array. So instead of first calling one function to get the matches and then their positions, do it all in one pass. (See pg_event_trigger_dropped_objects for a simple example of a function that returns in that fashion. There are several others but AFAIR that's the simplest one.) Confused as to your thinking. Like regexp_matches this returns SETOF type[]. In this case integer but text for the matches. I could see adding a generic function that returns a SETOF named composite (match varchar[], position int[], length int[]) and the corresponding type. I'm not imagining a situation where you'd want the position but not the text, and so having to evaluate the regexp twice seems wasteful. The length is probably a waste though, since it can readily be gotten from the text and is less often needed. But if it's pre-calculated anyway... My question is what position is returned in a multiple-match situation? The supplied test only covers the simple, non-global, situation. It needs to exercise empty sub-matches and global searches. One theory is that the first array slot should cover the global position of match zero (i.e., the entire pattern) within the larger document, while sub-matches would be relative offsets within that single match. This conflicts, though, with the fact that _matches only returns array elements for () items and never for the full match - the goal in this function being parallel un-nesting. But as nesting is allowed, it is still possible for that to occur. How does this resolve in the patch? SELECT regexp_matches('abcabc','((a)(b)(c))','g'); David J.
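For reference, this is how the existing regexp_matches() behaves in the global case David asks about - one row per match of the whole pattern, one array element per parenthesized group (this is stock PostgreSQL behaviour, not the patch):

  SELECT regexp_matches('abcabc', '((a)(b)(c))', 'g');
  -- two rows: {abc,a,b,c} and {abc,a,b,c}

A positions-returning variant has to pick between offsets relative to the whole string and offsets relative to the enclosing match; the call below is hypothetical and only illustrates the shape of the proposal, not its actual output:

  SELECT regexp_matches_positions('abcabc', '((a)(b)(c))', 'g');
  -- e.g. {1,1,2,3} and {4,4,5,6} if offsets are relative to the whole string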
Re: [HACKERS] proposal: hide application_name from other users
On Tue, Jan 28, 2014 at 8:17 PM, Stephen Frost sfr...@snowman.net wrote: Greg, * Greg Stark (st...@mit.edu) wrote: On Tue, Jan 28, 2014 at 11:56 AM, Josh Berkus j...@agliodbs.com wrote: For example, I would really like to GRANT an unpriv user access to the WAL columns in pg_stat_replication so that I can monitor replication delay without granting superuser permissions. So you can do this now by defining a security definer function that extracts precisely the information you need and grant execute access to precisely the users you want. There was some concern upthread about defining security definer functions being tricky but I'm not sure what conclusion to draw from that argument. Yeah, but that sucks if you want to build a generic monitoring system like check_postgres.pl. Telling users to grant certain privileges may work out; telling them to install these pl/pgsql things you write as security-definer-to-superuser isn't going to be nearly as easy when these users are (understandably, imv) uncomfortable having a monitor role have superuser privs. I couldn't agree more. Whatever we do here we need a standard mechanism that tool developers can expect to be present and the same on all servers. Otherwise, we make it extremely difficult to build tools like pgAdmin, check_postgres.pl and so on. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
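For concreteness, a minimal sketch of the security definer approach being debated, created by a superuser; the function and role names here are illustrative (not from the thread), and the column list is trimmed to the WAL positions Josh mentions:

  CREATE FUNCTION replication_lag_info()
  RETURNS TABLE (client_addr inet, sent_location text, replay_location text)
  LANGUAGE sql SECURITY DEFINER AS $$
      SELECT client_addr, sent_location, replay_location
      FROM pg_stat_replication;
  $$;
  REVOKE ALL ON FUNCTION replication_lag_info() FROM PUBLIC;
  GRANT EXECUTE ON FUNCTION replication_lag_info() TO monitoring;

This works today, but every tool would have to ship and install its own such function on every server - which is exactly the per-server setup burden Stephen and Dave object to.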
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 28 January 2014 20:16, Florian Pflug f...@phlo.org wrote: On Jan27, 2014, at 23:28 , Dean Rasheed dean.a.rash...@gmail.com wrote: strict transfn vs non-strict inv_transfn This case is explicitly forbidden, both in CREATE AGGREGATE and in the executor. To me, that seems overly restrictive --- if transfn is strict, then you know for sure that no NULL values are added to the aggregate state, so surely it doesn't matter whether or not inv_transfn is strict. It will never be asked to remove NULL values. I think there are definite use-cases where a user might want to use a pre-existing function as the inverse transition function, so it seems harsh to force them to wrap it in a strict function in this case. AFAICS, the code in advance/retreat_windowaggregate should just work if those checks are removed. True. It didn't in earlier versions of the patch because the advance and retreat functions looked at the strict settings directly, but now that there's an ignore_nulls flag in the executor node, the only requirement should be that ignore_nulls is set if either of the transition functions is strict. I'm not sure that the likelihood of someone wanting to combine a strict forward with a non-strict inverse function is very high, but neither Me neither, but the checks to forbid it aren't adding anything, and I think it's best to make it as flexible as possible. non-strict transfn vs strict inv_transfn At first this seems as though it must be an error --- the forwards transition function allows NULL values into the aggregate state, and the inverse transition function is strict, so it cannot remove them. But actually what the patch is doing in this case is treating the forwards transition function as partially strict --- it won't be passed NULL values (at least in a window context), but it may be passed a NULL state in order to build the initial state when the first non-NULL value arrives. Exactly. It looks like this behaviour is intended to support aggregates that ignore NULL values, but cannot actually be made to have a strict transition function. I think typically this is because the aggregate's initial state is NULL and its state type differs from the type of the values being aggregated, and so can't be automatically created from the first non-NULL value. Yes. I added this because the alternative would have been to count non-NULL values in most of the existing SUM() aggregates. That all seems quite ugly though, because now you have a transition function that is not strict, which is passed NULL values when used in a normal aggregate context, and not passed NULL values when used in a window context (whether sliding or not), except for the NULL state for the first non-NULL value. Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does and skip NULL inputs if the aggregate has a strict inverse transition function. The fact that we never actually *call* the inverse doesn't matter. Will fix that once we've decided on the various strictness issues. Yuk! I'm not sure if there is a better way to do it though. If we disallow this case, these aggregates would have to use non-strict functions for both the forward and inverse transition functions, which means they would have to implement their own non-NULL value counting. Alternatively, allowing strict transition functions for these aggregates would require that we provide some other way to initialise the state from the first non-NULL input, such as a new initfn. I'm not sure an initfn would really help. 
It would allow us to return to the initial requirement that the strict settings match - but you already deem the check that the forward transition function can only be strict if the inverse is also strict to be overly harsh, so would that really be an improvement? It'd also cost us an additional pg_proc entry per aggregate. Another idea would be to have an explicit nulls=ignore|process option for CREATE AGGREGATE. If nulls=process and either of the transition functions is strict, we could either error out, or simply do what normal function calls do and pretend they return NULL for NULL inputs. Not sure how the rule that forward transition functions may not return NULL if there's an inverse transition function would fit in if we do the latter, though. The question is - is it worth the effort to add that flag? Yeah, I thought about a flag too. I think that could work quite nicely. Basically where I'm coming from is trying to make this as flexible as possible, without pre-judging the kinds of aggregates that users may want to add. One use-case I had in mind upthread was suppose you wanted to write a custom version of array_agg that only collected non-NULL values. With such a flag, that would be trivial, but with the current patch you'd have to (counter-intuitively) wrap the inverse transition
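To make the discussion concrete, here is roughly what an aggregate with an inverse transition function looks like. A caveat: the parameter names below (mstype/msfunc/minvfunc) are the moving-aggregate spelling that PostgreSQL 9.4 ultimately adopted; the WIP patch under discussion may spell them differently:

  CREATE AGGREGATE unsafe_sum (float8)
  (
      stype = float8,
      sfunc = float8pl,
      mstype = float8,
      msfunc = float8pl,
      minvfunc = float8mi
  );

float8pl and float8mi are existing built-ins; the name unsafe_sum hints at why inverse functions are opt-in - subtracting floating-point values back out can accumulate rounding error that a from-scratch recomputation would not.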
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 9:03 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: (2014/01/29 15:51), Tom Lane wrote: KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: By the way, the latest pg_stat_statements might affect performance on Windows systems, because it uses the fflush() system call every time a new entry is created in pg_stat_statements, and it calls many fread()s to warm the file cache. This statement doesn't seem to have much to do with the patch as committed. There are no fflush calls, and no notion of warming the file cache either. Oh, all right. We do assume that the OS is smart enough to keep a frequently-read file in cache ... is Windows too stupid for that? I don't know about that. But I think the Windows cache feature is stupid. It seems to always write/read data to/from disk, even when there is plenty of memory... I'd like to see test results on Windows, if we can... I am quite certain the Windows cache is *not* that stupid, and hasn't been since the Windows 3.1 days. If you want to make certain, set FILE_ATTRIBUTE_TEMPORARY when the file is opened; then it will work really hard not to write it to disk - harder than most Linux fs'es do AFAIK. This should of course only be done if we don't mind it going away :) As per port/open.c, pg will set this when O_SHORT_LIVED is specified. But AFAICT, we don't actually use this *anywhere* in the backend? Perhaps we should for this - and also for the pgstat files? (I may have missed a codepath, only had a minute to do some grepping) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Triggers on foreign tables
Hello, I initially tried to keep track of them by allocating read pointers on the tuple store, but it turned out to be so expensive that I had to find another way (24 bytes per stored tuple, which are not reclaimable until the end of the transaction). What do you think about this approach? Is there something I missed which would make it not sustainable? It seems to me a reasonable approach to track them. Just a corner case: it may cause an unexpected problem if someone tried to update a foreign table with 2^31 tuples, because of the int index. It may make sense to put a check that fdw_nextwrite is less than INT_MAX. :-) I have a few other minor comments: +static HeapTuple +ExtractOldTuple(TupleTableSlot *planSlot, + ResultRelInfo *relinfo) +{ + bool isNull; + JunkFilter *junkfilter = relinfo->ri_junkFilter; + HeapTuple oldtuple = palloc0(sizeof(HeapTupleData)); + HeapTupleHeader td; + Datum datum = ExecGetJunkAttribute(planSlot, + junkfilter->jf_junkAttNo, + &isNull); + + /* shouldn't ever get a null result... */ + if (isNull) + elog(ERROR, "wholerow is NULL"); + td = DatumGetHeapTupleHeader(datum); + (*oldtuple).t_len = HeapTupleHeaderGetDatumLength(td); + (*oldtuple).t_data = td; + return oldtuple; +} + Why not the usual coding manner, as in: oldtuple->t_len = HeapTupleHeaderGetDatumLength(td); oldtuple->t_data = td; Also, it doesn't put the tableOid on the tuple: oldtuple->t_tableOid = RelationGetRelid(relinfo->ri_RelationDesc); @@ -730,6 +738,45 @@ rewriteTargetListIU(Query *parsetree, Relation target_relation, + /* +* For foreign tables, build a similar array for returning tlist. +*/ + if (need_full_returning_tlist) + { + returning_tles = (TargetEntry **) palloc0(numattrs * sizeof(TargetEntry *)); + foreach(temp, parsetree->returningList) + { + TargetEntry *old_rtle = (TargetEntry *) lfirst(temp); + + if (IsA(old_rtle->expr, Var)) + { + Var *var = (Var *) old_rtle->expr; + + if (var->varno == parsetree->resultRelation) + { + attrno = var->varattno; + if (attrno < 1 || attrno > numattrs) + elog(ERROR, "bogus resno %d in targetlist", attrno); This check causes an error when the returning list contains a reference to a system column, which has a negative attribute number. Probably it should be "continue;" instead of elog(). BTW, isn't it sufficient to inhibit the optimization by putting a whole-row reference here, rather than individual column references? postgres_fdw extracts a whole-row reference into individual column references. + att_tup = target_relation->rd_att->attrs[attrno - 1]; + /* We can (and must) ignore deleted attributes */ + if (att_tup->attisdropped) + continue; + returning_tles[attrno - 1] = old_rtle; + } + } + } + } + Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ronan Dunklau Sent: Thursday, January 23, 2014 11:18 PM To: Noah Misch Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Triggers on foreign tables Thank you very much for this review. The need here is awfully similar to a need of INSTEAD OF triggers on views. For them, we add a single wholerow resjunk TLE. Is there a good reason to do it differently for triggers on foreign tables? I wasn't aware of that; it makes for some much cleaner code IMO. Thanks. - for after triggers, the whole queuing mechanism is bypassed for foreign tables. This is IMO acceptable, since foreign tables cannot have constraints or constraint triggers, and thus have no need for deferrable execution. 
This design avoids the need for storing and retrieving/identifying remote tuples until the query or transaction end. Whether an AFTER ROW trigger is deferred determines whether it runs at the end of the firing query or at the end of the firing query's transaction. In all cases, every BEFORE ROW trigger of a given query fires before any AFTER ROW trigger of the same query. SQL requires that. This proposal would give foreign table AFTER ROW triggers a novel firing time; let's not do that. I think the options going forward are either (a) design a way to queue foreign table AFTER ROW triggers such that we can get the old and/or new rows at the end of the query or (b) not support AFTER ROW triggers on foreign tables for the time being. I did not know this was mandated by the standard. The attached patch tries to solve this problem by allocating a tuplestore in the global afterTriggers structure. This tuplestore is used for
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 8:58 AM, Peter Geoghegan p...@heroku.com wrote: I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciable increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. Not (at this point) commenting on the details, but a big +1 on the fact that the overhead being low is a *very* important property. If the overhead starts growing it will be uninstalled - and that will make things even harder to diagnose. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Changeset Extraction v7.3
Andreas Karlsson wrote: On 01/28/2014 10:56 PM, Andres Freund wrote: On 2014-01-28 21:48:09 +0000, Thom Brown wrote: On 28 January 2014 21:37, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 28, 2014 at 11:53 AM, Robert Haas robertmh...@gmail.com wrote: The point of Andres's patch set is to introduce a new technology called logical decoding; that is, the ability to get a replication stream that is based on changes to tuples rather than changes to blocks. It could also be called logical replication. In these patches, our existing replication is referred to as physical replication, which sounds kind of funny to me. Anyone have another suggestion? Logical and Binary replication? Unfortunately changeset extraction output can be binary data... I think physical and logical are fine, and they seem to be well-known terminology. Oracle uses those words and I have also seen many places use physical backup and logical backup, for example on Barman's homepage. +1 I think it also fits well with the well-known terms physical [database] design and logical design. Not that it is the same thing, but I believe that every database person, when faced with the distinction physical versus logical, will conclude that the former refers to data placement or block structure, while the latter refers to the semantics of the data being stored. Yours, Laurenz Albe
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 01/29/2014 03:34 PM, Kouhei Kaigai wrote: Kouhei Kaigai kai...@ak.jp.nec.com writes: Let me ask an elemental question. What is the reason why inheritance expansion logic should be located on the planner stage, not on the tail of the rewriter? I think it's mostly historical. You would however have to think of a way to preserve the inheritance relationships in the parsetree representation. In the current code, expand_inherited_tables() adds AppendRelInfo nodes to the planner's data structures as it does the expansion; but I don't think AppendRelInfo is a suitable structure for the rewriter to work with, and in any case there's no place to put it in the Query representation. Actually though, isn't this issue mostly about inheritance of a query *target* table? Moving that expansion to the rewriter is a totally different and perhaps more tractable change. It's certainly horribly ugly as it is; hard to see how doing it at the rewriter could be worse. It's just an idea, so it might not be a deep consideration. Isn't it possible to describe a parse tree as if some UPDATE/DELETE statements were combined with UNION ALL? Of course, even if it is only an internal form. UPDATE parent SET x = 2*x, y = y || '_update' WHERE x % 10 = 5 UNION ALL UPDATE child_1 SET x = 2*x, y = y || '_update' WHERE x % 10 = 5 : Right now, only SELECT statements are allowed to be placed under set-operations. If the rewriter can expand inherited relations into multiple individual SELECTs with UNION ALL, it may be a reasonable internal representation. In a similar way, both UPDATE and DELETE scan the relation once, then modify the target relation. Probably there is no significant difference in the earlier half; it performs as if multiple SELECTs with UNION ALL are running, except that it fetches the ctid system column as a junk attribute and acquires row-level locks. On the other hand, we may need to adjust the planner code to construct a ModifyTable node running on multiple relations. Existing code pulls resultRelations from AppendRelInfo, doesn't it? It needs to take the list of result relations using recursion of set operations, if not flattened. Once we can construct a ModifyTable with multiple relations on behalf of multiple relations' scans, there is no difference from what we're doing now. How about your opinion? My worry here is that a fair bit of work gets done before inheritance expansion. Partitioning already performs pretty poorly for anything but small numbers of partitions. If we're expanding inheritance in the rewriter, won't that further increase the already large amount of duplicate work involved in planning a query that involves inheritance? Or am I misunderstanding you? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Add force option to dropdb
Hello Robert, I'm not particularly in favor of implementing this as client-side functionality, because then it's only available to people who use that particular client. Simon Riggs proposed a server-side option to the DROP DATABASE command some time ago, and I think that might be the way to go. Could you please direct me, if possible, to that thread? I think implementing it on the client side gives the user some visibility and control. Initially, I wanted to force-drop the database; then I changed it to kill connections. I think the change in the client tool is simple and covers the main reason for not being able to drop a database. I think killing client connections is one of the FAQs. OTOH, having an option like DROP DATABASE [IF EXISTS, FORCE] database is more crisp. However, what does force mean? Many options exist, such as killing the connections gently, waiting for connections to terminate, or killing connections immediately. Also, as Alvaro Herrera mentioned, DROP OWNED BY and/or REASSIGN OWNED BY might hinder the force option (an example here would be nice). So, for a quick win, I prefer the kill option on the client side; but for a mature solution, the backend is certainly the way to proceed. Regards
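For reference, a minimal sketch of what the client-side kill option boils down to, using only functions that already exist (9.2+ column names; run while connected to a different database, and the database name is illustrative):

  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE datname = 'mydb' AND pid <> pg_backend_pid();
  DROP DATABASE mydb;

Note the inherent race: a new connection can arrive between the two statements, which is one argument for doing this server-side as part of DROP DATABASE itself.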
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 01/28/2014 07:01 PM, Heikki Linnakangas wrote: On 01/27/2014 07:03 PM, Amit Kapila wrote: I have tried to improve the algorithm in another way so that we can get the benefit of same chunks during find match (something similar to lz). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find if there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should be at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 can give better results. Since you're only putting a value in the history every 4 bytes, you wouldn't need to calculate the hash in a rolling fashion. You could just take the next four bytes, calculate the hash, put it in the history table. Then the next four bytes, calculate the hash, and so on. Might save some cycles when building the history table... On a closer look, you're putting a chunk in the history table only every four bytes, but you're *also* checking the history table for a match only every four bytes. That completely destroys the shift-resistance of the algorithm. For example, if the new tuple is an exact copy of the old tuple, except for one additional byte in the beginning, the algorithm would fail to recognize that. It would be good to add a test case like that in the test suite. You can skip bytes when building the history table, or when finding matches, but not both. Or you could skip N bytes, and then check for matches for the next four bytes, then skip again and so forth, as long as you always check four consecutive bytes (because the hash function is calculated from four bytes). I couldn't resist the challenge, and started hacking this. First, some observations from your latest patch (pgrb_delta_encoding_v5.patch): 1. There are a lot of comments and code that refer to chunks, which seem obsolete. For example, the ck_size field in PGRB_HistEntry is always set to a constant, 4, except maybe at the very end of the history string. The algorithm has nothing to do with Rabin-Karp anymore. 2. The 'hindex' field in PGRB_HistEntry is unused. Also, ck_start_pos is redundant with the index of the entry in the array, because the array is filled in order. That leaves just the 'next' field, and that can be represented as an int16 rather than a pointer. So, we really only need a simple int16 array as the history entries array. 3. You're not gaining much by calculating the hash in a rolling fashion. A single rolling step requires two multiplications and two sums, plus shifting the variables around. Calculating the hash from scratch requires three multiplications and three sums. 4. Given that we're not doing the Rabin-Karp variable-length chunk thing, we could use a cheaper hash function to begin with. Like, the one used in pglz. The multiply-by-prime method probably gives fewer collisions than pglz's shift/xor method, but I don't think that matters as much as computation speed. No-one has mentioned or measured the effect of collisions in this thread; that either means that it's a non-issue or that no-one's just realized how big a problem it is yet. I'm guessing that it's not a problem, and if it is, it's mitigated by only trying to find matches every N bytes; collisions would make finding matches slower, and that's exactly what skipping helps with. After addressing the above, we're pretty much back to the PGLZ approach. I kept the change to only find matches every four bytes; that does make some difference. 
And I like having this new encoding code in a separate file, not mingled with the pglz stuff; it's sufficiently different that that's better. I haven't done all that much testing with this, so take it with a grain of salt. I don't know if this is better or worse than the other patches that have been floated around, but I thought I might as well share it.. - Heikki diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index e0b8a4e..c4ac2bd 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -1014,6 +1014,22 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI </listitem> </varlistentry> + <varlistentry> +<term><literal>wal_compress_update</> (<type>boolean</>)</term> +<listitem> + <para> + Enables or disables the WAL tuple compression for <command>UPDATE</> + on this table. The default value of this option is false, to maintain + backward compatibility for the command. If true, all the update + operations on this table which will place the new tuple on the same page + as its original tuple will compress the WAL for the new tuple and + subsequently reduce the WAL volume. It is recommended to enable + this option for tables where <command>UPDATE</> changes less than + 50 percent of tuple data. + </para> + </listitem> +</varlistentry> + </variablelist> </refsect2> diff --git
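The documentation hunk above describes a per-table reloption; usage would presumably look like the following. This is only meaningful with the WIP patch applied - the option does not exist in stock PostgreSQL:

  CREATE TABLE t (id int PRIMARY KEY, payload text) WITH (wal_compress_update = true);
  ALTER TABLE t SET (wal_compress_update = false);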
Re: [HACKERS] patch: option --if-exists for pg_dump
Hi Pavel, Now the patch looks good to me. However, when I try to restore a dump of your own SQL file, I get the following errors: pg_restore: [archiver (db)] could not execute query: ERROR: relation "public.emp" does not exist Command was: DROP TRIGGER IF EXISTS emp_insert_trigger ON public.emp; pg_restore: [archiver (db)] could not execute query: ERROR: schema "myschema" does not exist Command was: DROP FUNCTION IF EXISTS myschema.int_to_date(integer); Is that expected after your patch? Also, I didn't quite understand these lines of comments: /* * Descriptor string (te->desc) should not be same as object * specifier for DROP STATEMENT. The DROP DEFAULT has not * IF EXISTS clause - has not sense. */ Will you please rephrase? Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: [HACKERS] patch: option --if-exists for pg_dump
2014-01-29 Jeevan Chalke jeevan.cha...@enterprisedb.com Hi Pavel, Now the patch looks good to me. However, when I try to restore a dump of your own SQL file, I get the following errors: pg_restore: [archiver (db)] could not execute query: ERROR: relation "public.emp" does not exist Command was: DROP TRIGGER IF EXISTS emp_insert_trigger ON public.emp; pg_restore: [archiver (db)] could not execute query: ERROR: schema "myschema" does not exist Command was: DROP FUNCTION IF EXISTS myschema.int_to_date(integer); Is that expected after your patch? it should be fixed by commit http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b152c6cd0de1827ba58756e24e18110cf902182a Also, I didn't quite understand these lines of comments: /* * Descriptor string (te->desc) should not be same as object * specifier for DROP STATEMENT. The DROP DEFAULT has not * IF EXISTS clause - has not sense. */ Will you please rephrase? I can try: the content of te->desc is usually a substring of the DROP statement, with one related exception - CONSTRAINT. Independent of the previous sentence - ALTER TABLE ALTER COLUMN DROP DEFAULT doesn't support IF EXISTS, and therefore it should not be injected. Regards Pavel Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
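The failure mode Jeevan hit is worth spelling out: IF EXISTS applies to the object being dropped (the trigger), not to the table named in the ON clause, so the emitted statement still errors when the table itself is absent:

  DROP TRIGGER IF EXISTS emp_insert_trigger ON public.emp;
  -- ERROR:  relation "public.emp" does not exist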
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 28 January 2014 21:28, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 28, 2014 at 5:02 AM, Simon Riggs si...@2ndquadrant.com wrote: I agree that this is being seen the wrong way around. The planner contains things it should not do, and as a result we are now discussing enhancing the code that is in the wrong place, which of course brings objections. I think we would be best served by stopping inheritance in its tracks and killing it off. It keeps getting in the way. What we need is real partitioning. Other uses are pretty obscure and we should re-examine things. I actually think that inheritance is a pretty good foundation for real partitioning. If we were to get rid of it, we'd likely end up needing to add most of the same functionality back when we tried to do some kind of real partitioning later, and that doesn't sound fun. I don't see any reason why some kind of partitioning syntax couldn't be added that leverages the existing inheritance mechanism but stores additional metadata allowing for better optimization. Well... I'm lying, a little bit. If our chosen implementation of real partitioning involved some kind of partition objects that could be created and dropped but never directly accessed via DML commands, then we might not need anything that looks like the current planner support for partitioned tables. But I think that would be a surprising choice for this community. IMV, the problem with the planner and inheritance isn't that there's too much cruft in there already, but that there are still key optimizations that are missing. Still, I'd rather try to fix that than start over. In the absence of that, releasing these updateable security views without support for inheritance is a good move. It gives us most of what we want now, and continuing to have some form of restriction is better than having the much greater restriction of it not working at all. -1. Inheritance may be a crappy substitute for real partitioning, but there are plenty of people using it that way. When I talk about removing inheritance, of course I see some need for partitioning. Given the way generalised inheritance works, it is possible to come up with some monstrous designs that are then hard to rewrite and plan. What I propose is that we remove the user-visible generalised inheritance feature and only allow a more structured version which we call partitioning. If all target/result relations are identical it will be much easier to handle things because there'll be no special-purpose code to juggle. Yes, key optimizations are missing. Overburdening ourselves with complications that slow us down from delivering more useful features is not sensible. Think of it as feature-level refactoring, rather than the normal code-level refactoring we frequently discuss. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] GSoC 2014 - mentors, students and admins
On 01/28/2014 07:34 PM, Thom Brown wrote: And I'd be fine with being admin again this year, unless there's anyone else who would like to take up the mantle? Please do, thanks! Who would be up for mentoring this year? I can mentor. - Heikki
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 29 January 2014 06:43, Tom Lane t...@sss.pgh.pa.us wrote: Actually though, isn't this issue mostly about inheritance of a query *target* table? Exactly. IMHO updateable views on inheritance sets will have multiple other performance problems, so trying to support them here will not make their usage seamless. We allowed updateable views to work with restrictions in earlier releases, so I can't see why continuing with a much reduced restriction would be a problem in this release. We don't need to remove the remaining restriction all in one release, do we? Moving that expansion to the rewriter is a totally different and perhaps more tractable change. It's certainly horribly ugly as it is; hard to see how doing it at the rewriter could be worse. I see the patch adding some changes to inheritance_planner that might well get moved to the rewriter. As long as the ugliness all stays in one place, does it matter where that is -- for this patch -- ? Just asking whether moving it will reduce the net ugliness of our inheritance support. @Craig: I don't think this would have much effect on partitioning performance. The main cost of that is constraint exclusion, which we would still perform only once. All other copying and re-jigging still gets performed even for excluded relations, so doing it earlier wouldn't hurt as much as you might think. @Dean: If we moved that to the rewriter, would expand_security_quals() and preprocess_rowmarks() also move? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 01/23/2014 06:06 PM, Dean Rasheed wrote: On 21 January 2014 09:18, Dean Rasheed dean.a.rash...@gmail.com wrote: Yes, please review the patch from 09-Jan (http://www.postgresql.org/message-id/CAEZATCUiKxOg=voovja2s6g-sixzzxg18totggp8zobq6qn...@mail.gmail.com). After further testing I found a bug --- it involves having a security barrier view on top of a base relation that has a rule that rewrites the query to have a different result relation, and possibly also a different command type, so that the securityQuals are no longer on the result relation, which is a code path not previously tested, and the rowmark handling was wrong. That's probably a pretty obscure case in the context of security barrier views, but that code path would be used much more commonly if RLS were built on top of this. Fortunately the fix is trivial --- updated patch attached. This is the most recent patch I see, and the one I've been working on top of. Are there any known tests that this patch fails? Can we construct any tests that this patch fails? If so, can we make it pass them, or error out cleanly? The discussion has gone a bit off the wall - partly my fault, I think - I mentioned inheritance. Let's try to refocus on the immediate patch at hand, and whether it's good to go. Right now, I'm not personally aware of test cases that cause this code to fail. There's a good-taste complaint about handling of inheritance, but frankly, there's not much about inheritance that _is_ good taste. I don't see that this patch makes it worse, and it's functional. It might be interesting to revisit some of these broader questions in 9.5, but what can we do to get this functionality in place for 9.4? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 01/28/2014 06:11 PM, Christian Kruse wrote: Hi, attached you will find a new version of the patch, ported to HEAD, fixed the mentioned bug and - hopefully - dealing with the remaining issues. Thanks, I have committed this now. The documentation is still lacking. We should explain somewhere how to set vm.nr_hugepages, for example. The Managing Kernel Resources section ought to mention setting it. Could I ask you to work on that, please? - Heikki
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 29 January 2014 11:27, Simon Riggs si...@2ndquadrant.com wrote: On 29 January 2014 06:43, Tom Lane t...@sss.pgh.pa.us wrote: Actually though, isn't this issue mostly about inheritance of a query *target* table? Exactly. IMHO updateable views on inheritance sets will have multiple other performance problems, so trying to support them here will not make their usage seamless. We allowed updateable views to work with restrictions in earlier releases, so I can't see why continuing with a much reduced restriction would be a problem in this release. We don't need to remove the remaining restriction all in one release, do we? Moving that expansion to the rewriter is a totally different and perhaps more tractable change. It's certainly horribly ugly as it is; hard to see how doing it at the rewriter could be worse. I see the patch adding some changes to inheritance_planner that might well get moved to the rewriter. As long as the ugliness all stays in one place, does it matter where that is -- for this patch -- ? Just asking whether moving it will reduce the net ugliness of our inheritance support. @Craig: I don't think this would have much effect on partitioning performance. The main cost of that is constraint exclusion, which we would still perform only once. All other copying and re-jigging still gets performed even for excluded relations, so doing it earlier wouldn't hurt as much as you might think. @Dean: If we moved that to the rewriter, would expand_security_quals() and preprocess_rowmarks() also move? Actually I tend to think that expand_security_quals() should remain where it is, regardless of where inheritance expansion happens. One of the things that simplifies the job that expand_security_quals() has to do is that it takes place after preprocess_targetlist(), so the targetlist for the security barrier subqueries that it constructs is known. Calling expand_security_quals() earlier would require additional surgery on preprocess_targetlist() and expand_targetlist(), and would probably also mean that the Query would need to record the sourceRelation subquery RTE, as discussed upthread. Regards, Dean
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Jan 29, 2014 at 3:41 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/28/2014 07:01 PM, Heikki Linnakangas wrote: On 01/27/2014 07:03 PM, Amit Kapila wrote: I have tried to improve the algorithm in another way so that we can get the benefit of same chunks during find match (something similar to lz). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find if there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should be at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 can give better results. Since you're only putting a value in the history every 4 bytes, you wouldn't need to calculate the hash in a rolling fashion. You could just take the next four bytes, calculate the hash, put it in the history table. Then the next four bytes, calculate the hash, and so on. Might save some cycles when building the history table... First of all, thanks for looking into the patch. Yes, this is right; we can save cycles by not doing rolling during hash calculation, and I was working to improve the patch along those lines. Earlier it was there because of Rabin's delta encoding, where we need to check for a special match after each byte. On a closer look, you're putting a chunk in the history table only every four bytes, but you're *also* checking the history table for a match only every four bytes. That completely destroys the shift-resistance of the algorithm. You are right that it will lose the shift-resistance, and even Robert has pointed this out to me; that's why he wants to maintain the property of special bytes at chunk boundaries as mentioned in Rabin encoding. The only real reason to shift to a fixed size was to improve CPU usage, and I thought most UPDATE cases would update fixed-length columns, but that might not be true. For example, if the new tuple is an exact copy of the old tuple, except for one additional byte in the beginning, the algorithm would fail to recognize that. It would be good to add a test case like that in the test suite. You can skip bytes when building the history table, or when finding matches, but not both. Or you could skip N bytes, and then check for matches for the next four bytes, then skip again and so forth, as long as you always check four consecutive bytes (because the hash function is calculated from four bytes). Can we do something like: Build Phase a. Calculate the hash and add the entry in the history table at every 4 bytes. Match Phase a. Calculate the hash in a rolling fashion and try to find a match at every byte. b. When a match is found then skip only in chunks, something like I was doing in the find match function + + /* consider only complete chunk matches. */ + if (history_chunk_size == 0) + thislen += PGRB_MIN_CHUNK_SIZE; + } Will this address the concern? The main reason to process in chunks as much as possible is to save CPU cycles. For example, if we build the history table byte-by-byte, then even in the best case, where most of the tuple has a match, it will have considerable overhead due to the formation of the hash table. I couldn't resist the challenge, and started hacking this. First, some observations from your latest patch (pgrb_delta_encoding_v5.patch): 1. There are a lot of comments and code that refer to chunks, which seem obsolete. For example, the ck_size field in PGRB_HistEntry is always set to a constant, 4, except maybe at the very end of the history string. The algorithm has nothing to do with Rabin-Karp anymore. 2. 
The 'hindex' field in PGRB_HistEntry is unused. Also, ck_start_pos is redundant with the index of the entry in the array, because the array is filled in order. That leaves just the 'next' field, and that can be represented as an int16 rather than a pointer. So, we really only need a simple int16 array as the history entries array. 3. You're not gaining much by calculating the hash in a rolling fashion. A single rolling step requires two multiplications and two sums, plus shifting the variables around. Calculating the hash from scratch requires three multiplications and three sums. 4. Given that we're not doing the Rabin-Karp variable-length chunk thing, we could use a cheaper hash function to begin with. Like, the one used in pglz. The multiply-by-prime method probably gives fewer collisions than pglz's shift/xor method, but I don't think that matters as much as computation speed. No-one has mentioned or measured the effect of collisions in this thread; that either means that it's a non-issue or that no-one's just realized how big a problem it is yet. I'm guessing that it's not a problem, and if it is, it's mitigated by only trying to find matches every N bytes; collisions would make finding matches slower, and that's exactly what skipping helps with. After addressing the above, we're pretty much back to the PGLZ approach. Here during the match phase, I think we can avoid copying literal
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 29 January 2014 11:34, Craig Ringer cr...@2ndquadrant.com wrote: On 01/23/2014 06:06 PM, Dean Rasheed wrote: On 21 January 2014 09:18, Dean Rasheed dean.a.rash...@gmail.com wrote: Yes, please review the patch from 09-Jan (http://www.postgresql.org/message-id/CAEZATCUiKxOg=voovja2s6g-sixzzxg18totggp8zobq6qn...@mail.gmail.com). After further testing I found a bug --- it involves having a security barrier view on top of a base relation that has a rule that rewrites the query to have a different result relation, and possibly also a different command type, so that the securityQuals are no longer on the result relation, which is a code path not previously tested and the rowmark handling was wrong. That's probably a pretty obscure case in the context of security barrier views, but that code path would be used much more commonly if RLS were built on top of this. Fortunately the fix is trivial --- updated patch attached. This is the most recent patch I see, and the one I've been working on top of. Are there any known tests that this patch fails? None that I've been able to come up with. Can we construct any tests that this patch fails? If so, can we make it pass them, or error out cleanly? Sounds sensible. Feel free to add any test cases you think up to the wiki page. Even if we don't like this design, any alternative must at least pass all the tests listed there. https://wiki.postgresql.org/wiki/Making_security_barrier_views_automatically_updatable Regards, Dean
Re: [HACKERS] LIKE INCLUDING CONSTRAINTS is broken
While transforming the LIKE clause in transformTableLikeClause(), the function does remap the varattnos of the constraint expression: ccbin_node = map_variable_attnos(stringToNode(ccbin), 1, 0, attmap, tupleDesc->natts, &found_whole_row); So, up to this point, the attribute numbers in the constraint expression string are in sync with the table schema definition. But when it comes to treating inheritance in DefineRelation()->MergeAttributes(), the inherited attributes are added before the table-specific attributes (including the attributes included because of the LIKE clause). At this point the attribute numbers in constraint expressions get out of sync with the table's schema and are never corrected later. In AddRelationNewConstraints() we have the following code and comment: else { Assert(cdef->cooked_expr != NULL); /* * Here, we assume the parser will only pass us valid CHECK * expressions, so we do no particular checking. */ expr = stringToNode(cdef->cooked_expr); } So, either in MergeAttributes or in AddRelationNewConstraints, we need to restamp the attribute numbers in the constraints, so that they are in sync with the table schema after adding the inherited columns. The other possibility is to add the inherited columns after the table-specific columns (including those included because of the LIKE clause), but that would break a lot of other things (including backward compatibility) I guess. On Sat, Jan 25, 2014 at 1:36 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: It seems CREATE TABLE ... (LIKE INCLUDING CONSTRAINTS) doesn't work cleanly when there's also regular inheritance; my guess is that attnums get messed up at some point after the constraints are generated. Here's a trivial test case: create table b (b1 int unique check (b1 > 100)); CREATE TABLE c (c1 int not null references b (b1)); create table d (d1 int, d2 point not null); create table a (a1 int not null, a2 text primary key, a3 timestamptz(6), like b including constraints, like c) inherits (d); You can see the broken state: alvherre=# \d [ab] Table "public.a" Column | Type | Modifiers -+-+--- d1 | integer | d2 | point | not null a1 | integer | not null a2 | text | not null a3 | timestamp(6) with time zone | b1 | integer | c1 | integer | not null Indexes: a_pkey PRIMARY KEY, btree (a2) Check constraints: b_b1_check CHECK (a2 > 100) Inherits: d Table "public.b" Column | Type | Modifiers -+-+--- b1 | integer | Indexes: b_b1_key UNIQUE CONSTRAINT, btree (b1) Check constraints: b_b1_check CHECK (b1 > 100) Referenced by: TABLE c CONSTRAINT c_c1_fkey FOREIGN KEY (c1) REFERENCES b(b1) Notice how the CHECK constraint in table b points to column b1, but in table a it mentions column a2, even though that one is not even of the correct datatype. In fact if you try an insert, you get a weird error message: alvherre=# insert into a (d2, a2, a1, c1) values ('(1, 0)', '1', 1, 1); ERROR: attribute 4 has wrong type DETAIL: Table has type text, but query expects integer. If I take out the INHERITS clause in table a, the error disappears. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Mon, Jan 27, 2014 at 7:14 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: FDW join push-down is one of the valuable use-cases of this interface, but not the only one. As you might know, my motivation is to implement a GPU acceleration feature on top of this interface, which offers an alternative way to scan or join relations, or potentially to sort or aggregate. I'm curious how this relates to the pluggable storage idea discussed here https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting and here http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html I haven't seen a specific proposal about how much functionality should be encapsulated by a pluggable storage system. But I wonder if that would be the best place for specialized table-scan code to end up?
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan29, 2014, at 09:59 , Dean Rasheed dean.a.rash...@gmail.com wrote: On 28 January 2014 20:16, Florian Pflug f...@phlo.org wrote: On Jan27, 2014, at 23:28 , Dean Rasheed dean.a.rash...@gmail.com wrote: This case is explicitly forbidden, both in CREATE AGGREGATE and in the executor. To me, that seems overly restrictive --- if transfn is strict, then you know for sure that no NULL values are added to the aggregate state, so surely it doesn't matter whether or not inv_transfn is strict. It will never be asked to remove NULL values. I think there are definite use-cases where a user might want to use a pre-existing function as the inverse transition function, so it seems harsh to force them to wrap it in a strict function in this case. I'm not sure that the likelihood of someone wanting to combine a strict forward with a non-strict inverse function is very high, but neither Me neither, but the checks to forbid it aren't adding anything, and I think it's best to make it as flexible as possible. Ok. Another idea would be to have an explicit nulls=ignore|process option for CREATE AGGREGATE. If nulls=process and either of the transition functions is strict, we could either error out, or simply do what normal function calls do and pretend they return NULL for NULL inputs. Not sure how the rule that forward transition functions may not return NULL if there's an inverse transition function would fit in if we do the latter, though. The question is - is it worth the effort to add that flag? One use-case I had in mind upthread was suppose you wanted to write a custom version of array_agg that only collected non-NULL values. With such a flag, that would be trivial, but with the current patch you'd have to (counter-intuitively) wrap the inverse transition function in a strict function. I'd be more convinced by that if doing so was actually possible for non-superusers. But it isn't, because the aggregate uses internal as its state type, and it's thus entirely up to the user to not crash the database by mixing transfer and final functions with incompatible state data. Plus, instead of creating a custom aggregate, one can just use a FILTER clause to get rid of the NULL values. Another use-case I can imagine is suppose you wanted a custom version of sum() that returned NULL if any of the input values were NULL. If you knew that most of your data was non-NULL, you might still wish to benefit from an inverse transition function. Right now the patch won't allow that because of the error in advance_windowaggregate(), but possibly that could be relaxed by forcing a restart in that case. That's not really true - the patch only forbids that if you insist on representing the state "I have seen a NULL input" with a NULL state value. But if you instead just count the number of NULLs in your transition functions, all you need to do is to have your final function return NULL if that count is not zero. If I've understood correctly, that should give performance similar to current if NULLs are present, and enhanced performance as the window moves over non-NULL data. Exactly - and this makes defining a NULL-sensitive SUM() this way rather silly - a simple counter has very nearly zero overhead, and avoids all rescans. In that second case, it would also be nice if you could simply re-use the existing sum forward and inverse transition functions, with a different null-handling flag. 
Even if we had a nulls=process|ignore flag, SUM's transition functions would still need to take that use-case into account explicitly to make this work - at the very least, the forward transition function would need to return NULL if the input is NULL instead of just skipping it as it does now. But that would lead to completely unnecessary rescans, so what we actually ought to do then is make it track whether there have been NULL inputs and make the finalfunc return NULL in this case. Which would border on ironic, since the whole raison d'être for this is to *avoid* spreading NULL-counting logic around... Plus, to make this actually usable, we'd have to document this, and tell people how to define such a SUM aggregate. But I don't want to go there - we really mustn't encourage people to mix-and-match built-in aggregates with state type internal, since whether they work or crash depends on factors outside the control of the user. And finally, you can get that behaviour quite easily by doing CASE WHEN bool_and(input IS NOT NULL) THEN whatever_agg(input) ELSE NULL END So I think an ignore-nulls flag would have real benefits, as well as being a cleaner design than relying on a strict inverse transition function. The more I think about this, the less convinced I am. In fact, I'm currently leaning towards just forbidding non-strict forward transition functions with strict inverses, and adding non-NULL counters to the aggregates that then require them. It's really only the SUM() aggregates that are
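To make the two SQL-level alternatives mentioned above concrete - the FILTER clause and the bool_and() construction - here is a sketch against a hypothetical table t(ts, x); FILTER requires 9.4:

    -- Ignore NULLs without a custom aggregate, using 9.4's FILTER clause:
    SELECT ts,
           sum(x) FILTER (WHERE x IS NOT NULL)
               OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
    FROM t;

    -- A NULL-sensitive SUM via the bool_and() trick quoted above
    -- (CASE without ELSE yields NULL when any input in the frame is NULL):
    SELECT ts,
           CASE WHEN bool_and(x IS NOT NULL) OVER w
                THEN sum(x) OVER w
           END AS strict_sum
    FROM t
    WINDOW w AS (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW);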
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 29 January 2014 06:43, Tom Lane t...@sss.pgh.pa.us wrote: Kouhei Kaigai kai...@ak.jp.nec.com writes: Let me ask an elemental question. What is the reason why the inheritance expansion logic is located in the planner stage, rather than at the tail of the rewriter? I think it's mostly historical. You would however have to think of a way to preserve the inheritance relationships in the parsetree representation. In the current code, expand_inherited_tables() adds AppendRelInfo nodes to the planner's data structures as it does the expansion; but I don't think AppendRelInfo is a suitable structure for the rewriter to work with, and in any case there's no place to put it in the Query representation. Actually though, isn't this issue mostly about inheritance of a query *target* table? Moving that expansion to the rewriter is a totally different and perhaps more tractable change. It's certainly horribly ugly as it is; hard to see how doing it at the rewriter could be worse. That's interesting. Presumably then we could make rules work properly on inheritance children. I'm not sure if anyone has actually complained that that currently doesn't work. Thinking about that though, it does potentially open up a whole other can of worms --- a single update query could be turned into multiple other queries of different command types. Perhaps that's not so different from what currently happens in the rewriter, except that you'd need a way to track which of those queries counts towards the statement's final row count. And how many ModifyTable nodes would the resulting plan have? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 29 January 2014 09:16, Magnus Hagander mag...@hagander.net wrote: On Wed, Jan 29, 2014 at 8:58 AM, Peter Geoghegan p...@heroku.com wrote: I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person who proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciable increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. Not (at this point) commenting on the details, but a big +1 on the point that the low overhead is a *very* important property. If the overhead starts growing it will be uninstalled - and that will make things even harder to diagnose. +1 also. I think almost everybody agrees. Having said that, I'm not certain that moves us forwards with how to handle the patch at hand. Here is my attempt at an objective view of whether or not to apply the patch: The purpose of the patch is to provide additional help to diagnose performance issues. Everybody seems to want this as an additional feature, given the above caveat. The author has addressed the original performance concerns and provided some evidence that the fix appears effective. It is possible that adding this extra straw breaks the camel's back, but it seems unlikely to be that simple. A new feature to help with performance problems will be of great use to many people; complaints about the performance of pg_stat_statements are unlikely to increase greatly from this change, since such a change would only affect those right on the threshold of impact. The needs of the many... Further changes to improve the scalability of pg_stat_statements seem warranted in the future, but I see no reason to block the patch for that reason. If somebody has specific evidence that the impact outweighs the benefit, please produce it; otherwise I am inclined to proceed to commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
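For reference, the cheap way to support stddev with O(1) state per entry is to keep a running sum of squared deviations next to the existing counters (Welford's online algorithm). A minimal sketch, with field and function names that are illustrative only, not the actual patch:

    /* hypothetical additions to the pg_stat_statements counters struct */
    typedef struct Counters
    {
        int64       calls;
        double      total_time;
        double      min_time;
        double      max_time;
        double      sum_var_time;   /* running sum of squared deviations */
        /* ... existing pg_stat_statements fields ... */
    } Counters;

    static void
    pgss_record_time(Counters *c, double elapsed_ms)
    {
        double      old_mean = (c->calls == 0) ? 0.0 : c->total_time / c->calls;

        c->calls++;
        c->total_time += elapsed_ms;
        if (c->calls == 1 || elapsed_ms < c->min_time)
            c->min_time = elapsed_ms;
        if (elapsed_ms > c->max_time)
            c->max_time = elapsed_ms;

        /* Welford's method: numerically stable, one extra double per entry */
        c->sum_var_time += (elapsed_ms - old_mean) *
            (elapsed_ms - c->total_time / c->calls);
    }

    /* at read time: stddev = sqrt(sum_var_time / calls), sqrt from math.h */

The update is a handful of additions and multiplications inside the already-held per-entry lock, which is why the benchmarks discussed here show the cost disappearing into the noise.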
[HACKERS] Removing xloginsert_slots?
Hi all, The undocumented GUC called xloginsert_slots has been introduced by commit 9a20a9b. It is mentioned by the commit that this parameter should be removed before the release. Wouldn't it be a good time to remove this parameter soon? I imagine that removing it before the beta would make sense so now is perhaps too early... Either way, attached is a patch doing so... Regards, -- Michael *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 69,74 extern uint32 bootstrap_data_checksum_version; --- 69,76 #define PROMOTE_SIGNAL_FILE promote #define FALLBACK_PROMOTE_SIGNAL_FILE fallback_promote + /* Number of slots for concurrent xlog insertions */ + #define XLOG_INSERT_SLOTS 8 /* User-settable parameters */ int CheckPointSegments = 3; *** *** 85,91 int sync_method = DEFAULT_SYNC_METHOD; int wal_level = WAL_LEVEL_MINIMAL; int CommitDelay = 0; /* precommit delay in microseconds */ int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ - int num_xloginsert_slots = 8; #ifdef WAL_DEBUG bool XLOG_DEBUG = false; --- 87,92 *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *** *** 2078,2094 static struct config_int ConfigureNamesInt[] = }, { - {xloginsert_slots, PGC_POSTMASTER, WAL_SETTINGS, - gettext_noop(Sets the number of slots for concurrent xlog insertions.), - NULL, - GUC_NOT_IN_SAMPLE - }, - num_xloginsert_slots, - 8, 1, 1000, - NULL, NULL, NULL - }, - - { /* see max_connections */ {max_wal_senders, PGC_POSTMASTER, REPLICATION_SENDING, gettext_noop(Sets the maximum number of simultaneously running WAL sender processes.), --- 2078,2083 *** a/src/include/access/xlog.h --- b/src/include/access/xlog.h *** *** 192,198 extern bool EnableHotStandby; extern bool fullPageWrites; extern bool wal_log_hints; extern bool log_checkpoints; - extern int num_xloginsert_slots; /* WAL levels */ typedef enum WalLevel --- 192,197 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing xloginsert_slots?
Hi, On 2014-01-29 21:59:05 +0900, Michael Paquier wrote: The undocumented GUC called xloginsert_slots has been introduced by commit 9a20a9b. It is mentioned by the commit that this parameter should be removed before the release. Wouldn't it be a good time to remove this parameter soon? I imagine that removing it before the beta would make sense so now is perhaps too early... Either way, attached is a patch doing so... I'd rather wait till somebody actually has done some benchmarks. I don't think we're more clueful about it now than back when the patch went in. And such benchmarking is more likely during beta, so... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
2014-01-29 Christian Convey christian.con...@gmail.com: On Mon, Jan 27, 2014 at 7:14 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: FDW's join pushing down is one of the valuable use-cases of this interface, but not all. As you might know, my motivation is to implement GPU acceleration feature on top of this interface, that offers alternative way to scan or join relations or potentially sort or aggregate. I'm curious how this relates to the pluggable storage idea discussed here https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting and here http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html I haven't seen a specific proposal about how much functionality should be encapsulated by a pluggable storage system. But I wonder if that would be the best place for specialized table-scan code to end up? If you are interested in designing your own storage layer (and thus need your own scan/write implementation), FDW is the option currently available. It defines a set of interfaces that allow extensions to generate the rows to be there on the fly. It does not force the extension to act as a client of a remote database, even though it was originally designed for dblink purposes. In other words, FDW is a feature that translates a particular data source into something visible according to the table definition. As long as the driver can mediate between the table definition and the data format of your own storage layer, it will work. On the other hand, the custom-scan interface basically allows extensions to implement an alternative way to access the data. If we have a wiser way to scan or join relations than the built-in logic (yes, it will be wiser to scan the result set of a remote join than to perform a local join on a couple of remote scan results), this interface lets an extension tell the backend "I have such a wise strategy", and the planner will then choose one of the paths - either built-in or additional ones - according to cost. Let's get back to your question. This interface is, right now, not designed to implement a pluggable storage layer. FDW is an option now, and maybe a development item in the v9.5 cycle if we want regular tables to be pluggable. Because I'm motivated to make my GPU acceleration feature perform on regular relations, I put my priority on the interface that allows an extension to suggest how to scan them. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao masao.fu...@gmail.com wrote: I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as separate patch, though. Please find attached a simple patch renaming PgStat_GlobalStats to PgStat_BgWriterStats. Its related variables and functions are renamed as well and use the term bgwriter instead of global. Comments are updated as well. This patch also removes stats_timestamp from PgStat_GlobalStats and uses instead a static variable in pgstat.c, making all the structures dedicated to each component clearer. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Wed, Jan 29, 2014 at 10:38 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao masao.fu...@gmail.com wrote: I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as separate patch, though. Please find attached a simple patch renaming PgStat_GlobalStats to PgStat_BgWriterStats. And of course I forgot the patch... Now attached. -- Michael *** a/src/backend/postmaster/pgstat.c --- b/src/backend/postmaster/pgstat.c *** *** 221,228 static int localNumBackends = 0; * Contains statistics that are not collected per database * or per table. */ static PgStat_ArchiverStats archiverStats; ! static PgStat_GlobalStats globalStats; /* Write request info for each database */ typedef struct DBWriteRequest --- 221,229 * Contains statistics that are not collected per database * or per table. */ + static TimestampTz global_stat_timestamp; static PgStat_ArchiverStats archiverStats; ! static PgStat_BgWriterStats bgwriterStats; /* Write request info for each database */ typedef struct DBWriteRequest *** *** 2344,2361 pgstat_fetch_stat_archiver(void) /* * - ! * pgstat_fetch_global() - * * Support function for the SQL-callable pgstat* functions. Returns ! * a pointer to the global statistics struct. * - */ ! PgStat_GlobalStats * ! pgstat_fetch_global(void) { backend_read_statsfile(); ! return globalStats; } --- 2345,2362 /* * - ! * pgstat_fetch_stat_bgwriter() - * * Support function for the SQL-callable pgstat* functions. Returns ! * a pointer to the bgwriter statistics struct. * - */ ! PgStat_BgWriterStats * ! pgstat_fetch_stat_bgwriter(void) { backend_read_statsfile(); ! return bgwriterStats; } *** *** 3594,3600 pgstat_write_statsfiles(bool permanent, bool allDbs) /* * Set the timestamp of the stats file. */ ! globalStats.stats_timestamp = GetCurrentTimestamp(); /* * Write the file header --- currently just a format ID. --- 3595,3601 /* * Set the timestamp of the stats file. */ ! global_stat_timestamp = GetCurrentTimestamp(); /* * Write the file header --- currently just a format ID. *** *** 3604,3612 pgstat_write_statsfiles(bool permanent, bool allDbs) (void) rc; /* we'll check for error with ferror */ /* ! * Write global stats struct */ ! rc = fwrite(globalStats, sizeof(globalStats), 1, fpout); (void) rc; /* we'll check for error with ferror */ /* --- 3605,3613 (void) rc; /* we'll check for error with ferror */ /* ! * Write bgwriter stats struct */ ! rc = fwrite(bgwriterStats, sizeof(bgwriterStats), 1, fpout); (void) rc; /* we'll check for error with ferror */ /* *** *** 3630,3636 pgstat_write_statsfiles(bool permanent, bool allDbs) */ if (allDbs || pgstat_db_requested(dbentry-databaseid)) { ! dbentry-stats_timestamp = globalStats.stats_timestamp; pgstat_write_db_statsfile(dbentry, permanent); } --- 3631,3637 */ if (allDbs || pgstat_db_requested(dbentry-databaseid)) { ! dbentry-stats_timestamp = global_stat_timestamp; pgstat_write_db_statsfile(dbentry, permanent); } *** *** 3881,3898 pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep) HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT); /* ! * Clear out global and archiver statistics so they start from zero * in case we can't load an existing statsfile. */ ! memset(globalStats, 0, sizeof(globalStats)); memset(archiverStats, 0, sizeof(archiverStats)); /* * Set the current timestamp (will be kept only in case we can't load an * existing statsfile). */ ! 
globalStats.stat_reset_timestamp = GetCurrentTimestamp(); ! archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp; /* * Try to open the stats file. If it doesn't exist, the backends simply --- 3882,3899 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT); /* ! * Clear out bgwriter and archiver statistics so they start from zero * in case we can't load an existing statsfile. */ ! memset(bgwriterStats, 0, sizeof(bgwriterStats)); memset(archiverStats, 0, sizeof(archiverStats)); /* * Set the current timestamp (will be kept only in case we can't load an * existing statsfile). */ ! bgwriterStats.stat_reset_timestamp = GetCurrentTimestamp(); ! archiverStats.stat_reset_timestamp = bgwriterStats.stat_reset_timestamp; /* * Try to open the stats file. If it doesn't exist, the backends simply *** *** 3925,3933 pgstat_read_statsfiles(Oid onlydb, bool permanent, bool
Re: [HACKERS] Infinite recursion in row-security based on updatable s.b. views
On 01/28/2014 02:11 PM, Craig Ringer wrote: My first thought is to add a boolean flag to RangeTblEntry (similar to the inh flag) to say whether RLS expansion is requested for that RTE. Then set it to false on each RTE as you add new RLS quals to its securityQuals. That's what I was getting at with adding flags to RangeTblEntry, yes. Given the number of flags we're growing I wonder if they should be consolidated into a bitmask, but I'll leave that problem for later. I'll do this part first, since it seems you agree that a RangeTblEntry flag is the appropriate path. That'll make row-security checking work and make the patch testable. It won't deal with recursive rules; they'll still crash the backend. I'll deal with that as a further step. I've put together a working RLS patch on top of updatable security barrier views. It has some known issues remaining; it doesn't do recursion checking yet, and it fails some of the regression tests in exciting ways. I'm looking into them step by step. Some of the differences are test behaviours that have changed due to the inheritance rule changes; many appear to be oversights or bugs yet to be chased down. You can find it here: https://github.com/ringerc/postgres/compare/rls-9.4-upd-sb-views i.e. https://github.com/ringerc/postgres.git , branch rls-9.4-upd-sb-views (subject to rebasing) or the non-rebased tag rls-9.4-upd-sb-views-v2 The guts of the patch appear as a diff, attached, but it's not standalone so I suggest using git. I'll be looking into recursion issues and the test failures tomorrow. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 03c02389c7b475d06864f9a7f38fd583c6e891e9 Mon Sep 17 00:00:00 2001 From: Craig Ringer cr...@2ndquadrant.com Date: Tue, 28 Jan 2014 14:23:23 +0800 Subject: [PATCH 1/4] RLS: Add rowsec_done flag to RangeTblEntry To be used to track completion status of row security expansion on a range table entry. Rels with this flag set must not be row-security expanded.
--- src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/outfuncs.c | 1 + src/backend/nodes/readfuncs.c | 1 + src/include/nodes/parsenodes.h | 1 + 4 files changed, 4 insertions(+) diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 3e2acf0..9607911 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -1975,6 +1975,7 @@ _copyRangeTblEntry(const RangeTblEntry *from) COPY_SCALAR_FIELD(rtekind); COPY_SCALAR_FIELD(relid); COPY_SCALAR_FIELD(relkind); + COPY_SCALAR_FIELD(rowsec_done); COPY_NODE_FIELD(subquery); COPY_SCALAR_FIELD(security_barrier); COPY_SCALAR_FIELD(rowsec_relid); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index dd447ed..974d169 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -2372,6 +2372,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node) case RTE_RELATION: WRITE_OID_FIELD(relid); WRITE_CHAR_FIELD(relkind); + WRITE_BOOL_FIELD(rowsec_done); break; case RTE_SUBQUERY: WRITE_NODE_FIELD(subquery); diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index a0cb369..7a43b59 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -1214,6 +1214,7 @@ _readRangeTblEntry(void) case RTE_RELATION: READ_OID_FIELD(relid); READ_CHAR_FIELD(relkind); + READ_BOOL_FIELD(rowsec_done); break; case RTE_SUBQUERY: READ_NODE_FIELD(subquery); diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index b3e718a..b3ae722 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -736,6 +736,7 @@ typedef struct RangeTblEntry */ Oid relid; /* OID of the relation */ char relkind; /* relation kind (see pg_class.relkind) */ + bool rowsec_done; /* True if row-security quals checked for and applied already */ /* * Fields valid for a subquery RTE (else NULL): -- 1.8.3.1 From b0f5797d81a4b856f26818e14c93a0ec453a35de Mon Sep 17 00:00:00 2001 From: Craig Ringer cr...@2ndquadrant.com Date: Fri, 24 Jan 2014 16:18:40 +0800 Subject: [PATCH 2/4] RLS: Enforce row-security constraints Row-security constraints are enforced here by having the securityQual expansion code check for a row-security constraint before expanding quals. --- src/backend/optimizer/prep/prepsecurity.c | 17 +++- src/backend/optimizer/prep/prepunion.c| 5 ++ src/backend/optimizer/util/Makefile | 3 +- src/backend/optimizer/util/rowsecurity.c | 144 ++ src/include/optimizer/rowsecurity.h | 25 ++ 5 files changed, 192 insertions(+), 2 deletions(-) create mode 100644 src/backend/optimizer/util/rowsecurity.c create mode 100644 src/include/optimizer/rowsecurity.h diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 01/29/2014 01:12 PM, Heikki Linnakangas wrote: On 01/28/2014 06:11 PM, Christian Kruse wrote: Hi, attached you will find a new version of the patch, ported to HEAD, fixed the mentioned bug and - hopefully - dealing with the remaining issues. Thanks, I have committed this now. The documentation is still lacking. The documentation is indeed lacking, since it breaks the build. doc/src/sgml/config.sgml contains the line "normal allocation if that fails. With literalon/literal, failure" which doesn't correctly terminate the closing /literal tag. Trivial patch attached. -- Vik *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 1137,1143 include 'filename' para With varnamehuge_tlb_pages/varname set to literaltry/literal, the server will try to use huge pages, but fall back to using ! normal allocation if that fails. With literalon/literal, failure to use huge pages will prevent the server from starting up. With literaloff/literal, huge pages will not be used. /para --- 1137,1143 para With varnamehuge_tlb_pages/varname set to literaltry/literal, the server will try to use huge pages, but fall back to using ! normal allocation if that fails. With literalon/literal, failure to use huge pages will prevent the server from starting up. With literaloff/literal, huge pages will not be used. /para -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 02:58 AM, Peter Geoghegan wrote: I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciably increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. Importance is in the eye of the beholder. As far as I'm concerned, min and max are of FAR less value than stddev. If stddev gets left out I'm going to be pretty darned annoyed, especially since the benchmarks seem to show the marginal cost as being virtually unmeasurable. If those aren't enough for you, perhaps you'd like to state what sort of tests would satisfy you. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 01/29/2014 04:01 PM, Vik Fearing wrote: On 01/29/2014 01:12 PM, Heikki Linnakangas wrote: The documentation is still lacking. The documentation is indeed lacking since it breaks the build. doc/src/sgml/config.sgml contains the line normal allocation if that fails. With literalon/literal, failure which doesn't correctly terminate the closing /literal tag. Trivial patch attached. Thanks, applied! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add force option to dropdb
On Tue, Jan 28, 2014 at 11:01 PM, salah jubeh s_ju...@yahoo.com wrote: Hello Sawada, - This patch is not patched to master branch Sorry, my mistake //new connections are not allowed Corrected. I hope the patch is now in a better state; if something is left, I will be glad to fix it. Regards On Tuesday, January 28, 2014 4:17 AM, Sawada Masahiko sawada.m...@gmail.com wrote: On 2014年1月17日 0:56, salah jubeh s_ju...@yahoo.com wrote: If the user owns objects, that will prevent this from working also. I have the feeling that adding DROP OWNED BY and/or REASSIGNED OWNED BY calls to this utility would be a bit excessive, but who knows. Please find attached the first attempt to drop the client connections. I have added an option -k, --kill instead of force, since killing client connections does not guarantee -drop force-. Regards On Tuesday, January 14, 2014 8:06 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: salah jubeh wrote: For the sake of completeness: 1. I think I also need to temporarily disallow connecting to the database, is that right? 2. Are there other factors that can hinder dropping a database? If the user owns objects, that will prevent this from working also. I have the feeling that adding DROP OWNED BY and/or REASSIGNED OWNED BY calls to this utility would be a bit excessive, but who knows. 3. Should I write two patches one for pg_version=9.2 and one for pg_version9.2 No point -- nothing gets applied to branches older than current development anyway. Thank you for the patch. And sorry for the delay in reviewing. I started to look at this patch, so the following is a first review comment. - This patch is not patched to master branch I tried to apply this patch to the master branch, but I got the following error. $ cd postgresql $ patch -d. -p1 ../dropdb.patch can't find file to patch at input line 3 Perhaps you used the wrong -p or --strip option? the text leading up to this was: -- |--- dropdb_org.c 2014-01-16 |+++ dropdb.c 2014-01-16 -- There is no dropdb_org.c. I think you made a mistake when the patch was created. - This patch does not follow the coding conventions For example, line 71 of the patch: //new connections are not allowed It should be: /* new connections are not allowed */ (Comment blocks that need specific line breaks should be formatted as block comments, where the comment starts as /*--.) Please refer to the coding conventions: http://wiki.postgresql.org/wiki/Developer_FAQ#What.27s_the_formatting_style_used_in_PostgreSQL_source_code.3F Thank you for updating the patch! It did not work correctly in the following case. --- $ createdb -U postgres hoge $ psql -d hoge -U postgres hoge=# create table test (col text); hoge=# insert into test select repeat(chr(code),1) from generate_series(1,10) code; Execute dropdb -k while the client is inserting many tuples into the database $ dropdb -k hoge 2014-01-29 23:10:49 JST FATAL: terminating connection due to administrator command 2014-01-29 23:10:49 JST STATEMENT: insert into test select repeat(chr(code),1) from generate_series(1,200) code; 2014-01-29 23:10:54 JST ERROR: database hoge is being accessed by other users 2014-01-29 23:10:54 JST DETAIL: There is 1 other session using the database.
2014-01-29 23:10:54 JST STATEMENT: DROP DATABASE hoge;
2014-01-29 23:10:54 JST ERROR: syntax error at or near hoge at character 41
2014-01-29 23:10:54 JST STATEMENT: UPDATE pg_database SET datconnlimit = e hoge is being accessed by other users WHERE datname= 'hoge';
dropdb: database removal failed: ERROR: syntax error at or near hoge
LINE 1: UPDATE pg_database SET datconnlimit = e hoge is being acce...
                                              ^
hoge=# \l
                             List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges
-----------+----------+----------+---------+-------+-----------------------
 hoge      | postgres | UTF8     | C       | C     |
 postgres  | postgres | UTF8     | C       | C     |
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
The hoge database has not been dropped. Is this a bug or not? Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
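Incidentally, the stray error text inside the UPDATE suggests the query is being assembled by splicing in an unescaped string. The usual libpq-side fix is PQescapeLiteral(); a sketch only, with the buffer handling and the exact query text being illustrative rather than the patch's code:

    PQExpBufferData sql;
    char       *qdbname = PQescapeLiteral(conn, dbname, strlen(dbname));

    /* quote the database name instead of interpolating raw text */
    initPQExpBuffer(&sql);
    appendPQExpBuffer(&sql,
                      "UPDATE pg_database SET datconnlimit = 0 WHERE datname = %s;",
                      qdbname);
    PQfreemem(qdbname);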
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On Tue, Jan 28, 2014 at 5:58 AM, Christian Kruse christ...@2ndquadrant.com wrote: Hi, On 28/01/14 13:51, Heikki Linnakangas wrote: Oh darn, I remembered we had already committed this, but clearly not. I'd love to still get this into 9.4. The latest patch (hugepages-v5.patch) was pretty much ready for commit, except for documentation. I'm working on it. I ported it to HEAD and am currently doing some benchmarks. Next will be documentation. you mentioned benchmarks -- do you happen to have the results handy? (curious) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 01/29/2014 02:21 PM, Amit Kapila wrote: On Wed, Jan 29, 2014 at 3:41 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: For example, if the new tuple is an exact copy of the old tuple, except for one additional byte in the beginning, the algorithm would fail to recognize that. It would be good to add a test case like that in the test suite. You can skip bytes when building the history table, or when finding matches, but not both. Or you could skip N bytes, and then check for matches for the next four bytes, then skip again and so forth, as long as you always check four consecutive bytes (because the hash function is calculated from four bytes). Can we do something like: Build Phase a. Calculate the hash and add the entry in the history table every 4 bytes. Match Phase a. Calculate the hash in a rolling fashion and try to find a match at every byte. Sure, that'll work. However, I believe it's cheaper to add entries to the history table at every byte, and check for a match every 4 bytes. I think you'll get more or less the same level of compression either way, but adding to the history table is cheaper than checking for matches, and we'd rather do the cheap thing more often than the expensive thing. b. When a match is found, skip only in chunks, as I was doing in the find-match function + + /* consider only complete chunk matches. */ + if (history_chunk_size == 0) + thislen += PGRB_MIN_CHUNK_SIZE; + } Will this address the concern? Hmm, so when checking if a match is truly a match, you compare the strings four bytes at a time rather than byte by byte? That might work, but I don't think that's a hot spot currently. In the profiling I did, with a "nothing matches" test case, all the cycles were spent in the history building, and finding matches. Finding out how long a match is was negligible. Of course, it might be a different story with input where the encoding helps and you have matches, but I think we were doing pretty well in those cases already. The main reason to process in chunks as much as possible is to save CPU cycles. For example, if we build the hash table byte by byte, then even in the best case, where most of the tuple matches, there will be noticeable overhead from building the hash table. Hmm. One very simple optimization we could do is to just compare the two strings byte by byte, before doing anything else, to find any common prefix they might have. Then output a tag for the common prefix, and run the normal algorithm on the rest of the strings. In many real-world tables, the 1-2 first columns are a key that never changes, so that might work pretty well in practice. Maybe it would also be worthwhile to do the same for any common suffix the tuples might have. That would fail to find matches where you e.g. update the last column to have the same value as the first column, and change nothing else, but that's ok. We're not aiming for the best possible compression, just trying to avoid WAL-logging data that wasn't changed. During the match phase, I think we can avoid copying literal bytes until a match is found; that can save cycles when the old and new tuples are mostly different. I think the extra if's in the loop will actually cost you more cycles than you save. You could perhaps have two copies of the main match-finding loop though. First, loop without outputting anything, until you find the first match. Then, output anything up to that point as literals. Then fall into the second loop, which outputs any non-matches byte by byte.
- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
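A minimal sketch of the common prefix/suffix idea Heikki describes (function and variable names are illustrative, not from the patch; Min() as defined in c.h):

    static void
    find_common_affixes(const char *oldp, int oldlen,
                        const char *newp, int newlen,
                        int *prefixlen, int *suffixlen)
    {
        int     minlen = Min(oldlen, newlen);
        int     pre = 0;
        int     suf = 0;

        /* shared leading bytes */
        while (pre < minlen && oldp[pre] == newp[pre])
            pre++;
        /* shared trailing bytes; don't let prefix and suffix overlap */
        while (suf < minlen - pre &&
               oldp[oldlen - suf - 1] == newp[newlen - suf - 1])
            suf++;

        *prefixlen = pre;
        *suffixlen = suf;
    }

The encoder would then emit one tag each for the prefix and suffix and run the history-based algorithm only on the middle portion, which is where the unchanged leading key columns of typical UPDATEs get skipped for free.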
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 29 January 2014 12:20, Dean Rasheed dean.a.rash...@gmail.com wrote: @Dean: If we moved that to rewriter, would expand_security_quals() and preprocess_rowmarks() also move? Actually I tend to think that expand_security_quals() should remain where it is, regardless of where inheritance expansion happens. One of the things that simplifies the job that expand_security_quals() has to do is that it takes place after preprocess_targetlist(), so the targetlist for the security barrier subqueries that it constructs is known. Calling expand_security_quals() earlier would require additional surgery on preprocess_targetlist() and expand_targetlist(), and would probably also mean that the Query would need to record the sourceRelation subquery RTE, as discussed upthread. Looks to me that preprocess_targetlist() could be done earlier also, if necessary, since it appears to contain checks and additional columns, not anything related to optimization. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] jsonb generation functions
In the jsonb patch I have been working on, I have replicated all of what I call the json processing functions, and I will shortly add analogs for the new functions in that category json_to_record and json_to_recordset. However I have not replicated what I call the json generation functions, array_to_json, row_to_json, to_json, and the new functions json_build_array, json_build_object, and json_object, nor the aggregate functions json_agg and the new json_object_agg. The reason for that is that I have always used those for constructing json given to the client, rather than json stored in the database, and for such a use there would be no point in turning it into jsonb rather than generating the json string directly. However, I could be persuaded that we should have a jsonb analog of every json function. If we decide that, the next question is whether we have to have it now, or if it can wait. (The other notable thing that's missing, and I think can't wait, is casts from json to jsonb and vice versa. I'm going to work on that immediately.) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb generation functions
On Wed, Jan 29, 2014 at 9:08 AM, Andrew Dunstan and...@dunslane.net wrote: In the jsonb patch I have been working on, I have replicated all of what I call the json processing functions, and I will shortly add analogs for the new functions in that category json_to_record and json_to_recordset. However I have not replicated what I call the json generation functions, array_to_json, row_to_json, to_json, and the new functions json_build_array, json_build_object, and json_object, nor the aggregate functions json_agg and the new json_object_agg. The reason for that is that I have always used those for constructing json given to the client, rather than json stored in the database, and for such a use there would be no point in turning it into jsonb rather than generating the json string directly. However, I could be persuaded that we should have a jsonb analog of every json function. If we decide that, the next question is whether we have to have it now, or if it can wait. my 0.02$: it can wait -- possibly forever. Assuming the casts work I see absolutely no issue whatsoever asking users to do: select xx_to_json(something complex)::jsonb; If you examine all the use cases, json and jsonb, while they certainly have some overlap, are going to be used in different patterns. In hindsight the type bifurcation was a good thing, ISTM. I don't think there should be any expectation of a 100% match of the API, especially since you can slide things around with casts. In fact, for heavy serialization, at the end of the day it might be better to defer the jsonb creation to the final stage of serialization anyway (avoiding iterative insertion) even if we did match the API. (can't hurt to manage scope either considering the timelines for 9.4) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb generation functions
Andrew Dunstan and...@dunslane.net writes: In the jsonb patch I have been working on, I have replicated all of what I call the json processing functions, and I will shortly add analogs for the new functions in that category json_to_record and json_to_recordset. However I have not replicated what I call the json generation functions, array_to_json, row_to_json, to_json, and the new functions json_build_array, json_build_object, and json_object, nor the aggregate functions json_agg and the new json_object_agg. The reason for that is that I have always used those for constructing json given to the client, rather than json stored in the database, and for such a use there would be no point in turning it into jsonb rather than generating the json string directly. However, I could be persuaded that we should have a jsonb analog of every json function. If we decide that, the next question is whether we have to have it now, or if it can wait. (The other notable thing that's missing, and I think can't wait, is casts from json to jsonb and vice versa. I'm going to work on that immediately.) It disturbs me that two weeks into CF4, we appear to still be in full-speed-ahead development mode for jsonb. Every other feature that's hoping to get into 9.4 is supposed to have a completed patch under review by the CF process. If jsonb is an exception, why? It seems to have already gotten a pass on the matter of documentation quality. I'm reluctant to write a blank check for more code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb generation functions
On 01/29/2014 10:21 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: In the jsonb patch I have been working on, I have replicated all of what I call the json processing functions, and I will shortly add analogs for the new functions in that category json_to_record and json_to_recordset. However I have not replicated what I call the json generation functions, array_to_json, row_to_json, to_json, and the new functions json_build_array, json_build_object, and json_object, nor the aggregate functions json_agg and the new json_object_agg. The reason for that is that I have always used those for constructing json given to the client, rather than json stored in the database, and for such a use there would be no point in turning it into jsonb rather than generating the json string directly. However, I could be persuaded that we should have a jsonb analog of every json function. If we decide that, the next question is whether we have to have it now, or if it can wait. (The other notable thing that's missing, and I think can't wait, is casts from json to jsonb and vice versa. I'm going to work on that immediately.) It disturbs me that two weeks into CF4, we appear to still be in full-speed-ahead development mode for jsonb. Every other feature that's hoping to get into 9.4 is supposed to have a completed patch under review by the CF process. If jsonb is an exception, why? It seems to have already gotten a pass on the matter of documentation quality. I'm reluctant to write a blank check for more code. In that case I will add the casts, which are trivial but essential, and be done. Apart from that there is no essential development work. FYI, the principal causes of delay have been a) some ill health on my part and b) Russian vacation season. I've been very conscious that we've been stretching the deadline a bit. I am going to change the documentation stuff you griped about, in the way I previously suggested. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
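Once those casts are in, usage would presumably look something like this (a sketch of the expected behaviour, not a committed interface):

    -- generate with the existing json functions, store as jsonb via a cast
    SELECT row_to_json(t)::jsonb FROM (SELECT 1 AS a, 'x' AS b) t;

    -- and back again
    SELECT '{"a": 1, "b": "x"}'::jsonb::json;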
Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode
2014/1/29 Ian Lawrence Barwick barw...@gmail.com: 2014-01-29 Andrew Dunstan and...@dunslane.net: On 01/28/2014 05:55 AM, Ian Lawrence Barwick wrote: Hi Payal Many thanks for the review, and my apologies for not getting back to you earlier. Updated version of the patch attached with suggested corrections. On a very quick glance, I see that you have still not made adjustments to contrib/file_fdw to accommodate this new option. I don't see why this COPY option should be different in that respect. Hmm, that idea seems to have escaped me completely. I'll get onto it forthwith. Striking while the keyboard is hot... version with contrib/file_fdw modifications attached. Regards Ian Barwick diff --git a/contrib/file_fdw/data/text.csv b/contrib/file_fdw/data/text.csv new file mode 100644 index ed348a9..c7e243c *** a/contrib/file_fdw/data/text.csv --- b/contrib/file_fdw/data/text.csv *** *** 1,4 ! AAA,aaa ! XYZ,xyz ! NULL,NULL ! ABC,abc --- 1,4 ! AAA,aaa,123, ! XYZ,xyz,,321 ! NULL,NULL,NULL,NULL ! ABC,abc,, diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c new file mode 100644 index 5639f4d..5877512 *** a/contrib/file_fdw/file_fdw.c --- b/contrib/file_fdw/file_fdw.c *** struct FileFdwOption *** 48,56 /* * Valid options for file_fdw. ! * These options are based on the options for COPY FROM command. ! * But note that force_not_null is handled as a boolean option attached to ! * each column, not as a table option. * * Note: If you are adding new option for user mapping, you need to modify * fileGetOptions(), which currently doesn't bother to look at user mappings. --- 48,56 /* * Valid options for file_fdw. ! * These options are based on the options for the COPY FROM command. ! * But note that force_not_null and force_null are handled as boolean options ! * attached to a column, not as a table option. * * Note: If you are adding new option for user mapping, you need to modify * fileGetOptions(), which currently doesn't bother to look at user mappings. *** static const struct FileFdwOption valid_ *** 69,75 {null, ForeignTableRelationId}, {encoding, ForeignTableRelationId}, {force_not_null, AttributeRelationId}, ! /* * force_quote is not supported by file_fdw because it's for COPY TO. */ --- 69,75 {null, ForeignTableRelationId}, {encoding, ForeignTableRelationId}, {force_not_null, AttributeRelationId}, ! {force_null, AttributeRelationId}, /* * force_quote is not supported by file_fdw because it's for COPY TO. */ *** file_fdw_validator(PG_FUNCTION_ARGS) *** 187,192 --- 187,193 Oid catalog = PG_GETARG_OID(1); char *filename = NULL; DefElem*force_not_null = NULL; + DefElem*force_null = NULL; List *other_options = NIL; ListCell *cell; *** file_fdw_validator(PG_FUNCTION_ARGS) *** 243,252 } /* ! * Separate out filename and force_not_null, since ProcessCopyOptions ! * won't accept them. (force_not_null only comes in a boolean ! * per-column flavor here.) */ if (strcmp(def-defname, filename) == 0) { if (filename) --- 244,253 } /* ! * Separate out filename and column-specific options, since ! * ProcessCopyOptions won't accept them. */ + if (strcmp(def-defname, filename) == 0) { if (filename) *** file_fdw_validator(PG_FUNCTION_ARGS) *** 255,270 errmsg(conflicting or redundant options))); filename = defGetString(def); } else if (strcmp(def-defname, force_not_null) == 0) { if (force_not_null) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! 
errmsg(conflicting or redundant options))); force_not_null = def; /* Don't care what the value is, as long as it's a legal boolean */ (void) defGetBoolean(def); } else other_options = lappend(other_options, def); } --- 256,297 errmsg(conflicting or redundant options))); filename = defGetString(def); } + /* + * force_not_null is a boolean option; after validation we can discard + * it - it will be retrieved later in get_file_fdw_attribute_options() + */ else if (strcmp(def-defname, force_not_null) == 0) { if (force_not_null) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg(conflicting or redundant options), ! errhint(option \force_not_null\ supplied more than once for a column))); ! if(force_null) ! ereport(ERROR, ! (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg(conflicting or redundant options), ! errhint(option \force_not_null\ cannot be used together with \force_null\))); force_not_null = def; /* Don't care what the value is, as long as it's a legal boolean */ (void) defGetBoolean(def); } + /* See comments for force_not_null above */
Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup
New patch attached with the following changes: On Thu, Jan 23, 2014 at 11:01 AM, Steeve Lennmark stee...@handeldsbanken.se wrote: On Thu, Jan 23, 2014 at 2:06 AM, Peter Eisentraut pete...@gmx.net wrote: I'm not totally happy with the choice of : as the mapping separator, because that would always require escaping on Windows. We could make it analogous to the path handling and make ; the separator on Windows. Then again, this is not a path, so maybe it should look like one. We pick something else altogether, like =. The option name --tablespace isn't very clear. It ought to be something like --tablespace-mapping. This design choice I made about -T (--tablespace) and colon as separator was copied from pg_barman, which is the way I back up my clusters at the moment. Renaming the option to --tablespace-mapping and changing the syntax to something like = is totally fine by me. I changed the directory separator from : to = but made it configurable in the code. I don't think we should require the new directory to be an absolute path. It should be relative to the current directory, just like the -D option does it. Accepting a relative path should be fine; I made a failed attempt using realpath(3) initially, but I guess checking for [0] != '/' and prepending getcwd(3) would suffice. Relative paths are now accepted. This code will probably not work on Windows though. I tried setting up Windows 8 with the git version of postgres but was unsuccessful, so I can't really test any of this. Help on this subject (Windows) would be much appreciated. Somehow, I had the subconscious assumption that this feature would do prefix matching on the directories, not only complete string matching. So if I had tablespaces in /mnt/data1, /mnt/data2, /mnt/data3, I could map them all with -T /mnt:mnt and be done. Not sure if that is useful to many, but it's worth a thought. I like that a lot, but I'm afraid the code would have to get a bit more complex to add that functionality. It would be an easier rewrite if we added a hint character, something like -T '/mnt/*:mnt'. This is now implemented as suggested by Peter in his previous comment. -T /mnt:mnt now relocates all tablespaces under /mnt to the relative path mnt. Review style guide for error messages: http://www.postgresql.org/docs/devel/static/error-style-guide.html I've updated error messages according to the style guide. We need to think about how to handle this on platforms without symlinks. I don't like just printing an error message and moving on. It should be either pass or fail or an option to choose between them. I hope someone with experience with those kinds of systems can come up with suggestions on how to solve that. I only run postgres on Linux. I would still love some input on this. Please put the options in the getopt handling in alphabetical order. It's not very alphabetical now, but between D and F is probably not the best place. ;-) Done. //Steeve 0006-pg_basebackup-relocate-tablespace.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
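With the syntax as of this revision of the patch (= as the separator, relative targets resolved against the current directory), an invocation would look something like the following; the paths here are illustrative only:

    $ pg_basebackup -D /srv/backup -T /mnt/data1=tblspc/data1
    $ pg_basebackup -D /srv/backup -T /mnt=mnt    # prefix form: everything under /mnt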
Re: [HACKERS] patch: option --if-exists for pg_dump
2014-01-29 Pavel Stehule pavel.steh...@gmail.com 2014-01-29 Jeevan Chalke jeevan.cha...@enterprisedb.com Hi Pavel, Now the patch looks good to me. However when I try to restore your own sql file's dump, I get the following errors: pg_restore: [archiver (db)] could not execute query: ERROR: relation public.emp does not exist Command was: DROP TRIGGER IF EXISTS emp_insert_trigger ON public.emp; pg_restore: [archiver (db)] could not execute query: ERROR: schema myschema does not exist Command was: DROP FUNCTION IF EXISTS myschema.int_to_date(integer); Is that expected after your patch ? it should be fixed by commit http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b152c6cd0de1827ba58756e24e18110cf902182a Also, I didn't quite understand these lines of comments: /* * Descriptor string (te-desc) should not be same as object * specifier for DROP STATEMENT. The DROP DEFAULT has not * IF EXISTS clause - has not sense. */ Will you please rephrase ? I can try: the content of te-desc is usually a substring of the DROP statement, with one related exception - CONSTRAINT. Independently of the previous sentence, ALTER TABLE ALTER COLUMN DROP DEFAULT doesn't support IF EXISTS - and therefore it should not be injected. is it ok? Regards Pavel Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
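For reference, the intended effect is that a dump taken with --clean --if-exists emits guarded DROP statements. The database name and output excerpt below are illustrative (the two statements shown are the ones from the error report above):

    $ pg_dump --clean --if-exists mydb
    ...
    DROP TRIGGER IF EXISTS emp_insert_trigger ON public.emp;
    DROP FUNCTION IF EXISTS myschema.int_to_date(integer);
    ...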
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On Wed, Jan 29, 2014 at 4:12 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/28/2014 06:11 PM, Christian Kruse wrote: Hi, attached you will find a new version of the patch, ported to HEAD, fixed the mentioned bug and - hopefully - dealing the the remaining issues. Thanks, I have committed this now. I'm getting this warning now with gcc (GCC) 4.4.7: pg_shmem.c: In function 'PGSharedMemoryCreate': pg_shmem.c:332: warning: 'allocsize' may be used uninitialized in this function pg_shmem.c:332: note: 'allocsize' was declared here Cheers, Jeff
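The usual idiom for this class of (frequently spurious) warning on older gcc is to initialize the variable at its declaration. A sketch only, since the right type and a sensible dummy value depend on the surrounding code in pg_shmem.c:

    Size        allocsize = 0;  /* silence gcc 4.4 "may be used uninitialized" */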
Re: [HACKERS] proposal: hide application_name from other users
On 28 January 2014 20:14, Josh Berkus j...@agliodbs.com wrote: On 01/28/2014 12:10 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: For example, I would really like to GRANT an unpriv user access to the WAL columns in pg_stat_replication so that I can monitor replication delay without granting superuser permissions. Just out of curiosity, why is that superuser-only at all? AFAICS the hidden columns are just some LSNs ... what is the security argument for hiding them in the first place? Beats me, I can't find any discussion on it at all. No specific reason that I can recall but replication is heavily protected by layers of security. pg_stat_replication is a join with pg_stat_activity, so some of the info is open, some closed. It seems possible to relax that. Presumably the current patch is returned with feedback? Or can we fix these problems by inventing a new user aspect called MONITOR (similar to REPLICATION)? We can grant application_name and replication details to that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suspicion of a compiler bug in clang: using ternary operator in ereport()
Christian Kruse christ...@2ndquadrant.com writes: Ok, so I propose the attached patch as a fix. No, what I meant is that the ereport caller needs to save errno, rather than assuming that (some subset of) ereport-related subroutines will preserve it. In general, it's unsafe to assume that any nontrivial subroutine preserves errno, and I don't particularly want to promise that the ereport functions are an exception. Even if we did that, this type of coding would still be risky. Here are some examples: ereport(..., foo() ? errdetail(...) : 0, (errno == something) ? errhint(...) : 0); If foo() clobbers errno and returns false, there is nothing that elog.c can do to make this coding work. ereport(..., errmsg(%s belongs to %s, foo(), (errno == something) ? this : that)); Again, even if every single elog.c entry point saved and restored errno, this coding wouldn't be safe. I don't think we should try to make the world safe for some uses of errno within ereport logic, when there are other very similar-looking uses that we cannot make safe. What we need is a coding rule that you don't do that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
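Spelled out, the coding rule Tom is describing looks like this: capture errno into a local before the ereport() and reference only the local inside it. A sketch, where path and the message texts are illustrative:

    int         saved_errno = errno;    /* capture before any function calls */

    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not read file \"%s\": %s",
                    path, strerror(saved_errno)),
             (saved_errno == ENOENT) ? errhint("Has the file been removed?") : 0));

With this pattern it no longer matters whether any subroutine invoked during ereport processing clobbers errno, because nothing in the call re-reads it.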
[HACKERS] FOREIGN KEY ... CONCURRENTLY
Esteemed hackers, I can't be the only person to have encountered a situation where adding a new foreign key pointing at a busy table essentially never happens because, the way things work now, creating the constraint trigger on that busy table requires an AccessExclusive lock, or a unicorn, whichever you can acquire first. So I'd like to propose, per a conversation with Andrew Gierth, that we make an option to create foreign keys concurrently, which would mean in essence that the referencing table would: 1) need to be empty, at least in the first version, and 2) need to stay in a non-writeable state until all possible conflicting transactions had ended. Now, the less-fun part. Per Andres Freund, the current support for CONCURRENTLY in other operations is complex and poorly understood, and there's no reason to believe this new CONCURRENTLY would be simpler or easier to understand. A couple of questions: 1) Would people like to have FOREIGN KEY ... CONCURRENTLY as described above? 2) Is there another way to solve the problem of adding a foreign key constraint that points at a busy table? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
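Purely to illustrate the proposal - no such grammar exists today, so this is hypothetical syntax:

    ALTER TABLE new_child
        ADD CONSTRAINT new_child_parent_fk
        FOREIGN KEY (parent_id) REFERENCES busy_parent (id)
        CONCURRENTLY;

The intent is that this would take only a weak lock on busy_parent, at the cost of new_child being empty and read-only until all transactions that predate the constraint have finished.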
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Hi, On 29/01/14 14:12, Heikki Linnakangas wrote: The documentation is still lacking. We should explain somewhere how to set nr.hugepages, for example. The Managing Kernel Resources section ought to mention setting it. Could I ask you to work on that, please? Of course! Attached you will find a patch for better documentation. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml old mode 100644 new mode 100755 index 1b5f831..68b38f7 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1128,10 +1128,7 @@ include 'filename' The use of huge TLB pages results in smaller page tables and less CPU time spent on memory management, increasing performance. For more details, see -ulink url=https://wiki.debian.org/Hugepages;the Debian wiki/ulink. -Remember that you will need at least shared_buffers / huge page size + -1 huge TLB pages. So for example for a system with 6GB shared buffers -and a hugepage size of 2kb of you will need at least 3156 huge pages. +link linkend=linux-huge-tlb-pagesLinux huge TLB pages/link. /para para diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml old mode 100644 new mode 100755 index bbb808f..2288c7b --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -1307,6 +1307,83 @@ echo -1000 /proc/self/oom_score_adj /para /note /sect2 + + sect2 id=linux-huge-tlb-pages + titleLinux huge TLB pages/title + + para +Nowadays memory address spaces for processes are virtual. This means, when +a process reserves memory, it gets a virtual memory address which has to +be translated to a physical memory address by the OS or the CPU. This can +be done via calculations, but since memory is accessed very often there is +a cache for that, called the Translation Lookaside Buffer, +short emphasisTLB/emphasis. + /para + + para +When a process reserves memory, this is done in chunks (often +of literal4kb/literal) named pages. This means if a process requires +1GB of RAM, it has literal262144/literal (literal1GB/literal +/ literal4KB/literal) pages and therefore literal262144/literal +entries for the translation table. Since the TLB has a limited number of +entries it is obvious that they can't all be cached, which +will lead to a loss of performance. + /para + + para +One way to tune this is to increase the page size. Most platforms allow +larger pages, e.g. x86 allows pages of literal2MB/literal. This would +reduce the number of pages to literal512/literal +(literal1GB/literal / literal2MB/literal). This reduces the number +of necessary lookups drastically. + /para + + para +To enable this feature in productnamePostgreSQL/productname you need a +kernel with varnameCONFIG_HUGETLBFS=y/varname and +varnameCONFIG_HUGETLB_PAGE=y/varname. You also have to tune the system +setting varnamevm.nr_hugepages/varname.
To calculate the number of +necessary huge pages start productnamePostgreSQL/productname without +huge pages enabled and check the varnameVmPeakvarname value from the +proc filesystem: + +programlisting +$ userinputhead -1 /path/to/data/directory/postmaster.pid/userinput +4170 +$ userinputgrep ^VmPeak /proc/4170/status/userinput +VmPeak: 6490428 kB +/programlisting + literal6490428/literal / literal2048/literal + (varnamePAGE_SIZE/varname literal2MB/literal) are + roughly literal3169.154/literal huge pages, so you will need at + least literal3170/literal huge pages: +programlisting +sysctl -w vm.nr_hugepages=3170 +/programlisting +Sometimes the kernel is not able to allocate the desired number of huge +pages, so it might be necessary to repeat that command or to reboot. Don't +forget to add an entry to filename/etc/sysctl.conf/filename to persist +this setting through reboots. + /para + + para +The default behavior for huge pages +in productnamePostgreSQL/productname is to use them when possible and +to fallback to normal pages when failing. To enforce the use of huge +pages, you can +set link linkend=guc-huge-tlb-pagesvarnamehuge_tlb_pages/varname/link +to literalon/literal. Note that in this +case productnamePostgreSQL/productname will fail to start if not +enough huge pages are available. + /para + + para +For a detailed description of the productnameLinux/productname huge +pages feature have a look +at ulink url=https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt/ulink. + /para + + /sect2 /sect1 pgp2cXAeJk4PE.pgp Description: PGP signature
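As a small companion to the patch text above (an illustrative sketch, not part of the patch; the 3170 comes from the VmPeak calculation shown):

# Illustrative only -- substitute the value computed from your VmPeak.
sysctl -w vm.nr_hugepages=3170                      # apply immediately
echo "vm.nr_hugepages = 3170" >> /etc/sysctl.conf   # persist across reboots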
[HACKERS] pg_sleep_enhancements.patch
Hello, I am looking at this patch: http://www.postgresql.org/message-id/525fe206.6000...@dalibo.com a) pg_sleep_for - no objection - it is simple and secure b) pg_sleep_until - I am not sure - maybe this implementation is too simple. I see two possible risks where it might not work as users expect: a) what is the expected behavior when the time is changed - CET/CEST? b) what is the expected behavior when the board clock is not accurate and is periodically fixed (by NTP) - isn't it better to sleep only a few seconds at a time and recalculate the sleeping interval? Regards Pavel
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 04:55 AM, Simon Riggs wrote: It is possible that adding this extra straw breaks the camel's back, but it seems unlikely to be that simple. A new feature to help performance problems will be of great use to many people; complaints about the performance of pg_stat_statements are unlikely to increase greatly from this change, since such changes would only affect those right on the threshold of impact. The needs of the many... Further changes to improve scalability of pg_stat_statements seem warranted in the future, but I see no reason to block the patch for that reason. If we're talking here about min, max and stddev, then +1. The stats we've seen so far seem to indicate that any effect on query response time is within the margin of error for pgbench. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] pg_sleep_enhancements.patch
On 01/29/2014 08:04 PM, Pavel Stehule wrote: Hello, I am looking at this patch Thank you for looking at it. http://www.postgresql.org/message-id/525fe206.6000...@dalibo.com a) pg_sleep_for - no objection - it is simple and secure Okay. b) pg_sleep_until - I am not sure - maybe this implementation is too simple. I see two possible risks where it might not work as users expect: a) what is the expected behavior when the time is changed - CET/CEST? There is no risk there, the wake-up time is specified with a time zone. b) what is the expected behavior when the board clock is not accurate and is periodically fixed (by NTP) - isn't it better to sleep only a few seconds at a time and recalculate the sleeping interval? We could do that, but it seems like overkill. It would mean writing a new C function whereas this is just a simple helper for the existing pg_sleep() function. So my vote is to keep the patch as-is. -- Vik
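For readers following along, such a helper can indeed stay a thin SQL wrapper; a minimal sketch of the idea (an illustration, not the patch's actual definition):

-- Hypothetical sketch of pg_sleep_until() as a thin SQL wrapper
-- around pg_sleep(); the real patch may differ in details.
CREATE FUNCTION pg_sleep_until(target timestamptz) RETURNS void AS $$
    SELECT pg_sleep(extract(epoch FROM $1) -
                    extract(epoch FROM clock_timestamp()))
$$ LANGUAGE sql;

-- Usage: wake up at 3 a.m. local time. A target in the past yields a
-- negative argument, and pg_sleep() then returns immediately.
SELECT pg_sleep_until('2014-01-30 03:00:00+01');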
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Hi, On 29/01/14 10:11, Jeff Janes wrote: I'm getting this warning now with gcc (GCC) 4.4.7: Interesting. I don't get that warning. But the compiler is (formally) right. pg_shmem.c: In function 'PGSharedMemoryCreate': pg_shmem.c:332: warning: 'allocsize' may be used uninitialized in this function pg_shmem.c:332: note: 'allocsize' was declared here The attached patch should fix that. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index f7596bf..c39dfb6 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -329,7 +329,7 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2) static void * CreateAnonymousSegment(Size *size) { - Size allocsize; + Size allocsize = *size; void *ptr = MAP_FAILED; #ifndef MAP_HUGETLB @@ -358,7 +358,6 @@ CreateAnonymousSegment(Size *size) */ int hugepagesize = 2 * 1024 * 1024; - allocsize = *size; if (allocsize % hugepagesize != 0) allocsize += hugepagesize - (allocsize % hugepagesize); @@ -372,7 +371,6 @@ CreateAnonymousSegment(Size *size) if (huge_tlb_pages == HUGE_TLB_OFF || (huge_tlb_pages == HUGE_TLB_TRY && ptr == MAP_FAILED)) { - allocsize = *size; ptr = mmap(NULL, *size, PROT_READ | PROT_WRITE, PG_MMAP_FLAGS, -1, 0); }
Re: [HACKERS] Planning time in explain/explain analyze
On 01/28/2014 09:39 PM, Tom Lane wrote: I'm for doing the measurement in ExplainOneQuery() and not printing anything in code paths that don't go through there. Reading the discussion here and realizing that my last patch was wrong, I am now in agreement with this. I have attached a patch which no longer tries to measure planning time for prepared statements. /Andreas diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml new file mode 100644 index 2af1738..482490b *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *** EXPLAIN SELECT * FROM tenk1; *** 89,94 --- 89,95 QUERY PLAN - Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) + Planning time: 0.113 ms /screen /para *** EXPLAIN SELECT * FROM tenk1; *** 162,167 --- 163,174 /para para + The literalPlanning time/literal shown is the time it took to generate + the query plan from the parsed query and optimize it. It does not include + rewriting and parsing. +/para + +para Returning to our example: screen *** EXPLAIN SELECT * FROM tenk1; *** 170,175 --- 177,183 QUERY PLAN - Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) + Planning time: 0.113 ms /screen /para *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 198,203 --- 206,212 Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244) Filter: (unique1 &lt; 7000) + Planning time: 0.104 ms /screen Notice that the commandEXPLAIN/ output shows the literalWHERE/ *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 234,239 --- 243,249 Recheck Cond: (unique1 &lt; 100) -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 &lt; 100) + Planning time: 0.093 ms /screen Here the planner has decided to use a two-step plan: the child plan *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 262,267 --- 272,278 Filter: (stringu1 = 'xxx'::name) -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 &lt; 100) + Planning time: 0.089 ms /screen The added condition literalstringu1 = 'xxx'/literal reduces the *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 283,288 --- 294,300 - Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244) Index Cond: (unique1 = 42) + Planning time: 0.076 ms /screen In this type of plan the table rows are fetched in index order, which *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 311,316 --- 323,329 Index Cond: (unique1 &lt; 100) -&gt; Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) Index Cond: (unique2 &gt; 9000) + Planning time: 0.094 ms /screen But this requires visiting both indexes, so it's not necessarily a win *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 331,336 --- 344,350 -&gt; Index Scan using tenk1_unique2 on tenk1 (cost=0.29..71.27 rows=10 width=244) Index Cond: (unique2 &gt; 9000) Filter: (unique1 &lt; 100) + Planning time: 0.087 ms /screen /para *** WHERE t1.unique1 &lt; 10 AND t1.unique2 *** 364,369 --- 378,384 Index Cond: (unique1 &lt; 10) -&gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) Index Cond: (unique2 = t1.unique2) + Planning time: 0.117 ms /screen /para *** WHERE t1.unique1 &lt; 10 AND t2.unique2 *** 415,420 --- 430,436 -&gt; Materialize (cost=0.29..8.51 rows=10 width=244) -&gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..8.46 rows=10 width=244) Index Cond: (unique2 &lt; 10) + Planning time: 0.119 ms /screen The condition literalt1.hundred &lt; t2.hundred/literal can't be *** WHERE t1.unique1 &lt; 100 AND t1.unique2 *** 462,467 --- 478,484 Recheck Cond: (unique1 &lt; 100) -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 &lt; 100) + Planning time: 0.182 ms /screen /para *** WHERE t1.unique1 &lt; 100 AND t1.unique2 *** 492,497 --- 509,515 -&gt; Sort (cost=197.83..200.33 rows=1000 width=244) Sort Key: t2.unique2 -&gt; Seq Scan on onek t2 (cost=0.00..148.00 rows=1000
Re: [HACKERS] pg_sleep_enhancements.patch
2014-01-29 Vik Fearing vik.fear...@dalibo.com On 01/29/2014 08:04 PM, Pavel Stehule wrote: Hello, I am looking at this patch Thank you for looking at it. http://www.postgresql.org/message-id/525fe206.6000...@dalibo.com a) pg_sleep_for - no objection - it is simple and secure Okay. b) pg_sleep_until - I am not sure - maybe this implementation is too simple. I see two possible risks where it might not work as users expect: a) what is the expected behavior when the time is changed - CET/CEST? There is no risk there, the wake-up time is specified with a time zone. b) what is the expected behavior when the board clock is not accurate and is periodically fixed (by NTP) - isn't it better to sleep only a few seconds at a time and recalculate the sleeping interval? We could do that, but it seems like overkill. It would mean writing a new C function whereas this is just a simple helper for the existing pg_sleep() function. So my vote is to keep the patch as-is. OK, second question - isn't this functionality too dangerous? If somebody uses it as a scheduler, then a) it can hold a connection, session data, and locks for too long, b) it can stop on a query timeout probably much earlier than the user expects. What is the expected use case? Regards Pavel -- Vik
Re: [HACKERS] proposal: hide application_name from other users
On 01/29/2014 10:19 AM, Simon Riggs wrote: No specific reason that I can recall, but replication is heavily protected by layers of security. pg_stat_replication is a join with pg_stat_activity, so some of the info is open, some closed. It seems possible to relax that. I'm all for the idea of restrict, then open up. That is, it made sense to start with data restricted, but then unrestrict it as we know it's OK. Going the other way generally isn't possible, as this patch demonstrates. Presumably the current patch is returned with feedback? Or can we fix these problems by inventing a new user aspect called MONITOR (similar to REPLICATION)? We can grant application_name and replication details to that. Yeah, except I don't see doing the MONITOR thing for 9.4. We'd need a spec for it first. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] pg_sleep_enhancements.patch
On 01/29/2014 08:21 PM, Pavel Stehule wrote: second question - isn't this functionality too dangerous? If somebody uses it as a scheduler, then a) it can hold a connection, session data, and locks for too long, b) it can stop on a query timeout probably much earlier than the user expects. What is the expected use case? It is no more dangerous than plain pg_sleep(). The use case is convenience and clarity of code. I don't think people will be using it as a scheduler any more than they do with pg_sleep() because it can't cross transaction boundaries. -- Vik
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 01/29/2014 09:18 PM, Christian Kruse wrote: Hi, On 29/01/14 10:11, Jeff Janes wrote: I'm getting this warning now with gcc (GCC) 4.4.7: Interesting. I don't get that warning. But the compiler is (formally) right. pg_shmem.c: In function 'PGSharedMemoryCreate': pg_shmem.c:332: warning: 'allocsize' may be used uninitialized in this function pg_shmem.c:332: note: 'allocsize' was declared here Hmm, I didn't get that warning either. Attached patch should fix that. That's not quite right. If the first mmap() fails, allocsize is set to the rounded-up size, but the second mmap() uses the original size for the allocation. So it returns too high a value to the caller. Ugh, it's actually broken anyway :-(. The first allocation also passes *size to mmap(), so the calculated rounded-up allocsize value is not used for anything. Fix pushed. - Heikki
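For readers following along, here is a sketch of the intended allocation logic Heikki describes - an illustration only, not the committed fix (PG_MMAP_FLAGS, MAP_HUGETLB and Size are assumed from the surrounding PostgreSQL sources, and the function name is invented):

/*
 * Sketch: round the request up to the huge page size, try the
 * huge-page mapping with the rounded size, and report that size back
 * to the caller; fall back to plain pages with the original size.
 */
static void *
create_anonymous_segment_sketch(Size *size, Size hugepagesize)
{
    Size    allocsize = *size;
    void   *ptr;

    /* Round the request up to a multiple of the huge page size. */
    if (allocsize % hugepagesize != 0)
        allocsize += hugepagesize - (allocsize % hugepagesize);

    /* Try huge pages with the rounded size ... */
    ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE,
               PG_MMAP_FLAGS | MAP_HUGETLB, -1, 0);
    if (ptr != MAP_FAILED)
        *size = allocsize;  /* caller must see the real mapping size */
    else
        /* ... and fall back to normal pages with the original size. */
        ptr = mmap(NULL, *size, PROT_READ | PROT_WRITE,
                   PG_MMAP_FLAGS, -1, 0);

    return ptr;
}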
Re: [HACKERS] pg_sleep_enhancements.patch
2014-01-29 Vik Fearing vik.fear...@dalibo.com On 01/29/2014 08:21 PM, Pavel Stehule wrote: second question - isn't this functionality too dangerous? If somebody uses it as a scheduler, then a) it can hold a connection, session data, and locks for too long, b) it can stop on a query timeout probably much earlier than the user expects. What is the expected use case? It is no more dangerous than plain pg_sleep(). The use case is convenience and clarity of code. I don't think people will be using it as a scheduler any more than they do with pg_sleep() because it can't cross transaction boundaries. I am sure experienced users won't use it that way. But beginners can be confused due to its similarity to a scheduler. I'd invite a) documenting these risks, b) opinions of other hackers. Probably when it is used as a single statement in a transaction, it can break only vacuum. Pavel -- Vik
Re: [HACKERS] Add force option to dropdb
On Wed, Jan 29, 2014 at 4:56 AM, salah jubeh s_ju...@yahoo.com wrote: I'm not particularly in favor of implementing this as client-side functionality, because then it's only available to people who use that particular client. Simon Riggs proposed a server-side option to the DROP DATABASE command some time ago, and I think that might be the way to go. Could you please direct me -if possible- to the thread. I think implementing it on the client side gives the user some visibility and control. Initially, I wanted to force drop the database, then I changed it to kill connections. I think the change in the client tool is simple and covers the main reason for not being able to drop a database. I think killing client connections is one of the FAQs. OTOH, having an option like DROP DATABASE [IF EXISTS, FORCE] database is more crisp. However, what does force mean? Many options exist, such as killing the connections gently, waiting for connections to terminate, or killing connections immediately. Also, as Alvaro Herrera mentioned, DROP OWNED BY and/or REASSIGN OWNED BY might hinder the force option - an example here would be nice. So, for quick wins, I prefer the kill option on the client side; but for a mature solution, the backend is certainly the way to proceed. http://www.postgresql.org/message-id/1296552979.1779.8622.camel@ebony -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
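For reference, the "kill connections, then drop" approach the client tool automates boils down to something like this (a sketch; pg_stat_activity.pid is the 9.2+ column name, procpid before that, and the database name is hypothetical):

-- Terminate every other session connected to the target database,
-- then drop it. Racy by nature: a new connection can still sneak in
-- between the two statements, which is why a server-side option helps.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb'
  AND pid <> pg_backend_pid();

DROP DATABASE mydb;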
Re: [HACKERS] Removing xloginsert_slots?
On Wed, Jan 29, 2014 at 8:22 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-01-29 21:59:05 +0900, Michael Paquier wrote: The undocumented GUC called xloginsert_slots has been introduced by commit 9a20a9b. It is mentioned by the commit that this parameter should be removed before the release. Wouldn't it be a good time to remove this parameter soon? I imagine that removing it before the beta would make sense so now is perhaps too early... Either way, attached is a patch doing so... I'd rather wait till somebody actually has done some benchmarks. I don't think we're more clueful about it now than back when the patch went in. And such benchmarking is more likely during beta, so... Well, it's either got to go away, or get documented, IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 9:06 AM, Andrew Dunstan and...@dunslane.net wrote: I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciable increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. Importance is in the eye of the beholder. As far as I'm concerned, min and max are of FAR less value than stddev. If stddev gets left out I'm going to be pretty darned annoyed, especially since the benchmarks seem to show the marginal cost as being virtually unmeasurable. If those aren't enough for you, perhaps you'd like to state what sort of tests would satisfy you. I agree. I find it somewhat unlikely that pg_stat_statements is fragile enough that these few extra counters are going to make much of a difference. At the same time, I find min and max a dubious value proposition. It seems highly likely to me that stddev will pay its way, but I'm much less certain about the others. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
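For readers wondering why stddev can be cheap to track in a fixed-size counters struct, here is a minimal sketch using Welford's online algorithm - an illustration of the general technique, not the actual pg_stat_statements patch (struct and field names are invented):

/*
 * Online mean/variance accumulator (Welford's method): O(1) time and
 * constant space per execution, so no per-call history is needed.
 */
typedef struct Counters
{
    uint64      calls;          /* number of executions seen */
    double      mean_time;      /* running mean of execution time */
    double      sum_var;        /* sum of squared deviations from the mean */
} Counters;

static void
counters_add(Counters *c, double elapsed)
{
    double      delta;

    c->calls++;
    delta = elapsed - c->mean_time;
    c->mean_time += delta / c->calls;
    c->sum_var += delta * (elapsed - c->mean_time);
    /* stddev is then sqrt(sum_var / calls), once calls > 1 */
}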
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 29 January 2014 05:43, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 28, 2014 at 12:36 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Honestly, I would prefer that we push a patch that has been thoroughly reviewed and in which we have more confidence, so that we can push without a GUC. This means mark in CF as needs-review, then some other developer reviews it and marks it as ready-for-committer. I also believe that would be the correct procedure. Personally, I think it would be great if Noah can review this, as he's very good at finding the kind of problems that are likely to crop up here, and has examined previous versions. I also have some interest in looking at it myself. But I doubt that either of us (or any other senior hacker) can do that by Thursday. I think all such people are hip-deep in other patches at the moment; I certainly am. Yeah. I actually have little doubt that the patch is sane on its own terms of discussion, that is that Simon has chosen locking levels that are mutually consistent in an abstract sense. What sank the previous iteration was that he'd failed to consider an implementation detail, namely possible inconsistencies in SnapshotNow-based catalog fetches. I'm afraid that there may be more such problems lurking under the surface. Agreed. Noah's pretty good at finding such things, but really two or three of us need to sit down and think about it for awhile before I'd have much confidence in such a fundamental change. And I sure don't have time for this patch right now myself. I've reviewed Noah's earlier comments, fixed things and also further reviewed for any similar relcache related issues. I've also reviewed Hot Standby to see if any locking issues arise, since the ALTER TABLE now won't generate an AccessExclusiveLock as it used to do for certain operations. I can't see any problems there. While doing those reviews, I'd remind everybody that this only affects roughly half of ALTER TABLE subcommands and definitely nothing that affects SELECTs. So the risk profile is much less than it sounds at first glance. If anybody else has any hints or clues about where to look, please mention them and I will investigate. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Hi, On 29/01/14 21:36, Heikki Linnakangas wrote: […] Fix pushed. You are right. Thanks. But there is another bug, see 20140128154307.gc24...@defunct.ch ff. Attached you will find a patch fixing that. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index ac3a9fe..cf590a0 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -380,9 +380,12 @@ CreateAnonymousSegment(Size *size) } if (ptr == MAP_FAILED) + { + int saved_errno = errno; + ereport(FATAL, (errmsg(could not map anonymous shared memory: %m), - (errno == ENOMEM) ? + (saved_errno == ENOMEM) ? errhint(This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space or huge pages. To reduce the request size @@ -390,6 +393,7 @@ CreateAnonymousSegment(Size *size) memory usage, perhaps by reducing shared_buffers or max_connections., *size) : 0)); + } *size = allocsize; return ptr;
Re: [HACKERS] Planning time in explain/explain analyze
On Wed, Jan 29, 2014 at 2:21 PM, Andreas Karlsson andr...@proxel.se wrote: On 01/28/2014 09:39 PM, Tom Lane wrote: I'm for doing the measurement in ExplainOneQuery() and not printing anything in code paths that don't go through there. Reading the discussion here and realizing that my last patch was wrong, I am now in agreement with this. I have attached a patch which no longer tries to measure planning time for prepared statements. Cool. I propose adding one parameter rather than two to ExplainOnePlan() and making it of type instr_time * rather than passing an instr_time and a bool. If you don't want to include the planning time, pass NULL; if you do, pass a pointer to the instr_time you want to print. I think that would come out slightly neater than what you have here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 11:54 AM, Robert Haas wrote: I agree. I find it somewhat unlikely that pg_stat_statements is fragile enough that these few extra counters are going to make much of a difference. At the same time, I find min and max a dubious value proposition. It seems highly likely to me that stddev will pay its way, but I'm much less certain about the others. What I really want is percentiles, but I'm pretty sure we already shot that down. ;-) I could use min/max -- think of performance test runs. However, I agree that they're less valuable than stddev. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication
On 14 January 2014 08:38, Christian Kruse christ...@2ndquadrant.com wrote: Hi, On 13/01/14 20:06, Heikki Linnakangas wrote: On 12/17/2013 04:58 PM, Christian Kruse wrote: attached you will find a patch for showing the current transaction id (xid) and the xmin of a backend in pg_stat_activity and the xmin in pg_stat_replication. Docs. Thanks, update with updated docs is attached. Looks simple enough and useful for working out which people are holding up CONCURRENT activities. I've not been involved with this patch, so any objections to me doing final review and commit? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
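As an illustration of the stated use case - finding the backends whose old snapshots hold up CONCURRENT operations - a query along these lines should work, assuming the patch's column names (backend_xid, backend_xmin) survive review:

-- Hypothetical example: the sessions with the oldest xmin are the
-- likeliest culprits holding back CREATE INDEX CONCURRENTLY, vacuum, etc.
SELECT pid, backend_xid, backend_xmin, state, query
FROM pg_stat_activity
ORDER BY age(backend_xmin) DESC NULLS LAST
LIMIT 5;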
Re: [HACKERS] Planning time in explain/explain analyze
On 01/29/2014 09:01 PM, Robert Haas wrote: Cool. I propose adding one parameter rather than two to ExplainOnePlan() and making it of type instr_time * rather than passing an instr_time and a bool. If you don't want to include the planning time, pass NULL; if you do, pass a pointer to the instr_time you want to print. I think that would come out slightly neater than what you have here. Excellent suggestion, thanks! New patch attached. /Andreas diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml new file mode 100644 index 2af1738..482490b *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *** EXPLAIN SELECT * FROM tenk1; *** 89,94 --- 89,95 QUERY PLAN - Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) + Planning time: 0.113 ms /screen /para *** EXPLAIN SELECT * FROM tenk1; *** 162,167 --- 163,174 /para para + The literalPlanning time/literal shown is the time it took to generate + the query plan from the parsed query and optimize it. It does not include + rewriting and parsing. +/para + +para Returning to our example: screen *** EXPLAIN SELECT * FROM tenk1; *** 170,175 --- 177,183 QUERY PLAN - Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) + Planning time: 0.113 ms /screen /para *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 198,203 --- 206,212 Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244) Filter: (unique1 &lt; 7000) + Planning time: 0.104 ms /screen Notice that the commandEXPLAIN/ output shows the literalWHERE/ *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 234,239 --- 243,249 Recheck Cond: (unique1 &lt; 100) -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 &lt; 100) + Planning time: 0.093 ms /screen Here the planner has decided to use a two-step plan: the child plan *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 262,267 --- 272,278 Filter: (stringu1 = 'xxx'::name) -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 &lt; 100) + Planning time: 0.089 ms /screen The added condition literalstringu1 = 'xxx'/literal reduces the *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 283,288 --- 294,300 - Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244) Index Cond: (unique1 = 42) + Planning time: 0.076 ms /screen In this type of plan the table rows are fetched in index order, which *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 311,316 --- 323,329 Index Cond: (unique1 &lt; 100) -&gt; Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) Index Cond: (unique2 &gt; 9000) + Planning time: 0.094 ms /screen But this requires visiting both indexes, so it's not necessarily a win *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 331,336 --- 344,350 -&gt; Index Scan using tenk1_unique2 on tenk1 (cost=0.29..71.27 rows=10 width=244) Index Cond: (unique2 &gt; 9000) Filter: (unique1 &lt; 100) + Planning time: 0.087 ms /screen /para *** WHERE t1.unique1 &lt; 10 AND t1.unique2 *** 364,369 --- 378,384 Index Cond: (unique1 &lt; 10) -&gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) Index Cond: (unique2 = t1.unique2) + Planning time: 0.117 ms /screen /para *** WHERE t1.unique1 &lt; 10 AND t2.unique2 *** 415,420 --- 430,436 -&gt; Materialize (cost=0.29..8.51 rows=10 width=244) -&gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..8.46 rows=10 width=244) Index Cond: (unique2 &lt; 10) + Planning time: 0.119 ms /screen The condition literalt1.hundred &lt; t2.hundred/literal can't be *** WHERE t1.unique1 &lt; 100 AND t1.unique2 *** 462,467 --- 478,484 Recheck Cond: (unique1 &lt; 100) -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 &lt; 100) + Planning time: 0.182 ms /screen /para *** WHERE t1.unique1 &lt; 100 AND t1.unique2 *** 492,497 --- 509,515 -&gt; Sort (cost=197.83..200.33 rows=1000 width=244) Sort Key: t2.unique2
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 01/29/2014 09:59 PM, Christian Kruse wrote: Hi, On 29/01/14 21:36, Heikki Linnakangas wrote: […] Fix pushed. You are right. Thanks. But there is another bug, see 20140128154307.gc24...@defunct.ch ff. Attached you will find a patch fixing that. Thanks. There are more cases of that in InternalIpcMemoryCreate; they ought to be fixed as well. And we should also grep the rest of the codebase for more instances of that. And this needs to be back-patched. - Heikki
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Hi, On 29/01/14 22:17, Heikki Linnakangas wrote: Thanks. There are more cases of that in InternalIpcMemoryCreate; they ought to be fixed as well. And we should also grep the rest of the codebase for more instances of that. And this needs to be back-patched. I'm way ahead of you ;-) Working on it. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] review: psql command copy count tag
related to http://www.postgresql.org/message-id/bf2827dcce55594c8d7a8f7ffd3ab7713ddb1...@szxeml508-mbx.china.huawei.com Hello 1. I had to rebase this patch - an updated version is attached - I merged the two patches into one 2. The psql code compiles without issues after patching 3. All regression tests pass without errors 5. We like this feature - it shows interesting info without any slowdown - the psql copy command is more consistent with the server-side copy statement from psql's perspective. This patch is ready for commit Regards Pavel *** ./src/bin/psql/common.c.orig 2014-01-29 21:09:07.108862537 +0100 --- ./src/bin/psql/common.c 2014-01-29 21:09:42.810920907 +0100 *** *** 631,638 * When the command string contained no affected COPY command, this function * degenerates to an AcceptResult() call. * ! * Changes its argument to point to the last PGresult of the command string, ! * or NULL if that result was for a COPY FROM STDIN or COPY TO STDOUT. * * Returns true on complete success, false otherwise. Possible failure modes * include purely client-side problems; check the transaction status for the --- 631,637 * When the command string contained no affected COPY command, this function * degenerates to an AcceptResult() call. * ! * Changes its argument to point to the last PGresult of the command string. * * Returns true on complete success, false otherwise. Possible failure modes * include purely client-side problems; check the transaction status for the *** *** 689,709 * connection out of its COPY state, then call PQresultStatus() * once and report any error. */ SetCancelConn(); if (result_status == PGRES_COPY_OUT) ! success = handleCopyOut(pset.db, pset.queryFout) && success; else ! success = handleCopyIn(pset.db, pset.cur_cmd_source, ! PQbinaryTuples(*results)) && success; ResetCancelConn(); /* * Call PQgetResult() once more. In the typical case of a * single-command string, it will return NULL. Otherwise, we'll * have other results to process that may include other COPYs. */ ! PQclear(*results); ! *results = next_result = PQgetResult(pset.db); } else if (first_cycle) /* fast path: no COPY commands; PQexec visited all results */ --- 688,723 * connection out of its COPY state, then call PQresultStatus() * once and report any error. */ + + /* + * If this is the second copy, then it will definitely not be \copy, + * and also it cannot be from any user-given file. So pass the + * value of copystream as NULL, so that read/write happens on + * the already set cur_cmd_source/queryFout. + */ SetCancelConn(); if (result_status == PGRES_COPY_OUT) ! success = handleCopyOut(pset.db, ! (first_cycle ? pset.copyStream : NULL), ! results) && success; else ! success = handleCopyIn(pset.db, ! (first_cycle ? pset.copyStream : NULL), ! PQbinaryTuples(*results), ! results) && success; ResetCancelConn(); /* * Call PQgetResult() once more. In the typical case of a * single-command string, it will return NULL. Otherwise, we'll * have other results to process that may include other COPYs. * It keeps the last result. */ ! if ((next_result = PQgetResult(pset.db))) ! { ! PQclear(*results); ! *results = next_result; ! } } else if (first_cycle) /* fast path: no COPY commands; PQexec visited all results */ *** ./src/bin/psql/copy.c.orig 2014-01-29 21:09:15.131875660 +0100 --- ./src/bin/psql/copy.c 2014-01-29 21:09:42.811920909 +0100 *** *** 269,276 { PQExpBufferData query; FILE *copystream; - FILE *save_file; - FILE **override_file; struct copy_options *options; bool success; struct stat st; --- 269,274 *** *** 287,294 if (options->from) { - override_file = &pset.cur_cmd_source; - if (options->file) { if (options->program) --- 285,290 *** *** 308,315 } else { - override_file = &pset.queryFout; - if (options->file) { if (options->program) --- 304,309 *** *** 369,378 appendPQExpBufferStr(query, options->after_tofrom); /* Run it like a user command, interposing the data source or sink. */ ! save_file = *override_file; ! *override_file = copystream; success = SendQuery(query.data); ! *override_file = save_file; termPQExpBuffer(query); if (options->file != NULL) --- 363,371 appendPQExpBufferStr(query, options->after_tofrom); /* Run it like a user command, interposing the data source or sink. */ ! pset.copyStream = copystream; success = SendQuery(query.data); ! pset.copyStream = NULL; termPQExpBuffer(query); if (options->file != NULL) *** *** 433,444 *
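For readers skimming the thread, the user-visible effect under review is roughly the following (a hypothetical session with the patch applied; the exact tag depends on the data):

-- Before the patch, a client-side \copy printed nothing on success;
-- with it, psql reports the server's COPY command tag, e.g.:
postgres=# \copy t FROM 'data.csv' WITH (format csv)
COPY 42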
Re: [HACKERS] Suspicion of a compiler bug in clang: using ternary operator in ereport()
Hi, On 29/01/14 13:39, Tom Lane wrote: No, what I meant is that the ereport caller needs to save errno, rather than assuming that (some subset of) ereport-related subroutines will preserve it. […] Your reasoning sounds quite logical to me. Thus I did a grep -RA 3 ereport src/* | less and looked for ereport calls with errno in it. I found quite a few; attached you will find a patch addressing that issue. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index d73e5e8..3705d0b 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -782,10 +782,14 @@ remove_symlink: else { if (unlink(linkloc) < 0) - ereport(redo ? LOG : (errno == ENOENT ? WARNING : ERROR), + { + int saved_errno = errno; + + ereport(redo ? LOG : (saved_errno == ENOENT ? WARNING : ERROR), (errcode_for_file_access(), errmsg(could not remove symbolic link \%s\: %m, linkloc))); + } } pfree(linkloc_with_version_dir); diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c index b4825d2..c79c8ad 100644 --- a/src/backend/port/sysv_sema.c +++ b/src/backend/port/sysv_sema.c @@ -85,7 +85,8 @@ static void ReleaseSemaphores(int status, Datum arg); static IpcSemaphoreId InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems) { - int semId; + int semId, +saved_errno; semId = semget(semKey, numSems, IPC_CREAT | IPC_EXCL | IPCProtection); @@ -107,12 +108,13 @@ InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems) /* * Else complain and abort */ + saved_errno = errno; ereport(FATAL, (errmsg(could not create semaphores: %m), errdetail(Failed system call was semget(%lu, %d, 0%o)., (unsigned long) semKey, numSems, IPC_CREAT | IPC_EXCL | IPCProtection), - (errno == ENOSPC) ? + (saved_errno == ENOSPC) ? errhint(This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of @@ -133,13 +135,14 @@ static void IpcSemaphoreInitialize(IpcSemaphoreId semId, int semNum, int value) { union semun semun; + int saved_errno = errno; semun.val = value; if (semctl(semId, semNum, SETVAL, semun) < 0) ereport(FATAL, (errmsg_internal(semctl(%d, %d, SETVAL, %d) failed: %m, semId, semNum, value), - (errno == ERANGE) ? + (saved_errno == ERANGE) ? errhint(You possibly need to raise your kernel's SEMVMX value to be at least %d. Look into the PostgreSQL documentation for details., value) : 0)); diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index ac3a9fe..cb297bb 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -68,6 +68,7 @@ InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size) { IpcMemoryId shmid; void *memAddress; + int saved_errno = 0; shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection); @@ -137,25 +138,26 @@ InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size) * it should be. SHMMNI violation is ENOSPC, per spec. Just plain * not-enough-RAM is ENOMEM. */ + saved_errno = errno; ereport(FATAL, (errmsg(could not create shared memory segment: %m), errdetail(Failed system call was shmget(key=%lu, size=%zu, 0%o)., (unsigned long) memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection), - (errno == EINVAL) ? + (saved_errno == EINVAL) ? errhint(This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter, or possibly that it is less than your kernel's SHMMIN parameter.\n The PostgreSQL documentation contains more information about shared memory configuration.) : 0, - (errno == ENOMEM) ? + (saved_errno == ENOMEM) ? errhint(This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMALL parameter. You might need to reconfigure the kernel with larger SHMALL.\n The PostgreSQL documentation contains more information about shared memory configuration.) : 0, - (errno == ENOSPC) ? + (saved_errno == ENOSPC) ? errhint(This error does *not* mean that you have run out of disk space. It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel,
Re: [HACKERS] Suspicion of a compiler bug in clang: using ternary operator in ereport()
Hi, On 29/01/14 21:37, Christian Kruse wrote: […] attached you will find a patch addressing that issue. Maybe we should include the patch proposed in 20140129195930.gd31...@defunct.ch and do this as one (slightly bigger) patch. Attached you will find this alternative version. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index d73e5e8..3705d0b 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -782,10 +782,14 @@ remove_symlink: else { if (unlink(linkloc) < 0) - ereport(redo ? LOG : (errno == ENOENT ? WARNING : ERROR), + { + int saved_errno = errno; + + ereport(redo ? LOG : (saved_errno == ENOENT ? WARNING : ERROR), (errcode_for_file_access(), errmsg(could not remove symbolic link \%s\: %m, linkloc))); + } } pfree(linkloc_with_version_dir); diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c index b4825d2..c79c8ad 100644 --- a/src/backend/port/sysv_sema.c +++ b/src/backend/port/sysv_sema.c @@ -85,7 +85,8 @@ static void ReleaseSemaphores(int status, Datum arg); static IpcSemaphoreId InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems) { - int semId; + int semId, +saved_errno; semId = semget(semKey, numSems, IPC_CREAT | IPC_EXCL | IPCProtection); @@ -107,12 +108,13 @@ InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems) /* * Else complain and abort */ + saved_errno = errno; ereport(FATAL, (errmsg(could not create semaphores: %m), errdetail(Failed system call was semget(%lu, %d, 0%o)., (unsigned long) semKey, numSems, IPC_CREAT | IPC_EXCL | IPCProtection), - (errno == ENOSPC) ? + (saved_errno == ENOSPC) ? errhint(This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of @@ -133,13 +135,14 @@ static void IpcSemaphoreInitialize(IpcSemaphoreId semId, int semNum, int value) { union semun semun; + int saved_errno = errno; semun.val = value; if (semctl(semId, semNum, SETVAL, semun) < 0) ereport(FATAL, (errmsg_internal(semctl(%d, %d, SETVAL, %d) failed: %m, semId, semNum, value), - (errno == ERANGE) ? + (saved_errno == ERANGE) ? errhint(You possibly need to raise your kernel's SEMVMX value to be at least %d. Look into the PostgreSQL documentation for details., value) : 0)); diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index ac3a9fe..511be72 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -68,6 +68,7 @@ InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size) { IpcMemoryId shmid; void *memAddress; + int saved_errno = 0; shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection); @@ -137,25 +138,26 @@ InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size) * it should be. SHMMNI violation is ENOSPC, per spec. Just plain * not-enough-RAM is ENOMEM. */ + saved_errno = errno; ereport(FATAL, (errmsg(could not create shared memory segment: %m), errdetail(Failed system call was shmget(key=%lu, size=%zu, 0%o)., (unsigned long) memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection), - (errno == EINVAL) ? + (saved_errno == EINVAL) ? errhint(This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter, or possibly that it is less than your kernel's SHMMIN parameter.\n The PostgreSQL documentation contains more information about shared memory configuration.) : 0, - (errno == ENOMEM) ? + (saved_errno == ENOMEM) ? errhint(This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMALL parameter. You might need to reconfigure the kernel with larger SHMALL.\n The PostgreSQL documentation contains more information about shared memory configuration.) : 0, - (errno == ENOSPC) ? + (saved_errno == ENOSPC) ? errhint(This error does *not* mean that you have run out of disk space. It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, @@ -380,9 +382,12 @@ CreateAnonymousSegment(Size *size) } if (ptr == MAP_FAILED) + { + int saved_errno = errno; + ereport(FATAL, (errmsg(could not map anonymous shared memory: %m), - (errno == ENOMEM) ? + (saved_errno == ENOMEM) ? errhint(This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space or
Re: [HACKERS] jsonb and nested hstore
On Wed, Jan 29, 2014 at 12:03 PM, Andrew Dunstan and...@dunslane.net wrote: Only change in functionality is the addition of casts between jsonb and json. The other changes are the merge with the new json functions code, and rearrangement of the docs changes to make them less ugly. Essentially I moved the indexterm tags right out of the table as is done in some other parts of the docs. That makes the entry tags much clearer to read. I think the opening paragraphs contrasting json/jsonb need refinement. json is going to be slightly faster than jsonb for input *and* output. For example, in one application I store fairly large json objects containing pre-compiled static polygon data that is simply flipped up to google maps. This case will likely be pessimal for jsonb. For the next paragraph, I'd like to expand it a bit on 'specialized needs' and boil it down to specific use cases. Basically, json will likely be more compact in most cases and slightly faster for input/output; jsonb would be preferred in any context where processing, searching, or extensive server-side parsing is employed. If you agree, I'd be happy to do that... merlin
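A quick illustration of the contrast being discussed (my example; jsonb's normalization behavior as of this patch set, so the exact output is indicative): json stores the input text verbatim, while jsonb parses it into a binary form, which is why json wins on pure input/output and jsonb wins on processing:

-- json keeps the text exactly as given; jsonb normalizes it
-- (whitespace, duplicate keys and key order may change).
SELECT '{"b": 2, "a": 1, "a": 0}'::json;   -- {"b": 2, "a": 1, "a": 0}
SELECT '{"b": 2, "a": 1, "a": 0}'::jsonb;  -- {"a": 0, "b": 2}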
Re: [HACKERS] jsonb and nested hstore
On 01/29/2014 12:46 PM, Merlin Moncure wrote: I think the opening paragraphs contrasting json/jsonb need refinement. json is going to be slightly faster than jsonb for input *and* output. For example, in one application I store fairly large json objects containing pre-compiled static polygon data that is simply flipped up to google maps. This case will likely be pessimal for jsonb. For the next paragraph, I'd like to expand it a bit on 'specialized needs' and boil it down to specific use cases. Basically, json will likely be more compact in most cases and slightly faster for input/output; jsonb would be preferred in any context where processing, searching, or extensive server-side parsing is employed. If you agree, I'd be happy to do that... Please take a stab at it, I'll be happy to revise it. I was working on doing a two-column table comparison chart; I still think that's the best way to go. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Removing xloginsert_slots?
On 29. Januar 2014 20:51:38 MEZ, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 29, 2014 at 8:22 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-01-29 21:59:05 +0900, Michael Paquier wrote: The undocumented GUC called xloginsert_slots has been introduced by commit 9a20a9b. It is mentioned by the commit that this parameter should be removed before the release. Wouldn't it be a good time to remove this parameter soon? I imagine that removing it before the beta would make sense so now is perhaps too early... Either way, attached is a patch doing so... I'd rather wait till somebody actually has done some benchmarks. I don't think we're more clueful about it now than back when the patch went in. And such benchmarking is more likely during beta, so... Well, it's either got to go away, or get documented, IMHO. Yes, all I am saying is that I'd like to wait till things have calmed down a bit, so it actually makes sense to run bigger benchmarks. I don't think removing the option is that urgent. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Planning time in explain/explain analyze
On Wed, Jan 29, 2014 at 3:13 PM, Andreas Karlsson andr...@proxel.se wrote: On 01/29/2014 09:01 PM, Robert Haas wrote: Cool. I propose adding one parameter rather than two to ExplainOnePlan() and making it of type instr_time * rather than passing an instr_time and a bool. If you don't want to include the planning time, pass NULL; if you do, pass a pointer to the instr_time you want to print. I think that would come out slightly neater than what you have here. Excellent suggestion, thanks! New patch attached. Committed with minor doc changes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 6:06 AM, Andrew Dunstan and...@dunslane.net wrote: Importance is in the eye of the beholder. As far as I'm concerned, min and max are of FAR less value than stddev. If stddev gets left out I'm going to be pretty darned annoyed, especially since the benchmarks seem to show the marginal cost as being virtually unmeasurable. If those aren't enough for you, perhaps you'd like to state what sort of tests would satisfy you. I certainly agree that stddev is far more valuable than min and max. I'd be inclined to not accept the latter on the grounds that they aren't that useful. So, am I correct in my understanding: The benchmark performed here [1] was of a standard pgbench SELECT workload, with no other factor influencing the general overhead (unlike the later test that queried pg_stat_statements as well)? I'd prefer to have seen the impact on a 32 core system, where spinlock contention would naturally be more likely to appear, but even still I'm duly satisfied. There was no testing of the performance impact prior to 6 days ago, and I'm surprised it took Mitsumasa that long to come up with something like this, since I raised this concern about 3 months ago. [1] http://www.postgresql.org/message-id/52e10e6a.5060...@lab.ntt.co.jp -- Peter Geoghegan
Re: [HACKERS] Planning time in explain/explain analyze
On 01/29/2014 10:10 PM, Robert Haas wrote: Committed with minor doc changes. Thanks! /Andreas
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 04:14 PM, Peter Geoghegan wrote: On Wed, Jan 29, 2014 at 6:06 AM, Andrew Dunstan and...@dunslane.net wrote: Importance is in the eye of the beholder. As far as I'm concerned, min and max are of FAR less value than stddev. If stddev gets left out I'm going to be pretty darned annoyed, especially since the benchmarks seem to show the marginal cost as being virtually unmeasurable. If those aren't enough for you, perhaps you'd like to state what sort of tests would satisfy you. I certainly agree that stddev is far more valuable than min and max. I'd be inclined to not accept the latter on the grounds that they aren't that useful. So, am I correct in my understanding: The benchmark performed here [1] was of a standard pgbench SELECT workload, with no other factor influencing the general overhead (unlike the later test that queried pg_stat_statements as well)? I'd prefer to have seen the impact on a 32 core system, where spinlock contention would naturally be more likely to appear, but even still I'm duly satisfied. I could live with just stddev. Not sure others would be so happy. Glad we're good on the performance impact front. cheers andrew
[HACKERS] tests for client programs
Hello, I am looking at this patch. It is a great idea, and I am sure we want this patch - it has been requested and proposed several times. Some tips: a) the possibility to run only selected tests b) the possibility to verify a generated file against an expected file c) detection of some warnings (expected/unexpected) P.S. Some tests fail when another Postgres is up. This should be checked at start, and a warning raised or testing stopped. Regards Pavel ok 4 - pg_ctl initdb waiting for server to startLOG: could not bind IPv6 socket: Address already in use HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry. LOG: could not bind IPv4 socket: Address already in use HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry. WARNING: could not create listen socket for localhost FATAL: could not create any TCP/IP sockets stopped waiting pg_ctl: could not start server Examine the log output. not ok 5 - pg_ctl start -w # Failed test 'pg_ctl start -w' # at /home/pavel/src/postgresql/src/bin/pg_ctl/../../../src/test/perl/TestLib.pm line 89. waiting for server to startLOG: could not bind IPv6 socket: Address already in use HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry. LOG: could not bind IPv4 socket: Address already in use HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry. WARNING: could not create listen socket for localhost FATAL: could not create any TCP/IP sockets stopped waiting pg_ctl: could not start server Examine the log output. not ok 6 - second pg_ctl start succeeds # Failed test 'second pg_ctl start succeeds' # at /home/pavel/src/postgresql/src/bin/pg_ctl/../../../src/test/perl/TestLib.pm line 89. pg_ctl: PID file /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data/postmaster.pid does not exist Is server running? not ok 7 - pg_ctl stop -w # Failed test 'pg_ctl stop -w' # at /home/pavel/src/postgresql/src/bin/pg_ctl/../../../src/test/perl/TestLib.pm line 89. ok 8 - second pg_ctl stop fails pg_ctl: PID file /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data/postmaster.pid does not exist Is server running? starting server anyway pg_ctl: could not read file /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data/postmaster.opts not ok 9 - pg_ctl restart with server not running # Failed test 'pg_ctl restart with server not running' # at /home/pavel/src/postgresql/src/bin/pg_ctl/../../../src/test/perl/TestLib.pm line 89. pg_ctl: PID file /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data/postmaster.pid does not exist Is server running? starting server anyway pg_ctl: could not read file /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data/postmaster.opts not ok 10 - pg_ctl restart with server running # Failed test 'pg_ctl restart with server running' # at /home/pavel/src/postgresql/src/bin/pg_ctl/../../../src/test/perl/TestLib.pm line 89. pg_ctl: PID file /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data/postmaster.pid does not exist Is server running? Bailout called. Further testing stopped: system pg_ctl stop -D /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data -m fast failed: 256 Bail out!
system pg_ctl stop -D /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data -m fast failed: 256 FAILED--Further testing stopped: system pg_ctl stop -D /home/pavel/src/postgresql/src/bin/pg_ctl/testf2KW/data -m fast failed: 256 make[1]: *** [check] Error 255 make[1]: Leaving directory `/home/pavel/src/postgresql/src/bin/pg_ctl' make: *** [check-pg_ctl-recurse] Error 2 make: Leaving directory `/home/pavel/src/postgresql/src/bin'
Re: [HACKERS] Suspicion of a compiler bug in clang: using ternary operator in ereport()
Christian Kruse christ...@2ndquadrant.com writes: Your reasoning sounds quite logical to me. Thus I did a grep -RA 3 ereport src/* | less and looked for ereport calls with errno in it. I found quite a few; attached you will find a patch addressing that issue. Excellent, thanks for doing the legwork. I'll take care of getting this committed and back-patched. regards, tom lane
[HACKERS] commit fest 2014-01 week 2 report
Last week: Status Summary. Needs Review: 66, Waiting on Author: 16, Ready for Committer: 9, Committed: 20, Returned with Feedback: 2. Total: 113. This week: Status Summary. Needs Review: 49, Waiting on Author: 18, Ready for Committer: 9, Committed: 33, Returned with Feedback: 3, Rejected: 1. Total: 113. I wrote to all patch authors who didn't sign up as reviewers yet, and a number of them signed up. Several asked me for beginner-level patches to look at, but it seems we're all out of beginner-level patches this time. A lot of patches don't have any reviewer yet, for the same reason. I reminded all reviewers to send in their first review. In the past, we would soon start unassigning inactive reviewers, but that seems a bit futile here, for the reason above. So while there is decent progress, you can do your own math about where this is likely to end. There are a lot of complex patches under consideration, and we might have a painful end game.
Re: [HACKERS] jsonb and nested hstore
On Wed, Jan 29, 2014 at 3:56 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/29/2014 01:03 PM, Andrew Dunstan wrote: On 01/27/2014 10:43 PM, Andrew Dunstan wrote: On 01/26/2014 05:42 PM, Andrew Dunstan wrote: Here is the latest set of patches for nested hstore and jsonb. Because it's so large I've broken this into two patches and compressed them. The jsonb patch should work standalone. The nested hstore patch depends on it. All the jsonb functions now use the jsonb API - there is no more turning jsonb into text and reparsing it. At this stage I'm going to be starting cleanup on the jsonb code (indentation, error messages, comments etc.) as well as getting up some jsonb docs. Here is an update of the jsonb part of this. Changes: * there is now documentation for jsonb * most uses of elog() in json_funcs.c are replaced by ereport(). * indentation fixes and other tidying. No changes in functionality. Further update of jsonb portion. Only change in functionality is the addition of casts between jsonb and json. The other changes are the merge with the new json functions code, and rearrangement of the docs changes to make them less ugly. Essentially I moved the indexterm tags right out of the table as is done in some other parts of the docs. That makes the entry tags much clearer to read. Updated to apply cleanly after recent commits.

OK, great. This is really fabulous. So far most everything feels natural and good. I see something odd in terms of the jsonb use case coverage. One of the major headaches with json deserialization presently is that there's no easy way to move a complex (record- or array-containing) json structure into a row object. For example:

create table bar(a int, b int[]);

postgres=# select jsonb_populate_record(null::bar, '{"a": 1, "b": [1,2]}'::jsonb, false);
ERROR: cannot populate with a nested object unless "use_json_as_text" is true

I find the use_json_as_text argument here to be pretty useless (unlike in the json_build to_record variants, where it at least provides some hope for an escape hatch) for handling this, since it will just continue to fail:

postgres=# select jsonb_populate_record(null::bar, '{"a": 1, "b": [1,2]}'::jsonb, true);
ERROR: missing "]" in array dimensions

OTOH, the nested hstore handles this no questions asked:

postgres=# select * from populate_record(null::bar, 'a=>1, b=>"{1,2}"'::hstore);
 a |   b
---+-------
 1 | {1,2}

So, if you need to convert a complex json to a row type, the only effective way to do that is like this:

postgres=# select * from populate_record(null::bar, '{"a": 1, "b": [1,2]}'::json::hstore);
 a |   b
---+-------
 1 | {1,2}

Not a big deal really. But it makes me wonder (now that we have the internal capability of properly mapping to a record) why *both* the json/jsonb populate_record variants shouldn't do what the nested hstore does when the 'as_text' flag is false. That would demolish the error and remove the dependency on hstore for effective rowtype mapping. In an ideal world the json_build 'to_record' variants would behave similarly, I think, although there's no existing hstore analog, so I'm assuming it's a non-trivial amount of work.

Now, if we're agreed on that, I then also wonder if the 'as_text' argument needs to exist at all for the populate functions, except for backwards compatibility on the json side (not jsonb). For non-complex structures it does best-effort casting anyway, so the flag is moot.
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 01/29/2014 05:37 PM, Merlin Moncure wrote: [...] But it makes me wonder (now that we have the internal capability of properly mapping to a record) why *both* the json/jsonb populate_record variants shouldn't do what the nested hstore does when the 'as_text' flag is false. That would demolish the error and remove the dependency on hstore for effective rowtype mapping. In an ideal world the json_build 'to_record' variants would behave similarly, I think, although there's no existing hstore analog, so I'm assuming it's a non-trivial amount of work. Now, if we're agreed on that, I then also wonder if the 'as_text' argument needs to exist at all for the populate functions, except for backwards compatibility on the json side (not jsonb).
Well, I could certainly look at making the populate_record{set} and to_record{set} logic handle types that are arrays or composites inside the record. It might not be terribly hard to do - not sure. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
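[For concreteness, a sketch of what that unified behavior would look like, reusing Merlin's example table. The two-argument form and the output shown are the desired hstore-like behavior under discussion, not what jsonb_populate_record does today:

create table bar(a int, b int[]);

-- desired: nested JSON arrays/composites map directly onto the
-- corresponding record fields, as populate_record already does for hstore
select * from jsonb_populate_record(null::bar, '{"a": 1, "b": [1,2]}'::jsonb);
--  a |   b
-- ---+-------
--  1 | {1,2}
]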
Re: [HACKERS] Changeset Extraction v7.3
It seems to me that the terms physical, logical, and binary are always relative to the perspective of the component being worked on. Physical often means one level of abstraction below mine, upon which my work builds. Logical means my work's level of abstraction. And binary means data which I'm not going to pretend I know or care how to interpret. So I'd suggest block and tuple, perhaps. On Wed, Jan 29, 2014 at 4:25 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: Andreas Karlsson wrote: On 01/28/2014 10:56 PM, Andres Freund wrote: On 2014-01-28 21:48:09 +, Thom Brown wrote: On 28 January 2014 21:37, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 28, 2014 at 11:53 AM, Robert Haas robertmh...@gmail.com wrote: The point of Andres's patch set is to introduce a new technology called logical decoding; that is, the ability to get a replication stream that is based on changes to tuples rather than changes to blocks. It could also be called logical replication. In these patches, our existing replication is referred to as physical replication, which sounds kind of funny to me. Anyone have another suggestion? Logical and Binary replication? Unfortunately changeset extraction outputs can be binary data... I think physical and logical are fine, and they seem to be well-known terminology. Oracle uses those words, and I have also seen many places use physical backup and logical backup, for example on Barman's homepage. +1 I think it also fits well with the well-known terms physical [database] design and logical design. Not that it is the same thing, but I believe that every database person, when faced with the distinction physical versus logical, will conclude that the former refers to data placement or block structure, while the latter refers to the semantics of the data being stored. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 01/29/2014 02:37 PM, Merlin Moncure wrote: create table bar(a int, b int[]); postgres=# select jsonb_populate_record(null::bar, '{a: 1, b: [1,2]}'::jsonb, false); ERROR: cannot populate with a nested object unless use_json_as_text is true Hmmm. What about just making any impossibly complex objects type JSON? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan and...@dunslane.net writes: I could live with just stddev. Not sure others would be so happy. FWIW, I'd vote for just stddev, on the basis that min/max appear to add more to the counter update time than stddev does; you've got this:

+ e->counters.total_sqtime += total_time * total_time;

versus this:

+ if (e->counters.min_time > total_time || e->counters.min_time == EXEC_TIME_INIT)
+     e->counters.min_time = total_time;
+ if (e->counters.max_time < total_time)
+     e->counters.max_time = total_time;

I think on most modern machines, a float multiply-and-add is pretty darn cheap; a branch that might or might not be taken, OTOH, is a performance bottleneck. Similarly, the shared memory footprint hit is more: two new doubles for min/max versus one for total_sqtime (assuming we're happy with the naive stddev calculation). If we felt that min/max were of similar value to stddev then this would be mere nitpicking. But since people seem to agree they're worth less, I'm thinking the cost/benefit ratio isn't there. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
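[For reference, the "naive" calculation Tom mentions derives the standard deviation from just the accumulated sums per entry; a quick SQL illustration with made-up values for calls, total_time and total_sqtime:

-- stddev = sqrt(E[x^2] - E[x]^2) = sqrt(total_sqtime/n - (total_time/n)^2)
select sqrt(total_sqtime / calls - (total_time / calls) ^ 2) as stddev_time
from (values (100::float8, 250.0::float8, 700.0::float8))
       as t(calls, total_time, total_sqtime);
-- => ~0.866  (mean = 2.5, E[x^2] = 7.0)
]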
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On Mon, Jan 27, 2014 at 10:54 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the write buffer lock that the caller will have provided (per its comments) - it also drops the buffer pin. It calls _bt_insert_parent() last, which was previously only called last thing by _bt_insertonpg() (some of the time), and _bt_insertonpg() is indeed documented to drop both the lock and the pin. And if you look at _bt_moveright() again, you'll see that there is a tacit assumption that the buffer pin isn't dropped, or at least that 'opaque' (the BTPageOpaque BT special page area pointer) continues to point to the same page held in the same buffer, even though the code doesn't set the 'opaque' pointer again and it may not point to a pinned buffer or even the appropriate buffer. Ditto 'page'. So 'opaque' may point to the wrong thing on subsequent iterations - you don't do anything with the return value of _bt_getbuf(), unlike all of the existing call sites. I believe you should store its return value, get the held page and the special pointer on the page, and assign that to the 'opaque' pointer before the next iteration (an iteration in which you very probably really do make progress not limited to completing a page split, the occurrence of which the _bt_moveright() loop gets stuck on). You know, do what is done in the non-split-handling case. It may not always be the same buffer as before, obviously. Yep, fixed. Can you explain what the fix was, please? Ping? I would like to hear some details here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
On Wed, Jan 29, 2014 at 12:53:02AM -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: If this only affects a handful of places, then sure, let's go ahead and fix it. But if it's going to create a massive enough diff that we've gotta think about back-patching it, then IMHO it's totally not worth it. A quick grep search says there are well over 3000 comment lines containing '.' followed by a tab. grep isn't smart enough to tell if the tabs expand to exactly two spaces, but I bet the vast majority do. So it might not be quite as bad as the 8.1 wrap-margin change, but it'd be extensive. I have cleaned up entab.c so I am ready to add a new option that removes tabs from only comments. Would you like me to create that and provide a diff at a URL? It would have to be run against all back branches. If it goes well, it could prove to be a way to incrementally improve pgindent. If not, I am afraid we are stuck with our current pgindent output. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
Bruce Momjian br...@momjian.us writes: I have cleaned up entab.c so I am ready to add a new option that removes tabs from only comments. Would you like me to create that and provide a diff at a URL? It would have to be run against all back branches. If you think you can actually tell the difference reliably in entab, sure, give it a go. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Tue, Jan 28, 2014 at 1:17 PM, Peter Eisentraut wrote: On 11/30/13, 6:59 AM, Haribabu kommi wrote: To detect whether the provided data and xlog directories are the same or not, I reused the existing make_absolute_path() code as follows. I note that initdb does not detect whether the data and xlog directories are the same. I think there is no point in addressing this only in pg_basebackup. If we want to forbid it, it should be done in initdb foremost. Thanks for pointing it out. If the following approach is fine for identifying identical directories, then I will do the same for initdb also. I'm not sure it's worth the trouble, but if I were to do it, I'd just stat() the two directories and compare their inodes. That seems much easier and more robust than comparing path strings. stat() has problems on Windows; for that reason, the patch was written to identify the directories with a string comparison. Regards, Hari Babu
Re: [HACKERS] GIN improvements part2: fast scan
On 28.1.2014 08:29, Heikki Linnakangas wrote: On 01/28/2014 05:54 AM, Tomas Vondra wrote: Then I ran those scripts on: * 9.3 * 9.4 with Heikki's patches (9.4-heikki) * 9.4 with Heikki's and first patch (9.4-alex-1) * 9.4 with Heikki's and both patches (9.4-alex-2) It would be good to also test with unpatched 9.4 (ie. git master). The packed posting lists patch might account for a large part of the differences between 9.3 and the patched 9.4 versions. - Heikki

Hi, the e-mail I sent yesterday apparently did not make it to the list, probably because of the attachments, so I'll just link them this time. I added the results from 9.4 master to the spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0Alm8ruV3ChcgdHJfZTdOY2JBSlkwZjNuWGlIaGM0REE It's a bit cumbersome to analyze though, so I've quickly hacked up a simple jqplot page that allows comparing the results. It's available here: http://www.fuzzy.cz/tmp/gin/ It's likely there are some quirks and issues - let me know about them. The ODS with the data is available here: http://www.fuzzy.cz/tmp/gin/gin-scan-benchmarks.ods

Three quick basic observations:

(1) The current 9.4 master is consistently better than 9.3 by about 15% on rare words, and up to 30% on common words. See the chart for 6-word queries: http://www.fuzzy.cz/tmp/gin/6-words-rare-94-vs-93.png With 3-word queries the effects are even clearer, especially with the common words.

(2) Heikki's patches seem to work OK, i.e. they improve the performance, but only with rare words: http://www.fuzzy.cz/tmp/gin/heikki-vs-94-rare.png With 3 words the impact is much stronger than with 6 words, presumably because it depends on how frequent the combination of words is (~ multiplication of probabilities). See http://www.fuzzy.cz/tmp/gin/heikki-vs-94-3-common-words.png and http://www.fuzzy.cz/tmp/gin/heikki-vs-94-6-common-words.png for a comparison of 9.4 master vs. 9.4 + Heikki's patches.

(3) A file with explain plans for 4 queries suffering a ~2x slowdown, on 9.4 master and with Heikki's patches, is available here: http://www.fuzzy.cz/tmp/gin/queries.txt All the queries have 6 common words, and the explain plans look just fine to me - exactly like the plans for other queries.

Two things now caught my eye. First, some of these queries actually have words repeated - either exactly, like 'database database', or in negated form, like '!anything anything'. Second, while generating the queries I used dumb frequency, where only exact matches count, i.e. write != written etc. But the actual number of hits may be much higher - for example 'write' matches exactly just 5% of documents, but using @@ it matches more than 20%. I don't know if that's the actual cause though.

regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
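[To make the frequency mismatch concrete, a sketch of the two ways of counting, against a hypothetical docs(body text) table; the english stemmer folds forms like 'writes' and 'writing' into the same lexeme as 'write', so the tsquery count is typically much higher than the exact-word count:

-- exact-word frequency (what the query generator counted):
select count(*) from docs where body ~ '\mwrite\M';

-- full-text frequency (what @@ actually matches, via stemming):
select count(*) from docs
where to_tsvector('english', body) @@ to_tsquery('english', 'write');
]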
Re: [HACKERS] Suspicion of a compiler bug in clang: using ternary operator in ereport()
Christian Kruse christ...@2ndquadrant.com writes: Your reasoning sounds quite logical to me. Thus I did a grep -RA 3 ereport src/* | less and looked for ereport calls with errno in it. I found quite a few, attached you will find a patch addressing that issue. Committed. I found a couple of errors in your patch, but I think everything is addressed in the patch as committed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FOREIGN KEY ... CONCURRENTLY
David Fetter wrote: 2) Is there another way to solve the problem of adding a foreign key constraint that points at a busy table? Add it as NOT VALID and do a later VALIDATE CONSTRAINT step, after the patch to reduce lock levels for ALTER TABLE goes in, perhaps? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
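[Spelled out, that two-step approach looks like this (table and constraint names are illustrative); the NOT VALID form skips the full-table validation scan at ADD time, and VALIDATE CONSTRAINT performs it later - with a weaker lock once the ALTER TABLE lock-reduction patch mentioned above is in:

-- step 1: add the constraint without validating existing rows
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- step 2: validate existing rows later, outside the busy period
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;
]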
Re: [HACKERS] updated emacs configuration
On Wed, Jan 29, 2014 at 07:31:29PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I have cleaned up entab.c so I am ready to add a new option that removes tabs from only comments. Would you like me to create that and provide a diff at a URL? It would have to be run against all back branches. If you think you can actually tell the difference reliably in entab, sure, give it a go.

OK, I have modified entab.c in a private patch to only process text inside comments, and not process leading whitespace; patch attached. I basically ran 'entab -o -t4 -d' on the C files. The results are here, in context, plain, and unified format:

http://momjian.us/expire/entab_comment.cdiff
http://momjian.us/expire/entab_comment.pdiff
http://momjian.us/expire/entab_comment.udiff

and their line counts:

89741 entab_comment.cdiff
26351 entab_comment.pdiff
50503 entab_comment.udiff

I compute 6627 lines as modified. What I did not do is handle _only_ cases with periods before the tabs. Should I try that?

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

diff --git a/src/tools/entab/entab.c b/src/tools/entab/entab.c
new file mode 100644
index 3b849f2..3fd2997
*** a/src/tools/entab/entab.c
--- b/src/tools/entab/entab.c
*** main(int argc, char **argv)
*** 51,63 ****
  {
  	int		tab_size = 8,
  			min_spaces = 2,
  			protect_quotes = FALSE,
  			del_tabs = FALSE,
  			clip_lines = FALSE,
  			prv_spaces,
  			col_in_tab,
  			escaped,
! 			nxt_spaces;
  	char	in_line[BUFSIZ],
  			out_line[BUFSIZ],
  			*src,
--- 51,66 ----
  {
  	int		tab_size = 8,
  			min_spaces = 2,
+ 			only_comments = FALSE,
  			protect_quotes = FALSE,
  			del_tabs = FALSE,
  			clip_lines = FALSE,
+ 			in_comment = FALSE,
  			prv_spaces,
  			col_in_tab,
  			escaped,
! 			nxt_spaces,
! 			leading_whitespace;
  	char	in_line[BUFSIZ],
  			out_line[BUFSIZ],
  			*src,
*** main(int argc, char **argv)
*** 74,80 ****
  	if (strcmp(cp, "detab") == 0)
  		del_tabs = 1;
! 	while ((ch = getopt(argc, argv, "cdhqs:t:")) != -1)
  		switch (ch)
  		{
  			case 'c':
--- 77,83 ----
  	if (strcmp(cp, "detab") == 0)
  		del_tabs = 1;
! 	while ((ch = getopt(argc, argv, "cdhoqs:t:")) != -1)
  		switch (ch)
  		{
  			case 'c':
*** main(int argc, char **argv)
*** 83,88 ****
--- 86,94 ----
  			case 'd':
  				del_tabs = TRUE;
  				break;
+ 			case 'o':
+ 				only_comments = TRUE;
+ 				break;
  			case 'q':
  				protect_quotes = TRUE;
  				break;
*** main(int argc, char **argv)
*** 97,102 ****
--- 103,109 ----
  			fprintf(stderr, "USAGE: %s [ -cdqst ] [file ...]\n\
  	-c (clip trailing whitespace)\n\
  	-d (delete tabs)\n\
+ 	-o (only comments)\n\
  	-q (protect quotes)\n\
  	-s minimum_spaces\n\
  	-t tab_width\n",
*** main(int argc, char **argv)
*** 134,146 ****
  		if (escaped == FALSE)
  			quote_char = ' ';
  		escaped = FALSE;
  		/* process line */
  		while (*src != NUL)
  		{
  			col_in_tab++;
  			/* Is this a potential space/tab replacement? */
! 			if (quote_char == ' ' && (*src == ' ' || *src == '\t'))
  			{
  				if (*src == '\t')
  				{
--- 141,163 ----
  		if (escaped == FALSE)
  			quote_char = ' ';
  		escaped = FALSE;
+ 		leading_whitespace = TRUE;
  		/* process line */
  		while (*src != NUL)
  		{
  			col_in_tab++;
+
+ 			/* look backward so we handle slash-star-slash properly */
+ 			if (!in_comment && src > in_line &&
+ 				*(src - 1) == '/' && *src == '*')
+ 				in_comment = TRUE;
+ 			else if (in_comment && *src == '*' && *(src + 1) == '/')
+ 				in_comment = FALSE;
+
  			/* Is this a potential space/tab replacement? */
! 			if ((!only_comments || (in_comment && !leading_whitespace)) &&
! 				quote_char == ' ' && (*src == ' ' || *src == '\t'))
  			{
  				if (*src == '\t')
  				{
*** main(int argc, char **argv)
*** 192,197 ****
--- 209,218 ----
  			/* Not a potential space/tab replacement */
  			else
  			{
+ 				/* allow leading stars in comments */
+ 				if (leading_whitespace && *src != ' ' && *src != '\t' &&
+ 					(!in_comment || *src != '*'))
+ 					leading_whitespace = FALSE;
  				/* output accumulated spaces */
  				output_accumulated_spaces(prv_spaces, &dst);
  				/* This can only happen in a quote. */

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/29 16:58), Peter Geoghegan wrote: On Tue, Jan 28, 2014 at 10:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: By the way, the latest pg_stat_statements might affect performance on Windows systems, because it uses the fflush() system call every time a new entry is created in pg_stat_statements, and it calls many fread()s to warm the file cache. This statement doesn't seem to have much to do with the patch as committed. You could make a strong case for the patch improving performance in practice for many users, by allowing us to increase the pg_stat_statements.max default value to 5,000, 5 times the previous value. The real expense of creating a new entry is the exclusive locking of the hash table, which blocks *everything* in pg_stat_statements. That isn't expanded at all, since queries are only written with a shared lock, which only blocks the creation of new entries, which was already relatively expensive. In particular, it does not block the maintenance of costs for all already-observed entries in the hash table. It's obvious that simply reducing the pressure on the cache will improve matters a lot, which for many users the external texts patch does. Since Mitsumasa has compared that patch and at least one of his proposed pg_stat_statements patches on several occasions, I will re-iterate the difference: any increased overhead from the external texts patch is paid only when a query is first entered into the hash table. Then, there is obviously and self-evidently no additional overhead from contention for any future execution of the same query, no matter what, indefinitely (the exclusive locking section of creating a new entry does not do I/O, except in fantastically unlikely circumstances). So for many of the busy production systems I work with, that's the difference between a cost paid perhaps tens of thousands of times a second, and a cost paid every few days or weeks. Even if he is right about a write() taking an unreasonably large amount of time on Windows, the consequences felt as pg_stat_statements LWLock contention would be very limited. I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person who proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciable increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing.

I'd like to know the truth and the facts about your patch, rather than your argument :-) So I created a detailed pg_stat_statements benchmark tool using pgbench. This tool creates a file of unique SQLs, and it is only for measuring pg_stat_statements performance, because it only updates pg_stat_statements data and doesn't write to disk at all. It's a fair benchmark.
[usage]
perl create_sql.pl > file.sql
psql -h xxx.xxx.xxx.xxx mitsu-ko -c 'SELECT pg_stat_statements_reset()'
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -n -T 180 -f file.sql

[part of the sample sqls file; they are executed in pgbench]
SELECT 1-1/9+5*8*6+5+9-6-3-4/9%7-2%7/5-9/7+2+9/7-1%5%9/3-4/4-9/1+5+5/6/5%2+1*2*3-8/8+5-3-8-4/8+5%2*2-2-5%6+4
SELECT 1%9%7-8/5%6-1%2*2-7+9*6-2*6-9+1-2*9+6+7*8-4-2*1%3/7-1%4%9+4+7/5+4/2-3-5%8/3/7*6-1%8*6*1/7/2%5*6/2-3-9
SELECT 1%3*2/8%6%5-3%1+1/6*6*5/9-2*5+6/6*5-1/2-6%4%8/7%2*7%5%9%5/9%1%1-3-9/2*1+1*6%8/2*4/1+6*7*1+5%6+9*1-9*6
...

I tested some methods, including the old pgss, the old pgss with my patch, and the new pgss. The test server and postgresql.conf are the same as in my earlier test in this ML thread. The test methods and the test results are here:

[methods]
method 1: old pgss, pg_stat_statements.max=1000 (default)
method 2: old pgss with my patch, pg_stat_statements.max=1000 (default)
peter 1 : new pgss (Peter's patch), pg_stat_statements.max=5000 (default)
peter 2 : new pgss (Peter's patch), pg_stat_statements.max=1000
[for reference]
method 5: old pgss, pg_stat_statements.max=5000
method 6: old pgss with my patch, pg_stat_statements.max=5000

[results]
method   |  try1  |  try2  |  try3  | performance degradation ratio
---------+--------+--------+--------+------------------------------
method 1 |  6.546 |  6.558 |  6.638 |  0% (reference score)
method 2 |  6.527 |  6.556 |  6.574 |  1%
peter 1  |  5.204 |  5.203 |  5.216 | 20%
peter 2  |  4.241 |  4.236 |  4.262 | 35%
method 5 |  5.931 |  5.848 |  5.872 | 11%
method 6 |  5.794 |  5.792 |  5.776 | 12%

The new pgss degrades performance much more than the old pgss, and I cannot see performance scaling. I understand that his argument was "My patch reduces time of LWLock in pg_stat_statements, it is good for performance. However, Kondo's patch will be degrade performance in