Re: [HACKERS] Problem with displaying wide tables in psql
Hi.

How can I pass or set the value of a pset variable for a regression test?

The code with the ioctl was copied from the print_aligned_text function, which has a long history:

    43ee2282 (Bruce Momjian 2008-05-16 16:59:05 + 680)  if (ioctl(fileno(stdout), TIOCGWINSZ, &screen_size) != -1)

2014-04-08 20:27 GMT+04:00 Greg Stark st...@mit.edu:
> On Tue, Apr 8, 2014 at 12:19 PM, Andres Freund and...@2ndquadrant.com wrote:
>> I don't think this is easily testable that way - doesn't it rely on
>> determining the width of the terminal? Which you won't have when started
>> from pg_regress?
>
> There's a pset variable to set the target width so at least the formatting
> code can be tested. It would be nice to have the ioctl at least get called
> on the regression farm so we're sure we aren't doing something unportable.
>
> --
> greg

--
Best regards,
Sergey Muraviov
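For what it's worth, the variable Greg mentions can be driven from a regression script; a sketch (the width and query are illustrative, assuming psql's wrapped-format pset options):

    \pset format wrapped
    \pset columns 40
    SELECT repeat('x', 100) AS wide_column;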
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On 04/09/2014 01:18 AM, Andrew Dunstan wrote:
> On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
>> On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Well, let me see if I understand the situation correctly:
>>> * jsonb_ops supports more operators
>>> * jsonb_hash_ops produces smaller, better-performing indexes
>>> * jsonb_ops falls over on inputs with wide field values, but
>>>   jsonb_hash_ops does not
>> There might be some compelling cases for indexing existence rather than
>> containment, since the recheck flag isn't set there, but in general this
>> summary seems sound. I would say that broadly, existence is a less useful
>> operator than containment, and so jsonb_hash_ops is broadly more
>> compelling. I didn't propose changing the default due to concerns about
>> the POLA, but I'm happy to be told that those concerns were out of
>> proportion to the practical benefits of a different default.
> I tend to agree with Tom that POLA will be more violated by the default
> ops class not being able to index some values.

Yeah.

<rant>

Both of the operator classes are actually much less flexible than I'd like. Firstly, they index everything. In many cases, that's not what you want, so you end up with much larger indexes than necessary. Secondly, jsonb_ops indexes all values separately from the keys. That makes the index pretty much useless for a query on, say, WHERE json @> '{"needs_processing": true}', if all the rows also contain a key-value pair "active": true. Thirdly, inequality operators are not supported; you can't search for rows with (the JSON-syntax equivalent of) price > 12.3. Fourthly, sometimes you would want to include the path to an entry in the key, sometimes not.

If I understood correctly the way jsonb_hash_ops works, the limitation compared to jsonb_ops is that it cannot be used for foo ? 'bar' type queries. And the reason for that limitation is that it hashes the whole path to the key; the naked values are not indexed separately. But why not? jsonb_ops does - why is that decision related to whether you hash or not? Or it could index both. Sure, it would be wasteful when you don't need to support foo ? 'bar', but the point is that it should be up to the DBA to decide, based on his needs.

As the code stands, you don't have a choice on any of those things. The decisions have been made by us, PostgreSQL developers. The only choice you have is between jsonb_ops and jsonb_hash_ops, with a strange combination of tradeoffs in both. Sure, they're still useful, if not optimal, for a wide range of applications. For more complicated cases, you will have to resort to expression indexes.

It bugs me greatly that the underlying indexam could do all those things; we're just not exposing the capability. ISTM we need a way to parameterize opclasses, so that when you create the index, you specify the above things.

</rant>

The ship has clearly sailed to add parameterized opclasses to 9.4, but let's keep it in mind when we decide on the defaults.

In the absence of parameterizable opclasses, it would be much more flexible to have opclasses that index keys, values, key-value pairs and paths separately, instead of the current jsonb_ops and jsonb_hash_ops opclasses which index all of those in the same index. That way, if you e.g. only ever query on the existence of a key, you'd only need to index the keys. I don't understand how we ended up with the current dichotomy of jsonb_ops and jsonb_hash_ops...

- Heikki
[HACKERS] Pointer to structure in ECPG
Hi All,

I tried to use a pointer to an array to fetch the results of a query. The test case test_select.pgc is attached. Changes specific to one's environment are needed before that test can be tried. Otherwise, you may try the file pointer_to_struct.pgc in the attached patch, by putting it in the ecpg/test directory.

The file test_select.pgc compiles fine, but when run it produces error -203 and produces garbage, because the result of FETCH was not saved in the array pointed to by the pointer:

    ./test_select
    SQL error: SQL error -203 on line 40
    empno=-163754450, ename=, job=
    empno=634004672, ename=�, job=
    empno=634004624, ename=�, job=

This happens because of two problems:

1. When dumping individual members of structure emp in test_select.pgc, when a pointer to that structure is dumped, its array size is dumped as 1 instead of 0 or a negative number like scalar variable pointers.

2. As discussed in the other thread about using arrays within structures in ECPG, the offsets of members of structures are not dumped correctly for an unbounded array.

Problem 1 reason:
When dumping individual members of structure emp in test_select.pgc, when a pointer to that structure is dumped, the call is routed through the code path shown below:

    #0  ECPGdump_a_type (o=0x6cc4f0, name=0x6d2190 "empno", type=0x6d2130, brace_level=-1, ind_name=0x4c0ae8 "no_indicator", ind_type=0x6c8260, ind_brace_level=-1, prefix=0x6d1a20 "emp2->", ind_prefix=0x0, arr_str_siz=0x6d2070 "0", struct_sizeof=0x6d0c60 "sizeof( struct employee )", ind_struct_sizeof=0x0) at type.c:247
    #1  0x0042f21d in ECPGdump_a_struct (o=0x6cc4f0, name=0x6d20b0 "emp2", ind_name=0x435c20 "no_indicator", arrsiz=0x6d2070 "0", type=0x6d20d0, ind_type=0x6c8260, prefix=0x6d1a20 "emp2->", ind_prefix=0x0) at type.c:572
    #2  0x0042e2cc in ECPGdump_a_type (o=0x6cc4f0, name=0x6d20b0 "emp2", type=0x6d22b0, brace_level=1, ind_name=0x435c20 "no_indicator", ind_type=0x6c8260, ind_brace_level=0, prefix=0x0, ind_prefix=0x0, arr_str_siz=0x6d1e60 "0", struct_sizeof=0x0, ind_struct_sizeof=0x0) at type.c:293
    #3  0x0043335d in dump_variables (list=0x6d2310, mode=1) at variable.c:452

ECPGdump_a_type() at frame #0 then calls ECPGdump_a_simple() as follows:

    #0  ECPGdump_a_simple (o=0x6cc4f0, name=0x6d2190 "empno", type=ECPGt_int, varcharsize=0x6d0f70 "1", arrsize=0x6d2360 "-1", siz=0x6d0c60 "sizeof( struct employee )", prefix=0x6d1a20 "emp2->", counter=0) at type.c:405

ECPGdump_a_simple(), while dumping the member (here empno), dumps it with arrsize = 1 instead of -1, because of the code at lines

    523│         if (atoi(arrsize) < 0)
    524│                 strcpy(arrsize, "1");

In the ECPG run time library, when reading this variable, it is read with array size 1 and falsely interpreted as an array of size 1 instead of an unbounded array (which are read with arrsize < 0), hence giving error -203.

ECPGdump_a_struct has correctly called ECPGdump_a_type with arrsize "0" to mean an unbounded array, and ECPGdump_a_type() has correctly converted it to -1 again to differentiate a pointer to struct from a pointer to a scalar variable. This differentiation is important to dump pointer to struct and pointer to scalar variable differently in ECPGdump_a_simple(). So, in ECPGdump_a_simple() we should dump the variable with arrsize -1 if it's part of an outer structure.

The attached patch includes solutions to both problems and also a testcase preproc/pointer_to_struct.pgc. This test is a copy of the test preproc/array_of_struct.pgc, with the structure arrays replaced by pointers to structures and memory allocations. The patch is based on the development head.
Please consider this to be backpatched to 9.3 as well.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

test_select.pgc
Description: Binary data

diff --git a/src/interfaces/ecpg/preproc/type.c b/src/interfaces/ecpg/preproc/type.c
index 2982cb6..dbe2d7e 100644
--- a/src/interfaces/ecpg/preproc/type.c
+++ b/src/interfaces/ecpg/preproc/type.c
@@ -512,25 +512,33 @@ ECPGdump_a_simple(FILE *o, const char *name, enum ECPGttype type,
 			if (((atoi(arrsize) < 0) ||
 				 (atoi(arrsize) == 0 && strcmp(arrsize, "0") != 0)) &&
 				siz == NULL)
 				sprintf(variable, "(%s%s)", prefix ? prefix : "", name);
 			else
 				sprintf(variable, "&(%s%s)", prefix ? prefix : "", name);
 			sprintf(offset, "sizeof(%s)", ecpg_type_name(type));
 			break;
 	}
-
-	if (atoi(arrsize) < 0)
+
+	/*
+	 * Array size would be -1 for addresses of members within structure,
+	 * when pointer to structure is being dumped.
+	 */
+	if (atoi(arrsize) < 0 && !siz)
 		strcpy(arrsize, "1");
 
-	if (siz == NULL || strlen(siz) == 0 || strcmp(arrsize, "0") == 0 || strcmp(arrsize, "1") == 0)
+	/*
+	 * If siz, i.e. the size of the structure of which this variable is
+	 * part, is set, that gives the offset to the next element, if required
+	 */
+	if (siz == NULL || strlen(siz) == 0)
 		fprintf(o, "\n\t%s,%s,(long)%s,(long)%s,%s, ",
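For context, the usage pattern the patch aims to support looks roughly like this - a sketch only, with illustrative struct, cursor and table names, not the attached test case itself:

    EXEC SQL BEGIN DECLARE SECTION;
    struct employee
    {
        int     empno;
        char    ename[12];
        char    job[12];
    };
    /* unbounded: the preprocessor must dump this with a negative arrsize */
    struct employee *emp = NULL;
    EXEC SQL END DECLARE SECTION;

    /* allocate room for the expected rows, then fetch through the pointer */
    emp = malloc(14 * sizeof(struct employee));
    EXEC SQL DECLARE cur CURSOR FOR SELECT empno, ename, job FROM emptbl;
    EXEC SQL OPEN cur;
    EXEC SQL FETCH 14 FROM cur INTO :emp;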
Re: [HACKERS] Autonomous Transaction (WIP)
On Wed, Apr 9, 2014 at 11:03 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
> Though autonomous transaction uses mixed approach of sub-transaction as
> well as main transaction, transaction state of autonomous transaction is
> handled independently.

Whenever I was asked to have a look at implementing this feature, I always wondered about the great amount of global state that a backend maintains which is normally tied to a single top transaction. Since an AT will have the same characteristics as a top-level transaction, I wonder how you plan to separate those global state variables. Sure, we can group them in a structure and put them on a stack when an AT starts and pop them off when the original top transaction becomes active again, but finding all such global state variables is going to be tricky.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Autonomous Transaction (WIP)
On 04/09/2014 08:44 AM, Pavan Deolasee wrote:
> Whenever I was asked to have a look at implementing this feature, I always
> wondered about the great amount of global state that a backend maintains
> which is normally tied to a single top transaction. [...] finding all such
> global state variables is going to be tricky.

I would hope most of this to be solved by having one (read-only) virtual transaction and then juggling the ATs in a way similar to the current subtransaction machinery. The main differences would be that:

A) the top-level transaction stays virtual, and
B) ATs are committed independently.

This would be greatly simplified if we can accept the restriction that there is only a single snapshot per backend (not per transaction). To me this seems a completely sensible restriction.

Re syntax, I think we need a way to name the transactions, so we can have a way to switch between multiple parallel active autonomous transactions:

    BEGIN TRANSACTION myfirsttransaction;
        -- do something in myfirsttransaction
    BEGIN TRANSACTION anothertransaction;
        -- do something in anothertransaction
    SET TRANSACTION myfirsttransaction;
        -- more work in myfirsttransaction
    ROLLBACK anothertransaction;
    COMMIT; -- or COMMIT myfirsttransaction;

Cheers,
Hannu
Re: [HACKERS] Autonomous Transaction (WIP)
On 04/09/2014 02:44 PM, Pavan Deolasee wrote:
> Sure, we can group them in a structure and put them on a stack when an AT
> starts and pop them off when the original top transaction becomes active
> again, but finding all such global state variables is going to be tricky.

... not to mention the fact that extensions may rely on having their own global state.

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Autonomous Transaction (WIP)
On 04/09/2014 09:55 AM, Hannu Krosing wrote:
> This would be greatly simplified if we can accept the restriction that
> there is only a single snapshot per backend (not per transaction). To me
> this seems a completely sensible restriction.

Huh? In read committed mode, every query within a transaction gets a different snapshot.

- Heikki
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> As the code stands, you don't have a choice on any of those things. The
> decisions have been made by us, PostgreSQL developers. The only choice you
> have is between jsonb_ops and jsonb_hash_ops, with a strange combination
> of tradeoffs in both. Sure, they're still useful, if not optimal, for a
> wide range of applications. For more complicated cases, you will have to
> resort to expression indexes. It bugs me greatly that the underlying
> indexam could do all those things; we're just not exposing the capability.

Why would you ever not have to use expression indexes? Idiomatic usage of jsonb involves expression indexes, because it's desirable to index only an expression. People will want to do things like only index the nested "tags" array far more frequently than they'll want to only index keys (that is, object pair keys) in the entire document. I don't get why you'd say that they'd "resort" to expression indexes, like they're a kludge.

Have you ever tried out one of the new document databases? I suggest you do. Expression indexes on jsonb map pretty closely onto how you're frequently expected to index data in those systems. That's something that they make heavy use of. Why would you ever not really have to consider ahead of time what is important enough to be indexed, and what is not?

> ISTM we need a way to parameterize opclasses, so that when you create the
> index, you specify the above things.

That would be nice.

> In the absence of parameterizable opclasses, it would be much more
> flexible to have opclasses that index keys, values, key-value pairs and
> paths separately, instead of the current jsonb_ops and jsonb_hash_ops
> opclasses which index all of those in the same index. That way, if you
> e.g. only ever query on the existence of a key, you'd only need to index
> the keys.

I think only ever needing to index the keys is not a common use-case. It pretty much makes exactly as much sense to do so as it would with hstore, and yet hstore doesn't support that after all these years.

> I don't understand how we ended up with the current dichotomy of
> jsonb_ops and jsonb_hash_ops...

It makes sense if you consider jsonb_ops best suited to simpler hstore-style indexing, while jsonb_hash_ops is best suited to testing containment of JSON documents, potentially with lots of nesting. These documents are typically homogeneous in structure. Idiomatic usage of systems like MongoDB involves collections of fairly homogeneous documents. If there is a lot of variability in their structure within a collection, the collection more or less becomes impossible to usefully query. They aim to be flexible, but still implicitly require you to insert data with a half-way sensible/consistent structure. This makes separately indexing the keys less than compelling as a default, because there is so much duplication of keys in practice.

--
Peter Geoghegan
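An expression index of the sort described above might look like this - a sketch with hypothetical table and column names; note that only queries phrased against the indexed expression can use it:

    -- index only the nested "tags" array, not the whole document
    CREATE INDEX docs_tags_idx ON docs USING gin ((doc -> 'tags'));

    -- a containment query against the same expression can use the index
    SELECT * FROM docs WHERE doc -> 'tags' @> '["postgresql"]';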
[HACKERS] libpq api wart: no PQconnect variant that can consume output of PQconninfoParse(...)
Hi all

While working on something else, I just noticed that there's no PQconnect variant that can consume the output from PQconninfoParse(...), i.e. an array of PQconninfoOption *.

This would be a nice-to-have for times when you want to parse a connstr, modify it, and then connect with the modified options. There's PQconnectdbParams, but you have to transform the option arrays to use them with it.

This might be a worthwhile TODO (beginner) option - and may simplify some of fe-connect.c in the process.

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
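To illustrate the boilerplate such a variant would remove, here is a sketch of what callers currently have to do (error handling largely elided; the application_name override is just an example):

    #include <stdlib.h>
    #include <string.h>
    #include <libpq-fe.h>

    static PGconn *
    connect_with_override(const char *connstr)
    {
        char       *errmsg = NULL;
        PQconninfoOption *opts = PQconninfoParse(connstr, &errmsg);
        const char **keywords;
        const char **values;
        int          i,
                     n;
        PGconn      *conn;

        if (opts == NULL)
        {
            /* errmsg (allocated by libpq) has the details */
            if (errmsg)
                PQfreemem(errmsg);
            return NULL;
        }

        /* count options so we can size the parallel arrays */
        for (n = 0; opts[n].keyword != NULL; n++)
            ;
        keywords = malloc((n + 1) * sizeof(char *));
        values = malloc((n + 1) * sizeof(char *));

        for (i = 0, n = 0; opts[i].keyword != NULL; i++)
        {
            /* example modification: override application_name */
            if (strcmp(opts[i].keyword, "application_name") == 0)
            {
                free(opts[i].val);
                opts[i].val = strdup("my_tool");
            }
            if (opts[i].val == NULL)
                continue;       /* skip options that weren't set */
            keywords[n] = opts[i].keyword;
            values[n] = opts[i].val;
            n++;
        }
        keywords[n] = values[n] = NULL;

        conn = PQconnectdbParams(keywords, values, 0);

        free(keywords);
        free(values);
        PQconninfoFree(opts);
        return conn;
    }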
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
2014-04-09 Dilip kumar dilip.ku...@huawei.com:
> I would like to propose a new merge join algorithm for optimizing
> non-'=' operators ('<', '<=', '>', '>=').

Do you have a real-world example use case of such joins, to offset the extra planner time that will likely have to be paid (even for queries for which the functionality ends up not being used)? I guess there might be queries that join on "values that are not too far apart" or something, but as those cases (there typically not being a lot of "inner" rows that join with each "outer" row) are probably executed efficiently using a nested loop + index scan, I don't see the point (yet). Are you aiming for the case where the inner relation is difficult to compute and cannot be accessed using an index scan?

> selecting this new cost calculation can be implemented in planner

Hmm. Of course, the difficult part will be adding support for this in the planner, but you don't seem to have any plans for implementing that?

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
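A hypothetical example of the kind of "values that are not too far apart" join in question (table and column names are made up):

    -- match each sensor reading to events within a five-minute band
    SELECT r.id, e.id
    FROM   readings r
    JOIN   events   e ON e.ts > r.ts - interval '5 minutes'
                     AND e.ts < r.ts + interval '5 minutes';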
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 8 April 2014 21:48, Florian Pflug f...@phlo.org wrote:
> On Apr 7, 2014, at 17:41, Dean Rasheed dean.a.rash...@gmail.com wrote:
>> I've just finished reading through all the other patches, and they all
>> look OK to me. It's mostly straightforward stuff, so despite the size
>> it's hopefully all committable once the base patch goes in.
>
> Hm, I'm starting to have second thoughts about the minmax patch. The
> inverse transition functions for MIN and MAX have a non-trivial
> probability of failure - they trigger a rescan whenever the value that is
> removed isn't strictly smaller (respectively strictly larger) than the
> current maximum (respectively minimum). Thus, whenever that happens, we
> both call the inverse transition function *and* (since it fails) restart
> the aggregation.
>
> For windows based on ROWS, this isn't too bad - even if we fail every
> second time, we still avoid half the rescans, which should be a net win
> if the average window size is > 2. But for RANGE-based windows, more than
> one call of the inverse transition function must succeed for us to save
> anything, since we must successfully remove *all* peers to avoid one
> restart. This greatly increases the chance that using the inverse
> transition function hurts rather than helps - the situation is worse the
> larger the average number of peers is.

Argh, I hadn't really considered that case. I suppose any imperfect inverse transition function has the potential to make performance worse rather than better. But working out the likelihood of that isn't necessarily straightforward.

It might be possible to include some sort of heuristic based on the known information --- the number of rows P in the peer group about to be removed vs. the total number N of rows aggregated so far. If the data were fairly random, then a quick back-of-the-envelope calculation suggests that trying the inverse min/max functions would be worthwhile on average if P were less than around 0.4N, but of course different data distributions could make that much worse. Even a perfect inverse transition function isn't going to be much use if P > N/2 (e.g., imagine a case where the peer groups decrease in size exponentially), so perhaps we should be including such a check anyway.

That's also assuming that the cost of the inverse transition function is about the same as the cost of the forward function, which is not necessarily the case. Perhaps imperfect inverse transition functions should be assigned a higher cost, and that should be factored into the decision as to whether they are likely to be worth trying.

All that feels very speculative though, and I think it's too late to be considering that for 9.4, so yes, let's leave out the min/max aggregates for now.

> I've factored the BOOL_AND, BOOL_OR stuff out into a separate patch
> invtrans_bool - it previously was part of invtrans_minmax. Given the
> performance risk involved, I think that we probably shouldn't commit
> invtrans_minmax at this time. I should have brought this up earlier, but
> the issue had slipped my mind :-( Sorry for that.

I think that you're probably right that optimising window_gettupleslot() to eliminate the O(n^2) behaviour can be left to a later patch --- the existing performance benefits of this patch are enough to justify its inclusion IMO.

>> It would be nice to include the trivial optimisation to
>> window_gettupleslot() that I posted upthread, since it gives such a big
>> improvement for only a few lines of code changed.
>
> Agreed, but since it's independent from the rest of the base patch, I
> kept it as a separate patch (invtrans_optimize) instead of merging it
> into the base patch. It should probably be committed separately too.

It would be good to commit at least the base, arith and optimize patches for 9.4. I think the collecting and bool patches are also committable, but I also suspect that those aggregates are less well used, so they could be considered lower priority.

>> If you post a new patch set, I'll mark it as ready for committer
>> attention.
>
> New patch set is attached. The only difference to the previous one is
> that Forward Transitions and Inverse Transitions are now scaled with
> nloops, and that it includes your window_gettupleslot patch under the
> name invtrans_optimize.

OK, I'm marking this ready for committer attention, on the understanding that that doesn't include the invtrans_minmax patch.

Regards,
Dean
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On 04/09/2014 10:40 AM, Peter Geoghegan wrote:
> Why would you ever not have to use expression indexes? Idiomatic usage of
> jsonb involves expression indexes, because it's desirable to index only an
> expression. [...] I don't get why you'd say that they'd "resort" to
> expression indexes, like they're a kludge.

Expression indexes are definitely nice, but you have to be careful to formulate the query in exactly the same way to match the index.

> Have you ever tried out one of the new document databases? I suggest you
> do. [...] Why would you ever not really have to consider ahead of time
> what is important enough to be indexed, and what is not?

I didn't say that. On the contrary, I think the shotgun approach jsonb_ops and jsonb_hash_ops take is too broad. It should be possible to specify what to index in a more detailed fashion.

- Heikki
Re: [HACKERS] psql \d+ and oid display
On Wed, Apr 9, 2014 at 01:02:05AM -0400, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Well, that's sorta my concern. I mean, right now we've got people saying
>> "what the heck is a replica identity?". But, if the logical decoding
>> stuff becomes popular, as I hope it will, that's going to be an important
>> thing for people to adjust, and the information needs to be present in a
>> clear and easily-understood way. I haven't studied the current code in
>> detail so maybe it's fine. I just want to make sure we're not giving it
>> second-class treatment solely on the basis that it's new and people
>> aren't using it yet.
>
> I think the proposal is "don't mention the property if it has the default
> value". That's not second-class status, as long as people who know what
> the property is understand that behavior. It's just conserving screen
> space.

Yes, the lines will certainly appear if you have set _anything_ as non-default, both oids and replica identity. Kind of the same as how we show indexes and child tables if any exist (it is non-default), and unlogged tables.

I know I have fielded questions during training asking, "What is that OID line?", so I do know it confuses people. It does give me a chance to talk about it, but based on how often it is useful, I am not sure that is a win.

Ideally we would deal with oids for 9.5, but since I think everyone agrees that replica identity and oids should behave the same, we need to decide this for 9.4.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ Everyone has their own god. +
Re: [HACKERS] Firing trigger if only
Thank you, this did the job.

All the best
Re: [HACKERS] ipc_test
On Tue, Apr 8, 2014 at 02:08:25PM -0400, Greg Stark wrote:
> On Mon, Apr 7, 2014 at 10:43 AM, Robert Haas robertmh...@gmail.com wrote:
>> OK, done. One less thing to worry about when committing!
>
> Also one less thing to cause headaches with etags and similar tools. It
> always drove me nuts that I was constantly being sent to ipc_test files
> for various typedefs. Thanks!

Oh, yeah, that was a _big_ pain. Thanks!

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ Everyone has their own god. +
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> I didn't say that. On the contrary, I think the shotgun approach jsonb_ops
> and jsonb_hash_ops take is too broad. It should be possible to specify
> what to index in a more detailed fashion.

It is - use an expression index. That's by far the most important way to specify what to index in a more detailed fashion. There are others, but that's the major one. Beyond that, yes, it's necessary to carefully write your query predicate a certain way. However, a similar situation exists in MongoDB, where there is a distinction between indexes on embedded fields (which must be accessed using special dot notation) and indexes on subdocuments (which cannot be accessed using dot notation). It's late here, but I'm pretty sure that's a feature and not a limitation.

--
Peter Geoghegan
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 21 March 2014 14:22, Andres Freund and...@2ndquadrant.com wrote:
> That seems to work fairly well. On the few tests I could run on my laptop
> - I've done this during a flight - it's a small performance win in all
> cases I could test. While saving a fair amount of memory.

We've got to the stage now that saving this much memory is essential, so this patch is a must-have.

The patch does all I would expect and no more, so approach and details look good to me.

Performance? Discussed many years ago, but I suspect the micro-tuning of those earlier patches wasn't as good as it is here.

--
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] ALTER TABLE set reloptions
As part of the ALTER TABLE lock reductions, we've now agreed that reloptions should have a lock level associated with them, so we can take appropriate lock levels.

The attached patch will be applied at the start of the 9.5 dev cycle, so that any new relopt authors are aware that lock levels are needed for any new work. A later patch will then use this infrastructure to attempt to reduce lock levels, assuming no problems.

Added to the next CF, for June 2014; not for commit yet.

--
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

relopt.patch
Description: Binary data
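For context, the reloptions in question are those set with statements like the following, which today take an AccessExclusiveLock regardless of which option is being changed - a trivial illustration with a hypothetical table:

    ALTER TABLE mytable SET (fillfactor = 70);
    ALTER TABLE mytable RESET (fillfactor);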
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 05:34:42 -0400, Simon Riggs wrote:
> We've got to the stage now that saving this much memory is essential, so
> this patch is a must-have.

I think some patch like this is necessary - I am not 100% sure mine is the one true approach here, but it certainly seems simple enough.

> Performance? Discussed many years ago, but I suspect the micro-tuning of
> those earlier patches wasn't as good as it is here.

It's a small win on small machines (my laptop, 16GB), so we need to retest with 128GB shared_buffers or such on bigger ones. There PrivateRefCount previously was the source of a large portion of the cache misses...

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Pointer to structure in ECPG
Thanks Ashutosh, both patches committed and backported to the whole 9.* series.

Michael

--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
On 09 April 2014 13:31, Nicolas Barbier wrote:
> Do you have a real-world example use case of such joins, to offset the
> extra planner time that will likely have to be paid (even for queries for
> which the functionality ends up not being used)? I guess there might be
> queries that join on "values that are not too far apart" or something,
> but as those cases (there typically not being a lot of "inner" rows that
> join with each "outer" row) are probably executed efficiently using a
> nested loop + index scan, I don't see the point (yet). Are you aiming for
> the case where the inner relation is difficult to compute and cannot be
> accessed using an index scan?

I think this will be more useful when both relations are big and of almost equal size. Here we can compare the cost of the existing methods and the new approach. For such a case the planner can currently select either

(1) NLJ [Outer (seq scan) JOIN Inner (Materialize)], or
(2) NLJ [Outer (seq scan) JOIN Inner (Index)], as you mentioned.

In approach (1), the cost will be:

    NLJ cost: outer tuples * inner tuples
    I/O cost: seq page cost * (inner rel pages + outer rel pages)

And the relation can be too big for materialization.

In approach (2), the cost will be:

    NLJ cost: outer tuples * inner tuples in path
              (inner tuples in path = join selectivity * number of inner tuples)
    I/O cost: outer tuples * index rescan cost
              (the index rescan cost depends on how many pages each scan fetches)

This will be costly because the I/O cost increases with the repeated index rescans (one for each outer tuple).

In the new approach (3), the cost will be (since here, for each outer tuple, we need not scan the complete inner relation):

    MJ cost:  outer tuples * inner tuples in path
              (every outer tuple needs to scan some tuples before reaching
              the qualifying tuples)
    I/O cost: index scan cost (only one scan)

So the best-case cost will be:

    MJ cost:  outer tuples * inner tuples in path
    I/O cost: index scan cost

And the worst-case cost will be:

    MJ cost:  outer tuples * inner tuples
    I/O cost: index scan cost

So for many cases approach (3) can be cheaper, and that can be detected by the planner's cost calculation.

>> selecting this new cost calculation can be implemented in planner
>
> Hmm. Of course, the difficult part will be adding support for this in the
> planner, but you don't seem to have any plans for implementing that?

Yes, I have a plan to implement this part, but it's not completed yet.

Thanks & Regards,
Dilip
Re: [HACKERS] Pointer to structure in ECPG
Thanks a lot Michael.

On Wed, Apr 9, 2014 at 3:59 PM, Michael Meskes mes...@postgresql.org wrote:
> Thanks Ashutosh, both patches committed and backported to the whole 9.*
> series.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Apr 8, 2014 at 9:15 PM, Robert Haas robertmh...@gmail.com wrote:
> Apparently not. However, I'm fairly sure this is a step toward addressing
> the complaints previously raised, even if there may be some details
> people still want changed, so I've gone ahead and committed it.

A few observations:

1. One new warning has been introduced in the code:

    1>src\backend\port\win32_shmem.c(295): warning C4013: 'dsm_set_control_handle' undefined; assuming extern returning int

The attached patch fixes this warning.

2. On running test_shm_mq manually, the client side didn't get any problem (the results are as expected). However, I am seeing the below log on the server:

    LOG:  registering background worker "test_shm_mq"
    LOG:  starting background worker process "test_shm_mq"
    LOG:  unrecognized win32 error code: 127
    LOG:  worker process: test_shm_mq (PID 2528) exited with exit code 1
    LOG:  unregistering background worker "test_shm_mq"

I think the below message in the log is not expected:

    LOG:  unrecognized win32 error code: 127

This has started appearing after your commit related to DSM. I have debugged this issue and found that it comes from the below part of the code in dsm_impl_windows():

    else
    {
        hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
                               FALSE,   /* do not inherit the name */
                               name);   /* name of mapping object */
        _dosmaperr(GetLastError());
    }

Here, even though the handle returned by OpenFileMapping() is a valid handle, GetLastError() still returns 127 ("The specified procedure could not be found."). Now the specs [1] of the API say that if the handle is non-NULL, consider it success, so I am not sure whether we should bother about this error or not. I have tried many ways (trying different parameters, searching on the net) to change this and related APIs, but the same problem exists. One strange thing is that if I just call the API twice (I know this is not right, but just to experiment for finding some info), the error doesn't occur the second time.

The only difference in the latest changes is that now OpenFileMapping() is getting called by the child process rather than a peer process (w.r.t. the process that called CreateFileMapping()); I don't know why this should make such a difference. In the examples I have seen on the net for such usage, they call GetLastError() only if the handle is invalid; we have called it in the above fashion just to keep the code a little bit simple. I am just not sure whether it is okay to rearrange the code and call GetLastError() only if the returned handle is invalid (NULL), or to try to look for more info.

One question:

1. I have seen that initdb still creates pg_dynshmem; is it required after your latest changes?

Let me know your opinion.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa366791(v=vs.85).aspx

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_dsm_win_warning.patch
Description: Binary data
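The rearrangement contemplated above would amount to something like this - a sketch only, not committed code:

    /*
     * Consult GetLastError() only when OpenFileMapping() actually failed;
     * per the API contract, a non-NULL handle means success.
     */
    hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
                           FALSE,   /* do not inherit the name */
                           name);   /* name of mapping object */
    if (!hmap)
        _dosmaperr(GetLastError());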
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On Wed, Apr 9, 2014 at 5:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
> We've got to the stage now that saving this much memory is essential, so
> this patch is a must-have. The patch does all I would expect and no more,
> so approach and details look good to me. Performance? Discussed many
> years ago, but I suspect the micro-tuning of those earlier patches wasn't
> as good as it is here.

I think this approach is practically a slam-dunk when the number of pins is small (as it typically is). I'm less clear what happens when we overflow from the small array into the hashtable. That certainly seems like it could be a loss, but how do we construct such a case to test it? A session with lots of suspended queries? Can we generate a regression by starting a few suspended queries to use up the array elements, and then running a scan that pins and unpins many buffers?

One idea is: if we fill up all the array elements and still need another one, evict all the elements to the hash table and then start refilling the array. The advantage of that over what's done here is that the active scan will always be using an array slot rather than repeated hash table manipulations. I guess you'd still have to probe the hash table repeatedly, but you'd avoid entering and removing items frequently.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
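For readers following along, the small-array-plus-overflow scheme under discussion looks roughly like this - a simplified sketch of the idea (assuming the backend's HTAB/hash_search API), not the patch itself:

    #define REFCOUNT_ARRAY_ENTRIES 8    /* size of the fast-path array */

    typedef struct PrivateRefCountEntry
    {
        Buffer      buffer;
        int32       refcount;
    } PrivateRefCountEntry;

    /* hot entries live here; the rest overflow into a local hash table */
    static PrivateRefCountEntry PrivateRefCountArray[REFCOUNT_ARRAY_ENTRIES];
    static HTAB *PrivateRefCountHash = NULL;    /* created lazily */

    static int32
    GetPrivateRefCount(Buffer buffer)
    {
        int         i;
        PrivateRefCountEntry *res;

        /* fast path: linear scan of the small array */
        for (i = 0; i < REFCOUNT_ARRAY_ENTRIES; i++)
        {
            if (PrivateRefCountArray[i].buffer == buffer)
                return PrivateRefCountArray[i].refcount;
        }

        /* slow path: look in the overflow hash table, if it exists */
        if (PrivateRefCountHash != NULL)
        {
            res = hash_search(PrivateRefCountHash, &buffer, HASH_FIND, NULL);
            if (res != NULL)
                return res->refcount;
        }
        return 0;               /* buffer not pinned by this backend */
    }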
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 08:22:15 -0400, Robert Haas wrote:
> I think this approach is practically a slam-dunk when the number of pins
> is small (as it typically is). I'm less clear what happens when we
> overflow from the small array into the hashtable. That certainly seems
> like it could be a loss, but how do we construct such a case to test it?
> A session with lots of suspended queries? Can we generate a regression by
> starting a few suspended queries to use up the array elements, and then
> running a scan that pins and unpins many buffers?

I've tried to reproduce problems around this (when I wrote this), but it's really hard to construct cases that need more than 8 pins. I've tested performance for those cases by simply not using the array, and while the performance suffers a bit, it's not that bad.

> One idea is: if we fill up all the array elements and still need another
> one, evict all the elements to the hash table and then start refilling
> the array. [...]

We could do that, but my gut feeling is that it's not necessary. There'd have to be some heuristic to avoid doing that all the time, otherwise we'd probably regress. I think the fact that we pin/unpin very frequently will put frequently used pins in the array most of the time.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On Wed, Apr 9, 2014 at 6:02 PM, Andres Freund and...@2ndquadrant.com wrote:
> I've tried to reproduce problems around this (when I wrote this), but
> it's really hard to construct cases that need more than 8 pins. I've
> tested performance for those cases by simply not using the array, and
> while the performance suffers a bit, it's not that bad.

AFAIR this was suggested before and got rejected because constructing that worst case and proving that the approach does not perform too badly was a challenge. Having said that, I agree it's time to avoid that memory allocation, especially with a large number of backends running with large shared buffers.

An orthogonal issue I noted is that we never check for overflow in the ref count itself. While I understand that overflowing an int32 counter will take a large number of pins on the same buffer, it can still happen in the worst case, no? Or is there a theoretical limit on the number of pins on the same buffer by a single backend?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
> AFAIR this was suggested before and got rejected because constructing
> that worst case and proving that the approach does not perform too badly
> was a challenge.

Well, I've tested the worst case by making *all* pins go through the hash table. And it didn't regress too badly, although it *was* visible in the profile. I've searched the archive, and to my knowledge nobody has actually sent a patch implementing this sort of scheme for pins, although there's been talk about various ways to solve this.

> An orthogonal issue I noted is that we never check for overflow in the
> ref count itself. [...] Or is there a theoretical limit on the number of
> pins on the same buffer by a single backend?

I think we'll die much earlier, because the resource owner array keeping track of buffer pins will be larger than 1GB.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On Wed, Apr 9, 2014 at 8:32 AM, Andres Freund and...@2ndquadrant.com wrote:
> I've tried to reproduce problems around this (when I wrote this), but
> it's really hard to construct cases that need more than 8 pins. I've
> tested performance for those cases by simply not using the array, and
> while the performance suffers a bit, it's not that bad.

Suspended queries won't do it? Also, it would be good to quantify "not that bad". Actually, this thread is completely lacking any actual benchmark results...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] psql \d+ and oid display
On Wed, Apr 9, 2014 at 1:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> I think the proposal is "don't mention the property if it has the default
> value". That's not second-class status, as long as people who know what
> the property is understand that behavior. It's just conserving screen
> space.

One thing that concerns me is that replica identity has a different default for system tables (NOTHING) than for other tables (DEFAULT). So when we say we're not going to display the default value, are we going to display it when it's not NOTHING, when it's not DEFAULT, or when it's not the actual default for that particular kind of table?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
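For concreteness, the property under discussion is set per table, and under the proposal \d+ would only show it when non-default - an illustration with a hypothetical table:

    ALTER TABLE mytable REPLICA IDENTITY FULL;    -- non-default: displayed
    ALTER TABLE mytable REPLICA IDENTITY DEFAULT; -- default: line suppressed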
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 09:17:59 -0400, Robert Haas wrote:
> Suspended queries won't do it?

What exactly do you mean by suspended queries? Defined and started portals? Recursive query execution?

> Also, it would be good to quantify "not that bad".

The "not bad" comes from my memory of the benchmarks I'd done after about 12h of flying around ;). Yes, it needs real benchmarks. Probably won't get to it in the next few days, though.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
Dilip kumar dilip.ku...@huawei.com writes:
> On 09 April 2014 13:31, Nicolas Barbier wrote:
>> Do you have a real-world example use case of such joins, to offset the
>> extra planner time that will likely have to be paid (even for queries
>> for which the functionality ends up not being used)?
> I think this will be more useful when both the relations are big and are
> of almost equal size.

I'm not following how this would help much. Consider "a.x < b.y" where we know that the inputs are sorted by x/y. I think you are saying that for a given b row, we could scan forward from the beginning of a, but stop as soon as we find a row with a.x >= b.y. Great ... but instead of comparing M*N rows, we are comparing M*N/2 rows. So it's still O(N^2), though with a slightly smaller constant. Is this going to be worth the added time to sort the inputs?

It's slightly more promising for range constraints (that is, "a.x between b.y and b.z"), but even then you have to assume that the ranges are narrow, at which point a nestloop join using an inner indexscan on a.x might do about as well. So personally, I suspect there's a reason why this isn't a standard join algorithm already.

If you want to try it, I'd suggest that you focus on getting to where you can do some performance testing ASAP, without doing much code polishing or worrying about teaching the planner how to cost it.

			regards, tom lane
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On Wed, Apr 9, 2014 at 9:38 AM, Andres Freund and...@2ndquadrant.com wrote:
> What exactly do you mean by suspended queries? Defined and started
> portals? Recursive query execution?

Open a cursor and fetch from it; leave it open while doing other things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Andres Freund and...@2ndquadrant.com writes:
> On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
>> An orthogonal issue I noted is that we never check for overflow in the
>> ref count itself. While I understand overflowing an int32 counter will
>> take a large number of pins on the same buffer, it can still happen in
>> the worst case, no? Or is there a theoretical limit on the number of
>> pins on the same buffer by a single backend?
> I think we'll die much earlier, because the resource owner array keeping
> track of buffer pins will be larger than 1GB.

The number of pins is bounded, more or less, by the number of scan nodes in your query plan. You'll have run out of memory trying to plan the query, assuming you live that long.

The resource managers are interesting to bring up in this context. That mechanism didn't exist when PrivateRefCount was invented. Is there a way we could lay off the work onto the resource managers? (I don't see one right at the moment, but I'm under-caffeinated still.)

			regards, tom lane
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 10:09:44 -0400, Tom Lane wrote:
> The resource managers are interesting to bring up in this context. That
> mechanism didn't exist when PrivateRefCount was invented. Is there a way
> we could lay off the work onto the resource managers? (I don't see one
> right at the moment, but I'm under-caffeinated still.)

Yea, that's something I've also considered, but I couldn't come up with a performant and sensibly complicated way to do it. There are some nasty issues with pins held by different ResourceOwners and such, so even if we could provide sensible random access to check for existing pins, it wouldn't be a simple thing.

It's not unreasonable to argue that we just shouldn't optimize for several pins held by the same backend for the same buffer, and always touch the global count. Thanks to resource managers, the old reason for PrivateRefCount, which was the need to be able to clean up remaining pins in case of error, doesn't exist anymore.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Andres Freund and...@2ndquadrant.com writes:
> It's not unreasonable to argue that we just shouldn't optimize for
> several pins held by the same backend for the same buffer, and always
> touch the global count.

NAK. That would be a killer because of increased contention for buffer headers. The code is full of places where a buffer's PrivateRefCount jumps up and down a bit, for example when transferring a tuple into a TupleTableSlot. (I said upthread that the number of pins is bounded by the number of scan nodes, but actually it's probably some small multiple of that --- e.g. a seqscan would hold its own pin on the current buffer, and there'd be a slot or two holding the current tuple, each with its own pin count.)

			regards, tom lane
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 10:26:25 -0400, Tom Lane wrote:
> NAK.

Note I didn't implement it because I wasn't too convinced either ;)

> That would be a killer because of increased contention for buffer
> headers. The code is full of places where a buffer's PrivateRefCount
> jumps up and down a bit, for example when transferring a tuple into a
> TupleTableSlot.

On the other hand, in those scenarios the backend is pretty likely to already have the cacheline locally in exclusive mode...

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
On Wed, Apr 9, 2014 at 2:06 AM, Prabakaran, Vaishnavi vaishna...@fast.au.fujitsu.com wrote: Hi all, Following the discussion in message id cahgqgwffmor4ecugwhzpaapyqbsekdg66vmj1rvej6z-ega...@mail.gmail.com, I have developed a patch which gives the user an option to exclude pg_log directory contents in pg_basebackup. [Current situation] During pg_basebackup, all files in the pg_log directory are copied to the new backup directory. [Design] - Added a new optional -S/--skip-log-dir flag to pg_basebackup. - If skip-log-dir is specified in the pg_basebackup command, basebackup excludes copying log files from the standard pg_log directory and any other directory specified in the log_directory GUC variable. (An empty pg_log/$log_directory folder will still be created.) - In case pg_log/$log_directory is a symbolic link, an empty folder will be created. [Advantage] It gives the user an option to avoid copying large log files if they don't wish to, and hence save disk space. While pg_log is definitely the most common one being the default on many platforms, we'll still be missing other ones. Should we really hardcode it, or should we somehow derive it from the settings for log_directory instead? As a more general discussion, is this something we might want to expose as a more general facility rather than hardcode it to the log directory only? And is it perhaps something we'd rather have configured at the server than specified in pg_basebackup - like a GUC saying which directories should always be excluded from a basebackup? So you don't have to remember it every time? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
Magnus Hagander wrote: While pg_log is definitely the most common one being the default on many platforms, we'll still be missing other ones. Should we really hardcode it, or should we somehow derive it from the settings for log_directory instead? As a more general discussion, is this something we might want to expose as a more general facility rather than hardcode it to the log directory only? And is it perhaps something we'd rather have configured at the server than specified in pg_basebackup - like a GUC saying which directories should always be excluded from a basebackup? So you don't have to remember it every time? So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well. basebackup_skip_path = $log_directory, $backup_label_files -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
On Wed, Apr 9, 2014 at 4:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Magnus Hagander wrote: While pg_log is definitely the most common one being the default on many platforms, we'll still be missing other ones. Should we really hardcode it, or should we somehow derive it from the settings for log_directory instead? As a more general discussion, is this something we might want to expose as a more general facility rather than hardcode it to the log directory only? And is it perhaps something we'd rather have configured at the server than specified in pg_basebackup - like a GUC saying which directories should always be excluded from a basebackup? So you don't have to remember it every time? So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well. basebackup_skip_path = $log_directory, $backup_label_files I hadn't considered any details, but yes, something along those lines. And then you could also include arbitrary filenames or directories should you want. E.g. if you use the data directory to store your torrents or something. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
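For concreteness, the sketch above would amount to a postgresql.conf entry along these lines; note that basebackup_skip_path is only a name proposed in this thread, not an existing setting, and the $-substitution syntax is likewise hypothetical:

	# hypothetical GUC as sketched upthread; not an existing setting
	basebackup_skip_path = '$log_directory, $backup_label_files, torrents'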
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 10:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The ship has clearly sailed to add parameterized opclasses to 9.4, but let's keep it in mind when we decide on the defaults. In the absence of parameterizable opclasses, it would be much more flexible to have opclasses that index keys, values, key-value pairs and paths separately, instead of the current jsonb_ops and jsonb_hash_ops opclasses which index all of those in the same index. That way, if you only e.g. ever query on the existence of a key, you'd only need to index the keys. I don't understand how we ended up with the current dichotomy of jsonb_ops and jsonb_hash_ops... +1 for parameterizable opclasses -- With best regards, Alexander Korotkov.
Re: [HACKERS] Get more from indices.
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: Oops! I found a bug in this patch. The previous v8 patch missed the case that build_index_pathkeys() could build partial pathkeys from the index tlist. TBH I think that's barely the tip of the iceberg of cases where this patch will get the wrong answer. It looks like it thinks that *any* Var belonging to the indexed relation must be the same one indexed by the index. Also, I don't see it doing anything to check the ordering of multiple index columns --- for instance, an index on (a,b) and one on (b,a) would both pass or fail identically, though surely only one should match ORDER BY a,b. Also, what's with the success return before the loop:

+		if (list_length(pathkeys) == list_length(root->query_pathkeys))
+			return true;

At this point you haven't proven *anything at all* about whether the index columns have something to do with the query_pathkeys. regards, tom lane
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 12:40 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I didn't say that. On the contrary, I think the shotgun approach jsonb_ops and jsonb_hash_ops take is too broad. It should be possible to specify what to index in a more detailed fashion. It is - use an expression index. That's by far the most important way to specify what to index in a more detailed fashion. There are others, but that's the major one. Beyond that, yes, it's necessary to carefully write your query predicate a certain way. However, a similar situation exists in MongoDB, where there is a distinction between indexes on embedded fields (which must be accessed using special dot notation) and indexes on subdocuments (which cannot be accessed using dot notation). It's late here, but I'm pretty sure that's a feature and not a limitation. I believe that a serious limitation we now have is that we actually specify the kind of index to be used in the SQL query. For example, you need to find objects with active = true. You can write: js @> '{"active": true}' and then a GIN index on js can be used. Alternatively you can write: js->'active' = 'true' and then a btree expression index on (js->'active') can be used. For sure, one can do js @> '{"active": true}' AND js->'active' = 'true'. This query can use either of the indexes, but it is: 1) a kludge, 2) an excess recheck, 3) if both indexes are present, an excess BitmapAnd. By having to choose the index in the SQL query we make our SQL more imperative and less declarative. Similar things can happen without json/hstore (the user has to rewrite SQL in order to use an expression index), but now it could become very common. My opinion is that we have to do something in the planner to make it understand at least these two kinds of queries to be equivalent. -- With best regards, Alexander Korotkov.
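For illustration, a minimal sketch of the two index strategies being contrasted here (table and column names are invented; jsonb syntax as of 9.4):

	CREATE TABLE docs (js jsonb);
	CREATE INDEX docs_js_idx ON docs USING gin (js);          -- default jsonb_ops
	CREATE INDEX docs_active_idx ON docs ((js->'active'));    -- btree expression index

	SELECT * FROM docs WHERE js @> '{"active": true}';        -- matches the GIN index
	SELECT * FROM docs WHERE js->'active' = 'true'::jsonb;    -- matches the expression index

The planner treats the two predicates as unrelated, so each query form can use only its matching index, which is exactly the complaint above.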
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
Magnus Hagander wrote: On Wed, Apr 9, 2014 at 4:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well. basebackup_skip_path = $log_directory, $backup_label_files I hadn't considered any details, but yes, something along those lines. And then you could also include arbitrary filenames or directories should you want. E.g. if you use the data directory to store your torrents or something. Man, that's a great idea. Database servers have lots of disk space in that partition, so it should work really well. Thanks! -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Both of the operator classes are actually much less flexible than I'd like. Firstly, they index everything. In many cases, that's not what you want, so you end up with much larger indexes than necessary. Secondly, jsonb_ops indexes all values separately from the keys. That makes the index pretty much useless for a query on, say, WHERE json @> '{"needs_processing": true}', if all the rows also contain a key-value pair "active": true. Thirdly, inequality operators are not supported; you can't search for rows with (the json-syntax equivalent of) price > 12.3. Fourthly, sometimes you would want to include the path to an entry in the key, sometimes not. Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. ISTM we need a way to parameterize opclasses, so that when you create the index, you specify the above things. Yeah, that would be great. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: add psql tab completion for event triggers
On Tue, Apr 8, 2014 at 5:27 AM, Ian Barwick i...@2ndquadrant.com wrote: On 08/04/14 18:22, Ian Barwick wrote: As it was kind of annoying not to have this when playing around with event triggers. This also tightens up the existing tab completion for ALTER TRIGGER, which contained redundant code for table name completion, and which was also causing a spurious RENAME TO to be inserted in this context: CREATE EVENT TRIGGER foo ON {event} ^I Apologies, previous patch had some unrelated changes in it. Correct patch attached. This *still* has some unrelated things in it, like s/Pgsql/Postgres/, and numerous hunks consisting entirely of whitespace changes and/or changes to unrelated comments. Also, what's the point of this hunk:

*** psql_completion(const char *text, int st
*** 1318,1340 ****
  			 pg_strcasecmp(prev2_wd, "TRIGGER") == 0)
  		COMPLETE_WITH_CONST("ON");
- 	else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
- 			 pg_strcasecmp(prev3_wd, "TRIGGER") == 0)
- 	{
- 		completion_info_charp = prev2_wd;
- 		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
- 	}
  	/*
! 	 * If we have ALTER TRIGGER <sth> ON, then add the correct tablename
  	 */
  	else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
  			 pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
  			 pg_strcasecmp(prev_wd, "ON") == 0)
! 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
  	/* ALTER TRIGGER <name> ON <name> */
! 	else if (pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
  			 pg_strcasecmp(prev2_wd, "ON") == 0)
  		COMPLETE_WITH_CONST("RENAME TO");
--- 1355,1374 ----
  			 pg_strcasecmp(prev2_wd, "TRIGGER") == 0)
  		COMPLETE_WITH_CONST("ON");
  	/*
! 	 * If we have ALTER TRIGGER <name> ON, then add the correct tablename
  	 */
  	else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
  			 pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
  			 pg_strcasecmp(prev_wd, "ON") == 0)
! 	{
! 		completion_info_charp = prev2_wd;
! 		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
! 	}
  	/* ALTER TRIGGER <name> ON <name> */
! 	else if (pg_strcasecmp(prev5_wd, "ALTER") == 0 &&
! 			 pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
  			 pg_strcasecmp(prev2_wd, "ON") == 0)
  		COMPLETE_WITH_CONST("RENAME TO");

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Robert Haas robertmh...@gmail.com writes: On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Both of the operator classes are actually much less flexible than I'd like. Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. There's definitely something to be said for that. Default opclasses are sensible when there's basically only one behavior that's interesting for most people. We can already see that that's not going to be the case for jsonb indexes, at least not with the currently available alternatives. Not having a default would force users to make decisions explicitly. Is that what we want? One other point here is that non-default opclasses can't be used in UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't matter here since these aren't btree opclasses, but is there a use-case for EXCLUDE with any of the supported jsonb operators? regards, tom lane
Re: [HACKERS] Including replication slot data in base backups
On Tue, Apr 8, 2014 at 3:08 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Apr 8, 2014 at 1:18 AM, Robert Haas robertmh...@gmail.com wrote: Not sure if this is exactly the right way to do it, but I agree that something along those lines is a good idea. I also think, maybe even importantly, that we should probably document that people using file-copy based hot backups should strongly consider removing the replication slots by hand before using the backup. Good point. Something here would be adapted in this case: http://www.postgresql.org/docs/devel/static/backup-file.html I am attaching an updated patch as well. What you've got here doesn't look like the right section to update - the section you've updated is on taking a cold backup. The section I think you want to update is Making a Base Backup Using the Low Level API, and specifically this part: You can, however, omit from the backup dump the files within the cluster's pg_xlog/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if pg_xlog/ is a symbolic link pointing to someplace outside the cluster directory, which is a common setup anyway for performance reasons. You might also want to exclude postmaster.pid and postmaster.opts, which record information about the running postmaster, not about the postmaster which will eventually use this backup. (These files can confuse pg_ctl.) What I'd propose is adding a second paragraph like this: It is often a good idea to also omit from the backup dump the files within the cluster's pg_replslot/ directory, so that replication slots that exist on the master do not become part of the backup. Otherwise, the subsequent use of the backup to create a standby may result in indefinite retention of WAL files on the standby, and possibly bloat on the master if hot standby feedback is enabled, because the clients that are using those replication slots will still be connecting to and updating the slots on the master, not the standby. Even if the backup is only intended for use in creating a new master, copying the replication slots isn't expected to be particularly useful, since the contents of those slots will likely be badly out of date by the time the new master comes on line. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
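A sketch of the by-hand cleanup being recommended, to be run on a server started from such a backup (the slot name is illustrative):

	-- list any slots that were copied along with the base backup
	SELECT slot_name, plugin, slot_type FROM pg_replication_slots;

	-- drop a stale slot so it cannot force indefinite WAL retention
	SELECT pg_drop_replication_slot('stale_slot');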
Re: [HACKERS] psql \d+ and oid display
On Wed, Apr 9, 2014 at 09:27:11AM -0400, Robert Haas wrote: On Wed, Apr 9, 2014 at 1:02 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Well, that's sorta my concern. I mean, right now we've got people saying "what the heck is a replica identity?". But, if the logical decoding stuff becomes popular, as I hope it will, that's going to be an important thing for people to adjust, and the information needs to be present in a clear and easily-understood way. I haven't studied the current code in detail so maybe it's fine. I just want to make sure we're not giving it second-class treatment solely on the basis that it's new and people aren't using it yet. I think the proposal is don't mention the property if it has the default value. That's not second-class status, as long as people who know what the property is understand that behavior. It's just conserving screen space. One thing that concerns me is that replica identity has a different default for system tables (NOTHING) than for other tables (DEFAULT). So when we say we're not going to display the default value, are we going to display it when it's not NOTHING, when it's not DEFAULT, or when it's not the actual default for that particular kind of table? We exclude pg_catalog from displaying Replica Identity due to this inconsistency. I assume this was desired because you can't replicate system tables. Is that true? The current test is:

	if ((tableinfo.relkind == 'r' || tableinfo.relkind == 'm') &&
		/*
		 * No need to display default values; we already display a
		 * REPLICA IDENTITY marker on indexes.
		 */
		tableinfo.relreplident != 'd' && tableinfo.relreplident != 'i' &&
		strcmp(schemaname, "pg_catalog") != 0)

What might make more sense is this:

	if ((tableinfo.relkind == 'r' || tableinfo.relkind == 'm') &&
		/*
		 * No need to display default values; we already display a
		 * REPLICA IDENTITY marker on indexes.
		 */
		tableinfo.relreplident != 'i' &&
		((strcmp(schemaname, "pg_catalog") != 0 && tableinfo.relreplident != 'd') ||
		 (strcmp(schemaname, "pg_catalog") == 0 && tableinfo.relreplident != 'n')))

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
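For reference, the property being displayed here is set per table; a quick way to see the \d+ behaviour under discussion (table name invented):

	ALTER TABLE t REPLICA IDENTITY FULL;   -- or DEFAULT, NOTHING, USING INDEX t_pkey
	\d+ t                                  -- the Replica Identity line should now appear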
Re: [HACKERS] psql \d+ and oid display
On 2014-04-09 11:42:32 -0400, Bruce Momjian wrote: On Wed, Apr 9, 2014 at 09:27:11AM -0400, Robert Haas wrote: On Wed, Apr 9, 2014 at 1:02 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Well, that's sorta my concern. I mean, right now we've got people saying "what the heck is a replica identity?". But, if the logical decoding stuff becomes popular, as I hope it will, that's going to be an important thing for people to adjust, and the information needs to be present in a clear and easily-understood way. I haven't studied the current code in detail so maybe it's fine. I just want to make sure we're not giving it second-class treatment solely on the basis that it's new and people aren't using it yet. I think the proposal is don't mention the property if it has the default value. That's not second-class status, as long as people who know what the property is understand that behavior. It's just conserving screen space. One thing that concerns me is that replica identity has a different default for system tables (NOTHING) than for other tables (DEFAULT). So when we say we're not going to display the default value, are we going to display it when it's not NOTHING, when it's not DEFAULT, or when it's not the actual default for that particular kind of table? We exclude pg_catalog from displaying Replica Identity due to this inconsistency. I don't understand why it's inconsistent, but whatever. I assume this was desired because you can't replicate system tables. Is that true? Yes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: Few Observations: 1. One new warning has been introduced in the code. 1>src\backend\port\win32_shmem.c(295): warning C4013: 'dsm_set_control_handle' undefined; assuming extern returning int Attached patch fixes this warning. OK, committed, after moving the header to the correct position in alphabetical order. 2. On running test_shm_mq manually, the client side didn't get any problem (the results are as expected). However I am seeing the below log on the server: LOG: registering background worker test_shm_mq LOG: starting background worker process test_shm_mq LOG: unrecognized win32 error code: 127 LOG: worker process: test_shm_mq (PID 2528) exited with exit code 1 LOG: unregistering background worker test_shm_mq I think the below message in the log is not expected: LOG: unrecognized win32 error code: 127 This has started appearing after your commit related to DSM. I have debugged this issue and found that it comes from this part of the code:

	dsm_impl_windows()
	{
		...
		else
		{
			hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
								   FALSE,	/* do not inherit the name */
								   name);	/* name of mapping object */
			_dosmaperr(GetLastError());
		}
	}

Here even though the handle returned by OpenFileMapping() is a valid handle, GetLastError() still returns 127 (The specified procedure could not be found.). Now the specs [1] of the API say that if the handle is non-NULL, then consider it success, so I am not sure whether we should bother about this error or not. I have tried many ways (trying different parameters, searching the net) to change this and related APIs, but the same problem exists. One strange thing is that if I just call the API twice (I know this is not right, but just to experiment for finding some info), the second time this error doesn't occur. The only difference in the latest changes is that now OpenFileMapping() is getting called by the child process rather than a peer process (w.r.t. the process that has called CreateFileMapping); I don't know why this should make such a difference. On the net, whatever examples I have seen for such usage call GetLastError() only if the handle is invalid; we have called it in the above fashion just to keep the code a little bit simpler. I am just not sure whether it is okay to rearrange the code and call GetLastError() only if the returned handle is invalid (NULL), or to try to look for more info. Well, I don't know either. You wrote the Windows portion of this code, so you'll have to decide what's best. If the best practice in this area is to only call GetLastError() if the handle isn't valid, then that's probably what we should do, too. But I can't speak to what the best practice is. One question: 1. I have seen that initdb still creates pg_dynshmem, is it required after your latest changes? It's only used now if dynamic_shared_memory_type = mmap. I know Andres was never a huge fan of the mmap implementation, so we could rip that out and get rid of the directory, too, but I kind of liked having it, and broke the tie in favor of myself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic Shared Memory stuff
On 2014-04-09 11:50:33 -0400, Robert Haas wrote: One question: 1. I have seen that initdb still creates pg_dynshmem, is it required after your latest changes? It's only used now if dynamic_shared_memory_type = mmap. I know Andres was never a huge fan of the mmap implementation, so we could rip that out and get rid of the directory, too, but I kind of liked having it, and broke the tie in favor of myself. It's purely a toy thing. I think pretty much every dynshm user that actually transfers data through it will be better off falling back to single-process style work, rather than parallelizing it. Anyway, it's there, it doesn't presently cause problems, and I don't have to maintain it, so ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Autonomous Transaction (WIP)
On Wed, Apr 9, 2014 at 12:24 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote: Deadlock Detection: I'm not sure how this would work out internally In order to resolve deadlocks, two member variables will be created in the structure PROCLOCK: a bitmask for lock types currently held by the autonomous transaction, LOCKMASK holdMaskByAutoTx[MAX_AUTO_TX_LEVEL], and a bitmask for lock types currently held by the main transaction, LOCKMASK holdMaskByNormalTx. Now when we grant the lock to a particular transaction, depending on the type of transaction, the bitmask will be set for either holdMaskByAutoTx or holdMaskByNormalTx. Similarly, when the lock is ungranted, the corresponding bitmask will be reset. That sounds pretty ugly, not to mention the fact that it will cause a substantial increase in the amount of memory required to store PROCLOCKs. It will probably slow things down, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Tom Lane wrote: On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Both of the operator classes are actually much less flexible than I'd like. Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. +1. I was thinking the same thing after reading Heikki's rant. One other point here is that non-default opclasses can't be used in UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't matter here since these aren't btree opclasses, but is there a use-case for EXCLUDE with any of the supported jsonb operators? That sounds like an oversight that could better be fixed in EXCLUDE, no? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [DOCS] Call for GIST/GIN/SP-GIST opclass documentation
Robert Haas wrote: On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: I just created sections in the SGML manual chapters about GIST, GIN, and SP-GIST to hold documentation about the standard opclasses provided for them: http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html We never had any well-defined place to discuss such opclasses before, but I think it's past time we did. +1. Great idea. Agreed. I find that in my browser it's a bit difficult to make out the different operators in the long list; ISTM a single space between operators isn't enough. I wonder how we can do better; sticking commas (or any other single character, really) between them is not likely to improve matters; and using one column per operator is not going to work very well. Maybe if it were possible to split a single cell into several sub-cells; or perhaps there's a way to specify two whitespace units, or a long space, or something like that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: One other point here is that non-default opclasses can't be used in UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't matter here since these aren't btree opclasses, but is there a use-case for EXCLUDE with any of the supported jsonb operators? That sounds like an oversight that could better be fixed in EXCLUDE, no? Well, there hasn't been a use-case up to now. I'm not sure there's one yet. regards, tom lane
Re: [HACKERS] [DOCS] Call for GIST/GIN/SP-GIST opclass documentation
Alvaro Herrera alvhe...@2ndquadrant.com writes: Robert Haas wrote: On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html I find that in my browser it's a bit difficult to make out the different operators in the long list; ISTM a single space between operators isn't enough. Yeah, I noticed that too, but wasn't sure what to do about it. One line per operator isn't better, but I don't know how to get a bit more horizontal space into the lists. regards, tom lane
Re: [HACKERS] Autonomous Transaction (WIP)
Robert Haas robertmh...@gmail.com writes: On Wed, Apr 9, 2014 at 12:24 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote: Now when we grant the lock to a particular transaction, depending on the type of transaction, the bitmask will be set for either holdMaskByAutoTx or holdMaskByNormalTx. Similarly, when the lock is ungranted, the corresponding bitmask will be reset. That sounds pretty ugly, not to mention the fact that it will cause a substantial increase in the amount of memory required to store PROCLOCKs. It will probably slow things down, too. More to the point, why isn't it a flat-out bad idea? I can see no justification for distinguishing normal and autonomous transactions at this level. regards, tom lane
Re: [HACKERS] GiST support for inet datatypes
Tom Lane t...@sss.pgh.pa.us: Committed with some additional documentation work. Thanks for submitting this! Thank you for committing. I had not thought of using a different structure for the index. It works faster with my test case, too. I am sending a rebased version of the consistent operator names patch in case you would like to include it in the same release. This is what I wrote about this change before: That is why I prepared it as a separate patch on top of the others. It is not only consistency with the range types: the @> and <@ symbols are used for containment everywhere except the inet data types, particularly on the geometric types and arrays, and in the cube, hstore, intarray, and ltree extensions. The patch does not just change the operator names; it leaves the old ones, adds new operators with GiST support, and changes the documentation, like your commit ba920e1c9182eac55d5f1327ab0d29b721154277 back in 2006. I could not find out why you left the inet operators unchanged in that commit, in the mailing list archives [1]. GiST support will be a promotion for users to switch to the new operators, if we make this change with it. This change will also indirectly deprecate the undocumented, non-transparent btree index support that works sometimes for some of the subnet inclusion operators [2]. [1] http://www.postgresql.org/message-id/14277.1157304...@sss.pgh.pa.us [2] http://www.postgresql.org/message-id/389.1042992...@sss.pgh.pa.us inet-operator-v2.patch Description: Binary data
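For illustration, the existing subnet-inclusion operators that the patch proposes to alias with <@ and @> (addresses are arbitrary):

	SELECT '192.168.1.5'::inet << '192.168.1.0/24'::inet;   -- is contained within: true
	SELECT '192.168.1.0/24'::inet >> '192.168.1.5'::inet;   -- contains: true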
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 8:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. There's definitely something to be said for that. Default opclasses are sensible when there's basically only one behavior that's interesting for most people. We can already see that that's not going to be the case for jsonb indexes, at least not with the currently available alternatives. I've heard worse ideas. -- Peter Geoghegan
Re: [HACKERS] [PATCH] Add transforms feature
On 2014-04-05 00:21:47 +0200, Andres Freund wrote: On 2014-01-15 21:13:18 -0500, Peter Eisentraut wrote: The attached patch will probably fail to apply because of the pg_proc changes. So if you want to try it out, look into the header for the Git hash it was based off. I'll produce a properly merged version when this approach is validated. Also, some implementation details could probably take some revising after a couple of nights of sleep. ;-) You've slept since? ;) So, I am only doing a look through the patch, to see where it has gone in the past year. ... So, unless somebody protests PDQ, I am going to mark this Returned with Feedback. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Problem with displaying wide tables in psql
How can I pass or set the value of a pset variable for a regression test? I just wrote some tests using \pset columns to control the output. Having figured out what the point of the patch is, I'm pretty happy with the functionality. It definitely is something I would appreciate having. One thing I noticed is that it's not using the formatting characters to indicate newlines and wrapping. I'm trying to add that myself but it's taking me a bit to figure out what's going on in the formatting code. I do have a feeling this is all rearranging the deck chairs on an obsolete technology and this should all be replaced with a web UI.
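A sketch of what such a test can look like, relying on \pset variables instead of the terminal ioctl (the query is illustrative):

	\pset format wrapped
	\pset columns 40
	SELECT repeat('ab cd ', 30) AS wide_column;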
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed dean.a.rash...@gmail.com writes: OK, I'm marking this ready for committer attention, on the understanding that that doesn't include the invtrans_minmax patch. I've started to look at this patch set. After rereading the thread, I'm thinking that it's a mistake to just add the inverse transition function and require it to work with the standard forward transition function for the aggregate. There was discussion upthread of providing two separate forward transition functions, but Florian argued that that would do nothing that you couldn't accomplish with a runtime check in the forward function. I think that's nonsense though, because one of the key points here is that an invertible aggregate may require a more complex transition state data structure --- in particular, if you're forced to go from a pass-by-value to a pass-by-reference data type, right there you are going to take a big hit in aggregate performance, and there is no way for the forward transition function to avoid it. The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I think what'd make sense is to have a separate forward function *and* separate state datatype to be used when we want invertible aggregation (hm, and I guess a separate final function too). So an aggregate definition would include two entirely independent implementations. If they chance to share sfunc and stype, fine, but they don't have to. This would mean we'd need some new names for the doppelgangers of the CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond (but not sortop). I guess that'd open up a chance to use a more uniform naming scheme, but I'm not too sure what would be good. Comments? regards, tom lane
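Roughly, the proposal would allow something like the following; every m-prefixed parameter name and every function name other than int4_sum is invented here purely to show the shape of the idea:

	CREATE AGGREGATE sum (int4) (
		sfunc = int4_sum,            -- plain aggregation: cheap int8 state
		stype = int8,
		msfunc = int4_msum,          -- hypothetical invertible implementation
		minvfunc = int4_msum_inv,    -- with its own, richer state type
		mstype = int8[],
		mfinalfunc = int4_msum_final
	);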
[HACKERS] Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 11:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. There's definitely something to be said for that. Default opclasses are sensible when there's basically only one behavior that's interesting for most people. We can already see that that's not going to be the case for jsonb indexes, at least not with the currently available alternatives. Not having a default would force users to make decisions explicitly. Is that what we want? I don't like the idea of having no default opclass. I think there's a huge usability gain in being able to just create an index on a column and have it do something reasonable for most use cases. I can get behind the idea of having separate index opclasses for paths and path-value pairs but I suspect the default should just be to index both in the same index. If we can have one default index opclass that supports containment and existence and then other opclasses that are smaller but only support a subset of the operators that would seem like the best compromise. I'm a bit confused by Heikki's list though. I would expect path and path-value pair to be the only useful ones. I'm not clear what an index on keys or key-value would be -- it would index just the top-level keys and values without recursing? -- greg
[HACKERS] Buildfarm network under attack
According to Joshua Drake on IRC, there is a DDOS in progress against a customer in the same network as the buildfarm. As a consequence, you may experience slowness, or inability to connect while they deal with it. -- Vik
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
I wrote: ... an invertible aggregate may require a more complex transition state data structure --- in particular, if you're forced to go from a pass-by-value to a pass-by-reference data type, right there you are going to take a big hit in aggregate performance, and there is no way for the forward transition function to avoid it. The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? As a quick check, I compared aggregation performance in HEAD, non-assert builds, with and without --disable-float8-byval on a 64-bit machine. So this tests replacing a pass-by-val transition datatype with a pass-by-ref one without any other changes. There's essentially no difference in performance of sum(int4), AFAICT, but that's because int4_sum goes out of its way to cheat and avoid palloc overhead. I looked at the bit_and() aggregates to see what would happen to an aggregate not thus optimized. As expected, int4 and int8 bit_and are just about the same speed if int8 is pass by value ... but if it's pass by ref, the int8 case is a good 60% slower. So added palloc overhead, at least, is a no-go. I see that the patched version of sum(int4) avoids that trap, but nonetheless it's replaced a pretty cheap transition function with a less cheap function, namely the function previously used for avg(int4). A quick test says that avg(int4) is about five percent slower than sum(int4), so that's the kind of hit we'd be taking on non-windowed aggregations if we do it like this. regards, tom lane
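The sort of quick check involved is easy to reproduce with \timing; something along these lines (exact numbers of course vary by build and machine):

	\timing on
	SELECT sum(g) FROM generate_series(1, 10000000) g;   -- int8 transition state
	SELECT avg(g) FROM generate_series(1, 10000000) g;   -- array (by-reference) transition state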
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
Dean Rasheed dean.a.rash...@gmail.com writes: On 10 March 2014 03:36, Craig Ringer cr...@2ndquadrant.com wrote: I've found an issue with updatable security barrier views. Locking is being pushed down into the subquery. Locking is thus applied before user-supplied quals are, so we potentially lock too many rows. That has nothing to do with *updatable* security barrier views, because that's not an update. In fact you get that exact same plan on HEAD, without the updatable security barrier views patch. I got around to looking at this. The proximate reason for the two LockRows nodes is: (1) The parser and rewriter intentionally insert RowMarkClauses into both the outer query and the subquery when a FOR UPDATE/SHARE applies to a subquery-in-FROM. The comment in transformLockingClause explains why:

	 * FOR UPDATE/SHARE of subquery is propagated to all of
	 * subquery's rels, too. We could do this later (based on
	 * the marking of the subquery RTE) but it is convenient
	 * to have local knowledge in each query level about which
	 * rels need to be opened with RowShareLock.

that is, if we didn't push down the RowMarkClause, processing of the subquery would be at risk of opening relations with too weak a lock. In the example case, this pushdown is actually done by the rewriter's markQueryForLocking function, but it's just emulating what the parser would have done if the view query had been written in-line. (2) The planner doesn't consider this, and generates a LockRows plan node for each level of the query, since both of them have rowMarks. However, I'm not sure this is a bug, or at least it's not the same bug you think it is. The lower LockRows node is doing a ROW_MARK_EXCLUSIVE mark on the physical table rows, while the upper one is doing a ROW_MARK_COPY on the virtual rows emitted by the subquery. *These are not the same thing*. A ROW_MARK_COPY isn't a lock of any sort; it's just there to allow EvalPlanQual to see the row that was emitted from the subquery, in case a recheck has to be done during a Read Committed UPDATE/DELETE. Since this is a pure SELECT, the upper locking action is useless, and arguably the planner ought to be smart enough not to emit it. But it's just wasting some cycles, it's not changing any semantics. So if you wanted user-supplied quals to limit which rows get locked physically, they would need to be applied before the lower LockRows node. To my mind, it's not immediately apparent that that is a reasonable expectation. The entire *point* of a security_barrier view is that unsafe quals don't get pushed into it, so why would you expect that those quals get applied early enough to limit locking? regards, tom lane
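A minimal reproduction of the plan shape discussed above, assuming a table and security_barrier view like the following (all names invented):

	CREATE TABLE accounts (id int PRIMARY KEY, owner text, balance int);
	CREATE VIEW my_accounts WITH (security_barrier) AS
		SELECT * FROM accounts WHERE owner = current_user;

	-- The user-supplied qual stays above the un-flattened subquery, so the
	-- lower LockRows node can lock rows that "balance < 0" would filter out:
	EXPLAIN SELECT * FROM my_accounts WHERE balance < 0 FOR UPDATE;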
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr9, 2014, at 21:35 , Tom Lane t...@sss.pgh.pa.us wrote: As a quick check, I compared aggregation performance in HEAD, non-assert builds, with and without --disable-float8-byval on a 64-bit machine. So this tests replacing a pass-by-val transition datatype with a pass-by-ref one without any other changes. There's essentially no difference in performance of sum(int4), AFAICT, but that's because int4_sum goes out of its way to cheat and avoid palloc overhead. I looked at the bit_and() aggregates to see what would happen to an aggregate not thus optimized. As expected, int4 and int8 bit_and are just about the same speed if int8 is pass by value ... but if it's pass by ref, the int8 case is a good 60% slower. True, but that just means that aggregate transition functions really ought to update the state in-place, no? So added palloc overhead, at least, is a no-go. I see that the patched version of sum(int4) avoids that trap, but nonetheless it's replaced a pretty cheap transition function with a less cheap function, namely the function previously used for avg(int4). A quick test says that avg(int4) is about five percent slower than sum(int4), so that's the kind of hit we'd be taking on non-windowed aggregations if we do it like this. That's rather surprising though. AFAICS, there isn't much difference between the two transfer functions int4_sum and int4_avg_accum at all. The differences come down to (+ denoting things which ought to make int4_avg_accum slower compared to int4_sum, - denoting the opposite):

1. +) int4_avg_accum calls AggCheckCallContext
2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS to verify that the state is a 2-element array without NULL entries
4. -) int4_sum checks if the input is NULL

The number of conditional branches should be about the same (and all are seldom taken). The validity checks on the state array, i.e. (3), should be rather cheap I think - not quite as cheap as PG_ARGISNULL maybe, but not so much more expensive either. That leaves the AggCheckCallContext call. If that call costs us 5%, maybe we can find a way to make it faster, or get rid of it entirely? Still, that seems like a lot for a call of a not-very-complex function, though... I'll go and check the disassembly - maybe something in int4_avg_accum turns out to be more complex than is immediately obvious. I'll also try to create a call profile, unless you already have one from your test runs. best regards, Florian Pflug
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
* Tom Lane (t...@sss.pgh.pa.us) wrote: Dean Rasheed dean.a.rash...@gmail.com writes: On 10 March 2014 03:36, Craig Ringer cr...@2ndquadrant.com wrote: I've found an issue with updatable security barrier views. Locking is being pushed down into the subquery. Locking is thus applied before user-supplied quals are, so we potentially lock too many rows. That has nothing to do with *updatable* security barrier views, because that's not an update. In fact you get that exact same plan on HEAD, without the updatable security barrier views patch. I got around to looking at this. Thanks! So if you wanted user-supplied quals to limit which rows get locked physically, they would need to be applied before the lower LockRows node. Right. To my mind, it's not immediately apparent that that is a reasonable expectation. The entire *point* of a security_barrier view is that unsafe quals don't get pushed into it, so why would you expect that those quals get applied early enough to limit locking? Right, we don't want unsafe quals to get pushed down below the security_barrier view. The point here is that those quals should not be able to lock rows which they don't even see. My understanding of the issue was that the unsafe quals were able to see and lock rows that they shouldn't know exist. If that's not the case then I don't think there's a bug here..? Craig/Dean, can you provide a complete test case which shows the issue? Thanks, Stephen
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr9, 2014, at 20:20 , Tom Lane t...@sss.pgh.pa.us wrote: There was discussion upthread of providing two separate forward transition functions, but Florian argued that that would do nothing that you couldn't accomplish with a runtime check in the forward function. I think that's nonsense though, because one of the key points here is that an invertible aggregate may require a more complex transition state data structure --- in particular, if you're forced to go from a pass-by-value to a pass-by-reference data type, right there you are going to take a big hit in aggregate performance, and there is no way for the forward transition function to avoid it. To be precise, my point was that *only* having a separate non-invertible forward transition function is pointless, exactly because of the reason you gave - that won't allow a cheaper state representation for the non-invertible case. So all such a non-invertible forward transition function can do is to skip some bookkeeping that's required by the inverse transition function - and *that* can just as easily be accomplished by a runtime check. I was (and still am) not in favour of duplicating the whole quadruple of (state, initialvalue, transferfunction, finalfunction) because it seems excessive. In fact, I believed that doing this would probably be grounds for outright rejection of the patch, on the basis of catalog bloat. And your initial response to this suggestion seemed to confirm this. The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I'm pretty sure David Rowley did some benchmarking. The results should be in this thread somewhere I think, but they currently evade me... Maybe David can re-post, if he's following this... I think what'd make sense is to have a separate forward function *and* separate state datatype to be used when we want invertible aggregation (hm, and I guess a separate final function too). So an aggregate definition would include two entirely independent implementations. If they chance to share sfunc and stype, fine, but they don't have to. This would mean we'd need some new names for the doppelgangers of the CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond (but not sortop). I guess that'd open up a chance to use a more uniform naming scheme, but I'm not too sure what would be good. If we really go down that road (and I'm far from convinced), then maybe instead of having a bunch of additional fields, we could have separate entries in pg_aggregate for the two cases, with links between them. The non-invertible aggregates would have something like _noninv or so appended to their name, and we'd automatically use them if we know we won't need to remove entries from the aggregation state. That would also allow the users to *force* the non-invertible aggregate to be used by simply saying SUM_NONINV instead of SUM. Then all we'd need would be an additional OID field that links the invertible to the non-invertible definition. best regards, Florian Pflug
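Under that sketch, the user-visible effect would be roughly the following; sum_noninv is a hypothetical name from this proposal, not an existing aggregate, and the table t with columns x and ts is invented:

	-- default: invertible machinery, the frame slides cheaply
	SELECT sum(x) OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) FROM t;

	-- forced non-invertible variant, recomputing each frame from scratch
	SELECT sum_noninv(x) OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) FROM t;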
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: So if you wanted user-supplied quals to limit which rows get locked physically, they would need to be applied before the lower LockRows node. Right. To my mind, it's not immediately apparent that that is a reasonable expectation. The entire *point* of a security_barrier view is that unsafe quals don't get pushed into it, so why would you expect that those quals get applied early enough to limit locking? Right, we don't want unsafe quals to get pushed down below the security_barrier view. The point here is that those quals should not be able to lock rows which they don't even see. That sounds a bit confused. The quals aren't getting to see rows they shouldn't be allowed to see. The issue rather is whether rows that the user quals would exclude might get locked anyway because the locking happens before those quals are checked. [ thinks for awhile... ] I guess the thing that is surprising is that ordinarily, if you say SELECT ... WHERE ... FOR UPDATE, only rows passing the WHERE clause get locked. If there's a security view involved, that gets reversed (at least for unsafe clauses). So from that standpoint it does seem pretty bogus. I'm not sure how much we can do about it given the current implementation technique for security views. A physical row lock requires access to the source relation and the ctid column, neither of which are visible at all outside an un-flattened subquery. Anyway, this is definitely a pre-existing issue with security_barrier views (or really with any unflattenable subquery), and so I'm not sure if it has much to do with the patch at hand. regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug f...@phlo.org writes: I was (and still am) not in favour of duplicating the whole quadruple of (state, initialvalue, transferfunction, finalfunction) because it seems excessive. In fact, I believed that doing this would probably be grounds for outright rejection of the patch, on the basis of catalog bloat. And your initial response to this suggestion seemed to confirm this. Well, I think it's much more likely that causing a performance penalty for cases unrelated to window aggregates would lead to outright rejection :-(. The majority of our users probably don't ever use window functions, but for sure they've heard of SUM(). We can't penalize the non-window case. Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this suggestion) is a little annoying but it doesn't sound like a show stopper. It seems reasonable to assume that the extra initval would be NULL in most cases, so it's probably a net addition of 12 bytes per row. On Apr9, 2014, at 20:20 , Tom Lane t...@sss.pgh.pa.us wrote: The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I'm pretty sure David Rowley did some benchmarking. The results should be in this thread somewhere I think, but they currently evade me... Maybe David can re-post, if he's following this... I saw benchmarks addressing window aggregation, but none looking for side-effects on plain aggregation. If we really go down that road (and I'm far from convinced), then maybe instead of having a bunch of additional fields, we could have separate entries in pg_aggregate for the two cases, with links between them. That seems like a complete mess; in particular it would break the primary key for pg_aggregate (aggfnoid), and probably break every existing query that looks at pg_aggregate. Some extra fields would not break such expectations (in fact we've added fields to pg_aggregate in the past). regards, tom lane
[HACKERS] bogus tsdict, tsparser, etc object identities
I noticed that pg_identify_object() gives a bogus answer for the text search object types: it is failing to schema-qualify the objects. I guess at the time I thought that those object types were global, not contained within schemas (which seems reasonable: who would want to have different text search parsers in different schemas anyway?)

alvherre=# CREATE TEXT SEARCH DICTIONARY alt_ts_dict1 (template=simple);
CREATE TEXT SEARCH DICTIONARY
alvherre=# create schema foo;
CREATE SCHEMA
alvherre=# CREATE TEXT SEARCH DICTIONARY foo.alt_ts_dict1 (template=simple);
CREATE TEXT SEARCH DICTIONARY

Before patch,

alvherre=# select identity.* from pg_ts_dict, pg_identify_object('pg_ts_dict'::regclass, oid, 0) identity where schema <> 'pg_catalog';
          type           | schema |     name     |   identity
-------------------------+--------+--------------+--------------
 text search dictionary  | public | alt_ts_dict1 | alt_ts_dict1
 text search dictionary  | foo    | alt_ts_dict1 | alt_ts_dict1
(2 filas)

After patch,

alvherre=# select identity.* from pg_ts_dict, pg_identify_object('pg_ts_dict'::regclass, oid, 0) identity where schema <> 'pg_catalog';
          type           | schema |     name     |      identity
-------------------------+--------+--------------+---------------------
 text search dictionary  | public | alt_ts_dict1 | public.alt_ts_dict1
 text search dictionary  | foo    | alt_ts_dict1 | foo.alt_ts_dict1
(2 filas)

This problem is new in 9.3, so I'm backpatching to that. Prior to that we didn't have object identities. The attached patch demonstrates the fix for tsdict only, but as I said above all text search object types are affected, so I'm going to backpatch the fix for all of them. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/catalog/objectaddress.c
--- b/src/backend/catalog/objectaddress.c
*** 3095,3100 **** getObjectIdentity(const ObjectAddress *object)
--- 3095,3101 ----
  			{
  				HeapTuple	tup;
  				Form_pg_ts_dict formDict;
+ 				char	   *schema;
  
  				tup = SearchSysCache1(TSDICTOID,
  									  ObjectIdGetDatum(object->objectId));
*** 3102,3109 **** getObjectIdentity(const ObjectAddress *object)
  					elog(ERROR, "cache lookup failed for text search dictionary %u",
  						 object->objectId);
  				formDict = (Form_pg_ts_dict) GETSTRUCT(tup);
  				appendStringInfoString(&buffer,
! 						quote_identifier(NameStr(formDict->dictname)));
  				ReleaseSysCache(tup);
  				break;
  			}
--- 3103,3112 ----
  					elog(ERROR, "cache lookup failed for text search dictionary %u",
  						 object->objectId);
  				formDict = (Form_pg_ts_dict) GETSTRUCT(tup);
+ 				schema = get_namespace_name(formDict->dictnamespace);
  				appendStringInfoString(&buffer,
! 						quote_qualified_identifier(schema,
! 								NameStr(formDict->dictname)));
  				ReleaseSysCache(tup);
  				break;
  			}
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
Haribabu Kommi wrote: I modified the autovac_balance_cost function to balance the costs using the number of running workers, instead of the default vacuum cost parameters. Just as a heads-up, this patch wasn't part of the commitfest, but I intend to review it and possibly commit it for 9.4 --- not immediately, but at some point. Arguably this is a bug fix, since the autovac rebalance code behaves horribly in cases such as the one described here, so I should consider a backpatch right away. However, I don't think it's a good idea to do that without more field testing. Perhaps we can backpatch later if the new code demonstrates its sanity. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
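For reference, the per-table settings whose rebalancing is at issue here are the autovacuum storage parameters; a minimal sketch (the table name is hypothetical):

    -- Per-table cost settings that autovac_balance_cost divides up
    -- among the running workers:
    ALTER TABLE big_table SET (
        autovacuum_vacuum_cost_limit = 1000,  -- per-table cost limit
        autovacuum_vacuum_cost_delay = 10     -- cost delay in milliseconds
    );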
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr 9, 2014, at 23:17, Florian Pflug f...@phlo.org wrote: On Apr 9, 2014, at 21:35, Tom Lane t...@sss.pgh.pa.us wrote: A quick test says that avg(int4) is about five percent slower than sum(int4), so that's the kind of hit we'd be taking on non-windowed aggregations if we do it like this. That's rather surprising though. AFAICS, there isn't much difference between the two transfer functions int4_sum and int4_avg_accum at all. The differences come down to (+ denoting things which ought to make int4_avg_accum slower compared to int4_sum, - denoting the opposite):

1. +) int4_avg_accum calls AggCheckCallContext
2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS to verify that the state is a 2-element array without NULL entries
4. -) int4_sum checks if the input is NULL

I've done a bit of profiling on this (using Instruments.app on OSX). One thing I missed is that int4_avg_accum also calls pg_detoast_datum - that call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly, since we know that the datum cannot possibly be toasted, I think (or if it was, nodeAgg.c should do the de-toasting). The profile also attributes a rather large percentage of the total runtime of int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting rid of that would help quite a few transition functions, invertible or not. That certainly seems doable - we'd need a way to mark functions as internal support functions, and prevent user-initiated calls of such functions. Transition functions marked that way could then safely scribble over their state arguments without having to consult AggCheckCallContext() first. I've also compared the disassemblies of int4_sum and int4_avg_accum, and apart from these differences, they are pretty similar. Also, the *only* reason that SUM(int2|int4) cannot use int8 as its transition type is that it needs to return NULL, not 0, if zero rows were aggregated. It might seem that it could just use int8 as state with initial value NULL, but that only works if the transition functions are non-strict (since the special case of state type == input type doesn't apply here). And non-strict transition functions need to deal with NULL inputs themselves, which means counting the number of non-NULL inputs... That problem would go away though if we had support for an initial function, which would receive the first input value and initialize the state. In the case of SUM(), the initial function would be strict, and thus would be called on the first non-NULL input value, which it'd convert to int8 and return as the new state. However, I still believe the best approach at this point is to just work on making int4_avg_accum faster. I still see no principled reason why it has to be noticeably slower - the only additional work it absolutely *has* to perform is *one* 64-bit increment. best regards, Florian Pflug
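To make Florian's strictness point concrete, here is a minimal sketch using plain CREATE AGGREGATE (the aggregate name is hypothetical); assuming the usual aggregate-creation rules, it should be rejected with an error along the lines of "must not omit initial value when transition function is strict and transition type is not compatible with input type":

    -- int8pl is strict with signature (int8, int8) -> int8; with no
    -- initcond, the first int4 input would have to stand in as the
    -- initial int8 state, which is disallowed since int4 is not
    -- binary-coercible to int8:
    CREATE AGGREGATE my_sum (int4) (
        sfunc = int8pl,
        stype = int8
    );
    -- Supplying initcond = '0' would make it work, but then my_sum()
    -- over zero rows would return 0 rather than NULL - the very
    -- problem Florian describes.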
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 9 April 2014 22:55, Tom Lane t...@sss.pgh.pa.us wrote: Florian Pflug f...@phlo.org writes: I was (and still am) not in favour of duplicating the whole quadruple of (state, initialvalue, transferfunction, finalfunction) because it seems excessive. In fact, I believed that doing this would probably be grounds for outright rejection of the patch, on the basis of catalog bloat. And your initial response to this suggestion seemed to confirm this. Well, I think it's much more likely that causing a performance penalty for cases unrelated to window aggregates would lead to outright rejection :-(. The majority of our users probably don't ever use window functions, but for sure they've heard of SUM(). We can't penalize the non-window case. Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this suggestion) is a little annoying, but it doesn't sound like a show-stopper. It seems reasonable to assume that the extra initval would be NULL in most cases, so it's probably a net addition of 12 bytes per row. On Apr 9, 2014, at 20:20, Tom Lane t...@sss.pgh.pa.us wrote: The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I'm pretty sure David Rowley did some benchmarking. The results should be in this thread somewhere, I think, but they currently evade me... Maybe David can re-post, if he's following this... I saw benchmarks addressing window aggregation, but none looking for side-effects on plain aggregation. If we really go down that road (and I'm far from convinced), then maybe instead of having a bunch of additional fields, we could have separate entries in pg_aggregate for the two cases, with links between them. That seems like a complete mess; in particular it would break the primary key for pg_aggregate (aggfnoid), and probably break every existing query that looks at pg_aggregate. Some extra fields would not break such expectations (in fact we've added fields to pg_aggregate in the past). This may initially sound unrelated, but I think it might address some of these issues. Suppose we added a 'firsttrans' function that took a single argument (the first value to be aggregated) and was responsible for creating the initial state from that first value. This would apply to aggregates that ignore null values, but whose transition function cannot currently be declared strict (either because the state type is internal, or because it is not the same as the aggregate's argument type). I think quite a lot of the existing aggregates fall into this category, and this would allow their transition functions to be made strict and simplified --- no more testing if the state is null, and then building it, and no more testing if the argument is null and ignoring it. That might give a noticeable performance boost in the regular aggregate case, especially over data containing nulls. But in addition, it would help with writing inverse transition functions, because if the transition functions could be made strict, they wouldn't need to do null-counting, which would mean that their state types would not need to be expanded. So for example sum(int4) could continue to have int8 as its state type, it could use int8(int4) as its firsttrans function, and int4_sum() could become strict and lose all its null-handling logic. Then int4_sum_inv() would be trivial to write --- just doing the same in reverse.
I'm not sure it helps for all aggregates, but there are certainly some where it would seem to simplify things. Regards, Dean
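A purely hypothetical sketch of how Dean's proposal might look at the SQL level --- no 'firsttrans' option exists in any PostgreSQL release; the option name and usage here are illustration only, mirroring his sum(int4) example:

    CREATE AGGREGATE sum (int4) (
        firsttrans = int8,      -- hypothetical: int8(int4) builds the state from the first non-NULL input
        sfunc      = int4_sum,  -- could then be declared strict and lose its null-handling logic
        stype      = int8
    );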
Re: [HACKERS] Patch: add psql tab completion for event triggers
On 10/04/14 00:23, Robert Haas wrote: On Tue, Apr 8, 2014 at 5:27 AM, Ian Barwick i...@2ndquadrant.com wrote: On 08/04/14 18:22, Ian Barwick wrote: As it was kind of annoying not to have this when playing around with event triggers. This also tightens up the existing tab completion for ALTER TRIGGER, which contained redundant code for table name completion, and which was also causing a spurious RENAME TO to be inserted in this context: CREATE EVENT TRIGGER foo ON {event} ^I Apologies, previous patch had some unrelated changes in it. Correct patch attached. This *still* has some unrelated things in it, like s/Pgsql/Postgres/, and numerous hunks consisting entirely of whitespace changes and/or changes to unrelated comments. Apologies again, that was ill-thought-out. Revised patch attached with only the additions related to event triggers, and the small fix for ALTER TRIGGER mentioned above, which ensures RENAME TO is offered only when ALTER TRIGGER name ON sth was input; currently there is no check for a preceding ALTER, resulting in the spurious RENAME TO when completing CREATE EVENT TRIGGER. Also, what's the point of this hunk:

*** psql_completion(const char *text, int st
*** 1318,1340 ****
               pg_strcasecmp(prev2_wd, "TRIGGER") == 0)
          COMPLETE_WITH_CONST("ON");
-     else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
-              pg_strcasecmp(prev3_wd, "TRIGGER") == 0)
-     {
-         completion_info_charp = prev2_wd;
-         COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
-     }
-     /*
!      * If we have ALTER TRIGGER <sth> ON, then add the correct tablename
       */
      else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
               pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
               pg_strcasecmp(prev_wd, "ON") == 0)
!         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
      /* ALTER TRIGGER <name> ON <name> */
!     else if (pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
               pg_strcasecmp(prev2_wd, "ON") == 0)
          COMPLETE_WITH_CONST("RENAME TO");
--- 1355,1374 ----
               pg_strcasecmp(prev2_wd, "TRIGGER") == 0)
          COMPLETE_WITH_CONST("ON");
      /*
!      * If we have ALTER TRIGGER <name> ON, then add the correct tablename
       */
      else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
               pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
               pg_strcasecmp(prev_wd, "ON") == 0)
!     {
!         completion_info_charp = prev2_wd;
!         COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
!     }
      /* ALTER TRIGGER <name> ON <name> */
!     else if (pg_strcasecmp(prev5_wd, "ALTER") == 0 &&
!              pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
               pg_strcasecmp(prev2_wd, "ON") == 0)
          COMPLETE_WITH_CONST("RENAME TO");

I'll submit that as a separate patch. This was intended to fix this:

    else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
             pg_strcasecmp(prev3_wd, "TRIGGER") == 0)
    {
        completion_info_charp = prev2_wd;
        COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
    }
    /*
     * If we have ALTER TRIGGER <sth> ON, then add the correct tablename
     */
    else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
             pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
             pg_strcasecmp(prev_wd, "ON") == 0)
        COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);

as the second else if clause is redundant.
Regards

Ian Barwick

-- Ian Barwick                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 202ffce..6d26ffc
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** static const SchemaQuery Query_for_list_
*** 714,719 ****
--- 714,724 ----
  "   FROM pg_catalog.pg_prepared_statements "\
  "  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"

+ #define Query_for_list_of_event_triggers \
+ " SELECT pg_catalog.quote_ident(evtname) "\
+ "   FROM pg_catalog.pg_event_trigger "\
+ "  WHERE substring(pg_catalog.quote_ident(evtname),1,%d)='%s'"
+
  /*
   * This is a list of all "things" in Pgsql, which can show up after CREATE or
   * DROP; and there is also a query to get a list of them.
*** static const pgsql_thing_t words_after_c
*** 746,751 ****
--- 751,757 ----
      {"DATABASE", Query_for_list_of_databases},
      {"DICTIONARY", Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW},
      {"DOMAIN", NULL, &Query_for_list_of_domains},
+     {"EVENT TRIGGER", NULL, NULL},
      {"EXTENSION", Query_for_list_of_extensions},
      {"FOREIGN DATA WRAPPER", NULL, NULL},
      {"FOREIGN TABLE",
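For context, these are the statements the new completions target; a small sketch (the trigger and function names are hypothetical):

    -- After "CREATE EVENT TRIGGER <name> ON ", psql can offer the event names:
    CREATE EVENT TRIGGER my_evt ON ddl_command_start
        EXECUTE PROCEDURE my_evt_func();   -- function must be declared RETURNS event_trigger
    -- Existing event trigger names are completed after ALTER/DROP:
    ALTER EVENT TRIGGER my_evt DISABLE;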
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 10 April 2014 01:13, Florian Pflug f...@phlo.org wrote: Also, the *only* reason that SUM(int2|int4) cannot use int8 as its transition type is that it needs to return NULL, not 0, if zero rows were aggregated. It might seem that it could just use int8 as state with initial value NULL, but that only works if the transition functions are non-strict (since the special case of state type == input type doesn't apply here). And non-strict transition functions need to deal with NULL inputs themselves, which means counting the number of non-NULL inputs... That problem would go away though if we had support for an initial function, which would receive the first input value and initialize the state. In the case of SUM(), the initial function would be strict, and thus would be called on the first non-NULL input value, which it'd convert to int8 and return as the new state. Ah snap! However, I still believe the best approach at this point is to just work on making int4_avg_accum faster. I still see no principled reason why it has to be noticeably slower - the only additional work it absolutely *has* to perform is *one* 64-bit increment. In the best case that would make sum() not noticeably slower than avg(), whereas using a firsttrans/initial function would potentially make both of them faster than they currently are, and not just in window queries. Also, IMO a first/initial function leads to a cleaner separation of logic, allowing some of the transition functions to be simplified rather than becoming more complex. Regards, Dean
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
On Thursday, Apr 10, 2014 at 1:15 AM, Álvaro Herrera wrote: Magnus Hagander wrote: On Wed, Apr 9, 2014 at 4:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well: basebackup_skip_path = $log_directory, $backup_label_files I hadn't considered any details, but yes, something along that line. And then you could also include arbitrary filenames or directories should you want, e.g. if you use the data directory to store your torrents or something. Man, that's a great idea. Database servers have lots of disk space in that partition, so it should work really well. Thanks! Yes, it sounds like a good idea. I will look into this and start working on it sometime. Thanks Regards, Vaishnavi Fujitsu Australia
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
* Tom Lane (t...@sss.pgh.pa.us) wrote: That sounds a bit confused. It was --- I clearly hadn't followed the thread entirely. The quals aren't getting to see rows they shouldn't be allowed to see. The issue rather is whether rows that the user quals would exclude might get locked anyway, because the locking happens before those quals are checked. This, imv, moves this from a 'major' bug to more of a 'performance' or 'obnoxious' side-effect issue that we should address *eventually*, but not something to hold up the RLS implementation. We often have more locking than is strictly required and then reduce it later (see also: ALTER TABLE lock reduction thread). It will be an unfortunate gotcha for a release or two, but hopefully we'll be able to address it... [ thinks for awhile... ] I guess the thing that is surprising is that ordinarily, if you say SELECT ... WHERE ... FOR UPDATE, only rows passing the WHERE clause get locked. If there's a security view involved, that gets reversed (at least for unsafe clauses). So from that standpoint it does seem pretty bogus. I'm not sure how much we can do about it given the current implementation technique for security views. A physical row lock requires access to the source relation and the ctid column, neither of which are visible at all outside an un-flattened subquery. Interesting. I'm trying to reason out why we don't have it already in similar situations; we can't *always* flatten a subquery... Updatable security_barrier views and RLS may make this a more prominent issue, but hopefully that'll just encourage work on fixing it (perhaps take the higher-level lock and then figure out a way to drop it when the rows don't match the later quals..?). Anyway, this is definitely a pre-existing issue with security_barrier views (or really with any unflattenable subquery), and so I'm not sure if it has much to do with the patch at hand. Agreed. Thanks! Stephen
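A minimal sketch of the gotcha under discussion, with the updatable-security-barrier-view patch applied (all object names hypothetical):

    CREATE TABLE t (id int, visible boolean, val int);
    CREATE VIEW v WITH (security_barrier) AS
        SELECT id, val FROM t WHERE visible;
    -- some_check(int) is a hypothetical user function; not being marked
    -- LEAKPROOF, it is evaluated above the un-flattened subquery, after
    -- the row lock is taken, so rows it would reject (but which pass the
    -- security qual) may get locked anyway:
    SELECT * FROM v WHERE some_check(val) FOR UPDATE;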
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
Stephen Frost sfr...@snowman.net writes: Interesting. I'm trying to reason out why we don't have it already in similar situations; we can't *always* flatten a subquery... I think we do, just nobody's particularly noticed (which further reduces the urgency of finding a solution, I suppose). regards, tom lane
Re: [HACKERS] four minor proposals for 9.5
On Tue, Apr 8, 2014 at 9:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-04-08 6:27 GMT+02:00 Amit Kapila amit.kapil...@gmail.com: So do you want to just print lock time for error'd statements? Won't it be better to do it for non-error'd statements as well; in fact, I feel it can be more useful for non-error statements. Do we already have some easy way to get wait-time for non-error statements? There are two points: a) we have no infrastructure for attaching specific info to a query other than log_line_prefix. Can't we do that by using log_duration and log_min_duration_statement? For example, if I enable these parameters, below is the log: LOG: duration: 343.000 ms statement: create table t1(c1 int); And I have no idea how to implement it better, and I don't think any new infrastructure (mechanism) is necessary. log_line_prefix increases log size, but it is very well parseable - splunk and similar software have no problem with it. One thing that could happen if we implement total lock time in log_line_prefix is that if the user enables log_lock_waits, then it will start printing duration for each lock wait time; not sure, again it depends on the implementation. b) lock time can be interesting on error statements too - for example - you can cancel locked queries - so you would want to see a lock time and duration for cancelled queries. So this implementation can be sensible too. Agreed, I just said it will be quite useful for non-error'd long-running statements as well, so it might be a good idea to see if we can implement it for successful statements as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
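For comparison, the knobs that exist today can already be combined in a superuser session; a minimal sketch:

    SET log_min_duration_statement = 0;  -- log "duration: ... ms statement: ..." for every completed statement
    SET log_lock_waits = on;             -- log lock waits that exceed deadlock_timeout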
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Mon, Apr 7, 2014 at 7:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: On Sat, Apr 5, 2014 at 8:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: AFAICS, pg_ctl already reports to stderr if stderr is a tty. This whole issue only comes up when pg_ctl itself is running as a service (which I guess primarily means starting/stopping the postgresql service?). So that puts extra weight on trying to be sure that error messages go to a predictable place; the user's going to be hard-pressed to debug background failures without that. Agreed. I think if this needs to be fixed, then it will be better to do it as per your suggestion of adding a new switch. So I have marked this patch as Returned with Feedback. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] small typo about comment in xlog.c
Hi, I'm reading xlog.c, and I noticed a typo in the comment for do_pg_abort_backup:

...
10247  * NB: This is only for aborting a non-exclusive backup that doesn't write
10248  * backup_label. A backup started with pg_stop_backup() needs to be finished
10249  * with pg_stop_backup().
...

I think "A backup started with pg_stop_backup()" should be "A backup started with pg_start_backup()". This is just a bug in a source comment, so it's not a big problem, but I want to fix it. See attached patch. regards, -- Tomonari Katsumata

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0b4a5f6..3551d94 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10245,7 +10245,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
  * an error handler.
  *
  * NB: This is only for aborting a non-exclusive backup that doesn't write
- * backup_label. A backup started with pg_stop_backup() needs to be finished
+ * backup_label. A backup started with pg_start_backup() needs to be finished
  * with pg_stop_backup().
  */
 void
Re: [HACKERS] Autonomous Transaction (WIP)
On 09 April 2014 12:14, Pavan Deolasee wrote: Whenever I was asked to have a look at implementing this feature, I always wondered about the great amount of global state that a backend maintains which is normally tied to a single top transaction. Since an AT will have the same characteristics as a top-level transaction, I wonder how you plan to separate those global state variables? Sure, we can group them in a structure and put them on a stack when an AT starts and pop them off when the original top transaction becomes active again, but finding all such global state variables is going to be tricky. I could think of a few global variables related to transaction properties (i.e. read-only mode, isolation level etc). As I plan to keep the transaction properties of an autonomous transaction the same as the main transaction's, there is no need to maintain these global variables separately. Apart from this, there are global variables like within-transaction counters, GUCs, and xactStartTimeStamp. I think there is no need to maintain these variables separately either; they can continue from their previous values for an autonomous transaction, similar to what a sub-transaction does. In the case of an autonomous transaction, the only global variables initialized are those related to resources (similar to a sub-transaction), which in any case get stored in the current transaction state. Please let me know if I am missing something or if you have some specific global-variable-related issue in mind. Thanks and Regards, Kumar Rajeev Rastogi
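For readers following along, a purely hypothetical sketch of the intended semantics --- no such syntax exists in PostgreSQL; the statement form and table names are illustration only:

    BEGIN;
        UPDATE accounts SET balance = balance - 100 WHERE id = 1;
        BEGIN AUTONOMOUS TRANSACTION;   -- hypothetical syntax from this WIP discussion
            INSERT INTO audit_log VALUES ('withdrawal attempted');
        COMMIT;                          -- commits independently of the outer transaction
    ROLLBACK;   -- the outer work is undone, but the audit_log row survives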
Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0
On Tue, Apr 8, 2014 at 11:32 PM, Bruce Momjian br...@momjian.us wrote: On Sun, Apr 6, 2014 at 11:45:59AM +0530, Amit Kapila wrote: In fact, this C program compiled by gcc on Debian issues no compiler warnings and returns 'hello', showing that -1 and ~0 compare as equal:

#include <stdio.h>

int main(int argc, char **argv)
{
    int          i;
    unsigned int j;

    i = -1;
    j = ~0;
    if (i == j)
        printf("hello\n");
    return 0;
}

I have added the below code to check its usage as per PG:

    if (j < 0)
        printf("hello-1\n");

It doesn't print hello-1, which means that all the checks in the code of the form "sock_desc < 0" can have a problem.

1. int pg_foreach_ifaddr(PgIfAddrCallback callback, void *cb_data)
   sock = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0);
   if (sock == SOCKET_ERROR)

Well, the actual problem here is that WSASocket() returns INVALID_SOCKET per the documentation, not SOCKET_ERROR. I did not use PGINVALID_SOCKET here because this is Windows-specific code, defining 'sock' as SOCKET. We could have sock defined as pgsocket, but because this is Windows code already, it doesn't seem wise to mix portability code in there. I think it's better to use a check like the below, just as a matter of consistency with the other places:

    if (sock == INVALID_SOCKET)

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Autonomous Transaction (WIP)
On Thu, Apr 10, 2014 at 10:44 AM, Rajeev Rastogi rajeev.rast...@huawei.com wrote: On 09 April 2014 12:14, Pavan Deolasee wrote: Whenever I was asked to have a look at implementing this feature, I always wondered about the great amount of global state that a backend maintains which is normally tied to a single top transaction. Since an AT will have the same characteristics as a top-level transaction, I wonder how you plan to separate those global state variables? Sure, we can group them in a structure and put them on a stack when an AT starts and pop them off when the original top transaction becomes active again, but finding all such global state variables is going to be tricky. I could think of a few global variables related to transaction properties (i.e. read-only mode, isolation level etc). As I plan to keep the transaction properties of an autonomous transaction the same as the main transaction's, there is no need to maintain these global variables separately. Apart from this, there are global variables like within-transaction counters, GUCs, and xactStartTimeStamp. I think there is no need to maintain these variables separately either; they can continue from their previous values for an autonomous transaction, similar to what a sub-transaction does. Hmm. Is that in line with what other databases do? I would have preferred an AT to run like a standalone transaction, without any influence from the starting transaction, managing its own resources/locks/visibility/triggers etc. In the case of an autonomous transaction, the only global variables initialized are those related to resources (similar to a sub-transaction), which in any case get stored in the current transaction state. Please let me know if I am missing something or if you have some specific global-variable-related issue in mind. No, I don't have any specific issues in mind. Mostly all such global state is managed through the various AtStart/AtEOX and related routines, so a careful examination of all those routines will give a good idea of what needs to be handled. You will probably need to write AtATStart/AtATEOX and similar routines to manage the state at AT start/commit/rollback. Sorry, I haven't looked at your WIP patch yet. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] four minor proposals for 9.5
2014-04-10 5:50 GMT+02:00 Amit Kapila amit.kapil...@gmail.com: On Tue, Apr 8, 2014 at 9:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-04-08 6:27 GMT+02:00 Amit Kapila amit.kapil...@gmail.com: So do you want to just print lock time for error'd statements? Won't it be better to do it for non-error'd statements as well; in fact, I feel it can be more useful for non-error statements. Do we already have some easy way to get wait-time for non-error statements? There are two points: a) we have no infrastructure for attaching specific info to a query other than log_line_prefix. Can't we do that by using log_duration and log_min_duration_statement? For example, if I enable these parameters, below is the log: LOG: duration: 343.000 ms statement: create table t1(c1 int); Yes, sorry, you are right. I thought about this possibility, and I chose log_line_prefix due to its simple configurability and better parseability. But again, an enhanced log_duration feature can be implemented together with an enhanced log_line_prefix. I'll try to visualise it in a prototype. Regards Pavel And I have no idea how to implement it better, and I don't think any new infrastructure (mechanism) is necessary. log_line_prefix increases log size, but it is very well parseable - splunk and similar software have no problem with it. One thing that could happen if we implement total lock time in log_line_prefix is that if the user enables log_lock_waits, then it will start printing duration for each lock wait time; not sure, again it depends on the implementation. b) lock time can be interesting on error statements too - for example - you can cancel locked queries - so you would want to see a lock time and duration for cancelled queries. So this implementation can be sensible too. Agreed, I just said it will be quite useful for non-error'd long-running statements as well, so it might be a good idea to see if we can implement it for successful statements as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com