Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Hello. The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment's locale, although it appears to work.

+REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo snip.

Hrm, that's quite cute. I dunno if there is a more canonical way of doing the above -- but it seems to work. I'm not sure this regression test is worth it. I'm thinking maybe we should just remove the regression test instead.

I agree. That is the fundamental question. I coded it just for fun, but I don't see much significance in doing it. We might omit the test, since this is non-critical and a corner case. I'll leave it to your decision whether to do that.

There is a minor issue with the patch where sql/plperl_lc_sql_ascii.sql contains the text plperl_lc.sql. After copying sql/plperl_lc.sql to sql/plperl_lc_sql_ascii.sql everything worked as described.

Ah. That is what used to be a symbolic link. I made the patch doubting whether a symlink could be encoded into a diff, saw the dubious result, and left it as it was :-p. I learned that symbolic links cannot be used meaningfully in a source tree managed by git.

-- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
Andres Freund Sent: Thursday, June 21, 2012 5:11 PM

4.) Log enough information in the walstream to make decoding possible using only the walstream.

What I understood is that the "enough information" is the catalog data. Is that right, or is it something else?

Advantages: * Decoding can optionally be done on the master * No catalog syncing/access required * its possible to make this architecture independent

Disadvantages: * high to very high implementation overhead depending on efficiency aims * high space overhead in the wal because at least all the catalog information needs to be logged in a transactional manner repeatedly

In the multiversion approach too, there will be a space overhead to maintain multiple versions, irrespective of which approach you use.

Why does it need to be logged repeatedly? Once we log the catalog information in WAL, we can disallow/block other DDLs during that time, and afterwards changes to the catalog information can be retrieved from WAL only.

* misuses wal far more than other methods

What is the misuse in this? I believe it can later be used for log-mining purposes as well.

* significant new complexity in somewhat critical code paths (heapam.c) * insanely high space overhead if the decoding should be possible architecture independent

Option 4 seems better than the others w.r.t. the top-level approach to solving the problem. Some other databases also use a similar approach for use cases similar to what you have described.

With Regards, Amit Kapila.
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On 21 June 2012 12:41, Andres Freund and...@2ndquadrant.com wrote: 2.) Keep the decoding site up2date by replicating the catalog via normal HS recovery mechanisms. Advantages: * most of the technology is already there * minimal overhead (space, performance) * no danger of out of sync catalogs * no support for command triggers required that can keep a catalog in sync, including oids Disadvantages: * driving the catalog recovery that way requires some somewhat intricate code as it needs to be done in lockstep with decoding the wal-stream * requires an additional feature to guarantee HS always has enough information to be queryable after a crash/shutdown * some complex logic/low-level fudging required to keep the transactional behaviour sensible when querying the catalog * full version/architecture compatibility required * the decoding site will only ever be readable

My initial reaction was "this won't work", but that requires qualification since this is a complex topic: You can use this approach as long as you realise that the catalog it gives can never be rewound. So the generic approach of "construct me a catalog as of this LSN" would need to start with a base backup of the catalog and then roll forward to the appropriate LSN. Which means a generic user of this approach would need to be able to construct an initial catalog using a PITR.

Constructing a decoding site requires you to a) take a partial base backup of the catalog, and b) apply WAL records to bring that forward to the correct LSN, which would require some alteration of the recovery code to skip the files missing in a).

So taking the approach of a decoding site means we have to modify recovery code, and even when we do that we still end up with an option that is difficult to deploy in the real world. Difficult to deploy because we need a whole new instance of Postgres, plus we need all of the WAL files, which could easily be impractical. The overall approach is good, but the architecture is wrong. 
What we need is a catalog base backup and a means of rolling forward to the appropriate LSN. Rolling forward using WAL is too bulky, so we need a separate log of DDL changes to the catalog. So what we need is a catalog base backup plus a ddl-log. And we need to be able to reconstruct the correct catalog on the target server directly.

To translate the WAL we maintain a secondary set of catalog tables, which only exist for logical replication. These are normal, non-versioned tables, but held in a new schema, pg_logical or similar. One reason why this must be a secondary catalog is to allow the translation to take place on the target server, and to allow translation of WAL from a prior version of the catalog - so we can allow online upgrades across different catalog versions (and possibly major versions). The secondary catalog is used in place of the normal catalog during InitCatalogCache() in the apply process. All the normal caches exist, they just point to secondary relations rather than the normal ones.

When we initialise replication we take a copy of the appropriate tables, columns and rows in a catalog-base-backup, using something like pg_dump. Overall, this is much smaller than the normal catalog, since it avoids temp tables and anything not related to WAL translation. On each non-temp change to the database we record the changes as SQL in the ddl-log, together with the LSN of the change. When the number of changes in the ddl-log hits a limit, we take a new catalog-base-backup. This process is similar to a checkpoint, but much less frequent; let's call it a ddl-checkpoint.

When we start to read WAL logs to translate them, we start by truncating/re-bootstrapping and reloading the secondary catalog from the base backup. We then apply all changes from the ddl-log (which is just a sequence of SQL statements) up until the LSN at the start of WAL. The secondary catalog is then an exact copy of the catalog as of that LSN. 
As we read through WAL we apply further changes to the secondary catalog so it stays in lockstep with the WAL we are currently reading. Having the ddl-base-backup and ddl-log allows reconstruction of the catalog without needing to put the whole catalog into WAL at each checkpoint. We can truncate old WAL segments and yet still recreate the DDL needed to translate current WAL data. As a result, ddl-checkpoints are much less frequent, perhaps weekly or monthly, rather than every few minutes. The whole process is similar in algorithm to recovery, but uses just normal userspace tables and SQL.

Constructing the correct catalog seems to be the heart of this problem, so it is likely to take a while and look complex. Getting the HS initial state was around 50% of the effort in making it all work, so I guess it's similar here.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Hello. Renaming ret to quoted and str to ret as in the attached patch might make it more readable.

I think I'm going to refrain from this because it will be more painful to backpatch.

I hesitated to do so, too. The new patch does indeed avoid the leaks, although they do not last permanently. This preserves more heap for future work but is not strictly necessary. On the other hand, the return value of DatumGetTextP() may also be (and in this case 'should' be) palloc'ed but is not pfree'd, like other parts of the PostgreSQL source (as far as I saw), because of, I suppose, the nature of these functions: it is difficult or impossible to predict whether the returned memory block is palloc'ed or not. And such code is painful to maintain and not robust against future modification. From that point of view, it might be good to refrain from backporting this.

The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment's locale, although it appears to work.

The idea of separating the test into its own file has its merit; but instead of having two different tests, I'm going to have a single test and two expected files. That seems simpler than messing around in the makefile.

Yes, you're right. But it was easier to add pairs of .sql and .out to do that. Plus, as I wrote in another message, although I wrote it I'm reluctant to push it :-(

-- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012.
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote: Hackers, attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing. Index creation is slightly slower. Probably, it needs some investigation. Search queries on SP-GiST use many more pages. However, this comparison may not be entirely fair, because SP-GiST can pin the same buffer several times during one scan. In CPU terms, search queries on SP-GiST seem to be slightly faster. The dramatic difference in the "column @> const" query is thanks to the 2d-mapping.

Looking at this patch now. I see that it fails the opr_sanity test (on master); can you look into that? It looks very promising from a performance standpoint. I think the "col @> const" query will be a common one, and I also think that pattern will be useful to restrict a large table down to something more manageable.

In the bounds_connected function, it might make more sense to use the word "adjacent", which I already used for ordinary ranges, rather than introducing the new word "connected". Also, I'm getting a little confused switching between thinking in terms of X and Y versus lower and upper (particularly since "lower" and "upper" can be confused with < or >). I don't have a suggestion yet how to clarify that, but it might be good to use the spatial terminology in more places and avoid lower/upper except where needed.

Please excuse the slow review, I'm catching up on the SP-GiST API.

Regards, Jeff Davis
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On Friday, June 22, 2012 12:23:57 AM Peter Geoghegan wrote: On 20 June 2012 14:38, Andres Freund and...@2ndquadrant.com wrote: It incurs a rather high performance overhead due to added memory allocations and added pointer indirections. That's fine for most of the current users of the List interface, but certainly not for all. In other places you cannot even have memory allocations because the list lives in shared memory.

Yes, in general lists interact horribly with the memory hierarchy. I think I pointed out to you once a rant of mine on -hackers a while back in which I made various points about just how badly they do these days.

Yes, but how is that relevant? It's still the best data structure for many use-cases. Removing one of the two indirections is still a good idea, hence this patch ;)

On modern architectures, with many layers of cache, the cost of the linear search to get an insertion point is very large. So this:

/*
 * removes a node from a list
 * Attention: O(n)
 */
static inline void
ilist_s_remove(ilist_s_head *head, ilist_s_node *node)

is actually even worse than you might suspect.

O(n) is O(n), the constant is irrelevant. Anybody who uses arbitrary node removal in a singly linked list in the fast path deserves the pain ;) Several of the pieces of code I pointed out in a previous email use an open-coded list implementation exactly to prevent those problems.

Interesting. So, it seems like this list implementation could be described as a minimal embeddable list implementation that requires the user to do all the memory allocation, and offers a doubly-linked list too. Not an unreasonable idea. I do think that the constraints you have are not well served by any existing implementation, including List and Dllist.

Yep. Note though that you normally wouldn't do extra/manual memory allocation, because you just use the already allocated memory of the struct you embedded the list element into. 
Are you planning on just overhauling the Dllist interface in your next iteration? It needs to be unified.

Not yet sure whether it's better to just remove Dllist or morph my code into it.

All of the less popular compilers we support we support precisely because they pretend to be GCC, with the sole exception, as always, of the Microsoft product, in this case MSVC. So my position is that I'm in broad agreement that we should freely allow the use of inline without macro hacks, since we generally resist using macro hacks if that makes code ugly, which USE_INLINE certainly does, and for a benefit that is indistinguishable from zero, at least to me.

Tom already pointed out that not all compilers pretend to be gcc. I agree though that we should try to make all supported compilers support USE_INLINE. I think with some ugliness that should be possible, at least for aCC. Will respond to Tom on that.

Why are you using the stdlib's assert.h? Why have you used the NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if the header was intended to live in port, but it isn't, right?

That should probably be removed, yes. I did it that way so that it could be tested independently of casserts, because the list-checking code turns some linear algorithms into quadratic ones, which is noticeable even when --enable-cassert is defined.

Why have you done this:

#ifdef __GNUC__
#define unused_attr __attribute__((unused))
#else
#define unused_attr
#endif

and then gone on to use this unused_attr macro all over the place? Firstly, that isn't going to suppress the warnings on many platforms that we support, and we do make an effort to build warning-free on at least 3 compilers these days - GCC, Clang and MSVC. Secondly, compilers give these warnings because it doesn't make any sense to have an unused parameter - so why have you used one? At the very least, if you require this exact interface, use compatibility macros. I can't imagine why that would be important though. 
And even if you did want a standard unused_attr facility, you'd do that in c.h, where a lot of that stuff lives.

If you look at the places it's mostly used, they are functions like:

/*
 * adds a node at the beginning of the list
 */
static inline void
ilist_d_push_front(ilist_d_head *head, ilist_d_node *node)
{
    node->next = head->head.next;
    node->prev = &head->head;
    node->next->prev = node;
    head->head.next = node;
    ilist_d_check(head);
}

Where ilist_d_check doesn't do anything if assertions aren't enabled, which gcc unfortunately groks and warns about. The other case is functions like:

static inline void
ilist_s_add_after(unused_attr ilist_s_head *head, ilist_s_node *after, ilist_s_node *node)
{
    node->next = after->next;
    after->next = node;
}

Where it makes sense for the api to get the head element for consistency reasons. It very well would be possible to add a checking
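The embedded-list idea discussed above can be sketched in a self-contained way (the names dnode, dlist_head, and item here are hypothetical simplifications, not the identifiers from Andres' patch): the node lives inside the caller's struct, so insertion and removal need no allocation, removal from the doubly-linked variant is O(1), and the containing struct is recovered with offsetof arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* list node embedded directly in user structs; the head is a sentinel
 * node whose next/prev point at itself when the list is empty */
typedef struct dnode { struct dnode *next, *prev; } dnode;
typedef struct { dnode head; } dlist_head;

static inline void dlist_init(dlist_head *h)
{
    h->head.next = h->head.prev = &h->head;
}

static inline void dlist_push_front(dlist_head *h, dnode *n)
{
    n->next = h->head.next;
    n->prev = &h->head;
    n->next->prev = n;
    h->head.next = n;
}

/* O(1) removal - the payoff of the doubly-linked embedded variant */
static inline void dlist_remove(dnode *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* recover the containing struct from an embedded node */
#define dlist_container(type, field, ptr) \
    ((type *) ((char *) (ptr) - offsetof(type, field)))

/* example user struct with the list node embedded - no separate
 * allocation for list membership */
typedef struct { int value; dnode link; } item;
```

The same embedding works in shared memory, since the list never allocates on its own.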
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On Friday, June 22, 2012 02:04:02 AM Tom Lane wrote: Peter Geoghegan pe...@2ndquadrant.com writes: All of the less popular compilers we support we support precisely because they pretend to be GCC, with the sole exception, as always, of the Microsoft product, in this case MSVC.

This is nonsense. There are at least three buildfarm machines running compilers that do not pretend to be gcc (at least, configure recognizes them as not gcc) and are not MSVC either. We ought to have more IMO, because software monocultures are dangerous. Of those three, two pass the quiet inline test and one --- the newest of the three if I guess correctly --- does not. So it is not the case that !USE_INLINE is dead code, even if you adopt the position that we don't care about any compiler not represented in the buildfarm.

I think you can make HP-UX's aCC do the right thing with some trickery, though. I don't have access to HP-UX anymore, so I can't test it. Should there be no other trick - I think there is, though - we could just specify -W2177 as an alternative parameter to test in the 'quiet static inline' test.

I definitely do not want to bar any sensible compiler from compiling postgres, but the keyword here is 'sensible'. If it requires some modest force/trickery to behave sensibly, that's OK, but I don't find it OK if we need to ship huge unreadable crufty macros just to support them.

Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
Based on the discussion and suggestions in this mail chain, the following features can be implemented:

1. Compute the max LSN in data pages, based on user input as to whether it is wanted for an individual file, a particular directory, or the whole database.
2a. Search the available WAL files for the latest checkpoint record and print its value.
2b. Search the available WAL files for the latest checkpoint record and recreate a pg_control file pointing at that checkpoint.

I have kept both options to address different kinds of corruption scenarios:

1. The WAL files are on a separate partition which is not corrupt; only the partition holding the data files and pg_control is corrupt. In this case users can use options 2a or 2b to proceed.
2. pg_control, data, and WAL are all on the same disk partition, which got corrupt. In this case the user can use options 1 and 2a to decide the next-LSN for pg_control and proceed.

Suggestions? If there is agreement on these features, I can send a proposal for which options to add to an existing or new utility.

With Regards, Amit Kapila.
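Feature 1 above (finding the max LSN across data pages) could look roughly like the sketch below. It assumes 8kB blocks and that the page LSN occupies the first 8 bytes of each page header as two 32-bit fields (xlogid, xrecoff), as in PageHeaderData of that era; the buffer-based interface and function names are illustrative, not from any proposed utility, and native little-endian byte order is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192          /* assumed BLCKSZ */

/* Extract the page LSN from the first 8 bytes of a page header.
 * Assumes the layout { uint32 xlogid; uint32 xrecoff; } in the
 * server's native (here: little-endian) byte order. */
static uint64_t page_lsn(const unsigned char *page)
{
    uint32_t xlogid, xrecoff;

    memcpy(&xlogid, page, sizeof(xlogid));
    memcpy(&xrecoff, page + 4, sizeof(xrecoff));
    return ((uint64_t) xlogid << 32) | xrecoff;
}

/* Scan a buffer of consecutive pages (e.g. one relation segment read
 * from disk) and return the highest page LSN found.  All-zero,
 * never-initialised pages contribute LSN 0 and are thus ignored. */
static uint64_t max_lsn_in_buffer(const unsigned char *buf, size_t len)
{
    uint64_t max = 0;

    for (size_t off = 0; off + PAGE_SIZE <= len; off += PAGE_SIZE)
    {
        uint64_t lsn = page_lsn(buf + off);

        if (lsn > max)
            max = lsn;
    }
    return max;
}
```

A real tool would also validate each page header before trusting its LSN, since the whole point is that the files may be corrupt.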
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
Hi,

On Friday, June 22, 2012 08:48:41 AM Simon Riggs wrote: On 21 June 2012 12:41, Andres Freund and...@2ndquadrant.com wrote: 2.) Keep the decoding site up2date by replicating the catalog via normal HS recovery mechanisms. Advantages: * most of the technology is already there * minimal overhead (space, performance) * no danger of out of sync catalogs * no support for command triggers required that can keep a catalog in sync, including oids Disadvantages: * driving the catalog recovery that way requires some somewhat intricate code as it needs to be done in lockstep with decoding the wal-stream * requires an additional feature to guarantee HS always has enough information to be queryable after a crash/shutdown * some complex logic/low-level fudging required to keep the transactional behaviour sensible when querying the catalog * full version/architecture compatibility required * the decoding site will only ever be readable

My initial reaction was "this won't work", but that requires qualification since this is a complex topic: You can use this approach as long as you realise that the catalog it gives can never be rewound.

Well, only as far as the min recovery point has been advanced. That is advanced less frequently than xlog records are applied.

So the generic approach of "construct me a catalog as of this LSN" would need to start with a base backup of the catalog and then roll forward to the appropriate LSN. Which means a generic user of this approach would need to be able to construct an initial catalog using a PITR. Constructing a decoding site requires you to a) take a partial base backup of the catalog, and b) apply WAL records to bring that forward to the correct LSN, which would require some alteration of the recovery code to skip the files missing in a). So taking the approach of a decoding site means we have to modify recovery code, and even when we do that we still end up with an option that is difficult to deploy in the real world. 
Difficult to deploy because we need a whole new instance of Postgres, plus we need all of the WAL files, which could easily be impractical. The overall approach is good, but the architecture is wrong. What we need is a catalog base backup and a means of rolling forward to the appropriate LSN. Rolling forward using WAL is too bulky, so we need a separate log of DDL changes to the catalog. So what we need is a catalog base backup plus a ddl-log.

The idea was to store the applycache to disk every time UpdateMinRecoveryPoint is called. That way you wouldn't have to scroll back, even if the database crashes or is stopped hard. But I agree, I don't like the architecture that much either.

To translate the WAL we maintain a secondary set of catalog tables, which only exist for logical replication. These are normal, non-versioned tables, but held in a new schema, pg_logical or similar. One reason why this must be a secondary catalog is to allow the translation to take place on the target server, and to allow translation of WAL from a prior version of the catalog - so we can allow online upgrades across different catalog versions (and possibly major versions). The secondary catalog is used in place of the normal catalog during InitCatalogCache() in the apply process. All the normal caches exist, they just point to secondary relations rather than the normal ones. When we initialise replication we take a copy of the appropriate tables, columns and rows in a catalog-base-backup, using something like pg_dump. Overall, this is much smaller than the normal catalog, since it avoids temp tables and anything not related to WAL translation. On each non-temp change to the database we record the changes as SQL in the ddl-log, together with the LSN of the change. When the number of changes in the ddl-log hits a limit, we take a new catalog-base-backup. This process is similar to a checkpoint, but much less frequent; let's call it a ddl-checkpoint. 
When we start to read WAL logs to translate them, we start by truncating/re-bootstrapping and reloading the secondary catalog from the base backup. We then apply all changes from the ddl-log (which is just a sequence of SQL statements) up until the LSN at the start of WAL. The secondary catalog is then an exact copy of the catalog as of that LSN. As we read through WAL we apply further changes to the secondary catalog so it stays in lockstep with the WAL we currently read.

I can't see how that's going to fly, because the *_out functions use the syscache and also plain access to catalog tables. We would have to completely map oids to the alternative catalog. For one, I think that mapping would involve far too many places (shared catalogs/relmapper, smgr, fd, syscache, ...). For another, you need to access those tables in a completely normal fashion from non-recovery backends, which means that we cannot just have duplicated oids hidden away somewhere. Constructing the
Re: [HACKERS] COMMUTATOR doesn't seem to work
On Jun22, 2012, at 06:32 , D'Arcy Cain wrote: So I have my type working now, but I had to create a new C function that takes the opposite argument order. Seems redundant, but I could not see a better way.

There isn't. Postgres itself contains a huge number of such functions, e.g. for every *lt() (less-than) there's a *gt() (greater-than).

best regards, Florian Pflug
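The pattern Florian describes - a commutator function that is simply the original with the argument order swapped - looks like this in miniature (hypothetical type and function names, not PostgreSQL's actual V1 fmgr calling convention):

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical pair type with a lexicographic ordering */
typedef struct { int x, y; } mytype;

/* the "real" comparison: a < b */
static bool mytype_lt(mytype a, mytype b)
{
    return (a.x < b.x) || (a.x == b.x && a.y < b.y);
}

/* the commutator: a > b is exactly b < a with arguments reversed,
 * so the new function is a one-line wrapper, redundant but necessary */
static bool mytype_gt(mytype a, mytype b)
{
    return mytype_lt(b, a);
}
```

Declaring COMMUTATOR on the operators then tells the planner the two are interchangeable with swapped operands, but each operator still needs its own underlying function.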
Re: [HACKERS] Transactions over pathological TCP connections
Leon Smith leon.p.sm...@gmail.com writes: It's not clear to me that this is even a solvable problem without modifying the schema to include both a taken and a finished processing state, and then letting elements be re-delivered after a period of time.

You maybe should have a look at PGQ from Skytools before reinventing it.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [PATCH] Add some more documentation for array indexes/operators
On Wed, Jun 20, 2012 at 9:23 AM, Ryan Kelly rpkell...@gmail.com wrote: I had trouble finding what operators arrays supported or which ones had index support or even determining that arrays could be indexed from the documentation from the array data type. So, patch.

Yeah, I agree that the method should be mentioned there. I edited this down a bit so that it doesn't duplicate quite as much material, and committed it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Too frequent message of pgbench -i?
On Thu, Jun 21, 2012 at 3:44 AM, Tatsuo Ishii is...@postgresql.org wrote: On Wed, Jun 20, 2012 at 4:04 AM, Tatsuo Ishii is...@postgresql.org wrote: Currently pgbench -i prints the following message every 10k tuples created:

fprintf(stderr, "%d tuples done.\n", j);

The frequency probably seemed appropriate a long time ago, but computers have become so fast that a message every 10k tuples now seems too frequent to me. Can we change the frequency from 10k to 100k?

+1 for doing it that way. I have an old patch lying around for that. It's one line, which seems to be about as much work as the problem justifies.

+1. Your patch looks good.

Thanks, committed.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
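The change under discussion amounts to one line in the table-population loop; a rough sketch (the loop shape and function name are illustrative, not the committed pgbench code, and the function returns the number of progress lines emitted so the frequency is easy to check):

```c
#include <stdio.h>

/* Sketch of the pgbench -i progress message after the change:
 * report every 100k tuples instead of every 10k. */
static int fill_accounts(int naccounts)
{
    int reported = 0;

    for (int j = 1; j <= naccounts; j++)
    {
        /* ... INSERT the j-th row here ... */
        if (j % 100000 == 0)
        {
            fprintf(stderr, "%d tuples done.\n", j);
            reported++;
        }
    }
    return reported;
}
```

With the default pgbench scale of 100000 accounts per scale factor, this prints one line per scale unit rather than ten.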
Re: [HACKERS] Pruning the TODO list
On Thu, Jun 21, 2012 at 10:25 AM, Simon Riggs si...@2ndquadrant.com wrote: On 21 June 2012 15:00, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 21 June 2012 08:30, Peter Eisentraut pete...@gmx.net wrote: Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release.

We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that.

The idea is to remove contentious issues from the list, to avoid the waste of time.

The thing is, a lot of stuff gets punted to the TODO list *because* it's contentious, i.e. there's not consensus on what to do. If there were consensus we might've just done it already. I'm not sure we want to remove such entries, though perhaps somehow marking them as debatable would be a good thing. There may well be stuff on the list that is no longer very relevant in today's world, but somebody would have to go through it item by item to decide which ones those are. I'm not volunteering.

*smiles* Understood. I'll have a play. Maybe I should just go with the idea of "Simon's TODO List" - stuff I personally think is worth working on - and leave it at that.

+1 for that approach. I have a page on the wiki which is irregularly updated and contains a somewhat random list of things that I think are worth doing. I think it would be great to have similar lists for other developers, even if they're not 100% up-to-date or accurate. I wouldn't be averse to pruning 10% of the TODO list, maybe even 20%, but I think there's a lot of stuff on there that's actually worth doing, even if much of it needs discussion before it's implemented. Novices - and even experienced developers, sometimes - tend to make the mistake of deciding to implement X, as if it were self-evident that X is a good thing. The TODO list can feed that misapprehension, but isn't really the source of it. Rather, we all like to believe that our own ideas are awesome. 
This is frequently true, but not so frequently as we like to believe.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 05:40:08 PM Andres Freund wrote: On Thursday, June 21, 2012 03:56:54 PM Florian Pflug wrote: On Jun21, 2012, at 13:41 , Andres Freund wrote: 3b) Ensure that enough information in the catalog remains by fudging the xmin horizon. Then reassemble an appropriate snapshot to read the catalog as the tuple in question has seen it.

The ComboCID machinery makes that quite a bit harder, I fear. If a tuple is updated multiple times by the same transaction, you cannot decide whether a tuple was visible in a certain snapshot unless you have access to the updating backend's ComboCID hash.

Thats a very good point. Not sure how I forgot that. It think it might be possible to reconstruct a sensible combocid mapping from the walstream. Let me think about it for a while...

I have a very, very preliminary thing which seems to work somewhat. I just log (cmin, cmax) additionally for every modified catalog tuple into the wal and so far that seems to be enough. Do you happen to have suggestions for other problematic things to look into before I put more time into it?

Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Pruning the TODO list
On 22 June 2012 14:15, Robert Haas robertmh...@gmail.com wrote: Rather, we all like to believe that our own ideas are awesome. This is frequently true, but not so frequently as we like to believe.

Hmm, for me, awesome has nothing to do with it. I strive to produce useful features that address real problems in the simplest way. I think most of my proposals are fairly obvious to database users. If I find a solution, I push it, but not because I found it, or because I think it's awesome. The idea that I'm scratching my own itches is mostly wrong. For me, this is about working on the features that Postgres needs and then doing them, with a sense of urgency that seems to be slightly ahead of the curve. It's not just reviewers that work hard on tasks they may not be interested in yet believe are for the common good.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] proposal and patch : support INSERT INTO...RETURNING with partitioned table using rule
On Wed, Jun 20, 2012 at 12:24 PM, John Lumby johnlu...@hotmail.com wrote: An INSERT which has a RETURNING clause and which is to be rewritten based on a rule will be accepted if the rule is an unconditional DO INSTEAD. In general I believe unconditional means no WHERE clause, but in practice if the rule is of the form

CREATE RULE insert_part_history as ON INSERT to history
    DO INSTEAD SELECT history_insert_partitioned(NEW) returning NEW.id

this is treated as conditional and the query is rejected.

This isn't rejected because the query is treated as conditional; it's rejected because it's not valid syntax. A SELECT query can't have a RETURNING clause, because the target list (i.e. the part that immediately follows the SELECT) already serves that purpose. The fact that it's in a CREATE RULE statement is irrelevant:

rhaas=# select 4 returning 3;
ERROR:  syntax error at or near "returning"
LINE 1: select 4 returning 3;
                 ^

I propose to extend the rule system to recognize cases where the INSERT query specifies RETURNING and the rule promises to return a row, and to then permit this query to run and return the expected row. In effect, to widen the definition of unconditional to handle cases such as my testcase.

That already (kind of) works:

rhaas=# create table history (id bigserial, name text);
NOTICE:  CREATE TABLE will create implicit sequence "history_id_seq" for serial column "history.id"
CREATE TABLE
rhaas=# create table history1 () inherits (history);
CREATE TABLE
rhaas=# create rule history_insert as on insert to history do instead
        insert into history1 (id, name) values (NEW.id, NEW.name || ' is awesome!')
        returning 17::bigint, 'cheeze whiz'::text;
CREATE RULE
rhaas=# insert into history (name) values ('Linus') returning id, name;
 id |    name
----+-------------
 17 | cheeze whiz
(1 row)

INSERT 0 1
rhaas=# select * from history;
 id |       name
----+-------------------
  1 | Linus is awesome!
(1 row)

I do notice that the RETURNING clause of the INSERT can't reference NEW, which seems like a restriction that we probably ought to lift, but it doesn't seem to have much to do with your patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] initdb and fsync
On Sat, Feb 4, 2012 at 8:18 PM, Noah Misch n...@leadboat.com wrote: On Sat, Feb 04, 2012 at 03:41:27PM -0800, Jeff Davis wrote: On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote: Yeah. Personally I would be sad if initdb got noticeably slower, and I've never seen or heard of a failure that this would fix. I worked up a patch, and it looks like it does about 6 file fsync's and a 7th for the PGDATA directory. That degrades the time from about 1.1s to 1.4s on my workstation. So, is it worth it? Should we make it an option that can be specified? If we add fsync calls to the initdb process, they should cover the entire data directory tree. This patch syncs files that initdb.c writes, but we ought to also sync files that bootstrap-mode backends had written. An optimization like the pg_flush_data() call in copy_file() may reduce the speed penalty. initdb should do these syncs by default and offer an option to disable them. This may be a stupid question, but why is it initdb's job to fsync the files the server creates, rather than the server's job? Normally we rely on the server to make its own writes persistent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
I am not convinced that it's a good idea to wake up every walsender every time we do XLogInsert(). XLogInsert() is a super-hot code path, and adding more overhead there doesn't seem warranted. We need to replicate commit, commit prepared, etc. quickly, but why do we need to worry about a short delay in replicating heap_insert/update/delete, for example? They don't really matter until the commit arrives. 7 seconds might be a bit long, but that could be fixed by decreasing the polling interval for walsender to, say, a second. It's not woken up every XLogInsert call. It's only woken up if there was an actual disk write + fsync in there. That's exactly the point of the patch. Sure, but it's still adding cycles to XLogInsert. I'm not sure that XLogBackgroundFlush() is the right place to be doing this, but at least it's in the background rather than the foreground. The wakeup rate is actually lower for synchronous_commit=on than before because then it unconditionally did a wakeup for every commit (and similar) and now only does that if something has been written + fsynced. I'm a bit confused by this, because surely if there's been a commit, then WAL has been written and fsync'd, but the reverse is not true. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed
On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: I'm confused by this remark, because surely the query planner does it this way only if there's no LIMIT. When there is a LIMIT, we choose based on the startup cost plus the estimated fraction of the total cost we expect to pay based on dividing the LIMIT by the overall row count estimate. Or is this not what you're talking about? I think that Ants is pointing at the way costs are estimated in choose_hashed_grouping()/choose_hashed_distinct(), i.e. cost_agg() for cheapest_path + hashagg, where the costs are calculated based on the total cost only of cheapest_path. I think that it might be good to do cost_agg() for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED strategy. Well, Ants already made some adjustments to those functions; not sure if this means they need some more adjustment, but I don't see that there's a general problem with the costing algorithm around LIMIT. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 12:23:57 AM Peter Geoghegan wrote: Why are you using the stdlib's assert.h? Why have you used the NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if the header was intended to live in port, but it isn't, right? That should probably be removed, yes. I did it that way so that it could be tested independently of casserts, because the list checking code turns some linear algorithms into quadratic ones, which is noticeable even when --enable-cassert is defined. As far as that goes, I wonder whether the list-checking code hasn't long since served its purpose. Neil Conway put it in when he redid the List API to help catch places that were using no-longer-supported hacks; but it's been years since I've seen it catch anything. I suggest that we might want to either remove it, or enable it via something other than USE_ASSERT_CHECKING (and not enable it by default). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Friday, June 22, 2012 04:09:59 PM Robert Haas wrote: I am not convinced that it's a good idea to wake up every walsender every time we do XLogInsert(). XLogInsert() is a super-hot code path, and adding more overhead there doesn't seem warranted. We need to replicate commit, commit prepared, etc. quickly, but why do we need to worry about a short delay in replicating heap_insert/update/delete, for example? They don't really matter until the commit arrives. 7 seconds might be a bit long, but that could be fixed by decreasing the polling interval for walsender to, say, a second. It's not woken up every XLogInsert call. It's only woken up if there was an actual disk write + fsync in there. That's exactly the point of the patch. Sure, but it's still adding cycles to XLogInsert. I'm not sure that XLogBackgroundFlush() is the right place to be doing this, but at least it's in the background rather than the foreground. It adds one if() if nothing was fsynced. If something was written and fsynced inside XLogInsert, some kill() calls are surely not the problem. The wakeup rate is actually lower for synchronous_commit=on than before because then it unconditionally did a wakeup for every commit (and similar) and now only does that if something has been written + fsynced. I'm a bit confused by this, because surely if there's been a commit, then WAL has been written and fsync'd, but the reverse is not true. As soon as you have significant concurrency, by the time the XLogFlush in RecordTransactionCommit() is reached another backend or the wal writer may have already fsynced the WAL up to the requested point. In that case no wakeup will be performed by the committing backend at all. 9.2 improved the likelihood of that as you know.
Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On Friday, June 22, 2012 04:18:35 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 12:23:57 AM Peter Geoghegan wrote: Why are you using the stdlib's assert.h? Why have you used the NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if the header was intended to live in port, but it isn't, right? That should probably be removed, yes. I did it that way so that it could be tested independently of casserts, because the list checking code turns some linear algorithms into quadratic ones, which is noticeable even when --enable-cassert is defined. As far as that goes, I wonder whether the list-checking code hasn't long since served its purpose. Neil Conway put it in when he redid the List API to help catch places that were using no-longer-supported hacks; but it's been years since I've seen it catch anything. I suggest that we might want to either remove it, or enable it via something other than USE_ASSERT_CHECKING (and not enable it by default). Oh, Peter and I weren't talking about the pg_list.h stuff, it was about my 'embedded list' implementation which started this subthread. The pg_list.h/list.c stuff isn't problematic as far as I have seen in profiles; its checks are pretty simple so I do not find that surprising. We might want to disable it by default anyway. In my code the list checking stuff iterates over the complete list after modifications and checks that all prev/next pointers are correct, so it's linear in itself... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Fri, Jun 22, 2012 at 10:19 AM, Andres Freund and...@2ndquadrant.com wrote: On Friday, June 22, 2012 04:09:59 PM Robert Haas wrote: I am not convinced that it's a good idea to wake up every walsender every time we do XLogInsert(). XLogInsert() is a super-hot code path, and adding more overhead there doesn't seem warranted. We need to replicate commit, commit prepared, etc. quickly, but why do we need to worry about a short delay in replicating heap_insert/update/delete, for example? They don't really matter until the commit arrives. 7 seconds might be a bit long, but that could be fixed by decreasing the polling interval for walsender to, say, a second. It's not woken up every XLogInsert call. It's only woken up if there was an actual disk write + fsync in there. That's exactly the point of the patch. Sure, but it's still adding cycles to XLogInsert. I'm not sure that XLogBackgroundFlush() is the right place to be doing this, but at least it's in the background rather than the foreground. It adds one if() if nothing was fsynced. If something was written and fsynced inside XLogInsert, some kill() calls are surely not the problem. The wakeup rate is actually lower for synchronous_commit=on than before because then it unconditionally did a wakeup for every commit (and similar) and now only does that if something has been written + fsynced. I'm a bit confused by this, because surely if there's been a commit, then WAL has been written and fsync'd, but the reverse is not true. As soon as you have significant concurrency, by the time the XLogFlush in RecordTransactionCommit() is reached another backend or the wal writer may have already fsynced the WAL up to the requested point. In that case no wakeup will be performed by the committing backend at all. 9.2 improved the likelihood of that as you know. Hmm, well, I guess. I'm still not sure I really understand what benefit we're getting out of this. If we lose a few WAL records for an uncommitted transaction, who cares? 
That transaction is gone anyway. As an implementation detail, I suggest rewriting WalSndWakeupRequest and WalSndWakeupProcess as macros. The old code does an in-line test for max_wal_senders > 0, which suggests that somebody thought the function call overhead might be enough to matter here. Perhaps they were wrong, but it shouldn't hurt anything to keep it that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
Andres Freund and...@2ndquadrant.com writes: Oh, I and Peter weren't talking about the pg_list.h stuff, it was about my 'embedded list' implementation which started this subthread. The pg_list.h/list.c stuff isn't problematic as far as I have seen in profiles; its checks are pretty simple so I do not find that surprising. We might want to disable it by default anyway. In my code the list checking stuff iterates over the complete list after modifications and checks that all prev/next pointers are correct so its linear in itself... Well, so does list.c, so I'd expect the performance risks to be similar. Possibly you're testing on longer lists than are typical in the backend. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Friday, June 22, 2012 04:34:33 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 10:19 AM, Andres Freund and...@2ndquadrant.com wrote: On Friday, June 22, 2012 04:09:59 PM Robert Haas wrote: I am not convinced that it's a good idea to wake up every walsender every time we do XLogInsert(). XLogInsert() is a super-hot code path, and adding more overhead there doesn't seem warranted. We need to replicate commit, commit prepared, etc. quickly, but why do we need to worry about a short delay in replicating heap_insert/update/delete, for example? They don't really matter until the commit arrives. 7 seconds might be a bit long, but that could be fixed by decreasing the polling interval for walsender to, say, a second. It's not woken up every XLogInsert call. It's only woken up if there was an actual disk write + fsync in there. That's exactly the point of the patch. Sure, but it's still adding cycles to XLogInsert. I'm not sure that XLogBackgroundFlush() is the right place to be doing this, but at least it's in the background rather than the foreground. It adds one if() if nothing was fsynced. If something was written and fsynced inside XLogInsert, some kill() calls are surely not the problem. The wakeup rate is actually lower for synchronous_commit=on than before because then it unconditionally did a wakeup for every commit (and similar) and now only does that if something has been written + fsynced. I'm a bit confused by this, because surely if there's been a commit, then WAL has been written and fsync'd, but the reverse is not true. As soon as you have significant concurrency, by the time the XLogFlush in RecordTransactionCommit() is reached another backend or the wal writer may have already fsynced the WAL up to the requested point. In that case no wakeup will be performed by the committing backend at all. 9.2 improved the likelihood of that as you know. Hmm, well, I guess. I'm still not sure I really understand what benefit we're getting out of this. 
If we lose a few WAL records for an uncommitted transaction, who cares? That transaction is gone anyway. Well, before the simple fix Simon applied after my initial complaint you didn't get wakeups *at all* in the synchronous_commit=off case. Now, with the additional changes, the walsender is woken exactly when data is available to send and not always when a commit happens. I played around with various scenarios and it always was a win. One reason is that the walreceiver often is a bottleneck because it fsyncs the received data immediately, so a less blocky transfer pattern reduces that problem a bit. As an implementation detail, I suggest rewriting WalSndWakeupRequest and WalSndWakeupProcess as macros. The old code does an in-line test for max_wal_senders > 0, which suggests that somebody thought the function call overhead might be enough to matter here. Perhaps they were wrong, but it shouldn't hurt anything to keep it that way. True. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On Friday, June 22, 2012 04:41:20 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Oh, Peter and I weren't talking about the pg_list.h stuff, it was about my 'embedded list' implementation which started this subthread. The pg_list.h/list.c stuff isn't problematic as far as I have seen in profiles; its checks are pretty simple so I do not find that surprising. We might want to disable it by default anyway. In my code the list checking stuff iterates over the complete list after modifications and checks that all prev/next pointers are correct, so it's linear in itself... Well, so does list.c, so I'd expect the performance risks to be similar. Possibly you're testing on longer lists than are typical in the backend. I don't think list.c does so:

static void
check_list_invariants(const List *list)
{
    if (list == NIL)
        return;

    Assert(list->length > 0);
    Assert(list->head != NULL);
    Assert(list->tail != NULL);

    Assert(list->type == T_List ||
           list->type == T_IntList ||
           list->type == T_OidList);

    if (list->length == 1)
        Assert(list->head == list->tail);
    if (list->length == 2)
        Assert(list->head->next == list->tail);
    Assert(list->tail->next == NULL);
}

But yes, the lists I deal with are significantly longer, so replacing O(n) by O(n^2) is rather painful there... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: +REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo Hrm, that's quite cute. I dunno if there is a more canonical way of doing the above -- but it seems to work. I'm not sure this regression test is worth it. I'm thinking maybe we should just remove the regression test instead. I agree. That is the fundamental question. I've coded it just for fun, but I don't see much significance in doing that. We might omit the test for this, which is non-critical and covers corner cases. We need these tests to work on Windows too, so fancy gmake tricks are probably not the way to deal with varying results. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 04:41:20 PM Tom Lane wrote: Well, so does list.c, so I'd expect the performance risks to be similar. I don't think list.c does so: Huh, OK. I seem to remember that the original version actually chased down the whole list and verified that the length matched. We must've soon decided that that was insupportable in practice. There might be a lesson here for your checks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Fri, Jun 22, 2012 at 10:45 AM, Andres Freund and...@2ndquadrant.com wrote: 9.2 improved the likelihood of that as you know. Hmm, well, I guess. I'm still not sure I really understand what benefit we're getting out of this. If we lose a few WAL records for an uncommitted transaction, who cares? That transaction is gone anyway. Well, before the simple fix Simon applied after my initial complaint you didn't get wakeups *at all* in the synchronous_commit=off case. Now, with the additional changes, the walsender is woken exactly when data is available to send and not always when a commit happens. I played around with various scenarios and it always was a win. Can you elaborate on that a bit? What scenarios did you play around with, and what does "win" mean in this context? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-22 07:11 AM, Florian Pflug wrote: On Jun22, 2012, at 06:32 , D'Arcy Cain wrote: So I have my type working now but I had to create a new C function that takes the opposite argument order. Seems redundant but I could not see a better way. There isn't. Postgres itself contains a huge number of such functions, e.g. for every *lt() (less-than) there's a *gt() (greater-than). Right, but that's not the same thing. Assuming you meant lt/gte and lte/gt, those still are not self-commutating (SC). For example, '=' on two ints is SC. The issue here is that the operator is SC but the args are different types. It would be nice if there were a way to automatically generate code that reverses arguments. Maybe such a thing belongs in the CREATE FUNCTION command. Or, I guess this works and keeps from creating a second C function:

CREATE OR REPLACE FUNCTION eq(chkpass, text) RETURNS bool STRICT
    AS 'chkpass.so', 'chkpass_eq'
    LANGUAGE 'c' RETURNS NULL ON NULL INPUT;

CREATE OR REPLACE FUNCTION eq(text, chkpass) RETURNS bool
    AS 'select eq($2, $1);'
    LANGUAGE SQL RETURNS NULL ON NULL INPUT;

-- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
On Fri, Jun 22, 2012 at 5:25 AM, Amit Kapila amit.kap...@huawei.com wrote: Based on the discussion and suggestions in this mail chain, following features can be implemented: 1. To compute the value of max LSN in data pages based on user input whether he wants it for an individual file, a particular directory or whole database. 2a. To search the available WAL files for the latest checkpoint record and prints the value. 2b. To search the available WAL files for the latest checkpoint record and recreates a pg_control file pointing at that checkpoint. I have kept both options to address different kind of corruption scenarios. I think I can see all of those things being potentially useful. There are a couple of pending patches that will revise the WAL format slightly; not sure how much those are likely to interfere with any development you might do on (2) in the meantime. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
D'Arcy Cain da...@druid.net writes: ... The issue here is that the operator is SC but the args are different types. Well, that's a weird way of defining self-commutating, but ... It would be nice if there was a way to automatically generate code that reverses arguments. Maybe such a thing belongs in the CREATE FUNCTION command. Or, I guess this works and keeps from creating a second C function: CREATE OR REPLACE FUNCTION eq(chkpass, text) RETURNS bool STRICT AS 'chkpass.so', 'chkpass_eq' LANGUAGE 'c' RETURNS NULL ON NULL INPUT; CREATE OR REPLACE FUNCTION eq(text, chkpass) RETURNS bool AS 'select eq($2, $1);' LANGUAGE SQL RETURNS NULL ON NULL INPUT; The thing is that either of those approaches is hugely more expensive than just providing a second C function. It costs probably thousands of cycles to inline that SQL function, each time it's used in a query. I doubt that an auto reverse the arguments facility would be very much cheaper. You could maybe argue that the aggregated maintenance and space costs of all the commutator-pair functions are enough to justify having some such solution instead, but I'm doubtful --- and even if true, getting from here to there would be painful. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pruning the TODO list
On 06/22/2012 09:45 AM, Simon Riggs wrote: On 22 June 2012 14:15, Robert Haas robertmh...@gmail.com wrote: Rather, we all like to believe that our own ideas are awesome. This is frequently true, but not so frequently as we like to believe. Hmm, for me, awesome has nothing to do with it. I strive to produce useful features that address real problems in the simplest way. I think most of my proposals are fairly obvious to database users. If I find a solution, I push it, but not because I found it, or I think it's awesome. The idea that I'm scratching my own itches is mostly wrong. For me, this is about working on the features that Postgres needs and then doing them, with a sense of urgency that seems to be slightly ahead of the curve. It's not just reviewers that work hard on tasks they may not be interested in, yet believe are for the common good. That's true of many developers. I think the real problem with the TODO list is that some people see it as some sort of official roadmap, and it really isn't. Neither is there anything else that is. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event Triggers reduced, v1
On Wed, Jun 20, 2012 at 4:36 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: 1. I still think we ought to get rid of the notion of BEFORE or AFTER (i.e. pg_event_trigger.evttype) and just make that detail part of the event name (e.g. pg_event_trigger.evtevent). Many easily foreseeable event types will be more like during rather than before or after, and for those that do have a notion of before and after, we can have two different event names and include the word before or after there. I am otherwise satisfied with the schema you've chosen. It's not before/after anymore, but rather addon/replace if you will. I kept the INSTEAD OF keyword for the replace semantics, which you've been asking me to keep IIRC, with security policy plugins as a use case. Now we can of course keep those semantics and embed them in the event name we provide users; I thought that maybe a documentation matrix of which events support which mode would be cleaner to document. We might as well find a clean way to implement both modes for most of the commands, I don't know yet. So, are you sure you want to embed that part of the event trigger semantics in the event name itself? Yeah, pretty sure. I think that for regular triggers, BEFORE, AFTER, and INSTEAD-OF are the firing-point specification. But event triggers will have more than three firing points, probably eventually quite a lot more. So we need something more flexible. But we don't need that more flexible thing AND ALSO the before/after/instead-of specification, which I think in most cases won't be meaningful anyway. It happens to be somewhat sensible for this initial firing point, but I think for most of them there will be just one place, and in many cases it will be neither before, nor after, nor instead-of. 2. I think it's important to be able to add new types of event triggers without creating excessive parser bloat. 
I've been trying to do that, yes, as you can see with the event_name and event_trigger_variable rules. I've been re-using as many existing keywords as I could because I believe that's not causing any measurable bloat; I'll kindly reconsider if necessary, even if sadly. The issue is that the size of the parser tables grows with the square of the number of states. This will introduce lots of new states that we don't really need; and every new kind of event trigger that we want to add will introduce more. 3. The event trigger cache seems to be a few bricks shy of a load. I wouldn't be that surprised, mind you. I didn't have nearly as much time as I wanted to work on that project. First, event_trigger_cache_is_stalled is mis-named; I think you mean stale, not stalled. Stale. Right. Edited. Second, instead of setting that flag and then rebuilding the cache when you see the flag set, how about just blowing away the cache contents whenever you would have set the flag? I've been doing that at first, but that meant several full rebuilds in a row in the regression tests, which add new event triggers and then use them. I thought lazily maintaining the cache would be better. Well, AFAICS, you're still doing full rebuilds whenever something changes; you're just keeping the (useless, dead) cache around until you decide to rebuild it. Might as well free the memory once you know that the next access will rebuild it anyway, and as a bonus it saves you a flag. I'm not that fond of psql commands, but I don't think it's going to fly not to have one for event triggers. I could buy \dy. Yeah, I think people are going to want to have one. I really despise the \dwhatever syntax, but it's not 100% clear what a better one would look like. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-22 11:36 AM, Tom Lane wrote: D'Arcy Cain da...@druid.net writes: The thing is that either of those approaches is hugely more expensive than just providing a second C function. It costs probably thousands of cycles to inline that SQL function, each time it's used in a query. I assumed it would be more expensive but didn't know it would be that much more. I doubt that an auto reverse the arguments facility would be very much cheaper. You could maybe argue that the aggregated maintenance and space costs of all the commutator-pair functions are enough to justify having some such solution instead, but I'm doubtful --- and even if true, getting from here to there would be painful. And it would only apply to a very specific type of function. The other idea I had was to just have the second C function call the first but that didn't work. Here is what I tried.

PG_FUNCTION_INFO_V1(chkpass_eq);
Datum
chkpass_eq(PG_FUNCTION_ARGS)
{
    chkpass    *a1 = (chkpass *) PG_GETARG_POINTER(0);
    text       *a2 = (text *) PG_GETARG_TEXT_P(1);
    char        str[9];

    strlcpy(str, a2->vl_dat, sizeof(str));
    PG_RETURN_BOOL(strcmp(a1->password, crypt(str, a1->password)) == 0);
}

PG_FUNCTION_INFO_V1(chkpass_eq2);
Datum
chkpass_eq2(PG_FUNCTION_ARGS)
{
    return chkpass_eq(PG_GETARG_POINTER(1), PG_GETARG_TEXT_P(0));
}

Now in this specific case the function is trivial and writing it twice is no big deal but in general I hate writing the same code twice. I suppose I could extract the actual operation out to a third function and call it from the others. I may do that anyway just for the value of the example. Or is there a way to do what I tried above? -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Friday, June 22, 2012 04:59:45 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 10:45 AM, Andres Freund and...@2ndquadrant.com wrote: the likelihood of that as you know. Hmm, well, I guess. I'm still not sure I really understand what benefit we're getting out of this. If we lose a few WAL records for an uncommitted transaction, who cares? That transaction is gone anyway. Well, before the simple fix Simon applied after my initial complaint you didn't get wakeups *at all* in the synchronous_commit=off case. Now, with the additional changes, the walsender is woken exactly when data is available to send and not always when a commit happens. I played around with various scenarios and it always was a win. Can you elaborate on that a bit? What scenarios did you play around with, and what does win mean in this context? I had two machines connected locally and set up HS and my prototype between them (not at once, obviously). The patch reduced the average latency between both nodes (measured by 'ticker' rows arriving in a table on the standby), the jitter in latency, and the amount of load I had to put on the master before the standby couldn't keep up anymore. I played with different loads: * multiple concurrent ~50MB COPYs * multiple concurrent ~50MB COPYs, pgbench * pgbench All three had a ticker running concurrently with synchronous_commit=off (because it shouldn't cause any difference in the replication pattern itself). The difference in average lag and cutoff were smallest with just pgbench running alone and biggest with COPY running alone. High jitter was most visible with just pgbench running alone, but that's likely just because the average lag was smaller. It's not that surprising, imo. On workloads that have a high WAL throughput like all of the above, XLogInsert frequently has to write out data itself. If that happens the walsender might not get woken up in the current setup, so the walsender/receiver pair is inactive and starts to work like crazy afterwards to catch up. 
During that period of higher activity it does fsync's of MAX_SEND_SIZE (16 * XLOG_BLKSZ) in a high rate which reduces the throughput of apply... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On 20-06-2012 17:40, Marko Kreen wrote: On Wed, Jun 20, 2012 at 10:05 PM, Florian Pflug f...@phlo.org wrote: I'm starting to think that relying on SSL/TLS for compression of unencrypted connections might not be such a good idea after all. We'd be using the protocol in a way it quite clearly never was intended to be used... Maybe, but what is the argument that we should avoid encryption+compression at the same time? AES is quite lightweight compared to compression, so it should be no problem in situations where you care about compression. If we could solve the compression problem without AES, that would make things easier. Compression-only via encryption is a weird way to solve the problem from the user's POV. RSA is noticeable, but only for short connections. Thus easily solvable with connection pooling. RSA overhead is not the main problem. SSL/TLS setup is. And for really special compression needs you can always create a UDF that does custom compression for you. You have to own the code to modify it; that is not always an option. So what exactly is the situation we need to solve with postgres-specific protocol compression? Compression-only support. Why do I need to set up SSL/TLS just for compression? IMHO SSL/TLS use is no different from relying on another library to handle compression for the protocol and, moreover, it is compression-specific. That way, we could implement other algorithms in such a library without needing to modify libpq code. Using SSL/TLS you are bound by what the SSL/TLS software products decide to use as compression algorithms. I'll be happy to maintain the code if it is postgres-specific or even as close as possible to core. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] random failing builds on spoonbill - backends not exiting...
It has now happened at least twice that builds on spoonbill started to fail after it failed during ECPGcheck: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-06-19%2023%3A00%3A04 the first failure was: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-05-24%2023%3A00%3A05 so in both cases the postmaster was not shutting down properly and it was in fact still running - I have attached gdb to the still running backend:

(gdb) bt
#0 0x000208eb5928 in poll () from /usr/lib/libc.so.62.0
#1 0x00020a972b88 in _thread_kern_poll (wait_reqd=Variable wait_reqd is not available.) at /usr/src/lib/libpthread/uthread/uthread_kern.c:784
#2 0x00020a973d04 in _thread_kern_sched (scp=0x0) at /usr/src/lib/libpthread/uthread/uthread_kern.c:384
#3 0x00020a96c080 in select (numfds=Variable numfds is not available.) at /usr/src/lib/libpthread/uthread/uthread_select.c:170
#4 0x003a2894 in ServerLoop () at postmaster.c:1321
#5 0x003a45ac in PostmasterMain (argc=Variable argc is not available.) at postmaster.c:1121
#6 0x00326df8 in main (argc=6, argv=0x14f8) at main.c:199
(gdb) print Shutdown
$2 = 2
(gdb) print pmState
$3 = PM_WAIT_BACKENDS
(gdb) p *(Backend *) (BackendList->dll_head)
Cannot access memory at address 0x0
(gdb) p *BackendList
$9 = {dll_head = 0x0, dll_tail = 0x0}

all processes are still running:

pgbuild 18020 0.0 1.2 5952 12408 ?? I  Wed04AM 0:03.98 /home/pgbuild/pgbuildfarm/HEAD/pgsql.5709/src/interfaces/ecpg/test/./tmp_check/install//home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D /
pgbuild 21483 0.0 0.7 6088  7296 ?? Is Wed04AM 0:00.68 postgres: checkpointer process (postgres)
pgbuild 12480 0.0 0.4 5952  4464 ?? Ss Wed04AM 0:06.88 postgres: writer process (postgres)
pgbuild  9841 0.0 0.5 5952  4936 ?? Ss Wed04AM 0:06.92 postgres: wal writer process (postgres)
pgbuild   623 0.1 0.6 7424  6288 ?? Ss Wed04AM 4:16.76 postgres: autovacuum launcher process (postgres)
pgbuild 30949 0.0 0.4 6280  3896 ?? Ss Wed04AM 0:40.94 postgres: stats collector process (postgres)

sending a manual kill -15 to either of them does not seem to make them exit either... I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: It has now happened at least twice that builds on spoonbill started to fail after it failed during ECPGcheck: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-06-19%2023%3A00%3A04 the first failure was: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-05-24%2023%3A00%3A05 so in both cases the postmaster was not shutting down properly panther has been showing similar postmaster-does-not-shut-down failures every so often, though IIRC always in the IsolationCheck step not ECPG. I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. But we don't use threading ... Still, panther is NetBSD so there may be some general BSD flavor to whatever's going on here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On 06/22/2012 02:34 PM, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: It has now happened at least twice that builds on spoonbill started to fail after it failed during ECPGcheck: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-06-19%2023%3A00%3A04 the first failure was: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-05-24%2023%3A00%3A05 so in both cases the postmaster was not shutting down properly panther has been showing similar postmaster-does-not-shut-down failures every so often, though IIRC always in the IsolationCheck step not ECPG. brolga (Cygwin) has had similar sporadic failures on ecpg checks, so much so that, since I have been unable to debug it in the time I have available, for the time being I have disabled the ecpg checks. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On Fri, Jun 22, 2012 at 2:16 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: sending a manual kill -15 to either of them does not seem to make them exit either... I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On Friday, June 22, 2012 08:51:55 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 2:16 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: sending a manual kill -15 to either of them does not seem to make them exit either... I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen. You cannot block sigkill. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On 06/22/2012 08:34 PM, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: It has now happened at least twice that builds on spoonbill started to fail after it failed during ECPGcheck: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-06-19%2023%3A00%3A04 the first failure was: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-05-24%2023%3A00%3A05 so in both cases the postmaster was not shutting down properly panther has been showing similar postmaster-does-not-shut-down failures every so often, though IIRC always in the IsolationCheck step not ECPG. hmm I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. But we don't use threading ... Still, panther is NetBSD so there may be some general BSD flavor to whatever's going on here. yeah the threading reference was mostly because all backtraces contain references to threading libs and because the threading tests are the last ones done by the ECPG checks... Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Friday, June 22, 2012 03:22:03 PM Andres Freund wrote: On Thursday, June 21, 2012 05:40:08 PM Andres Freund wrote: On Thursday, June 21, 2012 03:56:54 PM Florian Pflug wrote: On Jun21, 2012, at 13:41 , Andres Freund wrote: 3b) Ensure that enough information in the catalog remains by fudging the xmin horizon. Then reassemble an appropriate snapshot to read the catalog as the tuple in question has seen it. The ComboCID machinery makes that quite a bit harder, I fear. If a tuple is updated multiple times by the same transaction, you cannot decide whether a tuple was visible in a certain snapshot unless you have access to the updating backend's ComboCID hash. That's a very good point. Not sure how I forgot that. I think it might be possible to reconstruct a sensible combocid mapping from the walstream. Let me think about it for a while... I have a very, very preliminary thing which seems to work somewhat. I just log (cmin, cmax) additionally for every modified catalog tuple into the wal and so far that seems to be enough. Do you happen to have suggestions for other problematic things to look into before I put more time into it? I'm continuing to play around with this. The tricky bit so far is subtransaction handling in transactions which modify the catalog (+ possibly tables which are marked as being required for decoding, like a pg_enum equivalent). Would somebody fundamentally object to one of the following things:

1. replace

    #define IsMVCCSnapshot(snapshot) \
        ((snapshot)->satisfies == HeapTupleSatisfiesMVCC)

with something like

    #define IsMVCCSnapshot(snapshot) \
        ((snapshot)->satisfies == HeapTupleSatisfiesMVCC || \
         (snapshot)->satisfies == HeapTupleSatisfiesMVCCDuringDecode)

The define is only used sparingly and none of the code paths look so hot that this could make a difference.

2. Set SnapshotNowData.satisfies to HeapTupleSatisfiesNowDuringRecovery while reading the catalog for decoding.

It's possible to go on without both but the faking up of data gets quite a bit more complex. The problem making replacement of SnapshotNow.satisfies useful is that there is no convenient way to represent subtransactions of the current transaction which already have committed according to the TransactionLog but aren't yet visible at the current lsn because they only started afterwards. It's relatively easy to fake this in an mvcc snapshot but way harder for SnapshotNow because you cannot mark transactions as in-progress. Thanks, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: On 06/22/2012 08:34 PM, Tom Lane wrote: Still, panther is NetBSD so there may be some general BSD flavor to whatever's going on here. yeah the threading reference was mostly because all backtraces contain references to threading libs and because the threading tests are the last ones done by the ECPG changes... It is weird that this seems to be happening only in the context of the ecpg and isolation tests, because it's not clear why client-side activity would have anything to do with it. Your gdb investigations confirm that all the actual client-serving backends are gone, and the postmaster knows it. But the background service processes haven't shut down. AFAICS the postmaster could not have reached PM_WAIT_BACKENDS state without signaling them, so why aren't they shutting down, and why does it matter which set of tests we'd been running? My first thought about it was that maybe a signal got missed, but it's hard to credit that bgwriter, walwriter, and autovac would all have missed signals concurrently. (checkpointer and stats collector don't get signaled yet, so it's not surprising those are still around.) I wonder whether signal_child() could have failed? It logs about such failures, but only at debug3 which seems overly taciturn. I wonder if we should crank that up to LOG level. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 08:51:55 PM Robert Haas wrote: I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen. You cannot block sigkill. sigterm is at issue, not sigkill. But I don't care for the signals-blocked theory either, at least not in three different children at the same time. (Hey Stefan, is there a way on BSD to check a process's signals-blocked state from outside? If so, next time this happens you should try to determine the children's signal state.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event Triggers reduced, v1
Robert Haas robertmh...@gmail.com writes: It's not before/after anymore, but rather addon/replace if you will. I kept the INSTEAD OF keyword for the replace semantics, that you've been asking me to keep IIRC, with security policy plugins as a use case. Now we can of course keep those semantics and embed them in the event name we provide users, I thought that maybe a documentation matrix of which events support which mode would be cleaner to document. We might as well find a clean way to implement both modes for most of the commands, I don't know yet. So, are you sure you want to embed that part of the event trigger semantics in the event name itself? Yeah, pretty sure. I think that for regular triggers, BEFORE, AFTER, and INSTEAD-OF are the firing-point specification. But event triggers will have more than three firing points, probably eventually quite a lot more. So we need something more flexible. But we don't need that more flexible thing AND ALSO the before/after/instead-of specification, which I think in most cases won't be meaningful anyway. It happens to be somewhat sensible for this initial firing point, but I think for most of them there will be just one place, and in many cases it will be neither before, nor after, nor instead-of. I agree with using the event name as the specification for the firing point, and that we should prefer documenting the ordering of those rather than offering a fuzzy idea of BEFORE and AFTER steps in there. The AFTER step is better expressed as BEFORE the next one. Now, I still think there's an important discrepancy between adding a new behaviour that adds up to whatever the backend currently implements and providing a replacement behaviour with a user defined function that gets called instead of the backend code. And I still don't think that the event name should be carrying alone that semantic discrepancy. Now, I also want the patch to get in, so I won't insist very much if I'm alone in that position. 
Anyone else interested enough to chime in? The user visible difference would be between those variants: create event trigger foo at 'before_security_check' ... create event trigger foo at 'replace_security_check' ... create event trigger foo before 'security_check' ... create event trigger foo instead of 'security_check' ... Note that in this version the INSTEAD OF variant is not supported, we only intend to offer it in some very narrow cases, or at least that is my understanding. The issue is that the size of the parser tables grow with the square of the number of states. This will introduce lots of new states that we don't really need; and every new kind of event trigger that we want to add will introduce more. It's a little sad not being able to reuse command tag keywords, but it's even more sad to impact the rest of the query parsing. IIRC you had some performance test patch with a split of the main parser into queries and dml on the one hand, and utility commands on the other hand. Would that help here? (I mean more as a general solution against that bloat problem than for this very patch here). I prefer the solution of using 'ALTER TABLE' rather than ALTER TABLE, even if code wise we're not gaining anything in complexity: the parser bloat gets replaced by a big series of if branches. Of course you only exercise it when you need to. I will change that for next patch. 3. The event trigger cache seems to be a few bricks shy of a load. Well, AFAICS, you're still doing full rebuilds whenever something changes; you're just keeping the (useless, dead) cache around until you decide to rebuild it. Might as well free the memory once you know that the next access will rebuild it anyway, and for a bonus it saves you a flag. I'm just done rewriting the cache management with a catalog cache for event triggers and a Syscache Callback that calls into a new module called src/backend/utils/cache/evtcache.c that mimics attoptcache.c. No more cache stale variable. 
And a proper composite hash key. I still have some more work here before being able to send a new release of the patch, as I said I won't have enough time to make that happen until within next week. The git repository is updated, though. https://github.com/dimitri/postgres/tree/evt_trig_v1 https://github.com/dimitri/postgres/compare/913091de51...861eb038d0 Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_prewarm
Robert Haas robertmh...@gmail.com writes: 73%? I think it's got about 15% overlap. 83.7% of stats are wrong. This one included. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On 06/22/2012 09:39 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 08:51:55 PM Robert Haas wrote: I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen. You cannot block sigkill. sigterm is at issue, not sigkill. But I don't care for the signals-blocked theory either, at least not in three different children at the same time. (Hey Stefan, is there a way on BSD to check a process's signals-blocked state from outside? If so, next time this happens you should try to determine the children's signal state.) with help from RhodiumToad on IRC:

# ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 12480
  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
12480 20004004 34084005 c942b002 fffefeff postgres: writer process (postgres)
# ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 9841
  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
 9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process (postgres)

Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
I wrote: I believe the right fix for both of these issues is to add knowledge of the section concept to the topological sort logic, so that an ordering that puts POST_DATA before DATA or PRE_DATA after DATA is considered to be a dependency-ordering violation. One way to do that is to add dummy fencepost objects to the sort, representing the start and end of the DATA section. However, these objects would need explicit dependency links to every other DumpableObject, so that doesn't sound very good from a performance standpoint. What I'm going to go look at is whether we can mark DumpableObjects with their SECTION codes at creation time (rather than adding that information at ArchiveEntry() time) and then have the topo sort logic take that marking into account in addition to the explicit dependency links. I gave up on putting any fancy hacks into the topological sort code; it would have made it extremely ugly, and besides every solution I could think of wanted to have at least one extra auxiliary array with an entry per DumpableObject. Which would have eaten about as much space as the extra dependency links. Furthermore, it turns out that the post-data dependencies need to be changeable, since rules and constraints should only be forced to be post-data *if* we've decided to dump them separately from their parent tables/views. (My original try at this ended up forcing every rule constraint to be dumped separately, which is not what we want.) Putting knowledge of that into the core topological sort code seemed right out. So the attached draft patch does it the straightforward way, actually creating two dummy boundary objects and setting up explicit dependency links with them. I did some simple performance tests and found that this adds a measurable but pretty negligible cost to pg_dump's runtime. 
For instance, dumping several thousand empty tables went from 9.34 to 9.57 seconds of pg_dump CPU time, compared to multiple minutes of CPU time spent on the backend side (even with the recent lockmanager fixes). So I no longer feel any strong need to optimize the code. A disadvantage of representing the dependencies explicitly is that the ones attached to DATA and POST_DATA objects show up in the output archive. I'm not particularly worried about this so far as HEAD and 9.2 are concerned, because the other patch to fix emitted dependencies will make them go away again. But as I mentioned, I'm not big on back-patching that one into 9.1. We could hack something simpler to directly suppress dependencies on the boundary objects only, or we could just write it off as not mattering much. I'd barely have noticed it except I was testing whether I got an exact match to the archive produced by an unpatched pg_dump (in cases not involving the view-vs-constraint bug). Anyway, the attached patch does seem to fix the constraint bug. A possible objection to it is that there are now three different ways in which the pg_dump code knows which DO_XXX object types go in which dump section: the new addBoundaryDependencies() function knows this, the SECTION_xxx arguments to ArchiveEntry calls know it, and the sort ordering constants in pg_dump_sort.c have to agree too. My original idea was to add an explicit section field to DumpableObject to reduce the number of places that know this, but that would increase pg_dump's memory consumption still more, and yet still not give us a single point of knowledge. Has anybody got a better idea? Barring objections or better ideas, I'll push forward with applying this patch and the dependency-fixup patch. 
regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5fde18921ac3c20f878fbe5ad22c60fabc13a916..71cc3416bb9fd9dcf482c7e8c3a99d01acd7f238 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** static void dumpACL(Archive *fout, Catal
*** 210,215
--- 210,220
  			const char *acls);
  static void getDependencies(Archive *fout);
+
+ static DumpableObject *createBoundaryObjects(void);
+ static void addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
+ 					DumpableObject *boundaryObjs);
+
  static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
  static void getTableData(TableInfo *tblinfo, int numTables, bool oids);
  static void makeTableDataInfo(TableInfo *tbinfo, bool oids);
*** main(int argc, char **argv)
*** 270,275
--- 275,281
  	int			numTables;
  	DumpableObject **dobjs;
  	int			numObjs;
+ 	DumpableObject *boundaryObjs;
  	int			i;
  	enum trivalue prompt_password = TRI_DEFAULT;
  	int			compressLevel = -1;
*** main(int argc, char **argv)
*** 691,696
--- 697,713
  	 */
  	getDependencies(fout);
+ 	/* Lastly, create dummy objects to represent the section boundaries */
+ 	boundaryObjs = createBoundaryObjects();
+
+ 	/* Get pointers to all the known DumpableObjects */
+
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: On 06/22/2012 09:39 PM, Tom Lane wrote: (Hey Stefan, is there a way on BSD to check a process's signals-blocked state from outside? If so, next time this happens you should try to determine the children's signal state.) with help from RhodiumToad on IRC: # ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 12480 PID PENDING CAUGHT IGNORED BLOCKED COMMAND 12480 20004004 34084005 c942b002 fffefeff postgres: writer process (postgres) # ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 9841 PID PENDING CAUGHT IGNORED BLOCKED COMMAND 9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process (postgres) Well, the nonzero PENDING masks sure look like a smoking gun, but why are there multiple pending signals? And I'm not sure I know OpenBSD's signal numbers by heart. Could you convert those masks into text signal name lists for us? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On 06/22/2012 11:02 PM, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: On 06/22/2012 09:39 PM, Tom Lane wrote: (Hey Stefan, is there a way on BSD to check a process's signals-blocked state from outside? If so, next time this happens you should try to determine the children's signal state.) with help from RhodiumToad on IRC:

# ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 12480
  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
12480 20004004 34084005 c942b002 fffefeff postgres: writer process    (postgres)
# ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 9841
  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
 9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process    (postgres)

Well, the nonzero PENDING masks sure look like a smoking gun, but why are there multiple pending signals? And I'm not sure I know OpenBSD's signal numbers by heart. Could you convert those masks into text signal name lists for us?

this seems to be SIGUSR1, SIGTERM and SIGQUIT

Stefan
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:

  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
12480 20004004 34084005 c942b002 fffefeff postgres: writer process
 9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process

this seems to be SIGUSR1, SIGTERM and SIGQUIT

OK, I looked up OpenBSD's signal numbers on the web. It looks to me like these two processes have everything blocked except KILL and STOP (which are unblockable of course). I do not see any place in the PG code that could possibly set such a mask (note that BlockSig should have more holes in it than that). So I'm thinking these must be blocked inside some system function that's installed a restrictive signal mask, or some such function forgot to restore the mask on exit. Could you gdb each of these processes and get a stack trace?

regards, tom lane
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
oh, and just for comparison's sake, what do the postmaster's signal masks look like? regards, tom lane
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On Fri, Jun 22, 2012 at 2:57 PM, Andres Freund and...@2ndquadrant.com wrote: On Friday, June 22, 2012 08:51:55 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 2:16 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: sending a manual kill -15 to either of them does not seem to make them exit either... I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64, which has a new threading implementation compared to older OpenBSD versions.

I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen.

You cannot block SIGKILL.

Obviously. The issue is: the postmaster apparently sent SIGTERM (15) to all of these children, and yet they're not dead; and a manual SIGTERM doesn't kill them either. I'm sure SIGKILL (9) would do the trick, but then it's not a clean shutdown.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] COMMUTATOR doesn't seem to work
On Fri, Jun 22, 2012 at 12:28 PM, D'Arcy Cain da...@druid.net wrote: I doubt that an auto reverse the arguments facility would be very much cheaper. You could maybe argue that the aggregated maintenance and space costs of all the commutator-pair functions are enough to justify having some such solution instead, but I'm doubtful --- and even if true, getting from here to there would be painful. And it would only apply to a very specific type of function.

The other idea I had was to just have the second C function call the first but that didn't work. Here is what I tried.

PG_FUNCTION_INFO_V1(chkpass_eq);
Datum
chkpass_eq(PG_FUNCTION_ARGS)
{
    chkpass    *a1 = (chkpass *) PG_GETARG_POINTER(0);
    text       *a2 = (text *) PG_GETARG_TEXT_P(1);
    char        str[9];

    strlcpy(str, a2->vl_dat, sizeof(str));
    PG_RETURN_BOOL(strcmp(a1->password, crypt(str, a1->password)) == 0);
}

PG_FUNCTION_INFO_V1(chkpass_eq2);
Datum
chkpass_eq2(PG_FUNCTION_ARGS)
{
    return chkpass_eq(PG_GETARG_POINTER(1), PG_GETARG_TEXT_P(0));
}

Now in this specific case the function is trivial and writing it twice is no big deal but in general I hate writing the same code twice. I suppose I could extract the actual operation out to a third function and call it from the others. I may do that anyway just for the value of the example. Or is there a way to do what I tried above?

I think DirectionFunctionCall2 is what you want.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_prewarm
The biggest problem with pgfincore from my point of view is that it only works under Linux, whereas I use a MacOS X machine for my development, and there is also Windows to think about. Even if that were fixed, though, I feel we ought to have something in the core distribution. This patch got more +1s than 95% of what gets proposed on hackers. Fincore is only a blocker to this patch if we think pgfincore is ready to be proposed for the core distribution. Do we? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] A good illustration of why we need user-friendly system views
http://pgolub.wordpress.com/2012/06/22/backward-compatibility-never-heard-of-it If we had stable system views for all database objects (stable as in we just append to them), then refactoring our system tables wouldn't break things for our users. Just sayin'. (and don't tell me about information_schema, which is fairly useless for anything except tables and columns) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] initdb and fsync
On Fri, Jun 22, 2012 at 10:04:23AM -0400, Robert Haas wrote: On Sat, Feb 4, 2012 at 8:18 PM, Noah Misch n...@leadboat.com wrote: If we add fsync calls to the initdb process, they should cover the entire data directory tree. This patch syncs files that initdb.c writes, but we ought to also sync files that bootstrap-mode backends had written. An optimization like the pg_flush_data() call in copy_file() may reduce the speed penalty. initdb should do these syncs by default and offer an option to disable them.

This may be a stupid question, but why is it initdb's job to fsync the files the server creates, rather than the server's job? Normally we rely on the server to make its own writes persistent.

Modularity would dictate having the server fsync its own work product, but I expect that approach to perform materially worse. initdb runs many single-user server instances, and each would fsync independently. When N initdb steps change one file, it would see N fsyncs. Using sync_file_range to queue all writes is best for the typical interactive or quasi-interactive initdb user. It's not always a win for server fsyncs, so we would need to either define special cases such that it's used during initdb or forgo the optimization. On the other hand, the server could skip some files, like fsm forks, in a principled manner. Overall, I think it will be hard to improve modularity while retaining the performance Jeff's approach achieves through exploiting initdb's big-picture perspective. So I favor how Jeff has implemented it.

nm
Re: [HACKERS] A good illustration of why we need user-friendly system views
On Fri, Jun 22, 2012 at 9:30 PM, Josh Berkus j...@agliodbs.com wrote: http://pgolub.wordpress.com/2012/06/22/backward-compatibility-never-heard-of-it If we had stable system views for all database objects (stable as in we just append to them), then refactoring our system tables wouldn't break things for our users. Just sayin'.

This has been discussed before, and I'm still not buying it. I mean, suppose you wrote code that depended on anything stated in a constraint always being true. Then we added deferrable constraints. Oops. But would you rather NOT have that feature? Appending columns doesn't help in that case. Or suppose you wrote code that depended on pg_stat_user_functions.total_time being an integer. Well, we could append a new column with a different datatype, but now you've got two columns with the same information, which is a confusing mess.

I still remember the first time my application code got broken by a system catalog change. Some moron added pg_attribute.attisdropped, and boy was I annoyed. However, in between my annoyance, I realized that (1) adapting my code wasn't really going to be that hard and (2) being able to drop columns was a pretty good feature. Granted, a compatibility view would have worked in this case, but only if I'd been using the compatibility view rather than the underlying table, and I am not sure I would have been that smart.

The compatibility breaks that really bother me are the ones that affect a lot of people: standard_conforming_strings, 8.3's implicit casting changes, and Tom's PL/pgSQL lexer stuff that made a bunch of things no longer usable as unquoted variable names. That stuff breaks application code, sometimes quite a lot of it. System catalog changes have a pretty small impact by comparison, although of course (as in this case) it's not perfect.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-22 07:09 PM, Robert Haas wrote: I think DirectionFunctionCall2 is what you want. Can you elaborate? I could not find a single hit in Google or the documentation search on the PG site and it does not appear anywhere in the source distribution. -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. IM: da...@vex.net
Re: [HACKERS] COMMUTATOR doesn't seem to work
D'Arcy Cain da...@druid.net writes: On 12-06-22 07:09 PM, Robert Haas wrote: I think DirectionFunctionCall2 is what you want. Can you elaborate? I could not find a single hit in Google or the documentation search on the PG site and it does not appear anywhere in the source distribution. He meant DirectFunctionCall2 regards, tom lane
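In the chkpass case above, the commutator wrapper built on DirectFunctionCall2 would swap the two argument Datums and delegate through fmgr. A sketch, reusing the chkpass_eq name from D'Arcy's earlier message (untested against the real chkpass module):

```c
/* Sketch only: swap the arguments and re-enter chkpass_eq through fmgr
 * via DirectFunctionCall2, so its PG_GETARG_* macros see a properly
 * constructed call.  Calling chkpass_eq() directly with the raw values,
 * as tried earlier in the thread, fails because a V1 function takes a
 * FunctionCallInfo, not its arguments. */
PG_FUNCTION_INFO_V1(chkpass_eq2);
Datum
chkpass_eq2(PG_FUNCTION_ARGS)
{
    return DirectFunctionCall2(chkpass_eq,
                               PG_GETARG_DATUM(1),
                               PG_GETARG_DATUM(0));
}
```

One caveat: DirectFunctionCall* passes no flinfo, so functions that rely on fn_extra or collation state cannot be called this way; a simple comparison like this one is fine.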
Re: [HACKERS] A good illustration of why we need user-friendly system views
On Fri, Jun 22, 2012 at 9:05 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jun 22, 2012 at 9:30 PM, Josh Berkus j...@agliodbs.com wrote: http://pgolub.wordpress.com/2012/06/22/backward-compatibility-never-heard-of-it If we had stable system views for all database objects (stable as in we just append to them), then refactoring our system tables wouldn't break things for our users. Just sayin'. This has been discussed before, and I'm still not buying it. I mean, suppose you wrote code that depended on anything stated in a constraint always being true. Then we added deferrable constraints. Oops. But would you rather NOT have that feature? Appending columns doesn't help in that case. Also, the incompatibility in the case of tablespaces was a good one... I saw cases where the link was manually moved to another place... and don't ask, I don't know why they did this... so something reading spclocation would have been misled. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación