[HACKERS] Word-smithing doc changes
I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372

 In a concurrent index build, the index is actually entered into the
 system catalogs in one transaction, then the two table scans occur in a
-second and third transaction.
+second and third transaction. All active transactions at the time the
+second table scan starts, not just ones that already involve the table,
+have the potential to block the concurrent index creation until they
+finish. When checking for transactions that could still use the original
+index, concurrent index creation advances through potentially interfering
+older transactions one at a time, obtaining shared locks on their virtual
+transaction identifiers to wait for them to complete.

This seems way too implementation-specific and detailed for a user to make heads or tails of. Except in the sections on locking internals, we don't talk about "shared locks on virtual transaction identifiers"; we just talk about waiting for a transaction to complete. And looping over the transactions one by one is purely an implementation detail, uninteresting to users.

It also uses ill-defined terms like "active transactions", "potentially interfering older transactions", and "original index" -- from the user's point of view there is only one index, and it just isn't completely built yet.

Are we not yet in string-freeze though? I'll go ahead and edit it if people don't mind. I'm curious to see the original complaint, though.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlock contention
On Thu, Jun 23, 2011 at 4:42 PM, Robert Haas wrote:
> ProcArrayLock looks like a tougher nut to crack - there's simply no
> way, with the system we have right now, that you can take a snapshot
> without locking the list of running processes. I'm not sure what to
> do about that, but we're probably going to have to come up with
> something, because it seems clear that once we eliminate the lock
> manager LWLock contention, this is a major bottleneck.

Well, as Tom observed earlier, the kernel of a snapshot is actually an LSN. A snapshot contains a set of xids which all committed before some LSN and none which committed after it. So if we had a record of the log sequence number of the commit record for any given transaction, we could build the snapshot at our leisure without any exclusive lock. In fact, we could even build it lazily, as a kind of cache, only when we are actually interested in a given xid.

-- 
greg
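Greg's lazy-snapshot idea can be sketched in a few lines. This is a toy model with made-up names, nothing like the actual backend code: it assumes we somehow recorded the LSN of each transaction's commit record, and then decides visibility per-xid on demand.

```python
# Toy sketch of the idea above (hypothetical names, not PostgreSQL code):
# if the WAL position (LSN) of every commit record were recorded, a
# snapshot could be just a single LSN, and xid visibility could be
# decided lazily, cached per xid, with no exclusive lock at snapshot time.

class LazySnapshot:
    def __init__(self, snapshot_lsn, commit_lsn_of):
        # commit_lsn_of: mapping xid -> LSN of its commit record;
        # an xid that has not committed yet is simply absent
        self.snapshot_lsn = snapshot_lsn
        self.commit_lsn_of = commit_lsn_of
        self._cache = {}  # xid -> bool, filled only when asked

    def xid_visible(self, xid):
        if xid not in self._cache:
            commit_lsn = self.commit_lsn_of.get(xid)
            # visible iff the xid committed at or before the snapshot LSN
            self._cache[xid] = (commit_lsn is not None
                                and commit_lsn <= self.snapshot_lsn)
        return self._cache[xid]

commits = {100: 5000, 101: 5200}  # xid 102 is still running
snap = LazySnapshot(snapshot_lsn=5100, commit_lsn_of=commits)
```

With this shape, taking a snapshot is O(1) (record one LSN); the per-xid work happens only when a tuple with that xid is actually examined.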
[HACKERS] Range Types, constructors, and the type system
Different ranges over the same subtype make sense when using different total orders for the subtype. This is most apparent with text collation, but makes sense (at least mathematically, if not practically) for any subtype. For instance: [a, Z) is a valid range in "en_US", but not in "C", so it makes sense to have multiple range types over the same subtype with different collations.

But what if you have a function (like a constructor) of the form: (anyelement, anyelement) -> anyrange? To work with the type system, you need to be able to figure out the return type from the arguments, which means that to support functions like this we need a mapping from the subtype to the range type. Unfortunately, that restricts us to one range type per subtype (this isn't a problem for ARRAYs, because there is only one useful array type for a given element type).

This problem first came up a while ago: http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution (you have to add a bunch of casts to make sure the right domain is chosen). It became entirely unworkable with collations, because people would be using different text collations a lot more frequently than, say, a different ordering for timestamptz. Tom mentioned that here: http://archives.postgresql.org/message-id/24831.1308579...@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here: http://archives.postgresql.org/message-id/ad4fc75d-db99-48ed-9082-52ee3a4d7...@phlo.org by suggesting that functions of the form: (anyelement, [other non-anyrange arguments]) -> anyrange might be expendable. After all, they are only useful for constructors as far as we can tell. Other range functions will have an anyrange parameter, and we can use the actual type of that argument to know the range type (as well as the subtype).
Although it's very nice to be able to say: range(1,10) and get an int4range out of it, it's not the only way, and it's not without its problems anyway. For instance, to get an int8range you have to do: range(1::int8, 10::int8) or similar.

So, we could just make functions like:

  int4range(int4, int4)
  int8range(int8, int8)
  ...

when creating the range type, and it would actually be a usability improvement.

There are at least a few constructors that would need to be made for each range type: the constructor above, the singleton constructor, constructors that have infinite bounds, the empty constructor, and all of the permutations for inclusivity/exclusivity. That adds up to quite a few catalog entries per range type. We could reduce some of the permutations by using extra arguments somehow, but that seems like it adds to the ugliness. This might also be a time to revisit whether there is a better way to present all of these constructors (rather than the _[co][co] suffixes to names, etc.). Even if we're willing to put up with a bunch of catalog entries, it will take a little creativity to figure out how to run the functions generically from a fixed set of C functions.

Are there other thoughts or ideas about eliminating the need for generic constructors like range()?

Another idea Florian suggested (in the same email) was the ability to declare the return type of a function, and then use the declared type to infer the argument types. That would be nice because you would just have to do: range(1,10)::int8range. However, that's kind of backwards from how our type inference system works now, and sounds like a big change.

Maybe we could consider a syntax addition for constructing range values? That was kicked around briefly, but perhaps we should revisit the possibilities there.

Regards,
Jeff Davis
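The inference problem driving this thread can be illustrated with a toy resolver. This is a hypothetical Python model, not PostgreSQL's type-resolution code: once two range types share a subtype (e.g. two text collations), a generic range(anyelement, anyelement) constructor has no way to pick a return type, while named per-range-type constructors are always unambiguous.

```python
# Toy illustration (hypothetical, not PostgreSQL's resolver): why
# (anyelement, anyelement) -> anyrange cannot be resolved once a
# subtype has more than one range type, and why named constructors
# like int4range(int4, int4) sidestep the problem.

range_types_over = {
    "int4": ["int4range"],
    # two collations -> two range types over the same subtype
    "text": ["textrange_c", "textrange_en_us"],
}

def infer_generic_range(subtype):
    """Resolve range(x, y) from the subtype alone -- fails on ambiguity."""
    candidates = range_types_over.get(subtype, [])
    if len(candidates) != 1:
        raise TypeError(f"cannot deduce range type for subtype {subtype!r}")
    return candidates[0]

# A named constructor carries its return type in its own name, so no
# subtype -> range type mapping is needed at all.
named_constructors = {
    "int4range": "int4",
    "textrange_c": "text",
    "textrange_en_us": "text",
}
```

The catalog cost is the flip side: every entry in named_constructors here stands for a whole family of constructor variants (bounds inclusivity, infinite bounds, empty, singleton) per range type.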
Re: [HACKERS] possible connection leak in dblink?
On Sat, 2011-06-25 at 13:36 -0700, Joe Conway wrote:
> However, since this is really just a case of unused variables and not
> a leaked connection, I'm inclined to just fix git master -- comments
> on that?

Please put it into 9.1 as well, so we can get a clean compile with gcc 4.6 there.
Re: [HACKERS] possible connection leak in dblink?
On 06/14/2011 07:41 PM, Fujii Masao wrote:
> On Wed, Jun 15, 2011 at 5:34 AM, Peter Eisentraut wrote:
>> Otherwise the connection might not get freed. Could someone verify
>> that?
>
> ISTM that the root problem is that dblink_send_query calls DBLINK_GET_CONN
> though it doesn't accept the connection string as an argument. Since the first
> argument in dblink_send_query must be the connection name, dblink_send_query
> should call DBLINK_GET_NAMED_CONN instead. The variable 'freeconn' is used
> only when DBLINK_GET_CONN is called. So, if dblink_send_query uses
> DBLINK_GET_NAMED_CONN instead, the variable 'freeconn' is no longer necessary.
>
> A similar problem exists in dblink_get_result and dblink_record_internal.
> The attached patch fixes those problems.

Fujii's assessment looks correct, although arguably the change is unnecessary for dblink_record_internal. Looks like the issue with dblink_send_query goes back through 8.4, while dblink_record_internal could be fixed as far back as 8.2.

However, since this is really just a case of unused variables and not a leaked connection, I'm inclined to just fix git master -- comments on that?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
Re: [HACKERS] Repeated PredicateLockRelation calls during seqscan
Heikki Linnakangas wrote:
> BTW, isn't bitgetpage() in nodeBitmapHeapscan.c missing
> PredicateLockTuple() and CheckForSerializableConflictOut() calls in
> the codepath for a lossy bitmap? In the non-lossy case,
> heap_hot_search_buffer() takes care of it, but not in the lossy
> case.

I think the attached addresses that.

In looking this over I noticed something else that doesn't seem quite right. In heapam.c there are two places where the execution of PredicateLockTuple() is conditioned not just on MVCC visibility, but also on HeapKeyTest(). I think those calls should be moved so they are not conditioned on that. Otherwise we have a predicate condition being tested without "locking the gaps", don't we?

-Kevin

Attachment: ssi-lossy-bitmap.patch
Re: [HACKERS] heap_hot_search_buffer refactoring
On Fri, 2011-06-24 at 15:32 -0400, Robert Haas wrote:
> On Sun, Jun 19, 2011 at 2:16 PM, Robert Haas wrote:
> > New patch attached, with that one-line change.
>
> Jeff, are you planning to review this further? Do you think it's OK to
> commit?

1. The patch does not apply to master cleanly, and it's in unified format (so I can't compare it against the old patch very easily). This review is for the first patch, disregarding the "skip = !first_call" issue that you already fixed. If you had other changes in the latest version, please repost the patch.

2. The comment above heap_hot_search_buffer should be updated to document that heapTuple is an out-parameter, and to document the behavior of first_call.

3. The logic around "skip" is slightly confusing to me. Here's my description: if it's not an MVCC snapshot and it's not the first call, then you don't actually want to fetch the tuple with the given tid or a later one in the chain -- you want to fetch the _next_ tuple in the chain or a later one in the chain. Some wording of that description in a comment (either in the function's comment or near the use of "skip") would help a lot. Also, if skip is true, then the tid _must_ be visible according to the (non-MVCC) snapshot, correct? It might help if that was apparent from the code/comments.

Other than that, it looks good.

Regards,
Jeff Davis
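The "skip" behavior described in point 3 can be modeled in a few lines. This is a hypothetical toy, not the actual heap_hot_search_buffer code: on a repeat call with a non-MVCC snapshot, the tuple at the given position was already returned, so the search resumes from the next member of the HOT chain.

```python
# Toy model of the "skip" semantics (hypothetical names; the real
# function works on heap pages and HOT chains, not Python lists).

def hot_chain_search(chain, start, visible, first_call):
    """Return the index of the first visible tuple at or after `start`.

    chain: list standing in for a HOT chain; visible: snapshot predicate.
    On a non-first call, the tuple at `start` was already returned to the
    caller, so it is stepped over before visibility is even checked.
    """
    skip = not first_call  # the confusing assignment, spelled out
    i = start
    while i < len(chain):
        if skip:
            skip = False   # step past the previously returned tuple
        elif visible(chain[i]):
            return i
        i += 1
    return None            # chain exhausted

chain = ["t0", "t1", "t2"]
```

So with first_call=True the search can return the tuple at `start` itself, while with first_call=False it can only return a strictly later chain member, which is exactly the distinction the review asks to be documented.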
Re: [HACKERS] debugging tools inside postgres
On Fri, Jun 24, 2011 at 02:35:08PM +0800, HuangQi wrote:
> Hi,
> I'm trying to debug a modification to the query planner, but I found that
> the data structure of my planned query seems incorrect. I was trying to
> print out the data structure using the "p" command in gdb, which is quite
> inconvenient and takes time. Is there any built-in function in
> postgres to print out the node data structures, or any other plan-related
> data structures? Thanks.

I don't know if anyone has done it, but recent versions of gdb apparently can use Python scripts, and use them to dump C++ library structures in readable formats. I guess someone could write a script to make debugging PostgreSQL nicer (pretty-printing snapshots, locks, etc).

Have a nice day,

-- 
Martijn van Oosterhout   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle
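A minimal sketch of what such a gdb helper might look like. Everything here is a hypothetical illustration, not an existing PostgreSQL tool: the formatting logic is plain Python, and the gdb hook-up at the end only runs when the script is loaded inside a gdb session (gdb.pretty_printers and Value.to_string are part of gdb's documented Python API).

```python
# Hypothetical sketch of a gdb pretty-printer for planner nodes.
# The type/field names ("Node", "type") are illustrative assumptions.

def format_node(tag, fields):
    """Render a parse/plan node as a one-line readable summary."""
    body = ", ".join(f"{k}={v}" for k, v in fields.items())
    return f"{{{tag} {body}}}"

try:
    import gdb  # only importable inside a gdb session

    class NodePrinter:
        def __init__(self, val):
            self.val = val

        def to_string(self):
            # read the node tag and address from the inferior process
            tag = str(self.val["type"])
            return format_node(tag, {"addr": str(self.val.address)})

    def lookup(val):
        # assumed match rule: register for the struct type named "Node"
        if str(val.type) == "Node":
            return NodePrinter(val)
        return None

    gdb.pretty_printers.append(lookup)
except ImportError:
    pass  # running outside gdb; only format_node is usable
```

Loaded with `source nodeprinter.py` in gdb, `p *node` would then show the one-line summary instead of a raw struct dump.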
Re: [HACKERS] WIP: Fast GiST index build
On Sat, Jun 25, 2011 at 11:03 AM, Jesper Krogh wrote:
> * Wouldn't it be natural to measure the performance benefits of
> disc-bound tests in an SSD setup?

Sure, it would be great to run performance tests on SSD drives too. Unfortunately, I don't have such a test platform at the moment.

> ... my understanding of Fast gi(n|st) index build is that it is
> more or less a challenge to transform a lot of random IO workload
> to be more sequential and collapse multiple changes into fewer.

The main benefit of the proposed algorithm is to greatly reduce the number of IO operations during index build, by dealing with a great number of index tuples simultaneously. It also makes some IO more sequential. I don't have precise measurements yet, but I believe the contribution of making IO more sequential is not very significant.

> In terms of random IO an SSD can easily be x100 better than rotating
> drives and it would be a shame to optimize "against" that world?

Actually, I'm not sure that IO is the bottleneck of GiST index build on SSD drives. It seems more likely to me that the CPU becomes the bottleneck in that case, so optimizing IO can't give much benefit. But anyway, the value of this work can also lie in producing a better index in some cases, and in extending SSD drive lifetime thanks to fewer IO operations.

-- 
With best regards,
Alexander Korotkov.
Re: [HACKERS] WIP: Fast GiST index build
On 2011-06-06 09:42, Heikki Linnakangas wrote:
> took about 15 hours without the patch, and 2 hours with it.

That's quite dramatic. Robust consumer-class SSD drives are now available in sizes that can actually fit "many" database usage scenarios, and a new PostgreSQL version is not likely to hit the streets before 50% of PostgreSQL users are sitting on "some kind" of flash-based storage (for the cases where the entire dataset doesn't fit in memory any more).

Point is:

* Wouldn't it be natural to measure the performance benefits of disc-bound tests in an SSD setup?

... my understanding of Fast gi(n|st) index build is that it is more or less a challenge to transform a lot of random IO workload to be more sequential and collapse multiple changes into fewer. In terms of random IO an SSD can easily be x100 better than rotating drives, and it would be a shame to optimize "against" that world?

-- 
Jesper