Re: [HACKERS] preserving forensic information when we freeze
On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund wrote:
>> I wondered that, too, but it's not well-defined for all tuples. What
>> happens if you pass in a constructed tuple rather than an on-disk tuple?
>
> Those should be discernible I think, t_self/t_tableOid won't be set for
> generated tuples.

I went looking for existing precedent for code that does things like this and found record_out, which does this:

    HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0);
    ...
    /* Extract type info from the tuple itself */
    tupType = HeapTupleHeaderGetTypeId(rec);
    tupTypmod = HeapTupleHeaderGetTypMod(rec);
    tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
    ncolumns = tupdesc->natts;

    /* Build a temporary HeapTuple control structure */
    tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
    ItemPointerSetInvalid(&(tuple.t_self));
    tuple.t_tableOid = InvalidOid;
    tuple.t_data = rec;

This appears to be a typical pattern, although interestingly I noticed that row_to_json() doesn't bother setting t_tableOid or t_self, which I think it's supposed to do. The problem I see here is that this code seems to imply that a function passed a record doesn't actually have enough information to know what sort of a thing it's getting. The use of HeapTupleHeaderGetTypeId and HeapTupleHeaderGetTypMod implies that it's safe to assume that t_choice will contain DatumTupleFields rather than HeapTupleFields, which doesn't seem to bode well for your approach. Am I missing a trick?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XML Issue with DTDs
On Dec 20, 2013, at 18:52, Robert Haas wrote:
> On Thu, Dec 19, 2013 at 6:40 PM, Florian Pflug wrote:
>> Solving this seems a bit messy, unfortunately. First, I think we need to
>> have some XMLOPTION value which is a superset of all the others - otherwise,
>> dump & restore won't work reliably. That means either allowing DTDs if
>> XMLOPTION is CONTENT, or inventing a third XMLOPTION, say ANY.
>
> Or we can just decide that it was a bug that this was ever allowed,
> and if you upgrade to $FIXEDVERSION you'll need to sanitize your data.
> This is roughly what we did with encoding checks.

What exactly do you suggest we outlaw? If there are XML values which are CONTENT but not a DOCUMENT, and other values which are a DOCUMENT but not CONTENT, then what is pg_restore supposed to set XMLOPTION to?

best regards,
Florian Pflug
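[Editor's illustration of Florian's point: the two classes overlap without either containing the other. The "CONTENT but not DOCUMENT" half can be shown by feeding a multi-root fragment to an ordinary document parser, which rejects it, while a value carrying a DTD parses fine as a document. The sketch below uses Python's expat-backed minidom purely as a stand-in document parser; the CONTENT-side restriction on DTDs is the SQL/XML rule under discussion, not something expat checks.]

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def is_document(s):
    # A DOCUMENT must have exactly one root element; a document parser
    # enforces that, so a parse failure marks a non-DOCUMENT value.
    try:
        minidom.parseString(s)
        return True
    except ExpatError:
        return False

fragment = 'foo<bar/>baz'  # valid CONTENT: character data plus an element
doc_with_dtd = '<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>text</note>'

assert not is_document(fragment)   # CONTENT but not a DOCUMENT
assert is_document(doc_with_dtd)   # DOCUMENT, but CONTENT rules reject the DTD
```

Whichever single XMLOPTION pg_restore picks, one of these two values fails to load, which is exactly the dump/restore hazard Florian describes.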
Re: [HACKERS] make_interval ??
On 12/20/2013 04:44 PM, Gavin Flower wrote:
> On 21/12/13 13:40, Josh Berkus wrote:
>> On 12/20/2013 03:09 PM, Gavin Flower wrote:
>>> What about leap years?
>> What about them?
>>
> some years have 365 days, others have 366, so how many days in an interval
> of 2 years? 4 years?

Your question isn't relevant to this patch. It's not defining the interval type, just creating an alternate constructor for it.

(the answer is, it depends on what timestamp you're adding it to ...)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
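[Editor's note: Josh's parenthetical can be made concrete outside of SQL. In calendar arithmetic an interval of "2 years" advances the year number and keeps the month and day, so the number of elapsed days depends on whether the span covers a February 29. A small illustration using Python's standard datetime; this is not PostgreSQL code, and the helper name is made up.]

```python
from datetime import date

def days_in_two_years(start):
    # "start + interval '2 years'" in calendar terms: same month and day,
    # year advanced by two.
    end = start.replace(year=start.year + 2)
    return (end - start).days

print(days_in_two_years(date(2013, 1, 1)))  # 730 -- no Feb 29 in the span
print(days_in_two_years(date(2015, 1, 1)))  # 731 -- span covers Feb 29, 2016
```

This is why the question belongs to the interval type's addition semantics, not to a constructor like make_interval, which merely assembles the interval's fields.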
Re: [HACKERS] make_interval ??
On 21/12/13 13:40, Josh Berkus wrote:
> On 12/20/2013 03:09 PM, Gavin Flower wrote:
>> What about leap years?
> What about them?

some years have 365 days, others have 366, so how many days in an interval of 2 years? 4 years?
Re: [HACKERS] make_interval ??
On 12/20/2013 03:09 PM, Gavin Flower wrote:
> What about leap years?

What about them?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] nested hstore patch
On 2013-12-20 15:16:30 -0800, David E. Wheeler wrote:
> But for the hstore feature itself, I think the current interface and features
> are ready to go.

I think this patch needs a significant amount of work before it can be considered ready for committer. I found the list of issues in http://archives.postgresql.org/message-id/20131118163633.GE20305%40awork2.anarazel.de within 10 minutes, indicating that it clearly cannot be ready yet.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] nested hstore patch
On Nov 12, 2013, at 10:35 AM, Teodor Sigaev wrote:
> Hi!
>
> Attached patch adds nesting feature, types (string, bool and numeric
> values), arrays and scalar to hstore type.

My apologies for not getting to this sooner, work has been a bit nutty. The truth is that I reviewed this patch quite a bit a month back, mostly so I could write documentation, the results of which are included in this patch. And I'm super excited for what's to come in the next iteration, as I hear that Teodor and Andrew are hard at work adding jsonb as a binary-compatible JSON data type. Meanwhile, for this version, a quick overview of what has changed since 9.2.

Contents & Purpose
==================

Improved Data Type Support
--------------------------

* Added data type support for values. Previously they could only be strings or NULL, but with this patch they can also be numbers or booleans.

* Added array support. Values can be arrays of other values. The format for arrays is a bracketed, comma-delimited list.

* Added nesting support. hstore values can themselves be hstores. Nested hstores are wrapped in braces, but the root-level hstore is not (for compatibility with the format of previous versions of hstore).

* An hstore value is no longer required to be an hstore object. It can now be any scalar value.

These items make the basic format feature-complete with JSON. Here's an example where the values are scalars:

    =% SELECT 'foo'::hstore, '"hi \"bob\""'::hstore, '1.0'::hstore, 'true'::hstore, NULL::hstore;
     hstore |    hstore    | hstore | hstore | hstore
    --------+--------------+--------+--------+--------
     "foo"  | "hi \"bob\"" | 1.0    | t      |

And here are a couple of arrays with strings, numbers, booleans, and NULLs:

    SELECT '[k,v]'::hstore, '[1.0, "hi there", false, null]'::hstore;
       hstore   |           hstore
    ------------+----------------------------
     ["k", "v"] | [1.0, "hi there", f, NULL]

Here's a complicated example formatted with `hstore.pretty_print` enabled.
    =% SET hstore.pretty_print = true;
    =% SELECT '{
      "type" => "Feature",
      "bbox" => [-180.0, -90.0, 180.0, 90.0],
      "geometry" => {
        "type" => "Polygon",
        "coordinates" => [[
          [-180.0, 10.0], [20.0, 90.0], [180.0, -5.0], [-30.0, -90.0]
        ]]
      }
    }'::hstore;

                hstore
    ------------------------------
     "bbox"=>                    +
     [                           +
         -180.0,                 +
         -90.0,                  +
         180.0,                  +
         90.0                    +
     ],                          +
     "type"=>"Feature",          +
     "geometry"=>                +
     {                           +
         "type"=>"Polygon",      +
         "coordinates"=>         +
         [                       +
             [                   +
                 [               +
                     -180.0,     +
                     10.0        +
                 ],              +
                 [               +
                     20.0,       +
                     90.0        +
                 ],              +
                 [               +
                     180.0,      +
                     -5.0        +
                 ],              +
                 [               +
                     -30.0,      +
                     -90.0       +
                 ]               +
             ]                   +
         ]                       +
     }

So, exact feature parity with the JSON data type.

* hstore.pretty_print is a new GUC, specifically to allow an HSTORE value to be pretty-printed. There is also a function to pretty-print, so we might be able to just do away with the GUC.

Interface
---------

* New operators:

  + `hstore -> int`:     Get string value at array index (starting at 0)
  + `hstore ^> text`:    Get numeric value for key
  + `hstore ^> int`:     Get numeric value at array index
  + `hstore ?> text`:    Get boolean value for key
  + `hstore ?> int`:     Get boolean value at array index
  + `hstore #> text[]`:  Get string value for key path
  + `hstore #^> text[]`: Get numeric value for key path
  + `hstore #?> text[]`: Get boolean value for key path
  + `hstore %> text`:    Get hstore value for key
  + `hstore %> int`:     Get hstore value at array index
  + `hstore #%> text[]`: Get hstore value for key path
  + `hstore ? int`:      Does hstore contain array index
  + `hstore #? text[]`:  Does hstore contain key path
  + `hstore - int`:      Delete index from left operand
  + `hstore #- text[]`:  Delete key path from left operand

* New functions:

  + `hstore(text)`:         Make a text scalar hstore
  + `hstore(numeric)`:      Make a numeric scalar hstore
  + `hstore(boolean)`:      Make a boolean scalar hstore
  + `hstore(text, hstore)`: Make a nested hstore
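[Editor's illustration: the key-path operators (`#>`, `#^>`, `#?>`, `#%>`) all share one lookup rule: walk the nested value one path element at a time, treating the element as an array index when the current value is an array and as an object key otherwise. A rough Python analogue of that traversal, using plain dicts and lists as a stand-in representation; the function name is an invention for illustration, not the hstore implementation.]

```python
def get_path(value, path):
    # Walk a nested structure by a key path, roughly like hstore's #> family.
    # When the current value is an array, the path element is an index;
    # otherwise it is an object key.
    for step in path:
        if isinstance(value, list):
            value = value[int(step)]
        else:
            value = value[step]
    return value

doc = {"type": "Feature",
       "geometry": {"type": "Polygon",
                    "coordinates": [[[-180.0, 10.0], [20.0, 90.0]]]}}

print(get_path(doc, ["geometry", "type"]))               # Polygon
print(get_path(doc, ["geometry", "coordinates", 0, 1]))  # [20.0, 90.0]
```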
Re: [HACKERS] make_interval ??
On 21/12/13 06:29, Josh Berkus wrote:
> Pavel,
>
>> So constructor should to look like:
>>
>>   CREATE OR REPLACE FUNCTION make_interval(years int DEFAULT 0, months int DEFAULT 0, ...)
>>
>> and usage:
>>
>>   SELECT make_interval(years := 2)
>>   SELECT make_interval(days := 14)
>>
>> Is there a interest for this (or similar) function?
>
> It would certainly make our Python users happy. And for that matter would
> get rid of this kind of stupid thing in stored procedure code:
>
>   time_ahead := ( interval '1 minute' * var_skip );
>
> So, +1 for the feature.

What about leap years?

Cheers,
Gavin
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Dec 20, 2013 at 12:39 PM, Heikki Linnakangas wrote:
> Hmm. If I understand the problem correctly, it's that as soon as another
> backend sees the tuple you've inserted and calls XactLockTableWait(), it
> will not stop waiting even if we later decide to kill the already-inserted
> tuple.

Forgive me for being pedantic, but I wouldn't describe it that way. Quite simply, the tuples speculatively inserted (and possibly later deleted) are functionally value locks, which presently cannot be easily released (so my point is it doesn't matter if you're currently waiting in XactLockTableWait() or are just about to). I have to wonder about the performance implications of fixing this, even if we suppose the fix is itself inexpensive. The current approach probably benefits from not having to re-acquire value locks from previous attempts, since everyone still has to care about "value locks" from our previous attempts.

The more I think about it, the more opposed I am to letting this slide, which is a notion I had considered last night, if only because MySQL did so for many years. This is qualitatively different from other cases where we deadlock. Even back when we exclusive-locked rows as part of foreign key enforcement, I think it was more or less always possible to do an analysis of the dependencies that existed, to ensure that locks were acquired in a predictable order so that deadlocking could not occur. Now, maybe that isn't practical for an entire app, but it is practical to do in a localized way as problems emerge. In contrast, if we allowed unprincipled deadlocking, the only advice we could give is "stop doing so much upserting".

-- 
Peter Geoghegan
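[Editor's illustration of the distinction Peter draws: deadlocks an application can engineer away by acquiring locks in a predictable order, versus deadlocks the system imposes. This is a generic two-lock sketch in plain Python threads, nothing PostgreSQL-specific; all names are made up.]

```python
import threading

locks = {"row_a": threading.Lock(), "row_b": threading.Lock()}
completed = []

def transfer(first_row, second_row):
    # Acquire locks in a globally consistent order (sorted by name),
    # regardless of the order the caller mentions the rows. Two
    # opposite-direction transfers then contend on the same first lock
    # and cannot deadlock.
    for name in sorted([first_row, second_row]):
        locks[name].acquire()
    try:
        completed.append((first_row, second_row))  # work under both locks
    finally:
        for name in sorted([first_row, second_row], reverse=True):
            locks[name].release()

t1 = threading.Thread(target=transfer, args=("row_a", "row_b"))
t2 = threading.Thread(target=transfer, args=("row_b", "row_a"))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(completed))  # 2 -- both finish; unordered acquisition could hang
```

With speculative-insertion value locks that persist across retries, no such ordering discipline is available to the application, which is why the resulting deadlocks are called "unprincipled" here.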
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 12/20/2013 10:56 PM, Alvaro Herrera wrote:
> Robert Haas escribió:
>> On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas wrote:
>>> Hmm. If I understand the problem correctly, it's that as soon as another
>>> backend sees the tuple you've inserted and calls XactLockTableWait(), it
>>> will not stop waiting even if we later decide to kill the already-inserted
>>> tuple.
>>
>> That particular mechanism sounds like a recipe for unintended consequences.
>
> Yep, what I thought too. There are probably other ways to make that
> general idea work though. I didn't follow this thread carefully, but is
> the idea that there would be many promise tuples "live" at any one time,
> or only one? Because if there's only one, or a very limited number, it
> might be workable to sleep on that tuple's lock instead of the xact's lock.

Only one. heap_update() and heap_delete() also grab a heavy-weight lock on the tuple, before calling XactLockTableWait(). _bt_doinsert() does not, but it could. Perhaps we can take advantage of that.

- Heikki
Re: [HACKERS] shared memory message queues
Hi,

On 2013-12-18 15:23:23 -0500, Robert Haas wrote:
> It sounds like most people who have looked at this stuff are broadly
> happy with it, so I'd like to push on toward commit soon, but it'd be
> helpful, Andres, if you could review the comment additions to
> shm-mq-v2.patch and see whether those address your concerns. If so,
> I'll see about improving the overall comments for shm-toc-v1.patch as
> well to clarify the points that seem to have caused a bit of
> confusion; specific thoughts on what ought to be covered there, or any
> other review, is most welcome.

Some things:

* shm_mq_minimum_size should be const.

* The MAXALIGNing in shm_mq_create looks odd. We're computing data_offset using MAXALIGN, determining the ring size using it, but then set mq_ring_offset to data_offset - offsetof(shm_mq, mq_ring)?

* Same problem as the toc stuff with a dynamic number of spinlocks.

* You don't seem to initialize the spinlock anywhere. That's clearly not ok, independent of the spinlock implementation.

* The comments around shm_mq/shm_mq_handle are a clear improvement. I'd be pretty happy if you additionally added something akin to 'This struct describes the shared state of a queue' and 'Backend-local reference to a queue'.

* s/MQH_BUFSIZE/MQH_INITIAL_BUFSIZE/; I was already wondering whether there's a limit on the max number of bytes.

* I think shm_mq_attach()'s documentation should mention that we'll initially and later allocate memory in the current CurrentMemoryContext when attaching. That will likely require some care for some callers. Yes, it's documented in a bigger comment somewhere else, but still.

* I don't really understand the mqh_consume_pending stuff. If we just read a message spanning from the beginning to the end of the ringbuffer, without wrapping, we might not confirm the read in shm_mq_receive() until the next time it is called? Do I understand that correctly?
I am talking about the following and other similar pieces of code:

    + if (rb >= needed)
    + {
    +     /*
    +      * Technically, we could consume the message length information at
    +      * this point, but the extra write to shared memory wouldn't be
    +      * free and in most cases we would reap no benefit.
    +      */
    +     mqh->mqh_consume_pending = needed;
    +     *nbytesp = nbytes;
    +     *datap = ((char *) rawdata) + MAXALIGN64(sizeof(uint64));
    +     return SHM_MQ_SUCCESS;
    + }

* ISTM the messages around needing to use the same arguments for send/receive again after SHM_MQ_WOULD_BLOCK could stand to be more forceful. "In this case, the caller should call this function again after the process latch has been set." doesn't make it all that clear that failure to do so will corrupt the next message. Maybe there also should be an assert checking that the size is the same when retrying as when starting a partial read?

* You have some CHECK_FOR_INTERRUPTS(); are you sure the code is safe to longjmp out from there in all the cases? E.g. I am not sure it is for shm_mq_send_bytes(), when we've first written a partial message, done a shm_mq_inc_bytes_written() and then go into the available == 0 branch and jump out.

* Do I understand it correctly that mqh->mqh_counterparty_attached is just sort of a cache, and we'll still detect a detached counterparty when we're actually sending/receiving?

That's it for now.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
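[Editor's illustration of the mqh_consume_pending question: the reader hands a message back to the caller without advancing the shared read pointer, and only writes the new read position back on the *next* receive, saving one shared-memory write per message. The sketch below is a single-process Python caricature of that scheme with invented names and no actual shared memory or concurrency; it is not the shm_mq code.]

```python
class RingQueue:
    """Toy ring buffer illustrating deferred read confirmation: a received
    message is only confirmed (read pointer advanced) on the next receive,
    mirroring the role of mqh_consume_pending."""

    def __init__(self, size):
        self.buf = [None] * size
        self.read_pos = 0    # the position the sender believes is consumed
        self.write_pos = 0
        self.pending = 0     # items returned to caller but not yet confirmed

    def send(self, items):
        for item in items:
            assert self.write_pos - self.read_pos < len(self.buf), "queue full"
            self.buf[self.write_pos % len(self.buf)] = item
            self.write_pos += 1

    def receive(self, n):
        # Confirm the previous read first: this is the deferred write-back.
        self.read_pos += self.pending
        self.pending = 0
        assert self.write_pos - self.read_pos >= n, "would block"
        out = [self.buf[(self.read_pos + i) % len(self.buf)]
               for i in range(n)]
        self.pending = n     # consumed, but the sender can't see that yet
        return out

q = RingQueue(4)
q.send(["a", "b", "c"])
print(q.receive(2), q.read_pos)  # ['a', 'b'] 0 -- space not yet reclaimed
print(q.receive(1), q.read_pos)  # ['c'] 2    -- previous read now confirmed
```

Andres's observation corresponds to the gap between the two prints: until receive() runs again, the sender still sees the first two slots as occupied.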
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Robert Haas escribió:
> On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas wrote:
>> Hmm. If I understand the problem correctly, it's that as soon as another
>> backend sees the tuple you've inserted and calls XactLockTableWait(), it
>> will not stop waiting even if we later decide to kill the already-inserted
>> tuple.
>>
>> One approach to fix that would be to release and immediately re-acquire the
>> transaction-lock, when you kill an already-inserted tuple. Then teach the
>> callers of XactLockTableWait() to re-check if the tuple is still alive.
>
> That particular mechanism sounds like a recipe for unintended consequences.

Yep, what I thought too. There are probably other ways to make that general idea work though. I didn't follow this thread carefully, but is the idea that there would be many promise tuples "live" at any one time, or only one? Because if there's only one, or a very limited number, it might be workable to sleep on that tuple's lock instead of the xact's lock.

Another thought is to have a different LockTagType that signals a transaction that's doing the INSERT/ON DUPLICATE thingy, and remote backends sleep on that instead of the regular transaction lock. That different lock type could be released and reacquired as proposed by Heikki above without danger of unintended consequences.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas wrote:
> Hmm. If I understand the problem correctly, it's that as soon as another
> backend sees the tuple you've inserted and calls XactLockTableWait(), it
> will not stop waiting even if we later decide to kill the already-inserted
> tuple.
>
> One approach to fix that would be to release and immediately re-acquire the
> transaction-lock, when you kill an already-inserted tuple. Then teach the
> callers of XactLockTableWait() to re-check if the tuple is still alive.

That particular mechanism sounds like a recipe for unintended consequences.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 12/20/2013 06:06 AM, Peter Geoghegan wrote:
> On Wed, Dec 18, 2013 at 8:39 PM, Peter Geoghegan wrote:
>> Empirically, retrying because ExecInsertIndexTuples() returns some
>> recheckIndexes occurs infrequently, so maybe that makes all of this okay.
>> Or maybe it happens infrequently *because* we don't give up on insertion
>> when it looks like the current iteration is futile. Maybe just inserting
>> into every unique index, and then blocking on an xid within
>> ExecCheckIndexConstraints(), works out fairly and performs reasonably in
>> all common cases. It's pretty damn subtle, though, and I worry about the
>> worst case performance, and basic correctness issues for these reasons.
>
> I realized that it's possible to create the problem that I'd previously
> predicted with "promise tuples" [1] some time ago, that are similar in some
> regards to what Heikki has here. At the time, Robert seemed to agree that
> this was a concern [2].
>
> I have a very simple testcase attached, much simpler than previous
> testcases, that reproduces deadlock for the patch
> exclusion_insert_on_dup.2013_12_12.patch.gz at scale=1 frequently, and
> occasionally when scale=10 (for tiny, single-statement transactions). With
> scale=100, I can't get it to deadlock on my laptop (60 clients in all
> cases), at least in a reasonable time period.
>
> With the patch btreelock_insert_on_dup.2013_12_12.patch.gz, it will never
> deadlock, even with scale=1, simply because value locks are not held on to
> across row locking. This is why I characterized the locking as
> "opportunistic" on several occasions in months past.
>
> The test-case is actually much simpler than the one I describe in [1], and
> much simpler than all previous test-cases, as there is only one unique
> index, though the problem is essentially the same.
> It is down to old "value locks" held across retries - with "exclusion_...",
> we can't *stop* locking things from previous locking attempts (where a
> locking attempt is btree insertion with the UNIQUE_CHECK_PARTIAL flag),
> because dirty snapshots still see inserted-then-deleted-in-other-xact
> tuples.
>
> This deadlocking seems unprincipled and unjustified, which is a concern
> that I had all along, and a concern that Heikki seemed to share more
> recently [3]. This is why I felt strongly all along that value locks ought
> to be cheap to both acquire and _release_, and it's unfortunate that so
> much time was wasted on tangential issues, though I do accept some
> responsibility for that.

Hmm. If I understand the problem correctly, it's that as soon as another backend sees the tuple you've inserted and calls XactLockTableWait(), it will not stop waiting even if we later decide to kill the already-inserted tuple.

One approach to fix that would be to release and immediately re-acquire the transaction-lock, when you kill an already-inserted tuple. Then teach the callers of XactLockTableWait() to re-check if the tuple is still alive. I'm just waving hands here, but the general idea is to somehow wake up others when you kill the tuple.

We could make use of that facility to also let others proceed, if you delete a tuple in the same transaction that you inserted it. It's a corner case, not worth much on its own, but I think it would fall out of the above machinery for free, and be an easier way to test it than inducing deadlocks with ON DUPLICATE.

- Heikki
Re: [HACKERS] GIN improvements part 1: additional information
On Fri, Dec 20, 2013 at 11:36 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:
>> On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:
>>> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>>>> On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas wrote:
>>>>> On 12/12/2013 06:44 PM, Alexander Korotkov wrote:
>>>>>> When values are packed into small groups, we have to either insert
>>>>>> inefficiently encoded value or re-encode whole right part of values.
>>>>>
>>>>> It would probably be simplest to store newly inserted items
>>>>> uncompressed, in a separate area in the page. For example, grow the
>>>>> list of uncompressed items downwards from pg_upper, and the compressed
>>>>> items upwards from pg_lower. When the page fills up, re-encode the
>>>>> whole page.
>>>>
>>>> I hacked together an implementation of a variant of Simple9, to see how
>>>> it performs. Insertions are handled per the above scheme.
>>>
>>> Here's an updated version of that, using the page layout without
>>> item-indexes that I described in the other post. This is much less buggy
>>> than that earlier crude version I posted - and unfortunately it doesn't
>>> compress as well. The earlier version lost some items :-(.
>>>
>>> Nevertheless, I think this page layout and code formatting is better,
>>> even if we switch the encoding back to the varbyte encoding in the end.
>>
>> Yet another version. The encoding/decoding code is now quite isolated in
>> ginpostinglist.c, so it's easy to experiment with different encodings.
>> This patch uses varbyte encoding again.
>>
>> I got a bit carried away, experimented with a bunch of different
>> encodings. I tried rice encoding, rice encoding with block and offset
>> number delta stored separately, the simple9 variant, and varbyte encoding.
>>
>> The compressed size obviously depends a lot on the distribution of the
>> items, but in the test set I used, the differences between different
>> encodings were quite small.
> One fatal problem with many encodings is VACUUM. If a page is completely
> full and you remove one item, the result must still fit. In other words,
> removing an item must never enlarge the space needed. Otherwise we have to
> be able to split on vacuum, which adds a lot of code, and also makes it
> possible for VACUUM to fail if there is no disk space left. That's
> unpleasant if you're trying to run VACUUM to release disk space. (gin fast
> updates already has that problem BTW, but let's not make it worse)
>
> I believe that eliminates all encodings in the Simple family, as well as
> PForDelta, and surprisingly also Rice encoding. For example, if you have
> three items in consecutive offsets, the differences between them are
> encoded as 11 in rice encoding. If you remove the middle item, the encoding
> for the next item becomes 010, which takes more space than the original.
>
> AFAICS varbyte encoding is safe from that. (a formal proof would be nice
> though)

The formal proof goes like this. Removing a number is actually the replacement of two numbers with their sum. We have to prove that the varbyte encoding of the sum can't be longer than the varbyte encodings of the two summands together. The high bit number of the sum is at most one more than the high bit number of the larger summand, so the varbyte encoding of the sum is at most one byte longer than the varbyte encoding of the larger summand. The lesser summand takes at least one byte.

> So, I'm happy to go with varbyte encoding now, indeed I don't think we
> have much choice unless someone can come up with an alternative that's
> VACUUM-safe. I have to put this patch aside for a while now, I spent a lot
> more time on these encoding experiments than I intended. If you could take
> a look at this latest version, spend some time reviewing it and cleaning up
> any obsolete comments, and re-run the performance tests you did earlier,
> that would be great. One thing I'm slightly worried about is the overhead
> of merging the compressed and uncompressed posting lists in a scan.
> This patch will be in good shape for the final commitfest, or even before
> that.

OK. I'll make tests for scans on a mix of compressed and uncompressed posting lists. However, I don't expect it to be slower than both pure compressed and pure uncompressed posting lists. The overhead of compressing uncompressed posting lists is evident, but it would also be nice to measure.

-- 
With best regards,
Alexander Korotkov.
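[Editor's note: Alexander's argument is easy to check mechanically. With 7-data-bits-per-byte varbyte coding, the encoded length of a+b is at most one byte more than that of the larger summand, and the smaller summand always costs at least one byte, so collapsing two deltas into their sum never needs more bytes than the two deltas did. A small self-contained sketch, not the ginpostinglist.c code:]

```python
def varbyte_len(n):
    # Bytes needed to encode positive n with 7 data bits per byte
    # (the high bit of each byte serving as a continuation flag).
    assert n >= 1
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length

# VACUUM removes an item: adjacent deltas a and b collapse into a + b.
# The encoded size must never grow. Check exhaustively around the
# one-byte/two-byte boundary.
for a in range(1, 400):
    for b in range(1, 400):
        assert varbyte_len(a + b) <= varbyte_len(a) + varbyte_len(b)
print("ok")  # no assertion fired
```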
Re: [HACKERS] GIN improvements part 1: additional information
Alexander Korotkov escribió:
> On Fri, Dec 20, 2013 at 11:43 PM, Alvaro Herrera wrote:
>> Heikki Linnakangas escribió:
>>> I believe that eliminates all encodings in the Simple family, as
>>> well as PForDelta, and surprisingly also Rice encoding. For example,
>>> if you have three items in consecutive offsets, the differences
>>> between them are encoded as 11 in rice encoding. If you remove the
>>> middle item, the encoding for the next item becomes 010, which takes
>>> more space than the original.
>>
>> I don't understand this. If you have three consecutive entries, and the
>> differences between them are 11, you need to store two 11s. But if you
>> have two items, you only need to store 010 once. So the difference is
>> larger, but since you need to store only one of them then overall it's
>> still shorter than the original. No?
>
> I believe Heikki meant that both differences together are encoded as 11:
> each one is a single 1.

Oh, that sucks (or it's great, depending on perspective).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] preserving forensic information when we freeze
On Fri, Dec 20, 2013 at 2:17 PM, Alvaro Herrera wrote:
> Robert Haas escribió:
>> On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera wrote:
>>> I assume without checking that passing reloid/ctid would allow this to
>>> work for tuples in a RETURNING clause; and if we ever have an OLD
>>> reference for the RETURNING clause of an UPDATE, that it would work
>>> there, too, showing the post-update status of the updated tuple.
>>
>> I don't understand what you're saying here. Are you saying that
>> reloid/ctid is a better approach, a worse approach, or just a
>> different approach?
>
> That probably wasn't worded very well. I am just saying that whatever
> approach we end up with, it would be nice that it worked somehow with
> RETURNING clauses.

Hmm. It's not evident to me why that particular case would be any different than anything else, but that might be my ignorance showing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] shared memory message queues
On Fri, Dec 20, 2013 at 2:33 PM, Andres Freund wrote:
> On 2013-12-20 14:10:57 -0500, Robert Haas wrote:
>>> Since you're embedding spinlocks in struct shm_toc, this module will be
>>> in conflict with platforms that do --disable-spinlocks, since the number
>>> of spinlocks essentially needs to be predetermined there. I personally
>>> still think the solution to that is getting rid of --disable-spinlocks.
>>
>> Yep. We can either deprecate --disable-spinlocks, or we can add an
>> API that is the reverse of S_INIT_LOCK().
>
> How is that going to help? Even if we'd deallocate unused spinlocks -
> which sure is a good idea when they require persistent resources like
> the PGSemaphore based one - the maximum is still going to be there.

Well, we can set the maximum to a bigger number, if that's what we want to do, but I'll admit it's unclear what value would be sufficient. We probably won't have more than one TOC per DSM, and the maximum number of DSMs is capped based on MaxBackends, but it seems likely that many applications of dynamic shared memory will require spinlocks, and I don't think we can plausibly calculate an upper bound there. If we could, it would be a big number, and that might just make startup fail anyway.

Personally, I don't have a big problem with the idea that if you compile with --disable-spinlocks, some things may just fail, and parallel whatever might be one of those things. We could just bump the "30" in SpinlockSemas to "100", and if you happen to use more than that, too bad for you: it'll fail. But I also don't have a big problem with removing --disable-spinlocks altogether. What do other people think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] GIN improvements part 1: additional information
On Fri, Dec 20, 2013 at 11:43 PM, Alvaro Herrera wrote:
> Heikki Linnakangas escribió:
>> I believe that eliminates all encodings in the Simple family, as
>> well as PForDelta, and surprisingly also Rice encoding. For example,
>> if you have three items in consecutive offsets, the differences
>> between them are encoded as 11 in rice encoding. If you remove the
>> middle item, the encoding for the next item becomes 010, which takes
>> more space than the original.
>
> I don't understand this. If you have three consecutive entries, and the
> differences between them are 11, you need to store two 11s. But if you
> have two items, you only need to store 010 once. So the difference is
> larger, but since you need to store only one of them then overall it's
> still shorter than the original. No?

I believe Heikki meant that both differences together are encoded as 11: each one is a single 1.

-- 
With best regards,
Alexander Korotkov.
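[Editor's illustration: the bit patterns being discussed can be reproduced with a short sketch. Heikki writes "rice encoding", but the quoted codes -- "1" for a delta of 1 and "010" for a delta of 2 -- match an Elias-gamma-style prefix code; the function below uses that code purely to reproduce the arithmetic, and is not the patch's encoder.]

```python
def gamma(n):
    # Elias-gamma-style code: (bit length of n minus 1) zeros, then n in binary.
    assert n >= 1
    bits = bin(n)[2:]
    return "0" * (len(bits) - 1) + bits

# Three consecutive items: two deltas of 1, two bits in total ("1" + "1").
before = gamma(1) + gamma(1)
# Remove the middle item: a single delta of 2, but now three bits ("010").
after = gamma(2)

print(before, after)  # 11 010
assert len(after) > len(before)  # the removal *enlarged* the encoding
```

So deleting an item can make the page contents bigger, which is exactly the VACUUM hazard Heikki describes and why such codes were set aside in favor of varbyte encoding.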
Re: [HACKERS] preserving forensic information when we freeze
Robert Haas escribió: > On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera > wrote: > > I assume without checking that passing reloid/ctid would allow this to > > work for tuples in a RETURNING clause; and if we ever have an OLD > > reference for the RETURNING clause of an UPDATE, that it would work > > there, too, showing the post-update status of the updated tuple. > > I don't understand what you're saying here. Are you saying that > reloid/ctid is a better approach, a worse approach, or just a > different approach? That probably wasn't worded very well. I am just saying that whatever approach we end up with, it would be nice if it worked somehow with RETURNING clauses. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
Heikki Linnakangas escribió: > I believe that eliminates all encodings in the Simple family, as > well as PForDelta, and surprisingly also Rice encoding. For example, > if you have three items in consecutive offsets, the differences > between them are encoded as 11 in rice encoding. If you remove the > middle item, the encoding for the next item becomes 010, which takes > more space than the original. I don't understand this. If you have three consecutive entries, and the differences between them are 11, you need to store two 11s. But if you have two items, you only need to store 010 once. So the difference is larger, but since you need to store only one of them then overall it's still shorter than the original. No? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging WAL when updating hintbit
On 12/20/2013 08:34 PM, Fujii Masao wrote: On Fri, Dec 20, 2013 at 2:05 PM, Sawada Masahiko wrote: On Fri, Dec 20, 2013 at 3:38 AM, Sawada Masahiko wrote: I attached the patch which changes the name from 'wal_log_hintbits' to 'wal_log_hints'. The plural form gained general approval. Sorry, the patch which I attached had the wrong indentation in pg_controldata. I have fixed it and attached the new version of the patch. Thanks! Committed. Thanks. I was hoping someone would come up with an even better name, but since no-one did, I agree that's better. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
On 12/19/2013 03:33 PM, Heikki Linnakangas wrote: On 12/17/2013 12:49 AM, Heikki Linnakangas wrote: On 12/17/2013 12:22 AM, Alexander Korotkov wrote: On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas wrote: On 12/12/2013 06:44 PM, Alexander Korotkov wrote: When values are packed into small groups, we have to either insert inefficiently encoded value or re-encode whole right part of values. It would probably be simplest to store newly inserted items uncompressed, in a separate area in the page. For example, grow the list of uncompressed items downwards from pg_upper, and the compressed items upwards from pg_lower. When the page fills up, re-encode the whole page. I hacked together an implementation of a variant of Simple9, to see how it performs. Insertions are handled per the above scheme. Here's an updated version of that, using the page layout without item-indexes that I described in the other post. This is much less buggy than that earlier crude version I posted - and unfortunately it doesn't compress as well. The earlier version lost some items :-(. Nevertheless, I think this page layout and code formatting is better, even if we switch the encoding back to the varbyte encoding in the end. Yet another version. The encoding/decoding code is now quite isolated in ginpostinglist.c, so it's easy to experiment with different encodings. This patch uses varbyte encoding again. I got a bit carried away, experimented with a bunch of different encodings. I tried rice encoding, rice encoding with block and offset number delta stored separately, the simple9 variant, and varbyte encoding. The compressed size obviously depends a lot on the distribution of the items, but in the test set I used, the differences between different encodings were quite small. One fatal problem with many encodings is VACUUM. If a page is completely full and you remove one item, the result must still fit. In other words, removing an item must never enlarge the space needed. 
Otherwise we have to be able to split on vacuum, which adds a lot of code, and also makes it possible for VACUUM to fail if there is no disk space left. That's unpleasant if you're trying to run VACUUM to release disk space. (gin fast updates already has that problem BTW, but let's not make it worse) I believe that eliminates all encodings in the Simple family, as well as PForDelta, and surprisingly also Rice encoding. For example, if you have three items in consecutive offsets, the differences between them are encoded as 11 in rice encoding. If you remove the middle item, the encoding for the next item becomes 010, which takes more space than the original. AFAICS varbyte encoding is safe from that. (a formal proof would be nice though). So, I'm happy to go with varbyte encoding now, indeed I don't think we have much choice unless someone can come up with an alternative that's VACUUM-safe. I have to put this patch aside for a while now, I spent a lot more time on these encoding experiments than I intended. If you could take a look at this latest version, spend some time reviewing it and cleaning up any obsolete comments, and re-run the performance tests you did earlier, that would be great. One thing I'm slightly worried about is the overhead of merging the compressed and uncompressed posting lists in a scan. This patch will be in good shape for the final commitfest, or even before that. - Heikki gin-packed-postinglists-varbyte2.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
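Why varbyte encoding is plausibly VACUUM-safe can be illustrated with a toy encoder. This is a generic LEB128-style scheme, not necessarily the exact on-disk format the patch uses: removing an item merges two deltas a and b into a + b, and the merged value never needs more bytes than the two deltas did separately.

```python
import random

def varbyte(n):
    """Encode a non-negative integer, 7 bits per byte, with the high
    bit set on every byte except the last (LEB128-style)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

# Spot-check the "removal never enlarges" property for positive deltas:
# len(varbyte(a + b)) <= len(varbyte(a)) + len(varbyte(b)).
for _ in range(10000):
    a = random.randrange(1, 1 << 32)
    b = random.randrange(1, 1 << 32)
    assert len(varbyte(a + b)) <= len(varbyte(a)) + len(varbyte(b))
print("ok")
```

Informally: the byte length grows with ceil(bitlen/7), the bit length of a + b is at most one more than the larger operand's, and each operand contributes at least one whole byte, so the sum of the two lengths always covers the merged value. That is the shape a formal proof would take.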
Re: [HACKERS] shared memory message queues
On 2013-12-20 14:10:57 -0500, Robert Haas wrote: > > Since you're embedding spinlocks in struct shm_toc, this module will be > > in conflict with platforms that do --disable-spinlocks, since the number > > of spinlocks essentially needs to be predetermined there. I personally > > still think the solution to that is getting rid of --disable-spinlocks. > > Yep. We can either deprecate --disable-spinlocks, or we can add an > API that is the reverse of S_INIT_LOCK(). How is that going to help? Even if we'd deallocate unused spinlocks - which sure is a good idea when they require persistent resources like the PGSemaphore based one - the maximum is still going to be there. > > I vote for removing all the shm_toc_estimator() knowledge from the > > header, there seems little reason to expose it that way. That just > > exposes unnecessary details and makes fixes after releases harder > > (requiring extensions to recompile). Function call costs hardly can > > matter, right? > > Oh, you're saying to make it a function rather than a macro? Sure, I > could do that. Yea, keeping it private, as a function, seems like a good idea. > > Do you assume that lookup speed isn't that important? I have a bit of a > > hard time imagining that linear search over the keys isn't going to bite > > us. At the least there should be a comment somewhere documenting that > > the intended use case is having relatively few keys. > > Well, as previously noted, in the use cases I imagine for this, it'll > be nworkers + somesmallconstant. I can add a comment to that effect. I was primarily wondering because you have gone through a bit of effort making it lockless. A comment would be good, yes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: option --if-exists for pg_dump
Hello next version pg_restore knows --if-exists option now Regards Pavel Stehule 2013/12/13 Pavel Stehule > Hello > > I am sending a rebased patch. > > Now dump generated with --if-exists option is readable by pg_restore > > Regards > > Pavel > commit 7c79aa77ccf53252bac8ce2302a7a0f7a1942e9b Author: Pavel Stehule Date: Fri Dec 20 20:13:00 2013 +0100 inital diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 8d45f24..18f7346 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -650,6 +650,16 @@ PostgreSQL documentation + --if-exists + + +It use conditional commands (with IF EXISTS +clause) for cleaning database schema. + + + + + --disable-dollar-quoting diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml index 5c6a101..670539e 100644 --- a/doc/src/sgml/ref/pg_dumpall.sgml +++ b/doc/src/sgml/ref/pg_dumpall.sgml @@ -301,6 +301,16 @@ PostgreSQL documentation + --if-exists + + +It use conditional commands (with IF EXISTS +clause) for cleaning database schema. + + + + + --inserts diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml index 717da42..dd8e990 100644 --- a/doc/src/sgml/ref/pg_restore.sgml +++ b/doc/src/sgml/ref/pg_restore.sgml @@ -490,6 +490,16 @@ + --if-exists + + +It use conditional commands (with IF EXISTS +clause) for cleaning database schema. 
+ + + + + --no-data-for-failed-tables diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index 6927968..83f7216 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -113,6 +113,7 @@ typedef struct _restoreOptions char *superuser; /* Username to use as superuser */ char *use_role; /* Issue SET ROLE to this */ int dropSchema; + int if_exists; const char *filename; int dataOnly; int schemaOnly; diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c index 63a8009..327ff03 100644 --- a/src/bin/pg_dump/pg_backup_archiver.c +++ b/src/bin/pg_dump/pg_backup_archiver.c @@ -55,7 +55,8 @@ static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt, const int compression, ArchiveMode mode, SetupWorkerPtr setupWorkerPtr); static void _getObjectDescription(PQExpBuffer buf, TocEntry *te, ArchiveHandle *AH); -static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isData, bool acl_pass); +static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, + bool isData, bool acl_pass); static char *replace_line_endings(const char *str); static void _doSetFixedOutputState(ArchiveHandle *AH); static void _doSetSessionAuth(ArchiveHandle *AH, const char *user); @@ -234,6 +235,7 @@ RestoreArchive(Archive *AHX) bool parallel_mode; TocEntry *te; OutputContext sav; + AH->stage = STAGE_INITIALIZING; @@ -409,8 +411,24 @@ RestoreArchive(Archive *AHX) /* Select owner and schema as necessary */ _becomeOwner(AH, te); _selectOutputSchema(AH, te->namespace); -/* Drop it */ -ahprintf(AH, "%s", te->dropStmt); + +if (*te->dropStmt != '\0') +{ + if (ropt->if_exists) + { + int desc_len = strlen(te->desc); + + /* Clause IF EXISTS can be used yet (by pg_dump) */ + if (strncmp(te->dropStmt + desc_len + 5, " IF EXISTS", 9) != 0) + ahprintf(AH, "DROP %s IF EXISTS %s", + te->desc, + te->dropStmt + 6 + desc_len); + else + ahprintf(AH, "%s", te->dropStmt); + } + else + 
ahprintf(AH, "%s", te->dropStmt); +} } } @@ -2938,9 +2956,35 @@ _getObjectDescription(PQExpBuffer buf, TocEntry *te, ArchiveHandle *AH) strcmp(type, "OPERATOR CLASS") == 0 || strcmp(type, "OPERATOR FAMILY") == 0) { - /* Chop "DROP " off the front and make a modifiable copy */ - char *first = pg_strdup(te->dropStmt + 5); - char *last; + char *first; + char *last; + + /* try to search IF EXISTS in DROP command */ + if (strstr(te->dropStmt, "IF EXISTS") != NULL) + { + char buffer[40]; + size_t l; + + /* IF EXISTS clause should be optional, check it*/ + snprintf(buffer, sizeof(buffer), "DROP %s IF EXISTS", type); + l = strlen(buffer); + + if (strncmp(te->dropStmt, buffer, l) == 0) + { +/* append command type to target type */ +appendPQExpBufferStr(buf, type); + +/* skip first n chars, and create a modifieble copy */ +first = pg_strdup(te->dropStmt + l); + } + else +/* DROP IF EXISTS pattern is not appliable on dropStmt */ +first = pg_strdup(te->dropStmt + 5); + } + + else + /* IF EXISTS clause was not used, simple solution */ +
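The dropStmt rewriting in the patch above boils down to: take the original "DROP <type> <name>" statement and splice IF EXISTS in after the object type, unless pg_dump already emitted it. A sketch of that transformation (illustrative pseudocode only, not the patch's exact C string arithmetic):

```python
def add_if_exists(drop_stmt, obj_type):
    """Turn 'DROP TABLE foo;' into 'DROP TABLE IF EXISTS foo;',
    leaving statements that already say IF EXISTS untouched."""
    prefix = "DROP %s " % obj_type
    assert drop_stmt.startswith(prefix)
    rest = drop_stmt[len(prefix):]
    if rest.startswith("IF EXISTS "):
        return drop_stmt            # clause already present (emitted by pg_dump)
    return prefix + "IF EXISTS " + rest

print(add_if_exists("DROP TABLE foo;", "TABLE"))
# DROP TABLE IF EXISTS foo;
```

Doing the splice relative to the object-type prefix, as the patch does with strlen(te->desc), avoids having to re-parse the object name, which may itself contain spaces or quoting.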
Re: [HACKERS] Logging WAL when updating hintbit
On Fri, Dec 20, 2013 at 9:06 AM, Alvaro Herrera wrote: > Michael Paquier escribió: >> On Fri, Dec 20, 2013 at 1:05 PM, Sawada Masahiko >> wrote: > >> > Sorry the patch which I attached has wrong indent on pg_controldata. >> > I have modified it and attached the new version patch. >> Now that you send this patch, I am just recalling some recent email >> from Tom arguing about avoiding to mix lower and upper-case characters >> for a GUC parameter name: >> http://www.postgresql.org/message-id/30569.1384917...@sss.pgh.pa.us >> >> To fullfill this requirement, could you replace walLogHints by >> wal_log_hints in your patch? Thoughts from others? > > The issue is with the user-visible variables, not with internal > variables implementing them. I think the patch is sane. (Other than > the fact that it was posted as a patch-on-patch instead of > patch-on-master). But spelling it the same way everywhere really improves greppability. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] preserving forensic information when we freeze
On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera wrote: > Robert Haas escribió: >> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund >> wrote: >> >> Maybe what we should do is add a function something like >> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something >> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2 >> >> int, hoff int). Or perhaps some slightly more cooked version of that >> >> information. And then delete the xmin, xmax, cmin, and cmax system >> >> columns. That'd save significantly on pg_attribute entries while, at >> >> the same time, actually providing more information than we do today. >> > >> > I was wondering whether we couldn't just pass pg_tuple_header() a whole >> > row, instead of having the user manually pass in reloid and ctid. I >> > think that should actually work in the interesting scenarios. >> >> I wondered that, too, but it's not well-defined for all tuples. What >> happens if you pass in constructed tuple rather than an on-disk tuple? > > I assume without checking that passing reloid/ctid would allow this to > work for tuples in a RETURNING clause; and if we ever have an OLD > reference for the RETURNING clause of an UPDATE, that it would work > there, too, showing the post-update status of the updated tuple. I don't understand what you're saying here. Are you saying that reloid/ctid is a better approach, a worse approach, or just a different approach? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shared memory message queues
On Fri, Dec 20, 2013 at 1:11 PM, Andres Freund wrote: > On 2013-10-31 12:21:31 -0400, Robert Haas wrote: >> Patch #2, shm-toc-v1.patch, provides a facility for sizing a dynamic >> shared memory segment before creation, and for dividing it up into >> chunks after it's been created. It therefore serves a function quite >> similar to RequestAddinShmemSpace, except of course that there is only >> one main shared memory segment created at postmaster startup time, >> whereas new dynamic shared memory segments can come into existence on >> the fly; and it serves even more conspicuously the function of >> ShmemIndex, which enables backends to locate particular data >> structures within the shared memory segment. It is however quite a >> bit simpler than the ShmemIndex mechanism: we don't need the same >> level of extensibility here that we do for the main shared memory >> segment, because a new extension need not piggyback on an existing >> dynamic shared memory segment, but can create a whole segment of its >> own. > > So, without the argument of having per-extension dsm segments, I'd say > that a purely integer key sucks, because it's hard to manage and > debug. This way it's still not too nice, but I don't see a all that good > alternative. > > Comments about shm-toc-v1.patch: > > Since you're embedding spinlocks in struct shm_toc, this module will be > in conflict with platforms that do --disable-spinlocks, since the number > of spinlocks essentially needs to be predetermined there. I personally > still think the solution to that is getting rid of --disable-spinlocks. Yep. We can either deprecate --disable-spinlocks, or we can add an API that is the reverse of S_INIT_LOCK(). > I vote for removing all the shm_toc_estimator() knowledge from the > header, there seems little reason to expose it that way. That just > exposes unneccessary details and makes fixes after releases harder > (requiring extensions to recompile). Function call costs hardly can > matter, right? 
Oh, you're saying to make it a function rather than a macro? Sure, I could do that. > Do you assume that lookup speed isn't that important? I have a bit of a > hard time imagining that linear search over the keys isn't going to bite > us. At the least there should be a comment somewhere documenting that > the indented usecase is having a relatively few keys. Well, as previously noted, in the use cases I imagine for this, it'll be nworkers + somesmallconstant. I can add a comment to that effect. > Wouldn't it make sense to have a function that does the combined job of > shm_toc_insert() and shm_toc_allocate()? We could, but I don't really feel compelled. It's not hard to call them individually if that's the behavior you happen to want. > What's the assumption about the use of the magic in create/attach? Just > statically defined things in user code? Yeah. > Ok, cooking now, then I'll have a look at the queue itself, Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] preserving forensic information when we freeze
Robert Haas escribió: > On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund wrote: > >> Maybe what we should do is add a function something like > >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something > >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2 > >> int, hoff int). Or perhaps some slightly more cooked version of that > >> information. And then delete the xmin, xmax, cmin, and cmax system > >> columns. That'd save significantly on pg_attribute entries while, at > >> the same time, actually providing more information than we do today. > > > > I was wondering whether we couldn't just pass pg_tuple_header() a whole > > row, instead of having the user manually pass in reloid and ctid. I > > think that should actually work in the interesting scenarios. > > I wondered that, too, but it's not well-defined for all tuples. What > happens if you pass in a constructed tuple rather than an on-disk tuple? I assume without checking that passing reloid/ctid would allow this to work for tuples in a RETURNING clause; and if we ever have an OLD reference for the RETURNING clause of an UPDATE, that it would work there, too, showing the post-update status of the updated tuple. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging WAL when updating hintbit
On Fri, Dec 20, 2013 at 2:05 PM, Sawada Masahiko wrote: > On Fri, Dec 20, 2013 at 3:38 AM, Sawada Masahiko > wrote: >> On Thu, Dec 19, 2013 at 12:37 PM, Amit Kapila >> wrote: >>> On Wed, Dec 18, 2013 at 11:30 AM, Michael Paquier >>> wrote: On Wed, Dec 18, 2013 at 11:22 AM, Amit Kapila wrote: > On Fri, Dec 13, 2013 at 7:57 PM, Heikki Linnakangas > wrote: >> Thanks, committed with some minor changes: > > Should this patch in CF app be moved to Committed Patches or is there > something left for this patch? Nothing has been forgotten for this patch. It can be marked as committed. >>> >>> Thanks for confirmation, I have marked it as Committed. >>> >> >> Thanks! >> >> I attached the patch which changes name from 'wal_log_hintbits' to >> 'wal_log_hints'. >> It gained the approval of plural. >> > > Sorry the patch which I attached has wrong indent on pg_controldata. > I have modified it and attached the new version patch. Thanks! Committed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shared memory message queues
On 2013-10-31 12:21:31 -0400, Robert Haas wrote: > Patch #2, shm-toc-v1.patch, provides a facility for sizing a dynamic > shared memory segment before creation, and for dividing it up into > chunks after it's been created. It therefore serves a function quite > similar to RequestAddinShmemSpace, except of course that there is only > one main shared memory segment created at postmaster startup time, > whereas new dynamic shared memory segments can come into existence on > the fly; and it serves even more conspicuously the function of > ShmemIndex, which enables backends to locate particular data > structures within the shared memory segment. It is however quite a > bit simpler than the ShmemIndex mechanism: we don't need the same > level of extensibility here that we do for the main shared memory > segment, because a new extension need not piggyback on an existing > dynamic shared memory segment, but can create a whole segment of its > own. So, without the argument of having per-extension dsm segments, I'd say that a purely integer key sucks, because it's hard to manage and debug. This way it's still not too nice, but I don't see a all that good alternative. Comments about shm-toc-v1.patch: Since you're embedding spinlocks in struct shm_toc, this module will be in conflict with platforms that do --disable-spinlocks, since the number of spinlocks essentially needs to be predetermined there. I personally still think the solution to that is getting rid of --disable-spinlocks. I vote for removing all the shm_toc_estimator() knowledge from the header, there seems little reason to expose it that way. That just exposes unneccessary details and makes fixes after releases harder (requiring extensions to recompile). Function call costs hardly can matter, right? Do you assume that lookup speed isn't that important? I have a bit of a hard time imagining that linear search over the keys isn't going to bite us. 
At the least there should be a comment somewhere documenting that the intended use case is having relatively few keys. Wouldn't it make sense to have a function that does the combined job of shm_toc_insert() and shm_toc_allocate()? What's the assumption about the use of the magic in create/attach? Just statically defined things in user code? Ok, cooking now, then I'll have a look at the queue itself, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
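The API shape under discussion — size the region up front, carve it into chunks, and locate them by key via linear search — can be modeled in a few lines. Names and layout here are assumptions for illustration, not the patch's actual C structures:

```python
class ShmToc:
    """Toy model of the shm_toc idea: a fixed-size region carved into
    chunks, located through a flat list of (key, offset) entries."""
    def __init__(self, size):
        self.size = size
        self.allocated = 0
        self.entries = []              # linear list of (key, offset)

    def allocate(self, nbytes):
        offset = self.allocated
        if offset + nbytes > self.size:
            raise MemoryError("toc region full")
        self.allocated += nbytes
        return offset

    def insert(self, key, offset):
        self.entries.append((key, offset))

    def lookup(self, key):
        # Linear search: acceptable when there are only
        # nworkers + a small constant number of keys.
        for k, off in self.entries:
            if k == key:
                return off
        return None

toc = ShmToc(size=1024)
off = toc.allocate(128)
toc.insert(42, off)
print(toc.lookup(42))   # 0
```

This also shows why Andres's question about combining insert and allocate is natural: callers almost always do the two back to back.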
Re: [HACKERS] XML Issue with DTDs
On Thu, Dec 19, 2013 at 6:40 PM, Florian Pflug wrote: > While looking into ways to implement an XMLSTRIP function which extracts the > textual contents of an XML value and de-escapes them (i.e. > Solving this > seems a bit messy, unfortunately. First, I think we need to have some > XMLOPTION value which is a superset of all the others - otherwise, dump & > restore won't work reliably. That means either allowing DTDs if XMLOPTION is > CONTENT, or inventing a third XMLOPTION, say ANY. Or we can just decide that it was a bug that this was ever allowed, and if you upgrade to $FIXEDVERSION you'll need to sanitize your data. This is roughly what we did with encoding checks. > We then need to ensure that combining XML values yields something that is > valid according to the most general XMLOPTION setting. That means either > > (1) Removing the DTD from all but the first argument to XMLCONCAT, and > similarly all but the first value passed to XMLAGG > > or > > (2) Complaining if these values contain a DTD. > > or > > (3) Allowing multiple DTDs in a document if XMLOPTION is, say, ANY. > > I'm not in favour of (3), since clients are unlikely to be able to process > such a value. (1) matches how we currently handle XML declarations (<?xml …?>), so I'm slightly in favour of that. I don't like #3, mostly because I don't like XMLOPTION ANY in the first place. Either #1 or #2 sounds OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make_interval ??
Pavel, > So constructor should to look like: > > CREATE OR REPLACE FUNCTION make_interval(years int DEFAULT 0, months int > DEFAULT 0, ...) > > and usage: > > SELECT make_interval(years := 2) > SELECT make_interval(days := 14) > > Is there a interest for this (or similar) function? It would certainly make our Python users happy. And for that matter would get rid of this kind of stupid thing in stored procedure code: time_ahead := ( interval '1 minute' * var_skip ); So, +1 for the feature. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
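The proposed signature maps naturally onto keyword arguments with defaults. PostgreSQL stores an interval internally as a (months, days, microseconds) triple, so a sketch of the constructor's semantics in those terms looks like this (the SQL function's exact parameter list and behavior are of course up to the eventual patch):

```python
def make_interval(years=0, months=0, weeks=0, days=0,
                  hours=0, mins=0, secs=0.0):
    """Model of the proposed make_interval(): build an interval as
    PostgreSQL represents it internally, a (months, days, microseconds)
    triple.  Parameter names here mirror the proposal but are assumed."""
    total_months = years * 12 + months
    total_days = weeks * 7 + days
    microseconds = int(round((hours * 3600 + mins * 60 + secs) * 1_000_000))
    return (total_months, total_days, microseconds)

print(make_interval(years=2))   # (24, 0, 0)
print(make_interval(days=14))   # (0, 14, 0)
```

The two print calls correspond to Pavel's `SELECT make_interval(years := 2)` and `SELECT make_interval(days := 14)` examples.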
[HACKERS] [PATCH] Make various variables read-only (const)
This allows the variables to be moved from .data to .rodata section which means that more data can be shared by processes and makes sure that nothing can accidentally overwrite the read-only definitions. On a x86-64 Linux system this moves roughly 9 kilobytes of previously writable data to the read-only data segment in the backend and 4 kilobytes in libpq. https://github.com/saaros/postgres/compare/constify 24 files changed, 108 insertions(+), 137 deletions(-) / Oskari >From 89f0348747b356eaccf5edc2f85bf47d0a35c4f4 Mon Sep 17 00:00:00 2001 From: Oskari Saarenmaa Date: Fri, 20 Dec 2013 18:38:10 +0200 Subject: [PATCH] Make various variables read-only (const) This allows the variables to be moved from .data to .rodata section which means that more data can be shared by processes and makes sure that nothing can accidentally overwrite the read-only definitions. On a x86-64 Linux system this moves roughly 9 kilobytes of previously writable data to the read-only data segment in the backend and 4 kilobytes in libpq. 
--- src/backend/access/common/reloptions.c | 31 src/backend/catalog/objectaddress.c| 4 +- src/backend/commands/conversioncmds.c | 2 +- src/backend/commands/createas.c| 3 +- src/backend/commands/tablecmds.c | 8 ++-- src/backend/optimizer/path/costsize.c | 2 +- src/backend/regex/regc_lex.c | 4 +- src/backend/regex/regcomp.c| 2 +- src/backend/regex/regerror.c | 6 +-- src/backend/tcop/utility.c | 3 +- src/backend/tsearch/wparser_def.c | 4 +- src/backend/utils/adt/datetime.c | 6 +-- src/backend/utils/adt/formatting.c | 67 +++--- src/backend/utils/adt/tsrank.c | 16 src/backend/utils/mb/encnames.c| 31 src/backend/utils/mb/mbutils.c | 8 ++-- src/backend/utils/mb/wchar.c | 2 +- src/common/relpath.c | 8 ++-- src/include/access/reloptions.h| 5 +-- src/include/common/relpath.h | 3 +- src/include/mb/pg_wchar.h | 18 - src/include/optimizer/cost.h | 2 +- src/include/utils/datetime.h | 2 + src/port/chklocale.c | 8 ++-- 24 files changed, 108 insertions(+), 137 deletions(-) diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 31941e9..5ec617b 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -564,18 +564,17 @@ add_string_reloption(bits32 kinds, char *name, char *desc, char *default_val, * If ignoreOids is true, then we should ignore any occurrence of "oids" * in the list (it will be or has been handled by interpretOidsOption()). * - * Note that this is not responsible for determining whether the options - * are valid, but it does check that namespaces for all the options given are - * listed in validnsps. The NULL namespace is always valid and need not be - * explicitly listed. Passing a NULL pointer means that only the NULL - * namespace is valid. + * Note that this is not responsible for determining whether the options are + * valid, but it does check that namespaces for all the options given + * matches validnsp. The NULL namespace is always valid. 
Passing a NULL + * valinsp means that only the NULL namespace is valid. * * Both oldOptions and the result are text arrays (or NULL for "default"), * but we declare them as Datums to avoid including array.h in reloptions.h. */ Datum -transformRelOptions(Datum oldOptions, List *defList, char *namspace, - char *validnsps[], bool ignoreOids, bool isReset) +transformRelOptions(Datum oldOptions, List *defList, const char *namspace, + const char *validnsp, bool ignoreOids, bool isReset) { Datum result; ArrayBuildState *astate; @@ -667,23 +666,7 @@ transformRelOptions(Datum oldOptions, List *defList, char *namspace, */ if (def->defnamespace != NULL) { - boolvalid = false; - int i; - - if (validnsps) - { - for (i = 0; validnsps[i]; i++) - { - if (pg_strcasecmp(def->defnamespace, - validnsps[i]) == 0) - { - valid = true; - break; - } - } - } - -
Re: [HACKERS] GIN improvements part 1: additional information
On 12/19/2013 10:44 AM, Heikki Linnakangas wrote: On 12/19/2013 08:37 AM, Oleg Bartunov wrote: Guys, before digging deep into the art of comp/decomp world I'd like to know if you familiar with results of http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf paper and some newer research ? Yeah, I saw that paper. Do we agree in what we really want ? Basically, there are three main features: size, compression, decompression speed - we should take two :) According to that Zhang et al paper you linked, the Vbyte method actually performs the worst on all of those measures. The other algorithms are quite similar in terms of size (PForDelta being the most efficient), while PForDelta is significantly faster to compress/decompress. Just by looking at those numbers, PForDelta looks like a clear winner. However, it operates on much bigger batches than the other algorithms; I haven't looked at it in detail, but Zhang et al used 128 integer batches, and they say that 32 integers is the minimum batch size. If we want to use it for the inline posting lists stored in entry tuples, that would be quite wasteful if there are only a few item pointers on the tuple. Also, in the tests I've run, the compression/decompression speed is not a significant factor in total performance, with either varbyte encoding or Simple9-like encoding I hacked together. One disadvantage of Simple9 and other encodings that operate in batches is that removing a value from the middle can increase the number of bytes required for the remaining values. That's a problem during vacuum; it's possible that after vacuuming away one item pointer, the remaining items no longer fit on the page. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
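The batch-code problem Heikki describes can be seen with a stripped-down Simple9-style packer. The (count, bits) modes below follow the usual 28-data-bit layout but are assumptions for illustration (real Simple9 also spends 4 bits per word on a selector): a run of deltas that all fit the densest mode can need more words after one deletion, because the merged delta forces a wider mode.

```python
# Each 32-bit word holds n values of b bits each, chosen greedily.
MODES = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7),
         (3, 9), (2, 14), (1, 28)]

def words_needed(deltas):
    i, words = 0, 0
    while i < len(deltas):
        for n, b in MODES:
            chunk = deltas[i:i + n]
            if len(chunk) == n and all(d < (1 << b) for d in chunk):
                i += n
                words += 1
                break
        else:
            # No mode fits (value >= 2**28): count one word per value.
            i += 1
            words += 1
    return words

before = [1] * 28                  # 28 consecutive items: one full word
after = [1] * 13 + [2] + [1] * 13  # delete one item: two deltas merge to 2
print(words_needed(before), words_needed(after))   # 1 3
```

One 2-bit delta in the run disqualifies the 28-values-per-word mode, so the remaining 27 items spill across several words — removal enlarged the encoding, which is what rules these schemes out for in-place VACUUM.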
Re: [HACKERS] ERROR during end-of-xact/FATAL
Robert Haas escribió:
> On Thu, Nov 28, 2013 at 10:10 AM, Alvaro Herrera wrote:
> > Robert Haas escribió:
> > > On Wed, Nov 6, 2013 at 9:40 AM, Alvaro Herrera wrote:
> > > > Noah Misch wrote:
> > > > > Incomplete list:
> > > > >
> > > > > - If smgrDoPendingDeletes() finds files to delete, mdunlink() and its callee
> > > > >   relpathbackend() call palloc(); this is true in all supported branches. In
> > > > >   9.3, due to commit 279628a0, smgrDoPendingDeletes() itself calls palloc().
> > > > >   (In fact, it does so even when the pending list is empty -- this is the only
> > > > >   palloc() during a trivial transaction commit.) palloc() failure there
> > > > >   yields a PANIC during commit.
> > > >
> > > > I think we should fix this routine to avoid the palloc when not necessary.
> > > > That initial palloc is pointless.
> >
> > Here's a trivial patch we could apply to 9.3 immediately. Anything else
> > such as the ideas proposed below would require more effort than anyone
> > can probably spend here soon.
>
> Yeah, this seems like a good thing to do for now.

Pushed, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Fw: LSF/MM 2014 Call For Proposals
During the direct I/O discussion I'd suggested that somebody from the PostgreSQL community might want to put in an appearance at the LSF/MM summit in March. Here's the CFP. My guess is that a proposal for a session on avoiding performance regressions for systems like PostgreSQL, probably crossing all three tracks, would be well received.

Thanks,

jon

Begin forwarded message:

Date: Fri, 20 Dec 2013 09:30:22 +
From: Mel Gorman
To: linux-s...@vger.kernel.org, linux-...@vger.kernel.org, linux...@kvack.org, linux-fsde...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org, lsf...@lists.linux-foundation.org
Subject: LSF/MM 2014 Call For Proposals

The annual Linux Storage, Filesystem and Memory Management Summit for 2014 will be held on March 24th and 25th, before the Linux Foundation Collaboration Summit, at The Meritage Resort, Napa Valley, CA.

http://events.linuxfoundation.org/events/linux-storage-filesystem-and-mm-summit
http://events.linuxfoundation.org/events/collaboration-summit

Note that we are running LSF/MM a little earlier in 2014 than in previous years.

On behalf of the committee I would like to issue a call for agenda proposals that are suitable for cross-track discussion as well as more technical subjects for discussion in the breakout sessions.

1) Suggestions for agenda topics should be sent before January 31st 2014 to:

lsf...@lists.linux-foundation.org

and cc the Linux list or lists that are most interested in it:

ATA: linux-...@vger.kernel.org
FS: linux-fsde...@vger.kernel.org
MM: linux...@kvack.org
SCSI: linux-s...@vger.kernel.org

People who need more time for visa applications should send proposals before January 15th. The committee will complete the first round of selections on that date to accommodate applications. Please remember to tag your subject with [LSF/MM TOPIC] to make it easier to track.

Agenda topics and attendees will be selected by the program committee, but the final agenda will be formed by consensus of the attendees on the day.
We will try to cap attendance at around 25-30 per track to facilitate discussions, although the final numbers will depend on the room sizes at the venue.

2) Requests to attend the summit should be sent to:

lsf...@lists.linux-foundation.org

Please summarize what expertise you will bring to the meeting, and what you would like to discuss. Please also tag your email with [LSF/MM ATTEND] so there is less chance of it getting lost in the large mail pile.

Presentations are allowed to guide discussion, but are strongly discouraged. There will be no recording or audio bridge. However, we expect that written minutes will be published, as we did in previous years:

2013: http://lwn.net/Articles/548089/
2012: http://lwn.net/Articles/490114/ http://lwn.net/Articles/490501/
2011: http://lwn.net/Articles/436871/ http://lwn.net/Articles/437066/

3) If you have feedback on last year's meeting that we can use to improve this year's, please also send that to:

lsf...@lists.linux-foundation.org

Thank you on behalf of the program committee:

Storage: James Bottomley, Martin K. Petersen
Filesystems: Trond Myklebust, Jeff Layton, Dave Chinner, Jan Kara, Ted Ts'o
MM: Rik van Riel, Michel Lespinasse

-- 
Mel Gorman
SUSE Labs
[HACKERS] [bug fix] psql \copy doesn't end if backend is killed
Hello,

I've encountered a bug on PG 9.2 and fixed it for 9.4. Please find the patch attached. I'd like it to be backported to at least 9.2.

[Problem]
If the backend is terminated with SIGKILL while psql is running "\copy table_name from file_name", the \copy never ends. I expected \copy to be cancelled because the corresponding server process had vanished.

[Cause]
psql could not get out of the loop below in handleCopyIn():

    while (res = PQgetResult(conn), PQresultStatus(res) == PGRES_COPY_IN)
    {
        OK = false;
        PQclear(res);
        PQputCopyEnd(pset.db, _("trying to exit copy mode"));
    }

This situation is reached as follows:

1. handleCopyIn() calls PQputCopyData().
2. PQputCopyData() calls pqPutMsgEnd().
3. pqPutMsgEnd() calls pqFlush(), which calls pqSendSome().
4. pqSendSome() calls pqReadData().
5. At this moment, the backend is killed with SIGKILL.
6. pqReadData() fails to read from the socket, receiving ECONNRESET. It closes the socket.
7. As a result, PQputCopyData() fails.
8. handleCopyIn() then calls PQputCopyEnd().
9. PQputCopyEnd() calls pqPutMsgEnd(), which calls pqFlush(), which in turn calls pqSendSome().
10. pqSendSome() fails because the socket is not open.
11. As a result, PQputCopyEnd() returns an error, leaving conn->asyncStatus at PGASYNC_COPY_IN.
12. Because conn->asyncStatus remains PGASYNC_COPY_IN, PQgetResult() continues to return a PGresult whose status is PGRES_COPY_IN.

[Fix]
If the message transmission fails in PQputCopyEnd(), switch conn->asyncStatus back to PGASYNC_BUSY. That causes PQgetResult() to try to read data with pqReadData(). pqReadData() fails and PQgetResult() returns NULL. As a consequence, the loop in question terminates.

Regards
MauMau
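To make the state-machine reasoning above easier to follow, here is a toy model of it -- a deliberately simplified simulation, not the real libpq code; the names merely mirror libpq's PGASYNC_* values:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for libpq's connection state. */
typedef enum { ASYNC_BUSY, ASYNC_COPY_IN, ASYNC_IDLE } AsyncStatus;

typedef struct
{
    AsyncStatus status;
    bool        socket_open;
} Conn;

/* Models PQputCopyEnd(): transmission fails once the socket is gone. */
static bool
put_copy_end(Conn *conn, bool apply_fix)
{
    if (!conn->socket_open)
    {
        /*
         * The proposed fix: on send failure, fall back to ASYNC_BUSY so
         * that get_result() attempts a read, notices the dead socket,
         * and stops reporting COPY_IN forever.
         */
        if (apply_fix)
            conn->status = ASYNC_BUSY;
        return false;
    }
    conn->status = ASYNC_BUSY;
    return true;
}

/* Models PQgetResult(): COPY_IN status keeps reporting COPY_IN. */
static AsyncStatus
get_result(Conn *conn)
{
    if (conn->status == ASYNC_COPY_IN)
        return ASYNC_COPY_IN;
    /* ASYNC_BUSY with a dead socket: the read fails, session ends. */
    conn->status = ASYNC_IDLE;
    return ASYNC_IDLE;
}

/* Models the handleCopyIn() loop; returns iterations until exit, or -1. */
static int
drain_copy(Conn *conn, bool apply_fix)
{
    for (int i = 0; i < 1000; i++)
    {
        if (get_result(conn) != ASYNC_COPY_IN)
            return i;
        put_copy_end(conn, apply_fix);
    }
    return -1;                  /* looped "forever" */
}
```

Without the fix the model loops indefinitely on a dead socket, exactly as described in steps 11-12; with it, the loop exits on the next get_result() call.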
Re: [HACKERS] Logging WAL when updating hintbit
Michael Paquier escribió:
> On Fri, Dec 20, 2013 at 1:05 PM, Sawada Masahiko wrote:
> > Sorry, the patch which I attached has a wrong indent on pg_controldata.
> > I have modified it and attached the new version patch.
>
> Now that you send this patch, I am just recalling some recent email
> from Tom arguing about avoiding to mix lower and upper-case characters
> for a GUC parameter name:
> http://www.postgresql.org/message-id/30569.1384917...@sss.pgh.pa.us
>
> To fulfill this requirement, could you replace walLogHints by
> wal_log_hints in your patch? Thoughts from others?

The issue is with the user-visible variables, not with internal variables implementing them. I think the patch is sane. (Other than the fact that it was posted as a patch-on-patch instead of patch-on-master.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: [bug fix] multibyte messages are displayed incorrectly on the client
Noah Misch escribió:
> On Tue, Dec 17, 2013 at 01:42:08PM -0500, Bruce Momjian wrote:
> > On Fri, Dec 13, 2013 at 10:41:17PM +0900, MauMau wrote:
> > > [Fix]
> > > Disable message localization during session startup. In other
> > > words, messages are output in English until the database session is
> > > established.
> >
> > I think the question is whether the server encoding or English is
> > likely to be better for the average client. My bet is that the server
> > encoding is more likely correct.
> >
> > However, you are right that English/ASCII at least will always be
> > viewable, while there are many server/client combinations that will
> > produce unreadable characters.
> >
> > I would be interested to hear other people's experience with this.
>
> I don't have a sufficient sense of multilingualism among our users to know
> whether English/ASCII messages would be more useful, on average, than
> localized messages in the server encoding. Forcing English/ASCII does worsen
> behavior in the frequent situation where the client encoding matches the
> server encoding. I lean toward retaining the status quo of delivering
> localized messages in the server encoding.

The problem is that if there's an encoding mismatch, the message might be impossible to figure out. If the message is in English, at least it can be searched for on the web, or something -- the user might even find a page in which the English error string appears, with a native-language explanation.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
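To illustrate why a mismatch garbles text while English/ASCII survives, here is a tiny sketch (a hypothetical helper of my own, not PostgreSQL code): LATIN1 maps each byte directly to the Unicode code point of the same value, so UTF-8 multibyte sequences shown on a LATIN1 client turn into mojibake rather than an error.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical decoder for illustration: in LATIN1 every byte "decodes"
 * (byte value == code point), so misinterpreted UTF-8 yields junk
 * characters silently instead of failing.
 */
static size_t
latin1_decode(const unsigned char *bytes, size_t n, unsigned int *codepoints)
{
    for (size_t i = 0; i < n; i++)
        codepoints[i] = bytes[i];
    return n;
}
```

The UTF-8 encoding of "é" is the bytes 0xC3 0xA9; a LATIN1 client renders those as "Ã©" instead of U+00E9. ASCII bytes, by contrast, decode identically in nearly every client encoding, which is why an English/ASCII message is at least always readable.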
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
Haribabu kommi escribió:
> From the compilation I observed that libpgcommon is linked while building ecpg.
> I tested the same by using psprintf directly in ecpg code.
>
> I modified libecpg's Makefile as you suggested, which is attached to the
> mail, but the errors still occur. Please let me know if there is anything
> I missed.

I don't know what's the cause of the error you're seeing, but IIRC you can't have a file in src/port depend on src/common functionality (which psprintf is, IIRC). So you need to fix things up so that the psprintf() call doesn't occur in src/port at all. I'm not sure what's the best way to go about this; perhaps the whole new function should be in src/common, or perhaps you need part of it in src/port and the bits with the funny error reporting in src/common, where psprintf can be called without issue.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] preserving forensic information when we freeze
Robert Haas escribió:
> Maybe what we should do is add a function something like
> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> int, hoff int). Or perhaps some slightly more cooked version of that
> information. And then delete the xmin, xmax, cmin, and cmax system
> columns. That'd save significantly on pg_attribute entries while, at
> the same time, actually providing more information than we do today.

+1 for this general idea. I proposed this some time ago and got shot down because of pageinspect. I don't know about taking the system columns out completely -- not sure how much third-party code we're going to break that way, certainly a lot -- but the extra functionality would be useful nonetheless.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] preserving forensic information when we freeze
On 2013-12-20 07:58:46 -0500, Robert Haas wrote:
> I think the immediate problem is to decide whether this patch ought to
> make the xmin column display the result of GetXmin() or GetRawXmin().
> Thoughts on that?

I slightly favor GetRawXmin().

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] preserving forensic information when we freeze
On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund wrote:
> On 2013-12-20 07:47:17 -0500, Robert Haas wrote:
>> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund wrote:
>> >> Maybe what we should do is add a function something like
>> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
>> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
>> >> int, hoff int). Or perhaps some slightly more cooked version of that
>> >> information. And then delete the xmin, xmax, cmin, and cmax system
>> >> columns. That'd save significantly on pg_attribute entries while, at
>> >> the same time, actually providing more information than we do today.
>> >
>> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
>> > row, instead of having the user manually pass in reloid and ctid. I
>> > think that should actually work in the interesting scenarios.
>>
>> I wondered that, too, but it's not well-defined for all tuples. What
>> happens if you pass in a constructed tuple rather than an on-disk tuple?
>
> Those should be discernible, I think; t_self/t_tableOid won't be set for
> generated tuples.

Hmm. That might work.

I think the immediate problem is to decide whether this patch ought to make the xmin column display the result of GetXmin() or GetRawXmin(). Thoughts on that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
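For readers following the GetXmin()/GetRawXmin() question, here is a simplified standalone model of the distinction as I understand the patch -- not the actual PostgreSQL code, and the constants are my own stand-ins for the real infomask bits. The idea is that freezing no longer overwrites xmin; it sets a "frozen" bit combination in the infomask while the raw xmin is preserved for forensics:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stand-ins for the real values; the frozen state is both hint bits set. */
#define FROZEN_TRANSACTION_ID ((TransactionId) 2)
#define XMIN_COMMITTED        0x0100
#define XMIN_INVALID          0x0200
#define XMIN_FROZEN           (XMIN_COMMITTED | XMIN_INVALID)

typedef struct
{
    TransactionId t_xmin;       /* raw inserting XID, preserved on freeze */
    uint16_t      t_infomask;
} TupleHeader;

/* Raw value: what is physically stored -- the forensic information. */
static TransactionId
get_raw_xmin(const TupleHeader *tup)
{
    return tup->t_xmin;
}

/* Cooked value: frozen tuples report the frozen XID, as before the patch. */
static TransactionId
get_xmin(const TupleHeader *tup)
{
    if ((tup->t_infomask & XMIN_FROZEN) == XMIN_FROZEN)
        return FROZEN_TRANSACTION_ID;
    return tup->t_xmin;
}

/* The point of the patch: mark frozen without clobbering t_xmin. */
static void
freeze_tuple(TupleHeader *tup)
{
    tup->t_infomask |= XMIN_FROZEN;
}
```

Displaying get_xmin() keeps today's "you can see it's frozen" behavior; displaying get_raw_xmin() keeps the original inserting XID visible forever -- which is exactly the trade-off being debated.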
Re: [HACKERS] preserving forensic information when we freeze
On 2013-12-20 07:47:17 -0500, Robert Haas wrote:
> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund wrote:
> >> Maybe what we should do is add a function something like
> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> >> int, hoff int). Or perhaps some slightly more cooked version of that
> >> information. And then delete the xmin, xmax, cmin, and cmax system
> >> columns. That'd save significantly on pg_attribute entries while, at
> >> the same time, actually providing more information than we do today.
> >
> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
> > row, instead of having the user manually pass in reloid and ctid. I
> > think that should actually work in the interesting scenarios.
>
> I wondered that, too, but it's not well-defined for all tuples. What
> happens if you pass in a constructed tuple rather than an on-disk tuple?

Those should be discernible, I think; t_self/t_tableOid won't be set for generated tuples.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] preserving forensic information when we freeze
On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund wrote:
>> Maybe what we should do is add a function something like
>> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
>> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
>> int, hoff int). Or perhaps some slightly more cooked version of that
>> information. And then delete the xmin, xmax, cmin, and cmax system
>> columns. That'd save significantly on pg_attribute entries while, at
>> the same time, actually providing more information than we do today.
>
> I was wondering whether we couldn't just pass pg_tuple_header() a whole
> row, instead of having the user manually pass in reloid and ctid. I
> think that should actually work in the interesting scenarios.

I wondered that, too, but it's not well-defined for all tuples. What happens if you pass in a constructed tuple rather than an on-disk tuple?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] preserving forensic information when we freeze
On 2013-12-20 07:12:01 -0500, Robert Haas wrote:
> I think the root of the problem is that nobody's very eager to add
> more hidden system catalog columns because each one bloats
> pg_attribute significantly.

I think that part is actually solvable - there's no real need for them to be real columns, present in pg_attribute; things could easily enough be set up during parse analysis. The bigger problem I see is that adding more useful columns will cause name clashes, which will probably prohibit improvements in that direction.

> Maybe what we should do is add a function something like
> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> int, hoff int). Or perhaps some slightly more cooked version of that
> information. And then delete the xmin, xmax, cmin, and cmax system
> columns. That'd save significantly on pg_attribute entries while, at
> the same time, actually providing more information than we do today.

I was wondering whether we couldn't just pass pg_tuple_header() a whole row, instead of having the user manually pass in reloid and ctid. I think that should actually work in the interesting scenarios.

> pageinspect is useful, too, but it seems to deal mostly with pages, so
> I'm not sure it's a natural substitute for something like this.

Agreed. I also think that we really need something to investigate problems in core, not in a contrib module people won't have installed, especially in bigger setups. That said, I plan to either submit some improvements to pageinspect myself or convince somebody else to do so in the not too distant future.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
From: "Fujii Masao"
> ! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
>
> Even when standby_mode is not enabled, we can use cascade replication and
> it needs the accumulated WAL files. So I think that
> AllowCascadeReplication() should be added into this condition.
>
> ! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
> ! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
> !
> ! if (restoredFromArchive)
>
> Don't we need to check !StandbyModeRequested and
> !AllowCascadeReplication() here?

Oh, you are correct. Okay, done.

> ! /*
> !  * If the latest segment is not archival, but there's still a
> !  * RECOVERYXLOG laying about, get rid of it.
> !  */
> ! unlink(recoveryPath); /* ignore any error */
>
> The similar line exists in the lower side of exitArchiveRecovery(), so
> ISTM that you can refactor that.

That's already done in the previous patch: deletion of RECOVERYXLOG was moved into the else clause, as in:

-   /*
-    * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
-    * of it.
-    */
-   snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
-   unlink(recoveryPath); /* ignore any error */
-

Regards
MauMau
Re: [HACKERS] preserving forensic information when we freeze
On Thu, Dec 19, 2013 at 9:22 PM, Alvaro Herrera wrote:
> Jim Nasby escribió:
>> One thing users will lose in this patch is the ability to reliably see if a
>> tuple is frozen via SQL. Today you can do that just by selecting xmin from
>> the table.
>>
>> Obviously people don't generally need to do that... but it's one of those
>> things that when you do need it it's incredibly handy to have... would it
>> be difficult to expose infomask(2) via SQL, the same way xmin et al. are?
>
> It's already exposed via the pageinspect extension. It's doubtful that
> there are many valid cases where you need infomask but don't have access
> to that module.
>
> The real fix seems to be to ensure that the module is always available.

We could make the code that displays this column do GetXmin rather than GetRawXmin, which would allow this question to be answered, but we'd lose the ability to find the original xmin once the tuple is frozen and break precedent with the xmax and cmin/cmax fields, which return the "raw" value from the tuple header. But I'm OK to go whichever direction people prefer.

I think the root of the problem is that nobody's very eager to add more hidden system catalog columns because each one bloats pg_attribute significantly. Currently, we have xmin, xmax, cmin, and cmax columns, and that's basically just a historical artifact at this point. cmin and cmax always return the same value, and the value returned may be neither a cmin nor a cmax but a combo CID. And if you're looking for information that's only in the infomask, good luck.

Maybe what we should do is add a function something like pg_tuple_header(tableoid, ctid) that returns a record, maybe something like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2 int, hoff int). Or perhaps some slightly more cooked version of that information. And then delete the xmin, xmax, cmin, and cmax system columns. That'd save significantly on pg_attribute entries while, at the same time, actually providing more information than we do today.

pageinspect is useful, too, but it seems to deal mostly with pages, so I'm not sure it's a natural substitute for something like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
From: "Amit Kapila"
> Few other points:
>
> 1.
> #ifdef WIN32
> /* Get event source from postgresql.conf for eventlog output */
> get_config_value("event_source", event_source, sizeof(event_source));
> #endif
>
> Event logging is done for both the win32 and cygwin environments under
> #if defined(WIN32) || defined(__CYGWIN__), so the event source name
> should also be retrieved for both environments. Refer below in code:
>
> #if defined(WIN32) || defined(__CYGWIN__)
> static void
> write_eventlog(int level, const char *line)
>
> 2. Docs need to be updated for the default value:
> http://www.postgresql.org/docs/devel/static/event-log-registration.html
> http://www.postgresql.org/docs/devel/static/runtime-config-logging.html#GUC-EVENT-SOURCE

Okay, done. Thanks. I'll update the commitfest entry this weekend.

Regards
MauMau
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 20 December 2013 02:02, Bruce Momjian wrote:
> On Thu, Dec 19, 2013 at 05:14:50AM +, Haribabu kommi wrote:
> > On 19 December 2013 05:31, Bruce Momjian wrote:
> > > On Wed, Dec 11, 2013 at 10:22:32AM +, Haribabu kommi wrote:
> > > > The make_absolute_path() function moved to port is changed in a
> > > > similar way as Bruce Momjian's approach. psprintf is used to store
> > > > the error string which occurred in the function. But psprintf is not
> > > > used for storing the absolute path, because it causes problems in
> > > > freeing the allocated memory in SelectConfigFiles, since the same
> > > > memory is allocated in a different code branch from guc_malloc.
> > > >
> > > > After adding the make_absolute_path() function with the psprintf
> > > > stuff in path.c, it gives a linking problem when compiling ecpg. I
> > > > am not able to find the cause. So I added another file, abspath.c,
> > > > in port which contains these two functions.
> > >
> > > What errors are you seeing?
> >
> > If I move the make_absolute_path function from abspath.c to path.c, I
> > get the following linking errors while compiling "ecpg":
> >
> > ../../../../src/port/libpgport.a(path.o): In function `make_absolute_path':
> > /home/hari/postgres/src/port/path.c:795: undefined reference to `psprintf'
> > /home/hari/postgres/src/port/path.c:809: undefined reference to `psprintf'
> > /home/hari/postgres/src/port/path.c:818: undefined reference to `psprintf'
> > /home/hari/postgres/src/port/path.c:830: undefined reference to `psprintf'
> > collect2: ld returned 1 exit status
> > make[4]: *** [ecpg] Error 1
> > make[3]: *** [all-preproc-recurse] Error 2
> > make[2]: *** [all-ecpg-recurse] Error 2
> > make[1]: *** [all-interfaces-recurse] Error 2
> > make: *** [all-src-recurse] Error 2
>
> You didn't show the actual command that is generating the error, but I
> assume it is linking ecpg, not creating libecpg. I think the issue is
> that path.c is specially handled when it is included in libecpg. Here
> is a comment from the libecpg Makefile:
>
>     # We use some port modules verbatim, but since we need to
>     # compile with appropriate options to build a shared lib, we can't
>     # necessarily use the same object files as the backend uses. Instead,
>     # symlink the source files in here and build our own object file.
>
> My guess is that libecpg isn't marked as linking to libpgcommon, and
> when you called psprintf in path.c, it added a libpgcommon link
> requirement.
>
> My guess is that if you compiled common/psprintf.c like port/path.c in
> libecpg's Makefile, it would link fine.

Sorry for not mentioning the command; yes, it is giving the problem while linking ecpg.

From the compilation I observed that libpgcommon is linked while building ecpg. I tested the same by using psprintf directly in ecpg code.

I modified libecpg's Makefile as you suggested, which is attached to the mail, but the errors still occur. Please let me know if there is anything I missed.

Regards,
Hari Babu
[HACKERS] make_interval ??
Hello,

We have defined interface date, time, and timestamp constructors. The question is whether we would like to have something similar for the interval type. Unlike time and timestamp, we can use zero as the default for every field. So the constructor could look like:

    CREATE OR REPLACE FUNCTION make_interval(years int DEFAULT 0,
                                             months int DEFAULT 0, ...)

and usage:

    SELECT make_interval(years := 2);
    SELECT make_interval(days := 14);

Is there interest in this (or a similar) function?

Regards

Pavel
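To make the proposal concrete, here is a rough sketch of what such a constructor could do internally, modeled on the familiar (months, days, microseconds) interval layout. This is hypothetical illustration code written for this mail, not a patch, and the field list is just one possible choice:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the usual interval layout: months, days, and sub-day microseconds. */
typedef struct
{
    int64_t time;               /* microseconds */
    int32_t day;
    int32_t month;
} Interval;

#define USECS_PER_SEC INT64_C(1000000)

/* All fields default to zero in the SQL-level proposal; here the caller
 * simply passes zeros for the parts it does not want. */
static Interval
make_interval(int years, int months, int weeks, int days,
              int hours, int mins, double secs)
{
    Interval result;

    result.month = years * 12 + months;
    result.day = weeks * 7 + days;
    result.time = ((int64_t) hours * 60 + mins) * 60 * USECS_PER_SEC
        + (int64_t) (secs * USECS_PER_SEC);
    return result;
}
```

With keyword-argument defaults on the SQL level, make_interval(years := 2) would correspond to make_interval(2, 0, 0, 0, 0, 0, 0) here.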