Re: [PATCHES] still alive?
Bernd Helmle wrote:
> --On Thursday, September 11, 2008 15:39:01 +0300 Peter Eisentraut wrote:
>> Anyone who thinks the patches list should remain as separate from
>> hackers, shout now (with rationale)!
>
> Seems I've missed something: what's supposed to hold patches now?

Just send them to pgsql-hackers.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Tom Lane wrote:
> Heikki Linnakangas writes:
>> Also, it would be nice to use B-M(-H) for LIKE as well.
>
> Right offhand, that seems impossible, at least in patterns with %. Or
> were you thinking of trying to separate out the fixed substrings of a
> pattern and search for them with BMH?

Yep, something like that. Even if it only handled the special case of
'%foobar%', that would be nice, because that's a pretty common special
case.

> Anyway, it's not material for this patch, since it'd involve pretty
> fundamental redesign of the LIKE code.

Yep.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
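For reference, here is a minimal sketch of the Boyer-Moore-Horspool search
being discussed (illustrative only, not the patch's code): the kind of
routine that could serve a '%foobar%' pattern once the fixed substring has
been extracted from the LIKE pattern.

#include <string.h>
#include <stddef.h>

/*
 * Returns a pointer to the first occurrence of needle in haystack, or NULL.
 * The skip table lets a mismatch advance the search window by up to the
 * whole needle length at once.
 */
static const char *
bmh_search(const char *haystack, int hlen, const char *needle, int nlen)
{
    int     skip[256];
    int     i;

    if (nlen <= 0 || hlen < nlen)
        return NULL;

    /* Default: a byte that doesn't occur in the needle skips its full length. */
    for (i = 0; i < 256; i++)
        skip[i] = nlen;
    /* Bytes that do occur (except the last one) allow a smaller skip. */
    for (i = 0; i < nlen - 1; i++)
        skip[(unsigned char) needle[i]] = nlen - 1 - i;

    i = 0;
    while (i <= hlen - nlen)
    {
        if (memcmp(haystack + i, needle, nlen) == 0)
            return haystack + i;
        /* Skip based on the haystack byte aligned with the needle's last byte. */
        i += skip[(unsigned char) haystack[i + nlen - 1]];
    }
    return NULL;
}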
Re: [PATCHES] WIP Join Removal
Simon Riggs wrote:
> On Tue, 2008-09-02 at 13:20 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> It turns out that a join like this
>>>
>>>   select a.col2 from a left outer join b on a.col1 = b.col1 where b.col2 = 1;
>>>
>>> can be cheaper if we don't remove the join, when there is an index
>>> on a.col1 and b.col2, because the presence of b allows the values
>>> returned from b to be used for an index scan on a.
>>
>> Umm, you *can't* remove that join.
>
> Yes, you can. The presence or absence of rows in b is not important to
> the result of the query because of the left outer join. I spent nearly
> a whole day going down that dead end also.

Oh. What does the query look like after removing the join, then?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] WIP Join Removal
Simon Riggs wrote:
>   select a.col2 from a left outer join b on a.col1 = b.col1 where b.col2 = 1;
>
> is logically equivalent to
>
>   select a.col2 from a;

No, it's not:

postgres=# CREATE TABLE a (col1 int4, col2 int4);
CREATE TABLE
postgres=# CREATE TABLE b (col1 int4, col2 int4);
CREATE TABLE
postgres=# INSERT INTO a VALUES (1,1);
INSERT 0 1
postgres=# select a.col2 from a;
 col2
------
    1
(1 row)

postgres=# select a.col2 from a left outer join b on a.col1 = b.col1 where b.col2 = 1;
 col2
------
(0 rows)

The WHERE clause rejects the NULL-extended row produced by the outer join,
so the two queries are not equivalent. But anyway, Greg's example looks
valid, and proves the point that removing a join isn't always a win.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] WIP Join Removal
Gregory Stark wrote:
> I wonder if it would be more worthwhile to remove them and have a
> subsequent phase where we look for possible joins to *add*. So even if
> the user writes
>
>   select * from invoices where customer_id=?
>
> the planner might be able to discover that it can find those records
> quicker by scanning customer, finding the matching
> company_id,customer_id and then using an index to look them up in
> invoices.

Yeah, that would be cool. The question is whether it's worth the
additional overhead in the planner, compared to the gain in the rare
case that it's applicable. That's always the thing with planner tricks
like this. I think we'll eventually need some sort of tuning knob to
control how hard the planner tries to apply different optimizations
like that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] WIP Join Removal
Simon Riggs wrote:
> It seems I wrote my original tests using AND instead of WHERE and
> hadn't noticed the distinction. Thanks for helping me catch that
> error.

Ah, yeah, that's a big difference. Proving correctness is hard, but to
refute something you need just one test case that fails ;-).

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] fixing bug in combocid.c
Karl Schnaitter wrote:
> This patch is for a bug in backend/utils/time/combocid.c. In
> HeapTupleHeaderAdjustCmax, the code makes the incorrect assumption
> that the raw command id corresponds to cmin, when the raw command id
> can in fact be a combo cid. It needs to be fixed to call
> HeapTupleHeaderGetCmin to get cmin. That's all there is to my patch,
> but read on for more details.

Yep. Patch committed. Thanks for the report!

> This is my first patch; I hope I did it right!

You did well :-). No need to tar or gzip small patches like that,
though.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
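For readers following along, the shape of the corrected logic is roughly
this (a simplified sketch of the combocid.c function, not the verbatim
commit; it relies on backend-internal functions and is not standalone code):

/*
 * Sketch: when a tuple inserted by our own transaction is deleted again,
 * cmax must be stored as a combo cid that pairs it with cmin. The fix is
 * to obtain cmin with HeapTupleHeaderGetCmin(), which knows how to look
 * through an existing combo cid, instead of using the raw command id.
 */
void
HeapTupleHeaderAdjustCmax(HeapTupleHeader tup, CommandId *cmax, bool *iscombo)
{
    if (!(tup->t_infomask & HEAP_XMIN_COMMITTED) &&
        TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)))
    {
        /* not the raw command id, which may itself be a combo cid */
        CommandId   cmin = HeapTupleHeaderGetCmin(tup);

        *cmax = GetComboCommandId(cmin, *cmax);
        *iscombo = true;
    }
    else
        *iscombo = false;
}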
Re: [HACKERS][PATCHES] odd output in restore mode
Andrew Dunstan wrote:
> Greg Smith wrote:
>> On Wed, 23 Jul 2008, Kevin Grittner wrote:
>>> In our scripts we handle this by copying to a temp directory on the
>>> same mount point as the archive directory and doing a mv to the
>>> archive location when the copy is successfully completed. I think
>>> that this even works on Windows. Could that just be documented as a
>>> strong recommendation for the archive script?
>>
>> This is exactly what I always do. I think the way cp is shown in the
>> examples promotes what's really a bad practice for lots of reasons,
>> this particular problem being just one of them. I've been working on
>> an improved archive_command shell script that I expect to submit for
>> comments and potential inclusion in the documentation as a better
>> base for other people to build on. This is one of the options for how
>> it can operate. It would be painful but not impossible to convert a
>> subset of that script to run under Windows as well, at least enough
>> to cover this particular issue.
>
> A Perl script using the (standard) File::Copy module along with the
> builtin function rename() should be moderately portable. It would be
> nice not to have to maintain two scripts.

It's also not very nice to require a Perl installation on Windows just
for a replacement for copy. Would a simple .bat script work?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
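For illustration, the copy-to-temp-then-rename approach suggested above,
as a plain C helper (hypothetical code, not part of PostgreSQL; it assumes
the temporary name lands on the same filesystem as the destination, so
rename() is atomic):

#include <stdio.h>
#include <unistd.h>

/*
 * Copy src to "<dst>.tmp" in the destination directory, flush and fsync
 * it, then rename() it into place. A poller like pg_standby can then
 * never observe a half-copied file under its final name.
 */
static int
atomic_copy(const char *src, const char *dst)
{
    char    tmp[1024];
    char    buf[8192];
    size_t  n;
    FILE   *in;
    FILE   *out;

    snprintf(tmp, sizeof(tmp), "%s.tmp", dst);

    if ((in = fopen(src, "rb")) == NULL)
        return -1;
    if ((out = fopen(tmp, "wb")) == NULL)
    {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            fclose(in);
            fclose(out);
            return -1;
        }
    }
    fclose(in);
    if (fflush(out) != 0 || fsync(fileno(out)) != 0)
    {
        fclose(out);
        return -1;
    }
    if (fclose(out) != 0)
        return -1;
    /* The file only appears under its final name once it is complete. */
    return rename(tmp, dst);
}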
Re: [HACKERS][PATCHES] odd output in restore mode
Andrew Dunstan wrote:
> Kevin Grittner wrote:
>> Heikki Linnakangas wrote:
>>> We really need a more reliable way of detecting that a file has been
>>> fully copied.
>>
>> In our scripts we handle this by copying to a temp directory on the
>> same mount point as the archive directory and doing a mv to the
>> archive location when the copy is successfully completed. I think
>> that this even works on Windows. Could that just be documented as a
>> strong recommendation for the archive script?
>
> Needs testing at least. If it does in fact work then we can just
> adjust the docs and be done

Yeah.

> - or maybe provide a .bat file or perl script that would work as an
> archive_command on Windows.

We're not talking about archive_command. We're talking about the thing
that copies files to the directory that pg_standby polls.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS][PATCHES] odd output in restore mode
Andrew Dunstan wrote:
> Heikki Linnakangas wrote:
>> Andrew Dunstan wrote:
>>> - or maybe provide a .bat file or perl script that would work as an
>>> archive_command on Windows.
>>
>> We're not talking about archive_command. We're talking about the
>> thing that copies files to the directory that pg_standby polls.
>
> Er, that's what the archive_command is. Look at the pg_standby docs
> and you'll see that that's where we're currently recommending use of
> Windows copy. Perhaps you're confusing this with the restore_command?

Oh, right. I was thinking that archive_command copies the files to an
archive location, and there's yet another process copying files from
there to the directory pg_standby polls. But indeed, in the simple
configuration, archive_command is the command that we're interested in.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS][PATCHES] odd output in restore mode
Simon Riggs wrote:
> On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
>> 8. Unresolved question of implementing now/later a cp replacement
>
> The patch implements what's been agreed. I'm not rewriting cp, for
> reasons already discussed.
>
> Not a comment to you Martin, but it's fairly clear that I'm not
> maintaining this correctly for Windows. I've never claimed to have
> tested this on Windows, and only included Windows-related items as
> requested by others. I need to make it clear that I'm not going to
> maintain it at all for Windows. If others wish to report Windows
> issues then they can suggest appropriate fixes and test them also.

Hmm. I just realized that replacing the cp command within pg_standby
won't help at all. The problem is with the command that copies the
files *to* the archivelocation that pg_standby polls, not with the copy
pg_standby does from archivelocation to pg_xlog. And we don't have much
control over that.

We really need a more reliable way of detecting that a file has been
fully copied. One simple improvement would be to check the xlp_magic
field of the last page, though it still wouldn't be bullet-proof.

Do the commands that preallocate the space keep the file exclusively
locked during the copy? If they do, shouldn't we get an error when
trying to run the restore copy command, and retry after the 1s sleep in
RestoreWALFileForRecovery? Though if the archive location is a Samba
mount or something, I guess we can't rely on Windows-style exclusive
locking.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
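A sketch of the last-page check mentioned above (assumptions: 16 MB
segments, 8 kB WAL pages, and that xlp_magic is the leading uint16 of
XLogPageHeaderData; the expected magic value differs between PostgreSQL
versions, so it is a parameter here):

#include <stdint.h>
#include <stdio.h>

#define WAL_SEG_SIZE    (16 * 1024 * 1024)  /* assumed segment size */
#define XLOG_PAGE_SIZE  8192                /* assumed WAL page size */

/* Returns 1 if the last page of the segment carries the expected magic. */
static int
last_page_has_magic(const char *path, uint16_t expected_magic)
{
    FILE       *f = fopen(path, "rb");
    uint16_t    magic = 0;
    int         ok = 0;

    if (f == NULL)
        return 0;
    if (fseek(f, WAL_SEG_SIZE - XLOG_PAGE_SIZE, SEEK_SET) == 0 &&
        fread(&magic, sizeof(magic), 1, f) == 1)
        ok = (magic == expected_magic);
    fclose(f);
    return ok;
}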
Re: [PATCHES] page macros cleanup (ver 04)
Zdenek Kotala wrote:
> Pavan Deolasee wrote:
>> On Fri, Jul 4, 2008 at 4:25 PM, Heikki Linnakangas wrote:
>>> No, there's a itemsz = MAXALIGN(itemsz) call before the check
>>> against HashMaxItemSize.
>>
>> Ah, right. Still, should we not just MAXALIGN_DOWN the max size to
>> reflect the practical limit on the itemsz? It's more academic though,
>> so not a big deal.
>
> Finally, I use the following formula:
>
> #define HashMaxItemSize(page) \
>     MAXALIGN_DOWN(PageGetPageSize(page) - \
>                   ( SizeOfPageHeaderData + sizeof(ItemIdData) ) - \
>                   MAXALIGN(sizeof(HashPageOpaqueData)) )
>
> I did not replace PageGetPageSize(page), because the other
> *MaxItemSize macros have the same interface.

Ok, fair enough.

There's another academic discrepancy between these two hunks:

*** src/backend/access/hash/hashpage.c  12 May 2008 00:00:44 -0000  1.75
--- src/backend/access/hash/hashpage.c  9 Jul 2008 11:30:09 -0000
***************
*** 407,413 ****
      for (i = _hash_log2(metap->hashm_bsize); i > 0; --i)
      {
          if ((1 << i) <= (metap->hashm_bsize -
!                          (MAXALIGN(sizeof(PageHeaderData)) +
                            MAXALIGN(sizeof(HashPageOpaqueData)))))
              break;
      }
--- 407,413 ----
      for (i = _hash_log2(metap->hashm_bsize); i > 0; --i)
      {
          if ((1 << i) <= (metap->hashm_bsize -
!                          (MAXALIGN(SizeOfPageHeaderData) +
                            MAXALIGN(sizeof(HashPageOpaqueData)))))
              break;
      }

and

*** src/include/access/hash.h  19 Jun 2008 00:46:05 -0000  1.88
--- src/include/access/hash.h  9 Jul 2008 11:30:10 -0000
***************
*** 192,198 ****
  #define BMPG_SHIFT(metap)   ((metap)->hashm_bmshift)
  #define BMPG_MASK(metap)    (BMPGSZ_BIT(metap) - 1)
  #define HashPageGetBitmap(pg) \
!     ((uint32 *) (((char *) (pg)) + MAXALIGN(sizeof(PageHeaderData))))

  /*
   * The number of bits in an ovflpage bitmap word.
--- 191,197 ----
  #define BMPG_SHIFT(metap)   ((metap)->hashm_bmshift)
  #define BMPG_MASK(metap)    (BMPGSZ_BIT(metap) - 1)
  #define HashPageGetBitmap(pg) \
!     ((uint32 *) (((char *) (pg)) + MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData))))

  /*
   * The number of bits in an ovflpage bitmap word.

I admit I don't understand what that bitmap is. It looks like we're
assuming it can take up all the space between the page header and the
special portion, without any line pointers, but in HashPageGetBitmap
we're reserving space for one line pointer. It looks like the actual
size of the bitmap is only the largest power of 2 below the maximum
size, so that won't be an issue in practice. That macro is actually
doing the same thing as PageGetContents, so I switched to using that.
As that moves the data slightly on those bitmap pages, I guess we'll
need a catversion bump.

Attached is an updated patch. I also fixed some whitespace and comments
that were no longer valid. How does it look to you now?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Bug fix for pg_standby keepfiles calculation
Simon Riggs wrote:
> Fix minor bug in pg_standby, noted by Ferenc Felhoffer

Applied, thanks.

I couldn't find a bug report from Ferenc in the archives. Did he
contact you personally?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
[PATCHES] Multi-column GIN
Dumb question: What's the benefit of a multi-column GIN index over
multiple single-column GIN indexes?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Relation forks FSM rewrite patches
Simon Riggs wrote:
> On Fri, 2008-06-27 at 19:36 +0300, Heikki Linnakangas wrote:
>> Here's an updated version of the relation forks patch, and an
>> incremental FSM rewrite patch on top of that. The relation forks
>> patch is ready for review. The FSM implementation is more
>> work-in-progress still, but I'd like to get some review on that as
>> well, with the goal of doing more performance testing and committing
>> it after the commit fest.
>
> Hmmm, a 6000-line patch with no visible documentation, readme, nor any
> discussion on -hackers that I'm aware of.

There is a readme in the patch, and there certainly has been discussion
on -hackers, see:

http://archives.postgresql.org/pgsql-hackers/2008-04/msg00415.php

where the current design is discussed for the first time.

User documentation is still to be done. There isn't anything to tune,
or anything that requires manual maintenance, so there isn't much to
document, though, except perhaps a chapter in the Internals part.

Here's an updated version of the patch. Changes:

- one bug has been fixed (I assumed that it's always safe to call
  rightchild(x) on a parent node, but that was not always true for the
  rightmost branch of the tree)
- there's some new debugging code
- if we can't find a page with enough free space in search_avail after
  starting from scratch 1000 times, give up. That shouldn't happen, but
  see below.

There is still a nasty bug somewhere, probably related to locking :-(.
I ran DBT-2 with this, and after about 1h an FSM lookup goes into an
endless loop, hogging all CPU. I suspect that there's a bug somewhere
so that a change to the root node of a lower-level FSM page isn't
propagated to the upper-level FSM page for some reason. oprofile shows
that the endless loop happens in search_avail. This is why I added the
code to give up after 1000 tries, hoping to get some debugging output
the next time that happens.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] page macros cleanup
Pavan Deolasee wrote:
> On Fri, Jul 4, 2008 at 1:01 PM, Zdenek Kotala wrote:
>> Good catch. I got lost in basic arithmetic. What I see now is that
>> the original definition counts sizeof(ItemIdData) twice, and on the
>> other hand it does not take care of MAXALIGN correctly. I think the
>> correct formula is:
>>
>> #define HashMaxItemSize(page) \
>>     (PageGetPageSize(page) - \
>>      ( MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData)) + \
>>        MAXALIGN(sizeof(HashPageOpaqueData)) \
>>      )\
>>     )
>>
>> What do you think?
>
> Yes. I think that's the correct way.

Doesn't look right to me. There's no padding after the first line
pointer, hence the first MAXALIGN shouldn't be there.

BTW, looking at hashinsert.c where it's used, we're actually passing a
pointer to the meta page to HashMaxItemSize(). So the PageGetPageSize()
call on that is quite bogus, since it's not the meta page that the
tuple is going to be inserted into. It's academic, because all pages
are the same size anyway, but it doesn't look right. I think I'd go
with BLCKSZ instead.

I think this is the way it should be:

#define HashMaxItemSize \
    (BLCKSZ - \
     SizeOfPageHeaderData - \
     MAXALIGN(sizeof(HashPageOpaqueData)) - \
     sizeof(ItemIdData))

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
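To make the difference concrete, here is the arithmetic under assumed
sizes (8 kB blocks, 8-byte MAXALIGN; the exact numbers vary by platform
and version, so treat them as illustrative):

/*
 * Assumed sizes, for illustration only:
 *   BLCKSZ                               = 8192
 *   SizeOfPageHeaderData                 = 24
 *   sizeof(ItemIdData)                   = 4
 *   MAXALIGN(sizeof(HashPageOpaqueData)) = 16
 *
 * Zdenek's formula:  8192 - MAXALIGN(24 + 4) - 16 = 8192 - 32 - 16 = 8144
 * Formula above:     8192 - 24 - 16 - 4           = 8148
 *
 * The extra MAXALIGN costs 4 bytes, because there is in fact no padding
 * between the page header and the first line pointer.
 */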
Re: [PATCHES] page macros cleanup
Pavan Deolasee wrote:
> On Fri, Jul 4, 2008 at 3:37 PM, Heikki Linnakangas wrote:
>> I think this is the way it should be:
>>
>> #define HashMaxItemSize \
>>     (BLCKSZ - \
>>      SizeOfPageHeaderData - \
>>      MAXALIGN(sizeof(HashPageOpaqueData)) - \
>>      sizeof(ItemIdData))
>
> I am wondering if this would fail for a corner case if HashMaxItemSize
> happened to be unaligned. For example, if itemsz <= HashMaxItemSize <
> MAXALIGN(itemsz), PageAddItem() would later fail with a not-so-obvious
> error. Should we just MAXALIGN_DOWN the HashMaxItemSize?

No, there's a itemsz = MAXALIGN(itemsz) call before the check against
HashMaxItemSize.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] page macros cleanup
Zdenek Kotala wrote:
> In my opinion, the first place where a tuple can be placed is:
>
> MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData))

Yeah, but we don't enforce that directly. We enforce it by MAXALIGNing
the size in PageAddItem, and with the rule that special-size is
MAXALIGNed. To put it another way, it's possible that there's an
unaligned amount of free space on a page, as returned by
PageGetExactFreeSpace. But since item size is always MAXALIGNed, it's
impossible to use it all.

Come to think of it, why do we require MAXALIGN alignment of tuples? I
must be missing something, but AFAICS the widest fields in
HeapTupleHeaderData are 4 bytes wide, so it should be possible to get
away with 4-byte alignment.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
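For context, the alignment macros work like this (mirroring the
definitions in src/include/c.h; MAXIMUM_ALIGNOF is set by configure, and
8 is just an assumed value here):

#include <stdint.h>

/* Round LEN up to the next multiple of ALIGNVAL (a power of two). */
#define TYPEALIGN(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

#define MAXIMUM_ALIGNOF 8       /* assumed; determined by configure */
#define MAXALIGN(LEN)   TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

/* e.g. MAXALIGN(33) == 40, whereas 4-byte alignment would allow 36 */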
Re: [PATCHES] Relation forks FSM rewrite patches
Simon Riggs wrote:
> On Fri, 2008-07-04 at 12:27 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> Hmmm, a 6000-line patch with no visible documentation, readme, nor
>>> any discussion on -hackers that I'm aware of.
>>
>> There is a readme in the patch, and there certainly has been
>> discussion on -hackers, see:
>> http://archives.postgresql.org/pgsql-hackers/2008-04/msg00415.php
>> where the current design is discussed for the first time.
>
> OK, I see the readme now. Thanks. Minor point, but the readme needs to
> draw a clear distinction between FSM pages and data pages, which
> confused me when I read it the first time.

Ok, thanks. I had trouble finding distinctive terms for the tree within
a page, and the tree of FSM blocks with the root at block 0.

> Contention on FSM pages concerns me. Every change has the potential to
> bubble up to the top, which would cause a significant problem. I'd
> like to find a way to minimise the number of bubble-up operations,
> otherwise there will be no material difference between using a single
> whole-FSM lock like we do now and this new scheme. Note that in this
> new scheme the path length to the bottom of the tree is considerably
> longer and can stall waiting on I/O - so contention in the FSM is a
> big issue. (It always has been in databases as long as I can
> remember.)

There's already some mitigating factors:

1. You only need to bubble up to an upper-level FSM page if the amount
at the top of the leaf FSM page changes. Keep in mind that one FSM page
holds FSM information on ~4000 heap pages, so you don't need to bubble
up if there's any page within that 4000-page range that has as much or
more space than the page you're updating.

2. We only update the FSM when we try to insert a tuple and find that
it doesn't fit. That reduces the amount of FSM updates dramatically
when you're doing bulk inserts. (This is the same as in the current
implementation, though I'm not sure it's optimal anymore.)

I haven't been able to come up with a simple test case that shows any
meaningful performance difference between the current and this new
implementation. Got any ideas? I fear that we're going overboard trying
to avoid contention that simply isn't there, but it's hard to argue
without a concrete test case to analyze.

> The FSM tree as proposed has exact values.

Not quite. Free space is tracked in BLCKSZ/256 (=32 with default
BLCKSZ) increments, so that we only need one byte per heap page.

> What if we bubbled up based upon only significant tree changes, rather
> than exact changes?

Hmm. So an update would only ever update the lowest-level FSM page, and
leave a mismatch between the value at the root node of a lower-level
FSM page and the corresponding value at the leaf node of its parent.
The mismatch would then need to be fixed by the next search that
traverses down that path and finds that there's not enough space there
after all.

That works when the amount of free space on a page is decremented.
VACUUM, which increments it, would still need to bubble up the change,
because if the value at the upper level is not fixed, no search might
ever traverse down that path, and the value would never be fixed.

That would solve the I/O-while-holding-lock issue. (Not that I'm too
worried about it, though.)

> So perhaps we can perform bubble-up only when the change in free space
> is greater than avg row size, or the remaining space has dropped to
> less than 2*avg row size, where exact values begin to matter more.

One idea is to make the mapping from the amount of free space in bytes
to the 1-byte FSM category non-linear. For example, use just one
category for >2000 bytes free, on the assumption that anything larger
than that is toasted anyway, and divide the 255 categories evenly
across the remaining 2000 bytes.

I would like to move away from using the average row width in the
calculation if possible. We use it in the current implementation, but
if we want to keep doing it, we'll need to track it within the FSM like
the current implementation does, adding complexity and contention
issues of its own. Or reach into the statistics from the FSM, but I
don't like that idea much either.

> Also, as the FSM map becomes empty we will see more and more bubble-up
> operations reaching the top parts of the tree. We need a way to avoid
> contention from growing over time.

Yeah, that's something to watch out for.

> I'm not happy about the FSM being WAL logged for minor changes (new
> pages, yes). The high number of changes will cause extra traffic where
> we don't want it. This will accentuate ...
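To make the within-page tree and the bubble-up operation concrete, here is
a toy model (illustrative only, not the patch's code; the 4096-leaf size
is an assumption matching the ~4000 heap pages per FSM page mentioned
above):

#include <stdint.h>

#define LEAF_COUNT  4096                    /* assumed leaves per FSM page */
#define NODE_COUNT  (2 * LEAF_COUNT - 1)    /* complete binary tree in an array */

static uint8_t fsm_nodes[NODE_COUNT];       /* node i has children 2i+1, 2i+2 */

static void
fsm_set_avail(int leafno, uint8_t avail)
{
    int     node = (LEAF_COUNT - 1) + leafno;

    fsm_nodes[node] = avail;

    /* Bubble up only while a parent's cached maximum actually changes. */
    while (node > 0)
    {
        int     parent = (node - 1) / 2;
        uint8_t max = fsm_nodes[2 * parent + 1];

        if (fsm_nodes[2 * parent + 2] > max)
            max = fsm_nodes[2 * parent + 2];
        if (fsm_nodes[parent] == max)
            break;              /* nothing above this point needs updating */
        fsm_nodes[parent] = max;
        node = parent;
    }
}

This is the property point 1 above relies on: most updates terminate at
the first parent whose cached maximum is unaffected, so few reach the top.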
Re: [PATCHES] Relation forks FSM rewrite patches
Alvaro Herrera wrote:
> I wonder if we could instead make vacuum write a new FSM tree from
> scratch, rather than updating it piecemeal.

I'd like to move away from that, as we'll hopefully get some sort of
partial vacuum capability soon. We might want to have some sort of bulk
update operation, though, so that you could refresh FSM information for
say 100 pages, or one FSM page, at a time. That would reduce the WAL
traffic of a VACUUM significantly, though I'm not sure how significant
that is to begin with.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] EXPLAIN progress info
I like this idea in general. I'm envisioning a cool explain display in
pgAdmin that's updated live as the query is executed, showing how many
tuples a seq scan at the bottom, and an aggregate above it, have
processed. Drool.

Currently the interface is that you open a new connection, and signal
the backend using the same mechanism as a query cancel. That's fine for
the interactive psql use case, but seems really heavy-weight for the
pgAdmin UI I'm envisioning. You'd have to poll the server, opening a
new connection each time. Any ideas on that? How about a GUC to send
the information automatically every n seconds, for example?

Other than that, this one seems to be the most serious issue:

Tom Lane wrote:
> Gregory Stark writes:
>> There are downsides:
>
> Insurmountable ones at that. This one already makes it a non-starter:
>
>> a) the overhead of counting rows and loops is there for every query
>> execution, even if you don't do explain analyze.

I wouldn't be surprised if the overhead of the counters turns out to be
a non-issue, but we'd have to see some testing of that. InstrEndLoop is
the function we're worried about, right?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] page macros cleanup
Just one quick note:

Zdenek Kotala wrote:
> *** pgsql.orig.da8c485e0e2a/src/backend/access/gist/gistutil.c
> --- pgsql.orig/src/backend/access/gist/gistutil.c
> ***************
> *** 592,598 ****
>       /*
>        * Additionally check that the special area looks sane.
>        */
> !     if (((PageHeader) (page))->pd_special !=
> !         (BLCKSZ - MAXALIGN(sizeof(GISTPageOpaqueData))))
>           ereport(ERROR,
>                   (errcode(ERRCODE_INDEX_CORRUPTED),
> --- 592,598 ----
>       /*
>        * Additionally check that the special area looks sane.
>        */
> !     if (PageGetSpecialPointer(page) - page !=
> !         (BLCKSZ - MAXALIGN(sizeof(GISTPageOpaqueData))))
>           ereport(ERROR,
>                   (errcode(ERRCODE_INDEX_CORRUPTED),

Should probably use PageGetSpecialSize here. Much simpler, and doesn't
assume that the special area is always at the end of the page (not that
I see us changing that anytime soon).

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] extend VacAttrStats to allow stavalues of different types
Jan Urbański wrote:
> Tom Lane wrote:
>> I think the correct solution is to initialize the fields to match the
>> column type before calling the typanalyze function. Then you don't
>> break compatibility for existing typanalyze functions. It's also less
>> code, since the standard typanalyze functions can rely on those
>> preset values.
>
> Right. Updated patch attached.

Thanks, committed.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] [HACKERS] odd output in restore mode
Simon Riggs wrote:
> Patch implements
> * recommendation to use GnuWin32 cp on Windows
> * provide holdtime delay, default 0 (on all platforms)
> * default stays same on Windows=copy to ensure people upgrading don't
>   get stung

This seems pretty kludgey to me. I wouldn't want to install GnuWin32
utilities on a production system just for the cp command, and I don't
know how I would tune holdtime properly for using copy. And it seems
risky to have defaults that are known to not work reliably.

How about implementing a replacement function for cp ourselves? It
seems pretty trivial to do. We could use that on Unixes as well, which
would keep the differences between Win32 and other platforms smaller,
and thus ensure the codepath gets more testing.

(Sorry for jumping into the discussion so late; I didn't follow this
thread earlier, and just read it now in the archives while looking at
the patch.)

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Better formatting of functions in pg_dump
Tom Lane wrote:
> Greg Sabino Mullane writes:
>>> Why the random switching between newline-before and newline-after
>>> styles? Please be consistent.
>>
>> I thought they were all after. On second glance, they still seem all
>> after?
>
> Oh, my mistake, I had failed to see that the patch was getting rid of
> newline-before style in this function. I think you might have gone a
> bit overboard on adding whitespace, but the previous objection is
> nonsense, sorry.

Yeah, I like the idea of moving the metadata stuff before the function
body, but the whitespace is a bit too much. You can fit LANGUAGE
plpgsql IMMUTABLE STRICT SECURITY DEFINER COST 10 on one line without
wrapping on an 80-column terminal. And we don't try to guarantee any
specific width anyway; you can get very long lines if the function has
a lot of arguments, for example.

I applied this simpler patch that just moves the metadata stuff before
the function body, leaving the whitespace as is (in newline-before
style).

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

*** src/bin/pg_dump/pg_dump.c
--- src/bin/pg_dump/pg_dump.c
*************** dumpFunc(Archive *fout, FuncInfo *finfo)
*** 6775,6788 ****
      rettypename = getFormattedTypeName(finfo->prorettype, zeroAsOpaque);

      appendPQExpBuffer(q, "CREATE FUNCTION %s ", funcsig);
!     appendPQExpBuffer(q, "RETURNS %s%s\n%s\nLANGUAGE %s",
                        (proretset[0] == 't') ? "SETOF " : "",
!                       rettypename,
!                       asPart->data,
!                       fmtId(lanname));
!     free(rettypename);

      if (provolatile[0] != PROVOLATILE_VOLATILE)
      {
          if (provolatile[0] == PROVOLATILE_IMMUTABLE)
--- 6775,6786 ----
      rettypename = getFormattedTypeName(finfo->prorettype, zeroAsOpaque);

      appendPQExpBuffer(q, "CREATE FUNCTION %s ", funcsig);
!     appendPQExpBuffer(q, "RETURNS %s%s",
                        (proretset[0] == 't') ? "SETOF " : "",
!                       rettypename);
      free(rettypename);
+     appendPQExpBuffer(q, "\nLANGUAGE %s", fmtId(lanname));

      if (provolatile[0] != PROVOLATILE_VOLATILE)
      {
          if (provolatile[0] == PROVOLATILE_IMMUTABLE)
*************** dumpFunc(Archive *fout, FuncInfo *finfo)
*** 6850,6856 ****
          appendStringLiteralAH(q, pos, fout);
      }

!     appendPQExpBuffer(q, ";\n");

      ArchiveEntry(fout, finfo->dobj.catId, finfo->dobj.dumpId,
                   funcsig_tag,
--- 6848,6854 ----
          appendStringLiteralAH(q, pos, fout);
      }

!     appendPQExpBuffer(q, "\n%s;\n", asPart->data);

      ArchiveEntry(fout, finfo->dobj.catId, finfo->dobj.dumpId,
                   funcsig_tag,
Re: [PATCHES][UPDATED] A GUC variable to replace PGBE_ACTIVITY_SIZE
Tom Lane wrote:
> Heikki Linnakangas writes:
>> Another simple optimization occurred to me while looking at this: we
>> should skip the memcpy/strcpy altogether if the BackendActivity slot
>> is not in use. That seems like a good bet; you usually don't try to
>> max out max_connections.
>
> Huh? How could we be assigning to a slot that is not in use?

Before the patch, we loop through the shared PgBackendStatus slots
(there are MaxBackends of them), and issue a memcpy for each to copy it
to our local slot. After that, we check if it was actually in use.
After the patch, we still loop through the shared slots, but only issue
the memcpy for slots that are in use.

Hope this answers your question; I'm not sure what you meant...

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
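The change being described, roughly, looks like this fragment (a
simplified sketch: the real code also wraps the copy in the
st_changecount consistency protocol, and "localtable" here is just a
stand-in name for the local array being filled):

/* Before: unconditionally memcpy every slot, then check if it was in use. */
/* After: check first, and skip the copy for unused slots: */
for (i = 0; i < MaxBackends; i++)
{
    volatile PgBackendStatus *be = &BackendStatusArray[i];

    if (be->st_procpid == 0)
        continue;               /* unused slot: no memcpy needed */

    memcpy(&localtable[n++], (PgBackendStatus *) be, sizeof(PgBackendStatus));
}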
Re: [PATCHES] [HACKERS] Hint Bits and Write I/O
Gregory Stark wrote:
> I'm also a bit concerned that *how many hint bits* isn't enough
> information to determine how important it is to write out the page.

Agreed, that doesn't seem like a very good metric to me either.

> Or how many *unhinted* xmin/xmax values were found? If HTSV can hint
> xmin for a tuple but finds xmax still in progress, perhaps that's a
> good sign it's not worth dirtying the page?

I like that thought.

Overall, I feel that we should never dirty when setting a hint bit;
just set the separate buffer flag to indicate that hint bits have been
set. The decision to dirty and write out, or not, should be delayed
until we're about to write/replace the buffer. That is, in bgwriter.

How about this strategy:

1. First of all, before writing a dirty buffer, scan all tuples on the
page and set all hint bits that can be set. This will hopefully save us
from having to dirty the page again in the future, when another tuple
on the page is accessed. This has been proposed before, and IIRC Tom
has argued that it's a modularity violation for bgwriter to access the
contents of pages like that, but I'm sure we can find a way to do it
safely.

2. When bgwriter encounters a page that's marked as hint-bits-dirty,
write it only if *all* hint bits on the page have been, or can be, set.
Dirtying a page before that point doesn't seem worthwhile, as the next
access to a tuple that doesn't have all the hint bits set will have to
dirty the page again.

Actually, I'd like to see some benchmarks on an even simpler strategy:
just never dirty a page just because a hint bit has been set. It might
work surprisingly well in practice: if a database is I/O bound, we
don't care about the extra CPU work or lock contention of checking the
clog. If it's CPU bound, the active pages that matter are in the buffer
cache, and so are the hint bits for those pages.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
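A sketch of the "mark but don't dirty" idea in pseudo-C (BM_HINT_DIRTY and
the helper are hypothetical, not existing PostgreSQL API; the point is
only where the decision moves to):

/* Hypothetical flag: page has hint-bit-only changes, but is not dirty. */
#define BM_HINT_DIRTY   (1 << 15)

static void
set_hint_bit(BufferDesc *bufHdr, HeapTupleHeader tuple, uint16 infomask)
{
    tuple->t_infomask |= infomask;

    /*
     * Don't mark the buffer dirty; just remember that hint bits were set.
     * Whether the page is worth writing is decided later, by the bgwriter,
     * when the buffer is about to be written out or replaced.
     */
    bufHdr->flags |= BM_HINT_DIRTY;
}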
Re: [PATCHES] A GUC variable to replace PGBE_ACTIVITY_SIZE
I'd like to see some quick testing of this thing mentioned in the
comments:

/*
 * XXX if pgstat_track_activity_query_size is really large,
 * it might be best to use strcpy not memcpy for copying the
 * activity string?
 */

If we make it a GUC, there will be people running with really large
pgstat_track_activity_query_size settings. In fact, I wouldn't be
surprised if people routinely cranked it up to the 10-100 KB range,
just in case. It's going to be platform-dependent, for sure, but some
quick micro-benchmarks of where the break-even point between memcpy and
strcpy lies would be nice.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Refactoring xlogutils.c
Teodor Sigaev wrote:
>> - For the routines that need a fake, or lightweight, Relation struct
>> anyway, there's a new function called
>> XLogFakeRelcacheEntry(RelFileNode), that returns a palloc'd Relation
>> struct.
>
> Is that fake structure applicable for ReadBuffer()?

Yes.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Refactoring xlogutils.c
Tom Lane wrote:
> Heikki Linnakangas writes:
>> Attached is an updated version of my patch to refactor the
>> XLogOpenRelation/XLogReadBuffer interface, in preparation for the
>> relation forks patch, and subsequently the FSM rewrite patch.
>
> The code motion in md.c looks fairly bogus; was that a copy-and-paste
> error?

This one?

*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
*************** mdcreate(SMgrRelation reln, bool isRedo)
*** 208,216 ****
      char       *path;
      File        fd;

-     if (isRedo && reln->md_fd != NULL)
-         return;                 /* created and opened already... */
-
      Assert(reln->md_fd == NULL);

      path = relpath(reln->smgr_rnode);
--- 208,213 ----

That's intentional. That check has been moved to smgrcreate(), so that
it's done before calling TablespaceCreateDbSpace(). The reason for that
is that smgrcreate() is now called for every XLogReadBuffer()
invocation, so we want to make it exit quickly if there's nothing to
do.

On second look, though, I think we should actually leave that check in
mdcreate(). Even though it'd be dead code, since we do the same check
in smgrcreate() already, removing it changes the semantics of
mdcreate(): it'd no longer be acceptable to call mdcreate() if the file
is open already.

> Otherwise it looks pretty sane, but I have one stylistic gripe: I'm
> dubious about your willingness to assume that pfree() is enough for
> getting rid of a fake relcache entry. Relcache entries are complex
> beasts and it may not continue to work to do that. It's also possible
> that we'd have additional resources attached to them someday. So I
> think it'd be worth having a FreeFakeRelcacheEntry() routine to
> localize the knowledge of how to get rid of them.

Yeah, I guess you're right.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
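The suggested routine could start out as thin as this (a sketch; the point
is to localize the knowledge, whatever the function grows into later):

/*
 * Sketch only: today freeing a fake relcache entry is just a pfree(), but
 * wrapping it in a function lets that change without touching every redo
 * routine that uses fake entries.
 */
void
FreeFakeRelcacheEntry(Relation fakerel)
{
    pfree(fakerel);
}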
Re: [PATCHES] minor ts_type.h comment fix
Jan Urbański wrote:
> These should read TSQuery, not TSVector, no?

Yep. Applied.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
[PATCHES] \timing on/off
I started to look at David Fetter's patch to enable \timing on/off, in
addition to toggling the mode with just \timing.

I gather that the conclusion of the thread "Making sure \timing is on",
starting at:

http://archives.postgresql.org/pgsql-general/2008-05/msg00324.php

was that we should leave \H and \a alone for now, because it's not
clear what \H off would do. And \a is already documented as deprecated;
we might want to do that for \H as well.

Here's a patch, based on David's patch, that turns timing into a \pset
variable, and makes \timing an alias for \pset timing. This makes
\timing behave the same as \x.

I also moved the help line from the "External" section into
"Formatting". I don't think \timing is external in the same sense as
\cd or \! are. Rather, it controls the output, like \x or \pset
options.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] \timing on/off
And here is the patch I forgot to attach.

Heikki Linnakangas wrote:
> I started to look at David Fetter's patch to enable \timing on/off, in
> addition to toggling the mode with just \timing.
>
> I gather that the conclusion of the thread "Making sure \timing is
> on", starting at:
> http://archives.postgresql.org/pgsql-general/2008-05/msg00324.php
> was that we should leave \H and \a alone for now, because it's not
> clear what \H off would do. And \a is already documented as
> deprecated; we might want to do that for \H as well.
>
> Here's a patch, based on David's patch, that turns timing into a \pset
> variable, and makes \timing an alias for \pset timing. This makes
> \timing behave the same as \x.
>
> I also moved the help line from the "External" section into
> "Formatting". I don't think \timing is external in the same sense as
> \cd or \! are. Rather, it controls the output, like \x or \pset
> options.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.207
diff -c -r1.207 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml  1 Jun 2008 16:23:08 -0000  1.207
--- doc/src/sgml/ref/psql-ref.sgml  9 Jun 2008 12:19:35 -0000
***************
*** 1721,1726 ****
--- 1721,1739 ----
        </para>
        </listitem>
        </varlistentry>
+
+       <varlistentry>
+       <term><literal>timing</literal></term>
+       <listitem>
+       <para>
+       You can specify an optional second argument, if it is provided it
+       may be either <literal>on</literal> or <literal>off</literal> which
+       will enable or disable display of how long each SQL statement takes,
+       in milliseconds. If the second argument is not provided then we will
+       toggle between on and off.
+       </para>
+       </listitem>
+       </varlistentry>

    </variablelist>
    </para>
***************
*** 1734,1740 ****
    <para>
    There are various shortcut commands for <command>\pset</command>. See
    <command>\a</command>, <command>\C</command>, <command>\H</command>,
!   <command>\t</command>, <command>\T</command>, and <command>\x</command>.
    </para>
    </tip>
--- 1747,1754 ----
    <para>
    There are various shortcut commands for <command>\pset</command>. See
    <command>\a</command>, <command>\C</command>, <command>\H</command>,
!   <command>\t</command>, <command>\T</command>, <command>\x</command>,
!   and <command>\timing</command>.
    </para>
    </tip>
***************
*** 1864,1870 ****
    <term><literal>\timing</literal></term>
    <listitem>
    <para>
!   Toggles a display of how long each SQL statement takes, in milliseconds.
    </para>
    </listitem>
    </varlistentry>
--- 1878,1885 ----
    <term><literal>\timing</literal></term>
    <listitem>
    <para>
!   Toggles a display of how long each SQL statement takes. As such it
!   is equivalent to <literal>\pset timing</literal>.
    </para>
    </listitem>
    </varlistentry>

Index: src/bin/psql/command.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/bin/psql/command.c,v
retrieving revision 1.189
diff -c -r1.189 command.c
*** src/bin/psql/command.c  14 May 2008 19:10:29 -0000  1.189
--- src/bin/psql/command.c  9 Jun 2008 10:56:25 -0000
***************
*** 290,301 ****
          char       *opt = psql_scan_slash_option(scan_state,
                                                   OT_WHOLE_LINE, NULL, false);

!         if (pset.timing)
              INSTR_TIME_SET_CURRENT(before);

          success = do_copy(opt);

!         if (pset.timing && success)
          {
              INSTR_TIME_SET_CURRENT(after);
              INSTR_TIME_SUBTRACT(after, before);
--- 290,301 ----
          char       *opt = psql_scan_slash_option(scan_state,
                                                   OT_WHOLE_LINE, NULL, false);

!         if (pset.popt.timing)
              INSTR_TIME_SET_CURRENT(before);

          success = do_copy(opt);

!         if (pset.popt.timing && success)
          {
              INSTR_TIME_SET_CURRENT(after);
              INSTR_TIME_SUBTRACT(after, before);
***************
*** 884,897 ****
      /* \timing -- toggle timing of queries */
      else if (strcmp(cmd, "timing") == 0)
      {
!         pset.timing = !pset.timing;
!         if (!pset.quiet)
!         {
!             if (pset.timing)
!                 puts(_("Timing is on."));
!             else
!                 puts(_("Timing is off."));
!         }
      }

      /* \unset */
--- 884,894 ----
      /* \timing -- toggle timing of queries */
      else if (strcmp(cmd, "timing") == 0)
      {
!         char       *opt = psql_scan_slash_option(scan_state,
!                                                  OT_NORMAL, NULL, true);
!
!         success = do_pset("timing", opt, &pset.popt, pset.quiet);
!         free(opt);
      }

      /* \unset */
***************
*** 1739,1744 ****
--- 1736,1752 ----
              printf(_("Target width for \"wrapped\" format is %d.\n"), popt->topt.columns);
      }

+     /* set timing mode */
+     else if (strcmp(param, "timing") == 0
Re: [PATCHES] \timing on/off
Tom Lane wrote:
> Heikki Linnakangas writes:
>> Here's a patch, based on David's patch, that turns timing into a
>> \pset variable, and makes \timing an alias for \pset timing. This
>> makes \timing behave the same as \x.
>
> This seems a bit random to me. AFAIR all of the \pset properties are
> associated with formatting of table output. \timing doesn't belong.

Hmm, yes, so it seems. This patch:

http://archives.postgresql.org/pgsql-general/2008-05/msg00427.php

from David seems the most appropriate thing to do, then.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] GIN improvements
Teodor Sigaev wrote:
>> 2) fast insert into GIN
>
> New version: http://www.sigaev.ru/misc/fast_insert_gin-0.6.gz
> Changes:
> - added option FASTUPDATE=(1|t|true|on|enable|0|f|false|disable) for
>   CREATE/ALTER command for GIN indexes

I think we should try to make it automatic. I'm not sure how, but how
about having a constant-sized fastupdate buffer, of say 100 rows or a
fixed number of pages, and when that becomes full, the next inserter
has to pay the price of updating the index and flushing the buffer? To
keep that overhead out of the main codepath, we could make autovacuum
flush the buffers periodically.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
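The policy being proposed, as a toy model (all type and helper names here
are hypothetical; the real patch works at the page level rather than on a
row count):

#define PENDING_LIMIT   100     /* assumed constant-sized buffer */

static void
gin_insert_with_pending_list(GinIndex *index, GinTuple *newtuple)
{
    pending_list_add(index, newtuple);  /* cheap, append-only */

    /*
     * Whoever fills the buffer pays the cost of merging it into the index
     * proper; autovacuum could also flush it periodically in the
     * background, so foreground inserters rarely hit this branch.
     */
    if (pending_list_length(index) >= PENDING_LIMIT)
        pending_list_flush(index);
}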
Re: [PATCHES] extend VacAttrStats to allow stavalues of different types
Jan Urbański wrote:
> Following the conclusion here:
> http://archives.postgresql.org/pgsql-hackers/2008-05/msg00273.php
> here's a patch that extends VacAttrStats to allow typanalyze functions
> to store statistic values of different types than the underlying
> column.

Looks good to me at first glance.

About this comment:

+  * XXX or maybe fall back on attrtype-> stuff when these are NULL? That way
+  * we won't break other people's custom typanalyze functions. Not sure if
+  * any exist, though.

I tried to google for a user-defined data type with a custom typanalyze
function but didn't find anything, so I don't think it's an issue. It's
a bit nasty, though, because if one exists, it will compile and run
just fine, until you run ANALYZE. In general it's better to break an
old API obviously rather than silently, so that the old code doesn't
even compile.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Map forks (WIP)
Tom Lane wrote:
> One thing I did *not* like was changing the FSM API to refer to
> Relation rather than RelFileNode --- I don't believe that's a good
> idea at all. In particular, consider what happens during TRUNCATE or
> CLUSTER: it's not very clear how you'll tell the versions of the
> relation apart. If you want to push the FSM API up to use SMgrRelation
> instead of RelFileNode, that'd be okay, but not Relation. (Essentially
> the distinction I'm trying to preserve here is logical vs physical
> relation.)

Oh really? I'm quite fond of the new API. From a philosophical point of
view, in the new world order the FSM is an integral part of a relation,
not something tacked onto the physical layer.

TRUNCATE and CLUSTER will need to truncate, and truncate+recreate, the
FSM file, respectively. The FSM fork is on an equal footing with the
main fork: when TRUNCATE swaps the relation file, a new FSM fork is
created as well, and there's no way or need to access the old file
anymore. When a relation is moved to another tablespace, the FSM fork
is moved as well, and while the RelFileNode changes at that point, the
logical Relation is the same.

Besides, Relation contains a bunch of very handy fields. pgstat_info in
particular, which is needed if we want to collect pgstat information
about the FSM, and I think we will. I might also want to add a field
like rd_amcache there for the FSM: I'm thinking of implementing
something like the fastroot thing we have in b-tree, and we might need
some other per-relation information there as well.

> The XLogOpenRelationWithFork stuff needs to be re-thought also, as
> again this is blurring the question of what's a logical and what's a
> physical relation --- and if forknum isn't part of the relation ID,
> that API is wrong either way. I'm not sure about a good solution in
> this area, but I wonder if the right answer might be to make the XLOG
> replay stuff use SMgrRelations instead of bogus Relations. IIRC the
> replay code design predates the existence of SMgrRelation, so maybe we
> need a fresh look there.

Agreed, I'm not happy with that part either. I tried to do just what
you suggest, make the XLOG replay stuff deal with SMgrRelations instead
of the lightweight relcache, and it did look good until I got to
refactoring btree_xlog_cleanup() (GIN/GiST has the same problem, IIRC).
btree_xlog_cleanup() uses the same functions as the normal-operation
code to insert pointers to parent pages, which operate on Relation.
That started to become really hairy to solve without completely
bastardizing the normal code paths.

Hmm. One idea would be to still provide a function to create a fake
RelationData struct from an SMgrRelation, which the redo function can
call in that kind of situation.

> (On closer look, XLogOpenRelationWithFork seems unused anyway.)

That's just because FSM WAL-logging hasn't been implemented yet.

> One really trivial thing that grated on me was
>
> + /*
> +  * In a few places we need to loop through 0..MAX_FORKS to discover which
> +  * forks exists, so we should try to keep this number small.
> +  */
> + #define MAX_FORKS (FSM_FORKNUM + 1)
>
> I think you should either call it MAX_FORK (equal to the last fork
> number) or NUM_FORKS (equal to last fork number plus one). As is, it's
> just confusing.

Agreed, will fix. And the comment is flat out wrong for the current
usage: what's described in the comment is done in ATExecSetTableSpace.
I grant you that there are many other usages for it. I'll work on the
comment.

> BTW, it would probably be a good idea to try to get the fork access
> API committed before you work on FSM. Whenever you can break a big
> patch into successive sections, it's a good idea, IMHO. I don't think
> there's any doubt that we are going to go in this direction, so I see
> no objection to committing fork-based API revisions in advance of
> having any real use for them.

Yep. I'll develop them together for now, but will separate them when
the fork stuff is ripe for committing.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] create or replace language
Andreas 'ads' Scherbaum wrote:
> Attached is another version of the patch (still missing
> documentation), which changes the language owner on update (the owner
> can still be changed in pg_pltemplate).

The other CREATE OR REPLACE commands don't change the owner, so CREATE
OR REPLACE LANGUAGE shouldn't do that either.

> So do we want to replace any data (in my opinion only the validator is
> left) at all, or just skip any error message?

I think you should be able to change the handler and validator
functions, and the trusted flag. Or is there a reason not to allow
that?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Verified fix for Bug 4137
Simon Riggs wrote:
>     if (restartWALFileName)
>     {
> +       /*
> +        * Don't do cleanup if the restartWALFileName provided
> +        * is later than the xlog file requested. This is an error
> +        * and we must not remove these files from archive.
> +        * This shouldn't happen, but better safe than sorry.
> +        */
> +       if (strcmp(restartWALFileName, nextWALFileName) > 0)
> +           return false;
> +
>         strcpy(exclusiveCleanupFileName, restartWALFileName);
>         return true;
>     }

I committed this sanity check into pg_standby, though it really
shouldn't happen. But it just occurred to me that the most likely
reason for it to happen is that the user has screwed up his
restore_command line, mixing up the %p and %r arguments. Should we make
that an error instead of just not doing the cleanup?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Verified fix for Bug 4137
Tom Lane wrote:
> Heikki Linnakangas writes:
>> Ok, that'll work. Committed, thanks. I changed the sanity check that
>> xlogfname > restore point into an Assert, though, because that's a
>> sign that something's wrong.
>
> Shouldn't that Assert allow the equality case?

Yes. Thanks. Hmm, I can't actually think of a scenario where that would
happen, but it was definitely an oversight on my part.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Verified fix for Bug 4137
Simon Riggs wrote:
> The problem was that at the very start of archive recovery the %r
> parameter in restore_command could be set to a filename later than the
> currently requested filename (%f). This could lead to early truncation
> of the archived WAL files and would cause warm standby replication to
> fail soon afterwards, in certain specific circumstances. Fix applied
> to both core server in generating correct %r filenames and also to
> pg_standby to prevent acceptance of out-of-sequence filenames.

So the core problem is that we use ControlFile->checkPointCopy.redo in
RestoreArchivedFile to determine the safe truncation point, but when
there's a backup label file, that's still coming from the pg_control
file, which is wrong. The patch fixes that by determining the safe
truncation point as Min(checkPointCopy.redo, xlogfname), where
xlogfname is the xlog file being restored.

That depends on the assumption that everything before the first file we
(try to) restore is safe to truncate. IOW, we never try to restore file
B first, and then A, where A < B. I'm not totally convinced that's a
safe assumption.

As an example, consider doing an archive recovery, but without a backup
label, where the latest checkpoint record is broken. We would try to
read the latest (broken) checkpoint record first, and call
RestoreArchivedFile to get the xlog file containing that. But because
that record is broken, we fall back to using the previous checkpoint,
and will need the xlog file where the previous checkpoint record is.

That's a pretty contrived example, but the point is that the assumption
feels fragile. At the very least it should be noted explicitly in the
comments. A less fragile approach would be to use something dummy as
the truncation point until we've successfully read the
checkpoint/restartpoint record and started the replay.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
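The rule the patch implements, as a standalone sketch (relying on the fact
that WAL segment file names are fixed-width zero-padded hex, so strcmp()
orders them chronologically):

#include <string.h>

/*
 * The safe truncation point is the smaller of the checkpoint redo file
 * and the file currently being requested for restore.
 */
static const char *
safe_truncation_point(const char *redo_fname, const char *requested_fname)
{
    return strcmp(redo_fname, requested_fname) < 0
        ? redo_fname
        : requested_fname;
}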
Re: [PATCHES] Verified fix for Bug 4137
Simon Riggs wrote: Falling back to the secondary checkpoint implies we have a corrupted or absent WAL file, so making recovery startup work correctly won't avoid the need to re-run the base backup. We'll end up with an unrecoverable error in either case, so it doesn't seem worth attempting to improve this in the way you suggest. That's true whenever you have to fall back to a secondary checkpoint, but we still try to get the database up. One could argue that we shouldn't, of course. Anyway, the point is that the patch relies on a non-obvious assumption. Even if the secondary checkpoint issue is a non-issue, it's not obvious (to me at least) that there aren't other similar scenarios. And someone might inadvertently break the assumption in a future patch, because it's not an obvious one; calling ReadRecord looks very innocent. We shouldn't introduce an assumption like that when we don't have to. I think we should completely prevent access to secondary checkpoints during archive recovery, because if the primary checkpoint isn't present or is corrupt we aren't ever going to get past it to the pg_stop_backup() point. Hence an archive recovery can never be valid in that case. I'll do a separate patch for that because they are unrelated issues. Well, we already don't use the secondary checkpoint if a backup label file is present. And you can take a base backup without pg_start_backup()/pg_stop_backup() if you shut down the system first (a cold backup). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Verified fix for Bug 4137
Simon Riggs wrote: On Tue, 2008-05-06 at 17:52 +0100, Heikki Linnakangas wrote: I didn't suggest that the alphabetical sorting property is a new assumption; it sure isn't. The new assumption is that you never call ReadRecord() for record 0002 before you call it for record 0001 (before initializing the checkPointCopy field from the checkpoint record, anyway). I've not done anything with the ordering of calls to ReadRecord(). There is no such assumption introduced here. Hmm, I see that I was inaccurate when I said the patch introduces that assumption, sorry about the confusion. It's more like the code is currently completely oblivious of the issue, and the patch makes it better but doesn't fully fix it. The code in CVS is broken, as we now know, because it assumes that we can delete all files before checkPointCopy.redo, which is not true when checkPointCopy.redo has not yet been initialized from the backup label file, and points to a location that's ahead of the real safe truncation point. The patch makes it better, and the assumption is now that it's safe to truncate everything before min(checkPointCopy.redo, xlog file we're reading). That still seems too fragile to me, because we assume that after you've asked for xlog record X, we never need anything earlier than that. In fact, here is what will happen if the checkpoint record's redo pointer points to an earlier xlog file:

1. The location of the checkpoint record is read by read_backup_label(). Let's say that it's 0005.
2. ReadCheckpointRecord() is called for 0005. The restore command is called because that xlog file is not present. The safe truncation point is determined to be 0005, as that's what we're reading. Everything before that is truncated.
3. The redo pointer in the checkpoint record points to 0003. That's where we should start the recovery. Oops :-(

I haven't tested this, so, am I missing something that makes that not happen? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] plpgsql CASE statement - last version
Pavel Stehule wrote: Hello I found some bugs when I used base_lexer, so I went back to my own lexer. It's only a little bit longer, but simpler. Hmm. I don't like having to lex the expressions again. It just doesn't feel right. How about taking a completely different strategy, and implementing the CASE-WHEN construct fully natively in plpgsql, instead of trying to convert it to a single SQL CASE-WHEN expression? It's not a very good match anyway; you have to do tricks to convert the comma-separated lists of WHEN expressions to WHEN-THEN clauses, and you can't use the THEN clauses as is, but have to replace them with offsets into the array. I think implementing the logic in pl_exec.c would lead to cleaner code. FWIW, the current approach gives pretty cryptic CONTEXT information in error messages as well. For example, this pretty simple case-when example:

postgres=# create or replace FUNCTION case_test(int) returns text as $$
begin
  case $1
    when 1 then return 'one';
    when 'invalid' then return 'two';
    when 3,4,5 then return 'three, four or five';
  end case;
end;
$$ language plpgsql immutable;
CREATE FUNCTION

gives this pretty hard-to-understand error message:

postgres=# SELECT case_test(1);
ERROR: invalid input syntax for integer: "invalid"
CONTEXT: SQL statement "SELECT CASE $1 WHEN 1 THEN 1 WHEN 'invalid' THEN 2 WHEN 3 THEN 3 WHEN 4 THEN 3 WHEN 5 THEN 3 END"
PL/pgSQL function "case_test" line 2 at unknown

BTW, what does PL/SQL or PSM say about the type-compatibility of the CASE and the WHENs? We're very lenient in assignments; how should this behave? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] GUC parameter cursors_tuple_fraction
Simon Riggs wrote: * We've said here http://www.postgresql.org/docs/faqs.TODO.html that we Don't want hints. If that's what we really think, then this patch must surely be rejected, because it's a hint... That isn't my view. I *now* think we do need hints of various kinds. cursors_tuple_fraction or OPTIMIZE FOR xxx ROWS isn't the kind of hint we've said no to in the past. We don't want hints that work around planner deficiencies, for example where we get the row count of a node completely wrong. This is different. This is about telling the planner how the application is going to use the result set. It's relevant even assuming that the planner got the estimates spot on. Which plan is best depends on whether the application can start processing the data as it comes in, or whether it loads it all into memory first, for example. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
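As a usage sketch, with the GUC name taken from this patch and an illustrative table and query, the knob would be set per transaction like any other planner GUC:

BEGIN;
SET LOCAL cursors_tuple_fraction = 0.05;  -- we expect to consume only the first rows
DECLARE c CURSOR FOR SELECT * FROM orders ORDER BY created_at DESC;
FETCH 20 FROM c;
COMMIT;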
Re: [PATCHES] Patch to add a feature to pg_standby
[EMAIL PROTECTED] wrote: When using pg_standby to remain in recovery mode on a warm standby system, if there is a need to perform other actions in coordination with recovery actions, the -x auxiliary command option implemented by this patch enables that coordination. I considered adding the ability to override the restoreCommand, however by keeping this separate and optional it is possible to force retries of the auxiliary command until successful and still utilize pg_usleep instead of looping within an external script or command. And the previous behavior of pg_standby remains unchanged (other than debug logging and documenting the option in usage) if the new option is omitted. I added this feature to help with synchronization of a content repository consisting of a PostgreSQL db for meta-information and a separate file store for content. The auxiliary command enables forcing an rsync of the file store that is at least as current as the found WAL segment file's db changes, and prevents recovery of that WAL file unless the rsync can be performed successfully. (See attached file: pg_standby.c.diff) This could be implemented by a pass-through restore_command, that calls pg_standby, and does the custom action when pg_standby returns successfully. Or perhaps you could run the custom action in whatever script is copying the files to the directory. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
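A pass-through wrapper of the kind suggested might look like this shell sketch (the script name, archive path and rsync endpoints are all illustrative, not from the patch); recovery.conf would then use restore_command = 'wal_and_sync.sh %f %p %r':

#!/bin/sh
# $1 = WAL file name (%f), $2 = destination path (%p), $3 = restartpoint file (%r)
pg_standby /mnt/server/archive "$1" "$2" "$3" || exit 1
# WAL file restored; now force the file store to catch up before
# recovery is allowed to apply this segment's changes
exec rsync -a filestore-host:/export/content/ /var/content-repo/

If the rsync fails, the wrapper exits nonzero and recovery retries the same WAL file, so the file store is never behind the database.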
Re: [PATCHES] pg_postmaster_reload_time() patch
Gurjeet Singh wrote: On Wed, Apr 30, 2008 at 9:53 AM, George Gensure [EMAIL PROTECTED] wrote: I've done a quick write up for reload time reporting from the administration TODO. I was a little paranoid with the locking, but didn't want problems to occur with signals on the postmaster and the read side. IMHO, the function should return NULL if the postmaster never reloaded; that is, it was started, but never signaled to reload. I think it's useful to get the server startup time if the configuration has never been reloaded. That's when the configuration was loaded, if no reload has been triggered since. Perhaps the function should be named to reflect that better. pg_configuration_load_time() ? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Partial match in GIN
Alvaro Herrera wrote: Well, LIKE %foo% is supposed to match foo unanchored, but with a btree (or two btrees) you can only get 'foo' anchored to either end of the string (or both). Ah, true. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] EXPLAIN progress info
Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: There are downsides: Insurmountable ones at that. This one already makes it a non-starter: a) the overhead of counting rows and loops is there for every query execution, even if you don't do explain analyze. and you are also engaging in a flight of fantasy about what the client-side code might be able to handle. Particularly if it's buried inside, say, httpd or some huge Java app. Yeah, you could possibly make it work for the case that the problem query was manually executed in psql, but that doesn't cover very much real-world territory. I think there's two different use cases here. The one that Greg's proposal would be good for is a GUI, like pgAdmin. It would be cool to see how a query progresses through the EXPLAIN tree when you run it from the query tool. That would be great for visualizing the executor; a great teaching tool. But I agree it's no good for use by a DBA to monitor a live system running a real-world application. For that we do need something else. You'd be far more likely to get somewhere with a design that involves looking from another session to see if anything's happening. In the case of queries that are making database changes, pgstattuple is certainly a usable option. For SELECT-only queries, I agree it's harder, but it's still possible. I seem to recall some discussion of including a low-overhead progress counter of some kind in the pg_stat_activity state exposed by a backend. The number of rows so far processed by execMain.c in the current query might do for the definition. Yeah, something like this would be better for monitoring a live system. The number of rows processed by execMain.c would only count the number of rows processed by the top node of the tree, right? For a query that for example performs a gigantic sort, that would be 0 until the sort is done, which is not good. It's hard to come up with a single counter that's representative :-(. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
[PATCHES] Indexam API changes
There's a bunch of mails in the patch queue about the indexam API, so we need to discuss that. The first question is: do we want to refactor the bitmap index scan API if we don't have any immediate use for it? That is, given that no one is actively working on the bitmap index patch or the GIT patch. There was also discussion on adding support for candidate matches, mainly for GIT, but GiST could possibly also take advantage of that. If people think it's worth it, I can fix the bit-rot in the patch and work on it, but personally I don't think it is. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Partial match in GIN
Teodor Sigaev wrote: Looking at the patch, you require that the TIDBitmap fits in work_mem in non-lossy format. I don't think that's acceptable, it can easily exceed work_mem if you search for some very common word. Failing to execute a valid query is not good. But this way is better than nothing. In reality, that way was chosen to get a fast merge of (potentially) hundreds of sorted lists of ItemPointers. Other ways are much slower. How about forcing the use of a bitmap index scan, and modifying the indexam API so that GIN could return a lossy bitmap, and let the bitmap heap scan do the rechecking? I don't think the storage size of tsquery matters much, so whatever is the best solution in terms of code readability etc. That was about the tsquerysend/recv format, not storage on disk. We don't require compatibility of the binary format of the db's files, but I have some doubts about binary dump. We generally don't make any promises about cross-version compatibility of binary dumps, though it would be nice not to break it if it's not too much effort. Hmm. match_special_index_operator() already checks that the index's opfamily is pattern_ops, or text_ops with C-locale. Are you reusing the same operator families for wildspeed? Doesn't it then also get confused if you do a WHERE textcol > 'foo' query by hand? No, wildspeed uses the same operator ~~. match_special_index_operator() isn't called at all: in match_clause_to_indexcol(), is_indexable_operator() is called before match_special_index_operator() and returns true. expand_indexqual_opclause() sees that the operation is OID_TEXT_LIKE_OP and calls prefix_quals(), which fails because it expects only certain btree opfamilies. Oh, I see. So this assumption mentioned in the comment there:

/*
 * LIKE and regex operators are not members of any index opfamily,
 * so if we find one in an indexqual list we can assume that it
 * was accepted by match_special_index_operator().
 */

is no longer true with wildspeed. So we do need to check that in expand_indexqual_opclause() then. NOTICE 2: it seems to me that a similar technique could be implemented for the ordinary BTree to eliminate the hack around LIKE support. I wonder what the size and performance of that would be like, in comparison to the proposed GIN solution? GIN speeds up '%foo%' too - which is impossible for btree. But I don't like the hack around LIKE support in BTree. This support bypasses the regular mechanism. You could satisfy '%foo%' using a regular and a reverse B-tree index, and a bitmap AND. Which is interestingly similar to the way you proposed to use a TIDBitmap within GIN. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Partial match in GIN
Alvaro Herrera wrote: Heikki Linnakangas wrote: Teodor Sigaev wrote: GIN speeds up '%foo%' too - which is impossible for btree. But I don't like the hack around LIKE support in BTree. This support bypasses the regular mechanism. You could satisfy '%foo%' using a regular and a reverse B-tree index, and a bitmap AND. Which is interestingly similar to the way you proposed to use a TIDBitmap within GIN. Huh, can you? I can see doing col LIKE 'foo%' OR reverse(col) LIKE reverse('%foo') with two btree indexes, but not a true %foo% ... That should be AND, not OR. Hmm. It is the same as far as I can see. Am I missing something? You couldn't support more complex patterns directly, like 'foo%bar%foo', but you could still split that into 'foo%' AND '%bar%' AND '%foo', and recheck the matches against the original pattern. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
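For the anchored-at-both-ends case, the two-index plan would be built along these lines (a sketch; reverse() is assumed to be available as an immutable SQL-level function, which it is not in core at this point, and the table is illustrative):

CREATE INDEX t_col_idx     ON t (col text_pattern_ops);
CREATE INDEX t_col_rev_idx ON t (reverse(col) text_pattern_ops);

-- 'foo%bar' is anchored at both ends: each index satisfies one anchor,
-- and a bitmap AND combines the two before rechecking the full pattern
SELECT * FROM t
WHERE col LIKE 'foo%'
  AND reverse(col) LIKE reverse('bar') || '%'
  AND col LIKE 'foo%bar';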
Re: [PATCHES] Improve shutdown during online backup
Albe Laurenz wrote: Moreover, if Shutdown == SmartShutdown, new connections won't be accepted, and nobody can connect and call pg_stop_backup(). So even if I'd add a check for (pmState == PM_WAIT_BACKENDS) && !BackupInProgress() somewhere in the ServerLoop(), it wouldn't do much good, because the only way for somebody to cancel online backup mode would be to manually remove the file. Good point. So the only reasonable thing to do on smart shutdown during an online backup is to have the shutdown request fail, right? The only alternative being that a smart shutdown request should interrupt online backup mode. Or we can add another state, PM_WAIT_BACKUP, before PM_WAIT_BACKENDS, that allows new connections, and waits until the backup ends. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
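A sketch of what the extra state could look like in the postmaster's state machine (purely illustrative, not a committed design):

/* hypothetical addition to the postmaster's state handling */
case PM_WAIT_BACKUP:
	/*
	 * New connections are still allowed in this state, so a client
	 * can come in and run pg_stop_backup(); once the backup ends,
	 * proceed with the normal smart-shutdown sequence.
	 */
	if (!BackupInProgress())
		pmState = PM_WAIT_BACKENDS;
	break;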
Re: [PATCHES] Expose checkpoint start/finish times into SQL.
Robert Treat wrote: 1) Alert if checkpointing stops occurring within a reasonable time frame (note there are failure cases and normal use cases where this might occur) (also note I'll agree, this isn't common, but the results are pretty disastrous if it does happen) What are the normal use cases where this would occur? I can't think of any. Wrt. failure cases, there's a million things that can go wrong in a system, and only a few of them will give the symptom of checkpoints no longer happening, so I'm not excited about adding a facility to monitor just that. More hooks for monitoring purposes in general would be nice, and I would like to see them exposed as SQL functions, but I'd like to see a much bigger proposal for that. 2) Can be graphed over time (using rrdtool and others) for trending checkpoint activity Hmm. You'd need the historical data to do that properly. In particular, if two checkpoints happen within one polling interval, you'd miss that. log_checkpoints=on, in CSV output, seems like a better approach for that. 3) Ease monitoring of checkpoints across pitr setups How is that different from monitoring in a non-pitr setup? 4) Help determine if your checkpoints are being timeout driven or segment driven, or if you need to look at those settings Seems like the log_checkpoints output is more useful for that, as it directly tells you what triggered the checkpoint. 5) Determine the number of log files that will need to be replayed in the event of a crash If you have a requirement for that, just set checkpoint_segments accordingly. And again, you can get the information in the logs with log_checkpoints=on already. 6) Helps give an indication on if you should enter a manual checkpoint before issuing a pg_start_backup call If you want the backup to begin immediately, just do a manual checkpoint unconditionally. It'll finish quickly if there hasn't been much activity since the last one. We talked about adding a new variant of pg_start_backup() that does that automatically (or rather, just hurries the current checkpoint) in the 8.3 cycle, but didn't do it in the end. Perhaps we should add that, after all? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
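The pg_start_backup() variant floated at the end might look like this from SQL (the second argument is the proposed "hurry the checkpoint" flag, not something that exists at this point):

-- proposed: request an immediate checkpoint rather than a spread one
SELECT pg_start_backup('nightly', true);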
Re: [PATCHES] Partial match in GIN
Teodor Sigaev wrote: For each matched entry all corresponding ItemPointers are collected in a TIDBitmap structure to effectively merge ItemPointers from different entries. The patch introduces the following changes in the interface: Looking at the patch, you require that the TIDBitmap fits in work_mem in non-lossy format. I don't think that's acceptable, it can easily exceed work_mem if you search for some very common word. Failing to execute a valid query is not good. Here there is an unclean issue: tsquery now has a new flag to label prefix search, and the cstring representation has backward compatibility, but the external binary format doesn't have it yet. Now, an extra byte is used for storage of this flag. On the other hand, there are 4 unused bits in the external binary representation (in the byte that stores the lexeme weights), so it's possible to use one of them to store this flag. What are opinions? I don't think the storage size of tsquery matters much, so whatever is the best solution in terms of code readability etc. NOTICE 1: the current index support for LIKE believes that only BTree can speed up LIKE, and becomes confused by this module with the error 'unexpected opfamily' in prefix_quals(). For this reason, the partial match patch adds a small check before calling expand_indexqual_opclause(). Hmm. match_special_index_operator() already checks that the index's opfamily is pattern_ops, or text_ops with C-locale. Are you reusing the same operator families for wildspeed? Doesn't it then also get confused if you do a WHERE textcol > 'foo' query by hand? NOTICE 2: it seems to me that a similar technique could be implemented for the ordinary BTree to eliminate the hack around LIKE support. Yep, if you created a b-tree index on the reversed key, that could be used for LIKE '%foo' expressions. And if you had that in addition to a regular b-tree index, you could use those two indexes together for any LIKE expression. I wonder what the size and performance of that would be like, in comparison to the proposed GIN solution? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Re: [BUGS] BUG #4070: Join more then ~15 tables let postgreSQL produces wrong data
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: On second thought, expanding AttrNumber to int32, wholesale, might not be a good idea, No, it wouldn't. For one thing it'd be a protocol break --- column numbers are int16 --- I wasn't planning to change that. and for another, we'd have terrible performance problems with such wide rows. Yes, we probably would :-). Though if there's any nasty O(n^2) behavior left in there, we should look at optimizing it anyway to speed up more reasonably sized queries, in the range of a few hundred columns. Actually rows are supposed to be limited to ~1600 columns, anyway, because of HeapTupleHeader limitations. The trick is that that limitation doesn't apply to the intermediate virtual tuples we move around in the executor. Those are just arrays of Datums, and can have more than MaxTupleAttributeNumber attributes, as long as you project away enough attributes, bringing it below that limit, before returning it to the client or materializing it into a HeapTuple or MinimalTuple in the executor. Apparently you've found a path where that restriction isn't enforced correctly, but I haven't seen the referenced message yet ... Enforcing the limit for virtual tuples as well, and checking for the limit in the planner is one option, but it would cripple the ability to join extremely wide tables. For example, if you had 10 tables with 200 columns each, you couldn't join them together even for the purposes of COUNT(*). Granted, that's not a very common thing to do, this is the first time this bug is reported after all, but I'd prefer to keep the capability if possible. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Re: [BUGS] BUG #4070: Join more then ~15 tables let postgreSQL produces wrong data
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: I still haven't seen the actual bug description come by here, and the pgsql-bugs archive hasn't got it either. (apparently some mails on that thread are missing ...) That's what I meant. Heikki is quoting himself from a message that hasn't appeared anywhere public, and he must have had at least one message from the OP that hasn't appeared either. So the rest of us are still mostly in the dark about the problem. Hmm, strange. Looks like my mail client decided to send that mail to pgsql-bugs-owner@ instead of pgsql-bugs@ for some reason. Here's the missing mail: Ceschia, Marcello wrote: In the query query_not_working, all values from column 136_119 have the value of the first column. Using the split query (working_version) it works. I hope this data will help to find the bug. Thanks. Oh, the query actually gives an assertion failure on an assertion-enabled build, so this is clearly a bug:

TRAP: FailedAssertion(!(attnum > 0 && attnum <= list_length(rte->joinaliasvars)), File: "parse_relation.c", Line: 1697)

gdb tells that attnum is -31393 at that point. That's because get_rte_attribute_type() takes an AttrNumber, which is int16, and make_var() is trying to pass 34143, so it overflows (34143 wraps to 34143 - 65536 = -31393 in int16). It seems we should extend AttrNumber to int32; we don't use AttrNumber in any of the on-disk structs. Though you still couldn't have more than MaxHeapAttributeNumber (1600) attributes in a table or MaxTupleAttributeNumber (1664) in a result set or intermediate tuples, like the output of a sort node, at least you could join ridiculously wide tables like that as long as you project out enough columns. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Expose checkpoint start/finish times into SQL.
Theo Schlossnagle wrote: First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Ordered Append WIP patch v1
Gregory Stark wrote: Here's the WIP patch I described on -hackers to implement ordered append nodes. I think it would be better to create a completely separate executor node for this, rather than tack this feature onto the regular Append node. The two don't seem to have much in common. Looking at AppendState, for example, as_firstplan and as_lastplan are not interesting for the MergeAppend, and likewise MergeAppend needs a whole bunch of fields not interesting to regular Append. The fact that the first statement in ExecAppend is if (!node->as_is_ordered), with zero lines of shared code between the two branches, also suggests that it would be a good idea to keep them separate. As you point out in the comments, it's quite confusing to have indexes into the slots array and the heap array. I still can't get my head around that. Would it help to define a struct along the lines of:

struct
{
	TupleTableSlot *slot;
	PlanState  *plan;
};

and store pointers to those in the heap? 1) I still haven't completely figured out what to do with equivalence classes. My hack of just stuffing all the append subrel vars into there seems to work fine. I need to understand what's going on to see if there's really a problem with it or not. I don't understand that well enough to comment... I presume the FIXME in the patch is related to this? 4) I haven't handled mark/restore or random access. I think they could be handled and they'll probably be worth the complexity but I'm not sure. For mark/restore, I suppose you would mark/restore all subnodes, and store/restore the heap. For reversing direction, you would reverse the heap. Doesn't sound too hard, but it's probably better to make it work without them at first before adding any bells and whistles... 5) Is it considering too many paths? Are there some which maybe aren't worth considering? For example, maybe it only makes sense to take best start_cost paths since if that plan doesn't dominate then the best total_cost plan is likely to be the sequential scans + unordered append + sort. I don't understand this paragraph. Are you referring to this comment in the patch:

/* If we can't find any plans with the right order just add the
 * cheapest total plan to both paths, we'll have to sort it
 * anyways. Perhaps consider not generating a startup path for such
 * pathkeys since it'll be unlikely to be the cheapest startup
 * cost. */

Having to sort one or two of the sub-plans might not be expensive at all. For example, having to sort the empty parent relation of a partitioned table is essentially free. It's probably never cheaper to use the Merge Append if you need to sort *all* of the children, but if you can avoid sorting even one of them, it might very well be worthwhile. 6) I haven't looked at setops yet but this is particularly attractive for the UNION (as opposed to UNION ALL) case. Yes. And DISTINCT. 7) I copied/adapted a bunch of bits from tuplesort to maintain the heap and do the scankey comparisons. I could refactor that code back into tuplesort but it would mean twisting tuplesort around quite a bit. Essentially it would mean introducing a new type of tuplesort which would start off in FINAL_MERGE state, only it would have to behave differently, since we don't want it prefetching lots of records like FINAL_MERGE does, I don't think. Doesn't seem worthwhile. The heap management part of the patch is quite short.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
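A sketch of the struct suggested above and how the heap would use it (the names are illustrative, not the patch's):

typedef struct MergeAppendEntry
{
	TupleTableSlot *slot;	/* current tuple pulled from this subplan */
	PlanState  *plan;		/* subplan to advance when this entry is popped */
} MergeAppendEntry;

/*
 * The binary heap holds MergeAppendEntry pointers ordered by the merge
 * sort keys.  Each call pops the top entry, returns its slot, pulls the
 * next tuple from the same subplan, and re-sifts the entry into place.
 * This removes the need for parallel indexes into separate slot and
 * heap arrays.
 */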
Re: [PATCHES] create language ... if not exists
Andreas 'ads' Scherbaum wrote: The attached patch for HEAD extends the CREATE LANGUAGE statement by an IF NOT EXISTS option which in effect changes the raised error into a notice. Before i continue working on this patch i would like to know if this extension has a chance to go into PG and what other changes i should apply (beside the missing documentation). The way we've solved this problem for other CREATE commands is to add an OR REPLACE option, instead of IF NOT EXISTS. We should do the same here. Regarding the patch itself: you define the rule opt_if_not_exists, but never use it. And you add a new rule for CREATE LANGUAGE ... HANDLER ..., but left IF_P NOT EXISTS off the end of that. Looks like you couldn't decide which approach to take, and ended up doing a little bit of both ;-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] create language ... if not exists
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: The way we've solved this problem for other CREATE commands is to add OR REPLACE option, instead of IF NOT EXISTS. We should do the same here. If we're willing to consider a solution that is specific to CREATE LANGUAGE (as opposed to implementing IF NOT EXISTS across-the-board, which might happen someday) what I'd suggest is just incorporating the behavior directly into the abbreviated (no parameters) form of CREATE LANGUAGE. If the language already exists and has the same properties specified in pg_pltemplate, don't raise an error. Give a notice maybe. Why not implement OR REPLACE like for other things? Still seems the most consistent behavior to me. You might want to get the error if the language already exists, which your proposal wouldn't allow. And it wouldn't help with languages without a pg_pltemplate entry. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [BUGS] Incomplete docs for restore_command for hotstandby
Simon Riggs wrote: On Mon, 2008-02-25 at 17:56 +0600, Markus Bertheau wrote: The FIXME of course needs replacement by someone in the know. Doc patch edited to include all of Markus' points, tidy up some related text and fix typos. Good to apply to HEAD. Committed to HEAD with minor fixes. What's our policy wrt. back-patching doc changes? This seems applicable to older versions as well, but do we do that? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Consistent \d commands in psql
Greg Sabino Mullane wrote: Attached is an updated version of my psql patch that makes the \d backslash commands perform in an intuitive, consistent way. Specifically, the following objects will be treated as first-class citizens (as tables and indexes currently are) by showing all the non-system objects by default and requiring an S to see the system ones: aggregates, conversions, comments, domains, operators, functions, types. Currently, there is no way to view all the non-system functions in a database using backslash commands, as you can with \dt, unless all of the functions happen to be in a single schema (\df myschema.). With this patch, it would be as simple as \df, and the current behavior would be done with \dfS. Yes, that seems like a good idea. \df in particular has been too noisy to be usable. Not sure about conversions and domains; I doubt anyone creates custom conversions in practice, and there are no system domains in a standard installation. Does anyone want to argue that there's a backward-compatibility problem with changing \df? I don't think there is; you shouldn't be using psql backslash commands in an application. This patch also adds a few new things to the tab-completion table, such as comments and conversions. There's a bunch of merge conflicts in the diff. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] \password in psql help
Magnus Hagander wrote:
+ fprintf(output, _(" \\password [USERNAME]\n"
+                   "                 securely change the password for a user\n"));
I would leave out the word "securely". Unless you want to provide another command for changing it insecurely ;-). What does it mean, anyway? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Elsewhere in our codebase where we use arrays that are enlarged as needed, we keep track of the allocated size and the used size of the array separately, and only call repalloc when the array fills up, and repalloc a larger than necessary array when it does. I chose to just call repalloc every time instead, as repalloc is smart enough to fall out quickly if the chunk the allocation was made in is already larger than the new size. There might be some gain avoiding the repeated repalloc calls, but I doubt it's worth the code complexity, and calling repalloc with a larger than necessary size can actually force it to unnecessarily allocate a new, larger chunk instead of reusing the old one. Thoughts on that? Seems like a pretty bad idea to me, as the behavior you're counting on only applies to chunks up to 8K or thereabouts. Oh, you're right. Though I'm sure libc realloc has all kinds of smarts as well, it does seem better to not rely too much on that. In a situation where you are subcommitting lots of XIDs one at a time, this is likely to have quite awful behavior (or at least, you're at the mercy of the local malloc library as to how bad it is). I'd go with the same double-it-each-time-needed approach we use elsewhere. Yep, patch attached. I also changed xactGetCommittedChildren to return the original array instead of copying it, as Alvaro suggested. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/access/transam/twophase.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v retrieving revision 1.39 diff -c -r1.39 twophase.c *** src/backend/access/transam/twophase.c 1 Jan 2008 19:45:48 - 1.39 --- src/backend/access/transam/twophase.c 12 Mar 2008 12:45:13 - *** *** 827,833 save_state_data(children, hdr.nsubxacts * sizeof(TransactionId)); /* While we have the child-xact data, stuff it in the gxact too */ GXactLoadSubxactData(gxact, hdr.nsubxacts, children); - pfree(children); } if (hdr.ncommitrels 0) { --- 827,832 Index: src/backend/access/transam/xact.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xact.c,v retrieving revision 1.258 diff -c -r1.258 xact.c *** src/backend/access/transam/xact.c 4 Mar 2008 19:54:06 - 1.258 --- src/backend/access/transam/xact.c 12 Mar 2008 13:41:10 - *** *** 130,136 int gucNestLevel; /* GUC context nesting depth */ MemoryContext curTransactionContext; /* my xact-lifetime context */ ResourceOwner curTransactionOwner; /* my query resources */ ! List *childXids; /* subcommitted child XIDs */ Oid prevUser; /* previous CurrentUserId setting */ bool prevSecDefCxt; /* previous SecurityDefinerContext setting */ bool prevXactReadOnly; /* entry-time xact r/o state */ --- 130,138 int gucNestLevel; /* GUC context nesting depth */ MemoryContext curTransactionContext; /* my xact-lifetime context */ ResourceOwner curTransactionOwner; /* my query resources */ ! TransactionId *childXids; /* subcommitted child XIDs, in XID order */ ! int nChildXids; /* # of subcommitted child XIDs */ ! int maxChildXids; /* allocated size of childXids */ Oid prevUser; /* previous CurrentUserId setting */ bool prevSecDefCxt; /* previous SecurityDefinerContext setting */ bool prevXactReadOnly; /* entry-time xact r/o state */ *** *** 156,162 0, /* GUC context nesting depth */ NULL, /* cur transaction context */ NULL, /* cur transaction resource owner */ ! 
NIL, /* subcommitted child Xids */ InvalidOid, /* previous CurrentUserId setting */ false, /* previous SecurityDefinerContext setting */ false, /* entry-time xact r/o state */ --- 158,166 0, /* GUC context nesting depth */ NULL, /* cur transaction context */ NULL, /* cur transaction resource owner */ ! NULL, /* subcommitted child Xids */ ! 0, /* # of subcommitted child Xids */ ! 0, /* allocated size of childXids */ InvalidOid, /* previous CurrentUserId setting */ false, /* previous SecurityDefinerContext setting */ false, /* entry-time xact r/o state */ *** *** 559,565 */ for (s = CurrentTransactionState; s != NULL; s = s-parent) { ! ListCell *cell; if (s-state == TRANS_ABORT) continue; --- 563,569 */ for (s = CurrentTransactionState; s != NULL; s = s-parent) { ! int low, high; if (s-state == TRANS_ABORT) continue; *** *** 567,576 continue; /* it can't have any child XIDs either */ if (TransactionIdEquals(xid, s-transactionId)) return true; ! foreach
Re: [PATCHES] [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Pavan Deolasee wrote: Wait. Subtransaction ids are local to a transaction and always start from 1. See this:

/*
 * reinitialize within-transaction counters
 */
s->subTransactionId = TopSubTransactionId;
currentSubTransactionId = TopSubTransactionId;

No, we're not talking about SubTransactionIds. We're talking about real XIDs assigned to subtransactions. Subtransactions are assigned regular XIDs as well, just like top-level transactions. I can see where you were coming from with your suggestion now :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Pavan Deolasee wrote: On Wed, Mar 12, 2008 at 9:27 PM, Tom Lane [EMAIL PROTECTED] wrote: I didn't like it; it seemed overly complicated (consider dealing with XID wraparound), We are talking about subtransactions here. I don't think we support subtransaction wrap-around, do we? Imagine that you start a transaction just before transaction wrap-around, so that the top level XID is 2^31-10. Then you start 20 subtransactions. What XIDs will they get? Now how would you map those to a bitmap? It's certainly possible, you could index the bitmap by the offset from the top transaction XID, for example. But it does get a bit complicated. and it would have problems with a slow transaction generating a sparse set of subtransaction XIDs. I agree that's the worst case. But is that common? That's what I was thinking when I proposed the alternate solution. I thought that can happen only if most of the subtransactions abort, which again I thought is not a normal case. But frankly I am not sure if my assumption is correct. It's not that common to have hundreds of thousands of subtransactions to begin with. I think getting rid of the linear search will be enough to fix the performance problem. I wonder if a skewed binary search would help more? For example, if we know that the range of xids stored in the array is 1 to 1000 and we are searching for a number closer to 1000, we can break the array into large/small parts instead of equal parts and then search. Possibly, but I doubt it's worth the trouble. The simple binary search solved the performance problem well enough. In the test case of the OP, with 30 subtransactions, with the patch, there was no longer any measurable difference whether you ran the SELECT COUNT(*) in the same transaction as the INSERTs or after a COMMIT. Well, maybe I am making simple things complicated ;-) I think so :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
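The binary search that replaced the linear scan is essentially this (a minimal sketch; plain comparisons are safe because a single top-level transaction's child XIDs are assigned in increasing order and cannot wrap around mid-transaction):

/* look up xid among the sorted, subcommitted children of state s */
int	low = 0;
int	high = s->nChildXids - 1;

while (low <= high)
{
	int				middle = low + (high - low) / 2;
	TransactionId	probe = s->childXids[middle];

	if (probe == xid)
		return true;
	else if (probe < xid)
		low = middle + 1;
	else
		high = middle - 1;
}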
Re: [PATCHES] [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
(moved to pgsql-patches, as there's a patch attached) Tom Lane wrote: Getting rid of the linked-list representation would be a win in a couple of ways --- we'd not need the bogus list of XIDs support in pg_list.h, and xactGetCommittedChildren would go away. OTOH AtSubCommit_childXids would get more expensive. I couldn't let this case go, so I wrote a patch. I replaced the linked list with an array that's enlarged at AtSubCommit_childXids when necessary. I couldn't measure any performance hit from the extra work of enlarging and memcpying the array. I tried two test cases: 1. Insert one row per subtransaction, and commit the subtransaction after each insert. This is like the OP's case.

printf("CREATE TABLE foo (id int4);\n");
printf("BEGIN;\n");
for (i = 1; i <= 10; i++)
{
	printf("SAVEPOINT sp%d;\n", i);
	printf("INSERT INTO foo VALUES (1);\n");
	printf("RELEASE SAVEPOINT sp%d;\n", i);
}
printf("COMMIT;\n");

2. Insert one row per subtransaction, but leave the subtransaction in not-committed state

printf("CREATE TABLE foo (id int4, t text);\n");
printf("BEGIN;\n");
for (i = 1; i <= 10; i++)
{
	printf("SAVEPOINT sp%d;\n", i);
	printf("INSERT INTO foo VALUES (1, 'f');\n");
}
printf("COMMIT;\n");

Test case 1 is not bad, because we just keep appending to the parent array one at a time. Test case 2 might become slower as the number of subtransactions increases, as at the commit of each subtransaction you need to enlarge the parent array and copy all the already-committed childxids to it. However, you hit the limit on the amount of shared mem required for holding the per-xid locks before that becomes a problem, and the performance becomes dominated by other things anyway (notably LockReassignCurrentOwner). I initially thought that using a single palloc'd array to hold all the XIDs would introduce a new limit on the number of committed subtransactions, thanks to MaxAllocSize, but that's not the case. Without the patch, we actually allocate an array like that anyway in xactGetCommittedChildren. Elsewhere in our codebase where we use arrays that are enlarged as needed, we keep track of the allocated size and the used size of the array separately, and only call repalloc when the array fills up, and repalloc a larger than necessary array when it does. I chose to just call repalloc every time instead, as repalloc is smart enough to fall out quickly if the chunk the allocation was made in is already larger than the new size. There might be some gain in avoiding the repeated repalloc calls, but I doubt it's worth the code complexity, and calling repalloc with a larger than necessary size can actually force it to unnecessarily allocate a new, larger chunk instead of reusing the old one. Thoughts on that? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/access/transam/xact.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xact.c,v retrieving revision 1.258 diff -c -r1.258 xact.c *** src/backend/access/transam/xact.c 4 Mar 2008 19:54:06 - 1.258 --- src/backend/access/transam/xact.c 11 Mar 2008 12:31:28 - *** *** 130,136 int gucNestLevel; /* GUC context nesting depth */ MemoryContext curTransactionContext; /* my xact-lifetime context */ ResourceOwner curTransactionOwner; /* my query resources */ !
List *childXids; /* subcommitted child XIDs */ Oid prevUser; /* previous CurrentUserId setting */ bool prevSecDefCxt; /* previous SecurityDefinerContext setting */ bool prevXactReadOnly; /* entry-time xact r/o state */ --- 130,137 int gucNestLevel; /* GUC context nesting depth */ MemoryContext curTransactionContext; /* my xact-lifetime context */ ResourceOwner curTransactionOwner; /* my query resources */ ! TransactionId *childXids; /* subcommitted child XIDs, in XID order */ ! int nChildXids; /* # of subcommitted child XIDs */ Oid prevUser; /* previous CurrentUserId setting */ bool prevSecDefCxt; /* previous SecurityDefinerContext setting */ bool prevXactReadOnly; /* entry-time xact r/o state */ *** *** 156,162 0, /* GUC context nesting depth */ NULL, /* cur transaction context */ NULL, /* cur transaction resource owner */ ! NIL, /* subcommitted child Xids */ InvalidOid, /* previous CurrentUserId setting */ false, /* previous SecurityDefinerContext setting */ false, /* entry-time xact r/o state */ --- 157,164 0, /* GUC context nesting depth */ NULL, /* cur transaction context */ NULL, /* cur transaction resource owner */ ! NULL, /* subcommitted child Xids */ ! 0, /* # of subcommitted child Xids */ InvalidOid
Re: [PATCHES] Bulk Insert tuning
Gavin Sherry wrote: On Tue, Feb 26, 2008 at 02:43:51PM +, Simon Riggs wrote: Following patch implements a simple mechanism to keep a buffer pinned while we are bulk loading. CK Tan and I worked on something similar but the problem we discovered was self locking. Consider a primary key: we insert a tuple into a buffer and do not release the exclusive lock. The btree code fetches the buffer and tries to share lock it, but we've already exclusive locked it. Oops. The performance improvement, though, makes it worth seeing if there is a solution. That's not a problem if you only keep the buffer pinned, not locked. In my experience, pinning is the more expensive part, with the lookup into the buffer lookup table. Locking isn't free either, of course, but just avoiding the pin+unpin would help a lot. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
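A sketch of the pin-only idea using the ordinary buffer manager calls (the helper and its caller are illustrative, not the patch's code; it must be called while the current page is still pinned by the insertion in progress):

static Buffer bulkInsertPin = InvalidBuffer;	/* survives across inserts */

static void
keep_bulk_insert_pin(Buffer cur)
{
	if (cur == bulkInsertPin)
		return;					/* same target page as last time */
	if (BufferIsValid(bulkInsertPin))
		ReleaseBuffer(bulkInsertPin);	/* drop the old extra pin */
	IncrBufferRefCount(cur);	/* hold an extra pin, so the next insert
								 * skips the buffer mapping table lookup */
	bulkInsertPin = cur;
}

Because only the pin is held, not a lock, a btree descent into the same page from the primary-key index can still take its own share lock, avoiding the self-deadlock described above.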
Re: [PATCHES] TransactionIdIsInProgress() cache
Alvaro Herrera wrote: I didn't check whether your transformation is correct, but if so then it can be changed like this and save the extra XidDidCommit call:

xvac_committed = TransactionIdDidCommit(xvac);
if (xvac_committed)
{
	/* committed */
}
else if (!TransactionIdIsInProgress(xvac))
{
	if (xvac_committed)
		/* committed */
	else
		/* aborted */
}
else
{
	/* in-progress */
}

Nope, that's not good. Per comments in tqual.c, you have to call TransactionIdIsInProgress *before* TransactionIdDidCommit, to avoid this race condition:

1. Xact A inserts a record
2. Xact B does TransactionIdDidCommit, which returns false because it's still in progress
3. Xact A commits
4. Xact B does TransactionIdIsInProgress to see if A is still in progress. It returns false. We conclude that A aborted, while it actually committed.

My proposal was basically to add an extra TransactionIdDidCommit call before the TransactionIdIsInProgress call, in the hope that it returns true and we can skip the rest. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
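Spelled out, the proposed ordering looks like this (a sketch of the idea, not committed code); the leading DidCommit probe is safe because a commit is irrevocable, so a true result can be trusted without consulting the proc array:

if (TransactionIdDidCommit(xvac))
{
	/* committed: the hoped-for common case, nothing else to check */
}
else if (TransactionIdIsInProgress(xvac))
{
	/* still in progress */
}
else if (TransactionIdDidCommit(xvac))
{
	/* committed in the window between the two calls above */
}
else
{
	/* aborted (or crashed) */
}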
Re: [PATCHES] [PERFORM] Very slow (2 tuples/second) sequentialscan after bulk insert; speed returns to ~500 tuples/second aftercommit
Alvaro Herrera wrote: Heikki Linnakangas wrote: I couldn't let this case go, so I wrote a patch. I replaced the linked list with an array that's enlarged at AtSubCommit_childXids when necessary. Do you still need to palloc the return value from xactGetCommittedChildren? Perhaps you can save the palloc/memcpy/pfree and just return the pointer to the array already in memory? Yeah, good point. The callers just need to be modified not to pfree it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] CopyReadLineText optimization
Andrew Dunstan wrote: Another question that occurred to me - did you try using strpbrk() to look for the next interesting character rather than your homegrown searcher gadget? If so, how did that perform? It looks like strpbrk() performs poorly:

unpatched:
 testname |  min duration
----------+-----------------
 all      | 00:00:08.099656
 1/2      | 00:00:06.734241
 1/4      | 00:00:06.016946
 1/8      | 00:00:05.622122
 1/16     | 00:00:05.304252
 none     | 00:00:05.155755
(6 rows)

strpbrk:
 testname |  min duration
----------+-----------------
 all      | 00:00:22.980997
 1/2      | 00:00:13.724834
 1/4      | 00:00:08.980246
 1/8      | 00:00:06.582357
 1/16     | 00:00:05.291485
 none     | 00:00:06.239468
(6 rows)

memchr:
 testname |  min duration
----------+-----------------
 all      | 00:00:13.684479
 1/2      | 00:00:09.509213
 1/4      | 00:00:06.921959
 1/8      | 00:00:05.654761
 1/16     | 00:00:04.719027
 none     | 00:00:03.718361
(6 rows)

Attached is the test script and patches I used, if someone wants to test this on another platform. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/commands/copy.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/copy.c,v retrieving revision 1.295 diff -c -r1.295 copy.c *** src/backend/commands/copy.c 1 Jan 2008 19:45:48 - 1.295 --- src/backend/commands/copy.c 5 Mar 2008 14:44:58 - *** *** 67,72 --- 67,275 EOL_CRNL } EolType; + + /* Fast byte searcher functions * + * + * CopyReadLine needs to search for newlines, as well as quoting and escape + * characters to determine which newlines are real and which ones are escaped. + * Doing that in the naive way, looping through the input byte by byte and + * comparing against the interesting characters, can be slow. + * + * To speed that up, we provide a searcher interface, that can be used to + * search a piece of memory for up to 4 different bytes simultaneously. It's + * similar to memchr, but allows searching for multiple chars at the same + * time. + * + * To start a search on a new block of memory, call init_searcher. Then call + * next_searcher repeatedly to return all the occurrances of the bytes being + * searched for. + * + * The searcher implementation uses memchr to search for the bytes, and keeps + * track of where the next occurrance of each is. Using memchr allows us + * to take advantage of any platform-specific optimizations. + */ + + /* + * Struct used to store state of the current search between calls to + * next_searcher. Initialized by init_search. + */ + typedef struct { + /* Each element in c corresponds the element in s. These are sorted + * by the pointers (ptr) */ + int n; /* elements used in the arrays */ + char c[4]; /* bytes we're looking for */ + char *ptr[4]; /* pointers to next occurances of the bytes */ + char *end; /* end of block being searched.
Last byte to search is end-1 */ + } searcher; + + /* Functions internal to searcher */ + + #define swap_searcher_entries(searcher, a, b) do { \ + char tmpc = (searcher)-c[a]; \ + char *tmpptr = (searcher)-ptr[a]; \ + (searcher)-c[a] = (searcher)-c[b]; \ + (searcher)-ptr[a] = (searcher)-ptr[b]; \ + (searcher)-c[b] = tmpc; \ + (searcher)-ptr[b] = tmpptr; \ + } while(0) + + static void + sort_searcher(searcher *s) + { + switch(s-n) + { + case 0: + case 1: + /* Nothing to do */ + break; + case 2: + if (s-ptr[0] s-ptr[1]) + swap_searcher_entries(s, 0, 1); + break; + case 3: + if (s-ptr[0] s-ptr[2]) + swap_searcher_entries(s, 0, 2); + if (s-ptr[0] s-ptr[1]) + swap_searcher_entries(s, 0, 1); + if (s-ptr[1] s-ptr[2]) + swap_searcher_entries(s, 1, 2); + break; + case 4: + if (s-ptr[0] s-ptr[1]) + swap_searcher_entries(s, 0, 1); + if (s-ptr[2] s-ptr[3]) + swap_searcher_entries(s, 2, 3); + if (s-ptr[1] s-ptr[2]) + swap_searcher_entries(s, 2, 3); + if (s-ptr[0] s-ptr[1]) + swap_searcher_entries(s, 0, 1); + if (s-ptr[2] s-ptr[3]) + swap_searcher_entries(s, 2, 3); + break; + } + } + + /* Remove the topmost element */ + static void + remove_top(searcher *searcher) + { + int i; + + searcher-n--; + + /* Move up the remaining items */ + for (i = 0; i searcher-n; i++) + { + searcher-c[i] = searcher-c[i + 1]; + searcher-ptr[i] = searcher-ptr[i + 1]; + } + } + + /* + * The first element in the list has been replaced by the caller. + * Move it to the right place in the list, to keep it sorted. + */ + static void + sift_down(searcher *searcher) + { + if (searcher-n 2 || searcher-ptr[0] = searcher-ptr[1]) + return; + swap_searcher_entries(searcher, 0, 1); + + if (searcher-n 3 || searcher-ptr[1] = searcher-ptr[2]) + return; + swap_searcher_entries(searcher, 1, 2); + + if (searcher-n 4 || searcher-ptr[2
Re: [PATCHES] CopyReadLineText optimization
Andrew Dunstan wrote: Heikki Linnakangas wrote: Andrew Dunstan wrote: I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. I'll try to come up with something. At the most conservative end, we could fall back to the current method on the first escape, quote or backslash character. Also, could we perhaps benefit from inlining some calls, or is your compiler doing that anyway? gcc does inline all static functions that are only called from one site, and small functions, using some heuristic. I don't think more aggressive inlining would help. Another question that occurred to me - did you try using strpbrk() to look for the next interesting character rather than your homegrown searcher gadget? If so, how did that perform? I haven't tried that. There's a small difference: strpbrk stops at '\0'. But come to think of it, I guess it doesn't matter. Will test... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] CopyReadAttributesCSV optimization
Andrew Dunstan wrote: Heikki Linnakangas wrote: Here's a patch to speed up CopyReadAttributesCSV. On the test case I've been playing with, loading the TPC-H partsupp table, about 20% of the time goes into CopyReadAttributesCSV (inlined into DoCopy; DoCopy itself is insignificant): [snip] The trick is to split the loop in CopyReadAttributesCSV into two parts, inside quotes and outside quotes, saving some instructions in both parts. Your mileage may vary, but I'm quite happy with this. I haven't tested it much yet, but I wouldn't expect it to be a loss in any interesting scenario. The code also doesn't look much worse after the patch, perhaps even better. This looks sane enough, and worked for me in testing, so I'm going to apply it shortly. I'll probably add a comment or two about how the loops interact. Thanks. FWIW, I did some more performance testing, with input consisting of a lot of quotes, and it seems the performance gain holds even then. I was not able to find an input where the new version performs worse. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
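Schematically, the split loop has this shape (an illustrative pseudostructure of the idea, with buffer handling and end-of-line checks omitted):

for (;;)
{
	/* Outside quotes: only the quote char and the delimiter matter */
	for (;;)
	{
		c = *cur_ptr++;
		if (c == quotec)
			break;				/* fall through to the in-quote loop */
		if (c == delimc)
			goto endfield;
		/* ordinary character: just append it */
	}

	/* Inside quotes: only the quote and escape chars matter */
	for (;;)
	{
		c = *cur_ptr++;
		if (c == escapec)
			c = *cur_ptr++;		/* next character is taken literally */
		else if (c == quotec)
			break;				/* back to the out-of-quote loop */
		/* ordinary character: just append it */
	}
}
endfield:

Each inner loop tests fewer conditions per byte than a single loop carrying an in_quote flag, which is where the saving comes from.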
Re: [PATCHES] CopyReadLineText optimization
Andrew Dunstan wrote: Heikki Linnakangas wrote: Another update attached: It occurred to me that the memchr approach is only safe for server encodings, where the non-first bytes of a multi-byte character always have the hi-bit set. We currently make the following assumption in the code: * These four characters, and the CSV escape and quote characters, are * assumed the same in frontend and backend encodings. * The four characters are the carriage return, line feed, backslash and dot. I think the requirement might well actually be somewhat stronger than that: i.e. that none of these will appear as a non-first byte in any multi-byte client encoding character. If that's right, then we should be able to write CopyReadLineText without bothering about multi-byte chars. If it's not right then I suspect we have some cases that can fail now anyway. No, we don't require that, and we do handle it correctly. We use pg_encoding_mblen to determine the length of each character in CopyReadLineText when the encoding is a client-only encoding, and only look at the first byte of each character. In CopyReadAttributesText, where we have a similar loop, we've already transformed the input to server encoding. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-patches
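For illustration, a rough sketch -- not the actual CopyReadLineText code -- of the per-character loop described above for client-only encodings. pg_encoding_mblen is PostgreSQL's real function; it is passed in as a pointer here, and the wrapper and counting are invented, so the snippet stands alone:

/*
 * Count special bytes in a buffer of multi-byte characters, looking only
 * at the first byte of each character and skipping the rest wholesale.
 */
static int
count_specials(const char *buf, const char *buf_end, int client_encoding,
               int (*mblen) (int encoding, const char *mbstr))
{
    const char *p = buf;
    int         n = 0;

    while (p < buf_end)
    {
        if (*p == '\n' || *p == '\r' || *p == '\\')
            n++;
        p += mblen(client_encoding, p);   /* advance a whole character */
    }
    return n;
}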
Re: [PATCHES] CopyReadLineText optimization
Andrew Dunstan wrote: I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. I'll try to come up with something. At the most conservative end, we could fall back to the current method on the first escape, quote or backslash character. Also, could we perhaps benefit from inlining some calls, or is your compiler doing that anyway? gcc does inline all static functions that are only called from one site, and small functions, using some heuristic. I don't think more aggressive inlining would help. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-patches
Re: [PATCHES] CopyReadLineText optimization
Andrew Dunstan wrote: Heikki Linnakangas wrote: Andrew Dunstan wrote: I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. I'll try to come up with something. At the most conservative end, we could fall back to the current method on the first escape, quote or backslash character. That's far too conservative, I think. Somewhere a bit short of your observed breakeven point seems about right. The problem is, you don't know how many stop characters there are until you've counted them. We could fall back after X such characters, or only start using memchr after seeing 8 consecutive non-stop characters. Whatever we choose, the heuristic needs to be very simple and fast to check, otherwise we just introduce more overhead trying to decide which method to use. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-patches
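A hand-wavy sketch of the kind of heuristic being discussed -- state is just two counters, so it stays cheap to check. The constants are the numbers floated above, not measured break-even points:

#include <stdbool.h>

/*
 * Decide between memchr and the byte-at-a-time loop: dense stop
 * characters favour the plain loop, a long run of plain bytes favours
 * memchr. Both thresholds are illustrative guesses.
 */
static bool
prefer_memchr(int consecutive_plain_bytes, int stop_chars_seen)
{
    if (stop_chars_seen > 16)            /* dense escapes: give up on memchr */
        return false;
    return consecutive_plain_bytes >= 8; /* long plain run: memchr likely wins */
}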
Re: [PATCHES] WIP: guc enums
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Magnus Hagander wrote: On my platform (linux x86) it works fine when I just cast this to (int *), but I'm unsure if that's going to be safe on other platforms. I had some indication that it's probably not? No, I don't think that's safe. Some googling (*) suggests that the compiler is free to choose any integer type for an enum. Yeah, it's absolutely not safe. What I'd suggest is declaring the actual variable as int. You can still use an enum typedef to declare the values, and just avert your eyes when you have to cast the enum to int or vice versa. (This is legal per C spec, so you won't introduce any portability issues when you do it.) That's pretty much the same as an int variable and #defined constants. You lose compiler checks, like warnings about assigning from one enum type to another, and the "enumeration value 'FOOBAR' not handled in switch" warning. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-patches
Re: [PATCHES] WIP: guc enums
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: What I'd suggest is declaring the actual variable as int. You can still use an enum typedef to declare the values, and just avert your eyes when you have to cast the enum to int or vice versa. (This is legal per C spec, so you won't introduce any portability issues when you do it.) That's pretty much the same as an int variable and #defined constants. You lose compiler checks, like warnings about assigning from one enum type to another, and the "enumeration value 'FOOBAR' not handled in switch" warning. Well, you can at least get the latter if you cast explicitly: switch ((MyEnum) myvariable) ... We do this in several places already where the underlying variable isn't declared as the enum for one reason or another. Also, local variables can be declared as the enum type to get a little more safety. True, I guess that's not too bad then. Just have to remember to do that. Regarding the places where we already do that, I could find just three:

src/backend/utils/adt/lockfuncs.c: switch ((LockTagType) lock->tag.locktag_type)
src/backend/storage/lmgr/lmgr.c: switch ((LockTagType) tag->locktag_type)
src/backend/regex/regc_locale.c: switch ((enum classes) index)

The first and second are really the same, and it seems legitimate. At a quick glance, I couldn't figure out why index is an int variable, and not an enum, but that code comes from tcl. In any case, the alternative being suggested of keeping the variables as strings throws away *every* possible code-level advantage of having an enum variable classification. Oh no, I didn't suggest keeping the variables as strings, that's madness. I suggested keeping the variables as enums, and defining setter functions for them, similar to the assign hooks we have now, but the setter function wouldn't have to do anything else than assign an int to the enum variable. The setter function would be just a replacement for *((int *)variable) = X. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-patches
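To make the pattern concrete, a small standalone illustration -- not PostgreSQL source, all names invented -- of an int-typed variable whose values come from an enum, with the explicit cast restoring the switch exhaustiveness check:

typedef enum
{
    LOGLEVEL_DEBUG,
    LOGLEVEL_INFO,
    LOGLEVEL_ERROR
} LogLevel;

static int  log_level = LOGLEVEL_INFO;  /* the GUC machinery sees an int */

static const char *
log_level_name(void)
{
    switch ((LogLevel) log_level)       /* cast re-enables the warning for
                                         * unhandled enumeration values */
    {
        case LOGLEVEL_DEBUG:
            return "debug";
        case LOGLEVEL_INFO:
            return "info";
        case LOGLEVEL_ERROR:
            return "error";
    }
    return "?";                         /* unreachable if log_level is valid */
}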
Re: [PATCHES] CopyReadLineText optimization
Heikki Linnakangas wrote: Heikki Linnakangas wrote: Attached is a patch that modifies CopyReadLineText so that it uses memchr to speed up the scan. The nice thing about memchr is that we can take advantage of any clever optimizations that might be in libc or compiler. Here's an updated version of the patch. The principle is the same, but the same optimization is now used for CSV input as well, and there are more comments. Another update attached: It occurred to me that the memchr approach is only safe for server encodings, where the non-first bytes of a multi-byte character always have the hi-bit set. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

Index: src/backend/commands/copy.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.295
diff -c -r1.295 copy.c
*** src/backend/commands/copy.c	1 Jan 2008 19:45:48 -0000	1.295
--- src/backend/commands/copy.c	5 Mar 2008 14:44:58 -0000
***************
*** 67,72 ****
--- 67,275 ----
      EOL_CRNL
  } EolType;
+ 
+ /* Fast byte searcher functions
+  *
+  * CopyReadLine needs to search for newlines, as well as quoting and escape
+  * characters to determine which newlines are real and which ones are escaped.
+  * Doing that in the naive way, looping through the input byte by byte and
+  * comparing against the interesting characters, can be slow.
+  *
+  * To speed that up, we provide a searcher interface, that can be used to
+  * search a piece of memory for up to 4 different bytes simultaneously. It's
+  * similar to memchr, but allows searching for multiple chars at the same
+  * time.
+  *
+  * To start a search on a new block of memory, call init_searcher. Then call
+  * next_searcher repeatedly to return all the occurrences of the bytes being
+  * searched for.
+  *
+  * The searcher implementation uses memchr to search for the bytes, and keeps
+  * track of where the next occurrence of each is. Using memchr allows us
+  * to take advantage of any platform-specific optimizations.
+  */
+ 
+ /*
+  * Struct used to store state of the current search between calls to
+  * next_searcher. Initialized by init_searcher.
+  */
+ typedef struct {
+     /* Each element in c corresponds to the element in ptr. These are sorted
+      * by the pointers (ptr) */
+     int    n;        /* elements used in the arrays */
+     char   c[4];     /* bytes we're looking for */
+     char  *ptr[4];   /* pointers to next occurrences of the bytes */
+     char  *end;      /* end of block being searched. Last byte to search is end-1 */
+ } searcher;
+ 
+ /* Functions internal to searcher */
+ 
+ #define swap_searcher_entries(searcher, a, b) do { \
+     char tmpc = (searcher)->c[a]; \
+     char *tmpptr = (searcher)->ptr[a]; \
+     (searcher)->c[a] = (searcher)->c[b]; \
+     (searcher)->ptr[a] = (searcher)->ptr[b]; \
+     (searcher)->c[b] = tmpc; \
+     (searcher)->ptr[b] = tmpptr; \
+ } while(0)
+ 
+ static void
+ sort_searcher(searcher *s)
+ {
+     switch (s->n)
+     {
+         case 0:
+         case 1:
+             /* Nothing to do */
+             break;
+         case 2:
+             if (s->ptr[0] > s->ptr[1])
+                 swap_searcher_entries(s, 0, 1);
+             break;
+         case 3:
+             if (s->ptr[0] > s->ptr[2])
+                 swap_searcher_entries(s, 0, 2);
+             if (s->ptr[0] > s->ptr[1])
+                 swap_searcher_entries(s, 0, 1);
+             if (s->ptr[1] > s->ptr[2])
+                 swap_searcher_entries(s, 1, 2);
+             break;
+         case 4:
+             if (s->ptr[0] > s->ptr[1])
+                 swap_searcher_entries(s, 0, 1);
+             if (s->ptr[2] > s->ptr[3])
+                 swap_searcher_entries(s, 2, 3);
+             if (s->ptr[0] > s->ptr[2])
+                 swap_searcher_entries(s, 0, 2);
+             if (s->ptr[1] > s->ptr[3])
+                 swap_searcher_entries(s, 1, 3);
+             if (s->ptr[1] > s->ptr[2])
+                 swap_searcher_entries(s, 1, 2);
+             break;
+     }
+ }
+ 
+ /* Remove the topmost element */
+ static void
+ remove_top(searcher *searcher)
+ {
+     int     i;
+ 
+     searcher->n--;
+ 
+     /* Move up the remaining items */
+     for (i = 0; i < searcher->n; i++)
+     {
+         searcher->c[i] = searcher->c[i + 1];
+         searcher->ptr[i] = searcher->ptr[i + 1];
+     }
+ }
+ 
+ /*
+  * The first element in the list has been replaced by the caller.
+  * Move it to the right place in the list, to keep it sorted.
+  */
+ static void
+ sift_down(searcher *searcher)
+ {
+     if (searcher->n < 2 || searcher->ptr[0] <= searcher->ptr[1])
+         return;
+     swap_searcher_entries(searcher, 0, 1);
+ 
+     if (searcher->n < 3 || searcher->ptr[1] <= searcher->ptr[2])
+         return;
+     swap_searcher_entries(searcher, 1, 2);
+ 
+     if (searcher->n < 4 || searcher->ptr[2] <= searcher->ptr[3])
+         return;
+     swap_searcher_entries(searcher, 2, 3);
+ }
+ 
+ /*
+  * Starts a new search in a block of memory.
+  *
+  * searcher - for storing state between calls. Opaque to caller,
+  *            init_searcher will initialize this.
+  * str - block of memory to search
+  * len - length of str
+  * c - bytes to search for, max 4.
+  * n - number of bytes in c
+  */
+ static void
+ init_searcher(searcher *searcher, char *str, int
Re: [PATCHES] WIP: guc enums
Magnus Hagander wrote: The patch only converts a couple of the potential enum variables to the new type, mainly as a proof of concept. But already I hit the problem twice - the variable that holds the value of the guc enum is a C enum itself, which gives a compiler warning when I pass a pointer to it for config_enum.variable. (in this case, Log_error_verbosity and log_statement are enums and have the problem, but client_min_messages, log_min_messages and log_min_error_statement are already int and don't have it) On my platform (linux x86) it works fine when I just cast this to (int *), but I'm unsure if that's going to be safe on other platforms. I had some indication that it's probably not? No, I don't think that's safe. Some googling (*) suggests that the compiler is free to choose any integer type for an enum. If you do *((int *)enumptr) = x, and the compiler chose a 16-bit type for the enum, you overwrite some unrelated piece of memory. And if not, the only way I know to do it is to change the C level enums to be an int and use #define:s instead of what's there now. If that's required, is that an acceptable change in order to implement this? If not, any better ideas on how to do it? Yuck :-(. We could keep using the assignment hooks. But they could be a lot simpler than they are nowadays, if the string -> int conversion was done by the GUC machinery:

static void
assign_client_min_messages(int newval, bool doit, GucSource source)
{
    client_min_messages = newval;
}

Another idea would be to cajole the compiler into choosing a type of the same length as int, by adding a dummy enum value to the enum, like:

enum client_min_messages { DEBUG, INFO, ..., DUMMY = INT_MAX }

Though I guess it might in theory choose something even wider, and the *((int *)enumptr) = x would fail to set all the bits of the enum variable. BTW, we shouldn't be using malloc in config_enum_get_options... (*): http://david.tribble.com/text/cdiffs.htm#C99-enum-type and what I believe to be the current C99 standard, see 6.7.2.2 Enumeration specifiers: http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-patches
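A standalone illustration of the dummy-member trick -- the enum and all names here are invented; only the INT_MAX idea comes from the mail above. C99 6.7.2.2 leaves the chosen integer type implementation-defined, but it must be able to represent INT_MAX, so the type ends up at least int-range (though, as noted, nothing forbids something wider):

#include <limits.h>

typedef enum
{
    CMM_DEBUG,
    CMM_INFO,
    CMM_ERROR,
    CMM_FORCE_INT = INT_MAX     /* widens the enum to at least int range */
} ClientMinMessagesSketch;

/* crude compile-time check that the trick achieved int width */
typedef char cmm_size_check[sizeof(ClientMinMessagesSketch) >= sizeof(int) ? 1 : -1];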
Re: [PATCHES] CopyReadLineText optimization
Heikki Linnakangas wrote: Attached is a patch that modifies CopyReadLineText so that it uses memchr to speed up the scan. The nice thing about memchr is that we can take advantage of any clever optimizations that might be in libc or compiler. Here's an updated version of the patch. The principle is the same, but the same optimization is now used for CSV input as well, and there are more comments. I still need to do more benchmarking. I mentioned a ~5% speedup on the test I ran earlier, which was a load of the lineitem table from TPC-H. It looks like with cheaper data types the gain can be much bigger; here's an oprofile from loading the TPC-H partsupp table.

Before:

samples   %        image name    symbol name
5146      25.7635  postgres      CopyReadLine
4089      20.4716  postgres      DoCopy
1449       7.2544  reiserfs      (no symbols)
1369       6.8539  postgres      pg_verify_mbstr_len
1013       5.0716  libc-2.7.so   memcpy
749        3.7499  libc-2.7.so   strtod_l_internal
598        2.9939  postgres      heap_formtuple
548        2.7436  libc-2.7.so   strtol_l_internal
403        2.0176  libc-2.7.so   memset
309        1.5470  libc-2.7.so   strlen
208        1.0414  postgres      AllocSetAlloc
...

After:

samples   %        image name    symbol name
4165      25.7879  postgres      DoCopy
1574       9.7455  postgres      pg_verify_mbstr_len
1520       9.4112  reiserfs      (no symbols)
1005       6.2225  libc-2.7.so   memchr
986        6.1049  libc-2.7.so   memcpy
632        3.9131  libc-2.7.so   strtod_l_internal
589        3.6468  postgres      heap_formtuple
546        3.3806  libc-2.7.so   strtol_l_internal
386        2.3899  libc-2.7.so   memset
366        2.2661  postgres      CopyReadLine
287        1.7770  libc-2.7.so   strlen
215        1.3312  postgres      LWLockAcquire
208        1.2878  postgres      hash_any
176        1.0897  postgres      LWLockRelease
161        0.9968  postgres      InputFunctionCall
157        0.9721  postgres      AllocSetAlloc
...

The profile shows that with the patch, ~8.5% of the CPU time is spent in CopyReadLine+memchr, vs. 25.5% before. That's quite a significant speedup. I still need to test the worst-case performance, with input that has a lot of escapes. It would be interesting to hear reports with this patch from people on different platforms. These results are from my laptop with a 32-bit Intel CPU, running Linux. There could be big differences in the memchr implementations. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

Index: src/backend/commands/copy.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.295
diff -c -r1.295 copy.c
*** src/backend/commands/copy.c	1 Jan 2008 19:45:48 -0000	1.295
--- src/backend/commands/copy.c	29 Feb 2008 17:43:53 -0000
***************
*** 67,72 ****
--- 67,275 ----
      EOL_CRNL
  } EolType;
+ 
+ /* Fast byte searcher functions
+  *
+  * CopyReadLine needs to search for newlines, as well as quoting and escape
+  * characters to determine which newlines are real and which ones are escaped.
+  * Doing that in the naive way, looping through the input byte by byte and
+  * comparing against the interesting characters, can be slow.
+  *
+  * To speed that up, we provide a searcher interface, that can be used to
+  * search a piece of memory for up to 4 different bytes simultaneously. It's
+  * similar to memchr, but allows searching for multiple chars at the same
+  * time.
+  *
+  * To start a search on a new block of memory, call init_searcher. Then call
+  * next_searcher repeatedly to return all the occurrences of the bytes being
+  * searched for.
+  *
+  * The searcher implementation uses memchr to search for the bytes, and keeps
+  * track of where the next occurrence of each is. Using memchr allows us
+  * to take advantage of any platform-specific optimizations.
+  */
+ 
+ /*
+  * Struct used to store state of the current search between calls to
+  * next_searcher. Initialized by init_searcher.
+  */
+ typedef struct {
+     /* Each element in c corresponds to the element in ptr. These are sorted
+      * by the pointers (ptr) */
+     int    n;        /* elements used in the arrays */
+     char   c[4];     /* bytes we're looking for */
+     char  *ptr[4];   /* pointers to next occurrences of the bytes */
+     char  *end;      /* end of block being searched. Last byte to search is end-1 */
+ } searcher;
+ 
+ /* Functions internal to searcher */
+ 
+ #define swap_searcher_entries(searcher, a, b) do { \
+     char tmpc = (searcher)->c[a]; \
+     char *tmpptr = (searcher)->ptr[a]; \
+     (searcher)->c
[PATCHES] CopyReadAttributesCSV optimization
Here's a patch to speed up CopyReadAttributesCSV. On the test case I've been playing with, loading the TPC-H partsupp table, about 20% of the CPU time is spent in CopyReadAttributesCSV (inlined into DoCopy; DoCopy itself is insignificant):

samples   %        image name    symbol name
8136      25.8360  postgres      CopyReadLine
6350      20.1645  postgres      DoCopy
2181       6.9258  postgres      pg_verify_mbstr_len
2157       6.8496  reiserfs      (no symbols)
1668       5.2968  libc-2.7.so   memcpy
1142       3.6264  libc-2.7.so   strtod_l_internal
951        3.0199  postgres      heap_formtuple
904        2.8707  libc-2.7.so   strtol_l_internal
619        1.9656  libc-2.7.so   memset
442        1.4036  libc-2.7.so   strlen
341        1.0828  postgres      hash_any
329        1.0447  postgres      pg_atoi
300        0.9527  postgres      AllocSetAlloc

With this patch, the usage of that function goes down to ~13%:

samples   %        image name    symbol name
7191      28.7778  postgres      CopyReadLine
3257      13.0343  postgres      DoCopy
2127       8.5121  reiserfs      (no symbols)
1914       7.6597  postgres      pg_verify_mbstr_len
1413       5.6547  libc-2.7.so   memcpy
920        3.6818  libc-2.7.so   strtod_l_internal
784        3.1375  libc-2.7.so   strtol_l_internal
745        2.9814  postgres      heap_formtuple
508        2.0330  libc-2.7.so   memset
398        1.5928  libc-2.7.so   strlen
315        1.2606  postgres      hash_any
255        1.0205  postgres      AllocSetAlloc

The trick is to split the loop in CopyReadAttributesCSV into two parts, inside quotes and outside quotes, saving some instructions in both parts. Your mileage may vary, but I'm quite happy with this. I haven't tested it much yet, but I wouldn't expect it to be a loss in any interesting scenario. The code also doesn't look much worse after the patch, perhaps even better. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

Index: src/backend/commands/copy.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.295
diff -c -r1.295 copy.c
*** src/backend/commands/copy.c	1 Jan 2008 19:45:48 -0000	1.295
--- src/backend/commands/copy.c	29 Feb 2008 20:57:09 -0000
***************
*** 2913,2919 ****
      for (;;)
      {
          bool        found_delim = false;
-         bool        in_quote = false;
          bool        saw_quote = false;
          char       *start_ptr;
          char       *end_ptr;
--- 3146,3151 ----
***************
*** 2934,3000 ****
      {
          char        c;
  
!         end_ptr = cur_ptr;
!         if (cur_ptr >= line_end_ptr)
!             break;
!         c = *cur_ptr++;
!         /* unquoted field delimiter */
!         if (c == delimc && !in_quote)
!         {
!             found_delim = true;
!             break;
!         }
!         /* start of quoted field (or part of field) */
!         if (c == quotec && !in_quote)
!         {
!             saw_quote = true;
!             in_quote = true;
!             continue;
!         }
!         /* escape within a quoted field */
!         if (c == escapec && in_quote)
!         {
!             /*
!              * peek at the next char if available, and escape it if it is
!              * an escape char or a quote char
!              */
!             if (cur_ptr < line_end_ptr)
!             {
!                 char        nextc = *cur_ptr;
! 
!                 if (nextc == escapec || nextc == quotec)
!                 {
!                     *output_ptr++ = nextc;
!                     cur_ptr++;
!                     continue;
!                 }
!             }
!         }
!         /*
!          * end of quoted field. Must do this test after testing for escape
!          * in case quote char and escape char are the same (which is the
!          * common case).
!          */
!         if (c == quotec && in_quote)
!         {
!             in_quote = false;
!             continue;
!         }
- 
-         /* Add c to output string */
-         *output_ptr++ = c;
      }
  
      /* Terminate attribute value in output area */
      *output_ptr++ = '\0';
  
-     /* Shouldn't still be in quote mode */
-     if (in_quote)
-         ereport(ERROR,
-                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                  errmsg("unterminated CSV quoted field")));
- 
      /* Check whether raw input matched null marker */
      input_len = end_ptr - start_ptr;
      if (!saw_quote && input_len == cstate->null_print_len
--- 3166,3241 ----
      {
          char        c;
  
!         /* Not in quote */
!         for (;;)
!         {
!             end_ptr = cur_ptr;
!             if (cur_ptr >= line_end_ptr)
!                 goto endfield;
!             c = *cur_ptr++;
!             /* unquoted field delimiter */
!             if (c == delimc)
!             {
!                 found_delim = true;
!                 goto endfield;
!             }
!             /* start of quoted field (or part of field) */
!             if (c == quotec)
!             {
!                 saw_quote = true;
!                 break;
!             }
!             /* Add c to output string */
!             *output_ptr++ = c;
!         }
! 
!         /* In quote */
!         for (;;)
!         {
!             end_ptr = cur_ptr
Re: [PATCHES] Fix for initdb failures on Vista
Dave Page wrote: The attached patch fixes problems reported primarily on Vista, but also on some Windows 2003 and XP installations, in which initdb reports that it cannot find postgres.exe. A couple of minor nitpicks: Regarding the AddUserToDaclCleanup helper function, I would suggest putting all the cleanups at the end of AddUserToDacl and jumping to the cleanup section with a goto. That's a commonly used pattern to do it. One problem with the Cleanup function is that if you need to add more cleanup code (probably not likely in this case, though), you need to modify the function signature and all callers. The comment in AddUserToDacl says "This is required on Windows machines running some of Microsoft's latest security patches on XP/2K3, and on Vista/Longhorn boxes." The security patches we're talking about are not going to be the latest for very long; might want to rephrase that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
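For reference, a generic sketch of the goto-cleanup pattern being suggested -- plain malloc/free stands in for the Windows token and ACL handles of the real function, and the name is invented:

#include <stdlib.h>
#include <stdbool.h>

static bool
add_user_to_dacl_sketch(void)
{
    bool  success = false;
    char *token = NULL;
    char *acl = NULL;

    token = malloc(64);
    if (token == NULL)
        goto cleanup;

    acl = malloc(64);
    if (acl == NULL)
        goto cleanup;

    /* ... do the actual work here ... */
    success = true;

cleanup:
    /* single exit point: every resource is released exactly once */
    free(acl);
    free(token);
    return success;
}

The advantage over a separate cleanup helper is exactly the one noted above: adding another resource later means adding one local variable and one free, with no signature or caller changes.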
[PATCHES] CopyReadLineText optimization
The purpose of CopyReadLineText is to scan the input buffer, and find the next newline, taking into account any escape characters. It currently operates in a loop, one byte at a time, searching for LF, CR, or a backslash. That's a bit slow: I've been running oprofile on COPY, and I've seen CopyReadLine take around ~10% of the CPU time, and Joshua Drake just posted a very similar profile to hackers. Attached is a patch that modifies CopyReadLineText so that it uses memchr to speed up the scan. The nice thing about memchr is that we can take advantage of any clever optimizations that might be in libc or compiler. In the tests I've been running, it roughly halves the time spent in CopyReadLine (including the new memchr calls), thus reducing the total CPU overhead by ~5%. I'm planning to run more tests with data that has backslashes and with different width tables to see what the worst-case and best-case performance is like. Also, it doesn't work for CSV format at the moment; that needs to be fixed. 5% isn't exactly breathtaking, but it's a start. I tried the same trick in CopyReadAttributesText, but unfortunately it doesn't seem to help there, because you need to stop the efficient word-at-a-time scan that memchr does (at least with glibc, YMMV) whenever there's a column separator, while in CopyReadLineText you get to process the whole line in one call, assuming there are no backslashes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

Index: src/backend/commands/copy.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.295
diff -c -r1.295 copy.c
*** src/backend/commands/copy.c	1 Jan 2008 19:45:48 -0000	1.295
--- src/backend/commands/copy.c	24 Feb 2008 00:40:09 -0000
***************
*** 67,72 ****
--- 67,77 ----
      EOL_CRNL
  } EolType;
  
+ typedef struct {
+     char   c;    /* byte we're looking for */
+     char  *s;    /* pointer to next occurrence of this byte */
+ } searchbyte;
+ 
  /*
   * This struct contains all the state variables used throughout a COPY
   * operation. For simplicity, we use the same struct for all variants of COPY,
***************
*** 159,164 ****
--- 164,171 ----
      char       *raw_buf;
      int         raw_buf_index;   /* next byte to process */
      int         raw_buf_len;     /* total # of bytes stored */
+ 
+     searchbyte  sb[3];           /* pointers to next interesting characters in raw_buf */
  } CopyStateData;
  
  typedef CopyStateData *CopyState;
***************
*** 170,175 ****
--- 177,283 ----
      CopyState   cstate;          /* CopyStateData for the command */
  } DR_copy;
  
+ #define swap_sb(a,b) do { \
+     searchbyte c = (a); \
+     (a) = (b); \
+     (b) = (c); \
+ } while(0)
+ 
+ /* is a < b? NULLs are last */
+ #define cmp_sb(a,b) ((a).s < (b).s && (a).s != NULL)
+ 
+ static void
+ sort_sb(searchbyte *sb)
+ {
+     if (cmp_sb(sb[2], sb[1]))
+         swap_sb(sb[1], sb[2]);
+     if (cmp_sb(sb[1], sb[0]))
+         swap_sb(sb[0], sb[1]);
+     if (cmp_sb(sb[2], sb[1]))
+         swap_sb(sb[1], sb[2]);
+ }
+ 
+ /*
+  * Starts a new search in a string
+  */
+ static void
+ init_interesting(searchbyte *sb, char *str, int len)
+ {
+     sb[0].c = '\n';
+     sb[1].c = '\r';
+     sb[2].c = '\\';
+     sb[0].s = memchr(str, '\n', len);
+     sb[1].s = memchr(str, '\r', len);
+     sb[2].s = memchr(str, '\\', len);
+     sort_sb(sb);
+ }
+ 
+ /*
+  * Look for the next interesting character
+  *
+  * sb - array of three searchbytes.
+  * ss - string to search
+  * len_left - number of bytes left in ss to search
+  *
+  * Returns the offset of the next interesting location, relative to 'ss'
+  */
+ static int
+ next_interesting(searchbyte *sb, char *ss, int len_left)
+ {
+     char   *s;
+ 
+     if (sb[0].s == NULL)
+         return 0;
+ 
+     /*
+      * The caller skipped over the next pointer we had memorized, so we have
+      * to search for the one after that.
+      */
+     if (ss > sb[0].s)
+     {
+         sb[0].s = memchr(ss, sb[0].c, len_left);
+         if (sb[0].s == NULL)
+             return 0;    /* we're done */
+         if (sb[1].s != NULL)
+         {
+             if (ss > sb[1].s)
+             {
+                 /*
+                  * The caller skipped over the 2nd pointer we had memorized
+                  * as well. Have to search for that as well
+                  */
+                 sb[1].s = memchr(ss, sb[1].c, len_left);
+                 if (sb[2].s != NULL && ss > sb[2].s)
+                 {
+                     /* Caller skipped over 3rd pointer as well... */
+                     sb[2].s = memchr(ss, sb[2].c, len_left);
+                 }
+             }
+             sort_sb(sb);
+         }
+     }
+ 
+     /*
+      * Return the next interesting location we had memorized, and search
+      * for the next occurrence of that byte.
+      */
+     s = sb[0].s;
+ 
+     sb[0].s = memchr(s+1, sb[0].c, (ss+len_left) - (s + 1));
+ 
+     /*
+      * Bubble down the next location into the three-element list we have,
+      */
+     if (cmp_sb(sb[1], sb[0]))
+     {
+         swap_sb(sb[0], sb[1]);
+         if (cmp_sb(sb[2], sb[1]))
+             swap_sb(sb[1], sb[2]);
+     }
+ 
+     return s - ss;
+ }
+ 
  /*
   * These macros centralize code used to process line_buf
Re: [PATCHES] tzcode update
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: On Mon, Feb 18, 2008 at 04:32:58PM +0100, Bjorn Munch wrote: Ouch! This fails on our Solaris builds, because we build with the Solaris timezone files. And these apparently don't work beyond 2038 and don't know that Finland plans to have DST also in the summer of 2050... I haven't studied this in depth but I'm afraid the answer is "fix your own timezone files"? Or use the ones shipping with pg ;-) Yeah. I included the far-future cases in the committed patch specifically to make it obvious if you were using tz files that lacked post-2038 support. Agreed, we want to give people a notice. What we could do is add comments to the expected output file, that would show up in regression.diffs, that explain what the failure means. I can't imagine that fixing this isn't pretty darn high on the Solaris to-do list, anyway. Yep, and PostgreSQL 8.4 won't be released for quite some time. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] tzcode update
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Looking closer, I don't understand how that change was supposed to do anything. The point of that patch is to avoid an off-by-one result for years BC. The direction of rounding in integer division with a negative numerator is undefined in C (or at least used to be --- did C99 tighten this up?). Oh, I see. In that case we're good. The corresponding new code in tzcode actually looks like this:

! static int
! leaps_thru_end_of(const int y)
! {
!     return (y >= 0) ? (y / 4 - y / 100 + y / 400) :
!         -(leaps_thru_end_of(-(y + 1)) + 1);
! }

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
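A small standalone program (not part of the patch) that shows the off-by-one in question. C99 did tighten this up: integer division now truncates toward zero, so -1/4 is 0 rather than -1, and the naive formula undercounts leap years for years BC, while the recursive form quoted above stays consistent:

#include <stdio.h>

/* the tzcode function quoted above */
static int
leaps_thru_end_of(const int y)
{
    return (y >= 0) ? (y / 4 - y / 100 + y / 400) :
        -(leaps_thru_end_of(-(y + 1)) + 1);
}

int
main(void)
{
    int y = -1;

    /* naive formula: truncation toward zero gives 0 for year -1 */
    printf("naive: %d\n", y / 4 - y / 100 + y / 400);       /* prints 0 */
    /* recursive form: counts the leap year correctly */
    printf("recursive: %d\n", leaps_thru_end_of(y));        /* prints -1 */
    return 0;
}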
Re: [PATCHES] tzcode update
I just noticed that I had accidentally reverted this change in the patch:

/*
 * Note: the point of adding 4800 is to ensure we make the same
 * assumptions as Postgres' Julian-date routines about the placement of
 * leap years in centuries BC, at least back to 4713BC which is as far as
 * we'll go. This is effectively extending Gregorian timekeeping into
 * pre-Gregorian centuries, which is a tad bogus but it conforms to the
 * SQL spec...
 */
#define LEAPS_THRU_END_OF(y)	(((y) + 4800) / 4 - ((y) + 4800) / 100 + ((y) + 4800) / 400)

vs original in tzcode2003e:

#define LEAPS_THRU_END_OF(y)	((y) / 4 - (y) / 100 + (y) / 400)

Looking closer, I don't understand how that change was supposed to do anything. If I'm doing my math right, our +4800 version of the code can be written as: y/4 - y/100 + y/400 + 1164. The only place this macro is used is this:

days -= ((int64) (newy - y)) * DAYSPERNYEAR +
	LEAPS_THRU_END_OF(newy - 1) -
	LEAPS_THRU_END_OF(y - 1);

AFAICS, the constant + 1164 is always cancelled out, making the exercise of adding 4800 a waste of time. Am I missing something? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PATCHES] tzcode update
We included the public domain timezone library by Arthur David Olson back in 2004 into our source tree, but we haven't kept it up to date with the upstream changes since. We've made a number of small changes to our version of the library, including:

- formatting changes, mostly thanks to pgindent
- widened time_t to 8 bytes
- we only include the parts of the library that we need

which means that we can't just take the new library version and drop it in the src/timezone directory. We can debate whether those changes were a good idea or not, given that they make the merging harder, but my take is that it's not that bad given how seldom and little the upstream code changes. I was not able to find anything like release notes that would list the differences between tzcode2003e, which I believe is the version that we included back then, and the latest version tzcode2007k. So I just took a diff between those, and deduced from there what has changed. The main difference between those versions seems to be support for 64-bit time_t, including:

- extended data file format, with 64-bit time values
- extrapolation of DST transitions in a 400 year cycle, for values not directly covered by the transition table

In addition to that, they've added a public domain notice to the top of ialloc.c, and some other cosmetic changes. We already had such a notice in our version of the file, but the original did not. I merged the upstream changes, mostly manually, trying to be faithful to the original diff to make future merges easier. But I also made some whitespace/comment changes, like we've done before to our version of the code: no double-stars in comments, braces on lines of their own. Attached is the resulting patch against Postgres CVS HEAD. In addition to manually merging the patch, I had to teach pg_next_dst_boundary how to extrapolate the DST transitions. The code there is copy-pasted from localsub, so I copy-pasted that change from there as well. Also, they now use a binary search instead of a linear search in localsub, and I copied that change to pg_next_dst_boundary as well (that is why I started looking at this in the first place, http://archives.postgresql.org/pgsql-patches/2007-09/msg00291.php) I don't really know how to test this. It passes the regression tests, after the fixes to pg_next_dst_boundary, but I was expecting there to be some failures now that we support timezones for timestamps outside the 32-bit time_t range. Apparently our tests don't cover that? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com pg-tzcode2007k.patch.gz Description: GNU Zip compressed data ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Endless recovery
Hans-Juergen Schoenig wrote: this is the last info which was issued ... nothing in between ... during the rm_cleanup() nothing was logged into the logs. this is the last log from today dawn:

[2008-02-11 03:45:16 CET ] LOG: lost parent for block 8558565
[2008-02-11 03:45:16 CET ] LOG: index 1663/16384/16578435 needs VACUUM or REINDEX to finish crash recovery
[2008-02-11 03:45:16 CET ] DETAIL: Incomplete insertion detected during crash replay.
[2008-02-11 03:47:54 CET ] LOG: database system is ready
[2008-02-11 03:47:54 CET ] LOG: transaction ID wrap limit is 1073742476, limited by database blids

that's where it finished, nothing else was logged between the redo done and the last log messages I bet you've bumped into a bug in the gist redo code; the cleanup phase shouldn't take long. It's just for completing any incomplete splits by inserting pointers to upper-level pages, and there shouldn't be more than a few of those active at any point in time. It looks like there have been quite a few changes to gistxlog.c that haven't been back-patched. This one in particular might be relevant here:

revision 1.15
date: 2006-04-03 17:45:50 +0100; author: tgl; state: Exp; lines: +15 -13;
Fix thinko in gistRedoPageUpdateRecord: if XLR_BKP_BLOCK_1 is set, we don't have anything to do to the page, but we still have to adjust the incomplete_inserts list that we're maintaining in memory.

Teodor, what do you think? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Endless recovery
Hans-Juergen Schoenig wrote: Last week we have seen a problem with some horribly configured machine. The disk filled up (bad FSM ;) ) and once this happened the sysadmin killed the system (-9). After two days PostgreSQL had still not started up, and they tried to restart it again and again, making sure that the consistency check was started over and over again (thus causing more and more downtime). From the admin point of view there was no way to find out whether the machine was actually dead or still recovering. Here is a small patch which issues a log message indicating that the recovery process can take ages. Maybe this can prevent some admins from interrupting the recovery process. Wait, are you saying that the time was spent in the rm_cleanup phase? That sounds unbelievable. Surely the time was spent in the redo phase, no? In our case, the recovery process took 3.5 days !! That's a ridiculously long time. Was this a normal recovery, not a PITR archive recovery? Any idea why the recovery took so long? Given the max. checkpoint timeout of 1h, I would expect the recovery to take a maximum of a few hours even with an extremely write-heavy workload. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PATCHES] DDL in EDB-SPL
While looking at the package function precedence problem, I bumped into another unrelated bug: According to a quick Google search, Oracle doesn't accept DDL in PL/SQL; you have to use EXECUTE IMMEDIATE to do that. Trying to run DDL in the edb-spl fails with a bizarre error message. For example, for CREATE TABLE footable2 (full test case attached):

ERROR: syntax error at or near "footable2"
LINE 1: CREATE footable2 (id integer)

So the TABLE token seems to be stripped away somewhere. This raises the question of what happens with CREATE TEMPORARY TABLE. Lo and behold, it does what you might've guessed, kind of. TEMPORARY is stripped away, leaving just CREATE TABLE tablename. However, we've set the package namespace as the special namespace, and that's the current default creation namespace. Therefore the table gets created inside the package namespace. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

DROP PACKAGE foopack;

CREATE PACKAGE foopack IS
    foovar integer;
    PROCEDURE fooproc;
END;

CREATE PACKAGE BODY foopack IS
    PROCEDURE fooproc IS
    BEGIN
        CREATE TEMPORARY TABLE footable2 (id integer);
    END;
END;

exec foopack.fooproc;

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] DDL in EDB-SPL
Pavel Stehule wrote: Wrong address :) Shit! I knew this was going to happen eventually, because when I start to type pa... in the address bar, [EMAIL PROTECTED] is the first hit... Sorry for the noise, please ignore... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Proposed patch to disallow password=foo in databasename parameter
Alvaro Herrera wrote: Magnus Hagander wrote: On Mon, Dec 10, 2007 at 10:47:19PM -0500, Tom Lane wrote: If we want to prevent it for psql, we should actually prevent it *in* psql, not in libpq. There are an infinite number of scenarios where it's perfectly safe to put the password there... If we want to do it shared, we should add a function like PQSanitizeConnectionString() that will remove it, and that can be called from those client apps that may be exposing it. There are also platforms that don't show the full commandline to other users - or even other processes - that aren't affected, of course. One idea is to have psql hide the password on the ps status. That way it becomes less of a security issue. It would still be a problem on certain operating systems, but at least several common platforms would be covered. There would still be a race condition: the password would still be visible until psql hides it. In a way that would be even worse, because it wouldn't be obvious to an administrator that there's a problem because the password wouldn't be visible in ps output, but hackers know about stuff like that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
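For what it's worth, a rough sketch of how such scrubbing tends to be done -- and why the race remains: everything that happens before this code runs is still visible. The function is invented, and the assumption that ps reads argv memory directly holds on some common platforms but is not portable; this is illustrative, not a tested psql patch:

#include <string.h>

/* overwrite any password=... argument so it no longer shows up in ps */
static void
scrub_password_arg(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++)
    {
        if (strncmp(argv[i], "password=", 9) == 0)
            memset(argv[i] + 9, 'x', strlen(argv[i] + 9));
    }
}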
Re: [PATCHES] hashlittle(), hashbig(), hashword() and endianness
Alex Vinokur wrote: On Nov 15, 10:40 am, Alex Vinokur [EMAIL PROTECTED] wrote: [snip] I have some question concerning Bob Jenkins' functions hashword(uint32_t*, size_t), hashlittle(uint8_t*, size_t) and hashbig(uint8_t*, size_t) in lookup3.c. Let k1 be a key: uint8_t* k1; strlen(k1) % sizeof(uint32_t) == 0. 1. hashlittle(k1) produces the same value on Little-Endian and Big-Endian machines. Let hashlittle(k1) be == L1. 2. hashbig(k1) produces the same value on Little-Endian and Big-Endian machines. Let hashbig(k1) be == B1. L1 != B1 3. hashword((uint32_t*)k1) produces * L1 on Little-Endian machine and * B1 on Big-Endian machine. === - The question is: is it possible to change hashword() to get * L1 on Little-Endian machine and * B1 on Big-Endian machine ? Sorry, it should be as follows: Is it possible to create two new hash functions on basis of hashword(): i) hashword_little () that produces L1 on Little-Endian and Big-Endian machines; ii) hashword_big () that produces B1 on Little-Endian and Big-Endian machines ? Why? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Miscalculation in IsCheckpointOnSchedule()
Tom Lane wrote: ITAGAKI Takahiro [EMAIL PROTECTED] writes:

- ((double) (int32) (recptr.xrecoff - ckpt_start_recptr.xrecoff)) / XLogSegSize) /
+ ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /

Surely this makes matters worse, not better. What happens near a segment boundary crossing? Hmm. There seems to be another little bug in there. XLogSegsPerFile is defined as 0xffffffff/XLogSegSize, which is 255 with default settings. It should be 256. That leads to negative elapsed_xlogs estimates at xlog file boundaries. XLogCheckpointNeeded suffers from it too; the number of segments consumed since the last checkpoint is off by one for each xlogid, so we trigger the checkpoint a little bit too late. Otherwise, the patch looks good to me. I also tested the calculation with a little C program (attached), and it seems to work on segment and xlog file boundaries just fine. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

#include <stdio.h>
#include <stdlib.h>

typedef unsigned char uint8;    /* == 8 bits */
typedef unsigned short uint16;  /* == 16 bits */
typedef unsigned int uint32;    /* == 32 bits */

typedef signed char int8;       /* == 8 bits */
typedef signed short int16;     /* == 16 bits */
typedef signed int int32;       /* == 32 bits */

#define XLOG_SEG_SIZE   (16*1024*1024)
#define XLogSegSize     ((uint32) XLOG_SEG_SIZE)
#define XLogSegsPerFile (((uint32) 0xffffffff) / XLogSegSize)

int
main(int argc, char **argv)
{
    double  elapsed_xlogs;
    uint32  cur_xrecoff, cur_xlogid;
    uint32  ckpt_xrecoff, ckpt_xlogid;
    double  a, b;

    cur_xlogid = strtoul(argv[1], NULL, 10);
    cur_xrecoff = strtoul(argv[2], NULL, 10);
    ckpt_xlogid = strtoul(argv[3], NULL, 10);
    ckpt_xrecoff = strtoul(argv[4], NULL, 10);

    //a = (cur_xlogid - ckpt_xlogid) * XLogSegsPerFile;
    //b = (((double) cur_xrecoff) - ((double) ckpt_xrecoff)) / ((double) XLogSegSize);

    //a = ((double) (int32) (cur_xlogid - ckpt_xlogid)) * XLogSegsPerFile;
    //b = ((double) (int32) (cur_xrecoff - ckpt_xrecoff)) / XLogSegSize;

    a = (cur_xlogid - ckpt_xlogid) * XLogSegsPerFile;
    b = ((double) cur_xrecoff - (double) ckpt_xrecoff) / XLogSegSize;

    elapsed_xlogs = a + b;

    printf("XLogSegsPerFile: %d\n", XLogSegsPerFile);
    printf("xrecoff: %u %f\n", ckpt_xrecoff, (double) ckpt_xrecoff);
    printf("a: %f\n", a);
    printf("b: %f\n", b);
    printf("elapsed_xlogs = %f\n", elapsed_xlogs);
}

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Miscalculation in IsCheckpointOnSchedule()
Heikki Linnakangas wrote: Tom Lane wrote: ITAGAKI Takahiro [EMAIL PROTECTED] writes: - ((double) (int32) (recptr.xrecoff - ckpt_start_recptr.xrecoff)) / XLogSegSize) / + ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / Surely this makes matters worse, not better. What happens near a segment boundary crossing? Hmm. There seems to be another little bug in there. XLogSegsPerFile is defined as 0xffffffff/XLogSegSize, which is 255 with default settings. It should be 256. That leads to negative elapsed_xlogs estimates at xlog file boundaries. XLogCheckpointNeeded suffers from it too; the number of segments consumed since the last checkpoint is off by one for each xlogid, so we trigger the checkpoint a little bit too late. I'll take that back. We intentionally don't use the last possible segment of each xlog file, to avoid overflows. Sorry for the noise. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords
Jan Urbański wrote: The solution I came up with was simple: write a dictionary, that does only one thing: looks up the lexeme in a stopwords file and either discards it or returns NULL. Doesn't the simple dictionary handle this? I don't think so. The 'simple' dictionary discards stopwords, but accepts any other lexemes. So if I use {'simple', 'pl_ispell'} for my config, I'll get rid of the stopwords, but I won't get any lexemes stemmed by ispell. Every lexeme that's not a stopword will produce the very same lexeme (this is how I think the 'simple' dictionary works). My dictionary does basically the same thing as the 'simple' dictionary, but it returns NULL instead of the original lexeme in case the lexeme is not found in the stopwords file. In the long term, what we really need is a more flexible way to chain dictionaries. In this case, you would first check against one stopword list, eliminating 'od', then check the ispell dictionary, and then check another stopword list without 'od'. I suggested that a while ago (http://archives.postgresql.org/pgsql-hackers/2007-08/msg01036.php). Hopefully Oleg or someone else gets around to restructuring the dictionaries in a future release. I wonder if you could hack the ispell dictionary file to treat oda specially? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] mingw autoconf again
Magnus Hagander wrote: So I was fixing my MingW environment to test and fix that issue with the functions missing. In doing so, I came across the error previously discussed that gettimeofday() is now suddenly defined in the latest versions of mingw, but wasn't before. Is it just missing from header files, or does it really not exist? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Autovacuum cancellation
Alvaro Herrera wrote: /* * Look for a blocking autovacuum. There will only ever * be one, since the autovacuum workers are careful * not to operate concurrently on the same table. */ I think that's a bit inaccurate. You could have multiple autovacuum workers operating on different tables participating in a deadlock. The reason that can't happen is that autovacuum never holds a lock while waiting for another. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend