Re: [HACKERS] tsearch2 in 8.3
On Tue, 24 Apr 2007, Bruce Momjian wrote: Naz Gassiep wrote: A few of us on IRC were wondering what the status of tsearch2 is in 8.3 ? Was it decided to include it in core or did we decide to keep FTS as a plugin? Some brief comments from anyone on the inside of the whole FTS issue would be greatly appreciated by us mere end users. Regards, The patch is in the patch queue and we will try to get it into 8.3. Let me just say, that for me this is the most anticipated feature for 8.3. Along with the patch to allow the database owner to create trusted PLs, this will allow me to move all but one of my databases to my hosting provider's PostgreSQL instance from my own instance running in my home directory (the one I cannot move also requires dblink). I can only imagine there are other users out there in similar circumstances to mine. I was lucky enough to find a hosting provider with shell access where I can run a postgres instance and that I already had the know-how to do so. Without running my own instance, my only other option was to choose the lesser of two evils: do without FTS, or use MySQL. ;) Sorry for the rant, I just wanted to make sure that people knew that this is not just cosmetic, or a restructure for its own sake, but will actually help real world users. -- The cow is nothing but a machine which makes grass fit for us people to eat. -- John McNulty ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
1) To deal with partial/inconsistent writes to the data file at crash recovery, we need full page writes at the first modification to pages after each checkpoint. It consumes much of WAL space. We need to find a way around this someday. Other DBs don't do this; it may be because they're less durable, or because they fixed the problem. They either can only detect a failure later (this may be a very long time depending on access and verify runs) or they also write page images. Those that write page images usually write before-images to a different area that is cleared periodically (e.g. during checkpoint). Writing to a different area was considered in pg, but there were more negative issues than positive. So imho pg_compresslog is the correct path forward. The current discussion is only about whether we want a more complex pg_compresslog and no change to current WAL, or an increased WAL size for a less complex implementation. Both would be able to compress the WAL to the same archive log size. Andreas
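As a rough illustration, the rule under discussion can be modeled like this (a toy Python sketch, not PostgreSQL source; all names here are invented): the first modification of a page after a checkpoint logs the whole page image into WAL, while later modifications of the same page log only a small delta record.

```python
# Toy model of the full-page-write rule: first touch of a page after a
# checkpoint -> full 8 KB image in WAL; later touches -> small delta record.
PAGE_SIZE = 8192

class WalSketch:
    def __init__(self):
        self.records = []            # list of (kind, size) tuples
        self.logged_since_ckpt = set()

    def checkpoint(self):
        # After a checkpoint, every page must be full-page-logged again
        # on its next modification.
        self.logged_since_ckpt.clear()

    def log_change(self, page_no, delta_size):
        if page_no not in self.logged_since_ckpt:
            self.records.append(("full_page", PAGE_SIZE))
            self.logged_since_ckpt.add(page_no)
        else:
            self.records.append(("delta", delta_size))

wal = WalSketch()
wal.log_change(1, 40)   # first touch after checkpoint -> full page image
wal.log_change(1, 40)   # subsequent change -> small record
wal.checkpoint()
wal.log_change(1, 40)   # full page image again after the checkpoint
```

This is why the WAL bloat is tied to checkpoint frequency, and why a tool like pg_compresslog can shrink the archived log: the full images are redundant with the deltas for archival purposes.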
[HACKERS] Buildfarm: Stage logs not available for MSVC builds
I just noticed that the stage logs aren't displayed against MSVC build hosts as they are for regular hosts, eg: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mastodon&dt=2007-04-25%2001:00:02 vs. http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhal&dt=2007-04-25%2002:00:03 Is this WIP, or a bug to be fixed? Regards Dave.
[HACKERS] autovacuum does not start in HEAD
I found that the autovacuum launcher does not launch any workers in HEAD. AFAICS, we track the time each database is to be vacuumed in the following way: 1. In rebuild_database_list(), we initialize avl_dbase->adl_next_worker with (current_time + autovacuum_naptime / nDBs). 2. In do_start_worker(), we skip database entries whose adl_next_worker is between current_time and current_time + autovacuum_naptime. 3. If there are no jobs in do_start_worker(), we call rebuild_database_list() to rebuild the database entries. The point is that we use the same range (current_time to current_time + autovacuum_naptime) at 1 and 2. We set adl_next_worker to values in the range at 1, and drop all of them at 2 because their values are in the range. And if there is no database to vacuum, we re-initialize the database list at 3, then we repeat the cycle. Or am I missing something? Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
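To make the reported interaction concrete, here is a toy Python model of the two steps as described above (my own sketch; the real logic lives in autovacuum.c and differs in detail): step 1 spreads the next-worker times across the naptime interval, and step 2 then skips every entry that falls inside that same interval, so nothing is ever picked.

```python
# Toy reproduction of the described cycle. Times are plain numbers in
# the same (fictitious) units as naptime.
naptime = 60

def rebuild_database_list(now, n_dbs):
    # Step 1: spread adl_next_worker times across the naptime interval.
    return [now + naptime * (i + 1) / n_dbs for i in range(n_dbs)]

def do_start_worker(now, next_workers):
    # Step 2: skip databases whose next-worker time falls inside
    # [now, now + naptime].
    return [t for t in next_workers if not (now <= t <= now + naptime)]

now = 0
next_workers = rebuild_database_list(now, 3)     # [20.0, 40.0, 60.0]
candidates = do_start_worker(now, next_workers)  # empty: everything skipped
```

Under this model the candidate list is always empty, which matches the observation that no workers are launched.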
[HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might be causing this? http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=vaquita&dt=2007-04-24%2020:00:05 The only other Vista buildfarm member (baiji, on the same physical box) is running MSVC builds which don't yet test ECPG from what I can see. Regards, Dave
Re: [HACKERS] Buildfarm: Stage logs not available for MSVC builds
Andrew Dunstan wrote: The problem is not because of MSVC, but because of member misconfiguration, by the look of it. The tar command string will need to be set in the config file and tar installed. I found that I needed bsdtar for Windows for this to work. See Ah, OK, thanks - there was a typo in the path entry I'd added for tar. 'tis working now. Regards, Dave
[HACKERS] Avoiding unnecessary reads in recovery
In recovery, with full_page_writes=on, we read in each page only to overwrite the contents with a full page image. That's a waste of time, and can have a surprisingly large effect on recovery time. As a quick test on my laptop, I initialized a DBT-2 test with 5 warehouses, and let it run for 2 minutes without think-times to generate some WAL. Then I did a kill -9 postmaster, and took a copy of the data directory to use for testing recovery. With CVS HEAD, the recovery took ~ 2 minutes. With the attached patch, it took 5 seconds. (yes, I used the same not-yet-recovered data directory in both tests, and cleared the os cache with echo 1 > /proc/sys/vm/drop_caches). I was surprised how big a difference it makes, but when you think about it it's logical. Without the patch, it's doing roughly the same I/O as the test itself: reading in pages, modifying them, and writing them back. With the patch, all the reads are done sequentially from the WAL, and the pages are then written back in a batch at the end of the WAL replay, which is a lot more efficient. It's interesting that (with the patch) full_page_writes can *shorten* your recovery time. I've always thought it to have a purely negative effect on performance. I'll leave it up to the jury whether this tiny little change is appropriate after feature freeze... While working on this, this comment in ReadBuffer caught my eye: /* * During WAL recovery, the first access to any data page should * overwrite the whole page from the WAL; so a clobbered page * header is not reason to fail. Hence, when InRecovery we may * always act as though zero_damaged_pages is ON. */ if (zero_damaged_pages || InRecovery) { But that assumption only holds if full_page_writes is enabled, right? I changed that in the attached patch as well, but even if the rest of the patch isn't accepted, I think that part of it should still be applied.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/access/transam/xlogutils.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xlogutils.c,v retrieving revision 1.49 diff -c -r1.49 xlogutils.c *** src/backend/access/transam/xlogutils.c 5 Jan 2007 22:19:24 - 1.49 --- src/backend/access/transam/xlogutils.c 25 Apr 2007 11:40:09 - *** *** 226,232 if (blkno < lastblock) { /* page exists in file */ ! buffer = ReadBuffer(reln, blkno); } else { --- 226,235 if (blkno < lastblock) { /* page exists in file */ ! if (init) ! buffer = ZapBuffer(reln, blkno); ! else ! buffer = ReadBuffer(reln, blkno); } else { Index: src/backend/storage/buffer/bufmgr.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/buffer/bufmgr.c,v retrieving revision 1.216 diff -c -r1.216 bufmgr.c *** src/backend/storage/buffer/bufmgr.c 30 Mar 2007 18:34:55 - 1.216 --- src/backend/storage/buffer/bufmgr.c 25 Apr 2007 11:44:27 - *** *** 97,102 --- 97,103 static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty, int set_flag_bits); static void buffer_write_error_callback(void *arg); + static Buffer ReadBuffer_common(Relation reln, BlockNumber blockNum, bool alloc_only); static volatile BufferDesc *BufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr); static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln); *** *** 121,126 --- 122,148 Buffer ReadBuffer(Relation reln, BlockNumber blockNum) { + return ReadBuffer_common(reln, blockNum, false); + } + + /* + * ZapBuffer -- like ReadBuffer, but doesn't read the contents of the page + * from disk. The caller is expected to completely rewrite the page, + * regardless of the current contents. This should only be used in + * recovery where there's no concurrent readers that might see the + * contents of the page before the caller rewrites it.
+ */ + Buffer + ZapBuffer(Relation reln, BlockNumber blockNum) + { + Assert(InRecovery); + + return ReadBuffer_common(reln, blockNum, true); + } + + static Buffer + ReadBuffer_common(Relation reln, BlockNumber blockNum, bool alloc_only) + { volatile BufferDesc *bufHdr; Block bufBlock; bool found; *** *** 253,269 } else { smgrread(reln->rd_smgr, blockNum, (char *) bufBlock); /* check for garbage data */ if (!PageHeaderIsValid((PageHeader) bufBlock)) { /* ! * During WAL recovery, the first access to any data page should ! * overwrite the whole page from the WAL; so a clobbered page ! * header is not reason to fail. Hence, when InRecovery we may ! * always act as though zero_damaged_pages is ON. */ ! if (zero_damaged_pages || InRecovery) {
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
On Wed, Apr 25, 2007 at 10:00:16AM +0200, Zeugswetter Andreas ADI SD wrote: 1) To deal with partial/inconsistent writes to the data file at crash recovery, we need full page writes at the first modification to pages after each checkpoint. It consumes much of WAL space. We need to find a way around this someday. Other DBs don't do this; it may be because they're less durable, or because they fixed the problem. They either can only detect a failure later (this may be a very long time depending on access and verify runs) or they also write page images. Those that write page images usually write before-images to a different area that is cleared periodically (e.g. during checkpoint). Writing to a different area was considered in pg, but there were more negative issues than positive. So imho pg_compresslog is the correct path forward. The current discussion is only about whether we want a more complex pg_compresslog and no change to current WAL, or an increased WAL size for a less complex implementation. Both would be able to compress the WAL to the same archive log size. Andreas I definitely am in the camp of not increasing WAL size at all. If we need a bit more complexity to ensure that, so be it. Any approach that increases WAL volume would need to have an amazing benefit to make it warranted. This certainly does not meet that criterion. Ken
Re: [HACKERS] autovacuum does not start in HEAD
ITAGAKI Takahiro wrote: I found that the autovacuum launcher does not launch any workers in HEAD. AFAICS, we track the time each database is to be vacuumed in the following way: 1. In rebuild_database_list(), we initialize avl_dbase->adl_next_worker with (current_time + autovacuum_naptime / nDBs). 2. In do_start_worker(), we skip database entries whose adl_next_worker is between current_time and current_time + autovacuum_naptime. 3. If there are no jobs in do_start_worker(), we call rebuild_database_list() to rebuild the database entries. The point is that we use the same range (current_time to current_time + autovacuum_naptime) at 1 and 2. We set adl_next_worker to values in the range at 1, and drop all of them at 2 because their values are in the range. And if there is no database to vacuum, we re-initialize the database list at 3, then we repeat the cycle. Or am I missing something? Note that rebuild_database_list skips databases that don't have stat entries. Maybe that's what's confusing your examination. When the list is empty, workers are launched only every naptime seconds; and then it'll also pick only databases with stat entries. All other databases will be skipped until max_freeze_age is reached. Right after an initdb or a WAL replay, all database stats are deleted. The point of (1) is to spread the starting of workers across the autovacuum_naptime interval. The point of (2) is that we don't want to process a database that was processed too recently (less than autovacuum_naptime seconds ago). This is useful in the cases where databases are dropped, so the launcher is awakened earlier than the schedule would say if the dropped database were not in the list. It is possible that I confused the arithmetic in there (because TimestampDifference does not return negative results, so there may be strange corner cases), but the last time I examined it, it was correct.
The point of (3) is to cover the case where there were no databases previously being autovacuumed but one may now need vacuuming (i.e. just after a database got its stat entry). The fact that some databases may not have stat entries tends to confuse the logic, both in rebuild_database_list and do_start_worker. If it's not documented enough, maybe it needs extra clarification in code comments. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Avoiding unnecessary reads in recovery
Heikki Linnakangas wrote: While working on this, this comment in ReadBuffer caught my eye: /* * During WAL recovery, the first access to any data page should * overwrite the whole page from the WAL; so a clobbered page * header is not reason to fail. Hence, when InRecovery we may * always act as though zero_damaged_pages is ON. */ if (zero_damaged_pages || InRecovery) { But that assumption only holds if full_page_writes is enabled, right? I changed that in the attached patch as well, but even if the rest of the patch isn't accepted, I think that part of it should still be applied. On second thought, my fix still isn't 100% right, because one could turn full_page_writes on before starting replay. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Avoiding unnecessary reads in recovery
Heikki Linnakangas [EMAIL PROTECTED] writes: While working on this, this comment in ReadBuffer caught my eye: /* * During WAL recovery, the first access to any data page should * overwrite the whole page from the WAL; so a clobbered page * header is not reason to fail. Hence, when InRecovery we may * always act as though zero_damaged_pages is ON. */ if (zero_damaged_pages || InRecovery) { But that assumption only holds if full_page_writes is enabled, right? I changed that in the attached patch as well, but even if the rest of the patch isn't accepted, I think that part of it should still be applied. Well, it's only true if full_page_writes was on when the WAL was written. Which isn't necessarily the same as saying it's enabled during recovery... As long as there's a backup block in the log we can use it to clobber pages in the heap -- which is what your patch effectively does anyway. If we're replaying a log entry where there isn't a backup block and we find a damaged page, then we're in trouble. Either the damaged page was in a previous backup block, or it's the recovery itself that's damaging it. In the latter case it would be pretty useful to abort the recovery so the user doesn't lose his backup and has a chance to recover properly (possibly after reporting and fixing the bug). So in short I think with your patch this piece of code no longer has a role. Either your patch kicks in and we never even look at the damaged page at all, or we should be treating it as corrupt data and just check zero_damaged_pages alone and not do anything special in recovery. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
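A sketch of the policy being argued for here (hypothetical Python for illustration; the function and names are invented, not the patch itself): a page that will be fully overwritten from WAL is never read at all, and any page that *is* read during recovery is held to the ordinary zero_damaged_pages rules.

```python
# Hypothetical model of the buffer-read policy discussed above.
PAGE_SIZE = 8192

def header_is_valid(raw):
    # Stand-in for PageHeaderIsValid(); the real check inspects the
    # page header fields. Here a page starting with 0xDEAD is "damaged".
    return not raw.startswith(b"\xde\xad")

def read_page(raw, will_overwrite, zero_damaged_pages):
    if will_overwrite:
        # ZapBuffer case: just allocate a buffer; the caller rewrites
        # the whole page from WAL, so the old contents are never seen.
        return bytearray(PAGE_SIZE)
    if not header_is_valid(raw):
        if zero_damaged_pages:
            # Explicitly requested: warn and zero the damaged page.
            return bytearray(PAGE_SIZE)
        # Otherwise treat it as corruption, even during recovery.
        raise IOError("invalid page header")
    return bytearray(raw)
```

Under this model, the old `zero_damaged_pages || InRecovery` special case disappears: recovery either skips the read entirely or surfaces the corruption.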
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote: I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might be causing this? Hmm, first glance suggests some permission problems. I've never touched a Vista system so far, so I'm at a loss as far as details are concerned. I also saw that wombat is segfaulting in the ecpg tests, not only with CVS HEAD but also when trying to test 8.2. Any idea what's going on with this machine? Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED] Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
[HACKERS] database size estimates
Hi, I am trying to estimate the size of a table composed of 51754000 rows. Each row has 31 attributes: 16 x bit(24) and 15 x bit(8). So, the payload should be: 51754000 x ( 16 x 24 + 15 x 8 ) bits = 3115 MB. Now, from what I understand from the postgresql manual, the overhead is composed of 32 bytes per row + 20 bytes per page. This leads me to approx. 1700 MB overhead, and therefore a total table size of 4900 MB. However, when I load the table, it reaches a size of 21500 MB, i.e., 400% of my estimate. The table has no toast table and no index. So I am wondering if someone could give me a better estimate. Cheers, Francois
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
Michael Meskes wrote: On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote: I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might be causing this? Hmm, first glance suggests some permission problems. Yes, that was my thought as well; however, I ran cacls down the entire build tree, granting full control to Everyone on everything, but the problem persists. Additionally, all the other tests pass, including make check and make installcheck, so unless the ECPG tests are trying to create something outside of the build tree, I don't think it is a permissions issue. Is there anything I can try building/running manually to help debug this? I also saw that wombat is segfaulting in the ecpg tests, not only with CVS HEAD but also when trying to test 8.2. Any idea what's going on with this machine? Michael Can't help with that one, I'm afraid. Regards, Dave
Re: [HACKERS] database size estimates
Francois Deliege wrote: Hi, I am trying to estimate the size of a table composed of 51754000 rows. Each row has 31 attributes: 16 x bit(24) and 15 x bit(8). So, the payload should be: 51754000 x ( 16 x 24 + 15 x 8 ) bits = 3115 MB What data types are those exactly? If those 24-bit fields are in fact text, varchar, char(x) or other data types that are stored as variable-length fields, the varlena header will take 4 bytes. And then there's alignment: those 24-bit fields are almost certainly 4-byte aligned, which means that there'll be padding bytes between them. The upcoming 8.3 release will be much better in that respect: it'll use just a 1-byte varlena header per field instead of 4 bytes for small values like yours. You might want to test a CVS snapshot. Now, from what I understand from the postgresql manual, the overhead is composed of 32 bytes per row + 20 bytes per page. This leads me to approx. 1700 MB overhead, and therefore a total table size of 4900 MB. In addition, there will be on average half a row's worth of wasted space on every page. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
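For what it's worth, a back-of-envelope Python calculation along these lines (my own sketch; it assumes the pre-8.3 on-disk layout of a bit(n) value as a 4-byte varlena header plus a 4-byte bit-length word plus the data bytes, 4-byte aligned -- treat those layout details as assumptions) lands in the right neighbourhood of the observed 21500 MB once page overhead and free space are added on top:

```python
# Back-of-envelope size estimate for 51754000 rows of
# 16 x bit(24) + 15 x bit(8), with per-value varlena header and alignment.

def align(n, a=4):
    # Round n up to the next multiple of a (MAXALIGN-style padding).
    return (n + a - 1) // a * a

def bit_field_size(nbits):
    # Assumed layout: 4-byte varlena header + 4-byte bit-length word
    # + ceil(nbits/8) data bytes, padded to 4-byte alignment.
    return align(4 + 4 + (nbits + 7) // 8)

rows = 51754000
row_payload = 16 * bit_field_size(24) + 15 * bit_field_size(8)  # 372 bytes
row_total = 32 + row_payload          # ~32-byte tuple header -> 404 bytes
table_mb = rows * row_total / 1024**2 # roughly 20000 MB before page overhead
```

So the per-value header and padding, not the bit data itself, dominate the table size here, which is consistent with the ~400% blow-up reported.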
[HACKERS] Kill a Long Running Query
Hi, Can anybody tell me how to kill a long-running query in postgresql? Is there any statement to kill a query? And also tell me how to log slow queries to a log file. Regards J Mageshwaran ** DISCLAIMER ** Information contained and transmitted by this E-MAIL is proprietary to Sify Limited and is intended for use only by the individual or entity to which it is addressed, and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If this is a forwarded message, the content of this E-MAIL may not have been sent with the authority of the Company. If you are not the intended recipient, an agent of the intended recipient or a person responsible for delivering the information to the named recipient, you are notified that any use, distribution, transmission, printing, copying or dissemination of this information in any way or in any manner is strictly prohibited. If you have received this communication in error, please delete this mail and notify us immediately at [EMAIL PROTECTED] Complete Coverage of the ICC World Cup '07! Log on to www.sify.com/khel for latest updates, expert columns, schedule, desktop scorecard, photo galleries and more! Watch the hottest videos from Bollywood, Fashion, News and more only on www.sifymax.com For the Expert view of the ICC World Cup log on to www.sify.com/khel. Read exclusive interviews with Sachin, Ganguly, Yuvraj, Sreesanth, Expert Columns by Gavaskar, Web chat with Dhoni and more!
Re: [HACKERS] [ADMIN] Kill a Long Running Query
On 4/25/07, Mageshwaran [EMAIL PROTECTED] wrote: Hi, Can anybody tell me how to kill a long-running query in postgresql? Is there any statement to kill a query? And also tell me how to log slow queries to a log file. Regards J Mageshwaran See if this helps: http://archives.postgresql.org/pgsql-hackers-win32/2004-12/msg00039.php -- == Aaron Bono Aranya Software Technologies, Inc. http://www.aranya.com http://codeelixir.com ==
Re: [HACKERS] Kill a Long Running Query
Mageshwaran wrote: Hi, Can anybody tell me how to kill a long-running query in postgresql? Is there any statement to kill a query? And also tell me how to log slow queries to a log file. First, please do not cross-post like this. Pick the correct list and use it. Second, this query definitely does not belong on the -hackers list. Third, please find a way of posting to lists that does not include a huge disclaimer and advertisements. If that is added by your company's mail server, you should look at using some other method of posting such as gmail. Fourth, please read our excellent documentation. It contains the answers to your questions, I believe. cheers andrew
Re: [HACKERS] Kill a Long Running Query
Please don't cross-post to multiple mailing lists. And pgsql-hackers is not the correct list for basic usage questions. And long end-of-mail disclaimers are not generally appreciated. Mageshwaran wrote: Can anybody tell me how to kill a long-running query in postgresql? Is there any statement to kill a query? See the user manual on administration functions, pg_cancel_backend in particular: http://www.postgresql.org/docs/8.2/interactive/functions-admin.html Basically you issue SELECT * FROM pg_stat_activity, or plain ps, to find out the pid of the backend executing the long-running query, and then use pg_cancel_backend (or kill -INT) to cancel it. and also tell me how to log slow queries to a log file. Use the log_min_duration_statement configuration variable. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Avoiding unnecessary reads in recovery
Gregory Stark wrote: So in short I think with your patch this piece of code no longer has a role. Either your patch kicks in and we never even look at the damaged page at all, or we should be treating it as corrupt data and just check zero_damaged_pages alone and not do anything special in recovery. Good point. Adjusted patch attached. I added some comments as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/access/transam/xlogutils.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xlogutils.c,v retrieving revision 1.49 diff -c -r1.49 xlogutils.c *** src/backend/access/transam/xlogutils.c 5 Jan 2007 22:19:24 - 1.49 --- src/backend/access/transam/xlogutils.c 25 Apr 2007 14:27:05 - *** *** 226,232 if (blkno < lastblock) { /* page exists in file */ ! buffer = ReadBuffer(reln, blkno); } else { --- 226,235 if (blkno < lastblock) { /* page exists in file */ ! if (init) ! buffer = ZapBuffer(reln, blkno); ! else ! buffer = ReadBuffer(reln, blkno); } else { Index: src/backend/storage/buffer/bufmgr.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/buffer/bufmgr.c,v retrieving revision 1.216 diff -c -r1.216 bufmgr.c *** src/backend/storage/buffer/bufmgr.c 30 Mar 2007 18:34:55 - 1.216 --- src/backend/storage/buffer/bufmgr.c 25 Apr 2007 14:36:15 - *** *** 17,22 --- 17,26 * and pin it so that no one can destroy it while this process * is using it. * + * ZapBuffer() -- like ReadBuffer, but destroys the contents of the + * page. Used in recovery when the page is completely overwritten + * from WAL. + * * ReleaseBuffer() -- unpin a buffer * * MarkBufferDirty() -- mark a pinned buffer's contents as dirty.
*** *** 97,102 --- 101,107 static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty, int set_flag_bits); static void buffer_write_error_callback(void *arg); + static Buffer ReadBuffer_common(Relation reln, BlockNumber blockNum, bool alloc_only); static volatile BufferDesc *BufferAlloc(Relation reln, BlockNumber blockNum, bool *foundPtr); static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln); *** *** 121,126 --- 126,155 Buffer ReadBuffer(Relation reln, BlockNumber blockNum) { + return ReadBuffer_common(reln, blockNum, false); + } + + /* + * ZapBuffer -- like ReadBuffer, but doesn't read the contents of the page + * from disk. The caller is expected to completely rewrite the page, + * regardless of the current contents. This should only be used in + * recovery where there's no concurrent readers that might see the + * contents of the page before the caller rewrites it. + */ + Buffer + ZapBuffer(Relation reln, BlockNumber blockNum) + { + Assert(InRecovery); + + return ReadBuffer_common(reln, blockNum, true); + } + + /* + * ReadBuffer_common -- common logic of ReadBuffer and ZapBuffer. + */ + static Buffer + ReadBuffer_common(Relation reln, BlockNumber blockNum, bool alloc_only) + { volatile BufferDesc *bufHdr; Block bufBlock; bool found; *** *** 253,269 } else { smgrread(reln->rd_smgr, blockNum, (char *) bufBlock); /* check for garbage data */ if (!PageHeaderIsValid((PageHeader) bufBlock)) { /* ! * During WAL recovery, the first access to any data page should ! * overwrite the whole page from the WAL; so a clobbered page ! * header is not reason to fail. Hence, when InRecovery we may ! * always act as though zero_damaged_pages is ON. */ ! if (zero_damaged_pages || InRecovery) { ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), --- 282,304 } else { + /* + * Read in the page, unless the caller intends to overwrite it + * and just wants us to allocate a buffer.
+ */ + if (!alloc_only) + { smgrread(reln->rd_smgr, blockNum, (char *) bufBlock); /* check for garbage data */ if (!PageHeaderIsValid((PageHeader) bufBlock)) { /* ! * We used to ignore corrupt pages in WAL recovery, but ! * that was only ever safe if full_page_writes was enabled. ! * Now the caller sets alloc_only to true if he intends to ! * overwrite the whole page, which is already checked above. */ ! if (zero_damaged_pages) { ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), *** *** 277,282 --- 312,318 errmsg("invalid page header in block %u of relation \"%s\"", blockNum, RelationGetRelationName(reln))); } + } } if (isLocalBuf) Index: src/include/storage/bufmgr.h === RCS file:
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
Dave Page wrote: Michael Meskes wrote: On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote: I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might be causing this? Hmm, first glance suggests some permission problems. Yes, that was my thought as well; however, I ran cacls down the entire build tree, granting full control to Everyone on everything, but the problem persists. Please don't do that on your buildfarm repo copy (if that's what you did). You should not touch *anything* inside it. If you need to do this, make a copy (see later) and alter that. If you did do this to the buildfarm repo copy, please blow it away so that buildfarm will get a fresh clean copy next time it runs. Additionally, all the other tests pass, including make check and make installcheck, so unless the ECPG tests are trying to create something outside of the build tree, I don't think it is a permissions issue. Is there anything I can try building/running manually to help debug this? To recreate the failing buildfarm environment (including making an alterable repo copy), run the following (with suitable local mods): run_build.pl --test --keepall At the end of that you should have the build tree that was actually used, and the install tree too if it got that far. Then you can start playing manually. Remember to move or remove the kept trees before your next buildfarm run. cheers andrew
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
Andrew Dunstan wrote: Please don't do that on your buildfarm repo copy (if that's what you did). You should not touch *anything* inside it. If you need to do this, make a copy (see later) and alter that. If you did do this to the buildfarm repo copy, please blow it away so that buildfarm will get a fresh clean copy next time it runs. No, I reset the permissions on /msys. That's entirely necessary on Vista, where the permissions are so locked down by default that you can't even create it without some measure of pain. Regards, Dave.
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
Dave Page wrote: Andrew Dunstan wrote: Please don't do that on your buildfarm repo copy (if that's what you did). You should not touch *anything* inside it. If you need to do this, make a copy (see later) and alter that. If you did do this to the buildfarm repo copy, please blow it away so that buildfarm will get a fresh clean copy next time it runs. No, I reset the permissions on /msys. That's entirely necessary on Vista, where the permissions are so locked down by default that you can't even create it without some measure of pain. Yes, my Vista experience so far has been very unpleasant indeed. It strikes me as the OS equivalent of jumping the shark. I haven't even thought about Msys ... cheers andrew
Re: [HACKERS] Avoiding unnecessary reads in recovery
On Wed, 2007-04-25 at 13:48 +0100, Heikki Linnakangas wrote: I was surprised how big a difference it makes, but when you think about it, it's logical. Without the patch, it's doing roughly the same I/O as the test itself, reading in pages, modifying them, and writing them back. With the patch, all the reads are done sequentially from the WAL, and then written back in a batch at the end of the WAL replay, which is a lot more efficient. Interesting patch. It would be good to see a longer-term test. I'd expect things to fall back to about 50% of the time on a longer recovery where the writes need to be written out of cache. I was thinking of getting the bgwriter to help out to improve matters there. Patch-wise, I love the name ZapBuffer() but would think that OverwriteBuffer() would be more descriptive. As regards the zero_damaged_pages question, I raised that some time ago but we didn't arrive at an explicit answer. All I would say is we can't allow invalid pages in the buffer manager at any time, whatever options we have requested, otherwise other code will fail almost immediately. I'm not sure there's another option? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] temporal variants of generate_series()
On Thu, 2007-04-12 at 14:56 -0700, Andrew Hammond wrote: I've written the following function definitions to extend generate_series to support some temporal types (timestamptz, date and time). Please include them if there's sufficient perceived need or value. I could see these being useful, but a PL/PgSQL implementation is not eligible for inclusion in the core backend (since PL/PgSQL is not enabled by default). -Neil ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
On 4/25/07, Michael Meskes [EMAIL PROTECTED] wrote: I also saw that wombat is segfaulting in ecpg tests but not only with CVS HEAD but also trying to test 8.2. Any idea what's going on with this machine? I generated a stack trace for REL8_2_STABLE, but I'm not sure how helpful it is. Let me know what other information I can provide... Looks like I don't have symbols for libc.

Core was generated by `sql/update '.
Program terminated with signal 11, Segmentation fault.
#0  ECPGget_variable (ap=<value optimized out>, type=<value optimized out>, var=0x10028730, indicator=1 '\001') at execute.c:131
131             var->ind_value = *((char **) (var->ind_pointer));
(gdb) bt
#0  ECPGget_variable (ap=<value optimized out>, type=<value optimized out>, var=0x10028730, indicator=1 '\001') at execute.c:131
#1  0x04048948 in ECPGdo (lineno=28, compat=<value optimized out>, force_indicator=<value optimized out>, connection_name=<value optimized out>, query=<value optimized out>) at execute.c:195
#2  0x1d20 in main (argc=<value optimized out>, argv=<value optimized out>) at update.pgc:28
#3  0x0463ce4c in .generic_start_main () from /lib/libc.so.6
#4  0x0463d0f8 in .__libc_start_main () from /lib/libc.so.6
#5  0x in ?? ()

Regards, Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
Mark Wong wrote: On 4/25/07, Michael Meskes [EMAIL PROTECTED] wrote: I also saw that wombat is segfaulting in ecpg tests but not only with CVS HEAD but also trying to test 8.2. Any idea what's going on with this machine? I generated a stack trace for REL8_2_STABLE, but I'm not sure how helpful it is. Let me know what other information I can provide... Looks like I don't have symbols for libc.

Core was generated by `sql/update '.
Program terminated with signal 11, Segmentation fault.
#0  ECPGget_variable (ap=<value optimized out>, type=<value optimized out>, var=0x10028730, indicator=1 '\001') at execute.c:131
131             var->ind_value = *((char **) (var->ind_pointer));
(gdb) bt
#0  ECPGget_variable (ap=<value optimized out>, type=<value optimized out>, var=0x10028730, indicator=1 '\001') at execute.c:131
#1  0x04048948 in ECPGdo (lineno=28, compat=<value optimized out>, force_indicator=<value optimized out>, connection_name=<value optimized out>, query=<value optimized out>) at execute.c:195
#2  0x1d20 in main (argc=<value optimized out>, argv=<value optimized out>) at update.pgc:28
#3  0x0463ce4c in .generic_start_main () from /lib/libc.so.6
#4  0x0463d0f8 in .__libc_start_main () from /lib/libc.so.6
#5  0x in ?? ()

I think you'll need to compile with optimisation turned off and then try running the test under debugger control, putting a breakpoint in ECPGget_variable() and then stepping through it. I wonder what value of var->ind_pointer it is getting? cheers andrew ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Avoiding unnecessary reads in recovery
Simon Riggs [EMAIL PROTECTED] writes: As regards the zero_damaged_pages question, I raised that some time ago but we didn't arrive at an explicit answer. All I would say is we can't allow invalid pages in the buffer manager at any time, whatever options we have requested, otherwise other code will fail almost immediately. Yeah --- the proposed new bufmgr routine should probably explicitly zero the content of the buffer. It doesn't really matter in the context of WAL recovery, since there can't be any concurrent access to the buffer, but it'd make it safe to use in non-WAL contexts (I think there are other places where we know we are going to init the page and so a physical read is a waste of time). Also, this would let the patch be

+       if (alloc_only)
+               MemSet...
+       else
+               smgrread...

and you don't need to hack out the PageHeaderIsValid test, since it will allow zeroed pages. Possibly ReadZeroedBuffer would be a better name? regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
Andrew Dunstan [EMAIL PROTECTED] writes: I think you'll need to compile with optimisation turned off and then try running the test under debugger control, putting a breakpoint in ECPGget_variable() and then stepping through it. I wonder what value of var->ind_pointer it is getting? You could probably inspect the contents of *var in that dump without having to recompile. Given that this is PPC64, I'm betting on a pointer size or alignment problem ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Andreas, Writing to a different area was considered in pg, but there were more negative issues than positive. So imho pg_compresslog is the correct path forward. The current discussion is only about whether we want a more complex pg_compresslog and no change to current WAL, or an increased WAL size for a less complex implementation. Both would be able to compress the WAL to the same archive log size. Huh? As conceived, pg_compresslog does nothing to lower log volume for general purposes, just on-disk storage size for archiving. It doesn't help us at all with the tremendous amount of log we put out for an OLTP server, for example. Not that pg_compresslog isn't useful on its own for improving warm standby manageability, but it's completely separate from addressing the "we're logging too much" issue. -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
On 4/25/07, Tom Lane [EMAIL PROTECTED] wrote: Andrew Dunstan [EMAIL PROTECTED] writes: I think you'll need to compile with optimisation turned off and then try running the test under debugger control, putting a breakpoint in ECPGget_variable() and then stepping through it. I wonder what value of var->ind_pointer it is getting? You could probably inspect the contents of *var in that dump without having to recompile. Given that this is PPC64, I'm betting on a pointer size or alignment problem ... Does this help?

(gdb) p var->type
$1 = 4267828624
(gdb) p var->value
$2 = (void *) 0x1
(gdb) p var->pointer
$3 = (void *) 0x1
(gdb) p var->varcharsize
$4 = 3
(gdb) p var->arrsize
$5 = 4
(gdb) p var->offset
$6 = 29
(gdb) p var->ind_type
$7 = 0
(gdb) p var->ind_pointer
$8 = (void *) 0x0
(gdb) p var->ind_varcharsize
$9 = 0
(gdb) p var->ind_arrsize
$10 = 0
(gdb) p var->ind_offset
$11 = 5
(gdb) p var->next
$12 = (struct variable *) 0x0

Regards, Mark ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [BUGS] BUG #3245: PANIC: failed to re-find shared loc k o b j ect
I wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: We could have two kinds of seq scans, with and without support for concurrent inserts. Yeah, I considered that too, but it just seems too error-prone. We could maybe make it trustworthy by having hash_seq_search complain if it noticed there had been any concurrent insertions --- but then you're putting new overhead into hash_seq_search, which kind of defeats the argument for it (and hash_seq_search is a bit of a bottleneck, so extra cycles there matter). I just finished looking through the uses of hash_seq_search, and realized that there is one place where it would be a bit painful to convert to the insertion-safe approach I'm proposing; namely nodeAgg.c. The places where the hashtable iteration is started and used are scattered, and we don't really track whether the iteration is done or not, so it's hard to be sure where to cancel the iteration. It could probably be made to work but it seems like it'd be fragile. I still don't want to introduce more checking overhead into hash_seq_search, though, so what I'm now thinking about is a new dynahash primitive named something like hash_freeze, which'd mark a hashtable as disallowing insertions. If the hashtable is frozen before hash_seq_init then we don't add it to the central list of scans, and therefore there is no cleanup to do at the end. nodeAgg can use this mode since it doesn't modify its hashtable anymore after beginning its readout scan. BTW, we didn't really get into details, but for the insertion-safe case I'm envisioning adding a routine hash_seq_term, which you would need to call if and only if you abandon a hash_seq_search scan without running it to completion (if you do the complete scan, hash_seq_search will automatically call hash_seq_term before returning NULL). All but a very small number of places run their searches to completion and therefore won't require any source code changes with this API. Thoughts? 
regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] strange buildfarm failures
Stefan Kaltenbrunner wrote: two of my buildfarm members had different but pretty weird looking failures lately: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03 and http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02 any ideas on what might be causing those? lionfish just failed too: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09 Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
On 4/25/07, Tom Lane [EMAIL PROTECTED] wrote: Mark Wong [EMAIL PROTECTED] writes: Does this help? (gdb) p var->ind_pointer $8 = (void *) 0x0 Well, that seems to be the reason why it's failing to indirect through ind_pointer ... but why is it only failing on your machine and not everyone else's? I think this indicates something unportable about ecpg's usage of va_list. Hmm, and I don't have to look far to find a smoking gun:

#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__))
        if (create_statement(lineno, compat, force_indicator, con, &stmt, query, args) == false)
#else
        if (create_statement(lineno, compat, force_indicator, con, &stmt, query, &args) == false)
#endif

Why in the world is that like that? We don't have such a kluge anyplace else we use va_list. stringinfo.c for instance has never needed any such thing. Mark, does your gcc define __powerpc__, or only __powerpc64__?

$ touch foo.c; gcc -E -dM foo.c | grep __p ; rm foo.c
#define __powerpc64__ 1
#define __powerpc__ 1

Regards, Mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
Tom Lane wrote: Hmm, and I don't have to look far to find a smoking gun:

#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__))
        if (create_statement(lineno, compat, force_indicator, con, &stmt, query, args) == false)
#else
        if (create_statement(lineno, compat, force_indicator, con, &stmt, query, &args) == false)
#endif

I also see:

#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__))
#define APREF ap
#else
#define APREF *ap
#endif

But I also see that my amd64/FC6 machine does pass these tests with gcc. It would certainly be nice if we could simplify all this. And if not, we should have a note about why it's needed. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] My upcoming travel
I am leaving Friday for a two week trip to Sydney and Melbourne, Australia, Mumbai and Pune, India, and London, England. I will be attending Open Cebit in Sydney, and a Sydney PostgreSQL Users Group meeting. London is planning to put together a user group meeting. I expect both events to appear on the main PostgreSQL web page once they are finalized. If Melbourne, Mumbai, or Pune have user groups, I would be glad to visit them too. I will try to read email during the trip, but obviously not as frequently. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)
Andrew Dunstan [EMAIL PROTECTED] writes: But I also see that my amd64/FC6 machine does pass these tests with gcc. Yeah, but the typedef represented by va_list can and probably does vary between amd64 and ppc64. I haven't an easy way to check, but I wonder whether it's not an array type on ppc. I'm of the opinion that ecpg is trying to do something here that's not portable. The C99 draft I have says [#3] The type declared is va_list which is an object type suitable for holding information needed by the macros va_start, va_arg, va_end, and va_copy. If access to the varying arguments is desired, the called function shall declare an object (referred to as ap in this subclause) having type va_list. The object ap may be passed as an argument to another function; if that function invokes the va_arg macro with parameter ap, the value of ap in the calling function is indeterminate and shall be passed to the va_end macro prior to any further reference to ap. (198) 198 It is permitted to create a pointer to a va_list and pass that pointer to another function, in which case the original function may make further use of the original list after the other function returns. The footnote seems to say that what the code is doing is OK ... but there isn't any such footnote in the Single Unix Spec: http://www.opengroup.org/onlinepubs/007908799/xsh/stdarg.h.html which makes me wonder just how portable it really is. My recommendation is to get rid of the APREF hack, deal only in va_list not va_list *, and inline ECPGget_variable into the two places it's used to avoid the question of passing va_lists around after they've been modified. The routine's not that big (especially seeing that only half of it is actually shared by the two callers) and it's just not worth the notational effort, let alone any portability risks, to factor it out. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fragmentation project
Josh, On 4/23/07, Josh Berkus [EMAIL PROTECTED] wrote: Gustavo, Oh, you're talking about distributing partitions across different nodes and parallelizing queries. No, we don't do that today. Yes. This is the goal. Well, I will try it. I'll send the project reports to this list. Comments will be valuable. Wish me good luck... You might join/look at the PgPoolII project, which is working on parallel query among other things. pgpool is an interesting approach to this, but I think the functionality of inserting a record at one backend that gets redirected to another, and of detecting deadlocks when acquiring locks on the referenced records/tables across several hosts, places heavy demands on the network. So, IMO, this should be implemented inside the DBMS. As Marko wrote, this is a non-trivial solution... True, pgpool could be improved into a process coordinator, but that would need some changes in the backend too. This is a non-trivial implementation, but there are several users waiting for an effective solution for distributing data across a cluster. These users currently buy commercial solutions or build one themselves. Gustavo. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fragmentation project
Marko, On 4/24/07, Marko Kreen [EMAIL PROTECTED] wrote: On 4/23/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Oh, you're talking about distributing partitions across different nodes and parallelizing queries. No, we don't do that today. PL/Proxy actually works like that, only in smaller scope - for function calls only. A general solution that partitions free-form SQL will be non-trivial... I think the proposed functionalities cannot be implemented in PL scope... Gustavo. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Re: [BUGS] BUG #3245: PANIC: failed to re-find shared loc k o b j ect
Tom Lane wrote: I wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: We could have two kinds of seq scans, with and without support for concurrent inserts. Yeah, I considered that too, but it just seems too error-prone. We could maybe make it trustworthy by having hash_seq_search complain if it noticed there had been any concurrent insertions --- but then you're putting new overhead into hash_seq_search, which kind of defeats the argument for it (and hash_seq_search is a bit of a bottleneck, so extra cycles there matter). I just finished looking through the uses of hash_seq_search, and realized that there is one place where it would be a bit painful to convert to the insertion-safe approach I'm proposing; namely nodeAgg.c. The places where the hashtable iteration is started and used are scattered, and we don't really track whether the iteration is done or not, so it's hard to be sure where to cancel the iteration. It could probably be made to work but it seems like it'd be fragile. I still don't want to introduce more checking overhead into hash_seq_search, though, so what I'm now thinking about is a new dynahash primitive named something like hash_freeze, which'd mark a hashtable as disallowing insertions. If the hashtable is frozen before hash_seq_init then we don't add it to the central list of scans, and therefore there is no cleanup to do at the end. nodeAgg can use this mode since it doesn't modify its hashtable anymore after beginning its readout scan. This plan includes having the list of hash tables that mustn't be expanded? And the list would be cleaned up at the end of transaction, to avoid leaks. BTW, we didn't really get into details, but for the insertion-safe case I'm envisioning adding a routine hash_seq_term, which you would need to call if and only if you abandon a hash_seq_search scan without running it to completion (if you do the complete scan, hash_seq_search will automatically call hash_seq_term before returning NULL). 
All but a very small number of places run their searches to completion and therefore won't require any source code changes with this API. Sounds good to me. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Avoiding unnecessary reads in recovery
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: As regards the zero_damaged_pages question, I raised that some time ago but we didn't arrive at an explicit answer. All I would say is we can't allow invalid pages in the buffer manager at any time, whatever options we have requested, otherwise other code will fail almost immediately. Yeah --- the proposed new bufmgr routine should probably explicitly zero the content of the buffer. It doesn't really matter in the context of WAL recovery, since there can't be any concurrent access to the buffer, but it'd make it safe to use in non-WAL contexts (I think there are other places where we know we are going to init the page and so a physical read is a waste of time). Is there? I can't think of any. Extending a relation doesn't count. Zeroing the buffer explicitly might be a good idea anyway. I agree it feels a bit dangerous to have a moment when there's a garbled page in buffer cache, even if only in recovery. Also, this would let the patch be

+       if (alloc_only)
+               MemSet...
+       else
+               smgrread...

and you don't need to hack out the PageHeaderIsValid test, since it will allow zeroed pages. Well, I'd still put the PageHeaderIsValid test in the else-branch. Not like the few cycles spent on the test matter, but I don't see a reason not to. Possibly ReadZeroedBuffer would be a better name? Yeah, sounds good. "Read" is a bit misleading, since it doesn't actually read in the buffer, but it's good that the name is closely related to ReadBuffer. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [BUGS] BUG #3245: PANIC: failed to re-find shared loc k o b j ect
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: I still don't want to introduce more checking overhead into hash_seq_search, though, so what I'm now thinking about is a new dynahash primitive named something like hash_freeze, which'd mark a hashtable as disallowing insertions. If the hashtable is frozen before hash_seq_init then we don't add it to the central list of scans, and therefore there is no cleanup to do at the end. nodeAgg can use this mode since it doesn't modify its hashtable anymore after beginning its readout scan. This plan includes having the list of hash tables that mustn't be expanded? And the list would be cleaned up at the end of transaction, to avoid leaks. Right, all that's still the same. This is just a way to exempt certain scans from that machinery, by allowing the caller to declare he doesn't need to modify the hashtable anymore. AFAICS that covers our needs, at least for the present. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Avoiding unnecessary reads in recovery
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: but it'd make it safe to use in non-WAL contexts (I think there are other places where we know we are going to init the page and so a physical read is a waste of time). Is there? I can't think of any. Extending a relation doesn't count. No, but re-using a free page in an index does. I'm not sure which index AMs know for sure the page is free, and which have to read it and check, but I think there's at least some scope for that. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Josh Berkus [EMAIL PROTECTED] writes: Andreas, So imho pg_compresslog is the correct path forward. The current discussion is only about whether we want a more complex pg_compresslog and no change to current WAL, or an increased WAL size for a less complex implementation. Both would be able to compress the WAL to the same archive log size. Huh? As conceived, pg_compresslog does nothing to lower log volume for general purposes, just on-disk storage size for archiving. It doesn't help us at all with the tremendous amount of log we put out for an OLTP server, for example. I don't see how what you said refutes what he said. The sticking point here is that the patch as-proposed *increases* the log volume before compression. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] strange buildfarm failures
Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Stefan Kaltenbrunner wrote: two of my buildfarm members had different but pretty weird looking failures lately: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03 and http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02 any ideas on what might be causing those? lionfish just failed too: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09 And had a similar failure a few days ago. The curious thing is that what we get in the postmaster log is LOG: server process (PID 23405) was terminated by signal 6: Aborted LOG: terminating any other active server processes You would think SIGABRT would come from an assertion failure, but there's no preceding assertion message in the log. The other characteristic of these crashes is that *all* of the failing regression instances report "terminating connection because of crash of another server process", which suggests strongly that the crash was in an autovacuum process (if it were bgwriter or stats collector the postmaster would've said so). So I think the recent autovac patches are at fault. I spent a bit of time trolling for a spot where the code might abort() without having printed anything, but didn't find one. If any of the buildfarm owners can get a stack trace from the core dump of one of these events, it'd be mighty helpful. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] database size estimates
Heikki Linnakangas [EMAIL PROTECTED] writes: Francois Deliege wrote: Hi, I am trying to estimate the size of a table composed of 51754000 rows. Each row has 31 attributes: 16 x bit(24) and 15 x bit(8) So, the payload should be: 51754000 x ( 16 x 24 + 15 x 8 ) bits = 3115 MB What data types are those exactly? If those 24-bit fields are in fact text, varchar, char(x) or other data types that are stored as variable length fields, And sadly that includes bit() if you're being literal. As of Postgres 8.1 you can see how much space a column is taking up using the pg_column_size() function. This won't include alignment padding but will include the length header for that column. You can see how much a given row is taking up by passing the entire row to pg_column_size with something like pg_column_size(tab.*) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fragmentation project
Gustavo, The pgpool is an interesting approach to this, but I think that the funcionality of inserting a record at a backend which will be redirectioned to other and verifying deadlocks under network demands in acquiring locks on the referenced records/tables in several hosts. Then, IMO, this may be implemented inside dbms. How Marko wrote, this is a non-trivial solution... Really, It could be improved on pgpool to be a process coordinator, but will need some changes in backend too. This is a non trivial implementation, but there are several users waiting for an effective solution for data distributing in a cluster. These users actually buy commercial solutions or build themselves one. Yeah, I was just thinking that if you start from scratch you're not likely to get very far ... you might look at some of the existing partial solutions (pgPoolII, PostgreSQLForest, ExtenDB, etc.) and see if you can improve them. --Josh ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Hi, Zeugswetter Andreas ADI SD wrote: I don't insist on the name or the default of the GUC parameter. I'm afraid wal_fullpage_optimization = on (default) makes some confusion because the default behavior becomes a bit different on WAL itself. Seems my wal_fullpage_optimization is not a good name if it caused misinterpretation already :-( Amount of WAL after 60min. run of DBT-2 benchmark wal_add_optimization_info = off (default) 3.13GB how about wal_fullpage_optimization = on (default) The meaning of wal_fullpage_optimization = on (default) would be the same as your wal_add_optimization_info = off (default). (Reversed name, reversed meaning of the boolean value) It would be there to *turn off* the (default) WAL full_page optimization. For your pg_compresslog it would need to be set to off. add_optimization_info sounded like added info about/for some optimization which it is not. We turn off an optimization with the flag for the benefit of an easier pg_compresslog implementation. For pg_compresslog to remove full page writes, we need wal_add_optimization_info=on. As already said I would decouple this setting from the part that sets the removable full page flag in WAL, and making the recovery able to skip dummy records. This I would do unconditionally. Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- - Koichi Suzuki ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] autovacuum does not start in HEAD
I wrote: I found that autovacuum launcher does not launch any workers in HEAD. The attached autovacuum-fix.patch could fix the problem. I changed the decision of the next autovacuum target to use 'greater or equal' instead of 'greater'. The point is timer resolution; on some platforms the timer only has millisecond resolution. We initialize adl_next_worker with current_time in rebuild_database_list(), but we can see the same value again in do_start_worker(), because there is no measurable difference on those low-resolution platforms. Another attached patch, autovacuum-debug.patch, is just for printf-debugging. I got the following logs without the fix -- autovacuum never works. # SELECT oid, datname FROM pg_database ORDER BY oid; oid | datname ---+--- 1 | template1 11494 | template0 11495 | postgres 16384 | bench (4 rows) # pgbench bench -s1 -c1 -t10 [with configurations of autovacuum_naptime = 10s and log_min_messages = debug1] LOG: do_start_worker skip : 230863399.25, 230863399.25, 230863409.25 LOG: rebuild_database_list: db=11495, time=230863404.25 LOG: rebuild_database_list: db=16384, time=230863409.25 DEBUG: autovacuum: processing database bench LOG: do_start_worker skip : 230863404.25, 230863404.25, 230863414.25 LOG: do_start_worker skip : 230863404.25, 230863409.25, 230863414.25 LOG: rebuild_database_list: db=11495, time=230863409.25 LOG: rebuild_database_list: db=16384, time=230863414.25 LOG: do_start_worker skip : 230863409.25, 230863409.25, 230863419.25 LOG: do_start_worker skip : 230863409.25, 230863414.25, 230863419.25 LOG: rebuild_database_list: db=11495, time=230863414.25 LOG: rebuild_database_list: db=16384, time=230863419.25 ... (no autovacuum activities forever) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center autovacuum-debug.patch Description: Binary data autovacuum-fix.patch Description: Binary data ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] strange buildfarm failures
Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Stefan Kaltenbrunner wrote: two of my buildfarm members had different but pretty weird-looking failures lately: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03 and http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02 any ideas on what might be causing those? Just for the record, the quagga and emu failures don't seem related to the report below. They don't crash; their regression.diffs contain data that suggests there may be data corruption of some sort. INSERT INTO INET_TBL (c, i) VALUES ('192.168.1.2/30', '192.168.1.226'); ERROR: invalid cidr value: %{ This doesn't seem to make much sense. lionfish just failed too: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09 And had a similar failure a few days ago. The curious thing is that what we get in the postmaster log is LOG: server process (PID 23405) was terminated by signal 6: Aborted LOG: terminating any other active server processes You would think SIGABRT would come from an assertion failure, but there's no preceding assertion message in the log. The other characteristic of these crashes is that *all* of the failing regression instances report "terminating connection because of crash of another server process", which strongly suggests that the crash was in an autovacuum process (if it were the bgwriter or stats collector, the postmaster would've said so). So I think the recent autovac patches are at fault. I spent a bit of time trolling for a spot where the code might abort() without having printed anything, but didn't find one. Hmm. I kept an eye on the buildfarm for a few days, but saw nothing that could be connected to autovacuum, so I neglected it. This is the other failure: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-20%2005:30:14 It shows the same pattern. I am baffled -- I don't understand how it can die without reporting the error.
Apparently it crashes rather frequently, so it shouldn't be too difficult to reproduce on manual runs. If we could get it to run with a higher debug level, it might prove helpful in further pinpointing the problem. The core file would be much better, obviously (first and foremost to confirm that it's autovacuum that's crashing ...) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
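One possible way to follow up on the suggestion above; this is a sketch only, and the specific commands, the debug level, and the use of pg_ctl/gdb are my assumptions, not steps taken from the thread:

```shell
#!/bin/sh
# Allow a crashing backend to leave a core file (illustrative sketch).
ulimit -c unlimited
echo "core file size limit: $(ulimit -c)"

# With a server installed, one might then (not executed here):
#   pg_ctl -D "$PGDATA" -o "-d 2" start       # restart at a higher debug level
#   make installcheck                         # re-run the regression suite
#   gdb "$(command -v postgres)" "$PGDATA"/core -ex bt -ex quit   # backtrace
```

A backtrace from the core file would settle immediately whether the aborting process is an autovacuum worker, without waiting for another scheduled buildfarm run.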
Re: [HACKERS] strange buildfarm failures
Alvaro Herrera wrote: Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Stefan Kaltenbrunner wrote: two of my buildfarm members had different but pretty weird-looking failures lately: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03 and http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02 any ideas on what might be causing those? Just for the record, the quagga and emu failures don't seem related to the report below. They don't crash; their regression.diffs contain data that suggests there may be data corruption of some sort. INSERT INTO INET_TBL (c, i) VALUES ('192.168.1.2/30', '192.168.1.226'); ERROR: invalid cidr value: %{ This doesn't seem to make much sense. Yeah, on further reflection the failures from emu and quagga seem unrelated to the issue lionfish is experiencing. lionfish just failed too: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09 And had a similar failure a few days ago. The curious thing is that what we get in the postmaster log is LOG: server process (PID 23405) was terminated by signal 6: Aborted LOG: terminating any other active server processes You would think SIGABRT would come from an assertion failure, but there's no preceding assertion message in the log. The other characteristic of these crashes is that *all* of the failing regression instances report "terminating connection because of crash of another server process", which strongly suggests that the crash was in an autovacuum process (if it were the bgwriter or stats collector, the postmaster would've said so). So I think the recent autovac patches are at fault. I spent a bit of time trolling for a spot where the code might abort() without having printed anything, but didn't find one. Hmm. I kept an eye on the buildfarm for a few days, but saw nothing that could be connected to autovacuum, so I neglected it.
This is the other failure: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-20%2005:30:14 It shows the same pattern. I am baffled -- I don't understand how it can die without reporting the error. I should have mentioned that initially, but I think the failure from 2007-04-20 is not related at all; it was very likely caused by the kernel running totally out of memory (lionfish is a very resource-starved box with only 48MB of RAM, and 128MB of swap at that time -- do we have a recent patch that increases memory usage quite a lot?). I immediately added another 128MB of swap after that, and I don't think the failure from yesterday is the same (at least there are no kernel logs that indicate a similar issue). Apparently it crashes rather frequently, so it shouldn't be too difficult to reproduce on manual runs. If we could get it to run with a higher debug level, it might prove helpful in further pinpointing the problem. a manual run of the buildfarm script takes ~4.5 hours on lionfish ;-) The core file would be much better, obviously (first and foremost to confirm that it's autovacuum that's crashing ...) I will see what I can come up with ... Stefan ---(end of broadcast)--- TIP 6: explain analyze is your friend