[PATCHES] code cleanup in dynahash.c
This patch makes a minor code cleanup in dynahash.c: a function declared to return 'bool' only ever returned true, so I changed it to return void. I'll apply this to CVS before the end of the day, barring any objections.

-Neil

--- src/backend/utils/hash/dynahash.c
+++ src/backend/utils/hash/dynahash.c
@@ -66,7 +66,7 @@
 static bool element_alloc(HTAB *hashp, int nelem);
 static bool dir_realloc(HTAB *hashp);
 static bool expand_table(HTAB *hashp);
-static bool hdefault(HTAB *hashp);
+static void hdefault(HTAB *hashp);
 static bool init_htab(HTAB *hashp, long nelem);
 static void hash_corrupted(HTAB *hashp);
@@ -178,8 +178,7 @@
 		return NULL;
 	}
-	if (!hdefault(hashp))
-		return NULL;
+	hdefault(hashp);
 	hctl = hashp->hctl;
 #ifdef HASH_STATISTICS
@@ -254,7 +253,7 @@
 /*
  * Set default HASHHDR parameters.
  */
-static bool
+static void
 hdefault(HTAB *hashp)
 {
 	HASHHDR    *hctl = hashp->hctl;
@@ -268,8 +268,6 @@
 	hctl->nentries = 0;
 	hctl->nsegs = 0;
-	/* I added these MS. */
-
 	/* rather pointless defaults for key entry size */
 	hctl->keysize = sizeof(char *);
 	hctl->entrysize = 2 * sizeof(char *);
@@ -279,8 +277,6 @@
 	/* garbage collection for HASH_REMOVE */
 	hctl->freeList = NULL;
-
-	return true;
 }

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PATCHES] hash_create(): check return code
Neil Conway <[EMAIL PROTECTED]> writes:
> I'm not really sure whether this is the correct fix, but it certainly seems wrong to be doing the check in some places and not in others. Another approach would be to elog(ERROR) when an error occurs in hash_create(), which would be fine except there might be some circumstances in which hash_create() is invoked but elog(ERROR) is not set up yet.

There are no places where hash_create is called before elog() is functional. I'd go for putting the error into hash_create, I think, because I can't imagine any caller not wanting to error out. (If there is any such caller, it can always catch it with PG_TRY.)

regards, tom lane
Re: [PATCHES] hash_create(): check return code
On Fri, 2004-10-22 at 16:13, Tom Lane wrote:
> There are no places where hash_create is called before elog() is functional.

Well, it's invoked from the statistics collector, which avoids doing elog(ERROR) for some reason. But my guess is that it should be workable to get elog(ERROR) / elog(FATAL) working in the statistics collector, and it will mean a cleanup of existing code as well (which laboriously invokes elog(LOG) followed by exit(1)). I'm working on that now...

-Neil
Re: [PATCHES] hash_create(): check return code
Neil Conway <[EMAIL PROTECTED]> writes:
> On Fri, 2004-10-22 at 16:13, Tom Lane wrote:
>> There are no places where hash_create is called before elog() is functional.
> Well, it's invoked from the statistics collector, which avoids doing elog(ERROR) for some reason.

With all due respect to Jan, that coding seems 100% bogus. elog(ERROR) will work (it had better, because pgstat.c certainly calls routines that might do it), and the insistence on using exit() rather than proc_exit() is just plain wrong anyway. Note that there is really no difference between elog(ERROR) and elog(FATAL) in this context, since pgstat doesn't have an outer sigsetjmp call.

regards, tom lane
Re: [PATCHES] code cleanup in dynahash.c
Neil Conway wrote:
> This patch makes a minor code cleanup in dynahash.c: a function declared to return 'bool' only ever returned true, so I changed it to return void.

Applied to HEAD.

-Neil
[PATCHES] ARC Memory Usage analysis
I've been using the ARC debug options to analyse memory usage on the PostgreSQL 8.0 server. This is a precursor to more complex performance analysis work on the OSDL test suite. I've simplified some of the ARC reporting into a single log line, which is enclosed here as a patch on freelist.c. This includes reporting of: - the total memory in use, which wasn't previously reported - the cache hit ratio, which was slightly incorrectly calculated - a useful-ish value for looking at the B lists in ARC (This is a patch against cvstip, but I'm not sure whether this has potential for inclusion in 8.0...) The total memory in use is useful because it allows you to tell whether shared_buffers is set too high. If it is set too high, then memory usage will continue to grow slowly up to the max, without any corresponding increase in cache hit ratio. If shared_buffers is too small, then memory usage will climb quickly and linearly to its maximum. The last one I've called turbulence in an attempt to ascribe some useful meaning to B1/B2 hits - I've tried a few other measures though without much success. Turbulence is the hit ratio of B1+B2 lists added together. By observation, this is zero when ARC gives smooth operation, and goes above zero otherwise. Typically, turbulence occurs when shared_buffers is too small for the working set of the database/workload combination and ARC repeatedly re-balances the lengths of T1/T2 as a result of near-misses on the B1/B2 lists. Turbulence doesn't usually cut in until the cache is fully utilized, so there is usually some delay after startup. We also recently discussed that I would add some further memory analysis features for 8.1, so I've been trying to figure out how. The idea that B1, B2 represent something really useful doesn't seem to have been borne out - though I'm open to persuasion there. I originally envisaged a shadow list operating in extension of the main ARC list. 
This will require some re-coding, since the variables and macros are all hard-coded to a single set of lists. No complaints, just it will take a little longer than we all thought (for me, that is...)

My proposal is to alter the code to allow an array of memory linked lists. The actual list would be [0] - other additional lists would be created dynamically as required, i.e. not using IFDEFs, since I want this to be controlled by a SIGHUP GUC to allow on-site tuning, not just lab work. This will then allow reporting against the additional lists, so that cache hit ratios can be seen with various other prototype shared_buffer settings.

Any thoughts?

-- Best Regards, Simon Riggs

Index: freelist.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/freelist.c,v
retrieving revision 1.48
diff -d -c -r1.48 freelist.c
*** freelist.c	16 Sep 2004 16:58:31 -0000	1.48
--- freelist.c	22 Oct 2004 18:15:38 -0000
***************
*** 126,131 ****
--- 126,133 ----
  	if (StrategyControl->stat_report + DebugSharedBuffers < now)
  	{
  		long		all_hit,
+ 					buf_used,
+ 					b_hit,
  					b1_hit,
  					t1_hit,
  					t2_hit,
***************
*** 155,161 ****
  		}
  		if (StrategyControl->num_lookup == 0)
! 			all_hit = b1_hit = t1_hit = t2_hit = b2_hit = 0;
  		else
  		{
  			b1_hit = (StrategyControl->num_hit[STRAT_LIST_B1] * 100 /
--- 157,163 ----
  		}
  		if (StrategyControl->num_lookup == 0)
! 			all_hit = buf_used = b_hit = b1_hit = t1_hit = t2_hit = b2_hit = 0;
  		else
  		{
  			b1_hit = (StrategyControl->num_hit[STRAT_LIST_B1] * 100 /
***************
*** 166,181 ****
  						StrategyControl->num_lookup);
  			b2_hit = (StrategyControl->num_hit[STRAT_LIST_B2] * 100 /
  						StrategyControl->num_lookup);
! 			all_hit = b1_hit + t1_hit + t2_hit + b2_hit;
  		}
  		errcxtold = error_context_stack;
  		error_context_stack = NULL;
! 		elog(DEBUG1, "ARC T1target=%5d B1len=%5d T1len=%5d T2len=%5d B2len=%5d",
  			 T1_TARGET, B1_LENGTH, T1_LENGTH, T2_LENGTH, B2_LENGTH);
! 		elog(DEBUG1, "ARC total =%4ld%% B1hit=%4ld%% T1hit=%4ld%% T2hit=%4ld%% B2hit=%4ld%%",
  			 all_hit, b1_hit, t1_hit, t2_hit, b2_hit);
! 		elog(DEBUG1, "ARC clean buffers at LRU T1= %5d T2= %5d",
  			 t1_clean, t2_clean);
  		error_context_stack = errcxtold;
--- 168,187 ----
  						StrategyControl->num_lookup);
  			b2_hit = (StrategyControl->num_hit[STRAT_LIST_B2] * 100 /
  						StrategyControl->num_lookup);
! 			all_hit = t1_hit + t2_hit;
! 			b_hit = b1_hit + b2_hit;
! 			buf_used = T1_LENGTH + T2_LENGTH;
  		}
  		errcxtold = error_context_stack;
  		error_context_stack = NULL;
! 		elog(DEBUG1, "shared_buffers used=%8ld cache hits=%4ld%% turbulence=%4ld%%",
! 			 buf_used, all_hit, b_hit);
! 		elog(DEBUG2, "ARC T1target=%5d B1len=%5d T1len=%5d T2len=%5d B2len=%5d",
  			 T1_TARGET, B1_LENGTH, T1_LENGTH, T2_LENGTH, B2_LENGTH);
! 		elog(DEBUG2, "ARC total =%4ld%% B1hit=%4ld%% T1hit=%4ld%% T2hit=%4ld%% B2hit=%4ld%%",
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On 10/22/2004 2:50 PM, Simon Riggs wrote:
> I've been using the ARC debug options to analyse memory usage on the PostgreSQL 8.0 server. [...]
> My proposal is to alter the code to allow an array of memory linked lists. The actual list would be [0] - other additional lists would be created dynamically as required, i.e. not using IFDEFs, since I want this to be controlled by a SIGHUP GUC to allow on-site tuning, not just lab work. This will then allow reporting against the additional lists, so that cache hit ratios can be seen with various other prototype shared_buffer settings.
> Any thoughts?

All the existing lists live in shared memory, so that dynamic approach suffers from the fact that the memory has to be allocated during ipc_init.

What do you think about my other theory to make C actually 2x effective cache size and NOT to keep T1 in shared buffers but to assume T1 lives in the OS buffer cache?

Jan
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On Fri, 2004-10-22 at 20:35, Jan Wieck wrote:
> On 10/22/2004 2:50 PM, Simon Riggs wrote:
>> My proposal is to alter the code to allow an array of memory linked lists. [...]
> All the existing lists live in shared memory, so that dynamic approach suffers from the fact that the memory has to be allocated during ipc_init.

[doh] - dreaming again. Yes of course, server startup it is then. [That way, we can include the memory for it at server startup, then allow the GUC to be turned off after a while to avoid another restart?]

> What do you think about my other theory to make C actually 2x effective cache size and NOT to keep T1 in shared buffers but to assume T1 lives in the OS buffer cache?

Summarised like that, I understand it. My observation is that performance varies significantly between startups of the database, which does indicate that the OS cache is working well. So, yes, it does seem as if we have a 3-tier cache. I understand you to be effectively suggesting that we go back to having just a 2-tier cache.

I guess we've got two options:

1. Keep ARC as it is, but just allocate much of the available physical memory to shared_buffers, so you know that effective_cache_size is low and that it's either in T1 or it's on disk.

2. Alter ARC so that we experiment with the view that T1 is in the OS and T2 is in shared_buffers, and we don't bother keeping T1 (as you say).

Hmmm... I think I'll pass on trying to judge its effectiveness - simplifying things is likely to make it easier to understand and predict behaviour. It's well worth trying, and it seems simple enough to make a patch that keeps T1target at zero, i.e.

Scientific method: conjecture + experimental validation = theory

If you make up a patch, probably against BETA4, Josh and I can include it in the performance testing that I'm hoping we can do over the next few weeks. Whatever makes 8.0 a high performance release is well worth it.

Best Regards, Simon Riggs
[PATCHES] tsearch build on win32
This patch fixes the tsearch build problems reported by Andrew Dunstan. I have confirmed that stop words work as expected now, so the fix that broke this part also works. If possible, please apply before beta-4.

//Magnus

[Attachment: tsearch_win32build.patch]
Re: [PATCHES] tsearch build on win32
Magnus Hagander wrote:
> This patch fixes the tsearch build problems reported by Andrew Dunstan.
> ...
> + ifneq (,$(findstring timezone,$(subdir)))
> + override CPPFLAGS+= -DBUILDING_DLL
> + endif

That's all it took extra? Wow - easy when you know how! ;-) Thanks, Magnus.

cheers

andrew
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
Jan Wieck <[EMAIL PROTECTED]> writes:
> What do you think about my other theory to make C actually 2x effective cache size and NOT to keep T1 in shared buffers but to assume T1 lives in the OS buffer cache?

What will you do when initially fetching a page? It's not supposed to go directly into T2 on first use, but we're going to have some difficulty accessing a page that's not in shared buffers. I don't think you can equate the T1/T2 dichotomy to "is in shared buffers or not".

You could maybe have a T3 list of pages that aren't in shared buffers anymore but we think are still in OS buffer cache, but what would be the point? It'd be a sufficiently bad model of reality as to be pretty much useless for stats gathering, I'd think.

regards, tom lane
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On 10/22/2004 4:21 PM, Simon Riggs wrote:
> On Fri, 2004-10-22 at 20:35, Jan Wieck wrote:
>> What do you think about my other theory to make C actually 2x effective cache size and NOT to keep T1 in shared buffers but to assume T1 lives in the OS buffer cache?
> Summarised like that, I understand it. My observation is that performance varies significantly between startups of the database, which does indicate that the OS cache is working well. So, yes it does seem as if we have a 3 tier cache. I understand you to be effectively suggesting that we go back to having just a 2-tier cache.

Effectively yes, just with the difference that we keep a pseudo T1 list and hope that what we are tracking there is what the OS is caching. As said before, if the effective cache size is set properly, that is what should happen.

> It's well worth trying, and it seems simple enough to make a patch that keeps T1target at zero.

Not keeping T1target at zero, because that would keep T2 at the size of shared_buffers. What I suspect is that in the current calculation the T1target is underestimated. It is incremented on B1 hits, but B1 is only of T2 size. What it currently tells is what got pushed from T1 into the OS cache. It could well be that it would work much more effectively if it would fuzzily tell what got pushed out of the OS cache to disk.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#== [EMAIL PROTECTED] #
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On Fri, 2004-10-22 at 21:45, Tom Lane wrote:
> What will you do when initially fetching a page? It's not supposed to go directly into T2 on first use, but we're going to have some difficulty accessing a page that's not in shared buffers. I don't think you can equate the T1/T2 dichotomy to "is in shared buffers or not".

Yes, there are issues there. I want Jan to follow his thoughts through. This is important enough that it's worth it - there are only a few even attempting this.

> You could maybe have a T3 list of pages that aren't in shared buffers anymore but we think are still in OS buffer cache, but what would be the point? It'd be a sufficiently bad model of reality as to be pretty much useless for stats gathering, I'd think.

The OS cache is in many ways a wild horse, I agree. Jan is trying to think of ways to harness it, whereas I had mostly ignored it - but it's there. Raw disk usage never allowed this opportunity. For high performance systems, we can assume that the OS cache is ours to play with - what will we do with it? We need to use it for some purposes, yet would like to ignore it for others.

-- Best Regards, Simon Riggs
Re: [PATCHES] tsearch build on win32
Magnus Hagander <[EMAIL PROTECTED]> writes:
> This patch fixes the tsearch build problems reported by Andrew Dunstan.

Applied.

regards, tom lane
Re: [PATCHES] initdb NLS on win32
Magnus Hagander <[EMAIL PROTECTED]> writes:
> This patch is required for initdb to work on win32 with NLS enabled. Without it we get a segfault when trying to determine the current locale, since we can't get it using LC_MESSAGES.

Applied.

regards, tom lane
Re: [PATCHES] Example of function returning SETOF RECORD
David Fetter <[EMAIL PROTECTED]> writes:
> Here's a patch that adds an example of using SETOF RECORD with the needed grammar.

I'm having a bit of a problem with this, because it is a plpgsql example inserted into a section that is solely about SQL-language functions. Can you adapt it to be an SQL function? Or put the example into the plpgsql chapter?

regards, tom lane