Re: [HACKERS] syslogging oddity
On Sun, Jul 22, 2007 at 08:05:12PM -0400, Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes: Somewhere along the way we seem to have made the syslogger's shutdown message go to stderr, even if we have redirected it:

I'm pretty sure it has done that all along; at least the design intention is that messages generated by syslogger itself should go to its stderr. (Else, if the logger is having trouble, you might never get to find out why at all.)

Yeah, I think it's been that way all along. It might be reasonable to reduce "logger shutting down" to DEBUG1 or so, now that the facility has been around for a while.

+1. For example, many Windows systems have *only* that message in the eventlog, and nothing else... which is kind of strange. It could be interesting to have it write it *to the logfile* though, since it'd then at least be in the same place as the others. As in special-casing this one message, and just skipping the logging if it fails. But I think we're fine just dropping the level.

//Magnus

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On Sun, 2007-07-22 at 08:53 -0700, Josh Berkus wrote:

Pavel Stehule wrote: Hello, I checked my tests again and I have different results. Now I tested PostgreSQL on a dedicated server. Now 8.3 is about 20% faster. I didn't see a strong impact from autovacuum. All numbers are approximate only. I ran pgbench 3x for the following configurations (autovacuum on, autovacuum off, statistics off) and for -t ntransactions (100, 1000, 4000).

In other news, 8.3 with current HOT is 13% faster than 8.2 at TPC-E in the first 1/2 hour. Performance does not fall over 5 hours of test run, and most of the main tables never have autovacuum triggered at all. Unfortunately, we don't yet have a 5-hour 8.2 run to compare last-half-hour performance.

I think the rule of thumb is: if the workload doesn't have enough UPDATEs to trigger VACUUMs then HOT will have a low benefit. With any workload, we should run it *until* we see some autovacuums kick in, so we can compare the overall situation of HOT v non-HOT. HOT is designed for longer-term benefit; fillfactor benefits fade over time (as defined).

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On Sat, 2007-07-21 at 13:30 -0400, Tom Lane wrote: Note to all: we ***HAVE TO*** settle on some reasonable default vacuum_cost_delay settings before we can ship 8.3. With no cost delay and two or three workers active, 8.3's autovac does indeed send performance into the tank.

A couple of thoughts here:

HOT will reduce the need for VACUUMs quite considerably, so multiple concurrent VACUUMs becomes a quite rare situation. We should re-evaluate this thought once we have taken the go/no-go decision for HOT in 8.3.

The good thing about having multiple autovacuum daemons active is that this reduces the possibility of having small tables starved while a single large VACUUM runs to completion. My experience is that a single large table can take many hours to vacuum, frequently preventing 10,000 VACUUMs of small tables within that time.

The bad thing about having multiple autovacuum daemons active is that you can get two large VACUUMs running at the same time. This gives you the same small-VACUUM-starvation problem we had before, but now the effects of two VACUUMs kill performance even more. I would suggest that we look at ways of queueing, so that multiple large VACUUMs cannot occur. Setting vacuum_cost_delay will still allow multiple large VACUUMs, but will make the starvation problem even worse as well. If we allow that situation to occur, I think I'd rather stick to autovac_workers=1. We will still have this potential problem even with HOT.

Potential solution: each autovac worker gets a range of table sizes it is allowed to VACUUM. This is set with an additional parameter which is an array of gating values (i.e. one fewer gating value than the number of autovac workers). That way small VACUUMs are never starved out by large ones. This is the same as having a Small:Medium:Large style queueing system. We can work out how to make the queueing system self-tune by observing autovacuum frequency.
-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Simon Riggs wrote: The bad thing about having multiple autovacuum daemons active is that you can get two large VACUUMs running at the same time. This gives you the same small-VACUUM-starvation problem we had before, but now the effects of two VACUUMs kill performance even more. I would suggest that we look at ways of queueing, so that multiple large VACUUMs cannot occur. Setting vacuum_cost_delay will still allow multiple large VACUUMs but will make the starvation problem even worse as well. If we allow that situation to occur, I think I'd rather stick to autovac_workers=1. We will still have this potential problem even with HOT. Potential solution: Each autovac worker gets a range of table sizes they are allowed to VACUUM. This is set with an additional parameter which is an array of gating values (i.e. one less gating value than number of autovac workers). That way small VACUUMs are never starved out by large ones. This is the same as having a Small:Medium:Large style queueing system. We can work out how to make the queueing system self-tune by observation of autovacuum frequency.

The default autovac_workers is 3, so wouldn't you need three, not two, large VACUUMs to starve a smaller table? Instead of queueing, how about increasing autovac_workers if starvation is a concern?

I'd like to set a default autovacuum_vacuum_cost_delay anyway. Without it, autovacuum is a performance hit when it kicks in, even if there's only one of them running, and even if it only lasts for a short time. It's an unpleasant surprise for someone who's new to PostgreSQL and doesn't yet understand how vacuum and autovacuum work.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] SSPI vs MingW
I just came across yet another place where MinGW isn't compatible with the Windows API. Specifically, their libsecur32.a file lacks at least one function that is needed to implement SSPI authentication. The way I see it, there are three ways to solve it:

1) Simply state that SSPI authentication in the backend cannot be built with MinGW, and require an MSVC build for it (the MSVC API follows the Windows API, which is hardly surprising). We could add an autoconf test for it that'd pick up an updated libsecur32.a file if/when MinGW releases an update.

2) Ship our own secur32.def file, and automatically build an import library from it that we can link against. Because the function is present in the DLL file, this works fine.

3) Dynamically load the function at runtime, thus completely avoiding the need for an import library for it.

What do people feel about these options? I'm annoyed enough with MinGW right now (after having tracked this stupid thing down) that I'm probably not thinking clearly enough to say something myself :) Oh, and feel free to tell me which option(s) I missed completely..

//Magnus
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On Mon, 2007-07-23 at 10:04 +0100, Heikki Linnakangas wrote:

Simon Riggs wrote: The bad thing about having multiple autovacuum daemons active is that you can get two large VACUUMs running at the same time. This gives you the same small-VACUUM-starvation problem we had before, but now the effects of two VACUUMs kill performance even more. I would suggest that we look at ways of queueing, so that multiple large VACUUMs cannot occur. Setting vacuum_cost_delay will still allow multiple large VACUUMs but will make the starvation problem even worse as well. If we allow that situation to occur, I think I'd rather stick to autovac_workers=1. We will still have this potential problem even with HOT. Potential solution: Each autovac worker gets a range of table sizes they are allowed to VACUUM. This is set with an additional parameter which is an array of gating values (i.e. one less gating value than number of autovac workers). That way small VACUUMs are never starved out by large ones. This is the same as having a Small:Medium:Large style queueing system. We can work out how to make the queueing system self-tune by observation of autovacuum frequency.

The default autovac_workers is 3, so wouldn't you need three, not two, large VACUUMs to starve a smaller table? Instead of queueing, how about increasing autovac_workers if starvation is a concern?

Neither of those things prevents the problem; they just make it less likely. I don't think that's a good answer for production systems that have response-time service level agreements to meet.

I'd like to set a default autovacuum_vacuum_cost_delay anyway. Without it, autovacuum is a performance hit when it kicks in, even if there's only one of them running, and even if it only lasts for a short time. It's an unpleasant surprise for someone who's new to PostgreSQL and doesn't yet understand how vacuum and autovacuum work.

I agree, but only if we can prevent the starvation problem while we do it; otherwise it just gets worse.
-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] SSPI vs MingW
Magnus Hagander wrote: I just came across yet another place where MinGW isn't compatible with the Windows API. Specifically, their libsecur32.a file lacks at least one function that is needed to implement SSPI authentication. The way I can see it, there are three ways to solve it:

Ugh.

1) Simply state that SSPI authentication in the backend cannot be built with MinGW, and require an MSVC build for it (the MSVC API follows the Windows API, which is hardly surprising). We could add an autoconf test for it that'd pick up an updated libsecur32.a file if/when MinGW releases an update.

I prefer this option, if only because I have little interest in supporting MinGW any longer than necessary, but I realise others may want to use it so...

2) Ship our own secur32.def file, and automatically build an import library from it that we can link against. Because the function is present in the DLL file, this works fine.

Yuck.

3) Dynamically load the function at runtime, thus completely avoiding the need for an import library for it.

That gets my vote. It's relatively clean and non-kludgy.

Regards, Dave
Re: [HACKERS] SSPI vs MingW
On Mon, Jul 23, 2007 at 11:06:59AM +0100, Dave Page wrote:

Magnus Hagander wrote: I just came across yet another place where MinGW isn't compatible with the Windows API. Specifically, their libsecur32.a file lacks at least one function that is needed to implement SSPI authentication. The way I can see it, there are three ways to solve it:

Ugh.

Indeed.

1) Simply state that SSPI authentication in the backend cannot be built with MinGW, and require an MSVC build for it (the MSVC API follows the Windows API, which is hardly surprising). We could add an autoconf test for it that'd pick up an updated libsecur32.a file if/when MinGW releases an update.

I prefer this option, if only because I have little interest in supporting MinGW any longer than necessary, but I realise others may want to use it so...

Heh, well, I don't see that one going away...

2) Ship our own secur32.def file, and automatically build an import library from it that we can link against. Because the function is present in the DLL file, this works fine.

Yuck.

3) Dynamically load the function at runtime, thus completely avoiding the need for an import library for it.

That gets my vote. It's relatively clean and non-kludgy.

OK, just so people know what amount of code we're talking about, here's a patch that does this.
Awaiting further comments :-)

//Magnus

Index: src/backend/libpq/auth.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/libpq/auth.c,v
retrieving revision 1.154
diff -c -r1.154 auth.c
*** src/backend/libpq/auth.c	23 Jul 2007 10:16:53 -0000	1.154
--- src/backend/libpq/auth.c	23 Jul 2007 12:52:01 -0000
***************
*** 567,572 ****
--- 567,575 ----
  			 errdetail("%s (%x)", sysmsg, r)));
  }
  
+ typedef SECURITY_STATUS
+ 		(SEC_ENTRY * QUERY_SECURITY_CONTEXT_TOKEN_FN)(
+ 			PCtxtHandle, void SEC_FAR * SEC_FAR *);
  
  static int
  pg_SSPI_recvauth(Port *port)
***************
*** 591,596 ****
--- 594,601 ----
  	DWORD		accountnamesize = sizeof(accountname);
  	DWORD		domainnamesize = sizeof(domainname);
  	SID_NAME_USE	accountnameuse;
+ 	HMODULE		secur32;
+ 	QUERY_SECURITY_CONTEXT_TOKEN_FN _QuerySecurityContextToken;
  
  	/*
***************
*** 728,737 ****
  	 * pg username that was specified for the connection.
  	 */
  
! 	r = QuerySecurityContextToken(sspictx, &token);
  	if (r != SEC_E_OK)
  		pg_SSPI_error(ERROR,
  			   gettext_noop("could not get security token from context"), r);
  
  	/*
  	 * No longer need the security context, everything from here on uses the
--- 733,763 ----
  	 * pg username that was specified for the connection.
  	 */
  
! 	secur32 = LoadLibrary("SECUR32.DLL");
! 	if (secur32 == NULL)
! 		ereport(ERROR,
! 				(errmsg_internal("could not load secur32.dll: %d",
! 								 (int) GetLastError())));
! 
! 	_QuerySecurityContextToken = (QUERY_SECURITY_CONTEXT_TOKEN_FN)
! 		GetProcAddress(secur32, "QuerySecurityContextToken");
! 	if (_QuerySecurityContextToken == NULL)
! 	{
! 		FreeLibrary(secur32);
! 		ereport(ERROR,
! 				(errmsg_internal("could not locate QuerySecurityContextToken in secur32.dll: %d",
! 								 (int) GetLastError())));
! 	}
! 
! 	r = (_QuerySecurityContextToken)(sspictx, &token);
  	if (r != SEC_E_OK)
+ 	{
+ 		FreeLibrary(secur32);
  		pg_SSPI_error(ERROR,
  			   gettext_noop("could not get security token from context"), r);
+ 	}
+ 
+ 	FreeLibrary(secur32);
  
  	/*
  	 * No longer need the security context, everything from here on uses the
Re: [HACKERS] SSPI vs MingW
Dave Page wrote:

Magnus Hagander wrote: I just came across yet another place where MinGW isn't compatible with the Windows API. Specifically, their libsecur32.a file lacks at least one function that is needed to implement SSPI authentication. The way I can see it, there are three ways to solve it:

Ugh.

agreed.

1) Simply state that SSPI authentication in the backend cannot be built with MinGW, and require an MSVC build for it (the MSVC API follows the Windows API, which is hardly surprising). We could add an autoconf test for it that'd pick up an updated libsecur32.a file if/when MinGW releases an update.

I prefer this option, if only because I have little interest in supporting MinGW any longer than necessary, but I realise others may want to use it so...

I don't think it's going away any time soon. For example, it's the only platform I've been able to make work on my Vista box, and nobody has told me how to get around the problems, even though apparently some have managed to make MSVC work on Vista. This is the least good option IMNSHO.

2) Ship our own secur32.def file, and automatically build an import library from it that we can link against. Because the function is present in the DLL file, this works fine.

Yuck.

3) Dynamically load the function at runtime, thus completely avoiding the need for an import library for it.

That gets my vote. It's relatively clean and non-kludgy.

Yes, I can live with this one too, although I don't think option 2 is so bad either.

cheers andrew
Re: [HACKERS] Full page images in WAL Cache Invalidation
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Anyway, if you believe that DDL is infrequent, why are you resistant to the idea of WAL-logging cache flushes?

First, cache invalidations are not the only problem caused by replaying system-table updates. The whole SnapshotNow business doesn't exactly make things easier either. So it feels like a lot of added complexity and code for little gain - unless a *lot* more things (like locking requests) are logged too.

The mention of locking requests brought to mind the following gedankenexperiment: 1. slave server backend is running some long-running query on table X. 2. WAL-reading process receives and executes DROP TABLE X. (It doesn't even have to be a DROP; most varieties of ALTER are enough to create problems for a concurrently-running query.) It's really hard to see how to defend against that without a fairly complete simulation of locking on the slave side.

Well, locking on the slave is a bit easier than on the master, for two reasons:

1) Queries running on the slave only lock in AccessShareLock mode - therefore, you only need to know whether a certain mode conflicts with AccessShareLock - and there seems to be only one that does, namely AccessExclusiveLock. So we really only need to care about AccessExclusiveLock locks on the master.

2) As far as I can see, the point of an AccessExclusiveLock is *not* actually preventing queries from running while a DDL statement is *executed*, but rather preventing queries from running while the statement is *committed*. This fits the fact that system tables are read using SnapshotNow (not SnapshotDirty) - while the DDL is running, everybody is happily using the old information; the trouble would only start after the commit, because with SnapshotNow you suddenly see the new state.
I'm not yet 100% sure that (2) holds (with the exception of VACUUM FULL) - but I'm fairly confident, because if (2) were wrong, then how would the system survive a crash during the execution of a DDL statement?

So after a bit more thought (and reading - thanks for all the replies, guys! They are greatly appreciated), I came up with the following plan for both inval events and locks:

.) Store two flags in the commit record of a transaction, for "transaction generated inval events" and "transaction held an access exclusive lock".

.) Upon replay, block until no transactions are running (for "transaction held an exclusive lock") before replaying the record, or flush the caches after replaying it (for "transaction generated inval events").

This scheme has two really nice properties: First, it can be extended fairly easily to store not just a simple flag but a list of OIDs, and use that to make the flushing and blocking more fine-grained - without changing the basic way in which things work. And second, it stores all the information needed in the *commit* record. That removes problems with transactions that are aborted due to a crash, where the WAL therefore never records that they were aborted.

VACUUM FULL will need some special care - but at least VACUUM FULL is already such a disruptive operation that it probably won't surprise anyone if it's disruptive on the slave too. (And now that CLUSTER is MVCC-safe, from what I read, the use case for VACUUM FULL seems to be pretty slim.)

The btree metapage caching will need special treatment too - probably some flags in the WAL record that changes the metapage, instructing the slave to synthesize a suitable inval event.

What stays open is regular VACUUM (and maybe inline vacuuming - is that part of the latest version of HOT, or did it get removed?). Here, the key is logging the return value of GetOldestXmin(), I think.
Since that value is what decides in the end whether a tuple can be killed or not, having it available on the slave should allow the slave to block replay until no slave query depends on the tuple anymore.

greetings, Florian Pflug
[HACKERS] Oops in fe-auth.c
I've been debugging some really weird crashes in libpq on win32, and I think I've finally found the reason for the heap corruption that shows up in msvc debug mode.

When run in debug mode, the runtime for msvc will *zero-pad the entire buffer* in a strncpy() call. This in itself is not bad (just slow), but it shows a rather bad bug in libpq. In a bunch of places in fe-auth.c, we do:

strncpy(PQerrormsg, libpq_gettext("out of memory\n"), PQERRORMSG_LENGTH);

Except when calling it, the size of the buffer is 256 bytes. But PQERRORMSG_LENGTH is 1024. Naturally, this causes a heap corruption. It doesn't happen in production, because the string length fits as long as there is no padding.

One way to get around this on win32 is to just use snprintf() instead of strncpy(), since it doesn't pad. But that's just hiding the underlying problem, so I think that's a really bad fix. I assume the comment in the header:

* NOTE: the error message strings returned by this module must not
* exceed INITIAL_EXPBUFFER_SIZE (currently 256 bytes).

refers to this, but it's hard to guarantee that from the code since it's translated strings. I see a comment in fe-connect.c that has:

* XXX fe-auth.c has not been fixed to support PQExpBuffers,

Given this, I'll go ahead and fix fe-connect to support PQExpBuffers, unless there are any objections. Also, is this something we should backpatch - or just ignore since we've had no actual reports of it in the field?

//Magnus
Re: [HACKERS] Oops in fe-auth.c
Magnus Hagander [EMAIL PROTECTED] writes: When run in debug mode, the runtime for msvc will *zero-pad the entire buffer* in a strncpy() call. This in itself is not bad (just slow), but it shows a rather bad bug in libpq.

[squint] That is the specified behavior of strncpy on every platform, not only msvc. If there's a bug here why didn't we notice it long ago?

Given this, I'll go ahead and fix fe-connect to support PQExpBuffers, unless there are any objections.

I'm not against that, but I question what bug you've really found.

regards, tom lane
Re: [HACKERS] Oops in fe-auth.c
On Mon, Jul 23, 2007 at 10:28:57AM -0400, Tom Lane wrote:

Magnus Hagander [EMAIL PROTECTED] writes: When run in debug mode, the runtime for msvc will *zero-pad the entire buffer* in a strncpy() call. This in itself is not bad (just slow), but it shows a rather bad bug in libpq.

[squint] That is the specified behavior of strncpy on every platform, not only msvc. If there's a bug here why didn't we notice it long ago?

Hmm. Interesting - I see that now if I look at http://www.opengroup.org/onlinepubs/007908799/xsh/strncpy.html. That's very interesting - but my debugger very much shows me that the buffer size is 256 bytes (INITIAL_EXPBUFFER_SIZE), and 1024 (PQERRORMSG_LENGTH) is passed as the size of the buffer... Perhaps we've just never hit one of those codepaths before. Previously, it was only used for out-of-memory errors - the gssapi code adds a few places where it's used in other cases, and this is where it crashed for me.

Given this, I'll go ahead and fix fe-connect to support PQExpBuffers, unless there are any objections.

I'm not against that, but I question what bug you've really found.

I never actually tested whether it crashes on mingw, but looking at it some more, it really should - once one of these errors happens.

//Magnus
Re: [HACKERS] SSPI vs MingW
Dave Page [EMAIL PROTECTED] writes: Magnus Hagander wrote: 1) Simply state that SSPI authentication in the backend cannot be built with mingw, and require msvc build for it (the msvc api follows the windows api, which is hardly surprising). We could add an autoconf test for it that'd pick up an updated libsecur32.a file if/when mingw release an update. I prefer this option,

+1. I grow weary of working around so many Windows-related bugs/omissions.

regards, tom lane
Re: [HACKERS] Full page images in WAL Cache Invalidation
Florian G. Pflug [EMAIL PROTECTED] writes: I came up with the following plan for both inval events and locks .) Store two flags in the commit record of a transaction, for transaction generated inval events and transaction held an access exclusive lock. .) Upon replay, block until no transactions are running (for transaction held an exclusive lock) before replaying the record, or flush the caches after replaying it (for transaction generated inval events).

This does not work; the lock has to be taken earlier than that. (See for instance VACUUM's truncate calls.) Not to mention that you have converted exclusive lock on one table to exclusive lock on every table, which is even worse than the idea of converting per-table cache flushes to system-wide ones.

regards, tom lane
Re: [HACKERS] syslogging oddity
Magnus Hagander [EMAIL PROTECTED] writes: It could be interesting to have it write it *to the logfile* though, since it'd then at least be in the same place as the others.

It does that too, no?

regards, tom lane
Re: [HACKERS] syslogging oddity
On Mon, Jul 23, 2007 at 10:45:35AM -0400, Tom Lane wrote:

Magnus Hagander [EMAIL PROTECTED] writes: It could be interesting to have it write it *to the logfile* though, since it'd then at least be in the same place as the others.

It does that too, no?

Ok, I admit writing that without actually checking anything :-) The main thing is that yes, I'd like to get it out of the eventlog.

//Magnus
Re: [HACKERS] syslogging oddity
Tom Lane wrote:

Magnus Hagander [EMAIL PROTECTED] writes: It could be interesting to have it write it *to the logfile* though, since it'd then at least be in the same place as the others.

It does that too, no?

Yes, but if we make the message DEBUG1 it normally won't. Still, I think we could live with that. I'm not inclined to waste too much time on it.

cheers andrew
[HACKERS] Why is lc_messages superuser only?
Looking around the lc_messages stuff a bit, I notice it's set to superuser-only. I do use ALTER USER joe SET lc_messages='sv_SE' now and then to change the language for a user. And I see it's also possible to use it on a database level by doing ALTER DATABASE postgres SET lc_messages='sv_SE' (user overriding database overriding system default, as expected).

However, it can also be useful for the user to be able to change it for his own session, and this only works if you are superuser. Is there a reason for this?

//Magnus
Re: [HACKERS] MAXIMUM_ALIGNOF on Windows-32
On Fri, Jul 20, 2007 at 10:32:35AM -0400, Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Anyway, we detect this directly based on the C compiler's behavior, and you can't argue with the compiler about it. Whatever it's doing is right by definition. Perhaps Pavan is referring to what is hardcoded in pg_config.h.win32 which is used for MSVC builds (but not for MinGW builds, IIRC), in which case the answer might be that in this file we need to be pessimistic about such things, since we have no reasonable way to run configure on this platform.

Somebody had better double-check that. We don't need to be pessimistic, we need to be *correct*, because the align values had better match the way the compiler will lay out a C struct. Otherwise struct-based access to catalog rows will fail. (I'm not sure if there are any system catalogs with float8 or int64 columns, but I'd sure not want to find out that we couldn't have one because of misconfiguration of MSVC builds.)

How do I double-check this?

I see though that the comment in pg_config.h.win32 claims it was derived from mechanically-generated configure output, so unless that's lying it should be OK already.

It's not - it started out as a copy of the output of ./configure on mingw. AFAIK struct alignment is part of the ABI for a platform and is not subject to the whims of individual compilers, so the result from MinGW should be OK for MSVC. Still, it doesn't hurt to double-check.

//Magnus
Re: [HACKERS] syslogging oddity
Andrew Dunstan [EMAIL PROTECTED] writes: Yes, but if we make the message DEBUG1 it won't normally. Still, I think we could live with that. I'm not inclined to waste too much time on it.

Yeah. I think the only reason it was LOG initially was because the syslogger was pretty experimental at the time.

regards, tom lane
Re: [HACKERS] Why is lc_messages superuser only?
Magnus Hagander wrote: Looking around the lc_messages stuff a bit, I notice it's set to superuser-only. I do use ALTER USER joe SET lc_messages='sv_SE' now and then to change the language for a user. And I see it's also possible to use it on a database level by doing ALTER DATABASE postgres SET lc_messages='sv_SE' (user overriding database overriding system default, as expected). However, it can also be useful for the user to be able to change his own session, and this only works if you are superuser. Is there a reason for this?

Presumably we don't want a user changing what is used in the logs ...

cheers andrew
Re: [HACKERS] Why is lc_messages superuser only?
Magnus Hagander [EMAIL PROTECTED] writes: Is there a reason for this?

Two arguments I can recall: (1) Having log messages emitted in a language that the DBA can't read would be a useful tactic for a Bad Guy trying to cover his tracks. (2) Setting lc_messages to a value incompatible with the database encoding would be likely to result in PANIC or worse.

If we had more-robust locale support, I could see separating lc_messages into one setting for messages bound to the client and one for messages bound to the log, and making the latter superuser only (or, more likely, PGC_SIGHUP, because surely you'd want DB-wide consistency). But we are nowhere near being able to do that.

regards, tom lane
Re: [HACKERS] Why is lc_messages superuser only?
It is so that the user cannot hide log messages he causes by setting the language to something that the administrator cannot understand. (There are more conceivable scenarios of that sort, such as exploiting the administrator's ad hoc log parsing tool.)

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Why is lc_messages superuser only?
On Mon, Jul 23, 2007 at 11:20:15AM -0400, Tom Lane wrote:

Magnus Hagander [EMAIL PROTECTED] writes: Is there a reason for this?

Two arguments I can recall: (1) Having log messages emitted in a language that the DBA can't read would be a useful tactic for a Bad Guy trying to cover his tracks. (2) Setting lc_messages to a value incompatible with the database encoding would be likely to result in PANIC or worse. If we had more-robust locale support, I could see separating lc_messages into one setting for messages bound to the client and one for messages bound to the log, and making the latter superuser only (or, more likely, PGC_SIGHUP, because surely you'd want DB-wide consistency). But we are nowhere near being able to do that.

Ok. That makes a lot of sense, unfortunately. Hopefully something we can get sometime in the future, then :-)

//Magnus
Re: [HACKERS] MAXIMUM_ALIGNOF on Windows-32
Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Somebody had better double-check that. We don't need to be pessimistic, we need to be *correct*, because the align values had better match the way the compiler will lay out a C struct. Otherwise struct-based access to catalog rows will fail. (I'm not sure if there are any system catalogs with float8 or int64 columns, but I'd sure not want to find out that we couldn't have one because of misconfiguration of MSVC builds.) How do I double-check this? The configure script checks it by declaring struct { char pad; TYPE field; } foo and then measuring offsetof(foo, field), for each interesting TYPE. I see though that the comment in pg_config.h.win32 claims it was derived from mechanically-generated configure output, so unless that's lying it should be OK already. It's not - it started out as a copy of the output of ./configure on mingw. Started out as? Good luck keeping it in sync, if it's not mechanically created. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] MAXIMUM_ALIGNOF on Windows-32
Tom Lane wrote: I see though that the comment in pg_config.h.win32 claims it was derived from mechanically-generated configure output, so unless that's lying it should be OK already. It's not - it started out as a copy of the output of ./configure on mingw. Started out as? Good luck keeping it in sync, if it's not mechanically created. ISTM this is symptomatic of the MSVC build system problems. I understand why Dave and Magnus want to use it, but essentially it is breaking one of the original requirements of our building on Windows at all, namely that we use a unified build tool chain. It's a thousand pities. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Simon Riggs wrote: The bad thing about having multiple autovacuum daemons active is that you can get two large VACUUMs running at the same time. This gives you the same small-VACUUM-starvation problem we had before, but now the effects of two VACUUMs kill performance even more. I would suggest that we look at ways of queueing, so that multiple large VACUUMs cannot occur. Setting vacuum_cost_delay will still allow multiple large VACUUMs but will make the starvation problem even worse as well. If we allow that situation to occur, I think I'd rather stick to autovac_workers=1. We will still have this potential problem even with HOT. We already discussed all this to death before feature freeze. I'm not sure if it's a good idea to try to come up with new heuristics for the thing this late. Feel free to work on it for 8.4 though! I also wonder whether you have noticed the balancing code in autovac. Whenever more than one autovac workers are running, they split the available I/O allocated to them fairly, so that each one delays more frequently than if it was running alone. The net effect is supposed to be that no matter how many workers are running, your vacuum delay settings are respected. In any case, I think a better solution to the starvation problem caused by huge tables is not skipping the vacuuming of them, but making it less wasteful, for example with the DSM. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Debug a C shared library using Eclipse or Visual C++ 6.0
Hi, I have a problem when trying to debug a shared library developed in C. I'm following the steps included in the docs, in chapter 33 (extending SQL), section 33.9 (C-Language functions) http://www.postgresql.org/docs/8.2/interactive/xfunc-c.html I've successfully compiled the example tablefunc.c which is included in http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tablefunc/ I'm using Eclipse 3.2 and have installed mingw. The problem I have is that I cannot debug the functions. When trying to debug, I choose the option "C/C++ attach to local application". Then a list of processes to select appears. The only option I can choose is pgAdmin3.exe. When I choose it and open a SQL script in the pgadmin and execute the function I'm trying to debug, the function works properly but it doesn't enter the debugger. This is the code I use to create the function. It's linked to the dll compiled with Eclipse.

CREATE OR REPLACE FUNCTION mi_crosstab(text)
  RETURNS SETOF record AS
E'D:\\eclipse\\workspace\\tablefunc_lib\\Debug\\tablefunc_lib', 'mi_crosstab'
  LANGUAGE 'c' VOLATILE STRICT;
ALTER FUNCTION mi_crosstab(text) OWNER TO postgres;

I have seen that there are several processes called postgres.exe in the system administrator window. I have managed to make them appear in the list I mentioned, but when I select them in the debug option, I get the following error message: "Attach to process failed." I'm also trying to debug using Microsoft Visual C++ 6.0. With this option I cannot even compile any library. This is the simple code I'm trying to compile:

extern "C" {
#include "postgres.h"
#include <string.h>
#include "fmgr.h"
};

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(add_one_float8);

__stdcall DllMain(HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
    return TRUE;
}

extern "C" {
__declspec(dllexport) Datum
add_one_float8(PG_FUNCTION_ARGS)
{
    /* The macros for FLOAT8 hide its pass-by-reference nature. */
    float8 arg = PG_GETARG_FLOAT8(0);
    PG_RETURN_FLOAT8(arg + 1.0);
}
};

When doing so I get 3 warnings referred to the MAGIC FUNCTION declaration, and the function doesn't work.

warning C4273: 'Pg_magic_func' : inconsistent dll linkage. dllexport assumed.
warning C4273: 'pg_finfo_add_one_float8' : inconsistent dll linkage. dllexport assumed.
Linking...
LINK : warning LNK4075: ignoring /INCREMENTAL due to /FORCE specification
Creating library Debug/Postgres_dll.lib and object Debug/Postgres_dll.exp
Postgres_dll.dll - 0 error(s), 3 warning(s)

The Pg_magic_func is declared in the header file fmgr.h; I have checked it but I don't see any error there. Please, if you could provide me some feedback, I'd be really grateful!! Thanks, Aurora
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On Mon, 2007-07-23 at 12:00 -0400, Alvaro Herrera wrote: Simon Riggs wrote: The bad thing about having multiple autovacuum daemons active is that you can get two large VACUUMs running at the same time. This gives you the same small-VACUUM-starvation problem we had before, but now the effects of two VACUUMs kill performance even more. I would suggest that we look at ways of queueing, so that multiple large VACUUMs cannot occur. Setting vacuum_cost_delay will still allow multiple large VACUUMs but will make the starvation problem even worse as well. If we allow that situation to occur, I think I'd rather stick to autovac_workers=1. We will still have this potential problem even with HOT. We already discussed all this to death before feature freeze. ...and starvation has still not been avoided. I like what you have done, but we still have a problem, whichever release it gets fixed in. I'm not sure if it's a good idea to try to come up with new heuristics for the thing this late. Feel free to work on it for 8.4 though! I also wonder whether you have noticed the balancing code in autovac. Whenever more than one autovac workers are running, they split the available I/O allocated to them fairly, so that each one delays more frequently than if it was running alone. The net effect is supposed to be that no matter how many workers are running, your vacuum delay settings are respected. I did and I like it, many thanks. In any case, I think a better solution to the starvation problem caused by huge tables is not skipping the vacuuming of them, but making it less wasteful, for example with the DSM. Neither of those things prevent starvation though. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Simon Riggs wrote: On Mon, 2007-07-23 at 12:00 -0400, Alvaro Herrera wrote: Simon Riggs wrote: The bad thing about having multiple autovacuum daemons active is that you can get two large VACUUMs running at the same time. This gives you the same small-VACUUM-starvation problem we had before, but now the effects of two VACUUMs kill performance even more. We already discussed all this to death before feature freeze. ...and starvation has still not been avoided. I like what you have done, but we still have a problem, whichever release it gets fixed in. Oh, I will be the first to admit that autovacuum is still not good enough. In any case, I think a better solution to the starvation problem caused by huge tables is not skipping the vacuuming of them, but making it less wasteful, for example with the DSM. Neither of those things prevent starvation though. Certainly it doesn't prevent starvation completely -- really there is no way to completely prevent starvation unless you have as many workers as you have tables, and one disk for each. What DSM does do is let the big tables be vacuumed quickly which makes most of the problem go away. -- Alvaro Herrera  http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Full page images in WAL Cache Invalidation
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: I came up with the following plan for both inval events and locks .) Store two flags in the commit record of a transaction, for "transaction generated inval events" and "transaction held an access exclusive lock". .) Upon replay, block until no transactions are running (for "transaction held an exclusive lock") before replaying the record, or flush the caches after replaying it (for "transaction generated inval events"). This does not work; the lock has to be taken earlier than that. (See for instance VACUUM's truncate calls.) Not to mention that you have converted exclusive lock on one table to exclusive lock on every table, which is even worse than the idea of converting per-table cache flushes to system-wide ones. I'll check what VACUUM is doing.. I primarily had CLUSTER and TRUNCATE in mind. That "exclusive lock on one table becomes exclusive lock on all tables" issue can (as I wrote in the part of my mail that you snipped) be solved, I think, by storing a list of OIDs instead of a flag for the locks and inval events. greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On 7/23/07, Alvaro Herrera [EMAIL PROTECTED] wrote: Certainly it doesn't prevent starvation completely -- really there is no way to completely prevent starvation unless you have as many workers as you have tables, and one disk for each. What DSM does do is let the big tables be vacuumed quickly which makes most of the problem go away. Frankly I haven't looked at the DSM results very closely, but DSM can help us avoid full heap scans (and that's a big thing!), but it can't avoid the associated index scans and that might limit our ability to vacuum very large tables frequently. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] MAXIMUM_ALIGNOF on Windows-32
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Somebody had better double-check that. We don't need to be pessimistic, we need to be *correct*, because the align values had better match the way the compiler will lay out a C struct. Otherwise struct-based access to catalog rows will fail. (I'm not sure if there are any system catalogs with float8 or int64 columns, but I'd sure not want to find out that we couldn't have one because of misconfiguration of MSVC builds.) How do I double-check this? The configure script checks it by declaring struct { char pad; TYPE field; } foo and then measuring offsetof(foo, field), for each interesting TYPE. Ok. Confirmed that they are all the same. I see though that the comment in pg_config.h.win32 claims it was derived from mechanically-generated configure output, so unless that's lying it should be OK already. It's not - it started out as a copy of the output of ./configure on mingw. Started out as? Good luck keeping it in sync, if it's not mechanically created. It's been working fine for a year... In general, any new features need to be added to the build system anyway, which gets done when the feature is checked to be working on the msvc build (see the gssapi or the xml stuff for example), since we don't have ./configure there. There's no point in automatically adding the #undef rows if there is no system that changes them into #defines when they're used. //Magnus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On Jul 23, 2007, at 1:40 AM, Simon Riggs wrote: Potential solution: Each autovac worker gets a range of table sizes they are allowed to VACUUM. Anyone putting thought into this should check the archives; there was quite a bit of discussion around it. For 8.3 we decided to KISS so that we'd get in the release, but we really do need to tackle the starvation issue for 8.4. -- Jim Nasby  [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On Jul 22, 2007, at 8:54 AM, Josh Berkus wrote: Tom, Note to all: we ***HAVE TO*** settle on some reasonable default vacuum_cost_delay settings before we can ship 8.3. With no cost delay and two or three workers active, 8.3's autovac does indeed send performance into the tank. I've been using 20ms for most of my setups. That's aimed at reducing autovac to almost no impact at all, but taking a long time. Maybe 10ms? I've found 20ms to be a pretty good number for run-of-the-mill IO capability, and 10ms to be good for a good RAID setup (RAID10, 8+ drives, BBU). For a default setting, I think it'd be better to lean towards 20ms. -- Jim Nasby  [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Oops in fe-auth.c
Magnus Hagander wrote: On Mon, Jul 23, 2007 at 10:28:57AM -0400, Tom Lane wrote: Given this, I'll go ahead and fix fe-connect to support PQExpBuffers, unless there are any objections. I'm not against that, but I question what bug you've really found. I never actually tested if it crashes on mingw, but looking some more at it it really should - once one of these errors happen. Hm. Much easier than that - the code is new in HEAD. 8.2 did fprintf(stderr). And HEAD still does that in at least one case. Anyway, I'll go ahead with the patch I wrote since it does Seem Nicer to actually use the PQexpbuffer code there, and the patch was rather trivial, but it's certainly not something to backpatch then... I also found at least one other place in libpq where it still does fprintf(stderr). That should probably be fixed at the same time, right? //Magnus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Pavan Deolasee wrote: On 7/23/07, Alvaro Herrera [EMAIL PROTECTED] wrote: Certainly it doesn't prevent starvation completely -- really there is no way to completely prevent starvation unless you have as many workers as you have tables, and one disk for each. What DSM does do is let the big tables be vacuumed quickly which makes most of the problem go away. Frankly I haven't looked at the DSM results very closely, but DSM can help us avoid full heap scans (and that's a big thing!), but it can't avoid the associated index scans and that might limit our ability to vacuum very large tables frequently. I haven't seen DSM either so IMBFoS. You are right about index scans though. Fortunately they are not as expensive as they used to be thanks to Heikki's changes to allow physical order scanning. -- Alvaro Herrera  http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Oops in fe-auth.c
Magnus Hagander [EMAIL PROTECTED] writes: I also found at least one other place in libpq where it still does fprintf(stderr). That should probably be fixed at the same time, right? Yeah, we should be using the error message buffer if at all possible. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Oops in fe-auth.c
Magnus Hagander [EMAIL PROTECTED] writes: I never actually tested if it crashes on mingw, but looking some more at it it really should - once one of these errors happen. Hm. Much easier than that - the code is new in HEAD. 8.2 did fprintf(stderr). And HEAD still does that in at least one case. Anyway, I'll go ahead with the patch I wrote since it does Seem Nicer to actually use the PQexpbuffer code there, and the patch was rather trivial, but it's certainly not something to backpatch then... It does look like there is a risk in 8.2 and before, though: the fe-auth.c code has a lot of snprintf's with PQERRORMSG_LENGTH, which should all be INITIAL_EXPBUFFER_SIZE according to that header comment. snprintf typically doesn't write more than it has to, but if there ever were a message exceeding INITIAL_EXPBUFFER_SIZE we'd be at risk of a memory clobber. So that should be changed as far back as it does that. Do you want to take care of it? I can if you don't want to. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Oops in fe-auth.c
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: I never actually tested if it crashes on mingw, but looking some more at it it really should - once one of these errors happen. Hm. Much easier than that - the code is new in HEAD. 8.2 did fprintf(stderr). And HEAD still does that in at least one case. Anyway, I'll go ahead with the patch I wrote since it does Seem Nicer to actually use the PQexpbuffer code there, and the patch was rather trivial, but it's certainly not something to backpatch then... It does look like there is a risk in 8.2 and before, though: the fe-auth.c code has a lot of snprintf's with PQERRORMSG_LENGTH, which should all be INITIAL_EXPBUFFER_SIZE according to that header comment. snprintf typically doesn't write more than it has to, but if there ever were a message exceeding INITIAL_EXPBUFFER_SIZE we'd be at risk of a memory clobber. So that should be changed as far back as it does that. Do you want to take care of it? I can if you don't want to. Oh, didn't realize that one. I can take a look at that as well, once I'm done with this one. Seems easy enough - I'll leave you to focus on the more difficult stuff :-) //Magnus ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Full page images in WAL Cache Invalidation
Simon Riggs wrote: On Sun, 2007-07-22 at 19:58 +0200, Florian G. Pflug wrote: Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: I'm currently working on correctly flushing the catalog/relation/smgr caches on a readonly PITR slave during recovery. I don't believe there is any workable solution to that short of logging cache-flush operations in WAL. The reason that I dislike WAL-logging of the flush operations so much is that, since people are concerned about the amount of WAL traffic postgres generates, such a solution would introduce yet another GUC. And to make this reasonably foolproof, the slave would need a way to detect if that GUC is set correctly on the master. All in all, that seems to be quite hackish... Seems like we should WAL log flush operations first. It's fairly straightforward to do that and we can then measure its effect on the primary easily enough. Your other suggestions seem much more complex. I think we have a reasonable tolerance for increases in WAL and as you said earlier, we may balance that out with other optimisations. Or we may find a more efficient way of doing it later. Let's aim to get that first query running, then go back and tune it later. I've so far added an LWLock that makes replay and queries mutually exclusive. Simple testcases seem to work, but I haven't really beaten the system yet... Of course, my current version falls over as soon as you do DDL on the master - working on fixing that, and on subsequently removing that lock again :-) greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [GENERAL] 8.2.4 signal 11 with large transaction
Sibte Abbas [EMAIL PROTECTED] writes: I think printing the first 1K would make more sense. If I understand you correctly, the code path which you are referring to is the send_message_to_server_log() function in elog.c? No, the place that has to change is where errstart() detects that we're recursing. We could possibly have it first try to make a shorter string and only give up entirely if recursion happens again, but given that this is such a corner case I don't think it's worth the complexity and risk of further bugs. I've made it just drop the statement at the same time that it decides to give up on printing other context (which can also be a source of out-of-memory problems btw). http://archives.postgresql.org/pgsql-committers/2007-07/msg00215.php regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Jim Nasby wrote: On Jul 22, 2007, at 8:54 AM, Josh Berkus wrote: Tom, Note to all: we ***HAVE TO*** settle on some reasonable default vacuum_cost_delay settings before we can ship 8.3. With no cost delay and two or three workers active, 8.3's autovac does indeed send performance into the tank. I've been using 20ms for most of my setups. That's aimed at reducing autovac to almost no impact at all, but taking a long time. Maybe 10ms? I've found 20ms to be a pretty good number for run-of-the-mill IO capability, and 10ms to be good for a good RAID setup (RAID10, 8+ drives, BBU). For a default setting, I think it'd be better to lean towards 20ms. OK, 20ms it is then. Here is a patch. I am taking the liberty to also lower the vacuum and analyze threshold default values to 50, per previous discussion. -- Alvaro Herrera  http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.

? msg
? src/tools/entab/entab
? src/tools/entab/entab.fix.diff
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.405
diff -c -p -r1.405 guc.c
*** src/backend/utils/misc/guc.c    10 Jul 2007 13:14:21 -0000    1.405
--- src/backend/utils/misc/guc.c    23 Jul 2007 17:50:01 -0000
*************** static struct config_int ConfigureNamesI
*** 1348,1354 ****
          GUC_UNIT_MS
      },
      &autovacuum_vac_cost_delay,
!     -1, -1, 1000, NULL, NULL
  },
  {
--- 1348,1354 ----
          GUC_UNIT_MS
      },
      &autovacuum_vac_cost_delay,
!     20, -1, 1000, NULL, NULL
  },
  {
*************** static struct config_int ConfigureNamesI
*** 1656,1662 ****
          NULL
      },
      &autovacuum_vac_thresh,
!     500, 0, INT_MAX, NULL, NULL
  },
  {
      {"autovacuum_analyze_threshold", PGC_SIGHUP, AUTOVACUUM,
--- 1656,1662 ----
          NULL
      },
      &autovacuum_vac_thresh,
!     50, 0, INT_MAX, NULL, NULL
  },
  {
      {"autovacuum_analyze_threshold", PGC_SIGHUP, AUTOVACUUM,
*************** static struct config_int ConfigureNamesI
*** 1664,1670 ****
          NULL
      },
      &autovacuum_anl_thresh,
!     250, 0, INT_MAX, NULL, NULL
  },
  {
      /* see varsup.c for why this is PGC_POSTMASTER not PGC_SIGHUP */
--- 1664,1670 ----
          NULL
      },
      &autovacuum_anl_thresh,
!     50, 0, INT_MAX, NULL, NULL
  },
  {
      /* see varsup.c for why this is PGC_POSTMASTER not PGC_SIGHUP */
Index: src/backend/utils/misc/postgresql.conf.sample
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/postgresql.conf.sample,v
retrieving revision 1.219
diff -c -p -r1.219 postgresql.conf.sample
*** src/backend/utils/misc/postgresql.conf.sample    10 Jul 2007 13:14:21 -0000    1.219
--- src/backend/utils/misc/postgresql.conf.sample    23 Jul 2007 17:50:01 -0000
***************
*** 381,389 ****
  #log_autovacuum = -1            # -1 is disabled, 0 logs all actions
                      # and their durations, > 0 logs only
                      # actions running at least N msec.
! #autovacuum_vacuum_threshold = 500    # min # of tuple updates before
                      # vacuum
! #autovacuum_analyze_threshold = 250    # min # of tuple updates before
                      # analyze
  #autovacuum_vacuum_scale_factor = 0.2    # fraction of rel size before
                      # vacuum
--- 381,389 ----
  #log_autovacuum = -1            # -1 is disabled, 0 logs all actions
                      # and their durations, > 0 logs only
                      # actions running at least N msec.
! #autovacuum_vacuum_threshold = 50    # min # of tuple updates before
                      # vacuum
! #autovacuum_analyze_threshold = 50    # min # of tuple updates before
                      # analyze
  #autovacuum_vacuum_scale_factor = 0.2    # fraction of rel size before
                      # vacuum
***************
*** 391,397 ****
                      # analyze
  #autovacuum_freeze_max_age = 200000000    # maximum XID age before forced vacuum
                      # (change requires restart)
! #autovacuum_vacuum_cost_delay = -1    # default vacuum cost delay for
                      # autovacuum, -1 means use
                      # vacuum_cost_delay
  #autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for
--- 391,397 ----
                      # analyze
  #autovacuum_freeze_max_age = 200000000    # maximum XID age before forced vacuum
                      # (change requires restart)
! #autovacuum_vacuum_cost_delay = 20    # default vacuum cost delay for
                      # autovacuum, -1 means use
                      # vacuum_cost_delay
  #autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Alvaro Herrera [EMAIL PROTECTED] writes: OK, 20ms it is then. Here is a patch. I am taking the liberty to also lower the vacuum and analyze threshold default values to 50, per previous discussion. Patch probably needs to touch docs (config.sgml at least) too? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] supporting 0x00 from client Unicode JDBC
In reference to http://archives.postgresql.org/pgsql-jdbc/2007-02/msg00116.php Which backend developers can assist to modify the server to allow a modified UTF8? Thanks cody ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] supporting 0x00 from client Unicode JDBC
Cody Bennett wrote: In reference to http://archives.postgresql.org/pgsql-jdbc/2007-02/msg00116.php Which backend developers can assist to modify the server to allow a modified UTF8? I suspect you'll need divine intervention ... cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] 8.2.4 signal 11 with large transaction
On 7/23/07, Tom Lane [EMAIL PROTECTED] wrote: No, the place that has to change is where errstart() detects that we're recursing. We could possibly have it first try to make a shorter string and only give up entirely if recursion happens again, but given that this is such a corner case I don't think it's worth the complexity and risk of further bugs. I've made it just drop the statement at the same time that it decides to give up on printing other context (which can also be a source of out-of-memory problems btw). http://archives.postgresql.org/pgsql-committers/2007-07/msg00215.php Makes sense. regards, -- Sibte Abbas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
On Sat, 2007-07-21 at 12:20 +0100, Simon Riggs wrote: I'd like to help where I can if nobody else is currently doing this. I would focus initially on some analysis of the various use cases to give a better view on what we would need B-tree, clustered indexes and bitmap indexes to do for us. I've done some further analysis of bitmap indexes in preparation for a comparison with clustered indexes (GIT), to help understand the use cases for each. Overall, my conclusion is that BMI and GIT have separate use cases, almost opposite use cases or at least orthogonal ones. I would eventually like both. BMI optimises for high numbers of rows per value, whilst GIT optimises for clustering of values. BMI is not useful at all for PKs, whilst GIT is specifically designed to handle them. Both handle INSERTs well, though GIT handles growing numbers of values easily, BMI prefers to keep the distribution more constant. GIT needs HOT to continue to operate effectively for long periods, whereas BMI doesn't seem to handle UPDATEs well at all (but more testing required on that one). --- Neither the latest bitmap index nor the latest GIT patch applied cleanly. The bitmap patch was close, but GIT needs an update yet to integrate Alexey's recent work. My test case was a table with 10 million rows, with columns with varying numbers of unique values. So Ndistinct = 100 means 100,000 rows per value.

BITMAP INDEXES

Ndistinct    Best time    Size in blocks
1            10.6s        100
10           10.4s        102
100          11.7s        2002
1000         15.1s        6006
10000        19.8s        10046
100000       82.1s        100442
1000000      -            -

Size exactly equivalent for both Integer and Text (same values). Build time was similar also. The test for 1 million distinct values didn't return after over 4 CPU minutes expended with the disk going crazy. After a number of minutes I decided to cancel the index build. Multiple cancels didn't stop the build, so after some more time I decided to kill it, which then crashed the server.
Automatic restart crashed as well with a "could not find transaction id 0" error. Clearly some WAL-weirdness to investigate... Overall, I'd have to say that's quite enough for me to say bitmap is not quite ready yet without clear health warnings. I had hopes...

B-TREE INDEXES (Integers)

Rows/value   Best time   Size in blocks
1000         49s         21899
100          49s         21899
10           49s         21899
1            47s         21899
1000         43s         21899
100          38s         21899
10           38s         21899
1            33s         21899

Build time for Integers shown. Build time for Text ~x5-6 times as long. Testing against equivalent b-tree builds, the fastest b-tree build I could get was 33s on a unique integer index. So BMI build time is certainly optimised for low numbers of distinct values, but doesn't have any optimisation for when the BMI is built on a poor candidate column. GIT does degrade down to a normal b-tree when clustering isn't sufficient to give reduction in index size. The cross-over point was between 10^4 and 10^5 distinct values for both size and build time; on that test around 100-1000 rows per value. So BMIs are probably still useful with varying number of rows per value, but overall high Ndistinct proves inefficient in both build time and space allocation. This isn't such a surprise since we know that b-tree build uses a sort-based plan whereas BMI uses a hash based plan; neither will win all of the time, we know that from the executor. GIT works well even with unique indexes, since each grouped tuple covers a range of values. I'll re-run the tests when I can to get timings. GIT can compress typically down to 1-5% with clustered data, not quite as good as bitmap's 0.5% best. GIT's design was to have an index that was tuned for clustered data, yet degrades cleanly to a standard b-tree when conditions are not right. This makes me think that a hybrid b-tree should be possible, even desirable.
When the data is clustered, use the grouping technique to reduce the number of tuples stored, and when the data is highly non-unique use the bitmap technique to reduce numbers of tuples. Using both techniques in the same index would offer even wider flexibility, since we'd be able to cater for real-world data more easily. Both GIT and BMI use bitmaps, just in different ways. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Reviewing new index types (was Re: [PATCHES] Updated bitmap index patch)
Simon Riggs [EMAIL PROTECTED] writes: ... BMI is not useful at all for PKs, whilst GIT is specifically designed to handle them. This seems a strange statement, because GIT doesn't look particularly efficient for unique indexes AFAICS. In the worst case you'd have to look individually at each tuple on a heap page to check for a uniqueness conflict (no binary search, because you couldn't assume they are ordered).

B-TREE INDEXES (Integers)

Rows/value  Best time  Size in blocks
1000        49s        21899
100         49s        21899
10          49s        21899
1           47s        21899
1000        43s        21899
100         38s        21899
10          38s        21899
1           33s        21899

Surely the GIT code failed to kick in at all here? That's just about exactly the index size I'd expect for 10 million integers with the existing btree code (at least when MAXALIGN=4). regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Reviewing new index types (was Re: [PATCHES] Updated bitmap index patch)
On Mon, 2007-07-23 at 17:19 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: ... BMI is not useful at all for PKs, whilst GIT is specifically designed to handle them. This seems a strange statement, because GIT doesn't look particularly efficient for unique indexes AFAICS. In the worst case you'd have to look individually at each tuple on a heap page to check for a uniqueness conflict (no binary search, because you couldn't assume they are ordered).

That is one of a few heuristics about the patch that need some active discussion, so I'm glad you asked. The main use case is nearly-unique data, e.g. where we have a Master:Detail relationship such as Order:OrderItem. The Order index is a PK, with the OrderItem index as a nearly-unique key. The index is not brilliant for the Order index, but is good for the OrderItem index.

Heikki designed the grouping so that there is a state change between grouped and non-grouped (normal) index entries. By default the patch switches from non-grouped to grouped at N=2 index entries, and then places no limit on the number of rows/block. Currently you can tune N, but we might also envisage setting a limit on the width of the range of values, to limit the number of tids stored in a grouped index entry. That could control the uniqueness overhead. On an I/O-bound workload the space saving on the index outweighs the CPU loss from uniqueness checking. When I/O is not an issue then unfortunately there is a CPU overhead. For GIT it would appear that the summary is: a slight loss on medium-sized PK indexes and an increasing win as index size increases. We struggled to come up with ways of making it Just Work with as few parameters as possible.

B-TREE INDEXES (Integers)

Rows/value  Best time  Size in blocks
1000        49s        21899
100         49s        21899
10          49s        21899
1           47s        21899
1000        43s        21899
100         38s        21899
10          38s        21899
1           33s        21899

Surely the GIT code failed to kick in at all here?
That's just about exactly the index size I'd expect for 10 million integers with the existing btree code (at least when MAXALIGN=4). That was the b-tree test, i.e. the control. The GIT patch has suffered bitrot, so I'm not able to test it just yet. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Why so many out-of-disk-space failures on buildfarm machines?
On 7/18/07, Tom Lane [EMAIL PROTECTED] wrote: Andrew Dunstan [EMAIL PROTECTED] writes: I don't think we're ever going to fix things for the 7.3 error you're getting - please take it out of your rotation. 7.3 isn't quite as dead as Joshua suggested earlier, but it's certainly on life support. I checked the CVS logs and it appears that we fixed several contrib modules, not only cube, to work with flex 2.5.31 during the 7.4 devel cycle. I don't think anyone cares to back-port that much work. Our position should be that if you want to build 7.3, you need flex 2.5.4 to do it. If Mark still wants to test 7.3, he could install flex 2.5.4 someplace and make sure that's first in the PATH while building 7.3. I have flex 2.5.33 on the system, but I have decided to take the easy way out and have removed 7.3 from my rotation. Regards, Mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: OK, 20ms it is then. Here is a patch. I am taking the liberty to also lower the vacuum and analyze threshold default values to 50, per previous discussion. Patch probably needs to touch docs (config.sgml at least) too? That's right -- I was happy because I checked maintenance.sgml and catalog.sgml and they didn't need any update; I forgot to check config.sgml.

-- Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.131
diff -c -p -r1.131 config.sgml
*** doc/src/sgml/config.sgml	18 Jul 2007 12:00:47 -0000	1.131
--- doc/src/sgml/config.sgml	24 Jul 2007 00:22:46 -0000
*************** SELECT * FROM parent WHERE key = 2400;
*** 3247,3253 ****
       <para>
        Specifies the minimum number of updated or deleted tuples needed
        to trigger a <command>VACUUM</> in any one table.
!       The default is 500 tuples.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
        This setting can be overridden for individual tables by entries in
--- 3247,3253 ----
       <para>
        Specifies the minimum number of updated or deleted tuples needed
        to trigger a <command>VACUUM</> in any one table.
!       The default is 50 tuples.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
        This setting can be overridden for individual tables by entries in
*************** SELECT * FROM parent WHERE key = 2400;
*** 3265,3271 ****
       <para>
        Specifies the minimum number of inserted, updated or deleted tuples
        needed to trigger an <command>ANALYZE</> in any one table.
!       The default is 250 tuples.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
        This setting can be overridden for individual tables by entries in
--- 3265,3271 ----
       <para>
        Specifies the minimum number of inserted, updated or deleted tuples
        needed to trigger an <command>ANALYZE</> in any one table.
!       The default is 50 tuples.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
        This setting can be overridden for individual tables by entries in
*************** SELECT * FROM parent WHERE key = 2400;
*** 3343,3350 ****
       <para>
        Specifies the cost delay value that will be used in automatic
        <command>VACUUM</> operations.  If <literal>-1</> is
!       specified (which is the default), the regular
        <xref linkend="guc-vacuum-cost-delay"> value will be used.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
        This setting can be overridden for individual tables by entries in
--- 3343,3351 ----
       <para>
        Specifies the cost delay value that will be used in automatic
        <command>VACUUM</> operations.  If <literal>-1</> is
!       specified, the regular
        <xref linkend="guc-vacuum-cost-delay"> value will be used.
+       The default value is 20 milliseconds.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
        This setting can be overridden for individual tables by entries in
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.405
diff -c -p -r1.405 guc.c
*** src/backend/utils/misc/guc.c	10 Jul 2007 13:14:21 -0000	1.405
--- src/backend/utils/misc/guc.c	24 Jul 2007 00:22:51 -0000
*************** static struct config_int ConfigureNamesI
*** 1348,1354 ****
  		GUC_UNIT_MS
  	},
  	&autovacuum_vac_cost_delay,
! 	-1, -1, 1000, NULL, NULL
  },
  
  {
--- 1348,1354 ----
  		GUC_UNIT_MS
  	},
  	&autovacuum_vac_cost_delay,
! 	20, -1, 1000, NULL, NULL
  },
  
  {
*************** static struct config_int ConfigureNamesI
*** 1656,1662 ****
  		NULL
  	},
  	&autovacuum_vac_thresh,
! 	500, 0, INT_MAX, NULL, NULL
  },
  {
  	{"autovacuum_analyze_threshold", PGC_SIGHUP, AUTOVACUUM,
--- 1656,1662 ----
  		NULL
  	},
  	&autovacuum_vac_thresh,
! 	50, 0, INT_MAX, NULL, NULL
  },
  {
  	{"autovacuum_analyze_threshold", PGC_SIGHUP, AUTOVACUUM,
*************** static struct config_int ConfigureNamesI
*** 1664,1670 ****
  		NULL
  	},
  	&autovacuum_anl_thresh,
! 	250, 0, INT_MAX, NULL, NULL
  },
  {
  	/* see varsup.c for why this is PGC_POSTMASTER not PGC_SIGHUP */
--- 1664,1670 ----
  		NULL
  	},
  	&autovacuum_anl_thresh,
! 	50, 0, INT_MAX, NULL, NULL
  },
  {
  	/* see varsup.c for why this is PGC_POSTMASTER not PGC_SIGHUP */
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Alvaro Herrera [EMAIL PROTECTED] writes: Alvaro Herrera [EMAIL PROTECTED] writes: I am taking the liberty to also lower the vacuum and analyze threshold default values to 50, per previous discussion. Did we also reach any consensus about lowering the percentage of dead tuples in a table before we trigger vacuum? I think 20% is way too high and 5% is saner. I actually think it would be better even lower but would be ok with 5%. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Gregory Stark wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Alvaro Herrera [EMAIL PROTECTED] writes: I am taking the liberty to also lower the vacuum and analyze threshold default values to 50, per previous discussion. Did we also reach any consensus about lowering the percentage of dead tuples in a table before we trigger vacuum? I think 20% is way too high and 5% is saner. I actually think it would be better even lower but would be ok with 5%. I think that 5-10% is reasonable; 20% is way too high. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] avoiding WAL logging in 8.3
Hi, I noticed that in 8.3 there are cases where we can avoid WAL logging. For example, 8.3's pgbench was modified to use TRUNCATE right before COPY. Is there any documentation which describes that kind of technique? If there's none, I would volunteer to create such documentation, since I think this is valuable information for DBAs who wish to migrate to 8.3. -- Tatsuo Ishii SRA OSS, Inc. Japan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
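[The TRUNCATE-before-COPY technique mentioned above can be sketched as follows; table and file names here are invented for illustration. In 8.3, a COPY into a table that was created or truncated in the same transaction can skip WAL logging when WAL archiving is not enabled, since the table's contents are fsync'd at commit instead.]

```sql
-- Hypothetical names; assumes an 8.3 server with WAL archiving disabled.
BEGIN;
TRUNCATE accounts;                       -- table storage is new to this transaction
COPY accounts FROM '/tmp/accounts.dat';  -- WAL can be skipped; data is fsync'd at commit
COMMIT;
```

[The same optimisation applies to COPY into a table created earlier in the same transaction, and to CREATE TABLE AS.]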
[HACKERS] autovacuum default parameters (was Re: 8.2 is 30% better in pgbench than 8.3)
Gregory Stark wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Alvaro Herrera [EMAIL PROTECTED] writes: I am taking the liberty to also lower the vacuum and analyze threshold default values to 50, per previous discussion. Did we also reach any consensus about lowering the percentage of dead tuples in a table before we trigger vacuum? I think 20% is way too high and 5% is saner. I actually think it would be better even lower but would be ok with 5%. We didn't, but while I agree with the idea, I think 5% is too low. I don't want autovacuum to get excessively aggressive. Is 10% not enough? How about the analyze scale factor, should we keep the current 10%? I have less of a problem with reducing it further since analyze is cheaper than vacuum. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] autovacuum default parameters (was Re: 8.2 is 30% better in pgbench than 8.3)
Alvaro Herrera wrote: Gregory Stark wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Alvaro Herrera [EMAIL PROTECTED] writes: I am taking the liberty to also lower the vacuum and analyze threshold default values to 50, per previous discussion. Did we also reach any consensus about lowering the percentage of dead tuples in a table before we trigger vacuum? I think 20% is way too high and 5% is saner. I actually think it would be better even lower but would be ok with 5%. We didn't, but while I agree with the idea, I think 5% is too low. I don't want autovacuum to get excessively aggressive. Is 10% not enough? It depends, really. 10% on a small table seems like a waste, except that small tables are quick to vacuum. 10% on a table with 20 million rows is a lot of dead rows. Joshua D. Drake How about the analyze scale factor, should we keep the current 10%? I have less of a problem with reducing it further since analyze is cheaper than vacuum. -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
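[To put concrete numbers on the point above: autovacuum triggers a VACUUM on a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A quick back-of-envelope query, assuming the proposed base threshold of 50:]

```sql
-- Dead tuples needed before autovacuum fires on a 20-million-row table,
-- at the scale factors under discussion (threshold + scale_factor * reltuples)
SELECT 50 + 0.20 * 20000000 AS at_20_percent,  -- 4000050 dead rows
       50 + 0.10 * 20000000 AS at_10_percent,  -- 2000050 dead rows
       50 + 0.05 * 20000000 AS at_5_percent;   -- 1000050 dead rows
```

[On large tables the base threshold is noise and the scale factor dominates, which is why the percentage matters far more there than the 500-to-50 change.]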
[HACKERS] GucContext of log_autovacuum
The GucContext of log_autovacuum is PGC_BACKEND in CVS HEAD, but shouldn't it be PGC_SIGHUP? We cannot modify the variable on the fly, because the parameter is used only by autovacuum worker processes. Similar variables, like autovacuum_vacuum_scale_factor, are defined as PGC_SIGHUP.

Index: src/backend/utils/misc/guc.c
===================================================================
--- src/backend/utils/misc/guc.c	(head)
+++ src/backend/utils/misc/guc.c	(working copy)
@@ -1552,7 +1552,7 @@
 	},
 	{
-		{"log_autovacuum", PGC_BACKEND, LOGGING_WHAT,
+		{"log_autovacuum", PGC_SIGHUP, LOGGING_WHAT,
 			gettext_noop("Sets the minimum execution time above which autovacuum actions will be logged."),
 			gettext_noop("Zero prints all actions. The default is -1 (turning this feature off)."),

Regards, --- ITAGAKI Takahiro NTT Open Source Software Center ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] autovacuum default parameters (was Re: 8.2 is 30% better in pgbench than 8.3)
Alvaro Herrera [EMAIL PROTECTED] wrote: We didn't, but while I agree with the idea, I think 5% is too low. I don't want autovacuum to get excessively aggressive. Is 10% not enough? I think the threshold should be a little less than the PCTFREE of indexes, to avoid splitting of btree leaves. It might be good to decrease the fillfactor to 85% or so if we choose 10% for it. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
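[The suggestion above could be expressed, for a 10% dead-tuple trigger, as building btree indexes with extra free space so vacuum fires before leaf pages fill and split. A hypothetical sketch (index, table, and column names invented; the fillfactor storage parameter for btree indexes exists since 8.2):]

```sql
-- Leave 15% free space per leaf page, comfortably above a 10% dead-tuple
-- autovacuum trigger, so page splits are avoided between vacuums.
CREATE INDEX orderitem_order_idx ON orderitem (order_id)
    WITH (fillfactor = 85);
```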