Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
On Tue, 2006-08-15 at 18:42 -0400, Tom Lane wrote:
> I wrote:
> > It'd definitely be nicer that way, but given the current limitations of bootstrap mode I see no non-kluge way to make a built-in function have OUT parameters. (Hint: array_in doesn't work in bootstrap mode.)
> Actually, that turns out not to be so hard to fix as I thought. array_in only needs to work for the array types used in the core system tables, and bootstrap.c already has a hard-wired table of that info ... we only have to make it available to array_in. Which I just did.

Cool; I'd noticed that this would have been the first such function.

> So let's fix pg_xlogfile_name_offset() to have two OUT parameters instead of returning a smushed-together string.

I'll do this, but I'm conscious that this is a cosmetic change. I'm going on vacation very soon now, so test reports of the major functionality would be greatly appreciated.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] WIP: bitmap indexes
On 8/15/06 6:18 AM, Tom Lane [EMAIL PROTECTED] wrote:
> Gavin Sherry [EMAIL PROTECTED] writes:
> > On Mon, 14 Aug 2006, Tom Lane wrote:
> > > Correct me if I'm wrong, but isn't the patch's present hacking on the executor intended to make it happen like this?
> > Not really. It reads ahead on the bitmap index and passes back the bitmap words. The other executor routines are hacked up to process the data in this format.
> Well, as I said, I don't think there's justification for exposing a bitmap index's internal data formats to the rest of the system like that: it's not very future-proof and I don't see that it's buying any significant performance gain. At some point you have to convert to TIDs anyway, at least in the sense of knowing what page and line number you are at, so passing the data as an array of TIDs really isn't going to hurt much. So my advice is to rip out all those changes and go back to the existing tidbitmap.c readout API. There's nothing wrong with the TBMIterateResult data structure.

The bitmap words in the bitmap index are very simple and can be very generic. You can think of them as one bit per tuple, along with some padding bits between heap pages. The problem I have is that I do not know a good way to construct an in-memory version of this for other index structures, like b-tree. To be able to handle both cases nicely, you are right -- TBMIterateResult is better. Or PagetableEntry may be better, since it will make AND/OR easier.

> What I do find interesting to think about is whether, strictly within tidbitmap.c, there could be an alternate kind of bitmap object that doesn't have to materialize the whole bitmap for an indexscan in memory because it knows it can fetch the data on-demand, ie, build the next page TBMIterateResult data structure on-the-fly from the index when it's requested. Call it a stream bitmap in contrast to the present materialized bitmaps. The trick here is to be able to AND and OR a stream bitmap with another stream bitmap or a materialized bitmap. I don't see any reason in principle why that couldn't be done: the output of the AND/OR would be a stream of TBMIterateResults just like the inputs. That is, it's another stream bitmap, but with a different generating function and some internal state that includes its source bitmaps. You'd have to sort a materialized bitmap into order before starting to AND/OR it with a stream bitmap, but that code is there already.

I like this idea. I think we can define a new TBMStatus, TBM_STREAM, in TIDBitmap. The *getmulti functions will remain the same, except that we add a new boolean output argument stating whether this is a stream bitmap. If it is a stream bitmap, nodeBitmapIndexScan simply fills spages and passes it upstream. When nodeBitmapAnd or nodeBitmapOr ANDs/ORs several bitmaps, the result bitmap is a stream bitmap if at least one input bitmap is a stream bitmap. Then we add another loop in nodeBitmapHeapscan to be able to pull more data from its subnode.

Thanks,
Jie
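For concreteness, the signature change being floated here might look as follows. The first declaration matches the 8.2-era amgetmulti entry point as I understand it; the second, with the added is_stream flag, is purely a hypothetical sketch of Jie's suggestion:

    /* existing access-method entry point (8.2-era API) */
    bool
    amgetmulti(IndexScanDesc scan,
               ItemPointer tids, int32 max_tids,
               int32 *returned_tids);

    /* hypothetical variant: same contract, plus a flag telling the
     * caller whether it has been handed stream-bitmap data */
    bool
    amgetmulti(IndexScanDesc scan,
               ItemPointer tids, int32 max_tids,
               int32 *returned_tids, bool *is_stream);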
Re: [HACKERS] Enum proposal / design
We forgot to mention that we'll need to implement domains over enums, and arrays of enums, too.

cheers

andrew

Tom Dunstan wrote:
> Hi guys
>
> Andrew and I got together and worked out a more detailed idea of how we want to add enums to the postgresql core. This follows on from his original enumkit prototype last year [1]. Here's a more formal proposal / design with what we came up with. Comments / criticism hereby solicited.
>
> How they will work (once created) is more or less the same as last time with the enumkit, with the exception of how they're created. Enum types will be created with a specialised version of the CREATE TYPE command, thusly:
>
> CREATE TYPE rgb AS ENUM ('red', 'green', 'blue');
>
> They can then be used as column types, being input in quoted string form as with other user types:
>
> CREATE TABLE enumtest (col rgb);
> INSERT INTO enumtest VALUES ('red');
>
> Input is to be case sensitive, and ordering is to be in the definition order, not the collation order of the text values (ie 'red' < 'green' in the example above). See the original thread for more discussion and usage examples.
>
> The implementation will work as below. I've included something of a list of stuff to do as well.
>
> On disk, enums will occupy 4 bytes: the high 22 bits will be an enum identifier, with the bottom 10 bits being the enum value. This allows 1024 values for a given enum, and 2^22 different enum types, both of which should be heaps. The exact distribution of bits doesn't matter all that much; we just picked some that we were comfortable with. The identifier is required, as output functions are not fed information about which exact type they are being asked to format (see below).
>
> The creation of a new pg_enum catalog is required. This will hold:
> - the type OID for the enum, from pg_type
> - the enum identifier for on-disk storage
> - the enum values in definition order, as an array of text values
>
> The CREATE TYPE command will create a row in pg_type and a row in pg_enum. We will get a new enum id by scanning pg_enum and looking for the first unused value, rather than using a sequence, to make reuse of enum ids more predictable.
>
> Two new syscaches on pg_enum will be created to simplify lookup in the i/o functions: one indexed by type oid for the input function, and one indexed by enum id for the output function. All functions will be builtins; there will be no duplicate entries of them in pg_proc as was required for the enumkit.
>
> The i/o functions will both cache enum info in the same way that the domain and composite type i/o functions do, by attaching the data to the fcinfo->flinfo->fn_extra pointer. The input function will look up the enum data in the syscache using the type oid that it will be passed, and cache it in a hashtable or binary tree for easy repeated lookup. The output function will look up the enum data in the syscache using the enum id stripped from the high 22 bits of the on-disk value, and cache the data as a straight array for easy access, with the enum value being used as an index into the array.
>
> The other functions will all work pretty much like they did in the enumkit, with comparison operators more or less treating the enum as its integer representation.
>
> The grammar will have to be extended to support the new CREATE TYPE syntax. This should not require making ENUM a reserved word. Likewise psql will be extended to learn the new grammar. There's probably a bit of work to do in DROP TYPE to make sure it deletes rows from pg_enum when appropriate. pg_dump must be taught how to dump enums properly.
> We'll need some regression tests, maybe including one in one of the PL testsuites to ensure that the i/o functions work happily when called from a non-standard direction. Documentation etc.
>
> General discussion:
>
> While we would really like to have had a 2 byte representation on disk (or even 1 for most cases), with the stored value being *just* the enum ordinal and not containing any type info about the enum type itself, this is difficult. Since the output function cleanup [2] [3], postgresql doesn't pass through the expected output type to output functions. This makes it difficult to tell the difference between e.g. the first value of the various enums, which would all have an integer representation of 0. We could have gone down the path of having the output function look up its expected type from the fcinfo->flinfo struct, as Martijn's tagged types do [4], but that would have required extra entries in pg_proc for every single enum. Alternatively we could have stored the full enum type oid on disk, but that would have blown out the on-disk representation to 5 or 6 bytes. The given approach of having a smaller enum id and the enum ordinal value stored in the 4 bytes seems a reasonable tradeoff given the current constraints.
>
> To preempt some questions (particularly some which came up in the enumkit discussion), here's a
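For illustration, the proposed 22/10-bit split could be expressed with macros along these lines. This is a minimal sketch of the idea only; the names and exact layout are invented here, not part of the proposal:

    /* hypothetical packing for the proposed 4-byte enum representation */
    #define ENUM_VALUE_BITS   10
    #define ENUM_VALUE_MASK   ((1 << ENUM_VALUE_BITS) - 1)   /* 0x3FF */

    /* pack a 22-bit enum identifier and a 10-bit ordinal into one word */
    #define MakeEnumWord(id, ord)  (((uint32) (id) << ENUM_VALUE_BITS) | \
                                    ((uint32) (ord) & ENUM_VALUE_MASK))

    /* unpack again, e.g. in the output function */
    #define EnumWordGetId(w)       ((uint32) (w) >> ENUM_VALUE_BITS)
    #define EnumWordGetOrdinal(w)  ((uint32) (w) & ENUM_VALUE_MASK)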
[HACKERS] seahorse buildfarm issues
hi all!

seahorse has been struggling to submit buildfarm reports for a few days now. There seems to be a rather weird thing going on: what appears to happen is that the build fails during make check with a crashing postmaster, but that crash also hangs the buildfarm script, so it fails to submit a report. After that event there are no remaining processes in the task manager, but the OS still believes that some files (postgres.exe and some of the files used in the regression tests) are still in use. The only way to recover from that (at least that I found as a non-Windows person) is a reboot of the whole VM - this then results in a really large number of "application failed to initialize properly" errors during shutdown(!). The applications mentioned in the errors are more.exe, cmd.exe and diff.exe. Running the buildfarm script manually works just fine - which makes the whole issue even weirder.

The following is in the postmaster.log:

ERROR: invalid input syntax for type circle: "(3,(1,2),3)"
ERROR: date/time value "current" is no longer supported
ERROR: date/time field value out of range: "1997-02-29"
ERROR: invalid input syntax for type time with time zone: "15:36:39 America/New_York"
LOG: server process (PID 2016) exited with exit code -1073741502
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
(the WARNING/DETAIL/HINT block repeats for each terminated backend)

I'm out of ideas on how to go further in debugging this issue - any ideas (maybe from somebody who knows Windows better than I do)?

Stefan
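As an aside (my note, not part of Stefan's report): that exit code looks like a Windows NTSTATUS value. Reinterpreted as an unsigned 32-bit number it is 0xC0000142, i.e. STATUS_DLL_INIT_FAILED, which is exactly the "application failed to initialize properly" error seen at shutdown:

    #include <stdio.h>

    int main(void)
    {
        int exitcode = -1073741502;     /* from the postmaster.log above */

        /* prints 0xC0000142 = STATUS_DLL_INIT_FAILED */
        printf("0x%08X\n", (unsigned int) exitcode);
        return 0;
    }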
[HACKERS] cache reference leak and problem in alloc set warnings
Hi,

I've been trying to implement INOUT/OUT functionality in PL/scheme. When I return a record type tuple, the postmaster complains with the warnings below:

WARNING: problem in alloc set ExprContext: detected write past chunk end in block 0x8462f00, chunk 0x84634c8
WARNING: cache reference leak: cache pg_type (34), tuple 2/7 has count 1

I found a related thread in the ml archives where Joe Conway fixed a similar problem in one of his patches, but I couldn't figure out how he did it. Can somebody help me figure out the reasons for the above warnings and how I can fix them?

Regards.

P.S. Also, here's the backtrace of the stack just before the warnings are dumped. Yeah, it's a little bit useless 'cause there's pretty much only one way to reach these errors, but... I thought it might give an overview to hackers who take a quick look.

Breakpoint 2, AllocSetCheck (context=0x845ff58) at aset.c:1155
1155    elog(WARNING, "problem in alloc set %s: detected write past c
(gdb) where
#0  AllocSetCheck (context=0x845ff58) at aset.c:1155
#1  0x0829b728 in AllocSetReset (context=0x845ff58) at aset.c:407
#2  0x0829c958 in MemoryContextReset (context=0x845ff58) at mcxt.c:129
#3  0x0817dce5 in ExecResult (node=0x84a0754) at nodeResult.c:113
#4  0x0816b423 in ExecProcNode (node=0x84a0754) at execProcnode.c:334
#5  0x081698fb in ExecutePlan (estate=0x84a05bc, planstate=0x84a0754, operation=CMD_SELECT, numberTuples=0, direction=138818820, dest=0x84102ec) at execMain.c:1145
#6  0x0816888b in ExecutorRun (queryDesc=0x842c680, direction=ForwardScanDirection, count=138818820) at execMain.c:223
#7  0x08204a08 in PortalRunSelect (portal=0x842eae4, forward=1 '\001', count=0, dest=0x84102ec) at pquery.c:803
#8  0x08204762 in PortalRun (portal=0x842eae4, count=2147483647, dest=0x84102ec, altdest=0x84102ec, completionTag=0xbfc23cb0 "") at pquery.c:655
#9  0x082001e5 in exec_simple_query (query_string=0x840f91c "SELECT in_out_t_2(13, true);") at postgres.c:1004
#10 0x08202de5 in PostgresMain (argc=4, argv=0x83bd7fc, username=0x83bd7d4 "vy") at postgres.c:3184
#11 0x081d6b54 in BackendRun (port=0x83d21a8) at postmaster.c:2853
#12 0x081d636f in BackendStartup (port=0x83d21a8) at postmaster.c:2490
#13 0x081d455e in ServerLoop () at postmaster.c:1203
#14 0x081d39ca in PostmasterMain (argc=3, argv=0x83bb888) at postmaster.c:955
#15 0x0818d404 in main (argc=3, argv=0x83bb888) at main.c:187

Breakpoint 1, PrintCatCacheLeakWarning (tuple=0xb5ef7dbc) at catcache.c:1808
1808    Assert(ct->ct_magic == CT_MAGIC);
(gdb) where
#0  PrintCatCacheLeakWarning (tuple=0xb5ef7dbc) at catcache.c:1808
#1  0x0829e927 in ResourceOwnerReleaseInternal (owner=0x83da800, phase=RESOURCE_RELEASE_AFTER_LOCKS, isCommit=1 '\001', isTopLevel=0 '\0') at resowner.c:273
#2  0x0829e64c in ResourceOwnerRelease (owner=0x83da800, phase=RESOURCE_RELEASE_AFTER_LOCKS, isCommit=1 '\001', isTopLevel=0 '\0') at resowner.c:165
#3  0x0829dd8e in PortalDrop (portal=0x842eae4, isTopCommit=0 '\0') at portalmem.c:358
#4  0x082001f9 in exec_simple_query (query_string=0x840f91c "SELECT in_out_t_2(13, true);") at postgres.c:1012
#5  0x08202de5 in PostgresMain (argc=4, argv=0x83bd7fc, username=0x83bd7d4 "vy") at postgres.c:3184
#6  0x081d6b54 in BackendRun (port=0x83d21a8) at postmaster.c:2853
#7  0x081d636f in BackendStartup (port=0x83d21a8) at postmaster.c:2490
#8  0x081d455e in ServerLoop () at postmaster.c:1203
#9  0x081d39ca in PostmasterMain (argc=3, argv=0x83bb888) at postmaster.c:955
#10 0x0818d404 in main (argc=3, argv=0x83bb888) at main.c:187
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
On Wednesday, 16 August 2006 14:10, Robert Treat wrote:
> I'm not sure I follow this, since currently anyone can email the bugs list or use the bugs -> email form from the website. Are you looking to increase the barrier for bug reporting?

Only a small fraction of the new posts on pgsql-bugs are actually bugs. Most are confused or misdirected users. I don't want to raise that barrier. But I want a higher barrier before something is recorded in the bug tracking system.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
On Wednesday 16 August 2006 00:52, Peter Eisentraut wrote:
> Tom Lane wrote:
> > that the bug tracker would have to have a reasonable output email capability, but I'd not necessarily insist on being able to input to it by mail. Red Hat's present bugzilla system could be described that way --- and while I can't say I'm in love with it, I can deal with it.
> Bugzilla is good in that you need to sign up to report anything (or at least it can be configured that way, not sure), which might reduce the amount of noise. The other systems that have been mentioned have by design little or no barrier of entry, which doesn't seem to be what we want.

I'm not sure I follow this, since currently anyone can email the bugs list or use the bugs -> email form from the website. Are you looking to increase the barrier for bug reporting?

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
On Wed, Aug 16, 2006 at 02:28:53PM +0200, Peter Eisentraut wrote:
> On Wednesday, 16 August 2006 14:10, Robert Treat wrote:
> > I'm not sure I follow this, since currently anyone can email the bugs list or use the bugs -> email form from the website. Are you looking to increase the barrier for bug reporting?
> Only a small fraction of the new posts on pgsql-bugs are actually bugs. Most are confused or misdirected users. I don't want to raise that barrier. But I want a higher barrier before something is recorded in the bug tracking system.

Well, you need to get some agreement on what the bug tracker is for. Is it:

a) a front-end to deal with complaints and bugs people have. Is it something you expect end users to look at? This is how Debian uses its bug-tracker, to make sure issues people bring up don't get lost. You can always close the bug if it isn't a real bug.

Or:

b) a private bug database only used by -hackers to track known outstanding bugs and patches.

If you want the latter, the approach would be to keep pgsql-bugs and, when a real issue comes up, bounce it to the bug tracker. Any subsequent email discussion should then get logged in the bug report.

Have a nice day,
-- 
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
Simon Riggs [EMAIL PROTECTED] writes:
> On Tue, 2006-08-15 at 18:42 -0400, Tom Lane wrote:
> > So let's fix pg_xlogfile_name_offset() to have two OUT parameters instead of returning a smushed-together string.
> I'll do this, but I'm conscious that this is a cosmetic change.

Well, it's cosmetic, but it's also an API change, which means that this is our only opportunity to get it right. Once these functions are in a release it will be too hard to change them.

regards, tom lane
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
Martijn van Oosterhout wrote:
> Well, you need to get some agreement on what the bug tracker is for. Is it:
> a) a front-end to deal with complaints and bugs people have. Is it something you expect end users to look at? This is how Debian uses its bug-tracker, to make sure issues people bring up don't get lost. You can always close the bug if it isn't a real bug.
> Or:
> b) a private bug database only used by -hackers to track known outstanding bugs and patches.
> If you want the latter, the approach would be to keep pgsql-bugs and, when a real issue comes up, bounce it to the bug tracker. Any subsequent email discussion should then get logged in the bug report.

What we are talking about here is bug triage. Weeding out misreports, duplicates etc. is a prime part of this function. It is essential to the health of any functioning bug tracking system. All it takes is resources. Is it worth it? Yes, IMNSHO, but it's a judgement call.

One sensible way to do this would be to have a group of suitably qualified volunteers who could perform this function on a roster basis for, say, a week or two at a time. That way we could take the load off key personnel like Tom (I am in favor of anything which would reduce the demands we place on Tom ;-) )

cheers

andrew
Re: [HACKERS] [PATCHES] WIP: bitmap indexes
Jie Zhang [EMAIL PROTECTED] writes:
> On 8/15/06 6:18 AM, Tom Lane [EMAIL PROTECTED] wrote:
> > Well, as I said, I don't think there's justification for exposing a bitmap index's internal data formats to the rest of the system like that: it's not very future-proof and I don't see that it's buying any significant performance gain.
> The bitmap words in the bitmap index are very simple and can be very generic.

They're not generic in the least: there's a compression scheme involved that you might want to whack around at any time. So I disagree with the idea that it's OK to expose the format outside the access/bitmap/ module.

> I like this idea. I think that we can define a new TBMStatus to be TBM_STREAM in TIDBitmap.

It occurs to me that what tbm_begin_iterate really is is a constructor for a stream bitmap object that reads out the contents of a tbm bitmap (we need a decent name for the non-stream data structure ... maybe hash bitmap?). If we think of it like that then we can unify the ideas some more.

My proposal at this point would be to invent two different Node types, one for stream bitmaps and one for hash bitmaps. The initial input to nodeBitmapHeapscan can be either, but if it's given a hash bitmap then it stream-ifies it for use. amgetmulti can return either kind, and nodeBitmapAnd and nodeBitmapOr can use IsA tests to decide what to do.

Preserving the existing optimization for ORing hash bitmaps is a bit tricky, but I think it's doable. Consider this API for amgetmulti: amgetmulti takes an argument which can be either a hash bitmap or NULL. It returns an object that must be either a hash or stream bitmap. If it wants to return a stream bitmap, it simply disregards the argument and returns a constructed stream-bitmap object. If it wants to return a hash bitmap, then if the argument is not NULL, OR the additional bits into the argument object and return it; if the argument is NULL, construct a fresh hash-bitmap object, set bits in it, and return it.

Assume that we have the existing hash-bitmap AND/OR functions as well as constructors for AND and OR stream bitmaps that take lists of input stream objects. Then the algorithm for nodeBitmapOr looks like this:

    HashBitmap *hashBitmap = NULL;
    List *streamBitmaps = NIL;

    foreach(input plan)
    {
        Node *newBitmap = amgetmulti(hashBitmap);

        if (IsA(newBitmap, HashBitmap))
        {
            // any OR-ing required was done implicitly
            hashBitmap = newBitmap;
        }
        else
        {
            Assert(IsA(newBitmap, StreamBitmap));
            streamBitmaps = lappend(streamBitmaps, newBitmap);
        }
    }

    if (streamBitmaps == NIL)
    {
        // all inputs returned hash, so we're done
        return hashBitmap;
    }
    else
    {
        // need a stream OR operation atop the inputs
        if (hashBitmap)
            streamBitmaps = lappend(streamBitmaps,
                                    HashToStreamBitmap(hashBitmap));
        return ConstructStreamOr(streamBitmaps);
    }

nodeBitmapAnd is a bit different but not any harder.

regards, tom lane
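To spell out the "not any harder" claim, here is one possible sketch of the nodeBitmapAnd analogue under the same proposed API. This is my reading, not part of the mail: ConstructStreamAnd is the assumed AND counterpart of ConstructStreamOr, and tbm_intersect stands in for the existing hash-bitmap AND. Since AND has no implicit-OR trick, amgetmulti is always passed NULL:

    HashBitmap *hashBitmap = NULL;
    List *streamBitmaps = NIL;

    foreach(input plan)
    {
        // no implicit OR-ing applies for AND, so pass NULL
        Node *newBitmap = amgetmulti(NULL);

        if (IsA(newBitmap, HashBitmap))
        {
            if (hashBitmap)
                tbm_intersect(hashBitmap, newBitmap);  // AND into running result
            else
                hashBitmap = newBitmap;
        }
        else
        {
            Assert(IsA(newBitmap, StreamBitmap));
            streamBitmaps = lappend(streamBitmaps, newBitmap);
        }
    }

    if (streamBitmaps == NIL)
    {
        // all inputs returned hash bitmaps, so the AND is already done
        return hashBitmap;
    }
    else
    {
        // need a stream AND operation atop the inputs
        if (hashBitmap)
            streamBitmaps = lappend(streamBitmaps,
                                    HashToStreamBitmap(hashBitmap));
        return ConstructStreamAnd(streamBitmaps);
    }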
Re: [HACKERS] cache reference leak and problem in alloc set warnings
On Aug 16 03:09, Volkan YAZICI wrote:
> WARNING: problem in alloc set ExprContext: detected write past chunk end in block 0x8462f00, chunk 0x84634c8
> WARNING: cache reference leak: cache pg_type (34), tuple 2/7 has count 1

Excuse me for bugging the list. I've solved the problem: I needed a matching ReleaseSysCache() call after every SearchSysCache().

Regards.
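For anyone hitting the same warning, the pattern is simply that each syscache lookup pins a tuple that must be unpinned again. A minimal sketch (using the 8.1-era SearchSysCache signature; typeOid is a placeholder):

    HeapTuple   tup;

    tup = SearchSysCache(TYPEOID, ObjectIdGetDatum(typeOid), 0, 0, 0);
    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for type %u", typeOid);

    /* ... use GETSTRUCT(tup) while the tuple stays pinned ... */

    ReleaseSysCache(tup);   /* without this, the reference-leak warning fires */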
Re: [HACKERS] Enum proposal / design
Tom Dunstan [EMAIL PROTECTED] writes:
> Andrew and I got together and worked out a more detailed idea of how we want to add enums to the postgresql core. This follows on from his original enumkit prototype last year [1]. Here's a more formal proposal / design with what we came up with. Comments / criticism hereby solicited.
> ...
> On disk, enums will occupy 4 bytes: the high 22 bits will be an enum identifier, with the bottom 10 bits being the enum value. This allows 1024 values for a given enum, and 2^22 different enum types, both of which should be heaps. The exact distribution of bits doesn't matter all that much, we just picked some that we were comfortable with.

I think this is excessive concern for bit-shaving. Make the on-disk representation be 8 bytes instead of 4; then you can store the OID directly and have no need for the separate identifier concept. This in turn eliminates one index, one syscache, and one set of lookup/cache routines. And you can have as many values of an enum as you darn please.

> The i/o functions will both cache enum info in the same way that the domain and composite type i/o functions do, by attaching the data to the fcinfo->flinfo->fn_extra pointer. The input function will look up the enum data in the syscache using the type oid that it will be passed, and cache it in a hashtable or binary tree for easy repeated lookup.

If you didn't notice already: typcache is the place to put any type-related caching you need to add.

regards, tom lane
[HACKERS] Vacuum verbose output
-hackers,

I've been working on a vacuum verbose log analyzer for a few weeks now. My goal is to allow a more systematic study of this log output to help improve the vacuum strategy. To do so, I spent a few hours studying the vacuum code and especially the way it logs what it's doing.

What surprises me is:
* the inconsistency between the display of the duration for lazy vacuum and vacuum full (especially the index cleanup included in the former case and not in the latter)
* the fact that quite a lot of work is done outside of the duration calculated and displayed in the log (the vacuum_heap/repair_frag step, for example).

The consequence is that it's really hard to figure out how much time elapsed for each table. Moreover, in the case of a vacuum analyze, the analyze operation is not included at all in the elapsed time, and we can't know how long it took for each table. I attached an analysis of how it works currently.

I'd like to be able to find in the log the time really elapsed for each operation. I'll probably be able to write a patch for this, but I'd like to know if anybody else thinks it's worth it and if anybody has advice/ideas on how to do it nicely.

Regards,

-- 
Guillaume

[attached analysis]
vacuum
    for each relation
        vacuum_rel
            full_vacuum_rel (if vacuum full)
                scan_heap
                    pg_rusage_init(ru0);
                    ereport(vacuuming tableX);
                    vacuum the table
                    ereport(all information + pg_rusage_show(ru0))
                for each index
                    vacuum_index
                        pg_rusage_init(ru0);
                        clean the index
                        ereport(all information + pg_rusage_show(ru0))
                check for tuple count mismatch
                vacuum_heap or repair_frag
                update free space map
                update statistics in pg_class
                report results to the stat collector
            lazy_vacuum_rel (if vacuum)
                open the indexes
                lazy_scan_heap
                    pg_rusage_init(ru0);
                    ereport(vacuuming tableX);
                    for each index
                        lazy_vacuum_index
                            pg_rusage_init(ru0);
                            clean the index
                            ereport(all information + pg_rusage_show(ru0))
                    lazy_vacuum_heap
                    ereport(all information + pg_rusage_show(ru0))
                close the indexes
                optionally truncate the relation
                update free space map
                update statistics in pg_class
                report results to the stat collector
            vacuum_rel(toast table if any) - same as above
        analyze_rel (if analyze)
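As a sketch of the kind of change being asked about (hypothetical, not an actual patch): the per-table elapsed time could be measured around the whole of vacuum_rel rather than only inside individual steps, using the same pg_rusage machinery the existing messages use. relname and elevel are placeholders here:

    PGRUsage    ru0;

    pg_rusage_init(&ru0);
    vacuum_rel(relid, vacstmt, RELKIND_RELATION);   /* full or lazy path */
    ereport(elevel,
            (errmsg("table \"%s\": total elapsed %s",
                    relname, pg_rusage_show(&ru0))));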
Re: [HACKERS] libpq Describe Extension [WAS: Bytea and perl]
Volkan YAZICI [EMAIL PROTECTED] writes:
> On Aug 16 11:37, Tom Lane wrote:
> > I think this viewpoint has pretty much carried the day, so the PQdescribe functions should remain separate. However, it still seems to me that it'd be a shame if PQdescribePrepared couldn't return the statement's output column types, seeing that the backend is going to pass that info to us anyway.
> I think you have a misunderstanding about the patch I previously sent. When you issue a PQdescribePrepared() call, in the first PQgetResult() call the returned PGresult will have the input parameter types of the prepared statement. And in the second PQgetResult() call, the returned PGresult will hold the statement's output column types.

[ raised eyebrow... ] You're right, I didn't understand that, and now that I do I find it completely unacceptable. We need exactly one PGresult per operation, or things just get too weird for clients to manage, particularly when considering async behavior. What you suggest is a *huge* violation of the principle of least surprise. Adding a couple more PGresult accessor functions seems far saner.

> Another possibility can be like this:
> PGresult *PQdescribePrepared(PGconn *conn, const char *stmt, Oid **argtypes);

No, because that doesn't work at all for the async case.

regards, tom lane
Re: [HACKERS] Enum proposal / design
Tom Lane wrote:
> Tom Dunstan [EMAIL PROTECTED] writes:
> > On disk, enums will occupy 4 bytes: the high 22 bits will be an enum identifier, with the bottom 10 bits being the enum value. This allows 1024 values for a given enum, and 2^22 different enum types, both of which should be heaps. The exact distribution of bits doesn't matter all that much, we just picked some that we were comfortable with.
> I think this is excessive concern for bit-shaving. Make the on-disk representation be 8 bytes instead of 4, then you can store the OID directly and have no need for the separate identifier concept. This in turn eliminates one index, one syscache, and one set of lookup/cache routines. And you can have as many values of an enum as you darn please.

That's all true. It's a bit depressing to think that IMO 99% of users of this will have enum values whose range would fit into 1 byte, but we'll be using 8 to store it on disk. I had convinced myself that 4 was ok on the basis that alignment issues in surrounding columns would pad out the remaining bits anyway much of the time. Was I correct in that assumption? Would e.g. an int after a char require 3 bytes of padding?

Ok, I'll run one more idea up the flagpole before giving up on a 4 byte on disk representation. :) How about assigning a unique 4 byte id to each enum value, and storing that on disk. This would be unique across the database, not per enum type. The structure of pg_enum would be a bit different, as the per-type enum id would be gone, and there would be multiple rows for each enum type. The columns would be: the type oid, the associated unique id and the textual representation.

That would probably simplify the caching mechanism as well, since input function lookups could do a straight syscache lookup on type oid and text representation, and the output function could do a straight lookup on the unique id. No need to muck around creating a little dynahash or whatever to attach to the fn_extra pointer. It does still require the extra syscache, but it removes the limitations on the number of enum types and the number of values per type while keeping the on-disk size smallish. I like that better than the original idea, actually.

> If you didn't notice already: typcache is the place to put any type-related caching you need to add.

I hadn't. I'll investigate. Thanks.

Cheers

Tom
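Expressed as plain SQL for illustration, the revised catalog would look something like this (column names invented here; the real thing would of course be declared in a pg_enum.h header, not created with SQL):

    CREATE TABLE pg_enum (
        enumtypid oid  NOT NULL,  -- owning enum type, from pg_type
        enumid    int4 NOT NULL,  -- database-wide unique id, stored on disk
        enumlabel text NOT NULL   -- textual representation
    );

    -- input function lookup: (type oid, label) -> id
    CREATE UNIQUE INDEX pg_enum_typid_label_index ON pg_enum (enumtypid, enumlabel);
    -- output function lookup: id -> label
    CREATE UNIQUE INDEX pg_enum_id_index ON pg_enum (enumid);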
Re: [HACKERS] libpq Describe Extension [WAS: Bytea and perl]
On Aug 16 11:37, Tom Lane wrote:
> Volkan YAZICI [EMAIL PROTECTED] writes:
> > On Aug 11 12:51, Greg Sabino Mullane wrote:
> > > Prepared statements are not visible nor survivable outside of your session, so this doesn't really make sense. If your application needs the information, it can get it at prepare time.
> > What about persistent connections? Actually, I can give lots of corner cases to support my idea but they're not that often used. I think, as long as we'll break compatibility, placing Describe facility in the PQprepare() is not the way to go.
> I think this viewpoint has pretty much carried the day, so the PQdescribe functions should remain separate. However, it still seems to me that it'd be a shame if PQdescribePrepared couldn't return the statement's output column types, seeing that the backend is going to pass that info to us anyway.

I think you have a misunderstanding about the patch I previously sent. When you issue a PQdescribePrepared() call, in the first PQgetResult() call the returned PGresult will have the input parameter types of the prepared statement. And in the second PQgetResult() call, the returned PGresult will hold the statement's output column types.

> So I propose storing the parameter type info in a new section of a PGresult struct, and adding new PGresult accessor functions PQnparams, PQparamtype (or maybe PQptype to follow the existing PQftype precedent more closely) to fetch the parameter type info. The existing functions PQnfields etc will fetch output-column info. Aside from being more functional, this definition maintains the principle of least surprise, in that the interpretation of a PGresult from Describe isn't fundamentally different from a PGresult from a regular query.

Another possibility can be like this:

PGresult *PQdescribePrepared(PGconn *conn, const char *stmt, Oid **argtypes);

A PQdescribePrepared() call will immediately return a PGresult (previously, we were just returning a boolean value that shows the result of the command send status) that holds the statement's output column types, and argtypes will be altered to point to an Oid array that has the input parameter type information. (By assigning a NULL value to argtypes, the user decides whether or not to receive input parameter types.)

> We also need async versions PQsendDescribePrepared and PQsendDescribePortal, as I mentioned before.

If you decide on the method to use, I'm volunteering to modify the existing patch. Waiting for your comments.

Regards.
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
On Wed, Aug 16, 2006 at 09:14:47AM -0400, Andrew Dunstan wrote:
> What we are talking about here is bug triage. Weeding out misreports, duplicates etc. is a prime part of this function. It is essential to the health of any functioning bug tracking system. All it takes is resources. Is it worth it? Yes, IMNSHO, but it's a judgement call.
> One sensible way to do this would be to have a group of suitably qualified volunteers who could perform this function on a roster basis for, say, a week or two at a time. That way we could take the load off key personnel like Tom (I am in favor of anything which would reduce the demands we place on Tom ;-) )

Actually, I'd bet we don't need to put such a formal system in place. I suspect that we'll have users actually looking at the incoming bugs and commenting if they're not valid. As we notice folks who are doing a good job of that, we can give them the privileges to mark bugs as invalid. In the meantime, I'd be glad to help out with 'weeding' incoming bug reports.

Depending on the bug tracking system, you can even just let people do this ad-hoc... bugzilla (for example) has an unconfirmed status for new bugs; it would just take people looking at all unconfirmed bugs and marking them appropriately.

-- 
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf  cell: 512-569-9461
[HACKERS] timing psql internal commands
I have just noticed that psql's \timing does not apply to internal commands like \copy, which surprised me a bit. Is there any reason why it should not apply, at least in the case of \copy, which after all does real work, as opposed to the client housekeeping and info functions that most psql internal commands perform?

cheers

andrew
Re: [HACKERS] Enum proposal / design
Tom Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > I think this is excessive concern for bit-shaving. Make the on-disk representation be 8 bytes instead of 4, then you can store the OID directly and have no need for the separate identifier concept.
> That's all true. It's a bit depressing to think that IMO 99% of users of this will have enum values whose range would fit into 1 byte, but we'll be using 8 to store it on disk. I had convinced myself that 4 was ok on the basis that alignment issues in surrounding columns would pad out the remaining bits anyway much of the time.

Right, and on a 64-bit machine the same frequently holds at the 8-byte level, so it's not real clear how much you're saving.

> Ok, I'll run one more idea up the flagpole before giving up on a 4 byte on disk representation. :) How about assigning a unique 4 byte id to each enum value, and storing that on disk. This would be unique across the database, not per enum type. The structure of pg_enum would be a bit different, as the per-type enum id would be gone, and there would be multiple rows for each enum type. The columns would be: the type oid, the associated unique id and the textual representation.

That seems not a bad idea. I had been considering complaining that the array-based catalog structure was denormalized, but refrained ... I like the fact that this approach makes it normalized.

Another thought is that this isn't really tied to any particular width of stored enum values. You could easily imagine a compile time switch to say you want 2-byte enums instead of 4. Or 8; or even 1. Even more radical: do it at runtime. You could assign the typlen (stored width) of an enum type at creation time on the basis of the largest identifier it contains. This might be a bit too weird because enums created earlier would have a size advantage over those created later, but if you are looking to shave space ...

That reminds me: were you intending to allow an ALTER ENUM operation to add (or remove, or rename) elements of an enum type? The above method would fail for the case where an ADD operation needed to assign an identifier wider than the type allowed for.

regards, tom lane
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
On Tue, Aug 15, 2006 at 10:43:12PM -0700, Josh Berkus wrote:
> Tom,
> > These days I doubt there's anyone around the project who refuses to use a web browser at all. However, I still personally find it much more convenient to read and respond to mailing-list postings than to have to go and visit random web pages to find out if there's something I need to know about. So my current take on this would be that the bug tracker would have to have a reasonable output email capability, but I'd not necessarily insist on being able to input to it by mail. Red Hat's present bugzilla system could be described that way --- and while I can't say I'm in love with it, I can deal with it.
> Actually, if that's the only objection it's solved. RT will now allow you to create, comment on, modify, and close bugs by e-mail. And the RT team would be thrilled to have us using it, in theory enough to provide some setup help. There's one thing that RT doesn't do by e-mail (can't remember offhand) but that's a TODO for them so it should be fixed soon. So, if the only real requirement for a bug tracker is that we can handle it 100% by e-mail, and integrate it with the pgsql-bugs list, that is possible.

Does Trac have similar capability? Reason I'm asking is that I think *eventually* it would be very useful to have trac's ability to link bugs, version control, wiki, etc. all together. I know it'll probably be quite some time before that happens, but I'm sure that if we go with RT it'll never happen.

-- 
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf  cell: 512-569-9461
Re: [HACKERS] libpq Describe Extension [WAS: Bytea and perl]
Volkan YAZICI [EMAIL PROTECTED] writes:
> On Aug 11 12:51, Greg Sabino Mullane wrote:
> > Prepared statements are not visible nor survivable outside of your session, so this doesn't really make sense. If your application needs the information, it can get it at prepare time.
> What about persistent connections? Actually, I can give lots of corner cases to support my idea but they're not that often used. I think, as long as we'll break compatibility, placing Describe facility in the PQprepare() is not the way to go.

I think this viewpoint has pretty much carried the day, so the PQdescribe functions should remain separate. However, it still seems to me that it'd be a shame if PQdescribePrepared couldn't return the statement's output column types, seeing that the backend is going to pass that info to us anyway.

So I propose storing the parameter type info in a new section of a PGresult struct, and adding new PGresult accessor functions PQnparams, PQparamtype (or maybe PQptype to follow the existing PQftype precedent more closely) to fetch the parameter type info. The existing functions PQnfields etc will fetch output-column info. Aside from being more functional, this definition maintains the principle of least surprise, in that the interpretation of a PGresult from Describe isn't fundamentally different from a PGresult from a regular query.

We also need async versions PQsendDescribePrepared and PQsendDescribePortal, as I mentioned before. Anyone have different suggestions for the names of these functions?

regards, tom lane
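To make the proposal concrete, client code might end up looking like this. A sketch only: PQnparams and PQparamtype are the accessor names proposed above, not existing libpq functions at this point, and "mystmt" is a placeholder statement name:

    PGresult   *res = PQdescribePrepared(conn, "mystmt");

    if (PQresultStatus(res) == PGRES_COMMAND_OK)
    {
        int         i;

        /* input parameter types, via the proposed accessors */
        for (i = 0; i < PQnparams(res); i++)
            printf("param %d: type oid %u\n", i, PQparamtype(res, i));

        /* output column types, via the existing accessors */
        for (i = 0; i < PQnfields(res); i++)
            printf("column %s: type oid %u\n",
                   PQfname(res, i), PQftype(res, i));
    }
    PQclear(res);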
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
Simon Riggs [EMAIL PROTECTED] writes:
> We want a single row output, with two columns, yes? Presumably:
> xlogfilename TEXT
> offset INTEGER

Sounds right to me. int4 should be wide enough for practical xlog segment sizes.

regards, tom lane
Re: [HACKERS] [PATCHES] selecting large result sets in psql using cursors
Chris Mair [EMAIL PROTECTED] writes:
> attached is the new and fixed version of the patch for selecting large result sets from psql using cursors.

The is_select_command bit is wrong because it doesn't allow for left parentheses in front of the SELECT keyword (something entirely reasonable when considering big union/intersect/except trees). Also you'd need to allow for VALUES as the first keyword. But isn't the whole thing unnecessary? ISTM you could just ship the query with the DECLARE CURSOR prepended, and see whether you get a syntax error or not.

At some point we ought to extend libpq enough to expose the V3-protocol feature that allows partial fetches from portals; that would be a cleaner way to implement this feature. However, since nobody has yet proposed a good API for this in libpq, I don't object to implementing \u with DECLARE CURSOR for now.

BTW, \u seems not to have any mnemonic value whatsoever ... isn't there some other name we could use?

regards, tom lane
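For reference, the rough shape of what psql might send under the hood with the DECLARE CURSOR approach could be the following (a sketch only; the cursor name, fetch size, and table are made up for illustration):

    BEGIN;
    DECLARE _psql_cursor NO SCROLL CURSOR FOR
        SELECT * FROM bigtable;
    FETCH FORWARD 100 FROM _psql_cursor;   -- repeated until no rows remain
    CLOSE _psql_cursor;
    COMMIT;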
Re: [HACKERS] Enum proposal / design
(I had a private bet with myself that Tom Lane would object to the bit shaving ;-) )

Tom Lane wrote:
> > Ok, I'll run one more idea up the flagpole before giving up on a 4 byte on disk representation. :) How about assigning a unique 4 byte id to each enum value, and storing that on disk. This would be unique across the database, not per enum type. The structure of pg_enum would be a bit different, as the per-type enum id would be gone, and there would be multiple rows for each enum type. The columns would be: the type oid, the associated unique id and the textual representation.
> That seems not a bad idea. I had been considering complaining that the array-based catalog structure was denormalized, but refrained ... I like the fact that this approach makes it normalized.
> Another thought is that this isn't really tied to any particular width of stored enum values. You could easily imagine a compile time switch to say you want 2-byte enums instead of 4. Or 8; or even 1. Even more radical: do it at runtime. You could assign the typlen (stored width) of an enum type at creation time on the basis of the largest identifier it contains. This might be a bit too weird because enums created earlier would have a size advantage over those created later, but if you are looking to shave space ...

I'm not sure I like either of these options. The configure option at least would make it too easy to break loading a dump from a db with a different compile time limit, and the runtime typlen stuff just seems messy. I'm inclined to say let's keep it simple and stay with a fixed 4-byte global size.

> That reminds me: were you intending to allow an ALTER ENUM operation to add (or remove, or rename) elements of an enum type? The above method would fail for the case where an ADD operation needed to assign an identifier wider than the type allowed for.

No, I think that's something of a footgun. We'd have to check every row to ensure we weren't orphaning some value. The workaround is to create a new enum type and then do alter table alter column type ... although I realise that could cause dependency problems too. Of course, people will be able to hack the catalog if they want to, but then it will be on their heads if things break - the intention is to treat these as essentially static - for dynamic stuff use a domain or a lookup table.

cheers

andrew
Re: [HACKERS] BugTracker
On Wed, Aug 16, 2006 at 11:12:11AM +0800, Christopher Kings-Lynne wrote:
> > Trac does support PostgreSQL... The thing I don't understand at this point is what exactly is the nature of the integration with the SCM. I don't see it being likely that there will be a deep integration of the PostgreSQL SCM (whatever the SCM platform) with Trac; that's way too much change to expect quickly...
> Basically I have it set up like this:
> * Trac has built-in browsing of the svn via the web
> * When I commit, the commit message must have a reference to an open ticket in Trac, eg: Rearrange blah to fix bad bug. Fixes #745
> * In trac's development timeline, or whatever, you can see all the commits against each ticket, and just click on them to see the complete diff for each change set in SVN.
> * Commit messages can contain full wiki markup that fully integrates with the wiki that is all throughout Trac. So, you can have wiki in your commit messages that refers to other bugs, wiki pages, source code files and lines, etc.
> Basically, Trac is cool. I don't see us adopting it too quickly for PostgreSQL though :P

Well, CMD does have it up and running with our repository as sucked out of CVS. Granted, not full functionality, but better than nothing. If Josh turns on the rest of the stuff folks could go play with it and see what they think.

BTW, if GNATS is what FreeBSD uses I'd have to agree that it's pretty ugly.

-- 
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf  cell: 512-569-9461
Re: [HACKERS] Enum proposal / design
Andrew Dunstan wrote:
> > Even more radical: do it at runtime. You could assign the typlen (stored width) of an enum type at creation time on the basis of the largest identifier it contains. This might be a bit too weird because enums created earlier would have a size advantage over those created later, but if you are looking to shave space ...
> I'm not sure I like either of these options. The configure option at least would make it too easy to break loading a dump from a db with a different compile time limit, and the runtime typlen stuff just seems messy.

I thought the runtime one was kinda cute, actually, but you would have to have duplicate functions for the differently sized types, eg. enum1_out, enum2_out etc, since otherwise you wouldn't know what sized parameter you were just handed. And as Tom pointed out, there could be issues when someone wanted to modify the type.

> I'm inclined to say let's keep it simple and stay with a fixed 4-byte global size.

Fair enough. I'm ok with 4 bytes; 8 seemed a bit gratuitous.

> > That reminds me: were you intending to allow an ALTER ENUM operation to add (or remove, or rename) elements of an enum type? The above method would fail for the case where an ADD operation needed to assign an identifier wider than the type allowed for.
> No, I think that's something of a footgun. We'd have to check every row to ensure we weren't orphaning some value. The workaround is to create a new enum type and then do alter table alter column type ... although I realise that could cause dependency problems too.

Well, one option that we might want to consider down the line is doing all that behind the scenes in an ALTER TYPE statement. Of the unsupported stuff that I listed, being able to alter the enum definition was the one that I thought had the most likely use case. Anyway, it's not something that we need to sort out straight away since there's a workaround. I suspect that it only came up because there would have been consequences for the ALTER if we had gone with the variable size idea, depending on how the ALTER was implemented.

> Of course, people will be able to hack the catalog if they want to, but then it will be on their heads if things break - the intention is to treat these as essentially static - for dynamic stuff use a domain or a lookup table.

Right. Altering the values is a schema change (and I'd argue that domains fall into the same boat). If you want user-editable entries, create a separate table.

Cheers

Tom
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
On Wed, 2006-08-16 at 08:51 -0400, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> > On Tue, 2006-08-15 at 18:42 -0400, Tom Lane wrote:
> > > So let's fix pg_xlogfile_name_offset() to have two OUT parameters instead of returning a smushed-together string.
> > I'll do this, but I'm conscious that this is a cosmetic change.
> Well, it's cosmetic, but it's also an API change, which means that this is our only opportunity to get it right. Once these functions are in a release it will be too hard to change them.

I've just started working on this part, now that I have the rest complete. We want a single row output, with two columns, yes? Presumably:

xlogfilename TEXT
offset INTEGER

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
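Assuming that shape, a call would then look like this (illustrative values only, assuming default 16MB segments on timeline 1):

    SELECT * FROM pg_xlogfile_name_offset('0/2AF8D8');

            xlogfilename       | offset
    ---------------------------+---------
     000000010000000000000000  | 2816216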
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
Jim C. Nasby wrote:
> On Tue, Aug 15, 2006 at 10:43:12PM -0700, Josh Berkus wrote:
> > Actually, if that's the only objection it's solved. RT will now allow you to create, comment on, modify, and close bugs by e-mail. [...]
> Does Trac have similar capability? Reason I'm asking is that I think *eventually* it would be very useful to have trac's ability to link bugs, version control, wiki, etc. all together. I know it'll probably be quite some time before that happens, but I'm sure that if we go with RT it'll never happen.

guys, just a sobering refrain from the troll audience -- establishing trac/subversion as a formal mechanism within postgresql circles would go a long way toward showing the real world out there that postgresql is professionalizing (I know) and systematizing, etc. ad infinitum. Let everyone identify bugs (keeps novices busy), the more the merrier! New classes of semi-programmers will arise, let's call them buggers and bugger watchers; unless they know English very well, pretty soon the system will get used by real programmers, because in the long run it saves time and gets results. And folks, let's learn from the goofs of the FreeBSD crowd, and maybe even from the Torvalds gang.

Michael
Re: [HACKERS] Enum proposal / design
Tom Dunstan [EMAIL PROTECTED] writes:
> I thought the runtime one was kinda cute, actually, but you would have to have duplicate functions for the differently sized types, eg. enum1_out, enum2_out etc, since otherwise you wouldn't know what sized parameter you were just handed.

I'm not sure that that matters really. What you are actually going to get handed is a Datum that IIRC is right-justified and zero-padded, so very probably one function would work for all stored widths. The bigger issue I think is the surprise factor if a column gets wider over a dump and reload.

regards, tom lane
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
Simon Riggs [EMAIL PROTECTED] writes:
> Wise one: what should my pg_proc look like?
> DATA(insert OID = 2850 ( pg_xlogfile_name_offset PGNSP PGUID 12 f f t f i 1 2249 25 25 25 23 i o o _null_ pg_xlogfile_name_offset - _null_ ));

Oh, as far as that goes, the array columns need to look like something array_in will eat; and you should provide parameter names so that select * from will produce useful headings. So probably more like

DATA(insert OID = 2850 ( pg_xlogfile_name_offset PGNSP PGUID 12 f f t f i 1 2249 25 "{25,25,23}" "{i,o,o}" "{wal_offset,filename,offset}" pg_xlogfile_name_offset - _null_ ));

I think you can get away without inner quotes (ie, not {'i','o','o'}) as long as you aren't using anything weird like spaces in a parameter name.

regards, tom lane
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
On Wed, 16 Aug 2006, Robert Treat wrote:
> I'm not sure I follow this, since currently anyone can email the bugs list or use the bugs -> email form from the website. Are you looking to increase the barrier for bug reporting?

Any garbage (ie. spam) is generally filtered before it hits the -bugs list itself.

Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy  Skype: hub.org  ICQ . 7615664
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
On Wed, 2006-08-16 at 11:45 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: We want a single row output, with two columns, yes? Presumably: xlogfilename TEXT, offset INTEGER

Sounds right to me. int4 should be wide enough for practical xlog segment sizes.

Wise one: what should my pg_proc look like? I'm the lucky man to break the _null_ _null_ _null_ rule... I've tried

DATA(insert OID = 2850 ( pg_xlogfile_name_offset PGNSP PGUID 12 f f t f i 1 2249 25 25 25 23 i o o _null_ pg_xlogfile_name_offset - _null_ ));

but my initdb fails with

    selecting default shared_buffers/max_fsm_pages ... 2kB/100
    creating configuration files ... ok
    creating template1 database in a/base/1 ... FATAL: cache lookup failed for type 26
    child process exited with exit code 1
    initdb: removing data directory a

Thinking this might be a 0-referenced array issue, I also tried 24 24 22 in the above, but that bombs with the same error. Currently, if I just leave it as it is, then initdb runs but then hangs/bombs when you invoke the function (as you might expect). As far as I can tell, the function isn't ever called correctly without this... copied here for info.

    /*
     * Compute an xlog file name and decimal byte offset given a WAL location,
     * such as is returned by pg_stop_backup() or pg_xlog_switch().
     *
     * Note that a location exactly at a segment boundary is taken to be in
     * the previous segment.  This is usually the right thing, since the
     * expected usage is to determine which xlog file(s) are ready to archive.
     */
    Datum
    pg_xlogfile_name_offset(PG_FUNCTION_ARGS)
    {
        text       *location = PG_GETARG_TEXT_P(0);
        char       *locationstr;
        unsigned int uxlogid;
        unsigned int uxrecoff;
        uint32      xlogid;
        uint32      xlogseg;
        uint32      xrecoff;
        XLogRecPtr  locationpoint;
        char        xlogfilename[MAXFNAMELEN];
        TupleDesc   returnTupleDesc;
        Datum       values[2];
        bool        isnull[2];
        HeapTuple   returnHeapTuple;
        Datum       result;

        /*
         * Read input and parse
         */
        locationstr = DatumGetCString(DirectFunctionCall1(textout,
                                            PointerGetDatum(location)));

        if (sscanf(locationstr, "%X/%X", &uxlogid, &uxrecoff) != 2)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("could not parse xlog location \"%s\"",
                            locationstr)));

        locationpoint.xlogid = uxlogid;
        locationpoint.xrecoff = uxrecoff;

        /* Construct a tuple descriptor for the result rows. */
        returnTupleDesc = CreateTemplateTupleDesc(2, false);
        TupleDescInitEntry(returnTupleDesc, (AttrNumber) 1, "xlogfilename",
                           TEXTOID, -1, 0);
        TupleDescInitEntry(returnTupleDesc, (AttrNumber) 2, "offset",
                           INT4OID, -1, 0);

        returnTupleDesc = BlessTupleDesc(returnTupleDesc);

        /*
         * xlogfilename
         */
        XLByteToPrevSeg(locationpoint, xlogid, xlogseg);
        XLogFileName(xlogfilename, ThisTimeLineID, xlogid, xlogseg);

        /* convert the C string to a text datum; a bare PointerGetDatum on
         * the char array would not be a valid text value */
        values[0] = DirectFunctionCall1(textin, CStringGetDatum(xlogfilename));
        isnull[0] = false;

        /*
         * offset
         */
        xrecoff = locationpoint.xrecoff - xlogseg * XLogSegSize;

        values[1] = UInt32GetDatum(xrecoff);
        isnull[1] = false;

        /*
         * Tuple jam: Having first prepared your Datums, then squash together
         */
        returnHeapTuple = heap_form_tuple(returnTupleDesc, values, isnull);

        result = HeapTupleGetDatum(returnHeapTuple);

        PG_RETURN_DATUM(result);
    }

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
Simon Riggs [EMAIL PROTECTED] writes: but my initdb fails with creating template1 database in a/base/1 ... FATAL: cache lookup failed for type 26

Um ... when did you last cvs update? That was the behavior up till I fixed array_in for bootstrap mode, yesterday afternoon ...

regards, tom lane
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
On Wed, 2006-08-16 at 16:51 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: but my initdb fails with creating template1 database in a/base/1 ... FATAL: cache lookup failed for type 26

Um ... when did you last cvs update? That was the behavior up till I fixed array_in for bootstrap mode, yesterday afternoon ...

Sounds like it must be so.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
On Wed, 2006-08-16 at 17:09 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Wise one: what should my pg_proc look like?

DATA(insert OID = 2850 ( pg_xlogfile_name_offset PGNSP PGUID 12 f f t f i 1 2249 25 25 25 23 i o o _null_ pg_xlogfile_name_offset - _null_ ));

Oh, as far as that goes, the array columns need to look like something array_in will eat; and you should provide parameter names so that "select * from" will produce useful headings. So probably more like

DATA(insert OID = 2850 ( pg_xlogfile_name_offset PGNSP PGUID 12 f f t f i 1 2249 25 {25,25,23} {i,o,o} {wal_offset,filename,offset} pg_xlogfile_name_offset - _null_ ));

I think you can get away without inner quotes (i.e., not {'i','o','o'}) as long as you aren't using anything weird like spaces in a parameter name.

archive_timeout++.patch re-submitted on the other thread, now including these changes also.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] An Idea for planner hints
Jim C. Nasby [EMAIL PROTECTED] writes: On Tue, Aug 15, 2006 at 07:55:28PM +0200, Peter Eisentraut wrote: It would not be hard to create an auto explain analyze mode that implicitly runs EXPLAIN ANALYZE along with every query and logs the result.

On its face, it sounds like an obviously great idea. I just don't see how you would put that to actual use, unless you want to read server logs all day long. Grepping for query duration and using the statistics views are much more manageable tuning methods. In my view anyway.

Well, the output would really need to go into some machine-readable format, since you certainly aren't going to read it. That would also make it trivial to identify plans that diverged greatly from reality.

Oracle's EXPLAIN had a peculiar design feature that always seemed bizarre from a user's point of view, but here's where it begins to become clear what they were thinking: it stuffs the EXPLAIN output into a table. That means you can then use SQL to format the data for display, to generate aggregate reports of plans, or to search for plans or plan nodes that meet certain criteria. They don't even have to be plans generated by your session: you can have an application run EXPLAIN on its queries and then go and peek at the plans from a separate session. And it doesn't interfere with the query outputting its normal output.

I'm not sure it's worth throwing out the more user-friendly interface we have now, but I think it's clear that a table is the obvious machine-readable format if you're already sitting in an SQL database... :)

Also, incidentally, you guys are still thinking of applications that don't use prepared queries and parameters extensively. If they do, there won't be reams of plans: there will be one plan per prepared query per session, not one per execution.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
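For the curious, something in this spirit can already be sketched from C via SPI, since EXPLAIN hands its output back as ordinary rows of text. The sketch below rests on stated assumptions: a pre-created table plan_log(query text, plan_line text), the 8.x-era SPI API, and hypothetical names throughout (log_explain is not a real function). Error checking is omitted for brevity; this illustrates the plans-in-a-table idea, it is not part of any proposal here:

    #include "postgres.h"
    #include "fmgr.h"
    #include "executor/spi.h"
    #include "catalog/pg_type.h"
    #include "utils/builtins.h"
    #include "lib/stringinfo.h"

    PG_FUNCTION_INFO_V1(log_explain);

    Datum
    log_explain(PG_FUNCTION_ARGS)
    {
        char       *query = DatumGetCString(DirectFunctionCall1(textout,
                                                PG_GETARG_DATUM(0)));
        StringInfoData buf;
        SPITupleTable *tuptab;
        uint32      nlines;
        uint32      i;
        Oid         argtypes[2] = {TEXTOID, TEXTOID};
        Datum       querydatum;
        void       *plan;

        SPI_connect();

        /* run EXPLAIN; its output comes back as ordinary result rows */
        initStringInfo(&buf);
        appendStringInfo(&buf, "EXPLAIN %s", query);
        SPI_execute(buf.data, false, 0);

        /* save the result set before issuing further SPI calls, which
         * clobber the global SPI_tuptable/SPI_processed */
        tuptab = SPI_tuptable;
        nlines = SPI_processed;

        plan = SPI_prepare("INSERT INTO plan_log VALUES ($1, $2)",
                           2, argtypes);
        querydatum = DirectFunctionCall1(textin, CStringGetDatum(query));

        for (i = 0; i < nlines; i++)
        {
            char   *line = SPI_getvalue(tuptab->vals[i], tuptab->tupdesc, 1);
            Datum   values[2];

            values[0] = querydatum;
            values[1] = DirectFunctionCall1(textin, CStringGetDatum(line));
            SPI_execute_plan(plan, values, NULL, false, 0);
        }

        SPI_finish();
        PG_RETURN_VOID();
    }

Once the plans are in plan_log, searching or aggregating them is plain SQL, which is exactly the property being praised above.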
Re: [HACKERS] Enum proposal / design
On Wed, Aug 16, 2006 at 04:13:43PM -0400, Tom Lane wrote: Tom Dunstan [EMAIL PROTECTED] writes: I thought the runtime one was kinda cute, actually, but you would have to have duplicate functions for the differently sized types, e.g. enum1_out, enum2_out, etc., since otherwise you wouldn't know what sized parameter you were just handed.

I'm not sure that that matters really. What you are actually going to get handed is a Datum that IIRC is right-justified and zero-padded, so very probably one function would work for all stored widths. The bigger issue I think is the surprise factor if a column gets wider over a dump and reload.

Actually, if we're going to support variable-width enums, I think it makes the most sense to just expose that to the user, since they'll have a chance of figuring out which size would make the most sense for a given table (unless you want to add logic to look at the table's layout...). If we wanted to provide an idiot-proof version that was unsized, we could just make that an alias for a 4- or 8-byte enum.

-- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] Enum proposal / design
Tom Dunstan [EMAIL PROTECTED] writes: Andrew Dunstan wrote: I'm inclined to say let's keep it simple and stay with a fixed 4-byte global size.

Fair enough. I'm ok with 4 bytes; 8 seemed a bit gratuitous.

If you're gonna fix it at 4 bytes, then I strongly suggest that the value identifiers actually be OIDs assigned through the standard OID-generating mechanism, and that the pg_enum table have the structure

    oid        standard system OID column; unique enum-value identifier
    enumtypid  OID of enum type it belongs to
    enumname   name of enum value

with unique indexes on:

    oid
    (enumtypid, enumname)

The advantage of doing this is that you can use the existing, well debugged, normally-quite-fast mechanisms for generating new unique value identifiers, rather than consing up your own slow full-table-scan mechanism as envisioned in the original proposal.

regards, tom lane
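Rendered as a catalog declaration, the proposed layout might look like the sketch below. Only the column structure and the two unique indexes come from the message above; the catalog OID (3500) and index OIDs are made-up placeholders, and the macro spelling follows the usual CATALOG/DECLARE_UNIQUE_INDEX conventions:

    /*
     * pg_enum sketch: one row per enum value.  OIDs here are hypothetical.
     */
    CATALOG(pg_enum,3500)
    {
        Oid         enumtypid;      /* OID of the owning enum type */
        NameData    enumname;       /* name of this enum value */
    } FormData_pg_enum;

    typedef FormData_pg_enum *Form_pg_enum;

    /* unique indexes: one on oid, one on (enumtypid, enumname) */
    DECLARE_UNIQUE_INDEX(pg_enum_oid_index, 3501,
        on pg_enum using btree(oid oid_ops));
    DECLARE_UNIQUE_INDEX(pg_enum_typid_name_index, 3502,
        on pg_enum using btree(enumtypid oid_ops, enumname name_ops));

With the row OID as the stored value, a new enum member gets its identifier from the normal OID counter, so no table scan is needed at all.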
Re: [HACKERS] An Idea for planner hints
On Wed, Aug 16, 2006 at 06:48:09PM -0400, Gregory Stark wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Tue, Aug 15, 2006 at 07:55:28PM +0200, Peter Eisentraut wrote: It would not be hard to create an auto explain analyze mode that implicitly runs EXPLAIN ANALYZE along with every query and logs the result.

On its face, it sounds like an obviously great idea. I just don't see how you would put that to actual use, unless you want to read server logs all day long. Grepping for query duration and using the statistics views are much more manageable tuning methods. In my view anyway.

Well, the output would really need to go into some machine-readable format, since you certainly aren't going to read it. That would also make it trivial to identify plans that diverged greatly from reality.

Oracle's EXPLAIN had a peculiar design feature that always seemed bizarre from a user's point of view, but here's where it begins to become clear what they were thinking: it stuffs the EXPLAIN output into a table. That means you can use SQL to format the data for display, to generate aggregate reports of plans, or to search for plans or plan nodes that meet certain criteria. They don't even have to be plans generated by your session: you can have an application run EXPLAIN on its queries and then go and peek at the plans from a separate session. And it doesn't interfere with the query outputting its normal output.

I'm not sure it's worth throwing out the more user-friendly interface we have now, but I think it's clear that a table is the obvious machine-readable format if you're already sitting in an SQL database... :)

Actually, I had another idea, though I'm not sure how useful it will ultimately be... There's now a program to analyze generic PostgreSQL logs, someone else just posted that they're working on an analyzer for VACUUM, and there's a desire for machine-readable EXPLAIN output. What about providing a secondary logging mechanism that produces machine-readable output for different operations? The three I just mentioned are obvious choices, but there could be more.

Also, incidentally, you guys are still thinking of applications that don't use prepared queries and parameters extensively. If they do, there won't be reams of plans: there will be one plan per prepared query per session, not one per execution.

That behavior could presumably be changed if we added the ability to analyze every statement a particular session was running.

-- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
Andrew Dunstan [EMAIL PROTECTED] writes: What we are talking about here is bug triage.

Really? We have a problem with too many bug reports and need a tool to help triage them? That's the first I've heard of that. Think about what tasks you do now and what tool would make it easier. Don't try to invent problems to solve.

The Debian system would be basically zero operational change. pgsql-bugs would continue to exist exactly as it does now, except it would go through debbugs. Any message there would open a bug report. Anyone responding to say "that's not a bug" would just include the magic phrase to close the bug report too. Anyone responding with questions or data would just respond as normal. The net result would be exactly as it is now, except that there would be a tool to view what bugs are still open and look at all the data accumulated on each bug. And you could look back at old bugs to see what version they were fixed in and what the bug looked like, to see if it matched the problem a user is having.

In short, it's just a tool to solve a problem we actually have (having a convenient archive of data about current and past bugs) without inventing problems to solve with extra process that we aren't already doing anyway.

RT can be set up similarly, but I'm not sure how much work it would take to make it as seamless. Debbugs has the advantage of working that way pretty much out of the box.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Enum proposal / design
Tom Lane [EMAIL PROTECTED] writes: I think this is excessive concern for bit-shaving. Make the on-disk representation be 8 bytes instead of 4, then you can store the OID directly and have no need for the separate identifier concept. This in turn eliminates one index, one syscache, and one set of lookup/cache routines. And you can have as many values of an enum as you darn please.

Egads. Bit-shaving is *important*. If it's 8 bytes you could just use a char(4) and store 4-character text codes instead. The whole reason to want this feature is precisely for bit-shaving.

I was originally going to reply with some thoughts about how we really ought to fix things so that we don't need to store the type in every record of the entire table. That would let you use 1 or 2 bytes for most applications. Data density is the dominant factor controlling the overall speed of your database. If you can shave 10% off the width of your records, that's a 10% speed gain in i/o and a 10% gain in headroom.

This is the same issue we have with char(n) and numeric(x,y) already. If we found a general solution for getting the type name to the enum, would it also help getting the typmod to char(n) and numeric(x,y)? Would it let us store those as fixed-size data types?

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
[HACKERS] Adjust autovacuum naptime automatically
Hi hackers,

There is a comment in autovacuum.c:

    | XXX todo: implement sleep scale factor that existed in contrib code.

and the attached is a patch to implement it.

In the contrib code, the sleep scale factor was used only to lengthen the naptime. I changed the behavior so that it can also shorten it. In the case of a heavy-update workload, the default naptime (60 seconds) is too long to keep the number of dead tuples low. With my patch, the naptime settles around 3 seconds in a pgbench run (scale=10, 80 tps) with the other autovacuum parameters at their defaults.

I have something that I want to discuss with you:

- Can we use the process exit code to make the autovacuum daemon communicate with the postmaster? I used it to notify whether there are any vacuum jobs or not.

- I removed the autovacuum_naptime GUC variable, because it is adjusted automatically now. Is that appropriate?

Comments welcome.

Regards,
--- ITAGAKI Takahiro NTT Open Source Software Center

autovacuum_adjust_naptime-0817.patch Description: Binary data
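To illustrate the adjustable-naptime idea (this is not taken from the attached patch; the function name, the halving/doubling policy, and the clamp limits are all assumptions), the adjustment loop could be as simple as:

    #include "postgres.h"

    #define MIN_AUTOVAC_NAPTIME    1       /* seconds */
    #define MAX_AUTOVAC_NAPTIME    300

    /* Sketch: shorten the nap when the last cycle found work to do,
     * lengthen it when the cycle came up empty, within fixed bounds. */
    static int
    adjust_naptime(int naptime, bool found_work)
    {
        if (found_work)
            naptime /= 2;       /* wake sooner under heavy churn */
        else
            naptime *= 2;       /* back off when idle */

        if (naptime < MIN_AUTOVAC_NAPTIME)
            naptime = MIN_AUTOVAC_NAPTIME;
        if (naptime > MAX_AUTOVAC_NAPTIME)
            naptime = MAX_AUTOVAC_NAPTIME;

        return naptime;
    }

The clamps keep a busy system from spinning and an idle one from sleeping unboundedly; the actual patch may well use a proportional scale factor rather than plain halving and doubling.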
Re: [HACKERS] Enum proposal / design
Tom Lane [EMAIL PROTECTED] writes: If you're gonna fix it at 4 bytes, then I strongly suggest that the value identifiers actually be OIDs assigned through the standard OID-generating mechanism, and that the pg_enum table have the structure ... The advantage of doing this is that you can use the existing, well debugged, normally-quite-fast mechanisms for generating new unique value identifiers, rather than consing up your own slow full-table-scan mechanism as envisioned in the original proposal.

Yeah, I was never all that happy with that anyway, and figured for the unique-value thingy that we could either use oids or set up a new sequence, but oids sounded like significantly less work.

Cheers Tom
Re: [HACKERS] Enum proposal / design
Tom Lane [EMAIL PROTECTED] writes: I think this is excessive concern for bit-shaving.

Egads. Bit-shaving is *important*. If it's 8 bytes you could just use a char(4) and store 4-character text codes instead. The whole reason to want this feature is precisely for bit-shaving.

Well, that, and that there's no straight substitute for the actual feature. The closest you'll get is a domain, but they don't order stuff properly. Efficiency is clearly a driving factor as well, though, hence my reluctance to store 8 bytes on disk. :)

... This is the same issue we have with char(n) and numeric(x,y) already. If we found a general solution for getting the type name to the enum, would it also help getting the typmod to char(n) and numeric(x,y)? Would it let us store those as fixed-size data types?

It also affects composite types, and some user types out there, like Martijn's tagged types. I didn't really want to go down that path in this thread, since it would turn what should be a fairly non-intrusive patch to add a new type into a big thing, and I really just wanted to get enums in. :) I tend to think of it the other way around from how you put it: if a general solution to that problem can be found which doesn't fall afoul of the security issues that were the reason for multi-argument output functions to be killed off in the first place, then great, and enums can directly benefit.

Cheers Tom
Re: [HACKERS] [PATCHES] Adjust autovacuum naptime automatically
ITAGAKI Takahiro wrote: In the case of a heavy-update workload, the default naptime (60 seconds) is too long to keep the number of dead tuples low. With my patch, the naptime settles around 3 seconds in a pgbench run (scale=10, 80 tps) with the other autovacuum parameters at their defaults.

Interesting. To be frank, I don't know what the sleep scale factor was supposed to do.

I have something that I want to discuss with you: Can we use the process exit code to make the autovacuum daemon communicate with the postmaster? I used it to notify whether there are any vacuum jobs or not.

I can only tell you we do this in Mammoth Replicator and it works for us. Whether it's a very good idea, I don't know. I didn't find any other means to communicate stuff from dying processes to the postmaster.

- I removed the autovacuum_naptime GUC variable, because it is adjusted automatically now. Is that appropriate?

I think we should provide the user with a way to stop the naptime from changing at all. Eventually we will have the promised maintenance-windows feature, which will mean the user will not have to worry at all about the naptime, but in the meantime I think we should keep it.

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
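Mechanically, exit-code signalling is plain POSIX; a minimal sketch follows. The code values and function names are hypothetical (neither Replicator's nor the patch's), but the waitpid/WIFEXITED/WEXITSTATUS machinery is standard:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define AUTOVAC_EXIT_IDLE    0      /* nothing needed vacuuming */
    #define AUTOVAC_EXIT_WORKED  1      /* found and processed work */

    /* In the child, a cycle would finish with something like:
     *     _exit(did_work ? AUTOVAC_EXIT_WORKED : AUTOVAC_EXIT_IDLE);
     */

    /* In the postmaster: reap the child and report whether it found work */
    static int
    autovac_child_found_work(pid_t pid)
    {
        int     status;

        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) &&
            WEXITSTATUS(status) == AUTOVAC_EXIT_WORKED)
            return 1;
        return 0;
    }

The exit status is essentially the only channel a dying process gets for free, which is presumably why it is attractive here despite carrying just one byte of information.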
Re: [HACKERS] pgstattuple extension for indexes
Hi Nagayasu-san and folks,

I have a question about pgstatindex.

Satoshi Nagayasu [EMAIL PROTECTED] wrote: Attached patch has been cleaned up, and modified to be able to work with CVS HEAD.

Index leaf pages are ordered just after REINDEX:

    [1] [2] [3]

After full split, they will be the following:

    [1] [3] [5] [2] [4] [6]

because new pages are allocated at the end of the index file. I think this condition should be regarded as fully fragmented, but pgstatindex reports that leaf_fragmentation is 50%.

Presently, the fragmentation factor is computed as in the code below:

    if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
        stat->fragments++;

But this method has the above problem, so I suggest using whether the right link points to the next adjacent page or not:

    if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
        stat->fragments++;

Which method do you think is better? Or do you have other ideas?

Regards,
--- ITAGAKI Takahiro NTT Open Source Software Center
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
Gregory Stark wrote: The Debian system would be basically zero operational change. pgsql-bugs would continue to exist exactly as it does now except it would go through debbugs.

Debbugs is fine and all, but they don't seem to publish their code on a regular basis.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
Andrew Dunstan wrote: What we are talking about here is bug triage. One sensible way to do this would be to have a group of suitably qualified volunteers who could perform this function on a roster basis, for, say, a week or two at a time.

Organising a roster, a rotating roster at that, is probably the single most difficult thing you can do in this group. :-)

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: BugTracker (Was: Re: [HACKERS] 8.2 features status)
Martijn van Oosterhout wrote: If you want the latter, the approach would be to keep pgsql-bugs and, when a real issue comes up, bounce it to the bug tracker. Any subsequent email discussion should then get logged in the bug report.

That's what I want. I don't want the bug tracking system to be the primary frontend to users off the street, because quite frankly most users are too confused to know what a real bug is. That doesn't mean that I want a closed BTS, but a system that requires sign-up and user accounts (like Bugzilla) imposes the right barrier to random abuse in my mind.

Note that RT stands for Request Tracker, which on its face is a different thing, namely a system to track requests by users off the street.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] [PATCHES] Adjust autovacuum naptime automatically
Alvaro Herrera wrote: ITAGAKI Takahiro wrote: In the case of a heavy-update workload, the default naptime (60 seconds) is too long to keep the number of dead tuples low. With my patch, the naptime settles around 3 seconds in a pgbench run (scale=10, 80 tps) with the other autovacuum parameters at their defaults.

What is this based on? That is, based on what information is it deciding to reduce the naptime?

Interesting. To be frank, I don't know what the sleep scale factor was supposed to do.

I'm not sure that the sleep scale factor is a good idea at this point, but what I was thinking back in the day when I originally wrote the contrib autovacuum is that I didn't want the system to get bogged down constantly vacuuming. So, if it had just spent a long time working on one database, it would sleep for a long time. Given that we can now specify the vacuum cost delay settings for autovacuum, disable individual tables, and everything else, I'm not sure we need this anymore, at least not as it was originally designed. It sounds like Itagaki is doing things a little differently with his patch, but I'm not sure I understand it.

- I removed the autovacuum_naptime GUC variable, because it is adjusted automatically now. Is that appropriate?

I think we should provide the user with a way to stop the naptime from changing at all. Eventually we will have the promised maintenance-windows feature, which will mean the user will not have to worry at all about the naptime, but in the meantime I think we should keep it.

I'm not sure that's true. I believe we will want the naptime GUC option even after we have the maintenance window. I think we might ignore the naptime during the maintenance window, but even after we have it, we will still vacuum during the day as required. My vision of the maintenance window has always been very simple: during the window, the thresholds get reduced by some factor (probably a GUC variable), so during the day it might take 10,000 updates on a table to cause a vacuum, but during the maintenance window it might be 10% of that, 1,000. Is this in line with what others were thinking?
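A sketch of that threshold arithmetic, with hypothetical names throughout (the 0.1 window factor stands in for the "some factor" GUC described above, and the base-plus-scale formula follows the usual autovacuum threshold shape):

    #include "postgres.h"

    /* Sketch: the usual autovacuum vacuum threshold, scaled down while
     * inside the maintenance window. */
    static float
    vacuum_threshold(float base_thresh, float scale_factor, float reltuples,
                     bool in_maintenance_window)
    {
        float   thresh = base_thresh + scale_factor * reltuples;

        if (in_maintenance_window)
            thresh *= 0.1f;     /* e.g. 10,000 updates by day, 1,000 in-window */

        return thresh;
    }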