Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> ... That kinda works, but the problem is that restartpoints are time based, not log based. We need them to be deterministic for us to rely upon them in the above way.

Right, but the performance disadvantages of making them strictly log-distance-based are pretty daunting. We don't really want slaves doing that while they're in catchup mode.

			regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> I think we can get away with writing the LSN value to disk, as you suggested, but only every so often. No need to do it after every WAL record, just consistently every so often, so it gives us a point at which we know we are safe.

Huh? How does that make you safe? What you need to know is the max LSN that could possibly be on disk.

Hmm, actually we could get away with tying this to fetching WAL files from the archive. When switching to a new WAL file, write out the *ending* WAL address of that file to pg_control. Then process the WAL records in it. Whether or not any of the affected pages get to disk, we know that there is no LSN on disk exceeding what we already put in pg_control. If we crash and restart, we'll have to get to the end of this file before we start letting backends in; which might be further than we actually got before the crash, but not too much further because we already know the whole WAL file is available.

Or is that the same thing you were saying? The detail about using the end address seems fairly critical, and you didn't mention it...

			regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
>> ... If we crash and restart, we'll have to get to the end of this file before we start letting backends in; which might be further than we actually got before the crash, but not too much further because we already know the whole WAL file is available.
> Don't want to make it per file though. Big systems can whizz through WAL files very quickly, so we either make it a big number e.g. 255 files per xlogid, or we make it settable (and recorded in pg_control).

I think you are missing the point I made above. If you set the okay-to-resume point N files ahead, and then the master stops generating files so quickly, you've got a problem --- it might be a long time until the slave starts letting backends in after a crash/restart.

Fetching a new WAL segment from the archive is expensive enough that an additional write/fsync per cycle doesn't seem that big a problem to me. There's almost certainly a few fsync-equivalents going on in the filesystem to create and delete the retrieved segment files.

			regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
>> After reading this for awhile, I realized that there is a rather fundamental problem with it: it switches into consistent recovery mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint. In a crash recovery situation that typically is before the last checkpoint (if indeed it's not still zero), and what that means is that this patch will activate the bgwriter and start letting in backends instantaneously after a crash, long before we can have any certainty that the DB state really is consistent. In a normal crash recovery situation this would be easily fixed by simply not letting it go to consistent recovery state at all, but what about recovery from a restartpoint? We don't want a slave that's crashed once to never let backends in again. But I don't see how to determine that we're far enough past the restartpoint to be consistent again. In crash recovery we assume (without proof ;-)) that we're consistent once we reach the end of valid-looking WAL, but that rule doesn't help for a slave that's following a continuing WAL sequence. Perhaps something could be done based on noting when we have to pull in a WAL segment from the recovery_command, but it sounds like a pretty fragile assumption.
> Seems like we just say we only signal the postmaster if InArchiveRecovery. Archive recovery from a restartpoint is still archive recovery, so this shouldn't be a problem in the way you mention. The presence of recovery.conf overrides all other cases.

What that implements is my comment that we don't have to let anyone in at all during a plain crash recovery. It does nothing AFAICS for the problem that when restarting archive recovery from a restartpoint, it's not clear when it is safe to start letting in backends. You need to get past the highest LSN that has made it out to disk, and there is no good way to know what that is. Unless we can get past this problem the whole thing seems a bit dead in the water :-(

>> * I'm a bit uncomfortable with the fact that the IsRecoveryProcessingMode flag is read and written with no lock.
> It's not a dynamic state, so I can fix that inside IsRecoveryProcessingMode() with a local state to make check faster.

Erm, this code doesn't look like it can allow IsRecoveryProcessingMode to become locally true in the first place? I guess you could fix it by initializing IsRecoveryProcessingMode to true, but that seems likely to break other places. Maybe better is to have an additional local state variable showing whether the flag has ever been fetched from shared memory.

The other issues don't seem worth arguing about ...

			regards, tom lane
Re: [PATCHES] [HACKERS] get_relation_stats_hook()
Simon Riggs [EMAIL PROTECTED] writes:
> New version of Postgres patch, v5. Implements suggested changes. Ready for review and apply.

Applied with some revisions. The method for passing back freefunc didn't work, so I made it pass the whole VariableStatData struct instead; this might allow some additional flexibility by changing other fields besides the intended statsTuple and freefunc. Also, I was still unhappy about adding a hook in the midst of code that clearly needs improvement, without making it possible for the hook to override the adjacent broken code paths; so I refactored the API a bit for that too. The plugin function would now be something like this:

static bool
plugin_get_relation_stats(PlannerInfo *root,
						  RangeTblEntry *rte,
						  AttrNumber attnum,
						  VariableStatData *vardata)
{
	HeapTuple	statstup = NULL;

	/* For now, we only cover the simple-relation case */
	if (rte->rtekind != RTE_RELATION || rte->inh)
		return false;
	if (!get_tom_stats_tupletable(rte->relid, attnum))
		return false;

	/*
	 * Get stats if present.  We asked for only one row, so no need for loops.
	 */
	if (SPI_processed > 0)
		statstup = SPI_copytuple(SPI_tuptable->vals[0]);
	SPI_freetuptable(SPI_tuptable);
	SPI_finish();
	if (!statstup)
		return false;			/* should this happen? */
	vardata->statsTuple = statstup;
	/* define function to use when time to free the tuple */
	vardata->freefunc = heap_freetuple;
	return true;
}

and if you want to insert stats for expression indexes then there's a separate get_index_stats_hook for that.

			regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
>> It does nothing AFAICS for the problem that when restarting archive recovery from a restartpoint, it's not clear when it is safe to start letting in backends. You need to get past the highest LSN that has made it out to disk, and there is no good way to know what that is.
> AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in the controlfile, so whenever we restart we can see that it has been set previously and now we are beyond it.

Right ...

> So if we crash during recovery and then restart *after* we reached minRecoveryLoc then we resume in safe mode almost immediately.

Wrong. What minRecoveryLoc is, is an upper bound for the LSNs that might be on-disk in the filesystem backup that an archive recovery starts from. (Defined as such, it never changes during a restartpoint crash/restart.) Once you pass that, the on-disk state as modified by any dirty buffers inside the recovery process represents a consistent database state. However, the on-disk state alone is not guaranteed consistent. As you flush some (not all) of your shared buffers you enter other not-certainly-consistent on-disk states. If we crash in such a state, we know how to use the last restartpoint plus WAL replay to recover to another state in which disk + dirty buffers are consistent. However, we reach such a state only when we have read WAL to beyond the highest LSN that has reached disk --- and in recovery mode there is no clean way to determine what that was.

Perhaps a solution is to make XLogFlush not be a no-op in recovery mode, but have it scribble a highest-LSN somewhere on stable storage (maybe scribble on pg_control itself, or maybe better someplace else). I'm not totally sure about that. But I am sure that doing nothing will be unreliable.

			regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> Version 7

After reading this for awhile, I realized that there is a rather fundamental problem with it: it switches into consistent recovery mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint. In a crash recovery situation that typically is before the last checkpoint (if indeed it's not still zero), and what that means is that this patch will activate the bgwriter and start letting in backends instantaneously after a crash, long before we can have any certainty that the DB state really is consistent.

In a normal crash recovery situation this would be easily fixed by simply not letting it go to consistent recovery state at all, but what about recovery from a restartpoint? We don't want a slave that's crashed once to never let backends in again. But I don't see how to determine that we're far enough past the restartpoint to be consistent again. In crash recovery we assume (without proof ;-)) that we're consistent once we reach the end of valid-looking WAL, but that rule doesn't help for a slave that's following a continuing WAL sequence. Perhaps something could be done based on noting when we have to pull in a WAL segment from the recovery_command, but it sounds like a pretty fragile assumption.

Anyway, that's sufficiently bad that I'm bouncing the patch for reconsideration. Some other issues I noted before giving up:

* I'm a bit uncomfortable with the fact that the IsRecoveryProcessingMode flag is read and written with no lock. There's no atomicity or concurrent-write problem, sure, but on a multiprocessor with weak memory ordering guarantees (eg PPC) readers could get a fairly stale value of the flag. The false to true transition happens before anyone except the startup process is running, so that's no problem; the net effect is then that backends might think that recovery mode was still active for awhile after it wasn't. This seems a bit scary, eg in the patch as it stands that'll disable XLogFlush calls that should have happened. You could fix that by grabbing/releasing some spinlock (any old one) around the accesses, but are any of the call sites performance-critical? The one in XLogInsert looks like it is, if nothing else.

* I kinda think you broke XLogFlush anyway. It's certainly clear that the WARNING case at the bottom is unreachable with the patch, and I think that means that you've messed up an important error robustness behavior. Is it still possible to get out of recovery mode if there are any bad LSNs in the shared buffer pool?

* The use of InRecovery in CreateCheckPoint seems pretty bogus, since that function can be called from the bgwriter in which the flag will never be true. Either this needs to be IsRecoveryProcessingMode(), or it's useless because we'll never create ordinary checkpoints until after subtrans.c is up anyway.

* The patch moves the clearing of InRecovery from after to before StartupCLOG, RecoverPreparedTransactions, etc. Is that really a good idea? It makes it hard for those modules to know if they are coming up after a normal or recovery startup. I think they may not care at the moment, but I'd leave that alone without a darn good reason to change it.

* The comment about CheckpointLock being only pro forma is now wrong, if you are going to use locking it to implement a delay in exitRecovery. But I don't understand why the delay there. If it's needed, seems like the path where a restartpoint *isn't* in progress is wrong --- don't you actually need to start one and wait for it? Also I note that if the LWLockConditionalAcquire succeeds, you neglect to release the lock, which surely can't be right.

* The tail end of StartupXLOG() looks pretty unsafe to me. Surely we mustn't clear IsRecoveryProcessingMode until after we have established the safe-recovery checkpoint. The comments there seem to be only vaguely related to the current state of the patch, too.

* Logging of recoveryLastXTime seems pretty bogus now. It's supposed to be associated with a restartpoint completion report, but now it's just out in the ether somewhere and doesn't represent a guarantee that we're synchronized that far.

* backup.sgml needs to be updated to say that log_restartpoints is obsolete.

* There are a bunch of disturbing assumptions in the SLRU-related modules about their StartUp calls being executed without any concurrent access. This isn't really a problem that needs to be dealt with in this patch, I think, but that will all have to be cleaned up before we dare allow any backends to run concurrently with recovery. btree's suppression of relcache invals for metapage updates will be a problem too.

			regards, tom lane
Re: [PATCHES] hash index improving v3
Simon Riggs [EMAIL PROTECTED] writes:
> maintenance_work_mem is already used for 3 separate operations that bear little resemblance to each other. If it's appropriate for all of those then its appropriate for this usage also.

No, it isn't. The fundamental point here is that this isn't a memory allocation parameter; it's a switchover threshold between two different behaviors.

			regards, tom lane
Re: [PATCHES] hash index improving v3
Simon Riggs [EMAIL PROTECTED] writes:
> Thinks: Why not just sort all of the time and skip the debate entirely?

The sort is demonstrably a loser for smaller indexes. Admittedly, if the index is small then the sort can't cost all that much, but if the (correct) threshold is some large fraction of shared_buffers then it could still take awhile on installations with lots-o-buffers. The other side of that coin is that it's not clear this is really worth arguing about, much less exposing a separate parameter for.

			regards, tom lane
Re: [PATCHES] hash index improving v3
Jonah H. Harris [EMAIL PROTECTED] writes:
> On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
>> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote:
>>> I'm considering changing hashbuild to switch over at shared_buffers instead of effective_cache_size --- any thoughts about that?
>> Switching to shared_buffers gets my vote, on my test table with 50,000,000 rows it takes about 8 minutes to create an index using the default effective_cache_size. With effective_cache_size set to 6GB (machine has 8GB) its still going an hour later.
> Agreed. I think using shared_buffers as a cutoff is a much better idea as well.

Already done in CVS a week or so back, but thanks for following up with some confirmation.

			regards, tom lane
Re: [PATCHES] hash index improving v3
Simon Riggs [EMAIL PROTECTED] writes:
> I wasn't very happy with effective_cache_size and not happy with shared_buffers either. If building hash indexes is memory critical then we just need to say so and encourage others to set memory use correctly. People are already aware that maintenance_work_mem needs to be increased for large index builds and we will confuse people if we ignore that and use another parameter instead.

I think you've got this completely backwards. The general theory about maintenance_work_mem is "set it as large as you can stand it". The issue at hand here is that the crossover point for hash index sort building seems to be a good deal less than all-the-memory-you-have. Perhaps there is a case for giving this behavior its very own configuration parameter; but seeing that we still don't have all that much of a use case for hash indexes at all, I don't feel a need to do that yet. In any case, tying it to maintenance_work_mem is certainly wrong.

			regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> Having some trouble trying to get a clean state change from recovery to normal mode. Startup needs to be able to write WAL at the end of recovery so it can write a ShutdownCheckpoint, yet must not be allowed to write WAL before that. Other services are still not fully up yet. So every other process apart from Startup mustn't write WAL until Startup has fully completed its actions, which is sometime later. ... We *might* solve this by making the final checkpoint that Startup writes into an online checkpoint.

Do we really need a checkpoint there at all? I can't recall right now if there was a good reason why Vadim made it do that. I have a feeling that it might have had to do with an assumption that the recovery environment was completely distinct from the live environment; which is a statement that's certainly not going to be true in a query-answering slave.

			regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
>> Do we really need a checkpoint there at all?
> Timelines only change at shutdown checkpoints.

Hmm. I *think* that that is just a debugging crosscheck rather than a critical property. But yeah, it would take some close investigation, which maybe isn't warranted if you have a less-invasive solution.

			regards, tom lane
Re: [PATCHES] hash index improving v3
I did some testing of my own on the hash index patch, and mostly seem to have confirmed Alex's results. I used a table like this:

	create table tab (f1 serial primary key, f2 some-datatype);
	create index ind on tab using hash (f2);

and populated it with 2 million rows of data; then timed queries like this:

	select * from tab a join tab b using(f2)
	  where f2 = (select f2 from tab c
	              where c.f1 = (select int4(random() * 2e6)));

using pgbench like this:

	pgbench -n -c 1 -T 60 -M prepared -f query.sql hashdb

To test wide indexed columns I used a text column with entries of 100 random characters (identical to Alex's test). I saw a performance gain of about 50% with an empty kernel cache (26.9 vs 41.9 tps), dropping to about 14% once the table and index were fully swapped in (4185 vs 4764 tps). This was in a C-locale database. Presumably the win would have been significantly greater in a non-C-locale test, but I didn't try it.

To test narrow indexed columns I made f2 a bigint containing 2 million consecutive integers. Here I saw about a 5% improvement with either empty cache (48.1 vs 50.5 tps) or full cache (4590 vs 4800 tps). This is not too surprising since about the same amount of I/O is needed either way, and bigint comparison is very cheap. (This is a 64-bit machine, and it's defaulting to pass-by-value for bigints, so value comparisons are hardly more expensive than hashcode comparisons.) But the patch still wins a bit by being able to do binary search within index pages.

In both of the above cases there were only a negligible number of hash collisions. I made up another test case, still 2 million bigints, but with values chosen so that almost every entry had a hash collision with another entry (a small fraction had two or even 3 collisions). This showed about a 25% slowdown compared to CVS HEAD with empty cache (49.9 tps down to 37.2), decreasing to 3% with full cache (4609 vs 4482 tps). I experimented with some variant queries that did more hash index fetches per query, and saw penalties as high as 50%. However, this is surely by far the worst case for the patch: no savings in index size, and a ridiculously high collision rate.

Lastly, for comparison's sake I tried the wide column case with a btree instead of hash index. This gave me 31.5 tps with empty cache, 4749 tps with full cache. Note that the btree is losing significantly to the patched hash index in empty-cache conditions --- this presumably reflects the much larger index size causing more I/O.

I'm thinking that we should go ahead and apply the patch. AFAIR this is the first example we've ever seen of hash beating btree for fetch performance, and it's pretty obvious that removing the indexed value from the index contents is the reason for the win. So I've lost interest in the alternative that involved storing both hashcode and indexed value. We now see a possible production use-case for hash, namely indexing wide column values.

BTW, one thing I noticed was that the hash index build time for the wide column case got a lot worse after applying the patch (from 56 to 237 sec). The reason for this turned out to be that with the smaller predicted index size, the code decided not to use the pre-sorting method that was recently added. Reducing effective_cache_size to less than the index size brought the time back down, to about 54 sec. So it would seem that effective_cache_size is too large a cutoff value. I'm considering changing hashbuild to switch over at shared_buffers instead of effective_cache_size --- any thoughts about that?

			regards, tom lane
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Ryan Bradetich [EMAIL PROTECTED] writes:
> I am assuming you are seeing this error in the uint_test1.sql: ERROR: could not find hash function for hash operator 16524
> I can bypass the error in uint_test1.sql by disabling the hash joins. I am going to dig in and figure out why the hashjoin operation is broken.

Well, the cause of that one would've been marking an operator as HASHES without providing a hash opclass to back it up. IIRC the test case involved <> ? That shouldn't even be marked HASHES anyway ...

			regards, tom lane
Re: [PATCHES] hash index improving v3
Zdenek Kotala [EMAIL PROTECTED] writes:
> I attach cleanup patch which we discussed last commit fest. It introduces new macros HashGetMetaPage and HashGetBitmap and of course it breaks on-disk format compatibility, which we will break anyway when Xiao's patch is committed.

If we accept that patch, I'm okay with including this one too. Still hoping for some performance testing though. (Note that this one isn't going to affect performance, so no need to include it for that purpose.)

			regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> Included patch with the following changes:
> * new postmaster mode known as consistent recovery, entered only when recovery passes safe/consistent point. InRedo is now set in all processes when started/connected in consistent recovery mode.

I looked at this and didn't like the InRedo signaling at all. In the first place, it looks like you're expecting the flag to be passed down via fork(), but that won't work in EXEC_BACKEND mode. (Yes, easily fixed, but it's wrong as-is.) In the second place, the method of signaling it to the bgwriter looks to have race conditions: in principle, at least, I think the startup process could try to clear the shared-memory InRedo flag before the bgwriter has gotten as far as setting it. (This might work if the startup process can't exit redo mode without the bgwriter having finished a checkpoint, but such a dependency is too rube goldbergian for me.) ISTM that it would probably be better if there were exactly one InRedo flag in shared memory, probably in xlog.c's shared state, with the postmaster not being responsible for setting or clearing it; rather the startup process should do those things.

> * bgwriter and stats process starts in consistent recovery mode. bgwriter changes mode when startup process completes.

I'm not sure about the interaction of this. In particular, what about recovery restart points before we have reached the safe stop point? I don't think we want to give up the capability of having those. Also, it seems pretty bogus to update the in-memory ControlFile checkpoint values before the restart point is actually done. It looks to me like what you have done is to try to use those fields as signaling for the restart request in addition to their existing purposes, which I think is confusing and probably dangerous. I'd rather there were a different signaling path and ControlFile maintained its current definition.

I found the changes in bgwriter.c unreadable. You've evidently moved blocks of code around, but exactly what did you mean to change? Why is there so much duplicated code now?

> I've kept the distinction between restartpoints and checkpoints in code, to avoid convoluted code.

Hmm, I dunno, it seems like that might be a bad choice. Are you sure it's not cleaner to just use the regular checkpoint code?

			regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote:
>> Hmm, I dunno, it seems like that might be a bad choice. Are you sure it's not cleaner to just use the regular checkpoint code?
> When I tried to write it, it just looked to my eyes like every single line had a caveat which looked ugly and multiplied the testing. You're the code dude, always happy to structure things as you suggest. If you're sure, that is.

No, was just wondering if the other way would be better. If you're sure not, that's fine.

			regards, tom lane
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Jaime Casanova [EMAIL PROTECTED] writes:
> contrib_regression=# select * from t1 where f1 < 35;
> ERROR: unsupported type: 16486

That obviously isn't supposed to happen. Where's it coming from exactly?

			regards, tom lane
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Jaime Casanova [EMAIL PROTECTED] writes:
> On Sun, Sep 7, 2008 at 2:41 AM, Tom Lane [EMAIL PROTECTED] wrote:
>> That obviously isn't supposed to happen. Where's it coming from exactly?
> convert_numeric_to_scalar() in src/backend/utils/adt/selfuncs.c
> the problem seems to be that we are asking for each type of numeric and of course that doesn't know nothing about unsigned integers so its treating it as a non-numeric.

Ah. The scalarltsel/scalargtsel stuff has always been a bit bogus for cross-type comparisons; it assumes it will know both or neither of the two datatypes. So you can get away with using those functions for uint < uint (although they won't be very bright about it); but using them for uint < int fails outright. If you read the comments around that stuff it leaves quite a lot to be desired, but I don't really have better ideas at the moment.

The best near-term solution for the uint module is probably not to rely on scalarltsel/scalargtsel for uint comparisons, but to make its own selectivity functions that know the uint types plus whatever standard types you want to have comparisons with.

			regards, tom lane
Re: [PATCHES] hash index improving v3
Alex Hunsaker [EMAIL PROTECTED] writes:
> -	tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, false);
> +	tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, true);

Hah, I bet that explains Jonah's complaint that recheck didn't seem to be happening in his tests. Nice catch.

			regards, tom lane
Re: [PATCHES] hash index improving v3
Alex Hunsaker [EMAIL PROTECTED] writes:
> On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
>> Ok now that I made it so it actually *tests* collisions, with the patch it always returns all rows that matched the hashed key.
> And here is the fix, we just forgot to set the recheck flag for bitmap scans.

For the convenience of anyone intending to test, here is an updated patch against CVS HEAD that incorporates Alex's fix.

			regards, tom lane

Attachment: hash-v5.patch.gz
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Jaime Casanova [EMAIL PROTECTED] writes:
> seems there is something wrong in the unlikely macro (i'm using GCC 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)
> postgres=# select -256::uint1;
> ERROR: uint1 out of range

No, that's just because this is parsed as -(256::uint1)

			regards, tom lane
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
David Rowley [EMAIL PROTECTED] writes:
> I've made and attached the changes Heikki recommended.

I looked this over a bit and was immediately confused by one thing: the introductory comment says that the skip table size ought to be based on the length of the haystack, which makes sense to me, but the code is actually initializing it on the basis of len2, ie, the length of the needle. Isn't that a bug? Was the same bug present in the tests you made to determine the best table sizes?

Stylistic issues: I really dislike having two copies of the heuristic size-setting code. I think it's worth introducing a second test of use_wchar in order to arrange text_position_setup like this:

    ... existing code ...
    choose skiptablesize
    initialize skip table (this loop doesn't depend on char width)
    if (!state->use_wchar)
        process char needle
    else
        process wide-char needle

Also it might be worth having local-variable copies of skiptablesize and len2 for this initialization loop:

    for (ai = 0; ai <= state->skiptablesize; ai++)
        state->skiptable[ai] = state->len2;

I'm not at all sure whether a C compiler will think that it's allowed to avoid fetching these values again on each iteration, seeing that the loop is storing into other integer members of *state. Given that this loop is exactly the overhead we're trying to control by adjusting skiptablesize, making it as tight as possible seems worthwhile.

> Now that the skip table is a member of TextPositionState, I was not quite
> sure if I should #define a size for it. It would certainly look neater,
> only the code that defines the skip table size in text_position_start
> assumes 256 elements.

I tend to agree that a #define is pointless there, since there's no easy way to make the relevant code do anything with it. It would be worthwhile to point out adjacent to the code what the max length is, though.

I had gotten as far as rewriting your introductory comment like this

    /*
     * Prepare the skip table for Boyer-Moore-Horspool searching.  First we
     * must determine how big a skip table to use.  The declaration of
     * TextPositionState allows up to 256 elements, but with haystack lengths
     * of only a few characters we don't really want to have to initialize so
     * many elements --- it would take too long in comparison to the actual
     * search time.  So we choose a useful skip table size based on the
     * haystack length.
     */

before noticing that what I was writing wasn't what the code actually did.

Another point that doesn't seem to be included in your comments is that the skip table size *must* be a power of 2 because we use bit masking. (In fact state->skiptablesize might better be named skiptablemask, since it's really 1 less than the table size.)

BTW, try to avoid introducing so much random vertical space. IMHO it does little for readability and much to make routines too long to fit in one screenful. Generally the PG code doesn't use double blank lines anywhere inside a function, nor blank lines before a right brace line. pg_indent will clean this up to some extent, but it would be better if the code looked more like the project style in the first place.

regards, tom lane
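To make the mechanics under discussion concrete, here is a minimal sketch of a Boyer-Moore-Horspool search with the skip table sized as a power of two and indexed through a mask, as the review describes. All names, the sizing constants, and the heuristic details are illustrative assumptions; this is not the actual text_position_next code.

```c
#include <assert.h>

#define SKIPTABLE_SIZE 256          /* max table size, as in TextPositionState */

/* Horspool search: return offset of needle in haystack, or -1 if absent. */
static int
bmh_search(const char *haystack, int len1, const char *needle, int len2)
{
    int skiptable[SKIPTABLE_SIZE];
    int skiptablemask;
    int hptr, i;

    if (len2 <= 0 || len2 > len1)
        return -1;

    /*
     * Choose a power-of-2 table size based on the searchable length
     * (len1 - len2), then keep mask = size - 1 so indexing is just
     * "ch & skiptablemask".  The divisor here is an arbitrary stand-in
     * for the tuning heuristic being debated in the thread.
     */
    skiptablemask = 1;
    while (skiptablemask < SKIPTABLE_SIZE &&
           skiptablemask < (len1 - len2) / 4 + 1)
        skiptablemask <<= 1;
    skiptablemask--;

    /* Any character not seen in the needle lets us skip its full length. */
    for (i = 0; i <= skiptablemask; i++)
        skiptable[i] = len2;
    /* Entries for needle chars (last char excluded, per Horspool). */
    for (i = 0; i < len2 - 1; i++)
        skiptable[(unsigned char) needle[i] & skiptablemask] = len2 - 1 - i;

    hptr = len2 - 1;                /* haystack index aligned with needle end */
    while (hptr < len1)
    {
        int h = hptr, n = len2 - 1;

        /* Compare backwards from the aligned end position. */
        while (n >= 0 && haystack[h] == needle[n])
            h--, n--;
        if (n < 0)
            return h + 1;           /* full match found */
        hptr += skiptable[(unsigned char) haystack[hptr] & skiptablemask];
    }
    return -1;
}
```

Note that mask collisions only make skips smaller, never larger, so a too-small table costs speed but not correctness — which is why the sizing heuristic is purely a performance knob.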
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
I wrote:
> I looked this over a bit and was immediately confused by one thing: the
> introductory comment says that the skip table size ought to be based on the
> length of the haystack, which makes sense to me, but the code is actually
> initializing it on the basis of len2, ie, the length of the needle. Isn't
> that a bug? Was the same bug present in the tests you made to determine the
> best table sizes?

BTW, to the extent that you feel like testing a different idea, I would suggest:

* don't bother initializing the skiptable when len1 < len2
* otherwise, choose its size based on len1 - len2, not just len1 or len2. This is (one less than) the maximum number of search-loop consultations of the skip table that can happen, so it seems like a plausible number, and better than either length alone.

regards, tom lane
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Jaime Casanova [EMAIL PROTECTED] writes:
> On Sat, Sep 6, 2008 at 3:57 PM, Tom Lane [EMAIL PROTECTED] wrote:
>>> postgres=# select -256::uint1;
>>> ERROR: uint1 out of range
>> No, that's just because this is parsed as -(256::uint1)
> actually, i thought that case is right but the -255::uint1 returning a
> negative number (aka -255) is what bothers me

Well, again, that's -(255::uint1). I suppose uint1 hasn't got a negation operator (what would it do??), so probably the sequence of events is to form 255::uint1, then implicitly promote it to some signed type or other (most likely int4), then negate. Not much to be done about this unless you want to get rid of the implicit coercion to signed types, which would probably defeat most of the purpose.

Now, if (-255)::uint1 fails to throw an error, that would be a bug IMHO. Casting any negative value to uint ought to fail, no?

regards, tom lane
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Jaime Casanova [EMAIL PROTECTED] writes:
> then the patch is right but it seems to me like that is breaking the law of
> least surprise
> i expected -2::uint1 to be equivalent to (-2)::uint1
> that should be at least documented, no?

See the precedence table here: http://www.postgresql.org/docs/8.3/static/sql-syntax-lexical.html#SQL-PRECEDENCE

:: binds more tightly than -, and always has.

regards, tom lane
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
David Rowley [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Hmm. B-M-H has worst case search speed O(M*N) (where M = length of
>> pattern, N = length of search string); whereas full B-M is O(N). Maybe we
>> should build the second table when M is large?
> I'll look into this. If it pays off for longer searches I'll submit a
> patch. I won't have the time until after the 15th, so perhaps that's in
> November's commit fest?

Yes. Let's get B-M-H in during this fest and then you can look into whether a follow-on patch is worthwhile.

regards, tom lane
Re: [PATCHES] hash index improving v3
I wrote:
> You have the unique-versus-not dimension,

On second thought, actually not. What we want to look at is the penalty for false matches due to *distinct* key values that happen to have the same hash codes. Your test case for all-the-same is using all the same key values, which means it'll hit the heap a lot, but none of those will be wasted trips. So what we need for testing is a few different key values that hash to the same code. Not sure about an easy way to find such.

regards, tom lane
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Heikki Linnakangas [EMAIL PROTECTED] writes:
> After reading the wikipedia article on the Boyer-Moore search algorithm, it
> looks to me like this patch actually implements the simpler
> Boyer-Moore-Horspool algorithm that only uses one lookup table. That's
> probably fine, as it ought to be faster on small needles and haystacks
> because it requires less effort to build the lookup tables, even though the
> worst-case performance is worse. It should still be faster than what we
> have now.

Hmm. B-M-H has worst case search speed O(M*N) (where M = length of pattern, N = length of search string); whereas full B-M is O(N). Maybe we should build the second table when M is large? Although realistically that is probably gilding the lily, since frankly there haven't been many real-world complaints about the speed of these functions anyway ...

> The skip table really should be constructed only once in
> text_position_start and stored in TextPositionState. That would make a big
> difference to the performance of those functions that call
> text_position_next repeatedly: replace_text, split_text and text_to_array.

+1. The heuristic about how big a skip table to make may need some adjustment as well, since it seems to be considering only a single search.

regards, tom lane
Re: [PATCHES] hash index improving v3
Alex Hunsaker [EMAIL PROTECTED] writes:
> On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane [EMAIL PROTECTED] wrote:
>> So what we need for testing is a few different key values that hash to the
>> same code. Not sure about an easy way to find such.
> Hrm, well I have not really looked at the hash algorithm but I assume we
> could just reduce the number of buckets?

No, we need fully equal hash keys, else the code won't visit the heap. I guess one thing we could do for testing purposes is lobotomize one of the datatype-specific hash functions. For instance, make int8_hash return the input mod 2^32, ignoring the upper bytes. Then it'd be easy to compute different int8s that hash to the same thing.

regards, tom lane
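The lobotomized-hash testing trick can be sketched as follows: if int8 values are hashed by simply truncating to the low 32 bits, then any two values that differ by a multiple of 2^32 collide, making it trivial to generate distinct keys with equal hash codes. This is an illustration of the idea only — the function names are made up and this is not the real int8_hash code.

```c
#include <assert.h>
#include <stdint.h>

/* Deliberately weakened hash: ignore the upper 32 bits of the input,
 * as suggested for testing purposes. */
static uint32_t
lobotomized_int8_hash(int64_t v)
{
    return (uint32_t) v;            /* v mod 2^32 */
}

/* Generate n distinct int64 keys that all hash to the same code. */
static void
make_colliding_keys(int64_t base, int64_t *keys, int n)
{
    for (int i = 0; i < n; i++)
        keys[i] = base + ((int64_t) i << 32);   /* base + i * 2^32 */
}
```

With such keys loaded into a hash index, every probe for one of them must visit the heap to reject the others — exactly the false-match penalty the thread wants to measure.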
Re: [PATCHES] hash index improving v3
Alex Hunsaker [EMAIL PROTECTED] writes:
> On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane [EMAIL PROTECTED] wrote:
>> * check that the queries actually use the indexes (not sure that the
>> proposed switch settings ensure this, not to mention you didn't create the
>> indexes)
> Well I was assuming I could just test the speed of a hash join...

Uh, no, hash joins have nearly zip to do with hash indexes. They rely on the same per-datatype support functions but that's the end of the commonality.

regards, tom lane
Re: [PATCHES] WIP Join Removal
Simon Riggs [EMAIL PROTECTED] writes:
> On Mon, 2008-09-01 at 22:23 +0300, Heikki Linnakangas wrote:
>> Did plan invalidation make it safe to rely on the presence of a unique
>> index for planning decisions?
> My understanding was Yes, and this case was the specific reason I
> originally wanted to pursue plan invalidation back in 2006.

Yeah, it should work. The theory is that any schema change that could affect planning should result in broadcasting a relcache inval message for the table (not just the index, note). I'm pretty confident that that works for index addition and removal (cf index_update_stats and index_drop). There might be some situations where we need to force a relcache inval but don't currently do so --- constraint addition/removal, for instance, I'm not too sure about. But that would represent an easily fixable bug.

regards, tom lane
Re: [PATCHES] WIP Join Removal
Simon Riggs [EMAIL PROTECTED] writes:
>> Oh. What does the query look like after removing the join, then?
> Same answer, just slower. Removing the join makes the access to a into a
> SeqScan, whereas it was a two-table index plan when both tables were
> present.

I don't really believe this: please show an actual case where the join would be faster. AFAICS, in the outer-join examples, it is not possible for a join to enable some kind of indexscan on the outer table, because by definition an outer join excludes none of the left-hand rows. So a seqscan on the outer is optimal.

I also find all the worry about generating other plans for the inner relation to be off the mark. You're not going to *use* any plan for the inner rel, so who cares what plans it has?

regards, tom lane
Re: [PATCHES] WIP Join Removal
Simon Riggs [EMAIL PROTECTED] writes:
> As discussed on 26 June, Join Removal/Vertical Partitioning, here's a patch
> to remove joins in certain circumstances.

Some points not made in the thread so far:

> + if (checkrel->rtekind != RTE_RELATION)
> +     return;

This isn't right, or at least isn't sufficient, because rtekind is undefined for joinrels. I think it should be something like

    /*
     * Since we can't prove anything unless keeprel has a unique
     * index, give up immediately if it's a join or not a table.
     */
    if (checkrel->reloptkind == RELOPT_JOINREL ||
        checkrel->rtekind != RTE_RELATION)
        return;

Maybe you should also check for checkrel->indexlist == NIL here, since there's no point in doing the attr_needed bookkeeping if there are no indexes.

> + /*
> +  * Check that the query targetlist does not contain any non-junk
> +  * target entries for the relation. If it does, we cannot remove join.
> +  */
> + if (checkrel->min_attr > 0)
> +     minattr = checkrel->min_attr;
> +
> + for (attrno = minattr; attrno < checkrel->max_attr; attrno++)
> + {
> +     if (!bms_is_empty(checkrel->attr_needed[attrno]))
> +         return;
> + }

This part seems pretty wrong. In the first place, you mustn't discriminate against system columns like that. (Consider for instance that the only attr being used from the inner rel is its OID column.) You're failing to check max_attr too --- I believe the loop ought to run from min_attr to max_attr inclusive. In the second place, I don't see how the routine ever succeeds at all if it insists on attr_needed being empty, because whatever attrs are used in the join condition will surely have nonempty attr_needed. What you want is to see whether there are any attrs that are needed *above the join*, which would be something like

    if (!bms_is_subset(checkrel->attr_needed[attrno], joinrel->relids))
        fail;

The reference to non-junk in the comment is off base too. Attrs are used or not; there's no concept of a junk attr.
> + * XXX Seems not worth searching partial indexes because those
> + * are only usable with an appropriate restriction, so the
> + * only way they could ever be usable is if there was a
> + * restriction that exactly matched the index predicate,
> + * which is possible but generally unlikely.

I haven't thought this through entirely, but wouldn't a partial index be okay if it's marked predOK? You might be right that the case is unlikely, but if it's only one extra line to support it ...

> + if (removable &&
> +     joinrel->cheapest_total_path < keeprel->cheapest_total_path)

This test on cheapest_total_path seems pretty wacko --- even granting that you meant to test the costs and not compare pointer addresses, isn't it backward? Also I doubt that the joinrel's cheapest_total_path field is set yet where you're calling this. In any case I'm unconvinced that join removal could ever not be a win, so that test seems unnecessary anyhow.

> + {
> +     elog(LOG, "join removed");
> +     joinrel->pathlist = keeprel->pathlist;
> +     joinrel->joininfo = keeprel->baserestrictinfo;
> + }

Huh? What in the world are you doing to joininfo here? This function should only be manipulating the path list.

regards, tom lane
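The "needed above the join" test Tom describes can be illustrated with a toy stand-in for PostgreSQL's Bitmapset. The point is that an inner rel's attrs may legitimately be referenced by the join clauses themselves (so their attr_needed sets cover the joinrel's relids), but the join is only removable if no attr is needed anywhere above the join. Everything here — the one-word relid set, the function names — is a simplification for illustration, not the planner's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset of relids: one 64-bit word. */
typedef uint64_t relids_t;

static bool
bms_is_subset_toy(relids_t a, relids_t b)
{
    return (a & ~b) == 0;           /* every member of a is in b */
}

/*
 * Join-removal condition sketch: each attr's attr_needed set must be a
 * subset of the joinrel's relids, i.e. the attr is consumed entirely by
 * the join itself and nothing above it.
 */
static bool
attrs_needed_only_below(const relids_t *attr_needed, int nattrs,
                        relids_t joinrelids)
{
    for (int i = 0; i < nattrs; i++)
        if (!bms_is_subset_toy(attr_needed[i], joinrelids))
            return false;           /* some attr is needed above the join */
    return true;
}
```

This is why insisting on empty attr_needed sets (as the patch did) can never succeed: the join-clause references alone make them nonempty.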
Re: [PATCHES] WIP Join Removal
Simon Riggs [EMAIL PROTECTED] writes:
> + if (removable &&
> +     joinrel->cheapest_total_path < keeprel->cheapest_total_path)
> + {
> +     elog(LOG, "join removed");
> +     joinrel->pathlist = keeprel->pathlist;
> +     joinrel->joininfo = keeprel->baserestrictinfo;
> + }

On third thought: if you think that the join paths could possibly win, then the correct coding here is something like

    foreach(path, keeprel->pathlist)
        add_path(joinrel, ...)

The reason is that it's not an all-or-nothing choice: some of the paths might offer cheaper startup cost, or present a useful sort order. So just present them as available alternatives and let add_path sort it out.

regards, tom lane
Re: [PATCHES] WIP Join Removal
Simon Riggs [EMAIL PROTECTED] writes:
> On Tue, 2008-09-02 at 12:28 -0400, Tom Lane wrote:
>> I haven't thought this through entirely, but wouldn't a partial index be
>> okay if it's marked predOK? You might be right that the case is unlikely,
>> but if it's only one extra line to support it ...
> As of now, if it is predOK then that implies there was a qual on the
> checkrel, so we can't remove the join.

That conclusion seems utterly wrong to me. Per the example of

    a left join b on (a.x = b.y and b.z = 1)

b.z = 1 will bubble down to be a qual on b. It doesn't prevent the join optimization, and it does possibly allow the use of a partial index. In particular a unique index on b.y with a predicate involving b.z would be relevant to the optimization decision here. In slightly more realistic cases, b might be a view on c that imposes a WHERE condition on c.z, so that's another avenue for a qual to exist below the join.

regards, tom lane
Re: [PATCHES] [HACKERS] Hint Bits and Write I/O
Alvaro Herrera [EMAIL PROTECTED] writes:
> I think it makes sense to commit this patch now, per previous discussions
> on which we have agreed to make incremental changes.

Yeah, but at the same time there is merit in the argument that the proposed patch hasn't actually been proven to be usable for anything. I would be a lot happier if there were even a trivial proof-of-concept plugin example submitted with it, just to prove that there were no showstopper problems in the plugin design, like failure to pass essential information or not getting the locking straight.

> I'm just wondering if the change of usage_count from 16 to 8 bits was
> discussed and agreed?

Umm ... it was not, but given that we have logic in there to limit the usage_count to 5 or so, it's hard to argue that there's a big problem. I confess to not having read the patch in detail --- where did the other 8 bits go to?

regards, tom lane
Re: [PATCHES] pg_dump additional options for performance
Simon Riggs [EMAIL PROTECTED] writes:
> On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:
>> The key problem is that pg_restore is broken:
> The key capability here is being able to split the dump into multiple
> pieces. The equivalent capability on restore is *not* required, because
> once the dump has been split the restore never needs to be.

This argument is nonsense. The typical usage of this capability, IMHO, will be

    pg_dump -Fc >whole.dump
    pg_restore --schema-before-data whole.dump >before.sql
    pg_restore --data-only whole.dump >data.sql
    pg_restore --schema-after-data whole.dump >after.sql

followed by editing the schema pieces and then loading. One reason is that this gives you a consistent dump, whereas three successive pg_dump runs could never guarantee any such thing. Another reason is that you may well not know when you prepare the dump that you will need split output, because the requirement to edit the dump is likely to be realized only when you go to load it. In any case, why did you put the switches into pg_restore.c if you thought it wasn't useful for pg_restore to handle them?

>> Not so easy to fix is that COMMENTs might be either before or after data
>> depending on what kind of object they are attached to.
> Is there anything to fix?

Well, yeah. If you attach a comment to an after-data object and test --schema-after-data, you'll notice the comment is lost.

>> And there's yet another issue here, which is that it's not entirely clear
>> that the type of an object uniquely determines whether it's before or
>> after data.
> Don't understand that. Objects are sorted in well-defined order, specified
> in pg_dump_sort.

After which we do a topological sort that enforces dependency ordering. The question to worry about is whether there can ever be a dependency from a normally-before object to a normally-after object, which would cause the dependency sort to move the latter in front of the former (in one way or another).
I'm not saying that any such case can occur today, but I don't think it's an impossibility for it to arise in future. I don't want this relatively minor feature to be putting limits on what kinds of dependencies the system can have.

> I'm conscious that the major work proposed will take weeks to complete

I don't think that what I am proposing is that complicated; I would anticipate it requiring somewhere on the order of two dozen lines of code. I was thinking of doing a preliminary loop through the TocEntry list to identify the ordinal numbers of the first and last data items, and then the main loop could compare a counter to those numbers to decide which of the three sections it was in. Plus you'd need another ArchiveEntry call someplace to prepare the dummy data item if one was needed.

regards, tom lane
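Tom's two-pass plan can be sketched in miniature: one preliminary pass finds the ordinals of the first and last data items, and a second pass classifies every entry by comparing its ordinal against those bounds. This stand-in walks a boolean array rather than pg_restore's actual TocEntry list, and the names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SEC_PRE_DATA, SEC_DATA, SEC_POST_DATA } Section;

/*
 * Pass 1: record ordinals of first and last data items.
 * Pass 2: everything before the first data item is "before data",
 * everything after the last is "after data", the rest is the data
 * section (even non-data entries interleaved among data items).
 */
static void
classify_entries(const bool *is_data, int n, Section *out)
{
    int first_data = -1, last_data = -1;

    for (int i = 0; i < n; i++)
        if (is_data[i])
        {
            if (first_data < 0)
                first_data = i;
            last_data = i;
        }

    for (int i = 0; i < n; i++)
    {
        if (first_data < 0 || i < first_data)
            out[i] = SEC_PRE_DATA;
        else if (i <= last_data)
            out[i] = SEC_DATA;
        else
            out[i] = SEC_POST_DATA;
    }
}
```

The dummy ArchiveEntry Tom mentions would cover the degenerate case where the archive contains no data items at all, so that the bounds are always well defined.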
Re: [PATCHES] pg_dump additional options for performance
Stephen Frost [EMAIL PROTECTED] writes:
>> Another issue is that the rules for deciding which objects are before data
>> and which are after data are wrong. In particular ACLs are after data not
>> before data, which is relatively easy to fix.
> OK. This was partially why I was complaining about having documentation,
> and a policy for that matter, which goes into more detail about why X is
> before or after the data. I agree that they're after today, but I don't
> see any particular reason why they should be one or the other.

If a table's ACL revokes owner insert privilege, and it was placed before the data load steps, those steps would fail. We are relying on the default table privileges until we are done doing everything we need to do to the tables (and perhaps other objects; I'm not sure if there are any other comparable problems).

regards, tom lane
Re: [HACKERS][PATCHES] odd output in restore mode
Simon Riggs [EMAIL PROTECTED] writes:
> On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote:
>> I thought the latest conclusion was that changing the behavior of
>> pg_standby itself wouldn't address the problem anyway, and that what we
>> need is just a docs patch recommending that people use safe copying
>> methods in their scripts that copy to the archive area?
> Plus the rest of this patch, which is really very simple.

Why? AFAICT the patch is just a kluge that adds user-visible complexity without providing a solution that's actually sure to work.

regards, tom lane
Re: [PATCHES] pg_dump additional options for performance
be taught to accept and pass down --schema-before-data and --schema-after-data switches.

regards, tom lane

Attachment: pg_dump_beforeafter.v7.patch.gz
Re: [PATCHES] pg_dump additional options for performance
Simon Riggs [EMAIL PROTECTED] writes:
> [80k patch]

Surely there is a whole lot of unintended noise in this patch? I certainly don't believe that you meant to change keywords.c, for instance.

regards, tom lane
Re: [PATCHES] [HACKERS] WITH RECUSIVE patches 0723
Tatsuo Ishii [EMAIL PROTECTED] writes:
> Reviewers, please let me know if you find problems with the patches. If
> none, I would like to commit this weekend.

Given that everyone who has tested this has found a different way to crash it, and that the frequency of crash reports shows no signs of slowing down, I have to think that committing it is premature.

I tried to look through the patch just now and failed to make any sense of it, because of the complete absence of documentation. Two unexplained examples added to the SELECT reference page don't do it for me. I want to see an explanation of exactly what behaviors are intended to be provided (and, in view of the long TODO list that was posted awhile back, what isn't provided). And there needs to be more than zero internal documentation. A README file, or perhaps a very long file header comment, needs to be provided to explain what's supposed to happen, when, and where when processing a recursive query. (For comparison look at the README.HOT file that was created to explain the HOT patch --- something at about that level of detail would help this patch a lot. Or consider adding a section to chapter 43 in the SGML docs.)

We really can't accept a patch that is so poorly documented as to be unreviewable. Unreviewable also means it'll be unmaintainable going forward.

regards, tom lane
Re: [PATCHES] [HACKERS] WITH RECUSIVE patches 0723
Tatsuo Ishii [EMAIL PROTECTED] writes:
> Reviewers, please let me know if you find problems with the patches. If
> none, I would like to commit this weekend.

Has this patch actually been reviewed yet? The only reports I've seen are from testing; nothing from anyone actually reading the code. I know I've not looked at it yet.

regards, tom lane
Re: [PATCHES] pg_dump lock timeout
daveg [EMAIL PROTECTED] writes:
> On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
>> In most cases our policy has been that pg_dumpall should accept and pass
>> through any pg_dump option for which it's sensible to do so. I did not
>> make that happen but it seems it'd be a reasonable follow-on patch.
> I'll remember that next time.

Er ... actually that was a direct request for you to do it.

> Finally, you changed the option value and case from 1 to case 2.
> getopt_long only returns the value argument when there is no flag to set,
> so 1 was unique and could have been read as the first no-short option with
> an argument.

Yeah. The code *worked* as you submitted it, but what was bothering me was that the val = 1 table entries worked in two completely different ways for the different argument types. I first thought that you'd broken the existing long-argument options --- you hadn't, but I had to go re-read the getopt_long source to convince myself of that. So it seemed like using a different val value might help clarify the difference in behavior for future readers of the code. In particular, the next guy who wants to add a long option with a parameter value would certainly not be able to use val = 1; but I thought the code as you had it wouldn't give him any clue what to do. If you've got a better idea about how to deal with that, feel free...

regards, tom lane
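The getopt_long behavior under discussion: when a long option's `flag` pointer is NULL, getopt_long returns `val` directly, so a parameterized long option needs a `val` distinct both from the short-option letters and from the value used by flag-setting entries. The option name below matches the patch; the harness around it is an illustrative assumption, not pg_dump's actual option-handling code.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse --lock-wait-timeout=N from an argv array.  Because flag is NULL
 * and val is 2, getopt_long returns 2 for this option and leaves the
 * parameter text in optarg.
 */
static char *
parse_lock_wait_timeout(int argc, char **argv)
{
    static struct option long_options[] = {
        {"lock-wait-timeout", required_argument, NULL, 2},
        {NULL, 0, NULL, 0}
    };
    int c;
    char *value = NULL;

    optind = 1;                 /* reset getopt state so this is reusable */
    while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
    {
        if (c == 2)             /* our long option: val returned directly */
            value = optarg;
    }
    return value;
}
```

Entries that instead set `flag` to the address of an int make getopt_long store `val` there and return 0 — the second, completely different behavior of val = 1 entries that Tom found confusing.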
Re: [PATCHES] pg_dump additional options for performance
Andrew Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Maybe invert the logic? --omit-pre-data --omit-data --omit-post-data
> Please, no. Negative logic seems likely to cause endless confusion.

I think it might actually be less confusing, because with this approach each switch has an identifiable default (no) and setting it doesn't cause side-effects on the settings of other switches. The interactions of the switches as Simon presents 'em seem less than obvious.

regards, tom lane
Re: [PATCHES] pg_dump additional options for performance
Stephen Frost [EMAIL PROTECTED] writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
>> As far as the documentation/definition aspect goes, I think it should just
>> say the parts are
>> * stuff needed before you can load the data
>> * the data
>> * stuff needed after loading the data
> Even that is a lie though, which I guess is what my problem is.

True; the stuff done after is done that way at least in part for performance reasons rather than because it has to be done that way. (I think it's not only performance issues, though --- for circular FKs you pretty much have to load the data first.)

>> I hadn't realized that Simon was using pre-schema and post-schema to name
>> the first and third parts. I'd agree that this is confusing nomenclature:
>> it looks like it's trying to say that the data is the schema, and the
>> schema is not! How about pre-data and post-data?
> Argh. The command-line options follow the 'data'/'load' line
> (--schema-pre-load and --schema-post-load), and so I think those are fine.
> The problem was that in the documentation he switched to saying they were
> Pre-Schema and Post-Schema, which could lead to confusion.

Ah, I see. No objection to those switch names, at least assuming we want to stick to positive-logic switches. What did you think of the negative-logic suggestion (--omit-xxx)?

regards, tom lane
Re: [PATCHES] pg_dump additional options for performance
Stephen Frost [EMAIL PROTECTED] writes:
> Are there use cases for just --omit-post-load or --omit-pre-load? Probably
> not many.

The thing that's bothering me is the action-at-a-distance property of the positive-logic switches. How are we going to explain this? "By default, --schema-pre-load, --data-only, --schema-post-load are all ON. But if you turn one of them ON (never mind that it was already ON by default), that changes the defaults for the other two to OFF. Then you have to turn them ON (never mind that the default for them is ON) if you want two out of the three categories." You have to bend your mind into a pretzel to wrap it around this behavior. Yeah, it might be convenient once you understand it, but how long will it take for the savings in typing to repay the time to understand it and the mistakes along the way?

regards, tom lane
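The action-at-a-distance semantics Tom objects to reduce to a small piece of logic: all three section switches default to on, but explicitly giving any one of them turns the other two off unless they are also given. A sketch of that resolution rule (illustrative names; not the actual pg_dump option code):

```c
#include <assert.h>
#include <stdbool.h>

/* Which of the three section switches appeared on the command line. */
typedef struct
{
    bool pre_given;             /* --schema-pre-load seen */
    bool data_given;            /* --data-only seen */
    bool post_given;            /* --schema-post-load seen */
} Cmdline;

/*
 * If no section switch was given, dump everything; otherwise dump
 * exactly the sections that were explicitly selected.  Giving one
 * switch thus silently changes the effective default of the others --
 * the "action at a distance" being criticized.
 */
static void
resolve_sections(const Cmdline *cl, bool *pre, bool *data, bool *post)
{
    bool any_given = cl->pre_given || cl->data_given || cl->post_given;

    if (!any_given)
        *pre = *data = *post = true;
    else
    {
        *pre = cl->pre_given;
        *data = cl->data_given;
        *post = cl->post_given;
    }
}
```

Under the --omit-xxx proposal this conditional disappears entirely: each section is on unless its own omit switch is given, with no cross-switch coupling.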
Re: [HACKERS] [PATCHES] WITH RECUSIVE patches 0717
Tatsuo Ishii [EMAIL PROTECTED] writes:
> Thus I think we should avoid this kind of ORDER BY. Probably we should
> avoid LIMIT/OFFSET and FOR UPDATE as well.

What of index-optimized SELECT max(...)?

regards, tom lane
Re: [PATCHES] pg_dump lock timeout
daveg [EMAIL PROTECTED] writes:
> Here is an updated version of this patch against head. It builds, runs and
> functions as expected. I did not build the sgml.

Applied with mostly minor cosmetic improvements --- the only actual error I noticed was failing to check whether the server version supports statement_timeout.

In most cases our policy has been that pg_dumpall should accept and pass through any pg_dump option for which it's sensible to do so. I did not make that happen but it seems it'd be a reasonable follow-on patch.

A minor point is that the syntax -X lock-wait-timeout=n or -X lock-wait-timeout n won't work, although perhaps people used to -X might expect it to. Since we deprecate -X (and don't even document it anymore), I thought that making this work would be much more trouble than it's worth, but perhaps that's open to argument.

regards, tom lane
Re: [PATCHES] pg_dump additional options for performance
Stephen Frost [EMAIL PROTECTED] writes: * daveg ([EMAIL PROTECTED]) wrote: One observation: indexes should be built right after the table data is loaded for each table; this way, the index build gets a hot cache for the table data instead of having to re-read it later as we do now. That's not how pg_dump has traditionally worked, and the point of this patch is to add options to easily segregate the main pieces of the existing pg_dump output (main schema definition, data dump, key/index building). Your suggestion brings up an interesting point: should pg_dump's traditional output structure change, the --schema-post-load set of objects wouldn't be as clear to newcomers, since the load and the indexes would be interleaved in the regular output. Yeah. Also, that is pushing into an entirely different line of development, which is to enable multithreaded pg_restore. The patch at hand is necessarily incompatible with that type of operation, and wouldn't be used together with it. As far as the documentation/definition aspect goes, I think it should just say the parts are

* stuff needed before you can load the data
* the data
* stuff needed after loading the data

and not try to be any more specific than that. There are corner cases that will turn any simple breakdown into a lie, and I doubt that it's worth trying to explain them all. (Take a close look at the dependency loop breaking logic in pg_dump if you doubt this.) I hadn't realized that Simon was using "pre-schema" and "post-schema" to name the first and third parts. I'd agree that this is confusing nomenclature: it looks like it's trying to say that the data is the schema, and the schema is not! How about "pre-data" and "post-data"? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Is autovacuum doing a wraparound-avoiding VACUUM?
Simon Riggs [EMAIL PROTECTED] writes: On Thu, 2008-07-17 at 17:10 -0400, Alvaro Herrera wrote: I don't like your wording though; it feels too verbose (and you're losing the ANALYZE in case it's doing both things). How about snprintf(activity, MAX_AUTOVAC_ACTIV_LEN, "autovacuum: VACUUM%s%s", tab->at_doanalyze ? " ANALYZE" : "", tab->at_wraparound ? " (wraparound)" : ""); Yes, looks good. May I suggest "(to prevent wraparound)" or something like that? Otherwise, +1. You're not proposing it for 8.3 right? I think I am. It's an important diagnostic for your other fix. I agree, this is important for visibility into what's happening. The string isn't getting translated so I don't see any big downside to applying the patch in back branches. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] variadic function support
Pavel Stehule [EMAIL PROTECTED] writes: this version is WIP - I have to clean comments, and will do some documentation. But I am sure I am not able to explain this feature in English well. Please, can somebody help me with the documentation? So now, plpgsql is more/less ruby :) Applied with revisions. The duplicate-argument detection logic in FuncnameGetCandidates was pretty thoroughly broken, and there were some internal API decisions I didn't like, as well as a few sins of omission like not making ruleutils.c work. I did some work on the docs but there may be a few other places that could stand to mention variadic functions. I didn't do anything about the extra pg_proc column, but will start to work on that now. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] variadic function support
Pavel Stehule [EMAIL PROTECTED] writes: sample: select mleast(variadic array[1,2,3,4,5]); Note this would also address Jeff's complaint about not being able to pass zero variadic parameters: select mleast(variadic array[]::int[]); Looks a bit ugly but the type is specified, so it would work correctly with polymorphic functions. Are you intending to change this right now and resubmit, or is it work for later? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] variadic function support
Jeff Davis [EMAIL PROTECTED] writes: I don't have a strong opinion, but allowing zero-argument variadic function calls -- and therefore causing foo(variadic int[]) and foo(variadic text[]) to conflict -- makes more sense than requiring one argument. I hadn't even thought about that point, but the idea that those two would conflict bothers me quite a lot. Not least because there's no reasonable way to enforce it with the existing unique indexes on pg_proc. I think you'd have to leave the variadic argument out of proargtypes altogether, and that seems mad. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] page macros cleanup (ver 04)
Zdenek Kotala [EMAIL PROTECTED] writes: Good catch. If we have to bump the catalog version then I have a small patch which introduces the following macro and removes the PageHeaderData structure from HashMetaPageData: Seems like a bad idea --- PageGetContents is not guaranteed to deliver a maxaligned pointer, so at least in principle this could result in alignment violations. It would accidentally fail to fail as of CVS HEAD, but any future rearrangement of PageHeaderData or HashMetaPageData could create a problem. (Possibly PageGetContents *should* guarantee a maxaligned result, but the current coding does not.) regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] page macros cleanup (ver 04)
Heikki Linnakangas [EMAIL PROTECTED] writes: ... That macro is actually doing the same thing as PageGetContents, so I switched to using that. As that moves the data slightly on those bitmap pages, I guess we'll need a catversion bump. I'm amazed that Zdenek didn't scream bloody murder about that. You're creating a work item for in-place-upgrade that would not otherwise exist, in exchange for a completely trivial bit of code beautification. (The same can be said of his proposed change to hash meta pages.) I'm planning to go over this patch today and apply it sans the parts that would require catversion bump. We can argue later about whether those are really worth doing, but I'm leaning to not --- unless Zdenek says that he has no intention of making in-place-upgrade handle hash indexes any time soon. BTW, after further thought about the PageGetContents() situation: right now we can change it to guarantee maxalignment for free, since SizeOfPageHeaderData happens to be maxaligned on all platforms (this wasn't the case as recently as 8.2). So I'm thinking we should do that. There's at least one place that thinks that PageGetContents is the same as page + SizeOfPageHeaderData, but that's easily fixed. On balance it seems like hidden assumptions about alignment are a bigger risk than assumptions about that offset --- anyone want to argue the contrary? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] GIN improvements
Teodor Sigaev [EMAIL PROTECTED] writes: http://www.sigaev.ru/misc/fast_insert_gin-0.7.gz http://www.sigaev.ru/misc/multicolumn_gin-0.3.gz I've committed the multicolumn one with minor revisions (fix some poor English in docs and comments, add regression-test coverage). Do you need more review of fast_insert yet? It looked like a number of people commented on it already. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] get_relation_stats_hook()
Simon Riggs [EMAIL PROTECTED] writes: On Thu, 2008-06-26 at 12:50 -0400, Tom Lane wrote: I think you need to move up a level, and perhaps refactor some of the existing code to make it easier to inject made-up stats. I've looked at ways of refactoring this, but the hooks provided here complement the existing functionality fairly well. The hooks proposed here are completely useless --- it's impossible to return anything except actual catalog data, because whatever is returned is going to be subjected to ReleaseSysCache() later on. I'd think you at least need to be able to turn that into a no-op, or perhaps a pfree() call. Also it's clear that some calls of examine_variable would still return regular syscache entries, so the cleanup behavior has to be determinable on a per-call basis. I also still think that you need to reconsider the level at which the hook gets control. It's impossible given these hook definitions to create fake data for an appendrel or sub-SELECT, which seems to me to be an important requirement, at least till such time as the core planner handles those cases better. My feeling is that the hooks ought to be at more or less the same semantic level as examine_variable and ReleaseVariableStats, and that rather than making special-case hooks for the other couple of direct accesses to STATRELATT, the right thing is to make sure that the examine_variable hook will work for them too. I do now concur that having the hook return a pg_statistic tuple is appropriate --- after looking at the code that uses this stuff, decoupling that representation would be more work than it's worth. I think it might be a good idea to prepare an actual working example of a plug-in that does something with the hook in order to verify that you've got a useful definition. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [HACKERS] [PATCHES] GIN improvements
Teodor Sigaev [EMAIL PROTECTED] writes: I looked this over and it looks good in general. May I think that patch passed review and commit it? I'd still like to take a look. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] psql command setting
Simon Riggs [EMAIL PROTECTED] writes: Recent patch allowed \timing [on|off] This patch allows same option on all remaining toggles, so every option works in the same way. We already discussed that and rejected it, because the remaining toggles aren't really toggles. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Solaris ident authentication using unix domain sockets
Andrew Dunstan [EMAIL PROTECTED] writes: Robert Treat wrote: Hmm... I've always been told that Solaris didn't support this because the Solaris developers feel that IDENT is inherently insecure. We don't actually use the Ident protocol for Unix sockets on any platform. Indeed. If the Solaris folk feel that getpeerucred() is insecure, they had better explain why their kernel is that broken. This is entirely unrelated to the known shortcomings of the ident IP protocol. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pgbench minor fixes
Simon Riggs [EMAIL PROTECTED] writes: On Sun, 2008-07-06 at 00:06 +1000, Russell Smith wrote: Simon Riggs wrote: 1. -i option should run vacuum analyze only on pgbench tables, not *all* tables in database. How does this work with custom scripts? I don't think we can justify vacuuming every table in the database just so we get any tables being used for one set of tests. Actually your point is that the -i option wouldn't be used anyway with a custom script ... regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [HACKERS] [PATCHES] Explain XML patch v2
Peter Eisentraut [EMAIL PROTECTED] writes: On Friday, 4 July 2008, Tom Raney wrote: Regarding the XML datum, in order to support that, will all users need to compile with libxml? Are there any lighter-weight solutions to serialize other than libxml? You can create XML without libxml. Seems to me that anyone who wants this feature will probably also want the existing libxml-based features, so they'll be building with libxml anyway. So I'd not be in favor of expending any extra code on a roll-your-own solution. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Sorting writes during checkpoint
Simon Riggs [EMAIL PROTECTED] writes: Anyway, I note that we don't have an easy way of sorting by tablespace, Say what? tablespace is the first component of relfilenode. but I'm sure it would be possible to look up the tablespace for a file within a plugin. If the information weren't readily available from relfilenode, it would *not* be possible for a bufmgr plugin to look it up. bufmgr is much too low-level to be dependent on performing catalog lookups. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump lock timeout
daveg [EMAIL PROTECTED] writes: On Thu, Jul 03, 2008 at 11:15:10AM +0300, Marko Kreen wrote: - The statement_timeout is set back with statement_timeout = default Maybe it would be better to do = 0 here? Although such decision would go outside the scope of the patch, I see no sense having any other statement_timeout for actual dumping. I'd prefer to leave whatever policy is otherwise in place alone. The policy in place in CVS HEAD is that pg_dump explicitly sets statement_timeout to 0. Setting it to default would break that, and will certainly not be accepted. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Solaris ident authentication using unix domain sockets
Garick Hamlin [EMAIL PROTECTED] writes: I have a patch that I have been using to support postgresql's notion of ident authentication when using unix domain sockets on Solaris. This patch basically just adds support for using getpeerucred() on Solaris so unix sockets and ident auth works just like it does on Linux and elsewhere. Cool.

+ #if defined(HAVE_GETPEERUCRED)
+ #include <ucred.h>
+ #endif

But this is not cool. There might be systems out there that have getpeerucred() but not ucred.h, and this coding would cause a compile failure (even if they actually wouldn't be trying to use getpeerucred() because they have some other way to do it). You need an explicit configure probe for the header file too, I think. Also, what is the rationale for putting this before the HAVE_STRUCT_CMSGCRED case instead of after? Again, that seems like it could cause unexpected behavioral changes on platforms that work fine now (consider possibility that getpeerucred is there but broken). regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Exposing keywords to clients
Nikhils [EMAIL PROTECTED] writes: Attached is an updated patch, giving the following output. Applied with minor revisions. Here are some comments from me: a) Changed "localised" to "localized" to be consistent with the references elsewhere in the same file. What I didn't like about the docs was the classification of the function as a session information function. There's certainly nothing session-dependent about it. I ended up putting it under System Catalog Information Functions, which isn't really quite right either but it's closer than the other one. b) I wonder if we need the default case in the switch statement at all, since we are scanning the statically populated ScanKeywords array with proper category values for each entry. I left it in for safety, but made it just return nulls --- there's no point in wasting more code space on it than necessary, and certainly no point in asking translators to translate a useless string. c) There was a warning during compilation since we were assigning a const pointer to a char pointer: values[0] = ScanKeywords[funcctx->call_cntr].name; Yeah, I couldn't see a good way around that either --- we could pstrdup but that seems overkill, and adjusting the API of BuildTupleFromCStrings to take const char ** doesn't look appetizing either. d) oid 2700 has been claimed by another function in the meanwhile. Modified it to 2701. DATA(insert OID = 2701 ( pg_get_keywords PGNSP PGUID 12 10 400 f f t t s 0 2249 2701 was claimed too. You need to use the unused_oids script to find unique free OIDs. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Patch to change psql default banner v6
Guillaume Lelarge [EMAIL PROTECTED] writes: Attached is a new version of the patch. It fixes a few issues when one adds a pattern to metacommands. Applied with some editorialization. There were some places you evidently hadn't tested against all supported server versions ... regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES][UPDATED] A GUC variable to replace PGBE_ACTIVITY_SIZE
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: Huh? How could we be assigning to a slot that is not in use? Before the patch, we loop through the shared PgBackendStatus slots (there is MaxBackends of them), and issue a memcpy for each to copy it to our local slot. After that, we check if it was actually in use. After the patch, we still loop through the shared slots, but only issue the memcpy for slots that are in use. Oh, you're talking about *fetching* the slot contents, not *assigning* to them. Sorry, probably should have read the patch instead of just reacting to the comment ... regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Explain XML patch
Simon Riggs [EMAIL PROTECTED] writes: You probably need to say a whole lot more about this patch. I've updated the wiki with things I've learned while submitting patches. http://wiki.postgresql.org/wiki/Submitting_a_Patch Anybody mind if I update that to put more emphasis on the need for documentation? As in documentation patches are *required* if your patch adds or changes user-visible behavior? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Explain XML patch
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Anybody mind if I update that to put more emphasis on the need for documentation? As in documentation patches are *required* if your patch adds or changes user-visible behavior? Sure, but I do documentation updates for non-English speakers occasionally. I know that docs from non-native speakers often need heavy editorialization, but that is little excuse for leaving the committer to do *all* the docs work. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Fix pg_ctl restart bug
Bruce Momjian [EMAIL PROTECTED] writes: However, as of 2004-10-15, this has not worked. :-( The attached patch is the one that caused the bug --- on non-Unix systems, SYSTEMQUOTE is "", meaning a zero-length string. I should have seen the bug when I applied the contributed patch in 2004. So, shouldn't this fix be back-patched? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Patch for Prevent pg_dump/pg_restore from being affected by statement_timeout
daveg [EMAIL PROTECTED] writes: lock-timeout sets statement_timeout to a small value while locks are being taken on all the tables. Then it resets it to default. So it could reset it to whatever the new default is. "Reset to default" is *surely* not the right behavior; resetting to the setting that had been in effect might be a bit saner. But the whole design sounds pretty broken to start with. The timer management code already understands the concept of scheduling the interrupt for the nearest of a couple of potentially active timeouts. ISTM any patch intended to add a feature like this ought to extend that logic rather than hack around changing the values of global variables. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Patch for Prevent pg_dump/pg_restore from being affected by statement_timeout
daveg [EMAIL PROTECTED] writes: Are we talking about the same patch? Maybe not --- I thought you were talking about a backend-side behavioral change. Because I don't know what you are refering to with timer management code and scheduling the interrupt in the context of pg_dump. I'm not sure that I see a good argument for making pg_dump deliberately fail, if that's what you're proposing. Maybe I'm just too old-school, but there simply is not any other higher priority for a database than safeguarding your data. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] variadic function support
Pavel Stehule [EMAIL PROTECTED] writes: Tom Lane wrote: Your point about the syntax is good though. It would be better if the syntax were like create function foo (a text, variadic b int[]) or maybe even better create function foo (a text, variadic b int) I don't see a problem with your syntax. It would block the combination of OUT and VARIADIC parameters --- my one request is that the variadic parameter has to be an array. Well, we should certainly store the parameter type as an array in proargtypes, because that makes this feature transparent to all the PLs. However, it doesn't follow that the CREATE FUNCTION syntax has to specify the array type rather than the element type. I think the Java precedent might be good reason to go with using the element type in the function declaration. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] variadic function support
Andrew Dunstan [EMAIL PROTECTED] writes: This proposal strikes me as half-baked. Either we need proper and full support for variadic functions, or we don't, but I don't think we need syntactic sugar like the above (or maybe in this case it's really syntactic saccharine). What would you consider proper and full support? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] variadic function support
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: What would you consider proper and full support? I don't know. But this doesn't feel like it. That's a fairly weak argument for rejecting a patch that provides a feature many people have asked for. I thought the patch was pretty clever, actually. The main functionality complaint someone might level against it is that all the variadic arguments have to be (coercible to) the same type. However, that's still pretty useful, and I don't see a reasonable solution that provides more generality than that in a type-safe way. I'm quite happy that you can't write sprintf() using this ;-) A different line of argument is whether this functionality is sufficiently badly needed that we should get out in front of the SQL standard on providing it, and risk being stuck with legacy behavior if they eventually adopt some other mechanism to solve the same problem. I'm not sure how worried I am about that. There are certainly a boatload of Postgres-isms in and around CREATE FUNCTION already, so it's hard to make a case against just one more. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] variadic function support
Andrew Dunstan [EMAIL PROTECTED] writes: But if I have foo( a text, b int[]) it looks odd if both these calls are legal: foo('a',1,2,3) foo('a',ARRAY[1,2,3]) which I understand would be the case with the current patch. Maybe I misunderstand what is supposed to happen, but I believe that if the function is marked VARIADIC then the second case would in fact be rejected: the signature of the function for parameter-matching purposes is text followed by one or more ints, never text and int[]. I'm also still curious to know how the following would be handled: foo(a text[], b text[]) I think a is just text[], full stop. Only the last parameter is interpreted differently for variadic. Your point about the syntax is good though. It would be better if the syntax were like create function foo (a text, variadic b int[]) or maybe even better create function foo (a text, variadic b int) since (a) this makes it much more obvious to the reader what the function might match, and (b) it leaves the door open for marking multiple parameters as variadic, if we can figure out what that means. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Simplify formatting.c
Bruce Momjian [EMAIL PROTECTED] writes: The third step is for oracle_compat.c::initcap() to use formatting.c::str_initcap(). You can see the result; patch attached (not applied). This greatly reduces the size of initcap(), with the downside that we are making two extra copies of the string to convert it to/from char*. Is this acceptable? I'd say not. Can't we do some more refactoring and avoid so many useless conversions? Seems like str_initcap is the wrong primitive API --- the work ought to be done by a function that takes a char pointer and a length. That would be a suitable basis for functions operating on both text datums and C strings. (Perhaps what I should be asking is whether the performance of upper() and lower() is equally bad. Certainly all three should have comparable code, so maybe they all need refactoring.) regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Simplify formatting.c
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: I'd say not. Can't we do some more refactoring and avoid so many useless conversions? Seems like str_initcap is the wrong primitive API --- the work ought to be done by a function that takes a char pointer and a length. That would be a suitable basis for functions operating on both text datums and C strings. Yea, I thought about that idea too but it is going to add strlen() calls in some places, but not in critical ones. Sure, but the cost-per-byte of the strlen should be a good bit less than the cost-per-byte of the actual conversion, so that doesn't bother me too much. Actually it seems like the hard part is not so much the input representation as the output representation --- what should the base-level initcap routine return, to be reasonably efficient for both cases? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Rewrite sinval messaging to reduce contention
I wrote: ... enabling LWLOCK_STATS shows that the contention rate on the sinval locks is now completely negligible --- one block per thousand acquisitions on SInvalWriteLock, and less than one in 1 on SInvalReadLock. The vast majority of the LWLock contention now comes from WALInsertLock and the LockMgr locks:

Lock                      # acquisitions    # times blocked
SInvalReadLock            6469840           380
SInvalWriteLock           240567            163
WALInsertLock             2388805           89142
LockMgr partition locks   8253142           177715

For comparison's sake I rebuilt CVS HEAD with LWLOCK_STATS enabled and repeated the same test. I got

SInvalLock                8109004           4505750
WALInsertLock             2382254           62747
LockMgr locks             10657480          171799

The change in sinval numbers is gratifying, but for a while I didn't believe these results because of the discrepancy in LockMgr acquisition figures. I think, though, that what we are seeing here is that CVS HEAD has the reset-everyone-on-sinval-queue-overflow behavior, which results in a whole lot of useless cache resets, which results in a lot of unnecessary cache reloads, and every one of those requires taking AccessShareLock on one or more system catalogs in order to suck the data back in. So the reduction in LockMgr traffic is explained by not doing so many cache resets. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
[PATCHES] Rewrite sinval messaging to reduce contention
so this is quite successful at eliminating spinlock-level contention and does a reasonable job at reducing the amount of LWLock processing. Comments? Does anyone have more realistic test cases for this problem, or want to fiddle with the tuning #defines in sinvaladt.c? regards, tom lane bin7h80XxstUn.bin Description: sinval-rewrite-1.patch.gz -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Rewrite sinval messaging to reduce contention
I wrote: In this thread back in January http://archives.postgresql.org/pgsql-performance/2008-01/msg1.php we saw evidence that contention for SInvalLock was a problem, and we (or at least I) concluded that the sinval messaging system needed a nontrivial rewrite to get rid of its habit of sending everyone to read the queue at the same time. Here is the proposed rewrite. After further thought, I came up with two more improvements:

* It turns out not to be too hard to apply the idea of moving multiple messages per LWLock acquisition on the writing side too, so I did that.

* There is not actually any good reason why writers have to block readers or vice versa, except in the infrequent case that SICleanupQueue is needed. Writers don't look at the ProcStates that the readers are changing. The only point of overlap is that a writer will change maxMsgNum which readers want to look at --- but we are already assuming in several places that reading or writing an int is atomic, and if we assume that here too, it seems to work fine. To implement this, I split SInvalLock into SInvalReadLock and SInvalWriteLock.

This seems to be reaching a point of diminishing returns for the particular test case I'm using --- the TPS rate only went up from 700 to 730 or so. However, enabling LWLOCK_STATS shows that the contention rate on the sinval locks is now completely negligible --- one block per thousand acquisitions on SInvalWriteLock, and less than one in 1 on SInvalReadLock. The vast majority of the LWLock contention now comes from WALInsertLock and the LockMgr locks:

Lock                      # acquisitions    # times blocked
SInvalReadLock            6469840           380
SInvalWriteLock           240567            163
WALInsertLock             2388805           89142
LockMgr partition locks   8253142           177715

So I think this patch accomplishes the goal of making sinval not be a cause of contention. Patch version 2 attached. 
			regards, tom lane

[Attachment: sinval-rewrite-2.patch.gz]
Re: [PATCHES] GIN improvements
Teodor Sigaev [EMAIL PROTECTED] writes:
> So, I didn't see any comments/objections and I intend to commit this
> patch for next two days and synchronize 'fast insert into GIN' patch
> with CVS.  Objections?

I think it hasn't really gotten reviewed at all (certainly not by me).
If you want other people to look it over you should wait for next
month's commit fest.

			regards, tom lane
Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL
Alex Hunsaker [EMAIL PROTECTED] writes:
> Ok, here it is:
> - Moves CheckDropPermissions and friends from utility.c to aclchk.c
>   (pg_drop_permission_check)
> - Makes all the Remove* functions take a DropStmt *; they each do
>   their own foreach() loop and permission checks

Applied with revisions.  I had suggested moving stuff into aclchk.c on
the assumption that it needed to be called from more than one place.
But after you got rid of the separate RemoveIndex and RemoveView
functions (which was a good idea), there was only one call site for
those functions, so I just folded them into tablecmds.c; and in fact
integrated CheckDropPermissions right into RemoveRelations so it looked
more like all the other DropStmt functions.  Also,
RemoveTypes/RemoveDomains might as well be integrated completely since
we're doing relations that way.  I also chose to clean up the
conversion dropping stuff, since there didn't seem to be any point in
the separation between ConversionDrop and DropConversionCommand.

The only actual bug I found was that you'd used break where you should
have used continue for a non-existent object in each routine, so a
multi-object DROP IF EXISTS would fail to perform as expected.

			regards, tom lane
Re: [PATCHES] relscan.h split
Alvaro Herrera [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Also, it seemed like some of those .c files had no business poking
>> into the scan structs anyway; particularly contrib.  Did you check
>> whether the inclusions could be avoided?

> Not really, unless we were to provide something like a routine that
> returns the current block of a scan.

Current buffer, you mean.  That wouldn't be a bad idea --- the wart I
added to genam.c the other day to recheck the current tuple could be
replaced with that, I think, though it wouldn't really be much less
warty.

BTW, I think your change in pgstattuple.c is probably dangerous: if the
relation gets extended between where heap_beginscan_strat sets
rs_nblocks and where you put RelationGetNumberOfBlocks, I think the
block counting will get messed up.  That one really does need access to
the internals.

Other than that, this looks pretty sane to me.

			regards, tom lane
Re: [PATCHES] [HACKERS] SSL configure patch
[EMAIL PROTECTED] writes:
> Here is the SSL patch we discussed previously for 8.3.1.

This appears to change user-facing behavior, which makes the lack of
documentation updates unacceptable.  Also, it would be helpful to
reviewers if you provided a link to that previous discussion.

			regards, tom lane
Re: [PATCHES] Better formatting of functions in pg_dump
Greg Sabino Mullane [EMAIL PROTECTED] writes:
>> Why the random switching between newline-before and newline-after
>> styles?  Please be consistent.

> I thought they were all after.  On second glance, they still seem all
> after?

Oh, my mistake, I had failed to see that the patch was getting rid of
newline-before style in this function.  I think you might have gone a
bit overboard on adding whitespace, but the previous objection is
nonsense, sorry.

			regards, tom lane
Re: [PATCHES] Better formatting of functions in pg_dump
Greg Sabino Mullane [EMAIL PROTECTED] writes:
> Attached patch puts the metadata about a function, especially the
> language name, at the top of the CREATE FUNCTION statement, above the
> possibly long, multi-line function definition.

Why the random switching between newline-before and newline-after
styles?  Please be consistent.

			regards, tom lane
Re: [PATCHES] SQL: table function support
David Fetter [EMAIL PROTECTED] writes:
> On Mon, Jun 09, 2008 at 05:56:59PM -0700, Neil Conway wrote:
>> I'm not necessarily opposed to this, but I wonder if we really need
>> *more* syntax variants for declaring set-returning functions.  The
>> existing patchwork of features is confusing enough as it is...

> The way we declare set-returning functions ranges from odd to
> byzantine.  A clear, easy-to-understand syntax (even if it's just
> sugar over something else) like Pavel's would go a long way toward
> getting developers actually to use them.

Apparently, whether the syntax is byzantine or not is in the eye of the
beholder.  I find the TABLE() syntax to be *less* clear.

			regards, tom lane
Re: [PATCHES] relscan.h split
Alvaro Herrera [EMAIL PROTECTED] writes:
> I propose the following patch which moves the struct definitions to a
> separate new header relscan_internal.h.

This seems a little bizarre, seeing that there is almost nothing in
relscan.h except those structs.  Perhaps a better idea would be to put
the opaque-pointer typedefs into heapam.h and genam.h respectively, and
then see where you could remove inclusions of relscan.h.

Also, it seemed like some of those .c files had no business poking into
the scan structs anyway; particularly contrib.  Did you check whether
the inclusions could be avoided?

			regards, tom lane
Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL
Alvaro Herrera [EMAIL PROTECTED] writes:
> Alex Hunsaker escribió:
>> I'm not proposing this patch for actual submission, more of a "would
>> this work?"  If I'm not missing something glaringly obvious I'll go
>> ahead and make the rest of the Remove* things behave the same way.

> I don't think there's anything wrong with that in principle.  However,
> does your patch actually work?  The changes in expected/ are
> unexpected, I think.

No, they look right --- we're losing gripes about earlier tables
cascading to subobjects of later tables, which is exactly what we want.

I don't really like the patch though; it seems kind of a brute-force
solution.  You've got ProcessUtility iterating through a list of
objects and doing a little bit of work on each one, and then making a
new list that RemoveRelation (which is now misnamed) again iterates
through and does a little bit of work on each one, and then passes that
off *again*.  There's no principled division of labor at work there; in
particular it's far from obvious where things get locked.  You've also
managed to embed an assumption not previously present, that all the
objects in the list are of exactly the same type.

I think it would be better if the DropStmt loop were responsible for
constructing a list of ObjectAddresses that it handed off directly to
performMultipleDeletions.  This is why I was imagining replacing
RemoveRelation et al with something that passed back an ObjectAddress,
and getting worried about introducing references to ObjectAddress into
all those header files.

Another possibility would be to get rid of RemoveRelation et al
altogether and embed that code right into the DropStmt processing
(though at this point we'd need to split it out of ProcessUtility; it's
already a bit large for where it is).  That didn't seem tremendously
clean either, though it would at least have the virtue of improving
consistency about where privilege checks etc get done.
			regards, tom lane
Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL
Alex Hunsaker [EMAIL PROTECTED] writes:
> Yep, I thought about doing the reverse, namely just passing the
> DropStmt to RemoveRelation(s).  But then all the permission check
> functions are in utility.c.  Splitting those out seemed to be about
> the same as splitting out all the ObjectAddress stuff...

Well, that might actually be a good approach: try to get ProcessUtility
back down to being just a dispatch switch.  It's pretty much of a wart
that we're doing any permissions checking in utility.c at all.
Possibly those functions should be moved to aclchk.c and then used from
RemoveRelation(s) and friends, which would stay where they are but
change API.

I think the fundamental tension here is between two theories of
organizing the code: we've got the notion of "collect operations on an
object type together", which leads to e.g. putting
RemoveTSConfiguration in tsearchcmds.c, as against "collect similar
operations together", which leads to wanting all the DROP operations in
the same module.  In the abstract it's not too easy to choose between
these, but I think you'll probably find that the first one works better
here, mainly because each of those object-type modules knows how to
work with a particular system catalog.  A DROP module is going to find
itself importing all the catalog headers plus probably utility routines
from all over.

So that's leading me to lean towards keeping RemoveRelation et al where
they are and distributing the work currently done in ProcessUtility out
to them.  This sounds duplicative, but about all that really is there
to duplicate is a foreach loop, which you're going to need anyway if
the routines are to handle multiple objects.

			regards, tom lane
Re: [PATCHES] Refactoring xlogutils.c
Heikki Linnakangas [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> The code motion in md.c looks fairly bogus; was that a copy-and-paste
>> error?

> That's intentional.  That check has been moved to smgrcreate(), so
> that it's done before calling TablespaceCreateDbSpace().  The reason
> for that is that smgrcreate() is now called for every XLogReadBuffer()
> invocation, so we want to make it exit quickly if there's nothing to
> do.

Oh, I see --- I misread the patch the first time.  That seems like a
reasonable change.

On second look, though, I think we should actually leave that check in
mdcreate().  Even though it'd be dead code since we do the same check
in smgrcreate already, removing it changes the semantics of
mdcreate(): it'd no longer be acceptable to call mdcreate() if the
file is open already.  I don't believe there is any way to call
mdcreate except through smgrcreate, so this is probably not a problem.

			regards, tom lane
Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL
Alvaro Herrera [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> ... I wonder if it would be worth refactoring the code so that a
>> multiple-object DROP is implemented via performMultipleDeletions().
>> This would have more than just cosmetic advantages: it would no
>> longer matter what order you listed the tables in.  But the
>> refactoring required looks bigger and more tedious than I want to
>> tackle right now.

> Hmm, this is a bit ugly.  I'd vote for doing the refactoring.
> However, I'd say you should commit the patch you currently have and
> let one of the younger hackers fix that problem -- it looks like a
> good beginner project.

Agreed --- I committed what I had; anyone want to volunteer for
refactoring the execution of DropStmt?

After looking again, I think that this is not technically very
difficult, but coming up with something that looks tasteful to everyone
might be tricky.  In particular I didn't see a nice way to do it
without using struct ObjectAddress in a bunch of header files that
don't currently include dependency.h.  A possible response to that is
to move ObjectAddress into postgres.h, but that seems a bit ugly too.

			regards, tom lane
Re: [PATCHES] \timing on/off
Heikki Linnakangas [EMAIL PROTECTED] writes:
> I gather that the conclusion of the thread "Making sure \timing is
> on", starting at
> http://archives.postgresql.org/pgsql-general/2008-05/msg00324.php
> was that we should leave \H and \a alone for now, because it's not
> clear what "\H off" would do.

Yeah, I think that was the consensus.

> Here's a patch, based on David's patch, that turns timing into a
> \pset variable, and makes \timing an alias for "\pset timing".  This
> makes \timing behave the same as \x.

This seems a bit random to me.  AFAIR all of the \pset properties are
associated with formatting of table output.  \timing doesn't belong.

			regards, tom lane
Re: [PATCHES] SQL: table function support
Pavel Stehule [EMAIL PROTECTED] writes:
> what is more logical and consistent?

They're both utterly arbitrary, but the setof syntax has been in
Postgres since forever, and applies to more things than just record.
The other one doesn't fit in with anything else --- it's just a
syntactic wart.

			regards, tom lane
Re: [PATCHES] Refactoring xlogutils.c
Heikki Linnakangas [EMAIL PROTECTED] writes:
> Attached is an updated version of my patch to refactor the
> XLogOpenRelation/XLogReadBuffer interface, in preparation for the
> relation forks patch, and subsequently the FSM rewrite patch.

The code motion in md.c looks fairly bogus; was that a copy-and-paste
error?

Otherwise it looks pretty sane, but I have one stylistic gripe: I'm
dubious about your willingness to assume that pfree() is enough for
getting rid of a fake relcache entry.  Relcache entries are complex
beasts and it may not continue to work to do that.  It's also possible
that we'd have additional resources attached to them someday.  So I
think it'd be worth having a FreeFakeRelcacheEntry() routine to
localize the knowledge of how to get rid of them.

			regards, tom lane