Re: [HACKERS] HOT for PostgreSQL 8.3
On Tue, 2007-02-13 at 09:38, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Hannu Krosing wrote: Are we actually doing that? I.e., are null bitmaps really allocated in 1-byte steps nowadays? Yes. Not really; we still have to MAXALIGN at the end of the bitmap. The point is that you can get 8 bits in there before paying the first additional MAXALIGN increment. It's all moot anyway since 8 bits isn't enough for a pointer ... With 8k pages and MAXALIGN=8 we just barely can, as with the current page structure (tuple headers together with data) the minimal tuple size for that case is 24 bytes for the header + 8 for data = 32 bytes, which means 256 tuples per page (minus the page header), so we can store a tuple number in 8 bits. OTOH, for same-page HOT tuples, we have the command and transaction ids stored twice: first as cmax,xmax of the old tuple and then as cmin,xmin of the updated tuple. One of these could probably be used for an in-page HOT tuple pointer. As we can't rely on get-xmax-from-next-in-chain behaviour for off-page tuples, we need to move the other way, that is, store a pointer to the previous version and get xmin from there. That means keeping a chain of back pointers in the xmin fields for all but the oldest tuple in the HOT chain/cycle. This requires a little shuffling around of xmin/xmax info when removing dead HOT chain members, but it would remove the need to do any index changes during HOT updates. Avoiding all index updates is probably good, as it means we don't need to do any index page locking. So the proposed structure for a HOT chain is like this:
* The oldest tuple in the chain has the real xmin/xmax. This needs a flag to be recognisable, or we may just start the transaction counter at 64K so that any xmin that fits in 2 bytes is immediately recognized as a HOT back pointer. Or start cmin at 1 and use cmin=0 as the pointer flag.
* All newer tuples have the real xmax, and an in-page pointer to the previous version in place of xmin; the real xmin is the xmax of the previous tuple.
When the previous tuple is removed, the real xmin is stored in the new oldest tuple.
* The index can point to any tuple in the HOT chain.

Finding the tuple by index:
* To find a tuple, one arrives at (possibly) the middle of the HOT chain, and the tuple may be found either by following the in-page pointers stored in the xmin fields or the ctid pointers.
* If the tuple's ctid points off-page, it must be the newest one in the HOT chain and thus the end of it. The chain also ends when the tuple was inserted by a rolled-back transaction.
* There is a special case when the index pointer points to a rolled-back tuple. In this case the tuple can't be removed, but it should be reused on the next update. But this is the case for any HOT rollbacks.

The proposed structure should enable cyclic reuse of dead tuples as they fall out of the visibility window and avoid updating index pointers. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Fixing insecure security definer functions
Regarding the advisory on possibly insecure security definer functions that I just sent out (by overriding the search path you can make the function do whatever you want with the privileges of the function owner), the favored solution after some initial discussion in the core team was to save the search path at creation time with each function. Have you considered hardcoding the schema for each object where it was found at creation time? This seems more intuitive to me. Also, using a search path leaves open the possibility of injecting an object into an earlier schema in the path. Andreas
[HACKERS] HOT WIP Patch - version 1
This is a WIP patch based on the recent posting by Simon and the discussions thereafter. We are trying to do one piece at a time, and the intention is to post the work ASAP so that we can get early and continuous feedback from the community. We can then incorporate those suggestions in the next WIP patch. To start with, this patch implements HOT-update for the simple case where there is enough free space in the same block to accommodate the new version of the tuple. A necessary condition for doing a HOT-update is that none of the index columns is changed. The old version is marked HEAP_UPDATE_ROOT and the new version is marked HEAP_ONLY_TUPLE. If a tuple is HOT-updated, no new index entry is added. When fetching a tuple using an index, if the root tuple is not visible to the given snapshot, the ctid chain is followed until a visible tuple is found or the end of the HOT-update chain is reached. The prior_xmax/next_xmin chain is validated while following the ctid chain. This patch is generated against the current CVS head. It passes all the regression tests, but I haven't measured any performance impact since that's not the goal of posting this early version. There are several things that are not yet implemented, and there are a few unresolved issues for which I am looking for community help and feedback.

Open Issues:
- CREATE INDEX needs more work in the HOT context. The existing HOT tuples may require chilling for CREATE INDEX to work correctly. There are concerns about the crash-safety of the chilling operation. A few suggestions were posted in this regard. We need to conclude that discussion and post a working design/patch.
- We need to find a way to handle DEAD root tuples, either converting them into stubs or overwriting them with a new version. We could also perform pointer swinging from the index. Again there are concerns about crash-safety and concurrent index-scans working properly. We don't have a community consensus on any of the suggestions in this regard.
But hopefully we will converge on some design soon.
- Retail VACUUM. We need to implement block-level vacuum for UPDATEs to find enough free space in the block to do a HOT-update. Though we are still discussing how to handle the dead root tuples, we should be able to remove any intermediate dead tuples in the HOT-update chain safely. If we do so without fixing the root tuple, the prior_xmax/next_xmin chain would be broken. A similar problem exists with freezing HOT tuples.

What's Next:
In the current implementation, a HOT-updated tuple cannot be vacuumed because it might be in the middle of the access path to the heap-only visible tuple. This can cause the table to grow rapidly even if autovacuum is turned on. The HOT-update chain also keeps growing as long as there is enough free space in the block. I am thinking of implementing some sort of HOT-update chain squeezing logic so that intermediate dead tuples can be retired and vacuumed away. This would also help us keep the HOT-update chain small enough that following the chain does not become unduly costly. I am thinking of squeezing the HOT-update chain while following it in the index fetch. If the root tuple is dead, we follow the chain until the first LIVE or RECENTLY_DEAD tuple is found. The ctid pointer in the root tuple is then made to point to that first LIVE or RECENTLY_DEAD tuple. All the intermediate DEAD tuples are marked ~HEAP_UPDATE_ROOT so that they can be vacuumed in the next cycle. We hold an exclusive lock on the page while doing so; that should avoid any race conditions. This infrastructure should also help us retail-vacuum the block later. Please let me know your comments. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com NewHOT-v1.1-pgsql-head.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] Fixing insecure security definer functions
On Wednesday, 14 February 2007 at 10:21, Zeugswetter Andreas ADI SD wrote: Have you considered hardcoding the schema for each object where it was found at creation time? Unless anyone has solved the halting problem lately, I don't think it's possible to determine at creation time which objects will be accessed. At least not for all languages. -- Peter Eisentraut http://developer.postgresql.org/~petere/
[HACKERS] integer datetimes
Our docs for the integer datetime option say: Use 64-bit integer storage for datetimes and intervals, rather than the default floating-point storage. This reduces the range of representable values but guarantees microsecond precision across the full range (see Section 8.5 for more information). Note also that the integer datetimes code is newer than the floating-point code, and we still find bugs in it from time to time. Is the last sentence about bugs really true anymore? At least the buildfarm seems to have a lot *more* machines with it enabled than without. (I'm thinking about making it the default for the vc++ build, which is why I came across that) //Magnus
Re: [HACKERS] anyelement2 pseudotype
Tom Lane wrote: As for actually adding it, grep for all references to ANYELEMENT and add code accordingly; shouldn't be that hard. Note you'd need to add an anyarray2 at the same time for things to keep working sanely. The enum patch [1] does exactly this with an ANYENUM pseudo-type. It should provide a pretty good overview of what will be required. Cheers Tom [1] http://archives.postgresql.org/pgsql-patches/2007-02/msg00239.php
Re: [HACKERS] 8.1 stats issues on Win32
Magnus Hagander wrote: I've been looking at backporting the stats fix committed to head and 8.2 into 8.1, but realised that it's just not going to work. 8.1 still uses the dual stats process stuff, which means that the simplification just is not possible. The most obvious result is that autovacuum is very likely to fail on 8.1 if your system load is high enough. (All of stats fails, of course, but autovacuum is a very common user of it.) Should we note this somewhere? Probably. Oh, and if we were looking for reasons to deprecate 8.1, this sounds like a pretty good one to me. I still think we should keep patching it, but it is a very good reason to encourage our users to switch to 8.2. Now, we could try to fix it there, but we've seen a lot of issues since day one coming from inheriting the socket in two steps, so even if we get this one fixed, there could be more lurking in the dual-process model. I personally don't think it's worth investing the required time into fixing that on 8.1. I'm inclined to agree - time would be better spent improving current and future versions than chasing stats issues which, as we know, can be a real pita to find. Regards, Dave
Re: [HACKERS] Writing triggers in C++
Tom Lane wrote: Jacob Rief [EMAIL PROTECTED] writes: I tried to write a trigger using C++. That is most likely not going to work anyway, because the backend operating environment is C not C++. If you dumb it down enough --- no exceptions, no RTTI, no use of the C++ library --- then it might work. I can confirm that it does work this way. Regards, Andreas
Re: [HACKERS] Writing triggers in C++
Andreas Pflug wrote: Tom Lane wrote: Jacob Rief [EMAIL PROTECTED] writes: I tried to write a trigger using C++. That is most likely not going to work anyway, because the backend operating environment is C not C++. If you dumb it down enough --- no exceptions, no RTTI, no use of the C++ library --- then it might work. I can confirm that it does work this way. I've written an aggregate function that uses C++ STL hashes, and it seems to work pretty well. I'd think that using exceptions should be fine, as long as you make sure to _always_ catch any exception that might be thrown inside your own C++ code, and don't let it propagate into backend code. STL allows you to specify custom allocator classes as template parameters to hash, vector and the like. You can use that to let STL allocate memory from the correct memory context. greetings, Florian Pflug
Re: [HACKERS] HOT for PostgreSQL 8.3
On 2/14/07, Hannu Krosing [EMAIL PROTECTED] wrote: OTOH, for same-page HOT tuples, we have the command and transaction ids stored twice: first as cmax,xmax of the old tuple and then as cmin,xmin of the updated tuple. One of these could probably be used for an in-page HOT tuple pointer. I think we recently merged cmin/cmax into a single combo cid, so I don't think we can use cmin for storing the back pointer. The idea of using xmin seems feasible, though. As we can't rely on get-xmax-from-next-in-chain behaviour for off-page tuples, we need to move the other way, that is, store a pointer to the previous version and get xmin from there. That means keeping a chain of back pointers in the xmin fields for all but the oldest tuple in the HOT chain/cycle. Why can't we store the real xmin/xmax at the end of the chain and store forward pointers in the xmax? I find the idea of having back pointers more compelling, though. This requires a little shuffling around of xmin/xmax info when removing dead HOT chain members, but it would remove the need to do any index changes during HOT updates. Avoiding all index updates is probably good, as it means we don't need to do any index page locking. I agree. If we can design a solution that does not require any index updates, that would save us a lot. One problem with such an approach, though, is that we might have to live with a dead tuple/stub until another update occurs and the tuple/stub is reused. So the proposed structure for a HOT chain is like this:
* The oldest tuple in the chain has the real xmin/xmax. This needs a flag to be recognisable, or we may just start the transaction counter at 64K so that any xmin that fits in 2 bytes is immediately recognized as a HOT back pointer. Or start cmin at 1 and use cmin=0 as the pointer flag.
* All newer tuples have the real xmax, and an in-page pointer to the previous version in place of xmin; the real xmin is the xmax of the previous tuple. When the previous tuple is removed, the real xmin is stored in the new oldest tuple.
* The index can point to any tuple in the HOT chain.
Finding the tuple by index:
* To find a tuple, one arrives at (possibly) the middle of the HOT chain, and the tuple may be found either by following the in-page pointers stored in the xmin fields or the ctid pointers. Can we quickly figure out, based on the given snapshot and the root tuple's xmin/xmax, whether to follow the forward or the backward pointer to arrive at a visible tuple?
* If the tuple's ctid points off-page, it must be the newest one in the HOT chain and thus the end of it. The chain also ends when the tuple was inserted by a rolled-back transaction.
* There is a special case when the index pointer points to a rolled-back tuple. In this case the tuple can't be removed, but it should be reused on the next update. But this is the case for any HOT rollbacks.

One problem is with aborted HOT-updates. According to this scheme, the xmin of the aborted version is stored in the previous version. What happens when that gets updated again and committed? If we follow the back pointer from the aborted tuple to fetch the xmin, we would actually be fetching the txid of the committed transaction that later updated the tuple. This would fool us into incorrectly believing that the aborted tuple is LIVE. Maybe we can set XMIN_INVALID for the next tuple in the chain when a tuple being updated has t_ctid pointing to a tuple in the same page other than itself. I am sure there are interesting interactions with vacuuming that need to be considered as well. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Writing triggers in C++
Florian G. Pflug wrote: Andreas Pflug wrote: Tom Lane wrote: Jacob Rief [EMAIL PROTECTED] writes: I tried to write a trigger using C++. That is most likely not going to work anyway, because the backend operating environment is C not C++. If you dumb it down enough --- no exceptions, no RTTI, no use of the C++ library --- then it might work. I can confirm that it does work this way. I've written an aggregate function that uses C++ STL hashes, and it seems to work pretty well. I'd think that using exceptions should be fine, as long as you make sure to _always_ catch any exception that might be thrown inside your own C++ code, and don't let it propagate into backend code. STL allows you to specify custom allocator classes as template parameters to hash, vector and the like. You can use that to let STL allocate memory from the correct memory context. What happens if Postgres raises an elog(ERROR) in the code you're catching exceptions in? Is it propagated outwards? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
[HACKERS] Plan for compressed varlena headers
So to implement the agreed-upon plan I see the following items.
1) Replace the VARATT_SIZEP macro with SET_VARLENA_LEN. I intend to keep this patch separate, as it will bitrot quickly and would be best applied as soon as possible, even before the main patch is committed. I just sent a patch to do this, with some notes, to pgsql-patches.
2) Replace the VARATT* macros that store and retrieve the toast bits in a manner that will work for variable-length headers. This either means storing the bits at the least-significant position or using network byte order. If we want to allow storing 1-byte headers unaligned, which I think would be good, then I still think we have to read them using bytewise lookups -- i.e. by casting to (char*). That means network byte order and using the low-order bits are equally efficient.
3) Have VARSIZE and VARATT_SIZE recognize short headers and produce accurate values.
4) Change heap_deform*tuple, heap_getattr and any other functions and macros in heapam.c that step through tuples to recognize the new headers. Actually, these should mostly just work because att_addlength uses VARSIZE, but there may be additional changes. Other places that use att_addlength and need to be checked are heaptuple.c, indextuple.c, arrayfuncs.c, datum.c, varlena.c, execQual.c, and flatfiles.c.
5) Change pg_detoast_datum to recognize the new header types and decompress them.
6) Change heap_form_tuple to compress headers where possible.
7) Fix the toaster to generate new-style toasted data.
Did I miss anything? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT for PostgreSQL 8.3
Hannu Krosing [EMAIL PROTECTED] writes: On Tue, 2007-02-13 at 09:38, Tom Lane wrote: It's all moot anyway since 8 bits isn't enough for a pointer ... With 8k pages and MAXALIGN=8 we just barely can, as with the current page structure (tuple headers together with data) the minimal tuple size for that case is 24 bytes for the header + 8 for data = 32 bytes, which means 256 tuples per page (minus the page header), so we can store a tuple number in 8 bits. But neither of those assumptions is acceptable --- I would think in fact that people would become *more* interested in having large pages with HOT, because it would give more room for updates-on-the-same-page. Besides, you forgot the case of an empty tuple (<= 8 columns, all NULL). OTOH, for same-page HOT tuples, we have the command and transaction ids stored twice: first as cmax,xmax of the old tuple and then as cmin,xmin of the updated tuple. One of these could probably be used for an in-page HOT tuple pointer. This proposal seems awfully fragile, because the existing tuple-chain-following logic *depends for correctness* on comparing each tuple's xmin to the prior xmax. I don't think you can just wave your hands and say we don't need that cross-check. Furthermore, it seems to me you haven't fixed the problem, which is that you can't remove the chain member that is being pointed at by off-page links (either index entries or a previous generation of the same tuple). As described, you've made that problem worse, because you're trying to say we don't know which of the chain entries is pointed at. regards, tom lane
Re: [HACKERS] anyelement2 pseudotype
Tom Dunstan [EMAIL PROTECTED] writes: The enum patch [1] does exactly this with an ANYENUM pseudo-type. It should provide a pretty good overview of what will be required. ANYENUM? What's the use-case for that? These special cases in the type system are enough of a pain in the neck for the code that I'm disinclined to add one without a very solid argument for it. regards, tom lane
Re: [HACKERS] anyelement2 pseudotype
Tom Lane wrote: Tom Dunstan [EMAIL PROTECTED] writes: The enum patch [1] does exactly this with an ANYENUM pseudo-type. It should provide a pretty good overview of what will be required. ANYENUM? What's the use-case for that? These special cases in the type system are enough of a pain in the neck for the code that I'm disinclined to add one without a very solid argument for it. Well ... *somebody* suggested it here ... http://archives.postgresql.org/pgsql-hackers/2005-11/msg00457.php ;-) cheers andrew
Re: [HACKERS] Writing triggers in C++
Alvaro Herrera wrote: Florian G. Pflug wrote: Andreas Pflug wrote: Tom Lane wrote: Jacob Rief [EMAIL PROTECTED] writes: I tried to write a trigger using C++. That is most likely not going to work anyway, because the backend operating environment is C not C++. If you dumb it down enough --- no exceptions, no RTTI, no use of the C++ library --- then it might work. I can confirm that it does work this way. I've written an aggregate function that uses C++ STL hashes, and it seems to work pretty well. I'd think that using exceptions should be fine, as long as you make sure to _always_ catch any exception that might be thrown inside your own C++ code, and don't let it propagate into backend code. STL allows you to specify custom allocator classes as template parameters to hash, vector and the like. You can use that to let STL allocate memory from the correct memory context. What happens if Postgres raises an elog(ERROR) in the code you're catching exceptions in? Is it propagated outwards? In my case, the only possible source of an elog(ERROR) would be palloc(), when the machine is out of memory (does it even throw elog(ERROR), or does it return NULL just as malloc() does?). Since this is rather unlikely, and would probably lead to a postgres shutdown anyway, I didn't really care about that case. You're right of course that this is different for triggers - they're much more likely to call SPI functions or otherwise interact with the backend than my rather self-contained aggregate function. Still, I'd think that an elog(ERROR) would propagate outwards - but any C++ destructors of local (stack-allocated) objects wouldn't be called. So, to be safe, I guess one would need to surround any call that could raise elog(ERROR) with an appropriate handler that translates the elog(ERROR) into a C++ exception. This C++ exception would have to be translated back into an elog(ERROR) at the outermost level of C++ code.
Maybe we should create some wiki page or pgfoundry project that collects all the glue code, tips and tricks that people have invented to glue C++ into the postgres backend. greetings, Florian Pflug
Re: [HACKERS] anyelement2 pseudotype
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: ANYENUM? What's the use-case for that? Well ... *somebody* suggested it here ... http://archives.postgresql.org/pgsql-hackers/2005-11/msg00457.php Well, in that usage (ie, for enum I/O functions) it's not actually necessary that the type system as a whole understand ANYENUM as something that any enum type can be cast to, because you're going to hot-wire the pg_type entries during CREATE ENUM anyway. What I'm wondering is if there's a use-case for it during ordinary user operations with enums. regards, tom lane
Re: [HACKERS] HOT WIP Patch - version 1
Pavan Deolasee wrote: - We need to find a way to handle DEAD root tuples, either converting them into stubs or overwriting them with a new version. We could also perform pointer swinging from the index. Again there are concerns about crash-safety and concurrent index-scans working properly. We don't have a community consensus on any of the suggestions in this regard. But hopefully we will converge on some design soon. This seems to be the most fundamental problem we have at the moment. If we stick to the basic rule we have now, that a live tuple's ctid doesn't change, the only way to get rid of a dead tuple at the root of the update chain is by changing the index pointer. The backward pointers Hannu suggested, or the "scan the whole page to find the previous tuple" approach, would allow reuse of those dead tuples for new tuples in the chain, but even those methods wouldn't completely eliminate the problem. We could say that in some scenarios we just leave behind some dead tuples/stubs that can never be reclaimed. What do you guys think - if we can bring it down to just an extra line pointer, would that be acceptable? We could also do that for now and implement the pointer-swinging later if it turns out to be a problem in practice. What's the verdict on relaxing the "live tuple's ctid doesn't change" rule? If we did allow that within a page, what would we need to change? Inside the backend we'd have to make sure that whenever a ctid is used, the page is kept pinned. How likely is it that this would break any external projects, and how difficult would it be to detect the broken usage pattern? It would mean that the ctid system column could change within a transaction, unless we change it so that it returns the ctid of the root tuple + xmin + cmin, but it'd be a user-visible change anyhow. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Writing triggers in C++
Florian G. Pflug wrote: Alvaro Herrera wrote: Florian G. Pflug wrote: Andreas Pflug wrote: Tom Lane wrote: Jacob Rief [EMAIL PROTECTED] writes: I tried to write a trigger using C++. That is most likely not going to work anyway, because the backend operating environment is C not C++. If you dumb it down enough --- no exceptions, no RTTI, no use of C++ library --- then it might work, I can confirm that it does work this way. I've written an aggregate function that uses c++ stl hashes, and it seems to work pretty well. I'd think that using exceptions should be fine, as long as you make sure to _always_ catch any exception that might be thrown inside your own c++ code, and don't let it propagate into backend code. STL allows you to specify custom allocator classes as template parameters to hash, vector and the like. You can use that to let STL allocate memory from the correct memory context. What happens if Postgres raises an elog(ERROR) in the code you're catching exceptions in? Is it propagated outwards? In my case, the only possible source of an elog(ERROR) would palloc(), when the machine is out of memory (Does it even throw elog(ERROR), or does it return NULL just as malloc() ?). Since this is rather unlikely, and would probably lead to a postgres shutdown anyway, I didn't really care about that case. No, an out-of-memory leads to elog(ERROR), which rolls back the current transaction. This releases some memory so the system can continue working. In fact we periodically see out-of-memory reports, and they certainly _don't_ cause a general shutdown. You're right of course that this is different for triggers - they're much more likely to call SPI functions or otherwise interact with the backend than my rather self-contained aggregate function. Still, I'd think that an elog(ERROR) would propagate outwards - but any C++ destructors of local (stack-allocated) objects wouldn't be called. 
Probably stack allocation doesn't matter much, as I think that would be unwound by the longjmp call. I don't know a lot about C++, but if there are allocations in the data area then those would probably not be freed. But it makes me wonder -- is longjmp very compatible with C++ exceptions at all? I know that it causes problems with POSIX thread cancel_push() and cancel_pop(), for example (meaning, they can't be used). So, to be safe, I guess one would need to surround any call that could raise elog(ERROR) with an appropriate handler that translates the elog(ERROR) into a C++ exception. This C++ exception would have to be translated back into an elog(ERROR) at the outermost level of C++ code. Sort of a PG_RE_THROW() in the exception handler, I guess. Maybe we should create some wiki page or pgfoundry project that collects all the glue code, tips and tricks that people have invented to glue C++ into the postgres backend. If it can be made to work, sure; in techdocs. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] integer datetimes
Magnus Hagander [EMAIL PROTECTED] writes: Our docs for the integer datetime option say: Note also that the integer datetimes code is newer than the floating-point code, and we still find bugs in it from time to time. Is the last sentence about bugs really true anymore? At least the buildfarm seems to have a lot *more* machines with it enabled than without. Buildfarm proves only that the regression tests don't expose any bugs, not that there aren't any. (I'm thinking about making it the default for the vc++ build, which is why I came across that) FWIW, there are several Linux distros that build their RPMs that way, so it's not like people aren't using it. But it seems like we find bugs in the datetime/interval stuff all the time, as people trip over different weird edge cases. regards, tom lane
Re: [HACKERS] Writing triggers in C++
On Wed, 2007-02-14 at 13:19 -0300, Alvaro Herrera wrote: Probably stack allocation doesn't matter much, as I think that would be unwound by the longjmp call. I don't know a lot about C++, but if there are allocations in the data area then those would probably not be freed. But it makes me wonder -- is longjmp very compatible with C++ exceptions at all? C-style stack unwinding (using setjmp and longjmp from csetjmp) is incompatible with exception-handling and is best avoided. (Stroustrup, p. 433). Which presumably means that in practice, the interaction between these features is implementation-defined. -Neil
Re: [HACKERS] anyelement2 pseudotype
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: ANYENUM? What's the use-case for that? Well ... *somebody* suggested it here ... http://archives.postgresql.org/pgsql-hackers/2005-11/msg00457.php Well, in that usage (ie, for enum I/O functions) it's not actually necessary that the type system as a whole understand ANYENUM as something that any enum type can be cast to, because you're going to hot-wire the pg_type entries during CREATE ENUM anyway. What I'm wondering is if there's a use-case for it during ordinary user operations with enums. If you look further down in the thread you'll see that I suggested hiding it, because I didn't think there was much use case for it in user code, but you didn't seem to think much of that idea. cheers andrew
Re: [HACKERS] NULL and plpgsql rows
On Tue, Feb 13, 2007 at 05:55:11PM -0500, Bruce Momjian wrote: Is there a TODO here? --- Jim Nasby wrote: On Oct 2, 2006, at 6:28 PM, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: However, the test right above that means that we'll fail if the user tries something like row_variable := NULL;: The patch you seem to have in mind would allow row_variable := int_variable; to succeed if the int_variable chanced to contain NULL, which is surely not very desirable. Well, that's Tom's objection, though I'm not sure if by 'int_variable' he means 'internal' or 'integer'. Personally, I think it would be useful to just allow setting a row or record variable to NULL as I showed it above; ie: no variables involved. This is something you might want to do to invalidate a row/record variable after taking some action (perhaps deleting a row). You'd also think that you should be able to detect if a record variable is null, as you can with row. So, I suggest: * Allow row and record variables in plpgsql to be set to NULL It's not clear if it's a wise idea to allow this assignment from a variable. It may be better to only allow explicitly setting them, ie: row_variable := NULL; * Allow testing a record variable to see if it's NULL Currently works for row variables, but not record variables -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] anyelement2 pseudotype
adding an anyelement2 pseudotype ... The context was a compatibility SQL function to support Oracle's DECODE function. The reason it's not in there already is we didn't seem to have quite enough use-case to justify it. Do you have more? No. Even this case, for me, is more an expedient than a necessity. I could just rewrite my Oracle code to use CASE, but I've a lot of code to convert, and the transformation is a bit error-prone. I'm also looking at a scripted code edit to rewrite the Oracle stuff, and comparing this to the cost of a PG compatibility function. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Plan for compressed varlena headers
Gregory Stark [EMAIL PROTECTED] writes: 1) Replace the VARATT_SIZEP macro with SET_VARLENA_LEN. If we're going to do this then it's time to play the name game; that is, pick names we actually like and that work well together. The original TOAST patch made things a bit messy in this area. We should try to improve the situation now, not make it worse. We have the following macro names that are commonly used for varlena struct access:

    VARHDRSZ
    VARATT_SIZEP(x)
    VARATT_SIZE(x)
    VARATT_DATA(x)
    VARSIZE(x)          equivalent to VARATT_SIZE
    VARDATA(x)          equivalent to VARATT_DATA

My recollection is that VARHDRSZ, VARSIZE, VARDATA were Berkeley-era and the other three were added by the TOAST patch. The lack of consistency is a bit glaring, and having two ways to do the exact same thing doesn't seem like an improvement. A first-cut proposal:

    VARHDRSZ                same as now, ie, size of 4-byte header
    VARSIZE(x)              for *reading* a 4-byte-header length word
    VARDATA(x)              same as now, ie, ptr + 4 bytes
    SET_VARSIZE(x, len)     for *writing* a 4-byte-header length word

We have to remove VARATT_SIZEP anyway to catch unmodified code, and I'd propose removing VARATT_SIZE and VARATT_DATA too, as they never caught on. We'll also need names for the macros that can read the length and find the data of a datum in either-1-or-4-byte-header format. These should probably be named as variants of VARSIZE and VARDATA, but I'm not sure what exactly; any thoughts? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] integer datetimes
On Wed, Feb 14, 2007 at 11:27:31AM -0500, Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Our docs for the integer datetime option says: Note also that the integer datetimes code is newer than the floating-point code, and we still find bugs in it from time to time. Is the last sentence about bugs really true anymore? At least the buildfarm seems to have a lot *more* machines with it enabled than without. Buildfarm proves only that the regression tests don't expose any bugs, not that there aren't any. (I'm thinking about making it the default for the vc++ build, which is why I came across that) FWIW, there are several Linux distros that build their RPMs that way, so it's not like people aren't using it. But it seems like we find bugs in the datetime/interval stuff all the time, as people trip over different weird edge cases. Certainly, but is it more likely to trip on these in the integer datetime case, really? //Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] HOT WIP Patch - version 1
Heikki Linnakangas [EMAIL PROTECTED] writes: What's the verdict on relaxing the live tuple's ctid doesn't change rule? I think that's unacceptable; it is known that that will break the ODBC and JDBC drivers, as well as any other programs that make use of the ctid for re-finding a tuple they read earlier in the same transaction. We have not only never deprecated client-side use of ctid for this, but actively encouraged it, for instance by going out of our way to support fast access for queries WHERE ctid = 'constant'. What's more, your proposal would break plain old UPDATE and DELETE, as well as SELECT FOR UPDATE, none of which promise to hold a pin continuously on every page containing a tuple they might decide to revisit (by ctid) later. Are you prepared to disallow hash join and sort/merge join in all such queries? regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Writing triggers in C++
Alvaro Herrera wrote: Florian G. Pflug wrote: Alvaro Herrera wrote: Florian G. Pflug wrote: Andreas Pflug wrote: Tom Lane wrote: Jacob Rief [EMAIL PROTECTED] writes: I tried to write a trigger using C++. That is most likely not going to work anyway, because the backend operating environment is C not C++. If you dumb it down enough --- no exceptions, no RTTI, no use of C++ library --- then it might work, I can confirm that it does work this way. I've written an aggregate function that uses c++ stl hashes, and it seems to work pretty well. I'd think that using exceptions should be fine, as long as you make sure to _always_ catch any exception that might be thrown inside your own c++ code, and don't let it propagate into backend code. STL allows you to specify custom allocator classes as template parameters to hash, vector and the like. You can use that to let STL allocate memory from the correct memory context. What happens if Postgres raises an elog(ERROR) in the code you're catching exceptions in? Is it propagated outwards? In my case, the only possible source of an elog(ERROR) would be palloc(), when the machine is out of memory (Does it even throw elog(ERROR), or does it return NULL just as malloc() ?). Since this is rather unlikely, and would probably lead to a postgres shutdown anyway, I didn't really care about that case. No, an out-of-memory condition leads to elog(ERROR), which rolls back the current transaction. This releases some memory so the system can continue working. In fact we periodically see out-of-memory reports, and they certainly _don't_ cause a general shutdown. Sorry, I explained my point badly. What I actually meant is that in my specific use-case (lots of small transactions, none of which use much memory), the only reason for out-of-memory conditions I've ever seen was some application gone wild that ate up all available memory. 
In that case, postgres dies sooner or later, because any memory freed during rollback is immediately used by that other application. In general, of course, you're right. You're right of course that this is different for triggers - they're much more likely to call SPI functions or otherwise interact with the backend than my rather self-contained aggregate function. Still, I'd think that an elog(ERROR) would propagate outwards - but any C++ destructors of local (stack-allocated) objects wouldn't be called. Probably stack allocation doesn't matter much, as I think that would be unwound by the longjmp call. I don't know a lot about C++, but if there are allocations in the data area then those would probably not be freed. But it makes me wonder -- is longjmp very compatible with C++ exceptions at all? I know that it causes problems with POSIX thread cancel_push() and cancel_pop() for example (meaning, they can't be used). Yeah, the memory taken by stack-allocated objects is freed (basically by just resetting the stack pointer). But normally, C++ would call the destructors of stack-allocated objects _before_ resetting the stack pointer. Since setjmp/longjmp don't know anything about C++, they will omit this step. Whether this causes problems or not depends on the objects that you allocated on the stack... So, to be safe, I guess one would need to surround any call that could call elog(ERROR) with an appropriate handler that translates the elog(ERROR) into a C++ exception. This C++ exception would have to be translated back into an elog(ERROR) at the outermost level of C++ code. Sort of a PG_RE_THROW() in the exception handler, I guess. Maybe we should create some wiki page or pgfoundry project that collects all glue code, tips and tricks that people invented to glue C++ into the postgres backend. If it can be made to work, sure; in techdocs. 
I was thinking that two pairs of macros, PG_BEGIN_CPP, PG_END_CPP and PG_CPP_BEGIN_BACKEND, PG_CPP_END_BACKEND should be able to take care of the exception handling issues. You'd need to wrap any code-block that calls postgres functions that might do an elog(ERROR) inside PG_CPP_BEGIN_BACKEND, PG_CPP_END_BACKEND. Vice versa, any block of c++ code that is called from the backend would need to start with PG_BEGIN_CPP, and end with PG_END_CPP. Do you see any other possible problems, aside from memory management issues? greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Writing triggers in C++
Neil Conway wrote: On Wed, 2007-02-14 at 13:19 -0300, Alvaro Herrera wrote: Probably stack allocation doesn't matter much, as I think that would be unwound by the longjmp call. I don't know a lot about C++, but if there are allocations in the data area then those would probably not be freed. But it makes me wonder -- is longjmp very compatible with C++ exceptions at all? C-style stack unwinding (using setjmp and longjmp from csetjmp) is incompatible with exception-handling and is best avoided. (Stroustrup, p. 433). Which presumably means that in practice, the interaction between these features is implementation-defined. Well, as long as you don't longjmp past a C++ catch block, and don't throw a C++ exception past a setjmp handler, there should be no problem I think. Or at least I can't imagine how a problem could arise... greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] integer datetimes
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Our docs for the integer datetime option says: Note also that the integer datetimes code is newer than the floating-point code, and we still find bugs in it from time to time. Is the last sentence about bugs really true anymore? At least the buildfarm seems to have a lot *more* machines with it enabled than without. Buildfarm proves only that the regression tests don't expose any bugs, not that there aren't any. (I'm thinking about making it the defautl for the vc++ build, which is why I came across that) FWIW, there are several Linux distros that build their RPMs that way, so it's not like people aren't using it. But it seems like we find bugs in the datetime/interval stuff all the time, as people trip over different weird edge cases. I think it's disappointing, to say the least, that we treat this code as a sort of second class citizen. BTW, the buildfarm has a majority of machines using it by design - it's in the default set of options in the distributed config file. If we think there are bugs we haven't found, then we need to engage in some sort of analytical effort to isolate them. I don't see any reason in principle why this code should be any more buggy than the float based datetimes, and I see plenty of reason in principle why we should make sure it's right. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] HOT WIP Patch - version 1
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: What's the verdict on relaxing the live tuple's ctid doesn't change rule? I think that's unacceptable; it is known that that will break the ODBC and JDBC drivers, as well as any other programs that make use of the ctid for re-finding a tuple they read earlier in the same transaction. AFAIK the JDBC driver doesn't use ctid. But ODBC and other programs do. We have not only never deprecated client-side use of ctid for this, but actively encouraged it, for instance by going out of our way to support fast access for queries WHERE ctid = 'constant'. The idea I had was to change what the ctid system column returns to root ctid + xmin + cmin. As long as programs treat the ctid as an opaque string, it should work. Tid scan would use that to locate the original tuple. What's more, your proposal would break plain old UPDATE and DELETE, as well as SELECT FOR UPDATE, none of which promise to hold a pin continuously on every page containing a tuple they might decide to revisit (by ctid) later. Are you prepared to disallow hash join and sort/merge join in all such queries? No, of course not. We'd have to do the same thing here; use root tid + xmin + cmin instead of just ctid. But now that I think of it, how do we get the root tid of a tuple? I suppose we'd be back to having backpointers or scanning the whole page... I guess pointer-swinging it is, then. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] HOT WIP Patch - version 1
On 2/14/07, Tom Lane [EMAIL PROTECTED] wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: What's the verdict on relaxing the live tuple's ctid doesn't change rule? I think that's unacceptable; it is known that that will break the ODBC and JDBC drivers, as well as any other programs that make use of the ctid for re-finding a tuple they read earlier in the same transaction. We have not only never deprecated client-side use of ctid for this, but actively encouraged it, for instance by going out of our way to support fast access for queries WHERE ctid = 'constant'. What's more, your proposal would break plain old UPDATE and DELETE, as well as SELECT FOR UPDATE, none of which promise to hold a pin continuously on every page containing a tuple they might decide to revisit (by ctid) later. Are you prepared to disallow hash join and sort/merge join in all such queries? Not that I am suggesting we do this, but I believe we had some solution to this problem in the earlier version of HOT. When the live tuple is copied back to the root tuple, it is marked with a HEAP_COPIED_BACK flag. HeapTupleSatisfiesUpdate() checks for this flag and if set returns a new return code, HeapTupleCopiedBack. heap_update() returns the same to ExecUpdate along with the ctid of the root tuple. The UPDATE/DELETE operation is then retried on the root tuple, very similar to a read-committed update/delete. The xmax of the copied-back tuple is set so that it's not vacuumed away until all the current transactions are completed. Though I have tested this patch several times and it seems to work fine, I probably don't have as much insight into the code as others on this list have. So if someone wants to take a look and see if it would work fine, I would be more than happy to post the latest HOT patch (older design). Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT WIP Patch - version 1
What's the verdict on relaxing the live tuple's ctid doesn't change rule? If we did allow that within a page, what would we need to change? I already said this, but why would this need to be visible from the outside ? A few assumptions: no back pointers indexes only point at slots marked as roots (and non hot tuples) During vacuum, you swap the tuples and keep a stub at the slot that the user's ctid might be pointing at. You mark the stub to detect this situation. When a select/update by ctid comes along it needs to do one step to the root and use that tuple instead. It needs a second vacuum (or a per page vacuum during update) to remove the extra stub when it is dead and not recently dead. I fail to see the hole. Andreas ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Writing triggers in C++
Florian G. Pflug writes: Maybe we should create some wiki page or pgfoundry project that collects all glue code, tips and tricks that people invented to glue C++ into the postgres backend. If it can be made to work, sure; in techdocs. I was thinking that two pairs of macros, PG_BEGIN_CPP, PG_END_CPP and PG_CPP_BEGIN_BACKEND, PG_CPP_END_BACKEND should be able to take care of the exception handling issues. You'd need to wrap any code-block that calls postgres functions that might do an elog(ERROR) inside PG_CPP_BEGIN_BACKEND, PG_CPP_END_BACKEND. Vice versa, any block of c++ code that is called from the backend would need to start with PG_BEGIN_CPP, and end with PG_END_CPP. I've had positive experiences with such a setup, although I've been able to skip PG_BEGIN_CPP/PG_END_CPP by doing the exception conversion in a C++ language handler that instantiates functors using the portable class loading technique described in this paper: http://www.s11n.net/papers/classloading_cpp.html I'd be glad to help out on a pgfoundry project to make C++ a better citizen for extending postgres. regards, andreas ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] HOT WIP Patch - version 1
But now that I think of it, how do we get the root tid of a tuple? I suppose we'd be back to having backpointers or scanning the whole page... I guess pointer-swinging it is, then. During vacuum you see a root [stub] that is not recently dead. You follow the chain to detect if you find a live tuple that can replace the root. You replace the root. You replace the original with a stub that points at the root and mark it recently dead (and HEAP_COPIED_BACK, as Pavan described). ... (see prev post) No need for anyone but vacuum to find a root. Andreas ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] anyelement2 pseudotype
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: ANYENUM? What's the use-case for that? Well ... *somebody* suggested it here ... http://archives.postgresql.org/pgsql-hackers/2005-11/msg00457.php Well, in that usage (ie, for enum I/O functions) it's not actually necessary that the type system as a whole understand ANYENUM as something that any enum type can be cast to, because you're going to hot-wire the pg_type entries during CREATE ENUM anyway. Well, it's not just I/O functions in pg_type, it's functions, operators, aggregates, index methods etc. There are 34 OIDs used up by the enum patch, and most of those catalog entries would have to be duplicated per enum type by CREATE TYPE in the absence of ANYENUM; since you'd given the hand-wavy suggestion anyway, it seemed better not to spam the catalogs. Regarding the type system understanding ANYENUM, most of the type system treats ANYENUM identically to ANYELEMENT, the only parts that really need to understand it are the bits that try to tie down concrete types. For those, non-enum types are rejected if the generic type is ANYENUM. That's it, basically. What I'm wondering is if there's a use-case for it during ordinary user operations with enums. Not really. I allowed it to occur in plpgsql, mostly for completeness, but I didn't bother for the other P/Ls as there didn't seem to be much of a use case. Cheers Tom ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] anyelement2 pseudotype
Tom Dunstan [EMAIL PROTECTED] writes: Regarding the type system understanding ANYENUM, most of the type system treats ANYENUM identically to ANYELEMENT, the only parts that really need to understand it are the bits that try to tie down concrete types. The reason I'm feeling annoyed with ANYfoo stuff today is that yesterday I had to put a special hack for ANYARRAY into the ri_triggers code, which you'd think would have no concern with it. But perhaps this is just an indication that we need to refactor the code in parse_coerce.c. (The problem in ri_triggers is that it uses find_coercion_pathway() which does not concern itself with ANYfoo types.) Anyway, objection withdrawn --- I just thought it seemed a good idea to question whether we were adding a frammish we didn't really need. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] HOT WIP Patch - version 1
Just to summarize:

o Every tuple gets a heap ctid
o Only the root tuple gets an index entry
o We can easily remove dead tuples that aren't the root because, by definition, nothing points to them, including backends and indexes

The problem is that a dead root tuple has to stay around because while no backends can see it, the index does. We could move a live tuple into the root ctid slot, but if we do that, the live tuple changes its ctid while it is visible. Could we insert index tuples for the live tuple and then remove the root tuple, perhaps later? So basically we break the chain at that time. The problem there is that we basically have nothing better than what we have now --- we are just delaying the index insert, and I don't see what that buys us. Could a _new_ tuple take over the root tuple slot? It is new, so it doesn't have a ctid yet to change. I think that means the index walking could go forward or backward, but on the same page. To illustrate, with ctid slot numbers:

    [1] root INSERT         [2]             [3]
    [1] root INSERT         [2] UPDATE      [3]
    [1] root INSERT (dead)  [2] UPDATE 1    [3]
    [1] root UPDATE 2       [2] UPDATE 1    [3]

--- Heikki Linnakangas wrote: Pavan Deolasee wrote: - We need to find a way to handle DEAD root tuples, either convert them into stubs or overwrite them with a new version. We can also perform pointer swinging from the index. Again there are concerns about crash-safety and concurrent index-scans working properly. We don't have a community consensus on any of the suggestions in this regard. But hopefully we would converge on some design soon. This seems to be the most fundamental problem we have at the moment. If we stick to the basic rule we have now that a live tuple's ctid doesn't change, the only way to get rid of a dead tuple at the root of the update chain is by changing the index pointer. 
The backward-pointers Hannu suggested, or scanning the whole page to find the previous tuple, would allow reuse of those dead tuples for new tuples in the chain, but even those methods wouldn't completely eliminate the problem. We could say that in some scenarios we just leave behind some dead tuples/stubs that can never be reclaimed. What do you guys think; if we can bring it down to just an extra line pointer, would that be acceptable? We could also do that for now and implement the pointer-swinging later if it turns out to be a problem in practice. What's the verdict on relaxing the live tuple's ctid doesn't change rule? If we did allow that within a page, what would we need to change? Inside the backend we'd have to make sure that whenever a ctid is used, the page is kept pinned. How likely is it that it would break any external projects, and how difficult would it be to detect the broken usage pattern? It would mean that the ctid system column could change within a transaction, unless we change it so that it returns the ctid of the root tuple + xmin + cmin, but it'd be a user-visible change anyhow. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixing insecure security definer functions
On Tue, 2007-02-13 at 20:01 -0500, Tom Lane wrote: I would suggest that the search path be added as an explicit parameter to CREATE FUNCTION, with a default of the current setting. The main reason for this is that it's going to be a real PITA for pg_dump if we don't allow an explicit specification. It might also be worth allowing PATH NULL or some such locution to specify the current behavior, for those who really want it. (In particular, most C functions would want this to avoid useless overhead for calls to things that aren't affected by search path.) It might also be useful to allow something such as PATH CURRENT to attach the current schema as the search path for all calls of that function. This would be useful because then SQL scripts for installing 3rd party modules could install nicely into any schema by merely setting search_path before running the script. For instance, PostGIS doesn't support installing into a schema other than public because they want to have a static SQL install script rather than generate one based on your desired search path. Regards, Jeff Davis ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Plan for compressed varlena headers
Gregory Stark wrote: 2) Replace VARATT* macros to store and retrieve the toast bits in a manner that will work for variable length headers. This either means storing the bits at the least-significant position or using network byte order. If we want to allow storing 1-byte headers unaligned, which I think would be good, then I still think we have to read them using bytewise lookups -- ie by casting to (char*). That means network byte order or using the low order bits is equally efficient. I think the plan was to have the macro code conditional on big/little endian. We can deal with doing 1-byte headers unaligned at some future date if we feel we need it, and the macros will make it transparent. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] HOT WIP Patch - version 1
On Wed, Feb 14, 2007 at 01:56:03PM -0500, Bruce Momjian wrote: Could we insert index tuples for the live tuple and then remove the root tuple, perhaps later? So basically we break the chain at that time. The problem there is that we basically have nothing better than what we have now --- we are just delaying the index insert, and I don't see what that buys us. At some point - inserting into the block would not be possible, as there is no free space. Would that be a good time to do the index insert? Then, a later vacuum would eventually prune out the whole old chain. As long as vacuuming the intermediate entries in the chain keeps the block with free space, there is no need to remove the root tuple. If space ever runs out (vacuum not running frequently enough - many updates performed in the same interval) - fall back to the mechanism that is being used today. I see it buying increased performance for rows that are frequently updated. If it can delay modifying the indices to only once every 10 or more updates, it seems to me that the improvement should be significant. Perhaps PostgreSQL could be used for page hit counters again... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] | Ottawa, Ontario, Canada | One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] anyelement2 pseudotype
I wrote: Tom Dunstan [EMAIL PROTECTED] writes: Regarding the type system understanding ANYENUM, most of the type system treats ANYENUM identically to ANYELEMENT, the only parts that really need to understand it are the bits that try to tie down concrete types. The reason I'm feeling annoyed with ANYfoo stuff today is that yesterday I had to put a special hack for ANYARRAY into the ri_triggers code, which you'd think would have no concern with it. Actually ... now that I re-read that remark, I think you may have done the wrong things with ANYENUM. I think that ANYENUM may in fact be closer to ANYARRAY than it is to ANYELEMENT, because ANYELEMENT pretty nearly means anything at all whereas ANYARRAY identifies a subset of types that share some properties, which is an accurate description of ANYENUM as well. In particular, it is sensible to have b-tree index opclasses that are declared to operate on ANYARRAY. If you've got b-tree support for ANYENUM, as I hope you do, then you'll have to patch that same spot in ri_triggers that now knows about ANYARRAY. So you might want to take another pass through the code and see if you shouldn't be modeling ANYENUM more closely on ANYARRAY than ANYELEMENT. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Plan for compressed varlena headers
Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: If we want to allow storing 1-byte headers unaligned, which I think would be good, then I still think we have to read them using bytewise lookups -- ie by casting to (char*). That means network byte order or using the low order bits is equally efficient. I think the plan was to have the macro code conditional on big/little endian. We can deal with doing 1-byte headers unaligned at some future date if we feel we need it, and the macros will make it transparent. Forcing bytewise access does not sound like a good plan to me --- you're very much at the mercy of the compiler whether you get good code for that. Plus you can't do it without multiple evaluation of the macro argument, which is something I'd really prefer we not introduce into such a widely-used macro. The only argument in favor is to save a couple bytes of alignment padding, but since this is only going to happen for wide data values, the significance of that is minimal. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)
On Tue, 13 Feb 2007, Marc Munro wrote: On Mon, 2007-12-02 at 00:10 -0500, Tom Lane wrote: Marc Munro [EMAIL PROTECTED] writes: Consider a table C containing 2 child records C1 and C2, of parent P. If transaction T1 updates C1 and C2, the locking order of the records will be C1, P, C2. Another transaction, T2, that attempts to update only C2, will lock the records in order C2, P. The locks on C2 and P are taken in different orders by the two transactions, leading to the possibility of deadlock. But the lock on P is shared, hence no deadlock. Doh! Yes, you are right. It is not that simple. For deadlock to occur, we need a transaction that takes an exclusive lock on P as well as on one of the children. Let us replace T2 with a new transaction, T3, which is going to update P and only one of its children. If T3 is going to update P and C1 without the possibility of deadlock against T1, then it must take out the locks in the order C1, P. If, on the other hand, it is going to update P and C2, then the locks must be taken in the order P, C2. This means that there is no single strategy we can apply to T3 that will guarantee to avoid deadlocks with transactions that update only C (ie transactions which, from a developer's point of view, do nothing to P, and so should be unable to deadlock with T3). This scenario would do it, too: Table X has rows representing an object of some kind. These objects contain other objects, which are represented by rows in table Y. Suppose X stores a count of the Y objects it contains in a particular status (because your application needs to get this quickly) and suppose the count is updated by a trigger. The Y objects hold references to the containing X, checked by FK constraints. A query which updates the status of one or more Ys can deadlock with another instance of itself. It first locks a Y row, then shared-locks an X row, then updates the X row (when the trigger runs). 
Two transactions could get to the shared-lock stage simultaneously, then deadlock. I've come across cases a bit like this in my own applications. I'm sure there are many, many others. From an application developer's standpoint there are few options, none of them ideal: 1) Insist on a locking policy that requires updates to first lock their parent records. This is horrible for so many reasons. It should be unnecessary; it causes exclusive locking on parent records, thereby eliminating the gains made by introducing row share locks in 8.1; it is onerous on the developers; it is error-prone; etc. I once tried to define a locking order for rows in a database. It doesn't work (though this was at a time when FK constraint checking used FOR UPDATE locks, which, of course, made things much worse). This wasn't specifically for FK checks, but they were an important cause of deadlocks. Firstly, you have no idea of the order in which rows locked by a statement will be locked. UPDATE d SET counter=counter+1 WHERE d.a=1 could deadlock with UPDATE d SET counter=counter+1 WHERE d.b=1. Secondly, even if you could, defining a usable locking order across all of the rows in your database (not just the tables) is nearly impossible. I suppose you could base it on, say, the order (tablename, id) but you'd have to go to extreme lengths to follow this. Imagine having to determine the id of every row you want to update and the id and table name of every row they'll lock because of FK constraints and then sort a big set of 'SELECT .. FOR SHARE' and 'UPDATE' statements. That's a lot of queries - and a huge variety of useful queries you can't use any more. And once you've done all that you might find your application has race conditions - there are sometimes other reasons for performing queries in a certain order. 2) Remove FK constraints to eliminate the possibility of RI-triggered deadlocks. Ugh. Deadlocks aren't the only problem FK constraints' locks are going to cause you. 
It's quite possible you have, somewhere, a small number of rows referenced via FKs by a huge number of rows. Think about the amount of locking (not just locks on rows, but internal locks on bits of cache) and cache coherency logic going on in a production SMP machine. It'll depend on your load and schema, of course, but I've found the constraints to be mostly impractical in my production systems. Some I can keep, but only if I know that the checks aren't going to be triggered too often. 3) Encapsulate all transactions in some form of retry mechanism that traps deadlocks and retries those transactions. This may not be practicable, and incurs all of the overhead of encountering and trapping deadlocks in the first place. Also, as each deadlock occurs, a number of locks will be left active before deadlock detection kicks in, increasing the window for further deadlocks. On a busy system, the first deadlock may well trigger a cascade of further deadlocks. It's essential to
Re: [HACKERS] integer datetimes
On Wed, Feb 14, 2007 at 12:38:12PM -0500, Andrew Dunstan wrote: Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Our docs for the integer datetime option says: Note also that the integer datetimes code is newer than the floating-point code, and we still find bugs in it from time to time. Is the last sentence about bugs really true anymore? At least the buildfarm seems to have a lot *more* machines with it enabled than without. Buildfarm proves only that the regression tests don't expose any bugs, not that there aren't any. (I'm thinking about making it the default for the VC++ build, which is why I came across that) FWIW, there are several Linux distros that build their RPMs that way, so it's not like people aren't using it. But it seems like we find bugs in the datetime/interval stuff all the time, as people trip over different weird edge cases. I think it's disappointing, to say the least, that we treat this code as a sort of second class citizen. BTW, the buildfarm has a majority of machines using it by design - it's in the default set of options in the distributed config file. If we think there are bugs we haven't found, then we need to engage in some sort of analytical effort to isolate them. I don't see any reason in principle why this code should be any more buggy than the float based datetimes, and I see plenty of reason in principle why we should make sure it's right. That was exactly what I thought, which is why I was kinda surprised to see that note in the configure stuff. If we go with that, then we can say that *any* new feature is less tested, no? ;-) //Magnus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Day and month name localization uses wrong locale category
Added to TODO: o Use LC_TIME for localized weekday/month names, rather than LC_MESSAGES http://archives.postgresql.org/pgsql-hackers/2006-11/msg00390.php --- Euler Taveira de Oliveira wrote: Peter Eisentraut wrote: What about using strftime()? Then we wouldn't need to worry about gettext translations; everything is localized. Why didn't I think of it before? :-) I'll try to code a patch later today if no one objects. How is this going? Finished. Sorry for the delay; I had some trouble understanding how the backend treats the locale stuff (Neil pointed out the path). Now TM mode is returning strftime() output. It would be nice if in the future we change this to pg_strftime(), but unfortunately the latter is not i18n-aware. :(

template1=# show lc_time;
 lc_time
---------
 pt_BR
(1 registro)

template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
         to_char
---------------------------
 Segunda, 20 Novembro 2006
(1 registro)

template1=# set lc_time to 'C';
SET
template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
         to_char
--------------------------
 Monday, 20 November 2006
(1 registro)

template1=# set lc_time to 'de_DE';
SET
template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
         to_char
--------------------------
 Montag, 20 November 2006
(1 registro)

Comments? -- Euler Taveira de Oliveira http://www.timbira.com/ -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
[HACKERS] XML regression test failure
sponge failed the XML regression test once today: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sponge&dt=2007-02-14%2007:30:02 If I read that correctly we could just add a simple ORDER BY table_name to that query to get a more reliable result. Stefan
Re: [HACKERS] HOT WIP Patch - version 1
Zeugswetter Andreas ADI SD wrote: A few assumptions: no back pointers; indexes only point at slots marked as roots (and non-HOT tuples). During vacuum, you swap the tuples and keep a stub at the slot that the user's ctid might be pointing at. You mark the stub to detect this situation. When a select/update by ctid comes along it needs to do one step to the root and use that tuple instead. As Pavan pointed out, that's more or less what he ended up doing originally. You need to mark the stub with the current most recent xid, and wait until that's no longer running. Only after that can you remove the stub. It needs a second vacuum (or a per-page vacuum during update) to remove the extra stub when it is dead and not recently dead. Requiring two vacuums to remove the tuple sounds bad at first, but it's actually not so bad since both steps could be done by retail vacuum, or even by normal scans. I fail to see the hole. The only potential problem I can see is how to make sure that a heap scan or a bitmap heap scan doesn't visit the tuple twice. If we make sure that the page is scanned in one go while keeping the buffer pinned, we're good. We already do that except for system catalogs, so I believe we'd have to forbid HOT updates on system tables, like we forbid bitmap scans. To me this sounds like the best idea so far. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Fixing insecure security definer functions
Andreas, Have you considered hardcoding the schema for each object where it was found at creation time? This seems more intuitive to me. This isn't practical. Consider the schema qualification syntax for operators. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Re: [HACKERS] Plan for compressed varlena headers
Tom Lane [EMAIL PROTECTED] writes: Plus you can't do it without multiple evaluation of the macro argument, which is something I'd really prefer we not introduce into such a widely-used macro. I don't see any way to do VARSIZE without multiply evaluating its argument. It's got to mask out the relevant bits then take the appropriate number of bytes and shift the appropriate number of bits to the right. If we wanted to require GCC we could use temporary variables in macros. Or we could use a global variable and declare that you can't use VARSIZE inside the argument to VARSIZE. (Actually I can't construct a scenario where it would break, perhaps it's safe.) But I just did a find-grep and the most complicated expression I can find is VARATT_SIZE(DatumGetPointer(values[i])). And that's in code I think I'll have to touch anyways. I can't find any instances of anything that would be dangerous or noticeably inefficient. It would be pretty strange code that wanted to know the size of a datum but didn't care enough about the actual contents of the datum to store it in a temporary variable. The only circumstances I could see it happening is if someone wrote code like: len = VARSIZE(datum = DirectFunctionCall()) There are no instances of that form that I can find. The only argument in favor is to save a couple bytes of alignment padding, but since this is only going to happen for wide data values, the significance of that is minimal. Yeah I realized the same thing earlier. At least in the case of four-byte headers padding is logical on all fronts since NUL bytes will have the right bitpattern. And it's a big CPU win on four-byte headers since it affects reading non-compressed varlenas being passed around in the executor. I'm not sure the same logic holds for two-byte headers. They're a) not so expensive in the first place, b) not so large in the first place and c) would require padding with a special pad byte. 
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Plan for compressed varlena headers
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Plus you can't do it without multiple evaluation of the macro argument, I don't see any way to do VARSIZE without multiply evaluating its argument. Some variant of

#define VARSIZE(x) (ntohl((x)->vl_len) & 0x3fff)
#define VARSIZE(x) ((x)->vl_len >> 2)

The 1-or-4-byte version is a lot harder, but also will be used in a lot fewer places, all of which will get looked at when it gets installed. I'm prepared to put up with multiple eval for that. I *don't* want to assume that existing code can tolerate multiple eval in a macro that has existed forever and never did it before. regards, tom lane
Re: [HACKERS] Variable length varlena headers redux
On Tue, Feb 13, 2007 at 01:32:11PM -0500, Bruce Momjian wrote: Gregory Stark wrote: Alternatively, what does the trailing a in varlena signify? Would this be varlenb? attribute -- Actually varlena stands for variable length array. elein [EMAIL PROTECTED]
Re: [HACKERS] Variable length varlena headers redux
I want to register, just in principle, that I object to changing the structure of a varlena. The idea behind the data type is simple, clean and fast. And it is easily understood by developers and by people developing applications and functions in PostgreSQL. Of course you will do what you will. Be careful out there. elein [EMAIL PROTECTED]
Re: [HACKERS] Variable length varlena headers redux
elein [EMAIL PROTECTED] writes: I want to register, just in principle, that I object to changing the structure of a varlena. Indeed, I'm doing my best to restrain Greg from changing what a datatype-specific function will see by default. The stuff with variable-length headers should be confined to a relatively small number of functions in which we're willing to pay a code-ugliness penalty for speed. regards, tom lane
Re: [HACKERS] XML regression test failure
Stefan Kaltenbrunner [EMAIL PROTECTED] writes: sponge failed the XML regression test once today: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sponge&dt=2007-02-14%2007:30:02 If I read that correctly we could just add a simple ORDER BY table_name to that query to get a more reliable result. Done. regards, tom lane