Re: [HACKERS] Foreign keys
First of all I'd like to apologize for my poor English. After this, I should say that I believe I brought this problem to the community about a year ago, but I don't remember getting any answer about it. The problem is: every time a user misses an externally referenced key, PostgreSQL raises an exception. As far as I can tell, if we have 5 or 10 foreign keys during an INSERT/UPDATE transaction, only one exception should be raised, reporting all the errors/messages for every missing referenced field at once, not one by one.

Well, in order to implement this idea we would need to disable the built-in referential integrity and build it all by hand: all the validation (look-ups etc.) before insert/update, "if tg_op = 'insert' or tg_op = 'update' then ...", as people do with non-relational databases. This is very hard to believe! I must be missing something. Please, can someone explain what I'm missing?

When there is a constraint violation, the current transaction is rolled back anyhow. What's the purpose of letting you insert 1000 records and then, at the end, saying: hah, all is rolled back because the 2nd record was invalid? PG justly throws the exception immediately to let you know it's futile inserting 998 more records.

I don't see a problem here.

Bye,
Chris.

-- 
Chris Mair
http://www.1006.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Foreign keys
MAR - Secretariado Geral wrote:
> Hi everybody,
> Every time a user misses an externally referenced key, PostgreSQL raises
> an exception. If we have 5 or 10 foreign keys during an INSERT/UPDATE
> transaction, only one exception should be raised, reporting all the
> errors/messages at once, not one by one. [...] I must be missing
> something. Can someone explain what I'm missing?

I'm not sure what you are complaining about exactly, but maybe you want to declare your FK as DEFERRABLE INITIALLY DEFERRED? That way the constraint checking happens at the end of the transaction and not immediately.

Stefan
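Stefan's suggestion can be sketched as follows (the tables and columns here are hypothetical, not from the thread):

```sql
-- Hypothetical schema: orders referencing customers, with the
-- foreign key declared deferrable and initially deferred.
CREATE TABLE customers (id integer PRIMARY KEY);
CREATE TABLE orders (
    id          integer PRIMARY KEY,
    customer_id integer REFERENCES customers (id)
                DEFERRABLE INITIALLY DEFERRED
);

BEGIN;
-- No error here: the FK check is postponed until commit.
INSERT INTO orders VALUES (1, 42);
INSERT INTO customers VALUES (42);
COMMIT;  -- the check passes now that the parent row exists
```

Note this changes *when* checking happens, not how many violations are reported; a failed commit still reports only the first violation found.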
Re: [HACKERS] Foreign keys
Chris Mair [EMAIL PROTECTED] writes:
> What's the purpose of letting you insert 1000 records and then, at the
> end, say: hah, all is rolled back because the 2nd record was invalid?
> PG justly throws the exception immediately to let you know it's futile
> inserting 998 more records.

Well, there are plenty of cases where people want that, and we support it with deferred constraints. However, the OP sounds like he wants something else. I think what he wants is, when he inserts a record and it fails due to foreign key constraints, to report all the violated constraints, not just the first one found.

I never run into this problem myself because I think of foreign key constraints as more akin to C assertions. They're a backstop to make sure the application is working correctly. I never write code that expects foreign key constraint errors and tries to handle them. But there's nothing saying that's the only approach. The feature request seems pretty reasonable to me.

I'm not sure how hard it would be with the ri triggers as written. I'm not sure there's anywhere for triggers to store their return values, so I'm unclear whether this can even be done using triggers.

But to answer his original question: yes, that's the way Postgres works, and if you want to report all the violations together you'll have to check them yourself.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Foreign keys
On Sun, 10 Sep 2006, Gregory Stark wrote:
> I think what he wants is, when he inserts a record and it fails due to
> foreign key constraints, to report all the violated constraints, not
> just the first one found. [...] I'm not sure how hard it would be with
> the ri triggers as written. I'm not sure there's anywhere for triggers
> to store their return values, so I'm unclear whether this can even be
> done using triggers.

I think if we were going to do this, all the constraint violations for unique, not null, check and foreign keys should be handled similarly, so we'd probably want something more general than just a way for the ri triggers to do this. I don't have a good idea of the right solution for that, though.
Re: [HACKERS] Foreign keys
Stephan Szabo [EMAIL PROTECTED] writes:
> I think if we were going to do this, all the constraint violations for
> unique, not null, check and foreign keys should be handled similarly,
> so we'd probably want something more general than just a way for the ri
> triggers to do this. I don't have a good idea of the right solution for
> that, though.

It seems pretty unwieldy to me: it's not hard to imagine a long INSERT throwing millions of separate foreign-key errors before it's done, for instance. And then there's the cascading-errors problem, ie, bogus reports that happen because some prior step failed ... not least being your client crashing and failing to tell you anything about what happened because it ran out of memory for the error list.

My advice is to rethink the client code that wants such a behavior.

			regards, tom lane
Re: [HACKERS] ISBN/ISSN/ISMN/EAN13 module
Tom,

>> I've checked the version in the cvs and I had made significant changes
>> from that version.
>
> Hm, it sounds like I guessed wrong about which version was newer ... is
> there something flaky about your machine's system clock? The file
> timestamps in the two tarballs definitely pointed the other way.

Nope, not a system clock problem, not that I know of... I don't know what the problem was. Anyway, with the patch I made it should now be fully functional. I fixed some major bugs that prevented some ISBN numbers from working. I've merged the changes made for the cvs with my own, hoping I didn't miss anything that was already fixed in the cvs...

I noticed you changed some log messages and that you added GET_STR and GET_TEXT. Indeed, there was that thing that took me some time to figure out in both ean13_cast_to_text() and isn_cast_to_text(): not converting the return value to a valid TEXT. But I did not know the other *_cast_from_text() functions had the problem of the missing GET_STR() (I was using VARDATA()).

> Yeah, your to/from text functions simply did not work --- the VARDATA
> hack for example presumes that the data part of a text value is
> null-terminated, which it isn't. In any case it seems pretty likely
> that the content of TEXT datums will change someday to allow better
> charset/collation support, so you really are better off calling
> textin/textout if possible rather than assuming you know the
> representation of type TEXT. I copied the GET_STR/GET_TEXT macros from
> some other contrib module ... you're free to do it differently if you
> want, but in any case you have to deal with the fact that TEXT is *not*
> a C-string.

Yeah, that's what I painfully learned, especially with the *_cast_to_text() functions... those were definitely crashing the server. I didn't know the VARDATA() hack could fail at some point (even in the patch, I already replaced it with GET_STR(), just in case)... and since it took me so long to figure out why the other two functions were crashing, and not knowing exactly what you did with the GET_TEXT() macro, I just decided to use my own patched version. I suppose the use of GET_TEXT() is better, as it's smaller and makes the code easier to follow; perhaps we might want to change those two functions back the way you had them (using GET_TEXT()).

> As for the log message changes, there's room for debate about that, but
> the way you had it struck me as far too chatty. The use-case for weak
> checking is that you have a large pile of data you don't want to
> validate right now, correct? So why would you want a warning for every
> single bad datum during the import? Also, all the fancy formatting in
> that one big warning was directly against our message style guidelines.

That's right, that's the idea behind weak mode... I've been in that position and I found the weak mode very useful, so I thought it would be nice to have it as a feature that others could benefit from. And yes, I know the messages were far too chatty, as you rightfully said. In my latest version (before the patch) I took away some of the messages, and I also shortened the *big* warning to a single line, but I think not having so many messages is truly better. I wasn't sure when the messages were shown (this is my first module, you know), but it seems those removed messages were really not needed at all.

I'm very happy to have contributed a bit to the growth and improvement of PostgreSQL, and I look forward to the day I can make bigger/better contributions. And Tom, were you able to apply the patch without problems? ('Cause this is one of the first times I've posted a patch to a mailing list.)

Regards,
Kronuz
Re: [HACKERS] Foreign keys
Tom Lane [EMAIL PROTECTED] writes:
> It seems pretty unwieldy to me: it's not hard to imagine a long INSERT
> throwing millions of separate foreign-key errors before it's done, for
> instance. [...] My advice is to rethink the client code that wants such
> a behavior.

Well, you're still talking about the case of multiple queries deferring all constraint checks to the end of the transaction. I'm not sure what the original poster had in mind, but that's not how I read it, and it wasn't what I was speculating about. I was only thinking of situations like:

  INSERT INTO user (name, department, zipcode) VALUES ('Tom', 0, '0');

We'll fail with an error like "violates foreign key constraint user_department_fkey". It won't say anything about the zipcode also being invalid.

I sure hate UIs that give you one error at a time, so you have to keep fixing one problem and clicking ok again, only to have yet another error pop up; rinse, lather, repeat until you finally get all the problems sorted out.

Now, I've never actually run into this, because as I mentioned I always treated the database constraints as assertion checks, independent of the application, which usually enforces stricter conditions anyway. But I could see someone arguing that having two independent sets of code implementing the same set of conditions is poor.

In any case, the same logic that leads to it being desirable to report all the errors to the user in a UI, and not just report them one by one, also applies to the database. I'm not sure it's the most important issue in the world, but it does seem like it would be a nice feature if it reported all the errors in the statement, not just the first one it finds.

-- 
greg
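One way a client can approximate "report everything at once" with today's behavior is to probe all the referenced keys in a single query before attempting the INSERT. This is only a sketch; the tables and values are hypothetical, extending the example above:

```sql
-- Hypothetical pre-check: one row per missing reference, so the
-- application can show every problem at once instead of hitting
-- FK violations one at a time.
SELECT 'department' AS missing_reference
WHERE NOT EXISTS (SELECT 1 FROM departments WHERE id = 0)
UNION ALL
SELECT 'zipcode'
WHERE NOT EXISTS (SELECT 1 FROM zipcodes WHERE code = '0');
```

This duplicates the constraint logic in the client, which is exactly the hand-made validation the original poster was complaining about, but it is the available workaround.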
Re: [HACKERS] Foreign keys
> In any case, the same logic that leads to it being desirable to report
> all the errors to the user in a UI, and not just report them one by
> one, also applies to the database. I'm not sure it's the most important
> issue in the world, but it does seem like it would be a nice feature if
> it reported all the errors in the statement, not just the first one it
> finds.

Seems kind of extraneous to me. I am guessing it would cause yet further overhead with our foreign key checks. My testing shows that the use of foreign keys on high-velocity, single-transaction loads can easily cause a 50% reduction in performance. Why add to that?

What we need to be doing is finding a way to decrease the impact of foreign key checks.

Sincerely,

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Re: [HACKERS] Foreign keys
On Sun, 10 Sep 2006, Gregory Stark wrote:
> Well, you're still talking about the case of multiple queries deferring
> all constraint checks to the end of the transaction.

Well, or INSERT ... SELECT, or UPDATE, or DELETE. Most deferred conditions can happen within one statement as well.

> In any case, the same logic that leads to it being desirable to report
> all the errors to the user in a UI, and not just report them one by
> one, also applies to the database. I'm not sure it's the most important
> issue in the world, but it does seem like it would be a nice feature if
> it reported all the errors in the statement, not just the first one it
> finds.

SQL seems to have a notion of setting the size of the diagnostics area for a transaction to hold a number of conditions. There are a few odd bits; for example, it's mostly unordered, but the sqlcode returned must match the first condition, and we presumably want to make sure that if there are any errors we return an exception sqlcode, not a completion one.
[HACKERS] contrib uninstall scripts need some love
Awhile back, someone went around and made sure that all the contrib modules had uninstall scripts, but most of the new contributions since then don't have 'em:

	adminpack
	hstore
	pg_freespacemap
	pgrowlocks
	sslinfo

all seem to need one.

Also, I tested a few of the existing uninstall scripts in passing while fixing the contains/contained mess, and two out of four were broken (failed to remove everything installed by the module, as shown by getting errors if you then try to reinstall it).

Seems like this area needs more attention ... anyone want to work on it?

			regards, tom lane
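For readers unfamiliar with the convention: a contrib uninstall script is just DROP statements matching everything the install script created, in reverse dependency order. A minimal sketch for a hypothetical module (object names here are made up, not from any of the modules listed):

```sql
-- uninstall_example.sql
-- Drop objects in reverse order of creation, so dependent
-- objects go before the things they depend on. Missing even
-- one object is exactly the bug described above: reinstalling
-- the module then fails with "already exists" errors.
DROP OPERATOR <@ (example_type, example_type);
DROP FUNCTION example_contained(example_type, example_type);
DROP TYPE example_type CASCADE;
```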
Re: [HACKERS] [PATCHES] ISBN/ISSN/ISMN/EAN13 module
Jeremy Kronuz [EMAIL PROTECTED] writes:
> ... Tom, were you able to apply the patch without problems? ('Cause
> this is one of the first times I've posted a patch to a mailing list.)

Um, no, doesn't work at all, unfortunately:

$ patch --dry ~/ean.patch
patching file isn.c
patch: malformed patch at line 6: enum isn_type { INVALID, ANY, EAN13, ISBN, ISMN, ISSN, UPC };
$

It looks like the patch got whitespace-mangled in transit. With many mail programs it works better to add a patch as an attachment than to try to include it directly in the message text. Also, we usually ask for diff -c rather than diff -u format ... I for one find it easier to read.

Please sync the text-conversion issues and resubmit.

			regards, tom lane
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote:
> Bruce Momjian [EMAIL PROTECTED] writes:
>> No one has mentioned that we pad values on disk to match the CPU
>> alignment. This is done for efficiency, but is not strictly required.
>
> Well, it is unless you are willing to give up support of non-Intel
> CPUs; most other popular chips are strict about alignment, and will
> fail an attempt to do a nonaligned fetch.

Intel CPUs are detectable at compile time, right? Do we use less padding in the layout for tables on Intel-based servers? If not, could we?

I would be particularly interested in the creation of a 24-bit integer if it could pack into only three bytes. (If the layout forces an extra byte of padding per integer, the advantage is lost.) For argument's sake, if I created a contrib extension called int3 which stored 24-bit integers, in the int3.source file I could write:

  CREATE TYPE int3 (
      internallength = 3,
      input = int3_in,
      output = int3_out,
      alignment = ALIGNMENT
  );

and then have sed replace ALIGNMENT with either char or int4 depending on the architecture. Is there a reason this wouldn't work?

For the example schema which started this thread, a contrib extension for ascii fields could be written, with types like ascii1, ascii2, ascii3, and ascii4, each with implicit upcasts to text. A contrib for int1 and uint1 could be written to store single-byte integers in a single byte, performing math on them correctly, etc.

mark

The only way we could pack stuff without alignment is to go over to the idea that memory and disk representations are different --- where in this case the conversion might just be a memcpy to a known-aligned location. The performance costs of that seem pretty daunting, however, especially when you reflect that simply stepping over a varlena field would require memcpy'ing its length word to someplace.

			regards, tom lane
Re: [HACKERS] contrib uninstall scripts need some love
Tom Lane wrote:
> Awhile back, someone went around and made sure that all the contrib
> modules had uninstall scripts, but most of the new contributions since
> then don't have 'em: adminpack, hstore, pg_freespacemap, pgrowlocks,
> sslinfo all seem to need one. [...] Seems like this area needs more
> attention ... anyone want to work on it?

I'll take it. How long do I have?

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Re: [HACKERS] [PATCHES] ISBN/ISSN/ISMN/EAN13 module
>> ... Tom, were you able to apply the patch without problems? ('Cause
>> this is one of the first times I've posted a patch to a mailing list.)
>
> Um, no, doesn't work at all, unfortunately. [...] It looks like the
> patch got whitespace-mangled in transit. [...] Please sync the
> text-conversion issues and resubmit.

Tom, I'm attaching the new patch now... it's now using GET_TEXT(), I've updated the README, and I've used the -c format as you suggested. Please let me know how it goes for you.

Regards,
Kronuz.

isn2c.diff
Description: Binary data
Re: [HACKERS] Fixed length data types issue
On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote:
>> Well, it is unless you are willing to give up support of non-Intel
>> CPUs; most other popular chips are strict about alignment, and will
>> fail an attempt to do a nonaligned fetch.
>
> Intel CPUs are detectable at compile time, right? Do we use less
> padding in the layout for tables on Intel-based servers? If not, could
> we?

Intel CPUs may not complain about unaligned reads, but they're still inefficient: internally the CPU does two aligned reads and rearranges the bytes. On other architectures the OS can emulate that, but postgres doesn't use that, for obvious reasons.

> For the example schema which started this thread, a contrib extension
> for ascii fields could be written, with types like ascii1, ascii2,
> ascii3, and ascii4, each with implicit upcasts to text. A contrib for
> int1 and uint1 could be written to store single-byte integers in a
> single byte, performing math on them correctly, etc.

The problem is that for each of those ascii types, to actually use them they would have to be converted, which would amount to allocating some memory, copying, and adding a length header. At some point you have to wonder whether you're actually saving anything.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] contrib uninstall scripts need some love
On Sun, Sep 10, 2006 at 12:09:24PM -0700, Joshua D. Drake wrote:
> Tom Lane wrote:
>> Awhile back, someone went around and made sure that all the contrib
>> modules had uninstall scripts [...] Seems like this area needs more
>> attention ... anyone want to work on it?
>
> I'll take it. How long do I have?

For reference, here are some comments about the cleanup work I did a while back:

http://archives.postgresql.org/pgsql-patches/2006-03/msg00163.php

-- 
Michael Fuhr
Re: [HACKERS] contrib uninstall scripts need some love
Joshua D. Drake [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Seems like this area needs more attention ... anyone want to work on
>> it?
> I'll take it. How long do I have?

Since it's contrib, I don't think we need to hold you to being done before beta1. But the sooner the better, of course.

			regards, tom lane
Re: [HACKERS] Fixed length data types issue
Martijn van Oosterhout wrote:
> Intel CPUs may not complain about unaligned reads, but they're still
> inefficient: internally the CPU does two aligned reads and rearranges
> the bytes.

This gets back to the CPU vs. I/O bound issue, right? Might not some people (with heavily taxed disks but lightly taxed CPUs) prefer that trade-off?

> The problem is that for each of those ascii types, to actually use them
> they would have to be converted, which would amount to allocating some
> memory, copying, and adding a length header. At some point you have to
> wonder whether you're actually saving anything.

I'm not sure what you mean by "actually use them". The types could have their own comparison operators, so you could use them for sorting and indexing, and use them in WHERE clauses with these comparisons, without any conversion to/from text. I mentioned implicit upcasts to text merely to handle other cases, such as using them in a LIKE or ILIKE, or concatenation, etc., where the work of providing this functionality for each contrib datatype would not really be justified.

I'm not personally as interested in the aforementioned ascii types as I am in the int1 and int3 types, but the argument in favor of each is about the same. If a person has a large table made of small data, it seems really nuts to have 150% - 400% bloat on that table, when such a small amount of work is needed to write the contrib datatypes necessary to store the data compactly.

The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built-in type (int1 => smallint, int3 => integer, ascii1 => text, ascii2 => text, ascii3 => text, etc.). Operations on data of differing types can be done in the canonical type, but the common case for many users would be operations between data of the same type, for which no conversion is required. Am I missing something that would prevent this approach from working?

I am seriously considering writing these contrib datatypes for use either on pgfoundry or the contrib/ subdirectory for the 8.3 release, but am looking for advice if I am really off-base.

Thanks,

mark
Re: [HACKERS] [PATCHES] ISBN/ISSN/ISMN/EAN13 module
Jeremy Kronuz [EMAIL PROTECTED] writes:
> Tom, I'm attaching the new patch now... it's now using GET_TEXT(), I've
> updated the README, and I've used the -c format as you suggested.
> Please let me know how it goes for you.

Looks great, applied. If you are interested, consider working up a few regression tests to see if there are any portability issues. I don't see anything obviously dangerous here, but you never know.

			regards, tom lane
Re: [HACKERS] contrib uninstall scripts need some love
Tom Lane wrote:
> Joshua D. Drake [EMAIL PROTECTED] writes:
>> I'll take it. How long do I have?
>
> Since it's contrib, I don't think we need to hold you to being done
> before beta1. But the sooner the better, of course.

O.k., I will start working through it and report at the end of the week.

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Re: [HACKERS] Fixed length data types issue
Mark Dilger [EMAIL PROTECTED] writes:
> ... The argument made upthread that a quadratic number of conversion
> operators is necessitated doesn't seem right to me, given that each
> type could upcast to the canonical built-in type (int1 => smallint,
> int3 => integer, ascii1 => text, ascii2 => text, ascii3 => text, etc.)

This would work all right for the string-category cases, since TEXT is the only thing you really care about having them cast to anyway. It probably won't work all that well for int1/int3, because you really want them to coerce implicitly to all the wider numeric types. Otherwise, perfectly sane queries like "int8 + int1" fail.

Part of the issue here is that we deliberately keep the parser from searching for multi-step coercions. So for example, if you only provide int1 -> int2, then the existence of up-casts from int2 doesn't help you use an int1 with anything except int2.

I am not sure whether any problems would be created if you did provide the full spectrum of up-casts. I remember having argued that there would be problems with trying to invent uint2/uint4 types, but that was a very long time ago, before we had pg_cast and some other changes in the type resolution rules. With the current system it might work OK.

			regards, tom lane
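The "full spectrum of up-casts" could be declared along these lines. This is only a sketch: int1 is the hypothetical contrib type under discussion, and the conversion function names are made up for illustration:

```sql
-- Hypothetical: make int1 coerce implicitly to every wider numeric
-- type, so expressions like "int8 + int1" resolve in one step
-- instead of requiring a multi-step coercion the parser won't search
-- for.
CREATE CAST (int1 AS int2)    WITH FUNCTION int1_to_int2(int1)    AS IMPLICIT;
CREATE CAST (int1 AS int4)    WITH FUNCTION int1_to_int4(int1)    AS IMPLICIT;
CREATE CAST (int1 AS int8)    WITH FUNCTION int1_to_int8(int1)    AS IMPLICIT;
CREATE CAST (int1 AS numeric) WITH FUNCTION int1_to_numeric(int1) AS IMPLICIT;
```

Down-casts (int2 -> int1 and so on), if provided at all, would presumably be explicit or assignment-only, since they can lose data.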
Re: [HACKERS] contrib uninstall scripts need some love
Let me know if you need an extra pair of eyes.

G.-

On 9/10/06, Joshua D. Drake [EMAIL PROTECTED] wrote:
> Tom Lane wrote:
>> Since it's contrib, I don't think we need to hold you to being done
>> before beta1. But the sooner the better, of course.
>
> O.k., I will start working through it and report at the end of the
> week.
>
> Joshua D. Drake

-- 
Guido Barosio
---
http://www.globant.com
[EMAIL PROTECTED]
Re: [HACKERS] Fixed length data types issue
Added to TODO:

* Consider ways of storing rows more compactly on disk
  o Store disk pages with no alignment/padding?
  o Reorder physical storage order to reduce padding?
  o Support a smaller header for short variable-length fields?
  o Reduce the row header size?

--- Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: I think it would be good to see if we can extend the varlena data types to support a shorter header for storing short byte values. Looking at the header now we have: This isn't the first time we've been down that route. There were some extensive discussions a while back. I think there were even patches. I don't remember why it was eventually rejected. I suspect it simply got too complex. But I think this is a dead-end route route. What you're looking at is the number 1 repeated for *every* record in the table. And what you're proposing amounts to noticing that the number 4 fits in a byte and doesn't need a whole word to store it. Well sure, but you don't even need a byte if it's going to be the same for every record in the table. If someone popped up on the list asking about whether Postgres compressed their data efficiently if they stored a column that was identical throughout the whole table you would tell them to normalize their data. -- greg -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built-in type. (int1 = smallint, int3 = integer, ascii1 = text, ascii2 = text, ascii3 = text, etc.) This would work all right for the string-category cases, since TEXT is the only thing you really care about having them cast to anyway. It probably won't work all that well for int1/int3, because you really want them to coerce implicitly to all the wider numeric types. Otherwise, perfectly sane queries like int8 + int1 fail. Part of the issue here is that we deliberately keep the parser from searching for multi-step coercions. So for example if you only provide int1->int2 then the existence of up-casts from int2 doesn't help you use an int1 with anything except int2. I am not sure whether any problems would be created if you did provide the full spectrum of up-casts. I remember having argued that there would be problems with trying to invent uint2/uint4 types, but that was a very long time ago, before we had pg_cast and some other changes in the type resolution rules. With the current system it might work OK. regards, tom lane Thanks Tom, I will try this then. I won't be proposing to ever put this in core, as the increased code size isn't justified for people who aren't using these types (IMHO). Any further feedback on why this wouldn't work is appreciated, as it might save me some time learning on my own. But otherwise I'll post back in a few days when this is finished. mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Fixed length data types issue
Bruce Momjian [EMAIL PROTECTED] writes: * Consider ways of storing rows more compactly on disk o Support a smaller header for short variable-length fields? With respect to the business of having different on-disk and in-memory representations, we have that already today: see TOAST. It strikes me that it might be useful to think about solving the problem with a second generation toast mechanism. The first generation only worried about storing large values, but the second generation would also address the problem of storing small values efficiently. Or you could think about it as a second generation varlena. This mindset would make for a slightly different set of choices about where the work gets done. I'm not sure which is better. Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in a disk page ... (b) alignment either one or two bytes, not four. TEXT would be perfectly happy with 1-byte alignment, but for NUMERIC we might want 2. I'm inclined to bag the idea of storing the length words separately from the data proper. Although it probably would make for some marginal gain in cache efficiency, I don't see any reasonable way at all to fit it into the current system structure, whereas either the toast or next gen varlena approaches seem fairly straightforward. And having to track an additional pointer inside the inner loops of heap_form_tuple and heap_deform_tuple could eat up any performance gain anyway. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [COMMITTERS] pgsql: Back out patch added during 8.2.X development: Allow to_char()
[EMAIL PROTECTED] (Bruce Momjian) writes: Back out patch added during 8.2.X development: Allow to_char() D format specifiers for interval/time. It doesn't work, and I doubt it is useful enough to fix (D = day of week). Hm, shouldn't there be a documentation update to go with that? Or was it never documented anyway :-( regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] contrib uninstall scripts need some love
On Sun, Sep 10, 2006 at 01:50:39PM -0700, Joshua D. Drake wrote: O.k., I will start working through it and report at the end of the week. I spent a few minutes doing the same tests I did a few months ago and found problems with dblink and ltree; I'll submit patches for those. Tom, do you recall which modules gave you trouble? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] contrib uninstall scripts need some love
Michael Fuhr [EMAIL PROTECTED] writes: I spent a few minutes doing the same tests I did a few months ago and found problems with dblink and ltree; I'll submit patches for those. Tom, do you recall which modules gave you trouble? Um ... hstore and tsearch2, I think. I fixed those, but was thinking that other stuff might crawl out from under similar rocks. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: * Consider ways of storing rows more compactly on disk o Support a smaller header for short variable-length fields? With respect to the business of having different on-disk and in-memory representations, we have that already today: see TOAST. It strikes me that it might be useful to think about solving the problem with a second generation toast mechanism. The first generation only worried about storing large values, but the second generation would also address the problem of storing small values efficiently. Or you could think about it as a second generation varlena. This mindset would make for a slightly different set of choices about where the work gets done. I'm not sure which is better. Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in a disk page ... That is an interesting observation, though could compressed inline values exceed two bytes? (b) alignment either one or two bytes, not four. TEXT would be perfectly happy with 1-byte alignment, but for NUMERIC we might want 2. I'm inclined to bag the idea of storing the length words separately from the data proper. Although it probably would make for some marginal gain in cache efficiency, I don't see any reasonable way at all to fit it into the current system structure, whereas either the toast or next gen varlena approaches seem fairly straightforward. And having to track an additional pointer inside the inner loops of heap_form_tuple and heap_deform_tuple could eat up any performance gain anyway. Good point. How do we do it now? I assume we store just the fixed-size toast pointer length in the heap attribute, not the toast length. Why haven't we investigated shrinking the varlena header before? -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. 
+ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] contrib/xml2 and PG_MODULE_MAGIC
contrib/xml2 currently has PG_MODULE_MAGIC in xslt_proc.c, which results in a runtime error on systems that built the module without support for libxslt per the comments in the Makefile. Should PG_MODULE_MAGIC be in xpath.c instead? -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [COMMITTERS] pgsql: Back out patch added during
Tom Lane wrote: [EMAIL PROTECTED] (Bruce Momjian) writes: Back out patch added during 8.2.X development: Allow to_char() D format specifiers for interval/time. It doesn't work, and I doubt it is useful enough to fix (D = day of week). Hm, shouldn't there be a documentation update to go with that? Or was it never documented anyway :-( Not documented. Using D for an interval/time is more of a corner case / don't do that. ;-) -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] contrib uninstall scripts need some love
On Sun, Sep 10, 2006 at 07:38:24PM -0400, Tom Lane wrote: Michael Fuhr [EMAIL PROTECTED] writes: I spent a few minutes doing the same tests I did a few months ago and found problems with dblink and ltree; I'll submit patches for those. Tom, do you recall which modules gave you trouble? Um ... hstore and tsearch2, I think. I fixed those, but was thinking that other stuff might crawl out from under similar rocks. I don't see an uninstall script for hstore; that one needs to be written. The latest untsearch2.sql looks good in my tests. Should untsearch2.sql be renamed to uninstall_tsearch2.sql to be consistent with all the other uninstall scripts? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Lock partitions
I see this in the CVS commits for 8.2. Did we determine the proper number of lock partitions? Should it be based on the number of buffers or concurrent sessions allowed? Divide the lock manager's shared state into 'partitions', so as to reduce contention for the former single LockMgrLock. Per my recent proposal. I set it up for 16 partitions, but on a pgbench test this gives only a marginal further improvement over 4 partitions --- we need to test more scenarios to choose the number of partitions. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length data types issue
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in a disk page ... That is an interesting observation, though could compressed inline values exceed two bytes? After expansion, perhaps, but it's the on-disk footprint that concerns us here. I thought a bit more about this and came up with a zeroth-order sketch: The length word for an on-disk datum could be either 1 or 2 bytes; in the 2-byte case we'd need to be prepared to fetch the bytes separately to avoid alignment issues. The high bits of the first byte say what's up:

* First two bits 00: 2-byte length word, uncompressed inline data follows. This allows a maximum on-disk size of 16K for an uncompressed datum, so we lose nothing at all for standard-size disk pages and not much for 32K pages (remember the toaster will try to compress any tuple exceeding 1/4 page anyway ... this just makes it mandatory).

* First two bits 01: 2-byte length word, compressed inline data follows. Again, hard limit of 16K, so if your data exceeds that you have to push it out to the toast table. Again, this policy costs zero for standard size disk pages and not much for 32K pages.

* First two bits 10: 1-byte length word, zero to 62 bytes of uncompressed inline data follows. This is the case that wins for short values.

* First two bits 11: 1-byte length word, pointer to out-of-line toast data follows. We may as well let the low 6 bits of the length word be the size of the toast pointer, same as it works now. Since the toast pointer is not guaranteed aligned anymore, we'd have to memcpy it somewhere before using it ... but compared to the other costs of fetching a toast value, that's surely down in the noise.
The distinction between compressed and uncompressed toast data would need to be indicated in the body of the toast pointer, not in the length word as today, but nobody outside of tuptoaster.c would care. Notice that heap_deform_tuple only sees 2 cases here: high bit 0 means 2-byte length word, high bit 1 means 1-byte. It doesn't care whether the data is compressed or toasted, same as today. There are other ways we could divvy up the bit assignments of course. The main issue is keeping track of whether any given Datum is in this compressed-for-disk format or in the uncompressed 4-byte-length-word format. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] contrib uninstall scripts need some love
Michael Fuhr [EMAIL PROTECTED] writes: Should untsearch2.sql be renamed to uninstall_tsearch2.sql to be consistent with all the other uninstall scripts? Yeah, that and lo_drop are outliers that probably ought to be renamed. Offhand I don't see any serious compatibility objection --- who does module removals except by hand? --- but if anyone sees a problem speak now ... I'm also wondering why the heck contrib/lo installs lo_test.sql. That ought to be a regression test anyway, and we don't install regression scripts. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Lock partitions
Bruce Momjian [EMAIL PROTECTED] writes: I see this in the CVS commits for 8.2. Did we determine the proper number of lock partitions? Should it be based on the number of buffers or concurrent sessions allowed? No. NUM_LOCK_PARTITIONS needs to be a compile-time constant for a number of reasons, and there is absolutely zero evidence to justify making any effort (and spending any cycles) on a variable value. It would be nice to see some results from the OSDL tests with, say, 4, 8, and 16 lock partitions before we forget about the point though. Anybody know whether OSDL is in a position to run tests for us? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Foreign keys
Joshua D. Drake wrote: In any case the same logic that leads to it being desirable to report all the errors to the user in a UI and not just report them one by one also applies to the database. I'm not sure it's the most important issue in the world, but it does seem like it would be a nice feature if it reported all the errors in the statement, not just the first one it finds. Seems kind of extraneous to me. I am guessing it would cause yet further overhead with our foreign key checks. But in this case, it would be (or should be) overhead only in the case of failure. In the case of success, all the constraints are checked anyway -- they just succeed. I would expect that the number of applications for which a constraint violation is the norm and not the exception is very small. But Tom's concern is a valid one. I expect a reasonable compromise would be to record and show the errors for only the non-deferred constraints in the currently executing statement, because after that point the transaction is in an error state anyway and thus can't continue without a rollback to a savepoint. It probably wouldn't make sense to evaluate the deferred constraints within the erroring statement anyway -- they're deferred, which by definition means they don't get evaluated until commit, so evaluating them at failure time could easily show errors that are only there because subsequent statements never got executed. As for the deferred constraints, it might be reasonable to show errors only up to some limit (controlled by a GUC, perhaps), with the default limit being 1, which is what we have now. Otherwise you run the risk of throwing millions of errors, which is surely not desirable. The downside to this is until you've hit the limit, you have to evaluate *all* the deferred constraints, which could take a while, whereas the current setup will return immediately upon encountering the first constraint error.
My testing shows that the use of foreign keys on high-velocity single-transaction loads can easily cause a 50% reduction in performance. Why add to that? What we need to be doing is finding a way to decrease the impact of foreign key checks. I definitely agree here, but this should be independent of how foreign key failures are handled once they're detected. In other words, what you're experiencing is the performance hit that comes from evaluating the constraints, not from reporting the errors afterwards. -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
Tom Lane [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in a disk page ... That is an interesting observation, though could compressed inline values exceed two bytes? After expansion, perhaps, but it's the on-disk footprint that concerns us here. I'm a bit confused by this and how it would be handled in your sketch. I assumed we needed a bit pattern dedicated to 4-byte length headers because even though it would never occur on disk it would be necessary for the uncompressed and/or detoasted data. In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted to larger than 16k? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Fixed length data types issue
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in a disk page ... That is an interesting observation, though could compressed inline values exceed two bytes? After expansion, perhaps, but it's the on-disk footprint that concerns us here. I'm a bit confused by this and how it would be handled in your sketch. I assumed we needed a bit pattern dedicated to 4-byte length headers because even though it would never occur on disk it would be necessary for the uncompressed and/or detoasted data. Well, we have to expand the TOAST anyway in memory, so when we do that we already give it the right length header. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixed length data types issue
Gregory Stark [EMAIL PROTECTED] writes: I'm a bit confused by this and how it would be handled in your sketch. I assumed we needed a bit pattern dedicated to 4-byte length headers because even though it would never occur on disk it would be necessary to for the uncompressed and/or detoasted data. In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted to larger than 16k? I'm imagining that it would give you the same old uncompressed in-memory representation as it does now, ie, 4-byte length word and uncompressed data. The weak spot of the scheme is that it assumes different, incompatible in-memory and on-disk representations. This seems to require either (a) coercing values to in-memory form before they ever get handed to any datatype manipulation function, or (b) thinking of some magic way to pass out-of-band info about the contents of the datum. (b) is the same stumbling block we have in connection with making typmod available to datatype manipulation functions. I don't want to reject (b) entirely, but it seems to require some pretty major structural changes. OTOH (a) is not very pleasant either, and so what would be nice is if we could tell by inspection of the Datum alone which format it's in. After further thought I have an alternate proposal that does that, but it's got its own disadvantage: it requires storing uncompressed 4-byte length words in big-endian byte order everywhere. This might be a showstopper (does anyone know the cost of ntohl() on modern Intel CPUs?), but if it's not then I see things working like this: * If high order bit of datum's first byte is 0, then it's an uncompressed datum in what's essentially the same as our current in-memory format except that the 4-byte length word must be big-endian (to ensure that the leading bit can be kept zero). In particular this format will be aligned on 4- or 8-byte boundary as called for by the datatype definition. 
* If high order bit of first byte is 1, then it's some compressed variant. I'd propose divvying up the code space like this:

* 0xxx  uncompressed 4-byte length word as stated above
* 10xx  1-byte length word, up to 62 bytes of data
* 110x  2-byte length word, uncompressed inline data
* 1110  2-byte length word, compressed inline data
* 1111  1-byte length word, out-of-line TOAST pointer

This limits us to 8K uncompressed or 4K compressed inline data without toasting, which is slightly annoying but probably still an insignificant limitation. It also means more distinct cases for the heap_deform_tuple inner loop to think about, which might be a problem. Since the compressed forms would not be aligned to any boundary, there's an important special case here: how can heap_deform_tuple tell whether the next field is compressed or not? The answer is that we'll have to require pad bytes between fields to be zero. (They already are zeroed by heap_form_tuple, but now it'd be a requirement.) So the algorithm for decoding a non-null field is: * if looking at a byte with high bit 0, then we are either on the start of an uncompressed field, or on a pad byte before such a field. Advance to the declared alignment boundary for the datatype, read a 4-byte length word, and proceed. * if looking at a byte with high bit 1, then we are at the start of a compressed field (which will never have any preceding pad bytes). Decode length as per rules above. The good thing about this approach is that it requires zero changes to fundamental system structure. The pack/unpack rules in heap_form_tuple and heap_deform_tuple change a bit, and the mechanics of PG_DETOAST_DATUM change, but a Datum is still just a pointer and you can always tell what you've got by examining the pointed-to data. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: After further thought I have an alternate proposal that does that, but it's got its own disadvantage: it requires storing uncompressed 4-byte length words in big-endian byte order everywhere. This might be a showstopper (does anyone know the cost of ntohl() on modern Intel CPUs?), but if it's not then I see things working like this: * If high order bit of datum's first byte is 0, then it's an uncompressed datum in what's essentially the same as our current in-memory format except that the 4-byte length word must be big-endian (to ensure that the leading bit can be kept zero). In particular this format will be aligned on 4- or 8-byte boundary as called for by the datatype definition. * If high order bit of first byte is 1, then it's some compressed variant. I'd propose divvying up the code space like this:

* 0xxx  uncompressed 4-byte length word as stated above
* 10xx  1-byte length word, up to 62 bytes of data
* 110x  2-byte length word, uncompressed inline data
* 1110  2-byte length word, compressed inline data
* 1111  1-byte length word, out-of-line TOAST pointer

Great. I assumed we would have to use a variable-length header, as you described. I don't think ntohl() is going to be a problem. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pgsql: Install a cleaner solution to the AIX libpq linking problem, as
After a long battle with technology, [EMAIL PROTECTED] (Tom Lane), an earthling, wrote: Log Message: --- Install a cleaner solution to the AIX libpq linking problem, as per an earlier discussion. Centralize assumptions about what libpq depends on in one place in Makefile.global. I am unconvinced that this list is complete, but since ecpg seems to have gotten along with just these entries, we'll try it this way and see what happens. Looks good thus far... It'll be hitting the buildfarm shortly... I had been getting edgy about the absence of commits on this; glad to see this. -- (reverse (concatenate 'string moc.liamg @ enworbbc)) http://cbbrowne.com/info/linuxdistributions.html Rules of the Evil Overlord #59. I will never build a sentient computer smarter than I am. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: (does anyone know the cost of ntohl() on modern Intel CPUs?) I wrote a simple test program to determine this:

#include <arpa/inet.h>

int main (int argc, char *argv[])
{
    unsigned long i;
    uint32_t a;

    a = 0;
    for (i = 0 ; i < 4000000000L ; ++i) {
#ifdef CALL_NTOHL
        a = ntohl(i);
#endif
    }
    return a;
}

I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit mode, another one with the same processor running in 32-bit mode, a third running a Pentium 4 1.5 GHz processor, and a fourth running a pair of 2.8 GHz Xeons in hyperthreading mode. I compiled the test program on the 32-bit systems with the -std=c9x option so that the constant would be treated as unsigned. Other than that, the compilation method I used was identical: no optimization, since it would skip the loop entirely in the version without the ntohl() call. I compiled it both with and without defining CALL_NTOHL, and measured the difference in billed CPU seconds. Based on the above, on both Athlon 64 systems, each ntohl() invocation and assignment takes 1.04 nanoseconds to complete (I presume the assignment is to a register, but I'd have to examine the assembly to know for sure). On the 1.5 GHz P4 system, each iteration takes 8.49 nanoseconds. And on the 2.8 GHz Xeon system, each iteration takes 5.01 nanoseconds. That seems reasonably fast to me... -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length data types issue
On Sun, 10 Sep 2006, Kevin Brown wrote: Tom Lane wrote: (does anyone know the cost of ntohl() on modern Intel CPUs?) I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit mode, another one with the same processor running in 32-bit mode, a third running a Pentium 4 1.5 GHz processor, and a fourth running a pair of 2.8 GHz Xeons in hyperthreading mode. I compiled the test program on the 32-bit systems with the -std=c9x option so that the constant would be treated as unsigned. Other than that, the compilation method I used was identical: no optimization, since it would skip the loop entirely in the version without the ntohl() call. I compiled it both with and without defining CALL_NTOHL, and measured the difference in billed CPU seconds. Based on the above, on both Athlon 64 systems, each ntohl() invocation and assignment takes 1.04 nanoseconds to complete (I presume the assignment is to a register, but I'd have to examine the assembly to know for sure). On the 1.5 GHz P4 system, each iteration takes 8.49 nanoseconds. And on the 2.8 GHz Xeon system, each iteration takes 5.01 nanoseconds. Of course, that depends on the particular OS and variant as well. IIRC, at some point an instruction was added to the x86 instruction set to do byte swapping. This is from /usr/include/netinet/in.h on a gentoo linux box with glibc 2.3:

#ifdef __OPTIMIZE__
/* We can optimize calls to the conversion functions.  Either nothing has
   to be done or we are using directly the byte-swapping functions which
   often can be inlined.  */
# if __BYTE_ORDER == __BIG_ENDIAN
/* The host byte order is the same as network byte order,
   so these functions are all just identity.  */
#  define ntohl(x)      (x)
#  define ntohs(x)      (x)
#  define htonl(x)      (x)
#  define htons(x)      (x)
# else
#  if __BYTE_ORDER == __LITTLE_ENDIAN
#   define ntohl(x)     __bswap_32 (x)
#   define ntohs(x)     __bswap_16 (x)
#   define htonl(x)     __bswap_32 (x)
#   define htons(x)     __bswap_16 (x)
#  endif
# endif
#endif

And from bits/byteswap.h:

/* To swap the bytes in a word the i486 processors and up provide the
   `bswap' opcode.  On i386 we have to use three instructions.  */
# if !defined __i486__ && !defined __pentium__ && !defined __pentiumpro__ \
     && !defined __pentium4__
#  define __bswap_32(x) \
     (__extension__ \
      ({ register unsigned int __v, __x = (x); \
         if (__builtin_constant_p (__x)) \
           __v = __bswap_constant_32 (__x); \
         else \
           __asm__ ("rorw $8, %w0;" \
                    "rorl $16, %0;" \
                    "rorw $8, %w0" \
                    : "=r" (__v) \
                    : "0" (__x) \
                    : "cc"); \
         __v; }))
# else
#  define __bswap_32(x) \
     (__extension__ \
      ({ register unsigned int __v, __x = (x); \
         if (__builtin_constant_p (__x)) \
           __v = __bswap_constant_32 (__x); \
         else \
           __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); \
         __v; }))
# endif

/me searches around his hard drive for the ia32 developers reference

BSWAP
Opcode      Instruction    Description
0F C8+rd    BSWAP r32      Reverse the byte order of a 32-bit register
...
The BSWAP instruction is not supported on IA-32 processors earlier than the Intel486 processor family.
...

I have read some odd stuff about instructions like these. Apparently the fact that this is a prefixed instruction (the 0F byte at the beginning) costs an extra clock cycle, so though this instruction should take 1 cycle, it ends up taking 2. I am unclear whether or not this is rectified in later pentium chips. So to answer the question about how much ntohl costs on recent Intel boxes, a properly optimized build with a friendly libc like I quoted should be able to do it in 2 cycles. -- In Ohio, if you ignore an orator on Decoration day to such an extent as to publicly play croquet or pitch horseshoes within one mile of the speaker's stand, you can be fined $25.00.
---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] contrib uninstall script fixes
Michael Fuhr [EMAIL PROTECTED] writes: The uninstall scripts for dblink and ltree were missing some functions. Applied, thanks. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings